Inline Assembly Language
Page created by Chris Tyler. Revision as of 23:42, 27 January 2014
Inline Assembly is assembly language code which is embedded in a program written in another language, typically C.
In open source software (especially in a Linux context), this is most commonly done using gcc, but inline assembler is also supported by llvm/clang, the Intel C compilers, Microsoft Visual Studio, and various other tools. Here we're going to focus on gcc.
Basic Syntax
Inline assembler is included in a GCC source file in one of these two forms:
    asm(...);
    __asm__ (...);   // Those are double underscores!
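If it is critical that the code not be removed or optimized away by the compiler, include the keyword volatile. GCC may still reorder a volatile asm relative to surrounding code, so instructions that must stay together belong in a single asm statement. The two forms are:
    asm volatile (...);
    __asm__ __volatile__ (...);   // Double underscores all over the place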
Inside the parentheses, there are up to four sections separated by colons. These are:
- The assembler template (mandatory)
- Output operands (optional)
- Input operands (optional)
- Clobbers (optional)
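As a rough illustration of how the four sections fit together, here is a minimal sketch that adds two integers on x86. The function name add_ints is hypothetical, and the fallback branch is an assumption added only so the snippet still builds on other architectures.

```c
/* Hypothetical helper: adds b to a with a single x86 instruction. */
static int add_ints(int a, int b)
{
    int sum = a;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("addl %1, %0"   /* assembler template: sum += b (AT&T operand order) */
             : "+r"(sum)     /* output operands: sum is both read and written */
             : "r"(b)        /* input operands: b in any general-purpose register */
             : "cc");        /* clobbers: addl changes the condition flags */
#else
    sum += b;                /* plain C fallback for non-x86 targets */
#endif
    return sum;
}
```

Any section after the template may be left empty, but the colons before a later non-empty section are still needed to keep the sections in position.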
Assembler Template
The assembler template is a piece of assembler code that is pre-processed to fill in register assignments. Registers may be referenced as %0, %1, %2 and so forth, indicating the registers mentioned in the output operands and input operands. For example, if there is one output operand and two input operands, you can refer to the register containing the output operand as %0 and the input operands as %1 and %2.
Because % is used as a prefix for register numbers, a double percent-sign must be used to represent a single percent sign in the code. For example, in x86_64 gas assembler, the rax register is written as %rax, but in a template, it must be written as %%rax.
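The difference between a numbered operand and a literal register name can be sketched as follows. This is only an illustration: the function name copy_via_eax is hypothetical, and routing the value through eax serves no purpose other than showing the %% escape.

```c
/* Hypothetical helper: copies x to the result by way of the eax register. */
static int copy_via_eax(int x)
{
    int y;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %1, %%eax\n\t"   /* %1 is an operand; %%eax is a literal register */
             "movl %%eax, %0"       /* the assembler sees plain %eax here */
             : "=r"(y)
             : "r"(x)
             : "eax");              /* tell the compiler that eax is overwritten */
#else
    y = x;                          /* plain C fallback for non-x86 targets */
#endif
    return y;
}
```

Listing eax as a clobber keeps the compiler from assigning either operand to that register.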
The template is written as one or more strings enclosed in quotes, with no separator other than whitespace between the strings. Individual statements in the asm code must be separated by semi-colons (;) or explicit newline characters (\n). The sequence \t can be used to indicate a tab character.
These are all valid:
    asm("mov %1,%0;inc %0");
    __asm__ ("mov %1,%0\ninc %0");
    __asm__ ("mov %1,%0\n" "inc %0");
    __asm__ ("mov %1,%0\n\t" "inc %0");
These are not valid:
    asm("mov %1,%0 inc %0");            // the assembler will not see a delimiter between the statements
    __asm__ ("mov %1,%0\n","inc %0");   // do not place a comma between the strings
Output and Input Operands
Output and input operands, if any, are each specified as an optional name in square brackets, a quoted string containing a constraint, and a C expression in parentheses. Multiple operands within a section are separated by commas.
Constraints are specified as a string of characters. Some commonly used constraints are:
- r - any general-purpose register is permitted.
- 0-9 - the same register used in the matching number operand should be used (for example, "1" indicates that the same register should be used as operand 1).
- i - an immediate integer value is permitted.
- F - an immediate floating-point value is permitted.
There are additional generic and platform-specific constraints (for example, for SIMD and floating-point registers). Refer to the gcc documentation for details (see resources).
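To make three of the constraints listed above concrete (r, a matching digit, and i), here is a minimal x86 sketch. The function name add_five is hypothetical, and the non-x86 branch is an assumption added so the snippet builds elsewhere.

```c
/* Hypothetical helper: returns x + 5 using an immediate operand. */
static int add_five(int x)
{
    int y;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("addl %2, %0"   /* %2 is emitted as the immediate $5 in AT&T syntax */
             : "=r"(y)       /* operand 0: result, any general-purpose register */
             : "0"(x),       /* operand 1: x, placed in the same register as operand 0 */
               "i"(5)        /* operand 2: a compile-time integer constant */
             : "cc");        /* addl changes the condition flags */
#else
    y = x + 5;               /* plain C fallback for non-x86 targets */
#endif
    return y;
}
```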
These constraints are combined with a modifier, required for output operands:
- = - output-only register - previous contents are discarded and replaced with output value (this does not preclude use as an input register)
- + - input and output register - this register will be used to both pass input data to the asm code, and to receive a value from the asm code
- & - earlyclobber register - the asm code may write to this register before it has finished reading its inputs, therefore the compiler must not assign the same register to any input operand
- % - in addition to one of the symbols above, declares that this operand and the following operand are commutable (interchangeable) for optimization. Only one commutable pair may be specified.
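The earlyclobber modifier is the easiest of these to get wrong, so here is a minimal x86 sketch of a case that needs it. The function name sum_two is hypothetical. The output is written by the first instruction while input b is still needed by the second, so the output register must not be shared with b.

```c
/* Hypothetical helper: returns a + b using a two-instruction template. */
static int sum_two(int a, int b)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %1, %0\n\t"   /* out = a: the output is written here ... */
             "addl %2, %0"       /* ... while input b is only read here */
             : "=&r"(out)        /* & keeps out in a register distinct from a and b */
             : "r"(a), "r"(b)
             : "cc");            /* addl changes the condition flags */
#else
    out = a + b;                 /* plain C fallback for non-x86 targets */
#endif
    return out;
}
```

Without the &, the compiler would be free to place out and b in the same register, and the first instruction would then destroy b before it is added.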
The constraint is followed by the C expression in parentheses that provides the value (input operand) or receives the value (output operand).
Here are some (trivial) examples in x86_64 assembler:
    int x=10, y;
    __asm__ ("mov %1,%0"
        : "=r"(y)    // output register value is moved to y
                     // register is called %0 in template
        : "r"(x)     // input value from x is placed in a register
                     // register is called %1 in template
        :
    );
In the example above, one or two registers are used for input and output: the compiler can choose whether to use the same register for both, or to use separate ones. We can specify that only one register is to be used:
    int x=10, y;
    __asm__ ("mov %1,%0"
        : "=r"(y)    // = indicates an output-only register
        : "0"(x)     // input uses the same register as %0
        :
    );
The registers may be referenced by name instead of number if a name is provided in the operand sections:
    int x=10, y;
    __asm__ ("mov %[in],%[out]"
        : [out]"=r"(y)    // register may be called %[out]
        : [in]"r"(x)      // register may be called %[in]
        :
    );
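Named operands combine naturally with the + modifier when a variable is both read and written. A minimal sketch, in which the function name subtract and the operand names acc and rhs are hypothetical:

```c
/* Hypothetical helper: returns a - b using named operands. */
static int subtract(int a, int b)
{
    int result = a;                    /* must be initialized: "+r" reads it */
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("subl %[rhs], %[acc]"     /* acc -= rhs (AT&T operand order) */
             : [acc] "+r"(result)
             : [rhs] "r"(b)
             : "cc");                  /* subl changes the condition flags */
#else
    result -= b;                       /* plain C fallback for non-x86 targets */
#endif
    return result;
}
```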
Constraining an Operand to a Specific Register
It is sometimes useful to constrain an operand to a particular register to avoid having to perform moves within the asm code (for example, if an operand will be used as the input to a function call or syscall).
Register Constraints using Explicit Register Variables
To select a specific register for an operand, use a (perhaps temporary) variable in the operand's C expression that is locked to a particular register in that variable's C declaration, using explicit register variables.
For example, in x86_64 asm:
    int x=10;
    register int y asm("r15");
    asm("mov %1,%0; inc %%r15d;"
        : "=r"(y)    // register r15
        : "r"(x)
        :
    );
In this example, the variable y is constrained in C to the r15 register (as an int, it occupies the 32-bit view of that register, r15d). The "inc %%r15d" in the assembler template therefore increments the output register after the mov instruction.
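The same technique can be wrapped in a complete function. The sketch below is illustrative only: the function name add_in_r10 is hypothetical, r10 was picked arbitrarily as a scratch register, and the non-x86_64 branch is an assumption added so the snippet builds elsewhere.

```c
/* Hypothetical helper: returns a + b with the accumulator pinned to r10. */
static long long add_in_r10(long long a, long long b)
{
#if defined(__x86_64__)
    register long long acc __asm__("r10") = a;   /* acc lives in r10 when used as an asm operand */
    __asm__ ("addq %1, %%r10"                    /* naming r10 directly works because %0 is r10 */
             : "+r"(acc)
             : "r"(b)
             : "cc");                            /* addq changes the condition flags */
    return acc;
#else
    return a + b;                                /* plain C fallback for other targets */
#endif
}
```

The register variable is initialized immediately before the asm statement, with no function call in between that could overwrite r10.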
i386 Register Names
On x86 (i386 and x86_64) only, specific registers may be selected by using a, b, c, or d in place of r as a register constraint. Using a, for example, will select the rax/eax register.
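These constraints are handy for instructions that use fixed registers. As a minimal sketch, the x86 mul instruction always takes one factor from eax and leaves the 64-bit product in edx:eax. The function name mul_u32 is hypothetical, and the fallback branch is an assumption for non-x86 builds.

```c
#include <stdint.h>

/* Hypothetical helper: full 32x32 -> 64-bit unsigned multiply. */
static uint64_t mul_u32(uint32_t x, uint32_t y)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ ("mull %3"       /* edx:eax = eax * operand 3 */
             : "=a"(lo),     /* low half of the product comes back in eax */
               "=d"(hi)      /* high half of the product comes back in edx */
             : "a"(x),       /* first factor must be in eax */
               "r"(y)        /* second factor in any general-purpose register */
             : "cc");        /* mull changes the condition flags */
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;  /* plain C fallback for non-x86 targets */
#endif
}
```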