Changes


Assembler Basics

367 bytes added, 14:10, 31 January 2018
[[Category:Assembly Language]]
When you program in [[Assembler|assembly language]], you're directly programming the "bare metal" hardware. This means that many of the compile-time and run-time checks, error messages, and diagnostics that are present in other languages are not available. The computer will follow your instructions exactly, even if they are completely wrong (like executing data), and when something goes wrong, your program won't terminate until it tries to do something that's not permitted, such as execute an invalid opcode or attempt to access a protected or unmapped region of memory. When that happens, the CPU will signal an exception, and in most cases the operating system will shut down the offending process.
== Format of an Assembly Language program ==
The traditional extension for assembly-language source files is <code>.s</code> (e.g., <code>example.s</code>), or <code>.S</code> for files that need to go through the C preprocessor (<code>cpp</code>).
An assembly-language program consists of:
# Data - Values used by the program for things such as initial variable values and string or numeric constants.
Assembler '''directives''' are used to control the assembly of the code, by specifying output file sections (such as .text for machine code, .data for read/write data, or .rodata for read-only data/constants in an ELF file) and data formats (such as word size for numeric values), and by defining macros.
Consider this x86_64 assembly language "Hello World" program:
<font color="green">mov</font> $<font color="blue">msg</font>,%<font color="blue">rsi</font> /* message location */
<font color="green">mov</font> $<font color="blue">stdout</font>,%<font color="blue">rdi</font> /* file descriptor stdout */
<font color="green">mov</font> $<font color="orange">1</font>,%<font color="blue">rax</font> /* [[Syscalls|syscall]] sys_write */
<font color="green">syscall</font>
<font color="green">syscall</font>
<font color="red">.section .rodata</font>
<font color="blue">msg:</font> <font color="red">.ascii</font> <font color="orange">"Hello, world!\n"</font>
<font color="red">.set</font> <font color="blue">len</font> , <font color="orange">. - msg</font>
In the program above:
* .text is a directive (equivalent to the longer directive ".section .text") which specifies that the following instructions/data should be placed in the ".text" section of the output ELF file.
* .section .rodata is a similar directive which specifies that the following instructions/data should be placed in the .rodata section of the output ELF file, which is for read-only data (data which is write-protected in memory). In the case of this program, the string could alternately be placed in the .data section, which is for read/write data, but .rodata was used because the string is not modified by the program (it's a constant).
* .global (or .globl) is a directive which makes the following symbol visible to the linker. Otherwise, symbols are normally lost by link time. In this case, the linker needs to know the value of the special symbol _start in order to know where execution is to begin in the program (which is not always at the start of the .text section).
* _start is a label which is equivalent to the memory location of the first instruction in the program.
* msg is a label which is equivalent to the memory location of the first byte of the string "Hello, world!\n".
* .set is a directive which sets a symbol (len) equal to the value of an expression (in this example, ". - msg" meaning the current memory location minus the value of the label "msg"). Note that the GNU assembler accepts <code>a=1</code> as equivalent to <code>.set a , 1</code> -- both are counted as directives regardless of the presence of the <code>.set</code> keyword.
Note that symbols are not variables - they are constants that are calculated at compile-time. However, they may contain an address at which a variable is stored.
On a Linux system, you will need to meet three requirements to get your assembly language program to work:
# Code must be placed in the <code>.text</code> section of the ELF file.
# Data must be placed in either the <code>.rodata</code> (read-only data) or <code>.data</code> (read/write data) sections of the ELF file.
# There must be a globally-defined symbol which the linker (<code>ld</code>) will use to find the entry point to your program. If the code is being directly compiled by the assembler, this symbol must be <code>_start</code> -- but if the code is being compiled by gcc, this symbol must be called <code>main</code> (a preamble will be located at <code>_start</code> which will then transfer control to <code>main</code>).
The file extension should be <code>.s</code> for assembler source without preprocessor directives (for compilation with the assembler) or <code>.S</code> for assembler source with preprocessor directives (for compilation with gcc).
# Run the assembler: <code>nasm -g -f elf64 -o ''test''.o ''test''.s</code>
# Run the linker: <code>ld -o ''test'' ''test''.o</code>
Notes:
