CDOT Wiki β

Assembler Basics

Revision as of 13:27, 10 September 2014 by Chris Tyler (talk | contribs) (Examples: Ireland is no longer available.)

When you program in assembly language, you're directly programming the "bare metal" hardware. This means that many of the compile-time and run-time checks, error messages, and diagnostics are not available. The computer will follow your instructions exactly, even if they are completely wrong (like executing data), and when something goes wrong, your program won't terminate until it tries to do something that's not permitted, such as execute an invalid opcode or attempt to access a protected or unmapped region of memory. When that happens, the CPU will signal an exception, and in most cases the operating system will shut down the offending process.

Format of an Assembly Language program

The traditional extension for assembly-language source files is .s (e.g., example.s).

An assembly-language program consists of:

  1. Symbols - Constants which correspond to memory addresses or other values.
  2. Instructions - Mnemonics for an operation followed by zero or more arguments.
  3. Data - Values used by the program for things such as initial variable values and string or numeric constants.

Assembler directives are used to control the assembly of the code, by specifying output file sections (such as .text or .data) and data formats (such as word size for numeric values), and defining macros.
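As a rough illustration of the three uses of directives named above (sections, data formats, and macros), here is a minimal sketch in gas syntax for x86_64 Linux. The macro name ZERO and the label table are hypothetical and chosen only for this example.

```asm
/* Sketch only: ZERO and table are hypothetical names. */
.macro  ZERO reg                /* define a macro taking one argument */
        xor    \reg,\reg        /* expands to: xor <reg>,<reg> */
.endm

.text                           /* what follows goes in the .text section */
.global _start
_start:
        ZERO   %rdi             /* macro use: clears %rdi (exit status 0) */
        mov    $60,%rax         /* syscall sys_exit */
        syscall

.data                           /* what follows goes in the .data section */
table:  .byte   1, 2, 3         /* three 8-bit values */
        .word   0x1234          /* one 16-bit value (on x86) */
        .long   100000          /* one 32-bit value */
        .quad   table           /* one 64-bit value: the address of table */
```

The data directives differ only in how many bytes each value occupies. The macro is expanded by the assembler, so no trace of it remains in the output file.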

Consider this x86_64 assembly language "Hello World" program:

.text
.global  _start

_start:
        mov    $len,%rdx                       /* message length */
        mov    $msg,%rsi                       /* message location */
        mov    $1,%rdi                         /* file descriptor stdout */
        mov    $1,%rax                         /* syscall sys_write */
        syscall

        mov    $0,%rdi                         /* exit status */
        mov    $60,%rax                        /* syscall sys_exit */
        syscall

.data

msg:    .ascii      "Hello, world!\n"
.set len, . - msg

This program is written using GNU Assembler (gas) syntax. Apart from the instructions and comments, it contains three types of text:

  • directives (.text, .global, .data, .ascii, .set)
  • symbols (_start, msg, len)
  • expressions (. - msg)

A symbol may be set in one of two ways:

  1. Using a directive (in the example above, the .set line that defines len), or
  2. As a label (such as _start or msg in the example above). A label is identified by the trailing colon, and is set to the current memory location in the instruction or data sequence. Labels may be used for loading/storing information, or as the target of branches/jumps.
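The two ways of setting a symbol, and the two uses of labels, can be sketched as follows. This is an illustrative example only, assuming x86_64 Linux and a static, non-position-independent link; the names BUFSIZE, buffer, and clear_loop are hypothetical.

```asm
/* Sketch only: BUFSIZE, buffer, and clear_loop are hypothetical names. */
.set    BUFSIZE, 64             /* symbol set by a directive */

.text
.global _start
_start:
        mov    $buffer,%rsi     /* label used as an address for a store */
        mov    $BUFSIZE,%rcx    /* directive-defined symbol used as a number */
clear_loop:                     /* label: address of the next instruction */
        movb   $0,(%rsi)        /* store a zero byte at the address in %rsi */
        inc    %rsi
        dec    %rcx
        jnz    clear_loop       /* label used as a branch target */

        mov    $0,%rdi          /* exit status */
        mov    $60,%rax         /* syscall sys_exit */
        syscall

.bss
buffer: .skip   BUFSIZE         /* label marking BUFSIZE reserved bytes */
```

Assuming the file is saved as example.s, it could be built with "as -o example.o example.s" followed by "ld -o example example.o".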


In the program above:

  • .text is a directive (equivalent to the longer directive ".section .text") which specifies that the following instructions/data should be placed in the ".text" section of the output ELF file.
  • .global (or .globl) is a directive which makes the following symbol visible to the linker. Otherwise, symbols are normally lost by link time. In this case, the linker needs to know the value of the special symbol _start in order to know where execution is to begin in the program (which is not always at the start of the .text section).
  • .set is a directive which sets a symbol (len) equal to the value of an expression (in this example, ". - msg" meaning the current memory location minus the value of the label "msg"). Note that the GNU assembler accepts a=1 as equivalent to .set a,1; both are counted as directives regardless of the presence of the .set keyword.
  • _start is a label which is equivalent to the memory location of the first instruction in the program.
  • msg is a label which is equivalent to the memory location of the first byte of the string "Hello, world!\n".
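To make the ". - msg" expression concrete, the data fragment below shows the length calculation written with both .set and the shorter = form. It is a sketch only: msg2 and len2 are hypothetical names added for comparison, and the fragment is meant to be placed in a larger program rather than run by itself.

```asm
.data
msg:    .ascii  "Hello, world!\n"   /* 14 bytes; no zero byte is appended */
.set    len, . - msg                /* len = 14 */
msg2:   .asciz  "Hi\n"              /* .asciz appends a zero byte: 4 bytes */
len2 = . - msg2                     /* same effect as .set: len2 = 4 */
```

The expression must appear immediately after the data it measures, because "." always refers to the current location at the point where it is written.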

Note that symbols are not variables - they are constants whose values are fixed at assembly or link time.
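The difference between a symbol and a variable can be sketched as follows, assuming x86_64 Linux and a static, non-position-independent link. The names LIMIT and counter are hypothetical. The symbol counter is a fixed address; only the memory stored at that address behaves like a variable.

```asm
/* Sketch only: LIMIT and counter are hypothetical names. */
.set    LIMIT, 10               /* constant: no memory is reserved for it */

.text
.global _start
_start:
        mov    $LIMIT,%rax      /* assembled as "mov $10,%rax" */
        mov    %rax,counter     /* write to the 8 bytes of memory at counter */
        addq   $1,counter       /* the stored value changes at run time, */
                                /* but the symbol counter (the address) does not */
        mov    counter,%rdi     /* load the stored value as the exit status */
        mov    $60,%rax         /* syscall sys_exit */
        syscall

.data
counter: .quad  0               /* label: a fixed address holding 8 bytes */
```

When run, this program should exit with status 11, which can be checked in the shell with "echo $?".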

Note also that the syntax will vary from assembler to assembler and from architecture to architecture.


Instruction Set Architecture Information

Resources