Assembly Lanugage and Computer Architecture

The Four Stages of Compilation

Source Code to Assembly Code

Assembly language provies a convenient symbolic representation of machine code.

int64_t fib(int64_t n){
    if(n < 2){
        return n;
    return (fib(n-1) + fib(n-2));

We can do clang -03 fib.c -S to generate the assembly code, which gives the code like this

From here, we assemble the assembly code to produce the binary.

$clang fib.s -o -fib.o

Binary executable produced with debug symbols (compiled with -g) can be dumped using

$objcdump -S fib

This will produce the disassembly of the machine code, which is more readable.

Why Assembly?

Why botter looking at the assembly of your program?

  • The assembly reveals what the compiler did and did not do
  • Bugs can arise at a low level. For example, a bug in the code might only have an effect when compiling with -03. Further more, sometimes, the compiler is the source of the bug!
  • You can modify the assembly by hand, when all else fails.
  • Reverse engineering: You can decipher what a program does when you only have access to its binary.

X86-64 ISA Primer

The instruction set architecture (ISA) specifies the syntax and semenatics of assemebly, which includes Registers, Instructions, Data types and Memory addressing modes.

There are a bunch of x86-64 registers

  • The Flags register keeps tracking of whether there is a overflow. Or whether the last operation returns 0, etc.
  • The instruction pointer register
  • The SSE and AVX registers are recently added.These are vector register for SIMD instructions.

The x86-64 general-purpose registers are aliased: each has multiple names, which refer to overlapping bytes in the register.

Here are all the general purpose regiesters

Some of them are still useful for a particular purpose, such as rsp is used as stack pointer, and rbp is used to point to the base of the frame.

The x86-64 Instruction Format is

<opcode> <operand_list>
  • <opcode> is a short mnemonic identifying the type of instruction
  • <operand_list> is 0,1,2 or (rarely) 3 operands, seperated by commas.
  • Typically, all oeprands are sources, and one operand might also be the destiniation.
addl %edi, %ecx

What does <op> A, B mean? There are two syntax standards namely AT&T Syntax and Intel Syntax. In the AT&T syntax, the second operand is the destination, whereas in the Intel syntax, the first operand is the destination.

The common x86-64 Opcodes

Note that, the substraction operation subq %rax, %rbx computes

Opcodes might be augmented with a suffix that describes the data type of the operation or a condition code.

  • An opcode for data movement, arithmetic or logic uses a single-char suffix to inidicate the data type
  • If the suffix is missing, it can usually be inferred from the sizes of the operand registers.

Condiional jumps and conditional moves use a one or two char suffix to indicate the condition codd

cmpq $4096, %r14
jne .LBB1_1

In the example above, ne means the jump should only be taken if the argumetns of the previous comparision are not equal. The previous comparision result is stored in the RFLAG register.

Arithmetic and logic operations update status flags in the RFLAGS register. Common RFLAGS register meanings are listed below

Let take a look at an example

decq %rbx // Decremetn %rbx, and set ZF if the result is 0.
jne .LBB7_1 // Jump to the label .LBB7_1 if ZF is not set

Condition Codes

x86-64 Direct Addressing Modes

The operands of an instruction specify values using a variety of addressing modes. At most one operand may specify a memory address.

Direct addressing modes

  • Immediate: Use the specified value.
    • movq $172, %rdi
  • Register: Use the value in the speicified register
    • movq %rcx, %rdi
  • Direct memory: Ues the value at the specified memory address
    • movq 0x172, %rdi

Indirect Addressing modes

The x86-64 ISA also supports indirect addressing: specifying a memory address by some computation

  • Register Indirect: The address is stored in the specified register
    • movq (%rax), %rdi
  • Register indexed: The address is a constant offset of the value in the specific register
    • movq 172(%rax), %rdi
  • Instruction-pointer relative: The address is indexed relative to %rip
    • movq 172(%rip), %rdi
  • Base Indexed Scale Displacement: The most general form of indirect addressing supported by x86-64
    • movq 172(%rdi, %rdx, 8), %rax

Jump Instructions

The x86-64 jump instructions, jmp and j<condition>, take a label as their operand, which identifies a location in the code.

  • Lables can be symbols, exact addresses, or relative addresses.
  • An indirect jump takes as its operand an indirect address.
    • jmp *%eax
// example from fib.s
jge LBB0_1
    leaq -1(%rbx), %rdi

// example from objdump fib
jge 5 <_fib + 0x15>
  leaq -1(%rbx), %rdi

Assembly Idioms

The XOR opcode xor A, B computes the bitwise XOR of A and B. For example

xor %rax, %rax
# zeors the register

The test opcode test A, B, computes the bitwise AND of A and B and discard the result, preserving the RFLAGS register. For example, the code snippets below checks to see whether the register is 0.

test %rcx, %rcx
je 400c0a <mm+0xad>

test %rax, %rax
cmovne %rax, %r8

The x86-64 ISA includes several no-op (no operation) instructions, including nop, nop A (no-op with an argument), and data16.

data16 data16 data16 nopw %cs:0x0(%rax, %rax,1)
# the instruction does nothing!

Why would the compiler generate assembly with these idioms? Mainly to optimize instruction memory (e.g., code size, alignment)

Floating-Point Instruction Sets

Modern x86-64 architectures support scalar (i.e., non-vector) floating-point arithmetic via a couple of different instruction sets

The original 8086 did not have a floating-point unit. Floating point was done in software. And then they made a companion chip that would do floating point and later was integrated to the CPU.

  • The SSE and AVX Instructions support single-precision and double precision scalar floating-point arithmetic, i.e., float and double
  • The x87 instructions support single-, double-, and extended-precision scalar floating-point arithmetic, i.e., float, double, and long double.

Vector Instructions

Vector instructions generally operate in an elementwise fashion:

  • The ith element of one vector register can only take part in operations with the ith element of the other vector registers
  • All lanes perfom exactly the same operation on their respective elements of the vector
  • Depending on the architecture, vector memory operands might need to be aligned, meaning their address must be multiple of the vector width.
  • Some architectures support cross-lane operations, such as inserting or extracting subsets of vector elements, permuting the vector, scatter, or gather.

Modern x86-64 architectures support multiple vector-instruction sets

  • Modern SSE instruction sets support vector operations aon integer, single-precision, and double-precision floating-point values.
  • The AVX instructions support vector operations on single-precision, and double-precision floating point values
  • The AVX2 instructions add integer-vector operations to the AVX instruction sets
  • The AVX-512 instructions increase the register length to 512 bits and priovide new vector operations, including popcound.