Assembly Language¶
Learning Objectives: After reading this material, the student will understand the y86 assembly language syntax and ways to implement high-level programming language functionality in assembly.
In this course, we first learn y86 assembly and machine language, which are then used to illustrate instruction execution in the processor and introduce the processor's internal components related to executing instructions. Additionally, y86 assembly language is used for the course's exercises and project work with an assembler and simulator.
The y86-64 assembly syntax and instruction set are, in essence, very similar to x86 assembly. In fact, with minor adjustments, programs can run on both processors! However, it’s important to note that the y86 instruction set is significantly simplified, as it’s designed for educational purposes. Only a small fraction of the more than a thousand x86-family assembly instructions are implemented in the y86-64 processor.
Overview¶
When programming in y86 assembly, we naturally rely on the y86 processor architecture described in previous material. We have at our disposal:
- A 64-bit architecture, in which:
  - Byte order is little endian, meaning the least significant byte is stored first (at the lowest address). For example, the bytes of the hexadecimal number 0x12345678 are stored in memory as 0x78, 0x56, 0x34, 0x12.
  - Bit order is the familiar one, with the MSB on the left.
- Registers:
  - General-purpose registers %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %r8-%r14
  - Stack registers: base pointer %rbp and stack pointer %rsp
- Status flags: zero flag ZF, sign flag SF, and overflow flag OF
- Memory addressing modes: immediate, register, and indirect.
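As a small illustration of little-endian byte order, the following C sketch stores a 64-bit value the way y86-64 lays out an 8-byte word in memory. The function names store_le64 and load_le64 are hypothetical (they are not part of any y86 tool), and the code uses shifts, so the result does not depend on the byte order of the computer running it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value into 8 bytes in little-endian order:
   mem[0] (the lowest address) gets the least significant byte. */
static void store_le64(uint8_t mem[8], uint64_t value)
{
    assert(mem != NULL);
    for (int i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(value >> (8 * i));   /* byte i = bits 8i..8i+7 */
    }
}

/* Rebuild the 64-bit value from its little-endian bytes. */
static uint64_t load_le64(const uint8_t mem[8])
{
    assert(mem != NULL);
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)mem[i] << (8 * i);
    }
    return value;
}
```

After store_le64(buf, 0x12345678), buf[0] through buf[3] hold 0x78, 0x56, 0x34 and 0x12, and the remaining four bytes are zero, because the value is stored as a full 64-bit word.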
Note that y86 assembly instructions often end with the letter q, which means the instruction operates on a 64-bit operand (quad word). In x86 assembly, a 32-bit operand is generally indicated by the letter l (long word) and a 16-bit operand by the letter w (word), but the y86 simulator uses only 64-bit operands.
With this in mind, let’s start looking at the y86 assembly syntax.
Transfer Operations¶
In the Computer Systems part of the course, we started learning C programming by using variables in our programs, so let’s now do the same with y86 assembly. In general, assembly languages do not use logical names for variables (as in the C declaration uint8_t fish_count = 3;). Instead, we use the processor’s registers, and memory locations accessed directly through their addresses, as variables.
Recall from the architecture diagram that the ALU operates on operands stored in the processor’s internal memory (registers). So:
- With assignment operations, we can set and read the values of registers, which are operands for instructions.
- Much like a C pointer refers to the contents of a memory address, a memory address can be given to an assembly move instruction: the operand is first fetched from that address and then stored in the specified register.
There are four transfer instructions in y86 assembly. They are all variants of the general instruction movq, with two letters in front that indicate the types of the operands, that is, where the value comes from and where it goes. Each instruction takes two operands, the source and the destination:
- irmovq: moves an immediate value (i = immediate) to a register (r = register)
- rrmovq: moves a value from one register to another register
- mrmovq: moves a value from memory (m = memory) to a register
- rmmovq: moves a value from a register to memory
Note that in y86 assembly, transfers cannot occur directly from one memory address to another, and immediate values cannot be moved directly to memory. Transfers always go through a register. The reason for this will be explained in later material.
Code example:
#.pos 0
irmovq $4,%rax # Immediate: assign %rax = 4
rrmovq %rax,%rcx # Register: assign %rcx = %rax
mrmovq (%rdi), %r10 # Indirect: fetch value from memory
# address pointed to by %rdi and assign it to %r10
rmmovq %rsi,8(%rdx) # Indirect: assign %rsi's value
# to memory address %rdx + 8
# halt
All previously discussed memory addressing modes are used in transfer operations, and we’ll look at them more closely soon.
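To make the four transfer instructions and their addressing modes concrete, here is a minimal C model of them. It is a sketch under simplifying assumptions (a 4 KiB byte array as memory, register numbers following the y86-64 encoding, and hypothetical function names that mirror the mnemonics), not a description of how the simulator is implemented. As in y86, there is deliberately no function that copies directly from memory to memory.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as encoded in y86-64 machine code. */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
       R8, R9, R10, R11, R12, R13, R14, NREGS };

#define MEM_SIZE 0x1000   /* assumed memory size for this sketch */

typedef struct {
    uint64_t reg[NREGS];
    uint8_t  mem[MEM_SIZE];   /* byte-addressable main memory */
} Machine;

/* Read an 8-byte little-endian word at address addr. */
static uint64_t read_word(const Machine *m, uint64_t addr)
{
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)m->mem[addr + i] << (8 * i);
    return v;
}

/* Write an 8-byte little-endian word at address addr. */
static void write_word(Machine *m, uint64_t addr, uint64_t v)
{
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++)
        m->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* irmovq $imm,rB : immediate -> register */
static void irmovq(Machine *m, uint64_t imm, int rB) { m->reg[rB] = imm; }

/* rrmovq rA,rB : register -> register */
static void rrmovq(Machine *m, int rA, int rB) { m->reg[rB] = m->reg[rA]; }

/* mrmovq D(rB),rA : memory at address rB + D -> register */
static void mrmovq(Machine *m, int64_t d, int rB, int rA)
{
    m->reg[rA] = read_word(m, m->reg[rB] + (uint64_t)d);
}

/* rmmovq rA,D(rB) : register -> memory at address rB + D */
static void rmmovq(Machine *m, int rA, int64_t d, int rB)
{
    write_word(m, m->reg[rB] + (uint64_t)d, m->reg[rA]);
}
```

For example, irmovq(&m, 0x100, RDX) followed by rmmovq(&m, RSI, 8, RDX) stores the value of %rsi at address 0x108, just like rmmovq %rsi,8(%rdx) in the example above.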
Arithmetic Operations¶
Only four different arithmetic and logic operations are available, so more complex operations, such as multiplication or division, must be implemented as separate subroutines!
The instructions take two register operands, which can be any registers (the registers in the list below are arbitrary examples).
- addq %rax,%r9: addition, r9 = r9 + rax
- subq %r9,%r12: subtraction, r12 = r12 - r9
- andq %rax,%rbx: AND operation, rbx = rbx & rax
- xorq %rax,%rsi: XOR operation, rsi = rsi ^ rax
Additionally, the processor’s status flags are set based on the operation’s result.
Code example:
#.pos 0
irmovq $4,%rax # 1st operand, number 4
irmovq $3,%rbx # 2nd operand, number 3
addq %rbx,%rax # %rax = %rax + %rbx = 4 + 3 = 7
irmovq $7, %rbx # 2nd operand, number 7
subq %rbx,%rax # %rax = %rax - %rbx = 7 - 7 = 0
# Now ZF flag is set
# halt
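Note: Immediate values (numbers) cannot be used as operands of the arithmetic instructions, because these instructions accept only register operands. A constant must first be loaded into a register with irmovq, as in the example above.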
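The way the status flags follow from the result can be sketched in C. The model below is an illustration under stated assumptions (the names alu, Flags and AluOp are hypothetical): it derives ZF, SF and OF from the result of addq, subq, andq and xorq using the usual two's complement definitions, with OF always cleared by the logical operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, of; } Flags;
typedef enum { ADDQ, SUBQ, ANDQ, XORQ } AluOp;

/* Most significant bit = sign bit of a 64-bit two's complement number. */
static bool sign_bit(uint64_t x) { return (x >> 63) & 1; }

/* OPq rA,rB computes rB = rB OP rA and sets ZF, SF and OF.
   Unsigned arithmetic is used so that wrap-around is well defined in C. */
static uint64_t alu(AluOp op, uint64_t a, uint64_t b, Flags *f)
{
    uint64_t r = 0;
    bool of = false;                  /* andq and xorq always clear OF */
    switch (op) {
    case ADDQ:
        r = b + a;
        /* overflow: operands have the same sign, the result has the other */
        of = sign_bit(a) == sign_bit(b) && sign_bit(r) != sign_bit(a);
        break;
    case SUBQ:
        r = b - a;
        /* overflow: operands have different signs, result sign differs from b */
        of = sign_bit(a) != sign_bit(b) && sign_bit(r) != sign_bit(b);
        break;
    case ANDQ: r = b & a; break;
    case XORQ: r = b ^ a; break;
    }
    f->zf = (r == 0);                 /* zero flag */
    f->sf = sign_bit(r);              /* sign flag */
    f->of = of;                       /* overflow flag */
    return r;
}
```

Calling alu(ADDQ, 3, 4, &f) and then alu(SUBQ, 7, 7, &f) replays the example above: the final result is 0 and ZF is set.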
Conditional Move Operations¶
It turns out we can’t move on from transfer operations that easily… In addition to direct move operations, assembly language also has conditional move operations, whose behavior depends on the status flags. Conditional move operations are a convenient (and sometimes the only) way to implement conditional and control structures similar to high-level languages.
Conditional move instructions take register operands. The condition is checked from the status flags before the instruction is executed: if the condition holds, the move is performed; otherwise the destination register is left unchanged.
Conditional move instructions, where r1 and r2 are the operand registers:
- cmove r1,r2 (equal): move is executed when the zero flag is set (ZF=1)
- cmovne r1,r2 (not equal): move is executed when the zero flag is not set (ZF=0)
- cmovl r1,r2 (less): executed when exactly one of the sign flag and the overflow flag is set (SF^OF)
- cmovle r1,r2 (less or equal): executed when ((SF^OF) | ZF)
- cmovge r1,r2 (greater or equal): executed when (~(SF^OF))
- cmovg r1,r2 (greater): executed when (~(SF^OF) & ~ZF)
The rationale behind conditional move instructions is to keep the ALU’s digital logic implementation as simple as possible. As a result, the assembly programmer must think in terms of status bits rather than straightforward comparison operators… This flag-based approach is not specific to y86; it is used in many processor architectures, x86 included.
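The conditions in the list above can be written as a small C function. In this sketch (the names Cond, cond_holds and cmov are hypothetical), the value less = SF ^ OF is the core of the signed comparisons: after a subtraction the result is negative (SF=1) unless the subtraction overflowed, in which case the sign bit is wrong and OF=1 corrects it. The conditional jumps use the same set of conditions.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool zf, sf, of; } Flags;

typedef enum { C_ALWAYS, C_LE, C_L, C_E, C_NE, C_GE, C_G } Cond;

/* Does the condition of cmovXX / jXX hold for the given flags? */
static bool cond_holds(Cond c, Flags f)
{
    bool less = f.sf ^ f.of;           /* signed "less than" after subq */
    switch (c) {
    case C_ALWAYS: return true;        /* rrmovq / jmp */
    case C_LE:     return less || f.zf;
    case C_L:      return less;
    case C_E:      return f.zf;
    case C_NE:     return !f.zf;
    case C_GE:     return !less;
    case C_G:      return !less && !f.zf;
    }
    assert(!"unknown condition");
    return false;
}

/* cmovXX rA,rB: the new value of rB is rA if the condition holds,
   otherwise rB keeps its old value. */
static long cmov(Cond c, Flags f, long ra, long rb)
{
    return cond_holds(c, f) ? ra : rb;
}
```

In the example that follows, subq leaves ZF=1 (10 - 10 = 0), so cmov(C_E, flags, a, b) returns a, which corresponds to b = a.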
Code example: A C language conditional statement…
if (a == 10) {
b = a;
}
… implemented in y86 assembly:
#.pos 0
irmovq $10,%rax # a = 10
irmovq $10,%rcx # Comparison value 10
subq %rax,%rcx # "Comparison" operation
cmove %rax,%rbx # if ZF=1, assign %rbx = %rax, i.e., b = a
# halt
Conditional move instructions can optimize processor performance, which we will discuss further later…
Jump Instructions¶
y86 assembly has a broader set of jump instructions, with a total of seven different types. Similar to conditional move operations, jump instructions themselves do not perform any comparison; instead, they make a jump decision based on the status bits. The operand for a jump instruction is a memory address or a logical name for the memory address to jump to.
- jmp: unconditional jump
- je: equal, executed when (ZF=1)
- jne: not equal, executed when (ZF=0)
- jle: less or equal, executed when ((SF^OF) | ZF)
- jl: less, executed when (SF^OF)
- jge: greater or equal, executed when (~(SF^OF))
- jg: greater, executed when (~(SF^OF) & ~ZF)
Code example: An imaginary loop structure in C…
a = 10;
while(a > 0) {
a--;
}
… implemented in y86 assembly:
#.pos 0
irmovq $10,%rax # Loop variable
irmovq $1,%r8 # Constant 1
loop:
subq %r8,%rax # %rax = %rax - 1
jne loop # Jump back while %rax != 0 (ZF=0); jg loop would also work
# halt
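The control flow of the assembly loop can be written in "goto style" C, which makes the correspondence with the labels and the jne instruction explicit. In this sketch the function name countdown and the iteration counter are hypothetical additions for illustration. Like the assembly version, it tests the condition only after the first pass, so it matches the C while loop only when a > 0 initially, as is the case here with a = 10.

```c
#include <assert.h>

/* "Goto style" C mirroring the y86 loop: decrement, then jump back
   while the result is non-zero. */
static long countdown(long a, int *iterations)
{
    assert(a > 0);                /* precondition for this do-while form */
    *iterations = 0;
loop:
    a = a - 1;                    /* subq %r8,%rax */
    (*iterations)++;
    if (a != 0)                   /* jne loop: jump while ZF = 0 */
        goto loop;
    return a;
}
```

Starting from a = 10, the body runs ten times and the function returns 0.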
Code example: A C conditional statement…
a = 9;
if (a > 10) {
b = 1;
} else {
b = 0;
}
… implemented in y86 assembly:
#.pos 0
irmovq $9,%rax # a = 9
irmovq $10,%r8 # Comparison value 10
subq %r8,%rax # if (a > 10)
jg greater # condition true, jump to "greater"
irmovq $0,%rbx # otherwise b = 0
jmp skip # skip "greater"
greater:
irmovq $1,%rbx # b = 1
skip: # program execution continues here
# halt
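The if/else translation can be sketched the same way in "goto style" C, keeping the flag test that jg performs instead of writing a > 10 directly. This is an illustrative sketch (the function name if_else is hypothetical); it assumes a - 10 does not overflow, so OF is always 0 here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* "Goto style" C mirroring the y86 if/else.
   The flags are derived from t = a - 10, like subq %r8,%rax. */
static long if_else(long a)
{
    assert(a >= LONG_MIN + 10);       /* a - 10 must not overflow in C */
    long b;
    long t = a - 10;                  /* subq %r8,%rax */
    bool zf = (t == 0);
    bool sf = (t < 0);
    bool of = false;                  /* no overflow, guaranteed by the assert */
    if (!(sf ^ of) && !zf)            /* jg greater: ~(SF^OF) & ~ZF */
        goto greater;
    b = 0;                            /* irmovq $0,%rbx */
    goto skip;                        /* jmp skip */
greater:
    b = 1;                            /* irmovq $1,%rbx */
skip:
    return b;                         /* program execution continues here */
}
```

With a = 9, t = -1 sets SF, so jg is not taken and b = 0, exactly as in the assembly version.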
Other Instructions¶
Additionally, y86 assembly includes other instructions related to program execution.
The nop instruction does nothing except advance the PC register to the next instruction. However, in real processors, this instruction has several uses, including timing and aligning code in memory. We’ll discuss this further later on, as an instruction that does nothing plays an important role in optimizing processor performance.
The halt instruction stops program execution. The processor’s status code changes to STAT=HLT.
Note! In the example programs above, the halt instruction was commented out. If you run the examples in a simulator, remember to remove the comment character (#) in front of halt.
Using the Stack¶
In general, stack memory, the runtime memory of programs, is supported by (almost) all modern processors. The stack stores information that the program will need again soon but that we don’t want to keep in registers. Examples include subroutine return addresses and arguments, which we’ll cover shortly.
The stack is a LIFO (Last In, First Out) type of memory, meaning it’s literally a stack where we "pile up" information "on top of each other." The stack grows "upwards," and, conversely, is cleared from the "top down," with the topmost information being removed first.
To make stack usage "simple," the stack actually grows downwards in memory: the stack’s first memory location is at its highest address, and each new item is stored in the previous (next lower) memory location. (There are mostly historical reasons for this convention: the stack was often placed at the end of the memory area, so there was room only downwards, or indexing was simpler for earlier processors.)
In machine/assembly languages, stack usage has its own instructions; in y86:
- pushq, which pushes a word from a register onto the stack.
- popq, which removes the topmost word from the stack and stores it in a register.
Additionally, stack management requires support from specific registers:
- Stack base address register %rbp
- Stack pointer register %rsp, which always holds the address of the topmost stack location.
Therefore, the stack instructions automatically use the memory address in the %rsp register and update the register value: pushq first decrements %rsp by 8 and then stores the value at the new top, while popq reads the value at the top and then increments %rsp by 8.
In every y86 assembly program, we must first define the stack’s location. Strictly speaking, this is not absolutely necessary, and small programs can be written without it. However, in this course we always set up the stack, because subroutines don’t work without a stack (see below).
In y86 assembly, the stack is initialized by setting the stack registers to point to the desired starting address of the stack (here, 0x400). After that, we can use the stack in our program:
.pos 0
irmovq stack,%rbp # stack base address
irmovq stack,%rsp # topmost memory address of the stack
main:
irmovq $1,%rax # value to store in the stack
pushq %rax # register value to stack
irmovq $2,%rax
pushq %rax
irmovq $3,%rax
pushq %rax
popq %rax # topmost stack value to register
popq %rax
popq %rax # three values were stored, so remove them all
halt
.pos 0x400
stack:
Note! You can run the example code in the y86 simulator to see how the stack registers and memory behave.
Note2! A common mistake in this course is forgetting to initialize the stack and then wondering why subroutine calls don’t work.
Note3! Since the stack is in the same memory as code and data, be careful not to overwrite the code or other data with values pushed onto the stack!
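The behavior of pushq and popq in the example can be modeled in a few lines of C. This is a sketch under stated assumptions (a global byte array as memory and hypothetical helper names), showing how %rsp moves from 0x400 down to 0x3e8 after three pushes and back up after three pops.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x1000            /* assumed memory size for this sketch */

static uint8_t  mem[MEM_SIZE];     /* byte-addressable memory */
static uint64_t rsp;               /* stack pointer %rsp */

static void write_word(uint64_t addr, uint64_t v)
{
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));   /* little endian */
}

static uint64_t read_word(uint64_t addr)
{
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

/* pushq: decrement %rsp by 8, then store the value at the new top. */
static void pushq(uint64_t value)
{
    assert(rsp >= 8);              /* the stack would run past address 0 */
    rsp -= 8;
    write_word(rsp, value);
}

/* popq: read the value at the top, then increment %rsp by 8.
   The memory itself is not cleared. */
static uint64_t popq(void)
{
    uint64_t value = read_word(rsp);
    rsp += 8;
    return value;
}
```

Because popq does not erase memory, popped values remain in memory afterwards; only %rsp decides what counts as the top of the stack.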
Subroutines¶
In assembly languages, it is possible to create subroutines (or functions) in the program, but writing and using them involves several important considerations.
Subroutine Location in Memory¶
The location of a subroutine in memory is entirely up to the programmer. By using the .pos directive and a logical name for the memory address, the programmer can place the subroutine anywhere in the memory available to the program.
Example: Place a subroutine/function named addition starting at memory address 0x100.
.pos 0x100
addition:
addq %rax,%rbx
ret
Subroutine Execution¶
A subroutine is executed by jumping to its memory address, but this is done with a dedicated call instruction. The return from the subroutine to the calling program is likewise done with a dedicated ret instruction. The program must also have stack memory defined for these instructions to work properly.
Example: Place a subroutine/function named addition starting at memory address 0x100 and the stack starting at address 0x400.
.pos 0
irmovq stack,%rsp # stack pointer
irmovq stack,%rbp # stack base address
call addition # subroutine call
halt # program execution stops
.pos 0x100
addition: # subroutine starts here
addq %rsi,%rdi
ret # return from subroutine
.pos 0x400
stack: # stack base address
Ok… but why are dedicated instructions needed for calling subroutines and using the stack? Why can’t we just use jump instructions? The answer isn’t obvious from the examples above: before jumping to a subroutine, the processor must save information about its current state on the stack, at minimum the return address (the PC value of the instruction following the call), so that execution can continue from exactly the same point after the subroutine returns. In the same way, register values can be saved on the stack so that the same processor state can be restored after the return.
There is a very important reason for this: it allows reusing the same registers in subroutines. In the main program, when we save register values to the stack, the register becomes available for use in the subroutine, and when the subroutine ends, the "old" register values can be restored from the stack. This is especially handy in larger programs or when only a few registers are available. Interrupt handlers (and some firmware calls) actually work in the same way, but more on that later…
In y86 assembly:
- The call instruction automatically saves the subroutine’s return address on the stack. In the example, the address of the halt instruction following the call goes onto the stack.
- The ret instruction retrieves the return address from the stack and transfers program execution to that address.
- The simplified y86 processor saves only the return address on the stack. It is therefore the programmer’s responsibility to save register values on the stack if they must be preserved during subroutine execution.
Subroutine Arguments¶
In assembly languages, it’s also possible to pass arguments to subroutines, either through registers or stack memory. As you might guess, assembly programmers have to write a bit more code for this… but thankfully, there are generally agreed-upon conventions.
1. Passing arguments through registers. In x86-64 processor architectures (yes, x86), it’s customary to use specific registers for storing arguments. Having this convention makes code reuse and general-purpose functions easier, as register use doesn’t need to be rethought each time.
- Arguments are usually passed in the following registers and in this order: %rdi, %rsi, %rdx, %rcx, %r8, %r9. This allows a maximum of six parameters to be passed through registers.
- If needed, a register can also be chosen for the function's… *ahem* subroutine return value. This register can, for efficiency, be one of the registers chosen for argument passing.
Example: Using registers %rdi and %rsi to pass two arguments, with the return value in register %rdi.
.pos 0
irmovq stack,%rsp # stack pointer (needed by call and ret)
irmovq stack,%rbp # stack base address
irmovq $100,%rdi # 1st argument
irmovq $200,%rsi # 2nd argument
call addition # subroutine call
halt # program stops
.pos 0x100
addition: # subroutine functionality
addq %rsi,%rdi # result in %rdi is the return value
ret # return from subroutine
.pos 0x400
stack: # stack base address
2. Arguments can also be stored on the stack before calling the subroutine and read from the stack during execution. This allows passing more than six parameters. This is a convenient method, but note that the return address also goes onto the stack. In this case, indirect addressing must be used to read values from the stack.
.pos 0
irmovq stack,%rsp # stack pointer
irmovq stack,%rbp # stack base address
irmovq $1,%rax # 3 arguments (1, 2, and 3) to stack
pushq %rax
irmovq $2,%rax
pushq %rax
irmovq $3,%rax
pushq %rax
call addition # subroutine call
popq %rax # clear stack to avoid leftover data
popq %rax
popq %rax
halt # program stops
.pos 0x100
addition: # subroutine begins
xorq %rbx,%rbx # sum = 0
mrmovq 8(%rsp),%rax # 3rd argument
addq %rax,%rbx
mrmovq 16(%rsp),%rax # 2nd argument
addq %rax,%rbx
mrmovq 24(%rsp),%rax # 1st argument
addq %rax,%rbx
ret # return from subroutine
.pos 0x400
stack: # stack base address
Let's examine the stack memory in this program using a screenshot from the y86-64 emulator.
On the left, we see memory addresses. Notice that the stack base address %rbp is indeed set at memory location 0x400. The program then pushes the arguments 1, 2, and 3 onto the stack one at a time using pushq instructions. Each time a value is pushed onto the stack, the stack pointer %rsp changes by the size of the stored value (here, 64 bits = 8 bytes); the stack grows "upwards," which in memory means towards lower addresses. So initially, %rsp holds the value 0x400, and after storing the first argument (1), the stack pointer moves up one memory location and points to memory address 0x3f8. Similarly, as 2 and 3 are pushed onto the stack, the stack pointer moves up by one memory location each time and ends up pointing to memory location 0x3e8. After that, the program executes a call instruction, which pushes the return address (0x41) onto the stack, so that %rsp = 0x3e0. Thus, when entering the subroutine, four memory locations in the stack are in use.
In the subroutine, we use the stack as follows:
- Now %rsp = 0x3e0, and we want to read from one memory location above it, so we add 8 (bytes) to the %rsp value, giving memory address 0x3e0 + 8 = 0x3e8, from which we read the 3rd argument into %rax.
- Then, with %rsp = 0x3e0, we want to read from two memory locations above, so we add 16 (bytes) to %rsp (address 0x3f0) to fetch the 2nd argument into the register.
- And finally, with %rsp = 0x3e0, we want to read from three memory locations above, so we add 24 (bytes) to %rsp (address 0x3f8) to fetch the 1st argument into the register.
- By the end of the subroutine, the sum of the arguments has been accumulated in register %rbx.
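The stack layout described above can be checked with a small C model. In this sketch (hypothetical helper names; the return address 0x41 is taken from the text rather than computed), call is modeled as pushing the return address and ret as popping it, while the subroutine reads its arguments 8, 16 and 24 bytes above %rsp.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x1000            /* assumed memory size for this sketch */

static uint8_t  mem[MEM_SIZE];
static uint64_t rsp;               /* stack pointer %rsp */

static void write_word(uint64_t addr, uint64_t v)
{
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++) mem[addr + i] = (uint8_t)(v >> (8 * i));
}

static uint64_t read_word(uint64_t addr)
{
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

static void push(uint64_t v) { rsp -= 8; write_word(rsp, v); }
static uint64_t pop(void)    { uint64_t v = read_word(rsp); rsp += 8; return v; }

/* Body of "addition": arguments are read relative to %rsp,
   skipping the return address stored at 0(%rsp). */
static uint64_t addition(void)
{
    uint64_t rbx = 0;              /* xorq %rbx,%rbx */
    rbx += read_word(rsp + 8);     /* mrmovq 8(%rsp),%rax: 3rd argument */
    rbx += read_word(rsp + 16);    /* mrmovq 16(%rsp),%rax: 2nd argument */
    rbx += read_word(rsp + 24);    /* mrmovq 24(%rsp),%rax: 1st argument */
    return rbx;
}

/* Replays the main program: push three arguments, "call", "ret", clean up. */
static uint64_t run_example(void)
{
    rsp = 0x400;                   /* irmovq stack,%rsp */
    push(1);                       /* %rsp = 0x3f8 */
    push(2);                       /* %rsp = 0x3f0 */
    push(3);                       /* %rsp = 0x3e8 */
    push(0x41);                    /* call: return address, %rsp = 0x3e0 */
    uint64_t result = addition();
    uint64_t ret_addr = pop();     /* ret: continue at address 0x41 */
    assert(ret_addr == 0x41);
    pop(); pop(); pop();           /* caller removes the three arguments */
    return result;
}
```

After run_example() the stack pointer is back at 0x400, but the old values, including the return address at 0x3e0, are still visible in memory.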
At the end of the program, we need to clear the values from the stack with popq instructions. When using the stack, always make sure that you retrieve exactly as much data as was stored there.
Note! You can also allocate memory from the stack directly by manipulating the %rsp register. For example, if you need local variables in a subroutine, you can move the stack pointer towards lower addresses by the desired number of memory locations (8 bytes per 64-bit variable) and use indirect addressing to handle these as if they were reserved memory locations. Remember to restore %rsp before ret, so that it again points to the return address. In "real-world" assembly programming, the stack is typically divided into sections with specific purposes, especially for subroutine handling.
Compilation Control Instructions¶
The following instructions control compilation (assembly). They are directions to the assembler and are not executed by the processor:
name:
A symbolic/logical name for the code block that follows. This allows for marking memory addresses. For example, a symbolic name can be given to a subroutine. The name disappears from the code during compilation and is replaced by the actual memory address.
Names are not mandatory; they can be used to help organize the code. The programmer has free rein here. In the example below, you could just as easily use my_func instead of main. In the example, names are given to the memory addresses of main and two functions:
main:
...
function1:
...
function2:
...
.pos
Sets the starting address for the following code/block.
.pos 0 # Main program (i.e., main function) starts at address 0
main:
...
.pos 0x100 # function1 starts at memory address 0x100
function1:
...
The program's memory addresses don’t need to be sequential, so there can be "empty" space in between. This is particularly useful if you want to update a subroutine later without having to move the entire code or its memory addresses.
.quad
This instruction allows for storing predefined data in memory, such as arrays. The data is placed in memory when the program is loaded, before the code runs, so the program can then read it from memory like any other data.
Example: First, the .pos instruction sets the desired memory location, the .align instruction sets the desired alignment, and the .quad instruction stores data at that memory location.
.pos 0x80
.align 8 # Align memory in 8-byte increments
.quad 0x1234 # value fits in 2 bytes, stored as an 8-byte word
.quad 0x5678 # value fits in 2 bytes, stored as an 8-byte word
Memory will look as follows, with each number padded with zeros to the full word length. The bytes of each word are listed in memory order (little endian), so 0x1234 appears as 34 12 followed by six zero bytes:
...
0x80: 0x3412000000000000
0x88: 0x7856000000000000
...
.align aligns the memory address to the specified boundary, which in this case is 8 bytes: the data that follows starts at an address that is a multiple of 8. Because .quad always stores a full 8-byte word, a value smaller than a word is effectively "padded" with zeros to word size. This is helpful with arrays, for example, but .align typically isn’t needed in this course.
Using Memory in Programs¶
Finally, here are a few handy tips on how to use memory for working with variables in y86 assembly programs.
Custom Variables¶
As we learned earlier, a variable is simply a memory location in the computer’s memory. So in assembly languages, we can freely use program memory (indirectly) for defining and using custom variables.
Example: Define two custom variables my_var1 and my_var2 at memory locations 0x300 and 0x308, and store and read values from them.
.pos 0
irmovq $1,%rax # Value to store in memory location
irmovq my_var1, %rbx # Load address of memory location into register
rmmovq %rax, (%rbx) # Store register value in memory (indirect addressing)
irmovq $2, %rax # Value to store in memory location
irmovq my_var2, %rbx # Load address of memory location into register
rmmovq %rax, (%rbx) # Store register value in memory (indirect addressing)
irmovq my_var1, %rcx # Load address of memory location into register
mrmovq (%rcx), %rdi # Read value from memory location into register
halt # Stop before running into the data area
.pos 0x300 # Memory location
my_var1: # Declaration of my_var1
.pos 0x308 # Memory location
my_var2: # Declaration of my_var2
Array Variables¶
Similarly, we can define and use arrays in assembly programs as in higher-level languages, but using them requires mastering various memory addressing modes to index array elements.
Here, indirect memory addressing is especially handy because we can use a chosen register (an "index register") to operate with memory addresses. That is, we write the memory address of the array element to a register and update the address as we move through the array.
array[0] # Address of the first element 0x300
array[1] # Address of the second element 0x308
array[2] # Address of the third element 0x310
...
# Why does the address increase by 8 bytes each time?
# Because each element is a 64-bit (8-byte) word
Example: The array starts at address 0x300, and we store values in its first three elements.
.pos 0
irmovq array, %rdi # Index register, array start
irmovq $8, %rcx # Constant 8, index increases/decreases by 8 bytes
irmovq $1, %rax # Value to store in the first element
rmmovq %rax, (%rdi) # Store in memory location of the first element
addq %rcx, %rdi # Next (second) element's address +8 bytes
irmovq $2, %rax # Value to store in the second element
rmmovq %rax, (%rdi) # Store in memory location of the second element
addq %rcx, %rdi # Next (third) element's address +8 bytes
irmovq $3, %rax # Value to store in the third element
rmmovq %rax, (%rdi) # Store in memory location of the third element
halt # Stop before running into the array
.pos 0x300
array:
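The address arithmetic behind this example is simply base + 8 * index. The following C sketch (the memory array and the helper names element_addr and fill_example are hypothetical) computes element addresses and replays the stores above, with a variable named rdi playing the role of the index register.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x400             /* assumed memory size for this sketch */
static uint8_t mem[MEM_SIZE];

static void write_word(uint64_t addr, uint64_t v)
{
    assert(addr <= MEM_SIZE - 8);
    for (int i = 0; i < 8; i++) mem[addr + i] = (uint8_t)(v >> (8 * i));
}

static uint64_t read_word(uint64_t addr)
{
    assert(addr <= MEM_SIZE - 8);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

/* Address of element i of an array of 8-byte words starting at base. */
static uint64_t element_addr(uint64_t base, uint64_t i)
{
    return base + 8 * i;
}

/* Mirrors the y86 example: rdi walks through the array in 8-byte steps.
   (The loop also advances rdi after the last store, which the
   straight-line y86 code does not; the extra addition is harmless.) */
static void fill_example(uint64_t array)
{
    uint64_t rdi = array;          /* irmovq array, %rdi */
    const uint64_t rcx = 8;        /* irmovq $8, %rcx */
    for (uint64_t rax = 1; rax <= 3; rax++) {
        write_word(rdi, rax);      /* rmmovq %rax, (%rdi) */
        rdi += rcx;                /* addq %rcx, %rdi */
    }
}
```

With the array at 0x300, element_addr gives 0x300, 0x308 and 0x310 for the first three elements, matching the addresses listed earlier.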
Similarly, to read a value from an array element, just change the value of the register (in this case %rdi) used as an index. For example, the address of the second element is array + 8, the third element array + 16, and the fourth element array + 24, etc.
Structure of an Assembly Program¶
Below is an example code from the course textbook (Bryant, p. 403) and the simulator, which shows the basic structure of a y86 assembly language program.
- Initialization: setting the memory address of the stack is enough here
- Main program main
  - Corresponds to the C language function main()
- Subroutines
  - Here, sum, equivalent to the C function long sum(long *array, long count)
- Memory allocation
  - Here, an array with 4 elements
# Program starts at memory address 0
.pos 0
# Initialization
irmovq stack, %rbp # Initialize stack
irmovq stack, %rsp # Stack pointer
# Main Program
main:
irmovq array,%rdi # 1st argument: array start
irmovq $4,%rsi # 2nd argument: count = 4
call sum # Call subroutine sum(array, 4)
halt # End of main program
# Subroutine sum(long *array, long count)
# array in %rdi
# count in %rsi
sum:
irmovq $8,%r8 # Constant 8
irmovq $1,%r9 # Constant 1
xorq %rax,%rax # sum = 0 -> %rax = 0
andq %rsi,%rsi # Set flags from count: ZF=1 if count == 0
jmp test # Jump to test block
loop:
mrmovq (%rdi),%r10 # Current element (*array)
addq %r10,%rax # Add element to sum
addq %r8,%rdi # Next element address (array++, +8 bytes)
subq %r9,%rsi # count--
test:
jne loop # Loop back if count != 0 (the first test uses the flags from andq)
ret # Return from subroutine
# Stack location in memory
.pos 0x200
stack:
# Array: 4 elements
.pos 0x300
.align 8
array:
.quad 0x000d000d000d
.quad 0x00c000c000c0
.quad 0x0b000b000b00
.quad 0xa000a000a000
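For comparison, the subroutine can be written in C. The version below is a sketch that follows the structure of the assembly (pointer increment and a count-down loop) rather than idiomatic array indexing, and it assumes, as the course book does, that long is 64 bits wide. With the four array elements above, the result is 0xabcdabcdabcd, because the 16-bit groups 0x000d, 0x00c0, 0x0b00 and 0xa000 add up without carries.

```c
#include <assert.h>

/* C counterpart of the y86 subroutine sum(long *array, long count). */
long sum(long *array, long count)
{
    assert(count >= 0);            /* for count < 0 the loop would run past the array */
    long total = 0;                /* xorq %rax,%rax */
    while (count != 0) {           /* andq %rsi,%rsi / test: jne loop */
        total += *array;           /* mrmovq (%rdi),%r10; addq %r10,%rax */
        array++;                   /* addq %r8,%rdi: pointer moves 8 bytes */
        count--;                   /* subq %r9,%rsi */
    }
    return total;                  /* return value in %rax */
}
```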
y86-64 Assembly reference¶
Please refer to the course book Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition. Chapter 3.
Here (Chapter 4) you can find the y86-64 processor simulator that is used in the course for exercises and the course project.
There is also a browser version of the y86-64 simulator here. You should take a look at the exercise material before starting assembly programming.
Summary¶
As you can see from the material, programming in assembly requires the programmer to work very close to the processor, which means that even "routine" tasks in C language need to be implemented carefully, instruction by instruction at a low level, with knowledge of techniques for handling memory.
A convenient web-based simulator is available for y86 assembly programming on this course. But first, read more about assembly programming in the exercise materials.