y86-Assembly exercises¶
The exercises here are returned to Lovelace as with C programming exercises. Please follow the instructions below.
Simulator in Internet¶
The network version offers all the necessary functionality for implementign the course exercises and the final project work.
Installing y86¶
You can of course install the programming environment into your own computer. The source code is available from
here (Menu
Student site and
Chapter 4: Processor Architecture).
Simulator manual can be found from
here.
The simulator works best with linux, e.g. with Ubuntu
- If you use Windows/MAC, its best to install Ubuntu in a virtual machine
- You need VirtualBox'
- You also need a 'Ubuntu iso image from the official Ubuntu Web pages, version >16.04 LTS.
- Installation instructions
- Instructions are compiled here
- Then follow the simulator manual
Compiling assembly programs¶
The compiler works with command yas
.
ubuntu:~$ ./yas program.ys
The compiler creates an object file program.yo
that is given to the simulator. You can actually take a look inside the object file to see how the program was compiled to machine language.
Sequential simulator is used with the command ssim
, where the command line parameter -g
opens the GUI. However, GUI components are not installed into the course virtual machine.
ubuntu:~$ ./ssim program.yo
If you program uses more than 10.000 instructions, which is the default limit for the simulator, you can adjust the max number of instructions with command line parameter -l n
, where n is the new maximum.
Exercises¶
You need to return your exercise answers in a particular form as described below. This is needed that your answers can be automatically checked.
- The code must be returned using
.ys
extension - The code first line must be main: code block:
- Do not give memory position directives
.pos
nor any other initializations before the main label!
- Each exercise has its own input and output registers, which you muset use.
- Of course follow any other exercise instructions
- Always end your program with the instruction
halt
. - y86 compiles requires each code line to end with next line character.
yas
compilers gives an error of this, but not the Lovelace exercise output.
An example. Sum up the registers %rax ja %rbx and return the sum in %rbx-register:
.pos 0
irmovq $2,%rax
irmovq $3,%rbx
main:
addq %rax,%rbx
halt
You need to eliminate the first three lines of code and only return the following:
main:
addq %rax,%rbx
halt
This enables Lovelace to correctly interpret and test your answer.
Please note that Lovelace exercises can only handle programs with less than 10.000 executed instructions.
Ok, so here we go..
Multiplication (1p)¶
Implement multiplication with y86 assembly language. Put multiplicant and multiplier into registers %rdi
ja %rsi
and give the answer in register %rax
.
HINT: To multiply x
* y
you add x
y
times.
Division (1p)¶
The y86 processor does not have division assembly instruction, so you need to write code to use it.
Use register %rdi
for the dividend and register %rsi
for the divisor. Use register %rax
for the result.
You can forget about decimals and rounding, just return the lower integer part. For example, 5/2=2
.
HINT: A division of a
/ b
is how many times you need to substract a
from b
to get 0.
Strlen (1p)¶
Write assembly code equivalent for the C strlen-function. Assume a string in the memory (starts from memory address 0x400
). Return its length in register %rax
.
As in C, string characters are in ASCII and string ends at 0
.
An example string ABCD
in memory:
0x0400: 0x4100000000000000
0x0408: 0x4200000000000000
0x0410: 0x4300000000000000
0x0418: 0x4400000000000000
0x0420: 0x0000000000000000
In this example, our strlen returns 4.Note that each character occupies 64 bits (1 memory position).
Hint. For testing, the .quad command puts values in memory.
Average (1p)¶
Write an y86 aseembly program that calculates the average of numbers stored in stack. Return average in the %rax
register. Leave out the stack pointer definition and initialization from your task response; the checker will do it itself.
Hint. You must use either stack registers %rsp
and %rbp
or the pushqand popq commands.
MaxMin (1p)¶
Write a program in y86 assembly that searches the stack for smallest and largest number. The program stores the smallest number in the register %rsi
and largest in register %rdi
.
Hint. Use stack registers. The stack will be initialized automatically
Assembling Ambient light (1p)¶
The SensorTag ambient light sensor gave the illumination value in 16-bit register, with two values E[3:0]
and R[11:0]
that were extracted using btiwise operations.
Implement y86 assembly program that separates the values from a value in register %r13
. Return the bits E[3:0]
in register %r14
and bits R[11:0]
in register %r13
.
ATTENTION:
- A bit shift to the right (towards the LSB) indeed works with division, but how could it be optimized so that a zillion subtraction operations are not necessary? You may need to increase the divisor to get below 10000 machine language instructions.
- A bit shift to the left (towards the MSB) works with multiplication, of course, but it's not efficient with large numbers. (Here it is advisable to google if you don't come up with a solution.)
Square root (3p)¶
The number which square root is calculated (variable num in the algorithm) is stored in %r12
and the result (variable res in algorithm) must be returned in %rcx
. The numbers are 16 bits and always positive
// num is the number which square root is calculated
// res is the result
int32_t res = 0;
int32_t bit = 1 << 16;
while (bit > num) {
bit >>= 2;
}
while (bit != 0) {
if (num >= res + bit) {
num -= res + bit;
res = (res >> 1) + bit;
} else {
res >>= 1;
}
bit >>= 2;
}
If the result is not exact, present the result without decimals and always rounding down. For instance: sqrt(2345) = 48.425200051.
-> 48
.
In this case, you can use up to 85000 instructions (in contrast to 10000 that are required normally). However,if for any of the test you run more than 10000 the exercise grade will be reduced.
Hox! You can implement the algorithm better if you understand it. Remember the value of a division, can be greater than 2.
Hox 2!: Although we do not recommend it in this exercise, the algorithm can be optimized using lookuptables.
Efficient multiplication (2p)¶
Previously, multiplication was implemented by "adding the multiplicand to the register as many times as the multiplier," which is an obviously inefficient way to perform multiplications. For example, the operation 8000*8000
takes more than 8000 instructions when performed with an addition loop.
Well... how can multiplication be done more efficiently? One answer is the Peasant binary algorithm.
You have encountered this calculation method when you were in elementary school and did multiplications with long multiplication. So let's start in the same way by multiplying our example by multiplying the binary numbers 101101 (45)
and 110010 (50)
side by side.
110010
x 101101
------------
110010
000000 (kertoja 0)
110010
110010
000000 (kertoja 0)
+110010
------------
100011001010
It is observed that one large multiplication turns into a summation of the multiplicand shifted by bits.
Let's go through this calculation step by step using another example, when multiplying the same binary numbers 101101 (45)
and 110010 (50)
together.
1. Binary numbers representation.
(101101) * (110010) = ?
101101 = 1x2^5 + 0x2^4 + 1x2^3 + 1x2^2 + 0x2^1 + 1x2^0 = 45
= 1x2^5 + 1x2^3 + 1x2^2 + 1x2^0
110010 = 1x2^5 + 1x2^4 + 0x2^3 + 0x2^2 + 1x2^1 + 0x2^0 = 50
= 1x2^5 + 1x2^4 + 1x2^1
Since the latter number has fewer terms, let's choose it as the multiplier this time.
2. Note that when the multiplier's bit is zero, it does not need to be taken into account because 0 * n = 0
. Now the multiplication can be expressed as:
(110010) x (101101) = (1x2^5 + 1x2^4 + 1x2^1) x (101101) = ?
3. Hence, in the multiplication, each term is calculated once at a time and summed.
(1x2^5) => 100000 x 101101 = 1440
+
(1x2^4) => 010000 x 101101 = 720
+
(1x2^1) => 000010 x 101101 = 90
Nyt 45 * 50 = 1440 + 720 + 90 = 2250
4. Individual terms can still be conveniently calculated using bit shifting. For example:
101101 x 100000 = (101101 << 5) = 10110100000
A more efficient multiplication can thus be implemented as follows:
1. Examine the bits of the multiplier individually to calculate partial products.
1. If the multiplier's bit is zero, the corresponding partial product is also zero.
1. If the multiplier's bit is one, calculate the corresponding multiplication between that individual bit and the multiplicand using bit shifting.
1. Sum the partial products for the final result.
Implement a program that calculates the multiplication with the method described above. The numbers to be multiplied are in registers %r11
and %r12
. The program returns the result in the %rax
-register."
Give feedback on this content
Comments about these exercises