y86 Advanced Assembly exercises¶
This section contains exercises that might be needed later in the final project.
Efficient modulo (1p)¶
In the same way you can calculate a division using substraction you can also calculate the module of a number (that is the remainder of the division). You can test your skills in this voluntary exercise
The previous modulo implementation is good enough when dealing with small numbers. However, if we are calculating
one trillion % 17, the subtraction has to be performed billions of times! This is very slow.In this task you need to implement a faster modulo that scales better for larger inputs.
One way we can improve the simple subtraction loop is to increase the divisor. That way the dividend will be reduced faster. However, we can't increase the divisor willy nilly, as that will ruin the result:
123 % 17 = 4 123 % 18 = 15
However, if the new divisor is a multiple of the old one, and we do one more modulo with the original divisor, we can get the correct result:
123 % 17 = 4 123 % (17 * 3) = 21 21 % 17 = 4 123 % (17 * 3) % 17 = 4 123456789 % 7 = 1 123456789 % (7 * 100) = 589 589 % 7 = 1 123456789 % (7 * 100) % 7 = 1 In general: a % b = a % (b * n) % b, n > 0
However it is difficult to find a good value to multiply the divisor ahead of time, and multiplication can be slow too. But if you remember from previous tasks, doubling i.e. multiplication by two is cheap, as you can just as add the value to itself. Doubling will also ensure the divisor is a multiple of the original divisor. Doing the doubling once or twice will not be enough for large dividends. But if we keep doubling the divisor as we perform the subtractions we can reduce the dividend pretty fast. When the divisor grows bigger than the dividend, we can just reset it to the original and start the doubling again.
Here is an example of how the algorithm would work:
321 % 11 = ? 321 > 11, keep going 321 - 11 = 310, double 11 -> 22 310 > 22, keep going 310 - 22 = 288, double 22 -> 44 288 > 44, keep going 288 - 44 = 244, double 44 -> 88 244 > 88, keep going 244 - 88 = 156, double 88 -> 176 156 < 176, too big! -> reset to original divisor (11) 156 > 11, keep going 156 - 11 = 145, double 11 -> 22 145 > 22, keep going 145 - 22 = 123, double 22 -> 44 123 > 44, keep going 123 - 44 = 79, double 44 -> 88 79 < 88, too big! -> reset to original divisor (11) 79 > 11, keep going 79 - 11 = 68, double 11 -> 22 68 > 22, keep going 68 - 22 = 46, double 22 -> 44 46 > 44, keep going 46 - 44 = 2, double 44 -> 88 2 < 88, too big! -> reset to original divisor (11) 2 < 11, even the original divisor is too big! We are done. 321 % 11 = 2
The additional doubling and checks make this algorithm a bit slower for smaller inputs, but greatly more optimal for larger inputs: the naive implementation would have to do about 59 billion subtractions to calculate
one trillion % 17, while this more efficient implementation would only require 365.Again, the dividend will be in register
%r8 and the divisor in %r9. Put the result in register %rax. Only nonnegative numbers are used in this task. The divisor will also never be zero.Watch out: The divisor will overflow if the dividend is greater or equal to 2 ^ 63. To protect against this, you can add an additional check, which resets the divisor to its original value, if it happens to overflow. However, in this task such large dividends will not be tested, so overflow protection is optional.
Hints
Messages
Efficient multiplication (2p)¶
Previously, multiplication was implemented by "adding the multiplicand to the register as many times as the multiplier," which is an obviously inefficient way to perform multiplications. For example, the operation
8000*8000 takes more than 8000 instructions when performed with an addition loop.Well... how can multiplication be done more efficiently? One answer is the Peasant binary algorithm.
You have encountered this calculation method when you were in elementary school and did multiplications with long multiplication. So let's start in the same way by multiplying our example by multiplying the binary numbers
101101 (45) and 110010 (50) side by side. 110010
x 101101
------------
110010
000000 (kertoja 0)
110010
110010
000000 (kertoja 0)
+110010
------------
100011001010
It is observed that one large multiplication turns into a summation of the multiplicand shifted by bits.
Let's go through this calculation step by step using another example, when multiplying the same binary numbers
101101 (45) and 110010 (50) together.1. Binary numbers representation.
(101101) * (110010) = ?
101101 = 1x2^5 + 0x2^4 + 1x2^3 + 1x2^2 + 0x2^1 + 1x2^0 = 45
= 1x2^5 + 1x2^3 + 1x2^2 + 1x2^0
110010 = 1x2^5 + 1x2^4 + 0x2^3 + 0x2^2 + 1x2^1 + 0x2^0 = 50
= 1x2^5 + 1x2^4 + 1x2^1
Since the latter number has fewer terms, let's choose it as the multiplier this time.
2. Note that when the multiplier's bit is zero, it does not need to be taken into account because
0 * n = 0. Now the multiplication can be expressed as:(110010) x (101101) = (1x2^5 + 1x2^4 + 1x2^1) x (101101) = ?
3. Hence, in the multiplication, each term is calculated once at a time and summed.
(1x2^5) => 100000 x 101101 = 1440
+
(1x2^4) => 010000 x 101101 = 720
+
(1x2^1) => 000010 x 101101 = 90
Nyt 45 * 50 = 1440 + 720 + 90 = 2250
4. Individual terms can still be conveniently calculated using bit shifting. For example:
101101 x 100000 = (101101 << 5) = 10110100000
A more efficient multiplication can thus be implemented as follows:
1. Examine the bits of the multiplier individually to calculate partial products.
1. If the multiplier's bit is zero, the corresponding partial product is also zero.
1. If the multiplier's bit is one, calculate the corresponding multiplication between that individual bit and the multiplicand using bit shifting.
1. Sum the partial products for the final result.
1. If the multiplier's bit is zero, the corresponding partial product is also zero.
1. If the multiplier's bit is one, calculate the corresponding multiplication between that individual bit and the multiplicand using bit shifting.
1. Sum the partial products for the final result.
Implement a program that calculates the multiplication with the method described above. The numbers to be multiplied are in registers
%r11 and %r12. The program returns the result in the %rax-register."
Hints
Messages
Efficient division by two (1p)¶
Earlier, you implemented a simple, but slow, division algorithm. In this task we will concentrate on optimizing a special case: division by two. If we use the previous subtraction algorithm to divide a big number by two, the code will need to iterate an amount that is half of the number to be divided.
If you look what happens to a number's binary form, when it is divided by two, you can see that all the bits are shifted right by one:
11100111 / 2 = 01110011. We can use this observation to implement a better division algorithm.We use a "cursor"-value that starts at two. In a loop we double this cursor, by adding it to itself. Doubling a value shifts all its binary bits to the left by one, so this cursor will look like:
10, 100, 1000, 10000, ... in binary. We then use this cursor value to check the status of each individual bit (except the first as it disappears anyway) in the input value we wish to divide by two. This is done with the bitwise and operation:Input: 00111011 (59) Cursor: 00000010 (2) 00111011 00000010 -> is set 00111011 00000100 -> not set 00111011 00001000 -> is set 00111011 00010000 -> is set 00111011 00100000 -> is set 00111011 01000000 -> not set 00111011 10000000 -> not set The actual inputs will have 64 bits.
Because we want to create a result where each bit of the input is shifted right by one, every time we find a bit that is set in the input, we add to the result (which starts at zero) half of the cursor. So, if the cursor is
100 (in binary) and it detects that the third bit is set, then we have to add 10 (in binary) to the result.To clarify, given the previous example and using a variable named value that will provide intermediate values:
Input: b00111011 (59) Cursor: b00000010 (2) value: b00000000 (0) Input: b00111011 Cursor: b00000010 -> is set Value: b00000000 + b00000001 (0 + 1) = b00000001 (1) -> Since cursor is set Input: b00111011 Cursor: b00000100 -> not set Value: b00000001 (1) -> does not change Input: b00111011 Cursor: b00001000 -> is set Value: b00000001 + b00000100 (1+4) = b00000101 (5)-> since cursor is set Input: b00111011 Cursor: b00010000 -> is set Value: b00000101 + b00001000 (5+8) = b000001101 (13)-> since cursor is set Input: b00111011 Cursor: b00100000 -> is set Value: b00001101 + b00010000 (13+16) = b00011101 (29)-> since cursor is set Input: b00111011 Cursor: b01000000 -> not set Value: b00011101 (29) -> does not change Input: b00111011 Cursor: b10000000 -> not set Value: b00011101 (29) -> does not change
This algorithm has a constant time worst case performance, because the amount of bits in the input is fixed.
The value to be halved is given in register
%r8. Write the result to register %rax. Dividend will never be zero.HINT: To avoid a chicken and the egg -problem, you probably want to keep track of the cursor's previous value, so you when the cursor is doubled, you have easy access to a value that is half the current cursor. Be careful not to overflow the cursor! The automatic tester will try larger inputs as well.
Hints
Messages
Modular exponentiation (3p)¶
Certain mathematical algorithms require the computation of a remainder of an exponentiation:
base ^ exponent % modulo. This is called modular exponentiation. At first, calculating it may look daunting, as even a simple expression of 123 ^ 456 yields a result with almost a thousand digits. That is way bigger than the 64-bit unsigned integer limit most computers can efficiently work with. However, the presence of the modulo operation allows us to greatly optimize this algorithm to be blazingly fast.123 ^ 456 might be hundreds of digits long, but if it is immediately followed by a modulo of 789, we know the answer is in range of 0-788. Also, modulo operation has a nice property: a * b % c = (a % c) * (b % c) % c, i.e. we can add modulos to the intermediary products and the final product will be the same. And because exponentiation is just repeated multiplication, we break it up to:a ^ b % c = (a * a * ... b times ... * a) % c = ((((a % c) * a % c) * a % c) * ...) * a % cBasically, as we calculate the exponent and multiply the exponent base with itself, we take a modulo after each of the multiplication steps, which ensures the result never grows too big. This is a good improvement, but it still has a problem: if we need to calculate lets say
101 ^ 1100303023 % 93, we need to perform more than a billion multiplication and modulo operations.The last improvements are a bit technical. If you want a proof for why they work, you should read the attached link above.
First we reduce the base of the exponentiation by taking the modulo of it:
base <- base % modulo. We also initialize the result to one. We then go through all the bits in the exponent, from least significant to most. (You can use the same cursor-method as in the efficient division by two -task, but this time the cursor starts at one.) For each bit that is set, we update the result: result <- result * base % modulo. After this, in the same iteration of the loop, we update the base (no matter if the bit in the exponent was set or not): base <- base * base % modulo.Pseudo-code for the algorithm:
base := base % modulo
result = 1
for each bit in exponent from LSB to MSB:
if bit is set:
result := result * base % modulo
base := base * base % modulo
The result will be ready after the loop.
Please, note that you need to use the efficient multiplication and the efficient module if you want to keep the results in a reasonable set of instructions. Be careful, when you call to subroutines to not override registers. Consider also where do you place the stack.
The base of the exponent is given in
%r8, while the exponent will be in %r9 and the modulo in %r10. Write the result to %rax. All the values will be non-negative, the modulo at least two, and the base at least one.HINT: You should reuse the efficient multiplication and efficient modulo you implemented earlier. Call them as subroutines.
Hints
Messages