Bits and Number Systems¶
Learning Objectives: After going through this material, you will understand how information is represented in a computer using binary numbers and you will be able to convert between different number systems used in computer technology.
In earlier programming courses, we learned which number systems are recognized by computers (and different programming languages): binary, decimal, and hexadecimal numbers. Of these, the decimal system is included only for us humans, i.e., programmers and users. The computer, being a humble calculator, only needs binary numbers, and low-level programmers need to know how the computer interprets binary numbers and their bits.
Bits in Computer Memory¶
A bit is the smallest part of an information element that can be processed. A bit has two mutually exclusive logical states, 0 and 1. The logical state here means that it is up to the processor, firmware/operating system, and programmer to decide what the bit states represent.
A single bit represents very little information, but when bits are placed in a sequence as a binary number, a wide variety of information can be represented. In previous lectures, we learned that the processor's architecture defines the size of the data element (word length) as an n-bit binary number.
For example, when dealing with a peripheral device in an embedded system, the logical state 0 could mean that the peripheral device is ready to receive commands, and 1 could mean that the device is busy, i.e., still processing the previous command.
Memory Location¶
It is generally straightforward and useful for the computer's memory to be organized into memory locations of the same size as the word length. This way, the word length (data bus) and memory size (address bus) determine the required width of the memory bus.
As an example, below is a snippet of memory from an 8-bit computer architecture. Now, the memory consists of only 16 memory locations, so the address bus width is 4 bits (2^4=16). (Such a small memory could be non-volatile EEPROM memory in an embedded device.)
Memory Address | Content | Interpretation in Decimal |
0000 | 00000011 | 3 |
0001 | 11111111 | -1 |
0010 | 10110111 | -73 |
0011 | 00110110 | 54 |
.. | .. | .. |
1111 | 00110111 | 55 |
In the memory above, at memory address 3 (0011), the value 54 (00110110) is stored. This situation could occur when we have declared an 8-bit variable in the program, which is initialized to the value 54. The compiler and operating system (without going into details) determined that this value should be stored at memory location number 3, from which it will be available to the program during its execution.
Example: You can easily think of the memory location address as a street address. Each house on the street has a number that uniquely identifies the house, and each house can store one piece of information. In the memory example above, the number 54 would be stored in house number 3, from which it can be retrieved or replaced with a new number. (We will return to this house analogy in later material...)
Naming Conventions¶
Now, different-sized binary numbers have generally been given specific names in computer science:
- An 8-bit binary number is called a byte .
- 16/32-bit numbers are referred to as a word. We know that the word length varies from one architecture to another, and the C language standard does not define the length of a word either.
- A 32/64-bit number is called a integer / long integer (also can be seen as long word).
It is important to be familiar with these terms as part of the professional skills of an information/electrical engineer, and they are commonly used in digital and computer technology.
Interpretation of a Binary Number¶
Unfortunately, it is not enough to simply associate the variable name in our program with a memory location and its value. Additionally, the computer must know how to interpret the binary number stored in memory.
Bit Order¶
When representing a binary number to the processor's arithmetic unit, there are still a couple of complications, as it must also be decided how the processor interprets the sequence of bits in a memory location.
In the decimal system, we have agreed that the most significant digit is the leftmost digit, with decreasing significance as we move to the right. For example, in the number 217, the greatest weight is on the digit 2, which represents hundreds. So, the number is
2 x 100 + 1 x 10 + 7 x 1 = 217
. However, nothing but mutual agreement prevents us from reading the number in the opposite direction, 7 x 100 + 1 x 10 + 2 x 1 = 712
.The same applies to binary numbers. Now, for example, the number 1011 can be read either left to right or the other way around. Depending on the direction, the value of the number can change, which can cause confusion if there is no agreement.
8 + 0 + 2 + 1
=11
8 + 4 + 0 + 1
=13
When defining the direction of reading, we talk about bit order, which requires two definitions:
- The Most Significant Bit (MSB) refers to the most significant digit in the number, i.e., the bit with the highest power of two.
- The Least Significant Bit (LSB) is the bit with the smallest power of two.
- Now, the reading direction is from the MSB towards the LSB, with the power of two decreasing by one each time.
Well, the bit order has already been decided for us by the processor architecture designers. Since the direction can still vary even today, it is mentioned in the processor's manuals. Typically, the programmer does not need to pay attention to this aspect because in programming languages, the bit order can be defined as desired, and compilers will adjust the code to suit the processor. However, in embedded systems, when dealing with peripherals, you may need to pay attention to bit order.
Byte Order¶
It’s still not enough to know in which order the bit sequence of a binary number should be interpreted. Especially in embedded systems, we often deal with numbers that are larger than the word length!
For example, in an 8-bit embedded system processor architecture, we may commonly handle 16- or 32-bit numbers.
The solution is to split the number into chunks that fit the word length, so in an 8-bit architecture, a 16-bit number becomes two 8-bit numbers, and a 32-bit number becomes four 8-bit numbers. These numbers are then stored in consecutive memory locations. For example, in 8-bit Arduinos, C compilers generally allow the use of larger numbers, and the compiler adapts the code to suit the microcontroller (in machine code, operations on larger numbers must be done in steps).
Let’s revisit the memory example above:
Memory Address | Memory Content | Interpretation in Decimal |
0000 | 00000011 | 3 |
0001 | 11111111 | -1 |
0010 | 10110111 | -73 |
0011 | 00110110 | 54 |
Now, in this memory dump (a raw copy of the memory contents), there are four bytes, which could represent four 8-bit numbers, two 16-bit numbers (memory locations 0-1 and 2-3), or one 32-bit number (memory locations 0-3). Okay, but… this raises the question of in which order the bytes should be read. Is the first byte of the 32-bit number in memory location 0 or 3?
Well, as expected, the byte order (i.e., the direction in which the bytes are read) has also been determined during the design of the processor architecture. Here, two general terms are used:
- Big endian: reading order is left to right. In the example memory, the order would be from memory location 0 to memory location 3.
- Little endian: reading order is right to left. In the example memory, the order would be from memory location 3 to memory location 0.
Example: The current x86-64 processor architecture used in PC workstations is little endian, but the bit order is most significant first, meaning bits are read from left to right, MSB -> LSB.
Agreement¶
To ensure we understand each other in this course, let's define the common reading directions:
- The most significant byte is the leftmost byte (big endian), i.e., the smallest memory location in memory.
- The most significant bit (MSB) is the leftmost bit of the number.
- Whenever interpreting a binary number, we move from left to right, from the largest bit weight to the smallest.
Example: Interpretation of a 16-bit number. MSB -> LSB 1100101000101100 = -13780
Striving for Efficiency¶
Great, but why are we so concerned about bytes and words? Wouldn't it be enough to always use 64-bit architecture in applications, so we never run out of range and compilers can handle large numbers?
When programming on a workstation, this isn’t usually a concern because there are plenty of resources like memory and processing power. However, in embedded systems, microcontrollers typically have a small amount of RAM, often just a few kilobytes, which must be shared between the application, firmware, and possibly the operating system. Therefore, memory usage needs to be carefully considered.
The key here is to use variables that are just the right size in the program. Now, if the number range we need fits within an 8-bit variable, why define a 64-bit variable for that purpose? In the latter case, memory usage would be eight bytes, whereas the application could actually get by with just one byte. For someone used to workstation programming, saving a few bytes may seem insignificant, but let's look at it from an embedded perspective with an example.
Example: ATmel's ATtiny series microcontrollers have RAM that is less than one kilobyte (1024 bytes). Let's do some calculations. Now, if 32-bit variables are used, the maximum number of variables that can fit in memory is (1024 bytes / 4 bytes = 256). If 16-bit variables are used, 512 variables could fit in memory (1024 / 2 bytes = 512). But isn’t a hundred variables quite a lot for a single program? Well, it depends on the application. For example, if we are collecting accelerometer data at a fast pace (several times per second) from three axes (X, Y, Z) into an array, a few kilobytes of memory may no longer be enough...
We will return to this topic in the next material, where we introduce the C language variable types.
Negative Binary Numbers¶
Earlier, we used binary numbers to represent positive numbers, but hey… what about negative binary numbers?
No worries, there are several solutions proposed in computer science, which are based on the agreement to reserve one bit of the binary number as a sign bit to represent the sign (+ / -). Typically, this bit is the MSB, where a value of 0 indicates a positive number, and 1 indicates a negative number. When one bit is reserved for the sign, we are left with n-1 bits for the actual number range.
For example, the number range of an 8-bit number is reduced to 7 bits, so the positive number range would be only
0..127
. However, with the help of the sign bit, we can now include a negative number range of -128..-1
. Without the sign bit, the number range would be 0-255
, so the range effectively shifts in the negative direction by the value of the MSB.Signed Magnitude¶
In the signed magnitude representation, the most significant bit (MSB) is reserved as the sign bit. This means, for example, that 0010 would be 2, and 1010 would be -2.
Notice that in this case, zero has two representations: 0000 (representing +0) and 1000 (representing -0). This is problematic because the program has to account for both representations of zero.
One's Complement¶
Now, a negative number is obtained by taking the logical negation of the positive number. For example, if the number 3 is represented as 0011 in binary, then -3 would be 1100.
In this representation as well, zero has two representations: 0000 and 1111, so it is neither ideal nor suitable for our purposes.
Two's Complement¶
The two's complement representation is commonly used in digital technology/computing, so we will spend a little more time and course material on it.
Again, let's agree that the most significant bit (MSB) is the sign bit, where 0 represents positive and 1 represents negative.
- Positive numbers: Nothing special.
- Negative numbers: There is a specific method for representing negative integers.
1. Take the negation of the positive binary number (i.e., flip the bits: 0s become 1s and vice versa).
2. Add 1 to the negated number.
2. Add 1 to the negated number.
Examples with 4-bit numbers. Number 1 0001 Number 2 0010 Negation 1110 Negation 1101 Addition +1 Addition +1 ---- ---- 1111 = -1 1110 = -2
Conversion back from negative to positive is done in the same way (convenient!):
Number -1 1111 Number -2 1110 Negation 0000 Negation 0001 Addition +1 Addition +1 ---- ---- 0001 = 1 0010 = 2
Example: The image shows 4-bit numbers without a sign and in two's complement representation.
Convenience¶
Two's complement is also convenient because the addition of two's complement numbers works the same way, whether the operand is positive or negative. This significantly simplifies the implementation of logic circuits that perform arithmetic in digital technology.
Example: Calculate with two's complement numbers 2+3 = ? and 2-3 = 2+(-3) = ? Number +2 0010 Number +2 0010 Number +3 0011 Number -3 1101 + ---- + ---- 0101 = 5 1111 = -1
In signed magnitude and one's complement representations, this doesn't always work. Try to find an example where addition fails!
Greetings from the Tricks Department¶
And now, a tip from the tricks department… converting a negative binary number to decimal can be done without a calculator by using simple addition. The idea is to think of it this way:
- The most significant bit (MSB) is considered as a negative value.
- The other bits retain their positive values.
In the image below, the value of the MSB bit is considered negative, and the others are positive.
Now, the conversion happens by adding together the decimal values corresponding to the bits.
Example: 8-bit numbers 99 and -99. 99 in binary is 01100011, which is 64 + 32 + 2 + 1 = 99 -99 in binary is 10011101, which is -128 + 16 + 8 + 4 + 1 = -99
Hexadecimal Numbers¶
Another number system often used in computer technology is base 16. Here, we are talking about hexadecimal numbers, which include the digits 0-9 and 10-15, where the latter are represented by the letters A-F.
Notice that each hexadecimal digit can be represented with 4 bits (a so-called nibble). This is one of the main reasons why hexadecimal numbers are used in computer science and programming. Now, a byte can be represented by two hexadecimal digits (vs. a decimal number that is awkward to interpret as bits, or a binary number that is visually cumbersome).
Conversion between hexadecimal and binary numbers is easy once we know that each hexadecimal digit corresponds to 4 bits. You simply need to convert each hexadecimal digit to its 4-bit binary equivalent.
Example: The binary number 01011101 is hexadecimal 5D and in decimal is 93.
5 D 0101 1101 0 + 64 + 0 + 16 + 8 + 4 + 0 + 1 = 93
Example: Convert the hexadecimal number 173A4C to binary?
1 7 3 A 4 C 0001 0111 0011 1010 0100 1100
In many programming languages, hexadecimal numbers are presented with the prefix
0x
. For example, all of the following numbers are valid hexadecimal numbers. 0x0 0x0123456789ABCDEF 0xA 0x1000 0x001 0xBEEF 0xC0DE 0xBA5EBA11
Conclusion¶
... sometimes working with different number systems can take unexpected turns (http://www.xkcd.com/571)
Well, now it's clear to us what happens in the third panel! Our friend is counting sheep using 16-bit two's complement numbers, and it so unfortunately happens that while waiting for sleep, the positive number range runs out, causing the sign bit to flip, and suddenly the count moves into the negative number range.
... 0111111111111111 = 32767 sheep +1 sheep 1000000000000000 = -32768 sheep // the sign bit flipped when MSB changed from 0 to 1 +1 sheep 1000000000000001 = -32767 sheep ...
Give feedback on this content
Comments about this material