PART II. COMPUTER ARCHITECTURE¶
In this part of the course, we will explore the architecture and operation of the processor (microprocessor, central processing unit, CPU) and of the computer system as a whole. Details on completing this part of the course can be found in the second part of the course instructions.
But before diving deeper into processors and computer systems, let’s take a brief look back in time…
The History of the Programmable Computer¶
Milestones in the history of the modern programmable computer.
- 100-200 BCE: The ancient Greeks developed the Antikythera mechanism, which could mechanically calculate the movements of celestial bodies.
- 1670-80s: Mathematician Gottfried Wilhelm Leibniz developed a mechanical calculator capable of the four basic operations (addition, subtraction, multiplication, and division) and later expanded the design to solve algebraic equations. He also devised the Leibniz wheel, a mechanism used in widely adopted mechanical calculators for the following two centuries. Leibniz is also considered the father of the binary number system!
- 1804: Joseph-Marie Jacquard introduced the Jacquard loom, which could weave patterns into fabric under the control of punch cards.
- 1834: Charles Babbage presented a plan for constructing an analytical engine, capable of performing various calculations with its own machine language. However, the technology of the time, steam power and gears, was not well-suited for building a complex calculator.
- 1843: Ada Lovelace presented an algorithm (=computer program) designed for Babbage’s engine, which is why she is considered the first computer programmer.
- 1847: George Boole introduced his algebra based on truth values, laying the foundation for the digital logic later used in computers.
- 1936: Mathematician Alan Turing presented a theoretical model of a general-purpose computer.
- 1938: Claude Shannon demonstrated in his master's thesis how Boolean algebra could be applied to implement digital logic. Shannon is also known as the father of information theory, which modern telecommunication technology largely relies on.
- 1941: Konrad Zuse introduced the first programmable automatic calculator.
- 1942: The Atanasoff-Berry computer was introduced, considered the first electronic computer, though it was not programmable or “general-purpose.”
- 1944: Howard Aiken, in collaboration with IBM, built the electromechanical Harvard Mark I calculator, which realized much of the functionality Babbage had planned for his engine.
- 1945: The general-purpose ENIAC computer was presented at the University of Pennsylvania; it performed arithmetic on decimal numbers (in contrast to the binary representation used today).
- 1945: Konrad Zuse also published the first high-level programming language, Plankalkül.
- 1949: The general-purpose EDVAC computer was introduced, which used the binary system and could store programs in memory (i.e., a stored-program computer). Mathematician John von Neumann played a significant role in designing the machine. Konrad Zuse also independently developed a similar computer.
- 1952: Grace Hopper developed the first compiler that could translate programming languages resembling English into machine code.
- 1959: The general-purpose COBOL programming language, based on Hopper’s ideas, was released.
- 1964: Douglas Engelbart introduced the first computer with a (modern-like) mouse and graphical user interface.
- 1971: Intel released the world’s first commercial microprocessor, the Intel 4004.
- 1975: The Altair 8800 microcomputer, built around the Intel 8080 processor, was released; enthusiasts could assemble it from a kit.
- 1976: Steve Jobs and Steve Wozniak introduced the Apple I home computer.
- 1978: Intel introduced the first processor of the x86 family.
- 1985: Microsoft released the first version of the Windows operating system.
- 1991: Linus Torvalds (then a student at the University of Helsinki) introduced the open-source Linux operating system, based on Unix.
Finland's first domestically built computer was the ESKO (Electronic Serial Computer), built between 1954 and 1960. The University of Oulu’s first computer was the Elliott 803 from the 1960s.
Professor Matti Otala and the Elliott 803 at the University of Oulu in 1970.
The Elliott 803 was programmed using the ALGOL programming language.
FLOATING POINT ALGOL TEST'
BEGIN REAL A,B,C,D'
   READ D'
   FOR A:= 0.0 STEP D UNTIL 6.3 DO
   BEGIN
      PRINT PUNCH(3),££L??'
      B := SIN(A)'
      C := COS(A)'
      PRINT PUNCH(3),SAMELINE,ALIGNED(1,6),A,B,C'
   END'
END'
Moore's Law¶
Moore's Law is an observation made by Gordon Moore (a co-founder of Intel) that, as technology advances, the number of transistors on a microchip doubles approximately every two years. In computer technology, Moore's Law has been, and still is, used as one measure of the development of computer systems.
Moore's Law has been the subject of ongoing discussion, but so far it has held remarkably well; at times the number of transistors has even doubled faster, approximately every 18 months. Although in recent years there has been debate about whether Moore's Law will remain valid in the near future, it still appears to hold.
The increase in the number of transistors does not directly translate into an increase in computing power (as we will see in the lecture materials). However, within the same physical footprint, more internal logic and memory can be added to processors, leading to more efficient operation, the implementation of more extensive machine language instruction sets, and so on.
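To get a feel for the arithmetic behind the doubling rule, below is a small illustrative C sketch. The starting point of roughly 2300 transistors is the Intel 4004 from 1971; the later figures are only what the rule of thumb predicts, not measured chip data.

#include <stdio.h>

int main(void) {
    /* Moore's Law as a rule of thumb: the transistor count doubles
       roughly every two years. Project forward from the Intel 4004,
       which had about 2300 transistors in 1971. */
    double count = 2300.0;
    for (int year = 1971; year <= 2021; year += 2) {
        if ((year - 1971) % 10 == 0) {
            printf("%d: ~%.0f transistors\n", year, count);
        }
        count *= 2.0;   /* one doubling per two-year step */
    }
    return 0;
}

Five doublings per decade means roughly a 32-fold increase every ten years.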
In the images below, the left-hand image shows how Intel's processor instruction set has grown over time, and the right-hand image shows the increased capacity of DRAM memory chips. The guiding principle of Intel's instruction set has been backward compatibility across different processor generations.
From C Language to Machine Language¶
In this course, we have already been introduced to one high-level programming language, namely C. Alright, alright... earlier we referred to C as a low-level, hardware-oriented language, and it still is compared to many newer languages, such as Python from the Basics of Programming course. However, from the CPU's point of view, C is still far too high-level and expressive to be understood directly.
A program must be presented to the processor in a form it can comprehend through its digital logic, that is, machine language. As we will see, a machine language program consists of a series of very simple instructions that operate on data (primarily) located in the internal registers of the processor.
Even though programming in machine language is rare nowadays (or rarely necessary), understanding machine language and how the processor operates helps in producing the most efficient code possible and in making full use of the processor's capabilities. This knowledge is a standard professional skill for computer engineers and is also reflected in high-level programming. In embedded programming, the industry still encounters situations where time-critical sections of code must be implemented in assembly language.
In the past, machine language (or assembly language) programming was the only way to write code, let alone efficient code, for the "embedded systems" of the time... like space rockets, for instance.
(Margaret Hamilton didn't write all this code herself, of course; she led the coding team of the Apollo project...)
The C Compilation Process¶
Since there are different CPU architectures, machine language is hardware- and processor-dependent, whereas C is standardized and works across systems. C must therefore be compiled into the machine language of each specific processor using a compiler. We won't dive deeply into compiler construction in this course, but understanding the general compilation process is essential. And yes, a compiler is also a computer program...
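The phases described next can also be run one at a time. As a minimal sketch (assuming gcc on a Linux-like system, and using the file name mahtijuttu.c from the example later in this material), the stages look like this:

gcc -E mahtijuttu.c -o mahtijuttu.i   # preprocessing only
gcc -S mahtijuttu.i -o mahtijuttu.s   # compilation into assembly language
gcc -c mahtijuttu.s -o mahtijuttu.o   # assembly into an object file
gcc mahtijuttu.o -o mahtijuttu        # linking into an executable program

A plain gcc mahtijuttu.c -o mahtijuttu performs all of these phases in one go.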
In the first phase, a C program is preprocessed and compiled into assembly language. Next, the assembler converts the assembly implementation into machine language in object file format (with extensions like .o, .so, or .obj). Then the linker combines the object files into an executable machine code program. Often our program uses existing system libraries (such as stdio in C, RTOS libraries, etc.), whose object-form implementations, for example the C standard library's printf function, are attached to the executable by the linker at this stage. The end result is a machine code program that, when it is run, is loaded into RAM by the operating system's loader, which then starts its execution.

Assembly and Machine Language¶
Processors are built on digital microcircuits (discussed in previous courses) that only handle bits through their combinatorial and sequential logic using Boolean algebra. Thus, the commands understood by the processor must be represented as bits, i.e., binary numbers, where each bit in a command has a meaning known by both the programmer and the processor. Machine language, at its core, is just bits.
Generally, a binary number describing an instruction has bits that identify the instruction itself, while others specify the operands and their sources. It becomes clear that machine language instructions are far simpler than high-level language statements, performing only a very small operation at a time. For those used to programming in high-level languages, this means that even simple functionality requires many lines of machine code. The binary representation of machine code is obviously challenging for programmers to handle (think of Hamilton's team...), but understanding its logic and operation helps programmers think like a computer when designing and writing code. The result is more efficient code overall, because we understand how the processor operates and how our code is executed (in the best case, optimally) within the CPU.
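To make the "machine code is just bits" point concrete, below is a small, admittedly non-portable C sketch. The add function and the pointer cast are purely illustrative; reading a function's bytes like this is a common gcc/x86-64 Linux trick rather than standard C. The program prints the first bytes of its own compiled add function in hexadecimal:

#include <stdio.h>

/* A deliberately simple function whose compiled form we will inspect. */
int add(int a, int b) {
    return a + b;
}

int main(void) {
    /* Treat the compiled add() function as a sequence of raw bytes.
       Casting a function pointer to a data pointer is NOT portable,
       standard C, but on a typical x86-64 Linux system with gcc it
       works and shows that machine code is just bit patterns in memory. */
    const unsigned char *code = (const unsigned char *)(void *)add;
    for (int i = 0; i < 16; i++) {
        printf("%02x ", code[i]);   /* print 16 bytes of machine code in hex */
    }
    printf("\n");
    return 0;
}

Compiled without optimizations on x86-64, the output typically begins with bytes such as 55 48 89 e5, i.e., the same push %rbp and mov %rsp,%rbp we will meet in the objdump listing later in this material.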
To facilitate programming in machine language, a symbolic (i.e., human-readable) representation of machine language, called assembly language, is used. Assembly instructions are designed to be readable by humans as text, with named instructions and so on, and they map directly (even in one's head) to actual machine language. We will delve deeper into assembly languages in the next lecture materials, but let's look at an example below.
Example. The C program mahtijuttu.c:

int main() {
int i=0,a=0;
for (i=0; i<10; i++) {
a += i;
}
}
...is compiled with the command gcc -S -Og mahtijuttu.c, producing the assembly language translation mahtijuttu.s for x86. Let's take a look at what it contains:

main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, -4(%rbp)      # declaration of i: i=0
    movl    $0, -8(%rbp)      # declaration of a: a=0
    movl    $0, -4(%rbp)      # initialization (loop start): i=0
    jmp     .L2               # jump to the condition at .L2
.L3:
    movl    -4(%rbp), %eax    # load i from memory into a register
    addl    %eax, -8(%rbp)    # a = a + i
    addl    $1, -4(%rbp)      # i++
.L2:
    cmpl    $9, -4(%rbp)      # compare: is i < 10?
    jle     .L3               # if true, jump to the loop body at .L3
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret                       # return from main
    .cfi_endproc
Okay, it looks quite cryptic. Each assembly instruction is presented on its own line and consists of two parts: the instruction code and its operands. For example, the line movl $0, -4(%rbp) has the instruction movl followed by its operands (which correspond to variables in C), represented in assembly as registers and/or memory addresses. In the example, the operands are the value 0, denoted $0, and a memory operand built from the register rbp and the offset -4, i.e., the memory location at address rbp-4. We will revisit this in more detail...

(The .-prefixed commands are directives that tell the assembler how to assemble the program into machine code, akin to C's preprocessor directives. The program also contains several named code blocks, such as main: or .L2:. Clearly, the main block could correspond to our program's main function.)

Next, we compile the C code directly to machine language with the command gcc -c mahtijuttu.c. The result is a machine code object file, mahtijuttu.o, where the machine language instructions are in the binary format understood by the processor. This machine code file can then be disassembled back into assembly language with the objdump -d mahtijuttu.o command, as shown below.

0000000000000000 <main>:
   0:  55                      push   %rbp
   1:  48 89 e5                mov    %rsp,%rbp
   4:  c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   b:  c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
  12:  c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  19:  eb 0a                   jmp    25
  1b:  8b 45 fc                mov    -0x4(%rbp),%eax
  1e:  01 45 f8                add    %eax,-0x8(%rbp)
  21:  83 45 fc 01             addl   $0x1,-0x4(%rbp)
  25:  83 7d fc 09             cmpl   $0x9,-0x4(%rbp)
  29:  7e f0                   jle    1b
  2b:  5d                      pop    %rbp
  2c:  c3                      retq
On the left, we have memory addresses and the machine code stored as hexadecimal numbers. Note that one assembly instruction can correspond to 1-n bytes. Looking at the example, while assembly language was already quite cryptic, it is much more readable than machine code, right?
In the example, the line 0: 55 indicates that at memory address 0x0 there is the instruction 0x55; on the right, the corresponding assembly instruction push %rbp is shown. Similarly, we can deduce from reading the code that the movl instruction seems to correspond to the hex values 0xc7 and 0x45. Since the structure of machine language instructions is precisely defined, they can be interpreted back into assembly language by a disassembler program, like the above-mentioned objdump program.

Below is an example of the same loop compiled for an 8-bit RISC-architecture embedded microcontroller, the ATtiny2313. Note that the instruction set of the AVR architecture is simpler, so more instructions are needed for the machine code representation of the same C code. Also, observe the different byte order compared to the x86 architecture.
Note! Converting machine or assembly code back into the original C code is not feasible, because C has much greater expressiveness and information is lost in compilation. Was a for- or while-loop structure used in the original C program? Was the loop condition written as i <= 9 in C, or as i < 10?

Conclusion¶
This brief presentation provided general knowledge about the history of computer technology and showed how a C program is translated into machine code. In fact, the processor doesn't always execute machine language instructions as such; instead, a machine language instruction may be broken down internally into a microprogram, which the processor then executes. This applies to complex machine language instructions, but more on that later...