Processor Architecture
Please refer to the course book: Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition, Chapter 4.
Learning Objectives: After reading this material, the student will understand the general parts of processor architectures and the programmer-visible architecture of the y86-64 processor used as an example in the course.
Let’s expand a bit on the general computer system architecture introduced in the previous course section.
In the internal architecture of a processor, five parts can be distinguished:
1. The ALU (Arithmetic Logic Unit), which performs arithmetic and logical operations on data in the registers.
2. The control unit, which "drives" the processor (implements the sequential logic): it controls and synchronizes the operation of the processor's components and subsystems and manages the system and I/O buses.
3. Registers, which are the internal memory of the processor. They contain the instructions and the data required by them, as well as other information such as the state of the processor. The image below shows the register layout of the 8086 processor.
- Some registers are general-purpose (GPR, general purpose registers), whose use is determined by the programmer/compiler.
- Although registers can be used freely, there are established conventions to improve compatibility between machine and assembly language programs. More on this shortly...
- Some registers have a defined purpose related to program execution.
- The program counter (PC, also called the instruction pointer), which holds the memory address of the next instruction to be executed.
- (In this x86 architecture) memory address registers (segment registers) and the stack pointer (stack segment). These registers point to blocks of main memory reserved for specific uses.
- The status register of the processor, whose bits act as status flags (condition codes) that indicate the state of the processor to the control logic so that, for instance, errors and hardware interrupts can be detected.
4. Memory:
- Main/working memory (Random Access Memory, RAM), which contains the program code and data.
- Devices connected to the I/O bus can also be considered part of this memory through memory-mapped I/O. An example is provided in earlier material with I/O device drivers and their registers.
- Stack memory, a memory area reserved in RAM for the processor, used as auxiliary memory during program execution.
5. Internal buses of the processor: separate data, instruction, and address buses, which may also be part of the system bus.
- For example, these buses may extend to external RAM or co-processors.
Example: The architecture of the 8088 processor used in the "original" IBM PC.
Example: The I/O bus solution of a modern PC. Hierarchical I/O bridges (north and south bridge) connect buses of different speeds, types, and sizes.
Architecture Models
Historically, the architectures of programmable computer processors have been divided into the Harvard and von Neumann architecture models, whose operating principles were created as early as the 1930s and 40s. The key difference between the models is in the system bus solution, i.e., in how the program and data are fetched from memory and the implications of that. It is essential to understand these models because both are still in use today.
1. In the Harvard architecture model, the program and data are stored in separate memories, each with its own bus. This means separate memory/address spaces are used for program memory and data memory, so a memory address alone is not unique: the same address can exist in both the program memory and the data memory. Today, a processor described as following the Harvard architecture usually follows a modified Harvard architecture that uses these separate memory spaces.
In principle, the Harvard architecture is a fast solution because memory addresses can be accessed simultaneously in both memories through separate buses. A current example of the Harvard architecture is embedded systems, which have separate memories, such as Flash memory for programs and RAM for data.
2. In the von Neumann architecture model, a single system bus is used for fetching both program instructions and data from the same (physical) memory. Storing both the program and data in the same memory actually enabled the creation of a general-purpose computer, as programs could be treated like data, meaning they could be loaded into memory, overwritten with new code, stored from memory to mass storage, etc.
It should be noted that the von Neumann model actually describes the building blocks of a modern computer system: separate control and execution units in the processor, an I/O bus, and separate memory. The von Neumann architecture leads to a simpler processor implementation, but there are delays in program execution since the program and its data cannot be fetched simultaneously. However, the impact of this problem can be reduced with caches and other methods, which we will cover later in the material.
Today, the von Neumann architecture is the model for general-purpose computers, such as modern PC workstations. However, modern processor and computer system architectures are hybrids of both models, incorporating the best aspects of each.
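The difference between the two models can be illustrated with a toy sketch. This is only an illustration: the dictionaries, the placeholder instruction names, and the "bus steps" count are assumptions made for this example and do not model any real processor or its timing.

```python
# Toy model of the two architecture models (illustrative assumptions only).

# von Neumann: one memory holds both code and data, so every address is unique.
unified_memory = {0x00: "instr_a", 0x08: "instr_b", 0x10: 5}

# Harvard: two memories, each with its own address space and its own bus.
program_memory = {0x00: "instr_a", 0x08: "instr_b"}
data_memory = {0x00: 5}


def fetch_von_neumann(pc, data_address):
    """Fetch an instruction and a data value over one shared bus.

    The two accesses happen one after the other, so two bus steps are needed.
    """
    instruction = unified_memory[pc]
    data = unified_memory[data_address]
    return instruction, data, 2


def fetch_harvard(pc, data_address):
    """Fetch an instruction and a data value over separate buses.

    The two accesses can overlap, so this model counts a single bus step.
    """
    instruction = program_memory[pc]
    data = data_memory[data_address]
    return instruction, data, 1


# Address 0x00 means different things in the two Harvard memories.
print(fetch_harvard(0x00, 0x00))
print(fetch_von_neumann(0x00, 0x10))
```

Note how address 0x00 is valid in both Harvard memories but refers to different contents, while the von Neumann model needs a separate address (here 0x10) for the data value.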
y86-64 Processor Architecture
In this course, we use the y86-64 processor developed for educational purposes. This processor is a (significantly) stripped-down version of the x86 processor architecture commonly found in modern PCs, but its instruction set and internal operations are intentionally very similar. The modern x86 family architecture is highly sophisticated, the result of decades of work by thousands of engineers, which we cannot cover in depth within the timeframe of an introductory course.
Below is an image of a simplified y86-64 processor architecture. Generally, the y86-64 processor includes an ALU, general-purpose and special-purpose registers, three processor status bits, and main memory. Memory is divided between programs and data (following the von Neumann architecture), with part of it allocated as stack memory. We will also cover the control unit logic as it relates to program execution.
General overview of the y86-64 processor architecture:
- The architecture is 64-bit, meaning memory locations and registers are 64-bit.
- Thus, the word size is 64 bits.
- Numbers are represented in two's complement, with the MSB as the sign bit.
- The byte order is little endian, meaning least significant byte first.
- The bit order is big endian, meaning most significant bit first.
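The number representation and byte order listed above can be tried out with a short Python sketch. The function names are hypothetical helpers written for this example; they are not part of any y86-64 tool.

```python
WORD_BITS = 64  # y86-64 word size


def to_twos_complement(value, bits=WORD_BITS):
    """Return the unsigned bit pattern that encodes a signed value."""
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits=WORD_BITS):
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    sign_bit = 1 << (bits - 1)
    if pattern & sign_bit:          # MSB set: the value is negative
        return pattern - (1 << bits)
    return pattern


def little_endian_bytes(value, bits=WORD_BITS):
    """Return the bytes of a word in memory order, least significant first."""
    return to_twos_complement(value, bits).to_bytes(bits // 8, "little")


print(hex(to_twos_complement(-13)))   # 0xfffffffffffffff3
print(" ".join(f"{b:02x}" for b in little_endian_bytes(0x0123456789ABCDEF)))
# ef cd ab 89 67 45 23 01
```

The second output shows the byte order: the least significant byte (0xef) is stored at the lowest address, while the bits inside each byte are still written most significant bit first.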
Below is an image showing the programmer-visible state of the processor and memory.
Registers
When programming in assembly language, general-purpose registers are memory locations for machine language variables. Additionally, the processor's operation requires a set of its own registers.
The y86 processor has several registers, some of which have designated purposes:
- General-purpose registers (GPR): %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %r8-%r14
- Base pointer for the stack: %rbp
- Stack pointer register: %rsp
- Status register STAT: Indicates the execution status of the program, see below.
- Note! This register is different from the processor status bits.
- Program counter (PC) contains the memory address of the next instruction to be executed.
Note! In assembly language, the % symbol indicates a programmatically accessible register. The register names follow the naming conventions of the x86 processor family.
Processor and Program State
The behavior of an assembly/machine language program is based on checking the values of status flags/bits indicating the processor's state. A separate status register is reserved for these status bits, commonly known as flags:
- A status flag is set when the processor receives an interrupt.
- The conditional behavior of machine language programs (comparison operators, jump instructions, etc.) relies on the status bits in the status register.
- Errors resulting from an instruction are indicated by setting a status flag.
In general, as a result of executing machine language instructions, the processor's status flags change, and the subsequent instructions respond to these flags. This enables the creation of program logic using status flags in machine language.
The y86 processor has three status flags/bits (condition codes, CC):
1. ZF (zero flag): Indicates if the result of the previous ALU operation was 0. This flag is often used for equality checks.
- ZF=0, result is not 0
- ZF=1, result is 0
Example of subtracting a positive number from itself: 01111111 - 01111111 = 00000000. ZF=1, because the result is 0.
2. SF (sign flag): Indicates if the result of the previous ALU operation was negative.
- SF=0, result >= 0
- SF=1, result < 0
Example of adding a positive and a negative number: 00001111 + 10000001 = 10010000. The MSB is 1, so the result is negative, and SF=1.
3. OF (overflow flag): Indicates if an operation resulted in a signed (two's complement) overflow, meaning the result did not fit within the register's value range (exceeds the word size).
- OF=0, no overflow
- OF=1, overflow occurred
Example of adding two positive numbers: 01111111 + 01111111 = 11111110. The result cannot be represented as a positive number (MSB = 1), so OF=1. Additionally, SF=1.
Example of adding two negative numbers: 10000001 + 10000001 = 100000010. The ninth bit is 1, but it gets truncated, leaving 00000010. The result is positive, so OF=1.
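The flag examples above can be reproduced with a small Python sketch. It uses 8-bit values to match the examples (the y86-64 itself works on 64-bit words), and the function name add_with_flags is a hypothetical helper, not a y86-64 instruction.

```python
BITS = 8                      # 8-bit words, as in the examples above
MASK = (1 << BITS) - 1        # 0xFF
SIGN = 1 << (BITS - 1)        # 0x80, the MSB / sign bit


def add_with_flags(a, b):
    """Add two 8-bit patterns and return (result, ZF, SF, OF)."""
    result = (a + b) & MASK                  # truncate to the word size
    zf = int(result == 0)                    # result is zero
    sf = int(bool(result & SIGN))            # MSB of the result is 1
    # Signed overflow: both operands have the same sign,
    # but the sign of the result differs from it.
    of = int(bool(~(a ^ b) & (a ^ result) & SIGN))
    return result, zf, sf, of


# 01111111 + 01111111: two positive numbers overflow into a negative result
print(add_with_flags(0b01111111, 0b01111111))   # (254, 0, 1, 1)
# 10000001 + 10000001: two negative numbers overflow into a positive result
print(add_with_flags(0b10000001, 0b10000001))   # (2, 0, 0, 1)
# 00001111 + 10000001: mixed signs never overflow, result is negative
print(add_with_flags(0b00001111, 0b10000001))   # (144, 0, 1, 0)
```

Subtracting a number from itself can be modeled as adding its two's complement: add_with_flags(0b01111111, 0b10000001) gives a zero result with ZF=1.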
The status register (STAT) in y86 indicates the execution status of a program, which can be:
- AOK: All OK
- HLT: Processor halted
- ADR: Invalid memory address referenced
- INS: Instruction error / invalid instruction
In real processors, there are additional status bits; the 8086, for example, has a 16-bit status register. Real processors often include an "extra" bit known as the carry flag, which helps perform operations that exceed the word size.
Example: An 8-bit value range + carry flag effectively gives the program 9 bits. The carry flag is normally set as a side effect of arithmetic instructions rather than written like ordinary data, although some instruction sets provide instructions for setting or clearing it (x86 has STC and CLC, for example). Machine languages often have specific instructions that respond to the carry flag.
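As a rough sketch of how a carry flag extends the word size, the following Python code adds two 16-bit numbers using only 8-bit additions. The function names are hypothetical, and the code assumes unsigned inputs below 2^16; it imitates the add-with-carry style of real instruction sets rather than any y86-64 feature (y86-64 has no carry flag).

```python
def add8_with_carry(a, b, carry_in=0):
    """Add two 8-bit values plus an incoming carry.

    Returns (8-bit result, carry flag). The carry is the ninth bit of the sum.
    """
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_using_8bit_steps(x, y):
    """Add two 16-bit values one byte at a time, passing the carry along."""
    low, carry = add8_with_carry(x & 0xFF, y & 0xFF)      # low bytes first
    high, carry = add8_with_carry(x >> 8, y >> 8, carry)  # then high bytes
    return (high << 8) | low, carry


print(add8_with_carry(0xFF, 0x01))                      # (0, 1)
print(hex(add16_using_8bit_steps(0x01FF, 0x0001)[0]))   # 0x200
```

The carry out of the low-byte addition becomes the carry into the high-byte addition, which is what makes arithmetic wider than the word size possible.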
Memory Addressing Modes
In assembly/machine language, the operands of an instruction can be presented and used to address data/memory in different ways. When interpreting an instruction, the processor calculates where to find the data required by the instruction based on the addressing mode.
y86-64 has three addressing modes:
- Immediate addressing, where the operand is a constant value: $number. Examples include the values $1, $-13, or $0x1F.
  - Example: Store the decimal number -13 in the %rsi register with the instruction irmovq $-13,%rsi.
- Register addressing, where the operand is the current content of a register: %register.
  - The ALU typically performs operations only on data in registers, so the data must first be stored in a register.
  - Example: Store the value of register %rax in register %rbx with the instruction rrmovq %rax,%rbx.
- Indirect addressing, where the operand is fetched from the memory location pointed to by a register. This register is denoted with parentheses: (%register).
  - A number can be added in front of the parentheses, n(%register) (base + displacement), which points to the memory location register value + n, where n can be positive or negative. (This feature is typically used for stack memory access.)
  - Example: Retrieve a data value from memory at the address in %rax and store it in %rbx with the instruction mrmovq (%rax),%rbx.
  - Example: Retrieve a data value from memory at the address %rax value - 16 and store it in %rbx with the instruction mrmovq -16(%rax),%rbx.
  - This addressing mode corresponds to a pointer in the C language: if register is declared as uint64_t *register, then (%register) corresponds to the dereference *register.
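The three addressing modes can be mimicked with a small Python model. This is a sketch under stated assumptions: the registers dictionary, the 64-byte memory, and the Python functions named after the instructions are all invented for this example, and error handling (such as the ADR status for invalid addresses) is left out.

```python
MASK64 = (1 << 64) - 1

registers = {"%rax": 0, "%rbx": 0, "%rsi": 0}   # a few of the 64-bit registers
memory = bytearray(64)                          # tiny byte-addressed memory


def irmovq(value, dst):
    """Immediate addressing: the operand is the constant itself."""
    registers[dst] = value & MASK64             # stored as a 64-bit pattern


def rrmovq(src, dst):
    """Register addressing: the operand is the content of a register."""
    registers[dst] = registers[src]


def mrmovq(displacement, base, dst):
    """Indirect addressing: the operand is read from address base + n."""
    address = registers[base] + displacement
    word = memory[address:address + 8]          # 8 bytes = one 64-bit word
    registers[dst] = int.from_bytes(word, "little")


# Place two example values in memory (little-endian, 8 bytes each).
memory[16:24] = (42).to_bytes(8, "little")
memory[32:40] = (7).to_bytes(8, "little")

irmovq(32, "%rax")            # irmovq $32,%rax
mrmovq(0, "%rax", "%rbx")     # mrmovq (%rax),%rbx
print(registers["%rbx"])      # 7
mrmovq(-16, "%rax", "%rbx")   # mrmovq -16(%rax),%rbx
print(registers["%rbx"])      # 42
```

In C terms, the two mrmovq calls read *p and the value 16 bytes before it, where p is a pointer holding the address stored in %rax.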
Main Memory
The main memory in the y86 processor is approximately four kilobytes. The size of a memory location is the word size, which is 64 bits, or 8 bytes.
Keep in mind the bit and, especially, byte order, which is little-endian: least significant byte first. This differs from the definitions used in the C language section!
Conclusion
This introductory material provided the foundational knowledge needed to understand how processors and computer systems operate, the common problems encountered, and their solutions, which will be covered later in the course.