Sequential instruction cycle
Learning Objective: Execution of machine language instructions in a sequential processor.
In this course, we introduce two versions of the y86-64 processor, the first of which implements sequential instruction execution. This means that every instruction is fully executed within a single clock cycle. The execution of an instruction is divided into smaller phases across the processor's subsystems, which execute the instruction sequentially within the same clock cycle. Depending on the instruction, different subsystems are active during various phases of execution. The rationale for sequential execution lies in the simplicity of processor design, as having a uniform instruction cycle for all instructions reduces the complexity of control logic and subsystem components.
By the standards of modern processors, this is a slow way to execute programs, as we will see in later material. However, the concept must be understood before more efficient implementations can be explored.
In all processors, instruction execution relies on a register called the program counter (PC), also known as the instruction pointer. This register holds the memory address of the instruction to be fetched for execution. As execution progresses, the PC is updated to the address of the next instruction to be executed, and when the next instruction cycle begins, the instruction at the memory location indicated by the PC is fetched.
Instruction Execution Sequence
In a sequential y86-64 processor, the execution of an instruction, or the instruction cycle, consists of six phases. We will use their common English names for clarity:
1. Fetch: The instruction is fetched from the memory location indicated by the PC register, and its opcode and operands are decoded in the microarchitecture.
2. Decode: The values of the operands are read from the registers they reference and prepared for the ALU.
3. Execute: The ALU performs the arithmetic or logical operation specified by the instruction's opcode, calculates memory addresses for memory references, or updates the stack pointer.
- Note: The y86-64 processor's ALU supports only four simple operations, which are sufficient for building a functional processor.
4. Memory: If the instruction references memory, the value is read from or written to the memory location specified by the operand.
5. Write Back: The result of the ALU operation, or a value read from memory, is written back to the destination register.
6. PC Update: The PC register is updated with the address of the next instruction.
Nearly every instruction uses the Execute phase, but not every phase is needed by every instruction. For example, if an instruction does not reference memory, the Memory phase is skipped.
The following diagram provides an overview of the instruction cycle for the y86 processor. Note that the processor internally stores the results of each phase in registers (invisible to the programmer) and controls the process through signals between its subsystems. (A signal in this context is a message that may also carry data.)
The processor repeats this instruction cycle endlessly unless an exception occurs or program execution ends (for example, at a halt instruction).
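The loop below is a minimal sketch of this repeated cycle in Python, assuming memory is a `bytes` object and supporting only `halt`, `irmovq` and `addq` so that all six phases stay visible in one place. The function `run`, its `max_cycles` limit and the list-based registers are hypothetical illustration choices, not part of any y86-64 tool; the instruction encodings follow the course book.

```python
# Hypothetical sketch of the SEQ instruction cycle for three instructions only.
HALT, IRMOVQ, OPQ = 0x0, 0x3, 0x6  # icode values from the y86-64 encoding
NO_REG = 0xF                       # register code meaning "no register"
MASK64 = (1 << 64) - 1             # registers hold 64-bit words


def run(mem: bytes, max_cycles: int = 100) -> list:
    """Execute one instruction per loop iteration; return the 15 registers."""
    regs = [0] * 15
    pc = 0
    for _ in range(max_cycles):
        # 1. Fetch: split the opcode byte, read the operands, compute valP.
        icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
        if icode == HALT:
            break  # halt stops the repetition of the cycle
        rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
        if icode == IRMOVQ:
            valC = int.from_bytes(mem[pc + 2:pc + 10], "little")
            valP = pc + 10
        elif icode == OPQ and ifun == 0:  # addq only in this sketch
            valC, valP = 0, pc + 2
        else:
            raise ValueError(f"unsupported opcode {mem[pc]:#04x} (status INS)")
        # 2. Decode: read the operand registers (irmovq ignores valB).
        valA = regs[rA] if rA != NO_REG else 0
        valB = regs[rB]
        # 3. Execute: both instructions are implemented with ALU addition.
        valE = (0 + valC) if icode == IRMOVQ else (valB + valA) & MASK64
        # 4. Memory: neither instruction accesses data memory.
        # 5. Write back: the ALU result goes to register rB.
        regs[rB] = valE
        # 6. PC update: continue with the next instruction in memory.
        pc = valP
    return regs


prog = bytes.fromhex(
    "30f20500000000000000"  # irmovq $5, %rdx
    "30f30700000000000000"  # irmovq $7, %rbx
    "6023"                  # addq %rdx, %rbx
    "00"                    # halt
)
print(run(prog)[3])  # 12, the new value of %rbx
```

Each loop iteration corresponds to one clock cycle of the sequential processor; the phases run one after another inside it, just as the subsystems do within a single cycle.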
Subsystems
Next, we look at each of the processor's subsystems in turn.
As shown in the diagram, the microarchitecture uses several internal registers (e.g., srcA, dstM and valC) and control signals (e.g., Cond) to sequence its work. Their detailed use is not needed to understand how the instruction cycle is implemented; a more detailed explanation can be found in the textbook.
1. Fetch
The first step involves fetching the instruction from memory at the address indicated by the PC register and decoding it so that the control logic can execute it as intended.
From the previous material, we know the structure of a machine instruction:
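1. First byte: opcode, consisting of icode (4 bits) + ifun (4 bits)
2. Second byte: operands, consisting of register codes rA (4 bits) and rB (4 bits)
3-10. Bytes 3-10: immediate value or memory address (8 bytes, little-endian). Instructions without a register byte (jXX and call) place this constant in bytes 2-9.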
The control unit decodes the following values:
- Opcode:
  - icode: instruction group (refer to the machine language lecture materials)
  - ifun: opcode-specific function within the group
- Operands:
  - Registers rA and/or rB: numeric codes for registers, or an indication that the operand is not a register
  - Immediate value or memory address: valC
- Address of the next instruction: valP, which is the current PC value plus the size of the current instruction in bytes (i.e., the address where the next instruction begins in memory)
- If the opcode or memory address is invalid, the processor sets the exception state INS or ADR, and program execution halts.
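As a rough sketch, the Fetch phase can be modelled as a Python function that splits the instruction bytes into icode, ifun, rA, rB and valC and computes valP. The function name `fetch`, the `FetchError` exception standing in for the INS and ADR states, and the returned dictionary are assumptions made for illustration; the instruction lengths follow the y86-64 encoding (1, 2, 9 or 10 bytes), and checking the ifun value is left out.

```python
# Hypothetical sketch of the y86-64 Fetch phase.
NEEDS_REGIDS = {0x2, 0x3, 0x4, 0x5, 0x6, 0xA, 0xB}  # rrmovq/cmovXX, irmovq, rmmovq, mrmovq, OPq, pushq, popq
NEEDS_VALC = {0x3, 0x4, 0x5, 0x7, 0x8}              # irmovq, rmmovq, mrmovq, jXX, call
VALID_ICODES = set(range(0xC))                      # halt (0x0) ... popq (0xB)


class FetchError(Exception):
    """Stands in for the INS and ADR exception states."""


def fetch(mem: bytes, pc: int) -> dict:
    if not 0 <= pc < len(mem):
        raise FetchError("ADR")                  # instruction address outside memory
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    if icode not in VALID_ICODES:
        raise FetchError("INS")                  # unknown instruction group
    rA = rB = 0xF                                # 0xF means "no register"
    valC = 0
    pos = pc + 1
    if icode in NEEDS_REGIDS:
        if pos >= len(mem):
            raise FetchError("ADR")
        rA, rB = mem[pos] >> 4, mem[pos] & 0xF
        pos += 1
    if icode in NEEDS_VALC:
        if pos + 8 > len(mem):
            raise FetchError("ADR")
        valC = int.from_bytes(mem[pos:pos + 8], "little")  # 8-byte constant
        pos += 8
    # valP: the current PC plus the length of this instruction.
    return {"icode": icode, "ifun": ifun, "rA": rA, "rB": rB, "valC": valC, "valP": pos}


print(fetch(bytes.fromhex("40120800000000000000"), 0))  # rmmovq %rcx, 8(%rdx)
# {'icode': 4, 'ifun': 0, 'rA': 1, 'rB': 2, 'valC': 8, 'valP': 10}
```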
2. Decode and 5. Write-back
These two phases are handled by the same subsystem, the register file, because both involve accessing registers:
- During the Decode phase, the instruction's operands are read from registers and prepared for the ALU.
- During the Write-back phase, the result of the ALU operation (or a value read from memory) is written back to the destination register.
The subsystem uses six internal registers, set in the Decode phase, to select the registers to read and the destination registers to write:
- Input srcA: the register of the first operand, typically rA.
- Input srcB: the register of the second operand, typically rB.
- Output dstE: the destination register for the ALU result valE, typically rB (the stack pointer %rsp for stack instructions).
- Output dstM: the destination register for a value read from memory (valM), typically rA.
- Output valA: the operand value read from register srcA.
- Output valB: the operand value read from register srcB.
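The register subsystem can be sketched in Python as a register file with two read ports (srcA to valA, srcB to valB) and two write ports (valE to dstE, valM to dstM). The selection rules in `decode_ids` follow the SEQ design in the course book but ignore the condition of conditional moves; the names `decode_ids` and `RegisterFile` are hypothetical.

```python
# Hypothetical sketch of the register file shared by Decode and Write back.
RNONE, RSP = 0xF, 0x4  # "no register" code and the register code of %rsp


def decode_ids(icode: int, rA: int, rB: int) -> tuple:
    """Pick srcA, srcB, dstE, dstM as in the SEQ design (cmovXX condition ignored)."""
    srcA = rA if icode in (0x2, 0x4, 0x6, 0xA) else RSP if icode in (0x9, 0xB) else RNONE
    srcB = rB if icode in (0x4, 0x5, 0x6) else RSP if icode in (0x8, 0x9, 0xA, 0xB) else RNONE
    dstE = rB if icode in (0x2, 0x3, 0x6) else RSP if icode in (0x8, 0x9, 0xA, 0xB) else RNONE
    dstM = rA if icode in (0x5, 0xB) else RNONE
    return srcA, srcB, dstE, dstM


class RegisterFile:
    def __init__(self) -> None:
        self.regs = [0] * 15  # %rax ... %r14

    def read(self, srcA: int, srcB: int) -> tuple:
        """Decode: two read ports; the code 0xF reads as 0."""
        valA = self.regs[srcA] if srcA != RNONE else 0
        valB = self.regs[srcB] if srcB != RNONE else 0
        return valA, valB

    def write(self, dstE: int, valE: int, dstM: int, valM: int) -> None:
        """Write back: port E stores valE, port M stores valM."""
        if dstE != RNONE:
            self.regs[dstE] = valE
        if dstM != RNONE:  # written last, so port M wins if both name one register
            self.regs[dstM] = valM


rf = RegisterFile()
rf.regs[RSP] = 0x100
srcA, srcB, dstE, dstM = decode_ids(0xB, 0x3, RNONE)  # popq %rbx
valA, valB = rf.read(srcA, srcB)                      # both read %rsp
rf.write(dstE, valB + 8, dstM, 42)                    # 42 stands in for valM
print(hex(rf.regs[RSP]), rf.regs[3])                  # 0x108 42
```

Only popq uses both write ports in the same cycle, so the order of the two writes matters only for `popq %rsp`; this sketch lets port M win so that the register ends with the value read from memory.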
During the Write-back phase, depending on the instruction, the ALU result in the internal register valE is written to the destination register dstE (usually rB), and/or the value read from memory, valM, is written to the destination register dstM (usually rA).
3. Execute
To perform execution, the operand values valA and valB are prepared in advance for the ALU and stored in its internal registers aluA and aluB; depending on the instruction, aluA may instead receive valC or the constant 8 or -8, and aluB the constant 0. This allows the ALU to operate on the values without needing to know where they came from. The operation to be performed is specified by the icode:ifun inputs.
Interestingly, the y86-64 ALU supports only four operations, yet these suffice to build a functioning computer: addition, subtraction, AND, and XOR. Note that a machine instruction may logically mean something else but is actually translated into one of these four ALU operations internally!
Addition is the most frequently used operation, and it implements a surprising variety of instructions in a simple way; see the examples at the end of this material.
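A minimal Python sketch of the four-operation ALU and its condition flags, assuming 64-bit two's-complement words. The function names `alu` and `signed` are hypothetical; the flags (ZF, SF, OF) follow the y86-64 description in the course book. The last lines show how instructions that are not arithmetic reuse addition.

```python
# Hypothetical sketch of the y86-64 ALU with its condition codes.
MASK = (1 << 64) - 1


def signed(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def alu(ifun: int, aluA: int, aluB: int) -> tuple:
    """Return (valE, (ZF, SF, OF)); ifun 0=add, 1=sub (aluB - aluA), 2=and, 3=xor."""
    if ifun == 0:
        valE = (aluB + aluA) & MASK
    elif ifun == 1:
        valE = (aluB - aluA) & MASK
    elif ifun == 2:
        valE = aluB & aluA
    elif ifun == 3:
        valE = aluB ^ aluA
    else:
        raise ValueError("the y86-64 ALU has only four operations")
    a, b, t = signed(aluA), signed(aluB), signed(valE)
    zf = valE == 0
    sf = t < 0
    if ifun == 0:    # add overflows when operands agree in sign but the result does not
        of = (a < 0) == (b < 0) and (t < 0) != (a < 0)
    elif ifun == 1:  # b - a overflows when signs differ and the result's sign differs from b's
        of = (a < 0) != (b < 0) and (t < 0) != (b < 0)
    else:
        of = False
    return valE, (zf, sf, of)


print(alu(0, 5, 7))                      # addq: (12, (False, False, False))
print(alu(0, 9, 0)[0])                   # irmovq $9: valE = 0 + valC = 9
print(hex(alu(0, -8 & MASK, 0x100)[0]))  # pushq: %rsp + (-8) = 0xf8
```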
Once the ALU operation is completed, its result is stored in the valE register. For arithmetic and logical instructions (OPq), the ALU also updates the condition code (CC) register with the condition flags (zero ZF, sign SF and overflow OF).
4. Memory
In this phase, data is either read from memory into a register or written from a register to memory. The operation is determined by the icode. A key principle is that memory is always accessed through the processor's internal registers; there are no direct memory-to-memory operations.
The memory address is determined by the control logic as follows:
- Generally, from the valE register, calculated during the Execute phase (for example, a base register plus a displacement, or the decremented stack pointer).
- From the valA register of the Decode phase, which holds the current stack pointer for popq and ret.
The data to be written is chosen based on the opcode:
- Typically from the valA register (from the Decode phase).
- As an exception, in a call (subroutine call), valP (the address of the next instruction) is written to the stack to enable returning.
The data fetched from memory is stored in the valM register.
6. PC Update
In the final phase, the PC register value is updated to point to the instruction to be fetched during the next Fetch phase.
The new address is determined as follows:
- Typically, the valP calculated in the Fetch phase.
- For a call instruction, the subroutine address valC.
- For a ret (return) instruction, the address in valM, fetched from the stack during the Memory phase.
- For conditional jumps, the address is determined by the opcode and condition flags: either valC (jump taken) or valP (not taken). In this case, the choice is made at this stage.
Instruction Execution Examples
The diagram below illustrates the subsystems' operations for different types of instructions. The notation used describes register and memory transfers as follows:
- R[x] indicates reading a value from or writing a value to register x.
- M[x] indicates reading a value from or writing a value to memory location x.
- The operator <- indicates a value being read and assigned.
The diagram demonstrates how the ALU operation is used in different instructions:
- addq: Performs an arithmetic operation and sets the condition flags.
- irmovq: Executes a move operation using addition (with zero).
- rmmovq: Calculates the memory address (base register plus displacement) to which the data is written.
- popq: Retrieves a value from the stack and adjusts the stack pointer by adding the word size (8 bytes) to its current value.
- jne: Executes a conditional jump, which requires reading the condition flags.
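To connect the notation with running code, the following Python sketch executes the five example instructions in the same R[x] and M[x] terms, with each phase's transfer written as a comment. It assumes the instructions have already been fetched into dictionaries, tracks only the ZF flag, and uses a hypothetical `step` function; it is not the course's simulator.

```python
# Hypothetical sketch: one SEQ cycle for each of the five example instructions.
MASK = (1 << 64) - 1
RSP = 4  # register code of %rsp


def step(ins: dict, R: list, M: dict, cc: dict) -> int:
    """Execute one already-fetched instruction and return the new PC.

    R holds the registers, M maps addresses to 8-byte words, and cc keeps
    only the ZF flag, which is all jne needs here.
    """
    name, rA, rB, valC, valP = ins["name"], ins["rA"], ins["rB"], ins["valC"], ins["valP"]
    if name == "addq":      # valE <- R[rB] + R[rA]; set CC; R[rB] <- valE
        valE = (R[rB] + R[rA]) & MASK
        cc["ZF"] = valE == 0
        R[rB] = valE
    elif name == "irmovq":  # valE <- 0 + valC; R[rB] <- valE
        R[rB] = 0 + valC
    elif name == "rmmovq":  # valE <- R[rB] + valC; M[valE] <- R[rA]
        M[(R[rB] + valC) & MASK] = R[rA]
    elif name == "popq":    # valA, valB <- R[%rsp]; valE <- valB + 8; valM <- M[valA]
        valA = valB = R[RSP]
        valM = M.get(valA, 0)
        R[RSP] = valB + 8   # R[%rsp] <- valE
        R[rA] = valM        # R[rA] <- valM
    elif name == "jne":     # Cnd <- not ZF; PC <- Cnd ? valC : valP
        return valC if not cc["ZF"] else valP
    return valP             # PC <- valP


R, M, cc = [0] * 15, {}, {"ZF": False}
R[RSP] = 0x100
program = [
    {"name": "irmovq", "rA": 0xF, "rB": 0, "valC": 3, "valP": 10},    # irmovq $3, %rax
    {"name": "rmmovq", "rA": 0, "rB": RSP, "valC": 0, "valP": 20},    # rmmovq %rax, 0(%rsp)
    {"name": "popq", "rA": 3, "rB": 0xF, "valC": 0, "valP": 22},      # popq %rbx
    {"name": "addq", "rA": 0, "rB": 3, "valC": 0, "valP": 24},        # addq %rax, %rbx
    {"name": "jne", "rA": 0xF, "rB": 0xF, "valC": 0x40, "valP": 33},  # jne 0x40
]
for ins in program:
    pc = step(ins, R, M, cc)
print(R[3], hex(R[RSP]), hex(pc))  # 6 0x108 0x40
```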
Bibliography
Please refer to the course book Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition. Chapter 4.
In Conclusion
Understanding processor operations is key to understanding how a computer works as a whole. In this chapter, we introduced how the sequential y86-64 processor executes instructions. A major weakness of a sequential processor is that it cannot optimize the execution time of individual instructions: every instruction takes as long as the slowest one. The clock cycle must therefore be long enough for every instruction to complete. Memory operations are typically the slowest, and in a von Neumann architecture, for example, an instruction may require two separate memory accesses: one to fetch the instruction and one to read or write data. Additionally, in a sequential processor, the subsystems responsible for the different execution phases spend most of the clock cycle idle.
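The cost of fitting every instruction into one clock cycle can be illustrated with a little arithmetic. The phase delays below are made-up round numbers, not measurements of any real processor, and the mapping from instructions to the phases they really need is a simplification; the point is only that the clock period is set by the slowest instruction.

```python
# Illustrative phase delays in picoseconds (assumed round numbers, not measurements).
DELAY_PS = {"fetch": 200, "decode": 100, "execute": 150,
            "memory": 200, "writeback": 100, "pc_update": 50}

# Phases each instruction actually needs (a simplified, assumed mapping).
NEEDS = {
    "addq":   ["fetch", "decode", "execute", "writeback", "pc_update"],
    "mrmovq": ["fetch", "decode", "execute", "memory", "writeback", "pc_update"],
    "jne":    ["fetch", "execute", "pc_update"],
}


def latency_ps(instr: str) -> int:
    """Time the instruction would need if only its own phases counted."""
    return sum(DELAY_PS[p] for p in NEEDS[instr])


# A single clock period must fit the slowest instruction.
clock_ps = max(latency_ps(i) for i in NEEDS)
for instr in NEEDS:
    idle = clock_ps - latency_ps(instr)
    print(f"{instr}: needs {latency_ps(instr)} ps, gets {clock_ps} ps, idle {idle} ps")
print(f"clock rate: {1e12 / clock_ps / 1e9:.2f} GHz")  # clock rate: 1.25 GHz
```

With these assumed numbers, the clock period is set by mrmovq, which uses both memory accesses, and a jne spends half of every cycle waiting.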
In fact, rather than adding hardware components or functional digital blocks to improve the performance of a sequential processor, it may be cheaper to achieve the same result programmatically! Software can also more easily handle special cases related to instructions. We will revisit this concept in later material.