y86-64 Machine Language¶
Learning Objectives: Instruction Set Architectures and implementation in the y86 processor.
The hardware implementation of a processor is based on its microarchitecture, which implements the instruction set architecture (ISA) visible to the programmer. Machine language is one part of the ISA: it is the lowest-level, hardware-dependent representation of a computer program in the processor's “native language.” The bits that make up a machine language instruction correspond directly to the internal logic gates of the chip, and the processor executes each instruction based on the information in these bits through sequential logic.
In the previous material, we covered the y86-64 assembly language instructions. In this section, we present the corresponding instruction set architecture and machine language implementation.
Instruction Set Architecture¶
The instruction set architecture (ISA) describes the processor as it appears to the programmer. The ISA consists of the following components:
- Data types understood by the processor, including 2's complement and floating-point representations
- Registers, i.e., the processor's internal memory
- Memory addressing modes: direct, register, indirect
  - Modern processors can have dozens of different addressing modes!
- Processor status flags and the status register
  - State information needed for sequential logic (and the operating system), such as different error states
- The machine language instruction set, where instructions are divided as follows:
  - Arithmetic operations
  - Logical operations
  - Transfer operations
  - Program control instructions
- Memory architecture, virtual memory: machine language programs see memory as one contiguous block made up of words.
  - Memory addresses start at zero (virtually) for the program, regardless of where the block is physically located in main memory.
  - Memory references are made using these virtual memory addresses.
- The stack, i.e., runtime temporary memory for the program
- Default handlers for interrupts and exception conditions
- External interfaces: I/O registers and address space
The same instruction set architecture can be implemented with different microarchitectures. An example is the x86 family (Intel's IA-32 and Intel 64) of instruction set architectures, which processors from different manufacturers, such as Intel, AMD, and VIA, implement in a nearly identical way at the ISA level. Additionally, various manufacturers have extended the instruction set over time with their own processor-specific additions.
y86 Machine Language Instructions¶
Machine language instructions consist of two parts:
- Opcode, which indicates the type or group of the instruction and its sub-function. One or more bytes.
- Operands: immediate value / memory / register in different addressing modes. Zero or more bytes.
In assembly language, the same division is clearly visible: there is a human-readable instruction code (the instruction name) followed by, in this case, up to two operands.
rrmovq %rax,%rbx
Operands¶
Different types of operands can be used in machine language:
- Immediate values, i.e., numbers, which in machine language follow 2's complement representation.
- Direct memory addresses, which are also represented as numbers.
- Registers, each of which has its own numeric code.
In y86-64, a register code is four bits (2^4 = 16 possible codes), so two register operands, the source register and the destination register, fit conveniently into a single byte. The code 0xF denotes an empty operand, meaning that the instruction does not need a register operand in that position, for example when the operand is an immediate value or is read from memory.
Register | Numeric Code | Register | Numeric Code |
%rax | 0 | %r8 | 8 |
%rcx | 1 | %r9 | 9 |
%rdx | 2 | %r10 | A |
%rbx | 3 | %r11 | B |
%rsp | 4 | %r12 | C |
%rbp | 5 | %r13 | D |
%rsi | 6 | %r14 | E |
%rdi | 7 | - | F |
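The register table above can be sketched in Python as a lookup table. This is only an illustration: the names `REGISTER_CODES`, `NO_REGISTER`, and `register_byte` are hypothetical helpers, and the layout (first register in the high four bits, second register in the low four bits) is assumed from the encoding examples in this material.

```python
# Register codes copied from the table above; 0xF marks "no register".
REGISTER_CODES = {
    "%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
    "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
    "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
    "%r12": 0xC, "%r13": 0xD, "%r14": 0xE,
}
NO_REGISTER = 0xF


def register_byte(first=None, second=None):
    """Pack two 4-bit register codes into one byte (hypothetical helper).

    Assumption: the first register goes into the high four bits and the
    second register into the low four bits. A missing register is 0xF.
    """
    hi = REGISTER_CODES[first] if first is not None else NO_REGISTER
    lo = REGISTER_CODES[second] if second is not None else NO_REGISTER
    return (hi << 4) | lo


print(f"{register_byte('%rax', '%rbx'):02x}")  # operands of rrmovq %rax,%rbx -> 03
print(f"{register_byte(None, '%rsi'):02x}")    # only a destination register -> f6
```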
Memory addresses are encoded like immediate values, here as 8-byte (64-bit) numbers. Remember the byte order: the value is stored little endian, least significant byte first. Depending on the instruction code, the immediate value is then interpreted as a memory address.
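As a small sketch of this byte order, the following Python snippet encodes a 64-bit value as eight little-endian bytes. The function name `encode_quad` is a hypothetical helper, not part of any y86 tool.

```python
def encode_quad(value):
    """Encode a 64-bit value as 8 little-endian bytes (hypothetical helper).

    Masking with 2^64 - 1 gives negative numbers their 2's complement form.
    """
    return (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


print(encode_quad(0x1234).hex())  # 3412000000000000
print(encode_quad(-1).hex())      # ffffffffffffffff
```

Note how the least significant byte `34` comes first, which is why the value looks "reversed" in a hex dump.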
Encoding and Structure of Instructions¶
Thus, each y86-64 machine language instruction (opcode + operands) is represented by 1-10 bytes as follows.
The first byte (1) of the instruction is the opcode: its first half (the high four bits) indicates the instruction type and its second half (the low four bits) indicates the instruction function.
Note the hierarchical grouping: similar instructions share the same type and differ only in the function part of the instruction code.
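The split of the opcode byte into two halves can be sketched as follows. This assumes that the type is stored in the high four bits and the function in the low four bits, which is consistent with opcodes such as `cmovg=26` and `jne=74` used in this material. The name `split_opcode` is a hypothetical helper.

```python
def split_opcode(byte):
    """Split an opcode byte into (type, function) nibbles.

    Assumption: type = high four bits, function = low four bits.
    """
    return byte >> 4, byte & 0xF


# Opcodes 0x26 (cmovg), 0x60 (addq), 0x74 (jne) and 0x90 (ret).
for opcode in (0x26, 0x60, 0x74, 0x90):
    kind, function = split_opcode(opcode)
    print(f"{opcode:02x}: type={kind:x} function={function:x}")
```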
The next bytes (2-10) specify the instruction's operands.
- In the case of two register operands, one byte is enough to represent the operands: r1 and r2.
- When using immediate values or memory addresses, an additional 8 bytes are needed to represent the memory address, offset, or value. Here, the byte order is little endian. In this case, the second register operand, if the instruction does not require it, is marked with the code F.
- Offset refers to an optional number used in indirect addressing. It allows relative addressing, where the memory address for the access is given by the base address plus/minus the offset value.
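The base-plus-offset calculation can be sketched in Python. This is a minimal illustration under stated assumptions: the offset is an 8-byte little-endian 2's complement number, the result wraps around at 64 bits, and the names `decode_offset` and `effective_address` are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def decode_offset(raw):
    """Interpret 8 little-endian bytes as a signed (2's complement) offset."""
    return int.from_bytes(raw, "little", signed=True)


def effective_address(base, raw_offset):
    """Base address plus signed offset, wrapped to 64 bits (assumption)."""
    return (base + decode_offset(raw_offset)) & MASK64


raw = bytes.fromhex("f8ffffffffffffff")  # the 8-byte encoding of -8
print(decode_offset(raw))                  # -8
print(hex(effective_address(0x100, raw)))  # 0xf8
```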
Examples of y86 assembly instructions in machine language representation.
Assembly | Machine Language Instruction | Explanation |
cmovg %rax,%rcx | 2601 | cmovg=26, rax=0, rcx=1 |
irmovq $0x1234,%rsi | 30f63412000000000000 | irmovq=30, F=none, rsi=6, immediate value=0x1234 (8 bytes, little endian) |
addq %rdi,%rax | 6070 | addq=60, rdi=7, rax=0 |
jne 0x3c | 743c00000000000000 | jne=74, address=0x3c (8 bytes, little endian) |
ret | 90 | ret=90 |
pushq %rbx | a03f | pushq=a0, rbx=3, F=none |
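The rows of the table can be reproduced with a few lines of Python. This is only a sketch that assembles each example by hand from the opcodes and register codes listed in this material. The names `REG`, `NONE`, `reg_byte`, `quad`, and `examples` are hypothetical and this is not a general assembler.

```python
# Register codes needed for the examples; 0xF marks "no register".
REG = {"%rax": 0x0, "%rcx": 0x1, "%rbx": 0x3, "%rsi": 0x6, "%rdi": 0x7}
NONE = 0xF


def reg_byte(first, second):
    """Pack two 4-bit register codes into one operand byte."""
    return bytes([(first << 4) | second])


def quad(value):
    """Encode a value as 8 little-endian bytes."""
    return (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


# Each entry: opcode byte, then register byte and/or 8-byte constant.
examples = {
    "cmovg %rax,%rcx": bytes([0x26]) + reg_byte(REG["%rax"], REG["%rcx"]),
    "irmovq $0x1234,%rsi": bytes([0x30]) + reg_byte(NONE, REG["%rsi"]) + quad(0x1234),
    "addq %rdi,%rax": bytes([0x60]) + reg_byte(REG["%rdi"], REG["%rax"]),
    "jne 0x3c": bytes([0x74]) + quad(0x3C),
    "ret": bytes([0x90]),
    "pushq %rbx": bytes([0xA0]) + reg_byte(REG["%rbx"], NONE),
}

for asm, code in examples.items():
    print(f"{asm:22} {code.hex()}")
```

The instruction lengths follow directly from the parts used: 1 byte for `ret`, 2 bytes with a register byte, 9 bytes with only a constant, and 10 bytes with both.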
If the opcode or operands of a machine language instruction are incorrect, the processor enters an exception state (status flag) and triggers the predefined exception INS. If there is an error in the memory address, it causes the exception ADR. However, the y86 processor lacks an actual exception handling mechanism, so all exceptions halt its operation.
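How such a check might look can be sketched in Python. Everything here is an assumption for illustration: the tables follow the baseline y86-64 instruction set of the course book (a course-specific simulator may accept more opcodes), the status name `AOK` for normal operation is also taken from the book, and `check_fetch` is a hypothetical helper rather than a description of the actual hardware.

```python
# Instruction length in bytes, indexed by the type nibble of the opcode.
# Assumption: baseline y86-64 instruction set from the course book.
LENGTH_BY_TYPE = {0x0: 1, 0x1: 1, 0x2: 2, 0x3: 10, 0x4: 10, 0x5: 10,
                  0x6: 2, 0x7: 9, 0x8: 9, 0x9: 1, 0xA: 2, 0xB: 2}
# Highest valid function nibble per type; other types allow only 0.
MAX_FUNCTION = {0x2: 6, 0x6: 3, 0x7: 6}


def check_fetch(memory, pc):
    """Return 'AOK', 'INS' or 'ADR' for the instruction at address pc."""
    if not 0 <= pc < len(memory):
        return "ADR"  # the instruction address itself is invalid
    kind, function = memory[pc] >> 4, memory[pc] & 0xF
    if kind not in LENGTH_BY_TYPE or function > MAX_FUNCTION.get(kind, 0):
        return "INS"  # unknown opcode
    if pc + LENGTH_BY_TYPE[kind] > len(memory):
        return "ADR"  # operand bytes would be read outside memory
    return "AOK"


print(check_fetch(bytes.fromhex("6070"), 0))  # valid addq
print(check_fetch(bytes.fromhex("ff"), 0))    # unknown opcode
print(check_fetch(bytes.fromhex("30f6"), 0))  # irmovq cut short
```

Since there is no exception handler, a simulator built this way would simply stop at the first status other than `AOK`.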
y86-64 machine language external references¶
Please refer to the course book: Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition, Chapters 3 and 4.
Conclusion¶
In the 1980s, during the home computer boom in Finland, one of the most popular home computers was the Commodore 64, whose 6510 processor's instruction set can be found here (pp. 8-9). Note that this real processor had only three registers, A (accumulator), X, and Y! Additionally, for arithmetic operations, each register had its own instruction. Separate instructions also exist for reading and writing from/to memory. The instruction set is thus even more primitive than that of the y86 processor, and the data book is only 10 pages versus a modern x86's approximately 8000 pages!