
Assembly Language

Learning Objectives: After reading this material, the student will understand the y86 assembly language syntax and ways to implement high-level programming language functionality in assembly.
In this course, we first learn y86 assembly and machine language, which are then used to illustrate instruction execution in the processor and introduce the processor's internal components related to executing instructions. Additionally, y86 assembly language is used for the course's exercises and project work with an assembler and simulator.
The y86-64 assembly syntax and instruction set are, in essence, very similar to x86 assembly. In fact, with minor adjustments, programs can run on both processors! However, it’s important to note that the y86 instruction set is significantly simplified, as it’s designed for educational purposes. The y86-64 processor implements only a small fraction of the thousand-plus assembly instructions of the x86 family.

Overview

When programming in y86 assembly, we naturally rely on the y86 processor architecture described in previous material. We have at our disposal the processor's registers, the status flags, the program counter (PC), and memory.
Note that y86 assembly instructions often end with the letter q, which means the instruction operates on a 64-bit operand (quad word). Generally, a 32-bit operand is indicated by the letter l (long word) and a 16-bit operand by the letter w (word), but the y86 simulator used in this course works only with 64-bit numbers.
With this in mind, let’s start looking at the y86 assembly syntax.

Transfer Operations

In the Computer Systems part of the course, we started learning C programming by using variables in our programs, so let’s now do the same with y86 assembly. In general, assembly languages do not use logical names for variables, such as uint8_t fish_count = 3;. Instead, we use the processor’s registers and memory directly via memory addresses as variables.
We recall from the architecture diagram that the ALU operates on operands stored in the processor’s internal memory (registers). So, before a value can be used in a computation, it must first be moved into a register.
There are four transfer instructions in y86 assembly: irmovq, rrmovq, mrmovq, and rmmovq. The general instruction is movq, and it requires two letters in front to indicate the types of both operands: the first letter gives the source and the second the destination, where i is an immediate value, r is a register, and m is a memory location. Each instruction takes two operands, the source first and the destination second.
Note that in y86 assembly, transfers cannot occur directly from one memory address to another, and immediate values cannot be moved directly to memory. Transfers always go through a register. The reason for this will be explained in later material.
Code example:
#.pos 0               
   irmovq $4,%rax      # Immediate: assign %rax = 4 
   rrmovq %rax,%rcx    # Register: assign %rcx = %rax
   mrmovq (%rdi), %r10 # Indirect: fetch value from memory
                       # address pointed to by %rdi and assign it to %r10
   rmmovq %rsi,8(%rdx) # Indirect: assign %rsi's value
                       # to memory address %rdx + 8
#  halt               
All previously discussed memory addressing modes are used in transfer operations, and we’ll look at them more closely soon.
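Because memory-to-memory and immediate-to-memory transfers are not available, both have to go through a register. A minimal sketch of this, assuming two hypothetical data labels src and dst placed at address 0x100 (the names, the address, and the values are chosen only for illustration; the .pos and .quad directives are covered later in this material):

```asm
.pos 0
   irmovq src,%rdi      # %rdi = address of src (hypothetical label)
   irmovq dst,%rsi      # %rsi = address of dst (hypothetical label)
   mrmovq (%rdi),%rax   # memory -> register: %rax = value stored at src
   rmmovq %rax,(%rsi)   # register -> memory: store %rax at dst
   irmovq $42,%rcx      # immediate -> register
   rmmovq %rcx,8(%rsi)  # register -> memory: store 42 at address dst + 8
   halt

.pos 0x100
src:
   .quad 0x1234         # value to copy
dst:
   .quad 0              # receives the copy of src
   .quad 0              # receives the immediate value 42
```

Each value takes two instructions to reach memory, with %rax and %rcx acting as the intermediate registers.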

Arithmetic Operations

Only four arithmetic and logical operations are available (addq, subq, andq, and xorq), meaning more complex operations, such as multiplication or division, must be implemented as separate subroutines!
The instructions take two register operands, which can be any registers. The result is stored in the second operand.
Additionally, the processor’s status flags are set based on the operation’s result.
Code example:
#.pos 0                
   irmovq $4,%rax      # 1st operand, number 4
   irmovq $3,%rbx      # 2nd operand, number 3
   addq   %rbx,%rax    # %rax = %rax + %rbx = 4 + 3 = 7

   irmovq $7, %rbx     # 2nd operand, number 7
   subq   %rbx,%rax    # %rax = %rax - %rbx = 7 - 7 = 0
                       # Now ZF flag is set
#  halt                
Note: Immediate values (numbers) cannot be used as operands, because the ALU always operates with registers.
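The example above uses only addq and subq. A small sketch of the two logical operations, andq and xorq, follows the same pattern; the operand values are arbitrary and chosen only for illustration:

```asm
#.pos 0
   irmovq $12,%rax     # 1st operand, binary 1100
   irmovq $10,%rbx     # 2nd operand, binary 1010
   andq   %rbx,%rax    # %rax = %rax & %rbx = binary 1000 = 8
   irmovq $12,%rcx     # binary 1100 again
   xorq   %rbx,%rcx    # %rcx = %rcx ^ %rbx = binary 0110 = 6
   xorq   %rcx,%rcx    # xor of a register with itself gives 0
                       # Now ZF flag is set
   halt
```

The last line shows a common idiom: xorq with the same register as both operands clears the register without needing an immediate value.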

Conditional Move Operations

It turns out we can’t move on from transfer operations that easily… In addition to direct move operations, assembly language also has conditional move operations, whose behavior depends on the status flags. Conditional move operations are a convenient (and sometimes the only) way to implement conditional and control structures similar to high-level languages.
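Conditional move instructions take register operands. The conditionality is implemented by checking the status flags before executing the instruction, and if the condition encoded in the instruction name holds, the move is performed.
The conditional move instructions, where r1 is the source register and r2 the destination register, are cmovle r1,r2 (less or equal), cmovl r1,r2 (less), cmove r1,r2 (equal), cmovne r1,r2 (not equal), cmovge r1,r2 (greater or equal), and cmovg r1,r2 (greater). The condition refers to the result of the previous arithmetic operation compared with zero.
The rationale behind conditional move instructions is to keep the ALU’s digital logic implementation as simple as possible. As a result, the assembly programmer must think in terms of status bits rather than straightforward comparison operators… This approach is not specific to y86 and is used in many assembly languages, including x86.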
Code example: A C language conditional statement…
if (a == 10) {
   b = a;
}
… implemented in y86 assembly:
#.pos 0                
   irmovq $10,%rax     # a = 10
   irmovq $10,%rcx     # Comparison value 10 
   subq %rax,%rcx      # "Comparison" operation
   cmove %rax,%rbx     # if ZF=1, assign %rbx = %rax, i.e., b = a
#  halt 
Conditional move instructions can optimize processor performance, which we will discuss further later…
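As a further sketch, the larger of two values can be selected without any jump by using cmovg. The register assignment and the values are arbitrary. Note that rrmovq does not change the status flags, so the flags tested by cmovg are those set by subq:

```asm
#.pos 0
   irmovq $3,%rax      # a = 3
   irmovq $8,%rbx      # b = 8
   rrmovq %rax,%rcx    # max = a (tentative result)
   rrmovq %rbx,%rdx    # temporary copy of b, so that b itself is preserved
   subq   %rax,%rdx    # %rdx = b - a, sets the status flags
   cmovg  %rbx,%rcx    # if b - a > 0, assign max = b
   halt                # %rcx now holds the larger value, 8
```

The subtraction is done on a copy because subq overwrites its second operand.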

Jump Instructions

y86 assembly has a broader set of jump instructions, with a total of seven different types: the unconditional jmp and the conditional jle, jl, je, jne, jge, and jg. Similar to conditional move operations, jump instructions themselves do not perform any comparison; instead, they make a jump decision based on the status bits. The operand for a jump instruction is a memory address or a logical name (label) for the memory address to jump to.
Code example: An imaginary loop structure in C…
a = 10;
while(a > 0) {
   a--;
} 
… implemented in y86 assembly:
#.pos 0
    irmovq $10,%rax    # Loop variable
    irmovq $1,%r8      # Constant 1 
loop:
    subq %r8,%rax      # %rax = %rax - 1
    jne loop            # Jump back if %rax != 0 (ZF=0). jg loop also works
#   halt 
Code example: A C conditional statement…
a = 9;
if (a > 10) {
   b = 1;
} else {
   b = 0;
}
… implemented in y86 assembly:
#.pos 0  
   irmovq $9,%rax      # a = 9
   irmovq $10,%r8      # Comparison value 10 
   subq %r8,%rax       # if (a > 10)
   jg greater          # condition true, jump to "greater"
   irmovq $0,%rbx      # otherwise b = 0
   jmp skip            # skip "greater"
greater:
   irmovq $1,%rbx      # b = 1
skip:                  # program execution continues here
#  halt  
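Jump instructions also make it possible to build operations that the instruction set lacks. The following sketch multiplies two numbers by repeated addition; the register assignment and the values 6 and 7 are chosen for illustration, and the multiplier is assumed to be greater than zero:

```asm
#.pos 0
    irmovq $6,%rdi     # multiplicand
    irmovq $7,%rsi     # multiplier, used as the loop counter (assumed > 0)
    irmovq $1,%r8      # Constant 1
    xorq %rax,%rax     # product = 0
loop:
    addq %rdi,%rax     # product = product + multiplicand
    subq %r8,%rsi      # counter = counter - 1, sets the status flags
    jg loop            # Jump back while counter > 0
    halt               # %rax = 6 * 7 = 42
```

The loop body runs seven times, once for each unit of the multiplier.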

Other Instructions

Additionally, y86 assembly includes other instructions related to program execution.
The nop instruction does nothing except increment the PC register. However, in real processors, this instruction has several uses, including timing and aligning code in memory. We’ll discuss this further later on, as an instruction that does nothing plays an important role in optimizing processor performance.
The halt instruction stops program execution. The program’s status register changes to state STAT=HLT.
Note! In the example programs above, the halt instruction was commented out. If you run the examples in an emulator, it’s important to remove the comment from halt.

Using the Stack

In general, stack memory, or the runtime memory for programs, is integrated into (almost) all modern processors. The stack stores information needed in the near future in a program, which we don’t want to store in registers. Examples include subroutine return addresses and arguments, which we’ll cover shortly.
The stack is a LIFO (Last In, First Out) type of memory, meaning it’s literally a stack where we "pile up" information "on top of each other." The stack grows "upwards," and, conversely, is cleared from the "top down," with the topmost information being removed first.
To make stack usage "simple," the stack actually grows downwards in memory. In other words, the stack’s first memory location is at its highest address. When something is added to the stack, it’s stored in the previous memory location. (There are mostly historical reasons for this convention; the stack was often placed at the end of the memory area, so there was room only downwards, or indexing was simpler for earlier processors.)
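In machine/assembly languages, stack usage has its own instructions; in y86, these are pushq, which stores a register value on top of the stack, and popq, which removes the topmost value from the stack into a register.
Additionally, stack management requires support from specific registers: %rsp, the stack pointer, holds the address of the topmost value in the stack, and %rbp, the base pointer, is used to hold the stack's base address.
Therefore, stack instructions automatically take the memory address from the %rsp register and update the register value when executed: pushq first decreases %rsp by 8 and then stores the value at that address, and popq reads the value at %rsp and then increases %rsp by 8.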
In every y86 assembly program, we must first define the stack’s location. Strictly speaking, it’s not absolutely necessary, and we can create small programs without it. However, in this course, we always use the stack in programs, because without a stack, subroutines won’t work, see below.
In y86 assembly, stack initialization is done by setting the stack registers to point to the desired starting address of the stack (here, 0x400). After that, we can use the stack in our program:
.pos 0
    irmovq stack,%rbp   # stack base address 
    irmovq stack,%rsp   # topmost memory address of the stack 
main:
    irmovq $1,%rax      # value to store in the stack
    pushq %rax          # register value to stack
    irmovq $2,%rax
    pushq %rax
    irmovq $3,%rax
    pushq %rax       
    popq %rax           # topmost stack value to register
    popq %rax
    popq %rax           # three values were stored, so remove them all
    halt
    
.pos 0x400
stack:
Note! You can run the example code in the y86 simulator to see how the stack registers and memory behave.
Note2! A typical mistake on this course is initially forgetting to initialize the stack and then wondering why subroutine calls don’t work.
Note3! Since the stack is in the same memory as code and data, be careful not to overwrite the code or other data with values pushed onto the stack!
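The LIFO order becomes visible when values are popped into different registers than they were pushed from. A small sketch that swaps the contents of two registers through the stack (the registers and values are arbitrary):

```asm
.pos 0
    irmovq stack,%rsp   # initialize the stack pointer
    irmovq $1,%rax
    irmovq $2,%rbx
    pushq %rax          # stack now holds: 1
    pushq %rbx          # stack now holds: 1, 2 (2 is on top)
    popq %rax           # topmost value comes out first: %rax = 2
    popq %rbx           # then the value below it: %rbx = 1
    halt

.pos 0x400
stack:
```

After the program halts, %rsp is back at 0x400, because as many values were popped as were pushed.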

Subroutines

In assembly languages, it is possible to create subroutines (or functions) in the program, but writing and using them involves several important considerations.

Subroutine Location in Memory

The location of a subroutine in memory is entirely up to the programmer. By using the .pos directive and a logical name for the memory address, the programmer can place the subroutine anywhere in the memory available to the program.
Example: Place a subroutine/function named addition starting at memory address 0x100.
.pos 0x100
addition:
   addq %rax,%rbx
   ret

Subroutine Execution

A subroutine is executed by jumping to its memory address, but it’s done with a specific call instruction. The return from the subroutine to the calling program is also done with a special ret instruction. The program must also have a stack memory defined for these instructions to work properly.
Example: Place a subroutine/function named addition starting at memory address 0x100 and the stack starting at address 0x400.
.pos 0
   irmovq stack,%rsp    # stack pointer
   irmovq stack,%rbp    # stack base address
   call addition        # subroutine call
   halt                 # program execution stops

.pos 0x100
addition:               # subroutine starts here
   addq %rsi,%rdi
   ret                  # return from subroutine
   
.pos 0x400
stack:                  # stack base address   
Ok… but why are specific instructions needed for calling subroutines and using the stack? Why can’t we just use jump instructions? The answer isn’t clear from the examples above, but the reason is that before jumping to a subroutine, the processor state needed for continuing after the call must be saved to the stack, so that the same state can be restored after returning from the subroutine. At a minimum, this is the return address, i.e., the PC value of the instruction following the call.
There is a very important reason for this: it allows reusing the same registers in subroutines. In the main program, when we save register values to the stack, the register becomes available for use in the subroutine, and when the subroutine ends, the "old" register values can be restored from the stack. This is especially handy in larger programs or when only a few registers are available. Interrupt handlers (and some firmware calls) actually work in the same way, but more on that later…
In y86 assembly, the call instruction pushes the return address onto the stack and jumps to the subroutine, and the ret instruction pops the return address from the stack and jumps back to it. The general registers are not saved automatically; the programmer saves them with pushq and restores them with popq.
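A minimal sketch of saving and restoring a register around its use in a subroutine. It assumes that the main program keeps a value in %rbx that the subroutine also wants to use as a scratch register; the choice of %rbx and the values are for illustration only:

```asm
.pos 0
   irmovq stack,%rsp    # stack pointer
   irmovq $5,%rbx       # value the main program wants to keep
   irmovq $100,%rdi     # 1st operand for the subroutine
   irmovq $200,%rsi     # 2nd operand for the subroutine
   call addition        # pushes the return address and jumps
   halt                 # %rdi = 300, %rbx is still 5

.pos 0x100
addition:
   pushq %rbx           # save the caller's %rbx on the stack
   rrmovq %rsi,%rbx     # %rbx is now free to use as a scratch register
   addq %rbx,%rdi       # %rdi = %rdi + %rsi
   popq %rbx            # restore the caller's %rbx
   ret                  # pops the return address and jumps back

.pos 0x400
stack:
```

The popq must come before ret: otherwise ret would take the saved %rbx value from the top of the stack and treat it as the return address.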

Subroutine Arguments

In assembly languages, it’s also possible to pass arguments to subroutines, either through registers or stack memory. As you might guess, assembly programmers have to write a bit more code for this… but thankfully, there are generally agreed-upon conventions.
1. Passing arguments through registers. Now, in x86 processor architectures (yes, x86), it’s customary to use specific registers for storing arguments: on x86-64 Linux, the first six arguments go in %rdi, %rsi, %rdx, %rcx, %r8, and %r9, and the return value in %rax. Having this convention makes code reuse and general-purpose functions easier, as register use doesn’t need to be rethought.
Example: Using registers %rdi and %rsi to pass two arguments and a return value (in register %rdi).
.pos 0
   irmovq stack,%rsp   # stack pointer (call and ret need the stack)
   irmovq $100,%rdi    # 1st argument
   irmovq $200,%rsi    # 2nd argument
   call addition       # subroutine call
   halt                # program stops

.pos 0x100
addition:              # subroutine functionality
   addq %rsi,%rdi      # result in %rdi is the return value
   ret                 # return from subroutine

.pos 0x400
stack:                 # stack base address
2. Arguments can also be stored on the stack before calling the subroutine and read from the stack during execution. This allows passing more than six parameters. This is a convenient method, but note that the return address also goes onto the stack. In this case, indirect addressing must be used to read values from the stack.
.pos 0
   irmovq stack,%rsp     # stack pointer
   irmovq stack,%rbp     # stack base address
   irmovq $10,%rax       # 3 arguments (10, 20, and 30) to stack 
   pushq %rax
   irmovq $20,%rax
   pushq %rax
   irmovq $30,%rax
   pushq %rax
   call addition         # subroutine call
   popq %rax             # clear stack to avoid leftover data
   popq %rax             
   popq %rax             
   halt                  # program stops

.pos 0x100
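addition:                # subroutine begins; (%rsp) holds the return address
   xorq %rbx,%rbx        # sum = 0
   mrmovq 8(%rsp),%rax   # last pushed argument (30)
   addq %rax,%rbx
   mrmovq 16(%rsp),%rax  # second pushed argument (20)
   addq %rax,%rbx
   mrmovq 24(%rsp),%rax  # first pushed argument (10)
   addq %rax,%rbx        # %rbx = 30 + 20 + 10 = 60
   ret                   # return from subroutine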
   
.pos 0x400
stack:                   # stack base address   
Let's examine the stack memory in this program using a screenshot from the y86-64 emulator.
On the left, we see memory addresses. Notice that the stack base address %rbp is indeed set at memory location 0x400. The program then pushes arguments 10, 20, and 30 onto the stack one at a time using pushq instructions. Each time a value is pushed onto the stack, the stack pointer %rsp changes by the size of the stored value (here, 64 bits = 8 bytes): the stack grows "upwards," which means that the address in %rsp decreases. So initially, %rsp holds the value 0x400, and after storing the first argument (10), the stack pointer moves by one memory location and now points to memory address 0x3f8. Similarly, as 20 and 30 are pushed onto the stack, the stack pointer moves by one memory location each time and ends up pointing to memory location 0x3e8. After that, the program makes a call instruction, which pushes the return address (0x41) onto the stack at address 0x3e0. Thus, when entering the subroutine, four memory locations in the stack are in use.
In the subroutine, we use the stack as follows: the return address is on top of the stack at (%rsp), so the arguments are read with indirect addressing relative to %rsp. The last pushed argument (30) is at 8(%rsp), the second (20) at 16(%rsp), and the first (10) at 24(%rsp).
At the end of the program, we need to clear the values from the stack with popq instructions. When using the stack, always ensure that you retrieve exactly as much data as was stored there.
Note! You can also directly allocate memory from the stack by manipulating the %rsp register. For example, if you need local variables in a subroutine, you can shift the stack pointer forward by the desired number of memory locations and use indirect addressing to handle these as if they were reserved memory locations. In "real-world" assembly programming, the stack is typically divided into sections with specific purposes, especially for subroutine handling.
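A minimal sketch of this idea, assuming a hypothetical subroutine named work that needs two 8-byte local variables. The name, the values, and the register choices are for illustration only:

```asm
.pos 0
   irmovq stack,%rsp    # stack pointer
   call work            # subroutine call
   halt                 # %rcx = 16 after the call

work:
   irmovq $16,%r8       # size of two 8-byte local variables
   subq %r8,%rsp        # reserve the space by moving the stack pointer
   irmovq $7,%rax
   rmmovq %rax,(%rsp)   # 1st local variable = 7
   irmovq $9,%rax
   rmmovq %rax,8(%rsp)  # 2nd local variable = 9
   mrmovq (%rsp),%rcx   # read 1st local variable
   mrmovq 8(%rsp),%rdx  # read 2nd local variable
   addq %rdx,%rcx       # %rcx = 7 + 9 = 16
   addq %r8,%rsp        # release the space before returning
   ret                  # return address is on top of the stack again

.pos 0x400
stack:
```

The stack pointer has to be moved back by exactly the reserved amount before ret, so that ret finds the return address on top of the stack.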

Compilation Control Instructions

The following are instructions (directives) used to control compilation: logical names (labels) for memory addresses, .pos, .align, and .quad.
Names are not mandatory; they can be used to help organize the code. The programmer has free rein here. In the example below, you could just as easily use my_func instead of main. In the example, memory addresses are named for main and two functions:
main:
    ...
function1:
    ...
function2:
    ...
The .pos directive sets the memory address at which the code or data following it is placed:
.pos 0 # Main program (i.e., main function) starts at address 0
main: 
    ...
.pos 0x100 # function1 starts at memory address 0x100
function1:
    ...
The program's memory addresses don’t need to be sequential, so there can be "empty" space in between. This is particularly useful if you want to update a subroutine later without having to move the entire code or its memory addresses.
Example: First, the .pos instruction sets the desired memory location, the .align instruction sets the desired alignment, and the .quad instruction stores data at that memory location.
.pos 0x80
.align 8 # Align memory in 8-byte increments
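.quad 0x1234 # Value 0x1234, stored as an 8-byte (quad) word
.quad 0x5678 # Value 0x5678, stored as an 8-byte (quad) word
Memory will look as follows, with each number padded to the word length. The bytes are stored in little-endian order, i.e., the least significant byte comes first.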
...
0x80: 0x3412000000000000
0x88: 0x7856000000000000
...
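Reading the values back into registers restores the original numbers, because the processor applies the same byte order when loading. A small sketch, in which the label data is a hypothetical name added for illustration:

```asm
.pos 0
   irmovq data,%rdi      # address of the first value (0x80)
   mrmovq (%rdi),%rax    # %rax = 0x1234
   mrmovq 8(%rdi),%rbx   # %rbx = 0x5678
   halt

.pos 0x80
.align 8                 # Align memory in 8-byte increments
data:                    # hypothetical label for the data area
.quad 0x1234
.quad 0x5678
```

The second value is read with an offset of 8 because each .quad occupies 8 bytes.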

Using Memory in Programs

Finally, here are a few handy tips on how to use memory for working with variables in y86 assembly programs.

Custom Variables

As we learned earlier, a variable is simply a memory location in the computer’s memory. So in assembly languages, we can freely use program memory (indirectly) for defining and using custom variables.
Note: .align aligns the memory address to the specified word length, which in this course is 8 bytes. If the size of the value to be stored in memory is less than a word, align effectively "pads" it with zeros to make it word-sized. This is helpful with arrays, for example, but typically isn’t needed in this course.
Example: Define two custom variables my_var1 and my_var2 at memory locations 0x300 and 0x308 and store and read values from them.
.pos 0
   irmovq $1,%rax      # Value to store in memory location        
   irmovq my_var1, %rbx # Load address of memory location into register
   rmmovq %rax, (%rbx) # Store register value in memory (indirect addressing)
   irmovq $2, %rax     # Value to store in memory location        
   irmovq my_var2, %rbx # Load address of memory location into register
   rmmovq %rax, (%rbx) # Store register value in memory (indirect addressing)

   irmovq my_var1, %rcx # Load address of memory location into register
   mrmovq (%rcx), %rdi # Read value from memory location into register
   halt                # Stop the program
   
.pos 0x300             # Memory location   
my_var1:               # Declaration of my_var1

.pos 0x308             # Memory location   
my_var2:               # Declaration of my_var2

Array Variables

Similarly, we can define and use arrays in assembly programs as in higher-level languages, but using them requires mastering various memory addressing modes to index array elements.
Here, indirect memory addressing is especially handy because we can use a chosen register (an "index register") to operate with memory addresses. That is, we write the memory address of the array element to a register and update the address as we move through the array.
array[0] # Address of the first element 0x300
array[1] # Address of the second element 0x308
array[2] # Address of the third element 0x310
...
# Why does the address increase by 8 bytes each time? 
# Because each element is a 64-bit (8-byte) value
Example: The array starts at address 0x300, and we store a value in each of its first three elements.
.pos 0
   irmovq array, %rdi  # Index register, array start
   irmovq $8, %rcx     # Constant 8, index increases/decreases by 8 bytes
   
   irmovq $1, %rax     # Value to store in the first element
   rmmovq %rax, (%rdi) # Store in memory location of the first element
   addq %rcx, %rdi     # Next (second) element's address +8 bytes
   irmovq $2, %rax     # Value to store in the second element
   rmmovq %rax, (%rdi) # Store in memory location of the second element
   addq %rcx, %rdi     # Next (third) element's address +8 bytes
   irmovq $3, %rax     # Value to store in the third element
   rmmovq %rax, (%rdi) # Store in memory location of the third element
   halt                # Stop the program
   
.pos 0x300
array:
Similarly, to read a value from an array element, just change the value of the register (in this case %rdi) used as an index. For example, the address of the third element would be array + 16, and the fifth element array + 32, etc.
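Elements can also be read with a fixed offset from the array's start address, without changing the index register. A short sketch, in which the array contents are set with .quad instead of being stored at run time (an assumption made to keep the example short):

```asm
.pos 0
   irmovq array,%rdi      # Index register, array start
   mrmovq (%rdi),%rax     # First element,  array + 0:  %rax = 1
   mrmovq 8(%rdi),%rbx    # Second element, array + 8:  %rbx = 2
   mrmovq 16(%rdi),%rcx   # Third element,  array + 16: %rcx = 3
   halt

.pos 0x300
array:
   .quad 1
   .quad 2
   .quad 3
```

A fixed offset suits elements whose position is known in advance; when looping over an array, updating the index register as in the example above is the more flexible choice.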

Structure of an Assembly Program

Below is an example code from the course textbook (Bryant, p. 403) and the simulator, which shows the basic structure of a y86 assembly language program.
  1. Initialization: Setting the memory address of the stack is enough here
  2. Main program main
    • Corresponds to the C language function main()
  3. Subroutines
    • Here, sum, equivalent to the C function sum(long *array, long count)
  4. Memory Allocation
    • Here, an array with 4 elements
# Program starts at memory address 0 
.pos 0
# Initialization
    irmovq stack, %rbp   # Initialize stack
    irmovq stack, %rsp   # Stack pointer
# Main Program    
main:   
    irmovq array,%rdi    # Arguments for subroutine
    irmovq $4,%rsi       # 
    call sum             # Call subroutine sum(array, 4)
    halt                 # End of main program

# Subroutine sum(long *array, long count)
# array in %rdi
# count in %rsi
sum:    
    irmovq $8,%r8        # Constant 8
    irmovq $1,%r9        # Constant 1
    xorq %rax,%rax       # sum = 0 -> %rax = 0
    andq %rsi,%rsi       # Set flags from count: ZF=1 only if count == 0
    jmp     test         # Jump to test block
loop:   
    mrmovq (%rdi),%r10   # Read array element (*array)
    addq %r10,%rax       # Add element to sum
    addq %r8,%rdi        # Next element address (array++)
    subq %r9,%rsi        # Decrease count (count--), sets flags
test:   
    jne    loop          # Loop back if count != 0. First time the flags come from andq
    ret                  # Return from subroutine

# Stack location in memory
.pos 0x200
stack:

# Array: 4 elements
.pos 0x300
.align 8
array:  
    .quad 0x000d000d000d
    .quad 0x00c000c000c0
    .quad 0x0b000b000b00
    .quad 0xa000a000a000

y86-64 Assembly reference

Please refer to the course book Bryant & O'Hallaron, Computer Systems: A Programmer's Perspective, 3rd edition. Chapter 3.
Here (Chapter 4) you can find the y86-64 processor simulator that is used in the course for exercises and course project.
There is also a browser version of the y86-64 simulator here. You should take a look at the exercise material before starting assembly programming.

Summary

As you can see from the material, programming in assembly requires the programmer to work very close to the processor, which means that even "routine" tasks in C language need to be implemented carefully, instruction by instruction at a low level, with knowledge of techniques for handling memory.
A convenient web-based simulator is available for y86 assembly programming on this course. But first, read more about assembly programming in the exercise materials.

Termbank

Abstraction is a process through which raw machine language instructions are "hidden" underneath the statements of a higher level programming language. Abstraction level determines how extensive the hiding is - the higher the abstraction level, the more difficult it is to exactly say how a complex statement will be turned into machine language instructions. For instance, the abstraction level of Python is much higher than that of C (in fact, Python has been made with C).
Alias is a directive for the precompiler that substitutes a string with another string whenever encountered. In its basic form it's comparable to the replace operation in a text editor. Aliases are defined with the #define directive, e.g. #define PI 3.1416
Argument is the name for values that are given to functions when they are called. Arguments are stored into parameters when inside the function, although in C both sides are often called just arguments. For example in printf("%c", character); there are two arguments: "%c" format template and the contents of the character variable.
Array is a common structure in programming languages that contains multiple values of (usually) the same type. Arrays in C are static - their size must be defined when they are introduced and it cannot change. C arrays can only contain values of one type (also defined when introduced).
Binary code file is a file that contains machine language instructions in binary format. They are meant to be read only by machines. Typically if you attempt to open a binary file in a text editor, you'll see just a mess of random characters as the editor is attempting to decode the bits into characters. Most editors will also warn that the file is binary.
Binary number is a number made of bits, i.e. digits 0 and 1. This makes it a base 2 number system.
A bit is the smallest unit of information. It can have exactly two values: 0 and 1. Inside the computer everything happens with bits. Typically the memory contains bitstrings that are made of multiple bits.
Bitwise negation is an operation where each bit of a binary number is negated so that zeros become ones and vice versa. The operator is ~.
Bitwise operations are a class of operations with the common feature that they manipulate individual bits. For example bitwise negation reverses each bit. Some operations take place between two binary values so that bits in the same position affect each other. These operations include and (&), or (|) and xor (^). There are also shift operations (<< and >>) where the bits of one binary number are shifted to the left or right N steps.
Byte is the size of one memory slot - typically 8 bits. It is the smallest unit of information that can be addressed from the computer's memory. The sizes of variable types are defined as bytes.
External code in C is placed in libraries, from which it can be taken into use with the #include directive. C has its own standard libraries, and other libraries can also be included. However, any non-standard libraries must be declared to the compiler. Typically a library is made of its source code file (.c) and header file (.h) which includes function prototypes etc.
Functions in C are more static than their Python counterparts. A function in C can only have one return value and its type must be predefined. Likewise the types of all parameters must be defined. When a function is called, the values of arguments are copied into memory reserved for the function parameters. Therefore functions always handle values that are separate from the values handled by the code that called them.
C variables are statically typed, which means their type is defined as the variable is introduced. In addition, C variables are tied to their memory area. The type of a variable cannot be changed.
Character is a single character, referred to in C as char. It can be interpreted as an ASCII character but can also be used as an integer as it is the smallest integer that can be stored in memory. It's exactly 1 byte. A character is marked with single quotes, e.g. 'c'.
Code block is a group of code lines that are in the same context. For instance, in a conditional structure each condition contains its own code block. Likewise the contents of a function are in their own code block. Code blocks can contain other code blocks. Python uses indentation to separate code blocks from each other. C uses curly braces to mark the beginning and end of a code block.
Comments are text in code files that are not part of the program. Each language has its own way of marking comments. Python uses the # character, C the more standard //. In C it's also possible to mark multiple lines as comments by placing them between /* and */.
A compiler is a program that transforms C source code into a binary file containing machine language instructions that can be executed by the computer's processor. The compiler also examines the source code and informs the user about any errors or potential issues in the code (warnings). The compiler's behavior can be altered with numerous flags.
Complement is a way to represent negative numbers, used typically in computers. The sign of a number is changed by flipping all its bits. In two's complement, which is used in this course, 1 is added to the result after flipping.
Conditional statement is (usually) a line of code that defines a single condition, followed by a code block delimited by curly braces that is entered if the condition evaluates as true. Conditional statements are if statements that can also be present with the else keyword as else if. A set of conditional statements linked together by else keywords is called a conditional structure.
Conditional structure is a control structure consisting of one or more conditional statements. Most conditional structures contain at least two branches: if and else. Between these two there can also be any number of else if statements. It is however also possible to have just a single if statement. Each branch in a conditional structure contains executable code enclosed within a block. Only one branch of the structure is ever entered - with overlapping conditions the first one that matches is selected.
Control structures are code structures that somehow alter the program's control flow. Conditional structures and loops belong to this category. Exception handling can also be considered as a form of control structure.
Data structure is a common name for collections that contain multiple values. In Python these include lists, tuples and dictionaries. In C the most common data structures are arrays and structs.
Python's way of treating variable values is called dynamic typing aka duck typing. The latter comes from the saying "if it swims like a duck, walks like a duck and quacks like a duck, it is a duck". In other words, the validity of a value is determined by its properties in a case-by-case fashion rather than its type.
An error message is given by the computer when something goes wrong while running or compiling a program. Typically it contains information about the problem that was encountered and its location in the source code.
An exception is what happens when a program encounters an error. Exceptions have type (e.g. TypeError) that can be used in exception handling within the program, and also as information when debugging. Typically exceptions also include textual description of the problem.
Flags are used when executing programs from the command line interface. Flags are options that define how the program behaves. Usually a flag is a single character prefixed with a single dash (e.g. -o) or a word (or multiple words connected with dashes) prefixed with two dashes (e.g. --system). Some flags are Boolean flags which means they are either on (if present) or off (if not present). Other flags take a parameter which is typically put after the flag separated either by a space or = character (e.g. -o hemulen.exe).
Floating point numbers are an approximation of decimal numbers that are used by computers. Due to their architecture computers aren't able to process real decimal numbers, so they use floats instead. Sometimes the imprecision of floats can cause rounding errors - this is good to keep in mind. In C there are two common kinds of floating point numbers: float and double, where the latter typically has twice the number of bits (64 instead of 32).
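A minimal sketch of a rounding error in C, assuming the usual IEEE 754 double type. The function names sum_tenths and nearly_equal are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Adds 0.1 to a running sum `count` times. */
double sum_tenths(int count)
{
    assert(count >= 0);

    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += 0.1;   /* 0.1 has no exact binary representation */
    }
    return sum;
}

/* Compares two doubles using a tolerance instead of ==. */
int nearly_equal(double a, double b, double tolerance)
{
    double difference = a - b;
    if (difference < 0.0) {
        difference = -difference;
    }
    return difference <= tolerance;
}
```

With these, sum_tenths(10) == 1.0 is false, while nearly_equal(sum_tenths(10), 1.0, 1e-9) is true. This is why floats are usually compared with a tolerance.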
Header files use the .h extension, and they contain the headers (function prototypes, type definitions etc.) for a .c file with the same name.
Headers in C are used to indicate what is in the code file. This includes things like function prototypes. Other typical contents of headers are definitions of types (structs etc.) and constants. Headers can be at the beginning of the code file, but more often - especially for libraries - they are placed in a separate header (.h) file.
Hexadecimal numbers are base 16 numbers that are used particularly to represent memory addresses and the binary contents of memory. A hexadecimal number is typically prefixed with 0x. They use the letters A-F to represent digits 10 to 15. Hexadecimals are used because each digit represents exactly 4 bits which makes transformation to binary and back easy.
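The correspondence between one hexadecimal digit and four bits can be sketched in C as follows. The function name hex_digit_to_bits is hypothetical.

```c
#include <assert.h>

/* Writes the 4-bit binary form of one hexadecimal digit value (0-15)
 * into out as a null-terminated string. */
void hex_digit_to_bits(unsigned digit, char out[5])
{
    assert(digit <= 0xF);

    for (int i = 0; i < 4; i++) {
        /* Test the bits from most significant (bit 3) to least (bit 0). */
        out[i] = (digit & (1u << (3 - i))) ? '1' : '0';
    }
    out[4] = '\0';
}
```

For the digit 0xA the buffer ends up holding "1010". A longer number such as 0xA5 can be converted one digit at a time, giving 1010 0101.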
In Python objects were categorized into mutable and immutable values. An immutable value cannot have its contents changed - any operations that seemingly alter the object actually create an altered copy in a new memory location. For instance strings are immutable in Python. In C this categorization is not needed because the relationship of variables and memory is tighter - the same variable addresses the same area of memory for the duration of its existence.
When a variable is given its initial value in code, the process is called initialization. A typical example is the initialization of a number to zero. Initialization can be done together with introduction: int counter = 0; or separately. If a variable has not been initialized, its content is whatever was left there by the previous owner of the memory area.
Instruction set defines what instructions the processor is capable of. These instructions form the machine language of the processor architecture.
Integers themselves are probably familiar at this point. However, in C there are many kinds of integers. Integer types are distinguished by their size in bits and whether they are signed or not. As a given number of bits (n) can represent up to (2 ^ n) different integers, the maximum value for a signed integer is (2 ^ (n - 1)) - 1, while for an unsigned integer it is (2 ^ n) - 1.
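As a sketch, the largest values can be computed with bit shifts. For n bits the largest signed (two's complement) value is 2 to the power of (n - 1), minus 1, and the largest unsigned value is 2 to the power of n, minus 1. The function names max_signed and max_unsigned are hypothetical, and the bit count is limited to 63 so the shifts stay within a 64-bit type.

```c
#include <assert.h>

/* Largest value of a signed two's complement integer with `bits` bits. */
long long max_signed(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return (1LL << (bits - 1)) - 1;
}

/* Largest value of an unsigned integer with `bits` bits. */
unsigned long long max_unsigned(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return (1ULL << bits) - 1;
}
```

For 8 bits these give 127 and 255, and for 16 bits 32767 and 65535.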
Python interpreter is a program that transforms Python code into machine language instructions at runtime.
The moment a variable's existence is announced for the first time is called introduction. When introduced, a variable's type and name must be defined, e.g. int number;. When a variable is introduced, memory is reserved for it even though nothing is written there yet - whatever was in the memory previously is still there. For this reason it's often a good idea to initialize variables when introducing them.
An iterable object is one that can be given to a loop to be gone through (in Python, to a for loop). The most common iterables are lists, strings and generators. C has no loop that works like Python's for loop, so going through arrays etc. is done with loops that increment an index.
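A minimal sketch of index-based iteration over a C array. The function name sum_values is hypothetical. Note that the array's length has to be passed separately because a C array does not carry its own size.

```c
#include <assert.h>
#include <stddef.h>

/* Sums `count` integers by stepping an index through the array. */
int sum_values(const int *values, size_t count)
{
    assert(values != NULL || count == 0);

    int sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];   /* values[i] plays the role of the loop variable */
    }
    return sum;
}
```

For an array holding 1, 2, 3 and 4 with count 4, the function returns 10.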
Keywords are words in programming languages that have been reserved. Good text editors generally use a different formatting for keywords (e.g. bold). Usually keywords are protected and their names cannot be used for variables. Typical keywords include if and else that are used in control structures. In a way keywords are part of the programming language's grammar.
A library is typically a toolbox of functions around a single purpose. Libraries are taken to use with the include directive. If a library is not part of the C standard library, its use must also be told to the compiler.
Logical operation refers to Boolean algebra, dealing with truth values. Typical logical operations are not, and, or which are often used in conditional statements. C also uses bitwise logical operations that work in the same way but affect each bit separately.
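The difference between the two kinds of operations can be sketched in C. The function names bit_is_set and both_bits_set are hypothetical. The first uses bitwise operators (>> and &) on individual bits, the second combines two truth values with the logical operator &&.

```c
#include <assert.h>
#include <limits.h>

/* Bitwise: returns 1 if the bit at `index` (0 = least significant) is set. */
int bit_is_set(unsigned value, int index)
{
    assert(index >= 0 && index < (int)(sizeof value * CHAR_BIT));
    return (value >> index) & 1u;
}

/* Logical: returns 1 only if both bits are set. */
int both_bits_set(unsigned value, int first, int second)
{
    return bit_is_set(value, first) && bit_is_set(value, second);
}
```

For the value 5 (binary 101), bits 0 and 2 are set and bit 1 is not, so both_bits_set(5, 0, 2) is 1 and both_bits_set(5, 0, 1) is 0.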
Machine language is made of instructions understood by the processor. Assembly is the human-readable form of machine language and the two names are often used interchangeably. It is the lowest level where it's reasonable for humans to give instructions to computers. Machine language is used at the latter part of this course - students taking the introduction part do not need to learn it.
Macro is an alias that defines a certain keyword to be replaced by a piece of code. When used well, macros can create more readable code. However, often the opposite is true. Using macros is not recommended in this course - you should just be able to recognize one when you see it.
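A short sketch of why macros can backfire. Both macro names and the function area_of_square are hypothetical. Because a macro is replaced as text before compilation, a definition without parentheses can change the meaning of the expression it is given.

```c
#include <assert.h>

/* Expands as plain text: SQUARE_UNSAFE(1 + 2) becomes 1 + 2 * 1 + 2. */
#define SQUARE_UNSAFE(x) x * x

/* Parentheses keep the argument and the result grouped as intended. */
#define SQUARE(x) ((x) * (x))

/* Example use of the safe macro inside an ordinary function. */
int area_of_square(int side)
{
    assert(side >= 0);
    return SQUARE(side);
}
```

Here SQUARE_UNSAFE(1 + 2) evaluates to 5 because multiplication binds tighter than addition, while SQUARE(1 + 2) evaluates to 9.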
In C the main function is the starting point when the program is run. The command line arguments of the program are passed on to the main function (although they do not have to be received), and its return value type is int. At its shortest a main function can be defined as int main().
When programs are run, all their data is stored in the computer's memory. The memory consists of memory slots with an address and contents. All slots are of equal size - if an instance of data is larger, a continuous area of multiple memory slots is reserved.
Method is a function that belongs to an object, often used by the object to manipulate itself. When calling a method, the object is put before the method: values.sort().
Object is common terminology in Python. Everything in Python is treated as objects - this means that everything can be referenced by a variable (e.g. you can use a variable to refer to a function). Objects are typically used in object-oriented languages. C is not one.
Optimization means improving the performance of code, typically by reducing the time it takes to run the code or its memory usage. The most important thing to understand about optimization is that it should not be done unless it's needed. Optimization should only be considered once the code is running too slowly or doesn't fit into memory. Optimization should also not be done blindly. It's important to profile the code and only optimize the parts that are most wasteful.
A parameter is a variable defined together with a function. Parameters receive the values of the function's arguments when it's called. This differentiation between parameters and arguments is not always used - sometimes both ends of the value transfer are called arguments.
Placeholders are used in string formatting to mark a place where a value from e.g. a variable will be placed. In Python we used curly braces to mark formatting placeholders. In C the % character is used which is followed by definitions, where the type of the value is mandatory. For instance "%c" can only receive a char type variable.
Pointers in C are special variables. A pointer contains the memory address of the location where the actual data value is stored. In a sense they work like Python variables. A variable can be defined as a pointer by postfixing its type with * when it's being introduced, e.g. int* value_ptr; creates a pointer to an integer. The contents of the memory address can be fetched by prefixing the variable name with * (e.g. *value_ptr). On the other hand, the memory address of a variable can be fetched by prefixing the variable name with & (e.g. &value).
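A minimal sketch of both operators in use. The function name swap_ints is hypothetical. The caller passes addresses with &, and the function reads and writes the values behind them with *.

```c
#include <assert.h>
#include <stddef.h>

/* Exchanges the two integers that the pointers refer to. */
void swap_ints(int *first, int *second)
{
    assert(first != NULL && second != NULL);

    int temp = *first;    /* read the value stored at the first address */
    *first = *second;     /* overwrite it with the value at the second */
    *second = temp;
}
```

With int a = 1, b = 2; the call swap_ints(&a, &b); leaves a as 2 and b as 1. Without pointers the function would only change its own copies of the values.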
The C precompiler is an apparatus that goes through all the precompiler directives in the code before the program is actually compiled. These include the include directives, which add the source code of the included libraries into the program, and the define directives, which can define constant values (aliases) and macros.
Directives are instructions that are addressed at the precompiler. They are executed and removed from the code before the actual compilation. Directives start with the # character. The most common one is include which takes a library into use. Another common one is define, which is used e.g. to create constant values.
Prototype defines a function's signature - the type of its return value, its name and all its parameters. A prototype is separate from the actual function definition. It's just a promise that a function matching the prototype will be found in the code. Prototypes are introduced at the beginning of the file or in a separate header file. In common cases the prototype is the same as the line that actually starts the function definition, followed by a semicolon.
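A small sketch of a prototype in use. The function names midpoint and is_above_midpoint are hypothetical. Thanks to the prototype, midpoint can be called before its definition appears in the file.

```c
#include <assert.h>

/* Prototype: return type, name and parameters, ended with a semicolon. */
double midpoint(int low, int high);

/* This caller comes before the definition of midpoint. */
int is_above_midpoint(int value, int low, int high)
{
    assert(low <= high);
    return value > midpoint(low, high);
}

/* Definition: the same first line as the prototype, followed by a body. */
double midpoint(int low, int high)
{
    return low + (high - low) / 2.0;
}
```

In a larger program the prototype would typically live in a header file and the definition in the matching .c file.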
Interactive interpreter or Python console is a program where users can write Python code lines. It's called interactive because each code line is executed after it's been fully written, and the interpreter shows the return value (if any).
The format method of string in Python is a powerful way to include variable values into printable text. The string can use placeholders to indicate where the format method's arguments are placed.
Python functions can have optional parameters that have a given default value. In Python the values of arguments in a function call are transferred to function parameters through reference, which means that the values are the same even though they may have different names. Python functions can have multiple return values.
In Python the import statement is used for bringing in modules/libraries - either built-in ones, third party modules or other parts of the same application. In Python the names from the imported module's namespace are accessible through the module name (e.g. math.sin). In C libraries are taken to use with include, and unlike Python import it brings the library's namespace into the program's global namespace.
Python lists were discovered to be extremely effective tools in Elementary Programming. A Python list is an ordered collection of values. Its size is dynamic (i.e. can be changed during execution) and it can include any values - even mixed types. Lists can also include other lists etc.
In Python the main program is the part of code that is executed when the program is started. Usually the main program is at the end of the code file and most of the time under an if __name__ == "__main__": statement. In C there is no main program as such, code execution starts with the main function instead.
In Python a variable is a reference to a value, a connection between the variable's name in code and the actual data in memory. In Python variables have no type but their values do. The validity of a value is tested case by case when code is executed. In these ways they are different from C variables, and in truth Python variables are closer to C pointers.
Python's for loop works like the foreach loop found in most languages. It goes through a sequence - e.g. a list - one member at a time, storing the member currently being processed in the loop variable. The loop ends when the iterated sequence ends.
The main function is the starting point of a program in C and it replaces the main program familiar from Python. By default the main function is named main and at its simplest it is defined as int main().
Resource refers to the processing power, memory, peripheral devices etc. that are available in the device. It includes all the limitations within which programs can be executed and therefore defines what is possible with program code. On a desktop PC resources are - for a programmer student - almost limitless, but on embedded devices resources are much more scarce.
Return value is what a function returns when its execution ends. In C functions can only have one return value, while in Python there can be multiple. When reading code, return value can be understood as something that replaces the function call after the function has been executed.
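When a C function needs to hand back more than one result, one common approach is to bundle the results into a struct and return that as the single return value. The following is a minimal sketch. The struct division and the function divide are hypothetical names.

```c
#include <assert.h>

/* Bundles two results so they can be returned as one value. */
struct division {
    int quotient;
    int remainder;
};

/* Returns both the quotient and the remainder of an integer division. */
struct division divide(int dividend, int divisor)
{
    assert(divisor != 0);

    struct division result = { dividend / divisor, dividend % divisor };
    return result;
}
```

The call divide(17, 5) returns a struct whose quotient is 3 and remainder is 2. Another option is to pass pointers to variables that the function fills in.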
A statement is a generic name for a single executable set of instructions - usually one line of code.
C uses static typing. This means that the type of variables is defined as they are created, and values of different types cannot be assigned to them. The validity of a value is determined by its type (usually done by the compiler). Python on the other hand uses dynamic typing aka duck typing.
In Python all text is handled as strings and it has no type for single characters. In C, however, there is no separate string type - there are only character arrays. A character array can nevertheless be defined like a string, e.g. char animal[7] = "donkey"; where the number is the number of characters + 1. The +1 is needed because the array must have space for the null terminator '\0' which is automatically added to the end of the "string".
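The role of the null terminator can be sketched in C. The function name count_characters is hypothetical. It does by hand what the standard strlen function does, walking the array until it meets '\0'.

```c
#include <assert.h>
#include <stddef.h>

/* Six visible characters plus the null terminator need seven bytes. */
const char animal[7] = "donkey";

/* Counts the characters before the null terminator. */
size_t count_characters(const char *text)
{
    assert(text != NULL);

    size_t length = 0;
    while (text[length] != '\0') {
        length++;
    }
    return length;
}
```

Here sizeof animal is 7 while count_characters(animal) is 6, because the terminator occupies the last slot but is not counted as part of the text.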
Syntax is the grammar of a programming language. If a text file does not follow the syntax of code, it cannot be executed as code, or in the case of C, it cannot be compiled.
Terminal, command line interface, command line prompt etc. are different names for the text-based interface of the operating system. In Windows you can start the command line prompt by typing cmd to the Run... window (Win+R). Command line is used to give text-based commands to the operating system.
The data in a computer's memory is just bits, but variables have type. Type defines how the bits in memory should be interpreted. It also defines how many bits are required to store a value of the type. Types are for instance int, float and char.
Typecast is an operation where the value of a variable is transformed to another type. In the elementary course this was primarily done with int and float functions. In C typecast is marked a bit differently: floating = (float) integer;. It's also noteworthy that the result must be stored in a variable that is of the proper type. It is not possible to change the type of an existing variable.
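A minimal sketch of why a typecast matters in division. The function names average and truncated_average are hypothetical. Dividing two integers discards the fractional part, whereas casting one operand to float first makes the division a floating point one.

```c
#include <assert.h>

/* Casts sum to float before dividing, so the fraction is kept. */
float average(int sum, int count)
{
    assert(count != 0);
    return (float) sum / count;
}

/* Integer division: the fractional part is discarded. */
int truncated_average(int sum, int count)
{
    assert(count != 0);
    return sum / count;
}
```

With a sum of 7 and a count of 2, average returns 3.5 and truncated_average returns 3. The variable sum itself stays an int, only the value used in the expression is converted.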
Unsigned integer is an integer type where all values are interpreted as non-negative. Since a sign bit is not needed, unsigned integers can represent about twice as large numbers as signed integers of the same size. An integer can be introduced as unsigned by using the unsigned keyword, e.g. unsigned int counter;.
In the elementary programming course we used the term value to refer to all kinds of values handled by programs be it variables, statement results or anything. In short, a value is data in the computer's memory that can be referenced by variables. In C the relationship between a variable and its value is tighter as variables are strictly tied to the memory area where its value is stored.
A warning is a notification that something suspicious was encountered while executing or - in this course particularly - compiling a program. The program may still work, but parts of it may exhibit incorrect behavior. In general all warnings should be fixed to make the program stable.
One way to print stuff in C is the printf function, which closely resembles Python's print function. It is given a printable string along with values that will be formatted into the string if placeholders are used. Unlike Python, C's printf doesn't automatically add a newline at the end. Therefore adding \n at the end is usually needed.
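The following sketch shows placeholders in a format string. It uses snprintf, the standard library variant of printf that writes into a character array instead of the screen, so the result can be inspected. The function name format_score and the message text are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a line such as "Ada scored 42 points\n" into out.
 * Returns the number of characters the full text needs. */
int format_score(char *out, size_t size, const char *name, int score)
{
    assert(out != NULL && name != NULL && size > 0);

    /* %s takes a string, %d an int; \n must be added explicitly. */
    return snprintf(out, size, "%s scored %d points\n", name, score);
}
```

Calling printf with the same format string and values would print the same text directly. Each placeholder consumes one of the values in order, and the type given in the placeholder has to match the type of the value.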
The while loop is based on repetition through checking a condition - the code block inside the loop is repeated until the loop's condition is false. The condition is defined similarly to conditional statements, e.g. while (sum < 21).
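A minimal sketch of a while loop built around a condition like the one above. The function name additions_needed is hypothetical. It adds 1, 2, 3 and so on to a sum and counts how many additions are made before the sum reaches the limit.

```c
#include <assert.h>

/* Counts how many of the numbers 1, 2, 3, ... must be added
 * before the running sum is no longer below `limit`. */
int additions_needed(int limit)
{
    assert(limit >= 0);

    int sum = 0;
    int next = 1;
    int count = 0;

    while (sum < limit) {   /* checked before every round */
        sum += next;
        next++;
        count++;
    }
    return count;
}
```

With a limit of 21 the loop runs six times (1 + 2 + 3 + 4 + 5 + 6 = 21). With a limit of 0 the condition is false from the start and the block is never entered.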