Variable Types in C¶
Learning Objectives: After going through this material, you will know the types of variables commonly used in the C language.
Variable Types¶
The C language standards have predefined a set of variable types for the convenience of the programmer (and okay, to save memory). The variable type tells us its intended use: character, integer, or floating-point number. Since the variable type is standardized, you can rely on it to work the same way across different development environments and systems.
Variable Type | Keyword | Bytes | Standardized? |
Character | char | 1 | Yes |
Integer (word) | int | 2 / 4 | No, depends on the architecture; typically 4 bytes on modern architectures |
Short integer | short int | At least 2 | Yes. |
Long integer | long int | At least 4 | Yes |
Single-precision floating-point | float | 4 | Yes |
Double-precision floating-point | double | 8 | Yes |
You can check the size of different types using the sizeof operator.
Integers¶
In the C language, integers are represented as two's complement numbers. According to the standard, integer variable types are guaranteed to have minimum and maximum ranges, meaning the variable will work within this specified range regardless of the compiler and computer. (Note! The full range is not always available as specified in the standard.)
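As a minimal sketch, the sizes and ranges that your own compiler uses can be printed with the sizeof operator and the macros from <limits.h>. The function name print_integer_info is only illustrative, and the output depends on the compiler and the architecture.

```c
#include <limits.h>
#include <stdio.h>

/* Print the size (in bytes) and the value range of the basic integer
   types on the current platform. The function name is illustrative. */
static void print_integer_info(void) {
    printf("char : %zu byte(s), %d .. %d\n", sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("short: %zu byte(s), %d .. %d\n", sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("int  : %zu byte(s), %d .. %d\n", sizeof(int), INT_MIN, INT_MAX);
    printf("long : %zu byte(s), %ld .. %ld\n", sizeof(long), LONG_MIN, LONG_MAX);
    printf("float: %zu byte(s), double: %zu byte(s)\n", sizeof(float), sizeof(double));
}
```

Calling print_integer_info() from main on a desktop machine and on an embedded target is a quick way to see which sizes are not fixed by the standard.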
The integer variable type can be refined with qualifiers:
- signed reserves the MSB for the sign, as we learned earlier. For integer types this is the default, so it doesn't need to be specified separately (except in special cases).
- unsigned "frees" the MSB for use in the number range, so the values are never negative. Naturally, this changes the number range accordingly.
- short and long are actually qualifiers, but when they are used, the word int can be omitted. To make the code easier to read, you will often find short int and long int written out explicitly.
- const makes the variable's value constant, meaning it cannot be changed in the program. But why would you want to make a variable constant? The benefit is that on many embedded targets the compiler can move constants to the device's program memory instead of RAM. This produces clear memory savings, and the constants can still be used in code just like variables.
For example, a 4-bit integer's range is 0 to 15 when unsigned and -8 to 7 when signed (two's complement).
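As a minimal sketch of how the same four bits give two different ranges, the hypothetical helpers below read the low 4 bits of a byte first as an unsigned number and then as a two's complement number. The helper names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Interpret the low 4 bits of 'bits' as an unsigned number (0..15). */
static int nibble_as_unsigned(uint8_t bits) {
    return bits & 0x0F;
}

/* Interpret the low 4 bits of 'bits' as a two's complement number (-8..7).
   If the MSB of the nibble (bit 3) is set, the value is negative. */
static int nibble_as_signed(uint8_t bits) {
    int value = bits & 0x0F;
    return (value & 0x08) ? value - 16 : value;
}
```

The bit pattern 1111, for example, reads as 15 when unsigned and as -1 when signed.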
Character Variable¶
The character variable char is an interesting type because, in the C language, it is actually an integer of at least 8 bits, whose value is interpreted as a character according to the ASCII table. So, every character corresponds to a numeric value in the C language. This solution is inherited from the 1970s, when handling characters separately would have been too expensive and memory-intensive.
This leads to interesting results. If the programmer wants, they can forget the character nature of the char type and use it as a small integer variable. This allows arithmetic operations on letters to work, as the compiler treats them as numbers. Only when the value is shown as a character in some output is the corresponding character presented. For example, operations like 'a' + 1 = 'b' (97 + 1 = 98) or 'c' - 'a' = 2 (99 - 97 = 2).
In embedded programming, this is actually hugely beneficial on resource-constrained devices, as strings composed of letters can be handled numerically, such as comparing the "equality" of two words. Strange but handy.
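The arithmetic on letters described above can be sketched as follows. Both helper names are hypothetical, and the code assumes an ASCII-compatible character set, where the letters 'a' to 'z' have consecutive codes.

```c
/* Convert a lowercase ASCII letter to uppercase with plain arithmetic.
   Any other character is returned unchanged. */
static char to_upper_ascii(char c) {
    if (c >= 'a' && c <= 'z') {
        return (char)(c - ('a' - 'A'));   /* 'a' - 'A' is 32 in ASCII */
    }
    return c;
}

/* Position of a lowercase letter in the alphabet: 'a' -> 0, 'c' -> 2. */
static int letter_index(char c) {
    return c - 'a';
}
```

No lookup table is needed here: the conversion is a single subtraction on the character code.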
Please note that modern systems use Unicode instead of ASCII, which makes it possible to represent a much larger set of characters from different languages (e.g. Cyrillic). The most popular encoding is UTF-8, which uses 1 to 4 bytes per character. Recent C standards such as C11 already include some Unicode support, but we are not going to cover it much in this course.
Other variable modifiers¶
C also has register, pointer, global, and static variables, but more on those later.
Derived Types¶
The C language standard defines a set of derived variable types. From now on in this course, we will use these variable types to keep the size of the used variable clear in our program code.
Derived Variable Type | Size in Bytes |
int8_t / uint8_t | 1 |
int16_t / uint16_t | 2 |
int32_t / uint32_t | 4 |
int64_t / uint64_t | 8 |
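The sizes in the table above can be checked with a short sketch like the one below. It assumes a C99 (or later) compiler and uses the limit macros from <stdint.h> and the printf format macros from <inttypes.h>. The function name is only illustrative.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Print the size (in bytes) and the value range of some fixed-width types.
   The function name is illustrative only. */
static void print_fixed_width_info(void) {
    printf("int8_t  : %zu byte(s), %d .. %d\n",
           sizeof(int8_t), INT8_MIN, INT8_MAX);
    printf("uint8_t : %zu byte(s), 0 .. %u\n",
           sizeof(uint8_t), (unsigned)UINT8_MAX);
    printf("int32_t : %zu byte(s), %" PRId32 " .. %" PRId32 "\n",
           sizeof(int32_t), INT32_MIN, INT32_MAX);
    printf("uint64_t: %zu byte(s), 0 .. %" PRIu64 "\n",
           sizeof(uint64_t), UINT64_MAX);
}
```

Unlike int, these types have the same width on every platform that provides them.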
The intN_t variable type means a signed integer, and the uintN_t type means an unsigned integer. The use of fixed-width types is increasingly important in modern programming, especially in cross-platform development. These types are also widely used in C++ and are critical to ensure portability across different systems. They are only available from the C99 standard onward, and you need to include the <stdint.h> header. This is the preferred way of defining variables (especially integers) in this course, especially when we move to the embedded programming part.
Initializing Variables¶
As in many other programming languages, initializing a variable means giving it its first value (the value that is stored in memory). This is usually done at the same time as the variable definition.
Examples of initializing integer variables:
int16_t integer = -123;
uint16_t unsigned_integer = 3333;
uint32_t long_integer = 0x12345678;
double floating_point = 1.234;
float smaller_floating_point = 1.2e-10;
For character variables (char), the initialization uses the ' (single quote) to enclose a character, or a number as shown above. The corresponding ASCII value is stored in memory.
These initializations are equivalent:
char character = 'a'; // Value is the ASCII code for 'a'
char character = 97; // In the ASCII table, 'a' corresponds to the number 97
Challenges¶
Since C is an old hardware-oriented programming language, many things are left to the programmer's knowledge.
For example, if a programmer tries to initialize a variable with a number outside its range, the C compiler generally (depending on the compiler and the flags used) politely warns about it.
int8_t a = 1234;
...
warning: overflow in implicit constant conversion.
OFFTOPIC: If you want to be sure that the compiler warns about this possible error (and many others), you can compile with the -Wall flag.
However! The compiler only warns about the error, and the program continues to compile. It's important to be careful not to let such mistakes end up in the compiled program.
What happens in the compiler? Why is this just a warning? We know that a specific amount of memory, based on the type, was allocated for the variable (int8_t is one byte). Now, if we try to store a value that needs more bits than the variable can hold, the value does not fit. Fortunately, C compilers catch this. However, the solution is quite brutal: in practice the compiler simply chops off the extra bits of the number, like a guillotine.
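The guillotine can be imitated by hand. The hypothetical helper below keeps only the lowest 8 bits of a number and reads them as a two's complement value, which is what typical compilers do when a too-large constant is stored in an int8_t (strictly speaking, the standard leaves the result of this conversion to the implementation).

```c
#include <stdint.h>

/* Keep only the lowest 8 bits of 'value' and interpret them as a
   two's complement number. The function name is illustrative only. */
static int8_t guillotine_to_int8(int value) {
    uint8_t low_byte = (uint8_t)(value & 0xFF);   /* chop off everything above bit 7 */
    if (low_byte & 0x80) {
        /* MSB set: the pattern represents a negative number */
        return (int8_t)(low_byte - 256);
    }
    return (int8_t)low_byte;
}
```

With this helper, guillotine_to_int8(1234) gives -46, while values that already fit, such as 100, come back unchanged.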
OFFTOPIC: What is the difference between a warning and an error?
WARNING: A warning is a message generated by the compiler that indicates a potential issue in the code. The compiler allows the program to compile and run, but it alerts the developer to something that could lead to unintended behavior or a bug. Warnings do not stop the compilation process, but they highlight areas of code that may need attention.
ERROR: An error is a message from the compiler that indicates a critical issue in the code, which prevents the program from being compiled. Errors usually indicate syntax or semantic mistakes that must be corrected before the code can be successfully compiled and executed.
In the int8_t a = 1234 example above, here's what happens: 1234 in binary is 10011010010 (11 bits). The compiler cuts off the top three bits 100, so that the remaining 8 bits 11010010 fit in memory. The problem is that this binary number is interpreted as a two's complement number, so the variable a is initialized as int8_t a = -46 instead of the intended 1234. Oops!
Array Variables¶
In C, array variables are declared using square brackets (as in many other programming languages), with the size of the array inside the brackets. You can declare arrays for all basic variable types (and other types, which we'll cover later). Of course, multidimensional arrays are also possible in C.
The syntax is as follows:
uint8_t array[5];
int8_t array[5] = { 1, -3, 5, -7, 9 }; // Initialize at the same time (signed type, because of the negative values)
int8_t array[] = { 1, -3, 5, -7, 9 }; // Compiler determines the array size automatically!
// This is wrong: int8_t array[]; We cannot define an array without including its size unless we are initializing at the same time.
uint8_t array[3][3];
uint8_t array[2][3] = { { 1, 2, 3 }, // Initialize
{ 4, 5, 6 },
};
uint8_t array[][3] = { { 1, 2, 3 }, // The compiler can determine the first dimension from the initializer; the others must be given
{ 4, 5, 6 },
{ 7, 8, 9 } };
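When the compiler deduces the size of an array, the element count can be recovered with sizeof. The sketch below is one common way to do it. The macro ARRAY_LEN and the names values, grid and sum_values are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of elements in an array: total size divided by the size of one
   element. Works only on real arrays, not on pointers. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const int8_t values[] = { 1, -3, 5, -7, 9 };   /* size deduced: 5 */
static const uint8_t grid[][3] = { { 1, 2, 3 },       /* rows deduced: 3 */
                                   { 4, 5, 6 },
                                   { 7, 8, 9 } };

/* Sum all elements of 'values' without hard-coding the array size. */
static int sum_values(void) {
    int sum = 0;
    for (size_t i = 0; i < ARRAY_LEN(values); i++) {
        sum += values[i];
    }
    return sum;
}
```

Here ARRAY_LEN(grid) gives the number of rows and ARRAY_LEN(grid[0]) the number of columns, so loops keep working if the initializer changes.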
Strings¶
Since C does not have a separate string variable type, strings are arrays of type char. That is, they are regular numeric arrays, but the compiler interprets them as representing characters according to the ASCII table.
However, there is a small peculiarity in initializing strings. Strings must always end with the character '\0' (the null terminator). This is crucial to know because many C standard library functions (and other ready-made functions) assume this! String handling in the program typically breaks if the string does not end with a zero. See, for instance, how strlen and strcpy use the null terminator to detect the end of a string. Forgetting the null terminator can lead to security vulnerabilities, such as buffer overflows. Note! The null terminator is entirely different (in the compiler's view) from the ASCII character '0', which corresponds to the number 48. The null terminator is represented in memory by 0x00.
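To see why the terminator matters, here is a minimal sketch of how a function such as strlen can find the end of a string. This is not the standard library's actual implementation, and the name sketch_strlen is hypothetical.

```c
#include <stddef.h>

/* Count the characters before the terminating '\0'.
   If the terminator is missing, the loop keeps reading past the end
   of the array, which is exactly the bug described above. */
static size_t sketch_strlen(const char text[]) {
    size_t length = 0;
    while (text[length] != '\0') {
        length++;
    }
    return length;
}
```

For "Hello" the function returns 5, even though the array holding the string needs 6 elements.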
String initialization can be done in several ways, just like with arrays, as shown above.
char message[] = "Hello"; // The compiler automatically adds a trailing 0, array length is 5+1 characters
char message[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
char message[] = {'H', 'e', 'l', 'l', 'o', '\0'}; // The compiler calculates the array size
char message[] = {72, 101, 108, 108, 111, 0}; // ASCII codes for the characters
If the initializer is shorter than the string/array, the compiler automatically fills the remaining elements with zeros. The following initializations are equivalent:
char message[5] = "H";
char message[5] = {'H'};
char message[5] = {'H', 0, 0, 0, 0 };
Indexes¶
Otherwise, handling arrays is familiar: you move through them using indexes, as in other programming languages. The first index value is 0 and the last allowed index is array size - 1.
An example of using indexes:
uint8_t array[3][3] = { { 1, 2, 3 },
                        { 4, 5, 6 },
                        { 7, 8, 9 } };
for (int i = 0; i < 3; i++) {        // elements 0,1,2
    for (int j = 0; j < 3; j++) {    // elements 0,1,2
        printf("%d\n", array[i][j]); // printf requires #include <stdio.h>
    }
}
You can also traverse arrays in other ways, which we will cover in upcoming material on pointers. Being close to the hardware offers all kinds of fun ways to handle memory. In some cases, a skilled or even less skilled programmer might manage to go outside the bounds of an array using an index, without the compiler noticing, so be careful! This leads to what is called undefined behaviour, an important concept in C: sometimes the application appears to work, other times it crashes.
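Because the compiler does not check indexes for you, one defensive option is to check them yourself before touching the array. The sketch below is only one possible approach. The names ROWS, COLS, table and get_element are assumptions, and -1 is used as an "out of range" marker because it can never be a valid uint8_t element.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 3
#define COLS 3

static const uint8_t table[ROWS][COLS] = { { 1, 2, 3 },
                                           { 4, 5, 6 },
                                           { 7, 8, 9 } };

/* Return table[row][col], or -1 if either index is out of range. */
static int get_element(size_t row, size_t col) {
    if (row >= ROWS || col >= COLS) {
        return -1;   /* refuse to read outside the array */
    }
    return table[row][col];
}
```

The check costs a couple of comparisons per access, which is often acceptable compared with the cost of tracking down undefined behaviour.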
Variable Type Conversions¶
In C, the type of variables can be changed according to certain rules. The standard explains this in detail, but the following general rules are sufficient for us.
- Conversions that may result in loss of data can trigger a warning in the compiler. For example, converting a value larger than the range of the target type, where the compiler's guillotine drops the highest bits.
- In addition, overflow can occur during addition and multiplication, causing the guillotine to cut off the highest bits.
- The general conversion hierarchy in arithmetic operations, from highest to lowest, is: double <- float <- long int <- int <- char.
- In arithmetic operations, the basic rule is that the "lower" type of the operands is converted to the "higher" type according to the hierarchy.
- In conversions between signed types (e.g., short -> long), sign extension occurs, where the sign bit is copied to the higher bits. Example: int8_t -> int16_t: 11001010 -> 1111111111001010, which follows from the two's complement representation.
- When a signed and an unsigned operand of the same size meet in an operation, the compiler converts the signed one to unsigned.
- When converting floating-point numbers to integers, the decimal part is dropped without rounding!!! For example, the conversion (int)0.5 = 0.
- The character variable type char is somewhat special. Since there are 128 ASCII characters (codes 0 to 127), they can be represented using 7 bits. When char is treated as an 8-bit number, the MSB is left as the sign bit. Compilers vary in whether the char type is interpreted as a signed two's complement number or as an 8-bit unsigned number. Therefore, to be certain, you can use the signed or unsigned specifiers with numerically treated chars.
Examples.
uint8_t a = 1234; // Compilation produces a warning
main.c: In function 'main':
main.c:6: warning: large integer implicitly truncated to unsigned type
// Addition where the result wraps around the value range
// The maximum value for int8_t is 127
int8_t a = 33;
int8_t b = 101;
int8_t c = a + b; // result is -122
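A few of the conversion rules can also be tried out in isolation. The helper functions below are hypothetical and only wrap a single conversion each, so that the effect of every rule is visible on its own.

```c
#include <limits.h>
#include <stdint.h>

/* Sign extension: widening a signed value keeps its numeric value,
   because the sign bit is copied into the new upper bits. */
static int16_t widen(int8_t small) {
    return small;
}

/* Floating point to integer: the fractional part is dropped
   (truncation toward zero), with no rounding. */
static int truncate_to_int(double x) {
    return (int)x;
}

/* Mixed signed/unsigned arithmetic: the signed operand is converted
   to unsigned before the addition. */
static unsigned int add_mixed(int s, unsigned int u) {
    return s + u;
}
```

For example, widen(-54) has the 16-bit pattern 1111111111001010, truncate_to_int(2.9) is 2, and add_mixed(-1, 0) is the largest unsigned int value, UINT_MAX.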
Forcing Type Conversion (Casting)¶
C provides a type conversion operator (known as a cast), which can be used to force a conversion at any point in the code. This can be useful when, for example, a library function requires arguments of a specific type. The cast operator is in the form (typename) expression.
Example.
uint16_t x = 7;
double y = sqrt((double) x); // Here, x is forced to double
// before calculating the square root
The cast operator can break the previously mentioned hierarchy when needed.
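A common place where a cast changes the result is division. In the sketch below (the function names are illustrative), both operands are integers, so without a cast the division is done with integers and the fraction is lost before the result is converted to double.

```c
#include <stdint.h>

/* Integer division happens first; the fraction is already gone
   when the result is converted to double. */
static double average_without_cast(int32_t sum, int32_t count) {
    return sum / count;
}

/* Casting one operand to double forces a floating-point division;
   the other operand is then converted automatically. */
static double average_with_cast(int32_t sum, int32_t count) {
    return (double)sum / count;
}
```

With sum = 7 and count = 2, the first function returns 3.0 and the second returns 3.5.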
Conclusion¶
We will work more with arrays in conjunction with pointer variables. When dealing with variable conversions and initializations, it is good to remember the compiler's guillotine, which can save you from many strange bugs.