COMP 348: Principles of
Programming Languages
Section 5: Data Types
Section 5 Topics
• Basic data types
– ints and floats
• Data type size
• C arrays
• Composite data types
• Forms of memory allocation
• Creating data elements dynamically
• Use of pointers
Basic data types
• As is the case with other compiled languages, C has a
set of basic or primitive data types
– Primitive types essentially correspond to the data types that
can be directly understood and manipulated by the CPU.
• This consists of:
– Integer types
– Floating point types
• ints can take several forms
– char
– short int
– int
– long int
– long long int
• All integers can use a signed or unsigned form.
– signed is the default, except for plain char, whose
signedness depends on the compiler
Basic types…cont’d
• Floating point values can be:
– float
– double
– long double
• The C99 standard also includes a boolean type
called bool (defined in <stdbool.h>)
– Adds a new bool data type
– Allows tests against true and false
• By the way, what if your compiler does not
support the boolean type?
– #define TRUE 1
– #define FALSE 0
– Add these in a header file available to your project
Basic data types
• Superficially, this seems to be similar to Java.
• However, there is a very BIG difference
between the two languages.
• Java runs on a virtual machine that is
standardized across all OS/CPU architectures.
– So, for example, in Java an int is ALWAYS 4 bytes
(or 32 bits).
– It does not matter at all if the native word size of
the machine your app is running on is 32 bits or
64 bits
– If you port your Java source code to a completely
different architecture, it should work the same
way.
Basic types…cont’d
• This is NOT the case with C.
• Instead, C uses the data types supported by
the underlying machine hardware
– So an int may be 2 bytes, 4 bytes, or 8 bytes.
– This is (just) one of the reasons why C is not
considered to be a particularly portable language.
• The same is true for floats and doubles
– i.e., single-precision and double-precision floating
point values.
• This doesn’t mean that you can’t port C code.
– It just means that you have to be aware of these
issues.
Does this matter?
• Yes, there are times when it will matter a lot.
• In a simple program that uses small integer
values, it’s not a problem.
– E.g., all values less than 10,000.
• If, however, you have large values in your program,
your application may fail when it is ported to a new
platform.
• For example, if you assign a number like
10,000,000,000 (needs an 8-byte int) to a 4-byte
int, the value does not fit and the stored result will be wrong.
– The program may compile, but it may
1. Run perfectly if the numbers are not too big
2. Run but produce incorrect results if the numbers are large
3. Run but fail at run-time if there are, for example,
conditional checks that depend on these values.
How do we know what we have?
• In serious commercial applications, you should
ensure that you are using the right types.
• C provides a sizeof operator to determine the
byte count of a given primitive type.
• Note that sizeof can be used with types or
variables.
– When used with data types, the type value must be
enclosed in parentheses.
– sizeof (int)
– sizeof foo // assuming foo is a variable
• sizeof can also be used with complex data
types.
– More on this later.
Arrays
• C supports basic arrays with a simple
subscript syntax.
• Arrays may contain primitives, but also the
complex types we will see soon.
• The syntax itself is very similar to what you
have seen already
– int myArray[10];
– float foo[140];
– int bar[n]; // assuming n is an integer variable (C99 variable-length array)
Arrays…cont’d
• We can also have multi-dimensional arrays.
– int foo[10][20];
• It’s also possible to initialize arrays at
compile time
– float array1[4] = { 10.0, 20.0, 100.0, 0.001 };
• Finally, you can use the sizeof operator on
arrays
– sizeof array1
• This gives the number of bytes in array1
– sizeof array1 / sizeof array1[0]
• number of elements in array1 (for a one-dimensional array)
Bounds checking
• Java provides automatic “bounds checking” to
make sure you don’t index outside of the array
– This will generate a run-time error
• C does NOT do this
– The compiler provides NO bounds checking on
arrays
– It will simply jump to the location indicated by your
array subscript(s) and return whatever it finds there
– This will usually be complete junk (the behavior is undefined).
– The most common occurrence of this is the “off by
one” error in a for loop.
• Bottom line: It is your job as a programmer to
ensure that your array indexing is correct.
C structures
• As noted, C has no classes/objects.
• It does, however, provide a composite data
type.
• This is called a struct
• Informally, a struct is like a class definition
without any methods
– In other words, it only has data members.
• structs consist of one or more C data types
– ints, floats, other structs, character strings, etc.
struct syntax
• The basic format is quite simple. An example
would be:
struct foo {
int x;
float y;
};
• This defines a new struct type called foo, consisting of
an int and a float
• IMPORTANT: This does not set aside any space – it just
defines the format of the new type
• struct definitions are typically included in header files
so that other functions/programs can use them
structs…cont’d
• We can now use the struct as follows
struct foo bar;
• This sets aside memory for a variable called bar that is of
type struct foo.
– In C, struct members are accessed with a dot notation, much like
Java objects
– E.g., bar.x
• How much memory?
• Well, we can use the sizeof operator again
– sizeof bar OR sizeof (struct foo)
• However, the result is not always what you expect
• The compiler will often “pad” the structure in order to have
each element of the structure aligned on the CPU’s natural
word boundaries
– E.g., every 4 bytes or 8 bytes
– This makes access much more efficient
Static versus dynamic data
• All of the data types we have looked at so far
have been statically defined.
• In other words, the compiler knows how much
space is required to store an int, an array of
ints, a struct, or even an array of structs.
• For applications to be useful, however, we
must be able to dynamically generate data at
run-time
• This is similar to what we do with new in Java;
– Thing x = new Thing(43);
Memory Allocation
• Before we look at the mechanism used for
this, we must be clear on the forms of
memory allocation in a C program.
• In fact, there are three ways that memory
can be set aside for data in C
1. Static
2. Automatic
3. Dynamic
• It is very important that you understand the
difference.
Static Allocation
• Static allocation occurs when the compiler knows
that a variable will exist for the lifetime of the
program.
• This generally occurs when a variable is declared
outside of any function (typically at the top of the
source file).
• This is a “global” variable that may be seen by any
function in the entire application
– We can add a static specifier to restrict it to the local
source file
• Bottom line: There is only one copy of this variable
so the compiler can set aside memory for this at
compile time
– In other words, space for this is allocated directly
alongside the object code.
Automatic Allocation
• Automatic allocation, as the name implies, occurs
automatically at run-time.
– In practice, this kind of allocation is used for variables that
are defined inside a function.
• These variables only exist while the function is
executing.
• A run-time component, added to your code by the
compiler, will set up memory for these variables
when the function is called.
– When the function finishes, all memory is released.
• We do not set aside memory for these variables at
compile time because:
1. We would have to create memory for ALL variables in the
program, even though only one or two functions might
be running at any one time
2. Recursion would not be possible since each recursive call
needs its own version of the function’s variables
Automatic allocation…cont’d
• Where is this function memory?
• It is allocated as required on the process stack.
– A stack is a last in/first out (LIFO) data structure.
• Each function is given its own stack frame.
• Inside the frame are each of the local variables in
the function
– There is other stuff in the frame as well, including:
• The address in the calling function to return to after the
current function finishes
• The arguments passed to the function
• The location of the stack for a given process is at
a known location in memory
– The stack grows and shrinks as the program runs
Automatic allocation…cont’d
• The figure is a simplified depiction of a process stack
– As noted, there are some other values in each frame (e.g., return
address and function args)
• In this case, there are three frames on the stack.
• Here, function 0 was invoked first and its frame was set up and
populated
– function 1 was called before function 0 finishes
– function 2 was called before function 1 finishes
• Question: what is function 0?
– function 0 is main() since it is at the base of the stack
– It must be first on the stack and the last one off since when it
completes, the program exits
[Figure: a process stack with three frames, numbered 2 (top of stack), 1, and 0 (base of stack), each holding sample local values such as 43, 67.999, 'a', and 9090]
Dynamic allocation
• We now get to dynamic allocation.
• In this case, memory is allocated at run-time
only as you (the programmer) ask for it.
• The memory itself will be taken from a pool of
memory called the heap, that is set aside for
each process.
• Like the stack, the heap will grow as required.
• The low level allocation of the memory will be
done for you
– Your job is to indicate how much memory you need.
The malloc family
• <stdlib.h> contains prototypes for a series of memory
allocation functions.
• Note these functions are not “built-in” to the C
language like Java’s new.
• Instead, allocation is done with standard library functions, just
like printf.
• There are three basic memory allocation functions
1. malloc: allocate some number of bytes of uninitialized
memory
• By far the most common of the three functions
2. calloc: similar to malloc except that it initializes the memory
to 0.
3. realloc: re-allocate a previous memory block to a new size.
• Typically used for re-sizing arrays
• It will keep the existing contents if you re-size to a larger space.
• Note that realloc may create a completely new memory area, copy the
contents of old into new, and then release the old space
• Always continue with the pointer that realloc returns, not the old one
malloc…cont’d
• The syntax of malloc is as follows:
void *malloc(size_t size)
• Note that size_t is just an unsigned integer type.
• So the idea is that you simply pass malloc a count of
the number of bytes to create
• The function returns a pointer to the location in the
heap where the data has been created
– In C, the “*” refers to a pointer value (it is also used for
multiplication)
– A pointer is a reference to a memory location
– Note that this is a void pointer
• In other words, this is a “generic” pointer that could be used to point
to any type of data.
malloc…cont’d
• The pointer is then assigned to a variable.
• Traditionally, we cast the void pointer to the
appropriate type for the variable
– In standard C, a void pointer converts automatically, so the
cast is not strictly required (it is required in C++).
• The following code creates a buffer to hold an
array of 10 integers
int *buffer = (int *)malloc( 10 * sizeof (int) );
• Here, malloc creates a buffer of 40 bytes (assuming 4
byte integers) and assigns this pointer to the variable
buffer.
• We can now access values in the buffer using standard
array syntax
– buffer[2] will access the third integer in the buffer
malloc errors
• What if something goes wrong?
• Specifically, what happens if malloc cannot give
you the memory you want?
• In that case, malloc returns a NULL pointer.
– In practice the NULL pointer is equivalent to address 0.
• While malloc rarely fails (unless you are literally out
of memory), you still SHOULD check for this.
– If you don’t, and there is a problem, your program will almost
certainly crash when you try to access the NULL pointer
• So there should be something equivalent to the
following
int *buffer = (int *)malloc( 10 * sizeof (int) );
if (buffer == NULL){
// error handling, possibly abort
}
Pointers and the heap
• A pointer represents a memory address, often one inside the heap
• A variable stores this address
– Here, the variable foo stores the memory address 1248.
– This could have been generated by a previous call to malloc.
• Note that Java does something similar when new is used
– You just don’t have access to the actual pointer address.
– We just talk about a “reference” to the object instead.
[Figure: the heap, with pointer variable foo holding address 1248 (memory used for foo, e.g., a large array), pointer variable bar holding address 924 (memory used for bar), and unused space within the heap]
Logical versus physical addresses
• Important: The addresses generated by malloc are
logical addresses.
• They represent the relative position of a data
element in the current process space
– From 0 to max_size of process (e.g., 4 GB)
• They do NOT refer to a physical address in
memory/RAM.
• At compile time, the compiler has no way to know
where your process will be loaded into memory
– This is decided by the operating system at runtime
• The OS and CPU’s memory hardware will convert
logical addresses to physical addresses as the
program runs
– This is completely transparent to you as a programmer
and you generally do not have to worry about this.
Putting it all together
• Graphically, the memory regions in a given process can be
displayed as follows
• Automatic allocation:
– The stack is located at the top of the space and grows downward
• Dynamic allocation:
– The heap is in the lower region of the space and grows upward
• Static allocation
– Static data is in a fixed location near the bottom of the space and is
assigned at compile time by the compiler, along with the executable
code
• Note: when you try to access an invalid memory location, the OS
will abort your application and print a segmentation fault error.
[Figure: the logical process space from position 0 to max - 1. The stack sits at the top (max - 1), followed by currently unused space (in practice, by far the largest region), then the heap, with code and static data near the bottom. There is no valid user code or data at position 0.]
Cleaning up memory
• Any dynamically allocated memory should be
returned to the heap when it is no longer needed.
– There is no automatic “garbage collection” in C.
• If you don’t do this, the program produces what
are called “memory leaks”.
– These are heap areas that never get reclaimed
– A few of these are not a problem, but if leaks are
produced each time a function is called, a long running
program will eventually run out of memory and crash.
• Note that memory is always fully reclaimed when a
program ends but it is VERY bad practice to rely on
this.
– It is also very difficult to maintain large complex systems
that have used poor (or NO) memory management
practices.
The free() function
• In addition to malloc(), <stdlib.h> provides the
free() function.
void free (void *ptr);
• The ptr parameter refers to a pointer previously
produced by a malloc, calloc, or realloc call.
• This will return the memory used by the data
element back to the heap, where it will be
available for re-use.
• IMPORTANT: You should NEVER try to free() the
same data more than once (i.e., free something
that has already been free’d)
– This is undefined behavior: the program may abort, or it
may silently corrupt the heap and fail later
The address operator
• Occasionally, we need to obtain the memory
address of an existing data element
• The & operator is used for this purpose.
– Note that pointers (“*”) are already addresses, so we
typically use & with primitive data types.
• The most obvious example of its use is with
functions that need to assign a value directly to
a memory location
• An example would be the scanf function that
writes a keyboard entry into a variable.
int count;
printf("Enter count: ");
scanf("%d", &count);
Character strings
• One of the most challenging elements of C
programming is the manipulation of character
strings.
• Most modern languages have built-in String
types (or classes) that provide powerful and
intuitive string processing.
• C does not.
• In practice, strings are just arrays of
characters.
• This may sound fine, but doing anything
sophisticated with raw character arrays is both
tedious and error prone.
String basics
• A group of string functions is provided by the C
libraries.
• The header file is <string.h> and contains 20+
functions for manipulating strings of chars,
including functions for:
– Determining the length of a string
– Concatenating two strings
– Creating a duplicate of a string
– Comparing strings for equality (does “foo” = “fOo” ?)
– Finding substrings
– Breaking a string into individual tokens
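• Many of the functions have an unlimited and a
limited version
– E.g., strcpy versus strncpy (copy at most n chars)
– If possible, it is safer to use the limited version.
– Note that strncpy does not add the terminating char if the
source has n or more chars, so you must add it yourself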
Strings…cont’d
• So all of the basic functionality is there.
• But string manipulation is very “low level”.
• Strings are char arrays, typically accessed with
a char *
– char *my_string = (char *) malloc(10);
• Strings in C end with a terminating character
– In practice, this is the NUL character ‘\0’ (a char with value 0, not the NULL pointer)
– Printing or processing strings without this
terminating char will likely cause your program to
fail
– The terminating char is NOT part of the string length
– A trailing newline char DOES count as part of the
length (“abc\n\0” has length = 4)
Copying and concatenating
• When you append something to an existing
string, you MUST ensure that the new string
has space available.
• C treats strings as char arrays, so it will
happily write past the end of the current
array and destroy what is currently there.
• Note: simply declaring a char * does not
create space for the char array
– it just creates a pointer variable.
– This variable does NOT point to anything yet.
Strings…cont’d
• For example, let’s say that you want to concatenate two
strings, A and B.
• There is a string concat function (strcat) but it simply copies string B
to the end of string A, without checking that A has room. This is probably not what you want
• The logic for a “proper” operation might be as follows:
lenA = length of A
lenB = length of B
Create tmp string C of length lenA + lenB + 1
// the “+ 1” is for the terminating char
Copy A to tmp
Concat tmp and B into new longer tmp
// ‘\0’ terminator automatically added in last pos
Free memory of A
Copy tmp pointer to A pointer
// A is now the combination of A and B
What can go wrong?
• In the previous example, let’s say that you forgot to
take the terminating char into account
• This would make the new array 1 char too short
• When the concat function is called, the \0 would be
written AFTER the last valid position of the char array
– Recall that there is no bounds checking with arrays
• If you process this string immediately, the code will
appear to work
– Functions will keep reading the char array until they hit the
\0
• However, this char position may belong to the memory
of another variable
– When you eventually update this variable, the new value will
over-write the \0 char
– Future string processing will no longer have a terminating
char and the program will likely crash at some point
– These are AWFUL bugs to fix
How to deal with this
• It is generally a good idea to create “wrapper”
functions or borrow third party string functions
that do this work for you.
– In other words, if you often use string concatenation
in your program, write a high-level function that
does the length determination and memory
management for you.
– You can test and debug this to make sure that it will
always work perfectly
• Secondly, you should use memory profiling
tools that identify these types of errors
– More on this shortly.
Final string issues
• A “pre-defined” string can be declared as
follows:
– char *foo = "My dog house";
– No malloc or free is needed here
• We can also have this:
– char foo[4] = {'a', 'b', 'c', '\0'};
• An empty string is perfectly valid (e.g., foo = "")
– This is a “real” string of length 0
• An uninitialized string pointer is NOT an empty
string
– It is just a pointer variable that does not yet point to a valid string
Function parameters
• When functions are invoked, we often pass
arguments to the functions
– Parameter is the term for the name/type of the
component in the function definition
– Argument is the term for the actual value that is
passed to the function
• In general, we often talk about two main
techniques for passing arguments to a
function:
1. Pass by value
2. Pass by reference
Passing arguments in practice
• Pass by value: a copy of the argument is
passed to the function
– e.g., this is what happens with primitives like ints
and floats
– Changing this value in the function has no impact on
the original value in the calling program
• It only changes the values within the stack frame
• Virtually all languages provide a pass by value
mechanism
• Pass by reference: the address of the argument
is extracted and passed to the function
– Here, any changes made to the argument DO change
the value of the argument in the calling function
How C does it
• In C, we often pass a pointer to an array (or string)
– E.g., void my_func( char *parm_string);
• Here, we can directly modify the contents of
parm_string since we are passed a pointer to the char
array.
• This looks like pass-by-reference.
• However, strictly speaking, it is still pass by value since
we have actually passed a copy of the pointer
– In true pass-by-reference (available in C++), when we pass a
variable, the function actually gets a direct reference to that
variable.
– Note that Java is actually pass by value, just like C, since we
cannot change the reference to the Object, even though we
can change the object’s contents.
• NOTE: the bottom line, however, is that when you pass
pointers to a function, you have direct access to the
data referenced by the pointer, NOT a copy of the data.
Summary
• In this section, we’ve looked at many of the
basic data types in C
• We’ve also looked at some of the issues
that are exposed by C, but might not be
relevant to Java.
• Take a look at the source file below and
make sure you know what is happening in
each case
Source file: cData.c