
Data Structures and Algorithms

Compiled by:

ELIAS A. AUSTRIA
LYDINAR D. DASTAS
MICHAEL B. DELA FUENTE
RIA A. SAGUM

Revised Edition (April 2021)


Data Structures and Algorithms

Overview:
Data structures are an essential area of study for computer scientists and for anyone who will ever
undertake a serious programming task. This course deals with the fundamentals of organizing
and manipulating data efficiently using clean conceptual models. In particular, the emphasis is
on the organization of information, the implementation of common data structures, and techniques
of data abstraction. Implementations in this course may be carried out in any programming
language, and the principles are generally applicable to most modern programming environments.

Data structures are containers that hold objects of various data types. There are several common
data structures, each with its own behavior and applications: arrays of one or more dimensions,
stacks, linked lists (singly and doubly linked), queues, trees (balanced, binary, and so on),
graphs, and hash tables. Understanding data structures helps the student understand how they
behave and when to use each of them.

Course Information:
Course Code: COMP 20063
Course Title: Data Structures and Algorithms
No. of Units: 3
No. of Hours: 3

Course Outcomes:
After successful completion of this instructional material, you should be able to:

• Demonstrate an understanding of basic data structures (such as an array-based list,
linked list, stack, queue, binary search tree) and algorithms

• Demonstrate the ability to analyze, design, apply, and use data structures and
algorithms to solve engineering problems and evaluate their solutions

• Demonstrate an understanding of analysis of algorithms through the idea of trade-offs
and measurement of effectiveness

Course Contents:

Unit 1 – Overview of Data Structures. In this course we will study data structures and learn
how to write efficient programs. Efficiency here has less to do with programming tricks
than with good organization of information and good algorithms that save computer
memory and running time.

Unit 2 – Algorithm Analysis. Algorithm analysis, or analysis of algorithms, is the theoretical
study of a computer program's performance and resource usage. In this unit, we present
the idea of algorithms and how performance and resource usage affect the effectiveness
of an algorithm.

Unit 3 – Arrays. Most programming languages have, in one way or another, their own
implementation of an array. In this unit we present the array, which most data structures
use to implement their algorithms.

Unit 4 – Lists. Lists provide a convenient way of keeping track of things in our everyday life:
shopping lists, laundry lists, lists of things to be done, and so on. Similarly, lists provide a
convenient way of representing arbitrary nonnumeric information in a computer. The
remainder of this unit deals with the list as a formal data structure and explains how it
is represented and manipulated in a computer.

Unit 5 – Stacks. Stacks are dynamic data structures that follow the Last In, First Out (LIFO)
principle: the last item to be inserted into a stack is the first one to be deleted from it. In
this unit we will explore stacks and their applications and learn how to implement them.

Unit 6 – Queues. Queues are data structures that follow the First In, First Out (FIFO) principle,
wherein the first element added to the queue is the first one to be removed. In
this unit we will explore queues and their applications and learn how to implement them.

Unit 7 – Trees. We have already considered two data structures, the stack and the queue. Both
are linear in nature and are represented in memory either by a sequentially allocated
vector (array) or by a linked linear list. In this unit we focus on two important non-linear
structures: the tree and the binary tree.

Unit 8 – Balanced Trees. The search time in a binary search tree depends on the form of the
tree, that is, on the order in which its nodes were inserted. In practice, however, one
cannot always guarantee that binary search trees are built randomly, and binary search
trees are only interesting if they are "relatively complete." We must therefore look for
specializations of binary search trees whose worst-case performance on the basic tree
operations can be guaranteed to be good, that is, O(log2 n) time. In this unit we will discuss
balanced trees as a solution to the problem of search time. In particular, we will look
into the balance properties of binary search trees and m-way search trees.

Unit 9 – Graphs. In computer science, a graph is an abstract data type that is meant to
implement the undirected graph and directed graph concepts from the field of graph
theory within mathematics. In this unit we will focus on the basic concepts of graphs, its
representations and searching in graphs.

Unit 10 – Hashing and Collision. While data structures deal with how data is represented in
memory (primary storage), file structures deal more with how data is represented on
secondary storage, with emphasis on file access. In this unit we will discuss the different
types of file structures, the different techniques used in file access, and how to resolve
some of the issues related to file access.

References:

Goodrich, Michael T., Tamassia, Roberto, Goldwasser, Michael H. (2014). Data Structures
and Algorithms in Java, 6th Edition. Wiley.

Sedgewick, Robert, Wayne, Kevin (2011). Algorithms, 4th Edition.

Weiss, Mark Allen (2014). Data Structures and Algorithm Analysis in C++, 4th Edition.

Grading System:

Class Standing 70%

• Assignments
• Unit Activities
• Programming Problems

Examination 30%

TOTAL 100%

Final Grade = 50% of Midterm Grade + 50% of Final Term Grade


Table of Contents
Data Structures and Algorithms .................................................................................... i
Overview:............................................................................................................................................ ii
Course Information: .......................................................................................................................... ii
Course Code: COMP 20063 ........................................................................................................... ii
Course Title: Data Structures and Algorithms ............................................................................. ii
No. of Units: 3 ................................................................................................................................. ii
No. of Hours: 3 ................................................................................................................................. ii
Course Outcomes:............................................................................................................................. ii
Unit 1 – Overview of Data Structures ........................................................................... 1
Overview:............................................................................................................................................ 1
Module Objectives: ............................................................................................................................ 1
Course Materials:............................................................................................................................... 1
Unit 2 – Algorithm Analysis .......................................................................................... 8
Overview:............................................................................................................................................ 8
Module Objectives: ............................................................................................................................ 8
Course Materials:............................................................................................................................... 8
Activities/Assessments: ................................................................................................................. 19
Unit 3 – Arrays .............................................................................................................. 20
Overview:.......................................................................................................................................... 20
Module Objectives: .......................................................................................................................... 20
Course Materials:............................................................................................................................. 20
Activities/Assessments: ................................................................................................................. 28
Unit 4 – Linked Lists .................................................................................................... 30
Overview:.......................................................................................................................................... 30
Module Objectives: .......................................................................................................................... 30
Course Materials:............................................................................................................................. 30
Activities/Assessments: ................................................................................................................. 36
Unit 5 – Stacks .............................................................................................................. 40
Overview:.......................................................................................................................................... 40
Module Objectives: .......................................................................................................................... 40
Course Materials:............................................................................................................................. 40
Activities/Assessments: ................................................................................................................. 51
Unit 6 - Queues ............................................................................................................. 53
Overview:.......................................................................................................................................... 53
Module Objectives: .......................................................................................................................... 53
Course Materials:............................................................................................................................. 53
Activities/Assessments: ................................................................................................................. 61
Unit 7 - Trees ................................................................................................................ 62
Overview:.......................................................................................................................................... 62
Module Objectives: .......................................................................................................................... 62
Course Materials:............................................................................................................................. 62
Activities/Assessments: ................................................................................................................. 83
Unit 8 – Balanced Trees ............................................................................................... 86
Overview:.......................................................................................................................................... 86
Module Objectives: .......................................................................................................................... 86
Course Materials:............................................................................................................................. 86
Activities/Assessments: ............................................................................................................... 102

Unit 9 - Graphs ........................................................................................................... 103
Overview:........................................................................................................................................ 103
Module Objectives: ........................................................................................................................ 103
Course Materials:........................................................................................................................... 103
Activities/Assessments: ............................................................................................................... 109
Unit 10 – Hashing and Collision ............................................................................... 111
Overview:........................................................................................................................................ 111
Module Objectives: ........................................................................................................................ 111
Course Materials:........................................................................................................................... 111
Activities/Assessments: ............................................................................................................... 128

Unit 1 – Overview of Data Structures

Overview:
In this course we will study data structures and learn how to write efficient programs. Efficiency
here has less to do with programming tricks than with good organization of information and good
algorithms that save computer memory and running time.

Module Objectives:
After successful completion of this module, you should be able to:

• Define and differentiate data structures, data type and abstract data type
• Discuss how one will select a data structure
• Describe the different data structures
• Knowledgeably converse about the concepts of abstract data types

Course Materials:

Learning about computers does not stop at learning how to write programs that run on them. It
is also important to know how data is represented in memory (data structure) and how it is
represented on disk (file structure). A data structure is a way of collecting and organizing data
in such a way that we can perform operations on the data in an effective and, where possible,
efficient way.

Writing efficient programs requires efficient data structures. By efficient we mean that a problem
has to be solved within given time and space constraints, and each problem puts its own
constraints on time and space. For example, in banking, opening an account may take a few
minutes; a transaction, say withdrawing or depositing cash, takes a few seconds; but
closing an account normally takes overnight. A solution is efficient if it solves the problem
within its space and time constraints. The cost of a solution equates to the amount of resources
it consumes.

In order to solve problems efficiently one can do the following:

1. Analyze the problem to determine the resource constraints a solution must meet.
2. Determine the operations that must be supported (e.g., record search, insertion, deletion,
etc.).
3. Quantify the constraints for each operation (e.g., search operations must be very fast).
4. Select the data structure that best meets these requirements.

Cost and Benefits

Each data structure requires space for each data item it stores, time to perform each
operation, and some programming effort to implement. Every data structure has costs and
benefits, and rarely is one data structure better than another in all situations. One may permit
faster search (or insertion, or deletion) operations than another, but that does not make it
better overall. Note also whether all operations are of equal importance.

Data Structures, Data Type and Abstract Data Type

Earlier we defined data structure as a way of collecting and organizing data in such a way
that we can perform operations on these data in an effective and sometimes efficient way. This
notion of data structures involves:

• a set of data elements;
• the way the data elements are logically related;
• and a set of allowable operations on the data elements.

From this notion, data structures are structures programmed to store ordered data so that
various operations can be performed on it easily. A data structure represents the knowledge of
data to be organized in memory, both primary and secondary, and should be designed and
implemented in such a way that it reduces complexity and increases efficiency.

Figure 1 Classification of Data Structures

Data type is another term often used synonymously with data structure, but it is in itself
different from a data structure. In programming, a data type is a set of values from which a
variable, constant, function, or other expression may take its value. For example, integer takes
on whole numbers, real represents fixed-point and floating-point values, character takes on
alphanumeric values, Boolean represents a true or false value, while date/time represents a range
of values depicting date and time. A data type can be thought of as a set of the "same kind" of
data processed by the computer. Data has to be examined carefully so that the most appropriate
data type can be used.

Basically, anything that can store data can be called a data structure; hence Integer,
Float, Boolean, Char, etc. are all data structures, known as Primitive Data Structures
or Basic Types. Figure 1 and Table 1 represent two ways of looking at data structures. Though
differently labeled, with varying categories or classifications, you can see that they show almost
the same thing.

Table 1 Categories of Data Structures

• Primitive (Simple) Data Structures: Integer, Char, Real, Boolean
• Compound (Complex) Data Structures:
  - Linear: Array, String, Record, List, Stack, Queue
  - Non-Linear:
    - Binary: Binary Tree, Binary Search Tree, AVL Tree, Heap
    - N-ary: General Tree, M-way Search Tree, B-Tree
    - Graph
• File Organization: Sequential, Relative, Indexed Sequential, Multi-Key

Basic Data Type or Primitive Data Structures

Basic Data Types or Primitive Data Structures represent individual data items and are
frequently used to create a program. They are sometimes called atomic data structures because
they represent a form in which data can no longer be divided and has no parts. This group can
be further divided into Simple Types and Pointer Types.

Simple Type

Simple type is the most basic data type which is usually declared according to the
syntax rule of a programming language. Most common data types fall under this group.

• Integer type – represents integers or whole numbers, where the maximum or
minimum value is the unit of data that a computer can process at one time and is
determined by the length of one word.

• Real number type – represents fixed-point and floating-point numbers.

• Character type – comprises alphabets, numerals, and symbols as characters. A
character code is expressed as a binary number inside a computer.

• Logical type – sometimes referred to as Boolean type, where the values are used
in performing logical operations such as AND, OR, and NOT.

• Enumeration type – a data type that enumerates all possible values of variables.

• Partial type – used to specify a subset of an existing data type's values by
constraining it, that is, identifying the upper and lower limits of a variable.
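As a sketch of how these simple types might look in a concrete language, here they are in Python (the variable names are illustrative; Python has no separate character type, so a length-1 string stands in for it, and the `Enum` class plays the role of an enumeration type):

```python
from enum import Enum

count = 42        # integer type: whole numbers
price = 3.75      # real number type: floating-point value
grade = 'A'       # character type: Python uses a length-1 string
is_valid = True   # logical (Boolean) type

class Day(Enum):  # enumeration type: all possible values listed explicitly
    MON = 1
    TUE = 2
    WED = 3

# Logical values participate in AND, OR, and NOT operations:
print(is_valid and count > 0)
print(Day.MON.name)
```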

Pointer Type

Pointer types hold addresses allocated in a main memory unit. They are used to refer
to variables, file records, or functions.

Figure 2 Pointer type: a pointer-type variable holds the address of variable "b", which in turn holds the data.

Structure Type or Simple Data Structure

A Structure Type or Simple Data Structure is a data structure that contains a basic data type
or any of the defined data types as its elements. Arrays, strings, and records are examples of
structure types; they represent collections of data elements. An array type, or array, is simply a
finite set of elements of the same type referenced under a common name. If this type is a
character type, we refer to the array as a string, or a collection of character elements. A record
type, on the other hand, is also a set of elements, but this time of different data types referenced
under a common name. This is useful if we would like to organize a collection of data but do not
want to manage the items separately (individually).
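The three structure types above can be sketched in Python as follows (a `list` plays the role of an array, and a `dataclass` plays the role of a record; the `Student` fields are illustrative, not part of any standard):

```python
from dataclasses import dataclass

# Array type: a finite set of elements of the same type under a common name.
scores = [86, 92, 78]

# String: an array whose elements are characters.
name = "Ada"

# Record type: elements of *different* data types under a common name,
# so related data can be handled together rather than individually.
@dataclass
class Student:
    name: str
    year: int
    gpa: float

s = Student(name="Ada", year=2, gpa=1.25)
print(s.name, s.year, s.gpa)
```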

Abstract Data Type

An Abstract Data Type is part of the Basic Data Structures but represents those under Problem-
oriented Data Structures. Abstract Data Type, or ADT, is almost always synonymous with data
structure, but it represents a logical description (an abstraction) rather than an actual
implementation. An ADT is basically a mathematical model where a set of data values and its
associated operations are precisely specified, independent of any particular implementation.

Figure 3 Abstract Data Type representation

An ADT is a kind of data abstraction where a type's internal form is hidden (information hiding)
behind a set of access functions. By data abstraction, we mean any implementation of data in
which the implementation details are hidden (abstracted) from the user. Hiding data at the level
of the data type is often called data encapsulation.

Think of an ADT as a black box where inputs and outputs are known but how things are
programmed is not (ADTs hide implementation details). This idea keeps internal calculations
private, providing better modularity and allowing for localized changes and better division of
labor. A data structure is the implementation of an ADT, where the operations associated with
the ADT are implemented by one or more functions.
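The black-box idea can be sketched in Python. The class below exposes only its access functions (`push`, `pop`, `is_empty`); callers never touch the underlying list directly, so the internal form could later be swapped (say, for a linked list) without changing any program that uses it. The class and method names here are illustrative, not a prescribed interface:

```python
class Stack:
    """A stack ADT: only the operations below are the public interface."""

    def __init__(self):
        self._items = []  # hidden internal form (information hiding)

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return len(self._items) == 0

s = Stack()
s.push(1)
s.push(2)
print(s.pop())  # prints 2: the access functions are the only way in or out
```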

Logical and Physical Forms

Every data item has both a logical and a physical form.

1. Logical form: the definition of the data item within an ADT (e.g., integers in the
mathematical sense, with operations +, -, *, /)
2. Physical form: the implementation of the data item (e.g., 16- or 32-bit integers)

Figure 4 Logical and Physical Form of a Data Type: a data type comprises an ADT (type plus
operations, the logical form of the data items) and a data structure (storage space plus
functions, the physical form of the data items).

Defining an ADT depends on the application one is making, and there can be different
definitions for the same application. An ADT hides implementation details, allowing different
implementations of the same ADT. When the ADT is given, its data type can be used by the
programmer (e.g., the string and math libraries in C), and when the implementation changes,
the programs using it need not change.

Classification of Data Structures based on Characteristics

The different data structures can also be classified on the basis of their characteristics.
The following table presents the different characteristics and how they are described.

Table 2 Classification of Data Structures based on Characteristics

Characteristic Description
Linear In linear data structures, the data items are arranged in a linear
sequence. Example: Array
Non-Linear For non-linear data structures, the data items are not in sequence.
Example: Tree, Graph
Homogeneous Homogeneous data structures represent a structure whose elements
are of the same type. Example: Array
Non-Homogeneous In Non-homogeneous data structure, the elements may or may not be
of the same type. Example: Structures
Static Static data structures are those whose sizes, structures, and associated
memory locations are fixed at compile time. Example: Array
Dynamic Dynamic data structures are those which expand or shrink as needed
during program execution, and whose associated memory locations
change. Example: Linked list created using pointers.

Types of data structures

Data structure types are determined by what types of operations are required or what
kinds of algorithms are going to be applied. These types include:

• Arrays - An array stores a collection of items at adjoining memory locations. Items that
are the same type get stored together so that the position of each element can be
calculated or retrieved easily. Arrays can be fixed or flexible in length.
• Stacks - A stack stores a collection of items in the linear order that operations are applied.
This order is last in, first out (LIFO): the most recently added item is removed first.
• Queues - A queue stores a collection of items similar to a stack; however, the operation
order can only be first in first out.
• Linked lists - A linked list stores a collection of items in a linear order. Each element, or
node, in a linked list contains a data item as well as a reference, or link, to the next item
in the list.
• Trees - A tree stores a collection of items in an abstract, hierarchical way. Each node is
linked to other nodes and can have multiple sub-values, also known as children.
• Graphs - A graph stores a collection of items in a non-linear fashion. Graphs are made
up of a finite set of nodes, also known as vertices, and lines that connect them, also known
as edges. These are useful for representing real-life systems such as computer networks.
• Tries - A trie, or keyword tree, is a data structure that stores strings as data items that can
be organized in a visual graph.
• Hash tables - A hash table, or hash map, stores a collection of items in an associative
array that maps keys to values. A hash table uses a hash function to convert a key into
an index into an array of buckets that contain the desired data item.

These are considered complex data structures as they can store large amounts of
interconnected data. Examples of primitive, or basic, data structures are integers, floats, Booleans
and characters.
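Some of the types above can be contrasted in a few lines of Python. This sketch uses `collections.deque` as one possible backing container (many languages offer a similar one) and a minimal hand-rolled linked-list node:

```python
from collections import deque

# LIFO: the last item appended is the first one popped.
stack = deque()
stack.append('a'); stack.append('b'); stack.append('c')
print(stack.pop())      # prints 'c'

# FIFO: the first item appended is the first one removed.
queue = deque()
queue.append('a'); queue.append('b'); queue.append('c')
print(queue.popleft())  # prints 'a'

# Linked list: each node holds a data item plus a link to the next item.
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

head = Node(1, Node(2, Node(3)))
print(head.next.data)   # prints 2
```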

Uses of data structures

In general, data structures are used to implement the physical forms of abstract data types.
This can be translated into a variety of applications, such as displaying a relational database as
a binary tree.

In programming languages, data structures are used to organize code and information in
a digital space. For example, Python lists and dictionaries or JavaScript arrays and objects are
common coding structures used for storing and retrieving information. Data structures are also
a crucial part of designing efficient software.

Importance of data structures

Data structures are essential for managing large amounts of data, such as information
kept in databases or indexing services, efficiently. Proper maintenance of data systems requires
the identification of memory allocation, data interrelationships and data processes, all of which
data structures help with.

Additionally, it is important not only to use data structures but to choose the proper data
structure for each task. Choosing an ill-suited data structure can result in slow runtimes or
unresponsive code. A few factors to consider when picking a data structure include what kind of
information will be stored, where existing data should be placed, how data should be sorted,
and how much memory should be reserved for the data.

Watch:

• What are Data Structures? By CS Dojo (https://youtu.be/bum_19loj9A)
• Data Structures By CrashCourse (https://youtu.be/DuDz6B4cqVc)
• Do You Need To Learn Data Structures and Algorithms? By Andy Sterkowitz
(https://youtu.be/2mOgMkSFLiA)

Review:

1. What is a data structure? How is it different from an abstract data type?
2. What is abstraction?
3. Name two things you can use data structures for.
4. Is our mental picture of a data structure always the same as the actual way the data
structure is represented in the computer?
5. When we consider a data structure from the abstract viewpoint, what are we mainly
interested in?

Unit 2 – Algorithm Analysis

Overview:
Algorithm analysis, or analysis of algorithms, is the theoretical study of a computer program's
performance and resource usage. In this unit, we present the idea of algorithms and how
performance and resource usage affect the effectiveness of an algorithm.

Module Objectives:
At the end of this module, you should be able to:

• Discuss the concept of algorithms
• Discuss how to determine space complexity
• Discuss how to determine time complexity
• Know precise ways of analyzing whether a data structure or algorithm is good

Course Materials:

An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain
predefined task. An algorithm is not the complete code or program; it is just the core logic
(solution) of a problem, which can be expressed either as an informal high-level description,
as pseudocode, or using a flowchart.

Every algorithm must satisfy the following criteria:

1. Input – An algorithm has zero or more input quantities which are externally supplied,
taken from a set of objects called the domain of the algorithm.

2. Output – It has one or more output quantities which generate a set called the range of
the algorithm.

3. Definiteness – Each instruction must be clear and unambiguous, meaning each step of
an algorithm must be precisely defined.

4. Finiteness – If we trace out the instructions of an algorithm, then for all cases the
algorithm will terminate after a finite number of steps.

5. Correctness or Effectiveness – Every instruction must be sufficiently basic that it can,
in principle, be carried out by a person by manual means, and must generate a correct
output.

To give a concrete example on the concepts of algorithms, consider Euclid’s algorithm for
finding the greatest common divisor of two positive integers m and n. Euclid’s algorithm is:

Step 1: Divide m by n and let r be the remainder.

Step 2: If r is zero, stop; the answer is n.

Step 3: Set m ← n, n ← r; go back to Step 1.

Let us try to gain more insight into it by simply trying it out. So, let m = 608, n = 133. Then:

Loop 1: Step 1. 608 / 133 = 4 remainder 76, so r ← 76
        Step 2. r ≠ 0, so we continue
        Step 3. m ← 133; n ← 76

Loop 2: Step 1. 133 / 76 = 1 remainder 57, so r ← 57
        Step 2. r ≠ 0, so we continue
        Step 3. m ← 76; n ← 57

Loop 3: Step 1. 76 / 57 = 1 remainder 19, so r ← 19
        Step 2. r ≠ 0, so we continue
        Step 3. m ← 57; n ← 19

Loop 4: Step 1. 57 / 19 = 3 remainder 0, so r ← 0
        Step 2. r is equal to 0, stop; gcd = 19
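The three steps of Euclid's algorithm translate directly into a short Python function (a sketch; the `%` operator computes the remainder of Step 1):

```python
def gcd(m, n):
    """Euclid's algorithm for the greatest common divisor of positive integers."""
    while True:
        r = m % n       # Step 1: divide m by n, let r be the remainder
        if r == 0:      # Step 2: if r is zero, stop; the answer is n
            return n
        m, n = n, r     # Step 3: set m <- n, n <- r; go back to Step 1

print(gcd(608, 133))    # prints 19, matching the trace above
```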

Let us now put to test Euclid’s method in terms of the five criteria of an algorithm.

1. Input. The input consists of m and n, which are taken from the set of positive integers.

2. Output. The output is n, which contains the greatest common divisor upon termination of
the algorithm.

3. Definiteness. The stipulation that m and n be positive integers precisely defines step 1,
since there is no ambiguity as to what the remainder is upon dividing a positive integer by
another. If m and n were arbitrary real numbers, then step 1 might not satisfy the criterion
of definiteness.

4. Finiteness. Will the algorithm terminate for any input pair of positive integers m and n?
The answer is yes, it will. In step 1, r is always less than n. This is the value assigned to
n in step 3, unless r is already zero, in which case the algorithm terminates in step 2.

Therefore, each time step 1 is subsequently executed, the value of n is smaller than the
previous. A decreasing sequence of positive integers eventually terminates (at worst, at
n=1), so the algorithm must also terminate.

5. Correctness or Effectiveness. We have been able to carry out the steps of the algorithm
in working out the example. If we are patient enough, we should be able to do the same
for any pair (m, n), even if it takes a thousand loops.

Effectiveness of Programs

Effectiveness is defined as being able to perform an action or produce a result (output) based
on specifications, or what is meant to be achieved. For a program (or an algorithm, for that matter)
to be considered effective, it must satisfy two criteria:

1. Does it run correctly? It produces no errors and runs in accordance with the specifications
given.
2. Does it run efficiently? The program runs within the shortest time possible.

Analysis of Algorithms

The effectiveness of an algorithm can be seen through its performance. Program
performance refers to a characteristic of the program as a whole, while the term efficiency
refers to a characteristic of a part of the program.

An algorithm is said to be efficient and fast, if it takes less time to execute and consumes
less memory space. Algorithms that are equally correct can vary in their utilization of
computational resources. The performance of an algorithm or algorithm efficiency is usually
measured on the basis of the following properties:

1. Space complexity. Space complexity refers to the amount of memory space required
by the algorithm, during the course of its execution. Space complexity must be taken
seriously for multi-user systems and in situations where limited memory is available.

An algorithm generally requires space for following components:

a. Instruction Space: The space required to store the executable version of the
program. This space is fixed for a given program, but varies depending upon the
number of lines of code in the program.

b. Data Space: The space required to store the values of all the constants and
variables (including temporary variables).

c. Environment Space: Refers to the space required to store the environment
information needed to resume a suspended function. An algorithm (function)
may be called inside another algorithm (function). In such a situation, the
current variables are pushed onto the system stack, where they wait for further
execution, and then the call to the inner algorithm (function) is made.

For example, if a function A() calls function B() inside it, then all the variables
of the function A() will get stored on the system stack temporarily, while the
function B() is called and executed inside the function A().

When memory was expensive, we focused on making programs as space efficient as
possible and developed schemes to make memory appear larger than it really was
(virtual memory and memory paging schemes).

Space complexity is still important in the field of embedded computing especially for
handheld computer-based equipment like cell phones, palm devices, etc.

2. Time complexity. Time complexity is a way to represent the amount of time required
by the program to run till its completion. It's generally good practice to keep the
required time to a minimum, so that our algorithm completes its execution in the
minimum time possible.

Note however that time complexity is affected by the complexity of instruction which
depends on the following:
a. machine execution time
b. time to execute the instruction
c. the instruction set
d. translation (the use of compilers)

All of which pertain to the physical hardware that runs the program and the software
tools that are used to translate the program into machine-readable form.

Space Complexity of Algorithms

When a solution to a problem is written some memory is required to complete. For any
algorithm memory may be used for the following:

1. Variables (this includes the constant values and temporary values)
2. Program instructions
3. Execution

As defined earlier, space complexity is the amount of memory used by the algorithm
(including input values to the algorithm) to execute and produce a result. Sometimes Auxiliary
Space is confused with Space Complexity, but Auxiliary Space is the extra or temporary space
used by the algorithm during its execution. Thus, we can say:

Space Complexity = Auxiliary Space + Input space

As noted earlier, while executing, an algorithm uses or requires memory space for three
reasons: 1) instruction space; 2) environment space or environmental stack; and 3) data space.
But in calculating the space complexity of any algorithm, we usually consider only Data Space
and neglect Instruction Space and the Environmental Stack, the latter two being affected by the
type of hardware used to run the algorithm.

For calculating the space complexity, we need to know the amount of memory used by
the different data types, which generally varies across compilers and operating systems, but the
method for calculating the space complexity remains the same.

Table 3 Size in bytes of Different Data Types

Type Size
bool, char, unsigned char, signed char, __int8 1 byte
__int16, short, unsigned short, wchar_t, __wchar_t 2 bytes
float, __int32, int, unsigned int, long, unsigned long 4 bytes
double, __int64, long double, long long 8 bytes

Now let's learn how to compute space complexity by taking a few examples:

int sum(int a, int b, int c)
{
    int z = a + b + c;
    return(z);
}

In the above code, variables a, b, c and z are all integers, hence they will take up 4
bytes each, so the total memory requirement will be (4(4) + 4) = 20 bytes, where the additional
4 bytes are for the return value. Because this space requirement is fixed regardless of input, it
is called Constant Space Complexity.

Let's have another example, this time a bit complex one,

// n is the length of array a[]
int sum(int a[], int n)
{
    int x = 0;                 // 4 bytes for x
    for(int i = 0; i < n; i++) // 4 bytes for i
    {
        x = x + a[i];
    }
    return(x);
}

In the above code, 4*n bytes of space is required for the array a[] elements. 4 bytes each
for x, n, i and the return value.

Hence the total memory requirement will be (4n + 16), which increases linearly with the
input value n, hence it is called Linear Space Complexity. Similarly, we can have quadratic and
other more complex space complexities as the complexity of an algorithm increases. But we
should always write algorithm code in such a way that we keep the space complexity to a
minimum.
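As a hypothetical illustration of the quadratic case (this example is ours, not from the text), a function that builds an n × n table needs 4·n·n bytes for the table alone, assuming 4-byte ints:

```c
#include <stdlib.h>

/* Hypothetical example: builds an n x n multiplication table.
   The table alone needs 4 * n * n bytes (4-byte ints), so the
   space requirement grows with the square of n: quadratic
   space complexity. */
int *times_table(int n)
{
    int *t = malloc(sizeof(int) * n * n);   /* 4n^2 bytes of data space */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            t[i * n + j] = (i + 1) * (j + 1);
    return t;   /* the caller is responsible for free() */
}
```

Doubling n here quadruples the memory used, unlike the linear example above where doubling n only doubles it.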

Time Complexity of Algorithms

In order to analyze an algorithm, we need to determine its efficiency, which
measures the amount of resources consumed in solving a problem of size n, in terms of time
and space. One way of doing this is benchmarking: implementing the algorithm, running it with
some specific input, and measuring the time taken. But this method is better for comparing the
performance of processors rather than the performance of algorithms.

The best option is to perform asymptotic analysis, where we evaluate the performance of
an algorithm in terms of the input size (problem size) in relation to the processing time (or space)
required to solve the problem. Here we examine three cases:

1. Best Case – if the algorithm is executed, the fewest number of instructions are
executed.
2. Average Case – executing the algorithm produces path lengths that will on average
be the same.
3. Worst Case – executing the algorithm produces path lengths that are always a
maximum.

Of the three cases, the only useful case from the standpoint of program design is that of
the worst case. Worst case helps us answer the software life cycle question of: If it is good enough
today, will it be good enough tomorrow?

For any defined problem, there can be N number of solutions. This is true in general. If I
have a problem and discuss it with all of my friends, they will all suggest different solutions,
and I am the one who has to decide which solution is the best based on the circumstances.

Similarly, for any problem which must be solved using a program, there can be infinite
number of solutions. Let's take a simple example to understand this. Below we have two different
algorithms to find square of a number (for some time, forget that square of any number n is n*n):

One solution to this problem can be running a loop n times, adding n to an accumulator
each time.

/*
we have to calculate the square of n
*/
square = 0
for i=1 to n
do square = square + n
// when the loop ends square will hold the square of n
return square

Or, we can simply use a mathematical operator * (multiplication) to find the square.

/*
we have to calculate the square of n
*/
return n*n

In the above two simple algorithms, we saw how a single problem can have many
solutions. While the first solution required a loop which executes n times, the second solution
used the mathematical operator * to return the result in one line. So which one is the better
approach? Of course, the second one.

Time complexity of an algorithm signifies the total time required by the program to run till
its completion. The time complexity of algorithms is most commonly expressed using the big O
notation which is an asymptotic notation to represent the time complexity.

Time Complexity is most commonly estimated by counting the number of elementary steps
(we call this frequency count) performed by any algorithm to finish execution. Like in the example
above, for the first code the loop will run n number of times, so the time complexity will be n at
least and as the value of n will increase the time taken will also increase. While for the second
code, time complexity is constant, because it will never be dependent on the value of n, it will
always give the result in 1 step. Since an algorithm's performance may vary with different
types of input data, we usually use the worst-case time complexity of an algorithm, because
that is the maximum time taken for any input size.

Let us now turn to the next big topic related to time complexity, which is how to
calculate time complexity. It can be confusing at first, but we will try to explain it in the
simplest way.

Now the most common metric for calculating time complexity is Big O notation. This
removes all constant factors so that the running time can be estimated in relation to N, as N
approaches infinity. In general you can think of it like this:

statement;

Above we have a single statement. Its Time Complexity will be constant. The running time
of the statement will not change in relation to N.

for(i=0; i < N; i++)
{
    statement;
}

The time complexity for the above algorithm will be Linear. The running time of the loop is
directly proportional to N. When N doubles, so does the running time.

This time, the time complexity for the code below will be Quadratic. The running time of
the two loops is proportional to the square of N. When N doubles, the running time increases by
N * N.

for(i=0; i < N; i++)
{
    for(j=0; j < N; j++)
    {
        statement;
    }
}

The algorithm below breaks a set of numbers into halves to search a particular field. Now,
this algorithm will have a Logarithmic Time Complexity. The running time of the algorithm is
proportional to the number of times N can be divided by 2 (N is high-low here). This is because
the algorithm divides the working area in half with each iteration.

while (low <= high)
{
    mid = (low + high) / 2;
    if (target < list[mid])
        high = mid - 1;
    else if (target > list[mid])
        low = mid + 1;
    else break;
}
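The fragment above can be made self-contained. The following sketch wraps it in a function over a sorted int array (the function name and parameters are ours, not from the text):

```c
/* Binary search: returns the index of target in the sorted array
   list[0..n-1], or -1 if it is absent.  Each iteration halves the
   working area, so at most about log2(n) iterations run: O(log N). */
int binary_search(const int list[], int n, int target)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (target < list[mid])
            high = mid - 1;
        else if (target > list[mid])
            low = mid + 1;
        else
            return mid;   /* found */
    }
    return -1;            /* not found */
}
```

Note that the array must already be sorted; the halving argument breaks down otherwise.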

Taking the previous algorithm forward, below we have a small logic of Quick Sort. In Quick
Sort, we divide the list into halves every time, but we repeat the iteration N times (where N is the
size of list). Hence time complexity will be N*log(N). The running time consists of N loops (iterative
or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.

void quicksort(int list[], int left, int right)
{
    if (left < right)   /* base case: a sublist of 0 or 1 elements is sorted */
    {
        int pivot = partition(list, left, right);
        quicksort(list, left, pivot - 1);
        quicksort(list, pivot + 1, right);
    }
}

In general, doing something with every item in one dimension is linear, doing something
with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic.

Figure 5 Comparing Growth Rate

Types of Notations for Time Complexity

1. Big Oh denotes "fewer than or the same as" <expression> iterations.
2. Big Omega denotes "more than or the same as" <expression> iterations.
3. Big Theta denotes "the same as" <expression> iterations.
4. Little Oh denotes "fewer than" <expression> iterations.
5. Little Omega denotes "more than" <expression> iterations.

O(expression) is the set of functions that grow slower than or at the same rate as
expression. It indicates the maximum time required by an algorithm for all input values. It
represents the worst case of an algorithm's time complexity.

Omega(expression) is the set of functions that grow faster than or at the same rate as
expression. It indicates the minimum time required by an algorithm for all input values. It
represents the best case of an algorithm's time complexity.

Theta(expression) consists of all the functions that lie in both O(expression) and
Omega(expression). It indicates the average bound of an algorithm and represents the average
case of an algorithm's time complexity.

Running Time Calculations

Generally, there are several algorithmic solutions to every programming problem and we
would like to eliminate the bad ones early on so an analysis is usually required. The ability to do
an analysis usually provides insight into designing efficient algorithms. The analysis also generally
pinpoints the bottlenecks, which are worth coding carefully.

To simplify the analysis, we adopt the convention that there are no particular units of time
thus throwing away leading constants. We will also throw away low-order terms, so what we are
essentially doing is computing a Big-Oh running time. Since Big-Oh is an upper bound, we must
be careful never to underestimate the running time of the program. In effect, the answer provided
is a guarantee that the program will terminate within a certain time period. The program may stop
earlier than this, but never later.

Example.

Here is a simple program fragment to calculate the sum of the first n cubes, ∑ i³ for i = 1 to n:

int sum( int n )
{
    int partialSum;

1   partialSum = 0;
2   for( int i = 1; i <= n; i++ )
3       partialSum += i * i * i;
4   return partialSum;
}

The analysis of this fragment is simple. The declarations count for no time. Lines 1 and 4
count for one unit each. Line 3 counts for four units per time executed (two multiplications, one
addition, and one assignment) and is executed N times, for a total of 4N units. Line 2 has the
hidden costs of initializing i, testing i ≤ N, and incrementing i. The total cost of all these is 1 to
initialize, N+1 for all the tests, and N for all the increments, which is 2N + 2. We ignore the costs
of calling the method and returning, for a total of 6N + 4. Thus, we say that this method is O(N).
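The analysis can be sanity-checked by actually running the fragment; its result also agrees with the known closed form ∑ i³ = (n(n+1)/2)². Here is the fragment as a complete function:

```c
/* The fragment above as a complete function; the running time is
   dominated by the loop body, which executes n times: O(N). */
int sum(int n)
{
    int partialSum = 0;
    for (int i = 1; i <= n; i++)
        partialSum += i * i * i;
    return partialSum;
}
```

For n = 4 this returns 1 + 8 + 27 + 64 = 100, which matches (4·5/2)² = 100.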

If we had to perform all this work every time we needed to analyze a program, the task would
quickly become infeasible. Fortunately, since we are giving the answer in terms of Big-Oh, there
are lots of shortcuts that can be taken without affecting the final answer. For instance, line 3 is
obviously an O(1) statement (per execution), so it is silly to count precisely whether it is two, three,
or four units; it does not matter. Line 1 is obviously insignificant compared with the for loop, so it
is silly to waste time here. This leads to several general rules.

General Rules for Running Time Calculations

Rule 1—FOR loops

The running time of a for loop is at most the running time of the statements inside the for
loop (including tests) times the number of iterations.

Rule 2—Nested loops

Analyze these inside out. The total running time of a statement inside a group of nested
loops is the running time of the statement multiplied by the product of the sizes of all the
loops.

As an example, the following program fragment is O(N²):

for (i=0; i<n; ++i)
    for (j=0; j<n; ++j)
        ++k;

Rule 3—Consecutive Statements

These just add (which means that the maximum is the one that counts). An example is the
following program fragment, which has O(N) work followed by O(N²) work, and is thus also O(N²):

for (i=0; i<n; ++i)
    a[i] = 0;
for (i=0; i<n; ++i)
    for (j=0; j<n; ++j)
        a[i] += a[j] + i + j;

Rule 4—If/Else

For the fragment

if (condition)
S1
else
S2

the running time of an if/else statement is never more than the running time of the test
plus the larger of the running times of S1 and S2.

Other rules are obvious, but a basic strategy of analyzing from the inside (or deepest part)
out works. If there are function calls, these must be analyzed first. If there are recursive functions,
there are several options. If the recursion is really just a thinly veiled for loop, the analysis is
usually trivial.

Watch:

• What’s an algorithm? By TED-Ed (https://youtu.be/6hfOvs8pY1k)


• Introduction to Big O Notation and Time Complexity By CS Dojo
(https://youtu.be/D6xkbGLQesk)

Read:

• Section 1.4 – Analysis of Algorithms (pp. 172-205)


Sedgewick, Robert, Wayne, Kevin (2011). Algorithms 4th Edition

• Chapter 4 – Algorithm Analysis


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Chapter 2 – Algorithm Analysis


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

Review:

1. What is an algorithm?
2. In formal computer science, how do you distinguish between algorithm and a program?
3. Why is there a need to analyze algorithms?

Activities/Assessments:
1. For each of the following three program fragments:
a. Give an analysis of the running time (Big-Oh will do).
b. Implement the code in the language of your choice, and give the running time for
several values of N.
c. Compare your analysis with the actual running times.

(1) sum = 0;
for( i = 0; i < n; ++i )
++sum;

(2) sum = 0;
for( i = 0; i < n; ++i )
for( j = 0; j < n * n; ++j )
++sum;

(3) sum = 0;
for( i = 0; i < n; ++i )
for( j = 0; j < i * i; ++j )
for( k = 0; k < j; ++k )
++sum;

Unit 3 – Arrays

Overview:
Most programming languages in one way or another have their own implementation of an array.
In this unit we present the array, which is used by most data structures to implement their
algorithms.

Module Objectives:
After successful Completion of this module, you should be able to:

• Define and describe array and its characteristics


• Discuss the different ways to represent an array
• Discuss the basic operations in an array
• Develop your own implementation of an array
• Apply array implementation in solving programming challenges

Course Materials:

An array type, or array, is defined as a finite set of elements having the same type, referenced
under a common name. Each individual data item is called an array element and must be of the
same type and size. Figure 6 shows a one-dimensional array where data is arrayed in a line.
Each data element is accessed using an index.

Figure 6 One-dimensional array

Arrays can have more than one dimension, these are called multidimensional arrays.
Though they may have certain limitations as to the number of definable dimensions, depending
on the type of programming language or compiler used. Figure 7 shows a two-dimensional array
sometimes called a table as it represents a series of rows and columns. The figure shows data
lined up in both vertical and horizontal directions. Data elements are identified by a row index and
a column index. Two-dimensional arrays are also called array of arrays.

Figure 7 Two-dimensional array

The figure below represents a three-dimensional array. Notice that aside from the rows
and columns, a third element representing depth can also be seen.

Figure 8 Three-dimensional Array

Memory Address Calculation in an Array

An array is a collection of items stored at contiguous memory locations. The idea is to
store multiple items of the same type together; thus arrays are classified as homogeneous data
structures because they store elements of the same type. This also makes it easier to calculate
the position of each element by simply adding an offset to a base value (the base value here is
the memory location of the first element of the array, denoted by the name of the array).

Figure 9 Array Elements in Contiguous Memory Locations

The above image can be viewed as a staircase seen from the top, where you are at the
base of the staircase (index 0). Each element can be uniquely identified by its index in the array
(in a similar way, you could identify people by the step on which they are standing).

Address Calculation in Single (One) Dimension Array

Figure 10 Address Calculation in Single Dimension Array

In doing address calculation of an array element it is important to take note of the
base address and the storage size of one element (in this case the size of the data type).
The address of an element of an array say “A[X]” is calculated using the following formula:

Address of A[X] = B + W * (X – LB)

Where:
B = base address
W = storage size of one element stored in the array (in bytes)
X = subscript of element whose address is to be found
LB = Lower limit/Lower bound of subscript, if not specified assume 0 (zero)

Example:

Given the base address of an array B[1300…..1900] as 1020 and size of each element is
2 bytes in the memory. Find the address of B[1700].

Solution: The given values are: B = 1020, LB = 1300, W = 2, X = 1700

Address of A [ X ] = B + W * ( X – LB )

= 1020 + 2 * (1700 – 1300)


= 1020 + 2 * 400
= 1020 + 800
= 1820 [Ans]
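The formula and the worked example can be checked mechanically. The following small sketch encodes the formula (the parameter names mirror the text; the function name is ours):

```c
/* Address of A[X] = B + W * (X - LB), exactly as in the formula above:
   B  = base address
   W  = storage size of one element in bytes
   X  = subscript of the element whose address is sought
   LB = lower bound of the subscript */
long element_address(long B, long W, long X, long LB)
{
    return B + W * (X - LB);
}
```

Calling element_address(1020, 2, 1700, 1300) reproduces the solution above, 1820.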

Address Calculation in Two Dimension Array

While storing the elements of a two-dimensional array in memory, these are
allocated contiguous memory locations. Therefore, a two-dimensional array must be
linearized so as to enable their storage. There are two alternatives to achieve linearization:
Row-Major and Column-Major.

Figure 11 Two-dimensional Array

Figure 12 Alternatives to Linearization of Two-dimension Array

The address of an element of any array, say A[ I ][ J ], is calculated in two forms as
given: 1) Row Major System and 2) Column Major System.

Row Major System: The address of a location in Row Major System is calculated using
the following formula:

Address of A [ I ][ J ] = B + W * [ N * ( I – Lr ) + ( J – Lc ) ]

Column Major System: The address of a location in Column Major System is calculated
using the following formula:

Address of A [ I ][ J ] Column Major Wise = B + W * [( I – Lr ) + M * ( J – Lc )]

Where:
B = Base address
I = Row subscript of element whose address is to be found
J = Column subscript of element whose address is to be found
W = Storage Size of one element stored in the array (in byte)
Lr = Lower limit of row/start row index of matrix, if not given assume 0 (zero)
Lc = Lower limit of column/start column index of matrix, if not given assume 0
M = Number of rows of the given matrix
N = Number of columns of the given matrix

Note however that the number of rows and columns of a matrix are usually given (like
A[20][30] or A[40][60]), but they may instead be given as A[Lr…Ur, Lc…Uc]. In this case the
number of rows and columns are calculated using the following methods:

Number of rows (M) will be calculated as = (Ur – Lr) + 1

Number of columns (N) will be calculated as = (Uc – Lc) + 1

And rest of the process will remain same as per requirement (Row Major Wise or
Column Major Wise).

Example:

An array X [-15……….10, 15……………40] requires one byte of storage. If beginning


location is 1500 determine the location of X [15][20].

Solution:

As you see here the number of rows and columns are not given in the question. So they
are calculated as:

Number of rows, say M = (Ur – Lr) + 1 = [10 – (-15)] + 1 = 26

Number of columns, say N = (Uc – Lc) + 1 = (40 – 15) + 1 = 26

(i) Column Major Wise Calculation of above equation

The given values are: B = 1500, W = 1 byte, I = 15, J = 20, Lr = -15, Lc = 15, M = 26

Address of A [ I ][ J ] = B + W * [ ( I – Lr ) + M * ( J – Lc ) ]

= 1500 + 1 * [(15 – (-15)) + 26 * (20 – 15)]


= 1500 + 1 * [30 + 26 * 5]
= 1500 + 1 * [160]
= 1660 [Ans]

(ii) Row Major Wise Calculation of above equation

The given values are: B = 1500, W = 1 byte, I = 15, J = 20, Lr = -15, Lc = 15, N = 26

Address of A [ I ][ J ] = B + W * [ N * ( I – Lr ) + ( J – Lc ) ]

= 1500 + 1 * [26 * (15 – (-15)) + (20 – 15)]
= 1500 + 1 * [26 * 30 + 5]
= 1500 + 1 * [780 + 5]
= 1500 + 785
= 2285 [Ans]
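Both linearization formulas can be encoded and checked against the worked example. The parameter names follow the formulas in the text; the function names are ours:

```c
/* Row Major System:    B + W * (N * (I - Lr) + (J - Lc))
   Column Major System: B + W * ((I - Lr) + M * (J - Lc))
   where N is the number of columns and M the number of rows. */
long addr_row_major(long B, long W, long N, long I, long J, long Lr, long Lc)
{
    return B + W * (N * (I - Lr) + (J - Lc));
}

long addr_col_major(long B, long W, long M, long I, long J, long Lr, long Lc)
{
    return B + W * ((I - Lr) + M * (J - Lc));
}
```

For the array X above, addr_col_major(1500, 1, 26, 15, 20, -15, 15) gives 1660 and addr_row_major(1500, 1, 26, 15, 20, -15, 15) gives 2285, matching the two solutions.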

Array Representation

Arrays can be declared in various ways in different programming languages.

Figure 13 Declaring an Array in C and Python

For illustration, let’s take C array declaration.

Figure 14 Array Representation in C

As seen in the illustration (Figure 14), following are the important points to be considered.

• Index starts with 0.


• Array length is 10 which means it can store 10 elements.
• Each element can be accessed via its index. For example, we can fetch an element at
index 6 as 19.

In C, when an array with static storage duration (global or static) is declared with a size but not explicitly initialized, its elements are set to default values (local arrays, by contrast, hold indeterminate values until assigned):

Table 4 Default values for an Array

Data Type    Default Value
bool         false
char         0
int          0
float        0.0
double       0.0
void         (not applicable)
wchar_t      0

Basic Array Operations

There are several basic operations supported by an array though the implementation
differs from one programming language to another. In Python, there is an array module that has
separate functions for performing array operations. In C language, the operations are explicitly
defined using loops and assignment statements. Following are the basic operations supported by
an array.

1. Traverse. This operation traverses through the elements of an array meaning we print
all the array elements one by one as we visit them.

2. Insertion. Insert operation is to insert one or more data elements into an array. Based
on the requirement, a new element can be added at the beginning, end, or any given
index of the array.

3. Deletion. Deletion refers to removing an existing element from the array and
reorganizing all elements of the array.

4. Search. With this operation, you can search for an item in an array based on a given
value (or through an index).

5. Update. Update operation refers to updating an existing element from the array at a
given index. This operation is quite similar to the insert method, except that it will
replace the existing value at the given index.
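A minimal sketch of the insert and delete operations on a C array follows (the function names are ours; a fixed-capacity buffer with spare room is assumed):

```c
/* Insert value at position pos in a[0..*n-1], shifting later elements
   one place to the right to make room.  On average about n/2 elements
   move, so insertion takes time proportional to n. */
void array_insert(int a[], int *n, int pos, int value)
{
    for (int i = *n; i > pos; i--)
        a[i] = a[i - 1];
    a[pos] = value;
    (*n)++;
}

/* Delete the element at position pos, shifting later elements one
   place to the left to reorganize the array: also proportional to n. */
void array_delete(int a[], int *n, int pos)
{
    for (int i = pos; i < *n - 1; i++)
        a[i] = a[i + 1];
    (*n)--;
}
```

The shifting loops are what make insert and delete linear-time operations, in contrast to reading an element by index, which is constant-time.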

Table 5 Time Complexity of an Array

Operation      Average Case    Worst Case
Read/Update    O(1)            O(1)
Insert         O(n)            O(n)
Delete         O(n)            O(n)
Search         O(n)            O(n)
Traverse       O(n)            O(n)

Advantages and Disadvantages of Arrays

Advantages

1. Reading an array element is simple and efficient. As shown in the above table, the read
time of an array is O(1) in both average and worst cases. This is because any element can be
instantly read using its index (with the base address calculation happening behind the scenes),
without traversing the whole array.

2. Array is a foundation of other data structures. For example, other data structures such as
LinkedList, Stack, Queue etc. are implemented using array.

3. All the elements of an array can be accessed using a single name (array name) along with
the index, which is readable, user-friendly and efficient rather than storing those elements
in different variables.

Disadvantages

1. While using an array, we must decide on the size of the array at the beginning, so if we
are not aware of how many elements we are going to store in the array, the task becomes
difficult.

2. The size of the array is fixed, so if at a later point we need to store more elements in it,
it can't be done. On the other hand, if we store fewer elements than the declared size, the
remaining allocated memory is wasted.

Watch:

• An Overview of Arrays and Memory By CS Dojo (https://youtu.be/pmN9ExDf3yQ)

Read:

• Section 3.1 – Using Arrays (pp. 104-121)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Arrays in C (https://www.w3resource.com/c-programming/c-array.php)
• Arrays (https://en.wikibooks.org/wiki/Data_Structures/Arrays)

Review:

1. What is an array?
2. What are sequential lists and how are they limited?
3. A one-dimensional array is a collection of what? A two-dimensional array is a collection of
what?
4. If an array holds integers, each of which is four bytes, how many bytes from the base
location of the array is the location of the fifth element?
5. Describe a row major ordering and a column major ordering.
6. On average, how many items must be moved to insert a new item into an unsorted array
with N items?
7. On average, how many items must be moved to delete an item from an unsorted array
with N items?
8. On average, how many items must be examined to find a particular item in an unsorted
array with N items?

Activities/Assessments:
1. How many values can be held by an array with dimensions?

a) X [0..8]
b) a [0..n]
c) b [-1..n, 1..m]
d) c [-n..0, 1..2]

2. When storing a two-dimensional array A with ten rows and ten columns in contiguous
memory space in row-major order (in the direction of rows), what is the address where A[5,6]
is stored? In this question, addresses are represented in decimal numbers.

Address
100 A[1,1]
101
102 A[1,2]
103
104

3. Assume that the following lists of numbers represents the physical view of a two-
dimensional array in memory. If the array has 3 columns and 3 rows, show the row major
ordering and the column major ordering for each list.

a) 4, 0, 1, 3, 5, 1, 7, 4, 4
b) 7, 7, 7, 3, 3, 3, 1, 1, 1
c) 1, 2, 3, 4, 5, 6, 7, 8, 9

4. Answer the following problems given the following array variable/s assuming that an int is
equal to 2 bytes. int x [4][3] = {5, 2, 9, 0, 3, 1, 4, 12, 6, 8, 10, 11};

a) what will be the total size (in bytes) of the array? ___________
b) what value is found at x [0][0]? _____________
c) what value is found at x [3][2]? _____________
d) what value is found at x [2][1]? _____________
e) what value is found at x [1][2]? _____________

Programming:

1. Write a program in C to find the maximum and minimum element in an array.


Test Data :
Input the number of elements to be stored in the array : 3
Input 3 elements in the array :
element - 0 : 45
element - 1 : 25
element - 2 : 21
Expected Output :
Maximum element is : 45
Minimum element is : 21

2. Write a program in C to sort elements of array in ascending order.


Test Data :
Input the size of array : 5
Input 5 elements in the array :
element - 0 : 2
element - 1 : 7
element - 2 : 4
element - 3 : 5
element - 4 : 9
Expected Output :
Elements of array in sorted ascending order:
24579

3. Write a program in C to delete an element at desired position from an array.


Test Data :
Input the size of array : 5
Input 5 elements in the array in ascending order:
element - 0 : 1
element - 1 : 2
element - 2 : 3
element - 3 : 4
element - 4 : 5
Input the position where to delete: 3
Expected Output :
The new list is : 1 2 4 5

Unit 4 – Linked Lists

Overview:
Lists provide a convenient way of keeping track of things in our everyday life: shopping lists,
laundry lists, lists of things to be done, and so on. Similarly, lists provide a convenient way of
representing arbitrary nonnumeric information in a computer. The remainder of this unit will deal
with the list as a formal data structure and will explain how it is represented and manipulated in
a computer.

Module Objectives:
After successful completion of this module, you should be able to:

• Define and describe a linked list and its characteristics


• Discuss the different operations on a linked list
• Develop your own implementation of a linked list
• Apply your linked list implementation in solving programming challenges

Course Materials:

A list is a finite, ordered collection of items (elements) of some element type E. Note that
this doesn't mean that the objects are in sorted order; it just means that each object has a position
in the list, starting with position zero for the first element, followed by the second element, and so
on. In programming, a list is described as a collection of elements that have a linear relationship
with each other. A linear relationship means that, except for the first one, each element on the list
has a unique predecessor, and, except for the last one, a unique successor. Lists also have a
property intuitively called size, which is simply the number of elements on the list; every list has a size.

In computer programs, lists are extremely versatile ADTs that provide storage for
information without imposing any limitations on how that information is added, accessed, or
removed. Recall that when we think about an ADT, we think about both the external and internal
views. The external view includes the "conceptual picture" and the set of "conceptual operations".
The conceptual picture of a List is something like this:

item 0
item 1
item 2
...
item n

30
and one reasonable set of operations is:

Operation Description
void add(E item) add item to the end of the List
void add(int pos, E item) add item at position pos in the List, moving the items originally
in positions pos through size()-1 one place to the right to make
room (error if pos is less than 0 or greater than size())
boolean contains(E item) return true iff item is in the List (i.e., there is an item x in the List
such that x.equals(item))
int size() return the number of items in the List
boolean isEmpty() return true iff the List is empty
E get(int pos) return the item at position pos in the List (error if pos is less than
0 or greater than or equal to size())
E remove(int pos) remove and return the item at position pos in the List, moving
the items originally in positions pos+1 through size()-1 one
place to the left to fill in the gap (error if pos is less than 0 or
greater than or equal to size())

Many other operations are possible; when designing an ADT, you should try to provide
enough operations to make the ADT useful in many contexts, but not so many that it gets
confusing. It is not always easy to achieve this goal; it will sometimes be necessary to add
operations in order for a new application to use an existing ADT.

List Implementations

In some ways, a List is similar to an array: both Lists and arrays are ordered collections of
data elements and in both cases you can add or access items at a particular position (and in both
cases we consider the first position to be position zero). You can also find out how many items
are in a List and how large an array is.

You can actually implement a list using an array, which is one of the most common
implementations of a list. Unfortunately, one disadvantage of using arrays to store data is that
arrays are static structures and therefore cannot be easily extended or reduced to fit the data set.
Insertions and deletions in an array are also expensive, since existing elements must be shifted
to open or close a gap.

A solution is to use another implementation, that of using pointers or memory cells forming
what we call a Linked list. A linked list is a linear data structure where each element is a separate
object connected together via links.

The implementation of a linked list in C is done using pointers. A linked list can be visualized
as a chain of nodes, where every node points to the next node. Each element (node) of a list
comprises two items: the data and a reference to the next node. The last node has a reference
to null and is often identified as the tail. The entry point into a linked list is called the head of the list. It

31
should be noted that head is not a separate node, but the reference to the first node. If the list is
empty then the head is a null reference.

Figure 15 Linked List

Implementing a list in the C language requires the use of structures and pointers. Each
structure represents a node holding some data and a pointer to another structure of the
same kind. This pointer holds the address of the next node and creates the link between two nodes.
The following code shows how to define a node in C (this is for a singly linked list).

struct node
{
    int data;            /* the value stored in this node */
    struct node *next;   /* address of the next node; NULL for the last node */
};

The first data member of the structure (named node) is an integer that holds the node's
value, and the second data member is a pointer to a node (the same structure type). This means
that the second data member holds the address of the next node, and in this way every node is
connected, as represented in the picture above.

A linked list is a dynamic data structure. Unlike an array, a linked list does not store its
nodes in consecutive memory locations. The number of nodes in a list is not fixed and can grow
and shrink on demand. Any application which has to deal with an unknown number of objects will
need to use a linked list.

Another point of difference between an array and a linked list is that a linked list does not
allow random access of data, that is, direct access to individual elements. If you want to access a
particular item, you have to start at the head and follow the references until you get to that
item; nodes in a linked list can be accessed only in a sequential manner. On the other hand,
once the position is known, insertions and deletions can be done at any point in the list in
constant time, without shifting other elements as an array would require. A disadvantage of
a linked list is that it uses more memory compared with an array: we need an extra 4 bytes (on a
32-bit CPU) just to store a reference to the next node.

Types of Linked List

Following are the various types of linked list.

1. Simple Linked List or Singly Linked List – navigation in the list is forward only. In this
type of linked list, every node stores the address (reference) of the next node, and the
last node's next reference is NULL.

32
Figure 16 Singly Linked List

2. Doubly Linked List – navigation through the list can go both ways, forward and
backward. In this type of linked list, two references are associated with each node:
one points to the next node and the other to the previous node. The advantage of this
data structure is that we can traverse in both directions, and for deletion we don't need
explicit access to the previous node.

Figure 17 Doubly Linked List

3. Circular Linked List – the last item (tail) in the list contains a link to the first element
(head) as its next, and the first element links to the last element as its previous. A circular
linked list is a linked list in which all nodes are connected to form a circle; there is no
NULL at the end. A circular linked list can be singly or doubly linked. The advantage of
this data structure is that any node can be made the starting node, which is useful in
implementing a circular queue with a linked list.

Figure 18 Circular Singly Linked List

Figure 19 Circular Doubly Linked List

Basic Operations

Though there exist various types of linked list, the basic operations supported by
a linked list are almost the same across types, with slight modifications. The operations presented
below are those of a singly linked list.

33
• Addition/insertion – adding an element in a list. Note that the process of adding or
inserting an item varies depending on where the new element will be placed (at the
beginning, at the end of the list, or in between nodes)

• Deletion – removing an item from the list.

• Search – finding an element from the list. Searching for an item in the list always begins
at the head.

Addition/Insertion Operation

The process of adding or inserting a new node to a linked list (singly linked list)
depends on the position where the new item is to be placed. Following are the processes
for adding an item to a list assuming the list already has data on it.

1. Adding a node at the beginning (prepend)

Step 1 – Make a new node


Step 2 – Point the ‘next’ of the new node to the ‘head’ of the linked list.
Step 3 – Mark the new node as ‘head’.

Figure 20 Adding a node at the beginning of a list

2. Adding a node at the end (append)

Step 1 – Make a new node


Step 2 – Point the last node of the linked list to the new node.

Figure 21 Adding a node at the end of a list

3. Adding a node in between nodes (insert after prev)

Step 1 – Make a new node


Step 2 – Point the ‘next’ of the new node to the node ‘cur’. At this point, two nodes are
pointing at the same node ‘cur’: the node ‘prev’ and the new node.

34
Step 3 – Point the ‘next’ of ‘prev’ to the new node.

Figure 22 Adding a node in between existing node

Deletion Operation

We delete any node of a linked list by connecting the predecessor node of the
node to be deleted by the successor node of the same node. For example, if we have a
linked list a → b → c, then to delete the node ‘b’, we will connect ‘a’ to ‘c’ i.e., a → c. But
this will make the node ‘b’ inaccessible and this type of inaccessible nodes are called
garbage and we need to clean this garbage by using the ‘free’ function in C.

Step 1 – Find a node to be deleted (in this case ‘cur’)


Step 2 – Point the ‘next’ of ‘prev’ to the node pointed to by ‘next’ of ‘cur’. At this point,
two nodes are pointing at the same node (successor of ‘cur’).
Step 3 – node ‘cur’ is no longer part of the list thus we need to free up the space using
the ‘free’ function in C.

Figure 23 Deleting a node from a linked list

Search Operation

Searching for an item in the list starts at the head, accessing each node and
comparing its value with the item being searched for until the item is found or we reach null (the
end of the list). Do not change the head reference; use a separate cursor pointer instead. The
same process (without the comparison) is done when traversing all entries in the list.

Figure 24 Traversing a linked list

35
Watch:

• Pointers By CS50 (https://youtu.be/XISnO2YhnsY)


• Introduction to Linked Lists By CS Dojo (https://youtu.be/WwfhLC16bis)
• Singly-Linked Lists By CS50 (https://youtu.be/zQI3FyWm144)
• Doubly-Linked Lists By CS50 (https://youtu.be/FHMPswJDCvU)

Read:

• Section 3.2 – The List ADT (pp. 78-80)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section 3.2 – Singly Linked Lists (pp. 122-127)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 3.3 – Circularly Linked Lists (pp. 128-131)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 3.4 – Doubly Linked Lists (pp. 132-137)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 3.5 – Equivalence Testing (pp. 138-140)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

Review:

1. What is a list?
2. What is the difference between a list and a linked list?
3. How is an array different from a linked list?
4. What are the different types of linked list?

Activities/Assessments:
1. Fill in the values indicated by the pointer variable expressions below, based on the
following linked list:

a) A->next->next->info ___________________
b) B->next->info ___________________
c) B->next->back->back->info ___________________
d) A->back->next->info ________________

36
2. The figure below is a uni-directional list. Manila is the head of the list and the pointer
contains addresses of data shown below. Batanes is the tail of the list and the pointer
contains 0. Choose one correct way to insert Cebu placed at address 150 between
Boracay and Davao.

Pointer for the head Address Data Pointer


10 10 Manila 50
30 Batanes 0
50 Vigan 90
70 Davao 30
90 Boracay 70
150 Cebu

a) The pointer for Cebu to be set to 50 and that for Davao to be set to 150
b) The pointer for Cebu to be set to 70 and that for Boracay to be set to 150
c) The pointer for Cebu to be set to 90 and that for Davao to be set to 150
d) The pointer for Cebu to be set to 150 and that for Boracay to be set to 90

3. Show what is written by the following segments of code:


a) P = (node) malloc(sizeof(NodeEntry));
Q = (node) malloc(sizeof(NodeEntry));
P->info = 5; Q->info = 6;
P = Q;
P->info = 1;
printf("%d %d", P->info, Q->info);

b) P = (node) malloc(sizeof(NodeEntry));
P->info = 3;
Q = (node) malloc(sizeof(NodeEntry));
Q->info = 2;
P = (node) malloc(sizeof(NodeEntry));
P->info = Q->info;
Q->info = 0;
printf("%d %d", P->info, Q->info);

4. Swap two adjacent elements by adjusting only the links (and not the data) using:
a) singly linked lists
b) doubly linked lists

Provide a diagram or steps of your solution.

Programming:
You are tasked to write programs that illustrate the use of lists implemented using an
array. One uses a sorted list, the other an unsorted list. Run the programs separately
and answer the following questions.

37
1. Write a program that will initially create an UNSORTED list containing the following and
then run the program:

Mitch
Diane
Jack
Robbie
Katherine

Answer and explain the following questions below:


a) If we check the memory location of each element in the list, what would it be? What
index represent each element?
b) What happens if we add Morrie in the list? What will be its index value?
c) What does the new list look like? Where do you think Morrie should be placed and
why?

With the same list above (with Morrie added), delete/remove Jack. Answer and explain
the following questions below:
a) What is the new list? Identify the elements of the list and its index.
b) What happened to the former location occupied by Jack?

2. Write a program that will initially create a SORTED list containing the following and then
run the program:

Diane
Jack
Katherine
Mitch
Robbie

Answer and explain the following questions below:


a) If we check the memory location of each element in the list, what would it be? What
index represent each element?
b) What happens if we add Morrie in the list? What will be its index value?
c) What does the new list look like? Where do you think Morrie should be placed
and why?

With the same list above (with Morrie added), delete/remove Jack. Answer and explain
the following questions below:
a) What is the new list? Identify the elements of the list and its index.
b) What happened to the former location occupied by Jack?

3. In most personal computers, the largest integer is 32,767 and the largest long integer is
2,147,483,647. Some applications, such as the national debt, may require an unbounded
integer. One way to store and manipulate integers of unlimited size is by using a linked
list. Each digit is stored in a node of the list. For example, the figure below shows how
we could store a five-digit number in a linked list.

38
While the linked list in the figure is represented as moving from right to left, remember that
there is no physical direction in a linked list. We represent it this way to assist us in
understanding the problem.

To add two numbers, we simply add the corresponding digits in the same location in their
respective linked lists, together with the carry from the previous addition. With each addition, if the
sum is 10 or greater, we subtract ten and set the carry to one. Otherwise,
the carry is set to zero.

Write a program to add two integer linked lists. Design your solution so that the same
logic adds the first numbers (units’ position) as well as the rest of the number. In other
words, do not have special one-time logic for adding the units’ position.

4. Write a program in C to create and display a circular linked list.


Test Data :
Input the number of nodes : 3
Input data for node 1 : 2
Input data for node 2 : 5
Input data for node 3 : 8
Expected Output :

Data entered in the list are :


Data 1 = 2
Data 2 = 5
Data 3 = 8

39
Unit 5 – Stacks

Overview:
Stacks are dynamic data structures that follow the Last In, First Out (LIFO) principle: the last item
inserted into a stack is the first one removed from it. In this unit we will explore stacks
and their applications and learn how to implement them.

Module Objectives:
After successful completion of this module, you should be able to:

• Define and describe stacks and its characteristics


• Discuss the different operations on stack
• Develop your own implementation of stack
• Apply your stack implementation in solving programming challenges

Course Materials:

A stack is an Abstract Data Type (ADT), commonly used in most programming languages.
It is named a stack because it behaves like a real-world stack, for example, a deck of cards or a pile of
plates. A real-world stack allows operations at one end only: we can place or
remove a card or plate only at the top of the stack.

Figure 25 Stack

40
A stack is a container of objects that are inserted and removed according to the last-in
first-out (LIFO) principle. In a pushdown stack, only two operations are allowed: push an item
onto the stack and pop an item off the stack. A stack is a limited-access data structure:
elements can be added and removed only at the top. Push adds an item to the top
of the stack; pop removes the item from the top. A helpful analogy is a stack of books:
you can remove only the top book, and you can add a new book only on the top.

Applications of Stacks

Here are some examples where stacks can be applied.

1. The simplest application of a stack is to reverse a word. You push a given word to a
stack one character at a time and then pop letters from the stack.

2. Another application is an "undo" mechanism in text editors; this operation is


accomplished by keeping all text changes in a stack.

3. Problems involving Backtracking also uses stack. Backtracking is a process when you
need to access the most recent data element in a series of elements. Think of a
labyrinth or maze - how do you find a way from an entrance to an exit?

Figure 26 Backtracking applied to a maze problem

Once you reach a dead end, you must backtrack. But backtrack to where? To the
previous choice point. Therefore, at each choice point you store on a stack all possible
choices. Then backtracking simply means popping the next choice from the stack.

4. Stacks can also be used in Language processing:

a. space for parameters and local variables is created internally using a stack.
b. compiler's syntax check for matching braces is implemented by using stack.
c. support for recursion

41
Stack Representation and Implementation

A Stack Data Structure can be implemented in two ways: array implementation and linked
list implementation.

Array-based Implementation

In an array-based implementation we maintain the following fields: an array A of a


default size (≥ 1), the variable top that refers to the top element in the stack and the
capacity that refers to the array size.

The variable top ranges from -1 to capacity - 1. We say that a stack is empty
when top = -1, and the stack is full when top = capacity - 1.

Figure 27 Array-based implementation of Stack

In a fixed-size stack, the capacity stays unchanged, therefore when top reaches
capacity, the stack throws an overflow error. Whereas, in a dynamic stack when top
reaches capacity, we double up the stack size.
An advantage of an array-based implementation is that it is easy to implement,
and memory is saved because no pointers are involved. The drawback is that it is not
dynamic: it does not grow and shrink depending on needs at runtime.

Linked List-based Implementation

A linked list-based implementation provides the best (from the efficiency point of
view) dynamic stack implementation: the stack can grow
and shrink according to the needs at runtime. The flip side of this advantage is that it
requires extra memory, due to the pointers involved.

Figure 28 Linked List-based implementation of Stack

Basic Operations

Stack operations may involve initializing the stack, using it, and then de-initializing it. Apart
from these basics, a stack supports the following two primary operations:

42
• push() − Pushing (storing) an element on the stack.
• pop() − Removing (accessing) an element from the stack.

To use a stack efficiently, we need to check the status of stack as well. For the same
purpose, the following functionality is added to stacks −

• peek() − get the top data element of the stack, without removing it.
• isFull() − check if stack is full.
• isEmpty() − check if stack is empty.

At all times, we maintain a pointer to the last PUSHed data on the stack. As this pointer
always represents the top of the stack, it is named top. The top pointer lets us read the top value
of the stack without actually removing it.

Operations to support stack functions showing the procedures.

peek()

Algorithm of peek() function −

begin procedure peek


return stack[top]
end procedure

Implementation of peek() function in C programming language −

int peek() {
   return stack[top];   /* assumes the stack is not empty */
}

isfull()

Algorithm of isfull() function −

begin procedure isfull

   if top equals MAXSIZE - 1
      return true
   else
      return false
   endif

end procedure

Note that since top ranges from -1 to MAXSIZE - 1, the stack is full when top reaches
MAXSIZE - 1, not MAXSIZE.

43
Implementation of isfull() function in C programming language −

bool isfull() {
   if(top == MAXSIZE - 1)
      return true;
   else
      return false;
}

isempty()

Algorithm of isempty() function −

begin procedure isempty

   if top less than 0
      return true
   else
      return false
   endif

end procedure

The implementation of isempty() in C follows the same idea. We initialize top at -1,
since array indexing starts from 0, so the stack is empty exactly when top is below zero,
that is, when top is -1. Here's the code −

bool isempty() {
if(top == -1)
return true;
else
return false;
}

Push Operation

The process of putting a new data element onto stack is known as a Push Operation. Push
operation involves a series of steps −

Step 1 − Checks if the stack is full.


Step 2 − If the stack is full, produces an error and exit.
Step 3 − If the stack is not full, increments top to point next empty space.
Step 4 − Adds data element to the stack location, where top is pointing.
Step 5 − Returns success.

44
Figure 29 Stack Push Operation

If the linked list is used to implement the stack, then in step 3, we need to allocate space
dynamically.

A simple algorithm for Push operation can be derived as follows −

begin procedure push: stack, data

if stack is full
return null
endif

top ← top + 1
stack[top] ← data

end procedure

Implementation of this algorithm in C, is very easy. See the following code −

void push(int data) {


if(!isFull()) {
top = top + 1;
stack[top] = data;
} else {
printf("Could not insert data, Stack is full.\n");
}
}

45
Pop Operation

Accessing the content while removing it from the stack, is known as a Pop Operation. In
an array implementation of pop() operation, the data element is not actually removed, instead top
is decremented to a lower position in the stack to point to the next value. But in a linked-list
implementation, pop() actually removes data element and deallocates memory space. A Pop
operation may involve the following steps −

Step 1 − Checks if the stack is empty.


Step 2 − If the stack is empty, produces an error and exit.
Step 3 − If the stack is not empty, accesses the data element at which top is pointing.
Step 4 − Decreases the value of top by 1.
Step 5 − Returns success.

Figure 30 Stack Pop Operation

A simple algorithm for Pop operation can be derived as follows −

begin procedure pop: stack


if stack is empty
return null
endif

data ← stack[top]
top ← top - 1
return data
end procedure

Implementation of this algorithm in C, is as follows −

int pop() {
   int data;

   if(!isempty()) {
      data = stack[top];
      top = top - 1;
      return data;
   } else {
      printf("Could not retrieve data, Stack is empty.\n");
      return -1;   /* sentinel value; the stack holds no data to return */
   }
}

46
Arithmetic Expression Evaluation

The stack organization is very effective in evaluating arithmetic expressions. Expressions
are usually represented in what is known as infix notation. An arithmetic expression can be written in
three different but equivalent notations, i.e., without changing the essence or output of the
expression. These notations are −

• Infix Notation
• Prefix (Polish) Notation
• Postfix (Reverse-Polish) Notation

These notations are named for where the operator is placed in the expression.

Infix Notation

Each operator is written between two operands (i.e., A + B). It is easy for us humans to
read, write, and speak in infix notation but the same does not go well with computing devices. An
algorithm to process infix notation could be difficult and costly in terms of time and space
consumption. With this notation, we must distinguish between ( A + B ) * C and A + ( B * C ) by
using either parentheses or some operator-precedence convention. Thus, the order of operators
and operands in an arithmetic expression does not uniquely determine the order in which the
operations are to be performed.

Prefix Notation

In this notation, the operator is placed before its two operands. For example, +ab is
equivalent to the infix notation a + b. Prefix notation is also known as Polish Notation, and no
parentheses are required.

Postfix Notation

This notation style is known as Reverse Polish Notation: the operator is placed after its two
operands. For example, ab+ is equivalent
to the infix notation a + b. Again, no parentheses are required in Reverse Polish Notation.

Expression Parsing

Stack-organized computers are better suited to postfix notation than to the traditional infix
notation, so infix expressions must be converted to postfix. An important
application of stacks is in parsing. For example, a compiler must parse arithmetic expressions
written using infix notation:

1 + ((2 + 3) * 4 + 5) * 6

We break the problem of parsing infix expressions into two stages. First, we convert from
infix to a different representation called postfix. Then we parse the postfix expression, which is a

47
somewhat easier problem than directly parsing infix. The conversion from infix notation to
postfix notation must take the operator precedence hierarchy into consideration.

There are 3 levels of precedence for 5 binary operators as given below:

• Highest: Exponentiation (^)


• Next highest: Multiplication (*) and division (/)
• Lowest: Addition (+) and Subtraction (-)

For example –

Infix notation: ( A – B ) * [ C / ( D + E ) + F ]

Post-fix notation: A B – C D E + / F + *

Here, we first perform the arithmetic inside the parentheses, (A-B) and (D+E). The division
C/(D+E) must be done prior to the addition with F. After that, we multiply the two terms inside the
parentheses and the brackets.

Converting from Infix to Postfix

Typically, we deal with expressions in infix notation

2 + 5

where the operators (e.g., +, *) are written between the operands (e.g., 2 and 5). Here 2 and 5
are called operands, and '+' is the operator. The expression above is called infix, since the
operator is in between the operands. Writing the operator after the operands gives the postfix
expression below.

2 5 +

Writing the operators before the operands gives a prefix expression

+ 2 5

Suppose you want to compute the cost of your shopping trip. To do so, you add a list of
numbers and multiply them by the local sales tax (7.25%):

70 + 150 * 1.0725

48
Depending on the calculator, the answer would be either 235.95 or 230.875. To avoid this
confusion we shall use a postfix notation

70 150 + 1.0725 *

Postfix has the nice property that parentheses are unnecessary.

Now, we describe how to convert from infix to postfix.

1. Read in the tokens one at a time.

2. If a token is an integer, write it to the output.

3. If a token is an operator and the stack is empty, push it on the stack. If the stack is not empty,
first pop entries with higher or equal priority to the output, and only then push the token on the stack.

4. If a token is a left parenthesis '(', push it on the stack.

5. If a token is a right parenthesis ')', pop entries to the output until you meet '(', then pop and discard the '('.

6. When you finish reading the string, pop all remaining operators to the output.

7. Arithmetic precedence, from lowest to highest, is: '+' and '-', then '*' and '/'.

Example. Suppose we have the infix expression 2+(4+3*2+1)/3. We read the string character by character.

'2' - send to the output.
'+' - push on the stack.
'(' - push on the stack.
'4' - send to the output.
'+' - push on the stack.
'3' - send to the output.
'*' - push on the stack.
'2' - send to the output.
'+' - pop '*' and then '+' (higher or equal priority) to the output, then push '+' on the stack.
'1' - send to the output.
')' - pop '+' to the output, then pop and discard '('.
'/' - push on the stack (the '+' on the stack has lower priority).
'3' - send to the output.
end - pop the remaining '/' and '+' to the output.

The resulting postfix expression is 2 4 3 2 * + 1 + 3 / +.

Evaluating a Postfix Expression

We describe how to parse and evaluate a postfix expression.

1. We read the tokens in one at a time.

2. If it is an integer, push it on the stack

3. If it is a binary operator, pop the top two elements from the stack, apply the operator, and
push the result back on the stack.

49
Consider the following postfix expression

5 9 3 + 4 2 * * 7 + *

Here is the chain of operations (the right column shows the stack contents after each step):

Operation                 Stack contents
----------------------------------------
push(5);                  5
push(9);                  5 9
push(3);                  5 9 3
push(pop() + pop());      5 12
push(4);                  5 12 4
push(2);                  5 12 4 2
push(pop() * pop());      5 12 8
push(pop() * pop());      5 96
push(7);                  5 96 7
push(pop() + pop());      5 103
push(pop() * pop());      515

Note that division is not a commutative operation, so 2/3 is not the same as 3/2; the order in which operands are popped matters.

Watch:

• Introduction to Stacks By mycodeschool (https://youtu.be/F1F2imiOJfk)


• Stacks By CS50 (https://youtu.be/hVsNqhEthOk)
• Array implementation of stacks By mycodeschool (https://youtu.be/sFVxsglODoo)
• Linked list implementation of stacks By mycodeschool (https://youtu.be/MuwxQ2IB8lQ)
• Infix, Prefix and Postfix By mycodeschool (https://youtu.be/jos1Flt21is)

Read:

• Section 3.6 The Stack ADT (pp. 103-112)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section 6.1 Stacks (pp.226-237)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 1.3 Bags, Queues, and Stacks (pp. 120-157)


Sedgewick, Robert, Wayne, Kevin (2011). Algorithms 4th Edition

Review:

1. What is a stack and where it can be used?


2. What are the different ways to implement stacks?

50
3. How is a stack similar to a list? How is it different?
4. Give two stack operations, and what do these operations do?
5. What is one advantage of using a linked list to implement a stack rather than an array?
What is one disadvantage?

Activities/Assessments:
1. Many real-world situations correspond to a stack. For example, a pile of trays in a cafeteria
is a stack since the last tray put on the pile is the first tray used. Think of other real-world
stacks and describe the push and pop operations on these.

2. Determine what is asked for in the following table:

      Infix                   Postfix               Prefix

1     4 * B + C - 5           4 B * C + 5 -         - + * 4 B C 5

2     A + B * C - D           A B C * + D -         - + A * B C D

3     A + B * C ^ D           A B C D ^ * +         + A * B ^ C D

4     A * B + C - (D - E)     A B * C + D E - -     - + * A B C - D E

5     3 * 4 + 5 / 2 - 3       3 4 * 5 2 / + 3 -     - + * 3 4 / 5 2 3

3. A letter means push and an asterisk means pop in the following sequence. Give the
sequence of values returned by the pop operations when this sequence of operations is
performed on an initially empty LIFO stack.

E A S * Y * Q U E * * * S T * * * I O * N * * *

4. A letter means push and an asterisk means pop in the following sequence. Give the
contents of s[0], ..., s[4] after the execution of this sequence of operations performed on
an initially empty LIFO stack.

L A * S T I * N * F I R * S T * * O U * T * * * * * *

5. Suppose that a client performs an intermixed sequence of (stack) push and pop
operations. The push operations push the integers 0 through 9 in order onto the stack;
the pop operations print out the return value. Circle the sequence(s) below that could
not occur.
a. 4 3 2 1 0 9 8 7 6 5
b. 2 1 4 3 6 5 8 7 9 0
c. 0 4 6 5 3 8 1 7 2 9
d. 4 6 8 7 5 3 2 9 0 1

Programming:

1. Write a program that will continuously accept a character and store it in a stack until the
letter ‘Z’ is entered.

2. Write a program that will continuously accept a character and store it in a stack until a vowel
is entered, then display the letters in reverse order of their arrival.

3. Write a program that will continuously accept a character and store it in a stack until an
asterisk (*) is entered, and then display the third-to-last character entered.

4. Write a program that takes from standard input an expression without left parentheses
and prints the equivalent infix expression with the parentheses inserted. For example,
given the input:
1 + 2 ) * 3 - 4 ) * 5 - 6 ) ) )
your program should print
( ( 1 + 2 ) * ( ( 3 - 4 ) * ( 5 - 6 ) ) )

5. Write a program that takes an infix expression from standard input, evaluates it, and prints
the value. (Piping the output of your program from the previous exercise to this program
gives equivalent behavior to Evaluate.)

6. Write a program InfixToPostfix that converts an arithmetic expression from infix to postfix.

7. Write a program EvaluatePostfix that takes a postfix expression from standard input,
evaluates it, and prints the value.

Unit 6 - Queues

Overview:
Queues are data structures that follow the First In First Out (FIFO) principle, in which the first
element added to the queue is the first one to be removed. In this unit we will explore queues
and their applications and learn how to implement them.

Module Objectives:
After successful completion of this module, you should be able to:

• Define and describe queue and its characteristics


• Discuss the different operations on queue
• Develop your own implementation of queue
• Apply your queue implementation in solving programming challenges

Course Materials:

A queue is an abstract data structure, somewhat similar to a stack. Unlike a stack,
however, a queue is open at both ends. One end (REAR, also called the tail) is always used to
insert data (enqueue), and the other (FRONT, also called the head) is used to remove data
(dequeue). A queue follows the First-In-First-Out methodology: the data item stored first will be
accessed first. A real-world example of a queue is a single-lane one-way road, where the vehicle
that enters first exits first. Other real-world examples are the queues at ticket windows and bus stops.

Figure 31 Real-world example of Queue (Single-lane One-way road)

In a queue only two operations are allowed: enqueue and dequeue. Enqueue means to
insert an item at the back of the queue; dequeue means removing the front item. The picture
demonstrates the FIFO access.

The difference between stacks and queues is in removing. In a stack we remove the most
recently added item; in a queue, we remove the least recently added item.

Application of Queues

A queue, as the name suggests, is used whenever we need to manage a group of objects
in an order in which the first one coming in also gets out first, while the others wait for their turn,
as in the following scenarios:

1. Serving requests on a single shared resource, such as a printer or the CPU (task scheduling).

2. In a real-life scenario, call center phone systems use queues to hold callers in an order,
until a service representative is free.

3. Handling of interrupts in real-time systems. The interrupts are handled in the same
order as they arrive, i.e., first come, first served.

Queue Representation and Implementation

A queue can be implemented using an array, stacks, or a linked list. The easiest way of
implementing a queue is by using an array.

Initially the head (FRONT) and the tail (REAR) of the queue point to the first index of the
array (with array indexing starting at 0). As we add elements to the queue, the tail keeps
moving ahead, always pointing to the position where the next element will be inserted, while the
head remains at the first index.

Figure 32 Implementation of Queue

When we remove an element from the queue, we can follow two possible approaches
(marked [A] and [B] in the diagram above). In approach [A], we remove the element at the head
position and then shift all the other elements forward by one position.

In approach [B], we remove the element from the head position and then move the head
to the next position.

Approach [A] carries the overhead of shifting the elements one position forward every
time we remove the first element.

Approach [B] has no such overhead, but every time we move the head one position ahead
after removing the first element, the usable size of the queue shrinks by one slot.

Circular Queue

In order to resolve the problem with [B] we can implement what we call a circular
queue. Given an array A of some default size (≥ 1), we keep two indices, back and front,
initially set to -1 and 0 respectively. Each time we insert (enqueue) a new item, we
increase the back index; when we remove (dequeue) an item, we increase the front index.
Here is a picture that illustrates the model after a few steps:

Figure 33 Circular Queue implementation

As you can see from the picture, the queue logically moves through the array from left to
right. After several moves, back reaches the end of the array, leaving no space for adding new
elements.

Figure 34 Circular queue (end of linear list reached)

However, there is free space before the front index. We shall use that space for
enqueueing new items: the next entry will be stored at index 0, then 1, and so on up to front.
Such a model is called a wrap-around queue or a circular queue.

Figure 35 Circular queue (wrap around)

Finally, when back reaches front, the queue is full. There are two choices for handling
a full queue: a) throw an overflow error; b) double the array size (increase storage space).

The circular queue implementation relies on the modulo operator (denoted %), which is
computed by taking the remainder of a division (for example, 8 % 5 is 3). By using the modulo
operator, we can view the queue as a circular array, where the "wrap around" is simulated as
"back % array_size". In addition to the back and front indices, we maintain another variable,
current, for counting the number of elements in the queue. Having this counter simplifies the
implementation logic.

Basic Operations

Queue operations may involve initializing or defining the queue, utilizing it, and then
completely erasing it from the memory. Here we shall try to understand the basic operations
associated with queues −

• enqueue() − add (store) an item to the queue.


• dequeue() − remove (access) an item from the queue.

A few more functions are required to make the above-mentioned queue operations efficient.
These are −

• peek() − Gets the element at the front of the queue without removing it.
• isfull() − Checks if the queue is full.
• isempty() − Checks if the queue is empty.

In a queue, we always dequeue (or access) the data pointed to by the front pointer, while
in enqueueing (or storing) data we make use of the rear pointer.

Let's first learn about the supportive functions of a queue −

peek()

This function helps to see the data at the front of the queue. The algorithm of peek()
function is as follows −

begin procedure peek

   return queue[front]
end procedure

Implementation of peek() function in C programming language −

int peek() {
return queue[front];
}

isfull()

As we are using a single-dimension array to implement the queue, we just check whether
the rear pointer has reached MAXSIZE - 1 (the last valid index) to determine that the queue is
full. If we maintain the queue in a circular linked list, the algorithm will differ. Algorithm of the
isfull() function −

begin procedure isfull

   if rear equals MAXSIZE - 1
      return true
   else
      return false
   endif
end procedure

Implementation of isfull() function in C programming language −

bool isfull() {
if(rear == MAXSIZE - 1)
return true;
else
return false;
}

isempty()

Algorithm of isempty() function −

begin procedure isempty

   if front is less than 0 OR front is greater than rear
      return true
   else
      return false
   endif

end procedure

If the value of front is less than 0, it tells us that the queue is not yet initialized, hence
empty. Here's the C programming code −

bool isempty() {
if(front < 0 || front > rear)
return true;
else
return false;
}

Enqueue Operation

Queues maintain two data pointers, front and rear. Therefore, queue operations are
comparatively more difficult to implement than those of stacks. The following steps should be
taken to enqueue (insert) data into a queue −

Step 1 − Check if the queue is full.
Step 2 − If the queue is full, produce an overflow error and exit.
Step 3 − If the queue is not full, increment the rear pointer to point to the next empty space.
Step 4 − Add the data element to the queue location where rear is pointing.
Step 5 − Return success.

Figure 36 Queue Insert Operation (Enqueue)

Sometimes, we also check to see if a queue is initialized or not, to handle any unforeseen
situations.

Algorithm for enqueue operation

procedure enqueue(data)
if queue is full
return overflow
endif

rear ← rear + 1
queue[rear] ← data
return true
end procedure

Implementation of enqueue() in C programming language −

int enqueue(int data) {
   if(isfull())
      return 0;

   rear = rear + 1;
   queue[rear] = data;

   return 1;
}

Dequeue Operation

Accessing data from the queue is a process of two tasks − access the data where front is
pointing, and remove the data after access. The following steps are taken to perform the dequeue
operation −

Step 1 − Check if the queue is empty.
Step 2 − If the queue is empty, produce an underflow error and exit.
Step 3 − If the queue is not empty, access the data where front is pointing.
Step 4 − Increment the front pointer to point to the next available data element.
Step 5 − Return success.

Figure 37 Queue Remove Operation (Dequeue)

Algorithm for dequeue operation

procedure dequeue
   if queue is empty
      return underflow
   endif

   data ← queue[front]
   front ← front + 1
   return data
end procedure

Implementation of dequeue() in C programming language −

int dequeue() {
if(isempty())
return 0;

int data = queue[front];


front = front + 1;

return data;
}

Watch:

• Introduction to Queues By mycodeschool (https://youtu.be/XuCbpw6Bj1U)


• Queues By CS50 (https://youtu.be/3TmUv1uS92s)
• Array implementation of queues By mycodeschool (https://youtu.be/okr-XE8yTO8)
• Linked list implementation of queues By mycodeschool (https://youtu.be/A5_XdiK4J8A)
• Circular Queue Implementation – Array By Blue Tree Code (https://youtu.be/8sjFA-IX-Ww)

Read:

• Section 3.7 The Queue ADT (pp. 112-116)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section 6.2 Queues (pp.238-247)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 1.3 Bags, Queues, and Stacks (pp. 120-157)


Sedgewick, Robert, Wayne, Kevin (2011). Algorithms 4th Edition

Review:

1. What is a queue? How is it different from stack?


2. How are queues implemented?
3. Give two queue operations. What do these operations do?
4. What are the different types of queues?
5. What is the advantage of using the linked list implementation of queues, as opposed to
the array implementation?

Activities/Assessments:
1. Suppose that Q is an initially empty array-based queue of size 5. Show the values of the
data members front and back after each statement has been executed. Indicate any errors
that might occur.

Queue<Character> Q( 5 ); front = _____ back = _____

Q.enqueue ( ‘A’ ); front = _____ back = _____

Q.enqueue ( ‘B’ ); front = _____ back = _____

Q.enqueue ( ‘C’ ); front = _____ back = _____

char c = Q.dequeue( ); front = _____ back = _____

Q.enqueue ( ‘A’ ); front = _____ back = _____

2. A letter means enqueue and an asterisk means dequeue in the following sequence. Give
the sequence of values returned by the dequeue operations when this sequence of
operations is performed on an initially empty FIFO queue.

E A S * Y * Q U E * * * S T * * * I O * N * * *

3. Suppose that a client performs an intermixed sequence of (queue) enqueue and dequeue
operations. The enqueue operations put the integers 0 through 9 in order onto the queue;
the dequeue operations print out the return value. Which of the following sequence(s)
could not occur?
a. 0 1 2 3 4 5 6 7 8 9
b. 4 6 8 7 5 3 2 9 0 1
c. 2 5 6 7 4 8 9 3 1 0
d. 4 3 2 1 0 5 6 7 8 9

Programming:
1. The Josephus problem is the following game: N people, numbered 1 to N, are sitting in a
circle. Starting at person 1, a hot potato is passed. After M passes, the person holding the
hot potato is eliminated, the circle closes ranks, and the game continues with the person
who was sitting after the eliminated person picking up the hot potato. The last remaining
person wins. Thus, if M = 0 and N = 5, players are eliminated in order, and player 5 wins.
If M = 1 and N = 5, the order of elimination is 2, 4, 1, 5.

a) Write a program to solve the Josephus problem for general values of M and N. Try
to make your program as efficient as possible. Make sure you dispose of cells.

b) What is the running time of your program?

c) If M = 1, what is the running time of your program? How is the actual speed affected
by the delete routine for large values of N (N > 100,000)?

Unit 7 - Trees

Overview:
We have already considered two data structures, the stack and the queue. Both are linear in
nature and are represented in memory either by a sequentially allocated vector (array) or by a
linked linear list. In this unit we focus on two important non-linear structures: the tree and the
binary tree.

Module Objectives:
After successful completion of this unit, you should be able to:

• Discuss what a tree is and its characteristics

• Discuss the different representations of a tree
• Discuss what a binary tree is and its characteristics
• Differentiate between a tree and a binary tree
• Demonstrate the conversion between a tree and a binary tree
• Discuss the concept of searching using a binary tree
• Implement your own tree and binary tree applications

Course Materials:

A tree is a nonlinear data structure and a special form of directed graph. It consists of a
finite set of elements called nodes (or vertices) and a finite set of directed arcs called branches
(or edges) that connect pairs of nodes, with the following properties:

a) there is a specially designated node called the root

b) the remaining nodes are partitioned into n >= 0 disjoint sets T1, …, Tn, where each of
these sets is a tree. T1, …, Tn are called the subtrees of the root.

Vertices of a tree which have successors are called ‘nonterminal vertices,’ while vertices
that have no successors are called ‘terminal vertices’ or ‘leaves.’ Figure 38 illustrates a tree with
root vertex a; two nonterminal vertices a, c; and four terminal vertices b, d, e, f.

Trees in which each nonterminal vertex has at most n successors are called ‘n-ary’ trees.
Trees in which each nonterminal vertex has at most two successors are called ‘binary’ trees. The
tree in Figure 38 is an example of an n-ary tree or a general tree.

        a
       / \
      b   c
        / | \
       d  e  f

Figure 38 A tree showing terminal and nonterminal vertices

Tree Representation

Tree structures may be indicated by parentheses, nesting, or indentation, as illustrated in
Figure 39, which shows alternative representations of the tree of Figure 38.

Parentheses:   (A(B)(C(D)(E)(F)))

Indentation:
   A
      B
      C
         D
         E
         F

Nesting: each subtree is drawn enclosed within the region of its parent.

Figure 39 Alternative representations of tree

Tree Vocabulary

Node: a component of the tree that holds data

Edge/branch: the line or connection between a parent and child (there may be a weight value
associated with the edge)

Root: the top element of the hierarchical structure; the starting element. There is only one root
per tree

Child: a node's immediate successor; a node may have zero or more successors

Parent: a node's predecessor; if node C is a child of node P, then P is a parent of C. A node has
exactly one parent

Leaf/Terminal: a node that has no children

Descendant: an extension of the child relationship; if C is a child of P, then C is a descendant
of P, and if C is a child of a descendant D, then C is a descendant of D

Ancestor: an extension of the parent relationship; if D is a descendant of A, then A is an
ancestor of D

Sibling: a child node that has the same parent as another node

Internal node/Non-terminal: a node that has at least one child

Path: the sequence of edges between a node and one of its descendants

Path length: the number of edges in a path

Depth or level: the length of the path connecting a node to the root. The root's depth or level is
0. A level of a tree consists of all nodes at the same depth. The depth of the tree is equal to the
height of the tree.

Height: the longest path length in the tree; the number of edges from a node to its deepest leaf.
The height of a tree is the height of the root

Subtree: the tree made of all descendants of some non-root node

Degree: the number of subtrees of a node; the number of children of a node

Degree of a tree: the maximum degree of the nodes in the tree

Forest: a set of n >= 0 disjoint trees. If you remove the root of a tree, you get a forest. If you
add a root to a forest, you get a tree.

N-ary Tree Traversals

Many algorithms pertaining to tree structures usually involve a process in which each node
of the tree is “visited”, or processed, exactly once. Such a process is called a traversal. A complete
traversal of a tree yields a linear arrangement of the nodes of the tree. Such a linear ordering can
be very useful in processing the information stored in the nodes of the tree.

There are various ways to visit (traverse or search) a general tree, but only the two most
common ones will be discussed here. The two simplest search techniques are known as Depth-
First Search (DFS) and Breadth-First Search (BFS). These two searches are described by looking
at how the search tree (representing all the possible paths from the start) will be traversed.

Depth-First Search with a Stack

In depth-first search we go down a path until we get to a dead end; then we backtrack or
back up (by popping the stack) to get an alternative path.

Create a stack
Create a new choice point
Push the choice point onto the stack
while (not found and stack is not empty)
   Pop the stack
   Find all possible choices after the last one tried
   Push these choices onto the stack
Return

Breadth-First Search with a Queue

In breadth-first search we explore all the nearest possibilities first by finding all possible
successors and enqueueing them to a queue.

Create a queue
Create a new choice point
Enqueue the choice point onto the queue
while (not found and queue is not empty)
   Dequeue the queue
   Find all possible choices after the last one tried
   Enqueue these choices onto the queue
Return

Binary Tree

The binary tree is by far the most important non-linear data structure in computer
algorithms. A binary tree is a finite set of nodes which is either empty or consists of a root node
and at most two disjoint binary trees, called the left and right subtrees of the root.

Note that while a tree must have at least one node, a binary tree may be empty. Likewise,
a tree may have any number of subtrees, while a binary tree can have at most two. The following
are examples of binary trees.

Figure 40 Different binary trees

Tree Vocabulary

Left subtree: the descendants to the left of a node

Right subtree: the descendants to the right of a node

Similar: two binary trees are said to be similar if they have the same structure

Equivalent: two binary trees are said to be equivalent if they are similar and contain the same
information

Figure 41 Tree1, Tree2, Tree3 are all similar; Tree1, Tree2 are equivalent

n-Ary Tree to Binary Tree Conversion

One issue with an n-ary tree is that it is quite difficult to search or traverse, as there can
be an uneven number of subtrees in the tree. A way to resolve this is to convert the n-ary tree into
a binary tree. With a binary tree we can be assured of the search or traversal, as there are only
two possible paths to take at each node. The following are the steps in converting an n-ary tree
to a binary tree.

Step 1 − The left (first) child of a node in the n-ary tree becomes the left child of the node in the
binary tree.
Step 2 − The next sibling of each node in the n-ary tree becomes the right child of the node in
the binary tree.

Figure 42 Convert an N-ary Tree to a Binary Tree

Types of Binary Trees (based on Structure)

• Rooted binary tree: It has a root node and every node has at most two children.

• Full binary tree: It is a tree in which every node has either 0 or 2 children.

The number of nodes, n, in a full binary tree is at least n = 2h + 1 and at most
n = 2^(h+1) - 1, where h is the height of the tree.

The number of leaf nodes, l, in a full binary tree is the number, L, of internal
nodes plus 1, i.e., l = L + 1.

Figure 43 Full Binary tree

• Perfect binary tree: It is a binary tree in which all interior nodes have two children and
all leaves have the same depth or level.

A perfect binary tree with l leaves has n = 2l - 1 nodes.

In a perfect binary tree, l = 2^h and n = 2^(h+1) - 1, where n is the number of
nodes, h is the height of the tree, and l is the number of leaf nodes.

Figure 44 A perfect binary tree

• Complete binary tree: It is a binary tree in which every level, except possibly the last,
is completely filled, and all nodes are as far left as possible.

Figure 45 An example of a complete binary tree

The number of internal nodes in a complete binary tree of n nodes is floor(n/2).

• Balanced binary tree: A binary tree is height-balanced if it satisfies the following
constraints:

a) The left and right subtrees' heights differ by at most one, AND
b) The left subtree is balanced, AND
c) The right subtree is balanced

An empty tree is height-balanced.

Figure 46 Balanced binary tree

The height of a balanced binary tree is O(log n), where n is the number of nodes. Further
discussion of balanced trees is presented in the next unit.

• Degenerate tree: It is a tree where each parent node has only one child node. It
behaves like a linked list. This type of tree is considered a skewed tree. A skewed tree
is one with more levels of nodes in one subtree of a node than in the other subtree.

Figure 47 A Degenerate tree (skewed to the right)

Binary Tree Traversals

Recall that tree traversal is a process to visit all the nodes of a tree, and it may print their
values too. Because all nodes are connected via edges (links), we always start from the root
(head) node; that is, we cannot randomly access a node in a tree. Generally, we traverse a tree
to search for or locate a given item or key in the tree, or to print all the values it contains.

In a binary tree, there are three different entities involved: a root, its left subtree, and its
right subtree. The sequence in which we process these entities defines a particular traversal
method.

We will consider three such traversal methods, namely: preorder traversal, inorder traversal, and
postorder traversal. These three methods are defined recursively. When a binary tree is empty, it
is traversed by doing nothing.

1. Preorder traversal

In this traversal method, the root node is visited first, then the left subtree and
finally the right subtree.

Figure 48 Pre Order Traversal

We start from A, and following pre-order traversal, we first visit A itself and then
move to its left subtree B. B is also traversed pre-order. The process goes on until all the
nodes are visited. The output of pre-order traversal of this tree will be −

A→B→D→E→C→F→G

Algorithm

Until all nodes are traversed −


Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.

2. Inorder (or symmetric order) traversal

In this traversal method, the left subtree is visited first, then the root and later the
right sub-tree. We should always remember that every node may represent a subtree
itself.

If a binary search tree (discussed later in this unit) is traversed in-order, the output will
produce its key values sorted in ascending order.

Figure 49 In Order Traversal

We start from A, and following in-order traversal, we move to its left subtree B. B
is also traversed in-order. The process goes on until all the nodes are visited. The output
of inorder traversal of this tree will be −

D→B→E→A→F→C→G

Algorithm

Until all nodes are traversed −


Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.

3. Postorder traversal

In this traversal method, the root node is visited last, hence the name. First, we
traverse the left subtree, then the right subtree and finally the root node.

Figure 50 Postorder Traversal

We start from A, and following Post-order traversal, we first visit the left subtree B.
B is also traversed post-order. The process goes on until all the nodes are visited. The
output of post-order traversal of this tree will be −

D→E→B→F→G→C→A

Algorithm

Until all nodes are traversed −


Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.

Construction of a Tree Using Traversal Information

We have seen in previous discussions how a tree can be presented in various forms. We
also have learned of the different traversal methods that one can use in passing through the
different nodes of the tree.

By using the traversal information, we can reconstruct a tree to its original form. To do
this, one will need two pieces of information: the in-order traversal, and the pre-order or post-
order traversal.

With the pre-order or post-order traversal information, one can easily identify the root. The
root is the first value or node in a pre-order traversal, and the last value or node in a post-order
traversal. With the in-order traversal, on the other hand, we can easily identify which nodes fall
on the left side or on the right side of the tree, given of course the root node (from the pre-order
or post-order traversal).

To present this clearly, let us consider the binary tree below with its corresponding pre-order,
post-order, and in-order traversal information. The tree will be our basis for checking whether
what we have reconstructed is correct or not.

        C
       / \
      B   G
     /   / \
    A   F   H
       /
      D
       \
        E

Traversal Information:
Pre-order:  C B A G F D E H
Post-order: A B E D F H G C
In-order:   A B C D E F G H

Figure 51 Tree reconstruction using traversal information
Sample 1: Using the pre-order and in-order traversal information in Figure 51, reconstruct the
binary tree.

Pre-order: CBAGFDEH

In-order: ABCDEFGH

First, we begin by analyzing the pre-order traversal information and identifying the root among
the values. Since in a pre-order traversal the root node is visited first, we identify the value C to
be the root node. We then draw the tree with the information we have, as seen below.

Figure 52 Root node

Using this information, we proceed with the second process: we look into the in-order
traversal information and identify the nodes on the left of node C and the nodes on the right of
node C. From here, we see that nodes A and B are found on the left of node C and thus form
the left subtree, while nodes D, E, F, G, and H are found on the right of node C, forming the
right subtree.

Let’s take the left subtree of root C first. Here we see that nodes A and B are part of the
left subtree. We need to identify which node is the root of the left subtree. For this, we go back
to the pre-order traversal and identify, among nodes A and B, the root node. Again, in the
pre-order traversal, the first node evaluated is always the root. Thus, we can say that node B is
the root of the left subtree. From the in-order traversal, we see that node A is at the left of node
B, so we draw the tree as seen below.

Figure 53 Forming the left subtree for root C

We now check the right subtree. From among nodes D, E, F, G, and H, we use the
pre-order traversal information to find the node that will serve as the root. Among these nodes,
G is at the front of the list, so G serves as the root of the right subtree. We now have a tree with
the roots of both its left and right subtrees, as seen in Figure 54.

        C
       / \
      B   G

Figure 54 Roots for the left-subtree and right-subtree of node C

Going back to the in-order traversal, we check the remaining nodes with respect to node
G. Here we can ascertain that nodes D, E, and F are found on the left side of G, whereas node H
is found at the right of G. We then update our tree, adding node H to the picture.

        C
       / \
      B   G
     /     \
    A       H

Figure 55 Updated tree with node H at the right of node G

We are now left with nodes D, E, and F. Going back to the pre-order traversal information,
we identify node F to be the root of the left subtree of node G. With that, we check back on the
in-order traversal and find nodes D and E at the left of node F, thus forming the nodes of the
left subtree of node F. The updated tree with node F added can be seen below.

        C
       / \
      B   G
     /   / \
    A   F   H

Figure 56 Updated tree with node F inserted.

Looking back at the pre-order traversal, we notice that node D comes before node E,
which means that node D is the root of the left subtree of node F. From the in-order traversal,
we can discern that node E is at the right of node D. The final tree is shown in Figure 57.

        C
       / \
      B   G
     /   / \
    A   F   H
       /
      D
       \
        E

Figure 57 Reconstructed tree from pre-order and in-order traversal information.

Expression Trees

An expression tree is a representation of expressions arranged in a tree-like data
structure. In other words, it is a tree whose leaves are the operands of the expression and whose
internal nodes contain the operators. Similar to other data structures, data interaction is also
possible in an expression tree. Expression trees are mainly used for analyzing, evaluating and
modifying expressions, especially complex expressions.

An expression tree represents an expression as a binary tree. Operators are parent nodes
of operands (leaf nodes); for example, the expression tree for 3 + ((5 + 9) * 2) would be:

Figure 58 Expression tree for 3 + ( (5 + 9) * 2)

The specified order of tree traversal gives a method for evaluating the expression. In-order
traversal of an expression tree produces the infix version of the given postfix expression
(likewise, pre-order traversal gives the prefix expression).

Evaluating the expression represented by an expression tree:

Let t be the expression tree
If t is not null then
   If t.value is an operand then
      Return t.value
   A = solve(t.left)
   B = solve(t.right)
   // calculate applies operator 't.value' on A and B, and returns the value
   Return calculate(A, B, t.value)

Construction of Expression Tree:

In constructing an expression tree we use a stack. We loop through the input (postfix)
expression and do the following for every character:

a) If the character is an operand, push it onto the stack.

b) If the character is an operator, pop two values from the stack, make them its
children, and push the resulting node back onto the stack.

At the end, the only element left on the stack will be the root of the expression tree.
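The construction and evaluation steps above can be sketched as follows. This is a minimal Python illustration assuming a space-separated postfix string as input; the names (`Node`, `build`, `evaluate`) are illustrative, not from the text.

```python
# Stack-based construction of an expression tree from postfix input,
# plus recursive evaluation, following the algorithms described above.
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build(postfix_tokens):
    stack = []
    for tok in postfix_tokens:
        if tok in OPS:                       # operator: pop two operands
            right, left = stack.pop(), stack.pop()
            stack.append(Node(tok, left, right))
        else:                                # operand: push as a leaf
            stack.append(Node(float(tok)))
    return stack.pop()                       # last element is the root

def evaluate(t):
    if t.left is None and t.right is None:   # leaf = operand
        return t.value
    return OPS[t.value](evaluate(t.left), evaluate(t.right))

# 3 + ((5 + 9) * 2) in postfix is: 3 5 9 + 2 * +
root = build("3 5 9 + 2 * +".split())
print(evaluate(root))  # 31.0
```

Note that the two pops are assigned right first, then left: the first value popped is the second operand, which matters for non-commutative operators like - and /.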

Binary Search Tree

A "binary search tree" (BST) or "ordered binary tree" is a type of binary tree where the
nodes are arranged in order: for each node, all elements in its left subtree are less-or-equal to the
node (<=), and all the elements in its right subtree are greater than the node (>). Recursively,
each of the subtrees must also obey the binary search tree constraint: in the (1, 3, 4) subtree (see
Figure 59), the 3 is the root, the 1 <= 3 and 4 > 3.

Figure 59 Example of a Binary Search Tree (BST)

The above properties of Binary Search Tree provide an ordering among keys so that the
operations like search, minimum and maximum can be done fast. If there is no ordering, then we
may have to compare every key to search a given key.

Tree Vocabulary

Immediate predecessor: the largest value in the left subtree. It is located by following
the left pointer once, then chasing right pointers until a node with a nil right pointer is
reached.

Immediate successor: the smallest value in the right subtree. It is located by following
the right pointer once, then chasing left pointers until a node with a nil left pointer is
reached.
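The two pointer-chasing procedures above can be sketched in code. This is an illustrative Python sketch; the `Node` class and the sample tree are assumptions, and both helpers assume the relevant subtree is non-empty.

```python
# Finding the immediate predecessor and successor of a node,
# following the vocabulary definitions above.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def immediate_predecessor(node):
    cur = node.left                  # step left once...
    while cur.right is not None:     # ...then chase right pointers
        cur = cur.right
    return cur

def immediate_successor(node):
    cur = node.right                 # step right once...
    while cur.left is not None:      # ...then chase left pointers
        cur = cur.left
    return cur

#        8
#      /   \
#     3     10
#      \      \
#       6      14
root = Node(8, Node(3, None, Node(6)), Node(10, None, Node(14)))
print(immediate_predecessor(root).key)  # 6
print(immediate_successor(root).key)    # 10
```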

Searching in BST

The search operation in a binary search tree is very similar to binary search. Say we
want to search for a number: we start at the root and compare the value being searched with
the value of the root. If they are equal, the search is done. If the search value is smaller, we
go to the left subtree, because in a binary search tree all the elements in the left subtree are
smaller and all the elements in the right subtree are greater; otherwise, we go to the right
subtree.

Searching for an element in a binary search tree is thus a traversal in which at each
step we go either left or right, and hence at each step we discard one of the subtrees. Call a
tree balanced if, for every node, the difference between the heights of its left and right
subtrees is no greater than one. In a balanced tree we start with a search space of n nodes,
and each time we discard a subtree we discard about n/2 nodes, so the search space is
reduced to n/2, then to n/4, and so on, until we find the element or the search space is
reduced to a single node. The search is therefore a binary search, which is why the structure
is called a binary search tree.

Algorithm

Searching until the element is found or the search fails:

Step 1 − Start from the root.
Step 2 − Compare the element being searched with the root; if it is less than
the root, recurse into the left subtree, else recurse into the right subtree.
Step 3 − If the element is found anywhere, return true; otherwise return false.
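The three steps above translate directly into a recursive function. This is a minimal Python sketch; the `Node` class and the sample tree (the (1, 3, 4) subtree of Figure 59) are illustrative.

```python
# Recursive BST search following the algorithm above.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    if node is None:
        return False                    # reached an empty subtree: not found
    if key == node.key:
        return True
    elif key < node.key:
        return search(node.left, key)   # smaller keys live on the left
    else:
        return search(node.right, key)  # larger keys live on the right

root = Node(3, Node(1), Node(4))        # the (1, 3, 4) subtree of Figure 59
print(search(root, 4))  # True
print(search(root, 2))  # False
```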

Insertion and Deletion of Nodes in a BST

A new key is always inserted at leaf. We start searching a key from root till we hit a leaf
node. Once a leaf node is found, the new node is added as a child of the leaf node.

The worst-case time complexity of search and insert operations is O(h) where h is height
of Binary Search Tree. In worst case, we may have to travel from root to the deepest leaf node.
The height of a skewed tree may become n and the time complexity of search and insert operation
may become O(n).

Algorithm

Inserting the key as a child of a leaf node:

Step 1 − Start from the root.
Step 2 − Compare the element being inserted with the root; if it is less than
the root, recurse into the left subtree, else recurse into the right subtree.
Step 3 − After reaching a leaf, insert the node as its left child (if less
than the leaf) or as its right child otherwise.

When we delete a node, three possibilities arise.

1. Node to be deleted is leaf: Simply remove from the tree.

Figure 60 BST - node deleted as a leaf

2. Node to be deleted has only one child: Copy the child to the node and delete the child.

Figure 61 BST - node deleted has one child

3. Node to be deleted has two children: Find inorder successor of the node. Copy
contents of the inorder successor to the node and delete the inorder successor. Note
that inorder predecessor can also be used.

Figure 62 BST - node deleted has two children

The important thing to note is that the inorder successor is needed only when the
right child is not empty. In that case, the inorder successor can be obtained by finding
the minimum value in the right subtree of the node.
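The insertion algorithm and the three deletion cases above can be sketched together. This is an illustrative Python sketch (names and sample keys are assumptions); deletion copies the inorder successor, as described above.

```python
# BST insert and delete following the cases above: insert at a leaf;
# delete handles (1) leaf, (2) one child, (3) two children via the
# inorder successor (minimum of the right subtree).
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return Node(key)                # hit a leaf position: attach here
    if key <= node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def delete(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:             # zero or one child: splice out
        return node.right
    elif node.right is None:
        return node.left
    else:                               # two children: copy inorder successor
        succ = node.right
        while succ.left is not None:    # minimum of the right subtree
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for k in [8, 3, 10, 6, 14]:
    root = insert(root, k)
root = delete(root, 8)                  # the root has two children here
print(inorder(root))  # [3, 6, 10, 14]
```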

Heap Tree

Heaps are based on the notion of a complete tree, for which we gave an informal definition
earlier. A heap is a binary tree with two special properties:

1. Ordering property

The value in a node is smaller than all the values in the node's subtrees (MinHeap
property). This is very different than a binary search tree! Note that this property makes
no distinction between ‘left’ and ‘right’. All the following trees have this property but
only one is a heap:

Figure 63 Ordering property - node is smaller than


all the values in the node's subtree

The smallest value must always be in the root, but the largest value could be in any
leaf. Similar values are not collected together in any convenient location. Incidentally,
heaps can also be defined with the opposite ordering property: the value in each node
is larger than all the values in its subtrees (MaxHeap property).

2. Structural property

All the levels are full, except perhaps the bottom level, and all the nodes in the bottom
level are as far to the ‘left’ as possible. A level is full if there are no ‘missing nodes’ at
the level - there are 2**L nodes in level L.

This structural property could be rephrased: the only nodes that can have an empty
subtree are ones in the bottom level (by definition these have 2 empty subtrees) and
the rightmost nodes in the next-to-bottom level.

A tree that has this property is called a complete tree. There are two motives for
restricting ourselves to complete trees:

a. In a complete tree Height = O(logN). As we'll see, our Heap operations are
O(Height), so by restricting ourselves to complete trees, we get O(logN)
operations.

b. complete trees have certain advantages over general binary trees in some
implementations.

Although all 3 of the Figure 63 trees have the ordering property, only one of them
satisfies the structural requirement – that is the middle one.

Operations on Heaps

The common operations on heaps are:

• Heapify → the process of rearranging the heap in order to maintain the heap property.

• Find-max (or Find-min) → find the maximum item of a max-heap, or the minimum item
of a min-heap, respectively.

• Insertion → add a new item to the heap.

• Deletion → delete an item from the heap.

• Extract Min-Max → return and delete the maximum or minimum element of a max-heap
or min-heap, respectively.

Heapify

It is a process to rearrange the elements of the heap in order to maintain the heap
property. It is done when a certain node causes an imbalance in the heap due to some
operation on that node. The heapify can be done in two methodologies:

• up_heapify() → follows the bottom-up approach. We check whether the nodes
satisfy the heap property while moving in the direction of the root node;
whenever a node violates the heap property, we perform the necessary
operations to restore it.

• down_heapify() → follows the top-down approach. We check whether the nodes
satisfy the heap property while moving in the direction of the leaf nodes;
whenever a node violates the heap property, we perform the necessary
operations to restore it.

void down_heapify(int heap[], int parent, int size)   // max-heap version
{
    largest = parent
    leftChild = 2*parent + 1
    rightChild = 2*parent + 2

    if(leftChild < size && heap[leftChild] > heap[largest])
        largest = leftChild
    if(rightChild < size && heap[rightChild] > heap[largest])
        largest = rightChild
    if(parent != largest)
    {
        swap(heap[parent], heap[largest])
        down_heapify(heap, largest, size)
    }
}

void up_heapify(int heap[], int child)
{
parent = (child-1)/2;
if(heap[parent] < heap[child]) {
swap(heap[parent], heap[child])
up_heapify(heap,parent)
}
}

Insertion

The insertion in the heap follows the following steps

• Insert the new element at the end of the heap.

• Since the newly inserted element can violate the heap property, we perform
the up_heapify() operation to restore the properties of the heap in a
bottom-up manner.

(a) (b) (c)

Figure 64 Insertion process in Heaps

As an example, consider heap tree (a) in Figure 64, which follows the max-heap
property. The new element to be inserted is 12.

Performing step 1, inserting the new element at the end, results in (b). The second
step is to heapify the new element following the bottom-up approach. The final heap is (c).

void up_heapify(int heap[], int child)
{
    parent = (child-1)/2;
    if(heap[parent] < heap[child]) {
        swap(heap[parent], heap[child])
        up_heapify(heap, parent)
    }
}

void insert(int heap[], int size, int key)
{
    heap.append(key)          // new element lands at index 'size'
    up_heapify(heap, size)    // restore the heap property bottom-up
}

Deletion

The deletion operations follow the following step:

• Replace the element to be deleted by the last element in the heap.


• Delete the last item from the heap.
• Now that the last element has been moved to a new position in the heap, it
may violate the heap property, so we perform the down_heapify() operation to
restore the heap structure. The down_heapify() operation restores the heap
in a top-down manner.

The standard deletion on Heap is to delete the element present at the root node of
the heap.

(a) (b) (c)

Figure 65 Deletion process in Heaps

For example, consider heap tree (a) in Figure 65, which follows the max-heap
property. The element to be deleted is 12. Replacing the root with the last element, 4, and
deleting that last element results in (b). Since the resulting tree does not satisfy the heap
property, we perform down_heapify() on the new root, producing the final heap tree (c).

void down_heapify(int heap[], int parent, int size)
{
    largest = parent
    leftChild = 2*parent + 1
    rightChild = 2*parent + 2

    if(leftChild < size && heap[leftChild] > heap[largest])
        largest = leftChild
    if(rightChild < size && heap[rightChild] > heap[largest])
        largest = rightChild
    if(parent != largest)
    {
        swap(heap[parent], heap[largest])
        down_heapify(heap, largest, size)
    }
}

void deleteRoot(int heap[], int size)
{
    swap(heap[0], heap[size-1])   // swap first and last element
    heap.pop_back()               // delete the last element
    down_heapify(heap, 0, size-1)
}

Find-max (or Find-min)

The maximum element of a max-heap and the minimum element of a min-heap
are found at the root node of the heap.

int findMax(int heap[])
{
    return heap[0]
}

Extract Min-Max

This operation returns and deletes the maximum or minimum element in max-heap
and min-heap respectively. The maximum element is found at the root node.

int extractMax(int heap[], int size)
{
    ans = heap[0]
    deleteRoot(heap, size)
    return ans
}
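The pseudocode above can be collected into a runnable sketch. This is an illustrative Python version of the array-based max-heap operations (the function names mirror the pseudocode; the sample keys are assumptions).

```python
# Array-based max-heap: children of index i are at 2i+1 and 2i+2.
def down_heapify(heap, parent, size):
    largest = parent
    left, right = 2 * parent + 1, 2 * parent + 2
    if left < size and heap[left] > heap[largest]:
        largest = left
    if right < size and heap[right] > heap[largest]:
        largest = right
    if largest != parent:                     # a child outranks the parent
        heap[parent], heap[largest] = heap[largest], heap[parent]
        down_heapify(heap, largest, size)

def up_heapify(heap, child):
    parent = (child - 1) // 2
    if child > 0 and heap[parent] < heap[child]:
        heap[parent], heap[child] = heap[child], heap[parent]
        up_heapify(heap, parent)

def insert(heap, key):
    heap.append(key)                          # step 1: add at the end
    up_heapify(heap, len(heap) - 1)           # step 2: restore the property

def extract_max(heap):
    ans = heap[0]                             # the maximum is at the root
    heap[0] = heap[-1]                        # move the last element up
    heap.pop()
    down_heapify(heap, 0, len(heap))
    return ans

h = []
for k in [4, 8, 12, 7, 1]:
    insert(h, k)
print(extract_max(h))  # 12
print(extract_max(h))  # 8
```

Flipping the `>` and `<` comparisons in the two heapify functions turns this into a min-heap.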

Watch:

• Introduction to Trees By CS Dojo (https://youtu.be/1-l_UOFi1Xw)


• Introduction to Trees By mycodeschool (https://youtu.be/qH6yxkw0u78)
• Binary Search Tree By mycodeschool (https://youtu.be/pYT9F8_LFTM)
• Deleting a node in a BST By mycodeschool (https://youtu.be/gcULXE7ViZw)
• Binary tree traversal - breadth-first and depth-first strategies By mycodeschool
(https://youtu.be/9RHO6jU--GU)
• Binary tree traversal – Preorder, Inorder, Postorder By mycodeschool
(https://youtu.be/gm8DUJJhmY4)
• Binary Expression Trees By Miran Fattah (https://youtu.be/_LxbhLNRZkI)
• Introduction to Binary Heaps By Algorithms with Attitude
(https://youtu.be/WCm3TqScBM8)
• Heaps (MinHeaps) By HackerRank (https://youtu.be/t0Cq6tVNRBA)

Read:

• Chapter 8 Trees
Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Chapter 4 – Trees (Sec. 4.1 – 4.3, pp. 121-144)
Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section. 4.6 Tree Traversals (pp. 166-168)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section 9.3 Heaps (pp. 370-384)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

Review:

1. What is a tree?
2. What is a binary tree? How is it different from a tree?
3. What is an advantage of a binary search tree over a binary tree?
4. How to check if a given Binary Tree is BST or not?
5. What is a heap?
6. Describe the difference between a Max Heap over a Min Heap.

Activities/Assessments:
1. The following questions refer to the tree below.
a. Which node is the root?
b. What are the internal nodes?
c. How many descendants does node cs016/ have?
d. How many ancestors does node cs016/ have?
e. What are the siblings of node homeworks/?
f. Which nodes are in the subtree rooted at node projects/?
g. What is the depth of node papers/?
h. What is the height of the tree?

2. Draw a sketch of the logical representation for the following tree.

3. Draw a sketch of the logical representation for the following tree.

4. Draw a sketch of the logical representation for the following tree.

5. Draw a sketch of the logical representation for the following tree.

6. Given the pre-order traversal and in-order traversal of a tree, reconstruct the tree.
Pre: Y O C P H I R G T
In: COPYRIGHT

7. Given the post-order traversal and in-order traversal of a tree, reconstruct the tree.
Post: C O P Y R I G H T
In: COYPTRGIH

8. Draw the binary tree representation of the following arithmetic expression:


(((5+2) ∗ (2−1))/((2+9)+((7−2)−1)) ∗8)

9. Draw the structure of a binary search tree
a. after these values have been inserted: 19, 34, 23, 16, 54, 89, 24, 29, 15, 61, 27.
b. after two delete operations (deleting the root)
10. Draw the structure of a binary heap (MinHeap):
a. after these priorities have been inserted: 19, 34, 23, 16, 54, 89, 24, 29, 15, 61, 27.
b. after two deleteMin operations (using the tree in (a)).
Programming:
1. Write a program that implements BINARY TREE TRAVERSALS using preorder traversal,
inorder traversal, and postorder traversal to produce a linear arrangement of the nodes.
Program input consists of the root nodes and its left and right subtrees.

The output should list the following information:


1. the left and right subtrees of each node (show null if none)
2. the root of the tree
3. preorder traversal
4. inorder traversal
5. postorder traversal

Example

Input:
(A,B,E) (this means A is the root, B is the left subtree, and E is the right subtree)
(B,C,null) (this means B is the root, C is the left subtree, and there is no right subtree)
(C,null,D)
(E,F,H)
(F,null,G)
(H,I,J)

Output:
Node L-Subtree R-Subtree
A B E
B C null
C null D
E F H
F null G
H I J

Root of the Tree: A

Preorder Traversal: A B C D E F G H I J
Inorder Traversal: C D B A F G E I H J
Postorder Traversal: D C B G F I J H E A

Unit 8 – Balanced Trees

Overview:
The search time in a binary search tree depends on the form of the tree, that is, on the order
in which its nodes were inserted. In practice, however, one cannot always guarantee that binary
search trees are built randomly, so binary search trees are only attractive if they stay relatively
complete. We must therefore look for specializations of binary search trees whose worst-case
performance on the basic tree operations can be guaranteed to be good, that is, O(log2 n) time. In
this unit we discuss balanced trees as a solution to the search-time problem. In particular, we
look into the balance properties of binary search trees and m-way search trees.

Module Objectives:
After successful completion of this unit, you should be able to:

• Describe what a balanced tree is


• Discuss AVL tree and its characteristics
• Discuss B-tree and its characteristics

Course Materials:

The Binary Search Tree has a serious deficiency for practical use as a search structure.
That is the fact that it can easily become unbalanced, so that some nodes are deep in the tree. In
fact, it is possible for a BST with n nodes to have a depth of n, making it no faster to search in the
worst case than a linked list. If we could keep the tree balanced in some way, then search cost
would only be O(log2n), a huge improvement.

Balanced Tree is a tree where no leaf is much farther away from the root than any other
leaf. Different balancing schemes allow different definitions of "much farther" and different
amounts of work to keep them balanced:

1. A binary tree is height-balanced if for each node the heights of its subtrees differ by
at most 1. (Remember, the height of an empty subtree is -1).
2. A binary tree is weight-balanced if for each node the numbers of inner nodes in its
left subtree and right subtree differ by at most 1.

The tree below is height-balanced. It is not weight-balanced because the left subtree of
the root has 3 inner nodes but the right subtree of the root has only 1. It has been shown that a
weight-balanced tree is also height-balanced.

Figure 66 Height-balanced Tree

The key phrase in both definitions is "for each node". The property being described must
hold for all nodes, not just the root. To see the need for "for each node", consider a tree whose
root has a copy of the tree in Figure 66 as its left subtree and another copy as its right subtree. It
is not weight-balanced, even though the left and right subtrees of the root have the same number
of inner nodes, because the copies themselves violate the weight-balance condition.

The important data structure called a Binary Search Tree, or BST, was invented in order
to make searching for a value more efficient, say, than searching in an unordered array. However,
if the BST tree is unbalanced then search is not efficient. It is difficult and inefficient to keep a BST
balanced as values are added and deleted.

Therefore, computer scientists looked for ways to enhance the BST data structure in order
to make balancing more efficient as values are added and deleted. This required modifying the
BST, by adding more information to each node or by allowing a node to have more than two
children. Among the results were the AVL tree, the 2-3 tree, the B-tree, the red-black tree, and many more.

With each such new tree data structure, the inventor had to say what property they meant
by balanced. The goal was the same, of course: if the tree was balanced according to the
definition, then the inventors could prove that the height of the tree was O(log size-of-tree).

Height Balanced Trees

A height-balanced tree (or self-balancing tree) is a binary tree that automatically maintains
its height, and the heights of its subtrees, on each insertion and deletion of a node, keeping the
tree close to complete. AVL trees and red-black trees are examples of height-balanced trees. An
AVL tree uses a balance factor to balance the height of the tree; the balance factor is simply the
difference between the heights of the left subtree and the right subtree.

AVL Tree

The AVL tree is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis,
who published it in their 1962 paper "An algorithm for the organization of information." As
the name suggests, AVL trees are used for organizing information. They are not perfectly
balanced, but they maintain O(log2 n) search, insertion, and deletion times, where n is the
number of nodes in the tree.

An AVL tree is a binary search tree where the sub-trees of every node differ in
height by at most 1.

Figure 67 Example of an AVL tree

A tree is perfectly height-balanced if the left and right subtrees of every node are the
same height, for example:

Figure 68 Perfectly height-balanced tree

It is clear that at every level there are twice as many nodes as at the previous level,
so we do indeed get H = O(logN). However, perfect height balance is very rare: it is only
possible if there are exactly 2^H-1 nodes.

As a practical alternative, we use trees that are `almost' perfectly height balanced.
We will say that a tree is height-balanced if the heights of the left and right subtrees of
each node differ by at most 1. The following tree fits this definition:

Figure 69 Height balanced tree

How can we tell if a tree is height-balanced? We have to check every node. The
fastest way is to start at the leaves and work your way up. When you reach a node, you
will know the heights of its two subtrees; from that you can tell whether it is height-balanced
and also compute the node's height (for use by its parent). For example:

(a) (b) (c)


Figure 70 Checking for height balance

a. C and D are leaves. Their subtrees are all height 0 so C and D are both
perfectly balanced. Having finished D we can compute the heights of B's
subtrees.
b. B is not perfectly balanced, but the heights of its subtrees differ only by 1, so B
is regarded as height balanced. Now we can compute the balance of A.
c. Like B, A's two subtrees differ by 1 in height. We have now looked at every
node; every node is height-balanced, so the tree as a whole is considered to
be height-balanced.
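The bottom-up check described above can be sketched in code. This is an illustrative Python sketch; trees are represented as (label, left, right) tuples, and the sample trees are assumptions.

```python
# Bottom-up height-balance check: each call returns the subtree's height,
# or -2 as a sentinel meaning "some node below is unbalanced".
def check(node):
    if node is None:
        return -1                    # an empty subtree has height -1
    lh, rh = check(node[1]), check(node[2])
    if lh == -2 or rh == -2 or abs(lh - rh) > 1:
        return -2                    # imbalance here or somewhere below
    return 1 + max(lh, rh)

def is_height_balanced(node):
    return check(node) != -2

# Trees as (label, left, right) tuples; shapes are illustrative.
balanced = ('A', ('B', ('C', None, None), ('D', None, None)), ('E', None, None))
skewed = ('A', ('B', ('C', None, None), None), None)
print(is_height_balanced(balanced))  # True
print(is_height_balanced(skewed))    # False
```

Because heights are computed as the recursion unwinds, every node is checked exactly once, just as in the leaves-upward procedure described above.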

Insertion in an AVL Tree

Insertion in an AVL tree is the same as in a binary search tree which is always
done by expanding an external node. To make sure that the given tree remains AVL after
every insertion, we must augment the standard BST insert operation to perform some re-
balancing. Following are two basic operations that can be performed to re-balance a BST
without violating the BST property (keys(left) < key(root) < keys(right)).

1) Left Rotation
2) Right Rotation

A tree rotation can be an intimidating concept at first. You end up in a situation
where you're juggling nodes, and these nodes have trees attached to them, and it can all
become confusing very fast. It helps to block out what's going on with any of the subtrees
which are attached to the nodes you're working with.

Left Rotation (LL)

Imagine we have this situation: a chain in which A's right child is B and B's
right child is C.

Figure 71 Before Left Rotation (Single)

To fix this, we must perform a left rotation, rooted at A. This is done in the
following steps:

• b becomes the new root.
• a takes ownership of b's left child as its right child; in this case, that child is null.
• b takes ownership of a as its left child.

The tree now looks like this: b at the root, with a as its left child and c as its
right child.

Figure 72 After performing a Left Rotation (Single)

Right Rotation (RR)

A right rotation is a mirror of the left rotation operation described above.
Imagine we have this situation: a chain in which C's left child is B and B's left
child is A.

Figure 73 Before Right Rotation (Single)

To fix this, we will perform a single right rotation, rooted at C. This is done
in the following steps:

• b becomes the new root.
• c takes ownership of b's right child as its left child; in this case, that child is null.
• b takes ownership of c as its right child.

The resulting tree: b at the root, with a as its left child and c as its right child.

Figure 74 After performing a Right Rotation (Single)

Left-Right Rotation (LR) or "Double left"

Sometimes a single left rotation is not sufficient to balance an unbalanced
tree. Take this situation: (a) a balanced tree with root A and right child C; (b) after
inserting node B as C's left child, the tree is unbalanced; (c) after a single left
rotation, the tree is still unbalanced.

Figure 75 Unbalanced tree after a Left Rotation (Single)

Initially with (a) we have a balanced tree. From this tree we insert node ‘B’,
the resulting tree is shown in (b) which is an unbalanced tree. An initial reaction
here is to do a single left rotation but the resulting tree (c) is still unbalanced. If we
were to do a single right rotation in this situation, we would be right back where we
started. What's causing this?

The answer is that this is a result of the right subtree having a negative
balance. In other words, because the right subtree was left heavy, our rotation was
not sufficient. What can we do? The answer is to perform a right rotation on the
right subtree. Read that again. We will perform a right rotation on the right subtree.
We are not rotating on our current root. We are rotating on our right child. Think of
our right subtree, isolated from our main tree, and perform a right rotation on it:

[Tree diagrams: (a) A with right child C, which has left child B; (b) after the right rotation on the right subtree, A with right child B, which has right child C; (c) after the left rotation at the root, B with children A and C]

Figure 76 Left-Right Rotation (or Double Left Rotation)

Image (a) is the unbalanced tree that results from adding 'B' to a balanced
tree. Performing a right rotation on the right subtree results in (b). After
performing the rotation on our right subtree, we have prepared our root to be rotated
left. The final tree is represented by (c).

Right-Left Rotation (RL) or "Double right"

A double right rotation, or right-left rotation, or simply RL, is a rotation that
must be performed when attempting to balance a tree whose left subtree is right-heavy.
This is the mirror of the operation illustrated in the section on Left-Right
Rotations, or double left rotations.

Let's look at an example of a situation where we need to perform a Right-Left
rotation.

[Tree diagrams: (a) C with left child A, which has right child B; (b) the still-unbalanced tree after a single right rotation; (c) after a left rotation on the left subtree, C with left child B, which has left child A; (d) after the right rotation at the root, B with children A and C]

Figure 77 Right-Left Rotation (or Double Right Rotation)

In this situation, we have a tree (a) that is unbalanced: the left subtree has
a height of 2 and the right subtree has a height of 0, making the balance factor of
our root node, c, equal to -2. What do we do? Some kind of right rotation is clearly
necessary, but a single right rotation will not solve our problem: performing one
results in a tree (b) with a balance factor of 2. It would appear that we did not
accomplish much. Let's go back to the original tree, before we did our pointless
right rotation.

The reason our right rotation did not work is that the left subtree, 'a', has a
positive balance factor, and is thus right-heavy. Performing a right rotation on a
tree whose left subtree is right-heavy will result in the problem we just witnessed.
What do we do? The answer is to make our left subtree left-heavy. We do this by
performing a left rotation on our left subtree. Doing so leaves us with (c).

This is a tree which can now be balanced using a single right rotation. We
can now perform our right rotation rooted at C, resulting in (d).

How to decide when you need a tree rotation is usually easy, but determining which
type of rotation you need requires a little thought.

A tree rotation is necessary when you have inserted or deleted a node which
leaves the tree in an unbalanced state. An unbalanced state is defined as a state in which
any subtree has a balance factor of greater than 1, or less than -1. That is, any tree with
a difference between the heights of its two subtrees greater than 1, is considered
unbalanced.
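The two single rotations, and their composition in the LR case, can be sketched in code. This is an illustrative Python sketch; a full AVL implementation would also maintain stored heights and balance factors, which are omitted here.

```python
# The two single rotations described above. Each returns the new
# subtree root so the caller can relink it.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(a):
    b = a.right              # b becomes the new root
    a.right = b.left         # a adopts b's left child as its right child
    b.left = a               # b takes ownership of a as its left child
    return b

def rotate_right(c):
    b = c.left               # mirror image of rotate_left
    c.left = b.right
    b.right = c
    return b

# The LR case of Figure 76: the right subtree is left heavy, so first
# rotate the right subtree right, then rotate the root left.
root = Node('A', None, Node('C', Node('B'), None))
root.right = rotate_right(root.right)
root = rotate_left(root)
print(root.key, root.left.key, root.right.key)  # B A C
```

The RL case is the mirror image: rotate the left subtree left, then rotate the root right.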

Deletion in an AVL Tree

Deleting a node from an AVL tree is similar to deletion in a binary search tree. Deletion
may disturb the balance factor of an AVL tree, so the tree may need to be rebalanced in
order to maintain the AVL property. The same rotation rules are followed as for inserting a
node in an AVL tree. Note, however, that unlike insertion, fixing one node will not
necessarily fix the complete AVL tree: after fixing the unbalanced subtree, we may have to
fix its ancestors as well.

B-Trees

M-way Search Trees

A binary search tree has one value in each node and two subtrees. This notion
easily generalizes to an M-way search tree, which has (M-1) values per node and M
subtrees. M is called the degree of the tree. A binary search tree, therefore, has degree
2.

In an M-way subtree a node can have anywhere from 1 to (M-1) values, and the
number of (non-empty) subtrees can range from 0 (for a leaf) to 1+(the number of values).
M is thus a fixed upper limit on how much data can be stored in a node.

The values in a node are stored in ascending order, V1 < V2 < ... < Vk (k <= M-1),
and the subtrees are placed between adjacent values, with one additional subtree at each
end. We can thus associate with each value a `left' and `right' subtree, with the right
subtree of Vi being the same as the left subtree of V(i+1). All the values in V1's left subtree
are less than V1; all the values in Vk's right subtree are greater than Vk; and all the values in
the subtree between V(i) and V(i+1) are greater than V(i) and less than V(i+1).

Figure 78 A 3-way Search Tree

Searching in an M-way Search Tree

The algorithm for searching for a value in an M-way search tree is the obvious
generalization of the algorithm for searching in a binary search tree. If we are searching
for value X and currently at node consisting of values V1...Vk, there are four possible
cases that can arise:

1. If X < V1, recursively search for X in V1's left subtree.
2. If X > Vk, recursively search for X in Vk's right subtree.
3. If X = Vi for some i, then we are done (X has been found).
4. The only remaining possibility is that, for some i, Vi < X < V(i+1). In this case,
recursively search for X in the subtree that is in between Vi and V(i+1).

For example, suppose we were searching for 68 in the tree in Figure 78. At the
root, case (2) would apply, so we would continue the search in V2's right subtree. At the
root of this subtree, case (4) applies, 68 is between V1=55 and V2=70, so we would
continue to search in the subtree between them. Now case (3) applies, 68=V2, so we are
done. If we had been searching for 69, exactly the same processing would have occurred
down to the last node. At that point, case (2) would apply, but the subtree we want to
search in is empty. Therefore we conclude that 69 is not in the tree.
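The four-case search can be sketched as follows. This is an illustrative Python sketch in which a node is a pair (values, children) with one more child than values; the sample 3-way tree is an assumption built in the spirit of the 68/69 example above, not a copy of Figure 78.

```python
# M-way search tree lookup: scan the node's sorted values, descending
# into the subtree between the values that bracket x (cases 1 and 4),
# or the rightmost subtree when x exceeds every value (case 2).
def mway_search(node, x):
    if node is None:
        return False                           # empty subtree: not found
    values, children = node
    for i, v in enumerate(values):
        if x == v:
            return True                        # case 3: found
        if x < v:
            return mway_search(children[i], x)  # cases 1 and 4
    return mway_search(children[-1], x)        # case 2: x > last value

# A small 3-way tree; each leaf has only empty (None) subtrees.
leaf = lambda *vs: (list(vs), [None] * (len(vs) + 1))
root = ([20, 50],
        [leaf(5, 10),
         leaf(30, 40),
         ([55, 70], [None, leaf(60, 68), leaf(80)])])
print(mway_search(root, 68))  # True
print(mway_search(root, 69))  # False
```

Searching for 69 follows exactly the same path as 68 until the last node, where the subtree it would descend into is empty, so the search reports failure.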

The other algorithms for binary search trees - insertion and deletion - generalize in
a similar way. As with binary search trees, inserting values in ascending order will result
in a degenerate M-way search tree; i.e. a tree whose height is O(N) instead of O(logN).
This is a problem because all the important operations are O(height), and it is our aim to
make them O(logN). One solution to this problem is to force the tree to be height-balanced;
we sketched this last lecture for binary search trees and now we will examine it in detail
for M-way search trees.

B-Trees as a balanced M-way Search Tree

B-Tree is a self-balancing search tree. Most other self-balancing search trees
(like AVL and Red-Black Trees) assume that everything is in main memory. To understand
the use of B-Trees, we must think of the huge amounts of data that cannot fit in main
memory. When the number of keys is high, the data is read from disk in the form of blocks.
Disk access time is very high compared to main memory access time, so the main idea of
using B-Trees is to reduce the number of disk accesses. Most tree operations (search,
insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree.
The B-tree is a "fat" tree: its height is kept low by putting the maximum possible number of
keys in a B-Tree node, and the node size is generally kept equal to the disk block size.
Since the height of the B-tree is low, the total number of disk accesses for most operations
is significantly reduced compared to balanced binary search trees like the AVL tree or the
Red-Black Tree.

A B-tree is an M-way search tree with two special properties:

1) It is perfectly balanced: every leaf node is at the same depth.
2) Every node, except perhaps the root, is at least half-full, i.e., contains M/2 or
more values (of course, it cannot contain more than M-1 values). The root may
have any number of values (1 to M-1).

(a) (b)

Figure 79 A 3-way search tree; only (b) is a B-tree

Figure 80 5-way B-tree

Insertion into a B-tree

There are 3 steps to insert value X into a B-tree:

1. Using the SEARCH procedure for M-way trees find the leaf node to which X
should be added.

2. Add X to this node in the appropriate place among the values already there.
Being a leaf node there are no subtrees to worry about.
3. If there are M-1 or fewer values in the node after adding X, then we are finished.

If there are M values in the node after adding X, we say the node has overflowed. To repair
this, we split the node into three parts:

Left: the first (M-1)/2 values
Middle: the middle value (at position 1 + (M-1)/2)
Right: the last (M-1)/2 values

Notice that Left and Right have just enough values to be made into individual
nodes. That's what we do... they become the left and right children of Middle, which we
add in the appropriate place in this node's parent.

But what if there is no room in the parent? If it overflows we do the same thing
again: split it into Left-Middle-Right, make Left and Right into new nodes and add Middle
(with Left and Right as its children) to the node above. We continue doing this until no
overflow occurs, or until the root itself overflows. If the root overflows, we split it, as usual,
and create a new root node with Middle as its only value and Left and Right as its children
(as usual).
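To make the split step concrete, here is a minimal Python sketch of splitting a single overflowed node (the function name and plain-list representation are our own illustration, not from the text; a full implementation would also attach Left and Right as children of Middle in the parent node):

```python
def split_overflowed_node(values, M):
    """Split a node that has overflowed to M values into (Left, Middle, Right).

    Left gets the first (M-1)//2 values, Middle is the single value after
    them, and Right gets the last (M-1)//2 values.
    """
    assert len(values) == M, "split only applies to an overflowed node"
    half = (M - 1) // 2
    left = values[:half]        # first (M-1)/2 values
    middle = values[half]       # the middle value, at position 1 + (M-1)/2
    right = values[half + 1:]   # last (M-1)/2 values
    return left, middle, right

# Inserting 6 into the leaf [2 3 5 7] of a 5-way B-tree overflows it:
print(split_overflowed_node([2, 3, 5, 6, 7], 5))  # ([2, 3], 5, [6, 7])
```

This reproduces the "Insert 6" example below: Left = [2 3], Middle = 5, Right = [6 7].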

For example, let's do a sequence of insertions into this B-tree (M=5, so each node other
than the root must contain between 2 and 4 values):

Figure 81 B-tree with M=5

Insert 17: Add it to the middle leaf. No overflow, so we're done.

Figure 82 B-tree with M=5 after inserting 17

Insert 6: Add it to the leftmost leaf. That overflows, so we split it:

• Left = [ 2 3 ]
• Middle = 5
• Right = [ 6 7 ]

Left and Right become nodes; Middle is added to the node above with Left and
Right as its children.

Figure 83 B-tree with M=5 after inserting 6

The node above (the root in this small example) does not overflow, so we are done.

Insert 21: Add it to the middle leaf. That overflows, so we split it:

• Left = [ 17 21 ]
• Middle = 22
• Right = [ 44 45 ]

Left and Right become nodes; Middle is added to the node above with Left and
Right as its children.

Figure 84 B-tree with M=5 after inserting 21

The node above (the root in this small example) does not overflow, so we are done.

Insert 67: Add it to the rightmost leaf. That overflows, so we split it:

• Left = [ 55 66 ]
• Middle = 67
• Right = [ 68 70 ]

Left and Right become nodes; Middle is added to the node above with Left and
Right as its children.

Figure 85 B-tree with M=5 after inserting 67

But now the node above does overflow. So it is split in exactly the same manner:

• Left = [ 5 10 ] (along with their children)
• Middle = 22
• Right = [ 50 67 ] (along with their children)

Left and Right become nodes, the children of Middle. If this were not the root,
Middle would be added to the node above and the process repeated. If there is no node
above, as in this example, a new root is created with Middle as its only value.

Figure 86 B-tree with M=5 after splitting

The tree-insertion algorithms we've previously seen add new nodes at the bottom
of the tree, and then have to worry about whether doing so creates an imbalance. The B-
tree insertion algorithm is just the opposite: it adds new nodes at the top of the tree (a new
node is allocated only when the root splits). B-trees grow at the root, not at the leaves.
Because of this, there is never any doubt that the tree is always perfectly height balanced:
when a new node is added, all existing nodes become one level deeper in the tree.

Deleting Value from a B-tree

Recall our deletion algorithm for binary search trees: if the value to be deleted is
in a node having two subtrees, we would replace the value with the largest value in its left
subtree and then delete the node in the left subtree that had contained the largest value
(we are guaranteed that this node will be easy to delete).

We will use a similar strategy to delete a value from a B-tree. If the value to be
deleted does not occur in a leaf, we replace it with the largest value in its left subtree and
then proceed to delete that value from the node that originally contained it. For example,
if we wished to delete 67 from the tree in Figure 86, we would find the largest value in
67's left subtree, 66, replace 67 with 66, and then delete the occurrence of 66 in the left
subtree. In a B-tree, the largest value in any value's left subtree is guaranteed to be in
a leaf. Therefore, wherever the value to be deleted initially resides, the following deletion
algorithm always begins at a leaf.

To delete value X from a B-tree, starting at a leaf node, there are 2 steps:

1. Remove X from the current node. Being a leaf node there are no subtrees to
worry about.
2. Removing X might cause the node containing it to have too few values.

Recall that we require the root to have at least 1 value in it and all other nodes to
have at least (M-1)/2 values in them. If the node has too few values, we say it has
underflowed.

If underflow does not occur, then the deletion process is finished. If it does
occur, it must be fixed. The process for fixing a root is slightly different from the process
for fixing the other nodes.

How do we fix a non-root node that has underflowed? Let us take as a specific
example, from this B-tree (of degree 5):

Figure 87 B-tree with degree 5 (M=5)

Removing 6 causes the node it is in to underflow, as it now contains just 1 value
(7). Our strategy for fixing this is to try to 'borrow' values from a neighboring node. We join
together the current node and its more populous neighbor to form a 'combined node' - and
we must also include in the combined node the value in the parent node that is in between
these two nodes.

In this example, we join node [7] with its more populous neighbor [17 22 44 45]
and put ‘10’ in between them, to create

[7 10 17 22 44 45]

How many values might there be in this combined node?

• The parent node contributes 1 value.
• The node that underflowed contributes exactly (M-1)/2 - 1 values.
• The neighboring node contributes somewhere between (M-1)/2 and (M-1)
values.

The treatment of the combined node differs depending on whether the
neighboring node contributes exactly (M-1)/2 values or more than this number.

Case 1: Suppose that the neighboring node contains more than (M-1)/2 values. In
this case, the total number of values in the combined node is strictly greater than 1 + ((M-
1)/2 - 1) + ((M-1)/2), i.e. it is strictly greater than (M-1). So it must contain M values or
more.

We split the combined node into three pieces: Left, Middle, and Right, where
Middle is a single value in the very middle of the combined node. Because the combined
node has M values or more, Left and Right are each guaranteed to have at least (M-1)/2
values, and therefore are legitimate nodes. We replace the value we borrowed from the parent
with Middle and we use Left and Right as its two children. In this case the parent's size
does not change, so we are completely finished.

This is what happens in our example of deleting 6 from the tree in Figure 87. The
combined node [7 10 17 22 44 45] contains more than 5 values, so we split it into:

• Left = [ 7 10 ]
• Middle = 17
• Right = [ 22 44 45 ]

Then put Middle into the parent node (in the position where the ‘10’ had been) with
Left and Right as its children.

Figure 88 Resulting B-tree after deleting 6

Case 2: Suppose, on the other hand, that the neighboring node contains exactly
(M-1)/2 values. Then the total number of values in the combined node is 1 + ((M-1)/2 - 1)
+ ((M-1)/2) = (M-1)
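The arithmetic behind these two cases can be checked with a quick calculation, sketched here for M = 5 (the order used in the running example; the variable names are ours):

```python
M = 5
min_keys = (M - 1) // 2  # minimum number of values in a non-root node

# Case 2: the neighbor contributes exactly (M-1)/2 values.
# parent (1) + underflowed node ((M-1)/2 - 1) + neighbor ((M-1)/2)
combined_case2 = 1 + (min_keys - 1) + min_keys
print(combined_case2)  # 4, i.e. exactly M - 1: a legal node, no split needed

# Case 1: the neighbor contributes more than (M-1)/2, e.g. the maximum M - 1.
combined_case1 = 1 + (min_keys - 1) + (M - 1)
print(combined_case1)  # 6, i.e. at least M: the combined node must be split
```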

In this case the combined node contains the right number of values to be treated
as a node. So we make it into a node and remove from the parent node the value that has
been incorporated into the new, combined node. As a concrete example of this case,
suppose that, in the tree Figure 87, we had deleted 3 instead of 6. The node [2 3]
underflows when 3 is removed. It would be combined with its more populous neighbor [6
7] and the intervening value from the parent (5) to create the combined node [2 5 6 7].
This contains 4 values, so it can be used without further processing. The result would be:

Figure 89 Resulting B-tree in Case 2

It is very important to note that the parent node now has one fewer value. This
might cause it to underflow - imagine that 5 had been the only value in the parent node. If
the parent node underflows, it would be treated in exactly the same way - combined with
its more populous neighbor etc. The underflow processing repeats at successive levels
until no underflow occurs or until the root underflows (the processing for root-underflow is
described next).

Now let us consider the root. For the root to underflow, it must have originally
contained just one value, which now has been removed. If the root was also a leaf, then
there is no problem: in this case the tree has become completely empty.

If the root is not a leaf, it must originally have had two subtrees (because it originally
contained one value). How could it possibly underflow?

The deletion process always starts at a leaf and therefore the only way the root
could have its value removed is through the Case 2 processing just described: the root's
two children have been combined, along with the root's only value to form a single node.
But if the root's two children are now a single node, then that node can be used as the
new root, and the current root (which has underflowed) can simply be deleted.

To illustrate this, suppose we delete 7 from this B-tree (M=5):

Figure 90 Deleting 7 from B-tree (M=5)

The node [3 7] would underflow, and the combined node [3 10 18 20] would be
created. This has 4 values, which is acceptable when M=5. So it would be kept as a node,
and ‘10’ would be removed from the parent node - the root. This is the only circumstance
in which underflow can occur in a root that is not a leaf. The situation is this:

Figure 91 Underflow occurring in the root

Clearly, the current root node, now empty, can be deleted and its child used as the
new root.

Watch:

• AVL Trees & Rotations (Self-Balancing Binary Search Trees) By Back To Back SWE
(https://youtu.be/vRwi_UcZGjU)
• Introduction to B-Trees By David Taylor (https://youtu.be/I22wEC1tTGo)
• File Indexes with B-Trees By Jacob Schrum (https://youtu.be/2AD13EPQAms)

Read:

• Section 11.2 Balanced Search Trees (pp. 472-478)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 11.3 AVL Trees (pp. 479-487)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 4.4 AVL Trees (pp. 144-157)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Section 11.5 (2,4) Trees (pp. 500-509)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 4.7 B-Trees (pp. 168-173)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

Review:

1. What is an AVL tree? How is it different from a BST?


2. Why is it important to keep the tree height balanced?
3. What is a B-Tree? How is it different from an AVL tree?

Activities/Assessments:
1. Show the result of inserting 3, 1, 4, 6, 9, 2, 5, 7 into an initially empty binary search tree.

2. Given the tree in (1), show the result of deleting the root.

3. Draw the AVL Tree of the following. NOTE: Show the tree before and after rotations.
COMPUTING

4. Using the AVL in (3), add S E A R

5. Draw the AVL Tree of the following. NOTE: Show the tree before and after rotations.
GOLDSTAR

6. Using the AVL in (5), delete S L O T

7. What will be the resulting B- Trees (2-3 trees) for the following items?

a. Insert 112 into the tree with root [ 47 ] and leaves [ 15 ], [ 86 96 ]

b. Insert 1 into the tree with root [ 59 ] and leaves [ 23 27 ], [ 86 ]

c. Delete 73 from the tree with root [ 62 ] and leaves [ 20 ], [ 73 ]

d. Delete 31 from the tree with root [ 67 ] and leaves [ 31 ], [ 79 95 ]

Unit 9 - Graphs

Overview:
In computer science, a graph is an abstract data type that is meant to implement the undirected
graph and directed graph concepts from the field of graph theory within mathematics. In this unit
we will focus on the basic concepts of graphs, its representations and searching in graphs.

Module Objectives:
After successful Completion of this module, you should be able to:

• Discuss and describe a graph


• Discuss the different representations of graphs
• Discuss searching with graphs

Course Materials:

Graphs are mathematical structures that represent pairwise relationships between
objects. A graph can be presented as G = (V, E), where V is a set of vertices and E is a set
of edges. Each edge e in E is a 2-tuple of the form (v, w), where v and w are in V, and e is
said to be incident on v and w.

Figure 92 Graph as pairwise relationship
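The definition G = (V, E) translates directly into code. Here is a minimal sketch using Python sets (the vertex labels and helper name are ours for illustration):

```python
# The graph G = (V, E): a set of vertices and a set of edges (2-tuples).
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("a", "c"), ("b", "d")}

def incident(e, v):
    """An edge e = (v, w) is incident on both of its endpoints."""
    return v in e

print(incident(("a", "b"), "a"))  # True
print(incident(("a", "b"), "c"))  # False
```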

A graph is a flow structure that represents the relationship between various objects. It can
be visualized by using the following two basic components:

• Nodes: These are the most important components in any graph. Nodes are entities
whose relationships are expressed using edges. If a graph comprises 2 nodes A and
B and an undirected edge between them, then it expresses a bi-directional relationship
between the two nodes.

• Edges: Edges are the components that are used to represent the relationships
between various nodes in a graph. An edge between two nodes expresses a one-way
or two-way relationship between the nodes.

Types of graphs

• Undirected: An undirected graph is a graph in which all the edges are bi-directional
i.e. the edges do not point in any specific direction. The graph in Figure 92 is an
example of an undirected graph.

• Directed: A directed graph is a graph in which all the edges are uni-directional i.e. the
edges point in a single direction.

• Weighted: In a weighted graph, each edge is assigned a weight or cost.

Figure 93 Directed Graph (unweighted at left; weighted at right)

• Cyclic: A graph is cyclic if the graph comprises a path that starts from a vertex and
ends at the same vertex. That path is called a cycle. An acyclic graph is a graph that
has no cycle.

A path is a sequence of vertices connected by edges, and can be represented in 2 ways:

o (v0, e1, v1, e2, v2, ..., vn-1, en, vn) -- alternating vertices and edges
o (v0, v1, v2, ..., vn-1, vn) -- vertices only

A graph is connected if, for any vertices v and w, there is a path from v to w.

Figure 94 Unconnected graph (the graph at right is cyclic)

A tree is an undirected graph in which any two vertices are connected by exactly one path.
A tree is an acyclic graph and has N - 1 edges, where N is the number of vertices. Each node in
a graph may have one or multiple parent nodes. However, in a tree, each node (except the root
node) has exactly one parent node (a root node has no parent). Also, a tree cannot contain
any cycles or self-loops; the same does not apply to graphs.

Graph Representations

You can represent a graph in many ways. The two most common ways of representing a
graph are as follows:

• Adjacency Matrix

An adjacency matrix is a V x V binary matrix A, where V is the number of vertices. Element
A(i,j) is 1 if there is an edge from vertex i to vertex j; otherwise A(i,j) is 0. A binary
matrix is a matrix in which each cell can have only one of two possible values - either a 0 or a 1.

For a weighted graph: A(i,j) = w (the weight of the edge), or positive infinity if there is no edge.

Figure 95 Adjacency matrix for an unweighted graph (above)


and weighted graph (below)

An adjacency matrix provides constant-time access (O(1)) to determine whether there is an
edge between two nodes. The space complexity of the adjacency matrix is O(V²).
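As a sketch, an adjacency matrix for a directed, unweighted graph can be built as follows (we assume the vertices are labeled 0 to V-1; the function name is ours):

```python
def adjacency_matrix(num_vertices, edges):
    """Build a V x V binary matrix: A[i][j] = 1 iff there is an edge i -> j."""
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        A[i][j] = 1
    return A

A = adjacency_matrix(3, [(0, 1), (1, 2), (2, 0)])
print(A)             # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(A[0][1] == 1)  # O(1) edge test: True
```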

• Adjacency List

The other way to represent a graph is by using an adjacency list. An adjacency list is an
array A of separate lists. Each element of the array A(i) is a list, which contains all the
vertices that are adjacent to vertex i.

For a weighted graph, the weight or cost of the edge is stored along with the vertex in the
list using pairs. In an undirected graph, if vertex j is in list A(i) then vertex i will be in list
A(j).

The space complexity of an adjacency list is O(V + E), because information is stored only
for those edges that actually exist in the graph. When a graph is sparse, using an
adjacency matrix may not be very useful: most of its elements will be 0 anyway, and the
matrix will take up a lot of space. In such cases, using an adjacency list is better.

Note: A sparse matrix is a matrix in which most of the elements are zero, whereas a dense
matrix is a matrix in which most of the elements are non-zero.

Figure 96 Adjacency List for unweighted (above) and weighted (below) graphs
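A minimal sketch of an adjacency list for a weighted, undirected graph (the (v, w, weight) edge format and function name are ours for illustration):

```python
from collections import defaultdict

def adjacency_list(edges, directed=False):
    """Build an adjacency list: adj[v] holds (neighbor, weight) pairs."""
    adj = defaultdict(list)
    for v, w, weight in edges:
        adj[v].append((w, weight))
        if not directed:
            adj[w].append((v, weight))  # undirected: record both directions
    return adj

adj = adjacency_list([(0, 1, 4), (0, 2, 1), (1, 2, 7)])
print(adj[0])  # [(1, 4), (2, 1)]
print(adj[2])  # [(0, 1), (1, 7)]
```

Note that, as the text says for undirected graphs, vertex 0 appearing in list adj[2] mirrors vertex 2 appearing in list adj[0].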

Graph Traversal

Graph traversal means visiting every vertex and edge exactly once in a well-defined order.
While using certain graph algorithms, you must ensure that each vertex of the graph is visited
exactly once. The order in which the vertices are visited is important and may depend upon the
algorithm or problem that you are solving.

During a traversal, it is important that you track which vertices have been visited. The most
common way of tracking vertices is to mark them.

Breadth First Search

There are many ways to traverse graphs. Breadth First Search (BFS) is the most
commonly used approach. BFS is a traversing algorithm where you should start traversing
from a selected node (source or starting node) and traverse the graph layer by layer thus
exploring the neighbor nodes (nodes which are directly connected to source node). You
must then move towards the next-level neighbor nodes.

As the name BFS suggests, you are required to traverse the graph breadthwise as
follows:

• First move horizontally and visit all the nodes of the current layer
• Move to the next layer

Figure 97 Moving process in BFS

The distance between the nodes in layer 1 is comparatively less than the
distance between the nodes in layer 2. Therefore, in BFS, you must traverse all the nodes
in layer 1 before you move to the nodes in layer 2.

Traversing child nodes

A graph can contain cycles, which may bring you to the same node again while
traversing the graph. To avoid processing the same node again, use a boolean array that
marks a node after it is processed. While visiting the nodes in a layer of the graph, store
them in a manner such that you can traverse the corresponding child nodes in the same
order.

In the earlier diagram (Figure 97), start traversing from 0 and visit its child nodes
1, 2, and 3. Store them in the order in which they are visited. This will allow you to visit the
child nodes of 1 first (i.e. 4 and 5), then of 2 (i.e. 6 and 7), and then of 3 (i.e. 7) etc.

To make this process easy, use a queue to store a node and mark it as 'visited'
until all its neighbors (vertices that are directly connected to it) are marked. The queue
follows the First In First Out (FIFO) queuing method, and therefore, the neighbors of a
node will be visited in the order in which they were inserted into the queue, i.e., the node
that was inserted first will be visited first, and so on.

BFS (G, s) //Where G is the graph and s is the source node
    let Q be a queue
    Q.enqueue( s ) //Enqueue s until all neighbor vertices are marked
    mark s as visited
    while ( Q is not empty )
        //Removing that vertex from queue (its neighbors will be visited now)
        v = Q.dequeue( )
        //processing all the neighbours of v
        for all neighbors w of v in Graph G
            if w is not visited
                Q.enqueue( w ) //Stores w in Q to further visit its neighbors
                mark w as visited
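The pseudocode above can be turned into a runnable sketch, assuming the graph is given as an adjacency list (a dict mapping each vertex to a list of its neighbors; the sample graph below is our encoding of the layer structure described for Figure 97):

```python
from collections import deque

def bfs(graph, s):
    """Return the vertices in BFS order starting from source s."""
    visited = {s}       # mark s as visited
    order = []
    Q = deque([s])      # let Q be a queue; enqueue s
    while Q:
        v = Q.popleft()     # dequeue the next vertex (FIFO order)
        order.append(v)
        for w in graph[v]:  # process all the neighbors of v
            if w not in visited:
                Q.append(w)
                visited.add(w)
    return order

# 0's children are 1, 2, 3; 1's are 4, 5; 2's are 6, 7; 3's is 7.
graph = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0, 6, 7], 3: [0, 7],
         4: [1], 5: [1], 6: [2], 7: [2, 3]}
print(bfs(graph, 0))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Layer 1 (1, 2, 3) is fully visited before layer 2 (4, 5, 6, 7), as required.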

Depth First Search

The DFS algorithm is a recursive algorithm that uses the idea of backtracking. It
involves exhaustive searches of all the nodes by going ahead, if possible, else by
backtracking.

Here, the word backtrack means that when you are moving forward and there are
no more nodes along the current path, you move backwards on the same path to find
nodes to traverse. All the nodes on the current path are visited until no unvisited nodes
remain along it, after which the next path is selected.

This recursive nature of DFS can be implemented using stacks. The basic idea is
as follows:

• Pick a starting node and push all its adjacent nodes into a stack.
• Pop a node from stack to select the next node to visit and push all its adjacent
nodes into a stack.

Repeat this process until the stack is empty. However, ensure that the nodes that
are visited are marked. This will prevent you from visiting the same node more than once.
If you do not mark the nodes that are visited and you visit the same node more than once,
you may end up in an infinite loop.

DFS-iterative (G, s): //Where G is the graph and s is the source vertex
    let S be a stack
    S.push( s ) //Inserting s in stack
    mark s as visited
    while ( S is not empty ):
        //Pop a vertex from stack to visit next
        v = S.top( )
        S.pop( )
        //Push all the neighbors of v in stack that are not visited
        for all neighbors w of v in Graph G:
            if w is not visited:
                S.push( w )
                mark w as visited

DFS-recursive (G, s):
    mark s as visited
    for all neighbors w of s in Graph G:
        if w is not visited:
            DFS-recursive(G, w)
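The recursive variant can likewise be run directly; a sketch assuming the same adjacency-list representation as before (the sample graph is ours):

```python
def dfs_recursive(graph, s, visited=None, order=None):
    """Recursive DFS; returns the vertices in the order they are visited."""
    if visited is None:
        visited, order = set(), []
    visited.add(s)       # mark s as visited
    order.append(s)
    for w in graph[s]:   # for all neighbors w of s
        if w not in visited:
            dfs_recursive(graph, w, visited, order)
    return order

graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 4], 3: [1], 4: [1, 2]}
print(dfs_recursive(graph, 0))  # [0, 1, 3, 4, 2]
```

Notice how the traversal goes as deep as possible (0, 1, 3) before backtracking to explore 4 and then 2.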

Watch:

• Introduction to Graphs By mycodeschool (https://youtu.be/gXgEDyodOJU)

• Graph Data Structure (Representations) By freeCodeCamp.org
(https://youtu.be/DBRW8nwZV-g)
• Graph Search (Depth First Search and Breadth First Search) By HackerRank
(https://youtu.be/zaBhtODEL0w)

Read:

• Graph Algorithms (pp. 379-382)


Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition

• Graph Algorithms (Sec. 4.1 & 4.2, pp. 612-629)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

• Section 14.3 Graph Traversals (pp. 630-642)


Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition

Review:

1. What is a graph?
2. How are graphs similar to trees? How are they different?
3. If two graph nodes are connected by an edge, what are they called?
4. How can you tell if two graph nodes are adjacent?

Activities/Assessments:
1. Explain how an adjacency matrix is used to represent a graph.

2. Use the following description of an undirected graph and draw the graph:
V(Graph1) = { A, B, C, D}
E(Graph1) = { (A,B), (A,D), (B,C), (B,D) }

3. Describe the graph pictured below using the formal graph notation.

V(CoAuthorship) =
E(CoAuthorship) =

4. Draw an adjacency matrix representation of the undirected graph shown in (3).

5. Draw an adjacency list representation of the undirected graph shown in (3).

6. Draw the graph for the following adjacency matrix.

7. Let G be an undirected graph whose vertices are the integers 1 through 8, and let the
adjacent vertices of each vertex be given by the table below:

Vertex Adjacent vertices


1 (2, 3, 4)
2 (1, 3, 4)
3 (1, 2, 4)
4 (1, 2, 3, 6)
5 (6, 7, 8)
6 (4, 5, 7)
7 (5, 6, 8)
8 (5, 7)

Assume that, in a traversal of G, the adjacent vertices of a given vertex are returned in the
same order as they are listed in the table above.
a. Draw G.
b. Give the sequence of vertices of G visited using a DFS traversal starting at vertex
1.
c. Give the sequence of vertices visited using a BFS traversal starting at vertex 1.

Unit 10 – Hashing and Collision

Overview:
While data structures deal with how data is represented in (primary) memory, file structures
deal more with how data is represented on secondary storage, with emphasis on file access.
In this unit we will discuss the different types of file structures, the different techniques
used in file access, and how to resolve some of the issues related to file access.

Module Objectives:
After successful Completion of this module, you should be able to:

• Discuss and differentiate the different types of file organization


• Discuss how the concept of mapping is made
• Discuss the different addressing techniques
• Demonstrate the concept of address calculation techniques

Course Materials:

What is a File?

A file is a collection of records related to each other. The file size is limited by the size
of memory and the storage medium. There are two important features of a file:

1. File Activity - specifies the percentage of records actually processed in a single run.
2. File Volatility - addresses the frequency of record changes. Highly volatile files are
handled more efficiently on disk than on tape.

File Organization

A file structure is the organization of data on a secondary storage device in a way that
minimizes access time and storage space. A file structure is a combination of
representations for data in files and of operations for accessing the data. It also allows
applications to read, write, and modify data. It might also support finding the data that
matches some search criteria or reading through the data in some particular order.

File organization ensures that records are available for processing. It is used to determine
an efficient file organization for each base relation.

For example, if we want to retrieve employee records in alphabetical order of name,
sorting the file by employee name is a good file organization. However, if we want to
retrieve all employees whose marks are in a certain range, a file ordered by employee
name would not be a good file organization.

Types of File Organization

There are three types of organizing the file:

1. Sequential access file organization


2. Direct access file organization
3. Indexed sequential access file organization

Sequential access file organization

Storing and sorting in contiguous blocks within files on tape or disk is called
sequential access file organization. In sequential access file organization, all records are
stored in sequential order. The records are arranged in ascending or descending
order of a key field.

A sequential file search starts from the beginning of the file, and records can be
added at the end of the file. In a sequential file, it is not possible to add a record in the
middle of the file without rewriting the file.

Advantages of sequential file

1. It is simple to program and easy to design.
2. A sequential file makes the best use of storage space.

Disadvantages of sequential file

1. Sequential file processing is time consuming.
2. It has high data redundancy.
3. Random searching is not possible.

Direct access file organization

Direct access file is also known as random access or relative file organization. In
direct access file, all records are stored in direct access storage device (DASD), such as
hard disk. The records are randomly placed throughout the file.

The records do not need to be in sequence because they are updated directly
and rewritten back in the same location. This file organization is useful for immediate
access to large amounts of information. It is used in accessing large databases. It is also
called hashing.

Advantages of direct access file organization

1. Direct access file helps in online transaction processing system (OLTP) like online
railway reservation system.
2. In a direct access file, sorting of the records is not required.
3. It accesses the desired records immediately.
4. It updates several files quickly.
5. It has better control over record allocation.

Disadvantages of direct access file organization

1. A direct access file does not provide a backup facility.
2. It is expensive.
3. It uses storage space less efficiently than a sequential file.

Indexed sequential access file organization

Indexed sequential access file combines both sequential file and direct access file
organization. In indexed sequential access file, records are stored randomly on a direct
access device such as magnetic disk by a primary key.

Such a file can have multiple keys, which may be alphanumeric; the key by which the records
are ordered is called the primary key. The data can be accessed either sequentially or randomly
using the index. The index is stored in a file and read into memory when the file is opened.

Advantages of Indexed sequential access file organization

1. In an indexed sequential access file, both sequential and random file access are
possible.
2. It accesses the records very fast if the index table is properly organized.
3. The records can be inserted in the middle of the file.
4. It provides quick access for sequential and direct processing.
5. It reduces the degree of the sequential search.

Disadvantages of Indexed sequential access file organization

1. An indexed sequential access file requires unique keys and periodic reorganization.
2. It takes a longer time to search the index for data access or
retrieval.
3. It requires more storage space.
4. It is expensive because it requires special software.
5. It is less efficient in the use of storage space as compared to other file
organizations.

One important development with regard to file access is relative file organization. Relative
file organization allows a user to locate a record in a file with as few accesses as possible
(ideally one). A relative file organization implies a predictable relationship between the key
used to identify an individual record and that record's location in an external file. It is to
be noted, however, that the logical ordering of the records need not have any bearing on their
physical sequence. That is, the records do not necessarily appear physically in order by their
key values.

Figure 98 Basic relative file organization

The question now is how would the Nth record be found? When a relative file is
established, the relationship (R) that will be used to translate between key values and physical
addresses is designated. This relationship is a mapping function from key values to addresses
in the file.

R (key value) → address

How is this relationship used?

When a record is to be written into a relative file, the mapping function R is used to
translate the record’s key value to an address, which indicates where the record is to be stored.
When it is necessary to retrieve the record with a particular key value, the mapping function R is
applied to that key value, translating it to the address where the record can be found.

In contrast to sequentially organized files, it is not necessary to access the records of a
relative file in serial order; rather, a particular record occurrence can be accessed directly.
Note that the direct-access character of a relative file organization would mean nothing if a
relative file were stored on a serial-access medium such as magnetic tape. Therefore, relative
files are stored in main memory or on direct-access storage devices (DASDs) such as magnetic
disks.

Addressing Techniques

Let us now delve into the problems of implementing relative files. There are three
fundamental techniques used by mapping functions R, where R (key value) → address:

1. Direct Mapping
a. absolute addressing
b. relative addressing
2. Directory Look up Techniques
3. Address Calculation Techniques

Direct Mapping (Direct Addressing)

Direct mapping is the simplest technique for translating a record key to a storage
address, though it is not widely used. We present it here because understanding the
disadvantages of the direct mapping techniques helps us to understand the benefits of
the other techniques.

Absolute Addressing

One simple approach to implementing R (key value) → address is to have key
value = address. This mapping function is called absolute addressing, where the key value
given to a record is the same as the record's actual address. The absolute address of the
record is a machine-dependent address consisting of:

1. cylinder number, surface number, and record number, if cylinder addressing is used.
2. sector number and record number, if sector addressing is used.

Note, however, that relocating the direct file to another part of the disk requires
changing the absolute addresses.

Figure 99 Key position relationship with direct organization on cylinder-addressable devices

Figure 100 Key position relationship with direct organization on sector-addressable devices

There are two advantages to this absolute addressing scheme:

1. Mapping function R is very simple; and
2. Given a record’s key value, no processing time is required to determine the record’s location on secondary storage.

The disadvantages of the scheme, however, nearly always outweigh the advantages:

1. Logical and physical considerations are not independent with absolute addressing. This means that the user must know how the records are stored physically.
2. Users generally do not supply the appropriate types of key values.
3. Absolute addresses are device-dependent. If the device upon which the file resides is updated or changed, it is likely that key values would also need to be changed.
4. Absolute addresses are address-space-dependent. If the relative file is reorganized for purposes of enlarging the address space, it is likely that key values would need to be changed.

Relative Addressing

Another simple approach to implementing R(key value) → address, which has essentially the same advantages as absolute addressing but removes some of its disadvantages, is called relative addressing, where key value = relative address.

Once a key-position relationship is established, the position of the record in the file
is specified as a record number relative to the beginning of the file, where the first record
is record number 0.

A relative file with space for N records contains positions with relative record
numbers 0, 1, 2, …, N – 1 where the i-th record has the relative record number of i – 1.

Figure 101 The key-position must be a predictable relationship so that direct access to the record is possible once it is stored in the relative file.

One of the simplest mapping functions is one in which the key value for the record is the same as the record's relative address or position in the external file. The primary advantage to this approach is that there is no processing time needed to determine the record's relative address in the file when the record is to be accessed.

Example.

For a file of N records, the key values are in the range of 1 to N. So for a relative
file containing 200 key values in the range 1 to 200, a part number 156 is stored in the
position in the file with a relative record number 156.

Consecutive key values that differ by only one are called dense keys. For a range of key values that does not start with 1 or 0, the address could be the part number less a constant.

For data with a large range of key values that are not dense, such as social security
numbers as key values, the programmer may have to allocate a large relative file in order
to use the key as the relative record number. But with this kind of process, a high
percentage of the file space may be wasted, or empty.
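The "part number less a constant" idea above can be sketched as a tiny mapping function; the starting key 100 and the sample part number are hypothetical.

```python
def relative_address(key, low_key=100):
    """Map a dense key to a 0-based relative record number.

    For keys in the hypothetical range low_key .. low_key + N - 1,
    subtracting the constant low_key yields record numbers 0 .. N - 1.
    """
    return key - low_key

# Part number 156 in a file whose keys start at 100 lands in position 56.
print(relative_address(156))  # 56
```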

Relative addressing presents two advantages: (1) the mapping function R is very
simple; and (2) given a record’s key value, no processing time is required to determine the
record’s location on secondary storage.

Though relative addresses are not nearly so device-dependent as are absolute addresses, they are, however, still address-space-dependent like absolute addresses.

Directory Lookup Techniques

After direct mapping, the next simplest approach to implementing R(key value) → address is directory lookup. This approach takes advantage of the benefits of direct mapping while eliminating its disadvantages, though it introduces a couple of new costs.

Figure 102 Relative file with directory structured as a table

The basic idea behind the concept is to keep a table or directory of key value:address pairs (or key value:relative address pairs). In order to find a record in a relative file, one locates its key value in the directory and then uses the indicated address to find the record on storage.

The directory is implemented as an array of key value:address records (Figure 102). Here the entries are sorted by key value so that a binary rather than sequential search can be used to locate a target entry more rapidly. Note that the relative file entries are not in sorted order.
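A minimal sketch of such a directory as a sorted table searched with binary search; the key values and addresses below are invented for illustration.

```python
import bisect

# Directory of key value:address pairs, kept sorted by key value so a
# binary (rather than sequential) search can locate an entry quickly.
directory = [(125, 3), (260, 0), (310, 4), (435, 1), (500, 2)]
keys = [k for k, _ in directory]

def lookup(key_value):
    """Return the relative address stored for key_value, or None if absent."""
    i = bisect.bisect_left(keys, key_value)
    if i < len(keys) and keys[i] == key_value:
        return directory[i][1]
    return None

print(lookup(310))   # 4
print(lookup(999))   # None -- key not in the directory
```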

Thus the advantages of the directory lookup scheme to implement R(key value) → address include:

1. A record’s location can be determined with essentially no processing, once the key
value is found in the directory.
2. Keys can be represented in intuitively understandable forms, since they are
translated “internally” to addresses.
3. Logical-physical independence is achieved, because the key values are address-space-independent. If the relative file is reorganized, address entries in the directory must be changed, but the key values will not be affected.

Figure 103 Relative file with directory structured as a tree

Address Calculation Techniques

Another approach to implementing R(key value) → address is to perform a calculation on the key value such that the result is a relative address. This technique can be used alone or in conjunction with the directory lookup scheme.

The idea of address calculation is to apply a function that translates a relatively large domain of possible key values to a relatively small range of relative address values. This is normally termed hashing, and the calculation applied to a key value to obtain an address is called a hash function.

It is sometimes called a randomizing scheme, since the function randomly selects a relative address for a specific key value without regard to the physical sequence of records in the file.

Figure 104 A small phonebook as a hash table

Some of the most widely used hashing schemes are:

1. Prime-Number Division Remainder
2. Digit Extraction
3. Folding
4. Radix Conversion
5. Mid-Square

Many times, a key value contains characters that make it difficult to map the key to an address. A nonnumeric key value can be converted to a numeric value by using the numerically coded internal representation of each character in the key.

Example.

The numeric ASCII or EBCDIC representation of each letter can be used in place
of each character to provide a numeric key.

Figure 105 Nonnumeric key converted to a numeric value
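One way to carry out such a conversion, assuming ASCII codes and simple concatenation (the sample key is made up):

```python
def numeric_key(key):
    """Replace each character by its ASCII code and concatenate the codes."""
    return int("".join(str(ord(ch)) for ch in key))

# 'A' = 65, 'B' = 66, '1' = 49, so "AB1" becomes 656649.
print(numeric_key("AB1"))  # 656649
```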

The primary problem with most hashing functions is that the function does not always produce unique relative addresses. This situation, where R(K1) = R(K2) but K1 ≠ K2, is called a collision: two different keys hash to the same relative address.

K1 and K2 here are considered synonyms.

Most hashing functions can be improved by allocating more file space than the
number needed to store the keys.

The relationship between file space and the number of keys needed is called the
load factor. The load factor, also called packing factor or packing density, is the ratio of
the number of key values to be stored versus the number of file positions.

load factor = number of key values / number of file positions

All hash functions begin to work quite poorly when the target file becomes almost
full. As a general guideline, a load factor of 70 or 80 % is about as great as can be
tolerated with reasonable performance.

Example.

Twenty percent of a file with a load factor of 80 percent is empty. This kind of
arrangement reduces the number of collisions over a file with a load factor of 100 percent.
The number of file locations needed for a load factor of 80 percent is

0.80 × number of file positions = number of key values

or

number of file positions = number of key values / 0.80

or

number of file positions = 1.25 × number of key values

The smaller the load factor, the less dense the file is, and the smaller the chance of collisions. The disadvantage of having a low load factor (less than 50 percent), however, is the great amount of file space that is wasted by being empty.
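The arithmetic above can be checked with a short helper; the key count is hypothetical.

```python
import math

def positions_needed(num_keys, load_factor=0.80):
    """File positions required so num_keys occupy load_factor of the file."""
    return math.ceil(num_keys / load_factor)

# At an 80% load factor, 1000 keys need 1.25 x 1000 = 1250 positions.
print(positions_needed(1000))  # 1250
```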

Hashing Schemes

• Prime-Number Division Remainder

The basic idea of this approach is to divide a key by an appropriate number, then to use the remainder as the relative address for the record. If the divisor (N) is the number of relative positions in the file, this method can be used to map the keys into the N record locations.

If a large range of key values is being mapped into a much smaller range, collisions may occur between keys that yield the same remainder. Researchers generally find that choosing a prime number close to the number of record locations in the file results in relative addresses that are less prone to collision. Also, choosing a prime number as the divisor is better than choosing an even divisor.

Figure 106 Example using divisor 4001
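As a sketch, using the figure's divisor 4001 (the key is an arbitrary example):

```python
def prime_division_hash(key, divisor=4001):
    """Relative address = remainder of the key divided by a prime divisor."""
    return key % divisor

print(prime_division_hash(12345))  # 342  (12345 = 3 * 4001 + 342)
```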

• Digit Extraction

The distribution of this hashing function is dependent on the distribution of the key
values. In digit extraction, the key values are analyzed to determine which digit positions
of the key are more evenly distributed. The more evenly distributed digit positions are
assigned from right to left.

The digit values in the chosen digit positions are then extracted, and the resulting
integer is used as the relative address. If none of the digit positions has a uniform
distribution, more collisions occur. Note however, that digit extraction requires that the
key values be known in order to determine the digit positions for extraction.

Figure 107 Example using 2nd, 4th, 7th and 8th positions
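A sketch of extraction from the figure's 2nd, 4th, 7th and 8th digit positions (counted from the left; the sample key is invented):

```python
def digit_extract(key, positions=(2, 4, 7, 8)):
    """Concatenate the digits at the given 1-based positions into an address."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))

print(digit_extract(12345678))  # 2478
```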

• Folding

Using the folding scheme, the key value is split into two or more parts and then
summed, or subjected to AND or XOR, as if each part were an integer. If the resulting
address contains more digits than the highest address in the file, the excess high order
digits are truncated.

Folding is useful for converting keys with a large number of digits to a smaller
number of digits so the address fits into a word of memory. Folding is easier to compute
than some of the other hashing schemes, but it can produce erratic results.

Figure 108 Example using 1234 + 5678 + 9
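A minimal folding sketch matching the figure's 1234 + 5678 + 9 split, with summation and high-order truncation to a four-digit address (the part and address sizes are assumptions):

```python
def fold(key, part_len=4, addr_digits=4):
    """Split the key into part_len-digit pieces, sum them, and truncate
    excess high-order digits so the result fits in addr_digits digits."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts) % 10 ** addr_digits

print(fold(123456789))  # 6921  (1234 + 5678 + 9)
```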

• Radix Conversion

The key value is interpreted as having a different base or radix and is converted to
a decimal number. Excess high order digits are truncated to yield the required address
size of the relative file.

Example.

Consider a five-digit key and a relative file with an address range from 0 to 9999.

The key 38652, interpreted as a value in base 11, is converted to decimal:

3 × 11⁴ + 8 × 11³ + 6 × 11² + 5 × 11¹ + 2 × 11⁰ = 55354

Truncating the high-order 5 yields 5354.
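The same conversion, sketched in code (base 11 and the 0–9999 address range follow the example):

```python
def radix_convert(key, radix=11, addr_digits=4):
    """Interpret the key's digits in the given radix, convert to decimal,
    and truncate excess high-order digits to fit the address range."""
    value = int(str(key), radix)
    return value % 10 ** addr_digits

print(radix_convert(38652))  # 5354  (55354 with the high-order 5 truncated)
```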

• Mid-Square

The mid-square method involves treating the key as a single large number,
squaring the number, and extracting whatever number of digits is needed from the middle
of the result.

Figure 109 Example extracting 4 middle digits
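A sketch of the mid-square method extracting four middle digits; the sample key is invented.

```python
def mid_square(key, addr_digits=4):
    """Square the key and take addr_digits digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - addr_digits) // 2
    return int(squared[start:start + addr_digits])

# 4567^2 = 20857489; its middle four digits are 8574.
print(mid_square(4567))  # 8574
```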

Other Hashing Schemes

A hashing function that provides a one-to-one mapping from a key into a position is termed a perfect hashing function, since no collisions occur. A perfect hashing function requires knowledge of the set of key values in order to generate a perfectly uniform distribution of addresses. Most perfect hashing functions require extensive manipulation of the key set and are applied to static key sets only.

Perfect Hashing Functions

1. Quotient Reduction
2. Remainder Reduction
3. Associated Value Hashing
4. Reciprocal Hashing

Approaches to the Problem of Collisions

Because hashing functions map a relatively large key-value space to a relatively small address space, there is a great possibility that collisions will occur. Several techniques are available for handling collisions. These techniques can be classified into two groups, namely: open addressing and chaining.

Two classes for handling collisions

1. Open addressing
a) Linear probing
b) Two pass file creation with hashing
c) Separate overflow area
d) Double hashing
2. Chaining
a) Synonym chaining
b) Bucket addressing

For all of the techniques examined, n positions are allocated in the file (numbered 0, 1, …, n – 1), where each position contains a flag, which is initialized to empty. The flag indicates that nothing has yet been stored in that position. Once information has been stored in a particular position, the flag is set to occupied.

Linear Probing

In linear probing, the file is searched as a circular file to use as many empty
positions as possible. In a search of a noncircular file, a key hashes to position i, which is
occupied, and positions i + 1, … , n – 1 are searched for an empty position. The back
portion of the file tends to fill up, leaving more empty positions at the front portion of the
file.

In a search of a circular file, the entire file is searched, starting at position i + 1, and the synonyms are distributed throughout the file, not clustered at the back portion.

Storing the synonym in the nearest available position is simple but accessing the synonym
later may require a sequential search of the entire file.

Figure 110 Linear Probing example
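A sketch of circular linear probing; the file size and keys are hypothetical, and `None` plays the role of the "empty" flag described earlier.

```python
N = 7                 # number of positions in the relative file
table = [None] * N    # None marks a position whose flag is "empty"

def insert(key):
    """Store key at its hashed position, probing circularly for a free slot."""
    home = key % N
    for step in range(N):
        i = (home + step) % N    # wraps around to the front of the file
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("file is full")

insert(10)            # 10 % 7 = 3, stored in its home position
insert(17)            # 17 % 7 = 3 as well -- a synonym, probed to 4
print(table)          # [None, None, None, 10, 17, None, None]
```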

Two-Pass File Creation

The displacement problem of linear probing usually occurs when the relative file is initially created with a one-pass algorithm. A one-pass algorithm hashes the key of each
record to be loaded to the relative file. It stores a record in the hashed address, if empty,
and stores synonyms in the first available position nearest the hashed address. A two-
pass algorithm reduces the displacement problem by loading hashed positions in the first
pass and loading synonyms in the available positions in the second pass.

The first pass of the two-pass algorithm hashes the key of each record to be loaded
to the relative file and stores the record in the hashed address, if empty. The synonyms
are stored in an output file that will be used during the second pass. The second pass
inputs the synonyms separated by the first pass, hashes the key of each record, and stores
the synonym in the first available position nearest the hashed address.

The two-pass algorithm allows more records to be stored in the positions to which
they were hashed, thereby reducing the amount of linear probing that must be done later
to access any record without synonyms. The two-pass algorithm works well if the set of
key values is known before the file is created. Any additions to the file after the initial
loading may produce displacement problems similar to those produced by the one-pass
algorithm.
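The two passes can be sketched as follows; the hash function, file size, and keys are hypothetical. Note that key 4 keeps its home position, which one-pass loading would have given to the synonym 17.

```python
def two_pass_load(keys, n):
    """Pass 1 stores records whose hashed position is free and separates
    synonyms; pass 2 places the synonyms by circular linear probing."""
    table = [None] * n
    synonyms = []
    for k in keys:                       # first pass
        if table[k % n] is None:
            table[k % n] = k
        else:
            synonyms.append(k)           # held aside, as in an output file
    for k in synonyms:                   # second pass
        for step in range(n):
            i = (k % n + step) % n
            if table[i] is None:
                table[i] = k
                break
    return table

print(two_pass_load([10, 17, 4], 7))   # [None, None, None, 10, 4, 17, None]
```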

Separate Overflow Area

Another way for loading the relative file initially is storing synonyms sequentially in
a separate overflow area. Using a separate overflow area eliminates the displacement
problem altogether and allows use of the one-pass loading algorithm. Accessing
synonyms entails sequentially searching all the records in the overflow area, which is
probably not as large as the hashed positions of the relative file.

Depending on the size of the overflow area, the searching time for locating synonyms could be lengthy. Accessing a record that is not in the hashed address involves sequentially searching the overflow area for the record. The average number of probes for a file with synonyms stored in a separate overflow area depends on the number of synonyms.

Figure 111 Use of a separate overflow area

Double Hashing

When a separate overflow area is established for synonyms, the synonyms are
stored in the next available overflow position. Rather than searching sequentially for the
next available overflow position or for the synonym, the synonym is often hashed to an
overflow record position. This method of hashing synonyms to an overflow area using a
second hash function is called double hashing.

Under this method, the record is initially hashed to a relative file position. If the hashed address is not available, a second hash function is applied and added to the first hashed value, and the synonym is hashed to the overflow area. If the hashed overflow position is not available, a secondary collision occurs; linear probing is then used. Access to synonyms is more direct than searching the overflow area sequentially. The synonyms involved in secondary collisions are the only records that have to be accessed sequentially.
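A sketch of the double-hashing idea in a common in-file variant, where the second hash supplies the probe step; both functions (with m = 11 and h2(k) = (k mod (m − 1)) + 1) are illustrative choices.

```python
m = 11
table = [None] * m

def h1(key):
    return key % m

def h2(key):
    return key % (m - 1) + 1    # step is never zero, so probing always moves

def insert(key):
    """Probe positions h1(k), h1(k) + h2(k), h1(k) + 2*h2(k), ... mod m."""
    for i in range(m):
        slot = (h1(key) + i * h2(key)) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("file is full")

print(insert(22))   # h1(22) = 0, stored in slot 0
print(insert(33))   # h1(33) = 0 is taken; h2(33) = 4, so slot (0 + 4) % 11 = 4
```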

Synonym Chaining

Synonym chaining is a technique used with and without a separate overflow area
to reduce the number of records examined when searching for a synonym. Initial creation
of a relative file that uses synonym chaining involves hashing the key of the record to be
loaded and retrieving the record in the hashed address. If the hashed address is occupied,
then the synonym is stored using one of the techniques discussed previously.

The record in the initial hashed address contains a link to the position containing
the synonym. If several synonyms hash to one address, all the synonyms are linked
together even though the synonyms are not physically stored contiguously. Access to any
synonym involves searching only the linked synonyms rather than searching the whole file
(if linear probing is used) or the whole overflow area (if a separate overflow area is used).
Each record is enlarged to include a link field, but access to a synonym record is more direct than in the techniques discussed before.

Figure 112 Synonym chaining
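A sketch of the link-field idea; the hash function, file size, and synonym-placement rule (first free slot) are simplifications.

```python
N = 7
table = [None] * N   # each occupied slot holds [record key, link]; link -1 ends a chain

def insert(key):
    home = key % N
    if table[home] is None:
        table[home] = [key, -1]
        return
    free = next(i for i, slot in enumerate(table) if slot is None)
    table[free] = [key, -1]            # store the synonym in a free slot...
    i = home
    while table[i][1] != -1:           # ...then link it onto home's chain
        i = table[i][1]
    table[i][1] = free

def search(key):
    """Follow only the chain of linked synonyms, not the whole file."""
    i = key % N
    while i != -1 and table[i] is not None:
        if table[i][0] == key:
            return i
        i = table[i][1]
    return -1

insert(10); insert(17)    # both hash to position 3
print(search(17))         # 0 -- found by following position 3's link field
```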

Bucket Addressing

This technique involves the allocation of a group of record positions, called a bucket, to be associated with a particular hash address. The primary idea behind bucket addressing is to allocate the bucket size of each hash address so it is as large as the maximum number of synonyms for any hash address.

All synonyms for a particular hash address are stored sequentially in the bucket
for the particular hash address. Accessing a record in a relative file that uses bucket
addressing involves hashing the key, then sequentially searching the bucket of records at
the hashed address. The number of records that must be examined to find a record is
limited by the bucket size; the algorithm does not have to search the whole file or the
whole overflow area.

Figure 113 Bucket Addressing
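Bucket addressing together with the consecutive spill rule discussed below can be sketched like this; the bucket count and size are hypothetical.

```python
NUM_BUCKETS = 5
BUCKET_SIZE = 3    # assumed maximum number of synonyms per hash address
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key):
    """Append the key to its hashed bucket; a full bucket spills over
    into the next bucket (consecutive spill addressing)."""
    home = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        bucket = buckets[(home + step) % NUM_BUCKETS]
        if len(bucket) < BUCKET_SIZE:
            bucket.append(key)
            return
    raise RuntimeError("file is full")

for k in (5, 10, 15, 20):       # all four keys hash to bucket 0
    insert(k)
print(buckets[0], buckets[1])   # [5, 10, 15] [20]
```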

The major problem with bucket addressing is the amount of space wasted when
the number of synonyms for any hashed address varies greatly. Since the bucket size is
determined by the maximum number of synonyms for any hashed address (each hashed
address has the same size bucket), those hashed addresses having a lower number of
synonyms also have buckets with unused space – space that is wasted. A secondary
problem with bucket addressing is how to determine the appropriate bucket size. Suppose
data in a relative file were unavailable for analysis prior to the creation of the file, and the
bucket size is smaller than the maximum number of synonyms as a result. All the
synonyms that hash to a bucket address cannot be stored in the bucket, so hashing
collisions must be resolved by using one of the techniques discussed. One solution is to
set a very large bucket size, but this solution is at the expense of wasted space.

A good solution to hashing collisions in bucket addressing is consecutive spill addressing. If hashing collisions occur, the nearest bucket with available space is used to store the synonym. The term is derived from the fact that when a bucket becomes full, the full bucket spills over into the next bucket. Searching for the synonym later involves the same problems encountered with linear probing: the synonym is not located in the hashed address, so a sequential search of consecutive buckets must take place.

Watch:

• Understanding File Structure By Papercoinagevideo (https://youtu.be/anCLeogSTE0)
• What is Hashing? By Lisk (https://youtu.be/2BldESGZKB8)
• Introduction to Hashing By Kindson The TechPro (https://youtu.be/2bAeF2NhO-g)
• Hash Table and Hash Functions By Computer Science (https://youtu.be/KyUTuwz_b7Q)

Read:

• Chapter 5 – Hashing
  Weiss, Mark Allen (2014). Data Structures and Algorithm Analysis in C++, 4th Edition
• Section 10.2 Hash Tables (pp. 410–416)
  Goodrich, Michael T. (2014). Data Structures and Algorithms in Java, 6th Edition
• Section 3.4 Hash Tables (pp. 458–477)
  Sedgewick, Robert, Wayne, Kevin (2011). Algorithms, 4th Edition
• Section 10.2.2 Collision Handling Schemes (pp. 417–421)
  Goodrich, Michael T. (2014). Data Structures and Algorithms in Java, 6th Edition

Review:

1. What are the different types of accessing a file?
2. What is hashing? How can it help in solving the problem with file access?
3. What are the different ways to implement hashing?
4. What is a collision? How do we handle the problem of collision?

Activities/Assessments:
1. Complete the table below

Prime Number
Radix
Division Digit Extraction Mid Square
Folding Conversion
Key Value Remainder (3rd and 2nd (2nd and 3rd
(123+45) (Base 9 to
Method (PN = digits) digits squared)
Base 10)
97)
12345 26 32 68 83 29
14344 85 34 87 40 49
24680 42 64 26 56 16
42577 91 52 2 57 25
54321 1 34 64 3 49

2. From the table above, how many synonyms are there?

3. Given a hash table with m=11 entries and the following hash function h1 and step function
h2:
h1(key) = key mod m
h2(key) = {key mod (m-1)} + 1

Insert the keys {22, 1, 13, 11, 24, 33, 18, 42, 31} in the given order (from left to right) to
the hash table using each of the following hash methods:
a. Chaining with h1 ⟹ h(k) = h1(k)
b. Linear-Probing with h1 ⟹ h(k,i) = (h1(k)+i) mod m
c. Double-Hashing with h1 as the hash function and h2 as the step function
⟹ h(k,i) = (h1(k) + ih2(k)) mod m

Hash Table

Index | Chaining | Linear Probing | Double Hashing
0     |          |                |
1     |          |                |
2     |          |                |
3     |          |                |
4     |          |                |
5     |          |                |
6     |          |                |
7     |          |                |
8     |          |                |
9     |          |                |
10    |          |                |
