Data Structures

The document introduces the concepts of data structures and algorithms. It provides examples of common data structures like arrays, linked lists, stacks and queues. It also discusses algorithms for operations on these data structures like sorting, searching and hashing. The document outlines the process for designing and implementing algorithms using programming tools. It provides examples of algorithms for calculating greatest common divisor and representing connectivity between nodes. It specifically describes the quick-find algorithm for connectivity using an array representation.


BM267 - Introduction to Data Structures

1. Introduction

1/64
Objectives

• Learn the most important basic data structures used in
programming computers, such as arrays, linked lists,
queues, stacks, trees, ...
• Learn the most important basic algorithms used on these
data structures, such as sorting, searching, and hashing
algorithms ...
• Learn basic tools to measure and compare the
efficiency of these algorithms

BLM 267 2/64


Programming Tools

There will be programming assignments and labs.
• Brush up your C skills -- especially:
  • Built-in data types
  • Arrays
  • Structures, structure operations
  • Pointers, pointer operations
  • Dynamic memory allocation
  • void and void* data types
BLM 267 3/64
Definitions
• An “algorithm” is a method for solving problems that is suited to
computer implementation.
• An “algorithm” is a sequence of instructions for solving a problem,
i.e. for obtaining a required output for any legitimate input in a
finite amount of time.

BLM 267 4/64


Definitions-2
• A “data structure” is a way to store and
organize data in order to facilitate access
and modifications.
• Algorithm + Data Structures = Program

BLM 267 5/64


Design process - overview

• Analyze the problem


• Determine I / O data
• Determine relevant data structures
• Determine the operations needed on data
structures.
• Estimate the approximate space/time
requirements
• Re-analyze the problem.
BLM 267 6/64
Implementation - overview

• Choose a programming Language


• Determine the data structure implementation
method (Consecutive storage locations? Arrays?
Dynamic memory?)
• Implement data structures
• Implement the algorithm
• Test and modify to improve the space/time
requirements

BLM 267 7/64


Example-1: Greatest Common Divisor
• The greatest common divisor, gcd(m, n), is defined as the
largest integer that divides both m and n evenly, i.e. with a
remainder of zero, where m and n are non-zero positive integers.
• 1st Algorithm: Calculating the common prime factors
 Step 1 Find the prime factors of m.
 Step 2 Find the prime factors of n.
 Step 3 Identify all the common factors in the two sets.
 Step 4 Compute the product of all the common
factors and return it as the gcd of the numbers given.

BLM 267 8/64


Example-1: Greatest Common Divisor(2)

For the numbers 60 and 24, we get


60 = 2*2*3*5
24 = 2*2*2*3
gcd(60, 24 ) = 2*2*3 = 12

BLM 267 9/64


Example-1: Greatest Common Divisor(3)
• 2nd Algorithm: Consecutive integer checking
algorithm.
 Step 1 Assign the value of min{m, n} to t.
 Step 2 Divide m by t. If the remainder of this division is
0, go to Step 3; otherwise go to Step 4.
 Step 3 Divide n by t. If the remainder of this division is
0, return the value of t as the answer and stop;
otherwise proceed to Step 4.
 Step 4 Decrease the value of t by 1. Go to Step 2.
For the numbers 60 and 24:
Step 1 t = 24
Step 2 remainder is not 0, go to Step 4
Step 4 now t = 23
BLM 267 10/64
Example-1: Greatest Common Divisor(4)
Finally t becomes 12;
Step 2 remainder of the division is zero
Step 3 also the remainder of the division is zero
and return the t value which is 12 and stop
execution.

BLM 267 11/64


Example-1: Greatest Common Divisor(5)
3rd Algorithm: Euclid’s algorithm.
 Step 1 If n = 0, return the value of m as the answer and stop; otherwise
proceed to Step 2.
 Step 2 Divide m by n and assign the value of the remainder to r.
 Step 3 Assign the value of n to m and the value of r to n. Go to Step 1.
• Implementation of Euclid’s algorithm:
int Euclid(int m, int n) {
    int r;
    while (n != 0) {
        r = m % n;   /* m mod n */
        m = n;
        n = r;
    }
    return m;
}

BLM 267 12/64


Example-2: Connectivity
• Assume: a set of nodes (representing cities,
computers in a network, circuit board connections, ...)
• Given: pairs of nodes: p-q denotes a
connection between p and q
• The relation ‘–’ is symmetric:
If p-q is true, then q-p is true.
• The relation ‘–’ is transitive:
If p-q and q-r are true, then p-r is true.

BLM 267 13/64


Example-2: Connectivity (2)

BLM 267 14/64


Example-2: Connectivity (3)

1-2
2-7
3-4
7-4
9-7
8-7
6-8
...

BLM 267 15/64


Example-2: Connectivity (4)

(Two disconnected sets of nodes)

BLM 267 16/64


Example-2: Connectivity (5)
Problem: Filter out redundant pairs – print out a
pair only if it provides new information.
Input: 3-4 4-6 6-0 3-6 4-0 4-1 1-3 1-5
Output: 3-4 4-6 6-0 4-1 1-5

(Figure: the resulting connected component containing
nodes 0, 1, 3, 4, 5, and 6)
BLM 267 17/64
Example: Connectivity (6) - Abstract operations

Each time we get a new pair:
• Determine whether it represents a new
connection (find operation)
• Add the information represented by the pair to
the ‘knowledge’ base (union operation)
BLM 267 18/64


Example: Connectivity (7) - Data Structure

We will:
• Assume an upper limit for the number of nodes, N.
• Use an array that has N elements (one int per node).
• Nodes are numbered (0..N-1), corresponding to array
indices.

0 1 2 3 N-2 N-1

BLM 267 19/64


Example: Connectivity (8) - Representing sets
• Initially, array elements must indicate that all nodes are
disconnected. (No two array elements will have the same value,
each will contain its own index)
• This algorithm assumes that p and q are connected if and only if
the pth(id[p]) and qth(id[q]) entries are equal.
• To implement the union for p and q, we go through the array,
changing all the entries with the same name as p to have the
same name as q.

0 1 2 3 . . . . N-2 N-1
0 1 2 3 N-2 N-1

BLM 267 20/64


Example: Quick-Find(1) - Initialization

N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] ← i

#define N 10000
int i;
int ID[N];
for (i = 0; i < N; i++)
    ID[i] = i;
BLM 267 21/64
Example: Quick-Find(2) -Operations

WHILE (a new pair p, q exists)
{
    IF ID[p] == ID[q]
    {
        ← Find operation
        Connection already noted, no output
    }
    ELSE
    {
        ← Union operation
        Connection not noted, add, output
    }
}

BLM 267 22/64


Example: Quick-Find(3) -Operations

N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] = i
WHILE (a new pair p-q exists)
{
    IF ID[p] == ID[q]            ← Find operation
        { continue }
    ELSE
    {   FOR i = 0 TO N-1
            IF ID[i] == ID[p]    ← Union operation
                THEN ID[i] = ID[q]
        Output p-q
    }
}
BLM 267 23/64
Example: Quick-Find(4)

Initial  0 1 2 3 4 5 6 7 8 9

3-4      0 1 2 4 4 5 6 7 8 9

4-9      0 1 2 9 9 5 6 7 8 9

8-0      0 1 2 9 9 5 6 7 0 9

2-3      0 1 9 9 9 5 6 7 0 9

5-6      0 1 9 9 9 6 6 7 0 9
BLM 267 24/64
Example: Quick-Find(5)

2-9      0 1 9 9 9 6 6 7 0 9

5-9      0 1 9 9 9 9 9 7 0 9

7-3      0 1 9 9 9 9 9 9 0 9

4-8      0 1 0 0 0 0 0 0 0 0

5-6      0 1 0 0 0 0 0 0 0 0

0-2      0 1 0 0 0 0 0 0 0 0
BLM 267 25/64
Example: Quick-Find(6)

6-1      1 1 1 1 1 1 1 1 1 1

BLM 267 26/64


Example: Quick-Find(7)-(11)

(Figures: the sets drawn as node groupings after each pair is
processed – 3-4, 4-9, 8-0, 2-3, 5-6, 2-9 (do nothing), 5-9, 7-3,
4-8, 5-6 (do nothing), 0-2 (do nothing), and finally 6-1, which
merges all ten nodes into a single set.)

BLM 267 31/64


Example: Quick-Find(12) - Analyzing Quick-find

The running time of the algorithm depends on the
actual data.

Find: one operation (that’s why it’s called quick-find)

Union: N operations (at least)

Assuming a total of M union operations are
required, the running time of the algorithm will be
at least MN.

BLM 267 32/64


Example: Quick-Find(13) - Analyzing Quick-find

FOR (i = 0 TO N-1) ID[i] = i
WHILE (a new pair p-q exists)        (once for each pair)
{
    IF ID[p] == ID[q]                (once for each pair)
        continue
    ELSE                             (assume M unions)
    {   FOR i = 0 TO N-1             (MN times)
            IF ID[i] == ID[p]        (MN times)
                THEN ID[i] = ID[q]   (depends on data)
        Output p-q                   (depends on data)
    }
}
BLM 267 33/64
Example: Connectivity - Another approach

Let’s try to find a better (more efficient) algorithm
that solves the same problem ...
... without increasing space requirements.

BLM 267 34/64


Example: Quick–union – Data Structures

• Same data structure as the previous algorithm.

index:  0 1 2 3 . . . . N-2 N-1
value:  0 1 2 3         N-2 N-1

• Connections are to be visualized as tree branches.
• Array elements indicate whether the nodes are in the
same set by pointing to the same root node.
• Initially, array elements must indicate that all nodes are
disconnected. (Each array element contains its own
index.)
BLM 267 35/64
Example: Quick–union(3) -Operations

N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] = i
WHILE (a new pair p, q exists)
{
    FOR (i = p; i != ID[i]; i = ID[i]) ;   ← Find
    FOR (j = q; j != ID[j]; j = ID[j]) ;   ← Find
    IF i != j
        THEN { ID[i] = j, output p-q }     ← Union operation
}
BLM 267 36/64
Example: Quick–union(4)

Initial  0 1 2 3 4 5 6 7 8 9

3-4      0 1 2 4 4 5 6 7 8 9

4-9      0 1 2 4 9 5 6 7 8 9

8-0      0 1 2 4 9 5 6 7 0 9

2-3      0 1 9 4 9 5 6 7 0 9

5-6      0 1 9 4 9 6 6 7 0 9
BLM 267 37/64
Example: Quick–union(5)

2-9      0 1 9 4 9 6 6 7 0 9

5-9      0 1 9 4 9 6 9 7 0 9

7-3      0 1 9 4 9 6 9 9 0 9

4-8      0 1 9 4 9 6 9 9 0 0

5-6      0 1 9 4 9 6 9 9 0 0

0-2      0 1 9 4 9 6 9 9 0 0
BLM 267 38/64
Example: Quick–union(6)

6-1      1 1 9 4 9 6 9 9 0 0

BLM 267 39/64


Example: Quick–union(7)-(11)

(Figures: the forest drawn after each pair is processed – 3-4,
4-9, 8-0, 2-3, 5-6, 2-9 (do nothing), 5-9, 7-3, 4-8, 5-6 (do
nothing), 0-2 (do nothing), and finally 6-1, which links the last
remaining tree under node 1.)

BLM 267 44/64


Example: Quick–union(12) - Analyzing Quick-union

The union operation does not have to go through
N nodes, therefore it is faster than Quick-find.
The find operation is slower.

We can say that the algorithm is more efficient
than the previous version in the average case
(but only after some analysis using the tools of the
coming lectures).

BLM 267 45/64


Worst-case Behavior of Algorithms

Some input sequences will cause the algorithm
to operate slower.
Such situations are called the worst case.
Worst-case analysis of an algorithm specifies the
worst-case behavior of that algorithm.
Algorithms are usually compared using their
worst-case and average-case behavior.

BLM 267 46/64


Example: Quick–union(13) - Worst-case

Suppose input pairs arrive in the order:

0-1 , 1-2 , 2-3 , 3-4 , 4-5 ...

The find operation for object 0 has to follow N-1 pointers.
BLM 267 47/64


Example: Quick–union(14) - Worst-case

The average number of pointers followed for the first N
pairs:

(0 + 1 + 2 + ... + (N-1)) / N = (N-1) / 2

If N is large, similar situations may occur for many
subsets of nodes.

BLM 267 48/64


Example: Quick–union(15) - Worst-case

Note that we (arbitrarily) connect the first node (p) to the
second (q). This may cause long chains of nodes.
If we chose to connect the second node (q) to the first (p),
the analysis result would not change (the sequence could
be 1-0 , 2-1 , 3-2 , 4-3 , 5-4 ...).
We need to flatten the trees so that the union operation
could always be completed by going through a smaller
number of pointers.

BLM 267 49/64


Example:Weighted quick union

Use a second array to keep track of the number of nodes
in each tree.

Initial ID:  0 1 2 3 . . . . N-2 N-1
    (index:  0 1 2 3         N-2 N-1)

Size SZ:     1 1 1 1 . . . . 1 1
    (index:  0 1 2 3         N-2 N-1)
BLM 267 50/64


Example: Weighted quick union(2)

If pair p-q needs to be added:
• If size(p) < size(q), then make p’s tree a subtree of q’s
• Otherwise make q’s tree a subtree of p’s
(then update the node counts accordingly)

0-1   ID:  0 0 2 3 . . . . N-2 N-1
      SZ:  2 1 1 1 . . . . 1 1
BLM 267 51/64


Example: Weighted quick union(3)
N = 10000; ID[N], SZ[N] are arrays of int
FOR (i = 0 TO N-1) { ID[i] = i; SZ[i] = 1 }
WHILE (a new pair p, q exists)
{
    FOR (i = p; i != ID[i]; i = ID[i]) ;
    FOR (j = q; j != ID[j]; j = ID[j]) ;
    IF i == j THEN continue
    IF SZ[i] < SZ[j]
        THEN { ID[i] = j; SZ[j] += SZ[i] }
        ELSE { ID[j] = i; SZ[i] += SZ[j] }
    output p-q
}
BLM 267 52/64
Example: Weighted quick union(4)

Initial  ID:  0 1 2 3 4 5 6 7 8 9
         SZ:  1 1 1 1 1 1 1 1 1 1

3-4      ID:  0 1 2 3 3 5 6 7 8 9
         SZ:  1 1 1 2 1 1 1 1 1 1

4-9      ID:  0 1 2 3 3 5 6 7 8 3
         SZ:  1 1 1 3 1 1 1 1 1 1
BLM 267 53/64
Example: Weighted quick union(5)

8-0      ID:  8 1 2 3 3 5 6 7 8 3
         SZ:  1 1 1 3 1 1 1 1 2 1

2-3      ID:  8 1 3 3 3 5 6 7 8 3
         SZ:  1 1 1 4 1 1 1 1 2 1

5-6      ID:  8 1 3 3 3 5 5 7 8 3
         SZ:  1 1 1 4 1 2 1 1 2 1
BLM 267 54/64
Example: Weighted quick union(6)

2-9      ID:  8 1 3 3 3 5 5 7 8 3
         SZ:  1 1 1 4 1 2 1 1 2 1

5-9      ID:  8 1 3 3 3 3 5 7 8 3
         SZ:  1 1 1 6 1 2 1 1 2 1

7-3      ID:  8 1 3 3 3 3 5 3 8 3
         SZ:  1 1 1 7 1 2 1 1 2 1
BLM 267 55/64
Example: Weighted quick union(7)

4-8      ID:  8 1 3 3 3 3 5 3 3 3
         SZ:  1 1 1 9 1 2 1 1 2 1

5-6      ID:  8 1 3 3 3 3 5 3 3 3
         SZ:  1 1 1 9 1 2 1 1 2 1

0-2      ID:  8 1 3 3 3 3 5 3 3 3
         SZ:  1 1 1 9 1 2 1 1 2 1
BLM 267 56/64
Example: Weighted quick union(8)
6-1      ID:  8 3 3 3 3 3 5 3 3 3
         SZ:  1 1 1 10 1 2 1 1 2 1

BLM 267 57/64


Example: Weighted quick union(9)-(13)

(Figures: the forest drawn after each pair is processed – 3-4,
4-9, 8-0, 2-3, 5-6, 2-9 (do nothing), 5-9, 7-3, 4-8, 5-6 (do
nothing), 0-2 (do nothing), and finally 6-1. Because smaller
trees are always hung under larger ones, the final tree stays
shallow.)

BLM 267 62/64


Example: Weighted quick union(14)

The maximum number of pointers that must be traversed
when a tree has 2^n nodes is n.
When two trees of 2^n nodes are merged, we get a tree
of 2^(n+1) nodes, and the maximum number of pointers that
must be traversed becomes n+1.
The weighted quick-union algorithm follows at most
2 log2 N = 2 lg N pointers to determine whether two of
N objects are connected (much better than N-1).

BLM 267 63/64


BM267 - Introduction to Data Structures

3. Principles of Algorithm Analysis

BLM 267 1
Analysis Framework
• There are two kinds of algorithm efficiency: “time
efficiency” and “space efficiency”. Time efficiency
refers to how fast an algorithm runs; space efficiency
deals with the amount of extra space it needs.
• Nowadays the amount of extra space required by
an algorithm is typically not of as much concern,
due to improvements in memory capacity.
• We are going to concentrate on time efficiency, but
the analytical framework is applicable to analyzing
space efficiency as well.
BM 267 2
Measuring an Input’s size
• It is logical to investigate an algorithm’s efficiency as a
function of some parameter “n” indicating the algorithm’s
input size.

• The input size will be the size of the list or array for
problems such as sorting, searching, or finding the smallest
element.

BM 267 3
Units for Measuring Running time
• First approach: We can use time measurement (seconds or
milliseconds) to measure an algorithm’s running time.
• There are drawbacks to this approach, such as dependence on
the speed of the computer, the quality of the programmer, and
the compiler used in generating the machine code.
• Second approach: We can count the number of times each of the
algorithm’s operations is executed. The thing to do is to
identify the most important operation of the algorithm,
called the basic operation (the operation contributing the
most to the total running time), and compute the number of
times the basic operation is executed.
• As a rule, it is not difficult to identify the basic operation
of an algorithm: it is usually the most time-consuming
operation in the algorithm’s innermost loop.

BM 267 4
Units for Measuring Running time(2)
• Ex: Sequential Search
int search(int a[], int n, int K)
{
    int i;                   /* executes only once */
    for (i = 0; i < n; i++)  /* executes at most n times */
        if (a[i] == K)       /* executes at most n times */
            return i;        /* at most once */
    return -1;               /* at most once */
}

BM 267 5
Units for Measuring Running time(3)
• Most sorting algorithms work by comparing elements of a
list with each other; for such algorithms the basic operation
is a key comparison.
• As another example, algorithms for matrix multiplication
require two operations, multiplication and addition.
Multiplication takes much longer than addition,
so we can use the total number of multiplications as our
unit of algorithm measure.
• MatrixMultiplication(A[0..n-1, 0..n-1], B[0..n-1, 0..n-1])
  for (i = 0; i < n; i++)
      for (j = 0; j < n; j++) {
          C[i,j] = 0;
          for (k = 0; k < n; k++)
              C[i,j] = C[i,j] + A[i,k] * B[k,j];
      }
BM 267 6
Units for Measuring Running time(4)
• Efficiency T(n) is investigated as a function
of some parameter n indicating the problem’s
size.
• T(n) is computed as the number of times the
algorithm’s “basic operation” is executed.

• For matrix multiplication, T(n) = n^3

BM 267 7
Values of several functions important for analysis of
algorithms

BM 267 8
Worst, Best, and Average Case
• There are many algorithms for which the running time depends
not only on the input size but also on the particular input.

• The running time of the search algorithm can be quite different
for the same list size n. In the worst case, when there are no
matching elements or the first matching element happens to
be the last one on the list, the algorithm makes the largest
number of key comparisons among all possible inputs of size n:

Tworst(n) = n.

• The worst-case efficiency of an algorithm is its efficiency for
the worst-case input of size n, which is an input of size n for
which the algorithm runs the longest among all possible
inputs of that size.
BM 267 9
Worst, Best, and Average Case(2)
• In order to find the worst-case efficiency of an algorithm,
we analyze the algorithm to see what kinds of inputs yield
the largest value of the basic operation’s count among all
possible inputs of size n.
• The worst-case analysis provides very important
information about an algorithm’s efficiency by bounding
its running time from above. In other words, it guarantees
that for any instance of size n the running time will not
exceed Tworst(n).
• The best-case efficiency of an algorithm is its efficiency
for the best-case input of size n, which is an input of size n
for which the algorithm runs the fastest.
• The best case does not mean the smallest input; it
means the input of size n for which the algorithm runs the
fastest.
BM 267 10
Worst, Best, and Average Case(3)
• For example, for search, best-case inputs are arrays of size n
with their first element equal to the search key: Tbest(n) = 1.
• The best-case efficiency is not as important as the worst-case
efficiency.
• Neither worst-case nor best-case efficiency yields the
necessary information about an algorithm’s behavior on a
random input. The average-case efficiency provides us this
kind of information.
• To analyze the algorithm’s average-case efficiency, we
must make some assumptions about possible inputs of size n.
• The assumptions are:
  – The probability of a successful search is equal to p
    (0 ≤ p ≤ 1).
  – The probability of the first match occurring in the i-th
    position of the list is the same for every i.
Worst, Best, and Average Case(4)
• In the case of a successful search, the probability of the
first match occurring in the i-th position of the list is p/n for
every i.
• The number of comparisons made by the algorithm is then i.
• In the case of an unsuccessful search, the number of
comparisons is n, with the probability of such a search
being (1-p).
• Tavg(n) = [1*(p/n) + 2*(p/n) + ... + n*(p/n)] + n*(1-p)
          = (p/n)*[1 + 2 + ... + n] + n*(1-p)
          = p*(n+1)/2 + n*(1-p)
  If p = 1 (successful search),   Tavg(n) = (n+1)/2
  If p = 0 (unsuccessful search), Tavg(n) = n

BM 267 12
Mathematical Analysis of Nonrecursive Algorithms
• General Plan for Analyzing the Efficiency of Nonrecursive
Algorithms:
  – Decide on a parameter indicating an input’s size.
  – Identify the algorithm’s basic operation. (As a rule, it is
    located in its innermost loop.)
  – Check whether the number of times the basic operation is
    executed depends only on the input size; if it also depends
    on the particular input, investigate worst-, best-, and
    average-case efficiency separately.
  – Set up a sum expressing the number of times the
    algorithm’s basic operation is executed.
  – Using standard formulas and rules of sum
    manipulation, find a closed form for the count.

BM 267 13
Mathematical Analysis of Nonrecursive Algorithms(2)
• Ex 1:
Algorithm MaxElement(A[0..n-1])
{   // Determines the largest element of the array
    maxval = A[0];
    for (i = 1; i < n; i++)
        if (A[i] > maxval)
            maxval = A[i];
    return maxval;
}
• The input size is the number of elements in the array: n.
• The operations that are going to be executed most often are in
the algorithm’s for loop.

BM 267 14
Mathematical Analysis of Nonrecursive Algorithms(3)
• There are two operations in the loop’s body: the comparison
A[i] > maxval and the assignment maxval = A[i]. We consider the
comparison as the basic operation because it is executed on each
repetition.
• The number of comparisons will be n-1.
• The worst-, best-, and average-case efficiencies will be the same.
• The algorithm makes one comparison on each execution of the
loop; therefore

    T(n) = Σ_{i=1}^{n-1} 1 = n-1

BM 267 15
Mathematical Analysis of Nonrecursive Algorithms(4)
• Ex 2:
Algorithm UniqueElements(A[0..n-1])
{   // Checks whether all the elements in a given array are distinct
    for (i = 0; i <= n-2; i++)
        for (j = i+1; j <= n-1; j++)
            if (A[i] == A[j])
                return false;
    return true;
}
• The input’s size is the number of elements in the array: n.
• The basic operation is the comparison: A[i] == A[j].

BM 267 16
Mathematical Analysis of Nonrecursive Algorithms(5)
• The worst case occurs in two situations: arrays with no equal
elements, and arrays in which the last two elements are the only
equal pair.
• For such inputs, one comparison is made for each repetition of
the innermost loop; therefore:

  Tworst(n) = Σ_{i=0}^{n-2} Σ_{j=i+1}^{n-1} 1
            = Σ_{i=0}^{n-2} [(n-1) - (i+1) + 1]
            = Σ_{i=0}^{n-2} (n-1-i)
            = (n-1)*n / 2

In the worst case the algorithm has to make (n-1)*n / 2 comparisons.
What is the efficiency of matrix multiplication?

  Solution: T(n) = Σ_{i=0}^{n-1} Σ_{j=0}^{n-1} Σ_{k=0}^{n-1} 1 = n^3
BM 267 17
Mathematical Analysis of Recursive Algorithms
• General Plan for Analyzing the Efficiency of Recursive Algorithms:
  – Decide on a parameter indicating an input’s size.
  – Identify the algorithm’s basic operation. (As a rule, it is located in its
    innermost loop.)
  – Check whether the number of times the basic operation is executed
    depends only on the input size; if it also depends on the particular
    input, investigate worst-, best-, and average-case efficiency separately.
  – Set up a recurrence relation, with an initial condition, for the number
    of times the basic operation is executed.
• Ex 1: Computing the factorial function F(n) = n!
  n! = n*(n-1)*(n-2)*...*3*2*1 = n*(n-1)! for n ≥ 1, and 0! = 1
  so we can compute F(n) = F(n-1)*n
BM 267 18
Mathematical Analysis of Recursive Algorithms(2)
• Algorithm F(n)
{   // Computes n! recursively
    if (n == 0)
        return 1;
    else
        return F(n-1) * n;
}
• We consider “n” as the algorithm’s input size.
• The basic operation of the algorithm is again multiplication.
• We denote by T(n) the number of multiplications needed to
compute F(n).

  T(n) = T(n-1) + 1
  (T(n-1): to compute F(n-1); +1: to multiply F(n-1) by n)

BM 267 19
Mathematical Analysis of Recursive Algorithms(3)
• How to solve a recurrence equation:
  – We need an initial condition that tells us the value with which
    the sequence starts. We can obtain this value by
    inspecting the condition that makes the algorithm stop its
    recursive calls:
        if (n == 0) return 1;
    This tells us two things. First, the calls stop when n = 0.
    Second, when n = 0 the algorithm performs
    no multiplication. Therefore
        T(0) = 0
  – We use backward substitution to solve the equation.

BM 267 20
Mathematical Analysis of Recursive Algorithms(4)
T(n) = T(n-1) + 1                        substitute T(n-1) with T(n-2) + 1
     = [T(n-2) + 1] + 1 = T(n-2) + 2     substitute T(n-2) with T(n-3) + 1
     = [T(n-3) + 1] + 2 = T(n-3) + 3
     ...
T(n) = T(n-i) + i

When i = n:
T(n) = T(n-n) + n = T(0) + n = 0 + n = n

So we need n multiplications to compute F(n).

BM 267 21
BM267 - Introduction to Data Structures

2. Elementary Data Structures

Part 1

1
Objectives
• Review basic C knowledge
  – Learn basic types: int, float, char.
  – Learn how to write and call functions.
  – Learn how to define C structures, which put
    pieces of information together.
  – Learn to use pointers, which refer to information
    indirectly.
  – Learn a general approach to organizing our C
    programs.

BM 267 2
Basic Types
• A data type is a set of values and a collection of operations
on those values.
• In C, programs are built from just a few types of data:
  – Integers: short int, int, long int
  – Floating-point numbers: float, double
  – Characters: char
• When we perform an operation, we need to ensure that its
operands and result are of the correct type.
• C performs implicit type conversions.
• We can use a cast, or explicit type conversion.
• For example, if x and N are integers,
  ((float) x) / N
the result of this operation is floating point.

BM 267 3
Operations on basic data types

Arithmetic operations + - * / % ++ --

Relational operations == < > != <= >=

Logical operations && ||

Bitwise operations & | ^ ~

Shift operations << >>

BM 267 4
Functions
• We define functions to implement new operations on data.
• All C programs include a definition of the function main().
• All functions have a list of parameters (the list can be
empty), and functions may return a value or nothing.
• In order to declare a function, you should give its return type,
its name, and its parameter types.
• Ex: int lg(int);
• In a function definition, you should give names to the
arguments and do the desired computation using these
parameters.
• The definition and declaration of a function can be in different
files, but you should include the declaration file in the
definition file.

BM 267 5
Functions Example
#include <stdio.h>

int lg(int);

int main() {
    int i, N;
    for (i = 1, N = 10; i <= 6; i++, N *= 10)
        printf("%7d %2d %9d\n", N, lg(N), N*lg(N));
    return 0;
}

int lg(int N) {
    int i;
    for (i = 0; N > 0; i++, N /= 2) ;
    return i;
}

BM 267 6
Functions Example –2 Average and
Standard Deviation of N integers
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

typedef int numType;

numType randNum()
{ return rand(); }

int main(int argc, char *argv[])
{
    int i, N = atoi(argv[1]);
    float m1 = 0.0, m2 = 0.0;
    numType x;
    for (i = 0; i < N; i++)
    {
        x = randNum();
        m1 += ((float) x)/N;
        m2 += ((float) x*x)/N;
    }
    printf("       Average: %f\n", m1);
    printf("Std. deviation: %f\n", sqrt(m2-m1*m1));
}
BM 267 7
Program Organization
• As is recommended, you can split your program
into three files:
• .h file: An interface, which defines the data
structure and declares the functions used to
manipulate it.
• .c file: An implementation of the functions declared in
the .h file (must include the .h file).
• A client program that uses the functions declared
in the interface (must include the .h file). This file
must implement the main() function.

BM 267 8
Program Organization (2)
Num.h
typedef int numType;
numType randNum();

Num.c
#include <stdlib.h>
#include "Num.h"
numType randNum()
{ return rand(); }

Client.c
#include <stdio.h>
#include <math.h>
#include "Num.h"
int main(int argc, char *argv[])
{ /* implementation of main */ }

BM 267 9
Structs
• We need data structures that allow us to handle
collections of data. Arrays and structs allow us to
organize data.
• Structs define a new type of data.
• Structs are aggregate types that we use to define
collections of data. The members of a struct can be of
different types (a member can even be another struct),
while arrays can hold only one type of data.
• Assume that we need a new type called
Point; unfortunately, there is no such built-in
type in the C standard.
• But C allows us to define such a type using
“struct”.
BM 267 10
Structs(2)
• Accordingly, we can write:
  struct Point { float x; float y; };   /* do not forget the semicolon */
• struct Point a, b; declares two Point variables.
• We can refer to each member of the Point struct by
name. For example:
  a.x = 1.0; a.y = 1.0; b.x = 4.0; b.y = 5.0;
• We can also pass structs as arguments of
functions. For example:

BM 267 11
Structs(3)
Point.h
typedef struct { float x; float y; } point;
float distance(point a, point b);

Point.c
#include <math.h>
#include "Point.h"
float distance(point a, point b)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

BM 267 12
Pointers
• C pointers allow us to manipulate data
indirectly. Basically, a pointer is a reference to an
object in memory.
• In order to declare a pointer, you should first give
its type and then put a “*” before the variable’s
name. Ex: int *a_Ptr;
• We can declare pointers to any type of data.
Ex: float *f_Ptr; struct Point *point_Ptr;
• The “&” operator returns the address of a
variable.
• When you want to initialize a pointer you can use
the “&” operator. Ex: int a, *a_Ptr = &a;
BM 267 13
Pointers(2)
• C functions return only one value, but pointers
allow us to modify more variables.
• Ex:
void polar(float x, float y, float *r, float *theta)
{
    *r = sqrt(x*x + y*y);
    *theta = atan2(y, x);
}
• The function call polar(1.0, 1.0, &a, &b) will
affect the values of a and b: ‘a’ will become
sqrt(x*x + y*y) and ‘b’ will become atan2(y, x).

BM 267 14
Arrays
• An array is a fixed collection of same-type
data stored contiguously in memory.

  a:  [0] [1] [2] [3] [4]

• You can declare an array in this way:
  type name[const unsigned int];
• You can reach an element of an array by its
index. Ex: a[i];

BM 267 15
Dynamic Memory Allocation
• Dynamic memory allocation allows us to obtain blocks of
memory as needed during execution.
• Using dynamic memory allocation, we can design data
structures that grow and shrink.
• To allocate memory dynamically, we call one
of the three memory allocation functions declared in the
<stdlib.h> header:
  – malloc: allocates a block of memory, but doesn’t initialize it.
  – calloc: allocates a block of memory and clears it.
  – realloc: resizes a previously allocated block of memory.
• malloc returns a value of type void*.
• A memory function may fail to allocate
enough memory; in that case it returns NULL, and we must
test for this situation.
BM 267 16
Dynamic Memory Allocation(2)
• Ex:
  p = malloc(10000);
  if (p == NULL)
      /* allocation failed, take appropriate action */ ;
• We use the sizeof operator to calculate the amount of space
required.
• Ex: point *p = malloc(sizeof(point)); or
  point *p = malloc(n * sizeof(point)); which allocates n point
  objects.
• Once p points to a dynamically allocated block of memory,
we can use it as if it were an array:
  for (i = 0; i < n; i++) {
      p[i].x = i;
      p[i].y = i;
  }
BM 267 17
Dynamic Memory Allocation(3)
• free() Deallocates memory allocated by malloc
• Takes a pointer as an argument
• free ( ptr );
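realloc is listed among the allocation functions above but not demonstrated. A minimal sketch of growing a previously allocated block; the helper name grow_array is illustrative, not part of the standard library:

```c
#include <stdlib.h>

/* Grow (or shrink) an int array to hold new_count elements.
   Returns the possibly moved block, or NULL on failure. */
int *grow_array(int *p, size_t new_count)
{
    int *tmp = realloc(p, new_count * sizeof(int));
    /* On failure realloc returns NULL and leaves the old block
       valid, so the caller must not lose the old pointer before
       checking the result. */
    return tmp;
}
```

Note that assigning `p = realloc(p, ...)` directly would leak the original block if the call fails, which is why the result is taken into a temporary first.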

BM267 - Introduction to Data
Structures

3. Elementary Data Structures

Objectives
Learn about elementary data structures - Data structures
that form the basis of more complex data structures.
• Structure: Combines different data types as a unit.
• Array: Combines many instances of the same data type
as a unit.
• Linked list: allows insertions and removals anywhere.
Implementation using dynamic allocation vs fixed
arrays.
• String: allows insertions and removals of substrings
hence the size changes dynamically.

Self-referential objects

• Object = any data type (elementary or aggregate)


that has memory allocated for it (we talk about
concrete types now).
• Node: object that contains a pointer to an object of
the same type
• Nodes can be linked together to form useful data
structures such as lists, queues, stacks and trees
• By convention, a NULL pointer (value 0) is used to
indicate that the link is not meaningful.

Node
• A node in the simplest form has two fields:
Data Link

• A node can have two or more links, each pointing to


a node.
Data Link Link

• Link field always contains information that


identifies the successor node.
• This information may be a pointer (memory address),
or an index (into an array).

Node

Data Link

• Data field can contain any data type.


• Link field points to a node (it contains a memory
address)

• If link field has a value of 0 (NULL or NIL) then


node does not have a successor.
• Then, “the link is terminated”:

Data 0 Data NULL Data Data

Node

A node for int values defined in C:


typedef struct node
{
    int data;              /* fields: data | next */
    struct node * next;
} node;

• You can define an array of such structures:


struct node array[N];

• Or allocate each node separately:


malloc(sizeof( struct node));

Programming errors with pointers
• Dereferencing a NULL pointer

struct node *q = NULL;
q->data = 5; /* ERROR */

• Using a freed element

free(q->next);
q->next->data = 6; /* ERROR: q->next now dangles */

• Using a pointer before it is set

q = malloc(sizeof(struct node));
q->next->data = 7; /* ERROR: q->next is uninitialized */

Linked List (as an elementary data type)
The simplest kind of linked list is a linear chain of links
(pointers) to nodes of the same type.
More sophisticated linked lists may have several chains
of pointers.
A linked list can only be reached through its handle(s).
Handles are called the head pointer (or list-head) and the
tail pointer (or list-end).

Linked list - most common types
Singly linked list
• Each node points to the next
• Terminates with a null pointer
• Only traversed in one direction
Circular, singly linked
• Pointer in the last node points back to the first node
Doubly linked list
• May have two “start pointers” – head element and tail element
• Each node has forward / backward pointers
• Allows traversals both forwards and backwards
Circular, doubly linked list
• Forward pointer of the last node points to the first node and
backward pointer of the first node points to the last node
Singly-linked list

• Consists of a sequence of nodes with one link field.

12 23 26 44
LH

• The first node in the list is called the head node.


• To access a node, the predecessor of the node must
be available.
• To access the head node, its address is kept in a
special pointer named head (or listhead, LH)
pointer outside the list.
Singly-linked list
List operations:
• Insert a new item (pointer to the new node is given)
• Delete an item (the key value of the item is given)
• Search for an item (the key value is given)

Normally, only the listhead (LH) is given.


There is no information on:
• The number of nodes (might be 0, 1 or any number N).
• The value in each node.

Singly-linked unordered list

Let’s assume that the items are always added to the


beginning of the list.

22 13 26 9

Deleting an item may require the traversal of the entire


list.
Searching for an item may require the traversal of the
entire list

Singly-linked unordered list

Adding a node to an unordered list:


LH

• List head pointer (LH) is given,


• A pointer to the new node (newNode) is given
newNode

The desired result is: 12 ?


LH

12

Singly-linked unordered list

typedef struct LNODE
{
    int val;
    struct LNODE * next;
} LNODE;
LNODE * LH;
LNODE * newNode;

C code for creating a node:


newNode = malloc(sizeof(LNODE));

Singly-linked unordered list
C code for adding a node to the beginning of the list:
• Let the caller update the listhead pointer
Caller code: LH = AddNode(LH, newNode);
Where:
LNODE * AddNode(LNODE* head, LNODE* newNode)
{ newNode->next = head;
return newNode;
}

• Let the called function update the listhead pointer


Caller code: AddNode(&LH, newNode);
Where:
void AddNode(LNODE** head, LNODE* newNode)
{ newNode->next = *head;
*head = newNode;
}

Singly-linked unordered list
Deleting a node:
• List head pointer (LH) is given,
• The key value of the item (key) is given (Assume 13)

22 13 26 9

The desired result is:

22 13 26 9

Modified Deleted

Singly-linked unordered list
For deletions, need to keep two pointers, pointing to the
modified and deleted items.
Special cases: In case of deleting the first item, listhead
pointer LH must be updated.
We will assume that an item with key always exists in the
list.

Singly-linked unordered list
C code for deleting a node whose data value is given:
• Let the caller update the listhead pointer
Caller code: LH = DeleteNode(LH, key);
Where:
LNODE * DeleteNode(LNODE * head, int key)
{
    LNODE * node = head;
    LNODE * prev = NULL;
    while (node->val != key)        /* Find node to delete */
    {   prev = node;
        node = node->next;
    }
    if (!prev)                      /* Deleting the first node? */
        head = node->next;
    else
        prev->next = node->next;
    free(node);
    return head;                    /* Return listhead */
}
Singly-linked unordered list

• Let the called function update the listhead pointer


Caller code: DeleteNode(&LH, key);
Where:
void DeleteNode(LNODE ** head, int key)
{
    LNODE * node = *head;
    LNODE * prev = NULL;
    while (node->val != key)        /* Find node to delete */
    {   prev = node;
        node = node->next;
    }
    if (!prev)                      /* Deleting the first node? */
        *head = node->next;
    else
        prev->next = node->next;
    free(node);
}

Singly-linked unordered list
Search for a value:
• List head pointer (LH) is given,
• The key value (key) is given
22 13 26 9

The desired result is:


• TRUE : Value is in the list
• FALSE: Value is not in the list

Singly-linked unordered list

Caller code:
if(Search(LH,12)) /*Search for an item with value 12 */
... /* Value found */
else
... /* Value not found */

Where:
int Search(LNODE * node, int key)
{
while (node)
if (node->val == key)
return 1; /* Value found */
else
node = node->next;
return 0; /* Value not found */
}

Sorted lists
Keep the items on the list in a sorted order, based on
data value in each node
9 13 14 22

Advantages:
• already sorted, no need for sort operation
• operations such as delete, find, etc. need not search to
the end of the list if the item is not in list
Disadvantages
• Insert operation must search for the right place to add
element (slower than simply adding at beginning)
Singly-linked ordered list
Adding a node to an ordered list:
LH

• List head pointer (LH) is given,


• A pointer to the new node is given
12 ?
Initial configuration
10 14 20

The desired result is:


10 12 14 20

Modified Added
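The slide above gives only the pictures; a minimal C sketch of the ordered insert, using the same two-pointer traversal as the deletion code earlier (the function name InsertOrdered is illustrative):

```c
#include <stdlib.h>

struct LNODE { int val; struct LNODE *next; };

/* Insert newNode into an ascending ordered list; returns the new head. */
struct LNODE *InsertOrdered(struct LNODE *head, struct LNODE *newNode)
{
    struct LNODE *node = head, *prev = NULL;

    while (node != NULL && node->val < newNode->val) {
        prev = node;              /* keep the predecessor */
        node = node->next;
    }
    newNode->next = node;         /* link to the larger node (or NULL) */
    if (prev == NULL)
        return newNode;           /* adding as first: head changes */
    prev->next = newNode;
    return head;                  /* head unchanged */
}
```

As with the unordered list, the caller updates the listhead: `LH = InsertOrdered(LH, newNode);`.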
Circular singly-linked list
• Last node references the first node
• Every node has a successor
• No node in a circular linked list contains NULL

Checking if the circle is completed:

if(node->next == list) ... Pointer comparison

Singly-linked list with dummy head node

• Dummy head node is always present, even when


the linked list is empty.
• Insertion and deletion algorithms use two pointers,
prev and node,
• For empty lists, initialize prev to reference the
dummy head node, rather than NULL.
• Move both pointers together.

prev node
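A sketch of deletion with a dummy head node, showing why the "first node" special case disappears: prev starts at the dummy, never at NULL (names are illustrative):

```c
#include <stdlib.h>

struct LNODE { int val; struct LNODE *next; };

/* Delete the first node with the given key from a list that has a
   permanent dummy head node. prev starts at the dummy, so unlinking
   the first real node needs no special case. */
void DeleteWithDummy(struct LNODE *dummy, int key)
{
    struct LNODE *prev = dummy;
    struct LNODE *node = dummy->next;

    while (node != NULL && node->val != key) {
        prev = node;              /* move both pointers together */
        node = node->next;
    }
    if (node != NULL) {           /* found: unlink and free */
        prev->next = node->next;
        free(node);
    }
}
```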

Doubly-linked list
A double-link node for int values defined in C:

typedef struct LNODE
{
    struct LNODE * llink;    /* fields: llink | val | rlink */
    int val;
    struct LNODE * rlink;
} LNODE;

12 15 22 28

Doubly-linked ordered list
Adding a node to an ordered list:
LH

• List head pointer (LH) is given,


• A pointer to the new node is given
14

12 15 22 28

12 15 22 28

14

Doubly-linked ordered list
Adding a node to an ordered list (assuming LH is global):

void InsertDouble(LNODE * newnode)
{
    LNODE * node = LH;
    LNODE * prev = NULL;
    while (node != NULL && node->val <= newnode->val)
    {   prev = node;                /* keep trying */
        node = node->rlink;
    }
    newnode->rlink = node;          /* node may be NULL: adding at end */
    newnode->llink = prev;
    if (node != NULL)
        node->llink = newnode;
    if (prev != NULL)
        prev->rlink = newnode;
    else
        LH = newnode;               /* adding as first: update listhead */
}

Doubly-linked ordered list
Deleting a node:
• List head pointer (LH) is given,
• The key value of the item (val) is given (Assume 15)

12 15 22 28

12 15 22 28

Modified Deleted
Modified
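The deletion shown in the diagram can be sketched in C as follows; because each node carries both links, no prev pointer is needed during the search (DNODE and DeleteDouble are illustrative names):

```c
#include <stdlib.h>

struct DNODE { struct DNODE *llink; int val; struct DNODE *rlink; };

/* Delete the node with the given value from a doubly linked list.
   Returns the (possibly updated) list head. Assumes the value exists. */
struct DNODE *DeleteDouble(struct DNODE *head, int val)
{
    struct DNODE *node = head;

    while (node->val != val)      /* no prev pointer needed: */
        node = node->rlink;       /* the node knows its neighbours */

    if (node->llink != NULL)
        node->llink->rlink = node->rlink;
    else
        head = node->rlink;       /* deleting the first node */
    if (node->rlink != NULL)
        node->rlink->llink = node->llink;
    free(node);
    return head;
}
```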

Implementing lists using arrays
• Arrays have fixed number of elements (check for
overflow).
• Pointers now become array indices (of type integer).
• Since C arrays start with index 0, we will assume -1
corresponds to the NULL pointer.
• Which array elements are currently in use must be tracked.
• Here is the NODE structure for use with arrays in C:
typedef struct
{
    int val;
    int next;
} NODE;
Implementing lists using arrays (2)

NODE node[N];   /* A NODE array of size N (N is a compile-time constant) */
int LH = -1;

Note that we have to distinguish unused array elements.
We will adopt the convention that a link value of -2
denotes an unused array element.
Therefore, the array has to be initialized:

for (i = 0; i < N; i++) node[i].next = -2;

-1 means: ‘the chain ends here’,
-2 means: ‘node is available (not currently used)’,
Any other value means: 1) ‘node is currently in the list’ and,
2) ‘it is followed by the node whose index is here’.
Implementing lists using arrays (3)
We will also need to find an empty element to
insert the new node (instead of using malloc()).
Caller Code: index = FindEmpty( );
if (index == -1)
{ /* No more space on array */}
else
{ /* Use node[index] */ }
...
Where: int FindEmpty()
{   int i;
    for (i = 0; i < N; i++)
        if (node[i].next == -2)
            return i;
    return -1;
}

Implementing lists using arrays (4)
C code for adding a node to the beginning of the list,
assuming node[] and LH are defined globally.

Caller code:
AddNode(newNode);
Where:
void AddNode(int newVal)
{ int index;
index = FindEmpty( );
if (index == -1)
{ /* No more space on array */ }
else
{ node[index].val = newVal;
node[index].next = LH;
LH = index;
}
}
Implementing lists using arrays (5)
C code for deleting a node, assuming node[] and LH
are defined globally and val always exists in the list.
Caller code: DeleteNode(val);
Where: void DeleteNode(int delVal)
{   int prev = -1, index = LH;
    while (!(index < 0))
        if (node[index].val == delVal)
            break;
        else
        {   prev = index;
            index = node[index].next;
        }
    if (prev < 0)                        /* Deleting the first node? */
        LH = node[index].next;
    else
        node[prev].next = node[index].next;
    node[index].next = -2;               /* Mark as empty */
}
Implementing lists using arrays (6)
C code for searching a value in an unordered list,
assuming node[] and LH are defined globally.
Caller code: if(SearchVal(val)) /*Search for an item with value val */
... /* Value found */
else
... /* Value not found */

Where: int SearchVal(int val)


{ int index=LH;
while(!(index < 0))
if(node[index].val == val)
return 1; /* Success */
else /* keep trying */
index = node[index].next;
return 0; /* Failure*/
}

Lists vs. Arrays - A comparison
Space (storage) considerations
• A linked list requires pointers to nodes.
• An array requires the maximum number of elements to
be known in advance. If that maximum is not required,
space is wasted at the end of the array.
Time considerations
• Operations on a linked list require more lines of explicit
code than those in an array. However, addressing an
array element uses more implicit (compiler generated) code.
• Arrays are quicker at finding and altering ‘in the middle’
• Linked lists are quicker at insertions and removals ‘in the
beginning/middle’
String (as an elementary data type)

• Not a built-in C/C++ type.


• String type can be (and is) implemented transparently.
• Need to grow and shrink in size - Efficiently
represented as (variable-length) array of characters.
• Need to be maintained dynamically (by the system or
by the user.)
• String Operations:
  • Compute length
  • Copy
  • Compare strings
  • Check substring existence
  • Append, insert, delete substring
String representation
Allocate a fixed number of bytes for characters.
Use as many characters as needed, disregard the rest.
char str1[50];
A N K A R A U N I V E R S I T Y ? ? ? ? ?

Array elements can be addressed individually as chars.


Beginning of the string is the first character: str1[0].
Handle of the string is a pointer to the first character.
How to tell at which position the string ends?

String representation

Two accepted ways to represent strings.


• ASCIIZ representation: End the string with a binary 0.
A N K A R A U N I V E R S I T Y \0 ? ? ? ?

Null (empty) string: \0 ? ? ? ? ? ? ? ? ? ? ? ? ?

• Size-Content representation: Keep string size at the front.


17 A N K A R A U N I V E R S I T Y ? ? ? ?

Null (empty) string: 00 ? ? ? ? ? ? ? ? ? ? ? ? ?
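With the ASCIIZ representation, the length is not stored anywhere; it must be computed by scanning for the terminator. A minimal sketch, equivalent to the standard strlen:

```c
/* Length of an ASCIIZ string: count characters up to the '\0'
   terminator (the terminator itself is not counted). */
int str_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

This O(n) scan is the price of the ASCIIZ representation; the size-content representation trades it for the bookkeeping of an explicit length field.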

Null string representation
char * str;
An empty (null) string is NOT represented by:
str = 0;          /* this makes a null pointer, not a null string */

An empty (null) string is represented by:

*str = 0;         /* the first character is '\0' */

• Empty or not, string pointers must always point
to valid memory locations.

• If a string pointer is NULL, the string is invalid (cannot do


string operations on it)

• A ‘null pointer’ and a ‘null string’ are different entities.

String buffer
• When processing strings, strings are allowed to grow/shrink
in size (create, copy, append).
• Allocating one (maximum size) array for each string is very
inefficient.
• Compilers use a contiguous memory block called string
buffer (or string space) to keep literal strings which are
constant (format strings, initialization strings etc.)
• Some applications define a string buffer which is a memory
area that contains all strings side by side, and can be
manipulated under program control.
A N K A R A \0 U N I V E R S I T Y \0 ? ? ? ?

String buffer
If a string grows in size, it may need to be moved to a different
location in string buffer.

A N K A R A \0 U N I V E R S I T Y \0 ? ? ? ?

char * p char * q

After strcat(p,q):
A N K A R A U N I V E R S I T Y \0 \0 ? ? ? ?

char * p char * q

String buffer
Empty locations in string buffer may need to be compacted from
time to time.

     \0      \0 
   \0   \0    
        \0  
    \0  \0     
   \0 \0

  
  \0

Elementary data structures

Elementary data structures are the data types that are


implemented in programming language syntax, or can be
added with little effort.
Basically, structures, arrays, linked lists and strings are
sufficient to implement most of the useful data
structures.
Many languages have elementary types as syntactic
(language-defined) data types, or have extensive libraries
that implement them.

BM267 - Introduction to Data
Structures

4. Abstract Data Types


Pushdown Stack ADT
• A container of objects that are inserted and removed
according to the last-in-first-out (LIFO) principle.
• Objects may be inserted into a stack at any time, but only the
most recently inserted object (the last one!) can be removed
at any time.

(Diagram: items enter and exit at the TOP; the oldest item
sits at the BOTTOM.)

Pushdown stacks in action

Web browser:
• Stores the addresses of recently visited sites on a stack.
• Each time a user visits a new site, the address of the site is
pushed into the stack of addresses.
• Using the 'back' button the user can pop back to previously
visited sites!

Text editors:
• Powerful text editors keep text changes in a stack.
• The user can use the undo mechanism to cancel recent editing
operations

Pushdown stack operations
The fundamental operations involved in a stack are
“push” and “pop”.
• push: adds a new element on top of the stack
• pop: removes an element from the top of the stack
It is an error to pop an element from an empty stack.
It is also an error to push an element onto a full stack.
Other operations
• isEmpty: Checks if the stack is empty
• isFull: checks if the stack is full (if there is an
implementation dependent limit)
Pushdown stack ADT implementation
A stack can be implemented with an array of objects
easily.
• The maximum size of the stack (array) must be
estimated when the array is declared.
• Space is wasted if we use less elements.
• We cannot "push" more elements than the array can
hold (overflow).

If the maximum stack size cannot be estimated, use


a linked list to implement the stack.

Pushdown stack ADT implementation

Interface for pushdown stack ADT for objects of type


‘Item’:
void STACKinit(int);
int STACKempty();
void STACKpush(Item);
Item STACKpop();

Pushdown stack ADT implementation

Note that, if an array is used, you can visualize (and


implement) stack in several ways.

(Diagram: the array may be filled from index 0 upward or from
index N-1 downward; in either layout, Top may point to the next
available element or to the last inserted element.)

Pushdown stack ADT implementation
void STACKinit(int);
• Initializes an array of Items of specified size.
• Item (structure) is known by both the client and the
implementation

• Must have a pointer (or array index) that points to the next available
(or last filled) slot on the stack.

Top
Pushdown stack ADT implementation

Convention:
0

• Need pointer initialized to (index) 0, since we


adopted the convention that top refers to the next
free place in the stack, where a push will be written.
• The last item pushed onto the stack is therefore at
top-1.
• Delete item: We have to decrease top by 1 before
we pop an item from the stack.
• Insert item: We have to push the item first then
increment top.

Pushdown stack ADT implementation

Convention:

• Need pointer initialized to (index) -1, since we


adopted the convention that top refers to the last item
inserted in the stack
• The last item pushed onto the stack is therefore at
top.
• Delete item: We have to remove the item first, then
decrease top by 1.
• Insert item: We have to increment top before
pushing the item.

Pushdown stack ADT implementation
void push(Item); Insert new item at the top of the stack.
Precondition: The stack is not full.
Postcondition: The stack has a new item at the top.

Item pop( ); Remove the item from the top of the stack.
Precondition: The stack is not empty
Postcondition: Either the stack is empty or the stack has a
new topmost item from a previous push.
int STACKempty(); Returns a logical value depending on
the number of elements in the stack.
Precondition: The stack has 0 ≤ N ≤ Max elements.
Postcondition: The stack has N elements.
Pushdown stack ADT implementation

push(1) push(2)

2
Top
1 1

push(3) pop( ) pop( )


=3 =2

3
2 2
1 1 1

Example: Using a Stack to compute a Hex Number
431 / 16 = 26, Rem: 15 → Push('F')
 26 / 16 =  1, Rem: 10 → Push('A')
  1 / 16 =  0, Rem:  1 → Push('1')

Stack (top to bottom): '1' 'A' 'F'

pop( ) = '1' → String = 1
pop( ) = 'A' → String = 1A
pop( ) = 'F' → String = 1AF   (Stack: Empty)
Example: Array Implementation of a Stack
int *s;
int Top;
void STACKinit(int maxN){
s = (int *) malloc( maxN * sizeof(int) );
Top = 0; }
int STACKempty(){
return Top == 0; }
void STACKpush(int item){
s[Top++] = item; }
int STACKpop(){
return s[--Top]; }

Example: Postfix Calculation
• Suppose that we need to find the value of a simple arithmetic
expression involving multiplication and addition of integers,
such as
5 * ( ( ( 9 + 8) * ( 4 * 6 ) ) + 7)
• The calculation saves intermediate results: For example, if we
calculate ( 9 + 8 ), then we have to save the result.
• A pushdown stack is the ideal mechanism for saving
intermediate results in a such calculation.
• We can convert to arithmetic expression into postfix
representation. In postfix representation each operator appears
after its two operand.
( 5 + 9 )  5 9 +

Example: Postfix Calculation
5 9 8 + 4 6 * * 7 + *
5 ( 9 + 8 ) 4 6 * * 7 + *
5 17 4 6 * * 7 + *
5 17 ( 4 * 6 ) * 7 + *
5 17 24 * 7 + *
5 ( 17 * 24 ) 7 + *
5 408 7 + *
5 ( 408 + 7 ) *
5 415 *
( 5 * 415 )
2075

Example: Postfix Calculation
Top = 0: stack empty
push(5) → Top = 1, stack: 5
push(9) → Top = 2, stack: 5 9
push(8) → Top = 3, stack: 5 9 8

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Postfix Calculation
pop() = 8  → Top = 2, stack: 5 9
pop() = 9  → Top = 1, stack: 5
push(17)   → Top = 2, stack: 5 17      (9 + 8 = 17)
push(4)    → Top = 3, stack: 5 17 4

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Postfix Calculation
push(6)    → Top = 4, stack: 5 17 4 6
pop() = 6  → Top = 3, stack: 5 17 4
pop() = 4  → Top = 2, stack: 5 17
push(24)   → Top = 3, stack: 5 17 24   (4 * 6 = 24)

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Postfix Calculation
pop() = 24 → Top = 2, stack: 5 17
pop() = 17 → Top = 1, stack: 5
push(408)  → Top = 2, stack: 5 408     (17 * 24 = 408)
push(7)    → Top = 3, stack: 5 408 7

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Postfix Calculation
pop() = 7   → Top = 2, stack: 5 408
pop() = 408 → Top = 1, stack: 5
push(415)   → Top = 2, stack: 5 415    (408 + 7 = 415)
pop() = 415 → Top = 1, stack: 5

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Postfix Calculation
pop() = 5   → Top = 0, stack empty
push(2075)  → Top = 1, stack: 2075     (5 * 415 = 2075)

Expression: 5 9 8 + 4 6 * * 7 + *

Example: Array Implementation of a Stack
/* Postfix-expression evaluation */
int main(int argc, char *argv[]){
char a[] = "5 11 * 5 + 2 *";
int i, array_size = strlen(a);
STACKinit( array_size );
for (i = 0; i < array_size; i++) {
if (a[i] == '+')
STACKpush( STACKpop()+ STACKpop() );
if (a[i] == '*')
STACKpush( STACKpop() * STACKpop() );
if ( (a[i] >= '0') && ( a[i] <= '9') )
STACKpush(0);
while ((a[i] >= '0') && (a[i] <= '9'))
STACKpush(10*STACKpop() + (a[i++]-'0')); }
printf("%d \n", STACKpop());
return 0;}
Queues

• A queue is a data structure into which items can be inserted
only at one end (called the rear) and removed only at the other
end (called the front).
• The item at the front end of the queue is called the first item.

rear front
enter 1 7 3 9
exit

Queue operations
• put: adds a new element at the rear of the queue
• Increase the number of element in the queue by 1.
• get: removes an element from the front of the queue
• Decrease the number of elements in the queue by 1
It is an error to get an element from an empty queue.
It is also an error to put an element to a full queue.
Other operations
• isEmpty: Checks if the queue is empty
• isFull: checks if the queue is full (if there is an
implementation dependent limit)

Queue implementation
A queue can be implemented with an array of objects
easily.
• The maximum size of the queue (array) must be
estimated when the array is declared.
• Space is wasted if we use less elements.
• We cannot "put" more elements than the array can
hold (overflow).

If the maximum queue size cannot be estimated,


use a linked list to implement the queue.

Fall 2005 BM267 26


Queue ADT implementation

Interface for queue ADT for objects of type ‘Item’:


void QUEUEinit(int);
int QUEUEempty();
void QUEUEput(Item);
Item QUEUEget();

(Compare these operations with stack operations)


They are almost the same:
• push, put: insertion
• pop, get: removal

Queue ADT implementation
void QUEUEinit(int);
• Initializes an array of Items of specified size.
• Item (structure) is known by both the client and the
implementation

• Must have two pointers (or array indices) that point to the rear
and the front of the queue.

rear front

1 4 8 3 2 5 7 6

Queue Linked List implementation
• Elements can be added and removed in any order
• Therefore it is easier to use a singly-linked list as a queue,
provided two extra pointers are kept.
9 13 14 22

front rear

• Or better yet, use a doubly linked list, to maintain the head


pointer easily.

12 15 22 28

front rear
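A minimal sketch of the singly linked queue with front and rear handles. Globals are used for brevity, mirroring the array interface that follows; the names match QUEUEput/QUEUEget, but this is only one possible implementation:

```c
#include <stdlib.h>

struct QNODE { int val; struct QNODE *next; };

static struct QNODE *front = NULL, *rear = NULL;

/* put: append at the rear of the singly linked queue */
void QUEUEput(int item)
{
    struct QNODE *n = malloc(sizeof *n);
    n->val = item;
    n->next = NULL;
    if (rear == NULL)
        front = n;            /* first element: both handles point to it */
    else
        rear->next = n;
    rear = n;
}

/* get: remove from the front (assumes the queue is not empty) */
int QUEUEget(void)
{
    struct QNODE *n = front;
    int item = n->val;
    front = n->next;
    if (front == NULL)
        rear = NULL;          /* queue became empty */
    free(n);
    return item;
}
```

Both operations are O(1) because insertion only ever touches rear and removal only ever touches front.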

Queue Array implementation
First Approach( not efficient!! )

? ? ? ? ? Initial state

rear front = 0 rear = -1


front

1 ? ? ? ? put( 1)
front rear front = 0 rear = 0

put( 5)
1 5 ? ? ?
front = 0 rear = 1
front rear

Queue Array implementation
put( 3)
1 5 3 ? ? front = 0 rear = 2
front rear

put( 4)
1 5 3 4 ? front = 0 rear = 3
front rear
get( ) = 1
5 3 4 ? ? front = 0 rear = 2
front rear

Queue Array implementation
get( ) = 5
3 4 ? ? ? front = 0 rear = 1
front rear
put(8)
3 4 8 ? ? front = 0 rear = 2
front rear
front always “0”
rear is initially “–1” and can be at most “N-1”
Observations: put(int) adds 1 to rear
get() subtracts 1 from rear but needs some
elements to be shifted.
if rear = -1 queue is empty
if rear = N-1 queue is full
Queue Array implementation
Second Approach( more efficient)

? ? ? ? ? Initial state

rear front = 0 rear = 0


front
put( 1)
? 1 ? ? ? front = 0 rear = 1
front rear

put( 5)
? 1 5 ? ? front = 0 rear = 2
front rear

Queue Array implementation
put( 3)
? 1 5 3 ? front = 0 rear = 3
front rear

put( 4)
? 1 5 3 4 front = 0 rear = 4
front rear
get( ) = 1
? ? 5 3 4 front = 1 rear = 4
front rear

Queue Array implementation
get( ) = 5
? ? ? 3 4 front = 2 rear = 4
front rear
put(9)
9 ? ? 3 4 front = 2 rear = 0
rear front

put(10)
9 10 ? 3 4 front = 2 rear = 1
rear front

Queue Array implementation
Allocate maxSize+1 elements (one extra slot, so that a full queue
can be distinguished from an empty one).
Initially the queue is empty: front = 0, rear = 0
Observations: put(item) advances rear by 1 (mod N), then stores array[rear] = item
get() advances front by 1 (mod N), then returns the item
if "(rear + 1) % N == front" the queue is full

Queue Array implementation
int *q, N, front, rear;    /* globals used by the implementation */

void QUEUEinit(int maxN){
    q = (int *)malloc((maxN+1)*sizeof(int));
    N = maxN+1;
    front = 0;
    rear = 0; }
int QUEUEempty(){
if( rear == front )
return 1;
return 0; }
int QUEUEfull(){
if( ( ( rear + 1) % N ) == front )
return 1;
return 0; }

Queue Array implementation
void QUEUEput(int item){
if( QUEUEfull() )
printf(" Queue is full!!!");
else {
rear = (rear + 1) % N;
q[rear] = item;} }
int QUEUEget(){
if( QUEUEempty() ){
printf(" Queue is empty!!!");
return -1;}
else{
front = (front + 1) % N;
return q[front];} }

ADT observations
Pushdown stacks and FIFO queues are special instances of the
generalized queue ADT.
Generalized queue ADT can take many forms depending on the
element insertion and removal policy.
• Pushdown stack: remove the last item.
• FIFO queue: remove the oldest item.
• Random queue: remove a randomly selected item.
• Priority queue: Remove the item with highest (lowest) value.
• Symbol table: remove item whose key is given.
• Deque (double-ended queue): add/remove items at either end.

ADT duplicate elements
ADTs also differ in their element acceptance criteria.
"Is element duplication allowed?"
9 14 14 22

Some policies are:


• Let the client (user) decide.
• Duplicates are allowed (triplicates as well...)
• Duplicates are never allowed, new element is ignored.
• Duplicates are never allowed, new element replaces the old.
• Duplicates are never allowed, retain the more desirable element.

ADT duplicate elements

If duplication is not allowed,


• A test function is needed to determine item existence
(whether an item is already in the data structure). Sometimes
a second array may be used for this purpose.
• A test function is needed for testing item equality.

ADT duplicate elements
If the keys are unique and relatively small, use a second array:

1 12 14 22

1 0 ... 1 0 1 0 ... 1 0 0 0
1 2 ... 12 13 14 15 ... 22 ...

Note that the linked list shown above is an ADT: it may actually
be implemented using an array.

BM267 - Introduction to Data
Structures

5. Recursion

Objectives

Learn about
• Recursion
• Divide and conquer
• General and binary tree structures
• Implementation of trees
• Mathematical properties of trees.
• Tree operations, tree traversal algorithms

Recursion
• A recursive definition is one which uses the word or concept
being defined in the definition itself

• Consider the following list of numbers:

24, 88, 40
• Such a list can be defined on paper as
A LIST is a: number
or a: number comma LIST

• That is, a LIST is defined to be a single number,


• Or a number followed by a comma followed by another LIST

• The concept of LIST is used to define itself.


Recursion
• If you apply this definition to the actual list of numbers, the
recursive part of the LIST definition is used several times,
terminating with the non-recursive part:

LIST  number , LIST


24 , 30, 40

number , LIST
24 , 30 , 40

number
24 , 30 , 40

Recursion
• All recursive definitions have to have a terminating case
A LIST is:
either number, followed by a comma, followed by another LIST
or number

• Otherwise, there would be no way to terminate the recursive


path

• Such a definition would cause infinite recursion

• This problem is similar to an infinite loop, but the non-


terminating "loop" is part of the definition itself

• The non-recursive part is often called the base case


Recursive functions

• Recursion simply means a function that calls itself.


• The conditions that cause a function to call itself
again are called the recursive case.
• In order to keep the recursion from going on forever,
you must make sure you hit a termination condition
called the base case.
• The number of nested invocations is called the depth
of recursion.
• Function may call itself directly or indirectly. (All of
our examples are direct.)

Recursive functions

• Recursive functions must satisfy two basic


properties:
– They must explicitly solve a base case.
– Each recursive call must involve smaller values of the
argument.

Euclid: Greatest common divisor


int gcd(int m, int n)
{
if (n == 0)
return m;
return gcd(n, m % n);
}

Recursive functions
int puzzle(int N)
{
    if (N == 1)
        return 1;
    if (N % 2 == 0)
        return puzzle(N/2);
    else
        return puzzle(3*N+1);
}

Here, we cannot use induction to prove that this program
terminates, because not every recursive call uses an argument
smaller than the one given.

Recursive functions
Linked list node count:
int count(link x)
{
if (x == NULL)
return 0;
return 1 + count(x->next);
}

Recursive functions
Factorial function
 N!, for any positive integer N, is defined to be the product
of all integers between 1 and N inclusive
 This definition can be expressed recursively as:
 0! = 1
 1! = 1
 N! = N * (N-1)!
 The concept of the factorial is defined in terms of another
factorial
 Eventually, the base case of 1! is reached.

Recursive functions
• Recursion and looping have similar structure.
• A loop’s termination condition has the same role as a recursive
base case.
• A loop’s control variable serves the same role as the shrinking
argument of the recursive case.
Loop (termination condition: i <= 10):

sum = 0;
i = 1;
while (i <= 10)
{
    sum += i;
    i++;
}

Recursion (base case: n == 0):

int Factorial(int n)
{
    if (n == 0)                   /* base case */
        return 1;
    else                          /* n > 0, recursive case */
        return n * Factorial(n-1);
}

Loop control and recursive case


both move toward termination condition

Recursive functions
Function call Factorial(5) proceeds as below:

5! = 5 * 4!    returns 120
4! = 4 * 3!    returns  24
3! = 3 * 2!    returns   6
2! = 2 * 1!    returns   2
1!             returns   1
Recursive functions

• Consider the problem of computing the sum of all the numbers


between 1 and any positive integer N

• This problem can be recursively defined as:

Σ_{i=1}^{N} i  =  N + Σ_{i=1}^{N-1} i  =  N + (N-1) + Σ_{i=1}^{N-2} i

BM267 13
Recursive functions

int sum(int n)
{
    if (n > 1)
        return (n + sum(n - 1));
    else
        return 1;
}

Call trace for sum(3):

main calls sum(3)        result = 6
  sum(3) calls sum(2)    result = 3
    sum(2) calls sum(1)  result = 1
BM267 14
Recursive functions

• Note that just because we can use recursion to solve a problem,


doesn't mean we should

• For instance, we usually would not use recursion to solve the


sum of 1 to N problem, because the iterative version is easier to
understand

• However, for some problems, recursion provides an elegant


solution, often cleaner than an iterative version

• You must carefully decide whether recursion is the correct


technique for any problem

BM267 15
Recursive functions
•Fibonacci numbers
–0, 1, 1, 2, 3, 5, 8...
–Each number sum of the previous two
fib( n ) = fib( n - 1 ) + fib( n - 2 )
–Base case: fib(0) = 0 and fib(1) = 1

Fall 2005 BM267 16


Recursive functions
int fib(int n){
if( n == 0)
return 0;
else if ( n == 1 )
return 1;
else
return ( fib(n-2) + fib(n-1) );
}

Fall 2005 BM267 17


Recursive functions
int main(){
int i, array[32];
for( i = 0 ; i<32; i++) {
if( i == 0 )
array[i]= 0;
else if( i == 1)
array[i] = 1;
else
array[i] = array[i-2] + array[i-1];
}
cout<<array[31];
return 0;
}

Fall 2005 BM267 18


Divide and conquer
• An effective approach to designing fast algorithms in sequential
computation is the method known as divide and conquer.
• The problem to be solved is broken into a number of
subproblems (typically two) of the same form as the original
problem; this is the divide step.
• The subproblems are then solved independently, usually
recursively; this is the conquer step.
• Finally, the solutions to the subproblems are combined to
provide the answer to the original problem.

Fall 2005 BM267 19


Divide and conquer

•The sorting algorithms Mergesort and Quicksort are both based


on the divide-and-conquer approach.
•Example: Let us consider the task of finding the maximum (or
minimum) of N items stored in an array.
•Iterative Findmax:
for (max = a[0], i = 1; i < N; i++)
    if (a[i] > max)
        max = a[i];

T(n) = n-1

Fall 2005 BM267 20


Divide and conquer
Divide and Conquer solution of finding max of N integers.
int max(int a[], int l, int r){
int u, v;
int m = (l+r)/2;
if (l == r)
return a[l];
u = max(a, l, m);
v = max(a, m+1, r);
if (u > v)
return u;
else
return v;
}

Fall 2005 BM267 21


Divide and conquer
• Assume that N = 2^k
• T(n) = 2T(n/2) + 1
1 5 2 6 9 3 4 8

1 5 2 6 9 3 4 8

1 5 2 6 9 3 4 8

1 5 2 6 9 3 4 8

Fall 2005 BM267 22


Divide and conquer

1 5 2 6 9 3 4 8

5 6 9 8

6 9

9 max

Fall 2005 BM267 23


Divide and conquer
T(n) = 2T(n/2) + 1                 where T(n/2) = 2T(n/4) + 1
T(n) = 2(2T(n/4) + 1) + 1
     = 2^2 T(n/4) + 2 + 1          where T(n/4) = 2T(n/8) + 1

T(n) = 2^2 (2T(n/8) + 1) + 2 + 1
     = 2^3 T(n/8) + 2^2 + 2 + 1
     = 2^3 T(n/2^3) + 2^2 + 2 + 1
     ...
     = 2^k T(n/2^k) + 2^(k-1) + 2^(k-2) + ... + 2^2 + 2 + 1

     = 2^k T(n/2^k) + Σ_{i=0}^{k-1} 2^i

Fall 2005 BM267 24


Divide and conquer
From the base case where T(1) = 0:
n/2^k = 1  ⇒  n = 2^k  ⇒  k = log2 n

and we know that  Σ_{i=0}^{k-1} x^i = (x^k - 1) / (x - 1)

T(n) = 2^k T(n/2^k) + Σ_{i=0}^{k-1} 2^i = 2^k T(n/2^k) + (2^k - 1) / (2 - 1)

T(n) = 2^k * 0 + 2^k - 1 = 2^(log2 n) - 1 = n - 1


Fall 2005 BM267 25
BM267 - Introduction to Data
Structures

5. Trees

BM267 1
Objectives

Learn about the definitions, characteristics and


implementation details for:
• General trees
• Rooted trees
• Binary and N-ary trees
• Tree operations, tree traversal algorithms

BM267 2
Tree Structures

One of the most frequently used ordering methods of data.


Many logical organizations of everyday data exhibit tree
structures
Promotional tournaments
Organizational charts
Hierarchical organization of entities
Parsing natural and computer languages
Game trees
Decision trees
...

BM267 3
Trees

A real tree

Computer Scientist's tree

BM267 4
Trees

A general tree is a nonempty collection of vertices


(nodes) and connections between nodes (edges) that
satisfy certain rules. These rules impose a hierarchical
structure on the nodes with a parent-child relation.

There is only one connecting path between any two


nodes.

BM267 5
Trees
Root
14

17 11

9 53 4

19 7
13

BM267 6
Trees

          -
     +         *
   9   53    4   7

= ( 9 + 53 ) - ( 4 * 7 )

BM267 7
Rooted Trees

• There is a unique node called root node. Node “14” is the root
of the tree.
• The parent of a node is the node linked above it. Every non-
root node has a unique parent. Node 17 is the parent of 9 and
53.
• The nodes whose parent is node n are n’s children. The
children of Node 17 are 9 and 53.
• Nodes without children are leaves. Nodes 13, 53, 19, and 7 are
leaves.
• Two nodes are siblings if they have the same parent. 9 and 53 are
siblings of each other.

BM267 8
Rooted Trees

Root

Root’s
children

Leaves

BM267 9
Rooted Trees
• An empty tree has no nodes
• The descendants of a node are its children and the
descendents of its children
• The ancestors of a node are its parent (if any) and the
ancestors of its parent
• An ordered tree is one in which the order of the
children is important; an unordered tree is one in
which the ordering of the children is not important.
• The branching factor of a node is the number of
children it has.

BM267 10
Rooted Trees

The depth or level of a node n is the number


of edges on a path from the root to n.
0
The depth of the root is 0.
Root is at level 0.
1 1 1 1

2 2 2 2 2 2 2

3 3

BM267 11
Rooted Trees
The height of a node n is the number of
edges on the longest path from n to a
descendent leaf.
3

1 1 2 1

0 0 0 1 0 0 0

0 0 The height of each leaf is 0.


BM267 12
Binary Trees

A binary tree is a special rooted tree in which


every node has at most 2 children.

Children are ordered: every child is explicitly


designated as left or right child.
BM267 13
Binary Trees
• The i-th level of a binary tree contains all nodes at
depth i.
• The height of a binary tree is the height of its root.
• The i-th level of a binary tree contains at most 2^i
nodes.
• A binary tree of height h contains at most 2^(h+1) - 1
nodes.
• A binary tree of height h has at most 2^h leaves.

BM267 14
Binary Trees

Level 0: 2^0 nodes
Level 1: 2^1 nodes
Level 2: 2^2 nodes
Level 3: 2^3 nodes

Total nodes = 2^h + 2^(h-1) + ... + 2^2 + 2^1 + 2^0
            = (2^(h+1) - 1) / (2 - 1) = 2^(h+1) - 1
BM267 15
Binary Trees
A binary tree is complete (perfect) if:
• Every node has either zero or two children (every
internal node has two children).
• Every leaf is at the same level.

BM267 16
Binary Trees

A binary tree is almost complete if:
• All levels of the tree are full, except possibly the
last one.
• The nodes on the last level are as far left as possible.

BM267 17
Binary Trees

• An almost complete binary tree of height h
contains between 2^h and 2^(h+1) - 1 nodes.
• An almost complete binary tree of size n has
height h = floor(log2 n):
2^h <= n <= 2^(h+1) - 1
h <= log2 n < h + 1

BM267 18
Binary Trees

parent(i) = (i - 1) / 2   (integer division)
left(i)   = 2i + 1
right(i)  = 2i + 2

             0:a
       1:l         2:g
    3:o   4:r   5:i   6:t
  7:h 8:m 9:s

a l g o r i t h m s
0 1 2 3 4 5 6 7 8 9
BM267 19
Binary Trees
We can also represent incomplete binary trees in an array, leaving
the slots of missing nodes empty (here A is the root at index 0,
B its right child at index 2, and C is B's right child at index 6):

A _ B _ _ _ C
0 1 2 3 4 5 6
BM267 20
Binary Trees
Linked representations of binary trees:

struct TreeNode {
    char data;
    struct TreeNode *left;
    struct TreeNode *right;
};

The example tree: root A, with children B and G; B's children are
E and K, and G's child is M.
BM267 21
Binary Trees
Common Binary Tree Operations
– Determine its height
– Determine the number of elements in it
– Display the binary tree on the screen.
Returns the height of the tree.
int height(link h)
{
    int u, v;
    if (h == NULL)
        return -1;
    u = height(h->l);
    v = height(h->r);
    if (u > v)
        return u + 1;
    else
        return v + 1;
}
BM267 22
Binary Trees
Returns the number of elements in the tree.
int count(link h){
if (h == NULL)
return 0;
return count(h->l) + count(h->r) + 1;
}

BM267 23
Tree Traversals

• To traverse (or walk) the binary tree is to visit each


node in the binary tree exactly once
• Tree traversals are naturally recursive.
• Since a binary tree has two dimensions, there are
two possible ways to traverse the binary tree
• Depth-first - visit nodes on the same path first
(start from top, go as far down as possible)
• Breadth-first - visit nodes at the same level first
(start from left, go as far right as possible)

BM267 24
Depth-first Traversals (binary trees)

• Since a binary tree has three “parts,” there are three


possible ways to traverse the binary tree (from left to
right) :
• Pre-order: the node is visited first, then the
children (left to right)
• In-order: the left child is visited, then the
node, then the right child
• Post-order: the node is visited after the
children
BM267 25
Pre-order Traversal

            h
       j         f
     d    b    a    e
   i  c   g

Pre-order: h j d i c b g f a e   (each node is visited before its children)

BM267 26
In-order Traversal

            h
       j         f
     d    b    a    e
   i  c   g

In-order: i d c j g b h a f e

BM267 27
Post-order Traversal

            h
       j         f
     d    b    a    e
   i  c   g

Post-order: i c d g b j a e f h

BM267 28
Breadth-first Traversal

            h
       j         f
     d    b    a    e
   i  c   g

Breadth-first: h j f d b a e i c g

BM267 29
Tree Traversal - Preorder
Prints the nodes’ data in Preorder
void traverse(LINK h)
{
    if (h)
    {
        printf("%d", h->data);   /* visit the node */
        traverse(h->left);
        traverse(h->right);
    }
}

BM267 30
Tree Traversal - Inorder
Prints the nodes’ data in Inorder
void traverse(LINK h)
{
    if (h)
    {
        traverse(h->left);
        printf("%d", h->data);   /* visit the node */
        traverse(h->right);
    }
}

BM267 31
Tree Traversal - Postorder
Prints the nodes’ data in Postorder
void traverse(LINK h)
{
    if (h)
    {
        traverse(h->left);
        traverse(h->right);
        printf("%d", h->data);   /* visit the node */
    }
}

BM267 32
Implementing (general) rooted trees

BM267 33
Example: Variable-length codes

ASCII uses 8-bits for coding letters (fixed-length code).


To minimize the space requirements, we can use an
alternate coding scheme (variable-length code):
• Let the most frequently used letters be represented
with shorter bit sequences (depends on the
language being coded).
• Let the least frequently used letters be represented
with longer bit sequences.

BM267 34
Example: Variable-length codes
Requirements: For each possible coded sequence, the
sequence must be
• uniquely decodeable.
• instantaneously decodeable (without the need for
further computations or table look-ups).

This philosophy was employed in Morse code.

An optimal variable-length prefix code of this kind is known as a
Huffman code.

BM267 35
Example: Variable-length codes
Let our alphabet consist of 5 symbols, A, B, C, D, E.

Symbol Freq.(%)
A 40
B 25 Consider the code for
C 15 ABCDE.
D 15
E 5

BM267 36
Example: Variable-length codes
Assume the following
codes were chosen:
Symbol Code Consider the coding for ABCDE.
A 1
The code will be: 1000111011
B 00
C 01 Can you decode it?
D 11 1 . 00 . 01 . 11 ? 011
E 011
Is 011 = 011 (E) or 01.1 (CA) ?

This code is not


uniquely decodeable.

BM267 37
Example: Variable-length codes
Assume the following
codes were chosen: Consider the coding for ABCDE.
Symbol Code
The code will be: 0010110111111
A 0
B 01 Can you decode it?
C 011
0.01 ? 0110111111
D 0111
Is 011 = 01 (B) . 1 or 011 (C) ?
E 111

This code is not instantaneously
decodeable: you have to check the
next digit.
BM267 38
Example: Variable-length codes

Draw the code tree. Start from the root and follow
the edges until a code word is found.
Repeat until decoding is completed.

This code is not instantaneously decodeable: you have to check the
next digit (compare the next digit with the next edge on the tree).
However, it is uniquely decodeable. (In this code tree, the code
words A = 0, B = 01, and C = 011 sit on internal nodes along the
path to D = 0111, while E = 111 lies on a separate branch.)

BM267 39
Huffman Coding

5 15 15 25 40 Initial State
E D C B A

15 20 25 40
C B A

5 15
E D

BM267 40
Huffman Coding

25 35 40
B A
15 20

5 15
E D

BM267 41
Huffman Coding
40 60
A

25 35

B
15 20

5 15
E D

BM267 42
Huffman Coding
0 100 1

40 60
0 1
A
25 35
1
0
B
15 20

C 0 1

5 15
E D
BM267 43
Huffman Coding
Symbol Code
A 0 Consider the coding for ABCDE.
B 10
The code will be: 01011011111110
C 110
D 1111 Can you decode it?
E 1110
0 . 10 . 110. 1111 . 1110

Analysis: With the given frequencies, the expected number of bits
per character is:
1 x 0.40 + 2 x 0.25 + 3 x 0.15 + 4 x 0.15 + 4 x 0.05 = 2.15

BM267 44
BM267 - Introduction to Data
Structures

6. Elementary Sorting
Methods

BM267 1
Objectives

Learn about
• Elementary sorting algorithms
• Selection sort
• Insertion sort
• Bubble sort
• Analysis of each sorting method

BM267 2
Sorting

• Sorting takes an unordered collection and makes it


an ordered one.
1 2 3 4 5 6

77 42 35 12 101 5

1 2 3 4 5 6
5 12 35 42 77 101

BM267 3
Choosing a sorting algorithm

• Elementary sorting algorithms are usually slower, but


easy to implement.
• The more complex algorithms are not always the
preferred ones. (Needs more attention)
• Elementary algorithms are generally more appropriate
in the following situations:
• Less than a few hundred values to be sorted
• The values will be sorted just once
• Special cases such as:
• the input data are "almost sorted"
• there are many equal keys
BM267 4
Internal / external sorting
• If the file to be sorted can fit into computer’s
memory, then sorting method is called internal
sorting.

• Sorting files from tape or disk is called external


sorting.

• Partially sorted blocks need to be combined or


merged in some manner to eventually sort the entire
list

BM267 5
Element stability
• A sorting method is said to be stable if it preserves the relative
order of items with duplicated keys in the file. Items with
identical keys should appear in the same order as in the
original input.

• For instance, consider sorting a list of student records


alphabetically by name, and then sorting the list again, but this
time by letter grade in a particular course. If the sorting
algorithm is stable, then all the students who got "A" will be
listed alphabetically.

• Stability is a difficult property to achieve if we also want our


sorting algorithm to be efficient.

BM267 6
Element stability
Adams A Adams A Adams A
Black B Smith A Smith A
Brown D Washington B Black B
Jackson B Jackson B Jackson B
Jones D Black B Washington B
Smith A White C White C
Thompson D Wilson C Wilson C
Washington B Thompson D Brown D
White C Brown D Jones D
Wilson C Jones D Thompson D

BM267 7
Indirect sorting

• Many sorting algorithms move and interchange


records in memory several times during the process
of sorting.

• For large records, or when the data set is large, this is


inefficient.

• With "indirect sorting“, the indices (or pointers) of


the records are sorted, rather than the records
themselves.

BM267 8
Selection sort
• Selection sort works as follows:
• Find the smallest element in the array, and exchange it
with the element in the first position.
• Then, find second smallest element and exchange it with
the element in the second position.
• Continue in this way until the array is sorted.

i
25 50 10 95 75 30 70 55 60 80

10 50 25 95 75 30 70 55 60 80

BM267 9
Selection sort
i
10 50 25 95 75 30 70 55 60 80

10 25 50 95 75 30 70 55 60 80

10 25 30 95 75 50 70 55 60 80

10 25 30 50 75 95 70 55 60 80

10 25 30 50 55 95 70 75 60 80

10 25 30 50 55 60 70 75 95 80
BM267 10
Selection sort

10 25 30 50 55 60 70 75 95 80

10 25 30 50 55 60 70 75 95 80

10 25 30 50 55 60 70 75 80 95

BM267 11
Selection sort

void SelectionSort(int A[], int N)
{
    int i, j, min;
    for (i = 0; i < N-1; i++)
    {
        /* find the smallest among A[i]...A[N-1], place it in A[i] */
        min = i;
        for (j = i+1; j < N; j++)
            if (A[j] < A[min])
                min = j;
        swap(A[i], A[min]);
    }
}

BM267 12
Selection sort- Is selection sort stable?

25 10 75 25 95 15

10 25 75 25 95 15

10 15 75 25 95 25

10 15 25 75 95 25

10 15 25 25 95 75

10 15 25 25 75 95
BM267 13
Selection sort - analysis
Iteration 1:
• Find smallest value in a list of n values: n-1 comparisons
• Exchange values and move marker
Iteration 2:
• Find smallest value in a list of n-1 numbers: n-2 comparisons
• Exchange values and move marker

Iteration n-2:
• Find smallest value in a list of 2 numbers: 1 comparison
• Exchange values and move marker
Total: (n-1) + (n-2) + …. + 2 + 1 = n(n-1)/2

BM267 14
Selection sort - analysis
Space efficiency:
• No extra space used (except for a few
variables)

Time efficiency:
• The best-case and worst-case are same. All
input sequences need same number of
comparisons.
• the amount of work is the same:
T(n) = n(n-1)/2

BM267 15
Insertion sort
• Insert the first element of the unsorted array into
already sorted portion of the array by shifting all
larger elements to the right.

• Initially the sorted portion consists of the first


element.

• The sorted portion grows by one after every pass.

BM267 16
Insertion sort
i
25 50 10 95 75 30 70 55 60 80

25 50 10 95 75 30 70 55 60 80

10 25 50 95 75 30 70 55 60 80

10 25 50 95 75 30 70 55 60 80

10 25 50 75 95 30 70 55 60 80

10 25 30 50 75 95 70 55 60 80
BM267 17
Insertion sort- Worst Case
i
90 80 70 60 50 40 30 20 10 # of Comparison = 1

i
80 90 70 60 50 40 30 20 10 # of Comparison = 2

i
70 80 90 60 50 40 30 20 10 # of Comparison = 3
i
60 70 80 90 50 40 30 20 10 # of Comparison = 4
i
50 60 70 80 90 40 30 20 10 # of Comparison = 5

BM267 18
Insertion sort- Worst Case
i
40 50 60 70 80 90 30 20 10 # of Comparison = 6

i
30 40 50 60 70 80 90 20 10 # of Comparison = 7

i
20 30 40 50 60 70 80 90 10 # of Comparison = 8

10 20 30 40 50 60 70 80 90

BM267 19
Insertion sort- Worst Case
• For size n, total # of comparisons:
T(n)worst = (n-1) + (n-2) + (n-3) + ... + 2 + 1 = n(n-1)/2
T(n)avg ≈ n^2/4

BM267 20
Insertion sort

void insertionSort(int A[], int N)
{
    int i, j, next;

    for (i = 1; i < N; i++)
    {
        next = A[i];
        for (j = i; j > 0 && next < A[j-1]; j--)
            A[j] = A[j-1];
        A[j] = next;
    }
}

BM267 21
Comparing insertion sort / selection sort

Selection sort:
• scans the unsorted portion of the array
• the amount of work does not depend on the type of input
• inserts each element into its final position
• minimal number of element exchanges

Insertion sort:
• scans the sorted portion of the array
• runs faster for already-sorted arrays
• after an insertion, an element may still be moved later
• a lot of shifts

BM267 22
Bubble sort
• Bubble sort compares adjacent elements of the list and
exchange them if they are out of order.
• After the first pass, we put the largest element to the
last position in the list.
• The next pass puts the second largest element in its
position( just before the last position).

First pass 25 50 10 95 75 30 70 55 60 80

25 10 50 95 75 30 55 70 60 80

25 10 50 95 75 30 55 70 60 80

BM267 23
Bubble sort

25 10 50 75 95 30 55 70 60 80

25 10 50 75 30 95 55 70 60 80

25 10 50 75 30 55 95 70 60 80

25 10 50 75 30 55 70 95 60 80

25 10 50 75 30 55 70 60 95 80

BM267 24
Bubble sort
25 10 50 75 30 55 70 60 80 95

void Bubblesort(int A[], int N)
{
    int i, j;
    for (i = 0; i < N-1; i++)
        for (j = 0; j < N-1-i; j++)
            if (A[j+1] < A[j])
                Swap(A[j], A[j+1]);
}

• For size n, total # of comparisons:
T(n) = (n-1) + (n-2) + (n-3) + ... + 2 + 1 = n(n-1)/2

BM267 25
BM267 - Introduction to Data
Structures

7. Quicksort

Bm 267 1
Quicksort

• Quicksort uses a divide-and-conquer strategy


– A recursive approach
– The original problem partitioned into simpler
subproblems.
– Each sub problem considered independently.
• Subdivision continues until sub problems are
simple enough to be solved directly.

Bm 267 2
Quicksort - Example 1

How to partition an array A[p,r]:


2 8 7 1 3 5 4
p ... r

Choose some element called a pivot


(Usually the rightmost or leftmost element)

2 8 7 1 3 5 4
p ... r

Bm 267 3
Quicksort - Example 1

The array will have three sections, plus the pivot


element

p ... r

[ values <= pivot | values > pivot | untested | pivot ]

i will point to the high end of the 'smaller' sublist
j will point to the high end of the 'larger' sublist
Bm 267 4
Quicksort - Example 1
Perform a sequence of exchanges so that
All elements that are less than pivot go to left and
All elements that are greater than the pivot go to right.

i,j i j
2 8 7 1 3 5 4 2 8 7 1 3 5 4
p ... r p ... r

i j i j
2 8 7 1 3 5 4 2 1 7 8 3 5 4
p ... r p r

i j i j
2 8 7 1 3 5 4 2 1 3 8 7 5 4
p ... r p ... r
Bm 267 5
Quicksort - Example 1

i j
2 1 3 8 7 5 4
p .. r
(exchange element i+1
with the pivot)
i
2 1 3 4 7 5 8
p .. r

• This operation divides the array into two smaller


sub arrays,
• Each of which may then be sorted independently in
the same way.
Bm 267 6
Quicksort

Quicksort (A[p..q])
If the array has 0 or 1 elements,
then return. // the array is sorted
else do:
Pick an element in the array to use as the pivot.
Split the remaining elements into two disjoint groups:
– "Smaller" elements not greater than the pivot, A[p...m-1]
– "Larger" elements greater than pivot, A[m+1… r]

Return the array rearranged as:


Quicksort(A[p...m-1]),
pivot,
Quicksort(A[m+1… r]).
Bm 267 7
Quicksort- Example 2
Here is a slightly different partitioning algorithm:

• Select, arbitrarily, the first element, 75, as pivot.


• Search from right for the first element <= 75 (which is 60; j stops there)
• Search from left for the first element > 75 (which is 88; i stops there)
75 70 65 88 98 78 99 93 55 59 81 60
• Swap these two elements, and then repeat this process
i j
75 70 65 60 98 78 99 93 55 59 81 88
Bm 267 8
Quicksort- Example 2

i j
75 70 65 60 98 78 99 93 55 59 81 88

75 70 65 60 59 78 99 93 55 98 81 88

75 70 65 60 59 55 99 93 78 98 81 88
When done, exchange the rightmost element in group
"Smaller" with the pivot

55 70 65 60 59 75 99 93 78 98 81 88
75 is now placed appropriately.
Need to sort sublists on either side of 75.
Bm 267 9
Quicksort
int partition(Item a[], int l, int r);
void quicksort(Item a[], int l, int r)
{ int m;
if (r <= l) return;
m = partition(a, l, r);
quicksort(a, l, m-1);
quicksort(a, m+1, r);
}
int partition(Item a[], int l, int r)
{
    int i = l-1, j = r;
    Item v = a[r];
    for (;;) {
        while (less(a[++i], v)) ;
        while (less(v, a[--j]))
            if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
    }
    exch(a[i], a[r]);
    return i;
}
Bm 267 10
Quicksort - Analysis

Best Case
– If the pivot results in sub arrays of approximately
the same size.
– T(n) = 2T(n/2) + n - 1
– T(n) ≈ n log2 n

Bm 267 11
Quicksort - Analysis

Best case O(n log2n)


• We cut the array size in half each time
• So the depth of the recursion in log2n
• O(log2n) * O(n) = O(n log2n)
• Hence in the best and average cases, quicksort
has time complexity O(n log2n)

Bm 267 12
Quicksort - Analysis

O(n2) worst-case
• List already ordered (either way)
• Then the pivot element is the largest or smallest
element: one of the sublists is almost always empty.
• Partitioning always divides the size n array into
these three parts:
• A length one part, containing the pivot itself
• A length zero part, and
• A length n-1 part, containing everything else

Bm 267 13
Quicksort - Analysis

Worst-case P = Pivot element

• We don’t recur on the zero-length part


• Recurring on the length n-1 part requires (in the
worst case) recurring to depth n-1
Bm 267 14
Quicksort - Analysis

• If the array is already sorted, Quicksort is


terrible: O(n2)
• However, Quicksort is on the average O(n
log2n)

• The constants are so good that Quicksort is


generally the fastest algorithm known

• Most real-world sorting is done by Quicksort

Bm 267 15
Quicksort - Possible Improvements

• Almost anything you can try to “improve”


Quicksort will actually slow it down
• One good idea is to switch to a different sorting
method when the subarrays get small (say, 10 or 12)
– Quicksort has too much overhead for small array sizes
• For large arrays, it might be a good idea to check
beforehand if the array is already sorted

Bm 267 16
Quicksort - Possible Improvements
• Often the list to be sorted is already partially ordered.
• An arbitrary pivot gives a poor partition for nearly sorted lists
• In these cases, virtually all the elements either go into the
group "Smaller" or to the "Larger", all through the recursive
calls.
• Quicksort takes quadratic time to do essentially nothing at all.
• There are better methods for selecting the pivot, such as the
median-of-three rule:
Select the median of the first, middle, and last elements
in each sublist as the pivot.
• Median-of-three rule will select a pivot closer to the middle of
the sublist than will the “first-element” rule.

Bm 267 17
Quicksort - Possible Improvements
#define M 10
void quicksort(Item a[], int l, int r)
{ int i;
if (r-l <= M) return;
exch(a[(l+r)/2], a[r-1]);
compexch(a[l], a[r-1]);
compexch(a[l], a[r]);
compexch(a[r-1], a[r]);
i = partition(a, l+1, r-1);
quicksort(a, l, i-1);
quicksort(a, i+1, r);
}
Bm 267 18
Quicksort - Possible Improvements
void sort(Item a[], int l, int r)
{
quicksort(a, l, r);
insertion(a, l, r);
}

Bm 267 19
BM267 - Introduction to Data
Structures

8. Mergesort

BLM 267 1
Mergesort
• Also a divide and conquer algorithm.
• Let’s see how the mergesort works with a
small array.
The first step divides the array into two equally
sized groups.

9 12 31 25 5 8 20 2 3 6

After dividing we have two smaller arrays with


five elements in each array.

BLM 267 2
Mergesort
• Assume that we sort these arrays somehow
with recursive calls.

5 9 12 25 31 2 3 6 8 20

• We have to merge these arrays to get the


solution.

2 3 5 6 8 9 12 20 25 31

BLM 267 3
Mergesort
• As shown, the mergesort algorithm divides the
array its midpoint, sorts the two half-arrays by
recursive call.
• The stopping case for mergesort is when the array
to be sorted consists of one element.
• If there is more than one element, mergesort
carries out these steps:
 Calculate the boundaries of the two halves of the array:
• The last element of the first half is at m = (l + r) / 2
• The first element of the second half is at m + 1
 Use recursive calls to sort the two halves. (l, m) and
(m+1, r)
 Finally, activate merge function to combine the two
sorted halves.

BLM 267 4
Mergesort
void mergesort(Item a[], int l, int r){
int m = ( l+r )/2;
if (r <= l) return;
mergesort(a, l, m);
mergesort(a, m+1, r);
merge(a, l, m, r);
}

BLM 267 5
Merge Function
• Let us suppose that we have two ordered arrays
a[0]...a[N-1] and b[0]...b[M-1] of
integers that we wish to merge into a third array
c[0]...c[N+M-1].
• The obvious strategy is to repeatedly choose the
smallest remaining element from a and b.
a 5 9 12 25 31 b 2 3 6 8 20

c 2 ? ? ? ? ? ? ? ? ?

BLM 267 6
Merge Function

a 5 9 12 25 31 b 2 3 6 8 20

c 2 3 ? ? ? ? ? ? ? ?

a 5 9 12 25 31 b 2 3 6 8 20

c 2 ? ? ? ? ? ? ? ? ?
BLM 267 7
Merge Function

a 5 9 12 25 31 b 2 3 6 8 20

c 2 3 ? ? ? ? ? ? ? ?

a 5 9 12 25 31 b 2 3 6 8 20

c 2 3 5 ? ? ? ? ? ? ?
BLM 267 8
Merge Function

a 5 9 12 25 31 b 2 3 6 8 20

2 3 5 6 ? ? ? ? ? ?

Finally

a 5 9 12 25 31 b 2 3 6 8 20

2 3 5 6 8 9 12 20 25 31
BLM 267 9
Merge Function
void mergeAB(Item c[], Item a[], int N, Item b[], int M)
{
    int i, j, k;
    for (i = 0, j = 0, k = 0; k < N+M; k++)
    {
        if (i == N) { c[k] = b[j++]; continue; }
        if (j == M) { c[k] = a[i++]; continue; }
        c[k] = (a[i] < b[j]) ? a[i++] : b[j++];
    }
}

BLM 267 10
Mergesort
8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

8 3 2 9 7 1 5 4

3 8 2 9 1 7 4 5

2 3 8 9 1 4 5 7

1 2 3 4 5 7 8 9
Analysis of Mergesort
• Mergesort requires about n log2 n comparisons to
sort n elements.
– T(n) = 2T(n/2) + n - 1 (for simplicity assume n = 2^k)
• Mergesort uses extra space proportional to n.
• Mergesort is stable, if the underlying merge is stable.

BLM 267 12
BM267 - Introduction to Data
Structures

9. Heapsort

BLM 267 1
(Binary) Heap Structure

The heap data structure is an array organized as an
almost complete binary tree (which is always
balanced).
The tree is completely filled, except possibly for the
right side of the lowest level.
Which means, if N is the number of heap elements, the
first N array elements are always full.

BLM 267 2
(Binary) Heap Structure

Structural Properties:
• The root of the tree is located in the first array element, A[1].
• The left subtree of node A[i] is rooted at A[2i].
• The right subtree of node A[i] is rooted at A[2i+1].

(Note the 1-based notation for array indices)


BLM 267 3
(Binary) Heap Structure

More structural properties:
• Left subtrees are rooted at even-numbered array elements.
• Right subtrees are rooted at odd-numbered array elements.
• The parent of node i is node A[floor(i/2)].
• A heap with N elements has height floor(log2 N).

(floor(x) denotes truncation to integer.)

BLM 267 4
(Binary) Heap Structure
The tree nodes are located in the array in a top-down, left-to-
right traversal order.
• A[1] contains the largest element;
• Elements in A[2] .. A[N] are not sorted. 22

12 14

9 11 13 5

4 2 8

22 12 14 9 11 13 5 4 2 8
1 2 3 4 5 6 7 8 9 10 11 12 ..
BLM 267 5
(Binary) Heap Structure

Content Properties: A heap must satisfy either of:
• Max-Heap property: A[parent(i)] >= A[i]
  • The value at node i cannot be greater than its parent's.
  • Which means that the root has the largest value
    currently in the heap.
• Min-Heap property: A[parent(i)] <= A[i]
  • The value at node i cannot be smaller than its parent's.
  • Which means that the root has the minimum
    value currently in the heap.

BLM 267 6
(Binary) Heap Structure

Some terminology:
• Height of a node: the number of segments of a downward
path to the farthest leaf node.
• Height of the tree: the height of the root node.

Height 2 Height 1

Height 1 Height 0

Height 0

BLM 267 7
Heap Operations

MaxHeapInsert( ) Adds an item to the heap


MaxHeapify( ) Maintains the heap property.
BuildMaxHeap( ) Converts an unordered array into a heap
HeapSort( ) Sorts the array in place
HeapExtractMax( ) Removes the item at the root

BLM 267 8
MaxHeapInsert( ) - Insert an item into the heap

Given: A heap of size M, and a new element.


Operation: Insert the new element into next available slot.

14 12 11 9 5 13
1 2 3 4 5 6 7 8 9 10 11 12 ..

14

12 11

9 5 13 Next empty tree node


BLM 267 9
MaxHeapInsert( )

Given: A heap of size N, and a new element.


Operation:
• Insert new element into next available slot.
• Bubble up until the heap property is established

14 12 11 9 5 13
1 2 3 4 5 6 7 8 9 10 11 12 ..

After bubbling up, the heap (now of size N+1) is:

        14
    12      13
   9  5   11

O(log2 N) operations
BLM 267 10
MaxHeapify( ) - Combine heaps with the parent
Given: Node i, whose children at nodes 2i and 2i+1
already heapified.
Operation: Let the value of A[i] float down until the node
A[i] becomes a heap.

12 11

9 2 6 7

13 8 1 5
4 12 11 9 2 6 7 13 8 1 5
1 2 3 4 5 6 7 8 9 10 11
BLM 267 11
MaxHeapify( ) - Combine heaps with the parent

If the heap property is not satisfied, exchange the child node


with the largest value and A[i].

12 11

9 2 6 7

13 8 1 5
4 12 11 9 2 6 7 13 8 1 5
1 2 3 4 5 6 7 8 9 10 11
BLM 267 12
MaxHeapify( ) - Combine heaps with the parent

If the heap property is not satisfied at the selected child node,


repeat the process.

4
O(log N) operations.
12 11

9 5 6 7

13 8 1 2
4 12 11 9 2 6 7 13 8 1 5
1 2 3 4 5 6 7 8 9 10 11
BLM 267 13
BuildMaxHeap( ) - Convert an array into a heap
Given: An unordered array of size N.
Operation:
• Convert each node in the tree into a heap, bottom-up.
• Nodes A[N/2+1]..A[N] are already heaps of size 1.
• Convert nodes A[N/2] down to A[1] into heaps, using MaxHeapify(i)
4

12 11

O(N) MaxHeapify calls


9 2 6 7

13 8 1 5
4 12 11 9 2 6 7 13 8 1 5
1 2 3 4 5 6 7 8 9 10 11
HeapExtractMax( ) - Remove the item at the root
Given: A heap of size N,
Operation:
• Exchange root with rightmost leaf.

13

12 11

9 5 6 7

3 8 1 2
13 12 11 9 5 6 7 3 8 1 2
1 2 3 4 5 6 7 8 9 10 11
BLM 267 15
HeapExtractMax( ) - Remove the item at the root
Let the value of A[1] float down until the heap property is
established.
Proceed in the direction of the child node with the larger value.

2 12

12 11 2 11

9 5 6 7 9 5 6 7

3 8 1 13 3 8 1 13

Not in heap
any longer

BLM 267 16
HeapExtractMax( ) - Remove the item at the root
Proceed in the direction of the child node with the larger value.

12 12

9 11 9 11

2 5 6 7 8 5 6 7

3 8 1 13 3 2 1 13

Heap property satisfied.

O(log N) operations.
BLM 267 17
Heapsort( )

Illustrate the operation of Heapsort(A) on the array


whose element values are:
27 17 10 16 13 9 1 5 7 12 4 8 3 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14

Note: Since there are 14 nodes, and lg 8 = 3 < lg 14 < lg 16 = 4,

• there are 4 levels (0, 1, 2 and 3) in the tree,
• the height of the tree is 3.
27

17 10

16 13 9 1

5 7 12 4 8 3 0
BLM 267 18
Heapsort
exchange A[1] ↔ A[i]: the root 27 is exchanged with the last leaf 0;
27 is no longer in the heap.
Max-Heapify(A,1): 0 floats down from the root.
(Tree diagrams: trace of the first exchange and re-heapify step.)
BLM 267 19
Heapsort
(Tree diagrams: 0 settles into place, restoring the heap 17 16 10 7 13 9 1 5 0 12 4 8 3; then A[1] = 17 is exchanged with A[i] = 3, and 17 leaves the heap.)
BLM 267 20
Heapsort
Max-Heapify(A,1)
(Tree diagrams: 3 floats down from the root, exchanging with 16 and then with 13.)
BLM 267 21
Heapsort
(Tree diagram: the heap 16 13 10 7 12 9 1 5 0 3 4 8, with 17 and 27 already in place.)
The heap property is satisfied for the first N-2 elements.
The last two elements are in their (sorted) places.
Perform N-1 extract-max operations during the sort.
O(N log N). No extra storage.
BLM 267 22
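Putting the pieces together, the whole sort can be sketched as a self-contained C function. The slides use 1-based indices; this sketch is 0-based and uses an iterative sift-down:

```c
#include <assert.h>

/* Let the value at index i float down in a heap of n elements. */
static void sift_down(int a[], int n, int i)
{
    for (;;) {
        int largest = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) break;
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        i = largest;   /* continue floating down */
    }
}

void heap_sort(int a[], int n)
{
    /* BuildMaxHeap: convert nodes bottom-up */
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);

    /* N-1 extract-max operations: move the root to its sorted place */
    for (int i = n - 1; i > 0; i--) {
        int tmp = a[0]; a[0] = a[i]; a[i] = tmp;
        sift_down(a, i, 0);
    }
}
```

Running it on the slide's example array 27 17 10 16 13 9 1 5 7 12 4 8 3 0 produces the values in increasing order, in place.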
Heapsort - Analysis

• Here’s how the heapsort algorithm starts:


heapify the array;
• Heapifying the array: we add each of the n nodes in turn
• Each node has to float up, possibly as far as the
root
• Since the binary tree is perfectly balanced,
sifting up a single node takes O(log n) time
• Since we do this n times, heapifying takes
n*O(log n) time, that is, O(n log n) time

BLM 267 23
Heapsort - Analysis

• Here’s the rest of the algorithm:


while the array isn’t empty {
remove and replace the root;
heapify the new root node;
}
• We do the while loop n times (actually, n-1 times),
because we remove one of the n nodes each time
• Removing and replacing the root takes O(1) time
• Therefore, the total time is n times the cost of one
heapify( ) operation. How long does heapify( ) take?

BLM 267 24
Heapsort - Analysis

• To heapify the root node, we have to follow one path


from the root to a leaf node (and we might stop before
we reach a leaf)
• The binary tree is perfectly balanced
• Therefore, this path is O(log n) long
• And we only do O(1) operations at each node
• Therefore, heapify( ) takes O(log n) time
• Since we heapify inside a while loop that we do n
times, the total time for the while loop is
n*O(log n), or O(n log n)

BLM 267 25
Heapsort - Analysis

• Here’s the algorithm again:


heapify the array;
while the array isn’t empty {
remove and replace the root;
reheap the new root node;
}
• We have seen that heapifying takes O(n log n) time
• The while loop takes O(n log n) time
• The total time is therefore O(n log n) + O(n log n)
• Which is equivalent to O(n log n) time

BLM 267 26
Priority Queue

• Problem:
• Maintain a dynamically changing set S so that every
element in S has a priority (key) k.
• Allow efficiently reporting the element with maximal
priority in S.
• Allow the priority of an element in S to be increased.

BLM 267 27
Priority Queue

• A (max-)priority queue supports the following operations:
• Insert(S, x, k): Insert element x into S and give it priority k.
• Delete(S, x): Delete element x from S.
• Find-Max(S): Report the element with maximal priority in S.
• Delete-Max(S): Report the element with maximal priority in S
and remove it from S.
• Change-Priority(S, x, k): Change the priority of x to k.
BLM 267 28
Binary Heap as Priority Queue
• Binary heaps are binary trees that satisfy the following
heap property:

• For every node v with parent u, let kv and ku be the


priorities of the elements stored at v and u.
• Then kv ≤ ku.
(Example heap, level by level: 75; 54 65; 27 14 41 13; 3 23 4.)
BLM 267 29
Heaps, Priority Queues

• Priority queues support operations:


– Insert, Delete, and Increase-Key
– Find-Max and Delete-Max
• Binary heaps are priority queues that support the
above operations in O(lg n) time.
• We can sort using a priority queue.
• Heapsort:
– Sorts using the priority-queue idea
– Takes O(n lg n) time (as Mergesort)
– Sorts in place (as Insertion Sort)

BLM 267 30
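As a sketch of how a binary heap backs these operations, here is a minimal array-based max-priority queue in C. The struct layout and function names (`pq_insert`, `pq_delete_max`) are illustrative, not from the slides:

```c
#include <assert.h>

#define PQ_CAP 64

struct pq { int a[PQ_CAP]; int n; };

/* Insert: place the key at the end and let it float up. O(log n). */
void pq_insert(struct pq *q, int key)
{
    int i = q->n++;
    q->a[i] = key;
    while (i > 0 && q->a[(i - 1) / 2] < q->a[i]) {
        int p = (i - 1) / 2;
        int tmp = q->a[p]; q->a[p] = q->a[i]; q->a[i] = tmp;
        i = p;
    }
}

/* Delete-Max: remove the root, move the last leaf up, sift it down. */
int pq_delete_max(struct pq *q)
{
    int max = q->a[0];
    q->a[0] = q->a[--q->n];
    int i = 0;
    for (;;) {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < q->n && q->a[l] > q->a[largest]) largest = l;
        if (r < q->n && q->a[r] > q->a[largest]) largest = r;
        if (largest == i) break;
        int tmp = q->a[i]; q->a[i] = q->a[largest]; q->a[largest] = tmp;
        i = largest;
    }
    return max;
}
```

Both operations walk one root-to-leaf path, giving the O(lg n) bound stated above.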
BM267 - Introduction to Data
Structures

12. Searching and BST

Bm267 1
Searching
Sequential search
The algorithm simply compares successive elements
of a given array or list with a given key until either a
match is encountered (successful search) or the list is
exhausted without finding a match (unsuccessful
search).
Algorithm SequentialSearch( A[0…n-1], Key )
for( int i = 0; i < n; i++ )
if( A[i] == Key )
return 1;
return 0;

Bm267 2
Searching
• An improvement can be incorporated in sequential
search if the given list or array is sorted: the search
can be stopped as soon as an element greater than
or equal to the search key is encountered.
• Analysis of sequential search
– Average case: sequential search compares about n/2
elements of the list
– Worst case: the worst case occurs when either the
search is unsuccessful or the key is found at the end of
the list; in these cases sequential search has to make
n comparisons

Bm267 3
Searching
String matching
• The string matching problem is searching for a string of m
characters, called the pattern, in a given string of n
characters, called the text (m <= n).
• To put it more precisely, we want to find i, the
index of the leftmost character of the first matching
substring in the text, such that ti = p0, …,
ti+j = pj, …, ti+m-1 = pm-1.
t0 … ti … ti+j … ti+m-1 … tn-1 text T

p0 … pj … pm-1 pattern P

Bm267 4
Searching
• A brute-force algorithm for the string-matching problem is
quite obvious: align the pattern against the first m
characters of the text and start matching the corresponding
pairs of characters from left to right until either all m pairs
of the characters match or a mismatching pair is
encountered.
• If there is a mismatch, the pattern is shifted one position to
the right and character comparisons are resumed.
Algorithm BruteForceStringMatching( T[0…n-1], P[0…m-1] )
for( i = 0; i <= n-m; i++ ){
for( j = 0; j < m && P[j] == T[i+j]; j++ );
if( j == m )
return i;
}
return –1;
Bm267 5
Searching
N O B O D Y _ N O T I C E D _ H I M

N O T
N O T
N O T
N O T
N O T
N O T
N O T
N O T
Bm267 6
Searching
• Analysis of string matching: in the worst case the
algorithm has to make O(n·m) comparisons.
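The brute-force matcher can be written as a small C function; note the loop bound i <= n − m, so the last alignment still fits inside the text:

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the leftmost occurrence of P in T, or -1
   if P does not occur in T. */
int brute_force_match(const char *T, const char *P)
{
    int n = strlen(T), m = strlen(P);
    for (int i = 0; i <= n - m; i++) {
        int j = 0;
        while (j < m && P[j] == T[i + j])
            j++;                /* compare pairs left to right */
        if (j == m)
            return i;           /* all m characters matched */
    }
    return -1;
}
```

On the slide's example it reports the pattern NOT at index 7 of the text NOBODY_NOTICED_HIM.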
Binary search
• Binary search is an efficient algorithm for searching in a
sorted array. It works by comparing a search key with the
array’s middle element a[m]. If they match, the algorithm
stops; otherwise, the same operation is repeated
recursively for the first half of the array if key < a[m] and for
the second half if key > a[m].

a[0] … a[m-1]   a[m]   a[m+1] … a[n-1]
search here if key < a[m]        search here if key > a[m]
Bm267 7
Searching
Searching Key = 70

index 0 1 2 3 4 5 6 7 8 9 10 11 12
value 3 14 27 31 39 42 55 70 74 81 85 93 98
iteration 1 l m r
iteration 2 l m r
iteration 3 l,m r

Bm267 8
Searching
Algorithm BinarySearch( A[0…n-1], Key )
l = 0; r = n-1;
while( l <= r ){
m = (l+r)/2;
if( Key == A[m] )
return m;
else if( Key < A[m] )
r = m-1;
else
l = m+1;
}
return –1;

Bm267 9
Searching
Analysis of binary search
The average-case behavior of binary search depends
on the key and the values in the array.
The worst case occurs if the search is unsuccessful. In
this case the algorithm has to make
T(n) = T(n/2) + 1 comparisons,
so T(n) = log2 n.
Bm267 10
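The pseudocode above translates directly into C; run on the slide's example array, key 70 is found at index 7 in three iterations:

```c
#include <assert.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1. */
int binary_search(const int a[], int n, int key)
{
    int l = 0, r = n - 1;
    while (l <= r) {
        int m = (l + r) / 2;    /* middle element */
        if (key == a[m])
            return m;
        else if (key < a[m])
            r = m - 1;          /* search the first half */
        else
            l = m + 1;          /* search the second half */
    }
    return -1;                  /* unsuccessful search */
}
```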
BM267 - Introduction to Data
Structures

12. Binary Search Tree

Bm267 1
BST
• The basic operations (search, insert, delete)
can be performed in O(log n) time when a
balanced search tree is used.
• Definition: A binary search tree is a binary
tree. A nonempty binary search tree
satisfies the following properties:
1. Every element has a key (or value) and no
two elements have the same key; therefore
all keys are distinct.
2. The keys (if any) in the left subtree of the
root are smaller than the key in the root.
Bm267 2
BST
3. The keys (if any) in the right subtree of the root
are larger than the key in the root.
4. The left and right subtrees of the root are also
binary search trees.
(Diagrams: three example binary search trees.)
Bm267 3
BST
• We can remove the requirement that all elements
in a binary search tree need be distinct keys. Now
we replace smaller in property 2 by smaller than
or equal to.
• The resulting tree is called a binary search tree
with duplicates.
struct TreeNode
{
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
};
Bm267 4
BST
Linked representations of binary search trees.
(Diagram: the linked representation of a tree holding the keys A, B, E, G, M.)
Bm267 5
BST- Searching
• Suppose we wish to search for an element with key k. We
begin at the root. If the root is NULL, the search tree
contains no elements and the search is unsuccessful.
• Otherwise, we compare k with the key in the root. If k is
less than the key in the root, then no element in the right subtree can
have key value k and only the left subtree is to be searched.
• If the key is larger than the key in the root, only right
subtree needs to be searched.
• If k equals the key in the root, then the search terminates
successfully.
• The subtrees can be searched similarly.
• The time complexity is O(h) where h is the height of the
tree.

Bm267 6
BST- Searching
int tree_search ( int key ){
struct treenode *p = root;
while( p !=NULL ){
if( key < p->data)
p = p->left;
else if ( key > p->data)
p = p->right;
else
return 1; }

return 0;}

Bm267 7
BST- Searching
int search_rec(struct treenode *p, int key ){
if( p == NULL )
return 0;
if( key == p->data )
return 1;
if ( key < p->data )
return search_rec( p->left, key);
else
return search_rec( p->right, key);
}

Bm267 8
BST- Insertion
• To insert a new element ‘e’ into a binary search
tree, we must first verify that its key is different
from those of existing elements by performing a
search in the tree.
• If the search is unsuccessful, then the element is
inserted at the point the search terminated.
• For instance, to insert an element with key 80 into
the tree, we must search for 80.
• This search terminates unsuccessfully, and the last
node examined is the one with key 40.
• The new element is inserted as the right child of
this node.
Bm267 9
BST- Insertion
void tree_insert( int key ){
    struct treenode *p, *pp, *r;
    p = root;   // search pointer
    pp = NULL;  // parent of p
    while( p ){
        pp = p;
        if( key < p->data )
            p = p->left;
        else if( key > p->data )
            p = p->right;
        else {
            printf(" The key is already in the tree.\n");
            return;
        }
    }
Bm267 10
BST- Insertion
    r = malloc( sizeof(struct treenode) );
    r->data = key;
    r->left = NULL;
    r->right = NULL;
    if( root ){   // tree is not empty
        if( key < pp->data )
            pp->left = r;
        else
            pp->right = r;
    }
    else   // insertion into an empty tree
        root = r;
}

Bm267 11
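A self-contained sketch of insertion and search together (the slides keep `root` as a global; this sketch does the same, and silently ignores duplicate keys):

```c
#include <assert.h>
#include <stdlib.h>

struct treenode {
    int data;
    struct treenode *left, *right;
};

static struct treenode *root = NULL;

void bst_insert(int key)
{
    struct treenode *p = root, *pp = NULL;
    while (p) {                         /* find the insertion point */
        pp = p;
        if (key < p->data)      p = p->left;
        else if (key > p->data) p = p->right;
        else                    return; /* duplicate key: do nothing */
    }
    struct treenode *r = malloc(sizeof(struct treenode));
    r->data = key;
    r->left = r->right = NULL;
    if (pp == NULL)          root = r;       /* empty tree */
    else if (key < pp->data) pp->left = r;
    else                     pp->right = r;
}

int bst_search(int key)
{
    struct treenode *p = root;
    while (p) {
        if (key < p->data)      p = p->left;
        else if (key > p->data) p = p->right;
        else                    return 1;    /* found */
    }
    return 0;                                /* not found */
}
```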
BST- Deletion
• For deletion we consider three possibilities for the
node p that contains the element to be deleted:
 p is a leaf
 p has one nonempty subtree
 p has two nonempty subtree
• Case 1 is handled by discarding the leaf node.
• To delete 35 from the tree, the left-child field of
its parent is set to NULL and the node is discarded.
BST- Deletion
(Diagram: in the tree rooted at 30, the leaf p = 35 is removed by setting pp->left = NULL in its parent pp = 40.)
Bm267 13
BST- Deletion
• Next consider the case when p has only one nonempty
subtree.
• If p has no parent (i.e. it is the root), node p is discarded and
the root of its single subtree becomes the new search tree root.
(Diagram: deleting the root p = 30, which has only a right subtree; root = p->right makes 40 the new root.)
Bm267 14
BST- Deletion
• If p has a parent pp, then we change the pointer
from pp so that it points to p’s only child and then
delete the node p.
• For instance, if we want to delete 5, we change the
left-child field of its parent (the node containing
30) to point to the node containing 2.
(Diagram: pp->left = p->left detaches p = 5, and 30 now points directly to 2.)
Bm267 15
BST- Deletion
• Finally, to delete an element that has two
nonempty subtrees, we replace this element with
either the largest element in its left subtree or the
smallest element in its right subtree.
(Diagrams: deleting 40 from the tree rooted at 30; here 40 is replaced by 60, the smallest element in its right subtree.)
Bm267 16
BST- Deletion
(Diagrams: alternatively, 40 is replaced by 35, the largest element in its left subtree.)
Bm267 17
BST- Tree Min
• An element in binary search tree whose key is the
minimum can always be found by following left
child from the root until a NULL is encountered.
(Diagram: following left children from root 30 → 5 → 2 reaches the minimum, 2.)
Bm267 18
BST- Tree Min
struct node * tree_min(struct node * n )
{
while( n->left != NULL )
n = n->left;
return n;
}

Bm267 19
BST- Tree Max
• An element in binary search tree whose key is the
maximum can always be found by following right
child from the root until a NULL is encountered.
(Diagram: following right children from root 30 → 35 → 80 → 85 reaches the maximum, 85.)
Bm267 20
BST- Tree Max
struct node * tree_max(struct node * n )
{
while( n->right != NULL )
n = n->right;
return n;
}

Bm267 21
BST- Successor
• The successor of a node ‘x’ is the node with the
smallest key greater than ‘x’.
• If the right subtree of node x is nonempty, then the
successor of x is the leftmost node in the right
subtree.
(Diagram: the successor of 35 is 60, the leftmost node of its right subtree.)
Bm267 22
BST- Successor
• If the right subtree of node x is empty and x has a
successor y, then y is the lowest ancestor of x
whose left child is also an ancestor of x.
(Diagram: x = 13 has an empty right subtree; its successor y = 30 is the lowest ancestor of x whose left child is also an ancestor of x.)
Bm267 23
BST- Successor
struct node * tree_succ(struct node * n ){
struct node *y;
if( n->right != NULL )
return tree_min(n->right );
y = n->parent;
while( y != NULL && n == y->right)
{
n = y;
y = y->parent;
}
return y;}

Bm267 24
BST- Analysis
What is the run-time of search, insert, and delete?
• Proportional to the height of the tree
What is the height of a tree with n nodes?
• Worst-Case: O(n) in a linear tree
• Best-Case: O(log n) in a complete binary tree
(Diagrams: a balanced tree with root 30 versus a degenerate linear tree 2 → 5 → 7 → 20 → 40.)
Bm267 25
• References: Jeffrey S. Childs
Clarion University of PA

Bm267 Fall 2005 26


BM267 - Introduction to Data
Structures

13. Balanced Search Trees

Bm267
Balanced Search Trees
• Binary search trees on average require about log n
comparisons for each operation (searching, deletion
and insertion); unfortunately, in the worst case they
need n comparisons.
• Computer scientists have come up with solutions
that preserve the good
properties of the classical binary search tree:
– AVL Trees
– 2-3 Trees
– Red-Black Trees

Bm267
AVL Trees
• Definition: An AVL tree is binary search tree in
which the balance factor of every node, which is
defined as the difference between the heights of the
node’s left and right subtrees, is either 0 or +1 or –1.

(Diagrams: an AVL tree with each node labeled by its balance factor, and a non-AVL tree containing a node with balance factor 2.)
Bm267
AVL Trees
• If an insertion of a new node makes an AVL tree
unbalanced, we transform the tree by a rotation.
• A rotation in an AVL tree is a local
transformation of the subtree rooted at a node
whose balance factor has become either +2 or –2.
• There are only four types of rotations; in fact, two
of them are mirror images of the other two.
(Diagram: single R rotation — 10 with balance factor 2 and left child 5 (child 4) becomes 5 with children 4 and 10.)
Bm267
AVL Trees
(Diagram: single L rotation — 1 with balance factor –2 and right child 2 (child 3) becomes 2 with children 1 and 3.)
(Diagram: double LR rotation — 3 with left child 1 (right child 2) becomes 2 with children 1 and 3.)
AVL Trees
(Diagram: double RL rotation — 1 with right child 3 (left child 2) becomes 2 with children 1 and 3.)
Bm267
AVL Trees
(Diagram: single R rotation in general — root R with left child C and subtrees T1, T2, T3 becomes C, with T1 on its left and R (holding T2 and T3) on its right.)
{T1} < C < {T2} < R < {T3}

Bm267
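The single R rotation can be sketched directly from the picture: C becomes the new subtree root and R adopts T2 as its left subtree. This is a sketch with illustrative names; the height bookkeeping is the usual AVL convention (height of an empty subtree is 0):

```c
#include <assert.h>
#include <stddef.h>

struct avlnode {
    int key;
    int height;                 /* height of the subtree rooted here */
    struct avlnode *left, *right;
};

static int height(struct avlnode *n) { return n ? n->height : 0; }
static int maxi(int a, int b)        { return a > b ? a : b; }

/* Single R rotation: the ordering {T1} < C < {T2} < R < {T3}
   is preserved. Returns the new subtree root. */
struct avlnode *rotate_right(struct avlnode *r)
{
    struct avlnode *c = r->left;
    r->left = c->right;         /* T2 moves under R */
    c->right = r;               /* R becomes C's right child */
    r->height = 1 + maxi(height(r->left), height(r->right));
    c->height = 1 + maxi(height(c->left), height(c->right));
    return c;
}
```

The mirror-image single L rotation swaps every `left` and `right` in this function.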
AVL Trees
(Diagram: single L rotation in general, with subtrees T1, T2, T3 redistributed symmetrically.)
{T1} < R < {T2} < C < {T3}


Bm267
AVL Trees
(Diagram: double LR rotation in general — the grandchild G is lifted above C and R, and the subtrees T1, T2, T3, T4 are redistributed.)
{T1} < C < {T2} < G < {T3} <R < {T4}
Bm267
AVL Trees
(Diagram: the result — G at the root, with C holding T1 and T2, and R holding T3 and T4.)
{T1} < C < {T2} < G < {T3} <R < {T4}
Bm267
AVL Trees
(Trace: inserting 5, 6, 8 gives 5 a balance factor of –2; an L rotation makes 6 the root with children 5 and 8. Inserting 3 and 2 gives 5 a balance factor of 2; an R rotation makes 3 the parent of 2 and 5.)
Bm267
AVL Trees

(Trace: inserting 4 gives 6 a balance factor of 2; an LR double rotation makes 5 the root, with children 3 (children 2, 4) and 6 (child 8).)
Bm267
AVL Trees

(Trace: inserting 7 gives 6 a balance factor of –2; an RL double rotation makes 7 the parent of 6 and 8, giving root 5 with children 3 (children 2, 4) and 7 (children 6, 8).)
Bm267
AVL Trees
• How efficient is an AVL trees?
• The height h of any AVL tree with n nodes
satisfies the inequalities
⌊log2 n⌋ ≤ h < 1.4405 log2(n+2) – 1.3277

Bm267
2-3 Search Trees
• The second idea of balancing a search tree is to
allow more than one key in the same node.
• The simplest implementation of this idea is 2-3
trees.
• A 2-3 tree is a tree that can have nodes of two
kinds: 2-nodes and 3-nodes.
• A 2-node contains a single key K and has two
children: the left child serves as the root of a
subtree whose keys are less than K and the right
child serves as the root of a subtree whose keys
are greater than K.

Bm267
2-3 Search Trees

(Diagram: a 2-node holding key K, with left subtree < K and right subtree > K.)
Bm267
2-3 Search Trees
• A 3-node contains two ordered keys K1 and K2
(K1 < K2 ) and has three children.
• The leftmost subtree has keys less than K1.
• The middle subtree has keys between K1 and K2.
• The rightmost subtree has keys greater than K2

Bm267
2-3 Search Trees

(Diagram: a 3-node holding keys K1 < K2, with subtrees < K1, between K1 and K2, and > K2.)
Bm267
2-3 Search Trees
• The last requirement of the 2-3 tree is that its
leaves must be on the same level.
• A 2-3 tree is always height balanced; the length of
a path from the root of the tree to a leaf must be the
same for every leaf.
• Searching for a given key K in a 2-3 tree quite
straightforward. We start with the root. If the root
is a 2-node, we act as if it were a binary search
tree: we either stop if K is equal to the root’s key
or continue the search in the left or right subtree.

Bm267
2-3 Search Trees
• If the root is a 3-node, we know after no more than
two key comparisons whether the search can be
stopped ( If K is equal to one of the root’s keys) or
in which of the root’s three subtrees it needs to be
continued.

(Example 2-3 tree: root [3, 8] with leaves [2], [4, 5], [9].)

Bm267
2-3 Search Trees
• Inserting a new key in a 2-3 tree is done as
follows:
– We always insert a new key K at a leaf except for the
empty tree.
– The appropriate leaf is found by performing a search
for K.
• If the leaf in question is a 2-node key, we insert K
there as either the first or the second key, depending
on whether K is smaller or larger than the node’s old
key.
• If the leaf is a 3- node, we split the leaf in two; The
smallest of the three is put in the first leaf, the
largest key is put in the second leaf , while the
middle key is promoted to the old leaf’s parent.
Bm267
2-3 Search Trees
• If the leaf happens to be the tree’s root, a new root
is created to accept the middle key.
• Note that promotion of a middle key to its parent
can cause the parent’s overflow and hence can
lead to several node splits along the chain of the
leaf’s ancestors.
Example: inserting 9, 5, 8, 3, 2, 4, 7 in this order.
(Trace: 9 → [5, 9] → [5, 8, 9], which splits into root [8] with leaves [5] and [9].)
Bm267
2-3 Search Trees
(Trace: inserting 3 gives leaf [3, 5]; inserting 2 gives [2, 3, 5], which splits, promoting 3: root [3, 8] with leaves [2], [5], [9]; inserting 4 and then 7 fills the middle leaf to [4, 5, 7].)
Bm267
2-3 Search Trees
(Trace: the leaf [4, 5, 7] splits, promoting 5; the root [3, 5, 8] overflows and splits too, giving a new root [5] with children [3] and [8] and leaves [2], [4], [7], [9].)
Bm267
2-3 Search Trees
• The efficiency of the dictionary operations depends on the
tree’s height.
• Let’s find bounds on the height of a 2-3 tree. A 2-3 tree of
height h with the smallest number of keys is a full tree of 2-
nodes:
n >= 1 + 2 + … + 2h = 2h+1 – 1
and hence
h <= log2(n+1) – 1
• On the other hand, a 2-3 tree of height h with the largest
number of keys is a full tree of 3-nodes, each with two keys
and three children:
n <= 2·1 + 2·3 + … + 2·3h = 3h+1 – 1
h >= log3(n+1) – 1
Therefore: log3(n+1) – 1 <= h <= log2(n+1) – 1

Bm267
BM267 - Introduction to Data
Structures

14. Hashing

Bm267 1
Hashing
• A dictionary is an abstract type. A set of
operations (searching, insertion, and deletion) is
defined on its elements.
• The elements of this set can be of numbers,
characters, character strings and so on.
• Typically, tables have several fields, each
responsible for keeping a particular type of
information about an entity.
• For example, a student record may contain fields
for student’s ID, name, date of birth and major and
so on.
Bm267 2
Hashing
• At least one field called key is used for
identifying entities ( the student’s ID ).
• Hashing is based on the idea of distributing n keys
among a one-dimensional array H[0…m-1] called
a hash table.
• The distribution is done by computing the value of
some predefined functions h called hash function.
• This function assigns an integer between 0 and
m-1, called hash address, to a key.

Bm267 3
Hashing

(Figure: a universe of keys U and current keys k1…k5 mapped by h into table cells 0…m–1; h(k2) = h(k5) is a collision.)

Bm267 4
Hashing
• The hash function depends on the key type.
• For example, if keys are positive integers, hash
function can be of the form h(k)= k mod m.
This function assures that the result is always
between 0 and m-1.
• In general, a hash function needs to satisfy two
requirements:
 A hash function needs to distribute keys among the
cells of the hash table as evenly as possible. ( Because
of this requirement the value of m is usually chosen to
be prime.)
 A hash function has to be easy to compute
Bm267 5
Hashing
• If we choose a hash table’s size m to be smaller than
the number of keys n, we will get collisions ( two
(or more) keys being hashed into same cell of the
hash table).
• In fact, in the worst case, all the keys could be
hashed to the same cell of the hash table.
• With an appropriately chosen size of the hash table
and a good hash function, this situation will happen
rarely.
• Still, every hashing scheme must have a collision
resolution mechanism.
• There are two versions of hashing: open hashing
(separate chaining) and closed hashing (open
addressing).
Bm267 6
Collisions

• The goal of the hash function is to distribute the
keys in an apparently random way, to prevent
mapping different keys to the same table index.

• Total prevention is not possible in practice.

• When h(x) = h(y) for x ≠ y, this is called a collision


• Collisions occur when different elements are
mapped to the same cell.

Bm267 7
Collisions

• As the number of elements in the table increases, the


likelihood of a collision increases - so make the table
as large as possible.
• If the table size is 100, and all the hashed keys are
divisible by 10, there will be many collisions!
• Particularly bad if table size is a power of a small integer
such as 2 or 10
• Therefore, make the table size a prime number

Bm267 8
Collisions handling

• Open Hashing (Separate chaining): Create an array


of linked list, so that the item can be inserted into the
linked list if collision occurs.
• Closed Hashing (Open Addressing): Search the array
in some systematic way for an empty cell and insert
the new item there if collision occurs.

Bm267 9
Open Hashing (Separate Chaining)
• In open hashing, keys are stored in linked lists
attached to cells of a hash table.
keys       A S E R C H I N G X M P L
hash value 0 2 0 4 4 4 2 2 1 2 4 3 3
(Figure: the chains built step by step; finally cell 0 holds E → A, cell 1 holds G, cell 2 holds X → N → I → S, cell 3 holds L → P, and cell 4 holds M → H → C → R.)
Bm267 11
Open Hashing (Separate Chaining)
• Consider , as an example, the following list of
words:
A, FOOL, AND, HIS MONEY, ARE, SOON, PARTED
• As a hash function, we will use the simple
function for strings, that is we will add the
positions of a word’s letters in the alphabet and
compute the sum’s remainder after division by 13.
• We start with an empty table.
• The first key is the word “A”; its hash is
h(A) = 1 mod 13 = 1

Bm267 12
Open Hashing (Separate Chaining)
• The second key –the word FOOL; its hash is
h(FOOL) = (6 +15 +15+12) mod 13 = 9
(Figure: table cells 0…12; A sits in cell 1 and FOOL in cell 9. After all eight words are inserted, cell 6 holds AND, cell 7 MONEY, cell 10 HIS, cell 11 the chain SOON → ARE, and cell 12 PARTED.)
Bm267 13
Open Hashing (Separate Chaining)
• A collision occurred at position 11, because h(ARE) =
(1+8+5) mod 13 = 11 and h(SOON) = (19+15+15+14) mod
13 = 11.
Searching
•To search, apply to the key the same procedure that was
used for creating the table.
•If we want to search a key “KID” in the hash table, we first
compute the value of the same hash function for the key:
h(KID) = 11.
•Since the list attached to cell 11 is not empty, so the list may
contain the key. After comparing the string KID first with
SOON and then ARE, we end up with an unsuccessful search.

Bm267 14
Open Hashing (Separate Chaining)
• The efficiency of searching depends on the length
of the list.
• The length of the list depends on the table size
and the quality of the hash function.
• If the hash function distributes n keys among m
cells of the hash table evenly, each list will be
about n/m keys long.
• The ratio α = n/m is called the load factor of the
hash table.
• The average number of pointers inspected in a
successful search will be 1 + α/2; in an unsuccessful
search, it will be α.
Bm267 15
Open Hashing (Separate Chaining)
• The other two dictionary operations,
insertion and deletion, are almost identical to
searching.
• Insertions are done at the front or end of the
list.
• Deletion is performed by searching for the
key to be deleted and then removing it from
its list.

Bm267 16
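A minimal separate-chaining table in C, for integer keys with h(k) = k mod m (the names `chain_insert` and `chain_search` are illustrative; insertions go to the front of the chain, as the slide allows):

```c
#include <assert.h>
#include <stdlib.h>

#define M 13   /* table size, chosen prime */

struct chainnode {
    int key;
    struct chainnode *next;
};

static struct chainnode *table[M];   /* all chains start empty */

/* Insert at the front of the chain for cell h(key) = key mod M. */
void chain_insert(int key)
{
    int h = key % M;
    struct chainnode *n = malloc(sizeof(struct chainnode));
    n->key = key;
    n->next = table[h];
    table[h] = n;
}

/* Search only the chain attached to cell h(key). */
int chain_search(int key)
{
    for (struct chainnode *p = table[key % M]; p; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```

With a good hash function each chain stays about α = n/m long, matching the analysis above.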
Closed Hashing( Open Addressing)
• In closed hashing, all keys are stored in the hash
table itself without the use of linked list.
• This implies that the table size m must be at least
as large as the number of keys n.
• Different strategies can be employed for collision
resolution. The simplest one – called linear
probing- checks the cell following the one where
collision occurs. If that cell is empty, the new key
is saved there, if not the first empty cell is
searched. If the end of the table is reached, the
search starts from the beginning of the table. The
table is treated as a circular array.

Bm267 17
Linear Probing
keys A FOOL AND HIS MONEY ARE SOON PARTED
hash ad 1 9 6 10 7 11 11 12
0 1 2 3 4 5 6 7 8 9 10 11 12
A
A FOOL
A AND FOOL
A AND FOOL HIS
A AND MONEY FOOL HIS
A AND MONEY FOOL HIS ARE
A AND MONEY FOOL HIS ARE SOON
PARTED A AND MONEY FOOL HIS ARE SOON

Bm267 18
Linear Probing
• To search for a given key k, we start by computing h(k)
where h is the hash function used in the table’s
construction.
• If the cell is empty the search is unsuccessful.
• If the cell is not empty, we must compare k with the cell: if
they are equal, we have found a matching key; if they are
not we compare k with a key in the next cell and continue
in this manner until either we encounter a matching key or
an empty cell (unsuccessful search)
• For example, if we search for the word LIT in the table, first
we need to calculate the hash value. We get
h(LIT) = ( 12 + 9 + 20 ) mod 13 = 2; since cell 2 is empty,
the search is unsuccessful and we can stop immediately.
Bm267 19
Linear Probing
• However if we search for KID with h(KID) = (11+9+4)
mod 13 = 11, we will compare KID with ARE, SOON,
PARTED, and, A before we can declare the search is
unsuccessful.
• While the search and insertion operations are
straightforward, deletion is not.
• For example, if we simply delete the key ARE from the
table, we would be unable to find the key SOON afterward.
• After computing h(SOON) = 11, the algorithm would find
this location empty and report the search is unsuccessful.
• The solution is to mark previously occupied locations by a
special symbol to distinguish them from locations that
have not been occupied.
Bm267 20
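A sketch of closed hashing with linear probing, including the special "deleted" marker described above. The EMPTY and DELETED sentinels are illustrative (keys are assumed non-negative, and the table is assumed never to fill completely):

```c
#include <assert.h>

#define M 13
#define EMPTY   (-1)
#define DELETED (-2)

static int table[M];

void lp_init(void)
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Insert key at h(k) = k mod M, probing forward circularly. */
void lp_insert(int key)
{
    int i = key % M;
    while (table[i] != EMPTY && table[i] != DELETED)
        i = (i + 1) % M;       /* treat the table as circular */
    table[i] = key;
}

/* Search: stop at an EMPTY cell, but probe past DELETED ones. */
int lp_search(int key)
{
    int i = key % M;
    while (table[i] != EMPTY) {
        if (table[i] == key)
            return 1;
        i = (i + 1) % M;
    }
    return 0;
}

/* Delete: mark the cell so later searches keep probing past it. */
void lp_delete(int key)
{
    int i = key % M;
    while (table[i] != EMPTY) {
        if (table[i] == key) {
            table[i] = DELETED;
            return;
        }
        i = (i + 1) % M;
    }
}
```

Marking rather than emptying the cell is exactly what keeps a key such as SOON reachable after ARE is deleted.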
Linear Probing
•Linear probing creates clusters. Clusters are bad news in
hashing because they make operations less efficient.

•Several other collision resolution strategies have been


suggested; Quadratic and Double hashing.
•Quadratic probing uses a hash function of the form
h(k, i)=( h’(k) + c1i + c2i2) mod m
where h’ is an auxiliary hash function, c1 and c2 are auxiliary
constants, and i = 0,1,…, m-1.

Bm267 21
Double Hashing
• Double Hashing is one of the best methods available for
open addressing.
• Double hashing uses a hash function of the form
h(k,i) = ( h1(k) + ih2(k)) mod m
where h1 and h2 are auxiliary hash functions.
• The initial probe is to position h1(k); successive probe
positions are offset from previous positions by the amount
h2(k) mod m.
• The h2(k) must be relatively prime to the hash-table size m.
• A convenient way to ensure this condition, let m be prime
and to design h2 so that it always returns a positive integer
less than m.

Bm267 22
Double Hashing
•For example, we could choose m prime and
let
h1(k) = k mod m,
h2(k) = 1 + (k mod m’)
where m’ is chosen to be slightly less than m (say,
m-1).
•If k =123456, m = 701 and m’ =700, we have
h1(k) = 80 and h2(k) =257 so the first probe is
at position 80, and then every 257th slot is
examined.
Bm267 23
Double Hashing

(Example: a table of size 13 holding the keys 79, 69, 14, 92, 72, with
h1(k) = k mod 13 and h2(k) = 1 + (k mod 11).)
Bm267 24