Data Structures
1. Introduction
Objectives
[Figure: connectivity example — input pairs such as 1-2, 2-7, 3-4, 7-4, 9-7, 8-7, 6-8, ...]
BLM 267 17/64
Example: Connectivity (6) - Abstract operations
We will:
• Assume an upper limit for the number of nodes, N.
• Use an array that has N elements (one int per node).
• Nodes are numbered (0..N-1), corresponding to array
indices.
0 1 2 3 . . . . N-2 N-1
N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] = i
#define N 10000
int i;
int ID[N];
for (i = 0; i < N; i++)
ID[i] = i ;
Example: Quick-Find (2) - Operations
N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] = i
WHILE (a new pair p-q exists)
{
    IF ID[p] == ID[q]                  (Find operation)
        { continue }
    ELSE
    {
        FOR i = 0 TO N-1
            IF ID[i] == ID[p]          (Union operation)
                THEN ID[i] = ID[q]
        Output p-q
    }
}
Example: Quick-Find (4-6) - Trace (pairs are written p-q; each row shows ID[0..9] after the pair is processed)
Initial  0 1 2 3 4 5 6 7 8 9
3-4      0 1 2 4 4 5 6 7 8 9
4-9      0 1 2 9 9 5 6 7 8 9
8-0      0 1 2 9 9 5 6 7 0 9
2-3      0 1 9 9 9 5 6 7 0 9
5-6      0 1 9 9 9 6 6 7 0 9
2-9      0 1 9 9 9 6 6 7 0 9
5-9      0 1 9 9 9 9 9 7 0 9
7-3      0 1 9 9 9 9 9 9 0 9
4-8      0 1 0 0 0 0 0 0 0 0
5-6      0 1 0 0 0 0 0 0 0 0
0-2      0 1 0 0 0 0 0 0 0 0
6-1      1 1 1 1 1 1 1 1 1 1
[Figure: the connected components after each pair is processed, drawn as groups of nodes; pairs whose endpoints are already connected (2-9, 5-6, 0-2) cause no change ("do nothing"), and after 6-1 all ten nodes form a single component]
Example: Quick-union - Operations
N = 10000
ID[N] is an array of int
FOR (i = 0 TO N-1) ID[i] = i
WHILE (a new pair p, q exists)
{
    FOR (i = p; i != ID[i]; i = ID[i])    (Find root of p)
    FOR (j = q; j != ID[j]; j = ID[j])    (Find root of q)
    IF i != j
        THEN { ID[i] = j; output p-q }    (Union operation)
}
Example: Quick-union (4-6) - Trace (each row shows ID[0..9] after the pair is processed)
Initial  0 1 2 3 4 5 6 7 8 9
3-4      0 1 2 4 4 5 6 7 8 9
4-9      0 1 2 4 9 5 6 7 8 9
8-0      0 1 2 4 9 5 6 7 0 9
2-3      0 1 9 4 9 5 6 7 0 9
5-6      0 1 9 4 9 6 6 7 0 9
2-9      0 1 9 4 9 6 6 7 0 9
5-9      0 1 9 4 9 6 9 7 0 9
7-3      0 1 9 4 9 6 9 9 0 9
4-8      0 1 9 4 9 6 9 9 0 0
5-6      0 1 9 4 9 6 9 9 0 0
0-2      0 1 9 4 9 6 9 9 0 0
6-1      1 1 9 4 9 6 9 9 0 0
[Figure: the parent-link trees after each pair is processed; each union makes the root of one tree point to the root of the other, so trees can grow tall. Pairs already connected (2-9, 5-6, 0-2) cause no change]
Weighted quick union keeps an extra Size array, initially all 1s. After the pair 0-1, for example, ID[1] = 0 and Size[0] = 2:
         0   1   2   3  ...  N-2  N-1
Size     1   1   1   1  ...   1    1
After 0-1:
ID       0   0   2   3  ...  N-2  N-1
Size     2   1   1   1  ...   1    1
Example: Weighted quick union (4-8) - Trace (first row of each pair: ID[0..9]; second row: Size[0..9])
Initial  0 1 2 3 4 5 6 7 8 9
Size     1 1 1 1 1 1 1 1 1 1
3-4      0 1 2 3 3 5 6 7 8 9
         1 1 1 2 1 1 1 1 1 1
4-9      0 1 2 3 3 5 6 7 8 3
         1 1 1 3 1 1 1 1 1 1
8-0      8 1 2 3 3 5 6 7 8 3
         1 1 1 3 1 1 1 1 2 1
2-3      8 1 3 3 3 5 6 7 8 3
         1 1 1 4 1 1 1 1 2 1
5-6      8 1 3 3 3 5 5 7 8 3
         1 1 1 4 1 2 1 1 2 1
2-9      8 1 3 3 3 5 5 7 8 3
         1 1 1 4 1 2 1 1 2 1
5-9      8 1 3 3 3 3 5 7 8 3
         1 1 1 6 1 2 1 1 2 1
7-3      8 1 3 3 3 3 5 3 8 3
         1 1 1 7 1 2 1 1 2 1
4-8      8 1 3 3 3 3 5 3 3 3
         1 1 1 9 1 2 1 1 2 1
5-6      8 1 3 3 3 3 5 3 3 3
         1 1 1 9 1 2 1 1 2 1
0-2      8 1 3 3 3 3 5 3 3 3
         1 1 1 9 1 2 1 1 2 1
6-1      8 3 3 3 3 3 5 3 3 3
         1 1 1 10 1 2 1 1 2 1
[Figure: the parent-link trees for the weighted algorithm after each pair; the smaller tree is always attached below the root of the larger one, so the trees stay shallow. Pairs already connected (2-9, 0-2) cause no change]
Analysis Framework
• There are two kinds of algorithm efficiency: "time efficiency" and "space efficiency". Time efficiency indicates how fast an algorithm runs; space efficiency deals with the amount of memory it needs.
• Nowadays the amount of extra space required by an algorithm is typically of less concern, due to improvements in memory capacity.
• We are going to concentrate on time efficiency, but the analytical framework is applicable to analyzing space efficiency as well.
Measuring an Input's Size
• It is logical to investigate an algorithm's efficiency as a function of some parameter n indicating the algorithm's input size.
• The input size will be the size of the list or array for problems such as sorting, searching, or finding the smallest element.
Units for Measuring Running Time
• First approach: We can use time measurement (seconds or milliseconds) to measure an algorithm's running time.
• This approach has drawbacks: the result depends on the speed of the computer, the quality of the program, and the compiler used in generating the machine code.
• Second approach: We can count the number of times each of the algorithm's operations is executed. The thing to do is to identify the most important operation of the algorithm, called the basic operation (the operation contributing the most to the total running time), and compute the number of times the basic operation is executed.
• As a rule, it is not difficult to identify the basic operation of an algorithm: it is usually the most time-consuming operation in the algorithm's innermost loop.
Units for Measuring Running Time (2)
• Ex: Sequential Search
int search(int a[], int n, int K)
{
    int i;                     /* executes only once       */
    for (i = 0; i < n; i++)    /* executes n times         */
        if (a[i] == K)         /* executes at most n times */
            return i;          /* at most once             */
    return -1;                 /* at most once             */
}
Units for Measuring Running Time (3)
• Most sorting algorithms work by comparing elements of a list with each other; for such algorithms the basic operation is a key comparison.
• As another example, algorithms for matrix multiplication require two operations: multiplication and addition. Multiplication takes much longer than addition, so we can use the total number of multiplications as our unit of measure.
• MatrixMultiplication(A[0..n-1, 0..n-1], B[0..n-1, 0..n-1])
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
    {
        C[i,j] = 0;
        for (k = 0; k < n; k++)
            C[i,j] = C[i,j] + A[i,k]*B[k,j];
    }
Units for Measuring Running Time (4)
• Efficiency T(n) is investigated as a function of some parameter n indicating the problem's size.
• T(n) is computed as the number of times the algorithm's "basic operation" is executed.
Values of several functions important for analysis of algorithms
[Table omitted: values of functions commonly used in algorithm analysis for increasing n]
Worst, Best, and Average Case
• There are many algorithms whose running time depends not only on the input size but also on the particular input.
Mathematical Analysis of Nonrecursive Algorithms
• General Plan for Analyzing Efficiency of Nonrecursive Algorithms
– Decide on a parameter indicating the input's size.
– Identify the algorithm's basic operation (as a rule, it is located in its innermost loop).
– Check whether the number of times the basic operation is executed depends only on the input size; if it also depends on the particular input, investigate worst, best, and average case efficiency separately.
– Set up a sum expressing the number of times the algorithm's basic operation is executed.
– Using standard formulas and rules of sum manipulation, find a closed form for the count.
Mathematical Analysis of Nonrecursive Algorithms (2)
• Ex 1:
Algorithm MaxElement(A[0..n-1])
{ // Determines the largest element of the array
    maxval = A[0];
    for (i = 1; i < n; i++)
        if (A[i] > maxval)
            maxval = A[i];
    return maxval;
}
• The input size is the number of elements in the array: n
• The operations that are going to be executed most often are in the algorithm's for loop.
Mathematical Analysis of Nonrecursive Algorithms (3)
• There are two operations in the loop's body: the comparison A[i] > maxval and the assignment maxval = A[i]. We consider the comparison as the basic operation because it is executed on each repetition.
• The number of comparisons will be n-1.
• The worst, best, and average case efficiencies will be the same.
• The algorithm makes one comparison on each execution of the loop; therefore
T(n) = Σ_{i=1}^{n-1} 1 = n - 1
Mathematical Analysis of Nonrecursive Algorithms (4)
• Ex 2:
Algorithm UniqueElements(A[0..n-1])
{ // Checks whether all the elements in a given array are distinct
    for (i = 0; i <= n-2; i++)
        for (j = i+1; j <= n-1; j++)
            if (A[i] == A[j])
                return false;
    return true;
}
• The input's size is the number of elements in the array: n
• The basic operation is the comparison: A[i] == A[j]
Mathematical Analysis of Nonrecursive Algorithms (5)
• The worst case occurs in two situations: arrays with no equal elements, and arrays in which the last two elements are the only equal pair.
• For such inputs, one comparison is made for each repetition of the innermost loop; therefore:
Tworst(n) = Σ_{i=0}^{n-2} Σ_{j=i+1}^{n-1} 1 = Σ_{i=0}^{n-2} [(n-1) - (i+1) + 1] = Σ_{i=0}^{n-2} (n-1-i) = (n-1)n / 2
In the worst case the algorithm has to make (n-1)n / 2 comparisons.
What is the efficiency of matrix multiplication?
Solution: T(n) = Σ_{i=0}^{n-1} Σ_{j=0}^{n-1} Σ_{k=0}^{n-1} 1 = n³
Mathematical Analysis of Recursive Algorithms
• General Plan for Analyzing Efficiency of Recursive Algorithms
– Decide on a parameter indicating the input's size.
– Identify the algorithm's basic operation (as a rule, it is located in its innermost loop).
– Check whether the number of times the basic operation is executed depends only on the input size; if not, investigate worst, best, and average case efficiency separately.
– Set up a recurrence relation, with an initial condition, for the number of times the basic operation is executed.
• Ex 1: Computing the factorial function F(n) = n!
n! = n*(n-1)*(n-2)*...*3*2*1 = n*(n-1)! for n ≥ 1, and 0! = 1
so we can compute F(n) = F(n-1)*n
Mathematical Analysis of Recursive Algorithms (2)
• Algorithm F(n)
{ // Computes n! recursively
    if (n == 0)
        return 1;
    else
        return F(n-1)*n;
}
• We consider n as the algorithm's input size.
• The basic operation of the algorithm is again multiplication.
• We denote by T(n) the number of multiplications needed to compute F(n):
T(n) = T(n-1) + 1
where T(n-1) multiplications are needed to compute F(n-1), plus one to multiply F(n-1) by n.
Mathematical Analysis of Recursive Algorithms (3)
• How to solve a recurrence equation
We need an initial condition that tells us the value at which the sequence starts. We can obtain this value by inspecting the condition that makes the algorithm stop its recursive calls:
if (n == 0) return 1;
This tells us two things. First, the calls stop when n = 0. Second, when n = 0 the algorithm performs no multiplications. Therefore
T(0) = 0
We use backward substitution to solve the equation.
Mathematical Analysis of Recursive Algorithms(4)
T(n) = T(n-1) + 1 substitute T(n-1) with T(n-2) +1
= [T(n-2) +1] +1 = T(n-2) +2 substitute T(n-2) with T(n-3) +1
= [T(n-3) +1] +2 = T(n-3) +3
….
T(n) = T(n-i) +i
When i = n
T(n) = T(n-n) + n = T(0) + n = 0 + n = n
So we need n multiplication to compute F(n)
BM 267 21
BM267 - Introduction to Data Structures
Objectives
• Review basic C knowledge
– Learn basic types, int, float, char.
– Learn how to write and call functions.
– Learn how to define C structures which put
pieces of information together.
– Learn to use pointers which refer to information
indirectly.
– Learn a general approach to organizing our C programs.
Basic Types
• A data type is a set of values and a collection of operations on those values.
• In C, programs are built from just a few types of data:
– Integers: short int, int, long int
– Floating-point numbers: float, double
– Characters: char
• When we perform an operation, we need to ensure that its operands and result are of the correct type.
• C performs implicit type conversions.
• We can use a cast for explicit type conversion.
• For example, if x and N are integers, the result of
((float) x) / N
is floating point.
Operations on basic data types
Arithmetic operations: + - * / % ++ --
Functions
• We define functions to implement new operations on data.
• All C programs include a definition of the function main().
• All functions have a list of parameters (the list can be empty), and functions may return a value or nothing.
• To declare a function, you give its return type, its name, and its parameter types.
• Ex: int lg(int);
• In a function definition, you give names to the parameters and do the desired computation using them.
• The declaration and the definition of a function can be in different files, but the file with the definition should include the file with the declaration.
Functions Example
#include <stdio.h>
int lg(int);
int main(){
int i, N;
for (i = 1, N = 10; i <= 6; i++, N *= 10)
printf("%7d %2d %9d\n", N, lg(N),
N*lg(N));
return 0;
}
int lg(int N){
int i;
for (i = 0; N > 0; i++, N /= 2) ;
return i;
}
Functions Example - 2: Average and Standard Deviation of N integers
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
typedef int numType;
numType randNum()
{ return rand(); }
int main(int argc, char *argv[])
{ int i, N = atoi(argv[1]);
  float m1 = 0.0, m2 = 0.0;
  numType x;
  for (i = 0; i < N; i++)
  {
    x = randNum();
    m1 += ((float) x)/N;
    m2 += ((float) x*x)/N;
  }
  printf(" Average: %f\n", m1);
  printf("Std. deviation: %f\n", sqrt(m2-m1*m1));
}
Program Organization
• As is recommended, you can split your program into three files:
• .h file: An interface, which defines the data structure and declares the functions used to manipulate it.
• .c file: An implementation of the functions declared in the .h file (must include the .h file).
• A client program that uses the functions declared in the interface (must include the .h file). This file must implement the main() function.
Program Organization (2)
Num.h
typedef int numType;
numType randNum();
Num.c
#include <stdlib.h>
#include "Num.h"
numType randNum()
{ return rand(); }
Client.c
#include <stdio.h>
#include <math.h>
#include "Num.h"
int main(int argc, char *argv[])
{ /* implementation of main */ }
Structs
• We need data structures that allow us to handle collections of data. Arrays and structs allow us to organize data.
• Structs define a new type of data.
• Structs are aggregate types that we use to define collections of data. The members of a struct can be of different types (a member can even be another struct), whereas an array can hold only one type of data.
• Assume that we need a new type called Point; unfortunately, there is no such built-in type in standard C.
• But C allows us to define such a type using "struct".
Structs (2)
• Accordingly, we can write:
struct Point { float x; float y; };   /* do not forget the semicolon */
• struct Point a, b; declares two Point variables.
• We can refer to each member of a Point struct by name. For example:
a.x = 1.0; a.y = 1.0; b.x = 4.0; b.y = 5.0;
• We can also pass structs as arguments to functions. For example:
Structs (3)
Point.h
typedef struct { float x; float y; } point;
float distance(point a, point b);
Point.c
#include <math.h>
#include "Point.h"
float distance(point a, point b)
{ float dx = a.x - b.x, dy = a.y - b.y;
  return sqrt(dx*dx + dy*dy);
}
Pointers
• C pointers allow us to manipulate data indirectly. Basically, a pointer is a reference to an object in memory.
• To declare a pointer, you first give its type and then put a "*" before the variable's name. Ex: int *a_Ptr;
• We can declare pointers to any type of data. Ex: float *f_Ptr; Point *point_Ptr;
• The "&" operator returns the address of a variable.
• When you want to initialize a pointer you can use the "&" operator. Ex: int a, *a_Ptr = &a;
Pointers (2)
• A C function returns only one value, but pointers allow us to modify several variables.
• Ex:
void polar(float x, float y, float *r, float *theta)
{
    *r = sqrt(x*x + y*y);
    *theta = atan2(y, x);
}
• The function call polar(1.0, 1.0, &a, &b) will affect the values of a and b: a will become sqrt(x*x+y*y) and b will become atan2(y,x).
Arrays
• An array is a fixed-size collection of data of the same type, stored contiguously in memory.
[Figure: array a with elements indexed 0 1 2 3 4]
Dynamic Memory Allocation
• Dynamic memory allocation allows us to obtain blocks of memory as needed during execution.
• Using dynamic memory allocation, we can design data structures that grow and shrink.
• To allocate memory dynamically, we call one of the three memory allocation functions declared in the <stdlib.h> header:
malloc: allocates a block of memory, but doesn't initialize it.
calloc: allocates a block of memory and clears it.
realloc: resizes a previously allocated block of memory.
• malloc returns a value of type void*.
• When we call a memory function, it may not be able to allocate enough memory; in this case it returns NULL, and we must test for this situation.
Dynamic Memory Allocation (2)
• Ex:
p = malloc(10000);
if (p == NULL)
{ /* allocation failed, take appropriate action */ }
• We use the sizeof operator to calculate the amount of space required.
• Ex: Point *p = malloc(sizeof(Point)); or
Point *p = malloc(n * sizeof(Point)); which allocates space for n Point objects.
• Once p points to a dynamically allocated block of memory, we can use it as if it were an array:
for (int i = 0; i < n; i++)
{
    p[i].x = i;
    p[i].y = i;
}
Dynamic Memory Allocation (3)
• free() deallocates memory allocated by malloc.
• It takes a pointer as an argument:
free(ptr);
BM267 - Introduction to Data Structures
Objectives
Learn about elementary data structures - data structures that form the basis of more complex data structures.
• Structure: combines different data types as a unit.
• Array: combines many instances of the same data type as a unit.
• Linked list: allows insertions and removals anywhere. Implementation using dynamic allocation vs. fixed arrays.
• String: allows insertions and removals of substrings, hence its size changes dynamically.
Self-referential objects
A self-referential structure contains a pointer member that refers to a structure of the same type.
Node
• A node in the simplest form has two fields:
[ Data | Link ]
Programming errors with pointers
• Dereferencing a NULL pointer:
q->data = 5;       /* ERROR if q is NULL */
• Using memory after it has been freed:
free(q->next);
q->next->data = 6; /* PROBLEM: q->next was freed */
Linked List (as an elementary data type)
The simplest kind of linked list is a linear chain of links (pointers) to nodes of the same type.
More sophisticated linked lists may have several chains of pointers.
A linked list can only be reached through its handle(s). Handles are called the head pointer (or list-head) and the tail pointer (or list-end).
Linked list - most common types
Singly linked list
• Each node points to the next
• Terminates with a null pointer
• Only traversed in one direction
Circular, singly linked
• Pointer in the last node points back to the first node
Doubly linked list
• May have two “start pointers” – head element and tail element
• Each node has forward / backward pointers
• Allows traversals both forwards and backwards
Circular, doubly linked list
• Forward pointer of the last node points to the first node and
backward pointer of the first node points to the last node
Singly-linked list
[Figure: LH → 12 → 23 → 26 → 44]
Singly-linked unordered list
[Figure: LH → 22 → 13 → 26 → 9; a new node 12 is to be added]
Singly-linked unordered list
typedef struct LNODE
{
    int val;
    struct LNODE *next;
} LNODE;

LNODE *LH;
LNODE *newNode;
Singly-linked unordered list
C code for adding a node to the beginning of the list:
• Let the caller update the listhead pointer
Caller code: LH = AddNode(LH, newNode);
Where:
LNODE * AddNode(LNODE* head, LNODE* newNode)
{ newNode->next = head;
return newNode;
}
Singly-linked unordered list
Deleting a node:
• List head pointer (LH) is given,
• The key value of the item (key) is given (assume 13)
[Figure: 22 → 13 → 26 → 9; node 22 is modified, node 13 is deleted]
Singly-linked unordered list
For deletions, we need to keep two pointers, pointing to the modified and deleted items.
Special case: when deleting the first item, the listhead pointer LH must be updated.
We will assume that an item with the given key always exists in the list.
Singly-linked unordered list
C code for deleting a node whose data value is given:
• Let the caller update the listhead pointer
Caller code: LH = DeleteNode(LH, key);
Where:
LNODE * DeleteNode(LNODE * head, int key)
{
    LNODE * node = head;
    LNODE * prev = NULL;
    while (node->val != key)
    { prev = node;
      node = node->next;
    }
    if (!prev)                      /* deleting the first node? */
        head = node->next;
    else
        prev->next = node->next;
    free(node);
    return head;                    /* return listhead */
}
Singly-linked unordered list
Search for a value:
• List head pointer (LH) is given,
• The key value (key) is given
[Figure: 22 → 13 → 26 → 9]
Singly-linked unordered list
Caller code:
if(Search(LH,12)) /*Search for an item with value 12 */
... /* Value found */
else
... /* Value not found */
Where:
int Search(LNODE * node, int key)
{
while (node)
if (node->val == key)
return 1; /* Value found */
else
node = node->next;
return 0; /* Value not found */
}
Sorted lists
Keep the items on the list in sorted order, based on the data value in each node:
9 → 13 → 14 → 22
Advantages:
• already sorted, no need for a sort operation
• operations such as delete, find, etc. need not search to the end of the list if the item is not in the list
Disadvantages:
• the insert operation must search for the right place to add the element (slower than simply adding at the beginning)
Singly-linked ordered list
Adding a node to an ordered list:
[Figure: the LH list before and after insertion; one existing node is modified, the new node is added]
Circular singly-linked list
• Last node references the first node
• Every node has a successor
• No node in a circular linked list contains NULL
Singly-linked list with dummy head node
[Figure: a dummy head node followed by the list, with prev and node pointers shown]
Doubly-linked list
A doubly linked node for int values defined in C:
typedef struct LNODE
{
    struct LNODE *llink;   /* also called prev */
    int val;               /* data             */
    struct LNODE *rlink;   /* also called next */
} LNODE;
[Figure: 12 ⇄ 15 ⇄ 22 ⇄ 28]
Doubly-linked ordered list
Adding a node to an ordered list:
[Figure: LH → 12 ⇄ 15 ⇄ 22 ⇄ 28; the new node 14 is linked in between 12 and 15]
Doubly-linked ordered list
Adding a node (newnode, with value val) to an ordered list (assuming LH is global):
node = LH                          (initialize node pointer)
WHILE (node != NULL)
{
    IF (node->val > val)           (find the location to insert)
    {   newnode->rlink = node
        newnode->llink = node->llink
        node->llink = newnode
        IF (newnode->llink == NULL)        (adding as first?)
            LH = newnode                   (update listhead)
        ELSE
            (newnode->llink)->rlink = newnode
        break
    }
    ELSE node = node->rlink        (keep trying)
}
Doubly-linked ordered list
Deleting a node:
• List head pointer (LH) is given,
• The key value of the item (val) is given (assume 15)
[Figure: 12 ⇄ 15 ⇄ 22 ⇄ 28; nodes 12 and 22 are modified, node 15 is deleted]
Implementing lists using arrays
• Arrays have a fixed number of elements (check for overflow).
• Pointers now become array indices (of type integer).
• Since C arrays start with index 0, we will assume -1 corresponds to the NULL pointer.
• The last used array element needs to be maintained.
• Here is the NODE structure for use with arrays in C:
struct NODE
{
    int val;
    int next;
};
Implementing lists using arrays (4)
C code for adding a node to the beginning of the list,
assuming node[] and LH are defined globally.
Caller code:
AddNode(newNode);
Where:
void AddNode(int newVal)
{ int index;
index = FindEmpty( );
if (index == -1)
{ /* No more space on array */ }
else
{ node[index].val = newVal;
node[index].next = LH;
LH = index;
}
}
Implementing lists using arrays (5)
C code for deleting a node, assuming node[] and LH are defined globally and val always exists in the list.
Caller code: DeleteNode(val);
Where:
void DeleteNode(int delVal)
{ int index = LH;
  int prev = -1;
  while (!(index < 0))
    if (node[index].val == delVal)
      break;
    else
    { prev = index;
      index = node[index].next;
    }
  if (prev < 0)                       /* deleting the first node? */
    LH = node[index].next;
  else
    node[prev].next = node[index].next;
  node[index].next = -2;              /* mark as empty */
}
Implementing lists using arrays (6)
C code for searching a value in an unordered list,
assuming node[] and LH are defined globally.
Caller code: if(SearchVal(val)) /*Search for an item with value val */
... /* Value found */
else
... /* Value not found */
Lists vs. Arrays - A comparison
Space (storage) considerations
• A linked list requires extra space for the pointers in its nodes.
• An array requires the maximum number of elements to be known in advance. If that maximum is not reached, space is wasted at the end of the array.
Time considerations
• Operations on a linked list require more lines of explicit code than those on an array. However, addressing an array element uses more implicit (compiler-generated) code.
• Arrays are quicker at finding and altering 'in the middle'.
• Linked lists are quicker at insertions and removals 'at the beginning/middle'.
String (as an elementary data type)
String representation
A C string is an array of characters terminated by a '\0' byte.
Null string representation
char * str;
An empty (null) string is NOT represented by the null pointer (str = 0); it is a buffer whose first byte is '\0' (00000000).
String buffer
• When processing strings, strings are allowed to grow/shrink
in size (create, copy, append).
• Allocating one (maximum size) array for each string is very
inefficient.
• Compilers use a contiguous memory block called string
buffer (or string space) to keep literal strings which are
constant (format strings, initialization strings etc.)
• Some applications define a string buffer which is a memory
area that contains all strings side by side, and can be
manipulated under program control.
A N K A R A \0 U N I V E R S I T Y \0 ? ? ? ?
String buffer
If a string grows in size, it may need to be moved to a different
location in string buffer.
A N K A R A \0 U N I V E R S I T Y \0 ? ? ? ?
char * p char * q
After strcat(p,q):
A N K A R A U N I V E R S I T Y \0 \0 ? ? ? ?
char * p char * q
String buffer
Empty locations in the string buffer may need to be compacted from time to time.
[Figure: the string buffer before and after compaction; the '\0'-terminated strings are moved side by side and the free space is collected at the end]
BM267 - Introduction to Data Structures
[Figure: a pushdown stack with TOP and BOTTOM marked]
Pushdown stacks in action
Web browser:
• Stores the addresses of recently visited sites on a stack.
• Each time a user visits a new site, the address of the site is
pushed into the stack of addresses.
• Using the 'back' button the user can pop back to previously
visited sites!
Text editors:
• Powerful text editors keep text changes in a stack.
• The user can use the undo mechanism to cancel recent editing
operations
Pushdown stack operations
The fundamental operations on a stack are "push" and "pop".
• push: adds a new element on top of the stack
• pop: removes an element from the top of the stack
It is an error to pop an element from an empty stack.
It is also an error to push an element onto a full stack.
Other operations:
• isEmpty: checks if the stack is empty
• isFull: checks if the stack is full (if there is an implementation-dependent limit)
Pushdown stack ADT implementation
A stack can be implemented easily with an array of objects.
• The maximum size of the stack (array) must be estimated when the array is declared.
• Space is wasted if we use fewer elements.
• We cannot "push" more elements than the array can hold (overflow).
Pushdown stack ADT implementation
[Figure: an N-element array used as a stack; Top moves between 0 and N-1 as items are pushed and popped]
Pushdown stack ADT implementation
void STACKinit(int);
• Initializes an array of Items of the specified size.
• Item (structure) is known by both the client and the implementation.
• Must have a pointer (or array index), Top, that points to the next available (or last filled) slot on the stack.
Pushdown stack ADT implementation
Convention: Top == 0 indicates an empty stack (Top is the index of the next available slot).
Pushdown stack ADT implementation
void push(Item); Insert a new item at the top of the stack.
Precondition: The stack is not full.
Postcondition: The stack has a new item at the top.
Item pop(); Remove the item from the top of the stack.
Precondition: The stack is not empty.
Postcondition: Either the stack is empty or the topmost item is the one from a previous push.
int STACKempty(); Returns a logical value depending on the number of elements in the stack.
Precondition: The stack has N elements, 0 ≤ N ≤ Max.
Postcondition: The stack has N elements.
Pushdown stack ADT implementation
[Figure: stack contents after push(1), push(2), push(3); items pile up with the most recent on top, and Top advances with each push]
Example: Using a Stack to compute a Hex Number
431 / 16 = 26, remainder 15 → push('F')
26 / 16 = 1, remainder 10 → push('A')
1 / 16 = 0, remainder 1 → push('1')
Popping the stack now yields '1', 'A', 'F' in order: 431 = 1AF in hex.
Example: Array Implementation of a Stack
int *s;
int Top;
void STACKinit(int maxN){
s = (int *) malloc( maxN * sizeof(int) );
Top = 0; }
int STACKempty(){
return Top == 0; }
void STACKpush(int item){
s[Top++] = item; }
int STACKpop(){
return s[--Top]; }
Example: Postfix Calculation
• Suppose that we need to find the value of a simple arithmetic expression involving multiplication and addition of integers, such as
5 * ( ( ( 9 + 8 ) * ( 4 * 6 ) ) + 7 )
• The calculation saves intermediate results: for example, once we calculate ( 9 + 8 ), we have to save the result.
• A pushdown stack is the ideal mechanism for saving intermediate results in such a calculation.
• We can convert the arithmetic expression into postfix representation. In postfix representation each operator appears after its two operands:
( 5 + 9 ) → 5 9 +
Example: Postfix Calculation
5 9 8 + 4 6 * * 7 + *
5 ( 9 + 8 ) 4 6 * * 7 + *
5 17 4 6 * * 7 + *
5 17 ( 4 * 6 ) * 7 + *
5 17 24 * 7 + *
5 ( 17 * 24 ) 7 + *
5 408 7 + *
5 ( 408 + 7 ) *
5 415 *
( 5 * 415 )
2075
Example: Postfix Calculation
[Figure: stack contents after each token of 5 9 8 + 4 6 * * 7 + * is processed; operands are pushed, each operator pops two values and pushes the result, and the final pop yields 2075]
Example: Array Implementation of a Stack
/* Postfix-expression evaluation */
int main(int argc, char *argv[]){
char a[] = "5 11 * 5 + 2 *";
int i, array_size = strlen(a);
STACKinit( array_size );
for (i = 0; i < array_size; i++) {
if (a[i] == '+')
STACKpush( STACKpop()+ STACKpop() );
if (a[i] == '*')
STACKpush( STACKpop() * STACKpop() );
if ( (a[i] >= '0') && ( a[i] <= '9') )
STACKpush(0);
while ((a[i] >= '0') && (a[i] <= '9'))
STACKpush(10*STACKpop() + (a[i++]-'0')); }
printf("%d \n", STACKpop());
return 0;}
Queues
[Figure: a queue holding 1 7 3 9; items enter at the rear and exit from the front]
Queue operations
• put: adds a new element at the rear of the queue; increases the number of elements in the queue by 1.
• get: removes an element from the front of the queue; decreases the number of elements in the queue by 1.
It is an error to get an element from an empty queue.
It is also an error to put an element into a full queue.
Other operations:
• isEmpty: checks if the queue is empty
• isFull: checks if the queue is full (if there is an implementation-dependent limit)
Queue implementation
A queue can be implemented easily with an array of objects.
• The maximum size of the queue (array) must be estimated when the array is declared.
• Space is wasted if we use fewer elements.
• We cannot "put" more elements than the array can hold (overflow).
• Must have two pointers (or array indices) that point to the rear and the front of the queue.
[Figure: arrays holding queue contents, with the front and rear indices marked]
[Trace: an array-based queue; put advances rear and get advances front, so without wraparound the used region drifts toward the end of the array. In the circular version the indices wrap around (e.g. after put(10): front = 2, rear = 1), so the cells at the start of the array can be reused]
rear → 1 → 12 → 14 → 22 → front
[Diagram: the same queue stored as a linked list laid out over an array]
Note that the linked list shown above is an ADT: it may actually
be implemented using an array.
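The circular layout in Trace 2 above can be sketched in C; the names (Queue, put, get) and the capacity MAXQ are illustrative, not taken from the course code.

```c
#define MAXQ 5

typedef struct {
    int items[MAXQ];
    int front;   /* index of the first element */
    int rear;    /* index of the next free slot */
    int count;   /* number of elements currently stored */
} Queue;

void init(Queue *q) { q->front = q->rear = q->count = 0; }
int isEmpty(const Queue *q) { return q->count == 0; }
int isFull(const Queue *q)  { return q->count == MAXQ; }

int put(Queue *q, int x) {              /* returns 0 on overflow */
    if (isFull(q)) return 0;
    q->items[q->rear] = x;
    q->rear = (q->rear + 1) % MAXQ;     /* wrap around */
    q->count++;
    return 1;
}

int get(Queue *q, int *x) {             /* returns 0 on underflow */
    if (isEmpty(q)) return 0;
    *x = q->items[q->front];
    q->front = (q->front + 1) % MAXQ;   /* wrap around */
    q->count--;
    return 1;
}
```

The modulo arithmetic is what lets rear and front "wrap around" to index 0, so no space is lost at the start of the array.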
5. Recursion
BM267 1
Objectives
Learn about
• Recursion
• Divide and conquer
• General and binary tree structures
• Implementation of trees
• Mathematical properties of trees.
• Tree operations, tree traversal algorithms
BM267 2
Recursion
• A recursive definition is one which uses the word or concept
being defined in the definition itself.
• For example, a list such as 24, 88, 40 can be defined on paper as
A LIST is a: number
     or a: number comma LIST
• The list 24 , 30 , 40 parses as
number , LIST  →  24 , ( 30 , 40 )  →  24 , ( 30 , ( 40 ) )
where the final 40 is the terminating "number" case.
BM267 4
Recursion
• All recursive definitions have to have a terminating case
A LIST is:
either number, followed by a comma, followed by another LIST
or number
BM267 6
Recursive functions
BM267 7
Recursive functions
int puzzle(int N)
{
    if (N == 1)
        return 1;
    if (N % 2 == 0)
        return puzzle(N/2);
    else
        return puzzle(3*N + 1);
}
Here, we cannot use induction to prove that this program
terminates, because not every recursive call uses an argument
smaller than the one given.
BM267 8
Recursive functions
Linked list node count:
int count(link x)
{
if (x == NULL)
return 0;
return 1 + count(x->next);
}
BM267 9
Recursive functions
Factorial function
N!, for any positive integer N, is defined to be the product
of all integers between 1 and N inclusive
This definition can be expressed recursively as:
0! = 1
1! = 1
N! = N * (N-1)!
The concept of the factorial is defined in terms of another
factorial
Eventually, the base case of 1! is reached.
BM267 10
Recursive functions
• Recursion and looping have similar meanings.
• A loop's termination condition has the same role as a recursive
base case.
• A loop's control variable serves the same role as the argument
of the recursive calls.
Iterative (termination condition):
sum = 0;
i = 1;
while (i <= 10)
{
    sum += i;
    i++;
}
Recursive (base case and recursive case):
int Factorial(int n)
{
    if (n == 0)        /* base case */
        return 1;
    else               /* n > 0, recursive case */
        return n * Factorial(n-1);
}
BM267 11
Recursive functions
Function call Factorial(5) proceeds as below:
5! = 5 * 4!
4! = 4 * 3!
3! = 3 * 2!
2! = 2 * 1!
1! = 1
The results propagate back up: 2! = 2, 3! = 6, 4! = 24, 5! = 120.
BM267 12
Recursive functions
Σ(i=1..N) i  =  N + Σ(i=1..N-1) i  =  N + (N-1) + Σ(i=1..N-2) i
BM267 13
Recursive functions
int sum(int n) {
    if (n > 1)
        return (n + sum(n - 1));
    else
        return 1;
}
Call trace:
main calls sum(3)        result = 6
  sum(3) calls sum(2)    result = 3
    sum(2) calls sum(1)  result = 1
BM267 14
Recursive functions
• Fibonacci numbers
– 0, 1, 1, 2, 3, 5, 8, ...
– Each number is the sum of the previous two:
fib( n ) = fib( n - 1 ) + fib( n - 2 )
– Base cases: fib(0) = 0 and fib(1) = 1
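A direct C transcription of this recursive definition (a sketch; note that this version takes exponential time, so it is only practical for small n):

```c
int fib(int n) {
    if (n <= 1)           /* base cases: fib(0) = 0, fib(1) = 1 */
        return n;
    return fib(n - 1) + fib(n - 2);   /* general case */
}
```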
Example: finding the maximum of 1 5 2 6 9 3 4 8 by divide and
conquer — split the array in half, recursively find the maximum of
each half, and return the larger:
1 5 2 6 | 9 3 4 8
  5   6 |   9   8
    6   |     9
        9  = max
The number of comparisons satisfies T(n) = n - 1.
T(n) = 2^k T(n/2^k) + Σ(i=0..k-1) 2^i
and we know that  Σ(i=0..k-1) x^i = (x^k - 1) / (x - 1),  so
T(n) = 2^k T(n/2^k) + Σ(i=0..k-1) 2^i
     = 2^k T(n/2^k) + (2^k - 1) / (2 - 1)
     = 2^k T(n/2^k) + 2^k - 1
5. Trees
BM267 1
Objectives
Tree Structures
Trees
[Image: a real tree]
BM267 5
Trees
Root
              14
            /    \
          17      11
         /  \       \
        9    53      4
       /            / \
     13           19   7
BM267 6
Trees
An expression tree:
              –
            /   \
           +     *
          / \   / \
         9  53 4   7
= ( 9 + 53 ) – ( 4 * 7 )
BM267 7
Rooted Trees
• There is a unique node called root node. Node “14” is the root
of the tree.
• The parent of a node is the node linked above it. Every non-
root node has a unique parent. Node 17 is the parent of 9 and
53.
• The nodes whose parent is node n are n’s children. The
children of Node 17 are 9 and 53.
• Nodes without children are leaves. Nodes 13, 53, 19, and 7 are
leaves.
• Two nodes are siblings if they have the same parent. 9 and 53 are
siblings of each other.
BM267 8
Rooted Trees
Root
Root’s
children
Leaves
BM267 9
Rooted Trees
• An empty tree has no nodes
• The descendants of a node are its children and the
descendents of its children
• The ancestors of a node are its parent (if any) and the
ancestors of its parent
• An ordered tree is one in which the order of the
children is important; an unordered tree is one in
which the ordering of the children is not important.
• The branching factor of a node is the number of
children it has.
BM267 10
Rooted Trees
[Diagram: a tree with each node labeled by its branching factor —
several nodes with branching factor 2, two with branching factor 3]
BM267 11
Rooted Trees
The height of a node n is the number of
edges on the longest path from n to a
descendant leaf.
[Diagram: a tree with each node labeled by its height — root 3,
internal nodes 1, 1, 2, 1, leaves 0]
BM267 14
Binary Trees
Level 0 : at most 2^0 nodes
Level 1 : at most 2^1 nodes
Level 2 : at most 2^2 nodes
Level 3 : at most 2^3 nodes
BM267 16
Binary Trees
A complete binary tree can be stored in an array:
parent(i) = (i - 1) / 2    (integer division)
left(i)   = 2i + 1
right(i)  = 2i + 2
               0:a
            /       \
         1:l         2:g
        /    \      /    \
      3:o    4:r  5:i    6:t
     /   \    /
   7:h  8:m 9:s
index:    0 1 2 3 4 5 6 7 8 9
content:  a l g o r i t h m s
BM267 19
Binary Trees
We can also represent incomplete binary trees in an array,
leaving the slots of missing nodes unused:
         0:A
        /    \
     (1)      2:B
             /    \
          5:C     (6)
(indices 1, 3, 4, and 6 are unused)
index:    0 1 2 3 4 5 6
content:  A . B . . C .
BM267 20
Binary Trees
Linked representation of binary trees:
struct TreeNode {
    char data;
    struct TreeNode *left;
    struct TreeNode *right;
};
[Diagram: root pointer → node A, with subtrees containing
B, G, E, K, M]
BM267 21
Binary Trees
Common Binary Tree Operations
– Determine its height
– Determine the number of elements in it
– Display the binary tree on the screen.
Returns the height of the tree:
int height(link h)
{
    int u, v;
    if (h == NULL)
        return -1;
    u = height(h->left);
    v = height(h->right);
    if (u > v) return u + 1;
    else       return v + 1;
}
BM267 22
Binary Trees
Returns the number of elements in the tree:
int count(link h)
{
    if (h == NULL)
        return 0;
    return count(h->left) + count(h->right) + 1;
}
BM267 23
Tree Traversals
BM267 24
Depth-first Traversals (binary trees)
              h
           /     \
          j       f
         / \     / \
        d   b   a   e
       / \   \
      i   c   g
Pre-order visit sequence: h j d i c b g f a e
BM267 26
In-order Traversal
In-order visit sequence for the same tree: i d c j g b h a f e
BM267 27
Post-order Traversal
Post-order visit sequence for the same tree: i c d g b j a e f h
BM267 28
Breadth-first Traversal
Breadth-first (level-by-level) visit sequence for the same tree:
h j f d b a e i c g
BM267 29
Tree Traversal - Preorder
Prints the nodes' data in preorder:
void traverse( LINK h )
{
    if (h)
    {
        printf("%c", h->data);   /* visit the node */
        traverse(h->left);
        traverse(h->right);
    }
}
BM267 30
Tree Traversal - Inorder
Prints the nodes' data in inorder:
void traverse( LINK h )
{
    if (h)
    {
        traverse(h->left);
        printf("%c", h->data);   /* visit the node */
        traverse(h->right);
    }
}
BM267 31
Tree Traversal - Postorder
Prints the nodes' data in postorder:
void traverse( LINK h )
{
    if (h)
    {
        traverse(h->left);
        traverse(h->right);
        printf("%c", h->data);   /* visit the node */
    }
}
BM267 32
Implementing (general) rooted trees
BM267 33
Example: Variable-length codes
BM267 34
Example: Variable-length codes
Requirements: For each possible coded sequence, the
sequence must be
• uniquely decodeable.
• instantaneously decodeable (without the need for
further computations or table look-ups).
BM267 35
Example: Variable-length codes
Let our alphabet consist of 5 symbols, A, B, C, D, E.
Symbol Freq.(%)
A 40
B 25 Consider the code for
C 15 ABCDE.
D 15
E 5
BM267 36
Example: Variable-length codes
Assume the following
codes were chosen:
Symbol Code Consider the coding for ABCDE.
A 1
The code will be: 1000111011
B 00
C 01 Can you decode it?
D 11 1 . 00 . 01 . 11 ? 011
E 011
Is 011 = 011 (E) or 01.1 (CA) ?
BM267 37
Example: Variable-length codes
Assume the following codes were chosen:
Symbol   Code
A        0
B        01
C        011
D        0111
E        111
Consider the coding for ABCDE.
The code will be: 0 01 011 0111 111 = 0010110111111
Can you decode it?
0 . 01 ? 0110111111
Is 011 = 01 (B) . 1 or 011 (C) ?
Draw the code tree. Start from the root and follow
the edges until a code word is found.
Repeat until decoding is completed.
BM267 39
Huffman Coding
Initial state:  E(5)  D(15)  C(15)  B(25)  A(40)
Step 1: merge the two smallest, E(5) and D(15), into an internal
        node of weight 20.     Remaining: C(15) (20) B(25) A(40)
Step 2: merge C(15) and (20) into (35).   Remaining: B(25) (35) A(40)
Step 3: merge B(25) and (35) into (60).   Remaining: A(40) (60)
Step 4: merge A(40) and (60) into the root (100).
Labeling each left edge 0 and each right edge 1 gives the code
tree; reading edge labels from the root yields each symbol's code.
BM267 43
Huffman Coding
Symbol   Code
A        0
B        10
C        110
D        1111
E        1110
Consider the coding for ABCDE.
The code will be: 0 10 110 1111 1110 = 01011011111110
Can you decode it?
0 . 10 . 110 . 1111 . 1110
BM267 44
BM267 - Introduction to Data
Structures
6. Elementary Sorting
Methods
BM267 1
Objectives
Learn about
• Elementary sorting algorithms
• Selection sort
• Insertion sort
• Bubble sort
• Analysis of each sorting method
BM267 2
Sorting
77 42 35 12 101 5
1 2 3 4 5 6
5 12 35 42 77 101
BM267 3
Choosing a sorting algorithm
BM267 5
Element stability
• A sorting method is said to be stable if it preserves the relative
order of items with duplicated keys in the file. Items with
identical keys should appear in the same order as in the
original input.
BM267 6
Element stability
Adams A Adams A Adams A
Black B Smith A Smith A
Brown D Washington B Black B
Jackson B Jackson B Jackson B
Jones D Black B Washington B
Smith A White C White C
Thompson D Wilson C Wilson C
Washington B Thompson D Brown D
White C Brown D Jones D
Wilson C Jones D Thompson D
BM267 7
Indirect sorting
BM267 8
Selection sort
• Selection sort works as follows:
• Find the smallest element in the array, and exchange it
with the element in the first position.
• Then, find second smallest element and exchange it with
the element in the second position.
• Continue in this way until the array is sorted.
i
25 50 10 95 75 30 70 55 60 80
10 50 25 95 75 30 70 55 60 80
BM267 9
Selection sort
i
10 50 25 95 75 30 70 55 60 80
10 25 50 95 75 30 70 55 60 80
10 25 30 95 75 50 70 55 60 80
10 25 30 50 75 95 70 55 60 80
10 25 30 50 55 95 70 75 60 80
10 25 30 50 55 60 70 75 95 80
BM267 10
Selection sort
10 25 30 50 55 60 70 75 95 80
10 25 30 50 55 60 70 75 95 80
10 25 30 50 55 60 70 75 80 95
BM267 11
Selection sort
BM267 12
Selection sort- Is selection sort stable?
25 10 75 25 95 15
10 25 75 25 95 15
10 15 75 25 95 25
10 15 25 75 95 25
10 15 25 25 95 75
10 15 25 25 75 95
BM267 13
Selection sort - analysis
Iteration 1:
• Find smallest value in a list of n values: n-1 comparisons
• Exchange values and move marker
Iteration 2:
• Find smallest value in a list of n-1 numbers: n-2 comparisons
• Exchange values and move marker
…
Iteration n-2:
• Find smallest value in a list of 2 numbers: 1 comparison
• Exchange values and move marker
Total: (n-1) + (n-2) + …. + 2 + 1 = n(n-1)/2
BM267 14
Selection sort - analysis
Space efficiency:
• No extra space used (except for a few
variables)
Time efficiency:
• The best-case and worst-case are same. All
input sequences need same number of
comparisons.
• the amount of work is the same:
T(n) = n(n-1)/2
BM267 15
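The steps above can be sketched in C; this is a minimal illustrative version, and the function name is an assumption.

```c
void selection_sort(int a[], int n) {
    int i, j, min, t;
    for (i = 0; i < n - 1; i++) {
        min = i;                        /* position of smallest in a[i..n-1] */
        for (j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        /* exchange the smallest with the element in position i */
        t = a[i]; a[i] = a[min]; a[min] = t;
    }
}
```

Each pass performs exactly one exchange, which is why selection sort does few data moves even though it always makes n(n-1)/2 comparisons.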
Insertion sort
• Insert the first element of the unsorted array into
already sorted portion of the array by shifting all
larger elements to the right.
BM267 16
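A minimal C sketch of this idea (the function name is illustrative): each element is inserted into the sorted prefix by shifting larger elements one slot to the right.

```c
void insertion_sort(int a[], int n) {
    int i, j, v;
    for (i = 1; i < n; i++) {
        v = a[i];                         /* element to insert */
        for (j = i; j > 0 && a[j-1] > v; j--)
            a[j] = a[j-1];                /* shift larger elements right */
        a[j] = v;                         /* drop v into the gap */
    }
}
```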
Insertion sort
i
25 50 10 95 75 30 70 55 60 80
25 50 10 95 75 30 70 55 60 80
10 25 50 95 75 30 70 55 60 80
10 25 50 95 75 30 70 55 60 80
10 25 50 75 95 30 70 55 60 80
10 25 30 50 75 95 70 55 60 80
BM267 17
Insertion sort- Worst Case
i
90 80 70 60 50 40 30 20 10 # of Comparison = 1
i
80 90 70 60 50 40 30 20 10 # of Comparison = 2
i
70 80 90 60 50 40 30 20 10 # of Comparison = 3
i
60 70 80 90 50 40 30 20 10 # of Comparison = 4
i
50 60 70 80 90 40 30 20 10 # of Comparison = 5
BM267 18
Insertion sort- Worst Case
i
40 50 60 70 80 90 30 20 10 # of Comparison = 6
i
30 40 50 60 70 80 90 20 10 # of Comparison = 7
i
20 30 40 50 60 70 80 90 10 # of Comparison = 8
10 20 30 40 50 60 70 80 90
BM267 19
Insertion sort- Worst Case
• For size n, total # of comparisons:
T(n)worst = (n-1) + (n-2) + (n-3) + … + 2 + 1 = n(n-1)/2
T(n)avg ≈ n^2 / 4
BM267 20
Insertion sort
BM267 21
Comparing insertion sort / selection sort
Selection sort:
• the amount of work does not depend on the type of input
• puts each element directly into its final position
Insertion sort:
• runs faster for already sorted arrays
• after an insertion, every element may still be moved later
BM267 22
Bubble sort
• Bubble sort compares adjacent elements of the list and
exchange them if they are out of order.
• After the first pass, we put the largest element to the
last position in the list.
• The next pass puts the second largest element in its
position( just before the last position).
First pass 25 50 10 95 75 30 70 55 60 80
25 10 50 95 75 30 55 70 60 80
BM267 23
Bubble sort
25 10 50 75 95 30 55 70 60 80
25 10 50 75 30 95 55 70 60 80
25 10 50 75 30 55 95 70 60 80
25 10 50 75 30 55 70 95 60 80
25 10 50 75 30 55 70 60 95 80
BM267 24
Bubble sort
25 10 50 75 30 55 70 60 80 95
BM267 25
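The passes above can be sketched in C; this is a minimal illustrative version, and the function name is an assumption.

```c
void bubble_sort(int a[], int n) {
    int i, j, t;
    for (i = n - 1; i > 0; i--)        /* after each pass a[i] is final */
        for (j = 0; j < i; j++)
            if (a[j] > a[j+1]) {       /* exchange out-of-order neighbors */
                t = a[j]; a[j] = a[j+1]; a[j+1] = t;
            }
}
```

A common refinement is to stop early when a full pass makes no exchange, which makes the sort linear on already sorted input.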
BM267 - Introduction to Data
Structures
7. Quicksort
Bm 267 1
Quicksort
Bm 267 2
Quicksort - Example 1
2 8 7 1 3 5 4
p ... r
Bm 267 3
Quicksort - Example 1
p ... r
untested
i,j i j
2 8 7 1 3 5 4 2 8 7 1 3 5 4
p ... r p ... r
i j i j
2 8 7 1 3 5 4 2 1 7 8 3 5 4
p ... r p r
i j i j
2 8 7 1 3 5 4 2 1 3 8 7 5 4
p ... r p ... r
Bm 267 5
Quicksort - Example 1
i j
2 1 3 8 7 5 4
p .. r
(exchange element i+1
with the pivot)
i
2 1 3 4 7 5 8
p .. r
Quicksort (A[p..r])
If the array has 0 or 1 elements,
then return. // the array is sorted
else do:
Pick an element in the array to use as the pivot.
Split the remaining elements into two disjoint groups:
– "Smaller": elements not greater than the pivot, A[p..m-1]
– "Larger": elements greater than the pivot, A[m+1..r]
Sort the two groups recursively.
i j
75 70 65 60 98 78 99 93 55 59 81 88
75 70 65 60 59 78 99 93 55 98 81 88
75 70 65 60 59 55 99 93 78 98 81 88
When done, exchange the rightmost element in group
"Smaller" with the pivot
55 70 65 60 59 75 99 93 78 98 81 88
75 is now placed appropriately.
Need to sort sublists on either side of 75.
Bm 267 9
Quicksort
int partition(Item a[], int l, int r);

void quicksort(Item a[], int l, int r)
{
    int m;
    if (r <= l) return;
    m = partition(a, l, r);
    quicksort(a, l, m-1);
    quicksort(a, m+1, r);
}

int partition(Item a[], int l, int r)
{
    int i = l-1, j = r;
    Item v = a[r];
    for (;;) {
        while (less(a[++i], v)) ;
        while (less(v, a[--j]))
            if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
    }
    exch(a[i], a[r]);
    return i;
}
Bm 267 10
Quicksort - Analysis
Best Case
– If the pivot results in sub arrays of approximately
the same size.
– T(n) = 2T(n/2) + n – 1 ≈ n log2 n
Bm 267 11
Quicksort - Analysis
O(n2) worst-case
• List already ordered (either way)
• Then the pivot element is the largest or smallest
element: one of the sublists is almost always empty.
• Partitioning always divides the size n array into
these three parts:
• A length one part, containing the pivot itself
• A length zero part, and
• A length n-1 part, containing everything else
Bm 267 13
Quicksort - Possible Improvements
• Often the list to be sorted is already partially ordered.
• An arbitrary pivot gives a poor partition for nearly sorted lists
• In these cases, virtually all the elements either go into the
group "Smaller" or to the "Larger", all through the recursive
calls.
• Quicksort takes quadratic time to do essentially nothing at all.
• There are better methods for selecting the pivot, such as the
median-of-three rule:
Select the median of the first, middle, and last elements
in each sublist as the pivot.
• Median-of-three rule will select a pivot closer to the middle of
the sublist than will the “first-element” rule.
Bm 267 17
Quicksort - Possible Improvements
#define M 10
void quicksort(Item a[], int l, int r)
{
    int i;
    if (r-l <= M) return;        /* leave small subfiles for insertion sort */
    exch(a[(l+r)/2], a[r-1]);    /* median-of-three: sort a[l], a[r-1], a[r] */
    compexch(a[l], a[r-1]);
    compexch(a[l], a[r]);
    compexch(a[r-1], a[r]);
    i = partition(a, l+1, r-1);  /* a[r-1] is the median-of-three pivot */
    quicksort(a, l, i-1);
    quicksort(a, i+1, r);
}
Bm 267 18
Quicksort - Possible Improvements
void sort(Item a[], int l, int r)
{
quicksort(a, l, r);
insertion(a, l, r);
}
Bm 267 19
BM267 - Introduction to Data
Structures
8. Mergesort
BLM 267 1
Mergesort
• Also a divide and conquer algorithm.
• Let’s see how the mergesort works with a
small array.
The first step divides the array into two equally
sized groups.
9 12 31 25 5 8 20 2 3 6
BLM 267 2
Mergesort
• Assume that we sort these arrays somehow
with recursive calls.
5 9 12 25 31 2 3 6 8 20
2 3 5 6 8 9 12 20 25 31
BLM 267 3
Mergesort
• As shown, the mergesort algorithm divides the
array at its midpoint and sorts the two half-arrays by
recursive calls.
• The stopping case for mergesort is when the array
to be sorted consists of one element.
• If there is more than one element, mergesort
carries out these steps:
Calculate the boundaries of the two pieces of the array:
• The last element of the first half is m = (r + l) / 2
• The first element of the second half is m+1
Use recursive calls to sort the two halves: (l, m) and
(m+1, r).
Finally, call the merge function to combine the two
sorted halves.
BLM 267 4
Mergesort
void mergesort(Item a[], int l, int r){
    int m = ( l+r )/2;
    if (r <= l) return;
    mergesort(a, l, m);
    mergesort(a, m+1, r);
    merge(a, l, m, r);
}
BLM 267 5
Merge Function
• Let us suppose that we have two ordered arrays
a[0]…a[N-1] and b[0]…b[M-1] of
integers that we wish to merge into a third array
c[0]…c[N+M-1].
• The obvious strategy is to repeatedly choose the
smallest remaining element from a and b.
a 5 9 12 25 31 b 2 3 6 8 20
c 2 ? ? ? ? ? ? ? ? ?
BLM 267 6
Merge Function
a 5 9 12 25 31 b 2 3 6 8 20
c 2 3 ? ? ? ? ? ? ? ?
a 5 9 12 25 31 b 2 3 6 8 20
c 2 ? ? ? ? ? ? ? ? ?
BLM 267 7
Merge Function
a 5 9 12 25 31 b 2 3 6 8 20
c 2 3 ? ? ? ? ? ? ? ?
a 5 9 12 25 31 b 2 3 6 8 20
c 2 3 5 ? ? ? ? ? ? ?
BLM 267 8
Merge Function
a 5 9 12 25 31 b 2 3 6 8 20
2 3 5 6 ? ? ? ? ? ?
Finally
a 5 9 12 25 31 b 2 3 6 8 20
2 3 5 6 8 9 12 20 25 31
BLM 267 9
Merge Function
void mergeAB(Item c[], Item a[], int N, Item b[], int M)
{
    int i, j, k;
    for (i = 0, j = 0, k = 0; k < N+M; k++)
    {
        if (i == N) { c[k] = b[j++]; continue; }
        if (j == M) { c[k] = a[i++]; continue; }
        c[k] = (a[i] < b[j]) ? a[i++] : b[j++];
    }
}
BLM 267 10
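The merge(a, l, m, r) called by mergesort can be sketched with the same idea, merging the two halves of one array through an auxiliary array; the fixed MAXN bound is an assumption for illustration.

```c
typedef int Item;

#define MAXN 1000
static Item aux[MAXN];          /* auxiliary array, assumed large enough */

/* Merges the sorted halves a[l..m] and a[m+1..r] back into a[l..r]. */
void merge(Item a[], int l, int m, int r) {
    int i = l, j = m + 1, k;
    for (k = l; k <= r; k++)              /* copy the whole range out */
        aux[k] = a[k];
    for (k = l; k <= r; k++) {            /* take the smaller front element */
        if (i > m)                a[k] = aux[j++];  /* left half used up */
        else if (j > r)           a[k] = aux[i++];  /* right half used up */
        else if (aux[j] < aux[i]) a[k] = aux[j++];
        else                      a[k] = aux[i++];  /* ties favor the left:
                                                       keeps the merge stable */
    }
}
```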
Mergesort
8 3 2 9 7 1 5 4
8 3 2 9 | 7 1 5 4
8 3 | 2 9 | 7 1 | 5 4
8 | 3 | 2 | 9 | 7 | 1 | 5 | 4
3 8 | 2 9 | 1 7 | 4 5
2 3 8 9 | 1 4 5 7
1 2 3 4 5 7 8 9
BLM 267 11
Analysis of Mergesort
• Mergesort requires about n log2 n comparisons to
sort n elements.
– T(n) = 2T(n/2) + n-1 (for simplicity assume n = 2^k)
• Mergesort uses extra space proportional to n.
• Mergesort is stable, if the underlying merge is stable.
BLM 267 12
BM267 - Introduction to Data
Structures
9. Heapsort
BLM 267 1
(Binary) Heap Structure
BLM 267 2
(Binary) Heap Structure
Structural Properties (1-based array):
• The root of the tree is located
in the first array element A[1].
• The left subtree of node A[i]
is located in A[2i].
• The right subtree of node A[i]
is located in A[2i+1].
BLM 267 4
(Binary) Heap Structure
The tree nodes are located in the array in a top-down, left-to-
right traversal order.
• A[1] contains the largest element;
• Elements in A[2] .. A[N] are not sorted.
               22
            /      \
          12        14
         /  \      /  \
        9    11   13   5
       / \   /
      4   2 8
index:  1  2  3  4  5  6  7  8  9 10
value: 22 12 14  9 11 13  5  4  2  8
BLM 267 5
(Binary) Heap Structure
BLM 267 6
(Binary) Heap Structure
Some terminology:
• Height of a node: the number of edges on a downward
path to the farthest leaf node.
• Height of the tree: the height of the root node.
[Diagram: a tree with nodes labeled by height — root 2,
internal nodes 1, leaves 0]
BLM 267 7
Heap Operations
BLM 267 8
MaxHeapInsert( ) - Insert an item into the heap
Place the new item in the first free array slot, then let it float up:
exchange it with its parent as long as it is larger than the parent.
Inserting 13 into the heap 14 12 11 9 5:
14 12 11 9 5 13    (13 placed in A[6]; its parent A[3] = 11 < 13)
14 12 13 9 5 11    (13 exchanged with 11; parent A[1] = 14, done)
O(log2 N) operations
BLM 267 10
MaxHeapify( ) - Combine heaps with the parent
Given: Node i, whose children at nodes 2i and 2i+1 are
already heapified.
Operation: Let the value of A[i] float down until the subtree
rooted at A[i] becomes a heap.
[Diagram: the array A[1..11] = 4 12 11 9 2 6 7 13 8 1 5 drawn
as a tree, with the value 4 at the root about to float down]
MaxHeapify( ) - Combine heaps with the parent
[Diagram: 4 is exchanged with the larger of its children at each
step, floating down along a root-to-leaf path]
O(log N) operations.
BLM 267 13
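The float-down operation can be sketched in C on a 1-based array, following the description above; this is an illustrative version, not the course's own code.

```c
/* Restores the heap property for the subtree rooted at i,
   assuming the subtrees at 2i and 2i+1 are already heaps.
   A[1..n] is the heap; A[0] is unused. */
void max_heapify(int A[], int n, int i) {
    int t;
    for (;;) {
        int l = 2 * i, r = 2 * i + 1, largest = i;
        if (l <= n && A[l] > A[largest]) largest = l;
        if (r <= n && A[r] > A[largest]) largest = r;
        if (largest == i) break;          /* heap property holds here */
        t = A[i]; A[i] = A[largest]; A[largest] = t;
        i = largest;                      /* continue floating down */
    }
}
```

BuildMaxHeap can then be written as a loop calling max_heapify for i = n/2 down to 1.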
BuildMaxHeap( ) - Convert an array into a heap
Given: An unordered array of size N.
Operation:
• Convert each node in the tree into a heap, bottom-up.
• Nodes A[N/2+1]..A[N] are already heaps of size 1.
• Convert nodes A[N/2]..A[1] into heaps, using MaxHeapify(i).
[Diagram: the array A[1..11] = 4 12 11 9 2 6 7 13 8 1 5 drawn
as a tree]
BLM 267 14
HeapExtractMax( ) - Remove the item at the root
Given: A heap of size N,
Operation:
• Exchange root with rightmost leaf.
13
12 11
9 5 6 7
3 8 1 2
13 12 11 9 5 6 7 3 8 1 2
1 2 3 4 5 6 7 8 9 10 11
BLM 267 15
HeapExtractMax( ) - Remove the item at the root
Let the value of A[1] float down until the heap property is
established.
Proceed in the direction of the child node with the larger value.
2 12
12 11 2 11
9 5 6 7 9 5 6 7
3 8 1 13 3 8 1 13
Not in heap
any longer
BLM 267 16
HeapExtractMax( ) - Remove the item at the root
Proceed in the direction of the child node with the larger value.
12 12
9 11 9 11
2 5 6 7 8 5 6 7
3 8 1 13 3 2 1 13
O(log N) operations.
BLM 267 17
Heapsort( )
[Diagram: the max-heap 27 17 10 16 13 9 1 5 7 12 4 8 3 0
drawn as a tree]
BLM 267 18
Heapsort
Repeatedly exchange the root A[1] (the maximum) with the last
element A[i] of the heap, shrink the heap by one, and restore the
heap property with Max-Heapify(A,1):
27 17 10 16 13 9 1 5 7 12 4 8 3 0        initial max-heap
 0 17 10 16 13 9 1 5 7 12 4 8 3 | 27     exchange A[1] ↔ A[14];
                                         27 is no longer in the heap
17 16 10 7 13 9 1 5 0 12 4 8 3 | 27      Max-Heapify(A,1)
 3 16 10 7 13 9 1 5 0 12 4 8 | 17 27     exchange A[1] ↔ A[13]
16 13 10 7 12 9 1 5 0 3 4 8 | 17 27      Max-Heapify(A,1)
… and so on, until only one element remains in the heap and the
array is sorted.
BLM 267 21
Heapsort
BLM 267 23
Heapsort - Analysis
BLM 267 24
Priority Queue
• Problem:
• Maintain a dynamically changing set S so that every
element in S has a priority (key) k.
• Allow efficiently reporting the element with maximal
priority in S.
• Allow the priority of an element in S to be increased.
BLM 267 27
Priority Queue
BLM 267 28
Binary Heap as Priority Queue
• Binary heaps are binary trees that satisfy the following
heap property: the key of each node is greater than or equal
to the keys of its children.
[Diagram: example heaps containing the keys 54, 65, 27, 14,
41, 13, 3, 23, 4]
BLM 267 29
Heaps, Priority Queues
BLM 267 30
BM267 - Introduction to Data
Structures
Bm267 1
Searching
Sequential search
The algorithm simply compares successive elements
of a given array or list with a given key until either a
match is encountered (successful search) or the list is
exhausted without finding a match (unsuccessful
search).
Algorithm SequentialSearch( A[0…n-1], Key )
for ( int i = 0; i < n; i++ )
    if ( A[i] == Key )
        return 1;
return 0;
Bm267 2
Searching
• An improvement can be incorporated in sequential
search if the given list or array is sorted: searching
in such a list can be stopped as soon as an element
greater than or equal to the search key is
encountered.
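The improved search on a sorted array can be sketched as follows (a minimal illustrative version that returns the index, or -1 when the key is absent):

```c
int seq_search_sorted(const int A[], int n, int key) {
    int i;
    for (i = 0; i < n && A[i] < key; i++)
        ;                                   /* skip elements smaller than key */
    return (i < n && A[i] == key) ? i : -1; /* early stop: A[i] >= key here */
}
```

On an unsuccessful search this stops at the first element larger than the key instead of scanning the whole array.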
• Analysis of sequential search
– Average case: the sequential search compares n/2
element of the list
– Worst case: worst case occurs in two cases, either the
search is unsuccessful or the key is found at the end of
the list. Therefore in these cases sequential search has
to make n comparisons
Bm267 3
Searching
String matching
• String matching problem is searching a string of m
characters called pattern in a given string of n
characters called text. (m <= n)
• To put it more precisely, we want to find i — the
index of the leftmost character of the first matching
substring in the text — such that ti = p0, …,
ti+j = pj, …, ti+m-1 = pm-1.
t0 … ti … ti+j … ti+m-1 … tn-1 text T
p0 … pj … pm-1 pattern P
Bm267 4
Searching
• A brute-force algorithm for the string-matching problem is
quite obvious: align the pattern against the first m
characters of the text and start matching the corresponding
pairs of characters from left to right until either all m pairs
of the characters match or a mismatching pair is
encountered.
• If there is a mismatch, the pattern is shifted one position to
the right and character comparisons are resumed.
Algorithm BruteForceStringMatching( T[0…n-1], P[0…m-1] )
for ( i = 0; i <= n-m; i++ ) {
    for ( j = 0; j < m && P[j] == T[i+j]; j++ )
        ;
    if ( j == m )
        return i;
}
return –1;
Bm267 5
Searching
N O B O D Y _ N O T I C E D _ H I M
N O T
N O T
N O T
N O T
N O T
N O T
N O T
N O T
Bm267 6
Searching
• Analysis of string matching: in the worst case the
algorithm has to make m(n-m+1), i.e. O(n·m), comparisons.
Binary search
• Binary search is an efficient algorithm for searching in a
sorted array. It works by comparing a search key with the
array's middle element A[m]. If they match, the algorithm
stops; otherwise, the same operation is repeated
recursively for the first half of the array if key < A[m], and
for the second half if key > A[m].
key
index 0 1 2 3 4 5 6 7 8 9 10 11 12
value 3 14 27 31 39 42 55 70 74 81 85 93 98
iteration 1 l m r
iteration 2 l m r
iteration 3 l,m r
Bm267 8
Searching
Algorithm BinarySearch( A[0…n-1], Key )
l = 0; r = n-1;
while ( l <= r ){
    m = (l+r)/2;
    if ( Key == A[m] )
        return m;
    else if ( Key < A[m] )
        r = m-1;
    else
        l = m+1;
}
return –1;
Bm267 9
Searching
Analysis of binary search
The average-case behavior of binary search depends
on the key and the values in the array.
The worst case occurs if the search is unsuccessful. In
this case the algorithm has to make
T(n) = T(n/2) + 1 = log2 n + 1 comparisons.
Bm267 10
BM267 - Introduction to Data
Structures
Bm267 1
BST
• The basic operations (search, insert, delete)
can be performed in O(log n) time when a
balanced search tree is used.
• Definition: A binary search tree is a binary
tree. A nonempty binary search tree
satisfies the following properties:
1.Every element has a key( or value) and no
two elements have the same key; therefore
all keys are distinct.
2.The keys (if any ) in the left subtree of the
root are smaller than the key in the root.
Bm267 2
BST
3. The keys ( if any ) in the right subtree of the root
are larger than the key in the root.
4. The left and right subtrees of the root are also
binary search trees.
[Diagram: three example binary search trees, rooted at 20,
30, and 60]
Bm267 3
BST
• We can remove the requirement that all elements
in a binary search tree need be distinct keys. Now
we replace smaller in property 2 by smaller than
or equal to.
• The resulting tree is called a binary search tree
with duplicates.
struct TreeNode
{
    int data;
    struct TreeNode *left;
    struct TreeNode *right;
};
Bm267 4
BST
Linked representations of binary search trees.
root
B G
A E M
Bm267 5
BST- Searching
• Suppose we wish to search for an element with key k. We
begin at the root. If the root is NULL, the search tree
contains no elements and the search is unsuccessful.
• Otherwise, we compare k with the key in the root. If key is
less than the root, then no element in the right subtree can
have key value k and only left subtree is to be searched.
• If the key is larger than the key in the root, only right
subtree needs to be searched.
• If k equals the key in the root, then the search terminates
successfully.
• The subtrees can be searched similarly.
• The time complexity is O(h) where h is the height of the
tree.
Bm267 6
BST- Searching
int tree_search ( int key ){
struct treenode *p = root;
while( p !=NULL ){
if( key < p->data)
p = p->left;
else if ( key > p->data)
p = p->right;
else
return 1; }
return 0;}
Bm267 7
BST- Searching
int search_rec(struct treenode *p, int key ){
if( p == NULL )
return 0;
if( key == p->data )
return 1;
if ( key < p->data )
return search_rec( p->left, key);
else
return search_rec( p->right, key);
}
Bm267 8
BST- Insertion
• To insert a new element ‘e’ into a binary search
tree, we must first verify that its key is different
from those of existing elements by performing a
search in the tree.
• If the search is unsuccessful, then the element is
inserted at the point the search terminated.
• For instance, to insert an element with key 80 into the
tree, we must search for 80.
• This search terminates unsuccessfully, and the last
node examined is the one with key 40.
• The new element is inserted as the right child of
this node.
Bm267 9
BST- Insertion
void tree_insert( int key ){
    struct treenode *p, *pp, *r;
    p = root;       // search pointer
    pp = NULL;      // parent of p
    while( p ){
        pp = p;
        if( key < p->data )
            p = p->left;
        else if ( key > p->data )
            p = p->right;
        else {
            printf(" The key is already in the tree.\n");
            return;
        }
    }
Bm267 10
BST- Insertion
    r = malloc( sizeof(struct treenode) );
    r->data = key;
    r->left = NULL;
    r->right = NULL;
    if( root ){                 // tree is not empty
        if( key < pp->data )
            pp->left = r;
        else
            pp->right = r;
    }
    else                        // insertion into an empty tree
        root = r;
}
Bm267 11
BST- Deletion
• For deletion we consider three possibilities for the
node p that contains the element to be deleted:
p is a leaf
p has one nonempty subtree
p has two nonempty subtree
• Case 1 is handled by discarding the leaf node.
• To delete 35 from the tree, the left-child field of
its parent is set to NULL and node is discarded.
Bm267 12
BST- Deletion
root
30
5 40 pp
2
p 35 80
pp->left = NULL
Bm267 13
BST- Deletion
• Next consider the case when p has only one nonempty
subtree.
• if p has no parent (i.e. it is the root), node p is discarded and
the root of its single subtree becomes the new search tree root.
root root
root = p->right;
p 30 40
40
35 80
35 80
Bm267 14
BST- Deletion
• if p has a parent pp, then we change the pointer
from pp so that it points to p’s only child and then
delete the node p.
• For instance if we want to delete 5, we change the
left-child field of its parent( the node containing
30) to point the node containing the 2.
root pp->left = p->left;
30 pp
p 5 40
2 35 80
Bm267 15
BST- Deletion
• Finally, to delete an element that has two
nonempty subtrees, we replace this element with
either the largest element in its left subtree or the
smallest element in its right subtree.
[Diagram: deleting 40 from the tree rooted at 30 — 40 is replaced
by 60, the smallest element in its right subtree]
Bm267 16
BST- Deletion
[Diagram: alternatively, the deleted element 40 is replaced by 35,
the largest element in its left subtree]
Bm267 17
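The three deletion cases can be combined into one recursive sketch; this is an illustrative version, not the course's own code — the two-subtree case copies up the smallest element of the right subtree and then deletes that element in turn.

```c
#include <stdlib.h>

struct treenode {
    int data;
    struct treenode *left, *right;
};

/* Deletes key from the subtree rooted at p; returns the new root. */
struct treenode *tree_delete(struct treenode *p, int key) {
    struct treenode *t;
    if (p == NULL) return NULL;                 /* key not found */
    if (key < p->data)
        p->left = tree_delete(p->left, key);
    else if (key > p->data)
        p->right = tree_delete(p->right, key);
    else if (p->left == NULL) {                 /* leaf, or right child only */
        t = p->right; free(p); return t;
    } else if (p->right == NULL) {              /* left child only */
        t = p->left; free(p); return t;
    } else {                                    /* two nonempty subtrees */
        for (t = p->right; t->left != NULL; t = t->left)
            ;                                   /* t: smallest in right subtree */
        p->data = t->data;                      /* copy it up ...           */
        p->right = tree_delete(p->right, t->data);  /* ... delete it there  */
    }
    return p;
}
```

The mirror-image choice — the largest element of the left subtree — works equally well.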
BST- Tree Min
• An element in binary search tree whose key is the
minimum can always be found by following left
child from the root until a NULL is encountered.
root
30
5 35
2 32 80
31 33 60 85
Bm267 18
BST- Tree Min
struct node * tree_min(struct node * n )
{
while( n->left != NULL )
n = n->left;
return n;
}
Bm267 19
BST- Tree Max
• An element in binary search tree whose key is the
maximum can always be found by following right
child from the root until a NULL is encountered.
root
30
5 35
2 32 80
31 33 60 85
Bm267 20
BST- Tree Max
struct node * tree_max(struct node * n )
{
while( n->right != NULL )
n = n->right;
return n;
}
Bm267 21
BST- Successor
• The successor of a node ‘x’ is the node with the
smallest key greater than ‘x’.
• If the right subtree of node x is nonempty, then the
successor of x is the leftmost node in the right
subtree. 30
5 35
2 32 80
31 33 60 85
Bm267 22
BST- Successor
• If the right subtree of node x is empty and x has a
successor y, then y is the lowest ancestor of x
whose left child is also an ancestor of x.
y 30
5 35
2 7 32 80
13 x
9
Bm267 23
BST- Successor
struct node * tree_succ(struct node * n ){
struct node *y;
if( n->right != NULL )
return tree_min(n->right );
y = n->parent;
while( y != NULL && n == y->right)
{
n = y;
y = y->parent;
}
return y;}
Bm267 24
BST- Analysis
What is the run-time of search, insert, and delete?
• Proportional to the height of the tree
What is the height of a tree with n nodes?
• Worst-Case: O(n) in a linear tree
• Best-Case: O(log n) in a complete binary tree
[Diagram: a balanced tree rooted at 30 — height O(log n) — versus
a degenerate linear tree 2 → 5 → 7 → 20 → 40 — height O(n)]
Bm267 25
• References: Jeffrey S. Childs
Clarion University of PA
Bm267
Balanced Search Trees
• Binary search trees on average require about log n
comparisons for each operation (searching, deletion
and insertion); unfortunately, in the worst case they
need n comparisons.
• Computer scientists came up with some solutions
to find a structure that preserves the good
properties of the classical binary search tree:
– AVL Trees
– 2-3 Trees
– Red-Black Trees
Bm267
AVL Trees
• Definition: An AVL tree is a binary search tree in
which the balance factor of every node, which is
defined as the difference between the heights of the
node's left and right subtrees, is either 0 or +1 or –1.
[Diagram: two trees rooted at 10, each node labeled with its
balance factor — the left tree is an AVL tree, the right is not]
Bm267
AVL Trees
• If an insertion of a new node makes an AVL tree
unbalanced, we transform the tree by rotation.
• A rotation in an AVL tree is a local
transformations of its subtree rooted at a node
whose balance has become either +2 or –2.
• There are only four types of rotations; in fact two
of them are mirror images of the other two.
[Figure: single R rotation — inserting 10, 5, 4 leaves 10 with balance factor +2; a right rotation makes 5 the root with children 4 and 10, and all factors become 0.]
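The single R rotation amounts to a few pointer assignments. A sketch under the assumption of a simple node type with key, left, and right fields (names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; the slides do not define one. */
struct avl_node {
    int key;
    struct avl_node *left, *right;
};

/* Single R (right) rotation: the left child becomes the new subtree root. */
struct avl_node *rotate_right(struct avl_node *r) {
    struct avl_node *c = r->left;   /* c < r, so c may become the root   */
    r->left = c->right;             /* c's right subtree (T2) moves to r */
    c->right = r;                   /* r becomes the right child of c    */
    return c;                       /* new root of the rotated subtree   */
}
```

The single L rotation is the mirror image (swap every left and right). A full AVL implementation would also update stored heights or balance factors here.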
Fall 2005 Bm267
AVL Trees
[Figure: single L rotation — inserting 1, 2, 3 leaves 1 with balance factor –2; a left rotation makes 2 the root with children 1 and 3.]
[Figure: double LR rotation — inserting 3, 1, 2 leaves 3 with balance factor +2 and its left child 1 with factor –1; an LR rotation makes 2 the root with children 1 and 3.]
AVL Trees
[Figure: double RL rotation — inserting 1, 3, 2 leaves 1 with balance factor –2 and its right child 3 with factor +1; an RL rotation makes 2 the root with children 1 and 3.]
AVL Trees
[Figure: generic single R rotation — root R with left child C; C holds subtrees T1 and T2, and R holds right subtree T3. After the rotation C becomes the root, with T1 on its left and R (holding T2 and T3) on its right.]
AVL Trees
[Figure: generic single L rotation — root C with right child R; after the rotation R becomes the root, with C (holding T1 and T2) on its left and T3 on its right.]
[Figure: generic double LR rotation — root R with left child C, whose right child G holds subtrees T2 and T3; after the rotation G becomes the root, with C and R as its children. The key ordering {T1} < C < {T2} < G < {T3} < R < {T4} is preserved.]
AVL Trees
[Figure: generic double RL rotation — the result has the same shape: G at the root with children C and R, over the subtrees T1, T2, T3, T4 in order. The key ordering {T1} < C < {T2} < G < {T3} < R < {T4} is preserved.]
AVL Trees
[Figure: worked example — inserting 5, then 6, then 8 gives 5 a balance factor of –2; a single L rotation makes 6 the root with children 5 and 8. Inserting 3, then 2 gives the subtree rooted at 5 a factor of +2; a single R rotation there makes 3 its root, with children 2 and 5.]
AVL Trees
[Figure: continuing — inserting 4 gives the root 6 a balance factor of +2 with its left child 3 at –1; a double LR rotation makes 5 the root, with children 3 (over 2 and 4) and 6 (over 8).]
AVL Trees
[Figure: continuing — inserting 7 gives 6 a balance factor of –2 with its right child 8 at +1; a double RL rotation makes 7 the root of that subtree, with children 6 and 8.]
AVL Trees
• How efficient are AVL trees?
• The height h of any AVL tree with n nodes
satisfies the inequalities
⌊log2 n⌋ <= h < 1.4405 log2(n+2) – 1.3277
2-3 Search Trees
• The second idea of balancing a search tree is to
allow more than one key in the same node.
• The simplest implementation of this idea is 2-3
trees.
• A 2-3 tree is a tree that can have nodes of two
kinds: 2-nodes and 3-nodes.
• A 2-node contains a single key K and has two
children: the left child serves as the root of a
subtree whose keys are less than K and the right
child serves as the root of a subtree whose keys
are greater than K.
2-3 Search Trees
[Figure: a 2-node holding key K, with a left subtree of keys < K and a right subtree of keys > K.]
2-3 Search Trees
• A 3-node contains two ordered keys K1 and K2
(K1 < K2 ) and has three children.
• The leftmost subtree has keys less than K1.
• The middle subtree has keys between K1 and K2.
• The rightmost subtree has keys greater than K2.
2-3 Search Trees
[Figure: a 3-node holding keys K1 < K2, with three subtrees: keys < K1, keys between K1 and K2, and keys > K2.]
2-3 Search Trees
• The last requirement of the 2-3 tree is that its
leaves must be on the same level.
• A 2-3 tree is always height balanced: the length of
a path from the root of the tree to a leaf is the
same for every leaf.
• Searching for a given key K in a 2-3 tree is quite
straightforward. We start with the root. If the root
is a 2-node, we act as if it were a binary search
tree: we either stop if K is equal to the root’s key
or continue the search in the left or right subtree.
2-3 Search Trees
• If the root is a 3-node, we know after no more than
two key comparisons whether the search can be
stopped (if K is equal to one of the root’s keys) or
in which of the root’s three subtrees it needs to be
continued.
[Figure: a 2-3 tree with 3-node root (3, 8) and leaves 2, (4, 5), 9.]
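The search procedure just described can be sketched in C. The node layout below (nkeys, key[], child[]) is an assumed representation, not one given in the slides:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-3 node: nkeys is 1 for a 2-node, 2 for a 3-node. */
struct node23 {
    int nkeys;
    int key[2];
    struct node23 *child[3];  /* child[i] holds keys between key[i-1] and key[i] */
};

/* Returns 1 if k is present in the tree rooted at t, 0 otherwise. */
int search23(struct node23 *t, int k) {
    while (t != NULL) {
        /* at most two key comparisons per node, as described above */
        if (k == t->key[0] || (t->nkeys == 2 && k == t->key[1]))
            return 1;
        if (k < t->key[0])
            t = t->child[0];               /* leftmost subtree  */
        else if (t->nkeys == 1 || k < t->key[1])
            t = t->child[1];               /* middle subtree    */
        else
            t = t->child[2];               /* rightmost subtree */
    }
    return 0;  /* fell off a leaf: unsuccessful search */
}
```

On the figure’s tree (root (3, 8), leaves 2, (4, 5), 9), searching for 5 follows the middle child and succeeds; searching for 7 follows the same path and fails at the leaf.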
2-3 Search Trees
• Inserting a new key in a 2-3 tree is done as
follows:
– We always insert a new key K at a leaf except for the
empty tree.
– The appropriate leaf is found by performing a search
for K.
• If the leaf in question is a 2-node, we insert K
there as either the first or the second key, depending
on whether K is smaller or larger than the node’s old
key.
• If the leaf is a 3-node, we split the leaf in two: the
smallest of the three keys is put in the first leaf, the
largest key is put in the second leaf, while the
middle key is promoted to the old leaf’s parent.
2-3 Search Trees
• If the leaf happens to be the tree’s root, a new root
is created to accept the middle key.
• Note that promotion of a middle key to its parent
can cause the parent’s overflow and hence can
lead to several node splits along the chain of the
leaf’s ancestors.
[Figure: inserting 9, 5, 8, 3, 2, 4, 7 in order — 9; then (5, 9); then (5, 8, 9) overflows and splits: 8 becomes the root with children 5 and 9.]
2-3 Search Trees
[Figure: continuing — 3 joins a leaf to give (3, 5); inserting 2 overflows (2, 3, 5), so 3 is promoted and the root becomes (3, 8) with leaves 2, 5, 9; inserting 4 gives leaf (4, 5); inserting 7 overflows (4, 5, 7).]
2-3 Search Trees
[Figure: promoting 5 overflows the root (3, 5, 8), which splits: 5 becomes the new root with children 3 and 8, over the leaves 2, 4, 7, 9.]
2-3 Search Trees
• The efficiency of the dictionary operations depends on the
tree’s height.
• Let’s find bounds for the height of a 2-3 tree. A 2-3 tree of
height h with the smallest number of keys is a full tree of
2-nodes:
n >= 1 + 2 + … + 2^h = 2^(h+1) – 1
and hence
h <= log2(n+1) – 1
• On the other hand, a 2-3 tree of height h with the largest
number of keys is a full tree of 3-nodes, each with two keys
and three children:
n <= 2·1 + 2·3 + … + 2·3^h = 3^(h+1) – 1
h >= log3(n+1) – 1
Therefore: log3(n+1) – 1 <= h <= log2(n+1) – 1
BM267 - Introduction to Data
Structures
14. Hashing
Hashing
• A dictionary is an abstract type. A set of
operations (searching, insertion, and deletion) is
defined on its elements.
• The elements of this set can be numbers,
characters, character strings, and so on.
• Typically, tables have several fields, each
responsible for keeping a particular type of
information about an entity.
• For example, a student record may contain fields
for the student’s ID, name, date of birth, major,
and so on.
Hashing
• At least one field, called the key, is used for
identifying entities (e.g., the student’s ID).
• Hashing is based on the idea of distributing n keys
among a one-dimensional array H[0…m-1] called
a hash table.
• The distribution is done by computing the value of
some predefined function h called the hash function.
• This function assigns an integer between 0 and
m-1, called the hash address, to a key.
Hashing
[Figure: a universe of keys U, with current keys k1…k5, mapped by h into table slots 0…m-1; k2 and k5 collide because h(k2) = h(k5).]
Hashing
• The hash function depends on the key type.
• For example, if keys are positive integers, the hash
function can be of the form h(k) = k mod m.
This form assures that the result is always
between 0 and m-1.
• In general, a hash function needs to satisfy two
requirements:
– It needs to distribute keys among the
cells of the hash table as evenly as possible. (Because
of this requirement, the value of m is usually chosen to
be prime.)
– It has to be easy to compute.
Hashing
• If we choose a hash table’s size m to be smaller than
the number of keys n, we will get collisions (two
or more keys being hashed into the same cell of the
hash table).
• In fact, in the worst case, all the keys could be
hashed to the same cell of the hash table.
• With an appropriately chosen size of the hash table
and a good hash function, this situation will happen
rarely.
• Still, every hashing scheme must have a collision
resolution mechanism.
• There are two versions of hashing: open hashing
(separate chaining) and closed hashing (open
addressing).
Open Hashing (Separate Chaining)
• In open hashing, keys are stored in linked
lists attached to the cells of a hash table.
keys:        A S E R C H I N G X M P L
hash value:  0 2 0 4 4 4 2 2 1 2 4 3 3
[Figure: the chains grow as keys are inserted at the front of their lists; the final table is
0: E → A    1: G    2: X → N → I → S    3: L → P    4: M → H → C → R]
Open Hashing (Separate Chaining)
• Consider, as an example, the following list of
words:
A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
• As a hash function, we will use a simple
function for strings: we add the
positions of a word’s letters in the alphabet and
take the sum’s remainder after division by 13.
• We start with an empty table.
• The first key is the word “A”; its hash is
h(A) = 1 mod 13 = 1
Open Hashing (Separate Chaining)
• The second key is the word FOOL; its hash is
h(FOOL) = (6+15+15+12) mod 13 = 9
[Figure: the table (slots 0–12) after inserting A (slot 1) and FOOL (slot 9), and later rows showing AND, SOON, PARTED, and ARE in their slots (AND in 6, SOON and ARE chained in 11, PARTED in 12).]
Open Hashing (Separate Chaining)
• A collision occurred at position 11, because h(ARE) =
(1+18+5) mod 13 = 11 and h(SOON) = (19+15+15+14) mod
13 = 11.
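The letter-position hash used in this example can be sketched as follows (the function name hash13 is illustrative):

```c
#include <assert.h>
#include <ctype.h>

/* The slides' string hash: sum of alphabet positions (A = 1, B = 2, ...)
   taken mod 13. Assumes the input consists of letters only. */
int hash13(const char *s) {
    int sum = 0;
    for (; *s != '\0'; s++)
        sum += toupper((unsigned char)*s) - 'A' + 1;
    return sum % 13;
}
```

This reproduces the slide’s values: hash13("A") is 1, hash13("FOOL") is 9, and both "ARE" and "SOON" hash to 11, which is exactly the collision discussed above.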
Searching
• Apply the same procedure to a search key that was used for
creating the table.
• If we want to search for the key “KID” in the hash table, we first
compute the value of the same hash function for the key:
h(KID) = 11.
• Since the list attached to cell 11 is not empty, the list may
contain the key. After comparing the string KID first with
SOON and then ARE, we end up with an unsuccessful search.
Open Hashing (Separate Chaining)
• The efficiency of searching depends on the length
of the list.
• The length of the list depends on the table size
and the quality of the hash function.
• If the hash function distributes n keys among m
cells of the hash table evenly, each list will be
about n/m keys long.
• The ratio α = n/m is called the load factor of the
hash table.
• The average number of pointers inspected in a
successful search is 1 + α/2; in an unsuccessful
search, it is α.
Open Hashing (Separate Chaining)
• The other two dictionary operations –
insertion and deletion are almost identical to
searching.
• Insertions are done at the front or end of the
list.
• Deletion is performed by searching for the
key to be deleted and then removing it from
its list.
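Insertion at the front of a chain and search along it can be sketched as below; the names and the int-key simplification are illustrative assumptions:

```c
#include <assert.h>
#include <stdlib.h>

/* Chain node for separate chaining; an int key stands in for a record. */
struct chain_node {
    int key;
    struct chain_node *next;
};

/* Insert at the front of the chain for slot h: O(1). */
void chain_insert(struct chain_node **table, int h, int key) {
    struct chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[h];
    table[h] = n;
}

/* Walk the chain for slot h; returns 1 if key is found, 0 otherwise. */
int chain_search(struct chain_node **table, int h, int key) {
    struct chain_node *p;
    for (p = table[h]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```

Deletion is the same traversal followed by a standard linked-list unlink and free.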
Closed Hashing( Open Addressing)
• In closed hashing, all keys are stored in the hash
table itself, without the use of linked lists.
• This implies that the table size m must be at least
as large as the number of keys n.
• Different strategies can be employed for collision
resolution. The simplest one, called linear
probing, checks the cell following the one where the
collision occurs. If that cell is empty, the new key
is saved there; if not, we scan forward until the
first empty cell is found. If the end of the table is
reached, the scan continues from the beginning: the
table is treated as a circular array.
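Linear probing insertion and search can be sketched as follows, using the table size 13 from the example; the names are illustrative, and the sketch assumes the table never becomes completely full:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define M 13
#define EMPTY NULL

/* Slots hold string pointers; EMPTY (NULL) marks an unused slot. */
static const char *table[M];

/* Insert key starting at its hash address h, wrapping around circularly. */
void lp_insert(const char *key, int h) {
    int i = h;
    while (table[i] != EMPTY)
        i = (i + 1) % M;        /* probe the next cell, treating the table as circular */
    table[i] = key;
}

/* Search from the hash address: stop at a match or at the first empty slot. */
int lp_search(const char *key, int h) {
    int i = h;
    while (table[i] != EMPTY) {
        if (strcmp(table[i], key) == 0)
            return i;           /* found: return the slot index */
        i = (i + 1) % M;
    }
    return -1;                  /* empty slot reached: unsuccessful */
}
```

With the example’s hashes, inserting ARE (11), SOON (11), and PARTED (12) places SOON in slot 12 and wraps PARTED around to slot 0, matching the trace below.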
Linear Probing
keys:          A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
hash address:  1  9     6    10   7      11   11    12
[Figure: the table (slots 0–12) filling up one key at a time. ARE goes to its home slot 11; SOON collides with ARE and lands in 12; PARTED collides with SOON and wraps around to slot 0. Final table: 0 PARTED, 1 A, 6 AND, 7 MONEY, 9 FOOL, 10 HIS, 11 ARE, 12 SOON.]
Linear Probing
• To search for a given key k, we start by computing h(k)
where h is the hash function used in the table’s
construction.
• If that cell is empty, the search is unsuccessful.
• If the cell is not empty, we must compare k with the key in
that cell: if they are equal, we have found a matching key; if
not, we compare k with the key in the next cell and continue
in this manner until either we encounter a matching key or
an empty cell (unsuccessful search).
• For example, if we search for the word LIT in the table, first
we need to calculate the hash value. We get
h(LIT) = (12 + 9 + 20) mod 13 = 2; since cell 2 is empty,
the search is unsuccessful and we can stop immediately.
Linear Probing
• However, if we search for KID with h(KID) = (11+9+4)
mod 13 = 11, we will compare KID with ARE, SOON,
PARTED, and A before we can declare the search
unsuccessful.
• While the search and insertion operations are
straightforward, deletion is not.
• For example, if we simply delete the key ARE from the
table, we would be unable to find the key SOON afterward.
• After computing h(SOON) = 11, the algorithm would find
this location empty and report the search unsuccessful.
• The solution is to mark previously occupied locations with a
special symbol, to distinguish them from locations that
have never been occupied.
Linear Probing
• Linear probing creates clusters. Clusters are bad news in
hashing because they make operations less efficient.
Double Hashing
• Double Hashing is one of the best methods available for
open addressing.
• Double hashing uses a hash function of the form
h(k, i) = (h1(k) + i·h2(k)) mod m
where h1 and h2 are auxiliary hash functions.
• The initial probe is to position h1(k); successive probe
positions are offset from the previous position by the
amount h2(k), mod m.
• The value h2(k) must be relatively prime to the hash-table
size m.
• A convenient way to ensure this condition is to let m be
prime and to design h2 so that it always returns a positive
integer less than m.
Double Hashing
• For example, we could choose m prime and let
h1(k) = k mod m,
h2(k) = 1 + (k mod m')
where m' is chosen to be slightly less than m (say, m – 1).
• If k = 123456, m = 701 and m' = 700, we have
h1(k) = 80 and h2(k) = 257, so the first probe is
at position 80, and then every 257th slot is
examined.
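The probe sequence from this example can be sketched directly (hypothetical helper functions, not code from the slides):

```c
#include <assert.h>

/* Double-hashing probe sequence h(k, i) = (h1(k) + i*h2(k)) mod m,
   with h1(k) = k mod m and h2(k) = 1 + (k mod m') as in the example. */
#define M  701   /* table size, prime */
#define M2 700   /* m', slightly less than m */

long h1(long k)           { return k % M; }
long h2(long k)           { return 1 + (k % M2); }
long probe(long k, long i) { return (h1(k) + i * h2(k)) % M; }
```

For k = 123456 this gives h1(k) = 80 and h2(k) = 257, so the probes visit 80, 337, 594, and so on, stepping by 257 mod 701.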
Double Hashing
[Figure: a table of size 13 (slots 0–12) holding the keys 79, 69, 14, 92, 72, placed by double hashing with h1(k) = k mod 13 and h2(k) = 1 + (k mod 11).]