ECE250:
Algorithms and Data Structures
Hash Tables (Part A)
Ladan Tahvildari, PEng, SMIEEE
Professor
Software Technologies Applied Research (STAR) Group
Dept. of Elect. & Comp. Eng.
University of Waterloo
Materials from CLRS: Chapter 11.1, 11.2, 11.4
Acknowledgements
v The following resources have been used to prepare
materials for this course:
Ø MIT OpenCourseWare
Ø Introduction to Algorithms (CLRS Book)
Ø Data Structures and Algorithm Analysis in C++ (M. Wiess)
v Thanks to many people for pointing out mistakes, providing
suggestions, or helping to improve the quality of this course
over the last ten years
Ø Please refer to the Acknowledgment on LEARN
Lecture 8 ECE250 2
The Problem
vRT&T is a large phone company, and they
want to provide caller ID capability:
Ø given a phone number, return the caller’s name
Ø phone numbers range from 0 to r = 108 -1
Ø want to do this as efficiently as possible
Lecture 8 ECE250 3
A Potential Solution
vA suboptimal way to design this dictionary is
an array indexed by key
§ takes O(1) time
§ O(r) space - huge amount of wasted space
(null) (null) Jens (null) (null)
Jensen
0000-0000 0000-0000 9635-8904 0000-0000 0000-0000
Lecture 8 ECE250 4
Symbol-Table Problem
vSymbol table T holding n records
records
x k ey[ x ] Operations on T
Ø INSERT (T , x )
Ø DELETE (T , x )
Ø SEARCH (T , k )
How should the data structure T be organized?
Lecture 8 ECE250 5
Direct Access Table
IDEA: Suppose that the set of keys is K Í { 0 ,1,..., m - 1}
and keys are distinct. Set up an array T [ 0 ..m - 1]
ì x if k Î K and key [ x ] = k
T [k ] = í
î NIL otherwise
Operations take q (1) time
Problem: The range of keys can be large
64-bit numbers (represent 18,446,744,073,709,551,616 different keys)
Lecture 8 ECE250 6
Hash Functions
Solution: Use a hash function h to map the
universe U of all keys into { 0 ,1,..., m - 1}
k1 T
0
k3
K k2
h ( k 1)
k4 h(k 4 )
U h(k 2 ) = h(k3 )
m -1
When a record to be inserted maps to an
already occupied slot in T, a collision occurs
Lecture 8 ECE250 7
Collision Resolution
v How to deal with two keys which hash to the same
spot in the array?
Ø Use chaining which sets up an array of links (a table),
indexed by the keys, to lists of items with the same key
Lecture 8 ECE250 8
An Example
Given the following input
{3171,4123,5973,2699,5344,5879,9789}
and the following hash function
h( x) = x mod 10
Show the resulting hash table using Chaining
Lecture 8 ECE250 9
Collision Resolution by Chaining
0
1 3171
3
4123 5973
4
5344
9 2699 5879 9789
Lecture 8 ECE250 10
Dictionary Operations with Chaining
v Search: CHAINED-HASH-SEARCH(T, k)
Ø search for an element with key k in list T [h(k)]
vInsertion: CHAINED-HASH-INSERT(T, x)
Ø insert x at the head of list T [h(key[x])]
vDeletion: CHAINED-HASH-DELETE(T, x)
Ø delete x from the list T [h(key[x])]
Lecture 8 ECE250 11
Analysis of Hashing
vAssumption: Each key is equally likely to be
hashed into any slot of table T independent of
where other keys are hashed
Simple Uniform Hashing
vGiven hash table T with m slots holding n
elements, the load factor is defined as a = n / m
Average Number of Keys per Slot
vAssume time to compute h (k ) is q (1)
v Search Time: q (1 + a )
Lecture 8 ECE250 12
Analysis of Operations with Chaining
v Assuming the number of hash table slots is proportional to the
number of elements in the table
n = O ( m)
a = n / m = O(m) / m = O(1)
ØSearch:
§ takes constant time on average
ØInsertion:
§ takes O(1) worst-case time
o Assumes that the element being inserted isn’t already in the list
o It would take an additional search to check if it was already inserted
ØDeletion:
§ takes O(1) worst-case time when the lists are doubly linked
§ If the lists are singly linked, then deletion takes as long as searching, because we
must find x’s predecessor in its list in order to correctly update next pointers
Lecture 8 ECE250 13
More on Collisions
vA key is mapped to an already occupied table
location
Ø what to do?!?
vUse a collision handling technique
vWe have seen Chaining
vCan also use Open Addressing
Ø Linear Probing
Ø Double Hashing
Lecture 8 ECE250 14
Open Addressing
v All elements are stored in the hash table (n £ m)
v Insertion systematically probes the table until an
empty slot is found à The table may fill up!
v Modify hash function to take the probe number i as
the second parameter (depends on both the key and
the probe number)
h : U ´ {0,1,..., m - 1} ® {0,1,..., m - 1}
probe number slot number
Hash function h determines the sequence of slots examined for a given key
v Probe sequence for a given key k
h ( k ,0), h ( k ,1),..., h ( k , m - 1) - a permutation of 0,1,..., m - 1
Lecture 8 ECE250 15
Linear Probing
v If the current location is used, try the next table
location h ( k , i ) = ( h ' ( k ) + i ) mod m
LinearProbingInsert(k)
01 if (table is full) error
02 probe = h(k)
03 while (table[probe] occupied)
04 probe = (probe+1) mod m
05 table[probe] = k
v Uses less memory than chaining
Ø one does not have to store all those links
v Slower than chaining (Primary Clustering)
Ø one might have to walk along the table for a long time
Lecture 8 ECE250 16
Hash Tables - Example 1
Given the following input
{3171,4123,5973,2699,5344,5879,9789}
and the following hash function
h( x) = x mod 10
Show the resulting hash table using
Ø Chaining
Ø Linear Probing
Lecture 8 ECE250 17
Collision Resolution by Chaining
0
1 3171
3
4123 5973
4
5344
9 2699 5879 9789
Lecture 8 ECE250 18
Collision Resolution by Linear Probing
0 5879
1 3171
2 9789
3 4123
4 5973
5 5344
7
8
9 2699
Lecture 8 ECE250 19
Hash Tables – Example 2
Show the resulting hash table using Linear Probing when the following keys:
{One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve}
are inserted one-by-one, in the order given into an initially empty table. Assume
that table size is m = 16 . Use the division method of hashing.
Use the following table of values for each key: x value (hexadecimal)
One 0x6EBE5
Two 0x75DAF
Three 0x75A73925
Four 0x19EED32
Five 0x19E8DE5
Six 0x72A38
Seven 0x7293792E
Eight 0x64A26A74
Nine 0x1BE8BE5
Ten 0x7592E
Eleven 0x4993792E
Twelve 0x292DDE5
Lecture 8 ECE250 20
Hash Tables – Example 2: Solution
0 Ten
1 Eleven
2 Four
3
4 Eight
5 One
6 Three
7 Five
8 Six
9 Nine
A Twelve
B
C
D
E Seven
F Two
Lecture 8 ECE250 21