Faculty of Engineering, Assumption University
CE2102: Data Lecture 14: Hashing
Structure & Algorithms
Hashing
❖ Technique used to perform
insertions, deletions, and finds
in constant average time, O(1)
Hash Table
❖ Hash table is a data structure that associates keys with
values
❖ Key is string with associated value (salary, telephone
number, …)
❖ Size of hash table is normally a prime number
Hash Table
Hash Function
❖ Hash function maps keys to index
❖ In the example, John Smith hashes to 873, Lisa Smith to
1, Sam Do to 998
❖ Two keys may be mapped to the same location (collision)
❖ Good hash function will spread values over the table,
and there are only few collisions
Hash Function
❖ Suppose table size = 5
❖ Let a = 0, b = 1, c = 2, …
❖ H[Key] = (∑ characters) % 5
❖ H[cab] = (2+0+1) % 5 = 3
❖ H[bed] = (1+4+3) % 5 = 3
❖ H[bad] = (1+0+3) % 5 = 4
Collision Resolutions
❖ A collision resolution is used to solve the collision by
mapping the collided key to another location
❖ Separate chaining
❖ Open addressing
❖ Linear probing
❖ Quadratic probing
❖ Double hashing
Separate Chaining
❖ Store all the elements mapped to the same location in a
linked list
❖ When an element is added, put it at the end of a list
Separate Chaining
Load Factor (λ)
❖ Number of elements / table size
❖ Average number of elements in the list
❖ Average (unsuccessful) search complexity is 1+λ
❖ Average (successful) search complexity is 1+λ/2
Load Factor (λ)
❖ Low λ: table is sparsely used
❖ Good for searching
❖ Bad for utilization
❖ High λ: table is densely used
❖ Bad for searching
❖ Good for utilization
❖ Rule: keep λ around 1
Open Addressing
❖ Separate chaining requires pointer manipulations and
dynamic memory allocation which are expensive
❖ When collision occurs, try alternate locations until an
empty location is found
Linear Probing
❖ Every location has only one element
❖ If the table is not full, an element can be inserted
❖ If Key is K, and H[K] is not empty
❖ Try H[K]+1, H[K]+2, …
❖ Continue until an empty location is found
Linear Probing
Linear Probing
❖ Sometimes it is difficult to find an empty location
❖ Linear probing creates long sequences of filled locations
of unrelated keys (primary clustering)
Linear Probing: Example
0
3
❖ H[K] = K % 10
4
❖ Table size = 10
5
❖ Insert 18, 89, 21, 58, 68, 11 6
9
Linear Probing: Example
0
3
❖ Insert 18 4
❖ 18 % 10 = 8 (ok) 5
18 8
9
Linear Probing: Example
0
3
❖ Insert 89 4
❖ 89 % 10 = 9 (ok) 5
18 8
89 9
Linear Probing: Example
0
21 1
3
❖ Insert 21 4
❖ 21 % 10 = 1 (ok) 5
18 8
89 9
Linear Probing: Example
58 0
21 1
2
❖ Insert 58 3
❖ 58 % 10 = 8 (collision) 4
❖ Try 8+1 % 10 = 9 (collision) 5
6
❖ Try 8+2 % 10 = 0 (ok)
7
18 8
89 9
Linear Probing: Example
58 0
21 1
❖ Insert 68
68 2
❖ 68 % 10 = 8 (collision) 3
❖ Try 8+1 % 10 = 9 (collision) 4
❖ Try 8+2 % 10 = 0 (collision) 5
6
❖ Try 8+3 % 10 = 1 (collision)
7
❖ Try 8+4 % 10 = 2 (ok)
18 8
89 9
Linear Probing: Example
58 0
21 1
68 2
❖ Insert 11 11 3
❖ 11 % 10 = 1 (collision) 4
❖ Try 1+1 % 10 = 2 (collision) 5
6
❖ Try 1+2 % 10 = 3 (ok)
7
18 8
89 9
Primary Clustering
0
3
❖ H[K] = K % 10 4
❖ Insert 89, 18, 49, 58, 69 5
9
Primary Clustering
0
3
❖ Insert 89 4
❖ 89 % 10 = 9 (ok) 5
89 9
Primary Clustering
0
3
❖ Insert 18 4
❖ 18 % 10 = 8 (ok) 5
18 8
89 9
Primary Clustering
49 0
3
❖ Insert 49
4
❖ 49 % 10 = 9 (collision)
5
❖ Try 9+1 % 10 = 0 (ok) 6
18 8
89 9
Primary Clustering
49 0
58 1
❖ Insert 58 2
3
❖ 58 % 10 = 8 (collision)
4
❖ Try 8+1 % 10 = 9 (collision)
5
❖ Try 8+2 % 10 = 0 (collision) 6
❖ Try 8+3 % 10 = 1 (ok) 7
18 8
89 9
Primary Clustering
49 0
58 1
❖ Insert 69 69 2
3
❖ 69 % 10 = 9 (collision)
4
❖ Try 9+1 % 10 = 0 (collision)
5
❖ Try 9+2 % 10 = 1 (collision) 6
❖ Try 9+3 % 10 = 2 (ok) 7
18 8
89 9
Primary Clustering
49 0
Primary
58 1
Cluster
❖ Insert 69 69 2
3
❖ 69 % 10 = 9 (collision)
4
❖ Try 9+1 % 10 = 0 (collision)
5
❖ Try 9+2 % 10 = 1 (collision) 6
❖ Try 9+3 % 10 = 2 (ok) 7
18 8
89 9
Quadratic Probing
❖ If Key is K, and H[K] is not empty
❖ Try H[K]+1, H[K]+4, H[K]+9 …
❖ Continue until an empty location is found
Quadratic Probing
❖ Fewer collisions than linear probing
❖ There is no guarantee to find empty location once table
is more than half full
❖ Elements that hash to same locations always probe same
set of cells (secondary clustering)
Quadratic Probing: Example
0
3
❖ H[K] = K % 10
4
❖ Table size = 10
5
❖ Insert 18, 89, 21, 58, 68, 11 6
9
Quadratic Probing: Example
0
3
❖ Insert 18 4
❖ 18 % 10 = 8 (ok) 5
18 8
9
Quadratic Probing: Example
0
3
❖ Insert 89 4
❖ 89 % 10 = 9 (ok) 5
18 8
89 9
Quadratic Probing: Example
0
21 1
3
❖ Insert 21 4
❖ 21 % 10 = 1 (ok) 5
18 8
89 9
Quadratic Probing: Example
0
21 1
58 2
❖ Insert 58 3
❖ 58 % 10 = 8 (collision) 4
❖ Try 8+1 % 10 = 9 (collision) 5
6
❖ Try 8+4 % 10 = 2 (ok)
7
18 8
89 9
Quadratic Probing: Example
0
❖ Insert 68
21 1
❖ 68 % 10 = 8 (collision) 58 2
❖ Try 8+1 % 10 = 9 (collision) 3
❖ Try 8+4 % 10 = 2 (collision) 4
5
❖ Try 8+9 % 10 = 7 (ok)
6
68 7
❖ Note: 68 takes the same probes
18 8
as 58 (secondary clustering)
89 9
Quadratic Probing: Example
0
21 1
58 2
❖ Insert 11 3
❖ 11 % 10 = 1 (collision) 4
❖ Try 1+1 % 10 = 2 (collision) 11 5
6
❖ Try 1+4 % 10 = 5 (ok)
68 7
18 8
89 9
Double Hashing
❖ Avoid secondary clustering
❖ Apply a second hash function to the key when a
collision occur
❖ H[i,K] = (H1[K] + i*H2[K]) % Table size
❖ Popular second hash function is H2[K] = R – (K % R), R
is a prime number smaller than the table size
Double Hashing: Example
0
3
❖ H1[K] = K % 10
4
❖ H2[K] = 7 – (K % 7)
5
❖ Insert 89, 18, 49, 58, 69 6
9
Double Hashing: Example
0
3
❖ Insert 89 4
❖ 89 % 10 = 9 (ok) 5
89 9
Double Hashing: Example
0
3
❖ Insert 18 4
❖ 18 % 10 = 8 (ok) 5
18 8
89 9
Double Hashing: Example
0
3
❖ Insert 49
4
❖ 49 % 10 = 9 (collision)
5
❖ Try (9+ 7-(49%7))%10 = 6 (ok) 49 6
18 8
89 9
Double Hashing: Example
0
58 3
❖ Insert 58
4
❖ 58 % 10 = 8 (collision)
5
❖ Try (8+ 7-(58%7))%10 = 3 (ok) 49 6
18 8
89 9
Double Hashing: Example
69 0
58 3
❖ Insert 69
4
❖ 69 % 10 = 9 (collision)
5
❖ Try (9+ 7-(69%7))%10 = 0 (ok) 49 6
18 8
89 9
Rehashing
❖ If the hash table is nearly full, then a hash table of at
least twice the size is used
❖ Table size must be prime
❖ All elements will be transferred to the new hash table
(rehashing)
Applications
❖ Web search engine
❖ Checksum to detect errors in transmission or storage
❖ MD5: Message Digestion Algorithm
❖ Digital signature
Assignment
❖ Given input {4371, 1323, 6173, 4199, 4344, 9679, 1989}
and a hash function H[K] = K%10, show the resulting:
❖ Separate chaining hash table
❖ Open addressing hash table using linear probing
❖ Open addressing hash table using quadratic probing
❖ Open addressing hash table with H2[K] = 7-(K%7)