HASH TABLE
Contents
1. Introduction
2. Hash function.
3. Methods to solve collisions
11-Jan-21 1
Introduction
▪ We often want to associate values with keys.
▪ For example, we might want to be able to look up an Airport based on its
code
▪ Can we use the 1D array or Singly linked list to store these Code-Airport
pairs?
▪ Yes
▪ Searching for a particular airport by its code in an array of N airports would take
O(N) time if airports are not sorted and O(NlogN) time if they are sorted.
11-Jan-21 2
Introduction
▪ Hash Table (HT) is an abstract data type which stores data in an
associative manner.
▪ A hash table uses a hash function (HF) to determine the index
where each key-value pair should go in the hash table.
11-Jan-21 3
Hash function
▪ A hash function is any function that can be used to map
data of arbitrary size to fixed-size values
▪ The values returned by a hash function are called hash
values, hash codes, digests, or simply hashes.
▪ The values are used to index a fixed-size table called a
hash table.
f(k3)
k1, k2,k3 f(k1)
f(k2)
11-Jan-21 4
Hash function
• Suppose the key k is an integer
• The modulo hash function:
f(k) = k mod M
where M is the size of the hash table
11-Jan-21 5
Collisions
▪ Ideally, the hash function will assign each key to a unique bucket,
▪ But most hash table designs employ an imperfect hash function,
which might cause hash collisions
▪ Collisions: The hash function generates the same index for more than one
key.
key1<>key2, but f(key1)=f(key2)
▪ Collisions are typically accommodated in some way:
1. Direct chaining Method
2. Coalesced chaining Method
3. Linear Probing Method
4. Quadratic Probing Method
5. Double hashing Method
11-Jan-21 6
Direct chaining Method
• The hash table is an array of linked lists i.e., each index (address)
has its own linked list.
• Elements that hash to the same address are placed on a linked list
at that address
• Ex: Insert the numbers 5, 7, 2, 9, 12, and 15 into the hash table of
size M=5 using the hash function f(k) = k%5
11-Jan-21 7
Direct chaining Method
Benefits of chaining
◼ Insertion in a hash table always occurs in O(1) since
linked lists allow insertion in constant time.
◼ A hash table using chaining method will never need to
be resized
11-Jan-21 8
Coalesced chaining Method
If a conflict happens, this element will be
added into the hash table at the first available
address, say j, from the bottom of the hash Key next
table. NULL -1
The field next of the last node at index i in the 0
chain will be updated to j.
… …
…
M-1 NULL -1
11-Jan-21 9
Coalesced chaining Method
Example: Suppose the 0 NULL -1
keys A, C, B, D, and E have
the hash values of 1, 2, 1, 1, 1 A 9
and 3 respectively. Insert
these keys to the hash table 2 C -1
of size 10.
3 E -1
Collision ➔ add B into … NULL -1
the fist available address,
from the bottom of the
hask table 8 D -1
9 B 8
11-Jan-21 10
Linear Probing Method
▪ The HT in this case is implemented using an
array containing M elements, each element of 0 nullkey
the HT has a field k used to contain the key of
the element
▪ M can be any positive integer but it is often 1 nullkey
chosen to be a prime number
▪ When the HT is initialized, all fields k are
assigned to nullkey (i.e. -1) 2 nullkey
3 nullkey
… nullkey
M-1 nullkey
11-Jan-21 11
Linear Probing Method
• When an element with the key k needs to
be added into the HT, the hash function
f(k) = k%M
will specify the address i = f(k) (i.e., an
index of an array) within the range [0, M-
1].
• If there is no conflict (the cell i is
unoccupied), this element is added into the
HT at the address i.
11-Jan-21 12
Linear Probing Method
• If the conflict takes place, the following
function is used to rehash
fj(key) = (f(key) + j) % M
Where j = 1, 2, ... at first time, second time,
… of the rehash respectively. This process
repeats until the available address found.
Then this element will be added at this
address.
11-Jan-21 13
Linear Probing Method
Insert keys 32, 53, 22, 92, 17, 34, 24, 37, 56 into a HT of size 10.
HF: f(key) = key % 10
Function for rehashing: fj(key) = (f(key) + j) % 10
0 NULL 0 NULL 0 NULL 0 NULL 0 56
1 NULL 1 NULL 1 NULL 1 NULL 1 NULL
2 32 2 32 2 32 2 32 2 32
3 53 3 53 3 53 3 53 3 53
4 NULL 4 22 4 22 4 22 4 22
5 NULL 5 92 5 92 5 92 5 92
6 NULL 6 NULL 6 34 6 34 6 34
7 NULL 7 NULL 7 17 7 17 7 17
8 NULL 8 NULL 8 NULL 8 24 8 24
9 NULL
11-Jan-21
9 NULL 9 NULL 9 37 9 37 14
Quadratic Probing Method
• The idea is to probe more widely separated
cells.
• The HT in this case is implemented using an
array containing M elements. Each element of
the HT has a field k used to contain the key of
the node
• When the HT is initialized, all elements are
assigned to -1.
11-Jan-21 15
Quadratic Probing Method
• When an element with the key k needs to
be added into the HT, the hash function
f(k) = k%M
will specify the address i = f(k) (i.e., an
index of an array) within the range [0, M-
1].
• If there is no conflict (the cell i is
unoccupied), this element is added into the
HT at the address i.
11-Jan-21 16
Quadratic Probing Method
• If the conflict takes place, the following
function is used to rehash
fj(key) = (f(key) + j2) % M
Where j = 1, 2, ... at first time, second time,
… of the rehash respectively. This process
repeats until the available address found.
Then this element will be added at this
address.
11-Jan-21 17
Quadratic Probing Method
Ex: Add keys 10, 15, 16, 20, 30, 25, ,26, and 36 into the HT of size 10.
HF: f(key) = key % 10
Function for rehashing: fj(key) = (f(key) + j2) % 10
0 10 0 10 0 10 0 10 0 10
1 NULL 1 20 1 20 1 20 1 20
2 NULL 2 NULL 2 NULL 2 NULL 2 36
3 NULL 3 NULL 3 NULL 3 NULL 3 NULL
4 NULL 4 NULL 4 30 4 30 4 30
5 15 5 15 5 15 5 15 5 15
6 16 6 16 6 16 6 16 6 16
7 NULL 7 NULL 7 NULL 7 26 7 26
8 NULL 8 NULL 8 NULL 8 NULL 8 NULL
9 NULL 9 NULL 9 25 9 25 9 25
11-Jan-21
18
Quadratic Probing Method
• The problem with quadratic probing method is that it
causes the secondary clutering problem
• Secondary clustering occurs when keys which hash to the
same location trace the same sequence in looking for an
empty location
• To eliminate secondary clustering problem, we can use
another approach: double hashing
11-Jan-21 19
Double Hashing Method
• Double hashing is one of the most effective methods of probing an
open addressed hash table.
• This method uses 3 functions
f1(key)= key %M.
f2(key) =(M-2)-key %(M-2)
fj(key) = (f1(key) + j* f2(key))%M
• When a node with the key k needs to be added into the hash table,
the first hash function f1 will specify the address i = f1(key) within a
range [0, M-1]
• where M is a prime number (a prime number M produces fewer collisions)
• If there is no conflict, this node is added into the hash table at the
address i
• If a conflict takes place, the next addresses will be specified by using
the rehashing function fj
• Where j = 1, 2, ... at first time, second time, … of the rehash
respectively.
11-Jan-21 20
Double Hashing Method
Add the keys 10, 15, 16, 20, 30, 25, 26, and 36 into HT of size 10 using:
f1(key)= key %M.
f2(key) =(M-2)-key %(M-2).
0 10 0 10 0 10 0 10 0 10
1 NULL 1 NULL 1 NULL 1 20 1 20
2 NULL 2 NULL 2 30 2 30 2 26
3 NULL 3 NULL 3 NULL 3 NULL 3 36
4 NULL 4 20 4 20 4 20 4 20
5 15 5 15 5 15 5 15 5 15
6 16 6 16 6 16 6 16 6 16
7 NULL 7 NULL 7 NULL 7 NULL 7 NULL
8 NULL 8 NULL 8 NULL 8 26 8 NULL
9 NULL 9 NULL 9 25 9 25 9 25
11-Jan-21 21