Data Structures
Hashing
Hashing
It is always a struggle to store huge data efficiently
Array is a common data structure but it suffers from
inefficient searching. Storage takes O(1) while retrieval in
the best case takes O(logN) while O(N) in the worst case.
Hashing data structure can offer constant time retrieval i.e.
O(1)
2
Components of Hashing
Key: A Key can be anything string or integer which is fed
as input in the hash function which is a technique that
determines an index or location for storage of an item in a
data structure.
Hash Function: The hash function receives the input key
and returns the index of an element in an array called a
hash table. The index is known as the hash index.
Hash Table: Hash table is a data structure that maps keys
to values using a special function called a hash function.
Hash stores the data in an associative manner in an array
where each data value has its own unique index.
3
Hashing
If “VU” is the key then Hash(“VU”) will return the index.
A Hash function assigns each value with a unique key.
Sometimes hash table uses an imperfect hash function that
causes a collision because the hash function generates the
same key of two different values.
4
Hashing
A common hash function is division i.e. h(ki) = ki % m
where m is size of hash table and ki is the key
For eg. if size of hash table is 10 then index 7 is returned if
ki is 7.
Its possible for two different key values, same index is
returned like for 7 and 17. Two values are stored at the
same index. This leads to collision problem.
The following are the collision resolution techniques:
– Open Hashing: It is also known as closed addressing (Separate
Chaining)
– Closed Hashing: It is also known as open addressing
5
Open Hashing
Collision resolution by chaining
6
Open Hashing (Closed addressing)
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
Index value of 3 is h(3)=(2*3+3)%m = 9%10 = 9. 3 stored
at index 9
Index value of 2 is h(2)=7%10 = 7. So 2 stored at index 7
Index value of 6 is h(6)=15%10 = 5. So 6 stored at index 5
Index of 11 is h(11)=25%10 = 5. 11 to be stored at index 5
Two values 6 and 11 to be stored at the same index,
leading to collision, Using the chaining method to avoid
the collision, one more list is created and the value 11 is
added to this list. After the creation of the new list, the
newly created list will be linked to the list having value 6
7
Closed Hashing (Open addressing)
Three techniques are used to resolve collision:
1. Linear probing
2. Quadratic probing
3. Double Hashing technique
Linear probing: In case of collision, the linear probing
technique searches for the closest free locations and adds a
new key to that empty cell.
8
Closed Hashing - Linear probing
Assume a simple hash function “key mod size” with size of 5 and
key sequence of 50, 70, 76, 93.
Indexes are 0, 0, 1, 3 respectively
9
Closed Hashing - Quadratic Probing / mid-square
In case of linear probing, searching is performed linearly. In contrast,
quadratic probing is an open addressing technique that uses
quadratic polynomials for searching until a empty slot is found.
It can also be defined as allowing the insertion ki at the first free
location from (H+i2)%m where i=0 to m-1.
Ex. Hash size of 7, hash function H(x) = x%7, insert 22, 30, and 50
Closed Hashing – Double hashing
Double hashing make use of two hash function, h1(k) to identify
original index. h2(k) is used in case of collision to identify new index
in combination with h1(k) like h(k, i) = (h1(k) + i * h2(k)) % m where
i ranging from 0 to m-1
Ex. A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h1(k) = 2k+3 and
h2(k) = 3k+1
Index of 3 = h1(3) = 9%10=9
Index of 2 = h1(2) = 7%10=7
Index of 9 = h1(9) = 21%10=1
Index of 6 = h1(6) = 15%10=5
Index of 11 = h1(11) = 25%10=5. h2(11)=34%10=4,
h(11, i)= (5 + i * 4)%10 for i=0 to m-1. Store on 3
Index of 13 = h1(13) = 29%10=9. h2(11)=40%10=0,
h(13, i)= (9 + i * 0)%10 for i=0 to m-1. No new index
found in any iteration. So 13 is not stored