0% found this document useful (0 votes)
13 views14 pages

Hashing Updated

Hashing is a searching technique with a constant time complexity of O(1) that utilizes hash tables and hash functions for efficient data storage and retrieval. It minimizes comparisons compared to linear and binary search methods, making it advantageous for applications like databases and caches. Collision resolution techniques, such as separate chaining and open addressing, are essential for managing instances where multiple keys hash to the same index.

Uploaded by

Raunak
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views14 pages

Hashing Updated

Hashing is a searching technique with a constant time complexity of O(1) that utilizes hash tables and hash functions for efficient data storage and retrieval. It minimizes comparisons compared to linear and binary search methods, making it advantageous for applications like databases and caches. Collision resolution techniques, such as separate chaining and open addressing, are essential for managing instances where multiple keys hash to the same index.

Uploaded by

Raunak
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

HASHING

Hashing is one the searching techniques that uses a constant time. The time
complexity in hashing is O(1). It is used in data structure to store & retrieve data
effectively. Till now, we read the two techniques for searching i.e. linear search and
binary search.

The worst case time complexity in linear sear is O(n), and O(log n) in binary search. In
both the searching techniques, the searching depends upon the number of elements
but hashing minimizes the number of comparisons while performing the search. So,
hashing technique completes the search with a constant time.

In hashing technique, the hash table and hash function are used. Using the hash
function, we can calculate the address at which value can be stored.

Advantage-
Unlike other searching techniques,
 Hashing is extremely efficient.
 The time taken by it to perform the search does not depend upon the total
number of elements.
 It completes the search with constant time complexity O(1).

Why do we need hashing?


 We need something that can do better than a binary search, O (log n).
 We want O(1)

Solution: Hashing

Applications

 Data Retrieval: Hashing is widely used in data retrieval applications where quick
access to stored data is essential, such as databases, caches, and symbol tables in
compilers.

 Efficiency: The efficiency of hashing primarily depends on the quality of the hash
function and the effectiveness of collision resolution techniques. A good hash
function minimises collisions, while efficient collision resolution ensures optimal
overall performance.

1
Hash Table:

 A hash table is a data structure that stores records in an array, called a hash table. A
hash table can be used for quick insertion and searching.
 A hash table is a data structure that stores some information, and the information has
basically two main components, i.e. index and value.

HASH FUNCTION:

 Hash function takes the data item as an input and returns a hash value as an output.
 Hash value of the data item is then used as an index for storing it into the hash table.

Hash Key Value-

Types of Hash Functions-


There are various types of hash functions available such as
1. Division method
2. Mid Square
3. Folding method
4. Multiplication Method

Properties of Hash Function-


The properties of a good hash functions are-
1. It is efficiently computable.
2. It minimizes the number of collisions.
3. It distributes the keys uniformly over the table.

2
1. Division Method:
The k-value is divided by M in this hash function, and the result is used.

Formula:
H(K) = K mod M
(Here, K = key value, and M = the size of the hash table).

Example:
If the Key: 6, 20, 21, 24, 26, 32 is placed in the hash table and if the table size is 10 then
Solution:

H(6) = 6 % 10 = 6
H(20)= 20 % 10 = 0
H(21)= 21 % 10 = 1
H(24)=24 % 10 = 4
H(26)= 26 % 10 = 6 collision
H(32) =32 % 10 = 2

index Value
0 20
1 21
2 32
3
4 24
5
6 6
7
8
9

Hash table

3
2. Mid square Method:
In the mid square method, the key is squared and the middle or mid part of the result is
used as the index. If the key is a string, it has to be preprocessed to produce a number.

Example 2:
For the Hash Table size is 100
Let k=3205. The hash function square the K that is, K2= (3205)2
= 102]72[025
Take middle value. This middle value is the address of bucket. Mid square is applied,
when only the bucket size is a power of 2.

Example: 3

4
Example:
For the Hash Table size is 1000

4. Multiplication method

5
What is Collision?
The situation in which the hash function may return the same hash value for two or
more keys. When two or more keys have the same hash value, a collision happens.
To handle this collision, we use Collision Resolution Techniques.

6
Collision Resolution Techniques (CRT)
There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing

1) Separate Chaining
Separate Chaining is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple but requires
additional memory outside the table.
Formula:
H(k) = k mod m

Advantages:

 Insertion time-O(1)
 Simple to implement

Disadvantages:

 Wastage of space
 Uses extra space for links
 If the chain become long, then search time become O(n) in the worst case
 Case performance of chaining is not good as keys are stored using a linked list.
open addressing provides better cache performance as everything is stored in
the same table
 It requires pointers which causes the algorithm to slow down a bit.

7
Example 1:
Using the hash function ‘key mod 8’ insert the following sequence of keys in the Hash
table 36, 18, 72, 43, 6, 10, 5 and 15.
Use separate chaining technique for collision resolution
Solution:

8
2. Open Addressing
In open addressing, all the keys are stored inside the hash table and No key is stored
outside the hash table

1. Linear Probing
 In linear probing, the hash table is searched sequentially that starts from the
original location of the hash value.
 If in case the location that we get is already occupied, then we check for the
next location.
 If the end of the list is reached and no empty slot is found. The probing starts
at the beginning of the list.

Formula:
Following hash function is used to resolve the collision
h(k, i) = [h(k) + i] mod m
Where,
h(k) = k mod m
i = the probe number that varies from 0 to m-1

Note:
Linear probing suffers from both primary clustering and secondary clustering, while
Quadratic probing suffers only from secondary clustering.

Advantage:

 It does not require pointers.


 Linear probing require very less memory

Disadvantage:
 The main problem with linear probing is clustering.
 Primary clustering
 Secondary clustering
 The Search time can become(n) in worst case
 Deletion difficult

Problem with linear probing:


Linear probing suffers from both primary clustering and secondary clustering . Primary
clustering is a process in which a block of data is formed in the hash table when
collision is resolved

9
Example:

Note:

Example 2:
Using the hash function ‘key mod 10’, insert the following sequence of the keys are:
89, 18, 49, 58, 69 and table size = 10
Use linear probing technique for collision resolution. What will be location of key 69
h(k, i) = [h(k) + i] mod m
h(89,0) = 89 mod 10 = 9
h(18,0) = 18 mod 10 = 8
h(49,0) = 49 mod 10 = 9 (collision)
h(49,1) = (49+1) mod 10 = 0

h(58,0) = 58 mod 10 = 8 (collision)


h(58,1) = (58+1) mod 10 = 9 (collision)
h(58,2) = (58+2) mod 10 = 0 (collision)
h(58,3) = (58+3) mod 10 = 1

h(69,0) = 69 mod 10 = 9 (collision)


h(69,1) = (69+1) mod 10 = 0 (collision)
h(69,2) = (69+2) mod 10 = 1 (collision)
h(69,3) = (69+3) mod 10 = 2
So, key 69 will be inserted in index-2 of the hash table

10
Example 3:

Detailed Solution:

Hence, the value 18 would be stored at the index 5.


So, key 18 will be inserted in index-5 of the hash table

11
[Link] Probing

 It is a kind of open addressing [Link] is a collision resolution method that


eliminates the primary clustering problem of linear probing.
 In Quadetic probing, we get secondary clustering problem.

Formula:
h(k, i) = (h(k) + i2 )mod m

Where m is the hash table size & i=0,1,2.....m-1

Example:
Keys are: 21, 44, 53, 67, 31, 51, 81
Hash table size=10
Solution:
h(k, i) = (h(k) + i2 )mod m index Value
0 81
h(21,0) = 21 % 10 = 1 1 21
h(44,0) = 44 % 10 = 4 2 31
h(53,0) = 53 % 10 = 3 3 53
h(67,0) = 67 % 10 = 7 4 44
h(31,0) = (31+0) % 10 = 1 collision 5 51
2 6
h(31,1) = (1+1 ) % 10 = 2 7 67
8
h(51,0) = (51+0) % 10 = 1 collision 9
h(51, 1) = (1+ 12) % 10 = 2 collision
h(51, 2) = (1+ 22) % 10 = 5

h(81,0) = (81 + 0)% 10 = 1


h(81,1) = (1 + 12)% 10 = 2 collision
2
h(81,2) = (1 + 2 )% 10 = 5 collision
2
h(81,3) = (1 + 3 )% 10 = 0
Secondary Clustering: Elements that Hash to the same position will probe the same
alternative cells. This is known as secondary clustering.

Advantages:
 No extra space
 Primary cluster resolved

Disadvantages:
 Search time –O(n)
 Secondary clustering
 No guarantee finding slot

12
3. Double Hashing:

 It is a collision resolution technique which uses two hash functions to handle


collision.
 Double hashing is a technique that reduces clustering in an optimized way.

Formula:
h(k, i) = [h1(k) + i*h2(k)] mod m
Where,
h1(k) = k mod m
h2(k) = prime number-(k mod prime number)

Example:

The keys are: 89, 18, 49, 58, and 69


The hash able size are = 10

Solution:

h1(k) =k mod 10, h2(k) = 7 - (k mod 7)

h1(69) = 69 mod 10 = 9 (occupied)


h2(69) = 7-(69 mod 7) = 7 - 6 = 1.
h1(69) = (9+1) mod 10 = 0

APPLICATIONS:

 It is used in caches.
 It is used in finding duplicate records.
 Used in finding similar records.
 Finding similar substrings.

Advantages:
 Avoids clustering
Disadvantages: O(n)

13
AKTU Questions
1. What do you mean by hashing and collision? Discuss the advantages and
disadvantages of hashing over other searching techniques.

2. What is hashing? Explain all probing technique to resolve collision in open


addressing.

3. Classify the hashing functions based on the various methods by which key value is
found.
4. What is hashing? Give the characteristics of hash function. Explain collision
resolution technique in hashing.
5. How using hash table is beneficial for us? Explain collision -resolution strategies used
in hash table
6. What are the disadvantages of linear probing in hashing? Discuss how quadratic
probing can be used to solve some of these problems.
7. A hash function H defined as H(key) =key%7, with linear proving is used to insert the
key 37, 38, 72, 48, 98, 11, 56 into a table indexed from 0 to 6. What will be the
location of key 48 in hash table? Justify your answer.
8. What is hashing? Define hash function. Discuss open hashing and close hashing with
suitable example.
9.

10.

14

You might also like