
Hashing

The sequential search algorithm takes time proportional to the data size, i.e., O(n).
Binary search improves on linear search, reducing the search time to O(log n).
With a BST, an O(log n) search efficiency can be obtained, but the worst-case complexity is O(n).
To guarantee O(log n) search time, BST height balancing is required (i.e., AVL trees).
Hashing

Suppose that we want to store 10,000 student records (each with a 5-digit ID) in a given container.

· A linked list implementation would take O(n) access time.

· A height-balanced tree would give O(log n) access time.

· Using an array of size 100,000 would give O(1) access time but would lead to a lot of space wastage.
Hashing

Is there some way that we could get O(1) access without wasting a lot of space?
The answer is hashing.
Constant time per operation (on average).
Like an array: come up with a function to map the large key range into one which we can manage.
Basic Idea

Use a hash (or hashing) function to map a hash key to a hash address (location) in a hash table
Hash function H: K -> L
If student A has ID (key) k and h is the hash function, then A's details are stored in position h(k) of the table
To search for A, compute h(k) to locate the position. If no element is there, the hash table does not contain A.
Example

Let the keys be the IDs of 100 students, each ID in a form like 345610.

Now, we decide to take an array A[100],
and the hash function is, say, the LAST TWO DIGITS.

So 103062 will go to location 62, and likewise, if someone has ID 113062, it again goes to location 62.
THIS EVENT IS CALLED A COLLISION
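The example above can be sketched in a few lines of Python (illustrative code, not from the notes; the function name is hypothetical): the hash function keeps only the last two digits of the ID, so two IDs ending in the same pair of digits collide.

```python
# Illustrative sketch: hash a student ID by its last two digits.
def h(key):
    return key % 100  # "last two digits" of the ID

print(h(103062))  # 62
print(h(113062))  # 62 -- same location: a collision
```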
Collisions

[Figure: keys k1..k5 drawn from the universe U of keys are hashed into a table with slots 0..m-1; h(k2) = h(k5) — a collision]
Collisions
Two or more keys hash to the same location
For a given set K of keys
If |K| ≤ m, collisions may or may not happen, depending on the hash function
If |K| > m, collisions will definitely happen (i.e., there must be at least two keys that have the same hash value)
Avoiding collisions completely is hard, even with a good hash function
Collision Resolution

Methods:
Separate chaining (open hashing)
Open addressing (closed hashing)
  Linear probing
  Quadratic probing
  Double hashing

We will discuss chaining first, and ways to build “good” hash functions.
Collision Resolution with Chaining - Discussion
Choosing the size of the table:
Small enough not to waste space
Large enough that the lists remain short
Typically 1/5 or 1/10 of the total number of elements

How should we keep the lists: ordered or not?
Not ordered!
Insert is fast
Can easily remove the most recently inserted elements
Insertion in Hash Tables

Worst-case running time is O(1)
This assumes that the element being inserted isn't already in the list
It would take an additional search to check whether it was already inserted
Deletion in Hash Tables
Need to find the element to be deleted.
Worst-case running time:
Deletion depends on searching the corresponding list
Searching in Hash Tables

Search for an element with key k in list T[h(k)]

Running time is proportional to the length of the list of elements at location h(k)
Analysis of Hashing with Chaining:
Worst Case
How long does it take to search for an element with a given key?

Worst case:
All n keys hash to the same slot, forming a single chain of length n
Worst-case time to search is Θ(n), plus the time to compute the hash function
Load Factor of a Hash Table

Load factor of a hash table T:
α = n/m
n = # of elements stored in the table
m = # of locations in the table = # of linked lists (chains)
α is the average number of elements stored in a chain
α can be <, =, or > 1
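The chaining scheme described above can be sketched as a small Python class (an illustrative implementation with hypothetical names; the notes give no code). Each of the m slots holds an unordered list, insertion appends in O(1), and search and delete walk a single chain.

```python
# Minimal separate-chaining hash table (illustrative sketch).
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]     # m empty chains

    def _h(self, key):
        return key % self.m                     # division-method hash

    def insert(self, key):
        # O(1) worst case, assuming the key is not already present
        self.table[self._h(key)].append(key)

    def search(self, key):
        # proportional to the length of the chain at T[h(k)]
        return key in self.table[self._h(key)]

    def delete(self, key):
        # cost is dominated by searching the corresponding chain
        chain = self.table[self._h(key)]
        if key in chain:
            chain.remove(key)

    def load_factor(self):
        n = sum(len(chain) for chain in self.table)
        return n / self.m                       # alpha = n / m

t = ChainedHashTable(7)
for k in (14, 8, 21, 2):
    t.insert(k)
print(t.search(21))               # True (14 and 21 share chain 0)
print(round(t.load_factor(), 2))  # 0.57, i.e. alpha = 4/7
```

Note that the chains are kept unordered, exactly as the discussion above recommends: appending is O(1), and the most recently inserted elements are easy to remove.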
Hash Functions

A hash function transforms a hash key into a hash table address

What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
In practice, it is very hard to satisfy the simple uniform hashing property,
i.e., we don't know in advance the probability distribution that keys are drawn from
Good Approaches for Hash Functions

Minimize the chance that closely related keys hash to the same slot

Strings such as pt and pts should hash to different slots

Derive a hash value that is independent of any patterns that may exist in the distribution of the keys

Hash keys such as 199 and 499 should hash to different slots
The Division Method
Idea:

Map a key k into one of the m slots by taking the remainder of k divided by m:
h(k) = k mod m
m is usually chosen to be a prime number, or at least a number without small divisors,
to minimize the number of collisions
Advantage:
Fast; requires only one operation
Disadvantage:
Certain values of m are bad, e.g.,
powers of 2
non-prime numbers
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.

k = 12345, M = 95
h(12345) = 12345 mod 95 = 90

k = 1276, M = 11
h(1276) = 1276 mod 11 = 0
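The division method is a one-liner; the sketch below (hypothetical function name) checks it against both worked examples.

```python
# The division method: h(k) = k mod m.
def h_div(k, m):
    return k % m  # remainder of k divided by m

print(h_div(12345, 95))  # 90
print(h_div(1276, 11))   # 0
```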
The Midsquare Method
Idea:

Key k is squared.
h(k) = l, where l is obtained by deleting digits from both ends of k²
The same positions of k² are used for all the keys

K:    3205      7148      2345
K²:   10272025  51093904  5499025
h(K): 72        93        99
The mid-square method is a very good hashing method. It involves two steps to compute the hash value:
(1) Square the value of the key k, i.e., compute k².
(2) Extract the middle r digits as the hash value.
Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required to map a key to a memory location.

k = 60
k × k = 60 × 60 = 3600
h(60) = 60

The hash value obtained is 60.
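A possible Python sketch of the mid-square method with r = 2. Taking the two digits centred on the middle of k² reproduces the table's values, but this indexing is one plausible interpretation; texts differ slightly on exactly which digits count as the "middle".

```python
# Mid-square sketch: square k, then take the middle two digits of k*k.
def h_midsquare(k):
    sq = str(k * k)
    mid = len(sq) // 2
    return int(sq[mid - 1 : mid + 1])  # middle r = 2 digits

for k in (3205, 7148, 2345, 60):
    print(k, k * k, h_midsquare(k))
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
# 60 3600 60
```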


The Folding Method
Idea:

Key k is partitioned into a number of parts k1, k2, …, kr, where each part except possibly the last
has the same number of digits as the required address
h(k) = k1 + k2 + … + kr,
where leading-digit carries are ignored, if any
Sometimes the even-numbered parts are reversed before adding

K:     3205   7148   2345
Parts: 32+05  71+48  23+45
h(K):  37     19     68
This method involves two steps:

(1) Divide the key value k into a number of parts k1, k2, k3, …, kn, where each part has the same number of digits, except for the last part, which can have fewer digits than the others.
(2) Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:

k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(k) = s

Here,
s is obtained by adding the parts of the key k
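The folding steps can be sketched as follows, under the assumptions used in the worked examples (two-digit parts, a 100-slot table); taking the sum mod 100 is what "ignoring the carry" amounts to.

```python
# Folding sketch: split the key's digits into two-digit parts, add them,
# and drop the leading carry by reducing mod the table size.
def h_fold(k, part_len=2, table_size=100):
    digits = str(k)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size

print(h_fold(3205))  # 32 + 05 = 37
print(h_fold(7148))  # 71 + 48 = 119 -> carry ignored -> 19
print(h_fold(2345))  # 23 + 45 = 68
```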
Open Addressing
If we have enough contiguous memory to store all the keys (m > n), we can store the keys in the table itself
It is called "open" addressing because the address where key k is stored depends on the keys already stored in the hash table, along with h(k)
It is also called closed hashing
(Hashing with chaining, by contrast, is called open hashing)

No need to use linked lists anymore

Basic idea:
Insertion: if a slot is full, try another one,
until you find an empty one
Search: follow the same sequence of probes
Deletion: more difficult

Search time depends on the length of the probe sequence!
Common Open Addressing Methods

Linear probing
Quadratic probing
Double hashing

Linear probing: Inserting a key
Idea: when there is a collision, check the next available
position in the table (i.e., probing)

(h(k) + i) mod m, i = 0, 1, 2, ...

First slot probed: h(k)
Second slot probed: h(k) + 1
Third slot probed: h(k) + 2, and so on
Probe sequence: < h(k), h(k)+1, h(k)+2, ... >
The process wraps around to the beginning of the table
Linear Probing Example

Insert 14, 8, 21, 2 into a table of size 7, with h(k) = k % 7:

insert(14): 14 % 7 = 0 → slot 0                    (1 probe)
insert(8):   8 % 7 = 1 → slot 1                    (1 probe)
insert(21): 21 % 7 = 0; slots 0, 1 full → slot 2   (3 probes)
insert(2):   2 % 7 = 2; slot 2 full → slot 3       (2 probes)

Final table: [0]=14, [1]=8, [2]=21, [3]=2, slots 4-6 empty
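The insertions above can be reproduced with a short sketch (illustrative code, not the notes' own; it assumes h(k) = k % 7 as in the example).

```python
# Linear probing insert: probe h(k), h(k)+1, h(k)+2, ... (mod m).
def lp_insert(table, key):
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m      # wraps around the table
        if table[slot] is None:
            table[slot] = key
            return i + 1              # number of probes used
    raise RuntimeError("table full")

table = [None] * 7
probes = [lp_insert(table, k) for k in (14, 8, 21, 2)]
print(table)   # [14, 8, 21, 2, None, None, None]
print(probes)  # [1, 1, 3, 2]
```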
Quadratic probing: Inserting a key
Idea: when there is a collision, check the next
available position in the table, probing

(h(k) + i²) mod m, i = 0, 1, 2, ...

Probe sequence: < h(k), h(k)+1, h(k)+4, ... >
First slot probed: h(k)
Second slot probed: h(k) + 1
Third slot probed: h(k) + 4, and so on
The process wraps around to the beginning of the table

The clustering problem is less serious,
but it is still an issue (secondary clustering):
multiple keys hashed to the same spot all follow the same
probe sequence.
Quadratic Probing Example

Insert 14, 8, 21, 2 into a table of size 7, with h(k) = k % 7:

insert(14): 14 % 7 = 0 → slot 0                                  (1 probe)
insert(8):   8 % 7 = 1 → slot 1                                  (1 probe)
insert(21): 21 % 7 = 0; slots 0+0, 0+1 full → slot 0 + 4 = 4     (3 probes)
insert(2):   2 % 7 = 2 → slot 2                                  (1 probe)

Final table: [0]=14, [1]=8, [2]=2, [4]=21, slots 3, 5, 6 empty
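The same example under quadratic probing can be sketched as follows (illustrative code; h(k) = k % 7 as above). Note how 21 ends up in slot 4 rather than slot 2.

```python
# Quadratic probing insert: probe h(k), h(k)+1, h(k)+4, h(k)+9, ... (mod m).
def qp_insert(table, key):
    m = len(table)
    for i in range(m):
        slot = (key % m + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return i + 1              # number of probes used
    raise RuntimeError("no empty slot found")

table = [None] * 7
probes = [qp_insert(table, k) for k in (14, 8, 21, 2)]
print(table)   # [14, 8, 2, None, 21, None, None]
print(probes)  # [1, 1, 3, 1]
```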
Double Hashing
(1) Use one hash function to determine the first slot
(2) Use a second hash function to determine the increment for the
probe sequence
(h1(k) + i·h2(k)) mod m, i = 0, 1, ...
Initial probe: h1(k)
Second probe is offset by h2(k) mod m, and so on
Advantage: avoids clustering
Disadvantage: harder to delete an element
Double Hashing: Example

h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
h(k, i) = (h1(k) + i·h2(k)) mod 13

Table (size 13) before the insert: [1]=79, [4]=69, [5]=98, [7]=72, [11]=50

Insert key 14:
h1(14) = 14 mod 13 = 1                 (occupied by 79)
h(14, 1) = (h1(14) + h2(14)) mod 13
         = (1 + 4) mod 13 = 5          (occupied by 98)
h(14, 2) = (h1(14) + 2·h2(14)) mod 13
         = (1 + 8) mod 13 = 9          (empty → 14 goes to slot 9)
Double Hashing Example

Insert 14, 8, 21, 2, 7 into a table of size 7, with h1(k) = k % 7 and h2(k) = 5 - (k % 5):

insert(14): 14 % 7 = 0 → slot 0                                               (1 probe)
insert(8):   8 % 7 = 1 → slot 1                                               (1 probe)
insert(21): 21 % 7 = 0 full; h2(21) = 5 - (21 % 5) = 4 → slot (0 + 4) % 7 = 4 (2 probes)
insert(2):   2 % 7 = 2 → slot 2                                               (1 probe)
insert(7):   7 % 7 = 0 full; h2(7) = 5 - (7 % 5) = 3 → slot (0 + 3) % 7 = 3   (2 probes)
Double Hashing Example

Insert 14, 8, 21, 2, 56 into a table of size 7, with h1(k) = k % 7 and h2(k) = 5 - (k % 5):

insert(14): 14 % 7 = 0 → slot 0                                                        (1 probe)
insert(8):   8 % 7 = 1 → slot 1                                                        (1 probe)
insert(21): 21 % 7 = 0 full; h2(21) = 5 - (21 % 5) = 4 → slot 4                        (2 probes)
insert(2):   2 % 7 = 2 → slot 2                                                        (1 probe)
insert(56): 56 % 7 = 0 full; h2(56) = 5 - (56 % 5) = 4 → slots 4, 1 full → slot (0 + 3·4) % 7 = 5   (4 probes)
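Under the slides' assumptions (m = 7, h1(k) = k % 7, h2(k) = 5 - (k % 5)), the probe counts in both double-hashing examples can be checked with a sketch:

```python
# Double hashing insert: probe h1(k), h1(k)+h2(k), h1(k)+2*h2(k), ... (mod m).
def dh_insert(table, key):
    m = len(table)
    step = 5 - (key % 5)              # h2 is in 1..5, never 0, so probing moves
    for i in range(m):
        slot = (key % m + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return i + 1              # number of probes used
    raise RuntimeError("no empty slot found")

table = [None] * 7
probes = [dh_insert(table, k) for k in (14, 8, 21, 2, 56)]
print(table)   # [14, 8, 2, None, 21, 56, None]
print(probes)  # [1, 1, 2, 1, 4]
```

Note the contrast with linear probing: 21 and 56 both hash to slot 0, but their shared increment of 4 still spreads them across the table, and 56 is placed after 4 probes instead of sliding into the first adjacent free slot.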
