0% found this document useful (0 votes)
21 views45 pages

Module 4 Hashing

Uploaded by

adithya.pr2022
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views45 pages

Module 4 Hashing

Uploaded by

adithya.pr2022
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Database Systems

BCSE302L
Module 4 – Hashing

Dr. Balasundaram A
Assistant Professor (Sr.)
SCOPE, VIT Chennai
Topics to be covered

➢ Hashing– Static and Dynamic

6 March 2023 VITCC-BCSE202L 2


Hashing
➢ With different searching techniques, search time is basically dependent
on the number of elements.

➢ So many key comparisons are also involved.

➢ Aim is to search an element in constant time and less key comparisons


should be involved

6 March 2023 VITCC-BCSE202L 3


Hashing
➢ Example: Suppose all the elements are in array size N. Let us take all the keys are
unique and in the range 0 to N-1. Now we are storing the records in array based on
the key where the array index and keys are same. Now we can access the records in
constant time and no comparison is involved. Consider the following example:

➢ 5 records are there where keys are:

➢9 4 6 7 2

➢ It will be stored in array as:

6 March 2023 VITCC-BCSE202L 4


Hashing
➢ So for storing records:

➢ Key → Generate array index → Store the record on that array index

➢ For accessing records:

➢ Key → Generate array index → Get the record from that array index

6 March 2023 VITCC-BCSE202L 5


Hashing
➢ The generation of array index uses hash function, which converts the
keys into array index.

➢ And the array which supports hashing for storing records or searching
records is called Hash table.

6 March 2023 VITCC-BCSE202L 6


Hashing
➢ Let’s take an example:
0
1
2 011 Delhi
3
4 022 Bombay
5
6 033 Kolkata
7
8 044 Chennai
9

➢ Hash function is denoted by h(k) notation.


6 March 2023 VITCC-BCSE202L 7
Choosing a hash function
➢ It should be easy to compute

➢ It should generate unique addresses. However, it is an ideal situation, so


it should generate address with minimum collision.

➢ Truncation Method
➢ Mid Square Method
➢ Folding Method
➢ Modular Method
➢ Hash function for floating point numbers
➢ Hash function for strings

6 March 2023 VITCC-BCSE202L 8


Truncation Method
➢ Keys: 82394561, 87139465, 83567271, 85943228

➢ Idea:
➢ Either take leftmost or rightmost digits based on the size of hash table.

➢ Suppose the size is 100, then take either 2 rightmost or 2 leftmost digits.

➢ Suppose, we are taking rightmost digits, then hash table entry address will be 61,
65, 71, 28

6 March 2023 VITCC-BCSE202L 9


Mid Square Method
➢ Keys: 1337, 1273, 1391, 1026

➢ Idea:

➢ Square the keys—


➢ 1337 → 1787569
➢ 1273 → 1620529
➢ 1391 → 1934881
➢ 1026 → 1052676

➢ Now, take the 3rd and 4th digit from each number and that will be hash address of
these keys.
6 March 2023 VITCC-BCSE202L 10
Folding Method
➢ Keys: 82394561, 87139465, 83567271, 85943228

➢ Idea:
➢ Now, chop these into pieces of 3, 2, and 3 digits and add them.
➢ 82394561= 823 + 94 + 561 = 1478
➢ 87139465 = 871 + 39 + 465 = 1375
➢ …………………….

➢ If it is still large, we can use truncation method as follows:

➢ h(82394561) = 478

6 March 2023 VITCC-BCSE202L 11


Modular Method
➢ Keys: 82394561, 87139465, 83567271, 85943228

➢ Idea:

➢ Let the table size is 97, then address will be:

➢ 82394561 % 97 = 45
➢ 87139465 % 97 = 0
➢ 83567271 % 97 = 25
➢ 83567271 % 97 = 64

➢ So, here the hash address of above keys will be 45, 0 , 25, and 64.
➢ We can combine any method with modular method also.
6 March 2023 VITCC-BCSE202L 12
Hash function for floating point numbers
➢ Keys: 123.4321, 19.463, 2.0298, 8.9956

➢ Idea:
i. Take the fractional part of the key.
ii. Multiply the fractional part with the size of the hash tables.
iii. Take the integer part of the multiplication result as a hash address of the key.

➢ Let the table size is 97, then address will be:

➢ 0.4321 * 97 = 41.9137
➢ 0.463 * 97 = 44.911
➢ 0.0298 * 97 = 2.8906
➢ 0.9956 * 97 = 96.5732
6 March 2023 VITCC-BCSE202L 13
Hash function for floating point numbers
➢ Keys: 123.4321, 19.463, 2.0298, 8.9956

➢ Now, the hash address will be the integer part of these numbers.
➢ h(123.4321) = 41
➢ h(19.463) = 44
➢ h(2.0298) = 2
➢ h(8.9956) = 96

6 March 2023 VITCC-BCSE202L 14


Hash function for strings
➢ Key: suresh

➢ Idea:
i. We can add the ASCII value of each character and then we can apply modular method.

➢ Let the table size is 97, then address will be:

➢ suresh = s + u + r + e + s + h = 115+117+117+101+115+104 = 666


➢ 666 % 97 = 84

➢ h(suresh) = 84

6 March 2023 VITCC-BCSE202L 15


Collision
➢ A collision occurs when two keys are hashed to the same index in a
hash table.

➢ Collisions are a problem because every slot in a hash table is supposed


to store a single element.

6 March 2023 VITCC-BCSE202L 16


Collision Resolution Techniques: Handling
Collision
➢ There are two broad ways of collision resolution:

1. Static Hashing
2. Dynamic Hashing

6 March 2023 VITCC-BCSE202L 17


Collision Resolution Techniques:
Static Hashing
➢ There are two broad ways of collision resolution using static hashing:

1.Separate Chaining (Open Hashing): An array of linked list


implementation.

2. Open Addressing (Closed Hashing): Array-based implementation.

(i) Linear probing (linear search)


(ii) Quadratic probing (nonlinear search)
(iii) Double hashing (uses two hash functions)

6 March 2023 VITCC-BCSE202L 18


Separate Chaining (Open Hashing)
1. Separate Chaining (Open
Hashing): An array of linked
list implementation.

• The idea behind separate


chaining is to implement the
array as a linked list called a
chain. Separate chaining is one
of the most popular and
commonly used techniques in
order to handle collisions.

6 March 2023 VITCC-BCSE202L 19


Separate Chaining (Open Hashing)
Example: Let us consider a
simple hash function as “key mod
7” and a sequence of keys as 50,
700, 76, 85, 92, 73, 101

6 March 2023 VITCC-BCSE202L 20


Separate Chaining (Open Hashing):
Advantages and Disadvantages
Advantages: Disadvantages:
• Simple to implement. • The cache performance of chaining is
not good as keys are stored using a
• Hash table never fills up, we can linked list. Open addressing provides
always add more elements to the better cache performance as
chain. everything is stored in the same table.

• Wastage of Space (Some Parts of the
• Less sensitive to the hash hash table are never used)
function or load factors.
• If the chain becomes long, then search
• It is mostly used when it is time can become O(n) in the worst
unknown how many and how case
frequently keys may be inserted
or deleted. • Uses extra space for links
6 March 2023 VITCC-BCSE202L 21
Closed Hashing: Linear Probing
➢ Example: Perform the operations given below, in the given order, on an
initially empty hash table of size 11 using linear probing and the hash
function: h(key) = key % 11:

➢ Insert: 29, 18, 43, 10, 36, 25, 46

➢ The required probe sequences are given by:


h(key) = h(key) % 11 i = 0, 1, 2, . . ., 10

6 March 2023 VITCC-BCSE202L 22


Closed Hashing: Linear Probing
➢ h(29) = 29 % 11 = 7 (Success) 0 10
➢ h(18) = 18 % 11 = 7 (Collision) 1
h(18) = (18+1)%11 = 8 (Success) 2 46
➢ h(43) = 43 % 11 = 10 (Success) 3 36
➢ h(10) = 10 % 11 = 10 (Collision) 4 25
h(10) = (10+1) %11 = 0 (Success) 5
➢ h(36) = 36 % 11 = 3 (Success) 6
➢ h(25) = 25 % 11 = 3 (Collision) 7 29
h(25) = (25+1)%11 = 4 (Success) 8 18
➢ h(46) = 46 % 11 = 2 (Success) 9
10 43

6 March 2023 VITCC-BCSE202L 23


Closed Hashing: Linear Probing
➢ The main disadvantage of the linear probing is clustering problem.

➢ When half of the table is full, it is difficult to find empty position in


hash table in case of collision.

6 March 2023 VITCC-BCSE202L 24


Closed Hashing: Quadratic Probing
➢ Suppose hash address is h, then in case of collision, linear probing searches the
location h, h+1, h+2, … % TableSize

➢ Here, in quadratic probing, it searches the locations (h+i2)% TableSize (for i = 0, 1,


2, 3, …).

6 March 2023 VITCC-BCSE202L 25


Closed Hashing: Quadratic Probing
➢ Example: 29, 18, 43, 10, 46, 54 0 10
1
➢ h(29) = 29%11 = 7 (Success) 2 46
➢ h(18) = 18%11 = 7 (Collision) 3 54
h(18) = (18+12)%11 = 8 (Success) 4
➢ h(43) = 43%11 = 10 (Success) 5
➢ h(10) = 10%11 = 10 (Collision) 6
h(10) = (10+1)%11 = 0 (Success) 7 29
➢ h(46) = 46%11 = 2 (Success) 8 18
➢ h(54) = 54%11 = 10 (Collision) 9
h(54) = (54+12)%11 = 0 (Collision) 10 43
h(54) = (54+22)%11 = (54+4)%11
= 3 (Success)

6 March 2023 VITCC-BCSE202L 26


Closed Hashing: Double Hashing
➢ Double hashing is a collision resolution technique used in hash tables. It works by
using two hash functions to compute two different hash values for a given key. The
first hash function is used to compute the initial hash value, and the second hash
function is used to compute the step size for the probing sequence.

➢ Double hashing has the ability to have a low collision rate, as it uses two hash
functions to compute the hash value and the step size. This means that the
probability of a collision occurring is lower than in other collision resolution
techniques such as linear probing or quadratic probing.

➢ However, double hashing has a few drawbacks. First, it requires the use of two hash
functions, which can increase the computational complexity of the insertion and
search operations. Second, it requires a good choice of hash functions to achieve
good performance. If the hash functions are not well-designed, the collision rate
may still be high.
6 March 2023 VITCC-BCSE202L 27
Closed Hashing: Double Hashing (cond…)
➢ Double hashing can be done using :

➢ (hash1(key) + i * hash2(key)) % TABLE_SIZE

➢ Here hash1() and hash2() are hash functions and TABLE_SIZE is size of hash table.

➢ First hash function is typically hash1(key) = key % TABLE_SIZE

➢ A popular second hash function is hash2(key) = PRIME – (key % PRIME) where


PRIME is a prime smaller than the TABLE_SIZE.

6 March 2023 VITCC-BCSE202L 28


Closed Hashing: Double Hashing: Example

6 March 2023 VITCC-BCSE202L 29


Dynamic Hashing
Extendible Hashing

➢ It is a dynamic hashing method wherein directories, and buckets are used to hash
data. It is an aggressively flexible method in which the hash function also
experiences dynamic changes.

➢ Main features of Extendible Hashing: The main features in this hashing technique
are:
➢ Directories: The directories store addresses of the buckets in pointers. An id is
assigned to each directory which may change each time when Directory
Expansion takes place.
➢ Buckets: The buckets are used to hash the actual data.

6 March 2023 VITCC-BCSE202L 30


Basic Structure of Extendible Hashing:

6 March 2023 VITCC-BCSE202L 31


Frequently used terms in Extendible Hashing:
• Directories: These containers store pointers to buckets. Each directory is given a unique id which
may change each time when expansion takes place. The hash function returns this directory id
which is used to navigate to the appropriate bucket. Number of Directories = 2^Global Depth.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may contain more
than one pointers to it if its local depth is less than the global depth.
• Global Depth: It is associated with the Directories. They denote the number of bits which are used
by the hash function to categorize the keys. Global Depth = Number of bits in directory id.
• Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is
associated with the buckets and not the directories. Local depth in accordance with the global
depth is used to decide the action that to be performed in case an overflow occurs. Local Depth is
always less than or equal to the Global Depth.
• Bucket Splitting: When the number of elements in a bucket exceeds a particular size, then the
bucket is split into two parts.
• Directory Expansion: Directory Expansion Takes place when a bucket overflows. Directory
Expansion is performed when the local depth of the overflowing bucket is equal to the global
depth.
6 March 2023 VITCC-BCSE202L 32
Basic Working of Extendible Hashing:
• Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer, String, Float,
etc.. Currently, let us consider data elements of type integer. eg: 49.
• Step 2 – Convert into binary format: Convert the data element in Binary form. For string elements,
consider the ASCII equivalent integer of the starting character and then convert the integer into
binary form. Since we have 49 as our data element, its binary form is 110001.
• Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-directory is 3.
• Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary number
and match it to the directory id.
Eg. The binary obtained is: 110001 and the global-depth is 3. So, the hash function will return 3
LSBs of 110001 viz. 001.
• Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id 001.
• Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows. If an
overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.

6 March 2023 VITCC-BCSE202L 33


Basic Working of Extendible Hashing:
(contd…)
• Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while inserting data in
the buckets, it might happen that the Bucket overflows. In such cases, we need to follow an
appropriate procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of the
cases below.
• Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment the
global depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
• Case2: In case the local depth is less than the global depth, then only Bucket Split takes place.
Then increment only the local depth value by 1. And, assign appropriate pointers.

• Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing bucket
that is split are rehashed w.r.t the new global depth of the directory.
• Step 9 – The element is successfully hashed.

6 March 2023 VITCC-BCSE202L 34


Basic Working of Extendible Hashing:
(contd…)

6 March 2023 VITCC-BCSE202L 35


Extendible Hashing: Example
• Now, let us consider a prominent Solution: First, calculate the binary
example of hashing the following forms of each of the given numbers.
elements: 16- 10000
16,4,6,22,24,10,31,7,9,20,26. 4- 00100
6- 00110
• Bucket Size: 3 (Assume) 22- 10110
24- 11000
• Hash Function: Suppose the 10- 01010
global depth is X. Then the Hash 31- 11111
Function returns X LSBs. 7- 00111
9- 01001
20- 10100
26- 11010

6 March 2023 VITCC-BCSE202L 36


Extendible Hashing: Example (cond…)
• Initially, the global-depth and Inserting 16:
local-depth is always 1. Thus, the The binary format of 16 is 10000 and
hashing frame looks like this: global-depth is 1. The hash function
returns 1 LSB of 10000 which is 0. Hence,
16 is mapped to the directory with id=0.

6 March 2023 VITCC-BCSE202L 37


Extendible Hashing: Example (cond…)
• Inserting 4 and 6: Inserting 22: The binary form of 22 is
• Both 4(100) and 6(110)have 0 in 10110. Its LSB is 0. The bucket pointed by
their LSB. Hence, they are hashed directory 0 is already full. Hence, Over
as follows: Flow occurs.

6 March 2023 VITCC-BCSE202L 38


Extendible Hashing: Example (cond…)
• As directed by Step 7-Case 1, Since
Local Depth = Global Depth, the
bucket splits and directory expansion
takes place. Also, rehashing of
numbers present in the overflowing
bucket takes place after the split.
And, since the global depth is
incremented by 1, now, the global
depth is 2. Hence, 16,4,6,22 are now
rehashed w.r.t 2 LSBs.[
16(10000),4(100),6(110),22(10110) ]

*Notice that the bucket which was underflow has remained untouched. But, since the number of directories has
doubled, we now have 2 directories 01 and 11 pointing to the same bucket. This is because the local-depth of the
bucket has remained 1. And, any bucket having a local depth less than the global depth is pointed-to by more than
one directories.
6 March 2023 VITCC-BCSE202L 39
Extendible Hashing: Example (cond…)
• Inserting 24 and 10: 24(11000) Inserting 31,7,9: All of these elements[
and 10 (1010) can be hashed 31(11111), 7(111), 9(1001) ] have either 01
based on directories with id 00 or 11 in their LSBs. Hence, they are
and 10. Here, we encounter no mapped on the bucket pointed out by 01
overflow condition. and 11. We do not encounter any overflow
condition here.

6 March 2023 VITCC-BCSE202L 40


Extendible Hashing: Example (cond…)
• Inserting 20: Insertion of 20 is inserted in bucket pointed out by 00. As
data element 20 (10100) directed by Step 7-Case 1, since the local depth of
will again cause the the bucket = global-depth, directory expansion
overflow problem. (doubling) takes place along with bucket splitting.

6 March 2023 VITCC-BCSE202L 41


Extendible Hashing: Example (cond…)
• Inserting 26: Global depth is
3. Hence, 3 LSBs of
26(11010) are considered.
Therefore 26 best fits in the
bucket pointed out by
directory 010.

6 March 2023 VITCC-BCSE202L 42


Extendible Hashing: Example (cond…)
• The bucket overflows, and,
as directed by Step 7-Case 2,
since the local depth of
bucket < Global depth (2<3),
directories are not doubled
but, only the bucket is split
and elements are rehashed.
• Finally, the output of hashing
the given list of numbers is
obtained.

6 March 2023 VITCC-BCSE202L 43


Extendible Hashing: Example (cond…)
• Key Observations:

• A Bucket will have more than one pointers pointing to it if its local depth is less than
the global depth.

• When overflow condition occurs in a bucket, all the entries in the bucket are rehashed
with a new local depth.

• The size of a bucket cannot be changed after the data insertion process begins.

6 March 2023 VITCC-BCSE202L 44


Thank You!

6 March 2023 VITCC-BCSE202L 45

You might also like