0% found this document useful (0 votes)
19 views41 pages

Unit - 4 Hashing

Uploaded by

bhumikadeshkar1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views41 pages

Unit - 4 Hashing

Uploaded by

bhumikadeshkar1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 41

Unit IV - Hashing

Data Structures

Data Structures Unit-IV Hashing


Syllabus
Hash Table : Concepts-hash table, hash function, basic operations (1 Hr)
Bucket, collision, probe, synonym, overflow (1 Hr)
Open hashing, closed hashing, perfect hash function (2 Hr)
Load density, full table, load factor, rehashing, properties of good hash
function (1 Hr)
Collision resolution strategies- open addressing and chaining (1 Hr)
Hash table overflow- open addressing and chaining (2 Hr)
Extendible hashing, closed addressing and separate chaining (1 Hr)

Case study : Dictionary Application using Hash Tables


Description: Implement a dictionary where words and meanings are stored
and retrieved using hashing with collision resolution
Data Structures Unit-IV Hashing
What is Hashing??

Hashing is a process used in computer science to


convert data into a fixed-size value, usually for
purposes like fast data retrieval, data
comparison, encryption, or checksums. It's a
fundamental concept in areas such as data
structures, cryptography, and computer
security.
Hash Table : Concepts-hash table, hash function,
basic operations

What is a Hash Table?


• A hash table (or hash map) is a data structure that provides
efficient insertion, deletion, and search operations, typically in
average-case constant time O(1).
• It stores key-value pairs.
• Keys are transformed into an index of an array (called buckets
or slots) using a hash function.
• Handles collisions when two keys map to the same index.

Data Structures Unit-IV Hashing


A hash table is a data structure that allows for fast data
retrieval. It stores key-value pairs, where each key is
mapped to a value through a process called hashing.

Key Concepts:

1. Hash Function
•A function that takes a key (like a string or number) and
computes an index (a number).
•This index tells the hash table where to store or find the
value associated with the key.
Example: hash("apple") → 4

So the key "apple" is stored at index 4.


2. Buckets / Array
•Internally, a hash table uses an array where each index
(or "bucket") holds data.
•The hash function determines which bucket a key
belongs in.

3. Handling Collisions
•A collision happens when two different keys hash to the
same index.
•Common ways to handle this:
• Chaining: Store multiple values in a list at that
index.
• Open addressing: Find another empty slot using a
method like linear probing.
Hash Function
A hash function converts keys into an integer index within the
bounds of the table size.

• Properties of a good hash function:


Fast to compute.
Uniformly distributes keys to minimize collisions.
Deterministic: same key always maps to the same index.
• Common hash functions:
• Division method:
• hash(key)=key mod table_size\text{hash}(key) = key \mod table\
_sizehash(key)=keymodtable_size
• Multiplication method:
Uses fractional parts of key times a constant (Knuth’s method).
• Universal hashing:
Uses randomization
Data Structures to reduce worst-case scenarios.
Unit-IV Hashing
• Collision Handling
• When two keys hash to the same index, collisions occur.
Common methods to handle collisions:
• Chaining:
Each bucket holds a linked list (or another dynamic
structure) of all elements hashed to it.
• Open addressing:
Finds another empty slot via probing (linear probing,
quadratic probing, double hashing).

Data Structures Unit-IV Hashing


• Basic Operations
• Insert(key, value)
 Compute index using the hash function.
 If collision occurs, handle using chaining or probing.
 Insert the key-value pair at the calculated index.
• Search(key)
 Compute index using the hash function.
 Check if the key is present in the bucket.
 Return the value if found; else indicate key is not present.
• Delete(key)
 Compute index using the hash function.
 Locate the key in the bucket or probe sequence.
 Remove the key-value pair.
 In open addressing, mark slot as deleted for proper future
probing. Unit-IV Hashing
Data Structures
Operation Average Time Complexity Description

Insert key-value pair into


Insert O(1) Summary Table
the table

Find value associated with


Search O(1)
a key

Remove key-value pair


Delete O(1)
from the table

Data Structures Unit-IV Hashing


Bucket, collision, probe, synonym, overflow
• 1. Bucket
• Definition:
A bucket is a slot or container in a hash table that stores
one or more key-value pairs.
In chaining, a bucket holds a list or chain of entries. In
open addressing, a bucket usually holds a single entry.

• Synonyms/Related terms:
– Slot
– Cell
– Container
– Index position
Data Structures Unit-IV Hashing
Bucket, collision, probe, synonym, overflow
• 2. Collision
• Definition:
A collision happens when two different keys produce the
same hash index and thus map to the same bucket.

• Synonyms/Related terms:
– Conflict
– Hash clash
– Hash conflict

Data Structures Unit-IV Hashing


Bucket, collision, probe, synonym, overflow
• 3. Probe
• Definition:
A probe refers to the process of searching for an
empty bucket or the target key in the event of a
collision in open addressing. It involves checking
subsequent buckets according to a probing
sequence (linear, quadratic, etc.).
• Synonyms/Related terms:
– Search sequence
– Probe sequence
– Collision resolution step
Data Structures Unit-IV Hashing
Bucket, collision, probe, synonym, overflow
• 4. Synonym
• In hashing:
A synonym is a key that hashes to the same
index (bucket) as another key. Essentially, it's a
different key that causes a collision.
• Synonyms/Related terms:
– Colliding key
– Conflicting key
– Hash synonym

Data Structures Unit-IV Hashing


Bucket, collision, probe, synonym, overflow
• 5. Overflow
• Definition:
Overflow occurs when a bucket exceeds its
storage capacity, especially in fixed-size buckets or
hash tables without dynamic resizing.
• Synonyms/Related terms:
– Bucket overflow
– Collision overflow
– Overflow chain (in chaining when bucket chains grow)
– Overflow area (in some implementations that use a
secondary storage for overflow)
Data Structures Unit-IV Hashing
Open hashing, closed hashing, perfect hash
function
1. Open Hashing (Chaining)
• Definition:
In open hashing, each bucket in the hash table contains a linked list (or chain)
of entries that hash to the same index. When collisions occur, new elements
are simply added to the chain.
• How it works:
– The table is an array of buckets.
– Each bucket stores a list of key-value pairs.
– Collisions are resolved by chaining keys in the same bucket.
• Advantages:
– Easy to implement.
– Table can store more items than the number of buckets.
– Deletion is straightforward.
• Disadvantages:
– Extra memory overhead for pointers.
– Performance depends on linked list length.

Data Structures Unit-IV Hashing


Open hashing, closed hashing, perfect hash function
• 2. Closed Hashing (Open Addressing)
• Definition:
In closed hashing, all elements are stored directly in the hash table array itself.
When collisions happen, the algorithm searches for another free slot within the
array using a probing technique.
• Probing Methods:
– Linear probing: Check next slot sequentially.
– Quadratic probing: Check slots based on quadratic function.
– Double hashing: Use a secondary hash function for steps.
• Advantages:
– No extra pointers needed (less memory).
– Cache-friendly due to contiguous storage.
• Disadvantages:
– Clustering can degrade performance.
– Table must have empty slots; can't exceed capacity easily.
– Deletion is trickier (needs special markers).
Data Structures Unit-IV Hashing
Open hashing, closed hashing, perfect hash function
• Perfect Hash Function
• Definition:
A perfect hash function maps a set of keys to a hash table with
no collisions. That means every key hashes to a unique index.
• Types:
– Static perfect hashing: For fixed key sets known in advance.
– Minimal perfect hashing: A perfect hash function that uses exactly n
slots for n keys, with no wasted space.
• Uses:
– Compiler symbol tables.
– Databases and static sets where fast lookup with no collisions is critical.
• Challenges:
– Designing perfect hash functions dynamically for changing key sets is
complex.

Data Structures Unit-IV Hashing


Open hashing, closed hashing, perfect hash function
Term Collision Handling Storage Use Case

Open Hashing Linked lists per Extra memory for When data grows
(Chaining) bucket chains dynamically

Memory
Closed Hashing
Probing sequence All in array constrained
(Open Addressing)
environments

Perfect Hash Minimal or exact Static key sets,


No collisions (ideal)
Function size performance critical

Data Structures Unit-IV Hashing


Load density, full table, load factor, rehashing,
properties of good hash function
• 1. Load Density
• Definition:
The number of elements stored in the hash table per bucket or
per slot.
In chaining, it's the average length of the linked lists (chains) in
each bucket.
• Example:
If 30 keys are stored in a table of size 10 using chaining, the load
density is 3 (on average, 3 keys per bucket).

Data Structures Unit-IV Hashing


Load density, full table, load factor, rehashing,
properties of good hash function
• 2. Full Table
• Definition:
A hash table is said to be full when there are
no empty slots available for insertion.
This term mainly applies to open addressing
hash tables since chaining can theoretically
hold unlimited keys (by extending the chains).

Data Structures Unit-IV Hashing


Load density, full table, load factor, rehashing,
properties of good hash function

(n) to the number of buckets (m) in the hash table:𝛼=𝑛/𝑚


• Load Factor (α) Definition: Ratio of the number of keys stored

• ​Interpretation: For chaining, α indicates the average chain


length.
• For open addressing, α should be less than 1 to avoid a full
table.
• Impact:Higher load factor means more collisions and slower
operations.Keeping α below a threshold helps maintain
performance.

Data Structures Unit-IV Hashing


Load density, full table, load factor, rehashing,
properties of good hash function
• Rehashing
• Definition:
The process of creating a new larger hash table and
reinserting all existing keys into it, typically triggered
when the load factor exceeds a threshold.
• Purpose:
– To reduce collisions and maintain efficient operations.
– Usually doubles the table size and recalculates hash indices.

Data Structures Unit-IV Hashing


Load density, full table, load factor, rehashing,
properties of good hash function
• Properties of a Good Hash Function
• Uniform Distribution:
Spreads keys evenly across the table to minimize collisions.
• Deterministic:
The same key always produces the same hash value.
• Fast Computation:
Computes the hash value quickly to keep insert/search efficient.
• Minimizes Collisions:
Reduces the chance that different keys produce the same hash
value.
• Uses All Key Bits:
Efficiently incorporates all parts of the key to improve
randomness.
Data Structures Unit-IV Hashing
Collision Resolution Strategies
1. Open Addressing
• Concept:
All entries are stored directly in the hash table array. When a collision occurs, the
algorithm searches (probes) for another empty slot in the table according to a
defined sequence.
• How it works:
– Compute the initial hash index.
– If occupied, try next slot(s) based on a probing method until an empty slot is
found.
• Common Probing Techniques:
– Linear Probing: Check next slot (index + 1, index + 2, ...)
– Quadratic Probing: Check slots with quadratic offset (index + 1², index + 2², ...)
– Double Hashing: Use a second hash function to determine step size.
• Advantages:
– Simple and requires no extra memory for pointers.
– Cache-friendly (elements stored contiguously).
• Disadvantages:
– Clustering (groups of filled slots) can degrade performance.
– Table can become full, requiring rehashing.
Data–Structures
Deletion is tricky because
Unit-IV removing elements can break probe sequences.
Hashing
Collision Resolution Strategies
• 2. Chaining (Open Hashing)
• Concept:
Each slot (bucket) of the hash table holds a linked list (or another dynamic
data structure) of all elements hashing to that slot.
• How it works:
– Compute the hash index.
– Insert the new key-value pair into the linked list at that bucket.
– Search and delete operations involve traversing the list.
• Advantages:
– Simple to implement.
– Table can handle more elements than the number of slots.
– Deletion is straightforward.
– Less affected by load factor than open addressing.
• Disadvantages:
– Extra memory overhead for pointers in linked lists.
– Potentially slower due to linked list traversal.
– Cache-unfriendly due to pointer-based structures.
Data Structures Unit-IV Hashing
Collision Resolution Strategies
Aspect Open Addressing Chaining

Storage All elements stored in array Buckets hold linked lists

Memory Overhead Low (no extra pointers) Higher (pointers for chains)

Performance (Low Load) Very fast Fast

Performance (High Load) Degrades due to clustering Handles collisions better

More complex (needs


Deletion Easy
special markers)

Must be larger than number Can exceed number of


Table Size
of keys buckets

Data Structures Unit-IV Hashing


Hash Table Overflow: Open Addressing vs Chaining
• 1. Overflow in Open Addressing
• What is it?
In open addressing, overflow occurs when the hash table becomes full
—that is, there are no empty slots available to insert new keys.
• Why does it happen?
Because all keys must be stored inside the fixed-size array, once all
slots are occupied, no further insertions are possible without resizing.
• Consequences:
– Insertions fail or require rehashing.
– Performance degrades as the load factor approaches 1.
– Searching takes longer due to clustering.
• Handling overflow:
– Rehashing: Create a larger table (typically double the size) and reinsert all
keys.
– Maintain the load factor (e.g., below 0.7) by resizing proactively.
– Use better probing methods to reduce clustering.
Data Structures Unit-IV Hashing
Hash Table Overflow: Open Addressing vs Chaining
• 2. Overflow in Chaining
• What is it?
In chaining, overflow means that a bucket’s linked list grows too
long because many keys hash to the same index.
• Why does it happen?
Due to poor hash function distribution or high load factor (more
keys than buckets).
• Consequences:
– Longer chains lead to slower search, insert, and delete operations
(degrades from O(1) average to O(n) worst case).
– Increased memory usage due to longer lists.
• Handling overflow:
– Use a better hash function to distribute keys more evenly.
– Increase the number of buckets (resize the table and rehash).
– Use dynamic data structures like balanced trees in buckets instead of
linked lists toUnit-IV
Data Structures speed upHashing
access.
Hash Table Overflow: Open Addressing vs Chaining

Rehash, better hash


How to handle Rehash to a larger table functions, or use balanced
trees in buckets

Table completely full (no Buckets have long chains


What causes overflow
empty slots) (many collisions)
Cannot insert new Performance degrades due
Effect
elements unless resized to long chains

Overflow Type Open Addressing Chaining

Data Structures Unit-IV Hashing


Extendible Hashing, Closed Addressing, and Separate
Chaining
• 1. Extendible Hashing
• What is it?
Extendible hashing is a dynamic hashing technique that grows the
hash table directory as needed to handle more keys without excessive
collisions.
• Key Features:
– Uses a directory indexed by some bits of the hash value.
– Each bucket has a local depth indicating how many hash bits it uses.
– When a bucket overflows, it splits and may cause the directory to double in
size (increasing global depth).
– Efficient for dynamic datasets where the number of keys grows over time.
• Advantages:
– Avoids large-scale rehashing on every insertion.
– Provides good average-case performance with controlled memory use.
• Use case:
– Database indexing and file systems that need scalable hash structures.
Data Structures Unit-IV Hashing
Extensible/Extendible Hashing
1. The dynamic hashing method is used to overcome the
problems of static hashing like bucket overflow.
2. In this method, data buckets grow or shrink as the records
increases or decreases. This method is also known as
Extendable hashing method.
3. This method makeshashing dynamic, it
i.e., insertion or deletion allows
without resulting performance. in
Extensible/Extendible Hashing
• How to search a key
1. First, calculate the hash address of the key.
2. Check how many bits are used in the directory, and these
bits are called as i.
3. Take the least significant i bits of the hash address.
This gives an index of the directory.
4. Now using the index, go to the directory and find bucket
address where the record might be.
Extensible/Extendible Hashing
• How to insert a new record
1. Firstly, you have to follow the same procedure
for retrieval, ending up in some bucket.
2. If there is still space in that bucket, then place the record
in it.
3. If the bucket is full, then we will split the bucket
and redistribute the records.
Extensible/Extendible Hashing
• For example :
Consider the following grouping of into buckets,
keys depending on the prefix of their hash
address:
Extensible/Extendible Hashing
The last two bits of 2 and 4 are 00. So it will go into bucket B0. The
last two bits of 5 and 6 are 01, so it will go into bucket B1. The last
two bits of 1 and 3 are 10, so it will go into bucket B2. The last two
bits of 7 are 11, so it will go into B3.
Extendible Hashing, Closed Addressing, and
Separate Chaining
• 2. Closed Addressing (Open Addressing)
• What is it?
Closed addressing, commonly called open addressing, stores all
entries directly inside the hash table array. When collisions occur,
the algorithm finds another empty slot via probing.
• How it works:
– Uses probing sequences like linear probing, quadratic probing, or double
hashing to resolve collisions.
– All elements are stored in the table itself (no separate linked lists).
• Pros:
– Space-efficient (no extra pointers).
– Cache-friendly due to contiguous storage.
• Cons:
– Performance suffers if the table is too full (load factor near 1).
Data– Structures
Deletion requires special
Unit-IV care to avoid breaking probe chains.
Hashing
Extendible Hashing, Closed Addressing, and
Separate Chaining
• 3. Separate Chaining (Open Hashing)
• What is it?
Separate chaining stores elements that hash to the same bucket in a
linked list (or other dynamic data structure) associated with that bucket.
• How it works:
– Each bucket points to a chain of all key-value pairs that hash to it.
– Insertions add new entries to the chain; searches and deletions scan the chain.
• Pros:
– Simple and flexible.
– Can handle loads greater than the number of buckets (chains grow as needed).
– Deletions are straightforward.
• Cons:
– Extra memory overhead for pointers.
– Potentially slower access due to list traversal.

Data Structures Unit-IV Hashing


Quick Comparison Table
Closed Addressing Separate Chaining
Feature Extendible Hashing
(Open Addressing) (Open Hashing)

Storage Directory + buckets Array only Array + linked lists

Bucket splitting & Probing to find


Collision Handling Chains per bucket
directory doubling empty slot

Yes, directory Requires rehashing Resize buckets array


Dynamic resizing
doubles (resize array) + chains

Moderate
Higher (pointers in
Memory Overhead (directory + Low (no pointers)
chains)
buckets)

Deletion Tricky (due to probe Simple (remove


Moderate
Complexity sequences) from chain)

Data Structures Unit-IV Hashing


Case study
Dictionary Application using Hash Tables
Description: Implement a dictionary where words and
meanings are stored and retrieved using hashing with
collision resolution

•hash function to convert each word to an index.


•Collisions are handled by chaining — each bucket is a linked list of (key, value) pairs.
•Insert adds new words or updates existing meanings.
•Search finds the meaning of a word.
•Delete removes a word from the dictionary.
•Display prints the entire dictionary content.

Data Structures Unit-IV Hashing


Thank You

You might also like