HASHING
-Introduction
-Static Hashing
-Dynamic Hashing
WHAT IS HASHING ?
Hashing is a process in computer science and cryptography used to
transform input data of arbitrary size (such as a string or file) into a fixed-size
value, often called a hash value or hash code. This transformation is performed
using a mathematical function known as a hash function. The resulting hash
value is typically a sequence of numbers and/or letters.
CHARACTERISTICS OF HASHING
1. Deterministic: The same input always produces the same
hash value.
2. Fixed Size: No matter how large or small the input data is,
the hash value has a fixed length.
3. Efficient: Hash functions are designed to process data
quickly.
4. Unique (Minimally Colliding): While not guaranteed, hash
functions are designed to minimize the chance of different
inputs producing the same hash (called a collision).
5. Non-reversible: Hashing is a one-way operation. It's
computationally infeasible to derive the original input from
its hash.
COMPONENTS OF HASHING
1.Key: A Key can be anything string or integer which is fed as input in the
hash function the technique that determines an index or location for
storage of an item in a data structure.
2.Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index .
3.Hash Table: Hash table is a data structure that maps keys to values
using a special function called a hash function. Hash stores the data in an
associative manner in an array where each data value has its own unique
index.
CHARACTERISTICS OF HASHING
1. Deterministic: The same input always produces the same
hash value.
2. Fixed Size: No matter how large or small the input data is,
the hash value has a fixed length.
3. Efficient: Hash functions are designed to process data
quickly.
4. Unique (Minimally Colliding): While not guaranteed, hash
functions are designed to minimize the chance of different
inputs producing the same hash (called a collision).
5. Non-reversible: Hashing is a one-way operation. It's
computationally infeasible to derive the original input from
its hash.
TYPES OF HASH FUNCTIONS
1. Division Method
It is the most simple method of hashing an integer x.
This method divides x by M and then uses the
remainder obtained
In this case, the hash function can be given as h(x) = x
mod M
since it requires only a single division operation, the
method works very fast.
Generally, it is best to choose M to be a prime number
because it increases the likelihood that the keys are
mapped with a uniformity in the output range of
values.
Example: If the record to be stored {54,72,89,37} is to
be placed in the hash table and if the table is of size 10
2. Mid-Squared Method
Here, the key K is squared. A number ‘l’ in
the middle of K^2 is selected by removing
the digits from both ends. H(k)=l
Calculate the hash value for keys 1234 using
the mid-square method. The hash table has
100 memory locations.
Solution: Note that the hash table has 100
memory locations whose indices vary from
0 to 99.
This means that only two digits are needed
to map the key to a location in the hash
table.
When k = 1234, k^2 = 1522756, h (1234) = 27.
3.Folding Method
The folding method involves dividing the key into equal parts, summing the parts,
and then taking the modulo with respect to 𝑚m
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2,
..., kn, where each part has the same number of digits except the last part which may
have lesser digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash
value is produced by ignoring the last carry, if any.
WHAT IS COLLISION ?
A collision occurs in hashing when two different keys
(or inputs) produce the same hash value and are
mapped to the same location in the hash table.
Collisions are inevitable in hashing because:
The hash function maps a large set of possible
input values (keys) to a fixed-size range of hash
values (e.g., 0 to 99 for a hash table with 100
locations).
Since there are more possible inputs than hash
table slots, different keys may map to the same
hash value.
If we have a hash table with 100 locations and two
keys, 1234 and 5678, both hash to the same index, a
collision happens
COLLISION RESOLUTION
TECHNIQUES
COLLISION RESOLUTION BY LINEAR PROBING
(OPEN ADDRESSING)
In linear probing, the hash table is searched sequentially that
starts from the original location of the hash. If in case the location
that we get is already occupied, then we check for the next
location.
linear probing searches for the next available slot in a sequential
manner.
It checks the next index (current index + 1) and continues until an
empty slot is found.
Example question - Let us consider a simple hash function as “key
mod 5” and a sequence of keys that are to be inserted are 50, 70,
76, 85, 93.
QUADRATIC PROBING
Quadratic Probing is a collision resolution technique used in open addressing for hash tables.
When a collision occurs quadratic probing helps find the next available slot by searching at a
quadratic distance from the original index.
If a collision occurs at index i, the algorithm will attempt to place the key in the next available
position using a quadratic function. The formula: h(k,i)=(h(k)+f(i)) modm
where:
h(k) is the original hash value of the key k.
i is the number of attempts (or collisions) that have occurred.
f(i) is the quadratic function, often defined ash(k,i)=(h(k)+f(i))modm.
m is the size of the hash table.
Quadratic Probing Formula:
New Index=(h(k)+i^2)mod m
Where:
h(k) is the initial hash value.
i is the number of probing attempts (starting at 1, 2, 3, etc.).
m is the size of the hash table.
DOUBLE HASHING
Double Hashing is another collision resolution technique used in open addressing for
hash tables. It is a more advanced method to resolve collisions and avoids the
clustering issues seen in linear probing and quadratic probing.
The idea behind double hashing is to apply a second hash function to determine how
far to jump when looking for the next available slot.
The formula for double hashing is: h(k,i)=(h1(k)+i⋅h2(k))mod m
Where:
h1(k) is the primary hash function that maps the key k to an initial index.
h2(k)is the secondary hash function, used to calculate the step size for probing.
i is the number of probing attempts (i.e., how many times the key has collided and
been probed).
m is the size of the hash table.
DOUBLE HASHING
Advantages of Double Disdvantages of
Hashing Double Hashing
Reduced Clustering: Double hashing Complexity: Double hashing is more complex
effectively reduces both primary clustering to implement compared to other collision
and secondary clustering (which occur in resolution methods like linear probing or
linear probing and quadratic probing). quadratic probing.
Efficient: The combination of two hash Hash Function Choice: The choice of the
functions provides a good spread of keys, second hash function h2(k) is important. If it's
reducing the chance of collisions. not chosen carefully, it can cause poor
Improved Performance: For a large number distribution and inefficient probing.
of keys and moderate load factors, double Table Size Constraints: For double hashing to
hashing performs better than other open work effectively, the hash table size m should
addressing methods typically be prime to ensure that the probes
can spread out well.
TYPES OF HASHING
STATIC HASHING
Static Hashing is a hashing technique where the size of the hash table is
fixed and does not change during its lifetime. It is used when the number
of records to be stored in the table is known and is not expected to
change significantly.
CHARACTERISTICS OF
STATIC HASHING
Fixed Hash Hash Collision Performance
Table Size: Function Handling Works well for
Maps keys to Since the hash table size is
datasets of a fixed
The hash table is specific slots in the limited, collisions (when
size.
created with a hash table. two keys map to the same
Suffers from
predefined size. The choice of the slot) are inevitable.
overflow or
This size remains hash function Common techniques to
underutilization if
constant, determines the handle collisions:
the number of
irrespective of the efficiency of data Chaining: Storing collided
records grows or
number of records. retrieval and keys in a linked list at the
shrinks
collision resolution. same index.
significantly.
STATIC HASHING
Advantages of Static Disadvantages of
Static Hashing is a hashing technique where the size of the hash table is
Hashing Static Hashing
fixed and does not change during its lifetime. It is used when the number
Easier to implement compared to Inefficient for datasets that grow or
of records to be stored in the table is known and is not expected to
dynamic hashing. shrink dynamically.
change significantly.
Memory allocation is fixed, which If the hash table becomes full,
makes resource planning performance degrades significantly.
straightforward. If the dataset is much smaller than
Optimal when the number of the table size, memory is wasted.
records is known and does not Managing collisions can become
change. complex as the table fills up.
DYNAMIC HASHING
Dynamic Hashing is a technique used to handle the problem of growing a hash table
when the table becomes too full. Unlike static hashing, where the hash table size is fixed
and can lead to performance issues as the table fills up, dynamic hashing allows the hash
table to grow or shrink dynamically to maintain efficient data storage and retrieval.
Dynamic hashing is particularly useful for applications where the number of keys is not
known in advance, and the data size can change over time, such as in database
management systems and file systems.
There are two main approaches to dynamic hashing:
1. Extendible Hashing
2. Linear Hashing
EXTENDIBLE HASHING
In Extendible Hashing, the hash table grows or shrinks by
doubling or halving the number of buckets (also called
"directory"). It uses a directory that stores pointers to the actual
buckets
Initially, the directory size is small, and the hash function
produces hash values based on the number of directory
entries.
If the bucket overflows (i.e., it exceeds the number of keys it
can hold), the directory is doubled (the number of directory
entries is increased), and the keys are redistributed into the
new buckets.
LINEAR HASHING
In Linear Hashing, instead of doubling the size of the hash table
all at once, the table grows incrementally. The key idea is that
one bucket is split at a time when the hash table is full.
Initially, keys are hashed into the available buckets
Keys are inserted using a hash function that is based on the
current number of buckets.
If a bucket overflows, only that bucket is split into two, and
the keys are redistributed.
Instead of doubling the table size all at once, one bucket is
split at a time.
This reduces the cost of resizing and avoids large memory
reallocations.
DYNAMIC HASHING
Advantages of Disadvantages of
Static Hashing is a hashing technique where the size of the hash table is
Dynamic Hashing Dynamic Hashing
fixed and does not change during its lifetime. It is used when the number
Flexible Growth: Dynamic hashing allows the Complex Implementation: Dynamic hashing
of records to be stored in the table
hash table to grow and shrink as the number
is known and is not expected to
(especially extendible hashing) is more complex
of keys changes.
change significantly. to implement compared to static hashing
Improved Performance: By resizing the table because it requires additional steps for resizing
and redistributing keys, dynamic hashing helps and redistributing keys.
maintain a low load factor, reducing the Performance Overhead: While resizing and
chance of collisions and improving search redistributing keys improves the hash table’s
performance. performance in the long term, it introduces
Efficient Memory Usage: Since the table grows some overhead during the resizing process,
incrementally (especially in linear hashing), especially if the table needs to grow rapidly.
memory is used efficiently without over-
allocating space upfront
THANK YOU!