Hashing
https://www.geeksforgeeks.org/hashing-data-structure/
Hashing is a technique used in data structures that efficiently
stores and retrieves data in a way that allows for quick access.
Hashing involves mapping data to a specific index in a
hash table (an array of items) using a hash function that
enables fast retrieval of information based on its key.
The great thing about hashing is, we can achieve all
three operations (search, insert and delete) in O(1) time
on average.
Hashing is mainly used to implement a set of distinct
items and dictionaries (key value pairs).
Basics of Hashing in Data Structure
Introduction
Applications
Index Mapping (or Trivial Hashing)
Separate Chaining for Collision Handling
Open Addressing for Collision Handling
Double Hashing
Load Factor and Rehashing
What is Hashing?
Hashing in Data Structures refers to the process of transforming
a given key to another value. It involves mapping data to a
specific index in a hash table using a hash function that enables
fast retrieval of information based on its key. The
transformation of a key to the corresponding value is done using
a Hash Function and the value obtained from the hash function
is called Hash Code .
Need for Hash data structure
Every day, the data on the internet is increasing multifold and
it is always a struggle to store this data efficiently. In day-to-
day programming, this amount of data might not be that big,
but still, it needs to be stored, accessed, and processed easily
and efficiently. A very common data structure that is used for
such a purpose is the Array data structure.
Now the question arises if Array was already there, what was
the need for a new data structure! The answer to this is in the
word ” efficiency “. Though storing in Array takes O(1) time,
searching in it takes at least O(log n) time. This time appears to
be small, but for a large data set, it can cause a lot of problems
and this, in turn, makes the Array data structure inefficient.
So now we are looking for a data structure that can store the
data and search in it in constant time, i.e. in O(1) time. This is
how Hashing data structure came into play. With the
introduction of the Hash data structure, it is now possible to
easily store data in constant time and retrieve them in constant
time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A Key can be anything string or integer which is
fed as input in the hash function the technique that
determines an index or location for storage of an item
in a data structure.
2. Hash Function: The hash function receives the input
key and returns the index of an element in an array
called a hash table. The index is known as the hash
index .
3. Hash Table: Hash table is a data structure that maps
keys to values using a special function called a hash
function. Hash stores the data in an associative manner
in an array where each data value has its own unique
index.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we
would like to store it in a table.
Our main objective here is to search or update the values stored
in the table quickly in O(1) time and we are not concerned
about the ordering of strings in the table. So the given set of
strings can act as a key and the string itself will act as the value
of the string but how to store the value corresponding to the
key?
Step 1: We know that hash functions (which is some
mathematical formula) are used to calculate the hash
value which acts as the index of the data structure
where the value will be stored.
Step 2: So, let’s assign
o “a” = 1,
o “b”=2, .. etc, to all alphabetical characters.
Step 3: Therefore, the numerical value by summation of
all characters of the string:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to
store these strings. The hash function that is used here is
the sum of the characters in key mod Table size . We
can compute the location of the string in the array by
taking the sum(string) mod 7 .
Step 5: So we will then store
o “ab” in 3 mod 7 = 3,
o “cd” in 7 mod 7 = 0, and
o “efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a
given string by using a simple hash function and rapidly find the
value that is stored in that location. Therefore the idea of
hashing seems like a great way to store (key, value) pairs of the
data in a table.
What is a Hash function?
The hash function creates a mapping between key and value,
this is done through the use of mathematical formulas known as
hash functions. The result of the hash function is referred to as
a hash value or hash. The hash value is a representation of the
original string of characters but usually smaller than the
original.
For example: Consider an array as a Map where the key is the
index and the value is the value at that index. So for an array A
if we have index i which will be treated as the key then we can
find the value by simply looking at the value at A[i].
Types of Hash functions:
https://www.geeksforgeeks.org/hash-functions-and-list-
types-of-hash-functions/
This article focuses on discussing different hash functions:
1. Division Method.
2. Multiplication Method
3. Mid-Square Method
4. Folding Method
5. Cryptographic Hash Functions
6. Universal Hashing
7. Perfect Hashing
Let’s begin discussing these methods in detail.
1. Division Method
The division method involves dividing the key by a prime
number and using the remainder as the hash value.
h(k)=k mod m
Where k is the key and 𝑚m is a prime number.
Advantages:
Simple to implement.
Works well when 𝑚m is a prime number.
Disadvantages:
Poor distribution if 𝑚m is not chosen wisely.
2. Multiplication Method
In the multiplication method, a constant 𝐴A (0 < A < 1) is used
to multiply the key. The fractional part of the product is then
multiplied by 𝑚m to get the hash value.
h(k)=⌊m(kAmod1)⌋
Where ⌊ ⌋ denotes the floor function.
Advantages:
Less sensitive to the choice of 𝑚m.
Disadvantages:
More complex than the division method.
3. Mid-Square Method
In the mid-square method, the key is squared, and the middle
digits of the result are taken as the hash value.
Steps:
1. Square the key.
2. Extract the middle digits of the squared value.
Advantages:
Produces a good distribution of hash values.
Disadvantages:
May require more computational effort.
4. Folding Method
The folding method involves dividing the key into equal parts,
summing the parts, and then taking the modulo with respect to
𝑚m .
Steps:
1. Divide the key into parts.
2. Sum the parts.
3. Take the modulo 𝑚m of the sum.
Advantages:
Simple and easy to implement.
Disadvantages:
Depends on the choice of partitioning scheme.
5. Cryptographic Hash Functions
Cryptographic hash functions are designed to be secure and are
used in cryptography. Examples include MD5, SHA-1, and
SHA-256.
Characteristics:
Pre-image resistance.
Second pre-image resistance.
Collision resistance.
Advantages:
High security.
Disadvantages:
Computationally intensive.
6. Universal Hashing
Universal hashing uses a family of hash functions to minimize
the chance of collision for any given set of inputs.
h(k)=((a⋅k+b)modp)modm
Where a and b are randomly chosen constants, p is a prime
number greater than m, and k is the key.
Advantages:
Reduces the probability of collisions.
Disadvantages:
Requires more computation and storage.
7. Perfect Hashing
Perfect hashing aims to create a collision-free hash function for
a static set of keys. It guarantees that no two keys will hash to
the same value.
Types:
Minimal Perfect Hashing: Ensures that the range of the
hash function is equal to the number of keys.
Non-minimal Perfect Hashing: The range may be larger
than the number of keys.
Advantages:
No collisions.
Disadvantages:
Complex to construct.
Conclusion
In conclusion, hash functions are very important tools that help
store and find data quickly. Knowing the different types of hash
functions and how to use them correctly is key to making
software work better and more securely. By choosing the right
hash function for the job, developers can greatly improve the
efficiency and reliability of their systems.
Properties of a Good hash function
A hash function that maps every item into its own unique slot is
known as a perfect hash function. We can construct a perfect
hash function if we know the items and the collection will never
change but the problem is that there is no systematic way to
construct a perfect hash function given an arbitrary collection of
items. Fortunately, we will still gain performance efficiency even
if the hash function isn’t perfect. We can achieve a perfect hash
function by increasing the size of the hash table so that every
possible value can be accommodated. As a result, each item will
have a unique slot. Although this approach is feasible for a small
number of items, it is not practical when the number of
possibilities is large.
So, We can construct our hash function to do the same but the
things that we must be careful about while constructing our own
hash function.
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (Each table position
is equally likely for each.
3. Should minimize collisions.
4. Should have a low load factor(number of items in the
table divided by the size of the table).
Complexity of calculating hash value using the hash
function
Time complexity: O(n)
Space complexity: O(1)
Problem with Hashing
If we consider the above example, the hash function we used is
the sum of the letters, but if we examined the hash function
closely then the problem can be easily visualized that for
different strings same hash value is being generated by the hash
function.
For example: {“ab”, “ba”} both have the same hash value, and
string {“cd”,”be”} also generate the same hash value, etc. This is
known as collision and it creates problem in searching, insertion,
deletion, and updating of value.
What is Collision?
Collision in Hashing occurs when two different keys map to the
same hash value. Hash collisions can be intentionally created for
many hash algorithms. The probability of a hash collision
depends on the size of the algorithm, the distribution of hash
values and the efficiency of Hash function.
The hashing process generates a small number for a big key, so
there is a possibility that two keys could produce the same
value. The situation where the newly inserted key maps to an
already occupied, and it must be handled using some collision
handling technology.
How to handle Collisions?
There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing
1) Separate Chaining
The idea is to make each cell of the hash table point to a linked
list of records that have the same hash function value. Chaining
is simple but requires additional memory outside the table.
Example: We have given a hash function and we have to insert
some elements in the hash table using a separate chaining
method for collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let’s see step by step approach to how to solve the above
problem:
2) Open Addressing
In open addressing, all elements are stored in the hash table
itself. Each table entry contains either a record or NIL. When
searching for an element, we examine the table slots one by one
until the desired element is found or it is clear that the element
is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially that
starts from the original location of the hash. If in case the
location that we get is already occupied, then we check for the
next location.
Algorithm:
1. Calculate the hash key. i.e. key = data % size
2. Check, if hashTable[key] is empty
store the value directly by hashTable[key] = data
3. If the hash index already has some value then
check for next index using key = (key+1) % size
4. Check, if the next index is available hashTable[key] then
store the value. Otherwise try for next index.
5. Do the above process till we find the space.
Example: Let us consider a simple hash function as “key mod 5”
and a sequence of keys that are to be inserted are 50, 70, 76,
85, 93.
2.b) Quadratic Probing
Quadratic probing is an open addressing scheme in computer
programming for resolving hash collisions in hash tables.
Quadratic probing operates by taking the original hash index
and adding successive values of an arbitrary quadratic
polynomial until an open slot is found.
An example sequence using quadratic probing is:
H+1 2 ,H +2 2 ,H +3 2 ,H +4 2 …………………. H + k 2
This method is also known as the mid-square method because in
this method we look for i 2 ‘th probe (slot) in i’th iteration and
the value of i = 0, 1, . . . n – 1. We always start from the
original hash location. If only the location is occupied then we
check the other slots.
Let hash(x) be the slot index computed using the hash function
and n be the size of the hash table.
If the slot hash(x) % n is full, then we try (hash(x) + 1 2 ) % n.
If (hash(x) + 1 2 ) % n is also full, then we try (hash(x) + 2 2 )%
n.
If (hash(x) + 2 2 ) % n is also full, then we try (hash(x) + 3 2 )%
n.
This process will be repeated for all the values of i until an
empty slot is found
Example: Let us consider table Size = 7, hash function as
Hash(x) = x % 7 and collision resolution strategy to be f(i) = i 2 .
Insert = 22, 30, and 50
https://www.geeksforgeeks.org/introduction-to-hashing-2/
Applications of Hash Data structure
Hash is used in databases for indexing.
Hash is used in disk-based data structures.
In some programming languages like Python, JavaScript
hash is used to implement objects.
Real-Time Applications of Hash Data structure
Hash is used for cache mapping for fast access to the
data.
Hash can be used for password verification.
Hash is used in cryptography as a message digest.
Rabin-Karp algorithm for pattern matching in a string.
Calculating the number of different substrings of a
string.
Advantages of Hash Data structure
Hash provides better synchronization than other data
structures.
Hash tables are more efficient than search trees or other
data structures
Hash provides constant time for searching, insertion,
and deletion operations on average.
Disadvantages of Hash Data structure
Hash is inefficient when there are many collisions.
Hash collisions are practically not avoided for a large set
of possible keys.
Hash does not allow null values.
Hashing is a computation technique in which hashing functions take variable-
length data as input and issue a shortened fixed-length data as output. The
output data is often called a "Hash Code", "Key", or simply "Hash". The data
on which hashing works is called a "Data Bucket".
Characteristics of Hashing Technique
Hashing techniques come with the following characteristics −
The first characteristic is, hashing technique is deterministic. Means, whatever
number of times you invoke the function on the same test variable, it delivers the
same fixed-length result.
The second characteristic is its unidirectional action. There is no way you can use the
Key to retrieve the original data. Hashing is irreversible.
What are Hash Functions?
Hash functions are mathematical functions that are executed to generate
addresses of data records. Hash functions use memory locations that store
data, called ‘Data Buckets’.
Hash functions are used in cryptographic signatures, securing privacy of
vulnerable data, and verifying correctness of the received files and texts. In
computation, hashing is used in data processing to locate a single string of
data in an array, or to calculate direct addresses of records on the disk by
requesting its Hash Code or Key.
Explore our latest online courses and learn new skills at your own pace. Enroll and become
a certified expert to boost your career.
Applications of Hashing
Hashing is applicable in the following area −
Password verification
Associating filename with their paths in operating systems
Data Structures, where a key-value pair is created in which the key is a unique value,
whereas the value associated with the keys can be either same or different for
different keys.
Board games such as Chess, tic-tac-toe, etc.
Graphics processing, where a large amount of data needs to be matched and fetched.
Database Management Systems where phenomenal records are required to be
searched, queried, and matched for retrieval. For example, DBMS used in
banking or large public transport reservation software.
Read through this article to find out more about Hashing and specifically the
difference between two important hashing techniques − static
hashing and dynamic hashing.
What is Static Hashing?
It is a hashing technique that enables users to lookup a definite data set.
Meaning, the data in the directory is not changing, it is "Static" or fixed. In this
hashing technique, the resulting number of data buckets in memory remains
constant.
Operations Provided by Static Hashing
Static hashing provides the following operations −
Delete − Search a record address and delete a record at the same address or delete a
chunk of records from records for that address in memory.
Insertion − While entering a new record using static hashing, the hash function (h)
calculates bucket address "h(K)" for the search key (k), where the record is going to
be stored.
Search − A record can be obtained using a hash function by locating the address of
the bucket where the data is stored.
Update − It supports updating a record once it is traced in the data bucket.
Advantages of Static Hashing
Static hashing is advantageous in the following ways −
Offers unparalleled performance for small-size databases.
Allows Primary Key value to be used as a Hash Key.
Disadvantages of Static Hashing
Static hashing comes with the following disadvantages −
It cannot work efficiently with the databases that can be scaled.
It is not a good option for large-size databases.
Bucket overflow issue occurs if there is more data and less memory.
What is Dynamic Hashing?
It is a hashing technique that enables users to lookup a dynamic data set.
Means, the data set is modified by adding data to or removing the data from,
on demand hence the name ‘Dynamic’ hashing. Thus, the resulting data bucket
keeps increasing or decreasing depending on the number of records.
In this hashing technique, the resulting number of data buckets in memory is
ever-changing.
Operations Provided by Dynamic Hashing
Dynamic hashing provides the following operations −
Delete − Locate the desired location and support deleting data (or a chunk of data) at
that location.
Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
Query − Perform querying to compute the bucket address.
Update − Perform a query to update the data.
Advantages of Dynamic Hashing
Dynamic hashing is advantageous in the following ways −
It works well with scalable data.
It can handle addressing large amount of memory in which data size is always
changing.
Bucket overflow issue comes rarely or very late.
Disadvantages of Dynamic Hashing
Dynamic hashing comes with the following disadvantage −
The location of the data in memory keeps changing according to the bucket size. Hence
if there is a phenomenal increase in data, then maintaining the bucket address table
becomes a challenge.
Differences between Static and Dynamic Hashing
Here are some prominent differences by which Static Hashing is different than
Dynamic Hashing −
Key Factor Static Hashing Dynamic Hashing
Form of Variable-size, changing
Fixed-size, non-changing data.
Data data.
The resulting Data Bucket is of The resulting Data Bucket
Result
fixed-length. is of variable-length.
Challenge of Bucket overflow Bucket overflow can occur
Bucket
can arise often depending upon very late or doesn’t occur
Overflow
memory size. at all.
Complexity Simple Complex
Conclusion
Hashing is a computation technique that uses mathematical functions called
Hash Functions to calculate the location (address) of the data in the memory.
We learnt that there are two different hashing functions namely, Static hashing
and Dynamic hashing.
Each hashing technique is different in terms of whether they work on fixed-
length data bucket or a variable-length data bucket. Selecting a proper hashing
technique is required by considering the amount of data needed to be handled,
and the intended speed of the application.
Static Hashing in DBMS
Static hashing refers to a hashing technique that allows the user to search over
a pre-processed dictionary (all elements present in the dictionary are final and
unmodified). In this article, we will take an in-depth look at static hashing in a
DBMS.
What is Static Hashing?
When a search key is specified in a static hash, the hashing algorithm always
returns the same address. For example, if you take the mod-4 hash function,
only 5 values will be produced. For this to work, the output address must
always be the same. The number of buckets at any given time is constant.
The data bucket address obtained by static hashing will always be the same. So,
if we use the mod(5) hash function to get the address of EmpId = 103, we always
get the same data bucket address 3. In this case, the data bucket position
remains unchanged. Therefore, all existing data buckets in memory remain
unchanged while the entire hashing process remains the same. In this case,
there are five partitions in the memory used to hold data.
What are the Operations in Static Hashing?
Searching the data: When data is needed, the same hash function
is used to get the address of the packet where the data is stored.
Inserting Data: When new data is entered into the table, the hash
key is used to create the address of the new data and place the
data there.
Deleting a Record: To delete a record, we must first provide the
record to be destroyed. The data of this address will be deleted
from memory.
Updating a Record: To update the info, we will first find it using
the hash function and then change the profile info. If we want to
add data to the file, but the address of the dataset created by
the hash function is not empty or there is data in the address,
we cannot add information. Bucket overflow is the term used to
describe this situation in static hashing.
There are many ways to solve bucket overflow problem. Here are
some of the most common uses:
Open Hashing
When the hash function returns an address that already contains
information, the next packet is assigned to it through a process
called linear probing. For example, if the new address needed
after input is R4, the hash function will produce 112 as the
address of R4. However, the residence is completely finished; so
the system selects 113 as the next available packet and assigns it
R4.
Close Hashing
When the data group is full, a new bucket will be created for the
same hash result and added after the old group. This method is
called overflow chaining. For example, if the new address to be
added to the file is R4, the hash function will give it address 112.
But the bucket is too full to hold any more information. In this
case, a new bucket will be placed and tied to the end of the 112
bucket.
Advantages of Static Hashing
The advantages of using static hashes in a DBMS are:
Performance is very good for small data sets.
Help with storage management.
Hash keys facilitate access to address information.
The primary key value can be used by the hash value.
Disadvantages of Static Hashing
Disadvantages of using static hashing techniques in DBMS are:
Static hashing is not a good choice for big data.
This process takes longer than usual because the hash
function has to go through all memory locations in the
DBMS system.
Does not work with extensible databases.
The sorting technique is not good compared to other
hashing techniques.
Conclusion
Static hashing is one of many other hashing techniques used to
show that data is not stored sequentially and also provides the
correct memory address for each value in the data. Unlike other
hashes, static hash methods can be used for static results without
changing the values of objects, objects, and relational data in
the DBMS.
Frequently Asked Questions on Static Hashing -
FAQs
What is hash collision?
A hash collision occurs when the hash values of two or more files
in the dataset are not assigned to the same location in the hash
table.
What to do with hash conflicts?
There are two methods that can be used to avoid hash collisions
which includes rehashing in which calls the association hash
function, which is used repeatedly until space appears. Second
method is the chaining method creates a linked list of objects
whose keys have the same value. This method should have an
extension for each address.
What is the time complexity of static hashing?
The Modulo Hash Function calculates the hash of key data and
performs a modulo N operation to find the array index (node
identifier) to store or retrieve a key. The time complexity of
finding an identity in the static hash part is constant O(1).
https://quescol.com/data-structure/linear-probing
https://databasetown.com/types-of-hashing-in-dbms-static-
dynamic-hashing/
https://pediaa.com/what-is-the-difference-between-static-
and-dynamic-hashing/