0% found this document useful (0 votes)
14 views13 pages

Hashing

Uploaded by

widakim943
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views13 pages

Hashing

Uploaded by

widakim943
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

GRAPHS

Hashing
Hashing in the data structure is a technique of mapping a large chunk of data
into small tables using a hashing function.
Hashing is a technique or process of mapping keys, and values into the hash table
by using a hash function. It is done for faster access to elements. The efficiency
of mapping depends on the efficiency of the hash function used.

In Hashing technique, the hash table and hash function are used. Using the hash
function, we can calculate the address at which the value can be stored.

The main idea behind the hashing is to create the (key/value) pairs. If the key is
given, then the algorithm computes the index at which the value would be
stored. It can be written as:

Index = hash(key)

Hash Table
Hash Table is a data structure which stores data in an associative manner. In a
hash table, data is stored in an array format, where each data value has its own
unique index value. Access of data becomes very fast if we know the index of
the desired data.

Examples of Hashing in Data Structure


The following are real-life examples of hashing in the data structure –

MANJULA PRASAD DATA STRUCTURE 1


GRAPHS

• In schools, the teacher assigns a unique roll number to each student.


Later, the teacher uses that roll number to retrieve information about
that student.
• A library has an infinite number of books. The librarian assigns a unique
number to each book. This unique number helps in identifying the
position of the books on the bookshelf.

The hash function in a data structure maps the arbitrary size of data to fixed-
sized data. It returns the following values: a small integer value (also known as
hash value), hash codes, and hash sums. The hashing techniques in the data
structure are very interesting, such as:
hash = hashfunc(key)
index = hash % array_size
Hashing is a technique to convert a range of key values into a range of indexes
of an array. We're going to use modulo operator to get a range of key values.
Consider an example of hash table of size 20, and the following items are to be
stored. Item are in the (key, value) format.

• (1,20) • (14,32)
• (2,70) • (17,11)
• (42,80) • (13,78)
• (4,25) • (37,98)
• (12,44)
Sr.No. Key Hash Array Index

1 1 1 % 20 = 1 1

2 2 2 % 20 = 2 2

3 42 42 % 20 = 2 2

4 4 4 % 20 = 4 4

MANJULA PRASAD DATA STRUCTURE 2


GRAPHS

5 12 12 % 20 = 12 12

6 14 14 % 20 = 14 14

7 17 17 % 20 = 17 17

8 13 13 % 20 = 13 13

9 37 37 % 20 = 17 17

Characteristics of the hash function in the data structure are:


• Easy to compute: it should be computed quickly 7 returns values within
the range of the hash table.
• Uniform distribution: it should provide a uniform distribution of hash
values.
• Less collision: chose a hash function with good collision resolution
algorithm which can be used to compute alternative index if the collision
occurs.
• High load factor: it should have high load factor for a given set of keys.
Load factor=number of elements in hash table/hash table size.

Hash Collision
A hash collision is a random match in hash values that occurs when a hashing
algorithm produces the same hash value for two distinct pieces of data.
A hash collision or hash clash is when two pieces of data in a hash table share
the same hash value. The hash value in this case is derived from a hash
function which takes a data input and returns a fixed length of bits.

Choosing a hash function


Many hash functions use alphanumeric or numeric keys. The main hash
functions cover -

MANJULA PRASAD DATA STRUCTURE 3


GRAPHS

• Division Method.
• Mid Square Method.
• Folding Method.
• Multiplication Method.

1. Division Method
The division method is the simplest and easiest method used to generate a
hash value. In this hash function, the value of k is divided by M and uses the
remainder as obtained.
Formula - h(K) = k mod M
(where k = key value and M = the size of the hash table)
Advantages -
• This method works well for any value of M
• The division approach is extremely quick because it only calls for one
operation.
Disadvantages -
• This may lead to poor performance as consecutive keys are mapped to
consecutive hash values in the hash table
• There are situations when choosing the value of M requires particular
caution.
Example -
k = 1320 h (1320) = 1320 mod 11
M = 11 =0
2. Mid Square Method
The steps involved in computing this hash method include the following -
1. Squaring the value of k ( like k*k)
2. Extract the hash value from the middle r digits.
Formula - h(K) = h(k x k) (where k = key value )
Advantages –
• Since most or all of the key value's digits contribute to the outcome, this
strategy performs well. The middle digits of the squared result are
produced by a process in which all of the essential digits participate.
• The top or bottom digits of the original key value do not predominate in
the outcome.

MANJULA PRASAD DATA STRUCTURE 4


GRAPHS

Disadvantages -
• One of this method's constraints is the size of the key; if the key is large,
its square will have twice as many digits.
• Chance of repeated collisions.
Example -
Let's take the hash table with 200 memory locations and r = 2, as decided on
the size of the mapping in the table.
k = 50 = 2500
Therefore, Thus,
k=kxk h(50) = 50
= 50 x 50
3. Folding Method
There are two steps in this method -
1. The key-value k should be divided into a specific number of parts, such as
k1, k2, k3,..., kn, each having the very same number of digits aside from
the final component, which may have fewer digits than the remaining
parts.
2. Add each component separately. The last carry, if any, is disregarded to
determine the hash value.
Formula - k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
(Where, s = addition of the parts of key k)
Advantages -
• Breaks up the key value into precise equal-sized segments for an easy
hash value
• Independent of distribution in a hash table
Disadvantages -
• Sometimes inefficient if there are too many collisions
Example -
k = 54321 = 54 + 32+ 1
k1 = 54 ; k2 = 32 ; k3 = 1 = 87
Therefore, Thus,
s = k1 + k2 + k3 h (k) = 87

MANJULA PRASAD DATA STRUCTURE 5


GRAPHS

4. Multiplication Method
Steps to follow -
1. Pick up a constant value A (where 0 < A < 1)
2. Multiply A with the key value
3. Take the fractional part of kA
4. Take the result of the previous step and multiply it by the size of the hash
table, M.
Formula - h(K) = floor (M (kA mod 1))
(Where, M = size of the hash table, k = key value and A = constant value)
Advantages -
• It may be applied to any number between 0 and 1, albeit some numbers
tend to produce better results than others.
Disadvantages -
• When the table size is a power of two, the multiplication technique is
typically appropriate since multiplication hashing makes it possible to
compute the index by key quickly.
Example -
k = 1234 = floor [ 100 ( 441.57456 mod 1)]
A = 0.35784 = floor [100 ( 0. 57456)]
M = 100 = floor [ 57.456]
So, = 57
h (1234) = floor [ 100(1234 x Thus,
0.35784 mod 1)] h (1234) = 57

Collision Resolution
There are two primary hashing techniques in a data structure.
• Open hashing / separate chaining / closed addressing
• Closed hashing / open addressing
1. Open Hashing / Separate Chaining
Separate chaining is the most used collision hashing technique in data
structures that uses a lined list. Any two or more components that meet at the
same point are chained together to form a single-linked list known as a chain.
Every linked list member that hashes is chained to the same position here. Also

MANJULA PRASAD DATA STRUCTURE 6


GRAPHS

known as closed addressing, open hashing is used to avoid any hash collisions,
using an array of linked lists in order to resolve the collision.
Example -
• In a simple hash function, h(key) = key mod table size.

0 600

2 50 74 14

3 99

5 113

• Let us take a simple hash table with table size = 6


• Sequence keys = 50, 600, 74, 113, 14, 99
Therefore, (As collision occurs after inserting keys 74 and 14, they are added to
the chain)

Advantages of Open hashing -


• Easy and basic to implement
• As there are a lot of empty spaces in the hash table, we can add more keys
to the table.

MANJULA PRASAD DATA STRUCTURE 7


GRAPHS

• Comparatively less sensitive to different load factors


• Mostly used when unsure about the amount and frequency of the keys to
be implemented in the hash table.
Disadvantages -

• Wastage of space
• With a long chain, the search time is increased
• Poor cache performance as compared to closed hashing.
2. Closed hashing (Open addressing)
Open addressing stores all entry records within the array itself, as opposed to
linked lists. The phrase 'open addressing' refers to the notion that the hash
value of an item does not identify its location or address. In order to insert a
new entry, the array is first checked before computing the hash index of the
hashed value, starting with the hashed indexIn Closed hashing, there are three
techniques that are used to resolve the collision:

• Linear Probing
Linear probing involves systematically checking the hash table from its very
beginning. A different site is searched if the one received is already occupied. In
linear probing, the interval between the probes is usually fixed (generally, to a
value of 1).

• The formula for linear probing: index = key % hashTableSize


• The hash(n) is the index computed using a hash function, and T is the table
size.
• If slot index = ( hash(n) % T) is full, then the next slot index is calculated by
adding 1 ((hash(n) + 1) % T).
The sequence goes as -

• index = ( hash(n) % T)
• (hash(n) + 1) % T
• (hash(n) + 2) % T
• (hash(n) + 3) % T … and so on.
Example -

MANJULA PRASAD DATA STRUCTURE 8


GRAPHS

• For a hash table, Table Size = 20


• Keys = 3,2,46,6,11,13,53,12,70,90 Therefore,

INDEX
SL. NO KEY HASH INDEX
(AFTER LINEAR PROBING)

1 3 3%20 3 3

2 2 2%20 2 2

3 46 46%20 6 6

4 6 6%20 6 7

5 11 11%20 11 11

6 13 13%20 13 13

7 53 53%20 13 14

MANJULA PRASAD DATA STRUCTURE 9


GRAPHS

INDEX
SL. NO KEY HASH INDEX
(AFTER LINEAR PROBING)

8 12 12%20 12 12

9 70 70%20 10 10

10 90 90%20 10 11

• Quadratic Probing
The only distinction between linear and quadratic probing is the space between
succeeding probes or entry slots. When a hashed index slot for an entry record
is already taken, you must start traversing until you discover an open slot. The
spacing between slots is calculated by adding each subsequent value of any
arbitrary polynomial in the initial hashed index.
The formula for quadratic probing: index = index % hashTableSize
• The hash(n) is the index computed using a hash function, and T is the table
size.
• If slot index = ( hash(n) % T) is full, then the next slot index is calculated by
adding 1 (hash(n) + 1 x 1) % T
The sequence goes as -
• index = ( hash(n) % T)
• (hash(n) + 1 x 1) % T
• (hash(n) + 2 x 2) % T
• (hash(n) + 3 x 3) % T … and so on
Example -
• For a hash table, Table Size = 7
• Keys = 22,30,50 Thus,

MANJULA PRASAD DATA STRUCTURE 10


GRAPHS

INDEX
SL. NO KEY HASH INDEX
(AFTER QUADRATIC PROBING)

1 22 22%7 1 1

2 30 30%7 2 2

5 50 50%7 1 1(1+2 x 2)

• Double-Hashing
It is another hash function that determines the intervals between probes. An
optimized method of reducing clustering is double hashing. An additional hash
function is used to calculate the increments for the probing sequence.
The formula for double hashing - (first hash(key) + i * secondHash(key)) % size
of the table
The sequence goes as follows -
• index = hash(x) % S
• (hash(x) + 1*hash2(x)) % S

MANJULA PRASAD DATA STRUCTURE 11


GRAPHS

• (hash(x) + 2*hash2(x)) % S
• (hash(x) + 3*hash2(x)) % S … an so on
Example -
• For a hash table, Table Size = 7
• Keys = 27,43,98,72 Thus,

INDEX
SL. NO KEY HASH INDEX
(AFTER DOUBLE HASHING)

1 43 43%7 1

[h1(92) + i * (h2(92)] % 7
= [6 + 1 * (1 + 92 % 5)] % 7
2 92 92%7 6
=9%7
=2

[h1(72) + i * (h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
5 72 72%7 2
=5%7
=5

6 27 27%7 6

MANJULA PRASAD DATA STRUCTURE 12


GRAPHS

Differences between Static and Dynamic Hashing


Here are some prominent differences by which Static Hashing is different than
Dynamic Hashing −

Key Factor Static Hashing Dynamic Hashing

Form of Fixed-size, non-changing Variable-size, changing data.


Data data.

The resulting Data Bucket is The resulting Data Bucket is of


Result
of fixed-length. variable-length.

Challenge of Bucket Bucket overflow can occur very


Bucket overflow can arise often late or doesn’t occur at all.
Overflow depending upon memory
size.

Complexity Simple Complex

Efficient Less Efficient More Efficient

Addresses in the memory Addresses are always generated


Storing of are sorted according to a using a hash function on the key
address key value called the primary value.
key

MANJULA PRASAD DATA STRUCTURE 13

You might also like