Hashing
The sequential (linear) search algorithm takes time
proportional to the data size, i.e., O(n).
Binary search improves on linear search, reducing the
search time to O(log n).
With a BST, O(log n) search efficiency can be
obtained on average, but the worst-case complexity is O(n).
To guarantee O(log n) search time, the BST must be
height-balanced (e.g., an AVL tree).
Hashing
Suppose that we want to store 10,000 student records (each with
a 5-digit ID) in a given container.
·A linked list implementation would take O(n) access time.
·A height-balanced tree would give O(log n) access time.
·An array of size 100,000 would give O(1) access time but
would waste a lot of space.
Hashing
Is there some way that we could get O(1)
access without wasting a lot of space?
The answer is hashing.
Constant time per operation (on average).
As with an array, we come up with a function that maps the
large key range into one we can manage.
Basic Idea
Use a hash (or hashing) function to map a hash
key to a hash address (location) in a hash
table.
Hash function h: K -> L
If student A has ID (key) k and h is the hash
function, then A's details are stored at
position h(k) of the table.
To search for A, compute h(k) to locate the
position. If no element is there, the hash table does
not contain A.
Example
Let the keys be the IDs of 100 students, where each
ID looks like 345610.
We decide to use an array A[100].
And the hash function is, say, the LAST TWO DIGITS.
So 103062 goes to location 62; and if someone
has ID 113062, it also goes to
location 62.
THIS EVENT IS CALLED A COLLISION.
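The last-two-digits scheme above can be sketched as follows (a minimal illustration; the function and variable names are mine, not from the slides):

```python
def h(student_id):
    """Hash function: the last two digits of the ID."""
    return student_id % 100

# slot -> list of IDs that hashed there
table = {}
for sid in (103062, 113062):
    table.setdefault(h(sid), []).append(sid)

print(table)  # {62: [103062, 113062]} -- both IDs land in slot 62: a collision
```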
Collisions
[Figure: the universe of keys U contains the actual keys K = {k1, ..., k5}; h maps them into table slots 0..m-1. Here h(k2) = h(k5): a collision!]
Collisions
Two or more keys hash to the same location
For a given set K of keys
If |K| ≤ m, collisions may or may not happen, depending on the
hash function
If |K| > m, collisions will definitely happen (i.e., there must be at
least two keys that have the same hash value)
Avoiding collisions completely is hard, even with a good hash
function
Collision Resolution
Methods:
Separate Chaining (open hashing)
Open addressing (closed hashing)
Linear probing
Quadratic probing
Double hashing
We will discuss chaining first, and then ways to build "good"
hash functions.
Collision with Chaining - Discussion
Choosing the size of the table
Small enough not to waste space
Large enough such that lists remain short
Typically 1/5 or 1/10 of the total number of elements
How should we keep the lists: ordered or not?
Not ordered!
Insert is fast
Can easily remove the most recently inserted elements
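The chaining scheme above can be sketched as a small class (a minimal sketch; the class and method names are mine, not from the slides):

```python
class ChainedHashTable:
    """Separate chaining: each of the m slots holds an (unordered) list of keys."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m  # division-method hash

    def insert(self, key):
        # Prepend to the chain: conceptually O(1), and the most
        # recently inserted element sits at the front.
        self.slots[self._h(key)].insert(0, key)

    def search(self, key):
        # Time proportional to the length of the chain at h(key).
        return key in self.slots[self._h(key)]

    def delete(self, key):
        # Deletion first has to search the corresponding chain.
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
```

For example, with m = 7 the keys 14 and 21 both hash to slot 0 and end up in the same chain.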
Insertion in Hash Tables
Worst-case running time is O(1) (insert at the head of the chain).
This assumes that the element being inserted isn't already in the list.
It would take an additional search to check whether it was already inserted.
Deletion in Hash Tables
Need to find the element to be deleted.
Worst-case running time: deletion depends on searching the corresponding list.
Searching in Hash Tables
Search for an element with key k in list T[h(k)].
The running time is proportional to the length of the list of
elements at location h(k).
Analysis of Hashing with Chaining: Worst Case
How long does it take to search for an element with a given key?
Worst case: all n keys hash to the same slot.
Worst-case search time is Θ(n), plus the time to compute the hash function.
[Figure: table T with slots 0..m-1; all keys sit in one long chain]
Load Factor of a Hash Table
Load factor of a hash table T:
α = n/m
n = # of elements stored in the table
m = # of locations in the table = # of linked lists
α is the average number of elements stored in a chain
α can be <, =, or > 1
[Figure: table T with slots 0..m-1, each slot holding a chain]
Hash Functions
A hash function transforms a hash key into a hash table address.
What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
In practice, it is very hard to satisfy the simple uniform hashing property,
i.e., we don't know in advance the probability distribution from which keys are drawn.
Good Approaches for Hash Functions
Minimize the chance that closely related keys hash to the same slot
Strings such as pt and pts should hash to different slots
Derive a hash value that is independent of any patterns that may exist in the
distribution of the keys
Hash keys such as 199 and 499 should hash to different slots
The Division Method
Idea:
Map a key k into one of the m slots by taking the
remainder of k divided by m:
h(k) = k mod m
m is usually chosen to be a prime number, or at least a number without small
divisors, to minimize the number of collisions.
Advantage:
fast, requires only one operation
Disadvantage:
Certain values of m are bad, e.g.,
powers of 2
non-prime numbers
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
k = 12345, M = 95: h(12345) = 12345 mod 95 = 90
k = 1276, M = 11: h(1276) = 1276 mod 11 = 0
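The division method is a one-liner; a minimal sketch reproducing the two examples above (the function name is mine):

```python
def h_div(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

print(h_div(12345, 95))  # 90
print(h_div(1276, 11))   # 0
```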
The Midsquare Method
Idea:
The key k is squared, and h(k) = l,
where l is obtained by deleting digits from both
ends of k².
The same positions of k² are used for all the keys.

k    | 3205     | 7148     | 2345
k²   | 10272025 | 51093904 | 5499025
h(k) | 72       | 93       | 99
The mid-square method is a very good hashing method. It involves two steps to compute
the hash value:
1. Square the value of the key k, i.e., k².
2. Extract the middle r digits as the hash value.
Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required
to map a key to a memory location.
k = 60
k × k = 60 × 60 = 3600
h(60) = 60 (the middle two digits of 3600)
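The two steps above can be sketched as follows (a minimal sketch working on the decimal digits of k²; the function name is mine):

```python
def h_midsquare(k, r):
    """Mid-square method: square k, then extract the middle r digits of k**2."""
    s = str(k * k)
    start = (len(s) - r) // 2   # position of the middle r digits
    return int(s[start:start + r])

print(h_midsquare(3205, 2))  # 72  (3205**2 = 10272025)
print(h_midsquare(60, 2))    # 60  (60**2 = 3600)
```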
The Folding Method
Idea:
The key k is partitioned into a number of parts k1,
k2, ..., kr, where each part (except possibly the last)
has the same number of digits as the required address.
h(k) = k1 + k2 + ... + kr
where leading-digit carries are ignored, if any.
Sometimes the even-numbered parts are reversed.

k     | 3205    | 7148    | 2345
parts | 32 + 05 | 71 + 48 | 23 + 45
h(k)  | 37      | 19      | 68
This method involves two steps:
1. Divide the key value k into a number of parts k1, k2, k3, ..., kn, where each part has the same number of digits,
except for the last part, which may have fewer digits than the others.
2. Add the individual parts. The hash value is obtained by ignoring the final carry, if any.
Formula:
k = k1, k2, k3, ..., kn
s = k1 + k2 + k3 + ... + kn
h(k) = s
Here,
s is obtained by adding the parts of the key k.
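The two steps above can be sketched as follows (a minimal sketch for a 2-digit address space; dropping the carry is done with a final mod, and the function name is mine):

```python
def h_fold(k, digits):
    """Folding method: split k into `digits`-digit parts, sum them,
    and ignore any carry beyond `digits` digits."""
    s = str(k)
    parts = [int(s[i:i + digits]) for i in range(0, len(s), digits)]
    return sum(parts) % 10 ** digits  # drop the leading carry, if any

print(h_fold(3205, 2))  # 37  (32 + 05)
print(h_fold(7148, 2))  # 19  (71 + 48 = 119, carry dropped)
```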
Open Addressing
If we have enough contiguous memory to store all the keys (m > N), we can store the
keys in the table itself.
It is called "open" because the address where key k is stored
depends not only on h(k) but also on the keys already stored in the table.
It is also called closed hashing.
No need to use linked lists anymore.
(Hashing with chaining, by contrast, is called open hashing.)
Basic idea:
Insertion: if a slot is full, try another one,
until you find an empty one.
Search: follow the same sequence of probes.
Deletion: more difficult.
Search time depends on the length of the
probe sequence!
Common Open Addressing Methods
Linear probing
Quadratic probing
Double hashing
Linear probing: Inserting a key
Idea: when there is a collision, check the next available
position in the table (i.e., probing):
(h(k) + i) mod m, i = 0, 1, 2, ...
First slot probed: h(k)
Second slot probed: h(k) + 1
Third slot probed: h(k) + 2, and so on
Probe sequence: < h(k), h(k)+1, h(k)+2, ... >
The probe sequence wraps around to the beginning of
the table.
Linear Probing Example
Table size m = 7; h(k) = k mod 7.
insert(14): 14 % 7 = 0   insert(8): 8 % 7 = 1   insert(21): 21 % 7 = 0   insert(2): 2 % 7 = 2

slot | after 14 | after 8 | after 21 | after 2
0    | 14       | 14      | 14       | 14
1    |          | 8       | 8        | 8
2    |          |         | 21       | 21
3    |          |         |          | 2
4    |          |         |          |
5    |          |         |          |
6    |          |         |          |
probes: 1, 1, 3, 2
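The example above can be traced in code (a minimal sketch; the function name is mine, and None marks an empty slot):

```python
def linear_probe_insert(table, key):
    """Insert `key` using linear probing: try h(k), h(k)+1, h(k)+2, ...
    (wrapping around); return the number of probes used."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m   # h(k) = k mod m, then step by 1
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("table is full")

table = [None] * 7
probes = [linear_probe_insert(table, k) for k in (14, 8, 21, 2)]
# probes == [1, 1, 3, 2]; the table ends up [14, 8, 21, 2, None, None, None]
```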
Quadratic probing: Inserting a key
Idea: when there is a collision, check positions at
quadratically increasing offsets:
(h(k) + i²) mod m, i = 0, 1, 2, ...
Probe sequence: < h(k), h(k)+1, h(k)+4, ... >
First slot probed: h(k)
Second slot probed: h(k) + 1
Third slot probed: h(k) + 4, and so on
The probe sequence wraps around to the beginning
of the table.
The clustering problem is less serious,
but it is still an issue (secondary clustering):
multiple keys hashed to the same spot all follow the same
probe sequence.
Quadratic Probing Example
Table size m = 7; h(k) = k mod 7.
insert(14): 14 % 7 = 0   insert(8): 8 % 7 = 1   insert(21): 21 % 7 = 0   insert(2): 2 % 7 = 2

slot | after 14 | after 8 | after 21 | after 2
0    | 14       | 14      | 14       | 14
1    |          | 8       | 8        | 8
2    |          |         |          | 2
3    |          |         |          |
4    |          |         | 21       | 21
5    |          |         |          |
6    |          |         |          |
probes: 1, 1, 3, 1
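The same trace with quadratic offsets (a minimal sketch; the function name is mine):

```python
def quadratic_probe_insert(table, key):
    """Insert `key` using quadratic probing: try h(k), h(k)+1, h(k)+4, ...
    (wrapping around); return the number of probes used."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i * i) % m   # offset grows as i**2
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("no empty slot found on this probe sequence")

table = [None] * 7
probes = [quadratic_probe_insert(table, k) for k in (14, 8, 21, 2)]
# probes == [1, 1, 3, 1]; 21 lands in slot 4 = (0 + 2**2) mod 7
```

Note that quadratic probing may fail to find an empty slot even when one exists, which is why the sketch raises after m attempts.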
Double Hashing
(1) Use one hash function to determine the first slot.
(2) Use a second hash function to determine the increment for the
probe sequence:
(h1(k) + i·h2(k)) mod m, i = 0, 1, ...
Initial probe: h1(k)
The second probe is offset by h2(k) mod m, and so on.
Advantage: avoids clustering.
Disadvantage: harder to delete an element.
Double Hashing: Example
h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
h(k, i) = (h1(k) + i·h2(k)) mod 13
The table (m = 13) already holds: 79 in slot 1, 69 in slot 4, 98 in slot 5, 72 in slot 7, 50 in slot 11.
Insert key 14:
h1(14) = 14 mod 13 = 1 (occupied by 79)
h(14, 1) = (h1(14) + h2(14)) mod 13
         = (1 + 4) mod 13 = 5 (occupied by 98)
h(14, 2) = (h1(14) + 2·h2(14)) mod 13
         = (1 + 8) mod 13 = 9 (empty), so 14 goes to slot 9.
Double Hashing Example
Table size m = 7; h1(k) = k mod 7, h2(k) = 5 - (k mod 5).
insert(14): 14 % 7 = 0   insert(8): 8 % 7 = 1   insert(21): 21 % 7 = 0, 5 - (21 % 5) = 4   insert(2): 2 % 7 = 2   insert(7): 7 % 7 = 0, 5 - (7 % 5) = 3

slot | after 14 | after 8 | after 21 | after 2
0    | 14       | 14      | 14       | 14
1    |          | 8       | 8        | 8
2    |          |         |          | 2
3    |          |         |          |
4    |          |         | 21       | 21
5    |          |         |          |
6    |          |         |          |
probes: 1, 1, 2, 1, ??
Double Hashing Example
Table size m = 7; h1(k) = k mod 7, h2(k) = 5 - (k mod 5).
insert(14): 14 % 7 = 0   insert(8): 8 % 7 = 1   insert(21): 21 % 7 = 0, 5 - (21 % 5) = 4   insert(2): 2 % 7 = 2   insert(56): 56 % 7 = 0, 5 - (56 % 5) = 4

slot | after 14 | after 8 | after 21 | after 2 | after 56
0    | 14       | 14      | 14       | 14      | 14
1    |          | 8       | 8        | 8       | 8
2    |          |         |          | 2       | 2
3    |          |         |          |         |
4    |          |         | 21       | 21      | 21
5    |          |         |          |         | 56
6    |          |         |          |         |
probes: 1, 1, 2, 1, 4
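The trace above can be reproduced in code (a minimal sketch hard-coding the slides' h2(k) = 5 - (k mod 5); the function name is mine):

```python
def double_hash_insert(table, key):
    """Insert `key` using double hashing with h1(k) = k mod m and
    h2(k) = 5 - (k mod 5), probing (h1 + i*h2) mod m for i = 0, 1, ...
    Returns the number of probes used."""
    m = len(table)
    h1 = key % m
    h2 = 5 - key % 5   # always in 1..5, so never a zero step
    for i in range(m):
        slot = (h1 + i * h2) % m
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("probe sequence exhausted")

table = [None] * 7
probes = [double_hash_insert(table, k) for k in (14, 8, 21, 2, 56)]
# probes == [1, 1, 2, 1, 4]; 56 ends up in slot (0 + 3*4) mod 7 = 5
```

Because m = 7 is prime and h2 is never 0, every probe sequence here visits all slots.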