Hash Tables
Reference: Introduction to Algorithms by T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein
Hash Tables
• Motivation: symbol tables
– A compiler uses a symbol table to relate symbols
to associated data
• Symbols: variable names, procedure names, etc.
• Associated data: memory location, call graph, etc.
– For a symbol table (also called a dictionary), we
care about search, insertion, and deletion
– We typically don’t care about sorted order
Hash Tables
• More formally:
– Given a table T and a record x, with key (= symbol)
and satellite data, we need to support:
• Insert (T, x)
• Delete (T, x)
• Search(T, x)
– We want these to be fast, but don’t care about
sorting the records
• The structure we will use is a hash table
– Supports all the above in O(1) expected time!
Hashing: Keys
• In the following discussion we will consider
all keys to be (possibly large) natural numbers
• How can we convert floats to natural numbers
for hashing purposes?
• How can we convert ASCII strings to natural
numbers for hashing purposes?
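As a concrete sketch in C (illustrative, not from the reference): reinterpret a float's bit pattern, and read a string as a radix-128 number.

#include <stdint.h>
#include <string.h>

/* Floats: reinterpret the bit pattern as an unsigned integer. */
uint64_t float_key(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* safe w.r.t. aliasing rules */
    return bits;
}

/* ASCII strings: treat the characters as digits of a radix-128
 * number, reduced on the fly (the value wraps mod 2^64). */
uint64_t string_key(const char *s) {
    uint64_t k = 0;
    for (; *s != '\0'; s++)
        k = k * 128 + (uint64_t)(unsigned char)*s;
    return k;
}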
Direct Addressing
• Suppose:
– The range of keys is 0..m-1
– Keys are distinct
• The idea:
– Set up an array T[0..m-1] in which
• T[i] = x if x ∈ T and key[x] = i
• T[i] = NULL otherwise
– This is called a direct-address table
• Operations take O(1) time!
• So what’s the problem?
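Before answering, here is a minimal C sketch of a direct-address table (names are illustrative; satellite data omitted):

#include <stddef.h>

typedef struct Record { int key; /* + satellite data */ } Record;
typedef struct { Record **slot; int m; } DirectTable;  /* slot[0..m-1] */

void da_insert(DirectTable *T, Record *x) { T->slot[x->key] = x; }
void da_delete(DirectTable *T, Record *x) { T->slot[x->key] = NULL; }
Record *da_search(DirectTable *T, int k)  { return T->slot[k]; }

All three are single array accesses, hence O(1).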
The Problems
• Direct addressing works well when the range
m of keys is relatively small
• But what if the keys are 32-bit integers?
– Problem 1: direct-address table will have
2^32 entries, more than 4 billion
– Problem 2: even if memory is not an issue, the
time to initialize the elements to NULL may be
• Solution: map keys to a smaller range 0..m-1
• This mapping is called a hash function
Hash Functions
• Next problem: collision
[Figure: a hash function h maps keys k1..k5 from the universe U (actual keys K) into table slots 0..m-1; here h(k2) = h(k5), a collision]
Resolving Collisions
• How can we solve the problem of collisions?
• Solution 1: chaining
• Solution 2: open addressing
Open Addressing
• Basic idea:
– To insert: if slot is full, try another slot, …, until an
open slot is found (probing)
– To search, follow the same sequence of probes as
would be used when inserting the element
• If we reach an element with the correct key, return it
• If we reach a NULL pointer, the element is not in the table
• Good for fixed sets (adding but no deletion)
– Example: spell checking
• Table needn’t be much bigger than n
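A minimal C sketch of open addressing with linear probing (one possible probe sequence, chosen here for illustration; the table size and names are assumptions):

#include <string.h>

#define M 8                          /* table size (demo-sized) */
static const char *table[M];         /* NULL marks an empty slot */

static unsigned h(const char *s) {   /* any string hash works here */
    unsigned k = 0;
    while (*s) k = k * 31 + (unsigned char)*s++;
    return k % M;
}

int oa_insert(const char *key) {
    unsigned i = h(key);
    for (int tries = 0; tries < M; tries++) {
        if (table[i] == NULL) { table[i] = key; return 1; }
        i = (i + 1) % M;             /* slot taken: probe the next one */
    }
    return 0;                        /* table full */
}

const char *oa_search(const char *key) {
    unsigned i = h(key);
    for (int tries = 0; tries < M; tries++) {
        if (table[i] == NULL) return NULL;   /* empty slot: absent */
        if (strcmp(table[i], key) == 0) return table[i];
        i = (i + 1) % M;             /* same probe sequence as insert */
    }
    return NULL;
}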
Chaining
• Chaining puts elements that hash to the same
slot in a linked list:
[Figure: table T with one chain per slot; k1 and k4 share a chain, k5, k2, and k7 another, k8 and k6 a third, and k3 sits alone in its slot]
Chaining
• How do we insert an element?
[Figure: the same chained table as above]
Chaining
• How do we delete an element?
[Figure: the same chained table as above]
Chaining
• How do we search for an element with a given key?
[Figure: the same chained table as above]
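Putting the three operations together, a minimal C sketch of a chained table (illustrative names; keys are plain ints):

#include <stdlib.h>

typedef struct Node { int key; struct Node *next; } Node;

#define M 101                        /* number of slots (a prime) */
static Node *T[M];                   /* T[i] heads the chain for slot i */

static int h(int k) { return ((k % M) + M) % M; }

void chain_insert(int key) {         /* O(1): insert at head of chain */
    Node *x = malloc(sizeof *x);
    x->key = key;
    x->next = T[h(key)];
    T[h(key)] = x;
}

Node *chain_search(int key) {        /* O(1 + length of the chain) */
    for (Node *p = T[h(key)]; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

void chain_delete(int key) {         /* unlink the node from its chain */
    Node **pp = &T[h(key)];
    while (*pp != NULL && (*pp)->key != key) pp = &(*pp)->next;
    if (*pp != NULL) { Node *dead = *pp; *pp = dead->next; free(dead); }
}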
Analysis of Chaining
• Assume simple uniform hashing: each key in the
table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the
load factor α = n/m = average # keys per slot
• What will be the average cost of an
unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful
search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the
number of slots in the table, what is α?
• A: α = O(1)
– In other words, we can make the expected cost of
searching constant if we make α constant
Choosing A Hash Function
• Clearly choosing the hash function well is
crucial
– What will a worst-case hash function do?
– What will be the time to search in this case?
• What are desirable features of the hash
function?
– Should distribute keys uniformly into slots
– Should not depend on patterns in the data
Hash Functions:
The Division Method
• h(k) = k mod m
– In words: hash k into a table with m slots using the
slot given by the remainder of k divided by m
• What happens to elements with adjacent
values of k?
• What happens if m is a power of 2 (say 2^p)?
• What if m is a power of 10?
• Upshot: pick table size m = prime number not
too close to a power of 2 (or 10)
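In code the division method is a one-liner; choosing a good m is the whole trick:

unsigned hash_div(unsigned k, unsigned m) {
    return k % m;    /* e.g. m = 701, a prime not near a power of 2 */
}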
Hash Functions:
The Multiplication Method
• For a constant A, 0 < A < 1:
• h(k) = ⌊m (kA mod 1)⌋ = ⌊m (kA − ⌊kA⌋)⌋
– (kA − ⌊kA⌋) is the fractional part of kA
• Choose m = 2^p
• Choose A not too close to 0 or 1
• Knuth: a good choice is A = (√5 − 1)/2
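A C sketch following the formula directly (floating-point for clarity; fixed-point variants are common in practice, and p < 32 is assumed):

#include <math.h>

unsigned hash_mul(unsigned k, unsigned p) {        /* m = 2^p slots */
    const double A = (sqrt(5.0) - 1.0) / 2.0;      /* Knuth's ~0.618 */
    double frac = fmod((double)k * A, 1.0);        /* kA mod 1 */
    return (unsigned)((double)(1u << p) * frac);   /* floor(m * frac) */
}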
Data Structures
HEAPS
Heaps
• A heap can be seen as a complete binary tree:
[Figure: binary tree, level order: 16 | 14 10 | 8 7 9 3 | 2 4 1]
– What makes a binary tree complete?
– Is the example above complete?
Heaps
• A heap can be seen as a complete binary tree:
[Figure: the same tree, with the unfilled slots of the bottom level shown as empty boxes]
– The book calls them “nearly complete” binary
trees; can think of unfilled slots as null pointers
Heaps
• In practice, heaps are usually implemented as
arrays:
[Figure: the same tree stored as an array]
A = [16 14 10 8 7 9 3 2 4 1]
Heaps
• To represent a complete binary tree as an
array:
– The root node is A[1]
– Node i is A[i]
– The parent of node i is A[i/2] (note: integer divide)
– The left child of node i is A[2i]
– The right child of node i is A[2i + 1]
[Figure: the same tree and array A = [16 14 10 8 7 9 3 2 4 1]]
Referencing Heap Elements
• So…
Parent(i) { return i/2; }
Left(i) { return 2*i; }
Right(i) { return 2*i + 1; }
• An aside: How would you implement this
most efficiently?
• Another aside: Really?
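One likely intended answer (an assumption, since the slide leaves it open): with 1-based indexing these are single shift/or instructions, though the "Really?" hints that modern compilers emit exactly this for the /2 and *2 versions anyway:

static inline int Parent(int i) { return i >> 1; }        /* i / 2   */
static inline int Left(int i)   { return i << 1; }        /* 2 * i   */
static inline int Right(int i)  { return (i << 1) | 1; }  /* 2i + 1  */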
The Heap Property
• Heaps also satisfy the heap property:
A[Parent(i)] ≥ A[i] for all nodes i > 1
– In other words, the value of a node is at most the
value of its parent
– Where is the largest element in a heap stored?
• Definitions:
– The height of a node in the tree = the number of
edges on the longest downward path to a leaf
– The height of a tree = the height of its root
Heap Height
• What is the height of an n-element heap?
Why?
• This is nice: basic heap operations take at
most time proportional to the height of the
heap
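The standard derivation, filling in the "Why?": a heap of height h is complete down to depth h − 1, so

2^h ≤ n ≤ 2^(h+1) − 1  ⟹  h = ⌊lg n⌋ = Θ(lg n)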
Heap Operations: Heapify()
• Heapify(): maintain the heap property
– Given: a node i in the heap with children l and r
– Given: two subtrees rooted at l and r, assumed to
be heaps
– Problem: The subtree rooted at i may violate the
heap property (How?)
– Action: let the value of the parent node “float
down” so subtree at i satisfies the heap property
• What do you suppose will be the basic operation
between i, l, and r?
Heap Operations: Heapify()
Heapify(A, i)
{
l = Left(i); r = Right(i);
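// find the index of the largest of A[i], A[l], A[r]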
if (l <= heap_size(A) && A[l] > A[i])
largest = l;
else
largest = i;
if (r <= heap_size(A) && A[r] > A[largest])
largest = r;
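// if a child was larger, swap it up and fix the affected subtree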
if (largest != i)
Swap(A, i, largest);
Heapify(A, largest);
}
Heapify() Example
[Figure: 16 | 4 10 | 14 7 9 3 | 2 8 1; Heapify(A, 2): node 2 (value 4) violates the heap property]
A = [16 4 10 14 7 9 3 2 8 1]
Heapify() Example
[Figure: 16 | 14 10 | 4 7 9 3 | 2 8 1; 4 swapped with its larger child, 14]
A = [16 14 10 4 7 9 3 2 8 1]
Heapify() Example
[Figure: 16 | 14 10 | 8 7 9 3 | 2 4 1; 4 swapped with 8, and the heap property is restored]
A = [16 14 10 8 7 9 3 2 4 1]
Analyzing Heapify(): Informal
• Aside from the recursive call, what is the
running time of Heapify()?
• How many times can Heapify() recursively
call itself?
• What is the worst-case running time of
Heapify() on a heap of size n?
Analyzing Heapify(): Formal
• Fixing up relationships between i, l, and r
takes Θ(1) time
• If the heap at i has n elements, how many
elements can the subtrees at l or r have?
– Draw it
• Answer: 2n/3 (worst case: bottom row 1/2
full)
• So the time taken by Heapify() is given by
T(n) ≤ T(2n/3) + Θ(1)
The 2/3 Bound
• In a full binary tree with i internal nodes and r leaves,
counting edges gives 2i = r + i − 1, so r = i + 1
• Worst case: the bottom level of the heap is exactly half full,
so the left subtree ends in a full bottom row and the right
subtree has none
[Figure: the upper part of each subtree has m nodes; the left
subtree also has its m + 1 bottom-row leaves]
• Fraction of nodes in the left subtree = (2m + 1)/(3m + 2) ≤ 2/3
Analyzing Heapify(): Formal
• So we have
T(n) ≤ T(2n/3) + Θ(1)
• By case 2 of the Master Theorem,
T(n) = O(lg n)
• Thus, Heapify() takes logarithmic time
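To see why case 2 applies (a standard check, filled in here): a = 1 and b = 3/2, so n^(log_b a) = n^0 = 1, and f(n) = Θ(1) matches it; case 2 then gives

T(n) = Θ(n^(log_{3/2} 1) · lg n) = Θ(lg n)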
Heap Operations: BuildHeap()
• We can build a heap in a bottom-up manner
by running Heapify() on successive
subarrays
– Fact: for an array of length n, all elements in range
A[⌊n/2⌋ + 1 .. n] are (one-element) heaps (Why?)
– So:
• Walk backwards through the array from n/2 to 1, calling
Heapify() on each node.
• Order of processing guarantees that the children of
node i are heaps when i is processed
BuildHeap()
// given an unsorted array A, make A a heap
BuildHeap(A)
{
heap_size(A) = length(A);
for (i = length(A)/2 downto 1)
Heapify(A, i);
}
BuildHeap() Example
• Work through example
A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
[Figure: binary tree, level order: 4 | 1 3 | 2 16 9 10 | 14 8 7]
Analyzing BuildHeap()
• Each call to Heapify() takes O(lg n) time
• There are O(n) such calls (specifically, n/2)
• Thus the running time is O(n lg n)
– Is this a correct asymptotic upper bound?
– Is this an asymptotically tight bound?
• A tighter bound is O(n)
– How can this be? Is there a flaw in the above
reasoning?
Analyzing BuildHeap(): Tight
• To Heapify() a subtree takes O(h) time
where h is the height of the subtree
– h = O(lg m), m = # nodes in subtree
– The height of most subtrees is small
• Fact: an n-element heap has at most ⌈n/2^(h+1)⌉
nodes of height h
• Can be used to prove that BuildHeap()
takes O(n) time
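The proof is a short sum (a standard step, filled in here): summing the Heapify() costs over all heights, and using Σ_{h≥0} h/2^h = 2,

Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · O(h) = O(n · Σ_{h≥0} h/2^h) = O(n)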
Heapsort
• Given BuildHeap(), an in-place sorting
algorithm is easily constructed:
– Maximum element is at A[1]
– Discard by swapping with element at A[n]
• Decrement heap_size[A]
• A[n] now contains correct value
– Restore heap property at A[1] by calling
Heapify()
– Repeat, always swapping A[1] for A[heap_size(A)]
Heapsort
Heapsort(A)
{
BuildHeap(A);
for (i = length(A) downto 2)
{
Swap(A[1], A[i]);
heap_size(A) -= 1;
Heapify(A, 1);
}
}
Analyzing Heapsort
• The call to BuildHeap() takes O(n) time
• Each of the n - 1 calls to Heapify() takes
O(lg n) time
• Thus the total time taken by HeapSort()
= O(n) + (n - 1) O(lg n)
= O(n) + O(n lg n)
= O(n lg n)
Problems
• 1. Design an algorithm to merge k large sorted
arrays of a total of n elements using no more
than O(k) additional storage. Here k << n.
• 2. Design an algorithm which reads in n
integers one by one and then returns the k
smallest ones. Here k << n and n is unknown
to start with.
Priority Queues
• Heapsort is a nice algorithm, but in practice
Quicksort usually wins
• But the heap data structure is incredibly
useful for implementing priority queues
– A data structure for maintaining a set S of
elements, each with an associated value or key
– Supports the operations Insert(),
Maximum(), and ExtractMax()
– What might a priority queue be useful for?
Priority Queue Operations
• Insert(S, x) inserts the element x into set S
• Maximum(S) returns the element of S with
the maximum key
• ExtractMax(S) removes and returns the
element of S with the maximum key
• How could we implement these operations
using a heap?
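One way, sketched in C under the slides' 1-based array convention (heapify() here is the earlier Heapify(), with the heap size passed explicitly):

void heapify(int A[], int n, int i) {      /* slides' Heapify() */
    int l = 2*i, r = 2*i + 1, largest = i;
    if (l <= n && A[l] > A[largest]) largest = l;
    if (r <= n && A[r] > A[largest]) largest = r;
    if (largest != i) {
        int t = A[i]; A[i] = A[largest]; A[largest] = t;
        heapify(A, n, largest);
    }
}

int maximum(int A[]) { return A[1]; }      /* O(1): max is at the root */

int extract_max(int A[], int *n) {         /* O(lg n) */
    int max = A[1];
    A[1] = A[(*n)--];                      /* last element to the root */
    heapify(A, *n, 1);                     /* float it back down */
    return max;
}

void insert(int A[], int *n, int key) {    /* O(lg n) */
    int i = ++(*n);
    A[i] = key;                            /* add at the end... */
    while (i > 1 && A[i/2] < A[i]) {       /* ...and bubble it up */
        int t = A[i/2]; A[i/2] = A[i]; A[i] = t;
        i /= 2;
    }
}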
Example of Insertion to Max Heap
• Add at the end, move upwards
[Figure, three stages:
initial heap: 20 | 15 2 | 14 10
insert 5: placed in the next free slot, then swapped up past 2: 20 | 15 5 | 14 10 2
insert 21: bubbles up past 5 and 20 to the root: 21 | 15 20 | 14 10 2 5]
Example of Deletion from Max Heap
• EXTRACT_MAX
[Figure, three stages:
initial heap: 20 | 15 2 | 14 10
remove the root 20 and move the last element 10 up: 10 | 15 2 | 14
Heapify floats 10 down past 15 and 14: 15 | 14 2 | 10]