CSE 332 Winter 2024
Lecture 12: Hashing
Nathan Brunelle
http://www.cs.uw.edu/332
Dictionary Data Structures
Data Structure Time to insert Time to find Time to delete
Unsorted Array Θ(𝑛) Θ(𝑛) Θ(𝑛)
Unsorted Linked List Θ(𝑛) Θ(𝑛) Θ(𝑛)
Sorted Array Θ 𝑛 Θ(log 𝑛) Θ(𝑛)
Sorted Linked List Θ 𝑛 Θ 𝑛 Θ 𝑛
Binary Search Tree Θ 𝑛 Θ 𝑛 Θ 𝑛
AVL Tree Θ(log 𝑛) Θ(log 𝑛) Θ(log 𝑛)
Hash Table (Worst case) Θ(𝑛) Θ(𝑛) Θ(𝑛)
Hash Table (Average) Θ 1 Θ 1 Θ 1
Hash Tables
• Idea:
• Have a small array to store information
• Use a hash function to convert the key into an index
• Hash function should “scatter” the keys, behave as if it randomly assigned keys to indices
• Store key at the index given by the hash function
• Do something if two keys map to the same place (should be very rare)
• Collision resolution
Index
Insert / find /
ℎ(𝑘) between 0
delete & value
and size-1
Key Object
Properties of a “Good” Hash
• Definition: A hash function maps objects to integers
• Should be very efficient
• Calculating the hash should be negligible
• Should randomly scatter objects
• Objects that are similar to each other should be likely to end up far away
• Should use the entire table
• There should not be any indices in the table that nothing can hash to
• Picking a table size that is prime helps with this
• Should use things needed to “identify” the object
• Use only fields you would check for a .equals method be included in calculating the hash
• More fields typically leads to fewer collisions, but less efficient calculation
A Bad Hash (and phone number trivia)
• ℎ 𝑝ℎ𝑜𝑛𝑒 = the first digit of the phone number
• No US phone numbers start with 1 or 0
• If we’re sampling from this class, 2 is by far the most likely
0 1 2 3 4 5 6 7 8 9
Compare These Hash Functions (for strings)
• Let 𝑠 = 𝑠0 𝑠1 𝑠2 … 𝑠𝑚−1 be a string of length 𝑚
• Let 𝑎(𝑠𝑖 ) be the ascii encoding of the character 𝑠𝑖
• ℎ1 𝑠 = 𝑎 𝑠0
• ℎ2 𝑠 = σ𝑚−1
𝑖=0 𝑎 𝑠𝑖
• ℎ3 𝑠 = σ𝑚−1
𝑖=0 𝑎 𝑠𝑖 ⋅ 37𝑖
Collision Resolution
• A Collision occurs when we want to insert something into an already-
occupied position in the hash table
• 2 main strategies:
• Separate Chaining
• Use a secondary data structure to contain the items
• E.g. each index in the hash table is itself a linked list
• Open Addressing
• Use a different spot in the table instead
• Linear Probing
• Quadratic Probing
• Double Hashing
0 1 2 3 4 5 6 7 8 9
Separate Chaining Insert
• To insert 𝑘, 𝑣:
• Compute the index using 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• Add the key-value pair to the data structure at 𝑡𝑎𝑏𝑙𝑒 𝑖
𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
Separate Chaining Find
• To find 𝑘:
• Compute the index using 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• Call find with the key on the data structure at 𝑡𝑎𝑏𝑙𝑒 𝑖
𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
Separate Chaining Delete
• To delete 𝑘:
• Compute the index using 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• Call delete with the key on the data structure at 𝑡𝑎𝑏𝑙𝑒 𝑖
𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
Formal Running Time Analysis
• The load factor of a hash table represents the average number of
items per “bucket”
𝑛
• 𝜆=
𝑠𝑖𝑧𝑒
• Assume we have a has table that uses a linked-list for separate
chaining
• What is the expected number of comparisons needed in an unsuccessful find?
• What is the expected number of comparisons needed in a successful find?
• How can we make the expected running time Θ(1)?
Load Factor?
𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
𝑘, 𝑣
Load Factor? 𝑘, 𝑣 𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
𝑘, 𝑣
Load Factor? 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣
𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
Collision Resolution: Linear Probing
• When there’s a collision, use the next open space in the table
0 1 2 3 4 5 6 7 8 9
Linear Probing: Insert Procedure
• To insert 𝑘, 𝑣
• Calculate 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• If 𝑡𝑎𝑏𝑙𝑒[𝑖] is occupied then try 𝑖 + 1 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 2 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 3 % 𝑠𝑖𝑧𝑒
• …
0 1 2 3 4 5 6 7 8 9
Linear Probing: Find
• Let’s do this together!
Linear Probing: Find
• To find key 𝑘
• Calculate 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• If 𝑡𝑎𝑏𝑙𝑒 𝑖 is occupied and does not contain 𝑘 then look at 𝑖 + 1 % 𝑠𝑖𝑧𝑒
• If that is occupied and does not contain 𝑘 then look at 𝑖 + 2 % 𝑠𝑖𝑧𝑒
• If that is occupied and does not contain 𝑘 then look at 𝑖 + 3 % 𝑠𝑖𝑧𝑒
• Repeat until you either find 𝑘 or else you reach an empty cell in the table
Linear Probing: Delete
• Let’s do this together!
Linear Probing: Delete
• Option 1: Find the last thing with a matching hash, move that into the
spot you deleted from
• Option 2: Called “tombstone” deletion. Leave a special object that
indicates an object was deleted from there
• The tombstone does not act as an open space when finding (so keep looking
after its reached)
• When inserting you can replace a tombstone with a new item
𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣 𝑘, 𝑣
0 1 2 3 4 5 6 7 8 9
Downsides of Linear Probing
• What happens when 𝜆 approaches 1?
• What happens when 𝜆 exceeds 1?
Quadratic Probing: Insert Procedure
• To insert 𝑘, 𝑣
• Calculate 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• If 𝑡𝑎𝑏𝑙𝑒[𝑖] is occupied then try 𝑖 + 12 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 22 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 32 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 42 % 𝑠𝑖𝑧𝑒
• …
0 1 2 3 4 5 6 7 8 9
Quadratic Probing: Example
• Insert:
• 76
• 40
• 48
• 5
• 55
• 47
0 1 2 3 4 5 6
Using Quadratic Probing
• If you probe 𝑡𝑎𝑏𝑙𝑒𝑠𝑖𝑧𝑒 times, you start repeating the same indices
1
• If 𝑡𝑎𝑏𝑙𝑒𝑠𝑖𝑧𝑒 is prime and 𝜆 < then you’re guaranteed to find an
2
open spot in at most 𝑡𝑎𝑏𝑙𝑒𝑠𝑖𝑧𝑒/2 probes
• Helps with the clustering problem of linear probing, but does not help
if many things hash to the same value
Double Hashing: Insert Procedure
• Given ℎ and 𝑔 are both good hash functions
• To insert 𝑘, 𝑣
• Calculate 𝑖 = ℎ 𝑘 % 𝑠𝑖𝑧𝑒
• If 𝑡𝑎𝑏𝑙𝑒[𝑖] is occupied then try 𝑖 + 𝑔 𝑘 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 2 ⋅ 𝑔 𝑘 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 3 ⋅ 𝑔 𝑘 % 𝑠𝑖𝑧𝑒
• If that is occupied try 𝑖 + 4 ⋅ 𝑔 𝑘 % 𝑠𝑖𝑧𝑒
• …
0 1 2 3 4 5 6 7 8 9
Rehashing
• If your load factor 𝜆 gets too large, copy everything over to a larger
hash table
• To do this: make a new array with a new hash function
• Re-insert all items into the new hash table with the new hash function
• New hash table should be “roughly” double the size (but probably still want it
to be prime)