ALGORITHMS AND DATA STRUCTURES
LECTURE 5 – HASH TABLE AND BST
Askar Khaimuldin
[email protected]CONTENT
1. Hashing
2. Hash Table
3. Binary Search Tree
4. BST: Inorder Traversal
HASHING
Hashing means using some function or algorithm to map object data to some
representative integer value
This so-called hash code (or simply hash) can then be used as a way to
narrow down our search when looking for the item in the set
Object class contains hashCode() method with its default implementation
Recommended: Each class provides its own implementation of hashCode()
HASHING: STRING EXAMPLE
Horner's method to hash string of length L: L multiplies/adds.
Equivalent to ℎ = 𝑠[0] · 31𝐿−1 + … + 𝑠[𝐿 – 3] · 312 + 𝑠[𝐿 – 2] · 311 + 𝑠[𝐿 – 1] · 310
HASHING:‘STANDARD’ RECIPE
Combine each significant field using the
31x + y rule.
If field is a primitive type, use wrapper
type hashCode()
If field is null, return 0
If field is a reference type, use
hashCode()
If field is an array, apply to each entry
HASH TABLE
Hash table maps keys to values. Any non-null object
can be used as a key or as a value
To successfully store and retrieve objects from a
hashtable, the objects used as keys must implement the
hashCode method and the equals method.
It looks like “an array of singly-linked lists (chains)”
Each linked list is accepted as bucket
Array size indicates number of buckets
Average case:
• Insertion = deletion = retrieving = searching = O(1)
HASH TABLE
The capacity (number of buckets - M) and load factor are parameters that affect to
its performance
The load factor is a measure of how full the hash table is allowed to get before its
capacity is automatically increased (LF should be around 0.75)
The hashCode is used to get an index of chain by hash() method (Modular hashing)
HASH TABLE
Best case
Collision – having same index for several nodes (cannot be
avoided)
A new node should be added to the same chain (bucket)
Challenge: Deal with collisions efficiently
Target: Uniform distribution
Analysis: Number of probes for search/insert is proportional to
N/M Worst case
M too large ⇒ too many empty chains
M too small ⇒ chains too long
Typical choice: M ~ N / 4 ⇒ constant-time ops
Once a hash table has passed its load factor - it has to rehash
[create a new bigger table, and re-insert each element to the
table]
HASH TABLE: EXAMPLE
BINARY SEARCH TREE
A BST is a binary tree in symmetric order.
Each node has two references to left and right nodes
Symmetric order. Each node has a key, and every
node’s key is:
Larger than all keys in its left subtree
Smaller than all keys in its right subtree
A Node is composed of four fields
Key and Value
Left and right subtree references
BINARY SEARCH TREE
A BST uses O(log(N)) for most manipulations
Search: If less, go left; if greater, go right; if equal,
search hit
Insert: If less, go left; if greater, go right; if null, insert
GetMin(): Most left node
GetMax(): Most right node
BINARY SEARCH TREE: DELETE
To delete a node with key k: search
for node t containing key k
Case 1 (1 child): Delete t by replacing
parent link
Case 2 (2 children):
Find successor x of t
Delete the minimum in t's right subtree
Put x in t's spot
BST: INORDER TRAVERSAL
In BST, inorder traversal is used to get nodes in increasing order (Left-Root-Right)
Ordered iteration
Traverse left subtree
Enqueue key
Traverse right subtree
LITERATURE
Algorithms, 4th Edition, by Robert Sedgewick and Kevin Wayne, Addison-Wesley
Chapters 3.2, 3.4
Grokking Algorithms, by Aditya Y. Bhargava, Manning
Chapter 5
GOOD LUCK!