Database System Concepts
Chapter: Index and Hashing
Prof. Dr. Rafiqul Islam
31 July, 2025
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 1 / 33
Basic Concept of Indexing
An index in a database works like a book’s index: it maps search keys
to locations.
Helps retrieve data efficiently by avoiding full table scans.
Example: To find a student record by ID, use an index to locate the
disk block.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 2 / 33
Basic Concept of Indexing
An index in a database works like a book’s index: it maps search keys
to locations.
Helps retrieve data efficiently by avoiding full table scans.
Example: To find a student record by ID, use an index to locate the
disk block.
Why indexing is needed?
Improves search performance.
Reduces disk I/O.
Essential for large databases where linear search is impractical.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 2 / 33
Types of Indices
Ordered Index:
Index entries sorted by key values.
Suitable for range queries.
Hash Index:
Uses a hash function to map keys to buckets.
Efficient for exact-match queries.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 3 / 33
Types of Indices
Ordered Index:
Index entries sorted by key values.
Suitable for range queries.
Hash Index:
Uses a hash function to map keys to buckets.
Efficient for exact-match queries.
Trade-off:
Ordered: Better for sequential access.
Hash: Faster for direct lookups.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 3 / 33
Visual Comparison
Ordered Index Hash Index
Sorted structure Hash-based mapping
Supports range queries Supports point queries
Example: B+ Tree Example: Hash Table
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 4 / 33
Need for Diverse Indexing Strategies
No single indexing or hashing method is universally optimal.
Different database applications benefit from different indexing
approaches.
Evaluation must consider multiple performance factors.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 5 / 33
Need for Diverse Indexing Strategies
No single indexing or hashing method is universally optimal.
Different database applications benefit from different indexing
approaches.
Evaluation must consider multiple performance factors.
Example:
Searching books by author, subject, or title → multiple indices,
multiple search keys.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 5 / 33
Indexing Evaluation Metrics
Access Types:
Exact match queries
Range queries
Access Time: Time to locate one or more records.
Insertion Time: Includes search + index update cost.
Deletion Time: Includes item lookup + structure update.
Space Overhead: Additional storage cost for index structures.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 6 / 33
Defining Search Keys
A search key is an attribute or set of attributes used to find records.
Different from:
Primary key: Uniquely identifies tuples.
Candidate/Superkey: Alternative definitions of uniqueness.
Search keys may not be unique and can vary across indices.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 7 / 33
Dense vs. Sparse Index
Dense Index: Entry for every search-key value.
Sparse Index: Entry for only some search-key values.
Both help optimize record retrieval from large databases.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 8 / 33
Dense vs. Sparse Index
Dense Index: Entry for every search-key value.
Sparse Index: Entry for only some search-key values.
Both help optimize record retrieval from large databases.
Applicability:
Dense: Suitable for both clustering and nonclustering indices.
Sparse: Only applicable to clustering indices.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 8 / 33
Dense Index Variants
Clustering Index:
Records stored sorted by search key.
Index points to the first record in a sequence.
Nonclustering Index:
Records not sorted by search key.
Index stores a list of pointers to matching records.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 9 / 33
Dense Index Variants
Clustering Index:
Records stored sorted by search key.
Index points to the first record in a sequence.
Nonclustering Index:
Records not sorted by search key.
Index stores a list of pointers to matching records.
Example:
Instructor ID “22222” → direct pointer from dense index.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 9 / 33
Sparse Index Details
Requires relation to be sorted by search key.
Index stores entry for select keys only.
Lookup involves:
Finding the largest search-key target.
Sequential search from that point.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 10 / 33
Sparse Index Details
Requires relation to be sorted by search key.
Index stores entry for select keys only.
Lookup involves:
Finding the largest search-key target.
Sequential search from that point.
Example:
Looking for “22222”, index gives pointer to “10101”.
Scan forward to locate desired record.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 10 / 33
Dense and Sparse Indexing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 11 / 33
Dense and Sparse Indexing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 12 / 33
Dense and Sparse Indexing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 13 / 33
Scaling Indexing with Relation Size
Problem: Large indices can’t fit entirely in memory.
Disk-based search (e.g., binary search) is slow due to multiple block
reads.
Example: 10,000 blocks → 14 reads → 140 ms per search.
Overflow blocks prevent binary search → require even slower
sequential scan.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 14 / 33
Scaling Indexing with Relation Size
Problem: Large indices can’t fit entirely in memory.
Disk-based search (e.g., binary search) is slow due to multiple block
reads.
Example: 10,000 blocks → 14 reads → 140 ms per search.
Overflow blocks prevent binary search → require even slower
sequential scan.
Implication: Need for a faster, scalable search mechanism.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 14 / 33
Multilevel Index Explained
Treat first-level index as a file.
Build a second-level index on top of first-level blocks.
Reduces search cost to:
One block read per level.
Significantly fewer total disk accesses.
Similar to B-tree architecture, without needing to balance nodes.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 15 / 33
Multilevel Index Explained
Treat first-level index as a file.
Build a second-level index on top of first-level blocks.
Reduces search cost to:
One block read per level.
Significantly fewer total disk accesses.
Similar to B-tree architecture, without needing to balance nodes.
Trade-off: Slight increase in space overhead for faster access.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 15 / 33
Search Time Comparison
Single-Level Index Multilevel Index
Binary Search → log(b) block Typically 2–3 block reads
reads Faster and scalable for big
Slower if b is large datasets
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 16 / 33
Dense and Sparse Indexing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 17 / 33
Updating Index Structures
Every index must be updated on:
Record Insertion
Record Deletion
Attribute Update (modeled as delete + insert)
Affected indices depend on the updated search-key attributes.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 18 / 33
Updating Index Structures
Every index must be updated on:
Record Insertion
Record Deletion
Attribute Update (modeled as delete + insert)
Affected indices depend on the updated search-key attributes.
Example:
Change in instructor’s department → update index on dept name.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 18 / 33
Index Insertion Procedure
Dense Index
If search key not present → insert entry at correct position.
If present:
Store pointer to new record (if multi-pointer index).
Place record sequentially (if only first-pointer index).
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 19 / 33
Index Insertion Procedure
Dense Index
If search key not present → insert entry at correct position.
If present:
Store pointer to new record (if multi-pointer index).
Place record sequentially (if only first-pointer index).
Sparse Index
Insert search key if new block is created.
Update index entry only if new record is least in its block.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 19 / 33
Index Deletion Procedure
Dense Index
If only record with key → remove entry.
If multiple records:
Remove pointer to deleted record.
If first record is deleted → update index pointer.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 20 / 33
Index Deletion Procedure
Dense Index
If only record with key → remove entry.
If multiple records:
Remove pointer to deleted record.
If first record is deleted → update index pointer.
Sparse Index
If key not present → no update.
If present:
Replace or remove index entry based on key position.
Update pointer if it points to deleted record.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 20 / 33
Using Multiple Single-Key Indices
Example Query:
SELECT ID FROM instructor WHERE dept name = ’Finance’
AND salary = 80000;
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 21 / 33
Using Multiple Single-Key Indices
Example Query:
SELECT ID FROM instructor WHERE dept name = ’Finance’
AND salary = 80000;
Strategy Options:
1 Use index on dept name, filter by salary.
2 Use index on salary, filter by dept name.
3 Intersect results from both indices → most efficient.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 21 / 33
Limitations of Multi-Index Intersection
If many records match each individual condition...
But few satisfy both conditions...
Scanning many pointers yields small result set.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 22 / 33
Limitations of Multi-Index Intersection
If many records match each individual condition...
But few satisfy both conditions...
Scanning many pointers yields small result set.
Better Alternative:
Use a composite index or bitmap index.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 22 / 33
Indices on Composite Search Keys
Search Key: (dept name, salary)
Efficient for queries using both attributes.
Supports:
Equality on both: dept name = ’Finance’, salary = 80000
Equality on first, range on second: salary < 80000
Equality on first: treated as range query
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 23 / 33
Trade-offs of Composite Indices
Limitation:
Query with comparison on first key (e.g., dept name < ’Finance’)
leads to many I/O operations.
Data blocks might be scattered due to sorted ordering.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 24 / 33
Trade-offs of Composite Indices
Limitation:
Query with comparison on first key (e.g., dept name < ’Finance’)
leads to many I/O operations.
Data blocks might be scattered due to sorted ordering.
Conclusion:
Composite indices are powerful but must be tailored to query patterns.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 24 / 33
Static Hashing: Motivation
Why Hashing?
Avoids index structure and binary search overhead.
Direct access to bucket using a hash function.
Used both for file organization and hash-based indexing.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 25 / 33
Buckets and Basic Operations
Terminology:
Bucket: Unit of storage (often a disk block).
Hash Function: h : K → B, maps search key to bucket address.
Operations:
Insert: Compute h(Ki ) → place record.
Lookup: Compute h(Ki ) → search bucket.
Delete: Compute h(Ki ) → find and remove record.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 26 / 33
Collision Handling
Scenario: h(K5 ) = h(K7 )
Multiple records in same bucket.
Need to compare actual key values to resolve.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 27 / 33
Collision Handling
Scenario: h(K5 ) = h(K7 )
Multiple records in same bucket.
Need to compare actual key values to resolve.
Challenge: Avoid poor hash functions mapping everything to same
bucket.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 27 / 33
Desirable Properties of Hash Functions
Uniformity: Keys evenly distributed across all buckets.
Randomness: No correlation to natural key ordering.
Scalability: Effective even with unknown future data.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 28 / 33
Desirable Properties of Hash Functions
Uniformity: Keys evenly distributed across all buckets.
Randomness: No correlation to natural key ordering.
Scalability: Effective even with unknown future data.
Examples:
Alphabet-based hash: Simple, not uniform.
Salary range hash: Uniform if salary distribution is even.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 28 / 33
Bucket Overflow Scenarios
Definition: Overflow occurs when a bucket can’t store a newly inserted
record.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 29 / 33
Bucket Overflow Scenarios
Definition: Overflow occurs when a bucket can’t store a newly inserted
record.
Causes:
nr
Insufficient Buckets: nB < fr
Skewed Distribution:
Multiple records with same key.
Poor hash function → uneven spread.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 29 / 33
Hashing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 30 / 33
Design Strategy to Minimize Overflow
Bucket Count Formula:
nr
B= × (1 + d), d ≈ 0.2
fr
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 31 / 33
Design Strategy to Minimize Overflow
Bucket Count Formula:
nr
B= × (1 + d), d ≈ 0.2
fr
Trade-off:
Wastes 20
But lowers chance of overflow.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 31 / 33
Overflow Chaining
Concept:
Extra bucket allocated if primary is full.
All overflow buckets are linked in a list.
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 32 / 33
Overflow Chaining
Concept:
Extra bucket allocated if primary is full.
All overflow buckets are linked in a list.
Structure:
Bucket b Overflow Bucket 1 Overflow Bucket 2
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 32 / 33
Hashing
Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 33 / 33