0% found this document useful (0 votes)
30 views52 pages

Lecture 12 Database - Systems

Uploaded by

ifrat.official25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views52 pages

Lecture 12 Database - Systems

Uploaded by

ifrat.official25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Database System Concepts

Chapter: Index and Hashing

Prof. Dr. Rafiqul Islam

31 July, 2025

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 1 / 33


Basic Concept of Indexing

An index in a database works like a book’s index: it maps search keys


to locations.
Helps retrieve data efficiently by avoiding full table scans.
Example: To find a student record by ID, use an index to locate the
disk block.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 2 / 33


Basic Concept of Indexing

An index in a database works like a book’s index: it maps search keys


to locations.
Helps retrieve data efficiently by avoiding full table scans.
Example: To find a student record by ID, use an index to locate the
disk block.
Why indexing is needed?
Improves search performance.
Reduces disk I/O.
Essential for large databases where linear search is impractical.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 2 / 33


Types of Indices

Ordered Index:
Index entries sorted by key values.
Suitable for range queries.
Hash Index:
Uses a hash function to map keys to buckets.
Efficient for exact-match queries.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 3 / 33


Types of Indices

Ordered Index:
Index entries sorted by key values.
Suitable for range queries.
Hash Index:
Uses a hash function to map keys to buckets.
Efficient for exact-match queries.
Trade-off:
Ordered: Better for sequential access.
Hash: Faster for direct lookups.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 3 / 33


Visual Comparison

Ordered Index Hash Index


Sorted structure Hash-based mapping
Supports range queries Supports point queries
Example: B+ Tree Example: Hash Table

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 4 / 33


Need for Diverse Indexing Strategies

No single indexing or hashing method is universally optimal.


Different database applications benefit from different indexing
approaches.
Evaluation must consider multiple performance factors.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 5 / 33


Need for Diverse Indexing Strategies

No single indexing or hashing method is universally optimal.


Different database applications benefit from different indexing
approaches.
Evaluation must consider multiple performance factors.
Example:
Searching books by author, subject, or title → multiple indices,
multiple search keys.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 5 / 33


Indexing Evaluation Metrics

Access Types:
Exact match queries
Range queries
Access Time: Time to locate one or more records.
Insertion Time: Includes search + index update cost.
Deletion Time: Includes item lookup + structure update.
Space Overhead: Additional storage cost for index structures.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 6 / 33


Defining Search Keys

A search key is an attribute or set of attributes used to find records.


Different from:
Primary key: Uniquely identifies tuples.
Candidate/Superkey: Alternative definitions of uniqueness.
Search keys may not be unique and can vary across indices.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 7 / 33


Dense vs. Sparse Index

Dense Index: Entry for every search-key value.


Sparse Index: Entry for only some search-key values.
Both help optimize record retrieval from large databases.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 8 / 33


Dense vs. Sparse Index

Dense Index: Entry for every search-key value.


Sparse Index: Entry for only some search-key values.
Both help optimize record retrieval from large databases.
Applicability:
Dense: Suitable for both clustering and nonclustering indices.
Sparse: Only applicable to clustering indices.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 8 / 33


Dense Index Variants

Clustering Index:
Records stored sorted by search key.
Index points to the first record in a sequence.
Nonclustering Index:
Records not sorted by search key.
Index stores a list of pointers to matching records.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 9 / 33


Dense Index Variants

Clustering Index:
Records stored sorted by search key.
Index points to the first record in a sequence.
Nonclustering Index:
Records not sorted by search key.
Index stores a list of pointers to matching records.
Example:
Instructor ID “22222” → direct pointer from dense index.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 9 / 33


Sparse Index Details

Requires relation to be sorted by search key.


Index stores entry for select keys only.
Lookup involves:
Finding the largest search-key target.
Sequential search from that point.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 10 / 33


Sparse Index Details

Requires relation to be sorted by search key.


Index stores entry for select keys only.
Lookup involves:
Finding the largest search-key target.
Sequential search from that point.
Example:
Looking for “22222”, index gives pointer to “10101”.
Scan forward to locate desired record.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 10 / 33


Dense and Sparse Indexing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 11 / 33


Dense and Sparse Indexing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 12 / 33


Dense and Sparse Indexing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 13 / 33


Scaling Indexing with Relation Size

Problem: Large indices can’t fit entirely in memory.


Disk-based search (e.g., binary search) is slow due to multiple block
reads.
Example: 10,000 blocks → 14 reads → 140 ms per search.
Overflow blocks prevent binary search → require even slower
sequential scan.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 14 / 33


Scaling Indexing with Relation Size

Problem: Large indices can’t fit entirely in memory.


Disk-based search (e.g., binary search) is slow due to multiple block
reads.
Example: 10,000 blocks → 14 reads → 140 ms per search.
Overflow blocks prevent binary search → require even slower
sequential scan.
Implication: Need for a faster, scalable search mechanism.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 14 / 33


Multilevel Index Explained

Treat first-level index as a file.


Build a second-level index on top of first-level blocks.
Reduces search cost to:
One block read per level.
Significantly fewer total disk accesses.
Similar to B-tree architecture, without needing to balance nodes.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 15 / 33


Multilevel Index Explained

Treat first-level index as a file.


Build a second-level index on top of first-level blocks.
Reduces search cost to:
One block read per level.
Significantly fewer total disk accesses.
Similar to B-tree architecture, without needing to balance nodes.
Trade-off: Slight increase in space overhead for faster access.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 15 / 33


Search Time Comparison

Single-Level Index Multilevel Index


Binary Search → log(b) block Typically 2–3 block reads
reads Faster and scalable for big
Slower if b is large datasets

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 16 / 33


Dense and Sparse Indexing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 17 / 33


Updating Index Structures

Every index must be updated on:


Record Insertion
Record Deletion
Attribute Update (modeled as delete + insert)
Affected indices depend on the updated search-key attributes.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 18 / 33


Updating Index Structures

Every index must be updated on:


Record Insertion
Record Deletion
Attribute Update (modeled as delete + insert)
Affected indices depend on the updated search-key attributes.
Example:
Change in instructor’s department → update index on dept name.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 18 / 33


Index Insertion Procedure

Dense Index
If search key not present → insert entry at correct position.
If present:
Store pointer to new record (if multi-pointer index).
Place record sequentially (if only first-pointer index).

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 19 / 33


Index Insertion Procedure

Dense Index
If search key not present → insert entry at correct position.
If present:
Store pointer to new record (if multi-pointer index).
Place record sequentially (if only first-pointer index).
Sparse Index
Insert search key if new block is created.
Update index entry only if new record is least in its block.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 19 / 33


Index Deletion Procedure

Dense Index
If only record with key → remove entry.
If multiple records:
Remove pointer to deleted record.
If first record is deleted → update index pointer.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 20 / 33


Index Deletion Procedure

Dense Index
If only record with key → remove entry.
If multiple records:
Remove pointer to deleted record.
If first record is deleted → update index pointer.
Sparse Index
If key not present → no update.
If present:
Replace or remove index entry based on key position.
Update pointer if it points to deleted record.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 20 / 33


Using Multiple Single-Key Indices

Example Query:
SELECT ID FROM instructor WHERE dept name = ’Finance’
AND salary = 80000;

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 21 / 33


Using Multiple Single-Key Indices

Example Query:
SELECT ID FROM instructor WHERE dept name = ’Finance’
AND salary = 80000;
Strategy Options:
1 Use index on dept name, filter by salary.
2 Use index on salary, filter by dept name.
3 Intersect results from both indices → most efficient.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 21 / 33


Limitations of Multi-Index Intersection

If many records match each individual condition...


But few satisfy both conditions...
Scanning many pointers yields small result set.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 22 / 33


Limitations of Multi-Index Intersection

If many records match each individual condition...


But few satisfy both conditions...
Scanning many pointers yields small result set.
Better Alternative:
Use a composite index or bitmap index.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 22 / 33


Indices on Composite Search Keys

Search Key: (dept name, salary)


Efficient for queries using both attributes.
Supports:
Equality on both: dept name = ’Finance’, salary = 80000
Equality on first, range on second: salary < 80000
Equality on first: treated as range query

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 23 / 33


Trade-offs of Composite Indices

Limitation:
Query with comparison on first key (e.g., dept name < ’Finance’)
leads to many I/O operations.
Data blocks might be scattered due to sorted ordering.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 24 / 33


Trade-offs of Composite Indices

Limitation:
Query with comparison on first key (e.g., dept name < ’Finance’)
leads to many I/O operations.
Data blocks might be scattered due to sorted ordering.
Conclusion:
Composite indices are powerful but must be tailored to query patterns.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 24 / 33


Static Hashing: Motivation

Why Hashing?
Avoids index structure and binary search overhead.
Direct access to bucket using a hash function.
Used both for file organization and hash-based indexing.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 25 / 33


Buckets and Basic Operations

Terminology:
Bucket: Unit of storage (often a disk block).
Hash Function: h : K → B, maps search key to bucket address.
Operations:
Insert: Compute h(Ki ) → place record.
Lookup: Compute h(Ki ) → search bucket.
Delete: Compute h(Ki ) → find and remove record.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 26 / 33


Collision Handling

Scenario: h(K5 ) = h(K7 )


Multiple records in same bucket.
Need to compare actual key values to resolve.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 27 / 33


Collision Handling

Scenario: h(K5 ) = h(K7 )


Multiple records in same bucket.
Need to compare actual key values to resolve.
Challenge: Avoid poor hash functions mapping everything to same
bucket.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 27 / 33


Desirable Properties of Hash Functions

Uniformity: Keys evenly distributed across all buckets.


Randomness: No correlation to natural key ordering.
Scalability: Effective even with unknown future data.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 28 / 33


Desirable Properties of Hash Functions

Uniformity: Keys evenly distributed across all buckets.


Randomness: No correlation to natural key ordering.
Scalability: Effective even with unknown future data.
Examples:
Alphabet-based hash: Simple, not uniform.
Salary range hash: Uniform if salary distribution is even.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 28 / 33


Bucket Overflow Scenarios

Definition: Overflow occurs when a bucket can’t store a newly inserted


record.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 29 / 33


Bucket Overflow Scenarios

Definition: Overflow occurs when a bucket can’t store a newly inserted


record.
Causes:
nr
Insufficient Buckets: nB < fr
Skewed Distribution:
Multiple records with same key.
Poor hash function → uneven spread.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 29 / 33


Hashing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 30 / 33


Design Strategy to Minimize Overflow

Bucket Count Formula:


nr
B= × (1 + d), d ≈ 0.2
fr

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 31 / 33


Design Strategy to Minimize Overflow

Bucket Count Formula:


nr
B= × (1 + d), d ≈ 0.2
fr
Trade-off:
Wastes 20
But lowers chance of overflow.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 31 / 33


Overflow Chaining

Concept:
Extra bucket allocated if primary is full.
All overflow buckets are linked in a list.

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 32 / 33


Overflow Chaining

Concept:
Extra bucket allocated if primary is full.
All overflow buckets are linked in a list.
Structure:
Bucket b Overflow Bucket 1 Overflow Bucket 2

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 32 / 33


Hashing

Prof. Dr. Rafiqul Islam Database System Concepts 31 July, 2025 33 / 33

You might also like