0% found this document useful (0 votes)
17 views158 pages

Lecture 9

Module 41 covers indexing and hashing in database management systems, focusing on the need for indexing, types of indices (ordered, dense, sparse, primary, secondary), and their evaluation metrics. It discusses the organization of database files, mechanisms for fast access, and the implications of index updates during record modifications. The module emphasizes the importance of efficient data retrieval and the structure of indices to optimize database performance.

Uploaded by

23f2005522
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views158 pages

Lecture 9

Module 41 covers indexing and hashing in database management systems, focusing on the need for indexing, types of indices (ordered, dense, sparse, primary, secondary), and their evaluation metrics. It discusses the organization of database files, mechanisms for fast access, and the implications of index updates during record modifications. The module emphasizes the importance of efficient data retrieval and the structure of indices to optimize database performance.

Uploaded by

23f2005522
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Module 41

Partha Pratim
Das

Week Recap

Objectives &
Database Management Systems
Outline
Module 41: Indexing and Hashing/1: Indexing/1
Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files Partha Pratim Das
Primary and
Secondary Indices
Multilevel Index
Index Update
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Module Summary
[email protected]

Database Management Systems Partha Pratim Das 41.1


Week Recap PPD

Module 41
• Need for algorithm analysis, Asymptotic complexity, and Worst-case, average-case and
Partha Pratim
Das
best-case analysis
Week Recap
• Reviewed Linear Data Structures; array, list, stack, queue; and linear and binary search
Objectives & • Reviewed Non-linear Data Structures - graph, tree, hash table; Binary Search Tree; and
Outline

Indexing
compared Linear and Non-Linear Data Structures
Metrics
• Understood the range of Physical Storage Media
Ordered Indices
Dense Index Files • Studied about Magnetic Disks and Magnetic Tape
Sparse Index Files
Primary and
Secondary Indices
• Glimpsed through Other Storage and the Future of Storage
Multilevel Index
Index Update
• Familiarized with the organization for database files
Module Summary • Understood how records and relations are organized in files
• Learnt how databases keep their own information in Data-Dictionary Storage – the
metadata database of a database
• Understood the mechanisms for fast access of a database store

Database Management Systems Partha Pratim Das 41.2


Module Objectives PPD

Module 41

Partha Pratim • To understand the reasons for which we need to index database table
Das
• To learn about the ordered indexes and Indexed Sequential Access Mechanism
Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.3


Module Outline PPD

Module 41

Partha Pratim • Basic Concepts of Indexing


Das
• Ordered Indices
Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.4


Concepts of Indexing PPD

Module 41

Partha Pratim
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary
Concepts of Indexing

Database Management Systems Partha Pratim Das 41.5


Search Records PPD

Module 41 • Consider a table: Faculty(Name, Phone)


Partha Pratim
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index • How to search on Name?
Index Update
◦ Get the phone number for ‘Pabitra Mitra’
Module Summary
◦ Use “Name” Index – sorted on ‘Name’, search ‘Pabitra Mitra’ and navigate on pointer (rec #)
• How to search on Phone?
◦ Get the name of the faculty having phone number = 84772
◦ Use “Phone” Index – sorted on ‘Phone’, search ‘84772’ and navigate on pointer (rec #)
• We can keep the records sorted on ‘Name’ or on ‘Phone’ (called the primary index), but not on both
Database Management Systems Partha Pratim Das 41.6
Basic Concepts

Module 41

Partha Pratim • Indexing mechanisms used to speed up access to desired data.


Das
◦ For example:
Week Recap

Objectives &
. Name in a faculty table
Outline . author catalog in library
Indexing
Metrics
• Search Key - attribute to set of attributes used to look up records in a file
Ordered Indices • An index file consists of records (called index entries) of the form
Dense Index Files
Sparse Index Files
Primary and search-key pointer
Secondary Indices
Multilevel Index
Index Update • Index files are typically much smaller than the original file
Module Summary
• Two basic kinds of indices:
◦ Ordered indices: search keys are stored in sorted order
◦ Hash indices: search keys are distributed uniformly across buckets using a hash
function

Database Management Systems Partha Pratim Das 41.7


Index Evaluation Metrics

Module 41

Partha Pratim • Access types supported efficiently. For example,


Das
◦ records with a specified value in the attribute, or
Week Recap
◦ records with an attribute value falling in a specified range of values
Objectives &
Outline • Access time
Indexing
Metrics • Insertion time
Ordered Indices
Dense Index Files
• Deletion time
Sparse Index Files
Primary and
• Space overhead
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.8


Ordered Indices

Module 41

Partha Pratim
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary
Ordered Indices

Database Management Systems Partha Pratim Das 41.9


Ordered Indices

Module 41

Partha Pratim • In an ordered index, index entries are stored sorted on the search key value. For
Das
example, author catalog in library
Week Recap
• Primary index: in a sequentially ordered file, the index whose search key specifies the
Objectives &
Outline sequential order of the file
Indexing
Metrics
◦ Also called clustering index
Ordered Indices
◦ The search key of a primary index is usually but not necessarily the primary key
Dense Index Files
Sparse Index Files
• Secondary index: an index whose search key specifies an order different from the
Primary and
Secondary Indices
sequential order of the file
Multilevel Index
Index Update
◦ Also called non-clustering index
Module Summary • Index-sequential file: ordered sequential file with a primary index

Database Management Systems Partha Pratim Das 41.10


Dense Index Files

Module 41

Partha Pratim • Dense index — Index record appears for every search-key value in the file.
Das
• For example, index on ID attribute of instructor relation
Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.11


Dense Index Files (2)

Module 41

Partha Pratim • Dense index on dept name, with instructor file sorted on dept name
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.12


Sparse Index Files

Module 41

Partha Pratim • Sparse Index: contains index records for only some search-key values.
Das
◦ Applicable when records are sequentially ordered on search-key
Week Recap

Objectives &
• To locate a record with search-key value K we:
Outline
◦ Find index record with largest search-key value < K
Indexing
Metrics
◦ Search file sequentially starting at the record to which the index record points
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.13


Sparse Index Files (2)

Module 41

Partha Pratim • Compared to dense indices:


Das
◦ Less space and less maintenance overhead for insertions and deletions
Week Recap
◦ Generally slower than dense index for locating records
Objectives &
Outline • Good tradeoff: sparse index with an index entry for every block in file, corresponding
Indexing
Metrics
to least search-key value in the block
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.14


Secondary Indices Example

Module 41

Partha Pratim
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary
Secondary index on salary field of instructor

• Index record points to a bucket that contains pointers to all the actual records with
that particular search-key value.
• Secondary indices have to be dense
Database Management Systems Partha Pratim Das 41.15
Primary and Secondary Indices

Module 41

Partha Pratim • Indices offer substantial benefits when searching for records
Das
• BUT: Updating indices imposes overhead on database modification –when a file is
Week Recap
modified, every index on the file must be updated
Objectives &
Outline
• Sequential scan using primary index is efficient, but a sequential scan using a secondary
Indexing
Metrics
index is expensive
Ordered Indices ◦ Each record access may fetch a new block from disk
Dense Index Files
Sparse Index Files
◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for
Primary and
Secondary Indices
memory access
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.16


Multilevel Index

Module 41

Partha Pratim • If primary index does not fit in memory, access becomes expensive
Das
• Solution: treat primary index kept on disk as a sequential file and construct a sparse
Week Recap
index on it
Objectives &
Outline ◦ outer index – a sparse index of primary index
Indexing
Metrics
◦ inner index – the primary index file
Ordered Indices • If even outer index is too large to fit in main memory, yet another level of index can be
Dense Index Files
Sparse Index Files
created, and so on
Primary and
Secondary Indices • Indices at all levels must be updated on insertion or deletion from the file
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.17


Multilevel Index (2)

Module 41

Partha Pratim
Das

Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.18


Index Update: Deletion

Module 41

Partha Pratim
Das
• If deleted record was the only
Week Recap record in the file with its partic-
Objectives &
Outline
ular search-key value, the search-
Indexing
key is deleted from the index also.
Metrics

Ordered Indices
Dense Index Files • Single-level index entry deletion:
Sparse Index Files
Primary and
Secondary Indices
◦ Dense indices – deletion of search-key is similar to file record deletion
Multilevel Index ◦ Sparse indices –
Index Update

Module Summary
. If an entry for the search key exists in the index, it is deleted by replacing the
entry in the index with the next search-key value in the file (in search-key order)
. If the next search-key value already has an index entry, the entry is deleted
instead of being replaced

Database Management Systems Partha Pratim Das 41.19


Index Update (2): Insertion

Module 41

Partha Pratim • Single-level index insertion:


Das
◦ Perform a lookup using the search-key value appearing in the record to be inserted
Week Recap
◦ Dense indices – if the search-key value does not appear in the index, insert it
Objectives &
Outline ◦ Sparse indices – if index stores an entry for each block of the file, no change needs
Indexing to be made to the index unless a new block is created
Metrics

Ordered Indices
. If a new block is created, the first search-key value appearing in the new block is
Dense Index Files inserted into the index
Sparse Index Files
Primary and
Secondary Indices
• Multilevel insertion and deletion: algorithms are simple extensions of the single-level
Multilevel Index algorithms
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.20


Secondary Indices

Module 41

Partha Pratim • Frequently, one wants to find all the records whose values in a certain field (which is
Das
not the search-key of the primary index) satisfy some condition
Week Recap
◦ Example 1: In the instructor relation stored sequentially by ID, we may want to find
Objectives &
Outline all instructors in a particular department
Indexing ◦ Example 2: as above, but where we want to find all instructors with a specified
Metrics
salary or with salary in a specified range of values
Ordered Indices
Dense Index Files • We can have a secondary index with an index record for each search-key value
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary

Database Management Systems Partha Pratim Das 41.21


Module Summary

Module 41

Partha Pratim • Appreciated the reasons for indexing database tables


Das
• Understood the ordered indexes
Week Recap

Objectives &
Outline

Indexing
Metrics

Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update

Module Summary
Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 41.22


Module 42

Partha Pratim
Das

Objectives &
Outline Database Management Systems
Balanced BST
Module 42: Indexing and Hashing/2: Indexing/2
2-3-4 Tree
Search
Insert
Split
Example
Delete
Partha Pratim Das
Observations

Module Summary Department of Computer Science and Engineering


Indian Institute of Technology, Kharagpur

[email protected]

Database Management Systems Partha Pratim Das 42.1


Module Recap PPD

Module 42

Partha Pratim • Appreciated the reasons for indexing database tables


Das
• Understood the ordered indexes
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.2


Module Objectives PPD

Module 42

Partha Pratim • To recap Balanced Binary Search Trees as options for optimal in-memory search data
Das
structures
Objectives &
Outline • To understand the issues relating to external search data structures for persistent data
Balanced BST
• To study 2-3-4 Tree as a precursor to B/B+-Tree for an efficient external data
2-3-4 Tree
Search
structure for database and index tables
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.3


Module Outline PPD

Module 42

Partha Pratim • Balanced Binary Search Trees


Das
• 2-3-4 Tree
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.4


Balanced Binary Search Trees PPD

Module 42

Partha Pratim
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Balanced Binary Search Trees

Database Management Systems Partha Pratim Das 42.5


Search Data Structures PPD

Module 42
• How to search a key in a list of n data items?
Partha Pratim
◦ Linear Search: O(n): Find 28 ⇒ 16 comparisons
Das . Unordered items in an array – search sequentially
Objectives &
. Unordered / Ordered items in a list – search sequentially
Outline

Balanced BST

2-3-4 Tree
◦ Binary Search: O(lg n): Find 28 ⇒ 4 comparisons – 25, 36, 30, 28
Search . Ordered items in an array – search by divide-and-conquer
Insert
Split
Example
Delete
. Binary Search Tree – recursively on left / right
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.6


Search Data Structures (2) PPD

Module 42

Partha Pratim • Worst case time (n data items in the data structure):
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

• Between an array and a list, there is a trade-off between search and insert/delete
complexity
• For a BST of n nodes, lg n ≤ h < n, where h is the height of the tree
• A BST is balanced if h ∼ O(lg n): this what we desire

Database Management Systems Partha Pratim Das 42.7


Search Data Structures (3): BST PPD

Module 42
• In the worst case, searching a key in a BST is O(h), where h is the height of the key
Partha Pratim
Das • Bad Tree: h ∼ O(n)
Objectives & ◦ The BST is a skewed binary search tree (all the nodes except the leaf would have
Outline
only one child)
Balanced BST
◦ This can happen if keys are inserted in sorted order
2-3-4 Tree
Search ◦ Height (h) of the BST having n elements becomes n − 1
Insert
Split
◦ Time complexity of search in BST becomes O(n)
Example
Delete
• Good Tree: h ∼ O(lg n)
Observations
◦ The BST is a balanced binary search tree
Module Summary
◦ This is possible if
. If keys are inserted in purely randomized order, Or
. If the tree is explicitly balanced after every insertion
◦ Height (h) of the binary search tree becomes lg n
◦ Time complexity of search in BST becomes O(lg n)

Database Management Systems Partha Pratim Das 42.8


Balanced Binary Search Trees PPD

Module 42
• A BST is balanced if h ∼ O(lg n)
Partha Pratim
Das • Balancing Guarantees may be of various types:
Objectives & ◦ Worst-case
Outline

Balanced BST
. AVL Tree: Self-balancing BST
2-3-4 Tree
− Named after inventors Adelson-Velsky-Landis
Search − Heights of the two child subtrees of any node differ by at most one: |hL − hR | ≤ 1
Insert − If they differ by more than one, rebalancing is done rotation
Split
Example ◦ Randomized
Delete
Observations . Randomized BST
Module Summary − A BST on n keys is random if either it is empty (n = 0), or the probability that a given
1
key is at the root is n
, and the left and right subtrees are random
. Skip List
− A skip list is built (probabbilistically) in layers of ordered linked lists
◦ Amortized
. Splay
− A BST where recently accessed elements are quick to access again
Database Management Systems Partha Pratim Das 42.9
Balanced Binary Search Trees (2) PPD

Module 42
• These data structures have optimal complexity for the required operations:
Partha Pratim
Das ◦ Search: O(lg n)
Objectives &
◦ Insert: Search + O(1): O(lg n)
Outline ◦ Delete: Search + O(1): O(lg n)
Balanced BST
• And they are:
2-3-4 Tree
Search ◦ Good for in-memory operations
Insert
Split ◦ Work well for small volume of data
Example
Delete
◦ Has complex rotation and / or similar operations
Observations ◦ Do not scale for external data structures
Module Summary

Database Management Systems Partha Pratim Das 42.10


2-3-4 Tree PPD

Module 42

Partha Pratim
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

2-3-4 Tree

Database Management Systems Partha Pratim Das 42.11


2-3-4 Trees PPD

Module 42

Partha Pratim • All leaves are at the same depth (the bottom level).
Das
◦ Height, h, of all leaf nodes are same
Objectives &
Outline . h ∼ O(lg n)
Balanced BST . Complexity of search, insert and delete: O(h) ∼ O(lg n)
2-3-4 Tree
Search
• All data is kept in sorted order
Insert
Split • Every node (leaf or internal) is a 2-node, 3-node or a 4-node (based on the number of
Example
Delete
links or children), and holds one, two, or three data elements, respectively
Observations
• Generalizes easily to larger nodes
Module Summary
• Extends to external data structures

Database Management Systems Partha Pratim Das 42.12


2-3-4 Trees PPD

Module 42

Partha Pratim • Uses 3 kinds of nodes satisfying key relationships as shown below:
Das
◦ A 2-node must contain a single data item (S) and two links
Objectives &
Outline ◦ A 3-node must contain two data items (S, L) and three links
Balanced BST ◦ A 4-node must contain three data items (S, M, L) and four links
2-3-4 Tree ◦ A leaf may contain either one, two, or three data items
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.13


2-3-4 Trees: Search PPD

Module 42

Partha Pratim • Search


Das
◦ Simple and natural extension of search in BST
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.14


2-3-4 Trees: Insert PPD

Module 42

Partha Pratim • Insert


Das
◦ Search to find expected location
Objectives &
Outline . If it is a 2 node, change to 3 node and insert
Balanced BST . If it is a 3 node, change to 4 node and insert
2-3-4 Tree . If it is a 4 node, split the node by moving the middle item to parent node, then
Search
Insert insert
Split
Example ◦ Node Splitting
Delete
Observations . A 4-node is split as soon as it is encountered during a search from the root to a
Module Summary leaf
. The 4-node that is split will
− Be the root, or
− Have a 2-node parent, or
− Have a 3-node parent

Database Management Systems Partha Pratim Das 42.15


2-3-4 Trees: Insert PPD

Module 42

Partha Pratim • Splitting at Root


Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.16


2-3-4 Trees: Insert PPD

Module 42

Partha Pratim • Splitting with 2 Node parent


Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.17


2-3-4 Trees: Insert PPD

Module 42 • Splitting with 3 Node parent


Partha Pratim
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.18


2-3-4 Trees: Insert

Module 42

Partha Pratim • Node Splitting: There are two strategies:


Das
◦ Early : Split a 4-node as soon as you cross on in traversal. It ensures that the tree
Objectives &
Outline does not have a path with multiple 4-nodes at any point
Balanced BST ◦ Late: Split a 4-node only when you need to insert an item in it. This might lead to
2-3-4 Tree cases where for one insert we may need to perform O(h) splits going till up to the
Search
Insert
root
Split
Example
• Both are valid and has the same complexity O(h). However, they lead to different
Delete
Observations
results. Different texts and sites follow different strategies.
Module Summary • Here we are following early strategy

Database Management Systems Partha Pratim Das 42.19


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • Insert 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
Das
• 10
Objectives &
Outline • 10, 30
Balanced BST • 10, 30, 60
2-3-4 Tree
Search
• Split for 20
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.20


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20


Das
• 10, 30, 60, 20, 50
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.21


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40


Das
• Split for 70
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.22


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40, 70


Das
• 10, 30, 60, 20, 50, 40, 70, 80
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.23


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15
Das
• Split for 90
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.24


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15, 90
Das
• Split for 100
Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.25


2-3-4 Trees: Insert: Example PPD

Module 42

Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
Das

Objectives &
Outline

Balanced BST

2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.26


2-3-4 Trees: Delete PPD

Module 42

Partha Pratim • Delete


Das
◦ Locate the node n that contains the item theItem
Objectives &
Outline ◦ Find theItem’s inorder successor and swap it with theItem (deletion will always be
Balanced BST at a leaf)
2-3-4 Tree ◦ If that leaf is a 3-node or a 4-node, remove theItem
Search
Insert
◦ To ensure that theItem does not occur in a 2-node
Split
Example
. Transform each 2-node encountered into a 3-node or a 4-node
Delete . Reverse different cases illustrated for splitting
Observations

Module Summary

Database Management Systems Partha Pratim Das 42.27


2-3-4 Tree PPD

Module 42

Partha Pratim • Advantages


Das
◦ All leaves are at the same depth (the bottom level): Height, h ∼ O(lg n)
Objectives &
Outline ◦ Complexity of search, insert and delete: O(h) ∼ O(lg n)
Balanced BST ◦ All data is kept in sorted order
2-3-4 Tree ◦ Generalizes easily to larger nodes
Search
Insert
◦ Extends to external data structures
Split
Example
• Disadvantages
Delete
Observations
◦ Uses variety of node types – need to destruct and construct multiple nodes for
Module Summary converting a 2 Node to 3 Node, a 3 Node to 4 Node, for splitting etc.

Database Management Systems Partha Pratim Das 42.28


2-3-4 Trees PPD

Module 42
• Consider only one node type with space for 3 items and 4 links
Partha Pratim
Das ◦ Internal node (non-root) has 2 to 4 children (links)
Objectives & ◦ Leaf node has 1 to 3 items
Outline
◦ Wastes some space, but has several advantages for external data structure
Balanced BST

2-3-4 Tree • Generalizes easily to larger nodes


Search
Insert
◦ All paths from root to leaf are of the same length  
n
Split ◦ Each node that is not a root
l or ma leaf has between 2 and n children.
Example
Delete
Observations
◦ A leaf node has between (n−1) 2 and n − 1 values
Module Summary ◦ Special cases:
. If the root is not a leaf, it has at least 2 children.
. If the root is a leaf, it can have between 0 and (n − 1) values.
• Extends to external data structures
◦ B-Tree
◦ 2-3-4 Tree is a B-Tree where n = 4
Database Management Systems Partha Pratim Das 42.29
Module Summary

Module 42

Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
Balanced BST
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
2-3-4 Tree
Search
structure for database and index tables
Insert
Split
Example
Delete
Observations

Module Summary

Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.

Database Management Systems Partha Pratim Das 42.30


Module 43

Partha Pratim
Das

Objectives &
Outline Database Management Systems
B+-Tree Index
Files Module 43: Indexing and Hashing/3: Indexing/3
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Partha Pratim Das
Updates
Insertion
Department of Computer Science and Engineering
Deletion
Indian Institute of Technology, Kharagpur
File Organization
Non-Unique Keys
Relocation and
[email protected]
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.1
Module Recap PPD

Module 43

Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
B+-Tree Index
Files
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
Simple B
+
Tree structure for database and index tables
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.2
Module Objectives PPD

Module 43

Partha Pratim • To understand the design of B+ Tree Index Files as a generalization of 2-3-4 Tree
Das
• To understand the fundamentals of B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.3
Module Outline PPD

Module 43

Partha Pratim • B+ Tree Index Files


Das
• B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.4
B+ Tree Index Files PPD

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion

B+ Tree Index Files


File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.5
B+ Tree

Module 43 The B+ Tree


Partha Pratim
Das
• Is a balanced binary search tree
Objectives &
◦ Follows a multi-level index format like 2-3-4 Tree
Outline
• Has the leaf nodes denoting actual data pointers
B+-Tree Index
Files
Simple B
+
Tree
• Ensures that all leaf nodes remain at the same height (like 2-3-4 Tree)
Index Files
Nodes
• Has the leaf nodes are linked using a link list
Observations
◦ Can support random access as well as sequential access
Query
Duplicates
Updates
• Example:
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.6
B+ Tree (2)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
• Internal node contains
Observations
Query
◦ At least n2 child pointers, except the root node
Duplicates ◦ At most n pointers Note: These are approximate
Updates
Insertion • Leaf node contains values, we will discuss more
Deletion
File Organization ◦ At least n2 record pointers and n2 key values precise values later in this lecture.
Non-Unique Keys
Relocation and ◦ At most n record pointer and n key values
Secondary Indices
Strings
◦ One block pointer P to point to next leaf node
B-Tree Index
Files
Source: B+ Tree
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.7
B+ Tree (3): Search

Module 43
• Suppose we have to search 55 in the B+ tree below
Partha Pratim
Das ◦ First, we will fetch for the intermediary node which will direct to the leaf node that
Objectives &
can contain a record for 55
Outline
• So, in the intermediary node, we will find a branch between 50 and 75 nodes
B+-Tree Index
Files
+
◦ Then at the end, we will be redirected to the third leaf node
Simple B Tree
Index Files ◦ Here DBMS will perform a sequential search to find 55
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings Source: B+ Tree

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.8
B+ Tree (3): Insert

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query • Suppose we want to insert a record 60 that goes to 3rd leaf node after 55
Duplicates
Updates • The leaf node of this tree is already full, so we cannot insert 60 there
Insertion
Deletion • So we have to split the leaf node, so that it can be inserted into tree without affecting
File Organization
Non-Unique Keys
the fill factor, balance and order
• The 3rd leaf node has the values (50, 55, 60, 65, 70) and its current root node is 50
Relocation and
Secondary Indices
Strings

B-Tree Index
• We will split the leaf node of the tree in the middle so that its balance is not altered
Files
Comparison Source: B+ Tree

Module Summary
Database Management Systems Partha Pratim Das 43.9
B+ Tree (4): Insert

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query • So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes
Duplicates
Updates • If these two has to be leaf nodes, the intermediate node cannot branch from 50
Insertion
Deletion • It should have 60 added to it, and then we can have pointers to a new leaf node
File Organization
Non-Unique Keys • This is how we can insert an entry when there is overflow. In a normal scenario, it is
Relocation and
Secondary Indices very easy to find the node where it fits and then place it in that leaf node
Strings
Source: B+ Tree
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.10
B+ Tree (5): Delete

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations • To delete 60, we have to remove 60 from intermediate node as well as 4th leaf node
Query
Duplicates
• If we remove it from the intermediate node, then the tree will not remain a B+ tree
Updates
Insertion
Deletion
• So with deleting 60 we re-arranging the nodes:
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.11
B+ Tree Index Files

Module 43

Partha Pratim • B+ tree indices are an alternative to indexed-sequential files


Das
• Disadvantage of ISAM files
Objectives &
Outline ◦ Performance degrades as file grows, since many overflow blocks get created
B+-Tree Index
Files
◦ Periodic reorganization of entire file is required
• Advantage of B+ tree index files:
+
Simple B Tree
Index Files
Nodes
Observations
◦ Automatically reorganizes itself with small, local, changes, in the face of insertions
Query and deletions
Duplicates
Updates
◦ Reorganization of entire file is not required to maintain performance
• (Minor) disadvantage of B+ trees:
Insertion
Deletion
File Organization
Non-Unique Keys
◦ Extra insertion and deletion overhead, space overhead
• Advantages of B+ trees outweigh disadvantages
Relocation and
Secondary Indices
Strings

B-Tree Index
◦ B+ trees are used extensively
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.12
B+ Tree Index Files (2): Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.13
B+ Tree Index Files (3): Structure

Module 43

Partha Pratim
A B+ tree is a rooted tree satisfying the following properties:
Das
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between d n2 e and n children
Objectives &
Outline

B+-Tree Index
Files
• A leaf node has between an d n−1
2 e and n − 1 values
+
Simple B
Index Files
Tree
• Special cases:
Nodes
Observations
◦ If the root is not a leaf, it has at least 2 children.
Query ◦ If the root is a leaf (that is, there are no other nodes in the tree), it can have
Duplicates
Updates between 0 and (n − 1) values.
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.14
B+ Tree Index Files (4): Node Structure

Module 43

Partha Pratim • Typical node


Das

Objectives &
Outline

B+-Tree Index ◦ Ki are the search-key values


Files
Simple B
+
Tree ◦ Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of
Index Files
Nodes
records (for leaf nodes).
Observations
Query
• The search-keys in a node are ordered
Duplicates K1 < K2 < K3 < · · · < Kn−1
Updates
Insertion (Initially assume no duplicate keys, address duplicates later)
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.15
B+ Tree Index Files (5): Leaf Nodes

Module 43 Properties of a leaf node


Partha Pratim
Das
• For i = 1, 2, · · · , n − 1, pointer Pi points to a file record with search-key value Ki ,
Objectives &
• If Li , Lj are leaf nodes and i < j, Li ’s search-key values are less than or equal to Lj ’s
Outline
search-key values
B+-Tree Index
Files
+
• Pn points to next leaf node in search-key order
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.16
B+ Tree Index Files (6): Non-Leaf Nodes

Module 43

Partha Pratim • Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node
Das
with m pointers:
Objectives &
Outline ◦ All the search-keys in the subtree to which P1 points are less than K1
B+-Tree Index ◦ For 2 ≤ i ≤ n − 1, all the search-keys in the subtree to which Pi points have values
Files
Simple B
+
Tree
greater than or equal to Ki−1 and less than Ki
Index Files
Nodes
◦ All the search-keys in the subtree to which Pn points have values greater than or
Observations equal to Kn−1
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.17
B+ Tree Index Files (7): Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index B+ tree for instructor file (n = 6)


Files
+
Simple B Tree
Index Files • Leaf nodes must have between 3 and 5 values: d n−1
2 e and n − 1, with n = 6
Nodes
Observations • Non-leaf nodes other than root must have between 3 and 6 children: d n2 e and n with
Query
Duplicates n=6
Updates
Insertion • Root must have at least 2 children
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.18
B+ Tree Index Files: Observations

Module 43
• Since the inter-node connections are done by pointers, logically close blocks need not
Partha Pratim
Das
be physically close
Objectives &
• The non-leaf levels of the B+ tree form a hierarchy of sparse indices
Outline
• The B+ tree contains a relatively small number of levels
B+-Tree Index
◦ Level below root has at least 2 ∗ n2 values
 
Files
+
Simple B Tree
Index Files ◦ Next level has at least 2 ∗ d n2 e ∗ d n2 e values
Nodes
Observations
◦ ... etc.
Query
Duplicates
◦ If there are K search-key values in the file, the tree height is no more than
Updates dlogdn/2e (K )e
Insertion
Deletion ◦ thus searches can be conducted efficiently
File Organization
Non-Unique Keys • Insertions and deletions to the main file can be handled efficiently, as the index can be
Relocation and
Secondary Indices restructured in logarithmic time
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.19
B+ Tree Index Files: Queries

Module 43 • Find record with search-key value V


Partha Pratim a) C = root
Das
b) While C is not a leaf node
Objectives &
Outline i) Let i be least value such that V ≤ Ki
B+-Tree Index ii) If no such exists, set C = last non-null pointer in C
Files
Simple B
+
Tree
iii) Else { if (V = Ki ) Set C = Pi+1 else set C = Pi }
Index Files
Nodes
c) Let i be least value s.t. Ki = V
Observations d) If there is such a value i, follow pointer Pi to the desired record
Query
Duplicates e) Else no record with search-key value k exists
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.20
B+ Trees Index Files: Queries (2)

Module 43

Partha Pratim • If
l there are mK search-key values in the file, the height of the tree is no more than
Das
logd n e (K )
Objectives & 2
Outline
• A node is generally the same size as a disk block, typically 4 kilobytes
B+-Tree Index
Files
+
◦ and n is typically around 100 (40 bytes per index entry)
Simple B Tree
Index Files • With 1 million search key values and n = 100
Nodes
Observations ◦ at most log50 (1, 000, 000) = 4 nodes are accessed in a lookup
Query
Duplicates • Contrast this with a balanced binary tree with 1 million search key values — around 20
Updates
Insertion nodes are accessed in a lookup
Deletion
File Organization
◦ above difference is significant since every node access may need a disk I/O, costing
Non-Unique Keys
around 20 milliseconds
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.21
B+ Tree Index Files: Handling Duplicates

Module 43

Partha Pratim • With duplicate search keys


Das
◦ In both leaf and internal nodes,
Objectives &
Outline . we cannot guarantee that K1 < K2 < K3 < · · · < Kn−1
B+-Tree Index
Files
. but can guarantee K1 ≤ K2 ≤ K3 ≤ · · · ≤ Kn−1
Simple B
+
Tree ◦ Search-keys in the subtree to which Pi points
Index Files
Nodes . are ≤ Ki , but not necessarily < Ki ,
Observations
Query . To see why, suppose same search key value V is present in two leaf node Li and
Duplicates
Updates
Li+1 . Then in parent node Ki must be equal to V
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.22
B+ Tree Index Files: Handling Duplicates (2)

Module 43

Partha Pratim • We modify find procedure as follows


Das
◦ traverse Pi even if V = Ki
Objectives &
Outline ◦ As soon as we reach a leaf node C check if C has only search key values less than V
B+-Tree Index
Files
. if so set C = right sibling of C before checking whether C contains V
Simple B
Index Files
+
Tree
• Procedure printAll
Nodes
◦ uses modified find procedure to find first occurrence of V
Observations
Query ◦ Traverse through consecutive leaves to find all occurrences of V
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.23
Updates on B+ Trees: Insertion

Module 43

Partha Pratim • Find the leaf node in which the search-key value would appear
Das
• If the search-key value is already present in the leaf node
Objectives &
Outline ◦ Add record to the file
B+-Tree Index
Files
◦ If necessary add a pointer to the bucket
+
Simple B
Index Files
Tree
• If the search-key value is not present, then
Nodes
Observations
◦ Add the record to the main file (and create a bucket if necessary)
Query ◦ If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node
Duplicates
Updates
◦ Otherwise, split the node (along with the new (key-value, pointer) entry) as
Insertion
Deletion
discussed in the next slide
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.24
Updates on B+ Trees: Insertion (2)

Module 43

Partha Pratim • Splitting a leaf node:


Das
◦ take the n (search-key value,pointer) pairs (including the one being inserted) in
sorted order. Place the first n2 in the original node, and the rest in a new node
Objectives &

Outline

B+-Tree Index ◦ let the new node be p, and let k be the least key value in p. Insert (k, p) in the
Files
Simple B
+
Tree
parent of the node being split
Index Files
Nodes
◦ If the parent is full, split it and propagate the split further up
Observations
Query
• Splitting of nodes proceeds upwards till a node that is not full is found
Duplicates
◦ In the worst case the root node may be split increasing the height of the tree by 1
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Secondary Indices Next step: insert entry with (Califieri,pointer-to-new-node) into parent
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.25
Updates on B+ Trees: Insertion (3)

Module 43

Partha Pratim • Splitting a non-leaf node: when inserting (k, p) into an already full internal node N
Das
◦ Copy N to an in-memory area M with space for n + 1 pointers and n keys
Objectives &
Outline ◦ Insert (k, p) into M
B+-Tree Index ◦ Copy P1 , K1 , · · · , Kd n e−1 , Pd n e from M back into node N
Files 2 2
Simple B
+
Tree ◦ Copy Pd n e+1 , Kd n e+1 , · · · , Kn , Pn+1 from M into newly allocated node N 0
Index Files 2 2
Nodes ◦ Insert (Kd n e , N 0 ) into parent N
Observations 2
Query
Duplicates
• Read pseudocode in book!
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.26
Updates on B+ Trees: Insertion Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files B+ Tree before and after insertion of “Adams”
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.27
Updates on B+ Trees: Insertion Example (2)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index B+ Tree before and after insertion of “Lamport”


Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.28
Updates on B+ Trees: Deletion

Module 43

Partha Pratim • Find the record to be deleted, and remove it from the main file and from the bucket (if
Das
present)
Objectives &
Outline • Remove (search-key value, pointer) from the leaf node if there is no bucket or if the
B+-Tree Index bucket has become empty
Files
Simple B
+
Tree • If the node has too few entries due to the removal, and the entries in the node and a
Index Files
Nodes sibling fit into a single node, then merge siblings:
Observations
Query ◦ Insert all the search-key values in the two nodes into a single node (the one on the
Duplicates
Updates
left), and delete the other node.
Insertion ◦ Delete the pair (Ki−1 , Pi ), where Pi is the pointer to the deleted node, from its
Deletion
File Organization
parent, recursively using the above procedure.
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.29
Updates on B+ Trees: Deletion (2)

Module 43

Partha Pratim • Otherwise, if the node has too few entries due to the removal, but the entries in the
Das
node and a sibling do not fit into a single node, then redistribute pointers:
Objectives &
Outline ◦ Redistribute the pointers between the node and a sibling such that both have more
B+-Tree Index than the minimum number of entries
Files
Simple B
+
Tree
◦ Update the corresponding search-key value in the parent of the node
• The node deletions may cascade upwards till a node which has n2 or more pointers is
Index Files
 
Nodes
Observations
Query
found
Duplicates
Updates
• If the root node has only one pointer after deletion, it is deleted and the sole child
Insertion becomes the root
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.30
Updates on B+ Trees: Deletion Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Before and after deleting “Srinivasan”
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
Deleting “Srinivasan” causes merging of under-full leaves
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.31
Updates on B+ Trees: Deletion Example (2)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Deletion of “Singh” and “Wu” from result of previous example
Observations
Query
Duplicates • Leaf containing Singh and Wu became underfull, and borrowed a value Kim from its
Updates
Insertion left sibling
Deletion
File Organization • Search-key value in the parent changes as a result
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.32
Updates on B+ Trees: Deletion Example (3)

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization Before and after deletion of “Gold” from earlier example
Non-Unique Keys
Relocation and
Secondary Indices • Node with “Gold” and “Katz” became underfull, and was merged with its sibling
Strings

B-Tree Index
• Parent node becomes underfull, and is merged with its sibling
Files
Comparison
◦ Value separating two nodes (at the parent) is pulled down when merging
Module Summary • Root node then has only one child, and is delete
Database Management Systems Partha Pratim Das 43.33
B+ Tree File Organization

Module 43

Partha Pratim • Index file degradation problem is solved by using B+ Tree indices
Das
• Data file degradation problem is solved by using B+ Tree File Organization
Objectives &
Outline • The leaf nodes in a B+ tree file organization store records, instead of pointers
B+-Tree Index
Files • Leaf nodes are still required to be half full
+
Simple B Tree
Index Files ◦ Since records are larger than pointers, the maximum number of records that can be
Nodes
Observations
stored in a leaf node is less than the number of pointers in a non-leaf node
Query
Duplicates
• Insertion and deletion are handled in the same way as insertion and deletion of entries
Updates in a B+ tree index
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.34
B+ Tree File Organization: Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Example of B+ tree File Organization
Insertion
Deletion
File Organization • Good space utilization important since records use more space than pointers.
Non-Unique Keys
Relocation and
Secondary Indices
• To improve space utilization, involve more sibling nodes in redistribution during splits
Strings and merges
B-Tree Index
Files
◦ Involving 2 siblings in redistribution
  (to avoid split / merge where possible) results
Comparison in each node having at least 2n 3 entries
Module Summary
Database Management Systems Partha Pratim Das 43.35
Non-Unique Search Keys

Module 43

Partha Pratim • Alternatives to scheme described earlier


Das
◦ Buckets on separate block (bad idea)
Objectives &
Outline ◦ List of tuple pointers with each key
B+-Tree Index
Files
. Extra code to handle long lists
Simple B
+
Tree . Deletion of a tuple can be expensive if there are many duplicates on search key
Index Files
Nodes
(why?)
Observations . Low space overhead, no extra cost for queries
Query
Duplicates ◦ Make search key unique by adding a record-identifier
Updates
Insertion . Extra storage overhead for keys
Deletion
File Organization
. Simpler code for insertion/deletion
Non-Unique Keys
Relocation and
. Widely used
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.36
Record Relocation and Secondary Indices

Module 43

Partha Pratim • If a record moves, all secondary indices that store record pointers have to be updated
Das
• Node splits in B+ tree file organizations become very expensive
Objectives &
Outline • Solution: Use primary-index search key instead of record pointer in secondary index
B+-Tree Index
Files ◦ Extra traversal of primary index to locate record
+
Simple B
Index Files
Tree
– Higher cost for queries, but node splits are cheap
Nodes ◦ Add record-id if primary-index search key is non-unique
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.37
Indexing Strings

Module 43

Partha Pratim • Variable length strings as keys


Das
◦ Variable fanout
Objectives &
Outline ◦ Use space utilization as criterion for splitting, not number of pointers
B+-Tree Index
Files
• Prefix compression
Simple B
Index Files
+
Tree
◦ Key values at internal nodes can be prefixes of full key
Nodes
. Keep enough characters to distinguish entries in the subtrees separated by the
Observations
Query key value
Duplicates
Updates − For example, “Silas” and “Silberschatz” can be separated by “Silb”
Insertion
Deletion ◦ Keys in leaf node can be compressed by sharing common prefixes
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.38
B-Tree Index Files PPD

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
B-Tree Index Files
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.39
B-Tree Index Files

Module 43

Partha Pratim • Similar to B+ tree, but B-tree allows search-key values to appear only once; eliminates
Das
redundant storage of search keys
Objectives &
Outline • Search keys in non-leaf nodes appear nowhere else in the B-tree; an additional pointer
B+-Tree Index field for each search key in a non-leaf node must be included
Files
Simple B
+
Tree • Generalized B-tree leaf node
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings • Non-leaf node - pointers Bi are the bucket or file record pointers
B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.40
B-Tree Index File (2): Example

Module 43

Partha Pratim
Das

Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
B-tree (above) and B+ tree (below) on same data
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.41
Comparison of B-Tree and B+ Tree Index Files

Module 43

Partha Pratim • Advantages of B-Tree indices:


Das
◦ May use less tree nodes than a corresponding B+ Tree
Objectives &
Outline ◦ Sometimes possible to find search-key value before reaching leaf node
B+-Tree Index
Files
• Disadvantages of B-Tree indices:
Simple B
Index Files
+
Tree
◦ Only small fraction of all search-key values are found early
Nodes ◦ Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have
Observations
Query
greater depth than corresponding B+ Tree
Duplicates
◦ Insertion and deletion more complicated than in B+ Trees
Updates
Insertion ◦ Implementation is harder than B+ Trees
Deletion
File Organization • Typically, advantages of B-Trees do not outweigh disadvantages
Non-Unique Keys
Relocation and
Secondary Indices
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.42
Module Summary

Module 43

Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline

B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Slides used in this presentation are borrowed from http://db-book.com/ with kind
Non-Unique Keys permission of the authors.
Relocation and
Secondary Indices Edited and new slides are marked with “PPD”.
Strings

B-Tree Index
Files
Comparison

Module Summary
Database Management Systems Partha Pratim Das 43.43
Module 44

Partha Pratim
Das

Objectives &
Outline Database Management Systems
Static Hashing
Hash Function
Module 44: Indexing and Hashing/4: Hashing
Example
Bucket Overflow

Dynamic Hashing
Example
Partha Pratim Das
Comparison
Schemes

Bitmap Indices Department of Computer Science and Engineering


Module Summary
Indian Institute of Technology, Kharagpur

[email protected]

Database Management Systems Partha Pratim Das 44.1


Module Recap PPD

Module 44

Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.2


Module Objectives PPD

Module 44

Partha Pratim • To explore various hashing schemes – Static and Dynamic Hashing
Das
• To compare Ordered Indexing and Hashing
Objectives &
Outline • To understand the Bitmap Indices
Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.3


Module Outline PPD

Module 44

Partha Pratim • Static Hashing


Das
• Dynamic Hashing
Objectives &
Outline • Comparison of Ordered Indexing and Hashing
Static Hashing
Hash Function • Bitmap Indices
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.4


Static Hashing PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
Static Hashing

Database Management Systems Partha Pratim Das 44.5


Hash Function PPD

Module 44 • A hash function h maps data of arbitrary size (from domain D) to fixed-size values (say,
Partha Pratim
integers from 0 to N > 0 h : D → [0..N]
Das
• Given key k, h(k) is called hash values, hash codes, digests, or simply hashes
Objectives &
Outline • If for two keys k1 6= k2 , we have h(k1 ) = h(k2 ), we say a collision has occured
Static Hashing
Hash Function
• A hash function should be Collision Free and Fast
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.6


Static Hashing

Module 44

Partha Pratim • A bucket is a unit of storage containing one or more records (a bucket is typically a
Das
disk block)
Objectives &
Outline • In a hash file organization we obtain the bucket of a record directly from its
Static Hashing search-key value using a hash function
Hash Function
Example • Hash function h is a function from the set of all search-key values K to the set of all
Bucket Overflow
bucket addresses B
Dynamic Hashing
Example • Hash function is used to locate records for access, insertion as well as deletion
Comparison
Schemes • Records with different search-key values may be mapped to the same bucket; thus
Bitmap Indices entire bucket has to be searched sequentially to locate a record
Module Summary

Database Management Systems Partha Pratim Das 44.7


Example of Hash File Organization

Module 44

Partha Pratim
Hash file organization of instructor file, using dept name as key
Das
• There are 10 buckets
Objectives &
Outline • The binary representation of the i th character is assumed to be the integer i
Static Hashing
Hash Function
• The hash function returns the sum of the binary representations of the characters
Example modulo 10
Bucket Overflow

Dynamic Hashing
◦ For example
Example

Comparison
Schemes
h(Music) = 1 h(History) = 2
Bitmap Indices
h(Physics) = 3 h(Elec. Eng.) = 3
Module Summary

Database Management Systems Partha Pratim Das 44.8


Example of Hash File Organization

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Hash file organization of instructor file, using dept name as key


Database Management Systems Partha Pratim Das 44.9
Hash Functions

Module 44

Partha Pratim • Worst hash function maps all search-key values to the same bucket; this makes access
Das
time proportional to the number of search-key values in the file
Objectives &
Outline • An ideal hash function is uniform, i.e., each bucket is assigned the same number of
Static Hashing search-key values from the set of all possible values
Hash Function
Example • Ideal hash function is random, so each bucket will have the same number of records
Bucket Overflow
assigned to it irrespective of the actual distribution of search-key values in the file
Dynamic Hashing
Example • Typical hash functions perform computation on the internal binary representation of the
Comparison
Schemes
search-key
Bitmap Indices ◦ For example, for a string search-key, the binary representations of all the characters
Module Summary in the string could be added and the sum modulo the number of buckets could be
returned

Database Management Systems Partha Pratim Das 44.10


Handling of Bucket Overflows

Module 44

Partha Pratim • Bucket overflow can occur because of


Das
◦ Insufficient buckets
Objectives &
Outline ◦ Skew in distribution of records. This can occur due to two reasons:
Static Hashing . multiple records have same search-key value
Hash Function
Example . chosen hash function produces non-uniform distribution of key values
Bucket Overflow

Dynamic Hashing
• Although the probability of bucket overflow can be reduced, it cannot be eliminated
Example ◦ it is handled by using overflow buckets
Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.11


Handling of Bucket Overflows (2)

Module 44 • Overflow chaining – the overflow buckets of a given bucket are chained together in a
Partha Pratim
linked list
Das
• Above scheme is called closed hashing
Objectives & ◦ An alternative, called open hashing, which does not use overflow buckets, is not
Outline

Static Hashing
suitable for database applications
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.12


Hash Indices

Module 44

Partha Pratim • Hashing can be used not only for file organization, but also for index-structure creation
Das
• A hash index organizes the search keys, with their associated record pointers, into a
Objectives &
Outline hash file structure
Static Hashing
Hash Function
• Strictly speaking, hash indices are always secondary indices
Example
Bucket Overflow
◦ if the file itself is organized using hashing, a separate primary hash index on it using
Dynamic Hashing
the same search-key is unnecessary
Example ◦ However, we use the term hash index to refer to both secondary index structures
Comparison
Schemes
and hash organized files
Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.13


Example of Hash Index

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

• Hash index on instructor, on attribute ID


• Computed by adding the digits modulo 8
Database Management Systems Partha Pratim Das 44.14
Deficiencies of Static Hashing

Module 44

Partha Pratim • In static hashing, function h maps search-key values to a fixed set of B of bucket
Das
addresses. Databases grow or shrink with time
Objectives &
Outline ◦ If initial number of buckets is too small, and file grows, performance will degrade
Static Hashing due to too much overflows
Hash Function
Example
◦ If space is allocated for anticipated growth, a significant amount of space will be
Bucket Overflow wasted initially (and buckets will be underfull).
Dynamic Hashing ◦ If database shrinks, again space will be wasted
Example

Comparison • One solution: periodic re-organization of the file with a new hash function
Schemes

Bitmap Indices
◦ Expensive, disrupts normal operations
Module Summary • Better solution: allow the number of buckets to be modified dynamically

Database Management Systems Partha Pratim Das 44.15


Dynamic Hashing PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
Dynamic Hashing

Database Management Systems Partha Pratim Das 44.16


Dynamic Hashing

Module 44

Partha Pratim • Good for database that grows and shrinks in size
Das
• Allows the hash function to be modified dynamically
Objectives &
Outline • Extendable hashing – one form of dynamic hashing
Static Hashing
Hash Function
◦ Hash function generates values over a large range — typically b-bit integers, with
Example
Bucket Overflow
b = 32
Dynamic Hashing
◦ At any time use only a prefix of the hash function to index into a table of bucket
Example addresses
Comparison
Schemes
◦ Let the length of the prefix be i bits, 0 ≤ i ≤ 32
Bitmap Indices . Bucket address table size = 2i . Initially i = 0
Module Summary . Value of i grows and shrinks as the size of the database grows and shrinks
◦ Multiple entries in the bucket address table may point to a bucket (why?)
◦ Thus, actual number of buckets is < 2i
. The number of buckets also changes dynamically due to coalescing and splitting
of buckets
Database Management Systems Partha Pratim Das 44.17
General Extendable Hash Structure PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

In this structure, i2 = i3 = i, whereas i1 = i − 1


Decode ij number of bits to find the record in bucket j. ij ≤ i.
Database Management Systems Partha Pratim Das 44.18
Use of Extendable Hash Structure

Module 44

Partha Pratim • Each bucket j stores a value ij


Das
◦ All the entries that point to the same bucket have the same values on the first ij bits
Objectives &
Outline • To locate the bucket containing search-key Kj
Static Hashing
Hash Function
◦ Compute h(Kj ) = X
Example ◦ Use the first i high order bits of X as a displacement into bucket address table, and
Bucket Overflow

Dynamic Hashing
follow the pointer to appropriate bucket
Example
• To insert a record with search-key value Kj
Comparison
Schemes ◦ Follow same procedure as look-up and locate the bucket, say j
Bitmap Indices ◦ If there is room in the bucket j insert record in the bucket
Module Summary ◦ Else the bucket must be split and insertion re-attempted (next slide)
. Overflow buckets used instead in some cases (will see shortly)

Database Management Systems Partha Pratim Das 44.19


Insertion in Extendable Hash Structure

Module 44
To split a bucket j when inserting record with search-key value Kj
Partha Pratim
Das
• If i > ij (more than one pointer to bucket j)
Objectives & ◦ Allocate a new bucket z, and set ij = iz = (ij + 1)
Outline

Static Hashing
◦ Update the second half of the bucket address table entries originally pointing to j,
Hash Function to point to z
Example
Bucket Overflow
◦ Remove each record in bucket j and reinsert (in j or z)
Dynamic Hashing ◦ Recompute new bucket for Kj and insert record in the bucket (further splitting is
Example
required if the bucket is still full)
Comparison
Schemes • If i = ij (only one pointer to bucket j)
Bitmap Indices ◦ If i reaches some limit b, or too many splits have happened in this insertion, create
Module Summary
an overflow bucket
◦ Else
. Increment i and double the size of the bucket address table
. Replace each entry in the table by two entries that point to the same bucket
. Recompute new bucket address table entry for Kj . Now i > ij so use the first
case
Database Management above
Systems Partha Pratim Das 44.20
Deletion in Extendable Hash Structure

Module 44

Partha Pratim • To delete a key value,


Das
◦ locate it in its bucket and remove it
Objectives &
Outline ◦ The bucket itself can be removed if it becomes empty (with appropriate updates to
Static Hashing the bucket address table)
Hash Function
Example
◦ Coalescing of buckets can be done (can coalesce only with a “buddy” bucket having
Bucket Overflow same value of ij and same ij –1 prefix, if it is present)
Dynamic Hashing ◦ Decreasing bucket address table size is also possible
Example

Comparison . Note: decreasing bucket address table size is an expensive operation and should
Schemes
be done only if number of buckets becomes much smaller than the size of the
Bitmap Indices
table
Module Summary

Database Management Systems Partha Pratim Das 44.21


Use of Extendable Hash Structure: Example

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.22


Example (2)

Module 44

Partha Pratim
Das

Objectives &
Outline
• Initial Hash structure; bucket size = 2
Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
• Insert “Mozart”, “Srinivasan”, and “Wu” records

Database Management Systems Partha Pratim Das 44.23


Example (3)

Module 44

Partha Pratim
• Hash structure after insertion of “Mozart”, “Srini-
Das vasan”, and “Wu” records
Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
• Insert Einstein record

Database Management Systems Partha Pratim Das 44.24


Example (4)

Module 44
• Hash structure after insertion of “Einstein” record
Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
• Insert “Gold” and “El Said” records

Database Management Systems Partha Pratim Das 44.25


Example (5)

Module 44
• Hash structure after insertion of “Gold” and “El
Partha Pratim
Das
Said” records
Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

• Insert Katz record

Database Management Systems Partha Pratim Das 44.26


Example (6)
• Hash structure after insertion of “Katz” record
Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

• Insert “Singh”, “Califieri”, “Crick”, “Brandt”


records

Database Management Systems Partha Pratim Das 44.27


Example (7)

Module 44 • Hash structure after insertion of “Singh”, “Cali-


Partha Pratim fieri”, “Crick”, “Brandt” records
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

• Insert Kim record


Database Management Systems Partha Pratim Das 44.28
Example (8)

Module 44

Partha Pratim
• Hash structure after insertion of “Kim” record
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.29


Comparison Schemes PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
Comparison Schemes

Database Management Systems Partha Pratim Das 44.30


Extendable Hashing vs. Other Schemes

Module 44

Partha Pratim • Benefits of extendable hashing:


Das
◦ Hash performance does not degrade with growth of file
Objectives &
Outline ◦ Minimal space overhead
Static Hashing • Disadvantages of extendable hashing
Hash Function
Example ◦ Extra level of indirection to find desired record
Bucket Overflow

Dynamic Hashing
◦ Bucket address table may itself become very big (larger than memory)
Example . Cannot allocate very large contiguous areas on disk either
Comparison
Schemes
. Solution: B+ -tree structure to locate desired record in bucket address table
Bitmap Indices ◦ Changing size of bucket address table is an expensive operation
Module Summary
• Linear hashing is an alternative mechanism
◦ Allows incremental growth of its directory (equivalent to bucket address table)
◦ At the cost of more bucket overflows

Database Management Systems Partha Pratim Das 44.31


Comparison of Ordered Indexing and Hashing

Module 44

Partha Pratim • Cost of periodic re-organization


Das
• Relative frequency of insertions and deletions
Objectives &
Outline • Is it desirable to optimize average access time at the expense of worst-case access time?
Static Hashing
Hash Function • Expected type of queries:
Example
Bucket Overflow ◦ Hashing is generally better at retrieving records having a specified value of the key
Dynamic Hashing ◦ If range queries are common, ordered indices are to be preferred
Example

Comparison
• In practice:
Schemes
◦ PostgreSQL supports hash indices, but discourages use due to poor performance
Bitmap Indices
◦ Oracle supports static hash organization, but not hash indices
Module Summary
◦ SQLServer supports only B+ -trees

Database Management Systems Partha Pratim Das 44.32


Bitmap Indices PPD

Module 44

Partha Pratim
Das

Objectives &
Outline

Static Hashing
Hash Function
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary
Bitmap Indices

Database Management Systems Partha Pratim Das 44.33


Bitmap Indices

Module 44

Partha Pratim • Bitmap indices are a special type of index designed for efficient querying on multiple
Das
keys
Objectives &
Outline • Records in a relation are assumed to be numbered sequentially from, say, 0
Static Hashing ◦ Given a number n it must be easy to retrieve record n
Hash Function
Example . Particularly easy if records are of fixed size
Bucket Overflow

Dynamic Hashing • Applicable on attributes that take on a relatively small number of distinct values
Example
◦ For example: gender, country, state, . . .
Comparison
Schemes ◦ For example: income-level (income broken up into a small number of levels such as
Bitmap Indices 0-9999, 10000-19999, 20000-50000, 50000- infinity)
Module Summary
• A bitmap is simply an array of bits

Database Management Systems Partha Pratim Das 44.34


Bitmap Indices (2)

Module 44

Partha Pratim • In its simplest form a bitmap index on an attribute has a bitmap for each value of the
Das
attribute
Objectives &
Outline ◦ Bitmap has as many bits as records
Static Hashing ◦ In a bitmap for value v, the bit for a record is 1 if the record has the value v for the
Hash Function
Example
attribute, and is 0 otherwise
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Database Management Systems Partha Pratim Das 44.35


Bitmap Indices (3)

Module 44

Partha Pratim • Bitmap indices are useful for queries on multiple attributes
Das
◦ not particularly useful for single attribute queries
Objectives &
Outline • Queries are answered using bitmap operations
Static Hashing
Hash Function
◦ Intersection (and)
Example ◦ Union (or)
Bucket Overflow

Dynamic Hashing
◦ Complementation (not)
Example
• Each operation takes two bitmaps of the same size and applies the operation on
Comparison
Schemes corresponding bits to get the result bitmap
Bitmap Indices ◦ For example: 100110 AND 110011 = 100010
Module Summary 100110 OR 110011 = 110111
NOT 100110 = 011001
◦ Males with income level L1: 10010 AND 10100 = 10000
. Can then retrieve required tuples
. Counting number of matching tuples is even faster
Database Management Systems Partha Pratim Das 44.36
Bitmap Indices (4)

Module 44

Partha Pratim • Bitmap indices generally very small compared with relation size
Das
◦ For example, if record is 100 bytes, space for a single bitmap is 1/800 of space used
Objectives &
Outline by relation
Static Hashing . If number of distinct attribute values is 8, bitmap is only 1% of relation size
Hash Function
Example • Deletion needs to be handled properly
Bucket Overflow

Dynamic Hashing
◦ Existence bitmap to note if there is a valid record at a record location
Example ◦ Needed for complementation
Comparison
Schemes . not(A=v ): (NOT bitmap-A-v) AND ExistenceBitmap
Bitmap Indices
• Should keep bitmaps for all values, even null value
Module Summary
◦ To correctly handle SQL null semantics for NOT(A=v ):
. intersect above result with (NOT bitmap-A-Null)

Database Management Systems Partha Pratim Das 44.37


Bitmap Indices (5): Efficient Bitmap Operations

Module 44

Partha Pratim • Bitmaps are packed into words; a single word and (a basic CPU instruction) computes
Das
and of 32 or 64 bits at once
Objectives &
Outline ◦ For example, 1-million-bit maps can be and-ed with just 31,250 instruction
Static Hashing • Counting number of 1s can be done fast by a trick:
Hash Function
Example ◦ Use each byte to index into a precomputed array of 256 elements each storing the
Bucket Overflow

Dynamic Hashing
count of 1s in the binary representation
Example . Can use pairs of bytes to speed up further at a higher memory cost
Comparison
Schemes ◦ Add up the retrieved counts
Bitmap Indices
• Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+ -trees, for values that
Module Summary
have a large number of matching records
◦ Worthwhile if > 1/64 of the records have that value, assuming a tuple-id is 64 bits
◦ Above technique merges benefits of bitmap and B+ -tree indices

Database Management Systems Partha Pratim Das 44.38


Module Summary

Module 44

Partha Pratim • Explored various hashing schemes – Static and Dynamic Hashing
Das
• Compared Ordered Indexing and Hashing
Objectives &
Outline • Studied the use of Bitmap Indices for fast access of columns with limited number of
Static Hashing
Hash Function
distinct values
Example
Bucket Overflow

Dynamic Hashing
Example

Comparison
Schemes

Bitmap Indices

Module Summary

Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems Partha Pratim Das 44.39
Module 45

Partha Pratim
Das

Objectives &
Outline Database Management Systems
Index Definition
in SQL Module 45: Indexing and Hashing/5: Index Design
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules Partha Pratim Das
Rule 0
Rule 1
Rule 2 Department of Computer Science and Engineering
Rule 3 Indian Institute of Technology, Kharagpur
Rule 4
Rule 5
[email protected]
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.1


Module Recap PPD

Module 45

Partha Pratim • Explored various hashing schemes – Static and Dynamic Hashing
Das
• Compared Ordered Indexing and Hashing
Objectives &
Outline • Studied the use of Bitmap Indices for fast access of columns with limited number of
Index Definition
in SQL
distinct values
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.2


Module Objectives PPD

Module 45

Partha Pratim • To discuss how Indexes can be created in SQL


Das
• To deliberate on good index designs in terms of Guidelines for Indexing
Objectives &
Outline

Index Definition
in SQL
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.3


Module Outline PPD

Module 45

Partha Pratim • Index Definition in SQL


Das
• Guidelines for Indexing
Objectives &
Outline

Index Definition
in SQL
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.4


Index Definition in SQL PPD

Module 45

Partha Pratim
Das

Objectives &
Outline

Index Definition
in SQL
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Index Definition in SQL
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.5


Index in SQL PPD

Module 45
• Create an index
Partha Pratim
Das
create index <index-name> on <relation-name> (<attribute-list>)
For example: create index b-index on branch (branch name)
Objectives &
Outline
• Use create unique index to indirectly specify and enforce the condition that the search
Index Definition
in SQL key is a candidate key
Multiple-Key Access
Privileges
◦ Not really required if SQL unique integrity constraint is supported – it is preferred
Guidelines for
Indexing
• To drop an index
Ground Rules drop index <index-name>
Rule 0
Rule 1 • Most database systems allow specification of type of index, and clustering
Rule 2
Rule 3 ◦ You can also create an index for a cluster
Rule 4
Rule 5
◦ You can create a composite index on multiple columns up to a maximum of 32
Rule 6 columns
Module Summary
. A composite index key cannot exceed roughly one-half (minus some overhead)
of the available space in the data block

Database Management Systems Partha Pratim Das 45.6


Index in SQL: Examples PPD

Module 45 • Create an index for a single column, to speed up queries that test that column:
Partha Pratim ◦ CREATE INDEX emp ename ON emp tab(ename);
Das
• Specify several storage settings explicitly for the index:
Objectives &
Outline ◦ CREATE INDEX emp ename ON emp tab(ename)
Index Definition
TABLESPACE users // Allocation of space in the Database to contain schema objects
in SQL STORAGE ( // Specify how Database should store a database object
Multiple-Key Access
INITIAL 20K // Specify the size of the 1st extent of the object
Privileges
NEXT 20K // Specify in bytes the size of the 2nd extent to be allocated to the object
Guidelines for
Indexing PCTINCREASE 75) // Specify the percent by which later extents grow over
Ground Rules PCTFREE 0 // 0% of each data block in this table’s data segment be free for updates
Rule 0 COMPUTE STATISTICS;
Rule 1
Rule 2
◦ Create index on two columns, to speed up queries that test either the first column or both columns:
Rule 3 . CREATE INDEX emp ename ON emp tab(ename, empno) COMPUTE STATISTICS;
Rule 4
Rule 5
◦ If a query is going to sort on the function UPPER(ENAME), an index on the ENAME column itself
Rule 6 would not speed up this operation, and it might be slow to call the function for each result row
Module Summary . A function-based index precomputes the result of the function for each column value, speeding
up queries that use the function for searching or sorting:
CREATE INDEX emp upper ename ON emp tab(UPPER(ename)) COMPUTE STATISTICS;
Source: Selecting an Index Strategy
Database Management Systems Partha Pratim Das 45.7
Index in SQL: Bitmap PPD

Module 45

Partha Pratim • create bitmap index <index-name> on <relation-name>(<attribute-list>)


Das
• Example:
Objectives &
Outline ◦ Student (Student ID, Name, Address, Age, Gender, Semester)
Index Definition
in SQL
◦ CREATE BITMAP INDEX Idx Gender ON Student (Gender);
Multiple-Key Access ◦ CREATE BITMAP INDEX Idx Semester ON Student (Semester);
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6 • SELECT * FROM Student WHERE Gender = ‘F’ AND Semester =4;
Module Summary ◦ AND 0 1 1 1 with 0 0 0 1 to get the result

Database Management Systems Partha Pratim Das 45.8


Multiple-Key Access

Module 45

Partha Pratim • Use multiple indices for certain types of queries


Das
• Example:
Objectives &
Outline select ID
Index Definition from instructor
in SQL
Multiple-Key Access where dept name = “Finance” and salary = 80000
Privileges

Guidelines for
• Possible strategies for processing query using indices on single attributes:
Indexing
Ground Rules
◦ Use index on dept name to find instructors with department name Finance; test
Rule 0 salary = 80000
Rule 1
Rule 2 ◦ Use index on salary to find instructors with a salary of 80000; test dept name =
Rule 3
Rule 4
“Finance”
Rule 5 ◦ Use dept name index to find pointers to all records pertaining to the “Finance”
Rule 6
department. Similarly use index on salary. Take intersection of both sets of pointers
Module Summary
obtained

Database Management Systems Partha Pratim Das 45.9


Multiple-Key Access (2): Indices

Module 45

Partha Pratim • Composite Search Keys are search keys containing more than one attribute
Das
◦ For example, (dept name, salary )
Objectives &
Outline • Lexicographic ordering: (a1 , a2 ) < (b1 , b2 ) if either
Index Definition
in SQL ◦ a1 < b1 , or
Multiple-Key Access
Privileges
◦ a1 = b1 and a2 < b2
Guidelines for • Hence, the order is important
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.10


Multiple-Key Access (3): Indices on Multiple Attributes

Module 45

Partha Pratim
Suppose we have an index on combined search-key:
Das (dept name, salary )
Objectives &
Outline • With the where clause
Index Definition
in SQL
where dept name = “Finance” and salary = 80000
Multiple-Key Access the index on (dept name, salary ) can be used to fetch only records that satisfy both
Privileges
conditions.
Guidelines for
Indexing ◦ Using separate indices in less efficient - we may fetch many records (or pointers)
Ground Rules
Rule 0 that satisfy only one of the conditions
Rule 1
Rule 2 ◦ Can also efficiently handle
Rule 3
Rule 4
where dept name = “Finance” and salary < 80000
Rule 5 ◦ But cannot efficiently handle
Rule 6
where dept name < “Finance” and balance = 80000
Module Summary
. May fetch many records that satisfy the first but not the second condition

Database Management Systems Partha Pratim Das 45.11


Privileges Required to Create an Index PPD

Module 45

Partha Pratim • When using indexes in an application, you might need to request that the DBA grant
Das
privileges or make changes to initialization parameters
Objectives &
Outline • To create a new index
Index Definition
in SQL
◦ You must own, or have the INDEX object privilege for the corresponding table
Multiple-Key Access ◦ The schema that contains the index must also have a quota for the tablespace
Privileges
intended to contain the index, or the UNLIMITED TABLESPACE system privilege
Guidelines for
Indexing ◦ To create an index in another user’s schema, you must have the CREATE ANY
Ground Rules
Rule 0
INDEX system privilege
Rule 1
Rule 2 • Function-based indexes also require the QUERY REWRITE privilege, and that the
Rule 3
Rule 4
QUERY REWRITE ENABLED initialization parameter to be set to TRUE
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.12


Guidelines for Indexing PPD

Module 45

Partha Pratim
Das

Objectives &
Outline

Index Definition
in SQL
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Guidelines for Indexing
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.13


Guidelines for Indexing PPD

Module 45
• In Modules 16 to 20 (Week 4), we have studied various issues for a proper design of a
Partha Pratim
Das
relational database system. This focused on:
Objectives &
◦ Normalization of Tables leading to
Outline
. Reduction of Redundancy to minimize possibilities of Anomaly
Index Definition
in SQL . Easier adherence to constraints (various dependencies)
Multiple-Key Access
Privileges
. Efficiency of access and update – a better normalized design often gives better
Guidelines for
performance
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.14


Guidelines for Indexing (2) PPD

Module 45
• The performance of a database system, however, is also significantly impacted by the
Partha Pratim
Das
way the data is physically organized and managed. These are done through:
Objectives &
◦ Indexing and Hashing
Outline
• While normalization and design are startup time activities that are usually performed
Index Definition
in SQL once at the beginning (and rarely changed later), the performance behavior continues
Multiple-Key Access
Privileges
to evolve as the database is used over time. Hence we need to continually:
Guidelines for ◦ Collect Statistics about data (of various tables) to learn of the patterns, and
Indexing
Ground Rules ◦ Adjust the Indexes on the tables to optimize performance
Rule 0
Rule 1 • There is no sound theory that determines optimal performance. Rather, we take a quick
Rule 2
Rule 3
look into a few common guidelines that can help you keep your database agile in its
Rule 4 behavior
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.15


Guidelines for Indexing: Ground Rules PPD

Module 45

Partha Pratim • Some guidelines - heuristic and common sense, but time-tested - are summarized here
Das
as a set of Ground Rules for Indexing
Objectives &
Outline ◦ Rule 0: Indexes lead to Access – Update Tradeoff
Index Definition ◦ Rule 1: Index the Correct Tables
in SQL
Multiple-Key Access
◦ Rule 2: Index the Correct Columns
Privileges
◦ Rule 3: Limit the Number of Indexes for Each Table
Guidelines for
Indexing ◦ Rule 4: Choose the Order of Columns in Composite Indexes
Ground Rules
◦ Rule 5: Gather Statistics to Make Index Usage More Accurate
Rule 0
Rule 1 ◦ Rule 6: Drop Indexes That Are No Longer Required
Rule 2
Rule 3 • These rules are explained in the following slides
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.16


Guidelines for Indexing: Rule 0 PPD

Module 45

Partha Pratim • Rule 0: Indexes lead to Access – Update Tradeoff


Das
◦ Every query (access) results in a ‘search’ on the underlying physical data structures
Objectives &
Outline . Having specific index on search field can significantly improve performance
Index Definition
in SQL
◦ Every update (insert / delete / values update) results in update of the index files –
Multiple-Key Access an overhead or penalty for quicker access
Privileges

Guidelines for . Having unnecessary indexes can cause significant degradation of performance of
Indexing
Ground Rules
various operations
Rule 0 . Index files may also occupy significant space on your disk and / or
Rule 1
Rule 2 . Cause slow behavior due to memory limitations during index computations
Rule 3
Rule 4
◦ Use informed judgment to index!
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.17


Guidelines for Indexing: Rule 1 PPD

Module 45

Partha Pratim • Rule 1: Index the Correct Tables


Das
◦ Create an index if you frequently want to retrieve less than 15% of the rows in a
Objectives &
Outline large table
Index Definition
in SQL
. The percentage varies greatly according to the relative speed of a table scan and
Multiple-Key Access how clustered the row data is about the index key
Privileges

Guidelines for
− The faster the table scan, the lower the percentage
Indexing − More clustered the row data, the higher the percentage
Ground Rules
Rule 0 • Index columns used for joins to improve performance on joins of multiple tables
Rule 1
Rule 2
• Primary and unique keys automatically have indexes, but you might want to create an
Rule 3
Rule 4 index on a foreign key
Rule 5
Rule 6 • Small tables do not require indexes
Module Summary
◦ If a query is taking too long, then the table might have grown from small to large

Database Management Systems Partha Pratim Das 45.18


Guidelines for Indexing: Rule 2 PPD

Module 45 • Rule 2: Index the Correct Columns


Partha Pratim
Das
◦ Columns with the following characteristics are candidates for indexing:
. Values are relatively unique in the column
Objectives &
Outline . There is a wide range of values (good for regular indexes)
Index Definition
in SQL
. There is a small range of values (good for bitmap indexes)
Multiple-Key Access . The column contains many nulls, but queries often select all rows having a
Privileges
value. In this case, a comparison that matches all the non-null values, such as:
Guidelines for
Indexing − WHERE COL X > -9.99 *power(10, 125) is preferable to WHERE COL X
Ground Rules
Rule 0 IS NOT NULL
Rule 1
Rule 2
− This is because the first uses an index on COL X (if COL X is a numeric
Rule 3 column)
Rule 4
Rule 5 ◦ Columns with the following characteristics are less suitable for indexing:
Rule 6

Module Summary
. There are many nulls in the column and you do not search on the non-null values
. LONG and LONG RAW columns cannot be indexed
◦ The size of a single index entry cannot exceed roughly one-half (minus some
overhead) of the available space in the data block
Database Management Systems Partha Pratim Das 45.19
Guidelines for Indexing: Rule 3 PPD

Module 45

Partha Pratim • Rule 3: Limit the Number of Indexes for Each Table
Das
◦ The more indexes, the more overhead is incurred as the table is altered
Objectives &
Outline . When rows are inserted or deleted, all indexes on the table must be updated
Index Definition
in SQL
. When a column is updated, all indexes on the column must be updated
Multiple-Key Access ◦ You must weigh the performance benefit of indexes for queries against the
Privileges

Guidelines for
performance overhead of updates
Indexing
Ground Rules
. If a table is primarily read-only, you might use more indexes; but, if a table is
Rule 0 heavily updated, you might use fewer indexes
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.20


Guidelines for Indexing: Rule 4 PPD

Module 45 • Rule 4: Choose the Order of Columns in Composite Indexes


Partha Pratim ◦ The order of columns in the CREATE INDEX statement can affect
Das
performance
Objectives &
Outline . Put the column used most often first in the index
Index Definition . You can create a composite index (using several columns), and
in SQL
Multiple-Key Access the same index can be used for queries that reference all of
Privileges
these columns, or just some of them
Guidelines for
Indexing • For the VENDOR PARTS table, assume that there are 5 vendors, and each vendor has
Ground Rules
Rule 0
about 1000 parts. Suppose VENDOR PARTS is commonly queried as:
Rule 1
Rule 2
◦ SELECT * FROM vendor parts WHERE part no = 457 AND vendor id = 1012;
Rule 3 ◦ Create a composite index with the most selective (with most values) column first
Rule 4
Rule 5 . CREATE INDEX ind vendor id ON vendor parts (part no, vendor id);
Rule 6

Module Summary
• Composite indexes speed up queries that use the leading portion of the index:
◦ So queries with WHERE clauses using only PART NO column also runs faster
◦ With only 5 distinct values, a separate index on VENDOR ID does not help
Database Management Systems Partha Pratim Das 45.21
Guidelines for Indexing: Rule 5 PPD

Module 45

Partha Pratim • Rule 5: Gather Statistics to Make Index Usage More Accurate
Das
◦ The database can use indexes more effectively when it has statistical information
Objectives &
Outline about the tables involved in the queries
Index Definition
in SQL
. Gather statistics when the indexes are created by including the keywords
Multiple-Key Access COMPUTE STATISTICS in the CREATE INDEX statement
Privileges
. As data is updated and the distribution of values changes, periodically refresh
Guidelines for
Indexing the statistics by calling procedures like (in Oracle):
Ground Rules
Rule 0 − DBMS STATS.GATHER TABLE STATISTICS and
Rule 1
Rule 2
− DBMS STATS.GATHER SCHEMA STATISTICS
Rule 3
Rule 4
Rule 5
Rule 6

Module Summary

Database Management Systems Partha Pratim Das 45.22


Guidelines for Indexing: Rule 6 PPD

Module 45

Partha Pratim • Rule 6: Drop Indexes That Are No Longer Required


Das
◦ You might drop an index if:
Objectives &
Outline . It does not speed up queries. The table might be very small, or there might be
Index Definition
in SQL
many rows in the table but very few index entries
Multiple-Key Access . The queries in your applications do not use the index
Privileges
. The index must be dropped before being rebuilt
Guidelines for
Indexing ◦ When you drop an index, all extents of the index’s segment are returned to the
Ground Rules
Rule 0 containing tablespace and become available for other objects in the tablespace
Rule 1
Rule 2
◦ Use the SQL command DROP INDEX to drop an index. For example, the following
Rule 3 statement drops a specific named index:
Rule 4
Rule 5 . DROP INDEX Emp ename;
Rule 6

Module Summary
◦ If you drop a table, then all associated indexes are dropped
◦ To drop an index, the index must be contained in your schema or you must have the
DROP ANY INDEX system privilege
Database Management Systems Partha Pratim Das 45.23
Module Summary

Module 45

Partha Pratim • Learnt to create Indexes in SQL


Das
• Introduced the set of Ground Rules for Indexing
Objectives &
Outline

Index Definition
in SQL
Multiple-Key Access
Privileges

Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Slides used in this presentation are borrowed from http://db-book.com/ with kind
Rule 4
Rule 5 permission of the authors.
Rule 6
Edited and new slides are marked with “PPD”.
Module Summary

Database Management Systems Partha Pratim Das 45.24

You might also like