Lecture 9
Lecture 9
Partha Pratim
Das
Week Recap
Objectives &
Database Management Systems
Outline
Module 41: Indexing and Hashing/1: Indexing/1
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files Partha Pratim Das
Primary and
Secondary Indices
Multilevel Index
Index Update
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Module Summary
[email protected]
Module 41
• Need for algorithm analysis, Asymptotic complexity, and Worst-case, average-case and
Partha Pratim
Das
best-case analysis
Week Recap
• Reviewed Linear Data Structures; array, list, stack, queue; and linear and binary search
Objectives & • Reviewed Non-linear Data Structures - graph, tree, hash table; Binary Search Tree; and
Outline
Indexing
compared Linear and Non-Linear Data Structures
Metrics
• Understood the range of Physical Storage Media
Ordered Indices
Dense Index Files • Studied about Magnetic Disks and Magnetic Tape
Sparse Index Files
Primary and
Secondary Indices
• Glimpsed through Other Storage and the Future of Storage
Multilevel Index
Index Update
• Familiarized with the organization for database files
Module Summary • Understood how records and relations are organized in files
• Learnt how databases keep their own information in Data-Dictionary Storage – the
metadata database of a database
• Understood the mechanisms for fast access of a database store
Module 41
Partha Pratim • To understand the reasons for which we need to index database table
Das
• To learn about the ordered indexes and Indexed Sequential Access Mechanism
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim
Das
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Concepts of Indexing
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index • How to search on Name?
Index Update
◦ Get the phone number for ‘Pabitra Mitra’
Module Summary
◦ Use “Name” Index – sorted on ‘Name’, search ‘Pabitra Mitra’ and navigate on pointer (rec #)
• How to search on Phone?
◦ Get the name of the faculty having phone number = 84772
◦ Use “Phone” Index – sorted on ‘Phone’, search ‘84772’ and navigate on pointer (rec #)
• We can keep the records sorted on ‘Name’ or on ‘Phone’ (called the primary index), but not on both
Database Management Systems Partha Pratim Das 41.6
Basic Concepts
Module 41
Objectives &
. Name in a faculty table
Outline . author catalog in library
Indexing
Metrics
• Search Key - attribute to set of attributes used to look up records in a file
Ordered Indices • An index file consists of records (called index entries) of the form
Dense Index Files
Sparse Index Files
Primary and search-key pointer
Secondary Indices
Multilevel Index
Index Update • Index files are typically much smaller than the original file
Module Summary
• Two basic kinds of indices:
◦ Ordered indices: search keys are stored in sorted order
◦ Hash indices: search keys are distributed uniformly across buckets using a hash
function
Module 41
Module Summary
Module 41
Partha Pratim
Das
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Ordered Indices
Module 41
Partha Pratim • In an ordered index, index entries are stored sorted on the search key value. For
Das
example, author catalog in library
Week Recap
• Primary index: in a sequentially ordered file, the index whose search key specifies the
Objectives &
Outline sequential order of the file
Indexing
Metrics
◦ Also called clustering index
Ordered Indices
◦ The search key of a primary index is usually but not necessarily the primary key
Dense Index Files
Sparse Index Files
• Secondary index: an index whose search key specifies an order different from the
Primary and
Secondary Indices
sequential order of the file
Multilevel Index
Index Update
◦ Also called non-clustering index
Module Summary • Index-sequential file: ordered sequential file with a primary index
Module 41
Partha Pratim • Dense index — Index record appears for every search-key value in the file.
Das
• For example, index on ID attribute of instructor relation
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim • Dense index on dept name, with instructor file sorted on dept name
Das
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim • Sparse Index: contains index records for only some search-key values.
Das
◦ Applicable when records are sequentially ordered on search-key
Week Recap
Objectives &
• To locate a record with search-key value K we:
Outline
◦ Find index record with largest search-key value < K
Indexing
Metrics
◦ Search file sequentially starting at the record to which the index record points
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Module Summary
Module 41
Partha Pratim
Das
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Secondary index on salary field of instructor
• Index record points to a bucket that contains pointers to all the actual records with
that particular search-key value.
• Secondary indices have to be dense
Database Management Systems Partha Pratim Das 41.15
Primary and Secondary Indices
Module 41
Partha Pratim • Indices offer substantial benefits when searching for records
Das
• BUT: Updating indices imposes overhead on database modification –when a file is
Week Recap
modified, every index on the file must be updated
Objectives &
Outline
• Sequential scan using primary index is efficient, but a sequential scan using a secondary
Indexing
Metrics
index is expensive
Ordered Indices ◦ Each record access may fetch a new block from disk
Dense Index Files
Sparse Index Files
◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for
Primary and
Secondary Indices
memory access
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim • If primary index does not fit in memory, access becomes expensive
Das
• Solution: treat primary index kept on disk as a sequential file and construct a sparse
Week Recap
index on it
Objectives &
Outline ◦ outer index – a sparse index of primary index
Indexing
Metrics
◦ inner index – the primary index file
Ordered Indices • If even outer index is too large to fit in main memory, yet another level of index can be
Dense Index Files
Sparse Index Files
created, and so on
Primary and
Secondary Indices • Indices at all levels must be updated on insertion or deletion from the file
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim
Das
Week Recap
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Partha Pratim
Das
• If deleted record was the only
Week Recap record in the file with its partic-
Objectives &
Outline
ular search-key value, the search-
Indexing
key is deleted from the index also.
Metrics
Ordered Indices
Dense Index Files • Single-level index entry deletion:
Sparse Index Files
Primary and
Secondary Indices
◦ Dense indices – deletion of search-key is similar to file record deletion
Multilevel Index ◦ Sparse indices –
Index Update
Module Summary
. If an entry for the search key exists in the index, it is deleted by replacing the
entry in the index with the next search-key value in the file (in search-key order)
. If the next search-key value already has an index entry, the entry is deleted
instead of being replaced
Module 41
Ordered Indices
. If a new block is created, the first search-key value appearing in the new block is
Dense Index Files inserted into the index
Sparse Index Files
Primary and
Secondary Indices
• Multilevel insertion and deletion: algorithms are simple extensions of the single-level
Multilevel Index algorithms
Index Update
Module Summary
Module 41
Partha Pratim • Frequently, one wants to find all the records whose values in a certain field (which is
Das
not the search-key of the primary index) satisfy some condition
Week Recap
◦ Example 1: In the instructor relation stored sequentially by ID, we may want to find
Objectives &
Outline all instructors in a particular department
Indexing ◦ Example 2: as above, but where we want to find all instructors with a specified
Metrics
salary or with salary in a specified range of values
Ordered Indices
Dense Index Files • We can have a secondary index with an index record for each search-key value
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Module 41
Objectives &
Outline
Indexing
Metrics
Ordered Indices
Dense Index Files
Sparse Index Files
Primary and
Secondary Indices
Multilevel Index
Index Update
Module Summary
Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Balanced BST
Module 42: Indexing and Hashing/2: Indexing/2
2-3-4 Tree
Search
Insert
Split
Example
Delete
Partha Pratim Das
Observations
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Partha Pratim • To recap Balanced Binary Search Trees as options for optimal in-memory search data
Das
structures
Objectives &
Outline • To understand the issues relating to external search data structures for persistent data
Balanced BST
• To study 2-3-4 Tree as a precursor to B/B+-Tree for an efficient external data
2-3-4 Tree
Search
structure for database and index tables
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Partha Pratim
Das
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
• How to search a key in a list of n data items?
Partha Pratim
◦ Linear Search: O(n): Find 28 ⇒ 16 comparisons
Das . Unordered items in an array – search sequentially
Objectives &
. Unordered / Ordered items in a list – search sequentially
Outline
Balanced BST
2-3-4 Tree
◦ Binary Search: O(lg n): Find 28 ⇒ 4 comparisons – 25, 36, 30, 28
Search . Ordered items in an array – search by divide-and-conquer
Insert
Split
Example
Delete
. Binary Search Tree – recursively on left / right
Observations
Module Summary
Module 42
Partha Pratim • Worst case time (n data items in the data structure):
Das
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
• Between an array and a list, there is a trade-off between search and insert/delete
complexity
• For a BST of n nodes, lg n ≤ h < n, where h is the height of the tree
• A BST is balanced if h ∼ O(lg n): this what we desire
Module 42
• In the worst case, searching a key in a BST is O(h), where h is the height of the key
Partha Pratim
Das • Bad Tree: h ∼ O(n)
Objectives & ◦ The BST is a skewed binary search tree (all the nodes except the leaf would have
Outline
only one child)
Balanced BST
◦ This can happen if keys are inserted in sorted order
2-3-4 Tree
Search ◦ Height (h) of the BST having n elements becomes n − 1
Insert
Split
◦ Time complexity of search in BST becomes O(n)
Example
Delete
• Good Tree: h ∼ O(lg n)
Observations
◦ The BST is a balanced binary search tree
Module Summary
◦ This is possible if
. If keys are inserted in purely randomized order, Or
. If the tree is explicitly balanced after every insertion
◦ Height (h) of the binary search tree becomes lg n
◦ Time complexity of search in BST becomes O(lg n)
Module 42
• A BST is balanced if h ∼ O(lg n)
Partha Pratim
Das • Balancing Guarantees may be of various types:
Objectives & ◦ Worst-case
Outline
Balanced BST
. AVL Tree: Self-balancing BST
2-3-4 Tree
− Named after inventors Adelson-Velsky-Landis
Search − Heights of the two child subtrees of any node differ by at most one: |hL − hR | ≤ 1
Insert − If they differ by more than one, rebalancing is done rotation
Split
Example ◦ Randomized
Delete
Observations . Randomized BST
Module Summary − A BST on n keys is random if either it is empty (n = 0), or the probability that a given
1
key is at the root is n
, and the left and right subtrees are random
. Skip List
− A skip list is built (probabbilistically) in layers of ordered linked lists
◦ Amortized
. Splay
− A BST where recently accessed elements are quick to access again
Database Management Systems Partha Pratim Das 42.9
Balanced Binary Search Trees (2) PPD
Module 42
• These data structures have optimal complexity for the required operations:
Partha Pratim
Das ◦ Search: O(lg n)
Objectives &
◦ Insert: Search + O(1): O(lg n)
Outline ◦ Delete: Search + O(1): O(lg n)
Balanced BST
• And they are:
2-3-4 Tree
Search ◦ Good for in-memory operations
Insert
Split ◦ Work well for small volume of data
Example
Delete
◦ Has complex rotation and / or similar operations
Observations ◦ Do not scale for external data structures
Module Summary
Module 42
Partha Pratim
Das
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
2-3-4 Tree
Module 42
Partha Pratim • All leaves are at the same depth (the bottom level).
Das
◦ Height, h, of all leaf nodes are same
Objectives &
Outline . h ∼ O(lg n)
Balanced BST . Complexity of search, insert and delete: O(h) ∼ O(lg n)
2-3-4 Tree
Search
• All data is kept in sorted order
Insert
Split • Every node (leaf or internal) is a 2-node, 3-node or a 4-node (based on the number of
Example
Delete
links or children), and holds one, two, or three data elements, respectively
Observations
• Generalizes easily to larger nodes
Module Summary
• Extends to external data structures
Module 42
Partha Pratim • Uses 3 kinds of nodes satisfying key relationships as shown below:
Das
◦ A 2-node must contain a single data item (S) and two links
Objectives &
Outline ◦ A 3-node must contain two data items (S, L) and three links
Balanced BST ◦ A 4-node must contain three data items (S, M, L) and four links
2-3-4 Tree ◦ A leaf may contain either one, two, or three data items
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Module 42
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Module 42
Partha Pratim • Insert 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
Das
• 10
Objectives &
Outline • 10, 30
Balanced BST • 10, 30, 60
2-3-4 Tree
Search
• Split for 20
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15
Das
• Split for 90
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15, 90
Das
• Split for 100
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Partha Pratim • 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
Das
Objectives &
Outline
Balanced BST
2-3-4 Tree
Search
Insert
Split
Example
Delete
Observations
Module Summary
Module 42
Module Summary
Module 42
Module 42
• Consider only one node type with space for 3 items and 4 links
Partha Pratim
Das ◦ Internal node (non-root) has 2 to 4 children (links)
Objectives & ◦ Leaf node has 1 to 3 items
Outline
◦ Wastes some space, but has several advantages for external data structure
Balanced BST
Module 42
Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
Balanced BST
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
2-3-4 Tree
Search
structure for database and index tables
Insert
Split
Example
Delete
Observations
Module Summary
Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Partha Pratim
Das
Objectives &
Outline Database Management Systems
B+-Tree Index
Files Module 43: Indexing and Hashing/3: Indexing/3
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Partha Pratim Das
Updates
Insertion
Department of Computer Science and Engineering
Deletion
Indian Institute of Technology, Kharagpur
File Organization
Non-Unique Keys
Relocation and
[email protected]
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.1
Module Recap PPD
Module 43
Partha Pratim • Recapitulated the notions of Balanced Binary Search Trees as options for optimal
Das
in-memory search data structures
Objectives &
Outline • Understood the issues relating to external data structures for persistent data
B+-Tree Index
Files
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data
Simple B
+
Tree structure for database and index tables
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.2
Module Objectives PPD
Module 43
Partha Pratim • To understand the design of B+ Tree Index Files as a generalization of 2-3-4 Tree
Das
• To understand the fundamentals of B-Tree Index Files
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.3
Module Outline PPD
Module 43
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.4
B+ Tree Index Files PPD
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.5
B+ Tree
B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.6
B+ Tree (2)
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
• Internal node contains
Observations
Query
◦ At least n2 child pointers, except the root node
Duplicates ◦ At most n pointers Note: These are approximate
Updates
Insertion • Leaf node contains values, we will discuss more
Deletion
File Organization ◦ At least n2 record pointers and n2 key values precise values later in this lecture.
Non-Unique Keys
Relocation and ◦ At most n record pointer and n key values
Secondary Indices
Strings
◦ One block pointer P to point to next leaf node
B-Tree Index
Files
Source: B+ Tree
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.7
B+ Tree (3): Search
Module 43
• Suppose we have to search 55 in the B+ tree below
Partha Pratim
Das ◦ First, we will fetch for the intermediary node which will direct to the leaf node that
Objectives &
can contain a record for 55
Outline
• So, in the intermediary node, we will find a branch between 50 and 75 nodes
B+-Tree Index
Files
+
◦ Then at the end, we will be redirected to the third leaf node
Simple B Tree
Index Files ◦ Here DBMS will perform a sequential search to find 55
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings Source: B+ Tree
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.8
B+ Tree (3): Insert
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query • Suppose we want to insert a record 60 that goes to 3rd leaf node after 55
Duplicates
Updates • The leaf node of this tree is already full, so we cannot insert 60 there
Insertion
Deletion • So we have to split the leaf node, so that it can be inserted into tree without affecting
File Organization
Non-Unique Keys
the fill factor, balance and order
• The 3rd leaf node has the values (50, 55, 60, 65, 70) and its current root node is 50
Relocation and
Secondary Indices
Strings
B-Tree Index
• We will split the leaf node of the tree in the middle so that its balance is not altered
Files
Comparison Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.9
B+ Tree (4): Insert
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query • So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes
Duplicates
Updates • If these two has to be leaf nodes, the intermediate node cannot branch from 50
Insertion
Deletion • It should have 60 added to it, and then we can have pointers to a new leaf node
File Organization
Non-Unique Keys • This is how we can insert an entry when there is overflow. In a normal scenario, it is
Relocation and
Secondary Indices very easy to find the node where it fits and then place it in that leaf node
Strings
Source: B+ Tree
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.10
B+ Tree (5): Delete
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations • To delete 60, we have to remove 60 from intermediate node as well as 4th leaf node
Query
Duplicates
• If we remove it from the intermediate node, then the tree will not remain a B+ tree
Updates
Insertion
Deletion
• So with deleting 60 we re-arranging the nodes:
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Source: B+ Tree
Module Summary
Database Management Systems Partha Pratim Das 43.11
B+ Tree Index Files
Module 43
B-Tree Index
◦ B+ trees are used extensively
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.12
B+ Tree Index Files (2): Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.13
B+ Tree Index Files (3): Structure
Module 43
Partha Pratim
A B+ tree is a rooted tree satisfying the following properties:
Das
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between d n2 e and n children
Objectives &
Outline
B+-Tree Index
Files
• A leaf node has between an d n−1
2 e and n − 1 values
+
Simple B
Index Files
Tree
• Special cases:
Nodes
Observations
◦ If the root is not a leaf, it has at least 2 children.
Query ◦ If the root is a leaf (that is, there are no other nodes in the tree), it can have
Duplicates
Updates between 0 and (n − 1) values.
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.14
B+ Tree Index Files (4): Node Structure
Module 43
Objectives &
Outline
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.15
B+ Tree Index Files (5): Leaf Nodes
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.16
B+ Tree Index Files (6): Non-Leaf Nodes
Module 43
Partha Pratim • Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node
Das
with m pointers:
Objectives &
Outline ◦ All the search-keys in the subtree to which P1 points are less than K1
B+-Tree Index ◦ For 2 ≤ i ≤ n − 1, all the search-keys in the subtree to which Pi points have values
Files
Simple B
+
Tree
greater than or equal to Ki−1 and less than Ki
Index Files
Nodes
◦ All the search-keys in the subtree to which Pn points have values greater than or
Observations equal to Kn−1
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.17
B+ Tree Index Files (7): Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.18
B+ Tree Index Files: Observations
Module 43
• Since the inter-node connections are done by pointers, logically close blocks need not
Partha Pratim
Das
be physically close
Objectives &
• The non-leaf levels of the B+ tree form a hierarchy of sparse indices
Outline
• The B+ tree contains a relatively small number of levels
B+-Tree Index
◦ Level below root has at least 2 ∗ n2 values
Files
+
Simple B Tree
Index Files ◦ Next level has at least 2 ∗ d n2 e ∗ d n2 e values
Nodes
Observations
◦ ... etc.
Query
Duplicates
◦ If there are K search-key values in the file, the tree height is no more than
Updates dlogdn/2e (K )e
Insertion
Deletion ◦ thus searches can be conducted efficiently
File Organization
Non-Unique Keys • Insertions and deletions to the main file can be handled efficiently, as the index can be
Relocation and
Secondary Indices restructured in logarithmic time
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.19
B+ Tree Index Files: Queries
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.20
B+ Trees Index Files: Queries (2)
Module 43
Partha Pratim • If
l there are mK search-key values in the file, the height of the tree is no more than
Das
logd n e (K )
Objectives & 2
Outline
• A node is generally the same size as a disk block, typically 4 kilobytes
B+-Tree Index
Files
+
◦ and n is typically around 100 (40 bytes per index entry)
Simple B Tree
Index Files • With 1 million search key values and n = 100
Nodes
Observations ◦ at most log50 (1, 000, 000) = 4 nodes are accessed in a lookup
Query
Duplicates • Contrast this with a balanced binary tree with 1 million search key values — around 20
Updates
Insertion nodes are accessed in a lookup
Deletion
File Organization
◦ above difference is significant since every node access may need a disk I/O, costing
Non-Unique Keys
around 20 milliseconds
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.21
B+ Tree Index Files: Handling Duplicates
Module 43
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.22
B+ Tree Index Files: Handling Duplicates (2)
Module 43
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.23
Updates on B+ Trees: Insertion
Module 43
Partha Pratim • Find the leaf node in which the search-key value would appear
Das
• If the search-key value is already present in the leaf node
Objectives &
Outline ◦ Add record to the file
B+-Tree Index
Files
◦ If necessary add a pointer to the bucket
+
Simple B
Index Files
Tree
• If the search-key value is not present, then
Nodes
Observations
◦ Add the record to the main file (and create a bucket if necessary)
Query ◦ If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node
Duplicates
Updates
◦ Otherwise, split the node (along with the new (key-value, pointer) entry) as
Insertion
Deletion
discussed in the next slide
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.24
Updates on B+ Trees: Insertion (2)
Module 43
B+-Tree Index ◦ let the new node be p, and let k be the least key value in p. Insert (k, p) in the
Files
Simple B
+
Tree
parent of the node being split
Index Files
Nodes
◦ If the parent is full, split it and propagate the split further up
Observations
Query
• Splitting of nodes proceeds upwards till a node that is not full is found
Duplicates
◦ In the worst case the root node may be split increasing the height of the tree by 1
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Secondary Indices Next step: insert entry with (Califieri,pointer-to-new-node) into parent
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.25
Updates on B+ Trees: Insertion (3)
Module 43
Partha Pratim • Splitting a non-leaf node: when inserting (k, p) into an already full internal node N
Das
◦ Copy N to an in-memory area M with space for n + 1 pointers and n keys
Objectives &
Outline ◦ Insert (k, p) into M
B+-Tree Index ◦ Copy P1 , K1 , · · · , Kd n e−1 , Pd n e from M back into node N
Files 2 2
Simple B
+
Tree ◦ Copy Pd n e+1 , Kd n e+1 , · · · , Kn , Pn+1 from M into newly allocated node N 0
Index Files 2 2
Nodes ◦ Insert (Kd n e , N 0 ) into parent N
Observations 2
Query
Duplicates
• Read pseudocode in book!
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.26
Updates on B+ Trees: Insertion Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files B+ Tree before and after insertion of “Adams”
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.27
Updates on B+ Trees: Insertion Example (2)
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
Module Summary
Database Management Systems Partha Pratim Das 43.28
Updates on B+ Trees: Deletion
Module 43
Partha Pratim • Find the record to be deleted, and remove it from the main file and from the bucket (if
Das
present)
Objectives &
Outline • Remove (search-key value, pointer) from the leaf node if there is no bucket or if the
B+-Tree Index bucket has become empty
Files
Simple B
+
Tree • If the node has too few entries due to the removal, and the entries in the node and a
Index Files
Nodes sibling fit into a single node, then merge siblings:
Observations
Query ◦ Insert all the search-key values in the two nodes into a single node (the one on the
Duplicates
Updates
left), and delete the other node.
Insertion ◦ Delete the pair (Ki−1 , Pi ), where Pi is the pointer to the deleted node, from its
Deletion
File Organization
parent, recursively using the above procedure.
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.29
Updates on B+ Trees: Deletion (2)
Module 43
Partha Pratim • Otherwise, if the node has too few entries due to the removal, but the entries in the
Das
node and a sibling do not fit into a single node, then redistribute pointers:
Objectives &
Outline ◦ Redistribute the pointers between the node and a sibling such that both have more
B+-Tree Index than the minimum number of entries
Files
Simple B
+
Tree
◦ Update the corresponding search-key value in the parent of the node
• The node deletions may cascade upwards till a node which has n2 or more pointers is
Index Files
Nodes
Observations
Query
found
Duplicates
Updates
• If the root node has only one pointer after deletion, it is deleted and the sole child
Insertion becomes the root
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.30
Updates on B+ Trees: Deletion Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Before and after deleting “Srinivasan”
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
Deleting “Srinivasan” causes merging of under-full leaves
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.31
Updates on B+ Trees: Deletion Example (2)
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Deletion of “Singh” and “Wu” from result of previous example
Observations
Query
Duplicates • Leaf containing Singh and Wu became underfull, and borrowed a value Kim from its
Updates
Insertion left sibling
Deletion
File Organization • Search-key value in the parent changes as a result
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.32
Updates on B+ Trees: Deletion Example (3)
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization Before and after deletion of “Gold” from earlier example
Non-Unique Keys
Relocation and
Secondary Indices • Node with “Gold” and “Katz” became underfull, and was merged with its sibling
Strings
B-Tree Index
• Parent node becomes underfull, and is merged with its sibling
Files
Comparison
◦ Value separating two nodes (at the parent) is pulled down when merging
Module Summary • Root node then has only one child, and is delete
Database Management Systems Partha Pratim Das 43.33
B+ Tree File Organization
Module 43
Partha Pratim • Index file degradation problem is solved by using B+ Tree indices
Das
• Data file degradation problem is solved by using B+ Tree File Organization
Objectives &
Outline • The leaf nodes in a B+ tree file organization store records, instead of pointers
B+-Tree Index
Files • Leaf nodes are still required to be half full
+
Simple B Tree
Index Files ◦ Since records are larger than pointers, the maximum number of records that can be
Nodes
Observations
stored in a leaf node is less than the number of pointers in a non-leaf node
Query
Duplicates
• Insertion and deletion are handled in the same way as insertion and deletion of entries
Updates in a B+ tree index
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.34
B+ Tree File Organization: Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Example of B+ tree File Organization
Insertion
Deletion
File Organization • Good space utilization important since records use more space than pointers.
Non-Unique Keys
Relocation and
Secondary Indices
• To improve space utilization, involve more sibling nodes in redistribution during splits
Strings and merges
B-Tree Index
Files
◦ Involving 2 siblings in redistribution
(to avoid split / merge where possible) results
Comparison in each node having at least 2n 3 entries
Module Summary
Database Management Systems Partha Pratim Das 43.35
Non-Unique Search Keys
Module 43
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.36
Record Relocation and Secondary Indices
Module 43
Partha Pratim • If a record moves, all secondary indices that store record pointers have to be updated
Das
• Node splits in B+ tree file organizations become very expensive
Objectives &
Outline • Solution: Use primary-index search key instead of record pointer in secondary index
B+-Tree Index
Files ◦ Extra traversal of primary index to locate record
+
Simple B
Index Files
Tree
– Higher cost for queries, but node splits are cheap
Nodes ◦ Add record-id if primary-index search key is non-unique
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.37
Indexing Strings
Module 43
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.38
B-Tree Index Files PPD
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
B-Tree Index Files
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.39
B-Tree Index Files
Module 43
Partha Pratim • Similar to B+ tree, but B-tree allows search-key values to appear only once; eliminates
Das
redundant storage of search keys
Objectives &
Outline • Search keys in non-leaf nodes appear nowhere else in the B-tree; an additional pointer
B+-Tree Index field for each search key in a non-leaf node must be included
Files
Simple B
+
Tree • Generalized B-tree leaf node
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings • Non-leaf node - pointers Bi are the bucket or file record pointers
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.40
B-Tree Index File (2): Example
Module 43
Partha Pratim
Das
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
B-tree (above) and B+ tree (below) on same data
Updates
Insertion
Deletion
File Organization
Non-Unique Keys
Relocation and
Secondary Indices
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.41
Comparison of B-Tree and B+ Tree Index Files
Module 43
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.42
Module Summary
Module 43
Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline
B+-Tree Index
Files
+
Simple B Tree
Index Files
Nodes
Observations
Query
Duplicates
Updates
Insertion
Deletion
File Organization
Slides used in this presentation are borrowed from http://db-book.com/ with kind
Non-Unique Keys permission of the authors.
Relocation and
Secondary Indices Edited and new slides are marked with “PPD”.
Strings
B-Tree Index
Files
Comparison
Module Summary
Database Management Systems Partha Pratim Das 43.43
Module 44
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Static Hashing
Hash Function
Module 44: Indexing and Hashing/4: Hashing
Example
Bucket Overflow
Dynamic Hashing
Example
Partha Pratim Das
Comparison
Schemes
Module 44
Partha Pratim • Understood the design of B+ Tree Index Files in depth for database persistent store
Das
• Familiarized with B-Tree Index Files
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • To explore various hashing schemes – Static and Dynamic Hashing
Das
• To compare Ordered Indexing and Hashing
Objectives &
Outline • To understand the Bitmap Indices
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Static Hashing
Module 44 • A hash function h maps data of arbitrary size (from domain D) to fixed-size values (say,
Partha Pratim
integers from 0 to N > 0 h : D → [0..N]
Das
• Given key k, h(k) is called hash values, hash codes, digests, or simply hashes
Objectives &
Outline • If for two keys k1 6= k2 , we have h(k1 ) = h(k2 ), we say a collision has occured
Static Hashing
Hash Function
• A hash function should be Collision Free and Fast
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • A bucket is a unit of storage containing one or more records (a bucket is typically a
Das
disk block)
Objectives &
Outline • In a hash file organization we obtain the bucket of a record directly from its
Static Hashing search-key value using a hash function
Hash Function
Example • Hash function h is a function from the set of all search-key values K to the set of all
Bucket Overflow
bucket addresses B
Dynamic Hashing
Example • Hash function is used to locate records for access, insertion as well as deletion
Comparison
Schemes • Records with different search-key values may be mapped to the same bucket; thus
Bitmap Indices entire bucket has to be searched sequentially to locate a record
Module Summary
Module 44
Partha Pratim
Hash file organization of instructor file, using dept name as key
Das
• There are 10 buckets
Objectives &
Outline • The binary representation of the i th character is assumed to be the integer i
Static Hashing
Hash Function
• The hash function returns the sum of the binary representations of the characters
Example modulo 10
Bucket Overflow
Dynamic Hashing
◦ For example
Example
Comparison
Schemes
h(Music) = 1 h(History) = 2
Bitmap Indices
h(Physics) = 3 h(Elec. Eng.) = 3
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • Worst hash function maps all search-key values to the same bucket; this makes access
Das
time proportional to the number of search-key values in the file
Objectives &
Outline • An ideal hash function is uniform, i.e., each bucket is assigned the same number of
Static Hashing search-key values from the set of all possible values
Hash Function
Example • Ideal hash function is random, so each bucket will have the same number of records
Bucket Overflow
assigned to it irrespective of the actual distribution of search-key values in the file
Dynamic Hashing
Example • Typical hash functions perform computation on the internal binary representation of the
Comparison
Schemes
search-key
Bitmap Indices ◦ For example, for a string search-key, the binary representations of all the characters
Module Summary in the string could be added and the sum modulo the number of buckets could be
returned
Module 44
Dynamic Hashing
• Although the probability of bucket overflow can be reduced, it cannot be eliminated
Example ◦ it is handled by using overflow buckets
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44 • Overflow chaining – the overflow buckets of a given bucket are chained together in a
Partha Pratim
linked list
Das
• Above scheme is called closed hashing
Objectives & ◦ An alternative, called open hashing, which does not use overflow buckets, is not
Outline
Static Hashing
suitable for database applications
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • Hashing can be used not only for file organization, but also for index-structure creation
Das
• A hash index organizes the search keys, with their associated record pointers, into a
Objectives &
Outline hash file structure
Static Hashing
Hash Function
• Strictly speaking, hash indices are always secondary indices
Example
Bucket Overflow
◦ if the file itself is organized using hashing, a separate primary hash index on it using
Dynamic Hashing
the same search-key is unnecessary
Example ◦ However, we use the term hash index to refer to both secondary index structures
Comparison
Schemes
and hash organized files
Bitmap Indices
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • In static hashing, function h maps search-key values to a fixed set of B of bucket
Das
addresses. Databases grow or shrink with time
Objectives &
Outline ◦ If initial number of buckets is too small, and file grows, performance will degrade
Static Hashing due to too much overflows
Hash Function
Example
◦ If space is allocated for anticipated growth, a significant amount of space will be
Bucket Overflow wasted initially (and buckets will be underfull).
Dynamic Hashing ◦ If database shrinks, again space will be wasted
Example
Comparison • One solution: periodic re-organization of the file with a new hash function
Schemes
Bitmap Indices
◦ Expensive, disrupts normal operations
Module Summary • Better solution: allow the number of buckets to be modified dynamically
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Dynamic Hashing
Module 44
Partha Pratim • Good for database that grows and shrinks in size
Das
• Allows the hash function to be modified dynamically
Objectives &
Outline • Extendable hashing – one form of dynamic hashing
Static Hashing
Hash Function
◦ Hash function generates values over a large range — typically b-bit integers, with
Example
Bucket Overflow
b = 32
Dynamic Hashing
◦ At any time use only a prefix of the hash function to index into a table of bucket
Example addresses
Comparison
Schemes
◦ Let the length of the prefix be i bits, 0 ≤ i ≤ 32
Bitmap Indices . Bucket address table size = 2i . Initially i = 0
Module Summary . Value of i grows and shrinks as the size of the database grows and shrinks
◦ Multiple entries in the bucket address table may point to a bucket (why?)
◦ Thus, actual number of buckets is < 2i
. The number of buckets also changes dynamically due to coalescing and splitting
of buckets
Database Management Systems Partha Pratim Das 44.17
General Extendable Hash Structure PPD
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Dynamic Hashing
follow the pointer to appropriate bucket
Example
• To insert a record with search-key value Kj
Comparison
Schemes ◦ Follow same procedure as look-up and locate the bucket, say j
Bitmap Indices ◦ If there is room in the bucket j insert record in the bucket
Module Summary ◦ Else the bucket must be split and insertion re-attempted (next slide)
. Overflow buckets used instead in some cases (will see shortly)
Module 44
To split a bucket j when inserting record with search-key value Kj
Partha Pratim
Das
• If i > ij (more than one pointer to bucket j)
Objectives & ◦ Allocate a new bucket z, and set ij = iz = (ij + 1)
Outline
Static Hashing
◦ Update the second half of the bucket address table entries originally pointing to j,
Hash Function to point to z
Example
Bucket Overflow
◦ Remove each record in bucket j and reinsert (in j or z)
Dynamic Hashing ◦ Recompute new bucket for Kj and insert record in the bucket (further splitting is
Example
required if the bucket is still full)
Comparison
Schemes • If i = ij (only one pointer to bucket j)
Bitmap Indices ◦ If i reaches some limit b, or too many splits have happened in this insertion, create
Module Summary
an overflow bucket
◦ Else
. Increment i and double the size of the bucket address table
. Replace each entry in the table by two entries that point to the same bucket
. Recompute new bucket address table entry for Kj . Now i > ij so use the first
case
Database Management above
Systems Partha Pratim Das 44.20
Deletion in Extendable Hash Structure
Module 44
Comparison . Note: decreasing bucket address table size is an expensive operation and should
Schemes
be done only if number of buckets becomes much smaller than the size of the
Bitmap Indices
table
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
• Initial Hash structure; bucket size = 2
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
• Insert “Mozart”, “Srinivasan”, and “Wu” records
Module 44
Partha Pratim
• Hash structure after insertion of “Mozart”, “Srini-
Das vasan”, and “Wu” records
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
• Insert Einstein record
Module 44
• Hash structure after insertion of “Einstein” record
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
• Insert “Gold” and “El Said” records
Module 44
• Hash structure after insertion of “Gold” and “El
Partha Pratim
Das
Said” records
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim
• Hash structure after insertion of “Kim” record
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Comparison Schemes
Module 44
Dynamic Hashing
◦ Bucket address table may itself become very big (larger than memory)
Example . Cannot allocate very large contiguous areas on disk either
Comparison
Schemes
. Solution: B+ -tree structure to locate desired record in bucket address table
Bitmap Indices ◦ Changing size of bucket address table is an expensive operation
Module Summary
• Linear hashing is an alternative mechanism
◦ Allows incremental growth of its directory (equivalent to bucket address table)
◦ At the cost of more bucket overflows
Module 44
Comparison
• In practice:
Schemes
◦ PostgreSQL supports hash indices, but discourages use due to poor performance
Bitmap Indices
◦ Oracle supports static hash organization, but not hash indices
Module Summary
◦ SQLServer supports only B+ -trees
Module 44
Partha Pratim
Das
Objectives &
Outline
Static Hashing
Hash Function
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Bitmap Indices
Module 44
Partha Pratim • Bitmap indices are a special type of index designed for efficient querying on multiple
Das
keys
Objectives &
Outline • Records in a relation are assumed to be numbered sequentially from, say, 0
Static Hashing ◦ Given a number n it must be easy to retrieve record n
Hash Function
Example . Particularly easy if records are of fixed size
Bucket Overflow
Dynamic Hashing • Applicable on attributes that take on a relatively small number of distinct values
Example
◦ For example: gender, country, state, . . .
Comparison
Schemes ◦ For example: income-level (income broken up into a small number of levels such as
Bitmap Indices 0-9999, 10000-19999, 20000-50000, 50000- infinity)
Module Summary
• A bitmap is simply an array of bits
Module 44
Partha Pratim • In its simplest form a bitmap index on an attribute has a bitmap for each value of the
Das
attribute
Objectives &
Outline ◦ Bitmap has as many bits as records
Static Hashing ◦ In a bitmap for value v, the bit for a record is 1 if the record has the value v for the
Hash Function
Example
attribute, and is 0 otherwise
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Module 44
Partha Pratim • Bitmap indices are useful for queries on multiple attributes
Das
◦ not particularly useful for single attribute queries
Objectives &
Outline • Queries are answered using bitmap operations
Static Hashing
Hash Function
◦ Intersection (and)
Example ◦ Union (or)
Bucket Overflow
Dynamic Hashing
◦ Complementation (not)
Example
• Each operation takes two bitmaps of the same size and applies the operation on
Comparison
Schemes corresponding bits to get the result bitmap
Bitmap Indices ◦ For example: 100110 AND 110011 = 100010
Module Summary 100110 OR 110011 = 110111
NOT 100110 = 011001
◦ Males with income level L1: 10010 AND 10100 = 10000
. Can then retrieve required tuples
. Counting number of matching tuples is even faster
Database Management Systems Partha Pratim Das 44.36
Bitmap Indices (4)
Module 44
Partha Pratim • Bitmap indices generally very small compared with relation size
Das
◦ For example, if record is 100 bytes, space for a single bitmap is 1/800 of space used
Objectives &
Outline by relation
Static Hashing . If number of distinct attribute values is 8, bitmap is only 1% of relation size
Hash Function
Example • Deletion needs to be handled properly
Bucket Overflow
Dynamic Hashing
◦ Existence bitmap to note if there is a valid record at a record location
Example ◦ Needed for complementation
Comparison
Schemes . not(A=v ): (NOT bitmap-A-v) AND ExistenceBitmap
Bitmap Indices
• Should keep bitmaps for all values, even null value
Module Summary
◦ To correctly handle SQL null semantics for NOT(A=v ):
. intersect above result with (NOT bitmap-A-Null)
Module 44
Partha Pratim • Bitmaps are packed into words; a single word and (a basic CPU instruction) computes
Das
and of 32 or 64 bits at once
Objectives &
Outline ◦ For example, 1-million-bit maps can be and-ed with just 31,250 instruction
Static Hashing • Counting number of 1s can be done fast by a trick:
Hash Function
Example ◦ Use each byte to index into a precomputed array of 256 elements each storing the
Bucket Overflow
Dynamic Hashing
count of 1s in the binary representation
Example . Can use pairs of bytes to speed up further at a higher memory cost
Comparison
Schemes ◦ Add up the retrieved counts
Bitmap Indices
• Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+ -trees, for values that
Module Summary
have a large number of matching records
◦ Worthwhile if > 1/64 of the records have that value, assuming a tuple-id is 64 bits
◦ Above technique merges benefits of bitmap and B+ -tree indices
Module 44
Partha Pratim • Explored various hashing schemes – Static and Dynamic Hashing
Das
• Compared Ordered Indexing and Hashing
Objectives &
Outline • Studied the use of Bitmap Indices for fast access of columns with limited number of
Static Hashing
Hash Function
distinct values
Example
Bucket Overflow
Dynamic Hashing
Example
Comparison
Schemes
Bitmap Indices
Module Summary
Slides used in this presentation are borrowed from http://db-book.com/ with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems Partha Pratim Das 44.39
Module 45
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Index Definition
in SQL Module 45: Indexing and Hashing/5: Index Design
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules Partha Pratim Das
Rule 0
Rule 1
Rule 2 Department of Computer Science and Engineering
Rule 3 Indian Institute of Technology, Kharagpur
Rule 4
Rule 5
[email protected]
Rule 6
Module Summary
Module 45
Partha Pratim • Explored various hashing schemes – Static and Dynamic Hashing
Das
• Compared Ordered Indexing and Hashing
Objectives &
Outline • Studied the use of Bitmap Indices for fast access of columns with limited number of
Index Definition
in SQL
distinct values
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Index Definition
in SQL
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Index Definition
in SQL
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Partha Pratim
Das
Objectives &
Outline
Index Definition
in SQL
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Index Definition in SQL
Rule 6
Module Summary
Module 45
• Create an index
Partha Pratim
Das
create index <index-name> on <relation-name> (<attribute-list>)
For example: create index b-index on branch (branch name)
Objectives &
Outline
• Use create unique index to indirectly specify and enforce the condition that the search
Index Definition
in SQL key is a candidate key
Multiple-Key Access
Privileges
◦ Not really required if SQL unique integrity constraint is supported – it is preferred
Guidelines for
Indexing
• To drop an index
Ground Rules drop index <index-name>
Rule 0
Rule 1 • Most database systems allow specification of type of index, and clustering
Rule 2
Rule 3 ◦ You can also create an index for a cluster
Rule 4
Rule 5
◦ You can create a composite index on multiple columns up to a maximum of 32
Rule 6 columns
Module Summary
. A composite index key cannot exceed roughly one-half (minus some overhead)
of the available space in the data block
Module 45 • Create an index for a single column, to speed up queries that test that column:
Partha Pratim ◦ CREATE INDEX emp ename ON emp tab(ename);
Das
• Specify several storage settings explicitly for the index:
Objectives &
Outline ◦ CREATE INDEX emp ename ON emp tab(ename)
Index Definition
TABLESPACE users // Allocation of space in the Database to contain schema objects
in SQL STORAGE ( // Specify how Database should store a database object
Multiple-Key Access
INITIAL 20K // Specify the size of the 1st extent of the object
Privileges
NEXT 20K // Specify in bytes the size of the 2nd extent to be allocated to the object
Guidelines for
Indexing PCTINCREASE 75) // Specify the percent by which later extents grow over
Ground Rules PCTFREE 0 // 0% of each data block in this table’s data segment be free for updates
Rule 0 COMPUTE STATISTICS;
Rule 1
Rule 2
◦ Create index on two columns, to speed up queries that test either the first column or both columns:
Rule 3 . CREATE INDEX emp ename ON emp tab(ename, empno) COMPUTE STATISTICS;
Rule 4
Rule 5
◦ If a query is going to sort on the function UPPER(ENAME), an index on the ENAME column itself
Rule 6 would not speed up this operation, and it might be slow to call the function for each result row
Module Summary . A function-based index precomputes the result of the function for each column value, speeding
up queries that use the function for searching or sorting:
CREATE INDEX emp upper ename ON emp tab(UPPER(ename)) COMPUTE STATISTICS;
Source: Selecting an Index Strategy
Database Management Systems Partha Pratim Das 45.7
Index in SQL: Bitmap PPD
Module 45
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6 • SELECT * FROM Student WHERE Gender = ‘F’ AND Semester =4;
Module Summary ◦ AND 0 1 1 1 with 0 0 0 1 to get the result
Module 45
Guidelines for
• Possible strategies for processing query using indices on single attributes:
Indexing
Ground Rules
◦ Use index on dept name to find instructors with department name Finance; test
Rule 0 salary = 80000
Rule 1
Rule 2 ◦ Use index on salary to find instructors with a salary of 80000; test dept name =
Rule 3
Rule 4
“Finance”
Rule 5 ◦ Use dept name index to find pointers to all records pertaining to the “Finance”
Rule 6
department. Similarly use index on salary. Take intersection of both sets of pointers
Module Summary
obtained
Module 45
Partha Pratim • Composite Search Keys are search keys containing more than one attribute
Das
◦ For example, (dept name, salary )
Objectives &
Outline • Lexicographic ordering: (a1 , a2 ) < (b1 , b2 ) if either
Index Definition
in SQL ◦ a1 < b1 , or
Multiple-Key Access
Privileges
◦ a1 = b1 and a2 < b2
Guidelines for • Hence, the order is important
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Partha Pratim
Suppose we have an index on combined search-key:
Das (dept name, salary )
Objectives &
Outline • With the where clause
Index Definition
in SQL
where dept name = “Finance” and salary = 80000
Multiple-Key Access the index on (dept name, salary ) can be used to fetch only records that satisfy both
Privileges
conditions.
Guidelines for
Indexing ◦ Using separate indices in less efficient - we may fetch many records (or pointers)
Ground Rules
Rule 0 that satisfy only one of the conditions
Rule 1
Rule 2 ◦ Can also efficiently handle
Rule 3
Rule 4
where dept name = “Finance” and salary < 80000
Rule 5 ◦ But cannot efficiently handle
Rule 6
where dept name < “Finance” and balance = 80000
Module Summary
. May fetch many records that satisfy the first but not the second condition
Module 45
Partha Pratim • When using indexes in an application, you might need to request that the DBA grant
Das
privileges or make changes to initialization parameters
Objectives &
Outline • To create a new index
Index Definition
in SQL
◦ You must own, or have the INDEX object privilege for the corresponding table
Multiple-Key Access ◦ The schema that contains the index must also have a quota for the tablespace
Privileges
intended to contain the index, or the UNLIMITED TABLESPACE system privilege
Guidelines for
Indexing ◦ To create an index in another user’s schema, you must have the CREATE ANY
Ground Rules
Rule 0
INDEX system privilege
Rule 1
Rule 2 • Function-based indexes also require the QUERY REWRITE privilege, and that the
Rule 3
Rule 4
QUERY REWRITE ENABLED initialization parameter to be set to TRUE
Rule 5
Rule 6
Module Summary
Module 45
Partha Pratim
Das
Objectives &
Outline
Index Definition
in SQL
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Guidelines for Indexing
Rule 6
Module Summary
Module 45
• In Modules 16 to 20 (Week 4), we have studied various issues for a proper design of a
Partha Pratim
Das
relational database system. This focused on:
Objectives &
◦ Normalization of Tables leading to
Outline
. Reduction of Redundancy to minimize possibilities of Anomaly
Index Definition
in SQL . Easier adherence to constraints (various dependencies)
Multiple-Key Access
Privileges
. Efficiency of access and update – a better normalized design often gives better
Guidelines for
performance
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
• The performance of a database system, however, is also significantly impacted by the
Partha Pratim
Das
way the data is physically organized and managed. These are done through:
Objectives &
◦ Indexing and Hashing
Outline
• While normalization and design are startup time activities that are usually performed
Index Definition
in SQL once at the beginning (and rarely changed later), the performance behavior continues
Multiple-Key Access
Privileges
to evolve as the database is used over time. Hence we need to continually:
Guidelines for ◦ Collect Statistics about data (of various tables) to learn of the patterns, and
Indexing
Ground Rules ◦ Adjust the Indexes on the tables to optimize performance
Rule 0
Rule 1 • There is no sound theory that determines optimal performance. Rather, we take a quick
Rule 2
Rule 3
look into a few common guidelines that can help you keep your database agile in its
Rule 4 behavior
Rule 5
Rule 6
Module Summary
Module 45
Partha Pratim • Some guidelines - heuristic and common sense, but time-tested - are summarized here
Das
as a set of Ground Rules for Indexing
Objectives &
Outline ◦ Rule 0: Indexes lead to Access – Update Tradeoff
Index Definition ◦ Rule 1: Index the Correct Tables
in SQL
Multiple-Key Access
◦ Rule 2: Index the Correct Columns
Privileges
◦ Rule 3: Limit the Number of Indexes for Each Table
Guidelines for
Indexing ◦ Rule 4: Choose the Order of Columns in Composite Indexes
Ground Rules
◦ Rule 5: Gather Statistics to Make Index Usage More Accurate
Rule 0
Rule 1 ◦ Rule 6: Drop Indexes That Are No Longer Required
Rule 2
Rule 3 • These rules are explained in the following slides
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Guidelines for . Having unnecessary indexes can cause significant degradation of performance of
Indexing
Ground Rules
various operations
Rule 0 . Index files may also occupy significant space on your disk and / or
Rule 1
Rule 2 . Cause slow behavior due to memory limitations during index computations
Rule 3
Rule 4
◦ Use informed judgment to index!
Rule 5
Rule 6
Module Summary
Module 45
Guidelines for
− The faster the table scan, the lower the percentage
Indexing − More clustered the row data, the higher the percentage
Ground Rules
Rule 0 • Index columns used for joins to improve performance on joins of multiple tables
Rule 1
Rule 2
• Primary and unique keys automatically have indexes, but you might want to create an
Rule 3
Rule 4 index on a foreign key
Rule 5
Rule 6 • Small tables do not require indexes
Module Summary
◦ If a query is taking too long, then the table might have grown from small to large
Module Summary
. There are many nulls in the column and you do not search on the non-null values
. LONG and LONG RAW columns cannot be indexed
◦ The size of a single index entry cannot exceed roughly one-half (minus some
overhead) of the available space in the data block
Database Management Systems Partha Pratim Das 45.19
Guidelines for Indexing: Rule 3 PPD
Module 45
Partha Pratim • Rule 3: Limit the Number of Indexes for Each Table
Das
◦ The more indexes, the more overhead is incurred as the table is altered
Objectives &
Outline . When rows are inserted or deleted, all indexes on the table must be updated
Index Definition
in SQL
. When a column is updated, all indexes on the column must be updated
Multiple-Key Access ◦ You must weigh the performance benefit of indexes for queries against the
Privileges
Guidelines for
performance overhead of updates
Indexing
Ground Rules
. If a table is primarily read-only, you might use more indexes; but, if a table is
Rule 0 heavily updated, you might use fewer indexes
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module Summary
• Composite indexes speed up queries that use the leading portion of the index:
◦ So queries with WHERE clauses using only PART NO column also runs faster
◦ With only 5 distinct values, a separate index on VENDOR ID does not help
Database Management Systems Partha Pratim Das 45.21
Guidelines for Indexing: Rule 5 PPD
Module 45
Partha Pratim • Rule 5: Gather Statistics to Make Index Usage More Accurate
Das
◦ The database can use indexes more effectively when it has statistical information
Objectives &
Outline about the tables involved in the queries
Index Definition
in SQL
. Gather statistics when the indexes are created by including the keywords
Multiple-Key Access COMPUTE STATISTICS in the CREATE INDEX statement
Privileges
. As data is updated and the distribution of values changes, periodically refresh
Guidelines for
Indexing the statistics by calling procedures like (in Oracle):
Ground Rules
Rule 0 − DBMS STATS.GATHER TABLE STATISTICS and
Rule 1
Rule 2
− DBMS STATS.GATHER SCHEMA STATISTICS
Rule 3
Rule 4
Rule 5
Rule 6
Module Summary
Module 45
Module Summary
◦ If you drop a table, then all associated indexes are dropped
◦ To drop an index, the index must be contained in your schema or you must have the
DROP ANY INDEX system privilege
Database Management Systems Partha Pratim Das 45.23
Module Summary
Module 45
Index Definition
in SQL
Multiple-Key Access
Privileges
Guidelines for
Indexing
Ground Rules
Rule 0
Rule 1
Rule 2
Rule 3
Slides used in this presentation are borrowed from http://db-book.com/ with kind
Rule 4
Rule 5 permission of the authors.
Rule 6
Edited and new slides are marked with “PPD”.
Module Summary