CH 24
Database System Concepts - 7th Edition 24.2 ©Silberschatz, Korth and Sudarshan
Bloom Filters (Cont.)
Key idea of Bloom filter: reduce false positives by using multiple hash
functions h_i() for i = 1..k
• For each element s in set S, compute h_i(s) for each i and set bit h_i(s)
• To query an element v, compute h_i(v) for each i and check if bit h_i(v) is
set
If bit h_i(v) is set for every i, then report v as present in set
Else report v as absent
• With 10n bits (n = number of elements in S) and k = 7, the false positive
rate reduces to 1%, instead of 10% with k = 1
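The insert and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name BloomFilter, the method names, and the use of salted SHA-256 digests to obtain the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bloom filter with k hash functions over an m-bit array (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive h_1..h_k by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set bit h_i(item) for every i.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # Present only if every bit h_i(item) is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


n = 1000
bf = BloomFilter(num_bits=10 * n, num_hashes=7)
for s in range(n):
    bf.add(s)

# Elements never inserted: count how many are wrongly reported as present.
false_pos = sum(bf.might_contain(v) for v in range(n, 11 * n))
print(false_pos / (10 * n))  # should be in the neighbourhood of 0.01
```

Inserted elements are always reported as present (no false negatives); only absent elements can be misreported.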
Write Optimized Indices
Performance of B+-trees can be poor for write-intensive workloads
• One I/O per leaf, assuming all internal nodes are in memory
• With magnetic disks, < 100 inserts per second per disk
• With flash memory, one page overwrite per insert
Two approaches to reducing cost of writes
• Log-structured merge tree
• Buffer tree
Log Structured Merge (LSM) Tree
Consider only inserts/queries for now
Records inserted first into in-memory tree (L0 tree)
When in-memory tree is full, records moved to disk (L1 tree)
• B+-tree constructed using bottom-up build by merging existing L1 tree
with records from L0 tree
When L1 tree exceeds some threshold, merge into L2 tree
• And so on for more levels
• Size threshold for L_{i+1} tree is k times size threshold for L_i tree
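The level structure described above can be sketched as follows. This is a simplified model under stated assumptions: the in-memory L0 tree is a dict, each on-disk level is a sorted Python list standing in for a bottom-up-built B+-tree, and the names LSMTree, l0_capacity and k are chosen for the example.

```python
import bisect


class LSMTree:
    """Toy LSM tree: L0 in memory, L1, L2, ... modelled as sorted lists."""

    def __init__(self, l0_capacity=4, k=4):
        self.l0_capacity = l0_capacity
        self.k = k
        self.l0 = {}        # in-memory tree (L0)
        self.levels = []    # levels[i] holds L(i+1) as sorted (key, value) pairs

    def _threshold(self, i):
        # Each level's size threshold is k times that of the level above it.
        return self.l0_capacity * self.k ** (i + 1)

    def insert(self, key, value):
        self.l0[key] = value
        if len(self.l0) >= self.l0_capacity:      # L0 full: move records to L1
            self._merge_into(sorted(self.l0.items()), 0)
            self.l0 = {}

    def _merge_into(self, run, i):
        if i == len(self.levels):
            self.levels.append([])
        merged = dict(self.levels[i])
        merged.update(run)                        # newer entries win
        self.levels[i] = sorted(merged.items())   # stands in for a bottom-up build
        if len(self.levels[i]) > self._threshold(i):
            overflow, self.levels[i] = self.levels[i], []
            self._merge_into(overflow, i + 1)     # and so on for more levels

    def get(self, key):
        # Queries search every tree, newest first.
        if key in self.l0:
            return self.l0[key]
        for level in self.levels:
            pos = bisect.bisect_left(level, (key,))
            if pos < len(level) and level[pos][0] == key:
                return level[pos][1]
        return None


tree = LSMTree(l0_capacity=2, k=2)
for i in range(20):
    tree.insert(i, i * 10)
tree.insert(3, "updated")
print(tree.get(3), tree.get(7), len(tree.levels))
```

Note that every lookup may have to probe L0 and each disk level in turn, which is the query overhead the approach trades for cheaper writes.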
LSM Tree (Cont.)
Benefits of LSM approach
• Inserts are done using only sequential I/O operations
• Leaves are full, avoiding space wastage
• Reduced number of I/O operations per record inserted as compared to
normal B+-tree (up to some size)
If each leaf has m entries, m/k entries merged in using 1 I/O
Total I/O operations per record: (k/m) log_k(I/M), where I = total number of
entries and M = size of L0 tree
Drawback of LSM approach
• Queries have to search multiple trees
• Entire content of each level copied multiple times
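To get a feel for the (k/m) log_k(I/M) estimate above, the following snippet evaluates it for one set of values. The numbers (m = 100 entries per leaf, k = 10, one billion entries, one million entries in L0) are purely hypothetical, and the function name is an assumption for the example.

```python
import math


def lsm_ios_per_record(m, k, total_entries, l0_entries):
    """Estimated I/O operations per inserted record: (k/m) * log_k(I/M)."""
    levels = math.log(total_entries / l0_entries, k)   # log_k(I/M)
    return (k / m) * levels


cost = lsm_ios_per_record(m=100, k=10, total_entries=10**9, l0_entries=10**6)
print(round(cost, 3))
```

With these values log_k(I/M) = 3, so the estimate is about 0.3 I/O operations per inserted record, compared with roughly one I/O per insert for a B+-tree whose internal nodes are in memory.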
Optimizations of LSM
Rolling merge
LSM/Stepped Merge often implemented on a partitioned relation
• Each partition size set to some max, split if over-sized
• Spread partitions over multiple machines
Stepped Merge Index
LSM Trees (Cont.)
Deletion handled by adding special “delete” entries
• Lookups will find both the original entry and the delete entry, and must
return only those entries that do not have a matching delete entry
• When trees are merged, if we find a delete entry matching an original
entry, both are dropped
Update handled using insert + delete
LSM trees were introduced for disk-based indices
• But useful to minimize erases with flash-based indices
• The stepped-merge variant of LSM trees is used in many BigData
storage systems
Google BigTable, Apache Cassandra, MongoDB
And more recently in SQLite4, LevelDB, and MyRocks storage
engine of MySQL
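The handling of "delete" entries described on this slide can be sketched as below. It is a deliberately small model under loud assumptions: each tree is a dict, a sentinel object named TOMBSTONE plays the role of the delete entry, and each key is inserted at most once and deleted at most once (no in-place updates), so a delete entry can safely cancel the single original entry it meets during a merge.

```python
TOMBSTONE = object()   # hypothetical marker standing in for a "delete" entry


def lookup(key, trees):
    """trees is ordered newest first; each maps key -> value or TOMBSTONE."""
    for tree in trees:
        if key in tree:
            value = tree[key]
            # A delete entry in a newer tree hides the original in older trees.
            return None if value is TOMBSTONE else value
    return None


def merge(newer, older):
    """Merge a newer tree into an older one, cancelling matched delete entries."""
    merged = dict(older)
    for key, value in newer.items():
        if value is TOMBSTONE and merged.get(key, TOMBSTONE) is not TOMBSTONE:
            del merged[key]        # delete entry and original entry are both dropped
        else:
            merged[key] = value    # plain insert, or a delete entry that must
                                   # keep propagating to still-older levels
    return merged


older = {"a": 1, "b": 2}
newer = {"b": TOMBSTONE, "c": 3, "z": TOMBSTONE}
merged = merge(newer, older)
print(sorted(k for k, v in merged.items() if v is not TOMBSTONE))
```

The delete entry for "z" survives the merge because its original entry, if any, may still live in a deeper level.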
Buffer Tree
Alternative to LSM tree
Key idea: each internal node of B+-tree has a buffer to store inserts
• Inserts are moved to lower levels when buffer is full
• With a large buffer, many records are moved to lower level each time
• Per record I/O decreases correspondingly
Benefits
• Less overhead on queries
• Can be used with any tree index structure
• Used in PostgreSQL Generalized Search Tree (GiST) indices
Drawback: more random I/O than LSM tree
Bitmap Indices
Bitmap indices are a special type of index designed for efficient querying on
multiple keys
Records in a relation are assumed to be numbered sequentially from, say, 0
Given a number n it must be easy to retrieve record n
Particularly easy if records are of fixed size
Applicable on attributes that take on a relatively small number of distinct
values
E.g., gender, country, state, …
E.g., income-level (income broken up into a small number of levels
such as 0-9999, 10000-19999, 20000-50000, 50000-infinity)
A bitmap is simply an array of bits
Bitmap Indices (Cont.)
In its simplest form a bitmap index on an attribute has a bitmap for each
value of the attribute
Bitmap has as many bits as records
In a bitmap for value v, the bit for a record is 1 if the record has the
value v for the attribute, and is 0 otherwise
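A small sketch of this simplest form of bitmap index follows. The relation contents, attribute names and helper names are made up for the example, and each bitmap is stored as a Python integer whose bit i corresponds to record number i.

```python
records = [   # hypothetical relation; record numbers are the list positions
    {"name": "John",  "gender": "m", "income_level": "L1"},
    {"name": "Diana", "gender": "f", "income_level": "L2"},
    {"name": "Mary",  "gender": "f", "income_level": "L1"},
    {"name": "Peter", "gender": "m", "income_level": "L4"},
    {"name": "Kathy", "gender": "f", "income_level": "L3"},
]


def build_bitmap_index(rows, attr):
    """One bitmap per distinct value of attr; bit i is 1 iff row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[attr]] = index.get(row[attr], 0) | (1 << i)
    return index


def record_numbers(bitmap):
    """Record numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


gender_idx = build_bitmap_index(records, "gender")
income_idx = build_bitmap_index(records, "income_level")

# Query on multiple keys: gender = 'f' AND income_level = 'L1'
hits = gender_idx["f"] & income_idx["L1"]
print([records[i]["name"] for i in record_numbers(hits)])  # ['Mary']
```

The conjunction is answered by a single bitwise AND of two bitmaps, after which only the matching record numbers need to be fetched.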
Efficient Implementation of Bitmap Operations
Bitmaps are packed into words; a single word AND (a basic CPU
instruction) computes the AND of 32 or 64 bits at once
E.g., 1-million-bit maps can be ANDed with just 31,250 instructions
(using 32-bit words)
Counting number of 1s can be done fast by a trick:
Use each byte to index into a precomputed array of 256 elements each
storing the count of 1s in the binary representation
Can use pairs of bytes to speed up further at a higher memory cost
Add up the retrieved counts
Bitmaps can be used instead of Tuple-ID lists at leaf levels of
B+-trees, for values that have a large number of matching records
Worthwhile if > 1/64 of the records have that value, assuming a tuple-
id is 64 bits
Above technique merges benefits of bitmap and B+-tree indices
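The byte-indexed counting trick and the word-at-a-time AND described on this slide can be sketched as follows. The names ONES, count_ones and bitmap_and are assumptions for the example, and Python's arbitrary-precision integers stand in for the machine-word loop a real implementation would use.

```python
# Precomputed array of 256 elements: ONES[b] = number of 1s in byte value b.
ONES = [bin(b).count("1") for b in range(256)]


def count_ones(bitmap: bytes) -> int:
    """Use each byte to index the table, then add up the retrieved counts."""
    return sum(ONES[b] for b in bitmap)


def bitmap_and(a: bytes, b: bytes) -> bytes:
    """AND two equal-length bitmaps many bits at a time."""
    assert len(a) == len(b)
    word = int.from_bytes(a, "little") & int.from_bytes(b, "little")
    return word.to_bytes(len(a), "little")


a = bytes([0b10110000, 0b11111111])
b = bytes([0b11100000, 0b00001111])
both = bitmap_and(a, b)
print(count_ones(a), count_ones(both))  # 11 6

# Word count for a 1-million-bit bitmap with 32-bit words.
print(1_000_000 // 32)  # 31250
```

Indexing by pairs of bytes would use a 65,536-entry table instead, halving the number of lookups at a higher memory cost.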
Spatial and Temporal Indices
Spatial Data
Databases can store data types such as lines and polygons, in addition to
raster images
• Allows relational databases to store and retrieve spatial information
• Queries can use spatial conditions (e.g., contains or overlaps)
• Queries can mix spatial and nonspatial conditions
Nearest neighbor queries: given a point or an object, find the nearest
object that satisfies given conditions.
Range queries deal with spatial regions, e.g., ask for objects that lie
partially or fully inside a specified region.
Queries that compute intersections or unions of regions.
Spatial join of two spatial relations, with the location playing the role of join
attribute.
Indexing of Spatial Data
Division of Space by Quadtrees
Quadtrees
Each node of a quadtree is associated with a rectangular region of space;
the top node is associated with the entire target space.
Each non-leaf node divides its region into four equal-sized quadrants
• Correspondingly, each such node has four child nodes corresponding
to the four quadrants, and so on
Leaf nodes have between zero and some fixed maximum number of
points (set to 1 in example).
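The subdivision rule above can be sketched as a small point quadtree. This is an illustrative sketch only: the class name PRQuadtree, the half-open region convention, and the assumption that all inserted points are distinct (duplicates would split forever) are choices made for the example.

```python
class PRQuadtree:
    """Quadtree node covering the half-open region [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, max_points=1):
        self.bounds = (x0, y0, x1, y1)
        self.max_points = max_points
        self.points = []        # used only while this node is a leaf
        self.children = None    # four child nodes once the node is split

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False                          # point is outside this region
        if self.children is not None:
            return any(c.insert(p) for c in self.children)
        self.points.append(p)
        if len(self.points) > self.max_points:    # leaf overflow: subdivide
            self._split()
        return True

    def _split(self):
        # Divide the region into four equal-sized quadrants.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [
            PRQuadtree(x0, y0, mx, my, self.max_points),
            PRQuadtree(mx, y0, x1, my, self.max_points),
            PRQuadtree(x0, my, mx, y1, self.max_points),
            PRQuadtree(mx, my, x1, y1, self.max_points),
        ]
        points, self.points = self.points, []
        for q in points:                          # assumes distinct points
            any(c.insert(q) for c in self.children)

    def leaves(self):
        if self.children is None:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()


root = PRQuadtree(0, 0, 8, 8, max_points=1)
for point in [(1, 1), (6, 6), (1, 2)]:
    root.insert(point)
print(len(list(root.leaves())))  # 7
```

The first split separates (1, 1) from (6, 6); inserting (1, 2) then forces a second split of the lower-left quadrant, giving seven leaves, each holding at most one point.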
Quadtrees (Cont.)
PR quadtree: stores points; space is divided based on regions, rather than
on the actual set of points stored.
Region quadtrees store array (raster) information.
• A node is a leaf node if all the array values in the region that it covers
are the same. Otherwise, it is subdivided further into four children of
equal area, and is therefore an internal node.
• Each node corresponds to a sub-array of values.
• The sub-arrays corresponding to leaves either contain just a single
array element, or have multiple array elements, all of which have the
same value.
Extensions of k-d trees and PR quadtrees have been proposed to index
line segments and polygons
• Require splitting segments/polygons into pieces at partitioning
boundaries
Same segment/polygon may be represented at several leaf nodes
R-Trees
R-trees are an N-dimensional extension of B+-trees, useful for indexing sets
of rectangles and other polygons.
Supported in many modern database systems, along with variants like
R+-trees and R*-trees.
Basic idea: generalize the notion of a one-dimensional interval associated
with each B+-tree node to an N-dimensional interval, that is, an
N-dimensional rectangle.
Will consider only the two-dimensional case (N = 2)
• generalization for N > 2 is straightforward, although R-trees work well
only for relatively small N
A polygon is stored only in one node, and the bounding box of the node
must contain the polygon
• The storage efficiency of R-trees is better than that of k-d trees or
quadtrees since a polygon is stored only once
Example R-Tree
The bounding box of a node is a minimum sized rectangle that contains
all the rectangles/polygons associated with the node
• Bounding boxes of children of a node are allowed to overlap
[Figure: rectangles being indexed, and the corresponding R-tree]
Search in R-Trees
Search in R-Trees (Cont.)
Can be very inefficient in worst case since multiple paths may need to be
searched
• but works acceptably in practice.
Simple extensions of search procedure to handle predicates contained-in
and contains
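A sketch of an overlap search shows why multiple paths may need to be followed: the search descends into every child whose bounding box intersects the query rectangle. The classes Leaf and Internal, the box layout (xmin, ymin, xmax, ymax) and the hand-built two-leaf tree are assumptions for the example, not the slides' own procedure.

```python
def intersects(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); touching boxes count as overlapping."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bounding_box(boxes):
    boxes = list(boxes)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


class Leaf:
    def __init__(self, entries):
        self.entries = entries                     # list of (box, object_id)
        self.box = bounding_box(b for b, _ in entries)


class Internal:
    def __init__(self, children):
        self.children = children
        self.box = bounding_box(c.box for c in children)


def search(node, query):
    """Return ids of all stored objects whose box overlaps the query box."""
    if isinstance(node, Leaf):
        return [oid for box, oid in node.entries if intersects(box, query)]
    found = []
    for child in node.children:
        if intersects(child.box, query):           # several children may qualify
            found.extend(search(child, query))
    return found


left = Leaf([((0, 0, 2, 2), "A"), ((1, 1, 4, 3), "B")])
right = Leaf([((3, 2, 6, 5), "C"), ((7, 7, 9, 9), "D")])
root_node = Internal([left, right])

print(search(root_node, (3.5, 2.5, 5, 4)))   # ['B', 'C']
print(search(root_node, (8, 8, 8.5, 8.5)))   # ['D']
```

In the first query the bounding boxes of both leaves overlap the query, so both subtrees are visited; in the second only one path is followed.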
Insertion in R-Trees
To insert a data item:
Find a leaf to store it, and add it to the leaf
To find leaf, follow a child (if any) whose bounding box contains
bounding box of data item, else child whose overlap with data item
bounding box is maximum
Handle overflows by splits (as in B+-trees)
Split procedure is different though (see below)
Adjust bounding boxes starting from the leaf upwards
Split procedure:
Goal: divide entries of an overfull node into two sets such that the
bounding boxes have minimum total area
This is a heuristic. Alternatives like minimum overlap are possible
Finding the “best” split is expensive, use heuristics instead
See next slide
Splitting an R-Tree Node
Quadratic split divides the entries in a node into two new nodes as
follows
1. Find pair of entries with “maximum separation”
that is, the pair such that the bounding box of the two would have
the maximum wasted space (area of bounding box – sum of
areas of two entries)
2. Place these entries in two new nodes
3. Repeatedly find the entry with “maximum preference” for one of the
two new nodes, and assign the entry to that node
Preference of an entry to a node is the increase in area of
bounding box if the entry is added to the other node
4. Stop when half the entries have been added to one node
Then assign remaining entries to the other node
Cheaper linear split heuristic works in time linear in number of entries,
but generates slightly worse splits.
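The four steps of the quadratic split can be sketched as follows, taking the slide's definition of preference literally. The function names, the box layout (xmin, ymin, xmax, ymax), the rounding used for "half the entries", and the first-found tie-breaking are assumptions for the example.

```python
from itertools import combinations


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(group_box, b):
    """Increase in area of group_box if b is added to it."""
    return area(union(group_box, b)) - area(group_box)


def quadratic_split(boxes):
    """Split entry boxes into two groups of indices."""
    # Step 1: pair with maximum wasted space.
    s0, s1 = max(
        combinations(range(len(boxes)), 2),
        key=lambda p: area(union(boxes[p[0]], boxes[p[1]]))
        - area(boxes[p[0]]) - area(boxes[p[1]]),
    )
    # Step 2: place the pair in two new nodes.
    groups = [[s0], [s1]]
    bbs = [boxes[s0], boxes[s1]]
    remaining = [i for i in range(len(boxes)) if i not in (s0, s1)]
    half = (len(boxes) + 1) // 2
    # Step 3: preference of entry i for node g is the enlargement of the
    # OTHER node's bounding box if i were added there.
    while remaining and len(groups[0]) < half and len(groups[1]) < half:
        i, g = max(
            ((i, g) for i in remaining for g in (0, 1)),
            key=lambda c: enlargement(bbs[1 - c[1]], boxes[c[0]]),
        )
        groups[g].append(i)
        bbs[g] = union(bbs[g], boxes[i])
        remaining.remove(i)
    # Step 4: once one node holds half the entries, the rest go to the other.
    full = 0 if len(groups[0]) >= half else 1
    groups[1 - full].extend(remaining)
    return groups


boxes = [(0, 0, 1, 1), (1, 1, 2, 2), (0, 1, 1, 2),
         (10, 10, 11, 11), (11, 11, 12, 12), (10, 11, 11, 12)]
split = quadratic_split(boxes)
print(sorted(sorted(g) for g in split))  # [[0, 1, 2], [3, 4, 5]]
```

Step 1 examines every pair of entries, which is where the quadratic cost comes from. On this input the two spatial clusters end up in separate nodes.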
Deleting in R-Trees
Deletion of an entry in an R-tree is done much like a B+-tree deletion.
In case of underfull node, borrow entries from a sibling if possible, else
merge sibling nodes
Alternative approach removes all entries from the underfull node,
deletes the node, then reinserts all entries
Indexing Temporal Data
Temporal data refers to data that has an associated time period (interval)
Time interval has a start and end time
• End time set to infinity (or large date such as 9999-12-31) if a tuple is
currently valid and its validity end time is not currently known
Query may ask for all tuples that are valid at a point in time or during a time
interval
• Index on valid time period speeds up this task
Indexing Temporal Data (Cont.)
To create a temporal index on attribute a:
• Use spatial index, such as R-tree, with attribute a as one dimension,
and time as another dimension
Valid time forms an interval in the time dimension
• Tuples that are currently valid cause problems, since their end time is
infinite or very large
Solution: store all current tuples (with end time as infinity) in a
separate index, indexed on (a, start-time)
• To find tuples valid at a point in time t in the current tuple
index, search for tuples in the range (a, 0) to (a,t)
Temporal index on primary key can help enforce temporal primary key
constraint
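The separate index for current tuples, keyed on (a, start-time), can be sketched with a sorted list and binary search. The class name CurrentTupleIndex, the sample data and the assumption that times are non-negative numbers are all made up for the example.

```python
import bisect


class CurrentTupleIndex:
    """Index over currently valid tuples (end time = infinity), keyed on (a, start)."""

    def __init__(self, tuples):
        # tuples: iterable of (a, start_time, payload)
        self.entries = sorted(tuples, key=lambda e: (e[0], e[1]))
        self.keys = [(a, start) for a, start, _ in self.entries]

    def valid_at(self, a, t):
        # Current tuples valid at time t are exactly those that started by t:
        # a range search from (a, 0) to (a, t).
        lo = bisect.bisect_left(self.keys, (a, 0))
        hi = bisect.bisect_right(self.keys, (a, t))
        return [payload for _, _, payload in self.entries[lo:hi]]


idx = CurrentTupleIndex([
    ("x", 5, "r1"),
    ("x", 20, "r2"),
    ("y", 3, "r3"),
])
print(idx.valid_at("x", 10))  # ['r1']
print(idx.valid_at("x", 25))  # ['r1', 'r2']
```

Because every tuple in this index has an open-ended validity period, no end-time comparison is needed; tuples with a known end time would go in the spatial (R-tree) index instead.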
Hashing
Static Hashing
Hash Functions
Worst hash function maps all search-key values to the same bucket; this
makes access time proportional to the number of search-key values in the
file.
An ideal hash function is uniform, i.e., each bucket is assigned the same
number of search-key values from the set of all possible values.
Ideal hash function is random, so each bucket will have the same number
of records assigned to it irrespective of the actual distribution of search-key
values in the file.
Typical hash functions perform computation on the internal binary
representation of the search-key.
For example, for a string search-key, the binary representations of all
the characters in the string could be added and the sum modulo the
number of buckets could be returned.
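The string example above (add the binary representations of the characters, then take the sum modulo the number of buckets) can be written directly. The function name h and the use of UTF-8 bytes as the "binary representation" are assumptions for the example.

```python
def h(key: str, num_buckets: int) -> int:
    """Sum of the key's byte values, modulo the number of buckets."""
    return sum(key.encode("utf-8")) % num_buckets


print(h("abc", 8))   # (97 + 98 + 99) % 8 = 6
print(h("cab", 8))   # same characters, same sum, same bucket: 6
```

The second call shows a weakness of this simple function: keys that are permutations of one another always land in the same bucket, so it is far from the uniform, random ideal described above.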
Example of Hash File Organization
Handling of Bucket Overflows
Handling of Bucket Overflows (Cont.)
Overflow chaining – the overflow buckets of a given bucket are chained
together in a linked list.
Above scheme is called closed addressing (also called closed hashing
or open hashing depending on the book you use)
An alternative, called open addressing (also called open hashing or
closed hashing depending on the book you use), which does not use
overflow buckets, is not suitable for database applications.
Deficiencies of Static Hashing
Dynamic Hashing
Periodic rehashing
• If number of entries in a hash table becomes (say) 1.5 times size of
hash table,
create new hash table of size (say) 2 times the size of the previous
hash table
Rehash all entries to new table
Linear Hashing
• Do rehashing in an incremental manner
Extendable Hashing
• Tailored to disk based hashing, with buckets shared by multiple hash
values
• Doubling of # of entries in hash table, without doubling # of buckets
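Of the three schemes listed, periodic rehashing is the simplest to sketch. The example below uses the slide's sample figures (rehash when the number of entries exceeds 1.5 times the table size, into a table twice as large); the class name RehashingTable, the overflow-chaining buckets and the use of Python's built-in hash() are assumptions.

```python
class RehashingTable:
    """In-memory hash table with chaining that rehashes everything on growth."""

    def __init__(self, size=4):
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)          # overwrite existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > 1.5 * len(self.buckets):  # threshold from the slide
            self._rehash(2 * len(self.buckets))

    def _rehash(self, new_size):
        # Create a new table and rehash all entries into it in one go.
        old = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for bucket in old:
            for k, v in bucket:
                self.buckets[hash(k) % new_size].append((k, v))

    def get(self, key, default=None):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return default


table = RehashingTable(size=4)
for i in range(100):
    table.insert(i, i * i)
print(len(table.buckets), table.get(12))  # 128 144
```

Each rehash touches every entry at once, which is the cost that linear hashing spreads out incrementally and that extendable hashing avoids by doubling only the bucket address table.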
Extendable Hashing
General Extendable Hash Structure
Use of Extendable Hash Structure
Insertion in Extendable Hash Structure (Cont.)
Deletion in Extendable Hash Structure
Example (Cont.)
Extendable Hashing vs. Other Schemes
Benefits of extendable hashing:
Hash performance does not degrade with growth of file
Minimal space overhead
Disadvantages of extendable hashing
Extra level of indirection to find desired record
Bucket address table may itself become very big (larger than memory)
Cannot allocate very large contiguous areas on disk either
Solution: B+-tree structure to locate desired record in bucket
address table
Changing size of bucket address table is an expensive operation
Linear hashing is an alternative mechanism
Allows incremental growth of its directory (equivalent to bucket
address table)
At the cost of more bucket overflows
Comparison of Ordered Indexing and Hashing
End of Chapter 24
Partitioned Hashing
Hash values are split into segments that depend on each attribute of the
search-key.
(A1, A2, . . . , An) for n attribute search-key
Example: n = 2, for customer, search-key being
(customer-street, customer-city)
search-key value       hash value
(Main, Harrison)       101 111
(Main, Brooklyn)       101 001
(Park, Palo Alto)      010 010
(Spring, Brooklyn)     001 001
(Alma, Palo Alto)      110 010
To answer equality query on single attribute, need to look up multiple
buckets. Similar in effect to grid files.
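The example above can be sketched by concatenating one 3-bit segment per attribute. The per-attribute hash values are copied from the table on this slide; the dictionary and function names are assumptions, and a real system would compute the segments with hash functions rather than look them up.

```python
BITS = 3  # bits per attribute segment, as in the table above

# Segment values taken from the slide's example table.
STREET_HASH = {"Main": 0b101, "Park": 0b010, "Spring": 0b001, "Alma": 0b110}
CITY_HASH = {"Harrison": 0b111, "Brooklyn": 0b001, "Palo Alto": 0b010}


def bucket(street, city):
    """Concatenate the street segment and the city segment."""
    return (STREET_HASH[street] << BITS) | CITY_HASH[city]


def buckets_for_city(city):
    """Equality on city alone: every possible street segment must be tried."""
    return [(s << BITS) | CITY_HASH[city] for s in range(1 << BITS)]


print(format(bucket("Main", "Harrison"), "06b"))       # 101111
print(len(buckets_for_city("Brooklyn")))               # 8
```

A query that fixes both attributes touches one bucket, while a query on customer-city alone must look up all 2^3 = 8 buckets that share the city segment.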