Final Review
COMPSCI 445

Exam Time:
5/13/2024, Monday 3:30-5:30
AgEng 119
Topics from the before time
Relational Algebra
know the operators and what they mean
selection σ - filter rows by a predicate
projection π - drop columns from all rows
join ⋈ - combine two relations by a predicate
Storage basics
what is a page?
what is the buffer manager?
what makes a dirty page dirty?
difference between heap files and indexes
Some direct questions on these topics, but they will also
come up in questions about indexing, evaluation, and optimization
Topics from the before time
SQL Topics:
GROUP BY + Aggregates
ORDER BY
JOINs
Don’t worry about:
XML
XPath
JSON
DB Security
Indexing
Indexing
An index on a table speeds up selections on the
search key fields for the index.
Any subset of fields of a relation can be the search key
Search key is not the same as key
• key - minimal set of fields that uniquely identify a
record in a relation
An index contains a collection of data entries, and
supports efficient retrieval of all data entries k*
with a given search key value k.
Data entry = data record or pointer to data record
• data record for clustered indices
• pointer for unclustered indices
B+ Tree Indexes
Structure: non-leaf pages above, leaf pages (sorted by search key) at the bottom.
Leaf pages contain data entries, and are chained (prev & next)
Non-leaf pages have index entries; only used to direct searches:
index entry: <P0, K1, P1, K2, P2, ..., Km, Pm>
Example B+ Tree
Search begins at root, and key comparisons
direct it to a leaf.
Search for 5*, 15*, all data entries >= 24* ...
Root: 13 | 17 | 24 | 30
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
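The root-to-leaf search can be sketched with a minimal in-memory structure; the Node class and list layout here are illustrative, not the course's implementation:

```python
# Minimal B+ tree search sketch: internal nodes hold separator keys and
# children; leaves hold data entries sorted by search key.
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, entries=None):
        self.keys = keys            # separator keys (internal node)
        self.children = children    # list of child Nodes, or None for a leaf
        self.entries = entries      # data entries k*, or None for an internal node

def search(node, key):
    """Descend from the root; key comparisons direct us to a leaf."""
    while node.children is not None:
        # keys >= a separator go to the child on its right
        node = node.children[bisect_right(node.keys, key)]
    return [e for e in node.entries if e == key]

# The example tree: root keys 13, 17, 24, 30 over five leaves.
root = Node(
    keys=[13, 17, 24, 30],
    children=[
        Node([], entries=[2, 3, 5, 7]),
        Node([], entries=[14, 16]),
        Node([], entries=[19, 20, 22]),
        Node([], entries=[24, 27, 29]),
        Node([], entries=[33, 34, 38, 39]),
    ],
)
print(search(root, 5))    # [5]
print(search(root, 15))   # [] -- 15* is not in the tree
```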
Hash-based indexes

Hash-Based Indexes
Good for equality selections.
The index is a collection of buckets; a hash function h maps a search key value k to its bucket.
Example
Directory is an array of size 4, global depth D = 2; each bucket has a local depth L, with L ≤ D.
Bucket A (L = 2): 4* 12* 32* 16*
Insert h(r)=20 (Causes Doubling)
Before: global depth 2, directory slots 00, 01, 10, 11.
Bucket A (L=2): 4* 12* 32* 16*; Bucket B (L=2): 1* 5* 21* 13*; Bucket C (L=2): 10*; Bucket D (L=2): 15* 7* 19*
20 hashes to Bucket A, which is full and has L = D, so the directory doubles.
After: global depth 3, directory slots 000-111.
Bucket A (L=3): 32* 16*; Bucket A2 (L=3, the `split image' of Bucket A): 4* 12* 20*; Buckets B, C, D unchanged (L=2).
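The directory lookup can be sketched as follows. The Bucket class, the identity hash function, and the slot layout are assumptions for illustration, mirroring the example's bucket contents:

```python
# Extendible hashing lookup sketch: the directory has 2**global_depth slots;
# a key's slot comes from the low-order global_depth bits of its hash.
class Bucket:
    def __init__(self, local_depth, keys):
        self.local_depth = local_depth
        self.keys = keys

def lookup(directory, global_depth, key, h=lambda k: k):
    slot = h(key) & ((1 << global_depth) - 1)   # low-order bits index the directory
    return key in directory[slot].keys

# State after inserting 20: global depth 3, Bucket A split into A and A2.
A  = Bucket(3, [32, 16])
A2 = Bucket(3, [4, 12, 20])
B  = Bucket(2, [1, 5, 21, 13])
C  = Bucket(2, [10])
D  = Bucket(2, [15, 7, 19])
# Slots 000..111; buckets with local depth 2 are pointed to by two slots.
directory = [A, B, C, D, A2, B, C, D]
print(lookup(directory, 3, 20))   # True  (slot 100 -> Bucket A2)
print(lookup(directory, 3, 9))    # False
```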
Clustered vs Unclustered
Clustered Indices
an index is clustered if it stores the actual record data in the data entries
Only one clustered index per table
Index scans are also sequential scans w.r.t. record data
Unclustered Indices
data entries are pointers to records; the records themselves are not stored in search-key order

Composite Search Keys
A search key can be a combination of fields.
Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
• age=20 and sal=75
Range query: Some field value is not a constant. E.g.:
• age=20; or age=20 and sal > 10
Data entries in an index are sorted by search key to support range queries.
Example: data records bob (age 12, sal 10), cal (11, 80), joe (12, 20), sue (13, 75), sorted by name.
Data entries in <age,sal>: 11,80 12,10 12,20 13,75. In <sal,age>: 10,12 20,12 75,13 80,11. In <age>: 11 12 12 13. In <sal>: 10 20 75 80.
Index-Only Plans
A query has an index-only plan if it can be answered from an index's data entries alone, without fetching data records.

Tree index <E.dno,E.sal>
Query is asking for the minimum salary for each dno.
The index is already grouping sal by dno

Another example:
SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND
      E.sal BETWEEN 3000 AND 5000
Query uses = and BETWEEN
Query Evaluation: Sorting

Query Evaluation
Query Evaluation Plan: tree of relational algebra operators
Example plan: π_sname ( Reserves ⋈_{sid=sid} Sailors )
External Sorting
Sorting a set that’s too big to fit in memory
Idea: divide and conquer
Sort file in separate bits (pages)
Combine sorted bits into larger sorted runs
Eventually whole relation is sorted
Multiple passes
Pass 0: for each page in the file
• read it into memory
• sort it in memory
• write it back out
Pass 1, 2, …
• Combine sorted runs from pass N-1 into a larger sorted run
2-Way Sort: Requires 3 Buffers
Combine sorted pages w/ merging
Resulting run is 2x the size of each input
Once all pass N runs are merged
Start on pass N + 1
Buffer layout: two INPUT buffers feed one OUTPUT buffer during the merge.
Two-Way External Merge Sort
Idea: divide and conquer: sort subfiles and merge
Each pass we read + write each page in the file: 2N I/Os per pass.
N pages in the file => the number of passes = ⌈log2 N⌉ + 1
So total cost is: 2N (⌈log2 N⌉ + 1)
Example (7-page input file, 2 tuples per page):
Input file:           3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
Pass 0 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
Pass 1 (2-page runs): 2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
Pass 2 (4-page runs): 2,3,4,4,6,7,8,9 | 1,2,3,5,6
Pass 3 (8-page run):  1,2,2,3,3,4,4,5,6,6,7,8,9
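This pass structure can be simulated in a few lines; the page size and the use of heapq.merge are illustrative choices, not the course's code:

```python
# Two-way external merge sort simulation: pass 0 sorts individual pages,
# later passes merge pairs of sorted runs until one run remains.
from heapq import merge

def two_way_sort(pages):
    runs = [sorted(p) for p in pages]            # pass 0: 1-page runs
    passes = 1
    while len(runs) > 1:                         # pass 1, 2, ...
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes

# The 7-page example file from the slide:
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_sort(pages)
print(result)   # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)   # 4 (passes 0 through 3)
```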
General External Merge Sort
More than 3 buffer pages. How can we utilize them?
To sort a file with N pages using B buffer pages:
Pass 0: use B buffer pages. Produce sorted runs of B
pages each.
Pass 1, …, etc.: merge B-1 runs at a time.
Buffer layout: B-1 INPUT buffers feed one OUTPUT buffer (B main memory buffers total), streaming between disk and disk.
Cost of External Merge Sort
Number of passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉
Cost = 2N * (# of passes)
E.g., with 5 buffer pages, to sort a 108 page file:
Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each
(last run is only 3 pages)
Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each
(last run is only 8 pages)
Pass 2: 2 sorted runs, 80 pages and 28 pages
Pass 3: Sorted file of 108 pages
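The pass-by-pass arithmetic generalizes to any N and B; a small sketch of the cost formula:

```python
# External merge sort cost: pass 0 makes ceil(N/B) runs of B pages each;
# every later pass merges B-1 runs at a time. Each pass costs 2N I/Os.
from math import ceil

def sort_cost(n_pages, n_buffers):
    runs = ceil(n_pages / n_buffers)      # runs after pass 0
    passes = 1
    while runs > 1:
        runs = ceil(runs / (n_buffers - 1))
        passes += 1
    return passes, 2 * n_pages * passes

passes, cost = sort_cost(108, 5)
print(passes)   # 4  (pass 0: 22 runs, pass 1: 6, pass 2: 2, pass 3: 1)
print(cost)     # 864 I/Os
```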
Query Evaluation: Joins
Equality Joins With One Join Column
In algebra: R ⋈ S. A common relational operation!
Size of R is M pages
Size of S is N pages
Cost metric: # of I/Os. Ignore the cost of writing output.
What is the optimal cost?
M + N (size of R + size of S)
trivially achievable if everything fits in memory
Page-Oriented Nested Loops Join
foreach page PR in R do
  foreach page PS in S do
    foreach tuple r in PR do
      foreach tuple s in PS do
        if r.i == s.j then add <r, s> to result
For each page of R, get each page of S, and write out matching pairs of tuples <r, s>, where r is in the R-page and s is in the S-page.
Cost of the outer loop is M
Cost of the inner loop is N per invocation
The per-tuple comparisons are all in-memory, so they're "free" (read: they don't require I/O)
Cost: M + M * N
Block Nested Loops Join
Take advantage of the fact that the DB has more
than 2 pages of memory available
Take the smaller relation, say R, as outer, the other as
inner.
Use one buffer for scanning the inner S, one buffer for output, and use all remaining buffers to hold a "block" of the outer R.
If R fits in memory, scan S one page at a time joining against all of R (cost M+N)
If R doesn't fit, still scan the outer relation just once, but scan the inner relation once per block: ⌈M / (B-2)⌉ scans of S
Block Nested Loops Join
offset = 0
while offset < R.size
  R_buffer = read(R, offset, block_size)
  offset += block_size
  foreach page PS in S do
    foreach page PR in R_buffer do
      foreach tuple r in PR do
        foreach tuple s in PS do
          if r.i == s.j then add <r, s> to result
Cost: Scan of outer + #outer blocks * scan of inner
#outer blocks = ⌈# pages of outer / block size⌉
Given available buffer size B, the block size is at most B-2 (one buffer scans the inner S, one holds output).
Cost: M + N * ⌈M / (B-2)⌉
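A runnable sketch of the algorithm, with relations as lists of pages of tuples; the data and the join-on-first-attribute convention are made up for illustration:

```python
# Block nested loops join sketch: hold a block of outer pages in memory,
# stream the inner relation past it once per block, counting page I/Os.
def block_nlj(R_pages, S_pages, block_size):
    result, io = [], 0
    for start in range(0, len(R_pages), block_size):
        block = R_pages[start:start + block_size]      # read one block of R
        io += len(block)
        for PS in S_pages:                             # one full scan of S per block
            io += 1
            for PR in block:
                for r in PR:
                    for s in PS:
                        if r[0] == s[0]:               # join on the first attribute
                            result.append((r, s))
    return result, io

R = [[(1, 'a'), (2, 'b')], [(3, 'c')]]                 # M = 2 pages
S = [[(2, 'x')], [(3, 'y'), (4, 'z')]]                 # N = 2 pages
out, io = block_nlj(R, S, block_size=2)                # all of R fits in one block
print(out)  # [((2, 'b'), (2, 'x')), ((3, 'c'), (3, 'y'))]
print(io)   # 4, i.e. M + N since R fits in a single block
```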
Block Nested Loops Join w/ hashing
In practice hashing is often worth the memory cost
Scanning GBs of tuples is not instantaneous, and most of that work is
wasted if the selectivity is low
If the cost of all the memory scanning > cost of another inner loop,
then it’s absolutely worth it!
Index Nested Loops Join
If there is an index on the join column of one
relation (say S), can make it the inner and exploit
the index.
avoid having to scan all pages all the time
• May even skip over pages if they have no matches
First join algorithm that isn’t computing a cross product!
Index Scan!
foreach tuple r in R do
  foreach tuple s in S where r.i == s.j do
    add <r, s> to result
Index Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S where r.i == s.j do
    add <r, s> to result
Cost: M + ( (M*pR) * cost of finding matching S tuples)
pR is the number of R tuples per page, so M*pR is the total number of tuples in R
For each R tuple, cost of probing the S index is:
about 1.2 I/Os for a hash index
2-4 I/Os for a B+ tree
Then, cost of retrieving the matching S tuples:
Clustered index: ~1 I/O for all matching S tuples
Unclustered: up to 1 I/O per matching S tuple
Sort-Merge Join (R ⋈_{i=j} S)
(1) Sort R and S on the join column,
(2) Merge them (on join col.), and output result tuples.
Merging
If R_i < S_j, advance R to R_i+1
If S_j < R_i, advance S to S_j+1
If R_i == S_j we've located a matching partition
For all R_x == R_i and S_y == S_j, output <R_x, S_y>
Sorting lets us efficiently locate a matching partition in value space
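The sort and merge phases can be sketched as follows; tuples are (key, payload) pairs and all names are illustrative:

```python
# Sort-merge join sketch: sort both inputs on the join key, then advance
# whichever side has the smaller key; on a match, output the cross product
# of the two matching partitions.
def sort_merge_join(R, S, key=lambda t: t[0]):
    R, S = sorted(R, key=key), sorted(S, key=key)
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(S[j]) < key(R[i]):
            j += 1
        else:
            k = key(R[i])
            i2, j2 = i, j                      # scan to the ends of both partitions
            while i2 < len(R) and key(R[i2]) == k:
                i2 += 1
            while j2 < len(S) and key(S[j2]) == k:
                j2 += 1
            for r in R[i:i2]:                  # matching partition: cross product
                for s in S[j:j2]:
                    result.append((r, s))
            i, j = i2, j2
    return result

R = [(2, 'b'), (1, 'a'), (2, 'c')]
S = [(2, 'x'), (3, 'y')]
print(sort_merge_join(R, S))
# [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x'))]
```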
Sort-Merge Join (R ⋈_{i=j} S)
(1) Sort R and S on the join column, (2) Merge them
(on join col.), and output result tuples.
Merge: repeat until either R or S is finished
R is scanned once; each S group is scanned once
per matching R tuple. (Multiple scans of an S
group are likely to find needed pages in buffer.)
This exploits locality
Temporal locality: S group is read repeatedly close in
time and then not read again
Spatial locality: S group is scanned from start to finish
each time
Sort-Merge Join (R ⋈_{i=j} S)
Cost: O(M log_B M) + O(N log_B N) + G
log_B M and log_B N are small numbers (2-4)
G = O(M+N) if R tuples match with a small
constant number of S tuples
both partitions fit in memory
guaranteed with foreign key joins (each R tuple matches exactly one S tuple)
G = O(M * N) if the inner relation’s partition
doesn’t fit in memory
each iteration has to read the whole thing in again
rare in practice
Hash Join
Rather than partitioning by sorting, use a hash function
If R and S are hashed into buckets with the same hash function (on the join column)
Matching tuples from R and S must have the same hash value
Therefore, you only need to perform matching on pairs of buckets
• for hash join, matching is called 'probing'
Hash Join - Partitioning
Read in R and S
hash the tuples into partitions (buckets)
write buckets out
Buffer layout: pages of the original relation stream through an INPUT buffer; hash function h scatters tuples across B-1 OUTPUT partitions on disk.
Hash Join - Probing
foreach partition i
  read in pages for R_i
  scan through pages for S_i evaluating the join cond
like a block-nested join, just for a partition at a time
Buffer layout: pages of R_i are held in memory; S_i streams through an input buffer, and matches go to an output buffer (B main memory buffers total).
Hash Join - Probing
Can apply the block-nested hash optimization
make comparisons cheaper with an in-memory hash table
use a different hash function (h2) from the initial hash partitioning function
Buffer layout: an in-memory hash table built with h2 over partition R_i (k < B-1 pages); S_i streams through an input buffer and probes with h2, with matches going to an output buffer.
Cost of Hash-Join
Partitioning reads+writes both relns; 2(M+N).
Probing reads both relns; M+N I/Os.
The total is 3(M+N).
In our running example, a total of 4500 I/Os using hash join,
~45s (compared to 140 hours w. NLJ).
Sort-Merge Join vs. Hash Join:
With optimizations to Sort-Merge (not discussed in class), the
cost is similar ~ 3(M+N)
Hash Join superior if relation sizes differ greatly.
Hash Join has been shown to be highly parallelizable.
Sort-Merge less sensitive to data skew; result is sorted.
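The two phases can be sketched in a few lines; the partition count, tuple layout, and Python's built-in hash stand in for the real h and h2:

```python
# Hash join sketch: partition both inputs with one hash function, then build
# an in-memory hash table per R partition and probe it with the matching
# S partition.
from collections import defaultdict

def hash_join(R, S, n_partitions=4, key=lambda t: t[0]):
    h = lambda t: hash(key(t)) % n_partitions
    R_parts, S_parts = defaultdict(list), defaultdict(list)
    for r in R:                      # partitioning phase: read + write each reln
        R_parts[h(r)].append(r)
    for s in S:
        S_parts[h(s)].append(s)
    result = []
    for p in R_parts:                # probing phase: read each reln once more
        table = defaultdict(list)    # in-memory hash table over partition R_p
        for r in R_parts[p]:
            table[key(r)].append(r)
        for s in S_parts.get(p, []):
            for r in table.get(key(s), []):
                result.append((r, s))
    return result

R = [(1, 'a'), (2, 'b'), (3, 'c')]
S = [(2, 'x'), (3, 'y'), (3, 'z')]
print(sorted(hash_join(R, S)))
# [((2, 'b'), (2, 'x')), ((3, 'c'), (3, 'y')), ((3, 'c'), (3, 'z'))]
```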
Query Optimization
Review of Relational Algebra Equivalences
Some useful equivalences
Selections:
σ_{c1 ∧ ... ∧ cn}(R) ≡ σ_{c1}(...(σ_{cn}(R)))   (Cascade)
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))   (Commute)
Projections:
π_{a1}(R) ≡ π_{a1}(...(π_{an}(R))), if a1 ⊆ ... ⊆ an   (Cascade)
Joins:
R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T   (Associative)
R ⋈ S ≡ S ⋈ R   (Commute)
Pipelined Evaluation
Materialization: Output is saved in a temporary
relation for use (read: multiple scans) by the next
operator.
Pipelining: Do everything tuple-by-tuple. Avoid
the cost of writing things out and reading things
back. Can occur in two cases:
Unary operator: when the input is pipelined into it, the
operator is applied on-the-fly, e.g. selection on-the-fly,
project on-the-fly.
Binary operator: e.g., the outer relation in indexed nested
loops join.
Highlights of System R Optimizer
Impact: most widely used; works well for < 10 joins.
Cost of a plan: approximate at best.
Statistics, maintained in system catalogs, used to estimate
cost of operations and result sizes.
Considers combination of CPU and I/O costs.
Plan Space: too large, must be pruned.
Only considers the space of left-deep plans.
Plan Search: dynamic programming (prunes useless
subtrees).
Left deep plans
The inner relation is always a base relation
no need to materialize the inner relation
Naturally support pipelining of joins
Downside: there may be non-left-deep plans that are better than the best left-deep plan we consider
A tradeoff to limit plan space and enable pipelining
Plan Space Exploration Summary
For each block, the plans considered are:
All available access methods, for each reln in FROM clause.
Index scan(s)
table scans
All left-deep join trees: all the ways to join the relns one-at-a-time, with the inner reln always a base reln from the FROM clause.
Considering all permutations of N relns: N factorial!
All join methods, for each join in the tree.
BNL, indexed NL
Hash join, sort-merge join
Appropriate places for selections and projections.
Cost Estimation
For each plan considered, must estimate its cost.
Estimate cost of each operation in a plan tree:
Depends on input cardinalities.
We’ve discussed how to estimate the cost of operations
(sequential scan, index scan, joins, etc.)
Estimate size of result for each operation in tree:
Use information about the input relations.
For selections and joins, assume independence of
predicates and uniform distribution of values.
Cost Estimation
The database has access to:
sizes of the relations and indices (#pages, #tuples)
min/max values for each indexed column
more detailed summaries of value distributions
• For each column we maintain a histogram of values
• A summary of the broad 'shape' of the actual data
Size Estimation & Reduction Factors
Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
We construct the estimate by:
Estimating the selectivity of each term
• We construct a Reduction Factor (RF): the ratio of expected satisfying tuples to all tuples
Then over-estimate the maximum result set size
Result cardinality = Max # tuples * product of all RF's.
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we have an index we can estimate with:
Term col=value:
• RF = 1/#Keys(I), given index I on col
• This follows from our uniformity assumption: each of the #Keys(I) distinct key values is equally likely, so the probability that col = value is 1/#Keys(I)
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we don’t have an index
Term col=value:
• Can assume some constant RF (e.g. 1/10)
• Could estimate from a histogram as well
  • use the uniform dist. assumption to estimate the slice of a bucket
  • divide by the total across all buckets
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we have an index we can estimate with:
Term col>value:
• RF = (High(I)-value)/(High(I)-Low(I))
• From uniformity, we assume all values are evenly
distributed between High(I) and Low(I)
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we don’t have an index
Term col>value:
• Use another constant (1/2 isn’t uncommon)
• With a histogram, you could do better estimation
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we have an index for each column, I1 and I2
Term col1=col2:
• RF = 1/MAX(#Keys(I1), #Keys(I2))
• Assume that each value in the smaller index has a matching value in the larger
• Each of the #Keys(I2) values is equally likely (by uniformity)
• The chance of a match is then 1/#Keys(I2), assuming I2 is the larger index
• Symmetrically, 1/#Keys(I1) if I1 is the larger index
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Assumption 1: uniform distribution of the values!
If we have an index for only one column, I
Term col1=col2:
• RF = 1/#Keys(I)
If we have no indexes at all
• Use the arbitrary constant 1/10
• Could use histogram data to approx. joint distribution
Size Estimation & Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
Reduction factor (RF) or Selectivity of each term:
Max. number of tuples in result
the product of the cardinalities of relations in the FROM
clause.
Pull cardinalities from system catalog, and multiply!
Result cardinality = Max # tuples * product of all RF’s.
Assumption 2: terms are independent!
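The whole estimate can be worked numerically; every catalog number below is invented for illustration:

```python
# Result cardinality estimate: (product of relation cardinalities) times
# (product of per-term reduction factors), assuming uniform values and
# independent terms.
from math import prod

def estimate(cardinalities, reduction_factors):
    max_tuples = prod(cardinalities)         # cross-product upper bound
    return max_tuples * prod(reduction_factors)

# Hypothetical query: Emp JOIN Dept ON Emp.dno = Dept.dno WHERE Emp.age > 30
n_emp, n_dept = 10_000, 100        # invented catalog cardinalities
rf_join = 1 / max(100, 50)         # col1=col2: 1/MAX(#Keys(I1), #Keys(I2))
rf_age = (60 - 30) / (60 - 20)     # col>value: (High-value)/(High-Low), ages 20..60
print(estimate([n_emp, n_dept], [rf_join, rf_age]))   # 7500.0
```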
Transactions and Concurrency Control
What is a Transaction?
A transaction is the DBMS’s abstract view of
a user program: a sequence of reads and
writes.
Organizes a sequence of operations into a
history
Useful for concurrency reasoning
Also useful for programming in a world of
concurrent data access and mutation
The ACID Properties
Database systems ensure the ACID properties:
Atomicity: all operations of transaction reflected
properly in database, or none are.
Consistency: each transaction in isolation keeps
the database in a consistent state
Isolation: should be able to understand what’s
going on by considering each separate transaction
independently.
Durability: updates stay in the DBMS!!!
Serializability
You could always execute transactions in some
forced order
That would be very slow
lots of waiting
So, how do you run multiple transactions but
maintain the illusion of serial execution?
Implementing serializability requires
understanding effects of operations well
enough to prevent non-serial-equivalent
interleavings
Schedules
The way we’re going to reason about transaction
interleavings is with schedules
A schedule is a list of operations
no multi-threading or anything
Two operations next to each other in the schedule
may be from different transactions
CANNOT re-order operations within the SAME
transaction
Conflicting operations
Two operations conflict if:
they operate on the same data object, and
at least one is a WRITE.
Schedule outcome is determined by order of
the conflicting operations.
non-conflicting operations can be
reordered with each other
conflicting operations act as a barrier to
reordering
Conflict Serializable Schedules
Two schedules are conflict equivalent if:
Involve the same actions of the same transactions
Every pair of conflicting actions (of committed
txn) are ordered the same way.
Alternatively: S can be transformed to S’ by swaps
of non-conflicting actions.
Schedule S is conflict serializable if S is
conflict equivalent to some serial schedule
T1              T2
R(A)
W(A)
                R(A)
                W(A)
R(B)
W(B)
                R(B)
                W(B)
Commit
                Commit
Conflict-serializable schedule
We can move T1's R(B); W(B) before T2's R(A) to transform this into a serial schedule

T1              T2
R(A)
W(A)
                R(A)
                W(A)
R(B)
W(B)
Commit
                R(B)
                W(B)
                Commit
Precedence graphs
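A schedule is conflict serializable iff its precedence graph (one node per transaction, an edge Ti -> Tj when an operation of Ti conflicts with and precedes one of Tj) is acyclic. A minimal sketch; the same cycle test also applies to waits-for graphs in deadlock detection:

```python
# Build a precedence graph from a schedule of (txn, op, obj) actions and
# test it for cycles; acyclic <=> conflict serializable.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            # conflict: same object, different txns, at least one write
            if obj_i == obj_j and ti != tj and 'W' in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reachable(src, dst, seen=()):
        return any(n == dst or (n not in seen and reachable(n, dst, seen + (n,)))
                   for n in graph.get(src, ()))
    return any(reachable(t, t) for t in graph)

# Writes on A occur in both orders: edges T1 -> T2 and T2 -> T1 form a cycle.
bad = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T1', 'W', 'A'),
       ('T2', 'W', 'B'), ('T1', 'W', 'B')]
print(has_cycle(precedence_graph(bad)))   # True: not conflict serializable
# All conflicts order T1 before T2: equivalent to the serial schedule T1; T2.
ok = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A')]
print(has_cycle(precedence_graph(ok)))    # False: conflict serializable
```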
Schedules involving abort
In addition to being serial equivalent, we
need schedules to be recoverable
A recoverable schedule can handle aborts
of any uncommitted transaction
Recoverable schedule: transactions commit
only after all transactions whose changes
they read commit.
commits are delayed until the system can
guarantee no failure
Cascading rollback
Even if a schedule is recoverable, several transactions may need to be rolled back to recover correctly.
Cascading Rollback: a single transaction failure leading to a series of rollbacks

T1              T2              T3
R(A)
R(B)
W(A)
                R(A)
                W(A)
                                R(A)
Abort
Cascadeless schedules
Cascadeless schedule: For any transactions Ti and Tj: if Tj reads data written by Ti, then Ti commits before the read operation of Tj.

T1              T2              T3
R(A)
R(B)
W(A)
                R(A)            <- T2's read would have to wait for T1 to commit
                W(A)
                                R(A)    <- T3's read would have to wait for T2 to commit
Abort
Lock-Based Concurrency Control
Just like in procedural programming
locks can be used to maintain orders of
access to shared objects
Lock - associated with some object
shared (r) or exclusive (w)
Locking protocol - set of rules to be followed
by each transaction to ensure good
properties.
Lock Compatibility Matrix
        S       X
S       yes     no
X       no      no

Example (deadlock):
T1              T2
X(A) granted
                X(B) granted
X(B) queued
                X(A) queued
Deadlock Detection
Basic Idea: Logging
Record REDO and UNDO information, for
every update, in a log.
Sequential writes to log
• sequential writing is much faster than random
writes from page I/O
Log is a separate file, so it can be on its own disk
diffs written to log, smaller than pages
• multiple updates fit in a single log page.
Write-Ahead Logging (WAL)
The WAL rule: before a data page is written to disk, the log records describing its updates must already be on stable storage; and all of a transaction's log records must be forced to disk before it commits.