
Final Review

COMPSCI 445

1
Exam Time:
5/13/2024, Monday 3:30-5:30
AgEng 119

Allowed 2 Pages of Notes

2
Topics from the before time
 Relational Algebra
 know the operators and what they mean
 selection σ - filter rows by predicate
 projection π - drop columns from all rows
 join ⋈ - join two relations by predicate
 Storage basics
 what is a page?
 what is the buffer manager?
 what makes a dirty page dirty?
 difference between heap files and indexes
 Some direct questions on these topics, but they will
also come up in questions about indexing, evaluation,
and optimization
3
Topics from the before time
 SQL Topics:
 GROUP BY + Aggregates
 ORDER BY
 JOINs
 Don’t worry about:
 XML
 XPath
 JSON
 DB Security

4
Indexing

5
Indexing
 An index on a table speeds up selections on the
search key fields for the index.
 Any subset of fields of a relation can be the search key
 Search key is not the same as key
• key - minimal set of fields that uniquely identify a
record in a relation
 An index contains a collection of data entries, and
supports efficient retrieval of all data entries k*
with a given search key value k.
 Data entry = data record or pointer to data record
• data record for clustered indices
• pointer for unclustered indices

6
B+ Tree Indexes

[Diagram: non-leaf pages above leaf pages; leaf pages are sorted by search key]

 Leaf pages contain data entries, and are chained (prev & next)
 Non-leaf pages have index entries; only used to direct searches.
An index entry has the form: P0, K1, P1, K2, P2, ..., Km, Pm
7
Example B+ Tree
 Search begins at the root; key comparisons
direct it to a leaf.
 Search for 5*, 15*, all data entries >= 24* ...

Root: 13 | 17 | 24 | 30

Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*

Based on the search for 15*, we know it is not in the tree!


8
B+ Trees in Practice

 Typical order: 100. Typical fill-factor: 67%.
 average fanout = 133
 Typical capacities:
 Height 4: 133^4 = 312,900,700 records
 Height 3: 133^3 = 2,352,637 records
 Can often hold top levels in buffer pool:
 Level 1 = 1 page = 8 KB
 Level 2 = 133 pages = 1 MB
 Level 3 = 17,689 pages = 133 MB

9
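The capacity figures above follow from simple arithmetic on the fanout; a quick sketch reproducing them (the slide rounds the height-4 figure, and the per-level sizes assume 8 KB pages):

```python
# With average fanout F = 133, a tree of height h can index about F**h
# records, and level k of the tree holds F**(k-1) pages.
FANOUT = 133
PAGE_SIZE_KB = 8

for height in (3, 4):
    print(f"height {height}: ~{FANOUT ** height:,} records")

for level in (1, 2, 3):
    pages = FANOUT ** (level - 1)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_SIZE_KB:,} KB")
```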
Hash-based indexes

 Hash-based indexes are best for equality selections.


Cannot support range searches.
 Static and dynamic hashing techniques exist;
trade-offs for dynamic data
 static: fixed index size
 dynamic: grows efficiently with data size

10
Hash-Based Indexes
 Good for equality selections.
 Index is a collection of buckets.
 Bucket = primary page plus zero or more overflow pages.
 Buckets contain data entries.
 Hashing function h: h(k) = bucket of data entries with
the search key value k.
 No need for “index entries” in this scheme.

[Diagram: search key value k is hashed by h to one of buckets 0 .. N-1]
11
Extendible Hashing
 Situation: Bucket (primary page) becomes full.
Why not re-organize file by doubling # of buckets?
 Reading and writing all pages is expensive!
 Idea: Use directory of pointers to buckets, double # of
buckets by (1) doubling the directory, (2) splitting just the
bucket that overflowed!
 Directory much smaller than file, so doubling it is much
cheaper. Only one page of data entries is split. No
overflow page!
 Trick lies in how hash function is adjusted!

12
Example
 Directory is an array of size 4, global depth D = 2.
 Each bucket has a local depth L (L ≤ D).
 To find the bucket for r:
 (1) compute h(r),
 (2) take the last D bits of h(r).
• If h(r) = 5 = binary 101, take the last 2 bits and go to the
bucket pointed to by directory entry 01.

[Diagram: directory entries 00, 01, 10, 11 point to data entry pages:
Bucket A (L=2): 4* 12* 32* 16*
Bucket B (L=2): 1* 5* 21* 13*
Bucket C (L=2): 10*
Bucket D (L=2): 15* 7* 19*]

13
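The lookup step on this slide can be sketched directly; a minimal sketch of locating a bucket by the last D bits of h(r), using the slide's directory (the split logic is not shown):

```python
GLOBAL_DEPTH = 2
directory = ["A", "B", "C", "D"]  # directory entry i points at a bucket

def bucket_for(hash_value: int) -> str:
    # take the last GLOBAL_DEPTH bits of h(r) as the directory index
    index = hash_value & ((1 << GLOBAL_DEPTH) - 1)
    return directory[index]

# h(r) = 5 = binary 101 -> last 2 bits 01 -> Bucket B, as on the slide
print(bucket_for(5))
# h(r) = 20 -> last 2 bits 00 -> Bucket A (the insert that causes doubling)
print(bucket_for(20))
```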
Insert h(r)=20 (Causes Doubling)

[Diagram: before and after. Bucket A (4* 12* 32* 16*) overflows on
inserting 20*, so the directory doubles from 4 entries (D=2) to 8
entries (D=3). Bucket A splits into Bucket A (32* 16*, L=3) and its
`split image' Bucket A2 (4* 12* 20*, L=3); Buckets B (1* 5* 21* 13*),
C (10*), and D (15* 7* 19*) keep L=2, with two directory entries
each pointing to them.]

14
Clustered vs Unclustered
 Clustered Indices
an index is clustered if it stores the actual
record data in the key entries
Only one clustered index per table
Index scans are also sequential scans w.r.t.
record data
 Unclustered Indices
 an index is unclustered if the key entries are
pointers to data entries
 Additional disk I/O required to fetch record
data
 Index scans are random I/O w.r.t. record data
15
Indexes with Composite Search Keys
 Composite Search Keys: search on a combination of
fields, using lexicographic order on the key fields.
 Equality query: every field value is equal to a constant
value. E.g., w.r.t. a <sal, age> index:
• age=20 and sal=75
 Range query: some field value is not a constant. E.g.:
• age=20; or age=20 and sal > 10
 Data entries in the index are sorted by search key to
support range queries.

[Example: data records (name, age, sal), sorted by name:
bob 12 10, cal 11 80, joe 12 20, sue 13 75.
Data entries in index <age, sal>: 11,80 12,10 12,20 13,75; in <age>: 11 12 12 13.
Data entries in index <sal, age>: 10,12 20,12 75,13 80,11; in <sal>: 10 20 75 80.]

16
Index-Only Plans
 Some queries can be answered without retrieving any
tuples from one or more of the relations involved, if a
suitable index is available.
 This is possible when all required fields are present in
the index.

Index on <E.dno>:
SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno

17
Index-Only Plans
Tree index <E.dno, E.sal>
The query asks for the minimum salary for each dno, and
the index is already grouping sal by dno.

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno

Referenced cols are dno and sal.

What about <E.sal, E.dno>?
That index groups dno by sal, so the entire index would
have to be scanned and sorted to satisfy the query.
Technically it can still be an index-only query, but it is no
better than a table scan + sort.
20
Index-Only Plans

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND
E.sal BETWEEN 3000 AND 5000

The query uses table fields sal and age, with an equality
predicate (=) and a range predicate (BETWEEN). A tree index
on <E.age, E.sal> or <E.sal, E.age> contains all referenced
fields, so an index-only plan is possible.
25
Query Evaluation: Sorting

26
Query Evaluation
 Query Evaluation Plan: tree of relational algebra
operators

π_sname
|
σ_bid=100 ∧ rating>5
|
⋈_sid=sid
/        \
Reserves   Sailors

27
External Sorting
 Sorting a set that’s too big to fit in memory
 Idea: divide and conquer
 Sort the file in separate pieces (pages)
 Combine sorted pieces into larger sorted runs
 Eventually the whole relation is sorted
 Multiple passes
 Pass 0: for each page in the file
• read it into memory
• sort it in memory
• write it back out
 Pass 1, 2, …
• Combine sorted runs from pass N-1 into a larger sorted run

28
2-Way Sort: Requires 3 Buffers
 Combine sorted pages w/ merging
 Resulting run is 2x the size of each input
 Once all pass N runs are merged
 Start on pass N + 1

[Diagram: two input buffers and one output buffer in main memory;
pages are read from disk into INPUT 1 and INPUT 2, merged into
OUTPUT, and written back to disk]

29
Two-Way External Merge Sort
 Each pass we read + write each page in the file: 2N I/Os.
 N pages in the file => the number of passes is ⌈log2 N⌉ + 1
 So total cost is: 2N (⌈log2 N⌉ + 1)
 Idea: divide and conquer: sort subfiles and merge

[Example: input file 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2.
Pass 0 produces sorted 1-page runs; pass 1 merges them into 2-page
runs; pass 2 into 4-page runs; pass 3 yields one sorted 8-page run.]
30
General External Merge Sort
More than 3 buffer pages. How can we utilize them?
 To sort a file with N pages using B buffer pages:
 Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B
pages each.
 Pass 1, …, etc.: merge B-1 runs at a time.

[Diagram: B-1 input buffers and one output buffer in main memory;
runs stream in from disk, merge into OUTPUT, and stream back out]
31
Cost of External Merge Sort
 Number of passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉
 Cost = 2N * (# of passes)
 E.g., with 5 buffer pages, to sort a 108-page
file:
 Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each
(last run is only 3 pages)
 Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each
(last run is only 8 pages)
 Pass 2: 2 sorted runs, 80 pages and 28 pages
 Pass 3: sorted file of 108 pages

32
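The pass count and cost formulas above can be checked mechanically; a small sketch that reproduces the slide's 108-page example:

```python
import math

def num_passes(n_pages: int, b_buffers: int) -> int:
    # pass 0 produces ceil(N/B) runs; each later pass merges B-1 runs
    runs_after_pass0 = math.ceil(n_pages / b_buffers)
    return 1 + math.ceil(math.log(runs_after_pass0, b_buffers - 1))

def sort_cost(n_pages: int, b_buffers: int) -> int:
    # each pass reads and writes every page: 2N I/Os per pass
    return 2 * n_pages * num_passes(n_pages, b_buffers)

print(num_passes(108, 5))  # 4 passes, matching the slide's trace
print(sort_cost(108, 5))   # 2 * 108 * 4 = 864 I/Os
```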
Query Evaluation: Joins

33
Equality Joins With One Join Column
 In algebra: R ⋈ S. Common relational operation!
 Size of R is M
 Size of S is N
 Cost metric: # of I/Os. Ignore cost of output
 What is the optimal cost?
 M + N (size of R + size of S)
 trivially possible if everything fits in memory

34
Page-Oriented Nested Loops Join
foreach page PR in R do
foreach page PS in S do
foreach tuple r in PR do
foreach tuple s in PS do
if ri == sj then add <r, s> to result
 For each page of R, get each page of S, and
write out matching pairs of tuples <r, s>,
where r is in the R-page and s is in the S-page.
 Cost of outer loop is M
 Cost of inner loop is N per invocation
 The per-tuple comparisons are all in-memory, so
they’re “free” (read: O(1) since they don’t require I/O)
 Cost: M + M * N

35
Block Nested Loops Join
 Take advantage of the fact that the DB has more
than 2 pages of memory available
 Take the smaller relation, say R, as outer, the other as
inner.
 Use one buffer for scanning the inner S, one buffer for
output, and use all remaining buffers to hold ``block’’ of
outer R.
 If R fits in memory, scan S one page at a time joining
against all of R (cost M+N)
 If R doesn’t fit, still scan outer relation just once, but
scan inner relation less often (R / BufferSize) iterations

36
Block Nested Loops Join
let R_buffer = …
let offset = 0
while offset < R.size
R_buffer = read(R, offset, buffer_size)
offset += buffer_size
foreach PS in S do
foreach PR in R_buffer do
foreach tuple r in PR do
foreach tuple s in PS do
if ri == sj then add <r, s> to result
 Cost: Scan of outer + #outer blocks * scan of
inner
 #outer blocks = ⌈# pages of outer / block size⌉
 Given available buffer size B, block size is at most B-2.
 Cost = M + N * ⌈M / (B-2)⌉
37
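The pseudocode above can be sketched in runnable form. A toy version, where "pages" are just lists of tuples and page I/O is only simulated (the relations and buffer sizes below are hypothetical):

```python
import math

def block_nested_loops_join(r_pages, s_pages, block_size):
    result = []
    for start in range(0, len(r_pages), block_size):
        # hold a block of up to block_size pages of R in memory
        r_block = [t for page in r_pages[start:start + block_size] for t in page]
        for s_page in s_pages:  # one full scan of S per outer block
            for s in s_page:
                for r in r_block:
                    if r[0] == s[0]:  # equality on the join column
                        result.append((r, s))
    return result

def bnl_cost(m, n, b):
    # M + N * ceil(M / (B-2)) page I/Os
    return m + n * math.ceil(m / (b - 2))

r = [[(1, "r1"), (2, "r2")], [(3, "r3")]]
s = [[(2, "s1")], [(3, "s2"), (4, "s3")]]
print(block_nested_loops_join(r, s, block_size=2))
print(bnl_cost(1000, 500, 102))  # 1000 + 500 * ceil(1000/100) = 6000
```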
Block Nested Loops Join w/ hashing
 In practice hashing is often worth the memory cost
 Scanning GBs of tuples is not instantaneous, and most of that work is
wasted if the selectivity is low
 If the cost of all the memory scanning > cost of another inner loop,
then it’s absolutely worth it!

[Diagram: a hash table over the current block of R (k < B-1 pages)
in memory, an input buffer for S, and an output buffer; S tuples
probe the hash table instead of scanning the whole block]

38
Index Nested Loops Join
 If there is an index on the join column of one
relation (say S), can make it the inner and exploit
the index.
 avoid having to scan all pages all the time
• May even skip over pages if they have no matches
 First join algorithm that isn’t computing a cross product!
Index Scan!
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result

39
Index Nested Loops Join
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result
 Cost: M + ( (M*pR) * cost of finding matching S tuples)
 pR is the number of R tuples per page, so M*pR is the
total number of R tuples
 For each R tuple, cost of probing the S index is:
 about 1.2 for a hash index
 2-4 for a B+ tree.
 Clustered index: ~1 I/O for all matching S
tuples.
 Unclustered: up to 1 I/O per matching S tuple.

40
Sort-Merge Join (R ⋈_{i=j} S)
 (1) Sort R and S on the join column,
 (2) Merge them (on join col.), and output result
tuples.
 Merging
 If R_i < S_j, advance R to R_i+1
 If S_j < R_i, advance S to S_j+1
 If R_i == S_j we’ve located a matching partition
 For all R_x == R_i and S_y == S_j output <R_x, S_y>
 Sorting lets us efficiently locate a matching
partition in value space
41
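The merge step described above can be sketched over plain sorted lists of join-column values (a simplification: real tuples would carry more fields):

```python
def merge_join(r_sorted, s_sorted):
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        if r_sorted[i] < s_sorted[j]:
            i += 1                       # advance R
        elif s_sorted[j] < r_sorted[i]:
            j += 1                       # advance S
        else:
            # matching partition: gather the equal runs on both sides
            v, i2, j2 = r_sorted[i], i, j
            while i2 < len(r_sorted) and r_sorted[i2] == v:
                i2 += 1
            while j2 < len(s_sorted) and s_sorted[j2] == v:
                j2 += 1
            # output the cross product of the two matching partitions
            result += [(r, s) for r in r_sorted[i:i2] for s in s_sorted[j:j2]]
            i, j = i2, j2
    return result

print(merge_join([1, 2, 2, 5], [2, 3, 5, 5]))  # [(2, 2), (2, 2), (5, 5), (5, 5)]
```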
Sort-Merge Join (R ⋈_{i=j} S)
 (1) Sort R and S on the join column, (2) Merge them
(on join col.), and output result tuples.
 Merge: repeat until either R or S is finished
 R is scanned once; each S group is scanned once
per matching R tuple. (Multiple scans of an S
group are likely to find needed pages in buffer.)
 This exploits locality
 Temporal locality: S group is read repeatedly close in
time and then not read again
 Spatial locality: S group is scanned from start to finish
each time
42
Sort-Merge Join (R ⋈_{i=j} S)
 Cost O(M log_B M) + O(N log_B N) + G
 log_B M and log_B N are small numbers, 2-4
 G = O(M+N) if R tuples match with a small
constant number of S tuples
 both partitions fit in memory
 guaranteed with foreign key joins (1-to-1)
 G = O(M * N) if the inner relation’s partition
doesn’t fit in memory
 each iteration has to read the whole thing in again
 rare in practice
43
Hash Join
 Rather than partitioning by sorting, use a
hash function
 If R and S are hashed into buckets with
the same hash function (on the join
column)
 Matching tuples from R and S must
have the same hash value
 Therefore, you only need to perform
matching on pairs of buckets
• for hash join matching is called
‘probing’ 44
Hash Join - Partitioning
 Readin R and S
 hash the tuples into partitions (buckets)
 write buckets out

[Diagram: the original relation streams from disk through an input
buffer; hash function h assigns each tuple to one of B-1 output
buffers, which are flushed to disk as partitions 1 .. B-1]

45
Hash Join - Probing
 foreach partition i
 read in pages for R_i
 scan through pages for S_i evaluating
join cond
 like a block-nested join, just for a
partition at a time
[Diagram: the pages of partition Ri are read into memory; partition
Si streams through an input buffer and is joined against them, with
matches going to an output buffer]
46
Hash Join - Probing
 Can apply block-nested hash optimization
 make comparisons cheaper with an in-
memory hash
 use a different hash function (h2) from
the initial hash partitioning function
[Diagram: as before, but an in-memory hash table (built with a second
hash function h2) is kept over partition Ri (k < B-1 pages); Si
tuples are hashed with h2 to probe it]

47
Observations on Hash-Join
 # partitions ≤ B-1, AND
 size of largest partition to be held in memory ≤ B-2.
 Assuming uniformly sized partitions, we get:
• M / (B-1) ≤ (B-2), i.e., B must be greater than √M (roughly)
• Hash-join works if the smaller relation satisfies the above.
 If we build an in-memory hash table, a little more
memory is needed.

48
48
Cost of Hash-Join
 Partitioning reads+writes both relns; 2(M+N).
 Probing reads both relns; M+N I/Os.
 The total is 3(M+N).
 In our running example, a total of 4500 I/Os using hash join,
~45s (compared to 140 hours w. NLJ).
 Sort-Merge Join vs. Hash Join:
 With optimizations to Sort-Merge (not discussed in class), the
cost is similar ~ 3(M+N)
 Hash Join superior if relation sizes differ greatly.
 Hash Join has been shown to be highly parallelizable.
 Sort-Merge less sensitive to data skew; result is sorted.
49
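The cost arithmetic above is easy to reproduce; a sketch with hypothetical page counts M=1000, N=500 chosen so the hash-join figure matches the slide's 4500 I/Os (the NLJ figure here is the page-oriented variant from earlier, not necessarily the one behind the slide's "140 hours"):

```python
def hash_join_cost(m, n):
    # 2(M+N) to partition (read + write both) plus (M+N) to probe
    return 3 * (m + n)

def page_nlj_cost(m, n):
    # page-oriented nested loops: M + M*N
    return m + m * n

M, N = 1000, 500
print(hash_join_cost(M, N))  # 4500 I/Os
print(page_nlj_cost(M, N))   # 501000 I/Os
```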
Query Optimization

50
Review of Relational Algebra
Equivalences
 Some useful equivalences

 Selections: σ_{c1∧c2}(R) ≡ σ_{c1}(σ_{c2}(R)) (Cascade)
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)) (Commute)
 Projections: π_{a1}(R) ≡ π_{a1}(π_{a1,a2}(R)) (Cascade)

 Joins: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T (Associative)
R ⋈ S ≡ S ⋈ R (Commute)

51
Pipelined Evaluation
 Materialization: Output is saved in a temporary
relation for use (read: multiple scans) by the next
operator.
 Pipelining: Do everything tuple-by-tuple. Avoid
the cost of writing things out and reading things
back. Can occur in two cases:
 Unary operator: when the input is pipelined into it, the
operator is applied on-the-fly, e.g. selection on-the-fly,
project on-the-fly.
 Binary operator: e.g., the outer relation in indexed nested
loops join.

52
Highlights of System R Optimizer
 Impact: most widely used; works well for < 10 joins.
 Cost of a plan: approximate at best.
 Statistics, maintained in system catalogs, used to estimate
cost of operations and result sizes.
 Considers combination of CPU and I/O costs.
 Plan Space: too large, must be pruned.
 Only considers the space of left-deep plans.
 Plan Search: dynamic programming (prunes useless
subtrees).

53
Left deep plans
The inner relation is always a base relation
no need to materialize the inner relation
Naturally support pipelining of joins
Downside: there may be non-left-deep plans that are better than
the best left-deep plan we optimize
A tradeoff to limit plan space and enable pipelining

[Diagram: a left-deep tree joins A ⋈ B, then C, then D, so each
inner is a base relation. A bushy tree joins (A ⋈ B) with (C ⋈ D);
the (C ⋈ D) subtree is the inner relation of the parent join and
would have to be materialized for every outer tuple.]

54
Plan Space Exploration Summary
 For each block, the plans considered are:
 All available access methods, for each reln in FROM clause.
Index scan(s)
table scans
 All left-deep join trees: all the ways to join the relns one-at-a-
time, with the inner reln in the FROM clause.
 Considering all permutations of N relns, N factorial!
 All join methods, for each join in the tree.
BNL, indexed NL
Hash join, sort-merge join
 Appropriate places for selections and projections.

55
Cost Estimation
 For each plan considered, must estimate its cost.
 Estimate cost of each operation in a plan tree:
 Depends on input cardinalities.
 We’ve discussed how to estimate the cost of operations
(sequential scan, index scan, joins, etc.)
 Estimate size of result for each operation in tree:
 Use information about the input relations.
 For selections and joins, assume independence of
predicates and uniform distribution of values.

56
Cost Estimation
 The database has access to:
 sizes of the relations and indices (#pages, #tuples)
 min/max values for each indexed column
 more detailed summaries of value distributions
•For each column we maintain a histogram of values
•A summary of the broad ‘shape’ of the actual data

57
Size Estimation & Reduction Factors
SELECT attribute list
FROM relation list
 Consider a query block: WHERE term1 AND ... AND termk
 We construct the estimate by:
 Estimating the selectivity of each term
•We construct a Reduction factor (RF) which is a ratio of
expected satisfying tuples over all tuples
 Then over-estimate the maximum result set size
 Result cardinality = Max # tuples * product of all RF’s.

58
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we have an index we can estimate with:
 Term col=value:
• RF = 1/#Keys(I), given index I on col
• This follows from our uniformity assumption. The
probability of a key from I = val

59
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we don’t have an index
 Term col=value:
• Can assume some constant RF (e.g. 1/10)
• Could estimate from histogram as well
• use uniform dist. assump. to estimate slice of bucket
• divide by total across all buckets

60
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we have an index we can estimate with:
 Term col>value:
• RF = (High(I)-value)/(High(I)-Low(I))
• From uniformity, we assume all values are evenly
distributed between High(I) and Low(I)

61
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we don’t have an index
 Term col>value:
• Use another constant (1/2 isn’t uncommon)
• With a histogram, you could do better estimation

62
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we have an index for each column, I1 and I2
 Term col1=col2:
• RF = 1/MAX(#Keys(I1), #Keys(I2))
• Assume that each value in I1 has a matching value in I2
• Each of the #Keys(I2) values is equally likely (by uniformity)
• The chance of a match is then 1/#Keys(I2)
• Similarly, 1/#Keys(I1) if I1 is the larger index
63
Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Assumption 1: uniform distribution of the values!
 If we have an index for only one column, I
 Term col1=col2:
• RF = 1/#Keys(I)
 If we have no indexes at all
• Use the arbitrary constant 1/10
• Could use histogram data to approx. joint distribution

64
Size Estimation & Reduction Factors
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 Reduction factor (RF) or Selectivity of each term:
 Max. number of tuples in result
 the product of the cardinalities of relations in the FROM
clause.
 Pull cardinalities from system catalog, and multiply!
 Result cardinality = Max # tuples * product of all RF’s.
 Assumption 2: terms are independent!

65
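The estimate described on these slides is just a product; a sketch with hypothetical numbers (a 40,000-tuple relation, an equality term on an indexed column with 50 keys, and a range term estimated at 1/4):

```python
from functools import reduce
from operator import mul

def estimate_cardinality(relation_sizes, reduction_factors):
    # Max # tuples = product of FROM-clause cardinalities;
    # multiply by the RF of every WHERE term (independence assumed)
    max_tuples = reduce(mul, relation_sizes, 1)
    return max_tuples * reduce(mul, reduction_factors, 1)

# age=25 via an index with 50 keys -> RF 1/50; a range term -> RF 1/4
print(estimate_cardinality([40_000], [1 / 50, 1 / 4]))  # 200.0
```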
Transactions and Concurrency
Control

66
What is a Transaction?
 A transaction is the DBMS’s abstract view of
a user program: a sequence of reads and
writes.
 Organizes a sequence of operations into a
history
 Useful for concurrency reasoning
 Also useful for programming in a world of
concurrent data access and mutation

67
The ACID Properties
 Database systems ensure the ACID properties:
 Atomicity: all operations of transaction reflected
properly in database, or none are.
 Consistency: each transaction in isolation keeps
the database in a consistent state
 Isolation: should be able to understand what’s
going on by considering each separate transaction
independently.
 Durability: updates stay in the DBMS!!!

68
Serializability
 You could always execute transactions in some
forced order
 That would be very slow

 lots of waiting
 So, how do you run multiple transactions but
maintain the illusion of serial execution?
 Implementing serializability requires
understanding effects of operations well
enough to prevent non-serial-equivalent
interleavings

69
Schedules
 The way we’re going to reason about transaction
interleavings is with schedules
 A schedule is a list of operations

 potentially from multiple transactions


 Schedules assume a single thread of execution

 no multi-threading or anything
 Two operations next to each other in the schedule
may be from different transactions
 CANNOT re-order operations within the SAME
transaction
70
A schedule

T1: Transfer        T2: Interest
Read(A)
A=A+100
Write(A)
                    Read(A)
                    A=1.06*A
                    Write(A)
Read(B)
B=B-100
Write(B)
                    Read(B)
                    B=1.06*B
                    Write(B)
71
Scheduling Transactions
 Serial schedule: Schedule that does not interleave the
actions of different transactions.
 Equivalent schedules: For any database state, the effect
(on the set of objects in the database) of executing the
first schedule is identical to the effect of executing the
second schedule.
 Serializable schedule: A schedule that is equivalent to
some serial schedule

72
Conflicting operations
 Two operations conflict if:
 they operate on the same data object, and
 at least one is a WRITE.
 Schedule outcome is determined by order of
the conflicting operations.
 non-conflicting operations can be
reordered with each other
 conflicting operations act as a barrier to
reordering

73
Conflict Serializable Schedules
 Two schedules are conflict equivalent if:
 Involve the same actions of the same transactions
 Every pair of conflicting actions (of committed
txn) are ordered the same way.
 Alternatively: S can be transformed to S’ by swaps
of non-conflicting actions.
 Schedule S is conflict serializable if S is
conflict equivalent to some serial schedule

Every conflict serializable schedule is serializable.


Conflict-serializable schedule

T1          T2
R(A)
W(A)
            R(A)
            W(A)
R(B)
W(B)
            R(B)
            W(B)
Commit
            Commit
75
Conflict-serializable schedule

T1          T2
R(A)
W(A)
R(B)
W(B)
Commit
            R(A)
            W(A)
            R(B)
            W(B)
            Commit

We can move T1’s R(B); W(B) before T2’s R(A) to
transform this into a serial schedule.

76
Precedence graphs
 Directed graph derived from schedule S:
 Vertex for each transaction
 Edge from Ti to Tj if:
• Ti executes Write(O) before Tj executes Read(O)
• Ti executes Read(O) before Tj executes Write(O)
• Ti executes Write(O) before Tj executes Write(O)
 If edge Ti -> Tj appears in the precedence graph, then in any
serial schedule equivalent to S, Ti must appear before Tj.

77
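The edge rules above can be turned into a small checker: build the precedence graph from a schedule, then test it for a cycle (acyclic means conflict serializable). The schedule encoding below is a hypothetical one, a list of (txn, op, object) steps:

```python
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            # conflicting pair: same object, different txns, at least one W
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reachable(start, target, seen=()):
        return any(n == target or (n not in seen and reachable(n, target, seen + (n,)))
                   for n in graph.get(start, ()))
    return any(reachable(t, t) for t, _ in edges)

# T1 and T2 interleave writes on A and B -> edges both ways -> cycle
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "W", "B"), ("T1", "W", "B")]
print(has_cycle(precedence_graph(bad)))  # True: not conflict serializable
```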
Schedules involving abort
 In addition to being serial equivalent, we
need schedules to be recoverable
 A recoverable schedule can handle aborts
of any uncommitted transaction
 Recoverable schedule: transactions commit
only after all transactions whose changes
they read commit.
 commits are delayed until the system can
guarantee no failure

78
Cascading rollback
 Even if a schedule is recoverable, several
transactions may need to be rolled back to
recover correctly.
 Cascading Rollback: a single transaction
failure leading to a series of rollbacks

T1          T2          T3
R(A)
R(B)
W(A)
            R(A)
            W(A)
                        R(A)
Abort

79
Cascadeless schedules
 Cascadeless schedule: For any transactions Ti and Tj: if
Tj reads data written by Ti, then Ti commits before the
read operation of Tj.

T1          T2          T3
R(A)
R(B)
W(A)
            R(A)    <- T2’s read would have to wait for T1 to commit
            W(A)
                        R(A)    <- T3’s read would have to wait for T2 to commit
Abort
80
Lock-Based Concurrency Control
 Just like in procedural programming
 locks can be used to maintain orders of
access to shared objects
 Lock - associated with some object
 shared (r) or exclusive (w)
 Locking protocol - set of rules to be followed
by each transaction to ensure good
properties.

81
Lock Compatibility Matrix
 Locks on a data item are granted based on a
lock compatibility matrix:

                   Mode of Data Item
Request mode    None    Shared    Exclusive
Shared          Y       Y         N
Exclusive       Y       N         N

 When a transaction requests a lock, it must wait
(block) until the lock is granted

*there’s a whole literature about how to handle waiting for
locked resources
82
Two-Phase Locking (2PL)
 Two-Phase Locking Protocol
 Each Xact must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object
before writing.
 A transaction can not request additional locks
once it releases any locks.
• This implies two phases:
• growing phase
• shrinking phase
Example of 2PL
T1                          T2
X(A)
R(A)     (growing phase)
W(A)
X(B)
U(A)     (shrinking phase)
                            X(A)
                            R(A)     (growing phase)
R(B)                        W(A)
W(B)
U(B)
                            X(B)
                            R(B)
                            W(B)
                            U(A); U(B)   (shrinking phase)
Commit
                            Commit
84
Lock-based protocols

 2PL ensures conflict serializability


 Transactions can be ordered by their end of
growing phase (called lock point)
 A 2PL schedule is equivalent to the serial
schedule where transactions ordered by lock
point order.
 2PL ensures serializability-consistent
interleaving
 but is it fast enough?
 What about recoverability?
Strict Two-Phase Locking (Strict 2PL)
 Strict Two-phase Locking Protocol:
 Each Xact must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object
before writing.
 A transaction can not request additional locks
once it releases any locks.
 All X (exclusive) locks acquired by a transaction
must be held until completion (commit/abort).
Schedule following strict 2PL
T1              T2
S(A)
R(A)
                S(A)
                R(A)
X(B)
R(B)
W(B)
U(A); U(B)
Commit
                X(C)
                R(C)
                W(C)
                U(A); U(C)
                Commit
87
Lock-based protocols

 Strict 2PL ensures conflict serializable and


cascadeless schedules
 Writers hold an X lock until they commit.
 Serializability and Cascadeless recoverability
 but locks are held for longer
 the longer locks are held => the less concurrency
you can support
 Consistency + Recoverability vs. Performance
tradeoff
 This is why many databases default to weaker
consistency (more concurrency)
Deadlock

T1              T2
X(A)  granted
                X(B)  granted
X(B)  queued
                X(A)  queued

 Deadlock: Cycle of transactions waiting for
locks to be released by each other.

89
Deadlock Detection

 Create a waits-for graph:


 Nodes are transactions
 There is an edge from Ti to Tj if Ti is waiting for Tj
to release a lock
 add edge when queueing a lock request,
 remove edge when granting lock request.
 Periodically check for cycles in the waits-for
graph
 Resolve by aborting a transaction on cycle,
releasing its locks.
Deadlock Prevention
 Assign priorities based on timestamps. If Ti
wants a lock that Tj holds. Two policies are
possible:
 Wait-Die: If Ti has higher priority, Ti waits for Tj;
otherwise Ti aborts
 aborts most recent txn until deadlock is broken
 Wound-wait: If Ti has higher priority, Tj aborts;
otherwise Ti waits
 Aborts txn when lock conflict is detected

 If txn re-starts, re-use its original


timestamp (to avoid starvation).
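The two policies above can be sketched as simple decision functions, with lower timestamp meaning older and therefore higher priority (the string return values are just for illustration):

```python
def wait_die(ts_i: int, ts_j: int) -> str:
    # Ti requests a lock held by Tj: an older Ti waits; a younger Ti "dies"
    return "Ti waits" if ts_i < ts_j else "Ti aborts"

def wound_wait(ts_i: int, ts_j: int) -> str:
    # an older Ti "wounds" (aborts) Tj; a younger Ti waits
    return "Tj aborts" if ts_i < ts_j else "Ti waits"

print(wait_die(1, 2), "|", wait_die(2, 1))      # older waits | younger aborts
print(wound_wait(1, 2), "|", wound_wait(2, 1))  # older wounds | younger waits
```

In both policies only the younger transaction is ever aborted, which is why re-using the original timestamp on restart prevents starvation.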
Recovery

92
Basic Idea: Logging
 Record REDO and UNDO information, for
every update, in a log.
 Sequential writes to log
• sequential writing is much faster than random
writes from page I/O
 Log is a separate file, so it can be on its own disk
 diffs written to log, smaller than pages
• multiple updates fit in a single log page.

93
Basic Idea: Logging

 Log: An ordered list of REDO/UNDO actions


 Log record contains:
Before image (for UNDO), After image (for REDO)
<XID, pageID, offset, length, old data, new data>
 and additional control info (which we’ll see soon).

94
Write-Ahead Logging (WAL)

 The Write-Ahead Logging Protocol:


 Must force the log record for an update before the
corresponding data page is overwritten on disk.
 Must write all log records for a txn before commit.
 #1 guarantees Atomicity.
 #2 guarantees Durability.

 Exactly how is logging and recovery done?


 the ARIES algorithm (we won’t see details of this
in this class)
95
Recovery Summary
 Changes are written to a log (WAL)
 sequential writes (fast)
 Commits finish when WAL is written
 When recovering after a crash

 WAL is ‘replayed’ to restore state


 Committed txn entries are REDOne
 Uncommitted txn entries are UNDOne

96
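The replay summary above can be sketched as a toy recovery pass: REDO updates of committed transactions, UNDO the rest using before-images. Log records here are simplified (xid, page, old, new) tuples, not real ARIES records:

```python
def recover(log):
    committed = {xid for kind, xid, *_ in log if kind == "COMMIT"}
    pages = {}
    for kind, xid, *rest in log:                # forward pass: REDO
        if kind == "UPDATE" and xid in committed:
            page, _old, new = rest
            pages[page] = new                   # apply after-image
    for kind, xid, *rest in reversed(log):      # backward pass: UNDO
        if kind == "UPDATE" and xid not in committed:
            page, old, _new = rest
            pages[page] = old                   # restore before-image
    return pages

log = [("UPDATE", "T1", "P1", "a0", "a1"),
       ("UPDATE", "T2", "P2", "b0", "b1"),
       ("COMMIT", "T1")]
print(recover(log))  # T1's update survives; T2's is rolled back
```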
