0% found this document useful (0 votes)

47 views94 pages

CC Part 2 No Quizzes

Uploaded by

est09serra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

47 views94 pages

CC Part 2 No Quizzes

Uploaded by

est09serra

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 94

Introduction to Transaction

Management

CMPSCI 445

1
Txn Semantics Summary
 Transactions are how programmers make
sense of concurrent behavior
 ACID properties are a contract between
DB and programmer
 A tool to build understandable code
Interleaving execution is vital for
performance
 DB has to run operations concurrently, but
 Simulate that only a single txn was run at
a time
 This is Concurrency Control 2
Anomalies with Interleaved Execution

 Not all interleavings of operations are okay.

 Anomaly: two consistency-preserving
committed transactions that lead to an
inconsistent state.
 Types of anomalies:
 Reading Uncommitted Data (WR Conflicts)
“dirty reads”
 Unrepeatable Reads (RW Conflicts)
 Overwriting Uncommitted Data (WW Conflicts)

3
Reading Uncommitted Data
(WR Conflict)
T1: Transfer T2: Interest
“Dirty Read” R(A)
W(A)
R(A)
Inconsistent result of A is W(A)
exposed to transaction T2
R(B)
(We saw this schedule in
the banking example) W(B)
Commit
Transaction is no longer
ISOLATED from other R(B)
transactions’ effects
W(B)
Commit 4
Acceptable Dirty Read

If we are just checking availability of an

airline seat, a dirty read might be fine! (Why
is that?)
Reservation transaction:
EXEC SQL select occupied into :occ
from Flights
where Num= ‘123’ and date=11-03-99
and seat=‘23f’;
if (!occ) {EXEC SQL
update Flights
set occupied=true
where Num= ‘123’ and date=11-03-99
and seat=‘23f’;}
else {notify user that seat is unavailable}

5
Unrepeatable Reads

T1 could see two values for A, T1 T2

although it has not changed A itself. R(A)
(This could not happen in a serial
execution. ) R(A)
W(A)
Commit
R(A)
W(A)
Commit

6
Overwriting Uncommitted Data
(WW Conflicts)
T1 T2
W(A) Here T2 overwrites value of B that
has been modified by T1.
W(B)
Example: Harry and Larry are 2
W(B) employees. Salaries must be
kept equal.
Commit
W(A) T1: set salaries to 1000
T2 set salaries to 2000
Commit

7
Overwriting Uncommitted Data
(WW Conflicts)
Here T2 overwrites value of B that
T1 T2 has been modified by T1.
W(A) Example: Harry and Larry are 2
W(B) employees. Salaries must be
kept equal.
W(B)
T1: set salaries to 1000
Commit T2 set salaries to 2000
T1; T2 => salaries = 2000
W(A) T2; T1 => salaries = 1000
Commit This interleaving, Harry = 1000,
Larry = 2000 (lucky larry)

8
Lock-Based Concurrency Control
 Just like in procedural programming
 locks can be used to maintain orders of
access to shared objects
 Lock - associated with some object
 shared (r) or exclusive (w)
 Locking protocol - set of rules to be followed
by each transaction to ensure good
properties.

9
Lock Compatibility Matrix

Locks on a data item are granted based on a

lock compatibility matrix:
Mode of Data Item
None Shared Exclusive

Request mode { Shared

Exclusive Y
Y Y
N
N
N

When a transaction requests a lock, it must wait

(block) until the lock is granted

*there’s a whole literature about how to handle waiting for

locked resources 10
Transactions and locks
 Transactions now have two new operations
 lock (written as X for exclusive S for
shared)
 unlock (written as U)
 Depending on the locking protocol
transactions will request and release locks
around data accesses

11
Example Txn with locking
T1

lock-X(A)
R(A)
W(A)
unlock(A)
lock-S(B)
R(B)
unlock(B)

12
Goals of a locking protocol
 Ensure consistency/isolation
 Serializability for this lecture
 Enable interleaving of txn operations

 If locks don’t make things faster, we

shouldn’t use them
 Lock operations should be fast
 Protocol should try to maximize consistent
concurrency

13
Two-Phase Locking (2PL)
 Two-Phase Locking Protocol
 Each Xact must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object
before writing.
 A transaction can not request additional locks
once it releases any locks.
• This implies two phases:
• growing phase
• shrinking phase
Example of 2PL
T1 T2
X(A)
R(A)
W(A)
X(B)
U(A)
X(A)
R(A)
W(A)
R(B)
W(B)
U(B)
X(B)
R(B)
W(B)
U(A); U(B)
Commit
Commit
15
Example of 2PL
T1 T2
X(A)
growing phase R(A)
W(A)
X(B)
U(A)
X(A)
R(A)
shrinking phase W(A) growing phase
R(B)
W(B)
U(B)
X(B)
R(B)
W(B)
U(A); U(B) shrinking phase
Commit
Commit
16
Not admissible under 2PL

T1 T2
R(A)
W(A)
R(A)
W(A)
R(B)
W(B)
Commit
R(B)
W(B)
Commit
17
Not admissible under 2PL
Why?
T1 T2
R(A)
W(A)
R(A)
W(A)
R(B)
W(B)
Commit
R(B)
W(B)
Commit
18
Not admissible under 2PL
Why?
T1 T2
X(A)

X(A)
X(B)
U(B)
U(A)
Commit
X(B)
U(B)
U(A)
Commit
19
Not admissible under 2PL
Why?
T1 T2
X(A)

X(A) T1 holds A’s lock

X(B) T2 won’t be able
U(B) to proceed
U(A)
Commit
X(B)
U(B)
U(A)
Commit
20
Lock-based protocols

 2PL ensures conflict serializability

 Transactions can be ordered by their end of
growing phase (called lock point)
 A 2PL schedule is equivalent to the serial
schedule where transactions ordered by lock
point order.
 2PL ensures serializability-consistent
interleaving
 but is it fast enough?
 What about recoverability?
Strict Two-Phase Locking (Strict 2PL)
 Strict Two-phase Locking Protocol:
 Each Xact must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object
before writing.
 A transaction can not request additional locks
once it releases any locks.
 All X (exclusive) locks acquired by a transaction
must be held until completion (commit/abort).
Schedule following strict 2PL
T1 T2
S(A)
R(A)
S(A)
R(A)
X(B)
R(B)
W(B)
Commit
X(C)
R(C)
W(C)
Commit
23
Lock-based protocols

 Strict 2PL ensures conflict serializable and

cascadeless schedules
 Writers hold an X lock until they commit.
 Serializability and Cascadeless recoverability
 but locks are held for longer
 the longer locks are held => the less concurrency
you can support
 Consistency + Recoverability vs. Performance
tradeoff
 This is why many databases default to weaker
consistency (more concurrency)
Lock Management

 Lock and unlock requests are handled by the lock

manager, an internal software module of the DBMS
 Lock table entry (for each object) contains:
 Number of transactions currently holding a lock
 Type of lock held (shared or exclusive)
 Pointer to queue of lock requests
 Locking and unlocking have to be atomic operations
 Locks cannot be half-held or half-released
 Built on OS primitives and/or CPU primitives
 Slower (10-100x) than non-atomic memory
accesses
Lock Management
 Lock upgrade: transaction that holds a shared lock
can be upgraded to hold an exclusive lock
 allows max. concurrency until you realize you
need eXclusive lock
 useful for UPDATE (only X on the rows that need
to be written, everyone else gets S)
 special case: X granted immediately if no one else
holds S, puts request at front of queue otherwise
 Does NOT prevent deadlock
 T1: S(A); T2: S(A); T1: Up(A) ; T2: Up(A)
 T2’s S lock blocks T1’s Upgrade request
 T1’s S lock blocks T2’s Upgrade request
Deadlock

T1 T2
X(A) granted

X(B) granted

X(B) queued
X(A) queued

 Deadlock: Cycle of transactions waiting for

locks to be released by each other.

27
Deadlocks
 Tend to be rare in practice.*
 Two ways of dealing with deadlocks:
 Deadlock prevention
 lock protocol makes deadlock impossible
 transactions may have to ‘restart’
 Deadlock detection
 a separate process examines txn state to detect
deadlock
 and then transactions are aborted to clear the
deadlock
*tend to start cropping up when the load on the system increases greatly.
More txns => More concurrency => more opportunities for deadlock
Deadlock Detection

 Create a waits-for graph:

 Nodes are transactions
 There is an edge from Ti to Tj if Ti is waiting for Tj
to release a lock
 add edge when queueing a lock request,
 remove edge when granting lock request.
 Periodically check for cycles in the waits-for
graph
 Resolve by aborting a transaction on cycle,
releasing its locks.
Deadlock Detection (Continued)
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B)
X(A)

T1 T2

T4 T3
Deadlock Detection (Continued)
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
B is held by T2 S(B)
S(C)
R(C)
X(C)
X(B)
X(A)

T1 T2

T4 T3
Deadlock Detection (Continued)
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
C is held by T3 X(C)
X(B)
X(A)

T1 T2

T4 T3
Deadlock Detection (Continued)
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B) B is held by T2
X(A)

T1 T2

T4 T3
Deadlock Detection (Continued)
T1 T2 T3 T4
S(A)
R(A)
X(B)
W(B)
S(B)
S(C)
R(C)
X(C)
X(B)
X(A) A is held by T1

T1 T2
A waits-for cycle means
deadlock

T4 T3
Deadlock Prevention
 Idea 1: Assign priorities based on
timestamps. If Ti wants a lock that Tj holds.
Two policies are possible:
 Wait-Die: If Ti has higher priority, Ti waits for Tj;
otherwise Ti aborts
 aborts most recent txn until deadlock is broken
 Wound-wait: If Ti has higher priority, Tj aborts;
otherwise Ti waits
 Aborts txn when lock conflict is detected

 If txn re-starts, re-use its original

timestamp (to avoid starvation).
Deadlock Prevention
 Idea 2: Conservative 2PL:
 transaction grabs all locks it will need up
front, before issuing any statements
 All blocking happens at the beginning of
txn
 no waiting in the middle of txn
 How can you know what locks you need
before you even start?
 Not always feasible
Performance of Locking
 Lock-based schemes resolve conflicting
schedules by blocking and aborting
 in practice few deadlocks and relatively few
aborts
 most of penalty from blocking
 To increase throughput
 lock smallest objects possible
 reduce time locks are held
 reduce hotspots
• a hotspot is a popular/highly contended object

37
Performance Graphs
 If concurrency is a win, then running more
transactions means higher throughput
Throughput DB linear

# transactions
38
Performance Graphs
 If concurrency is a win, then running more
transactions means higher throughput
Throughput DB linear

Initially, we get near

linear speedup.
E.g. 2x txns means
2x throughput
# transactions
39
Performance Graphs
 If concurrency is a win, then running more
transactions means higher throughput
Throughput DB linear

Then the system begins

to plateau until it hits some
maximum value
# transactions
40
Performance Graphs
 If concurrency is a win, then running more
transactions means higher throughput
Throughput DB linear

What’s going on here?

# transactions
41
Thrashing
 Locking causes blocking. The more popular
an object is, the more transactions are trying
to lock it (contention)
 More contention means blocking is more
costly
 vicious cycle of more contention causing
more slowdowns which lead to even more
contention
 When adding more transactions reduces
throughput we say the system is thrashing

42
Transaction support in SQL
 Explicit transactions with BEGIN/START
TRANSACTION
 ‘autocommit mode’ Transaction
automatically started for SELECT, UPDATE,
CREATE
 Transaction ends with COMMIT or
ROLLBACK (abort)
 SQL 99 supports SAVEPOINTs which are
simple nested transactions
 ABORT can roll back to an intermediate
point rather than just to the beginning
43
What should we lock?
T1 T2

SELECT S.rating, MIN(S.age) INSERT Sailors(Name, Rating,

FROM Sailors S Age) VALUES (“Mary”, 8, 18)
WHERE S.rating = 8

 Lock Tables? (Coarse locking):

 T1 S-lock on Sailors;
 T2 X-lock on Sailors
 Lock rows? (Fine-grained locking):
 T1 S-lock on all rows with rating=8;
 T2 X-lock on Joe’s tuple.
 Fine granularity locking is generally better. But it
introduces a problem: phantoms. 44
The Phantom Problem
T1 T2
SELECT MIN(age) INSERT Sailors(Name, Rating,
FROM Sailors S Age) VALUES (“Mary”, 8, 18)
WHERE rating = 8

SELECT MAX(age)
FROM Sailors S
WHERE rating = 8

 Consider the following schedule

 Initially, no Sailor has rating = 8
 T1 runs the first query, which scans Sailors. It locks no
row and returns an empty table.
 T2 inserts a row for Mary at the end of the relation on a
page not previously seen by T1. T1 has no lock so T2 goes
ahead and commits.
 T1 executes the second query and returns the age of
Mary. This violates serializability despite using locks! 45
The Phantom Problem
T1 T2
SELECT MIN(age)
FROM Sailors S INSERT Sailors(Name, Rating,
WHERE rating = 8 Age) VALUES (“Mary”, 8, 18)
SELECT MAX(age) COMMIT
FROM Sailors S
WHERE rating = 8

 Consider the following schedule

 Initially, no Sailor has rating = 8

46
The Phantom Problem
T1 T2
SELECT MIN(age) No Results, no rows locked
FROM Sailors S
WHERE rating = 8 INSERT Sailors(Name, Rating,
Age) VALUES (“Mary”, 8, 18)
SELECT MAX(age)
COMMIT
FROM Sailors S
WHERE rating = 8

 Consider the following schedule

 Initially, no Sailor has rating = 8
 T1’s first SELECT runs first, no results, no rows locked

47
The Phantom Problem
T1 T2
SELECT MIN(age) No Results, no rows locked
FROM Sailors S
WHERE rating = 8 INSERT Sailors(Name, Rating,
Age) VALUES (“Mary”, 8, 18)
SELECT MAX(age)
COMMIT
FROM Sailors S
WHERE rating = 8

 Consider the following schedule

 Initially, no Sailor has rating = 8
 T1’s first SELECT runs first, no results, no rows locked
 T2’s INSERT runs, adds a new row b/c no conflicting
locks detected

48
The Phantom Problem
T1 T2
SELECT MIN(age) No Results, no rows locked
FROM Sailors S
WHERE rating = 8 INSERT Sailors(Name, Rating,
Age) VALUES (“Mary”, 8, 18)
SELECT MAX(age)
COMMIT
FROM Sailors S
WHERE rating = 8

 Consider the following schedule

 Initially, no Sailor has rating = 8
 T1’s first SELECT runs first, no results, no rows locked
 T2’s INSERT runs, adds a new row b/c no conflicting
locks detected
 T1’s second SELECT runs, now returns a sailor

49
The Phantom Problem
 T1 can only lock rows that already exist
 wants to lock all potential rows with rating = 8
 how do you lock a row that doesn’t exist?
 Idea 1: Lock whole table
 no modification can slip by a transaction
 no other transaction can write the table
•or read it after eXclusive locking
 really bad for performance
 Idea 2: use index to lock ranges of values
 Idea 3: prevent new rows that match the
selection condition
50
Data
Index
Index Locking
r=1
 If there is an index on the rating field, T1
should lock the index page containing the data
entries with rating = 8.
 If there are no records with rating = 8, T1 must
lock the index page where such a data entry would
be, if it existed!
 If there is no suitable index:
 T1 locks all pages when doing table scan
 Also locks the file/table to prevent new
pages from being added
Predicate Locking

 Grant lock on all records that satisfy some

logical predicate, e.g. age > 2*salary.
 Index locking is a special case of predicate
locking for which an index supports efficient
implementation of the predicate lock.
 In general, predicate locking has a lot of
locking overhead.
 In general it’s constraint satisfaction
Locking in B+ Trees
 How can we efficiently lock a particular leaf
node?
 One solution: Ignore the tree structure, just lock
pages while traversing the tree, following 2PL.
 This has terrible performance!
 Root node (and many higher level nodes) become
bottlenecks because every tree access begins at the
root.
 No different from locking whole table
Two Useful Observations
 Higher levels of the tree only direct searches
for leaf pages.
 All the action is at the leaves
 that’s where the key+values are
 Why not just lock the leaf page(s)?
 A split creates a new leaf and moves half
the keys into it
 So you need to lock internal nodes to
prevent splits
 But maybe not always?
A Simple Tree Locking Algorithm
 Search: Start at root and go down;
repeatedly, S lock child then unlock parent.
 Insert/Delete: Start at root and go down,
obtaining X locks. Once child is locked,
check if it is safe:
 If child is safe, release all locks on ancestors.
 Safe node: Node such that changes will not
propagate up beyond this node.
 Inserts: Node is not full. (no split possible)
 Deletes: Node is not half-empty. (no merge
possible)
Tree Locking Example T1: Search 38*
T2: Insert 45*
Shared root T3: Search 22*
…

Exclusive … 20 …

…
…
35 …

23 38 44

leaves