
CS3492 /Database Management Systems 2022-2023

UNIT III TRANSACTIONS


Transaction Concepts – ACID Properties – Schedules – Serializability – Transaction support in SQL – Need
for Concurrency – Concurrency control – Two Phase Locking – Timestamp – Multiversion – Validation and
Snapshot isolation – Multiple Granularity locking – Deadlock Handling – Recovery Concepts – Recovery
based on deferred and immediate update – Shadow paging – ARIES Algorithm

TRANSACTION CONCEPTS

A transaction is a collection of operations that forms a single logical unit of work.


Simple Transaction Example
1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account
This whole set of operations can be called a transaction.

Transaction processing system


– A system with a large database and hundreds of concurrent users that are executing
database transactions.
– Eg: reservation systems, banking systems, etc.
Concurrent access
Multiple users accessing a system at the same time.
Single user – only one user can use the system at a time.
Multi user – many users can use the system at the same time.
This can be achieved by multiprogramming:
Parallel – multiple users access different resources at the same time.

Interleaved – multiple users access a single resource, sharing it based on time.



Transactions access data using two operations


• Read(x)
It transfers the data item x from the database to a local buffer belonging to the transaction that
executed the read operation.
• Write(x)
It transfers the data item x from the local buffer of the transaction back to the database, i.e., it writes
the value back to the database.
ACID Properties
To ensure the integrity of data during a transaction, the database system maintains the following
properties. These properties are widely known as ACID properties:
Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either
all of its operations are executed or none. There must be no state in a database where a transaction
is left partially completed. States should be defined either before the execution of the transaction or
after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database. If the database was in a
consistent state before the execution of a transaction, it must remain consistent after the execution
of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the system
fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database
will hold the modified data. If a transaction commits but the system fails before the data could be
written on to the disk, then that data will be updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed simultaneously
and in parallel, the property of isolation states that all the transactions will be carried out and
executed as if each were the only transaction in the system. No transaction will affect the existence of any
other transaction.
E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Example of Fund Transfer
• Atomicity requirement
– if the transaction fails after step 3 and before step 6, money will be “lost” leading to an
inconsistent database state
– the system should ensure that updates of a partially executed transaction are not reflected in
the database
• Durability requirement
– once the user has been notified that the transaction has completed, the updates to the
database by the transaction must persist even if there are software or hardware
failures.
• Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database:
T1                          T2
1. read(A)
2. A := A – 50
3. write(A)
                            read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
– that is, one after the other.
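The same transfer can be sketched with Python’s built-in sqlite3 module. This is only an illustration of how commit() and rollback() give the all-or-nothing behaviour required by atomicity and the persistence required by durability; the account table and its columns are made up for the example, not part of any particular DBMS.

# Runnable sketch of the A-to-B transfer using Python's standard sqlite3 module.
# The account table and its columns are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 200)")
conn.commit()

def transfer(amount):
    try:
        # read(A); A := A - amount; write(A)
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        # read(B); B := B + amount; write(B)
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()          # durability: changes persist only after commit
    except Exception:
        conn.rollback()        # atomicity: undo the partial update
        raise

transfer(50)
print(dict(conn.execute("SELECT name, balance FROM account")))   # {'A': 950, 'B': 250}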
SCHEDULES
• Schedule – a sequence of instructions that specifies the chronological order in which instructions
of concurrent transactions are executed
– a schedule for a set of transactions must consist of all instructions of those transactions
– must preserve the order in which the instructions appear in each individual transaction.
• Serial Schedule
It is a schedule in which transactions are aligned in such a way that one transaction is executed
first. When the first transaction completes its cycle, then the next transaction is executed.
Transactions are ordered one after the other. This type of schedule is called a serial schedule, as
transactions are executed in a serial manner.
Schedule 1
• Let T1 transfer 50 from A to B, and T2 transfer 10% of the balance from A to B.
• A serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule where T2 is followed by T1

Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.

Schedule 4
The following concurrent schedule does not preserve the value of (A + B ).
SERIALIZABILITY
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with those of some
other transaction.
• Serializability is the classical concurrency scheme.
• It ensures that a schedule for executing concurrent transactions is equivalent to one that executes
the transactions serially in some order.
Serializable schedule
If a schedule is equivalent to some serial schedule then that schedule is called Serializable
schedule
Let us consider a schedule S. What does the schedule S say?
Read A after it is updated.
Read B before it is updated.

Let us consider 3 schedules S1, S2, and S3. We have to check whether they are serializable with respect to S or not.
Types of Serializability
-Conflict Serializability
-View Serializability

Conflict Serializable
Any given concurrent schedule is said to be conflict serializable if and only if it is conflict
equivalent to one of the possible serial schedules.
Two operations are said to be conflicting if they have the following properties
– They belong to separate transactions.
– They access the same data item.
– At least one of them is a "write" operation.
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj respectively conflict if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if
and only if
– Both the schedules contain the same set of Transactions.
– The order of conflicting pairs of operation is maintained in both the schedules.
– If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.
– We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps
of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
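A standard way to test conflict serializability is to build a precedence graph from the conflicting pairs and check it for cycles. A small sketch of that idea follows; the encoding of a schedule as (transaction, operation, item) triples is illustrative only.

# Sketch: test conflict serializability by building a precedence graph.
# A schedule is a list of (transaction, op, item) triples; op is 'r' or 'w'.
def conflict_serializable(schedule):
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, opi, x) in enumerate(schedule):
        for tj, opj, y in schedule[i + 1:]:
            # conflicting pair: different transactions, same item, at least one write
            if ti != tj and x == y and 'w' in (opi, opj):
                edges.add((ti, tj))        # Ti must precede Tj in any equivalent serial order
    # the schedule is conflict serializable iff the precedence graph is acyclic
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False
    return not any(has_cycle(t, set(), set()) for t in txns)

# A Schedule-3-style interleaving of T1 (transfer 50) and T2 (transfer 10%):
s = [('T1','r','A'), ('T1','w','A'), ('T2','r','A'), ('T2','w','A'),
     ('T1','r','B'), ('T1','w','B'), ('T2','r','B'), ('T2','w','B')]
print(conflict_serializable(s))   # True: equivalent to the serial order T1, T2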
View Serializable
Any given concurrent schedule is said to be view serializable if and only if it is view equivalent
to one of the possible serial schedules.
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the
following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’
also transaction Ti must read the initial value of Q.
2. If in schedule S, transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S’ also transaction Ti must read the value of Q that
was produced by the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in schedule S must
also perform the final write(Q) operation in schedule S’.

CONCURRENCY CONTROL
Process of managing simultaneous execution of transactions in a shared database, to ensure
the serializability of transactions, is known as concurrency control.
• Process of managing simultaneous operations on the database without having them interfere
with one another.
• Prevents interference when two or more users are accessing database simultaneously and at least
one is updating data.
• Although two transactions may be correct in themselves, interleaving of operations may produce
an incorrect result.
• Simultaneous execution of transactions over a shared database can create several data integrity
and consistency problems.
• Lost update problem
• Temporary update problem
• Incorrect summary problem
Lost update problem
• This problem occurs when two transactions that access the same database items have
their operations interleaved in a way that makes the value of some database item
incorrect.
• Successfully completed update is overridden by another user.
Example:
• T1 withdraws £10 from an account with balx, initially £100.
• T2 deposits £100 into same account.
• Serially, final balance would be £190.
• Loss of T2's update!!
• This can be avoided by preventing T1 from reading balx until after update.
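The interleaving can be mimicked with plain variables; this toy sketch only shows why the serial result of £190 is lost, with no real locking involved.

# Toy illustration of the lost update: both transactions read balx = 100
# before either one writes, so T2's deposit is overwritten by T1's write.
balx = 100

t1_local = balx          # T1: read(balx)
t2_local = balx          # T2: read(balx), interleaved before T1 writes

t2_local += 100          # T2 deposits 100
balx = t2_local          # T2: write(balx) -> 200

t1_local -= 10           # T1 withdraws 10, still based on the stale value 100
balx = t1_local          # T1: write(balx) -> 90, T2's deposit is lost

print(balx)              # 90 instead of the correct 190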

Temporary update problem


• This problem occurs when one transaction updates a database item and then the transaction fails
for some reason. The updated item is accessed by another transaction before it is changed back to
its original value.
• Occurs when one transaction can see intermediate results of another transaction before it
has committed.

Example:
• T1 updates balx to £200 but it aborts, so balx should be back at original value of £100.
• T2 has read new value of balx (£200) and uses value as basis of £10 reduction, giving a new
balance of £190, instead of £90.
• Problem avoided by preventing T2 from reading balx until after T1 commits or aborts.

Incorrect summary problem


• If one transaction is calculating an aggregate summary function on a number of records while other
transactions are updating some of these records, the aggregate function may calculate some values
before they are updated and others after they are updated.
• Occurs when transaction reads several values but second transaction updates some of them during
execution of first.
Example:
• T6 is totaling balances of account x (£100), account y (£50), and account z (£25).
• Meantime, T5 has transferred £10 from balx to balz, so T6 now has wrong result (£10 too high).
• Problem avoided by preventing T6 from reading balx and balz until after T5 completed updates.

Concurrency control techniques
Some of the main techniques used to control the concurrent execution of transactions are based on
the concept of locking the data items.

LOCKING PROTOCOLS
A lock is a variable associated with a data item that describes the status of the item with respect to possible
operations that can be applied to it.
Locking is an operation which secures
(a) permission to Read
(b) permission to Write a data item for a transaction.
Example: Lock (X). Data item X is locked on behalf of the requesting transaction.
Unlocking is an operation which removes these permissions from the data item.
Example: Unlock (X): Data item X is made available to all other transactions.
Lock and Unlock are Atomic operations.
Conflict matrix

Lock Manager:
• Managing locks on data items.
Lock table:
• The lock manager uses it to store the identity of the transaction locking a data item, the data item,
the lock mode, and a pointer to the next data item locked. One simple way to implement a lock
table is through a linked list.
Types of lock
o Binary lock
o Read/write(shared / Exclusive) lock
Binary lock
– It can have two states (or) values: 0 and 1.
0 – unlocked
1 – locked
– If the lock value is 0, the data item can be accessed when requested.
– When the lock value is 1, the item cannot be accessed when requested.
• Lock_item(x)
B : if lock(x) = 0 ( * item is unlocked * )
then lock(x) ← 1
else begin
wait ( until lock(x) = 0 );
goto B;
end;
• Unlock_item(x)
if lock(x) = 1 ( * item is locked * )
then lock(x) ← 0
else
print (‘ item is already unlocked ‘);
end;
Read / write (shared / exclusive) lock
Read_lock
- It is also called a shared-mode lock.
- If a transaction Ti has obtained a shared-mode lock on item X, then Ti can read, but cannot write, X.
- Other transactions are also allowed to read the data item, but cannot write it.
Read_lock(x)
B : if lock(x) = “unlocked” then
begin
lock(x) ← “read_locked”;
no_of_read(x) ← 1;
end
else if lock(x) = “read_locked” then
no_of_read(x) ← no_of_read(x) + 1
else begin
wait ( until lock(x) = “unlocked” );
goto B;
end;
Write_lock(x)
B : if lock(x) = “unlocked” then
lock(x) ← “write_locked”
else if lock(x) = “write_locked” then begin
wait ( until lock(x) = “unlocked” );
goto B;
end
else begin ( * lock(x) = “read_locked” * )
wait ( until lock(x) = “unlocked” );
goto B;
end;
Unlock(x)
if lock(x) = “write_locked” then
lock(x) ← “unlocked”
else if lock(x) = “read_locked” then
begin
no_of_read(x) ← no_of_read(x) - 1;
if ( no_of_read(x) = 0 ) then
lock(x) ← “unlocked”;
end;
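The same state machine can be sketched as a small Python class. This is a single-process illustration only; a real lock manager also records which transaction holds each lock and queues the waiting transactions.

# Minimal sketch of the shared/exclusive lock states described above.
class LockEntry:
    def __init__(self):
        self.state = "unlocked"      # "unlocked", "read_locked" or "write_locked"
        self.no_of_reads = 0

    def read_lock(self):
        if self.state == "unlocked":
            self.state = "read_locked"
            self.no_of_reads = 1
            return True
        if self.state == "read_locked":
            self.no_of_reads += 1
            return True
        return False                  # write-locked: caller must wait and retry

    def write_lock(self):
        if self.state == "unlocked":
            self.state = "write_locked"
            return True
        return False                  # read- or write-locked: caller must wait

    def unlock(self):
        if self.state == "write_locked":
            self.state = "unlocked"
        elif self.state == "read_locked":
            self.no_of_reads -= 1
            if self.no_of_reads == 0:
                self.state = "unlocked"

x = LockEntry()
print(x.read_lock(), x.read_lock())   # True True: shared locks are compatible
print(x.write_lock())                 # False: exclusive lock must wait for readers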

TWO PHASE LOCKING PROTOCOL


This protocol requires that each transaction issue lock and unlock requests in two phases
• Growing phase
• Shrinking phase
Growing phase
• During this phase new locks can be acquired, but none can be released.
Shrinking phase
• A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the
transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests.
Lock point:
• The point in the schedule where the transaction has obtained its final lock (the end of its
growing phase) is called the lock point of the transaction
Other variants of two-phase locking

• Strict two phase locking protocol


• Rigorous two phase locking protocol
Strict two phase locking protocol
This protocol requires not only that locking be two phase, but also all exclusive locks taken by a
transaction be held until that transaction commits.
Rigorous two phase locking protocol
This protocol requires that all locks be held until the transaction commits.
Consider the two transaction T1 and T2
T1 : read(a1);
read(a2);
…….
read(an);
write(a1);
T2: read(a1);
read(a2);
display(a1+a2);
Lock conversion
• Lock Upgrade
• Lock Downgrade
Lock upgrade:
• Conversion of existing read lock to write lock
• Take place in only the growing phase
if Ti has a read-lock (X) and no other transaction Tj (j ≠ i) has a read-lock (X) then
convert read-lock (X) to write-lock (X)
else
force Ti to wait until the other transactions unlock X
Lock downgrade:
• conversion of existing write lock to read lock
• Take place in only the shrinking phase
If Ti has a write-lock (X) (* no other transaction can have any lock on X *), then Ti can
convert write-lock (X) to read-lock (X)
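A small sketch of how the two-phase rule itself can be enforced: once a transaction has released any lock it is in its shrinking phase, and further lock requests are refused. The class and method names are illustrative only.

# Sketch of the two-phase rule: after the first unlock (shrinking phase),
# further lock requests are refused.
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False        # False = growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True         # first unlock starts the shrinking phase
        self.locks.discard(item)

t1 = TwoPhaseTransaction("T1")
t1.lock("A"); t1.lock("B")            # growing phase
t1.unlock("A")                        # shrinking phase begins
try:
    t1.lock("C")                      # violates two-phase locking
except RuntimeError as e:
    print(e)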

Transaction State
Active:
• This is the initial state; the transaction stays in this state while it is executing.
Partially committed:
• A transaction is in this state when it has executed the final statement.
Failed:
• A transaction is in this state once the normal execution of the transaction cannot proceed.
Aborted:
• A transaction is said to be aborted when the transaction has rolled back and the database has been
restored to the consistent state prior to the start of the transaction.
Committed:
• A transaction is in the committed state once it has been successfully executed and the database is
transformed into a new consistent state.

Log
• Log is a history of actions executed by a database management system to guarantee ACID
properties over crashes or hardware failures.
• Physically, a log is a file of updates done to the database, stored in stable storage.
Log rule
– A log record for a given database update must be physically written to the log before the update is
physically written to the database.
– All other log records for a given transaction must be physically written to the log before the commit
log record for the transaction is physically written to the log.
– Commit processing for a given transaction must not complete until the commit log record for the
transaction is physically written to the log.
System log
– [begin_transaction, T]
– [write_item, T, X, old_value, new_value]
– [read_item, T, X]
– [commit, T]
– [abort, T]

TWO PHASE COMMIT PROTOCOL
• Assumes fail-stop model – failed sites simply stop working, and do not cause any other harm, such
as sending incorrect messages to other sites.
• Execution of the protocol is initiated by the coordinator after the last step of the transaction has
been reached.
• The protocol involves all the local sites at which the transaction executed
• Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci
Phase 1: Obtaining a Decision (prepare)
• Coordinator asks all participants to prepare to commit transaction Ti.


– Ci adds the records <prepare T> to the log and forces log to stable storage
– sends prepare T messages to all sites at which T executed
• Upon receiving message, transaction manager at site determines if it can commit the transaction
– if not, add a record <no T> to the log and send abort T message to Ci
– if the transaction can be committed, then:
– add the record <ready T> to the log
– force all records for T to stable storage
– send ready T message to Ci
Phase 2: Recording the Decision (commit)
• T can be committed if Ci received a ready T message from all the participating sites; otherwise T
must be aborted.
• Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto
stable storage. Once the record reaches stable storage, the decision is irrevocable (even if failures occur)
• Coordinator sends a message to each participant informing it of the decision (commit or abort)
• Participants take appropriate action locally.
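A toy sketch of the coordinator's decision logic follows. The Participant class is purely illustrative; a real system exchanges messages over the network and force-writes the <prepare T>, <ready T>, and <commit T>/<abort T> records to stable storage at each step.

# Sketch of the coordinator's decision in the two-phase commit protocol.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):                 # phase 1: vote <ready T> or <no T>
        return self.can_commit

    def decide(self, commit):          # phase 2: apply the coordinator's decision
        print(f"{self.name}: {'commit' if commit else 'abort'}")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = all(votes)                         # commit only if every site voted ready
    for p in participants:                        # phase 2: broadcast the decision
        p.decide(decision)
    return decision

two_phase_commit([Participant("S1"), Participant("S2")])            # commits
two_phase_commit([Participant("S1"), Participant("S2", False)])     # aborts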
Handling of Failures - Site Failure
When site Si recovers, it examines its log to determine the fate of
transactions active at the time of the failure.
• Log contain <commit T> record: site executes redo (T)
• Log contains <abort T> record: site executes undo (T)
• Log contains <ready T> record: site must consult Ci to determine the fate of T.
– If T committed, redo (T)
– If T aborted, undo (T)
• The log contains no control records concerning T: this implies that Sk failed before responding to the
prepare T message from Ci.
– Since the failure of Sk precludes the sending of such a
response, Ci must abort T.
– Sk must execute undo (T)

Handling of Failures- Coordinator Failure


• If coordinator fails while the commit protocol for T is executing then participating sites must decide
on T’s fate:
1. If an active site contains a <commit T> record in its log, then T must be committed.
2. If an active site contains an <abort T> record in its log, then T must be aborted.
3. If some active participating site does not contain a <ready T> record in its log, then
the failed coordinator Ci cannot have decided to commit T. Can therefore abort T.
4. If none of the above cases holds, then all active sites must have a <ready T> record in their
logs, but no additional control records (such as <abort T> or <commit T>). In this case
active sites must wait for Ci to recover, to find decision.
• Blocking problem : active sites may have to wait for failed coordinator to recover.
Handling of Failures - Network Partition
• If the coordinator and all its participants remain in one partition, the failure has no effect on
the commit protocol.
• If the coordinator and its participants belong to several partitions:
– Sites that are not in the partition containing the coordinator think the coordinator has failed,
and execute the protocol to deal with failure of the coordinator.


• No harm results, but sites may still have to wait for decision from coordinator.
• Sites that are in the same partition as the coordinator think that the sites in
the other partition have failed, and follow the usual commit protocol.
• Again, no harm results

Implementation of Isolation Levels


There are various concurrency-control policies that we can use to ensure that, even when multiple transactions
are executed concurrently, only acceptable schedules are generated, regardless of how the operating system
time-shares resources (such as CPU time) among the transactions. As a trivial example of a concurrency-
control policy, consider this: A transaction acquires a lock on the entire database before it starts and releases
the lock after it has committed. While a transaction holds a lock, no other transaction is allowed to acquire
the lock, and all must therefore wait for the lock to be released. As a result of the locking policy, only one
transaction can execute at a time.
Therefore, only serial schedules are generated. These are trivially serializable, and it is easy to verify that
they are recoverable and cascadeless as well. A concurrency-control policy such as this one leads to poor
performance, since it forces transactions to wait for preceding transactions to finish before they can start. In
other words, it provides a poor degree of concurrency. The goal of concurrency-control policies is to provide
a high degree of concurrency, while ensuring that all schedules that can be generated are conflict or view
serializable, recoverable, and cascadeless.
Locking. Instead of locking the entire database, a transaction could, instead, lock only those data items that it
accesses. We presented the two-phase locking protocol, a simple, widely used technique that ensures
serializability. Stated simply, two-phase locking requires a transaction to have two phases, one where it
acquires locks but does not release any, and a second phase where the transaction releases locks but does not
acquire any. Further improvements to locking result if we have two kinds of locks: shared and exclusive.
Shared locks are used for data that the transaction reads and exclusive locks are used for those it writes.

Timestamps
The timestamp-based techniques for implementing isolation assign each transaction a timestamp, typically when it
begins. For each data item, the system keeps two timestamps.
i. The read timestamp of a data item holds the largest timestamp of those transactions that read the data item.
ii. The write timestamp of a data item holds the timestamp of the transaction that wrote the current value of
the data item.
Timestamps are used to ensure that transactions access each data item in order of the transactions’ timestamps
if their accesses conflict. When this is not possible, offending transactions are aborted and restarted with a
new timestamp.
Multiple Versions and Snapshot Isolation
By maintaining more than one version of a data item, it is possible to allow a transaction to read an old version
of a data item rather than a newer version written by an uncommitted transaction or by a transaction that
should come later in the serialization order.
There are a variety of multiversion concurrency control techniques. One called snapshot isolation, is
widely used in practice. In snapshot isolation, we can imagine that each transaction is given its own version,
or snapshot, of the database when it begins.
It reads data from this private version and is thus isolated from the updates made by other transactions.
If the transaction updates the database, that update appears only in its own version, not in the actual database
itself. Information about these updates is saved so that the updates can be applied to the “real” database if the
transaction commits.
When a transaction T enters the partially committed state, it then proceeds to the committed state
only if no other concurrent transaction has modified data that T intends to update. Transactions that, as a
result, cannot commit abort instead. Snapshot isolation ensures that attempts to read data never need to wait.
Read-only transactions cannot be aborted; only those that modify data run a slight risk of aborting.
Since each transaction reads its own version or snapshot of the database, reading data does not cause
subsequent update attempts by other transactions to wait (unlike locking). Since most transactions are read-
only (and most others read more data than they update), this is often a major source of performance
improvement as compared to locking.
The problem with snapshot isolation is that it provides too much isolation. Consider two transactions
T and T1 . In a serializable execution, either T sees all the updates made by T1 or T1 sees all the updates
made by T, because one must follow the other in the serialization order. Under snapshot isolation, there are
cases where neither transaction sees the updates of the other. This is a situation that cannot occur in a
serializable execution. In many cases, the data accesses by the two transactions do not conflict and there is
no problem. However, if T reads some data item that T1 updates and T1 reads some data item that T updates,
it is possible that both transactions fail to read the update made by the other.
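A toy sketch of that situation (commonly called write skew) is given below. It assumes, for illustration, that the commit check only compares the write sets of concurrent transactions, which is exactly why both transactions are allowed to commit.

# Toy illustration of the anomaly described above under snapshot isolation:
# each transaction works on its own snapshot, so neither sees the other's update.
db = {"x": 1, "y": 1}

snapshot_T  = dict(db)     # T's private version
snapshot_T1 = dict(db)     # T1's private version

# T reads y and writes x based on it; T1 reads x and writes y based on it.
writes_T  = {"x": snapshot_T["y"] + 10}
writes_T1 = {"y": snapshot_T1["x"] + 10}

# Neither transaction modified an item the other *wrote*, so both are allowed to commit.
db.update(writes_T)
db.update(writes_T1)
print(db)    # {'x': 11, 'y': 11} -- a state no serial execution of T and T1 could produce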

Timestamp-Based Protocols
Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution.
There are two simple methods for implementing this scheme:
1.Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal to the
value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction’s timestamp is equal to the value of the counter when the transaction enters the system
Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp
TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj).
The protocol manages concurrent execution such that the time-stamps determine the serializability order.
In order to assure such behavior, the protocol maintains for each data Q two timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
The Timestamp-Ordering Protocol
The timestamp-ordering protocol is used to order the transactions based on their timestamps. The order of
the transactions is simply the ascending order of their creation.
The older transaction has higher priority, which is why it executes first. To determine the timestamp of a
transaction, this protocol uses the system time or a logical counter.
Lock-based protocols manage the order between conflicting pairs of transactions at execution time, whereas
timestamp-based protocols start working as soon as a transaction is created.
For example, suppose transaction T1 entered the system at time 007 and transaction T2 at time 009. T1 has
the higher priority, so it executes first, since it entered the system first.
The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write' operation on a data.
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order.
Suppose a transaction Ti issues a read(Q)
o If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.
• Hence, the read operation is rejected, and Ti is rolled back.
o If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set
to max(R-timestamp(Q), TS(Ti)).
Suppose that transaction Ti issues write(Q).
o If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously,
and the system assumed that that value would never be produced.
▪ Hence, the write operation is rejected, and Ti is rolled back.
o If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
▪ Hence, this write operation is rejected, and Ti is rolled back.
o Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
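The two checks can be sketched directly in Python; the Item class and the function names are illustrative only, and a return value of False stands for "roll back Ti and restart it with a new timestamp".

# Sketch of the timestamp-ordering checks for read(Q) and write(Q).
class Item:
    def __init__(self):
        self.r_ts = 0     # R-timestamp(Q)
        self.w_ts = 0     # W-timestamp(Q)

def read(ts, q):
    if ts < q.w_ts:                 # Q was already overwritten by a younger writer
        return False                # reject: roll back Ti
    q.r_ts = max(q.r_ts, ts)
    return True

def write(ts, q):
    if ts < q.r_ts or ts < q.w_ts:  # a younger transaction already read or wrote Q
        return False                # reject: roll back Ti
    q.w_ts = ts
    return True

q = Item()
print(write(10, q))    # True
print(read(5, q))      # False: the transaction with timestamp 5 must be rolled back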

Multiple Granularity Locking


Let's start by understanding the meaning of granularity.
Granularity: It is the size of the data item that is allowed to be locked.
Multiple Granularity:
It can be defined as hierarchically breaking up the database into blocks which can be locked. The
Multiple Granularity protocol enhances concurrency and reduces lock overhead.
It keeps track of what to lock and how to lock.
It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can
be graphically represented as a tree.
For example: Consider a tree which has four levels of nodes.
o The first (top) level represents the entire database.
o The second level represents nodes of type area. The database consists of exactly these
areas.
o Each area consists of child nodes which are known as files. No file can be present in more than
one area.
o Finally, each file contains child nodes known as records. The file has exactly those records
that are its child nodes. No record is present in more than one file.
o Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record

DEADLOCK
System is deadlocked if there is a set of transactions such that every transaction in the set is
waiting for another transaction in the set.
Consider the following two transactions:
T1: write(A)        T2: write(A)
    write(B)            write(B)

Schedule with deadlock

T1                              T2
lock-X on A
write(A)
                                lock-X on B
                                write(B)
                                wait for lock-X on A
wait for lock-X on B

Deadlock Handling
Deadlock prevention protocol
Ensure that the system will never enter into a deadlock state.
Some prevention strategies :
Approach 1
– Require that each transaction locks all its data items before it begins execution: either all are
locked in one step, or none are locked.
– Disadvantages
• Hard to predict, before the transaction begins, which data items need to be locked.
• Data-item utilization may be very low.
Approach 2
– Assign a unique timestamp to each transaction.
– These timestamps are used only to decide whether a transaction should wait or roll back.
Schemes:
- wait-die scheme
- wound-wait scheme
Wait-die scheme
- Non-preemptive technique.
- When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it
has a timestamp smaller than that of Tj; otherwise, Ti is rolled back (dies).
– An older transaction may wait for a younger one to release a data item. Younger transactions
never wait for older ones; they are rolled back instead.
– A transaction may die several times before acquiring the needed data item.
Example:
• Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
• If T1 requests a data item held by T2, then T1 will wait.
• If T3 requests a data item held by T2, then T3 will be rolled back.
Wound-wait scheme
- Preemptive technique.
- When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has
a timestamp larger than that of Tj; otherwise, Tj is rolled back.
– An older transaction wounds (forces rollback of) a younger transaction instead of waiting for it.
Younger transactions may wait for older ones.
Example:
• Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
• If T1 requests a data item held by T2, then the data item will be preempted from T2, and
T2 will be rolled back.
• If T3 requests a data item held by T2, then T3 will wait.
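Both schemes reduce to a simple timestamp comparison; a sketch follows, using illustrative function names and the convention that a smaller timestamp means an older transaction.

# Sketch of the wait-die and wound-wait decisions when the transaction with
# timestamp ts_req requests an item held by the transaction with timestamp ts_hold.
def wait_die(ts_req, ts_hold):
    return "wait" if ts_req < ts_hold else "rollback requester"

def wound_wait(ts_req, ts_hold):
    return "rollback holder" if ts_req < ts_hold else "wait"

# T1, T2, T3 have timestamps 5, 10, 15 (as in the examples above).
print(wait_die(5, 10))      # T1 requests item held by T2 -> wait
print(wait_die(15, 10))     # T3 requests item held by T2 -> rollback requester (T3 dies)
print(wound_wait(5, 10))    # T1 requests item held by T2 -> rollback holder (T2 is wounded)
print(wound_wait(15, 10))   # T3 requests item held by T2 -> wait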
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E),
– V is a set of vertices
– E is a set of edges
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to
release a data item.
• The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a
deadlock-detection algorithm periodically to look for cycles.
Wait-for graph without a cycle
Wait-for graph with a cycle
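Deadlock detection then amounts to cycle detection on the wait-for graph; a small sketch using depth-first search is shown below, with edges given as (Ti, Tj) pairs meaning Ti waits for Tj.

# Sketch: deadlock detection by looking for a cycle in the wait-for graph.
def has_deadlock(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    def dfs(node, visiting, done):
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:
                return True                     # back edge: cycle found
            if nxt not in done and dfs(nxt, visiting, done):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(t, set(), set()) for t in graph)

print(has_deadlock([("T1", "T2"), ("T2", "T3")]))               # False: no cycle
print(has_deadlock([("T1", "T2"), ("T2", "T3"), ("T3", "T1")])) # True: deadlock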

Recovery from deadlock


• The common solution is to roll back one or more transactions to break the deadlock.
• Three action need to be taken
– Selection of victim
– Rollback
– Starvation
Selection of victim
• Given the set of deadlocked transactions, we must determine which transaction to roll back to break
the deadlock.
• Choose as victim the transaction that will incur minimum cost.
Rollback
• Once we have decided that a particular transaction must be rolled back, we must determine how far
this transaction should be rolled back.
• Total rollback
• Partial rollback
Starvation
Ensure that a transaction can be picked as a victim only a finite number of times.
Intent locking
• Intent locks are put on all the ancestors of a node before that node is locked explicitly.
• If a node is locked in an intention mode, explicit locking is being done at a lower level of the tree.
Types of Intent Locking
• Intent shared lock(IS)
• Intent exclusive lock(IX)
• Shared lock (S)
• Shared Intent exclusive lock (SIX)
• Exclusive lock (X)
Intent shared lock (IS)
• If a node is locked in intent shared mode, explicit locking is being done at a lower level
of the tree, but with only shared-mode locks.
• Suppose transaction T1 reads record ra2 in file Fa. Then T1 needs to lock the database,
area A1, and Fa in IS mode, and finally lock ra2 in S mode.
Intent exclusive lock (IX)
If a node is locked in intent exclusive mode, explicit locking is being done at a lower level of the tree, with
exclusive-mode or shared-mode locks.
– Suppose transaction T2 modifies record ra9 in file Fa. Then T2 needs to lock the
database,
area A1, and Fa in IX mode, and finally lock ra9 in X mode.
Shared Intent exclusive lock (SIX)
If the node is locked in Shared Intent exclusive mode, the subtree rooted by that node is
locked explicitly in shared mode, and that explicit locking is being done at lower level with
exclusive mode.
Shared lock (S)
-T can tolerate concurrent readers but not concurrent updaters in R.
Exclusive lock (X)
-T cannot tolerate any concurrent access to R at all.
Lock compatibility


If Tran 1 holds a lock of the given type and Tran 2 requests another lock of the given type, will that
request be granted?

                    Tran 2
Tran 1    NL    IS    IX    S     SIX   X
NL        Yes   Yes   Yes   Yes   Yes   Yes
IS        Yes   Yes   Yes   Yes   Yes   No
IX        Yes   Yes   Yes   No    No    No
S         Yes   Yes   No    Yes   No    No
SIX       Yes   Yes   No    No    No    No
X         Yes   No    No    No    No    No
TRANSACTION RECOVERY
Recovery Algorithms
• Recovery algorithms are techniques to ensure database consistency and transaction atomicity
and durability despite failures
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough information exists
to recover from failures
2. Actions taken after a failure to recover the database contents to a state that ensures
atomicity, consistency and durability
Example
Begin transaction
Update Acc 1001{balance:=Balance-100};
If any error occurred then
Goto Undo;
End if;
Update Acc 1002{balance:=balance+100};
If any error occurred then
Goto undo;
End if;
Commit;
Goto finish;
Undo: rollback;
Finish: return;
Requirement for recovery
• Implicit rollback
• Message handling
• Recovery log
• Statement atomicity
• No nested transaction
Transaction recovery
Database updates are kept in buffer in main memory and not physically written to disk until commit.
System recovery
Local failures – affect only the transaction in which the failure has actually occurred.
Global failures – affect all the transactions in progress at the time of failure.
System failure – does not physically damage the DB. Eg: power shutdown.
Media failure – causes damage to the DB. Eg: head crash.
ARIES Recovery Algorithm
• ARIES-Algorithm for Recovery and Isolation Exploiting Semantics
• ARIES recovery involves three passes
Analysis pass: Determines the REDO and UNDO lists.
Redo pass: Repeats history, redoing all actions from REDO List
Undo pass: Rolls back all incomplete transactions
• Suppose the system failure occurred at time Tf; the most recent checkpoint prior to Tf was taken
at time Tc.
• Start with two lists of transactions, the UNDO list and the REDO list.
• Search forward through the log, starting from the checkpoint.
• If a begin-transaction log record is found for a transaction T, add T to the UNDO list.
• If a commit log record is found for a transaction T, add T to the REDO list.
• When the end of the log is reached, the UNDO and REDO lists are identified, e.g.:
UNDO list: T3, T5      REDO list: T2, T4
SAVE POINTS
• It is possible for a transaction to create a savepoint.
• It is used to store intermediate results,
so that the transaction can roll back to a previously established savepoint whenever any recovery process starts.
Create: Savepoint <savepoint_name>;
Rollback: Rollback to <savepoint_name>;
Drop: Release <savepoint_name>;
SQL
COMMIT: Used to make the changes permanent in the database.
SAVEPOINT: Used to create a savepoint or a reference point.


ROLLBACK: Similar to the undo operation.


Example:
SQL> select * from customer;
CUSTID PID QUANTITY
---------- ---------- ----------
100 1234 10
101 1235 15
102 1236 15
103 1237 10
SQL> savepoint s1;
Savepoint created.
SQL> Delete from customer where custid=103;
1 row deleted.
SQL> select * from customer;
CUSTID PID QUANTITY
---------- ---------- ----------
100 1234 10
101 1235 15
102 1236 15
SQL> rollback to s1;
Rollback complete.
SQL> select * from customer;
CUSTID PID QUANTITY
---------- ---------- ----------
100 1234 10
101 1235 15
102 1236 15
103 1237 10
SQL> commit;
ISOLATION LEVEL
• Degree of interference
• An isolation-level mechanism is used to isolate each transaction in a multi-user environment.
• Dirty Reads: This situation occurs when transactions read data that has not been committed.
• Nonrepeatable Reads: This situation occurs when a transaction runs the same query multiple
times and the results are not the same each time.
• Phantoms: This situation occurs when a row of data matches the first time but does not match
subsequent times.
Types
Higher isolation level (Repeatable read)
– Less interference
– Lower concurrency
– All schedules are serializable
Lower isolation level (Cursor stability)
– More interference
– Higher concurrency
– Schedules are not necessarily serializable
One special problem that can occur if a transaction operates at less than the maximum isolation level (i.e.,
less than the repeatable read level) is called the phantom problem.

NO-UNDO/REDO Recovery Based on Deferred Update

The idea behind deferred update is to defer or postpone any actual updates to the database on disk until the
transaction completes its execution successfully and reaches its commit point.

During transaction execution, the updates are recorded only in the log and in the cache buffers. After the
transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database.
If a transaction fails before reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way. Therefore, only REDO-type log entries are needed
in the log, which include the new value (AFIM) of the item written by a write operation. The UNDO-type log
entries are not needed since no undoing of operations will be required during recovery. Although this may
simplify the recovery process, it cannot be used in practice unless transactions are short and each transaction changes
few items. For other types of transactions, there is the potential for running out of buffer space because transaction
changes must be held in the cache buffers until the commit point.

We can state a typical deferred update protocol as follows:

1. A transaction cannot change the database on disk until it reaches its commit point.

2. A transaction does not reach its commit point until all its REDO-type log entries are recorded in the
log and the log buffer is force-written to disk.

Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL) protocol. Because the
database is never updated on disk until after the transaction commits, there is never a need to UNDO any
operations. REDO is needed in case the system fails after a transaction commits but before all its changes are
recorded in the database on disk. In this case, the transaction operations are redone from the log entries during
recovery.

For multiuser systems with concurrency control, the concurrency control and recovery processes are interrelated.
Consider a system in which concurrency control uses strict two-phase locking, so the locks on items remain in
effect until the transaction reaches its commit point. After that, the locks can be released. This ensures strict and
serializable schedules. Assuming that [checkpoint] entries are included in the log, a possible recovery algorithm
for this case, which we call RDU_M (Recovery using Deferred Update in a Multiuser environment), is given
next.
Procedure RDU_M (NO-UNDO/REDO with checkpoints). Use two lists of transactions maintained by the
system: the committed transactions T since the last checkpoint (commit list), and the active
transactions T (active list). REDO all the WRITE operations of the committed transactions from the log, in the
order in which they were written into the log. The transactions that are active and did not commit are effectively
canceled and must be resubmitted.

The REDO procedure is defined as follows:

Procedure REDO (WRITE_OP). Redoing a write_item operation WRITE_OP consists of examining its log
entry [write_item, T, X, new_value] and setting the value of item X in the database to new_value, which is the
after image (AFIM).

Figure 23.2 illustrates a timeline for a possible schedule of executing transactions. When the checkpoint was
taken at time t1, transaction T1 had committed, whereas transactions T3 and T4 had not. Before the system crash
at time t2, T3 and T2 were committed but not T4 and T5. According to the RDU_M method, there is no need to
redo the write_item operations of transaction T1—or any transactions committed before the last checkpoint
time t1. The write_item operations of T2 and T3 must be redone, however, because both transactions reached their
commit points after the last checkpoint. Recall that the log is force-written before committing a transaction.
Transactions T4 and T5 are ignored: They are effectively canceled or rolled back because none of
their write_item operations were recorded in the database on disk under the deferred update protocol.

We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that, if a data item X has been
updated—as indicated in the log entries—more than once by committed transactions since the last checkpoint, it
is only necessary to REDO the last update of X from the log during recovery because the other updates would
be overwritten by this last REDO. In this case, we start from the end of the log; then, whenever an item is redone,
it is added to a list of redone items. Before REDO is applied to an item, the list is checked; if the item appears on
the list, it is not redone again, since its last value has already been recovered.
If a transaction is aborted for any reason (say, by the deadlock detection method), it is simply resubmitted, since
it has not changed the database on disk. A drawback of the method described here is that it limits the concurrent
execution of transactions because all write-locked items remain locked until the transaction reaches its
commit point. Additionally, it may require excessive buffer space to hold all updated items until the transactions
commit. The method’s main benefit is that transaction operations never need to be undone, for two reasons:

1. A transaction does not record any changes in the database on disk until after it reaches its commit point—
that is, until it completes its execution success-fully. Hence, a transaction is never rolled back because of failure
during transaction execution.

2. A transaction will never read the value of an item that is written by an uncommitted transaction, because
items remain locked until a transaction reaches its commit point. Hence, no cascading rollback will occur.
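A much-simplified Python sketch of the RDU_M idea is given below, using an illustrative in-memory log of tuples rather than a real log format; it also applies the backward-scan optimization described above, so each item is redone at most once.

# Sketch of NO-UNDO/REDO recovery (RDU_M) over a simple log of tuples.
# Log entries: ("write", T, X, new_value) and ("commit", T).  Uncommitted
# transactions are simply ignored, since deferred update never wrote them to disk.
def redo_recovery(log, database):
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    redone = set()
    # scan backwards so only the *last* committed write of each item is applied
    for op, t, *rest in reversed(log):
        if op == "write" and t in committed:
            x, new_value = rest
            if x not in redone:
                database[x] = new_value      # reapply the after image (AFIM)
                redone.add(x)
    return database

log = [("write", "T2", "A", 80), ("write", "T3", "A", 70),
       ("commit", "T2"), ("commit", "T3"),
       ("write", "T4", "B", 99)]                   # T4 never committed
print(redo_recovery(log, {"A": 100, "B": 50}))     # {'A': 70, 'B': 50}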
Recovery Techniques Based on Immediate Update
In these techniques, when a transaction issues an update command, the database on disk can be
updated immediately, without any need to wait for the transaction to reach its commit point. Notice that it is not
a requirement that every update be
applied immediately to disk; it is just possible that some updates are applied to disk before the transaction
commits.
Provisions must be made for undoing the effect of update operations that have been applied to the database by
a failed transaction. This is accomplished by rolling back the transaction and undoing the effect of the
transaction’s write_item operations. Therefore, the UNDO-type log entries, which include the old
value (BFIM) of the item, must be stored in the log. Because UNDO can be needed during recovery, these
methods follow a steal strategy for deciding when updated main memory buffers can be written back to disk
(see Section 23.1.3). Theoretically, we can distinguish two main categories of immediate update algorithms. If
the recovery technique ensures that all updates of a transaction are recorded in the database on disk before the
transaction commits, there is never a need to REDO any operations of committed transactions. This is called
the UNDO/NO-REDO recovery algorithm. In this method, all updates by a transaction must be recorded on
disk before the transaction commits, so that REDO is never needed. Hence, this method must utilize the force
strategy for deciding when updated main memory buffers are written back to disk (see Section 23.1.3).
If the transaction is allowed to commit before all its changes are written to the database, we have the most
general case, known as the UNDO/REDO recovery algorithm. In this case, the steal/no-force strategy is
applied (see Section 23.1.3). This is also the most complex technique. We will outline an UNDO/REDO recovery
algorithm and leave it as an exercise for the reader to develop the UNDO/NO-REDO variation. In Section 23.5,
we describe a more practical approach known as the ARIES recovery technique.
When concurrent execution is permitted, the recovery process again depends on the protocols used for
concurrency control. The procedure RIU_M (Recovery using Immediate Updates for a Multiuser environment)
outlines a recovery algorithm for concurrent transactions with immediate update (UNDO/REDO recovery).
Assume that the log includes checkpoints and that the concurrency control protocol produces strict schedules—
as, for example, the strict two-phase locking protocol does. Recall that a strict schedule does not allow a
transaction to read or write an item unless the transaction that last wrote the item has committed (or aborted and
rolled back). However, deadlocks can occur in strict two-phase locking, thus requiring abort and UNDO of
transactions. For a strict schedule, UNDO of an operation requires changing the item back to its old value
(BFIM).
Procedure RIU_M (UNDO/REDO with checkpoints).
1. Use two lists of transactions maintained by the system: the committed transactions since the last
checkpoint and the active transactions.

2. Undo all the write_item operations of the active (uncommitted) transactions, using the UNDO procedure.
The operations should be undone in the reverse of the order in which they were written into the log.
3. Redo all the write_item operations of the committed transactions from the log, in the order in which they
were written into the log, using the REDO procedure defined earlier.
The UNDO procedure is defined as follows:

Procedure UNDO (WRITE_OP). Undoing a write_item operation WRITE_OP consists of examining its log
entry [write_item, T, X, old_value, new_value] and setting the value of item X in the database to old_value,
which is the before image (BFIM). Undoing a number of write_item operations from one or more transactions
from the log must proceed in the reverse order from the order in which the operations were written in the log.
As we discussed for the NO-UNDO/REDO procedure, step 3 is more efficiently done by starting from the end
of the log and redoing only the last update of each item X.
Whenever an item is redone, it is added to a list of redone items and is not redone again. A similar procedure can
be devised to improve the efficiency of step 2 so that an item can be undone at most once during recovery. In
this case, the earliest UNDO is applied first by scanning the log in the forward direction (starting from the
beginning of the log). Whenever an item is undone, it is added to a list of undone items and is not undone again.
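A correspondingly simplified sketch of the UNDO/REDO (RIU_M) idea follows, over the same kind of illustrative in-memory log, now carrying both the before image and the after image in each write record.

# Sketch of UNDO/REDO recovery (RIU_M) over a simplified in-memory log.
# Log entries: ("write", T, X, old_value, new_value) and ("commit", T).
def undo_redo_recovery(log, database):
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    # 1. UNDO the active (uncommitted) transactions, scanning the log backwards
    for entry in reversed(log):
        if entry[0] == "write" and entry[1] not in committed:
            _, _, x, old_value, _ = entry
            database[x] = old_value            # restore the before image (BFIM)
    # 2. REDO the committed transactions, in the order the writes were logged
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, x, _, new_value = entry
            database[x] = new_value            # reapply the after image (AFIM)
    return database

log = [("write", "T1", "A", 100, 50), ("commit", "T1"),
       ("write", "T2", "B", 20, 99)]           # T2 was still active at the crash
print(undo_redo_recovery(log, {"A": 50, "B": 99}))   # {'A': 50, 'B': 20}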
Shadow Paging
This recovery scheme does not require the use of a log in a single-user environment. In a multiuser environment,
a log may be needed for the concurrency control method. Shadow paging considers the database to be made up
of a number of fixed-size disk pages (or disk blocks)—say, n—for recovery purposes.
A directory with n entries is constructed, where the ith entry points to the ith database page on disk. The
directory is kept in main memory if it is not too large, and all references—reads or writes—to database pages on
disk go through it. When a transaction begins executing, the current directory—whose entries point to the most
recent or current database pages on disk—is copied into a shadow directory. The shadow directory is then saved
on disk while the current directory is used by the transaction.

During transaction execution, the shadow directory is never modified. When a write_item operation is
performed, a new copy of the modified database page is created, but the old copy of that page is not
overwritten. Instead, the new page is written elsewhere—on some previously unused disk block. The current
directory entry is modified to point to the new disk block, whereas the shadow directory is not modified and
continues to point to the old unmodified disk block. Figure 23.4 illustrates the concepts of shadow and current
directories. For pages updated by the transaction, two versions are kept. The old version is referenced by the
shadow directory and the new version by the current directory.
To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to
discard the current directory. The state of the database before transaction execution is available through the
shadow directory, and that state is recovered by reinstating the shadow directory. The database thus is returned
to its state prior to the transaction that was executing when the crash occurred, and any modified pages are
discarded. Committing a transaction corresponds to discarding the previous shadow directory. Since recovery
involves neither undoing nor redoing data items, this technique can be categorized as a NO-UNDO/NO-
REDO technique for recovery.
In a multiuser environment with concurrent transactions, logs and checkpoints must be incorporated into the
shadow paging technique. One disadvantage of shadow paging is that the updated database pages change location
on disk. This makes it difficult to keep related database pages close together on disk without complex storage
management strategies. Furthermore, if the directory is large, the overhead of writing shadow directories to disk
as transactions commit is significant. A further complication is how to handle garbage collection when a
transaction commits. The old pages referenced by the shadow directory that have been updated must be released
and added to a list of free pages for future use. These pages are no longer needed after the transaction commits.
Another issue is that the operation to migrate between current and shadow directories must be implemented as
an atomic operation.
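A toy sketch of the shadow/current directory bookkeeping is given below; the block numbers and page contents are illustrative only, and commit/abort simply select which directory survives.

# Sketch of shadow paging: the shadow directory keeps pointing at the old
# pages, while the current directory is updated to point at new copies.
pages = {0: "page0-v0", 1: "page1-v0"}        # disk blocks, keyed by block number
shadow_dir  = {0: 0, 1: 1}                     # saved on disk at transaction start
current_dir = dict(shadow_dir)                 # used by the running transaction

def write_page(i, new_contents):
    new_block = max(pages) + 1                 # pick a previously unused block
    pages[new_block] = new_contents            # the old copy is never overwritten
    current_dir[i] = new_block                 # only the current directory changes

write_page(1, "page1-v1")

def commit():                                  # discard the shadow directory
    return current_dir

def abort():                                   # reinstate the shadow directory
    return shadow_dir

print([pages[b] for b in commit().values()])   # ['page0-v0', 'page1-v1']
print([pages[b] for b in abort().values()])    # ['page0-v0', 'page1-v0']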
ARIES Algorithm
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log
(WAL) protocol. Every update operation writes a log record which is one of the following :
1. Undo-only log record: Only the before image is logged. Thus, an undo operation can be done to retrieve
the old data.
2. Redo-only log record: Only the after image is logged. Thus, a redo operation can be attempted.
3. Undo-redo log record: Both before images and after images are logged.
In it, every log record is assigned a unique and monotonically increasing log sequence number (LSN).
Every data page has a page LSN field that is set to the LSN of the log record corresponding to the last
update on the page. WAL requires that the log record corresponding to an update make it to stable storage
before the data page corresponding to that update is written to disk. For performance reasons, each log
write is not immediately forced to disk. A log tail is maintained in main memory to buffer log writes. The
log tail is flushed to disk when it gets full. A transaction cannot be declared committed until the commit
log record makes it to disk.
Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint record
contains the transaction table and the dirty page table. A master log record is maintained separately, in
stable storage, to store the LSN of the latest checkpoint record that made it to disk. On restart, the recovery
subsystem reads the master log record to find the checkpoint’s LSN, reads the checkpoint record, and
starts recovery from there on. The recovery process actually consists of 3 phases:
1. Analysis: The recovery subsystem determines the earliest log record from which the next pass must
start. It also scans the log forward from the checkpoint record to construct a snapshot of what the system
looked like at the instant of the crash.
2. Redo: Starting at the earliest LSN, the log is read forward and each update redone.
3. Undo: The log is scanned backward and updates corresponding to loser transactions are undone.
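A much-simplified sketch of the three passes follows. It ignores checkpoints, the dirty page table, the master record, and compensation log records, and uses an illustrative dictionary format for log records and pages; it only shows the repeat-history-then-undo-losers structure.

# Compact sketch of the three ARIES passes over a log of records
# {"lsn": n, "txn": T, "type": "update"/"commit", "page": P, "undo": old, "redo": new}.
def aries_recover(log, pages):
    # 1. Analysis: find loser transactions (no commit record reached the log)
    committed = {r["txn"] for r in log if r["type"] == "commit"}
    losers = {r["txn"] for r in log if r["type"] == "update" and r["txn"] not in committed}
    # 2. Redo: repeat history -- reapply every update whose LSN is newer than the page LSN
    for r in log:
        if r["type"] == "update" and pages[r["page"]]["lsn"] < r["lsn"]:
            pages[r["page"]] = {"value": r["redo"], "lsn": r["lsn"]}
    # 3. Undo: scan backwards and roll back the losers' updates
    #    (a real system would also write a compensation log record here)
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] in losers:
            pages[r["page"]] = {"value": r["undo"], "lsn": r["lsn"]}
    return pages

log = [{"lsn": 1, "txn": "T1", "type": "update", "page": "P1", "undo": 5, "redo": 6},
       {"lsn": 2, "txn": "T1", "type": "commit"},
       {"lsn": 3, "txn": "T2", "type": "update", "page": "P2", "undo": 7, "redo": 8}]
print(aries_recover(log, {"P1": {"value": 5, "lsn": 0}, "P2": {"value": 7, "lsn": 0}}))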
