DBMS-Unit 3
TRANSACTION CONCEPTS
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.
Schedule 4
The following concurrent schedule does not preserve the value of (A + B ).
SERIALIZABILITY
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that the instructions of one transaction are interleaved with those of some
other transaction.
• Serializability is the classical concurrency scheme.
• It ensures that a schedule for executing concurrent transactions is equivalent to one that executes
the transactions serially in some order.
Serializable schedule
If a schedule is equivalent to some serial schedule then that schedule is called Serializable
schedule
Let us consider a schedule S. What does the schedule S say?
• A is read after it is updated.
• B is read before it is updated.
Now let us consider three schedules S1, S2, and S3. We have to check whether each of them is serializable
with respect to S or not.
Types of Serializability
-Conflict Serializability
-View Serializability
Conflict Serializable
Any given concurrent schedule is said to be Conflict Serializable if and only if it is conflict
equivalent to one of the possible serial schedules.
Two operations are said to be conflicting if they have the following properties
– They belong to different transactions.
– They access the same data item.
– At least one of them is a "write" operation.
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj, respectively, conflict if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if
and only if
– Both the schedules contain the same set of Transactions.
– The order of conflicting pairs of operations is maintained in both the schedules.
– If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.
– We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps
of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.
Schedule 3:
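The conflict-serializability test is usually carried out by building a precedence graph. The following Python
sketch (illustrative only; the schedule representation and function names are assumptions made for this
example) builds the precedence graph for a schedule and checks it for cycles:

from collections import defaultdict

def is_conflict_serializable(schedule):
    """Return True if the schedule is conflict serializable.

    Two operations conflict when they belong to different transactions,
    access the same data item, and at least one of them is a write.
    For every conflicting pair (earlier op of Ti, later op of Tj) we add
    an edge Ti -> Tj; the schedule is conflict serializable iff the
    resulting precedence graph is acyclic.
    """
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and 'write' in (op_i, op_j):
                edges[ti].add(tj)

    # Cycle check by depth-first search.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def has_cycle(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    transactions = {t for t, _, _ in schedule}
    return not any(colour[t] == WHITE and has_cycle(t) for t in transactions)

# Example: T1 and T2 both read and write A, fully interleaved.
schedule_bad = [('T1', 'read', 'A'), ('T2', 'read', 'A'),
                ('T1', 'write', 'A'), ('T2', 'write', 'A')]
print(is_conflict_serializable(schedule_bad))   # False (cycle T1 <-> T2)

When the precedence graph is acyclic, any topological ordering of the transactions gives an equivalent serial schedule.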
View Serializable
Any given concurrent schedule is said to be View Serializable if and only if it is view equivalent
to one of the possible serial schedules.
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the
following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’
also transaction Ti must read the initial value of Q.
2. If in schedule S, transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S’ also transaction Ti must read the value of Q that
was produced by the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in schedule S must
also perform the final write(Q) operation in schedule S’.
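The three conditions above can be checked mechanically. The following Python sketch (an illustration, not
part of the original notes; a schedule is assumed to be a list of (transaction, operation, item) tuples) compares
the initial reads, the reads-from relationship and the final writes of two schedules:

def view_equivalent(s1, s2):
    """Return True if schedules s1 and s2 (same transactions) are view equivalent."""
    def view(schedule):
        last_writer = {}            # data item -> transaction that last wrote it so far
        initial_reads, reads_from = set(), set()
        for t, op, item in schedule:
            if op == "read":
                if item in last_writer:
                    reads_from.add((t, item, last_writer[item]))   # condition 2
                else:
                    initial_reads.add((t, item))                   # condition 1
            elif op == "write":
                last_writer[item] = t
        final_writes = set(last_writer.items())                    # condition 3
        return initial_reads, reads_from, final_writes
    return view(s1) == view(s2)

serial      = [("T1", "read", "Q"), ("T1", "write", "Q"), ("T2", "write", "Q")]
interleaved = [("T1", "read", "Q"), ("T2", "write", "Q"), ("T1", "write", "Q")]
print(view_equivalent(interleaved, serial))   # False: the final write(Q) differs (condition 3)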
CONCURRENCY CONTROL
Process of managing simultaneous execution of transactions in a shared database, to ensure
the serializability of transactions, is known as concurrency control.
• Process of managing simultaneous operations on the database without having them interfere
with one another.
• Prevents interference when two or more users are accessing database simultaneously and at least
one is updating data.
• Although two transactions may be correct in themselves, interleaving of operations may produce
an incorrect result.
• Simultaneous execution of transactions over a shared database can create several data integrity
and consistency problems.
• Lost update problem
• Temporary update (dirty read) problem
• Incorrect summary problem
Lost update problem
• This problem occurs when two transactions that access the same database items have
their operations interleaved in a way that makes the value of some database item
incorrect.
• Successfully completed update is overridden by another user.
Example:
• T1 withdraws £10 from an account with balx, initially £100.
• T2 deposits £100 into same account.
• Serially, final balance would be £190.
• Loss of T2's update!!
• This can be avoided by preventing T1 from reading balx until after update.
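A minimal Python sketch of the interleaving described above (values are only illustrative):

# Hypothetical sketch of the lost update anomaly on balx (initially 100).
balx = 100

# Interleaved (incorrect) execution:
t1_read = balx            # T1 reads 100
t2_read = balx            # T2 reads 100 before T1 writes
balx = t1_read - 10       # T1 writes 90
balx = t2_read + 100      # T2 writes 200 -- T1's withdrawal is lost
print(balx)               # 200, not the correct serial result of 190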
Temporary update (dirty read) problem
Example:
• T1 updates balx to £200 but it aborts, so balx should be back at original value of £100.
• T2 has read new value of balx (£200) and uses value as basis of £10 reduction, giving a new
balance of £190, instead of £90.
• Problem avoided by preventing T2 from reading balx until after T1 commits or aborts.
LOCKING PROTOCOLS
A lock is a variable associated with a data item that describes the status of the item with respect to the
possible operations that can be applied to it.
Locking is an operation which secures
(a) permission to Read
(b) permission to Write a data item for a transaction.
Example: Lock (X). Data item X is locked on behalf of the requesting transaction.
Unlocking is an operation which removes these permissions from the data item.
Example: Unlock (X): Data item X is made available to all other transactions.
Lock and Unlock are Atomic operations.
Conflict matrix
The lock conflict (compatibility) matrix below shows when a requested lock can be granted while another
transaction already holds a lock on the same item:
                   Read lock held    Write lock held
Read requested     Yes               No
Write requested    No                No
Lock Manager:
• Managing locks on data items.
Lock table:
• Lock manager uses it to store the identity of the transaction locking a data item, the data item,
the lock mode and a pointer to the next data item locked. One simple way to implement a lock
table is through a linked list.
Types of lock
o Binary lock
o Read/write(shared / Exclusive) lock
Binary lock
– It can have two states (or) values: 0 and 1.
0 – unlocked
1 – locked
– If the lock value is 0, the data item can be accessed when requested.
– When the lock value is 1, the item cannot be accessed when requested.
• Lock_item(x)
B : if lock(x) = 0 ( * item is unlocked * )
    then lock(x) ← 1 ( * lock the item * )
    else begin
        wait ( until lock(x) = 0 );
        goto B;
    end;
• Unlock_item(x)
    if lock(x) = 1 ( * item is locked * )
    then lock(x) ← 0 ( * unlock the item * )
    else printf (‘ item is already unlocked ‘);
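A minimal executable sketch of the same binary-lock idea in Python, assuming the lock table is a dictionary
and using a condition variable instead of the busy-wait of the pseudocode (all names are illustrative):

import threading

# Hypothetical binary lock table: item name -> 0 (unlocked) or 1 (locked).
lock_value = {}
cond = threading.Condition()

def lock_item(x):
    """Block until item x is unlocked, then lock it (value 0 -> 1)."""
    with cond:
        while lock_value.get(x, 0) == 1:   # corresponds to "wait until lock(x) = 0"
            cond.wait()
        lock_value[x] = 1

def unlock_item(x):
    """Unlock item x (value 1 -> 0) and wake up waiting transactions."""
    with cond:
        if lock_value.get(x, 0) == 0:
            print("item is already unlocked")
        else:
            lock_value[x] = 0
            cond.notify_all()

lock_item("X")
unlock_item("X")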
Read / write(shared/exclusive) lock
Read_lock
- It is also called a shared-mode lock.
- If a transaction Ti has obtained a shared-mode lock on item X, then Ti can read, but cannot write, X.
- Other transactions are also allowed to read the data item but cannot write it.
Read_lock(x)
B : if lock(x) = “unlocked” then
        begin
            lock(x) ← “read_locked”;
            no_of_read(x) ← 1;
        end
    else if lock(x) = “read_locked” then
        no_of_read(x) ← no_of_read(x) + 1
    else begin
            wait ( until lock(x) = “unlocked” );
            goto B;
         end;
Write_lock(x)
B : if lock(x) = “unlocked” then
        lock(x) ← “write_locked”
    else if lock(x) = “write_locked” then
        begin
            wait ( until lock(x) = “unlocked” );
            goto B;
        end
    else ( * lock(x) = “read_locked” * )
        begin
            wait ( until lock(x) = “unlocked” );
            goto B;
        end;
Unlock(x)
    if lock(x) = “write_locked” then
        lock(x) ← “unlocked”
    else if lock(x) = “read_locked” then
        begin
            no_of_read(x) ← no_of_read(x) - 1;
            if no_of_read(x) = 0 then
                lock(x) ← “unlocked”;
        end;
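The shared/exclusive locking operations above can be sketched in Python as follows (illustrative only; one
lock-table entry per data item is assumed):

import threading

# Hypothetical shared/exclusive lock table entry for a single data item.
class RWLockEntry:
    def __init__(self):
        self.cond = threading.Condition()
        self.state = "unlocked"      # "unlocked", "read_locked" or "write_locked"
        self.no_of_reads = 0

    def read_lock(self):
        with self.cond:
            while self.state == "write_locked":
                self.cond.wait()
            self.state = "read_locked"
            self.no_of_reads += 1

    def write_lock(self):
        with self.cond:
            while self.state != "unlocked":
                self.cond.wait()
            self.state = "write_locked"

    def unlock(self):
        with self.cond:
            if self.state == "write_locked":
                self.state = "unlocked"
            elif self.state == "read_locked":
                self.no_of_reads -= 1
                if self.no_of_reads == 0:
                    self.state = "unlocked"
            self.cond.notify_all()

entry = RWLockEntry()
entry.read_lock()       # first reader
entry.read_lock()       # second reader may share the lock
entry.unlock()
entry.unlock()          # item becomes unlocked again
entry.write_lock()      # single writer
entry.unlock()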
Transaction State
Active:
This is the initial state, the transaction stays in this state while it is executing.
Partially committed:
A transaction is in this state when it has executed the final statement.
Failed:
A transaction is in this state once the normal execution of the transaction cannot proceed.
Aborted:
A transaction is said to be aborted when the transaction has rolled back and the database is being
restored to the consistent state prior to the start of the transaction.
Committed:
A transaction is in the committed state once it has been successfully executed and the database is
transformed into a new consistent state.
Log
• Log is a history of actions executed by a database management system to guarantee ACID
properties over crashes or hardware failures.
• Physically, a log is a file of updates done to the database, stored in stable storage.
Log rule
– A log record for a given database update must be physically written to the log before the update is
physically written to the database.
– All other log records for a given transaction must be physically written to the log before the commit
log record for the transaction is physically written to the log.
– Commit processing for a given transaction must not complete until the commit log record for the
transaction is physically written to the log.
System log
– [begin_transaction, T]
– [write_item, T, X, old_value, new_value]
– [read_item, T, X]
– [commit, T]
– [abort, T]
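For example, a transaction T1 that transfers 50 from account A (old value 1000) to account B (old value 2000)
would produce log entries of the following form (values chosen only for illustration):
[begin_transaction, T1]
[read_item, T1, A]
[write_item, T1, A, 1000, 950]
[read_item, T1, B]
[write_item, T1, B, 2000, 2050]
[commit, T1]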
Two-Phase Commit Protocol (2PC)
• Assumes fail-stop model – failed sites simply stop working, and do not cause any other harm, such
as sending incorrect messages to other sites.
• Execution of the protocol is initiated by the coordinator after the last step of the transaction has
been reached.
• The protocol involves all the local sites at which the transaction executed
• Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci
Phase 1: Obtaining a Decision (prepare)
• Coordinator asks all participants to prepare to commit transaction Ti.
– Ci adds the records <prepare T> to the log and forces log to stable storage
– sends prepare T messages to all sites at which T executed
• Upon receiving message, transaction manager at site determines if it can commit the transaction
– if not, add a record <no T> to the log and send abort T message to Ci
– if the transaction can be committed, then:
– add the record <ready T> to the log
– force all records for T to stable storage
– send ready T message to Ci
Phase 2: Recording the Decision (commit)
• T can be committed if Ci received a ready T message from all the participating sites; otherwise T
must be aborted.
• Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto
stable storage. Once the record reaches stable storage, the decision is irrevocable (even if failures occur).
• Coordinator sends a message to each participant informing it of the decision (commit or abort)
• Participants take appropriate action locally.
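The two phases can be summarized in the following Python sketch of the coordinator's logic (illustrative
only; the participant objects and their prepare/commit/abort methods are assumptions made for this example):

# Hypothetical sketch of the coordinator side of two-phase commit.
def two_phase_commit(coordinator_log, participants, T):
    # Phase 1: obtaining a decision.
    coordinator_log.append(("prepare", T))               # <prepare T>, forced to stable storage
    votes = [site.prepare(T) for site in participants]   # each site replies "ready" or "no"

    # Phase 2: recording and propagating the decision.
    if all(v == "ready" for v in votes):
        coordinator_log.append(("commit", T))            # <commit T> forced before replying
        for site in participants:
            site.commit(T)
        return "committed"
    else:
        coordinator_log.append(("abort", T))             # <abort T>
        for site in participants:
            site.abort(T)
        return "aborted"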
Handling of Failures - Site Failure
When site Si recovers, it examines its log to determine the fate of
transactions active at the time of the failure.
• Log contain <commit T> record: site executes redo (T)
• Log contains <abort T> record: site executes undo (T)
• Log contains <ready T> record: site must consult Ci to determine the fate of T.
– If T committed, redo (T)
– If T aborted, undo (T)
• The log contains no control records concerning T: this implies that Sk failed before responding to the
prepare T message from Ci.
– Since the failure of Sk precludes the sending of such a response, Ci must abort T.
– Sk must execute undo (T).
Handling of Failures - Network Partition
• Sites that are not in the partition containing the coordinator think the coordinator has failed. No harm
results, but those sites may still have to wait for the decision from the coordinator.
• The coordinator and the sites that are in the same partition as the coordinator think that the sites in
the other partition have failed, and follow the usual commit protocol.
• Again, no harm results.
Timestamps
The techniques for the implementation of isolation assigns each transaction a timestamp, typically when it
begins. For each data item, the system keeps two timestamps.
1. The read timestamp of a data item holds the largest timestamp of those transactions that read the data item.
2. The write timestamp of a data item holds the timestamp of the transaction that wrote the current value of
the data item.
Timestamps are used to ensure that transactions access each data item in order of the transactions’ timestamps
if their accesses conflict. When this is not possible, offending transactions are aborted and restarted with a
new timestamp.
Multiple Versions and Snapshot Isolation
By maintaining more than one version of a data item, it is possible to allow a transaction to read an old version
of a data item rather than a newer version written by an uncommitted transaction or by a transaction that
should come later in the serialization order.
There are a variety of multiversion concurrency control techniques. One called snapshot isolation, is
widely used in practice. In snapshot isolation, we can imagine that each transaction is given its own version,
or snapshot, of the database when it begins.
It reads data from this private version and is thus isolated from the updates made by other transactions.
If the transaction updates the database, that update appears only in its own version, not in the actual database
itself. Information about these updates is saved so that the updates can be applied to the “real” database if the
transaction commits.
When a transaction T enters the partially committed state, it then proceeds to the committed state
only if no other concurrent transaction has modified data that T intends to update. Transactions that, as a
result, cannot commit abort instead. Snapshot isolation ensures that attempts to read data never need to wait.
Read-only transactions cannot be aborted; only those that modify data run a slight risk of aborting.
Since each transaction reads its own version or snapshot of the database, reading data does not cause
subsequent update attempts by other transactions to wait (unlike locking). Since most transactions are read-
only (and most others read more data than they update), this is often a major source of performance
improvement as compared to locking.
The problem with snapshot isolation is that it provides too much isolation. Consider two transactions
T and T1 . In a serializable execution, either T sees all the updates made by T1 or T1 sees all the updates
made by T, because one must follow the other in the serialization order. Under snapshot isolation, there are
cases where neither transaction sees the updates of the other. This is a situation that cannot occur in a
serializable execution. In many cases, the data accesses by the two transactions do not conflict and there is
no problem. However, if T reads some data item that T1 updates and T1 reads some data item that T updates,
it is possible that both transactions fail to read the update made by the other.
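The following Python sketch illustrates this anomaly (often called write skew); the on-call scenario and all
names are assumptions made only for this example:

# Two doctors, each covered by the constraint "at least one of A and B is on call".
database = {"A_on_call": True, "B_on_call": True}

# Each transaction starts from its own snapshot of the database.
snapshot_T  = dict(database)
snapshot_T1 = dict(database)

# T sees (in its snapshot) that B is still on call, so it takes A off call.
if snapshot_T["B_on_call"]:
    updates_T = {"A_on_call": False}

# T1 sees (in its own snapshot) that A is still on call, so it takes B off call.
if snapshot_T1["A_on_call"]:
    updates_T1 = {"B_on_call": False}

# Neither transaction modified the same item, so under snapshot isolation
# both are allowed to commit -- yet the result violates the constraint.
database.update(updates_T)
database.update(updates_T1)
print(database)   # {'A_on_call': False, 'B_on_call': False}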
Timestamp-Based Protocols
Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution.
There are two simple methods for implementing this scheme:
1.Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal to the
value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a
transaction’s timestamp is equal to the value of the counter when the transaction enters the system
Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp
TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj).
The protocol manages concurrent execution such that the time-stamps determine the serializability order.
In order to assure such behavior, the protocol maintains for each data Q two timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
The Timestamp-Ordering Protocol
The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The order of
the transactions is simply the ascending order of their creation times.
The older transaction has the higher priority, which is why it executes first. To determine the timestamp of the
transaction, this protocol uses the system time or a logical counter.
The lock-based protocol is used to manage the order between conflicting pairs among transactions at the
execution time. But Timestamp based protocols start working as soon as a transaction is created.
Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007
and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first,
as it entered the system first.
The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write' operation on a data.
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order.
Suppose a transaction Ti issues a read(Q).
o If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.
▪ Hence, the read operation is rejected, and Ti is rolled back.
o If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set
to max(R-timestamp(Q), TS(Ti)).
Suppose that transaction Ti issues write(Q).
o If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously,
and the system assumed that that value would never be produced.
▪ Hence, the write operation is rejected, and Ti is rolled back.
o If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
▪ Hence, this write operation is rejected, and Ti is rolled back.
o Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
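The read and write rules above can be sketched in Python as follows (illustrative only; rollback is represented
simply by returning the string "rollback"):

# Hypothetical sketch of the timestamp-ordering checks for read(Q) / write(Q).
class DataItem:
    def __init__(self):
        self.r_ts = 0     # largest timestamp of any successful read(Q)
        self.w_ts = 0     # largest timestamp of any successful write(Q)
        self.value = None

def ts_read(item, ts):
    if ts < item.w_ts:                      # Q was already overwritten by a younger transaction
        return "rollback"                   # reject the read, roll Ti back
    item.r_ts = max(item.r_ts, ts)
    return item.value

def ts_write(item, ts, value):
    if ts < item.r_ts or ts < item.w_ts:    # a younger transaction already read or wrote Q
        return "rollback"
    item.w_ts = ts
    item.value = value
    return "ok"

q = DataItem()
print(ts_write(q, ts=5, value=10))   # ok
print(ts_read(q, ts=3))              # rollback: TS 3 must not read a value written at TS 5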
DEADLOCK
System is deadlocked if there is a set of transactions such that every transaction in the set is
waiting for another transaction in the set.
Consider the following two transactions:
T1: write(A)                    T2: write(B)
    write(B)                        write(A)
A schedule that leads to deadlock:
T1                              T2
lock-X on A
write (A)
                                lock-X on B
                                write (B)
wait for lock-X on B
                                wait for lock-X on A
Neither transaction can proceed: T1 waits for T2 to release its lock on B, while T2 waits for T1 to release its
lock on A.
Deadlock Handling
Deadlock prevention protocol
Ensure that the system will never enter into a deadlock state.
Some prevention strategies :
Approach1
– Require that each transaction locks all its data items before it begins execution: either all are
locked in one step or none are locked.
– Disadvantages
• Hard to predict, before the transaction begins, what data items need to be locked.
• Data item utilization may be very low.
Approach2
– Assign a unique timestamp to each transaction.
– These timestamps are used only to decide whether a transaction should wait or roll back.
schemes:
- wait-die scheme
- wound-wait scheme
Wait-die scheme
– Non-preemptive technique.
– When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it
has a timestamp smaller than that of Tj; otherwise, Ti is rolled back (dies).
– older transaction may wait for younger one to release data item. Younger transactions
never wait for older ones; they are rolled back instead.
– A transaction may die several times before acquiring needed data item
Example.
• Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
• If T1 requests a data item held by T2, then T1 will wait.
• If T3 requests a data item held by T2, then T3 will be rolled back.
Wound-wait scheme
– Preemptive technique.
– When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has
a timestamp larger than that of Tj. Otherwise, Tj is rolled back (wounded by Ti).
– Older transaction wounds (forces rollback) of younger transaction instead of waiting for it.
Younger transactions may wait for older ones.
Example
• Transactions T1, T2, T3 have timestamps 5, 10, 15, respectively.
• If T1 requests a data item held by T2, then the data item will be preempted from T2, and
T2 will be rolled back.
• If T3 requests a data item held by T2, then T3 will wait.
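Both schemes reduce to a simple timestamp comparison, as the following Python sketch (illustrative only)
shows:

# Smaller timestamp = older transaction.
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "rollback requester (die)"

def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the holder, a younger requester waits."""
    return "rollback holder (wound)" if ts_requester < ts_holder else "wait"

# T1, T2, T3 have timestamps 5, 10, 15 as in the examples above.
print(wait_die(5, 10))     # T1 requests an item held by T2 -> wait
print(wait_die(15, 10))    # T3 requests an item held by T2 -> rollback requester (die)
print(wound_wait(5, 10))   # T1 requests an item held by T2 -> rollback holder (wound)
print(wound_wait(15, 10))  # T3 requests an item held by T2 -> wait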
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E),
– V is a set of vertices
– E is a set of edges
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to
release a data item.
• The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a
deadlock-detection algorithm periodically to look for cycles.
Wait-for graph without a cycle
Wait-for graph with a cycle
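Cycle detection on the wait-for graph can be sketched in Python as follows (the graph representation is an
assumption made for this example):

# Edges map each transaction to the transactions it is waiting for.
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for nxt in wait_for.get(t, []):
            if nxt in visiting:                # back edge -> cycle -> deadlock
                return True
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(t not in done and dfs(t) for t in wait_for)

print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []}))      # False: no cycle
print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True: T1 -> T2 -> T3 -> T1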
Lock compatibility matrix for multiple granularity lock modes (rows: mode held by transaction 1;
columns: mode requested by transaction 2):

         NL    IS    IX    S     SIX   X
NL       Yes   Yes   Yes   Yes   Yes   Yes
IS       Yes   Yes   Yes   Yes   Yes   No
IX       Yes   Yes   Yes   No    No    No
S        Yes   Yes   No    Yes   No    No
SIX      Yes   Yes   No    No    No    No
X        Yes   No    No    No    No    No
NO-UNDO/REDO Recovery Based on Deferred Update
The idea behind deferred update is to defer or postpone any actual updates to the database on disk until the
transaction completes its execution successfully and reaches its commit point.
During transaction execution, the updates are recorded only in the log and in the cache buffers. After the
transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database.
If a transaction fails before reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way. Therefore, only REDO-type log entries are needed
in the log, which include the new value (AFIM) of the item written by a write operation. The UNDO-type log
entries are not needed since no undoing of operations will be required during recovery. Although this may
simplify the recovery process, it cannot be used in practice unless transactions are short and each transaction
changes few items. For other types of transactions, there is the potential for running out of buffer space because
transaction changes must be held in the cache buffers until the commit point. The deferred update protocol can
be stated in terms of the following two rules:
1. A transaction cannot change the database on disk until it reaches its commit point.
2. A transaction does not reach its commit point until all its REDO-type log entries are recorded in the
log and the log buffer is force-written to disk.
Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL) protocol. Because the
database is never updated on disk until after the transaction commits, there is never a need to UNDO any
operations. REDO is needed in case the system fails after a transaction commits but before all its changes are
recorded in the database on disk. In this case, the transaction operations are redone from the log entries during
recovery.
For multiuser systems with concurrency control, the concurrency control and recovery processes are interrelated.
Consider a system in which concurrency control uses strict two-phase locking, so the locks on items remain in
effect until the transaction reaches its commit point. After that, the locks can be released. This ensures strict and
serializable schedules. Assuming that [checkpoint] entries are included in the log, a possible recovery algorithm
for this case, which we call RDU_M (Recovery using Deferred Update in a Multiuser environment), is given
next.
Procedure RDU_M (NO-UNDO/REDO with checkpoints). Use two lists of transactions maintained by the
system: the committed transactions T since the last checkpoint (commit list), and the active
transactions T (active list). REDO all the WRITE operations of the committed transactions from the log, in the
order in which they were written into the log. The transactions that are active and did not commit are effectively
canceled and must be resubmitted.
Procedure REDO (WRITE_OP). Redoing a write_item operation WRITE_OP consists of examining its log
entry [write_item, T, X, new_value] and setting the value of item X in the database to new_value, which is the
after image (AFIM).
Figure 23.2 illustrates a timeline for a possible schedule of executing transactions. When the checkpoint was
taken at time t1, transaction T1 had committed, whereas transactions T3 and T4 had not. Before the system crash
at time t2, T3 and T2 were committed but not T4 and T5. According to the RDU_M method, there is no need to
redo the write_item operations of transaction T1—or any transactions committed before the last checkpoint
time t1. The write_item operations of T2 and T3 must be redone, however, because both transactions reached their
commit points after the last checkpoint. Recall that the log is force-written before committing a transaction.
Transactions T4 and T5 are ignored: They are effectively canceled or rolled back because none of
their write_item operations were recorded in the database on disk under the deferred update protocol.
We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that, if a data item X has been
updated—as indicated in the log entries—more than once by committed transactions since the last checkpoint, it
is only necessary to REDO the last update of X from the log during recovery because the other updates would
be overwritten by this last REDO. In this case, we start from the end of the log; then, whenever an item is redone,
it is added to a list of redone items. Before REDO is applied to an item, the list is checked; if the item appears on
the list, it is not redone again, since its last value has already been recovered.
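A Python sketch of this NO-UNDO/REDO pass, including the "redo only the last committed update of each
item" optimization (the log-entry format is an assumption made for this example):

# Log entries: ("write_item", T, X, new_value) or ("commit", T).
def redo_pass(log, database):
    committed = {e[1] for e in log if e[0] == "commit"}
    redone = set()
    # Scan the log backwards so that only the *last* committed update of each item is applied.
    for entry in reversed(log):
        if entry[0] == "write_item":
            _, t, x, new_value = entry
            if t in committed and x not in redone:
                database[x] = new_value     # REDO: set X to its after image (AFIM)
                redone.add(x)
    return database

log = [("write_item", "T1", "A", 950), ("commit", "T1"),
       ("write_item", "T2", "A", 900), ("write_item", "T2", "B", 2050)]   # T2 never committed
print(redo_pass(log, {"A": 1000, "B": 2000}))    # {'A': 950, 'B': 2000}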
If a transaction is aborted for any reason (say, by the deadlock detection method), it is simply resubmitted, since
it has not changed the database on disk. A drawback of the method described here is that it limits the concurrent
execution of transactions because all write-locked items remain locked until the transaction reaches its
commit point. Additionally, it may require excessive buffer space to hold all updated items until the transactions
commit. The method’s main benefit is that transaction operations never need to be undone, for two reasons:
1. A transaction does not record any changes in the database on disk until after it reaches its commit point—
that is, until it completes its execution successfully. Hence, a transaction is never rolled back because of failure
during transaction execution.
2. A transaction will never read the value of an item that is written by an uncommitted transaction, because
items remain locked until a transaction reaches its commit point. Hence, no cascading rollback will occur.
Recovery Techniques Based on Immediate Update
In these techniques, when a transaction issues an update command, the database on disk can be
updated immediately, without any need to wait for the transaction to reach its commit point. Notice that it is not
a requirement that every update be
applied immediately to disk; it is just possible that some updates are applied to disk before the transaction
commits.
Provisions must be made for undoing the effect of update operations that have been applied to the database by
a failed transaction. This is accomplished by rolling back the transaction and undoing the effect of the
transaction’s write_item operations. Therefore, the UNDO-type log entries, which include the old
value (BFIM) of the item, must be stored in the log. Because UNDO can be needed during recovery, these
methods follow a steal strategy for deciding when updated main memory buffers can be written back to disk
(see Section 23.1.3). Theoretically, we can distinguish two main categories of immediate update algorithms. If
the recovery technique ensures that all updates of a transaction are recorded in the database on disk before the
transaction commits, there is never a need to REDO any operations of committed transactions. This is called
the UNDO/NO-REDO recovery algorithm. In this method, all updates by a transaction must be recorded on
disk before the transaction commits, so that REDO is never needed. Hence, this method must utilize the force
strategy for deciding when updated main memory buffers are written back to disk (see Section 23.1.3).
If the transaction is allowed to commit before all its changes are written to the database, we have the most
general case, known as the UNDO/REDO recovery algorithm. In this case, the steal/no-force strategy is
applied (see Section 23.1.3). This is also the most complex technique. We will outline an UNDO/REDO recovery
algorithm and leave it as an exercise for the reader to develop the UNDO/NO-REDO variation. In Section 23.5,
we describe a more practical approach known as the ARIES recovery technique.
When concurrent execution is permitted, the recovery process again depends on the protocols used for
concurrency control. The procedure RIU_M (Recovery using Immediate Updates for a Multiuser environment)
outlines a recovery algorithm for concurrent transactions with immediate update (UNDO/REDO recovery).
Assume that the log includes checkpoints and that the concurrency control protocol produces strict schedules—
as, for example, the strict two-phase locking protocol does. Recall that a strict schedule does not allow a
transaction to read or write an item unless the transaction that last wrote the item has committed (or aborted and
rolled back). However, deadlocks can occur in strict two-phase locking, thus requiring abort and UNDO of
transactions. For a strict schedule, UNDO of an operation requires changing the item back to its old value
(BFIM).
Procedure RIU_M (UNDO/REDO with checkpoints).
1. Use two lists of transactions maintained by the system: the committed transactions since the last
checkpoint and the active transactions.
2. Undo all the write_item operations of the active (uncommitted) transactions, using the UNDO procedure.
The operations should be undone in the reverse of the order in which they were written into the log.
3. Redo all the write_item operations of the committed transactions from the log, in the order in which they
were written into the log, using the REDO procedure defined earlier.
The UNDO procedure is defined as follows:
Procedure UNDO (WRITE_OP). Undoing a write_item operation write_op consists of examining its log
entry [write_item, T, X, old_value, new_value] and setting the value of item X in the database to old_value,
which is the before image (BFIM). Undoing a number of write_item operations from one or more transactions
from the log must proceed in the reverse order from the order in which the operations were written in the log.
As we discussed for the NO-UNDO/REDO procedure, step 3 is more efficiently done by starting from the end
of the log and redoing only the last update of each item X.
Whenever an item is redone, it is added to a list of redone items and is not redone again. A similar procedure can
be devised to improve the efficiency of step 2 so that an item can be undone at most once during recovery. In
this case, the earliest UNDO is applied first by scanning the log in the forward direction (starting from the
beginning of the log). Whenever an item is undone, it is added to a list of undone items and is not undone again.
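A Python sketch of the RIU_M-style UNDO/REDO recovery described above (the log-entry format is an
assumption made for this example):

# Log entries: ("write_item", T, X, old_value, new_value) or ("commit", T).
def riu_m_recovery(log, database):
    committed = {e[1] for e in log if e[0] == "commit"}

    # Step 2: UNDO the writes of uncommitted (active) transactions, in reverse log order.
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] not in committed:
            _, t, x, old_value, _ = entry
            database[x] = old_value          # restore the before image (BFIM)

    # Step 3: REDO the writes of committed transactions, in forward log order.
    for entry in log:
        if entry[0] == "write_item" and entry[1] in committed:
            _, t, x, _, new_value = entry
            database[x] = new_value          # reapply the after image (AFIM)
    return database

log = [("write_item", "T1", "A", 1000, 950), ("commit", "T1"),
       ("write_item", "T2", "B", 2000, 2100)]            # T2 is still active at the crash
print(riu_m_recovery(log, {"A": 1000, "B": 2100}))       # {'A': 950, 'B': 2000}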
Shadow Paging
This recovery scheme does not require the use of a log in a single-user environment. In a multiuser environment,
a log may be needed for the concurrency control method. Shadow paging considers the database to be made up
of a number of fixed-size disk pages (or disk blocks)—say, n—for recovery purposes.
A directory with n entries is constructed, where the ith entry points to the ith database page on disk. The
directory is kept in main memory if it is not too large, and all references—reads or writes—to database pages on
disk go through it. When a transaction begins executing, the current directory—whose entries point to the most
recent or current database pages on disk—is copied into a shadow directory. The shadow directory is then saved
on disk while the current directory is used by the transaction.
During transaction execution, the shadow directory is never modified. When a write_item operation is
performed, a new copy of the modified database page is created, but the old copy of that page is not
overwritten. Instead, the new page is written elsewhere—on some previously unused disk block. The current
directory entry is modified to point to the new disk block, whereas the shadow directory is not modified and
continues to point to the old unmodified disk block. Figure 23.4 illustrates the concepts of shadow and current
directories. For pages updated by the transaction, two versions are kept. The old version is referenced by the
shadow directory and the new version by the current directory.
To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to
discard the current directory. The state of the database before transaction execution is available through the
shadow directory, and that state is recovered by reinstating the shadow directory. The database thus is returned
to its state prior to the transaction that was executing when the crash occurred, and any modified pages are
discarded. Committing a transaction corresponds to discarding the previous shadow directory. Since recovery
involves neither undoing nor redoing data items, this technique can be categorized as a NO-UNDO/NO-REDO
technique for recovery.
In a multiuser environment with concurrent transactions, logs and checkpoints must be incorporated into the
shadow paging technique. One disadvantage of shadow paging is that the updated database pages change location
on disk. This makes it difficult to keep related database pages close together on disk without complex storage
management strategies. Furthermore, if the directory is large, the overhead of writing shadow directories to disk
as transactions commit is significant. A further complication is how to handle garbage collection when a
transaction commits. The old pages referenced by the shadow directory that have been updated must be released
and added to a list of free pages for future use. These pages are no longer needed after the transaction commits.
Another issue is that the operation to migrate between current and shadow directories must be implemented as
an atomic operation.
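A Python sketch of the shadow-paging idea (disk blocks, directories and names are illustrative only):

# Pages live in a dict simulating disk blocks; directories map page number -> block id.
disk = {"b0": "page0-v1", "b1": "page1-v1"}
current_directory = {0: "b0", 1: "b1"}
shadow_directory = dict(current_directory)          # saved on disk when the transaction starts

def write_page(page_no, new_contents):
    """Write a page to a fresh block and repoint only the current directory."""
    new_block = f"b{len(disk)}"                     # some previously unused disk block
    disk[new_block] = new_contents
    current_directory[page_no] = new_block          # shadow directory is never modified

def recover_from_failure():
    """Discard the transaction's updates by reinstating the shadow directory."""
    global current_directory
    current_directory = dict(shadow_directory)

write_page(1, "page1-v2")
recover_from_failure()                              # NO-UNDO/NO-REDO: just restore the directory
print(disk[current_directory[1]])                   # page1-v1, the pre-transaction state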
ARIES Algorithm
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log
(WAL) protocol. Every update operation writes a log record which is one of the following :
1. Undo-only log record: Only the before image is logged. Thus, an undo operation can be done to retrieve
the old data.
2. Redo-only log record: Only the after image is logged. Thus, a redo operation can be attempted.
3. Undo-redo log record: Both before images and after images are logged.
In it, every log record is assigned a unique and monotonically increasing log sequence number (LSN).
Every data page has a page LSN field that is set to the LSN of the log record corresponding to the last
update on the page. WAL requires that the log record corresponding to an update make it to stable storage
before the data page corresponding to that update is written to disk. For performance reasons, each log
write is not immediately forced to disk. A log tail is maintained in main memory to buffer log writes. The
log tail is flushed to disk when it gets full. A transaction cannot be declared committed until the commit
log record makes it to disk.
Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint record
contains the transaction table and the dirty page table. A master log record is maintained separately, in
stable storage, to store the LSN of the latest checkpoint record that made it to disk. On restart, the recovery
subsystem reads the master log record to find the checkpoint’s LSN, reads the checkpoint record, and
starts recovery from there on. The recovery process actually consists of 3 phases:
1. Analysis: The recovery subsystem determines the earliest log record from which the next pass must
start. It also scans the log forward from the checkpoint record to construct a snapshot of what the system
looked like at the instant of the crash.
2. Redo: Starting at the earliest LSN, the log is read forward and each update redone.
3. Undo: The log is scanned backward and updates corresponding to loser transactions are undone.
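As an illustration of the redo phase, the following Python sketch applies a logged update only when the page
LSN shows that the change has not yet reached the page (log-record fields and names are assumptions made
for this example):

def redo_phase(log, pages, redo_start_lsn):
    for record in log:                       # log records sorted by increasing LSN
        if record["lsn"] < redo_start_lsn or record["type"] != "update":
            continue
        page = pages[record["page_id"]]
        if page["page_lsn"] < record["lsn"]:         # change not yet on the page
            page["data"] = record["after_image"]     # reapply the update
            page["page_lsn"] = record["lsn"]

pages = {"P1": {"page_lsn": 10, "data": "old"}}
log = [{"lsn": 20, "type": "update", "page_id": "P1", "after_image": "new"}]
redo_phase(log, pages, redo_start_lsn=15)
print(pages["P1"])     # {'page_lsn': 20, 'data': 'new'}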