UNIT – 4
Transaction: A transaction refers to a sequence of one or more operations (such as read, write,
update, or delete) performed on the database as a single logical unit of work. A transaction ensures
that either all the operations are successfully executed (committed) or none of them take effect
(rolled back). Transactions are designed to maintain the integrity, consistency and reliability of the
database, even in the case of system failures or concurrent access.
1) Read(X)
A read operation reads the value of a particular database element X and stores it in a
temporary buffer in main memory for further actions, such as displaying that value.
2) Write(X)
A write operation writes a value from the main-memory buffer back to the database. Typically, a
read operation first brings the value into the buffer; the value is then modified, for example by a
set of arithmetic operations performed according to the user's request, and a write operation
stores the modified value back in the database.
3) Commit
This operation makes all the changes performed by a transaction permanent in the database and
thereby maintains its integrity. Due to a power, hardware, or software failure, a transaction might
get interrupted before all its operations are completed, which could leave the database ambiguous
or inconsistent; the commit marks the point at which the transaction's changes are safely and
permanently recorded.
4) Rollback
If an error occurs, the rollback operation undoes all the changes made by the transaction, reverting
the database to its last consistent state. In simple words, a rollback undoes the operations a
transaction performed before its interruption, bringing the database to a safe state and avoiding
any kind of ambiguity or inconsistency.
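The four operations above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table, names, and amounts are invented for the example, not part of the notes.

```python
# A hedged sketch of commit and rollback with sqlite3; the schema and
# values are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    # One logical unit of work: transfer 30 from A to B.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'B'")
    conn.commit()      # both updates become permanent together
except sqlite3.Error:
    conn.rollback()    # on any failure, undo both updates

print(dict(conn.execute("SELECT name, balance FROM account")))
# {'A': 70, 'B': 80}
```

If either UPDATE raised an error, the rollback would leave both balances unchanged, which is exactly the all-or-nothing behavior described above.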
Schedules
A schedule is a sequence of operations from one or more transactions. It preserves the order of
the operations within each individual transaction.
1. Serial Schedule: The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the first
transaction completes its cycle, then the next transaction is executed.
For example: suppose there are two transactions T1 and T2, each with some operations. If there is
no interleaving of operations, there are the following two possible serial orders:
i. Execute all the operations of T1, followed by all the operations of T2. In figure (a),
Schedule A shows this serial schedule (T1 followed by T2).
ii. Execute all the operations of T2, followed by all the operations of T1. In figure (b),
Schedule B shows this serial schedule (T2 followed by T1).
2. Non-Serial Schedule:
i. If interleaving of operations is allowed, the result is a non-serial schedule.
ii. There are many possible orders in which the system can execute the individual
operations of the transactions.
iii. In figures (c) and (d), Schedule C and Schedule D are non-serial schedules with
interleaved operations.
3. Serializable Schedule:
i. Serializability is used to find non-serial schedules that allow transactions to execute
concurrently without interfering with one another.
ii. It identifies which schedules are correct when transaction executions interleave their
operations.
iii. A non-serial schedule is serializable if its result is equal to the result of executing its
transactions serially.
Here,
Schedule A and Schedule B are serial schedules.
Schedule C and Schedule D are non-serial schedules.
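The definition of a serializable schedule can be illustrated with a toy sketch: run a serial schedule and an interleaved one over the same data and compare the final states. The operation encoding, transactions, and values below are our own assumptions, not from the notes.

```python
# Toy model of Read/Write scheduling: each operation is a tuple
# (transaction, 'R' or 'W', item, update-function-or-None).
def run(schedule, db):
    db = dict(db)
    buf = {}                                   # per-transaction buffers
    for txn, action, item, f in schedule:
        if action == 'R':
            buf[(txn, item)] = db[item]        # Read(item) into buffer
        else:
            db[item] = f(buf[(txn, item)])     # Write(item) from buffer
    return db

# T1: Read(X), Write(X = X + 10); T2: Read(Y), Write(Y = Y * 2)
t1 = [('T1', 'R', 'X', None), ('T1', 'W', 'X', lambda v: v + 10)]
t2 = [('T2', 'R', 'Y', None), ('T2', 'W', 'Y', lambda v: v * 2)]

serial = t1 + t2                               # all of T1, then all of T2
interleaved = [t1[0], t2[0], t1[1], t2[1]]     # operations interleaved

start = {'X': 5, 'Y': 3}
print(run(serial, start) == run(interleaved, start))  # True: serializable
```

Here the interleaved schedule is serializable because it produces the same final state as the serial schedule T1 then T2; the two transactions touch different items, so their operations never conflict.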
Two transaction control commands determine whether a transaction's effects become permanent or are undone:
1) COMMIT: In DBMS, a commit is a command that saves all changes made during the current
transaction permanently to the database. Once committed, the changes become visible to
other users and cannot be rolled back.
2) ROLLBACK: In DBMS, a rollback is a command used to undo changes made during the
current transaction. It restores the database to its previous stable state, ensuring data
integrity in case of errors or transaction failures.
SERIALIZABILITY: Serializability in DBMS is a key concept in concurrency control that ensures
the consistency and correctness of transactions when executed simultaneously. It guarantees
that the final state of the database is the same as if the transactions had been executed
serially, one after the other, without any overlap. This prevents issues like lost updates, dirty
reads, and uncommitted data being accessed. Serializability helps maintain data integrity in
multi-user environments by ensuring that interleaved transaction executions do not violate
the database's consistency rules. It is considered the highest level of isolation among
database concurrency control techniques.
There are two types of serializability:
1) CONFLICT SERIALIZABILITY: Conflict Serializability is a method that checks whether the
result of a concurrent transaction schedule is equivalent to a serial schedule by analyzing
conflicting operations such as read-write, write-read, and write-write on the same data
item. It uses a serialization graph (also called a precedence graph) where nodes
represent transactions and edges represent the order of conflicting operations. If the
graph contains no cycles, then the schedule is considered conflict serializable. This
method is widely used because it is relatively simple to implement and can be verified
using graph-based algorithms.
Two operations conflict if all three of the following conditions hold:
• They belong to different transactions.
• They operate on the same data item.
• At least one of them is a write operation.
The conflicting pairs are therefore Read–Write (R → W), Write–Read (W → R), and Write–Write (W → W).
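The precedence-graph test described above can be sketched as follows; the schedule encoding as (transaction, operation, item) tuples is our own assumption.

```python
from collections import defaultdict

def is_conflict_serializable(schedule):
    """schedule: list of (txn, 'R' or 'W', item) tuples."""
    # Build precedence edges Ti -> Tj for every conflicting pair:
    # different transactions, same item, at least one write.
    txns, edges = set(), set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        txns.add(ti)
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and 'W' in (op_i, op_j):
                edges.add((ti, tj))
    # Conflict serializable iff the precedence graph has no cycle,
    # checked here with Kahn's topological-sort algorithm.
    indeg = {t: 0 for t in txns}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = [t for t in txns if indeg[t] == 0]
    removed = 0
    while queue:
        t = queue.pop()
        removed += 1
        for n in succ[t]:
            indeg[n] -= 1
            if indeg[n] == 0:
                queue.append(n)
    return removed == len(txns)   # all nodes removed => acyclic

# T1 -> T2 (conflict on A) and T2 -> T1 (conflict on B) form a cycle.
s = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T2', 'R', 'B'), ('T1', 'W', 'B')]
print(is_conflict_serializable(s))  # False
```

When the graph is acyclic, any topological order of the transactions gives an equivalent serial schedule.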
2) VIEW SERIALIZABILITY: View Serializability is a more general and less restrictive form of
serializability than conflict serializability. It ensures that a schedule is view-equivalent to
some serial schedule. View equivalence is based on three conditions: each transaction
must read the same initial value as in the serial schedule, the read operations must
return the same values, and the final write operations must match. Even if a schedule is
not conflict-serializable, it may still be view-serializable. However, checking view
serializability is computationally more expensive and complex compared to conflict
serializability, so it's less commonly used in practice.
RECOVERABILITY: Recoverability is a critical feature of database systems that ensures the database
can return to a consistent and reliable state after a failure or error. It guarantees that the effects of
committed transactions are saved permanently, while uncommitted transactions are rolled back to
maintain data integrity. This process relies on transaction logs, which record all changes made during
transaction processing. These logs enable the system to either undo the changes of uncommitted
transactions or redo the committed ones when a failure occurs.
In database systems, multiple transactions often run simultaneously, with some being independent
and others interdependent. When a dependent transaction fails, it can have a cascading impact on
other transactions. Recoverability in DBMS addresses these challenges by focusing on minimizing the
effects of such failures and ensuring the database remains consistent.
There are several levels of recoverability that can be supported by a database system:
No-undo logging: This level of recoverability only guarantees that committed transactions are
durable, but does not provide the ability to undo the effects of uncommitted transactions.
Undo logging: This level of recoverability provides the ability to undo the effects of uncommitted
transactions but may result in the loss of updates made by committed transactions that occur after
the failed transaction.
Redo logging: This level of recoverability provides the ability to redo the effects of committed
transactions, ensuring that all committed updates are durable and can be recovered in the event of
failure.
Undo-redo logging: This level of recoverability provides both undo and redo capabilities, ensuring
that the system can recover to a consistent state regardless of whether a transaction has been
committed or not.
Based on recoverability, schedules are classified into the following types:
1) Recoverable Schedules
A recoverable schedule is a type of transaction schedule in a Database Management System (DBMS)
where committed transactions do not violate the rules of consistency, even in the event of a failure.
In other words, a transaction in a recoverable schedule only commits after all the transactions it
depends on have committed. This ensures that the database can maintain integrity and recover to a
consistent state.
Irrecoverable Schedules
An irrecoverable schedule in a database is a type of transaction schedule where a transaction
commits after reading uncommitted data (a dirty read) from another transaction and the original
transaction later fails or rolls back. This creates a situation where the database cannot be recovered
to a consistent state.
For example, suppose T2 reads the value of A written by T1 and then commits. If T1 later aborts,
the value read by T2 is wrong; but since T2 has already committed, the schedule is non-recoverable.
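The recoverability rule above can be sketched as a small checker: a transaction may commit only after every transaction it read uncommitted data from has committed. The event encoding ('W'/'R'/'C', transaction, item) is our own convention.

```python
def is_recoverable(schedule):
    """True if every transaction commits only after the transactions
    it read uncommitted data from have committed."""
    last_writer = {}     # item -> transaction whose write is most recent
    reads_from = set()   # (reader, writer) pairs for dirty reads
    committed = set()
    for ev in schedule:
        kind, txn = ev[0], ev[1]
        if kind == 'W':
            last_writer[ev[2]] = txn
        elif kind == 'R':
            w = last_writer.get(ev[2])
            if w is not None and w != txn and w not in committed:
                reads_from.add((txn, w))
        elif kind == 'C':
            if any(w not in committed for r, w in reads_from if r == txn):
                return False   # commits before a txn it read from
            committed.add(txn)
    return True

# T2 reads A written by T1 and commits before T1: non-recoverable.
print(is_recoverable([('W', 'T1', 'A'), ('R', 'T2', 'A'),
                      ('C', 'T2'), ('C', 'T1')]))  # False
```

Swapping the two commit events makes the same schedule recoverable, since T1 then commits before its dependent T2.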
2) Cascading Schedule
Cascading refers to a situation where the failure of one transaction causes a chain reaction of
rollbacks in other dependent transactions. This happens because the dependent transactions
rely on data or changes made by the failed transaction. If the changes are rolled back, the
dependent transactions also need to roll back to maintain consistency.
3) Cascadeless Schedule
A Cascadeless Schedule goes a step further than a recoverable schedule. It prevents a
transaction from reading data written by another uncommitted transaction. In this case,
transactions are isolated in such a way that no transaction is allowed to read an
intermediate, uncommitted value from another transaction. This eliminates the risk of
cascading rollbacks, where a failure in one transaction might cause multiple dependent
transactions to be rolled back. Cascadeless schedules help maintain database consistency
and minimize the overhead of rollbacks in case of a failure.
4) Strict Schedule
A Strict Schedule is the most restrictive type of recoverable schedule. In this schedule, a
transaction is not allowed to read or write data modified by an uncommitted transaction.
This means that no transaction can interact with data that is being modified by another
transaction until that transaction has committed. This strict isolation simplifies recovery and
ensures consistency because, in the event of a failure, no transaction can have its data
affected by uncommitted changes. It also prevents cascading rollbacks but can lead to
reduced concurrency and performance in systems with high transaction volume.
LOG-BASED RECOVERY: Log-based recovery in DBMS ensures data can be maintained or restored in
the event of a system failure. The DBMS records every transaction on stable storage, allowing for
easy data recovery when a failure occurs. For each operation performed on the database, a log file is
created. Transactions are logged and verified before being applied to the database, ensuring data
integrity.
Log in DBMS
A log is a sequence of records that document the operations performed during database
transactions. Logs are stored in a log file for each transaction, providing a mechanism to recover data
in the event of a failure. For every operation executed on the database, a corresponding log record is
created. It is critical to store these logs before the actual transaction operations are applied to the
database, ensuring data integrity and consistency during recovery processes.
For example, consider a transaction to modify a student’s city. This transaction generates the
following logs:
Start Log: When the transaction begins, a log is created to indicate the start of the transaction.
Format:<Tn, Start>
• Here, Tn represents the transaction identifier.
• Example: <T1, Start> indicates that Transaction 1 has started.
Operation Log: When the city is updated, a log is recorded to capture the old and new values of the
operation.
Format:<Tn, Attribute, Old_Value, New_Value>
• Example: <T1, City, 'Gorakhpur', 'Noida'> shows that in Transaction 1, the value of
the City attribute has changed from 'Gorakhpur' to 'Noida'.
Commit Log: Once the transaction is successfully completed, a final log is created to indicate that the
transaction has been completed and the changes are now permanent.
Format:<Tn, Commit>
• Example: <T1, Commit> signifies that Transaction 1 has been successfully completed.
Operations in Log based recovery
1) Undo Operation
The undo operation reverses the changes made by an uncommitted transaction, restoring
the database to its previous state.
2) Redo Operation
The redo operation re-applies the changes made by a committed transaction to ensure
consistency in the database.
3) Undo-Redo Operation
Undo and Redo operations in DBMS are recovery techniques used to maintain database
consistency after a failure. Undo reverses the effects of uncommitted transactions, restoring
the database to its previous state, while Redo re-applies the changes of committed
transactions to ensure their effects are not lost. Together, they help ensure atomicity and
durability in transaction management.
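The undo and redo passes can be sketched together over the log format described above (<Tn, Start>, <Tn, attribute, old, new>, <Tn, Commit>); the tuple encoding and sample data are our own assumptions.

```python
def recover(log, db):
    """After a crash: redo committed transactions, undo the rest."""
    committed = {rec[0] for rec in log if rec[-1] == 'Commit'}
    # Redo pass (forward through the log): reapply new values
    # written by committed transactions.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            txn, attr, old, new = rec
            db[attr] = new
    # Undo pass (backward through the log): restore old values
    # written by uncommitted transactions.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] not in committed:
            txn, attr, old, new = rec
            db[attr] = old
    return db

log = [('T1', 'Start'), ('T1', 'City', 'Gorakhpur', 'Noida'), ('T1', 'Commit'),
       ('T2', 'Start'), ('T2', 'City', 'Noida', 'Delhi')]  # T2 never committed
print(recover(log, {'City': 'Delhi'}))  # {'City': 'Noida'}
```

T1's committed change to 'Noida' is redone, while T2's uncommitted change to 'Delhi' is undone back to its old value, leaving the database consistent.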
ADVANTAGES OF LOG BASED RECOVERY
• Durability: In the event of a breakdown, the log file offers a dependable and long-lasting
method of recovering data. It guarantees that in the event of a system crash, no committed
transaction is lost.
• Faster Recovery: Since log-based recovery recovers databases by replaying committed
transactions from the log file, it is typically faster than alternative recovery methods.
• Incremental Backup: Backups can be made in increments using log-based recovery. Just the
changes made since the last backup are kept in the log file, rather than creating a complete
backup of the database each time.
• Lowers the Risk of Data Corruption: By making sure that all transactions are correctly
committed or cancelled before they are written to the database, log-based recovery lowers
the risk of data corruption.
DISADVANTAGES OF LOG BASED RECOVERY
• Additional overhead: Maintaining the log file incurs an additional overhead on the database
system, which can reduce the performance of the system.
• Complexity: Log-based recovery is a complex process that requires careful management and
administration. If not managed properly, it can lead to data inconsistencies or loss.
• Storage space: The log file can consume a significant amount of storage space, especially in a
database with a large number of transactions.
• Time-Consuming: The process of replaying the transactions from the log file can be time-
consuming, especially if there are a large number of transactions to recover.
CHECKPOINT: A checkpoint declares a point before which the DBMS was in a consistent state and
all transactions were committed. During transaction execution, such checkpoints are recorded.
Upon reaching a checkpoint, the updates recorded in the log file are applied to the database, after
which the old log records can be discarded. A new log then records the upcoming operations of the
transaction and is updated until the next checkpoint, and the process continues.
Steps to Use Checkpoints in the Database
1. Write the begin_checkpoint record into a log.
2. Collect checkpoint data in stable storage.
3. Write the end_checkpoint record into a log.
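The three steps above can be sketched as follows; the record layout and the dirty-page flush callback are illustrative assumptions, not a real DBMS API.

```python
def take_checkpoint(log, active_txns, flush_dirty_pages):
    # Step 1: mark the start of the checkpoint in the log.
    log.append(('begin_checkpoint',))
    # Step 2: force checkpoint data (dirty pages, the list of active
    # transactions) to stable storage before declaring completion.
    flush_dirty_pages()
    # Step 3: mark the checkpoint as complete.
    log.append(('end_checkpoint', tuple(sorted(active_txns))))

log = []
take_checkpoint(log, {'T2', 'T3'}, flush_dirty_pages=lambda: None)
print(log)  # [('begin_checkpoint',), ('end_checkpoint', ('T2', 'T3'))]
```

Recording the active transactions in the end_checkpoint record is what lets recovery know which transactions may still need undo or redo work after the checkpoint.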
(Diagram: transactions T1–T4 with their START and COMMIT operations around a checkpoint; in a
second scenario, a FAILURE occurs before all transactions have committed.)
Types of Checkpoints
There are basically two main types of Checkpoints:
1. Automatic Checkpoint
2. Manual Checkpoint
1. Automatic Checkpoint: These checkpoints occur at regular intervals, such as every hour or every
day, set by the database administrator. They are generally used by heavy, frequently updated
databases, since regular checkpoints make the data easy to recover in case of failure.
2. Manual Checkpoint: These checkpoints are triggered manually by the database administrator.
They are generally used for smaller databases and occur less frequently, only when the
administrator issues them.
ADVANTAGES OF CHECKPOINTS
• Checkpoints help us recover the transactions of the database in case of a random
shutdown of the database.
• They enhance the consistency of the database when multiple transactions are executing
in the database simultaneously.
• They speed up the data recovery process.
• Checkpoints work as a synchronization point between the database and the transaction log
file in the database.
• Checkpoint records in the log file are used to prevent unnecessary redo operations.
• Since dirty pages are flushed out continuously in the background, it has a very low overhead
and can be done frequently.
• Checkpoints provide the baseline information needed for the restoration of the lost state in
the event of a system failure.
• A database checkpoint keeps track of change information and enables incremental database
backup.
• A database storage checkpoint can be mounted, allowing regular file system operations to be
performed.
• Database checkpoints can be used for application solutions which include backup, recovery
or database modifications.
DISADVANTAGES OF CHECKPOINTS
• Database storage checkpoints can only be used to restore from logical errors (e.g., a human error).
• Because all the data blocks are on the same physical device, database storage checkpoints cannot
be used to restore files after a media failure.
Real-Time Applications of Checkpoints
1. Backup and Recovery
2. Performance Optimization
3. Auditing
1. Checkpoint and Recovery
A checkpoint is one of the key tools which helps in the recovery process of the database. In
case of a system failure, DBMS can find the information stored in the checkpoint to recover
the database till its last known stage.
The speed of recovery after a system failure depends on the checkpoint interval set by the
database administrator. For example, a shorter checkpoint interval enables faster recovery, but
writing checkpoints to disk more frequently can also impact performance.
2. Performance Optimization
Besides its essential role in the recovery of the database, the checkpoint also plays a vital part
in improving the performance of the DBMS by reducing the amount of work that must be done
during recovery. It allows the system to discard log information that is no longer needed, keeping
the database clean and easier to optimize.
Checkpoints also improve performance by reducing the amount of data that must be read from
disk during recovery: analyzing the checkpoints minimizes the portion of the log that must be
scanned, which improves recovery time.
3. Checkpoints and Auditing
Besides performance optimization, checkpoints can also be used for auditing purposes. They help
view the database's history and identify any problem that occurred at a particular time.
In case of any type of failure, database administrators can use the checkpoints to determine when
the failure happened and how much data was affected.
DEADLOCK: In a database management system (DBMS), a deadlock occurs when two or more
transactions are unable to proceed because each is waiting for the other to release locks on
resources. This creates a cycle of dependencies in which no transaction can continue, bringing
the system to a standstill. Deadlocks can severely impact the performance and reliability of a
DBMS, making it crucial to understand and manage them effectively.
Characteristics of Deadlock
• Mutual Exclusion: Only one transaction can hold a particular resource at a time.
• Hold and Wait: Transactions holding resources may request additional resources
held by others.
• No Preemption: Resources cannot be forcibly taken from the transaction
holding them.
• Circular Wait: A cycle of transactions exists in which each transaction is waiting for a
resource held by the next transaction in the cycle.
Deadlock Handling
Deadlock handling is a critical aspect of operating systems, database systems, and distributed
systems. Deadlock occurs when a group of processes or threads become stuck, each waiting for the
other to release resources, which never happens. This results in a situation where none of the
processes can continue their execution. There are several approaches to handle deadlocks:
Deadlock Avoidance: It is always better to avoid a deadlock than to restart or abort transactions
after one occurs. The deadlock avoidance method is suitable for smaller databases, whereas the
deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is application-consistent logic. For example, transactions that
access the Students and Grades tables should always access the tables in the same order. In that
case, transaction T1 simply waits for transaction T2 to release its lock on Grades before it begins;
when T2 releases the lock, T1 can proceed freely.
Deadlock Detection: When a transaction waits indefinitely to obtain a lock, the database
management system should detect whether the transaction is involved in a deadlock.
The wait-for graph is one method for detecting a deadlock situation; it is suitable for smaller
databases. In this method, a graph is drawn based on the transactions and the locks they hold or
request on resources. If the graph contains a closed loop or cycle, then there is a deadlock.
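The cycle check on a wait-for graph can be sketched like this; the adjacency-list encoding, where an edge Ti -> Tj means Ti waits for a lock held by Tj, is our own assumption.

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping each transaction to the transactions
    it is waiting on. Returns True if the graph contains a cycle."""
    visiting, done = set(), set()
    def dfs(t):
        visiting.add(t)
        for nxt in wait_for.get(t, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True   # back edge found: cycle => deadlock
        visiting.discard(t)
        done.add(t)
        return False
    return any(dfs(t) for t in list(wait_for) if t not in done)

# T1 waits for T2 and T2 waits for T1: a deadlock.
print(has_deadlock({'T1': ['T2'], 'T2': ['T1']}))  # True
print(has_deadlock({'T1': ['T2'], 'T2': []}))      # False
```

In a real DBMS this check would run periodically or whenever a lock wait exceeds a threshold, and one transaction in the cycle would then be chosen as the victim to abort.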
Deadlock Prevention: For a large database, the deadlock prevention method is suitable. A deadlock
can be prevented if resources are allocated in such a way that a deadlock never occurs. The DBMS
analyzes whether the operations can create a deadlock situation; if they can, that transaction is
never allowed to execute.
Deadlock prevention mechanism proposes two schemes:
• Wait-Die Scheme: If a transaction requests a resource that is locked by another transaction,
the DBMS checks the timestamps of both transactions. An older requesting transaction is
allowed to wait until the resource is available; a younger requesting transaction is aborted
("dies") and restarted later with the same timestamp.
• Wound-Wait Scheme: If an older transaction requests a resource held by a younger
transaction, the older transaction forces ("wounds") the younger one to abort and release
the resource; the younger transaction is restarted after a small delay but with the same
timestamp. If a younger transaction requests a resource held by an older one, the younger
transaction waits until the older one releases it.
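Both schemes can be summarized in two small decision functions; the timestamp convention (a smaller timestamp means an older transaction) and the return labels are our own assumptions.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger one dies (aborts)."""
    return 'wait' if requester_ts < holder_ts else 'abort requester'

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the holder;
    a younger requester waits."""
    return 'abort holder' if requester_ts < holder_ts else 'wait'

# T1 (timestamp 5, older) requests a lock held by T2 (timestamp 9):
print(wait_die(5, 9), '|', wound_wait(5, 9))    # wait | abort holder
# T2 (younger) requests a lock held by T1 (older):
print(wait_die(9, 5), '|', wound_wait(9, 5))    # abort requester | wait
```

Note the symmetry: Wait-Die only ever aborts the requester, while Wound-Wait only ever aborts the holder; in both schemes it is always the younger transaction that is aborted, which is why neither can form a circular wait.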
Deadlock Recovery: After deadlock is detected, recovery mechanisms can be employed to
recover from the situation. This involves:
• Rollback: Roll back some or all of the deadlocked processes to a safe state, where
they can start executing again.
• Process Termination: As mentioned above, forcibly terminating deadlocked
processes and freeing the resources they were holding.