
Database Management Systems

Notes of Unit-5
Storage Types in DBMS

The records in databases are stored in file formats. Physically, the data is stored in
electromagnetic format on a device. The electromagnetic devices used in database systems
for data storage are classified as follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory

Types of Memory

1. Primary Memory

The primary memory of a server is the type of data storage that is directly accessible by the
central processing unit, meaning that it doesn’t require any other device to be read from.
Primary memory depends on supporting infrastructure to function reliably: a steady electric
power supply, a hardware backup system, supporting devices, coolant that moderates the
system temperature, etc.
• These devices are considerably smaller in capacity, and they are volatile.
• In terms of performance and speed, primary memory devices are the fastest in the storage
hierarchy, and this speed comes at the cost of capacity.
• These primary memory devices are usually more expensive due to their increased speed
and performance.

The cache is one of the types of Primary Memory.

• Cache Memory: Cache Memory is a special very high-speed memory. It is used to speed
up and synchronize with a high-speed CPU. Cache memory is costlier than main memory
or disk memory but more economical than CPU registers. Cache memory is an extremely
fast memory type that acts as a buffer between RAM and the CPU.
2. Secondary Memory

Secondary storage, as the name suggests, consists of devices that store data that will be
needed at a later point in time for various purposes or database actions. Therefore, these
types of storage systems are sometimes called backup units as well. Devices that are plugged
in or connected externally fall under this memory category, unlike primary memory, which is
directly accessible by the CPU. The size of this group of devices is noticeably larger than the
primary devices and smaller than the tertiary devices.

• It is also regarded as a temporary storage system since it can hold data when needed and
delete it when the user is done with it. Compared to primary storage devices, these
secondary storage devices are slower with respect to access speed.
• It usually has a higher capacity than primary storage systems, but this changes as the
technological world expands every day.
Some commonly used Secondary Memory types that are present in almost every system are:

• Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile
memory that erases data in units called blocks and rewrites data at the byte level. Flash
memory is widely used for storage and data transfer in consumer devices, enterprise
systems, and industrial applications. Unlike traditional hard drives, flash memory
retains data even after the power has been turned off.
• Magnetic Disk Storage: A magnetic disk is a type of secondary memory: a flat disc
covered with a magnetic coating that holds information. It is used to store various
programs and files. Information is recorded by polarizing small regions of the coating:
polarization in one direction represents a 1, and polarization in the opposite direction
represents a 0.

3. Tertiary Memory

For data storage, Tertiary Memory refers to devices that can hold a large amount of data
without being constantly connected to the server or the peripherals. A device of this type is
connected externally to a server or to the device where the database is stored.
• Because tertiary storage provides more space than the other types of device memory but
performs the slowest, the cost of tertiary storage is lower than that of primary and
secondary storage. This type of storage is commonly used for making backup copies of
servers and databases.
• Like secondary devices, tertiary devices can be written to and their contents can be
erased.
Some commonly used Tertiary Memory types that are present in almost every system are:
• Optical Storage: It is a type of storage where reading and writing are to be performed
with the help of a laser. Typically data written on CDs and DVDs are examples of Optical
Storage.
• Tape Storage: Tape Storage is a type of storage data where we use magnetic tape to store
data. It is used to store data for a long time and also helps in the backup of data in case of
data loss.

Concurrency Control in DBMS

Concurrency control is a very important concept in DBMS; it ensures the simultaneous
execution or manipulation of data by several processes or users without resulting in data
inconsistency. Concurrency control deals with the interleaved execution of more than one
transaction.
What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two
types:

Serial Schedule: When one transaction completely executes before another transaction
starts, the schedule is called a serial schedule. A serial schedule is always consistent.
For example, if a schedule S has a debit transaction T1 and a credit transaction T2, the
possible serial schedules are T1 followed by T2 (T1->T2) or T2 followed by T1 (T2->T1). A
serial schedule has low throughput and poor resource utilization.

Concurrent Schedule: When operations of a transaction are interleaved with operations of
other transactions in a schedule, the schedule is called a concurrent schedule. For example,
the schedule of debit and credit transactions shown in Table 1 is concurrent. But concurrency
can lead to inconsistency in the database, and the above example of a concurrent schedule is
also inconsistent.

Difference between Serial Schedule and Serializable Schedule:

• In a serial schedule, transactions are executed one after the other; in a serializable
schedule, transactions are executed concurrently.
• Serial schedules are less efficient; serializable schedules are more efficient.
• In a serial schedule, only one transaction executes at a time; in a serializable schedule,
multiple transactions can execute at a time.
• A serial schedule takes more time for execution; in a serializable schedule, execution is
fast.

Concurrency Control in DBMS

• Executing a single transaction at a time increases the waiting time of the other
transactions, which may delay the overall execution. Hence, to increase the overall
throughput and efficiency of the system, several transactions are executed concurrently.
• Concurrency control is a very important concept in DBMS; it ensures the simultaneous
execution or manipulation of data by several processes or users without resulting in data
inconsistency.
• Concurrency control provides a procedure that is able to control concurrent execution of
the operations in the database.
• The fundamental goal of database concurrency control is to ensure that concurrent
execution of transactions does not result in a loss of database consistency. The concept of
serializability can be used to achieve this goal, since all serializable schedules preserve
consistency of the database. However, not all schedules that preserve consistency of the
database are serializable.
• In general, it is not possible to perform an automatic analysis of the low-level operations
of transactions and check their effect on database consistency constraints. However,
there are simpler techniques. One is to use the database consistency constraints as the
basis for splitting the database into sub-databases on which concurrency can be managed
separately.
• Another is to treat some operations besides read and write as fundamental low-level
operations and to extend concurrency control to deal with them.

Concurrency Control Problems


Several problems arise when numerous transactions are executed simultaneously in a random
manner. A database transaction consists of two major operations, “Read” and “Write”. It is
very important to manage these operations in the concurrent execution of transactions in
order to maintain the consistency of the data.

Dirty Read Problem (Write-Read conflict)

The dirty read problem occurs when one transaction updates an item, but due to some
unexpected event the transaction fails, and before it can perform a rollback, some other
transaction reads the updated value. This creates an inconsistency in the database. The dirty
read problem arises from a Write-Read conflict between transactions in the database.

The dirty read problem can be illustrated with the following scenario between two
transactions T1 and T2:
1. Transaction T1 modifies a database record without committing the changes.
2. T2 reads the uncommitted data changed by T1.
3. T1 performs a rollback.
4. T2 has already read the uncommitted data of T1, which is no longer valid, thus creating
an inconsistency in the database.
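
To make the scenario concrete, here is a minimal Python sketch (not a real DBMS) of the
dirty read above; the names db, uncommitted, and t2_view are hypothetical stand-ins used
only for illustration:

    # Committed state, plus T1's in-flight working copy that another
    # transaction can (wrongly) see before T1 commits.
    db = {"balance": 100}
    uncommitted = dict(db)

    uncommitted["balance"] = 150      # 1. T1 updates without committing
    t2_view = uncommitted["balance"]  # 2. T2 reads the uncommitted value (150)
    uncommitted = dict(db)            # 3. T1 fails and rolls back

    # 4. T2 has acted on a value that never officially existed.
    print(t2_view, db["balance"])     # 150 100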

Lost Update Problem

The lost update problem occurs when two or more transactions modify the same data,
resulting in one update being overwritten or lost by another transaction. The lost update
problem can be illustrated with the following scenario between two transactions T1 and T2:
1. T1 reads the value of an item from the database.
2. T2 starts and reads the same database item.
3. T1 updates the value of that data and performs a commit.
4. T2 updates the same data item based on its initial read and performs a commit.
5. As a result, T1’s modification is overwritten by T2’s write and lost, which is the lost
update problem in the database.
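
The same anomaly can be written out in a few lines of Python; the interleaving is shown
sequentially, and db is a hypothetical in-memory stand-in for the database:

    db = {"x": 100}

    t1_read = db["x"]        # 1. T1 reads x (100)
    t2_read = db["x"]        # 2. T2 reads the same item (100)
    db["x"] = t1_read + 50   # 3. T1 writes x = 150 and commits
    db["x"] = t2_read - 30   # 4. T2 writes x = 70 from its stale read and commits

    print(db["x"])           # 70: T1's +50 update has been silently lost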

Concurrency Control Protocols

Concurrency control protocols are the sets of rules that are maintained in order to solve the
concurrency control problems in the database. They ensure that concurrent transactions can
execute properly while maintaining database consistency. The concurrent execution of
transactions is provided with atomicity, consistency, isolation, durability, and serializability
via the concurrency control protocols.

• Lock-based concurrency control protocol

• Timestamp-based concurrency control protocol

Lock-based Protocol

In lock-based protocols, each transaction needs to acquire locks before it starts accessing
or modifying data items. There are two types of locks used in databases:
• Shared Lock: A shared lock is also known as a read lock; it allows multiple transactions
to read the data simultaneously. A transaction holding a shared lock can only read the
data item; it cannot modify it.

• Exclusive Lock: An exclusive lock is also known as a write lock. An exclusive lock
allows a transaction to update a data item. Only one transaction can hold an exclusive
lock on a data item at a time. While a transaction holds an exclusive lock on a data item,
no other transaction is allowed to acquire a shared or exclusive lock on the same data
item.
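
The compatibility rules above can be sketched as a small shared/exclusive lock in Python.
This is a minimal illustration assuming a single process with threads; it omits the fairness,
deadlock handling, and lock upgrades that a real DBMS lock manager would need:

    import threading

    class SXLock:
        # A minimal shared/exclusive (read/write) lock sketch.
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0      # transactions holding the shared lock
            self._writer = False   # whether an exclusive lock is held

        def acquire_shared(self):
            with self._cond:
                while self._writer:              # S conflicts only with X
                    self._cond.wait()
                self._readers += 1

        def release_shared(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_exclusive(self):
            with self._cond:
                while self._writer or self._readers:  # X conflicts with S and X
                    self._cond.wait()
                self._writer = True

        def release_exclusive(self):
            with self._cond:
                self._writer = False
                self._cond.notify_all()

Several transactions can hold acquire_shared() at once, but acquire_exclusive() blocks until
no reader or writer remains, mirroring the shared/exclusive compatibility rules above.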

There are two kinds of lock-based protocols mostly used in databases:

• Two-Phase Locking Protocol: Two-phase locking is a widely used technique which
ensures a strict ordering of lock acquisition and release. The protocol works in two
phases (a code sketch follows this list).

• Growing Phase: In this phase, the transaction acquires locks before performing
any modification on the data items. Once a transaction starts acquiring locks, it
cannot release any lock until this phase ends.

• Shrinking Phase: In this phase, the transaction releases the acquired locks once
it has performed all its modifications on the data items. Once the transaction
starts releasing locks, it cannot acquire any further locks.

• Strict Two-Phase Locking Protocol: It is almost identical to the two-phase locking
protocol; the only difference is that under plain two-phase locking a transaction can
release its locks before it commits, whereas under strict two-phase locking transactions
are only allowed to release their locks when they commit.
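
Here is a minimal sketch of a transaction obeying two-phase locking, assuming plain
threading.Lock objects as stand-ins for exclusive locks (locks and db are hypothetical names
used only for this illustration):

    import threading

    locks = {"A": threading.Lock(), "B": threading.Lock()}
    db = {"A": 100, "B": 0}

    # Growing phase: acquire every needed lock; release nothing.
    locks["A"].acquire()
    locks["B"].acquire()    # lock point: all locks are now held

    db["A"] -= 10           # perform the updates while holding all locks
    db["B"] += 10

    # Shrinking phase: release locks; no new lock may be acquired from here on.
    # (Strict 2PL would instead hold these until the transaction commits.)
    locks["A"].release()
    locks["B"].release()

    print(db)               # {'A': 90, 'B': 10}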

Timestamp-based Protocol

• In this protocol, each transaction has a timestamp attached to it. The timestamp is simply
the time at which the transaction enters the system.
• Conflicting pairs of operations are resolved by the timestamp ordering protocol using the
timestamp values of the transactions, thereby guaranteeing that the transactions take
place in the correct order.
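
A minimal sketch of basic timestamp ordering in Python, assuming each data item tracks the
largest read and write timestamps; refinements such as the Thomas write rule are omitted:

    class Item:
        def __init__(self, value):
            self.value = value
            self.read_ts = 0    # largest timestamp that has read this item
            self.write_ts = 0   # largest timestamp that has written this item

    def read(item, ts):
        if ts < item.write_ts:            # a younger transaction already wrote it
            raise RuntimeError(f"abort T{ts}: too late to read")
        item.read_ts = max(item.read_ts, ts)
        return item.value

    def write(item, ts, value):
        if ts < item.read_ts or ts < item.write_ts:
            raise RuntimeError(f"abort T{ts}: too late to write")
        item.value, item.write_ts = value, ts

    x = Item(100)
    read(x, ts=1)                    # T1 reads x
    write(x, ts=2, value=200)        # T2 writes x
    try:
        write(x, ts=1, value=150)    # T1 arrives too late and must abort
    except RuntimeError as e:
        print(e)                     # abort T1: too late to write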

Advantages of Concurrency

In general, concurrency means that more than one transaction can work on a system. The
advantages of a concurrent system are:

• Waiting Time: Waiting time is the time a process spends in the ready state before it gets
the system to execute on. Concurrency leads to less waiting time.
• Response Time: Response time is the time taken to get the first response from the CPU.
Concurrency leads to less response time.
• Resource Utilization: Resource utilization is the fraction of a system’s resources that are
actually in use. Multiple transactions can run in parallel in a system, so concurrency
leads to more resource utilization.
• Efficiency: Efficiency is the amount of output produced in comparison to the given
input. Concurrency leads to more efficiency.
Disadvantages of Concurrency

• Overhead: Implementing concurrency control requires additional overhead, such as
acquiring and releasing locks on database objects. This overhead can lead to slower
performance and increased resource consumption, particularly in systems with high
levels of concurrency.
• Deadlocks: Deadlocks can occur when two or more transactions are waiting for each
other to release resources, causing a circular dependency that can prevent any of the
transactions from completing. Deadlocks can be difficult to detect and resolve, and can
result in reduced throughput and increased latency (a sketch of deadlock detection via a
wait-for graph follows this list).
• Reduced concurrency: Concurrency control can limit the number of users or
applications that can access the database simultaneously. This can lead to reduced
concurrency and slower performance in systems with high levels of concurrency.
• Complexity: Implementing concurrency control can be complex, particularly in
distributed systems or in systems with complex transactional logic. This complexity can
lead to increased development and maintenance costs.
• Inconsistency: In some cases, concurrency control can lead to inconsistencies in the
database. For example, a transaction that is rolled back may leave the database in an
inconsistent state, or a long-running transaction may cause other transactions to wait for
extended periods, leading to data staleness and reduced accuracy.
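
As referenced in the deadlocks item above, one standard detection approach is to maintain a
wait-for graph and look for cycles. A minimal sketch, assuming the graph is given as a dict
mapping each transaction to the transactions it waits for:

    def has_cycle(wait_for):
        # Depth-first search; a back edge to a node on the current stack
        # means the wait-for graph contains a cycle, i.e. a deadlock.
        visited, on_stack = set(), set()

        def dfs(t):
            visited.add(t)
            on_stack.add(t)
            for u in wait_for.get(t, ()):
                if u in on_stack or (u not in visited and dfs(u)):
                    return True
            on_stack.discard(t)
            return False

        return any(dfs(t) for t in wait_for if t not in visited)

    print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True: deadlock
    print(has_cycle({"T1": ["T2"], "T2": []}))      # False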

Lock-Based Protocol

In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a read-only lock. With a shared lock, the data item can only be read
by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.

2. Exclusive lock:

o With an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: under it, multiple transactions cannot modify the same data
item simultaneously.

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols
require every transaction to obtain a lock on the data before it inserts, deletes, or updates it.
The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol

o Pre-claiming lock protocols evaluate the transaction to list all the data items on
which it needs locks.
o Before initiating the execution of the transaction, it requests the DBMS for locks on
all those data items.
o If all the locks are granted, this protocol allows the transaction to begin. When the
transaction is completed, it releases all the locks.
o If all the locks are not granted, the transaction rolls back and waits until all the locks
are granted.

3. Two-phase locking (2PL)


o The two-phase locking protocol divides the execution of a transaction into three
parts.
o In the first part, when the execution of the transaction starts, it seeks permission for
the locks it requires.
o In the second part, the transaction acquires all the locks. The third phase starts as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.

There are two phases of 2PL:

Growing phase: In the growing phase, new locks on data items may be acquired by the
transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.

In the example below, if lock conversion is allowed, then the following conversions can
happen:

1. Upgrading a lock (from S(a) to X(a)) is allowed in the growing phase.

2. Downgrading a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:

The following example shows how unlocking and locking work with 2PL.

Transaction T1:

o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
4. Strict Two-phase locking (Strict-2PL)

o The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in
the first phase, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a
lock after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the
locks at once.
o Strict-2PL therefore has no shrinking phase of gradual lock release.

Unlike basic 2PL, it does not suffer from cascading aborts.

Failure Classification in DBMS

Failure in terms of a database can be defined as the database’s inability to execute a specified
transaction, or a loss of data from the database. A DBMS is vulnerable to several kinds of
failures, and each of these failures needs to be managed differently. There are many causes of
database failures, such as network failure, system crash, natural disasters, carelessness,
sabotage (corrupting the data intentionally), software errors, etc.

A failure in DBMS can be classified as a transaction failure, a system crash, or a
data-transfer failure.

Transaction Failure:

If a transaction is not able to execute, or if it reaches a point from which it becomes
incapable of executing further, this is termed a transaction failure.
Reasons for a transaction failure in DBMS:

1. Logical error: A logical error occurs if a transaction is unable to execute because of
some mistake in the code or due to the presence of some internal fault.
2. System error: The database system itself terminates an active transaction due to some
system issue, or because the database management system is unable to proceed with the
transaction. For example, the system aborts an active transaction if it reaches a deadlock
condition or if resources are unavailable.

System Crash:

A system crash usually occurs when there is some sort of hardware or software breakdown.
Other problems that cause the system to stop abruptly or eventually crash include transaction
failures, operating system errors, power cuts, main memory crashes, etc.
These types of failures are often termed soft failures and are responsible for data losses in
volatile memory. It is assumed that a system crash does not affect the data stored in
non-volatile storage; this is known as the fail-stop assumption.
Data-transfer Failure:

When a disk failure occurs in the middle of a data-transfer operation, resulting in loss of
content from disk storage, such failures are categorized as data-transfer failures. Other causes
of disk failures include disk head crashes, disk unreachability, the formation of bad sectors,
read-write errors on the disk, etc.
In order to recover quickly from a disk failure that occurs in the middle of a data-transfer
operation, a backup copy of the data stored on other tapes or disks can be used. Thus it is
good practice to back up your data frequently.

Database Recovery Techniques in DBMS

Database systems, like any other computer system, are subject to failures, but the data stored
in them must be available as and when required. When a database fails, it must possess the
facilities for fast recovery. It must also ensure atomicity: either transactions are completed
successfully and committed (their effect is recorded permanently in the database), or the
transaction should have no effect on the database.

Types of Recovery Techniques in DBMS


Database recovery techniques are used in database management systems (DBMS) to restore a
database to a consistent state after a failure or error has occurred. The main goal of recovery
techniques is to ensure data integrity and consistency and prevent data loss.

There are mainly two types of recovery techniques used in DBMS:

• Rollback/Undo Recovery Technique


• Commit/Redo Recovery Technique

Rollback/Undo Recovery Technique:

The rollback/undo recovery technique is based on the principle of backing out or undoing the
effects of a transaction that has not been completed successfully due to a system failure or
error. This technique is accomplished by undoing the changes made by the transaction using
the log records stored in the transaction log. The transaction log contains a record of all the
transactions that have been performed on the database. The system uses the log records to
undo the changes made by the failed transaction and restore the database to its previous state.
Commit/Redo Recovery Technique:

The commit/redo recovery technique is based on the principle of reapplying the changes
made by a transaction that has been completed successfully to the database. This technique is
accomplished by using the log records stored in the transaction log to redo the changes made
by the transaction that was in progress at the time of the failure or error. The system uses the
log records to reapply the changes made by the transaction and restore the database to its
most recent consistent state.

In addition to these two techniques, there is also a third technique called checkpoint recovery.

Checkpoint Recovery is a technique used to reduce the recovery time by periodically saving
the state of the database in a checkpoint file. In the event of a failure, the system can use the
checkpoint file to restore the database to the most recent consistent state before the failure
occurred, rather than going through the entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability
in a Database Management System, and each technique has its own advantages and
limitations that must be considered in the design of a recovery system.

Database Systems:
There are both automatic and manual ways of backing up data and recovering from failure
situations. The techniques used to recover data lost due to system crashes, transaction errors,
viruses, catastrophic failures, incorrect command execution, etc. are database recovery
techniques. To prevent data loss, recovery techniques based on deferred updates and
immediate updates, or on backing up data, can be used. Recovery techniques are heavily
dependent upon the existence of a special file known as a system log. It contains information
about the start and end of each transaction and any updates which occur during the
transaction. The log keeps track of all transaction operations that affect the values of
database items. This information is needed to recover from transaction failures. The log is
kept on disk. The main types of log entries are:

• start_transaction(T): This log entry records that transaction T starts execution.
• read_item(T, X): This log entry records that transaction T reads the value of database
item X.
• write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of database item X from old_value to new_value. The old value is
sometimes known as the before-image of X, and the new value is known as the
after-image of X.
• commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
• abort(T): This records that transaction T has been aborted.
• checkpoint: A checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.

A transaction T reaches its commit point when all of its operations that access the database
have executed successfully, i.e., the transaction has reached the point at which it will
not abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database. Commitment always involves writing a commit entry to the log and
writing the log to disk. At the time of a system crash, the log is searched backwards for
all transactions T that have written a start_transaction(T) entry into the log but have not
yet written a commit(T) entry; these transactions may have to be rolled back to undo their
effect on the database during the recovery process.
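
The backward scan just described can be sketched in a few lines of Python; the log format
below is a simplified stand-in for the entry types listed earlier:

    # Each record is (op, txn, ...); writes carry (item, old_value, new_value).
    log = [
        ("start", "T1"),
        ("write", "T1", "A", 100, 50),
        ("start", "T2"),
        ("write", "T2", "B", 20, 80),
        ("commit", "T1"),
        # crash here: T2 never committed
    ]

    db = {"A": 50, "B": 80}   # state on disk at crash time

    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                  # undo in reverse log order
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value               # restore the before-image

    print(db)   # {'A': 50, 'B': 20}: T2's effect undone, T1's kept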

• Undoing: If a transaction crashes, the recovery manager may undo the transaction, i.e.,
reverse its operations. This involves examining the log for each entry
write_item(T, x, old_value, new_value) belonging to the transaction and setting the value
of item x in the database back to old_value. There are two major techniques for recovery
from non-catastrophic transaction failures: deferred updates and immediate updates.

• Deferred Update: This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates
are recorded in the local transaction workspace. If a transaction fails before reaching its
commit point, it will not have changed the database in any way, so UNDO is not needed.
It may be necessary to REDO the effects of the operations recorded in the local
transaction workspace, because those effects may not yet have been written to the
database. Hence, deferred update is also known as the no-undo/redo algorithm (a redo
sketch follows this list).
• Immediate Update: With immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However,
these operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point, the effects
of its operations must be undone, i.e., the transaction must be rolled back, hence we
require both undo and redo. This technique is known as the undo/redo algorithm.

• Caching/Buffering: In this technique, one or more disk pages that include the data items
to be updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers, called the DBMS cache, is
kept under the control of the DBMS for holding these buffers. A directory is used to keep
track of which database items are in the buffers. A dirty bit is associated with each
buffer: it is 0 if the buffer has not been modified and 1 if it has.

• Shadow Paging: Shadow paging provides atomicity and durability. A directory with n
entries is constructed, where the i-th entry points to the i-th database page on disk. When
a transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a shadow page is allocated in which the changes are
made, and when the transaction is ready to become durable, all pages that referred to the
original page are updated to refer to the new replacement page.

• Backward Recovery: The terms “rollback” and “UNDO” can also refer to backward
recovery. When a backup of the data is not available and previous modifications need to
be undone, this technique can be helpful. With the backward recovery method, unwanted
modifications are removed and the database is returned to its prior condition. All
adjustments made during the previous transaction are reversed during backward
recovery; in other words, it undoes the erroneous database updates.

• Forward Recovery: “Roll forward” and “REDO” refer to forward recovery. When a
database needs to be updated with all verified changes, this forward recovery technique
is helpful. The changes of committed transactions recorded in the log are reapplied to the
database to roll those modifications forward. In other words, the database is restored
using the preserved log data and the valid transactions recorded since its last save.
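
The redo sketch referenced in the deferred update item above: under no-undo/redo, recovery
simply reapplies the after-images of committed transactions in forward log order and ignores
uncommitted ones (the log layout is the same simplified stand-in as before):

    log = [
        ("start", "T1"),
        ("write", "T1", "A", 100, 50),
        ("commit", "T1"),
        ("start", "T2"),
        ("write", "T2", "B", 20, 80),
        # crash: T2 never committed and, under deferred update, never touched disk
    ]

    db = {"A": 100, "B": 20}   # disk state: no uncommitted write was ever applied

    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                            # redo in forward log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value               # apply the after-image

    print(db)   # {'A': 50, 'B': 20}: T1 redone, T2 ignored (no undo needed)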

Backup Techniques:
There are different types of Backup Techniques. Some of them are listed below.
• Full Database Backup: In this, the full database, including the data and the database
metadata needed to restore the whole database (including full-text catalogs), is backed up
on a predefined schedule.

• Differential Backup: It stores only the data changes that have occurred since the last full
database backup. When some data has changed many times since the last full database
backup, a differential backup stores the most recent version of the changed data. To
restore from it, we first need to restore the full database backup.

• Transaction Log Backup: In this, all events that have occurred in the database, i.e., a
record of every single statement executed, are backed up. It is a backup of the transaction
log entries and contains all transactions that have happened to the database. Through this,
the database can be recovered to a specific point in time. It is even possible to back up
from the transaction log if the data files are destroyed, so that not even a single
committed transaction is lost.
Transaction Management in DBMS
A transaction is a set of logically related operations. For example, if you are transferring
money from your bank account to your friend’s account, the set of operations would be as
follows:

Simple Transaction Example


1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account
This whole set of operations can be called a transaction. Although the example above shows
only read, write, and update operations, a transaction can include operations such as read,
write, insert, update, and delete.

In DBMS, we write the above 6-step transaction like this:

Let’s say your account is A and your friend’s account is B, and you are transferring 10000
from A to B. The steps of the transaction are:
1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);
In the above transaction R refers to the Read operation and W refers to the write operation.

Transaction failure in between the operations

Now that we understand what is transaction, we should understand what are the problems
associated with it.

The main problem that can happen during a transaction is that the transaction can fail before
finishing the all the operations in the set. This can happen due to power failure, system crash
etc. This is a serious problem that can leave database in an inconsistent state. Assume that
transaction fail after third operation (see the example above) then the amount would be
deducted from your account but your friend will not receive it.

To solve this problem, we have the following two operations

Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.

Rollback: If any of the operation fails then rollback all the changes done by previous
operations.
Even though these operations can help us avoiding several issues that may arise during
transaction but they are not sufficient when two transactions are running concurrently. To
handle those problems we need to understand database ACID properties.
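
As promised above, here is a minimal sketch of the transfer with commit/rollback semantics:
updates are staged in a local workspace and applied to the database only at commit (db,
transfer, and workspace are hypothetical names used only for illustration):

    db = {"A": 50000, "B": 10000}

    def transfer(amount):
        workspace = dict(db)           # local copy of the items the txn touches
        try:
            workspace["A"] -= amount   # R(A); A = A - amount; W(A)
            if workspace["A"] < 0:
                raise ValueError("insufficient funds")
            workspace["B"] += amount   # R(B); B = B + amount; W(B)
            db.update(workspace)       # commit: apply all changes at once
        except ValueError:
            pass                       # rollback: discard workspace; db untouched

    transfer(10000)
    print(db)   # {'A': 40000, 'B': 20000}
    transfer(999999)
    print(db)   # unchanged: the failed transfer rolled back cleanly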

ACID Properties in DBMS


A transaction is a single logical unit of work which accesses and possibly modifies the
contents of a database. Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after the transaction, certain
properties are followed. These are called ACID properties.

Atomicity
By this, we mean that either the entire transaction takes place at once or it doesn’t happen at
all. There is no midway, i.e., transactions do not occur partially. Each transaction is
considered as one unit and either runs to completion or is not executed at all. It involves the
following two operations:
• Abort: If a transaction aborts, the changes it made to the database are not visible.
• Commit: If a transaction commits, the changes it made are visible.
Atomicity is also known as the ‘all or nothing’ rule.

Consider the following transaction T consisting of T1 and T2: a transfer of 100 from
account X to account Y.
If the transaction fails after completion of T1 but before completion of T2 (say,
after write(X) but before write(Y)), then the amount has been deducted from X but not added
to Y. This results in an inconsistent database state. Therefore, the transaction must be
executed in its entirety in order to ensure the correctness of the database state.

Consistency
This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above, the total amount before and after the transaction must be maintained.

Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.

Therefore, the database is consistent. Inconsistency would occur if T1 completed but T2
failed, leaving T incomplete.

Isolation
This property ensures that multiple transactions can occur concurrently without leading to
inconsistency of the database state. Transactions occur independently, without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until
that change is written to memory or has been committed. This property ensures that
concurrent execution of transactions results in a state that is equivalent to a state achieved
had they been executed serially in some order.
Let X = 50,000, Y = 500.

Consider two transactions T and T’’, where T deducts 50 from Y.

Suppose T has executed up to Read(Y), and then T’’ starts. As a result, interleaving of
operations takes place, due to which T’’ reads the correct value of X but an incorrect (stale)
value of Y, and the sum computed by

T’’: X + Y = 50,000 + 500 = 50,500

is thus not consistent with the sum at the end of transaction T:

T: X + Y = 50,000 + 450 = 50,450.

This results in database inconsistency, due to a loss of 50 units. Hence, transactions must
take place in isolation, and changes should be visible only after they have been made to the
main memory.

Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored on and written to disk, and they persist even if a
system failure occurs. These updates become permanent and are stored in non-volatile
memory. The effects of the transaction are thus never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and
consistency of a database in such a way that each transaction is a group of operations that
acts as a single unit, produces consistent results, acts in isolation from other operations, and
makes updates that are durably stored.

DBMS Transaction States


In this guide, we will discuss the states of a transaction in DBMS. A transaction in DBMS
can be in one of the following states.
DBMS Transaction States Diagram

Let’s discuss these states one by one.

Active State
As discussed in the DBMS transaction introduction, a transaction is a sequence of
operations. If a transaction is in execution, it is said to be in the active state. It doesn’t matter
which step is in execution; as long as the transaction is executing, it remains in the active
state.
Failed State
If a transaction is executing and a failure occurs, either a hardware failure or a software
failure, then the transaction goes into the failed state from the active state.
Partially Committed State
As we can see in the above diagram, a transaction goes into the “partially committed” state
from the active state when it has executed its final operation.
A transaction contains a number of read and write operations. Once the whole transaction
has successfully executed, the transaction goes into the partially committed state, where all
the read and write operations have been performed on main memory (local memory) rather
than on the actual database.
The reason we have this state is that a transaction can fail during execution, so if we were
making the changes directly in the actual database instead of in local memory, the database
might be left in an inconsistent state in case of a failure. This state helps us roll back the
changes made to the database in case of a failure during execution.
Committed State
If a transaction completes its execution successfully, then all the changes made in local
memory during the partially committed state are permanently stored in the database. You can
also see in the above diagram that a transaction goes from the partially committed state to the
committed state when everything succeeds.
Aborted State
As we have seen above, if a transaction fails during execution, it goes into the failed state.
The changes made in local memory (or the buffer) are rolled back to the previous consistent
state, and the transaction goes into the aborted state from the failed state. Refer to the
diagram to see the interaction between the failed and aborted states.

Types of Schedules in DBMS


A schedule, as the name suggests, is a process of lining up transactions and executing them
one by one. When multiple transactions run concurrently and the order of operations needs
to be set so that the operations do not overlap each other, scheduling is brought into play and
the transactions are timed accordingly.

1. Serial Schedules:

Schedules in which the transactions are executed non-interleaved, i.e., in which no
transaction starts until a running transaction has ended, are called serial schedules. In
other words, in a serial schedule a transaction is executed completely before the
execution of another transaction starts; a transaction does not start executing until the
currently running transaction has finished. This type of execution is also known as
non-interleaved execution. The example we saw above is a serial schedule.

Example: Consider the following schedule involving two transactions T1 and T2.

T1          T2
R(A)
W(A)
R(B)
W(B)
            R(A)
            R(B)

where R(A) denotes that a read operation is performed on data item ‘A’.
This is a serial schedule since the transactions execute serially in the order T1 -> T2.

2. Non-Serial Schedule:

This is a type of scheduling where the operations of multiple transactions are interleaved.
This can give rise to concurrency problems. The transactions are executed in a non-serial
manner while keeping the end result correct and the same as that of a serial schedule. Unlike
a serial schedule, where one transaction must wait for another to complete all of its
operations, in a non-serial schedule a transaction proceeds without waiting for the previous
transaction to complete, so this sort of schedule does provide the benefits of concurrent
execution. It can be of two types, namely serializable and non-serializable schedules.

The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.

Serializable:
Serializability is used to maintain the consistency of the database. It is mainly used in
non-serial scheduling to verify whether the scheduling will lead to any inconsistency or not.
A serial schedule, on the other hand, does not need a serializability check, because it starts a
transaction only when the previous transaction is complete. A non-serial schedule of n
transactions is said to be serializable only when it is equivalent to some serial schedule of
those n transactions. Since concurrency is allowed in this case, multiple transactions can
execute concurrently.
These are of two types:

1. Conflict Serializable: A schedule is called conflict serializable if it can be transformed
into a serial schedule by swapping non-conflicting operations.
2. View Serializable: A schedule is called view serializable if it is view-equal to a serial
schedule (no overlapping transactions). Every conflict-serializable schedule is view
serializable, but a view-serializable schedule that contains blind writes may not be
conflict serializable. A sketch of a conflict-serializability test follows.
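
A minimal sketch of the standard precedence-graph test for conflict serializability, assuming
a schedule given as (transaction, operation, item) tuples:

    def conflict_serializable(schedule):
        # Edge Ti -> Tj when an operation of Ti conflicts with a later
        # operation of Tj (same item, different txns, at least one write).
        edges = set()
        for i, (ti, op_i, x) in enumerate(schedule):
            for tj, op_j, y in schedule[i + 1:]:
                if ti != tj and x == y and "W" in (op_i, op_j):
                    edges.add((ti, tj))
        # Acyclic graph <=> conflict serializable: repeatedly remove nodes
        # with no incoming edges from still-remaining nodes.
        nodes = {t for t, _, _ in schedule}
        while nodes:
            free = [n for n in nodes
                    if not any(v == n for u, v in edges if u in nodes)]
            if not free:
                return False       # remaining nodes all lie on a cycle
            nodes -= set(free)
        return True

    s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
    s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T2", "R", "A")]
    print(conflict_serializable(s1))  # True: equivalent to serial order T1 -> T2
    print(conflict_serializable(s2))  # False: T1 -> T2 and T2 -> T1 form a cycle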

Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.

Recoverable Schedule:

Schedules in which transactions commit only after all transactions whose changes they read
have committed are called recoverable schedules. In other words, if some transaction Tj is
reading a value updated or written by some other transaction Ti, then the commit of Tj must
occur after the commit of Ti.

Example: Consider the following schedule involving two transactions T1 and T2.

T1          T2
R(A)
W(A)
            W(A)
            R(A)
Commit
            Commit

This is a recoverable schedule since T1 commits before T2, which makes the value read by
T2 correct.

3. Non-Recoverable Schedule:

Example: Consider the following schedule involving two transactions T1 and T2.

T1          T2
R(A)
W(A)
            W(A)
            R(A)
            Commit
Abort

T2 read the value of A written by T1 and committed. T1 later aborted; therefore, the value
read by T2 is wrong. But since T2 has already committed, this schedule is non-recoverable.
