DBMS Unit-5 Notes
Storage Types in DBMS
The records in a database are stored in file formats. Physically, the data is held in
electromagnetic form on a storage device. The electromagnetic devices used in database
systems for data storage are classified as follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory
Types of Memory
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the
central processing unit, meaning that no other device is needed to read from it.
To function reliably, primary memory depends on supporting components working together:
a stable electric power supply, the hardware backup system, the supporting devices, the
coolant that moderates the system temperature, and so on.
• These devices are considerably smaller in capacity, and they are volatile.
• In terms of performance and speed, primary memory devices are the fastest, and this speed
comes at the cost of their comparatively small capacity.
• Primary memory devices are usually more expensive because of their higher speed and
performance.
• Cache Memory: Cache Memory is a special very high-speed memory. It is used to speed
up and synchronize with a high-speed CPU. Cache memory is costlier than main memory
or disk memory but more economical than CPU registers. Cache memory is an extremely
fast memory type that acts as a buffer between RAM and the CPU.
2. Secondary Memory
Secondary storage, as the name suggests, consists of devices that hold data which will be
needed at a later point in time for various purposes or database actions. These storage
systems are therefore sometimes called backup units as well. Devices that are plugged in or
connected externally fall under this category, unlike primary memory, which is directly
accessed by the CPU. The capacity of this group of devices is noticeably larger than that of
primary devices and smaller than that of tertiary devices.
• It is also regarded as a temporary storage system, since it can hold data when needed and
delete it when the user is done with it. Secondary storage devices are slower than primary
storage but faster than tertiary devices.
• It usually has a higher capacity than primary storage systems, though this keeps changing
as storage technology advances.
Some commonly used Secondary Memory types that are present in almost every system are:
• Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile
memory that erases data in units called blocks and rewrites data at the byte level. Flash
memory is widely used for storage and data transfer in consumer devices, enterprise
systems, and industrial applications. Unlike traditional hard drives, flash memory is
able to retain data even after the power has been turned off.
• Magnetic Disk Storage: A magnetic disk is a type of secondary memory: a flat disc
covered with a magnetic coating that holds information. It is used to store various
programs and files. Magnetization in one direction represents a 1, and magnetization
in the opposite direction represents a 0.
3. Tertiary Memory
For data storage, Tertiary Memory refers to devices that can hold a large amount of data
without being constantly connected to the server or the peripherals. A device of this type is
connected either to a server or to a device where the database is stored from the outside.
• Because tertiary storage provides more space than the other types of device memory but
performs the slowest, it costs less than primary and secondary storage. It is commonly
used to make backup copies of servers and databases.
• Like secondary devices, tertiary devices can have their contents erased and reused.
Some commonly used Tertiary Memory types that are present in almost every system are:
• Optical Storage: It is a type of storage where reading and writing are to be performed
with the help of a laser. Typically data written on CDs and DVDs are examples of Optical
Storage.
• Tape Storage: Tape Storage is a type of storage data where we use magnetic tape to store
data. It is used to store data for a long time and also helps in the backup of data in case of
data loss.
Concurrency Control in DBMS
Concurrency control is a very important concept in DBMS: it ensures that several processes
or users can execute or manipulate data simultaneously without causing data inconsistency.
Concurrency control deals with the interleaved execution of more than one transaction.
What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two
types:
Serial Schedule: When one transaction completely executes before another transaction
starts, the schedule is called a serial schedule. A serial schedule is always consistent.
For example, if a schedule S has a debit transaction T1 and a credit transaction T2, the
possible serial schedules are T1 followed by T2 (T1->T2) or T2 followed by T1 (T2->T1).
A serial schedule has low throughput and poor resource utilization; serializable schedules
are more efficient.
• Executing a single transaction at a time will increase the waiting time of the other
transactions which may result in delay in the overall execution. Hence for increasing the
overall throughput and efficiency of the system, several transactions are executed.
• Concurrency control provides a procedure that is able to control concurrent execution of
the operations in the database.
• The fundamental goal of database concurrency control is to ensure that concurrent
execution of transactions does not result in a loss of database consistency. The concept of
serializability can be used to achieve this goal, since all serializable schedules preserve
consistency of the database. However, not all schedules that preserve consistency of the
database are serializable.
• In general, it is not possible to perform an automatic analysis of low-level operations by
transactions and check their effect on database consistency constraints. However, there
are simpler techniques. One is to use the database consistency constraints as the basis for
a split of the database into sub databases on which concurrency can be managed
separately.
• Another is to treat some operations besides read and write as fundamental low-level
operations and to extend concurrency control to deal with them.
The dirty read problem occurs when one transaction updates an item and then fails due to
some unexpected event; before the failed transaction can roll back, another transaction
reads the updated (uncommitted) value. This creates an inconsistency in the database. The
dirty read problem arises from a Write-Read conflict between transactions in the database.
1. The dirty read problem can be illustrated with the below scenario between two
transactions T1 and T2.
2. Transaction T1 modifies a database record without committing the changes.
3. T2 reads the uncommitted data changed by T1
4. T1 performs rollback
5. T2 has already read the uncommitted data of T1 which is no longer valid, thus creating
inconsistency in the database.
Lost update problem occurs when two or more transactions modify the same data, resulting in
the update being overwritten or lost by another transaction. The lost update problem can be
illustrated with the below scenario between two transactions T1 and T2.
1. T1 reads the value of an item from the database.
2. T2 starts and reads the same database item.
3. T1 updates the value of that data and performs a commit.
4. T2 updates the same data item based on its initial read and performs commit.
5. As a result, T1's modification is lost, overwritten by T2's write; this is the lost
update problem in the database.
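The lost-update steps above can be simulated in a few lines. The sketch below is illustrative only: the dictionary `db` and the +50/-30 updates are made-up values, not part of any real DBMS API.

```python
# Hypothetical simulation of the lost-update interleaving described above.
db = {"balance": 100}

def lost_update_demo():
    # Step 1: T1 reads the item.
    t1_view = db["balance"]
    # Step 2: T2 reads the same item before T1 writes.
    t2_view = db["balance"]
    # Step 3: T1 updates based on its read and commits (+50).
    db["balance"] = t1_view + 50
    # Step 4: T2 updates based on its stale read and commits (-30),
    # silently overwriting T1's committed update.
    db["balance"] = t2_view - 30
    return db["balance"]

result = lost_update_demo()
# T1's +50 is lost: the final value reflects only T2's write (100 - 30 = 70),
# not the serial result of 120.
```

Under either serial order the final balance would be 120; the interleaving silently discards T1's update.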
Concurrency control protocols are the set of rules which are maintained in order to solve the
concurrency control problems in the database. It ensures that the concurrent transactions can
execute properly while maintaining the database consistency. The concurrent execution of a
transaction is provided with atomicity, consistency, isolation, durability, and serializability
via the concurrency control protocols.
In lock-based protocols, each transaction must acquire locks before it starts accessing
or modifying data items. There are two types of locks used in databases.
• Shared Lock: Shared lock is also known as read lock which allows multiple transactions
to read the data simultaneously. The transaction which is holding a shared lock can only
read the data item but it cannot modify the data item.
• Exclusive Lock: Exclusive lock is also known as the write lock. Exclusive lock allows a
transaction to update a data item. Only one transaction can hold the exclusive lock on a
data item at a time. While a transaction is holding an exclusive lock on a data item, no
other transaction is allowed to acquire a shared/exclusive lock on the same data item.
• Two Phase Locking Protocol: Two phase locking is a widely used technique which
ensures strict ordering of lock acquisition and release. Two phase locking protocol works
in two phases.
• Growing Phase: In this phase, the transaction starts acquiring locks before
performing any modification on the data items. Once a transaction acquires a
lock, that lock cannot be released until the transaction reaches the end of the
execution.
• Shrinking Phase: In this phase, the transaction releases all the acquired locks
once it performs all the modifications on the data item. Once the transaction
starts releasing the locks, it cannot acquire any locks further.
• Strict Two-Phase Locking Protocol: It is almost identical to the two-phase locking
protocol; the only difference is that in two-phase locking a transaction may release its
locks before it commits, whereas in strict two-phase locking a transaction is allowed to
release its locks only when it commits.
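A lock table enforcing the shared/exclusive compatibility rules above can be sketched as follows. This is a minimal illustration assuming a single-threaded scheduler; the class and method names are invented for the example, and a real DBMS would queue waiting transactions rather than simply denying the request.

```python
# Minimal sketch of a lock table with shared (S) and exclusive (X) locks.
class LockTable:
    def __init__(self):
        # item -> (mode, set of transaction ids holding the lock)
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Try to grant 'S' or 'X' on item to txn; return True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # S is compatible only with S; X is compatible with nothing.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False  # conflict: the caller must wait or abort

    def release_all(self, txn):
        """Shrinking phase: release every lock txn holds."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lt = LockTable()
assert lt.acquire("T1", "A", "S")      # granted
assert lt.acquire("T2", "A", "S")      # shared locks coexist
assert not lt.acquire("T3", "A", "X")  # X conflicts with S
lt.release_all("T1"); lt.release_all("T2")
assert lt.acquire("T3", "A", "X")      # now granted
```

Under two-phase locking, a transaction would call `acquire` only during its growing phase and `release_all` once, at the start of its shrinking phase (at commit time, in the strict variant).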
Timestamp Ordering Protocol
• In this protocol, each transaction has a timestamp attached to it. The timestamp is
simply the time at which the transaction entered the system.
• The conflicting pairs of operations can be resolved by the timestamp ordering protocol
through the utilization of the timestamp values of the transactions. Therefore,
guaranteeing that the transactions take place in the correct order.
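A hedged sketch of basic timestamp ordering: each data item tracks the largest read and write timestamps seen so far, and an operation arriving "too late" in timestamp order is rejected. In a real system the rejected transaction would be aborted and restarted with a new timestamp; the names and structure here are illustrative.

```python
# Basic timestamp ordering: items carry read/write timestamps (rts/wts).
items = {}  # item name -> {"rts": 0, "wts": 0}

def _entry(x):
    return items.setdefault(x, {"rts": 0, "wts": 0})

def read(ts, x):
    e = _entry(x)
    if ts < e["wts"]:              # a younger transaction already wrote x;
        return False               # reading it would break timestamp order
    e["rts"] = max(e["rts"], ts)
    return True

def write(ts, x):
    e = _entry(x)
    if ts < e["rts"] or ts < e["wts"]:  # a younger txn already read or wrote x
        return False
    e["wts"] = ts
    return True

assert read(1, "A")       # T1 (timestamp 1) reads A
assert write(2, "A")      # T2 (timestamp 2) writes A
assert not write(1, "A")  # T1's late write is rejected: out of timestamp order
```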
Advantages of Concurrency
In general, concurrency means, that more than one transaction can work on a system. The
advantages of a concurrent system are:
• Waiting Time: The time a process spends in the ready state without getting the system
to execute it is called waiting time. Concurrency leads to less waiting time.
• Response Time: The time taken to get the first response from the CPU is called response
time. Concurrency leads to less response time.
• Resource Utilization: The fraction of available system resources actually in use is
called resource utilization. Since multiple transactions can run in parallel in a system,
concurrency leads to higher resource utilization.
• Efficiency: The amount of output produced in comparison to given input is called
efficiency. So, Concurrency leads to more Efficiency.
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be
read by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: multiple transactions cannot modify the same data
simultaneously.
1. Simplistic Lock Protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based
protocols require every transaction to obtain a lock on the data before inserting,
deleting, or updating it. The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming lock protocols evaluate the transaction to list all the data items on
which it needs locks.
o Before initiating execution, the transaction requests the DBMS for locks on all of
those data items.
o If all the locks are granted, the protocol allows the transaction to begin. When the
transaction completes, it releases all the locks.
o If all the locks are not granted, the transaction rolls back and waits until all the
locks are granted.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
Strict Two-Phase Locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL: after acquiring all its locks, the
transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a
lock after using it.
o Strict-2PL holds all its locks until the whole transaction commits, and then releases
them all at once.
o The Strict-2PL protocol therefore has no gradual shrinking phase of lock release.
Failure in terms of a database can be defined as its inability to execute the specified
transaction or loss of data from the database. A DBMS is vulnerable to several kinds of
failures and each of these failures needs to be managed differently. There are many reasons
that can cause database failures such as network failure, system crash, natural disasters,
carelessness, sabotage (corrupting the data intentionally), software errors, etc.
Failure Classification in DBMS
Transaction Failure:
If a transaction is not able to execute or it comes to a point from where the transaction
becomes incapable of executing further then it is termed as a failure in a transaction.
Reasons for a transaction failure in DBMS include logical errors (such as bad input, data
not found, or overflow inside the transaction's own code) and system errors (such as the
DBMS terminating an active transaction to break a deadlock).
System Crash:
A system crash usually occurs when there is some sort of hardware or software breakdown.
Some other problems which are external to the system and cause the system to abruptly stop
or eventually crash include failure of the transaction, operating system errors, power cuts,
main memory crash, etc.
These types of failures are often termed soft failures and are responsible for the data losses in
the volatile memory. It is assumed that a system crash does not have any effect on the data
stored in the non-volatile storage and this is known as the fail-stop assumption.
Data-transfer Failure:
When a disk failure occurs amid data-transfer operation resulting in loss of content from disk
storage then such failures are categorized as data-transfer failures. Some other reason for disk
failures includes disk head crash, disk unreachability, formation of bad sectors, read-write
errors on the disk, etc.
In order to quickly recover from a disk failure caused amid a data-transfer operation, the
backup copy of the data stored on other tapes or disks can be used. Thus it’s a good practice
to backup your data frequently.
Database Systems like any other computer system, are subject to failures but the data stored
in them must be available as and when required. When a database fails it must possess the
facilities for fast recovery. It must also have atomicity i.e. either transactions are completed
successfully and committed (the effect is recorded permanently in the database) or the
transaction should have no effect on the database.
The rollback/undo recovery technique is based on the principle of backing out or undoing the
effects of a transaction that has not been completed successfully due to a system failure or
error. This technique is accomplished by undoing the changes made by the transaction using
the log records stored in the transaction log. The transaction log contains a record of all the
transactions that have been performed on the database. The system uses the log records to
undo the changes made by the failed transaction and restore the database to its previous state.
Commit/Redo Recovery Technique:
The commit/redo recovery technique is based on the principle of reapplying the changes
made by a transaction that has been completed successfully to the database. This technique is
accomplished by using the log records stored in the transaction log to redo the changes made
by the transaction that was in progress at the time of the failure or error. The system uses the
log records to reapply the changes made by the transaction and restore the database to its
most recent consistent state.
In addition to these two techniques, there is also a third technique called checkpoint recovery.
Checkpoint Recovery is a technique used to reduce the recovery time by periodically saving
the state of the database in a checkpoint file. In the event of a failure, the system can use the
checkpoint file to restore the database to the most recent consistent state before the failure
occurred, rather than going through the entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability
in Database Management System, and each technique has its own advantages and limitations
that must be considered in the design of a recovery system.
Database Systems:
There are both automatic and non-automatic ways for both, backing up data and recovery
from any failure situations. The techniques used to recover lost data due to system crashes,
transaction errors, viruses, catastrophic failure, incorrect command execution, etc. are
database recovery techniques. So to prevent data loss recovery techniques based on deferred
updates and immediate updates or backing up data can be used. Recovery techniques are
heavily dependent upon the existence of a special file known as a system log. It contains
information about the start and end of each transaction and any updates which occur during
the transaction. The log keeps track of all transaction operations that affect the values of
database items. This information is needed to recover from transaction failure.
The log is kept on disk. The main types of log entries are:
• start_transaction(T): This log entry records that transaction T starts execution.
• read_item(T, X): This log entry records that transaction T reads the value of database
item X.
• write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of the database item X from old_value to new_value. The old value is
sometimes known as a before an image of X, and the new value is known as an
afterimage of X.
• commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
• abort(T): This records that transaction T has been aborted.
• checkpoint: A checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.
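The log entry types above can be modeled as a simple append-only list. The tuple layout below is an assumption for illustration, not a fixed DBMS format; the final scan shows how recovery finds transactions that have a start_transaction entry but no commit entry.

```python
# Illustrative append-only log mirroring the entry types listed above.
log = []

def start_transaction(t):
    log.append(("start", t))

def write_item(t, x, old, new):
    # Records old_value (before-image) and new_value (after-image) of x.
    log.append(("write", t, x, old, new))

def commit(t):
    log.append(("commit", t))

start_transaction("T1")
write_item("T1", "X", 500, 450)
commit("T1")
start_transaction("T2")
write_item("T2", "Y", 500, 550)   # T2 crashes before writing a commit entry

# Recovery scan: transactions with a start but no commit must be rolled back.
started = {r[1] for r in log if r[0] == "start"}
committed = {r[1] for r in log if r[0] == "commit"}
to_undo = started - committed
# to_undo == {"T2"}
```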
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully i.e. the transaction has reached the point at which it will
not abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database. Commitment always involves writing a commit entry to the log and
writing the log to disk. At the time of a system crash, the item is searched back in the log for
all transactions T that have written a start_transaction(T) entry into the log but have not
written a commit(T) entry yet; these transactions may have to be rolled back to undo their
effect on the database during the recovery process.
• Undoing: If a transaction crashes, then the recovery manager may undo transactions i.e.
reverse the operations of a transaction. This involves examining a transaction for the log
entry write_item(T, x, old_value, new_value) and setting the value of item x in the
database to old-value. There are two major techniques for recovery from non-catastrophic
transaction failures: deferred updates and immediate updates.
• Deferred Update: This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates
are recorded in the local transaction workspace. If a transaction fails before reaching its
commit point, it will not have changed the database in any way so UNDO is not needed.
It may be necessary to REDO the effect of the operations that are recorded in the local
transaction workspace, because their effect may not yet have been written in the database.
Hence, a deferred update is also known as the No-undo/redo algorithm.
• Immediate Update: In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However,
these operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point, the effect of
its operation must be undone i.e., the transaction must be rolled back hence we require
both undo and redo. This technique is known as undo/redo algorithm.
• Caching/Buffering: In this technique, one or more disk pages that include the data
items to be updated are cached into main-memory buffers and then updated in memory before being
written back to disk. A collection of in-memory buffers called the DBMS cache is kept
under the control of DBMS for holding these buffers. A directory is used to keep track of
which database items are in the buffer. A dirty bit is associated with each buffer, which is
0 if the buffer is not modified else 1 if modified.
• Backward Recovery: The terms "rollback" and "UNDO" can also refer to backward
recovery. When a backup of the data is not available and previous modifications need to
be undone, this technique is helpful. With the backward recovery method, uncommitted
modifications are removed and the database is returned to its prior state. All
adjustments made by the failed transaction are reversed during backward recovery; in
other words, it undoes the erroneous database updates.
• Forward Recovery: "Roll forward" and "REDO" refer to forward recovery. This technique
is helpful when a database needs to be brought up to date with all verified changes. The
changes made by committed transactions are reapplied to the database to roll those
modifications forward. In other words, the database is restored using preserved data and
the valid transactions recorded since its last save.
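The immediate-update (undo/redo) idea can be sketched as a single recovery pass over such a log: redo the writes of committed transactions in forward order, then undo the writes of uncommitted transactions in reverse order, restoring their before-images. The log tuple format is assumed for illustration.

```python
# Sketch of an undo/redo recovery pass over a write-ahead log.
# Log entries: ("write", txn, item, old_value, new_value) and ("commit", txn).
def recover(log, db):
    committed = {t for op, t, *rest in log if op == "commit"}
    # Redo phase: reapply committed writes in forward order.
    for op, t, *rest in log:
        if op == "write" and t in committed:
            x, old, new = rest
            db[x] = new
    # Undo phase: back out uncommitted writes in reverse order.
    for op, t, *rest in reversed(log):
        if op == "write" and t not in committed:
            x, old, new = rest
            db[x] = old
    return db

log = [
    ("write", "T1", "X", 500, 450), ("commit", "T1"),
    ("write", "T2", "Y", 500, 550),            # T2 never committed
]
db = recover(log, {"X": 500, "Y": 550})
# X keeps T1's committed write (450); T2's write to Y is undone (back to 500).
```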
Backup Techniques:
There are different types of Backup Techniques. Some of them are listed below.
• Full Database Backup: In this, the full database, meaning the data plus the metadata
needed to restore the whole database (including full-text catalogs), is backed up on a
predefined schedule.
• Differential Backup: It stores only the data changes that have occurred since the last full
database backup. When some data has changed many times since the last full database
backup, a differential backup stores the most recent version of the changed data. For this
first, we need to restore a full database backup.
• Transaction Log Backup: In this, all events that have occurred in the database, like a
record of every single statement executed is backed up. It is the backup of transaction log
entries and contains all transactions that had happened to the database. Through this, the
database can be recovered to a specific point in time. It is even possible to perform a
backup from a transaction log if the data files are destroyed and not even a single
committed transaction is lost.
Transaction Management in DBMS
A transaction is a set of logically related operations. For example, if you are
transferring money from your bank account to your friend's account, the set of operations
would be like this:
1. Read your account balance
2. Deduct the amount from the balance
3. Write the new balance back
4. Read your friend's account balance
5. Add the amount to that balance
6. Write the new balance back
Now that we understand what is transaction, we should understand what are the problems
associated with it.
The main problem that can occur during a transaction is that it may fail before finishing
all the operations in the set. This can happen due to power failure, a system crash, etc.,
and it can leave the database in an inconsistent state. Assume the transaction fails after
the third operation (see the example above): the amount would be deducted from your
account, but your friend would never receive it.
Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.
Rollback: If any of the operation fails then rollback all the changes done by previous
operations.
Although these operations help us avoid several issues that may arise during a
transaction, they are not sufficient when two transactions run concurrently. To handle
those problems we need to understand the database ACID properties.
Atomicity
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as
one unit and either runs to completion or is not executed at all. It involves the following two
operations.
—Abort: If a transaction aborts, changes made to database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T, consisting of T1 and T2: transfer of 100 from
account X to account Y.
T1: read(X); X := X - 100; write(X)
T2: read(Y); Y := Y + 100; write(Y)
If the transaction fails after completion of T1 but before completion of T2 (say, after
write(X) but before write(Y)), then the amount has been deducted from X but not added
to Y. This results in an inconsistent database state. Therefore, the transaction must be
executed in its entirety to ensure the correctness of the database state.
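The all-or-nothing behavior of the transfer can be sketched as follows, assuming failures surface as exceptions. The account names and the rollback-by-snapshot mechanism are illustrative simplifications of what a real DBMS does with its log.

```python
# Sketch of an atomic transfer: apply both writes or neither.
def transfer(accounts, src, dst, amount, fail_midway=False):
    old = dict(accounts)            # snapshot kept for rollback
    try:
        accounts[src] -= amount     # write(X): deduct from source
        if fail_midway:
            raise RuntimeError("crash between write(X) and write(Y)")
        accounts[dst] += amount     # write(Y): add to destination
    except RuntimeError:
        accounts.clear()
        accounts.update(old)        # rollback: undo the partial effect

acc = {"X": 500, "Y": 500}
transfer(acc, "X", "Y", 100, fail_midway=True)
# Rollback leaves the database unchanged: X=500, Y=500, total preserved.
transfer(acc, "X", "Y", 100)
# Successful run: X=400, Y=600, total still 1000.
```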
Consistency
This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above, The total amount before and after the transaction must be maintained.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of database state. Transactions occur independently without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until
that particular change in that transaction is written to memory or has been committed. This
property ensures that the execution of transactions concurrently will result in a state that is
equivalent to a state achieved if they were executed serially in some order.
Let X = 500, Y = 500, and let transaction T transfer 50 from X to Y while transaction T''
computes the sum X + Y.
Suppose T has executed up to read(Y), having already written X = 450, and then T'' starts.
Because of the interleaving of operations, T'' reads the correct (updated) value of X but
the not-yet-updated value of Y, so the sum it computes is 450 + 500 = 950 instead of 1,000.
This results in database inconsistency, due to an apparent loss of 50 units. Hence,
transactions must take place in isolation, and changes should become visible only after
they have been written to main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory.
The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure correctness and consistency
of a database in a way such that each transaction is a group of operations that acts a single unit,
produces consistent results, acts in isolation from other operations and updates that it makes
are durably stored.
1. Serial Schedules:
Schedules in which the transactions execute non-interleaved are called serial schedules:
no transaction starts until the currently running transaction has finished. In other
words, in a serial schedule a transaction is executed completely before the execution of
another transaction begins. This type of execution is also known as non-interleaved
execution. The example we saw above is a serial schedule.
Example: Consider the following schedule involving two transactions T1 and T2.
T1              T2
R(A)
W(A)
R(B)
W(B)
                R(A)
                R(B)
where R(A) denotes that a read operation is performed on data item A, and W(A) denotes a
write. This is a serial schedule, since the transactions execute serially in the order
T1 -> T2.
2. Non-Serial Schedule:
This is a type of Scheduling where the operations of multiple transactions are interleaved.
This might increase concurrency problems. The transactions are executed in a non-serial
manner, while keeping the end result correct and the same as that of a serial schedule.
Unlike a serial schedule, where one transaction must wait for another to complete all its
operations, in a non-serial schedule a transaction proceeds without waiting for the
previous transaction to complete. This sort of schedule thus provides the benefits of
concurrent execution. It can be of two types, namely Serializable and Non-Serializable
Schedules.
The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.
Serializable:
Serializability is used to maintain the consistency of the database. It is mainly used
with non-serial scheduling to verify whether a given schedule will lead to any
inconsistency. A serial schedule, on the other hand, does not need a serializability
check, because it executes a transaction only when the previous transaction is complete.
A non-serial schedule is said to be serializable only when it is equivalent to some serial
schedule of the same n transactions. Since concurrency is allowed here, multiple
transactions can execute concurrently. Serializable schedules are of two types: conflict
serializable and view serializable.
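Conflict serializability, one of the two kinds, has a standard mechanical test: build a precedence graph with an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj (same data item, at least one of the two operations a write), and check that the graph is acyclic. The schedule encoding below is an assumption for the example.

```python
# Precedence-graph test for conflict serializability.
# A schedule is a list of (txn, op, item) triples, e.g. ("T1", "R", "A").
def is_conflict_serializable(schedule):
    edges = set()
    nodes = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        nodes.add(ti)
        for tj, op_j, x_j in schedule[i + 1:]:
            # Conflict: different txns, same item, at least one write.
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    # Repeatedly remove nodes with no incoming edges; the graph is acyclic
    # iff every node can be removed (Kahn-style check).
    remaining = set(nodes)
    changed = True
    while changed:
        changed = False
        for n in list(remaining):
            if not any(u in remaining and v == n for u, v in edges):
                remaining.discard(n)
                changed = True
    return not remaining

serial_like = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A")]
cyclic = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"),
          ("T2", "R", "B"), ("T1", "W", "B")]
# serial_like has only the edge T1 -> T2, so it is conflict-serializable;
# cyclic has both T1 -> T2 and T2 -> T1, so it is not.
```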
Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.
Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose changes they read
commit are called recoverable schedules. In other words, if some transaction Tj is reading value
updated or written by some other transaction Ti, then the commit of Tj must occur after the
commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2.
T1              T2
R(A)
W(A)
                W(A)
                R(A)
Commit
                Commit
This is a recoverable schedule, since T1 commits before T2, which makes the value read by
T2 correct.
3. Non-Recoverable Schedule:
Example: Consider the following schedule involving two transactions T1 and T2.
T1              T2
R(A)
W(A)
                W(A)
                R(A)
                Commit
Abort
T2 read the value of A written by T1 and then committed. T1 later aborted, so the value
read by T2 is invalid; but since T2 has already committed, this schedule is
non-recoverable.
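The recoverability rule illustrated above (a transaction must not commit before every transaction it read uncommitted data from has committed) can be checked mechanically. The event encoding below is an assumption for the example.

```python
# Check whether a schedule is recoverable. Each event is a triple
# (txn, op, item-or-None), where op is "R", "W", or "commit".
def is_recoverable(schedule):
    last_writer = {}   # item -> txn that wrote it most recently
    reads_from = {}    # txn -> set of txns it read uncommitted data from
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            w = last_writer.get(item)
            if w is not None and w != txn and w not in committed:
                reads_from.setdefault(txn, set()).add(w)
        elif op == "commit":
            # Committing before a txn we read from makes the schedule
            # non-recoverable.
            if any(t not in committed for t in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

recoverable = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
               ("T1", "commit", None), ("T2", "commit", None)]
non_recoverable = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
                   ("T2", "commit", None)]   # T2 commits before T1
```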