Log based Recovery in DBMS
●
●
●
he atomicity property of DBMS states that either all the operations of
T
transactions must be performed or none. The modifications done by an
aborted transaction should not be visible to the database and the
modificationsdonebythecommittedtransactionshouldbevisible.Toachieve
our goal of atomicity, the user must first output stable storage information
describing the modifications, without modifying the database itself. This
information canhelpusensurethatallmodificationsperformedbycommitted
transactions are reflected in the database. This information can alsohelpus
ensure that no modifications made by an aborted transaction persist in the
database.
Log based Recovery in DBMS
Log and log records
helogisasequenceoflogrecords,recordingalltheupdatedactivitiesinthe
T
database. In stable storage, logs for each transaction are maintained. Any
operationwhichisperformedonthedatabaseisrecordedonthelog.Priorto
performinganymodificationtothedatabase,anupdatedlogrecordiscreated
to reflect that modification. An update log recordrepresentedas:<Ti,Xj,V1,
V2> has these fields:
1. Transaction identifier:Unique Identifier of the transaction that
performed the write operation.
2. Data item:Unique identifier of the data item written.
3. Old value:Value of data item prior to write.
4. New value:Value of data item after write operation.
Other types of log records are:
1. <Ti start>: It contains information about when a transactionTi starts.
2. <Ti commit>: It contains information about when atransaction Ti
commits.
3. <Ti abort>: It contains information about when a transactionTi
aborts.
Undo and Redo Operations
ecausealldatabasemodificationsmustbeprecededbythecreationofalog
B
record,thesystemhasavailableboththeoldvaluepriortothemodificationof
the data item and new value that is to be written for data item. This allows
system to perform redo and undo operations as appropriate:
1. Undo:using a log record sets the data item specifiedin log record to
old value.
2. Redo:using a log record sets the data item specifiedin log record to
new value.
The database can be modified using two approaches –
1. Deferred Modification Technique:If the transactiondoes not
modify the database until it has partially committed, it is said to use
deferred modification technique.
2. Immediate Modification Technique:If database modificationoccur
while the transaction is still active, it is said to use immediate
modification technique.
Recovery using Log records
fter a system crash has occurred, thesystemconsultsthelogtodetermine
A
which transactions need to be redone and which need to be undone.
1. Transaction Ti needs to be undone if the log contains the record <Ti
start> but does not contain either the record <Ti commit> or the
record <Ti abort>.
2. Transaction Ti needs to be redone if log contains record <Ti start>
and either the record <Ti commit> or the record <Ti abort>.
Use of Checkpoints –When a system crash occurs, usermust consult the
log. In principle, that need to search the entire log to determine this
information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be
redone have already written their updates into the database.
Although redoing them will cause no harm, it will cause recovery to
take longer.
To reduce these types of overhead,userintroducecheckpoints.Alogrecord
oftheform<checkpointL>isusedtorepresentacheckpointinlogwhereLis
list of transactions active at thetimeofthecheckpoint.Whenacheckpoint
a
logrecordisaddedtologallthetransactionsthathavecommittedbeforethis
checkpoint have <Ti commit> log record before the checkpoint record. Any
databasemodificationsmadebyTiiswrittentothedatabaseeitherpriortothe
checkpoint or as part ofthecheckpointitself.Thus,atrecoverytime,thereis
noneedtoperformaredooperationonTi.Afterasystemcrashhasoccurred,
the systemexaminesthelogtofindthelast<checkpointL>record.Theredo
or undo operations need to be applied only to transactions in L, and to all
transactions that startedexecutionaftertherecordwaswrittentothelog.Let
us denote this set of transactions as T. Same rules of undo and redo are
applicable on T as mentioned in Recovery using Log recordspart.Notethat
userneedtoonlyexaminethepartofthelogstartingwiththelastcheckpoint
logrecordtofindthesetoftransactionsT,andtofindoutwhetheracommitor
abortrecordoccursinthelogforeachtransactioninT.Forexample,consider
the set of transactions {T0, T1, . . ., T100}. Suppose that the most recent
checkpoint took placeduringtheexecutionoftransactionT67andT69,while
T68 and all transactions with subscripts lower than 67 completed before the
checkpoint. Thus, only transactions T67, T69, . . ., T100 need to be
consideredduringtherecoveryscheme.Eachofthemneedstoberedoneifit
has completed (that is, either committed or aborted); otherwise, it was
incomplete, and needs to be undone.
Log-based recovery is a technique used indatabase management systems
(DBMS) torecoveradatabasetoaconsistentstateintheeventofafailureor
crash. It involves the use of transaction logs, which are records of all the
transactions performed on the database.
In log-based recovery, the DBMS uses the transaction log to reconstructthe
databasetoaconsistentstate.Thetransactionlogcontainsrecordsofallthe
changesmadetothedatabase,includingupdates,inserts,anddeletes.Italso
records information about each transaction, such as its start and end times.
Whenafailureoccurs,theDBMSusesthetransactionlogtodeterminewhich
transactions were incomplete at the time of the failure. It then performs a
series of operations to undo the incomplete transactions and redo the
completed ones. This process is called the redo/undo recovery algorithm.
The redo operation involves reapplying the changes made by completed
transactionsthatwerenotyetsavedtothedatabaseatthetimeofthefailure.
This ensures that all changes are applied to the database.
The undo operation involves undoing the changes made by incomplete
transactions that were saved to the database at the time of the failure. This
restores the database to a consistent state by reversing the effects of the
incomplete transactions.
Once the redo and undo operations are completed, theDBMScanbringthe
database back online and resume normal operations.
Log-basedrecoveryisanessentialfeatureofmodernDBMSsandprovidesa
reliable mechanism for recoveringfromfailuresandensuringtheconsistency
of the database.
Advantages of Log based Recovery
● urability: In the event of a breakdown, the log file offers a
D
dependable and long-lasting method of recovering data. It
guarantees that in the event of a system crash, no committed
transaction is lost.
● Faster Recovery: Since log-based recovery recoversdatabases by
replaying committed transactions from the log file, it is typically faster
than alternative recovery methods.
● Incremental Backup:Backups can be made in incrementsusing
log-based recovery. Just the changes made since the last backup
are kept in the log file, rather than creating a complete backup of the
database each time.
● Lowers the Risk of Data Corruption:By making surethat all
transactions are correctly committed or canceled before they are
written to the database, log-based recovery lowersthe risk of data
corruption.
Disadvantages of Log based Recovery
● dditional overhead:Maintaining the log file incursan additional
A
overhead on the database system, which can reduce the
performance of the system.
● Complexity: Log-based recovery is a complex processthat requires
careful management and administration. If not managed properly, it
can lead to data inconsistencies or loss.
● Storage space: T he log file can consume a significantamount of
storage space, especially in a database with a large number of
transactions.
● Time-Consuming:The process of replaying the transactionsfrom
the log file can be time-consuming, especially if there are a large
number of transactions to recover.
Checkpoints in DBMS
●
●
he Checkpoint is used to declare a point before which the DBMSwas in a
T
consistent state, and all transactions were committed. During transaction
execution, such checkpoints are traced. After execution, transaction log files
will be created. Upon reaching the savepoint/checkpoint, the log file is
destroyed by saving its update to the database. Then a new log is created
with upcoming execution operations of the transaction and it willbeupdated
until the next checkpoint and the process continues.
Why do We Need Checkpoints?
henever transaction logs are created in a real-timeenvironment,iteatsup
W
lotsofstoragespace.Alsokeepingtrackofeveryupdateanditsmaintenance
mayincreasethephysicalspaceofthesystem.Eventually,thetransactionlog
file may not be handled as the size keeps growing. This can be addressed
with checkpoints. The methodology utilized for removing all previous
transaction logs and storing them in permanent storage is called a
Checkpoint.
Steps to Use Checkpoints in the Database
. Write the begin_checkpoint record into a log.
1
2. Collect checkpoint data in stable storage.
3. Write the end_checkpoint record into a log.
The behavior when the system crashes and recovers when concurrent
transactions are executed is shown below:
Understanding Checkpoints in multiple Transactions
Transactions and operations of the above diagram:
Transaction 1 Transaction 2 Transaction 3 Transaction 4
(T1) (T2) (T3) (T4)
START
START
COMMIT
START
COMMIT
START
Transaction 1 Transaction 2 Transaction 3 Transaction 4
(T1) (T2) (T3) (T4)
FAILURE
● The recovery system reads the logs backward from the end to the
last checkpoint i.e. from T4 to T1.
● It will keep track of two lists – Undo and Redo.
● Whenever there is a log with instructions <Tn, start>and <Tn,
commit> or only <Tn, commit> then it will put that transaction in
Redo List. T2 and T3 contain <Tn, Start> and <Tn, Commit>
whereas T1 will have only<Tn,Commit>.Here,T1,T2,andT3are
in the redo list.
● Whenever a log record with no instruction of commit or abort is
found,thattransactionisputtoUndoList<Here,T4has<Tn,Start>
butno<Tn,commit>asitisanongoingtransaction.T4willbeputon
the undo list.
All the transactions in the redo list are deleted with their previous logs and
then redone before saving their logs. All the transactions intheundolistare
undone and their logs are deleted.
Types of Checkpoints
There are basically two main types of Checkpoints:
. Automatic Checkpoint
1
2. Manual Checkpoint
1.AutomaticCheckpoint:Thesecheckpointsoccurveryfrequentlylikeevery
houroreveryday.Theseintervalsaresetbythe d atabaseadministrator.They
are generally used by heavy databases as they are frequently updated, and
we can recover the data easily in case of failure.
2. Manual Checkpoint:These are the checkpoints that are manually set by
thedatabaseadministrator.Manualcheckpointsaregenerallyusedforsmaller
databases. They are updated very less frequently only whentheyaresetby
the database administrator.
Relevance of Checkpoints
checkpointisafeaturethataddsavalueofCin ACID-compliantto RDBMS.
A
A checkpoint is used for recovery if there is an unexpected shutdown in the
database. Checkpoints work on some intervals and write all dirty pages
(modifiedpages)fromlogsrelaytodatafilefromi.efromabuffertoaphysical
disk.Itisalsoknownasthehardeningofdirtypages.Itisadedicatedprocess
and runs automatically by SQLServer at specific intervals. The
synchronizationpointbetweenthedatabaseandtransactionlogisservedwith
a checkpoint.
Advantages of Checkpoints
● Checkpointshelpusinrecoveringthetransactionofthedatabasein
c ase of a random shutdown of the database.
● It enhancing the consistency of the databaseincasewhenmultiple
transactions are executing in the database simultaneously.
● It increasing the data recovery process.
● Checkpoints work as a synchronizationpointbetweenthedatabase
and the transaction log file in the database.
● Checkpoint records in the log file are used to prevent unnecessary
redo operations.
● Since dirty pages are flushed outcontinuouslyinthebackground,it
has a very low overhead and can be done frequently.
● Checkpoints provide the baseline information needed for the
restoration of the lost state in the event of a system failure.
● A database checkpoint keeps track of change information and
enables incremental database backup.
● Adatabasestoragecheckpointcanbemounted,allowingregularfile
system operations to be performed.
● Database checkpoints can be used for application solutions which
include backup, recovery or database modifications.
Disadvantages of Checkpoints
. Database storage checkpoints can only be used to restore from logical
1
errors (E.g. a human error).
. Because all the data blocks are on the same physical device, database
2
storage checkpoints cannot be used to restore files due to a media failure.
Real-Time Applications of Checkpoints
. Backup and Recovery
1
2. Performance Optimization
3. Auditing
Deadlock in DBMS
●
●
●
In a database management system (DBMS), adeadlockoccurswhentwoor
more transactions are waiting for each other to release resources, such as
locks on database objects, that they need tocompletetheiroperations.Asa
result,noneofthetransactionscanproceed,leadingtoasituationwherethey
are stuck or “deadlocked.”
eadlocks can happen in multi-user environments when two or more
D
transactions are running concurrently and try to access the same data in a
different order. When this happens, one transaction may hold a lock on a
resource that another transaction needs, while the second transaction may
old a lock on a resource that the first transaction needs. Both transactions
h
are then blocked, waiting for the other to release the resource they need.
BMSs often use various techniques to detect and resolve deadlocks
D
automatically. These techniques include timeout mechanisms, where a
transaction is forced to release its locks after a certain period of time, and
deadlock detection algorithms, which periodically scanthetransactionlogfor
deadlock cycles and then choose a transaction to abort to resolve the
deadlock.
Itisalsopossibletopreventdeadlocksbycarefuldesignoftransactions,such
as always acquiring locks in the same order or releasing locks as soon as
possible.Properdesignofthedatabaseschemaandapplicationcanalsohelp
to minimize the likelihood of deadlocks
In a database, a deadlock is an unwanted situation in which two or more
transactionsarewaitingindefinitelyforoneanothertogiveuplocks.Deadlock
is said to be one of the most feared complications in DBMS as it brings the
whole system to a Halt.
Example –let us understand the concept of Deadlockwith an example :
Suppose, Transaction T1 holds a lock on some rows in the Students table
and n eeds to updatesome rows in the Grades table. Simultaneously,
Transaction T2holdslocksonthoseveryrows(WhichT1needstoupdate)in
the Grades table b ut needsto update the rows in theStudenttable heldby
Transaction T1.
Now, the main problem arises. Transaction T1 will wait for transaction T2 to
give up the lock, and similarly, transaction T2 will wait for transaction T1 to
give up the lock. As a consequence, All activity comes to a halt and remains at
a standstill forever unless the DBMS detects the deadlock and aborts one of
the transactions.
Deadlock in DBMS
eadlock Avoidance: When a database is stuck in a deadlock, It is always
D
better to avoid the deadlock rather than restarting or aborting the database.
The deadlock avoidance method is suitable for smaller databases whereas
the deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is using application-consistent logic. In the
above-given example, Transactions that access Students and Grades should
always access the tables in the same order. In this way, in the scenario
described above, Transaction T1 simply waits for transaction T2 to release the
lock on Grades before it begins. When transaction T2 releases the lock,
Transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both the row-level locking
mechanism and the READ COMMITTED isolation level. However, It does not
guarantee to remove deadlocks completely.
eadlock Detection: W
D hen a transaction waits indefinitelyto obtain a lock,
The database management system should detect whether the transaction is
involved in a deadlock or not.
Wait-for-graphis one of the methods for detectingthe deadlock situation.
This method is suitable for smaller databases. In this method, a graph is
drawn based on the transaction and its lock on the resource. If the graph
created has a closed loop or a cycle, then there is a deadlock.
For the above-mentioned scenario, the Wait-For graph is drawn below:
eadlock prevention: F
D or a large database, the deadlockprevention method
is suitable. A deadlock can be prevented if the resources are allocated in such
a way that a deadlock never occurs. The DBMS analyzes the operations
whether they can create a deadlock situation or not, If they do, that transaction
is never allowed to be executed.
Deadlock prevention mechanism proposes two schemes:
● ait-Die Scheme: In this scheme, If a transactionrequests a
W
resource that is locked by another transaction, then the DBMS
simply checks the timestamp of both transactions and allows the
older transaction to wait until the resource is available for execution.
Suppose, there are two transactions T1 and T2, and Let the
timestamp of any transaction T be TS (T). Now, If there is a lock on
T2 by some other transaction and T1 is requesting resources held by
T2, then DBMS performs the following actions:
Checks if TS (T1) < TS (T2) – if T1 is the older transaction and T2
has held some resource, then it allows T1 to wait until resource is
available for execution. That means if a younger transaction has
locked some resource and an older transaction is waiting for it, then
an older transaction is allowed to wait for it till it is available. If T1 is
an older transaction and has held some resource with it and if T2 is
waiting for it, then T2 is killed and restarted later with random delay
but with the same timestamp. i.e. if the older transaction has held
some resource and the younger transaction waits for the resource,
then the younger transaction is killed and restarted with a very
minute delay with the same timestamp.
This scheme allows the older transaction to wait but kills the younger
one.
● ound Wait Scheme: In this scheme, if an older transaction
W
requests for a resource held by a younger transaction, then an older
transaction forces a younger transaction to kill the transaction and
release the resource. The younger transaction is restarted with a
minute delay but with the same timestamp. If the younger transaction
is requesting a resource that is held by an older one, then the
younger transaction is asked to wait till the older one releases it.
The following table lists the differences between Wait – Die and Wound -Wait
scheme prevention schemes:
Wait – Die Wound -Wait
It is based on a non-preemptive technique. It is based on a preemptive technique.
In this, older transactions must wait for the In this, older transactions never wait
younger one to release its data items. for younger transactions.
The number of aborts and rollbacks is higher In this, the number of aborts and
in these techniques. rollback is lesser.
Distributed Database System
●
●
●
distributed database is basically a database that is not limited to one
A
system, it is spread over different sites, i.e, on multiple computers or over a
network of computers. A distributed database system is located on various
sites that don’t share physical components. This may be required when a
particular database needs to be accessed by various users globally. It needs
to be managed such that for the users it looks like one single database.
ypes:
T
1. Homogeneous Database:
In a homogeneous database, all different sites store database identically. The
operating system, database management system, and the data structures
used – all are the same at all sites. Hence, they’re easy to manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different
schema and software that can lead to problems in query processing and
transactions. Also, a particular site might be completely unaware of the other
sites. Different computers may use a different operating system, different
database application. They may even use different data models for the
database. Hence, translations are required for different sites to communicate.
istributed Data Storage :
D
There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relationship is stored redundantly at 2 or more
sites. If the entire database is available at all sites, it is a fully redundant
database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites.
Also, now query requests can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly
updated. Any change made at one site needs to be recorded at every site that
relation is stored or else it may lead to inconsistency. This is a lot of overhead.
Also, concurrency control becomes way more complex as concurrent access
now needs to be checked over a number of sites.
. Fragmentation –
2
In this approach, the relations are fragmented (i.e., they’re divided into smaller
parts) and each of the fragments is stored in different sites where they’re
required. It must be made sure that the fragments are such that they can be
used to reconstruct the original relation (i.e, there isn’t any loss of data).
Fragmentation is advantageous as it doesn’t create copies of data,
consistency is not a problem.
Fragmentation of relations can be done in two ways:
● orizontal fragmentation – Splitting by rows –
H
The relation is fragmented into groups of tuples so that each tuple is
assigned to at least one fragment.
● Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each
fragment must contain a common candidate key so as to ensure a
lossless join.
In certain cases, an approach that is hybrid of fragmentation and replication is
used.
Applications of Distributed Database:
● It is used in Corporate Management Information System.
● It is used in multimedia applications.
● Used in Military’s control system, Hotel chains etc.
● It is also used in manufacturing control system.