0% found this document useful (0 votes)
4 views64 pages

Chapter 3 (ADB)

Chapter Three of the Advanced Database Systems course discusses transaction processing concepts, including the definition of transactions, their properties (ACID), and the importance of ensuring data integrity. It covers transaction states, operations, concurrency control, and the potential issues that can arise during transaction processing, such as lost updates and dirty reads. The chapter emphasizes the need for effective recovery mechanisms and the role of the recovery manager in maintaining database consistency.

Uploaded by

osatarekegn12
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views64 pages

Chapter 3 (ADB)

Chapter Three of the Advanced Database Systems course discusses transaction processing concepts, including the definition of transactions, their properties (ACID), and the importance of ensuring data integrity. It covers transaction states, operations, concurrency control, and the potential issues that can arise during transaction processing, such as lost updates and dirty reads. The chapter emphasizes the need for effective recovery mechanisms and the role of the recovery manager in maintaining database consistency.

Uploaded by

osatarekegn12
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 64

Wolaita Sodo University

School of Informatics

Department of Computer Science

Course Title: Advanced Database Systems


Chapter Three
Transaction Processing Concepts

Chapter Contents:
 3.1. Introduction

 3.2. Transaction and System Concepts

 3.3. Properties of Transaction

 3.4. Schedules and Recoverability

 3.5. Serializability of Schedules

 3.6. Transaction Support in SQL


3. Overview

 A transaction is a logical unit of database processing that contains


one or more access operations (read-retrieval, write-insert or update,
delete). It transforms database form one consistent state to another.
 Transaction is an action or a series of actions, carried out by a single
user or application program, which accesses or changes contents of
database (i.e. logical unit of work on the database).
 The effects of all the SQL statements in a transaction can be either
all committed (applied to the database) or all rolled back (undone
from the database) to ensure consistency and data integrity of the
database.
 Generally, change made in real time to a database are called
Transaction.
Examples: ATM transaction, Phone call, academic registration, billing,
hotel check-in and others.
Cont…

A transaction is an atomic unit of work that is either


completed in its entirety or not done at all.
Example: Suppose an employee of bank transfers 800
birr from X's account to Y's account. This small
transaction contains several low-level tasks:

X's Account Y's Account


Open_Account(X) Open_Account(Y)
Old_Balance = X.balance Old_Balance = Y.balance
New_Balance = Old_Balance - New_Balance = Old_Balance + 80
800 0
X.balance = New_Balance Y.balance = New_Balance
Close_Account(X) Close_Account(Y)
Transaction and System Concepts
Thus, transaction can be Started, attempted, Committed or
Aborted via data manipulation commands of SQL.
Every Transaction has two outcomes:
1. Success: transaction commits and DB reaches a new
consistent state. Committed transaction cannot be aborted
or rolled back.
2. Failure: transaction aborts and DB must be restored to
consistent state before it started. Such a transaction is
rolled back or undone and can be restarted later.
❖Either entirely completed or entirely aborted to ensure
consistency and data integrity of the database.
Transaction boundaries:
 Any single transaction in an application program is bounded
with Begin and End statements.
 An application program may contain several transactions
separated by the Begin and End transaction boundaries.
Operations of Transaction:
Following are the main operations of transaction:
 Read(X): Read operation is used to read the value of X from
the database and stores it in a buffer in main memory.
 Write(X): Write operation is used to write the value back to
the database from the buffer.
Cont…
Cont…Example
Cont…
Transaction States

For recovery purposes, the system needs to keep track of


when the transaction starts, terminates, and commits or
aborts.
Transaction states:
• Active state/Begin: indicates the beginning of a transaction
execution.
• Partially committed state: shows the end of read/write operation
but this will not ensure permanent modification on the database.
• Committed state: ensures that all the changes done on a record by
a transaction were done persistently.
• Failed state: happens when a transaction is aborted during its
active state or if one of the rechecking is fails.
• Terminated State: corresponds to the transaction leaving the
system.
Cont…

In a database, the transaction can be in one of the


following states:
Cont…
Recovery manager keeps track of the following
operations:
• begin_transaction: This marks the beginning of transaction
execution.
• read or write: These specify read or write operations on the
database items that are executed as part of a transaction.
• end_transaction: This specifies that read and write
transaction operations have ended and marks the end limit
of transaction execution.
• At this point it may be necessary to check whether the
changes introduced by the transaction can be
permanently applied to the database or whether the
transaction has to be aborted because it violates
concurrency control or for some other reason.
Cont…
 Recovery manager keeps track of the following operations (cont):
• commit_transaction: This signals a successful end of the transaction so
that any changes (updates) executed by the transaction can be safely
committed to the database and will not be undone.
• rollback (abort): This signals that the transaction has ended
unsuccessfully, so that any changes or effects that the transaction may
have applied to the database must be undone.

 Recovery techniques use the following operators:


• undo: Similar to rollback except that it applies to a single
operation rather than to a whole transaction.
• redo: This specifies that certain transaction operations must
be redone to ensure that all the operations of a committed
transaction have been applied successfully to the database.
Cont…
 The System Log
Log or Journal: The log keeps track of all transaction
operations that affect the values of database items.
• This information may be needed to permit recovery from
transaction failures.
• The log is kept on disk, so it is not affected by any type of
failure except for disk or catastrophic failure.
• In addition, the log is periodically backed up to archival
storage (tape) to guard against such catastrophic failures.

➢ Protocols for recovery that avoid cascading rollbacks do not require


that read operations be written to the system log, whereas other
protocols require these entries for recovery.
Cont…
Recovery using log records:
 If the system crashes, we can recover to a consistent
database state by examining the log and using one of
the techniques.
1. Because the log contains a record of every write
operation that changes the value of some database item, it
is possible to undo the effect of these write operations of
a transaction T by tracing backward through the log and
resetting all items changed by a write operation of T to
their old values.
2. We can also redo the effect of the write operations of a
transaction T by tracing forward through the log and
setting all items changed by a write operation of T (that
did not get done permanently) to their new values.
Cont…
Commit Point of a Transaction:
 A Commit Point:
• A transaction T reaches its commit point when all its
operations that access the database have been executed
successfully and the effect of all the transaction
operations on the database has been recorded in the log.
• Beyond the commit point, the transaction is said to be
committed, and its effect is assumed to be permanently
recorded in the database.
• The transaction then writes an entry [commit,T] into the
log.
 Roll Back of transactions:
• Needed for transactions that have a [start_transaction,T]
entry into the log but no commit entry [commit,T] into
the log.
Cont…
Commit Point of a Transaction (cont):
 Redoing transactions:
• Transactions that have written their commit entry in the log
must also have recorded all their write operations in the log;
otherwise they would not be committed, so their effect on the
database can be redone from the log entries. (Notice that the
log file must be kept on disk).
• At the time of a system crash, only the log entries that have
been written back to disk are considered in the recovery
process because the contents of main memory may be lost.
 Force writing a log:
• Before a transaction reaches its commit point, any portion of
the log that has not been written to the disk yet must now be
written to the disk.
• This process is called force-writing the log file before
committing a transaction.
Cont…
Desirable Properties of Transactions

To ensure data integrity , DBMS should maintain the


following ACID properties:
 A=Atomicity: A transaction is an atomic unit of
processing; it is either performed in its entirety or not
performed at all. At the end, either no changes have
occurred to DB or DB has been changed in a consistent
manner. This is materialized by transaction management
component of DBMS.
 C=Consistency (Correctness): A correct execution of
the transaction must take the database from one
consistent state to another. It ensures that if the DB was
in a consistent state before the start of a transaction, then
or termination, the DB will also be in a consistent state.
No violation of integrity constraint. This is because of
when several transaction are being processed concurrently
on a data item they may create many inconsistent
problems.
Example 1:

Suppose that Ti is a transaction that transfer 200


birr from account CA2090 to SB2359 as follows:
Read(CA2090)
CA2090= CA2090-200
Write(CA2090)
Read(SB2359)
SB2359= SB2359+200
Write(SB2359)
Cont…

I=Isolation: indicates that the actions performed by


transaction will be isolated or hidden from outside until the
transaction terminates.
 A transaction should not make its updates visible to other
transactions until it is committed; this property, when enforced
strictly, solves the temporary update problem.
 No other transaction see the change made by a transaction
unless it completes successfully and returns the database to a
new consistent state or already existing consistent state.
 So handling such case is the responsibility of concurrency
control component of the DBMS.
Example:
Cont…
Cont…

 D=Durability or permanency: Once a transaction changes the


database and the changes are committed, these changes must
never be lost. All updates done by a transaction must become
permanent.
 Durability ensures that once the transaction commit (i.e.
update made into a database by transaction are done. Its effect
on the database state cannot be undo or lost even in the event
of subsequent failures such as system crash, HW crash, disk
crash and others).
 Thus, durability is the responsibility of the recovery subsystem
of the DBMS.
Summary of ACID rules
Transaction Processing

 Single-User System:
At most, one user at a time can use the database
management system.
E.g. Personal computer system
 Multiuser System:
Many users can access the DBMS concurrently.
E.g. Air line reservation, Bank and the like system are
operated by many users who submit transaction
concurrently to the system.
This is achieved by multiprogramming, which allows the
computer to execute multiple programs /processes at the
same time.
A multiprogramming is a parallel processing in which the
multiple programs can run simultaneously.
Concurrency
 Concurrency is one of important feature of database which
allows many concurrent users to access the database.
 Interleaving Processing:
➢ Concurrent execution of processes is interleaved in a single
CPU using for example, round robin algorithm.
➢ Advantages:
➢ Keeps CPU busy rather than remaining idle during I/O time.
➢ It will increase system throughput (average no. of transaction
completed within given time).
➢ Prevents long process from delaying their processes (minimize
unpredictable delay in the response time).
 Parallel Processing:
➢ If processes are concurrently executed in multiple CPU’s.
Cont…
Cont…
Why Concurrency Control is needed: Three problems/cases

i. The Lost Update Problem (WW conflict)


This occurs when two transactions that access the same
database items have their operations interleaved in a way that
makes the value of some database item incorrect.
Besides, this is occurred when data is being updated by one
transaction is being overwritten by the update operation of
another transaction.
If this concurrent transaction is uncontrolled, it may lead to
problems, such as an inconsistent database.
Transactions submitted by various users may execute
concurrently and may access and update the same databases.
Cont…
Cont…
ii. II. The Temporary Update or Dirty Read Problem (WR conflict)
 This occurs when one transaction updates a database item and then the
transaction fails (aborts) for some reason. The updated item is accessed by
another transaction before it is changed back to its original value.
 A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now
has values which have never formed part of the stable database.
T1 T2
Read_item(A)
A=A+100 Fig 2: Temporary update problem
Write_item(A)

Read_item(A)
Transaction T2 fails and must
A=A-10 change the values of A back to
Write_item(A) its old value; Meanwhile T1
Abort has read the temporary
incorrect value of A
Commit

❑T2 increases A making it 200 but then aborts the transaction before it is
committed. T1 gets 200, subtracts 10 and make it 190. But the actual balance
should be 90.
III. The incorrect Summary problem (Unrepeatable read)
Cont…

 A transaction T1 reads a record and then does some other processing


during which the transaction T2 updates the record. Now when the
transaction T1 reads the record, then the new value will be
inconsistent with the previous value.
•Example: T1 would like to add the values of A=10, B=20 and C=30. After the
values are read by T1 and before its completion, T2 updates the value of B to be
50. At the end of the execution of the two transactions T1 will come up with the
sum of 60 while it should be 90 since B is updated to 50.

T1 T2
Sum= 0;
Read_item(A)
Sum=Sum+A
Read_item(B)
Sum=Sum+B
Read_item(B)
B=50
Read_item(C)
Sum=Sum+C
What causes a transaction to fail?

 Types of causes that remains a transaction to fail:


1. A computer failure (system crash):
▪ A hardware, software, or network error may cause a
computer to fail. If the hardware crashes, the contents of
the computer’s internal memory may be lost.
2. A transaction or system error:
▪ Some operation in the transaction may cause it to fail, such
as integer overflow or division by zero. Transaction failure
may also occur because of erroneous (wrong) parameter
values or because of a logical programming error.
▪ In addition, the user may interrupt the transaction during its
execution.
Cont…

3. Local errors or exception conditions detected by


the transaction:
Certain conditions forces cancellation of the transaction.
For example, data for the transaction may not be found. such as
insufficient account balance in a banking database, may cause a transaction,
such as a fund withdrawal from that account, to be canceled.
For example, customer A tries to purchase an item online. Assume
that only two such items are available when A starts his transaction
and both of them purchased by other customers. When A reaches
end of the transaction, the items are not available hence the
transaction is cancelled or failed.
❖ A programmed abort in the transaction causes it to fail.
4. Concurrency control enforcement:
▪ The concurrency control method may decide to abort the
transaction, to be restarted later, because it violates serializability
or because several transactions are in a state of deadlock.
▪ If a transaction causes deadlock or violates serializability, then
concurrency control schemes will force the transaction to abort.
5. Disk failure:
▪ Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash. This
may happen during a read or a write operation of the
transaction.
6. Physical problems and catastrophes:
▪ This refers to an endless list of problems that includes power or
air-conditioning failure, fire, theft, sabotage, overwriting disks or
tapes by mistake, and mounting of a wrong tape by the
operator.
o Why recovery is needed:
Whenever a transaction is submitted to the DBMS for execution, the
system is responsible for making sure that either all operations in the
transaction to be completed successfully or the transaction has no
effect on the database or any other transaction.
Recovery is needed in case of:
✓ Hardware failure
✓ Program errors/human errors that severely damage data
✓ Viruses
✓ Incorrect commands execution
Concept of Schedules and Serializability

Transaction schedule or history:


 Schedule : a sequence of operations by a set of concurrent
transactions that preserves the order of the operations in each
of the individual transactions.
 When transactions are executing concurrently in an interleaved
fashion, the order of execution of operations from the various
transactions forms what is known as a transaction schedule (or
history).
 Note, however, that operations from other transactions Tj can
be interleaved with the operations of Ti in S.
Eg. Consider the following example:
Sa : r2(X);w2(X);r1(X);w1(X);
Cont…

 A schedule S of n transactions T1, T2, …, Tn;


 It is an ordering of the operations of the transactions subject to the constraint
that, for each transaction Ti that participates in S, the operations of T1 in S must
appear in the same order in which they occur in Ti.
Cont…
Cascading Rollbacks
 Cascading rollback/abort – a single transaction failure leads to a series of transaction
rollbacks.
 An uncommitted transaction has to be rolled back because it read an item from a transaction
that failed.
 This is illustrated in schedule Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;
 where transaction T2 has to be rolled back because it read item X from T1, and T1 then aborted.
 Consider the following schedule where none of the transactions has yet committed (so the
schedule is recoverable)

If T10 fails, T11 and T12 must also be rolled back.


 Can lead to the undoing of a significant amount of work
Cont…
Schedules based on Serializability

 Serial schedule: a schedule where the operations of each transactions are


executed consecutively without any interleaved operations from other
transactions. A Serial schedule is always a serializable schedule because in
serial schedule, a transaction only starts when the other transaction finished
execution. Otherwise, the schedule is called non-serial schedule.
 Non-serial schedule: a schedule the operations from a set of concurrent
transactions are interleaved.
 In general, A serial schedule doesn’t support concurrent execution of transactions
while a non-serial schedule supports concurrency.
 When multiple transactions are running concurrently then there is a
possibility that the database may be left in an inconsistent state.
 Serializability is a concept that helps us to check which schedules are
serializable. A serializable schedule is the one that always leaves the database
in consistent state.
 Serializable schedule: the objective of serializability is to find non-serial
schedules that allow transactions to execute concurrently without interfering
one another, and thereby produce a database stage that could be produced by
a serial execution. That means a non-serial schedule is serializable and
correct if it produces same results as some serial execution.
Cont…

 Serializability is the concept in a transaction that helps to identify which non-


serial schedule is correct and will maintain the database consistency. It relates
to the isolation property of transaction in the database.
 Serializable Schedule
 A schedule whose effect on any consistent database instance is identical to that
of some complete serial schedule over the set of committed transaction in S.
 A serial schedule is always a serializable schedule because any transaction only
starts its execution when another transaction has already completed its
execution.
 A non-serial schedule S is serializable, is equivalent to say that it is correct to
the result of one of serial schedule.
 However, a non-serial schedule of transactions needs to be checked for
serializability.
 Note: If a schedule of concurrent ‘n' transactions can be converted into an
equivalent serial schedule. Then we can say that the schedule is serializable.
And this property is known as serializability.
Cont…
Conflict Serializability
Cont…
Cont…
Cont…

 Conflict Serializability is one of the type of serializability, which can be


used to check whether a non-serial schedule is conflict serializable or not.
View Serializability

 View Serializability is a process to find out that a given schedule is view


serializable or not.
 View equivalent- as long as each read operation of a transaction reads the result
of the same write operation in both schedules.
 As can be seen, view equivalence is also based purely on reads and writes alone.
Cont…
Testing for Serializability
 There is a simple algorithm for determining whether a particular schedule is conflict
serializable or not.
 Consider some schedule of a set of transactions T1, T2, ..., Tn
 Precedence graph /Serialization graph — a directed graph where the vertices/nodes are the
transactions (names).
 We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on
which the conflict arose earlier.
 We may label the arc by the item that was accessed.
Example 1

y
Test for Conflict Serializability
 A schedule is conflict serializable if and only if its precedence graph is acyclic (if there is no
cycle).
 If there is a cycle in the precedence graph, a schedule is not (conflict) serializable.
To find an equivalent serial schedule, start with a node that does not have any
incoming edges, and then make sure that the node order for every edge is not
violated.
Benefits of Serializability in DBMS
 Below are the benefits of using the serializable in the database:
1. Predictable execution: In serializable, all the threads of the DBMS are
executed at one time. There are no such surprises in the DBMS. In DBMS, all
the variables are updated as expected, and there is no data loss or
corruption.
2. Easier to Reason about & Debug: In DBMS all the threads are executed
alone, so it is very easier to know about each thread of the database. This can
make the debugging process very easy. So we don't have to worry about the
concurrent process.
3. Reduced Costs: With the help of serializable property, we can reduce the cost
of the hardware that is being used for the smooth operation of the database.
It can also reduce the development cost of the software.
4. Increased Performance: In some cases, serializable executions can perform
better than their non-serializable counterparts since they allow the
developer to optimize their code for performance.
Transaction Support in SQL

 A single SQL statement is always considered to be atomic either it completes


execution without an error or it fails and leaves the database unchanged.
 In SQL, a transaction begins implicitly when particular SQL statements are
encountered.
 A transaction in SQL ends explicitly by:
 Commit - commits current transaction and begins a new one.
 Rollback - causes current transaction to abort.
 Every transaction has certain characteristics attributed to it.
 These characteristics are specified by a SET TRANSACTION statement in
SQL. The characteristics are:
1. The access mode can be specified as READ ONLY or READ WRITE. The
default is READ WRITE.
 A mode of READ WRITE allows select, update, insert, delete, and create
commands to be executed.
 A mode of READ ONLY, as the name implies, is simply for data retrieval.
Cont…
2. The diagnostic area size option: DIAGNOSTIC SIZE n, specifies an
integer value n, which indicates the number of conditions that can be
held simultaneously in the diagnostic area.
 These conditions supply feedback information (errors or exceptions) to
the user or program on the n most recently executed SQL statement.
3. The isolation level option: is specified using the statement
ISOLATION LEVEL <isolation>, where the value for <isolation> can
be:
 READ UNCOMMITTED,
 READ COMMITTED,
 REPEATABLE READ, or
 SERIALIZABLE.
 The default isolation level is SERIALIZABLE, although some systems
use READ COMMITTED as their default.
Cont…
 If a transaction executes at a lower isolation level than SERIALIZABLE,
then one or more of the following three violations may occur:
1) Dirty read - A transaction T1 may read the update of a transaction T2, which
has not yet committed. If T2 fails and is aborted, then T1 would have read a
value that does not exist and is incorrect.
2) Nonrepeatable read - A transaction T1 may read a given value from a table. If
another transaction T2 later updates that value and T1 reads that value again, T1
will see a different value.
3) Phantoms - A transaction T1 may read a set of rows from a table, perhaps
based on some condition specified in the SQL WHERE-clause. Now suppose
that a transaction T2 inserts a new row that also satisfies the WHERE-clause
condition used in T1, into the table used by T1. If T1 is repeated, then T1 will
see a phantom, a row that previously did not exist.
Levels of Consistency in SQL

 Serializable — the most restrictive in that it avoids all three of the problems
mentioned above.
 Repeatable read — only committed records to be read, repeated reads of same
record must return same value. However, a transaction may not be serializable –
it may find some records inserted by a transaction but not find others.
 Read committed — only committed records can be read, but successive reads of
record may return different (but committed) values.
 Read uncommitted — even uncommitted records may be read. So it is the most
forgiving.
End of Chapter Three

Happy Learning!!

You might also like