Module 4 Transaction Management Concurrency Control
CONCURRENCY CONTROL,
DISTRIBUTED SYSTEM
MODULE 4
• ACID Properties Of A Transaction,
INTRODUCTION
• In database management systems (DBMS), a transaction is a fundamental concept.
• Transactions are essential for handling user requests to access and modify
database contents, ensuring the database remains consistent and reliable despite
failures and concurrent access.
• What does a transaction mean in DBMS?
• It is the result of a request made by the user to access the contents of the
database and perform operations on it.
• It also has some specific properties that must be followed to keep the
database consistent.
ACID PROPERTIES
• A transaction is a single logical unit of work that accesses and possibly modifies
the contents of a database.
• The ACID properties are
• Atomicity
• Consistency
• Isolation
• Durability
• Atomicity:
• Either the entire transaction takes place at once, or it does not happen at all.
• Each transaction is considered as one unit and either runs to completion or is not
executed at all.
• Abort: if a transaction aborts, changes made to the database are not visible.
• Consistency:
• This means that integrity constraints must be maintained so that the database is
consistent before and after the transaction.
• Each transaction, run by itself with no concurrent execution of other transactions, must
preserve the consistency of the database.
• Isolation:
• This property ensures that multiple transactions can occur concurrently without leading
the database to an inconsistent state.
• Changes occurring in a particular transaction are not visible to any other
transaction until that particular change in that transaction is written to memory or has
been committed.
• This property ensures that the execution of transactions concurrently will result in a
state that is equivalent to a state achieved if these were executed serially in some order.
• Durability:
• This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk
and they persist even if a system failure occurs.
• These updates now become permanent and are stored in non-volatile memory.
• Consistency and Isolation
• Ensuring the consistency of each transaction is the responsibility of the user who
submits it.
• That is, the user must ensure that when run to completion by itself against a
‘consistent’ database instance, the transaction will leave the database in a
‘consistent’ state.
• The isolation property is ensured by guaranteeing that even though actions of several
transactions might be interleaved, the net effect is identical to executing all
transactions one after the other in some serial order
• A transaction might not complete for two reasons. First, a transaction can be
aborted, or terminated unsuccessfully, by the DBMS.
• Second, the system may crash while one or more transactions are in progress.
• Thus a DBMS must find a way to remove the effects of partial transactions
from the database, that is, it must ensure transaction atomicity: either all of a
transaction’s actions are carried out, or none are.
• This means that users can ignore incomplete transactions in thinking about how
the database is modified by transactions over time
• To be able to do this, the DBMS maintains a record, called the log, of all
writes to the database.
• The log is also used to ensure durability: if the system crashes before the
changes made by a completed transaction are written to disk, the log is used
to remember and restore these changes when the system restarts.
• The DBMS component that ensures atomicity and durability is called the
recovery manager.
• Advantages of ACID properties in DBMS
• Data consistency: ACID properties ensure that the data remains consistent and
accurate after any transaction execution.
• Data integrity: ACID properties maintain the integrity of the data by ensuring
that any changes to the database are permanent and cannot be lost.
• Concurrency control: ACID properties help to manage multiple transactions
occurring concurrently by preventing interference between them.
• Recovery: ACID properties ensure that in case of any failure or crash, the
system can recover the data up to the point of failure or crash.
• Disadvantages of ACID properties in DBMS
• Performance: the ACID properties can cause a performance overhead in the
system, as they require additional processing to ensure data consistency and
integrity.
• Scalability: the ACID properties may cause scalability issues in large
distributed systems where multiple transactions occur concurrently.
• Complexity: implementing the ACID properties can increase the complexity of
the system and require significant expertise and resources.
TRANSACTIONS AND SCHEDULES
• A transaction is a list of actions.
• The actions that can be executed by a transaction include reads and writes of
database objects.
• A Transaction can also be defined as a set of actions that are partially ordered.
• That is, the relative order of some of the actions may not be important.
• In addition to reading and writing, each transaction must specify as its final action
either commit (i.e., complete successfully) or abort (i.e., terminate and undo all the
actions carried out thus far).
• A schedule that contains either an abort or a commit for each transaction
whose actions are listed in it is called a complete schedule.
• A complete schedule must contain all the actions of every transaction that
appears in it.
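As an illustration (the tuple representation below is an assumption of this sketch, not from the slides), a schedule can be modeled as a list of (transaction, action, object) entries, and completeness checked with a set comparison:

```python
# A schedule as a list of (transaction_id, action, object) tuples.
# Action names and the tuple layout are illustrative.

def is_complete(schedule):
    """A schedule is complete if every transaction appearing in it
    ends with either a commit or an abort."""
    txns = {t for (t, _, _) in schedule}
    finished = {t for (t, a, _) in schedule if a in ("commit", "abort")}
    return txns == finished

s1 = [("T1", "read", "A"), ("T1", "write", "A"), ("T1", "commit", None),
      ("T2", "read", "B"), ("T2", "abort", None)]
s2 = [("T1", "read", "A"), ("T2", "write", "A")]  # neither commits nor aborts

print(is_complete(s1))  # True
print(is_complete(s2))  # False
```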
CONCURRENT EXECUTION
• Motivation for Concurrent Execution
• First, while one transaction is waiting for a page to be read in from disk, the CPU
can process another transaction.
• This is because I/O activity can be done in parallel with CPU activity in a computer.
• Overlapping I/O and CPU activity reduces the amount of time disks and processors
are idle, and increases system throughput (the average number of transactions
completed in a given time).
• Second, interleaved execution of a short transaction with a long
transaction usually allows the short transaction to complete quickly.
SERIALIZABILITY
• A concurrent schedule must have the same effect as if the transactions had been
executed serially, that is, one after another.
• A serial schedule is one in which the actions (read, write, abort, commit) of each
transaction are performed one transaction at a time, with no interleaving.
• Let’s take two transactions T1 and T2,
• If both transactions are performed without interfering with each other, the
result is called a serial schedule.
• A serializable schedule over a set S of committed transactions is a schedule whose
effect on any consistent database instance is guaranteed to be identical to that of
some complete serial schedule over S.
• That is, the database instance that results from executing the given schedule is
identical to the database instance that results from executing the transactions in
some serial order.
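The slides define serializability by effect on the database. A commonly used sufficient test (an addition here, not stated in the slides) is conflict serializability: build a precedence graph with an edge Ti → Tj whenever an earlier action of Ti conflicts with a later action of Tj on the same object; the schedule is conflict-serializable iff this graph is acyclic. A minimal sketch, using the same (transaction, action, object) tuples as above:

```python
def conflict_serializable(schedule):
    """Precedence-graph test: edge Ti -> Tj if an action of Ti conflicts
    with (and precedes) an action of Tj on the same object.
    Conflict-serializable iff the graph has no cycle."""
    edges = set()
    for i, (ti, ai, oi) in enumerate(schedule):
        for tj, aj, oj in schedule[i + 1:]:
            if ti != tj and oi is not None and oi == oj and "write" in (ai, aj):
                edges.add((ti, tj))
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)

    def cyclic(u, stack, done):
        if u in stack:
            return True
        if u in done:
            return False
        stack.add(u)
        found = any(cyclic(v, stack, done) for v in adj.get(u, []))
        stack.discard(u)
        done.add(u)
        return found

    return not any(cyclic(u, set(), set()) for u in adj)

# T1 and T2 both write A and B, fully interleaved: cycle T1->T2->T1.
bad = [("T1", "write", "A"), ("T2", "write", "A"),
       ("T2", "write", "B"), ("T1", "write", "B")]
print(conflict_serializable(bad))  # False
```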
• Executing the transactions serially in different orders may produce different
results, but all are presumed to be acceptable; the DBMS makes no guarantees
about which of them will be the outcome of an interleaved execution.
• The above definition of a serializable schedule does not cover the case of
schedules containing aborted transactions.
• If a transaction computes a value and prints it to the screen, this is an ‘effect’ that
is not directly captured in the state of the database.
• Assume that all such values are also written into the database.
• Some Anomalies Associated with Interleaved Execution
• Two actions on the same data object conflict if at least one of them is a write.
• The three anomalous situations can be described in terms of when the actions
of two transactions T1 and T2 conflict with each other:
• Write-Read (WR)
• Read-Write (RW)
• Write-Write (WW)
• 1. Reading Uncommitted Data (WR Conflicts)
• Consider two transactions T1 and T2, each of which, run alone, preserves database
consistency: T1 transfers $100 from A to B, and T2 increments both A and B by 6 percent.
• Suppose that their actions are interleaved so that (1) the account transfer program T1
deducts $100 from account A, then (2) the interest deposit program T2 reads the current
values of accounts A and B and adds 6 percent interest to each, and then (3) the account
transfer program credits $100 to account B.
• The result of this schedule is different from any result that we would get by running
one of the two transactions first and then the other.
• The problem can be traced to the fact that the value of A written by T1 is read by T2 before
T1 has completed all its changes
• 2. Unrepeatable Reads (RW Conflicts)
• A transaction T2 could change the value of an object A that has been read by
a transaction T1, while T1 is still in progress.
• First, if T1 tries to read the value of A again, it will get a different result, even
though it has not modified A in the meantime.
• Second, suppose that both T1 and T2 read the same value of A, say, 5, and
then T1, which wants to increment A by 1, changes it to 6, and T2, which wants
to decrement A by 1, decrements the value that it read (i.e., 5) and changes A
to 4.
• Running these transactions in any serial order should leave A with a final value
of 5; thus, the Interleaved execution leads to an inconsistent state.
• The underlying problem here is that although T2’s change is not directly read
by T1, it invalidates T1’s assumption about the value of A, which is the basis
for some of T1’s subsequent actions.
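The lost-update interleaving described above can be replayed in a few lines (the local variables standing in for each transaction's private read are illustrative):

```python
# A starts at 5; T1 wants A + 1, T2 wants A - 1.
# Any serial order leaves A at 5; this interleaving does not.
A = 5

t1_read = A          # T1 reads 5
t2_read = A          # T2 also reads 5, before T1 writes
A = t1_read + 1      # T1 writes 6
A = t2_read - 1      # T2 writes 4, silently discarding T1's update

print(A)  # 4, an inconsistent state
```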
• 3. Overwriting Uncommitted Data (WW Conflicts)
• Even if T2 does not read the value of A written by T1, a potential problem
exists.
• Suppose that Harry and Larry are two employees, and their salaries must be kept
equal.
• Transaction T1 sets their salaries to $1,000 and transaction T2 sets their salaries
to $2,000.
• If we execute these in the serial order T1 followed by T2, both receive the salary
$2,000; the serial order T2 followed by T1 gives each the salary $1,000.
• Notice that neither transaction reads a salary value before writing it—such a
write is called a blind write.
• Schedules involving aborted transactions
• Suppose that (1) an account transfer program T1 deducts $100 from account A,
then (2) an interest deposit program T2 reads the current values of accounts A and
B and adds 6 percent interest to each, then commits, and then (3) T1 is aborted.
• Now, T2 has read a value for A that should never have been there.
• If T2 had not yet committed, we could deal with the situation by cascading the
abort of T1 and also aborting T2; this process would recursively abort any
transaction that read data written by T2, and so on.
• But T2 has already committed, and so we cannot undo its actions! We say that such
a schedule is unrecoverable.
• A recoverable schedule is one in which transactions commit only after (and if!) all
transactions whose changes they read commit.
• If transactions read only the changes of committed transactions, not only is the
schedule recoverable, but also aborting a transaction can be accomplished
without cascading the abort to other transactions.
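The recoverability condition can be checked mechanically over a schedule. The sketch below (tuple layout and the simplification of ignoring aborted writers are assumptions of this example) tracks which uncommitted writers each transaction has read from:

```python
def is_recoverable(schedule):
    """A schedule is recoverable if each transaction commits only after
    every transaction whose writes it read has committed."""
    last_writer = {}   # object -> transaction that last wrote it
    read_from = {}     # txn -> txns whose uncommitted writes it read
    committed = set()
    for t, action, obj in schedule:
        if action == "write":
            last_writer[obj] = t
        elif action == "read":
            w = last_writer.get(obj)
            if w is not None and w != t and w not in committed:
                read_from.setdefault(t, set()).add(w)
        elif action == "commit":
            if any(w not in committed for w in read_from.get(t, ())):
                return False   # commits before a transaction it read from
            committed.add(t)
    return True

# T2 reads A written by uncommitted T1, then commits first: unrecoverable.
s = [("T1", "write", "A"), ("T2", "read", "A"),
     ("T2", "commit", None), ("T1", "abort", None)]
print(is_recoverable(s))  # False
```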
• Suppose that a transaction T2 overwrites the value of an object A that has been modified
by a transaction T1, while T1 is still in progress, and T1 subsequently aborts.
• All of T1’s changes to database objects are undone by restoring the value of any object
that it modified to the value of the object before T1’s changes.
• When T1 is aborted, and its changes are undone in this manner, T2’s changes are lost as
well, even if T2 decides to commit.
• So, for example, if A originally had the value 5, then was changed by T1 to 6, and by T2
to 7, if T1 now aborts, the value of A becomes 5 again.
• The most widely used locking protocol, called strict two-phase locking, or strict
2PL, has two rules.
1. If a transaction T wants to read (respectively, modify) an object, it first requests
a shared (respectively, exclusive) lock on the object.
2. All locks held by a transaction are released when the transaction is completed.
• A transaction that has an exclusive lock can also read the object; an additional shared
lock is not required.
• A transaction that requests a lock is suspended until the DBMS is able to grant it the
requested lock.
• The DBMS keeps track of the locks it has granted and ensures that if a transaction
holds an exclusive lock on an object, no other transaction holds a shared or exclusive
lock on the same object.
• Requests to acquire and release locks can be automatically inserted into
transactions by the DBMS; users need not worry about these details.
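The bookkeeping a lock manager performs for strict 2PL can be sketched as follows. This is a toy model under stated assumptions: a request that cannot be granted simply returns False, standing in for the DBMS suspending the transaction, and rule 2 is the `release_all` call on completion.

```python
from collections import defaultdict

class LockManager:
    """Minimal strict-2PL bookkeeping sketch (no real queueing/blocking)."""
    def __init__(self):
        self.shared = defaultdict(set)   # object -> txns holding S locks
        self.exclusive = {}              # object -> txn holding the X lock

    def request_shared(self, txn, obj):
        holder = self.exclusive.get(obj)
        if holder is not None and holder != txn:
            return False                 # conflicting X lock: must wait
        self.shared[obj].add(txn)
        return True

    def request_exclusive(self, txn, obj):
        holder = self.exclusive.get(obj)
        others = self.shared[obj] - {txn}
        if (holder is not None and holder != txn) or others:
            return False                 # any other lock conflicts: must wait
        self.exclusive[obj] = txn
        return True

    def release_all(self, txn):          # rule 2: release on completion
        for obj in list(self.exclusive):
            if self.exclusive[obj] == txn:
                del self.exclusive[obj]
        for holders in self.shared.values():
            holders.discard(txn)

lm = LockManager()
print(lm.request_exclusive("T1", "A"))  # True
print(lm.request_shared("T2", "A"))     # False: T2 must wait for T1
lm.release_all("T1")
print(lm.request_shared("T2", "A"))     # True
```

A real lock manager would place the refused request in a queue and wake the transaction when the lock becomes available, rather than returning False.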
• The locking protocol allows only ‘safe’ interleaving of transactions.
• If two transactions access completely independent parts of the database, they will
be able to concurrently obtain the locks that they need and proceed merrily on
their ways.
• On the other hand, if two transactions access the same object, and one of them
wants to modify it, their actions are effectively ordered serially—all actions of
one of these transactions (the one that gets the lock on the common object first)
are completed before (this lock is released and) the other transaction can
proceed.
• We denote the action of a transaction T requesting a shared (exclusive) lock on
object O as S_T(O) (X_T(O)).
• Consider the following interleaving of transactions.
• This interleaving could result in a state that cannot result from any serial execution
of the three transactions.
• For instance, T1 could change A from 10 to 20, then T2 (which reads the value 20
for A) could change B from 100 to 200, and then T1 would read the value 200
for B.
• If run serially, either T1 or T2 would execute first, and read the values 10 for A
and 100 for B: clearly, the interleaved execution is not equivalent to either serial
execution
• If the strict 2PL protocol is used, the above interleaving is disallowed.
• Assuming that the transactions proceed at the same relative speed as before, T1
would obtain an exclusive lock on A first and then read and write A.
• Then, T2 would request a lock on A.
• However, this request cannot be granted until T1 releases its exclusive lock on
A, and the DBMS therefore suspends T2.
• In general, the actions of different transactions could be interleaved.
• Consider the interleaving of two transactions, which is permitted by the strict 2PL protocol
INTRODUCTION TO CRASH RECOVERY
• The recovery manager of a DBMS is responsible for ensuring transaction atomicity and
durability.
• It ensures atomicity by undoing the actions of transactions that do not commit and
durability by making sure that all actions of committed transactions survive system
crashes and media failures.
• When a DBMS is restarted after crashes, the recovery manager is given control and must
bring the database to a consistent state.
• The recovery manager is also responsible for undoing the actions of an aborted
transaction.
• The transaction manager of a DBMS controls the execution of transactions.
• Before reading and writing objects during normal execution, locks must be
acquired (and released at some later time) according to a chosen locking protocol.
• It is convenient to assume that writing a page to disk is an atomic action; this
implies that the system does not crash while a write is in progress, and is
unrealistic.
• In practice, disk writes do not have this property, and steps must be taken during
restart after a crash to verify that the most recent write to a given page was
completed successfully and to deal with the consequences if not.
Stealing Frames And Forcing Pages
• With respect to writing objects, two additional questions arise:
1. Can the changes made to an object O in the buffer pool by a transaction T be
written to disk before T commits?
• Such writes are executed when another transaction wants to bring in a page and the
buffer manager chooses to replace the page containing O; of course, this page must
have been unpinned by T.
• If such writes are allowed, we say that a steal approach is used. (The second
transaction ‘steals’ a frame from T.)
2. When a transaction commits, must we ensure that all the changes it has made to
objects in the buffer pool are immediately forced to disk?
• If they are, we say that a force approach is used.
• If a highly used page is updated in rapid succession by 20 transactions, the force
approach writes that page to disk 20 times and results in excessive page I/O costs.
• With a no-force approach, on the other hand, the in memory copy of the page
would be successively modified and written to disk just once, reflecting the effects
of all 20 updates, when the page is eventually replaced in the buffer pool (in
accordance with the buffer manager’s page replacement policy).
• For these reasons, most systems use a steal, no-force approach.
• Thus, if a frame is dirty and chosen for replacement, the page it contains is written
to disk even if the modifying transaction is still active (steal); in addition, pages in
the buffer pool that are modified by a transaction are not forced to disk when the
transaction commits (no-force).
Recovery-Related Steps During Normal Execution
• The recovery manager of a DBMS maintains some information during normal execution of
transactions in order to enable it to perform its task in the event of a failure. In particular,
a log of all modifications to the database is saved on stable storage, which is guaranteed
(with very high probability) to survive crashes and media failures.
• It is important to ensure that the log entries describing a change to the database are
written to stable storage before the change is made; otherwise, the system might crash just
after the change, leaving us without a record of the change.
• The log enables the recovery manager to undo the actions of aborted and
incomplete transactions and to redo the actions of committed transactions.
• For example, a transaction that committed before the crash may have made
updates to a copy (of a database object) in the buffer pool, and this change may
not have been written to disk before the crash, because of a no-force approach.
• Such changes must be identified using the log, and must be written to disk.
• Further, changes of transactions that did not commit prior to the crash might have
been written to disk because of a steal approach. Such changes must be
identified using the log and then undone.
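The write-ahead rule and the redo use of the log can be sketched in a few lines. The record layout (txn, object, before-value, after-value) and the in-memory stand-ins for stable storage and disk are assumptions of this example:

```python
log = []          # stands in for stable storage
disk = {"A": 10}  # stands in for the database on disk
buffer_pool = dict(disk)

def wal_write(txn, obj, new_value):
    # WAL rule: append the log record *before* changing the page.
    log.append((txn, obj, buffer_pool[obj], new_value))  # (txn, obj, before, after)
    buffer_pool[obj] = new_value

wal_write("T1", "A", 20)
# Crash before the buffer pool is flushed: disk still has A = 10, but the
# log remembers the change, so it can be redone if T1 committed
# (no-force), or kept undone if it did not.
for txn, obj, before, after in log:
    disk[obj] = after        # redo, assuming T1 committed
print(disk["A"])  # 20
```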
OVERVIEW OF ARIES
• Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is a recovery
algorithm that is designed to work with a steal, no-force approach.
• When the recovery manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: identifies dirty pages in the buffer pool (i.e., changes that have not been
written to disk) and active transactions at the time of the crash.
2. Redo: repeats all actions, starting from an appropriate point in the log, and restores the
database state to what it was at the time of the crash.
3. Undo: undoes the actions of transactions that did not commit, so that the database
reflects only the actions of committed transactions.
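The three phases can be illustrated over a tiny log. This is a toy sketch, not the real ARIES algorithm (no LSNs, checkpoints, or CLRs); the record format and the 'update'/'commit' tags are made up for the example:

```python
log = [
    ("T1", "update", "A", 10, 20),    # (txn, kind, obj, before, after)
    ("T2", "update", "B", 100, 200),
    ("T1", "commit", None, None, None),
    # crash here: T2 never committed
]

# 1. Analysis: find "loser" transactions active (uncommitted) at the crash.
committed = {t for (t, kind, *_) in log if kind == "commit"}
losers = {t for (t, kind, *_) in log if kind == "update"} - committed

# 2. Redo: repeat *all* logged updates to restore the crash-time state.
db = {}
for t, kind, obj, before, after in log:
    if kind == "update":
        db[obj] = after

# 3. Undo: roll back the losers' updates, scanning the log backwards.
for t, kind, obj, before, after in reversed(log):
    if kind == "update" and t in losers:
        db[obj] = before

print(db)  # {'A': 20, 'B': 100}: only T1's committed change survives
```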
DEADLOCKS
• Suppose T1 holds an exclusive lock on object A and requests a lock on B, while T2
holds an exclusive lock on B and requests a lock on A.
• Now, T1 is waiting for T2 to release its lock and T2 is waiting for T1 to release its lock.
• The DBMS must either prevent or detect (and resolve) such deadlock situations.
Deadlock Prevention
• We can prevent deadlocks by giving each transaction a priority and ensuring that
lower priority transactions are not allowed to wait for higher priority transactions (or
vice versa).
• One way to assign priorities is to give each transaction a timestamp when it starts up.
• The lower the timestamp, the higher the transaction’s priority, that is, the oldest
transaction has the highest priority.
• If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock
manager can use one of the following two policies:
• In the wait-die scheme, if Ti has the higher priority it is allowed to wait; otherwise
it is aborted. Thus lower priority transactions can never wait for higher priority
transactions.
• In the wound-wait scheme, if Ti has the higher priority, Tj is aborted; otherwise Ti
waits. Thus higher priority transactions never wait for lower priority transactions.
• In either case no deadlock cycle can develop.
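The two policies can be written as one decision function (the string return values are illustrative; a real lock manager would queue or abort the transaction):

```python
def resolve(requester_ts, holder_ts, scheme):
    """Decide what happens when Ti (requester) wants a lock Tj holds.
    Lower timestamp = older = higher priority."""
    if scheme == "wait-die":
        # Older requester waits; younger requester is aborted ("dies").
        return "Ti waits" if requester_ts < holder_ts else "Ti aborts"
    if scheme == "wound-wait":
        # Older requester preempts ("wounds") the holder; younger one waits.
        return "Tj aborts" if requester_ts < holder_ts else "Ti waits"

print(resolve(1, 5, "wait-die"))    # Ti waits  (Ti is older)
print(resolve(5, 1, "wait-die"))    # Ti aborts (Ti is younger)
print(resolve(1, 5, "wound-wait"))  # Tj aborts (older Ti preempts)
print(resolve(5, 1, "wound-wait"))  # Ti waits  (Ti is younger)
```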
• When a transaction is aborted and restarted, it should be given the same
timestamp that it had originally.
• Reissuing timestamps in this way ensures that each transaction will eventually
become the oldest transaction, and thus the one with the highest priority, and will
get all the locks that it requires
• The wait-die scheme is non-preemptive; only a transaction requesting a lock can
be aborted.
• As a transaction grows older (and its priority increases), it tends to wait for more
and more younger transactions.
Deadlock Detection
• Rather than taking measures to prevent deadlocks, it may be better to detect and
resolve deadlocks as they arise.
• In the detection approach, the DBMS must periodically check for deadlocks.
• When a transaction Ti is suspended because a lock that it requests cannot be
granted, it must wait until all transactions Tj that currently hold conflicting locks
release them.
• The lock manager maintains this information in a waits-for graph, with a node for
each active transaction and an edge from Ti to Tj if Ti is waiting for Tj.
• The lock manager adds edges to this graph when it queues lock requests and
removes edges when it grants lock requests.
• Consider the schedule shown in the figure; the last step, shown below the line,
creates a cycle in the waits-for graph.
• The waits-for graph describes all active transactions, some of which will
eventually abort.
• The waits-for graph is periodically checked for cycles, which indicate deadlock.
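The periodic check is a cycle search over the waits-for graph. A minimal sketch, assuming the graph is kept as a dict from each transaction to the set of transactions it waits on:

```python
def has_deadlock(waits_for):
    """True iff the waits-for graph contains a cycle (a deadlock)."""
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in waits_for.get(node, ()))
    return any(visit(t, frozenset()) for t in waits_for)

# T1 waits for T2 and T2 waits for T1: the classic deadlock cycle.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```

To resolve a detected deadlock, the DBMS picks a transaction on the cycle and aborts it, releasing its locks.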
• As an alternative to maintaining a waits-for graph, a simplistic way to identify
deadlocks is to use a timeout mechanism:
• If a transaction has been waiting too long for a lock, assume that it is in a
deadlock cycle and abort it.
DISTRIBUTED DATABASES
INTRODUCTION
• A distributed database is a database that stores data across multiple locations,
such as computers or servers, instead of in one place.
• Data in a distributed database system is stored across several sites, and each site
is typically managed by a DBMS that can run independently of the other sites.
• The classical view of a distributed database system is that the system should make
the impact of data distribution transparent.
Properties
• Distributed Data Independence:
• Users should be able to ask queries without specifying where the referenced
relations, or copies or fragments of the relations, are located.
• Distributed Transaction Atomicity:
• Users should be able to write transactions that access and update data at
several sites just as they would write transactions over purely local data.
• The effects of a transaction across sites should continue to be atomic; that is,
all changes persist if the transaction commits, and none persist if it aborts.
Types
• Homogeneous
• All sites use the same database management system (DBMS), data structure, and
operating system. This makes it easier to manage because of the uniformity of the
software and hardware. It requires a commitment to a single DBMS across all sites.
• Heterogeneous
• Different sites use different DBMS software, but there is additional common
software to support data exchange between these sites. For example, various
library database systems use the same machine-readable cataloging format.
DISTRIBUTED DBMS ARCHITECTURES
• Client-server,
• Collaborating server,
• Middleware.
• 1.Client-server systems
• A client-server system has one or more client processes and one or more
server processes, and a client process can send a query to any one server
process.
• Clients are responsible for user-interface issues; servers manage data and
execute transactions.
• Thus, a client process could run on a personal computer and send queries to a
server running on a mainframe
• Reasons for the popularity of this architecture:
1. It is relatively simple to implement, due to the clean separation of functionality
and because the server is centralized.
2. Expensive server machines are not underutilized by dealing with mundane
user interactions, which are relegated to inexpensive client machines.
3. Users can run a graphical user interface that they are familiar with, rather than
the (possibly unfamiliar and unfriendly) user interface on the server.
• It is important to set the boundary between the client and the server carefully,
and to keep the communication between them as set-oriented as possible.
• In particular, opening a cursor and fetching tuples one at a time generates many
messages and should be avoided.
• (Even if we fetch several tuples and cache them at the client, messages must be
exchanged when the cursor is advanced to ensure that the current row is locked.)
• Drawbacks
• The client-server architecture does not allow a single query to span multiple
servers because the client process would have to be capable of breaking such
a query into appropriate subqueries to be executed at different sites and
then piecing together the answers to the subqueries.
• The client process would thus be quite complex, and its capabilities would
begin to overlap with the server; distinguishing between clients and servers
becomes harder.
• 2. Collaborating Server Systems
• When a server receives a query that requires access to data at other servers,
it generates appropriate subqueries to be executed by other servers and puts
the results together to compute answers to the original query.
• 3. Middleware Systems
• The idea is that we need just one database server that is capable of
managing queries and transactions spanning multiple servers; the remaining
servers only need to handle local queries and transactions.
• Fragmentation
• Fragmentation consists of breaking a relation into smaller relations or fragments,
and storing the fragments (instead of the relation itself), possibly at different sites.
• In horizontal fragmentation, each fragment consists of a subset of rows of the
original relation; in vertical fragmentation, each fragment consists of a subset of
columns.
• Horizontal fragmentation: the union of the horizontal fragments must be equal to
the original relation. Fragments are usually also required to be disjoint.
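These two requirements, disjoint fragments whose union equals the original relation, can be seen in a small sketch (the relation, site names, and the branch predicate are made up for illustration):

```python
accounts = [
    {"acct": 1, "branch": "Pune",   "balance": 500},
    {"acct": 2, "branch": "Mumbai", "balance": 900},
    {"acct": 3, "branch": "Pune",   "balance": 300},
]

# Horizontal fragmentation: split by rows, e.g. on the branch attribute.
frag_pune   = [r for r in accounts if r["branch"] == "Pune"]
frag_mumbai = [r for r in accounts if r["branch"] != "Pune"]

# The fragments are disjoint and their union is the original relation.
assert not [r for r in frag_pune if r in frag_mumbai]
assert sorted(frag_pune + frag_mumbai, key=lambda r: r["acct"]) == accounts
print(len(frag_pune), len(frag_mumbai))  # 2 1
```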
• Replication
• Replication means that we store several copies of a relation or relation fragment,
possibly at different sites.
• For example, if a relation r is fragmented into r1, r2, and r3, there might be
just one copy of r1, whereas r2 is replicated at two other sites and r3 is
replicated at all sites.
• The motivation for replication is twofold:
• Increased availability of data: if a site that contains a replica goes down, the
same data can be found at another site.
• Faster query evaluation: queries can execute faster by using a local copy of
a relation instead of going to a remote site.
• There are two kinds of replication, called synchronous replication and
asynchronous replication, which differ primarily in how replicas are kept current
when the relation is modified.