TRANSACTION MANAGEMENT, CONCURRENCY CONTROL, DISTRIBUTED SYSTEMS
MODULE 4
• ACID Properties Of A Transaction
• Concurrent Execution Of Transactions: Serializability, Anomalies Due To Interleaved Execution, Schedules Involving Aborted Transactions
• Lock-Based Concurrency Control: Strict Two-Phase Locking (Strict 2PL)
• Deadlocks
• Introduction To Crash Recovery: Stealing Frames And Forcing Pages, Overview Of ARIES, Dealing With Deadlocks

INTRODUCTION
• In database management systems (DBMS), a transaction is a fundamental concept

representing a set of logically related operations executed as a single unit.

• Transactions are essential for handling user requests to access and modify

database contents, ensuring the database remains consistent and reliable despite

various operations and potential interruptions.

• What does a transaction mean in DBMS?

• Transaction can be defined as a set of logically related operations.

• It is the result of a request made by the user to access the contents of the
database and perform operations on it.

• It consists of various operations and passes through various states before completion.

• It also has some specific properties that must be followed to keep the
database consistent.
ACID PROPERTIES
• A transaction is a single logical unit of work that accesses and possibly modifies

the contents of a database.

• Transactions access data using read-and-write operations.

• To maintain consistency in a database, certain properties are followed before

and after the transaction.

• These are called ACID properties.

• The ACID properties are

• Atomicity

• Consistency

• Isolation

• Durability

• Atomicity:

• Atomicity means that either the entire transaction takes place at once or it doesn’t happen at all.

• There is no midway i.e. transactions do not occur partially.

• Each transaction is considered as one unit and either runs to completion or is not

executed at all.

• It involves the following two operations.

• Abort: If a transaction aborts, changes made to the database are not visible.

• Commit: If a transaction commits, changes made are visible.


• Atomicity is also known as the ‘All or nothing rule’.
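To make the commit/abort pair concrete, here is a minimal sketch of an atomic transfer using Python’s built-in sqlite3 module; the accounts table and the balances are illustrative, not part of the module.

```python
# Minimal atomicity sketch with sqlite3 (table and values are made up).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 2000)")
conn.commit()

try:
    # The two updates form one unit: either both happen or neither does.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'B'")
    conn.commit()    # commit: both changes become visible
except sqlite3.Error:
    conn.rollback()  # abort: neither change is visible
```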
• Consistency:

• This means that integrity constraints must be maintained so that the database is

consistent before and after the transaction.

• Each transaction, run by itself with no concurrent execution of other transactions, must preserve the consistency of the database; this requirement is what consistency means.

• The DBMS assumes that consistency holds for each transaction.

• Ensuring this property of a transaction is the responsibility of the user.

• It refers to the correctness of a database.


• Isolation:

• This property ensures that multiple transactions can occur concurrently without leading

to the inconsistency of the database state.

• Transactions occur independently without interference.

• Changes occurring in a particular transaction will not be visible to any other

transaction until that particular change in that transaction is written to memory or has

been committed.

• This property ensures that executing transactions concurrently results in a state that is equivalent to a state achieved had they been executed serially in some order.
• Durability:

• This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk
and they persist even if a system failure occurs.

• These updates now become permanent and are stored in non-volatile memory.

• The effects of the transaction, thus, are never lost.

• Consistency and Isolation

• Users are responsible for ensuring transaction consistency.

• That is, the user who submits a transaction must ensure that when run to
completion by itself against a ‘consistent’ database instance, the transaction
will leave the database in a ‘consistent’ state.

• Example: transactions on bank accounts.

• The isolation property is ensured by guaranteeing that even though actions of several
transactions might be interleaved, the net effect is identical to executing all
transactions one after the other in some serial order

• if two transactions T1 and T2 are executed concurrently, the net effect is


guaranteed to be equivalent to executing T1 followed by executing T2 or
executing T2 followed by executing T1.

• If each transaction maps a consistent database instance to another consistent


database instance, executing several transactions one after the other will also result in a consistent final database instance.
• Atomicity and durability

• Transactions can be incomplete for three kinds of reasons.

• First, a transaction can be aborted, or terminated unsuccessfully, by the DBMS


because some anomaly arises during execution.

• If a transaction is aborted by the DBMS for some internal reason, it is automatically restarted and executed anew.

• Second, the system may crash while one or more transactions are in progress.

• Third, a transaction may encounter an unexpected situation and decide to


abort (i.e., terminate itself).
• A transaction that is interrupted in the middle may leave the database in an inconsistent state.

• Thus a DBMS must find a way to remove the effects of partial transactions
from the database, that is, it must ensure transaction atomicity: either all of a
transaction’s actions are carried out, or none are.

• A DBMS ensures transaction atomicity by undoing the actions of incomplete


transactions.

• This means that users can ignore incomplete transactions when thinking about how the database is modified by transactions over time.
• To be able to do this, the DBMS maintains a record, called the log, of all
writes to the database.

• The log is also used to ensure durability: if the system crashes before the
changes made by a completed transaction are written to disk, the log is used
to remember and restore these changes when the system restarts.

• The DBMS component that ensures atomicity and durability is called the
recovery manager.
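The write-ahead idea can be sketched in a few lines of Python; this is an illustration of the principle only, with made-up structures, not actual recovery-manager code:

```python
# Illustrative write-ahead log; real logs record LSNs and page details.
log = []        # stands in for stable storage
database = {}   # stands in for the database on disk

def write(txn_id, obj, new_value):
    old_value = database.get(obj)
    # The log record describing the change reaches "stable storage"
    # before the change itself is applied to the database.
    log.append((txn_id, obj, old_value, new_value))
    database[obj] = new_value

def undo(txn_id):
    # Scan the log backwards, restoring old values written by txn_id.
    for tid, obj, old, new in reversed(log):
        if tid == txn_id:
            database[obj] = old
```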

• Advantages of ACID properties in DBMS
• Data consistency: ACID properties ensure that the data remains consistent and
accurate after any transaction execution.
• Data integrity: ACID properties maintain the integrity of the data by ensuring that any changes to the database are permanent and cannot be lost.
• Concurrency control: ACID properties help to manage multiple transactions occurring concurrently by preventing interference between them.
• Recovery: ACID properties ensure that in case of any failure or crash, the system can recover the data up to the point of failure or crash.

• Disadvantages of ACID properties in DBMS
• Performance: The ACID properties can cause a performance overhead in the system, as they require additional processing to ensure data consistency and integrity.
• Scalability: The ACID properties may cause scalability issues in large distributed systems where multiple transactions occur concurrently.
• Complexity: Implementing the ACID properties can increase the complexity of the system and require significant expertise and resources.

TRANSACTIONS AND SCHEDULES
• Transactions are a list of actions.

• A transaction is seen by the DBMS as a series, or list, of actions.

• The actions that can be executed by a transaction include reads and writes of
database objects.

• A Transaction can also be defined as a set of actions that are partially ordered.

• That is, the relative order of some of the actions may not be important.

• In addition to reading and writing, each transaction must specify as its final action either commit (i.e., complete successfully) or abort (i.e., terminate and undo all the actions carried out thus far).

• A schedule is a list of actions (reading, writing, aborting, or committing) from a


set of transactions, and the order in which two actions of a transaction T appear
in a schedule must be the same as the order in which they appear in T.

• A schedule represents an actual or potential execution sequence.

• A schedule that contains either an abort or a commit for each transaction
whose actions are listed in it is called a complete schedule.

• A complete schedule must contain all the actions of every transaction that
appears in it.

• If the actions of different transactions are not interleaved—that is, transactions are executed from start to finish, one by one—we call the schedule a serial schedule.
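A schedule can be modeled as a simple list of (transaction, action, object) triples; the sketch below, with made-up transactions, checks the serial-schedule condition just described:

```python
# A schedule as (transaction, action, object) triples; 'R' = read, 'W' = write.
schedule = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "B"), ("T2", "W", "B"),
]

def is_serial(schedule):
    """Serial means each transaction's actions appear contiguously."""
    seen, current = set(), None
    for txn, _, _ in schedule:
        if txn != current:
            if txn in seen:   # txn resumes after another ran: interleaved
                return False
            seen.add(txn)
            current = txn
    return True

print(is_serial(schedule))  # True: T1 runs to completion before T2 starts
```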

CONCURRENT EXECUTION

• Concurrent execution in a DBMS refers to the capacity to carry out numerous


transactions simultaneously in a shared database.

• A collection of database activities known as a transaction, such as inserting,


updating, or removing data, is carried out as a single unit of work.

• Concurrent execution allows many transactions to access the same data concurrently, which can have numerous benefits, such as higher system throughput and faster response time.

• Motivation for Concurrent Execution

• Ensuring transaction isolation while permitting concurrent execution is difficult, but is


necessary for performance reasons.

• First, while one transaction is waiting for a page to be read in from disk, the CPU
can process another transaction.

• This is because I/O activity can be done in parallel with CPU activity in a computer.

• Overlapping I/O and CPU activity reduces the amount of time disks and processors are idle, and increases system throughput (the average number of transactions completed in a given time).
• Second, interleaved execution of a short transaction with a long
transaction usually allows the short transaction to complete quickly.

• In serial execution, a short transaction could get stuck behind a long


transaction leading to unpredictable delays in response time, or average
time taken to complete a transaction.

SERIALIZABILITY

• A schedule is serializable if it is equivalent to a serial schedule.

• A concurrent schedule must have the same effect as some serial execution, that is, as if the transactions ran one after another.

• A serial execution is one in which the actions such as read, write, abort, and commit are performed one transaction at a time, in a serial manner.

• Let’s take two transactions T1 and T2.

• If both transactions are performed without interfering with each other, the result is called a serial schedule.
• A serializable schedule over a set S of committed transactions is a schedule whose
effect on any consistent database instance is guaranteed to be identical to that of
some complete serial schedule over S.

• That is, the database instance that results from executing the given schedule is
identical to the database instance that results from executing the transactions in
some serial order.

• There are some important points to note in this definition:

• Executing the transactions serially in different orders may produce different
results, but all are presumed to be acceptable; the DBMS makes no guarantees
about which of them will be the outcome of an interleaved execution.

• The above definition of a serializable schedule does not cover the case of
schedules containing aborted transactions.

• If a transaction computes a value and prints it to the screen, this is an ‘effect’ that
is not directly captured in the state of the database.

• Assume that all such values are also written into the database.

• Some Anomalies Associated with Interleaved Execution

• Two actions on the same data object conflict if at least one of them is a write.

• The three anomalous situations can be described in terms of when the actions
of two transactions T1 and T2 conflict with each other:

• Write-Read (WR)

• Read-Write (RW) and

• Write-Write (WW)
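Using the schedule encoding from before, a conflict is mechanical to detect, as this small sketch shows:

```python
# Flag pairs of actions that conflict: same object, different
# transactions, and at least one of the two actions is a write.
def conflicts(schedule):
    pairs = []
    for i, (t1, a1, o1) in enumerate(schedule):
        for t2, a2, o2 in schedule[i + 1:]:
            if o1 == o2 and t1 != t2 and "W" in (a1, a2):
                pairs.append(((t1, a1, o1), (t2, a2, o2)))
    return pairs

# A WR conflict: T2 reads A after T1 writes it (a potential dirty read).
print(conflicts([("T1", "W", "A"), ("T2", "R", "A")]))
```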

• 1. Reading Uncommitted Data (WR Conflicts)

• The first source of anomalies is that a transaction T2 could read a


database object A that has been modified by another transaction
T1, which has not yet committed.

• Such a read is called a dirty read.

• Consider two transactions T1 and T2, each of which, run alone, preserves database
consistency: T1 transfers $100 from A to B, and T2 increments both A and B by 6 percent.

• Suppose that their actions are interleaved so that (1) the account transfer program T1
deducts $100 from account A, then (2) the interest deposit program T2 reads the current
values of accounts A and B and adds 6 percent interest to each, and then (3) the account
transfer program credits $100 to account B.

• The result of this schedule is different from any result that we would get by running one of the two transactions first and then the other.

• The problem can be traced to the fact that the value of A written by T1 is read by T2 before T1 has completed all its changes.
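With assumed starting balances of $1,000 in A and $2,000 in B (the figures are illustrative), the interleaving plays out as follows:

```python
A, B = 1000, 2000   # assumed starting balances

A = A - 100         # (1) T1 deducts $100 from A
A = A * 1.06        # (2) T2 adds 6% to A, reading T1's uncommitted write
B = B * 1.06        #     ... and 6% to B, before T1 credits it
B = B + 100         # (3) T1 credits $100 to B

print(A + B)        # 3174.0, but either serial order yields 3180.0:
                    # the $100 in transit earned no interest
```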
• 2. Unrepeatable reads (RW conflicts)

• A transaction T2 could change the value of an object A that has been read by
a transaction T1, while T1 is still in progress.

• This situation causes two problems.

• First, if T1 tries to read the value of A again, it will get a different result, even though it has not modified A in the meantime.

• This situation could not arise in a serial execution of two transactions; it is


called an unrepeatable read.

• Second, suppose that both T1 and T2 read the same value of A, say, 5, and
then T1, which wants to increment A by 1, changes it to 6, and T2, which wants
to decrement A by 1, decrements the value that it read (i.e., 5) and changes A
to 4.

• Running these transactions in any serial order should leave A with a final value of 5; thus, the interleaved execution leads to an inconsistent state.

• The underlying problem here is that although T2’s change is not directly read
by T1, it invalidates T1’s assumption about the value of A, which is the basis
for some of T1’s subsequent actions.
• 3. Overwriting Uncommitted Data (WW Conflicts)

• A transaction T2 could overwrite the value of an object A, which has already been modified by a transaction T1, while T1 is still in progress.

• Even if T2 does not read the value of A written by T1, a potential problem
exists.

• Suppose that Harry and Larry are two employees, and their salaries must be kept
equal.

• Transaction T1 sets their salaries to $1,000 and transaction T2 sets their salaries
to $2,000.

• If we execute these in the serial order T1 followed by T2, both receive the salary
$2,000; the serial order T2 followed by T1 gives each the salary $1,000.

• Either of these is acceptable from a consistency standpoint.

• Notice that neither transaction reads a salary value before writing it—such a
write is called a blind write.
• Schedules involving aborted transactions

• A serializable schedule over a set S of transactions is a schedule whose effect


on any consistent database instance is guaranteed to be identical to that of
some complete serial schedule over the set of committed transactions in S.

• This definition of serializability relies on the actions of aborted transactions


being undone completely, which may be impossible in some situations.

• Suppose that (1) an account transfer program T1 deducts $100 from account A,
then (2) an interest deposit program T2 reads the current values of accounts A and
B and adds 6 percent interest to each, then commits, and then (3) T1 is aborted.

• Now, T2 has read a value for A that should never have been there.

• If T2 had not yet committed, we could deal with the situation by cascading the
abort of T1 and also aborting T2; this process would recursively abort any
transaction that read data written by T2, and so on.

• But T2 has already committed, and so we cannot undo its actions! We say that such
a schedule is unrecoverable.
• A recoverable schedule is one in which transactions commit only after (and if!) all
transactions whose changes they read commit.

• If transactions read only the changes of committed transactions, not only is the
schedule recoverable, but also aborting a transaction can be accomplished
without cascading the abort to other transactions.

• Such a schedule is said to avoid cascading aborts

• Suppose that a transaction T2 overwrites the value of an object A that has been modified
by a transaction T1, while T1 is still in progress, and T1 subsequently aborts.

• All of T1’s changes to database objects are undone by restoring the value of any object
that it modified to the value of the object before T1’s changes.

• When T1 is aborted, and its changes are undone in this manner, T2’s changes are lost as
well, even if T2 decides to commit.

• So, for example, if A originally had the value 5, then was changed by T1 to 6, and by T2
to 7, if T1 now aborts, the value of A becomes 5 again.

• Even if T2 commits, its change to A is inadvertently lost.


LOCK-BASED CONCURRENCY CONTROL
• A DBMS must be able to ensure that only serializable, recoverable schedules are
allowed, and that no actions of committed transactions are lost while undoing
aborted transactions.

• A DBMS typically uses a locking protocol to achieve this.

• A locking protocol is a set of rules to be followed by each transaction (and


enforced by the DBMS), in order to ensure that even though actions of several
transactions might be interleaved, the net effect is identical to executing all
transactions in some serial order.
STRICT TWO-PHASE LOCKING (STRICT 2PL)

• The most widely used locking protocol, called strict two-phase locking, or strict
2PL, has two rules.

1. If a transaction T wants to read/ modify an object, it first requests a shared/


exclusive lock on the object.

2. All locks held by a transaction are released when the transaction is completed.

• A transaction that has an exclusive lock can also read the object; an additional shared
lock is not required.

• A transaction that requests a lock is suspended until the DBMS is able to grant it the
requested lock.

• The DBMS keeps track of the locks it has granted and ensures that if a transaction
holds an exclusive lock on an object, no other transaction holds a shared or exclusive
lock on the same object.

• In practice, requests to acquire and release locks can be automatically inserted into transactions by the DBMS.
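The two rules can be pictured as a tiny lock table; the sketch below is a minimal illustration (no request queueing, no deadlock handling), not a production lock manager:

```python
# Minimal strict-2PL lock table sketch (names are illustrative).
class LockManager:
    def __init__(self):
        self.shared = {}     # object -> set of txns holding shared locks
        self.exclusive = {}  # object -> txn holding the exclusive lock

    def acquire(self, txn, obj, exclusive=False):
        holders = self.shared.get(obj, set())
        xholder = self.exclusive.get(obj)
        if exclusive:        # rule 1: X lock conflicts with any other holder
            if xholder not in (None, txn) or holders - {txn}:
                return False                  # caller must block and retry
            self.exclusive[obj] = txn
        else:                # S lock conflicts only with another txn's X lock
            if xholder not in (None, txn):
                return False
            self.shared.setdefault(obj, set()).add(txn)
        return True

    def release_all(self, txn):
        # Rule 2: locks are released only when the transaction completes.
        for holders in self.shared.values():
            holders.discard(txn)
        self.exclusive = {o: t for o, t in self.exclusive.items() if t != txn}
```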
• The locking protocol allows only ‘safe’ interleaving of transactions.

• If two transactions access completely independent parts of the database, they will
be able to concurrently obtain the locks that they need and proceed merrily on
their ways.

• On the other hand, if two transactions access the same object, and one of them
wants to modify it, their actions are effectively ordered serially—all actions of
one of these transactions (the one that gets the lock on the common object first)
are completed before (this lock is released and) the other transaction can
proceed.
• We denote the action of a transaction T requesting a shared (exclusive) lock on object O as ST(O) (XT(O)).

• Consider the following interleaving of two transactions.

• This interleaving could result in a state that cannot result from any serial execution
of the three transactions.

• For instance, T1 could change A from 10 to 20, then T2 (which reads the value 20
for A) could change B from 100 to 200, and then T1 would read the value 200
for B.

• If run serially, either T1 or T2 would execute first, and read the values 10 for A
and 100 for B: clearly, the interleaved execution is not equivalent to either serial
execution
• If the strict 2PL protocol is used, the above interleaving is disallowed.

• Assuming that the transactions proceed at the same relative speed as before, T1 would obtain an exclusive lock on A first and then read and write A.

• Then, T2 would request a lock on A.

• However, this request cannot be granted until T1 releases its exclusive lock on
A, and the DBMS therefore suspends T2.

• T1 now proceeds to obtain an exclusive lock on B, reads and writes B, then


finally commits, at which time its locks are released.

• T2’s lock request is now granted, and it proceeds.

• In general, the actions of different transactions could be interleaved.

• Consider the interleaving of two transactions, which is permitted by the strict 2PL protocol

INTRODUCTION TO CRASH RECOVERY

• The recovery manager of a DBMS is responsible for ensuring transaction atomicity and
durability.

• It ensures atomicity by undoing the actions of transactions that do not commit and
durability by making sure that all actions of committed transactions survive system
crashes and media failures.

• When a DBMS is restarted after crashes, the recovery manager is given control and must
bring the database to a consistent state.

• The recovery manager is also responsible for undoing the actions of an aborted transaction.
• The transaction manager of a DBMS controls the execution of transactions.

• Before reading and writing objects during normal execution, locks must be
acquired (and released at some later time) according to a chosen locking protocol.

• It is convenient to assume that each write action is atomic; this implies that the system does not crash while a write is in progress, which is unrealistic.

• In practice, disk writes do not have this property, and steps must be taken during
restart after a crash to verify that the most recent write to a given page was
completed successfully and to deal with the consequences if not.
Stealing Frames And Forcing Pages
• With respect to writing objects, two additional questions arise:

1. Can the changes made to an object O in the buffer pool by a transaction T be


written to disk before T commits?

• Such writes are executed when another transaction wants to bring in a page and the
buffer manager chooses to replace the page containing O; of course, this page must
have been unpinned by T.

• If such writes are allowed, we say that a steal approach is used. (the second
transaction ‘steals’ a frame from T.)
2. When a transaction commits, must we ensure that all the changes it has made to
objects in the buffer pool are immediately forced to disk?

• If so, a force approach is used.

• From the standpoint of implementing a recovery manager, it is simplest to use a


buffer manager with a no-steal, force approach.

• If no-steal is used, we don’t have to undo the changes of an aborted transaction (because these changes have not been written to disk), and if force is used, we don’t have to redo the changes of a committed transaction if there is a subsequent crash (because all these changes are guaranteed to have been written to disk at commit time).
• However, these policies have important drawbacks.

• The no-steal approach assumes that all pages modified by ongoing


transactions can be accommodated in the buffer pool, and in the presence
of large transactions (typically run in batch mode, e.g., Payroll processing),
this assumption is unrealistic.

• The force approach results in excessive page I/O costs.

• If a highly used page is updated in succession by 20 transactions, it would be


written to disk 20 times.

• With a no-force approach, on the other hand, the in-memory copy of the page would be successively modified and written to disk just once, reflecting the effects of all 20 updates, when the page is eventually replaced in the buffer pool (in accordance with the buffer manager’s page replacement policy).

• For these reasons, most systems use a steal, no-force approach.

• Thus, if a frame is dirty and chosen for replacement, the page it contains is written
to disk even if the modifying transaction is still active (steal); in addition, pages in
the buffer pool that are modified by a transaction are not forced to disk when the
transaction commits (no-force).
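In code terms, the two policy decisions might look like the sketch below; Frame and flush_to_disk are made-up names used only to illustrate the steal and no-force choices:

```python
class Frame:
    """Stand-in for a buffer-pool frame (illustrative only)."""
    def __init__(self, page_id, dirty=False, txn_active=False):
        self.page_id, self.dirty, self.txn_active = page_id, dirty, txn_active

def flush_to_disk(frame):
    print(f"page {frame.page_id} written to disk")  # stand-in disk write
    frame.dirty = False

def on_replacement(frame):
    # Steal: a dirty page may be flushed even while the modifying
    # transaction is still active; its changes may need undoing later.
    if frame.dirty:
        flush_to_disk(frame)

def on_commit(frames_of_txn):
    # No-force: nothing is flushed at commit; the log guarantees durability
    # and pages reach disk whenever they are eventually replaced.
    pass
```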

Recovery-Related Steps During Normal Execution
• The recovery manager of a DBMS maintains some information during normal execution of
transactions in order to enable it to perform its task in the event of a failure. In particular,
a log of all modifications to the database is saved on stable storage, which is guaranteed
(with very high probability) to survive crashes and media failures.

• Stable storage is implemented by maintaining multiple copies of information (perhaps in


different locations) on nonvolatile storage devices such as disks or tapes.

• It is important to ensure that the log entries describing a change to the database are written to stable storage before the change is made; otherwise, the system might crash just after the change, leaving us without a record of the change.
• The log enables the recovery manager to undo the actions of aborted and
incomplete transactions and to redo the actions of committed transactions.

• For example, a transaction that committed before the crash may have made
updates to a copy (of a database object) in the buffer pool, and this change may
not have been written to disk before the crash, because of a no-force approach.

• Such changes must be identified using the log, and must be written to disk.

• Further, changes of transactions that did not commit prior to the crash might have
been written to disk because of a steal approach. Such changes must be
identified using the log and then undone
OVERVIEW OF ARIES
• Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is a recovery
algorithm that is designed to work with a steal, no-force approach.

• When the recovery manager is invoked after a crash, restart proceeds in three phases:

1. Analysis: identifies dirty pages in the buffer pool (i.e., Changes that have not been
written to disk) and active transactions at the time of the crash.

2. Redo: repeats all actions, starting from an appropriate point in the log, and restores the
database state to what it was at the time of the crash.

3. Undo: undoes the actions of transactions that did not commit, so that the database reflects only the actions of committed transactions.
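As a rough skeleton of the three phases (real ARIES tracks LSNs, compensation log records, and checkpoints; the log-record layout here is invented for illustration):

```python
from collections import namedtuple

# Invented log-record layout: type is "update", "commit", or "abort".
Rec = namedtuple("Rec", "type txn page old new")
pages = {}  # stand-in for database pages on disk

def redo_action(rec): pages[rec.page] = rec.new   # reapply logged change
def undo_action(rec): pages[rec.page] = rec.old   # restore before-image

def restart(log):
    # 1. Analysis: find dirty pages and transactions active at the crash.
    dirty, active = set(), set()
    for rec in log:
        if rec.type == "update":
            dirty.add(rec.page)
            active.add(rec.txn)
        else:
            active.discard(rec.txn)   # commit/abort record seen
    # 2. Redo: repeat history, restoring the state as of the crash
    #    (including updates of transactions that will later be undone).
    for rec in log:
        if rec.type == "update":
            redo_action(rec)
    # 3. Undo: roll back uncommitted transactions in reverse log order.
    for rec in reversed(log):
        if rec.type == "update" and rec.txn in active:
            undo_action(rec)
```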
DEADLOCKS

• Consider the following: transaction T1 gets an exclusive lock on object A, T2 gets an exclusive lock on B, T1 requests an exclusive lock on B and is queued, and T2 requests an exclusive lock on A and is queued.

• Now, T1 is waiting for T2 to release its lock and T2 is waiting for T1 to release its lock.

• Such a cycle of transactions waiting for locks to be released is called a deadlock.

• The DBMS must either prevent or detect (and resolve) such deadlock situations.

Deadlock Prevention
• We can prevent deadlocks by giving each transaction a priority and ensuring that
lower priority transactions are not allowed to wait for higher priority transactions (or
vice versa).

• One way to assign priorities is to give each transaction a timestamp when it starts up.

• The lower the timestamp, the higher the transaction’s priority, that is, the oldest
transaction has the highest priority.

• If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock
manager can use one of the following two policies:

• Wait-die: if Ti has higher priority, it is allowed to wait; otherwise it is aborted.

• Wound-wait: if Ti has higher priority, abort Tj; otherwise Ti waits.

• In the wait-die scheme, lower priority transactions can never wait for higher priority
transactions.

• In the wound-wait scheme, higher priority transactions never wait for lower priority
transactions.

• In either case, no deadlock cycle can develop.
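Both policies fit in a few lines; this sketch assumes a lower timestamp means an older transaction and hence a higher priority:

```python
# Requester Ti asks for a lock that holder Tj already holds in a
# conflicting mode; timestamps stand in for priorities (lower = older).
def wait_die(requester_ts, holder_ts):
    # Older requesters wait; younger requesters die (are aborted).
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    # Older requesters wound (abort) the holder; younger ones wait.
    return "abort holder" if requester_ts < holder_ts else "wait"

print(wait_die(1, 2))    # older transaction requests: it waits
print(wait_die(2, 1))    # younger transaction requests: it is aborted
print(wound_wait(1, 2))  # older transaction requests: holder is aborted
```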
• When a transaction is aborted and restarted, it should be given the same
timestamp that it had originally.

• Reissuing timestamps in this way ensures that each transaction will eventually
become the oldest transaction, and thus the one with the highest priority, and will
get all the locks that it requires

• The wait-die scheme is non-preemptive; only a transaction requesting a lock can
be aborted.

• As a transaction grows older (and its priority increases), it tends to wait for more
and more younger transactions.

• A younger transaction that conflicts with an older transaction may be repeatedly aborted (a disadvantage with respect to wound-wait), but on the other hand, a transaction that has all the locks it needs will never be aborted for deadlock reasons (an advantage with respect to wound-wait, which is preemptive).
Deadlock Detection

• Rather than taking measures to prevent deadlocks, it may be better to detect and
resolve deadlocks as they arise.

• In the detection approach, the DBMS must periodically check for deadlocks.

• When a transaction Ti is suspended because a lock that it requests cannot be
granted, it must wait until all transactions Tj that currently hold conflicting locks
release them.

• The lock manager maintains a structure called a Waits-for Graph to detect


deadlock cycles.

• The nodes correspond to active transactions, and there is an arc from Ti to Tj if


(and only if) Ti is waiting for Tj to release a lock.

• The lock manager adds edges to this graph when it queues lock requests and
removes edges when it grants lock requests.
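A waits-for graph and its cycle check can be sketched with adjacency sets and a depth-first search; the transactions below are illustrative:

```python
from collections import defaultdict

waits_for = defaultdict(set)   # edge Ti -> Tj: Ti waits for a lock Tj holds

def has_deadlock(graph):
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True    # back edge found: a cycle, hence deadlock
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in list(graph) if n not in visited)

waits_for["T1"].add("T2")      # T1 waits for T2 ...
waits_for["T2"].add("T1")      # ... and T2 waits for T1: a cycle
print(has_deadlock(waits_for)) # True
```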
• Consider the schedule shown in the figure: the last step, shown below the line, creates a cycle in the waits-for graph.

• The waits-for graph describes all active transactions, some of which will
eventually abort.

• If there is an edge from Ti to Tj in the waits-for graph, and both Ti and Tj


eventually commit, there will be an edge in the opposite direction (from Tj to Ti) in
the precedence graph (which involves only committed transactions).

• The waits-for graph is periodically checked for cycles, which indicate deadlock.

• A deadlock is resolved by aborting a transaction that is on a cycle and releasing


its locks; this action allows some of the waiting transactions to proceed.

• As an alternative to maintaining a waits-for graph, a simplistic way to identify
deadlocks is to use a timeout mechanism:

• If a transaction has been waiting too long for a lock, assume that it is in a
deadlock cycle and abort it.

DISTRIBUTED DATABASES

INTRODUCTION
• A distributed database is a database that stores data across multiple locations,
such as computers or servers, instead of in one place.

• The data can be spread out across a network of computers or in multiple


computers in the same location.

• Data in a distributed database system is stored across several sites, and each site
is typically managed by a DBMS that can run independently of the other sites.

• The classical view of a distributed database system is that the system should make
the impact of data distribution transparent.
Properties
• Distributed Data Independence:

• Users should be able to ask queries without specifying where the referenced
relations, or copies or fragments of the relations, are located.

• This principle is a natural extension of physical and logical data


independence.

• Further, queries that span multiple sites should be optimized systematically in


a cost-based manner, taking into account communication costs and differences
in local computation costs
• Distributed Transaction Atomicity:

• Users should be able to write transactions that access and update data at
several sites just as they would write transactions over purely local data.

• The effects of a transaction across sites should continue to be atomic; that is,
all changes persist if the transaction commits, and none persist if it aborts

Types
• Homogeneous

• All sites use the same database management system (DBMS), data structure, and
operating system. This makes it easier to manage because of the uniformity of the
software and hardware. It requires a commitment to a single DBMS across all sites.

• Heterogeneous (Multidatabase System)

• Different sites use different DBMS software, but there is additional common
software to support data exchange between these sites. For example, various
library database systems use the same machine-readable cataloging format.
DISTRIBUTED DBMS ARCHITECTURES

• There are three alternative approaches to separating functionality across different DBMS-related processes:

• Client-server,

• Collaborating server, and

• Middleware.

• 1. Client-Server Systems

• A client-server system has one or more client processes and one or more
server processes, and a client process can send a query to any one server
process.

• Clients are responsible for user-interface issues; servers manage data and
execute transactions.

• Thus, a client process could run on a personal computer and send queries to a
server running on a mainframe

• Reasons for the popularity of this architecture.

1. It is relatively simple to implement due to its clean separation of functionality


and because the server is centralized.

2. Expensive server machines are not underutilized by dealing with mundane user interactions, which are now relegated to inexpensive client machines.

3. Users can run a graphical user interface that they are familiar with, rather than
the (possibly unfamiliar and unfriendly) user interface on the server

• It is important to draw the boundary between the client and the server carefully, and to keep the communication between them as set-oriented as possible.

• In particular, opening a cursor and fetching tuples one at a time generates many
messages and should be avoided.

• (Even if we fetch several tuples and cache them at the client, messages must be
exchanged when the cursor is advanced to ensure that the current row is locked.)

• Drawbacks
• The client-server architecture does not allow a single query to span multiple
servers because the client process would have to be capable of breaking such
a query into appropriate subqueries to be executed at different sites and
then piecing together the answers to the subqueries.

• The client process would thus be quite complex, and its capabilities would
begin to overlap with the server; distinguishing between clients and servers
becomes harder.
• 2. Collaborating Server Systems

• Eliminating this distinction leads us to an alternative to the client-server


architecture: a collaborating server system.

• In a collaborating server system, we have a collection of database servers, each capable of running transactions against local data, that cooperatively execute transactions spanning multiple servers.

• When a server receives a query that requires access to data at other servers,
it generates appropriate subqueries to be executed by other servers and puts
the results together to compute answers to the original query.

• Ideally, the decomposition of the query should be done using cost-based


optimization, taking into account the costs of network communication as well as
local processing costs

• 3. Middleware Systems

• The middleware architecture is designed to allow a single query to span


multiple servers, without requiring all database servers to be capable of
managing such multi-site execution strategies.

• It is especially attractive when trying to integrate several legacy systems,


whose basic capabilities cannot be extended.

• The idea is that we need just one database server that is capable of
managing queries and transactions spanning multiple servers; the remaining
servers only need to handle local queries and transactions.

• A special server, as a layer of software that coordinates the execution of


queries and transactions across one or more independent database servers;
such software is often called middleware.

• The middleware layer is capable of executing joins and other relational


operations on data obtained from the other servers, but typically, does not
itself maintain any data.
STORING DATA IN A DISTRIBUTED DBMS
• In a distributed DBMS, relations are stored across several sites.

• Accessing a relation that is stored at a remote site incurs message-passing costs,


and to reduce this overhead, a single relation may be partitioned or fragmented
across several sites, with fragments stored at the sites where they are most often
accessed, or replicated at each site where the relation is in high demand.

• Fragmentation

• Fragmentation consists of breaking a relation into smaller relations or


fragments, and storing the fragments (instead of the relation itself), possibly
at different sites.

• In horizontal fragmentation, each fragment consists of a subset of rows of


the original relation.

• In vertical fragmentation, each fragment consists of a subset of columns of


the original relation

• Horizontal fragmentation: the union of the horizontal fragments must be equal to
the original relation. Fragments are usually also required to be disjoint.

• Vertical fragmentation: the collection of vertical fragments should be a lossless-join decomposition of the original relation.
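Concretely, the two kinds of fragmentation can be pictured as row and column subsets; the Employees relation below is illustrative:

```python
# An illustrative Employees relation as a list of rows.
employees = [
    {"eid": 1, "name": "Ann", "city": "Pune",   "salary": 50000},
    {"eid": 2, "name": "Raj", "city": "Mumbai", "salary": 60000},
]

# Horizontal fragmentation: a subset of rows per site; the union of the
# (disjoint) fragments re-creates the original relation.
pune_fragment   = [r for r in employees if r["city"] == "Pune"]
mumbai_fragment = [r for r in employees if r["city"] == "Mumbai"]

# Vertical fragmentation: a subset of columns per site; the key 'eid' is
# kept in every fragment so that joining them is lossless.
payroll_fragment   = [{"eid": r["eid"], "salary": r["salary"]} for r in employees]
directory_fragment = [{"eid": r["eid"], "name": r["name"], "city": r["city"]}
                      for r in employees]
```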

• Replication

• Replication means that we store several copies of a relation or relation


fragment.

• An entire relation can be replicated at one or more sites. Similarly, one or


more fragments of a relation can be replicated at other sites.

• For example, if a relation r is fragmented into r1, r2, and r3, there might be
just one copy of r1, whereas r2 is replicated at two other sites and r3 is
replicated at all sites.

• The motivation for replication is twofold:

• Increased availability of data: if a site that contains a replica goes down, we


can find the same data at other sites. Similarly, if local copies of remote
relations are available, we are less vulnerable to the failure of communication
links.

• Faster query evaluation: queries can execute faster by using a local copy of
a relation instead of going to a remote site

• There are two kinds of replication, synchronous and asynchronous replication, which differ primarily in how replicas are kept current when the relation is modified.

