Concurrency Control in Distributed Databases
The Goal of Concurrency Control
Concurrency Control is a mechanism to manage simultaneous operations
(transactions) on a shared database. Its core objective is to ensure the database
remains in a consistent state, despite multiple transactions running concurrently,
providing the Isolation property of the ACID (Atomicity, Consistency, Isolation,
Durability) model.
The Distributed Challenge
Why is this much harder than on a single computer? The inherent complexities of a distributed environment introduce unique hurdles
to maintaining data consistency and isolation.
Network Latency
Significant and unpredictable delays in communication between sites. A lock granted at Site A is not instantly known at Site B.

No Global Clock
Each computer has its own independent clock, making it difficult to determine the exact global order of events.

Site & Network Failures
Individual computers or network links can fail, requiring robust handling to prevent data inconsistency.

Data Replication
Data copies kept across multiple sites for performance and availability must be consistently updated across all instances.
Common Concurrency Problems
1. The Lost Update
Occurs when two transactions access the same data and the second transaction's write overwrites the first's, leading to an incorrect state. For example, two tickets sold for one seat.

2. The Dirty Read
A transaction reads data modified by another transaction that has not yet committed. If the modifying transaction aborts, the read data becomes invalid, leading to incorrect decisions (e.g., forecasting based on a cancelled sale).

3. The Inconsistent Retrieval (Phantom Read)
A transaction reads a set of data twice, but another committed transaction inserts or deletes rows in between the reads. This results in inconsistent views, like a count of 50 employees and a list of 51 names.
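The Lost Update is easy to reproduce. The sketch below (a minimal Python simulation; the seat counter and customer names are illustrative, not from any real system) interleaves two transactions so that both read the same value before either writes:

```python
# Minimal sketch of the Lost Update: both transactions read the last
# seat as available, then each writes its result back; the second
# write silently overwrites the first.
seats_left = 1
sales = []

# Interleaved schedule: both reads occur before either write.
t1_read = seats_left            # T1 reads 1
t2_read = seats_left            # T2 also reads 1
if t1_read > 0:
    sales.append("alice")
    seats_left = t1_read - 1    # T1 writes 0
if t2_read > 0:                 # T2 still believes a seat is free
    sales.append("bob")
    seats_left = t2_read - 1    # T2 writes 0 -- T1's update is lost

assert len(sales) == 2 and seats_left == 0   # two tickets sold for one seat
```

Any concurrency control scheme must forbid this schedule, either by blocking T2's read until T1 commits (locking) or by aborting one of the transactions (timestamp/optimistic methods).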
Pessimistic Approach: Lock-Based Protocols
Philosophy: "Assume conflict is likely. It's better to be safe than sorry."
Mechanism: Prevents conflicts by forcing transactions to acquire a lock on a data item before accessing it. Other transactions may have to wait for
the lock to be released, ensuring serialized access.
Types of Locks
1. Binary Locks
• Simplest form: a data item is either Locked or Unlocked.
• If locked, no other transaction can access it.
• High safety, very low concurrency.

2. Shared/Exclusive Locks
• Shared Lock (S-lock): for reading data. Multiple S-locks are allowed simultaneously (many can read a book).
• Exclusive Lock (X-lock): for writing data. Only one X-lock at a time; no S-locks are allowed while an X-lock is held.
• Greater concurrency than binary locks.
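The S/X compatibility rules can be captured in a few lines. This is a minimal single-item sketch, not a full lock manager (no wait queues or deadlock handling; the class and method names are invented for illustration):

```python
# Sketch of shared/exclusive lock compatibility for one data item.
class SXLock:
    def __init__(self):
        self.readers = set()   # transactions holding S-locks
        self.writer = None     # transaction holding the X-lock, if any

    def acquire_shared(self, txn):
        # Readers coexist with each other, but not with another writer.
        if self.writer is not None and self.writer != txn:
            return False       # caller must wait
        self.readers.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # A writer tolerates no other writer and no other readers.
        if self.writer not in (None, txn):
            return False
        if self.readers - {txn}:
            return False
        self.writer = txn
        return True

    def release(self, txn):
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None

lock = SXLock()
assert lock.acquire_shared("T1") and lock.acquire_shared("T2")  # readers coexist
assert not lock.acquire_exclusive("T3")                         # writer must wait
```

A real lock manager would enqueue the refused request and grant it once the conflicting holders release, rather than returning False.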
The Two-Phase Locking (2PL) Protocol
A fundamental protocol ensuring serializability by strictly regulating lock acquisition and release.
Phase 1: Growing Phase
The transaction can acquire any number of locks it needs. During this phase, it cannot release any locks.

Phase 2: Shrinking Phase
Once the transaction releases its first lock, it enters this phase. It can release held locks but cannot acquire any new locks.
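The two-phase discipline is a per-transaction state machine: one flag flips on the first release and never flips back. A minimal sketch (class and names are illustrative):

```python
# Sketch of a transaction wrapper enforcing two-phase locking:
# after the first release (start of the shrinking phase),
# any further acquisition is a protocol violation.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # False = still in the growing phase

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after first release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("X")
t.acquire("Y")       # growing phase: acquire freely
t.release("X")       # first release -> shrinking phase begins
try:
    t.acquire("Z")   # now illegal under 2PL
except RuntimeError as e:
    print(e)
```

In practice most systems use strict 2PL, which holds all exclusive locks until commit or abort, avoiding cascading aborts at the cost of holding locks longer.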
Distributed Locking Schemes
The core challenge: In a distributed system, how do we manage locks when data may exist on multiple sites?
1. Centralized Lock Manager
One dedicated site manages all lock requests for the entire distributed database. This simplifies implementation and deadlock detection, but introduces a single point of failure and can become a communication bottleneck, as all requests funnel through it.

2. Primary Copy Locking
For data replicated across multiple sites, one specific copy is designated as the "primary." All lock requests for that data item must be sent to and processed by the site holding its primary copy. While this distributes the load, the system remains vulnerable if the primary site fails, as that would halt operations for the data item.

3. Fully Distributed Locking
Each site involved in a transaction manages its own local locks. For replicated data, a lock must be acquired on every copy of the data item across all relevant sites before the transaction can proceed. This approach offers high fault tolerance, as there is no single point of failure, but it incurs very high communication overhead and significant coordination complexity.
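The essential difference between the three schemes is where a lock request must be sent. The sketch below makes that routing decision concrete (site names and the replica map are hypothetical):

```python
# Sketch contrasting lock-request routing under the three schemes.
replicas = {"X": ["site_a", "site_b", "site_c"]}   # where copies of X live
primary = {"X": "site_b"}                          # designated primary copy

def lock_sites(item, scheme):
    """Return the set of sites whose lock manager must grant the lock."""
    if scheme == "centralized":
        return {"lock_manager_site"}       # one dedicated site for everything
    if scheme == "primary_copy":
        return {primary[item]}             # only the primary copy's site
    if scheme == "fully_distributed":
        return set(replicas[item])         # every replica must be locked
    raise ValueError(scheme)

assert lock_sites("X", "centralized") == {"lock_manager_site"}
assert lock_sites("X", "primary_copy") == {"site_b"}
assert lock_sites("X", "fully_distributed") == {"site_a", "site_b", "site_c"}
```

The size of the returned set is a rough proxy for message cost: fully distributed locking pays one round trip per replica, while the other two schemes pay one round trip to a single (but failure-critical) site.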
Optimistic Approach: Timestamp Ordering
Philosophy: "Assume conflicts are rare. It's easier to ask for forgiveness than permission."
Mechanism: No locks are acquired up front; transactions read and write freely, and conflicts are detected after the fact. In optimistic (validation-based) schemes, a validation phase before commit checks for conflicts; in timestamp ordering, each read and write is checked as it happens. Either way, a conflicting transaction is rolled back and restarted, and timestamps determine the correct serialization order.
How Timestamp Ordering Works
• Setup: Each transaction T gets a unique, monotonically increasing timestamp TS(T).
• Data Item Timestamps: Each data item X stores Read-TS(X) (newest read) and Write-TS(X) (newest write).
• Rules for Access:
• Read Rule: If TS(T) < Write-TS(X), T aborts: a younger transaction has already written X, so T would read a value that is too new for its position in the order. Otherwise the read proceeds and Read-TS(X) becomes max(Read-TS(X), TS(T)).
• Write Rule: If TS(T) < Read-TS(X) (the write would invalidate a newer read) or TS(T) < Write-TS(X) (it would overwrite a newer write), T aborts. Otherwise the write proceeds and Write-TS(X) becomes TS(T).
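These access rules fit in two small functions. A minimal sketch (data structures and names are illustrative; it includes the check of a write against Write-TS(X), which basic timestamp ordering applies alongside the Read-TS check):

```python
# Sketch of basic timestamp-ordering checks. Each item tracks the
# timestamps of its newest read and newest write; an out-of-order
# access aborts the transaction (signalled here by returning False).
read_ts = {}    # item -> newest reader's timestamp
write_ts = {}   # item -> newest writer's timestamp

def try_read(ts, item):
    if ts < write_ts.get(item, 0):
        return False          # a younger txn already wrote X: abort
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

def try_write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False          # would invalidate a newer read or write: abort
    write_ts[item] = ts
    return True

assert try_write(5, "X")      # T5 writes X; Write-TS(X) = 5
assert not try_read(3, "X")   # T3 is older than the writer: abort
assert try_read(7, "X")       # T7 reads fine; Read-TS(X) = 7
assert not try_write(6, "X")  # T6 would invalidate T7's read: abort
```

Note that aborts here are unilateral: no transaction ever waits, so deadlock is impossible, at the cost of restarts under contention.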
Distributed Timestamps: Lamport's Algorithm
Problem: How do we generate unique, ordered timestamps across machines with no global clock?
Solution: Logical Clocks. We create a logical order of events, ensuring a consistent partial ordering across the system.
The Rules:
• Each site i maintains a local counter Ci.
• Ci is incremented for every event at that site.
• When site i sends a message, it includes its current timestamp Ci.
• When site j receives a message with timestamp Ci, it updates its own clock: Cj = max(Cj, Ci) + 1.
This ensures that if event A happens before event B, then Timestamp(A) < Timestamp(B). To make timestamps unique across sites, ties between concurrent events are conventionally broken by appending the site ID, yielding a total order consistent with causality.
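The rules above translate directly into a tiny per-site clock. A minimal sketch (class and method names are illustrative):

```python
# Sketch of a Lamport logical clock, one instance per site.
class LamportClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def local_event(self):
        self.counter += 1            # rule: tick on every local event
        return self.counter

    def send(self):
        self.counter += 1            # sending is itself an event
        return self.counter          # timestamp attached to the message

    def receive(self, msg_ts):
        # rule: Cj = max(Cj, Ci) + 1 on message receipt
        self.counter = max(self.counter, msg_ts) + 1
        return self.counter

a, b = LamportClock("A"), LamportClock("B")
a.local_event()             # A's clock: 1
ts = a.send()               # A's clock: 2, message carries timestamp 2
b.local_event()             # B's clock: 1
b.receive(ts)               # B's clock: max(1, 2) + 1 = 3
assert b.counter > ts       # the receive is ordered after the send
```

A transaction started at site B after that receive would get a timestamp greater than any event it causally depends on at site A, which is exactly what distributed timestamp ordering needs.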