0% found this document useful (0 votes)
6 views73 pages

Distributed DB

The document discusses distributed databases, covering topics such as homogeneous and heterogeneous databases, data fragmentation, replication, and transaction management. It explains the principles of distributed transactions, commit protocols, and concurrency control, emphasizing the importance of atomicity and data consistency across multiple sites. Additionally, it outlines the advantages and disadvantages of various approaches to data management in distributed systems.

Uploaded by

pulokr221
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views73 pages

Distributed DB

The document discusses distributed databases, covering topics such as homogeneous and heterogeneous databases, data fragmentation, replication, and transaction management. It explains the principles of distributed transactions, commit protocols, and concurrency control, emphasizing the importance of atomicity and data consistency across multiple sites. Additionally, it outlines the advantages and disadvantages of various approaches to data management in distributed systems.

Uploaded by

pulokr221
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Distributed Databases

Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

2 Advanced Database system principle


Distributed Database System
 A distributed database system:
 loosely coupled sites
 share no physical component
 Database systems that run on each site:
 independent of each other
 Transactions may access data
 over one or several sites
 Can be Homogeneous or Heterogeneous

3 Advanced Database system principle


Homogeneous Distributed Databases
 In a homogeneous distributed database
 All sites have identical software
 Are aware of each other and agree to cooperate in processing
user requests.
 Each site surrenders its right to change schemas or software
 Appears to user as a single system

4 Advanced Database system principle


Heterogeneous Distributed Databases
 In a heterogeneous distributed database
 Different sites may use different schemas and software
 Different schema is a major problem for query processing
 Different software is a major problem for transaction processing
 Sites may not be aware of each other

5 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

6 Advanced Database system principle


Data Fragmentation
 Divide relation r into fragments
r1, r2, …, rn
 Fragment contains sufficient information
 Horizontal fragmentation: (data sharding)
 tuples are assigned to one or more fragments

 Vertical fragmentation:
 schema of r is split into several smaller schemas

7 Advanced Database system principle


Data Fragmentation
 Vertical fragmentation:
 All schemas must contain a common candidate key (or superkey)
to ensure lossless join property.
 The tuple-id attribute may be added to each schema to serve as a
candidate key.

8 Advanced Database system principle


Horizontal Fragmentation of account Relation

account_number branch_name balance

A-305 Hillside 500


A-226 Hillside 336
A-155 Hillside 62

account1 = branch_name=“Hillside” (account )

account_number branch_name balance

A-177 Valleyview 205


A-402 Valleyview 10000
A-408 Valleyview 1123
A-639 Valleyview 750

account2 = branch_name=“Valleyview” (account )


9 Advanced Database system principle
Vertical Fragmentation of employee_info Relation

branch_name customer_name tuple_id

Hillside Lowman 1
Hillside Camp 2
Valleyview Camp 3
Valleyview Kahn 4
Hillside Kahn 5
Valleyview Kahn 6
Valleyview Green 7
deposit1 = branch_name, customer_name, tuple_id (employee_info )
account_number balance tuple_id

A-305 500 1
A-226 336 2
A-177 205 3
A-402 10000 4
A-155 62 5
A-408 1123 6
A-639 750 7
10 Advanced Database system principle
deposit2 = account_number, balance, tuple_id (employee_info )
Advantages of Fragmentation
 Horizontal:
 allows parallel processing on fragments of a relation
 scalability
 Vertical:
 allows vertically split a relation, assign attributes to the sites
where they are most frequently accessed
 tuple-id attribute allows efficient joining of vertical fragments

11 Advanced Database system principle


Data Replication
 Data Replication:
 a relation, or fragment of a relation
 stored redundantly
 in two or more sites.
 Full replication of a relation:
 the relation is stored at all sites.
 Fully redundant databases:
 every site contains a copy of the entire database.

12 Advanced Database system principle


Data Replication (Cont.)
 Advantages of Replication
 Availability:
 One site failure, but r is available at other sites

 Parallelism: queries on r may be processed by several nodes in


parallel

 Reduced data transfer: relation r is available locally at each site


containing a replica of r.

13 Advanced Database system principle


Data Replication (Cont.)
 Disadvantages of Replication:
 Mainly data consistency:
 Increased cost of updates: each replica of relation r must be
updated.
 Increased complexity of concurrency control: concurrent updates
may lead to inconsistent data

14 Advanced Database system principle


Data Transparency
 Data transparency:
 the degree, user may not know the details:
 how and where the data items are stored
 Several kinds of transparency:
 Fragmentation transparency
 Replication transparency
 Location transparency

15 Advanced Database system principle


Naming Data
 locating entity, simplify management, across different sites.
 Solution: Naming management, etc.
 Naming management:
1. Give a system-wide unique name;
2. It should be possible to find the location of data items
efficiently;
3. It should be possible to change the location of data items
transparently.

16 Advanced Database system principle


Centralized Scheme - Name Server
 Structure:
 name server assigns all names
 each site maintains a record of local data items
 sites ask name server to locate non-local data items
 Disadvantages:
 name server is a potential performance bottleneck
 name server is a single point of failure

17 Advanced Database system principle


Use of Aliases
 Alternative to centralized scheme:
 Add site id: i.e., [Link].
 fails to achieve network transparency.

 Therefore:
 Create a set of aliases for data items;
 store the mapping of aliases at each site.

 User unaware of the physical location of data


 unaffected if the data item is moved to other sites

18 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

19 Advanced Database system principle


Distributed Transactions
 Transaction may access data at several sites.
 Each site has a local transaction manager, for:
 Maintaining a log for recovery purposes
 Participating in coordinating the concurrent execution of the
transactions.

20 Advanced Database system principle


Distributed Transactions
 Each site has a transaction coordinator, for:
 Starting the execution of transactions
 Distributing subtransactions at appropriate sites.
 Coordinating the termination of each transaction

21 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

22 Advanced Database system principle


System Failure Modes
 Unique failures of distributed systems:
 Failure of a site.
 Loss of messages
 Handled by TCP-IP protocols etc.
 Failure of a communication link
 Handled by routing messages via alternative links
 Network partition
 if it is split into two or more subsystems, without any connection
Note: a subsystem may consist of a single node

 Network partitioning and site failures are generally indistinguishable.

23 Advanced Database system principle


Commit Protocols
 Used to ensure atomicity across sites
 a transaction executed at multiple sites
 must be committed, or aborted at all the sites

 The two-phase commit (2PC) protocol is widely used


 The three-phase commit (3PC) protocol is more complicated ,
expensive, but avoids some drawbacks of two-phase commit protocol.
This protocol is seldom used in practice.

24 Advanced Database system principle


Two Phase Commit Protocol (2PC)
 Assumes fail-stop model
 failed sites simply stop working
 Executed by the coordinator, after the transaction is executed
 The protocol involves all the local sites at which the transaction
executed
 Let T be a transaction initiated at site Si, and let the transaction
coordinator at Si be Ci

25 Advanced Database system principle


Phase 1: Obtaining a Decision
 Coordinator asks all participants to prepare to commit transaction Ti.
 Ci adds the records <prepare T> to the log and forces log to
stable storage
 sends prepare T messages to all sites (T executed)

26 Advanced Database system principle


Phase 1: Obtaining a Decision
 Upon receiving message, transaction manager at site determines if it
can commit the transaction
 if not, add a record <no T> to the log and send abort T message
to Ci
 if the transaction can be committed, then:
 add the record <ready T> to the log
 force all records for T to stable storage
 send ready T message to Ci

27 Advanced Database system principle


Phase 2: Recording the Decision
 If received a ready T from all sites, then
Ci can commit T
else
T must be aborted.
 Coordinator:
 adds a record to the log: <commit T> or <abort T>
 forces record onto stable storage, once on stable storage, it is
irrevocable
 sends a message to each participant, informing commit or abort
 Participants take appropriate action locally.

28 Advanced Database system principle


Handling of Failures - Site Failure
 To recover a site Si ,first check its log,for:
 Contains <commit T>,finished,perform redo (T)
 Contains <abort T>: not finished,site executes undo (T)
 Contains <ready T>: have submitted,requires Ci to decide operations
of transaction
 If T committed, redo (T) (if all committed)
 If T aborted, then undo (T),(aborted by a site)
 If no message about T,then Sk fails at the preparing stage,then Sk must
execute undo (T)

29 Advanced Database system principle


Handling of Failures- Coordinator Failure
 C fails, sites decide T’s fate:
 If one active site contains <commit T> ,then T must be
committed.
 If one active site contains <abort T> ,then T must be aborted.
 If a site has no <ready T> ,then must be aborted by that site, then
abort T.
 If all contain <ready T>,and no other message like (such as
<abort T> of <commit T>). All have to wait for C to decide.
 Blocking problem : active sites may have to wait for failed coordinator
to recover.

30 Advanced Database system principle


Handling of Failures - Network Partition
 If the coordinator and all its participants remain in one partition,
the failure has no effect on the commit protocol
 If they belong to several partitions:
 Participants think that Coordinator fails, then execute recovery
protocol, wait until Coordinator is up;
 Coordinator think that participants are down, then execute
corresponding protocols

31 Advanced Database system principle


Recovery and Concurrency Control
 In-doubt transactions:
have a <ready T>, not ( <commit T>, or <abort T>)
 The recovering site determines transactions by contacting
other sites;
 this can slow down and potentially block recovery
 Alternatively, recovery algorithms rewrite lock information like:
 Instead of <ready T>, write out <ready T, L>, L = list of
locks held by T.
 Need acquires grant lock for all locks
 After acquiring locks, transactions can be resumed
 Execute commit or abort
 Fast!

32 Advanced Database system principle


Three Phase Commit (3PC)
 Assumptions:
 No network partitioning
 At any point, at least one site must be up.
 At most K sites (participants as well as coordinator) can fail
 Phase 1: identical to 2PC Phase 1
 ready to commit
 In phase 2 coordinator makes a decision (called the pre-commit
decision) and records it in multiple (at least K) sites
 In phase 3, coordinator sends commit/abort message to all
participating sites

33 Advanced Database system principle


Three Phase Commit (3PC)
 Advantage:
 under 3PC, pre-commit decision can be used to commit (despite
failure of some coordinators)
 Avoids blocking problem as long as < K sites fail
 Disadvantage:
 higher overheads
 assumptions may not be satisfied in practice

34 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

35 Advanced Database system principle


Concurrency Control
 Modify concurrency control schemes for distributed environment

 We assume that each site participates in the same commit protocol


 to ensure global transaction atomicity.

 We assume all replicas of any item are updated


 Will see how to relax this in case of site failures later

36 Advanced Database system principle


Single-Lock-Manager Approach
 System maintains a single lock manager Si
 When a transaction needs to lock a data item:
 it sends a lock request to Si
 lock manager determines grant immediately or not
 If yes, lock manager sends back a message
 If no, request is delayed until it can be granted

37 Advanced Database system principle


Single-Lock-Manager Approach (Cont.)
 The transaction read the data item from any site storing the replica
 Writes must be performed on all replicas of a data item
 Advantages of scheme:
 Simple implementation
 Simple deadlock handling
 Disadvantages of scheme are:
 Bottleneck: lock manager site
 Vulnerable

38 Advanced Database system principle


Distributed Lock Manager
 Locking is distributed and implemented at each local site
 Lock managers control access to local data items
 Advantage:
 distributed, robust
 Disadvantage:
 deadlock detection
 Several variants of this approach
 Primary copy
 Majority protocol
 Biased protocol
 Quorum consensus 法定人数通过 协议

39 Advanced Database system principle


Primary Copy
 Choose one replica as primary copy of a data item
 primary site: containing that replica
 Different data items --- different primary sites
 To lock on a data item Q, requests a lock at the primary site of Q
 Implies getting lock on all replicas of Q
 Benefit
 Concurrency control is simple
 Drawback
 If the primary site fails, Q is inaccessible even though other sites is
accessible.

40 Advanced Database system principle


Majority Protocol
 Local lock manager manages lock and unlock requests for data
items stored at that site
 To lock an unreplicated data item Q at site Si, send message to the
lock managers of Si.

41 Advanced Database system principle


Majority Protocol (Cont.)
 To lock replicated data
 If Q is replicated at n sites, then lock request message must be
sent to at least n/2 sites

 Do nothing on Q, until obtain a lock on a majority of the


replicas of Q

 When writing, need to write on all replicas.

42 Advanced Database system principle


Majority Protocol (Cont.)
 Benefit
 Can be used even when some sites are unavailable

 Drawback
 Requires 2(n/2 + 1) messages for lock requests, (n/2 + 1)
messages for unlock requests.

 Might cause deadlock even with single item


e.g., each of 3 transactions has locks on 1/3 of the replicas of a
data.

43 Advanced Database system principle


Biased Protocol
 In biased protocol:priority order
shared locks > exclusive locks
 Shared locks. to lock data item Q, request a lock on a replica of
Q.
 Exclusive locks. to lock Q, request a lock on all replicas of Q.
 Advantage - imposes less overhead on read operations.
 Disadvantage - additional overhead on writes

44 Advanced Database system principle


Quorum Consensus Protocol
 A generalization of majority and biased protocols
 Site: Site 1, Site 2, …, Site n
 Weight: S1, S2, …, Sn
 S = sum (S1,…,Sn)
 Choose two values : read quorum Qr , write quorum Qw
 Requires: Qr + Qw > S
Qw > S / 2
 Quorums is chose separately for each item
 To read, must lock replicas:
Slock >= Qr
 To write, must lock replicas:
Slock >= Qw
 Advantage: control weight & quorum consensus, reduce
read/write cost
Advanced Database system principle
45
Timestamping
 A timestamp based concurrency-control protocol:
 Each transaction -> a unique timestamp
 to determine its serialization order.
 Main task: generate timestamp in a distributed way
 Locally: generates a unique timestamp
using logical counter or clock
 Globally: concatenating
local timestamp with unique identifier

46 Advanced Database system principle


Timestamping (Cont.)
 A site with a slow (fast) clock,
will assign larger (smaller) timestamps
 To fix this problem
 In each site Si , define a logical clock (LCi), generating unique
local timestamp

 If a transaction Ti with timestamp < x,y> visit Si, and x > Lci,
then LCi + 1

47 Advanced Database system principle


Deadlock Handling
Consider the following two transactions list: site 1 has item X and
transaction T1 ; site 2 has item Y and transaction T2 :
T1 : write (X) T2 : write (Y)
write (Y) write (X)

X-lock on X
write (X) X-lock on Y
write (Y)
wait for X-lock on X

Wait for X-lock on Y

Result: deadlock which cannot be detected locally at either site


48 Advanced Database system principle
Centralized Approach
 A global wait-for graph
 constructed and maintained in a single site
 called deadlock-detection coordinator

 two approaches:
 Real graph: real, unknown
 Constructed graph: approximation generated

49 Advanced Database system principle


Centralized Approach
 To construct:
 Add / delete an edge
if inserted/removed in local wait-for graph
 Then, coordinator perform cycle-detection

 If the coordinator finds a cycle, it selects a victim and notifies all


sites. All sites roll back the victim transaction

50 Advanced Database system principle


Local and Global Wait-For Graphs

Local

Global

51 Advanced Database system principle


Wait-For Graph for False Cycles
 Given the right figure,
1. T2 releases resources at S1
 message: remove edge T1  T2
2. And then T2 requests a resource from T3
 message: insert edge T2  T3
 Suppose insert message is before delete message
 due to network delays
 The coordinator would then find a false cycle
Initial state:
T1  T2  T3  T1
 False cycles cannot occur in two-phase protocol
 Generally cause less serious problem
 Still need to select the victim

52 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

53 Advanced Database system principle


Availability
 High availability:
distributed db is available at all the time (99.99%)
 Robustness:
ability to function despite system failures
 To be robust, a distributed system must
 Detect failures
 Reconfigure the system to continue computation
 Recovery/reintegration after a site/link is repaired

54 Advanced Database system principle


Reconfiguration
 Reconfiguration:
 Abort all active transactions, because

 they may hold locks on other sites, other transactions might not
work

 only some replicas failed, but transactions continue to visit data in


failed site

55 Advanced Database system principle


Reconfiguration (Cont)
 Reconfiguration:
 If replica were at failed site, update system catalog to remove
them from the list of replicas.
 Reversed this when failed site recovers

 If a failed site was a central server, use election to determine the


new server
 E.g. name server, concurrency coordinator, global deadlock
detector

56 Advanced Database system principle


Reconfiguration (Cont.)
 Hard to distinguish:
network partition & site failure
 After reconfiguration for network partition, avoid:
 Two/more central servers elected from different partitions
 More than one partition update a replicated data

 Updates must continue even if some sites are down


 Solution: majority based approach
 Alternative “read one write all available”

57 Advanced Database system principle


Majority-Based Approach
 Modify existing majority protocol for distributed concurrency
control
 Each replica has a version number,
 Ver # is updated when the replica is updated
 Send lock request to at least ½ the sites
 Continue operation if a lock is obtained on a majority of the sites
 Read operations: read the value from the replica with largest version
number – the latest
 write back to replicas with lower version numbers

58 Advanced Database system principle


Majority-Based Approach (Cont.)
 Write operations
 find highest version number, set
new version number = old highest version + 1
 Write to all locked replicas, update version number

 Note: reads are guaranteed to know the latest version number

59 Advanced Database system principle


Read One Write All
 Read one write all: allows to read any one replica, but updates
require all replicas are available at commit time

 Incorrect for link failure & network partitioning:


 If failed link is up now, a disconnected site do not know it was
disconnected
 The site has old values, read from that site, incorrect

 With network partitioning, sites in each partition may update same


item concurrently
 As they believe sites in other partitions have failed

60 Advanced Database system principle


Site Reintegration
 After recovery, need to catch all updates missed
 Problem: updates may be happened to items whose replica (the
site) is recovering

 Solution 1: halt all updates, while reintegrating a site


 Cause unacceptable disruption

 Solution 2: lock all replicas at the site, update to latest version,


then release locks

61 Advanced Database system principle


Coordinator Selection
 Backup coordinators
 if coordinator fails, site with enough information, as the new
coordinator,
 executes same algorithms as the previous coor.
 allows fast recovery from coordinator failure, but generates
overhead in normal situation

 Election algorithms
 used to elect a new coordinator if failures
 Example: Bully Algorithm威逼算法

62 Advanced Database system principle


Bully Algorithm
 If site Si sends a request, no response from the coordinator, within T,
then assume it failed
Si elects itself as the new coordinator

 Si :
sends an election message,
to every site,
with a higher identification number,
then waits for any process to answer within T.

63 Advanced Database system principle


Bully Algorithm
 If no response within T,
assume that all sites (with greater id) failed, Si elects itself the
new coordinator.

 If answer is received,
in a new T’,
Si wait for the message: a site with a higher id is elected,
if fail again, restart the algorithm.

64 Advanced Database system principle


Bully Algorithm (Cont.)
 After a failed site recovers, it immediately starts this algorithm

 If no active sites with higher id, the recovered site forces all
processes to let itself become the coordinator, even if there exists
an active coordinator

 Therefore, is called Bully algorithm.

65 Advanced Database system principle


Content
 Heterogeneous and Homogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Commit Protocols
 Concurrency Control in Distributed Databases
 Availability
 Distributed Query Processing

66 Advanced Database system principle


Distributed Query Processing
 For centralized systems, mainly concern the number of disk
accesses as the cost.

 In a distributed system, need consider:


 data transmission over the network.
 Might speedup query processing (distributed)

67 Advanced Database system principle


Query Transformation
 Relations might be fragmented, needs to reconstruct.

 Consider the horizontal fragmentation of account


account1 =  branch_name = “Hillside” (account )
account2 =  branch_name = “Valleyview” (account )
 The query  branch_name = “Hillside” (account ) becomes
 branch_name = “Hillside” (account1  account2)
which is optimized into
 branch_name = “Hillside” (account1)   branch_name = “Hillside” (account2)

Final strategy is to return account1 as the result of the query.

68 Advanced Database system principle


Simple Join Processing
 Consider three relations, neither replicated nor fragmented
account depositor branch
 account is stored at site S1
 depositor at S2
 branch at S3
 For a query issued at site S1, the system needs to produce the
result at site S1

69 Advanced Database system principle


Possible Query Processing Strategies
 S1: Ship copies of all three relations to site S1,
then process at S1.
 S2: Ship a copy of account to site S2,
compute temp1 = account depositor at S2
ship temp1 to S3,
compute temp2 = temp1 branch at S3
ship the result temp2 to S1
 Must consider following factors:
 amount of data being shipped
 cost of transmitting data blocks between sites
 relative processing speed at each site

70 Advanced Database system principle


Semijoin Strategy
 r1 at S1, r2 at S2
 Before shipping r2, remove those in r2 do not participate the join
operation. Steps:

 1. Compute temp1  R1  R2 (r1) at S1.


 2. Ship temp1 from S1 to S2.
 3. Compute temp2  r2 temp1 at S2
 4. Ship temp2 from S2 to S1.
 5. Compute r1 temp2 at S1. This is the same as r1 r2.

71 Advanced Database system principle


Formal Definition
 The semijoin of r1 with r2, is denoted by:
r1 r2
 it is defined by:
 R1 (r1 r2)
 Thus, r1 r2 selects those tuples of r1 that contributed to r1 r2.
 In step 3, temp2=r2 r1.
 For joins of several relations, can extend to a series of semi-join
steps.

72 Advanced Database system principle


End of Chapter

You might also like