Distributed Data Management
Distributed Systems
Department of Computer Science
UC Irvine
Centralized DB systems
Hardware: a single processor (P) with its memory (M).
Software stack:
- Application
- SQL Front End
- Query Processor
- Transaction Proc.
- File Access
Simplifications:
- single front end
- one place to keep data, locks
- if the processor fails, the whole system fails, ...
Distributed Database Systems
Multiple processors ( + memories)
Heterogeneity and autonomy of
components
Why do we need Distributed
Databases?
Example: IBM has offices in London,
New York, and Hong Kong.
Employee data:
EMP(ENO, NAME, TITLE, SALARY, ...)
Where should the employee data
table reside?
IBM Data Access Pattern
Mostly, employee data is managed at
the office where the employee works
E.g., payroll, benefits, hire and fire
Periodically, IBM needs consolidated
access to employee data
E.g., IBM changes benefit plans and that
affects all employees.
E.g., Annual bonus depends on global net
profit.
[Figure: all EMP data is stored at London; the payroll apps at New York, London, and Hong Kong access it over the Internet.]
Problem: the NY and HK payroll apps run very slowly!
[Figure: EMP is fragmented by office: London Emp at London, NY Emp at New York, HK Emp at Hong Kong. Much better: each payroll app now accesses its data locally.]
[Figure: an Annual Bonus app at London reads all three EMP fragments over the Internet. Distribution provides opportunities for parallel execution.]
[Figure: each fragment is replicated at two sites (Lon+NY Emp at London, NY+HK Emp at New York, HK+Lon Emp at Hong Kong). Replication improves availability.]
Homogeneous Distributed
Databases
In a homogeneous distributed database:
- All sites have identical software.
- Sites are aware of each other and agree to cooperate in processing user requests.
- Each site surrenders part of its autonomy in terms of the right to change schemas or software.
- The system appears to the user as a single system.
In a heterogeneous distributed database:
- Different sites may use different schemas and software.
- Differences in schema are a major problem for query processing.
- Differences in software are a major problem for transaction processing.
- Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
DB architectures
(1) Shared memory
[Figure: multiple processors (P) share a single memory (M) and the disks over a common interconnect.]
DB architectures
(2) Shared disk
[Figure: each processor (P) has its own private memory (M); all processors share the disks.]
DB architectures
(3) Shared nothing
[Figure: each node has its own processor (P), memory (M), and disks; nodes communicate only over the network.]
DB architectures
(4) Hybrid example Hierarchical or Clustered
[Figure: shared-nothing clusters whose nodes are themselves shared-memory multiprocessors (P + M).]
Issues for selecting an architecture
Reliability
Scalability
Geographic distribution of data
Performance
Cost
Parallel or distributed DB system?
More similarities than differences!
Typically, parallel DBs:
Fast interconnect
Homogeneous software
High performance is goal
Transparency is goal
Typically, distributed DBs:
Geographically distributed
Data sharing is goal (may run into
heterogeneity, autonomy)
Disconnected operation possible
Distributed Database
Challenges
Distributed Database Design
Deciding what data goes where
Depends on data access patterns of
major applications
Two subproblems:
Fragmentation: partition tables into
fragments
Allocation: allocate fragments to nodes
Distributed Data Storage
Assume relational data model
Replication
System maintains multiple copies of data,
stored in different sites, for faster retrieval and
fault tolerance.
Fragmentation
Relation is partitioned into several fragments
stored in distinct sites
Replication and fragmentation can be combined
Relation is partitioned into several fragments:
system maintains several identical replicas of
each such fragment.
Data Replication
A relation or fragment of a relation is
replicated if it is stored redundantly in two
or more sites.
Full replication of a relation is the case
where the relation is stored at all sites.
Fully redundant databases are those in
which every site contains a copy of the
entire database.
Data Replication (Cont.)
Advantages of Replication
Availability: failure of a site containing relation r does not result
in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes
in parallel.
Reduced data transfer: relation r is available locally at each
site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be
updated.
Increased complexity of concurrency control: concurrent
updates to distinct replicas may lead to inconsistent data unless
special concurrency control mechanisms are implemented.
One solution: choose one copy as primary copy and apply
concurrency control operations on primary copy
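The primary-copy idea can be sketched in a few lines of Python. This is a minimal illustration, not a real protocol: the class and method names are invented, and failure handling and locking are omitted.

```python
# Hypothetical sketch of primary-copy replication: all updates go through
# one designated primary replica, which then propagates them to the
# secondaries; reads may be served by any local replica.

class Replica:
    def __init__(self, site):
        self.site = site
        self.data = {}          # key -> value copy held at this site

class PrimaryCopy:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries

    def write(self, key, value):
        # Concurrency control is applied only at the primary copy,
        # so concurrent writers are serialized in one place.
        self.primary.data[key] = value
        for r in self.secondaries:      # propagate to every replica
            r.data[key] = value

    def read(self, site_replica, key):
        # Reads can be answered from the local replica (reduced transfer).
        return site_replica.data.get(key)

ldn, ny, hk = Replica("London"), Replica("NY"), Replica("HK")
emp = PrimaryCopy(primary=ldn, secondaries=[ny, hk])
emp.write("E1", ("Smith", "Clerk", 40000))
print(emp.read(hk, "E1"))   # local read at Hong Kong sees the update
```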
Data Fragmentation
Division of relation r into fragments r1, r2, ..., rn which
contain sufficient information to reconstruct relation r.
Horizontal fragmentation: each tuple of r is assigned
to one or more fragments
Vertical fragmentation: the schema for relation r is
split into several smaller schemas
All schemas must contain a common candidate key
(or superkey) to ensure lossless join property.
A special attribute, the tuple-id attribute may be
added to each schema to serve as a candidate key.
Example: relation account with the following schema:
Account = (branch_name, account_number, balance)
Horizontal Fragmentation of account Relation
account1 = σ branch_name = "Hillside" (account):

  branch_name | account_number | balance
  Hillside    | A-305          | 500
  Hillside    | A-226          | 336
  Hillside    | A-155          | 62

account2 = σ branch_name = "Valleyview" (account):

  branch_name | account_number | balance
  Valleyview  | A-177          | 205
  Valleyview  | A-402          | 10000
  Valleyview  | A-408          | 1123
  Valleyview  | A-639          | 750
Vertical Fragmentation of employee_info Relation
deposit1 = Π branch_name, customer_name, tuple_id (employee_info):

  branch_name | customer_name | tuple_id
  Hillside    | Lowman        | 1
  Hillside    | Camp          | 2
  Valleyview  | Camp          | 3
  Valleyview  | Kahn          | 4
  Hillside    | Kahn          | 5
  Valleyview  | Kahn          | 6
  Valleyview  | Green         | 7

deposit2 = Π account_number, balance, tuple_id (employee_info):

  account_number | balance | tuple_id
  A-305          | 500     | 1
  A-226          | 336     | 2
  A-177          | 205     | 3
  A-402          | 10000   | 4
  A-155          | 62      | 5
  A-408          | 1123    | 6
  A-639          | 750     | 7
Advantages of Fragmentation
Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located
where they are most frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is
stored where it is most frequently accessed
tuple-id attribute allows efficient joining of vertical
fragments
allows parallel processing on a relation
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an
arbitrary depth.
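Both schemes, and the lossless-reconstruction property, can be sketched in Python. The helper functions and the miniature relation below are illustrative only.

```python
# Sketch of horizontal and vertical fragmentation on a tiny account relation.

account = [
    {"branch_name": "Hillside",   "account_number": "A-305", "balance": 500},
    {"branch_name": "Valleyview", "account_number": "A-177", "balance": 205},
]

# Horizontal fragmentation: each tuple goes to a fragment by a predicate.
def horizontal(rel, branch):
    return [t for t in rel if t["branch_name"] == branch]

account1 = horizontal(account, "Hillside")
account2 = horizontal(account, "Valleyview")
# Lossless: the union of the fragments is the original relation.
assert sorted(map(str, account1 + account2)) == sorted(map(str, account))

# Vertical fragmentation: project attribute subsets, adding a tuple_id
# attribute so the fragments can be joined back losslessly.
def vertical(rel, attrs):
    return [{**{a: t[a] for a in attrs}, "tuple_id": i}
            for i, t in enumerate(rel)]

frag1 = vertical(account, ["branch_name"])
frag2 = vertical(account, ["account_number", "balance"])

# Reconstruct by a natural join on tuple_id.
rejoined = [{**a, **b} for a in frag1 for b in frag2
            if a["tuple_id"] == b["tuple_id"]]
print(len(rejoined))   # same number of tuples as the original relation
```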
Data Transparency
Data transparency: Degree to which
system user may remain unaware of the
details of how and where the data items
are stored in a distributed system
Consider transparency issues in relation
to:
Fragmentation transparency
Replication transparency
Location transparency
Naming of Data Items Criteria
1. Every data item must have a system-wide
unique name.
2. It should be possible to find the location of
data items efficiently.
3. It should be possible to change the
location of data items transparently.
4. Each site should be able to create new
data items autonomously.
Centralized Scheme - Name
Server
Structure:
name server assigns all names
each site maintains a record of local data items
sites ask name server to locate non-local data items
Advantages:
satisfies naming criteria 1-3
Disadvantages:
does not satisfy naming criterion 4
name server is a potential performance bottleneck
name server is a single point of failure
Use of Aliases
Alternative to centralized scheme: each site
prefixes its own site identifier to any name
that it generates, e.g., site17.account.
Fulfills having a unique identifier, and
avoids problems associated with central
control.
However, fails to achieve network
transparency.
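A tiny sketch of the alias scheme, assuming a "site<id>.<name>" format modeled on the slide's site17.account (the function names are made up):

```python
# Each site prefixes its own identifier to every name it creates, so names
# are globally unique without a central name server (criteria 1 and 4).

def global_name(site_id, local_name):
    return f"site{site_id}.{local_name}"

def home_site(name):
    # The location is recoverable from the name itself, but that is also
    # why this scheme fails network transparency: the name breaks if the
    # item moves to another site.
    prefix, _, local = name.partition(".")
    return int(prefix.removeprefix("site")), local

n = global_name(17, "account")
print(n, home_site(n))   # site17.account (17, 'account')
```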
What is a Transaction?
A set of steps completed by a DBMS to
accomplish a single user task.
Must be either entirely completed or aborted
No intermediate states are acceptable
Distributed Transactions
Transaction may access data at several sites.
Each site has a local transaction manager responsible for:
Maintaining a log for recovery purposes
Participating in coordinating the concurrent execution of the
transactions executing at that site.
Each site has a transaction coordinator, which is responsible
for:
Starting the execution of transactions that originate at the site.
Distributing subtransactions at appropriate sites for execution.
Coordinating the termination of each transaction that originates
at the site, which may result in the transaction being committed
at all sites or aborted at all sites.
Transaction System Architecture
[Figure: each site runs a transaction coordinator and a local transaction manager; coordinators communicate over the network.]
System Failure Modes
Failures unique to distributed systems:
Failure of a site.
Loss of messages
Handled by network transmission control protocols such as TCP/IP
Failure of a communication link
Handled by network protocols, by routing messages via
alternative links
Network partition
A network is said to be partitioned when it has been split
into two or more subsystems that lack any connection
between them
Note: a subsystem may consist of a single node
Network partitioning and site failures are generally
indistinguishable.
Distributed commit problem
Transaction T executes actions at several sites, e.g., a1, a2 at one site,
a3 at a second, and a4, a5 at a third.
Commit must be atomic.
Example of a Distributed Transaction
How can a distributed transaction that has components at several sites
execute atomically?
Headquarters issues T0: "Store 2 should ship 500 toothbrushes to store 5."
T0 spawns subtransactions T1, ..., Tn at the stores, e.g.:
  at store 2: N := N - 500
  at store 5: N := N + 500
Suppose T5 FAILs. Each Ti is executed atomically, but T0 itself is not atomic.
Distributed commit
problem
Commit must be atomic
Solution: Two-phase commit (2PC)
Centralized 2PC
Distributed 2PC
Linear 2PC
Many other variants
TWO-PHASE COMMIT (2PC) - OK
[Figure: message exchange ending in Global Commit.]
TWO-PHASE COMMIT (2PC) - ABORT
[Figure: message exchange ending in Global Abort.]
Terminology
Resource Managers (RMs)
Usually databases
Participants
RMs that did work on behalf of
transaction
Coordinator
Component that runs two-phase
commit on behalf of transaction
Commit case:
  Coordinator -> Participants: REQUEST-TO-PREPARE
  Participants -> Coordinator: PREPARED*
  Coordinator -> Participants: COMMIT*
  Participants -> Coordinator: DONE
Abort case:
  Coordinator -> Participants: REQUEST-TO-PREPARE
  Some participant -> Coordinator: NO
  Coordinator -> Participants: ABORT
  Participants -> Coordinator: DONE
Centralized two-phase commit

Notation: incoming message / outgoing message (* = everyone)

Coordinator:
  I  --commit-request / request-prepare*-->  W
  W  --prepared* / commit*-->  C
  W  --no / abort*-->  A
  C  --done* / - -->  F
  A  --done* / - -->  F

Participant:
  I  --request-prepare / prepared-->  W
  I  --request-prepare / no-->  A
  W  --commit / done-->  C
  W  --abort / done-->  A
When participant enters W state:
it must have acquired all resources
it can only abort or commit if so
instructed by a coordinator
Coordinator only enters C state if all
participants are in W, i.e., it is
certain that all will eventually commit
After coordinator receives DONE
message, it can forget about the
transaction
E.g., cleanup control structures
Next
Add timeouts to cope with
messages lost during crashes
Coordinator with timeouts (t = timeout):
  I  --commit-request / request-prepare*-->  W
  W  --prepared* / commit*-->  C
  W  --no / abort*-->  A
  W  --t / abort*-->  A
  C  --done* / - -->  F
  C  --t / commit-->  C    (resend commit)
  A  --done* / - -->  F
  A  --t / abort-->  A     (resend abort)
  C  --any / commit-->  C;  A  --any / abort-->  A
  (stray messages are answered with the decision)
Participant with timeouts (t = timeout):
  I  --request-prepare / prepared-->  W
  I  --request-prepare / no-->  A
  W  --commit / done-->  C
  W  --abort / done-->  A
  W  --t / ping-->  W      (on timeout, ping the coordinator for the decision)
  C  --commit / done-->  C;  A  --abort / done-->  A  (re-acknowledge duplicates)
C and A are equivalent to a finish state.
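The protocol above can be sketched as a toy, single-process simulation. This is illustrative only: the class names are invented, and real 2PC also needs logging and the timeout transitions shown above, which are omitted here.

```python
# Toy sketch of centralized two-phase commit (no logging, no timeouts).

class Participant:
    def __init__(self, name, will_prepare=True):
        self.name, self.will_prepare = name, will_prepare
        self.state = "I"

    def request_to_prepare(self):
        # Phase 1: vote PREPARED (entering W, all resources acquired) or NO.
        if self.will_prepare:
            self.state = "W"
            return "PREPARED"
        self.state = "A"
        return "NO"

    def decide(self, decision):
        # Phase 2: obey the coordinator's global decision, then report DONE.
        self.state = "C" if decision == "COMMIT" else "A"
        return "DONE"

def coordinator(participants):
    votes = [p.request_to_prepare() for p in participants]
    # Enter C only if *every* participant is in W, i.e., all voted PREPARED.
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    for p in participants:
        p.decide(decision)
    return decision

print(coordinator([Participant("S1"), Participant("S2")]))   # COMMIT
print(coordinator([Participant("S1"),
                   Participant("S2", will_prepare=False)]))  # ABORT
```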
Distributed Query
Processing
For centralized systems, the primary criterion
for measuring the cost of a particular strategy
is the number of disk accesses.
In a distributed system, other issues must be
taken into account:
The cost of a data transmission over the
network.
The potential gain in performance from
having several sites process parts of the
query in parallel.
Problem Statement
Input: Query
How many times has the moon circled
around the earth in the last twenty years?
Output: Answer
240!
Objectives:
response time, throughput, first answers,
little IO, ...
Centralized vs. Distributed Query Processing
same problem
but different parameters and objectives
Query Processing
Input: Declarative Query
SQL, OQL, XQuery, ...
Step 1: Translate Query into Algebra
Tree of operators
Step 2: Optimize Query (physical and
logical)
Tree of operators
(Compilation)
Step 3: Interpretation
Query result
Algebra

SELECT A.d
FROM   A, B
WHERE  A.a = B.b
AND    A.c = 35

translates to the algebra tree:

  Π A.d ( σ A.a = B.b ∧ A.c = 35 ( A × B ) )

relational algebra for SQL is very well understood
algebra for XQuery (work in progress)
Query Optimization

The naive plan
  Π A.d ( σ A.a = B.b ∧ A.c = 35 ( A × B ) )
is rewritten into, e.g.,
  Π A.d ( (index scan on A.c = 35) hashjoin (scan of B on B.b) )

no brainers (e.g., push down cheap predicates)
enumerate alternative plans, apply cost model
use search heuristics to find the cheapest plan
Query Execution

[Figure: the optimized plan executing on sample data; the index scan on A.c
returns (John, 35, CS) and (Mary, 35, EE), the hash join probes tuples of B
such as (Edinburgh, CS, 5.0) and (Edinburgh, AS, 6.0), and Π A.d outputs
"John".]

library of operators (hash join, merge join, ...)
pipelining (iterator model)
lazy evaluation
exploit indexes and clustering in the database
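The pipelining bullets above can be sketched with Python generators: each operator lazily pulls tuples from its input, so nothing is materialized except the hash-join build side. Operator names and tuple layouts are illustrative, not from any real engine.

```python
# Iterator-model sketch: every operator is a generator over tuples.

table_a = [("John", 35, "CS"), ("Mary", 35, "EE")]
table_b = [("Edinburgh", "CS", 5.0), ("Edinburgh", "AS", 6.0)]

def scan(table):
    yield from table

def select(child, pred):
    for t in child:
        if pred(t):
            yield t

def hash_join(left, right, lkey, rkey):
    buckets = {}
    for t in left:                      # build side is consumed eagerly
        buckets.setdefault(lkey(t), []).append(t)
    for t in right:                     # probe side stays pipelined
        for m in buckets.get(rkey(t), []):
            yield m + t

def project(child, cols):
    for t in child:
        yield tuple(t[c] for c in cols)

plan = project(
    hash_join(select(scan(table_a), lambda t: t[1] == 35),
              scan(table_b), lkey=lambda t: t[2], rkey=lambda t: t[1]),
    cols=[0])
print(list(plan))   # [('John',)] -- only John's department matches B
```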
Distributed Query
Processing
Idea:
This is just an extension of centralized query
processing. (System R* et al. in the early 80s)
What is different?
extend physical algebra: send&receive
operators
resource vectors, network interconnect matrix
caching and replication
optimize for response time
less predictability in cost model (adaptive algos)
heterogeneity in data formats and data models
Distributed Query Plan

[Figure: the same plan with send and receive operators inserted between
sites: the index scan on A.c and the scan of B each feed a send operator,
matching receive operators feed the hash join, and Π A.d produces the
result.]

Cost

Total Cost = Sum of Cost of Ops
[Figure: each operator in the plan is annotated with its cost (e.g., 1, 1,
2, 6, 10, ...); the annotated plan has Cost = 40.]
Query Transformation
Translating algebraic queries on fragments:
- It must be possible to construct relation r from its fragments.
- Replace relation r by the expression that constructs r from its fragments.

Consider the horizontal fragmentation of the account relation into
  account1 = σ branch_name = "Hillside" (account)
  account2 = σ branch_name = "Valleyview" (account)

The query σ branch_name = "Hillside" (account) becomes
  σ branch_name = "Hillside" (account1 ∪ account2)
which is optimized into
  σ branch_name = "Hillside" (account1) ∪ σ branch_name = "Hillside" (account2)
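A minimal sketch of this rewrite, using Python lists for the fragments (data values follow the earlier account example; the helper name is made up). Pushing the selection into each fragment gives the same answer as selecting over the union, and the Valleyview fragment contributes nothing, so an optimizer can prune it.

```python
# Selection over a fragmented relation == union of selections on fragments.

account1 = [("Hillside", "A-305", 500), ("Hillside", "A-226", 336)]
account2 = [("Valleyview", "A-177", 205), ("Valleyview", "A-402", 10000)]

def select_branch(frag, branch):
    return [t for t in frag if t[0] == branch]

# sigma_Hillside(account1 U account2) == sigma(account1) U sigma(account2)
pushed = select_branch(account1, "Hillside") + select_branch(account2, "Hillside")
naive  = select_branch(account1 + account2, "Hillside")
assert pushed == naive
print(select_branch(account2, "Hillside"))   # [] -- fragment can be pruned
```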
Simple Join Processing
Consider the following relational algebra expression in which the three
relations are neither replicated nor fragmented:
  account ⋈ depositor ⋈ branch
- account is stored at site S1
- depositor at S2
- branch at S3
For a query issued at site SI, the system needs to produce the result at
site SI.
Possible Query Processing
Strategies
- Ship copies of all three relations to site SI and choose a strategy for
  processing the entire query locally at site SI.
- Ship a copy of the account relation to site S2 and compute
  temp1 = account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute
  temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.
- Devise similar strategies, exchanging the roles of S1, S2, S3.
Must consider the following factors:
- amount of data being shipped
- cost of transmitting a data block between sites
- relative processing speed at each site
Semijoin Strategy
Let r1 be a relation with schema R1 stored at site S1.
Let r2 be a relation with schema R2 stored at site S2.
Evaluate the expression r1 ⋈ r2 and obtain the result at S1:
1. Compute temp1 = Π R1 ∩ R2 (r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 = r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.
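The five steps can be sketched in Python; the relation contents and attribute names below are invented for illustration. Only the join-attribute values (step 2) and the matching r2 tuples (step 4) would cross the network.

```python
# Semijoin strategy sketch for r1 at S1 joining r2 at S2 on "eno".

r1 = [{"eno": 1, "name": "Smith"}, {"eno": 2, "name": "Jones"}]   # at S1
r2 = [{"eno": 2, "title": "Clerk"}, {"eno": 9, "title": "VP"}]    # at S2
join_attrs = ["eno"]                                              # R1 ∩ R2

# Step 1 (at S1): project r1 onto the shared attributes.
temp1 = [{a: t[a] for a in join_attrs} for t in r1]
# Step 2: ship temp1 to S2 (small: just the key values).
# Step 3 (at S2): temp2 = r2 semijoin temp1 = tuples of r2 that will match.
temp2 = [t for t in r2 if {a: t[a] for a in join_attrs} in temp1]
# Step 4: ship temp2 back to S1.
# Step 5 (at S1): r1 join temp2 -- same result as r1 join r2.
result = [{**a, **b} for a in r1 for b in temp2
          if all(a[k] == b[k] for k in join_attrs)]
print(result)   # [{'eno': 2, 'name': 'Jones', 'title': 'Clerk'}]
```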
Formal Definition
The semijoin of r1 with r2 is denoted by r1 ⋉ r2 and is defined by:
  r1 ⋉ r2 = Π R1 (r1 ⋈ r2)
Thus, r1 ⋉ r2 selects those tuples of r1 that contribute to r1 ⋈ r2.
In step 3 above, temp2 = r2 ⋉ r1.
For joins of several relations, the above strategy can be extended to a
series of semijoin steps.
Join Strategies that Exploit Parallelism
Consider r1 ⋈ r2 ⋈ r3 ⋈ r4, where relation ri is stored at site Si. The
result must be presented at site S1.
- r1 is shipped to S2 and r1 ⋈ r2 is computed at S2; simultaneously r3 is
  shipped to S4 and r3 ⋈ r4 is computed at S4.
- S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced; S4 ships tuples
  of (r3 ⋈ r4) to S1.
- Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, (r1 ⋈ r2) ⋈ (r3 ⋈ r4)
  is computed in parallel with the computation of (r1 ⋈ r2) at S2 and the
  computation of (r3 ⋈ r4) at S4.
Conclusion: Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth
Conclusion: Disadvantages of DDBMSs
Architectural complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more
complex
References
Chapter 22 of Database System Concepts (Silberschatz, Korth, and Sudarshan)
ICS courses on DBMS: CS222, CS223