Distributed Data Management
Distributed Systems
Department of Computer Science
UC Irvine
Centralized DB systems
Hardware: a single processor (P) with its memory (M).
Software stack:
- Application
- SQL Front End
- Query Processor
- Transaction Proc.
- File Access
Simplifications:
- single front end
- one place to keep data, locks
- if the processor fails, the whole system fails, ...
Distributed Database Systems
Multiple processors ( + memories)
Heterogeneity and autonomy of
components
Why do we need Distributed
Databases?
Example: IBM has offices in London,
New York, and Hong Kong.
Employee data:
EMP(ENO, NAME, TITLE, SALARY, ...)
Where should the employee data
table reside?
IBM Data Access Pattern
Mostly, employee data is managed at
the office where the employee works
E.g., payroll, benefits, hire and fire
Periodically, IBM needs consolidated
access to employee data
E.g., IBM changes benefit plans and that
affects all employees.
E.g., Annual bonus depends on global net
profit.
[Figure: all EMP data is stored at London; the payroll apps at New York, London, and Hong Kong access it over the Internet.]
Problem: the NY and HK payroll apps run very slowly!
[Figure: EMP is fragmented by office: London Emp at London, NY Emp at New York, HK Emp at Hong Kong. Much better: each payroll app now accesses its data locally.]
[Figure: an Annual Bonus app at London reads all three EMP fragments over the Internet. Distribution provides opportunities for parallel execution.]
[Figure: each fragment is replicated at two sites (Lon+NY Emp at London, NY+HK Emp at New York, HK+Lon Emp at Hong Kong). Replication improves availability.]
Homogeneous Distributed
Databases
In a homogeneous distributed database:
- All sites have identical software.
- Sites are aware of each other and agree to cooperate in processing user requests.
- Each site surrenders part of its autonomy in terms of the right to change schemas or software.
- The system appears to the user as a single system.
In a heterogeneous distributed database:
- Different sites may use different schemas and software.
- Differences in schema are a major problem for query processing.
- Differences in software are a major problem for transaction processing.
- Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
DB architectures
(1) Shared memory
[Figure: multiple processors (P) share a single memory (M) and the disks over a common interconnect.]
DB architectures
(2) Shared disk
[Figure: each processor (P) has its own private memory (M); all processors share the disks.]
DB architectures
(3) Shared nothing
[Figure: each node has its own processor (P), memory (M), and disks; nodes communicate only over the network.]
DB architectures
(4) Hybrid example Hierarchical or Clustered
[Figure: shared-nothing clusters whose nodes are themselves shared-memory multiprocessors (P + M).]
Issues for selecting an architecture
Reliability
Scalability
Geographic distribution of data
Performance
Cost
Parallel or distributed DB system?
More similarities than differences!
Typically, parallel DBs:
Fast interconnect
Homogeneous software
High performance is goal
Transparency is goal
Typically, distributed DBs:
Geographically distributed
Data sharing is goal (may run into
heterogeneity, autonomy)
Disconnected operation possible
Distributed Database
Challenges
Distributed Database Design
Deciding what data goes where
Depends on data access patterns of
major applications
Two subproblems:
Fragmentation: partition tables into
fragments
Allocation: allocate fragments to nodes
Distributed Data Storage
Assume relational data model
Replication
System maintains multiple copies of data,
stored in different sites, for faster retrieval and
fault tolerance.
Fragmentation
Relation is partitioned into several fragments
stored in distinct sites
Replication and fragmentation can be combined
Relation is partitioned into several fragments:
system maintains several identical replicas of
each such fragment.
Data Replication
A relation or fragment of a relation is
replicated if it is stored redundantly in two
or more sites.
Full replication of a relation is the case
where the relation is stored at all sites.
Fully redundant databases are those in
which every site contains a copy of the
entire database.
Data Replication (Cont.)
Advantages of Replication
Availability: failure of a site containing relation r does not result
in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes
in parallel.
Reduced data transfer: relation r is available locally at each
site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be
updated.
Increased complexity of concurrency control: concurrent
updates to distinct replicas may lead to inconsistent data unless
special concurrency control mechanisms are implemented.
One solution: choose one copy as primary copy and apply
concurrency control operations on primary copy
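The primary-copy idea can be sketched in a few lines of Python. This is a minimal illustration, not a real protocol: the class and method names are invented, and failure handling and locking are omitted.

```python
# Hypothetical sketch of primary-copy replication: all updates go through
# one designated primary replica, which then propagates them to the
# secondaries; reads may be served by any local replica.

class Replica:
    def __init__(self, site):
        self.site = site
        self.data = {}          # key -> value copy held at this site

class PrimaryCopy:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries

    def write(self, key, value):
        # Concurrency control is applied only at the primary copy,
        # so concurrent writers are serialized in one place.
        self.primary.data[key] = value
        for r in self.secondaries:      # propagate to every replica
            r.data[key] = value

    def read(self, site_replica, key):
        # Reads can be answered from the local replica (reduced transfer).
        return site_replica.data.get(key)

ldn, ny, hk = Replica("London"), Replica("NY"), Replica("HK")
emp = PrimaryCopy(primary=ldn, secondaries=[ny, hk])
emp.write("E1", ("Smith", "Clerk", 40000))
print(emp.read(hk, "E1"))   # local read at Hong Kong sees the update
```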
Data Fragmentation
Division of relation r into fragments r1, r2, ..., rn which
contain sufficient information to reconstruct relation r.
Horizontal fragmentation: each tuple of r is assigned
to one or more fragments
Vertical fragmentation: the schema for relation r is
split into several smaller schemas
All schemas must contain a common candidate key
(or superkey) to ensure lossless join property.
A special attribute, the tuple-id attribute may be
added to each schema to serve as a candidate key.
Example: relation account with the following schema:
Account = (branch_name, account_number, balance)
Horizontal Fragmentation of account Relation
account1 = σ branch_name = "Hillside" (account):

  branch_name | account_number | balance
  Hillside    | A-305          | 500
  Hillside    | A-226          | 336
  Hillside    | A-155          | 62

account2 = σ branch_name = "Valleyview" (account):

  branch_name | account_number | balance
  Valleyview  | A-177          | 205
  Valleyview  | A-402          | 10000
  Valleyview  | A-408          | 1123
  Valleyview  | A-639          | 750
Vertical Fragmentation of employee_info Relation
deposit1 = Π branch_name, customer_name, tuple_id (employee_info):

  branch_name | customer_name | tuple_id
  Hillside    | Lowman        | 1
  Hillside    | Camp          | 2
  Valleyview  | Camp          | 3
  Valleyview  | Kahn          | 4
  Hillside    | Kahn          | 5
  Valleyview  | Kahn          | 6
  Valleyview  | Green         | 7

deposit2 = Π account_number, balance, tuple_id (employee_info):

  account_number | balance | tuple_id
  A-305          | 500     | 1
  A-226          | 336     | 2
  A-177          | 205     | 3
  A-402          | 10000   | 4
  A-155          | 62      | 5
  A-408          | 1123    | 6
  A-639          | 750     | 7
Advantages of Fragmentation
Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located
where they are most frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is
stored where it is most frequently accessed
tuple-id attribute allows efficient joining of vertical
fragments
allows parallel processing on a relation
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an
arbitrary depth.
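Both schemes, and the lossless-reconstruction property, can be sketched in Python. The helper functions and the miniature relation below are illustrative only.

```python
# Sketch of horizontal and vertical fragmentation on a tiny account relation.

account = [
    {"branch_name": "Hillside",   "account_number": "A-305", "balance": 500},
    {"branch_name": "Valleyview", "account_number": "A-177", "balance": 205},
]

# Horizontal fragmentation: each tuple goes to a fragment by a predicate.
def horizontal(rel, branch):
    return [t for t in rel if t["branch_name"] == branch]

account1 = horizontal(account, "Hillside")
account2 = horizontal(account, "Valleyview")
# Lossless: the union of the fragments is the original relation.
assert sorted(map(str, account1 + account2)) == sorted(map(str, account))

# Vertical fragmentation: project attribute subsets, adding a tuple_id
# attribute so the fragments can be joined back losslessly.
def vertical(rel, attrs):
    return [{**{a: t[a] for a in attrs}, "tuple_id": i}
            for i, t in enumerate(rel)]

frag1 = vertical(account, ["branch_name"])
frag2 = vertical(account, ["account_number", "balance"])

# Reconstruct by a natural join on tuple_id.
rejoined = [{**a, **b} for a in frag1 for b in frag2
            if a["tuple_id"] == b["tuple_id"]]
print(len(rejoined))   # same number of tuples as the original relation
```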
Data Transparency
Data transparency: Degree to which
system user may remain unaware of the
details of how and where the data items
are stored in a distributed system
Consider transparency issues in relation
to:
Fragmentation transparency
Replication transparency
Location transparency
Naming of Data Items Criteria
1. Every data item must have a system-wide
unique name.
2. It should be possible to find the location of
data items efficiently.
3. It should be possible to change the
location of data items transparently.
4. Each site should be able to create new
data items autonomously.
Centralized Scheme - Name
Server
Structure:
name server assigns all names
each site maintains a record of local data items
sites ask name server to locate non-local data items
Advantages:
satisfies naming criteria 1-3
Disadvantages:
does not satisfy naming criterion 4
name server is a potential performance bottleneck
name server is a single point of failure
Use of Aliases
Alternative to centralized scheme: each site
prefixes its own site identifier to any name
that it generates, e.g., site17.account.
Fulfills having a unique identifier, and
avoids problems associated with central
control.
However, fails to achieve network
transparency.
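A tiny sketch of the alias scheme, assuming a "site<id>.<name>" format modeled on the slide's site17.account (the function names are made up):

```python
# Each site prefixes its own identifier to every name it creates, so names
# are globally unique without a central name server (criteria 1 and 4).

def global_name(site_id, local_name):
    return f"site{site_id}.{local_name}"

def home_site(name):
    # The location is recoverable from the name itself, but that is also
    # why this scheme fails network transparency: the name breaks if the
    # item moves to another site.
    prefix, _, local = name.partition(".")
    return int(prefix.removeprefix("site")), local

n = global_name(17, "account")
print(n, home_site(n))   # site17.account (17, 'account')
```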
What is a Transaction?
A set of steps completed by a DBMS to
accomplish a single user task.
Must be either entirely completed or aborted
No intermediate states are acceptable
Distributed Transactions
Transaction may access data at several sites.
Each site has a local transaction manager responsible for:
Maintaining a log for recovery purposes
Participating in coordinating the concurrent execution of the
transactions executing at that site.
Each site has a transaction coordinator, which is responsible
for:
Starting the execution of transactions that originate at the site.
Distributing subtransactions at appropriate sites for execution.
Coordinating the termination of each transaction that originates
at the site, which may result in the transaction being committed
at all sites or aborted at all sites.
Transaction System Architecture
[Figure: each site runs a transaction coordinator and a local transaction manager; coordinators communicate over the network.]
System Failure Modes
Failures unique to distributed systems:
Failure of a site.
Loss of messages
Handled by network transmission control protocols such as TCP/IP
Failure of a communication link
Handled by network protocols, by routing messages via
alternative links
Network partition
A network is said to be partitioned when it has been split
into two or more subsystems that lack any connection
between them
Note: a subsystem may consist of a single node
Network partitioning and site failures are generally
indistinguishable.
Distributed commit problem
Transaction T executes actions at several sites, e.g., a1, a2 at one site,
a3 at a second, and a4, a5 at a third.
Commit must be atomic.
Example of a Distributed Transaction
How can a distributed transaction that has components at several sites
execute atomically?
Headquarters issues T0: "Store 2 should ship 500 toothbrushes to store 5."
T0 spawns subtransactions T1, ..., Tn at the stores, e.g.:
  at store 2: N := N - 500
  at store 5: N := N + 500
Suppose T5 FAILs. Each Ti is executed atomically, but T0 itself is not atomic.
Distributed commit
problem
Commit must be atomic
Solution: Two-phase commit (2PC)
Centralized 2PC
Distributed 2PC
Linear 2PC
Many other variants
TWO-PHASE COMMIT (2PC) - OK
[Figure: message exchange ending in Global Commit.]
TWO-PHASE COMMIT (2PC) - ABORT
[Figure: message exchange ending in Global Abort.]
Terminology
Resource Managers (RMs)
Usually databases
Participants
RMs that did work on behalf of
transaction
Coordinator
Component that runs two-phase
commit on behalf of transaction
Commit case:
  Coordinator -> Participants: REQUEST-TO-PREPARE
  Participants -> Coordinator: PREPARED*
  Coordinator -> Participants: COMMIT*
  Participants -> Coordinator: DONE
Abort case:
  Coordinator -> Participants: REQUEST-TO-PREPARE
  Some participant -> Coordinator: NO
  Coordinator -> Participants: ABORT
  Participants -> Coordinator: DONE
Centralized two-phase commit

Notation: incoming message / outgoing message (* = everyone)

Coordinator:
  I  --commit-request / request-prepare*-->  W
  W  --prepared* / commit*-->  C
  W  --no / abort*-->  A
  C  --done* / - -->  F
  A  --done* / - -->  F

Participant:
  I  --request-prepare / prepared-->  W
  I  --request-prepare / no-->  A
  W  --commit / done-->  C
  W  --abort / done-->  A
When participant enters W state:
it must have acquired all resources
it can only abort or commit if so
instructed by a coordinator
Coordinator only enters C state if all
participants are in W, i.e., it is
certain that all will eventually commit
After coordinator receives DONE
message, it can forget about the
transaction
E.g., cleanup control structures
Next
Add timeouts to cope with
messages lost during crashes
Coordinator with timeouts (t = timeout):
  I  --commit-request / request-prepare*-->  W
  W  --prepared* / commit*-->  C
  W  --no / abort*-->  A
  W  --t / abort*-->  A
  C  --done* / - -->  F
  C  --t / commit-->  C    (resend commit)
  A  --done* / - -->  F
  A  --t / abort-->  A     (resend abort)
  C  --any / commit-->  C;  A  --any / abort-->  A
  (stray messages are answered with the decision)
Participant with timeouts (t = timeout):
  I  --request-prepare / prepared-->  W
  I  --request-prepare / no-->  A
  W  --commit / done-->  C
  W  --abort / done-->  A
  W  --t / ping-->  W      (on timeout, ping the coordinator for the decision)
  C  --commit / done-->  C;  A  --abort / done-->  A  (re-acknowledge duplicates)
C and A are equivalent to a finish state.
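The protocol above can be sketched as a toy, single-process simulation. This is illustrative only: the class names are invented, and real 2PC also needs logging and the timeout transitions shown above, which are omitted here.

```python
# Toy sketch of centralized two-phase commit (no logging, no timeouts).

class Participant:
    def __init__(self, name, will_prepare=True):
        self.name, self.will_prepare = name, will_prepare
        self.state = "I"

    def request_to_prepare(self):
        # Phase 1: vote PREPARED (entering W, all resources acquired) or NO.
        if self.will_prepare:
            self.state = "W"
            return "PREPARED"
        self.state = "A"
        return "NO"

    def decide(self, decision):
        # Phase 2: obey the coordinator's global decision, then report DONE.
        self.state = "C" if decision == "COMMIT" else "A"
        return "DONE"

def coordinator(participants):
    votes = [p.request_to_prepare() for p in participants]
    # Enter C only if *every* participant is in W, i.e., all voted PREPARED.
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    for p in participants:
        p.decide(decision)
    return decision

print(coordinator([Participant("S1"), Participant("S2")]))   # COMMIT
print(coordinator([Participant("S1"),
                   Participant("S2", will_prepare=False)]))  # ABORT
```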
Distributed Query
Processing
For centralized systems, the primary criterion
for measuring the cost of a particular strategy
is the number of disk accesses.
In a distributed system, other issues must be
taken into account:
The cost of a data transmission over the
network.
The potential gain in performance from
having several sites process parts of the
query in parallel.
Problem Statement
Input: Query
How many times has the moon circled
around the earth in the last twenty years?
Output: Answer
240!
Objectives:
response time, throughput, first answers,
little IO, ...
Centralized vs. Distributed Query Processing
same problem
but different parameters and objectives
Query Processing
Input: Declarative Query
SQL, OQL, XQuery, ...
Step 1: Translate Query into Algebra
Tree of operators
Step 2: Optimize Query (physical and
logical)
Tree of operators
(Compilation)
Step 3: Interpretation
Query result
Algebra

SELECT A.d
FROM   A, B
WHERE  A.a = B.b
AND    A.c = 35

translates to the algebra tree:

  Π A.d ( σ A.a = B.b ∧ A.c = 35 ( A × B ) )

relational algebra for SQL is very well understood
algebra for XQuery (work in progress)
Query Optimization

The naive plan
  Π A.d ( σ A.a = B.b ∧ A.c = 35 ( A × B ) )
is rewritten into, e.g.,
  Π A.d ( (index scan on A.c = 35) hashjoin (scan of B on B.b) )

no brainers (e.g., push down cheap predicates)
enumerate alternative plans, apply cost model
use search heuristics to find the cheapest plan
Query Execution

[Figure: the optimized plan executing on sample data; the index scan on A.c
returns (John, 35, CS) and (Mary, 35, EE), the hash join probes tuples of B
such as (Edinburgh, CS, 5.0) and (Edinburgh, AS, 6.0), and Π A.d outputs
"John".]

library of operators (hash join, merge join, ...)
pipelining (iterator model)
lazy evaluation
exploit indexes and clustering in the database
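The pipelining bullets above can be sketched with Python generators: each operator lazily pulls tuples from its input, so nothing is materialized except the hash-join build side. Operator names and tuple layouts are illustrative, not from any real engine.

```python
# Iterator-model sketch: every operator is a generator over tuples.

table_a = [("John", 35, "CS"), ("Mary", 35, "EE")]
table_b = [("Edinburgh", "CS", 5.0), ("Edinburgh", "AS", 6.0)]

def scan(table):
    yield from table

def select(child, pred):
    for t in child:
        if pred(t):
            yield t

def hash_join(left, right, lkey, rkey):
    buckets = {}
    for t in left:                      # build side is consumed eagerly
        buckets.setdefault(lkey(t), []).append(t)
    for t in right:                     # probe side stays pipelined
        for m in buckets.get(rkey(t), []):
            yield m + t

def project(child, cols):
    for t in child:
        yield tuple(t[c] for c in cols)

plan = project(
    hash_join(select(scan(table_a), lambda t: t[1] == 35),
              scan(table_b), lkey=lambda t: t[2], rkey=lambda t: t[1]),
    cols=[0])
print(list(plan))   # [('John',)] -- only John's department matches B
```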
Distributed Query
Processing
Idea:
This is just an extension of centralized query
processing. (System R* et al. in the early 80s)
What is different?
extend physical algebra: send&receive
operators
resource vectors, network interconnect matrix
caching and replication
optimize for response time
less predictability in cost model (adaptive algos)
heterogeneity in data formats and data models
Distributed Query Plan

[Figure: the same plan with send and receive operators inserted between
sites: the index scan on A.c and the scan of B each feed a send operator,
matching receive operators feed the hash join, and Π A.d produces the
result.]

Cost

Total Cost = Sum of Cost of Ops
[Figure: each operator in the plan is annotated with its cost (e.g., 1, 1,
2, 6, 10, ...); the annotated plan has Cost = 40.]
Query Transformation
Translating algebraic queries on fragments:
- It must be possible to construct relation r from its fragments.
- Replace relation r by the expression that constructs r from its fragments.

Consider the horizontal fragmentation of the account relation into
  account1 = σ branch_name = "Hillside" (account)
  account2 = σ branch_name = "Valleyview" (account)

The query σ branch_name = "Hillside" (account) becomes
  σ branch_name = "Hillside" (account1 ∪ account2)
which is optimized into
  σ branch_name = "Hillside" (account1) ∪ σ branch_name = "Hillside" (account2)
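A minimal sketch of this rewrite, using Python lists for the fragments (data values follow the earlier account example; the helper name is made up). Pushing the selection into each fragment gives the same answer as selecting over the union, and the Valleyview fragment contributes nothing, so an optimizer can prune it.

```python
# Selection over a fragmented relation == union of selections on fragments.

account1 = [("Hillside", "A-305", 500), ("Hillside", "A-226", 336)]
account2 = [("Valleyview", "A-177", 205), ("Valleyview", "A-402", 10000)]

def select_branch(frag, branch):
    return [t for t in frag if t[0] == branch]

# sigma_Hillside(account1 U account2) == sigma(account1) U sigma(account2)
pushed = select_branch(account1, "Hillside") + select_branch(account2, "Hillside")
naive  = select_branch(account1 + account2, "Hillside")
assert pushed == naive
print(select_branch(account2, "Hillside"))   # [] -- fragment can be pruned
```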
Simple Join Processing
Consider the following relational algebra expression in which the three
relations are neither replicated nor fragmented:
  account ⋈ depositor ⋈ branch
- account is stored at site S1
- depositor at S2
- branch at S3
For a query issued at site SI, the system needs to produce the result at
site SI.
Possible Query Processing
Strategies
- Ship copies of all three relations to site SI and choose a strategy for
  processing the entire query locally at site SI.
- Ship a copy of the account relation to site S2 and compute
  temp1 = account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute
  temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.
- Devise similar strategies, exchanging the roles of S1, S2, S3.
Must consider the following factors:
- amount of data being shipped
- cost of transmitting a data block between sites
- relative processing speed at each site
Semijoin Strategy
Let r1 be a relation with schema R1 stored at site S1.
Let r2 be a relation with schema R2 stored at site S2.
Evaluate the expression r1 ⋈ r2 and obtain the result at S1:
1. Compute temp1 = Π R1 ∩ R2 (r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 = r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.
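The five steps can be sketched in Python; the relation contents and attribute names below are invented for illustration. Only the join-attribute values (step 2) and the matching r2 tuples (step 4) would cross the network.

```python
# Semijoin strategy sketch for r1 at S1 joining r2 at S2 on "eno".

r1 = [{"eno": 1, "name": "Smith"}, {"eno": 2, "name": "Jones"}]   # at S1
r2 = [{"eno": 2, "title": "Clerk"}, {"eno": 9, "title": "VP"}]    # at S2
join_attrs = ["eno"]                                              # R1 ∩ R2

# Step 1 (at S1): project r1 onto the shared attributes.
temp1 = [{a: t[a] for a in join_attrs} for t in r1]
# Step 2: ship temp1 to S2 (small: just the key values).
# Step 3 (at S2): temp2 = r2 semijoin temp1 = tuples of r2 that will match.
temp2 = [t for t in r2 if {a: t[a] for a in join_attrs} in temp1]
# Step 4: ship temp2 back to S1.
# Step 5 (at S1): r1 join temp2 -- same result as r1 join r2.
result = [{**a, **b} for a in r1 for b in temp2
          if all(a[k] == b[k] for k in join_attrs)]
print(result)   # [{'eno': 2, 'name': 'Jones', 'title': 'Clerk'}]
```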
Formal Definition
The semijoin of r1 with r2 is denoted by r1 ⋉ r2 and is defined by:
  r1 ⋉ r2 = Π R1 (r1 ⋈ r2)
Thus, r1 ⋉ r2 selects those tuples of r1 that contribute to r1 ⋈ r2.
In step 3 above, temp2 = r2 ⋉ r1.
For joins of several relations, the above strategy can be extended to a
series of semijoin steps.
Join Strategies that Exploit Parallelism
Consider r1 ⋈ r2 ⋈ r3 ⋈ r4, where relation ri is stored at site Si. The
result must be presented at site S1.
- r1 is shipped to S2 and r1 ⋈ r2 is computed at S2; simultaneously r3 is
  shipped to S4 and r3 ⋈ r4 is computed at S4.
- S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced; S4 ships tuples
  of (r3 ⋈ r4) to S1.
- Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, (r1 ⋈ r2) ⋈ (r3 ⋈ r4)
  is computed in parallel with the computation of (r1 ⋈ r2) at S2 and the
  computation of (r3 ⋈ r4) at S4.
Conclusion: Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth
Conclusion: Disadvantages of DDBMSs
Architectural complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more
complex
References
Chapter 22 of Database System Concepts (Silberschatz, Korth, and Sudarshan)
ICS courses on DBMS: CS222, CS223