0% found this document useful (0 votes)
13 views44 pages

Topic 7 - Distributed Database Systems

The document discusses distributed database systems, including their architecture, design issues, advantages, and types. A distributed database system exists where logically related data is physically distributed between processors linked by a network, with a distributed database management system making this transparent to users. Key issues in distributed database design include data and control distribution and ensuring performance, transparency, and standards.

Uploaded by

miragelimited91
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
Download as ppt, pdf, or txt
0% found this document useful (0 votes)
13 views44 pages

Topic 7 - Distributed Database Systems

The document discusses distributed database systems, including their architecture, design issues, advantages, and types. A distributed database system exists where logically related data is physically distributed between processors linked by a network, with a distributed database management system making this transparent to users. Key issues in distributed database design include data and control distribution and ensuring performance, transparency, and standards.

Uploaded by

miragelimited91
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1/ 44

DISTRIBUTED DATABASE

SYSTEMS

Distributed Databases 1
Outline
Introduction
Distributed DBMS Architecture
Distributed Database Design Issues
Date’s Twelve Rules for DDBMS1
Parallel Database Systems

Distributed Databases 2
Motivation

Database Computer
Technology Networks
Integration Distribution

Distributed
Database Systems
Integration
Integration // Centralization

Distributed Databases 3
Distributed Computing
Synonymous terms
distributed function
distributed data processing
multiprocessors/multicomputers
satellite processing
backend processing
dedicated/special purpose computers
timeshared systems
functionally modular systems
Distributed Databases 4
What is Distributed….
Processing logic
Functions
Data
Control

Distributed Databases 5
What is Distributed Database
System?
A distributed database system(DDB) exists where
logically related data is physically distributed between
a number of separate processors linked by a
communications network.
A distributed database management system (DDBMS)
is the software that manages the DDB and provides an
access mechanism that makes this distribution
transparent to the users.
Distributed database system (DDBS) = DDB +
DDBMS

Distributed Databases 6
What is not a DDBS?
A timesharing computer system
A loosely or tightly coupled multiprocessor
system
A database system which resides at one of the
nodes of a network of computers - this is a
centralized database on a network node

Distributed Databases 7
Centralized DBMS on a
Network

Site 1 Site 2

Database

Communications
Network

Site 3 Site 4

Distributed Databases 8
Distributed DBMS Environment

Site 1 Site 2

Database2

Database 1

Communications
Network

Site 3 Site 4
Database4

Distributed Databases 9
Implicit Assumptions
Data stored at a number of sites => each site logically
consists of a single processor.
Processors at different sites are interconnected by a
computer network => no multiprocessors -->
parallel database systems
Distributed database is a database, not a collection of
files => data logically related as exhibited in the users’
access patterns --> relational data model
DDBMS is a full-fledged DBMS
--> not remote file system, not a TP system

Distributed Databases 10
Parallel Database Architectures

Distributed Databases 11
Shared Memory Architecture
Processors and disks have access to a
common memory, typically via a bus or
through an interconnection network.

Distributed Databases 12
Shared Disk Architecture
All processors can directly access all disks
via an interconnection network, but the
processors have private memories.
Architecture provides a degree of fault-
tolerance — if a processor fails, the other
processors can take over its tasks since
the database is resident on disks that are
accessible from all processors.

Distributed Databases 13
Shared Nothing Architecture
Node consists of a processor, memory,
and one or more disks. Processors at one
node communicate with another processor
at another node using an interconnection
network. A node functions as the server for
the data on the disk or disks the node
owns.

Distributed Databases 14
Applications of DDB
Manufacturing - especially multi-plant
manufacturing
Military command and control
Banking
Corporate MIS
Airlines
Hotel chains
Any organization which has a decentralized
organization structure

Distributed Databases 15
Advantages of DDBS
Organizational Structure - fits into
organizations distributed over several locations
Shareability and Local Autonomy - Local users
can control their own data while being
accessible ‘globally’
Improved availability – if there is failure at one
site, others are accessible

Distributed Databases 16
Advantages of DDBS
Improved reliability - replication of data
Improved performance - local data is located
where demand for it is likely to be greatest
Transparent management of distributed,
fragmented, and replicated data
Economical - centralized processing power in a
single piece of hardware is not necessarily
cheaper than separate smaller units
Modular growth – simpler to expand

Distributed Databases 17
Disadvantages
Complexity – by hiding their distributed nature
and trying to ensure optimum performance,
reliability and availability, DDBS are more
complex
10
Cost – procurement and maintenance cost
Security – more difficult
Integrity Control more difficult
Lack of Standards
Lack of Experience
Database Design more Complex
Distributed Databases 18
Types of DDBMS

Homogeneous DDBMS
Heterogeneous DDBMS

Distributed Databases 19
Homogeneous DDBMS

All sites use same DBMS product.


Much easier to design and manage.
Approach provides incremental growth
and allows increased performance.

Distributed Databases 20
Heterogeneous DDBMS

Sites may run different DBMS products, with


possibly different underlying data models.
Occurs when sites have implemented their
own databases and integration is considered
later
Translations required to allow for:
Different hardware
Different DBMS products
Different hardware and different DBMS products
Typical solution is to use gateways
Distributed Databases 21
Reference Architecture for
DDBMS
Due to diversity, no accepted architecture
equivalent to ANSI/SPARC 3-level
architecture.
A reference architecture consists of:
Set of global external schemas.
Global conceptual schema (GCS).
Fragmentation schema and allocation schema.
Set of schemas for each local DBMS conforming
to 3-level ANSI/SPARC.
Some levels may be missing, depending on
levels of transparency supported.
Distributed Databases 22
Reference Architecture for
DDBMS

Distributed Databases 23
Key Design Issues
Division and Location of Data
Why fragment at all?
How to fragment?
How much to fragment?
Division and Location of Control
Performance
Transparency to the User
Degree of homogeneity

Distributed Databases 24
Fragmentation
Why fragment?
Usage:
- Apps work with views rather than entire relations.
Efficiency:
- Data stored close to where most frequently used.
- Data not needed by local applications is not stored.
Security:
- and so not available to unauthorized users.
Parallelism:
- With fragments as unit of distribution, T can be divided
into several subqueries that operate on
fragments.
Distributed Databases 25
Horizontal Fragmentation
Projects with Budget less than/greater than or equal to 400,000

Pno pname budget loc

H90 CAD/CAM 200000 Nairobi

S67 Database Dev 600000 H/Q


T67 Maintenance 450000 Kisumu
T90 Networks 300000 Mombasa
S45 School System 100000 Nairobi

Distributed Databases 26
Vertical Fragmentation
Info about project budgets/Info about project names and
locations

Pno pname budget loc

H90 CAD/CAM 200000 Nairobi

S67 Database Dev 600000 H/Q


T67 Maintenance 450000 Kisumu
T90 Networks 300000 Mombasa
S45 School System 100000 Nairobi

Distributed Databases 27
Allocation Alternatives
Non-replicated
partitioned : each fragment resides at only one site
Replicated
fully replicated : each fragment at each site
partially replicated : each fragment at some of the
sites

Distributed Databases 28
Replication Alternatives

Full Partial Partioned


Replication Replication
Query Easy Same Same
Processing Difficulty Difficulty
Directory Easy or Same Same
Management Non-existent Difficulty Difficulty
Concurrency Moderate Difficulty Easy
Control
Reliability Very High High Low

Realty Possible Realistic Possible


Application Application

Distributed Databases 29
Parallelism Requirements
Have as much of the data required by each
application at the site where the application
executes
Full replication
How about updates?
Updates to replicated data requires implementation
of distributed concurrency control and commit
protocols

Distributed Databases 30
System Expansion
Issue is database scaling
Emergence of microprocessor and workstation
technologies
Client-server model of computing
Data communication cost vs telecommunication
cost

Distributed Databases 31
Distributed DBMS Issues
Distributed Database Design
how to distribute the database
replicated & non-replicated database
distribution
related problem in directory management
Query Processing
convert user transactions to data
manipulation instructions
optimization problem

Distributed Databases 32
Distributed DBMS Issues
Concurrency Control
synchronization of concurrent accesses
consistency and isolation of transactions' effects
deadlock management
Reliability
how to make the system resilient to failures
atomicity and durability

Distributed Databases 33
Transparency in a DDBMS
Transparency hides implementation details from users.

Overall objective: equivalence to user of DDBMs to


centralised DBMS. FULL transparency not universally accepted
objective

Four main types:


1. Distribution transparency
2. Transaction transparency
3. Performance transparency
4. DBMS transparency (only applicable to heterogeneous)
Distributed Databases 34
1. Distribution Transparency
Distribution transparency: allows user to perceive database as
single, logical entity.

If DDBMS exhibits distribution transparency, user does not need to know:


• fragmentation transparency: data is fragmented
• Location transparency: location of data items
• otherwise call this local mapping transparency
• replication transparency: user unaware of replication of
fragments

Distributed Databases 35
2. Transaction Transparency
Transaction transparency: Ensures all distributed tx
maintain distributed database’s integrity and
consistency.

• Distributed Tx accesses data stored at more than one


location.
• Each Tx is divided into no. of subTs, one for each site
that has to be accessed.
• DDBMS must ensure the indivisibility of both the global
Tx and each of the subTxs.

Distributed Databases 36
2. Transaction Transparency
Concurrency transparency: All Txs must execute independently and
be logically consistent with results obtained if Txs executed in some
arbitrary serial order.
• Replication makes concurrency more complex
Failure transparency: must ensure atomicity and durability of global
Tx.
• Means ensuring that subTxs of global Tx either all commit or all
abort.
• Classification transparency: In IBM’s Distributed Relational
Database Architecture (DRDA), four types of Txs:
– Remote request
– Remote unit of work
– Distributed unit of work
– Distributed request. Distributed Databases 37
3. Performance Transparency
DDBMS: - no performance degradation due to distributed architecture.
- determine most cost-effective strategy to execute a request.

Distributed Query Processor (DQP) maps data request into ordered


sequence of operations on local databases.
- Must consider fragmentation, replication, and allocation schemas.
DQP has to decide:
1. which fragment to access
2. which copy of a fragment to use
3. which location to use.
- produces execution strategy optimized with respect to some cost
function.
Typically, costs associated with a distributed request include: I/O cost;
CPU cost, communication cost.Distributed Databases 38
Date’s Twelve Rules for DDBMS 1

0 Fundamental Principle
To the user, a distributed system should look exactly like a
non-distributed system
1 Local Autonomy
The sites in a distributed system should be autonomous.
In this context, autonomy means that:
 Local data is locally owned and managed;
 Local operations remain purely local;
 All operations at a given site are controlled by that
site

Distributed Databases 39
Date’s Twelve Rules for DDBMS 2

2 No reliance on a Central Site


There should be no one site without which the system cannot
operate.
This implies that there should be no central servers for services
such as transaction management, deadlock detection, query
optimization, and management of the Global System Catalog
3 Continuous operation
Ideally, there should never be a need for a planned system
shutdown;
for operations such as:
adding or removing a site from the system;
the dynamic creation and deletion of fragments at one or more
sites

Distributed Databases 40
Date’s Twelve Rules for DDBMS 3

4 Location Independence (Transparency)


The user should be able to access the database from
any site. Furthermore, the user should be able to
access all data as if it were stored at the user’s site,
no matter where it is physically stored
5 Fragmentation Independence
The user should be able to access the data, no
matter how it is fragmented.

Distributed Databases 41
Date’s Twelve Rules for DDBMS 4

6 Replication independence
The user should be unaware that data has been
replicated.
Thus, the user should not be able to access a
particular copy of a data item directly, nor should
the user have to specifically update all copies of a
data item
7 Distributed query processing
The system should be capable of processing
queries that reference data at more than one site

Distributed Databases 42
Date’s Twelve Rules for DDBMS 5

8 Distributed Transaction Processing


The system should support the transaction as the unit of
recovery.
The system should ensure that both the global and local
transactions conform to the ACID rules for transactions,
namely: atomicity, consistency, isolation, and durability.
9 Hardware independence
It should be possible to run the DDBMS on a variety of
hardware platforms.
10 Operating system independence
As a corollary to the previous rule, it should be possible to
run the DDBMS on a variety of operating systems

Distributed Databases 43
Date’s Twelve Rules for DDBMS 6

11Network Independence
Again, it should be possible to run the
DDBMS on a variety of disparate
communication networks
12 Database Independence
It should be possible to run different local
DBMSs, perhaps supporting different
underlying data models.
In other words, the system should support
heterogeneity
Distributed Databases 44

You might also like