Topic 7 - Distributed Database Systems
Topic 7 - Distributed Database Systems
SYSTEMS
Distributed Databases 1
Outline
Introduction
Distributed DBMS Architecture
Distributed Database Design Issues
Date’s Twelve Rules for DDBMS1
Parallel Database Systems
Distributed Databases 2
Motivation
Database Computer
Technology Networks
Integration Distribution
Distributed
Database Systems
Integration
Integration // Centralization
Distributed Databases 3
Distributed Computing
Synonymous terms
distributed function
distributed data processing
multiprocessors/multicomputers
satellite processing
backend processing
dedicated/special purpose computers
timeshared systems
functionally modular systems
Distributed Databases 4
What is Distributed….
Processing logic
Functions
Data
Control
Distributed Databases 5
What is Distributed Database
System?
A distributed database system(DDB) exists where
logically related data is physically distributed between
a number of separate processors linked by a
communications network.
A distributed database management system (DDBMS)
is the software that manages the DDB and provides an
access mechanism that makes this distribution
transparent to the users.
Distributed database system (DDBS) = DDB +
DDBMS
Distributed Databases 6
What is not a DDBS?
A timesharing computer system
A loosely or tightly coupled multiprocessor
system
A database system which resides at one of the
nodes of a network of computers - this is a
centralized database on a network node
Distributed Databases 7
Centralized DBMS on a
Network
Site 1 Site 2
Database
Communications
Network
Site 3 Site 4
Distributed Databases 8
Distributed DBMS Environment
Site 1 Site 2
Database2
Database 1
Communications
Network
Site 3 Site 4
Database4
Distributed Databases 9
Implicit Assumptions
Data stored at a number of sites => each site logically
consists of a single processor.
Processors at different sites are interconnected by a
computer network => no multiprocessors -->
parallel database systems
Distributed database is a database, not a collection of
files => data logically related as exhibited in the users’
access patterns --> relational data model
DDBMS is a full-fledged DBMS
--> not remote file system, not a TP system
Distributed Databases 10
Parallel Database Architectures
Distributed Databases 11
Shared Memory Architecture
Processors and disks have access to a
common memory, typically via a bus or
through an interconnection network.
Distributed Databases 12
Shared Disk Architecture
All processors can directly access all disks
via an interconnection network, but the
processors have private memories.
Architecture provides a degree of fault-
tolerance — if a processor fails, the other
processors can take over its tasks since
the database is resident on disks that are
accessible from all processors.
Distributed Databases 13
Shared Nothing Architecture
Node consists of a processor, memory,
and one or more disks. Processors at one
node communicate with another processor
at another node using an interconnection
network. A node functions as the server for
the data on the disk or disks the node
owns.
Distributed Databases 14
Applications of DDB
Manufacturing - especially multi-plant
manufacturing
Military command and control
Banking
Corporate MIS
Airlines
Hotel chains
Any organization which has a decentralized
organization structure
Distributed Databases 15
Advantages of DDBS
Organizational Structure - fits into
organizations distributed over several locations
Shareability and Local Autonomy - Local users
can control their own data while being
accessible ‘globally’
Improved availability – if there is failure at one
site, others are accessible
Distributed Databases 16
Advantages of DDBS
Improved reliability - replication of data
Improved performance - local data is located
where demand for it is likely to be greatest
Transparent management of distributed,
fragmented, and replicated data
Economical - centralized processing power in a
single piece of hardware is not necessarily
cheaper than separate smaller units
Modular growth – simpler to expand
Distributed Databases 17
Disadvantages
Complexity – by hiding their distributed nature
and trying to ensure optimum performance,
reliability and availability, DDBS are more
complex
10
Cost – procurement and maintenance cost
Security – more difficult
Integrity Control more difficult
Lack of Standards
Lack of Experience
Database Design more Complex
Distributed Databases 18
Types of DDBMS
Homogeneous DDBMS
Heterogeneous DDBMS
Distributed Databases 19
Homogeneous DDBMS
Distributed Databases 20
Heterogeneous DDBMS
Distributed Databases 23
Key Design Issues
Division and Location of Data
Why fragment at all?
How to fragment?
How much to fragment?
Division and Location of Control
Performance
Transparency to the User
Degree of homogeneity
Distributed Databases 24
Fragmentation
Why fragment?
Usage:
- Apps work with views rather than entire relations.
Efficiency:
- Data stored close to where most frequently used.
- Data not needed by local applications is not stored.
Security:
- and so not available to unauthorized users.
Parallelism:
- With fragments as unit of distribution, T can be divided
into several subqueries that operate on
fragments.
Distributed Databases 25
Horizontal Fragmentation
Projects with Budget less than/greater than or equal to 400,000
Distributed Databases 26
Vertical Fragmentation
Info about project budgets/Info about project names and
locations
Distributed Databases 27
Allocation Alternatives
Non-replicated
partitioned : each fragment resides at only one site
Replicated
fully replicated : each fragment at each site
partially replicated : each fragment at some of the
sites
Distributed Databases 28
Replication Alternatives
Distributed Databases 29
Parallelism Requirements
Have as much of the data required by each
application at the site where the application
executes
Full replication
How about updates?
Updates to replicated data requires implementation
of distributed concurrency control and commit
protocols
Distributed Databases 30
System Expansion
Issue is database scaling
Emergence of microprocessor and workstation
technologies
Client-server model of computing
Data communication cost vs telecommunication
cost
Distributed Databases 31
Distributed DBMS Issues
Distributed Database Design
how to distribute the database
replicated & non-replicated database
distribution
related problem in directory management
Query Processing
convert user transactions to data
manipulation instructions
optimization problem
Distributed Databases 32
Distributed DBMS Issues
Concurrency Control
synchronization of concurrent accesses
consistency and isolation of transactions' effects
deadlock management
Reliability
how to make the system resilient to failures
atomicity and durability
Distributed Databases 33
Transparency in a DDBMS
Transparency hides implementation details from users.
Distributed Databases 35
2. Transaction Transparency
Transaction transparency: Ensures all distributed tx
maintain distributed database’s integrity and
consistency.
Distributed Databases 36
2. Transaction Transparency
Concurrency transparency: All Txs must execute independently and
be logically consistent with results obtained if Txs executed in some
arbitrary serial order.
• Replication makes concurrency more complex
Failure transparency: must ensure atomicity and durability of global
Tx.
• Means ensuring that subTxs of global Tx either all commit or all
abort.
• Classification transparency: In IBM’s Distributed Relational
Database Architecture (DRDA), four types of Txs:
– Remote request
– Remote unit of work
– Distributed unit of work
– Distributed request. Distributed Databases 37
3. Performance Transparency
DDBMS: - no performance degradation due to distributed architecture.
- determine most cost-effective strategy to execute a request.
0 Fundamental Principle
To the user, a distributed system should look exactly like a
non-distributed system
1 Local Autonomy
The sites in a distributed system should be autonomous.
In this context, autonomy means that:
Local data is locally owned and managed;
Local operations remain purely local;
All operations at a given site are controlled by that
site
Distributed Databases 39
Date’s Twelve Rules for DDBMS 2
Distributed Databases 40
Date’s Twelve Rules for DDBMS 3
Distributed Databases 41
Date’s Twelve Rules for DDBMS 4
6 Replication independence
The user should be unaware that data has been
replicated.
Thus, the user should not be able to access a
particular copy of a data item directly, nor should
the user have to specifically update all copies of a
data item
7 Distributed query processing
The system should be capable of processing
queries that reference data at more than one site
Distributed Databases 42
Date’s Twelve Rules for DDBMS 5
Distributed Databases 43
Date’s Twelve Rules for DDBMS 6
11Network Independence
Again, it should be possible to run the
DDBMS on a variety of disparate
communication networks
12 Database Independence
It should be possible to run different local
DBMSs, perhaps supporting different
underlying data models.
In other words, the system should support
heterogeneity
Distributed Databases 44