Distributed Database Systems
(DDBS)
Khurshid Asghar
Assistant Professor
Department of Computer Science
University of Okara
Contents of the Day!
Distributed Data Processing
What is being distributed?
What is a DDBS?
Workable definition + Concepts of DDBS
Distributed Database Management System (DDBMS)
Centralized and Distributed DBS on a Network
Advantages / Disadvantages of DDBS
Transparency in DDBS
Distribution transparency
• Fragmentation transparency
• Location transparency
• Replication transparency
• Naming transparency
• Network transparency
Transaction transparency
Performance transparency
Distributed Data Processing (DDP)
Distributed Data Processing can be defined as:
“A system consisting of a number of autonomous
processing elements (not necessarily homogeneous)
that are connected through a computer network and
that cooperate in performing their assigned task”.
Three things are important here:
1. Multiple systems are involved.
2. These multiple systems are linked together through
some network
3. These multiple systems perform common tasks in
which they cooperate with each other.
Distributed Data Processing (cont.)
A distributed system is the opposite
of a centralized system:
Computers are installed at different sites
Each of them performs independent
data processing
Each computer is specialized to
perform a range of activities/tasks
(marketing, promotion, ...)
Why is DDP Increasing?
Dramatically reduced hardware costs
Increased desktop power
Improved user interfaces
Ability to share data across multiple servers
Figure 1: Distributed Data Processing
What is Being Distributed?
Processing logic
We can divide our goal/task into different
functions and distribute them among
various systems
Data
Control
All of these can be distributed to make our
system run efficiently.
Distributed Data Processing
Synonymous terms
distributed function
distributed computing
multiprocessors/multi-computers
satellite processing
backend processing
dedicated/special purpose computers
timeshared systems
functionally modular systems
What is a Distributed Database System?
A distributed database (DDB) is a
collection of multiple, logically
interrelated databases distributed over a
computer network.
A collection of logically interrelated
databases that are spread physically
across multiple locations connected by a
data communication link.
Workable Definition of DDBS
A distributed database system consists of a collection of
sites connected together via some kind of communications
network, in which :
each site is a database system site in its own right;
the sites agree to work together, so that a user at any
site can access data anywhere in the network exactly as
if the data were all stored at the user's own site
It is a logical union of real databases
It can be seen as a kind of partnership among individual
local DBMS's
Different from remote access or distributed processing
systems
Temporary assumption: strict homogeneity
Concepts of DDBS
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global
application.
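The concepts above (fragments, replicas, allocation to sites) can be pictured as a small global catalog. The following is only a minimal sketch: the class names `Fragment` and `Catalog`, the fragment names `Staff_H1` and `Staff_H2`, and the site names are hypothetical and do not come from any particular DDBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    name: str        # hypothetical fragment name, e.g. "Staff_H1"
    relation: str    # global relation the fragment belongs to
    sites: set = field(default_factory=set)  # sites holding a replica


class Catalog:
    """Toy global catalog: which fragments exist and where their replicas live."""

    def __init__(self):
        self.fragments = {}

    def add_fragment(self, name, relation):
        self.fragments[name] = Fragment(name, relation)

    def allocate(self, name, site):
        # Allocating the same fragment to a second site creates a replica.
        self.fragments[name].sites.add(site)

    def fragments_of(self, relation):
        return sorted(f.name for f in self.fragments.values()
                      if f.relation == relation)

    def sites_of(self, name):
        return sorted(self.fragments[name].sites)


catalog = Catalog()
catalog.add_fragment("Staff_H1", "Staff")
catalog.add_fragment("Staff_H2", "Staff")
catalog.allocate("Staff_H1", "site1")
catalog.allocate("Staff_H1", "site3")  # second copy: Staff_H1 is replicated
catalog.allocate("Staff_H2", "site2")
```

Here `catalog.fragments_of("Staff")` lists both fragments of the relation, and `catalog.sites_of("Staff_H1")` lists the two sites that hold a copy of the first fragment.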
Distributed Database Management System:
A software system that permits the
management of the distributed database and
makes the distribution transparent to users.
Distributed database system (DDBS) = DDB + DDBMS
Centralized DBS on a Network
Figure 2: Centralized DBS (Sites 1 to 5 connected through a communication network)
Distributed DBS Environment
Figure 3: Distributed DBS (Sites 1 to 5 connected through a communication network)
DDBS Environment:
Three types of accesses are involved:
Local access
a user connected to a site accesses data
stored at that same site.
Remote access
a user connected to one site, say site 1,
accesses data stored at site 2.
Global access
wherever the access is made from, the data is
collected from all locations before being displayed.
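The three access types can be told apart by comparing the user's site with the sites that hold the requested data. This is a rough sketch only: the function name `classify_access` is hypothetical, and it assumes that any request touching more than one site counts as global access.

```python
def classify_access(user_site, data_sites):
    """Return 'local', 'remote', or 'global' for a request.

    user_site  -- site the user is connected to
    data_sites -- sites that hold the data the request needs
    """
    data_sites = set(data_sites)
    if not data_sites:
        raise ValueError("a request must touch at least one site")
    if data_sites == {user_site}:
        return "local"     # all data is at the user's own site
    if len(data_sites) == 1:
        return "remote"    # all data is at one other site
    return "global"        # data must be collected from several sites


local_case = classify_access("site1", ["site1"])
remote_case = classify_access("site1", ["site2"])
global_case = classify_access("site1", ["site1", "site2", "site3"])
```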
Advantages of DDBS
1. Increased reliability and availability
2. Local control
3. Modular growth (resilient)
4. Lower communication costs (More Economical)
5. Faster response
6. Reflects the organizational structure
7. Secured management of distributed data
8. Robust
9. Sharing data
10. Compliance with ACID properties
11. Improved performance and Parallelism in executing
transactions can be achieved.
Disadvantages of DDBS
1. Complex Software
2. Increased Processing overhead
3. Different data formats might be used at different sites; this may cost time.
4. Complexity
5. Cost (Increased training cost etc)
6. Security
7. Data Integrity control more difficult
8. Lack of standards
9. Lack of experience
10. Database design more complex
11. Deadlock is difficult to handle compared to a centralized
system
12. Increased storage requirements
Transparency in DDBMS
The user of a distributed database system should not be
required to know either where the data are physically located
or how the data can be accessed at the specific local site.
This characteristic of DDBMS is called DATA TRANSPARENCY
Figure 4: User View
Figure 5: System View
Transparency in DDBMS
Transparencies hide implementation details from the
user.
Each end user can work as if they were the only user of the database.
Transparency features
Distribution
Transaction
Failure
Performance
Heterogeneity
Transparencies in a DDBMS:
1. Distribution transparency
1. Fragmentation transparency
2. Location transparency
3. Replication transparency
4. Naming transparency
5. Network transparency
2. Data Independence
3. Transaction transparency
4. Performance transparency
1. Distribution transparency
Allows the user to see the database as a single, logical
entity.
If a DDBMS exhibits distribution transparency, then the user
does not need to know that the data is fragmented (fragmentation
transparency) or where the data items are located (location
transparency).
Distribution transparency can be classified into:
Fragmentation transparency
Location transparency
Replication transparency
Naming transparency
Network transparency
1.1 Fragmentation transparency
A file or a table is broken down into smaller parts/sections
called fragments and those fragments are stored at different
locations.
A table can be fragmented horizontally (row-wise) or
vertically (column-wise). Hence we have two major types of
fragmentations:
Horizontal Fragmentation: Selection
Vertical Fragmentation: Projection
Different fragments of a table are placed at different
locations.
Fragmentation transparency means that a user need not know
that the database is fragmented. The concept of
fragmentation should be kept hidden from the user.
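Horizontal fragmentation corresponds to selection and vertical fragmentation to projection, and each can be undone (by a union and by a join on the key, respectively). The sketch below illustrates this on plain Python dictionaries. The sample rows and the columns `staffNo` and `branchNo` are assumptions made for the illustration; only `fName`, `lName`, and `position` appear in the slides.

```python
# Sample global relation; the data is made up for the illustration.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray", "position": "Assistant", "branchNo": "B2"},
    {"staffNo": "S3", "fName": "Cy", "lName": "Dunn", "position": "Assistant", "branchNo": "B1"},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Projection: keep the key plus the chosen columns of every row."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


# Horizontal (row-wise) fragments, one per branch group.
h1 = horizontal_fragment(staff, lambda r: r["branchNo"] == "B1")
h2 = horizontal_fragment(staff, lambda r: r["branchNo"] != "B1")

# Vertical (column-wise) fragments; both carry the key so they can be rejoined.
v1 = vertical_fragment(staff, "staffNo", ["fName", "lName"])
v2 = vertical_fragment(staff, "staffNo", ["position", "branchNo"])

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
from_horizontal = sorted(h1 + h2, key=lambda r: r["staffNo"])
v2_by_key = {r["staffNo"]: r for r in v2}
from_vertical = [{**r, **v2_by_key[r["staffNo"]]} for r in v1]
```

Both `from_horizontal` and `from_vertical` contain the same rows as the original `staff` list, which is what lets a DDBMS hide the fragmentation from the user.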
1.1 Fragmentation transparency (cont.)
Example:
SELECT fName, lName
FROM Staff
WHERE position = 'Manager';
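The query in the example names only `Staff`, never a fragment. One way to picture how that can work is a view that unions the fragments. The sketch below uses Python's built-in `sqlite3` module with a single in-memory database standing in for two sites, so it imitates the idea rather than a real distributed setup. The table names `Staff_site1` and `Staff_site2` and the sample rows are assumptions.

```python
import sqlite3

# One in-memory database plays the role of both sites (an assumption of
# this sketch; a real DDBMS would keep the fragments on separate machines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff_site1 (fName TEXT, lName TEXT, position TEXT);
    CREATE TABLE Staff_site2 (fName TEXT, lName TEXT, position TEXT);

    INSERT INTO Staff_site1 VALUES ('Ann', 'Lee', 'Manager');
    INSERT INTO Staff_site1 VALUES ('Bob', 'Ray', 'Assistant');
    INSERT INTO Staff_site2 VALUES ('Cy', 'Dunn', 'Manager');

    -- The user-visible relation: the union of the horizontal fragments.
    CREATE VIEW Staff AS
        SELECT fName, lName, position FROM Staff_site1
        UNION ALL
        SELECT fName, lName, position FROM Staff_site2;
""")

# The user's query mentions only Staff, never a fragment.
managers = conn.execute(
    "SELECT fName, lName FROM Staff WHERE position = 'Manager' ORDER BY fName"
).fetchall()
conn.close()
```

The list `managers` contains the managers from both fragments, although the query text is the same as it would be for an unfragmented `Staff` table.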
1.2 Location Transparency
With location transparency, the user must know how the data
has been fragmented but still does not have to know the
location of the data.
A location transparent name contains no information about
the named object’s physical location.
1.3 Replication transparency
Performance, availability, and reliability are the
reasons for replication.
In replication, the same data is stored at multiple sites.
e.g. in the case of a bank, every branch holds the
data of every other branch.
Replication transparency concerns the copies, not their
actual location, i.e. copies are distributed across the
network in a transparent manner.
The user should not be aware of the copies.
The user does not need to know or understand
the technical details.
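As a rough illustration of replication transparency, the sketch below keeps one logical value as copies at several sites. The caller only sees `read` and `write`. The class name `ReplicatedItem`, the write-all/read-any policy, and the branch names are assumptions chosen for simplicity; real systems use more careful protocols, and this sketch ignores failures during a write.

```python
class ReplicatedItem:
    """One logical data item stored as identical copies at several sites."""

    def __init__(self, sites, value=None):
        self.copies = {site: value for site in sites}
        self.down = set()  # sites currently unreachable

    def write(self, value):
        # Write-all: every copy receives the update
        # (failures during a write are not modelled here).
        for site in self.copies:
            self.copies[site] = value

    def read(self):
        # Read-any: return the first copy held by a reachable site.
        for site, value in self.copies.items():
            if site not in self.down:
                return value
        raise RuntimeError("no replica reachable")


# Hypothetical bank example: the same balance is held at two branches.
balance = ReplicatedItem(["branch_a", "branch_b"], value=100)
balance.write(500)
balance.down.add("branch_a")        # one branch becomes unreachable
still_readable = balance.read()     # served from the other copy
```

The caller never names a copy or a site, and the read still succeeds after one site fails, which is the availability benefit the slide refers to.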
1.4 Naming Transparency
Each item in a DDB must have a unique name.
The DDBMS must ensure that no two sites create a database
object with the same name.
One solution is to create a central name server. However, this
results in:
loss of some local autonomy;
a possible bottleneck at the central site;
low availability: if the central site fails, the remaining sites
cannot create any new objects.
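The central name server and its availability problem can be sketched in a few lines. The names `CentralNameServer` and `NameServerDown` and the `up` flag are hypothetical and serve only to simulate the failure case described above.

```python
class NameServerDown(Exception):
    """Raised when the central site cannot be reached."""


class CentralNameServer:
    """Toy central registry that keeps object names unique across sites."""

    def __init__(self):
        self.names = {}   # object name -> site that created it
        self.up = True    # set to False to simulate a failed central site

    def register(self, name, site):
        if not self.up:
            raise NameServerDown("central site unavailable; cannot create " + name)
        if name in self.names:
            raise ValueError(f"{name!r} already created by {self.names[name]}")
        self.names[name] = site


ns = CentralNameServer()
ns.register("Staff", "site1")          # first creation succeeds

try:
    ns.register("Staff", "site2")      # duplicate name is rejected
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

ns.up = False                          # the central site fails
try:
    ns.register("Branch", "site2")     # no site can create new objects now
    creation_blocked = False
except NameServerDown:
    creation_blocked = True
```

Every `register` call goes through the one server, which shows both the bottleneck and the loss of local autonomy: a site cannot create an object on its own.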
1.5 Network transparency:
This is another form of transparency. The user is unaware of
even the existence of the network, which frees them from the
problems and complexities of the network.
2. Data Independence Transparency
A major advantage of the database approach is data
independence: programs and data do not depend on
each other.
Logical data independence:
If we change the conceptual schema there is little or no effect on
the External level.
Physical data independence:
If we change the physical or lower level then there is little or no
effect on the conceptual level.
3. Transaction Transparency
Ensures that all distributed transactions maintain the
distributed database's integrity and consistency.
A distributed transaction accesses data stored at more
than one location.
Each transaction is divided into a number of
sub-transactions, one for each site that has to be
accessed.
The DDBMS must ensure the indivisibility of both the
global transaction and each sub-transaction.
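The indivisibility requirement can be illustrated with a simple coordinator that commits the sub-transactions at every site or at none. This is a simplified sketch: it resembles a prepare-then-commit scheme, which the slides do not name, and it ignores crashes between the two steps. The class `Site`, the function `run_global_transaction`, and the balance example are assumptions.

```python
class Site:
    """A site holding one balance; sub-transactions add a delta to it."""

    def __init__(self, name, balance, fail_on_prepare=False):
        self.name = name
        self.balance = balance          # committed state
        self.pending = None             # staged change, not yet visible
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, delta):
        """Stage the sub-transaction; return True if this site can commit it."""
        if self.fail_on_prepare or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        self.balance += self.pending
        self.pending = None

    def rollback(self):
        self.pending = None


def run_global_transaction(subtransactions):
    """subtransactions: list of (site, delta). All commit or none do."""
    prepared = []
    for site, delta in subtransactions:
        if site.prepare(delta):
            prepared.append(site)
        else:
            for p in prepared:          # undo everything staged so far
                p.rollback()
            return False
    for site in prepared:
        site.commit()
    return True


a = Site("site1", 100)
b = Site("site2", 50)
c = Site("site3", 0, fail_on_prepare=True)

first_ok = run_global_transaction([(a, -30), (b, 30)])    # both sites commit
second_ok = run_global_transaction([(a, -30), (c, 30)])   # site3 refuses
```

After the second call, the balance at `site1` is unchanged, because the failure at `site3` caused the whole global transaction to be abandoned.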
4. Performance Transparency
The DDBMS must perform as if it were a centralized DBMS.
The DDBMS should not suffer any performance degradation
due to the distributed architecture.
The DDBMS should determine the most cost-effective strategy to
execute a request.
The Distributed Query Processor (DQP) maps a data request into an
ordered sequence of operations on local databases.
It must consider the fragmentation, replication, and allocation
schemas.
DQP has to decide:
which fragment to access;
which copy of a fragment to use;
which location to use.
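The DQP's decisions can be sketched as a lookup over the allocation schema with a cost per site. This is a toy model under stated assumptions: the function name `choose_copies`, the fragment and site names, and the cost numbers are hypothetical, and a real query processor would weigh far more than a single cost figure.

```python
def choose_copies(needed_fragments, allocation, cost_from_user_site):
    """For each needed fragment, pick the cheapest site holding a copy.

    needed_fragments    -- fragments the request has to access
    allocation          -- fragment -> list of sites holding a replica
    cost_from_user_site -- site -> assumed access cost from the user's site
    """
    plan = {}
    for fragment in needed_fragments:
        plan[fragment] = min(allocation[fragment],
                             key=lambda site: cost_from_user_site[site])
    return plan


# Hypothetical allocation schema: Staff_H1 is replicated, Staff_H2 is not.
allocation = {
    "Staff_H1": ["site1", "site3"],
    "Staff_H2": ["site2"],
}
# Assumed costs as seen from a user connected to site3.
cost_from_site3 = {"site1": 4, "site2": 5, "site3": 0}

plan = choose_copies(["Staff_H1", "Staff_H2"], allocation, cost_from_site3)
```

For `Staff_H1` the plan uses the local copy at `site3`. For `Staff_H2` there is only one copy, so `site2` is chosen despite its higher cost.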
Responsible for Transparency
User Language
Operating System
DBMS
Thanks!
Happy Learning!