Distributed Databases

Submitted by

Indu Saini

(Research Scholar)

IIT Roorkee

Enrollment No. : 10926003

Distributed Databases |1

Contents

1. Abstract
2. Introduction to Distributed Databases
3. Types of Distributed Databases
4. Advantages of Distributed Databases
5. Disadvantages or Challenges of Distributed Databases
6. Distributed Database Design Techniques
7. Conclusion
8. References

1. ABSTRACT

This paper presents an introduction to the concept of distributed databases, their
advantages over centralized database systems, the different types of distributed
databases, and the challenges that network managers face in managing
distributed databases. The basic techniques for distributing data over a
communication network are also described.

2. INTRODUCTION TO DISTRIBUTED DATABASES

In today’s world of universal dependence on information systems, all sorts of
people need access to companies’ databases. In addition to a company’s own
employees, these include the company’s customers, potential customers,
suppliers, and vendors of all types. It is possible for a company to have all of its
databases concentrated at one mainframe computer site, with worldwide access
to this site provided by telecommunications networks, including the Internet.
Such a centralized system and its databases can be managed in a
well-contained manner, which is an advantage, but centralization poses
some problems as well. For example, if the single site goes down, then everyone
is blocked from accessing the databases until the site comes back up. Also,
the communication costs from the many far-flung PCs and terminals to the
central site can be high. One solution to such problems, and an alternative
design to the centralized database concept, is known as a distributed database.

The idea is that instead of having one centralized database, the data is
spread out among the sites on the distributed network, each of which
has its own computer and data storage facilities. All of this distributed data is
still considered to be a single logical database. When a person or process
anywhere on the distributed network queries the database, it is not necessary to
know where on the network the data being sought is located. The user just
issues the query, and the result is returned. This feature is known as location
transparency. Providing it can become complex very quickly, so it must be
managed by sophisticated software known as a distributed database
management system or distributed DBMS.
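
Location transparency can be illustrated with a minimal sketch. It assumes a
toy in-memory system in which a catalog records the site that holds each table;
the class name, the site names, and the table contents are hypothetical and
stand in for what a real distributed DBMS does with far more machinery.

```python
class DistributedDatabase:
    """Toy model: a catalog maps each table to the site that stores it."""

    def __init__(self):
        self._sites = {}    # site name -> {table name -> list of rows}
        self._catalog = {}  # table name -> site name currently holding it

    def place(self, site, table, rows):
        """Store (or relocate) a table at a site and record that in the catalog."""
        self._sites.setdefault(site, {})[table] = list(rows)
        self._catalog[table] = site

    def query(self, table, predicate=lambda row: True):
        """Answer a query by table name only; the caller never names a site."""
        site = self._catalog[table]
        return [row for row in self._sites[site][table] if predicate(row)]


db = DistributedDatabase()
customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]
db.place("site_a", "customers", customers)
before = db.query("customers", lambda row: row["id"] == 2)

# Relocating the table changes the catalog, not the query the user writes.
db.place("site_b", "customers", customers)
after = db.query("customers", lambda row: row["id"] == 2)
```

The same call to `query` returns the same rows before and after the table is
moved, which is the property the user sees as location transparency.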

A distributed database is a collection of data that satisfies the following
assumptions: it resides on more than one machine with computational power;
the machines are connected by a communication network; and it is managed by a
distributed database management system that lets users feel they are working
with the entire database and lets them declare what they want
rather than how to obtain it. Practical experience suggests that, to be
feasible, a distributed system should be relational. A
typical distributed database system (DDBS) consists of processing elements
(nodes), communication links (edges), memory units, databases, and programs.
These resources are interconnected via a communication network that dictates
how information flows between nodes. Programs residing on some nodes can
run using databases at other nodes.

Fig: Distributed Databases

A distributed database management system has to support local applications at
each computational station as well as global applications that span several
machines. To develop applications, it has to provide a high-level query language
with means for building distributed queries. Its transparency levels must give
the impression of a single database.

3. TYPES OF DISTRIBUTED DATABASES

Homogeneous and Heterogeneous Distributed Database Systems

A homogeneous distributed database system is a network of two or more
databases that reside on one or more machines and that all use, locally, the same
DBMS product. An application can simultaneously access or modify the data in
several databases in a single distributed environment. For a client application,
the location and platform of the databases are transparent.

In a heterogeneous system, sites may run different DBMS products, which need
not be based on the same underlying data model, and so the system may be
composed of relational, network, hierarchical, and object-oriented DBMSs.

Homogeneous systems are much easier to design and manage. This approach
provides incremental growth, making the addition of a new site to the DDBMS
easy, and allows increased performance by exploiting the parallel processing
capability of multiple sites.

Heterogeneous systems usually result where individual sites have implemented
their own databases and integration is considered at a later stage. In a
heterogeneous system, translations are required to allow communication
between different DBMSs. The typical solution used by some relational systems
that are part of a heterogeneous DDBMS is to use gateways, which convert the
language and model of each different DBMS into the language and model of the
relational system. However, the gateway approach has a serious limitation:
it may not support transaction management, because a gateway is in fact only a
query translator from one language to another.
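
The gateway idea can be sketched as follows. This is only an illustration under
stated assumptions: the hierarchical store is modeled as a nested dictionary,
and the class name, method names, and sample records are hypothetical. It is
not the interface of any real gateway product.

```python
class HierarchicalGateway:
    """Presents a parent/child (hierarchical) store as relational rows."""

    def __init__(self, tree):
        # Assumed layout: {department: {employee_id: {attribute: value}}}
        self._tree = tree

    def rows(self):
        """Flatten each department/employee path into one relational tuple."""
        for dept, employees in self._tree.items():
            for emp_id, record in employees.items():
                yield {"dept": dept, "emp_id": emp_id, **record}

    def select(self, **equals):
        """Answer a simple equality selection, as a relational site would ask it."""
        return [
            row for row in self.rows()
            if all(row.get(col) == value for col, value in equals.items())
        ]


legacy_store = {
    "sales": {101: {"name": "Asha"}, 102: {"name": "Ben"}},
    "research": {201: {"name": "Chen"}},
}
gateway = HierarchicalGateway(legacy_store)
sales_rows = gateway.select(dept="sales")
```

The sketch translates queries and nothing more: it has no notion of beginning,
committing, or rolling back a transaction, which mirrors the limitation noted
above.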

4. ADVANTAGES OF DISTRIBUTED DATABASES

The distribution of data has potential advantages over traditional centralized
database systems:

• Reflects organizational structure: Allowing the structure of the
database to mirror the structure of the enterprise is probably the major
benefit of distributed systems. Many organizations are naturally
distributed, at least logically (into several divisions, departments,
work-groups, etc.) and very likely physically (into plants, factories,
laboratories, etc.). Thus, the data is usually distributed already as well,
because each organizational unit within the enterprise will naturally
maintain data that is relevant to its own operation.
• Improves performance and goes beyond capacity limits: Large
centralized databases can often exceed the capacity of the server
platform, resulting in hardware constraints on the database’s total size
and/or poor query performance. If the database is fragmented into
functional subsets that are spread across multiple hardware platforms but
logically make up a single database, then the demand on data
storage and data processing for each individual database platform is lower.
Because the data is located at the site with the greatest processing demand,
database access may be faster than that achievable from a
remote centralized database. The database systems themselves are
parallelized, allowing load on the databases to be balanced among
servers.
• Management of distributed data with different levels of
transparency: Transparency can be provided at various levels, and each level
requires a particular type of agreement between the participants. In the
fully transparent case, the sites must agree on the data model, the schema
interpretation, the data representation, the available functionality, and
where the data is located. In the service (non-transparent) model, there is
agreement only on the data exchange format and on the functions that
each site provides.
• Improves data availability and reliability: A single centralized database
often serves the needs of many different applications, and if that
database becomes unavailable, all of those applications become
inoperative. A database failure can be caused by the DBMS, the hardware,
the operating system, the network, or the software applications, and it
can inflict heavy financial damage on a company. Distributed database
systems are designed to continue to function despite such failures. The
effects of a damaged database can be eliminated, or at least reduced, by
replicating the data of the centralized database in each
application-specific database. Critical data may be
replicated at different sites, making it available with higher probability.
This protects against unscheduled interruption of service
by removing a potential single point of failure. Also, if a node fails, the
system may be able to reroute the failed node’s requests to another node.
Multiple processors also open the door to improved performance. For
instance, a query can be executed in parallel at several sites.
• Combining heterogeneous data sources: Many large organizations
would like to preserve their IT investments. Over the years, this has led
many of these companies to install and use two or more different
database management systems, which are often incompatible. A
distributed database gives them a way to combine these sources.
• Provides system flexibility and scalability: Distributed database
systems are much more flexible than centralized database systems, so it is
much easier to handle expansion. Increasing database size can usually be
handled by adding processing and storage power to the network. Also,
new sites can be added to the network without affecting the operations of
other sites.
• Local autonomy: A department can control its own data. This means
that:
   • Local data is locally owned and managed;
   • Local operations remain purely local;
   • All operations at a given site are controlled by that site.
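
The claim in the availability bullet above, that replicated data is available
"with higher probability", can be made concrete with a small calculation. The
sketch assumes that each site is up independently with the same probability p
and that the data is reachable whenever at least one of its n copies is up;
both are simplifying assumptions, and the function name is hypothetical. Under
them the availability is 1 - (1 - p)^n.

```python
def availability(p, n):
    """Probability that at least one of n independent replicas is reachable.

    p is the assumed probability that a single site is up; n is the number
    of sites holding a copy of the data.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # All n copies are down with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


single_copy = availability(0.9, 1)    # one centralized copy
three_copies = availability(0.9, 3)   # the same data replicated at three sites
```

With p = 0.9, one copy is available 90% of the time and three copies about
99.9% of the time. Real failures are rarely independent (a network partition
can cut off several sites at once), so the figure is an upper bound rather
than a guarantee.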

5. DISADVANTAGES OR CHALLENGES OF DISTRIBUTED DATABASES

There are several disadvantages or challenges related to distributed database
systems:

• Complexity: A distributed database system that hides the distributed
nature of the data from systems designers and users is inherently more
complex than a centralized database system. Besides the normal
difficulties of database design, the design of a distributed database has to
consider the fragmentation of data, the allocation of fragments to specific
sites, and data replication. Replicated data adds an extra level of
complexity. If the system is not adequately designed, it will deliver
unacceptable levels of performance, reliability, and availability, and the
advantages cited above will become disadvantages. Further problems arise
from optimal data fragmentation and allocation, data
conflict resolution, referential integrity, deferred transaction resolution,
and so on.
• Security: In a centralized system, access to data can be easily controlled.
However, in a distributed DBMS not only does access to replicated data
have to be controlled in multiple locations, but the network itself has to
be made secure. In the past, networks were regarded as an insecure
communication medium. Although this is still partially true, significant
progress has been made in making networks more secure.
• Difficult to maintain integrity: In a distributed database, enforcing
integrity over a network may require too much of the network's resources
to be feasible.
• Cost: Increased complexity means that the procurement and maintenance
costs for a distributed DBMS can be higher than those for a centralized
DBMS. Furthermore, a distributed database environment usually requires
additional hardware to establish a network between sites. There are
ongoing communication costs incurred with the use of this network.
There are also additional labor costs to manage and maintain the local
DBMSs and the underlying network. Thus, total cost is a function of
network configuration, the user work load, the data allocation strategy,
and the query optimization algorithm.
• Concurrency control: An important consideration in the design of
distributed systems is concurrency control, the portion of the system
that decides what actions to take in response to requests by individual
processes to read from and write to the database. Concurrency control is
concerned with avoiding deadlocks or similar occurrences and with
maintaining the consistency of the database. Its job is to
ensure that during the concurrent operation of any set of processes:

1. Each process sees a consistent picture of the database.

2. Each process eventually terminates.

3. The final database after all the processes terminate is consistent.

Concurrency control must maintain the global consistency of the
entire distributed database and must ensure that each process terminates.

• Lack of experience in the more complex database design: Besides the
normal difficulties of designing a centralized database, the design of a
distributed one has to take account of the fragmentation of data, the
allocation of fragments to specific sites, and data replication. A significant
deterrent may be that the field does not have the same level of experience
with distributed DBMSs as with centralized ones. There are also few, if any,
tools or methodologies to help developers convert a centralized DBMS into
a distributed one.
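
The three requirements listed under concurrency control above (a consistent
view, termination, and a consistent final state) can be illustrated with one
classic technique, two-phase locking with a fixed lock order. This paper does
not prescribe a particular method, so the choice is an assumption, as are the
names LockManager, transfer, and run_demo. Threads in one process stand in for
processes at different sites.

```python
import threading


class LockManager:
    """Hypothetical lock table holding one lock per data item."""

    def __init__(self, items):
        self._locks = {item: threading.Lock() for item in items}

    def run(self, items, action):
        # Growing phase: take every needed lock in one global (sorted) order,
        # so two transactions can never wait for each other in a cycle.
        ordered = sorted(set(items))
        for item in ordered:
            self._locks[item].acquire()
        try:
            return action()
        finally:
            # Shrinking phase: release locks only after the action is done.
            for item in reversed(ordered):
                self._locks[item].release()


def transfer(db, manager, src, dst, amount):
    """Move an amount between two items as one locked unit of work."""
    def action():
        db[src] -= amount
        db[dst] += amount
    manager.run([src, dst], action)


def run_demo():
    db = {"a": 100, "b": 100}
    manager = LockManager(db)
    threads = []
    for i in range(50):
        # Half of the transfers go a -> b and half go b -> a.
        src, dst = ("a", "b") if i % 2 == 0 else ("b", "a")
        threads.append(
            threading.Thread(target=transfer, args=(db, manager, src, dst, 1))
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db


final_state = run_demo()
```

Because opposing transfers request the same two locks in the same order, every
thread terminates, and the total held in the two items is unchanged at the end.
A real distributed DBMS must also cope with lock requests that cross the
network and with site failures, which this sketch ignores.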

6. DISTRIBUTED DATABASE DESIGN TECHNIQUES

The data is distributed by partitioning the database tables into fragments using
various techniques. Replication or duplication is then used to
copy the data to various locations so that the data is closer to the user who is
accessing it.

6.1. FRAGMENTATION

The main reasons for fragmenting relations are to increase the locality of
reference of the queries submitted to the database, improve the reliability and
availability of data and the performance of the system, balance storage capacities,
and minimize communication costs among sites.

Fragmentation is a design technique that divides a single relation or class of a
database into two or more partitions such that the combination of the partitions
reproduces the original database without any loss of information. This reduces the
amount of irrelevant data accessed by the applications of the database, thus
reducing the number of disk accesses. Fragmentation can be horizontal, vertical,
or mixed/hybrid.

Horizontal fragmentation (HF): allows a relation or class to be partitioned
into disjoint sets of tuples or instances. An example of horizontal
fragmentation of a table is given below:

Fig: Horizontal Fragmentation
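
Horizontal fragmentation can also be sketched in code. The example assumes a
small hypothetical employee table and a placement rule based on the city
column; the function names and site names are illustrative only.

```python
def horizontal_fragments(rows, predicates):
    """Assign every row to the single fragment whose predicate it satisfies."""
    fragments = {name: [] for name in predicates}
    for row in rows:
        matches = [name for name, pred in predicates.items() if pred(row)]
        if len(matches) != 1:
            # Fragments must be disjoint and must cover every row.
            raise ValueError(f"row {row} matches {len(matches)} fragments")
        fragments[matches[0]].append(row)
    return fragments


def union_fragments(fragments, key):
    """Rebuild the relation as the union of its fragments, ordered by key."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda row: row[key])


EMPLOYEES = [
    {"id": 1, "name": "Asha", "city": "Delhi"},
    {"id": 2, "name": "Ben", "city": "Mumbai"},
    {"id": 3, "name": "Chen", "city": "Delhi"},
]
BY_CITY = {
    "delhi_site": lambda row: row["city"] == "Delhi",
    "mumbai_site": lambda row: row["city"] == "Mumbai",
}

fragments = horizontal_fragments(EMPLOYEES, BY_CITY)
rebuilt = union_fragments(fragments, "id")
```

Each site holds whole rows, and the union of the fragments gives back the
original table, so no information is lost.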

Vertical fragmentation (VF): allows a relation or class to be partitioned into
disjoint sets of columns or attributes, except for the primary key, which is
kept in every fragment. An example of vertical fragmentation of a table is
given below:

Fig: Vertical Fragmentation
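
A matching sketch for vertical fragmentation follows. It assumes a hypothetical
employee table whose primary key is the id column; the function names and the
column grouping are illustrative only.

```python
def vertical_fragments(rows, key, column_groups):
    """Project the rows onto each column group, keeping the key in every fragment."""
    return [
        [{key: row[key], **{col: row[col] for col in columns}} for row in rows]
        for columns in column_groups
    ]


def join_fragments(fragments, key):
    """Rebuild the relation by joining the fragments on the primary key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


EMPLOYEES = [
    {"id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"id": 2, "name": "Ben", "city": "Mumbai", "salary": 60000},
]

# Assumed split: personal details at one site, payroll data at another.
personal, payroll = vertical_fragments(EMPLOYEES, "id", [["name", "city"], ["salary"]])
rebuilt = join_fragments([personal, payroll], "id")
```

The primary key is the only column that appears in both fragments. Without it
the rows of one fragment could not be matched to the rows of the other, and the
original table could not be reconstructed.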

Horizontal and vertical fragmentation can also be combined into mixed or hybrid
fragmentation (MF). An example of hybrid or mixed
fragmentation of a table is given below:

Fig: Hybrid Fragmentation

6.2. REPLICATION / DUPLICATION

Replication and duplication are two processes that keep distributed databases
up to date.

Replication involves using specialized software that looks for changes in the
distributed database. Once the changes have been identified, the replication
process makes all the databases look the same. Depending on the size and
number of the distributed databases, the replication process can be very
complex and can require a lot of time and computer
resources.

Duplication, on the other hand, is not as complicated. It identifies one
database as a master and then duplicates that database. The duplication process
is normally done at a set time after hours, which ensures that each distributed
location has the same data. In the duplication process, changes are allowed
only to the master database, which ensures that local data will not be
overwritten. Both processes can keep the data current in all distributed
locations.
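
The duplication process described above can be sketched as a master copy that
alone accepts changes and is copied wholesale to every site at a scheduled
time. The class name, the site names, and the in-memory dictionaries are
assumptions made for illustration.

```python
import copy


class DuplicatedDatabase:
    """Toy master-copy model: writes go to the master, sites hold copies."""

    def __init__(self, site_names):
        self.master = {}
        self.sites = {name: {} for name in site_names}

    def write(self, key, value):
        # Only the master accepts changes, so site copies never diverge.
        self.master[key] = value

    def read(self, site, key):
        # Reads are served from the local copy, which may be out of date.
        return self.sites[site].get(key)

    def duplicate(self):
        # Scheduled job (for example, after hours): overwrite every site
        # with a full copy of the master.
        for name in self.sites:
            self.sites[name] = copy.deepcopy(self.master)


db = DuplicatedDatabase(["site_a", "site_b"])
db.write("price", 10)
stale = db.read("site_a", "price")    # not yet copied to the site
db.duplicate()
fresh = db.read("site_a", "price")    # visible after the scheduled copy
```

The trade-off is visible in the two reads: the scheme is simple, but a site
sees a change only after the next scheduled duplication.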

The term replication refers to the operation of copying and maintaining database
objects in multiple databases belonging to a distributed system. While
replication relies on distributed database technology, database replication offers
applications benefits that are not possible within a pure distributed database
environment. Replication uses distributed database technology to share data
between multiple sites, but a replicated database and a distributed database are
not the same. In a pure distributed database, data is available at many
locations, but a particular table resides at only one location. Replication
means that the same data is available at multiple locations.

Some of the common reasons for using replication are availability,
performance, and network load reduction. Replication improves the availability of
applications because it provides them with alternative data access options. If
one site becomes unavailable, users can continue to query or even update the
remaining locations. In other words, replication provides excellent failover
protection.

Replication provides fast, local access to shared data because it balances activity
over multiple sites. Some users can access one server while other users access
other servers, thereby reducing the load at all servers. Also, users can access
data from the replication site that has the lowest access cost, which is typically
the site that is geographically closest to them. Replication can be used to
distribute data over multiple regional locations. Then, applications can access
various regional servers instead of accessing one central server. This
configuration can reduce network load dramatically.
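
Choosing the replica with the lowest access cost can be sketched as below. The
cost figures, the site names, and the function name are hypothetical; a real
system would measure or estimate costs rather than read them from a fixed
table.

```python
def choose_replica(access_cost, reachable):
    """Return the reachable replica site with the lowest access cost.

    access_cost maps each site holding a copy to an assumed cost (for
    example, a network round-trip time); reachable is the set of sites
    that are currently up.
    """
    candidates = [site for site in access_cost if site in reachable]
    if not candidates:
        raise LookupError("no replica is reachable")
    return min(candidates, key=lambda site: access_cost[site])


costs = {"local": 1, "regional": 5, "central": 20}
normal_choice = choose_replica(costs, {"local", "regional", "central"})
failover_choice = choose_replica(costs, {"regional", "central"})
```

In normal operation the application uses the local copy. If the local server
is down, the same rule picks the next cheapest site, which is how replication
provides both fast access and failover.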

Most commonly, replication is used to improve local database performance and
protect the availability of applications, because alternate data access options
exist. For example, an application may normally access a local database rather
than a remote server to minimize network traffic and achieve maximum
performance. Furthermore, the application can continue to function if the local
server experiences a failure, as long as other servers with replicated data remain
accessible. The replication of fragments improves the reliability and efficiency of
read-only queries but increases update cost, because every copy must be updated.

7. CONCLUSION

Distributed databases have become a necessity as networks expand and
organizations perform geographically distributed operations. International
companies store their data at different sites of a computer network, possibly in a
variety of forms, ranging from flat files to hierarchical, relational, or
object-oriented databases. The network itself consists of a variety of transmission
media, network topologies, and network speeds. Design approaches for distributed
databases have to consider various factors that can affect performance: CPU
time, data transmission time, and disk I/O operation time. As communication
technology, hardware, and software protocols advance rapidly and the prices of
network equipment fall, developing distributed database systems
becomes more and more feasible. Communication networks make it feasible to
access remote data or databases, allowing the sharing of data among a
potentially large community of users. There is also a potential for increased
reliability: when one computer fails, data at other sites is still accessible. Critical
data may be replicated at different sites, making it available with higher
probability. Multiple processors also open the door to improved performance.

8. REFERENCES

1. Distributed Databases from Wikipedia.org.

2. Min-Sheng Li and Deng-Jyi Chen, “The Reliability Problem in
Distributed Database Systems”, International Conference on Information,
Communications and Signal Processing ICICS '97 Singapore, 9-12
September 1997
3. Florin Dumitriu and Liviu Cretu, “DISTRIBUTED DATABASE
TECHNOLOGY. A MANAGEMENT PERSPECTIVE”.
