Introduction To Distributed Database

The document discusses Distributed Database Management Systems (DDBMS), which allow access to data stored across multiple sites, emphasizing the differences between centralized and distributed systems. It outlines the characteristics, advantages, and disadvantages of DDBMS, including issues of complexity, cost, and security, as well as the distinction between homogeneous and heterogeneous systems. Additionally, it covers parallel DBMS architectures and their benefits for performance and scalability in handling large databases.

Uploaded by

amagsi040

Distributed Database

Danish Nazir Arain


In previous chapters we have concentrated on centralized database
systems, that is, systems with a single logical database located at one
site under the control of a single DBMS.
In this chapter we discuss the concepts and issues of the Distributed
Database Management System (DDBMS), which allows users to
access not only the data at their own site but also data stored at remote
sites.
Distributed database

A logically interrelated collection of shared data (and a description of this
data) physically distributed over a computer network.
Heterogeneity vs. Homogeneity

The Internet enables users to access services and run applications over a
heterogeneous collection of computers and networks. Heterogeneity (that
is, variety and difference) applies to all of the following:
● networks;
● computer hardware;
● operating systems;
● programming languages;
● implementations by different developers.
Tightly coupled

Tight coupling is when a group of classes or components is highly
dependent on one another, as in a shared memory architecture.
Loosely coupled

Loose coupling is when a group of classes or components is not
dependent on one another.
Distributed DBMS

The software system that permits the management of the distributed
database and makes the distribution transparent (invisible) to users.
Distributed Database Management System (DDBMS)

A Distributed Database Management System (DDBMS) consists of a
single logical database that is split into a number of fragments.
Each fragment is stored on one or more computers under the control of a
separate DBMS, with the computers connected by a communications
network.
Local vs global applications

Users access the distributed database via applications, which are
classified as those that do not require data from other sites (local
applications) and those that do require data from other sites (global
applications).
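The local/global distinction can be sketched in a few lines of Python. This is only an illustration: the site names, fragment names, and the `classify_application` helper are hypothetical and do not come from the slides.

```python
# Hypothetical allocation: which fragments are stored at which site.
ALLOCATION = {
    "site_a": {"staff_london", "branch"},
    "site_b": {"staff_glasgow"},
}


def classify_application(home_site, needed_fragments, allocation=ALLOCATION):
    """Return 'local' if every needed fragment is stored at home_site, else 'global'."""
    stored_here = allocation.get(home_site, set())
    return "local" if set(needed_fragments) <= stored_here else "global"


# An application at site_a that only reads 'branch' needs no other site.
kind_1 = classify_application("site_a", ["branch"])
# The same application also reading 'staff_glasgow' must contact site_b.
kind_2 = classify_application("site_a", ["branch", "staff_glasgow"])
```

Under these assumptions, an application is local exactly when the set of fragments it needs is a subset of the fragments held at its own site.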
A DDBMS therefore has the following characteristics:

• a collection of logically related shared data;
• the data is split into a number of fragments;
• fragments may be replicated;
• fragments/replicas are allocated to sites;
• the sites are linked by a communications network;
• the data at each site is under the control of a DBMS;
• the DBMS at each site can handle local applications, autonomously;
• each DBMS participates in at least one global application.
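The first few characteristics (fragmentation, replication, allocation) can be illustrated with a minimal Python sketch. It assumes horizontal fragmentation on a single column; the `STAFF` rows, the site names, and both helper functions are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample relation.
STAFF = [
    {"staff_no": 1, "name": "Ann", "city": "London"},
    {"staff_no": 2, "name": "Bob", "city": "Glasgow"},
    {"staff_no": 3, "name": "Cy", "city": "London"},
]


def fragment_by(rows, column):
    """Horizontal fragmentation: group whole rows by the value of one column."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)


def allocate(fragments, placement):
    """Store each fragment at every site listed for it in placement."""
    sites = defaultdict(dict)
    for fragment_name, site_names in placement.items():
        for site in site_names:
            # Copy the list so that each replica is stored independently.
            sites[site][fragment_name] = list(fragments[fragment_name])
    return dict(sites)


fragments = fragment_by(STAFF, "city")
# Hypothetical placement: the London fragment is replicated at two sites.
sites = allocate(fragments, {"London": ["site_a", "site_b"], "Glasgow": ["site_b"]})
```

Here `site_b` holds both fragments while `site_a` holds only a replica of the London fragment, so each site could serve some requests entirely on its own.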
Transparent Distribution
From the definition of the DDBMS, the system is expected to make the
distribution transparent (invisible) to the user. Thus, the fact that a
distributed database is split into fragments that can be stored on different
computers and perhaps replicated, should be hidden from the user.
The objective of transparency is to make the distributed system appear
like a centralized system.
This is sometimes referred to as the fundamental principle of distributed
DBMSs (Date, 1987b).
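As a rough illustration of distribution transparency, the sketch below hides the fragment layout behind a single logical table. The `DistributedTable` class and its data are hypothetical; a real DDBMS does far more (query optimization, replica selection, failure handling).

```python
class DistributedTable:
    """Presents fragments held at several sites as one logical table."""

    def __init__(self, site_fragments):
        # site_fragments maps site name -> list of rows; callers never see it.
        self._site_fragments = site_fragments

    def select(self, predicate=lambda row: True):
        """Return matching rows from every site; the caller never names a site."""
        results = []
        for rows in self._site_fragments.values():
            results.extend(row for row in rows if predicate(row))
        return results


staff = DistributedTable({
    "site_a": [{"staff_no": 1, "city": "London"}],
    "site_b": [{"staff_no": 2, "city": "Glasgow"}],
})
london_staff = staff.select(lambda row: row["city"] == "London")
```

The caller writes the same query it would write against a centralized table, which is the point of the transparency objective.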
Distributed processing

A centralized database that can be accessed over a computer network.

The key point with the definition of a distributed DBMS is that the system
consists of data that is physically distributed across a number of sites in
the network.
If the data is centralized, even though other users may be accessing the
data over the network, we do not consider this to be a distributed DBMS
but simply distributed processing.
Parallel DBMSs

A DBMS running across multiple processors and disks that is designed to
execute operations in parallel, whenever possible, in order to improve
performance.
Parallel DBMSs are again based on the premise that single-processor
systems can no longer meet the growing requirements for cost-effective
scalability, reliability, and performance.
A powerful and financially attractive alternative to a single-processor-
driven DBMS is a parallel DBMS driven by multiple processors.
Parallel DBMSs link multiple, smaller machines to achieve the same
throughput as a single, larger machine, often with greater scalability and
reliability than single-processor DBMSs.
To provide multiple processors with common access to a single database,
a parallel DBMS must provide for shared resource management.
Which resources are shared and how those shared resources are
implemented directly affects the performance and scalability of the
system, which in turn determines its appropriateness for a given
application/environment.
Types of architectures for parallel DBMS

The three main architectures for parallel DBMSs are:
• shared memory;
• shared disk;
• shared nothing.
Shared memory

Shared memory is a tightly coupled architecture in which multiple
processors within a single system share system memory.
Known as symmetric multiprocessing (SMP), this approach has become
popular on platforms ranging from personal workstations that support a
few microprocessors in parallel, to large RISC (Reduced Instruction Set
Computer)-based machines, all the way up to the largest mainframes.
This architecture provides high-speed data access for a limited number of
processors, but it is not scalable beyond about 64 processors, at which
point the interconnection network becomes a bottleneck.
Shared Memory

In a shared memory architecture, multiple processors share the main
memory (RAM) space, but each processor has its own disk (HDD).
If many processes run simultaneously, performance degrades, just as a
single computer slows down when it runs many tasks in parallel.
Shared disk

Shared disk is a loosely coupled architecture optimized for applications that are
inherently centralized and require high availability and performance. Each
processor can access all disks directly, but each has its own private memory.
Like the shared nothing architecture, the shared disk architecture eliminates the
shared memory performance bottleneck.
Unlike the shared nothing architecture, however, the shared disk architecture
eliminates this bottleneck without introducing the overhead associated with
physically partitioned data.
Shared disk systems are sometimes referred to as clusters.
Shared disk

In a shared disk architecture, each node has its own main memory, but all
nodes share mass storage, usually a storage area network. In practice,
each node usually also has multiple processors.
Shared Nothing
Shared nothing, often known as massively parallel processing (MPP), is
a multiple-processor architecture in which each processor is part of a
complete system, with its own memory and disk storage.
The database is partitioned among all the disks on each system
associated with the database, and data is transparently available to users
on all systems.
This architecture is more scalable than shared memory and can easily
support a large number of processors.
However, performance is optimal only when requested data is stored
locally.
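The partitioning idea, and the remark that performance is best when data is local, can be sketched as follows. Hash partitioning on an integer key is an assumption made for the example (the slides do not name a partitioning scheme), and all function names are hypothetical.

```python
def owner_node(key, node_count):
    """Hash partitioning on an integer key: each key belongs to exactly one node."""
    return key % node_count


def partition(rows, key_column, node_count):
    """Distribute rows across node-private storage (one list per node)."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[owner_node(row[key_column], node_count)].append(row)
    return nodes


def lookup(nodes, key_column, key, requesting_node):
    """Find a row by key and report whether the access was local or remote."""
    owner = owner_node(key, len(nodes))
    match = next((r for r in nodes[owner] if r[key_column] == key), None)
    return match, ("local" if owner == requesting_node else "remote")


rows = [{"id": i, "value": i * 10} for i in range(8)]
nodes = partition(rows, "id", node_count=4)
row, access = lookup(nodes, "id", 6, requesting_node=2)
```

A lookup issued on the node that owns the key touches only that node's memory and disk; any other lookup has to cross the interconnect, which is where the extra cost comes from.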
Shared Nothing
In a shared nothing architecture, each node has its own mass storage as well as main memory.
Although the shared nothing definition sometimes includes distributed
DBMSs, the distribution of data in a parallel DBMS is based solely on
performance considerations.
In addition, the nodes of a DDBMS are typically geographically distributed,
separately administered, and have a slower interconnection network,
whereas the nodes of a parallel DBMS are typically within the same
computer or within the same site.
Parallel technology is typically used for very large databases, possibly of the order of
terabytes (10^12 bytes), or systems that have to process thousands of transactions per
second.
These systems need access to large volumes of data and must provide timely
responses to queries. A parallel DBMS can use the underlying architecture to improve
the performance of complex query execution using parallel scan, join, and sort
techniques that allow multiple processor nodes automatically to share the processing
workload.
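A parallel scan followed by a sort might be structured like the sketch below. It uses Python threads only to show the shape of the computation (scan each partition independently, then merge); a real parallel DBMS would spread this work over separate processors or nodes. The data and function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def scan_partition(partition, threshold):
    """Scan one partition, keep qualifying values, and sort them locally."""
    return sorted(v for v in partition if v >= threshold)


def parallel_scan_sort(partitions, threshold, workers=4):
    """Scan partitions concurrently, then merge the locally sorted runs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda p: scan_partition(p, threshold), partitions))
    # heapq.merge combines already-sorted runs into one sorted stream.
    return list(heapq.merge(*runs))


partitions = [[5, 42, 17], [99, 3, 58], [21, 8, 64]]
result = parallel_scan_sort(partitions, threshold=20)
```

Each worker handles only its own partition, so no worker needs to see the whole data set; only the final merge brings the partial results together.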
We discuss this architecture further in Chapter 31 on data warehousing. Suffice it to
note here that all the major DBMS vendors produce parallel versions of their database
engines.
Advantages and Disadvantages of DDBMSs
Advantages

● Reflects organizational structure: sites can mirror the organization's
local and global units and their hierarchy.
● Improved shareability and local autonomy: each site can apply its own
local policies to its data.
● Improved availability: a failure at one site does not make the entire
system inoperable.
● Improved reliability: data may be replicated at different sites.
● Improved performance: data is located near the site of "greatest
demand"; access can be faster than to a remote centralized database;
and each site's CPU handles only part of the database, not the whole.
● Economics: it is generally cheaper to add computers than to buy a
mainframe, and local processing is cheaper than processing at another site.
● Modular growth: scalable (easier to expand).
● Integration: old centralized systems can be integrated into new
distributed systems.
● Remaining competitive: e-business, computer-supported
collaborative work, and workflow management. Many enterprises have
had to reorganize their businesses and use distributed database
technology to remain competitive.
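The availability and reliability points can be made concrete with a small failover sketch: if one site is down, a read is served from another site that holds a replica. The `Site` class, the `SiteUnavailable` exception, and the data are hypothetical.

```python
class SiteUnavailable(Exception):
    """Raised when a site cannot be reached."""


class Site:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def read(self, key):
        if not self.up:
            raise SiteUnavailable(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; return (value, site_name) from the first that answers."""
    for site in replicas:
        try:
            return site.read(key), site.name
        except SiteUnavailable:
            continue
    raise SiteUnavailable("all replicas are down")


replicas = [
    Site("site_a", {"b001": "London"}, up=False),  # simulated site failure
    Site("site_b", {"b001": "London"}),
]
value, served_by = read_with_failover(replicas, "b001")
```

With a single centralized copy, the simulated failure would make the data unreachable; with a replica at a second site, the read still succeeds.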
Disadvantages
● Complexity: a DDBMS is more complex than a centralized DBMS.
● Cost: maintenance costs are higher than for a centralized DBMS.
● Security: data is stored at different sites, and the communications network can be insecure.
● Integrity control more difficult: enforcing integrity constraints generally requires access
to a large amount of data that defines the constraint but that is not involved in the
actual update operation itself.
● Lack of standards: there are no standard communication and data access protocols, and no
tools or methodologies to help users convert a centralized DBMS into a distributed DBMS.
● Lack of experience: the technology is comparatively new, so practical experience with it is limited.
● Database design more complex: besides the normal difficulties of designing a centralized
database, the design of a distributed database has to take account of fragmentation of data,
allocation of fragments to specific sites, and data replication.
Homogeneous and Heterogeneous DDBMSs

A DDBMS may be classified as homogeneous or heterogeneous. In a
homogeneous system, all sites use the same DBMS product. In a
heterogeneous system, sites may run different DBMS products, which
need not be based on the same underlying data model, and so the system
may be composed of relational, network, hierarchical, and object-oriented
DBMSs.
Homogeneous systems are much easier to design and manage. This
approach provides incremental growth, making the addition of a new site
to the DDBMS easy, and allows increased performance by exploiting the
parallel processing capability of multiple sites.
Heterogeneous systems usually result when individual sites have
implemented their own databases and integration is considered at a later
stage. In a heterogeneous system, translations are required to allow
communication between different DBMSs. To provide DBMS
transparency, users must be able to make requests in the language of the
DBMS at their local site. The system then has the task of locating the data
and performing any necessary translation. Data may be required from
another site that may have:
• different hardware;
• different DBMS products;
• different hardware and different DBMS products.
If the hardware is different but the DBMS products are the same, the
translation is straightforward, involving the change of codes and word
lengths. If the DBMS products are different, the translation is complicated,
involving the mapping of data structures in one data model to the
equivalent data structures in another data model.
For example, relations in the relational data model are mapped to records
and sets in the network model. It is also necessary to translate the query
language used (for example, SQL SELECT statements are mapped to the
network FIND and GET statements). If both the hardware and software
are different, then both these types of translation are required. This makes
the processing extremely complex.
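The simpler, hardware-level translation (a change of codes and word lengths) might look like the following sketch. The two record layouts are invented for the example: a source machine using EBCDIC text and 16-bit big-endian integers, and a target machine using ASCII text and 32-bit little-endian integers.

```python
import struct


def translate_record(raw):
    """Convert one fixed-layout record between two hypothetical machine formats.

    Source layout: 2-byte big-endian integer id + 4-byte EBCDIC (cp037) name.
    Target layout: 4-byte little-endian integer id + 4-byte ASCII name.
    """
    record_id, name_ebcdic = struct.unpack(">H4s", raw)
    name = name_ebcdic.decode("cp037")  # change of character code
    # Change of word length and byte order for the integer field.
    return struct.pack("<I4s", record_id, name.encode("ascii"))


source = struct.pack(">H4s", 258, "JOHN".encode("cp037"))
target = translate_record(source)
```

The data model is untouched here; only the byte-level representation changes, which is why this case is the straightforward one.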
The typical solution used by some relational systems that are part of a
heterogeneous DDBMS is to use gateways, which convert the language
and model of each different DBMS into the language and model of the
relational system.
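One small part of what such a gateway does, presenting network-model records and sets as relational rows, can be sketched as below. The nested-dictionary representation of owner records and member sets is an assumption made for illustration, not the storage format of any real network DBMS, and query-language translation is not shown.

```python
def sets_to_relation(owners, set_name, owner_key):
    """Flatten owner records and their member sets into relational rows.

    Each member record becomes one row that carries its owner's key,
    playing the role of a foreign key in the relational view.
    """
    rows = []
    for owner in owners:
        for member in owner[set_name]:
            rows.append({owner_key: owner[owner_key], **member})
    return rows


# Hypothetical network-model data: each branch owns a set of staff records.
branches = [
    {"branch_no": "B1", "staff": [{"staff_no": 1}, {"staff_no": 2}]},
    {"branch_no": "B2", "staff": [{"staff_no": 3}]},
]
staff_relation = sets_to_relation(branches, "staff", "branch_no")
```

The owner-member link that the network model stores as a set becomes an ordinary column in each row, which is the form a relational system expects.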
