
DDB-distribution Database Important.

This document provides an overview of distributed databases. It begins by defining a distributed database as a set of interconnected databases spread across a network. A distributed database management system (DDBMS) manages the distributed database and makes the data transparent to users. Key points include: - Distributed databases allow for optimal use of computing resources by intentionally distributing data across multiple nodes. - A DDBMS synchronizes databases periodically and provides access mechanisms to make the distribution transparent. It also ensures data modified at any site is universally updated. - Factors encouraging distributed databases include the distributed nature of organizations, need for data sharing, and support for both online transaction processing and online analytical processing. - Advantages include modular development

Uploaded by

Aman Paidlewar

DISTRIBUTED DATABASE (19OE1CS05)

UNIT - I

INTRODUCTION TO DISTRIBUTED DATABASE

Introduction: Features of Distributed versus Centralized Databases, Levels of Distribution.

Transparency: Reference Architecture for Distributed Databases, Types of Data Fragmentation,
Distribution transparency for Read-only Applications, Distribution transparency for Update
Applications, Distributed database Access primitives, Integrity Constraints in Distributed
Databases.

------------------------------------------------------------------------------------------------

A database is an ordered collection of related data that is built for a specific purpose. A
database may be organized as a collection of multiple tables, where a table represents a real
world element or entity. Each table has several different fields that represent the characteristic
features of the entity.
For example, a company database may include tables for projects, employees, departments,
products, and financial records. The fields in the Employee table may be Name,
Company_Id, Date_of_Joining, and so forth.
A database management system (DBMS) is a collection of programs that enables the creation and
maintenance of a database. A DBMS is available as a software package that facilitates the
definition, construction, manipulation and sharing of data in a database. Definition of a
database means describing its structure. Construction means actually storing the data on a
storage medium. Manipulation means retrieving information from the database, updating the
database and generating reports. Sharing allows the data to be accessed by different users or
programs.
Examples of DBMS Application Areas
 Automatic Teller Machines
 Train Reservation System
 Employee Management System
 Student Information System
Examples of DBMS Packages
 MySQL
 Oracle
 SQL Server
 dBASE
 FoxPro
 PostgreSQL
Distributed Database
A distributed database is a set of interconnected databases that is distributed over the
computer network or internet. A Distributed Database Management System (DDBMS)
manages the distributed database and provides mechanisms to make the databases transparent
to the users. In these systems, data is intentionally distributed among multiple nodes so that
all computing resources of the organization can be optimally used.
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.
Features
 Databases in the collection are logically interrelated with each other. Often, they
represent a single logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by
a DBMS independent of the other sites.
 The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.

Distributed database

Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system that
manages a distributed database in a manner as if it were all stored in a single location.
Features
 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms by the
virtue of which the distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed and accessed
by numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.
Factors Encouraging DDBMS
 Distributed Nature of Organizational Units − Most organizations in the current times
are subdivided into multiple units that are physically distributed over the globe. Each
unit requires its own set of local data. Thus, the overall database of the organization
becomes distributed.
 Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands
common databases or replicated databases that should be used in a synchronized
manner.
 Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may
have common data. Distributed database systems aid both these processing by
providing synchronized data.
 Database Recovery − One of the common techniques used in DDBMS is replication
of data across different sites. Replication of data automatically helps in data recovery
if database in any site is damaged. Users can access data from other sites while the
damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
 Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a
uniform functionality for using the same data among different platforms.
Advantages of Distributed Databases
 Modular Development − If the system needs to be expanded to new locations or new
units, in centralized database systems, the action requires substantial efforts and
disruption in the existing functioning. However, in distributed databases, the work
simply requires adding new computers and local data to the new site and finally
connecting them to the distributed system, with no interruption in current functions.
 More Reliable − In case of database failures, the total system of centralized databases
comes to a halt. However, in distributed systems, when a component fails, the system
continues to function, possibly at reduced performance. Hence a DDBMS is more reliable.
 Better Response − If data is distributed in an efficient manner, then user requests can
be met from local data itself, thus providing faster response. On the other hand, in
centralized systems, all queries have to pass through the central computer for
processing, which increases the response time.
 Lower Communication Cost − In distributed database systems, if data is located
locally where it is mostly used, then the communication costs for data manipulation
can be minimized. This is not feasible in centralized systems.

Adversities of Distributed Databases


 Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
 Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the
sites.
 Data integrity − The need to update data at multiple sites poses problems of data
integrity.
 Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to
very slow response to user requests.

Centralized database

Features of Distributed versus centralized database

Distributed Database Vs Centralized Database


Centralized DBMS | Distributed DBMS
The database is stored at only one site. | The database is stored at different sites and is accessed over a network.
The data resides at a single computer site and can be used by multiple users. | The database and DBMS software are distributed over many sites, connected by a computer network.
The database is maintained at one site. | The database is maintained at several different sites.
If the centralized system fails, the entire system is halted. | If one site fails, the system continues to work with another site.
It is less reliable. | It is more reliable.

UNIT 2 - DISTRIBUTED DATABASE DESIGN
Distributed database design

First, we introduce a framework for the design of distributed databases, by stressing what should be
designed. We also indicate the objectives of the design of data distribution, and we present a top-down and a
bottom-up approach. In the rest of the chapter, we will concentrate on the top-down approach.

A FRAMEWORK FOR DISTRIBUTED DATABASE DESIGN

The design of a centralized database amounts to:


1. Designing the "conceptual schema" which describes the integrated database (i.e., all the data which are
used by the database applications).
2. Designing the "physical database," i.e., mapping the conceptual schema to storage areas and determining
appropriate access methods.

The distribution of the database adds to the above problems two new ones:
3. Designing the fragmentation, i.e., determining how global relations are subdivided into horizontal,
vertical, or mixed fragments.
4. Designing the allocation of fragments, i.e., determining how fragments are mapped to physical images; in
this way, also the replication of fragments is determined. These two problems fully characterize the design
of data distribution.
In the design of a distributed database, sufficiently precise knowledge of application requirements is
needed; clearly, this knowledge is required only for the more "important" applications, i.e., those
which will be executed frequently or whose performances are critical. In the application requirements
we include:
1. The site from which the application is issued (also called site of origin of the application).
2. The frequency of activation of the application (i.e., the number of activation requests in the unit time); in
the general case of applications which can be issued at multiple sites, we need to know the frequency of
activation of each application at each site.
3. The number, type, and the statistical distribution of accesses made by each application to each required
data "object."

Objectives of the Design of Data Distribution


In the design of data distribution, the following objectives should be taken into account:
Processing locality
 Distributing data to maximize processing locality corresponds to the simple principle of placing
data as close as possible to the applications which use them.
 The simplest way of characterizing processing locality is to consider two types of references to data:
"local" references and "remote" references. Designing data distribution for maximum processing
locality can then be done by counting the local and remote references corresponding to each candidate
fragmentation and fragment allocation, and selecting the best solution among them.
 An extension to this simple optimization criterion is to consider when an application has complete
locality. We use this term to designate those applications which can be completely executed at their
sites of origin. The advantage of complete locality is not only the reduction of remote accesses, but
also the increased simplicity in controlling the execution of the application.
Top-Down and Bottom-Up Approaches to the Design of Data Distribution
There are two alternative approaches to the design of data distribution, the top-down and the
bottom-up approach.
Top-Down Design
 Suitable for applications where database needs to be built from scratch
 Activity begins with requirement analysis
 Requirement document is input to two parallel activities:
 view design activity, deals with defining the interfaces for end users
 conceptual design, process by which enterprise is examined
– Can be further divided into 2 related activity groups
– Entity analyses, concerned with determining the entities, attributes and the
relationship between them
– Functional analyses, concerned with determining the fun
 Distributed design activity consists of two steps
– Fragmentation
– Allocation

Bottom-Up Approach

 Suitable for applications where database already exists


 Starting point is individual conceptual schemas
 Exists primarily in the context of heterogeneous database.
 The bottom-up design of a distributed database requires:
‒ The selection of a common database model for describing the global schema of the database.
‒ The translation of each local schema into the common data model.
‒ The integration of the local schemata into a common global schema.

THE DESIGN OF DATABASE FRAGMENTATION

The design of fragmentation is the first problem that must be solved in the top-down design of data distribution. The
purpose of fragmentation design is to determine nonoverlapping fragments which are "logical units of allocation," i.e., that
are appropriate starting points for the following data allocation problem.

Horizontal Fragmentation
There are two types of horizontal fragmentation, called primary and derived.
Primary fragmentation: primary horizontal fragments are defined using selections on global relations; the correctness
of primary fragmentation requires that each tuple of the global relation be selected in one and only one fragment.
Let R be the global relation for which we want to produce a horizontal primary fragmentation. We introduce the following
definitions:

 A simple predicate is a predicate of the type: Attribute comparison_operator value (Ex: RollNo = 1).
 A minterm predicate y for a set P of simple predicates is the conjunction of all predicates appearing in P, either
taken in natural form or negated, provided that this expression is not a contradiction. Thus,

$$y = \bigwedge_{p_i \in P} p_i^{*}, \quad \text{where } (p_i^{*} = p_i \text{ or } p_i^{*} = \text{NOT } p_i) \text{ and } y \neq \text{false}$$


 A fragment is the set of all tuples for which a minterm predicate holds.
 A simple predicate pi is relevant with respect to a set P of simple predicates if there exist at least two minterm
predicates of P whose expression differs only in the predicate pi itself (which appears in the natural form in one case
and negated in the other one) such that the corresponding fragments are referenced in a different way by at least one
application.
Two properties characterize an appropriate fragmentation. Let P = {p1, p2, ..., pn} be a
set of simple predicates. For P to represent fragmentation correctly and efficiently, P must be complete and minimal.
 We say that a set P of predicates is complete if and only if any two tuples belonging to the same fragment are
referenced with the same probability by any application.
 We say that the set P is minimal if all its predicates are relevant.
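The definitions above can be made concrete with a small sketch. It is only an illustration: the relation, its attribute names and the two simple predicates are hypothetical, and the code does not detect contradictory minterms, which simply show up as empty fragments.

```python
from itertools import product

# Hypothetical global relation R; each tuple is a dict.
students = [
    {"RollNo": 1, "Course": "CS", "Semester": 1},
    {"RollNo": 2, "Course": "CS", "Semester": 3},
    {"RollNo": 3, "Course": "EE", "Semester": 1},
    {"RollNo": 4, "Course": "EE", "Semester": 4},
]

# Set P of simple predicates (attribute, comparison operator, value).
simple_predicates = {
    "Course = CS": lambda t: t["Course"] == "CS",
    "Semester <= 2": lambda t: t["Semester"] <= 2,
}


def minterm_fragments(relation, predicates):
    """Group tuples by the minterm predicate they satisfy.

    A minterm is identified by a tuple of booleans, one per simple
    predicate: True = taken in natural form, False = negated.
    """
    names = list(predicates)
    fragments = {signs: [] for signs in product([True, False], repeat=len(names))}
    for t in relation:
        signs = tuple(predicates[name](t) for name in names)
        fragments[signs].append(t)
    return names, fragments


names, fragments = minterm_fragments(students, simple_predicates)
for signs, frag in fragments.items():
    label = " AND ".join(n if s else f"NOT({n})" for n, s in zip(names, signs))
    print(label, "->", [t["RollNo"] for t in frag])
```

Because every tuple satisfies exactly one combination of natural and negated predicates, each tuple lands in one and only one fragment, which is the correctness condition stated for primary fragmentation.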

Derived horizontal fragmentation:


The derived horizontal fragmentation of a global relation R is not based on properties of its own attributes, but is derived
from the horizontal fragmentation of another relation.
Derived fragmentation is used to facilitate the join between fragments. A distributed join is a join between horizontally
fragmented relations. When an application requires the join between two global relations R and S, all the tuples of R and S
need to be compared; thus, in principle, it is necessary to compare all the fragments Ri of R with all the fragments Sj of S.
However, sometimes it is possible to deduce that some of the partial joins Ri JN Sj are intrinsically empty. This happens
when, for a given data distribution, values of the join attribute in Ri and Sj are disjoint. A distributed join is represented
efficiently using join graphs.
 The join graph G of the distributed join R JN S is a graph (N, E), where the nodes N represent fragments of R and S
and the non-directed edges E between nodes represent joins between fragments which are not empty.
 A join graph is:
‒ total when it contains all possible edges between fragments of R and S;
‒ reduced when some edges between fragments of R and fragments of S are missing.

An example of a join graph is presented in Figure


A reduced join graph is:
‒ partitioned if the graph is composed of two or more subgraphs without edges between them (Fig b);
‒ simple if it is partitioned and each subgraph has just one edge (Fig c).
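As a rough sketch of these definitions, the code below builds a join graph from fragment contents and classifies it. The fragment data and the join attribute Dept are hypothetical, and treating a fragment with no join partner as its own subgraph is an assumption of this sketch rather than something the text specifies.

```python
def join_graph(r_frags, s_frags, attr):
    """Return an edge for every pair (Ri, Sj) whose join on attr is non-empty."""
    edges = set()
    for i, r in r_frags.items():
        r_values = {t[attr] for t in r}
        for j, s in s_frags.items():
            if r_values & {t[attr] for t in s}:
                edges.add((("R", i), ("S", j)))
    return edges


def components(nodes, edges):
    """Connected components of the graph, found with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def classify(r_frags, s_frags, edges):
    if len(edges) == len(r_frags) * len(s_frags):
        return "total"
    nodes = [("R", i) for i in r_frags] + [("S", j) for j in s_frags]
    comps = components(nodes, edges)
    if len(comps) < 2:
        return "reduced"
    # Simple: every subgraph contains exactly one edge.
    if all(sum(1 for a, _ in edges if a in c) == 1 for c in comps):
        return "reduced, partitioned, simple"
    return "reduced, partitioned"


# Hypothetical fragments of R and S, joined on Dept.
R = {1: [{"Dept": 1}, {"Dept": 2}], 2: [{"Dept": 3}]}
S = {1: [{"Dept": 1}], 2: [{"Dept": 3}, {"Dept": 4}]}
edges = join_graph(R, S, "Dept")
kind = classify(R, S, edges)

S_single = {1: [{"Dept": 1}, {"Dept": 3}]}
kind_total = classify(R, S_single, join_graph(R, S_single, "Dept"))
```

In the first case only R1 with S1 and R2 with S2 can produce join results, so each partial join can be evaluated independently of the other.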
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In
order to maintain reconstructiveness, each fragment should contain the primary key field(s)
of the table. Vertical fragmentation can be used to enforce privacy of data.
Grouping
 Starts by assigning each attribute to one fragment
o At each step, joins some of the fragments until some criterion is satisfied.
 Results in overlapping fragments
Splitting
 Starts with a relation and decides on beneficial partitioning based on the access
behavior of applications to the attributes
 Fits more naturally within the top-down design
 Generates non-overlapping fragments

Mixed (Hybrid) Fragmentation


In Mixed fragmentation, a combination of horizontal and vertical fragmentation techniques is
used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.
Mixed fragmentation can be done in two alternative ways −
1. First generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
2. First generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
THE ALLOCATION OF FRAGMENTS

 Fragment allocation improves the performance of application processing in distributed database systems.
 Its aim is to reduce the communication cost incurred while applications execute.
 Solutions to the classical file allocation problem cannot simply be reused for fragments, for these reasons:
‒ Fragments are not properly modeled as individual files, since this ignores the fact that fragments of the same
relation have the same structure and behavior.
‒ There are many more fragments than original global relations, and many analytic models cannot compute the
solution of problems involving too many variables.
‒ Modeling application behavior in file systems is very simple, while in distributed databases applications can make a
sophisticated use of data.

Non redundant allocation:


 The simplest method is a "best-fit" approach; a measure is associated with each possible allocation, and the site with
the best measure is selected.
 This approach gives a solution which disregards the "mutual" effect of placing a fragment at a given site if a related
fragment is also at that site.
 Replication introduces further complexity in the design, because:
‒ The degree of replication of each fragment becomes a variable of the problem.
‒ Modeling read applications is complicated by the fact that the applications can now select among several
alternative sites for accessing fragments.
Redundant allocation of fragments:
 Determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost, and
allocate a copy of the fragment to each element of this set; this method selects "all beneficial sites."
 Alternatively, determine first the solution of the nonreplicated problem, and then progressively introduce replicated
copies starting from the most beneficial; the process is terminated when no "additional replication" is beneficial.
Measure of costs and Benefits of fragment Allocation
The following notation is used for the fragments of a global relation R:
i is the fragment index
j is the site index
k is the application index
fkj is the frequency of application k at site j
rki is the number of retrieval references of application k to fragment i
uki is the number of update references of application k to fragment i
nki = rki + uki is the total number of references of application k to fragment i



Horizontal fragmentation
 Using the "best-fit" approach for a nonreplicated allocation, we place Ri at the site where the number of
references to Ri is maximum. The number of local references of Ri at site j is

$$B_{ij} = \sum_{k} f_{kj} \, n_{ki}$$

Ri is allocated at site j* such that Bij* is maximum.

 Using the "all beneficial sites" method for replicated allocation, we place Ri at all sites j where the cost
of retrieval references of applications is larger than the cost of update references to Ri from applications
at any other site. Bij is evaluated as the difference

$$B_{ij} = \sum_{k} f_{kj} \, r_{ki} - C \times \sum_{k} \sum_{j' \neq j} f_{kj'} \, u_{ki}$$

where C is a constant which measures the ratio between the cost of an update and a retrieval access. Ri is
allocated at all sites j* such that Bij* is positive.

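 Using the "additional replication" method for replicated allocation, we also measure the benefit of placing a new
copy of Ri in terms of the increased reliability and availability of the system. Here, di denotes the degree of redundancy
of Ri and Fi denotes the benefit of having Ri fully replicated at each site. This benefit is measured by

$$\beta(d_i) = (1 - 2^{1 - d_i}) \, F_i$$

and the benefit of introducing a new copy of Ri at site j is evaluated as

$$B_{ij} = \sum_{k} f_{kj} \, r_{ki} - C \times \sum_{k} \sum_{j' \neq j} f_{kj'} \, u_{ki} + \beta(d_i)$$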
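The first two measures can be sketched in a few lines of code, using the notation fkj, rki, uki introduced above. All numbers, application names and site names below are made up for illustration, and the fallback to a single copy when no site is beneficial is an assumption of this sketch.

```python
def best_fit(f, r, u, i, sites):
    """Non-replicated allocation: count local references to fragment i per site."""
    B = {j: sum(f[k][j] * (r[k][i] + u[k][i]) for k in f) for j in sites}
    return max(B, key=B.get), B


def all_beneficial_sites(f, r, u, i, sites, C):
    """Replicated allocation: local retrievals minus C times updates from other sites."""
    B = {}
    for j in sites:
        retrievals_here = sum(f[k][j] * r[k][i] for k in f)
        updates_elsewhere = sum(
            f[k][other] * u[k][i] for k in f for other in sites if other != j
        )
        B[j] = retrievals_here - C * updates_elsewhere
    chosen = [j for j in sites if B[j] > 0]
    if not chosen:
        # Assumption: keep one copy at the least costly site when no site is beneficial.
        chosen = [max(B, key=B.get)]
    return chosen, B


sites = ["s1", "s2"]
f = {"a1": {"s1": 10, "s2": 0}, "a2": {"s1": 2, "s2": 5}}  # f[k][j]
r = {"a1": {"R1": 3}, "a2": {"R1": 1}}                      # r[k][i]
u = {"a1": {"R1": 0}, "a2": {"R1": 2}}                      # u[k][i]

site, B_fit = best_fit(f, r, u, "R1", sites)
copies, B_all = all_beneficial_sites(f, r, u, "R1", sites, C=2)
print(site, B_fit)    # s1 {'s1': 36, 's2': 15}
print(copies, B_all)  # ['s1'] {'s1': 12, 's2': -3}
```

With these numbers a copy at s2 would save 5 retrieval references but would have to absorb update traffic from s1 at twice the cost, so replication there does not pay off.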

Vertical fragmentation:
Here we measure the benefit of vertically partitioning a fragment Ri, allocated at site r, into two
fragments Rs and Rt, allocated at sites s and t. By the effect of this partitioning:
1. There are two sets As and At of applications, issued at sites s or t, which use only attributes of Rs or Rt
and become local to sites s and t, respectively; these applications save one remote reference.
2. There is a set A1 of applications formerly local to r which use only attributes of Rs or Rt; these
applications now need to make an additional remote reference.
3. There is a set A2 of applications formerly local to r which reference attributes of both Rs and Rt; these
applications make two additional remote references.
4. There is a set A3 of applications at sites different than r, s, or t which reference attributes of both Rs and
Rt; these applications make one additional remote reference.
We evaluate the benefit of this partitioning as

$$B_{ist} = \sum_{k \in A_s} f_{ks} \, n_{ki} + \sum_{k \in A_t} f_{kt} \, n_{ki} - \sum_{k \in A_1} f_{kr} \, n_{ki} - \sum_{k \in A_2} 2 f_{kr} \, n_{ki} - \sum_{k \in A_3} \sum_{j \notin \{r, s, t\}} f_{kj} \, n_{ki}$$

Vertical clustering:
We measure the benefit of the vertical clustering of a fragment Ri, allocated at site r, into two fragments Rs
and Rt, allocated at sites s and t, with overlapping attributes I. The clustering requires reconsidering the
groups of applications introduced for vertical partitioning:
1. As includes applications which are local to site s because they either:
‒ read any attribute of Rs, or
‒ update attributes of Rs which are not in the overlapping part I.
The same holds for At.
2. A2 includes update applications formerly local to r which make an update to attributes of I, since now
they need to access both Rs and Rt.
3. A3 includes the applications at sites different than r, s, or t which update attributes of I, which also need
to access both Rs and Rt.
We evaluate the benefit of this clustering using the above expression for Bist.
DESIGN ALTERNATIVES
The distribution design alternatives for the tables in a DDBMS are as follows −
 Non-replicated and non-fragmented
 Fully replicated
 Partially replicated
 Fragmented
 Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it
is at a close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries needed to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since,
each site has its own copy of the entire database, queries are very fast requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost
during update operations. Hence, this is suitable for systems where a large number of
queries is required to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the
fact that the frequency of accessing the tables varies considerably from site to site. The
number of copies of the tables (or portions) depends on how frequently the access queries
execute and on the sites which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
 Vertical fragmentation
 Horizontal fragmentation
 Hybrid fragmentation

Mixed Distribution: This is a combination of fragmentation and partial replications. Here, the
tables are initially fragmented in any form (horizontal or vertical), and then these fragments
are partially replicated across the different sites according to the frequency of accessing the
fragments.
Design Strategies
The previous section introduced different design alternatives. This section covers the
strategies that aid in adopting the designs. The strategies can be broadly divided
into replication and fragmentation. However, in most cases, a combination of the two is
used.
Data Replication
Data replication is the process of storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication

 Reliability − In case of failure of any site, the database system continues to work
since a copy is available at another site(s).
 Reduction in Network Load − Since local copies of data are available, query
processing can be done with reduced network usage, particularly during prime hours.
Data updating can be done at non-prime hours.
 Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
 Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become
simpler in nature.

Disadvantages of Data Replication


 Increased Storage Requirements − Maintaining multiple copies of data is associated
with increased storage costs. The storage space required is in multiples of the storage
required for a centralized system.
 Increased Cost and Complexity of Data Updating − Each time a data item is updated,
the update needs to be reflected in all the copies of the data at the different sites. This
requires complex synchronization techniques and protocols.
 Undesirable Application-Database Coupling − If complex update mechanisms are
not used, removing data inconsistency requires complex coordination at the application
level. This results in undesirable coupling between application and database.

Some commonly used replication techniques are


Snapshot replication
Near-real-time replication
Pull replication
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the
table are called fragments. Fragmentation can be of three types: horizontal, vertical, and
hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be
classified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from
the fragments whenever required. This requirement is called “reconstructiveness.”
Advantages
1. Permits a number of transactions to be executed concurrently
2. Results in parallel execution of a single query
3. Increases the level of concurrency, also referred to as intra-query concurrency
4. Increased System throughput.
5. Since data is stored close to the site of usage, efficiency of the database system is
increased.
6. Local query optimization techniques are sufficient for most queries since data is
locally available.
7. Since irrelevant data is not available at the sites, security and privacy of the database
system can be maintained.

Disadvantages
1. Applications whose views are defined on more than one fragment may suffer
performance degradation, if applications have conflicting requirements.
2. Simple tasks like checking for dependencies, would result in chasing after data in a
number of sites
3. When data from different fragments are required, the access times may be very
high.
4. In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
5. Lack of back-up copies of data in different sites may render the database ineffective in
case of failure of a site.
For example, let us consider that a University database keeps records of all registered students in a Student
table having the following schema.
STUDENT
Regd_No | Name | Course | Address | Semester | Fees | Marks

Now, the fees details are maintained in the accounts section. In this case, the designer will vertically
fragment the database as follows −

CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
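The reconstructiveness requirement for this vertical fragment can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module with a single in-memory database standing in for the separate sites; the sample rows and the second fragment STD_INFO are hypothetical names introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
    Address TEXT, Semester INTEGER, Fees INTEGER, Marks INTEGER
);
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science', 'Pune', 3, 50000, 81),
    (2, 'Ravi', 'Physics', 'Delhi', 1, 40000, 74);

-- Both vertical fragments carry the primary key Regd_No.
CREATE TABLE STD_FEES AS
    SELECT Regd_No, Fees FROM STUDENT;
CREATE TABLE STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

# Reconstruct the original table by joining the fragments on the key.
rebuilt = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
    FROM STD_INFO AS i
    JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
    ORDER BY i.Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)
```

The join is lossless only because Regd_No appears in both fragments; dropping the key from either one would make the reconstruction impossible.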

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more
fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each
horizontal fragment must have all columns of the original base table.

 Primary horizontal fragmentation is defined by a selection operation on the owner
relation of a database schema.
 Given a relation R, its horizontal fragments are given by
Ri = σFi(R), 1 <= i <= w
where Fi is the selection formula used to obtain fragment Ri.
For example, an Emp relation fragmented on salary can be represented using the above formula as
Emp1 = σSal <= 20K (Emp)
Emp2 = σSal > 20K (Emp)
For example, in the student schema, if the details of all students of the Computer Science course
need to be maintained at the School of Computer Science, then the designer will
horizontally fragment the database as follows −

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE Course = 'Computer Science';
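A short sketch of the selection-based definition, applied to the Emp example with the 20K salary threshold. The sample tuples are invented; the check confirms that every tuple falls in exactly one fragment and that the union of the fragments gives back the original relation.

```python
# Hypothetical Emp relation.
emp = [
    {"Name": "A", "Sal": 15_000},
    {"Name": "B", "Sal": 20_000},
    {"Name": "C", "Sal": 32_000},
]

# Selection formulas Fi, one per fragment.
formulas = {
    "Emp1": lambda t: t["Sal"] <= 20_000,
    "Emp2": lambda t: t["Sal"] > 20_000,
}

# Ri = selection of R under Fi.
fragments = {name: [t for t in emp if F(t)] for name, F in formulas.items()}


def is_correct(relation, formulas):
    """True if each tuple is selected by one and only one formula."""
    return all(sum(1 for F in formulas.values() if F(t)) == 1 for t in relation)


# Reconstruction of a horizontally fragmented relation is the union of its fragments.
rebuilt = sorted(
    (t for frag in fragments.values() for t in frag), key=lambda t: t["Name"]
)
print(is_correct(emp, formulas), rebuilt == emp)
```

If the second formula were written as Sal >= 20K, the tuple with Sal = 20K would be selected twice and is_correct would return False.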

Derived Horizontal Fragmentation


 Defined on a member relation of a link according to a selection operation
specified on its owner.

 The link between the owner and the member relations is defined as an equi-join.

 An equi-join can be implemented by means of semijoins.

 Given a link L where owner(L) = S and member(L) = R, the derived horizontal
fragments of R are defined as
Ri = R ⋉ Si, 1 <= i <= w
where
Si = σFi(S)
w is the maximum number of fragments that will be defined on R
Fi is the formula using which the primary horizontal fragment Si is defined
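The semijoin definition can be illustrated as follows. The owner relation (a hypothetical DEPT table), the member relation (a hypothetical EMP table) and the link attribute Dept_Id are all assumptions made for the example.

```python
# Owner relation S and member relation R, linked by an equi-join on Dept_Id.
dept = [
    {"Dept_Id": 1, "City": "Pune"},
    {"Dept_Id": 2, "City": "Delhi"},
]
emp = [
    {"Name": "A", "Dept_Id": 1},
    {"Name": "B", "Dept_Id": 2},
    {"Name": "C", "Dept_Id": 1},
]


def select(relation, formula):
    """Selection: keep the tuples that satisfy the formula."""
    return [t for t in relation if formula(t)]


def semijoin(r, s, attr):
    """R semijoin S: tuples of R that have a matching attr value in S."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


# Primary horizontal fragments of the owner.
S1 = select(dept, lambda t: t["City"] == "Pune")
S2 = select(dept, lambda t: t["City"] != "Pune")

# Derived horizontal fragments of the member.
R1 = semijoin(emp, S1, "Dept_Id")
R2 = semijoin(emp, S2, "Dept_Id")
print([t["Name"] for t in R1], [t["Name"] for t in R2])
```

Each employee tuple ends up at the same place as its department tuple, so the join between EMP and DEPT can be computed fragment by fragment.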

Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques
is used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.
Hybrid fragmentation can be done in two alternative ways −
1. First generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
2. First generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
Transparency
Transparency in a DBMS stands for the separation of the high-level semantics of the system from
low-level implementation issues. High-level semantics are what the end user sees, while
low-level implementation concerns the hardware and how the data is stored in the database.
Transparency can be implemented in a DBMS by using data independence in the various layers of
the database.
Distribution transparency is the property of distributed databases by the virtue of which the
internal details of the distribution are hidden from the users. The DDBMS designer may
choose to fragment tables, replicate the fragments and store them at different sites.
However, since users are oblivious of these details, they find the distributed database easy to
use like any centralized database.
Unlike normal DBMS, DDBMS deals with communication network, replicas and fragments
of data. Thus, transparency also involves these three factors.
Following are three types of transparency:
1. Location transparency
2. Fragmentation transparency
3. Replication transparency
Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a
table as if they were stored locally at the user’s site. The fact that the table or its fragments
are stored at a remote site in the distributed database system should be completely hidden from
the end user. The address of the remote site(s) and the access mechanisms are completely
hidden. In order to incorporate location transparency, the DDBMS should have access to an updated
and accurate data dictionary and DDBMS directory which contains the details of locations
of data.
Fragmentation Transparency
Fragmentation transparency enables users to query any table as if it were unfragmented.
Thus, it hides the fact that the table the user is querying is actually a fragment or union of
some fragments. It also conceals the fact that the fragments are located at diverse sites. This is
somewhat similar to users of SQL views, where the user may not know that they are using a
view of a table instead of the table itself.
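The comparison with SQL views can be tried out directly. In this sketch a single in-memory SQLite database stands in for the distributed system, so the two fragment tables are only imagined to live at different sites; the table OTHER_STD and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two horizontal fragments of the student data.
CREATE TABLE COMP_STD  (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
CREATE TABLE OTHER_STD (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
INSERT INTO COMP_STD  VALUES (1, 'Asha', 'Computer Science');
INSERT INTO OTHER_STD VALUES (2, 'Ravi', 'Physics');

-- The user sees one unfragmented STUDENT relation.
CREATE VIEW STUDENT AS
    SELECT * FROM COMP_STD
    UNION ALL
    SELECT * FROM OTHER_STD;
""")

# The query names only STUDENT; it never mentions the fragments.
rows = conn.execute("SELECT Name FROM STUDENT WHERE Regd_No = 2").fetchall()
total = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(rows, total)
```

A DDBMS providing fragmentation transparency plays the role of the view here: it maps a query on the global relation to queries on the underlying fragments.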
Replication Transparency
Replication transparency ensures that the replication of databases is hidden from the users. It
enables users to query a table as if only a single copy of the table exists. Replication
transparency is associated with concurrency transparency and failure transparency. Whenever
a user updates a data item, the update is reflected in all the copies of the table. However, this
operation should not be known to the user. This is concurrency transparency. Also, in case of
failure of a site, the user can still proceed with their queries using replicated copies without
any knowledge of failure. This is failure transparency.
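A toy model may help to picture replication and failure transparency together. The class below is purely illustrative: its name, its methods and the site names are invented, and it ignores the recovery and synchronization protocols that a real DDBMS needs when a failed site comes back.

```python
class ReplicatedTable:
    """A table kept as one copy per site, accessed through a single interface."""

    def __init__(self, site_names):
        self.copies = {site: {} for site in site_names}
        self.down = set()

    def fail(self, site):
        """Simulate the failure of one site."""
        self.down.add(site)

    def write(self, key, value):
        # The caller issues one update; it is applied to every available copy.
        for site, copy in self.copies.items():
            if site not in self.down:
                copy[key] = value

    def read(self, key):
        # The caller does not choose a site; any available copy answers.
        for site, copy in self.copies.items():
            if site not in self.down:
                return copy[key]
        raise RuntimeError("no replica available")


table = ReplicatedTable(["site_A", "site_B"])
table.write("Regd_No 1", "Asha")
table.fail("site_A")
value = table.read("Regd_No 1")  # served by site_B, unnoticed by the caller
print(value)
```

The caller's code is the same whether there is one copy or many, and whether or not a site has failed, which is the property the two kinds of transparency describe.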
