IJCSMS International Journal of Computer Science and Management Studies, Vol. 11, Issue 02, Aug 2011, p. 138
ISSN (Online): 2231-5268
www.ijcsms.com

Fundamental Research of Distributed Database


Swati Gupta¹, Kuntal Saroha², Bhawna³

¹ Lecturer, RIMT, Chidana (Swati.mangla.555@gmail.com)
² Research Scholar, IIIT, Gwalior (sarohakuntal@gmail.com)
³ M.Tech. Scholar, PDMCE, Bahadurgarh (bhawna.kochhar9@gmail.com)

ABSTRACT

The purpose of this paper is to present an introduction to distributed databases, which have become very popular nowadays. Today's business environment has an increasing need for distributed databases and client/server applications, as the desire for reliable, scalable and accessible information is steadily rising. Distributed database systems improve communication and data processing by distributing data across different network sites. Not only is data access faster, but a single point of failure is less likely to occur, and users keep local control of their data.

Keywords: distributed database fundamentals; current research: query optimization, distribution optimization, fragmentation optimization.

I INTRODUCTION

In today's world of universal dependence on information systems, all sorts of people need access to companies' databases. In addition to a company's own employees, these include the company's customers, potential customers, suppliers, and vendors of all types. It is possible for a company to have all of its databases concentrated at one mainframe computer site, with worldwide access to this site provided by telecommunications networks, including the Internet. Although such a centralized system and its databases can be managed in a well-contained manner, which can be advantageous, it also poses some problems. For example, if the single site goes down, then everyone is blocked from accessing the databases until the site comes back up. Also, the communications costs from the many remote PCs and terminals to the central site can be high. One solution to such problems, and an alternative design to the centralized database concept, is known as a distributed database.

In short, a distributed database is a collection of databases that can be stored at different computer network sites. Each database may involve different database management systems and different architectures that distribute the execution of transactions. The objective of a distributed database management system (DDBMS) is to control the management of a distributed database (DDB) in such a way that it appears to the user as a centralized database.

II DISTRIBUTED DATABASES

A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the user. A distributed database system (DDBS) is the integration of the DDB and the DDBMS. This integration is achieved by merging database and networking technologies. Alternatively, it can be described as a system that runs on a collection of machines that do not have shared memory, yet looks to the user like a single machine.

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed
database and makes the distribution transparent to the users [1]. The term distributed database system (DDBS) is typically used to refer to the combination of the DDB and the distributed DBMS. Distributed DBMSs are similar to distributed file systems (see Distributed File Systems) in that both facilitate access to distributed data. However, there are important differences in structure and functionality, and these characterize a distributed database system:

1. Distributed file systems simply allow users to access files that are located on machines other than their own. These files have no explicit structure (i.e., they are flat), and the relationships among data in different files (if there are any) are not managed by the system and are the users' responsibility. A DDB, on the other hand, is organized according to a schema that defines both the structure of the distributed data and the relationships among the data. The schema is defined according to some data model, which is usually relational or object-oriented (see Distributed Database Schemas).

2. A distributed file system provides a simple interface to users which allows them to open, read/write (records or bytes), and close files. A distributed DBMS has the full functionality of a DBMS. It provides high-level, declarative query capability, transaction management (both concurrency control and recovery), and integrity enforcement. In this regard, distributed DBMSs also differ from transaction processing systems, since the latter provide only some of these functions.

3. A distributed DBMS provides transparent access to data, while in a distributed file system the user has to know (to some extent) the location of the data. A DDB may be partitioned (called fragmentation) and replicated in addition to being distributed across multiple sites. None of this is visible to the users. In this sense, distributed database technology extends the concept of data independence, which is a central notion of database management, to environments where data are distributed and replicated over a number of machines connected by a network. Thus, from a user's perspective, a DDB is logically a single database even if it is physically distributed.

III ARCHITECTURE CONCERN

A. The Hardware

Because of the extended functionality a DDBS must provide, its design becomes more complex and more sophisticated. At the physical level, the differences between centralized and distributed systems are:

• Multiple computers, called sites.
• These sites are connected via a communication network to enable data/query communications. Figure 1 illustrates this architecture.

Figure 1: Client Server Architecture

Networks can have several types of topologies that define how nodes are physically and logically connected. One of the popular topologies used in DDBS, the client-server architecture, is described as follows: the principal idea of this architecture is to define specialized servers with specific functionalities, such as a printer server, mail server, file server, etc. These servers are then connected to a network of clients that can access their services. Stations (servers or clients) can have different design complexities, ranging from a diskless client to a combined server-client machine. This is illustrated in Figure 1.

The server-client architecture requires some kind of function definition for servers and clients. The DBMS functions are divided between servers and clients using different approaches. We present a common approach used with relational DDBS, called centralized DBMS at the server level. The client refers to a data distribution dictionary to know how to decompose the global query into multiple local queries. The interaction proceeds as follows:

1. The client parses the user's query and decomposes it into independent site queries.

2. The client forwards each independent query to the corresponding server by consulting the data distribution dictionary.

3. Each server processes its local query and sends the resulting relation back to the client.

4. The client combines the results of the received subqueries (manually by the user, or automatically by the client) and does further processing if needed to obtain the final target result.

We would like to discuss the different architectures of DDBS for the two main types: client/server and distributed databases.

The client/server:
The simplest tactic is known as the file server approach. When a client computer on the LAN needs to query, update, or otherwise use a file on the server, the entire file must be sent from the server to that client. All of the querying, updating, or other processing is then performed in the client computer. If changes were made to the file, the entire file is then shipped back to the server. Clearly, for files of even moderate size, shipping entire files back and forth across the LAN with any frequency will be very costly. In terms of concurrency control, the entire file must be locked while one of the clients is updating even one record in it. Other than providing a basic file-sharing capability, this arrangement's drawbacks make it neither very practical nor very useful.

DBMS server approach:
A much better arrangement is variously known as the database server or DBMS server approach. Again, the database is located at the server, but this time the processing is split between the client and the server, and there is much less data traffic on the network. Say that someone at a client computer wants to query the database at the server. The query is entered at the client, and the client computer performs the initial keyboard and screen interaction processing, as well as initial syntax checking of the query. The system then ships the query over the LAN to the server, where the query is actually run against the database. Only the results are shipped back to the client. Certainly, this is a much better arrangement than the file server approach. The network data traffic is reduced to a tolerable level, even for frequently queried databases. Also, security and concurrency control can be handled at the server in a much more contained way. The only real drawback to this approach is that the company must invest in a sufficiently powerful server to keep up with all of the activity concentrated there.

Two-tier client/server:
Another issue involving the data on a LAN is that some databases can be stored on a client PC's own hard drive, while other databases that the client might access are stored on the LAN's server. This is also known as a two-tier approach (Figure 2). Software has been developed that makes the location of the data transparent to the user at the client. In this mode of operation, the user issues a query at the client, and the software first checks whether the required data is on the PC's own hard drive. If it is, the data is retrieved from it, and that is the end of the story. If it is not there, the software automatically looks for it on the server.

In an even more sophisticated three-tier approach (Figure 3), if the software doesn't find the data on the client PC's hard drive or on the LAN server, it can leave the LAN through a gateway computer and look for the data on, for
example, a large mainframe computer that may be reachable from many LANs.

Figure 2: Two-tier Client/Server

Figure 3: Three Tier Client/Server

Three-tier approach: In another use of the term three-tier approach, the three tiers are the client PCs, servers known as application servers, and other servers known as database servers (Figure 4). In this arrangement, local screen and keyboard interaction is still handled by the clients, but they can now request a variety of applications to be performed at and by the application servers. The application servers, in turn, rely on the database servers and their databases to supply the data needed by the applications. Though certainly well beyond the scope of LANs, an example of this kind of arrangement is the World Wide Web on the Internet. The local processing on the clients is limited to the data input and data display capabilities of browsers such as Netscape's Communicator and Microsoft's Internet Explorer. The application servers are the computers at company Web sites that conduct the companies' business with the "visitors" working through their browsers. The company application servers in turn rely on the companies' database servers to provide the necessary data to complete the transactions. For example, when a bank's customer visits his bank's Web site, he can initiate many different transactions, ranging from checking his account balances to transferring money between accounts to paying his credit card bills. The bank's Web application server handles all of these transactions. It, in turn, sends requests to the bank's database server and databases to retrieve the current account balances, add money to one account while deducting money from another in a funds transfer, and so forth.

Figure 4: Another version of Three Tier

Distributed Database

1. No replication:
The first and simplest idea in distributing the data would be to disperse the six tables among the five sites. If particular tables are used at some sites more frequently than at other sites, it makes sense to locate the tables at the sites at which they are most frequently used. Benefits include local autonomy (security, concurrency, backup, recovery) and efficient local transactions. Problems include: if one site goes down, its tables are not accessible to the rest of the system, and joins across sites are expensive. The security benefit is also debatable, since a single database in one place is arguably more secure than a DDBS.
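A minimal sketch of the no-replication scheme, assuming hypothetical tables, sites, and access counts (three tables and three sites rather than the six tables and five sites of the figures): each table is kept as a single copy at the site that accesses it most, and the resulting table-to-site map then serves as the data distribution dictionary that the client consults in the four-step interaction described earlier (decompose, forward, process locally, combine). The helper `run_at_site` is a hypothetical in-process stand-in for a remote server, and parsing a real global query is not shown, so the decomposition into per-table predicates is written by hand.

```python
# Hypothetical per-site access counts: ACCESS_FREQ[table][site] = accesses per day.
ACCESS_FREQ = {
    "customer": {"NY": 120, "LA": 30, "SF": 10},
    "orders": {"NY": 40, "LA": 90, "SF": 20},
    "product": {"NY": 5, "LA": 15, "SF": 70},
}

# Hypothetical local databases: site -> table -> rows (as dicts).
SITE_DATA = {
    "NY": {"customer": [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}]},
    "LA": {"orders": [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 2}, {"oid": 12, "cid": 1}]},
    "SF": {"product": [{"pid": 7, "title": "Disk"}]},
}


def allocate_no_replication(access_freq):
    """Keep exactly one copy of each table, at the site that uses it most."""
    return {table: max(counts, key=counts.get) for table, counts in access_freq.items()}


def run_at_site(site, table, predicate):
    """Stand-in for a server evaluating its local query (step 3)."""
    return [row for row in SITE_DATA[site][table] if predicate(row)]


def global_query(dictionary, site_queries):
    """Steps 2-3: route each per-table sub-query via the distribution dictionary."""
    results = {}
    for table, predicate in site_queries.items():
        site = dictionary[table]  # step 2: look up where the table lives
        results[table] = run_at_site(site, table, predicate)
    return results


if __name__ == "__main__":
    dictionary = allocate_no_replication(ACCESS_FREQ)
    print(dictionary)  # {'customer': 'NY', 'orders': 'LA', 'product': 'SF'}
    # Step 1 (decomposition) done by hand: one predicate per table.
    parts = global_query(dictionary, {
        "customer": lambda r: True,
        "orders": lambda r: r["cid"] == 1,
    })
    # Step 4: the client combines the partial results (here, a join on cid).
    names = {c["cid"]: c["name"] for c in parts["customer"]}
    print([(o["oid"], names[o["cid"]]) for o in parts["orders"]])  # [(10, 'Ann'), (12, 'Ann')]
```

Because every table lives at exactly one site, a query touching both `customer` and `orders` needs two sites and a join at the client, which is the "expensive joins" problem noted above.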

Figure 5: No replication Approach

2. Replication of the entire DB at each site:
Benefits include better availability. If more than one site requires frequent access to a particular table, the table can be replicated at each of those sites, again minimizing telecommunications. Copies of a table can also be located at sites that have tables with which it may have to be joined. Problems include weaker security, concurrency control, and consistency. At the extreme, when all tables are replicated at every site, this is very efficient for availability and joins, whereas it is the worst alternative for concurrency, consistency, and disk space (Figure 6).

Figure 6: Replication of all Tables

3. Selective replication:
Replicate all tables at the headquarters (which improves joins, since all joins can be done at the headquarters), and replicate each table only once elsewhere in the network, so that there are two copies of each table on the entire network (Figure 7).

Figure 7: Selective Replication

This last approach has some downsides: more than two sites could use a table frequently (requiring more replicas), and the headquarters becomes a bottleneck for join operations. To avoid these, we use the following heuristics:

• Place copies of tables at the sites that use them most heavily in order to minimize telecommunications costs.
• Ensure that there are at least two copies of important or frequently used tables to realize the gains in availability.
• Limit the number of copies of any one table to control the security and concurrency issues.
• Avoid any one site becoming a bottleneck.

Figure 8 illustrates a DDBS using these heuristics.
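The four heuristics can be read as a small greedy allocation procedure. The sketch below is one possible interpretation, not the paper's algorithm: it visits candidate sites for each table from heaviest to lightest user, always keeps at least two copies of tables flagged as important, caps the copies per table (`MAX_COPIES`), and skips any site that already holds `MAX_TABLES_PER_SITE` tables so that no single site becomes a bottleneck. All thresholds, table names, site names, and access counts are hypothetical.

```python
# Hypothetical access counts: table -> {site: accesses per day}.
ACCESS_FREQ = {
    "employee": {"HQ": 50, "Delhi": 80, "Mumbai": 75, "Pune": 5},
    "project": {"HQ": 60, "Delhi": 10, "Mumbai": 5, "Pune": 40},
    "dept": {"HQ": 20, "Delhi": 2, "Mumbai": 1, "Pune": 1},
}
IMPORTANT = {"employee", "dept"}  # heuristic 2: these need at least two copies
MAX_COPIES = 3                    # heuristic 3: cap copies per table
MAX_TABLES_PER_SITE = 2           # heuristic 4: cap load per site
HEAVY_USE = 30                    # heuristic 1: usage that justifies a local copy


def place_replicas(access_freq):
    load = {site: 0 for counts in access_freq.values() for site in counts}
    placement = {}
    for table, counts in access_freq.items():
        copies = []
        # Heuristic 1: consider the heaviest users of the table first.
        for site in sorted(counts, key=counts.get, reverse=True):
            if len(copies) >= MAX_COPIES:          # heuristic 3
                break
            if load[site] >= MAX_TABLES_PER_SITE:  # heuristic 4
                continue
            heavy = counts[site] >= HEAVY_USE
            needs_more = not copies or (table in IMPORTANT and len(copies) < 2)
            if heavy or needs_more:
                copies.append(site)
                load[site] += 1
        if not copies:  # every site was full: fall back to the least-loaded one
            site = min(load, key=load.get)
            copies.append(site)
            load[site] += 1
        placement[table] = copies
    return placement


if __name__ == "__main__":
    for table, sites in place_replicas(ACCESS_FREQ).items():
        print(table, "->", sites)
```

With these numbers `dept` is not placed at HQ, even though HQ is its heaviest user, because HQ already holds two tables; heuristic 4 overrides heuristic 1 here. The order in which the rules are checked is a design choice of this sketch.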

Figure 8: Replication by heuristics

IV SOFTWARE ASPECT

In a typical DDBS, three levels of software modules are defined:

• The server software: responsible for local data management at a site.
• The client software: responsible for most of the distribution functions; it holds the DDBMS catalog and processes all requests that require more than one site. Other functions of the client include maintaining the consistency of replicated data and the atomicity of global transactions.
• The communications software: provides the communication primitives used by the client and server to exchange data and commands (Figure 9).

Figure 9: Client/Server Software

Advantages of the client/server architecture include: more efficient division of labor, horizontal and vertical scaling of resources, better price/performance on client machines, the ability to use familiar tools on client machines, client access to remote data (via standards), full DBMS functionality provided to client workstations, and overall better system price/performance.

Disadvantages of the client/server architecture include: the server forms a bottleneck, the server forms a single point of failure, and database scaling is difficult.

It is preferable for a DDBMS to have the property of distribution transparency (Figure 10), where users can issue global queries without knowing or worrying about how the data is distributed in the DDBS.

Figure 10: Layers of Transparency

V FRAGMENTATION & REPLICATION

In distributing and allocating the database in the previous section, we assumed that entire relations are kept intact. However, in a DDBS we need to define the logical unit of DB distribution and allocation. In some cases it might be more efficient to split the tables into smaller units (fragments) and allocate them to different sites. Fragmentation has three different types:

A. Horizontal Fragmentation

As shown in Figure 11, table G has been added to demonstrate the fragmentation operation. An example of horizontal fragmentation is the employees table (G). It makes sense for the company to split G into different partitions based on the employees who work at each site. This makes management, queries, and transactions convenient and efficient. The downside of this choice is that whenever a query involves all G records, it has to request all partitions from all sites and perform a union on them.

Figure 11: Fragmentation among Tables

Figure 12: Horizontal Fragmentation

B. Vertical Fragmentation

In vertical partitioning, the columns of a table are divided up among several cities on the network. Each such partition must include the primary key attribute(s) of the table. This arrangement can make sense when different sites are responsible for processing different functions involving an entity. For example, the salary attributes of a personnel table might be stored in one city while the skills attributes of the table might be stored in another city. Both partitions would include the employee number, the primary key of the full table. A downside of this option is that a query involving the entire table G (Figure 13) would have to request all portions from all sites and perform a join on them.

Figure 13: Vertical Fragmentation

C. Hybrid Fragmentation

In this type of fragmentation scheme, the table is divided into arbitrary blocks, combining horizontal and vertical splits, based on the requirements. Each fragment can then be allocated to a specific site. This type of fragmentation is the most complex one and needs the most management. This is illustrated in Figure 14.
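The three fragmentation schemes can be illustrated on a small, hypothetical version of the employees table G, held as a list of Python dicts. Horizontal fragments are selected by site and rebuilt with a union; vertical fragments each keep the primary key (`emp_no`, an assumed column name) so the table can be rebuilt with a join on that key, as the text requires; the hybrid case applies a vertical split to each horizontal fragment. Column names, sites, and values are illustrative only.

```python
# Hypothetical employees table G; emp_no is the primary key.
KEY = "emp_no"
G = [
    {"emp_no": 1, "name": "Asha", "site": "Delhi", "salary": 50000, "skills": "SQL"},
    {"emp_no": 2, "name": "Ravi", "site": "Mumbai", "salary": 62000, "skills": "Java"},
    {"emp_no": 3, "name": "Neha", "site": "Delhi", "salary": 58000, "skills": "C"},
]


def horizontal(table, attr):
    """Horizontal fragmentation: group whole rows by attr (e.g. the work site)."""
    fragments = {}
    for row in table:
        fragments.setdefault(row[attr], []).append(row)
    return fragments


def union(fragments):
    """Rebuild from horizontal fragments: the union of all rows, ordered by key."""
    return sorted((row for frag in fragments for row in frag), key=lambda r: r[KEY])


def vertical(table, column_groups):
    """Vertical fragmentation: one fragment per column group, each keeping KEY."""
    return [[{KEY: row[KEY], **{c: row[c] for c in cols}} for row in table]
            for cols in column_groups]


def join_on_key(fragments):
    """Rebuild from vertical fragments by joining them on the primary key.

    This merge equals a natural join as long as every fragment holds every key.
    """
    merged = {}
    for frag in fragments:
        for row in frag:
            merged.setdefault(row[KEY], {}).update(row)
    return [merged[k] for k in sorted(merged)]


def hybrid(table, attr, column_groups):
    """Hybrid fragmentation: split rows by attr, then split each piece by columns."""
    return {value: vertical(rows, column_groups)
            for value, rows in horizontal(table, attr).items()}


if __name__ == "__main__":
    by_site = horizontal(G, "site")
    assert union(by_site.values()) == G            # union restores G
    groups = [["name", "site", "salary"], ["skills"]]
    assert join_on_key(vertical(G, groups)) == G   # join on emp_no restores G
    pieces = hybrid(G, "site", groups)
    assert union(join_on_key(p) for p in pieces.values()) == G
    print(sorted(by_site))  # ['Delhi', 'Mumbai']
```

Rebuilding a hybrid-fragmented table needs both operations (a join inside each horizontal piece, then a union across pieces), which is one reason the text calls it the scheme that needs the most management.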

Figure 14: Hybrid Fragmentation

VI QUERY PROCESSING

A DDBS adds to the conventional centralized DBS some other types of processing expense, because of the additional design (hardware and software) needed to handle the distribution. These expenses appear as the cost of data transfer over the network. The data transferred could be intermediate files resulting from local sites, or final results that need to be sent back to the original site that issued the query. Therefore, database designers are concerned with query optimization, which aims to minimize the cost of transferring data across the network.

One method to optimize queries in a DDBS is the semijoin. Instead of shipping the whole relation R1, the site of R1 sends only its join column CR1 to the site of the target relation R2. The site containing R2 then performs the join on CR1 and projects the matching tuples on the attributes needed. The resulting tuples are then shipped back to the site of R1 for further processing. This can significantly improve query efficiency, since the data transferred over the network is minimized.

VII CONCURRENCY & RECOVERY

The design of concurrency control and recovery in a DDBS has to consider aspects beyond those of a centralized DBS. These aspects include:

• Multiple copies of data: concurrency control has to keep the data copies consistent. Recovery, on the other hand, has to make a copy consistent with the others whenever a site recovers from a failure.
• Failure of communication links.
• Failure of individual sites.
• Distributed commit: during transaction commit some sites may fail, so the two-phase commit protocol is used to solve this problem.
• Deadlocks on multiple sites.

The following two sections describe two suggestions for managing concurrency control.

A. Distinguished Copy of a Data Item

There are three variations of this method: the primary site technique, primary site with backup site, and the primary copy technique. These techniques are described as follows:

a) Primary site
In this method, a single site is designated as the coordinator site. All locks and unlocks for all data units are controlled by this site. One advantage is that it is easy to implement. However, this method has two downsides: the coordinator site can become overloaded, and it forms a single point of failure for the entire DDBS.

b) Primary site with backup site
This technique addresses the second disadvantage of the first technique (primary site) by designating a backup site that can take over as the new coordinator in case of failure, in which case another backup site has to be selected.

c) Primary copy technique
This method distributes the load to the sites that hold a designated primary copy of a data unit, as opposed to centralizing all data units in one coordinator site. This way, if a site goes down, only transactions involving the primary copies residing on that site will be affected.

B. Voting

This method does not designate any distinguished copy or site to be the coordinator, as the methods described above do. When a site attempts to lock a data unit, requests must be sent to all sites holding a copy, asking to lock that copy. If the requesting transaction is not granted the lock by a majority vote of the sites, the transaction fails and sends a cancellation to all of them. Otherwise, it keeps the lock and informs all sites that it has been granted the lock.
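To contrast the two families of concurrency control described above, the sketch below models lock requests on replicated data items. With the primary copy technique the request goes only to the site holding the designated primary copy (the primary site technique is the special case where one site is primary for every item); with voting it goes to every site holding a copy, and the lock is kept only if a strict majority grants it, otherwise a cancellation is sent to all. Sites are simulated in-process and the names `Site`, `try_lock`, `release`, `lock_primary_copy`, and `lock_by_voting` are hypothetical; a real DDBMS would exchange these messages over the network and handle timeouts.

```python
class Site:
    """A site holding copies of data items, with a simple exclusive-lock table."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.locks = {}  # data item -> id of the transaction holding the lock

    def try_lock(self, item, txn):
        if not self.up:
            return False  # an unreachable site cannot grant a lock
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False

    def release(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


def lock_primary_copy(primary_site, item, txn):
    """Primary copy technique: only the site holding the primary copy decides."""
    return primary_site.try_lock(item, txn)


def lock_by_voting(copy_sites, item, txn):
    """Voting: ask every site with a copy; keep the lock only on a strict majority."""
    granted = sum(site.try_lock(item, txn) for site in copy_sites)
    if granted > len(copy_sites) // 2:
        return True  # the requester would now inform all sites it holds the lock
    for site in copy_sites:  # no majority: send a cancellation to all sites
        site.release(item, txn)
    return False


if __name__ == "__main__":
    a, b, c = Site("A"), Site("B"), Site("C", up=False)
    print(lock_by_voting([a, b, c], "x", "T1"))  # True: A and B grant, 2 of 3
    print(lock_by_voting([a, b, c], "x", "T2"))  # False: T1 already holds x
    print(lock_primary_copy(a, "y", "T2"))       # True: A holds y's primary copy
```

Voting tolerates the failure of a minority of copy sites (site C above) without a coordinator, at the price of contacting every copy on each lock request.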

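Returning to the semijoin method of Section VI, the sketch below compares how much data crosses the network with and without the semijoin reduction. Assumptions: two hypothetical relations held at two sites, the join attribute is field 0 of both and is a key of R2, and "network cost" is simply the number of attribute values shipped; a real optimizer would use tuple sizes and selectivity estimates. The baseline ships all of R2 to R1's site; the semijoin ships CR1 to R2's site, reduces and projects R2 there, and ships back only the matching tuples.

```python
# Hypothetical relations at two sites; field 0 is the join attribute (dept_id).
R1 = [(1, "Ann"), (2, "Bob"), (3, "Cid")]               # at site 1
R2 = [(d, f"dept{d}", "x" * 5) for d in range(1, 101)]  # at site 2


def values(rows):
    """Crude transfer cost: number of attribute values shipped."""
    return sum(len(row) for row in rows)


def project(rows, needed):
    """Keep only the attribute positions in `needed` (join attribute first)."""
    return [tuple(row[i] for i in needed) for row in rows]


def join(r1, r2_rows):
    """Join at site 1; r2_rows start with the join attribute, assumed a key of R2."""
    index = {row[0]: row for row in r2_rows}
    return [a + index[a[0]][1:] for a in r1 if a[0] in index]


def naive_plan(r1, r2, needed):
    """Ship all of R2 to site 1, then project and join there."""
    return join(r1, project(r2, needed)), values(r2)


def semijoin_plan(r1, r2, needed):
    """Ship CR1 to site 2, reduce R2 there, ship back only matching tuples."""
    cr1 = {row[0] for row in r1}   # CR1: R1 projected on the join column
    shipped = len(cr1)             # site 1 -> site 2
    reduced = project([row for row in r2 if row[0] in cr1], needed)  # at site 2
    shipped += values(reduced)     # site 2 -> site 1
    return join(r1, reduced), shipped


if __name__ == "__main__":
    needed = (0, 1)  # join attribute, then the attribute site 1 wants (dept name)
    naive_result, naive_cost = naive_plan(R1, R2, needed)
    semi_result, semi_cost = semijoin_plan(R1, R2, needed)
    assert naive_result == semi_result
    print(naive_cost, semi_cost)  # 300 9
```

The gain comes from R2 having many tuples that do not match R1; if almost every R2 tuple matched, shipping CR1 would be an extra cost rather than a saving.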

C. Recovery

The first step in dealing with the recovery problem is to identify that there was a failure, what type it was, and at which site it happened. Dealing with distributed recovery involves aspects such as database logs, update protocols, transaction failure recovery protocols, etc.
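The distributed-commit aspect listed at the start of Section VII relies on two-phase commit (2PC). Below is a minimal in-process sketch of the textbook protocol under simplifying assumptions: participants are local objects, a failed or unreachable site is modelled as a NO vote, and the log writes and timeouts that a real implementation needs for recovery are omitted (only noted in comments). Phase 1 asks every participant to prepare and vote; phase 2 commits everywhere only if all votes are YES and aborts otherwise. The class and function names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """One site taking part in a global transaction."""

    def __init__(self, name, can_commit=True, up=True):
        self.name = name
        self.can_commit = can_commit
        self.up = up
        self.state = "active"

    def prepare(self):
        # Phase 1: a real site would force-write a prepare record to its log here.
        if not self.up or not self.can_commit:
            return Vote.NO
        self.state = "prepared"
        return Vote.YES

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes YES."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    decision = all(v is Vote.YES for v in votes)
    for p in participants:                       # phase 2: broadcast the decision
        if not p.up:
            continue  # a down site learns the outcome when it recovers
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"


if __name__ == "__main__":
    sites = [Participant("A"), Participant("B"), Participant("C")]
    print(two_phase_commit(sites), [p.state for p in sites])
    # commit ['committed', 'committed', 'committed']
    sites = [Participant("A"), Participant("B", up=False), Participant("C")]
    print(two_phase_commit(sites), [p.state for p in sites])
    # abort ['aborted', 'active', 'aborted']
```

The skipped down site in the second run is exactly the recovery case described above: when it comes back, it must find out the global decision (from the coordinator or its log) before it can make its copy consistent.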

VIII CONCLUSION

Through this paper, we want to draw readers' attention to the advantages of distributed databases. We also discussed the software architecture used for distributed databases, and described fragmentation, replication, and recovery in order to make readers fully aware of the topic. Besides its practical benefits, the field of DDBs also continues to offer researchers new scope for further work.

IX REFERENCES

[1] Patrick O'Neil and Goetz Graefe. Multi-Table Joins Through Bitmapped Join Indices. SIGMOD Record, Vol. 24, No. 3, September 1995.

[2] Ambrose Goicoechea. Requirements Blueprint and Multiple Criteria for Distributed Database Design. International Council on Systems Engineering (INCOSE), 2000.

[3] Yin-Fu Huang and Jyh-Her Chen. Fragment Allocation in Distributed Database Design. Journal of Information Science and Engineering, 17, 491-506, 2001.

[4] Cyrus Shahabi, Latifur Khan, and Dennis McLeod. A Probe-Based Technique to Optimize Join Queries in Distributed Internet Databases. Knowledge and Information Systems, 2, 373-385, 2000.

[5] Charles P. Pfleeger and Shari Lawrence Pfleeger. Security in Computing. Prentice Hall Professional Technical Reference, Upper Saddle River, New Jersey, 2003.

[6] James F. Kurose and Keith W. Ross. Computer Networking: A Top-Down Approach Featuring the Internet. Pearson Education, Inc., New York, 2003.
