Fundamental Research of Distributed Database PDF
I INTRODUCTION

In today’s world of universal dependence on information systems, all sorts of people need access to companies’ databases. In addition to a company’s own employees, these include the company’s customers, potential customers, suppliers, and vendors of all types. It is possible for a company to have all of its databases concentrated at one mainframe computer site, with worldwide access to this site provided by telecommunications networks, including the Internet. Although such a centralized system and its databases can be managed in a well-contained manner, which can be advantageous, it poses some problems as well. For example, if the single site goes down, then everyone is blocked from accessing the databases until the site comes back up again. Also the communications costs from the many

II DISTRIBUTED DATABASES

A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the user. A distributed database system (DDBS) is the integration of a DDB and a DDBMS. This integration is achieved by merging database and networking technologies. It can also be described as a system that runs on a collection of machines that do not have shared memory, yet looks to the user like a single machine.

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed
IJCSMS
www.ijcsms.com
IJCSMS International Journal of Computer Science and Management Studies, Vol. 11, Issue 02, Aug 2011 139
ISSN (Online): 2231-5268
www.ijcsms.com
database and makes the distribution transparent to the users [1]. The term distributed database system (DDBS) is typically used to refer to the combination of the DDB and the distributed DBMS. Distributed DBMSs are similar to distributed file systems (see Distributed File Systems) in that both facilitate access to distributed data. However, there are important differences in structure and functionality, and these characterize a distributed database system:

1. Distributed file systems simply allow users to access files that are located on machines other than their own. These files have no explicit structure (i.e., they are flat), and the relationships among data in different files (if there are any) are not managed by the system and are the users’ responsibility. A DDB, on the other hand, is organized according to a schema that defines both the structure of the distributed data and the relationships among the data. The schema is defined according to some data model, which is usually relational or object-oriented (see Distributed Database Schemas).

DDB is logically a single database even if physically it is distributed.

III ARCHITECTURE CONCERN

A. The Hardware

Because of the extended functionality the DDBS must provide, its design becomes more complex and more sophisticated. At the physical level, the differences between centralized and distributed systems are:

Multiple computers called sites.
These sites are connected via a communication network to enable the data/query communications. Figure 1 illustrates this architecture.
example, a large, mainframe computer that may be reachable from many LANs.

Figure 2: Two-tier Client/Server

Three-tier approach: In another use of the term three-tier approach, the three tiers are the client PCs, servers known as application servers, and other servers known as database servers (Figure 4). In this arrangement, local screen and keyboard interaction is still handled by the clients, but they can now request a variety of applications to be performed at and by the application servers. The application servers, in turn, rely on the database servers and their databases to supply the data needed by the applications. Though certainly well beyond the scope of LANs, an example of this kind of arrangement is the World Wide Web on the Internet. The local processing on the clients is limited to the data input and data display capabilities of browsers such as Netscape’s Communicator and Microsoft’s Internet Explorer. The application servers are the computers at company Web sites that conduct the companies’ business with the “visitors” working through their browsers. The company application servers in turn rely on the companies’ database servers to provide the necessary data to complete the transactions. For example, when a bank’s customer visits his bank’s Web site, he can initiate many different transactions, ranging from checking his account balances to transferring money between accounts to paying his credit card bills. The bank’s Web application server handles all of these transactions. It, in turn, sends requests to the bank’s database server and databases to retrieve the current account balances, add money to one account while deducting money from another in a funds transfer, and so forth.

Figure 4: Another version of Three Tier

Distributed Database

1. No replication:
The first and simplest idea in distributing the data would be to disperse the six tables among the five sites. If particular tables are used at some sites more frequently than at other sites, it would make sense to locate the tables at the sites at which they are most frequently used. Benefits include local autonomy (security, concurrency, backup, recovery) and efficient local transactions. Problems include: if one site goes down, its tables are not accessible by the rest of the system, and joins across sites are expensive. The security benefit can be argued, since a single database in one place can be considered more secure than a DDBS.
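The placement rule for the no-replication case (put each table at the site that uses it most) can be sketched as follows. This is only a minimal illustration: the function name `place_tables`, the site names, the table names, and the access counts are all hypothetical and do not come from the example in this paper.

```python
from typing import Dict


def place_tables(access_counts: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Assign each table to the single site that accesses it most often.

    access_counts maps table name -> {site name -> number of accesses}.
    No table is replicated: every table ends up at exactly one site.
    """
    placement = {}
    for table, per_site in access_counts.items():
        # Pick the site with the highest access count for this table.
        # Ties go to the first such site in dictionary order.
        placement[table] = max(per_site, key=per_site.get)
    return placement


# Hypothetical workload statistics (assumed for illustration only).
access_counts = {
    "customer": {"site_a": 120, "site_b": 30},
    "orders": {"site_a": 15, "site_b": 200},
}
placement = place_tables(access_counts)
print(placement)
```

A real design would also weigh the cost of cross-site joins and the impact of a site failure, which this sketch ignores.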
Figure 11: Fragmentation among Tables

B. Vertical Fragmentation

C. Hybrid Fragmentation

In this type of fragmentation scheme, the table is divided into arbitrary blocks, based on the requirements. Each fragment can then be allocated to a specific site. This type of fragmentation is the most complex one and needs the most management. This is illustrated in Figure 14.
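As a rough illustration of hybrid fragmentation, the sketch below treats a block as a subset of rows combined with a subset of columns. This is one possible reading of "arbitrary blocks", not a definition taken from the paper. The `employees` table, its column names, and the helper `hybrid_fragment` are assumptions made for the example; keeping the key column in every fragment is also an assumption, made so that fragments can be matched up again later.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def hybrid_fragment(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
    key: str,
) -> List[Row]:
    """Build one fragment: rows selected by predicate, reduced to key + columns."""
    # Always keep the key so the fragment can be rejoined with others.
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows if predicate(row)]


# Hypothetical table (assumed data, for illustration only).
employees = [
    {"emp_id": 1, "name": "Ann", "dept": "sales", "salary": 50},
    {"emp_id": 2, "name": "Bob", "dept": "hr", "salary": 40},
    {"emp_id": 3, "name": "Cy", "dept": "sales", "salary": 60},
]

# Fragment for a hypothetical payroll site: sales rows, salary column only.
sales_pay = hybrid_fragment(
    employees, lambda r: r["dept"] == "sales", ["salary"], key="emp_id"
)
print(sales_pay)
```

Each such fragment could then be allocated to the site that needs it.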
A DDBS adds some other types of processing expenses to those of the conventional centralized DBS, because of the additional design (hardware & software) needed to handle the distribution. These expenses appear as the cost of data transfer over the network. The data transferred could be intermediate files resulting from local sites, or final results that need to be sent back to the original site that issued the query. Therefore, database designers are concerned with query optimization, which targets minimizing the cost of transferring data across the network.

One method to optimize queries on a DDBS is the semijoin, where the site of a relation R1 sends the entire join column CR1 to the site of the target relation R2; the site containing R2 then performs the join on CR1 and projects on the passed attributes. The resulting tuples are then shipped back to R1 for further processing. This can significantly enhance query efficiency, since the data transferred over the network is minimized.

VII CONCURRENCY & RECOVERY

The design of concurrency and recovery in a DDBS has to consider aspects beyond those of a centralized DBS. These aspects include:

Multiple copies of data: concurrency control has to keep the data copies consistent. Recovery, on the other hand, has to make a copy consistent with the others whenever a site recovers from a failure.
Failure of communication links.
Failure of individual sites.
Distributed commit: during transaction commit some sites may fail, so the two-phase commit is used to solve this problem.
Deadlocks on multiple sites.

There are three variations to this method: primary site technique, primary site with backup site, and primary copy technique. These techniques are described as follows:

a) Primary site
In this method, a single site is designated as the coordinator site. All locks and unlocks for all data units are controlled by this site. One advantage is that it is easy to implement. However, this method has two downsides: the coordinator site can be overloaded, and this site forms a single point of failure for the entire DDBS.

b) Primary site with backup site
This technique addresses the second disadvantage of the first technique (primary site) by designating a backup site that can take over as the new coordinator in case of failure, in which case another backup site has to be selected.

c) Primary copy technique
This method distributes the load to the sites that have a designated primary copy of a data unit, as opposed to centralizing all data units in one coordinator site. This way, if a site goes down, only transactions involving the primary copies residing on that site will be affected.

B. Voting

This method does not designate any distinguished copy or site to be the coordinator, as the first two methods described above do. When a site attempts to lock a data unit, requests must be sent to all sites having the desired copy, asking to lock this copy. If the requesting transaction is not granted the lock by a majority of the votes from the sites, then the transaction fails and sends a cancellation to all. Otherwise it keeps the lock and informs all sites that it has been granted the lock.
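The voting method can be sketched in a few lines. This is a simplified, single-process illustration under several assumptions: the `Site` class and `request_lock` function are hypothetical names, every site is assumed to hold a copy of the data unit, and the network messages (requests, cancellations, and the final notification that the lock was granted) are replaced by direct method calls.

```python
from typing import Dict, List


class Site:
    """A hypothetical site holding one copy of each data unit."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Maps a data unit to the transaction holding the local lock.
        self.locks: Dict[str, str] = {}

    def vote(self, unit: str, txn: str) -> bool:
        """Grant the local lock if the copy is free or already held by txn."""
        holder = self.locks.setdefault(unit, txn)
        return holder == txn

    def cancel(self, unit: str, txn: str) -> None:
        """Release the local lock if txn holds it."""
        if self.locks.get(unit) == txn:
            del self.locks[unit]


def request_lock(sites: List[Site], unit: str, txn: str) -> bool:
    """Ask every site for the lock; succeed only with a strict majority."""
    votes = sum(site.vote(unit, txn) for site in sites)
    if votes > len(sites) // 2:
        return True  # majority reached: the transaction keeps the lock
    # No majority: the transaction fails and cancels at all sites.
    for site in sites:
        site.cancel(unit, txn)
    return False


sites = [Site("s1"), Site("s2"), Site("s3")]
first = request_lock(sites, "x", "t1")   # all copies are free
second = request_lock(sites, "x", "t2")  # every copy is already held by t1
print(first, second)
```

Because no site acts as coordinator, the failure of a single site does not block locking as long as a majority of the copies can still vote.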
C. Recovery
VIII CONCLUSION
IX REFERENCES
[1] Patrick O’Neil and Goetz Graefe. 1995. Multi-Table Joins Through Bitmapped Join Indices. SIGMOD Record, Vol. 24, No. 3, September 1995.