0% found this document useful (0 votes)
114 views46 pages

Distributed Database Concepts

The document discusses distributed database concepts, including the architecture of distributed systems, types of transparency, and the importance of availability and reliability. It covers data fragmentation, replication, and allocation techniques, as well as concurrency control and recovery in distributed databases. Additionally, it addresses transaction management and query processing optimization in distributed environments.

Uploaded by

hend mohamed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
114 views46 pages

Distributed Database Concepts

The document discusses distributed database concepts, including the architecture of distributed systems, types of transparency, and the importance of availability and reliability. It covers data fragmentation, replication, and allocation techniques, as well as concurrency control and recovery in distributed databases. Additionally, it addresses transaction management and query processing optimization in distributed environments.

Uploaded by

hend mohamed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 46

Copyright © 2016 Ramez Elmasri and Shamkant B.

Navathe
CHAPTER 23

Distributed Database Concepts

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe


Introduction
◼ Distributed computing system
◼ Consists of several processing sites or nodes
interconnected by a computer network
◼ Nodes cooperate in performing certain tasks
◼ Partitions large task into smaller tasks for efficient
solving
◼ Big data technologies
◼ Combine distributed and database technologies
◼ Deal with mining vast amounts of data

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 3


23.1 Distributed Database Concepts
◼ What constitutes a distributed database?
◼ Connection of database nodes over computer
network
◼ Logical interrelation of the connected databases
◼ Possible absence of homogeneity among
connected nodes
◼ Distributed database management system
(DDBMS)
◼ Software system that manages a distributed
database

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 4


Distributed Database Concepts
(cont’d.)
◼ Local area network
◼ Hubs or cables connect sites
◼ Long-haul or wide area network
◼ Telephone lines, cables, wireless, or satellite
connections
◼ Network topology defines communication path
◼ Transparency
◼ Hiding implementation details from the end user

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 5


Transparency
◼ Types of transparency
◼ Data organization transparency
Hides how data is logically structured or physically stored. Users
interact with data without needing to know its underlying
organization (e.g., tables, partitions, or distributed storage).
◼ Location transparency :Users can access data
without knowing its physical location (e.g., which
server or node stores it).
◼ Naming transparency: Ensures that a unique
name identifies each data item, regardless of its
location. Avoids name conflicts in distributed
environments.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 6


Transparency
◼ Types of transparency
◼ Replication transparency
Hides the fact that data may be replicated (stored in multiple
locations). Users see a single copy, even if multiple copies exist for
fault tolerance or performance.
◼ Fragmentation transparency

Conceals how data is split (fragmented) across different nodes.


Two types:
◼ Horizontal fragmentation: Rows of a table are distributed

across different nodes (e.g., employees in different regions


stored separately).
◼ Vertical fragmentation : Columns of a table are distributed
(e.g., employees names in one node and addresses in another).

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 7


Transparency
◼ Types of transparency
◼ Design transparency
Hides the complexities of how the distributed system was
designed.
Users and applications do not need to know about
partitioning strategies or replication schemes.
◼ Execution transparency
Conceals the details of how a distributed operation is
executed across multiple nodes.
Example: A distributed transaction appears as a single
operation, even if it involves multiple servers.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 8


Distributed Databases

Figure 23.1 Data distribution and replication among distributed databases

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 9


Availability and Reliability
◼ Availability
◼ Probability that the system is continuously
available during a time interval
◼ Reliability
◼ Probability that the system is running (not down) at
a certain time point
◼ Both directly related to faults, errors, and failures

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 10


Availability and Reliability
◼ Fault-tolerant approaches
◼ Replication: Multiple copies of data ensure continuity if
one node fails.
◼ Redundancy: Extra components (servers, networks)
take over if primary ones fail.
◼ Consensus Protocols: Algorithms
like Paxos or Raft ensure consistency even during
failures.
◼ Check pointing & Recovery: Periodic snapshots allow
quick recovery after crashes.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 11


Scalability and Partition Tolerance
◼ Horizontal scalability
◼ Expanding the number of nodes in a distributed
system
◼ Vertical scalability
◼ Expanding capacity of the individual nodes
◼ Partition tolerance
◼ System should have the capacity to continue
operating while the network is partitioned

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 12


Autonomy
◼ Determines extent to which individual nodes can
operate independently
◼ Design autonomy
◼ Independence of data model usage and
transaction management techniques among nodes
◼ Communication autonomy
◼ Determines the extent to which each node can
decide on sharing information with other nodes
◼ Execution autonomy
◼ Independence of users to act as they please
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 13
Advantages of Distributed Databases
◼ Improved ease and flexibility of application
development
◼ Development at geographically dispersed sites
◼ Increased availability
◼ Isolate faults to their site of origin
◼ Improved performance
◼ Data localization
◼ Easier expansion via scalability
◼ Easier than in non-distributed systems

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 14


23.2 Data Fragmentation, Replication, and Allocation
Techniques for Distributed Database Design

◼ Fragments
◼ Logical units of the database
◼ Horizontal fragmentation (sharding)
◼ Horizontal fragment or shard of a relation is a
subset of the tuples in that relation
◼ Can be specified by condition on one or more
attributes or by some other method
◼ Groups rows to create subsets of tuples
◼ Each subset has a certain logical meaning

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 15


Data Fragmentation (cont’d.)
◼ Vertical fragmentation
◼ Divides a relation vertically by columns
◼ Keeps only certain attributes of the relation
◼ Complete horizontal fragmentation
◼ Apply UNION operation to the fragments to
reconstruct relation
◼ Complete vertical fragmentation
◼ Apply OUTER UNION or FULL OUTER JOIN
operation to reconstruct relation

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 16


Data Fragmentation (cont’d.)
◼ Mixed (hybrid) fragmentation
◼ Combination of horizontal and vertical
fragmentations
◼ Fragmentation schema
◼ Defines a set of fragments that includes all
attributes and tuples in the database
◼ Allocation schema
◼ Describes the allocation of fragments to nodes of
the DDBS

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 17


Data Replication and Allocation
◼ Fully replicated distributed database
◼ Replication of whole database at every site in
distributed system
◼ Improves availability remarkably
◼ Update operations can be slow
◼ Nonredundant allocation (no replication)
◼ Each fragment is stored at exactly one site

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 18


Data Replication and Allocation
(cont’d.)
◼ Partial replication
◼ Some fragments are replicated and others are not
◼ Defined by replication schema
◼ Data allocation (data distribution)
◼ Each fragment assigned to a particular site in the
distributed system
◼ Choices depend on performance and availability
goals of the system

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 19


Example of Fragmentation,
Allocation, and Replication
◼ Company with three computer sites
◼ One for each department
◼ Expect frequent access by employees working in
the department and projects controlled by that
department
◼ See Figures 23.2 and 23.3 in the text for example
fragmentation among the three sites

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 20


23.3 Overview of Concurrency Control
and Recovery in Distributed Databases
◼ For concurrency control and recovery purposes,
numerous problems arise in a distributed DBMS
environment
◼ Dealing with multiple copies of the data items
The concurrency control method is responsible for
maintaining consistency among these copies.
◼ Failure of individual sites
The DDBMS should continue to operate with its
running sites, if possible, when one or more individual
sites fail.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 21


23.3 Overview of Concurrency Control
and Recovery in Distributed Databases
◼ Failure of communication links
The system must be able to deal with the failure of one or more
of the communication links that connect the sites.
◼ Distributed commit
Problems can arise with committing a transaction that is
accessing databases stored on multiple sites if some sites fail
during the commit process. The two-phase commit protocol (see
Section 21.6) is often used to deal with this problem.
◼ Distributed deadlock
Deadlock may occur among several sites, so techniques for
dealing with deadlocks must be extended to take this into
account

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 22


Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item
To deal with replicated data items in a distributed database, a number of
concurrency control methods have been proposed that extend the
concurrency control techniques that are used in centralized databases.
The idea is to designate a particular copy of each data item as a
distinguished copy.
◼ Primary site technique
◼ In this method, a single primary site is designated to be the
coordinator site for all database items. Hence, all locks are kept at
that site, and all requests for locking or unlocking are sent there
◼ disadvantages. One is that all locking requests are sent to a single
site, possibly overloading that site and causing a system
bottleneck. A second disadvantage is that failure of the primary
site paralyzes the system,.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 23


Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item
◼ Primary site with backup site
◼ This approach addresses the second disadvantage of

the primary site method by designating a second site to


be a backup site. All locking information is maintained
at both the primary and the backup sites. In case of
primary site failure, the backup site takes over as the
primary site, and a new backup site is chosen.
◼ Disadvantages. It slows down the process of acquiring

locks, however, because all lock requests and granting


of locks must be recorded at both the primary and the
backup sites before a response is sent to the
requesting transaction.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 24


Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item
◼ Primary copy method
◼ This method attempts to distribute the load of lock
coordination among various sites by having the
distinguished copies of different data items stored
at different sites.
◼ Failure of one site affects any transactions that are
accessing locks on items whose primary copies
reside at that site, but other transactions are not
affected.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 25


Distributed Concurrency Control
Based on Voting
◼ Voting method
◼ No distinguished copy
◼ Lock requests sent to all sites that contain a copy
◼ Each copy maintains its own lock
◼ If transaction that requests a lock is granted that lock by a majority
of the copies, it holds the lock on all copies
◼ If a transaction does not receive a majority of votes granting it a
lock within a certain time-out period, it cancels its request and
informs all sites of the cancellation.
◼ Time-out period applies
◼ Disadvantage : Results in higher message traffic among sites

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 26


Distributed Recovery
◼ Difficult to determine whether a site is down
without exchanging numerous messages with
other sites
◼ Distributed commit
◼ When a transaction is updating data at several
sties, it cannot commit until certain its effect on
every site cannot be lost
◼ Two-phase commit protocol often used to ensure
correctness

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 27


23.4 Overview of Transaction
Management in Distributed Databases
◼ Global transaction manager
◼ Supports distributed transactions
◼ Role temporarily assumed by site at which
transaction originated
◼ Coordinates execution with transaction managers at
multiple sites
◼ Passes database operations and associated
information to the concurrency controller
◼ Controller responsible for acquisition and release of
locks

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 28


Commit Protocols
◼ Two-phase
◼ Coordinator maintains information needed for
recovery
◼ In addition to local recovery managers
◼ Three-phase
◼ Divides second commit phase into two subphases
◼ Prepare-to-commit phase communicates result of
the vote phase
◼ Commit subphase same as two-phase commit
counterpart

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 29


23.5 Query Processing and Optimization
in Distributed Databases
◼ Stages of a distributed database query
◼ Query mapping
◼ Refers to global conceptual schema
◼ Localization
◼ Maps the distributed query to separate queries on
individual fragments
◼ Global query optimization
◼ Strategy selected from list of candidates
◼ Local query optimization
◼ Common to all sites in the DDB

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 30


Query Processing and Optimization in
Distributed Databases (cont’d.)
◼ Data transfer costs of distributed query
processing
◼ Cost of transferring intermediate and final result
files
◼ Optimization criterion: reducing amount of data
transfer

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 31


Query Processing and Optimization in
Distributed Databases (cont’d.)
◼ Distributed query processing using semijoin
◼ Reduces the number of tuples in a relation before
transferring it to another site
◼ Send the joining column of one relation R to one
site where the other relation S is located
◼ Join attributes and result attributes shipped back
to original site
◼ Efficient solution to minimizing data transfer

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 32


Query Processing and Optimization in
Distributed Databases (cont’d.)
◼ Query and update decomposition
◼ User can specify a query as if the DBMS were
centralized
◼ If full distribution, fragmentation, and replication
transparency are supported
◼ Query decomposition module
◼ Breaks up a query into subqueries that can be
executed at the individual sites
◼ Strategy for combining results must be generated
◼ Catalog stores attribute list and/or guard condition

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 33


23.6 Types of Distributed Database
Systems
◼ Factors that influence types of DDBMSs
◼ Degree of homogeneity of DDBMS software
◼ Homogeneous
◼ Heterogeneous
◼ Degree of local autonomy
◼ No local autonomy
◼ Multidatabase system has full local autonomy
◼ Federated database system (FDBS)
◼ Global view or schema of the federation of
databases is shared by the applications

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 34


Classification of Distributed Databases

Figure 23.6 Classification of distributed databases


Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23-35
Types of Distributed Database
Systems (cont’d.)
◼ Federated database management systems
issues
◼ Differences in data models
◼ Differences in constraints
◼ Differences in query languages
◼ Semantic heterogeneity
◼ Differences in meaning, interpretation, and
intended use of the same or related data

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 36


Types of Distributed Database
Systems (cont’d.)
◼ Design autonomy allows definition of the
following parameters
◼ The universe of discourse from which the data is
drawn
◼ Representation and naming
◼ Understanding, meaning, and subjective
interpretation of data
◼ Transaction and policy constraints
◼ Derivation of summaries

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 37


Types of Distributed Database
Systems (cont’d.)
◼ Communication autonomy
◼ Decide whether to communicate with another
component DBS
◼ Execution autonomy
◼ Execute local operations without interference from
external operations by other component DBSs
◼ Ability to decide order of execution
◼ Association autonomy
◼ Decide whether and how much to share its
functionality and resources

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 38


23.7 Distributed Database
Architectures
◼ Parallel versus distributed architectures
◼ Types of multiprocessor system architectures
◼ Shared memory (tightly coupled)
◼ Shared disk (loosely coupled)
◼ Shared-nothing

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 39


Database System Architectures

Figure 23.7 Some different database system architectures (a) Shared-nothing


architecture (b) A networked architecture with a centralized database at one of
the sites (c) A truly distributed database architecture

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 40


General Architecture of Pure
Distributed Databases
◼ Global query compiler
◼ References global conceptual schema from the
global system catalog to verify and impose defined
constraints
◼ Global query optimizer
◼ Generates optimized local queries from global
queries
◼ Global transaction manager
◼ Coordinates the execution across multiple sites
with the local transaction managers

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 41


Schema Architecture of Distributed
Databases

Figure 23.8 Schema architecture of distributed databases


Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 42
Federated Database Schema
Architecture

Figure 23.9 The five-level schema architecture in


a federated database system (FDBS)
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 43
An Overview of Three-Tier
Client/Server Architecture
◼ Division of DBMS functionality among the three
tiers can vary

Figure 23.10 The three-tier client/server architecture


Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 44
23.8 Distributed Catalog
Management
◼ Centralized catalogs
◼ Entire catalog is stored at one single site
◼ Easy to implement
◼ Fully replicated catalogs
◼ Identical copies of the complete catalog are
present at each site
◼ Results in faster reads
◼ Partially replicated catalogs
◼ Each site maintains complete catalog information
on data stored locally at that site
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 45
23.9 Summary
◼ Distributed database concept
◼ Distribution transparency
◼ Fragmentation transparency
◼ Replication transparency
◼ Design issues
◼ Horizontal and vertical fragmentation
◼ Concurrency control and recovery techniques
◼ Query processing
◼ Categorization of DDBMSs

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 23- 46

You might also like