
Detailed Explanation of Distributed Database Systems Concepts
This document provides a comprehensive overview of Distributed Database Systems, covering fundamental concepts, architecture, design, query processing, transaction management, reliability, and object-oriented aspects.

UNIT-I: Introduction, Architecture, and Design


Introduction

Distributed Data Processing (DDP)

Definition: Distributed Data Processing refers to the collection, processing, and storage of data in multiple interconnected computer systems (nodes) that are geographically dispersed but work together to achieve a common goal. Each node handles a portion of the overall data processing task.
Key Idea: It's about distributing computation and data management across different machines.
Example: A global retail chain processing sales transactions at local stores, then aggregating data at regional centers, and finally at a central headquarters. Each level processes data relevant to its scope.

Distributed Database System (DDBS)

Definition: A Distributed Database System (DDBS) is a collection of multiple logically interrelated databases distributed over a computer network. The system appears as a single logical database to the user, who does not need to know where the data is physically stored or how the operations are distributed.

Key Characteristics:

 Distribution: Data is stored across multiple sites.
 Logical Interrelation: Data at different sites is related and forms a single logical database.
 Network Connectivity: Sites are connected via a communication network.
 Distribution Transparency: Users interact with the system as if it were a single, centralized database.

Diagram (Conceptual DDBS):

graph TD
UserQuery[User Query] --> DBMS1(Distributed DBMS)
DBMS1 --> SiteA(Database Site A)
DBMS1 --> SiteB(Database Site B)
DBMS1 --> SiteC(Database Site C)
SiteA -- Network --> SiteB
SiteB -- Network --> SiteC
Promises of DDBSs (Advantages)

1. Increased Reliability/Availability: If one site fails, other sites can continue to operate,
or its data might be available from a replicated copy.
2. Increased Scalability: New sites/nodes can be added to handle increased data volume or
processing load, allowing horizontal scaling.
3. Reflects Organizational Structure: Data can be stored where it is generated and
primarily used, aligning with organizational distribution.
4. Improved Performance: Data located closer to users reduces access latency. Parallel
processing of queries across multiple sites can also speed up execution.
5. Economic Advantages: It can be cheaper to use a network of smaller computers than a
single large mainframe.
6. Local Autonomy: Each site can maintain a degree of control over its local data, subject
to global consistency constraints.

Problem Areas (Challenges/Disadvantages)

1. Complexity: Designing, implementing, and managing DDBSs is significantly more complex than for centralized systems.
2. Cost: Higher development, management, and network communication costs.
3. Security: Managing security across multiple distributed sites is challenging.
4. Integrity Control: Ensuring global data consistency and integrity across distributed
copies is difficult.
5. Distributed Query Processing: Optimizing queries that span multiple sites is a complex
task.
6. Distributed Transaction Management: Ensuring atomicity, consistency, isolation, and
durability (ACID properties) for transactions across multiple sites is highly challenging,
especially in the presence of failures.
7. Heterogeneity: Integrating different DBMSs or operating systems at various sites.

Distributed DBMS Architecture

Architectural Models for Distributed DBMS

Different ways to structure the components of a DDBMS.

1. Client-Server Architecture:
o Concept: The most common model. Clients submit requests, and servers process
data and return results. The database is often distributed among multiple servers.
o Variants:
 Distributed Presentation: Client handles UI, server handles data.
 Distributed Application: Client handles UI and some application logic,
server handles data and remaining logic.
 Distributed Database: Client handles UI and application, servers manage
data. This is the focus for DDBMS.
o Diagram:
  graph TD
  C1[Client App 1] --> S[Server (DDBMS)]
  C2[Client App 2] --> S
  S --> DB1(Database Site 1)
  S --> DB2(Database Site 2)
  S --> DB3(Database Site 3)

2. Peer-to-Peer Architecture:
o Concept: Each node acts as both a client and a server. There is no central control.
Nodes communicate directly with each other to share data and processing.
o Pros: Highly fault-tolerant, scalable.
o Cons: Complex to manage consistency and discovery, often less suited for
traditional transactional databases.
o Example: Blockchain, some file-sharing systems.
3. Multi-database System (MDBS) Architecture:
o Concept: Integrates multiple existing, independent, and possibly heterogeneous
database systems. It sits on top of these local DBMSs, providing a unified view
without changing them.
o Types:
 Federated Database System (FDBS): Provides a transparent, integrated
view of underlying heterogeneous databases. The FDBS has more control
over component schema integration.
 Gateway Approach: Uses a gateway to translate queries and data
between a global DBMS and a local, heterogeneous DBMS. Less
integrated, more focused on connectivity.
o Diagram (Federated MDBS):
  graph TD
  UserApp[User Application] --> FDBS[Federated DDBMS]
  FDBS --> LDBMS1(Local DBMS 1 - Oracle)
  FDBS --> LDBMS2(Local DBMS 2 - SQL Server)
  FDBS --> LDBMS3(Local DBMS 3 - MySQL)

DDBMS Architecture (Components)

A DDBMS typically consists of several layers/components that interact to manage distributed data.

1. Global Schema (Conceptual Schema): A single, integrated, logical description of the entire distributed database. It hides the distribution and fragmentation details from users.
2. Fragmentation Schema: Describes how global relations are divided into fragments.
3. Allocation Schema: Describes where each fragment is stored (i.e., mapping fragments to
sites).
4. Global Query Processor: Responsible for parsing and validating global queries,
decomposing them into sub-queries, and optimizing their execution across sites.
5. Global Transaction Manager: Coordinates global transactions, ensuring ACID
properties across multiple sites.
6. Local Schema (Internal Schema): The description of the local database at each site.
7. Local DBMS: The local database management system at each site, responsible for
managing local data and executing local operations.
8. Communication Network: Facilitates communication between different sites.

Diagram (DDBMS Architecture - Simplified):

graph TD
User[User/Application] --> GlobalQP[Global Query Processor]
GlobalQP --> FragmentationSchema[Fragmentation Schema]
GlobalQP --> AllocationSchema[Allocation Schema]
GlobalQP --> GTM[Global Transaction Manager]
GTM --> SiteA(Local DBMS A)
GTM --> SiteB(Local DBMS B)
SiteA --- Network(Communication Network) --- SiteB
SiteA --> LocalSchemaA[Local Schema A]
SiteB --> LocalSchemaB[Local Schema B]

Distributed Database Design

Alternative Design Strategies

How to go about designing a distributed database.

1. Top-Down Design:
o Concept: Starts with a global conceptual schema (enterprise-wide view) and then
iteratively fragments and allocates data to different sites.
o Steps: Global conceptual design -> Fragmentation design -> Allocation design.
o Pros: Leads to a consistent global view, well-suited for greenfield (new) systems.
o Cons: Can be complex, might not fit existing legacy systems.
2. Bottom-Up Design:
o Concept: Starts with existing local schemas (often heterogeneous) and then
integrates them to form a global conceptual schema.
o Steps: Local schema design -> Schema integration.
o Pros: Suitable for integrating existing databases (federated systems), preserves
local autonomy.
o Cons: Schema integration can be very challenging (semantic heterogeneity),
potential for inconsistencies.
3. Mixed Design:
o Combines elements of both top-down and bottom-up approaches. Might start with
a partial top-down design for core data and then integrate existing local data.

Distribution Design Issues

Key decisions and challenges in distributing data:

1. Fragmentation: Deciding how to break down global relations into smaller, manageable
units (fragments).
2. Allocation: Deciding where to store these fragments across different sites.
3. Replication: Deciding whether to store multiple copies of data for availability and
performance.
4. Location Transparency: Users should not need to know the physical location of data.
5. Fragmentation Transparency: Users should not need to know how relations are
fragmented.
6. Replication Transparency: Users should not need to know if data is replicated.

Fragmentation

The process of breaking a relation (table) into smaller pieces (fragments) that can be stored at
different sites. The goal is to improve performance, reliability, and local autonomy.

1. Horizontal Fragmentation:
o Concept: Divides a relation into subsets of tuples (rows) based on a predicate
(condition) on one or more attributes. Each fragment has the same schema as the
original relation.
o Types:
 Primary Horizontal Fragmentation: Based on a predicate on the base
relation itself.
 Derived Horizontal Fragmentation: Based on a join predicate with
another relation that is already fragmented.
o Example: EMPLOYEE table fragmented by DEPARTMENT_ID.
 EMP_HR (employees in HR department)
 EMP_SALES (employees in Sales department)
o Diagram:
  graph TD
  EmployeeTable[Employee (EmpID, Name, DeptID, Salary)] --> H1[Fragment 1 (DeptID = 'HR')]
  EmployeeTable --> H2[Fragment 2 (DeptID = 'Sales')]

2. Vertical Fragmentation:
o Concept: Divides a relation into subsets of attributes (columns) plus the primary
key to link back to the original relation. Each fragment has a subset of the original
columns.
o Goal: To improve performance by allowing queries to access only relevant
columns, reducing I/O.
o Example: EMPLOYEE table.
 EMP_PERSONAL (EmpID, Name, Address, Phone)
 EMP_JOB (EmpID, DeptID, Salary, JobTitle)
o Diagram:
  graph TD
  EmployeeTable[Employee (EmpID, Name, DeptID, Salary, Address)] --> V1[Fragment 1 (EmpID, Name, Address)]
  EmployeeTable --> V2[Fragment 2 (EmpID, DeptID, Salary)]
3. Mixed (Hybrid) Fragmentation:
o Concept: A combination of horizontal and vertical fragmentation. A relation is
first horizontally fragmented, and then some (or all) of these fragments are
vertically fragmented. Or vice-versa.
o Example: First fragment EMPLOYEE by DEPARTMENT_ID (horizontal), then
vertically fragment EMP_HR into EMP_HR_PERSONAL and EMP_HR_JOB.

Allocation

The process of deciding at which site(s) each fragment (or non-fragmented relation) will be
stored.

1. Non-redundant Allocation:
o Concept: Each fragment is stored at exactly one site. There are no replicated
copies.
o Pros: Simplest to manage, no replication consistency issues.
o Cons: Low reliability (if site fails, data is unavailable), lower availability,
potentially slower query performance if data is remote.
o Diagram:
  graph LR
  F1[Fragment 1] --> S1(Site 1)
  F2[Fragment 2] --> S2(Site 2)
  F3[Fragment 3] --> S3(Site 3)

2. Redundant Allocation:
o Concept: One or more fragments are stored at multiple sites (replicated). This is
used for reliability and performance.
o Types:
 Replicated Allocation (Full Replication): Every fragment (or the entire
database) is stored at every site.
 Pros: High availability, fast read queries (can query local copy).
 Cons: High update cost (must update all copies), high storage cost,
complex concurrency control.
 Diagram:
  graph LR
  DB[Database] --> S1(Site 1)
  DB --> S2(Site 2)
  DB --> S3(Site 3)

 Partitioned Allocation (Partial Replication): The database is partitioned into fragments, and selected fragments are replicated at a subset of sites (i.e., some fragments have copies at several sites, but the whole database is not copied to every site).
 Pros: Balances reliability, performance, and update cost.
 Cons: More complex to manage than full replication or non-redundant allocation.

Example Scenario (Fragmentation and Allocation): A company has two main offices,
Chennai and Delhi. EMPLOYEE table (EmpID, Name, City, Dept, Salary)

 Horizontal Fragmentation:
o EMP_CHENNAI: (EmpID, Name, City, Dept, Salary) where City = 'Chennai'
o EMP_DELHI: (EmpID, Name, City, Dept, Salary) where City = 'Delhi'
 Allocation:
o EMP_CHENNAI allocated to Site_Chennai.
o EMP_DELHI allocated to Site_Delhi.
 Replication (Partial): If EMP_CHENNAI is frequently accessed from Delhi, a copy might
be allocated to Site_Delhi as well.
o EMP_CHENNAI to Site_Chennai (primary)
o EMP_DELHI to Site_Delhi (primary)
o EMP_CHENNAI (replicated copy) to Site_Delhi
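
To make the scenario concrete, here is a minimal Python sketch (illustrative rows and site names only, not tied to any particular DBMS) of primary horizontal fragmentation by City, an allocation schema, and the localization of a city-specific query to the single site that holds the matching fragment:

# Toy sketch (illustrative only): primary horizontal fragmentation of EMPLOYEE
# by City, allocation of fragments to sites, and localization of a query.
employees = [
    {"EmpID": 1, "Name": "Asha",  "City": "Chennai", "Dept": "Sales", "Salary": 50000},
    {"EmpID": 2, "Name": "Ravi",  "City": "Delhi",   "Dept": "HR",    "Salary": 45000},
    {"EmpID": 3, "Name": "Meena", "City": "Chennai", "Dept": "HR",    "Salary": 52000},
]

# Fragmentation schema: one predicate per fragment.
fragments = {
    "EMP_CHENNAI": lambda row: row["City"] == "Chennai",
    "EMP_DELHI":   lambda row: row["City"] == "Delhi",
}

# Allocation schema: fragment -> site.
allocation = {"EMP_CHENNAI": "Site_Chennai", "EMP_DELHI": "Site_Delhi"}

# Materialize the fragments (each site would store only its own rows).
stored = {name: [r for r in employees if pred(r)] for name, pred in fragments.items()}

def localize(city):
    # Map a query "WHERE City = <city>" onto the single relevant fragment.
    for name, pred in fragments.items():
        if pred({"City": city}):
            return name, allocation[name]
    return None, None

frag, site = localize("Chennai")
print(frag, "at", site, "->", stored[frag])   # only Site_Chennai needs to be contacted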

UNIT-II: Query Processing and Optimization


Query Processing and Decomposition

Query Processing: The activities involved in retrieving data from a database. In a DDBMS, this
becomes complex as data is distributed.

Query Processing Objectives

1. Minimize Response Time: Reduce the time elapsed between submitting a query and
receiving the result.
2. Minimize Total Cost: Minimize the sum of I/O cost, CPU cost, and communication cost
(which is dominant in distributed systems).
3. Maximize Throughput: Handle as many queries as possible per unit of time.

Characterization of Query Processors

Query processors are characterized by:

 Language of Interface: SQL, relational algebra, etc.


 Optimization Type: Heuristic-based, cost-based.
 Processing Strategy: Centralized, distributed.
 Time of Optimization: Static (compile-time) vs. Dynamic (run-time).
 Optimization Scope: Single query vs. multiple queries.

Layers of Query Processing

A typical DDBMS query processor operates in several layers:

1. Query Decomposition:
o Input: Global query (e.g., SQL query).
o Tasks: Parsing, semantic analysis, query validation, simplification, and transformation into a canonical form (e.g., relational algebra tree). This phase largely ignores data distribution.
o Output: Relational algebra tree (or similar intermediate representation).
2. Data Localization:
o Input: Relational algebra tree from decomposition.
o Tasks: Translates the global query into a distributed query plan by substituting
global relations with their fragments based on fragmentation and allocation
schemas. Identifies fragments involved and their locations. This step aims to find
an optimal execution strategy.
o Output: Distributed execution plan (sequence of local operations and inter-site
data transfers).
3. Global Optimization (Distributed Query Optimization):
o Input: Distributed execution plan.
o Tasks: Further optimizes the distributed plan considering communication costs
(major factor), parallel execution possibilities, and local processing costs. This
includes decisions on join order, data movement, and semi-join applications.
o Output: Optimized distributed query plan.
4. Local Optimization (Local Query Processor):
o Input: Sub-queries for individual sites.
o Tasks: Each local DBMS optimizes the received sub-query for its local data,
similar to a centralized DBMS query optimizer (e.g., choosing access paths, local
join strategies).
o Output: Local execution plan.

Query Decomposition

The initial phase of query processing that transforms a high-level query (e.g., SQL) into an
equivalent relational algebra expression, checks its validity, and performs initial simplifications.

 Steps:
1. Parsing and Translation: Translate the SQL query into an internal
representation (e.g., parse tree, relational algebra tree).
2. Semantic Analysis: Check for correctness (e.g., relations and attributes exist,
type compatibility).
3. Query Simplification: Remove redundant predicates, eliminate common
subexpressions.
4. Query Restructuring: Apply algebraic equivalences to transform the query into
a form that might be easier to optimize (e.g., push selections/projections down the
tree).

Localization of Distributed Data


The phase where the DDBMS maps the relational algebra operations (from query
decomposition) to operations on specific fragments at specific sites. This makes the query
executable in a distributed environment.

 Substitution: Replace global relation names with their fragment names.


 Fragment Schema Mapping: For horizontal fragmentation, conditions applied to a
global relation are translated into conditions on its fragments. For vertical fragmentation,
join operations might be introduced to reconstruct the original relation from its
fragments.
 Handling Replication: If data is replicated, the optimizer needs to decide which copy to
access (e.g., the local copy, or the copy closest to the execution site).
 Example: Query SELECT * FROM Employee WHERE DeptID = 'Sales'
o If Employee is horizontally fragmented into EMP_HR and EMP_SALES:
o The query would be localized to SELECT * FROM EMP_SALES.
o If EMP_SALES is at Site 2, the query is directed to Site 2.

Distributed Query Optimization

Query Optimization: The process of choosing the most efficient execution plan for a query. In
distributed systems, communication cost is typically the dominant factor.

Centralized Query Optimization

Refers to the optimization techniques used within a single, centralized DBMS.

 Goal: Minimize total cost (I/O, CPU).


 Techniques:
o Heuristic-based: Apply rules of thumb (e.g., "perform selection/projection as
early as possible," "join smaller relations first").
o Cost-based: Uses statistical information about data (e.g., selectivity, cardinality)
and estimated costs of operations to compare different execution plans and choose
the cheapest one.
 Cost Model: Estimates I/O cost (number of block accesses), CPU cost (number of
instructions).

Distributed Query Optimization Algorithms

Focus on minimizing communication cost in addition to local processing costs.

 Types of Costs Considered:


o Local Processing Cost: CPU and I/O costs at each site.
o Communication Cost: Cost of transferring data between sites (dominant). This
includes initiation cost (fixed cost per message) and transmission cost
(proportional to data volume).
 Strategies:
1. Query Tree Transformation: Reorder operations (joins, selections, projections)
in the relational algebra tree to reduce intermediate results.
2. Join Ordering: Deciding the order in which relations are joined. This is critical
for minimizing intermediate data transfer.
 Brute-force: Test all possible join orders (impractical for many relations).
 Dynamic Programming: For a moderate number of relations.
 Greedy Heuristics: Iteratively pick the best join to perform next.
3. Semi-Join:
 Concept: A technique to reduce the size of relations before they are
transferred for a join operation.
 Operation (R ⋈A S):
1. Project S on the join attribute A: Proj_A(S)
2. Send Proj_A(S) to the site of R.
3. Perform a selection on R: Select_A_in_Proj_A(S)(R)
4. Send the reduced R to the site of S.
5. Perform the final join.
 Pros: Can significantly reduce communication cost if the selection is
highly selective.
 Cons: Adds overhead (extra messages, local processing for projection and
selection).
o Materialization vs. Pipelining:
 Materialization: Compute and store intermediate results before passing
them to the next operation.
 Pipelining: Pass results directly from one operation to the next without
materializing. Pipelining is generally preferred for distributed queries to
avoid unnecessary disk I/O.
o Hybrid Algorithms: Combine elements of different strategies (e.g., heuristic-
based with cost estimation for specific distributed operations).

Example (Distributed Join Optimization): Relations R(A, B) at Site 1 and S(B, C) at Site 2.
Query: R JOIN S.

 Option 1 (Ship R to Site 2):


1. Transfer R from Site 1 to Site 2.
2. Perform R JOIN S at Site 2.
o Cost = Transfer(R) + Local_Join(R, S).
 Option 2 (Ship S to Site 1):
1. Transfer S from Site 2 to Site 1.
2. Perform R JOIN S at Site 1.
o Cost = Transfer(S) + Local_Join(R, S).
 Option 3 (Semi-join):
1. At Site 1: compute Proj_B(R).
2. Send Proj_B(R) to Site 2.
3. At Site 2: S' = S semi-join R (i.e., Select B in Proj_B(R) (S)).
4. Send S' to Site 1.
5. At Site 1: R JOIN S'.
o Cost = Transfer(Proj_B(R)) + Local_SemiJoin(S) + Transfer(S') + Local_Join(R, S').
o Semi-join is beneficial if S' is significantly smaller than S.

The optimizer compares the estimated costs of these and other possible plans to choose the best
one.
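
The comparison can be made concrete with a toy cost model. The sketch below (Python, with assumed relation and projection sizes, counting only bytes transferred) ranks the three plans; with different sizes the ranking changes, which is exactly what the optimizer's cost estimates capture:

# Illustrative cost model (assumed sizes, communication cost only) comparing
# the three plans for R(A,B) at Site 1 joining S(B,C) at Site 2.
size_R = 1_000_000          # bytes of R
size_S = 5_000_000          # bytes of S
size_proj_B_R = 40_000      # bytes of Proj_B(R), the join-attribute projection
size_S_reduced = 300_000    # bytes of S' = S semi-joined with R

plans = {
    "ship R to Site 2": size_R,
    "ship S to Site 1": size_S,
    "semi-join (send Proj_B(R), get back S')": size_proj_B_R + size_S_reduced,
}

for plan, comm_cost in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{plan}: {comm_cost:,} bytes transferred")
# With these assumed sizes the semi-join plan transfers the least data;
# if S' were nearly as large as S, shipping R would win instead.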

UNIT-III: Transaction Management


Transaction Management

Transaction: A logical unit of work that accesses and possibly modifies the contents of a
database. It is a sequence of operations (read, write, update, delete) that are performed as a
single, atomic unit.

Properties of Transaction (ACID)

These properties guarantee that database transactions are processed reliably.

1. Atomicity:
o Concept: A transaction is treated as a single, indivisible unit of work. Either all of
its operations are completed successfully (committed), or none of them are
(aborted/rolled back). There is no "half-finished" state.
o Example: A money transfer from account A to account B involves two operations: Debit A and Credit B. If Debit A succeeds but Credit B fails, the entire transaction is rolled back, and A's balance is restored (a small code sketch follows this list).
2. Consistency:
o Concept: A transaction brings the database from one consistent state to another
consistent state. It ensures that any data written to the database must be valid
according to all defined rules and constraints (e.g., integrity constraints, business
rules).
o Example: In a banking system, the sum of balances in all accounts must remain
constant before and after a transfer, assuming no money is created or destroyed.
3. Isolation:
o Concept: Transactions are executed in isolation from each other. The
intermediate state of a transaction is not visible to other concurrent transactions
until it commits. This prevents interference problems (e.g., dirty reads, non-
repeatable reads, phantom reads).
o Example: If two transactions simultaneously try to update the same account
balance, isolation ensures that one transaction completes before the other's
changes are applied, or they are processed in a way that avoids conflicts, giving
the impression of sequential execution.
4. Durability:
o Concept: Once a transaction is committed, its changes are permanent and will
survive any subsequent system failures (e.g., power loss, system crash). This is
typically ensured by writing changes to non-volatile storage (e.g., disk) and
logging.
o Example: After a money transfer transaction commits, even if the system crashes
immediately after, the updated balances in both accounts will persist when the
system recovers.
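
As referenced under Atomicity, here is a minimal sketch of the all-or-nothing transfer using Python's built-in sqlite3. It is a centralized stand-in for the commit/rollback behaviour, not a distributed protocol; account names, balances, and the simulated failure are made up for illustration.

# Minimal sketch of atomic behaviour (centralized, using Python's built-in sqlite3):
# either both the debit and the credit happen, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if amount > 100:
            raise RuntimeError("credit step failed")   # simulate a failure between debit and credit
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()          # both changes become durable together
    except Exception:
        conn.rollback()        # A's balance is restored; no half-finished state

transfer(200)                  # fails: rolled back
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'A': 100, 'B': 50}
transfer(30)                   # succeeds: committed
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'A': 70, 'B': 80}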

Types of Transactions

1. Flat Transactions:
o Concept: The traditional transaction model. A single, indivisible unit of work. If
any part fails, the entire transaction aborts.
o Pros: Simple to implement and manage.
o Cons: Lacks flexibility for complex applications, limits concurrency for long-
running tasks.
2. Nested Transactions:
o Concept: A transaction can contain sub-transactions. If a sub-transaction aborts,
its effects are rolled back, but the parent transaction can continue, potentially
trying an alternative sub-transaction. Only the top-level transaction's commit is
permanent.
o Pros: Increased modularity, improved concurrency (sub-transactions can run in
parallel), better fault tolerance.
o Cons: More complex recovery and concurrency control.
3. Long-Duration (Long-Lived) Transactions:
o Concept: Transactions that execute for a long time (minutes, hours, or days),
often involving human interaction or external events. They violate the traditional
isolation property to allow other transactions to progress.
o Challenges: Traditional locking mechanisms would hold resources for too long,
causing poor concurrency.
o Approaches: Compensating transactions, sagas, loosening ACID properties (e.g.,
using weaker isolation levels), multi-version concurrency control.
o Example: Workflow management, CAD/CAM systems, complex scientific
simulations.

Distributed Concurrency Control

Definition: The process of coordinating concurrent transactions across multiple sites in a DDBMS to ensure that the overall execution is correct and consistent (i.e., satisfies serializability).

Serializability

 Concept: The main correctness criterion for concurrency control. It ensures that the
concurrent execution of multiple transactions is equivalent to some serial (sequential)
execution of those same transactions.
 Importance: If an execution is serializable, it means the database remains consistent,
even with concurrent access.
 Global Serializability: In a DDBMS, not only must local executions at each site be
serializable, but their combined effect must also be globally serializable. This is often
achieved by ensuring that the order of commits of global transactions is the same at all
participating sites.

Concurrency Control Mechanisms & Algorithms

Strategies to ensure serializability.

1. Locking Protocols (Two-Phase Locking - 2PL):


o Concept: The most widely used mechanism. Transactions acquire locks on data
items before accessing them. Locks prevent other transactions from conflicting
accesses.
o Types of Locks:
 Shared (S) Lock: Allows multiple transactions to read an item
concurrently.
 Exclusive (X) Lock: Allows only one transaction to write to an item (also
prevents reads).
o Two-Phase Locking (2PL):
 Growing Phase: A transaction can acquire new locks but cannot release
any.
 Shrinking Phase: A transaction can release locks but cannot acquire any
new ones.
 Strict 2PL: All exclusive locks are held until the transaction commits or
aborts. (Prevents dirty reads).
o Distributed 2PL:
 Centralized 2PL: A single lock manager (at one site) manages all locks.
Simple, but a single point of failure and bottleneck.
 Primary Copy 2PL: One copy of each replicated data item is designated
as the primary copy. All locks for that item are managed by the site
holding the primary copy.
 Distributed 2PL (Decentralized): Each site has its own local lock
manager. To acquire a lock on a replicated item, a transaction might need
to acquire locks on all copies (read-any, write-all approach).
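
The following is a minimal sketch of the lock-compatibility core that all 2PL variants share: shared locks are compatible with each other, exclusive locks conflict with everything. Waiting queues, lock upgrades, deadlock handling, and distribution across sites are deliberately omitted.

# Minimal sketch of a lock table with shared (S) / exclusive (X) compatibility,
# the building block of 2PL.
class LockManager:
    def __init__(self):
        self.locks = {}                               # item -> {"mode": "S"/"X", "holders": set}

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if entry["mode"] == "S" and mode == "S":
            entry["holders"].add(txn)                 # many concurrent readers are fine
            return True
        if txn in entry["holders"] and entry["mode"] == mode:
            return True                               # transaction already holds this lock
        return False                                  # conflict: the caller must wait or abort

    def release_all(self, txn):
        # Under strict 2PL this is called only when the transaction commits or aborts.
        for item in list(self.locks):
            self.locks[item]["holders"].discard(txn)
            if not self.locks[item]["holders"]:
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "balance", "S"))   # True
print(lm.acquire("T2", "balance", "S"))   # True  (S is compatible with S)
print(lm.acquire("T2", "balance", "X"))   # False (conflicts with T1's shared lock)
lm.release_all("T1")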

Time-Stamped Concurrency Control Algorithms

 Concept: Each transaction is assigned a unique timestamp at its start. This timestamp
determines the serial order of transactions. Operations are validated based on these
timestamps.
 Timestamp Ordering (TO):
o Each data item has a Read Timestamp (RTS) and a Write Timestamp (WTS)
indicating the timestamp of the last transaction that read/wrote it.
o Read Operation (T attempts to read X): If TS(T) < WTS(X), T is trying to read
an "old" version of X that has already been overwritten by a younger transaction.
T aborts and restarts with a new timestamp. Otherwise, read X and update RTS(X)
= max(RTS(X), TS(T)).
o Write Operation (T attempts to write X): If TS(T) < RTS(X) or TS(T) <
WTS(X), T is trying to write an "old" version or overwrite a value already
read/written by a younger transaction. T aborts and restarts. Otherwise, write X
and update WTS(X) = TS(T).
 Pros: Does not cause deadlocks (transactions are simply aborted).
 Cons: Can lead to cascade aborts, low concurrency for certain workloads, high restart
rate.
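
A minimal sketch of the basic timestamp-ordering rules above, keeping RTS/WTS per data item and signalling an abort when an operation arrives "too late" (illustrative code, not a full scheduler):

# Each item keeps RTS (largest read timestamp) and WTS (largest write timestamp).
rts = {}   # item -> read timestamp
wts = {}   # item -> write timestamp

def read(ts, item):
    if ts < wts.get(item, 0):
        return "ABORT"                       # the item was already overwritten by a younger txn
    rts[item] = max(rts.get(item, 0), ts)
    return "OK"

def write(ts, item):
    if ts < rts.get(item, 0) or ts < wts.get(item, 0):
        return "ABORT"                       # a younger txn already read or wrote the item
    wts[item] = ts
    return "OK"

print(write(5, "x"))    # OK    (WTS(x) = 5)
print(read(7, "x"))     # OK    (RTS(x) = 7)
print(write(6, "x"))    # ABORT (a younger txn with timestamp 7 already read x)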

Optimistic Concurrency Control Algorithms

 Concept: "Validation-based" approach. Transactions execute without explicit concurrency control mechanisms (like locks) during their execution phase. Conflicts are checked only at commit time.
 Phases:
1. Read Phase: Transaction reads data, updates are made to local copies. No locks.
2. Validation Phase: Before committing, the system checks if the transaction's
operations conflict with any committed concurrent transactions.
3. Write Phase: If validation succeeds, changes are made permanent. If it fails, the
transaction aborts and restarts.
 When it works best: High transaction throughput and low data contention (few
conflicts).
 Pros: Higher concurrency when conflicts are rare, no deadlocks.
 Cons: High rollback cost when conflicts are frequent, non-recoverable schedules possible
without careful design.
 Distributed Optimistic Concurrency Control: Involves coordinating validation across
multiple sites.
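
A minimal sketch of the validation phase (backward validation: the committing transaction is checked against the write sets of transactions that committed during its lifetime); the distributed coordination of validation across sites is omitted:

# Minimal sketch of validation-based (optimistic) concurrency control.
committed_writes = []        # list of (commit_order, write_set) of committed transactions

def validate_and_commit(start_order, read_set, write_set):
    for order, ws in committed_writes:
        # Conflict if someone committed after we started and wrote something we read.
        if order > start_order and ws & read_set:
            return "ABORT"   # restart the transaction with a fresh view of the data
    committed_writes.append((len(committed_writes) + 1, set(write_set)))
    return "COMMIT"

# T1 starts at order 0, reads {x}, writes {x}; it validates and commits first.
print(validate_and_commit(0, {"x"}, {"x"}))       # COMMIT
# T2 also started at order 0 and read x, which T1 has since overwritten.
print(validate_and_commit(0, {"x", "y"}, {"y"}))  # ABORT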

Deadlock Management

Deadlock: A state where two or more transactions are indefinitely waiting for each other to
release resources (locks) that they need.

Example (Deadlock):

 Transaction T1 holds lock on A, requests lock on B.


 Transaction T2 holds lock on B, requests lock on A.
 Both wait indefinitely.

Methods for Deadlock Management:

1. Deadlock Prevention:
o Concept: Design the system to ensure deadlocks can never occur.
o Techniques:
 Pre-claiming: Transactions must acquire all necessary locks at once at the beginning. If any lock is unavailable, none are acquired.
 Ordering of Resources: Impose a total ordering on all resources.
Transactions must request locks in increasing order of resource numbers.
 Wait-Die: If TS(T_i) < TS(T_j) (T_i is older) and T_i requests a lock
held by T_j, T_i waits. If TS(T_i) > TS(T_j) (T_i is younger), T_i dies
(aborts) and restarts.
 Wound-Wait: If TS(T_i) < TS(T_j) (T_i is older) and T_i requests a
lock held by T_j, T_j is wounded (aborts) and releases its lock. If TS(T_i)
> TS(T_j) (T_i is younger), T_i waits.
o Pros: Guarantees no deadlocks.
o Cons: Can lead to low resource utilization or unnecessary aborts.
2. Deadlock Detection and Recovery:
o Concept: Allow deadlocks to occur, detect them, and then recover by aborting
one or more transactions.
o Steps:
1. Detection: Maintain a Wait-For Graph (WFG). Nodes are transactions,
directed edge T_i -> T_j exists if T_i is waiting for a resource held by
T_j. A cycle in the WFG indicates a deadlock.
2. Recovery: If a cycle is detected, select a "victim" transaction to abort. The
victim releases its locks, allowing other transactions to proceed. The
victim is then restarted.
 Victim Selection Criteria: Minimum cost, least progress,
youngest transaction, etc.
o Distributed Deadlock Detection: More complex.
 Centralized: A single site collects local WFGs from all sites and builds a
global WFG. Single point of failure.
 Distributed: Sites cooperate to detect cycles. Each site maintains its local WFG and exchanges probes/messages with other sites (e.g., the Chandy-Misra-Haas edge-chasing algorithm, distributed WFG algorithms).
 Hierarchical: Combines centralized and distributed approaches for large
systems.
o Pros: Higher resource utilization, more efficient if deadlocks are rare.
o Cons: Performance overhead of detection, cost of aborting transactions.
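
A minimal sketch of the detection step: build a Wait-For Graph and search it for a cycle with depth-first search. A centralized detector would run this on the union of the local WFGs collected from every site; the probe-based distributed algorithms avoid materializing the global graph and are beyond this sketch.

# Deadlock detection on a Wait-For Graph (WFG) via depth-first search.
def find_cycle(wfg):
    # wfg maps each transaction to the set of transactions it is waiting for.
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wfg.get(t, ()):
            if u in on_stack:
                return True                 # back edge: a cycle, i.e., a deadlock
            if u not in visited and dfs(u):
                return True
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in wfg if t not in visited)

# T1 waits for T2 and T2 waits for T1 -> deadlock; T3 waits for T1 only.
print(find_cycle({"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}))   # True
print(find_cycle({"T1": {"T2"}, "T2": set(), "T3": {"T1"}}))    # False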

UNIT-IV: Distributed DBMS Reliability & Parallel Database Systems

Distributed DBMS Reliability

Reliability: The probability that a system will operate without failure for a specified time
interval. Availability: The fraction of time that a system is available for use.

Reliability Concepts and Measures


 MTBF (Mean Time Between Failures): Average time a system operates before a failure
occurs. Higher is better.
 MTTR (Mean Time To Repair): Average time it takes to repair a failed system and
restore it to operation. Lower is better.
 Availability Calculation: Availability = MTBF / (MTBF + MTTR). For example (illustrative numbers), with an MTBF of 990 hours and an MTTR of 10 hours, availability = 990 / (990 + 10) = 0.99, i.e., about 99%.
 Failure: Any deviation from the specified behavior of the system.
 Error: A state of the system that can lead to failure.
 Fault: The cause of an error.

Fault-tolerance in Distributed Systems

The ability of a system to continue operating correctly even in the presence of failures.

 Redundancy: Key principle. Replicating components (hardware, software, data) so that if one fails, a backup can take over.
o Hardware Redundancy: Multiple CPUs, power supplies, disks (RAID).
o Software Redundancy: Replicated processes, redundant code.
o Information Redundancy: Replicated data (as in DDBS), error-correcting codes.
o Time Redundancy: Retrying operations.
 Checkpointing: Periodically saving the state of the system to stable storage, so that in
case of failure, recovery can start from the last checkpoint instead of from scratch.
 Recovery: Procedures to restore the system to a consistent state after a failure.
 Isolation: Limiting the impact of a failure to the smallest possible part of the system.

Failures in Distributed DBMS

DDBSs are prone to various types of failures, which are more complex than in centralized
systems due to distributed components and networks.

1. Transaction Failures:
o Logical Errors: Bugs in application programs or database integrity violations.
o System Errors: Software bugs, resource exhaustion.
o User Errors: Incorrect input, accidental deletion.
o Action: Transaction rollback/abort.
2. System Failures (Site Failures):
o Concept: A single computer site in the DDBMS crashes (e.g., power failure,
operating system crash, hardware malfunction). The site stops functioning, but
other sites might still be operational.
o Impact: Transactions active at the failed site are lost. Data exclusively at that site
becomes unavailable.
o Recovery: Requires restarting the site, restoring its local database from
logs/backups, and coordinating with other sites for global consistency.
3. Media Failures:
o Concept: Non-volatile storage (disks) containing database data or logs becomes
corrupted or inaccessible.
o Impact: Loss of persistent data.
o Recovery: Requires restoring data from backups and replaying logs. Data replication across sites is crucial for quick recovery.
4. Communication Failures (Network Partitioning):
o Concept: The network connecting sites fails, leading to lost messages or network
partitioning (the network splits into two or more disconnected components). Sites
within a component can communicate, but not with sites in other components.
o Impact: Transactions requiring communication across partitions cannot commit.
Can lead to "split-brain" syndrome if not handled carefully, where sites in
different partitions independently update data, leading to inconsistency.
o Recovery: Requires detecting partitions, resolving conflicts, and merging
consistent states after the network is restored.

Local & Distributed Reliability Protocols

Protocols to ensure atomicity and durability in the presence of failures.

1. Local Recovery Protocols:


o Concept: Used by individual sites to recover their local databases after a crash.
o Components:
 Log (Journal): Records all database changes (undo and redo
information).
 Checkpoints: Periodic snapshots of the database state.
o Process: After a crash, the local DBMS uses the log and the latest checkpoint to:
 Redo committed transactions: Apply changes from transactions that
committed before the crash.
 Undo uncommitted transactions: Rollback changes from transactions
that were active but not committed at the time of the crash.
2. Distributed Reliability Protocols (Two-Phase Commit - 2PC):
o Concept: The standard protocol for ensuring atomicity of distributed transactions
(all-or-nothing property) across multiple participating sites. It guarantees that
either all sites commit the transaction or all abort it.
o Components:
 Coordinator: One site that initiates and coordinates the commit process
for a global transaction.
 Participants: All other sites involved in the transaction.
o Phases:
1. Phase 1: Voting Phase (Prepare):
 Coordinator sends a PREPARE message to all participants.
 Each participant executes the transaction up to the point of
commit, writes all changes to its local stable storage, and then
votes:
 If it can commit, it sends a VOTE_COMMIT message (or YES).
 If it cannot, it sends a VOTE_ABORT message (or NO).
2. Phase 2: Decision Phase (Commit/Abort):
 The coordinator collects all votes:
 If all participants sent VOTE_COMMIT, the coordinator decides to GLOBAL_COMMIT, writes this decision to its log, and sends GLOBAL_COMMIT messages to all participants.
 If any participant sent VOTE_ABORT (or a timeout occurs),
the coordinator decides to GLOBAL_ABORT, writes this
decision to its log, and sends GLOBAL_ABORT messages to
all participants.
 Participants, upon receiving the GLOBAL_COMMIT or GLOBAL_ABORT
message, make the final decision locally and send an ACK to the
coordinator.
o Diagram (2PC - Successful Commit):
  sequenceDiagram
  participant C as Coordinator
  participant P1 as Participant 1
  participant P2 as Participant 2
  C->>P1: Prepare(T)
  C->>P2: Prepare(T)
  P1->>C: Vote_Commit
  P2->>C: Vote_Commit
  C->>P1: Global_Commit
  C->>P2: Global_Commit
  P1->>C: ACK
  P2->>C: ACK

o Pros: Guarantees atomicity.


o Cons: Blocking protocol (if coordinator fails, participants might block
indefinitely), high communication overhead.
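
A minimal, failure-free sketch of the coordinator's decision rule (message passing is simulated with plain function calls; the participant names and vote strings are made up, and a real implementation would also force-write log records and handle timeouts and recovery):

# Minimal, failure-free sketch of the 2PC decision rule.
def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant to prepare and collect its vote.
    votes = {name: prepare() for name, prepare in participants.items()}

    # Phase 2 (decision): commit only if *every* vote was "commit".
    decision = ("GLOBAL_COMMIT"
                if all(v == "VOTE_COMMIT" for v in votes.values())
                else "GLOBAL_ABORT")
    for name in participants:
        print(f"send {decision} to {name}")
    return decision

# Both participants can commit -> global commit.
print(two_phase_commit({"P1": lambda: "VOTE_COMMIT", "P2": lambda: "VOTE_COMMIT"}))
# One participant votes abort -> everyone aborts.
print(two_phase_commit({"P1": lambda: "VOTE_COMMIT", "P2": lambda: "VOTE_ABORT"}))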

Site Failures and Network Partitioning

 Site Failures: (As described in "Failures in Distributed DBMS")


o Impact on 2PC:
 Participant fails before voting: Coordinator times out, decides to abort.
 Participant fails after voting but before final decision: Blocks, waits
for coordinator.
 Coordinator fails after sending PREPARE but before collecting all votes:
Participants are blocked, cannot make a decision.
 Coordinator fails after GLOBAL_COMMIT but before all ACKs: Some
participants might commit, some might not. Recovery protocols needed.
 Network Partitioning: (As described in "Failures in Distributed DBMS")
o Impact on 2PC: If a partition occurs during 2PC, some sites might not receive
coordinator messages, leading to blocking or inconsistent states.
o Split-Brain Syndrome: When a network partition makes a single logical
database appear as two separate, active databases. Each side of the partition might
independently proceed with updates, leading to data divergence when the partition
heals.
o Solutions: Quorum-based protocols (e.g., Paxos, Raft, majority voting) to ensure
consistency even during partitions. Only a majority partition is allowed to proceed
with updates.

Parallel Database Systems

Parallel Database System: A database system that runs on multiple processors and disks,
designed to perform operations in parallel, significantly improving query processing and
transaction throughput.

Parallel Database System Architectures

Categorized by how components share resources.

1. Shared-Memory Architecture:
o Concept: Multiple CPUs share a common main memory and common disks.
o Pros: Easy to program and load balance, low communication overhead (via
shared memory).
o Cons: Limited scalability (shared memory becomes a bottleneck), not fault-
tolerant beyond single node.
o Diagram:
  graph TD
  CPU1[CPU 1] --> SharedMemory(Shared Memory)
  CPU2[CPU 2] --> SharedMemory
  CPU3[CPU 3] --> SharedMemory
  SharedMemory --> SharedDisk(Shared Disk Array)

o Example: Multi-core server with single database instance.


2. Shared-Disk Architecture:
o Concept: Multiple CPUs, each with its own private memory, but all share access
to the same set of disks.
o Pros: Scalable (more CPUs can be added), good for data-intensive applications,
improved fault tolerance (if one CPU fails, others can access data).
o Cons: Complex concurrency control (cache coherence issues across memories),
high inter-node communication for lock management, I/O bottleneck at shared
disks.
o Diagram:
  graph TD
  CPU1[CPU 1] --- M1(Memory 1)
  CPU2[CPU 2] --- M2(Memory 2)
  CPU1 & CPU2 --> SharedDisk(Shared Disk Array)

o Example: Oracle Real Application Clusters (RAC).


3. Shared-Nothing Architecture (Massively Parallel Processing - MPP):
o Concept: Each node has its own CPU, memory, and disks. Nodes communicate
only by passing messages over a high-speed interconnect. No shared resources.
o Pros: Highly scalable (linear scalability often achieved), high fault tolerance
(failure of one node doesn't affect others' data), eliminates shared resource
bottlenecks.
o Cons: More complex to design and manage, data redistribution for certain queries
can be costly.
o Diagram:
  graph TD
  N1(Node 1) -- interconnect --> N2(Node 2)
  N1 -- interconnect --> N3(Node 3)
  N2 -- interconnect --> N3
  N1 --> CPU1[CPU 1] & M1[Memory 1] & D1[Disk 1]
  N2 --> CPU2[CPU 2] & M2[Memory 2] & D2[Disk 2]
  N3 --> CPU3[CPU 3] & M3[Memory 3] & D3[Disk 3]

o Example: Teradata, Greenplum, Google Spanner, many modern data warehouses.

Parallel Data Placement

How data is distributed across disks in a parallel database. Also known as Data Partitioning or
Data Distribution.

1. Horizontal Partitioning (Sharding):


o Concept: Rows of a table are distributed across different disks/nodes.
o Types:
 Hash Partitioning: Rows are assigned to partitions based on a hash
function of one or more column values. Provides good load balancing.
 Range Partitioning: Rows are assigned to partitions based on a range of
values in a column. Good for range queries.
 Round-Robin Partitioning: Rows are assigned sequentially to partitions.
Simple, good for uniform distribution.
o Pros: Parallel scan, parallel updates (a small sketch comparing the three schemes follows this list).
o Cons: Range queries might be inefficient with hash, hot spots with range.
2. Vertical Partitioning:
o Concept: Columns of a table are distributed across different disks/nodes (plus
primary key).
o Pros: Efficient for queries accessing only a subset of columns, reduces I/O.
o Cons: Requires joins to reconstruct original rows, not ideal for full row retrieval.
3. Hybrid Partitioning:
o Combines horizontal and vertical partitioning. (e.g., a table is horizontally
partitioned, and then each horizontal fragment is vertically partitioned).
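
As referenced above, here is a minimal sketch contrasting the three horizontal placement schemes on the same rows (the node count, keys, and range boundaries are made up for illustration):

# Minimal sketch: the same rows are placed onto 3 nodes by hash, by range,
# and round-robin.
NODES = 3
rows = [("k%02d" % i, i * 10) for i in range(1, 10)]   # (key, value) pairs

def hash_partition(key):
    return hash(key) % NODES                # good load balance, poor for range scans

def range_partition(value, boundaries=(30, 60)):
    for node, upper in enumerate(boundaries):
        if value <= upper:
            return node                     # good for range queries, risk of hot spots
    return len(boundaries)

def round_robin_partition(position):
    return position % NODES                 # uniform spread, but every query touches all nodes

for pos, (key, value) in enumerate(rows):
    print(key, hash_partition(key), range_partition(value), round_robin_partition(pos))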

Parallel Query Processing

Executing different parts of a query or multiple queries concurrently.

1. Inter-query Parallelism (Transaction Parallelism):


o Concept: Different queries or transactions are executed in parallel on different
nodes.
o Goal: Increase transaction throughput.
o Example: Multiple users running different reports simultaneously.
2. Intra-query Parallelism (Query Parallelism):
o Concept: A single query is broken down into sub-operations, which are then
executed in parallel.
o Types:
 Inter-operation Parallelism: Different operations within the same query
run in parallel (e.g., a join and a selection running concurrently).
 Intra-operation Parallelism (Data Parallelism): The same operation is
executed in parallel on different partitions of the data.
 Example: A SUM aggregation on a large table partitioned across
multiple disks. Each processor calculates the sum for its local
partition in parallel, and then partial sums are combined.

Diagram (Intra-operation Parallelism - Scan):

graph TD
Table[Large Table] --> P1[Partition 1]
Table --> P2[Partition 2]
Table --> P3[Partition 3]
P1 --> Scan1[Scan Op (Node 1)]
P2 --> Scan2[Scan Op (Node 2)]
P3 --> Scan3[Scan Op (Node 3)]
Scan1 & Scan2 & Scan3 --> Combine[Combine Results]
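
A minimal sketch of intra-operation (data) parallelism using Python's multiprocessing: one worker per partition computes a partial SUM, and the partial sums are then combined, mirroring the aggregation example above.

# One worker per partition computes a partial sum; the results are combined.
from multiprocessing import Pool

def parallel_sum(partitions):
    with Pool(processes=len(partitions)) as pool:
        partial_sums = pool.map(sum, partitions)   # one SUM per partition, in parallel
    return sum(partial_sums)                       # combine the partial results

if __name__ == "__main__":
    partitions = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
    print(parallel_sum(partitions))                # 4498500, same as sum(range(3000))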

Load Balancing

 Concept: Distributing the workload evenly across the available resources (processors,
disks) in a parallel database system to prevent bottlenecks and maximize performance.
 Techniques:
o Dynamic Data Migration: Move data partitions to less loaded nodes.
o Dynamic Query Assignment: Assign incoming queries to nodes with lower
current workload.
o Adaptive Query Execution: Adjust execution plans based on real-time load.
o Hashing/Range Partitioning: Used to initially distribute data evenly.

Database Clusters

 Concept: A group of interconnected independent servers (nodes), each running a DBMS instance, that work together as a single logical database system.
 Purpose: Provide high availability, scalability, and load balancing.
 Common Architectures: Often based on Shared-Disk or Shared-Nothing principles.
o Shared-Disk Clusters (e.g., Oracle RAC): Nodes share storage, use complex
cache coherence mechanisms.
o Shared-Nothing Clusters (e.g., PostgreSQL with Citus, MySQL with NDB
Cluster): Nodes have local storage, data is partitioned.
 Key Features:
o Failover: If one node fails, another node takes over its workload.
o Load Distribution: Queries/transactions can be routed to different nodes.
o Data Synchronization: Mechanisms to keep data consistent across nodes.

UNIT-V: Distributed Object Database Management Systems & Object-Oriented Data Model

Distributed Object Database Management Systems (DODBMS)

Definition: A DODBMS integrates object-oriented database principles with distributed database technologies. It manages object-oriented data (objects, classes, inheritance, encapsulation) stored across multiple interconnected sites, providing a single logical view.

Fundamental Object Concepts and Models

1. Object:
o Concept: A discrete entity that combines both data (attributes/state) and behavior
(methods/operations) into a single unit.
o Example: An Employee object with attributes like name, salary and methods
like calculate_bonus(), update_salary().
2. Class:
o Concept: A blueprint or template for creating objects. It defines the structure
(attributes) and behavior (methods) that all objects of that class will have.
o Example: The Employee class.
3. Encapsulation:
o Concept: Bundling data and methods that operate on the data within a single unit
(the object), and restricting direct access to some of the object's components. It
hides the internal implementation details.
o Benefit: Data integrity, modularity, easier maintenance.
4. Inheritance:
o Concept: A mechanism by which one class (subclass/child class) can acquire the
attributes and methods of another class (superclass/parent class). It promotes code
reuse and creates a hierarchy.
o Example: Manager class inherits from Employee class. Manager has all
properties of Employee plus its own specific ones (e.g., department_managed).
5. Polymorphism:
o Concept: The ability of an object to take on many forms. Specifically, the ability
of methods to behave differently depending on the object on which they are
called.
o Example: A calculate_salary() method might exist in both Employee and Manager classes, but its implementation might differ for managers (e.g., it includes a bonus calculation); a small code sketch follows this list.
6. Object Identity:
o Concept: A unique, system-generated identifier for each object, independent of
its attribute values. This ID remains immutable even if the object's state changes.
o Importance: Allows objects to be referenced directly and enables complex object
relationships without relying on primary keys (which can change).
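
As referenced above, here is a minimal Python sketch of the Employee/Manager example tying together class, inheritance, polymorphism, and encapsulated behaviour (the attribute names and the bonus rule are made up for illustration):

# Minimal sketch of the class / inheritance / polymorphism ideas above.
class Employee:
    def __init__(self, name, salary):
        self.name = name            # state (attributes) ...
        self.salary = salary

    def calculate_pay(self):        # ... and behaviour (method) bundled in one unit
        return self.salary

class Manager(Employee):            # Manager "is-a" Employee (inheritance)
    def __init__(self, name, salary, department_managed):
        super().__init__(name, salary)
        self.department_managed = department_managed

    def calculate_pay(self):        # overridden method: polymorphism
        return self.salary + 0.10 * self.salary   # managers get an (illustrative) bonus

staff = [Employee("Asha", 50000), Manager("Ravi", 80000, "Sales")]
for person in staff:
    print(person.name, person.calculate_pay())    # same call, different behaviour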

Object Distributed Design

Similar to relational distributed design, but adapted for objects.

1. Object Fragmentation:
o Horizontal Fragmentation: Grouping objects of the same class based on a
predicate (e.g., Employee objects with department='Sales').
o Vertical Fragmentation: Less common for objects as it breaks encapsulation.
Might involve grouping attributes of an object into different fragments, requiring
object reconstruction.
o Class Partitioning: Partitioning a class's instances.
o Schema Partitioning: Distributing parts of the schema across sites.
2. Object Allocation: Deciding where to store object fragments or full objects (similar to
relational allocation: non-redundant, replicated).
3. Complex Object Distribution: Handling objects that contain references to other objects
(complex nested structures). This requires careful consideration during fragmentation and
allocation to minimize distributed object access.

Architectural Issues

Challenges specific to DODBMS architecture:

1. Object Naming and Identification: Ensuring global unique object IDs across all sites.
2. Schema Integration: Integrating heterogeneous object schemas.
3. Object Migration: Moving objects between sites.
4. Distributed Object Access: Efficiently locating and accessing objects that might be
fragmented or replicated across multiple sites.
5. Concurrency Control and Recovery: Adapting distributed transaction management
(2PC, locking) for object-oriented data structures and complex methods.
6. Query Processing for Complex Objects: Optimizing queries that traverse object
relationships.

Object Management

 Object Creation and Deletion: Managing unique object IDs and distributed storage.
 Object Versioning: Supporting multiple versions of an object (e.g., for design
applications).
 Object Caching: Caching frequently accessed objects at client or intermediate sites to
reduce network traffic.
 Object Granularity: Deciding whether to distribute entire objects, or sub-objects (if they
can be meaningfully fragmented).

Distributed Object Storage

 Storage Models:
o Centralized Object Store: All objects stored in one central OODBMS (less
distributed).
o Fragmented Object Store: Objects or object fragments are distributed across
sites.
o Replicated Object Store: Copies of objects/fragments are stored at multiple sites
for availability and performance.
 Addressing Objects: Using Object IDs (OIDs) to locate objects regardless of their
physical location.
 Clustering Objects: Storing related objects together (e.g., parent-child objects) to
optimize access performance and minimize I/O for traversals.

Object Query Processing

 Challenges: Object-oriented queries involve traversing complex object graphs, navigating through inheritance hierarchies, and invoking methods.
 Object Query Languages: Extensions to SQL (e.g., OQL) or specific object-oriented
query languages.
 Optimization:
o Path Expression Optimization: Optimizing queries that follow relationships
between objects (e.g., Employee.department.name).
o Method Invocation Optimization: Optimizing when and where methods are
invoked (locally or remotely).
o Inheritance Hierarchy Optimization: Handling queries across class hierarchies.
o Distributed Join Optimization: Similar to relational joins, but on object
relationships.

Object-Oriented Data Model (OODM)

Definition: A data model that applies concepts from object-oriented programming to database
design. It stores data as objects, enabling more complex data types and direct representation of
real-world entities and their behaviors.

Inheritance

 Concept: A fundamental principle in OODM. It allows a class (subclass) to inherit attributes and methods from another class (superclass). This forms an "is-a" relationship.
 Types: Single inheritance, multiple inheritance (less common in databases due to
complexity).
 Benefit: Code reusability, modularity, easier maintenance, modeling of type hierarchies.
 Example:
  +-------------------+
  | Person            |
  |-------------------|
  | - name            |
  | - address         |
  |-------------------|
  | + get_details()   |
  +-------------------+
            ^
            | (inherits from)
            |
  +-------------------+    +--------------------+
  | Employee          |    | Student            |
  |-------------------|    |--------------------|
  | - employee_id     |    | - student_id       |
  | - salary          |    | - major            |
  |-------------------|    |--------------------|
  | + calculate_pay() |    | + enroll_course()  |
  +-------------------+    +--------------------+

Object Identity

 Concept: As discussed in DODBMS, every object has a unique, system-generated, immutable identifier (OID). This OID is independent of the object's data values and physical location.
 Importance:
o Enables direct referencing of objects.
o Supports complex object structures (nested objects, graphs).
o Allows objects to be moved or their data values to change without affecting their
identity.
o Distinguishes between objects with identical attribute values (e.g., two different
Person objects named "John Doe").

Persistent Programming Languages and Persistence of Objects

 Persistent Programming Language:


o Concept: A programming language where objects can directly persist beyond the
execution of the program that created them. This means objects can be stored in a
database and retrieved later without explicit serialization/deserialization code.
o Goal: Bridge the "impedance mismatch" between programming languages
(object-oriented) and traditional relational databases (relational).
o Examples: Some OODBMS provide extensions to languages like C++, Smalltalk,
Java to make objects directly persistent.
 Persistence of Objects:
o Concept: The ability of an object to outlive the process that created it. Objects
created in memory can be saved to stable storage and loaded back into memory in
a later session.
o Mechanism: OODBMS handle the mapping of in-memory objects to disk storage
automatically.
o Types of Persistence:
 Transparent Persistence: The application programmer does not need to
write explicit code to store or retrieve objects. The OODBMS handles it
automatically.
 Non-transparent Persistence: Programmers explicitly call methods (e.g.,
save(), load()) to manage object persistence.

Comparison OODBMS and ORDBMS

| Feature | OODBMS (Object-Oriented DBMS) | ORDBMS (Object-Relational DBMS) |
|---|---|---|
| Core Paradigm | Pure object-oriented (objects, classes, methods) | Relational model extended with object features |
| Data Model | Objects, classes, inheritance, encapsulation | Tables, rows, columns + object-like features |
| Schema | Defined by classes and their relationships | Defined by tables and their relationships |
| Object Identity | Fundamental, system-generated OIDs for every object | Primary keys (value-based) for row identity |
| Complex Data Types | Directly supports complex nested objects, collections | Supports user-defined types, arrays, LOBs, but less naturally integrated |
| Methods/Behavior | Stores methods as part of the schema, directly callable | Stores data; methods usually live in the application layer (though some can be stored as functions/procedures) |
| Inheritance | Directly supported and managed within the DBMS | Limited support, often simulated with tables |
| Query Language | Object Query Language (OQL) or language-specific extensions | SQL extensions (e.g., SQL:1999/SQL3) for objects |
| Persistence | Transparent persistence for programming languages | Requires Object-Relational Mapping (ORM) tools for object persistence |
| Maturity | Niche, less mature in widespread adoption | Highly mature, widely used, industry standard |
| Performance | Potentially faster for complex object traversals | Excellent for structured data, joins, and aggregations |
| Market Adoption | Limited, mainly in specialized domains (CAD/CAM, telecom) | Dominant in enterprise applications |
| Example Products | Versant, GemStone/S, Objectivity/DB | Oracle, PostgreSQL, IBM DB2, SQL Server |
| Impedance Mismatch | Low (direct mapping from OO language to DB) | High (need ORM to map objects to tables) |

Conclusion:

 OODBMS: Ideal when the primary requirement is direct storage and manipulation of
complex objects, and close integration with object-oriented programming languages.
Used in specialized applications.
 ORDBMS: A pragmatic evolution of relational databases, extending them to handle
object-like features while retaining the strengths of the relational model. Dominant in
most business applications.
