Distributed DB

A distributed database system stores data across multiple nodes that are connected via a network. There are two main types: homogeneous, where all nodes use the same database system, and heterogeneous, where nodes can use different systems. Data can be stored using replication, where copies are kept at different nodes, or fragmentation, where relations are split into smaller parts across nodes. Popular architectures include client-server, peer-to-peer, and shared-nothing. Distributed databases improve reliability, allow data sharing, and enable faster processing through parallelism.

Distributed Database Systems

- MAGESH R (23MCA0091)
- MANISH KUMAR V (23MCA0114)
Definition

 A distributed database is a database that is spread across multiple locations or nodes,
typically in a network or across the internet. In a distributed database system, data is
stored on different computers or servers, and these computers are connected to each
other, allowing them to communicate and coordinate in managing the data.
 A centralized distributed database management system (DDBMS) manages the
distributed data as if it were stored in one physical location. The DDBMS synchronizes all
data operations among the databases and ensures that updates made in one database
are automatically reflected in the databases at other sites.
Types

 There are two types of distributed databases:
 Homogeneous
 Heterogeneous
Homogeneous

 A homogeneous distributed database is a
network of databases spread across multiple
locations, where each database uses the same
database management system (DBMS), data
model, and operating system.
 This uniformity simplifies management,
ensuring that all nodes within the distributed
system operate with identical structures and
software. It allows for seamless communication
between nodes, making tasks like data access
and updates consistent and straightforward.
Heterogeneous

 A heterogeneous distributed database is a network of
databases that are spread across different locations and
operate with diverse database management systems
(DBMS), data models, or operating systems. Unlike
homogeneous distributed databases, which maintain
uniformity in software and structure across all nodes,
heterogeneous systems allow each node to differ.
 Each node within the network may use distinct
technologies, making it more adaptable to specific
requirements. In a heterogeneous distributed
database, a particular site can be completely
unaware of the other sites, which limits
cooperation in processing user requests.
Distributed Database Storage

 Distributed database storage is managed in two ways:
 Replication
 Fragmentation
Replication

 In database replication, the system stores copies of data on
different sites. If an entire database is available on multiple
sites, it is a fully redundant database. The advantage of
database replication is that it increases data availability on
different sites and allows query requests to be processed
in parallel.
 However, database replication means that data requires
constant updates and synchronization with other sites to
maintain an exact database copy. Any change made on one
site must be recorded on the other sites, or else inconsistencies
occur.
 Constant updates cause a lot of server overhead and
complicate concurrency control, because many concurrent
queries must be checked at all available sites.
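The trade-off described above can be illustrated with a minimal sketch. It assumes a toy in-memory model: the `Site` and `ReplicatedStore` classes and the key `price:42` are hypothetical names invented for illustration, not part of any real DBMS. Every write is applied to all copies, while a read can be served by any single copy.

```python
class Site:
    """One site holding a full copy of the data (a plain dict in this sketch)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Hypothetical fully replicated store: write to all sites, read from one."""

    def __init__(self, sites):
        self.sites = sites
        self._next = 0  # index used to rotate reads across sites

    def write(self, key, value):
        # Every replica must apply the update, otherwise the copies diverge.
        for site in self.sites:
            site.data[key] = value

    def read(self, key):
        # Any replica can answer a read, so reads are spread round-robin.
        site = self.sites[self._next % len(self.sites)]
        self._next += 1
        return site.name, site.data.get(key)


sites = [Site("A"), Site("B"), Site("C")]
store = ReplicatedStore(sites)
store.write("price:42", 9.99)
reads = [store.read("price:42") for _ in range(3)]
```

Here one write costs three updates, which is the synchronization overhead mentioned above, while the three reads are answered by three different sites. A real system would also have to handle sites that are unreachable during a write, which this sketch ignores.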
Fragmentation

 When it comes to fragmentation of distributed database storage, the relations are
fragmented, which means they are split into smaller parts. Each of the fragments is stored
on a different site, where it is required. The prerequisite for fragmentation is to make sure
that the fragments can later be reconstructed into the original relation without losing data.
 The advantage of fragmentation is that there are no data copies, which prevents data
inconsistency.
 There are two types of fragmentation:
 Horizontal fragmentation
 Vertical fragmentation
 Horizontal fragmentation - The relation is fragmented into groups of rows (tuples),
and each group is assigned to one fragment.
 Vertical fragmentation - The relation schema is fragmented into smaller schemas, and
each fragment contains a common candidate key to guarantee a lossless join.
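The two fragmentation styles, and the requirement that the original relation can be rebuilt, can be sketched as follows. This is an illustrative example only: the sample rows, the `Branch` column, and the helper names `horizontal`, `vertical`, and `join` are assumptions made for the sketch.

```python
# A small relation, represented as a list of rows (dicts). Sample data is made up.
customers = [
    {"ID": 1, "Name": "Ann", "Email": "ann@example.com", "Branch": "East"},
    {"ID": 2, "Name": "Bob", "Email": "bob@example.com", "Branch": "West"},
    {"ID": 3, "Name": "Cy", "Email": "cy@example.com", "Branch": "East"},
]


def horizontal(rows, column):
    """Split the relation into groups of rows, one fragment per column value."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical(rows, key, column_groups):
    """Split the schema into smaller schemas; every fragment keeps the key."""
    return [[{c: row[c] for c in [key, *cols]} for row in rows]
            for cols in column_groups]


def join(fragments, key):
    """Rebuild the relation from vertical fragments by joining on the key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


h = horizontal(customers, "Branch")
# Horizontal fragments are recombined with a union of their rows.
rebuilt_h = sorted((r for frag in h.values() for r in frag), key=lambda r: r["ID"])

v = vertical(customers, "ID", [["Name"], ["Email", "Branch"]])
# Vertical fragments are recombined with a join on the shared key.
rebuilt_v = join(v, "ID")
```

Both `rebuilt_h` and `rebuilt_v` equal the original `customers` relation. The vertical case only works because every fragment carries the `ID` key; without it the rows of different fragments could not be matched up again.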
Client Server Architecture

 A common method for spreading database functionality is
the client-server architecture. In this design, clients
communicate with a central server, which controls the
distributed database system.
 The server is in charge of maintaining data storage,
controlling access, and organizing transactions. The
architecture can connect several clients and servers: a
client sends a query, and the first available server
handles it. This architecture is simple to implement
because of the centralised server system.
Peer-to-Peer Architecture

 Each node in the distributed database system may
function as both a client and a server in a peer-to-peer
architecture. Each node is linked to the others and works
together with them to process and store data.
 Each node is in charge of managing its own data
and organizing node-to-node interactions. Because the
loss of a single node does not cause the system to
collapse, peer-to-peer systems provide decentralized
control and high fault tolerance.
 This design is ideal for distributed systems with nodes
that can function independently and with equal
capabilities.
Federated Architecture

 Multiple independent databases of various types are
combined into a single meta-database using a
federated database design. It offers a uniform interface
for navigating and exploring distributed data.
 In the federated design, each site maintains a
separate, independent database, while the virtual
database manager internally distributes requests.
Federated architectures are helpful when working with
several data sources or legacy systems that cannot
easily be updated.
Shared Nothing Architecture

 Data is divided up and spread among several nodes in a
shared-nothing architecture, with each node in charge
of a particular portion of the data. Resources are not
shared across nodes, and each node runs independently.
 Due to the system's capacity to add additional nodes as
needed without affecting the current nodes, this design
offers great scalability and fault tolerance. Large-scale
distributed systems, such as data warehouses or big
data analytics platforms, frequently employ
shared-nothing designs.
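One way to picture a shared-nothing layout is hash partitioning, where each key belongs to exactly one node and only that node stores it. The sketch below is an assumption-laden illustration: the `Node` class, the `owner`, `put`, and `get` helpers, and the sample keys are all hypothetical, and hash partitioning is only one of several possible placement schemes.

```python
import zlib


class Node:
    """A node with private storage that no other node reads or writes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


def owner(key, nodes):
    # crc32 gives a stable hash, so the same key always maps to the same node.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def put(key, value, nodes):
    owner(key, nodes).rows[key] = value


def get(key, nodes):
    return owner(key, nodes).rows.get(key)


nodes = [Node("n0"), Node("n1"), Node("n2")]
orders = {f"order-{i}": i * 10 for i in range(20)}
for key, value in orders.items():
    put(key, value, nodes)

# Each key lives on exactly one node, so the per-node counts add up to the total.
stored = sum(len(node.rows) for node in nodes)
```

Note that this simple modulo placement reassigns many keys when the number of nodes changes. Systems that need to add nodes with little data movement commonly use other schemes, such as consistent hashing, which this sketch does not implement.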
Advantages
1. Reliability:
Data may be replicated in several sites so that the failure of a single site does not make
the data inaccessible.
2. Information Sharing:
Users in one site can access the data present in other sites.
3. Faster data processing:
A distributed database allows for the processing of data at several sites simultaneously.
4. Faster data access:
In a distributed system, the data is usually stored at the site where the demand for it is the
greatest. This can lead to faster access of the data and better performance.
5. Autonomy:
Each site retains some level of control over its data, unlike a central database.
6. Modularity:
New sites can be added and removed when required, which improves flexibility.
Disadvantages

1. Complexity:
The design and management of a distributed DBMS are very complex, especially for a
heterogeneous DDBMS, since its sites can use different software.
2. Increased Storage:
Data may be replicated at several sites, which leads to increased storage requirements.
3. Difficulty in maintaining integrity:
Integrity refers to the consistency of data. When the data is replicated at multiple sites, all
of them need to be updated if a change is made to one.
4. Communication costs:
The need for the sites to communicate with each other adds more complexity and cost.
5. Security:
Since data is stored at multiple sites, the security risk increases.
A real-life example

 Consider a company like Walmart, which has branches all over the USA. Each branch stores
information about the customers, products and purchases in that branch. The schema can look
something like this:
 Customers(ID, Name, Email, Address, Phone No)
Products(ID, Name, Category, Price)
Purchases(CustomerID, ProductID, Timestamp)
 Suppose the CEO wants to know the number of purchases in the whole of the USA. In the manual
approach, we would have to log in to each branch, run a query to get the count of purchases, and
then combine the results. This can be very time-consuming.
 But if the system is a distributed database, we can get the count of all purchases by using a single
query.
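What such a single query does behind the scenes can be sketched as a scatter-gather step: run the count at every branch and add up the partial results. The example below is a minimal sketch that stands in one in-memory SQLite database per branch; the branch names, the sample rows, and the `make_branch` and `total_purchases` helpers are assumptions for illustration, not how any particular DDBMS is implemented.

```python
import sqlite3


def make_branch(purchases):
    """Create an in-memory database that plays the role of one branch site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Purchases (CustomerID INTEGER, ProductID INTEGER, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?)", purchases)
    return conn


def total_purchases(branches):
    """Scatter the same COUNT query to every branch, then gather the sum."""
    total = 0
    for conn in branches.values():
        (count,) = conn.execute("SELECT COUNT(*) FROM Purchases").fetchone()
        total += count
    return total


# Hypothetical branches with made-up purchase rows.
branches = {
    "Austin": make_branch([(1, 10, "2024-01-05T09:30"), (2, 11, "2024-01-05T10:00")]),
    "Boston": make_branch([(3, 10, "2024-01-06T14:15")]),
    "Denver": make_branch([]),
}
total = total_purchases(branches)
```

The caller asks one question and receives one number (`total` is 3 for the sample rows), while the per-branch queries and the final addition stay hidden. In a real deployment the branches would be contacted over the network, and possibly in parallel.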
