Distribution Model

The document discusses different data distribution models for NoSQL databases, including sharding and replication. Sharding partitions data across multiple servers to improve scalability, while replication duplicates data across servers for availability and resilience. The key approaches are:
1. Sharding partitions data by a unique key to distribute load and scale reads/writes.
2. Replication duplicates entire data sets across master-slave or peer-to-peer topologies. Masters synchronize writes while slaves handle reads.
3. Combining sharding and replication distributes data partitions across multiple servers that are also replicated for redundancy.

DISTRIBUTION MODEL

Distribution Models
• We already discussed the advantages of scale up vs.
scale out.
• Scale out is more appealing since we can run
databases on a cluster of servers.
• Depending on the distribution model the data store
can give us the ability:
1. To handle larger quantities of data,
2. To process greater read or write traffic,
3. To stay more available in the case of network
slowdowns or breakages.
Distribution Models
• Running over a cluster introduces complexity.
• There are two paths for distribution:
– Replication and
– Sharding
Distribution Model: Single Server
• It is the first and simplest distribution option.
• Even though many NoSQL databases are designed to run on
a cluster, they can be used in a single-server
application.
• Graph databases are the most obvious case.
• If data usage is mostly about processing
aggregates, then a key-value or a document store
may be useful.
Sharding
• Often, a data store is busy because different
people are accessing different parts of the dataset.
• In these cases we can support horizontal scalability
by putting different parts of the data onto
different servers (sharding).
• The concept of sharding is not new: it has long been
done as part of application logic.
• For example, the application puts all customers with
surnames A-D on one shard and E-G on another.
Sharding
• This complicates the programming model, as
the application code needs to distribute the
load across the shards.
• In the ideal setting each user talks to only
one server and the load is balanced. Of course,
the ideal case is rare.
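The surname example above can be sketched as application-side routing code. This is only an illustration, not something taken from the slides: the shard names, the third range (H-Z), and the function `shard_for` are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical layout: surnames A-D, E-G, and H-Z (the last range is assumed).
UPPER_BOUNDS = ["D", "G", "Z"]
SHARDS = ["shard-1", "shard-2", "shard-3"]


def shard_for(surname: str) -> str:
    """Return the name of the shard that holds customers with this surname."""
    initial = surname.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        raise ValueError(f"cannot route surname {surname!r}")
    # The first upper bound that is >= the initial identifies the range.
    return SHARDS[bisect_left(UPPER_BOUNDS, initial)]


if __name__ == "__main__":
    for name in ("Adams", "Davis", "Evans", "Hall"):
        print(name, "->", shard_for(name))
```

Every part of the application that touches customer data has to call something like `shard_for`, which is the extra programming burden the slide refers to.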
Sharding: Approaches
• To get close to the ideal case we have to
guarantee that data accessed together is
stored on the same node.
– This is very simple using aggregates.
• Physical location also matters when distributing
data across nodes.
– If access is based on physical location, we can
place data close to where it is accessed.
Sharding: Approaches
• Another factor is keeping the load balanced.
• We should arrange aggregates so they are evenly
distributed, so that each node receives the
same share of the load.
• Another approach is to put aggregates together if
we think they may be read in sequence
(BigTable).
• In BigTable, for example, data on web addresses
is stored under reversed domain names, so pages from
the same site sit next to each other.
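A small sketch of the reversed-domain idea, assuming keys are kept in sorted order. The host names and the helper `row_key` are hypothetical and serve only to show why reversing helps.

```python
def row_key(hostname: str) -> str:
    """Reverse the dot-separated labels: www.example.com -> com.example.www."""
    return ".".join(reversed(hostname.lower().split(".")))


# Hypothetical hosts; two of them belong to the same domain.
hosts = ["www.example.com", "www.other.org", "mail.example.com"]

# Sorting by the reversed key places hosts of the same domain next to each other.
keys = sorted(row_key(h) for h in hosts)

if __name__ == "__main__":
    for key in keys:
        print(key)
```

With the original host names, a sort would separate `mail.example.com` from `www.example.com`; with reversed keys they are adjacent and can be read in sequence.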
Sharding and NoSQL
• In general, many NoSQL databases offer
auto-sharding.
• This can make it much easier to use sharding in an
application.
• Sharding is especially valuable for performance
because it can improve both read and write
performance.
• It scales reads and writes across the different nodes
of the same cluster.
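The slides do not say how a database decides where each aggregate goes. One common technique, shown here purely as an assumption, is to hash the key and take the result modulo the number of nodes. The key format `customer:<n>` and the function `node_for` are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Spread 10,000 hypothetical customer keys over 4 nodes and count per node.
counts = Counter(node_for(f"customer:{i}", 4) for i in range(10_000))

if __name__ == "__main__":
    for node, count in sorted(counts.items()):
        print(f"node {node}: {count} keys")
```

Hashing tends to balance the load but scatters keys that might be read in sequence, so it trades off against the ordered placement discussed earlier.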
Sharding and Resilience
• Sharding does little to improve
resilience when used alone.
• Although the data is on different nodes, a node failure
still makes that shard's data unavailable.
• So in practice, since a cluster has more nodes that can
fail, sharding alone is likely to decrease resilience.
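A rough way to see why more nodes can mean less resilience, under the simplifying assumption (not made in the slides) that nodes fail independently and each is down with the same probability. The function name and the sample numbers are illustrative only.

```python
def p_any_node_down(p_node_down: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is down."""
    return 1.0 - (1.0 - p_node_down) ** nodes


if __name__ == "__main__":
    # Assumed figure: each node is unavailable 1% of the time.
    for n in (1, 5, 10):
        print(n, "nodes:", round(p_any_node_down(0.01, n), 4))
```

Without replication, any node being down means some shard is unavailable, so the chance that part of the data cannot be reached grows with the number of nodes.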
Sharding: right time
• Some databases are intended to be sharded from
the beginning.
• Others let us start with a single node and
then distribute and shard later.
• However, sharding very late may cause trouble,
especially if done in production, where the
database becomes essentially unavailable while
the data is moved to the new shards.
Master-Slave Replication
• In this setting one node is designated as the
master, or primary, and the others as slaves.
• The master is the authoritative source for the
data and is designed to process updates and
send them to the slaves.
• The slaves are used for read operations.
• This helps us scale when the dataset is
read-intensive.
Master-Slave Replication
• We can scale reads horizontally by adding more
slaves.
• But we are limited by the ability of the master
to process incoming updates.
• An advantage is read resilience.
– Even if the master fails, the slaves can still handle
read requests.
– However, writes are not accepted until the master is
restored or a new master is appointed.
Master-Slave Replication
• Another characteristic is that a slave can be
appointed as master.
• Masters can be appointed manually or
automatically.
• To achieve read resilience, the
read and write paths must be different.
• This is normally done using separate database
connections.
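A minimal sketch of separate read and write paths, using plain dictionaries as stand-ins for database connections. The class `Router`, the exception `MasterUnavailable`, and the sample data are assumptions for illustration; a real driver would look different.

```python
from itertools import cycle


class MasterUnavailable(Exception):
    """Raised when a write arrives while no master is reachable."""


class Router:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master          # dict standing in for the write connection
        self._slaves = cycle(slaves)  # round-robin over read connections

    def write(self, key, value):
        if self.master is None:
            raise MasterUnavailable("no master to accept writes")
        self.master[key] = value

    def read(self, key):
        return next(self._slaves).get(key)


# Two slaves that already hold a replicated value.
router = Router(master={}, slaves=[{"user:1": "Ann"}, {"user:1": "Ann"}])
router.master = None                  # simulate a master failure
value_after_failure = router.read("user:1")
```

Because reads never go through the master connection, they keep working after the master fails, while `router.write(...)` raises until a master is available again.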
Master-Slave Replication
• Master-slave replication has the advantages
analyzed above, but it comes with the problem of
inconsistency.
• Clients reading from the slaves can read
data that has not been updated yet.
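The stale-read problem can be reproduced with a toy model in which the master ships updates to a slave only when `replicate_to` is called. The class, its method names, and the `price` key are hypothetical; the delay between write and replication is the point.

```python
class Master:
    """Toy master that queues updates and ships them to slaves later."""

    def __init__(self):
        self.data = {}
        self.pending = []  # updates not yet sent to the slaves

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate_to(self, slave):
        for key, value in self.pending:
            slave[key] = value
        self.pending.clear()


master = Master()
slave = {}

master.write("price", 10)
master.replicate_to(slave)

master.write("price", 12)   # accepted by the master...
stale = slave["price"]      # ...but the slave still returns the old value
master.replicate_to(slave)
fresh = slave["price"]      # after replication the slave has caught up
```

Between the second write and the next replication step, a reader on the slave sees the old price while a reader on the master would see the new one.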
Peer-to-Peer Replication
• Master-slave replication helps with read
scalability but not with the scalability of
writes.
• Moreover, it provides resilience for reads but
not for writes.
• The master is still a single point of failure.
• Peer-to-peer replication attacks these problems by not
having a master.
Peer-to-Peer Replication
• All the replicas are equal (they all accept writes and
reads).
• With peer-to-peer replication we can have node failures
without losing write capability or losing data.
Peer-to-Peer Replication
• Furthermore, we can easily add nodes to improve
performance.
• The biggest complication here is consistency.
• When we can write to different nodes, we
increase the probability of inconsistent,
conflicting writes.
• However, there are ways to deal with this
problem.
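The slides do not say which approach they have in mind. One widely used technique for at least detecting conflicting writes is the version vector, sketched here as an assumption: each node keeps a counter per node, and two versions conflict when each is ahead of the other somewhere. The function name and node labels are illustrative.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (node name -> update counter)."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"   # concurrent writes on different peers
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


if __name__ == "__main__":
    # Both peers started from {"n1": 1, "n2": 1} and each accepted one write.
    print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))
```

Detection is only half of the story: once a conflict is found, something (the application, a merge rule, or the user) still has to decide which value wins.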
Combining Sharding with Replication
• Master-slave and sharding: we have multiple
masters, but each data item has a single master.
– Depending on the configuration, we can choose the
master for each group of data.
• Peer-to-peer and sharding is a common
strategy for column-family databases.
– This is commonly done by replicating
the shards.
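A sketch of the master-slave plus sharding combination, assuming three nodes and three shards. The node names, the `CLUSTER` layout, and the hash-based `shard_of` function are all hypothetical; the point is that every node is master for one shard and slave for the others, while each key has exactly one master.

```python
import hashlib

# Hypothetical layout: shard id -> master node and slave nodes.
CLUSTER = {
    0: {"master": "node-a", "slaves": ["node-b", "node-c"]},
    1: {"master": "node-b", "slaves": ["node-c", "node-a"]},
    2: {"master": "node-c", "slaves": ["node-a", "node-b"]},
}


def shard_of(key: str) -> int:
    """Pick a shard for the key with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(CLUSTER)


def node_for_write(key: str) -> str:
    """Writes for a key always go to the single master of its shard."""
    return CLUSTER[shard_of(key)]["master"]


def nodes_for_read(key: str) -> list:
    """Reads for a key can be served by the slaves of its shard."""
    return list(CLUSTER[shard_of(key)]["slaves"])


if __name__ == "__main__":
    key = "customer:42"
    print(key, "write ->", node_for_write(key), "read ->", nodes_for_read(key))
```

In a peer-to-peer variant the master/slave distinction would disappear and each shard would simply list the peers that hold a copy of it.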
