Distribution Model

The document discusses different data distribution models for NoSQL databases including sharding and replication. Sharding partitions data across multiple servers to improve scalability, while replication duplicates data across servers for availability and resilience. The key approaches are:
1. Sharding partitions data by a unique key to distribute load and scale reads/writes.
2. Replication duplicates entire data sets across master-slave or peer-to-peer topologies. Masters synchronize writes while slaves handle reads.
3. Combining sharding and replication distributes data partitions across multiple servers that are also replicated for redundancy.

Uploaded by chitraalavani

DISTRIBUTION MODEL

Distribution Models
• We already discussed the advantages of scaling up vs. scaling out.
• Scaling out is more appealing since we can run databases on a cluster of servers.
• Depending on the distribution model, the data store can give us the ability:
1. To handle larger quantities of data,
2. To process greater read or write traffic,
3. To stay available in the face of network slowdowns or breakages.
Distribution Models
• Running over a cluster introduces complexity.
• There are two paths to data distribution:
– Replication and
– Sharding
Distribution Model: Single Server
• It is the first and simplest distribution option: no distribution at all.
• Even though NoSQL databases are designed to run on a cluster, they can also be used in a single-server application.
• Graph databases are the most obvious fit for this setup.
• If data usage is mostly about processing aggregates, then a key-value or document store may be useful.
Sharding
• Often, a data store is busy because different people are accessing different parts of the dataset.
• In these cases we can support horizontal scalability by putting different parts of the data onto different servers (sharding).
• The concept of sharding is not new: it has long been done as part of application logic.
• For example, put all customers with surnames A-D on one shard and E-G on another.
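The surname split above can be sketched as a small routing function in application code. This is only an illustration: the shard names and the third range (H-Z) are assumptions added to make the example complete, not part of the slides.

```python
import bisect

# Hypothetical layout: each shard owns surnames whose first letter is
# less than or equal to its upper bound. A-D and E-G come from the slide;
# H-Z is an assumed third shard so that every surname has a home.
SHARD_UPPER_BOUNDS = ["D", "G", "Z"]
SHARD_NAMES = ["shard-0", "shard-1", "shard-2"]


def shard_for_surname(surname: str) -> str:
    """Return the name of the shard that stores this customer."""
    first = surname.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route surname {surname!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARD_NAMES[index]
```

Every piece of application code that touches customers has to call a function like this, which is the extra programming burden that hand-written sharding brings.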
Sharding
• This complicates the programming model, as the application code needs to distribute the load across the shards.
• In the ideal setting each user talks to one server and the load is balanced. Of course, the ideal case is rare.
Sharding: Approaches
• To get close to the ideal case we have to guarantee that data accessed together is stored on the same node.
– This is very simple using aggregates.
• When distributing data across nodes, we can also consider physical location.
– If access is based on physical location, we can place the data close to where it is accessed.
Sharding: Approaches
• Another factor is trying to keep the load balanced.
• We should arrange aggregates so they are evenly distributed, so that each node receives the same amount of load.
• Another approach is to put aggregates together if we think they may be read in sequence (BigTable).
• In BigTable, for example, data on web addresses is stored under reversed domain names, so that pages from the same domain sit together.
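A minimal sketch of the reversed-domain idea, assuming rows are kept sorted by key. The function name `row_key` and the sample hostnames are made up for illustration.

```python
def row_key(hostname: str) -> str:
    """Reverse the dot-separated labels: www.example.com -> com.example.www."""
    return ".".join(reversed(hostname.lower().split(".")))


# Hypothetical hostnames from two different domains.
hosts = ["mail.example.com", "example.org", "www.example.com", "docs.example.org"]

# When rows are sorted by key, hosts of the same domain become neighbours.
sorted_keys = sorted(row_key(h) for h in hosts)
# sorted_keys == ["com.example.mail", "com.example.www",
#                 "org.example", "org.example.docs"]
```

With the original hostnames as keys, `mail.example.com` and `www.example.com` would be sorted far apart; with reversed keys they are adjacent and can be read in one sequential scan.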
Sharding and NoSQL
• Many NoSQL databases offer auto-sharding, where the database takes responsibility for allocating data to shards.
• This makes it much easier to use sharding in an application.
• Sharding is especially valuable for performance because it can improve both read and write performance.
• It scales reads and writes across the different nodes of the same cluster.
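One way a store could allocate keys to shards without help from the application is to hash each key. The sketch below is an assumption for illustration, not the scheme of any particular database; `NUM_SHARDS`, `shard_for_key`, and the key format are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for_key(key: str) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Count how many of 10,000 made-up customer keys land on each shard.
load = Counter(shard_for_key(f"customer:{i}") for i in range(10_000))
```

Because the hash spreads keys roughly uniformly, each shard ends up with close to a quarter of the keys, so reads and writes are spread across the nodes. The trade-off is that keys read in sequence no longer sit together.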
Sharding and Resilience
• Sharding does little to improve resilience (tolerance to failures) when used alone.
• Since each piece of data lives on a single node, a node failure makes that shard's data unavailable.
• With more nodes, a failure somewhere becomes more likely, so in practice sharding alone is likely to decrease resilience.
Sharding: right time
• Some databases are intended to be sharded from the beginning.
• Others let us start with a single node and then distribute and shard later.
• However, sharding very late may cause trouble
– especially if done in production, where the database may become essentially unavailable while the data is moved to the new shards.
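To get a feel for how much data may have to move, here is a rough sketch that assumes a naive placement rule (hash of the key modulo the number of shards). Real databases may use smarter schemes; the function name and key format are hypothetical.

```python
import hashlib


def shard_of(key: str, num_shards: int) -> int:
    """Naive placement: stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"customer:{i}" for i in range(10_000)]

# Grow the cluster from 4 to 5 shards and count the keys that change shard.
moved = sum(1 for k in keys if shard_of(k, 4) != shard_of(k, 5))
moved_fraction = moved / len(keys)
```

Under this assumed rule, roughly four keys in five change shard when one node is added, which hints at why resharding a large production database is disruptive.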
Master-Slave Replication
• In this setting one node is designated as the master, or primary, and the others as slaves.
• The master is the authoritative source for the data and is designed to process updates and send them to the slaves.
• The slaves are used for read operations.
• This allows us to scale when we have a read-intensive dataset.
Master-Slave Replication
• We can scale horizontally by adding more slaves.
• But we are limited by the ability of the master to process incoming updates.
• An advantage is read resilience.
– Even if the master fails, the slaves can still handle read requests.
– However, writes are not allowed until the master is restored or a new master is appointed.
Master-Slave Replication
• Another characteristic is that a slave can be appointed as the new master.
• Masters can be appointed manually or automatically.
• To get read resilience, the application's read and write paths must be different.
• This is normally done using separate database connections.
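The separate read and write paths can be sketched with a toy in-memory cluster. All class and method names here are hypothetical, and updates are copied to the slaves immediately for simplicity; a real system would replicate over the network.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def write(self, key, value):
        # Write path: only the master accepts updates.
        if not self.master.alive:
            raise RuntimeError("master is down: writes are unavailable")
        self.master.data[key] = value
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value  # simplification: immediate copy

    def read(self, key):
        # Read path: any live slave can answer, even if the master is down.
        for slave in self.slaves:
            if slave.alive:
                return slave.data.get(key)
        raise RuntimeError("no live slave to read from")

    def promote(self):
        # Appoint the first live slave as the new master.
        for slave in self.slaves:
            if slave.alive:
                self.slaves.remove(slave)
                self.master = slave
                return slave.name
        raise RuntimeError("no slave available for promotion")


cluster = MasterSlaveCluster(Node("m"), [Node("s1"), Node("s2")])
cluster.write("order:1", "paid")
cluster.master.alive = False                    # simulate a master failure
value_during_outage = cluster.read("order:1")   # still served by a slave
```

While the master is down, `read` keeps working and `write` raises an error; after `promote()` the cluster accepts writes again.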
Master-Slave Replication
• Master-slave replication has the advantages analyzed above, but it comes with the problem of inconsistency.
• Clients reading from the slaves may see stale data if updates have not yet propagated.
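A stale read can be reproduced with a toy model in which updates reach the slave only when `replicate()` is called. The class and its method names are assumptions for illustration.

```python
from collections import deque


class LaggingReplicaPair:
    """One master and one slave with delayed (asynchronous) replication."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self.pending = deque()  # updates not yet applied to the slave

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def read_from_slave(self, key):
        return self.slave.get(key)

    def replicate(self):
        # Apply all queued updates to the slave, in order.
        while self.pending:
            key, value = self.pending.popleft()
            self.slave[key] = value


pair = LaggingReplicaPair()
pair.write("price", 10)
pair.replicate()
pair.write("price", 12)                      # not yet propagated
stale_value = pair.read_from_slave("price")  # 10: the slave is behind
pair.replicate()
fresh_value = pair.read_from_slave("price")  # 12: the slave has caught up
```

Between the second write and the second `replicate()` call, a reader on the slave sees the old price even though the master already holds the new one.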
Peer-to-Peer Replication
• Master-slave replication helps with read scalability but does not help with the scalability of writes.
• Moreover, it provides resilience for reads but not for writes.
• The master is still a single point of failure.
• Peer-to-peer replication attacks these problems by not having a master.
Peer-to-Peer Replication
• All the replicas are equal (each accepts both writes and reads).
• With peer-to-peer replication we can ride over node failures without losing write capability or access to the data.
Peer-to-Peer Replication
• Furthermore, we can easily add nodes to improve performance.
• The biggest complication here is consistency.
• When we can write to different nodes, we increase the probability of inconsistent writes (two nodes updating the same record at the same time).
• However, there are ways to deal with this problem.
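The write conflict can be made concrete with two peers that each accept a write to the same record. This is a deliberately simplified sketch: the per-key version counter and all names are assumptions, and real systems typically use richer mechanisms such as vector clocks.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def write(self, key, value):
        version, _ = self.store.get(key, (0, None))
        self.store[key] = (version + 1, value)


def find_conflicts(a, b):
    """Keys where both peers hold the same version but different values."""
    conflicts = []
    for key in a.store.keys() & b.store.keys():
        (version_a, value_a) = a.store[key]
        (version_b, value_b) = b.store[key]
        if version_a == version_b and value_a != value_b:
            conflicts.append(key)
    return sorted(conflicts)


peer_a, peer_b = Peer("a"), Peer("b")
peer_a.write("room:101", "alice")   # both peers accept a booking
peer_b.write("room:101", "bob")     # for the same room before syncing
peer_a.write("room:102", "carol")   # only one peer wrote this key
conflicting_keys = find_conflicts(peer_a, peer_b)
```

Here `conflicting_keys` contains only `"room:101"`. Detecting the conflict is the first step; the peers still need a policy for resolving it, for example merging the values or coordinating before accepting the write.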
Combining Sharding with Replication

• Master-slave and sharding: we have multiple masters, but each data item has a single master.
– Depending on the configuration, we can decide the master for each group of data.
• Peer-to-peer and sharding is a common strategy for column-family databases.
– This is commonly done by replicating the shards.
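Replicating the shards can be sketched as a placement function that gives every key a fixed number of distinct nodes. The node names, the replication factor of 3, and the hash-based starting point are all assumptions for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical
REPLICATION_FACTOR = 3                                       # assumed


def replicas_for(key: str) -> list:
    """Pick REPLICATION_FACTOR distinct nodes that hold copies of this key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    # Take the starting node and the next ones, wrapping around the list.
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


placement = replicas_for("customer:42")
```

Each key lives on only three of the five nodes (sharding), yet losing any single node still leaves two copies of every key available (replication).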
