Module 04C Scale-out & Consistent Hashing - Redundancy

Scale-out and Redundancy
BITS Pilani, Pilani Campus

Redundancy
Reliability, Availability, Serviceability (RAS)

• When designing robust, highly available systems, three terms are often used together: reliability, availability, and serviceability (RAS).
 Reliability: MTTF (Mean Time To Failure)
 Serviceability: MTTR (Mean Time To Repair)
 Availability: MTTF / (MTTF + MTTR) (see the sketch below)
• Operate-repair cycle
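A minimal sketch of the availability formula (Python, illustrative names, not from the slides), assuming MTTF and MTTR are expressed in the same unit:

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability over the operate-repair cycle: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# Example: a node that runs 1000 hours between failures and takes
# 10 hours to repair is ~99% available.
print(availability(1000, 10))   # 0.990099...
```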

Availability

System Type                Availability (%)   Downtime per year*

Conventional workstation   99                 3.6 days
HA system                  99.9               8.5 hours
Fault-resilient system     99.99              1 hour
Fault-tolerant system      99.999             5 minutes

*May not include planned downtime (e.g. on weekends or at midnight), which may be significant.
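The downtime column follows directly from the availability percentage; a small sketch (Python, illustrative) that approximately reproduces the table:

```python
HOURS_PER_YEAR = 365 * 24   # 8760

def downtime_hours_per_year(availability_pct: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (99, 99.9, 99.99, 99.999):
    print(f"{pct}%: {downtime_hours_per_year(pct):.2f} hours/year")
# 99%     -> 87.60 hours (~3.65 days)
# 99.9%   ->  8.76 hours
# 99.99%  ->  0.88 hours (~53 minutes)
# 99.999% ->  0.09 hours (~5 minutes)
```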
Failures

• (Planned) Shutdown vs. (Unplanned) Failure
 System may be offline for planned changes / maintenance:
 How do you define MTTF and MTTR in this case?
 Number of planned downtimes and their duration.
• Transient vs. Permanent Failures
 Transient: a rollback or restart is sufficient.
 Permanent failures require component replacement, e.g. a hard-disk failure.
• Partial vs. Total Failures
 A single point of failure (SPoF) leads to total failure.
 A key approach to enhancing availability is to convert total failures into partial failures.
 Increase redundancy to avoid SPoFs.
 Increasing MTTF vs. reducing MTTR
Partial vs Total Failures

[Figure: two architectures compared]
 Shared-bus system: if the bus fails, the entire system fails; if one node fails, the other nodes can still offer service.
 Ethernet-connected cluster: if the Ethernet fails, the entire system fails; if one node fails, we can still use the data (using logs and checkpoints).

• What are the SPoFs here?


Availability

• Assume that the network and the RAID storage are 100% available.
• Each workstation (node) is 99% available.
• If a node fails, the workload is switched over to the other node in zero time.
• Then what is the availability of the two-node cluster?
 The probability that a node is unavailable is 0.01. Both nodes down at the same time: 0.01 × 0.01 = 0.0001, i.e. 0.01% unavailability, so availability is 99.99%: a fault-resilient cluster.
• What is the availability if the cluster needs a planned downtime of 1 hour per week?
 52 hours / (365 × 24 hours) = 0.0059, i.e. 0.59% planned unavailability. Total unavailability = 0.01% + 0.59% = 0.6%, i.e. 99.4% availability.
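A short sketch of the same arithmetic (Python, illustrative); the two-node assumption and the 1 hour/week planned downtime come from the slide above:

```python
node_unavail = 0.01                      # each node is 99% available
cluster_unavail = node_unavail ** 2      # both nodes down at once, assuming independent failures
print(1 - cluster_unavail)               # 0.9999 -> fault-resilient cluster

planned_unavail = 52 / (365 * 24)        # 1 hour/week of planned downtime
print(1 - (cluster_unavail + planned_unavail))   # ~0.994 -> 99.4% availability
```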
Aspects of High Availability

• Redundancy
 2N model, N+M model, N-way model
 Active & passive
• Replication
 Copying the state of the system (replication)
 Lock-step strategy (send the same operations to be executed by all machines)
• Monitoring
 Push vs. poll
 ZooKeeper watches
• Failure Detection
 Ping, heartbeat (see the sketch after this list)
 ZooKeeper watches
• Recovery
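A minimal sketch of heartbeat-based failure detection, assuming a simple UDP "I am alive" message; the port, interval, and timeout values are illustrative and not from the slides:

```python
import socket, time

HEARTBEAT_PORT = 9999      # assumed port for heartbeat messages
INTERVAL_S = 1.0           # how often the monitored node announces liveness
TIMEOUT_S = 3.0            # declare failure after this many seconds of silence

def monitor():
    """Standby side: declare the peer failed if no heartbeat arrives in time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", HEARTBEAT_PORT))
    sock.settimeout(TIMEOUT_S)
    while True:
        try:
            data, addr = sock.recvfrom(64)
            print(f"heartbeat from {addr}")
        except socket.timeout:
            print("peer presumed failed -> initiate failover")
            break

def heartbeat(peer_host: str):
    """Active side: periodically send a heartbeat to the monitoring node."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"alive", (peer_host, HEARTBEAT_PORT))
        time.sleep(INTERVAL_S)
```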
Redundancy Models

• 2N Model
 A passive standby for every active node.
 This provides the highest level of fault tolerance, as the system can continue to function even if up to half of the nodes fail.
• N+M Model
 M passive standby nodes shared by N active nodes.
 The N+M model is a more cost-effective alternative to the 2N model.
• N-Way Model
 There can be several standbys, both active and passive.
• N-Way Active Model
 All redundant nodes are active; there are no passive standbys.
(A small availability sketch comparing the 2N and N+M models follows below.)
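A rough sketch of how these models trade cost against availability, assuming independent node failures and a service that survives as long as at least k of its n nodes are up; all numbers are illustrative:

```python
from math import comb

def k_of_n_availability(k: int, n: int, node_avail: float) -> float:
    """Probability that at least k of n independent nodes are up."""
    return sum(
        comb(n, i) * node_avail**i * (1 - node_avail)**(n - i)
        for i in range(k, n + 1)
    )

A = 0.99  # assumed per-node availability

# 2N: every active node has its own dedicated standby, so a service
# is lost only if both nodes of its pair fail.
print(1 - (1 - A) ** 2)                 # 0.9999 per active/standby pair

# N+M (here 4+1): four active nodes share one standby, so the cluster
# tolerates any single node failure.
print(k_of_n_availability(4, 5, A))     # ~0.9990 -- cheaper, slightly less available
```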
Two-Node Failover Configurations

 Possible cluster configurations:


1. Active-passive clusters

2. Active-active clusters
Fault Tolerant Clusters – Hot Standby

1. Active-Passive / hot standby
 Primary (active) vs. standby (passive) node:
 The primary node mirrors any data to shared storage, which is accessible by the standby node.
 The standby node monitors the health of the primary but does not handle any workload.
 Asymmetric configuration.
2. Active-active clusters
Active-Passive Configuration

 Cost of an additional system.
 Highly available,
 because the chances that the passive node will also fail are low.
 Due to the extra cost and admin overhead, only critical applications run in an active-passive configuration.
Fault Tolerant Clusters – Active Takeover

1. Active-Passive / hot standby clusters
2. Active-takeover / active-active clusters
 Symmetry:
 All nodes act as primaries:
 i.e. they handle the normal workload,
 but they also monitor each other.
 If one fails, the survivor steps in and handles double the workload until the other node comes back.
 Failover vs. Failback
Active-active

1. Hot standby clusters
2. Active-active clusters
 Symmetry of nodes
 Failover
 When a node fails, applications fail over to a designated node that is available.
 Failover delay may result in increased response time, data loss, etc.
 Failback
 When a node recovers from a failure, the failed-over applications fail back to the recovered node.
Active-active configurations

 Manageable costs.
 Performance impact when a single node handles double the load.
 The partner node may itself be subject to failures, because it is also running critical applications.
Example

• Storage is shared. Assume that the network is 100% available. The MTTF and MTTR of the shared storage are 200 days and 5 days respectively. Assume that the availability of each node is 99%.
• Considering that the failure of any node will bring down the whole cluster, calculate the cluster availability. (See the sketch below.)
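A sketch of this calculation (Python, illustrative); the slide's figure with the exact node count is not reproduced here, so the number of nodes is left as a parameter:

```python
def storage_availability(mttf_days: float, mttr_days: float) -> float:
    return mttf_days / (mttf_days + mttr_days)

A_storage = storage_availability(200, 5)   # 200/205 ~ 0.9756
A_node = 0.99

def cluster_all_nodes_required(n_nodes: int) -> float:
    """All n nodes AND the shared storage must be up."""
    return A_storage * A_node ** n_nodes

print(cluster_all_nodes_required(4))       # e.g. with 4 nodes -> ~0.937
```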
Example

• Storage is shared. Assume that the network is 100% available. The MTTF and MTTR of the shared storage are 200 days and 5 days respectively. Assume that the availability of each node is 99%.
• Considering that Node1 is the primary server and all other nodes are passive hot standbys, calculate the cluster availability. Assume that failover takes zero time. (See the sketch below.)
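A sketch of this case (Python, illustrative); with passive hot standbys and zero-time failover, the cluster is up as long as the shared storage and at least one node are up:

```python
A_storage = 200 / (200 + 5)   # shared-storage availability ~ 0.9756
A_node = 0.99

def cluster_with_hot_standbys(n_nodes: int) -> float:
    """Cluster is up if the storage is up and at least one of n nodes is up."""
    all_nodes_down = (1 - A_node) ** n_nodes
    return A_storage * (1 - all_nodes_down)

print(cluster_with_hot_standbys(4))   # ~0.9756 -- the shared storage dominates
```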
Example

• Storage is shared. Assume that the network is 100% available. The MTTF and MTTR of the shared storage are 200 days and 5 days respectively. Assume that the availability of each node is 99%.
• Consider that the web server is deployed on Node1 and Node2, and the database server is deployed on Node3 and Node4. {Node1, Node2} are configured as an active-active pair; similarly {Node3, Node4} are configured as an active-active pair. The web application needs both the web server and the database to be available. Calculate the availability of the web application. Assume that failover takes zero time. (See the sketch below.)
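A sketch of this calculation (Python, illustrative); each active-active pair is up if at least one of its two nodes is up, and the application needs both pairs plus the shared storage:

```python
A_storage = 200 / (200 + 5)          # ~0.9756
A_node = 0.99

A_pair = 1 - (1 - A_node) ** 2       # web pair or DB pair: 0.9999
A_webapp = A_storage * A_pair * A_pair

print(A_webapp)                      # ~0.9754 -- again limited by the shared storage
```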
Large Cluster Configurations

• N-to-1 Clusters
• N+1 clusters
N-to-1 Clusters

• A designated node acts as the standby.
 This node must be able to access all disks.
• On a failure of any node, failover happens to the designated node.
• Failback is required when the failed node comes back, so that the designated node is freed.
N+1 Configuration

• N-to-1 requires failback, which affects availability.
• In N+1, all nodes have access to shared storage using a SAN.
• Assume that initially Node6 is the standby. If Node1 fails, it will fail over to Node6.
• When Node1 comes back, Node1 becomes the standby.
• In this way the cluster configuration changes over time.
Example

• Storage is shared. Assume that the network is 100% available. The MTTF and MTTR of the shared storage are 200 days and 5 days respectively. Assume that the availability of each node is 99%.
• Assume that four nodes are configured as an N-to-1 cluster where Node4 is the designated passive hot standby. The applications running on Node1, Node2 and Node3 are required to be available simultaneously for the cluster to be available. Failover and failback take zero time. Calculate the cluster availability. (See the sketch below.)
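A sketch of this case (Python, illustrative); with one shared standby and zero-time failover/failback, the three applications stay available as long as at most one of the four nodes is down (and the shared storage is up):

```python
from math import comb

A_storage = 200 / (200 + 5)   # ~0.9756
A_node = 0.99

# At least 3 of the 4 nodes must be up (3 active apps, 1 shared standby).
at_least_3_of_4 = sum(
    comb(4, i) * A_node**i * (1 - A_node) ** (4 - i) for i in range(3, 5)
)
print(A_storage * at_least_3_of_4)   # ~0.975
```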
Failover

 The migration of services from one node to another is called failover.
 Criteria:
 Transparent to users
 Quick
 Minimal manual intervention
 Guaranteed data access

Failover

 Critical elements to be moved:
 Network identity
 The IP address that clients use should be transferred to the takeover node.
 Access to shared disks
 The set of processes
 The collection of these elements is called a "service group".
 A service group is the unit that moves from one cluster node to another.
 A cluster may have multiple service groups.
Failover Requirements

 Two nodes
 Network connections
 A pair of heartbeat networks
 Public network
 Admin network
 Disks
 Unshared disks for the OS and the failover process
 Shared disks for critical data
 Application portability
 Binary compatibility across the nodes
 No SPoF
Failover Management

 Diagnosis
 Detection of the failure and location of the failed component
 Heartbeat messages are a common way to detect failures
 Notification
 Recovery
 Forward recovery
 Backward recovery
Component Monitoring

 Hardware component monitoring
 Application health monitoring
 Check the process table:
 Is the process running?
 Is the process running properly?
 Are response times slow?
 Query the application as an end user and check (see the sketch below).
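A minimal sketch of the "query as an end user" idea, assuming the monitored application exposes an HTTP endpoint; the URL and timeout are illustrative:

```python
import urllib.request

def app_is_healthy(url: str = "http://localhost:8080/health", timeout_s: float = 2.0) -> bool:
    """Probe the application the way a real client would."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        return False   # connection refused, timeout, etc. -> unhealthy

print(app_is_healthy())
```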


Component Monitoring

 Split-brain syndrome
 If the disks fail but the network stays connected, there is no split-brain problem.
 If the network fails but the disks still function, then:
 The standby server thinks the primary has failed, so it tries to take over, while the primary server carries on with its job.
 The standby also accepts connections; both nodes access the data disks simultaneously.
 This may lead to data corruption and unexpected responses.
Checkpointing

 Processes periodically save consistent state information (a.k.a. a checkpoint) on stable storage.
 Useful for process migration and fault tolerance (failover).
 Single-process case:
 The stack, heap, registers, pending signals, file descriptors and their state, etc. are written to a file.
 On restart, the OS creates a process and the objects associated with it, and the process is set to the state recorded in the file (see the sketch below).
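A minimal application-level sketch of the idea (Python, illustrative); a real checkpointing system captures far more (registers, open file descriptors, signals), but the save/restore cycle looks like this:

```python
import pickle, os

CKPT_FILE = "state.ckpt"   # assumed location on stable storage

def checkpoint(state: dict) -> None:
    """Write a consistent snapshot of the application state atomically."""
    tmp = CKPT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, CKPT_FILE)   # atomic rename: we see the old or new checkpoint, never half of one

def restore() -> dict:
    """Roll back to the last saved checkpoint after a failure/restart."""
    with open(CKPT_FILE, "rb") as f:
        return pickle.load(f)

state = {"iteration": 0, "partial_sum": 0}
for i in range(1, 11):
    state["iteration"] = i
    state["partial_sum"] += i
    checkpoint(state)            # in practice, checkpoint periodically, not every step
print(restore())                 # {'iteration': 10, 'partial_sum': 55}
```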
Checkpointing – Levels of Implementation

 OS kernel
 The OS transparently checkpoints and (on failure) restarts processes.
 All data structures are accessible; any data can be saved to the checkpoint file.
 Difficult to implement.
 Library
 A user-space checkpointing program.
 Imposes restrictions on which system calls can be used (forbids IPC).
 May not require source code modifications.
 The program has to link to this library:
 Explicit checkpointing: the source code makes explicit calls for checkpointing and restarting, and the library is linked with the source code.
 Implicit checkpointing: the compiler inserts the checkpointing library calls.
 Application
 Highest efficiency, because the user knows exactly what to store.
 Requires modification of the source code.


Implementations

 Library
 libckpt
 Condor
 libtckpt
 System
 VMADump
 CRAK
libckpt

 One of the first library implementations for UNIX.
 Provides a number of special optimizations to reduce the size of checkpoint files:
 Memory exclusion (mark unused pages, or pages that will not be modified)
 Incremental checkpointing using mprotect()
 Forked checkpointing
 Synchronous checkpointing: allows the application to suggest to libckpt at what times to checkpoint.
 The application must be recompiled and statically linked against libckpt.
Failure Recovery – Backward Recovery Scheme

 Backward recovery
 Checkpointing:
 Processes periodically save consistent state information (a.k.a. a checkpoint) on stable storage.
 Rollback:
 After a failure,
 the failed component is isolated,
 the previous checkpoint is restored, and
 normal operation is resumed.
 Pros and cons:
 Easy to implement, application independent.
 Rollback results in wasted execution.
Failure Recovery – Forward Recovery Scheme

 Forward error recovery attempts to continue the current computation by restoring the system to a consistent state, compensating for the inconsistencies found in the current state.
 Fault masking using Triple Modular Redundancy (TMR) / N-version programming (see the sketch below).
 Forward recovery is useful in systems where execution time is critical,
 e.g. real-time systems,
 space systems / aerospace systems,
 but it may
 be application specific, and
 require additional hardware:
 Combining outputs during continuous execution may place overhead on the OS; special hardware (a processor) is required.
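A minimal sketch of fault masking with TMR (Python, illustrative); three redundant computations run and a majority vote masks a single faulty result, so no rollback is needed:

```python
from collections import Counter

def tmr(replica_outputs):
    """Majority-vote over three redundant results; masks one faulty replica."""
    votes = Counter(replica_outputs)
    value, count = votes.most_common(1)[0]
    if count >= 2:
        return value
    raise RuntimeError("no majority: more than one replica failed")

# One replica produced a wrong answer; the vote masks it.
print(tmr([42, 42, 7]))   # 42
```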
Q&A

Thank You