Scale-out and
redundancy
BITS Pilani
Pilani Campus
Redundancy
Reliability Availability Serviceability (RAS)
• When designing robust, highly available systems, three terms are often used
together: reliability, availability, and serviceability (RAS).
Reliability: measured by Mean Time To Failure (MTTF)
Serviceability: measured by Mean Time To Repair (MTTR)
Availability: MTTF / (MTTF + MTTR)
• The system alternates between operating and being repaired: the operate-repair cycle.
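As a quick sanity check, the availability formula can be computed directly; the MTTF/MTTR figures below are hypothetical, chosen only for illustration:

```python
# Availability over the operate-repair cycle (illustrative figures).
def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is operational: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# e.g. a component with MTTF = 999 h and MTTR = 1 h:
print(availability(999, 1))  # 0.999, i.e. 99.9% available
```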
Availability
System Type                Availability (%)   Downtime per year*
Conventional workstation   99                 3.6 days
HA system                  99.9               8.5 hours
Fault-resilient system     99.99              1 hour
Fault-tolerant system      99.999             5 minutes
*May not include planned downtime (e.g. on weekends or at midnight), which may be significant.
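The downtime column can be reproduced (approximately) from the availability percentages; a small sketch:

```python
# Downtime per year implied by an availability figure.
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability_pct):
    """Hours of downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for a in (99, 99.9, 99.99, 99.999):
    # 99% -> ~87.6 h (~3.65 days); 99.999% -> ~0.09 h (~5.3 minutes)
    print(a, round(downtime_hours_per_year(a), 2), "hours")
```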
Failures
• (Planned) shutdown vs. (unplanned) failure
A system may be taken offline for planned changes or maintenance.
How do you define MTTF and MTTR in this case? Use the number of planned downtimes and their durations.
• Transient vs. permanent failures
Transient: a rollback or restart is sufficient.
Permanent failures require component replacement, e.g. a hard-disk failure.
• Partial vs. total failures
A single point of failure (SPoF) leads to total failure.
A key approach to enhancing availability is to convert total failures into partial failures.
Increase redundancy to avoid SPoFs.
• Increasing MTTF vs. reducing MTTR
Partial vs Total Failures
If the bus fails, the entire system fails. If the Ethernet fails, the entire system fails.
If one node fails, the other nodes can still offer service. If one node fails, we can still use the data: use logs and checkpoints.
• What are the SPoFs here?
Availability
• Assume that the network and the RAID are 100% available.
• Each workstation is 99% available.
• If a node fails, the workload is switched over to the other node in zero time.
• What, then, is the availability of the cluster?
The probability that a node is unavailable is 0.01. Both nodes down at the
same time: 0.01 × 0.01 = 0.0001, or 0.01%. Availability is therefore 99.99%: a fault-resilient cluster.
• What is the availability if the cluster needs a planned downtime of 1 hour per
week?
52 hours / (365 × 24) = 0.0059, or 0.59%. Total unavailability = 0.01% + 0.59% = 0.6%,
i.e. 99.4% availability.
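The two calculations above can be reproduced directly:

```python
# Two-node cluster with zero-time failover: the cluster is down only
# when both nodes are down at the same time.
node_unavail = 0.01                      # each node is 99% available
cluster_unavail = node_unavail ** 2      # 0.0001 -> 99.99% available

# Adding 1 hour/week of planned downtime:
planned_unavail = 52 / (365 * 24)        # ~0.0059, i.e. ~0.59%
total_unavail = cluster_unavail + planned_unavail
print(round((1 - total_unavail) * 100, 1))  # 99.4
```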
Aspects of High Availability
• Redundancy
2N model, N+M model, N-way model
Active and passive standbys
• Replication
Copying the state of the system (replication)
Lock-step strategy (send the same operations to be executed by all
machines)
• Monitoring
Push vs. poll
ZooKeeper watches
• Failure detection
Ping, heartbeat
ZooKeeper watches
• Recovery
Redundancy Models
CSIS Dept. BITS Pilani, Pilani Campus
• 2N model
A passive standby for every active node.
This provides the highest level of fault tolerance: the system can
continue to function even if up to half of the nodes fail.
• N+M model
At most M passive standby nodes for N active nodes.
The N+M model is a more cost-effective alternative to the 2N model.
• N-way model
There can be several standbys, both active and passive.
• N-way active model
All standbys are active, and there can be many of them.
Two-Node Failover Configurations
Possible cluster configurations:
1. Active-passive clusters
2. Active-active clusters
Fault Tolerant Clusters – Hot Standby
1. Active-passive / hot standby
Primary (active) vs. standby (passive) node:
The primary node mirrors any data to shared storage,
which is accessible by the standby node.
The standby node monitors the health of the primary
but does not handle any workload.
Asymmetric configuration.
2. Active-active clusters
Active-Passive Configuration
Cost of an additional system.
Highly available, because the chance that the passive node also fails is small.
Due to the extra cost and administrative overhead, only critical
applications are run in an active-passive configuration.
Fault Tolerant Clusters – Active Takeover
1. Active-passive / hot-standby clusters
2. Active-takeover / active-active clusters
Symmetric: all nodes act as primaries, i.e. they handle normal
workload, but they also monitor each other.
If one node fails, the survivor steps in and handles double the
workload until the other node comes back.
Failover vs. failback
Active-active
1. Hot-standby clusters
2. Active-active clusters
Symmetry of nodes.
Failover
When a node fails, its applications fail over to a
designated node that is available.
The failover delay may result in increased
response time, data loss, etc.
Failback
When a node recovers from a failure, the failed-over
applications fail back to the recovered node.
Active-active configurations
Manageable costs.
Performance impact when a single node handles double the load.
The partner node may itself be subject to failures, because it is
also running critical applications.
Example
• The storage is shared. Assume that the network is
100% available. The MTTF and MTTR of the shared
storage are 200 days and 5 days respectively.
Assume that the availability of each node is 99%.
• Considering that the failure of any node brings down
the whole cluster, calculate the cluster availability.
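A sketch of this series model, assuming four nodes as in the later examples (the node count is not stated on this slide, so it is an assumption):

```python
# Series composition: the cluster is up only if the shared storage
# AND every node are up simultaneously.
storage_avail = 200 / (200 + 5)   # MTTF/(MTTF+MTTR) ≈ 0.9756
node_avail = 0.99
num_nodes = 4                     # assumed, matching the later examples

cluster_avail = storage_avail * node_avail ** num_nodes
print(round(cluster_avail * 100, 2))  # ≈ 93.72%
```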
Example
• The storage is shared. Assume that the network is
100% available. The MTTF and MTTR of the shared
storage are 200 days and 5 days respectively.
Assume that the availability of each node is 99%.
• Considering that Node1 is the primary server and all
other nodes are passive hot standbys, calculate the
cluster availability. Assume that failover takes zero time.
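A sketch of this hot-standby case, again assuming four nodes (an assumption, since the node count is not stated):

```python
# Hot-standby model: the node tier fails only if all nodes are down
# at once; the shared storage remains a single point of failure.
storage_avail = 200 / (200 + 5)
node_unavail = 0.01
num_nodes = 4                             # assumed

tier_avail = 1 - node_unavail ** num_nodes   # ≈ 0.99999999
cluster_avail = storage_avail * tier_avail
print(round(cluster_avail * 100, 2))         # ≈ 97.56% — storage dominates
```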
Example
• The storage is shared. Assume that the network is 100%
available. The MTTF and MTTR of the shared storage are
200 days and 5 days respectively. Assume that the
availability of each node is 99%.
• Consider that the web server is deployed on Node1 and
Node2, and the database server on Node3 and Node4.
{Node1, Node2} are configured as an active-active pair;
similarly, {Node3, Node4} are configured as an
active-active pair. The web application needs both the
web server and the database to be available. Calculate
the availability of the web application, assuming that
failover takes zero time.
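This is a series composition of two parallel pairs, plus the shared storage:

```python
# Each tier (web, DB) is an active-active pair: it fails only if both
# of its nodes fail. The application needs both tiers plus the storage.
storage_avail = 200 / (200 + 5)
pair_avail = 1 - 0.01 ** 2                 # 0.9999 per tier

app_avail = storage_avail * pair_avail ** 2
print(round(app_avail * 100, 2))           # ≈ 97.54%
```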
Large Cluster Configurations
• N-to-1 Clusters
• N+1 clusters
N-to-1 Clusters
• A designated node acts as the standby. This node must be able to access all disks.
• On a failure of any node, failover happens to the designated node.
• Failback is required when the failed node comes back, so that
the designated node is freed.
N+1 Configuration
• N-to-1 requires failback, which
affects availability.
• In N+1, all nodes have
access to the shared storage via a
SAN.
• Assume that initially Node6 is the
standby. If Node1 fails, it fails
over to Node6.
• When Node1 comes back,
Node1 becomes the standby.
• In this way, the cluster configuration
changes over time.
Example
• The storage is shared. Assume that the network is
100% available. The MTTF and MTTR of the shared
storage are 200 days and 5 days respectively.
Assume that the availability of each node is 99%.
• Assume that four nodes are configured as an N-to-1
cluster where Node4 is the designated passive hot
standby. The applications running on Node1, Node2
and Node3 must all be available simultaneously for
the cluster to be available. Failover and failback take
zero time. Calculate the cluster availability.
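One way to model this N-to-1 cluster: with zero-time failover and failback, the node tier is up exactly when at least three of the four identical nodes are up (either no failures, or one failure covered by the standby):

```python
# N-to-1 with 3 active nodes + 1 standby: tolerate at most one node failure.
p = 0.99                                   # per-node availability
tier_avail = p**4 + 4 * p**3 * (1 - p)     # P(0 failures) + P(exactly 1 failure)

storage_avail = 200 / (200 + 5)
cluster_avail = storage_avail * tier_avail
print(round(cluster_avail * 100, 2))       # ≈ 97.50%
```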
Failover
The migration of services from one node to another is called
failover.
Criteria:
Transparent to users
Quick
Minimal manual intervention
Guaranteed data access
Failover
Critical elements to be moved:
Network identity
The IP address the client uses should be transferred to
the takeover node.
Access to shared disks
A set of processes
The collection of these elements is called a "service group".
A service group is the unit that moves from one cluster
node to another.
A cluster may have multiple service groups.
Failover Requirements
Two nodes
Network connections
A pair of heartbeat networks
Public network
Admin network
Disks
Unshared disks for the OS and the
failover process
Shared disks for critical data
Application portability
Should have binary compatibility
No SPoF
Failover Management
Diagnosis
Detection of the failure and location of the failed component.
Heartbeat messages are a common way to detect failures.
Notification
Recovery
Forward recovery
Backward recovery
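The heartbeat-based diagnosis step can be sketched minimally (illustrative only; the `HeartbeatMonitor` name and the 3-second timeout are arbitrary choices, not a standard API):

```python
# Minimal heartbeat-based failure detection: each node records the time of
# its last heartbeat; a node is declared failed when no heartbeat has
# arrived within the timeout window.
import time

TIMEOUT = 3.0  # seconds without a heartbeat before declaring failure

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, node):
        # Called whenever a heartbeat message arrives from a node.
        self.last_seen[node] = time.monotonic()

    def failed_nodes(self):
        # Diagnosis: detect and locate failed components.
        now = time.monotonic()
        return [n for n, t in self.last_seen.items() if now - t > TIMEOUT]

mon = HeartbeatMonitor()
mon.heartbeat("node1")
mon.heartbeat("node2")
print(mon.failed_nodes())  # [] — both nodes reported recently
```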
Component Monitoring
Hardware component monitoring
Application health monitoring
Process table: is the process running?
Is the process running properly? Are response times slow?
Query as an end user and check the response.
Component Monitoring
Split-brain syndrome
If the disks fail but the network stays connected, there is no problem.
If the network fails but the disks keep functioning:
The standby server thinks the primary has failed, so it tries to
take over. The primary server carries on with its job. The standby
also accepts connections, and both access the data disks
simultaneously.
This may lead to data corruption and unexpected responses.
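A common guard against split-brain is a quorum rule: a partition may act as primary only if it can see a strict majority of the cluster's nodes. A minimal sketch (the function name is hypothetical):

```python
# Quorum check to avoid split-brain: with a strict majority requirement,
# at most one network partition can ever hold quorum.
def has_quorum(visible_nodes, cluster_size):
    """True iff this partition sees a strict majority of the cluster."""
    return visible_nodes > cluster_size // 2

print(has_quorum(1, 2))  # False — a 2-node cluster cannot decide on its own;
                         # it needs a tiebreaker such as a quorum disk
```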
Checkpointing
Processes periodically save consistent state information
(a checkpoint) on stable storage.
Useful for process migration and fault tolerance (failover).
Single-process case:
The stack, heap, registers, pending signals, file descriptors and
their state, etc. are written to a file.
After restarting, the OS creates a process and the objects
associated with it, and sets the process to the state saved in the file.
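An application-level checkpointing sketch in Python, using `pickle` for the state file (the file name and state layout are illustrative assumptions):

```python
# Application-level checkpointing: the program periodically saves its own
# state to stable storage and resumes from the last checkpoint on restart.
import os, pickle

CKPT = "state.ckpt"

def save_checkpoint(state):
    # Write to a temp file, then atomically rename, so a crash mid-write
    # never leaves a corrupt checkpoint behind.
    with open(CKPT + ".tmp", "wb") as f:
        pickle.dump(state, f)
    os.replace(CKPT + ".tmp", CKPT)

def restore_checkpoint():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"i": 0, "total": 0}

state = restore_checkpoint()
while state["i"] < 10:
    state["total"] += state["i"]   # one unit of work
    state["i"] += 1
    save_checkpoint(state)         # rollback point for backward recovery
print(state["total"])              # 45 on a clean run (0 + 1 + ... + 9)

os.remove(CKPT)                    # cleanup for this demo
```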
08-Nov-23
Checkpointing – Levels of Implementation
OS kernel
The OS transparently checkpoints and (on failure) restarts processes.
All data structures are accessible, so any data can be saved to the checkpoint file.
Difficult to implement.
Library
A user-space checkpointing program.
Imposes restrictions on which system calls can be used (e.g. forbids IPC).
May not require source-code modifications, but the program has to link to the library.
Explicit checkpointing: the source code makes explicit calls to
checkpoint and restart, and is linked with the library.
Implicit checkpointing: the compiler inserts the checkpointing library calls.
Application
Highest efficiency, because the user knows exactly what to store.
Requires modification of the source code.
Implementations
Library-level
libckpt
Condor
libtckpt
System-level
VMADump
CRAK
libckpt
One of the first library implementations for UNIX.
Provides a number of special optimizations to reduce
the size of checkpoint files:
Memory exclusion (mark unused pages, or pages that
will not be modified)
Incremental checkpointing using mprotect()
Forked checkpointing
Synchronous checkpointing:
Allows the application to suggest to libckpt at what
times to checkpoint.
The application must be recompiled and statically linked
with libckpt.
Failure Recovery – Backward Recovery Scheme
Backward recovery
Checkpointing:
processes periodically save consistent state
information (a checkpoint) on stable storage.
Rollback:
after a failure,
the failed component is isolated,
the previous checkpoint is restored, and
normal operation is resumed.
Pros and cons:
Easy to implement and application-independent.
Rollback results in wasted execution.
Failure Recovery – Forward Recovery Scheme
Forward error recovery attempts to continue the current
computation by restoring the system to a consistent state,
compensating for the inconsistencies found in the current
state.
Fault masking using Triple Modular Redundancy (TMR) /
N-version programming.
Forward recovery is useful in systems where execution time
is critical, e.g. real-time systems, space and aero systems,
but it may be application-specific and may require additional
hardware.
Combining outputs during continuous execution may place
overhead on the OS; special hardware (a voting processor) is
required.
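TMR-style fault masking reduces to majority voting over the replica outputs; a minimal sketch (the voter function is hypothetical, and real systems vote in hardware):

```python
# Triple Modular Redundancy: three replicas compute the result
# independently and a voter emits the majority value, masking one fault.
from collections import Counter

def tmr_vote(a, b, c):
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        # More than one replica faulty: the fault can no longer be masked.
        raise RuntimeError("no majority among replica outputs")
    return value

print(tmr_vote(42, 42, 7))  # 42 — the faulty replica's output is masked
```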
Q&A
Thank You