Introduction to Business Continuity
Module 3.1
2009 EMC Corporation. All rights reserved.
After completing this module, you will be able to:
Define Business Continuity and Information Availability
Detail the impact of information unavailability
Define BC measurement and terminologies
Describe the BC planning process
Detail BC technology solutions
What is Business Continuity?
Business Continuity is preparing for, responding to, and recovering from an application outage that adversely affects business operations
Business Continuity solutions address unavailability and degraded application performance
BC is an integrated, enterprise-wide process and set of activities to ensure information availability
What is Information Availability (IA)?
IA refers to the ability of an infrastructure to function according to business expectations during its specified time of operation
IA can be defined in terms of three parameters:
Accessibility
Information should be accessible at the right place and to the right user
Reliability
Information should be reliable and correct
Timeliness
Information must be available whenever required
Causes of Information Unavailability
Disaster (<1% of occurrences)
Natural or man-made: flood, fire, earthquake, contaminated building
Unplanned Outages (20%)
Failure: database corruption, component failure, human error
Planned Outages (80%)
Competing workloads: backup, reporting, data warehouse extracts, application and data restore
Impact of Downtime
Know the downtime costs (per hour, day, two days...)
Lost Productivity: number of employees impacted x hours out x hourly rate
Lost Revenue: direct loss, compensatory payments, lost future revenue, billing losses, investment losses
Damaged Reputation: customers, suppliers, financial markets, banks, business partners
Financial Performance: revenue recognition, cash flow, lost discounts (A/P), payment guarantees, credit rating, stock price
Other Expenses: temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...
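The lost-productivity formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the figures (200 employees, $40 per hour) are hypothetical, and it assumes every impacted employee is idle for the whole outage.

```python
def lost_productivity_cost(employees_impacted: int,
                           hours_out: float,
                           hourly_rate: float) -> float:
    """Lost productivity = employees impacted x hours out x hourly rate."""
    return employees_impacted * hours_out * hourly_rate


# Hypothetical figures, evaluated per hour, per day, and per two days.
for hours in (1, 24, 48):
    cost = lost_productivity_cost(employees_impacted=200,
                                  hours_out=hours,
                                  hourly_rate=40.0)
    print(f"{hours:>2} h outage: ${cost:,.0f} in lost productivity")
```

Lost productivity is only one line of the total; lost revenue, reputation, and other expenses have to be estimated separately.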
Measuring Information Availability
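[Figure: timeline of an outage. Events on the time axis: Incident, Detection, Diagnosis, Repair, Recovery, Restoration, then the next Incident. Labeled intervals: Detection elapsed time, Response Time, Repair time, Recovery Time. MTTR: time to repair, or downtime. MTBF: time between failures, or uptime.]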
MTBF: Average time available for a system or component to perform its normal operations between failures
MTTR: Average time required to repair a failed component
IA = MTBF / (MTBF + MTTR)
or
IA = uptime / (uptime + downtime)
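As a worked example of the formula, the sketch below computes IA from MTBF and MTTR. The function name and the sample values (999 hours between failures, 1 hour to repair) are assumptions chosen for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """IA = MTBF / (MTBF + MTTR), returned as a fraction between 0 and 1."""
    total = mtbf_hours + mttr_hours
    if mtbf_hours < 0 or mttr_hours < 0 or total == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / total


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
ia = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"IA = {ia:.3%}")
```

The same function covers the uptime/downtime form of the formula: pass total uptime as the first argument and total downtime as the second.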
Availability Measurement: Levels of "9s" Availability
% Uptime    % Downtime   Downtime per Year   Downtime per Week
98%         2%           7.3 days            3 hrs 22 min
99%         1%           3.65 days           1 hr 41 min
99.8%       0.2%         17 hrs 31 min       20 min 10 sec
99.9%       0.1%         8 hrs 45 min        10 min 5 sec
99.99%      0.01%        52.5 min            1 min
99.999%     0.001%       5.25 min            6 sec
99.9999%    0.0001%      31.5 sec            0.6 sec
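The downtime figures in the table follow directly from the uptime percentage. The sketch below reproduces the arithmetic, assuming a 365-day year and a 7-day week; the function and constant names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24   # assumes a non-leap year
HOURS_PER_WEEK = 7 * 24


def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime allowed in a period for a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


for pct in (98.0, 99.0, 99.9, 99.99, 99.999):
    per_year = downtime_hours(pct, HOURS_PER_YEAR)
    per_week = downtime_hours(pct, HOURS_PER_WEEK)
    print(f"{pct}% uptime: {per_year:.2f} h/year, {per_week * 60:.2f} min/week")
```

Each additional 9 cuts the permitted downtime by a factor of ten, which is why the higher levels are measured in minutes and seconds rather than days.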
BC Terminologies
Disaster recovery
Coordinated process of restoring systems, data, and infrastructure required to support ongoing business operations in the event of a disaster
Restoring a previous copy of data and applying logs to that copy to bring it to a known point of consistency
Generally implies use of backup technology
Disaster restart
Process of restarting from disaster using mirrored consistent copies of data and applications
Generally implies use of replication technologies
BC Terminologies (Cont.)
Recovery Point Objective (RPO)
Point in time to which systems and data must be recovered after an outage
Amount of data loss that a business can endure
Recovery Time Objective (RTO)
Time within which systems, applications, or functions must be recovered after an outage
Amount of downtime that a business can endure and survive
[Figure: two scales, each running from weeks through days, hours, and minutes down to seconds. Recovery-point objective scale: Tape Backup, Periodic Replication, Asynchronous Replication, Synchronous Replication. Recovery-time objective scale: Tape Restore, Disk Restore, Manual Migration, Global Cluster.]
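To make the two objectives concrete, the following sketch checks a periodic-backup scheme against an RPO and an RTO. It is a simplified illustration: the names are hypothetical, and it assumes the worst-case data loss equals the interval between backups (a failure just before the next backup runs).

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    rpo_hours: float  # maximum tolerable data loss, expressed as time
    rto_hours: float  # maximum tolerable downtime


def meets_objectives(backup_interval_hours: float,
                     restore_hours: float,
                     objectives: Objectives) -> bool:
    """True if a periodic-backup scheme satisfies both RPO and RTO."""
    # Assumption: everything written since the last backup is lost.
    worst_case_data_loss = backup_interval_hours
    return (worst_case_data_loss <= objectives.rpo_hours
            and restore_hours <= objectives.rto_hours)


# Hypothetical nightly backup with an 8-hour restore.
print(meets_objectives(24, 8, Objectives(rpo_hours=24, rto_hours=12)))
print(meets_objectives(24, 8, Objectives(rpo_hours=1, rto_hours=12)))
```

The second call fails on RPO alone: a one-hour RPO cannot be met by a nightly backup, however fast the restore is.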
Business Continuity Planning (BCP) Process
Identifying the critical business functions
Collecting data on various business processes within those functions
Business Impact Analysis (BIA)
Risk Analysis
Assessing, prioritizing, mitigating, and managing risk
Designing and developing contingency plans and a disaster recovery plan (DR Plan)
Testing, training and maintenance
BC Technology Solutions
The following solutions and supporting technologies enable business continuity and uninterrupted data availability:
Resolving single points of failure
Multi-pathing software
Backup and replication
Backup and recovery
Local replication
Remote replication
Resolving Single Points of Failure
[Figure: a client connects over a redundant IP network to clustered servers joined by a heartbeat connection. Redundant ports, redundant paths, and redundant FC switches connect the servers to a storage array. Redundant arrays: a second storage array is located at a remote site.]
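One way to see why redundancy removes a single point of failure is to estimate the combined availability of identical redundant components. The sketch below uses the standard parallel-availability formula and assumes the components fail independently, which real deployments only approximate; the function name and the 99% figure are illustrative.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes independent failures: the group is down only when every copy is down.
    """
    if copies < 1:
        raise ValueError("at least one component is required")
    return 1.0 - (1.0 - component_availability) ** copies


# Hypothetical FC switch with 99% availability, alone and as a redundant pair.
for copies in (1, 2):
    combined = parallel_availability(0.99, copies)
    print(f"{copies} switch(es): {combined:.4%} available")
```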
Multi-pathing Software
Configuring multiple paths increases data availability
Even with multiple paths, if a path fails, I/O will not be rerouted unless the system recognizes that it has an alternate path
Multi-pathing software recognizes and uses alternate I/O paths to the data
Multi-pathing software also provides load balancing
Load balancing improves I/O performance and data path utilization
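The two roles described above, failover and load balancing, can be illustrated with a toy path selector. This is a minimal sketch and not how any particular multi-pathing product works: the class, its methods, and the path names are all hypothetical, and round-robin is just one possible balancing policy.

```python
class MultipathDevice:
    """Toy model of a device reachable over several I/O paths."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.alive = {path: True for path in self.paths}
        self._next = 0  # index of the next path to try (round-robin)

    def mark_failed(self, path):
        self.alive[path] = False

    def mark_restored(self, path):
        self.alive[path] = True

    def pick_path(self):
        """Return the next healthy path, skipping any that have failed."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if self.alive[path]:
                return path
        raise RuntimeError("no available path to the device")


dev = MultipathDevice(["hba0:port0", "hba1:port1"])
print([dev.pick_path() for _ in range(4)])  # I/O alternates across both paths
dev.mark_failed("hba0:port0")
print([dev.pick_path() for _ in range(3)])  # I/O continues on the surviving path
```

Without the `alive` bookkeeping, I/O sent down the failed path would simply fail, which is the situation the slide describes when the system does not recognize that an alternate path exists.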
Backup and Replication
Local Replication
Data from the production devices is copied to replica devices within the same array
The replicas can then be used for restore operations in the event of data corruption or other events
Remote Replication
Data from the production devices is copied to replica devices on a remote array
In the event of a failure, applications can continue to run from the target device
Backup/Restore
Backup to tape has been a predominant method to ensure business continuity
Frequency of backup depends on RPO/RTO requirements
Module Summary
Key points covered in this module:
Importance of Business Continuity
Types of outages and their impact on businesses
Information availability measurements
Definitions of disaster recovery and restart, RPO and RTO
Business Continuity technology solutions overview
Concept in Practice: EMC PowerPath
Host-Based Software
Resides between the application and the SCSI device driver
Provides intelligent I/O path management
Transparent to the application
Automatic detection and recovery from host-to-array path failures
[Figure: a server runs host applications on top of PowerPath, which sits above multiple SCSI drivers and SCSI controllers. The controllers connect through the storage network to LUNs on the storage array.]
Check Your Knowledge
Which concerns do business continuity solutions address?
Availability is expressed in terms of 9s. Explain the relevance of the use of 9s for availability, using examples.
What is the difference between RPO and RTO?
What is the difference between Disaster Recovery and Disaster Restart?
Provide examples of planned and unplanned downtime in the context of data center operations.
What are some of the Single Points of Failure in a typical data center environment?