https://linkedin.com/in/prafulpatel16
https://github.com/
https://medium.com/@prafulpatel16
Date: June 21, 2022
AWS PROJECT: AWS-EBS ROOT VOLUME DISASTER RECOVERY, MONITORING & NOTIFICATION
SOLUTION DESIGN & IMPLEMENTATION BY: PRAFUL PATEL
Project:
Project Description:
Application Name: Praful’s webportfolio application
Cloud: AWS Cloud
Cloud Services: AWS EC2, AWS EBS, Volume, Snapshots, CloudWatch, SNS
Web server: Apache web server
An IT services provider, PRAfect Systems Inc., provides Cloud/DevOps and software
development solutions. The company recently migrated its entire workload to the AWS Cloud. The
workload runs on an EC2 virtual machine that hosts the application server, and the web
application is accessed through this server. The company has configured monitoring with
AWS CloudWatch and integrated an SNS notification, through which the cloud engineer
receives a notification whenever a system status check fails for the EC2 machine.
One morning the cloud engineer received a system failure notification by email: the EC2
machine's root volume had been corrupted by a faulty system patch, and the monitoring
system reported a failed system status check.
RTO & RPO Requirements:
To maintain business continuity, meeting the RPO and RTO targets is critical during a
disaster.
RTO = 10 min. The system must be recovered from the failure within 10 minutes of
downtime.
RPO = 05 min. The system must be restorable from a backup taken within the last 05
minutes.
This project demonstrates the design and implementation of a disaster recovery
scenario that fulfils the defined business RPO and RTO requirements, along with CloudWatch
monitoring and an SNS notification system.
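The two targets can be expressed as simple time comparisons. The following is a minimal Python sketch, not part of the original project: the function names and the sample timestamps are hypothetical, and only the 5-minute and 10-minute limits come from the requirements above.

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(minutes=5)   # maximum tolerated data-loss window (from the requirements)
RTO = timedelta(minutes=10)  # maximum tolerated downtime (from the requirements)


def meets_rpo(snapshot_time, failure_time, rpo=RPO):
    """True if the backup was taken no more than `rpo` before the failure."""
    age = failure_time - snapshot_time
    return timedelta(0) <= age <= rpo


def meets_rto(failure_time, restored_time, rto=RTO):
    """True if service was restored no more than `rto` after the failure."""
    downtime = restored_time - failure_time
    return timedelta(0) <= downtime <= rto


if __name__ == "__main__":
    # Hypothetical timestamps, chosen only to illustrate the comparison.
    failure = datetime(2022, 6, 21, 9, 0, tzinfo=timezone.utc)
    snapshot = failure - timedelta(minutes=4)
    restored = failure + timedelta(minutes=8)
    print("RPO met:", meets_rpo(snapshot, failure))
    print("RTO met:", meets_rto(failure, restored))
```

In this reading, the RPO is measured backwards from the failure to the most recent usable snapshot, and the RTO forwards from the failure to the moment the web application is reachable again.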
Project Cost Estimation:
(Note: This is not an actual cost; it is only an estimate based on high-level requirements. The price may vary
as services are added or removed.)
Tools & Technologies covered:
AWS Cloud
AWS Identity & Access Management (IAM)
AWS EC2 Machine
AWS CloudWatch
AWS SNS
Terraform (Automated Cloud Provisioning Tool)
Ansible
Visual Studio Code IDE
GitHub
GitBash
Draw.io
[Diagram: AWS services used in this project: Amazon SNS (email notification), Amazon EBS (volume and snapshot), Amazon EC2, and Amazon CloudWatch (alarm, Metrics Insights)]
Resilience in Amazon EC2
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disaster-recovery-resiliency.html
https://disaster-recovery.workshop.aws/en/intro/disaster-recovery.html
In addition to the AWS global infrastructure, Amazon EC2 offers the following features to support your
data resiliency:
1. Copying AMIs across Regions
2. Copying EBS snapshots across Regions
3. Automating EBS-backed AMIs using Amazon Data Lifecycle Manager
4. Automating EBS snapshots using Amazon Data Lifecycle Manager
5. Maintaining the health and availability of your fleet using Amazon EC2 Auto Scaling
6. Distributing incoming traffic across multiple instances in a single Availability Zone or multiple
Availability Zones using Elastic Load Balancing
Instance status checks
Instance status checks monitor the software and network configuration of your individual
instance. Amazon EC2 checks the health of the instance by sending an address resolution
protocol (ARP) request to the network interface (NIC). These checks detect problems that
require your involvement to repair. When an instance status check fails, you typically must
address the problem yourself (for example, by rebooting the instance or by making
instance configuration changes).
The following are examples of problems that can cause instance status checks to fail:
Failed system status checks
Incorrect networking or startup configuration
Exhausted memory
Corrupted file system
Incompatible kernel
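To make the difference between the two kinds of status check concrete, here is a minimal Python sketch. It assumes a response dictionary shaped like the one returned by the EC2 `describe_instance_status` call (`InstanceStatuses`, `SystemStatus`, `InstanceStatus`); the function name and the instance IDs are hypothetical.

```python
def summarize_status(response):
    """Map each instance ID to the list of status checks that are failing.

    `response` is assumed to follow the shape of an EC2
    describe_instance_status response.
    """
    report = {}
    for item in response.get("InstanceStatuses", []):
        failing = []
        if item.get("SystemStatus", {}).get("Status") == "impaired":
            failing.append("system")      # problem on the AWS side of the host
        if item.get("InstanceStatus", {}).get("Status") == "impaired":
            failing.append("instance")    # problem inside the instance itself
        report[item["InstanceId"]] = failing
    return report


if __name__ == "__main__":
    # Hypothetical sample response with made-up instance IDs.
    sample = {
        "InstanceStatuses": [
            {
                "InstanceId": "i-0aaa",
                "SystemStatus": {"Status": "ok"},
                "InstanceStatus": {"Status": "impaired"},
            },
            {
                "InstanceId": "i-0bbb",
                "SystemStatus": {"Status": "ok"},
                "InstanceStatus": {"Status": "ok"},
            },
        ]
    }
    print(summarize_status(sample))
```

The alarm configured later in this project watches the system check metric, so the two results are worth keeping apart when reading a failure report.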
Solution Architecture:
AWS EBS Root Volume Disaster Recovery by RPO & RTO Architecture
RPO & RTO Architecture
This project is completed in the following implementation phases.
Project implementation phases:
Phase 1: Deploy the EC2 machine using Terraform automation
o Write a Terraform script to launch the EC2 machine
o Write main.tf, variable.tf and output.tf
o Prepare the user data shell script that installs the web server and the application source code packages
o Add the user data file to the Terraform configuration
o Verify that the web application is accessible from a web browser
Phase 2: Take a snapshot of the root volume manually
o Go to Snapshots and take a snapshot of the existing root volume A
o Verify that the snapshot process is complete
OR
Phase 2.1: Take a snapshot of the root volume with Ansible automation
o Go to the VS Code IDE
o Gather the instance and volume information manually
o Write the snapshot YAML file
o Run the Ansible playbook file
o Verify from the AWS console that the snapshot has been created
Phase 3: Configure CloudWatch monitoring and an SNS topic for failure notification
o Go to CloudWatch and create an alarm
o Select the EC2 metric: StatusCheckFailed_System
o Create a new SNS topic and provide an email address
o Complete the CloudWatch alarm setup
o Go to the SNS topic and confirm the subscription through the emailed link
Phase 4: Simulate and trigger a disaster recovery scenario
o Prepare a manual system failure script
o Write a script that removes the application files from the Apache root directory
/var/www/html/
o Run the script on the EC2 machine
o Verify that all application files are removed
o Verify that the web application is not accessible
Phase 5: Simulate the CloudWatch “In-Alarm” state
o Log in to the EC2 machine
o Become the root user
o Run aws configure
o Run the CloudWatch set-alarm-state script to put the alarm into the “In-Alarm” status
o Go to CloudWatch and verify that the status has changed from “OK” to “In-Alarm”
o Check the email inbox and verify that the notification arrived with the necessary information
Phase 6: Recover from the disaster condition (RPO 5 min., RTO 10 min.)
o Go to the EC2 machine
o Go to Actions and make sure that the EC2 machine is running
o Select the option “Monitor and troubleshoot”
o Select the option “Replace root volume”
o Select a recent snapshot taken within the last 5 minutes to meet the RPO condition
o Complete the root volume replacement from the snapshot
o Go to Volumes and verify that the new volume created from the snapshot is attached and in the “In-Use”
status
o Verify that the old root volume is in the “Available” status; it is corrupted and no longer in
use
o Verify that the web application is accessible again
Prerequisites:
o VS Code installed and configured on Windows
o Terraform installed and configured in VS Code
o AWS IAM user account with the “AmazonEC2FullAccess” permission
o User-data script ready for the web app source code
o Bash script for application removal
AWS IAM user account with the “AmazonEC2FullAccess” permission
Create a new IAM user with the AmazonEC2FullAccess permission and programmatic access
Implementation in Action:
Phase 1: Deploy the EC2 machine using Terraform automation
o Write a Terraform script to launch the EC2 machine
o Write main.tf, variable.tf and output.tf
o Prepare the user data shell script that installs the web server and the application source code packages
o Add the user data file to the Terraform configuration
o Verify that the web application is accessible from a web browser
terraform init
terraform plan
terraform apply
Apply complete
Verify from the AWS console that the EC2 instance is launched
Verify that the web application is accessible from a browser
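The browser check can also be scripted, which is handy for repeating it after the failure simulation and after the recovery. Below is a minimal Python sketch using only the standard library; the function name, the `opener` parameter, and the placeholder URL are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    check can be exercised without a real server.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the public DNS name or IP of the instance.
    print(site_is_up("http://127.0.0.1/"))
```

Note that a web server that is running but has lost its application files may still return a page (for example a default test page), so a stricter check could also look for expected text in the response body.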
https://github.com/prafulpatel16/terraform-projects-aws.git
Push the source code to GitHub
Verify that the code is updated on GitHub
Phase 2: Take a snapshot of root volume
o Go to snapshots and take snapshot of existing root volume A
o Verify that snapshot process is complete
Volume
Go to Volume
Take a snapshot
Snapshot complete
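The console steps above can also be scripted. The following Python sketch assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`) created by the caller; the function name and the default description are hypothetical, and this is one possible approach rather than the method used in the project.

```python
def snapshot_root_volume(ec2, instance_id, description="root volume backup"):
    """Snapshot the root EBS volume of `instance_id` and return the snapshot ID.

    `ec2` is assumed to be a boto3 EC2 client supplied by the caller.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    root_device = instance["RootDeviceName"]  # device name of the root volume

    # Find the EBS volume that is mapped to the root device.
    volume_id = next(
        mapping["Ebs"]["VolumeId"]
        for mapping in instance["BlockDeviceMappings"]
        if mapping["DeviceName"] == root_device
    )

    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = snapshot["SnapshotId"]

    # Block until the snapshot reaches the "completed" state.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id
```

With boto3 installed and credentials configured, the call would look like `snapshot_root_volume(boto3.client("ec2"), instance_id)`, where `instance_id` is the ID of the web server instance.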
OR
Phase 2.1: Take a snapshot of the root volume with Ansible automation
o Go to the VS Code IDE
o Gather the instance and volume information manually
o Write the snapshot YAML file
o Run the Ansible playbook file
o Verify from the AWS console that the snapshot has been created
Go to the VS Code IDE
Gather the instance and volume information manually
aws_region:
Instance id:
Device_name:
Write the main snapshot YAML file
Run the Ansible playbook file
Verify from the AWS console that the snapshot has been created
Phase 3: Configure CloudWatch monitoring and an SNS topic for failure notification
o Go to CloudWatch and create an alarm
o Select the EC2 metric: StatusCheckFailed_System
o Create a new SNS topic and provide an email address
o Complete the CloudWatch alarm setup
o Go to the SNS topic and confirm the subscription through the emailed link
System Monitoring
Configure CloudWatch
1. Go to CloudWatch “Alarm – Create Alarm”
2. Select Metric – EC2
3. Select Per-Instance Metrics
4. Find the metric name: StatusCheckFailed_System
5. Create New SNS Topic
6. Complete the CloudWatch process
7. Confirm SNS Subscription
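The same alarm and topic can be described in code. The sketch below assumes `sns` and `cloudwatch` are boto3 clients passed in by the caller. The metric name comes from the steps above and the alarm name `WebServer_Alarm` from the simulation command used later; the period, evaluation periods, threshold, and function names are assumptions chosen for illustration, not values taken from the project.

```python
def create_email_topic(sns, topic_name, email):
    """Create an SNS topic, subscribe an email address, and return the topic ARN.

    The subscription stays pending until the recipient confirms it by email.
    """
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


def system_check_alarm(instance_id, topic_arn, name="WebServer_Alarm"):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                 # assumed: evaluate one-minute data points
        "EvaluationPeriods": 2,       # assumed: two failing periods in a row
        "Threshold": 1,               # the metric is 1 when the check fails
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # notify the SNS topic on ALARM
    }


def create_alarm(cloudwatch, instance_id, topic_arn):
    """Create (or update) the alarm with the parameters built above."""
    cloudwatch.put_metric_alarm(**system_check_alarm(instance_id, topic_arn))
```

Keeping the parameters in a plain dictionary makes them easy to review or adjust before any call is sent to AWS.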
Create Alarm
Create a New SNS Topic
Go to EC2 Action
Choose: Recover this Instance
Go to SNS Service to confirm
Confirm Subscription
Go to Gmail and copy the confirmation URL
Click the link to confirm the subscription
Go to CloudWatch and observe the alarm status
Phase 4: Simulate and trigger a disaster recovery scenario
o Prepare a manual system failure script
o Write a script that removes the application files from the Apache root directory
/var/www/html/
o Run the script on the EC2 machine
o Verify that all application files are removed
o Verify that the web application is not accessible
Log in to the EC2 machine
Configure the AWS CLI on the EC2 machine with the new IAM user
aws configure
Simulate Web Server app failure by removing application code:
Create an app remove script
Run the script
Verify that all files are removed from the Apache root directory
Verify that the web application is no longer accessible
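The removal script itself is not reproduced in this document. As a hedged illustration of what such a step does, here is a minimal Python sketch that empties a directory while keeping the directory itself. The function name is hypothetical, and the demo runs against a temporary directory instead of /var/www/html so that it is safe to try.

```python
import shutil
import tempfile
from pathlib import Path


def remove_application_files(docroot):
    """Delete everything inside `docroot`, keep the directory, return removed names."""
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for entry in list(Path(docroot).iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a sub-directory and its contents
        else:
            entry.unlink()         # remove a file or symbolic link
        removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Demo on a throwaway directory; on the web server the target would be
    # the Apache root directory named above, and root privileges are needed.
    with tempfile.TemporaryDirectory() as docroot:
        (Path(docroot) / "index.html").write_text("<h1>portfolio</h1>")
        (Path(docroot) / "assets").mkdir()
        print(remove_application_files(docroot))
```

This is destructive by design, so it should only ever be pointed at a machine whose root volume has a recent snapshot.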
Phase 5: Simulate the CloudWatch “In-Alarm” state
o Log in to the EC2 machine
o Become the root user
o Run aws configure
o Run the CloudWatch set-alarm-state script to put the alarm into the “In-Alarm” status
o Go to CloudWatch and verify that the status has changed from “OK” to “In-Alarm”
o Check the email inbox and verify that the notification arrived with the necessary information
Simulate the EC2 web server system failure through CloudWatch
Deleting the application files does not make the EC2 system status check fail, so the alarm is
forced into the ALARM state manually.
Become the root user
sudo su -
Prepare a simulation script that puts the alarm into the “In-Alarm” status
CloudWatch alarm trigger:
aws cloudwatch set-alarm-state \
  --alarm-name "WebServer_Alarm" \
  --state-value ALARM \
  --state-reason "Simulate an EC2 HW failure"
Email Received
Cloudwatch Alarm status changed to In-Alarm
Phase 6: Recover from the disaster condition (RPO 5 min., RTO 10 min.)
o Go to the EC2 machine
o Go to Actions and make sure that the EC2 machine is running
o Select the option “Monitor and troubleshoot”
o Select the option “Replace root volume”
o Select a recent snapshot taken within the last 5 minutes to meet the RPO condition
o Complete the root volume replacement from the snapshot
o Go to Volumes and verify that the new volume created from the snapshot is attached and in the “In-Use”
status
o Verify that the old root volume is in the “Available” status; it is corrupted and no longer in
use
o Verify that the web application is accessible again
Recover the instance root volume from the failure
Go to Snapshots
Verify that the snapshot is available
Attach a new volume to the WebServer EC2 instance
Go to EC2
Replace root volume
A new volume is created, attached to the EC2 instance, and “In-Use”; the older one is “Available”
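The recovery can be sketched in code as two steps: pick the newest completed snapshot inside the RPO window, then start a replace root volume task from it. The Python sketch below assumes `ec2` is a boto3 EC2 client supplied by the caller; the function names are hypothetical, and the 5-minute default is the RPO stated in this project.

```python
from datetime import datetime, timedelta, timezone


def latest_snapshot_within_rpo(snapshots, now, rpo=timedelta(minutes=5)):
    """Return the newest completed snapshot started within `rpo` of `now`, or None."""
    candidates = [
        s for s in snapshots
        if s["State"] == "completed" and timedelta(0) <= now - s["StartTime"] <= rpo
    ]
    return max(candidates, key=lambda s: s["StartTime"], default=None)


def replace_root_volume(ec2, instance_id, volume_id, now=None):
    """Start a replace root volume task from the newest snapshot inside the RPO.

    `ec2` is assumed to be a boto3 EC2 client; `volume_id` is the original
    root volume whose snapshots are searched. Returns the task ID.
    """
    now = now or datetime.now(timezone.utc)
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )["Snapshots"]

    chosen = latest_snapshot_within_rpo(snapshots, now)
    if chosen is None:
        raise RuntimeError("no completed snapshot inside the RPO window")

    task = ec2.create_replace_root_volume_task(
        InstanceId=instance_id, SnapshotId=chosen["SnapshotId"]
    )
    return task["ReplaceRootVolumeTask"]["ReplaceRootVolumeTaskId"]
```

Failing loudly when no snapshot qualifies is a deliberate choice in this sketch: restoring from an older snapshot would still bring the site back, but it would silently miss the RPO target.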
Go to the web browser and verify that the web application is up and running again after recovery
Go to Monitoring and verify that the CloudWatch alarm is in the “OK” status
Congratulations!!!! 🔥🚀
Clean up Project:
terraform destroy
Remove volume
Remove snapshots
Remove CloudWatch alarm
Remove SNS Topic
Resources:
https://wellarchitectedlabs.com/reliability/300_labs/300_testing_for_resiliency_of_ec2_rds_and_s3/6_failure_injection_app/
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/replace-root.html#view-replacement-tasks
https://github.com/terraform-aws-modules/terraform-aws-cloudwatch
https://docs.ansible.com/ansible/2.5/modules/ec2_snapshot_module.html