
AWS DevOps Interview Preparation Guide

Comprehensive Reference for Interview Preparation and Daily Use

Table of Contents
1. Section 1: DevOps Tools & Practices
Infrastructure as Code (Terraform)

IAM Policies & Permissions

CI/CD Design & Build Management

Scripting & Automation

Containerization & Orchestration

Version Control & Repositories

2. Section 2: AWS Services & Concepts


Core AWS Services

DevOps on AWS

Security & Compliance

3. Quick Reference Checklists

SECTION 1: DevOps Tools & Practices


🧱 Infrastructure as Code (IaC) - Terraform
Core Concepts
Modules

Reusable infrastructure components that encapsulate resources

Root modules contain main configuration; child modules are called by root

Use input variables and outputs for flexibility and data flow

Organize by logical grouping (networking, compute, database)

State Files

terraform.tfstate tracks current infrastructure state

JSON format containing resource metadata and dependencies


Critical for determining what changes need to be applied

Never manually edit state files

Use terraform state commands for state manipulation
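
For example, a few common state operations (the resource addresses below are placeholders):

bash
# List resources tracked in state
terraform state list

# Show a single resource's recorded attributes
terraform state show aws_s3_bucket.logs

# Rename a resource address without recreating it
terraform state mv aws_instance.old aws_instance.new

# Stop managing a resource without destroying it
terraform state rm aws_instance.legacy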

Remote Backends

Store state files remotely for team collaboration

Enables state locking to prevent concurrent modifications

Common backends: S3 + DynamoDB (AWS), Terraform Cloud, Azure Storage

Configuration example:

hcl

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Resource Provisioning and Lifecycle


Resource Creation

Define resources in .tf files using HCL syntax

Run terraform init to initialize providers

Use terraform plan to preview changes

Execute terraform apply to create resources

Resource Updates

Modify resource configuration in .tf files

Terraform detects changes and determines update method

In-place updates preserve resource identity

Replacement creates new resource and destroys old one

Resource Destruction
terraform destroy removes all managed resources

Remove resource block from config and apply to destroy specific resources

Use prevent_destroy lifecycle argument to protect critical resources

Lifecycle Meta-Arguments

create_before_destroy : Create replacement before destroying original

prevent_destroy : Block resource destruction

ignore_changes : Ignore specific attribute changes

replace_triggered_by : Force replacement when specific resources change

AWS Provider Configuration and Best Practices


Provider Configuration

hcl

provider "aws" {
region = var.aws_region

default_tags {
tags = {
Environment = var.environment
ManagedBy = "Terraform"
}
}
}

Best Practices

Use workspaces for environment separation (dev, staging, prod)

Implement state locking with DynamoDB to prevent conflicts

Version control all .tf files; exclude .tfstate from git

Use variables for configurable values; avoid hardcoding

Structure projects with modules for reusability

Run terraform fmt to maintain consistent formatting

Use terraform validate to check configuration syntax

Always review terraform plan output before applying

Tag all resources for cost tracking and management


Use data sources to reference existing infrastructure

Implement CI/CD for automated terraform deployments
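
A minimal sketch of the workflow these practices describe, as it might run non-interactively in a CI job:

bash
# Initialize providers and backend, then check formatting and syntax
terraform init -input=false
terraform fmt -check
terraform validate

# Save the plan so the reviewed plan is exactly what gets applied
terraform plan -input=false -out=tfplan
terraform apply -input=false tfplan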

🔐 IAM Policies & Permissions


Role-Based Access Control (RBAC)
IAM Users

Individual identities with long-term credentials (access keys)

Use for permanent human users or legacy applications

Enable MFA for enhanced security

Avoid using root user for daily operations

IAM Roles

Temporary security credentials assumed by trusted entities

No long-term credentials (access keys)

Can be assumed by users, applications, or AWS services

Support cross-account access and federated identities

Automatically rotate credentials

IAM Groups

Collection of users with shared permissions

Attach policies to groups rather than individual users

Simplifies permission management at scale

Users inherit all permissions from groups they belong to

Service Roles

Allow AWS services to perform actions on your behalf

EC2 instances assume roles via instance profiles

Lambda functions use execution roles

ECS tasks use task roles for granular permissions

Policy Creation and Evaluation


Policy Structure
json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        }
      }
    }
  ]
}

Policy Types

Identity-based policies: Attached to users, groups, or roles

Resource-based policies: Attached to resources (S3 buckets, SQS queues)

Permission boundaries: Set maximum permissions for entities

Service Control Policies (SCPs): Organization-wide restrictions

Session policies: Temporary policies for assumed role sessions

Policy Evaluation Logic

1. Explicit Deny: Always takes precedence (immediate deny)

2. Explicit Allow: Required from at least one policy

3. Implicit Deny: Default if no explicit allow exists

Evaluation Flow

Check for explicit deny in all policies (SCPs, permission boundaries, identity/resource policies)

If explicit deny found, request is denied

Check for explicit allow in applicable policies

If no explicit allow, request is denied (implicit deny)

Evaluation Context

AWS Organizations SCPs (if applicable)


Permission boundaries (if set)

Identity-based policies

Resource-based policies

Session policies (for assumed roles)

Permission Boundaries and Trust Relationships


Permission Boundaries

Advanced feature for delegating permissions management

Sets maximum permissions an IAM entity can have

Used to prevent privilege escalation

Common in multi-account or team-based scenarios

Does not grant permissions, only limits them

Trust Relationships (Trust Policies)

Defines who or what can assume a role

Attached to IAM roles, not users or groups

Specifies trusted principals (AWS accounts, services, federated users)

Example trust policy:

json

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "ec2.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}

Cross-Account Access

Trust policy in target account allows source account

Source account user needs permission to assume role

External ID for enhanced security with third parties
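
A sketch of assuming a cross-account role with an external ID from the CLI (the role ARN, session name, and external ID below are placeholders):

bash
# Returns temporary credentials scoped to the assumed role
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/ThirdPartyRole \
  --role-session-name audit-session \
  --external-id example-external-id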


Best Practices

Use roles instead of sharing credentials

Grant least privilege - only required permissions

Use managed policies for common use cases

Create custom policies for specific requirements

Regularly review and audit permissions

Enable MFA for sensitive operations

Use conditions to add additional security controls

Test policies with IAM policy simulator
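
The policy simulator can also be driven from the CLI; for example (the ARNs below are placeholders):

bash
# Evaluate whether the user's policies would allow s3:GetObject on an object
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/dev-user \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::my-bucket/readme.txt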

🔁 CI/CD Design & Build Management


Pipeline Tools Overview
Jenkins

Self-hosted, open-source automation server

Extensive plugin ecosystem (2000+ plugins)

Groovy-based pipeline definitions (Jenkinsfile)

Declarative and Scripted pipeline syntax

Distributed builds with master-agent architecture

Blue Ocean UI for modern visualization

GitHub Actions

Cloud-native CI/CD integrated with GitHub

YAML-based workflow definitions

Event-driven (push, PR, schedule, manual)

Matrix builds for parallel testing

Marketplace with pre-built actions

Free tier for public repositories

Concourse CI

Container-based pipeline execution

Pipeline as code with YAML configuration


Reproducible builds (every task in fresh container)

Resource-oriented architecture

Strong isolation between pipeline steps

Visual pipeline representation

Pipeline Creation, Triggers, and Stages


Common Pipeline Stages

1. Source Stage
Checkout code from version control (Git, CodeCommit)

Trigger on commit, PR, or schedule

Clone repository with specific branch/tag

2. Build Stage
Compile source code

Run unit tests

Static code analysis (SonarQube, linting)

Build artifacts (JAR, WAR, Docker images)

3. Test Stage
Integration tests

Security scanning (SAST, DAST, dependency checks)

Performance testing

Quality gates for code coverage and complexity

4. Deploy Stage
Deploy to target environment (dev → staging → prod)

Blue-green or canary deployments

Database migrations

Configuration management

5. Verify Stage
Smoke tests post-deployment

Health checks and monitoring

Rollback on failure

Pipeline Triggers
SCM polling (check for changes periodically)

Webhooks (immediate notification on push)

Scheduled (cron-based)

Manual approval gates

Upstream/downstream pipeline dependencies

Integration with Testing Tools

Unit testing frameworks (JUnit, PyTest, Jest)

Integration testing (Selenium, Postman/Newman)

Security tools (OWASP ZAP, Snyk, Trivy)

Code quality (SonarQube, CodeClimate)

Apache Maven
Build Lifecycle Phases

1. validate: Validate project structure and configuration

2. compile: Compile source code

3. test: Run unit tests

4. package: Package compiled code (JAR, WAR)

5. verify: Run integration tests

6. install: Install package to local repository

7. deploy: Deploy package to remote repository

Running Maven Commands

bash
# Clean and build
mvn clean install

# Skip tests during build
mvn clean install -DskipTests

# Run specific phase
mvn package

# Run specific goal
mvn dependency:tree

Dependency Management

Central Repository (Maven Central) for public artifacts

POM (pom.xml) defines project dependencies

Transitive dependencies automatically resolved

Dependency scopes: compile, provided, runtime, test, system

Dependency version management with properties

Exclude transitive dependencies when conflicts arise

Plugin Usage

Plugins extend Maven functionality

Common plugins:
maven-compiler-plugin : Configure Java version

maven-surefire-plugin : Run unit tests

maven-failsafe-plugin : Run integration tests

maven-assembly-plugin : Create distribution packages

maven-shade-plugin : Create uber JAR

Configure plugins in <build><plugins> section

Bind plugin goals to lifecycle phases

POM Structure

xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
      <version>2.7.0</version>
    </dependency>
  </dependencies>
</project>

Best Practices

Use parent POM for multi-module projects

Define versions in properties for consistency

Use dependency management section for version control

Leverage profiles for environment-specific builds

Keep dependencies up to date

Use Maven wrapper (mvnw) for consistent builds

🧪 Scripting & Automation


Writing Reusable Scripts
Script Design Principles

Single Responsibility: Each script should do one thing well

Parameterization: Accept inputs via arguments or environment variables

Idempotency: Safe to run multiple times without adverse effects

Error Handling: Gracefully handle failures with meaningful messages

Logging: Provide visibility into script execution

Documentation: Include usage instructions and examples

Modular Code Structure


Break complex scripts into functions

Separate configuration from logic

Use libraries and modules to avoid duplication

Create reusable utility functions

AWS SDK Usage (boto3 for Python)


Basic boto3 Usage

python

import boto3
from botocore.exceptions import ClientError

# Create service client


s3 = boto3.client('s3', region_name='us-east-1')

# Using resource interface (higher-level)


s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('my-bucket')

Common Operations

S3 Operations

python

# Upload file
s3.upload_file('local.txt', 'bucket-name', 'key.txt')

# Download file
s3.download_file('bucket-name', 'key.txt', 'local.txt')

# List objects
response = s3.list_objects_v2(Bucket='bucket-name', Prefix='folder/')

EC2 Operations

python
ec2 = boto3.client('ec2')

# Describe instances
response = ec2.describe_instances(
Filters=[{'Name': 'tag:Environment', 'Values': ['production']}]
)

# Start instances
ec2.start_instances(InstanceIds=['i-1234567890abcdef0'])

Session Management

python

# Using specific credentials


session = boto3.Session(
aws_access_key_id='ACCESS_KEY',
aws_secret_access_key='SECRET_KEY',
region_name='us-east-1'
)

s3 = session.client('s3')

# Assume role for cross-account access


sts = boto3.client('sts')
response = sts.assume_role(
RoleArn='arn:aws:iam::123456789012:role/MyRole',
RoleSessionName='session-name'
)

Python, Bash, PowerShell Scripts


Python Best Practices

python
import logging
import sys
from typing import List, Dict

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def process_items(items: List[str]) -> Dict:
    """Process list of items with error handling."""
    results = {'success': [], 'failed': []}

    for item in items:
        try:
            # Process item
            logger.info(f"Processing {item}")
            results['success'].append(item)
        except Exception as e:
            logger.error(f"Failed to process {item}: {str(e)}")
            results['failed'].append(item)

    return results

if __name__ == '__main__':
    items = sys.argv[1:]
    results = process_items(items)
    logger.info(f"Processed: {len(results['success'])} successful")

Bash Scripting Best Practices

bash
#!/bin/bash
set -euo pipefail  # Exit on error, undefined variables, pipe failures

# Configuration
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly LOG_FILE="${SCRIPT_DIR}/script.log"

# Logging function
log() {
  echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE}"
}

# Error handling
error_exit() {
  log "ERROR: $1"
  exit 1
}

# Main function
main() {
  log "Starting script execution"

  # Check prerequisites
  command -v aws >/dev/null 2>&1 || error_exit "AWS CLI not found"

  # Script logic
  aws s3 ls || error_exit "Failed to list S3 buckets"

  log "Script completed successfully"
}

# Cleanup on exit
cleanup() {
  log "Cleaning up temporary files"
}
trap cleanup EXIT

main "$@"

PowerShell for Windows Automation

powershell
[CmdletBinding()]
param(
  [Parameter(Mandatory=$true)]
  [string]$Environment
)

# Error handling
$ErrorActionPreference = "Stop"

# Logging
function Write-Log {
  param([string]$Message)
  $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
  Write-Host "[$timestamp] $Message"
}

try {
  Write-Log "Starting deployment to $Environment"

  # AWS operations
  $instances = aws ec2 describe-instances --filters "Name=tag:Environment,Values=$Environment" | ConvertFrom-Json

  Write-Log "Found $($instances.Reservations.Count) instances"
} catch {
  Write-Log "ERROR: $_"
  exit 1
}

Error Handling and Logging


Error Handling Strategies

Use try-catch blocks for exception handling

Validate inputs before processing

Implement retry logic with exponential backoff

Provide specific error messages for debugging

Log errors with context (timestamp, operation, inputs)

Logging Best Practices

Use structured logging (JSON format) for machine parsing

Include log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL

Log to both console and file


Rotate log files to manage disk space

Include correlation IDs for distributed systems

Sanitize sensitive data before logging

Retry Logic Example

python

import time
import logging
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
                logger.warning(f"Throttled, retrying in {wait_time}s")
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")

📦 Containerization & Orchestration


Docker
Image Creation and Dockerfiles

Dockerfile Structure

dockerfile
# Use specific version tags, not 'latest'
FROM node:16-alpine

# Set working directory


WORKDIR /app

# Copy dependency files first (layer caching)


COPY package*.json ./

# Install dependencies
RUN npm ci --only=production && \
npm cache clean --force

# Copy application code


COPY . .

# Non-root user for security


RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
USER nodejs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD node healthcheck.js || exit 1

# Start command
CMD ["node", "server.js"]

Multi-Stage Builds

dockerfile
# Build stage
FROM node:16 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:16-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --only=production
CMD ["node", "dist/server.js"]

Docker Best Practices

Use official base images from trusted sources

Pin specific image versions (avoid latest tag)

Minimize layer count and image size

Leverage build cache by ordering Dockerfile instructions

Use .dockerignore to exclude unnecessary files

Run containers as non-root users

Scan images for vulnerabilities (Trivy, Snyk)

Use multi-stage builds to reduce final image size

Volumes

Persist data outside container lifecycle

Named volumes: Managed by Docker

Bind mounts: Mount host directories

tmpfs mounts: In-memory storage

bash
# Named volume managed by Docker
docker run -v myvolume:/app/data myimage

# Bind mount of a host directory
docker run -v /host/path:/container/path myimage

Networking
Bridge: Default network, containers can communicate

Host: Container uses host network stack

Overlay: Multi-host networking for Swarm/Kubernetes

None: Disable networking

bash
# Create a user-defined bridge network
docker network create mynetwork

# Attach a container to it
docker run --network mynetwork myimage

Kubernetes
Pods

Smallest deployable unit in Kubernetes

Contains one or more containers

Shares network namespace and storage volumes

Ephemeral by design

yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80

Deployments

Manage ReplicaSets and Pods

Declarative updates and rollbacks

Rolling updates with zero downtime

Self-healing (restart failed pods)

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

Services

Expose pods to network traffic

Load balancing across pod replicas

Service discovery via DNS

Service Types

ClusterIP: Internal access only (default)

NodePort: Expose on each node's IP at static port

LoadBalancer: External load balancer (cloud provider)

ExternalName: Map to external DNS name

yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80

ConfigMaps and Secrets

ConfigMaps: Non-sensitive configuration

yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_url: "postgres://db:5432"
  log_level: "info"

Secrets: Sensitive data (base64 encoded)

yaml

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  username: YWRtaW4=
  password: cGFzc3dvcmQ=

Using ConfigMaps and Secrets

yaml
containers:
- name: app
  envFrom:
  - configMapRef:
      name: app-config
  - secretRef:
      name: db-secret

Helm Charts and Cluster Management


Helm Charts

Package manager for Kubernetes

Templated YAML manifests

Versioning and rollback support

Values file for customization

Chart Structure

mychart/
├── Chart.yaml # Chart metadata
├── values.yaml # Default configuration values
├── charts/ # Dependency charts
└── templates/ # Kubernetes manifest templates
├── deployment.yaml
├── service.yaml
└── ingress.yaml

Helm Commands

bash
# Install chart
helm install myapp ./mychart -f custom-values.yaml

# Upgrade release
helm upgrade myapp ./mychart

# Rollback to previous version
helm rollback myapp 1

# List releases
helm list

# Uninstall release
helm uninstall myapp

Cluster Management

Namespaces

Logical isolation within cluster

Resource quotas per namespace

RBAC policies scoped to namespace

bash

kubectl create namespace production


kubectl get pods -n production

Resource Quotas and Limits

yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi

RBAC (Role-Based Access Control)


ServiceAccounts for pods

Roles and ClusterRoles for permissions

RoleBindings and ClusterRoleBindings to assign

yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
subjects:
- kind: ServiceAccount
  name: app-sa
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Health Checks

Liveness Probe: Restart container if unhealthy

Readiness Probe: Remove from service if not ready

Startup Probe: Delay other probes until app starts

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Cluster Best Practices

Use namespaces for environment separation

Set resource requests and limits on all containers

Implement network policies for security

Use Horizontal Pod Autoscaler (HPA) for scaling (see the kubectl sketch after this list)

Regular cluster and component updates

Monitor cluster health with Prometheus/Grafana

Backup etcd data regularly

Use Pod Disruption Budgets for availability
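
A minimal HPA example for the nginx-deployment shown earlier (the CPU target and replica bounds are illustrative):

bash
# Scale between 3 and 10 replicas, targeting ~70% average CPU
kubectl autoscale deployment nginx-deployment --cpu-percent=70 --min=3 --max=10

# Inspect autoscaler status
kubectl get hpa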

📁 Version Control & Repositories


Git and GitHub
Branching Strategies

Git Flow

main: Production-ready code

develop: Integration branch for features

feature/*: New features, branch from develop

release/*: Release preparation, branch from develop

hotfix/*: Emergency fixes, branch from main

Workflow:
1. Create feature branch from develop

2. Complete feature, merge back to develop

3. Create release branch when ready for production

4. Test and fix in release branch

5. Merge release to main and develop

6. Tag release in main

7. Hotfixes branch from main, merge to main and develop
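
A condensed command-line walkthrough of this flow (branch and version names are placeholders):

bash
# Feature work branches from develop and merges back
git checkout develop
git checkout -b feature/login
# ...commit changes...
git checkout develop
git merge --no-ff feature/login

# Release branch, then merge to main and develop and tag the release
git checkout -b release/1.2.0 develop
git checkout main
git merge --no-ff release/1.2.0
git tag -a v1.2.0 -m "Release 1.2.0"
git checkout develop
git merge --no-ff release/1.2.0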

Trunk-Based Development

Single main branch (trunk)

Short-lived feature branches (1-2 days max)

Frequent integration to main

Feature flags for incomplete features

Continuous Integration/Deployment

Requires strong automated testing

GitHub Flow

Simplified workflow for continuous deployment

main branch always deployable

Create feature branch from main

Open Pull Request early for discussion

Deploy from branch for testing

Merge to main and deploy immediately

Pull Requests, Hooks, and Collaboration


Pull Request Best Practices

Small, focused changes (easier to review)

Clear title and description

Reference related issues

Self-review before requesting others

Respond to feedback promptly


Keep PRs up to date with base branch

Code Review Guidelines

Review within 24 hours

Focus on logic, not style (use linters)

Ask questions, don't demand changes

Approve when concerns are addressed

Use review comments for discussion

Request changes for blocking issues

Branch Protection Rules

Require pull request reviews (1-2 reviewers)

Require status checks to pass (CI tests)

Require branches to be up to date

Restrict who can push to branch

Prevent force pushes

Require linear history (no merge commits)

Git Hooks

Client-side hooks run on local machine

Server-side hooks run on Git server

Common Hooks

pre-commit: Run linters, formatters before commit

commit-msg: Validate commit message format

pre-push: Run tests before pushing

post-merge: Update dependencies after pull

Example pre-commit hook:

bash
#!/bin/bash
# Run linting before commit
npm run lint
if [ $? -ne 0 ]; then
  echo "Linting failed. Fix errors before committing."
  exit 1
fi

Commit Message Conventions

Use conventional commits format

Format: <type>(<scope>): <subject>

Types: feat, fix, docs, style, refactor, test, chore

Keep subject under 50 characters

Use imperative mood ("Add feature" not "Added feature")

Examples:

feat(auth): add OAuth2 login support


fix(api): handle null response from database
docs(readme): update installation instructions
refactor(utils): simplify date formatting function

Collaboration Workflows

Fork and Pull Request

Fork repository to personal account

Clone forked repository

Create feature branch

Make changes and push to fork

Open pull request to original repository

Maintainers review and merge

Shared Repository

All contributors have write access

Create branches in same repository

Open pull requests for review


Merge after approval

Git Best Practices

Commit frequently with logical changes

Write meaningful commit messages

Keep commits focused (one logical change per commit)

Pull/fetch regularly to stay updated

Use .gitignore to exclude generated files

Never commit secrets or credentials

Use git rebase for clean history

Tag releases with semantic versioning

Create .gitattributes for consistent line endings

SECTION 2: AWS Services & Concepts


☁️ Core AWS Services
EC2 (Elastic Compute Cloud)
Instance Types

General Purpose (T3, M5): Balanced CPU/memory, web servers, small databases

Compute Optimized (C5, C6): High CPU, batch processing, gaming servers

Memory Optimized (R5, X1): Large datasets, in-memory databases, caching

Storage Optimized (I3, D2): High IOPS, data warehousing, log processing

Accelerated Computing (P3, G4): GPU instances for ML, graphics rendering

Instance Sizing

nano, micro, small, medium, large, xlarge, 2xlarge, etc.

T3 instances: Burstable performance (credits system)

Use AWS Compute Optimizer for right-sizing recommendations

Auto Scaling

Auto Scaling Groups (ASG)

Maintain desired capacity of instances


Scale based on metrics or schedule

Distribute instances across Availability Zones

Health checks and automatic replacement

Integration with ELB for load balancing

Launch Templates/Configurations

Define instance configuration (AMI, instance type, key pair)

Launch templates support versioning

Include user data for initialization scripts

Specify IAM instance profile for permissions

Scaling Policies

Target Tracking: Maintain metric at target value (e.g., 70% CPU); see the CLI sketch after this list

Step Scaling: Scale in steps based on threshold breaches

Simple Scaling: Single scaling action per alarm

Scheduled Scaling: Scale at specific times (predictable load)

Predictive Scaling: ML-based forecasting of capacity needs
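
A sketch of creating a target-tracking policy from the CLI (the ASG and policy names are placeholders):

bash
# Keep average CPU across the group near 70%
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name cpu-target-70 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":70.0}'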

Security Groups

Virtual firewall for instances

Stateful (return traffic automatically allowed)

Rules specify protocol, port, and source/destination

Default deny all inbound, allow all outbound

Can reference other security groups as source

Best practice: Separate security groups by tier (web, app, database)

Example security group rules:

Allow HTTP (port 80) from 0.0.0.0/0

Allow HTTPS (port 443) from 0.0.0.0/0

Allow SSH (port 22) from corporate IP range

Allow PostgreSQL (port 5432) from application security group
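
The equivalent CLI calls for two of these rules (the security group IDs below are placeholders):

bash
# Allow HTTPS from anywhere
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# Allow PostgreSQL only from the application tier's security group
aws ec2 authorize-security-group-ingress --group-id sg-0db0000000000000a \
  --protocol tcp --port 5432 --source-group sg-0app000000000000b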


S3 (Simple Storage Service)
Storage Classes

S3 Standard: Frequently accessed data, low latency, high durability

S3 Intelligent-Tiering: Automatic cost optimization, moves between tiers

S3 Standard-IA: Infrequent access, lower cost, retrieval fee

S3 One Zone-IA: Single AZ, 20% cheaper than Standard-IA

S3 Glacier Instant Retrieval: Archive with millisecond retrieval

S3 Glacier Flexible Retrieval: Archive, retrieval minutes to hours

S3 Glacier Deep Archive: Lowest cost, 12-hour retrieval

Bucket Policies

JSON-based resource policies attached to buckets

Control access for principals (users, accounts, services)

Can allow or deny actions (GetObject, PutObject, DeleteObject)

Use conditions for fine-grained control (IP address, MFA, encryption)

Example bucket policy:

json

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": {
      "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
    }
  }]
}

Lifecycle Rules

Automate transition between storage classes

Expire (delete) objects after specified time


Delete old versions in versioned buckets

Abort incomplete multipart uploads

Reduces storage costs automatically

Example lifecycle rule:

Transition to Standard-IA after 30 days

Transition to Glacier after 90 days

Expire after 365 days
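
A sketch of that rule applied with the CLI (the bucket name is a placeholder):

bash
# Transition to Standard-IA at 30 days, Glacier at 90, expire at 365
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-and-expire",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }]
  }'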

Versioning

Keep multiple variants of objects

Protect against accidental deletion

Enable MFA Delete for additional protection

Previous versions retained until explicitly deleted

Suspended versioning stops creating new versions

Once enabled, cannot fully disable (only suspend)
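
Enabling versioning from the CLI (the bucket name is a placeholder):

bash
# Turn on versioning for the bucket
aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Confirm the current versioning state
aws s3api get-bucket-versioning --bucket my-bucket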

S3 Best Practices

Enable versioning for critical data

Use lifecycle policies to optimize costs

Enable server access logging for audit trails

Implement bucket policies with least privilege

Use S3 Transfer Acceleration for faster uploads

Enable default encryption at bucket level

Use S3 Object Lock for compliance (WORM)

Monitor with CloudWatch metrics and S3 Storage Lens

VPC (Virtual Private Cloud)


VPC Components

Subnets

Subdivide VPC CIDR block

Public Subnet: Has route to Internet Gateway


Private Subnet: No direct internet access

Each subnet in one Availability Zone

Plan CIDR blocks carefully (cannot change after creation)

Route Tables

Control traffic routing for subnets

Each subnet associated with one route table

Routes define destination and target

Most specific route takes precedence

Local route (VPC CIDR) automatically created

Example routes:

10.0.0.0/16 → local (VPC traffic)

0.0.0.0/0 → igw-xxx (internet traffic)

192.168.0.0/16 → vgw-xxx (VPN traffic)

Internet Gateway (IGW)

Allows VPC resources to access internet

Horizontally scaled, redundant, highly available

One IGW per VPC

Requires route in route table (0.0.0.0/0 → IGW)

Requires public IP or Elastic IP on instance

NAT Gateway

Enable private subnet instances to access internet (outbound only)

Managed service, scales automatically

Deploy in public subnet

Create route in private subnet route table (0.0.0.0/0 → NAT Gateway)

Charged per hour and per GB processed

For multi-AZ, create NAT Gateway in each AZ

NAT Instance (legacy approach)

EC2 instance configured as NAT


Cheaper than NAT Gateway but requires management

Single point of failure unless configured for HA

Must disable source/destination check

VPC Peering

Private connection between two VPCs

Works across accounts and regions

Non-transitive (A-B and B-C doesn't mean A-C)

CIDR blocks must not overlap

Update route tables in both VPCs

No bandwidth bottleneck or single point of failure

VPC Endpoints

Private connection to AWS services without IGW/NAT

Gateway Endpoints: S3 and DynamoDB (no cost)

Interface Endpoints: Other services via PrivateLink (charged)

Traffic doesn't leave AWS network

Improve security by avoiding public internet

Network ACLs (NACLs)

Stateless firewall at subnet level

Rules evaluated in number order

Support allow and deny rules

Default NACL allows all traffic

Custom NACLs deny all by default

Separate inbound and outbound rules

VPC Flow Logs

Capture IP traffic information

Can be created at VPC, subnet, or ENI level

Publish to CloudWatch Logs or S3

Useful for troubleshooting connectivity issues


Analyze security group and NACL effectiveness
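
A minimal example of enabling flow logs to CloudWatch Logs (the VPC ID, log group, and role ARN are placeholders):

bash
# Capture accepted and rejected traffic for the whole VPC
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name vpc-flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role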

VPC Design Best Practices

Use multiple Availability Zones for high availability

Separate public and private subnets

Plan CIDR blocks for future growth

Use security groups as primary access control

Implement defense in depth (SG + NACL)

Use VPC endpoints to reduce data transfer costs

Enable VPC Flow Logs for security monitoring

Tag all VPC resources for cost allocation

RDS (Relational Database Service)


Supported Database Engines

PostgreSQL

MySQL

MariaDB

Oracle

SQL Server

Amazon Aurora (MySQL and PostgreSQL compatible)

Deployment Options

Single-AZ

One database instance in single Availability Zone

Lower cost, suitable for dev/test

Downtime during maintenance

Multi-AZ

Primary in one AZ, standby replica in another

Synchronous replication for high availability

Automatic failover (1-2 minutes)

Minimal performance impact from synchronous replication


Use for production databases

Same endpoint after failover

Read Replicas

Asynchronous replication from primary

Read-only copies for scaling read traffic

Can be in same region or cross-region

Up to 15 read replicas per primary for MySQL, MariaDB, and PostgreSQL (fewer for other engines)

Can be promoted to standalone database

PostgreSQL and MySQL support cascading replication

Backups

Automated Backups

Daily full backup during backup window

Transaction logs backed up every 5 minutes

Point-in-time recovery to any second

Retention period: 0-35 days (default 7)

Stored in S3, no additional charge

Deleted when DB instance deleted

Manual Snapshots

User-initiated database snapshots

Retained until explicitly deleted

Can copy across regions

Can share with other AWS accounts

Use for major changes or pre-production snapshots
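
For example (the instance and snapshot identifiers are placeholders):

bash
# Take a manual snapshot before a major change
aws rds create-db-snapshot \
  --db-instance-identifier prod-db \
  --db-snapshot-identifier prod-db-pre-upgrade

# List snapshots for the instance
aws rds describe-db-snapshots --db-instance-identifier prod-db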

Monitoring

CloudWatch Metrics

CPU Utilization

Database Connections

Free Storage Space


Read/Write IOPS

Read/Write Latency

Network Throughput

Enhanced Monitoring

Real-time OS metrics (1-60 second intervals)

Process and thread information

More granular than CloudWatch

Agent runs on DB instance

Additional cost per instance

Performance Insights

Visualize database load

Identify performance bottlenecks

Top SQL queries by load

Wait event analysis

Free for 7 days retention, charge for longer

RDS Best Practices

Enable Multi-AZ for production databases

Use read replicas to offload read traffic

Enable automated backups with appropriate retention

Create manual snapshots before major changes

Monitor CloudWatch metrics and set alarms

Use parameter groups for database configuration

Apply patches during maintenance windows

Enable encryption at rest for sensitive data

Use IAM database authentication where supported

Implement connection pooling in applications

ECS/EKS (Container Services)


ECS (Elastic Container Service)
Launch Types

EC2: Run containers on managed EC2 instances


More control over infrastructure

Can use Reserved Instances for cost savings

Requires cluster management

Fargate: Serverless container execution


No infrastructure management

Pay per task

Simpler to operate

Core Concepts

Task Definitions

Blueprint for application

Specifies Docker images, CPU, memory

Container port mappings

Environment variables and secrets

Volume definitions

IAM task role for permissions

Versioned (immutable once created)

Services

Maintain desired count of tasks

Integrate with load balancers (ALB/NLB)

Auto scaling based on metrics

Rolling updates with deployment configurations

Service discovery via AWS Cloud Map

Clusters

Logical grouping of tasks and services

Can contain EC2 instances or Fargate tasks

Use namespaces for resource isolation

ECS Best Practices


Use Fargate for simplicity, EC2 for cost optimization

Define resource limits (CPU, memory) accurately

Use task IAM roles instead of container credentials

Implement health checks in task definitions

Use service auto scaling for variable load

Enable container insights for monitoring

Use secrets management (Secrets Manager, Parameter Store)

Implement blue/green deployments for zero downtime

EKS (Elastic Kubernetes Service)

Architecture

Managed Kubernetes control plane

Deploy worker nodes as EC2 or Fargate

Integrates with AWS services (IAM, VPC, ELB)

Multi-AZ control plane for high availability

Automatic version updates and patching

Node Groups

Managed Node Groups: AWS manages EC2 instances


Automated updates and patching

Auto Scaling Group integration

One-click updates

Self-Managed Nodes: You manage EC2 instances


More control and customization

Use when specific configurations needed

Fargate Profile: Serverless pod execution


No node management

Pay per pod

Networking

Uses AWS VPC CNI plugin

Pods get IP addresses from VPC


Native AWS networking integration

Supports security groups for pods

IAM Integration

IAM roles for service accounts (IRSA)

Fine-grained permissions for pods

No need for node-level IAM credentials

Service Discovery

AWS Cloud Map for DNS-based discovery

CoreDNS for in-cluster service discovery

External DNS for external services

EKS Best Practices

Use managed node groups for easier operations

Implement cluster autoscaler for scaling nodes

Use IAM roles for service accounts (IRSA)

Deploy control plane across multiple AZs

Implement pod security policies

Use namespaces for workload isolation

Monitor with Container Insights and Prometheus

Regularly update cluster and node versions

Use AWS Load Balancer Controller for ingress

Implement network policies for pod-to-pod security

🔧 DevOps on AWS
CodePipeline
End-to-End CI/CD Orchestration

Automate release process from source to production

Visual pipeline designer in AWS Console

Integrates with AWS and third-party tools


Manual approval actions for gates

Parallel and sequential action execution

Pipeline Structure

Stages

Sequential phases (Source, Build, Test, Deploy)

Can have multiple actions per stage

Transitions between stages can be disabled

Failed stages stop pipeline execution

Actions

Individual tasks within stages

Run sequentially or in parallel

Input/output artifacts passed between actions

Action providers: CodeCommit, GitHub, Jenkins, CodeBuild, CodeDeploy

Source Stage Providers

AWS CodeCommit

GitHub/GitHub Enterprise

Amazon S3

Bitbucket Cloud

AWS ECR (Docker images)

Build Stage Providers

AWS CodeBuild

Jenkins

CloudBees

TeamCity

Deploy Stage Providers

AWS CodeDeploy

AWS Elastic Beanstalk

AWS ECS/EKS

AWS CloudFormation

AWS S3 (static websites)

Third-party deployment tools

Pipeline Execution

Triggered automatically on source changes

Manual execution via console or CLI

Scheduled execution via CloudWatch Events

Webhook triggers from external systems

Artifacts

Stored in S3 bucket

Passed between stages

Versioned automatically

Encrypted at rest

Best Practices

Use separate pipelines for environments

Implement automated testing at multiple stages

Use manual approval for production deployments

Enable artifact encryption

Monitor pipeline execution with CloudWatch

Use parameter overrides for environment-specific configs

Implement rollback mechanisms

Tag pipeline resources for cost tracking

CodeBuild
Fully Managed Build Service

Scales automatically (no build queue)

Pay per build minute

Pre-configured build environments

Custom Docker images supported


No build servers to manage

Buildspec File

YAML file defining build instructions

Named buildspec.yml at the repository root by default

Can specify alternate buildspec file

Defines build phases and commands

Build Phases

yaml

version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 16
    commands:
      - echo Installing dependencies

  pre_build:
    commands:
      - echo Running tests
      - npm test

  build:
    commands:
      - echo Building application
      - npm run build

  post_build:
    commands:
      - echo Build completed

artifacts:
  files:
    - '**/*'
  base-directory: dist

cache:
  paths:
    - 'node_modules/**/*'

Build Environment

Compute types: small (3 GB), medium (7 GB), large (15 GB)

Operating systems: Ubuntu, Amazon Linux 2, Windows Server

Pre-installed tools: Docker, Git, AWS CLI, language runtimes

Custom Docker images for specific requirements

Environment Variables

Plaintext: Defined in build project

Parameter Store: Retrieve from Systems Manager

Secrets Manager: Retrieve sensitive values

Available to build commands

Caching

Cache dependencies to S3

Speeds up subsequent builds

Specify cache paths in buildspec

Local caching for Docker layers

Build Artifacts

Output files from build

Stored in S3

Can be encrypted

Used by subsequent pipeline stages

Integration with Testing Tools

Run unit tests in pre_build phase

Integration tests in post_build

Security scanning (SAST, dependency checking)

Code coverage reports

Publish test results to CodeBuild

Best Practices

Cache dependencies to speed up builds


Use appropriate compute size for build

Implement parallel builds for faster execution

Use VPC configuration for private resource access

Enable CloudWatch Logs for debugging

Use secrets management for credentials

Implement build badges for status visibility

Set timeout to prevent runaway builds

CodeDeploy
Automated Deployment Service

Deploy to EC2, Lambda, or ECS

Multiple deployment strategies

Automatic rollback on failure

Integration with load balancers

Deployment Strategies

In-Place Deployment

Update existing instances

Application stopped, new version installed

Brief downtime during deployment

Suitable for dev/test environments

Cannot deploy to immutable infrastructure

Blue/Green Deployment

Deploy to new set of instances (Green)

Test green environment

Shift traffic from old (Blue) to new (Green)

Keep blue environment for rollback

Zero downtime

Double resources during deployment

Deployment Configurations
EC2/On-Premises

CodeDeployDefault.AllAtOnce: Deploy to all instances simultaneously

CodeDeployDefault.HalfAtATime: 50% at a time

CodeDeployDefault.OneAtATime: One instance at a time

Custom: Define percentage or count

Lambda

Canary: Small percentage, then all at once

Linear: Traffic shifted in equal increments

All-at-once: Immediate shift

ECS

Linear: Shift traffic in equal increments

Canary: Shift percentage, then remaining

All-at-once: Immediate shift

AppSpec File

Defines deployment instructions

Lifecycle event hooks

Different format for EC2, Lambda, ECS

EC2 AppSpec Example

yaml
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
  ApplicationStop:
    - location: scripts/stop_server.sh
      timeout: 300

Lifecycle Event Hooks

ApplicationStop: Stop application

DownloadBundle: Download revision

BeforeInstall: Pre-installation tasks

Install: Copy files (automatic)

AfterInstall: Post-installation tasks

ApplicationStart: Start application

ValidateService: Verify deployment

Rollback

Automatic rollback on:


Deployment failure

CloudWatch alarm threshold breach

Manual rollback via console or CLI

Redeploys last known good revision

Integration with Load Balancers

Deregister instances during deployment

Health checks validate successful deployment

Re-register instances after deployment


Works with ALB, NLB, Classic Load Balancer

Best Practices

Use blue/green for production deployments

Implement comprehensive health checks

Set appropriate timeout values

Use lifecycle hooks for validation

Enable automatic rollback on failure

Test AppSpec scripts thoroughly

Monitor deployments with CloudWatch

Use deployment groups for logical grouping

Implement gradual traffic shift for Lambda/ECS

Tag deployment groups for organization

Additional AWS DevOps Services


AWS CloudFormation

Infrastructure as Code using JSON/YAML templates

Declarative resource provisioning

Stack management (create, update, delete)

Change sets preview modifications

Drift detection identifies manual changes

Cross-stack references for dependencies

Nested stacks for modular templates

StackSets for multi-account/region deployment

AWS Systems Manager

Parameter Store

Store configuration data and secrets

Hierarchical key structure (/prod/db/password)

String, StringList, SecureString types

Free for standard parameters


Versioning and change history

IAM-based access control

Session Manager

Browser-based shell access to instances

No SSH keys or bastion hosts required

Session logging to S3 or CloudWatch

IAM-based access control

Works with on-premises servers

Automation

Pre-defined runbooks for common tasks

Custom automation documents

Scheduled or event-triggered execution

Patch management and maintenance windows

AWS CloudWatch

Metrics

Collect and track metrics from AWS services

Custom metrics from applications

1-minute or 5-minute granularity

Metric math for calculations

Detailed monitoring for EC2 (1-minute intervals)

Logs

Centralized log management

Log groups and log streams

Filter patterns for searching

Metric filters to create metrics from logs

Subscription filters for real-time processing

Export to S3 for archival

Alarms
Monitor metrics and trigger actions

States: OK, ALARM, INSUFFICIENT_DATA

Actions: SNS notifications, Auto Scaling, EC2 actions

Composite alarms for complex conditions

Dashboards

Visual representation of metrics

Multiple widgets (graphs, numbers, text)

Share across accounts

Real-time or historical data

AWS X-Ray

Distributed tracing for microservices

Request flow visualization

Identify performance bottlenecks

Analyze errors and exceptions

Service map showing dependencies

Trace sampling to control cost

Integration with Lambda, ECS, Elastic Beanstalk

🔐 Security & Compliance


IAM Best Practices
Account Security

Enable MFA for root account and privileged users

Never use root account for daily operations

Create IAM users with minimum necessary permissions

Use strong password policy

Rotate credentials regularly (90 days recommended)

Delete unused credentials and users

Roles and Permissions


Use IAM roles instead of long-term access keys

Grant least privilege - only required permissions

Use managed policies for common scenarios

Create custom policies for specific requirements

Review permissions regularly

Remove unused permissions

Service Control and Monitoring

Enable CloudTrail for all API activity logging

Use IAM Access Analyzer to identify unintended access

Monitor IAM credential usage reports

Set up alerts for suspicious activity

Use AWS Organizations for centralized control

Implement Service Control Policies (SCPs)

Application Security

Use IAM roles for EC2 instances (instance profiles)

Implement temporary credentials via STS

Use IAM roles for service accounts in EKS (IRSA)

Avoid embedding credentials in code

Use AWS SDK credential chain

Implement credential rotation for applications

Cross-Account Access

Use IAM roles for cross-account access

Implement external ID for third-party access

Require MFA for sensitive cross-account operations

Audit cross-account access regularly

Policy Management

Use policy conditions for fine-grained control

Implement permission boundaries for delegation


Test policies with IAM policy simulator

Version control policy documents

Document policy decisions and exceptions

Encryption at Rest and in Transit


Encryption at Rest

EBS Volumes

Encrypt via KMS when creating volume

Enable encryption by default for region

Encrypted volumes produce encrypted snapshots

Can copy unencrypted snapshot as encrypted

No performance impact from encryption
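
Default encryption for new volumes can be turned on per region, for example:

bash
# Enable account-level default EBS encryption in the current region
aws ec2 enable-ebs-encryption-by-default

# Verify the setting
aws ec2 get-ebs-encryption-by-default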

S3 Buckets

SSE-S3: S3-managed keys (AES-256)

SSE-KMS: KMS-managed keys (audit trail, key rotation)

SSE-C: Customer-provided keys (you manage keys)

Client-side encryption: Encrypt before upload

Enable default encryption at bucket level

Enforce encryption via bucket policy

RDS Databases

Enable encryption when creating DB instance

Cannot encrypt an existing unencrypted DB directly; copy a snapshot with encryption enabled and restore from it

Transparent Data Encryption (TDE) for Oracle/SQL Server

Encrypted DB creates encrypted snapshots

Read replicas must have same encryption status

DynamoDB

Encryption at rest enabled by default

Uses AWS owned keys or KMS customer managed keys

Encrypts tables, indexes, streams, backups


EFS (Elastic File System)

Enable encryption at rest when creating file system

Uses KMS for key management

Cannot enable after creation

AWS KMS (Key Management Service)

Create and manage encryption keys

Customer managed keys (CMK) or AWS managed keys

Automatic key rotation (once per year)

Key policies for access control

CloudTrail logging of key usage

Envelope encryption for large data

Import your own keys (BYOK) supported

Encryption in Transit

TLS/SSL

Use HTTPS for all API communications

AWS services support TLS 1.2+

Use ACM (AWS Certificate Manager) for SSL certificates

Automatic certificate renewal

Integration with CloudFront, ALB, API Gateway

VPN Connections

Site-to-Site VPN for on-premises connectivity

IPsec encryption by default

Client VPN for remote user access

Uses TLS-based VPN protocol

Application-Level Encryption

Encrypt sensitive data in application code

Use AWS Encryption SDK for client-side encryption

Implement field-level encryption for specific data


Use HTTPS for web applications

Best Practices

Enable encryption by default where available

Use KMS customer managed keys for audit requirements

Rotate encryption keys regularly

Use separate keys for different data classifications

Implement encryption in all environments (dev, prod)

Document encryption standards and key management

Test disaster recovery with encrypted data

Secrets Management
AWS Secrets Manager

Store, retrieve, and rotate secrets

Automatic rotation for RDS, Redshift, DocumentDB

Custom Lambda function for other secret types

Versioning of secrets (AWSCURRENT, AWSPREVIOUS)

Fine-grained IAM policies for access control

Encryption using KMS

Cross-region secret replication

Charged per secret and API call

Use Cases

Database credentials

API keys and tokens

SSH keys

Application configuration

Automatic Rotation

Configure rotation schedule (days)

Uses Lambda function to update secret

Simultaneous use of old and new versions during rotation


RDS integration updates database and secret atomically

Retrieving Secrets

python

import boto3

client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='prod/db/password')
secret = response['SecretString']

AWS Systems Manager Parameter Store

Store configuration data and secrets

Free tier available (standard parameters)

Advanced parameters for higher throughput

SecureString type encrypted with KMS

Hierarchical structure for organization

Versioning and change history

No automatic rotation (manual update)

Integration with CloudFormation and other services
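
A minimal put/get example (the parameter name and value below are placeholders):

bash
# Store a secret as a SecureString
aws ssm put-parameter --name /prod/db/password --type SecureString --value 'example-password'

# Retrieve and decrypt it
aws ssm get-parameter --name /prod/db/password --with-decryption \
  --query Parameter.Value --output text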

Comparison: Secrets Manager vs Parameter Store

Secrets Manager: Automatic rotation, higher cost, designed for secrets

Parameter Store: No rotation, lower cost (free tier), general configuration

Best Practices

Never hardcode secrets in source code

Use Secrets Manager for secrets requiring rotation

Use Parameter Store for non-sensitive configuration

Implement least privilege access to secrets

Enable CloudTrail logging for secret access

Use VPC endpoints for private access

Rotate secrets regularly (even without automatic rotation)

Use different secrets for different environments


Implement secret scanning in CI/CD pipelines

Document secret naming conventions

Compliance & Auditing


AWS CloudTrail

Records all AWS API calls in account

Who, what, when, where for every action

Logs delivered to S3 bucket

Optional CloudWatch Logs integration

Enable for all regions

Validate log file integrity with digest files
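
A sketch of creating a multi-region trail with log file validation (the trail and bucket names are placeholders):

bash
# Create the trail and start delivering logs to S3
aws cloudtrail create-trail \
  --name org-trail \
  --s3-bucket-name my-cloudtrail-logs \
  --is-multi-region-trail \
  --enable-log-file-validation

aws cloudtrail start-logging --name org-trail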

Key Information Captured

Identity of API caller

Time of API call

Source IP address

Request parameters

Response elements

Best Practices

Enable in all regions (multi-region trail)

Enable log file validation

Encrypt logs with KMS

Set S3 bucket lifecycle policies for retention

Use CloudWatch Logs for real-time monitoring

Create metric filters for security events

Monitor for unusual API activity

Integrate with SIEM tools

AWS Config

Track resource configuration changes over time

Evaluate compliance against rules


Configuration history and snapshots

Relationship tracking between resources

Remediation actions for non-compliant resources

Config Rules

AWS managed rules for common checks

Custom rules using Lambda functions

Evaluate on configuration change or periodic

Examples: encrypted volumes, approved AMIs, required tags

Compliance Monitoring

Continuous compliance assessment

Dashboard showing compliance status

Aggregate data across accounts (Config aggregator)

Generate compliance reports

AWS GuardDuty

Intelligent threat detection service

Analyzes CloudTrail, VPC Flow Logs, DNS logs

Machine learning for anomaly detection

Identifies compromised instances, reconnaissance

Findings prioritized by severity

Integration with Security Hub and EventBridge

Threat Detection

Unusual API calls

Unauthorized deployments

Compromised instances (cryptocurrency mining, malware)

Account compromise

Reconnaissance activity

AWS Security Hub

Centralized security findings


Aggregates from GuardDuty, Inspector, Macie, IAM Access Analyzer

Security standards compliance (CIS, PCI-DSS)

Automated remediation with EventBridge

Cross-account aggregation

Priority-based findings

AWS Compliance Programs

SOC 1, 2, 3 reports

PCI DSS compliance

HIPAA eligible services

GDPR compliance support

ISO certifications

FedRAMP authorization

Audit Best Practices

Enable CloudTrail in all accounts

Configure AWS Config for compliance monitoring

Enable GuardDuty for threat detection

Use Security Hub for centralized view

Implement automated remediation

Regular security assessments

Document compliance controls

Train teams on security best practices

Conduct periodic security reviews

Use AWS Artifact for compliance reports

Network Security
Security Groups

Stateful firewall at instance level

Allow rules only (no deny rules)

Evaluate all rules before deciding


Can reference other security groups

Separate rules for inbound and outbound

Changes take effect immediately

Best Practices

Separate security groups by tier (web, app, db)

Use descriptive names and descriptions

Reference security groups instead of CIDR when possible

Minimize use of 0.0.0.0/0 for inbound rules

Regularly audit security group rules

Remove unused security groups

Network ACLs (NACLs)

Stateless firewall at subnet level

Allow and deny rules

Rules evaluated in number order (lowest first)

Explicit deny or allow

Separate rules for inbound and outbound

Apply to all instances in subnet

NACL vs Security Groups

NACLs: Subnet level, stateless, allow/deny

Security Groups: Instance level, stateful, allow only

Use both for defense in depth

AWS WAF (Web Application Firewall)

Protect web applications from exploits

Deploy on CloudFront, ALB, API Gateway, AppSync

Customizable rules for filtering traffic

Managed rule groups from AWS and partners

Protection Against

SQL injection

Cross-site scripting (XSS)

Geo-blocking

Rate limiting

IP reputation lists

Bot control

Rule Types

IP match conditions

String match conditions

Geo match conditions

Size constraint conditions

SQL injection match

Cross-site scripting match

AWS Shield

DDoS protection service

Shield Standard: Automatic, free, layer 3/4 protection

Shield Advanced: Enhanced protection, DDoS Response Team, cost protection

Shield Advanced Features

Advanced DDoS detection and mitigation

24/7 access to DDoS Response Team (DRT)

DDoS cost protection (scaling charges waived)

Integration with WAF at no extra cost

Protection for EC2, ELB, CloudFront, Route 53, Global Accelerator

VPN and Private Connectivity

Site-to-Site VPN for on-premises connection

AWS Direct Connect for dedicated connection

VPN over Direct Connect for encrypted dedicated line

Client VPN for remote user access

VPC peering for private VPC-to-VPC


Network Security Best Practices

Implement defense in depth (SG + NACL + WAF)

Use VPC Flow Logs for traffic analysis

Enable GuardDuty for threat detection

Implement least privilege network access

Segment network with multiple subnets

Use private subnets for resources without internet access

Enable Shield Advanced for critical applications

Regular security group audits

Monitor for unusual network patterns

Use AWS Firewall Manager for centralized rule management
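
As a small automation sketch for the flow-log recommendation above, the boto3 call below enables VPC Flow Logs to CloudWatch Logs; the VPC ID, log group name, and IAM role ARN are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2")

# Sketch: publish ALL traffic (accepted and rejected) for one VPC to CloudWatch Logs.
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],          # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="cloud-watch-logs",
    LogGroupName="/vpc/flow-logs/example",          # placeholder log group
    DeliverLogsPermissionArn="arn:aws:iam::111122223333:role/example-flow-logs-role",
)
```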

Quick Reference Checklists


📋 Pre-Deployment Checklist
Code Quality

Code reviewed and approved


All tests passing (unit, integration, security)
No critical security vulnerabilities
Code coverage meets requirements
Linting and formatting checks passed
Dependencies updated and scanned

Infrastructure

Infrastructure code validated (terraform plan)


Resource limits and quotas checked
Auto-scaling policies configured
Monitoring and alerting set up
Log aggregation configured
Backup and restore tested

Security

Security groups reviewed and minimized


IAM policies follow least privilege
Secrets stored securely (no hardcoded credentials)
Encryption enabled (at rest and in transit)
SSL certificates valid and renewed
Compliance requirements met

Documentation

Deployment runbook updated


Rollback procedure documented
Architecture diagrams current
Configuration changes documented
Known issues and workarounds listed
Contact information for escalation

Communication

Stakeholders notified of deployment window


Change management ticket approved
Deployment announcement sent
On-call team informed
Rollback decision-makers identified

Validation

Smoke tests prepared


Health check endpoints verified
Performance baseline established
Disaster recovery plan tested
Monitoring dashboards ready
Success criteria defined

🔒 Security Checklist
Identity and Access

MFA enabled for all privileged accounts


Root account not used for daily operations
IAM users follow least privilege principle
Unused IAM credentials removed
Cross-account access properly configured
Service roles used instead of access keys

Data Protection
Encryption at rest enabled (EBS, S3, RDS)
Encryption in transit enforced (TLS/SSL)
KMS keys properly managed
Secrets stored in Secrets Manager or Parameter Store
No credentials in source code or logs
Data backup and retention policies configured

Network Security

Security groups follow least privilege


NACLs configured for subnet protection
VPC Flow Logs enabled
WAF rules configured for web applications
DDoS protection enabled (Shield)
Private subnets for internal resources

Monitoring and Compliance

CloudTrail enabled in all regions


CloudWatch alarms for security events
AWS Config rules for compliance
GuardDuty enabled for threat detection
Security Hub aggregating findings
Regular security audits scheduled

Application Security

Input validation implemented


SQL injection prevention in place
XSS protection configured
CSRF tokens used
Rate limiting implemented
Security headers configured

🚀 Deployment Best Practices


Version Control

Use semantic versioning (MAJOR.MINOR.PATCH)

Tag releases in Git

Maintain changelog for each release

Branch protection rules enforced


Code review required before merge

Testing Strategy

Unit tests (>80% coverage)

Integration tests for critical paths

Security scanning (SAST/DAST)

Performance testing under load

Chaos engineering for resilience

Deployment Strategies

Blue/Green: Zero downtime, easy rollback, double cost temporarily

Canary: Gradual rollout, risk mitigation, requires monitoring

Rolling: Incremental updates, maintains capacity, slower deployment

Rollback Plan

Keep previous version artifacts

Database migration rollback scripts

Quick rollback procedure documented

Monitoring for rollback triggers

Communication plan for failures

Post-Deployment

Monitor error rates and latency

Verify all features working

Check logs for anomalies

Validate database migrations

Confirm backup success

Update documentation

🎯 Common Interview Topics


Technical Deep Dives
CI/CD Pipeline Design
Explain stages: Source → Build → Test → Deploy

Discuss artifact management between stages

Describe automated testing strategy

Explain deployment strategies (blue/green, canary, rolling)

Detail rollback mechanisms

Blue-Green vs Rolling Deployment

Blue/Green: Deploy to new environment, switch traffic, instant rollback

Rolling: Update instances incrementally, maintains capacity, gradual rollout

Trade-offs: Cost, downtime, rollback speed, complexity

Terraform State Management

State file tracks infrastructure

Remote backend (S3 + DynamoDB) for collaboration

State locking prevents concurrent modifications

Sensitive data in state requires encryption

State file should never be manually edited

Docker Multi-Stage Builds

Separate build and runtime stages

Reduces final image size

Only necessary files in production image

Improves security and performance

Example: Build stage with dev dependencies, runtime stage with only production files

Kubernetes Scaling Strategies

Horizontal Pod Autoscaler (HPA): Scale pods based on CPU/memory/custom metrics

Vertical Pod Autoscaler (VPA): Adjust pod resource requests/limits

Cluster Autoscaler: Scale worker nodes based on pending pods

Manual Scaling: kubectl scale for testing or planned events

IAM Policy Evaluation

Explicit Deny wins (immediate rejection)


Explicit Allow required from at least one policy

Implicit Deny if no explicit allow

Evaluation order: SCPs → Permission Boundaries → Identity/Resource Policies
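
One hedged way to sanity-check that evaluation logic in practice is the IAM policy simulator; the boto3 sketch below assumes a placeholder role ARN and action list.

```python
import boto3

iam = boto3.client("iam")

# Sketch: simulate how IAM would evaluate two actions for a given principal.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::111122223333:role/example-deploy-role",
    ActionNames=["s3:GetObject", "s3:DeleteBucket"],
)

for result in response["EvaluationResults"]:
    # EvalDecision is "allowed", "explicitDeny", or "implicitDeny".
    print(result["EvalActionName"], "->", result["EvalDecision"])
```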

VPC Design Patterns

Multi-tier architecture (public, private, database subnets)

Multi-AZ deployment for high availability

NAT Gateway per AZ for redundancy

VPC endpoints for AWS service access

Network segmentation with security groups and NACLs

RDS High Availability

Multi-AZ for automatic failover (1-2 minutes)

Read replicas for read scaling (not for HA)

Automated backups with point-in-time recovery

Manual snapshots before major changes

Cross-region replication for disaster recovery
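
As a hedged operational sketch, the boto3 calls below convert an existing instance to Multi-AZ and confirm the pending change; the instance identifier is a placeholder, and deferring to the maintenance window avoids an unplanned I/O pause.

```python
import boto3

rds = boto3.client("rds")

# Sketch: enable Multi-AZ on an existing instance during the next maintenance window.
rds.modify_db_instance(
    DBInstanceIdentifier="example-prod-db",   # placeholder identifier
    MultiAZ=True,
    ApplyImmediately=False,
)

instance = rds.describe_db_instances(DBInstanceIdentifier="example-prod-db")["DBInstances"][0]
print("MultiAZ:", instance["MultiAZ"])
print("Pending changes:", instance["PendingModifiedValues"])
```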

Scenario-Based Questions
"How would you design a highly available web application?"

Multi-AZ deployment across at least 2 AZs

Application Load Balancer distributing traffic

Auto Scaling Group maintaining instance count

RDS Multi-AZ for database high availability

ElastiCache for session management and caching

S3 for static content with CloudFront CDN

Route 53 for DNS with health checks

CloudWatch monitoring and alarms

"Explain your approach to zero-downtime deployments"

Use blue/green or rolling deployment strategy

Implement health checks at load balancer


Connection draining during instance replacement

Database migrations backward compatible

Feature flags for gradual feature rollout

Comprehensive monitoring and automated rollback

Canary deployments to test with small traffic percentage

"How do you secure secrets in CI/CD pipelines?"

Store in AWS Secrets Manager or Parameter Store

Use IAM roles for service access (no hardcoded credentials)

Encrypt secrets at rest with KMS

Rotate secrets regularly

Audit secret access with CloudTrail

Use environment-specific secrets

Never commit secrets to version control

Implement secret scanning in pipelines
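
A hedged sketch of the first two points: fetch a credential from Secrets Manager at deploy time under an IAM role rather than hardcoding it. The secret name and JSON keys are placeholder assumptions.

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

# Sketch: the pipeline role's IAM policy grants secretsmanager:GetSecretValue
# on this specific secret only.
secret = secrets.get_secret_value(SecretId="example/prod/db-credentials")
credentials = json.loads(secret["SecretString"])

db_user = credentials["username"]
db_password = credentials["password"]  # never echo or log this value
```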

"Describe troubleshooting a production incident"

Check monitoring dashboards and alarms

Review recent deployments and changes

Analyze application and system logs

Check CloudWatch metrics for anomalies

Verify infrastructure health (EC2, RDS, load balancers)

Use distributed tracing (X-Ray) for request flow

Check VPC Flow Logs for network issues

Escalate to appropriate teams if needed

Document findings and resolution

"How would you optimize AWS costs?"

Right-size instances using Compute Optimizer

Use Reserved Instances or Savings Plans

Implement auto-scaling to match demand

S3 lifecycle policies to transition objects to cheaper storage classes


Delete unused resources (EBS volumes, snapshots)

Use Spot Instances for fault-tolerant workloads

CloudWatch to identify underutilized resources

Cost allocation tags for visibility

Scheduled scaling for predictable patterns
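
For the unused-resources point, a quick audit sketch like the one below lists unattached EBS volumes; reviewing before deletion is assumed, since "available" volumes may still hold needed data.

```python
import boto3

ec2 = boto3.client("ec2")

# Sketch: find EBS volumes not attached to any instance ("available" status).
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
    for volume in page["Volumes"]:
        print(volume["VolumeId"], f"{volume['Size']} GiB", "created", volume["CreateTime"])
```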

📚 Key Concepts Summary


Infrastructure as Code
Terraform for multi-cloud provisioning

CloudFormation for AWS-native IaC

State management critical for team collaboration

Modules for reusable infrastructure components

Version control all infrastructure code

Container Orchestration
Docker for containerization and portability

Kubernetes for complex orchestration needs

ECS/Fargate for AWS-native container services

Helm for Kubernetes package management

Health checks and resource limits essential

CI/CD Principles
Automate everything (build, test, deploy)

Fast feedback loops for developers

Automated testing at multiple stages

Deployment strategies for risk mitigation

Monitoring and observability built-in

AWS Security
Least privilege IAM policies

Encryption everywhere (at rest and in transit)


Secrets management with dedicated services

Network isolation with VPCs and security groups

Continuous monitoring and compliance

High Availability
Multi-AZ deployments for redundancy

Auto Scaling for capacity management

Load balancing for traffic distribution

Database replication and backups

Disaster recovery planning and testing

Monitoring and Observability


CloudWatch for metrics and logs

X-Ray for distributed tracing

Custom metrics for business KPIs

Alarms for proactive issue detection

Dashboards for visibility
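
As a hedged example of a proactive alarm, the boto3 sketch below alerts on ALB-generated 5XX errors; the load balancer dimension value, thresholds, and SNS topic ARN are placeholder assumptions.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Sketch: notify the on-call topic when the ALB returns elevated 5XX errors.
cloudwatch.put_metric_alarm(
    AlarmName="example-alb-5xx-errors",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_ELB_5XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/example-alb/0123456789abcdef"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:example-oncall-topic"],
)
```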

💡 Pro Tips for Interviews


Technical Communication

Start with high-level overview, then dive into details

Use diagrams when explaining architecture

Mention trade-offs for design decisions

Relate answers to real-world scenarios

Ask clarifying questions before answering

Demonstrating Experience

Share specific examples from past projects

Explain challenges faced and how you solved them

Discuss lessons learned from failures

Mention tools and technologies you've used


Show understanding of best practices

Problem-Solving Approach

Clarify requirements and constraints

Consider multiple solutions

Evaluate pros and cons

Recommend solution with justification

Discuss implementation steps

AWS-Specific Tips

Know the differences between similar services

Understand pricing models and cost optimization

Be familiar with AWS Well-Architected Framework

Stay updated on new AWS services and features

Mention AWS documentation and best practices

Common Mistake Patterns to Avoid

Don't just list features; explain use cases

Don't ignore security considerations

Don't forget monitoring and observability

Don't overlook cost implications

Don't skip disaster recovery planning

🔧 Troubleshooting Guide
EC2 Instance Issues
Cannot Connect via SSH

Check security group allows port 22 from your IP

Verify instance has public IP (if accessing from internet)

Confirm key pair is correct

Check NACL rules allow SSH traffic

Verify instance is in running state


Check route table for internet gateway route

High CPU Utilization

Check CloudWatch metrics for spike patterns

Review running processes (top, htop)

Analyze application logs for errors

Consider instance type upgrade

Implement auto-scaling if variable load

Check for resource-intensive queries or operations

Instance Status Checks Failing

System status check: AWS infrastructure issue (stop/start instance)

Instance status check: OS or instance issue (reboot or investigate)

Review system logs via console

Check for kernel panics or system errors

Container Issues
Docker Container Won't Start

Check container logs (docker logs)

Verify image exists and is accessible

Check port conflicts with host

Verify sufficient disk space

Review Dockerfile for errors

Check resource limits (memory, CPU)

Kubernetes Pod CrashLoopBackOff

Describe pod for error messages (kubectl describe pod)

Check pod logs (kubectl logs)

Verify image pull secrets for private registries

Check resource requests and limits

Verify ConfigMaps and Secrets exist

Review liveness and readiness probes


ECS Task Failing to Start

Check task definition for errors

Verify IAM task role has required permissions

Check security group allows required ports

Review CloudWatch logs for errors

Verify sufficient resources in cluster

Check ECR repository permissions

Database Issues
RDS Connection Timeouts

Verify security group allows port 3306 (MySQL) or 5432 (PostgreSQL)

Check NACL rules

Confirm instance is available

Verify endpoint and port are correct

Check VPC routing tables

Ensure client is in correct VPC/subnet

RDS High CPU or IOPS

Identify slow queries with Performance Insights

Review missing indexes

Check for lock contention

Consider read replicas for read-heavy workloads

Optimize queries and database schema

Scale up instance class if needed

Database Backup Failures

Check backup window configuration

Verify sufficient storage for backups

Review IAM permissions for RDS

Check automated backup retention period

Review RDS events for error messages


Network Issues
Cannot Reach Internet from Private Subnet

Verify NAT Gateway exists in public subnet

Check route table has 0.0.0.0/0 → NAT Gateway

Confirm NAT Gateway has Elastic IP

Verify security groups allow outbound traffic

Check NACL rules allow return traffic
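
A hedged troubleshooting sketch for the route-table check: list the route table associated with the private subnet and confirm its default route points at a NAT gateway. The subnet ID is a placeholder assumption.

```python
import boto3

ec2 = boto3.client("ec2")

subnet_id = "subnet-0123456789abcdef0"  # placeholder private subnet
tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
)["RouteTables"]

# Note: an empty result usually means the subnet falls back to the VPC's main route table.
for table in tables:
    for route in table["Routes"]:
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            target = route.get("NatGatewayId") or route.get("GatewayId", "no target")
            print(table["RouteTableId"], "0.0.0.0/0 ->", target)
```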

ALB Returns 502 Bad Gateway

Check target group health checks

Verify targets are healthy and registered

Confirm security groups allow traffic

Review target application logs

Check for connection timeouts

Verify target group protocol and port
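
For the first two checks, a hedged boto3 sketch like the following prints each target's health state and the load balancer's reason code; the target group ARN is a placeholder assumption.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Sketch: inspect target health when an ALB starts returning 502s.
target_group_arn = (
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
    "targetgroup/example-web/0123456789abcdef"
)

health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
for desc in health["TargetHealthDescriptions"]:
    state = desc["TargetHealth"]
    print(desc["Target"]["Id"], state["State"], state.get("Reason", ""), state.get("Description", ""))
```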

VPC Peering Not Working

Verify peering connection is active

Check route tables in both VPCs

Confirm CIDR blocks don't overlap

Verify security groups allow traffic

Check NACLs in both VPCs

CI/CD Pipeline Issues


CodeBuild Failing

Review build logs in CodeBuild console

Check buildspec.yml syntax

Verify IAM permissions for CodeBuild role

Check environment variables and secrets

Verify sufficient build timeout

Review VPC configuration if accessing private resources


CodeDeploy Deployment Failing

Review deployment logs

Check AppSpec file syntax

Verify IAM permissions for CodeDeploy

Check instance tags match deployment group

Review lifecycle hook scripts for errors

Verify CodeDeploy agent is running on instances

Terraform Apply Failing

Review error message carefully

Check for resource conflicts or duplicates

Verify AWS credentials and permissions

Ensure state file is not locked

Review terraform plan output

Check for API rate limiting

📖 Additional Resources
Official Documentation
AWS Documentation: https://docs.aws.amazon.com

Terraform Documentation: https://www.terraform.io/docs

Kubernetes Documentation: https://kubernetes.io/docs

Docker Documentation: https://docs.docker.com

AWS Training and Certification


AWS Skill Builder: Free digital training

AWS Solutions Architect Associate

AWS DevOps Engineer Professional

AWS Certified Developer Associate

Best Practice Frameworks


AWS Well-Architected Framework
12-Factor App Methodology

CNCF Cloud Native Principles

DevOps Research and Assessment (DORA) Metrics

Community Resources
AWS re:Post (community Q&A)

Stack Overflow

GitHub repositories and examples

AWS Blog and What's New feed

Continuous Learning
Stay updated with AWS announcements

Participate in AWS events and webinars

Practice with hands-on labs

Build personal projects

Contribute to open-source projects

📝 Final Notes
Interview Preparation Strategy
1. Week 1-2: Review core concepts and services

2. Week 3-4: Hands-on practice with AWS console and CLI

3. Week 5-6: Build sample projects demonstrating skills

4. Week 7: Mock interviews and scenario practice

5. Week 8: Review weak areas and final preparation

During the Interview


Listen carefully to questions

Think before answering

Admit if you don't know something

Show problem-solving approach

Ask questions about the role and team


Be enthusiastic and genuine

After the Interview


Send thank-you email within 24 hours

Reflect on questions you struggled with

Continue learning and improving

Follow up appropriately on timeline

Key Success Factors


Strong fundamentals: Understanding core concepts deeply

Hands-on experience: Practical application of knowledge

Problem-solving skills: Ability to troubleshoot and debug

Communication: Clearly explaining technical concepts

Continuous learning: Staying current with technologies

🎓 Conclusion
This guide covers the essential topics for AWS DevOps interviews and serves as a daily reference for common
tasks and best practices. Remember that interviews assess not just technical knowledge but also:

Problem-solving approach

Communication skills

Collaborative mindset

Learning agility

Cultural fit

Key Takeaways:

1. Master the fundamentals before diving into advanced topics

2. Practice hands-on with real AWS services

3. Understand the "why" behind best practices

4. Be prepared to discuss trade-offs in design decisions

5. Share real-world experiences and lessons learned

Good luck with your interviews and DevOps journey!


Document Version: 1.0
Last Updated: November 2025
For personal use and interview preparation
