MODULE 5 CC TSNOTES
Module 5
Part 1 - Cloud Platforms in Industry: Amazon Web Services: Compute services, Storage services, Communication services, Additional services.
Google AppEngine: Architecture and core concepts, Application life cycle, Cost model, Observations.
Part 2 - Cloud Applications: Scientific applications: Healthcare: ECG analysis in the cloud; Biology: gene expression data analysis for cancer
diagnosis; Geoscience: satellite image processing. Business and consumer applications: CRM and ERP, Social networking, Media applications.
9.1 Amazon Web Services
Amazon Web Services (AWS) is a comprehensive cloud platform that enables
the development of flexible and scalable applications by providing elastic infrastructure scalability,
messaging capabilities, and data storage solutions.
AWS is accessible through SOAP or RESTful Web Service interfaces and offers a web-based console
for:
● Managing administration and monitoring of resources
● Tracking expenses on a pay-as-you-go basis
1. Compute Services:
• Amazon EC2 (Elastic Compute Cloud):
■ Type: Infrastructure as a Service (IaaS).
■ Functionality: Enables deployment of virtual servers (instances) based on predefined
Amazon Machine Images (AMIs).
■ Instance Types: Standard, micro, high-memory, high-CPU, and cluster instances to
cater to various workloads.
■ Pricing: Hourly rates; includes cost-efficient spot instances with dynamic pricing
based on demand.
2. Storage Solutions:
• Amazon S3 (Simple Storage Service):
■ Functionality: Provides scalable, durable, and highly available object storage for
storing and retrieving data.
• Amazon Elastic Block Store (EBS): Persistent block storage for use with EC2 instances.
• Amazon Glacier: Archival storage for long-term data retention.
3. Database and Caching Services:
• Amazon RDS (Relational Database Service):
■ Fully managed database supporting engines like MySQL, PostgreSQL, Oracle, and
more.
• Amazon ElastiCache:
■ Managed in-memory caching service to accelerate application performance.
Other Services
● AWS Lambda: Serverless computing for running code without provisioning or managing servers.
● AWS IoT: Solutions for connecting and managing IoT devices.
● AWS Machine Learning and AI: Prebuilt tools for building AI and ML applications.
Key Benefits
Compute services are foundational to cloud computing, and AWS's Amazon EC2 (Elastic Compute
Cloud) serves as a cornerstone by providing an Infrastructure as a Service (IaaS) model.
AMIs are templates used to create EC2 instances. They define the operating system, application stack,
and other configurations necessary for launching a virtual machine.
Features of AMIs
1. Storage Location: AMIs are stored in Amazon S3 and identified by a unique ID (ami-xxxxxx)
along with a manifest XML file.
2. Components:
○ Amazon Ramdisk Image (ARI): Specifies the RAM disk configuration (ari-yyyyyy).
○ Amazon Kernel Image (AKI): Defines the kernel settings (aki-zzzzzz).
3. Creation and Customization:
○ AMIs can be created from scratch or bundled from an existing EC2 instance.
○ Common practice involves:
■ Launching an instance from a base AMI.
■ Installing required software and making configuration changes.
■ Saving the instance as a new AMI using AWS tools.
○ Once created, the AMI is stored in an S3 bucket.
Amazon Elastic Compute Cloud (Amazon EC2) is a web-based service that provides scalable,
resizable compute capacity in the cloud. It enables users to launch and manage virtual servers,
known as instances, with complete control over their configuration, including operating systems,
software, CPU, memory, storage, and networking.
Instance Configuration
1. Performance Metrics:
○ Instances use EC2 Compute Units (ECUs) to quantify CPU performance.
○ ECUs provide a consistent measure of compute power, equivalent to a 1.0–1.2 GHz 2007
Opteron or Xeon processor.
○ This abstraction ensures consistent performance even as AWS upgrades its hardware.
2. Instance Classes:
○ Standard Instances: General-purpose configurations for most applications, with varying
levels of compute, storage, and memory.
○ Micro Instances: Designed for lightweight applications needing low compute power with
occasional bursts. Suitable for small web applications with limited traffic.
○ High-Memory Instances: Focused on memory-intensive applications like high-traffic web
services. These instances offer increasing memory proportional to CPU power.
○ High-CPU Instances: Optimized for compute-heavy workloads, providing higher compute
power relative to memory.
○ Cluster Compute Instances: Provide high compute power, memory, and extremely high
network and I/O performance for HPC (High-Performance Computing).
○ Cluster GPU Instances: Feature GPUs for tasks requiring heavy graphics computations,
such as rendering or scientific simulations.
Pricing Models
1. On-Demand Instances:
○ Charged hourly at a fixed rate depending on the instance type.
○ Billing begins at the start of each hour of usage, and users are charged for the entire hour.
2. Spot Instances:
○ Pricing is dynamic and based on available capacity.
○ Users specify a maximum price they are willing to pay. The instance runs as long as the
spot price is below this threshold.
○ Cost-effective for workloads tolerant to interruptions.
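The two pricing models above can be sketched in a few lines of Python. This is only an illustration: the hourly rate, the bid, and the spot price series are hypothetical values, and the function names are not part of any AWS API. The spot function assumes the instance keeps running while the price is at or below the bid.

```python
import math
from typing import Sequence


def on_demand_cost(hours_used: float, hourly_rate: float) -> float:
    """Cost of an on-demand instance when every started hour is billed in full."""
    billed_hours = math.ceil(hours_used)  # 2.25 hours of usage is billed as 3 hours
    return billed_hours * hourly_rate


def spot_hours_running(spot_prices: Sequence[float], max_bid: float) -> int:
    """Count the hourly slots an instance runs before its first interruption.

    Assumption: the instance runs while the spot price is at or below the
    user's maximum bid and is interrupted as soon as the price exceeds it.
    """
    hours = 0
    for price in spot_prices:
        if price > max_bid:
            break
        hours += 1
    return hours


# Hypothetical numbers, for illustration only.
print(on_demand_cost(hours_used=2.25, hourly_rate=0.10))
print(spot_hours_running([0.03, 0.04, 0.06, 0.02], max_bid=0.05))
```

The second call shows why spot instances suit interruption-tolerant workloads: the price spike in the third hour stops the instance even though the price later drops again.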
Storage and Persistence
1. Ephemeral Storage:
○ The default storage attached to an instance is temporary and data is lost when the
instance shuts down.
2. Elastic Block Store (EBS):
○ Provides persistent block storage that retains data even after instance termination.
○ EBS volumes can be attached to instances for additional storage needs.
Flexibility
● Users can override the default AKI and ARI configurations for specialized needs.
● Instances can be managed through:
○ The AWS Console, which provides a user-friendly interface for managing resources.
○ Command-Line Tools, enabling scripting and automation.
Advantages of EC2
1. Elasticity: Resources scale dynamically to meet workload demands, optimizing cost and
performance.
2. Cost-Effectiveness: Pay-as-you-go pricing eliminates the need for upfront hardware investments.
3. Flexibility: Wide variety of instance types and configurations for different use cases.
4. Consistency: ECUs ensure uniform performance regardless of underlying hardware upgrades.
5. Global Accessibility: Available across multiple regions and zones, ensuring low latency and
redundancy.
EC2 empowers users with scalable compute capacity tailored to applications ranging from simple web
hosting to complex HPC and big data processing.
The EC2 environment forms the virtual infrastructure within which Amazon EC2 instances operate. It
provides critical functionalities, including resource allocation, networking, and security, to support the
hosting of applications in a scalable and controlled manner.
Amazon Web Services (AWS) extends its compute capabilities beyond basic EC2 instances and AMIs to
include advanced services that streamline application deployment, management, and data processing.
1. AWS CloudFormation
● Purpose: Simplifies complex deployments by introducing templates (JSON files) that describe
the resources and relationships needed to run applications.
● Key Features:
○ Links EC2 instances and defines their dependencies.
○ Integrates seamlessly with other AWS services (e.g., S3, SimpleDB, SQS, SNS, Route
53, Elastic Beanstalk).
○ Provides a declarative way to build and manage cloud infrastructure.
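As a rough illustration of the declarative approach described above, the following Python snippet builds a small CloudFormation-style JSON template with two EC2 instances, one depending on the other. The resource names (WebServer, DatabaseServer), the instance type, and the placeholder AMI ID are assumptions chosen for the example, not values from these notes.

```python
import json


def build_template(ami_id: str, instance_type: str) -> dict:
    """Return a minimal template describing two linked EC2 instances."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Two linked EC2 instances (illustrative sketch)",
        "Resources": {
            # Hypothetical logical names; any unique names would do.
            "DatabaseServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {"ImageId": ami_id, "InstanceType": instance_type},
            },
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                # Declares that the web server is created after the database.
                "DependsOn": "DatabaseServer",
                "Properties": {"ImageId": ami_id, "InstanceType": instance_type},
            },
        },
    }


# "ami-xxxxxx" is a placeholder ID, not a real image.
template = build_template("ami-xxxxxx", "m1.small")
template_json = json.dumps(template, indent=2)
print(template_json)
```

The template only describes the desired resources and their dependency; the order of creation is left to the service.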
2. Amazon Elastic MapReduce
● Purpose: A scalable cloud platform for running MapReduce applications using Hadoop.
● Key Features:
○ Uses EC2 instances for compute and Amazon S3 for storage.
○ Supports Hadoop-related tools like Pig and Hive.
○ Enables dynamic cluster resizing and selection of EC2 configurations (e.g., Small, High-
Memory, High-CPU, Cluster Compute, Cluster GPU).
○ Offers pre-built web applications for quick data-intensive operations.
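To make the MapReduce model concrete, here is a minimal single-process word count in Python. It is a sketch of the programming model only: it does not use Hadoop or Elastic MapReduce, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the cloud scales", "the cloud stores data"]))
```

In a real cluster the map and reduce calls run in parallel on different EC2 instances, with input and output typically kept in S3.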
Amazon S3 (Simple Storage Service) is a scalable, durable, and highly available object storage solution
offered by AWS. It allows users to store and retrieve data efficiently, making it a core component of
AWS's storage ecosystem.
9.1.2.1 S3 Key Concepts
Storage Hierarchy and Buckets: S3 organizes its storage into buckets, which cannot be subdivided
further. This structure resembles a flat system, unlike traditional file systems where directories can be
created. However, users can simulate directories by including slashes ("/") in object names.
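The directory simulation can be sketched locally: given a flat list of object names, grouping by a prefix and a delimiter yields something that looks like a folder listing. This is a pure-Python illustration with hypothetical object names; it does not call S3.

```python
from typing import Iterable, List, Tuple


def list_folder(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (objects, subfolders) that sit directly under the given prefix."""
    objects: List[str] = []
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a subfolder name.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


# Hypothetical object names stored in one flat bucket.
bucket_keys = [
    "photos/2023/a.jpg",
    "photos/2023/b.jpg",
    "photos/readme.txt",
    "notes.txt",
]
print(list_folder(bucket_keys))
print(list_folder(bucket_keys, prefix="photos/"))
```

The bucket itself stays flat; the hierarchy exists only in how the names are interpreted.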
Immutability of Objects: Once objects are uploaded to S3, they cannot be modified, renamed, or moved.
If changes are needed, the object must be deleted and uploaded again. This reflects S3's focus on
immutability and efficient storage.
Eventual Consistency: S3 is designed to be eventually consistent. This means updates or deletions might
not be immediately visible globally, leading to temporary inconsistencies across different regions.
Request Failures: Due to the large-scale distributed nature of S3, requests may occasionally fail. This is
not a persistent failure but a result of the infrastructure's complexity.
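Because such failures are transient, clients commonly retry the request after a short wait. The sketch below shows one possible retry loop with exponential backoff and jitter. TransientError, make_flaky, and the default delays are assumptions made for the example; a real client would catch whatever error its HTTP library raises.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a temporary request failure (hypothetical)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            delay = base_delay * (2 ** (attempt - 1))
            # Random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, delay))


def make_flaky(failures_before_success):
    """Build a test operation that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated transient failure")
        return "ok"

    operation.state = state
    return operation


flaky = make_flaky(2)
print(call_with_retries(flaky, sleep=lambda seconds: None), flaky.state["calls"])
```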
RESTful Web Services: S3 interacts with clients through RESTful web services, meaning all operations
are performed via HTTP requests (GET, PUT, DELETE, HEAD, POST). Each operation corresponds to
actions like adding, retrieving, or removing data.
● Buckets are accessed using URIs, and there are three naming conventions for buckets:
○ Canonical form: http://s3.amazonaws.com/bucket_name/ — Less restrictive with naming.
○ Subdomain form: http://bucketname.s3.amazonaws.com/ — Preferred form with more strict
naming rules.
○ Virtual hosting form: http://bucket-name.com/ — Custom URLs for buckets via DNS
configuration.
Object Naming:
● Objects in a bucket are accessed as part of a URI, and can be represented in any of the above three
forms, depending on the chosen bucket addressing style. The object names are treated as part of the
resource path after the bucket reference.
● Access control lists (ACLs) and other metadata like server logging for buckets can be appended to
the object’s URI, but metadata is not directly accessible via a URI. For example:
○ ACL: http://s3.amazonaws.com/bucket_name/object_name?acl
○ Server logging: http://s3.amazonaws.com/bucket_name?logging
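The addressing forms above can be reproduced with a small helper. This sketch only builds strings; the function names and the style labels ("canonical", "subdomain", "virtual") are made up for the example.

```python
from typing import Optional
from urllib.parse import quote


def bucket_uri(bucket: str, style: str = "canonical") -> str:
    """Build the URI of a bucket in one of the three addressing forms."""
    if style == "canonical":
        return f"http://s3.amazonaws.com/{bucket}/"
    if style == "subdomain":
        return f"http://{bucket}.s3.amazonaws.com/"
    if style == "virtual":
        # Here the bucket name is itself a DNS name mapped to S3.
        return f"http://{bucket}/"
    raise ValueError(f"unknown addressing style: {style}")


def object_uri(bucket: str, key: str, style: str = "canonical",
               subresource: Optional[str] = None) -> str:
    """Append the object name, and optionally a subresource such as 'acl'."""
    uri = bucket_uri(bucket, style) + quote(key)
    if subresource:
        uri += "?" + subresource
    return uri


print(object_uri("bucket_name", "object_name", subresource="acl"))
print(bucket_uri("bucketname", style="subdomain"))
print(bucket_uri("bucket_name") [:-1] + "?logging")
```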
1. Buckets in S3:
● Bucket Overview: Buckets are the top-level containers for objects in S3. They cannot be nested,
meaning there are no "subbuckets" or physical directories within them. Each bucket exists in a
specific geographic region and is eventually replicated for fault tolerance and efficient content
distribution.
● Bucket Creation and Deletion: A bucket is created with a PUT request, specifying its name and
optionally its preferred geographic location. Once created, the bucket cannot be renamed or
relocated—if you need to change a bucket's name or location, you must delete it and create a
new one. A bucket can only be deleted if it is empty.
● Object Storage: All objects stored in the bucket must be kept within the same availability zone of
the bucket.
2. Objects and Metadata:
● Object Creation: An object is created in S3 via a PUT request, specifying the object’s name,
content, and properties. The maximum size of an object uploaded in a single PUT request is 5 GB.
● Immutability: Once an object is uploaded, it cannot be modified, renamed, or moved to another
bucket. To change an object, you must delete the original and upload a new one.
● Object Metadata: Metadata provides additional information about an object and can be system-
defined (e.g., file type) or user-defined (custom tags). Metadata is passed during the object
creation process and can be retrieved with GET or HEAD requests.
3. Access Control and Security:
● Access Control Policies (ACPs): S3 uses Access Control Policies (ACPs) to manage who can
access a bucket or object. These policies are written in XML format and grant permissions to
specific users or groups. Permissions include:
○ READ: Retrieve an object and its metadata.
○ WRITE: Add, modify, or delete an object.
○ READ_ACP: View the ACP of a resource.
○ WRITE_ACP: Modify the ACP of a resource.
○ FULL_CONTROL: All of the above permissions.
● Grantees: Users can be granted permissions by using their canonical IDs or email addresses. S3
also provides predefined groups like all users, authenticated users, and log delivery users.
● Default Access Control: By default, the owner has full control over a resource, and this can be
modified later.
● Signed URLs: For fine-grained control, especially for non-authenticated users, S3 supports
signed URLs, which provide temporary access to a resource for a limited time.
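The idea behind a signed URL can be sketched with a keyed hash: the owner signs the resource address together with an expiry time, and the server later recomputes the signature and checks the clock. The code below is a simplified illustration of that idea, not the actual AWS signing algorithm; the parameter names and the choice of HMAC-SHA256 are assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(url: str, expires_at: int, secret_key: bytes) -> str:
    message = f"{url}\n{expires_at}".encode()
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def sign_url(url: str, secret_key: bytes, expires_at: int) -> str:
    """Return the URL with an expiry timestamp and a signature appended."""
    query = urlencode({"Expires": expires_at, "Signature": _signature(url, expires_at, secret_key)})
    return f"{url}?{query}"


def verify_url(signed_url: str, secret_key: bytes, now=None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["Expires"][0])
        signature = params["Signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires_at, secret_key)
    return hmac.compare_digest(signature, expected) and now < expires_at


# Hypothetical key and expiry time, for illustration only.
signed = sign_url("http://s3.amazonaws.com/bucket_name/object_name", b"owner-secret", expires_at=1000)
print(verify_url(signed, b"owner-secret", now=999), verify_url(signed, b"owner-secret", now=1001))
```

Anyone holding the signed URL can use it until it expires, without having an account, because the signature itself proves the owner authorized the access.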
4. Advanced Features:
● Server Access Logging: S3 allows logging of all access requests to a bucket and its objects. This
can be enabled by issuing a PUT request with a logging configuration, including the target bucket
for logs and a prefix for log file names.
● BitTorrent Integration: S3 can expose objects to the BitTorrent network, allowing them to be
downloaded via the BitTorrent protocol. To enable this, you append ?torrent to the object URI,
and the object’s ACP must grant read permissions to everyone.
EBS provides persistent storage for EC2 instances, offering block-level volumes that can be mounted
when the EC2 instance starts. These volumes can be formatted according to the needs of the instance
(e.g., raw storage, file systems).
● Persistent Storage for EC2: EBS provides block-level storage volumes that can be attached to
EC2 instances. The volumes persist even after the instance lifecycle and are backed by Amazon
S3.
● Volume Types and Features:
○ Volumes can be up to 1 TB.
○ Volumes are usually in the same availability zone as the EC2 instance to optimize
performance but can also be in different zones.
○ Volumes can be lazily loaded, reducing network I/O requests.
○ Can clone volumes, use them as boot partitions, or resize them (if the file system allows).
● Pricing: Costs depend on storage size and the number of I/O requests:
○ $0.10/GB/month for storage.
○ $0.10 per 1 million I/O requests.
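A worked example of this pricing, using the rates quoted above as defaults. The volume size and request count are hypothetical, and the function name is illustrative.

```python
def ebs_monthly_cost(size_gb: float, io_requests: int,
                     storage_rate: float = 0.10,
                     io_rate_per_million: float = 0.10) -> float:
    """Monthly EBS cost = storage charge + I/O request charge."""
    storage_cost = size_gb * storage_rate
    io_cost = (io_requests / 1_000_000) * io_rate_per_million
    return storage_cost + io_cost


# Hypothetical workload: a 100 GB volume serving 5 million I/O requests in a month.
print(round(ebs_monthly_cost(100, 5_000_000), 2))
```

For this workload the storage charge (100 GB at $0.10) dominates the I/O charge (5 million requests at $0.10 per million).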
Amazon provides three structured storage services to support enterprise applications: Preconfigured
EC2 AMIs, Amazon RDS, and Amazon SimpleDB. Here’s a concise breakdown of each solution:
A. Preconfigured EC2 AMIs
Preconfigured EC2 AMIs (Amazon Machine Images) are predefined templates with installations of
specific database management systems. These AMIs are designed to simplify the process of setting up
EC2 instances that require a particular database. Users can combine these AMIs with an EBS (Elastic
Block Store) volume for storage persistence. Some of the available AMIs include:
● IBM DB2, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Sybase, Vertica
Instances created from these AMIs are priced based on the EC2 hourly cost model. However, using
these AMIs places most of the administrative burden on the EC2 user. The user is responsible for
configuring, maintaining, and managing the relational database.
B. Amazon RDS
Amazon RDS is a managed relational database service that runs on EC2 infrastructure. Unlike
preconfigured EC2 AMIs, RDS simplifies database management by handling several
administrative tasks such as high availability, patch management, and failover strategies.
RDS is available for MySQL and Oracle databases, and users can choose between on-demand or
reserved instances for pricing. Reserved instances offer discounted hourly rates for long-term
commitments (one to three years).
C. Amazon SimpleDB
Amazon SimpleDB is a lightweight, scalable, and flexible data storage solution for applications that do
not require a full relational database. It is designed to handle semi-structured data using the concepts
of domains, items, and attributes. Some key characteristics of SimpleDB include:
● Domains: Similar to tables in relational databases but with less structure. Each domain can store
up to 10 GB of data.
● Items and Attributes: Items are records with attributes stored as key-value pairs. Items do not
have to adhere to the same column structure, allowing flexibility in data storage.
● Eventual Consistency: Data may not be immediately consistent across all copies. Changes
made to an item may not be visible to all readers in the short term but will converge over time.
This model supports high scalability.
● Conditional Operations: Allows conditional inserts and deletes to prevent lost updates in multi-
writer scenarios.
SimpleDB is particularly useful for applications that need to perform querying and indexing without
requiring transactional consistency.
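The domain, item, and attribute model, together with a conditional write, can be mimicked with nested dictionaries. This is an in-memory sketch only; the Domain class and its method names are invented for the example and are not the SimpleDB API.

```python
from typing import Dict, List, Optional, Tuple


class Domain:
    """A loosely structured collection of items, each a set of key-value attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: Dict[str, Dict[str, str]] = {}

    def put(self, item_name: str, attributes: Dict[str, str],
            expected: Optional[Tuple[str, str]] = None) -> bool:
        """Store attributes; with `expected`, write only if that attribute still has that value."""
        current = self.items.get(item_name, {})
        if expected is not None:
            attr, value = expected
            if current.get(attr) != value:
                return False  # another writer changed the item first
        self.items.setdefault(item_name, {}).update(attributes)
        return True

    def select(self, **conditions: str) -> List[str]:
        """Return names of items whose attributes match all the given values."""
        return sorted(
            name for name, attrs in self.items.items()
            if all(attrs.get(key) == value for key, value in conditions.items())
        )


catalog = Domain("products")
catalog.put("item1", {"color": "red", "version": "1"})
catalog.put("item2", {"size": "L"})  # items need not share the same attributes
print(catalog.put("item1", {"color": "blue", "version": "2"}, expected=("version", "1")))
print(catalog.put("item1", {"color": "green", "version": "2"}, expected=("version", "1")))
```

The second conditional write is rejected because the version it expected is stale, which is how conditional operations prevent lost updates.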
D. Amazon CloudFront
Amazon CloudFront is a Content Delivery Network (CDN) that optimizes the delivery of static and
dynamic content by caching it on a global network of edge servers. This reduces latency by serving content
from servers closer to the user. Here's a breakdown of its features and pricing:
Key Features:
1. Global Edge Network: CloudFront operates a worldwide network of edge servers, ensuring faster
content delivery by caching data at locations nearer to the user.
2. Content Types: It supports both static content (e.g., images, JavaScript files) and dynamic
streaming content (e.g., video).
3. Origin Servers: Content can originate from various sources, including Amazon S3 buckets, EC2
instances, or even external servers.
4. Access Control: CloudFront allows for granular control over content access through features like
signed URLs and geo-blocking.
5. Content Invalidation: You can invalidate cached content at edge servers before it expires, forcing
the cache to refresh with updated content.
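The caching and invalidation behaviour of an edge server can be sketched as a small time-to-live cache in front of an origin. The EdgeCache class, its parameters, and the injected clock are assumptions made for the illustration; they do not correspond to a CloudFront API.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Serve content from a local cache, going back to the origin on a miss or expiry."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # path -> (content, expiry time)

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: no trip to the origin
        content = self._fetch(path)
        self._entries[path] = (content, now + self._ttl)
        return content

    def invalidate(self, path: str) -> None:
        """Drop a cached copy before it expires so the next request refetches it."""
        self._entries.pop(path, None)


# Hypothetical origin content.
origin = {"/logo.png": "version-1"}
edge = EdgeCache(origin.__getitem__, ttl_seconds=60)
print(edge.get("/logo.png"))
origin["/logo.png"] = "version-2"
print(edge.get("/logo.png"))  # still the cached copy
edge.invalidate("/logo.png")
print(edge.get("/logo.png"))  # refreshed from the origin
```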
Cost Structure:
• Requests:
○ HTTP requests: $0.0075 per 10,000 requests.
○ HTTPS requests: $0.0100 per 10,000 requests.
• Data Transfer:
○ First 10 TB: $0.120 per GB.
○ Next 40 TB: $0.080 per GB.
CloudFront's cost is determined based on the volume of requests and the amount of data transferred, with
a lower rate for larger amounts of data.
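The tiered rates above can be applied as follows. The sketch assumes 1 TB = 1024 GB and covers only the two tiers listed in these notes, so it refuses volumes beyond 50 TB rather than guess a rate; the traffic figures are hypothetical.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price per GB), taken from the rates listed above.
TRANSFER_TIERS = [(10 * GB_PER_TB, 0.120), (40 * GB_PER_TB, 0.080)]


def transfer_cost(gigabytes: float) -> float:
    """Charge each slice of traffic at the rate of the tier it falls into."""
    if gigabytes < 0:
        raise ValueError("data volume cannot be negative")
    remaining = gigabytes
    cost = 0.0
    for tier_size, rate in TRANSFER_TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("volume exceeds the tiers listed in these notes")
    return cost


def request_cost(http_requests: int, https_requests: int) -> float:
    """Requests are priced per block of 10,000."""
    return http_requests / 10_000 * 0.0075 + https_requests / 10_000 * 0.0100


# Hypothetical month: 12 TB transferred, 1,000,000 HTTP and 500,000 HTTPS requests.
total = transfer_cost(12 * GB_PER_TB) + request_cost(1_000_000, 500_000)
print(round(total, 2))
```

Only the 2 TB above the first tier is billed at the lower rate; the first 10 TB are still charged at $0.120 per GB.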
In summary, Amazon CloudFront is a CDN service optimized for fast, global delivery of content,
whether static or streaming, and is highly effective for handling large amounts of frequently accessed
content.
Amazon Communication Services provide tools for structuring and facilitating communication between
AWS applications and services. These services are categorized into virtual networking and
messaging:
9.1.3.1 Virtual Networking
● Amazon VPC (Virtual Private Cloud): Offers customizable virtual private networks within AWS.
It supports various templates like public subnets, isolated networks, private networks (with NAT),
and hybrid configurations. Connectivity control between services like EC2 and S3 can be
managed via IAM.
○ Cost: $0.50 per connection hour.
● Amazon Direct Connect: Provides dedicated high-bandwidth connections between user
networks and AWS, ensuring consistent performance. It supports EC2, S3, and VPC, and is ideal
for high-bandwidth scenarios.
○ Ports: 1 Gbps at $0.30/hour, 10 Gbps at $2.25/hour.
○ Traffic: Inbound is free, outbound costs $0.02 per GB.
● Amazon Route 53: Dynamic DNS service that allows users to map their own domain names to
AWS resources (EC2 instances, S3 buckets). It dynamically updates as resources are launched
or created.
○ Cost: $1 per hosted zone per month, plus $0.50 per million queries (first billion queries),
$0.25 after that.
9.1.3.2 Messaging
AWS provides three messaging services to facilitate communication between applications: Amazon SQS,
Amazon SNS, and Amazon SES.
● Amazon SQS (Simple Queue Service): A disconnected model for message exchange using queues.
Applications send messages to queues, and other applications can access them securely. Messages
are stored temporarily and locked while being processed to avoid duplication.
● Amazon SNS (Simple Notification Service): A publish-subscribe model for connecting applications.
Unlike SQS, SNS notifies applications when new content is available. Applications can publish
messages to a topic, and subscribers (via HTTP/HTTPS, email, or SQS) receive notifications
automatically.
● Amazon SES (Simple Email Service): A scalable email service that allows users to send emails via
AWS infrastructure. After verifying an email address, users can send either SMTP-compliant or raw
emails. SES provides delivery status updates and analytics to improve email campaigns.
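The difference between the queue model and the publish-subscribe model can be sketched in memory. In the queue, consumers pull messages and a received message stays locked until it is deleted or released; in the topic, every subscriber is pushed each published message. The class and method names are illustrative and are not the SQS or SNS APIs.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class SimpleQueue:
    """Pull model: a received message is locked until deleted or released."""

    def __init__(self) -> None:
        self._messages: Deque[Tuple[int, str]] = deque()
        self._in_flight: Dict[int, str] = {}
        self._next_id = 0

    def send(self, body: str) -> None:
        self._messages.append((self._next_id, body))
        self._next_id += 1

    def receive(self) -> Optional[Tuple[int, str]]:
        if not self._messages:
            return None
        msg_id, body = self._messages.popleft()
        self._in_flight[msg_id] = body  # locked: hidden from other consumers
        return msg_id, body

    def delete(self, msg_id: int) -> None:
        """Called after successful processing; the message is gone for good."""
        self._in_flight.pop(msg_id, None)

    def release(self, msg_id: int) -> None:
        """Processing failed: make the message visible again."""
        self._messages.appendleft((msg_id, self._in_flight.pop(msg_id)))


class Topic:
    """Push model: every subscriber is notified of each published message."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        for callback in self._subscribers:
            callback(message)


queue = SimpleQueue()
topic = Topic()
topic.subscribe(print)       # one subscriber handles the message directly
topic.subscribe(queue.send)  # another forwards it into a queue
topic.publish("new order received")
print(queue.receive())
```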
Pricing: All three services use a pay-as-you-go model, with no minimum commitment.
● Data transfer-in is free, while data transfer-out is charged based on usage ranges.
AWS offers additional services to enhance user experience and manage their AWS infrastructure:
● Amazon CloudWatch: Provides statistics to monitor and optimize applications hosted on AWS. It collects
data from services like EC2, S3, SimpleDB, and CloudFront, allowing developers to analyze service usage
and improve application efficiency. CloudWatch is now available for free to all AWS users, helping to reduce
costs and enhance performance.
● Amazon Flexible Payment Service (FPS): A billing infrastructure that allows AWS users to sell goods and
services to other AWS users. FPS supports various payment models, including one-time payments, periodic
payments for subscriptions, and usage-based transactions, simplifying the payment process for developers.
Google AppEngine is a Platform-as-a-Service (PaaS) that allows developers to build, deploy, and scale web
applications on Google’s cloud infrastructure. It handles server management, scaling, and resource
allocation, so developers can focus on coding.
Key Components
Key Benefits
● Dynamic Scaling: Automatically adjusts to handle traffic.
● Built-in Services: Simplifies development with ready-to-use tools.
● Sandboxed Security: Applications run securely in isolated environments.
● Easy Deployment: Local testing and SDK tools streamline the process.
● Monitoring and Cost Control: Built-in tools for performance tracking and billing.
Google AppEngine simplifies cloud-based application development with automatic scaling, a secure
runtime, and built-in tools for storage, task automation, and monitoring, making it ideal for modern web
applications.
2. With a neat diagram, explain the Google AppEngine platform architecture.
[Figure: Google AppEngine platform architecture (diagram not reproduced here)]
Google AppEngine is designed to provide a secure, scalable, and developer-friendly environment for
hosting web applications. Below is an explanation of its key components:
This architecture allows Google AppEngine to efficiently host web applications, scale resources
dynamically, and provide a secure environment with a variety of built-in services for ease of development.
UrlFetch
MemCache
Mail and Instant Messaging
● Purpose: Provides communication capabilities for web applications via email and
messaging.
● Mail:
○ Send and receive emails on behalf of the application.
○ Allows attachments and multiple recipients.
○ Asynchronous operation with error notifications for delivery failures.
● XMPP (Extensible Messaging and Presence Protocol):
○ Send and receive chat messages via services like Google Talk.
○ Useful for connecting with chat bots or creating administrative consoles.
Account Management
Image Manipulation
Compute Services
For computations that cannot be completed within the standard web request timeframe,
AppEngine provides Task Queues and Cron Jobs to handle long-running or time-specific
operations.
Task Queues
● Purpose: Allows deferred execution of tasks that can't be handled in the same web
request.
● Features:
○ Supports up to 10 queues with configurable execution rates.
○ Handles task failures by retrying them automatically.
Cron Jobs
● Purpose: Schedules tasks to run at specific times, ideal for periodic operations or
maintenance tasks.
● Features:
○ Executes tasks at designated times, without retrying in case of failure.
○ Useful for sending periodic notifications or maintenance operations.
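The main behavioural difference described above, retry on failure for task queues versus a single attempt for cron jobs, can be sketched as follows. This is plain Python, not the AppEngine API; the helper names and the retry limit are assumptions.

```python
def run_task_with_retries(task, max_retries=3):
    """Task-queue style: rerun a failing task until it succeeds or retries run out.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return attempts
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: surface the failure


def run_cron_job(job):
    """Cron style: run once at the scheduled time and do not retry on failure."""
    try:
        job()
        return True
    except Exception:
        return False


class FailNTimes:
    """Test helper: a callable that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated failure")


print(run_task_with_retries(FailNTimes(2)))  # succeeds on a later attempt
print(run_cron_job(FailNTimes(1)))           # fails once and is not retried
```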
Google AppEngine provides comprehensive support throughout the application life cycle, which
includes Development, Testing, Deployment, and Monitoring. This is achieved through the
use of Software Development Kits (SDKs), with the primary options being the Java SDK and
Python SDK.
1. Java SDK:
• Supports Java 5 and Java 6 runtime environments.
• Developers can use Eclipse IDE with the Google AppEngine plugin to build, test, and deploy
Java applications.
• Includes servlet support and other tools for web application development.
2. Python SDK:
• Supports Python 2.5 and includes GoogleAppEngineLauncher for app management and
deployment.
• Provides a user interface for managing apps, viewing logs, and monitoring resource usage.
• Includes the webapp framework for AppEngine, with support for other frameworks like Django.
1. Deployment:
• After development and testing, applications can be deployed to AppEngine via a simple
command-line tool or development environment.
• Developers create a unique application identifier, which is used to locate the app on the web
(e.g., http://application-id.appspot.com).
• A custom domain can also be mapped for commercial use.
• AppEngine automatically manages resources and availability post-deployment.
2. App Identifier:
• A unique application identifier is mandatory for identifying the app within the AppEngine system.
• Developers register this identifier through the AppEngine console.
3. Administrative Console:
• The administrative console provides tools for tracking resource usage (CPU, bandwidth),
managing application logs, and configuring billing settings.
• It allows developers to view logs, manage multiple versions, and track the app's health.
1. Free Service:
○ AppEngine offers a free service with limited quotas that reset every 24 hours.
○ Developers can set up a billing account when the app is ready for production to unlock
additional resources and incur charges based on usage.
2. Quotas:
○ AppEngine uses different types of quotas to manage resource consumption:
■ Billable Quotas: Set by the administrator based on the daily budget.
■ Fixed Quotas: Set by AppEngine to ensure stable performance and prevent
interference between apps.
■ Per-Minute Quotas: Limit short-term resource consumption to avoid service
interruptions.
○ Exceeding a quota results in errors (e.g., HTTP 403 for CPU time or bandwidth).
3. Quota Types:
○ Resources are divided into free and billing-enabled quotas, each with daily limits and
maximum rates.
○ When quotas are exhausted, users receive errors, and resource access is suspended
until the quota resets.
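A daily quota with a 24-hour reset can be sketched as a small counter. The DailyQuota class, the limit, and the use of plain status codes as return values are assumptions made for the illustration; AppEngine enforces its quotas internally.

```python
class DailyQuota:
    """Track consumption of one resource against a limit that resets every period."""

    def __init__(self, daily_limit, period_seconds=24 * 60 * 60, start=0):
        self.limit = daily_limit
        self.period = period_seconds
        self.window_start = start
        self.used = 0

    def consume(self, amount, now):
        """Return 200 if the request fits in the remaining quota, 403 otherwise."""
        elapsed = now - self.window_start
        if elapsed >= self.period:
            # Move to the current window and reset the usage counter.
            self.window_start += (elapsed // self.period) * self.period
            self.used = 0
        if self.used + amount > self.limit:
            return 403  # quota exhausted until the next reset
        self.used += amount
        return 200


# Hypothetical limit of 100 units per day; times are seconds since the first window began.
quota = DailyQuota(daily_limit=100)
print(quota.consume(60, now=0))
print(quota.consume(50, now=10))      # would exceed the limit
print(quota.consume(50, now=86_400))  # a new day: the quota has reset
```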