Module 5 CC Simplified

Module 5 discusses the features and capabilities of cloud and grid platforms, highlighting their similarities and differences in resource management, data functionalities, and programming tools. It covers challenges in cloud programming, the MapReduce programming model, and various cloud services like AWS, Google App Engine, and Microsoft Azure. Additionally, it addresses data management strategies, consistency models, and storage architectures relevant to cloud computing.

Uploaded by

Srushti Reddy V

Module 5

6.1 Features of Cloud and Grid Platforms

 This section provides an overview of essential features in real-world cloud and
grid platforms, encompassing their capabilities, traditional attributes, data
functionalities, and characteristics relevant for programmers and runtime
systems.
 A foundational understanding of Service-Oriented Architecture (SOA) and web
services is beneficial for readers.

6.1.1 Cloud Capabilities and Platform Features

 Commercial clouds offer cost-effective utility computing with elastic scalability
for resources.
 They provide an increasing array of "Platform as a Service" (PaaS) capabilities.
 Examples:
o Azure: Features like Azure Table, queues, blobs, SQL Database, and Web and
Worker roles.
o Amazon (AWS): Primarily an Infrastructure as a Service (IaaS) provider, but
continuously adds platform features such as SimpleDB, queues, notification,
monitoring, content delivery network, relational database, and MapReduce
(Hadoop).
o Google (GAE): Google App Engine is a PaaS designed for web applications.

6.1.2 Traditional Features Shared by Grids and Clouds

 Clouds and grids are both distributed computing paradigms sharing common
features, despite serving potentially different purposes.
 Grids: Emphasize resource sharing and coordinated problem-solving within
dynamic virtual organizations (VOs), prioritizing resource virtualization,
provisioning, and scheduling across VOs.
 Clouds: Focus on high-throughput computing for scalable services from
massive data centers and utility computing, providing virtualized hardware and
software infrastructure as services.
 Shared Features: Both enable the aggregation of significant resources, support
Quality of Service (QoS), utilize common web services for communication, and
provide dynamic resource provisioning and service management.

6.1.3 Data Features and Databases

 Data-Intensive Computing: Both grids and clouds are designed to manage
large-scale data and support data-intensive computing, though their approaches
to data services vary.
 Data Management in Grids: Grids federate distributed data sources using
protocols like GridFTP for data transfer and replica management, often dealing
with scientific datasets and distributed databases.
 Data Management in Clouds: Clouds offer Storage as a Service, managing
massive datasets from web services and social networks, supporting parallel
data processing frameworks (e.g., MapReduce) and various data models.
 NoSQL Databases: Key-value and column-oriented stores (e.g., SimpleDB,
BigTable, Cassandra), graph databases, object-oriented databases, and XML
databases are all used for cloud-scale data.
 Data Privacy: A crucial consideration for both paradigms.

6.1.4 Features for Programmers and Runtime Systems

 Cloud Programming Tools: Modern cloud applications necessitate new
programming tools that abstract distributed hardware details and optimize for
parallel execution, distributed storage, and fault tolerance.
 MapReduce: A widely adopted parallel programming model (originating from
Google) for processing large datasets on clusters, simplifying parallel
programming.
 SOA and Web Services: Fundamental for enabling cloud and grid applications,
supporting interoperability and service abstraction.
 Programming Models: Include cloud-specific APIs and SDKs (e.g., AWS EC2
APIs), Remote Procedure Calls (RPC), message queues, REST
(Representational State Transfer) for web service APIs, and scripting languages.
 Runtime Systems: Responsible for managing resource allocation, scheduling,
fault tolerance, and monitoring.
 Virtual Appliances: Pre-configured VM images with entire software stacks for
"out-of-the-box" deployment, reducing software dependence on the hosting
environment.

6.2 Cloud Programming Challenges

 Scalability: Complexities in scaling applications and managing data across
thousands of machines.
 Concurrency: Challenges in handling multiple concurrent users and requests
effectively.
 Data Consistency: Difficulty in maintaining data consistency across distributed
replicas.
 Fault Tolerance and High Availability: Designing resilient systems that
provide continuous service even in the face of failures.
 Security and Privacy: Protecting data and applications within a shared, multi-
tenant environment.
 Debugging and Testing: Inherently more difficult for distributed cloud
applications compared to monolithic ones.
 Interoperability: Lack of standardized APIs and data formats can lead to
vendor lock-in.

6.3 MapReduce Programming Model

 Purpose: A programming model and its implementation for processing and
generating large datasets, highly suitable for big data processing on commodity
hardware.
 Map Phase: A user-defined Map function processes a key/value pair to
generate intermediate key/value pairs.
 Shuffle and Sort Phase: Intermediate values with the same key are grouped
together.
 Reduce Phase: A user-defined Reduce function processes the grouped values,
aggregating them into a smaller set of merged values.
 Fault Tolerance: The framework automatically handles node failures by re-
executing failed tasks.
 Scalability: Highly scalable to thousands of nodes.
 Simplified Parallelism: Abstracts away the complexities of distributed
programming, allowing developers to focus on the core Map and Reduce logic.
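
The three phases above can be sketched on a single machine with a word count. This is only an in-memory illustration of the model, not Hadoop's or Google's API; the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and a real framework would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: merge all counts collected for one word into a single total."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle/sort, and reduce sequentially in one process."""
    # Map phase: apply the user-defined mapper to every input key/value pair.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Shuffle and sort phase: group intermediate values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: apply the user-defined reducer to each group, in key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


documents = [
    ("doc1", "the cloud scales"),
    ("doc2", "the grid shares the cloud"),
]
word_counts = dict(run_mapreduce(documents, map_words, reduce_counts))
print(word_counts)
```

The developer writes only `map_words` and `reduce_counts`; grouping, ordering, and (in a real framework) distribution and retries are handled by the runtime.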

6.3.1 Hadoop for Cloud Computing

 Open-Source Implementation: Hadoop is an open-source implementation of
MapReduce; its file system, HDFS, is modeled on the Google File System (GFS).
 Components:
o Hadoop Distributed File System (HDFS): A highly fault-tolerant distributed
file system designed for commodity hardware, providing high-throughput
access to application data.
o MapReduce Engine: The core processing framework.
 Applications: Utilized for big data analytics, log processing, web indexing, and
data warehousing.
 Scalability and Fault Tolerance: HDFS achieves high fault tolerance by
replicating data across multiple nodes. The MapReduce engine handles task
failures by re-executing them.
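
The two fault-tolerance mechanisms above, block replication and task re-execution, can be illustrated with a toy model. This is a sketch under simplifying assumptions: placement here is plain round-robin (real HDFS placement also considers racks), the node names are made up, and `place_blocks`, `readable_blocks`, and `run_with_retries` are hypothetical helpers rather than Hadoop functions.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


def run_with_retries(task, max_attempts=4):
    """Re-execute a failed task, as a MapReduce engine would, up to a limit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except RuntimeError:
            continue  # simulate rescheduling the task on another node
    raise RuntimeError("task failed on every attempt")


def flaky_task(attempt):
    """Hypothetical task that fails twice before succeeding."""
    if attempt < 3:
        raise RuntimeError("simulated node failure")
    return "done"


nodes = ["n1", "n2", "n3", "n4"]
placement = place_blocks(4, nodes)
survivors = readable_blocks(placement, failed_nodes={"n1", "n2"})
result, attempts = run_with_retries(flaky_task)
print(survivors, result, attempts)
```

With three replicas per block, losing any two nodes still leaves every block readable, which is the property replication is meant to provide.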

6.3.2 Aneka Framework

 Purpose: A .NET-based PaaS framework for building and deploying distributed
applications on various clouds, supporting multiple distributed programming
models.
 Programming Models: Supports MapReduce, Task Programming (independent
tasks), and Thread Programming (distributed threads).
 Cloud Middleware: Provides APIs and tools to streamline the development and
deployment of cloud applications.
 Resource Provisioning: Capable of dynamically provisioning resources on
both private and public clouds.
 Virtual Appliances: Integrates VMs and P2P network virtualization into self-
configuring "virtual appliances" for easy deployment of homogeneously
configured virtual clusters across heterogeneous, wide-area distributed systems.

6.4.1 Dryad and DryadLINQ

 Dryad: A Microsoft Research project, serving as a general-purpose runtime for
executing parallel and distributed computations on clusters, representing
computations as directed acyclic graphs (DAGs).
 DryadLINQ: A language extension for .NET's LINQ that enables developers to
write Dryad computations using a high-level, declarative style, abstracting low-
level distributed programming details.
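
To make the DAG idea concrete, the sketch below runs a small dataflow graph in dependency order using Python's standard `graphlib` module. It is not the Dryad or DryadLINQ API; the vertex names and the `run_dag` helper are assumptions for illustration, and everything runs in one process rather than on a cluster.

```python
from graphlib import TopologicalSorter

# Each vertex maps to the set of vertices whose output it consumes.
dependencies = {
    "read_a": set(),
    "read_b": set(),
    "join": {"read_a", "read_b"},
    "aggregate": {"join"},
}

# Each vertex's computation receives a dict of its inputs keyed by vertex name.
operations = {
    "read_a": lambda inputs: [1, 2, 3],
    "read_b": lambda inputs: [10, 20, 30],
    "join": lambda inputs: list(zip(inputs["read_a"], inputs["read_b"])),
    "aggregate": lambda inputs: sum(a * b for a, b in inputs["join"]),
}


def run_dag(dependencies, operations):
    """Execute every vertex after all of its predecessors have finished."""
    results = {}
    for vertex in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies[vertex]}
        results[vertex] = operations[vertex](inputs)
    return results


dag_results = run_dag(dependencies, operations)
print(dag_results["aggregate"])  # 1*10 + 2*20 + 3*30 = 140
```

Because `read_a` and `read_b` have no dependency on each other, a distributed runtime would be free to execute them in parallel; the topological order only guarantees that `join` waits for both.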

6.5.1 Cloud Data Management

 Challenges: Managing the immense amounts of data generated by cloud
applications.
 Data Storage Solutions:
o Relational Databases (SQL): Traditional choice, but often struggle with
scalability for cloud workloads.
o NoSQL Databases: Designed for scalability, flexibility, and availability in
distributed environments. Examples include key-value stores (e.g., DynamoDB,
Redis), document databases (e.g., MongoDB, Couchbase), column-family
databases (e.g., Cassandra, HBase), and graph databases (e.g., Neo4j).
 Data Consistency Models: Different consistency models (e.g., strong
consistency, eventual consistency) are chosen based on application
requirements.

6.5.2 Data Consistency

 Strong Consistency: Once a write completes, every later read from any replica
returns that value, giving all clients an immediately consistent view, though
this can reduce availability and increase latency in distributed systems.
 Eventual Consistency: Replicas eventually converge to a consistent state,
offering higher availability and lower latency but potentially returning stale data
for a period. Suitable for applications where immediate consistency is not
critical (e.g., social media feeds).
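
The difference between the two models can be seen in a toy replicated store. This is a single-process sketch, not any database's API: the class `ReplicatedStore` and its methods are hypothetical, and "propagation" is simulated by an explicit `sync()` call instead of a background network process.

```python
class ReplicatedStore:
    """Toy key-value store with several in-memory replicas."""

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write_strong(self, key, value):
        """Update every replica before the write is considered complete."""
        for replica in self.replicas:
            replica[key] = value

    def write_eventual(self, key, value):
        """Update one replica now and queue the update for the others."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        """Read from a single replica, which may be out of date."""
        return self.replicas[replica_index].get(key)

    def sync(self):
        """Apply queued updates so that all replicas converge."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write_strong("likes", 1)
store.write_eventual("likes", 2)
stale = store.read("likes", replica_index=2)  # replica 2 has not seen the update
store.sync()
fresh = store.read("likes", replica_index=2)
print(stale, fresh)
```

The eventual write returns after touching one replica, which is why it is faster and more available, and also why a read from another replica can briefly return the old value.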

6.5.3 Data Availability


 High Availability (HA): Ensuring data accessibility even during failures,
achieved through replication, redundancy, and failover mechanisms.
 Partitions: Distributed databases often partition data across multiple nodes to
enhance scalability and availability.
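
One common way to partition data is to hash each key to a node and keep extra copies on the following nodes. The sketch below assumes a fixed list of nodes and simple modulo hashing; the node names and the helpers `partition_for` and `replica_nodes` are hypothetical, and production systems often use consistent hashing instead so that adding a node moves fewer keys.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(key, nodes, replication=2):
    """Pick the key's home node plus the next nodes in order as replicas."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    start = partition_for(key, len(nodes))
    return [nodes[(start + offset) % len(nodes)] for offset in range(replication)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
for key in ["user:1", "user:2", "order:17"]:
    print(key, "->", replica_nodes(key, nodes))
```

Partitioning spreads keys across nodes for scalability, while the second copy on a neighbouring node is what keeps a key available if its home node fails.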

6.5.4 Cloud Storage Architectures

 Object Storage: Data stored as objects with metadata and a unique identifier;
highly scalable, durable, and cost-effective for large unstructured data (e.g.,
AWS S3, Google Cloud Storage).
 Block Storage: Data stored in fixed-size blocks, similar to traditional hard
drives; offers high performance and low latency, suitable for databases and
applications requiring frequent I/O (e.g., AWS EBS, Azure Disk Storage).
 File Storage: Provides a hierarchical file system interface over a network,
suitable for shared file access (e.g., NFS, Amazon EFS).
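
The contrast between object and block storage shows up in their interfaces. The classes below are in-memory stand-ins written for illustration only; `ObjectStore` and `BlockDevice` are hypothetical names and do not correspond to the S3 or EBS APIs.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Whole objects addressed by a unique key, each with its own metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        self._objects[key] = StoredObject(data, dict(metadata or {}))
        return hashlib.sha256(data).hexdigest()  # content hash as an identifier

    def get(self, key):
        return self._objects[key]


class BlockDevice:
    """Fixed-size blocks addressed by index, with no metadata or naming."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[index] = data

    def read_block(self, index):
        return self._blocks[index]


store = ObjectStore()
checksum = store.put("photos/cat.jpg", b"fake-bytes", {"content-type": "image/jpeg"})
disk = BlockDevice(num_blocks=8)
disk.write_block(3, b"ABCD")
print(store.get("photos/cat.jpg").metadata, disk.read_block(3))
```

An object is read and written as a whole under its key, whereas a block device lets a database or file system overwrite a single block in place, which is why block storage suits frequent small I/O.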

6.6.1 Cloud Services

 Cloud services are typically provided in layers: Infrastructure as a Service
(IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
 SaaS: Applications delivered over the internet (e.g., Salesforce, Gmail).
 PaaS: Platforms for developing, running, and managing applications without
building and maintaining the infrastructure (e.g., Google App Engine, Heroku).
 IaaS: Virtualized computing resources over the internet (e.g., AWS EC2, Azure
Virtual Machines).

6.6.2 Amazon Web Services (AWS)

 Leading Cloud Provider: A pioneer and leading provider of public cloud
services, primarily in the IaaS model.
 Key Services: Includes EC2 (Elastic Compute Cloud) for compute capacity, S3
(Simple Storage Service) for object storage, EBS (Elastic Block Store) for block
storage, RDS (Relational Database Service) for managed relational databases,
Lambda for serverless compute, and VPC (Virtual Private Cloud) for isolated
network sections.
 Elasticity and Scalability: Supports auto-scaling and elastic load balancing for
automatic capacity adjustment.
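
Automatic capacity adjustment is often described as a proportional rule: scale the instance count by the ratio of the observed metric to its target, then clamp to the group's limits. The function below is a sketch of that general idea, not the exact algorithm AWS uses; `desired_capacity` and its parameters are hypothetical names.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Scale the instance count in proportion to metric / target, within limits."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


# Four instances at 90% average CPU with a 60% target: scale out.
scale_out = desired_capacity(current=4, metric=90, target=60, min_size=2, max_size=12)
# Four instances at 30% average CPU with a 60% target: scale in.
scale_in = desired_capacity(current=4, metric=30, target=60, min_size=2, max_size=12)
# Demand beyond the configured maximum is capped.
capped = desired_capacity(current=10, metric=100, target=50, min_size=2, max_size=12)
print(scale_out, scale_in, capped)
```

Rounding up errs on the side of extra capacity, and the minimum and maximum sizes bound both cost and availability.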

6.6.3 Google App Engine (GAE)

 PaaS Offering: A PaaS for developing and hosting web applications in
Google's data centers.
 Supported Languages: Supports various programming languages (e.g., Python,
Java, [Link], Go, PHP, Ruby).
 Managed Services: Provides built-in services for data storage (Datastore), user
authentication, caching, and task queues.
 Automatic Scaling: Automatically scales applications based on traffic.

6.6.4 Microsoft Azure

 Comprehensive Cloud Platform: Offers a broad range of cloud services,
including IaaS, PaaS, and SaaS capabilities.
 Services: Provides virtual machines, storage, databases (SQL Database,
Cosmos DB), networking, analytics, AI/ML, IoT, and developer tools.
 Hybrid Cloud Capabilities: Strong support for hybrid cloud deployments,
allowing integration with on-premises infrastructure.

6.6.5 OpenStack

 Open-Source Cloud Software: A collection of open-source software tools for
building and managing cloud computing platforms for both public and private
clouds.
 Modular Architecture: Comprises multiple interoperable components that
control pools of compute, storage, and networking resources.
 Key Components: Includes Nova (compute), Swift (object storage), Cinder
(block storage), Neutron (networking), Keystone (identity), and Glance (image
service).
 API Compatibility: Offers optional compatibility layers for the Amazon EC2
and S3 APIs to ease migration and interoperability.
 Flexibility: Highly flexible and customizable for specific organizational needs.

6.6.6 CloudSim

 Simulation Toolkit: A widely used open-source toolkit for simulating cloud
computing environments.
 Purpose: Enables researchers and developers to model and simulate cloud
components (data centers, hosts, VMs, applications, resource provisioning
policies) to test algorithms and strategies without deploying them on real
hardware.
 Features: Supports modeling of IaaS, PaaS, and SaaS, dynamic provisioning,
energy-aware resource management, and various scheduling policies.
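
The kind of question such a simulation answers can be shown with a back-of-the-envelope model of two scheduling policies on one VM. CloudSim itself is a Java toolkit; the Python functions below are hypothetical and only mimic the idea that a task of a given length in million instructions (MI) takes length / MIPS seconds on a VM rated in MIPS.

```python
def space_shared_finish_times(lengths_mi, vm_mips):
    """Run tasks one after another; each gets the whole VM while it runs."""
    finish, clock = [], 0.0
    for length in lengths_mi:
        clock += length / vm_mips
        finish.append(clock)
    return finish


def time_shared_finish_times(lengths_mi, vm_mips):
    """Run all tasks at once; the VM's MIPS is split equally among active tasks."""
    order = sorted(range(len(lengths_mi)), key=lambda i: lengths_mi[i])
    finish = [0.0] * len(lengths_mi)
    clock = 0.0
    done_work = 0.0  # MI already completed by every task that is still running
    active = len(lengths_mi)
    for i in order:
        # Each active task advances at vm_mips / active until task i finishes.
        clock += (lengths_mi[i] - done_work) * active / vm_mips
        finish[i] = clock
        done_work = lengths_mi[i]
        active -= 1
    return finish


lengths = [1000, 2000, 500]  # task lengths in MI, in submission order
space = space_shared_finish_times(lengths, vm_mips=500)
shared = time_shared_finish_times(lengths, vm_mips=500)
print(space, shared)
```

Both policies finish all the work at the same time, but they order individual completions differently, which is the sort of trade-off a simulator lets researchers examine without real hardware.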
