Module 1
MODULE 1
• Infrastructure-as-a-Service (IaaS)
• Platform-as-a-Service (PaaS)

INTRODUCTION
• In mid-2000 Amazon introduced Amazon Web Services (AWS), based on the IaaS delivery model. In this model the cloud service provider offers an infrastructure consisting of compute and storage servers interconnected by high-speed networks that support a set of services to access these resources.
CLOUD COMPUTING: GOOGLE PERSPECTIVE
• Picasa is a tool to upload, share, and edit images; it provides 1 GB of disk space per user free of charge. Users can add tags to images and attach locations to photos using Google Maps. Google Groups allows users to host discussion forums and to create messages online or via email.
• Google is also a leader in the Platform-as-a-Service (PaaS) space. AppEngine is a developer platform hosted on the cloud. Initially it supported only Python, but support for Java was added later. The database for code development can be accessed with Google Query Language (GQL), which has a SQL-like syntax (see the sketch at the end of this slide).
• Google Base is a service allowing users to load structured data from different sources into a central repository that is a very large, self-describing, semi-structured, heterogeneous database. It is self-describing because each item follows a simple schema: (item type, attribute names).
• Google Drive is an online service for data storage that has been available since April 2012. It gives users 5 GB of free storage and charges $4 per month for 20 GB.
• Google has also redefined the laptop with the introduction of the Chromebook, a purely Web-centric device running Chrome OS. Cloud-based applications, extreme portability, built-in 3G connectivity, almost instant-on operation, and all-day battery life are the main attractions of this device with a keyboard.
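To make GQL's SQL-like syntax concrete, here is a minimal sketch using the classic App Engine Python datastore API (google.appengine.ext.db); the Greeting model, its properties, and the query values are hypothetical examples, not part of the original text.

```python
# Hypothetical App Engine datastore model; GQL queries select full
# entities of a single kind, using a SQL-like syntax.
from google.appengine.ext import db

class Greeting(db.Model):
    author = db.StringProperty()
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)

# Bind ":1" to the first positional argument, much like a SQL placeholder.
greetings = db.GqlQuery(
    "SELECT * FROM Greeting WHERE author = :1 ORDER BY date DESC LIMIT 10",
    "alice")

for greeting in greetings:
    print(greeting.content)
```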
CLOUD COMPUTING: AZURE AND ONLINE SERVICES
• Azure and Online Services are, respectively, PaaS and SaaS cloud platforms from Microsoft. Windows Azure is an operating system, SQL Azure is a cloud-based version of SQL Server, and Azure AppFabric (formerly .NET Services) is a collection of services for cloud applications.
• Windows Azure has three core components: Compute, which provides a computation environment; Storage, for scalable storage; and Fabric Controller, which deploys, manages, and monitors applications and interconnects nodes consisting of servers, high-speed connections, and switches.
• The Content Delivery Network (CDN) maintains cache copies of data to speed up computations. The Connect subsystem supports IP connections between the users and their applications running on Windows Azure. The API interface to Windows Azure is built on REST, HTTP, and XML.
• The computations carried out by an application are implemented as one or more roles; an application typically runs multiple instances of a role. We can distinguish:
(i) Web role instances, used to create Web applications;
(ii) Worker role instances, used to run Windows-based code; and
(iii) VM role instances, which run a user-provided Windows Server 2008 R2 image.

CLOUD COMPUTING: AZURE AND ONLINE SERVICES
• Scaling, load balancing, memory management, and reliability are ensured by a fabric controller. The fabric controller decides where new applications should run; it chooses the physical servers to optimize utilization using configuration information uploaded with each Windows Azure application.
• The configuration information is an XML-based description of how many Web role instances, how many Worker role instances, and what other resources the application needs. The fabric controller uses this configuration file to determine how many VMs to create, as in the sketch below.
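To make the role-count description concrete, here is a minimal sketch of what such an XML service configuration might look like; the service name and role names are hypothetical, and the exact schema is the one defined by the Windows Azure SDK.

```xml
<!-- Hypothetical service configuration: the fabric controller reads the
     Instances counts to decide how many VMs to create for each role. -->
<ServiceConfiguration serviceName="MyCloudApp"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="3" />   <!-- three Web role instances -->
  </Role>
  <Role name="WorkerRole1">
    <Instances count="2" />   <!-- two Worker role instances -->
  </Role>
</ServiceConfiguration>
```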
• Blobs, tables, queues, and drives are used as scalable storage. A blob contains binary data; a container consists of one or more blobs. Blobs can be up to a terabyte, and they may have associated metadata (e.g., the information about where a JPEG photograph was taken).
• The Microsoft Azure platform currently does not provide or support any distributed parallel computing frameworks, such as MapReduce, Dryad, or MPI, other than the support for implementing basic queue-based job scheduling.

OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• Open-source cloud computing platforms such as Eucalyptus, OpenNebula, and Nimbus can be used as a control infrastructure for a private cloud.
• Schematically, a cloud infrastructure carries out the following steps to run an application:
• Retrieves the user input from the front end.
• Retrieves the disk image of a VM from a repository.
• Locates a system and requests the VMM running on that system to set up a VM.
• Invokes the DHCP and the IP bridging software to set up a MAC and IP address for the VM.

OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• Eucalyptus supports several operating systems, including CentOS 5 and 6, RHEL 5 and 6, and Ubuntu 10.04 LTS and 12.04 LTS.
• The components of the system are:
• Virtual machine. Runs under several VMMs, including Xen, KVM, and VMware.
• Node controller. Runs on every server or node designated to host a VM and controls the activities of the node. Reports to a cluster controller.
• Cluster controller. Controls a number of servers. Interacts with the node controller on each server to schedule requests on that node. Cluster controllers are managed by the cloud controller.
• Cloud controller. Provides the cloud access to end users, developers, and administrators. It is accessible through command-line tools compatible with EC2 and through a Web-based Dashboard. Manages cloud resources, makes high-level scheduling decisions, and interacts with cluster controllers.
OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• Storage controller. Provides persistent virtual hard drives to applications. It is the correspondent of EBS. Users can create snapshots from EBS volumes. Snapshots are stored in Walrus and made available across availability zones.
• Storage service (Walrus). Provides persistent storage and, similarly to S3, allows users to store objects in buckets.
• The system supports a strong separation between the user space and the administrator space; users access the system via a Web interface, whereas administrators need root access.
• The system supports decentralized resource management of multiple clusters with multiple cluster controllers, but a single head node for handling user interfaces. It implements a distributed storage system.

OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• The procedure to construct a virtual machine (a programmatic version follows this list):
• The euca2ools front end is used to request a VM.
• The VM disk image is transferred to a compute node.
• This disk image is modified for use by the VMM on the compute node.
• The compute node sets up network bridging to provide a virtual network interface controller (NIC) with a virtual Media Access Control (MAC) address.
• In the head node the DHCP is set up with the MAC/IP pair.
• The VMM activates the VM.
• The user can now ssh directly into the VM.
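Because the cloud controller exposes an EC2-compatible interface, the same request can be made programmatically. Below is a hedged sketch using the classic boto library; the endpoint host, port, path, credentials, image id, and key name are all placeholders, not values from the original text.

```python
# Hedged sketch: request a VM from an EC2-compatible private cloud such
# as Eucalyptus via the boto library (an assumption; any EC2-compatible
# client would do). All identifiers below are placeholders.
from boto.ec2.connection import EC2Connection

conn = EC2Connection(
    aws_access_key_id="YOUR-ACCESS-KEY",
    aws_secret_access_key="YOUR-SECRET-KEY",
    is_secure=False,
    host="cloud.example.org",        # cloud controller front end
    port=8773,
    path="/services/Eucalyptus")

# Ask the cloud controller to start one instance of a registered image;
# a cluster controller then schedules it on a node controller.
reservation = conn.run_instances("emi-12345678",
                                 key_name="mykey",
                                 instance_type="m1.small")
instance = reservation.instances[0]
print(instance.id, instance.state)   # e.g., 'pending' until the VM boots
```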
OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• OpenNebula is a private cloud with users actually logging into the head node to access cloud functions.
• The system is centralized, and its default configuration uses NFS (Network File System).
• The procedure to construct a virtual machine consists of several steps:
(i) the user signs into the head node using ssh;
(ii) the system uses the onevm command to request a VM;
(iii) the VM template disk image is transformed to fit the correct size and configuration within the NFS directory on the head node;
(iv) the oned daemon on the head node uses ssh to log into a compute node;
(v) the compute node sets up network bridging to provide a virtual NIC with a virtual MAC;
(vi) the files needed by the VMM are transferred to the compute node via NFS;
(vii) the VMM on the compute node starts the VM; and
(viii) the user is able to ssh directly to the VM on the compute node.

OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• Nimbus is a cloud solution for scientific applications based on the Globus software.
• The system inherits from Globus the image storage, the credentials for user authentication, and the requirement that a running Nimbus process can ssh into all compute nodes.
• Customization in this system can only be done by the system administrators.

OPEN-SOURCE SOFTWARE PLATFORMS FOR PRIVATE CLOUDS
• The table summarizes the features of the three systems.
• The conclusions of the comparative analysis are as follows: Eucalyptus is best suited for a large corporation with its own private cloud, because it ensures a degree of protection from user malice and mistakes.
• OpenNebula is best suited for a testing environment with a few servers.
• Nimbus is more adequate for a scientific community less interested in the technical internals of the system but with broad customization requirements.
CLOUD STORAGE DIVERSITY AND VENDOR LOCK-IN
• There are several risks involved when a large organization relies solely on a single cloud provider. Cloud services may be unavailable for a short or even an extended period of time. Such an interruption of service is likely to negatively impact the organization. The potential for permanent data loss in case of a catastrophic system failure poses an equally great danger.
• Last but not least, a Cloud Service Provider (CSP) may decide to increase the prices for service and charge more for computing cycles, memory, storage space, and network bandwidth than other CSPs. The alternative in this case is switching to another provider. Unfortunately, this solution could be very costly due to the large volume of data to be transferred from the old provider to the new one.
• Reliability is a major concern, and here we discuss a solution that addresses both avoidance of vendor lock-in and storage reliability.

CLOUD STORAGE DIVERSITY AND VENDOR LOCK-IN
• A solution for guarding against the problems posed by vendor lock-in is to replicate the data to multiple cloud service providers. Straightforward replication is very costly and, at the same time, poses technical challenges.
• Another solution could be based on an extension of the design principle of a RAID-5 system used for reliable data storage.
• A RAID-5 system uses block-level striping with distributed parity over a disk array, as shown in the figure.

CLOUD STORAGE DIVERSITY AND VENDOR LOCK-IN
• The disk controller distributes the sequential blocks of data to the physical disks and computes a parity block by bit-wise XOR-ing of the data blocks.
• The parity block is written on a different disk for each file to avoid the bottleneck possible when all parity blocks are written to a dedicated disk, as is done in RAID-4 systems.
• This technique allows us to recover the data after a single disk loss. For example, if Disk 2 in the figure is lost, we still have all the blocks of the third file, c1, c2, and c3, and we can recover the missing blocks of the others as follows (a sketch of the computation follows this list):
• a2 = (a1) XOR (aP) XOR (a3)
• b2 = (b1) XOR (bP) XOR (b3)
• d1 = (dP) XOR (d2) XOR (d3)
SERVICE-LEVEL AGREEMENTS
• A service-level agreement (SLA) is a negotiated contract between two parties, the customer and the service provider. The agreement specifies the services the customer receives rather than how the service provider delivers those services.
• The objectives of the agreement are:
• Identify and define customers’ needs and constraints, including the level of resources, security, timing, and quality of service.
• Provide a framework for understanding. A critical aspect of this framework is a clear definition of classes of service and costs.
• Simplify complex issues; for example, clarify the boundaries between the responsibilities of the clients and those of the service provider in case of failures.
• Reduce areas of conflict.
• Encourage dialogue in the event of disputes.
• Eliminate unrealistic expectations.

SERVICE-LEVEL AGREEMENTS
• An SLA records a common understanding in several areas: (i) services, (ii) priorities, (iii) responsibilities, (iv) guarantees, and (v) warranties.
• An agreement usually covers: services to be delivered, performance, tracking and reporting, problem management, legal compliance and resolution of disputes, customer duties and responsibilities, security, handling of confidential information, and termination.

SERVICE-LEVEL AGREEMENTS
• The common metrics specified by an SLA are service-specific. For example, the metrics used by a call center usually are (a sketch computing them follows this list):
(i) abandonment rate: percentage of calls abandoned while waiting to be answered;
(ii) average speed to answer: average time before the service desk answers a call;
(iii) time service factor: percentage of calls answered within a definite time frame;
(iv) first-call resolution: percentage of incoming calls that can be resolved without a callback; and
(v) turnaround time: time to complete a certain task.
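A small sketch computing these metrics from a hypothetical list of call records; the record fields and the 30-second answering threshold are illustrative assumptions, not values from the original text.

```python
# Hypothetical call records for a call center SLA report.
calls = [
    {"abandoned": False, "answer_time": 12, "resolved_first_call": True},
    {"abandoned": True,  "answer_time": None, "resolved_first_call": False},
    {"abandoned": False, "answer_time": 45, "resolved_first_call": False},
]

answered = [c for c in calls if not c["abandoned"]]

abandonment_rate = 100.0 * sum(c["abandoned"] for c in calls) / len(calls)
average_speed_to_answer = sum(c["answer_time"] for c in answered) / len(answered)
time_service_factor = 100.0 * sum(c["answer_time"] <= 30 for c in answered) / len(answered)
first_call_resolution = 100.0 * sum(c["resolved_first_call"] for c in answered) / len(answered)

print(abandonment_rate, average_speed_to_answer,
      time_service_factor, first_call_resolution)
```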
• Identity theft and privacy were major concerns for about half of the users questioned; availability,
liability, and data ownership and copyright were raised by a third of respondents.
• The application programming interface (API) to the ZooKeeper service is very simple and consists of seven operations:
• create – add a node at a given location on the tree.
• delete – delete a node.
• exists – test whether a node exists at a given location.
• get data – read data from a node.
• set data – write data to a node.
• get children – retrieve a list of the children of the node.
• synch – wait for the data to propagate.
• The system also supports the creation of ephemeral nodes, which are nodes that are created when a session starts and deleted when the session ends.
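These operations map directly onto client-library calls. Below is a hedged sketch using the third-party Python kazoo client (an assumption, not mentioned in the original); the paths, payloads, and localhost server address are illustrative.

```python
# Sketch of the ZooKeeper operations listed above, via the kazoo client;
# assumes a ZooKeeper server reachable at 127.0.0.1:2181.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

zk.create("/app/config", b"v1", makepath=True)   # create: add a node
data, stat = zk.get("/app/config")               # get data: read a node
zk.set("/app/config", b"v2")                     # set data: write a node
children = zk.get_children("/app")               # get children
zk.sync("/app/config")                           # synch: wait for propagation
print(zk.exists("/app/config") is not None)      # exists: test for a node

# Ephemeral node: lives only as long as this client session.
zk.create("/app/workers/w1", b"", ephemeral=True, makepath=True)

zk.delete("/app/config")                         # delete: remove a node
zk.stop()
```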
• MapReduce is based on a very simple idea for parallel processing of data-intensive applications supporting arbitrarily divisible load sharing.
• First, split the data into blocks, assign each block to an instance or process, and run these instances in parallel. Once all the instances have finished the computations assigned to them, the second phase starts: merge the partial results produced by the individual instances.
• MapReduce is a programming model inspired by the Map and Reduce primitives of the LISP programming language. It was conceived for processing and generating large data sets on computing clusters. As a result of the computation, a set of input <key, value> pairs is transformed into a set of output <key, value> pairs.
• For example, one can process logs of Web page requests and count the URL access frequency. The Map function outputs the pairs <URL, 1> and the Reduce function produces the pairs <URL, totalcount> (see the sketch after the steps below).
• The phases of a MapReduce computation:
(1) An application starts a master instance and M worker instances for the Map phase and, later, R worker instances for the Reduce phase.
(2) The master partitions the input data in M segments.
(3) Each Map instance reads its input data segment and processes the data.
(4) The results of the processing are stored on the local disks of the servers where the Map instances run.
(5) When all Map instances have finished processing their data, the R Reduce instances read the results of the first phase and merge the partial results.
(6) The final results are written by the Reduce instances to a shared storage server.
(7) The master instance monitors the Reduce instances and, when all of them report task completion, the application is terminated.
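A minimal single-machine sketch of the URL-count example: Map emits <URL, 1> pairs, the pairs are grouped by key, and Reduce sums the counts to produce <URL, totalcount>. The log lines are hypothetical, and the Map and Reduce phases run sequentially here rather than on a cluster.

```python
# Single-process illustration of the MapReduce URL-count example.
from collections import defaultdict

def map_fn(log_line):
    """Map phase: emit a <URL, 1> pair for each Web page request."""
    url = log_line.split()[0]
    return [(url, 1)]

def reduce_fn(url, counts):
    """Reduce phase: merge partial results by summing counts per URL."""
    return (url, sum(counts))

log = ["/index.html 200", "/about.html 200", "/index.html 304"]

# Map phase: on a cluster, each split is processed by a Map instance in
# parallel; here the splits are processed one after another.
intermediate = defaultdict(list)
for line in log:
    for url, count in map_fn(line):
        intermediate[url].append(count)

# Reduce phase: one call per distinct key.
results = [reduce_fn(url, counts) for url, counts in intermediate.items()]
print(results)   # [('/index.html', 2), ('/about.html', 1)]
```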