Cloud Computing Lab-Manual

The document is a lab manual focused on cloud computing, detailing experiments on cloud concepts, application creation in Salesforce, and case studies on Platform-as-a-Service (PaaS) with Google App Engine. It covers the definition, architecture, and service models of cloud computing, including IaaS, PaaS, and SaaS, along with deployment models like public, private, community, and hybrid clouds. Additionally, it outlines essential characteristics of cloud computing and provides step-by-step instructions for creating applications in Salesforce's Force.com.


Lab Manual
SUBJECT INDEX

Sr. No.  Title

1.  Introduction to cloud computing.
2.  Creating a Warehouse Application in SalesForce.com.
3.  Case Study: PaaS (Facebook, Google App Engine)
4.  Case Study: Amazon Web Services.

Subject: Lab I- Cloud Computing

Experiment No. 1

Aim: To study in detail about cloud computing.

Theory:

The term cloud has been used historically as a metaphor for the Internet. This
usage was originally derived from its common depiction in network diagrams as an outline of
a cloud, used to represent the transport of data across carrier backbones (which owned the
cloud) to an endpoint location on the other side of the cloud. This concept dates back as early
as 1961, when Professor John McCarthy suggested that computer time-sharing technology
might lead to a future where computing power and even specific applications might be sold
through a utility-type business model. This idea became very popular in the late 1960s, but
by the mid-1970s the idea faded away when it became clear that the IT-related technologies of
the day were unable to sustain such a futuristic computing model. However, since the turn of
the millennium, the concept has been revitalized. It was during this time of revitalization that
the term cloud computing began to emerge in technology circles. Cloud computing is a model
for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or service provider
interaction. A Cloud is a type of parallel and distributed system consisting of a collection of
inter-connected and virtualized computers that are dynamically provisioned and presented as
one or more unified computing resource(s) based on service-level agreements established
through negotiation between the service provider and consumers.

When you store your photos online instead of on your home computer, or use
webmail or a social networking site, you are using a cloud computing service. If you are in an
organization, and you want to use, for example, an online invoicing service instead of
updating the in-house one you have been using for many years, that online invoicing service is
a "cloud computing" service. Cloud computing is the delivery of computing services over the
Internet. Cloud services allow individuals and businesses to use software and hardware that
are managed by third parties at remote locations. Examples of cloud services include online
file storage, social networking sites, webmail, and online business applications. The cloud
computing model allows access to information and computer resources from anywhere. Cloud
computing provides a shared pool of resources, including data storage space, networks,
computer processing power, and specialized corporate and user applications.

Architecture

 Cloud Service Models

 Cloud Deployment Models

 Essential Characteristics of Cloud Computing

NIST Visual Model of Cloud Computing Definition

Cloud Service Models


 Cloud Software as a Service (SaaS)

 Cloud Platform as a Service (PaaS)

 Cloud Infrastructure as a Service (IaaS)


Infrastructure as a Service (IaaS):--

The capability provided to the consumer is to provision processing, storage, networks,
and other fundamental computing resources.

The consumer is able to deploy and run arbitrary software, which can include operating
systems and applications.

The consumer does not manage or control the underlying cloud infrastructure but has
control over operating systems, storage, and deployed applications, and possibly limited
control of select networking components (e.g., host firewalls).

Platform as a Service (PaaS):--

The capability provided to the consumer is to deploy onto the cloud infrastructure
consumer-created or acquired applications built using programming languages and
tools supported by the provider.

The consumer does not manage or control the underlying cloud infrastructure, including
network, servers, operating systems, or storage, but has control over the deployed
applications and possibly application hosting environment configurations.


Software as a Service (SaaS):--

The capability provided to the consumer is to use the provider's applications running on a
cloud infrastructure.

The applications are accessible from various client devices through a thin client interface
such as a web browser (e.g., web-based email).

The consumer does not manage or control the underlying cloud infrastructure, including
network, servers, operating systems, storage, or even individual application capabilities,
with the possible exception of limited user-specific application configuration settings.
Cloud Deployment Models:

 Public

 Private

 Community Cloud

 Hybrid Cloud


Public Cloud: The cloud infrastructure is made available to the general public or a large
industry group and is owned by an organization selling cloud services.

Private Cloud: The cloud infrastructure is operated solely for a single organization. It
may be managed by the organization or a third party, and may exist on-premises or
off-premises.

Community Cloud: The cloud infrastructure is shared by several organizations and
supports a specific community that has shared concerns (e.g., mission, security
requirements, policy, or compliance considerations). It may be managed by the
organizations or a third party and may exist on-premises or off-premises.

Hybrid Cloud: The cloud infrastructure is a composition of two or more clouds (private,
community, or public) that remain unique entities but are bound together by standardized
or proprietary technology that enables data and application portability (e.g., cloud
bursting for load-balancing between clouds).

ESSENTIAL CHARACTERISTICS:--

 On-demand self-service:--A consumer can unilaterally provision computing capabilities,
such as server time and network storage, as needed automatically, without requiring human
interaction with a service provider.

 Broad network access:--Capabilities are available over the network and accessed through
standard mechanisms that promote use by heterogeneous thin or thick client platforms
(e.g., mobile phones, laptops, and PDAs) as well as other traditional or cloud-based
software services.

 Resource pooling:--The provider's computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand.

 Rapid elasticity:--Capabilities can be rapidly and elastically provisioned - in some cases
automatically - to quickly scale out, and rapidly released to quickly scale in. To the
consumer, the capabilities available for provisioning often appear to be unlimited and
can be purchased in any quantity at any time.

 Measured service:--Cloud systems automatically control and optimize resource usage by
leveraging a metering capability at some level of abstraction appropriate to the type of
service. Resource usage can be monitored, controlled, and reported - providing transparency
for both the provider and consumer of the service.

Conclusion:
Thus we have studied an overview of cloud computing in detail.

*****
Subject: Lab I- Cloud Computing

Experiment No. 2

Aim: Creating a Warehouse Application in SalesForce.com's Force.com.


Theory:
Steps to create an application in Force.com using the declarative model

Step 1: Click on Setup → Create → Objects → New Custom Object
Label: MySale
Plural Label: MySales
Object Name: MySale
Record Name: MySale Description
Data Type: Text
Click on Save.

Step 2: Under MySale, go to Custom Fields & Relationships → Click on New Custom Field

Creating the 1st field:
Select Data Type as Auto Number → Next
Enter the details → Field Label: PROD_ID → Display Format: MYS-{0000}
→ Starting Number: 1001 → Field Name: PRODID → Next → Save & New

Creating the 2nd field:
Select Data Type as Date → Next
Enter the details → Field Label: Date of Sale → Field Name: Date_of_Sale
→ Default Value: Today()-1 → Next → Save & New

Creating the 3rd field:
Select Data Type as Number → Next
Enter the details → Field Label: Quantity Sold → Length: 3 → Decimal Places: 0
→ Default Value (via Show Formula Editor): 1 → Next → Save & New

Creating the 4th field:
Select Data Type as Currency → Next
Enter the details → Field Label: Rate → Field Name: Rate → Length: 4 → Decimal Places: 2
→ Default Value: 10 → Next → Save & New

Creating the 5th field:
Select Data Type as Formula, with return type Currency → Next
Insert the MySale fields into the formula: Quantity_Sold__c * Rate__c → Next → Save.

Now create an app:
Setup → Create → App → New → MyShop → Next → Select an Image → Next → Add Object: MySales.

Now create a tab:
Setup → Create → Tab → New Custom Tab → Choose the MySales object → Select a tab style → Save.

In the tab bar at the top you can see the tab you just created. Click on it to open your
object, then click the New button and provide the details for the fields defined above.

Conclusion: In this experiment we created the MyShop application on Force.com using the
declarative model.

*****
Subject: Lab I- Cloud Computing

Experiment No. 3

Aim: Case Study: PaaS (Facebook, Google App Engine)

Theory:

Platform-as-a-Service (PaaS):

Cloud computing has evolved to include platforms for building and running custom web-based
applications, a concept known as Platform-as-a-Service. PaaS is an outgrowth of the SaaS
application delivery model. The PaaS model makes all of the facilities required to support the
complete life cycle of building and delivering web applications and services entirely available
from the Internet, all with no software downloads or installation for developers, IT managers,
or end users. Unlike the IaaS model, where developers may create a specific operating system
instance with homegrown applications running on it, PaaS developers are concerned only with
web-based development and generally do not care what operating system is used. PaaS services
allow users to focus on innovation rather than complex infrastructure. Organizations can
redirect a significant portion of their budgets to creating applications that provide real business
value instead of worrying about all the infrastructure issues of a roll-your-own delivery model.
The PaaS model is thus driving a new era of mass innovation. Now, developers around the
world can access unlimited computing power. Anyone with an Internet connection can build
powerful applications and easily deploy them to users globally.

Google App Engine:

Architecture:

The Google App Engine (GAE) is Google's answer to the ongoing trend of cloud computing
offerings within the industry. In the traditional sense, GAE is a web application hosting
service, allowing for development and deployment of web-based applications within a pre-
defined runtime environment. Unlike other cloud-based hosting offerings such as Amazon
Web Services that operate on an IaaS level, the GAE already provides an application
infrastructure on the PaaS level. This means that the GAE abstracts away the underlying
hardware and operating system layers by providing the hosted application with a set of
application-oriented services. While this approach is very convenient for developers of such
applications, the rationale behind the GAE is its focus on scalability and on usage-based
infrastructure and payment.

Costs:

Developing and deploying applications for the GAE is generally free of charge but restricted to a
certain amount of traffic generated by the deployed application. Once this limit is reached within
a certain time period, the application stops working. However, this limit can be waived when
switching to a billable quota, where the developer can enter a maximum budget that can be spent
on an application per day. Depending on the traffic, once the free quota is reached the application
will continue to work until the maximum budget for that day is reached. Table 1 summarizes
some of what are, in our opinion, the most important quotas, with the corresponding amount
charged per unit once the free resources are depleted and additional, billable quota is desired.

Features :

The GAE can be divided into three parts: the runtime environment, the datastore, and the App
Engine services.

Runtime Environment

The GAE runtime environment presents itself as the place where the actual application is
executed. However, the application is only invoked once an HTTP request is sent to the
GAE via a web browser or some other interface, meaning that the application is not constantly
running if no invocation or processing is taking place. In the case of such an HTTP request, the
request handler forwards the request and the GAE selects one of many possible Google
servers, where the application is then instantly deployed and executed for a certain amount of
time (8). The application may then do some computing and return the result to the GAE
request handler, which forwards an HTTP response to the client. It is important to understand that
the application runs completely embedded in this sandbox environment, but only as
long as requests are still coming in or some processing is being done within the application. The
reason for this is simple: applications should only run when they are actually computing;
otherwise they would allocate precious computing power and memory without need. This
paradigm already shows the GAE's potential in terms of scalability. Being able to run multiple
instances of one application independently on different servers guarantees a decent level of
scalability. However, this highly flexible and stateless application execution paradigm has its
limitations. Requests are processed for no longer than 30 seconds, after which the response has
to be returned to the client and the application is removed from the runtime environment again
(8). Obviously this method accepts that an additional lead time is needed for deploying and
starting an application each time a request is processed, until the application is finally up and
running. The GAE tries to counter this problem by caching the application in the server memory
as long as possible, optimizing for several subsequent requests to the same application. The type
of runtime environment on the Google servers depends on the programming language used. For
Java, or other languages that have support for Java-based compilers (such as JRuby, Rhino and
Groovy), a Java Virtual Machine (JVM) is provided. The GAE also fully supports the Google
Web Toolkit (GWT), a framework for rich web applications. For Python and related frameworks
a Python-based environment is used.
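
The request-driven execution model described above can be sketched as a bare WSGI callable
(a sketch only: real GAE Python applications register handlers through the provided framework,
but the per-request invocation shape is the same).

```python
# Minimal stateless request handler (sketch). The platform invokes the
# callable once per HTTP request; no state survives between invocations,
# mirroring GAE's sandbox paradigm described above.
def application(environ, start_response):
    # Compute a response for this one request only.
    name = environ.get("QUERY_STRING", "") or "world"
    body = ("Hello, %s" % name).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because the handler holds no state, the platform is free to run any number of copies on
different servers and to discard each one after the response is returned.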

Services

As mentioned earlier, the GAE serves as an abstraction of the underlying hardware and operating
system layers. These abstractions are implemented as services that can be called directly from
the actual application. In fact, the datastore itself is also a service that is controlled by the
runtime environment of the application.

MEMCACHE

The platform's built-in memory cache service serves as short-term storage. As its name suggests,
it stores data in a server's memory, allowing for faster access compared to the datastore.
Memcache is a non-persistent data store that should only be used to store temporary data within a
series of computations. Probably the most common use case for Memcache is to store session-
specific data (15). Persisting session information in the datastore and executing queries on every
page interaction is highly inefficient over the application lifetime, since session-owner instances
are unique per session (16). Moreover, Memcache is well suited to speed up common datastore
queries (8). To interact with Memcache, GAE supports JCache, a proposed interface standard
for memory caches (17).
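
The cache-aside pattern typically used with Memcache can be sketched as follows (a plain dict
stands in for the memcache service so the sketch runs anywhere; `get_user_profile` and
`fetch_from_datastore` are hypothetical names, and real code would call the
`google.appengine.api.memcache` service instead).

```python
# Cache-aside pattern (sketch): check the in-memory cache first, fall
# back to the slower datastore on a miss, then populate the cache so
# subsequent requests are served from memory.
cache = {}  # stand-in for the Memcache service

def get_user_profile(user_id, fetch_from_datastore):
    key = "profile:%s" % user_id
    value = cache.get(key)
    if value is None:                          # cache miss
        value = fetch_from_datastore(user_id)  # expensive datastore query
        cache[key] = value                     # cheap for later requests
    return value
```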

URL FETCH

Because the GAE restrictions do not allow opening sockets (18), a URL Fetch service can be
used to send HTTP or HTTPS requests to other servers on the Internet. This service works
asynchronously, giving the remote server some time to respond while the request handler can do
other things in the meantime. After the server has answered, the URL Fetch service returns the
response code as well as the header and body. Using the Google Secure Data Connector, an
application can even access servers behind a company's firewall (8).

MAIL

The GAE also offers a mail service that allows sending and receiving email messages. Mails can
be sent out directly from the application, either on behalf of the application's administrator or on
behalf of users with Google Accounts. Moreover, an application can receive emails in the form of
HTTP requests initiated by the App Engine and posted to the app at multiple addresses. In
contrast to incoming emails, outgoing messages may also have an attachment of up to 1 MB (8).

IMAGES

Google also integrated a dedicated image manipulation service into the App Engine. Using this
service images can be resized, rotated, flipped or cropped (18). Additionally it is able to combine
several images into a single one, convert between several image formats and enhance
photographs. Of course the API also provides information about format, dimensions and a
histogram of color values (8).

USERS

User authentication with GAE comes in two flavors. Developers can roll their own
authentication service using custom classes, tables and Memcache, or simply plug into Google's
Accounts service.

Since for most applications the time and effort of creating a sign-up page and storing user
passwords is not worth the trouble (18), the User service is a very convenient piece of
functionality which gives an easy method for authenticating users within applications. As a
byproduct, thousands of existing Google Accounts are leveraged. The User service detects
whether a user has signed in and otherwise redirects the user to a sign-in page. Furthermore, it
can detect whether the current user is an administrator, which facilitates implementing
admin-only areas within the application (8).
App Engine for Business

While the GAE is more targeted towards independent developers in need of a hosting platform
for their medium-sized applications, Google's recently launched App Engine for Business tries
to target the corporate market. Although technically relying mostly on the GAE described above,
Google added some enterprise features and a new pricing scheme to make its cloud computing
platform more attractive for enterprise customers (21). Regarding the features, App Engine for
Business includes a central development manager that allows central administration of all
applications deployed within one company, including access control lists. In addition to that,
Google now offers a 99.9% service level agreement as well as premium developer support.
Google also adjusted the pricing scheme for its corporate customers by offering a fixed price
of $8 per user per application, up to a maximum of $1000, per month. Interestingly, unlike the
pricing scheme for the GAE, this offer includes unlimited processing power for a fixed price of
$8 per user, application and month. From a technical point of view, Google tries to accommodate
established industry standards by now offering SQL database support in addition to the
existing Bigtable datastore described above (8).

APPLICATION DEVELOPMENT USING GOOGLE APP ENGINE

General Idea

In order to evaluate the flexibility and scalability of the GAE, we tried to come up with an
application that relies heavily on scalability, i.e. one that collects large amounts of data from
external sources. That way we hoped to be able to test both persistency and the gathering of data
from external sources at large scale. Our idea, therefore, has been to develop an application that
connects people's delicious bookmarks with their respective Facebook accounts. People using
our application should be able to see what their Facebook friends' delicious bookmarks are,
provided their Facebook friends have such a delicious account. This way a user can get a
visualization of his friends' latest topics by looking at a generated tag cloud, giving him a clue
about the most common and shared interests.

PLATFORM AS A SERVICE: GOOGLE APP ENGINE:--

The Google cloud, called Google App Engine, is a 'platform as a service' (PaaS) offering. In
contrast with the Amazon infrastructure as a service cloud, where users explicitly provision
virtual machines and control them fully, including installing, compiling and running software on
them, a PaaS offering hides the actual execution environment from users. Instead, a software
platform is provided along with an SDK, using which users develop applications and deploy
them on the cloud. The PaaS platform is responsible for executing the applications, including
servicing external service requests, as well as running scheduled jobs included in the application.
By making the actual execution servers transparent to the user, a PaaS platform is able to share
application servers across users who need lower capacities, as well as automatically scale
resources allocated to applications that experience heavy loads. Figure 5.2 depicts a user view of
Google App Engine. Users upload code, in either Java or Python, along with related files, which
are stored on the Google File System, a very large scale fault tolerant and redundant storage
system. It is important to note that an application is immediately available on the internet as soon
as it is successfully uploaded (no virtual servers need to be explicitly provisioned as in IaaS).

Resource usage for an application is metered in terms of web requests served and CPU-hours
actually spent executing requests or batch jobs. Note that this is very different from the
IaaS model: a PaaS application can be deployed and made globally available 24×7, but charged
only when accessed (or when batch jobs run); in contrast, in an IaaS model merely making an
application continuously available incurs the full cost of keeping at least some of the servers
running all the time. Further, deploying applications in Google App Engine is free, within usage
limits; thus applications can be developed and tried out for free and begin to incur cost only
when actually accessed by a sufficient volume of requests. The PaaS model enables Google to
provide such a free service because applications do not run in dedicated virtual machines; a
deployed application that is not accessed merely consumes storage for its code and data and
expends no CPU cycles.

GAE applications are served by a large number of web servers in Google's data centers that
execute requests from end-users across the globe. The web servers load code from the GFS into
memory and serve these requests. Each request to a particular application is served by any one of
GAE's web servers; there is no guarantee that the same server will serve any two requests, even
from the same HTTP session. Applications can also specify some functions to be executed as
batch jobs, which are run by a scheduler.

Google Datastore:--

Applications persist data in the Google Datastore, which is also (like Amazon SimpleDB) a non-
relational database. The Datastore allows applications to define structured types (called 'kinds')
and store their instances (called 'entities') in a distributed manner on the GFS file system. While
one can view Datastore 'kinds' as table structures and entities as records, there are important
differences between a relational model and the Datastore, some of which are also illustrated in
Figure 5.3.

Unlike a relational schema, where all rows in a table have the same set of columns, all entities of
a 'kind' need not have the same properties. Instead, additional properties can be added to any
entity. This feature is particularly useful in situations where one cannot foresee all the potential
properties in a model, especially those that occur occasionally for only a small subset of records.
For example, a model storing 'products' of different types (shows, books, etc.) would need to
allow each product to have a different set of features. In a relational model, this would probably
be implemented using a separate FEATURES table, as shown on the bottom left of Figure 5.3.
Using the Datastore, this table ('kind') is not required; instead, each product entity can be
assigned a different set of properties at runtime. The Datastore allows simple queries with
conditions, such as the first query shown in Figure 5.3 to retrieve all customers having names in
some lexicographic range. The query syntax (called GQL) is essentially the same as SQL, but
with some restrictions. For example, all inequality conditions in a query must be on a single
property; so a query that also filtered customers on, say, their 'type' would be illegal in GQL but
allowed in SQL.
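
The schema flexibility described above can be illustrated with plain Python dicts standing in for
Datastore entities (illustrative only; real applications would define model classes through the
GAE SDK).

```python
# Two entities of the same 'kind' carrying different property sets;
# no separate FEATURES table is needed, since each entity simply
# carries whatever properties apply to it.
products = [
    {"kind": "Product", "name": "Hamlet", "type": "book",
     "author": "Shakespeare"},         # book-only property
    {"kind": "Product", "name": "Cats", "type": "show",
     "venue": "Broadway"},             # show-only property
]
```

Each entity carries its own property set at runtime, which is exactly what makes the separate
FEATURES table of the relational design unnecessary.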

Relationships between tables in a relational model are modeled using foreign keys. Thus,
each account in the ACCTS table has a pointer ckey to the customer in the CUSTS table that it
belongs to. Relationships are traversed via queries using foreign keys, such as retrieving all
accounts for a particular customer, as shown. The Datastore provides a more object-oriented
approach to relationships in persistent data. Model definitions can include references to other
models; thus each entity of the Accts 'kind' includes a reference to its customer, which is an
entity of the Custs 'kind'. Further, relationships defined by such references can be traversed in
both directions, so not only can one directly access the customer of an account, but also all
accounts of a given customer, without executing any query operation, as shown in the figure.
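
The two traversal directions can be sketched with plain classes (the `Customer`/`Account` names
are hypothetical; in real GAE code the forward reference would be a reference property and the
reverse traversal would be generated by the SDK).

```python
# Object references replacing foreign-key joins (sketch).
class Customer:
    def __init__(self, name):
        self.name = name

class Account:
    def __init__(self, acct_id, customer):
        self.acct_id = acct_id
        self.customer = customer   # forward reference: account -> customer

alice = Customer("Alice")
accounts = [Account("A1", alice), Account("A2", alice)]

# Forward traversal: no query needed.
owner = accounts[0].customer
# Reverse traversal: all accounts of a given customer.
alice_accounts = [a for a in accounts if a.customer is alice]
```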

GQL queries cannot execute joins between models. Joins are critical when using SQL to
efficiently retrieve data from multiple tables. For example, the query shown in the figure
retrieves details of all products bought by a particular customer, for which it needs to join data
from the transactions (TXNS), products (PRODS) and product features (FEATURES) tables.
Even though GQL does not allow joins, its ability to traverse associations between entities often
enables joins to be avoided, as shown in the figure for the above example: by storing references
to customers and products in the Txns model, it is possible to retrieve all transactions for a given
customer through a reverse traversal of the customer reference. The product references in each
transaction then yield all products and their features (as discussed earlier, a separate Features
model is not required because of schema flexibility). It is important to note that while object
relationship traversal can be used as an alternative to joins, this is not always possible, and when
required, joins may need to be explicitly executed by application code.
The Google Datastore is a distributed object store where objects (entities) of all GAE
applications are maintained using a large number of servers and the GFS distributed file system.
From a user perspective, it is important to ensure that, in spite of sharing a distributed storage
scheme with many other users, application data is (a) retrieved efficiently and (b) atomically
updated. The Datastore provides a mechanism to group entities from different 'kinds' in a
hierarchy that is used for both these purposes. Notice that in Figure 5.3 entities of the Accts and
Txns 'kinds' are instantiated with a parameter 'parent' that specifies a particular customer
entity, thereby linking these three entities in an 'entity group'. The Datastore ensures that all
entities belonging to a particular group are stored close together in the distributed file system (we
shall see how in Chapter 10). The Datastore allows processing steps to be grouped into
transactions wherein updates to data are guaranteed to be atomic; however, this also requires that
each transaction only manipulates entities belonging to the same entity group. While this
transaction model suffices for most online applications, complex batch updates that update many
unrelated entities cannot execute atomically, unlike in a relational database where there are no
such restrictions.

Amazon SimpleDB:--

Amazon SimpleDB is also a non-relational database, in many ways similar to the Google
Datastore.

SimpleDB 'domains' correspond to 'kinds', and 'items' to entities; each item can have a number
of attribute-value pairs, and different items in a domain can have different sets of attributes,
similar to Datastore entities. Queries on SimpleDB domains can include conditions, including
inequality conditions, on any number of attributes. Further, just as in the Google Datastore, joins
are not permitted. However, SimpleDB does not support object relationships as in the Google
Datastore, nor does it support transactions. It is important to note that all data in SimpleDB is
replicated for redundancy, just as in GFS. Because of replication, SimpleDB features an
'eventual consistency' model, wherein data is guaranteed to be propagated to at least one replica
and will eventually reach all replicas, albeit with some delay. This can result in perceived
inconsistency, since an immediate read following a write may not always yield the result written.
In the case of the Google Datastore, on the other hand, writes succeed only when all replicas are
updated; this avoids inconsistency but also makes writes slower.
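
The difference between the two write disciplines can be made concrete with a toy replica store
(a deliberately simplified sketch, not either system's actual replication protocol).

```python
# Toy eventually-consistent store: a write is acknowledged after reaching
# one replica and propagates to the others later.
class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []

    def write(self, key, value):
        self.replicas[0][key] = value       # acknowledged after first replica
        self.pending.append((key, value))   # others updated later

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)

    def propagate(self):
        # Background replication catching the other replicas up.
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value
        self.pending = []
```

A read against a replica that has not yet been caught up returns stale data, which is exactly the
perceived inconsistency described above; a write-all-replicas discipline, as in the Datastore,
avoids this at the cost of slower writes.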
PAAS CASE STUDY: FACEBOOK

Facebook provides some PaaS capabilities to application developers:--

 Web services: remote APIs that allow access to social network properties, data, the Like
button, etc.

 Many third parties run their apps off Amazon EC2 and interface to Facebook via its APIs:
PaaS combined with IaaS.

 Facebook itself makes heavy use of PaaS services for its own private cloud.

 Key problems: how to analyze logs, make suggestions, and determine which ads to place.

Facebook API: Overview:--

What you can do:

 Read data from profiles and pages

 Navigate the graph (e.g., via friends lists)

 Issue queries (for posts, people, pages, ...)

Facebook API: The Graph API:

{
  "id": "1074724712",
  "age_range": {
    "min": 21
  },
  "locale": "en_US",
  "location": {
    "id": "101881036520836",
    "name": "Philadelphia, Pennsylvania"
  }
}
- Requests are mapped directly to HTTP:
  https://graph.facebook.com/(identifier)?fields=(fieldList)
- Response is in JSON
- Uses several HTTP methods: GET for reading, POST for adding or modifying, DELETE for removing
- IDs can be numeric or names: /1074724712 or /andreas.haeberlen; pages also have IDs
- Authorization is via 'access tokens': opaque strings that encode specific permissions (e.g., access to the user's location but not interests); each token has an expiration date, so it may need to be refreshed
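The request mapping above can be sketched in a few lines of Python. The field names and token are placeholders; a real call also requires a valid access token issued by Facebook:

```python
# Sketch: assemble a Graph API request URL of the form
#   https://graph.facebook.com/(identifier)?fields=(fieldList)
from urllib.parse import urlencode

def graph_url(identifier, fields, access_token=None):
    """Build the GET URL for reading an object's fields."""
    params = {"fields": ",".join(fields)}
    if access_token:
        params["access_token"] = access_token  # opaque permission string
    return "https://graph.facebook.com/%s?%s" % (identifier, urlencode(params))

# IDs can be numeric or names, as noted above:
print(graph_url("1074724712", ["id", "locale", "location"]))
```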


Facebook Data Management / Warehousing Tasks

Main tasks for "cloud" infrastructure:

- Summarization (daily, hourly)
  - to help guide development of different components
  - to report on ad performance
  - to drive recommendations
- Ad hoc analysis: answer questions on historical data, to help with managerial decisions
- Archival of logs
- Spam detection
- Ad optimization
- Initially used Oracle DBMS for this, but eventually hit scalability, cost, and performance bottlenecks, just as Salesforce does now
Data Warehousing at Facebook:

PaaS at Facebook:

- Scribe: open-source logging; actually records the data that will be analyzed by Hadoop
- Hadoop (MapReduce, discussed next time) as the batch processing engine for data analysis
- As of 2009: the 2nd largest Hadoop cluster in the world, with 2,400 cores and more than 2 PB of data, growing by more than 10 TB every day
- Hive: SQL over Hadoop, used to write the data analysis queries
- Federated MySQL and Oracle: multi-machine DBMSs that store query results
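The Scribe-to-Hadoop pipeline described above boils down to batch aggregation over log records. The counting pattern that Hadoop's MapReduce applies can be sketched in plain Python; the log fields below are invented for illustration:

```python
from collections import defaultdict

# Toy ad-event records of the kind Scribe might collect (fields invented).
logs = [
    {"ad_id": "a1", "event": "click"},
    {"ad_id": "a1", "event": "view"},
    {"ad_id": "a2", "event": "view"},
    {"ad_id": "a1", "event": "click"},
]

# Map phase: emit (key, 1) for every record.
def map_phase(records):
    for r in records:
        yield (r["ad_id"], r["event"]), 1

# Reduce phase: sum counts per key (Hadoop shuffles pairs by key first).
def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return dict(counts)

summary = reduce_phase(map_phase(logs))
print(summary[("a1", "click")])  # 2
```

In the real system the map and reduce phases run on thousands of machines over terabytes of logs, but the per-key aggregation logic is the same.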

Example Use Case 1: Ad Details

- Advertisers need to see how their ads are performing
  - Cost-per-click (CPC), cost-per-1000-impressions (CPM)
  - Social ads: include info from friends
  - Engagement ads: interactive, with video
- Performance numbers given: number of unique users, clicks, video views, ...
- Main axes: account, campaign, ad; time period; type of interaction; users
- Summaries are computed using Hadoop via Hive

Use Case 2: Ad Hoc Analysis, Feedback

- Engineers and product managers may need to understand what is going on, e.g., the impact of a new change on some sub-population
- Again Hive-based, i.e., queries are in SQL with database joins
  - Combine data from several tables, e.g., click-through rate = views combined with clicks
- Sometimes requires custom analysis code with sampling
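The click-through-rate example amounts to joining a views table with a clicks table on the ad id. A minimal sketch, with invented numbers:

```python
# Views and clicks per ad id (illustrative data; in practice these would
# be two Hive tables joined in SQL).
views  = {"a1": 1000, "a2": 500}
clicks = {"a1": 30, "a2": 5}

# Click-through rate (CTR) = clicks / views, joined on ad id.
ctr = {ad: clicks.get(ad, 0) / v for ad, v in views.items() if v}
print(ctr["a1"])  # 0.03
```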

CONCLUSION:

Cloud computing remains the number one hype topic within the IT industry at present. Our
evaluation of the Google App Engine and Facebook has shown both the functionality and the limitations
of the platform. Developing and deploying an application within the GAE is in fact quite easy
and in a way shows the progress that software development and deployment have made. Within
our application we were able to use the abstractions provided by the GAE without problems,
although the concept of Bigtable requires a big change in mindset when developing. Our
scalability testing showed the limitations of the GAE at this point in time. Although built-in
scalability is an extremely helpful feature and a great USP for the GAE, it suffers from both
purposely set and technical restrictions at the moment. Coming back to our motivation of
evaluating the GAE in terms of its sufficiency for serious large-scale applications in a
professional environment, we have to conclude that the GAE does not (yet) fulfill business
needs for enterprise applications.

*****
Subject: Lab I- Cloud Computing

Experiment No. 4

Aim: AWS Case Study: Amazon.com.

Theory: About AWS

- Launched in 2006, Amazon Web Services (AWS) began exposing key infrastructure services to businesses in the form of web services, now widely known as cloud computing.
- The ultimate benefit of cloud computing, and AWS, is the ability to leverage a new business model and turn capital infrastructure expenses into variable costs.
- Businesses no longer need to plan and procure servers and other IT resources weeks or months in advance.
- Using AWS, businesses can take advantage of Amazon's expertise and economies of scale to access resources when their business needs them, delivering results faster and at a lower cost.
- Today, Amazon Web Services provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.


Amazon.com is the world's largest online retailer. In 2011, Amazon.com switched
from tape backup to using Amazon Simple Storage Service (Amazon S3) for backing
up the majority of its Oracle databases. This strategy reduces complexity and capital
expenditures, provides faster backup and restore performance, eliminates tape capacity
planning for backup and archive, and frees up administrative staff for higher-value
operations. The company was able to replace its backup tape infrastructure with
cloud-based Amazon S3 storage, eliminate backup software, and achieve a 12x
performance improvement, reducing restore time from around 15 hours to 2.5 hours
in select scenarios.

With data center locations in the U.S., Europe, Singapore, and Japan, customers across all
industries
are taking advantage of the following benefits:

 Low Cost

 Agility and Instant Elasticity

 Open and Flexible

 Secure

The Challenge

As Amazon.com grows larger, the sizes of their Oracle databases continue to grow, and so does
the sheer number of databases they maintain. This has caused growing pains related to backing
up legacy Oracle databases to tape and led to the consideration of alternate strategies including
the use of Cloud services of Amazon Web Services (AWS), a subsidiary of Amazon.com. Some
of the business challenges Amazon.com faced included:

 Utilization and capacity planning is complex, and time and capital expense budget are at a
premium. Significant capital expenditures were required over the years for tape hardware,
data center space for this hardware, and enterprise licensing fees for tape software. During
that time, managing tape infrastructure required highly skilled staff to spend time with setup,
certification and engineering archive planning instead of on higher value projects. And at the
end of every fiscal year, projecting future capacity requirements required time consuming
audits, forecasting, and budgeting. 

 The cost of backup software required to support multiple tape devices sneaks up on you.
Tape robots provide basic read/write capability, but in order to fully utilize them, you must
invest in proprietary tape backup software. For Amazon.com, the cost of the software had
been high, and added significantly to overall backup costs. The cost of this software was an
ongoing budgeting pain point, but one that was difficult to address as long as backups needed
to be written to tape devices.
 Maintaining reliable backups and being fast and efficient when retrieving data requires a lot
of time and effort with tape. When data needs to be durably stored on tape, multiple copies
are required. When everything is working correctly, and there is minimal contention for tape
resources, the tape robots and backup software can easily find the required data. However, if
there is a hardware failure, human intervention is necessary to restore from tape. Contention
for tape drives resulting from multiple users' tape requests slows down restore processes
even more. This adds to the recovery time objective (RTO) and makes achieving it more
challenging compared to backing up to Cloud storage.

Why Amazon Web Services?

Amazon.com initiated the evaluation of Amazon S3 for economic and performance
improvements related to data backup. As part of that evaluation, they considered security,
availability, and performance aspects of Amazon S3 backups. Amazon.com also executed a
cost-benefit analysis to ensure that a migration to Amazon S3 would be financially worthwhile.
That cost-benefit analysis included the following elements:

 Performance advantage and cost competitiveness. It was important that the overall costs of
the backups did not increase. At the same time, Amazon.com required faster backup and
recovery performance. The time and effort required for backup and for recovery operations
proved to be a significant improvement over tape, with restoring from Amazon S3 running
from two to twelve times faster than a similar restore from tape. Amazon.com required any
new backup medium to provide improved performance while maintaining or reducing overall
costs. Backing up to on-premises disk based storage would have improved performance, but
missed on cost competitiveness. Amazon S3 Cloud based storage met both criteria. 

 Greater durability and availability. Amazon S3 is designed to provide 99.999999999%
durability and 99.99% availability of objects over a given year. Amazon.com compared these
figures with those observed from their tape infrastructure, and determined that Amazon S3
offered significant improvement.

 Less operational friction. Amazon.com DBAs had to evaluate whether Amazon S3 backups
would be viable for their database backups. They determined that using Amazon S3 for
backups was easy to implement because it worked seamlessly with Oracle RMAN. 
 Strong data security. Amazon.com found that AWS met all of their requirements for physical
security, security accreditations, and security processes, protecting data in flight, data at rest,
and utilizing suitable encryption standards. 

The Benefits

With the migration to Amazon S3 well along the way to completion, Amazon.com has realized
several benefits, including:

 Elimination of complex and time-consuming tape capacity planning. Amazon.com is
growing larger and more dynamic each year, both organically and as a result of acquisitions.
AWS has enabled Amazon.com to keep pace with this rapid expansion, and to do so seamlessly.
Historically, Amazon.com business groups have had to write annual backup plans,
quantifying the amount of tape storage that they plan to use for the year and the frequency
with which they will use the tape resources. These plans are then used to charge each
organization for their tape usage, spreading the cost among many teams. With Amazon S3,
teams simply pay for what they use, and are billed for their usage as they go. There are
virtually no upper limits on how much data can be stored in Amazon S3, so there are
no worries about running out of resources. For teams adopting Amazon S3 backups, the need
for formal planning has been all but eliminated.

 Reduced capital expenditures. Amazon.com no longer needs to acquire tape robots, tape
drives, tape inventory, data center space, networking gear, enterprise backup software, or
predict future tape consumption. This eliminates the burden of budgeting for capital
equipment well in advance as well as the capital expense.

 Immediate availability of data for restoring – no need to locate or retrieve physical tapes.
Whenever a DBA needs to restore data from tape, they face delays. The tape backup software
needs to read the tape catalog to find the correct files to restore, locate the correct tape, mount
the tape, and read the data from it. In almost all cases the data is spread across multiple tapes,
resulting in further delays. This, combined with contention for tape drives resulting from
multiple users' tape requests, slows the process down even more. This is especially severe
during critical events such as a data center outage, when many databases must be restored
simultaneously and as soon as possible. None of these problems occur with Amazon S3. Data
restores can begin immediately, with no waiting or tape queuing – and that means the
database can be recovered much faster.

 Backing up a database to Amazon S3 can be two to twelve times faster than with tape drives.
As one example, in a benchmark test a DBA was able to restore 3.8 terabytes in 2.5 hours
over gigabit Ethernet. This amounts to 25 gigabytes per minute, or 422MB per second. In
addition, since Amazon.com uses RMAN data compression, the effective restore rate was
3.37 gigabytes per second. This 2.5 hours compares to, conservatively, 10-15 hours that
would be required to restore from tape.
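The throughput figures quoted above can be checked with a little arithmetic (using decimal units, 1 TB = 1000 GB):

```python
# Verify the benchmark numbers: 3.8 TB restored in 2.5 hours.
data_gb = 3.8 * 1000          # 3.8 terabytes, in gigabytes
minutes = 2.5 * 60            # 2.5 hours, in minutes

gb_per_min = data_gb / minutes          # restore rate per minute
mb_per_sec = gb_per_min * 1000 / 60     # restore rate per second

print(round(gb_per_min, 1))   # 25.3 GB per minute
print(round(mb_per_sec))      # 422 MB per second
```

The quoted "25 gigabytes per minute, or 422 MB per second" thus holds up, with the per-minute figure rounded down slightly in the text.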

 Easy implementation of Oracle RMAN backups to Amazon S3. The DBAs found it easy to
start backing up their databases to Amazon S3. Directing Oracle RMAN backups to Amazon
S3 requires

only a configuration of the Oracle Secure Backup Cloud (SBC) module. The effort required
to configure the Oracle SBC module amounted to an hour or less per database. After this one-
time setup, the database backups were transparently redirected to Amazon S3.

 Durable data storage provided by Amazon S3, which is designed for 11 nines durability. On
occasion, Amazon.com has experienced hardware failures with tape infrastructure – tapes
that break, tape drives that fail, and robotic components that fail. Sometimes this happens
when a DBA is trying to restore a database, and dramatically increases the mean time to
recover (MTTR). With the durability and availability of Amazon S3, these issues are no
longer a concern.

 Freeing up valuable human resources. With tape infrastructure, Amazon.com had to seek out
engineers who were experienced with very large tape backup installations – a specialized,
vendor-specific skill set that is difficult to find. They also needed to hire data center
technicians and dedicate them to problem-solving and troubleshooting hardware issues –
replacing drives, shuffling tapes around, shipping and tracking tapes, and so on. Amazon S3
allowed them to free up these specialists from day-to-day operations so that they can work on
more valuable, business-critical engineering tasks. 

 Elimination of physical tape transport to an off-site location. Any company that has been storing
Oracle backup data offsite should take a hard look at the costs involved in transporting,
securing, and storing their tapes offsite; these costs can be reduced or possibly eliminated
by storing the data in Amazon S3.

As the world's largest online retailer, Amazon.com continuously innovates in order to provide
an improved customer experience and offer products at the lowest possible prices. One such
innovation has been to replace tape with Amazon S3 storage for database backups. This
innovation is one that can be easily replicated by other organizations that back up their Oracle
databases to tape.

Products & Services

Compute
Content Delivery
Database
Deployment & Management
E-Commerce
Messaging
Monitoring
Networking
Payments & Billing
Storage
Support
Web Traffic
Workforce
Products & Services

Compute
›Amazon Elastic Compute Cloud (EC2)

Amazon Elastic Compute Cloud delivers scalable, pay-as-you-go compute capacity in the cloud.
›Amazon Elastic MapReduce

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data
analysts, and developers to easily and cost-effectively process vast amounts of data.
›Auto Scaling
Auto Scaling allows us to automatically scale our Amazon EC2 capacity up or down according
to conditions we define.
Content Delivery
›Amazon CloudFront
Amazon CloudFront is a web service that makes it easy to distribute content with low latency
via a global network of edge locations.
Database
›Amazon SimpleDB
Amazon SimpleDB works in conjunction with Amazon S3 and Amazon EC2 to run queries
on structured data in real time.
›Amazon Relational Database Service (RDS)
Amazon Relational Database Service is a web service that makes it easy to set up, operate,
and scale a relational database in the cloud.
›Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-
memory cache in the cloud.
E-Commerce
›Amazon Fulfillment Web Service (FWS)
Amazon Fulfillment Web Service allows merchants to deliver products using Amazon.com's
worldwide fulfillment capabilities.
Deployment & Management

›AWS Elastic Beanstalk

AWS Elastic Beanstalk is an even easier way to quickly deploy and manage applications in
the AWS cloud. We simply upload our application, and Elastic Beanstalk automatically
handles the deployment details of capacity provisioning, load balancing, auto-scaling, and
application health monitoring.
›AWS CloudFormation
AWS CloudFormation is a service that gives developers and businesses an easy way to create
a collection of related AWS resources and provision them in an orderly and predictable
fashion.
Monitoring
›Amazon CloudWatch
Amazon CloudWatch is a web service that provides monitoring for AWS cloud resources,
starting with Amazon EC2
Messaging
›Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service provides a hosted queue for storing messages as they travel
between computers, making it easy to build automated workflow between Web services.
›Amazon Simple Notification Service (SNS)
Amazon Simple Notification Service is a web service that makes it easy to set up, operate,
and send notifications from the cloud.
›Amazon Simple Email Service (SES)
Amazon Simple Email Service is a highly scalable and cost-effective bulk and transactional
email-sending service for the cloud.

Workforce
›Amazon Mechanical Turk
Amazon Mechanical Turk enables companies to access thousands of global workers on
demand and programmatically integrate their work into various business processes.

Networking
›Amazon Route 53
Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service.
›Amazon Virtual Private Cloud (VPC)
Amazon Virtual Private Cloud (Amazon VPC) lets us provision a private, isolated section
of the Amazon Web Services (AWS) Cloud where we can launch AWS resources in a virtual
network that we define. With Amazon VPC, we can define a virtual network topology that
closely resembles a traditional network that we might operate in our own datacenter.
›AWS Direct Connect
AWS Direct Connect makes it easy to establish a dedicated network connection from our
premises to AWS, which in many cases can reduce our network costs, increase bandwidth
throughput, and provide a more consistent network experience than Internet-based connections.
›Elastic Load Balancing
Elastic Load Balancing automatically distributes incoming application traffic across multiple
Amazon EC2 instances.

Payments & Billing


›Amazon Flexible Payments Service (FPS)
Amazon Flexible Payments Service facilitates the digital transfer of money between any two
entities, humans or computers.
›Amazon DevPay
Amazon DevPay is a billing and account management service which enables developers to
collect payment for their AWS applications.
Storage
›Amazon Simple Storage Service (S3)
Amazon Simple Storage Service provides a fully redundant data storage infrastructure for
storing and retrieving any amount of data, at any time, from anywhere on the Web.
›Amazon Elastic Block Store (EBS)

Amazon Elastic Block Store provides block level storage volumes for use with
Amazon EC2 instances. Amazon EBS volumes are off-instance storage that persists
independently from the life of an instance.
›AWS Import/Export
AWS Import/Export accelerates moving large amounts of data into and out of AWS using
portable storage devices for transport.

Support
›AWS Premium Support AWS Premium Support is a one-on-one, fast-response support
channel to help you build and run applications on AWS Infrastructure Services.
Web Traffic
›Alexa Web Information Service
Alexa Web Information Service makes Alexa's huge repository of data about structure and
traffic patterns on the Web available to developers.
›Alexa Top Sites
Alexa Top Sites exposes global website traffic data as it is continuously collected and
updated by Alexa Traffic Rank.

Amazon CloudFront

 Amazon CloudFront is a web service for content delivery.

 It integrates with other Amazon Web Services to give developers and businesses an easy
way to distribute content to end users with low latency, high data transfer speeds, and no
commitments.

 Amazon CloudFront delivers our static and streaming content using a global network of
edge locations.

 Requests for our objects are automatically routed to the nearest edge location, so content
is delivered with the best possible performance.

 Amazon CloudFront is optimized to work with other Amazon Web Services, like Amazon
Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2).

 Amazon CloudFront also works seamlessly with any origin server, which stores the
original, definitive versions of our files.

 Like other Amazon Web Services, there are no contracts or monthly commitments for
using Amazon CloudFront; we pay only for as much or as little content as we actually
deliver through the service.

Amazon Simple Queue Service (Amazon SQS)

 Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable, hosted
queue for storing messages as they travel between computers.

 By using Amazon SQS, developers can simply move data between distributed
components of their applications that perform different tasks, without losing messages or
requiring each component to be always available.

 Amazon SQS makes it easy to build an automated workflow, working in close
conjunction with Amazon Elastic Compute Cloud (Amazon EC2) and the other AWS
infrastructure web services.

 Amazon SQS works by exposing Amazon's web-scale messaging infrastructure as a web
service.

 Any computer on the Internet can add or read messages without any installed software or
special firewall configurations.

 Components of applications using Amazon SQS can run independently, and do not need
to be on the same network, developed with the same technologies, or running at the same
time.
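The decoupling SQS provides can be imitated locally with Python's standard-library queue: producer and consumer components never interact directly, only through the queue. This is a local stand-in to show the pattern, not the SQS API itself:

```python
import queue

# A local stand-in for an SQS queue: producers enqueue messages and
# move on; consumers dequeue them later, with no direct coupling.
q = queue.Queue()

# Producer component: enqueue work items (message bodies invented).
for msg in ["resize image 1", "resize image 2"]:
    q.put(msg)

# Consumer component: could run later, elsewhere, in another language.
processed = []
while not q.empty():
    processed.append(q.get())
    q.task_done()

print(processed)  # ['resize image 1', 'resize image 2']
```

With real SQS the queue lives in AWS rather than in one process, so the producer and consumer can be on different machines and neither needs to be available when the other runs.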

BigTable

 Bigtable is a distributed storage system for managing structured data that is designed to
scale to a very large size: petabytes of data across thousands of commodity servers.

 Many projects at Google store data in Bigtable, including web indexing, Google Earth,
and Google Finance.

 These applications place very different demands on Bigtable, both in terms of data size
(from URLs to web pages to satellite imagery) and latency requirements (from backend bulk
processing to real-time data serving).

 Despite these varied demands, Bigtable has successfully provided a flexible, high-
performance solution for all of these Google products.
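Bigtable's data model is commonly described as a sparse, sorted map indexed by (row key, column, timestamp). A minimal in-memory sketch of that map, with row and column names invented for illustration:

```python
# (row, column, timestamp) -> value, as in the Bigtable data model.
# Cells are versioned by timestamp; reads typically want the latest.
table = {}

def put(row, col, ts, value):
    table[(row, col, ts)] = value

def get_latest(row, col):
    """Return the value with the highest timestamp for (row, col)."""
    versions = {ts: v for (r, c, ts), v in table.items()
                if r == row and c == col}
    return versions[max(versions)] if versions else None

# Two versions of a crawled page, keyed by crawl time:
put("com.example/index", "contents:", 1, "<html>v1</html>")
put("com.example/index", "contents:", 2, "<html>v2</html>")
print(get_latest("com.example/index", "contents:"))  # <html>v2</html>
```

The real system distributes this map across many tablet servers and stores it in GFS; the sketch only shows the logical indexing scheme.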

The Google File System (GFS)

 The Google File System (GFS) is designed to meet the rapidly growing demands of
Google's data processing needs.

 GFS shares many of the same goals as previous distributed file systems, such as
performance, scalability, reliability, and availability.

 It provides fault tolerance while running on inexpensive commodity hardware, and it
delivers high aggregate performance to a large number of clients.

 While sharing many of the same goals as previous distributed file systems, GFS
has successfully met Google's storage needs.

 It is widely deployed within Google as the storage platform for the generation and
processing of data used by Google's services, as well as by research and development
efforts that require large data sets.

 The largest cluster to date provides hundreds of terabytes of storage across thousands
of disks on over a thousand machines, and it is concurrently accessed by hundreds of
clients.

Conclusion:

Thus, we have studied a case study on Amazon Web Services.

*****
