Cloud Elasticity and Scaling Strategies

Resource Provisioning, Load Balancing and Security

UNIT 5 SCALING

Structure

5.0 Introduction
5.1 Objectives
5.2 Cloud Elasticity
5.3 Scaling Primitives
5.4 Scaling Strategies
5.4.1 Proactive Scaling
5.4.2 Reactive Scaling
5.4.3 Combinational Scaling
5.5 Auto Scaling in Cloud
5.6 Types of Scaling
5.6.1 Vertical Scaling or Scaling Up
5.6.2 Horizontal Scaling or Scaling Out
5.7 Summary
5.8 Solutions/Answers
5.9 Further Readings

5.0 INTRODUCTION

In the earlier unit we studied resource pooling, sharing and provisioning in
cloud computing. In this unit let us study other important characteristic
features of cloud computing – Cloud Elasticity and Scaling.

Scalability in cloud computing refers to the flexibility of allocating IT
resources as per demand. Applications running on cloud instances experience
variable traffic loads, and hence the need for scaling arises. The needs of
such applications can be of different types, such as CPU allocation, memory
expansion, storage and networking requirements. Virtual machines are one of
the best ways to address these different requirements and achieve scaling.
Each virtual machine is equipped with a minimum set of configurations for
CPU, memory and storage, and can be reconfigured as and when required to meet
the traffic load and deliver better performance for the target load. Since it
is often difficult for administrators to manage such on-demand configuration
manually, auto scaling techniques play an important role.

In this unit we will focus on the various methods and algorithms used in the
process of scaling. We will discuss various types of scaling, their usage and a
few examples. We will also discuss how cloud scaling saves cost and manual
effort in highly dynamic situations. The suitability of scaling techniques in
different scenarios is also discussed in detail.

Understanding the elasticity property of the cloud is important for studying
scaling in cloud computing. Cloud Elasticity is discussed in the next section.

5.1 OBJECTIVES

After going through this unit you should be able to:

• understand the concept of cloud elasticity and its importance;
• list the advantages of cloud elasticity and some use cases;
• describe scaling and its advantages;
• understand the different scaling techniques;
• learn about the scaling up and down approaches;
• understand the basics of auto scaling; and
• compare proactive and reactive scaling.

5.2 CLOUD ELASTICITY

Cloud Elasticity is the property of a cloud to grow or shrink its capacity for
CPU, memory, and storage resources to adapt to the changing demands of an
organization. Cloud Elasticity can be automatic, without the need to perform
capacity planning in advance, or it can be a manual process where the
organization is notified that it is running low on resources and can then
decide to add or reduce capacity. Monitoring tools offered by the cloud
provider dynamically adjust the resources allocated to an organization
without impacting existing cloud-based operations.

The extent of a cloud provider's elasticity is gauged by its capability to
autonomously scale resources in response to workload fluctuations, alleviating
the need for constant resource monitoring by IT administrators. This proactive
provisioning and deprovisioning of CPU, memory, and storage resources aligns
closely with demand, avoiding surplus capacity or resource shortages.

Cloud Elasticity, often linked with horizontal scaling architecture, is
commonly associated with pay-as-you-go models offered by public cloud
providers. This approach enables real-time adjustments in cloud expenses by
spinning up or down virtual machines based on fluctuating demand for specific
applications or services.

This flexibility empowers businesses and IT organizations to seamlessly
address unexpected surges in demand without the need for idle backup
equipment. Leveraging Cloud Elasticity allows organizations to 'cloudburst',
shifting operations to the cloud when demand peaks and returning to on-
premises setups once demand subsides. Ultimately, Cloud Elasticity results in
substantial savings, reducing infrastructure costs, human resource allocation,
and overall IT expenses.

5.2.1 Importance of Cloud Elasticity

In the absence of Cloud Elasticity, organizations would face paying for largely
unused capacity and handling the ongoing management and upkeep of that
capacity, including tasks like OS upgrades, patching, and addressing
component failures. Cloud Elasticity serves as a defining factor in cloud
computing, setting it apart from other models like client-server setups, grid
computing, or traditional infrastructure.

Cloud Elasticity acts as a vital tool for businesses, preventing both over-
provisioning (allocating more IT resources than necessary for current
demands) and under-provisioning (failing to allocate sufficient resources to
meet existing or imminent demands).

Over-provisioning leads to unnecessary spending, wasting valuable capital that
could be better utilized elsewhere. Even within the realm of public cloud
usage, the absence of elasticity could result in thousands of dollars
squandered annually on unused virtual machines (VMs).

Conversely, under-provisioning can result in an inability to meet existing
demand, causing unacceptable delays, dissatisfaction among users, and
ultimately loss of business as customers opt for more responsive
organizations. The lack of Cloud Elasticity thus translates to potential
business losses and significant impacts on the bottom line.

5.2.2 How does it Work?

Cloud Elasticity empowers organizations to swiftly adjust their capacity,
either automatically or manually, by scaling up or down. It encompasses the
concept of 'cloud bursting', where on-premises infrastructure extends into the
public cloud, especially to meet sudden or seasonal surges in demand.
Moreover, Cloud Elasticity involves the ability to expand or reduce resources
utilized by cloud-based applications.

This elasticity can be activated automatically, responding to workload
patterns, or initiated manually, often within minutes. Previously, without
the benefits of Cloud Elasticity, organizations had to rely on standby
capacity or go through lengthy processes of ordering, configuring, and
installing additional capacity, which could take weeks or months.

When demand subsides, capacity can be swiftly reduced within minutes.
Consequently, organizations only pay for the resources actively used at any
given time, eliminating the necessity to acquire or retire on-premises
infrastructure to cater to fluctuating demand.

5.2.3 Use Cases of Cloud Elasticity

Common use cases where Cloud Elasticity proves beneficial include:


• Seasonal spikes in retail or e-commerce, notably during holiday periods
like Black Friday through early January.
• Peaks in demand during school district registration, especially in the
spring before the school term starts.
• Businesses experiencing sudden surges due to product launches or viral
social media attention, such as streaming services scaling up resources
for new releases or increased viewership.
• Utilizing public cloud capabilities for Disaster Recovery and Business
Continuity (DR/BC), enabling off-site backups or rapid VM
deployment during on-premises infrastructure outages.
• Scaling virtual desktop infrastructure in the cloud for temporary
workers, contractors, or remote learning applications.
• Temporary scaling of cloud infrastructure for test and development
purposes, dismantling it once testing or development is finished.
• Adapting to unplanned projects with short deadlines.
• Temporary initiatives like data analytics, batch processing, or media
rendering, requiring scalable resources.

5.2.4 Advantages of Cloud Elasticity

The advantages of cloud elasticity encompass:

• Flexibility: By eradicating the need for purchasing, configuring, and
installing new infrastructure during demand fluctuations, Cloud
Elasticity eliminates the necessity to anticipate unexpected spikes in
demand. This empowers organizations to readily address unforeseen
surges, whether triggered by seasonal peaks, mentions on platforms like
Reddit, or endorsements from influential sources like Oprah's book
club.

• Usage-based Pricing: Unlike paying for idle infrastructure, Cloud
Elasticity enables organizations to pay exclusively for actively utilized
resources. This approach closely aligns IT expenses with real-time
demand, allowing organizations to optimize their infrastructure size
dynamically. Amazon asserts that adopting its instance scheduler with
the EC2 cloud service can yield savings exceeding 60% compared to
non-adopters.
• High Availability: Cloud Elasticity fosters both high availability and
fault tolerance by enabling replication of VMs or containers in case of
potential failure. This ensures uninterrupted business services and a
consistent user experience, even amidst automatic provisioning or
deprovisioning, preserving operational continuity.

• Efficiency: Automation of resource adjustments liberates IT personnel
from manual provisioning tasks, enabling them to focus on projects that
significantly benefit the organization.

• Accelerated Time-to-Market: Access to capacity within minutes, as
opposed to the weeks or months typically required in traditional
procurement processes, expedites organizations' ability to deploy
resources swiftly, thereby enhancing their speed-to-market.

Now that we have understood cloud elasticity and its underlying concepts, let
us study scaling in the next section.

5.3 SCALING PRIMITIVES

The basic purpose of scaling is to enable one to use cloud computing
infrastructure only as much as required by the application. Here, cloud
resources are added or removed according to the current needs of the
application. The property of enhancing or reducing resources in the cloud is
referred to as cloud elasticity, which we studied in the earlier section;
scaling exploits this elastic property. The scalability of cloud architecture
is achieved using virtualization (see Unit 3: Resource Virtualization).
Virtualization uses virtual machines (VMs) for enhancing (upscaling) and
reducing (downscaling) computing power. Scaling provides opportunities to
grow businesses on a more secure, available and need-based computing/storage
facility on the cloud. Scaling also helps small to medium enterprises
optimize the financial outlay for highly resource-bound applications.

The key advantages of cloud scaling are:

• Minimum cost: The user has to pay only for the actual usage of
hardware after upscaling. The purchase cost of hardware at the same
scale can be much greater than the cost paid by the user, and
maintenance and other overheads are not included here either. Further,
as and when the resources are no longer required, they may be returned
to the service provider, resulting in cost savings.

• Ease of use: Cloud upscaling and downscaling can be done in just a
few minutes (sometimes dynamically) by using the service provider's
application interface.

• Flexibility: Users have the flexibility to enable/disable certain
VMs for upscaling and downscaling themselves, thus saving the
configuration/installation time that new hardware would require if
purchased separately.

• Recovery: The cloud environment itself reduces the chance of disaster
and amplifies the recovery of information stored in the cloud.

The scalability of clouds aims to optimize the utilization of various
resources under varying workload conditions, avoiding both under-provisioning
and over-provisioning of resources. In non-cloud environments resource
utilization is a major concern, as one has no control over scaling. Various
methods exist in the literature for scaling a traditional environment: in
general, a peak is forecasted and the infrastructure is set up in advance
accordingly. This kind of scaling experiences high latency and requires
manual monitoring. The drawback of such a setup is crucial in nature, as the
estimate of the maximum load may err at either end, yielding either an
over-provisioned or a poorly configured system.

Figure 1: Manual scaling in a traditional environment (cost and workload over time)


Figure 2: Semi-automatic scaling in a cloud environment (cost and workload over time)

In the case of clouds, virtual environments are utilized for resource
allocation. These virtual machines enable clouds to be elastic in nature and
can be configured according to the workload of the applications in real time.
In such scenarios, downtime is minimized and scaling is easy to achieve.

Scaling also saves the cost of hardware setup for short-lived peaks or dips in
load. In general, most cloud service providers provide scaling as a process
for free and charge only for the additional resources used. Scaling is a
common service provided by almost all cloud platforms.

When resources are scaled down in cloud computing, users experience
substantial cost savings due to the pay-as-you-go model inherent in the cloud.
Scaling down entails reducing allocated resources such as CPU, memory, or
storage to match the current demand, ensuring that users only pay for what
they actively use. This optimization results in reduced expenditure on unused
or underutilized resources, aligning expenses more closely with actual
consumption. Additionally, by efficiently managing resource allocation and
avoiding over-provisioning, users benefit from a cost-effective approach that
minimizes unnecessary expenses, thereby optimizing their overall spending
within the cloud environment.

5.4 SCALING STRATEGIES

Let us now see what the strategies for scaling are, how one can achieve
scaling in a cloud environment, and what its types are. In general, scaling is
categorized based on the decision taken to trigger it. The three main
strategies for scaling are discussed below.

5.4.1 Proactive Scaling

Consider a scenario when a huge surge in traffic is expected on one of the
applications in the cloud. In this situation proactive scaling is used to
cater to the load. Proactive scaling can also be pre-scheduled according to
the expected traffic and demand. This requires an understanding of the
traffic flow in advance to utilize maximum resources; wrong estimates
generally lead to poor resource management. Prior knowledge of the load helps
in better provisioning of the cloud, so that minimum lag is experienced by
the end users when the sudden load arrives. Figure 3 shows the resource
provision as load increases with time.

Figure 3: Proactive Scaling (load over time of day)
5.4.2 Reactive Scaling

Reactive scaling monitors the workload and enables smooth adaptation to
workload changes at minimum cost. It empowers users to scale computing
resources up or down rapidly. In simple words, when hardware such as CPU or
RAM or any other resource reaches its highest utilization, more of that
resource is added to the environment by the service provider. Auto scaling
works on the policies defined by the users/resource managers for traffic and
scaling. One major concern with reactive scaling is a quick change in load,
i.e. users experience lags while the infrastructure is being scaled. Figure 4
shows the resource provision in reactive scaling.

Figure 4: Reactive Scaling (load over time of day)
5.4.3 Combinational Scaling

Till now we have seen need-based and forecast-based techniques for scaling.
However, for better performance and a low cooldown period we can also combine
the reactive and proactive scaling strategies when we have some prior
knowledge of traffic. This helps us schedule timely scaling for the expected
load, while load-based scaling remains available for demand beyond the
predicted load on the application. This way both sudden and expected traffic
surges are addressed.

The following Table 1 shows the comparison between proactive and reactive
scaling strategies.

Table 1: Proactive Scaling Vs Reactive Scaling

Parameters        Proactive Scaling                Reactive Scaling

Suitability       For applications whose load      For applications whose load
                  increases in an expected/        increases in an unexpected/
                  known manner                     unknown manner

Working           The user sets the threshold,     User-defined threshold values
                  but a downtime is required       optimize the resources

Cost Reduction    Medium cost reduction            Medium cost reduction

Implementation    A few steps required             A fixed number of steps
                                                   required

Check Your Progress 1

1. Explain the importance of scaling in cloud computing.
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………

2. How is proactive scaling achieved through virtualization?
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………

3. Write differences between combinational and reactive scaling.

…………………………………………………………………………

…………………………………………………………………………

…………………………………………………………………………

4. List the differences between Cloud Elasticity and Scaling.

…………………………………………………………………………

…………………………………………………………………………

…………………………………………………………………………

5.5 AUTO SCALING IN CLOUD

One of the potential risks in scaling a cloud infrastructure lies in the
magnitude of scaling. If we scale down to a very low level, it adversely
affects throughput and latency; the resulting high latency degrades the
user's experience and causes dissatisfaction among users. On the other hand,
if we scale up the cloud infrastructure to a large extent, resources are not
optimized and costs rise heavily, defeating the whole purpose of cost
optimization.

In a cloud, auto scaling can be achieved using user-defined policies, various
machine health checks and schedules. Parameters such as request counts, CPU
usage and latency are the key inputs for decision making in autoscaling. A
policy here refers to the instruction set for the cloud in a particular
scenario (for scaling up or scaling down). Autoscaling in the cloud is done
on the basis of the following parameters:

1. The number of instances required to scale.

2. An absolute number or a percentage (of the current capacity).

The process of auto scaling also requires some cooldown period for resuming
the services after a scaling takes place. No two concurrent scalings are
triggered, so as to maintain integrity. The cooldown period allows the effect
of an autoscaling action to be reflected in the system within a specified
time interval and avoids integrity issues in the cloud environment.
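The cooldown behaviour described above can be sketched in a few lines of Python. This is an illustrative guard, not any provider's API; the names `CooldownGuard` and `try_scale` are invented for this sketch:

```python
import time

class CooldownGuard:
    """Sketch of a cooldown gate for autoscaling events (hypothetical names).

    After any scaling action, further actions are suppressed for
    `cooldown` seconds, so no two scalings overlap."""

    def __init__(self, cooldown=300.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock          # injectable clock, useful for testing
        self._last = None           # time of the last accepted scaling

    def try_scale(self, action):
        """Run `action` only if the cooldown window has elapsed."""
        now = self.clock()
        if self._last is not None and now - self._last < self.cooldown:
            return False            # still cooling down; event ignored
        action()
        self._last = now
        return True
```

A second scaling request arriving inside the cooldown window simply returns `False`; once the window has elapsed, the next event is accepted.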

Figure 5: Automatic scaling in cloud environments (cost and workload over time)

Consider a more specific scenario: when the resource requirement is high for
some known duration, e.g. holidays or weekends, scheduled scaling can be
performed. Here the time and the scale/magnitude/threshold of scaling are
defined in advance to meet specific requirements, based on previous knowledge
of the traffic. The threshold level is also an important parameter in auto
scaling: a low threshold value results in under-utilization of cloud
resources, while a high threshold results in higher latency in the cloud.
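A scheduled scaling policy can be as simple as a lookup from calendar windows to a desired capacity. The schedule, node counts and function name below are hypothetical, chosen only to illustrate the idea:

```python
from datetime import datetime

# Hypothetical schedule: (set of weekdays, start hour, end hour, desired nodes).
# Monday is weekday 0; Saturday/Sunday are 5 and 6.
SCHEDULE = [
    ({5, 6}, 0, 24, 12),            # weekends: hold 12 nodes all day
    ({0, 1, 2, 3, 4}, 9, 18, 8),    # weekday business hours: 8 nodes
]
DEFAULT_NODES = 4                   # baseline capacity outside known peaks

def desired_capacity(now: datetime) -> int:
    """Scheduled scaling sketch: pick capacity from a predefined calendar."""
    for days, start, end, nodes in SCHEDULE:
        if now.weekday() in days and start <= now.hour < end:
            return nodes
    return DEFAULT_NODES
```

An autoscaler would call `desired_capacity` periodically and resize the pool ahead of the known peak, rather than waiting for the load to arrive.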

If, after adding nodes in a scale-up, the incoming requests per second per
node drops below the scale-down threshold, the alternating scale-up and
scale-down processes thus triggered are known as the ping-pong effect. To
avoid both under-scaling and over-scaling issues, load testing is recommended
to meet the service level agreements (SLAs).

Service Level Agreements (SLAs) in cloud computing outline the terms,
expectations, and commitments between a service provider and users regarding
the quality, availability, and performance of the offered services. These
agreements specify uptime percentages, response times, and support
availability, establishing benchmarks against which the provider's
performance is measured. SLAs ensure reliability and transparency,
guaranteeing users a certain level of service and outlining remedies or
compensations if agreed-upon standards are not met. They serve as vital tools
in fostering trust between cloud service providers and users by delineating
responsibilities and ensuring accountability, thereby maintaining a mutually
beneficial relationship based on defined service expectations.

In addition, the scaling process is required to satisfy the following
properties:

1. After a scale-up, the number of incoming requests per second per node
remains greater than the scale-down threshold.
2. After a scale-down, the number of incoming requests per second per node
remains less than the scale-up threshold.

In both scenarios this reduces the chances of the ping-pong effect.
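The two properties can be checked directly. A small sketch (the function names are my own, not from the text):

```python
def safe_after_scale_up(rps, n_new, t_down):
    """Property 1: after a scale-up, the per-node request rate (RPS_n)
    must stay above the scale-down threshold T_D; otherwise a scale-down
    would trigger immediately (ping-pong)."""
    return rps / n_new > t_down

def safe_after_scale_down(rps, n_new, t_up):
    """Property 2: after a scale-down, the per-node request rate must
    stay below the scale-up threshold T_U."""
    return rps / n_new < t_up
```

For example, going from 4 to 6 nodes at RPS = 1800 with T_D = 200 is safe (RPS_n = 300), while jumping straight to 12 nodes would not be (RPS_n = 150), illustrating how over-scaling invites the ping-pong effect.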

Now we know what scaling is and how it affects the applications hosted on the
cloud. Let us now discuss how auto scaling can be performed by fixed amounts
as well as by a percentage of the current capacity.

Fixed Amount Autoscaling

As discussed earlier, auto scaling can be achieved by changing the number of
instances by a fixed number. The detailed algorithm for fixed amount
autoscaling is given below. The algorithm works for both scaling up and
scaling down, taking inputs U and D respectively.

--------------------------------------------------------------------------------------------
Algorithm 1: Fixed Amount Autoscaling
--------------------------------------------------------------------------------------------
Input: SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down value
U - scale up value
T_U - scale up threshold
T_D - scale down threshold

Let T(SLA) return the maximum incoming requests per second (RPS) per node
for the specific SLA.

T_U ← 0.90 x T(SLA)

T_D ← 0.50 x T_U

Let N_c and RPS_n represent the current number of nodes and the incoming
requests per second per node respectively.

L1: /* scale up (if RPS_n > T_U) */
Repeat:
    N_c_old ← N_c
    N_c ← N_c + U
    RPS_n ← RPS_n x N_c_old / N_c
Until RPS_n ≤ T_U

L2: /* scale down (if RPS_n < T_D) */
Repeat:
    N_c_old ← N_c
    N_c ← max(N_min, N_c - D)
    RPS_n ← RPS_n x N_c_old / N_c
Until RPS_n ≥ T_D or N_c = N_min
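Algorithm 1 can be rendered directly in Python. This is a sketch with invented function and argument names; note that it loops to convergence in a single call, whereas the worked example applies one adjustment per load change:

```python
def fixed_amount_scale(n_c, rps, *, t_u, t_d, u=2, d=2, n_min=1):
    """Fixed amount autoscaling (Algorithm 1 sketch).

    n_c - current number of nodes; rps - total incoming requests/second.
    Adds U nodes while the per-node rate RPS_n exceeds T_U, then removes
    D nodes (never below N_min) while RPS_n falls under T_D.
    Returns the new node count."""
    while rps / n_c > t_u:                    # scale up until RPS_n <= T_U
        n_c += u
    while rps / n_c < t_d and n_c > n_min:    # scale down until RPS_n >= T_D
        n_c = max(n_min, n_c - d)
    return n_c
```

With T_U = 400 and T_D = 200, `fixed_amount_scale(4, 1800, t_u=400, t_d=200)` returns 6 nodes (RPS_n drops from 450 to 300), matching the first scale-up step of the worked example.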

Now, let us discuss how this algorithm works in detail. Let the values of the
parameters be U = 2, D = 2, T_U = 400 and T_D = 200. Suppose in the beginning
RPS = 450 and N_c = 4, so RPS_n = 112.5. Now RPS increases to 1800 and
RPS_n = 450 crosses T_U; in this situation an autoscaling request is
generated, adding U = 2 nodes. The following table lists all the parameters
as per the scale-up requirements.

Nodes      Nodes    RPS         Total    New
(current)  (added)  (required)  nodes    RPS_n
4          0        450         4        112.50
4          2        1800        6        300.00
6          2        2510        8        313.75
8          2        3300        10       330.00
10         2        4120        12       343.33
12         2        5000        14       357.14
Similarly, in the case of scaling down, let initially RPS = 8000 and
N_c = 19. Now RPS is reduced to 6200 and RPS_n approaches T_D; here an
autoscaling request is initiated, deleting D = 2 nodes. The following table
lists all the parameters as per the scale-down requirements.

Nodes      Nodes      RPS         Total    New
(current)  (removed)  (required)  nodes    RPS_n
19         0          8000        19       421.05
19         2          6200        17       364.70
17         2          4850        15       323.33
15         2          3500        13       269.23
13         2          2650        11       240.90
11         2          1900        9        211.11
The tables above show the stepwise increase/decrease in cloud capacity with
respect to the change in load on the application (requests per second per
node).

Percentage Scaling

In the previous section we discussed how scaling up or down is carried out by
a fixed number of nodes. Alternatively, we can scale up or down by a
percentage of the current capacity. This is a more natural way of scaling,
since we are already running at some capacity.

The algorithm given below determines the scale-up and scale-down thresholds
and performs the corresponding percentage-based autoscaling.

-----------------------------------------------------------------------------------------------
Algorithm 2: Percentage Autoscaling
-----------------------------------------------------------------------------------------------
Input: SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down percentage
U - scale up percentage
T_U - scale up threshold
T_D - scale down threshold

Let T(SLA) return the maximum requests per second (RPS) per node for the
specific SLA.

T_U ← 0.90 x T(SLA)

T_D ← 0.50 x T_U

Let N_c and RPS_n represent the current number of nodes and the incoming
requests per second per node respectively.

L1: /* scale up (if RPS_n > T_U) */
Repeat:
    N_c_old ← N_c
    N_c ← N_c + max(1, N_c x U/100)
    RPS_n ← RPS_n x N_c_old / N_c
Until RPS_n ≤ T_U

L2: /* scale down (if RPS_n < T_D) */
Repeat:
    N_c_old ← N_c
    N_c ← max(N_min, N_c - max(1, N_c x D/100))
    RPS_n ← RPS_n x N_c_old / N_c
Until RPS_n ≥ T_D or N_c = N_min
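Algorithm 2 can be sketched in the same way. Here the step size is a percentage of the current pool, rounded down to a whole node but never less than one; this rendering again loops to convergence and uses invented names:

```python
def percentage_scale(n_c, rps, *, t_u, t_d, u_pct=10, d_pct=8, n_min=1):
    """Percentage-based autoscaling (Algorithm 2 sketch).

    Each iteration adds or removes max(1, floor(N_c * pct / 100)) nodes,
    so small pools still move by whole nodes while large pools scale
    in proportion to their current size."""
    while rps / n_c > t_u:                    # scale up until RPS_n <= T_U
        n_c += max(1, n_c * u_pct // 100)
    while rps / n_c < t_d and n_c > n_min:    # scale down until RPS_n >= T_D
        n_c = max(n_min, n_c - max(1, n_c * d_pct // 100))
    return n_c
```

For instance, with T_U = 290 and T_D = 230, `percentage_scale(7, 2190, t_u=290, t_d=230)` returns 8 nodes (RPS_n falls from about 313 to 273.75).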

Let us now understand the working of this algorithm by an example, with
U = 10, D = 8, N_min = 1, T_D = 230 and T_U = 290. At the beginning RPS = 500
and N_c = 6. Now the demand rises, RPS reaches 1695 and RPS_n approaches T_U;
an upscaling is requested, adding max(1, 6 x 10/100) = 1 node. Similarly, in
the case of scaling down, let the initial RPS = 5000 and N_c = 19; RPS
reduces to 3920 and RPS_n falls below T_D, requesting a scale-down and hence
deleting max(1, 19 x 8/100) = 1 node (taking the integer part). The following
table gives the details of the upscaling.

Nodes      Nodes    RPS         Total    New
(current)  (added)  (required)  nodes    RPS_n
6          0        500         6        83.33
6          1        1695        7        242.14
7          1        2190        8        273.75
8          1        2600        9        288.88
9          1        3430        10       343.00
10         1        3940        11       358.18
11         1        4420        12       368.33
12         1        4960        13       381.53
13         1        5500        14       392.85
14         1        5950        15       396.60
The scaling down with the same algorithm is detailed in the table below.

Nodes      Nodes      RPS         Total    New
(current)  (removed)  (required)  nodes    RPS_n
19         0          5000        19       263.15
19         1          3920        18       217.77
18         1          3510        17       206.47
17         1          3200        16       200.00
16         1          2850        15       190.00
15         1          2600        14       185.71
14         1          2360        13       181.53
13         1          2060        12       171.66
12         1          1810        11       164.50
11         1          1500        10       150.00

Comparing Algorithms 1 and 2, the percentage-based approach adjusts capacity
in proportion to the current pool size. In this scenario the utilization of
hardware is higher and the cloud experiences a lower resource footprint.

Check Your Progress 2


1) Explain the concept of fixed amount auto scaling.
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………

2) In Algorithm 1 for fixed amount auto scaling, calculate the values in the
table if U = 3.
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………

3) What is a cool down period?

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………


5.6 TYPES OF SCALING

Let us now discuss the types of scaling, i.e. the ways in which cloud
infrastructure capacity can be enhanced or reduced. In general we scale the
cloud in a vertical or a horizontal way, either by provisioning more powerful
resources or by installing additional resources.

5.6.1 Vertical scaling or scaling up

Vertical scaling in the cloud refers to either scaling up, i.e. enhancing the
computing resources, or scaling down, i.e. reducing the computing resources
of an application. In vertical scaling, the actual number of VMs is constant
but the quantity of resources allocated to each of them is increased or
decreased. No infrastructure is added and the application code is not
changed. Vertical scaling is limited by the capacity of the physical machine
or server running in the cloud. If one has to upgrade the hardware of an
existing cloud environment, this can be achieved with minimal changes.

Figure: An IT resource (a virtual server with two CPUs) is scaled up by
replacing it with a more powerful IT resource with increased capacity (a
virtual server with four CPUs).

5.6.2 Horizontal scaling or scaling out

In horizontal scaling, to meet the user requirements for high availability,
additional resources are added to the cloud environment. Here, the resources
are added or removed as VMs. This includes the addition of storage disks, new
servers for increasing CPU capacity, or the installation of additional RAM,
all working together as a single system. Achieving horizontal scaling
requires minimal downtime. This type of scaling allows one to run distributed
applications in a more efficient manner.

Figure: An IT resource (Virtual Server A) is scaled out by adding more of the
same IT resource (Virtual Servers B and C), drawn from pooled physical
servers on demand.

Another way of maximizing resource utilization is diagonal scaling, which
combines the ideas of both vertical and horizontal scaling. Here, a resource
is scaled up vertically until one hits the physical resource capacity, after
which new resources are added as in horizontal scaling. The newly added
resources can themselves be further scaled vertically.

5.7 SUMMARY

We are now aware of the various types of scaling, the scaling strategies and
their use in real situations. Various cloud service providers such as Amazon
AWS and Microsoft Azure, and IT giants like Google, offer scaling services
based on application requirements. These services are of great help to
entrepreneurs who run small to medium businesses and seek IT infrastructure
support. We have also discussed various advantages of cloud scaling for
business applications.

5.8 SOLUTIONS / ANSWERS

Check Your Progress 1

1. The cloud is used extensively for serving applications and in other
scenarios where the cost and installation time of infrastructure/capacity
scaling would otherwise be high. Scaling helps in achieving an optimized
infrastructure for the current and expected load on the applications, with
minimum cost and setup time. Scaling also helps in reducing the disaster
recovery time, if a disaster happens. (For details see Section 5.3.)

2. Proactive scaling is a process of forecasting and then managing the load
on the cloud infrastructure in advance. Precise forecasting of the
requirement is the key to success here. Preparedness for the estimated
traffic/requirements is achieved using virtualization: various resources may
be assigned to the required machine in no time, and the machine can be scaled
up to its hardware limits. Virtualization helps in achieving a low cooldown
period and serving instantly. (For details you may refer to the Resource
Virtualization unit.)

3. The reactive scaling technique only works on the actual variation of load
on the application, whereas combinational scaling works on both expected and
real traffic. A good estimate of the load increases the performance of
combinational scaling.

4. Following are the differences between Scaling and Cloud Elasticity:

Scaling                                  Cloud Elasticity

Increasing the capacity to meet an       Increasing or reducing the capacity
increasing workload.                     to meet an increasing or decreasing
                                         workload.

In a scaling environment, the            In an elastic environment, the
available resources may exceed           available resources match the
future demands.                          current demands.

Scalability adapts only to workload      Elasticity adapts to both workload
increase, by provisioning resources      increase and workload decrease, in
in an incremental manner.                an automatic manner.

Scalability enables a corporation to     Elasticity enables a corporation to
meet expected demands for services,      meet unexpected changes in the
with long-term strategic needs.          demand for services, with short-term
                                         tactical needs.

Check Your Progress 2

1. Fixed amount scaling is a simplistic approach to scaling in a cloud
environment. Here the resources are scaled up/down by a user-defined number
of nodes. In fixed amount scaling, resource utilization is not optimized: it
can happen that a single small node would solve the resource crunch, but the
user-defined number is much higher, leading to under-utilized resources.
Therefore percentage-based scaling is a better technique for optimal resource
usage.
2. In Algorithm 1 for fixed amount auto scaling, calculate the values in the
table if U = 3: For the given U = 3, the following calculations are made.
Nodes      Nodes    RPS         Total    New
(current)  (added)  (required)  nodes    RPS_n
4          0        450         4        112.50
4          3        1800        7        257.14
7          3        2510        10       251.00
10         3        3300        13       253.84
13         3        4120        16       257.50
16         3        5000        19       263.15

3. When auto scaling takes place in the cloud, a small time interval (pause)
prevents the triggering of the next auto scale event. This helps in
maintaining integrity in the cloud environment for applications. Once the
cooldown period is over, the next auto scaling event can be accepted.

5.9 FURTHER READINGS

1. Cloud Computing: Principles and Paradigms, Rajkumar Buyya, James Broberg
and Andrzej M. Goscinski, Wiley, 2011.
2. Mastering Cloud Computing, Rajkumar Buyya, Christian Vecchiola, and
Thamarai Selvi, Tata McGraw Hill, 2013.
3. Essentials of Cloud Computing, K. Chandrasekaran, CRC Press, 2014.
4. Cloud Computing, Sandeep Bhowmik, Cambridge University Press, 2017.
