Cloud Elasticity and Scaling Strategies
UNIT 5 SCALING
Structure
5.0 Introduction
5.1 Objectives
5.2 Cloud Elasticity
5.3 Scaling Primitives
5.4 Scaling Strategies
5.4.1 Proactive Scaling
5.4.2 Reactive Scaling
5.4.3 Combinational Scaling
5.5 Auto Scaling in Cloud
5.6 Types of Scaling
5.6.1 Vertical Scaling or Scaling Up
5.6.2 Horizontal Scaling or Scaling Out
5.7 Summary
5.8 Solutions/Answers
5.9 Further Readings
5.0 INTRODUCTION
In the earlier unit we studied resource pooling, sharing and provisioning in cloud computing. In this unit let us study other important characteristic features of cloud computing – Cloud Elasticity and Scaling.
In this unit we will focus on the various methods and algorithms used in the process of scaling. We will discuss various types of scaling, their usage and a few examples. We will also discuss the importance of various techniques in saving cost and manual effort by using the concepts of cloud scaling in highly dynamic situations. The suitability of scaling techniques in different scenarios is also discussed in detail.
5.1 OBJECTIVES
5.2 CLOUD ELASTICITY

Cloud Elasticity is the property of a cloud to grow or shrink its capacity for CPU, memory, and storage resources to adapt to the changing demands of an organization. Cloud Elasticity can be automatic, without the need to perform capacity planning in advance of the occasion, or it can be a manual process
where the organization is notified they are running low on resources and can
then decide to add or reduce capacity when needed. Monitoring tools offered
by the cloud provider dynamically adjust the resources allocated to an
organization without impacting existing cloud-based operations.
In the absence of Cloud Elasticity, organizations would face paying for largely
unused capacity and handling the ongoing management and upkeep of that
capacity, including tasks like OS upgrades, patching, and addressing
component failures. Cloud Elasticity serves as a defining factor in cloud
computing, setting it apart from other models like client-server setups, grid
computing, or traditional infrastructure.
Cloud Elasticity acts as a vital tool for businesses, preventing both over-
provisioning (allocating more IT resources than necessary for current
demands) and under-provisioning (failing to allocate sufficient resources to
meet existing or imminent demands).
Organizations can grow or shrink their cloud capacity at any given time, eliminating the necessity to acquire or retire on-premises infrastructure to cater to fluctuating demand.
Now that we have understood cloud elasticity and its underlying concepts, let us study the concept of scaling in the next section.
5.3 SCALING PRIMITIVES

• Minimum cost: The user has to pay only a minimal cost for access to and usage of the hardware after upscaling. The cost of buying hardware of the same scale can be much greater than the cost paid by the user, and the maintenance and other overheads are not included here either. Further, as and when the resources are no longer required, they may be returned to the service provider, resulting in cost savings.
• Ease of use: Cloud upscaling and downscaling can be done in just a few minutes (sometimes even dynamically) by using the service provider's application interface, as illustrated in the sketch below.
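For illustration, here is a minimal sketch of calling such a provider interface programmatically. It assumes AWS and the boto3 SDK; the Auto Scaling group name web-asg is hypothetical, and other providers expose equivalent calls.

---------------------------------------------------------------------------
import boto3

autoscaling = boto3.client("autoscaling")

# Upscale: ask the provider to run 6 instances instead of the current count.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="web-asg",   # hypothetical group name
    DesiredCapacity=6,
    HonorCooldown=True,               # respect the group's cooldown period
)

# Downscale later by simply lowering the desired capacity again.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="web-asg",
    DesiredCapacity=2,
    HonorCooldown=True,
)
---------------------------------------------------------------------------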
[Figures 1 and 2: cost and workload plotted against time, with checkpoints marked]
In the case of clouds, virtual environments are utilized for resource allocation. These virtual machines make clouds elastic in nature, since they can be configured according to the workload of the applications in real time. In such scenarios, downtime is minimized and scaling is easy to achieve.
On the other hand, scaling saves the cost of setting up hardware for short-lived peaks or dips in load. In general, most cloud service providers offer scaling itself as a free process and charge only for the additional resources used. Scaling is also a common service provided by almost all cloud platforms.
5.4 SCALING STRATEGIES

Let us now see what the strategies for scaling are, how one can achieve scaling in a cloud environment and what its types are. In general, scaling strategies are categorized based on how the decision to scale is taken. The three main strategies for scaling are discussed below.
5.4.1 Proactive Scaling

In proactive scaling, resources are provisioned ahead of time on the basis of forecasts or prior knowledge of traffic patterns (for example, expected peaks at particular times of the day), so that the capacity is already in place when the load arrives.

[Figure 3: Proactive Scaling – load plotted against time of day]
5.4.2 Reactive Scaling
Reactive scaling monitors the workload and handles workload changes smoothly and at minimum cost. It empowers users to scale computing resources up or down rapidly. In simple words, when a hardware resource such as CPU or RAM touches its highest utilization, more of that resource is added to the environment by the service provider. The auto scaling works on the policies defined by the users/resource managers for traffic and scaling. One major concern with reactive scaling is a quick change in load, i.e. users experience lags while the infrastructure is being scaled. Figure 4 shows the resource provisioning in reactive scaling.
[Figure 4: Reactive Scaling – load plotted against time of day]
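The following sketch shows what a simple reactive scaling loop can look like. It assumes a hypothetical provider client object exposing get_cpu_utilization(), node_count(), add_nodes() and remove_nodes(); the thresholds and step size are illustrative and not prescribed by this unit.

---------------------------------------------------------------------------
import time

SCALE_UP_THRESHOLD = 80.0    # percent CPU utilization (illustrative)
SCALE_DOWN_THRESHOLD = 30.0  # percent CPU utilization (illustrative)
STEP = 2                     # nodes added/removed per scaling action


def reactive_scaler(cloud, min_nodes=2, max_nodes=20, interval=60):
    """Poll a utilization metric and react when it crosses a threshold.

    `cloud` is a hypothetical provider client exposing get_cpu_utilization(),
    node_count(), add_nodes(n) and remove_nodes(n).
    """
    while True:
        cpu = cloud.get_cpu_utilization()    # average utilization across nodes
        nodes = cloud.node_count()
        if cpu > SCALE_UP_THRESHOLD and nodes < max_nodes:
            cloud.add_nodes(min(STEP, max_nodes - nodes))
        elif cpu < SCALE_DOWN_THRESHOLD and nodes > min_nodes:
            cloud.remove_nodes(min(STEP, nodes - min_nodes))
        time.sleep(interval)                 # wait before the next check
---------------------------------------------------------------------------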
5.4.3 Combinational Scaling
Till now we have seen need-based and forecast-based techniques for scaling. However, for better performance and a low cooldown period, we can also combine the reactive and proactive scaling strategies when we have some prior knowledge of the traffic. This helps us in scheduling timely scaling actions for the expected load, while the provision of load-based scaling still handles deviations from the predicted load on the application. In this way both sudden and expected traffic surges are addressed.
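A minimal sketch of such a combinational decision is given below, assuming a hypothetical per-hour schedule of minimum node counts (the proactive part) combined with threshold checks on the measured requests per second per node (the reactive part); all names and numbers are illustrative.

---------------------------------------------------------------------------
from datetime import datetime

# Hypothetical schedule of minimum node counts per hour of day, derived from
# prior knowledge of traffic (extra capacity during known evening peaks).
SCHEDULED_MIN_NODES = {hour: 2 for hour in range(24)}
SCHEDULED_MIN_NODES.update({18: 6, 19: 8, 20: 8, 21: 6})


def combinational_target(current_nodes, rps_per_node, t_up=300, t_down=150):
    """Combine proactive (scheduled) and reactive (threshold) decisions."""
    scheduled_min = SCHEDULED_MIN_NODES[datetime.now().hour]
    target = current_nodes
    if rps_per_node > t_up:        # reactive: unexpected surge, add capacity
        target = current_nodes + 2
    elif rps_per_node < t_down:    # reactive: low load, release capacity
        target = current_nodes - 2
    # proactive: never fall below the capacity scheduled for this hour
    return max(target, scheduled_min)
---------------------------------------------------------------------------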
Working | User sets the threshold, but a downtime is required. | User-defined threshold values optimize the resources.
Check Your Progress 1
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………
5.5 AUTO SCALING IN CLOUD

In a cloud, auto scaling can be achieved using user-defined policies, various machine health checks and schedules. Parameters such as request counts, CPU usage and latency are the key parameters for decision making in auto scaling. A policy here refers to the set of instructions given to the cloud for a particular scenario (for scaling up or scaling down). Auto scaling in the cloud is carried out on the basis of such parameters.
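As an illustration of such a policy, the sketch below registers a target-tracking policy with a provider, assuming AWS and the boto3 SDK; the group and policy names are hypothetical and the CPU target is illustrative.

---------------------------------------------------------------------------
import boto3

autoscaling = boto3.client("autoscaling")

# Target-tracking policy: the provider adds or removes instances so that the
# group's average CPU utilization stays near the chosen target value.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",      # hypothetical group name
    PolicyName="keep-cpu-near-50",       # hypothetical policy name
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,             # illustrative CPU target (percent)
    },
)
---------------------------------------------------------------------------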
The process of auto scaling also requires a cooldown period before normal operation resumes after a scaling action takes place. No two concurrent scaling actions are triggered, so as to maintain integrity. The cooldown period allows the effect of an auto scaling action to be reflected in the system within a specified time interval and avoids integrity issues in the cloud environment.
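A cooldown guard of this kind can be sketched in a few lines; the class name and the 300-second default below are illustrative choices, not values taken from this unit.

---------------------------------------------------------------------------
import time


class CooldownGuard:
    """Suppress new scaling actions until the previous one has taken effect."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown_seconds = cooldown_seconds
        self._last_action = 0.0

    def allow(self):
        """Return True only if the cooldown period has elapsed."""
        return time.time() - self._last_action >= self.cooldown_seconds

    def record(self):
        """Call this right after a scaling action is triggered."""
        self._last_action = time.time()


# Usage: wrap any scaling decision so that no two actions overlap.
guard = CooldownGuard(cooldown_seconds=300)

def maybe_scale(decision_fn):
    if guard.allow():
        decision_fn()      # trigger the scale-up/scale-down action
        guard.record()     # start a new cooldown window
---------------------------------------------------------------------------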
[Figure: cost and workload plotted against time]
Consider a more specific scenario: when the resource requirement is high for some known time duration, e.g. during holidays or weekends, scheduled scaling can also be performed. Here, the time and the scale/magnitude/threshold of the scaling can be defined in advance to meet the specific requirements, based on previous knowledge of the traffic. The threshold level is also an important parameter in auto scaling, as a low threshold value results in under-utilization of the cloud resources and a high threshold value results in higher latency in the cloud.
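Scheduled scaling of this kind can be registered with the provider in advance. The sketch below assumes AWS and the boto3 SDK; the group name, action name and capacity figures are hypothetical.

---------------------------------------------------------------------------
import boto3

autoscaling = boto3.client("autoscaling")

# Keep extra capacity ready every weekend morning, based on prior knowledge
# of weekend traffic (cron: 08:00 UTC on Saturdays and Sundays).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",        # hypothetical group name
    ScheduledActionName="weekend-peak",    # hypothetical action name
    Recurrence="0 8 * * 6,0",
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,                    # illustrative capacity values
)
---------------------------------------------------------------------------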
After additional nodes are added in a scale-up, the incoming requests per second per node may drop below the scale-down threshold. This triggers alternating scale-up and scale-down actions, known as the ping-pong effect. To avoid both under-scaling and over-scaling issues, load testing is recommended so that the chosen thresholds meet the service level agreements (SLAs). In particular, the thresholds should be chosen such that:

1. the number of incoming requests per second per node remains above the scale-down threshold after a scale-up, and
2. the number of incoming requests per second per node remains below the scale-up threshold after a scale-down.

Ensuring both conditions reduces the chances of the ping-pong effect.
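These two conditions can be checked numerically. The small sketch below assumes that scaling is triggered exactly at the thresholds and that the total load stays constant during the action; the example values are illustrative, not taken from this unit.

---------------------------------------------------------------------------
def thresholds_avoid_ping_pong(t_up, t_down, n_c, u, d):
    """Check the two ping-pong conditions for a given configuration.

    After adding u nodes at the scale-up threshold, the per-node rate becomes
    t_up * n_c / (n_c + u) and must stay above t_down.  After removing d nodes
    at the scale-down threshold, it becomes t_down * n_c / (n_c - d) and must
    stay below t_up.
    """
    after_scale_up = t_up * n_c / (n_c + u)
    after_scale_down = t_down * n_c / (n_c - d)
    return after_scale_up > t_down and after_scale_down < t_up


# Example: with 10 nodes, U = D = 2, T_U = 300 and T_D = 150:
# 300*10/12 = 250 > 150 and 150*10/8 = 187.5 < 300, so no ping-pong.
print(thresholds_avoid_ping_pong(t_up=300, t_down=150, n_c=10, u=2, d=2))
---------------------------------------------------------------------------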
Now we know what scaling is and how it affects the applications hosted on the cloud. Let us now discuss how auto scaling can be performed in fixed amounts as well as in percentages of the current capacity.
--------------------------------------------------------------------------------------------
Algorithm: 1
--------------------------------------------------------------------------------------------
Input: SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down value
U - scale up value
T_U - scale up threshold
T_D - scale down threshold
Let T(SLA) return the maximum incoming requests per second (RPS) per node for the specific SLA.
Let N_c and RPS_n represent the current number of nodes and the incoming requests per second per node, respectively.
L1: /* scale up (if RPS_n > T_U) */
Repeat:
    N_(c_old) ← N_c
    N_c ← N_c + U
    RPS_n ← RPS_n x N_(c_old) / N_c
Until RPS_n ≤ T_U
L2: /* scale down (if RPS_n < T_D) */
Repeat:
    N_(c_old) ← N_c
    N_c ← max(N_min, N_c - D)
    RPS_n ← RPS_n x N_(c_old) / N_c
Until RPS_n ≥ T_D or N_c = N_min
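A minimal Python rendering of Algorithm 1 is sketched below. The function and variable names are chosen for readability; the loops keep adjusting the node count until the per-node request rate is back within the thresholds, as in the pseudocode, and the threshold values in the example call are illustrative.

---------------------------------------------------------------------------
def fixed_amount_scale(rps, n_c, u, d, t_up, t_down, n_min=1):
    """Fixed amount auto scaling (Algorithm 1).

    rps          : total incoming requests per second
    n_c          : current number of nodes
    u, d         : fixed scale-up / scale-down amounts
    t_up, t_down : per-node thresholds; returns the new node count.
    """
    rps_n = rps / n_c                      # incoming requests per second per node
    if rps_n > t_up:                       # scale up
        while rps_n > t_up:
            n_c += u
            rps_n = rps / n_c
    elif rps_n < t_down:                   # scale down
        while rps_n < t_down and n_c > n_min:
            n_c = max(n_min, n_c - d)
            rps_n = rps / n_c
    return n_c


# With U = 2 and an illustrative threshold T_U = 400, a jump to RPS = 1800 on
# 4 nodes grows the group to 6 nodes, matching the first scale-up in Table 1.
print(fixed_amount_scale(rps=1800, n_c=4, u=2, d=2, t_up=400, t_down=150))
---------------------------------------------------------------------------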
Now, let us discuss how this algorithm works in detail. Let the values of a few parameters be U = 2, D = 2, T_U = 120 and T_D = 150. Suppose that in the beginning RPS = 450 and N_c = 4. Now RPS increases to 1800 and RPS_n exceeds T_U; in this situation an auto scaling request is generated, leading to the addition of U = 2 nodes. Table 1 lists all the parameters as per the scale-up requirements.
Table 1: Scale-up with a fixed amount (U = 2)

Incoming RPS    Nodes added    N_c    RPS_n
450             0              4      112.50
1800            2              6      300.00
2510            2              8      313.75
3300            2              10     330.00
4120            2              12     343.33
5000            2              14     357.14
Similarly, in the case of scaling down, let initially RPS = 8000 and N_c = 19. Now RPS is reduced to 6200 and, following this, RPS_n reaches T_D; here an auto scaling request is initiated, deleting D = 2 nodes. Table 2 lists all the parameters as per the scale-down requirements.
Table 2: Scale-down with a fixed amount (D = 2)

Incoming RPS    Nodes removed    N_c    RPS_n
8000            0                19     421.05
6200            2                17     364.70
4850            2                15     323.33
3500            2                13     269.23
2650            2                11     240.90
1900            2                9      211.11

Tables 1 and 2 show the stepwise increase/decrease in the cloud capacity with respect to the change in load on the application (requests per second per node).
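Each row of these tables follows directly from RPS_n = RPS / N_c. The short check below recomputes the RPS_n column of Table 1.

---------------------------------------------------------------------------
# Reproduce the RPS_n column of Table 1: RPS_n = incoming RPS / current nodes.
rows = [(450, 4), (1800, 6), (2510, 8), (3300, 10), (4120, 12), (5000, 14)]
for rps, n_c in rows:
    print(f"RPS={rps:5d}  N_c={n_c:2d}  RPS_n={rps / n_c:.2f}")
---------------------------------------------------------------------------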
Percentage Scaling

The algorithm given below scales the capacity up or down by a percentage of the current number of nodes, using the respective scale-up and scale-down thresholds.
-----------------------------------------------------------------------------------------------
Algorithm: 2
-----------------------------------------------------------------------------------------------
Input: SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down percentage
U - scale up percentage
T_U - scale up threshold
T_D - scale down threshold
Let T(SLA) return the maximum incoming requests per second (RPS) per node for the specific SLA.
Let N_c and RPS_n represent the current number of nodes and the incoming requests per second per node, respectively.
/* scale down (if RPS_n < T_D) */
Repeat:
    N_(c_old) ← N_c
    N_c ← max(N_min, N_c - max(1, N_c x D/100))
    RPS_n ← RPS_n x N_(c_old) / N_c
Until RPS_n ≥ T_D or N_c = N_min
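A minimal Python sketch of percentage scaling is given below. The scale-down branch follows the pseudocode above; the scale-up branch is not shown in the listing, so the symmetric form used here (adding max(1, ⌊N_c x U/100⌋) nodes per step) is an assumption modelled on Algorithm 1. The names and the example call are illustrative.

---------------------------------------------------------------------------
def percentage_scale(rps, n_c, u_pct, d_pct, t_up, t_down, n_min=1):
    """Percentage-based auto scaling (Algorithm 2).

    The scale-down branch mirrors the pseudocode; the scale-up branch is an
    assumed mirror image stepping by max(1, floor(n_c * u_pct / 100)) nodes.
    """
    rps_n = rps / n_c
    if rps_n > t_up:                                   # assumed scale-up branch
        while rps_n > t_up:
            n_c += max(1, n_c * u_pct // 100)
            rps_n = rps / n_c
    elif rps_n < t_down:                               # scale-down (as above)
        while rps_n < t_down and n_c > n_min:
            n_c = max(n_min, n_c - max(1, n_c * d_pct // 100))
            rps_n = rps / n_c
    return n_c


# Scale-down example from the text: 19 nodes, D = 8, T_D = 230, RPS = 4140.
# 4140/19 = 217.9 < 230, so max(1, 19*8//100) = 1 node is removed -> 18 nodes.
print(percentage_scale(rps=4140, n_c=19, u_pct=1, d_pct=8, t_up=290, t_down=230))
---------------------------------------------------------------------------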
For percentage scaling, let D = 8, U = 1, N_min = 1, T_D = 230 and T_U = 290. A detailed upscaling example with these parameters is given in Table 3.
Table 3: Scale-up with percentage scaling (U = 1)

Incoming RPS    Nodes added    N_c    RPS_n
500             0              6      83.33
1695            1              7      242.14
2190            1              8      273.75
2600            1              9      288.88
3430            1              10     343.00
3940            1              11     358.18
4420            1              12     368.33
4960            1              13     381.53
5500            1              14     392.85
5950            1              15     396.60
Similarly, in the case of scaling down, let initially RPS = 5000 and N_c = 19. When RPS reduces to 4140, RPS_n reaches T_D, requesting a scale-down and hence deleting one node, i.e. max(1, ⌊19 x 8/100⌋) = 1. The scaling down with the same algorithm is detailed in Table 4 below.
Table 4: Scale-down with percentage scaling (D = 8)

Incoming RPS    Nodes removed    N_c    RPS_n
5000            0                19     263.15
3920            1                18     217.77
3510            1                17     206.47
3200            1                16     200.00
2850            1                15     190.00
2600            1                14     185.71
2360            1                13     181.53
2060            1                12     171.66
1810            1                11     164.50
1500            1                10     150.00
If we compare Algorithms 1 and 2, it is clear that the threshold values T_U and T_D are on the higher side in the case of Algorithm 2. In this scenario the utilization of the hardware is higher and the cloud has a smaller resource footprint.
2) In Algorithm 1 for fixed amount auto scaling, calculate the values in the table if U = 3.
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
5.6 TYPES OF SCALING

Let us now discuss the types of scaling, i.e. how we view the cloud infrastructure when its capacity is enhanced or reduced. In general, we scale the cloud in a vertical or a horizontal way, either by provisioning more resources on the existing machines or by installing more machines.
5.6.1 Vertical Scaling or Scaling Up

Vertical scaling in the cloud refers to either scaling up, i.e. enhancing the computing resources, or scaling down, i.e. reducing the computing resources allocated to an application. In vertical scaling, the actual number of VMs remains constant but the quantity of resources allocated to each of them is increased or decreased. Here no infrastructure is added and the application code is also not changed. Vertical scaling is limited by the capacity of the physical machine or server running in the cloud. If one has to upgrade the hardware of an existing cloud environment, this can be achieved with minimal changes.
[Figure: Vertical scaling – an IT resource (a virtual server with two CPUs) is scaled up by replacing it with a more powerful IT resource with increased capacity (a physical server with four CPUs).]
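In practice, scaling a single server up usually means moving it to a larger machine size. The sketch below assumes AWS and the boto3 SDK; the instance ID and the instance types are hypothetical examples of a 2-vCPU and a 4-vCPU size.

---------------------------------------------------------------------------
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"      # hypothetical instance ID

# Vertical scaling on AWS: the VM must be stopped before its size is changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Replace the 2-vCPU size (e.g. m5.large) with a 4-vCPU size (m5.xlarge).
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m5.xlarge"},
)

ec2.start_instances(InstanceIds=[instance_id])
---------------------------------------------------------------------------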
5.6.2 Horizontal Scaling or Scaling Out

Horizontal scaling, or scaling out, increases capacity by adding more instances of the same IT resource, running on the pooled physical servers of the cloud, rather than by enlarging an existing instance.

[Figure: Horizontal scaling – an IT resource (Virtual Server A) is scaled out by adding more of the same IT resources (Virtual Servers B and C).]
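A minimal sketch of scaling out programmatically, again assuming AWS and the boto3 SDK with a hypothetical Auto Scaling group name:

---------------------------------------------------------------------------
import boto3

autoscaling = boto3.client("autoscaling")

# Horizontal scaling: run more virtual servers of the same kind in the group
# (the equivalent of adding Virtual Servers B and C alongside A).
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-asg",   # hypothetical group name
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=3,                # grow from one instance to three
)
---------------------------------------------------------------------------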
5.7 SUMMARY
We are now aware of the various types of scaling, the scaling strategies and their use in real situations. Various cloud service providers such as Amazon AWS and Microsoft Azure, and IT giants like Google, offer scaling services for hosted applications based on the application requirements. These services are of great help to entrepreneurs who run small to medium businesses and seek IT infrastructure support. We have also discussed various advantages of cloud scaling for business applications.
5.8 SOLUTIONS/ANSWERS
1. The cloud is used extensively for serving applications and in other scenarios where the cost and installation time of scaling the infrastructure/capacity are expected to be high. Scaling helps in achieving an optimized infrastructure for the current and expected load on the applications with minimum cost and setup time. Scaling also helps in reducing the recovery time if a disaster happens. (For details see Section 5.3.)
3. The reactive scaling technique works only on the actual variation of the load on the application, whereas the combinational technique works for both expected and real traffic. A good estimate of the load increases the performance of combinational scaling.
2. With U = 3, the table values are as follows:

Incoming RPS    Nodes added    N_c    RPS_n
450             0              4      112.50
1800            3              7      257.14
2510            3              10     251.00
3300            3              13     253.84
4120            3              16     257.50
5000            3              19     263.15
3. When auto scaling takes place in the cloud, a small time interval (pause) prevents the triggering of the next auto scale event. This helps in maintaining integrity in the cloud environment for the applications. Once the cooldown period is over, the next auto scaling event can be accepted.