
FLAS: a combination of proactive and reactive auto-scaling architecture for distributed services

Víctor Rampérez (a,*), Javier Soriano (a), David Lizcano (b), Juan A. Lara (b)

(a) Universidad Politécnica de Madrid (UPM), 28660 Boadilla del Monte, Madrid, Spain
(b) Madrid Open University (UDIMA), 28400 Collado Villalba, Madrid, Spain

(*) Corresponding author. Email addresses: [email protected] (Víctor Rampérez), [email protected] (Javier Soriano), [email protected] (David Lizcano), [email protected] (Juan A. Lara)

Preprint submitted to Future Generation Computer Systems, December 16, 2020

Abstract

Cloud computing has established itself as the support for the vast majority of emerging technologies, mainly due to the characteristic of elasticity it offers. Auto-scalers are the systems that enable this elasticity by acquiring and releasing resources on demand to ensure an agreed service level. In this article we present FLAS (Forecasted Load Auto-Scaling), an auto-scaler for distributed services that combines the advantages of proactive and reactive approaches according to the situation to decide the optimal scaling actions at every moment. The main novelties introduced by FLAS are (i) a predictive model of the high-level metrics trend which allows anticipating changes in the relevant SLA parameters (e.g. performance metrics such as response time or throughput) and (ii) a reactive contingency system based on the estimation of high-level metrics from resource use metrics, which reduces the necessary instrumentation (less invasive) and allows FLAS to be adapted agnostically to different applications. We provide a FLAS implementation for the use case of a content-based publish-subscribe middleware (E-SilboPS) that is the cornerstone of an event-driven architecture. To the best of our knowledge, this is the first auto-scaling system for content-based publish-subscribe distributed systems (although it is generic enough to fit any distributed service). Through an evaluation based on several test cases recreating not only the expected contexts of use, but also the worst possible scenarios (following the Boundary-Value Analysis or BVA test methodology), we have validated our approach and demonstrated the effectiveness of our solution by ensuring compliance with performance requirements over 99% of the time.

Keywords: Cloud, Elasticity, Automatic Scaling, Distributed Systems
2010 MSC: 68-04

1. Introduction

We have seen how in just a few years society has transformed and evolved towards an increasingly digitalized world, where all aspects of daily life depend on technology and, more specifically, on Internet services. For all these reasons, it is not surprising that computing resources are now an essential utility in modern societies, on a par with electricity, gas or water. As a result, cloud computing emerged as a way to provide computing resources as a service, that is, on-demand computing resources that users acquire on a pay-as-you-go basis.

Cloud computing has been consolidated as a support for the vast majority of current and emerging technologies. For example, the widespread adoption of event-driven architectures [1], which are essential for real-time-sensitive digital business such as IoT (Internet of Things), has been possible because cloud computing is able to provide an infrastructure that meets the requirements demanded by high-performance distributed systems (publish/subscribe message brokers, distributed stream processing systems or distributed datastores), which are the cornerstones of these architectures [2, 3, 4].

The key feature of cloud computing is elasticity, which is the capability to acquire and release resources on demand to meet end-user requirements, which are formally expressed through Service Level Agreements or SLAs. However, it is not a trivial task to decide the exact amount of resources needed at any given time to meet these SLAs. There are several types of SLAs depending on the magnitude that end users want to manage, such as performance, cost or energy consumption. Therefore, an auto-scaling system is desirable to free users from the burden of adjusting allocated resources to meet SLAs at any given time. The main objective of auto-scaling systems is to avoid both over-provisioning and under-provisioning of resources, which would increase the cost and violate the SLA respectively.

Many auto-scaling systems have been developed in both the literature and the industry, proposing different approaches to the problem. These auto-scaling techniques are classified into two major groups: (i) reactive techniques, where the scaling action is in reaction to a change in the system,
and therefore does not anticipate such a change; and (ii) predictive or proactive techniques, which attempt to anticipate future changes in the system by performing the necessary scaling actions before such changes occur [5].

A scaling action is defined by the specific values of its dimensions, i.e. which resource is to be scaled (CPU, memory, network, etc.), when to scale, how many resources are to be added or removed, and how to scale (horizontal or vertical scaling). An auto-scaling system can be seen as a system that returns a specific scaling action (with specific values for each of the dimensions) based on a series of parameters or input information provided to it (e.g. SLA, workload, information about the application to be scaled, predictive models, threshold-based scaling rules, etc.) to ensure compliance with an SLA. Because of this, auto-scaling systems are quite complex and existing approaches usually focus only on one type of SLA (e.g. performance, cost or energy consumption), one or two dimensions of the scaling actions (e.g. when to scale and how much) and a specific application or type of application (e.g. distributed stream processing systems).

In order to achieve the desired elasticity of an application, several works and authors highlight the need to understand the relationship between the low-level behavior of that application and the high-level parameters of the SLA to be ensured [6, 7, 8, 9, 10, 11]. Therefore, auto-scaling systems would have to be equipped with the necessary mechanisms that allow them to establish a relationship between the low-level behavior (i.e. at resource level, expressed through resource metrics such as CPU usage, memory usage, etc.) and the high-level behavior (i.e. SLA parameters of performance, cost, etc.) of the application, in order to take the appropriate scaling actions to ensure compliance with the corresponding SLA. However, few works address this problem, as they take for granted the resource that is the bottleneck and therefore the resource to be scaled. For example, many works take for granted that the limiting resource or KPI (Key Performance Indicator) is the processing capacity, and therefore their scaling action consists in increasing the processing capacity by adding more processors directly (scale-up) or more virtual machines (scale-out). We do not doubt that in these works the resource that they scale is the adequate one, since they usually demonstrate it empirically, but we argue that studying this relation between low-level metrics and high-level metrics allows the system to be characterized in a more precise way. In fact, there are several research works that point in this direction to improve their approaches in their future work [6, 12]. For example, although the resource to be scaled is processing capacity, the percentage of CPU usage may not be the most informative metric, and context switches, interruptions, or the percentage of time processors spend in user or kernel space may be more useful.

Including the information of this relationship between low- and high-level metrics in an auto-scaling system considerably extends the range of applications to which such an auto-scaling system can be applied, since it allows the detection of the resource that is the bottleneck, and therefore the resource to be scaled, regardless of the type of application and in a totally transparent way for the end user.

With all this, although there are several works related to auto-scaling systems in the Cloud, we have identified the following unmet needs. On the one hand, there is a need to be able to relate or establish a mapping between resource utilization metrics, or low-level metrics, and the relevant high-level metrics in SLAs through some predictive model. This would allow identifying the resources that act as bottlenecks (KPIs) automatically (without having to assume anything), which would have to be monitored and scaled to avoid a possible SLA violation in the future or to reduce unnecessary costs. On the other hand, there is a need to develop a predictive model capable of determining how fast an SLA violation situation or an unnecessary over-provisioning situation can be reached, in order to perform the necessary scaling action at the most convenient time, as opposed to current approaches that only predict future workload and not how this will affect SLA compliance.

In this paper we propose FLAS (Forecasted Load Auto-Scaling), a proactive and reactive auto-scaling architecture for distributed systems. FLAS works by learning and predicting patterns in the performance behavior of distributed systems in order to take the appropriate scaling decisions at any given time to ensure compliance with SLAs. The main contributions of this work, and especially of FLAS, are oriented to cover the needs previously identified and are the following: (i) a predictive model of the high-level metrics trend which allows anticipating changes in the relevant SLA parameters (e.g. performance metrics such as response time or throughput) and (ii) a reactive contingency system based on the estimation of high-level metrics from resource use metrics, which reduces the necessary instrumentation (less invasive) and allows FLAS to be adapted agnostically to different applications.

Due to the great importance of event-driven architectures in current technologies like IoT [1], we wanted to evaluate our auto-scaling system with a high-performance distributed system like a publish-subscribe middleware. Among all the publish-subscribe systems, we have opted for content-based systems (CBPS) due to the greater complexity of their scaling actions as a result of the distribution of their internal state. More specifically, FLAS being a generic solution, we have chosen to apply it to E-SilboPS due to the great challenge that it represented, being a CBPS that supports transparent, publisher-wise dynamic state repartitioning without client disconnection and with minimal notification delivery interruption for subscribers [10].

Due to privacy issues and commercial interests in releasing user information, there is a great lack of publicly available and realistic workloads for the research and evaluation of content-based publish-subscribe systems [13]. Therefore, the evaluation has been done with synthetic
workloads through several test cases recreating not only the expected contexts of use, but also other test cases representing the worst possible scenarios (following the Boundary-Value Analysis or BVA test methodology). The results of this evaluation show how the integration of proactive techniques with models to predict workload behavior, scaling time and the relationship between low- and high-level metrics, together with a reactive contingency system, results in a minimum violation of the established SLAs (less than 1% of run time).

The rest of the document is organized as follows: Section 2 reviews the related work, analyzing the different academic and commercial solutions proposed. Section 3 introduces the system modeling and presents the problem to be addressed. The architecture of FLAS is explained in detail in Section 4. Sections 5 and 6 describe the evaluation of FLAS with a distributed content-based publish-subscribe system (E-SilboPS) through multiple test cases and analyze the quantitative results of such evaluation, respectively. Finally, the conclusions of this work are presented in Section 7 and future lines are outlined in Section 8.

2. Related work

Many auto-scaling systems, both academic and commercial, have been proposed recently due to the ubiquity of cloud computing and the improvement of predictive systems in recent years, using diverse approaches based on both reactive and predictive (also known as proactive) strategies [5, 14, 15, 16]. The reactive approach, widely studied in the past, is usually based mainly on threshold-based rule techniques with different variations to solve or mitigate some of the intrinsic problems of this approach, such as the use of cool-down times (also called inertia or calm) or dynamic thresholds. In recent years, more focus has been placed on predictive solutions using machine learning, reinforcement learning, queuing theory, control theory or time series analysis techniques, among others.

The vast majority of these works tend to focus on the temporal dimension (when to scale) and the quantitative dimension (how much to scale), assuming it is obvious which resource to scale. Usually this dimension of scaling is not analyzed because it is considered trivial as a result of previous knowledge of the application to be scaled. Moreover, usually this resource is the processing capacity, taking as Key Performance Indicator (KPI) the percentage of CPU usage or some similar metric [17, 12, 16, 18, 19]. Nevertheless, there are many works that in their future lines mark the need to study other scaling metrics, including some of the authors that highlight the limitation of the previous approach [17, 12]. For example, in [17] Lombardi et al. consider the CPU as the KPI since it is the most prominent bottleneck for the type of application they scale; however, they also note the intention to include memory and bandwidth in a more complete model as future work. In the same vein, the authors of [12] recognize that many resources can potentially be the bottleneck, but also focus on the CPU resource alone, justifying it as frequently being the key resource in determining performance.

Many studies have identified the need to establish some kind of relationship or mapping between the high-level metrics, in which cloud consumers are interested, and the low-level metrics offered by cloud providers, in order to establish mechanisms to ensure that the service levels demanded by cloud consumers are met. According to [7], there is a gap between monitored metrics (low-level entities) and SLAs (high-level user guarantee parameters), and none of the approaches discussed in their work deal with the mappings of low-level monitored metrics to high-level SLA guarantees necessary in cloud-like environments. In the same vein, Paschke et al. [9] highlight the problem of the poor translation of SLAs into low-level metrics, claiming that the metrics used to measure and manage performance compliance with SLA commitments are the heart of a successful agreement, and that inexperience in the use and automation of performance metrics causes problems for many organizations as they attempt to formulate their SLA strategies and set the metrics needed to support those strategies. Springs et al. [8] also clearly identify the need to address this problem, stating that a key prerequisite for meeting these goals is to understand the relationship between high-level SLA parameters (e.g., availability, throughput, response time) and low-level resource metrics, such as counters and gauges; however, it is not easy to map SLA parameters to metrics that are retrieved from managed resources. In [12] the authors claim that domain experts are usually involved in translating these SLOs into lower-level policies that can then be used for design and monitoring purposes, as this often necessitates the application of domain knowledge to the problem. In [20], the authors go further and establish correlation models between absolute resource utilization metrics (i.e. "measures report about the cumulative activity counters in the operating system") and relative resource utilization metrics (i.e. "those performance measures which values are based on the data collected from the /cgroup virtual file"), demonstrating that the use of relative resource utilization metrics underestimates the capacity required and is therefore not appropriate for determining the amount of resources needed to meet a performance SLA (e.g. response time). There are many works that make this mapping between high- and low-level metrics using black-box prediction techniques (e.g. Artificial Neural Networks or ANN), which, even when performing very accurate predictions, do not allow one to really understand the relationships between these levels of metrics [17, 11]. We believe that it is essential to be able to understand these relationships in order to characterize and classify the applications, and that is why we have opted for a statistical method based on regression that allows us to clearly interpret the established mappings with adequate predictive accuracy.

On the other hand, event-driven architectures are becoming more prevalent recently in multiple technological
paradigms, with message brokers being the cornerstone of these architectures [1]. One of the best implementations of these message brokers are content-based publish-subscribe systems (CBPS), because of their ability to allow subscribers to specify their interests and only receive notifications according to those interests, as opposed to the processing overhead that subscribers to topic-based publish-subscribe systems have to perform [10, 13, 21, 22, 23]. Therefore, in this work we have opted for a content-based publish-subscribe distributed system to evaluate our implementation of FLAS. More specifically, we have used our previous work, E-SilboPS [21, 22, 23], which is a content-based publish-subscribe system specifically designed to be elastic due to its scaling algorithm. Its architecture is inspired by other CBPS like SIENA [24, 25] and E-StreamHub [26]. Despite their importance, an obstacle to the research of these systems is the lack of real and publicly available workloads, due to the privacy issue involved in disclosing the interests (subscriptions) of users and other commercial interests of the companies. The authors of [13] note this problem and address it by proposing a wide-area workload generator for content-based publish-subscribe systems. For this purpose, both subscriber interests and geographic locations are generated through statistical summaries of public data traces. However, despite indicating their intention to make this generator public, it is not currently available.

F. Lombardi et al. present in [17] a work that is closely related to FLAS. In that work they introduce PASCAL, a predictive auto-scaling system for distributed systems that predicts workload patterns, estimates the minimum configuration required by the application and makes decisions on the corresponding scaling action based on this information at each moment. More specifically, PASCAL predicts the workload input rate and estimates the application's performance at each moment in order to estimate the minimum required configuration and take the scaling decisions that will allow reaching that minimum configuration. Both FLAS and PASCAL work in two phases, a monitoring and learning phase for the generation of the predictive models and an auto-scaling phase in which the decisions about the scaling actions to be performed are made. As for the predictive part, their objective is to predict the workload input rate, while in our approach we seek to predict the trend of the relevant SLA parameters (e.g. throughput or response time), which allows us to predict how quickly a state of SLA violation can be reached. In addition, their performance estimation model currently only takes into account CPU usage, which we consider very limited, as stated above, compared to FLAS, which establishes mappings between high-level metrics and a large set of low-level metrics. Furthermore, PASCAL uses models based on Artificial Neural Networks for its predictions, which, although providing very good prediction results, do not allow understanding the relationships between these levels of metrics. Our proposal, through statistical methods based on regression, allows us to understand these relationships, and therefore detect the resource that is the bottleneck, as well as the metric(s) that monitor it (KPI) and therefore the resource to be scaled.

3. System model and problem statement

3.1. System configuration model

We consider a cluster of M nodes, understanding a node as an abstract entity of infrastructure that allows executing software (i.e. physical or virtual machines of a cluster, containers, etc.), in which the different operators of a distributed system are executed. Each operator can have several instances, and the number of instances of each operator can be increased and decreased independently. Thus, a 1-3-2 configuration indicates that there is 1 instance of operator 1, 3 instances of operator 2 and 2 instances of the third operator. Therefore, a node can be composed of a number of operator instances that can vary over time.

Through a scaling action, or sa, the operator instances of a certain configuration can be increased or decreased. Continuing with the notation of F. Lombardi et al. in [17], a scaling action requires a time Tsa that depends on the workload and, in case a reconfiguration of the internal state of the instances is necessary, it will also depend on the size of the state that has to be exchanged and the number of instances before and after performing the scaling action.

According to [27], a scaling action is defined by three points in time: (i) the sa demand point (DPsa), which is the point at which a new configuration is required, i.e. a scaling action; (ii) the sa triggering point (TPsa), which is the point at which a scaling action is activated; and (iii) the sa reconfiguration point (RPsa), which is the point in time at which the scaling action has been completely terminated. Therefore, as shown in Figure 1, the time of a scaling action Tsa is calculated as the difference between RPsa and TPsa, i.e. Tsa = RPsa − TPsa.
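As an illustration of this configuration and timing model, the following minimal Python sketch (not part of the original paper; names such as Configuration and ScalingAction are hypothetical) represents a configuration as a list of per-operator instance counts and derives the scaling time and the over-/under-provisioning interval from the three points in time defined above.

from dataclasses import dataclass

@dataclass
class Configuration:
    # instances[i] is the number of instances of operator i;
    # e.g. [1, 3, 2] is the "1-3-2" configuration described above.
    instances: list

    def scale_out(self, operator: int) -> "Configuration":
        # First approximation used in this paper: double the instances
        # of the scaled operator in each scale-out action.
        new = self.instances.copy()
        new[operator] *= 2
        return Configuration(new)

@dataclass
class ScalingAction:
    dp_sa: float  # demand point: instant at which a new configuration is required
    tp_sa: float  # triggering point: instant at which the action is activated
    rp_sa: float  # reconfiguration point: instant at which the action has finished

    @property
    def t_sa(self) -> float:
        # Scaling time: Tsa = RPsa - TPsa
        return self.rp_sa - self.tp_sa

    @property
    def provisioning_gap(self) -> float:
        # Positive value: the action finished before it was demanded
        # (over-provisioning for a scale-out); negative: it finished late.
        return self.dp_sa - self.rp_sa

# Example: the 1-3-2 configuration doubles its second operator with a
# scale-out triggered at t=10 s that completes at t=25 s, while the new
# configuration was actually demanded at t=30 s (hypothetical values).
config = Configuration([1, 3, 2]).scale_out(operator=1)   # -> 1-6-2
action = ScalingAction(dp_sa=30.0, tp_sa=10.0, rp_sa=25.0)
print(config.instances, action.t_sa, action.provisioning_gap)  # [1, 6, 2] 15.0 5.0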
In addition, a scaling action is clearly defined by specific values of four dimensions, namely:

When. It refers to the time at which the scaling action should be performed. As mentioned above, there are two approaches: (i) reactive techniques, in which the scaling action is taken in reaction to a change (a certain condition is met), and (ii) predictive (also known as proactive) techniques, which aim to anticipate changes before they occur in order to add or release the necessary resources by the time that change occurs.

How. It represents the type of scaling (horizontal or vertical) and the specific scaling action (scale-out/in or scale-up/down for horizontal and vertical scaling respectively).

What. It refers to which resource must be scaled to meet a given SLA. Some applications may be CPU bound while others may be memory bound or limited by other resources.

How much. It denotes the amount of resources that must be added or released to satisfy the SLA.

Although FLAS is a generic solution, for the sake of clarity in this work we are going to focus on a specific case combining predictive and reactive techniques (when), horizontal scaling (how), and a CPU-bound type of application, which implies that the resource to be scaled is the processing capacity (what); as a first approximation we are going to double or halve the resources in each scaling action (how much).

3.2. Workload and performance model

As indicated above, there is a wide variety of SLAs to reflect different end-user interests. In this paper we will focus on performance SLAs. In particular we will focus on the two performance metrics par excellence in this area (high-level metrics or SLA parameters), which are throughput and response time, although economic cost and energy consumption are also considered to the extent that over-provisioning is minimized. However, FLAS could work with other types of SLAs, such as economic cost or energy consumption, by modifying the SLA parameters and providing the corresponding sources of monitoring of those parameters.

Following the modeling described in [17, 28], end users or clients interact with the distributed system by sending messages. This input rate or workload λ(t) is defined as the number of messages received by the system per unit of time at a given instant t. The system has a capacity to process a certain number of messages per time unit, called the service rate S. Depending on the application, the service rate can be constant or can vary with the system state at a time t, S(t). The system is said to be in saturation or overload when it receives more messages per time unit than it can process in that same amount of time, i.e. λ(t) > S(t). When the system is in saturation, the successive messages are queued, since they cannot be processed immediately. The response time RT(t) is the time required by the system to process a message, which will depend on the occupation of the system, since if the system is saturated the response time will be greater because a message has, in addition to the processing time S(t), to wait some time in the queue before being processed. On the other hand, the throughput X(t) is defined as the amount of messages that the system can process per unit of time. The behavior of both metrics will be determined by the state of the system:

Normal: the input rate is less than or equal to the service rate, i.e. λ(t) ≤ S(t); therefore the response time can be considered equivalent to the service time, i.e. RT ≃ S(t), and the throughput will be equal to the input rate, i.e. X(t) ≃ λ(t) (ignoring propagation delays).

Saturation or overloaded: occurs when the input rate is higher than the service rate, i.e. λ(t) > S(t), and therefore messages have to be queued. This causes the response time to increase exponentially (Figure 1) and the throughput to remain constant at a value close to the service rate, X(t) ≃ S(t).

Figure 1: Example in which the response time of a distributed system increases rapidly as a consequence of saturation. To avoid exceeding the maximum response time imposed by the SLA it is necessary to perform a scale-out action that is triggered at TPsa and ends at RPsa. The scaling time will be Tsa = RPsa − TPsa. As it is a scale-out operation that ends before the DPsa instant, the system will be in over-provisioning for a time equal to DPsa − RPsa.
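To make the two operating states concrete, the following small Python sketch (illustrative only, not from the paper) classifies the system state at an instant t from the input rate λ(t) and the service rate S(t), and reproduces the qualitative throughput behavior described above.

def system_state(input_rate: float, service_rate: float) -> str:
    """Classify the operating state at a given instant.

    input_rate   -- workload lambda(t), messages received per time unit
    service_rate -- capacity S(t), messages the system can process per time unit
    """
    return "normal" if input_rate <= service_rate else "saturation"

def expected_throughput(input_rate: float, service_rate: float) -> float:
    # In the normal state the system keeps up with the workload, X(t) ~ lambda(t);
    # once saturated, throughput stays capped near the service rate, X(t) ~ S(t),
    # while queued messages make the response time grow.
    return min(input_rate, service_rate)

# Example: a system that can process 1000 notifications/s (hypothetical values).
for lam in (400, 900, 1500):
    print(lam, system_state(lam, 1000), expected_throughput(lam, 1000))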
Although the solution proposed in this paper is not based on queuing theory, we do believe that queuing theory is very useful for the modelization of our problem. More specifically, as indicated by Lazowska in [28], asymptotic bound analysis provides optimistic and pessimistic limits for throughput and response time that offer rapid insights, which are essential for determining the main factors affecting performance.

When the system reaches its saturation point, it begins to act in a saturated state, performance degrades, and therefore some service level objective of the SLA is often violated, especially if the trend continues. To avoid this, a scale-out action is usually performed, which allows the necessary resources to be added so that the service offered does not degrade. On the other hand, a scale-in action is necessary when the current resources are greater than those required to provide the service without violating the SLA, thus saving costs or energy consumption.

Considering that the objective of the scaling actions is to return or maintain the system in a normal operating state to avoid non-compliance with SLAs, the scaling action should ideally be completed at the same time as it is demanded. Therefore, one of the objectives of scaling systems in terms of the time dimension would be to ensure RPsa = DPsa through reactive or proactive techniques. When this is not fulfilled, one of these two alternatives occurs:

• RPsa > DPsa: in this case the scaling action will be completed after it is needed. In the case of a scale-out it would mean an under-provisioning of resources (a configuration with fewer resources than required) and in the case of a scale-in it would mean an over-provisioning of resources (a configuration with more resources than required).

• RPsa < DPsa: in the opposite case to the previous one, the scaling action ends before the instant it is necessary. In this case, if the scaling action was a scale-out or a scale-in we will have over-provisioning or under-provisioning of resources respectively (Figure 1).

3.3. Problem statement

In general, the problem of auto-scaling is to calculate the specific scaling action (i.e. specific values for each of its dimensions) needed at each moment to ensure compliance with an SLA. Since this problem is too vast to cover its entire domain, and because elasticity is a per-application task [2], in this paper we are going to focus on a subset of applications that share a common set of characteristics. We have focused on a generic auto-scaling solution for high-performance distributed systems, which by itself represents a quite wide and diverse set of applications.

As mentioned above, we have focused our study on performance SLAs and the dimensions of when to scale and what resource to scale. The dimension of how to scale depends on the application and how it is designed, so it cannot be addressed in a generic solution and will have to be application specific. However, this is not a limitation of our solution, as shown in the evaluation of this work. As for how much to scale, we have opted for a first approach of multiplying or dividing by two the resources to scale, which allows us to verify and evaluate our solution without introducing too much complexity. However, we are working to expand our work in this direction.

More specifically, the aim is to minimize the distance between the moment when a new configuration is demanded and the moment when the scaling action covering that demand is concluded, i.e. to minimize |DPsa − RPsa|, in order to minimize the time of over- and under-provisioning of resources. In addition, as part of the solution to this problem, we intend to develop a model that allows mapping the low-level behavior of the application, represented by the behavior of its low-level or resource metrics, to the SLA parameters, or high-level metrics, in order to automatically detect which is the resource to be scaled and which low-level metric(s) are the most descriptive and useful when monitoring the system (KPIs). All of this with the main objective of not violating the SLA, or of minimizing the time during which it is violated.

4. FLAS architecture

This section presents in a generic and agnostic manner the architecture of FLAS, as well as the components that compose it and its workflow. The following section (Section 5) describes the functional flow of the components that compose the architecture presented here for a specific integration with a distributed system.

Figure 2: Functional diagram of the components that form the FLAS architecture and the integration with a distributed system.

FLAS, like other auto-scaling systems [17], works in two phases, a monitoring phase and an auto-scaling phase. In the first phase, the system collects the necessary data to acquire the knowledge needed to generate the prediction models. More specifically, it collects data on the workload, the evolution of trends over time of this workload, the behavior in terms of low-level resources and the high-level performance variables used in the SLA. Once these models have been generated, in the auto-scaling phase the different modules will make use of these models to provide the necessary predictions to the decision maker, which will ultimately be responsible for deciding which auto-scaling actions should be applied, if any.

As shown in Figure 2, the FLAS architecture consists of four functional modules: (i) Scaling Time Forecaster, (ii) Workload Trend Forecaster, (iii) Performance Forecaster and (iv) Decider. In the following subsections these functional modules are explained in more detail; however, the implementation of each of them may require slight adjustments to integrate with the different existing distributed systems. An example of this integration is explained in more detail in Sections 5 and 6. In addition, each of these modules works as a black box within the FLAS architecture, which allows replacing the implementation of each of these modules in a transparent manner as long as they respect the definition (interface) of these modules.
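As a sketch of what these black-box contracts could look like, the four modules can be expressed as simple Python protocols that a concrete integration would implement. The interface names and signatures below are our own illustration and are not defined in the paper.

from typing import Protocol, Sequence, Mapping

class ScalingTimeForecaster(Protocol):
    def forecast_scaling_time(self, workload: Mapping) -> float:
        """Predicted duration T'_sa of a scaling action for the given workload."""

class WorkloadTrendForecaster(Protocol):
    def forecast_trend(self, t0: float, t_sa: float, h: int) -> Sequence:
        """Predicted trend of the SLA parameter (e.g. dRT/dt) at the h instants
        following the end of a scaling action started at t0."""

class PerformanceForecaster(Protocol):
    def estimate(self, low_level_metrics: Mapping) -> float:
        """Estimate of the high-level SLA metric (e.g. response time)
        from the resource-usage metrics reported by the monitoring service."""

class Decider(Protocol):
    def decide(self, t0: float, workload: Mapping,
               trend: Sequence, current_estimate: float) -> None:
        """Evaluate the proactive (trend) and reactive (current estimate vs.
        thresholds) conditions and trigger a scale-out/scale-in if needed."""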
4.1. Scaling Time Forecaster

This module is in charge of predicting the time that a scaling action will take depending on the workload of the distributed system, i.e. T′sa(workload), T′sa being the prediction of the scaling time and Tsa the actual time of that scaling action. Although the scaling time is mostly influenced by the workload, the predictions can be more accurate if other variables on which this scaling time could depend in the target distributed system are included. For example, in some distributed systems it is likely that the configuration before and after the scaling operation is a factor that influences this time, especially if the operators that are scaled have internal state that has to be reconfigured.

4.2. Workload Trend Forecaster

This module is responsible for predicting the trend of system performance (i.e. SLA parameters) in the near future. Since a single prediction at a single future instant can be misleading, the prediction is not made at a single future instant, but over a time horizon h (also called a forecast window) composed of several consecutive future instants. This prediction time horizon (h) is used to see how these predictions evolve at different time points in the future so that the trend can be consolidated and shown in a way that is not misleading. However, the choice of this h-value must be made carefully: if it is too short, many fluctuations may be observed, and if it is too wide, relevant details of the behavior may be lost. The future instant from which these h predictions are made (the value of h indicating the number of consecutive predictions in time to be made, or window size) is usually calculated by means of the scaling time predicted by the Scaling Time Forecaster module. For example, if h = 4, t0 is the current instant and T′sa(t0) is the predicted scaling time for the current instant, then the Workload Trend Forecaster will make predictions at the future instants ti = t0 + T′sa(t0) + i, ∀i ∈ {0, . . ., h − 1}.

The objective of this module is to provide the Decider module with information on the trend of the performance of the distributed system over the prediction time horizon, indicating, for example, whether during the prediction time horizon the response time will increase exponentially or the throughput will remain constant in the next instants, both being symptoms that the system is heading towards its saturation point.

To accomplish this, during the profiling phase, the behavior of the performance metrics or SLA parameters must be recorded as a function of time to generate a time series model based on which to make these predictions in the auto-scaling phase.

4.3. Performance Forecaster

This module is responsible for the prediction of the performance metrics (high-level metrics) that compose the SLA. More specifically, it aims to monitor the use of low-level resources to extract a model capable of capturing the relationships between low-level and high-level metrics. This is essential because it allows knowing the key performance indicators (KPIs) of the application, which indicate the bottleneck resource of the application, since if these resources are saturated (there are no more available), they lead the application to enter a state of saturation and therefore a service degradation. This knowledge is essential because it indicates (i) the resource(s) to be monitored and by means of which metrics, and (ii) the resource to be scaled so that the application is not saturated, and therefore to avoid an SLA violation. As mentioned, few solutions address this problem in the literature, since they assume the resource to be scaled, and therefore the metric to be monitored; but this not only restricts the range of application of these solutions, it also ignores that a model reflecting the behavior at low level could show that it is more effective to monitor other metric(s) (KPIs) [17, 12, 16, 18, 19].

In order to establish the relationship between the behavior at the resource level and the performance level in terms of the SLA of the distributed system, the Performance Forecaster is in charge of collecting the low-level metrics (resource use metrics) and high-level metrics (SLA parameters) in the profiling phase to generate a model capable of mapping them. This model will be the one used in the auto-scaling phase to make the estimations of the high-level metrics that will be provided to the Decider. Low-level metrics are provided to the Performance Forecaster through a monitoring service, which periodically collects utilization metrics. In the profiling phase, this service collects a wide variety of resource metrics, which after being pre-processed and transformed will be the predictors of the predictive performance model. In the auto-scaling phase, the model is able to predict the system's performance based on the values of the resource utilization metrics that the monitoring service delivers periodically. These performance estimations are sent to the Decider along with the Workload Trend Forecaster predictions.

4.4. Decider

The Decider is the module in charge of determining whether it is necessary to trigger any scaling action. This decision is taken based on the information received from the previous modules, that is, the performance trend predicted over the prediction time horizon h and the current performance of the system estimated at that moment. In addition, it needs some configuration parameters that establish the thresholds used to decide which scaling action to perform, allowing flexibility and adaptation of this solution to different distributed systems. For the sake of clarity, this module is explained in more detail in Section 5.3, where the specific implementation of this module for a particular distributed system is explained.
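Putting the four modules together, the auto-scaling phase can be pictured as a periodic control loop. The following sketch is our own illustration of that workflow under the interfaces outlined above; the helper names on the monitoring object and the one-second period are hypothetical, and Section 5.3 describes the actual decision algorithm used for E-SilboPS.

import time

def auto_scaling_loop(scaling_time_fc, trend_fc, performance_fc, monitoring, decider,
                      horizon: int = 4, period_s: float = 1.0):
    """Generic FLAS-style control loop (illustrative sketch only)."""
    while True:
        t0 = time.time()
        workload = monitoring.current_workload()          # e.g. notifications/s, subscriptions
        metrics = monitoring.current_low_level_metrics()  # CPU, memory, network, ...

        # 1. How long would a scaling action take right now?
        t_sa = scaling_time_fc.forecast_scaling_time(workload)
        # 2. How will the SLA parameter trend evolve once that action would finish?
        trend = trend_fc.forecast_trend(t0, t_sa, horizon)
        # 3. What is the estimated current value of the SLA parameter?
        current_estimate = performance_fc.estimate(metrics)

        # 4. The Decider combines the proactive (trend) and reactive
        #    (current estimate vs. thresholds) conditions.
        decider.decide(t0, workload, trend, current_estimate)

        time.sleep(period_s)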
5. FLAS for E-SilboPS

In recent years we have seen the increasing importance of publish-subscribe systems as a consequence of the strong adoption of event-driven architectures, where these systems are the cornerstone since they are in charge of sending the information asynchronously in the form of events [1, 13]. Compared to topic-based systems, content-based publish-subscribe systems allow subscribers to indicate their interests through predicates in a multi-dimensional space, which significantly reduces the processing of notifications by end users. To achieve this, they are usually implemented as distributed systems that match the incoming notifications against the stored subscriptions, determining which subscribers should receive each of the incoming notifications based on the interests described in each subscription.

This section describes how FLAS integrates with a high-performance distributed system such as E-SilboPS. As mentioned, we have chosen to validate FLAS with E-SilboPS due to the greater complexity of its scaling actions, which support transparent, publisher-wise dynamic state repartitioning without client disconnection and with minimal notification delivery interruption for subscribers. This functionality makes the scaling time dependent on both the input workload and the current internal state of the system, which is a major challenge for the evaluation of an auto-scaling system such as FLAS, since the estimation of the scaling time is quite variable.

More specifically, E-SilboPS was conceived by us as a content-based publish-subscribe middleware specially designed to be elastic [10, 21, 22, 23]. It is a distributed system composed of four layers of operators (Connection Point, Access Point, Matcher and Exit Point) forming a directed acyclic graph (DAG). Each of these operators can have a different number of instances, which can be increased or decreased independently by means of horizontal scaling operations (scale-out/in). The scaling algorithm allows dynamic redistribution without disconnecting clients and with minimal interruption of the notification service.

It is important to emphasize that the prediction models are architecture agnostic, so this is a generic solution, since the implementation can differ as long as the defined API is respected. Nevertheless, the following sections describe the specific implementations that have shown good results for the case study exposed, as can be seen from the analysis of the results of this evaluation.

5.1. Scaling Time Forecaster and Workload Trend Forecaster implementation

The Scaling Time Forecaster module is responsible for predicting the scaling time (T′sa) based on the workload at a given instant, T′sa(workload). More specifically, as the load of content-based publish-subscribe systems is determined by the number of notifications per unit of time and the number of stored subscriptions, N and S respectively, the best possible function is pursued to calculate the time of a scaling action based on these parameters, T′sa(N, S). For the generation of the predictive model, in the profiling phase, the scaling times of several scaling actions under different workloads have been collected, in order to obtain a dataset that reflects the different scaling situations. Scaling actions during this phase are triggered in a reactive manner using different threshold-based rules on the workload. With this training dataset, a linear regression model has been built that allows the auto-scaling phase to determine the time of a scaling action based on the current workload.
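A minimal sketch of such a scaling-time model is shown below, assuming a profiling dataset of observed scaling times for different (N, S) workloads; the use of scikit-learn and the sample values are our own illustration, not the paper's actual code or data.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical profiling data: each row is (N notifications/s, S stored subscriptions)
# and the corresponding measured scaling time Tsa in seconds.
X_profile = np.array([
    [1000,  5000],
    [2000,  5000],
    [2000, 20000],
    [4000, 20000],
    [4000, 50000],
])
t_sa_measured = np.array([12.0, 15.5, 24.0, 31.0, 52.0])

# Linear regression model T'_sa(N, S) fitted in the profiling phase.
scaling_time_model = LinearRegression().fit(X_profile, t_sa_measured)

def forecast_scaling_time(n_notifications: float, n_subscriptions: float) -> float:
    """Predict T'_sa for the current workload during the auto-scaling phase."""
    return float(scaling_time_model.predict([[n_notifications, n_subscriptions]])[0])

print(forecast_scaling_time(3000, 30000))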
The Workload Trend Forecaster is responsible for predicting the performance trend over a future prediction time horizon h. For the sake of simplicity and clarity, we have focused on response time as the SLA performance metric. In this case, the trend of performance is translated into the trend of the response time, expressed as the first-order derivative of the response time with respect to time, δRT/δt. In the profiling phase, time series of the response time are collected (top of Figure 3). A positive value of δRT/δt will indicate an increasing trend and a negative value will indicate a decreasing trend, with a more or less pronounced slope depending on the absolute value of the prediction. In this way, we do not directly forecast the response time or the workload, but rather the trend of the response time, which allows us to know how fast a response time that violates the SLA could be reached. To smooth this function and avoid fluctuations of the first-order derivative, a Savitzky-Golay filter has been applied to the first-order derivative of the data; this is a digital filter for smoothing the data, increasing its accuracy without distorting its trend (bottom of Figure 3). These smoothed data were used to generate various time series analysis models, after which the model with the lowest prediction error was chosen. This model is in charge of making the predictions of the trend of the response time over the future prediction time horizon h in the auto-scaling phase. To generate these models, several time series analysis techniques have been evaluated, such as ARIMA, STL decomposition with an ETS model for the seasonally adjusted data, and harmonic regression. In order to test these models, a cross-validation was performed, choosing the model with the lowest prediction error (Figure 4).

Figure 3: Part of a time series of the response time data before (top) and after (bottom) applying a Savitzky-Golay digital filter for smoothing the data with different filter lengths.

Figure 4: Comparison of the MAE value of the three time series analysis models tested by cross-validation for the workload trend forecast (ARIMA(2,0,1)(0,1,1)[48], regression with ARIMA(2,0,2) errors, and STL + ETS(A,Ad,N)).

During the auto-scaling phase, the Workload Trend Forecaster is in charge of forecasting, at t0, the values of δRT/δti at the future instants ti = t0 + T′sa(t0) + i, ∀i ∈ {0, . . ., h − 1}, by means of the prediction model obtained in the profiling phase (Figure 5). In this way, we obtain a forecast of the response time trend in the h future instants after finishing a possible scaling action that started at the current instant t0. These forecast values are the ones that will be sent to the Decider. As already mentioned, the decision of which h-value (also known as forecast window size) to take is a complex one that depends on the degree of detail desired. On the one hand, a very small h-value can cause large fluctuations in the predicted values, while a too large value can omit fluctuations that are significant and should trigger a scaling action.

Figure 5: Forecasts of the values of δRT/δti over a prediction time horizon using the harmonic regression model ARIMA(2,0,2) (dotted line). The horizontal axis represents the periods of seasonality.
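The following sketch illustrates this pipeline: numerically differentiating the response-time series, smoothing the derivative with a Savitzky-Golay filter, and producing h trend forecasts for the instants following a predicted scaling time. It is an illustrative approximation: the paper fits ARIMA/ETS/harmonic-regression models to the smoothed series, whereas here a trivial persistence forecast stands in for the fitted time series model.

import numpy as np
from scipy.signal import savgol_filter

def smoothed_rt_trend(rt_series, window: int = 101, polyorder: int = 3):
    """First-order derivative of the response time, smoothed with Savitzky-Golay."""
    drt_dt = np.gradient(rt_series)               # numerical dRT/dt
    return savgol_filter(drt_dt, window, polyorder)

def forecast_trend(trend, t0: int, t_sa_pred: float, h: int):
    """Forecast dRT/dt at the h instants t_i = t0 + T'_sa(t0) + i, i in {0..h-1}.

    A real implementation would query the fitted time series model here;
    this sketch simply repeats the last smoothed value (persistence forecast).
    """
    start = int(round(t0 + t_sa_pred))
    last_value = float(trend[-1])
    return [(start + i, last_value) for i in range(h)]

# Example with a synthetic response-time series (seconds) that starts to saturate.
rt = np.concatenate([np.full(300, 0.2), 0.2 + 0.01 * np.arange(300)])
trend = smoothed_rt_trend(rt)
print(forecast_trend(trend, t0=599, t_sa_pred=15.0, h=4))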
5.2. Performance Forecaster implementation

As previously mentioned, the monitoring service is in charge of collecting performance metrics in the monitoring phase and resource usage metrics in both the profiling and auto-scaling phases, in order to send them to the Performance Forecaster. This service is executed periodically every second (the period is configurable) and collects a wide variety of resource usage metrics. For this implementation we have used the dstat service (https://linux.die.net/man/1/dstat), collecting more than 30 usage metrics of various resources such as the processor (system, user, idle, wait, hardware interrupt, software interrupt and context switch metrics), memory (used, buffers, cache and free metrics), disk (read and write metrics), network (receive and send metrics), etc. Once these metrics are collected, the data is cleaned and pre-processed, adding compound metrics such as utilization percentages that will be reported to the Performance Forecaster in both execution phases.

The treatment of outliers is especially important, and that is why FLAS includes several mechanisms to treat them. In the cleaning and pre-processing phase of the data, the outliers are detected and removed. In addition, the values of the resource monitoring are average values over the sampling period. Finally, and as explained, FLAS scaling decisions require that the appropriate conditions are maintained over time, and not at a single point.

A regression-based model has been chosen since statistical models, besides allowing us to make performance predictions in the auto-scaling phase, allow us to infer and understand the relationships between low- and high-level metrics, being able to detect the KPIs of the application and the resources to be scaled in each scaling action.

More specifically, statistically valid linear regression models have been created for the two main performance metrics, throughput and response time. The results show that, in the case of throughput, the low-level metrics that contribute the most information to the model (KPIs) are the amount of free RAM, the number of context switches and the network usage (received and sent). On the other hand, the main KPI of the response time is the percentage of memory use, followed by the number of context switches, although to a lesser extent.

Once the mapping algorithms/models are chosen and trained for a given application type and for the KPIs considered in this paper, they remain static. They could need to be changed or trained again for a different set of KPIs (e.g. associated with a different application type). They are dynamic in this sense, and this is why they are parameterizable in our system by design.

To exemplify this, we carried out a series of tests using four different publish-subscribe systems, E-SilboPS [10], RabbitMQ, ActiveMQ and Be-Tree [29], to analyze the type of relationship between low-level or resource-use metrics and high-level metrics or SLA parameters. From the results of these tests it was clear that the content-based publish-subscribe applications (i.e. E-SilboPS and Be-Tree) were CPU bound, while the topic-based publish-subscribe systems (RabbitMQ and ActiveMQ) were memory bound. Therefore, we can consider these relationships to be static for the same application type (be it content-based or topic-based publish-subscribe), but not for different application types. Moreover, we cannot ensure anything beyond that, not even within applications of the same paradigm, as is the case with publish-subscribe.

In order to evaluate the predictive capacity of these two models, a k-fold cross-validation (k = 10) was performed with more than 40 predictive models generated for comparison, of several types such as Random Forests (RF), Artificial Neural Networks (ANN), Generalized Additive Models (GAM) or Generalized Linear Models (GLM). In a very summarized way, due to the lack of space, Figure 6 shows the results of the 10-fold cross-validation (R2 and MAE) comparing the FLAS predictive models for response time and for throughput with the other types of predictive models (i.e. the model with the best results has been chosen as the representative model of each category). It can be seen that the FLAS models have a very good predictive capacity (R2) with a very low prediction error (MAE).

Figure 6: R2 and MAE values obtained from the 10-fold cross-validation performed with more than 40 predictive models of different types for comparison with the predictive models implemented in FLAS, both for response time and for throughput (the model with the best results has been chosen as representative of its category). The different categories of models are: Artificial Neural Networks (ANN), Generalized Additive Models (GAM) and Random Forests (RF).

5.3. Decider

The Decider is the module in charge of gathering all the information from the previous modules to decide whether any scaling action should be triggered and, if so, to decide the specific values of each of the scaling dimensions. This implementation of the Decider exploits the benefits of both approaches, predictive and reactive, since it initially checks the future predictions of the performance trend in order to take a decision in advance (proactive), but it also checks the current values of the estimated performance and compares them with some thresholds (reactive) as a contingency plan against possible failures of the predictive model.

The Decider, like the rest of the modules in this implementation, is a service that runs periodically every second and executes the algorithm described in Algorithm 1.

Algorithm 1: Decider auto-scaling algorithm
Input: t0, workload, RT′, h, reactW, incTrendTH, decTrendTH, reactUpperTH, reactLowerTH, majority
 1  if coolDown == 0 then
 2      T′sa ← forecastT(workload.N, workload.S);
 3      δRT/δt ← forecastRTTrend(t0, T′sa, h);
 4      lowLevelMetrics ← monitoringService(t0);
 5      RT′t0 ← estimateRT(lowLevelMetrics);
 6      RT′.add(RT′t0);
 7      // Scale-out evaluation
 8      if incTrend(δRT/δt, incTrendTH, majority) || RTAboveTH(RT′, reactUpperTH, reactW) then
 9          Tsa ← startScaleOut();
10          coolDown ← getCoolDownTime(Tsa);
11          RT′.clear();
12          return
13      end
14      // Scale-in evaluation
15      if decTrend(δRT/δt, decTrendTH, majority) || RTBelowTH(RT′, reactLowerTH, reactW) then
16          Tsa ← startScaleIn();
17          coolDown ← getCoolDownTime(Tsa);
18          RT′.clear();
19          return
20      end
21  else
22      coolDown--;
23  end

As can be seen in Algorithm 1, the main function of this module receives as parameters the current instant (t0), the workload, a vector with the response time estimates at the moments prior to t0 (RT′), and the Decider configuration, which contains a series of adjustable parameters (i.e. h, reactW, incTrendTH, decTrendTH, reactUpperTH, reactLowerTH and majority). The first verification it makes is that it is not currently in the cool-down time (line 1). After a scaling operation, a cool-down time is required to allow the system to stabilize and not trigger successive scaling actions continuously. The time of a possible scaling action (T′sa) is then predicted (line 2) on the basis of the load (rate of incoming notifications per second and number of stored subscriptions). The response time trend prediction is obtained from the Workload Trend Forecaster by passing the current instant (t0), the forecast scaling time (T′sa) and the value of h as arguments. As a result, a prediction vector of δRT/δt is obtained for each of the time instants of h (line 3). In addition, the resource usage metrics obtained from the Monitoring Service are passed as an argument to the Performance Forecaster to obtain an estimate of the response time at the current instant, RT′t0 (lines 4 and 5), and this response time estimation is added to the vector RT′ (line 6) containing the previous RT forecasts.

Having made the corresponding predictions, the algorithm then decides whether any scaling action needs to be triggered. First, it checks whether the predictions of the δRT/δt vector follow an upward trend (line 8). More specifically, the incTrend() function checks that at least as many predictions of the time horizon h as indicated by the majority configuration parameter are above an incTrendTH threshold, also defined in the configuration. In this way, it can be verified whether, despite occasional fluctuations, the predicted values follow an increasing trend above a particular value. If the predictive condition for the scale-out is not fulfilled, the reactive condition is checked by calling the RTAboveTH() function. This function checks whether the last N response time estimations (reactive window, reactW configuration parameter) are above a certain threshold (reactUpperTH) expressed in terms of the maximum response time specified by the SLA. If either of these two conditions is met, a scale-out action is triggered, doubling the number of Matcher instances and measuring the real time it takes to complete the scaling action, Tsa (line 9). After any scaling action, the cool-down time is activated (calculated as a function of the Tsa time previously measured), during which no scaling action can be performed (line 10). In addition, the response time estimation vector is cleared so that the reactive condition can be reassessed (line 11). Similarly, the conditions for carrying out a scale-in action are assessed using the respective thresholds of the configuration (lines 15 to 20).
Algorithm 1: Decider auto-scaling algorithm
Input: t0, workload, RT′, h, reactWindow, incTrendTH, decTrendTH, reactUpperTH, reactLowerTH, majority
1   if coolDown == 0 then
2       T′sa ← forecastT(workload.N, workload.S);
3       δRT′/δt ← forecastRTTrend(t0, T′sa, h);
4       lowLevelMetrics ← monitoringService(t0);
5       RT′t0 ← estimateRT(lowLevelMetrics);
6       RT′.add(RT′t0);
7
        // Scale-out evaluation
8       if incTrend(δRT′/δt, incTrendTH, majority) || RTAboveTH(RT′, reactUpperTH, reactW) then
9           Tsa ← startScaleOut();
10          coolDown ← getCoolDownTime(Tsa);
11          RT′.clear();
12          return
13      end
14
        // Scale-in evaluation
15      if decTrend(δRT′/δt, decTrendTH, majority) || RTBelowTH(RT′, reactLowerTH, reactW) then
16          Tsa ← startScaleIn();
17          coolDown ← getCoolDownTime(Tsa);
18          RT′.clear();
19          return
20      end
21  else
22      coolDown--;
23  end

For the sake of clarity, some implementation details such as synchronization between scaling operations have been omitted. This case reflects the horizontal scaling of the Matchers (CPU-bound operator), which is the most complex case, since the scaling of the rest of the operators is trivial as they have no state [10].
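As an illustration of the control flow of Algorithm 1, the following minimal Python sketch reproduces its structure. The forecaster, monitoring and scaler objects are hypothetical placeholders (duck-typed), not the actual FLAS interfaces, and the helper functions only approximate incTrend(), decTrend(), RTAboveTH() and RTBelowTH().

def majority_above(trend, threshold, majority):
    # True if at least `majority` of the predicted dRT'/dt values exceed the threshold.
    return sum(v > threshold for v in trend) >= majority

def majority_below(trend, threshold, majority):
    return sum(v < threshold for v in trend) >= majority

def window_above(rt_estimates, threshold, window):
    # True if the last `window` RT' estimations are all above the threshold.
    return len(rt_estimates) >= window and all(v > threshold for v in rt_estimates[-window:])

def window_below(rt_estimates, threshold, window):
    return len(rt_estimates) >= window and all(v < threshold for v in rt_estimates[-window:])

def decide(state, cfg, forecast, monitor, scaler):
    """One evaluation of the Decider; returns the action taken."""
    if state.cool_down > 0:                                            # lines 21-22
        state.cool_down -= 1
        return "none"
    t_sa_hat = forecast.scaling_time(state.workload)                   # line 2
    trend = forecast.rt_trend(state.t0, t_sa_hat, cfg.h)               # line 3
    state.rt.append(forecast.estimate_rt(monitor.metrics(state.t0)))   # lines 4-6
    if majority_above(trend, cfg.inc_trend_th, cfg.majority) or \
       window_above(state.rt, cfg.react_upper_th, cfg.react_w):        # line 8
        t_sa = scaler.scale_out()                                      # line 9
    elif majority_below(trend, cfg.dec_trend_th, cfg.majority) or \
         window_below(state.rt, cfg.react_lower_th, cfg.react_w):      # line 15
        t_sa = scaler.scale_in()                                       # line 16
    else:
        return "none"
    state.cool_down = scaler.cool_down_time(t_sa)                      # lines 10 and 17
    state.rt.clear()                                                   # lines 11 and 18
    return "scaled"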
As seen, the reactive FLAS approach does not use the real response time (RT) metric, but the response time estimated by the Performance Forecaster (i.e. RT′). The approximation RT′(t) ≃ RT(t) for all instants t means that during the auto-scaling phase the application does not have to be monitored, and therefore it is not necessary to instrument the application in this phase, which makes FLAS a less invasive solution and reduces the monitoring overhead. This approach has been found to be valid, since the relative error of the response time estimation is limited to relatively low values for approximately 98% of the estimations; that is, values outside this range can be considered neither statistically frequent nor relevant, as shown in Figure 7 (the 99th percentile is marked). Although the relative error may seem high, it should be noted that the domain of the response time estimation is very large (from values close to 0 to tens of thousands of milliseconds or more). In addition, the model makes its estimation based on a snapshot of the resource usage provided by the low-level metrics, which may mean that at a given moment a punctual usage peak is measured due to their large variability, causing an estimation far above the real response time. However, it has been shown that these cases are statistically infrequent and irrelevant (Figure 7).

Figure 7: Relative frequency histogram and standardized density function of the relative error of the response time estimation. The vast majority of the response time estimation error is constrained to low values (99th percentile shown by the vertical dashed line), and therefore the RT′(t) ≃ RT(t) approximation is used for reactive scaling.
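The validity check behind Figure 7 can be reproduced offline with a few lines of NumPy. This is a sketch under our own assumptions: rt_measured and rt_estimated are hypothetical paired samples collected during a calibration run, and the 0.3 error bound is only an example value, not one taken from the paper.

import numpy as np

def relative_error_summary(rt_measured, rt_estimated, percentile=99, bound=0.3):
    """Relative error of the RT' estimation against the measured RT (paired samples)."""
    rt_measured = np.asarray(rt_measured, dtype=float)
    rt_estimated = np.asarray(rt_estimated, dtype=float)
    rel_err = (rt_estimated - rt_measured) / rt_measured
    return {
        "median": float(np.median(rel_err)),
        "p{}".format(percentile): float(np.percentile(rel_err, percentile)),
        "share_within_bound": float(np.mean(np.abs(rel_err) <= bound)),
    }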
6. Experimental evaluation

This section presents the evaluation of FLAS as an auto-scaling system for a distributed content-based publish-subscribe system such as E-SilboPS. In addition, the results of this evaluation are analyzed, showing how FLAS allows the time in violation of the performance SLA to be minimized in different situations.

Unfortunately, there are no real public workloads available, due to privacy concerns and commercial interests, which often hinders the validation of content-based publish-subscribe systems. Some work has been done, as in [9], where the authors describe a possible solution, but it is not available for use. For this reason, several test cases with synthetic workloads have been generated for this evaluation. These test cases recreate several typical and expected operating scenarios of the proposed solution (i.e. test cases 1 to 3). However, following the Boundary-Value Analysis (BVA) test methodology, other test cases are proposed that recreate some of the worst possible scenarios (i.e. test cases 4 and 5), thus complementing the evaluation and showing how our solution works even in the worst possible cases, which are difficult to reproduce in real contexts. The following subsections present and explain the different test cases proposed, and the last subsection analyses the quantitative results obtained after multiple executions of the different test cases.

We ran our experiments on an Intel Core i7-4790K at 4.00 GHz with 16 GB of RAM running Linux 5.3.0-28-generic and OpenJDK 11.0.7. All test cases have been performed with an SLA that imposes a maximum response time of 1 second. Regarding the workload, the notifications are specially designed to match at least one of the subscriptions processed by E-SilboPS, which is a worst-case scenario, since in a real scenario not all notifications match at least one subscription. The publication rate of these notifications is constant in all test cases, with a value of 10k notifications/s, and the subscription dispatch rate is the one that varies throughout the test cases, since E-SilboPS is specially designed to scale in the number of subscriptions. This type of workload is typical of applications in contexts where the number of subscriptions is much higher than the number of notifications, such as stock market applications or, more recently, centralized applications for the notification and tracking of COVID-19 patients. Furthermore, all test cases start with a minimum configuration of 1-1-1 (1 Access/Connection Point, 1 Matcher and 1 Exit Point). In addition, the Decider configuration parameters used in all the test cases have been the following (see the sketch after this list for one way of grouping them):

• h = 4
• incTrendTH = 0.9
• decTrendTH = −0.9
• reactUpperTH = 750 ms (75% of the maximum RT allowed by the SLA)
• reactLowerTH = 10 ms (1% of the maximum RT allowed by the SLA)
• reactW = 2 (2 estimations must exceed the reactUpperTH value to trigger a scaling action)
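Grouped as a plain data structure, the configuration above could be expressed as follows. This is only a sketch: the values are the ones listed above, but the dictionary keys are illustrative and are not the actual FLAS configuration names.

# Decider configuration used throughout the evaluation (illustrative keys).
DECIDER_CONFIG = {
    "h": 4,                    # prediction horizon (number of forecast points)
    "inc_trend_th": 0.9,       # upward-trend threshold for predictive scale-out
    "dec_trend_th": -0.9,      # downward-trend threshold for predictive scale-in
    "react_upper_th_ms": 750,  # 75% of the 1 s maximum RT imposed by the SLA
    "react_lower_th_ms": 10,   # 1% of the maximum RT imposed by the SLA
    "react_w": 2,              # consecutive RT' estimations needed to react
}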
These specific values of the configuration parameters have been obtained empirically and have demonstrated the best results for this particular case. All the graphs shown in this section have been generated with these configuration parameters, which have been empirically shown to be the ones that produce the lowest number of SLA violations for this set of test cases. However, as can be seen in Section 6.6, other values of these parameters can generate different scaling decisions. More specifically, Section 6.6 compares the quantitative results of some of the alternative configurations tested against the chosen configuration.

6.1. Test Case 1: Stationary peak

In this first test case there is a huge increase and later decrease in the number of subscriptions (up to 80k subs/s) in a time interval of a few seconds (peak), as shown by the dotted line in Figure 8. This peak occurs seasonally every certain period of time, representing a massive sending of subscriptions and subsequent unsubscriptions as a result, for example, of interest in an event that is repeated seasonally with the same period and that after some time loses interest (e.g. shopping channels with discount notifications before Christmas or Black Friday, traffic news channels before holiday periods, etc.). This situation, although infrequent, could cause a saturation that would degrade the service of other end users using the same E-SilboPS instance.

In this test case both the scale-out and the scale-in actions have been taken proactively due to the predictions of the Workload Trend Forecaster. As can be seen in Figure 8, the scale-out action is triggered before the response time grows exponentially, as a consequence of the forecast performance trend. Once the scale-out action is completed, the system goes from a 1-1-1 topology to 1-2-1, doubling the number of Matcher instances, which doubles the processing capacity of the workload and reduces the response time. Later, when there is a forecast in which the trend is decreasing during h, the scale-in action is triggered, which reduces the Matcher instances to save resources, thus returning to a 1-1-1 topology. By means of the scale-out action, the maximum response time of 1 second indicated by the SLA is not reached, as it would have been had the system not scaled (dashed line). Moreover, when the peak has passed and the processing demand is lower, the system releases the unnecessary resources by performing a scale-in action.

As shown in Figure 8, the time intervals between the vertical lines and the dash-and-dot lines indicate the forecast scaling time (T′so and T′si for the scale-out and scale-in respectively), and the time intervals between the two vertical lines with light shading indicate the actual time of the scaling action (Tso and Tsi). Both scaling actions end slightly earlier than predicted. In addition, due to the lack of greater resolution, it cannot be seen that there is a slight increase in the response time when the scaling actions are initiated; this is because the scaling algorithm of the Matcher in E-SilboPS requires distributing the internal state among the total Matcher instances of the new configuration. This dynamic repartitioning requires sending the corresponding state to each instance and processing it, which implies an overhead on the processing of the workload that continues to be received during the scaling operations.

Finally, the dotted line represents the estimation of the response time at each of those instants (RT′), which, as can be seen, is quite close to the value of the real response time measured at those instants (RT). A figure similar to this one (Figure 8) can be observed in each of the following subsections where the test cases and their results are explained, and therefore the way to interpret them is the same as explained here.
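Since the synthetic workloads used in the evaluation are not public, a stationary subscription peak like the one in this test case could be approximated along the following lines. This is a sketch under our own assumptions (including the 1k subs/s base rate), not the generator actually used.

import numpy as np

def stationary_peak_workload(duration_s=55, base_rate=1_000, peak_rate=80_000,
                             peak_start=10, peak_len=10, period=None):
    """Subscription rate (subs/s) per second of test: a flat base rate with a
    short peak, optionally repeated with a fixed period (seasonal behaviour).
    The notification rate stays constant at 10k notifications/s in all tests."""
    rate = np.full(duration_s, float(base_rate))
    starts = [peak_start] if period is None else range(peak_start, duration_s, period)
    for s in starts:
        rate[s:s + peak_len] = peak_rate
    return rate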
Figure 8: Test case of a stationary peak of subscriptions that FLAS is able to cope with by means of a predictive scale-out before the peak is reached, and a scale-in, also predictive, when the workload is going to decrease permanently, in order to save resources. The values of T′so and T′si indicate the forecast scaling time for the scale-out and scale-in actions respectively, while the values Tso and Tsi indicate the actual times that these scaling actions have taken.

Figure 9: Test case similar to that shown in Section 6.1 but using a reactive rather than predictive approach because it is non-stationary. A threshold of 75% of the maximum RT established by the SLA (i.e. 750 ms) has been used. Compared to Figure 8, it can be seen how the scaling action occurs later and therefore the response time increases.

6.2. Test Case 2: Non-stationary peak

In this test case there is a workload peak similar to that of the previous test case, but in a non-stationary manner. This change cancels out the predictive part of FLAS, since it is not able to predict when the peak will occur. However, FLAS is able to make the corresponding scaling decisions in a reactive manner to ensure compliance with the SLA. In a real context, this can occur if there is a programming error in the application using E-SilboPS, a denial of service (DoS) type attack, or simply a sudden and ephemeral massive interest in a certain entity.

Both scaling decisions (scale-out and scale-in) have been taken in a reactive manner using threshold-based rules (Figure 9). When the RT predicted by the Performance Forecaster exceeds a certain threshold for a certain amount of time, the scaling action is triggered. Several executions have been performed with different values both of the threshold (i.e. 75%, 80% and 90% of the maximum response time imposed by the SLA in the case of scale-out) and of the amount of time during which it must be exceeded to trigger the scaling action (i.e. reactive window sizes of 1, 2 and 3 consecutive estimations). These results have allowed us to empirically obtain the best FLAS configuration, as detailed in Section 6.6 (Quantitative analysis of results), where a quantitative comparison of all these variations can be seen.

In general, the reactive approach is less efficient than the predictive one, since it tends to incur longer periods of under-provisioning in the scale-out and over-provisioning in the scale-in as a result of its lack of anticipation. The higher the threshold value of the reactive scaling rules, the more these differences are accentuated. On the other hand, adding the restriction of exceeding the threshold value for more than one estimation prevents the triggering of scaling actions as a consequence of a false positive of the Performance Forecaster, but, once again, it delays the decision making, and if it is not a false positive it could mean a larger SLA violation.

6.3. Test Case 3: Steady increase

This test case reflects a scenario in which subscriptions increase slowly but steadily over time (+500 subscriptions/s) until more than 100k subscriptions are stored. In a real context, this small but constant increase in load may correspond to a certain fashion or trend that causes subscriptions related to a certain entity to gradually increase as this entity becomes more popular, which means that large peaks are not produced as in the previous cases (Figure 10). An example of this behavior can be social network profiles whose interest is slowly but steadily increasing over time, or subscriptions to a platform that is gradually becoming popular.

This type of situation means that the slope predicted by the Workload Trend Forecaster is not sharp enough to trigger a scale-out in the first few moments, and when this slope is clearly accentuated, the maximum response time value imposed by the SLA is already being exceeded by a long margin. In this situation, a predictive scale-out is not possible because the scaling decision would be taken very late, once the SLA is already violated, which would boost the response time even more by adding the overhead of the scale-out process. Instead, the reactive approach already discussed is able to react earlier in these cases, especially with low threshold values (i.e. 75% of the maximum response time set by the SLA). Moreover, even if the reactive approach determines a little late that a scale-out should be performed, in this context it is not so harmful, because the increase of the response time is quite slow and therefore it will take longer to violate the maximum response time imposed by the SLA.
Figure 10: A test case in which the subscription rate increases slowly, which causes the response time to increase gradually without generating a peak. In this case the predictive approach of FLAS is not able to predict the scale-out action sufficiently in advance, but the reactive approach allows FLAS to trigger the scale-out action without violating the SLA.

Figure 11: Test case with a very punctual workload peak in which FLAS determines that it is not efficient to trigger scaling actions (neither predictively nor reactively), and therefore it keeps a 1-1-1 configuration during the whole test case.

6.4. Test Case 4: Isolated and close to SLA limit workload peak

This test case is very similar to the first one but, unlike the first one, the workload peak occurs over a much shorter instant of time, going from a workload of 30k subscriptions/s to 120k subscriptions/s and returning the next instant to 30k subscriptions/s (Figure 11). As previously mentioned, this test case and the next one represent worst-case scenarios following a BVA analysis, and therefore it will be very difficult to find this type of case in a real context, but it allows FLAS to be evaluated even under the worst imaginable conditions.

Although this test case seems a little artificial, it serves to illustrate how the system is capable of adequately managing peaks so punctual that it is not worth triggering a scaling action, since the duration of the scale-out operation itself could be double or triple the duration of the peak; in addition, the overhead introduced by the scaling action itself could make the response time at that instant even greater. In this case it is not efficient to trigger a scale-out and scale-in operation, and FLAS is able to detect this through the condition that the trend has to be maintained during a certain number of the predictions of the prediction time horizon h. That is, the Workload Trend Forecaster detects the peak, but since it does not show a trend maintained over time, the Decider resolves that it should not trigger any scaling action. Moreover, in this case, the reactive window has to be large enough not to trigger a scale-out action as a result of the peak in the response time estimation when the workload peak occurs. As shown in Figure 11, the maximum response time SLA of one second is only violated during a very short period of time (approximately 1 second).

6.5. Test Case 5: Consecutive and close to SLA limit workload peaks

This test case is based on the previous one, but instead of a single punctual peak, several peaks occur in a row during a certain time interval. This behavior, as in the previous cases, is repeated periodically. In this case, it would be considered advisable to perform a scale-out action, since otherwise the maximum response time imposed by the SLA would be violated for quite some time, as shown in Figure 12.

However, as in the previous test case, it is not possible to give a predictive response, as the upward trend is not stable over time but fluctuates continuously. Nevertheless, FLAS is still able to determine that it must perform a scale-out, since it detects that the estimated response time is above a certain threshold (75% of the maximum established by the SLA, in this case 750 ms) for two consecutive estimations. In this way, as shown in Figure 12, a reactive scale-out is produced. Some SLA violations can still occur for one or two seconds, but this is far less than if no scaling action were taken. Finally, the scale-in is predictive, since a clear decreasing trend of the response time can be detected once the succession of workload peaks is over.

6.6. Quantitative analysis of results

This section presents and discusses the results of the evaluation of the test cases described above (Tables 1 and 2). The results presented here are average values obtained from 20 executions of each test case. The default setting for the reactive scaling is the one mentioned above (75% of the maximum response time imposed by the SLA as the scale-out threshold and a reactive window size of 2). It is later shown that this is the most efficient configuration for this case.

When interpreting Tables 1 and 2, the distinction between NA and 0.00 values must be taken into account. NA values indicate that there is no data for that case, since that measure does not apply to that particular case, whereas 0.00 indicates that the value of that metric is 0.00.

For each test case a single version has been executed without any scaling action in order to obtain a baseline of the response time behavior for that test case.
Figure 12: Test case with continuous punctual load peaks. FLAS is able to perform a scale-out reactively to cope with the increased workload. The scale-in is predictive, since a continuous decrease of the response time is predicted. Only one estimation above the reactive threshold can be appreciated, but this is due to the aggregation of the data for the sake of clarity.

This baseline has been used to determine the Demand Point of each scaling action (DPsa). In each execution, the difference between the demand point for a scaling action and the completion of the corresponding scaling action (RPsa), i.e. DPsa − RPsa, has been calculated. Positive values of this difference indicate over-provisioning in the case of a scale-out action and under-provisioning in the case of a scale-in action. Conversely, negative values of this difference indicate under-provisioning and over-provisioning if the scaling action is a scale-out and a scale-in respectively. In this way, the percentage of time in over-provisioning and under-provisioning can be calculated as the sum of all over-provisioning and under-provisioning times for each scaling action divided by the total runtime.

Finally, Table 1 also shows the percentage of time that the SLA has been violated, that is, the percentage of time that the RT has been greater than 1 second. Table 2 shows the scaling action times, calculated as Σ|TPsa − RPsa| for all scale-in and scale-out actions respectively. In addition, for each type of scaling action it shows the relative error of the prediction of T, calculated as Σ(Tforecast − T)/T. The sign of this relative prediction error indicates whether the predicted T value is greater than the real value of the scaling time (positive) or, on the contrary, the predicted scaling time is less than the real time of the scaling action (negative).

Test case 5 has a very low SLA maximum response time violation rate (0.36%). The over-provisioning of this test case is mainly due to the predictive scale-in actions, and the under-provisioning to the reactive scale-out actions, the latter being the ones that cause these small SLA violations. In turn, Test case 3 does not violate the maximum response time of the SLA at any time due to the early reactive scaling, as seen in the percentage of over-provisioning time, the slow increase in response time of this test case and the null value of under-provisioning.

Test case 4 is the only case where no scaling action is taken, and yet the SLA violation percentage is very low (less than 1%). It should be remembered that in this case, since the peak is so small, it is not profitable to perform a scale-out, since the overhead of the scale-out action would produce an increase in response time that would result in a larger SLA violation than if the scale-out action were not performed. Thanks to the reactive scaling window, FLAS is able to detect this situation and conclude that it is not necessary to trigger the scale-out action.

For the case of the Non-stationary peak test (Test case 2), several reactive versions have been tested in order to compare them with the predictive one (Test case 1) and to show how the configuration values of the Decider have been fixed empirically. Each version of Test case 2 is identified as X-Y, where X is the scale-out threshold (as a percentage of the maximum RT defined by the SLA) and Y is the size of the reactive window. Therefore, the higher the X and Y values, the later the scaling action will be triggered. As can be seen, the predictive version (Test case 1) does not violate the SLA at any time, since the under-provisioning is very low and the over-provisioning is somewhat higher (due to the prediction error). Looking at the reactive versions, for the same value of the scaling threshold, the total under-provisioning time increases as the size of the reactive scaling window increases, and therefore the time in violation of the SLA increases. In view of these data, the reactive configurations 75-1 and 75-2 are the ones that generate the fewest SLA violations and, specifically, configuration 75-2 has been chosen in the Decider since it allows cases such as Test case 4 to be treated efficiently, whereas with configuration 75-1 this would not be possible (although with configuration 75-2 SLA violations increase slightly compared to 75-1, they still do not exceed 1% of the time).

With respect to Table 2, it can be seen how the scale-out time (avg T scale-out) increases with the size of the reactive window and the scale-out threshold. This is due to the fact that, as mentioned, with higher values of the reactive window and the scale-out threshold the scale-out action is triggered later, and therefore the E-SilboPS load is higher and closer to saturation, which increases the scale-out time. This same phenomenon can be observed in the relative error of the prediction of T in the scale-out (last column), since for the reactive cases the error goes from positive values (i.e. T′ > T) to negative values (i.e. T′ < T).

In Table 2 it can also be seen that, for the scaling actions triggered in a predictive way, the relative error of the prediction of T oscillates between approximately 30% and 40%. A priori this could be interpreted as poor accuracy of the predictive model of the Scaling Time Forecaster, but it needs to be analyzed in this context. The scaling horizon h of the Workload Trend Forecaster depends on the prediction of T, so if the predicted T is always slightly higher, h is also larger, and therefore the scaling decisions are taken slightly in advance of what would be the optimum. This leads to a slight increase in the percentage of time in over-provisioning in those same cases, as can be seen in Table 1. However, it must be taken into account that in this context the objective of FLAS is to minimize performance SLA violations, and therefore scaling actions that are slightly advanced (as a consequence of a predicted T greater than the real T) entail only a higher cost derived from a higher over-provisioning. It should be borne in mind that a slightly delayed scaling action (predicted T less than actual T) usually entails an increase in SLA violations (as seen in the results of Table 1), which is a much greater penalty than a slight over-provisioning, and therefore in this context it seems prudent to keep that margin of error.
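The provisioning and violation percentages reported in Table 1 can be computed directly from the definitions given above. The following is a sketch under our own assumptions (the tuple layout and names are illustrative, and the response time trace is assumed to be uniformly sampled), not the measurement code used in the evaluation.

def provisioning_summary(actions, rt_trace, sla_ms=1000.0, runtime_s=None):
    """Over/under-provisioning and SLA-violation percentages as defined above.
    `actions` is a list of (kind, demand_point_s, realisation_point_s) tuples,
    with kind in {"scale-out", "scale-in"}; rt_trace is a list of (t_s, rt_ms)
    samples assumed to be uniformly spaced."""
    over = under = 0.0
    for kind, dp, rp in actions:
        diff = dp - rp
        if kind == "scale-out":        # positive difference -> over-provisioning
            over += max(diff, 0.0)
            under += max(-diff, 0.0)
        else:                          # scale-in: positive difference -> under-provisioning
            under += max(diff, 0.0)
            over += max(-diff, 0.0)
    total = runtime_s if runtime_s is not None else rt_trace[-1][0] - rt_trace[0][0]
    violation = sum(1 for _, rt in rt_trace if rt > sla_ms) / len(rt_trace)
    return {"over_provisioning_pct": 100.0 * over / total,
            "under_provisioning_pct": 100.0 * under / total,
            "sla_violation_pct": 100.0 * violation}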
Test case                           | Reactive window size | Total over-provisioning time (%) | Total under-provisioning time (%) | Time in SLA violation (%)
Test case 5                         | 2 | 10.16 | 7.39  | 0.36
Test case 3                         | 2 | 3.98  | 0.00  | 0.00
Test case 4                         | 2 | NA    | NA    | 0.61
Test case 1                         | 2 | 7.66  | 0.84  | 0.00
Test case 2 (reactUpperTH = 750 ms) | 1 | 7.65  | 0.84  | 0.18
Test case 2 (reactUpperTH = 750 ms) | 2 | 7.61  | 2.16  | 0.78
Test case 2 (reactUpperTH = 750 ms) | 3 | 9.18  | 8.13  | 6.14
Test case 2 (reactUpperTH = 800 ms) | 1 | 8.87  | 3.20  | 1.56
Test case 2 (reactUpperTH = 800 ms) | 2 | 7.02  | 4.92  | 5.21
Test case 2 (reactUpperTH = 800 ms) | 3 | 12.10 | 11.02 | 5.71
Test case 2 (reactUpperTH = 900 ms) | 1 | 9.00  | 4.72  | 2.27
Test case 2 (reactUpperTH = 900 ms) | 2 | 8.08  | 4.45  | 2.73
Test case 2 (reactUpperTH = 900 ms) | 3 | 10.07 | 9.54  | 7.79

Table 1: Results of the percentage of under-provisioning, over-provisioning and violation of the maximum response time imposed by the SLA for the different test cases. For the Non-stationary peak test case (Test case 2), the predictive and reactive X-Y versions are compared (X being the scale-out threshold and Y the size of the reactive window, respectively).

Test case                           | Reactive window size | avg T scale-in (s) | avg T scale-out (s) | Relative error of T prediction in scale-in (%) | Relative error of T prediction in scale-out (%)
Test case 5                         | 2 | 1.99 | 2.88 | 40.71 | -2.84
Test case 3                         | 2 | NA   | 2.93 | NA    | 2.86
Test case 4                         | 2 | NA   | NA   | NA    | NA
Test case 1                         | 2 | 1.97 | 2.11 | 42.31 | 36.81
Test case 2 (reactUpperTH = 750 ms) | 1 | 1.99 | 2.92 | 41.05 | 5.31
Test case 2 (reactUpperTH = 750 ms) | 2 | 2.29 | 3.19 | 28.87 | -0.57
Test case 2 (reactUpperTH = 750 ms) | 3 | 2.02 | 5.39 | 38.51 | -41.03
Test case 2 (reactUpperTH = 800 ms) | 1 | 2.06 | 3.27 | 36.07 | 0.11
Test case 2 (reactUpperTH = 800 ms) | 2 | 1.41 | 4.32 | 99.07 | -10.40
Test case 2 (reactUpperTH = 800 ms) | 3 | 1.99 | 5.33 | 41.13 | -38.50
Test case 2 (reactUpperTH = 900 ms) | 1 | 1.99 | 3.79 | 41.07 | -12.70
Test case 2 (reactUpperTH = 900 ms) | 2 | 1.97 | 3.89 | 41.91 | -16.91
Test case 2 (reactUpperTH = 900 ms) | 3 | 2.14 | 5.80 | 33.12 | -43.81

Table 2: Results of the scaling times (T) and the error in the prediction of these times for the scale-out and scale-in actions of the different test cases.

Following the evaluation methodology of similar works [17], Figure 13 compares FLAS with each of the auto-scaling techniques that compose it (reactive and predictive) and with a reactive technique using threshold-based rules. The latter has been chosen because it is the most widely used auto-scaler in practice, since it is the one offered by the main Cloud providers. In this case, it has been configured so that the scale-out is triggered if CPU usage is above 80% for more than two monitoring periods, and the scale-in is triggered in an analogous way with a threshold of 40%. This configuration has been chosen because it has been empirically proven to minimize SLA violations. After any scaling action of any of the auto-scalers, the same cool-down time has been used.
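For reference, the threshold-based baseline just described can be captured in a few lines. This is a sketch of the rule as we interpret it (not the configuration of any particular Cloud provider's auto-scaler); cpu_history is a hypothetical list of CPU utilization samples, one per monitoring period.

def threshold_rule(cpu_history, upper=0.80, lower=0.40, periods=2):
    """Baseline threshold-based auto-scaler used for comparison: scale out when
    CPU usage has stayed above `upper` for more than `periods` monitoring
    periods, and scale in analogously when it has stayed below `lower`."""
    if len(cpu_history) <= periods:
        return "none"
    recent = cpu_history[-(periods + 1):]
    if all(u > upper for u in recent):
        return "scale-out"
    if all(u < lower for u in recent):
        return "scale-in"
    return "none"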
Figure 13: Comparison of FLAS with other auto-scaling techniques in terms of the percentage of time that the SLA is violated and the percentage of time in over-provisioning.

As can be seen in Figure 13, the percentage of time in violation of the SLA with FLAS is the lowest of the two techniques that compose it taken separately; that is, FLAS is capable of coordinating the two auto-scaling techniques to choose the most favorable one in any situation, even in the most extreme cases represented here. On the other hand, it can also be seen that the reactive technique using threshold-based rules has a percentage of SLA violation very similar to that of FLAS, and therefore better than the auto-scaling techniques of FLAS taken separately in some test cases. However, it achieves this with an over-provisioning much higher than that of FLAS and its separate parts in all test cases, which results in an increase in economic cost and energy consumption compared to FLAS.

Finally, considering these results, we can conclude that FLAS meets its objective of minimizing performance SLA violations, especially since some of the test cases presented here are the worst possible scenarios. In these circumstances (taking into account that the reactive versions with a configuration different from 75-2 are shown only for comparison), it is verified that the percentage of time in which the maximum response time of 1 second imposed by the SLA is violated is less than 1% (i.e. 0.78% in the worst case), which ensures compliance with the SLA 99.22% of the time. Furthermore, it has been demonstrated that FLAS performs better than each of the auto-scaling techniques that compose it, being able to coordinate both to choose the most convenient one in each situation. When compared to other auto-scalers, FLAS has been shown to obtain similar results in terms of SLA compliance, but with considerably lower over-provisioning, which results in a lower economic cost and energy savings when using FLAS.

7. Conclusions

In this paper we have presented FLAS (Forecasted Load Auto-Scaling), a generic distributed system auto-scaler that combines the advantages of both auto-scaling approaches, proactive and reactive. The main contributions of this paper are (i) the design and definition of the architecture of a generic framework for distributed system auto-scaling through the combination of proactive and reactive techniques, and (ii) a solution for the auto-scaling of a content-based publish-subscribe distributed system (E-SilboPS). The problems that have been addressed by implementing the architectural framework for the case study have been (i) to create a model to predict the trend of relevant SLA parameters (i.e. response time or throughput), (ii) to create a model capable of estimating, agnostically with respect to the type of application, SLA parameters based on resource usage metrics, (iii) to create a model to predict the scaling time, and (iv) to design a mainly proactive auto-scaling algorithm that also contemplates reactive scaling as a contingency against possible prediction failures. In addition, the modular and decoupled design of the FLAS architecture allows it to be adapted to different distributed systems by modifying a few configuration parameters, and allows the predictive models of the different modules of its architecture to be replaced while respecting their definition.

Experimental results show how FLAS is able to meet the performance requirements set by the SLA even in the worst possible cases. More specifically, it has been demonstrated how FLAS enables a content-based publish-subscribe system to meet the maximum response time imposed by the SLA for over 99% of the time by taking the necessary scaling decisions at any given time under a wide variety
of workloads, by exploiting the advantages of both the reactive and the proactive approaches. In addition, it has also been demonstrated how its scaling algorithm is able to determine those cases in which triggering a scaling action is not efficient, since the overhead involved in the scaling action could lead to a violation of the SLA greater than that incurred by not scaling. To the best of our knowledge, this would be the first auto-scaler for content-based publish-subscribe systems.

Furthermore, it has been empirically demonstrated how FLAS efficiently coordinates the reactive and predictive auto-scaling approaches that compose it to offer better or equal SLA compliance than each of them separately. Compared to other auto-scalers, FLAS achieves the same positive results in compliance with SLAs but with a much lower over-provisioning, reducing the economic cost and the energy consumption.

8. Future directions

This work addresses all dimensions of scaling; however, the current implementation of FLAS doubles or halves the resources to be added or removed as a first approach to the dimension of how much to scale. We are currently working on a module that allows the minimum configuration required for the obtained performance predictions to be predicted. However, this is a module whose configuration is quite specific to the application, and therefore not as generic for all distributed systems as the rest of the FLAS architecture presented in this work. For example, the E-SilboPS deployment with which we have evaluated FLAS in this work allows several instances of an operator to be deployed or removed in a single scaling operation, but other systems do not allow this option and must sequence successive scaling operations.

In addition, we are currently working on improving the workload trend prediction in order to predict non-stationary workloads.

Currently, the prediction of the time of a scaling action (T′sa) is determined by the load; more specifically, in the case of E-SilboPS we have seen that it is based on the ratio of notifications/s and the load of processed subscriptions. We believe that it would be interesting to add the prediction of the load at future moments as a predictor to this model, in order to determine the time of the scaling action more precisely and avoid possible uncontrolled response time peaks when the scaling algorithm of the application involves an overhead (dynamic distribution of the state of the operators).

Finally, other improvements that are being studied are, on the one hand, the inclusion of the current configuration of the distributed system as an additional predictor of the predictive models, to test whether the configuration of a distributed system at a given time can condition future predictions. On the other hand, now that the relationships between low-level and high-level metrics have been studied by means of a statistical predictive model, another predictive model could be developed with other methods (e.g. Artificial Neural Networks) that take these discovered relationships into account. However, although more complex predictive techniques can improve the accuracy of the predictive approach of FLAS, it should be taken into account that this can impact the performance of FLAS, causing a considerable overhead.

References

[1] W. R. S. Yefim Natis, Massimo Pezzini, Keith Guttridge, The 5 Steps Toward Pervasive Event-Driven Architecture, Tech. Rep., Gartner (June 2019).
[2] P. Wu, Q. Shen, R. H. Deng, X. Liu, Y. Zhang, Z. Wu, ObliDC, in: Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, ACM, New York, NY, USA, 2019, pp. 86-99. doi:10.1145/3321705.3329822.
[3] M. Ali, J. Mohajeri, M.-R. Sadeghi, X. Liu, A fully distributed hierarchical attribute-based encryption scheme, Theoretical Computer Science 815 (2020) 25-46. doi:10.1016/j.tcs.2020.02.030.
[4] X. Liu, R. H. Deng, P. Wu, Y. Yang, Lightning-Fast and Privacy-Preserving Outsourced Computation in the Cloud, arXiv preprint (2019) 1-19. arXiv:1909.12540.
[5] T. Lorido-Botran, J. Miguel-Alonso, J. A. Lozano, A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments, Journal of Grid Computing 12 (4) (2014) 559-592. doi:10.1007/s10723-014-9314-7.
[6] L. Rodero-Merino, L. M. Vaquero, V. Gil, F. Galán, J. Fontán, R. S. Montero, I. M. Llorente, From infrastructure delivery to service management in clouds, Future Generation Computer Systems 26 (8) (2010) 1226-1240. doi:10.1016/j.future.2010.02.013.
[7] V. C. Emeakaroha, I. Brandic, M. Maurer, S. Dustdar, Low level Metrics to High level SLAs - LoM2HiS framework: Bridging the gap between monitored metrics and SLA parameters in cloud environments, in: 2010 International Conference on High Performance Computing & Simulation, IEEE, 2010, pp. 48-54. doi:10.1109/HPCS.2010.5547150.
[8] C. Springs, M. Debusmann, A. Keller, SLA-Driven Management of Distributed Systems Using the Common Information Model, IBM TJ Watson Research Center.
[9] A. Paschke, E. Schnappinger-Gerull, A Categorization Scheme for SLA Metrics, Service Oriented Electronic Commerce (2006) 25-40.
[10] S. Vavassori, J. Soriano, R. Fernández, Enabling Large-Scale IoT-Based Services through Elastic Publish/Subscribe, Sensors 17 (9) (2017) 2148. doi:10.3390/s17092148.
[11] G. Kousiouris, D. Kyriazis, S. Gogouvitis, A. Menychtas, K. Konstanteli, T. Varvarigou, Translation of application-level terms to resource-level attributes across the Cloud stack layers, in: 2011 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2011, pp. 153-160. doi:10.1109/ISCC.2011.5984009.
[12] Y. Chen, S. Iyer, X. Liu, D. Milojicic, A. Sahai, Translating Service Level Objectives to lower level policies for multi-tier services, Cluster Computing 11 (3) (2008) 299-311. doi:10.1007/s10586-008-0059-6.
[13] A. Yu, P. Agarwal, J. Yang, Generating wide-area content-based publish/subscribe workloads, Network Meets Database (NetDB).
[14] E. F. Coutinho, F. R. de Carvalho Sousa, P. A. L. Rego, D. G. Gomes, J. N. de Souza, Elasticity in cloud computing: a survey, Annals of Telecommunications - Annales des Télécommunications 70 (7-8) (2015) 289-309. doi:10.1007/s12243-014-0450-7.
[15] G. Galante, L. C. E. de Bona, A Survey on Cloud Computing Elasticity, in: 2012 IEEE Fifth International Conference on Utility and Cloud Computing, IEEE, 2012, pp. 263-270. doi:10.1109/UCC.2012.30.
[16] C. Qu, R. N. Calheiros, R. Buyya, Auto-Scaling Web Applications in Clouds, ACM Computing Surveys 51 (4) (2018) 1-33. doi:10.1145/3148149.
[17] F. Lombardi, A. Muti, L. Aniello, R. Baldoni, S. Bonomi, L. Querzoni, PASCAL: An architecture for proactive auto-scaling of distributed services, Future Generation Computer Systems 98 (2019) 342-361. doi:10.1016/j.future.2019.03.003.
[18] Y. Zhai, W. Xu, Efficient Bottleneck Detection in Stream Process System Using Fuzzy Logic Model, in: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, 2017, pp. 438-445. doi:10.1109/PDP.2017.71.
[19] Z. Gong, X. Gu, J. Wilkes, PRESS: PRedictive Elastic ReSource Scaling for cloud systems, in: 2010 International Conference on Network and Service Management, IEEE, 2010, pp. 9-16. doi:10.1109/CNSM.2010.5691343.
[20] E. Casalicchio, A study on performance measures for auto-scaling CPU-intensive containerized applications, Cluster Computing 22 (3) (2019) 995-1006. doi:10.1007/s10586-018-02890-1.
[21] V. Rampérez, J. Soriano, D. Lizcano, A Multidomain Standards-Based Fog Computing Architecture for Smart Cities, Wireless Communications and Mobile Computing 2018 (2018) 1-14. doi:10.1155/2018/4019858.
[22] S. Vavassori, J. Soriano, D. Lizcano, R. Fernández, Cloud monitoring using elastic Publish/Subscribe, Transactions on Emerging Telecommunications Technologies 25 (2014) 294-307. doi:10.1002/ett.
[23] S. Vavassori, J. Soriano, D. Lizcano, M. Jiménez, Explicit Context Matching in Content-Based Publish/Subscribe Systems, Sensors 13 (3) (2013) 2945-2966. doi:10.3390/s130302945.
[24] A. Carzaniga, D. S. Rosenblum, A. L. Wolf, Design and evaluation of a wide-area event notification service, ACM Transactions on Computer Systems 19 (3) (2001) 332-383. doi:10.1145/380749.380767.
[25] A. Carzaniga, A. L. Wolf, Forwarding in a content-based network, in: Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications - SIGCOMM '03, ACM Press, New York, NY, USA, 2003, p. 163. doi:10.1145/863973.863975.
[26] R. Barazzutti, T. Heinze, A. Martin, E. Onica, P. Felber, C. Fetzer, Z. Jerzak, M. Pasin, E. Riviere, Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine, in: 2014 IEEE 34th International Conference on Distributed Computing Systems, IEEE, 2014, pp. 567-576. doi:10.1109/ICDCS.2014.64.
[27] M. Ulbrich, U. Geilmann, A. Achraf, E. Ghazi, M. Taghdiri, Karlsruhe Reports in Informatics 2011,37, Tech. rep. (2011).
[28] E. D. Lazowska, J. Zahorjan, G. S. Graham, K. C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, Inc., USA, 1984.
[29] M. Sadoghi, H.-A. Jacobsen, BE-tree, in: Proceedings of the 2011 international conference on Management of data - SIGMOD '11, ACM Press, New York, NY, USA, 2011, p. 637. doi:10.1145/1989323.1989390.