The Architecture Tradeoff Analysis Method

Rick Kazman, Mark Klein, Mario Barbacci,
Tom Longstaff, Howard Lipson, Jeromy Carriere

Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213

ABSTRACT

This paper presents the Architecture Tradeoff Analysis Method (ATAM), a structured technique for understanding the tradeoffs inherent in design. This method was developed to provide a principled way to evaluate a software architecture's fitness with respect to multiple competing quality attributes: modifiability, security, performance, availability, and so forth. These attributes interact—improving one often comes at the price of worsening one or more of the others, as is demonstrated in the paper. The ATAM is a spiral model of design: one of postulating candidate architectures followed by analysis and risk mitigation, leading to refined architectures.

KEYWORDS

Software architecture analysis, Quality attributes

ARCHITECTURE TRADEOFF ANALYSIS

Quality attributes of large software systems are principally determined by the system's software architecture. That is, in large systems, the achievement of qualities such as performance, availability, and modifiability depends more on the overall software architecture than on code-level practices such as language choice, detailed design, algorithms, data structures, testing, and so forth. This is not to say that the choice of algorithms or data structures is unimportant, but rather that such choices are less crucial to a system's success than its overall software structure, its architecture. Thus, it is in our interest to try to determine, before it is built, whether a system is destined to satisfy its desired qualities.

Although methods for analyzing specific quality attributes exist (e.g. [4], [5], [8]), these analyses have typically been performed in isolation. In reality, however, the attributes of a system interact. Performance impacts modifiability. Availability impacts safety. Security affects performance. Everything affects cost. And so forth. While experienced designers know that these tradeoffs exist, there is no principled method for characterizing them and, in particular, for characterizing the interactions among attributes.

For this reason, software architectures are often designed "in the dark". Tradeoffs are made—they must be made if the system is to be built—but they are made in an ad hoc fashion. Imagine a sound engineer being given a 28-band graphic equalizer, where each of the equalizer's controls has effects that interact with some subset of the other controls. But the engineer is not given a spectrum analyzer, and is asked to set up a sound stage for optimal fidelity. Clearly such a task is untenable. The only difference between this analogy and software architecture is that software systems have far more than 28 independent but interacting variables to be "tuned".

There are techniques that designers have used to try to mitigate the risks in choosing an architecture to meet a broad palette of quality attributes. The recent activity in cataloguing design patterns and architectural styles is an example of this. A designer will choose one pattern because it is "good for portability" and another because it is "easily modifiable". But the analysis of patterns doesn't go any deeper than that. A user of these patterns does not know how portable, or modifiable, or robust an architecture is until it has been built.

To address these problems this paper introduces the Architecture Tradeoff Analysis Method (ATAM). The ATAM is a method for evaluating architecture-level designs that considers multiple quality attributes such as modifiability, performance, reliability and security in gaining insight as to whether the fully fleshed out incarnation of the architecture will meet its requirements. The method identifies tradeoff points between these attributes, facilitates communication between stakeholders (such as user, developer, customer, maintainer) from the perspective of each attribute, clarifies and refines requirements, and provides a framework for an ongoing, concurrent process of system design and analysis.

The ATAM has grown out of work at the Software Engineering Institute on architectural analysis of individual quality attributes: SAAM (Software Architecture Analysis Method) [4] for modifiability, performance analysis [5], availability analysis, and security analysis [6]. SAAM has already been successfully used to analyze architectures from a wide variety of domains: software tools, air traffic control, financial management, telephony, multimedia, embedded vehicle control, and so on.

The ATAM, as with SAAM, has both social and technical aspects. The technical aspects deal with the kinds of data to be collected and how they are analyzed. The social aspects deal with the interactions among the system's stakeholders and area-specific experts, allowing them to communicate using a common framework, to make the implicit assumptions in their analyses explicit, and to provide an objective basis for negotiating the inevitable architecture tradeoffs. This paper will demonstrate the use of the method, and its benefits in clarifying design issues along multiple attribute dimensions, particularly the tradeoffs in design.
WHY USE ARCHITECTURE TRADEOFF ANALYSIS?

All design, in any discipline, involves tradeoffs; this is well accepted. What is less well understood is the means for making informed, and possibly even optimal, tradeoffs. Design decisions are often made for non-technical reasons: strategic business concerns, meeting the constraints of cost and schedule, using available personnel, and so forth.

Having a structured method helps ensure that the right questions will be asked early, during the requirements and design stages when discovered problems can be solved cheaply. It guides users of the method—the stakeholders—to look for conflicts in the requirements and for resolutions to these conflicts in the software architecture.

In realizing the method, we assume that attribute-specific analyses are interdependent, and that each quality attribute has connections with other attributes, through specific architectural elements. An architectural element is a component, a property of the component, or a property of the relationship between components that affects some quality attribute. For example, the priority of a process is an architectural element that could affect performance. The ATAM helps to identify these dependencies among attributes; what we call tradeoff points. This is the principal difference between the ATAM and other software analysis techniques—that it explicitly considers the connections between multiple attributes, and permits principled reasoning about the tradeoffs that inevitably result from such connections. Other analysis frameworks, if they consider connections at all, do so only in an informal fashion, or at a high level of abstraction (e.g. [7], [8]). As we will show, tradeoff points arise from architectural elements that are affected by multiple attributes.

THE ATAM

The ATAM is a spiral model of design [3], depicted in Figure 1. The ATAM is like the standard spiral model in that each iteration takes one to a more complete understanding of the system, reduces risk, and perturbs the design. It is unlike the standard spiral in that no implementation need be involved: each iteration is motivated by the results of the analysis and results in new, more elaborated, more informed designs.

Analyzing an architecture involves manipulating, controlling, and measuring several sets of architectural elements, environment factors and architectural constraints. The primary task of an architect is to lay out an architecture that will lead to system behavior which is as close as possible to the requirements within the cost constraints. For example, performance requirements are stated in terms of latency and/or throughput. However, these attributes depend on the architectural elements pertaining to resource allocation: the policy for allocating processes to processors, scheduling concurrent processes on a single processor, or managing access to a shared data store. The architect must understand the impact of such architectural elements on the ability of the system to meet its requirements and manipulate those elements appropriately.

This task is typically approached with a dearth of tools, however. The best architects use their hunches, their experience with other systems, and prototyping to guide them. Occasionally an explicit modeling step is also included as a design activity, or an explicit formal analysis of a single quality attribute is performed.

The Steps of the Method
The method is divided into four main areas of activities. These are: scenario and requirements gathering, architectural views and scenario realization, model building and analysis, and tradeoffs. The method works, in broad brush, as follows: once a system's initial set of scenarios and requirements have been elicited and an initial architecture (or small set of architectures) is proposed, subject to environment and other considerations, each quality attribute will be evaluated in turn, and in isolation, with respect to any proposed design.

Figure 1: Steps of the Architecture Tradeoff Analysis Method. The steps are arranged as wedges of a circle, grouped into four phases: Phase I, Scenario & Requirements Gathering (Collect Scenarios; Collect Requirements, Constraints, Environment); Phase II, Architectural Views & Scenario Realization (Describe Architectural Views; Realize Scenarios); Phase III, Model Building & Analyses (Attribute-Specific Analyses); Phase IV, Tradeoffs (Identify Sensitivities; Identify Tradeoffs).


After these evaluations comes a critique step. During this step tradeoff points are found: elements that affect multiple attributes. After the critique we can either: refine the models and re-evaluate; refine the architectures, change the models to reflect these refinements, and re-evaluate; or change some requirements. We now look at each of these steps in more detail.

Step 1 — Collect Scenarios
The first step in the method is to elicit system usage scenarios from a representative group of stakeholders. This serves the same purposes as it does in SAAM: to operationalize both functional and quality requirements, to facilitate communication between stakeholders, and to develop a common vision of the important activities the system should support.

Step 2 — Collect Requirements/Constraints/Environment
The second step in the method is to identify the attribute-based requirements, constraints, and environment of the system. A requirement can have a specific value or can be described via scenarios of hypothetical situations. The environment must be characterized for subsequent analyses (e.g. performance or security), and constraints on the design space, as they evolve, are recorded, as these too affect attribute analyses. This step places a strong emphasis on revisiting the scenarios from the previous step to ensure that they account for important quality attributes.

Step 3 — Describe Architectural Views
The requirements, scenarios, and engineering design principles together generate candidate architectures and constrain the space of design possibilities. In addition, design almost never starts from a clean slate: legacy systems, interoperability, and the successes/failures of previous projects all constrain the space of architectures.

Moreover, the candidate architectures are described in terms of the architectural elements that are relevant to each of the important quality attributes. For example, voting schemes are an important element for reliability; concurrency decomposition and process prioritization are important for performance; firewalls and intruder models are important for security; and encapsulation is important for modifiability.

Throughout our presentation of the method, we assume that multiple, competing architectures are being compared. However, designers typically consider themselves to be working on only a single architecture at a time. Why are these views not aligned? From our perspective, an architecture is a collection of functionality assigned to a set of structural elements, with constraints on the coordination model—the control flow and data flow among those elements. Almost any change will mutate one of these aspects, thus resulting in a new architecture. While this point might seem like a splitting of hairs, these are important hairs to split in this context for the following reason. The ATAM requires building and maintaining attribute models (both quantitative and qualitative) that reflect and help to reason about the architecture. To change any aspect of an architecture—functionality, structural elements, coordination model—will change one or more of the models. Once a change has been proposed, the new and old architectures are "competing", and must be compared. Hence the need for new models that mirror those changes. Using the ATAM, then, is a continual process of choosing among competing architectures, even when these look "pretty much the same" to a casual observer.

Step 4 — Attribute-Specific Analyses
Once a system's initial set of requirements and scenarios has been elicited and an initial architecture (or small set of architectures) is proposed, each quality attribute must be analyzed in isolation, with respect to each architecture. These analyses can be conducted in any order; no individual critique of attributes against requirements or interaction between attributes is done at this point. Allowing separate (concurrent) analysis is an important separation of concerns that allows individual attribute experts to bring their expertise to bear on the system.

The result of the analyses leads to statements about system behavior with respect to values of particular attributes: "requests are responded to in 60 ms on average", "the mean time to failure is 2.3 days", "the system is resistant to known attack scripts", "the hardware will cost $80,000 per platform", "the software will require 4 people per year to maintain", and so forth.

Step 5 — Identify Sensitivities
Here, the sensitivity of individual attribute analyses to particular architectural elements is determined. That is, one or more attributes of the architecture are varied, the models are then varied to capture these design changes, and the results are evaluated. Any modelled values that are significantly affected by a change to the architecture are considered to be sensitivity points.

Step 6 — Identify Tradeoffs
The next step of the method is to critique the models built in step 4 and to find the architectural tradeoff points. Although it is standard practice to critique designs, significant additional leverage can be gained by focussing this critique on the interaction of attribute-specific analyses, particularly the location of tradeoff points. Here is how this is done.

Once the architectural sensitivity points have been determined, finding tradeoff points is simply the identification of architectural elements to which multiple attributes are sensitive. For example, the performance of a client-server architecture might be highly sensitive to the number of servers (performance increases, within some range, by increasing the number of servers). The availability of that architecture might also vary directly with the number of servers. However, the security of the system might vary inversely with the number of servers (because the system contains more potential points of attack). The number of servers, then, is a tradeoff point with respect to this architecture. It is an element, potentially one of many, where architectural tradeoffs will be made, consciously or unconsciously.
Iterations of the ATAM
qualitative) that reflect and help to reason about the architec-
ture. To change any aspect of an architecture—functionality, When we have completed the above steps, we are then in a
position to compare the results of the analyses to the require-
structural elements, coordination model—will change one or
ments. When the analyses show that the system’s predicted
more of the models. Once a change has been proposed, the
new and old architectures are “competing”, and must be behavior comes adequately close to its requirements, the
designers can proceed to a more detailed level of design or to
In practice, however, it is useful to continue to track the architecture with analytic models, to support development, deployment, and beyond to maintenance. Design never ceases in a system's life cycle, and neither should analysis.

In the event that the analysis reveals a problem, we then develop an action plan for changing the architecture, the models, or the requirements. The action plan will draw on the attribute-specific analyses and identification of tradeoff points. This then leads to another iteration of the method.

It should be made clear that we do not expect these steps to be followed linearly. They can and do interact with each other in complex ways: an analysis can lead to the reconsideration of a requirement; the building of a model can point out places where the architecture has not been adequately thought out or documented. This is why we depict the steps as wedges in a circle: at the center of the circle every step touches (and exchanges information with) every other step.

AN EXAMPLE ANALYSIS

To exemplify the ATAM, we have chosen an example that has already been extensively analyzed in the research literature, that of a remote temperature sensor (discussed in [8] and elsewhere). We have chosen this example precisely because it has already been heavily scrutinized. The existence of other analyses focuses attention on the differences of the ATAM. We will analyze this system with respect to its availability, security, and performance attributes.

System Description
The RTS (remote temperature sensor) system exists to measure the temperatures of a set of furnaces, and to report those temperatures to an operator at some other location. In the original example the operator was located at a "host" computer. The RTS sends periodic temperature updates to the host computer, and the host computer sends control requests to the RTS, to change the frequency at which periodic updates are sent. These requests and updates are done on a furnace-by-furnace basis. That is, each furnace can be reporting its temperature at a different frequency. The RTS is presumably part of a larger process control system. The control part of the system is not discussed in this example, however.

We are interested in analyzing the RTS for the qualities of performance, security, and availability. To illustrate these analyses we have made the model problem richer and more complex than its original manifestation. In addition to the original set of functional requirements, we have embedded the RTS into a system architecture based on the client-server idiom. The remote temperature sensor functionality is encapsulated in a server, which serves some number of clients. To remain consistent with the original problem, our analysis will assume that there are 16 clients, one per furnace.

The RTS server hardware includes an analog-to-digital converter (ADC), which can read and convert a temperature for one furnace at a time. Requests for temperature readings are queued and fed, one at a time, to the ADC. The ADC measures the temperature of each furnace at the frequency specified by its most recently received control request.

Architectural Options
Since the ATAM was created to illustrate architectural tradeoffs, we need some architectures to analyze. We will consider three options: a simple Client-Server architecture; a more complex version of this architecture, called Client-Server-Server, where the server has been replicated; and finally an option called Client-Intelligent Cache-Server. Each of these architectures will use the identical server architecture—all that changes is the way in which the rest of the system interacts with the server (or servers).

The server's architecture contains three kinds of components: furnace tasks (independently scheduled units of execution), which schedule themselves to run with some period; a shared communication facility task (Comm), which accepts messages from the furnace tasks and sends them to a specified client; and the ADC task, which accepts requests from the furnace tasks, interfaces with the physical furnaces to determine their temperatures, and passes the result back to the requesting furnace task.

Figure 2: The Architecture of a Furnace Server (Furnace Tasks 1 through 16 connected to the ADC task, which interfaces to the furnaces, and to the Comm task, which sends to the clients)
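To make the decomposition concrete, here is a minimal sketch of the three component kinds as cooperating threads. The use of Python, threads, and queues, and all names, are our illustrative assumptions; the paper does not prescribe an implementation:

import queue
import threading
import time

adc_requests = queue.Queue()   # furnace tasks -> ADC task (serializes the ADC)
outgoing = queue.Queue()       # furnace tasks -> Comm task

def read_temperature(furnace_id):
    return 450.0               # stand-in for the analog-to-digital conversion

def adc_task():
    # Accepts queued requests and performs one conversion at a time.
    while True:
        furnace_id, reply = adc_requests.get()
        reply.put(read_temperature(furnace_id))

def comm_task():
    # Shared communication facility: forwards each message to its client.
    while True:
        client_id, temperature = outgoing.get()  # stand-in for a network send

def furnace_task(furnace_id, period_s):
    # An independently scheduled unit of execution with its own period.
    reply = queue.Queue()
    while True:
        adc_requests.put((furnace_id, reply))
        outgoing.put((furnace_id, reply.get()))  # one client per furnace here
        time.sleep(period_s)

threading.Thread(target=adc_task, daemon=True).start()
threading.Thread(target=comm_task, daemon=True).start()
for i in range(1, 17):         # Furnace Tasks 1..16, each with T(i) = 10 s
    threading.Thread(target=furnace_task, args=(i, 10.0), daemon=True).start()
time.sleep(30)                 # let the tasks run briefly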
Now that the server architecture has been described, we will present the overall system architectures. In each of the systems a set of 16 clients interacts with one or more servers, communicating via a network.

Architectural Option 1 (Client-Server)
Option 1 is the baseline: a simple and inexpensive client-server architecture, with a single server serving all 16 clients, as shown in Figure 3.

Figure 3: Option 1's Architecture (the RTS server connected to Furnace Clients 1 through 16)

Architectural Option 2 (Client-Server-Server)
Option 2 differs from option 1 in that it adds a second server to the system architecture. These servers interact with clients as a "primary" server (indicated by the solid lines between servers and clients) or as a "backup" server (indicated by the dashed lines). As shown in Figure 4, each server has its own set of independent furnace tasks, ADC and Comm, but communicates with the same furnaces and with the same set of clients, although under normal operation each server only serves 8 of the 16 clients.
Every client knows the location of both servers, and if a client detects that its server is down (because it has failed to respond for a prescribed period of time), it will automatically switch to its specified backup.

Figure 4: Option 2's Architecture (Furnace Servers 1 and 2, each the primary for 8 of the 16 clients and the backup for the other 8)

Architectural Option 3 (Client-Intelligent Cache-Server)
Option 3 differs from option 1 in only one way: each client has a "wrapper" that intercedes between it and the server. This wrapper is an "intelligent cache", shown as IC in Figure 5. The cache works as follows: it intercepts periodic temperature updates from the server to the client, builds a history of these updates, and then passes each update to the client. In the event of a service interruption, the cache synthesizes updates for the client. It is an intelligent cache because the updates it provides take advantage of historical temperature trends to extrapolate plausible values into the future. This intelligence may be nothing more than linear extrapolation, or it might be a sophisticated model that analyzes changes in temperature trends, or takes advantage of domain-specific knowledge of how furnaces heat up and cool down.

As long as the furnaces exhibit regular behavior in terms of temperature trends, the cache's extrapolated updates will be accurate. Obviously, the cache's synthesized updates will become less meaningful over time.

Figure 5: Option 3's Architecture (an intelligent cache, IC, in front of each of the 16 furnace clients)
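A minimal sketch of the cache's behavior, assuming the simplest extrapolation mentioned above (linear, over the two most recent readings); the class shape and names are our illustrative assumptions, not taken from the paper:

import time

class IntelligentCache:
    # Wraps a client: real updates pass through and are recorded;
    # during a service interruption, updates are synthesized.

    def __init__(self, max_silence_s):
        self.history = []                  # [(arrival time, temperature), ...]
        self.max_silence_s = max_silence_s

    def on_update(self, temperature):
        # A real periodic update from the server.
        self.history.append((time.time(), temperature))
        return temperature                 # passed through to the client

    def synthesize(self):
        # Called when max_silence_s elapses with no server update:
        # linearly extrapolate from the two most recent readings.
        if len(self.history) < 2:
            return self.history[-1][1]
        (t1, v1), (t2, v2) = self.history[-2:]
        slope = (v2 - v1) / (t2 - t1)
        return v2 + slope * (time.time() - t2)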
These then are our three initial architectural alternatives. To understand and compare them, we will analyze them using the ATAM. This method will aid us in understanding not only the relative strengths and weaknesses of each architecture, but will also provide a structured framework for eliciting and clarifying requirements. This is because each analysis technique incorporates (often implicit) assumptions. The use of several analysis techniques together helps to uncover these assumptions and make them explicit.

PERFORMANCE ANALYSES

In the analyses that follow, we will not show the details of doing performance, availability, security, or any other kind of analysis. We do this for two reasons. First, these details can be found in [2]. Second, this paper is not intended to propose or exemplify any particular analysis technique. Indeed, any technique that meets the information requirements of the ATAM would do just as well. Our interest is in how the techniques interact, and how this interaction minimizes risk in a rational, documented design process.

In doing a performance analysis, we will consider requirements that typically are derived from scenarios generated through interviews with the stakeholders. In this case the performance requirements are:

PR1: Client must receive a temperature reading within F seconds of sending a control request.
PR2: Given that Client X has requested a periodic update every T(i) seconds, it must receive a temperature on the average every T(i) seconds.
PR3: The interval between consecutive periodic updates must be not more than 2T(i) seconds.

In addition to these requirements, we will assume that the behavior patterns and execution environment are as follows:
• Relatively infrequent control requests
• Requests are not dropped
• No message priorities
• Server latency = de-queuing time (Cdq = 10 ms) + furnace task computation (Cfnc = 160 ms)
• Network latency between client and server (Cnet = 1200 ms)

Because attributes "trade off" against each other, each assumption is subject to inspection, validation, and questioning as a result of the ATAM.

Performance Analysis of Option 1
The performance characteristics of architectural option 1 are summarized in Table 1. (WCCL = worst-case control latency; ACPL = average-case periodic latency; BCPL = best-case periodic latency.)

WCCL | ACPL | Jitter
41,120 ms | 5,100 ms | 20,400 ms
Table 1: Performance Analysis for Option 1

A worst case control latency of 41.12 seconds sounds like a bad thing. However, is it? To answer this question we must understand the requirement better. How often will the worst case occur? Is it ever tolerable to have the worst case occur? For a safety-critical application, the answer might be "no". For an interactive Web-based application, the answer might be "yes", because the price of ensuring a smaller worst case is prohibitive. Doing an analysis of a single quality attribute forces one to consider such requirements issues.
The worst case periodic latency is 37.12 seconds. However, the worst case scenario is unlikely: it assumes that all furnaces are queried at the maximum rate (T(1) = 10), that all periodic updates are issued simultaneously, and that the update being measured (the worst case update) is the last one in the queue. More importantly, in this application the cost of a missed update is not great—another one will arrive in the next T(i) seconds. Given these facts, we calculate the average case latency, to see if the system can meet its deadlines under more normal conditions, and accept the fact that an occasional periodic update might be missed.

Finally, we turn to PR3, the "Jitter" requirement. Jitter is the amount of time variation that can occur between consecutive periodic temperature updates. The requirement is that the jitter be not more than 2T(i), which is a minimum of 20 seconds for T(i) = 10. The interval between consecutive readings will be not more than 2T(i) if the difference between worst case and best case latency is not more than 2T(i), for this is an expression of jitter. So, the worst case jitter = WCPL - BCPL = 21,760 - 1,360 = 20,400 ms. This is greater than the minimum 2T(1) of 20 seconds, and so option 1 cannot meet PR3.
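In symbols, with WCPL and BCPL the worst- and best-case periodic latencies (the figures are from the analysis above; the rendering as a displayed equation is ours):

\[
\text{jitter}_{\max} = \mathrm{WCPL} - \mathrm{BCPL} = 21{,}760\ \mathrm{ms} - 1{,}360\ \mathrm{ms} = 20{,}400\ \mathrm{ms} > 2T(1) = 20{,}000\ \mathrm{ms}.
\]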
However, in evaluating architectural option 1's response to PR3, we must ask "What is the cost of a missed update?". Is it ever acceptable to violate this requirement? In some safety-critical applications the answer would be "no". In most applications, the answer would be "yes", provided that this occurrence was infrequent. The results of this evaluation force one to reconsider the importance of meeting PR3.

Performance Analysis of Option 2
The performance characteristics of architectural option 2 are summarized in Table 2.

WCCL | ACPL | Jitter
20,560 ms | 2,550 ms | 9,520 ms
Table 2: Performance Analysis for Option 2

One point should be noted here, and will be returned to later in this discussion: if one of the servers fails, option 2 has the performance and availability characteristics of option 1.

Performance Analysis of Option 3
The performance characteristics of architectural option 3 are summarized in Table 3.

WCCL | ACPL | Jitter
41,120 ms | 5,200 ms | ≤20,400 ms
Table 3: Performance Analysis for Option 3

For this analysis, we have added a new factor: servicing the intelligent cache (adding a new update and recalculating the extrapolation model) takes 100 ms. In this case, the worst case jitter is exactly the same as for option 1, 20,400 ms. However, the intelligent cache exists to protect the client against some amount of lost data. As a consequence, it can bound the worst case jitter. When some pre-set time period elapses, the intelligent cache can pass a synthesized update to the client. When the actual update arrives, the cache updates its state accordingly. Thus, if we trust the intelligent cache, we can bound the worst case jitter to any desired value. The smaller the bounding value, the more likely a given update will be synthesized by the intelligent cache rather than coming directly from the server.

Critique of the Analysis
This simple performance analysis gives insight into the characteristics of each solution early in the design process, as befits an architectural level analysis. As more details are required, the analyses can be refined, using techniques such as RMA [5], SPE [8], simulation, or prototyping. More importantly, a high-level analysis guides our future investigations, highlighting potential performance "hot spots", and allowing us to determine areas of architectural sensitivity to performance, which lead us to the location of tradeoff points.

The ATAM thus promotes analysis at multiple resolutions as a means of minimizing risk at acceptable levels of cost. Areas of high risk are analyzed more deeply (perhaps simulated or prototyped) than the rest of the architecture. And each level of analysis helps determine where to analyze more deeply in the next iteration.

AVAILABILITY ANALYSES

We will initially consider only a single availability requirement for the RTS system:

AR1: System must not be unavailable for more than 60 minutes per year.

The availability analysis considers a range of component failure rates, from 0 to 24 per year. We only present the results for the case of 24 failures per year. We also consider two classes of repairs, depending on the type of failure:
• major failures, such as a burned-out power supply, that require a visit by a hardware technician to repair, taking 1/2 a day; and
• minor failures, such as software bugs, that can be "repaired" by rebooting the system, taking 10 minutes.

To understand the availability of each of the architectural options, we built and solved a Markov model. In this analysis, we only considered server availability.

Availability Analysis of Option 1
Solving the Markov model for option 1 gives the results shown in Table 4: 279 hours of down time per year for the burned-out power supply and almost 4 hours down per year for the faulty operating system.

Repair Time | Failures/yr | Availability | Hrs down/yr
12 hours | 24 | 0.96817 | 278.833
10 minutes | 24 | 0.99954 | 3.9982
Table 4: Availability of Option 1
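The paper's full Markov analysis is in [2], but the single-server entries in Table 4 coincide with the steady state of the simplest two-state (up/down) Markov chain with constant failure and repair rates, so a sketch of that calculation may be helpful. It is our reconstruction, not the authors' actual model:

HOURS_PER_YEAR = 8760.0

def steady_state_availability(failures_per_year, repair_hours):
    # Two-state Markov chain with failure rate lam and repair rate mu;
    # the steady-state probability of the "up" state is mu / (lam + mu).
    lam = failures_per_year / HOURS_PER_YEAR    # failures per hour
    mu = 1.0 / repair_hours                     # repairs per hour
    availability = mu / (lam + mu)
    return availability, (1.0 - availability) * HOURS_PER_YEAR

print(steady_state_availability(24, 12.0))       # ~(0.96817, 278.8): Table 4, row 1
print(steady_state_availability(24, 10.0 / 60))  # ~(0.99954, 4.0):   Table 4, row 2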
Availability Analysis of Option 2
We would expect option 2 to have better availability than option 1, since each server acts as a backup for the other, and we expect the probability of both servers being unavailable to be small. Solving the Markov model for this architecture, we get the results shown in Table 5.

Repair Time | Failures/yr | Availability | Hrs down/yr
12 hours | 24 | 0.99798 | 17.7327
10 minutes | 24 | ~1.0 | 0.0036496
Table 5: Availability of Option 2

Table 5 shows that option 2 now suffers almost 18 hours of down time per year in the burned-out power supply case. This indicates that architectural option 2 might still suffer outages if it encounters frequent hardware problems. On the other hand, option 2 shows near-perfect availability in the operating system reboot scenario. The availability is shown as a perfect 1.0 (the calculations were performed to 5 digits of accuracy). In the worst case of 24 annual failures option 2 exhibits only 13 seconds of down time per year.

Availability Analysis of Option 3
Considering architectural option 3, we expect that it will have better availability characteristics than option 1, but worse than option 2. This is because the intelligent cache, while providing some resistance to server failures, is not expected to be as trustworthy as an independent server. Solving the Markov model, we get the results shown in Table 6 for a cache that is assumed to be trustworthy for 5 minutes.

Repair Time | Failures/yr | Availability | Hrs down/yr
12 hours | 24 | 0.96839 | 276.91
10 minutes | 24 | 0.9997 | 2.66545
Table 6: Availability of Option 3

The results in Table 6 show that the 5 minute intelligent cache does little to improve option 3 over option 1 in the scenario with the burnt-out power supply. Option 3 still suffers over 277 hours of down time per year. However, the results for the reboot scenario look more encouraging. The cache reduces down time to 2.7 hours per year. Thus, it appears that the intelligent cache, if its extrapolation were improved, might provide high availability at low cost (since this option uses a single server, compared with the replicated servers used in option 2). We return to this issue shortly.

CRITIQUE OF THE OPTIONS

Now that we have seen two different attribute analyses, one part of the method can be commented on: the level of granularity at which a system is analyzed. The ATAM advocates analysis at multiple levels of resolution as a means of minimizing risk at acceptable investments of time and effort. Areas that are deemed to be of high risk are analyzed and evaluated more deeply than the rest of the architecture. And each level of analysis helps to determine "hot spots" to focus on in the next iteration. We will illustrate this point next.

The three architectures can be partially characterized and understood by the measures that we have just derived. From this analysis, we can conclude the following:
• Option 1 has poor performance and availability. It is also the least expensive option (in terms of hardware costs; the detailed cost analyses can be found in [2]).
• Option 2 has excellent availability, but at the cost of extra hardware. It also has excellent performance (when both servers are functioning), and the characteristics of option 1 when a single server is down.
• Option 3 has slightly better availability than option 1, better performance than option 1 (in that the worst case jitter can be bounded), slightly greater cost than option 1, and lower cost than option 2.

The conclusions that our analyses lead us to also cause us to ask some further questions.

Further Investigation of Option 2
For example, we need to consider the nature of option 2 with a server failure. Given that option 2 is identical to option 1 when one server fails, and we have already concluded that option 1 has poor performance and availability, it is important to know how much time option 2 will be in that reduced state of service. When we calculate the availability of both servers, using our worst-case assumption of 24 failures per year, we expect to suffer over 22 days of reduced service.
worse than option 2. This is because the intelligent cache,
while providing some resistance to server failures, is not Action Plan
expected to be as trustworthy as an independent server. Solv- Given this understanding of options 2 and 3, we see that
ing the Markov model, we get the results shown in Table 6 none of these completely meet their requirements. While
for a cache that is assumed to be trustworthy for 5 minutes. option 2 meets its availability target (for failures that involve
rebooting the server), it leaves the system in a state where its
Repair Time Failures/yr Reliability Hrs down/yr performance targets can not be met for more than 22 days
per year. Perhaps a combination of options 2 and 3—dual
12 hours 24. 0.96839 276.91 servers and intelligent cache on clients—will be a better
10 minutes 24. 0.9997 2.66545 alternative. This option will provide the superior availability
and performance of option 2, but during the times when one
Table 6: Availability of Option 3
server has failed, we mitigate the jitter problems of the single
The results in Table 6 show that the 5 minute intelligent remaining server by using the intelligent cache.
cache does little to improve option 3 over option 1 in the sce-
We could not have made these design decisions without
nario with the burnt-out power supply. Option 3 still suffers
knowledge gained from the analysis. Performing a multi-
over 277 hours of down time per year. However, the results attribute analysis allows one to understand the strengths and
for the reboot scenario look more encouraging. The cache
weaknesses of a system, and of the parts of a system, within
reduces down time to 2.7 hours per year. Thus, it appears
a framework that supports making design decisions.
that the intelligent cache, if its extrapolation was improved,
might provide high availability at low cost (since this option SENSITIVITY ANALYSES
uses a single server, compared with the replicated servers Given that the performance and availability of option 2 were
used in option 2). We return to this issue shortly. so much better than option 1, we would suspect that these
attributes are sensitive to the number of servers. Sensitivity
CRITIQUE OF THE OPTIONS
analysis confirms this: performance increases linearly as the
Now that we have seen two different attribute analyses, one number of servers increases (up to the point where there is 1
part of the method can be commented on: the level of granu-
server per client) and availability increases by roughly an
larity at which a system is analyzed. The ATAM advocates
order of magnitude with each additional server [2].
analysis at multiple levels of resolution as a means of mini-
mizing risk at acceptable investments of time and effort. Given that option 3 has some desirable characteristics in
Areas that are deemed to be of high risk are analyzed and terms of cost and jitter, we might ask if we can improve the
evaluated more deeply than the rest of the architecture. And intelligent cache sufficiently to make this option acceptable
each level of analysis helps to determine “hot spots” to focus from an availability perspective. To answer this, we plot
on in the next iteration. We will illustrate this point next. option 3’s availability against the length of time during
which the intelligent cache’s data is trusted. This plot is
The three architectures can be partially characterized and
understood by the measures that we have just derived. From
this analysis, we can conclude the following:
• Option 1 has poor performance and availability. It is also
the least expensive option (in terms of hardware costs;
Figure 6: Down time vs. Intelligent Cache Life (down time in hours per year plotted against cache life in minutes)

As we can see, an improved intelligent cache does improve availability. However, the rate of improvement in availability as a function of cache life is so small that no reasonable, achievable amount of cache improvement will result in the kind of availability demonstrated for option 2. In effect, the intelligent cache is an architectural barrier with respect to availability, because it cannot be made to achieve the levels of utility required of it. To put it another way, the availability of option 3 is not sensitive to cache life. To increase the availability substantially, other paths must be investigated.

SECURITY ANALYSES

Although we could have been conducting security analyses alongside the performance and availability analyses from the start, the ATAM does not require that all attributes be analyzed in parallel. The ATAM allows the designer to focus on those attributes that are considered to be primary, and then introduce others later on. This can lead to cost benefits in applying the method, since what may be costly analyses for some secondary attributes need not be applied to architectures that were unsuitable for the primary attributes. Though all analyses need not occur "up-front and simultaneously", the analyses for the secondary attributes can still occur well before implementation begins.

We will now analyze our three options in terms of their security. In particular, we will examine the connections between the furnace servers and clients, since these could be the subject of an attack. The object at risk is the temperature sent from the server to the client, since this information is used by the client to adjust the furnace settings. If the temperature is tampered with it could be a significant safety concern. Thus we have the security requirement:

SR1: The temperature readings must not be corrupted before they arrive at the client.

Our initial security investigation of the architectural options must, once again, make some environmental assumptions. These assumptions are dependent on the operational environment of the delivered system and include factors such as operator training and patch management. These dependencies are out of scope for the analysis at this level of detail, but must be considered later in the design process.

So, to calculate the probability of a successful attack within an acceptable window of opportunity for an intruder, we define initial values that are reasonable for the functions provided in the RTS architectures. These are:

Attack Components | Value
Attack Exposure Window | 60 minutes
Attack Rate | 0.05 systems/min
Server failure rate | 24 failures/year
Prob of server failure within 1 hour | 0.0027
Prob of successful TCP Intercept | 0.5
Prob of successful Spoof IP address | 0.9
Prob of successful Kill Connection | 0.75
Prob of successful Kill Server | 0.25
Table 7: Environmental Security Assumptions

In addition, we will posit two attack scenarios: one where the intruder uses a "man in the middle" (MIM) attack, and one where the intruder uses a "spoof server" attack.

For the MIM attack, the attacker uses a TCP intercept tool to modify the values of the temperatures during transmission. Since there are no specific security countermeasures to this attack, the only barrier is the 60 minute window of opportunity and the 0.5 probability of success for the TCP intercept tool. Thus the rate of successful attack is 0.025 systems/minute, or about 1.5 successful attacks expected in the window of opportunity.
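Written out using the Table 7 values:

\[
0.05\ \tfrac{\text{attempts}}{\text{min}} \times 0.5 = 0.025\ \tfrac{\text{successes}}{\text{min}},
\qquad
0.025 \times 60\ \text{min} = 1.5\ \text{expected successful attacks}.
\]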
For the spoof-server attack, there are three possible ways to succeed. The intruder could wait for a server to fail, then spoof that server's address and take over the client connections. This presumes that the intruder can determine when a server has failed and can take advantage of this before the clients time out. Another successful method would be to cause the server to fail (the "kill server" attack), then take over the connections. A third is to disrupt the connections between the client and server, then establish new connections as a spoofed server (the "kill connection" attack). For this analysis, it is presumed that the intruder is equally likely to attempt any of these methods in a given attack; the results are summarized in Table 8. Of course, these numbers appear precise, but must be treated as estimates given the subjective nature of the environmental assumptions.

Attack Type | Expected Intrusions in 60 Mins
Kill Connection | 2.04
Kill Server | 0.66
Server Failure | 0.0072
Table 8: Anticipated Spoof Attack Success Rates

It should be noted that if the system must deal with switching servers and reconnecting clients when a server goes in and out of service, it will be easier for an intruder to spoof a server and convince a client to switch to the bogus server. We will return to this point in the sensitivity analysis.

The results of this analysis show that in each case, it is expected that a penetration will take place within 60 minutes. For the MIM scenario, the expected number of successful attacks is 1.5, indicating that an intruder would have more than enough time to complete the attack before detection. For the spoof attack, the number of successful attacks ranges from 0.0072 to just over 2, again showing that a penetration using this technique is also likely.
Refined Architectural Options
To address the apparent inadequacy of the three options, we need to cycle around the steps of the ATAM, proposing new architectures. The modified versions of the options include the addition of encryption/decryption and the use of the intelligent cache as an intrusion detection mechanism, as shown in Figure 7.

Figure 7: Security Modifications (each furnace client is fronted by an encryption/decryption unit, E/D, and an intelligent cache, IC; the server side also gains an E/D unit)

Encryption/decryption needs little explanation; it is the most common security "bolt on" solution. The other security enhancement is not a topological change, but rather a change in the function of the intelligent cache. In this design, the cache uses its predictive values to determine whether the temperatures supplied by the network are reasonable. A temperature that is significantly outside a reasonable change range is deemed to be generated by an intruder, and thus the cache aids the operator in detecting an intrusion.
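A sketch of that plausibility test, reusing the cache's extrapolated prediction from the earlier sketch; the tolerance parameter and names are our illustrative assumptions:

def plausible(reading, predicted, tolerance=5.0):
    # Treat a temperature that falls outside the cache's reasonable
    # change range as possibly intruder-generated; such readings are
    # reported to the operator rather than silently accepted.
    return abs(reading - predicted) <= tolerance

# Usage: if not plausible(update, cache.synthesize()), alert the operator.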
Adding encryption adds some new environmental assumptions. These are:

Attack Components | Value
Prob of successful Decrypt | 0.0005
Prob of successful Replay | 0.05
Prob of successful Key Distribution | 0.09
Table 9: Additional Security Assumptions

Based on these assumptions, we can calculate the expected number of intrusions. Not surprisingly, the addition of encryption has reduced these substantially—by at least an order of magnitude—in each option:

Attack Type | Expected Intrusions in 60 Mins
Kill Connection | 0.18225
Kill Server | 0.03375
Server Failure | 0.0006
Table 10: Spoof Attack Success Rates with Encryption

Our analysis of the intelligent cache changes only one environmental assumption: the "Attack Exposure Window" goes down to 5 minutes, since we assume that an operator can detect and respond to an intrusion in that time. Using this form of intrusion detection reduces the number of expected intrusions by 1-2 orders of magnitude, giving a result comparable to encryption, but at substantially lower performance and software/hardware costs:

Attack Type | Expected Intrusions in 60 Mins
Kill Connection | 0.16875
Kill Server | 0.05625
Server Failure | 0.005
Table 11: Spoof Attack Success Rates with Intrusion Detection

At this point, new performance and availability analyses will need to be run to account for the additional functionality and hardware required by the intelligent cache or encryption modifications, thus instigating another trip around the spiral.

SENSITIVITIES AND TRADEOFFS

Following the ATAM, we are now in a position to do further sensitivity analysis. In particular, we noted earlier that both availability and performance are highly positively correlated with the number of servers. A sensitivity analysis with respect to security shows just the opposite: security is negatively related to the number of servers. This is for two reasons:
• going from one to multiple servers requires additional logic within the clients, so that they are able to switch between servers. This provides an entry point for spoofing attacks that does not exist when a client is "hard-wired" to a single server;
• the probability of a server failure within 1 hour increases linearly with the number of servers, thus increasing the opportunities for server spoofing.

At this point we have discovered an architectural tradeoff point: the number of servers. Performance and availability are positively correlated, while security and presumably cost are negatively correlated, with the number of servers. We cannot simultaneously optimize cost, performance, availability, and security. Using this information, we can make informed tradeoff decisions regarding the levels of the various attributes that we can achieve at an acceptable cost, and we can do so within an analytic framework.

THE IMPLICATIONS OF THE ATAM

For every assumption that we make in a system's design, we trade cost for knowledge. For example, if a periodic update is supposed to arrive every 10 seconds, do we want it to arrive exactly every 10 seconds, on average every 10 seconds, or some time within each 10 second window? To give another example, consider the requirement detailing the worst case latency of control packets. As discussed earlier, is this worst case ever acceptable? If so, how frequently can we tolerate it? The process of analyzing architectural attributes forces us to try to answer these questions. Either we understand our requirements precisely or we pay for ignorance by over-engineering or under-engineering the system. If we over-engineer, we pay for our ignorance by making the system needlessly expensive. If we under-engineer, we face system failures, losing customers or perhaps even lives.
Can we believe the numbers that we generated in our analyses? No. However, we can believe the trends—we have seen differences among designs in terms of orders of magnitude—and these differences, along with sensitivity analysis, tell us where to investigate further, where to get better environmental information, and where to prototype, which will get us numbers that we can believe. Every analysis step that we take precipitates new questions. While this seems like a daunting, never-ending prospect, it is manageable because these questions are posed and answered within an analytic attribute framework, and because in architectural analysis we are more interested in finding large effects than in precise estimates.

In addition to concretizing requirements, the ATAM has one other benefit: it helps to uncover implicit requirements. This occurs because attribute analyses are, as we have seen, interdependent—they depend, at least partially, on a common set of elements, such as the number of servers. However, in the past, they have been modeled as though they were independent. This is clearly not the case.

Each analyzed attribute has implications for other attributes. For example, although the availability analysis was only focussed on server availability, in a complete analysis we would look at potential failures of all components, including the furnaces, the clients, and the communication lines, and we would look at the various failure types. One such failure is dropping a message. If we assume that the communication channel is not reliable, then we might want to plan for re-sending messages. To do this involves additional computation (to detect and re-send lost messages), storage (to store the messages until they have been successfully transmitted), and time (for a time-out interval and for message re-transmission). Thus one of the major implications of this availability concern is that the performance models of the options under consideration need to be modified.

To recap, we discover attribute interactions in two ways: by using sensitivity analysis to find tradeoff points, and by examining the assumptions that we make for analysis A while performing analysis B. The "no dropped packets" assumption is one example of such an interaction. This assumption, if false, may have implications for safety, security, and availability. A solution to dropping packets will have implications for performance.

In the ATAM, attribute experts independently create and analyze their models, then they exchange information (clarifying or creating new requirements). On the basis of this information they refine their models. The interaction of attribute-specific analyses, and the identification of tradeoffs, has a greater effect on system understanding and stakeholder communication than any of those analyses could have on their own.

The complexity inherent in most real-world software design implies that an architecture tradeoff analysis will rarely be a straightforward activity that allows you to proceed linearly to a perfect solution. Each step of the method answers some design questions and brings some issues into sharper focus. However, each step often raises new questions and reveals new interactions between attributes, which may require further analysis, sometimes at different levels of abstraction. Such obstacles are an intrinsic part of a detailed, methodical exploration of the design space and cannot be avoided. Managing the conflicts and interactions that are revealed by the ATAM places heavy demands on the analysis skills of the individual attribute experts. Success largely depends upon the ability of those experts to transcend barriers of differing terminology and methodology to understand the implications of inter-attribute dependencies, and to jointly devise candidate architectural solutions for further analysis. As burdensome as this may appear to be, it is far better to intensively manage these attribute interactions early in the design process than to wait until some unfortunate consequences of the interactions are revealed in a deployed system.

CONCLUSIONS

The ATAM was motivated by a desire to make rational choices among competing architectures, based upon well-documented analyses of system attributes at the architectural level, concentrating on the identification of tradeoff points. The ATAM also serves as a vehicle for the early clarification of requirements. As a result of performing an architecture tradeoff analysis, we have an enhanced understanding of, and confidence in, a system's ability to meet its requirements. We also have a documented rationale for the architectural choices made, consisting of both the scenarios used to motivate the attribute-specific analyses and the results of those analyses.

Consider the RTS case study: we began with vague requirements and enumerated three architectural options. The analytical framework helped determine the useful characteristics of each of the architectural options and highlighted the costs and benefits of the architectural features. More importantly, the ATAM helped determine the locations of architectural tradeoff points, which helped us understand the limits of each option. This helped us develop informed action plans for modifying the architecture, leading to new evaluations and new iterations of the method.

REFERENCES
[1] M. Barbacci, M. Klein, C. Weinstock, "Principles for Evaluating the Quality Attributes of a Software Architecture", CMU/SEI-96-TR-36, 1996.
[2] M. Barbacci, J. Carriere, R. Kazman, M. Klein, H. Lipson, T. Longstaff, C. Weinstock, "Architecture Tradeoff Analysis: Managing Attribute Conflicts and Interactions", CMU/SEI-97-TR-29, 1997.
[3] B. Boehm, "A Spiral Model of Software Development and Enhancement", ACM Software Eng. Notes, 11(4), 22-42, 1986.
[4] R. Kazman, G. Abowd, L. Bass, P. Clements, "Scenario-Based Analysis of Software Architecture", IEEE Software, Nov. 1996, 47-55.
[5] M. Klein, T. Ralya, B. Pollak, R. Obenza, M. Gonzales Harbour, A Practitioner's Handbook for Real-Time Analysis, Kluwer Academic, 1993.
[6] H. Lipson, T. Longstaff (eds.), Proceedings of the 1997
Information Survivability Workshop, IEEE CS Press, 1997.
[7] J. McCall, “Quality Factors”, in (J. Marciniak, ed.),
Encyclopedia of Software Engineering, Vol. 2, Wiley: New
York, 1994, 958-969.
[8] C. Smith, L. Williams, “Software Performance Engi-
neering: A Case Study Including Performance Comparison
with Design Alternatives”, IEEE Transactions on Software
Engineering, 19(7), 720-741.
