2009 IEEE 34th Conference on Local Computer Networks (LCN 2009)
Zürich, Switzerland; 20-23 October 2009
An Architecture for Scalable Network Defense
Tim Strayer, Walter Milliken, Ronald Watro,
Walt Heimerdinger†, Steve Harp††, Robert P. Goldman†††, Dustin Spicuzza,
Beverly Schwartz, David Mankins, Derrick Kong, Peiter Mudge Zatko
BBN Technologies; † Honeywell; †† Adventium; ††† SIFT, LLC
{strayer|milliken|rwatro|dspicuzz|bschwart|dm|dkong|mudge}@bbn.com
[email protected];
[email protected];
[email protected]

This material is based upon work supported by DARPA under contract N66001-08-C-2050 with SPAWAR. Approved for Public Release, distribution unlimited.

Abstract—We describe a novel architecture for network defense designed for scaling to very high data rates (100 Gb/s) and very large user populations. Scaling requires both efficient attack detection algorithms and an appropriate execution environment. Our architecture considers the time budget of traffic data extraction and algorithmic processing, provides a suite of detection algorithms—each designed to present different and complementary views of the data—that generate many "traffic events," and reduces false positives by correlating these traffic events into benign or malicious hypotheses.

I. SCALABLE ARCHITECTURES

One of the most pressing challenges imposed on network defense mechanisms is the significant increase in network speeds. While the well-known Moore's Law states that compute power doubles every eighteen months, Gilder's Law [1] states that communication power doubles every six, suggesting that bandwidth grows at least three times faster than compute power. This is a harsh reality for computer network defense; the implication is that defensive strategies must be inherently scalable, or they become moment-in-time solutions.

Scalable attack detection algorithms must operate efficiently and effectively without regard to the bandwidth of the input. Since bandwidth grows roughly three times faster than computation power, it is impossible to consider "scalable" algorithms without also considering the scalability of their execution environments. The increasing volume of input also implies that there is less time available to investigate each alert issued by the algorithms, precipitating the need for fewer, higher value alerts. A truly scalable network monitoring solution therefore requires innovation in the scalable algorithms themselves, the ability to extract and process traffic features at line speed, and the reduction and aggregation of the output into high value alarms.

In the abstract, algorithms take some input, process it, and then generate some output. Network defense algorithms take as input features extracted from the network traffic. There are physical limits to hardware, and the most important one is the speed of accessing memory. As the network line rate increases, DRAM memory access latency dominates many network processing algorithms. While algorithms may have inherent theoretical properties that govern how well they scale, until those algorithms are applied to a real execution environment, their scalability remains only theoretical.

An effective monitoring solution must also consider how to handle the output of the algorithms. Attack detection algorithms with a single, myopic view of the network tend to issue many more false alarms than true ones, causing operators either to ignore the alarms (guaranteeing that all attacks evade detection) or to raise the alarm threshold (allowing some attacks to pass that would otherwise have been caught).

An attack detection algorithm's output—the alarms it generates—is a product of the algorithm's internal model of the network and the actions of the attack. Most of the time these models are implicit in the algorithm itself; when the algorithm makes a choice about the maliciousness of network activity, it is applying its internal models, and therefore the results are limited to the models' accuracy and coverage. A better approach is to employ explicit network and attack models at a correlation point, where events (not yet alarms) from multiple algorithms can be reinforced to provide high value alarms, or explained away to reduce false positives.

Current network monitoring algorithms face strong challenges when they are intended for use at very high speeds. Several approaches have attempted to address these challenges. Broder and Mitzenmacher [2] describe the use of Bloom filters to speed up the accounting of network data traffic; Estan and Varghese [3] describe research directions in the context of gathering information from high-speed networks; Iannaccone et al. [4] describe an effort for basic measurements of relatively high-speed links (e.g., OC-12 and OC-48) employing special network interface cards (NICs).

II. SMITE

With research funding from the US Government, we are developing a network monitoring solution called SMITE: Scalable Monitoring in the Extreme. The SMITE architecture design reflects three key observations on building for scale. First, very high data rates necessarily cause changes in the way traffic features are extracted from the network. Second, the features that are extracted must be appropriate to support a broad range of attack detection algorithms. Third, the algorithms should be viewed as contributing evidence about attacks, where the evidence is correlated to prove (or disprove) an attack hypothesis.

A. Layered Approach

Figure 1 shows the three levels of the SMITE architecture: Sensors, Algorithms, and the Correlator. The sensors extract simple traffic features and other information from the network
traffic and place that data into temporary data structures.
Here, very close to the actual traffic, there is no time
budget for complex analysis or careful culling of the alerts
generated. Instead, the sensors perform very fast
processing to build the data structures that act as a first
pass aggregation of the traffic features.
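The first-pass aggregation described above can be sketched in software, even though the actual SMITE sensors are pipelined hardware. In this minimal illustration, a fixed-size, hash-indexed table of counters gives each packet a bounded cost of a few memory touches; the `FlowTable` name and its fields are hypothetical, not part of SMITE.

```python
# Minimal sketch of a sensor's first-pass aggregation (illustrative only;
# the real SMITE sensors are pipelined hardware). A fixed-size, hash-indexed
# table keeps per-slot packet and byte counters, so each packet costs a
# bounded number of memory accesses: one read-modify-write of its slot.

class FlowTable:
    def __init__(self, slots=1 << 16):
        self.slots = slots
        self.packets = [0] * slots   # packet count per slot
        self.bytes = [0] * slots     # byte count per slot

    def _index(self, src, dst, sport, dport):
        # Hash the 4-tuple into a slot; collisions simply merge counts,
        # which is acceptable for a coarse first-pass aggregate.
        return hash((src, dst, sport, dport)) % self.slots

    def update(self, src, dst, sport, dport, length):
        i = self._index(src, dst, sport, dport)
        self.packets[i] += 1
        self.bytes[i] += length

    def snapshot_and_reset(self):
        # The Algorithms Layer periodically collects the aggregate and
        # clears the table, giving the sensor a fresh time bin.
        snap = (self.packets, self.bytes)
        self.packets = [0] * self.slots
        self.bytes = [0] * self.slots
        return snap

table = FlowTable()
table.update("10.0.0.1", "10.0.0.2", 1234, 80, 1500)
table.update("10.0.0.1", "10.0.0.2", 1234, 80, 40)
pkts, byts = table.snapshot_and_reset()
print(sum(pkts), sum(byts))  # 2 packets, 1540 bytes in this time bin
```

The periodic snapshot is what decouples the two layers: the sensor never analyzes, it only counts and hands off.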
Data collected at one or more sensors is passed to the
Algorithms Layer, where various algorithms do more
complex processing of the extracted traffic features and
build more complex data structures. The time budget at
this layer is not as constrained because the algorithms are
operating on aggregated data periodically collected from
the Sensors Layer.
The top layer in the architecture, the Correlation Layer,
holds the Event Correlation Analyzer. It takes as input the
events generated by the various algorithms and, using
models of both attack profiles and normal network behav-
ior kept in the correlation knowledge base, explores a
space of hypotheses to determine which events are truly
high value alerts.
Note that most intrusion detection systems do not ex-
plicitly separate the Sensors and Algorithms Layers be-
cause both functions—feature extraction and feature
analysis—are performed on the same computational platform. However, the two layers require very different execution environments, especially at high data rates. The number of memory accesses is the key driver for the design of a sensor, since memory is by far the slowest aspect of processing the traffic, while the time budget allows algorithms to run on a more conventional computational platform.

Figure 1—SMITE Layered Architecture

Also note that a feature of this architecture is that it supports algorithms that operate on different temporal and spatial scales. Sensors operate on very little context—that of the packet only—and have very little time to process the features. The algorithms operate on features collected from many packets, allowing detection of events that occur within flows and even over several flows. Since data can be stored much longer at the Algorithms Layer, algorithms that employ long-term trend analysis can be used here as well.

This layered architecture provides an extensible framework for adding new algorithms and expanding the deployment footprint, supporting both singular and distributed deployment.

B. Scaling Issues

The primary place to address scaling is at the Sensor Layer, where the sensors meet the traffic, and where lessons from router design can be applied. The key driver in high-speed network processing system design is the duration of a minimum-size packet at link speed; any operation that takes longer than this time must be parallelized, either by breaking the operation down into smaller steps (e.g., pipelining) or by spreading the load over multiple processing elements (e.g., cluster parallelism).

Successful high-speed router designs generally employ pipelining rather than load splitting because pipelining rarely changes the behavior of the processing, while load splitting can introduce ordering and state-sharing complications. Load-splitting designs usually depend on flow bandwidth being small relative to a single processing element, and on passing all packets of all flows that share state through a single processing element. These characteristics do not necessarily hold for the target environment for a network monitor—single flows can be multiple gigabits in bandwidth, and many detection algorithms of interest must examine traffic across multiple flows.

The minimum packet duration (Tmin) constrains the design choices for hardware implementation at multi-gigabit speeds. At 1 Gb/s, Tmin is about 500 ns, depending on the link layer used. A Tmin of 500 ns allows thousands of instructions per packet in a single conventional 3 GHz CPU core, but only about 10 random memory references to main (DRAM) memory. Six of these 10 memory accesses are used just to read the packet into memory and selected fields into the CPU registers. This leaves a scant 4 memory accesses per packet for sensor data structures. These large structures usually will not fit into CPU L1 or L2 caches, and they exhibit no locality of reference, so access to main memory is typical. It is worse, of course, at higher line rates.

Thus, conventional processors are marginal at processing packets at 1 Gb/s line rate, so load-splitting parallelism would be required. Since this restricts algorithm choice, may require additional bookkeeping about packet ordering, and scales poorly to 100 Gb/s (requiring hundreds of CPUs, each with its own main memory), we use high-performance pipelined hardware, the approach that has proven itself successful in router applications [5].
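The time-budget arithmetic above can be reproduced in a few lines. This sketch assumes a 64-byte minimum packet and roughly 50 ns per random DRAM access; both numbers are illustrative assumptions, not figures taken from the SMITE hardware.

```python
# Back-of-envelope check of the per-packet budget discussed above.
# Assumptions (illustrative): 64-byte minimum packet, ~50 ns per random
# DRAM access.

MIN_PKT_BITS = 64 * 8    # minimum-size packet, in bits
DRAM_ACCESS_NS = 50      # rough random-access latency of main memory

def tmin_ns(line_rate_gbps):
    """Duration of a minimum-size packet at the given line rate (ns)."""
    # At 1 Gb/s the line delivers exactly 1 bit per nanosecond.
    return MIN_PKT_BITS / line_rate_gbps

def dram_budget(line_rate_gbps):
    """How many random DRAM accesses fit in one minimum packet time."""
    return int(tmin_ns(line_rate_gbps) // DRAM_ACCESS_NS)

print(tmin_ns(1))        # 512.0 ns -- roughly the 500 ns cited in the text
print(dram_budget(1))    # 10 accesses; 6 go to reading the packet itself
print(dram_budget(100))  # 0 -- at 100 Gb/s the DRAM budget collapses
```

The last line is the whole argument for pipelined hardware in one number: at 100 Gb/s not even a single random DRAM access fits inside Tmin.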
Despite advances in the area of network monitoring, the current state of the art does not achieve monitoring at very high speeds. Monitoring network traffic with the intention of
feeding network defense algorithms operating at these speeds
requires special platform designs, and many approaches are
based on research into high-speed routers [6]. Special purpose
network processors, such as Intel’s IXP line [7], are capable of
high speed switching but fail at more complex processing due
to memory limitations.
The SMITE platform is therefore a high-performance pipe-
lined FPGA hardware design that can support a wide variety
of sensors that examine the traffic stream flowing past each
sensor in the processing pipeline. In general, computation is
not a limiting factor; the primary limitations on sensors are memory access time and memory table size.
C. Event Correlation
Any system that seeks to provide high value alarms that are
actionable and accurate with few false alarms must aggregate
and fuse as much detector information as possible. Individual
detection algorithms will report suspicious or malicious activ-
ity with varying degrees of detail or confidence. Yet data from
as many detection alerts as possible must be included in the
process of generating high value alarms. The reasoning proc-
ess must analyze detection events with widely varying scope
and credibility in a single framework.
Our approach for building the Event Correlation Analyzer
component is to leverage the successful Scyllarus event corre-
lation engine [8]. Scyllarus synthesizes high value alarms from
these events in a three-step process: First, events from individual detection algorithms are clustered with events sharing
common attributes such as time of arrival or locality in ad-
dress space. Next, based on these clusters, event explanations
are hypothesized, some as malicious, and some as benign.
Since some of the events represent competing hypotheses to
explain the same reports (e.g., what looks like a malware transfer might simply be a software update), the final stage of the
process weighs evidence for and against different event hy-
potheses.
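As a toy illustration of this three-step process (clustering, hypothesis generation, evidence weighing), consider the sketch below. The event names and weights are invented for illustration; they are not taken from Scyllarus [8], which is far richer.

```python
# Toy illustration of the three-step correlation described above. Events
# are clustered by a shared attribute (here, the host), competing benign
# and malicious hypotheses are proposed, and invented evidence weights
# decide which explanation wins.

from collections import defaultdict

# Hypothetical weights: how strongly each event type supports the
# "infected" hypothesis versus the "benign" one.
WEIGHTS = {
    "server_adds_ports":     {"infected": 1.0, "benign": 1.0},  # ambiguous
    "high_entropy_external": {"infected": 2.0, "benign": 0.5},
    "server_to_client":      {"infected": 2.0, "benign": 0.5},
}

def correlate(events):
    """events: list of (host, event_type). Returns {host: (verdict, score)}."""
    # Step 1: cluster events sharing a common attribute.
    clusters = defaultdict(list)
    for host, etype in events:
        clusters[host].append(etype)
    # Steps 2-3: hypothesize explanations and weigh evidence for each.
    verdicts = {}
    for host, etypes in clusters.items():
        score = {"infected": 0.0, "benign": 0.0}
        for e in etypes:
            for hyp, w in WEIGHTS[e].items():
                score[hyp] += w
        best = max(score, key=score.get)
        verdicts[host] = (best, score[best])
    return verdicts

events = [("server1", "server_adds_ports"),
          ("server1", "high_entropy_external"),
          ("server1", "server_to_client")]
print(correlate(events))  # server1 is best explained as "infected"
```

The key property is that a single ambiguous event (a server adding a port) does not tip the scales; only corroborating events from other algorithms do.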
D. Attack Detection Algorithms
Informed by these architectural considerations, SMITE
employs algorithms covering the major forms of attack: mali-
cious behavior, malicious code infections, information gather-
ing, and covert control of assets. These algorithms are not a
canonical set of algorithms; while they do cover an interesting
set of attacks, the SMITE architecture is designed to be extensible to new sensors and algorithms.

Figure 2—Detection Algorithms: (a) New Server Detected; (b) Entropy Too High; (c) Client Acts Like a Server
It is important to note that there is some overlap in the at-
tacks these algorithms can detect. This is a feature we exploit
in SMITE. Reducing the number of false alarms, and consequently increasing the number of high value alerts, requires corroboration and correlation of data gathered by the various algorithms.

III. WORKED EXAMPLE

Figure 2 shows a worked example of the SMITE detection architecture. The monitor is placed at the gateway of an enterprise network, where there are two well-known web servers. In this attack, one of the internal servers has been compromised and BitTorrent has been installed. BitTorrent is a peer-to-peer network in which files are distributed among various hosts; hosts both get and serve these files, so BitTorrent nodes are simultaneously both clients and servers.
Figure 2a shows the results of the “Host
Characterization” algorithm that detects new
servers. At 9:08, a new service port is detected on
Server 1. Figure 2b shows the “Entropy” algorithm
tracking the diversity of connections from a par-
ticular host. Clients naturally show higher entropy
because they typically initiate many connections,
whereas servers do not. At 9:10, Server 1 starts
showing many outgoing connections. Figure 2c
measures how much a host acts like a server,
which is part of the “Connection Analyzer” algo-
rithm. At 9:10, Server 1 starts acting like a client.
Figure 3 shows how the Event Correlation Analyzer builds evidence towards a hypothesis of attack. When new service ports were added to Server 1, the Correlator made note of that ("Server Adds Ports" became "Observed"). There are two explanations, one benign ("Legit Svc Added") and the other malicious ("Infected"), but there is not enough evidence to pick one. Then Server 1 started connecting to many hosts ("High Entropy External"), not what servers do, and it began to act like a client ("Server to Client"), again not what servers do. The plausible explanation for both is that Server 1 now "Initiates Conns". Combined with the possibility that the server is "Infected", enough evidence is present to set the alarm.

Figure 3—Event Correlation Analyzer

IV. CONCLUSION

This paper presents the SMITE network monitoring architecture, designed specifically to scale in both network line rate and network size. The obvious solution to scalable monitoring is to develop and use algorithms that are intrinsically scalable. This is a good and necessary first step, but the key insight to scaling is not just the algorithms but having an appropriate platform on which to execute them. Since network line rates are outpacing processing speed, network features must be extracted from the traffic and placed into aggregating data structures as simply as possible. Here, at the network interface, it is memory accesses, and not computation, that is the limiting factor.

The SMITE architecture separates feature extraction and processing into a Sensor Layer (running at network line rate) and an Algorithm Layer (processing aggregated data, and so running under looser time constraints). The architecture provides a suite of algorithms offering multiple views into the data. The algorithms produce "events," or notifications of anomalous behavior, and pass these events on to the Correlation Layer. The Correlation Layer affords the algorithms the luxury of being liberal with event notification—less effort is required to cull the events within the algorithms themselves, since the Event Correlation Analyzer actually works better with more events. Without the Correlation Layer, the events would be alarms, and the number of false positives would be overwhelming. With the Event Correlation Analyzer, the system builds hypotheses to explain the observed behavior, looking for both benign and malicious explanations, until one or the other is proved; only then is an alarm sent to the network operator.

REFERENCES

[1] G. Gilder, TELECOSM: How Infinite Bandwidth Will Revolutionize Our World, Free Press/Simon & Schuster, 2000.
[2] A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters," Internet Mathematics, 2004.
[3] C. Estan and G. Varghese, "New Directions in Traffic Measurement and Accounting," SIGCOMM, Pittsburgh, PA, August 19-23, 2002.
[4] G. Iannaccone, C. Diot, I. Graham, and N. McKeown, "Monitoring Very High Speed Links," ACM Internet Measurement Workshop, San Francisco, CA, November 1-2, 2001.
[5] C. Partridge, et al., "A 50-Gb/s IP Router," IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998.
[6] S. Kumar, et al., "Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection," SIGCOMM Computer Communication Review, August 2006.
[7] Intel IXP2XXX Product Line of Network Processors.
[8] W. Heimerdinger, "Scyllarus Intrusion Detection Report Correlator and Analyzer," Proceedings of the DISCEX-III Conference, 2003.