Evasion Attacks against Machine Learning at Test Time
1 Introduction
Machine learning is being increasingly used in security-sensitive applications such
as spam filtering, malware detection, and network intrusion detection [3,5,9,11,14,15,16,19,21].
Due to their intrinsic adversarial nature, these applications differ from the classi-
cal machine learning setting in which the underlying data distribution is assumed
to be stationary. To the contrary, in security-sensitive applications, samples (and,
thus, their distribution) can be actively manipulated by an intelligent, adaptive
adversary to confound learning; e.g., to avoid detection, spam emails are often
modified by obfuscating common spam words or inserting words associated with
legitimate emails [3,9,16,19]. This has led to an arms race between the designers
of learning systems and their adversaries, which is evidenced by the increasing
complexity of modern attacks and countermeasures. For these reasons, classical
performance evaluation techniques are not suitable to reliably assess the secu-
rity of learning algorithms, i.e., the performance degradation caused by carefully
crafted attacks [5].
To better understand the security properties of machine learning systems
in adversarial settings, paradigms from security engineering and cryptography
have been adapted to the machine learning field [2,5,14]. Following common se-
curity protocols, the learning system designer should use proactive protection
mechanisms that anticipate and prevent the adversarial impact. This requires
(i ) finding potential vulnerabilities of learning before they are exploited by the
adversary; (ii ) investigating the impact of the corresponding attacks (i.e., eval-
uating classifier security); and (iii ) devising appropriate countermeasures if an
attack is found to significantly degrade the classifier’s performance.
Two approaches have previously addressed security issues in learning. The
min-max approach assumes the learner and attacker’s loss functions are antago-
nistic, which yields relatively simple optimization problems [10,12]. A more gen-
eral game-theoretic approach applies for non-antagonistic losses; e.g., a spam fil-
ter wants to accurately identify legitimate email while a spammer seeks to boost
his spam’s appeal. Under certain conditions, such problems can be solved using a
Nash equilibrium approach [7,8]. Both approaches provide a secure counterpart
to their respective learning problems; i.e., an optimal anticipatory classifier.
Realistic constraints, however, are too complex and multi-faceted to be incor-
porated into existing game-theoretic approaches. Instead, we investigate the vul-
nerabilities of classification algorithms by deriving evasion attacks in which the
adversary aims to avoid detection by manipulating malicious test samples.4 We
systematically assess classifier security in attack scenarios that exhibit increas-
ing risk levels, simulated by increasing the attacker’s knowledge of the system
and her ability to manipulate attack samples. Our analysis allows a classifier
designer to understand how the classification performance of each considered
model degrades under attack, and thus, to make more informed design choices.
The problem of evasion at test time was addressed in prior work, but lim-
ited to linear and convex-inducing classifiers [9,19,22]. In contrast, the methods
presented in Sections 2 and 3 can generally evade linear or non-linear classifiers
using a gradient-descent approach inspired by Golland’s discriminative direc-
tions technique [13]. Although we focus our analysis on widely-used classifiers
such as Support Vector Machines (SVMs) and neural networks, our approach is
applicable to any classifier with a differentiable discriminant function.
4 Note that other kinds of attacks are possible, e.g., if the adversary can manipulate
the training data. A comprehensive taxonomy of attacks can be found in [2,14].
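To make the gradient-descent strategy concrete, the following is a minimal sketch, not the exact procedure of Section 3: it assumes access to a differentiable discriminant g(x) and its gradient, a fixed normalized step size, and a simple Euclidean distance budget d_max; the function names are illustrative.

import numpy as np

def evade(x0, g, grad_g, step=0.1, d_max=50.0, max_iter=500):
    """Sketch of gradient-descent evasion: starting from a malicious sample x0,
    descend g(x) until it becomes negative (i.e., x is classified as legitimate),
    while keeping the modification within an L2 budget d_max."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for _ in range(max_iter):
        if g(x) < 0:                        # detection evaded
            break
        direction = grad_g(x)
        norm = np.linalg.norm(direction)
        if norm < 1e-12:                    # flat region: gradient vanishes
            break
        x = x - step * direction / norm     # move against the gradient of g
        diff = x - x0                       # project back onto ||x - x0|| <= d_max
        dist = np.linalg.norm(diff)
        if dist > d_max:
            x = x0 + diff * d_max / dist
    return x

For a linear discriminant g(x) = w·x + b, for instance, one would simply pass grad_g = lambda x: w.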
This paper is organized as follows. We present the evasion problem in Sec-
tion 2 and our gradient-descent approach in Section 3. In Section 4 we first
visually demonstrate our attack on the task of handwritten digit recognition,
and then show its effectiveness on a realistic application related to the detection
of PDF malware. Finally in Section 5, we summarize our contributions, discuss
possibilities for improving security, and suggest future extensions of this work.
Most previous work on evasion attacks assumes that the attacker can arbitrarily
change every feature [8,10,12], constraining only the degree of manipulation,
e.g., by limiting the number of modifications or their total cost.
However, many real domains impose stricter restrictions. For example, in the
task of PDF malware detection [20,24,25], removal of content is not feasible, and
content addition may cause correlated changes in the feature vectors.
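As a hedged illustration of such a constraint, the projection step below keeps every candidate attack point inside an "addition-only" feasible set, where no feature may drop below its value in the original malicious sample; the optional per-feature upper bounds are an assumption of this sketch.

import numpy as np

def project_addition_only(x, x0, upper=None):
    """Project a candidate attack point onto the feasible set of a domain where
    content (and hence feature values) can only be added, never removed."""
    x = np.maximum(x, x0)            # never fall below the original feature values
    if upper is not None:
        x = np.minimum(x, upper)     # optional per-feature upper bounds
    return x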
[Fig. 1: surface plots of the attack objective; the bottom panel is titled g(x) − λ p(x | y^c = −1), λ = 20.]
In regions of the feature space scarcely populated by legitimate samples, the attacker's
data provides little information to constrain the shape of ĝ. Thus, when our gradient descent procedure produces
an evasion example in these regions, the attacker cannot be confident that this
sample will actually evade the corresponding classifier. Therefore, to increase the
probability of successful evasion, the attacker should favor attack points from
densely populated regions of legitimate points, where the estimate ĝ(x) is more
reliable (closer to the real g(x)), and tends to become negative in value.
To overcome this shortcoming, we introduce an additional component into
our attack objective, which estimates p(x | y^c = −1) using a density estimator.
This term acts as a penalizer for x in low density regions and is weighted by a
parameter λ ≥ 0 yielding the following modified optimization problem:
arg min_x F(x) = ĝ(x) − (λ/n) Σ_{i | y_i^c = −1} k((x − x_i)/h)        (2)
This alternate objective trades off between minimizing ĝ(x) (or p(y^c = −1 | x)) and
maximizing the estimated density p(x | y^c = −1). The extra component favors
attack points that imitate features of known legitimate samples. In doing so, it
reshapes the objective function and thereby biases the resulting gradient descent
towards regions where the negative class is concentrated (see the bottom plot
in Fig. 1). This produces a similar effect to that shown by mimicry attacks in
network intrusion detection [11].7 For this reason, although our setting is rather
different, in the sequel we refer to this extra term as the mimicry component.
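A minimal sketch of how Eq. (2) and its gradient could be evaluated is given below, assuming a Gaussian kernel k(u) = exp(−||u||²/2) for the density estimator; the kernel choice, the bandwidth h, and the function names are illustrative assumptions rather than the exact setup used in our experiments.

import numpy as np

def mimicry_objective(x, g_hat, grad_g_hat, X_legit, lam=10.0, h=1.0):
    """Value and gradient of F(x) = g_hat(x) - (lam/n) * sum_i k((x - x_i)/h),
    where the sum runs over the n legitimate samples in X_legit and k is assumed
    to be the Gaussian kernel k(u) = exp(-||u||^2 / 2)."""
    diffs = x - X_legit                                       # shape (n, d)
    k = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * h ** 2))  # kernel values
    n = X_legit.shape[0]
    value = g_hat(x) - lam / n * np.sum(k)
    # gradient of -(lam/n) * k_i w.r.t. x is +(lam/n) * k_i * (x - x_i) / h^2
    grad = grad_g_hat(x) + lam / n * (k[:, None] * diffs).sum(axis=0) / h ** 2
    return value, grad

Descending along this gradient lowers ĝ(x) while pulling x toward densely populated regions of legitimate samples, which is exactly the bias discussed above.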
Finally, we point out that, when mimicry is used (λ > 0), our gradient
descent clearly follows a suboptimal path compared to the case when only g(x)
is minimized (λ = 0). Therefore, more modifications may be required to reach the
same value of g(x) attained when λ = 0. However, as previously discussed, when
λ = 0, our descent approach may terminate at a local minimum where g(x) > 0,
without successfully evading detection. This behavior can thus be qualitatively
regarded as a trade-off between the probability of evading the targeted classifier
and the number of times that the adversary must modify her samples.
7 Mimicry attacks [11] consist of camouflaging malicious network packets to evade
anomaly-based intrusion detection systems by mimicking the characteristics of the
legitimate traffic distribution.
[Fig. 2: architecture of the considered neural network, with inputs x_1, …, x_d, hidden units δ_1, …, δ_m (input weights v_kj, output weights w_1, …, w_m), and output g(x).]
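For the architecture sketched in Fig. 2, the discriminant and its gradient can be written in a few lines; the sketch below assumes sigmoid hidden units and a linear output, which may differ from the exact output nonlinearity used in our implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_discriminant(x, V, w):
    """g(x) and its gradient for a single-hidden-layer network: hidden
    activations delta_k = sigmoid(v_k . x) and output g(x) = sum_k w_k delta_k."""
    delta = sigmoid(V @ x)                      # hidden activations, shape (m,)
    g = float(w @ delta)
    grad = V.T @ (w * delta * (1.0 - delta))    # chain rule through the sigmoids
    return g, grad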
4 Experiments
In this section, we first report a toy example from the MNIST handwritten
digit classification task [18] to visually demonstrate how the proposed algorithm
modifies digits to mislead classification. We then show the effectiveness of the
proposed attack on a more realistic and practical scenario: the detection of mal-
ware in PDF files.
[Fig. 3: attack images in panels titled “Before attack (3 vs 7)”, “After attack, g(x)=0”, and “After attack, last iter.”, together with a plot of g(x) versus the number of iterations.]
Fig. 3. Illustration of the gradient attack on the digit data, for λ = 0 (top row) and
λ = 10 (bottom row). Without a mimicry component (λ = 0) gradient descent quickly
decreases g but the resulting attack image does not resemble a “7”. In contrast, the
attack minimizes g more slowly when mimicry is applied (λ = 10), but the final attack image
closely resembles a mixture between “3” and “7”, as the term “mimicry” suggests.
For λ = 0, the resulting attack images bear only a weak resemblance to the target class “7” but are, nevertheless, reliably mis-
classified. This is the same effect demonstrated in the top-left plot of Fig. 1: the
classifier is evaded by making the attack sample sufficiently dissimilar from the
malicious class. Conversely, when λ = 10, the attack images strongly resemble
the target class because the mimicry term favors samples that are more similar
to the target class. This is the same effect seen in the bottom plot of Fig. 1.
Finally note that, as expected, g(x) tends to decrease more gracefully when
mimicry is used, as we follow a suboptimal descent path. Since the targeted
classifier can be easily evaded when λ = 0, exploiting the mimicry component
would not be the optimal choice in this case. However, in the case of limited
knowledge, as discussed at the end of Section 2.3, mimicry may allow us to
trade for a higher probability of evading the targeted classifier, at the expense
of a higher number of modifications.
[Fig. 4: plots of the FN rate versus d_max; legend entries include PK (C=1), LK (C=1), PK (C=100), LK (C=100), PK, and LK.]
Fig. 4. Experimental results for SVMs with linear and RBF kernel (first and sec-
ond row), and for neural networks (third row). We report the FN values (attained at
FP=0.5%) for increasing dmax . For the sake of readability, we report the average FN
value ± half standard deviation (shown with error bars). Results for perfect (PK) and
limited (LK) knowledge attacks with λ = 0 (without mimicry) are shown in the first
column, while results with λ = 500 (with mimicry) are shown in the second column.
In each plot we considered different values of the classifier parameters, i.e., the regu-
larization parameter C for the linear SVM, the kernel parameter γ for the SVM with
RBF kernel, and the number of neurons m in the hidden layer for the neural network,
as reported in the plot title and legend.
Experimental results. We report our results in Figure 4, in terms of the
false negative (FN) rate attained by the targeted classifiers as a function of the
maximum allowable number of modifications, dmax ∈ [0, 50]. We compute the
FN rate corresponding to a fixed false positive (FP) rate of 0.5%. For
dmax = 0, the FN rate corresponds to a standard performance evaluation using
unmodified PDFs. As expected, the FN rate increases with dmax as the PDF is
increasingly modified. Accordingly, a more secure classifier will exhibit a more
graceful increase of the FN rate.
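For reference, the evaluation protocol just described can be sketched as follows: pick the decision threshold that yields the prescribed FP rate on legitimate samples, then measure the FN rate on the (possibly modified) malicious samples. The score convention and names below are assumptions.

import numpy as np

def fn_at_fixed_fp(scores_legit, scores_malicious, fp_rate=0.005):
    """Assuming higher scores indicate 'more malicious', choose the threshold
    that lets a fraction fp_rate of legitimate samples through as false
    positives, and return the resulting false-negative rate."""
    threshold = np.quantile(scores_legit, 1.0 - fp_rate)
    fn_rate = np.mean(np.asarray(scores_malicious) <= threshold)
    return fn_rate, threshold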
Results for λ = 0. We first investigate the effect of the proposed attack in the
PK case, without considering the mimicry component (Figure 4, first column),
for varying parameters of the considered classifiers. The linear SVM (Figure 4,
top-left plot) is almost always evaded with as few as 5 to 10 modifications, in-
dependent of the regularization parameter C. It is worth noting that attacking
a linear classifier amounts to always incrementing the value of the same highest-
weighted feature (corresponding to the /Linearized keyword in the majority of
the cases) until it reaches its upper bound. This continues with the next highest
weighted non-bounded feature until termination. This occurs simply because the
gradient of g(x) does not depend on x for a linear classifier (see Section 3.1).
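This behavior can be reproduced with a short sketch: since the gradient of a linear discriminant g(x) = w·x + b is the constant vector w, an addition-only descent simply raises the feature with the most negative weight to its upper bound, then the next one, until the modification budget is exhausted. The bounds and budget semantics below are illustrative assumptions.

import numpy as np

def evade_linear_addition_only(x0, w, upper, budget):
    """Evasion of a linear discriminant when features can only be incremented:
    raise, in order, the features with the most negative weights (most indicative
    of the legitimate class) until their bound or the budget is reached."""
    x = np.asarray(x0, dtype=float).copy()
    remaining = float(budget)
    for j in np.argsort(w):                  # most negative weights first
        if w[j] >= 0 or remaining <= 0:
            break                            # further increments would not decrease g
        delta = min(upper[j] - x[j], remaining)
        x[j] += delta
        remaining -= delta
    return x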
With the RBF kernel (Figure 4, middle-left plot), SVMs exhibit a similar be-
havior with C = 1 and various values of its γ parameter,10 and the RBF SVM
provides a higher degree of security compared to linear SVMs (cf. top-left plot
and middle-left plot in Figure 4). Interestingly, compared to SVMs, neural net-
works (Figure 4, bottom-left plot) seem to be much more robust against the
proposed evasion attack. This behavior can be explained by observing that the
decision function of neural networks may be characterized by flat regions (i.e.,
regions where the gradient of g(x) is close to zero). Hence, the gradient descent
algorithm based solely on g(x) essentially stops after few attack iterations for
most of the malicious samples, without being able to find a suitable attack.
In the LK case, without mimicry, classifiers are evaded with a probability
only slightly lower than that found in the PK case, even when only ng = 100
surrogate samples are used to learn the surrogate classifier. This aspect highlights
the threat posed by a skilled adversary with incomplete knowledge: only a small
set of samples may be required to successfully attack the target classifier using
the proposed algorithm.
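The limited-knowledge procedure can be sketched as follows: learn a surrogate classifier on the attacker's own small dataset, attack the surrogate's continuous discriminant, and only then check the resulting sample against the true target. The use of an RBF SVM surrogate, the finite-difference gradient, and the sign convention (positive scores for the malicious class) are assumptions of this sketch, not the exact setup of our experiments.

import numpy as np
from sklearn.svm import SVC

def limited_knowledge_attack(x0, X_surr, y_surr, evade, target_clf):
    """Train a surrogate on the attacker's surrogate data, run the gradient
    attack against its decision function, then check transfer to the target."""
    surrogate = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X_surr, y_surr)
    g_hat = lambda x: float(surrogate.decision_function(x.reshape(1, -1))[0])

    def grad_g_hat(x, eps=1e-4):
        # finite-difference gradient, used here instead of the analytic one
        grad = np.zeros_like(x, dtype=float)
        for j in range(x.size):
            e = np.zeros_like(x, dtype=float)
            e[j] = eps
            grad[j] = (g_hat(x + e) - g_hat(x - e)) / (2.0 * eps)
        return grad

    x_adv = evade(np.asarray(x0, dtype=float), g_hat, grad_g_hat)
    target_label = target_clf.predict(x_adv.reshape(1, -1))[0]   # check transfer
    return x_adv, target_label

Here evade can be any attack routine with the signature of the evade() sketch given in the Introduction.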
Results for λ = 500. When mimicry is used (Figure 4, second column), the
success of the evasion of linear SVMs (with C = 1) decreases both in the PK
(e.g., compare the blue curve in the top-left plot with the solid blue curve in the
top-right plot) and LK case (e.g., compare the dashed red curve in the top-left
plot with the dashed blue curve in the top-right plot). The reason is that the
computed direction tends to lead to a slower descent; i.e., a less direct path that
often requires more modifications to evade the classifier. In the non-linear case
(Figure 4, middle-right and bottom-right plot), instead, mimicking exhibits some
beneficial aspects for the attacker, although the constraint on feature addition
10 We also conducted experiments using C = 0.1 and C = 100, but did not find
significant differences compared to the presented results using C = 1.
may make it difficult to properly mimic legitimate samples. In particular, note
how the targeted SVM with RBF kernel (with C = 1 and γ = 1) in the PK case
(e.g., compare the solid blue curve in the middle-left plot with the solid blue curve
in the middle-right plot) is evaded with a significantly higher probability than
in the case of λ = 0. The reason is that, as explained at the end of Section 2.3, a
pure descent strategy on g(x) may find local minima (i.e., attack samples) that
do not evade detection, while the mimicry component biases the descent towards
regions of the feature space more densely populated by legitimate samples, where
g(x) eventually attains lower values. For neural networks, this aspect is even more
evident, in both the PK and LK settings (compare the dashed/solid curves in
the bottom-left plot with those in the bottom-right plot), since g(x) is essentially
flat far from the decision boundary, and thus pure gradient descent on g cannot
even commence for many malicious samples, as previously mentioned. In this
case, the mimicry term is thus critical for finding a reasonable descent path to
evasion.
Discussion. Our attacks raise questions about the feasibility of detecting ma-
licious PDFs solely based on logical structure. We found that /Linearized,
/OpenAction, /Comment, /Root and /PageLayout were among the most com-
monly manipulated keywords. These keywords indeed appear mainly in legitimate PDFs,
but can be easily added to malicious PDFs by the versioning mechanism. The
attacker can simply insert comments inside the malicious PDF file to augment
its /Comment count. Similarly, she can embed legitimate OpenAction code to add
/OpenAction keywords or add new pages to insert /PageLayout keywords.
Acknowledgments. This work has been partly supported by the project CRP-
18293, L.R. 7/2007, Bando 2009, and by the project “Advanced and secure
sharing of multimedia data over social networks in the future Internet” (CUP
F71J11000690002), both funded by Regione Autonoma della Sardegna. Davide
Maiorca gratefully acknowledges Regione Autonoma della Sardegna for the fi-
nancial support of his PhD scholarship. Blaine Nelson thanks the Alexander von
Humboldt Foundation for providing additional financial support. The opinions
expressed in this paper are solely those of the authors and do not necessarily
reflect the opinions of any sponsor.
References
1. Adobe: PDF Reference, sixth edition, version 1.7
2. Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learn-
ing be secure? In: ASIACCS ’06: Proc. of the 2006 ACM Symp. on Information,
computer and comm. security. pp. 16–25. ACM, New York, NY, USA (2006)
3. Biggio, B., Fumera, G., Roli, F.: Multiple classifier systems for robust classifier
design in adversarial environments. Int’l J. of Machine Learning and Cybernetics
1(1), 27–41 (2010)
4. Biggio, B., Fumera, G., Roli, F.: Design of robust classifiers for adversarial en-
vironments. In: IEEE Int’l Conf. on Systems, Man, and Cybernetics (SMC). pp.
977–982 (2011)
5. Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under
attack. IEEE Trans. on Knowl. and Data Eng. 99(PrePrints), 1 (2013)
6. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector ma-
chines. In: Langford, J., Pineau, J. (eds.) 29th Int’l Conf. on Mach. Learn. (2012)
7. Brückner, M., Scheffer, T.: Stackelberg games for adversarial prediction problems.
In: Knowl. Disc. and D. Mining (KDD). pp. 547–555 (2011)
8. Brückner, M., Kanzow, C., Scheffer, T.: Static prediction games for adversarial
learning problems. J. Mach. Learn. Res. 13, 2617–2654 (2012)
9. Dalvi, N., Domingos, P., Mausam, Sanghai, S., Verma, D.: Adversarial classifica-
tion. In: 10th ACM SIGKDD Int’l Conf. on Knowl. Discovery and Data Mining
(KDD). pp. 99–108 (2004)
10. Dekel, O., Shamir, O., Xiao, L.: Learning to classify with missing and corrupted
features. Mach. Learn. 81, 149–178 (2010)
11. Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending
attacks. In: Proc. 15th Conf. on USENIX Sec. Symp. USENIX Association, CA,
USA (2006)
12. Globerson, A., Roweis, S.T.: Nightmare at test time: robust learning by feature
deletion. In: Cohen, W.W., Moore, A. (eds.) Proc. of the 23rd Int’l Conf. on Mach.
Learn. vol. 148, pp. 353–360. ACM (2006)
13. Golland, P.: Discriminative direction for kernel classifiers. In: Neu. Inf. Proc. Syst.
(NIPS). pp. 745–752 (2002)
14. Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B., Tygar, J.D.: Adversarial ma-
chine learning. In: 4th ACM Workshop on Art. Int. and Sec. (AISec 2011). pp.
43–57. Chicago, IL, USA (2011)
15. Kloft, M., Laskov, P.: Online anomaly detection under adversarial impact. In: Proc.
of the 13th Int’l Conf. on Art. Int. and Stats. (AISTATS). pp. 405–412 (2010)
16. Kolcz, A., Teo, C.H.: Feature weighting for improved classifier robustness. In: Sixth
Conf. on Email and Anti-Spam (CEAS). Mountain View, CA, USA (2009)
17. Laskov, P., Kloft, M.: A framework for quantitative security analysis of machine
learning. In: AISec ’09: Proc. of the 2nd ACM works. on Sec. and art. int. pp. 1–4.
ACM, New York, NY, USA (2009)
18. LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., Drucker,
H., Guyon, I., Müller, U., Säckinger, E., Simard, P., Vapnik, V.: Comparison of
learning algorithms for handwritten digit recognition. In: Int’l Conf. on Art. Neu.
Net. pp. 53–60 (1995)
19. Lowd, D., Meek, C.: Adversarial learning. In: Press, A. (ed.) Proc. of the Eleventh
ACM SIGKDD Int’l Conf. on Knowl. Disc. and D. Mining (KDD). pp. 641–647.
Chicago, IL. (2005)
20. Maiorca, D., Giacinto, G., Corona, I.: A pattern recognition system for malicious
pdf files detection. In: MLDM. pp. 510–524 (2012)
21. Nelson, B., Barreno, M., Chi, F.J., Joseph, A.D., Rubinstein, B.I.P., Saini, U.,
Sutton, C., Tygar, J.D., Xia, K.: Exploiting machine learning to subvert your
spam filter. In: LEET’08: Proc. of the 1st Usenix Work. on L.-S. Exp. and Emerg.
Threats. pp. 1–9. USENIX Association, Berkeley, CA, USA (2008)
22. Nelson, B., Rubinstein, B.I., Huang, L., Joseph, A.D., Lee, S.J., Rao, S., Tygar,
J.D.: Query strategies for evading convex-inducing classifiers. J. Mach. Learn. Res.
13, 1293–1332 (2012)
23. Platt, J.: Probabilistic outputs for support vector machines and comparison to reg-
ularized likelihood methods. In: Smola, A., Bartlett, P., Schölkopf, B., Schuurmans,
D. (eds.) Adv. in L. M. Class. pp. 61–74 (2000)
24. Smutz, C., Stavrou, A.: Malicious pdf detection using metadata and structural
features. In: Proc. of the 28th Annual Comp. Sec. App. Conf. pp. 239–248 (2012)
25. Šrndić, N., Laskov, P.: Detection of malicious pdf files based on hierarchical doc-
ument structure. In: Proc. 20th Annual Net. & Dist. Sys. Sec. Symp. (2013)
26. Young, R.: 2010 IBM X-force mid-year trend & risk report. Tech. rep., IBM (2010)