Ensemble Learning for Malware Analysis
arXiv:2103.12521v1 [cs.CR] 7 Mar 2021
Abstract
In this paper, we consider ensemble classifiers, that is, machine learning based classifiers that
utilize a combination of scoring functions. We provide a framework for categorizing such classifiers,
and we outline several ensemble techniques, discussing how each fits into our framework. From this
general introduction, we then pivot to the topic of ensemble learning within the context of malware
analysis. We present a brief survey of some of the ensemble techniques that have been used in malware
(and related) research. We conclude with an extensive set of experiments, where we apply ensemble
techniques to a large and challenging malware dataset. While many of these ensemble techniques
have appeared in the malware literature, previously there has been no way to directly compare results
such as these, as different datasets and different measures of success are typically used. Our common
framework and empirical results are an effort to bring some sense of order to the chaos that is evident
in the evolving field of ensemble learning—both within the narrow confines of the malware analysis
problem, and in the larger realm of machine learning in general.
1 Introduction
In ensemble learning, multiple learning algorithms are combined, with the goal of improved accuracy
as compared to the individual algorithms. Ensemble techniques are widely used, and as a testament
to their strength, ensembles have won numerous machine learning contests in recent years, including
the KDD Cup [15], the Kaggle competition [14], and the Netflix prize [26].
Many such ensembles resemble Frankenstein’s monster [33], in the sense that they are an ag-
glomeration of disparate components, with some of the components being of questionable value—an
“everything and the kitchen sink” approach clearly prevails. This effect can be clearly observed in
the aforementioned machine learning contests, where there is little (if any) incentive to make systems
that are efficient or practical, as accuracy is typically the only criterion for success. In the case of the
Netflix prize, the winning team was awarded $1,000,000, yet Netflix never implemented the winning
scheme, since the improvements in accuracy “did not seem to justify the engineering effort needed to
bring them into a production environment” [3]. In real-world systems, practicality and efficiency are
necessarily crucial factors.
In this paper, we provide a straightforward framework for categorizing ensemble techniques. We
then consider specific (and relatively simple) examples of various categories of such ensembles, and
we show how these fit into our framework. For various examples of ensembles, we also provide
experimental results, based on a large and diverse malware dataset.
While many of the techniques that we consider have previously appeared in the malware literature,
we are not aware of any comparable study focused on the effectiveness of various ensembles using
a common dataset and common measures of success. While we believe that these examples are
interesting in their own right, they also provide a basis for discussing various tradeoffs between
measures of accuracy and practical considerations.
The remainder of this paper is organized as follows. In Section 2 we discuss ensemble classifiers,
including our framework for categorizing such classifiers. Section 3 contains our experimental results.
This section also includes a discussion of our dataset, scoring metrics, software used, and so on.
Finally, Section 4 concludes the paper and includes suggestions for future work.
2 Ensemble Classifiers
In this section, we first give a selective survey of some examples of malware (and closely related)
research involving ensemble learning. Then we provide a framework for discussing ensemble classifiers
in general.
∗ [email protected]
‡ [email protected]
§ [email protected]
¶ [email protected]
S Department of Computer Science, San Jose State University, San Jose, California
1
2.1 Examples of Related Work
The paper [18] discusses various ways to combine classifiers and provides a theoretical framework for
such combinations. The focus is on straightforward combinations, such as a maximum, sum, product,
majority vote, and so on. The work in [18] has clearly been influential, but it seems somewhat dated,
given the wide variety of ensemble methods that are used today.
The book [20] presents the topic of ensemble learning from a similar perspective as [18] but in
much more detail. Perhaps not surprisingly, the more recent book [62] seems to have a somewhat
more modern perspective with respect to ensemble methods, but retains the theoretical flavor of [20]
and [18]. The brief blog at [35] provides a highly readable (if highly selective) summary of some of
the topics covered in the books [20] and [62].
Here, we take an approach that is, in some sense, more concrete than that in [18, 20, 62]. Our
objective is to provide a relatively straightforward framework for categorizing and discussing ensemble
techniques. We then use this framework as a point of reference for experimental results based on a
variety of ensemble methods.
Table 1 provides a summary of several research papers where ensemble techniques have been
applied to security-related problems. The emphasis here is on malware, but we have also included a
few closely related topics. In any case, this represents a small sample of the many papers that have
been published, and is only intended to provide an indication as to the types and variety of ensemble
strategies that have been considered to date. On this list, we see examples of ensemble methods based
on bagging, boosting, and stacking, as discussed below in Section 2.3.
Let 𝜔1 , 𝜔2 , . . . , 𝜔𝑛 be training samples, and let 𝑣𝑖 be a feature vector of length 𝑚, where the
features that comprise 𝑣𝑖 are extracted from sample 𝜔𝑖 . We collect the feature vectors for all 𝑛
training samples into an 𝑚 × 𝑛 matrix that we denote as
𝑉 = ( 𝑣1 𝑣2 ⋯ 𝑣𝑛 )  (1)
where each 𝑣𝑖 is a column of the matrix 𝑉 . Note that each row of 𝑉 corresponds to a specific feature
type, while column 𝑖 of 𝑉 corresponds to the features extracted from the training sample 𝜔𝑖 .
Let 𝑆 : R𝑚 → R be a scoring function. Such a scoring function will be determined based on
training data, where this training data is given by a feature matrix 𝑉 , as in equation (1). A scoring
function 𝑆 will generally also depend on a set of 𝑘 parameters that we denote as
Λ = ( 𝜆1 𝜆2 ⋯ 𝜆𝑘 )  (2)
The score generated by the scoring function 𝑆 when applied to sample 𝑥 is given by
𝑆(𝑥; 𝑉, Λ)
where we have explicitly included the dependence on the training data 𝑉 and the function parame-
ters Λ.
For any scoring function 𝑆, there is a corresponding classification function that we denote as 𝑆̂ : R𝑚 → {0, 1}. That is, once we determine a threshold to apply to the scoring function 𝑆, it provides a binary classification function that we denote as 𝑆̂. As with 𝑆, we explicitly indicate the dependence on training data 𝑉 and the function parameters Λ by writing 𝑆̂(𝑥; 𝑉, Λ).
For example, each training sample 𝜔𝑖 could be a malware executable file, where all of the 𝜔𝑖
belong to the same malware family. Then an example of an extracted feature 𝑣𝑖 would be the opcode
histogram, that is, the relative frequencies of the mnemonic opcodes that are obtained when 𝜔𝑖 is
disassembled. The scoring function 𝑆 could, for example, be based on a hidden Markov model that
is trained on the feature matrix 𝑉 as given in equation (1), with the parameters Λ in equation (2)
being the initial values that are selected when training the HMM.
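To make this concrete, the following sketch shows how a trained scoring function 𝑆 is converted into a classification function 𝑆̂ by applying a threshold. The scoring function here is a hypothetical stand-in (negative squared distance to the mean of the training feature vectors), not the HMM score discussed above.

```python
# Sketch: from a trained scoring function S(x; V, Lambda) to the thresholded
# binary classifier S-hat. The score used here (negative squared distance to
# the mean of the training feature vectors) is a placeholder for any real
# scoring technique, such as an HMM log-likelihood.

def train_score_fn(V):
    """V is a list of feature-vector columns; returns S(x) for this training set."""
    m = len(V[0])
    mean = [sum(col[i] for col in V) / len(V) for i in range(m)]
    def S(x):
        # higher score = closer to the training samples
        return -sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return S

def make_classifier(S, threshold):
    """The classification function S-hat: scores at or above threshold map to 1."""
    return lambda x: 1 if S(x) >= threshold else 0

# toy training matrix V, with three 2-dimensional feature vectors as columns
V = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
S = train_score_fn(V)
S_hat = make_classifier(S, threshold=-0.5)
```

The threshold itself is typically chosen based on the desired balance between false positives and false negatives.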
In its most general form, an ensemble method for a binary classification problem can be viewed
as a function 𝐹 : Rℓ → {0, 1} of the form
𝐹( 𝑆1(𝑥; 𝑉1, Λ1), 𝑆2(𝑥; 𝑉2, Λ2), …, 𝑆ℓ(𝑥; 𝑉ℓ, Λℓ) )  (3)
That is, the ensemble method defined by the function 𝐹 produces a classification based on the
scores 𝑆1 , 𝑆2 , . . . , 𝑆ℓ , where scoring function 𝑆𝑖 is trained using the data 𝑉𝑖 and parameters Λ𝑖 .
2.3.1 Bagging
In bootstrap aggregation (i.e., bagging), different subsets of the data or features (or both) are used
to generate different scores. The results are then combined in some way, such as a sum of the scores,
or a majority vote of the corresponding classifications. For bagging we assume that the same scoring
method is used for all scores in the ensemble. For example, bagging is used when generating a random
forest, where each individual scoring function is based on a decision tree structure. One benefit of
bagging is that it reduces overfitting, which is a particular problem for decision trees.
For bagging, the general equation (3) is restricted to
𝐹( 𝑆(𝑥; 𝑉1, Λ), 𝑆(𝑥; 𝑉2, Λ), …, 𝑆(𝑥; 𝑉ℓ, Λ) )  (4)
That is, in bagging, each scoring function is essentially the same, but each is trained on a different
feature set. For example, suppose that we collect all available feature vectors into a matrix 𝑉 as in
equation (1). Then bagging based on subsets of samples would correspond to generating 𝑉𝑖 by deleting
a subset of the columns of 𝑉 . On the other hand, bagging based on features would correspond to
generating 𝑉𝑖 by deleting a subset of the rows of 𝑉 . Of course, we can easily extend this to bagging
based on both the data and features simultaneously, as in a random forest. In Section 2.4, we discuss
specific examples of bagging.
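As an illustration of equation (4), the sketch below bags a single (hypothetical) nearest-mean scoring method over bootstrap samples of the columns of 𝑉, combining the resulting classifiers by majority vote. The sampling is stratified by class so that every bag contains both classes; the base classifier is a toy stand-in, not one of the models used in our experiments.

```python
import random

def nearest_mean_classifier(cols, labels):
    """Train one base classifier on a column subset of V (a toy stand-in)."""
    def mean(vs):
        return [sum(v[i] for v in vs) / len(vs) for i in range(len(vs[0]))]
    m0 = mean([c for c, y in zip(cols, labels) if y == 0])
    m1 = mean([c for c, y in zip(cols, labels) if y == 1])
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return lambda x: 0 if dist(x, m0) <= dist(x, m1) else 1

def bagged_classifier(V, labels, n_bags=5, seed=0):
    rng = random.Random(seed)
    class0 = [i for i, y in enumerate(labels) if y == 0]
    class1 = [i for i, y in enumerate(labels) if y == 1]
    models = []
    for _ in range(n_bags):
        # bootstrap sample within each class, so both classes appear in each V_i
        idx = ([rng.choice(class0) for _ in class0]
               + [rng.choice(class1) for _ in class1])
        models.append(nearest_mean_classifier([V[i] for i in idx],
                                              [labels[i] for i in idx]))
    # combine the bagged classifiers by majority vote
    return lambda x: 1 if sum(f(x) for f in models) > n_bags / 2 else 0

# toy columns of V and their labels
V = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
F = bagged_classifier(V, labels)
```

Bagging over rows (features) instead of columns (samples) follows the same pattern, with the deletion applied to vector components rather than to training vectors.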
2.3.2 Boosting
Boosting is a process whereby distinct classifiers are combined to produce a stronger classifier. Gen-
erally, boosting deals with weak classifiers that are combined in an adaptive or iterative manner so as
to improve the overall classifier. We restrict our definition of boosting to cases where the classifiers
are closely related, in the sense that they differ only in terms of parameters. From this perspective,
boosting can be viewed as “bagging” based on classifiers, rather than data or features. That is, all of
the scoring functions are reparameterized versions of the same scoring technique. Under this definition
of boosting, the general equation (3) becomes
𝐹( 𝑆(𝑥; 𝑉, Λ1), 𝑆(𝑥; 𝑉, Λ2), …, 𝑆(𝑥; 𝑉, Λℓ) )  (5)
That is, the scoring functions differ only by re-parameterization, while the scoring data and features
do not change.
Below, in Section 2.4, we discuss specific examples of boosting; in particular, we discuss the most
popular method of boosting, AdaBoost. In addition, we show that some other popular techniques fit
our definition of boosting.
2.3.3 Stacking
Stacking is an ensemble method that combines disparate scores using a meta-classifier [35]. In this
generic form, stacking is defined by the general case in equation (3), where the scoring functions can
be (and typically are) significantly different. Note that from this perspective, stacking is easily seen
to be a generalization of both bagging and boosting.
Because stacking generalizes both bagging and boosting, it is not surprising that stacking based
ensemble methods can outperform bagging and boosting methods, as evidenced by recent machine
learning competitions, including the KDD Cup [15], the Kaggle competition [14], as well as the
infamous Netflix prize [26]. However, this is not the end of the story, as efficiency and practicality are
often ignored in such competitions, whereas in practice, it is virtually always necessary to consider
such issues. Of course, the appropriate tradeoffs will depend on the specifics of the problem at hand.
Our empirical results in Section 3 provide some insights into these tradeoff issues within the malware
analysis domain.
In the next section, we discuss concrete examples of bagging, boosting, and stacking techniques.
Then in Section 3 we present our experimental results, which include selected bagging, boosting, and
stacking architectures.
2.4.1 Maximum
In this case, we have
𝐹( 𝑆1(𝑥; 𝑉1, Λ1), 𝑆2(𝑥; 𝑉2, Λ2), …, 𝑆ℓ(𝑥; 𝑉ℓ, Λℓ) ) = max{𝑆𝑖(𝑥; 𝑉𝑖, Λ𝑖)}  (6)
2.4.2 Averaging
Averaging is defined by
𝐹( 𝑆1(𝑥; 𝑉1, Λ1), 𝑆2(𝑥; 𝑉2, Λ2), …, 𝑆ℓ(𝑥; 𝑉ℓ, Λℓ) ) = (1/ℓ) ( 𝑆1(𝑥; 𝑉1, Λ1) + 𝑆2(𝑥; 𝑉2, Λ2) + ⋯ + 𝑆ℓ(𝑥; 𝑉ℓ, Λℓ) )  (7)
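The maximum (6) and average (7) combining functions are trivial to implement once the individual scoring functions are trained; the sketch below treats each 𝑆𝑖 as an arbitrary callable, with toy stand-ins for the trained scores.

```python
# The max and average combiners of equations (6) and (7), applied to a list
# of already-trained scoring functions (each a callable x -> score).

def combine_max(score_fns, x):
    return max(S(x) for S in score_fns)

def combine_avg(score_fns, x):
    vals = [S(x) for S in score_fns]
    return sum(vals) / len(vals)

# toy stand-ins for trained scoring functions S_1, S_2, S_3
score_fns = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3]
```

In practice the scores would first be normalized to a common scale, since a raw maximum or average is meaningless when the 𝑆𝑖 produce scores with different ranges.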
2.4.3 Voting
Voting could be used as a form of boosting, provided that no bagging is involved (i.e., the same data
and features are used in each case). Voting is also applicable to stacking, and is generally applied
in such a mode, or at least with significant diversity in the scoring functions, since we want limited
correlation when voting.
In the case of stacking, a simple majority vote is of the form
𝐹( 𝑆̂1(𝑥; 𝑉1, Λ1), 𝑆̂2(𝑥; 𝑉2, Λ2), …, 𝑆̂ℓ(𝑥; 𝑉ℓ, Λℓ) ) = maj( 𝑆̂1(𝑥; 𝑉1, Λ1), 𝑆̂2(𝑥; 𝑉2, Λ2), …, 𝑆̂ℓ(𝑥; 𝑉ℓ, Λℓ) )
where “maj” is the majority vote function. Note that the majority vote is well defined in this case,
provided that ℓ is odd—if ℓ is even, we can simply flip a coin in case of a tie.
As an aside, we note that it is easy to see why we want to avoid correlation when voting is used
as a combining function. Consider the following example from [47]. Suppose that we have the three
highly correlated scores
𝑆̂1 = ( 1 1 1 1 1 1 1 1 0 0 )
𝑆̂2 = ( 1 1 1 1 1 1 1 1 0 0 )
𝑆̂3 = ( 1 0 1 1 1 1 1 1 0 0 )
where each 1 indicates correct classification, and each 0 is an incorrect classification. Then, both 𝑆̂︀1
and 𝑆̂︀2 are 80% accurate, and 𝑆̂︀3 is 70% accurate. If we use a simple majority vote, then we obtain
the classifier
𝐶=( 1 1 1 1 1 1 1 1 0 0 )
which is 80% accurate. On the other hand, the less correlated classifiers
𝑆̂1′ = ( 1 1 1 1 1 1 1 1 0 0 )
𝑆̂2′ = ( 0 1 1 1 0 1 1 1 0 1 )
𝑆̂3′ = ( 1 0 0 0 1 0 1 1 1 1 )
are only 80%, 70%, and 60% accurate, respectively, but the majority vote in this case gives us
𝐶′ = ( 1 1 1 1 1 1 1 1 0 1 )
which is 90% accurate—better than any of the individual classifiers.
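This effect is easy to verify directly. The sketch below reproduces the example: a column-wise majority over the correctness indicators gives the accuracy of the majority-vote classifier, since for binary labels the vote is correct on a test case exactly when a majority of the individual classifiers are correct on it.

```python
# Majority vote over correctness indicators (1 = correct classification).

def majority(rows):
    return [1 if sum(col) > len(rows) / 2 else 0 for col in zip(*rows)]

def accuracy(row):
    return sum(row) / len(row)

correlated = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],   # 80% accurate
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],   # 80% accurate
    [1, 0, 1, 1, 1, 1, 1, 1, 0, 0],   # 70% accurate
]
diverse = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],   # 80% accurate
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 1],   # 70% accurate
    [1, 0, 0, 0, 1, 0, 1, 1, 1, 1],   # 60% accurate
]
```

Running the vote over both sets reproduces the 80% versus 90% outcome above: the correlated ensemble gains nothing over its best member, while the diverse (and individually weaker) ensemble improves on all of its members.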
2.4.5 AdaBoost
Given a collection of (weak) classifiers 𝑐1 , 𝑐2 , . . . , 𝑐ℓ , AdaBoost is an iterative algorithm that generates
a series of (generally, stronger) classifiers, 𝐶1 , 𝐶2 , . . . , 𝐶𝑀 based on the classifiers 𝑐𝑖 . Each classifier is
determined from the previous classifier by the simple linear extension
𝐶𝑘(𝑥) = 𝐶𝑘−1(𝑥) + 𝛼𝑘 𝑐𝑘(𝑥)
and the final classifier is given by 𝐶 = 𝐶𝑀. Note that at each iteration, we include a previously
unused 𝑐𝑖 from the set of (weak) classifiers and determine a new weight 𝛼𝑖 . A greedy approach is
used when selecting 𝑐𝑖 , but it is not a hill climb, so that results might get worse at any step in the
AdaBoost process.
From this description, we see that the AdaBoost algorithm fits the form in equation (5), with 𝑆̂(𝑥; 𝑉, Λ𝑖) = 𝐶𝑖(𝑥), and
𝐹( 𝑆̂(𝑥; 𝑉, Λ1), 𝑆̂(𝑥; 𝑉, Λ2), …, 𝑆̂(𝑥; 𝑉, Λ𝑀) ) = 𝑆̂(𝑥; 𝑉, Λ𝑀) = 𝐶𝑀(𝑥)
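For illustration, the following is a minimal, self-contained sketch of this iterative scheme, using decision stumps on one-dimensional data as the weak classifiers. It follows the usual {−1, +1} label convention and greedily selects a previously unused stump each round; it is a toy version, not the AdaBoost implementation used in our experiments.

```python
import math

def stump(threshold, sign):
    """A weak classifier: predicts sign for x >= threshold, -sign otherwise."""
    return lambda x: sign if x >= threshold else -sign

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n                                  # sample weights
    candidates = [stump(t, s) for t in xs for s in (+1, -1)]
    ensemble = []                                      # (alpha_k, c_k) pairs
    for _ in range(min(rounds, len(candidates))):
        # greedily pick the unused weak classifier with least weighted error
        best = min(candidates,
                   key=lambda c: sum(wi for wi, x, y in zip(w, xs, ys) if c(x) != y))
        err = sum(wi for wi, x, y in zip(w, xs, ys) if best(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        candidates.remove(best)
        # re-weight so misclassified samples get more attention next round
        w = [wi * math.exp(-alpha * y * best(x)) for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    # C_M(x) = sign of the weighted sum of the selected weak classifiers
    return lambda x: 1 if sum(a * c(x) for a, c in ensemble) >= 0 else -1

# toy one-dimensional, linearly separable training data
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
C = adaboost(xs, ys)
```

Note the greedy selection: once a stump achieves zero weighted error, its weight 𝛼 dominates the sum, which is consistent with the observation above that the process is not a hill climb.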
2.4.7 HMM with Random Restarts
A hidden Markov model can be viewed as a discrete hill climb technique [37, 38]. As with any hill
climb, when training an HMM we are only assured of a local maximum, and we can often significantly
improve our results by executing the hill climb multiple times with different initial values, selecting
the best of the resulting models. For example, in [51] it is shown that an HMM can be highly effective
for breaking classic substitution ciphers and, furthermore, by using a large number of random restarts,
we can significantly increase the success rate in the most difficult cases. The work in [51] is closely
related to that in [7], where such an approach is used to analyze the unsolved Zodiac 340 cipher.
From the perspective considered in this paper, an HMM with random restarts can be seen as a special
case of boosting. If we simply select the best model, then the “combining” function is particularly
simple, and is given by
𝐹( 𝑆(𝑥; 𝑉, Λ1), 𝑆(𝑥; 𝑉, Λ2), …, 𝑆(𝑥; 𝑉, Λℓ) ) = max{𝑆(𝑥; 𝑉, Λ𝑖)}  (8)
Here, each scoring function is an HMM, where the trained models differ based only on different initial
values. We see that equation (8) is a special case of equation (6). However, the “max” in equation (8)
is the maximum over the HMM model scores, not the maximum over any particular set of input
values. That is, we select the highest scoring model and use it for scoring. Of course, we could use
other combining functions, such as an average or majority vote of the corresponding classifiers. In any
case, since there is a score associated with each model generated by an HMM, any such combining
function is well-defined.
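The select-the-best-restart pattern is easy to express in code. The sketch below substitutes a simple one-dimensional hill climb for Baum-Welch training (so that it is self-contained), but the structure is the same: each restart begins from different random initial values, and equation (8) reduces to keeping the model with the best final score.

```python
import random

def hill_climb(f, x, step=0.1, iters=200):
    """Greedy hill climb on f; converges to a local maximum near x."""
    for _ in range(iters):
        for nxt in (x - step, x + step):
            if f(nxt) > f(x):
                x = nxt
    return x

def best_of_restarts(f, n_restarts=20, seed=0):
    """Run the hill climb from random initial values; keep the best model."""
    rng = random.Random(seed)
    models = [hill_climb(f, rng.uniform(-5, 5)) for _ in range(n_restarts)]
    return max(models, key=f)        # the "max" of equation (8)

# toy score surface: global maximum at x = 3, inferior local maximum at x = -3
f = lambda x: -min((x - 3) ** 2, (x + 3) ** 2 + 1)
x_best = best_of_restarts(f)
```

Restarts that land in the basin of the local maximum get stuck near 𝑥 = −3, but the max over all restarts recovers the global maximum, just as random restarts allow HMM training to escape poor local maxima of the likelihood surface.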
where 𝑆 is a perceptron and each 𝑃𝑖 represents a set of initial values. We see that equation (9) is a
special case of the averaging example given in equation (7). Also, we note that in this sum, we are
averaging the perceptron models, not the classifications generated by the models.
Although this technique is sometimes referred to as “bagged” perceptrons [47], by our criteria, it
is a boosting scheme. That is, the “bagging” here is done with respect to parameters of the scoring
functions, which is our working definition of boosting.
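A sketch of this model-averaging idea: train the same perceptron from several random initial weight vectors and average the weight vectors themselves, rather than voting the resulting classifications. The data and training loop here are toy stand-ins.

```python
import random

def train_perceptron(X, y, w, epochs=20):
    """Standard perceptron updates from initial weights w; labels in {-1, +1}."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wi * v for wi, v in zip(w, xi)) >= 0 else -1
            if pred != yi:
                w = [wi + yi * v for wi, v in zip(w, xi)]
    return w

def averaged_perceptron(X, y, n_models=5, seed=0):
    rng = random.Random(seed)
    models = [train_perceptron(X, y, [rng.uniform(-1, 1) for _ in X[0]])
              for _ in range(n_models)]
    # average the models (the weight vectors), not their classifications
    avg = [sum(ws) / n_models for ws in zip(*models)]
    return lambda x: 1 if sum(wi * v for wi, v in zip(avg, x)) >= 0 else -1

# toy separable data; the first component of each vector is a bias term
X = [[1, 2], [1, 3], [1, -2], [1, -3]]
y = [1, 1, -1, -1]
clf = averaged_perceptron(X, y)
```

Since each converged weight vector satisfies the same linear separation constraints, and those constraints define a convex set, the averaged weight vector also separates the training data.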
3.1 Dataset and Features
Our dataset consists of samples from the 21 malware families listed in Table 2. These families represent various types of malware, including Trojans, worms, backdoors, password stealers, so-called VirTools, and so on.
From each available malware sample, we extract the first 1000 mnemonic opcodes using the reversing tool Radare2 (also known as R2) [29]. We discard any malware executable that yields fewer than 1000 opcodes, as well as a number of executables that were found to be corrupted. The resulting opcode sequences, each of length 1000, serve as the feature vectors for our machine learning experiments.
Table 3 gives the number of samples (per family) from which we successfully obtained opcode
feature vectors. Note that our dataset contains a total of 9725 samples from the 21 malware families
and that the dataset is highly imbalanced—the number of samples per family varies from a low of 129
to a high of nearly 1000.
3.2 Metrics
The metrics used to quantify the success of our experiments are accuracy, balanced accuracy, precision,
recall, and the F1 score. Accuracy is simply the ratio of correct classifications to the total number of
classifications. In contrast, the balanced accuracy is the average accuracy per family.
Precision, which is also known as the positive predictive value, is the number of true positives divided by the sum of the true positives and false positives. That is, the precision is the fraction of samples classified as positive that are actually positive. Recall, which is also known as the true positive rate or sensitivity, is computed by dividing the number of true positives by the number of true positives plus the number of false negatives. That is, the recall is the fraction of positive samples that are classified as such. The F1 score is computed as
F1 = 2 · (precision · recall) / (precision + recall),
which is the harmonic mean of the precision and recall.
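These metrics follow directly from raw confusion-matrix counts. The sketch below computes them for the binary case; balanced accuracy here is the mean of the per-class recalls, which corresponds to "average accuracy per family" when restricted to two classes.

```python
# Evaluation metrics from raw binary confusion-matrix counts.

def binary_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# hypothetical counts: 8 true positives, 5 true negatives,
# 2 false positives, 1 false negative
m = binary_metrics(tp=8, tn=5, fp=2, fn=1)
```

For the multiclass experiments in this paper, precision, recall, and F1 are aggregated across the families, and balanced accuracy averages the per-family accuracies in the same way.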
3.3 Software
The software packages used in our experiments include hmmlearn [11], XGBoost [57], Keras [16], TensorFlow [39], and scikit-learn [30], as indicated in Table 4. In addition, we use NumPy [27] for linear algebra and various tools available in scikit-learn (also known as sklearn) for general data processing. These packages are all widely used in machine learning.
Table 4: Software used in experiments
Technique Software
HMM hmmlearn
XGBoost XGBoost
AdaBoost sklearn
CNN Keras, TensorFlow
LSTM Keras, TensorFlow
Random Forest sklearn
We also conduct bagging and boosting experiments based on a subset of the techniques considered
in our baseline standard experiments. These results demonstrate that both bagging and boosting can
provide some improvement over our baseline techniques.
Finally, we consider a set of stacking experiments, where we restrict our attention to simple voting
schemes, all of which are based on architectures previously considered in this paper. Although these
are very basic stacking architectures, they clearly show the potential benefit of stacking multiple
techniques.
From Table 5, we note that a significant number of parameter combinations were tested in each
case. For example, in the case of our random forest model, we tested
5³ · 3 · 6 = 2250
different combinations of parameters.
The confusion matrices for all of the experiments in this section can be found in the Appendix in
Figure 2 (a) through Figure 2 (d). We present the results of all of these experiments—in terms of the
metrics discussed previously (i.e., accuracy, balanced accuracy, precision, recall, and F1 score)—in
Section 3.9, below.
3.6 Bagging Experiments
Recall from our discussion above that we use the term bagging to mean a multi-model approach where the individual models are trained with the same technique and essentially the same parameters, but on different subsets of the data or features. In contrast, we use boosting to refer to multi-model cases where the data and features are essentially the same and the models are of the same type, with the model parameters varied.
We use AdaBoost and XGBoost as representative examples of boosting. We
also consider bagging experiments (in the sense described in the previous paragraph) involving each of
the HMM, CNN, and LSTM architectures. The results of these three distinct bagging experiments—
in the form of confusion matrices—are given in Figure 3 in the Appendix. In terms of the metrics
discussed above, the results of these experiments are summarized in Section 3.9, below.
Confusion matrices for these two boosting experiments are given in Figure 4 in the Appendix.
The results of these experiments are summarized in Section 3.9, below, in terms of accuracy, balanced
accuracy, and so on.
3.9 Discussion
Table 7 summarizes the results of all of the experiments discussed above, in terms of the following metrics: accuracy, balanced accuracy, precision, recall, and F1 score. These metrics were introduced in Section 3.2, above.
Table 7: Results for all experiments

Experiments  Case                     Accuracy  Balanced accuracy  Precision  Recall  F1 score
Standard     HMM                      0.6717    0.6336             0.7325     0.6717  0.6848
             CNN                      0.8211    0.7245             0.8364     0.8211  0.8104
             Random Forest            0.7549    0.6610             0.7545     0.7523  0.7448
             LSTM                     0.8410    0.7185             0.7543     0.7185  0.8145
Bagging      Bagged HMM               0.7168    0.6462             0.7484     0.7168  0.7165
             Bagged CNN               0.8910    0.8105             0.9032     0.8910  0.8838
             Bagged LSTM              0.8602    0.7754             0.8571     0.8602  0.8549
Boosting     AdaBoost                 0.5378    0.4060             0.5231     0.5378  0.5113
             XGBoost                  0.7472    0.6636             0.7371     0.7472  0.7285
Voting       Classic                  0.8766    0.8079             0.8747     0.8766  0.8719
             CNN                      0.9260    0.8705             0.9321     0.9260  0.9231
             LSTM                     0.8560    0.7470             0.8511     0.8560  0.8408
             Bagged neural networks   0.9337    0.8816             0.9384     0.9337  0.9313
             All neural networks      0.9208    0.8613             0.9284     0.9208  0.9171
             All models               0.9188    0.8573             0.9249     0.9188  0.9154
In Table 7, the best result for each type of experiment is in boldface, with the best results overall
also being boxed. We see that a voting strategy based on all of the bagged neural network techniques
gives us the best result for each of the five statistics that we have computed.
Since our dataset is highly imbalanced, we consider the balanced accuracy as the best measure of
success. The balanced accuracy results in Table 7 are given in the form of a bar graph in Figure 1.
[Figure 1: bar graph of the balanced accuracy for each technique, grouped by standard, bagging, boosting, and voting experiments]
Note that the results in Figure 1 clearly show that stacking techniques are beneficial, as compared
to the corresponding “standard” techniques. Stacking not only yields the best results, but it dominates
in all categories. We note that five of the six stacking experiments perform better than any of the
standard, bagging, or boosting experiments. This is particularly noteworthy since we only considered
a simple stacking approach. As a result, our stacking experiments likely provide only a loose lower bound on stacking in general, and more advanced stacking techniques may improve significantly over the results that we have obtained.
References
[1] Adware:win32/hotbar. https://www.microsoft.com/en-us/wdsi/threats/malware-
encyclopedia-description?Name=Adware:Win32/Hotbar&threatId=6204.
[2] Mamoun Alazab, Sitalakshmi Venkatraman, Paul Watters, and Moutaz Alazab. Zero-day mal-
ware detection based on supervised learning algorithms of API call signatures. In Proceedings
of the Ninth Australasian Data Mining Conference, volume 121 of AusDM ’11, pages 171–182.
Australian Computer Society, 2011.
[3] Xavier Amatriain and Justin Basilico. Netflix recommendations: Beyond the 5 stars
(part 1). https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-
stars-part-1-55838468f429, 2012.
[4] Backdoor:win32/bifrose. https://www.microsoft.com/en-us/wdsi/threats/malware-
encyclopedia-description?Name=Backdoor:Win32/Bifrose&threatId=-2147479537.
[5] Backdoor:win32/cycbot.g. https://www.microsoft.com/en-us/wdsi/threats/malware-
encyclopedia-description?Name=Backdoor:Win32/Cycbot.G.
[6] Backdoor:win32/vb. https://www.microsoft.com/en-us/wdsi/threats/malware-
encyclopedia-description?Name=Backdoor:Win32/VB&threatId=7275.
[7] Taylor Berg-Kirkpatrick and Dan Klein. Decipherment with a million random restarts. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP
2013, pages 874–878, 2013.
[8] Prakash Mandayam Comar, Lei Liu, Sabyasachi Saha, Pang-Ning Tan, and Antonio Nucci. Com-
bining supervised and unsupervised learning for zero-day malware detection. In 2013 Proceedings
IEEE INFOCOM, pages 2022–2030. IEEE, 2013.
[9] Marko Dimjaševic, Simone Atzeni, Ivo Ugrina, and Zvonimir Rakamaric. Android malware
detection based on system calls. Technical Report UUCS-15-003, School of Computing, University
of Utah, Salt Lake City, Utah, 2015.
[10] Shanqing Guo, Qixia Yuan, Fengbo Lin, Fengyu Wang, and Tao Ban. A malware detection algo-
rithm based on multi-view fusion. In International Conference on Neural Information Processing,
ICONIP 2010, pages 259–266. Springer, 2010.
[11] hmmlearn. https://hmmlearn.readthedocs.io/en/latest/.
12
[12] Fauzia Idrees, Muttukrishnan Rajarajan, Mauro Conti, Thomas M Chen, and Yogachandran
Rahulamathavan. Pindroid: A novel android malware detection system using ensemble learning
methods. Computers & Security, 68:36–46, 2017.
[13] Sachin Jain and Yogesh Kumar Meena. Byte level 𝑛-gram analysis for malware detection. In
Computer Networks and Intelligent Computing, pages 51–59. Springer, 2011.
[14] Kaggle. Welcome to Kaggle competitions. https://www.kaggle.com/competitions, 2018.
[15] KDD Cup of fresh air. https://biendata.com/competition/kdd_2018/, 2018.
[16] Keras: The Python deep learning API. https://keras.io/.
[17] Muhammad Salman Khan, Sana Siddiqui, Robert D McLeod, Ken Ferens, and Witold Kinsner.
Fractal based adaptive boosting algorithm for cognitive detection of computer malware. In 15th
International Conference on Cognitive Informatics & Cognitive Computing, ICCI*CC, pages 50–
59. IEEE, 2016.
[18] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. On combining classifiers.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239, March 1998.
[19] Deguang Kong and Guanhua Yan. Discriminant malware distance learning on structural informa-
tion for automated malware classification. In Proceedings of the 19th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD ’13, pages 1357–1365. ACM, 2013.
[20] Ludmila I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms.
Wiley, Hoboken, New Jersey, 2004. https://pdfs.semanticscholar.org/453c/
2b407c57d7512fdbe19fa1cefa08dd22614a.pdf.
[21] Marios Michailidis. Investigating machine learning methods in recommender systems. Thesis,
University College London, 2017.
[22] Microsoft malware protection center, winwebsec. https://www.microsoft.com/security/
portal/threat/encyclopedia/entry.aspx?Name=Win32%2fWinwebsec.
[23] Symantec security response, zbot. http://www.symantec.com/security_response/writeup.
jsp?docid=2010-011016-3514-99.
[24] Salvador Morales-Ortega, Ponciano Jorge Escamilla-Ambrosio, Abraham Rodriguez-Mota, and
Lilian D Coronado-De-Alba. Native malware detection in smartphones with Android OS using
static analysis, feature selection and ensemble classifiers. In 11th International Conference on
Malicious and Unwanted Software, MALWARE 2016, pages 1–8. IEEE, 2016.
[25] Masoud Narouei, Mansour Ahmadi, Giorgio Giacinto, Hassan Takabi, and Ashkan Sami.
Dllminer: structural mining for malware detection. Security and Communication Networks,
8(18):3311–3322, 2015.
[26] Netflix Prize. https://www.netflixprize.com, 2009.
[27] Numpy. https://numpy.org/.
[28] Pws:win32/onlinegames. https://www.microsoft.com/en-us/wdsi/threats/malware-
encyclopedia-description?Name=PWS%3AWin32%2FOnLineGames.
[29] Radare2: Libre and portable reverse engineering framework. https://rada.re/n/.
[30] scikit-learn: Machine learning in Python. https://scikit-learn.org/stable/.
[31] Raja Khurram Shahzad and Niklas Lavesson. Comparative analysis of voting schemes for
ensemble-based malware detection. Journal of Wireless Mobile Networks, Ubiquitous Computing,
and Dependable Applications, 4(1):98–117, 2013.
[32] Shina Sheen, R Anitha, and P Sirisha. Malware detection by pruning of parallel ensembles using
harmony search. Pattern Recognition Letters, 34(14):1679–1686, 2013.
[33] Mary Wollstonecraft Shelley. Frankenstein or The Modern Prometheus. Dent, 1869.
[34] Tanuvir Singh, Fabio Di Troia, Visaggio Aaron Corrado, Thomas H. Austin, and Mark Stamp.
Support vector machines and malware detection. Journal of Computer Virology and Hacking
Techniques, 12(4):203–212, 2016.
[35] Vadim Smolyakov. Ensemble learning to improve machine learning results. https://blog.
statsbot.co/ensemble-learning-d1dcd548e936, 2017.
[36] Charles Smutz and Angelos Stavrou. Malicious pdf detection using metadata and structural
features. In Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC
2012, pages 239–248. ACM, 2012.
13
[37] Mark Stamp. A revealing introduction to hidden Markov models. https://www.cs.sjsu.edu/
~stamp/RUA/HMM.pdf, 2004.
[38] Mark Stamp. Introduction to Machine Learning with Applications in Information Security. Chap-
man and Hall/CRC, Boca Raton, 2017.
[39] TensorFlow: An end-to-end open source machine learning platform. https://www.tensorflow.
org/.
[40] Fergus Toolan and Joe Carthy. Phishing detection using classifier ensembles. In eCrime Re-
searchers Summit, 2009, eCRIME ’09, pages 1–9. IEEE, 2009.
[41] TrojanDownloader:Win32/Adload. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=TrojanDownloader%3AWin32%2FAdload.
[42] TrojanDownloader:Win32/Agent. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=TrojanDownloader:Win32/Agent&ThreatID=14992.
[43] TrojanDownloader:Win32/Renos. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=TrojanDownloader:Win32/Renos&threatId=16054.
[44] TrojanDownloader:Win32/Small. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=TrojanDownloader:Win32/Small&threatId=15508.
[45] Trojan:Win32/BHO. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Trojan:Win32/BHO&threatId=-2147364778.
[46] Trojan:Win32/Toga. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Trojan:Win32/Toga&threatId=-2147259798.
[47] Hendrik Jacob van Veen, Le Nguyen The Dat, and Armando Segnini. Kaggle ensembling guide. https://mlwave.com/kaggle-ensembling-guide/, 2015.
[48] VirTool:Win32/CeeInject. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=VirTool%3AWin32%2FCeeInject.
[49] VirTool:Win32/Injector. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=VirTool:Win32/Injector&threatId=-2147401697.
[50] VirTool:Win32/VBInject. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=VirTool:Win32/VBInject&threatId=-2147367171.
[51] Rohit Vobbilisetty, Fabio Di Troia, Richard M. Low, Corrado Aaron Visaggio, and Mark Stamp.
Classic cryptanalysis using hidden Markov models. Cryptologia, 41(1):1–28, 2017.
[52] Win32/Allaple. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32/Allaple&threatId=.
[53] Win32/FakeRean. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32/FakeRean.
[54] Win32/Rimecud. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32/Rimecud&threatId=.
[55] Win32/Vobfus. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32/Vobfus&threatId=.
[56] Win32/Vundo. https://www.microsoft.com/en-us/wdsi/threats/malware-encyclopedia-description?Name=Win32/Vundo&threatId=.
[57] XGBoost documentation. https://xgboost.readthedocs.io/en/latest/.
[58] Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, and Min Zhao. SBMDS: an
interpretable string based malware detection system using SVM ensemble with bagging. Journal
in Computer Virology, 5(4):283, 2009.
[59] Yanfang Ye, Tao Li, Yong Chen, and Qingshan Jiang. Automatic malware categorization using cluster ensemble. In Proceedings of the 16th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD '10, pages 95–104. ACM, 2010.
[60] Suleiman Y. Yerima, Sakir Sezer, and Igor Muttik. High accuracy Android malware detection
using ensemble learning. IET Information Security, 9(6):313–320, 2015.
[61] Boyun Zhang, Jianping Yin, Jingbo Hao, Dingxing Zhang, and Shulin Wang. Malicious codes
detection based on ensemble learning. In International Conference on Autonomic and Trusted
Computing, ATC 2007, pages 468–477. Springer, 2007.
[62] Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. CRC Press, Boca Raton, Florida, 2012. http://www2.islab.ntua.gr/attachments/article/86/Ensemble%20methods%20-%20Zhou.pdf.
Appendix: Confusion Matrices
[Figure: confusion matrices for (a) HMM and (b) CNN. Rows correspond to the malware families Adload, Agent, BHO, Bifrose, CeeInject, Cycbot, FakeRean, Injector, OnLineGames, Renos, Rimecud, Small, Toga, VB, VBinject, Vobfus, Vundo, Winwebsec, and Zbot; the heatmap cell values are not recoverable from the text extraction.]
[Figure: a further pair of confusion matrices over the same malware families; the panel caption and heatmap cell values are not recoverable from the text extraction.]
[Figure: confusion matrices for (a) Bagged HMM and (b) Bagged CNN; the heatmap cell values are not recoverable from the text extraction.]
[Figure: additional confusion matrices (one single matrix and one pair); the panel captions and heatmap cell values are not recoverable from the text extraction.]
[Figure: confusion matrices for (a) CNN and (b) LSTM; the heatmap cell values are not recoverable from the text extraction.]
[Figure: two further pairs of confusion matrices; the panel captions and heatmap cell values are not recoverable from the text extraction.]