Mini Project
As a growing number of services are streamed to users online and large volumes of private
digital information are transmitted, the internet has become the backbone
of most people's everyday workflow. The expanding use of the internet, however, also
enlarges the attack surface for cyberattacks. Without an effective protection mechanism,
the internet becomes more vulnerable, which raises the risk of data
being leaked or stolen. This paper proposes an Intrusion Detection System
(IDS) based on the Convolutional Neural Network (CNN) to reinforce the security of the
internet. The proposed IDS model aims to detect network intrusions by classifying all
packet traffic in the network as either benign or malicious. The Canadian Institute for
Cybersecurity Intrusion Detection System (CICIDS2017) dataset is used to train and
validate the proposed model. The model is evaluated in terms of overall accuracy,
attack detection rate, false alarm rate, and training overhead. A comparative study of the
proposed model's performance against nine other well-known classifiers is also presented.
CHAPTER-1
INTRODUCTION
In recent years, the number of applications that stream services to their users has increased
explosively. This type of service requires minimal installation and computing power on the
user terminal because the applications run on the service carrier's cloud servers instead
of the local terminal; all the inputs and outputs are streamed to the users via the internet. Seeing
the obvious advantage of providing high-end services to customers who cannot access
high-end devices, many corporations have started to develop their own streaming services. For
instance, an entertainment service such as Google Stadia makes high-end gaming, which is typically
hardware demanding, possible on any portable device with good internet connectivity.
The game is processed and rendered on Google's cloud servers using the user's inputs in real time,
and the video is then streamed back to the user's terminal via the internet. However, the extensive
data exchange over the network between the cloud servers and local user terminals also expands
the attack surface for intrusions. Malicious hackers may deploy various types of attacks, such
as Distributed Denial-of-Service (DDoS), Port Scan, and Infiltration attacks, to hijack valuable
data or make servers unavailable to users. To stop these cyberattacks, the
development of a reliable and effective Intrusion Detection System (IDS) for cybersecurity has
become an urgent issue.
The idea of the IDS is not new. In 1980, James P. Anderson delineated a type of cyber-security
device or application to monitor the network traffic with the intention to warn the
administrators of any suspicious activities or violations of the system policy [1]. The IDS was
proposed to include tools for the administrators to inspect the audit trails of the network traffic
in a system. By analysing the statistical information of audit trails, system administrators could
hunt for malicious traffic and take further actions to secure the system. Most traditional IDSs
use an attack signature database created from experts' knowledge, together with
predefined decision rules in the program, to identify intrusions [2]. In [3], the author
claimed that a signature-based IDS is easy to develop and understand if the network behavior
of the target anomalous activities is known. However, in recent years, cyberattacks have become
more sophisticated, especially attacks on systems that store or manipulate
sensitive information.
A hand-crafted attack database built by experts quickly becomes obsolete without constant updates.
Another major problem with signature-based IDSs is that they fail to generalize to
innovative attacks whose signatures are not logged in the signature database. In addition,
the attack signature database incurs a high storage overhead in order to include the signatures
of all known attacks, which makes such systems hard to implement or distribute. Matching
the incoming data flow against the signatures logged in the database can also be computationally
expensive.
The Artificial Neural Network (ANN) architecture is a popular solution nowadays for
prediction and classification tasks. ANNs have several advantages that make them
especially suitable for network intrusion detection [4]. First, an ANN performs well at
modelling non-linear data with a large number of input features, such as network packets.
Second, once an ANN is trained, its forward propagation, in other words its prediction, is
fast. This is crucial for the network's speed if the IDS model is placed in-line with the network
traffic. In brief, an ANN is trained on a large dataset to become a generalized solution for a
specific task. The traditional signature-based IDS, on the other hand, uses manually defined
and human-understandable rules for intrusion detection. The ANN approach relies on a solid
mathematical foundation, in which the error is backpropagated and the model is optimized with
the stochastic gradient descent method [5], [6]. No predefined rules are required during the
ANN's training process, which means that developers do not need much expertise
in the field of cybersecurity to train an ANN-based IDS. In addition, since the decision
mechanisms of an ANN-based IDS are generalized from the features of all the known attacks,
they have the potential to detect innovative attacks that share similar traits with the known
attacks. A signature-based IDS, by contrast, will fail to detect innovative attacks because
it lacks knowledge of their specific signatures.
In this paper, an IDS for cybersecurity based on the Convolutional Neural Network (CNN)
architecture is proposed. Unlike some previously proposed CNN-based IDSs, which mostly
aim at one-class classification or classification of a subset of classes [7]–[9], the proposed
IDS achieves high performance in multi-class classification, identifying innovative attacks as
well as all the known attack classes in a dataset. It is also evaluated on a more demanding
dataset than some state-of-the-art CNN-based IDSs that aim at multi-class classification,
such as Potluri's work in [10] on the UNSW-NB 15
dataset [11]. The Canadian Institute for Cybersecurity Intrusion Detection System
(CICIDS2017) dataset used in this paper is more challenging for a classifier because its
feature set and number of attack classes are 59% and 55% larger, respectively, than those of
the UNSW-NB 15 dataset.
This paper first gives a brief review of recent research on deep learning
applications in cybersecurity. Then, in the model design section, the CICIDS2017 database is
explored to explain the design decisions behind the custom training and test datasets of the
proposed model. The architecture of the CNN, the deep learning model on which the
proposed IDS is based, is also discussed, and the mathematical details are provided. Next, the
architecture of the proposed IDS is presented along with its parameter selections. Finally, the
model is evaluated against the defined benchmarks in the simulation results section to provide
performance validation and comparison.
CHAPTER-2
LITERATURE REVIEW
Prior to the rise of machine and deep learning, the design of an IDS was usually based on network
experts’ knowledge of the attacks. In [12], Ansam Khraisat et al. classified various kinds of
IDS models based on their detection techniques. In short, an IDS with statistics-based
techniques builds a distribution model for the benign traffic and flags low-probability events
as potential attacks. An IDS with knowledge-based techniques, on the other hand, creates a
knowledge base that reflects the legitimate traffic profile; any action that differs from
the standard profile is then flagged as an intrusion. Finally, there are IDSs with machine learning
techniques. These models mine the characteristics of each type of attack from large quantities of
data and classify traffic based on the learned characteristics. The authors of [12] also present
a survey of the datasets for intrusion detection systems. They explored public
datasets such as Knowledge Discovery Databases (KDD) Cup’99, Center for Applied Internet
Data Analysis (CAIDA), Network Security Laboratory-Knowledge Discovery Databases
(NSL-KDD), and CICIDS2017, and compared those IDS datasets in terms of
their feature selection and types of computer attacks. Finally, the authors
provided classification results on the selected datasets based on their prior research.
Specifically, on CICIDS2017, their model, which combines a Multilayer Perceptron
(MLP) neural network with a payload classifier, reaches an accuracy of 95.2%.
Machine learning is a popular solution for classification tasks due to its relatively simple
architecture and low computation overhead. Many studies that apply machine learning
techniques to the CICIDS2017 dataset for attack classification have been proposed [13]–[15].
Many of them have reached decent accuracy in one-class classification of a certain attack
class in the dataset, such as DDoS. However, to reach a usable multi-class classification
accuracy in detecting modern network intrusions, many data preprocessing methods have been
proposed to improve model performance. In [13], Yonghao Gu et al. proposed a
semi-supervised K-means model to detect the DDoS attacks in CICIDS2017. They also
implemented a hybrid feature selection algorithm that excludes unsuitable features from
the model input to avoid “the curse of dimensionality”. Their feature selection
algorithm takes the available features as input, processes them through a series of
procedures including data normalization, feature ranking, and feature subset searching, and
outputs the selected features. With their proposed feature selection
method, they reached a detection rate of 96.50% and a false positive rate of 30.5%.
In recent years, deep learning models such as neural networks have become more effective
solutions for classification tasks because of their ability to generalize more complex patterns
of features. In [16], the authors studied anomaly analysis for intrusion
detection with K-Nearest Neighbors (KNN) and a Deep Neural Network (DNN) to compare the
classification performance of a machine learning model with that of a deep learning model. They
used CICIDS2017 as the database for simulating the models' performance. They
concluded that the DNN performed significantly better than the KNN. Specifically, their DNN
yielded an accuracy of 96.427%, which is much higher than the 90.913% accuracy of the KNN. They
also compared the computation time overhead of the two models. The DNN required 110 s of
CPU time, less than the 130 s required by the KNN, which indicates that the DNN
has a lower time overhead than the KNN.
In [7], another study of deep learning models for cybersecurity in Internet of Things
(IoT) networks was presented by Monika Roopak, Gui Yun Tian and Jonathon Chambers. The
authors used the DDoS attack samples in CICIDS2017 to evaluate the performance of an MLP,
Long Short-Term Memory (LSTM), a CNN, and a hybrid model of LSTM and CNN. They
reached a precision of 98.44% with the LSTM model, followed by 98.14% with the CNN
model and 97.41% with the hybrid model. Lastly, the MLP model reached an accuracy of 88.47%
in their simulation. The authors also compared their results with some machine learning
models and concluded that all the tested deep learning models except
the MLP outperform machine learning models such as SVM, Bayes, and Random Forest.
In [8], Sungwoog Yeom and Kyungbeak Kim tested the performance of Naïve Bayes, SVM,
and CNN-based classifiers on the CICIDS2017 dataset. The study focused on the
models' binary classification performance on each attack class in the dataset. The raw
CICIDS2017 dataset, which consists of separate sub-datasets that each cover one day of
network traffic, was used to train the models. The authors trained and evaluated a CNN-based
classifier on each sub-dataset, most of which included only 1 to 3 attack classes, and recorded
the accuracy, precision, recall, and F-measure of the models. After the evaluation, the authors
concluded that CNN and SVM generally had high detection rates and that CNN was better than SVM
in terms of processing time. However, they also observed that CNN had mediocre
performance on datasets with many labels. In other words, it was challenging for the CNN
models when many labels needed to be classified.
In [9], a CNN-based IDS was proposed by Jiyeon Kim et al. The authors employed
deep learning techniques and developed a CNN model for the CICIDS2018 dataset, which
shares the same feature set as CICIDS2017 but has larger sample counts. The
models in the study were trained and tested on sub-datasets that each included a
subset of the types of network traffic in CICIDS2018. Therefore, the models were simulated for
multi-class classification of certain classes in the dataset, not all of them at once. The
experimental results showed that the performance of the CNN-based IDS could be higher
than that of the recurrent neural network (RNN), another deep learning model that is
popular when time series data are used as input. The CNN model proposed in the
study reached an accuracy of 96.77% on the sub-dataset composed of the benign and
DoS samples from CICIDS2018, whereas the RNN model tested in the study
reached an accuracy of 82.84% on the same dataset, significantly lower than that of the CNN
model.
In [17], Ahmed Ahmim et al. proposed a novel hierarchical IDS that is based on Decision Tree
and Rules-based models. They also used CICIDS2017 as the dataset to evaluate the
performance of their model. Their proposed model combines Reduced Error Pruning Tree
(REP Tree) and JRip algorithm at its first stage. The input features of the dataset are used as
input at this stage to classify traffic as attacks or benign. Then a Forest PA classifier takes the
output of the two classifiers at the first stage, in combination with the input features of the
initial dataset as input to generate the final classification result. Their model reached decent
performance on almost every traffic class in CICIDS2017. They also provided a performance
comparison of their proposed model with 11 well-known classifiers to validate its classification
power. Among the 12 classifier models, their model had the best classification performance on
seven attack classes and the lowest false alarm rate for benign traffic. Because of its strong
overall classification performance on CICIDS2017, the proposed IDS model is compared against
their hierarchical IDS in the results section of this paper.
CHAPTER-3
MODEL DESIGN
This section specifies many design elements of the proposed IDS model. At first, the
cybersecurity database CICIDS2017 that is used to train the proposed model is introduced and
analyzed. The data preprocessing methods and the design of the training dataset are also
presented. Then, the architecture and mathematical model of the CNN are introduced and
discussed. Finally, the architecture and operating flow of the proposed model and the input data
collection methods are presented in detail.
The training, validation, and testing of the proposed model are all accomplished with data
from the CICIDS2017 database, which was proposed in 2018 by Iman
Sharafaldin et al. [18]. The database was developed to combat the sophisticated and
ever-growing network attacks of the modern era. The creators of the database had the ambition to
replace outdated security databases such as DARPA98, KDD99, ISC2012, and ADFA13.
These databases were widely used for the evaluation of intrusion detection and prevention
models in many papers in the past decades, but were in fact obsolete and unreliable to use [19].
Therefore, for the proposed model, which aims at detecting modern cyberattacks, CICIDS2017
is a well-suited choice for the training database.
The CICIDS2017 database emulates the network traffic of a real environment for one week.
The network traffic contains normal network packets and a diverse set of attack scenarios that
are injected by the attack profiles created by the developers. In total, the database contains
2,830,743 samples, and the composition of the traffic classes is presented in Table 1.
TABLE 1: CICIDS2017 With Sample Size and Class Composition
Each sample in the database contains 78 features that are extracted from the traffic between two
nodes in the network, and a label that indicates the class of the traffic. A few examples of the
extracted features are the destination port, flow duration, total forward packets, and total
backward packets. The proposed model takes all 78 features as input except for the
destination port. The destination port of the network traffic may be a useful audit trail for the
network administrator to track down where a cyberattack is initiated or what it targets.
However, a port number is an identifier that encodes a network port; it is
not a quantitative measure of network traffic of the kind that the neural network-based
IDS expects as input. Accordingly, including it as an input feature can cause
unexpected issues in the training process of the proposed neural network model. The remaining
77 features are all quantitative measures and are therefore valid inputs for the proposed model.
Although CICIDS2017 is an advanced modern security database with many merits, it has certain
shortcomings that need to be considered and addressed to unlock its full potential. The database
has three main issues: scattered presence, missing values, and class imbalance. The
network traffic is logged into eight separate files according to the time window and class
of the traffic samples. This scattered presence is not ideal for training the proposed model,
since the model aims to classify all the cyberattacks in the database rather than particular types
of attack. Further, according to [19], CICIDS2017 has a total of 288,602 samples with an absent
class label and 203 instances of missing information. The samples with an absent class label
need to be removed, and the missing information needs to be restored.
Table 1 shows the class composition of the original data in CICIDS2017. Among all the
samples, the benign samples have a proportion of 80.301%, the highest in the database.
In contrast, the Heartbleed samples have a proportion of only 0.001%. Such a severe
imbalance in per-class sample size typically leads to biased classification results and needs to
be addressed in the proposed model to prevent a drop in overall performance.
These three issues make it more difficult for the neural network model to reach decent
performance. However, they are not necessarily defects in a database that aims to
simulate sophisticated network traffic. In fact, all three issues are
likely to appear in data collected from a real-world environment. Therefore,
by solving these shortcomings in CICIDS2017, the proposed model is also prepared for
deployment in real environments.
The issue of missing values is solved in the data preprocessing phase by a data imputer.
In this case, the missing values are all replaced by 0 to prevent value errors. The
other two issues, namely class imbalance and scattered presence, are solved by building
a custom database that concatenates all the separate files of CICIDS2017 with a balanced class
composition. This custom database is called the α-Dataset.
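A minimal sketch of this preprocessing step, written with the Python standard library only, is given below. It is an illustration under stated assumptions rather than the implementation used for the proposed model: the function name `clean_rows`, the `"Label"` column name, the two example feature names, and the treatment of infinite values as missing are all assumptions.

```python
import math

def clean_rows(files, label_key="Label"):
    """Concatenate per-file rows, drop unlabeled rows, and impute missing values with 0."""
    cleaned = []
    for rows in files:                          # each file is a list of dict rows
        for row in rows:
            label = row.get(label_key)
            if label is None or str(label).strip() == "":
                continue                        # absent class label: sample is removed
            fixed = {label_key: label}
            for name, value in row.items():
                if name == label_key:
                    continue
                # Assumption: None, NaN and +/-inf all count as "missing"
                missing = value is None or (
                    isinstance(value, float)
                    and (math.isnan(value) or math.isinf(value))
                )
                fixed[name] = 0.0 if missing else float(value)
            cleaned.append(fixed)
    return cleaned

# Two tiny stand-ins for the separate CICIDS2017 files (values are made up)
monday = [{"Flow Duration": 12.0, "Flow Bytes/s": float("nan"), "Label": "BENIGN"}]
friday = [
    {"Flow Duration": 7.0, "Flow Bytes/s": 3.5, "Label": "DDoS"},
    {"Flow Duration": 1.0, "Flow Bytes/s": 2.0, "Label": ""},
]
dataset = clean_rows([monday, friday])
```

In this toy run the unlabeled third row is dropped and the NaN entry of the first row becomes 0.0, so the concatenated result holds two samples.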
Table 2 presents a few design decisions of the α-Dataset, such as the number of samples in each
class, the splitting strategy for the training and test sets, and the class composition of the
training set. Balancing the samples of a highly imbalanced database involves trade-offs. Most
neural network models work better with databases that have balanced samples in each class.
However, neural network models also require a large dataset to generalize well on the task
assigned to them. Therefore, it is not feasible to simply trim down the number of samples in the
classes with a higher proportion to match the minority classes. To get the best of both worlds,
the database is divided into two categories: the normal attack samples and the minority attack
samples, which are marked grey in Table 2.
TABLE 2: α-Dataset With Sample Size, Training/Test Distribution and Class Composition in
Training Dataset
The normal attack samples include the benign samples from the Monday file, DDoS, DoS, and
Port Scan. Each of the four classes is split into training and test samples with a distinct ratio.
These ratios are tuned to generate a training set with a balanced number of samples across the
four classes. Table 2 shows that each of the four classes has roughly 102,000 samples to form a
balanced training set. The minority samples, on the other hand, contain the rest of the attack
samples in the database. Given the small number of samples in these classes, 80% of the samples
in each class are used as training samples, and the whole class is also used as test samples to
provide sufficient data for the training and testing process.
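The splitting strategy described above can be sketched as follows. This is a hedged illustration: the function name `split_class`, the fixed target of 102,000 training samples, the random shuffling, and the choice to send the remainder of a large class to the test set are assumptions, since the exact per-class ratios are given only in Table 2.

```python
import random

def split_class(samples, minority, target_train=102_000, seed=0):
    """Return (train, test) lists for one traffic class."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    if minority:
        n_train = int(0.8 * len(shuffled))     # 80% of the class is used for training
        return shuffled[:n_train], shuffled    # the whole class is also the test set
    # Large class: cap the training part so all large classes end up balanced
    n_train = min(target_train, len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical class sizes, used only to show the two branches
majority_train, majority_test = split_class(range(250_000), minority=False)
minority_train, minority_test = split_class(range(11), minority=True)
```

Note that, by construction, the test set of a minority class contains the samples that were also used for training, which should be kept in mind when reading the per-class results for those classes.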
CNN is a popular neural network that is typically used in image processing tasks. In recent
years, CNN has also been used extensively in natural language processing, video
analysis, and even some models with sequential inputs. The convolution and pooling
processes of CNN allow the model to learn complicated patterns of features from a
high-dimensional data space while maintaining reasonable storage and computation time overhead.
This advantage is significant when a CNN model is trained with a network traffic dataset, given
that these datasets typically have a large feature set. For example, there are 78 features
recorded for each traffic sample in the CICIDS2017 dataset, which can lead to high computational
complexity for other deep learning models such as DNN [20]. Also, for any network IDS, it is
crucial to be able to capture the complicated features of network traffic
within a short amount of time. Therefore, CNN is chosen as the preferred neural network
model for developing the prototype of the proposed model.
Fig. 1 shows the architecture of a CNN model with an input size of 8x8. The whole model can be
separated into three main sections: convolution layer, pooling layer, and fully connected layer.
In terms of functionality, the convolution layer and pooling layer together handle the feature
extraction of the training samples, and the fully connected layer performs the final
classification. At the convolution layer, the CNN model convolves the input sample with specific
matrices called “feature detectors” or “kernel maps”. The result of the convolution is called a
“feature map”. Feature detectors are matrices used to extract specific features, such as
patterns, shapes, and lines, from the input sample. Their values are initially randomized and
gradually updated by the optimizer during the training process. After the input sample is
convolved with the feature detectors, the feature maps are generated. For a CNN model with K
feature detectors, this process can be denoted mathematically as:
$$\mathrm{featuremap}_i = \mathrm{input} \otimes \mathrm{featuredetector}_i, \qquad i \in \{1, \dots, K\}. \tag{1}$$
Then, the feature map calculated by (1) is mapped to a non-linear activation function to break
the linearity of the model. In Fig. 1, the whole process of generating the feature maps and
mapping them to the activation function is represented by the green connections at the
convolution layer.
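As an illustration of (1) followed by the activation step, the sketch below convolves a small input with K = 2 feature detectors and applies a ReLU. It is a toy example under assumptions: the input and detector values are made up, no padding and a stride of one are used, and the sliding-window product is computed without flipping the kernel, which is the convention deep learning frameworks commonly use for the convolution operator.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1); sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Non-linear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 1.0],
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 2.0],
]
detectors = [
    [[1.0, 0.0], [0.0, -1.0]],   # responds to a difference along the diagonal
    [[0.5, 0.5], [0.5, 0.5]],    # responds to the local sum of a 2x2 window
]
# One feature map per feature detector, as in (1), each passed through ReLU
feature_maps = [relu(conv2d(image, k)) for k in detectors]
```

In a trained CNN the detector values are not hand-picked as they are here; they start from random values and are updated by the optimizer.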
The second section of the model is the pooling layer. At this layer, the feature maps are
pooled to reduce their dimension. The purpose of this process is to eliminate noise and reduce
the model's dependence on the spatial location of the learned patterns.
In short, pooling mitigates the effects of a specific pattern that the model is trained to
recognize appearing at different locations or angles in the input sample. Furthermore, to
eliminate noise, pooling reduces the dimension of the feature maps while preserving the valid
information carried by them. This also eases the computation of any further processing of the
feature maps.
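The effect of pooling on small shifts can be seen in a toy max-pooling example. The sketch below assumes a 2x2 window with a stride of 2 and no padding, and the two input maps are made up: they differ only in where the strongest response (9.0) sits inside the top-left window.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window (no padding)."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

a = [[9.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0]]
b = [[0.0, 0.0, 1.0, 0.0],
     [0.0, 9.0, 0.0, 2.0],   # same peak, shifted by one row and one column
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0]]
pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)
```

Both inputs are reduced from 4x4 to the same 2x2 map, so the shift of the peak within its window no longer affects the following layers.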
After the feature maps are calculated and pooled, they will undergo a “flattening” process
before being passed to the fully connected layer. The flattening process converts the feature
maps from 2-D matrices to 1-D arrays in row-major order, which is the format expected by the
fully connected layer. The fully connected layer is composed of input neurons, optional hidden
neurons and output neurons. Each neuron is connected to every neuron at the adjacent layers
with distinct weights and biases as the parameter. The output neurons, which are also the final
output of the CNN model, determine to which class each input sample belongs.
In the case of cybersecurity, the model can use the output neurons to represent the benign class
and every class of attack that the model is trained to classify. When a network traffic sample is
processed by the model, it will be classified as the class that is represented by the output neuron
that holds the highest value at the output layer.
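The flattening step and the final decision rule can be sketched in a few lines. The helper names `flatten` and `predict`, the three-class list, and the output values are illustrative assumptions.

```python
def flatten(feature_maps):
    """Concatenate 2-D maps into one 1-D list, row by row (row-major order)."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def predict(output_values, class_names):
    """Return the class whose output neuron holds the highest value."""
    best = max(range(len(output_values)), key=lambda i: output_values[i])
    return class_names[best]

classes = ["Benign", "DDoS", "Port Scan"]
flat = flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # two 2x2 maps -> 8 values
label = predict([0.1, 2.3, -0.4], classes)             # second neuron is the largest
```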
One of the characteristics that distinguish CNN from other neural networks is weight sharing.
CNN has shared weights, which means the model uses the same weight vectors to perform the
convolution. In other words, the feature detectors, which contain the weight vectors, are
reused at every position of the convolution in the convolution layer. Shared weights have
several advantages. First, the model has substantially fewer
parameters to optimize, so the optimizer can potentially converge to a low loss
with less time overhead. Second, weight sharing slightly lowers the degrees of freedom of the
model. This can help the model avoid overfitting, which happens when the model makes
too much of an effort to fit itself to a limited set of samples.
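To give a sense of scale for the first advantage, the sketch below counts the trainable parameters of a convolution layer with sixteen 3 × 3 feature detectors on a single-channel 9 × 9 input (the configuration the proposed model uses for its first convolution layer) and compares it with a hypothetical fully connected layer that would produce the same number of outputs without sharing weights. The helper names are assumptions, and one bias per output is assumed in both cases.

```python
def conv_params(in_channels, out_channels, kernel):
    # One kernel x kernel weight block per (input, output) channel pair,
    # plus one bias per output feature map; reused at every position.
    return out_channels * (in_channels * kernel * kernel) + out_channels

def dense_params(n_inputs, n_outputs):
    # Every output has its own weight for every input, plus its own bias.
    return n_inputs * n_outputs + n_outputs

shared = conv_params(in_channels=1, out_channels=16, kernel=3)
unshared = dense_params(n_inputs=9 * 9, n_outputs=16 * 9 * 9)
```

Under these assumptions the shared-weight layer has 160 parameters, whereas the unshared alternative would have 106,272.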
The constraint on weights also enables the model to achieve better generalization on the tasks
assigned to it. In brief, CNN has merits that suit the development of an IDS for
cybersecurity in the modern era. CNN requires less time overhead in the testing process.
In addition, with its convolution mechanism, CNN can potentially learn the more complex
characteristics of some modern cyberattacks, which other neural network models struggle to
capture. Lastly, CNN can achieve better generalization on the classification of cyberattack
samples. This enables the IDS to potentially detect innovative attacks that share similar traits
with the known attacks.
However, from [7]–[9], it is observed that CNN, despite its merits in generalizing complex
patterns of features in the dataset, has some potential issues in multi-class classification. The
CNN models proposed in these studies are tested on classification tasks of either one class or
a subset of classes in the dataset. This is not a surprise, since neural networks depend heavily on
training data to reach their full potential and to prevent issues such as biased classification
results or over-fitting [4]. Therefore, to exploit the advantages of neural network models, it is
important to train the model with a dataset that is not only rich in sample counts and feature
set but also has a balanced class composition.
In this study, a novel data-preprocessing method is proposed in the Model Design section to
generate a relatively balanced sub-dataset, which still includes all the available attack classes
from the CICIDS2017 dataset, for the training and testing of the proposed IDS model. The
proposed IDS is trained to perform multi-class classification on every type of attack that is
known in the CICIDS2017 dataset.
The proposed IDS model in this paper is developed in Python 3. PyTorch is used as the
framework to build the neural network layers of the proposed model [21].
A CNN has many layers to be implemented and several parameters to be selected before it can be
trained. Empirically, the number of convolution and fully connected layers of a CNN for network
intrusion detection tasks can be selected from the range of 1 to 3 layers each [8], [9], [22]. In
general, the more complex the task assigned to a neural network, the more layers are added.
For example, ResNet-50, a large CNN that classifies images into 1000 object
classes, has 50 layers in total. However, a large network size does not always guarantee the
performance of a CNN [22].
The input of the model is batches of matrices. Each matrix has a
size of 9 × 9 and is composed of the 77 features extracted from a network traffic sample plus
four zero-padding entries that complete the square format. The model contains two convolution
layers and two fully connected layers. The hyperparameters, such as the number of layers, the
kernel size for convolution, and the number of neurons in a fully connected layer, are all
determined by exhaustive grid search. In this method, each hyperparameter is selected from a
manually specified subset of values, and the model is evaluated iteratively to search for
the best combination of parameter values.
The architecture of the proposed model is
presented in Fig. 2. At the first convolution layer, the convolution kernel has a size of
3 × 3, with a stride and padding of one. In addition, the
number of output feature maps is selected to be 16, and the activation function at this layer is
a rectified linear unit (ReLU). Accordingly, for each matrix in a training batch, the
model generates 16 feature maps at the first convolution layer. Each feature map has a size of
9x9 and is passed through a ReLU function. Then, the feature maps are pooled at the first pooling
layer by the Max Pooling algorithm with a kernel size of 2x2, a stride of 2, and a padding of 1.
This generates 16 pooled feature maps, each with a size of 5x5. Afterward, the pooled feature
maps are passed to the second convolution layer and pooling layer, which have the same
kernel size, stride, and padding configuration as the first two layers.
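The sizes quoted above can be checked with the usual output-size rule for convolution and pooling windows, floor((n + 2p − k) / s) + 1, where n is the input width, k the kernel size, s the stride, and p the padding. The sketch below also shows one way to arrange 77 features into a 9 × 9 matrix; placing the four zero pads at the end of the vector is an assumption, as are the helper names `to_matrix` and `out_size`.

```python
def to_matrix(features, side=9):
    """Zero-pad a flat feature vector to side*side values and reshape it row by row."""
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

def out_size(n, kernel, stride, padding):
    """Output width (= height) of a convolution or pooling window on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

matrix = to_matrix([1.0] * 77)          # 77 features + 4 trailing zeros -> 9 x 9

sizes = [9]
for _ in range(2):                      # two convolution + pooling stages
    sizes.append(out_size(sizes[-1], kernel=3, stride=1, padding=1))   # convolution
    sizes.append(out_size(sizes[-1], kernel=2, stride=2, padding=1))   # max pooling
flattened = 32 * sizes[-1] ** 2         # 32 feature maps after the second stage
```

Under these settings the spatial size follows 9 → 9 → 5 → 5 → 3, and 32 maps of size 3 × 3 flatten to 288 values.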
The second convolution layer is designed to output 32 feature maps, and they are also
individually mapped to a ReLU activation function. Eventually, after the two convolution
layers and two pooling layers, the proposed model generates 32 pooled 3 × 3 feature maps for
each input matrix. At the flattening layer, all the elements of the pooled feature maps are
converted into a 1-D array in row-major order, which is the format expected by the fully
connected layer. This generates an array of 288 elements for each input matrix. Then the array
becomes an input sample of the fully connected layer, which has 288 input neurons, a hidden
layer and nine output neurons.
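A PyTorch sketch of the layer stack described above is given below. It is a reconstruction from the text, not the authors' code: the class name `IntrusionCNN`, the hidden-layer width of 64, and the ReLU between the two fully connected layers are assumptions, since those details are not stated here.

```python
import torch
from torch import nn

class IntrusionCNN(nn.Module):
    def __init__(self, n_classes=9, hidden=64):   # hidden=64 is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1),   # 9x9 -> 16 x 9x9
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),       # -> 16 x 5x5
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),  # -> 32 x 5x5
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),       # -> 32 x 3x3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                    # 32 * 3 * 3 = 288 values per sample
            nn.Linear(32 * 3 * 3, hidden),
            nn.ReLU(),                       # assumed hidden activation
            nn.Linear(hidden, n_classes),    # one output neuron per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = IntrusionCNN()
batch = torch.zeros(4, 1, 9, 9)              # 4 samples, 1 channel, 9x9 matrices
logits = model(batch)
predicted = logits.argmax(dim=1)             # index of the largest output neuron
```

Each sample in the batch yields nine output values, and the index of the largest one is taken as the predicted class.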
Consequently, the proposed model is trained to classify network traffic into nine different
classes represented by the nine output neurons, namely, Benign, Brute Force, Bot, DDoS, DoS,
Heartbleed, Infiltration, Port Scan, and Web Attack. The number of classes is chosen this
way to help the model generalize better across the cyberattacks.
For a dataset with a limited number of samples, when the number of classes increases,
the number of samples for each class inevitably decreases. This makes it harder for the model
to generalize the characteristics of each class. Therefore, the proposed model has a custom
dataset function that merges some of the samples, based on their type of attack, into a bigger class.
For instance, samples from the CICIDS2017 database with the four DoS attack labels, namely,
DoS GoldenEye, DoS Hulk, DoS Slowhttptest, and DoS slowloris, are merged into a single
DoS class. The same merging strategy is applied to the various labels of Brute Force and
Web Attack samples in the database to form bigger classes for these two types of attacks.
As a result, the samples from CICIDS2017 are sorted into nine distinct classes, some with
larger numbers of in-class samples, to facilitate the model's generalization of cyberattacks. This
is why nine output neurons are selected at the final output layer of the model. Meanwhile, the
original label is still attached to each sample for label-wise performance evaluation of the
proposed model.
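The merging step described above can be sketched as a simple lookup table. This is an illustration only, not the authors' dataset function: the dictionary MERGED_CLASS and the helper merge_label are hypothetical names, the label spellings follow this report rather than the raw CICIDS2017 files, and the assignment of FTP-Patator and SSH-Patator to the Brute Force class is an assumption based on the labels named later in the comparison.

```python
# Hypothetical mapping from the 15 original labels to the nine merged classes.
# Label spellings follow this report; the raw dataset files may spell them differently.
MERGED_CLASS = {
    "Benign": "Benign",
    "Bot": "Bot",
    "DDoS": "DDoS",
    "DoS GoldenEye": "DoS",
    "DoS Hulk": "DoS",
    "DoS Slowhttptest": "DoS",
    "DoS slowloris": "DoS",
    "FTP-Patator": "Brute Force",   # assumed member of the Brute Force class
    "SSH-Patator": "Brute Force",   # assumed member of the Brute Force class
    "Heartbleed": "Heartbleed",
    "Infiltration": "Infiltration",
    "Port Scan": "Port Scan",
    "Web Attack-Brute Force": "Web Attack",
    "Web Attack-XSS": "Web Attack",
    "Web Attack-Sql Injection": "Web Attack",
}


def merge_label(original_label):
    """Return (merged class, original label).

    The original label is kept alongside the merged class so that
    label-wise evaluation remains possible after merging.
    """
    return MERGED_CLASS[original_label], original_label


print(merge_label("DoS Hulk"))
print(sorted(set(MERGED_CLASS.values())))
```

Keeping the original label in the returned pair mirrors the statement that each sample retains its original label for label-wise evaluation.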
CHAPTER-4
SIMULATION RESULTS
This section presents in detail the performance of the proposed IDS model based on the
simulation results with some benchmarks. The benchmarks include the Detection Rate (DR),
True Negative Rate (TNR), and overall accuracy, which are all explicitly defined and
calculated. In addition, a comparative study of the proposed model with nine other well-known
classifiers is provided. Finally, the proposed model's performance on innovative attacks is also
validated. All the simulations in this section are done on a workstation of the Networked
Embedded Systems (NES) Laboratory at California State University, Long Beach. This
workstation runs a 64-bit Windows 10 OS and is equipped with an Intel Xeon W-2125
processor and 64 GB of RAM.
A. Performance Benchmarks
To evaluate the classification performance of the proposed model on cyberattacks, it has been
simulated with several benchmarks that are derived from the confusion matrix generated by the
model. The confusion matrix, shown in Table 3, lists all the possible outcomes
of a classification. In terms of cybersecurity, the positive class is defined to be the attack
samples, and the negative class contains the benign samples. There are four outcomes in a
confusion matrix, namely, True Negative (TN), True Positive (TP), False Negative (FN) and
False Positive (FP). TN and TP contribute to the overall accuracy
of the classification outcomes: they represent the instances in which a model correctly predicts
the true class of a sample. On the other hand, FN and FP degrade the
performance of a classification model: they occur when a model predicts a sample incorrectly.
In cybersecurity terms, an FP occurs when a benign sample is classified as an attack, which
is also known as a false alarm, whereas an FN occurs when an attack sample is classified as benign.
TABLE 3: Confusion Matrix
The confusion matrix is a useful tool for recording the simulation results of a classification model.
However, it is not necessarily easy to read, nor handy enough for quickly evaluating the
performance of IDS models. Thus, for comparative studies, some widely used performance
benchmarks are derived from the confusion matrix. The first benchmark is DR,
which is the fraction of the samples in an attack class that are correctly
classified as attacks by an IDS model. Mathematically, DR can be derived from the confusion
matrix and written as:
$$\mathrm{DR}_{class} = \frac{\mathrm{TP}_{class}}{\mathrm{TP}_{class} + \mathrm{FN}_{class}} \tag{2}$$
In general, when an IDS model has a high DR on an attack class, the model performs
well at recognizing this type of attack and does not confuse it with the benign samples.
TNR is a benchmark of an IDS model's classification power on the benign samples. When an
IDS model yields a high TNR, it suggests that the model performs well at separating the benign
samples from the attack samples. TNR is also derived from the confusion matrix.
Mathematically, it can be written as:
$$\mathrm{TNR} = \frac{\mathrm{TN}_{BENIGN}}{\mathrm{TN}_{BENIGN} + \mathrm{FP}_{BENIGN}} \tag{3}$$
Another performance benchmark, which is closely related to the TNR, is the False Alarm Rate
(FAR). As its name implies, FAR gives the probability that the IDS model raises a false alarm,
that is, that a benign sample is falsely classified as an attack. It is very important
to evaluate a classifier's FAR, especially in anomaly detection. A
classifier for network intrusion detection is not trustworthy when it has a high DR for attacks
but also a high FAR. In such a case, the classifier will perform poorly on real-world
data. Given that benign traffic is the majority in an ordinary network environment, a high
FAR means the IDS will warn the network administrator about benign traffic very frequently,
making the IDS unusable. For an IDS model, the higher the FAR, the lower
the TNR. FAR can be written as:
$$\mathrm{FAR} = \frac{\mathrm{FP}_{BENIGN}}{\mathrm{TN}_{BENIGN} + \mathrm{FP}_{BENIGN}} = 1 - \mathrm{TNR} \tag{4}$$
Finally, the proposed IDS model's overall accuracy is also provided. The overall accuracy
reflects a classifier's ability to classify each sample into its true class. In the case of network
intrusion detection, it is a comprehensive performance metric of an IDS's classification power
on both benign and attack samples. It is defined as the ratio of the total correct predictions
made by the classifier to the total number of input samples. The overall accuracy is
calculated as follows:
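$$\mathrm{Accuracy} = \frac{\mathrm{TN}_{overall} + \mathrm{TP}_{overall}}{\mathrm{TN}_{overall} + \mathrm{TP}_{overall} + \mathrm{FN}_{overall} + \mathrm{FP}_{overall}} \tag{5}$$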
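The four benchmarks defined in (2) to (5) translate directly into code. The following is a minimal sketch; the function names are illustrative, and the confusion-matrix counts are invented solely to exercise the formulas and are not results of the proposed model.

```python
def detection_rate(tp, fn):
    """DR of an attack class, as in (2): TP / (TP + FN)."""
    return tp / (tp + fn)


def true_negative_rate(tn, fp):
    """TNR on benign samples, as in (3): TN / (TN + FP)."""
    return tn / (tn + fp)


def false_alarm_rate(tn, fp):
    """FAR on benign samples, as in (4): FP / (TN + FP), which equals 1 - TNR."""
    return fp / (tn + fp)


def accuracy(tn, tp, fn, fp):
    """Overall accuracy, as in (5): correct predictions over all samples."""
    return (tn + tp) / (tn + tp + fn + fp)


# Invented counts for illustration only.
tn, fp, tp, fn = 980, 20, 450, 50

print("DR :", detection_rate(tp, fn))
print("TNR:", true_negative_rate(tn, fp))
print("FAR:", false_alarm_rate(tn, fp))
print("Acc:", accuracy(tn, tp, fn, fp))
```

Computing FAR from its own counts rather than as 1 − TNR makes the identity in (4) easy to verify numerically.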
The proposed IDS model is trained on the training set from the α-Dataset. The Adam
optimizer [23] is used in the training process. The learning rate is set to 0.0002 for the first 50
epochs and then reduced to one tenth of that value for fine-tuning. Table 4 presents the DR and
TNR of the trained model on the training set and test set. The proposed model reaches over
90% DR on 12 of the 14 attack classes, and over 99% DR on 10 of them. The TNR of the benign
class reaches 98.984%. The proposed IDS
model performs relatively poorly on the Botnet and Sql Injection classes compared with the
other attack classes. This is due to the small number of in-class samples of the two classes: Botnet
and Sql Injection account for only 0.07% and 0.001%, respectively, of the original
CICIDS2017 database. Accordingly, the proposed IDS model struggles to capture the
characteristics of these two types of network traffic. The same weakness
appears in some other IDS models that use CICIDS2017 as the training database, as shown
in Table 5.
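The two-stage learning-rate schedule mentioned at the start of this paragraph can be written as a small function. This is a sketch only: the names are illustrative, epochs are assumed to be counted from zero, and the total number of epochs is left open because the text does not state it.

```python
BASE_LR = 2e-4      # learning rate stated in the text for the first 50 epochs
SWITCH_EPOCH = 50   # assumption: epochs are counted from 0, so 0..49 use BASE_LR


def learning_rate(epoch):
    """Learning rate for a zero-based epoch index under the two-stage schedule."""
    return BASE_LR if epoch < SWITCH_EPOCH else BASE_LR / 10


for epoch in (0, 49, 50, 60):
    print(epoch, learning_rate(epoch))
```

In a training loop, the returned value would be handed to the optimizer at the start of each epoch.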
TABLE 4: Performance of the Proposed Model on α -Dataset
TABLE 5: Class-Wise Performance of the Proposed IDS Model and Other Classifiers
C. Performance Comparison
The proposed IDS model is also evaluated by comparing it with some well-known classifiers.
Table 5 presents the class-wise DR and TNR of the following nine models: Hierarchical [17],
WISARD [24], Forest PA [25], J48, LIBSVM [26], FURIA [27], Random Forest, MLP, and
Naive Bayes. The bolded figures in Table 5 mark the best
performer in each class. For example, the 98.984% TNR yielded by the proposed IDS is bolded to
indicate that it is the highest TNR among the competitors.
Among the selected models, the proposed IDS model has the highest DR on 10 of the 14 attack
classes: Bot, DDoS, DoS GoldenEye, DoS Hulk,
DoS Slowhttptest, DoS slowloris, Heartbleed, Port Scan, Web Attack-Brute Force, and Web
Attack-XSS. For the remaining four attack classes, in terms of DR, the proposed model
ranks second on FTP-Patator, seventh on SSH-Patator, third on Infiltration, and
fourth on Web Attack-Sql Injection. In addition, the proposed model has the highest TNR for
benign samples among all ten models.
This also means that the proposed model has the lowest FAR, which can be calculated by (4).
Accordingly, out of the 15 traffic classes in CICIDS2017, the proposed model has the best
classification performance on 11 classes when compared with the other nine classifiers.
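As a worked example of the relation FAR = 1 − TNR, the reported benign-class TNR of 98.984% gives the following false alarm rate:

$$\mathrm{FAR} = 1 - \mathrm{TNR} = 1 - 0.98984 = 0.01016 = 1.016\%$$

In other words, roughly one benign sample in a hundred would be flagged as an attack at this TNR.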
Finally, in Figs. 3 and 4, the overall attack DR, FAR and accuracy of the ten selected classifiers
are presented to provide a visual performance comparison. The figures show
that the proposed model has the highest overall DR and accuracy and the
lowest FAR among the ten models.
Fig 4.1: Overall detection rate and accuracy of the ten selected classifiers.
In addition, the proposed IDS is compared with two other recently published CNN-based IDSs,
which were introduced in the literature review chapter [8], [9].
Both studies perform intrusion detection on the CICIDS datasets with a similar CNN
architecture but different data pre-processing methods. In [8], the raw CICIDS2017 dataset,
which consists of separate sub-datasets each covering the network traffic of one day, was used
to train the models. The authors trained and evaluated a CNN-based IDS on each sub-dataset,
most of which included only one to three attack classes. In [9], the authors manually separated the
CICIDS2018 dataset into ten sub-datasets, each including up to three attack
classes. Therefore, the CNN-based IDSs in [8], [9] both performed classification on one class or
a subset of classes, and the models' accuracy, DR, and TNR were provided. Table 6 provides the DR and
TNR of the proposed IDS and the compared CNN-based IDSs. The proposed IDS
is superior to the other models in two respects. First, in the comparison, the proposed IDS is
the only model performing true multi-class classification on all the 15 classes available
in the CICIDS dataset. The other two CNN-based IDSs, while performing well on binary
classification of one attack class, perform poorly when many attack classes need
to be classified. Second, in terms of class-wise classification performance, the proposed IDS
also has the highest TNR and a higher or comparable DR on each attack class. In summary,
thanks to the innovative dataset pre-processing method, the proposed IDS not only classifies
more classes at once with a single CNN classifier but also yields competitive class-wise
results compared with previously proposed CNN-based IDSs.
TABLE 6: Class-Wise Performance of Other CNN Models on CICIDS2017
To test how the proposed model performs on innovative attacks, data from CICIDS2017 is
used to build custom training and testing datasets for simulation. More specifically,
DoS attacks are selected as the test subject because of their diversity.
CICIDS2017 has four DoS varieties, namely DoS GoldenEye, DoS Hulk, DoS
Slowhttptest, and DoS slowloris. During the simulation, four IDS models are generated, each
with the same architecture as the proposed model, one for each of four combinations of training
and test dataset. Each time a model is trained, only one type of DoS sample is included in the
training dataset to represent the known attacks, while the other three DoS varieties
are used only in the test dataset to represent the innovative attacks. In this scenario, since the
innovative DoS samples in the test dataset are never seen during training, a traditional
signature-based IDS may easily fail because its database lacks the corresponding signature
information. The proposed model, on the other hand, has the potential to recognize these innovative
DoS samples.
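The four training and test combinations described above can be enumerated programmatically. The following is a minimal sketch with hypothetical helper names; it covers only the DoS samples, since the text does not specify how benign traffic and the other attack classes are distributed between the two datasets.

```python
DOS_VARIETIES = ["DoS GoldenEye", "DoS Hulk", "DoS Slowhttptest", "DoS slowloris"]


def known_vs_innovative(varieties=DOS_VARIETIES):
    """Yield (known variety used for training, innovative varieties held out for testing)."""
    for known in varieties:
        innovative = [v for v in varieties if v != known]
        yield known, innovative


def split_dos_samples(samples, known):
    """Partition (label, features) pairs of DoS traffic into training and test lists."""
    train = [s for s in samples if s[0] == known]
    test = [s for s in samples if s[0] in DOS_VARIETIES and s[0] != known]
    return train, test


for known, innovative in known_vs_innovative():
    print("train on:", known, "| test on:", innovative)
```

Each of the four pairs corresponds to one of the four models trained in the simulation.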
The proposed IDS model, which is a neural network-based model, captures the generalized
characteristics of each type of cyberattack. Therefore, if some innovative DoS samples share
similar traits with the DoS samples in the training dataset, they might be successfully classified
as the DoS class by the model.
The four models are trained separately and their DR on the innovative DoS attacks in the test
dataset is recorded. The simulation results of all four models are presented in Table 7.
Table 7 shows that when only the DoS GoldenEye samples are used in
training, the model can also detect DoS Hulk with a DR of
87.29%.
Likewise, when the model is trained only on the DoS slowloris samples, it can
also detect DoS Slowhttptest with a DR of 85.73%. These simulation results indicate that the
proposed model is better suited than a traditional signature-based IDS to detecting innovative attacks.
CHAPTER-5
IoT intrusion is defined as an unauthorised action or activity that harms the IoT
ecosystem. For instance, an attack that makes computer services unavailable
to their legitimate users is considered an intrusion. An IDS is defined as a software or hardware
system that maintains the security of a system by identifying malicious activities
on computer systems. The main aim of an IDS is to identify unauthorised computer usage
and malicious network traffic, which is not possible with a traditional firewall.
This makes computer systems highly resistant to malicious
actions that compromise their availability, integrity, or confidentiality.
A. Signature-based intrusion detection systems (SIDS)
Signature-based intrusion detection systems (SIDS) use pattern-matching techniques to find a
known attack; this approach is also known as Knowledge-Based Detection. In SIDS, matching
methods are used to find a previous intrusion. In other words, when an intrusion signature
matches the signature of a previous intrusion that already exists in the signature database,
an alarm signal is triggered. For SIDS, the host's logs are inspected to find sequences of
commands or actions that have previously been identified as malware. SIDS has also been
labelled in the literature as Misuse Detection.
Traditional SIDS techniques have difficulty identifying attacks that span multiple packets,
because they examine network packets and match them against a database of signatures.
With the increased sophistication of modern malware, it may be necessary to extract
signature information across multiple packets, which requires the IDS to recall the contents
of earlier packets. Several methods exist for creating SIDS signatures, in which signatures
are expressed as state machines, formal language string patterns, or semantic conditions.
B. Anomaly-based intrusion detection system (AIDS)
© 2022 IJRAR September 2022, Volume 9, Issue 3 www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138) IJRAR22C2601 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 932
AIDS has attracted considerable research interest because it can overcome the limitations of
SIDS. In AIDS, a model of the normal behaviour of a computer system is created using machine
learning, statistical-based, or knowledge-based methods. Any significant deviation between the
observed behaviour and the model is regarded as an anomaly, which can be interpreted as an
intrusion. This kind of technique relies on the assumption that malicious behaviour differs from
typical user behaviour. The behaviour of abnormal users that deviates from the standard
behaviour is defined as an intrusion. There are two phases in the development of AIDS: the
training phase and the testing phase. In the training phase, the normal traffic profile is used to
learn a model of normal behaviour. In the testing phase, a new data set is used to establish the
system's capacity to generalise to previously unseen intrusions. AIDS can be sub-categorised
based on the method used for training, for example statistical-based, knowledge-based, and
machine learning-based.
The main advantage of AIDS is its ability to identify zero-day attacks, because recognising
abnormal user activity does not rely on a signature database. AIDS triggers a danger signal
when the examined behaviour deviates from normal behaviour. AIDS also has further benefits.
First, it can discover internal malicious activities: if an intruder starts making transactions in
a stolen account that are not found in the typical user activity, an alarm is raised. Second, it is
difficult for a cybercriminal to learn what normal user behaviour is without triggering an alert,
because the system is constructed from customised profiles.
C. Machine Learning based Technique
Machine learning is the process of extracting knowledge from large quantities of data. Machine
learning models comprise a set of rules, methods, or complex "transfer functions" that can be
applied to find interesting data patterns or to recognise or predict behaviour. Machine learning
techniques have been applied extensively in the area of AIDS. To extract knowledge from
intrusion datasets, different algorithms and techniques such as clustering, neural networks,
association rules, decision trees, genetic algorithms, and nearest neighbour methods are used.
Some prior research has examined the use of different techniques to build AIDSs. One study
examined the performance of two feature selection algorithms involving Bayesian networks (BN)
and Classification and Regression Trees (CART), and combined these methods for higher
accuracy. Another proposed a feature selection technique that combines algorithms such as
Information Gain (IG) and Correlation Attribute evaluation, and tested the performance of the
selected features by applying different classification algorithms such as C4.5, naïve Bayes,
NB-Tree, and Multi-Layer Perceptron. A genetic-fuzzy rule mining method has been used to
evaluate the importance of IDS features. A NIDS based on the Random Tree model has been
proposed to improve accuracy and reduce the false alarm rate.
The main aim of using machine learning methods is to create an IDS that requires less human
knowledge and achieves better accuracy. The number of AIDSs that use machine learning
techniques has been increasing over the last few years. The main objective of machine
learning-based IDS research is to detect patterns and build an intrusion detection system based
on the dataset. Generally, there are two categories of machine learning methods: supervised
and unsupervised.
The Internet of Things (IoT) is an interconnected system of devices that facilitates seamless
information exchange between physical devices. These devices could be medical and healthcare
devices, driverless vehicles, industrial robots, smart TVs, wearables, and smart city
infrastructure, and they can be remotely monitored and regulated. IoT devices are expected to
become more prevalent than mobile devices and will have access to the most sensitive
information, such as personal information. This will enlarge the attack surface and increase
the probability of attacks. As security will be a vital supporting element of most IoT applications,
IoT intrusion detection systems also need to be developed to secure the communications enabled by
such IoT technologies (Granjal et al., 2015).
In the last few years, advances in Artificial Intelligence (AI), such as machine learning and
deep learning techniques, have been used to improve IoT IDS (Intrusion Detection System). What is
now needed is an up-to-date, thorough taxonomy and critical review of this recent
work. Numerous related studies have applied different machine learning and deep learning
techniques to various datasets to validate the development of IoT IDS. However, it is still not
clear which datasets and which machine learning or deep learning techniques are most effective for
building an efficient IoT IDS. Moreover, the time consumed in building and testing IoT IDS is
not considered in the evaluation of some IDS techniques, despite being a critical factor for the
effectiveness of ‘on-line’ IDSs (Khraisat et al., 2019a).
This paper provides an up-to-date taxonomy, together with a critical review of the significant
research works on IoT IDSs up to the present time; and a classification of the proposed systems
according to the taxonomy. It provides a structured and comprehensive overview of the existing
IoT IDSs so that a researcher can become quickly familiar with the key aspects of IoT IDS.
This paper also provides a critical review of machine learning and deep learning techniques
applied to build IoT IDS. The detection techniques, validation strategies, and deployment strategies
are reviewed, along with several techniques used in each method. The complexity of different
detection techniques, intrusion deployment strategy, and their evaluation techniques are
discussed, followed by a set of suggestions identifying the best techniques, depending on the
nature of the IoT IDS. Challenges for the current IoT IDSs are also discussed. Compared to
previous survey publications (Khraisat et al., 2019a; Benkhelifa et al., 2018; Chaabouni et
al., 2019; Zarpelao et al., 2017; Hindy et al., 2018) this paper presents a discussion on IoT
techniques, IoT deployment strategy and IDS dataset problems which are of main concern to
the research community in the area of IoT intrusion detection systems (IDS). Prior studies such
as (Yang et al., 2017; Yar & Steinmetz, 2019) have not completely reviewed IoT IDSs in terms
of the datasets, challenges and techniques. In this paper, we provide a structured and
contemporary, wide-ranging study on IDS in terms of techniques, IoT attacks and datasets; and
also highlight challenges of the IoT techniques and then make recommendations.
During the last few years, several surveys on IoT IDS have been published. Table 1 shows the
IDS techniques and datasets covered by this survey and previous survey papers. The
comparison in this table summarises the contribution of each survey to the development of
intrusion detection systems for IoT. The survey on intrusion detection systems and taxonomy
by Axelsson (Axelsson, 2000) classified intrusion detection systems based on the detection
methods. The highly cited survey by Debar et al. (Debar et al., 2000) surveyed detection
methods based on the behaviour and knowledge profiles of the attacks. A taxonomy of IoT
intrusion systems by Liao et al. (Liao et al., 2013a), has presented a classification of five
subclasses with an in-depth perspective on their characteristics: Statistics-based, Pattern-based,
Rule-based, State-based and Heuristic-based.
The highly cited survey by Alvarenga et al. (Zarpelao et al., 2017) reviews IoT security
issues in general. Attacks against IoT devices, such as Denial
of Service (DoS) attacks and attacks on RPL (Routing Protocol for Low-Power and Lossy
Networks), are not discussed in their study. Critical infrastructure such as power systems,
transport, the internet, air traffic
control, railways and power plants could all be disrupted by an IoT attacker. The authors
reviewed intrusion detection in IoT and presented a taxonomy that classifies IoT
IDSs by detection method, IDS placement strategy, security threat, and validation
strategy. Alvarenga et al. (Zarpelao et al., 2017) also indicated in 2017 that intrusion
detection for IoT was still at an initial stage and that the existing IDSs were not sufficient for
the wide variety of IoT attacks. This paper explores and discusses whether recent IoT IDSs are
sufficient to deal with different IoT attacks.
Existing review articles (e.g., Chaabouni et al., 2019; da Costa et al., 2019; Buczak &
Guven, 2016; Lunt, 1988; Agrawal & Agrawal, 2015) focus on intrusion detection techniques,
dataset issues, or types of computer attack and IDS evasion. No article comprehensively
reviews IoT IDS, dataset problems, deployment strategies, IoT intrusion techniques, and
the different kinds of attack together. In addition, several different IoT IDSs have been
proposed in the meantime, so there is a need for an up-to-date survey.
The updated taxonomy of the IoT IDS discipline presented in this paper
further enhances the taxonomies given in (Khraisat et al., 2019a; Benkhelifa et al., 2018;
Chaabouni et al., 2019; Liao et al., 2013a).
Given the discussion on prior surveys, this article focuses on the following:
Classifying various kinds of IoT IDS based on intrusion techniques, deployment
strategy, and validation strategy.
Presenting recent efforts to improve IDS for IoT security.
Taxonomy of IoT attacks.
Discussion of available IDS datasets.
The challenges of IoT IDS.
Intrusion detection in the internet of things
In this section, a review of the existing IDS research for IoT is presented. Each study is
categorized by considering the following characteristics: IDS placement strategy, detection
method, and validation strategy. Figure 1 shows the classification of IDS for IoT networks,
while Table 1 provides some recent related research.
Figure 1 shows the IDS techniques, deployment strategy, validation strategy, attacks on IoT
and datasets covered by this paper and previous research papers. The variety among the IoT IDS
surveys indicates that IDS for IoT needs to be reviewed as a whole. Specifically, none of these
surveys covers all detection methods for IoT, which is considered crucial because of the
heterogeneous nature of the IoT ecosystem. For that reason, this survey reviews IDS for IoT
across a broad technological scope.
IoT intrusion detection systems methods
IoT Intrusion is defined as an unauthorised action or activity that harms the IoT ecosystem. In
other words, an attack that results in any kind of damage to the confidentiality, integrity or
availability of information is considered an intrusion. For example, an attack that will make the
computer services unavailable to its legitimate users is considered an intrusion. An IDS is
defined as a software or hardware system that maintains the security of the system by
identifying malicious activities on the computer systems (Liao et al., 2013a). The main aim of
an IDS is to identify unauthorised computer usage and malicious network traffic, which is not
possible with a traditional firewall. This makes computer systems highly
resistant to malicious actions that compromise their availability, integrity, or
confidentiality. IDSs fall into two main sub-categories: Signature-based
Intrusion Detection System (SIDS) and Anomaly-based Intrusion Detection System (AIDS).
Signature intrusion detection systems (SIDS) utilize pattern matching techniques to find a
known attack; these are also known as Knowledge-based Detection or Misuse Detection
(Khraisat et al., 2018). In SIDS, matching methods are used to find a previous intrusion. In
other words, when an intrusion signature matches the signature of a previous intrusion that
already exists in the signature database, an alarm signal is triggered. For SIDS, the host’s logs
are inspected to find sequences of commands or actions which have previously been identified
as malware. SIDS has also been labelled in the literature as Knowledge-Based Detection or
Misuse Detection (Modi et al., 2013).
Figure 2 demonstrates the conceptual working of SIDS approaches. The main idea is to build
a database of intrusion signatures and to compare the current set of activities against the
existing signatures and raise the alarm if a match is found. For example, a rule in the form of
“if: antecedent, then: consequent” may lead to “if (source IP address = destination IP address)
then label as an attack”.
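The rule-matching idea can be illustrated with a tiny signature table. This is a sketch under simplifying assumptions, not a description of any real SIDS product: packets are represented as plain dictionaries, and the names SIGNATURES, match_signatures and label_packet are hypothetical.

```python
# Each signature pairs a name with a predicate that plays the role of the rule's antecedent.
SIGNATURES = [
    ("same source and destination IP", lambda pkt: pkt["src_ip"] == pkt["dst_ip"]),
]


def match_signatures(packet, signatures=SIGNATURES):
    """Return the names of all signatures whose antecedent holds for the packet."""
    return [name for name, antecedent in signatures if antecedent(packet)]


def label_packet(packet):
    """Apply the consequent: label the packet as an attack if any signature matches."""
    return "attack" if match_signatures(packet) else "no match"


suspicious = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.5"}
ordinary = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9"}

print(label_packet(suspicious))
print(label_packet(ordinary))
```

A packet that matches no signature is reported as "no match" rather than "benign", because the absence of a known signature does not show that the traffic is harmless.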
AIDS has attracted considerable research interest because it can overcome the limitations of SIDS.
In AIDS, a model of the normal behavior of a computer system is created using machine
learning, statistical-based or knowledge-based methods. Any significant deviation between the
observed behavior and the model is regarded as an anomaly, which can be interpreted as an
intrusion. This kind of technique relies on the assumption that malicious behaviour is different from
typical user behaviour. The behaviour of abnormal users that differs from the standard
behaviour is defined as an intrusion. There are two phases in the development of AIDS: the
training phase and the testing phase. In the training phase, the normal traffic profile is used to
learn a model of normal behaviour. In the testing phase, a new data set is used to establish the
system’s capacity to generalise to previously unseen intrusions. AIDS can be sub-categorized
based on the method used for training, for instance, statistical-based, knowledge-based and
machine learning-based (Butun et al., 2014).
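The two phases can be illustrated with a very simple statistical profile. This is a sketch of one possible statistical-based approach, not the method of any cited work: the monitored feature (packets per second), the three-standard-deviation threshold, and the function names are all assumptions made for illustration.

```python
from statistics import mean, stdev


def fit_normal_profile(normal_values):
    """Training phase: summarise normal traffic by its mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomaly(value, profile, threshold=3.0):
    """Testing phase: flag values that deviate from the profile by more than
    `threshold` standard deviations (the threshold is an assumed tuning parameter)."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma


# Invented packets-per-second readings observed during normal operation.
normal_rates = [98, 102, 100, 101, 99, 100]
profile = fit_normal_profile(normal_rates)

print(is_anomaly(101, profile))
print(is_anomaly(150, profile))
```

A reading close to the learned mean is accepted, while a large deviation is reported as an anomaly that may indicate an intrusion.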
The main advantage of AIDS is the ability to identify zero-day attacks because recognizing
abnormal user activity does not rely on a signature database (Alazab et al., 2012). AIDS
triggers a danger signal when the examined behavior deviates from normal behavior.
AIDS has further benefits. First, it can discover internal malicious
activities: if an intruder starts making transactions in a stolen account that are not found in
the typical user activity, an alarm is raised. Second, it is challenging for a cybercriminal to
learn what normal user behavior is without triggering an alert, as the system is
constructed from customized profiles.
Table 2 presents the differences between signature-based detection and anomaly-based
detection. The main difference between these two is that AIDS can discover zero-day attacks,
whereas SIDS can only detect previously known intrusions. However, AIDS can result in a
high false-positive rate because anomalies may just be new normal activities rather than
genuine intrusions.
Since there is a lack of a taxonomy for anomaly-based intrusion detection systems, we have
identified five subclasses based on their features: Statistics-based, Pattern-based, Rule-based,
State-based and Heuristic-based as shown in Table 3.
This section presents an overview of AIDS approaches proposed in recent years for improving
detection accuracy and reducing false alarms.
AIDS methods can be categorized into four main groups: supervised learning (Chao et
al., 2015), unsupervised learning (Elhag et al., 2015; Can & Sahingoz, 2015), reinforcement
learning and deep learning (Buczak & Guven, 2016; Meshram & Haas, 2017). Supervised
learning involves collecting input variables together with an output variable and
using an algorithm to learn the mapping from the input to the output, here the normal user
behaviour. The objective
is to approximate the mapping function so well that, when a new input record is collected, the
model can predict the output variable for that record. Unsupervised learning, on the other hand, tries to
identify the requested actions from existing system data, such as protocol specifications and
network traffic instances, where only input data are available and there are no corresponding output
variables. Reinforcement learning methods enable an agent to learn in an interactive
environment by trial and error, using feedback from its own actions and experiences; the aim
is to obtain an action model that maximizes
the total cumulative reward of the agent. Deep learning models are based on artificial neural
networks, such as convolutional neural networks (CNNs). These four classes, along with
examples of their subclasses, are shown in Fig. 3.
Machine learning is the process of extracting knowledge from large quantities of data. Machine
learning models comprise a set of rules, methods, or complex “transfer functions” that can
be applied to find interesting data patterns or to recognise or predict behaviour (Dua &
Du, 2016). Machine learning techniques have been applied extensively in the area of AIDS. To
extract the knowledge from intrusion datasets, different algorithms and techniques such as
clustering, neural networks, association rules, decision trees, genetic algorithms, and nearest
neighbour methods are utilized.
Some prior research has examined the use of different techniques to build AIDSs. Chebrolu et
al. examined the performance of two feature selection algorithms involving Bayesian networks
(BN) and Classification and Regression Trees (CART) and combined these methods for higher
accuracy (Chebrolu et al., 2005).
Bajaj et al. proposed a technique for feature selection using a combination of feature selection
algorithms such as Information Gain (IG) and Correlation Attribute evaluation. They tested the
performance of the selected features by applying different classification algorithms such as
C4.5, naïve Bayes, NB-Tree and Multi-Layer Perceptron (Khraisat et al., 2018; Bajaj &
Arora, 2013). A genetic-fuzzy rule mining method has been used to evaluate the importance of
IDS features (Elhag et al., 2015). Thaseen et al. proposed an NIDS that uses the Random Tree
model to improve accuracy and reduce the false alarm rate (Thaseen & Kumar, 2013).
Subramanian et al. proposed classifying the NSL-KDD dataset using decision tree algorithms
to construct a model for their metric data and studying the performance of decision tree
algorithms (Subramanian et al., 2012).
Various AIDSs have been created based on machine learning techniques as shown in Fig. 4.
The main aim of using machine learning methods is to create IDSs that require less human
knowledge and achieve better accuracy. The number of AIDSs that make use of machine learning
techniques has been increasing in the last few years. The main objective of research on
machine-learning-based IDS is to detect patterns and build an intrusion detection system based
on the dataset. Generally, there are two categories of machine learning methods, supervised
and unsupervised.
This subsection presents various supervised learning techniques for IDS. Each technique is
presented in detail, and references to important research publications are presented.
Supervised learning-based IDS techniques detect intrusions by using labeled training data. A
supervised learning approach usually consists of two stages, namely, training and testing. In
the training stage, relevant features and classes are identified and then the algorithm learns
from these data samples. In supervised learning IDS, each record is a pair, containing a network
or host data source and an associated output value (i.e., label), namely intrusion or normal.
Next, feature selection can be applied to eliminate unnecessary features. Using the training
data for selected features, a supervised learning technique is then used to train a classifier to
learn the inherent relationship that exists between the input data and the labelled output value.
A wide variety of supervised learning techniques have been explored in the literature, each
with its advantages and disadvantages. In the testing stage, the trained model is used to classify
the unknown data into intrusion or normal class. The resultant classifier then becomes a model
that, given a set of feature values, predicts the class to which the input data might belong.
Figure 5 shows a general approach for applying classification techniques. Most existing
IDSs are trained in a supervised way. This implies that cybersecurity professionals
need to label the network traffic and revise the model manually from time to time.
There are many classification methods such as decision trees, rule-based systems, neural
networks, support vector machines, naïve Bayes and k-nearest neighbour. Each technique uses
a learning method to build a classification model. However, a suitable classification approach
should not only handle the training data, but should also accurately identify the class of
records it has never seen before. Creating classification models with reliable generalization
ability is an important task of the learning algorithm.
Decision trees
A decision tree has three basic components. The first component is a decision node, which is
used to identify a test attribute. The second is a branch, where each branch represents a possible
decision based on the value of the test attribute. The third is a leaf that comprises the class to
which the instance belongs (Rutkowski et al., 2014). There are many different decision tree
algorithms, including ID3 (Quinlan, 1986), C4.5 (Quinlan, 2014) and CART (Breiman, 1996).
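The three components described above can be sketched as a small data structure. This is a minimal illustration only: the tree is built by hand rather than learned by ID3, C4.5 or CART, and the attribute names, values and the `default` fallback are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf: holds the class assigned to instances that reach it."""
    label: str


@dataclass
class DecisionNode:
    """Decision node: names the test attribute; each branch maps one
    attribute value to a subtree (another decision node or a leaf)."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def classify(node, record, default="normal"):
    """Walk from the root to a leaf, following the branch that matches
    the record's value for each test attribute."""
    while isinstance(node, DecisionNode):
        value = record.get(node.attribute)
        if value not in node.branches:
            return default  # unseen attribute value: fall back (assumption)
        node = node.branches[value]
    return node.label


# Hypothetical hand-built tree; attribute names and values are illustrative only.
tree = DecisionNode("protocol", {
    "icmp": Leaf("intrusion"),
    "tcp": DecisionNode("flag", {"SYN": Leaf("intrusion"), "ACK": Leaf("normal")}),
    "udp": Leaf("normal"),
})

print(classify(tree, {"protocol": "tcp", "flag": "SYN"}))
```

A learning algorithm would choose the test attribute at each decision node from the training data; here the structure is fixed so that only the traversal is shown.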
Naïve Bayes
This approach is based on applying Bayes' theorem with strong independence assumptions
among the attributes. Naïve Bayes answers questions such as “what is the probability that a
particular kind of attack is occurring, given the observed system activities?” by applying
conditional probability formulae. Naïve Bayes relies on the features that have different
probabilities of occurring in attacks and normal behavior. The naïve Bayes classification model
is one of the most prevalent models in IDS due to its ease of use and calculation efficiency,
both of which are taken from its conditional independence assumption property (Yang &
Tian, 2012). However, the system does not operate well if this independence assumption is not
valid, as was demonstrated on the KDD’99 intrusion detection dataset, which has complex
attribute dependencies (Koc et al., 2012). The results also reveal that the Naïve Bayes model
has reduced accuracy for large datasets. A further study showed that the more sophisticated
Hidden Naïve Bayes (HNB) model can be applied to IDS tasks that involve high
dimensionality, extremely interrelated attributes and high-speed networks (Koc et al., 2012).
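As a rough illustration of the conditional independence assumption, the sketch below scores each class c by log P(c) plus the sum of log P(x_i | c) over the features. It is a minimal categorical naïve Bayes with add-one smoothing, not the model used in the cited studies; the function names and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    vocab = defaultdict(set)             # distinct values seen for each feature
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, record):
    """Return the class maximising log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(record):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records of (protocol, TCP flag); not drawn from any dataset.
records = [("tcp", "SYN"), ("tcp", "SYN"), ("tcp", "ACK"), ("udp", "ACK"), ("udp", "ACK")]
labels = ["attack", "attack", "normal", "normal", "normal"]
model = train_naive_bayes(records, labels)
print(predict(model, ("tcp", "SYN")))
```

Because each feature contributes an independent factor, training reduces to counting, which is the source of the calculation efficiency noted above; the same factorisation is what fails when attributes are strongly dependent.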
The Artificial Neural Network (ANN) is one of the most broadly applied machine-learning methods
and has been shown to be successful in detecting different malware. The most frequent learning
technique employed for supervised learning is the backpropagation (BP) algorithm. The BP
algorithm computes the
gradient of the network’s error with respect to its modifiable weights. However, for ANN-
based IDS, detection precision, particularly for less frequent attacks, and detection accuracy
still need to be improved. The training dataset for less-frequent attacks is small compared to
that of more-frequent attacks, and this makes it difficult for the ANN to learn the properties of
these attacks correctly. As a result, detection accuracy is lower for less frequent attacks. In the
information security area, huge damage can occur if low-frequency attacks are not detected.
For instance, if the User to Root (U2R) attacks evade detection, a cybercriminal can gain the
authorization privileges of the root user and thereby carry out malicious activities on the
victim’s computer systems. In addition, less common attacks are often outliers (Wang et
al., 2010). ANNs often suffer from local minima and thus learning can become very time-
consuming. The strength of ANN is that, with one or more hidden layers, it can produce highly
nonlinear models that capture complex relationships between input attributes and classification
labels. With the development of many variants such as recurrent and convolutional NNs, ANNs
are powerful tools in many classification tasks including IDS.
Fuzzy logic
This technique is based on degrees of uncertainty rather than the typical true or false
Boolean logic on which contemporary computers are built. Therefore, it presents a
straightforward way of arriving at a conclusion based upon unclear, ambiguous, noisy,
inaccurate or missing input data. With a fuzzy domain, fuzzy logic permits an instance to
belong, possibly partially, to multiple classes at the same time. Therefore, fuzzy logic is a good
classifier for IDS problems as the security itself includes vagueness, and the borderline between
the normal and abnormal states is not well identified. In addition, the intrusion detection
problem contains various numeric features in the collected data and several derived statistical
metrics. Building IDSs based on numeric data with hard thresholds produces high false alarm rates.
An activity that deviates only slightly from a model might not be recognized, or a minor change
in normal activity could produce false alarms. With fuzzy logic, it is possible to model this
minor abnormality to keep the false rates low. Elhag et al. showed that with fuzzy logic, the
false alarm rate in determining intrusive actions could be decreased. They outlined a group of
fuzzy rules to describe the normal and abnormal activities in a computer system, and a fuzzy
inference engine to define intrusions (Elhag et al., 2015).
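The contrast with a hard threshold can be sketched with a simple membership function. This is an illustrative assumption, not the rule base of Elhag et al.: the feature `failed_logins` and the breakpoints `low` and `high` are hypothetical.

```python
def falling_edge(x, low, high):
    """Membership that is 1 below `low`, 0 above `high`, linear in between."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return (high - x) / (high - low)


def fuzzy_memberships(failed_logins, low=3, high=10):
    """Degrees of membership in the 'normal' and 'abnormal' fuzzy sets.

    `low` and `high` are illustrative breakpoints, not values from any dataset.
    """
    normal = falling_edge(failed_logins, low, high)
    return {"normal": normal, "abnormal": 1.0 - normal}


print(fuzzy_memberships(6.5))
```

A value between the two breakpoints belongs partially to both classes, so a slight deviation shifts the degrees of membership gradually instead of flipping a Boolean decision.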
A Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is
assumed to be a Markov process with unobserved (hidden) states. Prior research has shown that
HMM analysis can be applied
to identify particular kinds of malware (Annachhatre et al., 2015). In this technique, a Hidden
Markov Model is trained against known malware features (e.g., operation code sequence) and
once the training stage is completed, the trained model is applied to score the incoming traffic.
The score is then compared with a predefined threshold, and a score greater than the threshold
indicates malware. Likewise, if the score is less than the threshold, the traffic is identified as
normal.
K-Nearest Neighbors (KNN) classifier: The k-Nearest Neighbor (k-NN) technique is a typical
non-parametric classifier applied in machine learning (Lin et al., 2015). The idea of this
technique is to assign an unlabelled data sample to the class that is most common among its k
nearest neighbors (where k is an integer defining the number of neighbors to be considered).
Figure 6 illustrates a K-Nearest Neighbors classifier where k = 5. The point X represents an
instance of unlabelled data
that needs to be classified. Amongst the five nearest neighbors of X, there are three similar
patterns from the class Intrusion and two from the class Normal. Taking a majority vote enables
the assignment of X to the Intrusion class.
Fig: 5.6 An example of classification by k-Nearest Neighbour for k = 5
k-NN can be appropriately applied as a benchmark for all the other classifiers because it
provides a good classification performance in most IDSs (Lin et al., 2015).
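The majority vote for k = 5 can be sketched as follows. The coordinates are hypothetical and were chosen so that, as in the example above, three of the five nearest neighbours of X belong to the class Intrusion and two to the class Normal; Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_classify(query, samples, k=5):
    """Assign `query` to the majority class among its k nearest samples.

    `samples` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points: the five nearest neighbours of x are
# three Intrusion samples and two Normal samples.
samples = [
    ((1.0, 1.0), "Intrusion"), ((1.2, 0.8), "Intrusion"), ((0.9, 1.3), "Intrusion"),
    ((1.5, 1.5), "Normal"), ((0.5, 0.6), "Normal"),
    ((5.0, 5.0), "Normal"), ((6.0, 5.5), "Normal"), ((5.5, 6.0), "Normal"),
]
x = (1.0, 1.1)
print(knn_classify(x, samples, k=5))
```

The choice of k matters: with k = 8 every sample votes and the same point would be assigned to the Normal class, since that class holds five of the eight samples.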
Ensemble methods
Multiple machine learning algorithms can be used to obtain better predictive performance than
any of the constituent learning algorithms alone (Vasan et al., 2020a). The idea is to train
several classifiers at the same stage to detect different attacks and then combine their results
to increase the detection rate. Typically, the ensemble’s ability is better than a single
classifier’s, as it can enhance weak classifiers to produce better results than can a solitary
classifier (Aburomman & Reaz, 2017). Several different ensemble methods have been proposed,
such as Boosting, Bagging and Stacking. Boosting refers to a family of algorithms that can
transform weak learners into strong learners. Bagging means training the same classifier on
different subsets of the same dataset. Stacking combines various classifiers via a
meta-classifier (Aburomman & Ibne Reaz, 2016). The base-level models are built on the whole
training set, and then the meta-model is trained on the outputs of the base-level models as
attributes.
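The simplest way of combining the results of several classifiers is a plain majority vote, sketched below. This is not Boosting, Bagging or Stacking as such; it only illustrates how individual decisions can be merged. The three rule-based classifiers and their thresholds are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, record):
    """Combine the predictions of several classifiers by majority vote."""
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three hypothetical weak rules, each looking at a single feature.
def by_packet_rate(r):
    return "intrusion" if r["packets_per_s"] > 1000 else "normal"


def by_failed_logins(r):
    return "intrusion" if r["failed_logins"] > 5 else "normal"


def by_port(r):
    return "intrusion" if r["dst_port"] in {23, 2323} else "normal"


rules = [by_packet_rate, by_failed_logins, by_port]
record = {"packets_per_s": 1500, "failed_logins": 0, "dst_port": 23}
print(majority_vote(rules, record))
```

In a stacking ensemble the fixed vote would be replaced by a meta-classifier trained on the outputs of the base-level models.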
Researchers have revealed that the combination of different classifier techniques is an effective
way to resolve the shortcomings traditional IDSs have when they are applied to the IoT. Jabbar et
al. proposed an ensemble classifier that is built using Random Forest and the Average
One-Dependence Estimator (AODE), which solves the attribute dependency problem in the
Naïve Bayes classifier. Random Forest (RF) enhances precision and reduces false alarms
(Jabbar et al., 2017). Combining both approaches in an ensemble results in improved accuracy
over either technique applied independently.
More recently, Khraisat et al. (Khraisat et al., 2019b) proposed a stacking ensemble method
that combined the C5 decision tree classifier and a one-class support vector machine. On the
IoT intrusion dataset, the reported malware detection accuracy is 94% for the C5 decision tree
classifier and 92.5% in stage two, while the stacking ensemble achieved a classification
accuracy of 99.97%.
Unsupervised learning is a kind of machine learning that makes use of input datasets without
class labels to extract interesting information. The input data points are normally treated as a
set of random variables. A joint density model is then created for the data set. In supervised
learning, the output labels are given and used to train the machine to get the required results
for an unseen data point. In contrast, in unsupervised learning, no labels are given, and instead,
the data is grouped automatically into various classes through the learning process. In the
context of developing an IDS, unsupervised learning means using a mechanism to identify
intrusions by using unlabelled data to train the model. IoT network traffic is clustered into
groups, based on the similarity of the traffic, without the need to pre-define these groups.
As shown in Fig. 7, once records are clustered, all of the cases that appear in small clusters are
labeled as intrusions because normal occurrences should produce sizable clusters
compared to the anomalies. In addition, malicious intrusions and normal instances are
dissimilar, thus they do not fall into an identical cluster.
Fig: 5.7 Using Clustering for Intrusion Detection
K-means
The K-means technique is one of the most prevalent techniques of clustering analysis. It aims
to separate ‘n’ data objects into ‘k’ clusters in which each data object is assigned to the
cluster with the nearest mean. K-means is an iterative clustering algorithm that refines the
cluster assignment at every iteration until it converges to a local optimum. It is a
distance-based clustering technique and it does not need to compute the distances between all
combinations of records. It applies a Euclidean metric as a similarity measure. The number of
clusters is determined by the user in advance. Typically, several solutions will be tested
before accepting the most appropriate one. Annachhatre et al. used the K-means clustering
algorithm to identify different host behaviour profiles (Annachhatre et al., 2015). They
proposed new distance metrics that can be used in the k-means algorithm to relate the clusters
closely. They clustered data into several clusters and associated them with known behaviour for
evaluation. Their outcomes revealed that k-means clustering is a better approach to classify the
data using unsupervised methods for intrusion detection when several kinds of datasets are
available. Clustering could be used in IDS to reduce the number of intrusion signatures,
generate high-quality signatures or group similar intrusions.
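The iteration can be sketched with Lloyd's algorithm, the usual formulation of K-means. The sketch is a simplified illustration under stated assumptions: the first k points seed the centroids to keep the run deterministic, the Euclidean metric is used, and the 2-D points are hypothetical. Only point-to-centroid distances are computed, never distances between all pairs of records.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate nearest-mean assignment and mean update.

    The first k points seed the centroids (an assumption made for
    determinism; random or k-means++ seeding is more usual in practice).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


# Hypothetical 2-D traffic features forming two clearly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.9), (0.8, 1.1), (8.2, 7.9), (7.9, 8.1)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Because the result depends on the seeding and on k, several runs with different settings are typically compared, as noted above.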
Probabilistic clustering
is a common method for obtaining a set of low dimensional features from the largest set of
features.
Hierarchical clustering
This is a clustering technique that aims to create a hierarchy of clusters. Approaches for
hierarchical clustering are normally classified into two categories:
(i) Agglomerative: bottom-up clustering techniques where clusters have sub-clusters,
which in turn have sub-clusters, and pairs of clusters are combined as one moves up the
hierarchy.
(ii) Divisive: hierarchical clustering algorithms where iteratively the cluster with the
largest diameter in feature space is selected and separated into binary sub-clusters with
a lower range.
It is used for showing hidden factors that underlie sets of random features.
A lot of work has been done in the area of the cyber-physical control system (CPCS) with
attack detection and reactive attack mitigation by using unsupervised learning. For example, a
redundancy-based resilience approach was proposed by Alcaraz (Alcaraz, 2018), who proposed
a dedicated network sublayer that can handle the context by regularly collecting consensual
information from the driver nodes controlled in the control network itself, and discriminating
view differences through data mining techniques such as k-means and k-nearest neighbor. Chao
Shen et al. proposed Hybrid-Augmented device fingerprinting for IDS in Industrial Control
System Networks. They used different machine learning techniques to analyse network packets
and filter anomalous traffic in order to detect intrusions in ICS networks (Shen et al., 2018).
Likewise, Khraisat et al. (Khraisat et al., 2020) experimented with both single and ensemble
classifiers composed of the decision tree and SVM for classification of the NSL-KDD
intrusion detection evaluation data set. They found that an ensemble of all three classifiers,
based on majority voting, marginally out-performed all other classifiers.
Reinforcement learning
Deep Reinforcement learning utilizes deep learning and reinforcement learning principles for
building IDS. Reinforcement learning involves an agent interacting with an environment. The
agent is trying to achieve a goal of some kind within the environment. The purpose of the agent
is to learn how to interact with its environment in such a way that allows it to achieve its goals.
Deep reinforcement learning is the application of reinforcement learning to train deep neural
networks. Such a network has an input layer, an output layer, and multiple hidden layers, the
same as other deep neural networks. However, the input is the state of the environment. For
instance, consider a bus that is trying to get its passengers to their destination. The inputs
are the position, speed, and direction; the output is a series of possible actions like speed
up, slow down, turn left, or turn right. In addition, the reward signal is fed into the network
so that it can learn to associate which actions produce positive results given a specific state
of the environment.
Deep Q-network
A Deep Q-network (DQN) combines reinforcement learning and deep neural networks at scale. The
algorithm was
developed by enhancing a classic RL algorithm called Q-Learning with deep neural networks.
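The classic Q-Learning update that the deep variant builds on can be sketched in tabular form. This is not a Deep Q-network: the table `q` stands in for the neural network that would approximate the same values. The states, actions, reward, learning rate `alpha` and discount factor `gamma` are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy setting: the agent sees a flow and allows or blocks it.
actions = ["allow", "block"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q, "suspicious_flow", "block", reward=1.0,
                     next_state="idle", actions=actions)
print(new_value)
```

Repeating the update over many interactions moves each entry towards the expected cumulative reward of taking that action in that state.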
Double Q-learning
Deep learning
Deep learning is a form of machine learning in which a computer learns a hierarchy of
representations from data, based on experience, using multiple layers. Deep learning can be
supervised as well as unsupervised. In the case of supervised deep learning, data can be
classified, whereas in the case of unsupervised deep learning, data patterns are analyzed. Deep
learning is directly related to artificial intelligence, where machines acquire knowledge by
learning from experience and, in some tasks, may replace human intelligence. Deep learning works
on the platform of artificial neural networks by studying massive amounts of data with the help
of algorithms prepared by human intelligence. It is referred to as ‘deep learning’ because the
artificial neural networks possess several deep layers that enable them to learn. Table 4 shows
a comparison of machine learning and
deep learning. Table 5 shows a summary of the deep learning model techniques.
In neural networks, each node of every hidden layer computes a weighted sum of the values
received from the previous layer and passes its output value on to the subsequent layer. The
output of the last layer can be considered the final result produced by the neural network from
the raw data.
Fully Connected Feedforward Neural Networks are the standard network architecture applied
in most basic neural network applications. Fully connected denotes that every neuron
in one layer is linked to every neuron in the subsequent layer. Feedforward indicates that
neurons in any preceding layer are only ever connected to the neurons in a subsequent layer.
Fully Connected Neural Networks can be used for feature extraction (Wang et al., 2020).
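The layer-by-layer computation of a fully connected feedforward network can be sketched as follows. The sigmoid activation, the layer sizes and the weight values are assumptions made for illustration; none of them comes from the cited work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each node computes a weighted sum of the previous layer's outputs,
    adds its bias, and applies the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the input through every layer in turn; the last output is the result."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs, one hidden layer with 2 nodes, 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
output = network_forward([0.0, 0.0], layers)
print(output)
```

Every node in a layer reads all outputs of the previous layer (fully connected), and values only move from one layer to the next (feedforward).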
The recurrent neural network (RNN) can function efficiently on a series of data with variable
input length. RNNs use the information of their prior state as an input for their current
prediction, and this process can be repeated for an arbitrary number of steps, allowing the
network to propagate information via its hidden state through time. This is essentially like
giving a neural network a short-term memory. This feature makes RNNs very effective for working
with sequences of data that occur over time. Yin et al. (Yin et al., 2017) proposed a deep
learning approach for intrusion detection using recurrent neural networks (RNN-IDS). Their
experimental results show that RNN-IDS is very appropriate for creating IDS with high
accuracy and that its performance is superior to that of traditional machine learning
classification methods in both binary and multiclass classification.
The Generative Adversarial Network (GAN) is an integration of two deep learning neural networks:
a Generator Network and a Discriminator Network. The Generator Network produces synthetic
data, and the Discriminator Network tries to detect if the data that it’s seeing is real or synthetic.
These two networks are adversaries in the sense that they’re both competing to beat one
another.
A Convolutional Neural Network consists of one or more convolutional layers followed by one or
more fully connected layers as in a standard multilayer neural network (Vasan et al., 2020b).
A convolutional neural network contains an input and an output layer, as well as
multiple hidden layers. The hidden layers of a CNN typically contain a sequence of
convolutional layers that convolve their input with a multiplication or other dot product. A
CNN receives a 2-D input and abstracts high-level features via a sequence of hidden layers.
CNNs, which improve upon the architecture of the common neural networks, benefit from spatial
features (Vasan et al., 2020c). Spatial features are a commonly applied type of traffic feature
in the area of IDS. When applying spatial features, network traffic is transformed into traffic
images; an image classification technique is then used to categorize the traffic images, which
ultimately achieves the objective of detecting the intrusion traffic. This technique is
comparatively recent, but numerous recent research results demonstrate its great potential. For
example, Vasan et al. (Vasan et al., 2020b) adopted CNN techniques, transformed the raw malware
binary into both grayscale and colour images, and applied a fine-tuned CNN architecture.
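One simple way to turn raw bytes into a 2-D grayscale input is to treat each byte as a pixel intensity and cut the byte string into rows. The sketch below only illustrates that idea and is not the preprocessing of the cited work; the function name, the row width and the zero padding are assumptions.

```python
def bytes_to_image(payload: bytes, width: int = 8):
    """Reshape a byte string into rows of `width` grayscale pixels (0-255).

    The last row is zero-padded so that every row has the same length.
    `width` is an illustrative choice, not a value from the cited work.
    """
    padded_len = -(-len(payload) // width) * width  # round up to a full row
    padded = payload.ljust(padded_len, b"\x00")
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


# Hypothetical payload; in practice this would be raw packet or flow bytes.
image = bytes_to_image(b"GET /index.html HTTP/1.1", width=8)
for row in image:
    print(row)
```

The resulting rows form the 2-D input on which convolutional layers can then extract spatial features.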
Autoencoder
An autoencoder is trained to reconstruct its inputs. Autoencoders have been used for developing
online IoT IDS (Mirsky et al., 2018). In general, an autoencoder trained on X gains the
capability to reconstruct unobserved instances from the same data distribution as X. If an
instance does not belong to the model learned from X, then the reconstruction is expected
to have a high error.
IDS can also be classified based on the deployment used to detect IoT attacks. In terms of
deployment strategy, IDS can be classified as distributed, centralized or hybrid.
Distributed IDS
In distributed placement, the IoT devices could be responsible for checking other IoT devices.
A distributed IDS is made up of several IDSs over a large IoT ecosystem, all of which communicate
with each other, or with a central server that assists advanced intrusion detection systems,
packet analysis, and incident response.
Several IDSs deploy distributed architectures, in which a subset of the network nodes checks
the other nodes. Distributed IDS offers the incident analyst many advantages over centralized
IDS. The main benefit is the capability to identify attack patterns across a whole IoT ecosystem.
This might enable prompt IoT attack prevention and detection. An additional
benefit is early detection of an IoT botnet making its way through corporate IoT
devices. This data could then be used to detect and clean systems that have been infected by
the IoT botnet and stop its further spread into the IoT ecosystem, thereby preventing
damage to IoT devices that would otherwise have occurred. However, the
use of distributed rather than centralized IDS computing resources also implies
reduced control over those resources.
Centralized IDS
In centralized IDS placement, the IDS is placed in a central device, for instance, in the
boundary switch or a nominated device. All the information that the IoT devices collect and
then send to the network boundary switch passes through the boundary switch (Benkhelifa et
al., 2018). Consequently, the IDS positioned in a boundary switch can check the packets
switched between the IoT devices and the network. Despite this, checking the network packets
that pass through the boundary switch is not adequate to identify anomalies that affect the IoT
devices. The network traffic is monitored in centralized IDS. This traffic is extracted from the
network through different network data sources such as packet capture, NetFlow, etc. The
computers connected in a network can be monitored by Network-based IDS. Moreover, NIDS
is also capable of monitoring the external malicious activities that could have been commenced
from an external threat at an earlier stage, before these threats expand to other computer
systems. However, NIDS comes with some limitations such as its restricted ability to inspect
the whole data in a high bandwidth network because of the volume of data passing through
modern high-speed communication networks (Bhuyan et al., 2014). NIDS deployed at several
positions within a particular network topology, together with HIDS and firewalls, can provide
a concrete, resilient, and multi-tier protection against both external and insider attacks. shows
a summary of comparisons between IDS deployment strategies.
Data sources consist of system calls, application program interfaces, log files, and data
packets that are extracted from well-known attacks. These data sources can be useful to
classify intrusion behaviors from abnormal actions.
Hierarchical IDS
In Hierarchical IDS, the network is separated into clusters. The sensor nodes that are adjacent
to each other typically belong to the same cluster. Each cluster is assigned a leader, the
so-called cluster head, which monitors the member nodes and plays a part in network-wide analyses.
IDS validation is the process of determining whether the IoT IDS model is an accurate enough
representation of the system for detecting IoT attacks. To validate the effectiveness of IDSs,
researchers have used different strategies, such as theoretical, empirical, and hypothetical
ones.
There are many classification metrics for IDS, some of which are known by multiple names.
shows the confusion matrix for a two-class classifier which can be used for evaluating the
performance of an IDS. Each column of the matrix represents the instances in a predicted class,
while each row represents the instances in an actual class.
IDS are typically evaluated based on the following standard performance measures:
True Positive Rate (TPR): It is calculated as the ratio between the number of correctly
predicted attacks and the total number of attacks. If all intrusions are detected then the
TPR is 1 which is extremely rare for an IDS. TPR is also called a Detection Rate (DR)
or the Sensitivity. The TPR can be expressed mathematically as
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
False Positive Rate (FPR): It is calculated as the ratio between the number of normal instances
incorrectly classified as an attack and the total number of normal instances.
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
False Negative Rate (FNR): A false negative occurs when a detector fails to identify an
anomaly and classifies it as normal. The FNR can be expressed mathematically as:
$$\mathrm{FNR} = \frac{FN}{FN + TP}$$
Classification rate (CR) or Accuracy: The CR measures how accurate the IDS is in
detecting normal or anomalous traffic behavior. It is described as the percentage of all
those correctly predicted instances to all instances:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
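The four measures can be computed directly from the entries of the confusion matrix, where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives. The sketch below is a minimal illustration; the function name and the counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def ids_metrics(tp, fp, tn, fn):
    """Standard IDS measures from the entries of a two-class confusion matrix."""
    return {
        "TPR": tp / (tp + fn),  # detection rate / sensitivity
        "FPR": fp / (fp + tn),  # false alarm rate
        "FNR": fn / (fn + tp),  # share of attacks that were missed
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 attacks (90 detected) and 100 normal records (5 flagged).
m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
print(m)
```

With these counts the detection rate is 90/100, the false positive rate 5/100, the false negative rate 10/100 and the accuracy 185/200; TPR and FNR always sum to 1.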
Receiver Operating Characteristic (ROC) curve: ROC has FPR on the x-axis and TPR on the
y-axis. In the ROC curve, the TPR is plotted as a function of the FPR for different cut-off
points. Each point on the ROC curve represents an FPR and TPR pair corresponding to a certain
decision threshold. As the threshold for classification is varied, a different point on the ROC is
selected with different False Alarm Rate (FAR) and different TPR. A test with perfect
discrimination (no overlap in the two distributions) has a ROC curve that passes through the
upper left corner (100% sensitivity, 100% specificity).
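The construction of the curve can be sketched by sweeping the decision threshold over the observed scores and recording one (FPR, TPR) pair per threshold. The scores and labels are hypothetical, and flagging a record when its score is greater than or equal to the threshold is an assumption of this sketch.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    `scores` are anomaly scores (higher = more suspicious) and `labels`
    are 1 for attack, 0 for normal. A record is flagged when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores where every attack outranks every normal record,
# so the curve passes through the upper-left corner (FPR 0, TPR 1).
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels)
print(points)
```

When the score distributions of the two classes overlap, no threshold reaches the point (0, 1) and the curve bends away from the upper-left corner.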
Cho et al. proposed a methodology for checking packets that are passing through the border
router for communication between physical and network devices. Their methodology
detects botnet attacks by checking the packet length (Cho et al., 2009). However, no information
is presented about the technique employed to create a normal behaviour profile. It is also not
clear how the proposed IDS techniques would work on resource-constrained nodes in the IoT.
Rathore et al. proposed a semi-supervised fuzzy learning-based distributed attack detection
framework for IoT (Rathore & Park, 2018). The evaluation was done on the NSL-KDD dataset
and consequently suffered from the same limitations concerning the dataset as mentioned
above.
Hodo et al. used an Artificial Neural Network (ANN) to detect DDoS and DoS attacks against
legitimate IoT network traffic. The proposed ANN model was tested with the use of a simulated
IoT network. A multi-layer perceptron, a type of supervised ANN, is trained using internet packet
traces and then the model is assessed on its ability to thwart DDoS/DoS attacks (Hodo et
al., 2016). Hodo et al. did not consider effectiveness after the deployment of the proposed IDS
in the IoT ecosystem on low-capacity devices. According to their experimentation, the system
achieved an accuracy of 99.4% for DDoS/DoS. However, no details of the dataset are provided.
Diro et al. developed an IoT network attack detection system based on distributed deep
learning. Their work showed that distributed attack detection could identify IoT attacks better
than a centralized strategy with a 96% detection rate. Their approach was evaluated using the
NSL-KDD dataset. Even though this dataset is another version of the KDD data set, it still
suffers from various issues reviewed by McHugh (McHugh, 2000). We believe this dataset
should not be used as a practical benchmark dataset in the IoT as this data was collected from
a traditional network (Diro & Chilamkurti, 2018). This leads us to develop IDSs that take
into consideration the specific requirements of IoT protocols such as IPv6 over Low-power
Wireless Personal Area Networks (6LoWPAN). Hence, an intrusion detection system that is created
for the IoT ecosystem should operate under rigorous conditions of low processing ability,
high-speed connection, and high-volume data processing.
Moustafa et al. proposed an ensemble of IDSs to detect abnormal activities, specifically botnet
attacks against Domain Name System (DNS), Hypertext Transfer Protocol (HTTP) and
Message Queue Telemetry Transport (MQTT) (Moustafa et al., 2019). Their ensemble
method is based on the AdaBoost learning method and they used three machine learning
techniques: Artificial Neural Networks (ANN), Decision Tree (DT) and Naive Bayes (NB) to
evaluate their methodology (Moustafa et al., 2019). The proposed IDS results in significant
overhead, which degrades its performance.
Cervantes et al. proposed an IDS for detecting sinkhole attacks on 6LoWPAN for the IoT. Their
IDS approach applies a combination of anomaly detection and support vector machine (SVM).
During the training process, each IDS agent trains the SVM and executes a majority voting
decision to mark the infected nodes (Cervantes et al., 2015). Their simulation results show that
their IDS achieves a sinkhole detection rate of up to 92% in the fixed scenario and 75% in a
mobile scenario. However, their approach has not been evaluated for other types of attacks in
the IoT.
Khraisat et al. (Khraisat et al., 2019b) proposed an ensemble Hybrid Intrusion Detection
System (HIDS) by combining a C5 classifier and a One-Class Support Vector Machine
classifier. The C5 classifier is used to detect well-known intrusions, while the One-Class
Support Vector Machine classifier is used to detect new attacks.
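One plausible way to combine the two stages is sketched below. This is an assumption about the control flow, not the authors' implementation: a signature lookup stands in for the C5 decision tree, and a centroid-plus-radius profile of benign traffic stands in for the One-Class SVM. All names, signatures and feature values are hypothetical.

```python
import math

def signature_stage(record, known_signatures):
    """Stage 1 (stand-in for C5): return a label for a known attack, else None."""
    return known_signatures.get((record["proto"], record["dst_port"]))

def fit_benign_profile(benign_vectors):
    """Stage 2 (stand-in for the One-Class SVM): learn a centroid and a radius
    from benign traffic only."""
    dim = len(benign_vectors[0])
    centroid = [sum(v[i] for v in benign_vectors) / len(benign_vectors)
                for i in range(dim)]
    radius = max(math.dist(v, centroid) for v in benign_vectors)
    return centroid, radius

def classify(record, known_signatures, profile):
    """Known attacks are caught by stage 1; everything else is checked
    against the benign profile and flagged if it falls outside it."""
    label = signature_stage(record, known_signatures)
    if label is not None:
        return label
    centroid, radius = profile
    if math.dist(record["features"], centroid) > radius:
        return "unknown-attack"
    return "benign"

known = {("tcp", 23): "known-attack"}
profile = fit_benign_profile([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]])

r1 = {"proto": "tcp", "dst_port": 23, "features": [1.0, 2.0]}
r2 = {"proto": "udp", "dst_port": 53, "features": [1.1, 1.9]}
r3 = {"proto": "udp", "dst_port": 53, "features": [9.0, 0.1]}
print([classify(r, known, profile) for r in (r1, r2, r3)])
```

The second stage is fitted on benign traffic only, which is what allows it to flag attacks that have no signature yet.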
Attacks on IoT ecosystem
IoT technology connects many devices, such as sensors and processors, so that they can share
data and connect to other networks. Because so many devices are interconnected, the shared
data may not be secure, which raises security concerns. IoT security refers to protecting the
information shared among different networks through IoT devices. These devices are connected
to one another over the internet, which exposes vulnerabilities that an attacker can exploit to
compromise the data. Unsecured data leads to many concerns and can bring huge losses to
industries and individuals alike, including the loss of data from their systems
(Khraisat et al., 2019b).
IoT has attracted the attention of people and organizations from many sectors by providing
substantial benefits. Along with its tremendous growth, security issues have arisen: IoT
attacks have taken place, and they deter people from using many of its upcoming applications.
Hence, this section discusses the concept of IoT security, the challenges of IoT security and
their impacts, followed by IoT attacks and their types. IoT devices can be accessed from any
place within a trusted network, so there is a considerable chance of malicious attacks in the
IoT network. Security, privacy, and confidentiality issues must therefore be appropriately
addressed in the IoT to protect it from malicious attacks. For example, attacks on traffic lights
and driverless vehicles not only cause chaos and increase pollution, but can also cause harm
and severe collisions leading to injuries.
Different devices and equipment in the home and office can be virtually connected with the
help of the internet so that users can perform their activities by monitoring the devices remotely.
Figure 5.8 shows the IoT system architecture with the layers where attacks can occur. An IoT
system comprises three fundamental layers: the perception layer, the network layer, and the
application layer (Liao et al., 2013a). The perception layer is the lowest layer of the
conventional IoT architecture. It consists of devices, sensors, and controllers, and its
fundamental task is to gather valuable information from IoT sensor systems.
Fig: 5.8 IoT architecture and layer attacks
In the network layer, IoT involves a variety of diverse networks such as WSNs, wireless mesh
networks, WLAN, etc. These networks help sensors in IoT exchange information. A gateway
can simplify the communication of several sensors over the network. Thus, a gateway could be
beneficial to handle many complex aspects involved in communication on the network. The
network layer ensures the successful transmission of data while the application layer is the
highest layer that processes the data for visualization.
In the application layer, data can be obtained from sources such as Internet Service Provider
(ISP) and mobile network provider web-based services, virtual online identities, edge networks,
device logs, and Radio-Frequency Identification (RFID) tags and readers.
Most attackers target IoT devices and equipment rather than a single PC. An IoT network
interconnects various devices and equipment, including embedded devices.
The major reasons why the IoT is a malware target are summarized below:
All the devices and equipment in an IoT need to be always on, and it is easy for attackers
to access equipment that is powered on at any point in time.
Devices and equipment interconnected in an IoT are always connected, and attackers
may access the interconnected devices from a single device.
In most cases, applying proper security measures and knowledge to defend against attacks
is more difficult across a whole set of interconnected devices than on a single PC.
Lack of proper encryption features in the interconnected devices, together with weak
passwords, is another reason why the IoT is a malware target.
The level of sophistication required to exploit the IoT is much lower than that required
for a single device.
Twenty-four-hour internet exposure of IoT devices and equipment is another reason why
the IoT is a malware target. Due to the always-on internet connection, the devices
accept incoming traffic and become vulnerable to attacks.
The attributes and features of malware differ between a single device and a set of
interconnected devices and equipment.
Table 9 shows the different security attributes, with respect to malware, of a single device
(a PC) and a set of devices (the IoT). Cyber-attacks on IoT applications can be either internal
or external. In an inside attack, the attacker is a compromised node of the network, whereas in
an outside attack the attacker is not part of the network. Figure 5.9 shows the significant
types of cyber-attacks that target IoT applications. The types of attacks, how they impact the
IoT network, and their implications are described below.
Fig: 5.9 Taxonomy of Security attacks within IoT
Physical/perception layer
Attacks at this layer are based on hidden aspects of devices and equipment. These attacks can
take control of the device by tampering with its hardware. IoT physical attacks are launched
when an attacker is close to the network or IoT device. Some of the significant threats at the
physical/perception layer include:
CHAPTER-6
CONCLUSION
[1] J. P. Anderson, "Computer Security Threat Monitoring and Surveillance," Fort Washington,
PA: James P. Anderson Co., 1980.
[2] W. Lee et al., "Real time data mining-based intrusion detection," In Proceedings of the
DARPA Information Survivability Conference and Exposition II (DISCEX'01), Anaheim,
CA, USA, pp. 89-100, 2001.
[3] V. Jyothsna, V. V. Rama Prasad, and K. Munivara Prasad, "A review of
anomaly based intrusion detection systems," International Journal of Computer Applications,
vol. 28, no. 7, pp. 26-35, 2011.
[4] G. Ciaburro and B. Venkateswaran, "Neural networks with R: smart models using CNN,
RNN, deep learning, and artificial intelligence principles," Birmingham, UK: Packt
Publishing, 2017.
[5] J. Kiefer and J. Wolfowitz, "Stochastic Estimation of the Maximum of a Regression
Function," The Annals of Mathematical Statistics, vol. 23, no. 3, pp. 462-466, 1952.
[6] L. Bottou, F. E. Curtis, and J. Nocedal, "Optimization methods for large-scale
machine learning," SIAM Review, vol. 60, no. 2, pp. 223-311, 2018.
[7] M. Roopak, G. Y. Tian, and J. Chambers, "Deep learning models for cyber security in
IoT networks," In Proceedings of the IEEE 9th Annual Computing and Communication
Workshop and Conference, Las Vegas, NV, USA, pp. 0452-0457, 2019.
[8] S. Yeom and K. Kim, "Detail analysis on machine learning based
malicious network traffic classification," In Proceedings of the International Conference
on Smart Media & Applications, pp. 49-53, 2019.
[9] J. Kim, Y. Shin, and E. Choi, "An Intrusion Detection Model based on a
Convolutional Neural Network," Journal of Multimedia Information System, vol. 6, no. 4,
pp. 165-172, 2019.
[10] A. Khraisat et al., "Survey of intrusion detection systems: techniques, datasets
and challenges," Cybersecurity, vol. 2, no. 1, p. 20, 2019.
[11] Y. Gu, K. Li, Z. Guo, and Y. Wang, "Semi-supervised K-means DDoS detection method
using hybrid feature selection algorithm," IEEE Access, vol. 7, pp. 64351-64365, 2019.