Intrusion Detection Using PCA with Random Forest
Mr Upendra Singh
Assistant Professor
Department of Information Technology
Shri Govindram Seksaria Institute of Technology and Science, Indore, India
Upendrasingh49@gmail.com
Abstract: With the evolution of wireless communication, security threats over the internet have multiplied. An intrusion detection system (IDS) helps to find attacks on a system and to detect the intruders. Various machine learning (ML) techniques have previously been applied to IDS to improve the detection of intruders and to increase the accuracy of the IDS. This paper proposes an approach to develop an efficient IDS by using principal component analysis (PCA) and the random forest classification algorithm: PCA organises the dataset by reducing its dimensionality, and the random forest performs the classification. The results obtained show that the proposed approach is more accurate than other techniques such as SVM, Naïve Bayes, and Decision Tree. The proposed method achieves a performance time of 3.24 minutes, an accuracy rate of 96.78 %, and an error rate of 0.21 %.

Keywords: IDS, Knowledge Discovery Dataset, PCA, Random Forest.

I. INTRODUCTION

Nowadays, the involvement of the internet in everyday life has increased rapidly, and the internet has taken a crucial place in everyone’s life. With the increasing use of the internet for personal activities, it is also necessary to keep systems secure from malicious activities.

Different attacks are seen on the system or the network, such as black hole, grey hole, and wormhole attacks. These attacks aim to steal information from the system or to corrupt the data present on it [1]. To misuse the data, intruders attack the system in various ways; some of the attacks are DoS, probe, snort, r2l etc. To protect systems from such attacks, the intrusion detection system was introduced. An IDS keeps track of attacks on the system and helps to protect the system from them [2].

Earlier works have used various techniques to detect such attacks. Here, an intrusion detection system is built that uses principal component analysis along with the random forest technique. Each method serves a specific purpose: PCA gives the granularity in the data, and the random forest helps the classification between the nodes for attacks [3].

1.1 Intrusion Detection System:

Intrusion is a term for entering a system without permission and spoiling the information inside it [4]. Intrusion can also harm the hardware of the system, so it has become a significant threat to protect systems from. Intrusion can be controlled, or at least tracked, with the help of an IDS. Various types of intrusion detection systems have been used earlier, but accuracy concerns are seen in every method used. Two terms, the detection rate and the false alarm rate, are analysed
Authorized licensed use limited to: Auckland University of Technology. Downloaded on August 08,2020 at 06:56:18 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the International Conference on Electronics and Sustainable Communication Systems (ICESC 2020)
IEEE Xplore Part Number: CFP20V66-ART; ISBN: 978-1-7281-4108-4
for the evaluation of the accuracy of the system [5]. The false alarm rate should be minimised and the detection rate should be improved. Therefore, the random forest along with PCA is applied for the IDS.

An IDS can be of two types, depending on what it monitors:

Network Intrusion Detection Systems (NIDS): In this system, the network traffic is analysed for intrusions.
Host-based Intrusion Detection Systems (HIDS): Here, the system keeps track of the system files that are accessed over the network.

IDS types are further subdivided. The most common variants are based on signature detection and anomaly detection.

Signature-based: The system looks for specific patterns that are used by malware. These detected patterns are called signatures. This works well for known attacks but fails on new attacks, for which no signature exists.
Anomaly-based: This is specially developed for the detection of unknown attacks. The system uses ML to construct the model.

... prediction from the classifier that was obtained in the very first step.

Pseudocode for the creation of a random forest is as follows:

1. Select k features from the total m features, where k << m.
2. Among the k features, compute the node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until the required number of nodes is reached.
5. Create the forest by repeating steps 1 to 4.
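The random forest pseudocode above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the Gini impurity criterion, the depth limit, the bootstrap sampling and the synthetic data are all hypothetical choices made for the example. The last lines compute the detection rate and the false alarm rate using their common definitions (TP / (TP + FN) and FP / (FP + TN), with label 1 meaning attack), which the text does not spell out.

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label vector (assumed split criterion)."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_ids):
    """Step 2: best (impurity, feature, threshold) among the sampled features."""
    best = None
    for f in feature_ids:
        for t in np.unique(X[:, f])[:-1]:   # dropping the max keeps both sides non-empty
            left = X[:, f] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(X, y, k, rng, depth=0, max_depth=5):
    """Steps 1 to 4: sample k features, split, recurse until a stop condition."""
    majority = int(np.bincount(y).argmax())
    if depth == max_depth or len(np.unique(y)) == 1:
        return majority                                          # leaf node
    feature_ids = rng.choice(X.shape[1], size=k, replace=False)  # step 1: k << m
    split = best_split(X, y, feature_ids)                        # step 2: node d
    if split is None:                                            # no usable split
        return majority
    _, f, t = split
    left = X[:, f] <= t                                          # step 3: daughter nodes
    return (f, t,
            grow_tree(X[left], y[left], k, rng, depth + 1, max_depth),
            grow_tree(X[~left], y[~left], k, rng, depth + 1, max_depth))

def build_forest(X, y, n_trees=25, k=2, seed=0):
    """Step 5: repeat steps 1 to 4 to create n_trees trees."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (assumption)
        forest.append(grow_tree(X[idx], y[idx], k, rng))
    return forest

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are ints."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def predict_forest(forest, X):
    """Majority vote over the predictions of all trees."""
    votes = np.array([[predict_tree(tree, x) for tree in forest] for x in X])
    return np.array([np.bincount(row).argmax() for row in votes])

# Synthetic stand-in for traffic records: label 1 ("attack") when feature 0 is positive.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(int)
forest = build_forest(X[:200], y[:200])
pred = predict_forest(forest, X[200:])
y_test = y[200:]
detection_rate = np.sum((pred == 1) & (y_test == 1)) / np.sum(y_test == 1)
false_alarm_rate = np.sum((pred == 1) & (y_test == 0)) / np.sum(y_test == 0)
print(f"detection rate: {detection_rate:.2f}, false alarm rate: {false_alarm_rate:.2f}")
```

Sampling only k of the m features at every node is what decorrelates the trees; the majority vote then averages out the errors of individual trees.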
1.3 PCA:
4. Calculate the eigenvectors (e1, e2, e3, …, ed) and eigenvalues (v1, v2, v3, …, vd).
5. Sort the eigenvalues in decreasing order and select the n eigenvectors with the highest eigenvalues to get a d × n matrix M.
6. Use M to form a new sample space.
7. The obtained sample spaces are the principal components.

II. LITERATURE REVIEW

The authors presented a mechanism to design an IDS for the IoT that is based on classifying the traffic with a deep learning model. They performed binary and multi-class classification, and the reported accuracy of the presented system is high.[7]

The authors applied the SVM and Naïve Bayes algorithms to IDS and showed that SVM works better than the Naïve Bayes method. They carried out the experiment on the KDD dataset and reported the results in terms of detection rate and false alarm rate.[8]

In this paper, the authors performed three different experiments and applied feature selection in the analysis. They evaluated naïve Bayes, adaptive boost and partial decision tree for intrusion detection.[9]

In this paper, the authors found that artificial neural networks with a feature selection technique provide better results than the support vector machine technique. They used the NSL-KDD dataset for the experiment. The given approach worked well.[10]

Here the authors presented a review of intrusion detection systems that use machine learning algorithms. They compared various machine learning algorithms based on their performance, evaluated in terms of detection rates and false alarm rates.[11]

The authors presented an approach for intrusion detection that uses logistic regression and belief propagation. The proposed method provides a better average detection time than earlier techniques.[12]

The authors used an in-depth learning approach for feature extraction from the dataset. They extracted features to make the dataset efficient to use and thereby provide better input to the intrusion detection system.[13]

Here the authors surveyed intrusion detection systems based on the machine learning approach. They analysed the machine learning algorithms used to date and concluded by highlighting the algorithms proposed by Md Nasimuzzaman Chowdhury and the ANN submitted by Alex Shenfield, Aladdin Ayesh and David Day.[14]

Here the authors studied various machine learning algorithms for the intrusion detection system. They compared SVM, extreme learning machine and random forest, and reported that the extreme learning machine performs considerably better than the other algorithms.[15]

The authors worked to improve the quality of the dataset provided to the intrusion detection system. They used a fuzzy rule-based feature selection technique to improve the dataset. They used the KDD dataset and reported a marked improvement in the results of the IDS.[16]

III. PROBLEM DOMAIN

Systems that work over the internet suffer from various malicious activities. The major problem in this field is intrusion into the system to violate the information. Intrusion is detected by an intrusion detection system, which needs to be accurate and efficient in detecting intruders. Various machine learning algorithms have been used for intrusion detection, among them SVM and Naïve Bayes. However, the results indicate that there is room for improvement in terms of accuracy, detection rate and false alarm rate. Other techniques can replace previously applied techniques such as SVM and Naïve Bayes. The study also indicates that the dataset can be preprocessed to improve the quality of the input to the proposed system.

IV. PROPOSED SOLUTION

The intrusion detection system protects a system that is affected by intruders by detecting them. The proposed system tries to eliminate the existing problems of the previous work. It consists of two methods: principal component analysis and the random forest.

Principal component analysis is used to reduce the dimension of the dataset; this improves the dataset quality, as the reduced dataset may contain the correct attributes. After this, the random forest algorithm is applied to detect the intruders, which provides both the detection rate and the false alarm rate in an improved manner as compared to SVM.
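The PCA stage of the proposed solution, following steps 4 to 7 of Section 1.3, can be sketched with NumPy as below. This is a hedged illustration rather than the paper's code: the function name `pca_reduce`, the synthetic 10-feature dataset and the choice of n = 3 are hypothetical, and the centring and covariance steps are the usual preliminaries, assumed here because the earlier steps of the procedure are not listed in the text.

```python
import numpy as np

def pca_reduce(X, n):
    """Project X (samples x d features) onto its n leading principal components."""
    # Assumed preliminary steps (not listed in the text): centre the data
    # and build the d x d covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # step 4; eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]         # step 5: decreasing eigenvalues
    M = eigvecs[:, order[:n]]                 # step 5: d x n matrix M
    return Xc @ M, eigvals[order]             # step 6: samples in the new space

rng = np.random.default_rng(0)
# Hypothetical stand-in for a 10-feature traffic dataset with correlated columns.
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
Z, eigvals = pca_reduce(X, n=3)
explained = eigvals[:3].sum() / eigvals.sum()
print(Z.shape, f"variance kept: {explained:.2%}")
```

The reduced matrix `Z` would then be passed to the random forest in place of the original features; the share of variance kept is one common way to choose n.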
4.1 Algorithm for the proposed solution:
1. Attribute compatibility
Let | Pr | be the modulus of the main decision set and | Se | that of the secondary set. Attribute compatibility is defined as:
CO( X → D) = | | | | / | |        (1)

4.2 Flowchart for the Proposed Algorithm:
[Bar chart: vertical axis 0 to 80, horizontal axis "parameters"; recovered legend entries: Decision Tree, PCA with Random Forest]
Figure 4. Result Comparison with other Classifiers

Here is the graphical representation of the obtained values. In all three aspects, the proposed method worked well; the values can be seen in the above graph.

REFERENCES

4. ... using Random Forests Classifier with SMOTE and Feature Reduction,” 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, 978-0-4799-2235-2/13 $26.00 © 2013 IEEE.
5. Le, T.-T.-H., Kang, H., & Kim, H. (2019). The Impact of PCA-Scale Improving GRU Performance for Intrusion Detection. 2019 International Conference on Platform Technology and Service (PlatCon). doi:10.1109/platcon.2019.8668960
6. Anish Halimaa A, Dr K. Sundarakantham, “Machine Learning Based Intrusion Detection System,” Proceedings of the Third International Conference on Trends in Electronics and Informatics (ICOEI 2019), 978-1-5386-9439-8/19/$31.00 ©2019 IEEE.
7. Mengmeng Ge, Xiping Fu, Naeem Syed, Zubair Baig, Gideon Teo, Antonio Robles-Kelly (2019). Deep Learning-Based Intrusion Detection for IoT Networks, 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 256-265, Japan.
8. R. Patgiri, U. Varshney, T. Akutota, and R. Kunde, “An Investigation on Intrusion Detection System Using Machine Learning,” 978-1-5386-9276-9/18/$31.00 ©2018 IEEE.
9. Rohit Kumar Singh Gautam, Er. Amit Doegar, “An Ensemble Approach for Intrusion Detection System Using Machine Learning Algorithms,” 2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence).
10. Kazi Abu Taher, Billal Mohammed Yasin Jisan, Md. Mahbubur Rahma, “Network Intrusion Detection using Supervised Machine Learning Technique with Feature Selection,” 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST).
11. L. Haripriya, M.A. Jabbar, 2018 Second International Conference on Electronics, Communication and Aerospace