Proceedings of the 5th International Conference on Smart Systems and Inventive Technology (ICSSIT 2023)

IEEE Xplore Part Number: CFP23P17-ART; ISBN: 978-1-6654-7467-2

Ensemble based Dimensionality Reduction for Intrusion Detection using Random Forest in Wireless Networks
2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT) | 978-1-6654-7467-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSSIT55814.2023.10060929

S. Suhana, PG Student, Department of Computer Science and Engineering, KPR Institute of Engineering and Technology, Coimbatore, India, [email protected]

S. Karthic, Assistant Professor, Department of Computer Science and Engineering, KPR Institute of Engineering and Technology, Coimbatore, India, [email protected]

Dr. N. Yuvaraj, Professor, Department of Computer Science and Engineering, KPR Institute of Engineering and Technology, Coimbatore, India, [email protected]

Abstract—In the present digital era, internet technologies are evolving rapidly and, as a result, a large number of devices are connected to the public network. Consequently, a huge volume of data is generated and transmitted over the network. Attackers, in turn, have devised many ways to gain access to information on the network. Because almost every device is interconnected through the internet, each is exposed to data breaches. Hence, Intrusion Detection Systems (IDS) are mandatory to keep information secure from attackers. As technology develops, attackers identify new techniques to breach secure data. Machine learning has been employed in different domains and has given better results in terms of performance and accuracy. In order to design an effective IDS, this study utilizes machine learning models. This article proposes an ensemble-based hybrid approach for intrusion detection in networks. In the initial stage, the important features are extracted using Blended Linear Discriminant Analysis (BLDA). The dimensionality-reduced dataset is then used to detect intrusions using a Random Forest classifier. Two benchmark datasets, NSL-KDD and UNSW_NB15, are used to evaluate the potential of the proposed method. To demonstrate the effectiveness of our approach, the proposed scheme is compared with classical LDA, PCA and PLS based feature selection schemes, and the presented method outperforms them. The accuracy of the proposed method is 90.12% and 91.0% for the NSL-KDD and UNSW_NB15 datasets, respectively. The results clearly show that the proposed method provides a considerable improvement in IDS performance.

Keywords— Intrusion detection system, Dimensionality reduction, Random Forest Classifier, Linear Discriminant Analysis

I. INTRODUCTION

The latest developments in the internet and IoT have increased the number of gadgets attached to the public network and have created security and privacy concerns. The security of the network is a major concern because malicious attacks are very frequent. Attackers formulate new approaches to breach the network and its security mechanisms [1]. An IDS is a mechanism designed to identify unethical activities and to safeguard data from malware and untrusted access in the network.

The fast computing capabilities of machine learning and deep learning models are used in developing security solutions for IoT environments [2]. Machine learning generally utilizes labelled data for performing classification tasks. The most commonly used machine learning method for classification is the Support Vector Machine (SVM). A large number of IDS methods involving SVM in various forms have been designed by researchers [3]. KNN is a traditional weight-based classification strategy. Weighted KNN and optimization algorithms are also used for intrusion detection [4]. The Naïve Bayes classifier uses probability models to classify data based on known data using Bayes' rule. Feature selection is the process of retrieving the important features, thereby reducing the problem of overfitting. A variety of schemes are utilized in existing methods for this purpose. A hybrid feature selection approach involving set theory is employed to rank relevant features [5]. Nature-based approaches are widely used to perform the feature retrieval task [6]. These approaches generally pick the attributes that reach a threshold score of relevance and importance. A hybrid approach encompassing black hole and PSO algorithms is considered for selecting essential features for cancer detection [7].

Deep learning (DL) models are superior to classical machine learning models in that they are capable of learning and performing the classification task automatically. Still, these DL models face certain challenges, such as overfitting and the requirement for well-balanced data. A hybrid approach involving an OHDNN classifier and ECRF-based feature selection produced a better intrusion detection scheme [8]. CNN is a deep learning model well suited to dealing with images. Due to its robustness, it is employed in IDS. Using a CNN requires the features to be scaled into an image. These images are then used by the CNN to detect attacks [9]. Selecting the necessary features from the dataset and applying a CNN to the reduced dataset has shown a prominent improvement over the performance of the traditional CNN model [10]. RNN models are widely used for sequential data classification problems. LSTM is a modified version of the RNN that is capable of using previous data for processing by means of three gates. Since the size of the dataset is very large, LSTM is applied in developing IDS. These models are capable of performing both binary and multiclass classification in detecting intrusions [11]. The LSTM performance is further enhanced by selecting appropriate features using a grey wolf scheme [12]. Autoencoders are capable of reducing the data and are widely used for feature selection. The reduced datasets are then used by DL models to detect intrusions. Processing the reduced dataset provides the advantages of faster processing and accurate detection [13]. Hybrid deep learning approaches have shown significant improvement in the performance of deep learning models [14,15]. The challenges in designing an efficient IDS are unbalanced data, the requirement to select essential attributes


from the dataset, overfitting, estimation overhead and performance considerations. These challenges are addressed by designing a hybrid BLDA approach combined with a random forest classifier.

With the objective of designing an efficient IDS that can produce better detection rates, the main contributions of the article are as follows:
• Selection of the most relevant features is carried out using the BLDA approach.
• The reduced features are then fed to a random forest classifier to determine the presence of an attack.
• The performance of the proposed IDS scheme is validated against existing methods and shows promising results.

II. PROPOSED METHODOLOGY

The IDS scheme in this article uses the BLDA approach to retrieve the necessary features of the dataset. The first task is preprocessing of the data, which involves handling missing and null values. A one-hot encoder is used to transform the categorical attributes into numerical ones. Min-max normalization is then performed to scale the data into a uniform range. 80% of the dataset is used for training the model, and its performance is evaluated using the remaining 20% as test data. The reduced features are then classified using the RF classifier. Fig. 1 gives the overall structure of the proposed IDS scheme.

A. Linear Discriminant Analysis (LDA)

LDA is a common strategy for feature selection. It transforms the data from a higher-dimensional space to a lower-dimensional space that linearly separates the classes and reduces overfitting. LDA consists of the following steps.

1. Compute the between-class variance, i.e. the distance among the means of the different classes, using

$$S_b = \sum_{i=1}^{n} N_i (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T \qquad (1)$$

2. Estimate the within-class variance, i.e. the distance between the samples of a class and the mean of that same class:

$$S_w = \sum_{i=1}^{n} \sum_{j=1}^{N_i} N_i (X_{i,j} - \bar{X}_i)(X_{i,j} - \bar{X}_i)^T \qquad (2)$$

3. Compute the eigenvalues λ1, λ2, …, λn and their corresponding eigenvectors v1, v2, …, vn, where n is 41 for the NSL-KDD and 49 for the UNSW-NB15 dataset. Calculate the transformation matrix from

$$S_b v_i = \lambda_i S_w v_i \qquad (3)$$

4. Pick k eigenvectors from the list sorted in descending order and construct the lower-dimensional space, which has smaller within-class variance and higher between-class variance:

$$Y = X V_k \qquad (4)$$

Thus, LDA reduces the M-dimensional feature set to k features and eliminates the M−k features that are irrelevant. A feature is considered good if it is relevant to the class concept and, at the same time, not redundant. In our proposed method, in addition to the features generated using LDA, highly correlated features are included. The correlation between each independent feature and the dependent feature is computed, and the top ‘n’ highly correlated features are selected and included in the dataset used for training the classifier. To avoid including redundant features, the value of ‘n’ is kept small, within the range 1 to 5.
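As an illustration only, the blending of LDA features with the top ‘n’ correlated features described above can be sketched with NumPy. The sketch assumes a numeric feature matrix X and integer class labels y. The function names (lda_projection, top_correlated, blda_transform) and the default values of k and n are hypothetical and are not taken from the paper. The within-class scatter uses the common textbook form without a per-class weight inside the inner sum, and the eigenvectors are taken from the pseudo-inverse of S_w multiplied by S_b, which is one usual way of solving Eq. (3).

```python
import numpy as np

def lda_projection(X, y, k):
    """Return the d x k matrix V_k of leading LDA eigenvectors (Eqs. 1-3)."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(X_c) * (diff @ diff.T)      # between-class scatter
        centered = X_c - mean_c
        S_w += centered.T @ centered           # within-class scatter (textbook form)
    # Generalized eigenproblem S_b v = lambda S_w v, solved via pinv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:k]  # k largest eigenvalues first
    return eigvecs.real[:, order]

def top_correlated(X, y, n):
    """Indices of the n features with the largest |Pearson correlation| with y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    corr = np.nan_to_num(corr)                  # constant features get correlation 0
    return np.argsort(corr)[::-1][:n]

def blda_transform(X, y, k=1, n=2):
    """Blend the LDA projection Y = X V_k (Eq. 4) with the top-n correlated raw features."""
    V_k = lda_projection(X, y, k)
    keep = top_correlated(X, y, n)
    return np.hstack([X @ V_k, X[:, keep]]), V_k, keep
```

For a two-class (attack / normal) problem, S_b has rank one, so only a single discriminant direction carries information and k = 1 is the natural choice in this sketch. The returned V_k and keep would be computed on the training split and reused unchanged on the test split.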
Fig. 1. Overview of Proposed IDS (flowchart boxes, top to bottom: Input Data; Data Preprocessing; Training Set (80%) and Test Set (20%); Feature Selection; RF classifier model; Evaluate dataset; Attack / Normal)

B. Random Forest Classifier

Random Forest is an ensemble-based model for classification tasks proposed by Breiman in 2001 [16]. It performs classification by creating multiple decision trees. These trees are built using information gain, the Gini index or the gain ratio. A single decision tree achieves only limited accuracy; when an ensemble of decision trees is used, the class with the maximum score is taken as the final result, thereby improving accuracy compared with a single decision tree predictor. The steps followed in the random forest classifier are given below:
1. Transform the input dataset from the high-dimensional space to a low-dimensional space using Blended Linear Discriminant Analysis (BLDA).
2. Select K random points from the dimension-reduced dataset.
3. For each of these K points, construct a decision tree predictor. Use these decision trees to predict the result as attack or normal.
4. Make the final decision by considering the maximum score of the individual decision tree predictors.
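Steps 2 to 4 above can be illustrated with a deliberately simplified, standard-library-only sketch. It is not the authors' implementation and not a full random forest: each "tree" is a depth-1 decision stump chosen by the Gini index, each stump is trained on a bootstrap sample of the (already dimension-reduced) rows, and the final label is the majority vote. The per-split random feature subsampling of Breiman's algorithm is omitted, and all function names and default values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Fit a depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    best_score, best_stump = None, None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best_score = score
                best_stump = (f, t, majority(left), majority(right))
    if best_stump is None:  # no valid split: always predict the majority class
        m = majority(labels)
        best_stump = (0, float("inf"), m, m)
    return best_stump

def predict_stump(stump, row):
    """Predict with one stump: left label if row[f] <= t, else right label."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Final decision: the class that receives the most votes across all trees."""
    return majority([predict_stump(s, row) for s in forest])
```

In practice a library implementation of random forests would replace these stumps with fully grown trees; the sketch only shows how bootstrap sampling and voting combine individual predictors into one attack / normal decision.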


In our proposed BLDA approach, the high-dimensional dataset is first transformed into a low-dimensional dataset using classic Linear Discriminant Analysis (LDA). Then the ‘n’ most highly correlated features are added to the transformed dataset. The resulting optimized attributes are then used to train the classifier model. The overall framework of our proposed method is depicted in Fig. 2.

Fig. 2. Architecture of proposed Approach (flowchart boxes, top to bottom: Input Data; Data Preprocessing; Feature Selection (BLDA Algorithm); Random Forest Classifier; Normal Class / Attack Class)

III. RESULTS AND DISCUSSION

The proposed BLDA approach is evaluated using the UNSW-NB15 and NSL-KDD datasets. NSL-KDD is an improved version of the KDDcup’99 dataset from which the redundant and missing attributes have been removed; it consists of 41 attributes and 5 classes of attacks. The UNSW-NB15 dataset comprises 42 features and includes nine categories of attacks. These are standard datasets for evaluating the performance of IDS and consist of multiple attributes and various classes of attacks. First, the dataset is preprocessed to handle missing and null values. Next, one-hot encoding is employed to transform the features into numerical values. Finally, the random forest classifier is applied.

A. Evaluation Parameters

The performance of the proposed BLDA approach is analyzed using two benchmark datasets and compared using three evaluation parameters: accuracy, specificity and sensitivity.

True Positive (TP): Count of instances correctly categorized as normal.
False Positive (FP): Count of normal records wrongly categorized as attack type.
True Negative (TN): Count of records correctly categorized as attack type.
False Negative (FN): Count of attack type records that are categorized as normal.

Accuracy measures the ratio of correctly classified instances to the total number of instances in the test set and is estimated using

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5)$$

Precision is the ratio of correctly classified instances to the total classified instances in the test set and is estimated using

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (6)$$

Sensitivity is the ratio of instances correctly classified as attack to the actual attack instances in the test set:

$$\text{Sensitivity} = \frac{TN}{TN + FN} \qquad (7)$$

The performance of the BLDA-RF method is compared with other machine learning pipelines, namely PCA+RF, LDA+RF, Corr+RF and PLS+RF. The results obtained for the NSL-KDD dataset are presented in Table 1 and shown graphically in Fig. 3.

Table 1: Performance of BLDA-RF method over NSL-KDD dataset

Method       Accuracy (%)   Specificity (%)   Sensitivity (%)
BLDA + RF    90.12          89.21             91.44
PCA + RF     84.71          85.51             83.8
LDA + RF     86.09          87.08             84.94
Corr + RF    83.41          85.71             80.08
PLS + RF     79.35          81.43             77.21

The proposed method with the reduced feature set produced a classification accuracy of 90.12% and a specificity of 89.21%, which are higher than those of all other methods for binary classification. The sensitivity is 91.44%, and our approach outperforms the other methods in terms of accuracy and specificity.

Fig. 3. Performance comparison of different models over NSL-KDD dataset (bar chart; series: Proposed, PCA + RF, LDA + RF, Corr + RF, PLS + RF; x-axis: Accuracy, Specificity, Sensitivity; y-axis: Metrics in %)

Our method also gives a significant improvement in performance for the UNSW-NB15 dataset. The comparison with other approaches is carried out using various metrics; the results are given in Table 2 and shown graphically in Fig. 4. Our approach gives a better classification accuracy of 91.08% and a sensitivity of 88.75%. PCA+RF gives 85.66% accuracy, 88.07% specificity and 82.73% sensitivity. LDA+RF gives 88.32% accuracy, 91.25% specificity and 86.34% sensitivity, and Corr+RF gives 81.65% accuracy, 87.28% specificity and 78.66% sensitivity. For the UNSW-NB15 dataset, PLS+RF gives an accuracy of 80.21%, a specificity of 82.78% and a sensitivity of 77.85%. Our proposed method gives better performance than the other methods.
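The evaluation parameters of Section III-A, Eqs. (5) to (7), can be computed from actual and predicted labels as in the following minimal sketch. It follows the paper's own counting convention, in which normal traffic is the positive class (TP and FP are counted over normal records, TN and FN over attack records); under the more common convention in which attack is the positive class, the same quantities would carry different names. The function names and the label strings "normal" and "attack" are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, normal="normal"):
    """Count TP, FP, TN, FN using the definitions given in Section III-A."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == normal and predicted == normal:
            tp += 1          # normal record categorized as normal
        elif actual == normal:
            fp += 1          # normal record wrongly categorized as attack
        elif predicted != normal:
            tn += 1          # attack record categorized as attack
        else:
            fn += 1          # attack record categorized as normal
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Eq. (5): correctly classified instances over all instances."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """Eq. (6) as written in the paper: TP / (TP + FP)."""
    return tp / (tp + fp)

def sensitivity(tn, fn):
    """Eq. (7) as written in the paper: TN / (TN + FN)."""
    return tn / (tn + fn)

# Small worked example: 4 normal and 6 attack records.
y_true = ["normal"] * 4 + ["attack"] * 6
y_pred = ["normal", "normal", "normal", "attack",
          "attack", "attack", "attack", "attack", "normal", "normal"]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

In this example the counts are TP = 3, FP = 1, TN = 4 and FN = 2, so Eq. (5) gives 7/10, Eq. (6) gives 3/4 and Eq. (7) gives 4/6.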
Table 2: Overall performance of different models over UNSW-NB15 dataset

Method       Accuracy (%)   Specificity (%)   Sensitivity (%)
BLDA + RF    91.08          92.81             88.75
PCA + RF     85.66          88.07             82.73
LDA + RF     88.32          91.25             86.34
Corr + RF    81.65          87.28             78.66
PLS + RF     80.21          82.78             77.85

Fig. 4. Performance comparison of different models over UNSW NB15 dataset (bar chart; series: Proposed, PCA + RF, LDA + RF, Corr + RF, PLS + RF; x-axis: Accuracy, Specificity, Sensitivity; y-axis: Metrics in %)

The performance of the proposed approach on the two datasets, assessed using the evaluation metrics, is presented graphically in Fig. 5. The proposed BLDA method produced an accuracy of 90.12% for NSL-KDD and 91.08% for UNSW-NB15. BLDA produced a specificity of 89.21% for NSL-KDD and 92.81% for UNSW-NB15. Among the other approaches considered, PCA+RF has 85.66%, 88.07% and 82.73% accuracy, specificity and sensitivity, respectively. The sensitivity of Corr+RF and PLS+RF is 78.66% and 77.85%, respectively. From these values, it is evident that the proposed method is capable of performing intrusion detection effectively.

Our proposed method also produced a sensitivity of 91.44% for NSL-KDD and 88.75% for UNSW-NB15. These results highlight that our proposed dimensionality-reduction-based intrusion classification system outperforms other equivalent machine learning approaches in terms of accuracy, specificity and sensitivity.

Fig. 5. Overall performance of proposed IDS (bar chart; series: NSL-KDD, UNSW-NB15; x-axis: Accuracy, Specificity, Sensitivity; y-axis: Performance in %)

The proposed model is also used to perform multiclass classification to determine the attack class. The NSL-KDD dataset consists of four categories of attack.

Fig. 6. Accuracy of detecting attack classes using NSL-KDD dataset (bar chart; series: BLDA + RF, PCA + RF, LDA + RF, Corr + RF, PLS + RF; x-axis: Attack Classes (Probe, DOS, U2R, R2L); y-axis: Accuracy in %)

The accuracy of detecting different categories of attack is presented in Fig. 6. It is evident that the proposed approach has the highest accuracy in detecting the attack classes compared with the other approaches. The detection rate is 92.08% for Probe, 89.9% for DOS, and 91.0% and 87.5% for U2R and R2L attacks, respectively. Among the other methods considered, LDA+RF has a detection rate of 86.4% for DOS attacks, and PCA+RF has an accuracy of 87.9% for U2R and 84.7% for R2L attack classes. The selection of the most relevant


features by the BLDA approach yields significantly better performance than the other feature selection strategies.

IV. CONCLUSION

In this article, a blended LDA based dimensionality reduction method is introduced for intrusion detection systems. The BLDA method effectively eliminates the features that are irrelevant for building the classification model and also takes the necessary measures to handle the redundant features present in the input dataset. The essential features of the dataset are extracted and then used to train the Random Forest classifier. The proposed model is then used to determine the presence of an attack in the test set. The effectiveness of our approach is evaluated using two benchmark datasets, NSL-KDD and UNSW_NB15, and its capability is compared with other ML methods using the evaluation metrics. The results clearly indicate that our proposed BLDA-based intrusion detection system gives better performance in classifying network attacks.

Implementation of BLDA for detecting multiclass attacks using deep learning approaches can be carried out as future work. The model can also be trained on real-time datasets for detecting intrusions.

REFERENCES

[1] Nassar, Mostafa, et al. "Network Intrusion Detection, Literature Review and Some Techniques Comparision." 2019 15th International Computer Engineering Conference (ICENCO). IEEE, 2019.
[2] Latif, S., Huma, Z., Jamal, S.S., Ahmed, F., Ahmad, J., Zahid, A., Dashtipour, K., Aftab, M.U., Ahmad, M., & Abbasi, Q.H. (2022). Intrusion Detection Framework for the Internet of Things Using a Dense Random Neural Network. IEEE Transactions on Industrial Informatics, 18, 6435-6444.
[3] Mohammadi, M., Rashid, T.A., Karim, S.H., Aldalwie, A.H., Tho, Q.T., Bidaki, M., Rahmani, A.M., & Hosseinzadeh, M. (2021). A comprehensive survey and taxonomy of the SVM-based intrusion detection systems. J. Netw. Comput. Appl., 178, 102983.
[4] Xu, H., Przystupa, K., Fang, C., Marciniak, A., Kochan, O., & Beshley, M. (2020). A Combination Strategy of Feature Selection Based on an Integrated Optimization Algorithm and Weighted K-Nearest Neighbor to Improve the Performance of Network Intrusion Detection. Electronics.
[5] Albulayhi, K., Abu Al-Haija, Q., Alsuhibany, S.A., Jillepalli, A.A., Ashrafuzzaman, M., & Sheldon, F.T. (2022). IoT Intrusion Detection Using Machine Learning with a Novel High Performing Feature Selection Method. Applied Sciences.
[6] Rostami, M., Berahmand, K., & Forouzandeh, S. (2021). Review of Swarm Intelligence-based Feature Selection Methods. Eng. Appl. Artif. Intell., 100, 104210.
[7] Pashaei, E., Pashaei, E., & Aydin, N. (2018). Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization. Genomics.
[8] Karthic, S., & Kumar, S.M. (2022). Hybrid Optimized Deep Neural Network with Enhanced Conditional Random Field Based Intrusion Detection on Wireless Sensor Network. Neural Processing Letters.
[9] Kim, J., Kim, J.W., Kim, H.J., Shim, M., & Choi, E. (2020). CNN-Based Network Intrusion Detection against Denial-of-Service Attacks. Electronics.
[10] Riyaz, B., Ganapathy, S. A deep learning approach for effective intrusion detection in wireless networks using CNN. Soft Comput 24, 17265–17278 (2020). https://doi.org/10.1007/s00500-020-05017-0
[11] Muhuri, P.S., Chatterjee, P., Yuan, X., Roy, K., & Esterline, A.C. (2020). Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classify Network Attacks. Inf., 11, 243.
[12] Karthic, S., Manoj Kumar, S., & Senthil Prakash, P.N. (2022). Grey wolf based feature reduction for intrusion detection in WSN using LSTM. International Journal of Information Technology.
[13] Rao, K.N., Rao, K.V., & Reddy, P.V. (2021). A hybrid Intrusion Detection System based on Sparse autoencoder and Deep Neural Network. Comput. Commun., 180, 77-88.
[14] Kim, B., Yuvaraj, N., Sri Preethaa, K.R., & Arun Pandian, R. (2021). Surface crack detection using deep learning with shallow CNN architecture for enhanced computation. Neural Computing and Applications, 33, 9289-9305.
[15] Kim, B., Yuvaraj, N., SriPreethaa, K.R., Santhosh, R., & Sabari, A. (2020). Enhanced pedestrian detection using optimized deep convolution neural network for smart building surveillance. Soft Computing, 1-12.
[16] Breiman, L. "Random Forests." Machine Learning 45 (2001): 5-32.
