Anomaly Detection in 5G Using Variational Autoencoders

This paper explores the use of Variational Autoencoders (VAE) for anomaly detection in 5G Non-IP Data Delivery (NIDD) datasets, addressing the need for effective security measures in rapidly expanding 5G networks. The study demonstrates the VAE's ability to detect unknown threats by analyzing reconstruction errors from network traffic data, showcasing its effectiveness through empirical evaluation against various denial-of-service attack scenarios. The findings contribute to enhancing the security and reliability of data transmission in complex 5G environments.

Anomaly Detection in 5G using Variational Autoencoders

Amanul Islam∗, Sang-Yoon Chang†, Jinoh Kim‡, and Jonghyun Kim§


∗ Department of Computer Science and Information Technology, University of Malaya, Kuala Lumpur 50603, Malaysia
† Department of Computer Science, University of Colorado Colorado Springs, CO 80918, USA
‡ Computer Science Department, Texas A&M University-Commerce (TAMUC), Commerce, TX 75428, USA
§ Cybersecurity Division, ETRI, Daejeon 34129, Korea

2024 Silicon Valley Cybersecurity Conference (SVCC) | 979-8-3503-8314-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/SVCC61185.2024.10637312
Authorized licensed use limited to: OSMANIA UNIVERSITY. Downloaded on October 28,2024 at 07:21:00 UTC from IEEE Xplore. Restrictions apply.

Abstract— With the rapid expansion of 5G networks, ensuring the security and integrity of data transmission becomes paramount. In this context, anomaly detection plays a critical role in identifying suspicious activities or deviations from normal behavior. This paper presents an approach to anomaly detection in 5G Non-IP Data Delivery (NIDD) datasets using Variational Autoencoders (VAE). We use VAE to build an unsupervised machine learning model that enables the detection of unknown and previously unseen threats. The VAE is trained on 5G-NIDD datasets, capturing the intricate patterns and characteristics of normal network behavior. Subsequently, anomalies are detected by measuring the reconstruction error between the input data and the reconstructed output from the VAE. Experimental results demonstrate the efficacy of the VAE, showcasing its ability to accurately detect anomalies in 5G-NIDD datasets. This research contributes to the advancement of anomaly detection techniques in 5G networks, enhancing the security and reliability of data transmission in complex network environments.

Keywords— Cybersecurity, Machine Learning, Unsupervised Learning, Cyber-attack detection, Anomaly detection, Mobile network, 5G

I. INTRODUCTION

The advent of 5G technology promises revolutionary advancements in wireless communication, enabling unprecedented speed, connectivity, and scalability. Alongside its transformative potential, however, 5G networks introduce new challenges, particularly in ensuring the security and integrity of data transmission, especially in Non-IP Data Delivery (NIDD) scenarios where machine-to-machine (M2M) communications are prevalent. The need for robust anomaly detection mechanisms becomes increasingly critical in this context [1]. Traditional methods of anomaly detection often struggle to cope with the dynamic and heterogeneous nature of 5G networks, leading to a growing interest in exploring advanced techniques capable of effectively identifying anomalies [2]. One promising approach is the utilization of Variational Autoencoders (VAE) for anomaly detection. VAE, as a class of generative models, learn to encode and reconstruct high-dimensional data efficiently, offering a powerful framework for capturing the underlying structure of the normal data distribution and detecting deviations indicative of anomalous behavior [3]. While several studies have demonstrated the efficacy of VAE in various anomaly detection tasks, including cybersecurity, medical diagnosis, and industrial quality control, their application in the context of 5G networks, particularly using 5G-NIDD datasets, remains relatively unexplored [4].

This paper bridges this gap by investigating the effectiveness of VAE for anomaly detection in 5G telecommunications networking. We use VAE to build an unsupervised machine learning model for anomaly detection, where anomalies include unknown threats; supervised learning is not applicable to unknown and previously unseen threats, for which we have no data to train the machine learning model. We design and build an anomaly detector based on VAE and test it with the 5G-NIDD dataset [5]. We build an anomaly detector because it can detect various anomalies, including those from malicious sources/attackers and unknown zero-day threats. We use the 5G-NIDD dataset as it provides many popular denial-of-service (DoS) threats against 5G networking, and we test our anomaly detection against the eight threats simulated in 5G-NIDD. Through empirical evaluation and comparative analysis, we seek to validate the viability of our approach and contribute to the advancement of anomaly detection techniques tailored to the unique characteristics of 5G networks.

II. LITERATURE REVIEW

Anomaly detection in network environments has been a subject of extensive research due to its crucial role in identifying malicious activities, system failures, and abnormal behavior. Fernandes et al. (2019) [6] provide a comprehensive review of anomaly detection techniques in network environments, highlighting the diverse range of methodologies and the challenges associated with their application. Traditional methods, such as statistical thresholding and rule-based approaches, often struggle to adapt to the dynamic and heterogeneous nature of modern networks. Kingma and Welling (2014) [7] introduced the concept of Variational Autoencoders (VAE), which are probabilistic generative models capable of learning latent representations of data. VAE have been successfully applied in various domains, including image generation, natural language processing, and anomaly detection. Hasan et al. (2016) [8] demonstrated the effectiveness of VAE in learning temporal regularities in video sequences, showcasing their potential for capturing complex patterns in sequential data.

Despite the advancements in anomaly detection techniques, applying these methods to 5G networks, particularly in the context of Non-IP Data Delivery (NIDD), presents distinct challenges. The dynamic nature of 5G networks, coupled with the sheer volume and diversity of data generated, necessitates tailored approaches for anomaly detection. For
instance, Lam and Abbas (2020) [9] propose a deep learning-based approach for anomaly detection in 5G networks, leveraging CNNs to extract features from network traffic data and detect anomalies. Similarly, Haque et al. (2023) [10] investigate the use of VAE for anomaly detection in wireless sensor networks, demonstrating the effectiveness of latent space representations in capturing anomalous patterns. Kim et al. (2023) [11] use VAE and introduce new features (to check physical mobility constraints and inconsistency) to detect location spoofing and anomalies in mobile and vehicular networking, a future application that builds on the underlying 5G technology.

Furthermore, studies such as the work by Vanerio and Casas (2017) [12] explore ensemble learning methods, combining multiple anomaly detection algorithms to enhance detection performance in telecommunication networks. Similarly, Maimó et al. (2018) [13] investigate the application of machine learning techniques, including Random Forest and Support Vector Machines (SVMs), for anomaly detection in 5G networks.

III. METHODOLOGY

The methodology employed in this study encompasses several key steps aimed at developing and evaluating an anomaly detection system using Variational Autoencoders (VAE) on 5G Non-IP Data Delivery (NIDD) datasets. By employing unsupervised learning techniques, such as VAE, the model can autonomously learn and detect anomalies, thereby enhancing the network's security posture against both known and unknown threats. The methodology is structured to ensure robustness, reproducibility, and effectiveness in addressing the research objectives.

A. 5G-NIDD Dataset Description

The 5G-NIDD dataset was created using a real 5G test network for network intrusion detection and was published by [5]. It has 52 features in total: 32 of float type, 12 of int type, and 8 of categorical type. The dataset has 1,215,890 records, of which 477,737 are benign and 738,153 are malicious; benign records make up 39.3% and malicious records 60.7%. The dataset contains eight different types of attacks; their names and percentages in the attack traffic are: UDPFlood (61.9%), HTTPFlood (19.0%), SlowrateDoS (9.9%), TCPConnectScan (2.7%), SYNScan (2.7%), UDPScan (2.1%), SYNFlood (1.3%), and ICMPFlood (0.15%). The 5G-NIDD dataset is a comprehensive collection of network traffic data generated over a 5G testbed, as shown in Table 1. It aims to facilitate the development and evaluation of machine learning models for anomaly detection and intrusion prevention in 5G networks.

Table 1: Description of 5G-NIDD Dataset for Anomaly Detection

Feature | Description | Values/Types | Distribution
--- | --- | --- | ---
Type | Normal or Attack | Labeled | 80% Normal, 20% Attack
Data Sources | Base stations, attacker nodes, benign users | pcap files | Various protocols (HTTP, DNS, UDP, etc.)
Features | Extracted from network traffic captures | Numerical, categorical | Includes packet-level features (e.g., size, duration, flags) and flow-level features (e.g., source/destination IP, port numbers)
Number of Features | 238 | Varies depending on the feature | Some features might be present only in specific protocols
Data Volume | Approximately 20 GB | Compressed pcap files |

B. Preprocessing and Data Preparation

Data preprocessing plays a critical role in preparing the 5G Non-IP Data Delivery (NIDD) dataset for anomaly detection using Variational Autoencoders (VAE). Initially, the raw dataset undergoes meticulous preprocessing steps to ensure its suitability for training and evaluation. Table 2 is a summary table illustrating the preprocessing steps applied to the 5G-NIDD dataset for anomaly detection using Variational Autoencoders (VAE).

Table 2: Preprocessing and Data Preparation for Anomaly Detection

Preprocessing Step | Description
--- | ---
Data Cleaning | Handle missing values, outliers, and inconsistencies within the dataset.
Standardization/Normalization | Standardize or normalize features to bring them to a common scale, preventing dominance during training.
Categorical Encoding | Encode categorical variables using techniques like one-hot encoding for numerical representation.
Feature Engineering | Extract relevant information or create new features capturing underlying patterns in network traffic data.

Table 2 summarizes the key preprocessing steps undertaken to prepare the 5G-NIDD dataset for anomaly detection using VAE. Each step contributes to ensuring the dataset's suitability for training and evaluation, enhancing the

accuracy and robustness of the subsequent anomaly detection process.

C. Variational Autoencoder (VAE) Architecture

The Variational Autoencoder (VAE) is a type of neural network architecture specifically designed for unsupervised learning and latent variable modeling. It comprises two main components: the encoder and the decoder. Fig 1 is a schematic diagram illustrating the VAE architecture along with its components and the training objective.

Fig 1: Variational Autoencoder (VAE) Architecture [14]

Fig 1 illustrates the architecture of the Variational Autoencoder (VAE), which consists of an encoder network followed by a decoder network. The input data is first processed by the encoder, which transforms it into a compact latent space representation characterized by the mean (mu) and variance (sigma^2) parameters of a Gaussian distribution. The KL divergence loss ensures that the learned latent space approximates a standard normal distribution. The decoder network then reconstructs the original input data from the latent space representation. The reconstruction loss quantifies the dissimilarity between the input data and its reconstructed output. The training objective of the VAE is to maximize the Evidence Lower Bound (ELBO), or equivalently to minimize the sum of the reconstruction loss and the KL divergence loss. This optimization process balances the reconstruction of input data with the regularization of the latent space distribution, enabling robust representation learning and effective anomaly detection.

D. Anomaly Detection Process

The anomaly detection process using Variational Autoencoders (VAE) involves several steps, shown in Fig 2, to identify abnormal patterns or outliers in the input data. Each step is described below:

• Encode data: Compress data into a low-dimensional representation using the VAE's encoder.
• Decode data: Reconstruct the original data from the compressed representation using the decoder.
• Calculate error: Measure the difference between the original and reconstructed data.
• Thresholding or Anomaly Score Calculation: Following the computation of reconstruction errors for each data sample, a thresholding mechanism is applied to classify samples as either normal or anomalous.
• Identify anomalies: Flag data with high error or anomaly scores as potential threats.

This process helps identify abnormal patterns (anomalies) in 5G network traffic data, enabling proactive security measures.

Fig 2: Process of Anomaly Detection

IV. EXPERIMENTAL RESULTS

The experimental tools used in this research included programming languages, libraries, and frameworks that facilitated the implementation and evaluation of anomaly detection in 5G-NIDD datasets [5] using Variational Autoencoders (VAE).

The anomaly detection in 5G-NIDD datasets utilized Python as the primary programming language. Various machine learning libraries such as Scikit-learn, Keras, TensorFlow, and PyTorch were employed for implementing unsupervised learning techniques, including autoencoders and CNNs. For data analysis and visualization, tools like NumPy, Pandas, and Matplotlib were used to preprocess data, conduct exploratory analysis, and visualize results. Jupyter Notebook or JupyterLab served as the development environment, facilitating coding, experimentation, and documentation. These tools collectively provided a robust framework for implementing, evaluating, and analyzing anomaly detection using variational autoencoders in 5G-NIDD datasets.

A. Loss and Error Evaluations

In the experiment, we applied a variational autoencoder to detect anomalies using the 5G-NIDD dataset. The dataset was preprocessed and used to train and evaluate the performance of the algorithms. The figure below presents the results obtained from the experiment. Fig 3 provides the training loss and validation loss for each epoch during the training process. Each line represents one epoch during training. In each epoch, it shows the training loss (loss) and the validation loss (val_loss). The number of epochs is set to 50, and for each epoch, it provides the average loss over the entire training (loss) and validation (val_loss) datasets.

Fig 3: Training History of Variational Autoencoder (VAE) on 5G-NIDD Datasets
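
To make the per-epoch loss values concrete, the following minimal sketch computes a VAE loss for a single sample as a reconstruction term plus a KL divergence term, in line with the objective described in Section III-C. It is an illustration under stated assumptions, not the authors' implementation: the paper does not name its reconstruction loss, so mean squared error is assumed; the encoder is assumed to output a mean and a log-variance per latent dimension; and the names `vae_loss`, `mu`, and `log_var` are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def mean_squared_error(x, x_hat):
    """Assumed reconstruction loss: mean squared error over the features."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction loss plus KL term; minimizing it maximizes the ELBO."""
    return mean_squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Hypothetical values for one preprocessed traffic record (not from the paper).
x = [0.2, 0.8, 0.5]          # input features after scaling
x_hat = [0.25, 0.7, 0.5]     # decoder output
mu = [0.1, -0.2]             # encoder mean per latent dimension
log_var = [0.0, -0.1]        # encoder log-variance per latent dimension

print("reconstruction:", mean_squared_error(x, x_hat))
print("KL term:", kl_to_standard_normal(mu, log_var))
print("total loss:", vae_loss(x, x_hat, mu, log_var))
```

Averaging such per-sample values over the training and validation sets would give one loss and one val_loss number per epoch. The KL term is zero only when the encoder outputs exactly a standard normal (all means 0, all log-variances 0).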

The training loss (loss) decreases from 0.3427 to 0.1719, and the validation loss (val_loss) decreases from 0.2724 to 0.1698 over the 50 epochs. This indicates that the model is learning and improving its performance on both the training and validation datasets as the training progresses.

Fig 4: Reconstruction Error by True Level of Variational Autoencoder (VAE) on 5G-NIDD Datasets
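
The relationship between reconstruction error and true label plotted in Fig 4 is what the thresholding step of Section III-D relies on. The sketch below flags a sample as anomalous when its reconstruction error exceeds a threshold, then scores the flags against true labels with accuracy, precision, recall, and F1. The error values, labels, and threshold are made-up illustrative numbers rather than data from the paper, and all function names are hypothetical.

```python
def flag_anomalies(errors, threshold):
    """Return 1 (anomalous) where the reconstruction error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with label 1 = anomalous, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Illustrative reconstruction errors and true labels (not from the paper).
errors = [0.05, 0.10, 0.12, 0.30, 0.45, 0.08, 0.25, 0.15]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
threshold = 0.17  # hypothetical cut-off on the reconstruction error

preds = flag_anomalies(errors, threshold)
tp, fp, fn, tn = confusion_counts(labels, preds)
accuracy, precision, recall, f1 = scores(tp, fp, fn, tn)
print(preds, (tp, fp, fn, tn), accuracy, precision, recall, f1)
```

Applied to the counts stated in the text that describes Table 4 (TP = 10, FP = 5, FN = 15, TN = 170), the same formulas give an accuracy of 0.90, precision of about 0.67, recall of 0.40, and F1 of 0.50, which agrees with the accuracy and the "Normal" row reported in Table 5.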

There does appear to be a trend where higher reconstruction errors tend to be associated with attack data points. This suggests that the model may be better at reconstructing normal data points than attack data points. Overall, the scatter plot (Fig 4) suggests that the model has varying degrees of success in reconstructing both normal and attack data points.

B. Detection Accuracy Results

It is essential to identify performance measures that are appropriate for the problem to be solved when assessing the performance of machine-learning models. To measure detection performance, we use the standard metrics of accuracy and F1 score, defined as $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ and $\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$, with $\mathrm{Precision} = \frac{TP}{TP + FP}$ and $\mathrm{Recall} = \frac{TP}{TP + FN}$, where TP=true positives, FP=false positives, FN=false negatives, and TN=true negatives. Accuracy is widely used but could be biased if the

population of the minority class (i.e., either normal or anomaly) is too small. In contrast, F1 score is known to be more robust to the class imbalance problem. Table 3 shows the anomaly detection performance obtained using the training history of the Variational Autoencoder (VAE) on the 5G-NIDD dataset. The anomaly detection model achieved an accuracy of 92%, indicating that 92% of instances were correctly classified as normal or anomalous. The precision is 0.94, meaning that among instances classified as normal, 94% were actually normal. The recall is 0.90, indicating that the model correctly identified 90% of the normal instances.

Table 3: Performance Metrics on 5G-NIDD Dataset

Metric | Value
--- | ---
Accuracy | 0.92
Precision | 0.94
Recall | 0.90
F1 Score | 0.92

The F1-score, which balances precision and recall, is 0.92. These results suggest that the anomaly detection model based on the VAE and the chosen threshold (based on the final validation loss from the training history, 0.1698) performed reasonably well on the 5G-NIDD dataset.

Fig 5: Receiver Operating Characteristic (ROC) Curve on 5G-NIDD Datasets

The ROC curve (Fig 5) shows good performance for anomaly detection. The curve is close to the top-left corner, signifying a good ability to distinguish between normal and anomalous data points. The area under the curve (AUC) is 0.91, which is considered a high value and further indicates strong performance. An AUC of 1 represents a perfect model, while 0.5 represents a random guess. Overall, based on the ROC curve, the model seems to be effective in differentiating between normal and anomalous data. It has a low false positive rate (FPR), meaning it correctly identifies most normal data points and avoids falsely classifying them as anomalies. Additionally, it has a high true positive rate (TPR), indicating it successfully identifies most of the actual anomalies.

The confusion matrix provides insights into how well the model is performing in distinguishing between normal and anomalous instances.

Table 4: Confusion Matrix of Variational Autoencoder (VAE) on 5G-NIDD Dataset

 | Attack (Actual) | Normal (Actual)
--- | --- | ---
Attack (Predicted) | 170 (TN) | 5 (FP)
Normal (Predicted) | 15 (FN) | 10 (TP)

In the confusion matrix shown in Table 4, the top-left value (170) represents true negatives (TN), indicating normal instances correctly classified as normal. The top-right value (5) represents false positives (FP), indicating normal instances incorrectly classified as anomalous. The bottom-left value (15) represents false negatives (FN), indicating anomalous instances incorrectly classified as normal. The bottom-right value (10) represents true positives (TP), indicating anomalous instances correctly classified as anomalous.

Table 5: Classification Reports of Variational Autoencoder (VAE) on 5G-NIDD Dataset

Metric | Precision | Recall | F1-Score | Support
--- | --- | --- | --- | ---
Attack | 0.92 | 0.97 | 0.95 | 175
Normal | 0.67 | 0.40 | 0.50 | 25
Accuracy | | | 0.90 | 200
Macro Avg | 0.79 | 0.69 | 0.73 | 200
Weighted Avg | 0.88 | 0.90 | 0.88 | 200

Table 5 provides a concise overview of the performance metrics for a binary classification model tasked with distinguishing between "Attack" and "Normal" instances. It reveals that precision is high for the "Attack" class (0.92) and more moderate for the "Normal" class (0.67), with the model excelling in accurately identifying "Attack" instances. However, the model's recall is notably lower for "Normal" instances, indicating its struggle

to effectively capture all actual positive instances in this class. Despite this imbalance, the model still achieves a respectable overall accuracy of 90%. The F1-Score, the harmonic mean of precision and recall, further illustrates the model's higher effectiveness in detecting "Attack" instances compared to "Normal" instances. Additionally, considering class imbalances, the weighted average metrics provide insight into the overall model performance, highlighting areas for potential improvement in addressing class-specific challenges.

V. CONCLUSION AND FUTURE WORK

In conclusion, this study demonstrates the effectiveness of Variational Autoencoders (VAE) in anomaly detection within 5G Non-IP Data Delivery (NIDD) datasets. By leveraging unsupervised learning techniques and latent variable modeling, the VAE-based approach accurately identifies abnormal network behavior, such as intrusions, cyber attacks, or malfunctions. This method offers significant implications for enhancing network security and operational integrity in 5G networks by enabling proactive identification and mitigation of potential threats and anomalies. However, challenges remain in optimizing thresholding mechanisms and fine-tuning model parameters for improved performance across various network conditions. Enhancing the interpretability of detected anomalies also presents an ongoing area of focus for advancing anomaly detection techniques in 5G networks. Nonetheless, the findings underscore the potential of VAE as a powerful tool for enhancing network security and resilience in the face of emerging threats, contributing to ongoing efforts to ensure the integrity and reliability of 5G networks.

ACKNOWLEDGMENT

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02107, Collaborative Research on Element Technologies for 6G Security-by-Design and Standardization-Based International Cooperation).

REFERENCES

1. Savic, M., Lukic, M., Danilovic, D., Bodroski, Z., Bajović, D., Mezei, I., ... & Jakovetić, D. (2021). Deep learning anomaly detection for cellular IoT with applications in smart logistics. IEEE Access, 9, 59406-59419.
2. Boualouache, A., & Engel, T. (2023). A survey on machine learning-based misbehavior detection systems for 5G and beyond vehicular networks. IEEE Communications Surveys & Tutorials.
3. Lin, Y. D., Liu, Z. Q., Hwang, R. H., Nguyen, V. L., Lin, P. C., & Lai, Y. C. (2022). Machine learning with variational AutoEncoder for imbalanced datasets in intrusion detection. IEEE Access, 10, 15247-15260.
4. Xu, X., Li, J., Yang, Y., & Shen, F. (2020). Toward effective intrusion detection using log-cosh conditional variational autoencoder. IEEE Internet of Things Journal, 8(8), 6187-6196.
5. Samarakoon, S., Siriwardhana, Y., Porambage, P., Liyanage, M., Chang, S. Y., Kim, J., ... & Ylianttila, M. (2022). 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network. arXiv preprint arXiv:2212.01298.
6. Fernandes, G., Rodrigues, J. J., Carvalho, L. F., Al-Muhtadi, J. F., & Proença, M. L. (2019). A comprehensive survey on network anomaly detection. Telecommunication Systems, 70, 447-489.
7. Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
8. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A. K., & Davis, L. S. (2016). Learning temporal regularity in video sequences. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 733-742).
9. Lam, J., & Abbas, R. (2020). Machine learning based anomaly detection for 5G networks. arXiv preprint arXiv:2003.03474.
10. Haque, A., Chowdhury, N. U. R., Soliman, H., Hossen, M. S., Fatima, T., & Ahmed, I. (2023, September). Wireless sensor networks anomaly detection using machine learning: a survey. In Intelligent Systems Conference (pp. 491-506). Cham: Springer Nature Switzerland.
11. Kim, C., Chang, S.-Y., Lee, D., Kim, J., Park, K., & Kim, J. (2023, January). Reliable detection of location spoofing and variation attacks. IEEE Access, 11, 10813-10825.
12. Vanerio, J., & Casas, P. (2017, August). Ensemble-learning approaches for network security and anomaly detection. In Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (pp. 1-6).
13. Maimó, L. F., Gómez, Á. L. P., Clemente, F. J. G., Pérez, M. G., & Pérez, G. M. (2018). A self-adaptive deep learning-based system for anomaly detection in 5G networks. IEEE Access, 6, 7700-7712.
14. Wikipedia. (n.d.). VAE Basic.png [Digital image]. Retrieved April 10, 2024, from https://en.wikipedia.org/wiki/File:VAE_Basic.png