Anomaly Detection in 5G Using Variational Autoencoders
Abstract— With the rapid expansion of 5G networks, ensuring the security and integrity of data transmission ...

... application in the context of 5G networks, particularly using 5G-NIDD datasets, remains relatively unexplored [4].
2024 Silicon Valley Cybersecurity Conference (SVCC) | 979-8-3503-8314-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/SVCC61185.2024.10637312
Authorized licensed use limited to: OSMANIA UNIVERSITY. Downloaded on October 28,2024 at 07:21:00 UTC from IEEE Xplore. Restrictions apply.
For instance, research by Lam and Abbas (2020) [9] proposes a deep learning-based approach for anomaly detection in 5G networks, leveraging CNNs to extract features from network traffic data and detect anomalies. Similarly, Haque et al. (2023) [10] investigate the use of VAE for anomaly detection in wireless sensor networks, demonstrating the effectiveness of latent space representations in capturing anomalous patterns. Kim et al. (2023) [11] use a VAE and introduce new features (to check physical mobility constraints and inconsistencies) to detect location spoofing and anomalies in mobile and vehicular networking, whose future applications build on the underlying 5G technology.

Furthermore, studies such as the work by Vanerio and Casas (2017) [12] explore ensemble learning methods, combining multiple anomaly detection algorithms to enhance detection performance in telecommunication networks. Similarly, research by Maimó et al. (2018) [13] investigates the application of machine learning techniques, including Random Forest and Support Vector Machines (SVMs), for anomaly detection in 5G networks.

III. METHODOLOGY

The methodology employed in this study encompasses several key steps aimed at developing and evaluating an anomaly detection system using Variational Autoencoders (VAE) on the 5G-NIDD dataset. By employing unsupervised learning techniques, such as VAE, the model can autonomously learn and detect anomalies, thereby enhancing the network's security posture against both known and unknown threats. The methodology is structured to ensure robustness, reproducibility, and effectiveness in addressing the research objectives.

A. 5G-NIDD Dataset Description

The 5G-NIDD dataset was created using a real 5G test network for network intrusion detection and was published by Samarakoon et al. [5]. It has a total of 52 features: 32 of float type, 12 of integer type, and 8 categorical. The dataset has 1215890 records, of which 477737 (39.3%) are benign and 738153 (60.7%) are malicious. It contains eight different types of attacks; their names and shares of the attack traffic are: UDPFlood (61.9%), HTTPFlood (19.0%), SlowrateDoS (9.9%), TCPConnectScan (2.7%), SYNScan (2.7%), UDPScan (2.1%), SYNFlood (1.3%), and ICMPFlood (0.15%). The 5G-NIDD dataset is a comprehensive collection of network traffic data generated over a 5G testbed, as described in Table 1. It aims to facilitate the development and evaluation of machine learning models for anomaly detection and intrusion prevention in 5G networks.

Table 1: Description of 5G-NIDD Dataset for Anomaly Detection

B. Preprocessing and Data Preparation

Data preprocessing plays a critical role in preparing the 5G-NIDD dataset for anomaly detection using Variational Autoencoders (VAE). Initially, the raw dataset undergoes meticulous preprocessing steps to ensure its suitability for training and evaluation.

Table 2 summarizes the key preprocessing steps undertaken to prepare the 5G-NIDD dataset for anomaly detection using VAE. Each step contributes to ensuring the dataset's suitability for training and evaluation, enhancing the
accuracy and robustness of the subsequent anomaly detection process.

C. Variational Autoencoder (VAE) Architecture

The Variational Autoencoder (VAE) is a type of neural network architecture specifically designed for unsupervised learning and latent variable modeling. It comprises two main components: the encoder and the decoder. A schematic diagram of the VAE architecture illustrates its components and the training objective.

... for each data sample, the anomaly detection process in the VAE involves applying a thresholding mechanism to classify samples as either normal or anomalous.

Identify anomalies: Flag data with high error or anomaly scores as potential threats.

This process helps identify abnormal patterns (anomalies) in 5G network traffic data, enabling proactive security measures.
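The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-sample anomaly score is assumed to be the mean squared reconstruction error, the threshold is assumed to be the 95th percentile of errors on benign data, and the function names (`reconstruction_errors`, `fit_threshold`, `flag_anomalies`) and the synthetic arrays are hypothetical stand-ins for the inputs and outputs of a trained VAE.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Per-sample mean squared error between inputs and their reconstructions."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(benign_errors, percentile=95.0):
    """Assumed rule: threshold = chosen percentile of errors on benign data."""
    return float(np.percentile(benign_errors, percentile))


def flag_anomalies(errors, threshold):
    """Boolean mask that is True where the error exceeds the threshold."""
    return np.asarray(errors) > threshold


# Synthetic stand-ins for scaled flow features and their VAE reconstructions.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(200, 8))
benign_hat = benign + rng.normal(0.0, 0.1, size=benign.shape)  # reconstructed well
attack = rng.normal(0.0, 1.0, size=(20, 8))
attack_hat = attack + rng.normal(0.0, 1.0, size=attack.shape)  # reconstructed poorly

benign_err = reconstruction_errors(benign, benign_hat)
attack_err = reconstruction_errors(attack, attack_hat)

threshold = fit_threshold(benign_err)
benign_flags = flag_anomalies(benign_err, threshold)
attack_flags = flag_anomalies(attack_err, threshold)

print(f"threshold = {threshold:.4f}")
print(f"benign flagged: {benign_flags.sum()} of {benign_flags.size}")
print(f"attack flagged: {attack_flags.sum()} of {attack_flags.size}")
```

The percentile is a tunable choice: lowering it flags more traffic (higher recall on attacks, more false alarms on benign flows), while raising it does the opposite.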
... obtained from the experiment. Fig 3 shows the training loss and validation loss for each epoch during the training process. Each line represents one epoch and reports the training loss (loss) and the validation loss (val_loss). The number of epochs is set to 50, and for each epoch the figure gives the average loss over the entire training (loss) and validation (val_loss) datasets.

The training loss (loss) decreases from 0.3427 to 0.1719, and the validation loss (val_loss) decreases from 0.2724 to 0.1698 over the 50 epochs. This indicates that the model is learning and improving its performance on both the training and validation datasets as the training progresses.
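As a small worked check of the figures quoted above, the sketch below computes the relative decrease of each loss and the final gap between them. Only the four loss values come from the text; the `history` dictionary layout and the helper `relative_decrease` are hypothetical, chosen for illustration.

```python
def relative_decrease(first, last):
    """Fraction by which a loss dropped between the first and last epoch."""
    return (first - last) / first


# (first-epoch value, last-epoch value) as reported for the 50-epoch run.
history = {"loss": (0.3427, 0.1719), "val_loss": (0.2724, 0.1698)}

for name, (first, last) in history.items():
    drop = relative_decrease(first, last)
    print(f"{name}: {first:.4f} -> {last:.4f} ({drop:.1%} lower)")

# A small final gap between training and validation loss is one rough
# indication that the model is not strongly overfitting.
final_gap = history["loss"][1] - history["val_loss"][1]
print(f"final train/validation gap: {final_gap:.4f}")
```

Under these numbers the training loss falls by roughly half and the validation loss by a little over a third, and the two end within about 0.002 of each other.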
Fig 4: Reconstruction Error by True Level of Variational Autoencoder (VAE) on 5G-NIDD Datasets
There does appear to be a trend where higher reconstruction errors tend to be associated with attack data points. This suggests that the model may be better at reconstructing normal data points than attack data points. Overall, the scatter plot (Fig 4) suggests that the model has varying degrees of success in reconstructing both normal and attack data points.

B. Detection Accuracy Results

It is essential to identify performance measures that are appropriate for the problem to be solved when assessing the performance of machine-learning models. To measure detection performance, we use the standard metrics of accuracy and F1 score, defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

and

$$\text{F1 score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN},$$

where Precision = TP/(TP + FP), Recall = TP/(TP + FN), TP = true positives, FP = false positives, FN = false negatives, and TN = true negatives. Accuracy is widely used but could be biased if the
population of the minority class (i.e., either normal or anomaly) is too small. In contrast, the F1 score is known to be more robust to the class imbalance problem. Table 3 shows the results for the anomaly detection performance using the training history of the Variational Autoencoder (VAE) on the 5G-NIDD dataset. The anomaly detection model achieved an accuracy of 92%, indicating that 92% of instances were correctly classified as normal or anomalous. The precision is 0.94, meaning that among instances classified as normal, 94% were actually normal. The recall is 0.90, indicating that the model correctly identified 90% of the normal instances.

Table 3: Performance Metrics on 5G-NIDD Dataset

Fig 5: Receiver Operating Characteristic (ROC) Curve on 5G-NIDD Datasets

The ROC curve (Fig 5) shows good performance for anomaly detection. The curve is close to the top-left corner, signifying a good ability to distinguish between normal and anomalous data points. The area under the curve (AUC) is 0.91, which is considered a high value and further indicates strong performance. An AUC of 1 represents a perfect model, while 0.5 represents a random guess. Overall, based on the ROC curve, the model seems to be effective in differentiating between normal and anomalous data. It has a low false positive rate (FPR), meaning it correctly identifies most normal data points and avoids falsely classifying them as anomalies. Additionally, it has a high true positive rate (TPR), indicating it successfully identifies most of the actual anomalies.

The confusion matrix provides insights into how well the model is performing in distinguishing between normal and anomalous instances.

Table 5 provides a concise overview of the performance metrics for a binary classification model tasked with distinguishing between "Attack" and "Normal" instances. It reveals that the model achieves high precision for both classes, particularly excelling in accurately identifying "Attack" instances. However, the model's recall is notably lower for "Normal" instances, indicating its struggle
to effectively capture all actual positive instances in this class. Despite this imbalance, the model still achieves a respectable overall accuracy of 90%. The F1-Score, the harmonic mean of precision and recall, further illustrates the model's higher effectiveness in detecting "Attack" instances compared to "Normal" instances. Additionally, considering class imbalances, the weighted average metrics provide insight into the overall model performance, highlighting areas for potential improvement in addressing class-specific challenges.

VI. CONCLUSION AND FUTURE WORK

In conclusion, this study demonstrates the effectiveness of Variational Autoencoders (VAE) in anomaly detection on the 5G-NIDD dataset. By leveraging unsupervised learning techniques and latent variable modeling, the VAE-based approach accurately identifies abnormal network behavior, such as intrusions, cyber attacks, or malfunctions. This method offers significant implications for enhancing network security and operational integrity in 5G networks by enabling proactive identification and mitigation of potential threats and anomalies. However, challenges remain in optimizing thresholding mechanisms and fine-tuning model parameters for improved performance across various network conditions. Enhancing the interpretability of detected anomalies also presents an ongoing area of focus for advancing anomaly detection techniques in 5G networks. Nonetheless, the findings underscore the potential of VAE as a powerful tool for enhancing network security and resilience in the face of emerging threats, contributing to ongoing efforts to ensure the integrity and reliability of 5G networks.

ACKNOWLEDGMENT

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02107, Collaborative Research on Element Technologies for 6G Security-by-Design and Standardization-Based International Cooperation).

REFERENCES

... conditional variational autoencoder. IEEE Internet of Things Journal, 8(8), 6187-6196.
5. Samarakoon, S., Siriwardhana, Y., Porambage, P., Liyanage, M., Chang, S. Y., Kim, J., ... & Ylianttila, M. (2022). 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network. arXiv preprint arXiv:2212.01298.
6. Fernandes, G., Rodrigues, J. J., Carvalho, L. F., Al-Muhtadi, J. F., & Proença, M. L. (2019). A comprehensive survey on network anomaly detection. Telecommunication Systems, 70, 447-489.
7. Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
8. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A. K., & Davis, L. S. (2016). Learning temporal regularity in video sequences. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 733-742).
9. Lam, J., & Abbas, R. (2020). Machine learning based anomaly detection for 5g networks. arXiv preprint arXiv:2003.03474.
10. Haque, A., Chowdhury, N. U. R., Soliman, H., Hossen, M. S., Fatima, T., & Ahmed, I. (2023, September). Wireless sensor networks anomaly detection using machine learning: a survey. In Intelligent Systems Conference (pp. 491-506). Cham: Springer Nature Switzerland.
11. Kim, C., Chang, S.-Y., Lee, D., Kim, J., Park, K., & Kim, J. (2023, January). Reliable detection of location spoofing and variation attacks. IEEE Access, 11, 10813-10825.
12. Vanerio, J., & Casas, P. (2017, August). Ensemble-learning approaches for network security and anomaly detection. In Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (pp. 1-6).
13. Maimó, L. F., Gómez, Á. L. P., Clemente, F. J. G., Pérez, M. G., & Pérez, G. M. (2018). A self-adaptive deep learning-based system for anomaly detection in 5G networks. IEEE Access, 6, 7700-7712.
14. Wikipedia. (n.d.). VAE Basic.png [Digital image]. Retrieved April 10, 2024, from https://en.wikipedia.org/wiki/File:VAE_Basic.png