Abstract: Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy; for example, most models report an F-measure greater than 0.9 on the commonly used HDFS dataset. To gain a deeper understanding of how far we are from solving the problem of log-based anomaly detection, in this paper we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results show that all of these aspects have a significant impact on the evaluation, and that none of the studied models consistently works well: the problem of log-based anomaly detection has not been solved yet. Based on our findings, we also suggest possible directions for future work. This repository provides the implementations of recent log-based anomaly detection methods.
| Model | Paper |
|---|---|
| Unsupervised | |
| DeepLog (CCS '17) | DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning |
| LogAnomaly (IJCAI '19) | LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs |
| LogBERT (IJCNN '21) | LogBERT: Log Anomaly Detection via BERT |
| Semi-supervised | |
| PLELog (ICSE '21) | Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation |
| Supervised | |
| CNN (DASC '18) | Detecting Anomaly in Big Data System Logs Using Convolutional Neural Network |
| LogRobust (ESEC/FSE '19) | Robust Log-Based Anomaly Detection on Unstable Log Data |
| NeuralLog (ASE '21) | Log-based Anomaly Detection Without Log Parsing |
- Python 3
- NVIDIA GPU + CUDA cuDNN
- PyTorch
The required packages are listed in requirements.txt. Install them with:
pip install -r requirements.txt
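As a quick sanity check (not part of the pipeline), you can verify that PyTorch is installed and can see the GPU before training:

```python
import torch

# Print the installed PyTorch version and whether CUDA is usable.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible GPU device.
    print("GPU:", torch.cuda.get_device_name(0))
```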
Raw and preprocessed datasets (including parsed logs and their embeddings) are available at https://zenodo.org/record/8115559.
We use datasets collected by LogPAI for evaluation. The datasets are available at loghub. The details of the datasets are shown below:
| Dataset | Size | # Logs | # Anomalies | Anomaly Ratio |
|---|---|---|---|---|
| HDFS | 1.5 GB | 11,175,629 | 16,838 | 2.93% |
| BGL | 743 MB | 4,747,963 | 348,460 | 7.34% |
| Thunderbird | 1.4 GB | 10,000,000 | 4,934 | 0.49% |
| Spirit | 1.4 GB | 5,000,000 | 764,500 | 15.29% |
We use log parsers from logparser to parse raw logs. We use AEL, Spell, Drain, and IPLoM for our experiments. The configuration for each parser used in our experiments can be found here.
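As an illustration, the snippet below shows how Drain from the logparser toolkit is typically invoked, following the logparser demo scripts; the log format, regex, and parameter values shown here are the demo settings for HDFS and are assumptions for illustration only (the exact settings used in our experiments are those in the linked configuration, and depending on the logparser version the import may instead be `from logparser.Drain import LogParser`):

```python
from logparser import Drain

# Settings below follow the logparser Drain demo for HDFS (illustrative only).
input_dir  = "path/to/raw/logs/"   # directory containing the raw log file
output_dir = "path/to/parsed/"     # directory for the parsing results
log_file   = "HDFS.log"
log_format = "<Date> <Time> <Pid> <Level> <Component>: <Content>"
regex      = [r"blk_(|-)[0-9]+", r"(\d+\.){3}\d+(:\d+)?"]  # block ids, IP addresses
st         = 0.5   # similarity threshold
depth      = 4     # depth of the parse tree

parser = Drain.LogParser(log_format, indir=input_dir, outdir=output_dir,
                         depth=depth, st=st, rex=regex)
parser.parse(log_file)
```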
For a fair comparison, we use the same fastText-based embedding method for all models. Use the following command to generate embeddings for log templates:
$ cd dataset
# download the pre-trained fastText word vectors
$ wget https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip && unzip crawl-300d-2M.vec.zip
# generate embeddings for log templates
$ python generate_embeddings.py <dataset> <strategy>
# where <dataset> is one of {HDFS, BGL, Thunderbird, Spirit}
# and <strategy> is one of {average, tfidf}
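For reference, here is a minimal sketch of what the average strategy computes, assuming the pre-trained vectors have been unzipped to crawl-300d-2M.vec; the template and file names below are hypothetical, and the actual generate_embeddings.py script also handles tokenization details and the tfidf strategy:

```python
import json
import numpy as np

def load_fasttext_vectors(path, vocab):
    """Load only the fastText word vectors needed for the given vocabulary."""
    vectors = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        next(f)  # skip the header line (vocabulary size, dimension)
        for line in f:
            tokens = line.rstrip().split(" ")
            if tokens[0] in vocab:
                vectors[tokens[0]] = np.asarray(tokens[1:], dtype=np.float32)
    return vectors

def average_embedding(template, vectors, dim=300):
    """Average the fastText vectors of a template's tokens (the 'average' strategy)."""
    words = template.lower().split()
    word_vecs = [vectors[w] for w in words if w in vectors]
    return np.mean(word_vecs, axis=0) if word_vecs else np.zeros(dim)

# Hypothetical usage: templates would come from the parsed log templates.
templates = {"E1": "Receiving block <*> src <*> dest <*>"}
vocab = {w for t in templates.values() for w in t.lower().split()}
vectors = load_fasttext_vectors("crawl-300d-2M.vec", vocab)
embeddings = {eid: average_embedding(t, vectors).tolist() for eid, t in templates.items()}
with open("embeddings_average.json", "w") as f:
    json.dump(embeddings, f)
```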
The configuration files used to set up the hyperparameters for training and testing can be found at /config. The main parameters are described as follows:
- `data_dir`: the directory of the dataset
- `log_file`: the path to the log file
- `dataset_name`: the name of the dataset
- `grouping`: the log grouping technique (session or sliding)
- `session_level`: whether sliding windows are defined over log entries or time (i.e., entry or minute)
- `window_size`: the window size for sliding grouping
- `step_size`: the step size for sliding grouping (if step_size = window_size, this is equivalent to fixed grouping; see the sketch below)
- `is_chronological`: whether to use chronological order for the train/test split (only applies to sliding grouping)
- `model_name`: the name of the model (e.g., DeepLog, LogAnomaly, LogRobust, PLELog, CNN)
- `sequential`: whether to use sequential features (i.e., indexes of log templates)
- `quantitative`: whether to use quantitative features (i.e., event count vectors)
- `semantic`: whether to use semantic features (i.e., log template embeddings)
- `embedding_dim`: the dimension of log template embeddings
- `embeddings`: the path to the JSON file containing log template embeddings
Training parameters such as batch_size, lr, max_epoch, optimizer, etc. are also defined in the configuration files.
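To make the grouping parameters concrete, the sketch below shows how entry-based sliding grouping could partition a chronological sequence of log template ids for a given window_size and step_size; this is a simplified illustration, not the repository's exact implementation:

```python
def sliding_window_grouping(log_ids, window_size, step_size):
    """Group a chronological sequence of log template ids into (possibly
    overlapping) windows. With step_size == window_size this degenerates
    to fixed (non-overlapping) grouping."""
    windows = []
    for start in range(0, len(log_ids), step_size):
        window = log_ids[start:start + window_size]
        if window:
            windows.append(window)
    return windows

# Example: 10 log entries, windows of 4 entries sliding by 2.
print(sliding_window_grouping(list(range(10)), window_size=4, step_size=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9], [8, 9]]
```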
python main_run.py --config_file <config_file>
# where `<config_file>` is the path to the configuration file
To see all the options, run `python main_run.py -h`.
If you find the code and models useful for your research, please cite the following paper:
@inproceedings{le2022log,
title={Log-based Anomaly Detection with Deep Learning: How Far Are We?},
author={Le, Van-Hoang and Zhang, Hongyu},
  booktitle={2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)},
year={2022}
}