System Log File Anomaly Detection with Sparse Transformer Models
Master's thesis
Physics (MPPHS), MSc
HARF ABILI, JOEL
Log anomaly detection is a useful tool for analyzing system log files and is based on identifying anomalous log messages in such files. Recent years have seen a surge in the use of automated methods based on machine learning and artificial intelligence for log anomaly detection. This is due to a general increase in system complexity, which has made manual analysis time-consuming and difficult. The transformer model, which originates in natural language processing, has seen success in the field of log anomaly detection but may fail in cases where log data is highly unstructured and where anomalous log messages may be far apart. One reason for this could be the transformer model’s quadratic dependency on input length, which limits how many log messages can be used as input to the model. So-called sparse transformers address this problem, with different variants achieving sub-quadratic dependencies on input length. In this project, one transformer-based model and two sparse transformer-based models are investigated and compared in their effectiveness for log anomaly detection in system log files. The transformer-based model uses a BERT-style architecture, whereas the two sparse transformer-based models use a Big Bird-type and a Longformer-type architecture, respectively. All three models have a hyperspherical loss function applied directly to the raw model outputs. These outputs are then used to compute an anomaly score, which in turn is used to classify a log message as either normal or anomalous. Furthermore, all models are scaled down in order to fit on the GPU and are trained from scratch on system log files. The log files used for evaluation in this project are the two open-source data sets Hadoop Distributed File System (HDFS) and BlueGene/L (BG/L), as well as one Ericsson system log data set. All models are evaluated on annotated test data sets, and the two main metrics considered are F1-scores and estimated probability density functions of the anomaly scores.
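To illustrate why input length limits a standard transformer, the sketch below counts how many (query, key) token pairs are scored under full attention and under a plain sliding-window pattern. It is a rough illustration only: the function names and the `window` parameter are hypothetical, and the Big Bird and Longformer patterns add further components (such as global tokens) that are not modeled here.

```python
def full_attention_pairs(n: int) -> int:
    """Every token attends to every token: n * n pairs."""
    return n * n


def sliding_window_pairs(n: int, window: int) -> int:
    """Each token attends only to tokens at most `window` positions away.

    This is a simplified local-attention pattern, assumed here purely
    for illustration; it is not the exact pattern of any specific model.
    """
    total = 0
    for i in range(n):
        lo = max(0, i - window)        # leftmost key position in range
        hi = min(n - 1, i + window)    # rightmost key position in range
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    for n in (512, 2048, 8192):
        print(n, full_attention_pairs(n), sliding_window_pairs(n, window=64))
```

For a fixed window, the sliding-window count grows roughly linearly with the input length, while the full-attention count grows quadratically.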
Across the data sets, the highest F1-scores are achieved by the sparse transformer-based models, suggesting that the increased input size does affect performance. However, the highest F1-scores vary among the data sets, and some are only slightly higher than those achieved by the transformer-based model, suggesting that future work should explore other ways to increase performance. The estimated anomaly score probability density functions show a general tendency of the models to fail to separate normal and anomalous log messages, although some models show hints of separation on certain data sets.
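The scoring and evaluation steps summarized in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the anomaly score is taken to be the squared Euclidean distance between a model output vector and a fixed hypersphere center, which is one common choice and not necessarily the exact definition used in the thesis. The helper names, the threshold, and the toy vectors are all hypothetical.

```python
from typing import List, Sequence


def anomaly_score(output: Sequence[float], center: Sequence[float]) -> float:
    """Squared Euclidean distance to the hypersphere center (assumed form)."""
    return sum((o - c) ** 2 for o, c in zip(output, center))


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a message anomalous (1) if its score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the anomalous class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
center = (0.0, 0.0)
outputs = [(0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, -0.8)]
labels = [0, 1, 1, 1]  # hypothetical annotations: 1 = anomalous

scores = [anomaly_score(o, center) for o in outputs]
predictions = classify(scores, threshold=0.5)
f1 = f1_score(labels, predictions)
```

In this toy example the second message is anomalous but lies close to the center, so it is missed. This is the kind of overlap between normal and anomalous scores that the estimated probability density functions are meant to reveal.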
log anomaly detection, natural language processing, transformer, sparse transformer