Log Classification using NLP Techniques: Data-Driven Fault Categorization of Multimodal Logs using Natural Language Processing Techniques
Master's thesis
Suhren Gustafsson, Adam
System logs record system states to facilitate debugging of issues and failures. At Ericsson, several logs are analyzed when faulty baseband hardware is returned from customer networks. Classifying a unit given several logs can be considered a multimodal classification problem in which each log represents one mode of the system. As systems grow in size and complexity, the effort required for subject matter experts to analyze these logs increases to the point where manual analysis is no longer efficient. Ericsson has therefore used machine learning models that rely on manually designed feature extraction patterns, chosen according to the best current understanding of which features matter for classification. However, manual feature engineering gives no guarantee that the chosen representation of the logs is the one that best correlates with the output of the classification model. In this thesis, we show that a data-driven NLP approach, in which concatenated bag-of-words representations of each log file are fed to an XGBoost classifier, matches the production model used by Ericsson. Attempts to incorporate sequential representations of the log entries and the parameter lists produced by the Spell and Drain log parsers did not improve the results. In addition, while deep learning models such as Transformers combined with neural Word2Vec embeddings produced similar results, they are prohibitively complex relative to the simpler solution. Our findings indicate that the baseband unit logs do not show the same high variability in sentence structure as human-written text, nor do they seem to depend on sequential structure to distinguish different hardware or software faults. We also suggest that care should be taken when treating logs like the texts found in classical NLP tasks such as sentiment analysis or document classification, where the text is written directly by humans rather than produced by automatic logging systems. All tested models were evaluated on the holdout test dataset used by the current production model.
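To make the idea of concatenated bag-of-words representations concrete, the following is a minimal sketch in plain Python and not the thesis implementation. It assumes one vocabulary is fitted per log type and that the per-log count vectors are joined into a single feature vector per unit. The log names (`boot_log`, `error_log`), the log contents, and the whitespace tokenizer are hypothetical choices made only for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would
    # probably also normalize timestamps, addresses, and identifiers.
    return text.lower().split()


def fit_vocabulary(documents):
    """Map every distinct token in the given documents to a column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Count vector for one log; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        index = vocabulary.get(tok)
        if index is not None:
            vector[index] = count
    return vector


def fit_vocabularies(units):
    """Fit one vocabulary per log type over all training units."""
    log_names = sorted({name for unit in units for name in unit})
    return {
        name: fit_vocabulary([unit.get(name, "") for unit in units])
        for name in log_names
    }


def featurize(unit, vocabularies):
    """Concatenate the per-log count vectors in a fixed (sorted) log order."""
    features = []
    for name in sorted(vocabularies):
        features.extend(bag_of_words(unit.get(name, ""), vocabularies[name]))
    return features


# Hypothetical units, each with two logs (two "modes").
train_units = [
    {"boot_log": "init ok init ok", "error_log": "timeout on link"},
    {"boot_log": "init failed", "error_log": "link down"},
]
vocabularies = fit_vocabularies(train_units)
X = [featurize(unit, vocabularies) for unit in train_units]
```

Each row of `X` has one block of columns per log type, so a tree-based classifier such as XGBoost can split on token counts from any log. A missing log simply contributes a block of zeros.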
The existing Ericsson model achieved a macro F1-score of 0.866, the XGBoost model 0.885, and the Transformer model 0.861.
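For readers unfamiliar with the metric, the macro F1-score is the unweighted mean of the per-class F1-scores, where the F1-score of a class is 2·TP / (2·TP + FP + FN). The sketch below computes it in plain Python; the class labels and predictions are hypothetical and are not taken from the thesis data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 by convention.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical fault classes and predictions, for illustration only.
y_true = ["hardware", "hardware", "hardware", "software", "no_fault"]
y_pred = ["hardware", "hardware", "software", "software", "no_fault"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the mean, rare fault classes weigh as much as common ones. This is a common reason to choose macro averaging for imbalanced classification, though the abstract does not state why it was chosen here.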
NLP, log, classification, machine learning, word embedding, LSTM, transformer, XGBoost, Spell, Drain.