Log Classification using NLP Techniques: Data-Driven Fault Categorization of Multimodal Logs using Natural Language Processing Techniques
Type
Master's thesis
Published
2021
Authors
Wirehed, Adam
Suhren Gustafsson, Adam
Abstract
System logs record system states to facilitate debugging of issues and failures. At
Ericsson, several logs are analyzed when faulty baseband hardware is returned
from customer networks. Classifying a unit given several logs can be considered a
multimodal classification problem in which each log represents one mode of the system.
As systems grow in size and complexity, the resources needed for subject matter
experts to analyze these logs increase to the point where manual analysis is no longer efficient.
Therefore, Ericsson has used machine learning models that rely on manually defined
feature extraction patterns, chosen according to the best understanding of which features
should be used for classification. However, this manual feature engineering gives no
guarantee that the hand-picked features are the best representation of the logs for
the classification task. In this thesis, we have shown that a data-driven NLP
approach, in which bag-of-words representations of each log file are concatenated and
used to fit an XGBoost classifier, was able to match the production model used by Ericsson.
Attempts to incorporate sequential representations of the log entries and parameter
lists produced by the Spell and Drain log parsers did not yield improved results.
In addition, while deep learning models like Transformers combined with neural
Word2Vec embeddings were able to produce similar results, they are prohibitively
complex in relation to the simpler solution. Our findings indicate that the baseband
unit logs do not show the high variability in sentence structure typical of natural
language text, nor does their classification into different hardware or software faults
seem to depend on the sequential structure of the log entries. We
also suggest that care be taken when treating logs like the texts found in
classical NLP tasks such as sentiment analysis or document classification, where the text
is written directly by humans rather than generated by automatic logging systems.
All tested models were evaluated on a holdout test dataset used by the current
production model. The existing Ericsson model achieved a macro F1-score of 0.866,
the XGBoost model 0.885, and the Transformer model 0.861.
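The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: concatenating one bag-of-words vector per log file into a single feature vector for a unit, and scoring predictions with a macro F1-score. The log names (`boot`, `alarm`), the whitespace tokenization, and all function names are illustrative assumptions, not taken from the thesis. The classifier itself is omitted; the resulting feature vectors could be passed to a gradient-boosted tree model such as XGBoost.

```python
from collections import Counter

# Hypothetical log names; the thesis does not list the actual log files here.
LOG_NAMES = ["boot", "alarm"]


def fit_vocabulary(documents):
    """Map every distinct whitespace token in the documents to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a token-count vector for one document; unseen tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok, n in Counter(document.lower().split()).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


def fit_vocabularies(units, log_names):
    """Fit one separate vocabulary per log file (one per 'mode')."""
    return {name: fit_vocabulary([u.get(name, "") for u in units])
            for name in log_names}


def featurize(unit, vocabularies, log_names):
    """Concatenate the per-log bag-of-words vectors into one feature vector."""
    features = []
    for name in log_names:
        features.extend(bag_of_words(unit.get(name, ""), vocabularies[name]))
    return features


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
units = [
    {"boot": "init ok init ok", "alarm": "temp high"},
    {"boot": "init fail", "alarm": "temp ok"},
]
vocabularies = fit_vocabularies(units, LOG_NAMES)
X = [featurize(u, vocabularies, LOG_NAMES) for u in units]
```

Because each log gets its own vocabulary, a token such as `ok` occupies separate columns for the `boot` and `alarm` logs, so the classifier can weigh the same word differently depending on which log it came from.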
Subject/keywords
NLP, log, classification, machine learning, word embedding, LSTM, transformer, XGBoost, Spell, Drain.