Critical Event Prediction in Logs at Customer Network

Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Published
2022
Author
Hajizada, Elmar
Abstract
Implementing effective maintenance prognosis for radio units at Ericsson can bring a number of benefits, including better system safety, improved operational reliability, longer equipment lifespan, and lower maintenance costs. By forecasting whether a radio unit will raise an alarm in the near future, preventive investigations and repairs can be carried out at the hardware and software level before the unit fails. The goal of this thesis was to use multiple logs taken from a radio unit to predict whether an alarm would occur in the next one to nine days. The log file contents were divided into chunks using three approaches (expanding window, independent chunks, and time interval chunks), and each chunk was labeled according to the timestamp of the alarm. Ericsson has used a combination of verdicts (features defined by subject matter experts) to extract the best features from the log files. This rule-based approach is inefficient because the script must be modified using expert knowledge whenever the hardware design changes. This thesis instead achieved its goal with data-driven NLP approaches, including log parsers and word embeddings. The independent chunks approach with the Drain log parser, using concatenated bag-of-words (BOW) representations of each log file as input to an XGBoost model, outperformed the other combinations of log parsers and word embeddings. An LSTM model was used with one-day interval chunks to see whether a complex sequential model could achieve a sufficient score. Experiments with such a model, an LSTM many-to-many model with doc2vec embeddings, showed that it can predict alarms before they occur. All tested models were evaluated using cross-validation.
The XGBoost model with the independent chunks approach, the Drain log parser, and BOW representations achieved an average F1-score of 0.873. The LSTM model with the time interval chunks approach and doc2vec embeddings achieved an average F1-score of 0.853 across shifting time periods of one to nine days.
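The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `label_chunks`, the rule "a chunk is positive if an alarm falls within the horizon after the chunk ends", and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def label_chunks(chunk_ends, alarm_times, horizon_days):
    """Label each chunk 1 if an alarm occurs within `horizon_days` after it ends.

    Assumed labeling rule (not taken from the thesis): the window is
    (chunk_end, chunk_end + horizon], open on the left, closed on the right.
    """
    horizon = timedelta(days=horizon_days)
    labels = []
    for end in chunk_ends:
        hit = any(end < alarm <= end + horizon for alarm in alarm_times)
        labels.append(1 if hit else 0)
    return labels


# Hypothetical data: four one-day chunks ending at midnight, one alarm.
chunk_ends = [datetime(2022, 1, day) for day in (2, 3, 4, 5)]
alarm_times = [datetime(2022, 1, 5, 12, 0)]

labels_1d = label_chunks(chunk_ends, alarm_times, horizon_days=1)
labels_3d = label_chunks(chunk_ends, alarm_times, horizon_days=3)
```

Under these assumptions, widening the horizon from one to three days turns more of the earlier chunks into positive examples, which is one way to read the "one to nine days" range of prediction periods.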
Subject/keywords
NLP, log, predictive maintenance, classification, machine learning, word embedding, LSTM, XGBoost, AWSOM-LP, Drain.