Critical Event Prediction in Logs at Customer Network
Published
Author
Type
Master's Thesis
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
Implementing effective maintenance prognosis for Radio units at Ericsson can result
in a number of benefits, including better system safety, improved operational
reliability, longer equipment lifespan, and lower maintenance costs. By forecasting
whether a radio unit will raise an alarm in the near future, preventive investigations
and repairs at the hardware and software level can be carried out before the unit
fails. The goal of this thesis was to use multiple logs taken from
a radio unit to predict whether an alarm would occur in the next one to nine days.
The log file contents were divided into chunks using different approaches, such as
expanding window, independent chunks, and time-interval chunks, with each chunk
labeled according to the timestamp of the alarm. Ericsson has used a combination of
verdicts (features defined by subject matter experts) to extract the best
features from the log files. This rule-based approach is inefficient because the
script must be modified using expert knowledge whenever the design of the
hardware changes.
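
The three chunking approaches and the alarm-based labeling named above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the (timestamp, message) line format, and the rule that a chunk is positive when an alarm falls within a given horizon after its last line are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed input: a time-ordered list of (timestamp, message) tuples.

def independent_chunks(lines, size):
    """Split lines into consecutive, non-overlapping chunks of `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def expanding_windows(lines, step):
    """Every window starts at the first line and grows by `step` lines."""
    return [lines[:end] for end in range(step, len(lines) + step, step)]

def interval_chunks(lines, interval):
    """Group lines into fixed-length time intervals counted from the first line."""
    if not lines:
        return []
    start = lines[0][0]
    buckets = {}
    for ts, msg in lines:
        buckets.setdefault((ts - start) // interval, []).append((ts, msg))
    return [buckets[k] for k in sorted(buckets)]

def label_chunk(chunk, alarm_times, horizon):
    """Assumed labeling rule: 1 if an alarm occurs within `horizon` after the chunk ends."""
    end = chunk[-1][0]
    return int(any(end < t <= end + horizon for t in alarm_times))
```

Varying `horizon` from one to nine days would give one labeling per prediction window, in the spirit of the one-to-nine-day setup described in the abstract.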
The goal of this thesis project was achieved using data-driven NLP approaches,
including log parsers and word embeddings. An independent chunks approach with the
Drain log parser, using concatenated bag-of-words (BOW) representations for each log
file as input to an XGBoost model, outperformed the other combinations of log parsers
and word embeddings. An LSTM model was used with 1-day interval chunks to test
whether a more complex sequential model could achieve a sufficient score. Experiments
with complex sequential models, such as the LSTM many-to-many model with doc2vec
embedding, have shown that they can predict alarms before they occur. All the tested
models were evaluated using cross-validation. The XGBoost model with the independent
chunks approach, using the Drain log parser and BOW embedding, achieved an average
F1-score of 0.873, while the LSTM model with the time-interval chunks approach, using
doc2vec embedding, achieved an average F1-score of 0.853 across shifting time periods
from one to nine days.
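
To make the idea of concatenated bag-of-words representations per log file more concrete, here is a minimal sketch. It does not use Drain: `to_template` is a crude regex stand-in that only masks numbers, and the sample layout (a dict mapping a log file name to its lines), the function names, and the file names in the usage note are assumptions for illustration. The resulting vector is the kind of input that could be passed to a classifier such as XGBoost.

```python
import re
from collections import Counter

def to_template(line):
    """Crude stand-in for a log parser such as Drain: mask hex values and numbers."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<*>", line)
    return re.sub(r"\d+", "<*>", line)

def fit_vocab(samples):
    """Collect, per log file, the sorted set of templates seen in the training samples."""
    vocab = {}
    for sample in samples:
        for name, lines in sample.items():
            vocab.setdefault(name, set()).update(to_template(l) for l in lines)
    return {name: sorted(templates) for name, templates in sorted(vocab.items())}

def vectorize(sample, vocab):
    """Count templates per log file and concatenate the count vectors in a fixed order."""
    vec = []
    for name, templates in vocab.items():
        counts = Counter(to_template(l) for l in sample.get(name, []))
        vec.extend(counts[t] for t in templates)
    return vec
```

For example, a chunk with lines from two hypothetical files `hw.log` and `sys.log` yields one count vector per file, and the two vectors are joined end to end so that each position always refers to the same template of the same file.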
Description
Subject/keywords
NLP, log, predictive maintenance, classification, machine learning, word embedding, LSTM, XGBoost, AWSOM-LP, Drain.