Flow-Based Detection of Linux Backdoor Communication - A NetFlow Based ML-Approach to Backdoor Detection in Linux Environments
Type
Master's Thesis
Abstract
The increasing prevalence of Linux-based systems and their susceptibility to malware
attacks necessitate effective detection mechanisms for backdoor communication.
This thesis explores the application of machine learning (ML) models to detect
backdoor communication in Linux environments using flow-based data, specifically
NetFlow traffic data. The study aims to determine the effectiveness of
ML techniques in identifying malicious patterns associated with backdoor communication
without inspecting the actual payload. Linux systems are underrepresented in
existing benchmark datasets, which predominantly focus on Windows environments.
To address this gap, our research trains models on flow data specific to Linux malware
and environments. After preprocessing steps including feature mapping,
aggregation, scaling, and feature selection with methods such as the ANOVA F-test,
models were trained and evaluated on both benign and malicious traffic datasets.
The results indicate that ensemble models such as Random Forest (RF) and Extreme
Gradient Boosting (XGBoost) can effectively distinguish between normal and
anomalous traffic patterns, highlighting the potential of flow-based detection systems
in enhancing network security. The Synthetic Minority Over-sampling Technique
(SMOTE) was applied to address class imbalance, further improving detection
performance, though mainly in terms of precision. We conclude that flow-based data is a
valuable tool for training models to classify malicious traffic in Linux environments.
Future work will focus on acquiring or creating higher-quality datasets of
Linux malware traffic to improve the capabilities of detection systems.
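
As a rough illustration of two of the preprocessing steps named in the abstract (aggregation and ANOVA F-test feature ranking), the sketch below groups toy flow records by host pair and scores each aggregated feature by how well it separates the two classes. It is not the thesis's implementation: the records are synthetic, and the names `FLOWS`, `FEATURES`, `aggregate`, `anova_f`, and `rank_features` are hypothetical. The classifiers (RF, XGBoost) and SMOTE are left out because they need third-party libraries; only the Python standard library is used here.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, hypothetical flow records (not data from the thesis):
# (src, dst, bytes, duration_seconds, label), where label 1 = malicious.
FLOWS = [
    ("10.0.0.2", "198.51.100.7", 5000, 2.0, 0),
    ("10.0.0.2", "198.51.100.7", 7000, 3.0, 0),
    ("10.0.0.3", "198.51.100.8", 3000, 1.0, 0),
    ("10.0.0.3", "198.51.100.8", 4000, 5.0, 0),
    ("10.0.0.3", "198.51.100.8", 5000, 3.0, 0),
    ("10.0.0.5", "203.0.113.9", 120, 0.4, 1),
    ("10.0.0.5", "203.0.113.9", 120, 0.4, 1),
    ("10.0.0.5", "203.0.113.9", 120, 0.4, 1),
    ("10.0.0.5", "203.0.113.9", 120, 0.4, 1),
    ("10.0.0.6", "203.0.113.10", 100, 0.5, 1),
    ("10.0.0.6", "203.0.113.10", 140, 0.3, 1),
    ("10.0.0.6", "203.0.113.10", 150, 0.4, 1),
]

FEATURES = ("flow_count", "mean_bytes", "mean_duration")


def aggregate(flows):
    """Group flows by (src, dst) and derive one feature row per host pair."""
    groups = defaultdict(list)
    for src, dst, nbytes, duration, label in flows:
        groups[(src, dst)].append((nbytes, duration, label))
    rows = []
    for pair, items in groups.items():
        rows.append({
            "pair": pair,
            "flow_count": len(items),
            "mean_bytes": mean(i[0] for i in items),
            "mean_duration": mean(i[1] for i in items),
            # Assumption: a pair is malicious if any of its flows is.
            "label": max(i[2] for i in items),
        })
    return rows


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # No spread inside the classes: perfectly separated or constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows):
    """Return (feature, F score) pairs, highest score first."""
    labels = [r["label"] for r in rows]
    scores = [(name, anova_f([r[name] for r in rows], labels)) for name in FEATURES]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(aggregate(FLOWS)):
        print(f"{name}: F = {score:.2f}")
```

A feature with a high F score varies much more between the benign and malicious groups than within them, which is why such a score can serve as a simple filter before model training. The F statistic is unchanged by rescaling a feature, so the scaling step is omitted from this sketch.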
Subject/keywords
backdoor detection, machine learning, NetFlow, Linux, malware, network security, anomaly detection, flow-based data, big data