Flow-Based Detection of Linux Backdoor Communication - A NetFlow Based ML-Approach to Backdoor Detection in Linux Environments

Type

Master's Thesis

Abstract

The increasing prevalence of Linux-based systems and their susceptibility to malware attacks call for effective mechanisms to detect backdoor communication. This thesis explores the application of machine learning (ML) models to detect backdoor communication in Linux environments using flow-based data, specifically NetFlow traffic data. The study aims to determine how effectively ML techniques can identify malicious patterns associated with backdoor communication without inspecting the actual payload. Linux systems are underrepresented in existing benchmark datasets, which predominantly focus on Windows environments. To address this gap, our research trains models on flow data specific to Linux malware and environments. After preprocessing steps including feature mapping, aggregation, scaling, and feature selection with methods such as the ANOVA F-test, models were trained and evaluated on both benign and malicious traffic datasets. The results indicate that ensemble models such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) can effectively distinguish between normal and anomalous traffic patterns, highlighting the potential of flow-based detection systems for enhancing network security. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance and further improved detection performance, though the improvement was in terms of precision. We conclude that flow-based data is a valuable basis for training models to classify malicious traffic in Linux environments. Future work will focus on acquiring or creating higher-quality datasets of Linux malware traffic to improve the capabilities of detection systems.
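As an illustration of the feature selection step named in the abstract, the following is a minimal sketch of ranking flow features by their one-way ANOVA F-statistic. It is not the thesis's implementation: the feature names (`bytes`, `packets`, `duration`), the toy flow values, and the helper names `anova_f` and `rank_features` are all hypothetical, and the thesis may well have used a library routine rather than a hand-written one.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)   # total number of samples
    k = len(groups)   # number of classes
    grand_mean = mean(values)

    # Variation of the class means around the overall mean.
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum(
        (v - mean(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        # Degenerate case: no spread inside any class.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels, names):
    """Return (ranked feature names, score per feature), highest F first."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Hypothetical toy flows: (bytes, packets, duration in seconds).
NAMES = ["bytes", "packets", "duration"]
ROWS = [
    (1200.0, 10.0, 1.0),  # benign
    (1300.0, 12.0, 3.0),  # benign
    (1100.0, 11.0, 2.0),  # benign
    (200.0, 10.0, 2.0),   # malicious
    (300.0, 12.0, 1.0),   # malicious
    (250.0, 11.0, 3.0),   # malicious
]
LABELS = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malicious

ranked, scores = rank_features(ROWS, LABELS, NAMES)
print(ranked[0], scores)
```

In this toy data only `bytes` differs between the two classes, so it receives a high F-score while the other two features score zero. A selection step would keep the top-scoring features and drop the rest before training a classifier.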

Subject/keywords

backdoor detection, machine learning, NetFlow, Linux, malware, network security, anomaly detection, flow-based data, big data
