Flow-Based Detection of Linux Backdoor Communication - A NetFlow Based ML-Approach to Backdoor Detection in Linux Environments

dc.contributor.author: Espinosa, Naomi
dc.contributor.author: Malki, Lenia
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Sabelfeld, Andrei
dc.contributor.supervisor: Sabelfeld, Andrei
dc.date.accessioned: 2025-01-08T10:32:11Z
dc.date.available: 2025-01-08T10:32:11Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: The increasing prevalence of Linux-based systems and their susceptibility to malware attacks necessitates effective detection mechanisms for backdoor communication. This thesis explores the application of machine learning (ML) models to detect backdoor communication in Linux environments using flow-based data, specifically NetFlow traffic data. The study aims to determine how effectively ML techniques identify malicious patterns associated with backdoor communication without inspecting the actual payload. Linux systems are underrepresented in existing benchmark datasets, which predominantly focus on Windows environments. To address this gap, our research trains models on flow data specific to Linux malware and environments. After data preprocessing steps including feature mapping, aggregation, scaling, and feature selection with methods such as the ANOVA F-test, models were trained and evaluated on both benign and malicious traffic datasets. The results indicate that ensemble models such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) can effectively distinguish between normal and anomalous traffic patterns, highlighting the potential of flow-based detection systems in enhancing network security. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance, which further improved detection performance, though only in terms of precision. We conclude that flow-based data is a valuable tool for training models to classify malicious traffic in Linux environments. Future work will focus on acquiring or creating higher-quality datasets of Linux malware traffic to improve the capabilities of detection systems.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309054
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: backdoor detection
dc.subject: machine learning
dc.subject: NetFlow
dc.subject: Linux
dc.subject: malware
dc.subject: network security
dc.subject: anomaly detection
dc.subject: flow-based data
dc.subject: big data
dc.title: Flow-Based Detection of Linux Backdoor Communication - A NetFlow Based ML-Approach to Backdoor Detection in Linux Environments
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc
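
The abstract names the ANOVA F-test as one of the feature selection methods applied to the flow data. As a rough illustration of what that test measures, the sketch below computes the one-way ANOVA F-statistic per feature and ranks features by it. It is not the thesis's code: the feature names (`bytes`, `packets`, `duration`), the toy flow records, and the helper names `anova_f` and `rank_features` are all hypothetical and chosen only for the example.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("need at least two classes")
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels, names):
    """Return feature names sorted from highest to lowest F-statistic."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy flows: (bytes, packets, duration in seconds).
NAMES = ["bytes", "packets", "duration"]
ROWS = [
    (1200, 10, 0.5), (1500, 12, 2.0), (1100, 9, 1.0),  # label 0: benign
    (300, 11, 1.5), (250, 10, 0.7), (350, 12, 1.8),    # label 1: malicious
]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for name in rank_features(ROWS, LABELS, NAMES):
        column = [row[NAMES.index(name)] for row in ROWS]
        print(name, round(anova_f(column, LABELS), 3))
```

A higher F-statistic means the class means of a feature differ more relative to the spread within each class, so such features are kept first. In practice a library routine (for example a scikit-learn selector) would likely be used instead of a hand-written function; the version here only makes the arithmetic visible.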

Download

Original bundle

Name: CSE 24-48 LM NSE.pdf
Size: 1.8 MB
Format: Adobe Portable Document Format
