Malicious Traffic Generator for ML-Based Network Anomaly Detection

dc.contributor.author: Mourad, Mohammad
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (en)
dc.contributor.examiner: Yu, Yinan
dc.contributor.supervisor: Tran, Muoi
dc.contributor.supervisor: Chanis, Ilias
dc.date.accessioned: 2026-06-30T08:16:13Z
dc.date.issued: 2026
dc.date.submitted:
dc.description.abstract: Good-quality labelled network traffic remains a key bottleneck for research on machine learning-based intrusion detection. Public benchmarks such as the Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS2017) and University of New South Wales Network-Based 2015 (UNSW-NB15) datasets have played an important role in evaluation; however, these datasets are largely static and difficult to reproduce, adapt, or modify to fit the new requirements posed by containerized and service-oriented architectures. This thesis addresses the problem by proposing a reproducible framework to construct, replay, capture, and label malicious traffic from Packet Capture (PCAP) traces inside a Docker-based testbed. The framework identifies communicating hosts, protocol edges, DNS names, and Dynamic Host Configuration Protocol (DHCP) metadata in the input traces. These elements are then mapped onto a synthetic multi-zone topology, for which Docker Compose configurations are generated automatically. Traffic is rewritten and replayed from simulated source containers by a Scapy-based replay engine. A routed gateway serves as an observation point, a delay-injection point, and a capture point. Metadata about the replay process is stored as ground truth, the captured traffic is converted to Zeek connection logs, and flow labels are derived from the replay time windows and traffic-class metadata. An additional packet-to-flow mapping step improves data traceability. The thesis does not primarily contribute a new intrusion detection model; instead, it introduces a reproducible pipeline for constructing malicious traffic datasets. After early live-execution trials, the project shifted to a replay-based design to improve reproducibility, containment, and experimental control.
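The abstract states that flow labels are derived from replay time windows and traffic-class metadata. A minimal Python sketch of that idea follows, assuming that each ground-truth record holds a replay start and end time, the rewritten source address of the replaying container, and a class label. The names `ReplayWindow`, `ConnRecord`, `label_flow`, and `label_flows`, the endpoint-matching rule, and the default label `benign` are hypothetical illustrations, not the thesis's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ReplayWindow:
    """Hypothetical ground-truth entry written by the replay engine."""
    start: float   # replay start time, Unix epoch seconds
    end: float     # replay end time, Unix epoch seconds
    src_ip: str    # rewritten address of the replaying source container
    label: str     # traffic class, e.g. "portscan"


@dataclass
class ConnRecord:
    """Subset of Zeek conn.log fields needed for labelling."""
    uid: str
    ts: float        # connection start time
    duration: float  # use 0.0 when Zeek leaves the field unset ("-")
    orig_h: str      # id.orig_h
    resp_h: str      # id.resp_h


def label_flow(conn: ConnRecord, windows: List[ReplayWindow],
               default: str = "benign") -> str:
    """Return the class of the first window that fully contains the flow and
    whose source address matches either endpoint; otherwise return `default`."""
    conn_end = conn.ts + conn.duration
    for w in windows:
        inside = w.start <= conn.ts and conn_end <= w.end
        same_host = w.src_ip in (conn.orig_h, conn.resp_h)
        if inside and same_host:
            return w.label
    return default


def label_flows(conns: Iterable[ConnRecord], windows: List[ReplayWindow],
                default: str = "benign") -> Dict[str, str]:
    """Map each Zeek connection uid to a class label."""
    return {c.uid: label_flow(c, windows, default) for c in conns}


# Usage with made-up values:
windows = [ReplayWindow(100.0, 160.0, "10.0.1.5", "portscan")]
conns = [
    ConnRecord("C1", 110.0, 2.5, "10.0.1.5", "10.0.2.9"),  # inside window
    ConnRecord("C2", 170.0, 1.0, "10.0.1.5", "10.0.2.9"),  # after window
    ConnRecord("C3", 120.0, 0.5, "10.0.3.7", "10.0.2.9"),  # other host
]
print(label_flows(conns, windows))
# {'C1': 'portscan', 'C2': 'benign', 'C3': 'benign'}
```

Requiring full containment in the window is a conservative choice. An overlap test (`conn.ts < w.end and conn_end > w.start`) would also label flows that straddle a window boundary, at the cost of possibly mislabelling unrelated traffic near the edges.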
Preliminary machine learning (ML) evaluation using Zeek connection-level features and an Extreme Gradient Boosting (XGBoost) classifier showed that replay-generated datasets achieved classification performance close to that of datasets generated directly from the original PCAP traces. The results suggest that the replay process preserves many of the flow-level statistical properties relevant to ML-based intrusion detection.
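Zeek connection-level features of the kind mentioned above can be read from a conn.log written in Zeek's default tab-separated format, which names its columns in a `#fields` header line and writes `-` for unset values. The minimal Python sketch below assumes that format; the feature subset (`duration`, `orig_bytes`, `resp_bytes`, `orig_pkts`, `resp_pkts`), the function name `read_conn_features`, and the choice to map unset values to 0.0 are illustrative assumptions, not the feature set or preprocessing reported in the thesis. The sample log is truncated to a few columns for brevity.

```python
import io
from typing import Dict, List, TextIO

# Illustrative subset of numeric conn.log columns (assumed, not the thesis's list).
FEATURES = ["duration", "orig_bytes", "resp_bytes", "orig_pkts", "resp_pkts"]


def _to_float(value: str) -> float:
    # Zeek writes "-" for unset fields; map them to 0.0 here (a simplifying choice).
    return 0.0 if value in ("-", "(empty)") else float(value)


def read_conn_features(stream: TextIO) -> List[Dict[str, object]]:
    """Parse a tab-separated Zeek conn.log and return one dict per connection
    holding its uid and the numeric features listed in FEATURES."""
    fields: List[str] = []
    rows: List[Dict[str, object]] = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the "#fields" token
            continue
        if not line or line.startswith("#"):
            continue  # skip other header and footer lines
        record = dict(zip(fields, line.split("\t")))
        row: Dict[str, object] = {"uid": record["uid"]}
        row.update({name: _to_float(record[name]) for name in FEATURES})
        rows.append(row)
    return rows


# Usage with a made-up, column-truncated log:
sample = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto"
    "\tduration\torig_bytes\tresp_bytes\torig_pkts\tresp_pkts\n"
    "1700000000.0\tCa1\t10.0.1.5\t40000\t10.0.2.9\t80\ttcp\t0.5\t120\t3400\t4\t5\n"
    "1700000001.0\tCb2\t10.0.1.5\t40001\t10.0.2.9\t22\ttcp\t-\t-\t-\t1\t0\n"
)
rows = read_conn_features(io.StringIO(sample))
print(rows[0])
# {'uid': 'Ca1', 'duration': 0.5, 'orig_bytes': 120.0, 'resp_bytes': 3400.0, 'orig_pkts': 4.0, 'resp_pkts': 5.0}
```

Mapping unset values to 0.0 keeps the sketch simple; since XGBoost handles missing values natively, keeping them as NaN is a reasonable alternative.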
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/311648
dc.setspec.uppsok: Technology
dc.subject: malicious traffic generation, PCAP replay, Docker testbed, Zeek flow labelling, packet-to-flow traceability, intrusion detection, machine learning
dc.title: Malicious Traffic Generator for ML-Based Network Anomaly Detection
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc

Download

Original bundle

Name: CSE 26-38 MM.pdf
Size: 850.65 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed to upon submission