Malicious Traffic Generator for ML-Based Network Anomaly Detection

Mourad, Mohammad

dc.contributor.author	Mourad, Mohammad
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Yu, Yinan
dc.contributor.supervisor	Tran, Muoi
dc.contributor.supervisor	Chanis, Ilias
dc.date.accessioned	2026-06-30T08:16:13Z
dc.date.issued	2026
dc.date.submitted
dc.description.abstract	High-quality labelled network traffic remains a key bottleneck for research in machine learning-based intrusion detection. The public benchmarks Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS2017) and University of New South Wales Network-Based 2015 (UNSW-NB15) have played an important role in evaluation; however, these datasets are mostly static and difficult to reproduce, adapt, or modify to fit new requirements posed by containerization and service-oriented architectures. This thesis addresses this problem by proposing a reproducible framework to construct, replay, capture, and label malicious traffic from Packet Capture (PCAP) traces inside a Docker-based testbed. The framework identifies communicating hosts, protocol edges, DNS names, and Dynamic Host Configuration Protocol (DHCP) metadata from the input traces. These elements are then mapped onto a synthetic multi-zone topology, for which a Docker Compose configuration is generated automatically. Traffic is then rewritten and replayed from simulated source containers by a Scapy-based replay engine. A routed gateway serves as the observation point, the delay-injection point, and the capture point. Metadata about the replay process is stored as ground truth, traffic is converted to Zeek connection logs, and flow labels are derived from replay time windows and traffic class metadata. An additional packet-to-flow mapping step is performed to improve data traceability. A new intrusion detection model is not a key contribution of this thesis; instead, the thesis introduces a reproducible pipeline for constructing a malicious traffic dataset. After early live-execution trials, the project shifted to a replay-based design in order to improve reproducibility, containment, and experimental control.
Preliminary machine learning (ML) evaluation using Zeek connection-level features and an Extreme Gradient Boosting (XGBoost) classifier showed that replay-generated datasets achieved classification performance close to that of datasets generated directly from the original PCAP traces. The results suggest that the replay process preserved many of the flow-level statistical properties relevant for ML-based intrusion detection tasks.
dc.identifier.coursecode	DATX05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/311648
dc.setspec.uppsok	Technology
dc.subject	malicious traffic generation, PCAP replay, Docker testbed, Zeek flow labelling, packet-to-flow traceability, intrusion detection, machine learning
dc.title	Malicious Traffic Generator for ML-Based Network Anomaly Detection
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Computer systems and networks (MPCSN), MSc

Original bundle

Name:: CSE 26-38 MM.pdf
Size:: 850.65 KB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 2.35 KB
Format:: Item-specific license agreed upon to submission
Description:

Collections

Master's Theses (Examensarbeten för masterexamen)