Malicious Traffic Generator for ML-Based Network Anomaly Detection
| dc.contributor.author | Mourad, Mohammad | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Yu, Yinan | |
| dc.contributor.supervisor | Tran, Muoi | |
| dc.contributor.supervisor | Chanis, Ilias | |
| dc.date.accessioned | 2026-06-30T08:16:13Z | |
| dc.date.issued | 2026 | |
| dc.date.submitted | | |
| dc.description.abstract | The scarcity of good-quality labelled network traffic remains a key bottleneck for research in machine learning-based intrusion detection. The public benchmarks Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS2017) and University of New South Wales Network-Based 2015 (UNSW-NB15) have played an important role in evaluation; however, these datasets are mostly static and difficult to reproduce, adapt, or modify to fit the new requirements posed by containerization and service-oriented architectures. This thesis addresses this problem by proposing a reproducible framework to construct, replay, capture, and label malicious traffic from Packet Capture (PCAP) traces inside a Docker-based testbed. The framework identifies communicating hosts, protocol edges, DNS names, and Dynamic Host Configuration Protocol (DHCP) metadata from the input traces. These elements are then mapped into a synthetic multi-zone topology with automatic Docker Compose configuration generation. Traffic is then rewritten and replayed from simulated source containers via a Scapy-based replay engine. A routed gateway serves as an observation point, a delay-injection point, and a capture point. Metadata about the replay process is stored as ground truth, traffic is converted to Zeek connection logs, and flow labels are derived from replay time windows and traffic class metadata. An additional packet-to-flow mapping step is performed to improve data traceability. A new intrusion detection model is not a key contribution of this thesis; instead, the thesis introduces a reproducible pipeline for constructing a malicious traffic dataset. After early live-execution trials, the project shifted to a replay-based design in order to improve reproducibility, containment, and experimental control. 
Preliminary machine learning (ML) evaluation using Zeek connection-level features and an Extreme Gradient Boosting (XGBoost) classifier showed that replay-generated datasets achieved classification performance close to datasets generated directly from the original PCAP traces. The results suggest that the replay process preserved many of the flow-level statistical properties relevant for ML-based intrusion-detection tasks. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12380/311648 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | malicious traffic generation, PCAP replay, Docker testbed, Zeek flow labelling, packet-to-flow traceability, intrusion detection, machine learning. | |
| dc.title | Malicious Traffic Generator for ML-Based Network Anomaly Detection | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Computer systems and networks (MPCSN), MSc | |
