Scalable Anomaly-Based Network Intrusion Detection Using a Statistical Model and Data Sketches
Published
Author
Type
Master's Thesis
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
The growing frequency and complexity of cyber-attacks, especially Distributed Denial of Service (DDoS) attacks, have made protecting networks a major priority for businesses. Traditional Network Intrusion Detection Systems (NIDS) often struggle to cope with the large volumes of traffic seen in today's networks. These systems can be inefficient, often bogged down by high memory usage and significant computational demands. In this thesis, we propose a solution to these challenges by developing a more efficient, scalable system for detecting anomalies in network traffic. Our approach, the Baseline Configuration, combines a statistical model with probabilistic data structures, such as Count-Min Sketch and HeavyKeeper Sketch, to process high volumes of traffic in real time while keeping resource consumption to a minimum. At the core of the Baseline Configuration is a statistical model built around the Interquartile Range (IQR) rule, which adjusts a detection threshold based on changes in network traffic. This helps the system identify abnormal patterns without flagging harmless variations as threats. To make the system even more responsive, we incorporate sliding window techniques, enabling it to continuously monitor traffic in small, manageable time segments. This ensures that the system remains accurate and efficient, even when network traffic spikes. The performance of the proposed system is tested using different datasets, including traffic data from Ericsson and the Center for Applied Internet Data Analysis (CAIDA). CAIDA is a well-known repository that provides real-world internet traffic traces commonly used for network research. The memory efficiency and processing times are compared to a Hash Map and Priority Queue (HP) Configuration, which uses these data structures instead of the Count-Min and HeavyKeeper sketches. Additionally, the detection accuracy and performance of the Baseline Configuration are compared to a Machine Learning (ML) Configuration, which uses the Isolation Forest algorithm. The evaluation results demonstrate that the Baseline Configuration not only provides higher detection accuracy but also operates with significantly lower memory usage and faster response times than the other configurations. The system's ability to adapt to increasing traffic without compromising its performance makes it suitable for large-scale network environments. Through this work, it is shown that combining statistical models with data sketches provides a cost-effective, scalable, and efficient solution for real-time detection of DDoS attacks in network traffic.
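To make the combination of a data sketch and the IQR rule concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes fixed, non-overlapping time windows as a simplification of the sliding window, a plain Count-Min Sketch (HeavyKeeper is omitted), and the common IQR multiplier of 1.5. The names `CountMinSketch`, `iqr_upper_threshold`, `detect`, and the parameters `width`, `depth`, `k`, and `warmup` are hypothetical choices made for illustration.

```python
import hashlib
from statistics import quantiles


class CountMinSketch:
    """Fixed-size frequency estimator: may overestimate, never underestimates."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row stands in for `depth` independent hash functions.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += count

    def estimate(self, key):
        # The smallest counter is the one least inflated by hash collisions.
        return min(
            self.rows[row][self._index(key, row)] for row in range(self.depth)
        )


def iqr_upper_threshold(history, k=1.5):
    """Upper fence of the IQR rule: Q3 + k * (Q3 - Q1)."""
    q1, _, q3 = quantiles(history, n=4)
    return q3 + k * (q3 - q1)


def detect(windows, key, k=1.5, warmup=4):
    """Return indices of windows where the count for `key` exceeds the IQR fence.

    `windows` is a list of windows; each window is a list of keys
    (for example, destination IP addresses), one entry per packet.
    """
    history, alerts = [], []
    for i, packets in enumerate(windows):
        sketch = CountMinSketch()  # assumption: a fresh sketch per window
        for packet_key in packets:
            sketch.add(packet_key)
        count = sketch.estimate(key)
        if len(history) >= warmup and count > iqr_upper_threshold(history, k):
            alerts.append(i)
        else:
            # Only normal windows update the baseline, so the threshold
            # follows gradual traffic changes but not the attack itself.
            history.append(count)
    return alerts
```

Calling `detect(windows, "10.0.0.1")` on a list of per-window packet keys returns the indices of the windows flagged as anomalous for that key. Keeping flagged windows out of `history` is one possible design choice; it prevents a sustained flood from raising the threshold and masking itself.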
Description
Subject/keywords
Cybersecurity, Distributed Denial of Service (DDoS) Attacks, Network Intrusion Detection Systems (NIDS), Anomaly Detection, Statistical Models, Data Sketches, Count-Min Sketch, HeavyKeeper Sketch, Interquartile Range (IQR), Sliding Window Technique, Real-Time Traffic Monitoring, Scalable Detection Systems, Hash Map and Priority Queue Configuration, Memory Efficiency, Processing Efficiency, Machine Learning (ML), Isolation Forest, Detection Accuracy, Resource Efficiency, Traffic Analysis, Network Traffic Fluctuations, Large-Scale Network Security, Real-Time Network Intrusion Detection
