AI-Powered Behavioral Analysis of Vehicle Communication to Strengthen API Security
Master's thesis in Computer Science and Engineering
Johanna Edh & Aurora Veldhuis
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2025

© Johanna Edh & Aurora Veldhuis, 2025.

Supervisor: Adina Aniculaesei, Computer Science Department
Examiner: Wolfgang Ahrendt, Computer Science Department

Master's Thesis 2025
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LaTeX

Abstract

As vehicles become increasingly connected, the volume of API communication between cars and cloud-based services grows, exposing new security risks. Traditional rule-based security systems, such as AWS Web Application Firewall, are limited to detecting known threats and patterns that can be pre-defined in the ruleset. This thesis explores the use of AI-powered anomaly detection, specifically the Isolation Forest algorithm, as a complement to existing rule-based methods to secure API traffic in connected vehicles. A series of experiments were conducted using both synthetic and real-world API request data.
The results show that Isolation Forest can effectively detect anomalous requests, especially when trained on sufficiently large and representative datasets. Comparisons with a rule-based system revealed that AI-based methods might be better at identifying unknown threats, while rule-based filters remain reliable for known attack patterns. Overall, the study highlights the potential of combining machine learning with traditional approaches to create more adaptive and intelligent API security systems for connected vehicles.

Keywords: Anomaly Detection, Machine Learning, Vehicle Communication, Isolation Forest, API Communication, API Security

Statement Regarding The Use of Generative AI

In this report, generative AI tools have been used to assist with grammar, sentence formulation, and language refinement. Although the core analysis, ideas, and conclusions are the product of human authorship, AI has supported the writing process to improve clarity and readability. The AI tool used was ChatGPT. All critical thinking, experiment design, data interpretation, and decisions presented in the report were carried out by the authors.

Acknowledgements

A huge thank you to our supervisor, Adina Aniculaesei, from the Department of Computer Science at Chalmers, who helped us structure this study, reviewed all our drafts, and supported us throughout the entire process. We would also like to express our sincere thanks to Alfred Kjeller from WirelessCar, who helped us establish contact with the company. Without the support of both of you, this thesis would not have been possible. A final thank you to our examiner, Wolfgang Ahrendt, for your guidance.

Johanna Edh and Aurora Veldhuis, Gothenburg, 2025-06-10

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Goal
  1.4 Thesis Description
2 Theory
  2.1 API Communication
    2.1.1 APIs
    2.1.2 API Security
      2.1.2.1 Common Types of API Attacks
  2.2 Vehicle Communication
  2.3 AWS
    2.3.1 API Gateways and the Shared Responsibility Model
    2.3.2 AWS Lambda
    2.3.3 AWS S3
    2.3.4 AWS WAF
    2.3.5 Managed Rules
  2.4 Machine Learning and AI
    2.4.1 AI for API Security
    2.4.2 Behavioral Analysis
    2.4.3 Supervised and Unsupervised Learning Algorithms
    2.4.4 Isolation Forests
    2.4.5 Data Encoding
    2.4.6 Evaluation Metrics
  2.5 Related Work
3 Methods
  3.1 The Dataset
    3.1.1 Data Preprocessing
  3.2 Selection of Machine Learning Model
  3.3 Threat Modeling
  3.4 Isolation Forest Experiment Set Up
    3.4.1 Initial Experiment
    3.4.2 Contamination Experiment
    3.4.3 Dataset Size Experiment
      3.4.3.1 Experiment with Real Life Data
  3.5 Rule Based Testing Through AWS
  3.6 Evaluation
4 Results
  4.1 Isolation Forest Results
    4.1.1 Initial Experiment on Generated Dataset
      4.1.1.1 Generated Dataset with No Anomalies in Training Set
      4.1.1.2 Generated Dataset with a Subset of Handmade Anomalies in Training Batch
      4.1.1.3 Generated Dataset with Handmade Anomalies in Training and Testing Batch
    4.1.2 Experiments With Contamination On Generated Dataset
      4.1.2.1 Contamination Set to 10%
      4.1.2.2 Contamination Set to 5%
      4.1.2.3 Contamination Set to 15%
      4.1.2.4 Contamination Set to 'auto'
    4.1.3 Experiments with Different Dataset Sizes
      4.1.3.1 Dataset of Size 100
      4.1.3.2 Dataset of Size 1,000
      4.1.3.3 Dataset of Size 10,000
      4.1.3.4 Dataset of Size 100,000
    4.1.4 Experiments On Company Data
  4.2 Rule Based System Results
5 Discussion
  5.1 Initial Experiment Evaluation
  5.2 Contamination Experiment Evaluation
  5.3 Dataset Size Experiment Evaluation
  5.4 Real Life Data Experiment Evaluation
  5.5 Rule-Based System Experiment Evaluation
  5.6 Experiment Reliability
  5.7 Improvements and Future Work
6 Conclusion
Bibliography
A Appendix 1

List of Figures

1.1 Context of the project.
2.1 Isolation Forest visualization with three different iTrees.
4.1 The plot of anomaly scores of the isolation forest built and trained on a generated dataset of 1,000,000 samples where no anomalies are included. Here the test set contains 100 samples for better visualization.
4.2 The confusion matrix of the isolation forest built on a generated dataset where no anomalies are included.
4.3 The plot of anomaly scores of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. Here the test set contains 100 samples for better visualization.
4.4 The confusion matrix of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set.
4.5 The plot of anomaly scores of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. Here the test set contains 100 samples for better visualization.
4.6 The confusion matrix of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set.
4.7 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.8 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 5%.
4.9 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 15%.
4.10 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'.
4.11 The confusion matrix of an isolation forest built on a dataset of 100 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.12 The confusion matrix of an isolation forest built on a dataset of 1,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.13 The confusion matrix of an isolation forest built on a dataset of 10,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.14 The confusion matrix of an isolation forest built on a dataset of 100,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.15 The confusion matrix of an isolation forest built on a real-life dataset of 8,186 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'.
4.16 The confusion matrix from the AWS system set up with managed rules and the generated dataset.

List of Tables

2.1 Common attacks in an API system.
2.2 Core Rule Set (CRS).
2.3 Admin Protection Rule Set.
2.4 Known Bad Inputs Rule Set.
2.5 A confusion matrix for binary classification. TN and TP represent the number of correctly predicted classes, while FN and FP represent the number of misclassified instances.
3.1 Overview of the structure of the generated request logs.
3.2 Overview of some parameters available for construction of an isolation forest in Python.
3.3 Overview of the parameters available for constructing an Isolation Forest using scikit-learn in Python.
3.4 Test 1: Generated dataset with malicious requests only in testing.
3.5 Test 2: Generated dataset with malicious requests in both training and testing.
3.6 Test 3: Generated dataset with all malicious requests included in training.
3.7 Test 4: Contamination parameter set to 0.10.
3.8 Test 5: Contamination parameter set to 0.05.
3.9 Test 6: Contamination parameter set to 0.15.
3.10 Test 7: Contamination parameter set to 'auto'.
3.11 Test 8: Dataset size 100.
3.12 Test 9: Dataset size 1,000.
3.13 Test 10: Dataset size 10,000.
3.14 Test 11: Dataset size 100,000.
3.15 Test 12: Real company dataset (10% malicious).
3.16 Test 13: AWS setup with generated dataset (10% malicious).
4.1 Results of the initial experiments. Each result is illustrated with a corresponding plot of anomaly scores and a confusion matrix.
4.2 Results of tests on the contamination rate. Each result is illustrated with a corresponding confusion matrix.
4.3 Results of experiments on different dataset sizes. Each result is illustrated with a corresponding confusion matrix.
4.4 Results of experiments on the AWS setup, rule-based approach.
4.5 Examples of malicious requests (label = 1) allowed by AWS WAF.

1 Introduction

As vehicles become increasingly connected, the amount of data flowing between cars and backend systems grows exponentially. Each day, millions of API calls are exchanged between vehicles and servers around the world. Companies such as WirelessCar specialize in providing connected vehicle services where communication happens largely through API requests and responses. With so many data sources at work, making these API communications secure and reliable is crucial not only to protect sensitive user data, but also to maintain system integrity and availability. Unauthorized access, data breaches, and cyberattacks pose significant risks to both vehicle software manufacturers and users.

Traditional security measures like rule-based systems are coded to detect and block known threats using static patterns and preconfigured attack signatures. Such a system performs well against well-known vulnerabilities such as SQL injections and cross-site scripting (XSS). A rule-based system, however, does not perform well against new attacks or covert malicious activity that does not conform to known patterns or behaviors. To ensure the integrity and availability of vehicle communication networks, a more intelligent approach to API security is necessary [1]. This thesis explores the potential of AI-powered anomaly detection to strengthen API security in connected vehicles.
By leveraging machine learning techniques, we aim to complement existing rule-based methods with a more dynamic and predictive security solution.

1.1 Background

API security at WirelessCar is built on a multi-layered approach that includes solutions such as AWS Web Application Firewall (WAF), which uses predefined rule sets to help detect known attack patterns. While effective against well-documented threats such as SQL injections and cross-site scripting (XSS), these systems fall short when exposed to more sophisticated attacks that do not conform to known patterns. The increasing complexity of cyber threats requires more advanced security mechanisms. Anomaly detection, a technique commonly used in cybersecurity, can identify suspicious activity by analyzing deviations from common behavior. When applied to API requests, anomaly detection can potentially uncover previously unseen attack methods, improving the overall security of a connected vehicle ecosystem [1].

For this project, one of WirelessCar's applications dedicated to remote services is studied. The focus is on the traffic flowing from the application to WirelessCar's servers. The remote services this application handles include unlocking and locking the car, remote honking, flash control, and many more. This is illustrated in Figure 1.1.

Figure 1.1: Context of the project.

Finding and exploring new, effective, and more manageable solutions to aid API security is therefore a crucial step toward further expanding connectivity in large businesses. Protecting users and critical information is essential, as failure to do so can be costly. According to IBM, the average cost of a data breach in 2020 was $3.86 million [1].

1.2 Purpose

The purpose of this project is to evaluate how well an AI-powered solution can perform compared to a rule-based system. Specifically, this thesis aims to see how an AI model can be used to identify unusual patterns and parameters in API requests.
Ultimately, this thesis aims to provide insight into how AI and machine learning models can improve API security and complement traditional methods to pave the way for more dynamic and intelligent cybersecurity solutions in the world of connected vehicles.

To achieve this purpose, the study aims to answer the following research questions:

• How effectively can an AI-based anomaly detection model identify malicious API requests compared to a rule-based system?
• What types of attacks or anomalies can an AI model detect that rule-based security systems fail to recognize?
• What are the advantages and limitations of using AI-based anomaly detection for API security?

1.3 Goal

The goal of this thesis is to design, implement, and evaluate an AI-based anomaly detection system for API security. This will be achieved by:

• Implementing and training an AI model for detecting anomalies in API requests.
• Assessing its effectiveness in detecting real-world API threats.
• Comparing its performance against AWS WAF's rule-based security.
• Analyzing the strengths and weaknesses of both approaches.

In the long run, this research aims to contribute to the evolution of more adaptive cybersecurity solutions, paving the way for intelligent, self-learning security systems that can proactively defend against emerging threats.

1.4 Thesis Description

This thesis is structured as follows:

• Chapter 1 Introduction: Introduces the background, motivation, and goals of the thesis, including the research questions addressed.
• Chapter 2 Theory: Provides the theoretical background on API communication, security concerns, AWS infrastructure, and machine learning techniques relevant to anomaly detection.
• Chapter 3 Methods: Describes the experimental setup, including dataset creation and preprocessing, model selection, and the configuration of the Isolation Forest algorithm.
• Chapter 4 Results: Presents the outcomes of the conducted experiments, including performance evaluations of the AI model and comparisons with rule-based systems.
• Chapter 5 Discussion: Interprets the experimental findings, reflects on model reliability, and discusses the implications and limitations of the results.
• Chapter 6 Conclusion: Summarizes the main findings, connects them to the research questions, and outlines directions for future work.

2 Theory

This chapter provides the essential technical background necessary to fully understand the project. It begins by exploring API communication, covering the fundamentals of what an API is and the intricacies of vehicle communication within this context. Next, security concerns related to APIs are addressed, focusing on the challenges and solutions for ensuring secure data exchange. Following this, a section dedicated to AWS (Amazon Web Services) and machine learning is presented, outlining their relevance to the project. Finally, previous work in these areas is reviewed, offering additional context and insights into the project's foundation.

2.1 API Communication

While APIs enable efficient communication between systems, especially in connected vehicles, they also introduce a broad attack surface for malicious actors. The following section covers API communication, explaining what an API is, how APIs are used in vehicle communication, and, lastly, what security risks exist in an API infrastructure.

2.1.1 APIs

A study done by SlashData in 2020 showed that 90% of developers use APIs and that 30% of their time is spent coding APIs [2]. Application Programming Interfaces, or APIs, are a crucial part of smooth communication and technical development today. An API is a set of rules and protocols that enable effective exchange of data, features, and functionality between software applications.
This communication is mostly done through a series of requests and responses between clients and servers [3], the service sending the request being the client and the service receiving it being the server [4]. An API can be built according to different protocols or architectures, which determine how the API can be used and what its purpose is. Four common categories are [4]:

• SOAP API (Simple Object Access Protocol API): Uses XML as a messaging standard for network communication [4].
• RPC API (Remote Procedure Call): Allows a client to execute functions or procedures on a remote server as if they were local [4].
• WebSocket API: Supports real-time, two-way communication using JSON objects to pass data [4].
• REST API (Representational State Transfer API): Uses HTTP requests such as GET, PUT, HEAD, and DELETE to interact with resources and is the most widely used architecture for web services [5].

There are also four main types of APIs, each operating within a different scope. Public APIs are accessible to any external entity, while private APIs, also called internal APIs, are restricted to communication within an organization. Partner APIs are designed for external use but are only available to select outside services and users, often in business-to-business interactions. Lastly, composite APIs combine multiple API types, allowing them to work together in sequence or as a unified system [5].

2.1.2 API Security

The widespread usage of APIs for communication creates vulnerabilities that malicious actors can take advantage of. Securing APIs has therefore become a key part of keeping a system secure and trustworthy. More than 83% of all internet traffic in 2018 was attributed to Web APIs, and around 30% of all authentication attempts on APIs were found to come from malicious actors [6].
It was also predicted that 90% of web-enabled applications would be exposed to cyberattacks by 2021 due to inadequate API security measures [1]. Millions of API calls occur daily, generating huge traffic volumes that are difficult to analyze; this amount of traffic leaves a system with a massive number of potentially exploitable vulnerabilities.

In 2001 a foundation called the Open Worldwide Application Security Project (OWASP) was launched with the mission "To be the global open community that powers secure software through education, tools, and collaboration". Since its launch, OWASP has become a major contributor to the field of cybersecurity. In 2023 they released a list of the top 10 API security risks, defined as follows [7]:

• Broken Object Level Authorization (BOLA)
• Broken Authentication
• Broken Object Property Level Authorization
• Unrestricted Resource Consumption
• Broken Function Level Authorization
• Unrestricted Access to Sensitive Business Flows
• Server-Side Request Forgery (SSRF)
• Security Misconfiguration
• Improper Inventory Management
• Unsafe Consumption of APIs

API security is a combination of several established security disciplines, such as information security, network security, and application security. Information security focuses on protecting data throughout its life cycle, network security ensures the safe transmission of data and the protection of the network as a whole, and application security ensures that a software system is designed to withstand attacks. Knowledge from all of these practices is needed when designing a secure API system [8].

Key components of API security include API gateways, which serve as mediators between clients and backend services. These gateways act as reverse proxies, combining multiple APIs into what appears to be a single API for users.
This not only simplifies interactions by providing a more unified interface but also enhances security by enforcing authentication, authorization, and rate-limiting policies to prevent abuse and unauthorized access [8].

Another crucial security measure is a Web Application Firewall (WAF), which operates at a higher level than traditional firewalls by inspecting and filtering HTTP traffic. Unlike standard firewalls that primarily control access based on IP addresses and ports, a WAF analyzes the content of incoming requests, blocking common attacks such as SQL injections, cross-site scripting (XSS), and other threats targeting web applications [8].

To further strengthen security, organizations can also implement an intrusion detection system (IDS) and an intrusion prevention system (IPS). These tools monitor internal network traffic for anomalies or malicious activity. An IDS primarily detects and alerts administrators to suspicious patterns, while an IPS goes a step further by actively blocking potentially harmful traffic before it reaches critical systems. Together, these security measures create multiple layers of protection for an application [8].

In general, API security aims to achieve three key goals: confidentiality, ensuring that information can only be read by its intended audience; integrity, preventing unauthorized modification, creation, or deletion of information; and availability, ensuring that legitimate users can access the information when needed. Other desirable properties include accountability and nonrepudiation, which ensure that actions can be traced back to the user who performed them and that the user cannot deny having done so [8].

Building a secure and effective solution for cloud services is demanding and costly; therefore, companies usually rely on a cloud service provider (CSP). There are three major CSPs today: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
These providers offer companies an environment, technology, and infrastructure with which to set up services. Each provider supports different cloud architectures and deployment models, allowing businesses to choose solutions based on their needs [9].

2.1.2.1 Common Types of API Attacks

Table 2.1 presents some of the most common attacks in an API system [10]:

Table 2.1: Common attacks in an API system.

• Injection Attacks
  Target: Request body, query parameters, headers.
  How: Injecting malicious code (SQL, XML, commands) into input fields.
  Risk: Data theft, manipulation, or full system compromise.

• DoS/DDoS
  Target: API endpoints (high request volume/frequency).
  How: Overloading the server with excessive requests from one (DoS) or many (DDoS) sources.
  Risk: Service outage, degraded performance, financial and reputational loss.

• Authentication Hijacking
  Target: Authorization headers, tokens (e.g., JWT).
  How: Stealing or manipulating tokens to impersonate legitimate users.
  Risk: Unauthorized access, data breaches, identity theft.

• Data Exposure
  Target: API responses, data in transit.
  How: Exposing sensitive data due to design flaws or lack of encryption.
  Risk: Privacy violations, regulatory fines, sensitive data leaks.

• Parameter Tampering
  Target: URL query parameters, path parameters, request body.
  How: Manipulating parameters (e.g., user_id, limit, price) to access data or alter transactions.
  Risk: Data leaks, unauthorized actions, financial fraud.

• Man-in-the-Middle (MitM)
  Target: Data in transit (client ↔ API server).
  How: Intercepting or modifying API communication (exploiting weak or missing TLS/SSL).
  Risk: Data theft, injection of malicious data, session hijacking.

2.2 Vehicle Communication

Vehicle communication involves how a car gathers, processes, and shares data both within the vehicle's systems and with external devices or platforms. This is the core of modern vehicle telematics and infotainment systems, allowing cars to become part of the Internet of Things (IoT) [11].
A vehicle telematics system connects a car to the outside world. The technology uses telecommunications and computers to send, receive, and store information about the vehicle and driving patterns. Its purpose is safety, vehicle monitoring, and communication. The system can include GPS tracking; automatic alerts in case of an accident on the road ahead of the ego-vehicle; remote control of the vehicle, such as locking or unlocking it; vehicle health reports and maintenance reminders; emergency help; and speed monitoring. In some cases, reports and patterns collected by a telematics system can be used as a basis for a lower insurance premium if the driver is considered to behave safely in traffic. This is called Usage-Based Insurance (UBI) [12].

One major advancement in telematics is its use in smartphone-based platforms. Instead of relying solely on factory-installed systems, mobile apps and aftermarket devices can now provide telematics functions, making the technology more accessible and widely adopted [12].

While telematics focuses on vehicle monitoring and external communication, infotainment systems are designed for in-car entertainment and user interaction. Modern infotainment systems include touchscreen interfaces, voice commands, Bluetooth connectivity, and smartphone integration, making driving more convenient and connected [13].

2.3 AWS

Amazon Web Services (AWS) offers a suite of tools and services to enhance API security, primarily through the Amazon API Gateway. The following section covers the AWS services used in the project.

2.3.1 API Gateways and the Shared Responsibility Model

The API Gateway acts as the entry point for applications to access data, logic, or functionality from back-end services, managing traffic, security, and API monitoring. It supports both REST and WebSocket APIs, making it suitable for various applications, including serverless and container-based solutions.
Additionally, API Gateway provides version control and seamless integration with other AWS services to enhance security and monitoring [14].

One of the fundamental aspects of API security in AWS is the shared responsibility model. AWS is responsible for the security of the cloud, which includes protecting the infrastructure that runs AWS services. This encompasses the physical security of data centers, hardware, and the software that operates AWS services. Customers are responsible for security in the cloud, which entails managing the security of their applications and data. This includes configuring security settings, managing access controls, and ensuring data protection [15].

2.3.2 AWS Lambda

When deploying an API Gateway in AWS, you have the option of avoiding the need for an actual server to handle responses. This is done through AWS Lambda, a serverless compute service that lets you run code without having to manage servers or clusters. The service executes code in response to events and automatically manages the underlying resources, meaning that the user does not need to handle any provisioning or infrastructure. One of the big advantages of AWS Lambda is automatic scaling in response to requests of any size. AWS Lambda can be used for a variety of applications, including fast large-scale data processing, running interactive web and mobile backends, and creating event-driven applications [16].

2.3.3 AWS S3

Amazon Simple Storage Service (Amazon S3) is a scalable object storage service used to store and retrieve any amount of data from anywhere. Data is stored as objects in buckets, each uniquely identified by a key. It offers multiple storage classes with cost-efficient pricing and automated life cycle management. Common use cases include data lakes, backup and recovery, low-cost archiving, and powering generative AI applications [17].
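As a brief illustration of the Lambda model described in Section 2.3.2, a Lambda function behind an API Gateway is just a handler that receives the incoming request as an event object and returns a response; AWS provisions and scales the compute around it. The sketch below follows the standard API Gateway proxy integration event format, but the paths and messages are hypothetical examples, not part of WirelessCar's system.

```python
import json

def lambda_handler(event, context):
    """Minimal handler for an API Gateway proxy integration.

    API Gateway packages the incoming HTTP request into `event`
    (method, path, headers, body) and invokes this function once
    per request; no server has to be provisioned or managed.
    """
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")

    if method != "GET":
        # Reject anything but reads in this toy example.
        return {"statusCode": 405,
                "body": json.dumps({"error": "method not allowed"})}

    return {"statusCode": 200,
            "body": json.dumps({"message": f"handled {method} {path}"})}
```

Invoked with, say, `{"httpMethod": "GET", "path": "/vehicle/status"}`, the handler returns a 200 response whose body echoes the request line; the same function scales transparently from one request to millions.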
2.3.4 AWS WAF

AWS WAF (Web Application Firewall) is a service that protects your web applications, such as an API Gateway, from common attacks and exploits. With AWS WAF, you can create security rules to manage bot traffic and block common attack patterns such as SQL injection and cross-site scripting (XSS). One of the advantages of AWS WAF is that it saves time by using managed rules. It also makes it easier to monitor, block, or rate-limit common and recurring bot traffic. In addition, it improves visibility into your web traffic by giving you detailed control over how statistics are generated [18]. Use cases for AWS WAF include filtering web traffic by creating rules based on various conditions such as IP addresses, HTTP headers and body, or custom URIs. It can also be used to prevent fraud related to account takeovers by monitoring the application's login page for unauthorized access with compromised credentials. Furthermore, AWS WAF can be managed through APIs, enabling automated rule creation and maintenance, as well as integration into the development and design process [18].

2.3.5 Managed Rules

AWS Managed Rules is a managed service that provides protection against application vulnerabilities and other unwanted traffic. These are collections of predefined, ready-to-use rules created and maintained by AWS and vendors on AWS Marketplace. Rule groups from AWS Managed Rules can be added to a Web Access Control List (ACL) to protect an application [19]. These rule groups are designed to protect against common web threats and, when used as documented, add an extra layer of security to your applications. However, they are not intended to replace the user's own security responsibilities, which depend on what AWS resources are in use. Many AWS and Marketplace vendors provide automatic updates to these rule groups as new vulnerabilities and threats are discovered.
In some cases, AWS may receive information about vulnerabilities before public disclosure, allowing preemptive updates to AWS Managed Rules. To protect vendors' intellectual property and prevent malicious actors from bypassing the rules, the details of individual rules within a managed rule group are not fully visible [20]. Baseline managed rule groups provide general protection against a wide range of common threats. Users can select one or more of these rule groups to establish basic protection for their resources. The following rule groups are included in the baseline category [21]:

Table 2.2: Core Rule Set (CRS).
Rule Group: Core Rule Set (CRS)
Name: AWSManagedRulesCommonRuleSet
Description: Provides general protection against a wide range of vulnerabilities, including OWASP Top 10 risks. Adds labels for monitoring and further rule evaluation.
Example Rules:
• NoUserAgent_HEADER: Blocks requests without a User-Agent header.
• UserAgent_BadBots_HEADER: Blocks bad bots (e.g., Nessus, Nmap).
• SizeRestrictions: Blocks oversized query strings, cookies, body, or URI paths.
• EC2MetaDataSSRF: Blocks attempts to exfiltrate EC2 metadata.
• GenericLFI: Detects Local File Inclusion (LFI) attacks.
• RestrictedExtensions: Blocks unsafe system file extensions.
• GenericRFI: Detects Remote File Inclusion (RFI) attempts.
• CrossSiteScripting: Detects common XSS patterns.

Table 2.3: Admin Protection Rule Set.
Rule Group: Admin Protection
Name: AWSManagedRulesAdminProtectionRuleSet
Description: Blocks external access to common administrative paths, reducing the risk of unauthorized access to administrative interfaces. Adds labels for monitoring.
Example Rule:
• AdminProtection_URIPATH: Blocks requests to known admin paths (e.g., sqlmanager).

Table 2.4: Known Bad Inputs Rule Set.
Rule Group: Known Bad Inputs
Name: AWSManagedRulesKnownBadInputsRuleSet
Description: Blocks known bad patterns commonly associated with exploitation or vulnerability discovery.
Adds labels for monitoring.
Example Rules:
• Java Deserialization RCE detection across headers, body, URI path, and query string.
• Host_localhost_HEADER: Blocks Host headers targeting localhost.
• PROPFIND_METHOD: Blocks the HTTP PROPFIND method.
• ExploitablePaths_URIPATH: Blocks attempts to access exploitable paths (e.g., web-inf).
• Log4j (CVE-2021-44228) detection in headers, body, URI path, and query string.

2.4 Machine Learning and AI

Machine learning (ML) is a subset of artificial intelligence (AI) focused on enabling computers and machines to learn and adapt by identifying patterns. It involves developing algorithms that allow systems to perform tasks autonomously and improve their accuracy by being exposed to training data [22].

2.4.1 AI for API Security

AI and ML are increasingly used in cybersecurity, both for application and network protection and for threat and risk analysis across data sources. ML models are good at identifying relationships between various types of threats, suspicious IP addresses, and abnormal behaviors through advanced data analysis. In real-time applications where fast responses are critical, ML's ability to quickly analyze security data, make decisions, and trigger actions is valuable for API developers and managers [23]. Traditional API security measures focus on access control, mainly through authentication, authorization, rate limiting, and network privacy. Although these provide protection at some layers, they are not always sufficient to address more specialized threats such as API-specific Denial of Service (DoS) attacks, application-layer attacks, data exfiltration, or credential-based attacks. AI-powered security solutions are a great complement to traditional methods, providing deeper insights into API traffic patterns, historical attacks, and real-time anomaly detection. AI solutions enable proactive defense mechanisms that adapt to new and evolving threats [23].
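As a toy illustration of the kind of traffic-profile anomaly flagging described above (not from the thesis; the data, threshold, and helper are our own invention), even a crude volumetric check over per-client request counts can surface an outlier that a fixed rule set might miss:

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical request log: source IPs of incoming API calls.
ips = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 6 + ["10.0.0.3"] * 4 + ["10.0.0.9"] * 60

counts = Counter(ips)
mu = mean(counts.values())
sigma = pstdev(counts.values())

# Flag IPs whose request volume lies more than 1.5 standard deviations
# above the mean (a crude stand-in for a learned traffic profile).
suspicious = [ip for ip, n in counts.items() if sigma and (n - mu) / sigma > 1.5]
```

A learned model replaces the hand-picked threshold with a profile fitted to historical traffic, which is the adaptivity the cited literature argues for.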
A comprehensive API security strategy requires not only basic security functions but also anomaly detection capabilities. This serves as a first line of defense, where malicious behavior can be detected and flagged immediately, often without prior knowledge of specific attacks or pre-written rules. AI and ML are well suited for building intelligent API security solutions capable of identifying unusual behaviors, harmful data trends, and blocking attacks in dynamic environments. Over time, such systems can continuously learn and improve, detecting deviations from normal behavior even without explicit attack signatures or policies [23]. Various machine learning algorithms, such as Naïve Bayes, K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Support Vector Machines (SVM), as well as Deep Learning models and Neural Networks, are commonly recommended and applied in API security to strengthen detection and response capabilities [23].

2.4.2 Behavioral Analysis

The growing use of IoT (Internet of Things) technology, such as vehicle telematics systems, has made these devices attractive targets for attackers who exploit common security and access control weaknesses. Many IoT threats rely on simple vulnerabilities. A well-known example is the Mirai botnet, which exploited the Telnet protocol due to weak or default security configurations. Botnets like Mirai can use DNS to communicate with their Command and Control (C2) servers or even leverage DNS itself as an attack vector to increase traffic [24]. Event logging and notification systems are fundamental to effective cybersecurity. Traditional Intrusion Detection Systems (IDS) primarily focus on analyzing system and network logs. However, as noted in the literature, monitoring can also involve direct traffic analysis, ranging from advanced honeypot-based detection to more traditional approaches like deep packet inspection (DPI) or proxy-based analysis at central network nodes.
A combination of these methods can result in a robust hybrid IDS capable of identifying various types of malicious traffic [24]. Signature-based detection remains a powerful and relatively user-friendly approach, though heuristic and anomaly-based methods may trigger more false positives. Honeypots offer a unique strategy by intentionally exposing endpoints to attract suspicious traffic. This traffic often exhibits abnormal patterns, which can be used to generate new detection signatures [24]. Large-scale analysis of REST API usage reveals that many APIs suffer from design flaws, such as improper use of HTTP methods and operation tunneling through query parameters, both of which diverge from standard RESTful practices. These poor implementation choices can introduce detectable anomalies in behavior, supporting the case for behavior-based anomaly detection systems [25]. Another common issue is the inconsistent naming and structuring of resources within RESTful APIs. A study examining real-world APIs like Facebook, Twitter, and YouTube identified frequent mistakes known as linguistic antipatterns. For example, a URL such as https://www.example.com/newspapers/players?id=123 combines two unrelated resources, "newspapers" and "players", in a single endpoint. This can confuse both developers and automated systems, making it unclear what the API is meant to do. In behavior analysis, such deviations may be flagged as anomalies since the structure breaks typical design patterns. For models trained to recognize normal behavior, these flawed requests can increase error rates and false positives. Detecting antipatterns is therefore essential not only for better API design but also for improving the reliability of behavior-based detection systems [26]. API behavior analysis is not only a technical security tool. It also serves as a valuable business intelligence asset. It provides insight into how applications, services, and users interact.
This goes beyond simply measuring how often an API is used. It involves collecting meaningful data that helps organizations understand usage patterns and their impact [27]. By analyzing the frequency and nature of API calls, businesses can identify popular features, usability issues, or areas where users encounter problems. These insights help teams better understand user behavior and adjust services accordingly. API analysis also reveals market trends, enabling companies to stay agile and meet evolving demands [27].

2.4.3 Supervised and Unsupervised Learning Algorithms

Two common learning techniques in machine learning are supervised learning and unsupervised learning. Supervised learning involves algorithms trained on labeled data, meaning each input is paired with a correct known output. Such models learn by comparing their predictions with the corresponding labels and adjusting themselves to minimize errors. Unsupervised learning is another approach where the input data is unlabeled and the goal is to recognize patterns and structures by only considering the data's features [28]. Algorithms developed with supervised learning are commonly used to solve classification problems, where the goal is to determine the correct class for a given input. Some commonly used models for such problems are listed below:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• K-Nearest Neighbors
• Convolutional Neural Networks

Unsupervised learning models are useful for anomaly detection, a component of data analysis aimed at identifying irregularities among normal data. Since these models do not require labeled data, they can detect anomalies in an unsupervised manner [29].
Some commonly used models for such problems are listed below [30]:
• Isolation Forests
• K-Means Clustering
• One-class support vector machine (SVM)
• One-class SVM with stochastic gradient descent (SGD)
• Robust covariance

2.4.4 Isolation Forests

In an Isolation Forest, the goal is to isolate anomalies, where anomalies are data points that differ from the rest of a dataset. Within this concept, isolation refers to the process of separating an instance from the rest of the data. The separation process relies on the characteristics of the anomalies: they are few in number and exhibit attribute values that differ from those of normal data instances. With this in consideration, Isolation Forests are constructed using binary trees, where instances are partitioned recursively. Anomalies tend to be isolated with a shorter path in such a tree due to their rarity and distinct attribute values [31]. Each binary tree in an Isolation Forest, in this context also called an isolation tree (or iTree), consists of two different types of nodes. A node is either an external node with no children or an internal node with two daughter nodes. A tree is grown by sampling instances from the data set and, at each node, randomly selecting an attribute q and a split value p. The test q < p determines whether the path to a data point travels to the left or right daughter node. This process continues until only one data point remains in a node or all instances at a node have the same value of q. Hence, a path in an isolation tree, from the root node to a leaf node, represents an instance of a data point from the considered data set. The traversal of a data point depends on the randomly selected splits [31]. To be able to determine which external nodes represent anomalies, an anomaly score is calculated for each data instance.
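The recursive partitioning described above can be sketched for one-dimensional data. This is a simplified toy of our own, not the full multi-attribute algorithm from [31], but it shows the key effect: a clearly separated point tends to be isolated after far fewer random splits than a point inside the main cluster.

```python
import random

def isolation_depth(point, data, rng, depth=0):
    """Number of random splits needed to isolate `point` from `data` (1-D toy)."""
    if len(data) <= 1:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:                          # remaining instances are identical
        return depth
    p = rng.uniform(lo, hi)               # random split value, as in an iTree node
    same_side = [v for v in data if (v < p) == (point < p)]
    return isolation_depth(point, same_side, rng, depth + 1)

def avg_depth(point, data, trials=200, seed=1):
    """Average path length over many randomly built trees (a small 'forest')."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trials)) / trials

rng = random.Random(0)
cluster = [10.0 + rng.random() for _ in range(100)]   # normal points in [10, 11]
anomaly = 50.0                                        # far from the cluster
data = cluster + [anomaly]
```

Here `avg_depth(anomaly, data)` is close to 1, while `avg_depth(cluster[0], data)` is several times larger; averaging these path lengths over an ensemble of trees is exactly what the anomaly score discussed next builds on.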
Since anomalies are rare and have distinct attribute values, they are more likely to be isolated early in the tree, resulting in a shorter path from the root node to an external node. With the results from multiple isolation trees combined into an Isolation Forest, anomalies are detected by analyzing the average path length of each data point. This average path length is used to derive an anomaly score s, with s defined as:

s(x, ψ) = 2^(−E(h(x)) / c(ψ))    (2.1)

where h(x) is the path length of data point x, E(h(x)) is the average path length of data point x, and c(ψ) is the average path length of an unsuccessful search in a Binary Search Tree (BST) of size ψ. c(ψ) can be utilized in this way due to the structural equivalence between isolation trees and a BST. c(ψ) is defined as:

c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ    for ψ > 2
c(ψ) = 1                         for ψ = 2
c(ψ) = 0                         otherwise    (2.2)

with H(i) being the harmonic number, estimated as H(i) ≈ ln(i) + 0.5772156649 (Euler's constant) [31]. The anomaly score s(x, ψ) lies in the range (0, 1], where a score close to 1 suggests that the data point is likely an anomaly, and a score close to 0 indicates it is likely normal [31]. A visualization of how an Isolation Forest works is shown in Figure 2.1. In this figure, three different isolation trees are displayed. Each tree represents decision paths for a subset of data points in the dataset. Normal data points are visualized as blue nodes, while anomalies are shown as red nodes. The deeper a node appears in a tree, the more similar the corresponding data point is to the majority of the data. If a data point is isolated early in the tree (i.e., closer to the root), it indicates that the point is significantly different from the rest, and it is therefore classified as an anomaly.

Figure 2.1: Isolation Forest visualization with three different iTrees.

2.4.5 Data Encoding

A machine learning model typically requires data to be represented as numerical values.
This means raw data must be transformed before the model can interpret it. This transformation is usually done using an encoding tool designed to preserve patterns, balance or normalize the data, and handle missing values to prevent errors [32]. A commonly used paradigm for encoding and data transformations is the fit-predict paradigm. The transformer should be fitted only on the training data, for example, recording the mean and standard deviation when using a standard scaler. The training data is then transformed using the fitted transformer before training the model. The same fitted transformer is later used to transform the test data before evaluation. Fitting the transformer on the entire dataset before splitting can lead to data leakage, resulting in misleading model evaluations [32]. Two commonly used encoding types are a one-hot encoder and a text data encoder. One-hot encoding is used for nominal data, creating binary features for each category without implying order. It converts a feature with n values into n separate 0/1 features, which can increase dimensionality and requires handling unseen categories in the test set. When encoding text data, the data must be converted into numerical form for the ML models. This is often done by tokenizing the text into words, assigning unique indexes, and representing each sample as word indexes. These indexes can then be transformed using one-hot encoding or embeddings [32]. Scikit-learn1 offers a variety of encoding tools that work well for ML models [33]. The OneHotEncoder, CountVectorizer, and TfidfVectorizer are three of them. The OneHotEncoder is typically used for categorical data that has no inherent order [34], while the CountVectorizer and TfidfVectorizer are used for text data. The CountVectorizer creates a bag-of-words representation, meaning the text is tokenized into words and each word's occurrence is counted [35].
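The bag-of-words encoding and the fit-only-on-training-data rule described above can be sketched as follows. The toy strings below are our own illustration, not the thesis dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["GET /usr HTTP/1.0", "POST /usr/login HTTP/1.0"]
test_docs = ["DELETE /usr HTTP/1.0"]

# Fit the transformer on the training data only; fitting on the full
# dataset before splitting would leak test-set vocabulary into training.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)   # learns the vocabulary
X_test = vectorizer.transform(test_docs)         # reuses it; unseen tokens are ignored
```

With scikit-learn's default tokenization, the learned vocabulary here is {get, http, login, post, usr}, so the test request contributes counts only for the tokens it shares with the training data; the unseen token "delete" is silently dropped rather than given a new column.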
The TfidfVectorizer applies Term Frequency-Inverse Document Frequency (TF-IDF) weighting instead of raw counts, assigning higher importance to words that appear frequently in a document but rarely across the dataset. This method helps to reduce the impact of high-frequency words [36].

2.4.6 Evaluation Metrics

Various metrics are commonly used to evaluate machine learning models. Two of the most widely used metrics for classification problems are the accuracy and the confusion matrix, both of which are used in this project. Accuracy measures the proportion of correct predictions made by a model relative to the total number of input samples. It is calculated as the ratio of correctly classified instances to the total number of predictions [37]:

Accuracy = (number of correct predictions) / (total number of predictions)

A confusion matrix is a square matrix that visualizes the performance of a classification model. Each row represents the real class labels, while each column represents the predicted class labels. This structure helps to identify where the model is making correct predictions and where it is not, providing a more detailed understanding of its performance beyond the accuracy [37]. An example of the structure of a confusion matrix can be seen in Table 2.5.

1scikit-learn, a Python library for machine learning. Available at: https://scikit-learn.org/stable/

                   Predicted: Negative     Predicted: Positive
Real: Negative     True Negative (TN)      False Positive (FP)
Real: Positive     False Negative (FN)     True Positive (TP)

Table 2.5: A confusion matrix for binary classification. TN and TP represent the number of correctly predicted classes, while FN and FP represent the number of misclassified instances.

2.5 Related Work

In [38], Alfardus and Rawat propose a machine learning approach to improve security in in-vehicle networks (IVNs).
These networks are used for communication between different components in modern cars, such as sensors, infotainment systems, and control units. However, as vehicles become more connected, IVNs also become more vulnerable to cyber-attacks. To address this problem, the authors explore how deep learning and feature engineering can be used to detect anomalies in IVNs and thereby improve cybersecurity. They used real-world IVN traffic data for the experiment, which included both normal and attack traffic. Before training any models, the dataset was normalized so that it was easier to work with. After that, useful features were chosen and extracted from the traffic data. These included statistical features, such as the average and variance of the signals, as well as time and frequency domain features. A convolutional neural network (CNN) was used to learn deep features: complex patterns in the data that might not be easy to see with traditional methods. These features were used to train two deep learning models. The first model was a deep neural network (DNN), which was used for direct classification of traffic as normal or abnormal. The second model was a deep autoencoder, which was trained only on normal traffic and then tested on its ability to reconstruct inputs. If the autoencoder could not reconstruct an input well, it was likely an anomaly. This idea is based on the assumption that normal data is easy to reconstruct, while attack data is not. The experiment produced promising results. Their proposed method achieved around 95% accuracy, with an F1 score of 0.95. This means that the system was good at detecting actual attacks while avoiding false alarms. When the deep learning models were compared to more traditional machine learning models like Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN), the deep learning models performed better in all evaluation metrics.
This suggests that using a combination of feature engineering and deep learning can significantly improve the performance of anomaly detection in IVNs. The importance of carefully selecting and tuning the models' hyperparameters was also discussed by the authors. For example, they found that a smaller learning rate and a slightly deeper network led to better results. In addition, they emphasized that combining hand-crafted features with learned features from CNNs gave a more complete picture of the network traffic, which improved detection performance. In conclusion, the study shows that deep learning models, especially when combined with good feature engineering, are a powerful tool for detecting anomalies in vehicle networks. However, the authors noted that future work is needed to make the models more robust against more complex or unknown types of attacks. They also suggested using larger and more diverse datasets in future studies to improve the generalization of the models. In [39], Edmund Fosu Agyemang provides a comprehensive analysis of five prominent unsupervised machine learning algorithms tailored for anomaly detection. The algorithms evaluated include One-Class Support Vector Machine (One-Class SVM), One-Class SVM with Stochastic Gradient Descent (SGD), Isolation Forest, Local Outlier Factor (LOF), and Robust Covariance (also known as the Elliptic Envelope method). The purpose of the study was to explore how these models perform in controlled simulation settings and to provide insight into their practical applicability. The study was conducted using a synthetically generated dataset designed to mimic real-world scenarios where anomalies are rare and well separated from the normal data. The data consisted of two-dimensional features, with 100 normal points clustered around two centers and 20 uniformly distributed anomalies.
This setup allowed for a focused investigation into how each model responds to clear outliers, an ideal setting to understand baseline performance, although less representative of messy real-world data. A key aspect of the research was that all models were trained exclusively on normal data points. This reflects a common challenge in anomaly detection: the rarity or complete absence of labeled anomalies during training. The models were then evaluated based on their ability to correctly identify outliers using accuracy, precision, recall, and F1 score. These metrics provided a nuanced understanding of the trade-offs each algorithm would entail. Special attention was given to the model selection process and the motivations behind each algorithm's inclusion. One-Class SVM, a boundary-based method, was selected due to its ability to encapsulate the region containing normal data and identify deviations. Its variant, One-Class SVM with SGD, was introduced to address scalability issues by enabling more efficient training on large datasets using stochastic updates. Isolation Forest was included for its unique approach of isolating anomalies through random splits, making it effective and fast in high-dimensional settings. LOF was selected as a representative of density-based methods, assessing local density deviations to identify outliers. Lastly, Robust Covariance was chosen for its statistical grounding in modeling data distribution and identifying anomalies as points lying outside an estimated Gaussian envelope. The study found that the performance of each algorithm was highly dependent on the characteristics of the dataset. One-Class SVM and Robust Covariance achieved perfect recall but suffered from moderate precision, suggesting they were good at capturing all anomalies but often at the cost of misclassifying some normal data points.
In contrast, One-Class SVM with SGD exhibited excellent precision, meaning it rarely misclassified normal points, but at the expense of very low recall, missing many actual outliers. This makes it suitable in contexts where false positives are especially costly. Isolation Forest emerged as a strong general-purpose option, providing a good balance between recall and precision, and maintaining a high F1 score. Meanwhile, LOF performed the worst in this specific setting, possibly due to its sensitivity to neighborhood parameters and the relatively uniform distribution of anomalies in the synthetic dataset. Overall, the article emphasizes that there is no one-size-fits-all algorithm for anomaly detection. The effectiveness of each method hinges on both the data characteristics (such as distribution, dimensionality, and density) and the operational context of its application. For example, applications prioritizing the minimization of false alarms may benefit from high-precision models like One-Class SVM with SGD, while scenarios where catching every anomaly is critical (such as fraud or fault detection) may require high-recall models like Robust Covariance or traditional One-Class SVM. The study also underlines the importance of careful hyperparameter tuning and encourages further research using real-world datasets to validate the insights gained from the controlled simulation.

3 Methods

This chapter explains how the data was prepared and how the machine learning model was selected and implemented. It also describes the experiments conducted and the tests that were performed.

3.1 The Dataset

Getting data for the project turned out to be challenging. Due to GDPR regulations and internal policies at WirelessCar, access to their data took a long time. It had to pass through several security checks before it could be used for training the model.
Because of the limited time available for the project, it became necessary to start working with a different dataset while waiting for access to WirelessCar's data. At first, the project was carried out using computer-generated datasets, made up of normal, non-malicious API requests. These had a similar structure to WirelessCar's real data in terms of which components were generated and the general structure of the requests. Since these generated datasets only contained normal traffic, abnormal requests were manually created based on common injection attack patterns. This allowed for testing the model's ability to detect unusual behavior. The components of the generated dataset are listed in Table 3.1, where each component is provided with a short explanation and an example entry from a request log.

Component: IP of Client
Explanation: IP address of the client that sent the request.
Example: 125.87.60.188

Component: Remote Log Name
Explanation: Remote name of the client sending the request. This information is hidden or not available.
Example: -

Component: User ID
Explanation: ID of the client sending the request. This information is hidden or not available.
Example: -

Component: Date and Time in UTC format
Explanation: The date and time of the request.
Example: [27/Dec/2037:12:00:00 +0530]

Component: Request Type, API, Protocol and Version
Explanation: An API string that contains the type of the request (GET, POST, PUT or DELETE), the API of the website to which the request is related, and the protocol and its version used for connecting with the server.
Example: "GET /usr HTTP/1.0"

Component: Status Code
Explanation: The code the server returns after the request. For example, 200 is returned when the request was performed successfully.
Example: 404

Component: Byte
Explanation: The amount of data in bytes that was sent back from the server to the client.
Example: 4961

Component: Referrer
Explanation: The website/source from where the user was directed to the current website. If none, it is represented by -.
Example: http://www.parker-miller.org/tag/list/list/privacy/

Component: UA String
Explanation: The user agent string contains details of the browser and the host device (like the name, version, device type, etc.).
Example: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 OPR/73.0.3856.329"

Component: Response Time
Explanation: The response time the server took to serve the request.
Example: 2529

Table 3.1: Overview of the structure of the generated request logs.

Later in the project, real data from WirelessCar became available. It included two datasets with normal API traffic from the application and WirelessCar's servers. Due to security precautions, some fields in the data were hashed to protect sensitive information. Although the data was unlabeled, it had already passed through WirelessCar's internal security filters, so it was assumed to only contain legitimate, non-harmful traffic. As with the earlier dataset, custom attack samples were created and added in order to test the model's performance in spotting malicious activity.

Listing 3.1: Example of a raw API request log entry.
125.87.60.188 - - [27/Dec/2037:12:00:00 +0530] "GET /usr HTTP/1.0" 404 4961 "http://www.parker-miller.org/tag/list/list/privacy/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 OPR/73.0.3856.329" 2529

3.1.1 Data Preprocessing

Since attacks on APIs can be carried out in various ways, on various parts of the API and communication process, it was decided to only analyze one request attribute. The API string, which includes the request type, path, and protocol, was chosen because it is known to be vulnerable to threats such as injection attacks and path traversal. The dataset was generated using publicly available code1, where some entries were modified to better suit the needs of the project. Specifically, API requests containing the path segment 'admin' were reclassified as malicious instead of normal.
This was based on the assumption that administrative access should not be publicly accessible, which was later enforced through an AWS security rule that was set up. An example of a non-malicious generated API component is shown in Listing 3.2.

Listing 3.2: Example of a normal API component.
"DELETE /usr HTTP/1.0"

To be able to evaluate the implemented machine learning models, malicious requests were generated for 10% of the data. Common injection attacks were randomly added as payloads to the path components of the API strings. Each request was labeled by adding a new field, where 0 was assigned to normal requests and 1 to the generated malicious ones. See Listing 3.3 for the malicious paths used, and Listing 3.4 for an example of a malicious path added to an API component.

Listing 3.3: Malicious payloads added to the API paths.
"1' OR '1'='1"
"/api/etc/user/passwd"
"rm -rf /"
"105'; DROP TABLE users; --"
"/admin"

Listing 3.4: Example of a malicious API string with a SQL injection.
"PUT /usr1' OR '1'='1 HTTP/1.0"

For the models to process textual data, its attributes had to be encoded into numerical values. OneHotEncoder, CountVectorizer, and TfidfVectorizer from the Python module scikit-learn were tested, with TfidfVectorizer2 being selected. The TfidfVectorizer is an implementation of TF-IDF (Term Frequency-Inverse Document Frequency), a common measure in natural language processing used to evaluate the importance of words in a text document relative to an entire collection of documents [40].

1Code and data from Vishnu U., Server Logs Dataset. Kaggle. Available at: https://www.kaggle.com/datasets/vishnu0399/server-logs (accessed 2025-02-21).

3.2 Selection of Machine Learning Model

The first step of the project was to select a suitable machine learning model for anomaly detection.
Since the raw data that servers receive is unlabeled, it made sense to use an unsupervised model to be able to accurately assess its potential. To guide this choice, several related reports were reviewed, with one particularly influential study being Anomaly Detection Using Unsupervised Machine Learning Algorithms: A Simulation Study by Edmund Fosu Agyemang [39], discussed in Section 2.5. This study compared five commonly used unsupervised models under controlled conditions, including Isolation Forest, One-Class SVM, and Robust Covariance. All models were trained only on normal data and then tested on a mix of normal and anomalous points, which is a setup similar to our project. In our project, however, the model was trained on a dataset containing both normal and malicious requests. The results from the study showed that Isolation Forest offered the best overall balance between precision and recall, and it consistently achieved high F1 scores across different types of data. It also performed well on datasets with different structures and feature sets, which was important for our project, as we used both generated and real-world data from WirelessCar that differed in format. Given its strong general-purpose performance, low sensitivity to data dimensionality, and efficiency on larger datasets, Isolation Forest was chosen as the most appropriate model for our needs. Its ability to isolate anomalies without requiring labeled attack data made it a particularly good fit for this project, since that is how the model would have to operate in practice. 3.3 Threat Modeling Before starting the project, a threat model was made to identify potential threats that the application is exposed to. This was done by analyzing the application, its security risks, and its infrastructure. This was also done to be able to accurately hand-craft attacks that would simulate threats that might arise in real life.
In this application all common API threats, such as traffic overload (DoS/DDoS), authentication hijacking, injection attacks, Man-in-the-Middle (MitM), data exposure, and parameter tampering, would be possible. To be able to analyze the results clearly, and since only one part of the API request was chosen to train the model, only one specific attack type was chosen. Since the request parameter studied in this experiment was the API string, injection attacks were chosen as the primary threat to focus on. 2TfidfVectorizer from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html 3.4 Isolation Forest Experiment Set Up Isolation forests in this project were implemented using the isolation forest algorithm3 from the scikit-learn library in Python. The isolation forest model in scikit-learn provides several methods for its functionality, and the methods used, together with a description of their functionalities, are listed in Table 3.2: Method Functionality fit(X) Builds a fitted isolation forest estimator based on the input samples X. decision_function(X) Returns the mean anomaly score of the trees in the isolation forest for each instance in the dataset X. predict(X) Predicts whether each data point from X is an anomaly or not based on a fitted model. -1 is returned for anomalies and 1 for normal instances. Table 3.2: Overview of some methods available for an isolation forest in Python. In addition to the isolation forest's fitting and prediction capabilities, such a model can be adjusted with the parameters listed in Table 3.3. Parameter Explanation n_estimators The number of estimators (trees) in the isolation forest.
max_samples int, float or 'auto'. The number of samples drawn from the dataset to train each estimator. If 'auto', then max_samples = min(256, n_samples). contamination 'auto' or float. The proportion of anomalies in the dataset. If set to a float value, it sets the threshold for prediction and must be in the range (0, 0.5]. If 'auto', anomalies are detected as in a standard Isolation Forest (see Section 2.4.4). max_features int or float. The number of features to draw from the dataset to train each estimator. bootstrap bool. If True, trees are fit on random subsets of the data drawn with replacement. If False, sampling is done without replacement. n_jobs int. The number of parallel jobs to run during model fitting. random_state int, RandomState instance, or None. Controls randomness in feature selection and split values. Ensures reproducibility if set. verbose int. Controls the verbosity of the tree building process. warm_start bool. If True, allows additional trees to be added to an existing forest. Table 3.3: Overview of the parameters available for constructing an Isolation Forest using scikit-learn in Python. 3IsolationForest from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html An example of an isolation forest implementation can be seen in Listing 3.5, where X_train and X_test contain the pre-processed data used for training and testing.

from sklearn.ensemble import IsolationForest

model = IsolationForest(
    n_estimators = 200,
    max_samples = 0.8,
    contamination = 0.1,
    n_jobs = -1,
    verbose = 1
)

model.fit(X_train)

anomaly_scores = model.decision_function(X_test)
anomaly_labels = model.predict(X_test)

Listing 3.5: Isolation forest with scikit-learn.
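Because the contamination parameter plays a central role in the experiments that follow, its mechanics are worth illustrating. The sketch below is our own minimal example on synthetic numeric data, not on the thesis datasets: when contamination is given as a float, the model flags roughly that fraction of the training data as anomalous, regardless of how many true outliers actually exist.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 950 "normal" points around the origin and 50 obvious outliers far away
normal = rng.normal(0, 1, size=(950, 2))
outliers = rng.uniform(6, 9, size=(50, 2))
X = np.vstack([normal, outliers])

for contamination in (0.05, 0.10, 0.15):
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=0).fit(X)
    # predict() returns -1 for anomalies; count the flagged fraction
    flagged_fraction = (model.predict(X) == -1).mean()
    print(contamination, round(flagged_fraction, 3))
```

The flagged fraction tracks the contamination value in each case: the parameter sets the score threshold as a quota, which is why misestimating it directly produces false positives or false negatives.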
To answer the research questions posed in this thesis, multiple experiments were conducted. Each experiment was designed with a specific purpose to address different aspects of the overall research objectives. The experimental process began with validating the setup and dataset to ensure proper functionality. Based on the initial analysis of results, further experiments were carried out to deepen the investigation. The initial experiments used a generated dataset, and later, when company data became available, additional tests were performed using that data. The general setup for each experiment involved implementing the Isolation Forest algorithm with specific configurations, dataset types, and dataset sizes. This allowed for evaluating both the functionality of the Isolation Forest approach and the impact of various real-world factors on its performance. Datasets of varying sizes were generated and used for both training and testing. Multiple experiments were performed with different values for the contamination parameter to assess its effect on detection accuracy. The fit method from the scikit-learn library was used for training, while the predict and decision_function methods were used for evaluation. Since the fit method does not provide evaluation scores, performance assessment relied entirely on the results obtained from predict and decision_function. To measure the performance of the Isolation Forest implementations, accuracy metrics4 and confusion matrices5 were evaluated using tools provided by the scikit-learn library (see Listing 3.6).

from sklearn.metrics import accuracy_score, confusion_matrix

predicted_labels = (anomaly_labels == -1).astype(int)

accuracy = accuracy_score(real_labels, predicted_labels)
cm = confusion_matrix(real_labels, predicted_labels)

Listing 3.6: Evaluation metrics with scikit-learn.
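Combining the preprocessing from Section 3.1.1 with Listings 3.5 and 3.6, the pipeline can be sketched end-to-end. The API strings and counts below are illustrative stand-ins for the generated dataset, not the actual data used in the experiments.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative API strings: 90 normal requests and 10 with injected payloads
normal = ["GET /usr HTTP/1.0", "PUT /usr HTTP/1.0", "DELETE /usr HTTP/1.0"] * 30
malicious = ["GET /usr/admin HTTP/1.0", "PUT /usr rm -rf / HTTP/1.0"] * 5
requests = normal + malicious
real_labels = np.array([0] * len(normal) + [1] * len(malicious))  # 1 = malicious

# Encode the textual API strings into numerical TF-IDF vectors
X = TfidfVectorizer().fit_transform(requests)

# Contamination set to the true anomaly proportion of this toy set
model = IsolationForest(contamination=0.1, random_state=0).fit(X)

# Map the -1 (anomaly) / 1 (normal) outputs to 1 / 0 labels, then evaluate
predicted_labels = (model.predict(X) == -1).astype(int)
print(accuracy_score(real_labels, predicted_labels))
print(confusion_matrix(real_labels, predicted_labels))
```

On data this clean the malicious strings carry rare tokens (admin, rm, rf) with high inverse document frequency, so the model isolates them quickly; real traffic is far less separable, which is what the later experiments probe.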
3.4.1 Initial Experiment To begin, three Isolation Forest models were generated to evaluate the initial setup and determine how to proceed with the remaining experiments. The first model was trained using a generated dataset consisting of one million normal data points. During testing, a dataset of 10,000 data points was used, of which 10% were hand-crafted malicious requests. Following this, a second model was implemented, this time including some of the malicious requests in the training set. The testing was done the same way as in the first test. Lastly, a third test was done with all of the malicious requests used during training, while the testing was carried out in the same way again. The purpose of this experiment was to understand how the Isolation Forest behaves when exposed to malicious requests only during testing, compared to when such requests are present in both training and testing. For both of these tests with malicious requests in the training set, the contamination rate was set to the actual rate of bad requests. 4accuracy_score from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html 5confusion_matrix from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html Table 3.4: Test 1: Generated dataset with malicious only in testing.
Attribute Value Training Data Generated dataset, no malicious data (1,000,000 data points) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' (estimated by model, cannot be zero) Ground Truth Contamination 0 Table 3.5: Test 2: Generated dataset with malicious in both training and testing. Attribute Value Training Data Generated dataset with a subset of the hand-crafted malicious requests (1,000,000 data points; 100,000 malicious, 900,000 benign) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 Table 3.6: Test 3: Generated dataset with all malicious requests included in training. Attribute Value Training Data Generated dataset with all hand-crafted malicious requests included (1,000,000 data points; 100,000 malicious, 900,000 benign) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 3.4.2 Contamination Experiment During the initial experiments it became clear that the contamination parameter significantly influences the performance of the model. To better understand this relationship, four additional Isolation Forest models were trained with varying contamination settings. The purpose of these experiments was to examine how overestimating or underestimating the contamination rate, relative to the true rate, impacts detection effectiveness. In real-world deployments, the true contamination rate is typically unknown and can fluctuate over time. This makes it essential to understand how tuning the contamination parameter affects outcomes. These insights are valuable for configuring models more effectively in dynamic, real-world environments. The following tests were conducted to explore this further. Table 3.7: Test 4: Contamination parameter set to 0.10.
Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 Table 3.8: Test 5: Contamination parameter set to 0.05. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.05 Ground Truth Contamination 0.10 Table 3.9: Test 6: Contamination parameter set to 0.15. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.15 Ground Truth Contamination 0.10 Table 3.10: Test 7: Contamination parameter set to 'auto'. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' (anomalies estimated by model) Ground Truth Contamination 0.10 3.4.3 Dataset Size Experiment To test how the size of the dataset affects the results, additional tests were made using dataset sizes of 100, 1,000, 10,000, and 100,000. The goal was to determine how small the dataset could be while still allowing the algorithm to independently detect anomalies with the contamination parameter set to 'auto'. Table 3.11: Test 8: Dataset size 100. Attribute Value Training Data 100 data points (10% hand-crafted malicious requests) Testing Data 100 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.12: Test 9: Dataset size 1,000.
Attribute Value Training Data 1,000 data points (10% hand-crafted malicious requests) Testing Data 1,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.13: Test 10: Dataset size 10,000. Attribute Value Training Data 10,000 data points (10% hand-crafted malicious requests) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.14: Test 11: Dataset size 100,000. Attribute Value Training Data 100,000 data points (10% hand-crafted malicious requests) Testing Data 100,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 3.4.3.1 Experiment with Real-Life Data When the company data became accessible, a final test was done to confirm that the model would work on a more varied dataset. As only 8,186 data points were received, this was the size of the dataset used. 10% of these requests were edited to contain malicious payloads, meaning that the path was changed to contain something that could be harmful, just like how the previous malicious requests were created. Table 3.15: Test 12: Real company dataset (10% malicious). Attribute Value Training Data Real company dataset (8,186 data points; 90% benign, 10% hand-made malicious) Testing Data Same as training set Contamination Parameter 'auto' Ground Truth Contamination 0.10 3.5 Rule-Based Testing Through AWS To compare the performance of a rule-based system with the Isolation Forest approach used in the experiment, a rule-based system was implemented and tested using AWS infrastructure. The setup involved deploying an AWS Lambda function, which served as the protected resource. An Amazon API Gateway was configured in front of the Lambda function, along with an associated AWS Web Application Firewall (WAF) that enforced request filtering rules.
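To make the WAF setup concrete, the sketch below builds a WAFv2 web ACL definition attaching the three AWS-managed rule groups used in this experiment (see Table 3.16). The rule-group names are the real AWS-managed groups; the ACL name, priorities, and metric names are our own illustrative choices, and in practice such a dict would be passed to the create_web_acl call of a boto3 'wafv2' client (or configured through the console) rather than merely constructed.

```python
# AWS-managed rule groups corresponding to Core Rule Set, Admin Protection,
# and Known Bad Inputs (names as published by AWS).
MANAGED_GROUPS = [
    "AWSManagedRulesCommonRuleSet",
    "AWSManagedRulesAdminProtectionRuleSet",
    "AWSManagedRulesKnownBadInputsRuleSet",
]

def build_web_acl(name="api-gateway-acl"):
    """Assemble a WAFv2 web ACL definition referencing managed rule groups."""
    rules = []
    for priority, group in enumerate(MANAGED_GROUPS):
        rules.append({
            "Name": group,
            "Priority": priority,
            "Statement": {
                "ManagedRuleGroupStatement": {
                    "VendorName": "AWS",
                    "Name": group,
                }
            },
            # 'None' lets the rule group's own block/allow actions apply
            "OverrideAction": {"None": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": group,
            },
        })
    return {
        "Name": name,
        "Scope": "REGIONAL",             # API Gateway requires a regional ACL
        "DefaultAction": {"Allow": {}},  # allow unless a rule blocks
        "Rules": rules,
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

The default action of "Allow" mirrors the experimental setup: traffic passes through unless one of the managed rules matches and blocks it.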
Due to time constraints, no custom (hand-crafted) rules were created. Instead, AWS-managed rule groups were added to the WAF. These rule groups are designed to cover a wide range of common web-based threats and were expected to block the crafted attacks used in the experiment. However, the specific details of these managed rules are not publicly available for security reasons. The dataset was uploaded to an Amazon S3 bucket and used to generate requests sent through the API Gateway. The WAF was configured to log all blocked requests along with the reasons for blocking. This logging enabled analysis of the system's performance by identifying which requests were blocked and which were allowed through. Table 3.16: Test 13: AWS setup with generated dataset (10% malicious). Attribute Value Data Generated dataset (250,000 data points; 90% benign, 10% hand-made malicious) Contamination 0.10 Rules AWS Managed rules (Core Rule Set, Admin Protection, Known Bad Inputs) 3.6 Evaluation To evaluate the success of the project, the performance of the rule-based system using AWS managed rules will be compared to the trained AI model. The comparison will focus on how many of our crafted bad requests each system successfully detects (the accuracy). Since this project serves as a proof of concept, we will consider it successful if the AI model can catch at least all the requests detected by the rule-based system. Achieving this would demonstrate that the AI approach is capable of identifying previously unseen attacks and malicious behavior without relying on predefined rules. 4 Results To evaluate the effectiveness of AI-based anomaly detection for securing API communication in connected vehicles, a series of experiments were conducted using both generated and real-world datasets. This chapter presents the outcomes of these experiments, focusing on the performance of the Isolation Forest model under various configurations and scenarios.
The results are organized based on key aspects of the evaluation process, including the model's behavior with different contamination rates, varying dataset sizes, and the proportion of hand-crafted malicious samples in the data. Finally, the results obtained from the real-world company dataset, as well as the rule-based approach using AWS WAF, are presented. Each experiment is accompanied by a confusion matrix and the corresponding accuracy metric. Additionally, the first experiment includes a visualization of the anomaly scores. 4.1 Isolation Forest Results This section presents the results from the experiments conducted with the Isolation Forest model. It begins with the initial setup experiments, followed by evaluations focusing on contamination rate sensitivity and dataset size. The section concludes with the results obtained from experiments using the real-world company dataset. 4.1.1 Initial Experiment on Generated Dataset Test Accuracy Test 1: Generated Dataset with No Anomalies in Training Set 0.1889 Test 2: Generated Dataset with a Subset of Handmade Anomalies in Training Batch 0.9806 Test 3: Generated Dataset with Handmade Anomalies in Training and Testing Batch 1.0000 Table 4.1: Results of the initial experiments. Each result is illustrated with a corresponding plot of anomaly scores and a confusion matrix. 4.1.1.1 Generated Dataset with No Anomalies in Training Set Results from Test 1. The setup is presented in Table 3.4. Figure 4.1: The plot of anomaly scores of the isolation forest built and trained on a generated dataset of 1,000,000 samples where no anomalies are included. Here the test set contains 100 samples for a better visualization. Figure 4.2: The confusion matrix of the isolation forest built on a generated dataset where no anomalies are included. 4.1.1.2 Generated Dataset with a Subset of Handmade Anomalies in Training Batch Results from Test 2. The setup is presented in Table 3.5.
Figure 4.3: The plot of anomaly scores of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. Here the test set contains 100 samples for a better visualization. Figure 4.4: The confusion matrix of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. 4.1.1.3 Generated Dataset with Handmade Anomalies in Training and Testing Batch Results from Test 3. The setup is presented in Table 3.6. Figure 4.5: The plot of anomaly scores of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. Here the test set contains 100 samples for a better visualization. Figure 4.6: The confusion matrix of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. 4.1.2 Experiments With Contamination On Generated Dataset Test Accuracy Test 4: Contamination Set to 10% 1.0000 Test 5: Contamination Set to 5% 0.9497 Test 6: Contamination Set to 15% 0.9549 Test 7: Contamination Set to 'auto' 1.0000 Table 4.2: Results of tests on the contamination rate. Each result is illustrated with a corresponding confusion matrix. 4.1.2.1 Contamination Set to 10% Results from Test 4. The setup is presented in Table 3.7. Figure 4.7: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.2.2 Contamination Set to 5% Results from Test 5. The setup is presented in Table 3.8. Figure 4.8: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 5%. 4.1.2.3 Contamination Set to 15% Results from Test 6. The setup is presented in Table 3.9.
Figure 4.9: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 15%. 4.1.2.4 Contamination Set to 'auto' Results from Test 7. The setup is presented in Table 3.10. Figure 4.10: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'. 4.1.3 Experiments with Different Dataset Sizes Test Accuracy Test 8: Dataset of Size 100 0.9800 Test 9: Dataset of Size 1,000 0.9900 Test 10: Dataset of Size 10,000 1.0000 Test 11: Dataset of Size 100,000 1.0000 Table 4.3: Results of experiments on different dataset sizes. Each result is illustrated with a corresponding confusion matrix. 4.1.3.1 Dataset of Size 100 Results from Test 8. The setup is presented in Table 3.11. Figure 4.11: The confusion matrix of an isolation forest built on a dataset of 100 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.2 Dataset of Size 1,000 Results from Test 9. The setup is presented in Table 3.12. Figure 4.12: The confusion matrix of an isolation forest built on a dataset of 1,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.3 Dataset of Size 10,000 Results from Test 10. The setup is presented in Table 3.13. Figure 4.13: The confusion matrix of an isolation forest built on a dataset of 10,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.4 Dataset of Size 100,000 Results from Test 11. The setup is presented in Table 3.14. Figure 4.14: The confusion matrix of an isolation forest built on a dataset of 100,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.1.4 Experiments On Company Data Results from Test 12. The setup is presented in Table 3.15. The accuracy is 0.9961 and the corresponding confusion matrix can be seen in Figure 4.15. Figure 4.15: The confusion matrix of an isolation forest built on a real-life dataset of 8,186 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'. 4.2 Rule-Based System Results Results from Test 13. The setup is presented in Table 3.16. Figure 4.16: The confusion matrix from the AWS system set up with managed rules and the generated dataset. Test Accuracy Test 13: Rule-based Approach/AWS 0.9548 Table 4.4: Results of experiments on the AWS setup, rule-based approach. Table 4.5: Examples of malicious requests (label = 1) allowed by AWS WAF. Method URL Path Suspicious Element Notes DELETE /live/usr/api/etc/user/passwd Attempt to access system file Contains path resembling /etc/passwd; known target in Unix-based attacks PUT /live/usr/register/api/etc/user/passwd File path resembling system credential file No referrer; common in automated malicious scans GET /live/usrrm -rf / Command injection pattern Mimics destructive Unix command rm -rf / in URL path 5 Discussion This chapter evaluates the results of the experiments, examining factors that may have influenced the outcomes. Each experiment is first analyzed individually, followed by a combined discussion to assess the overall reliability of the project. Finally, potential improvements and directions for future work are explored. 5.1 Initial Experiment Evaluation The first experiment, shown in Table 3.4, did not yield good results, as most of the malicious data was not recognized as such. An iTree is built based on the features it encounters during training. When data with similar features is passed through the trained tree, it is matched accordingly and the traversal stops.
However, if the data contains new, unseen features, it continues down the tree in search of a match. Since no match exists, it reaches the bottom of the tree. Because the model cannot make a confident prediction in this case, the data point is not marked as an anomaly and is instead treated as normal. This behavior was not immediately clear during the initial experiments but was later understood to occur because the TfidfVectorizer creates a fixed feature vocabulary from the training data. As a result, any unseen features in the test set are ignored and therefore not considered by the Isolation Forest. This limitation was clearly demonstrated in the first experiment. Therefore, in the following two experiments, with setups shown in Table 3.5 and Table 3.6, malicious data was included in both the training and testing batches. In the test described in Table 3.5, only a subset of the malicious requests was used for training. The results showed some improvement, but as discussed earlier, the model still failed to classify data points that included previously unseen features. In the final initial experiment, shown in Table 3.6, all types of created attacks were included in both the training and testing sets. As a result, the model was able to correctly label all of the data points. This gave valuable insights into how the model worked and how it should be trained and tested in the following experiments to get the best results, that is, that the training and testing datasets should be the same. 5.2 Contamination Experiment Evaluation The contamination experiments were conducted to observe how overestimating or underestimating the expected contamination rate, compared to the ground truth contamination in the data batch, would affect model performance. Four experiments were performed: one with an underestimated contamination rate, one with an overestimated rate, one matching the ground truth contamination, and one using the 'auto' setting where the model estimates the anomalies on its own. Both underestimating and overestimating the contamination negatively impacted the results. When the contamination was overestimated, the model incorrectly labeled some benign data points as malicious. Conversely, when the expected contamination was lower than the actual contamination, some malicious data points were incorrectly labeled as benign. The best results, with an accuracy of 1.0, were achieved when the contamination rate was either correctly specified or set to 'auto'. The model uses the expected contamination rate to determine how many data points to label as anomalies. If the expected contamination is lower than the actual rate, the model identifies fewer anomalies than truly exist. If the expected rate is higher, the model starts labeling the most anomalous-looking normal data points as anomalies to meet the expected quota, even if those data points are actually benign. 5.3 Dataset Size Experiment Evaluation The previous results showed that the contamination parameter significantly influenced the model's performance. Using an accurate contamination rate proved crucial for making reliable predictions. However, in real-world scenarios, knowing the exact contamination rate is not feasible, as it can vary over time. Therefore, the final set of experiments aimed to determine how much data the model requires for the 'auto' setting to produce accurate results. The experiments revealed that the 'auto' setting could accurately estimate the contamination rate, provided that the dataset was large enough. However, when the dataset was too small, the model's estimation became less reliable. This is likely because small datasets do not exhibit clear patterns.
As a result, the model sometimes falsely labeled benign data as malicious. When the model builds its trees, it relies on the features present in the data and the frequency with which they appear. In a small dataset, normal data may include features that appear unique by chance. This can cause the model to misclassify anomalous data as normal, since the lines between what is normal and what is anomalous are not as clear. In contrast, with a larger dataset, the patterns become more consistent and statistically meaningful, reducing the likelihood of these false positives. 5.4 Real Life Data Experiment Evaluation The tests using the company's data, combined with the configurations that had performed best in earlier experiments (i.e., contamination set to 'auto' and inclusion of malicious data in both the training and testing sets), yielded promising results. The model successfully detected almost all of the malicious requests. The ones it missed might have been missed because the dataset was quite small. As seen in the previous experiments, the accuracy began to dip when training was done on less than about 10,000 data points, which was the case for this dataset. However, it should be noted that the malicious requests were manually crafted and deliberately inserted into the dataset. This may have made them easier to detect compared to more subtle organic attacks that could occur in a real-world setting. 5.5 Rule-Based System Experiment Evaluation The results from the rule-based system using AWS WAF highlighted its strengths, but also its limitations. Some of the handcrafted malicious requests were successfully detected by matching predefined rules in the AWS Managed Ruleset. AWS WAF is designed to block known attack signatures such as SQL injections, cross-site scripting, and access to restricted paths.
However, the system failed to detect some of the crafted attacks, which demonstrates a key limitation of purely rule-based defenses: they cannot catch what they are not explicitly programmed to recognize. While the AWS WAF test provided useful insight into how rule-based systems work, it does not fully reflect a real production setup. In practice, the rules would be adjusted to match the specific application and threat landscape; no such tuning was included in this experiment. Overall, AWS WAF performed well on known, simple attacks, but the test setup was limited, so not much else can be concluded. This does, however, sufficiently highlight the need to combine rule-based tools with adaptive methods like machine learning to detect unknown threats. 5.6 Experiment Reliability There are several important considerations when interpreting the results of this study. The most significant factor is the nature of the data used in the experiments. Since the entire dataset was synthetically generated, it tends to be quite predictable and follows easily recognizable patterns. Although different features appear in various combinations, the overall number of distinct features is limited, which reduces the data's diversity. The attack scenarios included in the dataset were also quite constrained. Only five distinct attack paths were created. While these were combined with other API path components in different ways, the overall variation remained low. It could be argued, however, that because the contamination rate was set relatively high despite the limited attack diversity, the frequency of attacks in the dataset increased, potentially making them harder for the model to detect. Isolation Forest is effective at identifying patterns that indicate normality. When the data is inherently predictable, the model performs particularly well at recognizing these patterns.
However, since the analysis involved tokenizing and separating each parameter in the API path (using the TfidfVectorizer), the sequential relationships between tokens were not preserved. This lack of context can cause the model to miss important behavioral cues. For example, a path ending with a specific term x might be harmful, whereas having x in the middle may not be. Similarly, a certain path y might appear harmless when used with method A but could be suspicious when used with method B. The high predictability of the dataset likely explains why the model achieved an accuracy of 1. The attack patterns were relatively easy to distinguish from the normal data points, making the classification task less challenging. One factor that significantly impacted performance was the contamination rate. The results indicated that this parameter needed to be either set accurately or configured as 'auto' for the model to perform well. In systems with a constant flow of unpredictable data, however, the contamination rate is generally unknown in advan