AI-Powered Behavioral Analysis of Vehicle Communication to Strengthen API Security
Master's thesis in Computer Science and Engineering
Johanna Edh & Aurora Veldhuis
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2025

© Johanna Edh & Aurora Veldhuis, 2025.

Supervisor: Adina Aniculaesei, Computer Science Department
Examiner: Wolfgang Ahrendt, Computer Science Department

Master's Thesis 2025
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LaTeX

Abstract

As vehicles become increasingly connected, the volume of API communication between cars and cloud-based services grows, exposing new security risks. Traditional rule-based security systems, such as AWS Web Application Firewall, are limited to detecting known threats and patterns that can be pre-defined in the ruleset. This thesis explores the use of AI-powered anomaly detection, specifically the Isolation Forest algorithm, as a complement to existing rule-based methods to secure API traffic in connected vehicles. A series of experiments were conducted using both synthetic and real-world API request data.
The results show that Isolation Forest can effectively detect anomalous requests, especially when trained on sufficiently large and representative datasets. Comparisons with a rule-based system revealed that AI-based methods might be better at identifying unknown threats, while rule-based filters remain reliable for known attack patterns. Overall, the study highlights the potential of combining machine learning with traditional approaches to create more adaptive and intelligent API security systems for connected vehicles.

Keywords: Anomaly Detection, Machine Learning, Vehicle Communication, Isolation Forest, API Communication, API Security

Statement Regarding The Use of Generative AI

In this report, generative AI tools have been used to assist with grammar, sentence formulation, and language refinement. Although the core analysis, ideas, and conclusions are the product of human authorship, AI has supported the writing process to improve clarity and readability. The AI tool used was ChatGPT. All critical thinking, experiment design, data interpretation, and decisions presented in the report were carried out by the authors.

Acknowledgements

A huge thank you to our supervisor, Adina Aniculaesei, from the Department of Computer Science at Chalmers, who helped us structure this study, reviewed all our drafts, and supported us throughout the entire process. We would also like to express our sincere thanks to Alfred Kjeller from WirelessCar, who helped us establish contact with the company. Without the support of both of you, this thesis would not have been possible. A final thank you to our examiner, Wolfgang Ahrendt, for your guidance.

Johanna Edh and Aurora Veldhuis, Gothenburg, 2025-06-10

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Goal
  1.4 Thesis Description
2 Theory
  2.1 API Communication
    2.1.1 APIs
    2.1.2 API Security
      2.1.2.1 Common Types of API Attacks
  2.2 Vehicle Communication
  2.3 AWS
    2.3.1 API Gateways and the Shared Responsibility Model
    2.3.2 AWS Lambda
    2.3.3 AWS S3
    2.3.4 AWS WAF
    2.3.5 Managed Rules
  2.4 Machine Learning and AI
    2.4.1 AI for API Security
    2.4.2 Behavioral Analysis
    2.4.3 Supervised and Unsupervised Learning Algorithms
    2.4.4 Isolation Forests
    2.4.5 Data Encoding
    2.4.6 Evaluation Metrics
  2.5 Related Work
3 Methods
  3.1 The Dataset
    3.1.1 Data Preprocessing
  3.2 Selection of Machine Learning Model
  3.3 Threat Modeling
  3.4 Isolation Forest Experiment Set Up
    3.4.1 Initial Experiment
    3.4.2 Contamination Experiment
    3.4.3 Dataset Size Experiment
      3.4.3.1 Experiment with Real Life Data
  3.5 Rule Based Testing Through AWS
  3.6 Evaluation
4 Results
  4.1 Isolation Forest Results
    4.1.1 Initial Experiment on Generated Dataset
      4.1.1.1 Generated Dataset with No Anomalies in Training Set
      4.1.1.2 Generated Dataset with a Subset of Handmade Anomalies in Training Batch
      4.1.1.3 Generated Dataset with Handmade Anomalies in Training and Testing Batch
    4.1.2 Experiments With Contamination On Generated Dataset
      4.1.2.1 Contamination Set to 10%
      4.1.2.2 Contamination Set to 5%
      4.1.2.3 Contamination Set to 15%
      4.1.2.4 Contamination Set to 'auto'
    4.1.3 Experiments with Different Dataset Sizes
      4.1.3.1 Dataset of Size 100
      4.1.3.2 Dataset of Size 1,000
      4.1.3.3 Dataset of Size 10,000
      4.1.3.4 Dataset of Size 100,000
    4.1.4 Experiments On Company Data
  4.2 Rule Based System Results
5 Discussion
  5.1 Initial Experiment Evaluation
  5.2 Contamination Experiment Evaluation
  5.3 Dataset Size Experiment Evaluation
  5.4 Real Life Data Experiment Evaluation
  5.5 Rule-Based System Experiment Evaluation
  5.6 Experiment Reliability
  5.7 Improvements and Future Work
6 Conclusion
Bibliography
A Appendix 1

List of Figures

1.1 Context of the project.
2.1 Isolation Forest visualization with three different iTrees.
4.1 The plot of anomaly scores of the isolation forest built and trained on a generated dataset of 1,000,000 samples where no anomalies are included. Here the test set contains 100 samples for better visualization.
4.2 The confusion matrix of the isolation forest built on a generated dataset where no anomalies are included.
4.3 The plot of anomaly scores of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. Here the test set contains 100 samples for better visualization.
4.4 The confusion matrix of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set.
4.5 The plot of anomaly scores of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. Here the test set contains 100 samples for better visualization.
4.6 The confusion matrix of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set.
4.7 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.8 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 5%.
4.9 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 15%.
4.10 The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'.
4.11 The confusion matrix of an isolation forest built on a dataset of 100 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.12 The confusion matrix of an isolation forest built on a dataset of 1,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.13 The confusion matrix of an isolation forest built on a dataset of 10,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.14 The confusion matrix of an isolation forest built on a dataset of 100,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.15 The confusion matrix of an isolation forest built on a real-life dataset of 8,186 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'.
4.16 The confusion matrix from the AWS system set up with managed rules and the generated dataset.

List of Tables

2.1 Common attacks in an API system.
2.2 Core Rule Set (CRS).
2.3 Admin Protection Rule Set.
2.4 Known Bad Inputs Rule Set.
2.5 A confusion matrix for binary classification. TN and TP represent the number of correctly predicted classes, while FN and FP represent the number of misclassified instances.
3.1 Overview of the structure of the generated request logs.
3.2 Overview of some parameters available for construction of an isolation forest in Python.
3.3 Overview of the parameters available for constructing an Isolation Forest using scikit-learn in Python.
3.4 Test 1: Generated dataset with malicious requests only in testing.
3.5 Test 2: Generated dataset with malicious requests in both training and testing.
3.6 Test 3: Generated dataset with all malicious requests included in training.
3.7 Test 4: Contamination parameter set to 0.10.
3.8 Test 5: Contamination parameter set to 0.05.
3.9 Test 6: Contamination parameter set to 0.15.
3.10 Test 7: Contamination parameter set to 'auto'.
3.11 Test 8: Dataset size 100.
3.12 Test 9: Dataset size 1,000.
3.13 Test 10: Dataset size 10,000.
3.14 Test 11: Dataset size 100,000.
3.15 Test 12: Real company dataset (10% malicious).
3.16 Test 13: AWS setup with generated dataset (10% malicious).
4.1 Results of the initial experiments. Each result is illustrated with a corresponding plot of anomaly scores and a confusion matrix.
4.2 Results of tests on the contamination rate. Each result is illustrated with a corresponding confusion matrix.
4.3 Results of experiments on different dataset sizes. Each result is illustrated with a corresponding confusion matrix.
4.4 Results of experiments on the AWS setup, rule-based approach.
4.5 Examples of malicious requests (label = 1) allowed by AWS WAF.

1 Introduction

As vehicles become increasingly connected, the amount of data flowing between cars and backend systems grows exponentially. Each day, millions of API calls are exchanged between vehicles and servers around the world. Companies such as WirelessCar specialize in providing connected vehicle services where communication happens largely through API requests and responses. With so many data sources at work, making these API communications secure and reliable is crucial not only to protect sensitive user data, but also to maintain system integrity and availability. Unauthorized access, data breaches, and cyberattacks pose significant risks to both vehicle software manufacturers and users.

Traditional security measures like rule-based systems are coded to detect and block known threats using static patterns and preconfigured attack signatures. Such a system performs well against well-known vulnerabilities such as SQL injections and cross-site scripting (XSS). A rule-based system, however, does not perform well against new attacks or covert malicious activity that does not conform to known patterns or behaviors. To ensure the integrity and availability of vehicle communication networks, a more intelligent approach to API security is necessary [1]. This thesis explores the potential of AI-powered anomaly detection to strengthen API security in connected vehicles.
By leveraging machine learning techniques, we aim to complement existing rule-based methods with a more dynamic and predictive security solution.

1.1 Background

API security at WirelessCar is built on a multi-layered approach that includes solutions such as AWS Web Application Firewall (WAF), which uses predefined rule sets to help detect known attack patterns. While effective against well-documented threats such as SQL injections and cross-site scripting (XSS), these systems fall short when exposed to more sophisticated attacks that do not conform to known patterns. The increasing complexity of cyber threats requires more advanced security mechanisms. Anomaly detection, a technique commonly used in cybersecurity, can identify suspicious activity by analyzing deviations from common behavior. When applied to API requests, anomaly detection can potentially uncover previously unseen attack methods, improving the overall security of a connected vehicle ecosystem [1].

For this project, one of WirelessCar's applications dedicated to remote services is studied. The focus is on the traffic flowing from the application to WirelessCar's servers. The remote services this application handles include unlocking and locking the car, remote honking, flash control, and many more. This is illustrated in Figure 1.1.

Figure 1.1: Context of the project.

Finding and exploring new, effective, and more manageable solutions to aid API security is therefore a crucial step toward further expanding connectivity in large businesses. Protecting users and critical information is essential, as failure to do so can be costly. According to IBM, the average cost of a data breach in 2020 was $3.86 million [1].

1.2 Purpose

The purpose of this project is to evaluate how well an AI-powered solution can perform compared to a rule-based system. Specifically, this thesis aims to see how an AI model can be used to identify unusual patterns and parameters in API requests.
Ultimately, this thesis aims to provide insight into how AI and machine learning models can improve API security and complement traditional methods to pave the way for more dynamic and intelligent cybersecurity solutions in the world of connected vehicles.

To achieve this purpose, the study aims to answer the following research questions:

• How effectively can an AI-based anomaly detection model identify malicious API requests compared to a rule-based system?
• What types of attacks or anomalies can an AI model detect that rule-based security systems fail to recognize?
• What are the advantages and limitations of using AI-based anomaly detection for API security?

1.3 Goal

The goal of this thesis is to design, implement, and evaluate an AI-based anomaly detection system for API security. This will be achieved by:

• Implementing and training an AI model for detecting anomalies in API requests.
• Assessing its effectiveness in detecting real-world API threats.
• Comparing its performance against AWS WAF's rule-based security.
• Analyzing the strengths and weaknesses of both approaches.

In the long run, this research aims to contribute to the evolution of more adaptive cybersecurity solutions, paving the way for intelligent, self-learning security systems that can proactively defend against emerging threats.

1.4 Thesis Description

This thesis is structured as follows:

• Chapter 1 Introduction: Introduces the background, motivation, and goals of the thesis, including the research questions addressed.
• Chapter 2 Theory: Provides the theoretical background on API communication, security concerns, AWS infrastructure, and machine learning techniques relevant to anomaly detection.
• Chapter 3 Methods: Describes the experimental setup, including dataset creation and preprocessing, model selection, and the configuration of the Isolation Forest algorithm.
• Chapter 4 Results: Presents the outcomes of the conducted experiments, including performance evaluations of the AI model and comparisons with rule-based systems.
• Chapter 5 Discussion: Interprets the experimental findings, reflects on model reliability, and discusses the implications and limitations of the results.
• Chapter 6 Conclusion: Summarizes the main findings, connects them to the research questions, and outlines directions for future work.

2 Theory

This chapter provides the essential technical background necessary to fully understand the project. It begins by exploring API communication, covering the fundamentals of what an API is and the intricacies of vehicle communication within this context. Next, security concerns related to APIs are addressed, focusing on the challenges and solutions for ensuring secure data exchange. Following this, a section dedicated to AWS (Amazon Web Services) and machine learning is presented, outlining their relevance to the project. Finally, previous work in these areas is reviewed, offering additional context and insights into the project's foundation.

2.1 API Communication

While APIs enable efficient communication between systems, especially in connected vehicles, they also introduce a broad attack surface for malicious actors. The following section covers API communication, explaining what an API is, how APIs are used in vehicle communication, and, lastly, what security risks exist in an API infrastructure.

2.1.1 APIs

A study done by SlashData in 2020 showed that 90% of developers use APIs and that 30% of their time is spent coding APIs [2]. Application Programming Interfaces, or APIs, are a crucial part of smooth communication and technical development today. An API is a set of rules and protocols that enable effective exchange of data, features, and functionality between software applications.
This communication is mostly done through a series of requests and responses between clients and servers [3], the service sending the request being the client and the service receiving it being the server [4]. An API can be built according to different protocols or architectures, which determine how the API can be used and what its purpose is. Four common categories are [4]:

• SOAP API (Simple Object Access Protocol API): Uses XML as a messaging standard for network communication [4].
• RPC API (Remote Procedure Call): Allows a client to execute functions or procedures on a remote server as if they were local [4].
• WebSocket API: Supports real-time, two-way communication using JSON objects to pass data [4].
• REST API (Representational State Transfer API): Uses HTTP requests such as GET, PUT, HEAD, and DELETE to interact with resources and is the most widely used architecture for web services [5].

There are also four main types of APIs, each operating within a different scope. Public APIs are accessible to any external entity, while private APIs, also called internal APIs, are restricted to communication within an organization. Partner APIs are designed for external use but are only available to select outside services and users, often in business-to-business interactions. Lastly, composite APIs combine multiple API types, allowing them to work together in sequence or as a unified system [5].

2.1.2 API Security

The widespread usage of APIs for communication creates vulnerabilities that malicious actors can take advantage of. Securing APIs has therefore become a key part of keeping a system secure and trustworthy. More than 83% of all internet traffic in 2018 was attributed to Web APIs, and around 30% of all authentication attempts on APIs were found to come from malicious actors [6].
It was also predicted that 90% of web-enabled applications would be exposed to cyberattacks by 2021 due to inadequate API security measures [1]. Millions of API calls occur daily, generating huge traffic volumes that are difficult to analyze; this amount of traffic leaves a system with a massive number of potentially exploitable vulnerabilities.

In 2001 a foundation called the Open Worldwide Application Security Project (OWASP) was launched with the mission "To be the global open community that powers secure software through education, tools, and collaboration". Since its launch, OWASP has become a major contributor to the field of cybersecurity. In 2023 they released a list of the top 10 API security risks, defined as follows [7]:

• Broken Object Level Authorization (BOLA)
• Broken Authentication
• Broken Object Property Level Authorization
• Unrestricted Resource Consumption
• Broken Function Level Authorization
• Unrestricted Access to Sensitive Business Flows
• Server-Side Request Forgery (SSRF)
• Security Misconfiguration
• Improper Inventory Management
• Unsafe Consumption of APIs

API security is a combination of several established security disciplines, such as information security, network security, and application security. Information security focuses on protecting data throughout its life cycle, network security ensures the safe transmission of data and the protection of the network as a whole, and application security ensures that a software system is designed to withstand attacks. Knowledge from all of these practices is needed when designing a secure API system [8].

Key components of API security include API gateways, which serve as mediators between clients and backend services. These gateways act as reverse proxies, combining multiple APIs into what appears to be a single API for users.
This not only simplifies interactions by providing a more unified interface but also enhances security by enforcing authentication, authorization, and rate-limiting policies to prevent abuse and unauthorized access [8].

Another crucial security measure is a Web Application Firewall (WAF), which operates at a higher level than traditional firewalls by inspecting and filtering HTTP traffic. Unlike standard firewalls that primarily control access based on IP addresses and ports, a WAF analyzes the content of incoming requests, blocking common attacks such as SQL injections, cross-site scripting (XSS), and other threats targeting web applications [8].

To further strengthen security, organizations can also implement an intrusion detection system (IDS) and an intrusion prevention system (IPS). These tools monitor internal network traffic for anomalies or malicious activity. An IDS primarily detects and alerts administrators to suspicious patterns, while an IPS goes a step further by actively blocking potentially harmful traffic before it reaches critical systems. Together, these security measures create multiple layers of protection for an application [8].

In general, API security aims to achieve three key goals: confidentiality, ensuring that information can only be read by its intended audience; integrity, preventing unauthorized modification, creation, or deletion of information; and availability, ensuring that legitimate users can access the information when needed. Other desirable properties include accountability and nonrepudiation, which ensure that actions can be traced back to the user who performed them and that the user cannot deny having done so [8].

Building a secure and effective solution for cloud services is demanding and costly; therefore, companies usually rely on a cloud service provider (CSP). There are three major CSPs today: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
These providers offer companies an environment, technology, and infrastructure with which to set up services. Each provider supports different cloud architectures and deployment models, allowing businesses to choose solutions based on their needs [9].

2.1.2.1 Common Types of API Attacks

Table 2.1 presents some of the most common attacks in an API system [10]:

Table 2.1: Common attacks in an API system.

• Injection Attacks
  Target: Request body, query parameters, headers.
  How: Injecting malicious code (SQL, XML, commands) into input fields.
  Risk: Data theft, manipulation, or full system compromise.

• DoS/DDoS
  Target: API endpoints (high request volume/frequency).
  How: Overloading the server with excessive requests from one (DoS) or many (DDoS) sources.
  Risk: Service outage, degraded performance, financial and reputational loss.

• Authentication Hijacking
  Target: Authorization headers, tokens (e.g., JWT).
  How: Stealing or manipulating tokens to impersonate legitimate users.
  Risk: Unauthorized access, data breaches, identity theft.

• Data Exposure
  Target: API responses, data in transit.
  How: Exposing sensitive data due to design flaws or lack of encryption.
  Risk: Privacy violations, regulatory fines, sensitive data leaks.

• Parameter Tampering
  Target: URL query parameters, path parameters, request body.
  How: Manipulating parameters (e.g., user_id, limit, price) to access data or alter transactions.
  Risk: Data leaks, unauthorized actions, financial fraud.

• Man-in-the-Middle (MitM)
  Target: Data in transit (client ↔ API server).
  How: Intercepting or modifying API communication (exploiting weak or missing TLS/SSL).
  Risk: Data theft, injection of malicious data, session hijacking.

2.2 Vehicle Communication

Vehicle communication involves how a car gathers, processes, and shares data both within the vehicle's systems and with external devices or platforms. This is the core of modern vehicle telematics and infotainment systems, allowing cars to become part of the Internet of Things (IoT) [11].
A vehicle telematics system connects a car to the outside world. The technology uses telecommunications and computers to send, receive, and store information about the vehicle and driving patterns. Its purpose is safety, vehicle monitoring, and communication. The system can include GPS tracking; automatic alerts in case of an accident on the road ahead of the ego-vehicle; remote control of the vehicle, such as locking or unlocking it; vehicle health reports and maintenance reminders; emergency help; and speed monitoring. In some cases, reports and patterns collected by a telematics system can be used as a basis for a lower insurance premium if the driver is considered to behave safely in traffic. This is called Usage-Based Insurance (UBI) [12].

One major advancement in telematics is its use in smartphone-based platforms. Instead of relying solely on factory-installed systems, mobile apps and aftermarket devices can now provide telematics functions, making the technology more accessible and widely adopted [12].

While telematics focuses on vehicle monitoring and external communication, infotainment systems are designed for in-car entertainment and user interaction. Modern infotainment systems include touchscreen interfaces, voice commands, Bluetooth connectivity, and smartphone integration, making driving more convenient and connected [13].

2.3 AWS

Amazon Web Services (AWS) offers a suite of tools and services to enhance API security, primarily through the Amazon API Gateway. The following section covers the AWS services used in the project.

2.3.1 API Gateways and the Shared Responsibility Model

The API Gateway acts as the entry point for applications to access data, logic, or functionality from back-end services, managing traffic, security, and API monitoring. It supports both REST and WebSocket APIs, making it suitable for various applications, including serverless and container-based solutions.
Additionally, API Gateway provides version control and seamless integration with other AWS services to enhance security and monitoring [14].

One of the fundamental aspects of API security in AWS is the shared responsibility model. AWS is responsible for the security of the cloud, which includes protecting the infrastructure that runs AWS services. This encompasses the physical security of data centers, hardware, and the software that operates AWS services. Customers are responsible for security in the cloud, which entails managing the security of their applications and data. This includes configuring security settings, managing access controls, and ensuring data protection [15].

2.3.2 AWS Lambda

When deploying an API Gateway in AWS, you have the option of avoiding the need for an actual server to handle responses. This is done through AWS Lambda, a serverless compute service that lets you run code without having to manage servers or clusters. The service executes code in response to events and automatically manages the underlying resources, meaning that the user does not need to handle any provisioning or infrastructure. One of the big advantages of AWS Lambda is automatic scaling in response to requests of any size. AWS Lambda can be used for a variety of applications, including fast large-scale data processing, running interactive web and mobile backends, and creating event-driven applications [16].

2.3.3 AWS S3

Amazon Simple Storage Service (Amazon S3) is a scalable object storage service used to store and retrieve any amount of data from anywhere. Data is stored as objects in buckets, each uniquely identified by a key. It offers multiple storage classes with cost-efficient pricing and automated life cycle management. Common use cases include data lakes, backup and recovery, low-cost archiving, and powering generative AI applications [17].
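As a brief illustration of the Lambda model described in Section 2.3.2, a Lambda function behind an API Gateway is just a handler that receives the incoming request as an event object and returns a response; AWS provisions and scales the compute around it. The sketch below follows the standard API Gateway proxy integration event format, but the paths and messages are hypothetical examples, not part of WirelessCar's system.

```python
import json

def lambda_handler(event, context):
    """Minimal handler for an API Gateway proxy integration.

    API Gateway packages the incoming HTTP request into `event`
    (method, path, headers, body) and invokes this function once
    per request; no server has to be provisioned or managed.
    """
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")

    if method != "GET":
        # Reject anything but reads in this toy example.
        return {"statusCode": 405,
                "body": json.dumps({"error": "method not allowed"})}

    return {"statusCode": 200,
            "body": json.dumps({"message": f"handled {method} {path}"})}
```

Invoked with, say, `{"httpMethod": "GET", "path": "/vehicle/status"}`, the handler returns a 200 response whose body echoes the request line; the same function scales transparently from one request to millions.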
2.3.4 AWS WAF

AWS WAF (Web Application Firewall) is a service that protects your web applications, such as an API Gateway, from common attacks and exploits. With AWS WAF, you can create security rules to manage bot traffic and block common attack patterns such as SQL injection and cross-site scripting (XSS). One of the advantages of AWS WAF is that it saves time by using managed rules. It also makes it easier to monitor, block, or rate-limit common and recurring bot traffic. In addition, it improves visibility into your web traffic by giving you detailed control over how statistics are generated [18]. Use cases for AWS WAF include filtering web traffic by creating rules based on various conditions such as IP addresses, HTTP headers and body, or custom URIs. It can also be used to prevent fraud related to account takeovers by monitoring the application's login page for unauthorized access with compromised credentials. Furthermore, AWS WAF can be managed through APIs, enabling automated rule creation and maintenance, as well as integration into the development and design process [18].

2.3.5 Managed Rules

AWS Managed Rules is a managed service that provides protection against application vulnerabilities and other unwanted traffic. These are collections of predefined, ready-to-use rules created and maintained by AWS and vendors on AWS Marketplace. Rule groups from AWS Managed Rules can be added to a Web Access Control List (ACL) to protect an application [19]. These rule groups are designed to protect against common web threats and, when used as documented, add an extra layer of security to your applications. However, they are not intended to replace the user's own security responsibilities, which depend on what AWS resources are in use. Many AWS and Marketplace vendors provide automatic updates to these rule groups as new vulnerabilities and threats are discovered.
In some cases, AWS may receive information about vulnerabilities before public disclosure, allowing preemptive updates to AWS Managed Rules. To protect vendors' intellectual property and prevent malicious actors from bypassing the rules, the details of individual rules within a managed rule group are not fully visible [20]. Baseline managed rule groups provide general protection against a wide range of common threats. Users can select one or more of these rule groups to establish basic protection for their resources. The following rule groups are included in the baseline category [21]:

Table 2.2: Core Rule Set (CRS).
Rule Group: Core Rule Set (CRS)
Name: AWSManagedRulesCommonRuleSet
Description: Provides general protection against a wide range of vulnerabilities, including OWASP Top 10 risks. Adds labels for monitoring and further rule evaluation.
Example Rules:
• NoUserAgent_HEADER: Blocks requests without a User-Agent header.
• UserAgent_BadBots_HEADER: Blocks bad bots (e.g., Nessus, Nmap).
• SizeRestrictions: Blocks oversized query strings, cookies, body, or URI paths.
• EC2MetaDataSSRF: Blocks attempts to exfiltrate EC2 metadata.
• GenericLFI: Detects Local File Inclusion (LFI) attacks.
• RestrictedExtensions: Blocks unsafe system file extensions.
• GenericRFI: Detects Remote File Inclusion (RFI) attempts.
• CrossSiteScripting: Detects common XSS patterns.

Table 2.3: Admin Protection Rule Set.
Rule Group: Admin Protection
Name: AWSManagedRulesAdminProtectionRuleSet
Description: Blocks external access to common administrative paths, reducing the risk of unauthorized access to administrative interfaces. Adds labels for monitoring.
Example Rule:
• AdminProtection_URIPATH: Blocks requests to known admin paths (e.g., sqlmanager).

Table 2.4: Known Bad Inputs Rule Set.
Rule Group: Known Bad Inputs
Name: AWSManagedRulesKnownBadInputsRuleSet
Description: Blocks known bad patterns commonly associated with exploitation or vulnerability discovery.
Adds labels for monitoring.
Example Rules:
• Java Deserialization RCE detection across headers, body, URI path, and query string.
• Host_localhost_HEADER: Blocks Host headers targeting localhost.
• PROPFIND_METHOD: Blocks the HTTP PROPFIND method.
• ExploitablePaths_URIPATH: Blocks attempts to access exploitable paths (e.g., web-inf).
• Log4j (CVE-2021-44228) detection in headers, body, URI path, and query string.

2.4 Machine Learning and AI

Machine learning (ML) is a subset of artificial intelligence (AI) focused on enabling computers and machines to learn and adapt by identifying patterns. It involves developing algorithms that allow systems to perform tasks autonomously and improve their accuracy by being exposed to training data [22].

2.4.1 AI for API Security

AI and ML are increasingly used in cybersecurity, both for application and network protection and for threat and risk analysis across data sources. ML models are good at identifying relationships between various types of threats, suspicious IP addresses, and abnormal behaviors through advanced data analysis. In real-time applications where fast responses are critical, ML's ability to quickly analyze security data, make decisions, and trigger actions is valuable for API developers and managers [23]. Traditional API security measures focus on access control, mainly through authentication, authorization, rate limiting, and network privacy. Although these provide protection at some layers, they are not always sufficient to address more specialized threats such as API-specific Denial of Service (DoS) attacks, application-layer attacks, data exfiltration, or credential-based attacks. AI-powered security solutions are a great complement to traditional methods, providing deeper insights into API traffic patterns, historical attacks, and real-time anomaly detection. AI solutions enable proactive defense mechanisms that adapt to new and evolving threats [23].
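As a toy illustration of the kind of traffic-profile anomaly flagging described above (not from the thesis; the data, threshold, and helper are our own invention), even a crude volumetric check over per-client request counts can surface an outlier that a fixed rule set might miss:

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical request log: source IPs of incoming API calls.
ips = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 6 + ["10.0.0.3"] * 4 + ["10.0.0.9"] * 60

counts = Counter(ips)
mu = mean(counts.values())
sigma = pstdev(counts.values())

# Flag IPs whose request volume lies more than 1.5 standard deviations
# above the mean (a crude stand-in for a learned traffic profile).
suspicious = [ip for ip, n in counts.items() if sigma and (n - mu) / sigma > 1.5]
```

A learned model replaces the hand-picked threshold with a profile fitted to historical traffic, which is the adaptivity the cited literature argues for.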
A comprehensive API security strategy requires not only basic security functions but also anomaly detection capabilities. This serves as a first line of defense, where malicious behavior can be detected and flagged immediately, often without prior knowledge of specific attacks or pre-written rules. AI and ML are well suited for building intelligent API security solutions capable of identifying unusual behaviors, harmful data trends, and blocking attacks in dynamic environments. Over time, such systems can continuously learn and improve, detecting deviations from normal behavior even without explicit attack signatures or policies [23]. Various machine learning algorithms, such as Naïve Bayes, K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Support Vector Machines (SVM), as well as Deep Learning models and Neural Networks, are commonly recommended and applied in API security to strengthen detection and response capabilities [23].

2.4.2 Behavioral Analysis

The growing use of IoT (Internet of Things) technology, such as vehicle telematics systems, has made these devices attractive targets for attackers who exploit common security and access control weaknesses. Many IoT threats rely on simple vulnerabilities. A well-known example is the Mirai botnet, which exploited the Telnet protocol due to weak or default security configurations. Botnets like Mirai can use DNS to communicate with their Command and Control (C2) servers or even leverage DNS itself as an attack vector to increase traffic [24]. Event logging and notification systems are fundamental to effective cybersecurity. Traditional Intrusion Detection Systems (IDS) primarily focus on analyzing system and network logs. However, as noted in the literature, monitoring can also involve direct traffic analysis, ranging from advanced honeypot-based detection to more traditional approaches like deep packet inspection (DPI) or proxy-based analysis at central network nodes.
A combination of these methods can result in a robust hybrid IDS capable of identifying various types of malicious traffic [24]. Signature-based detection remains a powerful and relatively user-friendly approach, though heuristic and anomaly-based methods may trigger more false positives. Honeypots offer a unique strategy by intentionally exposing endpoints to attract suspicious traffic. This traffic often exhibits abnormal patterns, which can be used to generate new detection signatures [24]. Large-scale analysis of REST API usage reveals that many APIs suffer from design flaws, such as improper use of HTTP methods and operation tunneling through query parameters, both of which diverge from standard RESTful practices. These poor implementation choices can introduce detectable anomalies in behavior, supporting the case for behavior-based anomaly detection systems [25]. Another common issue is the inconsistent naming and structuring of resources within RESTful APIs. A study examining real-world APIs like Facebook, Twitter, and YouTube identified frequent mistakes known as linguistic antipatterns. For example, a URL such as https://www.example.com/newspapers/players?id=123 combines two unrelated resources, "newspapers" and "players", in a single endpoint. This can confuse both developers and automated systems, making it unclear what the API is meant to do. In behavior analysis, such deviations may be flagged as anomalies since the structure breaks typical design patterns. For models trained to recognize normal behavior, these flawed requests can increase error rates and false positives. Detecting antipatterns is therefore essential not only for better API design but also for improving the reliability of behavior-based detection systems [26]. API behavior analysis is not only a technical security tool. It also serves as a valuable business intelligence asset. It provides insight into how applications, services, and users interact.
This goes beyond simply measuring how often an API is used. It involves collecting meaningful data that helps organizations understand usage patterns and their impact [27]. By analyzing the frequency and nature of API calls, businesses can identify popular features, usability issues, or areas where users encounter problems. These insights help teams better understand user behavior and adjust services accordingly. API analysis also reveals market trends, enabling companies to stay agile and meet evolving demands [27].

2.4.3 Supervised and Unsupervised Learning Algorithms

Two common learning techniques in machine learning are supervised learning and unsupervised learning. Supervised learning involves algorithms trained on labeled data, meaning each input is paired with a correct known output. Such models learn by comparing their predictions with the corresponding labels and adjusting themselves to minimize errors. Unsupervised learning is another approach where the input data is unlabeled and the goal is to recognize patterns and structures by only considering the data's features [28]. Algorithms developed with supervised learning are commonly used to solve classification problems, where the goal is to determine the correct class for a given input. Some commonly used models for such problems are listed below:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• K-Nearest Neighbors
• Convolutional Neural Networks

Unsupervised learning models are useful for anomaly detection, a component of data analysis aimed at identifying irregularities among normal data. Since these models do not require labeled data, they can detect anomalies in an unsupervised manner [29].
Some commonly used models for such problems are listed below [30]:
• Isolation Forests
• K-Means Clustering
• One-class support vector machine (SVM)
• One-class SVM with stochastic gradient descent (SGD)
• Robust covariance

2.4.4 Isolation Forests

In an Isolation Forest, the goal is to isolate anomalies, where anomalies are data points that differ from the rest of a dataset. Within this concept, isolation refers to the process of separating an instance from the rest of the data. The separation process relies on the characteristics of the anomalies: they are few in number and exhibit attribute values that differ from those of normal data instances. With this in consideration, Isolation Forests are constructed using binary trees, where instances are partitioned recursively. Anomalies tend to be isolated with a shorter path in such a tree due to their rarity and distinct attribute values [31]. Each binary tree in an Isolation Forest, in this context also called an isolation tree (or iTree), consists of two different types of nodes. A node is either an external node with no children or an internal node with two daughter nodes. A tree is grown by sampling instances from the data set and, at each node, randomly selecting an attribute q and a split value p. The test q < p determines whether the path to a data point travels to the left or right daughter node. This process continues until only one data point remains in a node or all instances at a node have the same value of q. Hence, a path in an isolation tree, from the root node to a leaf node, represents an instance of a data point from the considered data set. The traversal of a data point depends on the randomly selected splits [31]. To be able to determine which external nodes represent anomalies, an anomaly score is calculated for each data instance.
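The recursive partitioning described above can be sketched for one-dimensional data. This is a simplified toy of our own, not the full multi-attribute algorithm from [31], but it shows the key effect: a clearly separated point tends to be isolated after far fewer random splits than a point inside the main cluster.

```python
import random

def isolation_depth(point, data, rng, depth=0):
    """Number of random splits needed to isolate `point` from `data` (1-D toy)."""
    if len(data) <= 1:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:                          # remaining instances are identical
        return depth
    p = rng.uniform(lo, hi)               # random split value, as in an iTree node
    same_side = [v for v in data if (v < p) == (point < p)]
    return isolation_depth(point, same_side, rng, depth + 1)

def avg_depth(point, data, trials=200, seed=1):
    """Average path length over many randomly built trees (a small 'forest')."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trials)) / trials

rng = random.Random(0)
cluster = [10.0 + rng.random() for _ in range(100)]   # normal points in [10, 11]
anomaly = 50.0                                        # far from the cluster
data = cluster + [anomaly]
```

Here `avg_depth(anomaly, data)` is close to 1, while `avg_depth(cluster[0], data)` is several times larger; averaging these path lengths over an ensemble of trees is exactly what the anomaly score discussed next builds on.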
Since anomalies are rare and have distinct attribute values, they are more likely to be isolated early in the tree, resulting in a shorter path from the root node to an external node. With the results from multiple isolation trees combined into an Isolation Forest, anomalies are detected by analyzing the average path length of each data point. This average path length is used to derive an anomaly score s, with s defined as:

s(x, ψ) = 2^(−E(h(x)) / c(ψ))    (2.1)

where h(x) is the path length of data point x, E(h(x)) is the average path length of data point x, and c(ψ) is the average path length of an unsuccessful search in a Binary Search Tree (BST) of size ψ. c(ψ) can be utilized in this way due to the structural equivalence between isolation trees and a BST. c(ψ) is defined as:

c(ψ) = 2H(ψ − 1) − 2(ψ − 1)/ψ    for ψ > 2
c(ψ) = 1                         for ψ = 2
c(ψ) = 0                         otherwise    (2.2)

with H(i) being the harmonic number, estimated as H(i) ≈ ln(i) + 0.5772156649 (Euler's constant) [31]. The anomaly score s(x, ψ) lies in the range (0, 1], where a score close to 1 suggests that the data point is likely an anomaly, and a score close to 0 indicates it is likely normal [31]. A visualization of how an Isolation Forest works is shown in Figure 2.1. In this figure, three different isolation trees are displayed. Each tree represents decision paths for a subset of data points in the dataset. Normal data points are visualized as blue nodes, while anomalies are shown as red nodes. The deeper a node appears in a tree, the more similar the corresponding data point is to the majority of the data. If a data point is isolated early in the tree (i.e., closer to the root), it indicates that the point is significantly different from the rest, and it is therefore classified as an anomaly.

Figure 2.1: Isolation Forest visualization with three different iTrees.

2.4.5 Data Encoding

A machine learning model typically requires data to be represented as numerical values.
This means raw data must be transformed before the model can interpret it. This transformation is usually done using an encoding tool designed to preserve patterns, balance or normalize the data, and handle missing values to prevent errors [32]. A commonly used paradigm for encoding and data transformations is the fit-predict paradigm. The transformer should be fitted only on the training data, for example, recording the mean and standard deviation when using a standard scaler. The training data is then transformed using the fitted transformer before training the model. The same fitted transformer is later used to transform the test data before evaluation. Fitting the transformer on the entire dataset before splitting can lead to data leakage, resulting in misleading model evaluations [32]. Two commonly used encoding types are a one-hot encoder and a text data encoder. One-hot encoding is used for nominal data, creating binary features for each category without implying order. It converts a feature with n values into n separate 0/1 features, which can increase dimensionality and requires handling unseen categories in the test set. When encoding text data, the data must be converted into numerical form for the ML models. This is often done by tokenizing the text into words, assigning unique indexes, and representing each sample as word indexes. These indexes can then be transformed using one-hot encoding or embeddings [32]. Scikit-learn1 offers a variety of encoding tools that work well for ML models [33]. The OneHotEncoder, CountVectorizer, and TfidfVectorizer are three of them. The OneHotEncoder is typically used for categorical data that has no inherent order [34], while the CountVectorizer and TfidfVectorizer are used for text data. The CountVectorizer creates a bag-of-words representation, meaning the text is tokenized into words and each word's occurrence is counted [35].
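The bag-of-words encoding and the fit-only-on-training-data rule described above can be sketched as follows. The toy strings below are our own illustration, not the thesis dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["GET /usr HTTP/1.0", "POST /usr/login HTTP/1.0"]
test_docs = ["DELETE /usr HTTP/1.0"]

# Fit the transformer on the training data only; fitting on the full
# dataset before splitting would leak test-set vocabulary into training.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)   # learns the vocabulary
X_test = vectorizer.transform(test_docs)         # reuses it; unseen tokens are ignored
```

With scikit-learn's default tokenization, the learned vocabulary here is {get, http, login, post, usr}, so the test request contributes counts only for the tokens it shares with the training data; the unseen token "delete" is silently dropped rather than given a new column.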
The TfidfVectorizer applies Term Frequency-Inverse Document Frequency (TF-IDF) weighting instead of raw counts, assigning higher importance to words that appear frequently in a document but rarely across the dataset. This method helps to reduce the impact of high-frequency words [36].

2.4.6 Evaluation Metrics

Various metrics are commonly used to evaluate machine learning models. Two of the most widely used metrics for classification problems are the accuracy and the confusion matrix, both of which are used in this project. Accuracy measures the proportion of correct predictions made by a model relative to the total number of input samples. It is calculated as the ratio of correctly classified instances to the total number of predictions [37]:

Accuracy = (number of correct predictions) / (total number of predictions)

A confusion matrix is a square matrix that visualizes the performance of a classification model. Each row represents the real class labels, while each column represents the predicted class labels. This structure helps to identify where the model is making correct predictions and where it is not, providing a more detailed understanding of its performance beyond the accuracy [37]. An example of the structure of a confusion matrix can be seen in Table 2.5.

1scikit-learn, a Python library for machine learning. Available at: https://scikit-learn.org/stable/

                   Predicted: Negative     Predicted: Positive
Real: Negative     True Negative (TN)      False Positive (FP)
Real: Positive     False Negative (FN)     True Positive (TP)

Table 2.5: A confusion matrix for binary classification. TN and TP represent the number of correctly predicted classes, while FN and FP represent the number of misclassified instances.

2.5 Related Work

In [38], Alfardus and Rawat propose a machine learning approach to improve security in in-vehicle networks (IVNs).
These networks are used for communication between different components in modern cars, such as sensors, infotainment systems, and control units. However, as vehicles become more connected, IVNs also become more vulnerable to cyber-attacks. To address this problem, the authors explore how deep learning and feature engineering can be used to detect anomalies in IVNs and thereby improve cybersecurity. They used real-world IVN traffic data for the experiment, which included both normal and attack traffic. Before training any models, the dataset was normalized so that it was easier to work with. After that, useful features were chosen and extracted from the traffic data. These included statistical features, such as the average and variance of the signals, as well as time and frequency domain features. A convolutional neural network (CNN) was used to learn deep features: complex patterns in the data that might not be easy to see with traditional methods. These features were used to train two deep learning models. The first model was a deep neural network (DNN), which was used for direct classification of traffic as normal or abnormal. The second model was a deep autoencoder, which was trained only on normal traffic and then tested on its ability to reconstruct inputs. If the autoencoder could not reconstruct an input well, it was likely an anomaly. This idea is based on the assumption that normal data is easy to reconstruct, while attack data is not. The experiment produced promising results. Their proposed method achieved around 95% accuracy, with an F1 score of 0.95. This means that the system was good at detecting actual attacks while avoiding false alarms. When the deep learning models were compared to more traditional machine learning models like Support Vector Machine (SVM), Random Forest, and K-Nearest Neighbors (KNN), the deep learning models performed better in all evaluation metrics.
This suggests that using a combination of feature engineering and deep learning can significantly improve the performance of anomaly detection in IVNs. The importance of carefully selecting and tuning the models' hyperparameters was also discussed by the authors. For example, they found that a smaller learning rate and a slightly deeper network led to better results. In addition, they emphasized that combining hand-crafted features with learned features from CNNs gave a more complete picture of the network traffic, which improved detection performance. In conclusion, the study shows that deep learning models, especially when combined with good feature engineering, are a powerful tool for detecting anomalies in vehicle networks. However, the authors noted that future work is needed to make the models more robust against more complex or unknown types of attacks. They also suggested using larger and more diverse datasets in future studies to improve the generalization of the models. In [39], Edmund Fosu Agyemang provides a comprehensive analysis of five prominent unsupervised machine learning algorithms tailored for anomaly detection. The algorithms evaluated include One-Class Support Vector Machine (One-Class SVM), One-Class SVM with Stochastic Gradient Descent (SGD), Isolation Forest, Local Outlier Factor (LOF), and Robust Covariance (also known as the Elliptic Envelope method). The purpose of the study was to explore how these models perform in controlled simulation settings and to provide insight into their practical applicability. The study was conducted using a synthetically generated dataset designed to mimic real-world scenarios where anomalies are rare and well separated from the normal data. The data consisted of two-dimensional features, with 100 normal points clustered around two centers and 20 uniformly distributed anomalies.
This setup allowed for a focused investigation into how each model responds to clear outliers, an ideal setting to understand baseline performance, although less representative of messy real-world data. A key aspect of the research was that all models were trained exclusively on normal data points. This reflects a common challenge in anomaly detection: the rarity or complete absence of labeled anomalies during training. The models were then evaluated based on their ability to correctly identify outliers using accuracy, precision, recall, and F1 score. These metrics provided a nuanced understanding of the trade-offs each algorithm would entail. Special attention was given to the model selection process and the motivations behind each algorithm's inclusion. One-Class SVM, a boundary-based method, was selected due to its ability to encapsulate the region containing normal data and identify deviations. Its variant, One-Class SVM with SGD, was introduced to address scalability issues by enabling more efficient training on large datasets using stochastic updates. Isolation Forest was included for its unique approach of isolating anomalies through random splits, making it effective and fast in high-dimensional settings. LOF was selected as a representative of density-based methods, assessing local density deviations to identify outliers. Lastly, Robust Covariance was chosen for its statistical grounding in modeling data distribution and identifying anomalies as points lying outside an estimated Gaussian envelope. The study found that the performance of each algorithm was highly dependent on the characteristics of the dataset. One-Class SVM and Robust Covariance achieved perfect recall but suffered from moderate precision, suggesting they were good at capturing all anomalies but often at the cost of misclassifying some normal data points.
In contrast, One-Class SVM with SGD exhibited excellent precision, meaning it rarely misclassified normal points, but at the expense of very low recall, missing many actual outliers. This makes it suitable in contexts where false positives are especially costly. Isolation Forest emerged as a strong general-purpose option, providing a good balance between recall and precision, and maintaining a high F1 score. Meanwhile, LOF performed the worst in this specific setting, possibly due to its sensitivity to neighborhood parameters and the relatively uniform distribution of anomalies in the synthetic dataset. Overall, the article emphasizes that there is no one-size-fits-all algorithm for anomaly detection. The effectiveness of each method hinges on both the data characteristics (such as distribution, dimensionality, and density) and the operational context of its application. For example, applications prioritizing the minimization of false alarms may benefit from high-precision models like One-Class SVM with SGD, while scenarios where catching every anomaly is critical (such as fraud or fault detection) may require high-recall models like Robust Covariance or traditional One-Class SVM. The study also underlines the importance of careful hyperparameter tuning and encourages further research using real-world datasets to validate the insights gained from the controlled simulation.

3 Methods

This chapter explains how the data was prepared and how the machine learning model was selected and implemented. It also describes the experiments conducted and the tests that were performed.

3.1 The Dataset

Getting data for the project turned out to be challenging. Due to GDPR regulations and internal policies at WirelessCar, access to their data took a long time. It had to pass through several security checks before it could be used for training the model.
Because of the limited time available for the project, it became necessary to start working with a different dataset while waiting for access to WirelessCar's data. At first, the project was carried out using computer-generated datasets, made up of normal, non-malicious API requests. These had a similar structure to WirelessCar's real data in terms of which components were generated and the general structure of the requests. Since these generated datasets only contained normal traffic, abnormal requests were manually created based on common injection attack patterns. This allowed for testing the model's ability to detect unusual behavior. The components of the generated dataset are listed in Table 3.1, where each component is provided with a short explanation and an example entry from a request log.

Component: IP of Client
Explanation: IP address of the client that sent the request.
Example: 125.87.60.188

Component: Remote Log Name
Explanation: Remote name of the client sending the request. This information is hidden or not available.
Example: -

Component: User ID
Explanation: ID of the client sending the request. This information is hidden or not available.
Example: -

Component: Date and Time in UTC format
Explanation: The date and time of the request.
Example: [27/Dec/2037:12:00:00 +0530]

Component: Request Type, API, Protocol and Version
Explanation: An API string that contains the type of the request (GET, POST, PUT or DELETE), the API of the website to which the request is related, and the protocol and its version used for connecting with the server.
Example: "GET /usr HTTP/1.0"

Component: Status Code
Explanation: The code the server returns after the request. For example, 200 is returned when the request was performed successfully.
Example: 404

Component: Byte
Explanation: The amount of data in bytes that was sent back from the server to the client.
Example: 4961

Component: Referrer
Explanation: The website/source from where the user was directed to the current website. If none, it is represented by -.
Example: http://www.parker-miller.org/tag/list/list/privacy/

Component: UA String
Explanation: The user agent string contains details of the browser and the host device (like the name, version, device type, etc.).
Example: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 OPR/73.0.3856.329"

Component: Response Time
Explanation: The response time the server took to serve the request.
Example: 2529

Table 3.1: Overview of the structure of the generated request logs.

Later in the project, real data from WirelessCar became available. It included two datasets with normal API traffic from the application and WirelessCar's servers. Due to security precautions, some fields in the data were hashed to protect sensitive information. Although the data was unlabeled, it had already passed through WirelessCar's internal security filters, so it was assumed to only contain legitimate, non-harmful traffic. As with the earlier dataset, custom attack samples were created and added in order to test the model's performance in spotting malicious activity.

Listing 3.1: Example of a raw API request log entry.
125.87.60.188 - - [27/Dec/2037:12:00:00 +0530] "GET /usr HTTP/1.0" 404 4961 "http://www.parker-miller.org/tag/list/list/privacy/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 OPR/73.0.3856.329" 2529

3.1.1 Data Preprocessing

Since attacks on APIs can be carried out in various ways, on various parts of the API and communication process, it was decided to only analyze one request attribute. The API string, which includes the request type, path, and protocol, was chosen because it is known to be vulnerable to threats such as injection attacks and path traversal. The dataset was generated using publicly available code1, where some entries were modified to better suit the needs of the project. Specifically, API requests containing the path segment 'admin' were reclassified as malicious instead of normal.
This was based on the assumption that administrative access should not be publicly accessible, which was later enforced through an AWS security rule that was set up. An example of a non-malicious generated API component is shown in Listing 3.2.

Listing 3.2: Example of a normal API component.
"DELETE /usr HTTP/1.0"

To be able to evaluate the implemented machine learning models, malicious requests were generated for 10% of the data. Common injection attacks were randomly added as payloads to the path components of the API strings. Each request was labeled by adding a new field, where 0 was assigned to normal requests and 1 to the generated malicious ones. See Listing 3.3 for the malicious paths used, and Listing 3.4 for an example of a malicious path added to an API component.

Listing 3.3: Malicious payloads added to the API paths.
"1' OR '1'='1"
"/api/etc/user/passwd"
"rm -rf /"
"105'; DROP TABLE users; --"
"/admin"

Listing 3.4: Example of a malicious API string with a SQL injection.
"PUT /usr1' OR '1'='1 HTTP/1.0"

For the models to process textual data, its attributes had to be encoded into numerical values. OneHotEncoder, CountVectorizer, and TfidfVectorizer from the Python module scikit-learn were tested, with TfidfVectorizer2 being selected. The TfidfVectorizer is an implementation of TF-IDF (Term Frequency-Inverse Document Frequency), a common measure in natural language processing used to evaluate the importance of words in a text document relative to an entire collection of documents [40].

1Code and data from Vishnu U., Server Logs Dataset. Kaggle. Available at: https://www.kaggle.com/datasets/vishnu0399/server-logs (accessed 2025-02-21).

3.2 Selection of Machine Learning Model

The first step of the project was to select a suitable machine learning model for anomaly detection.
Since the raw data that servers receive is unlabeled, it made sense to use an unsupervised model to be able to accurately assess its potential. To guide this choice, several related reports were reviewed, with one particularly influential study being Anomaly Detection Using Unsupervised Machine Learning Algorithms: A Simulation Study by Edmund Fosu Agyemang [39], discussed in Section 2.5. This study compared five commonly used unsupervised models under controlled conditions, including Isolation Forest, One-Class SVM, and Robust Covariance. All models were trained only on normal data and then tested on a mix of normal and anomalous points, which is a setup similar to our project. In our project, however, the model was trained on a dataset containing both normal and malicious requests. The results from the study showed that Isolation Forest offered the best overall balance between precision and recall, and it consistently achieved high F1 scores across different types of data. It also performed well on datasets with different structures and feature sets, which was important for our project, as we used both generated and real-world data from WirelessCar that differed in format. Given its strong general-purpose performance, low sensitivity to data dimensionality, and efficiency on larger datasets, Isolation Forest was chosen as the most appropriate model for our needs. Its ability to isolate anomalies without requiring labeled attack data made it a particularly good fit for this project, since that is how the model would have to operate in practice. 3.3 Threat Modeling Before starting the project, a threat model was made to identify potential threats that the application is exposed to. This was done by analyzing the application, its security risks, and its infrastructure. This was also done to be able to accurately hand-craft attacks that would simulate threats that might arise in real life.
In this application all common API threats, such as traffic overload (DoS/DDoS), authentication hijacking, injection attacks, Man-in-the-Middle (MitM), data exposure, and parameter tampering, would be possible. To be able to analyze the results clearly, and since only one part of the API request was chosen to train the model, only one specific attack type was chosen. Since the request parameter studied in this experiment was the API string, injection attacks were chosen as the primary threat to focus on. 2TfidfVectorizer from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html 3.4 Isolation Forest Experiment Set Up Isolation forests in this project were implemented using the isolation forest algorithm3 from the scikit-learn library in Python. The isolation forest model in scikit-learn provides several methods for its functionality, and the methods used, together with a description of their functionalities, are listed in Table 3.2: Method Functionality fit(X) Builds a fitted isolation forest estimator based on the input samples X. decision_function(X) Returns the mean anomaly score of the trees in the isolation forest for each instance in the dataset X. predict(X) Predicts whether each data point from X is an anomaly or not based on a fitted model. -1 is returned for anomalies and 1 for normal instances. Table 3.2: Overview of some methods available for an isolation forest in Python. In addition to the isolation forest's fitting and prediction capabilities, such a model can be adjusted with the parameters listed in Table 3.3. Parameter Explanation n_estimators The number of estimators (trees) in the isolation forest.
max_samples int, float or 'auto'. The number of samples drawn from the dataset to train each estimator. If 'auto', then max_samples = min(256, n_samples). contamination 'auto' or float. The proportion of anomalies in the dataset. If set to a float value, it sets the threshold for prediction and must be in the range (0, 0.5]. If 'auto', anomalies are detected as in a standard Isolation Forest (see Section 2.4.4). max_features int or float. The number of features to draw from the dataset to train each estimator. bootstrap bool. If True, trees are fit on random subsets of the data drawn with replacement. If False, sampling is done without replacement. n_jobs int. The number of parallel jobs to run during model fitting. random_state int, RandomState instance, or None. Controls randomness in feature selection and split values. Ensures reproducibility if set. verbose int. Controls the verbosity of the tree building process. warm_start bool. If True, allows additional trees to be added to an existing forest. Table 3.3: Overview of the parameters available for constructing an Isolation Forest using scikit-learn in Python. 3IsolationForest from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html An example of an isolation forest implementation can be seen in Listing 3.5, where X_train and X_test contain the pre-processed data used for training and testing.

from sklearn.ensemble import IsolationForest

model = IsolationForest(
    n_estimators = 200,
    max_samples = 0.8,
    contamination = 0.1,
    n_jobs = -1,
    verbose = 1
)

model.fit(X_train)

anomaly_scores = model.decision_function(X_test)
anomaly_labels = model.predict(X_test)

Listing 3.5: Isolation forest with scikit-learn.
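Because the contamination parameter plays a central role in the experiments that follow, its mechanics are worth illustrating. The sketch below is our own minimal example on synthetic numeric data, not on the thesis datasets: when contamination is given as a float, the model flags roughly that fraction of the training data as anomalous, regardless of how many true outliers actually exist.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 950 "normal" points around the origin and 50 obvious outliers far away
normal = rng.normal(0, 1, size=(950, 2))
outliers = rng.uniform(6, 9, size=(50, 2))
X = np.vstack([normal, outliers])

for contamination in (0.05, 0.10, 0.15):
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=0).fit(X)
    # predict() returns -1 for anomalies; count the flagged fraction
    flagged_fraction = (model.predict(X) == -1).mean()
    print(contamination, round(flagged_fraction, 3))
```

The flagged fraction tracks the contamination value in each case: the parameter sets the score threshold as a quota, which is why misestimating it directly produces false positives or false negatives.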
To answer the research questions posed in this thesis, multiple experiments were conducted. Each experiment was designed with a specific purpose to address different aspects of the overall research objectives. The experimental process began with validating the setup and dataset to ensure proper functionality. Based on the initial analysis of results, further experiments were carried out to deepen the investigation. The initial experiments used a generated dataset, and later, when company data became available, additional tests were performed using that data. The general setup for each experiment involved implementing the Isolation Forest algorithm with specific configurations, dataset types, and dataset sizes. This allowed for evaluating both the functionality of the Isolation Forest approach and the impact of various real-world factors on its performance. Datasets of varying sizes were generated and used for both training and testing. Multiple experiments were performed with different values for the contamination parameter to assess its effect on detection accuracy. The fit method from the scikit-learn library was used for training, while the predict and decision_function methods were used for evaluation. Since the fit method does not provide evaluation scores, performance assessment relied entirely on the results obtained from predict and decision_function. To measure the performance of the Isolation Forest implementations, accuracy metrics4 and confusion matrices5 were evaluated using tools provided by the scikit-learn library (see Listing 3.6).

from sklearn.metrics import accuracy_score, confusion_matrix

predicted_labels = (anomaly_labels == -1).astype(int)

accuracy = accuracy_score(real_labels, predicted_labels)
cm = confusion_matrix(real_labels, predicted_labels)

Listing 3.6: Evaluation metrics with scikit-learn.
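Combining the preprocessing from Section 3.1.1 with Listings 3.5 and 3.6, the pipeline can be sketched end-to-end. The API strings and counts below are illustrative stand-ins for the generated dataset, not the actual data used in the experiments.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative API strings: 90 normal requests and 10 with injected payloads
normal = ["GET /usr HTTP/1.0", "PUT /usr HTTP/1.0", "DELETE /usr HTTP/1.0"] * 30
malicious = ["GET /usr/admin HTTP/1.0", "PUT /usr rm -rf / HTTP/1.0"] * 5
requests = normal + malicious
real_labels = np.array([0] * len(normal) + [1] * len(malicious))  # 1 = malicious

# Encode the textual API strings into numerical TF-IDF vectors
X = TfidfVectorizer().fit_transform(requests)

# Contamination set to the true anomaly proportion of this toy set
model = IsolationForest(contamination=0.1, random_state=0).fit(X)

# Map the -1 (anomaly) / 1 (normal) outputs to 1 / 0 labels, then evaluate
predicted_labels = (model.predict(X) == -1).astype(int)
print(accuracy_score(real_labels, predicted_labels))
print(confusion_matrix(real_labels, predicted_labels))
```

On data this clean the malicious strings carry rare tokens (admin, rm, rf) with high inverse document frequency, so the model isolates them quickly; real traffic is far less separable, which is what the later experiments probe.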
3.4.1 Initial Experiment To begin, three Isolation Forest models were generated to evaluate the initial setup and determine how to proceed with the remaining experiments. The first model was trained using a generated dataset consisting of one million normal data points. During testing, a dataset of 10,000 data points was used, of which 10% were hand-crafted malicious requests. Following this, a second model was implemented, this time including some of the malicious requests in the training set. The testing was done the same way as in the first test. Lastly, a third test was done with all of the malicious requests used during training, while the testing was carried out in the same way again. The purpose of this experiment was to understand how the Isolation Forest behaves when exposed to malicious requests only during testing, compared to when such requests are present in both training and testing. For both of these tests with malicious requests in the training set, the contamination rate was set to the actual rate of bad requests. 4accuracy_score from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html 5confusion_matrix from scikit-learn v1.6.1. Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html Table 3.4: Test 1: Generated dataset with malicious only in testing.
Attribute Value Training Data Generated dataset, no malicious data (1,000,000 data points) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' (estimated by model, cannot be zero) Ground Truth Contamination 0 Table 3.5: Test 2: Generated dataset with malicious in both training and testing. Attribute Value Training Data Generated dataset with a subset of the hand-crafted malicious requests (1,000,000 data points; 100,000 malicious, 900,000 benign) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 Table 3.6: Test 3: Generated dataset with all malicious requests included in training. Attribute Value Training Data Generated dataset with all hand-crafted malicious requests included (1,000,000 data points; 100,000 malicious, 900,000 benign) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 3.4.2 Contamination Experiment During the initial experiments it became clear that the contamination parameter significantly influences the performance of the model. To better understand this relationship, four additional Isolation Forest models were trained with varying contamination settings. The purpose of these experiments was to examine how overestimating or underestimating the contamination rate, relative to the true rate, impacts detection effectiveness. In real-world deployments, the true contamination rate is typically unknown and can fluctuate over time. This makes it essential to understand how tuning the contamination parameter affects outcomes. These insights are valuable for configuring models more effectively in dynamic, real-world environments. The following tests were conducted to explore this further. Table 3.7: Test 4: Contamination parameter set to 0.10.
Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.10 Ground Truth Contamination 0.10 Table 3.8: Test 5: Contamination parameter set to 0.05. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.05 Ground Truth Contamination 0.10 Table 3.9: Test 6: Contamination parameter set to 0.15. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 0.15 Ground Truth Contamination 0.10 Table 3.10: Test 7: Contamination parameter set to 'auto'. Attribute Value Training Data 250,000 data points (10% hand-crafted malicious requests) Testing Data 250,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' (anomalies estimated by model) Ground Truth Contamination 0.10 3.4.3 Dataset Size Experiment To test how the size of the dataset affects the results, additional tests were made using dataset sizes of 100, 1,000, 10,000, and 100,000. The goal was to determine how small the dataset could be while still allowing the algorithm to independently detect anomalies with the contamination parameter set to 'auto'. Table 3.11: Test 8: Dataset size 100. Attribute Value Training Data 100 data points (10% hand-crafted malicious requests) Testing Data 100 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.12: Test 9: Dataset size 1,000.
Attribute Value Training Data 1,000 data points (10% hand-crafted malicious requests) Testing Data 1,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.13: Test 10: Dataset size 10,000. Attribute Value Training Data 10,000 data points (10% hand-crafted malicious requests) Testing Data 10,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 Table 3.14: Test 11: Dataset size 100,000. Attribute Value Training Data 100,000 data points (10% hand-crafted malicious requests) Testing Data 100,000 data points (10% hand-crafted malicious requests) Contamination Parameter 'auto' Ground Truth Contamination 0.10 3.4.3.1 Experiment with Real-Life Data When the company data became accessible, a final test was done to confirm that the model would work on a more varied dataset. As only 8,186 data points were received, this was the size of the dataset used. 10% of these requests were edited to contain malicious payloads, meaning that the path was changed to contain something that could be harmful, just like how the previous malicious requests were created. Table 3.15: Test 12: Real company dataset (10% malicious). Attribute Value Training Data Real company dataset (8,186 data points; 90% benign, 10% hand-made malicious) Testing Data Same as training set Contamination Parameter 'auto' Ground Truth Contamination 0.10 3.5 Rule-Based Testing Through AWS To compare the performance of a rule-based system with the Isolation Forest approach used in the experiment, a rule-based system was implemented and tested using AWS infrastructure. The setup involved deploying an AWS Lambda function, which served as the protected resource. An Amazon API Gateway was configured in front of the Lambda function, along with an associated AWS Web Application Firewall (WAF) that enforced request filtering rules.
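To make the WAF setup concrete, the sketch below builds a WAFv2 web ACL definition attaching the three AWS-managed rule groups used in this experiment (see Table 3.16). The rule-group names are the real AWS-managed groups; the ACL name, priorities, and metric names are our own illustrative choices, and in practice such a dict would be passed to the create_web_acl call of a boto3 'wafv2' client (or configured through the console) rather than merely constructed.

```python
# AWS-managed rule groups corresponding to Core Rule Set, Admin Protection,
# and Known Bad Inputs (names as published by AWS).
MANAGED_GROUPS = [
    "AWSManagedRulesCommonRuleSet",
    "AWSManagedRulesAdminProtectionRuleSet",
    "AWSManagedRulesKnownBadInputsRuleSet",
]

def build_web_acl(name="api-gateway-acl"):
    """Assemble a WAFv2 web ACL definition referencing managed rule groups."""
    rules = []
    for priority, group in enumerate(MANAGED_GROUPS):
        rules.append({
            "Name": group,
            "Priority": priority,
            "Statement": {
                "ManagedRuleGroupStatement": {
                    "VendorName": "AWS",
                    "Name": group,
                }
            },
            # 'None' lets the rule group's own block/allow actions apply
            "OverrideAction": {"None": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": group,
            },
        })
    return {
        "Name": name,
        "Scope": "REGIONAL",             # API Gateway requires a regional ACL
        "DefaultAction": {"Allow": {}},  # allow unless a rule blocks
        "Rules": rules,
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

The default action of "Allow" mirrors the experimental setup: traffic passes through unless one of the managed rules matches and blocks it.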
Due to time constraints, no custom (hand-crafted) rules were created. Instead, AWS-managed rule groups were added to the WAF. These rule groups are designed to cover a wide range of common web-based threats and were expected to block the crafted attacks used in the experiment. However, the specific details of these managed rules are not publicly available for security reasons. The dataset was uploaded to an Amazon S3 bucket and used to generate requests sent through the API Gateway. The WAF was configured to log all blocked requests along with the reasons for blocking. This logging enabled analysis of the system's performance by identifying which requests were blocked and which were allowed through. Table 3.16: Test 13: AWS setup with generated dataset (10% malicious). Attribute Value Data Generated dataset (250,000 data points; 90% benign, 10% hand-made malicious) Contamination 0.10 Rules AWS Managed rules (Core Rule Set, Admin Protection, Known Bad Inputs) 3.6 Evaluation To evaluate the success of the project, the performance of the rule-based system using AWS managed rules will be compared to the trained AI model. The comparison will focus on how many of our crafted bad requests each system successfully detects (the accuracy). Since this project serves as a proof of concept, we will consider it successful if the AI model can catch at least all the requests detected by the rule-based system. Achieving this would demonstrate that the AI approach is capable of identifying previously unseen attacks and malicious behavior without relying on predefined rules. 4 Results To evaluate the effectiveness of AI-based anomaly detection for securing API communication in connected vehicles, a series of experiments were conducted using both generated and real-world datasets. This chapter presents the outcomes of these experiments, focusing on the performance of the Isolation Forest model under various configurations and scenarios.
The results are organized based on key aspects of the evaluation process, including the model's behavior with different contamination rates, varying dataset sizes, and the proportion of hand-crafted malicious samples in the data. Finally, the results obtained from the real-world company dataset, as well as the rule-based approach using AWS WAF, are presented. Each experiment is accompanied by a confusion matrix and the corresponding accuracy metric. Additionally, the first experiment includes a visualization of the anomaly scores. 4.1 Isolation Forest Results This section presents the results from the experiments conducted with the Isolation Forest model. It begins with the initial setup experiments, followed by evaluations focusing on contamination rate sensitivity and dataset size. The section concludes with the results obtained from experiments using the real-world company dataset. 4.1.1 Initial Experiment on Generated Dataset Test Accuracy Test 1: Generated Dataset with No Anomalies in Training Set 0.1889 Test 2: Generated Dataset with a Subset of Handmade Anomalies in Training Batch 0.9806 Test 3: Generated Dataset with Handmade Anomalies in Training and Testing Batch 1.0000 Table 4.1: Results of the initial experiments. Each result is illustrated with a corresponding plot of anomaly scores and a confusion matrix. 4.1.1.1 Generated Dataset with No Anomalies in Training Set Results from Test 1. The setup is presented in Table 3.4. Figure 4.1: The plot of anomaly scores of the isolation forest built and trained on a generated dataset of 1,000,000 samples where no anomalies are included. Here the test set contains 100 samples for a better visualization. Figure 4.2: The confusion matrix of the isolation forest built on a generated dataset where no anomalies are included. 4.1.1.2 Generated Dataset with a Subset of Handmade Anomalies in Training Batch Results from Test 2. The setup is presented in Table 3.5.
Figure 4.3: The plot of anomaly scores of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. Here the test set contains 100 samples for a better visualization. Figure 4.4: The confusion matrix of the isolation forest built on a generated dataset where a subset of anomalies is included in the training set. 4.1.1.3 Generated Dataset with Handmade Anomalies in Training and Testing Batch Results from Test 3. The setup is presented in Table 3.6. Figure 4.5: The plot of anomaly scores of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. Here the test set contains 100 samples for a better visualization. Figure 4.6: The confusion matrix of the isolation forest built on a generated dataset where the whole set of anomalies is included in the training set. 4.1.2 Experiments With Contamination On Generated Dataset Test Accuracy Test 4: Contamination Set to 10% 1.0000 Test 5: Contamination Set to 5% 0.9497 Test 6: Contamination Set to 15% 0.9549 Test 7: Contamination Set to 'auto' 1.0000 Table 4.2: Results of tests on the contamination rate. Each result is illustrated with a corresponding confusion matrix. 4.1.2.1 Contamination Set to 10% Results from Test 4. The setup is presented in Table 3.7. Figure 4.7: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.2.2 Contamination Set to 5% Results from Test 5. The setup is presented in Table 3.8. Figure 4.8: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 5%. 4.1.2.3 Contamination Set to 15% Results from Test 6. The setup is presented in Table 3.9.
Figure 4.9: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 15%. 4.1.2.4 Contamination Set to 'auto' Results from Test 7. The setup is presented in Table 3.10. Figure 4.10: The confusion matrix of an isolation forest built on a dataset of 250,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'. 4.1.3 Experiments with Different Dataset Sizes Test Accuracy Test 8: Dataset of Size 100 0.9800 Test 9: Dataset of Size 1,000 0.9900 Test 10: Dataset of Size 10,000 1.0000 Test 11: Dataset of Size 100,000 1.0000 Table 4.3: Results of experiments on different dataset sizes. Each result is illustrated with a corresponding confusion matrix. 4.1.3.1 Dataset of Size 100 Results from Test 8. The setup is presented in Table 3.11. Figure 4.11: The confusion matrix of an isolation forest built on a dataset of 100 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.2 Dataset of Size 1,000 Results from Test 9. The setup is presented in Table 3.12. Figure 4.12: The confusion matrix of an isolation forest built on a dataset of 1,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.3 Dataset of Size 10,000 Results from Test 10. The setup is presented in Table 3.13. Figure 4.13: The confusion matrix of an isolation forest built on a dataset of 10,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%. 4.1.3.4 Dataset of Size 100,000 Results from Test 11. The setup is presented in Table 3.14. Figure 4.14: The confusion matrix of an isolation forest built on a dataset of 100,000 samples. The real contamination proportion is 10% and the contamination parameter is set to 10%.
4.1.4 Experiments On Company Data Results from Test 12. The setup is presented in Table 3.15. The accuracy is 0.9961 and the corresponding confusion matrix can be seen in Figure 4.15. Figure 4.15: The confusion matrix of an isolation forest built on a real-life dataset of 8,186 samples. The real contamination proportion is 10% and the contamination parameter is set to 'auto'. 4.2 Rule-Based System Results Results from Test 13. The setup is presented in Table 3.16. Figure 4.16: The confusion matrix from the AWS system set up with managed rules and the generated dataset. Test Accuracy Test 13: Rule-based Approach/AWS 0.9548 Table 4.4: Results of experiments on the AWS setup, rule-based approach. Table 4.5: Examples of malicious requests (label = 1) allowed by AWS WAF. Method URL Path Suspicious Element Notes DELETE /live/usr/api/etc/user/passwd Attempt to access system file Contains path resembling /etc/passwd; known target in Unix-based attacks PUT /live/usr/register/api/etc/user/passwd File path resembling system credential file No referrer; common in automated malicious scans GET /live/usrrm -rf / Command injection pattern Mimics destructive Unix command rm -rf / in URL path 5 Discussion This chapter evaluates the results of the experiments, examining factors that may have influenced the outcomes. Each experiment is first analyzed individually, followed by a combined discussion to assess the overall reliability of the project. Finally, potential improvements and directions for future work are explored. 5.1 Initial Experiment Evaluation The first experiment, shown in Table 3.4, did not yield good results, as most of the malicious data was not recognized as such. An iTree is built based on the features it encounters during training. When data with similar features is passed through the trained tree, it is matched accordingly and the traversal stops.
However, if the data contains new, unseen features, it continues down the tree in search of a match. Since no match exists, it reaches the bottom of the tree. Because the model cannot make a confident prediction in this case, the data point is not marked as an anomaly and is instead treated as normal. This behavior was not immediately clear during the initial experiments but was later understood to occur because the TfidfVectorizer creates a fixed feature vocabulary from the training data. As a result, any unseen features in the test set are ignored and therefore not considered by the Isolation Forest. This limitation was clearly demonstrated in the first experiment. Therefore, in the following two experiments, with setups shown in Table 3.5 and Table 3.6, malicious data was included in both the training and testing batches. In the test described in Table 3.5, only a subset of the malicious requests was used for training. The results showed some improvement, but as discussed earlier, the model still failed to classify data points that included previously unseen features. In the final initial experiment, shown in Table 3.6, all types of created attacks were included in both the training and testing sets. As a result, the model was able to correctly label all of the data points. This gave valuable insights into how the model worked and how it should be trained and tested in the following experiments to get the best results, that is, that the training and testing datasets should be the same. 5.2 Contamination Experiment Evaluation The contamination experiments were conducted to observe how overestimating or underestimating the expected contamination rate, compared to the ground truth contamination in the data batch, would affect model performance. Four experiments were performed: one with an underestimated contamination rate, one with an overestimated rate, one matching the ground truth contamination, and one using the 'auto' setting where the model estimates the anomalies on its own. Both underestimating and overestimating the contamination negatively impacted the results. When the contamination was overestimated, the model incorrectly labeled some benign data points as malicious. Conversely, when the expected contamination was lower than the actual contamination, some malicious data points were incorrectly labeled as benign. The best results, with an accuracy of 1.0, were achieved when the contamination rate was either correctly specified or set to 'auto'. The model uses the expected contamination rate to determine how many data points to label as anomalies. If the expected contamination is lower than the actual rate, the model identifies fewer anomalies than truly exist. If the expected rate is higher, the model starts labeling the most anomalous-looking normal data points as anomalies to meet the expected quota, even if those data points are actually benign. 5.3 Dataset Size Experiment Evaluation The previous results showed that the contamination parameter significantly influenced the model's performance. Using an accurate contamination rate proved crucial for making reliable predictions. However, in real-world scenarios, knowing the exact contamination rate is not feasible, as it can vary over time. Therefore, the final set of experiments aimed to determine how much data the model requires for the 'auto' setting to produce accurate results. The experiments revealed that the 'auto' setting could accurately estimate the contamination rate, provided that the dataset was large enough. However, when the dataset was too small, the model's estimation became less reliable. This is likely because small datasets do not exhibit clear patterns.
As a result, the model sometimes falsely labeled benign data as malicious. When the model builds its trees, it relies on the features present in the data and the frequency with which they appear. In a small dataset, normal data may include features that appear unique by chance. This can cause the model to misclassify anomalous data as normal, since the lines between what is normal and what is anomalous are not as clear. In contrast, with a larger dataset, the patterns become more consistent and statistically meaningful, reducing the likelihood of these false positives. 5.4 Real Life Data Experiment Evaluation The tests using the company's data, combined with the configurations that had performed best in earlier experiments (i.e., contamination set to 'auto' and inclusion of malicious data in both the training and testing sets), yielded promising results. The model successfully detected almost all of the malicious requests. The ones it missed might have been missed because the dataset was quite small. As seen in the previous experiments, the accuracy began to dip when training was done on less than about 10,000 data points, which was the case for this dataset. However, it should be noted that the malicious requests were manually crafted and deliberately inserted into the dataset. This may have made them easier to detect compared to more subtle organic attacks that could occur in a real-world setting. 5.5 Rule-Based System Experiment Evaluation The results from the rule-based system using AWS WAF highlighted its strengths, but also its limitations. Some of the handcrafted malicious requests were successfully detected by matching predefined rules in the AWS Managed Ruleset. AWS WAF is designed to block known attack signatures such as SQL injections, cross-site scripting, and access to restricted paths.
However, the system failed to detect some of the crafted attacks, which demonstrates a key limitation of purely rule-based defenses: they cannot catch what they are not explicitly programmed to recognize. While the AWS WAF test provided useful insight into how rule-based systems work, it does not fully reflect a real production setup. In practice, the rules would be adjusted to match the specific application and threat landscape; no such tuning was included in this experiment. Overall, AWS WAF performed well on known, simple attacks, but the test setup was limited, so not much else can be concluded. This does, however, sufficiently highlight the need to combine rule-based tools with adaptive methods like machine learning to detect unknown threats. 5.6 Experiment Reliability There are several important considerations when interpreting the results of this study. The most significant factor is the nature of the data used in the experiments. Since the entire dataset was synthetically generated, it tends to be quite predictable and follows easily recognizable patterns. Although different features appear in various combinations, the overall number of distinct features is limited, which reduces the data's diversity. The attack scenarios included in the dataset were also quite constrained. Only five distinct attack paths were created. While these were combined with other API path components in different ways, the overall variation remained low. It could be argued, however, that because the contamination rate was set relatively high despite the limited attack diversity, the frequency of attacks in the dataset increased, potentially making them harder for the model to detect. Isolation Forest is effective at identifying patterns that indicate normality. When the data is inherently predictable, the model performs particularly well at recognizing these patterns.
However, since the analysis involved tokenizing and separating each parameter in the API path (using the TfidfVectorizer), the sequential relationships between tokens were not preserved. This lack of context can cause the model to miss important behavioral cues. For example, a path ending with a specific term x might be harmful, whereas having x in the middle may not be. Similarly, a certain path y might appear harmless when used with method A but could be suspicious when used with method B. The high predictability of the dataset likely explains why the model achieved an accuracy of 1. The attack patterns were relatively easy to distinguish from the normal data points, making the classification task less challenging. One factor that significantly impacted performance was the contamination rate. The results indicated that this parameter needed to be either set accurately or configured as 'auto' for the model to perform well. In systems with a constant flow of unpredictable data, however, the contamination rate is generally unknown in advan