Attack Traffic Generation for Network-based
Intrusion Detection System
Master’s thesis in Computer science and engineering

Chandrika Neelap
Harsh Vardhan Bhandari

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2023




Attack Traffic Generation for Network-based Intrusion Detection System

Chandrika Neelap
Harsh Vardhan Bhandari

© Chandrika Neelap, Harsh Vardhan Bhandari, 2023.

Supervisor: Magnus Almgren, Department of Computer Science and Engineering
Advisors: Hjalmar Wennerström, Joergen Nilsson, Robert Bosch AB
Examiner: Magnus Almgren, Department of Computer Science and Engineering

Master’s Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Gothenburg, Sweden 2023



Abstract
The automotive industry is constantly coming up with technological advances, making
automotive vehicles complex systems consisting of a multitude of electronic,
mechanical, and software components. A critical part of such systems is the
electronic control unit (ECU), which is responsible for controlling specific functions.
Nowadays, automotive vehicles are equipped with more than 100 ECUs that control
a wide range of functions, from essential (engine and power steering control) to
comfort (windows, seats, etc.) to critical (airbags). The Controller Area Network (CAN)
helps the ECUs communicate with one another using a common bus. CAN is a
message-based protocol that offers reliable, priority-driven communication
of essential control data.

The CAN bus, despite its reliability and efficiency, is prone to a variety of cyber
attacks. The vulnerability of CAN to cyber attacks can be reduced by deploying an
Intrusion Detection System (IDS). An IDS detects intrusions by observing events or
by validating the range of different parameters in an attempt to identify malicious
content that could indicate an attack. It acts as a line of defence against cyber
attacks and can play a major role in safeguarding CAN-based systems.

In order to check the efficiency and reliability of security mechanisms like IDSs, they
must be tested against malicious data to assess their ability to detect various types
of attacks. However, malicious data is not readily available. The objective of this
thesis is to investigate a methodology and develop software that can manipulate
known attack traffic and add it to existing data sets. The ability and effectiveness
of this attack traffic generating (ATG) software in mimicking real-life cyber attacks
are evaluated through a series of experiments, which also highlight its strengths and
weaknesses. The experiments reveal that the developed software succeeds in
introducing malicious traffic into benign traffic in a random fashion, which mimics
real-life attack traffic. The time the software takes to introduce these attacks
grows as O(n^2).

Keywords: Electronic Control Unit, Controller Area Network, Intrusion Detection
System, Attack traffic generator.



Acknowledgements
We would like to acknowledge our supervisor from Chalmers University of Technology,
Magnus Almgren, and our supervisors from Robert Bosch AB, Hjalmar Wennerström
and Jörgen Nilsson, for their support. We would also like to acknowledge
Sravan Tatipala, PhD Candidate at Product Development Research Laboratory-BTH.
Without their guidance, the success of this thesis would be unimaginable.

Chandrika Neelap, Gothenburg, 2023-09-06
Harsh Vardhan Bhandari, Gothenburg, 2023-09-06




Contents

List of Figures

List of Tables

1 Introduction
1.1 Background and Motivation
1.2 Aim and Scope
1.2.1 Objectives
1.2.2 Research Questions
1.2.3 Outline

2 Review of Literature
2.1 Literature Review
2.1.1 CAN attacks and countermeasures
2.1.2 Security Mechanisms for CAN
2.1.3 Existing CAN Traffic Datasets
2.1.4 Analysing CAN Traffic
2.1.5 Traffic Generators for TCP/IP
2.1.6 Traffic Generators for CAN
2.2 Related Work

3 Conceptual Foundation
3.1 Controller Area Network (CAN)
3.1.1 CAN Architecture
3.1.2 CAN Frames
3.1.3 CAN Security and Vulnerabilities
3.2 Traffic Analysis and Pattern Recognition
3.3 Intrusion Detection Systems (IDS)
3.4 Attack Traffic Generation
3.4.1 Characteristics of Attack Data
3.4.2 Experimental Evaluation and Validation
3.4.3 Trace Files
3.5 Attack Models
3.5.1 Attacker Approach
3.5.2 The Replay Attack
3.5.3 Denial of Service
3.5.4 Spoofing Attack
3.5.5 Fuzzy Attack
3.5.6 Flooding Attack
3.5.7 Isolation Attack
3.5.8 Overwrite Attack

4 Design and Implementation
4.1 System Overview
4.2 Trace File Analysis
4.2.1 Extracting Message Parameters
4.2.2 Message ID Analysis
4.2.2.1 Frequency of Occurrence
4.2.2.2 Repeating Sequences
4.2.3 Timestamps Analysis
4.3 Attack Generation
4.3.1 Fuzzy Attack
4.3.2 Replay Attack
4.3.3 Overwrite Attack
4.3.4 Spoofing Attack
4.4 Evaluation Framework
4.4.1 Comparison of Trace Files: Real world vs Synthetic traffic
4.4.2 Output Trace File: Randomness
4.4.3 Framework: Execution time, Complexity

5 Experiments and Results
5.1 Comparison of Trace Files
5.1.1 TF1: Ratio of Cyclic Messages
5.1.2 TF2: Message Frequency
5.1.3 TF3: Standard Deviation
5.1.4 TF4: Parameter Correlation
5.2 Output Trace File
5.2.1 R1: Attack to benign ratio
5.2.2 R2: Spread of attacks within trace file
5.2.3 R3: Cyclic to acyclic ratio
5.3 Framework Performance
5.3.1 P1: Execution time
5.3.2 P2: Complexity
5.4 Discussion
5.4.1 Comparison of Trace Files
5.4.2 Output Trace Files
5.4.3 Framework Performance
5.5 Limitations
5.6 Future Work
5.7 Ethical concerns and sustainability
5.7.1 Ethical issues
5.7.2 Sustainability

6 Conclusion

Bibliography


List of Figures

3.1 CAN architecture as compared to the OSI model, taken from [33]
3.2 (a) standard CAN frame, (b) extended CAN frame, taken from [34]
3.3 Snippet of an ASCII format tracefile

4.1 System overview flowchart
4.2 Message ID frequency of a trace file
4.3 Graphical representation of different message IDs based on their cyclic difference
4.4 Injected fuzzy attack traffic into normal CAN traffic
4.5 Injected replay attack traffic into normal CAN traffic
4.6 Injected overwrite attack traffic into normal CAN traffic
4.7 Spoofed messages compared to original attack-free trace file

5.1 Message ID frequency of real-world trace files
5.2 Message ID frequency of synthetically created trace files
5.3 Standard deviation of time intervals of different messages
5.4 Attack plots
5.5 Fuzzy attack randomness
5.6 Replay attack randomness
5.7 Overwrite attack randomness
5.8 Spoofing attack randomness
5.9 Expected execution time of each attack
5.10 Observed execution time of each attack


List of Tables

3.1 List of studied attacks

5.1 Size of trace files used in experiments
5.2 Cyclic to acyclic messages ratio
5.3 Correlation between IDs and data lengths
5.4 Cyclic to acyclic messages ratio of attack-free and infected trace files
5.5 Size of trace files used in experiments
5.6 Computation time for each attack
5.7 Scaling factor for each attack
5.8 Equation for each attack


1 Introduction

Vehicles have become a vital part of life all around the globe, and their most common
use is transporting people. Modern automotive systems have improved the quality of
transportation not just by increasing the range of travel but also by making the journey
more comfortable [1]. Vehicles today are data-powered, with a plethora of customizable
features that improve safety, efficiency, and comfort for the driver and the passengers
by communicating with other vehicles and systems. To provide these features, a
vehicle comprises multiple in-vehicle electronic systems, each of which controls a
specific function and is in turn controlled by an Electronic Control Unit (ECU). Due to
the increasing demand for comfort, a large number of electronic systems now exist
inside a vehicle, each controlled by its own ECU [1].

This has led to a drastic increase in the number of ECUs, which communicate not
just with each other but also with the environment. The most common means of
communication between ECUs in vehicles today is the Controller Area Network
(CAN). Its real-time properties, simplicity, and low cost make it well suited to
automotive vehicles.

Improved connectivity is both a boon and a bane, as it opens up an avenue for
adversaries to attack these systems, which can lead to catastrophes. Improved
automotive security is therefore needed to protect the integrity of communication
inside these systems. The majority of real-world attacks target ECUs connected
through CAN [1].

With the rapid development of electronic and smart appliances in modern vehicles,
several ECUs are integrated into the conventional CAN bus. Huang et al. [2]
highlight that the CAN bus does not offer protection against most attacks and is
vulnerable to a variety of manipulations. Vulnerabilities in an auxiliary onboard
appliance can turn it into a relay, giving attackers an opportunity to interact with
the CAN bus [2]. With this in mind, many security mechanisms such as intrusion
detection systems (IDSs) are being developed. To improve the ability of such systems
to detect known attacks, they are tested against data sets of attack traffic mixed
with benign traffic. A major challenge, both in academia and in industry, is the lack
of available traffic data, especially data containing known attacks [1]. In this
thesis, we aim to address this problem for a network-based IDS developed for CAN.

1.1 Background and Motivation
The Controller Area Network (CAN) is a network protocol that is commonly used
in modern vehicles for communication between electronic control units (ECUs). It is a
message-based protocol that allows for real-time information transmission between
various car systems such as engine control, brakes, and safety systems [3]. The
CAN protocol has been shown to be efficient and successful in facilitating in-vehicle
communication. However, its security flaws have become an increasing source of
concern [4]. The increasing complexity of modern automobiles, with their various
interconnected ECUs, has increased the number of potential attack vectors. As
automobiles become more integrated and autonomous, they also become more
susceptible to attacks. Unauthorized access to the CAN network can jeopardize
vehicle safety, privacy, and functioning, endangering drivers, passengers, and other
road users [5]. Malicious actors can exploit vulnerabilities in the CAN protocol to
launch network attacks, including message spoofing, injection, and alteration [6].

Intrusion Detection Systems (IDSs) were created as an important line of protection
against CAN-based attacks. An IDS scans the network for suspicious activity and
seeks to detect and neutralize any malicious behavior [7]. However, the efficacy of an
IDS depends on its ability to identify and respond to established attack patterns.
The lack of a comprehensive data set of CAN-based attacks makes designing and
evaluating IDSs for this protocol difficult [8]. A critical component of testing a
network-based IDS for the CAN protocol is the generation of realistic attack traffic
patterns. Generating traffic data that contains established attack patterns is
difficult, however, because little existing attack traffic data is available.

The goal of this thesis is to devise a methodology for designing and implementing
a software tool that generates synthetic attack traffic patterns mimicking the
characteristics of real-world attacks. This involves analyzing the characteristics of
attack data and generating and modifying attack traffic in order to test network-based
Intrusion Detection Systems over the CAN network. The idea is to develop software
that takes a typical traffic trace file as input and inserts abnormal traffic patterns
using an attack template. The ability to create traffic data with known attack
patterns is critical for assessing the performance and efficacy of intrusion detection
systems [9]. The developed software tool is tested on its ability to introduce
malicious traffic patterns into normal CAN traffic.
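
The idea of taking a benign trace and inserting attack traffic from a template can be
illustrated with a minimal Python sketch. It is an illustration under stated
assumptions, not the tool developed in this thesis: the `CanMessage` record, the
`inject_fuzzy` function, the `is_attack` label, and the choice of random identifiers
and payloads as the attack template are all hypothetical. The sketch assumes a
non-empty trace that is already sorted by timestamp.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CanMessage:
    timestamp: float          # seconds since the start of the trace
    can_id: int               # 11-bit standard identifier
    data: bytes               # 0 to 8 payload bytes
    is_attack: bool = False   # hypothetical label, useful when scoring an IDS


def inject_fuzzy(trace, n_attacks, seed=None):
    """Return a new trace with n_attacks random messages mixed in."""
    rng = random.Random(seed)
    start, end = trace[0].timestamp, trace[-1].timestamp
    attacks = [
        CanMessage(
            timestamp=rng.uniform(start, end),   # random position in time
            can_id=rng.randrange(0x800),         # any 11-bit identifier
            data=bytes(rng.randrange(256) for _ in range(rng.randint(0, 8))),
            is_attack=True,
        )
        for _ in range(n_attacks)
    ]
    # Merge benign and malicious messages in timestamp order.
    return sorted(trace + attacks, key=lambda m: m.timestamp)


# Hypothetical benign trace: three IDs sent in turn every 10 ms.
benign = [CanMessage(0.01 * i, 0x100 + (i % 3), bytes(8)) for i in range(100)]
infected = inject_fuzzy(benign, n_attacks=10, seed=1)
```

Drawing the insertion times at random means that no user input is needed to decide
where the attack messages go, and the label makes it possible to compare the output
of a detector with the ground truth.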

This thesis aims to contribute to the field of CAN network security and
the evaluation of intrusion detection systems. By conducting an in-depth analysis of
attack data characteristics and developing a tool for generating attack traffic, this
thesis seeks to enhance the ability to detect and mitigate attacks, thereby improving
the overall security of the CAN protocol in modern vehicles.

1.2 Aim and Scope
The objective of this thesis is to analyze the characteristics of attack data and to
generate and modify attack traffic in order to test a network-based Intrusion Detection
System (IDS) over CAN. The aim is to develop a framework that takes a trace
file with normal traffic as input and inserts known malicious traffic patterns using an
attack template. Finding a novel way to generate traffic data that contains known
attack patterns is difficult due to the limited availability of existing traffic data that
contains documented attacks. Additionally, this area of study is not extensively
explored in the literature.

The scope of the thesis covers the development of the framework for introducing
attack traffic and its evaluation. Furthermore, the attack traffic should be injected
into the files without requiring user input to indicate where the attacks should
be placed.

1.2.1 Objectives
The objectives of the thesis are:

• To perform a thorough literature review on automotive data-communication
security, the CAN standard and its weaknesses, state-of-the-art electrical and
electronics architectures (involving the CAN network), protocols such as CAN,
network management, IDSs, and known attacks on vehicles relating to CAN.

• Review common attack patterns (such as Denial of Service, Flooding), under-
stand their implementation, and select the most suitable patterns to implement
in the framework.

• Develop an overarching framework in incremental steps that is capable of a)
analyzing trace files, b) generating a set of defined attack profiles, and c)
introducing the attack profiles on the selected trace files.

• Generalize the framework to handle any trace file and improve the attack
profiles to model an attacker, for example by introducing randomness, history,
and learning.

• Evaluate the developed framework using defined metrics.

1.2.2 Research Questions
In this thesis, we aim to address three key research questions pertaining to the
development of a software tool for generating CAN traffic with known attack patterns.
The objectives of the thesis are explored through the following research questions:

• How can a software tool be developed that generates traffic data containing
known attack patterns from a trace file of normal traffic, thereby addressing
the limited availability of such data in the field of automotive
security research?

• What are the limitations and challenges of generating synthetic traffic data
for CAN-based networks?

• How can the attack profiles be improved to make detection challenging for a
security mechanism?

1.2.3 Outline
The chapters of this thesis are organized as follows. Chapter 1 introduces
the research by providing the background and motivation, outlining the aim, scope,
objectives, and research questions. Chapter 2 presents a comprehensive review of
the literature, identifying gaps and areas for further investigation. In Chapter 3, the
conceptual foundation is established, covering the Controller Area Network (CAN)
architecture, CAN frames, security vulnerabilities, traffic analysis, pattern recogni-
tion, and Intrusion Detection Systems (IDS). Chapter 4 focuses on the implemen-
tation, providing an overview of the system, trace file analysis, attack generation
techniques, and the evaluation framework. Chapter 5 presents the conducted ex-
periments and their results, including comparisons of trace files, randomness of the
output trace file, and performance metrics of the framework. Finally, Chapter 6 con-
cludes the thesis, summarizing the findings, discussing limitations, and suggesting
potential areas for future work.



2 Review of Literature

In order to proceed with the project, relevant knowledge of various aspects of
automotive systems was required. Only after we had familiarized ourselves with
automotive communication protocols and standards did generating attacks become
possible. The ideas discussed in the following scientific publications proved useful in
reaching the level of knowledge needed to achieve the objective of this thesis. Below
we first provide a literature review before discussing the work most relevant to our own.

2.1 Literature Review
Initially, a small set of papers was obtained from our supervisor at Bosch, which
helped us understand the basic concepts of automotive systems and their security.
Through the references of these papers, we explored further related studies that
primarily focused on the CAN architecture along with its vulnerabilities and the file
formats associated with it. It was important to understand the information stored in
these files in order to manipulate them. We identified the various attacks that are
prevalent in CAN, and studying how different security mechanisms detect these
attacks revealed the characteristics of each attack. To gain an idea of what CAN
traffic looks like, we explored various existing CAN datasets that contain both normal
and malicious data. Learning how to analyze CAN traffic to extract as much
information as possible helped in finding different ways to introduce malicious traffic
into benign traffic. Before proceeding with the implementation, looking at existing
traffic generators for both TCP/IP networks and CAN gave us insights into how to
develop the software.

2.1.1 CAN attacks and countermeasures
After gaining enough understanding of the CAN bus, different known attacks were
studied in order to gain insights on which attacks prevail in CAN buses and how
they can be performed.

Bozdal et al. [10] examine the security challenges associated with the CAN bus. The
authors identify various security threats that CAN bus networks are vulnerable to,
such as message injection, replay, and denial-of-service attacks. They also discuss
the limitations of current security mechanisms used in CAN bus networks. The pa-
per emphasizes the importance of addressing these security challenges and proposes
various approaches to improve the security of CAN bus networks, including intru-
sion detection systems, protocols for secure authentication, and methods for data
encryption. Overall, the paper highlights the need for robust security solutions for
CAN bus networks to ensure the safety and reliability of critical systems.

Jo and Choi [11] present a survey of attacks on CAN. The authors discuss the
various types of attacks that CAN networks are vulnerable to, such as message
injection, replay, and denial-of-service (DoS) attacks. The paper reviews existing
countermeasures used to prevent these attacks, including message authentication,
encryption, and IDSs. The authors evaluate the effectiveness of these
countermeasures, highlight their limitations, and suggest future research to improve
the security of CAN networks.

A related study by Dibaei et al. [12] covers a wide range of attacks that can be
performed on intelligent connected vehicles and measures that can be used to
mitigate those attacks. The paper presents attacks that can exploit the CAN bus, the
vehicle’s ECUs, and the sensors and actuators. It also discusses the potential damage
that these attacks can cause, such as risks to safety and financial damage. The paper
then describes various defense mechanisms that can be employed to protect
intelligent connected vehicles, including secure communication protocols, intrusion
detection systems, and hardware-based security solutions. The implementation of
attacks on the CAN bus is one of the most important parts of this thesis, and some
research papers present common CAN attacks and how they can be performed.

Thirumavalavasethurayar and Ravi [13] describe a method for simulating a replay
attack on the CAN bus. The paper begins by explaining what a replay attack is and
how an adversary can use it to compromise the security of a CAN bus. It then
describes in detail how the replay attack is implemented, along with its components:
the attacker, the victim, and the replay device. The results of their simulation
showed that the replay attack was successful in compromising
the security of the CAN bus.
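
The principle of a replay attack can be sketched in a few lines of Python. This is
a simplified illustration that operates on a recorded trace, not the bus-level
simulation of [13]: the `replay` function, its parameters, and the tuple layout of a
message are hypothetical. Messages seen in a recording window are copied unchanged
and sent again after a delay.

```python
from typing import List, Tuple

# Hypothetical message layout: (timestamp in seconds, CAN ID, payload)
Message = Tuple[float, int, bytes]


def replay(trace: List[Message], start: float, end: float,
           delay: float) -> List[Message]:
    """Copy messages with start <= timestamp < end and resend them later."""
    recorded = [m for m in trace if start <= m[0] < end]
    # The attacker leaves ID and payload untouched; only the time changes.
    replayed = [(t + delay, can_id, data) for (t, can_id, data) in recorded]
    return sorted(trace + replayed, key=lambda m: m[0])


trace = [
    (0.0, 0x101, b"\x01"),
    (0.1, 0x102, b"\x02"),
    (0.2, 0x101, b"\x03"),
    (0.3, 0x102, b"\x04"),
]
out = replay(trace, start=0.0, end=0.2, delay=0.25)
```

Because the replayed messages are valid copies of earlier traffic, only their timing
and their stale payloads distinguish them from benign messages.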

2.1.2 Security Mechanisms for CAN
We also explored the different techniques used in various security mechanisms that
focus on protecting the network by detecting malicious traffic.

Laufenberg et al. [14] present an approach for detecting attacks on CAN
communication, which is widely used in automotive and industrial systems. The
proposed approach analyzes the CAN message structure and identifies suspicious
patterns that can indicate the presence of an attack, using a machine learning-based
classification method to distinguish normal traffic from malicious
CAN messages.

Another study, by Lenard and Bolboaca [15], proposes a stateful firewall
and IDS for the CAN bus. The objective of these systems is to detect and
prevent various security threats by inspecting and filtering CAN bus traffic. Addi-
tionally, the authors propose a mechanism for storing the data and events generated
by the firewall and IDS in a secure manner (secure logging). The paper lays empha-
sis on the importance of secure logging in IDSs and suggests that future research
could be done on improving the efficiency of the proposed system.

Fürst and Bechter [16] discuss the use of AUTOSAR for connected and autonomous
vehicles. The paper highlights the benefits of using AUTOSAR for such vehicles,
including enhanced scalability, reliability, and security. The authors also discuss
the challenges of extending the capabilities of AUTOSAR for such vehicles, such as
the need for advanced communication protocols. Overall, the paper suggests that
AUTOSAR is a promising approach for developing software systems in connected
and autonomous vehicles.

2.1.3 Existing CAN Traffic Datasets
After developing techniques for analyzing the traffic, the next step was to see how
the extracted information can be used to cleverly inject attack patterns into benign
traffic. For this we turned to existing datasets of attack traffic, to get an idea of
what normal and malicious CAN traffic looks like.

Hollifield et al. [17] provide a detailed list of existing datasets for CAN intrusion
detection along with their characteristics. The paper emphasizes the importance of
developing a standardized methodology for evaluating the performance of an IDS
for CAN networks and introduces a new dataset, the ROAD dataset, for
this purpose. The dataset includes both benign and malicious traffic patterns. The
authors present an analysis of the contents of the dataset along with
attack frequencies and noticeable patterns.

Zago et al. [18] evaluate a dataset of CAN messages for the purpose of reverse
engineering. This dataset, ReCAN, contains over 5,000 CAN messages captured from
a real-world vehicular setting, and each message in it is unique. The paper
describes how the dataset was put together and its characteristics, such as the
distribution of different types of messages and the frequency with which they occur.
The dataset helps reverse engineer CAN messages into message signals and types
and decode the data in the payloads.

Sharafaldin et al. [19] focus on the development of a new data set for IDSs and
highlight the importance of diverse and representative data sets that can be
used to train an IDS effectively. The paper highlights the shortcomings of existing
datasets for TCP/IP networks and discusses a methodology for generating
a new dataset that is more realistic. It explains the process of collecting network
traffic data from various sources and preprocessing it. The authors use an algorithm
to generate synthetic malicious traffic by combining real network traffic with
simulated attacks in an attempt to capture real-world attacks in their datasets. The
dataset is evaluated by comparing the performance improvement of IDSs when using
this dataset against existing datasets.

2.1.4 Analysing CAN Traffic
A crucial part of the project was to analyze the traffic of the CAN bus to find
patterns and extract as much information about the traffic and its characteristics
as possible. The following studies provided insights into how this can be done.
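
As a rough illustration of the kind of information that can be extracted from a
trace, the Python sketch below counts how often each message ID occurs and
summarizes the time gaps between consecutive messages with the same ID. It is a
sketch under assumptions, not the analysis implemented in this thesis: the `analyse`
function, the list of `(timestamp, can_id)` pairs used as input, and the
`jitter_ratio` rule for calling an ID cyclic are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def analyse(trace, jitter_ratio=0.1):
    """trace: list of (timestamp_seconds, can_id) pairs in time order.

    Returns {can_id: (count, mean_gap, gap_stddev, looks_cyclic)}.
    """
    counts = Counter(can_id for _, can_id in trace)
    times = defaultdict(list)
    for t, can_id in trace:
        times[can_id].append(t)

    report = {}
    for can_id, ts in times.items():
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if len(gaps) < 2:
            # Too few messages to say anything about periodicity.
            report[can_id] = (counts[can_id], None, None, False)
            continue
        period, spread = mean(gaps), pstdev(gaps)
        # Assumed rule: an ID is cyclic when its gaps vary little
        # compared with their mean.
        report[can_id] = (counts[can_id], period, spread,
                          spread <= jitter_ratio * period)
    return report


# Hypothetical trace: ID 0x100 every 10 ms, ID 0x2A0 at irregular times.
trace = sorted(
    [(0.010 * i, 0x100) for i in range(50)]
    + [(0.003, 0x2A0), (0.041, 0x2A0), (0.377, 0x2A0), (0.4, 0x2A0)]
)
report = analyse(trace)
```

In this example `0x100` is reported as cyclic and `0x2A0` is not. The threshold of
0.1 is arbitrary and would need to be tuned to the traffic at hand.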

Ezeobi et al. in [20] explore unsupervised machine learning techniques to analyze
and understand the meaning and function of CAN signals in the payload. In order to
identify signal patterns in the data, clustering algorithms are used to create groups
of similar messages. Anomaly detection is used to identify system malfunctions or
harmful traffic. The paper shows unsupervised machine learning as a useful tool for
reverse engineering CAN messages to establish signal boundaries and identify those
signals.

A similar study [21] presents CAN-D, a modular four-step pipeline for
comprehensively decoding Controller Area Network data. The
pipeline mentioned in this paper includes analysis in four modules: the physical
layer, message layer, signal layer, and application layer analysis. Each module has a
unique job like identifying the type of CAN messages or decoding the signals in the
payload. This helps users gain valuable insights into the functioning of the vehicle’s
ECUs.

Verma et al. [22] propose an approach for tokenizing and translating CAN
messages in vehicles. This approach, called Automotive CAN Tokenization and
Translation (ACTT), makes use of a machine learning algorithm to divide the
CAN messages into tokens in an attempt to translate them into a human-readable
format. This study describes how ACTT can be implemented and evaluates its per-
formance against a dataset of CAN traffic. According to the results, ACTT shows
high accuracy and effectiveness in translating CAN messages in real time.

Young et al. in [23] propose a machine-learning-based solution for reverse
engineering CAN messages. The process is described in four steps: data
preprocessing, feature extraction, model selection, and performance evaluation. Then
the models obtained by this process are evaluated using a dataset of CAN messages.
The results reveal that this approach is effective in decoding CAN messages and per-
forms better than existing methods in terms of accuracy and processing speed. This
approach has the potential to enhance the accuracy and efficiency of the analysis of
CAN messages.

Lestyan et al. in [24] present a way of re-identifying drivers based on their driving
patterns, using sensor data extracted from CAN bus logs. The method includes
feature extraction, dimensionality reduction, and clustering of the sensor signals.
The authors test how accurately the proposed method re-identifies drivers in a
dataset of real-world CAN logs.

2.1.5 Traffic Generators for TCP/IP
To gain more insight into how to develop our own framework, we explored existing
work with goals similar to those of this thesis. These studies were carried out
for other protocols.

Puketza et al. in [25] discuss a software platform developed for testing IDS. The plat-
form tests IDS in a more controlled and realistic manner, focusing on the evaluation
of the IDS’s performance against different attacks. The paper describes the design
and architecture of the software, which has a traffic generator to produce realistic
network traffic, an attack generator for generating a variety of distinct attacks, and
a data collector for analyzing the IDS’s output. The authors discuss the challenges
associated with testing of IDS, such as the shortage of standardized datasets and
the problem of reproducing real-world attacks.

Behal and Kumar in [26] cover a wide variety of DDoS attack tools. Popular ones
include LOIC (Low Orbit Ion Cannon), HOIC (High Orbit Ion Cannon), and
XerXes; more sophisticated tools like Slowloris and RUDY are also covered.
The authors compare these tools on criteria such as supported attack vectors,
attack intensity, stealthiness, and ease of use. The study also analyzes hping,
TCPReplay, and Iperf, which are traffic generators used to simulate DDoS attacks
for testing and evaluation purposes, and their ability to generate realistic
attack traffic patterns.

A study that addresses the objectives of this thesis was performed by Erlacher and
Dressler in [27]. The paper presents the design and implementation of GENESIDS,
along with its evaluation using several IDS. The system automates the process of
generating attacks by utilizing an algorithm, which can generate new attacks by
combining existing ones. The authors highlight the importance of testing IDS to
detect security threats. The results show that the system is capable of generating a
wide range of attacks for testing IDSs.

2.1.6 Traffic Generators for CAN
After gaining insights on what attack traffic generators look like for traditional
TCP/IP networks, we explored similar existing work done by researchers for CAN.

A related tool that also generates attack traffic is presented by Huang et al. in [28].
The paper presents a tool called ATG (Attack Traffic Generation) that is designed
for security testing of the CAN bus. The tool is developed to generate attack traffic
to evaluate the security of CAN bus systems specifically found inside vehicles. The
authors discuss the limitations of current security testing tools and propose ATG as
a solution that can generate various attack scenarios.

Yang et al. in [29] focused on the development of a penetration testing platform that
can be used to evaluate the security of embedded systems that make use of the CAN
bus. The platform is designed to provide a comprehensive and automated testing
process that includes various attack scenarios, such as message injection, replay, and
flooding. The study highlights the importance of penetration testing platforms that
can effectively evaluate the security of such embedded systems.

The review of different literature presented different ideas that could be implemented
in this thesis and aided in defining the scope of this thesis. The information gained
from these articles helped in devising an approach and finding resources that proved
critical to the advancement of the project.

2.2 Related Work
The CAN bus is the most commonly used solution for in-vehicle networks. Hence,
several commercial vendors already maintain hardware and software platforms for
testing CAN bus implementations. For example, CANoe is a proprietary test software
developed by Vector Informatik which is used for developing, testing, and
analyzing electronic systems inside vehicles. It provides a simulation environment
that supports multiple bus systems and protocols such as Local Interconnect
Network (LIN), FlexRay, and Ethernet. Through CANoe, one can simulate ECUs
within a network and analyze them individually to detect anomalies in
communication. It comes with tools for monitoring and analyzing communication on
the CAN bus and allows users to create complex test scenarios. These test
scenarios can be automated and the results can be analyzed. Another tool, closely
related to CANoe, is CANalyzer, also developed by Vector Informatik. It provides
functionality similar to CANoe, with the added capabilities of logging and
ease of integration with other tools. UDSim (Unified Diagnostic Services Simulator)
is another such tool developed by Vector Informatik. This tool was later extended
to work on vehicular networks as well.

There are a number of tools similar to the ones discussed above that offer the same
functionality and prove extremely useful for simulating and analyzing in-vehicle
networks and systems. BUSMASTER is another popular lightweight open-source tool
released by Bosch. These tools focus on testing the functionality of the
system but have little to no provision for security testing. The focus of this thesis,
in contrast, is on generating and injecting attack traffic into the CAN traffic.

Several research works build on open-source software packages that work with
cheap, commonly used hardware configurations. Tools like SavvyCAN help record
messages in different file formats and have provisions for critically analyzing
the data payload of CAN messages, which can later be represented visually in the
form of plots. SavvyCAN also allows message filtering based on specific criteria
so that users can focus on relevant data. Hounsinou et al. in [30] developed a CAN
network analysis tool called CarShark that helps decode and distinguish control
messages and provides means for visualizing these messages. OCTANE (Open Car
Testbed and Network Experiments) is a platform that aims to provide an open and
collaborative environment for developers and researchers in the industry. It provides
a testbed infrastructure that mimics real-life automotive systems and allows
researchers to study and evaluate a variety of automotive network protocols and
communication technologies.

There are several attack tools that work over traditional TCP/IP networks. Hping,
a command-line tool, generates and transmits customized network packets, which
can be used to create various malicious attacks that can lead to Denial of Service
(DoS). Metasploit, a popular framework for vulnerability assessments and
penetration testing, consists of modules that can generate known attacks
for different systems from a wide range of attack vectors. Low Orbit Ion Cannon
(LOIC) is a network stress testing tool that generates high volumes of TCP, UDP,
and HTTP flood traffic. This tool is misused to perform Distributed DoS (DDoS)
attacks. Scapy is a Python-based interactive packet manipulation tool. It is a
powerful tool that can sniff packets and then manipulate them, for example to
perform reconnaissance. Similar to Scapy, Impacket is a Python library that
provides limited capabilities to the user for packet manipulation. Tor’s Hammer
is a Python-based DoS testing tool that works through the Tor network. It uses
random source IP addresses to make it difficult to trace the attacks back to the
source. Behal and Kumar in [26] cover many such tools and draw a comparison
highlighting the capabilities of each tool.

The above tools provide a means for attacking systems. Such attack tools differ
from traffic generators, which are what we aim to develop in this thesis.
Some traffic generator tools for TCP/IP networks are discussed below.

ByteBlower is a tool developed by Excentis for testing the performance of a network.
The primary aim of this tool is to perform testing of the devices connected in the
network by generating different types of traffic like UDP and TCP with varying
payload sizes and rates. The generated synthetic traffic can simulate VoIP, web
browsing and even video streaming. Another tool, called Geist traffic generator
(GTG) developed by Geist Technologies is capable of mimicking the traffic patterns
of various applications. It supports the testing and analysis of network protocols
like IPv4, IPv6, UDP, ICMP, TCP and many more. GTG has features that enable
analysis of network characteristics and generating reports of its analysis. Harpoon
is a vulnerability scanner tool that scans the network for common vulnerabilities
and exposures and any other known security problems.

Along with these, tools that were initially developed for traditional TCP/IP
networks are now being extended to work on CAN networks as well, for example,
UDSim. Many Python libraries have been developed to support development for CAN.
python-can is a widely used library that provides a simple mechanism for sending
and receiving CAN messages. This library supports various CAN hardware interfaces
like SocketCAN and CANtact. Other tools like CANard, pyvit, and canmatrix make it
easy for developers to interact with CAN networks to develop applications.
cantools is a library that helps work with CAN databases (DBC files). It gives
its users the ability to parse database files in different formats. These
database files contain the information needed to decode CAN messages into a
human-readable format.

Sharafaldin et al. in [19] work on creating a single dataset that eliminates the
shortcomings of older datasets which proved to be outdated and unreliable. This
new dataset contains seven new and common attack patterns which are introduced
into benign data by simulating them in order to meet real world criteria. This
dataset is developed for TCP/IP networks. This work differs from the software
tool developed in this thesis as it does not focus on CAN networks. Moreover, the
software tool developed in this thesis generates a completely different output log or
dataset on every iteration.

There are a number of tools that have been developed to generate attack traffic
specifically for in-vehicular networks. They have similar capabilities as the tools
discussed above but were developed with CAN in mind. CANard is an open-source
Python-based framework designed for CAN. It has the ability to send and receive
CAN messages through hardware interfaces and can manipulate these messages as
desired. This tool can generate custom CAN traffic and mimic different scenarios
which makes it extremely useful for testing and developing CAN-based environments.
SocketCAN is another such tool that has similar capabilities but is additionally
equipped with mechanisms for message filtering and routing. An application can
use CAN identifiers and other fields of the frame to set filters to gather messages
of high relevance. CANtact is a tool developed by Eric Evenchick that has gained
popularity among developers. It provides a USB-to-CAN interface that connects
to the USB port to interface with CAN networks. Additionally, it enables its users
to capture CAN traffic and analyze the traffic which proves extremely useful for
debugging and security analysis of CAN networks.

Apart from existing tools and libraries, there are several research projects that aim
to address similar problems as discussed in this thesis. One such research was
performed by Huang et al. in [28]. They work on developing an Attack Traffic
Generation Tool (ATG). ATG provides a free and functional toolkit for automotive
security researchers for easy and effective interaction with real or simulated CAN
buses. This tool generates attack traffic for the evaluation of security mechanisms
developed for CAN systems. The authors compare their work with other developed
tools and highlight their strengths and weaknesses. The attack generation
capabilities of ATG differ from those of the software tool discussed in this thesis.
ATG takes a set of attacks and injects them into a log file, which is then sent to
the CAN bus in real time. Each attack is configured beforehand, and a fixed number
of such attacks is injected into the log files at fixed intervals of time. The
software tool developed in this thesis is capable of analysing the CAN log files
and, based on its analysis, injects a variable number of malicious attacks at
varying intervals of time.

Another study is performed by Erlacher and Dressler in [27]. The tool developed by
the authors, called GENESIDS, automatically generates user-defined HTTP
attacks. This allows network traces to be created in a straightforward manner. The
tool depends on the rules present in Snort (NIDS) to generate attacks that would
trigger the corresponding rules. This tool differs from the software tool developed
in this thesis as it is not designed for automotive networks. Additionally, the tool
developed in this thesis does not rely on rules to generate attack traffic, instead it
uses information obtained by analysing CAN logs to generate attack traffic.

A similar study by Palanca et al. [31] explores a specific type of cyber-attack on
vehicular networks that operates on the link layer, which is responsible for the
communication between devices connected to the network. The attack is shown to be
stealthy, as it is designed to be difficult to detect and trace back to the source.
It severs communication between ECUs; the paper provides a detailed explanation of
how it does so, along with some countermeasures. The attack selectively targets
ECUs, which makes it all the more dangerous, as it can disrupt communication
between ECUs performing critical tasks and thereby cause considerable damage.

Radu and Andreea-Ina in [32] focus on the implementation of security measures in
order to protect the in-vehicle network from unauthorized access, potential attacks,
and data breaches. The paper discusses different layers of security, including the
architecture of the network used, authentication mechanisms, and intrusion detec-
tion systems. In any security-related research, the objective is to ensure the safety
and integrity of communication within the in-vehicle network. The paper dives into
different security techniques and best practices that can be adopted to safeguard any
system against cyber threats, such as firewalls, IDS, secure boot processes, access
control mechanisms, and secure update mechanisms for ECUs.

Many other pieces of research that are not mentioned in this section have also
contributed to the field of automotive security. All these studies have been
a source of motivation to find answers to the problems discussed in this
thesis. Understanding how the framework was developed requires some background
concepts, which are presented in the following chapter.

3
Conceptual Foundation

3.1 Controller Area Network (CAN)
The Controller Area Network was an idea conceived by engineers at Robert Bosch
GmbH in Germany in February of 1986. Their aim was to devise a system that
would enable communication between multiple ECUs in vehicles. Since then, the
protocol has found its way into every mode of transport, from cars to trains to
ships. Every modern automotive system has at least one CAN network installed.
CAN is a bus-based protocol that has proved to be a very reliable means of
communication between vehicles and their surroundings [10].

The CAN bus uses serial communication which reduces the number of wires inside
the system. Even though this was not the main intention when developing this
protocol, it proved to be a useful by-product of the protocol. The use of multiple
processors improves the performance of the system. The decrease in the cost of micro-
controller chips at that time made a multi-processor architecture in a single system
feasible. CAN was originally devised with automotive systems in mind but today
it has found its way into all the fields where inter-microprocessor communication is
required.

3.1.1 CAN Architecture

CAN is a two-wire, half-duplex, high-speed network system that is suitable for
real-time applications, as it has low memory and CPU requirements. It also
provides collision detection and resolution, as it uses CSMA/CD combined with
arbitration on message priority. Every ECU must wait for a specific period of
inactivity on the bus before it can transmit a message. Every message is given a
unique identifier that also encodes the priority of the message, which is used to
resolve any collisions that might occur. Lower identifier values indicate
higher-priority messages. The implementation of the CAN bus is straightforward. It
bypasses four layers of the OSI model (from the presentation layer down to the
network layer), which saves memory resources and improves performance. CAN
operates at both the data link and the physical layer.
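
The priority-based collision resolution described above can be illustrated with a small simulation. The following is a minimal sketch, assuming an idealized bus on which all contending nodes start transmitting at the same time; the function name `arbitrate` is hypothetical and not part of any library.

```python
def arbitrate(identifiers, id_bits=11):
    """Return the identifier that wins bitwise arbitration on the bus."""
    contenders = set(identifiers)
    # Identifiers are transmitted most significant bit first.
    for bit in range(id_bits - 1, -1, -1):
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        # A dominant bit (0) from any node overrides all recessive bits (1).
        bus_level = min(sent.values())
        # Nodes that sent recessive but read back dominant stop transmitting.
        contenders = {ident for ident in contenders if sent[ident] == bus_level}
    return contenders.pop()


winner = arbitrate([0x2A0, 0x123, 0x1FF])
print(hex(winner))  # 0x123, the lowest identifier
```

Because a dominant bit always overrides a recessive one, the frame with the numerically lowest identifier survives arbitration without being corrupted, while the losing nodes retry once the bus is idle again.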

Figure 3.1: CAN architecture as compared to the OSI model, taken from [33]

Every entity participating in communication over the CAN bus is called a node. A
CAN node is a functional ECU that participates in the CAN network. Each node
can send/receive a different number of messages onto the bus. The frequency of
sending and receiving differs greatly as well.

To participate in CAN communication every ECU should have a CAN interface that
comprises a CAN controller and transceiver. The CAN Controller is responsible for
processing information to and from the CAN bus. The CAN bus is the physical
transmission medium that links all the participating nodes via a CAN interface.
The transceiver connects the controller to the physical transmission medium.

In order to transmit signals onto the bus, the transceiver is equipped with 2 bus
pins, the CAN (High) and CAN (Low) pins. In order to signal the logical 0 and
1 signals, the transceiver creates a differential voltage i.e., the difference in voltage
between CAN (High) and CAN (Low). The dominant bit (Logical 0) is assigned
when a differential voltage of 2 V is observed and the recessive bit (Logical 1) is
assigned when a differential voltage of 0 V is seen.

The transmission of CAN messages does not follow any time sequence but rather is
event-driven. The communication channel is only busy when there is a message to
be transmitted. This makes access to the bus quick. Every CAN message on the
bus can be received by any connected CAN node (broadcasting) if the message can
be identified using a valid unique message identifier. A node can choose to either
accept or reject a CAN message based on the relevance of the message to that node.
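
How a node decides which broadcast messages are relevant can be sketched with a mask-based acceptance filter. The comparison below follows the identifier/mask matching used by SocketCAN-style filters; the frame representation as `(can_id, data)` tuples and the example identifiers are assumptions made for illustration only.

```python
def accepts(received_id, filter_id, mask):
    """Mask-based acceptance test: only the bits set in `mask` are compared."""
    return (received_id & mask) == (filter_id & mask)


def deliver(frames, filters):
    """Keep the (can_id, data) frames matching at least one (filter_id, mask) pair."""
    return [f for f in frames if any(accepts(f[0], fid, m) for fid, m in filters)]


bus_traffic = [(0x100, b"\x01"), (0x101, b"\x02"), (0x250, b"\x03"), (0x300, b"\x04")]
# Hypothetical node that is interested in 0x100-0x10F and in exactly 0x250.
node_filters = [(0x100, 0x7F0), (0x250, 0x7FF)]
received = deliver(bus_traffic, node_filters)
```

Every frame is still visible on the bus; the filter only determines which frames a node passes on to its application.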

3.1.2 CAN Frames

Messages in CAN are enclosed in a frame. The frame consists of various fields that
facilitate the successful transmission of data over the bus. Figure 3.2 shows the
format of CAN frames.

Figure 3.2: (a) standard CAN frame, (b) extended CAN frame, taken from [34]

• SOF: start of frame marks the start of a message.

• Identifier: 11 bits, establishes the priority of each message. The lower the
binary value, the higher the priority.

• RTR: 1 bit, remote transmission request. Dominant (0) in data frames and
recessive (1) when information is requested from another node.

• IDE: 1 bit, identifier extension. 0 means a standard CAN message with no
extension is transmitted.

• r0: Reserved bit.

• DLC: 4 bits, data length code. Number of data bytes being transmitted.

• DATA: up to 64 bits (0 to 8 bytes) of data.

• CRC: 16 bits, a 15-bit checksum for error detection followed by a delimiter bit.

• ACK: 2 bits, the acknowledgement slot and a delimiter. The sender transmits the
slot as 1 (recessive). Every node that receives the message error-free
overwrites it with 0 (dominant), thereby acknowledging the integrity of the
message. If no node overwrites the slot, the message is treated as faulty
and the sender re-sends it.

• EOF: 7 bits, end of CAN data frame.

• IFS: 7 bits, stores time required by controller to move a correctly received
frame to its proper position in message buffer area.
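
To make the field layout concrete, the following sketch assembles the bits of a standard frame from SOF up to the CRC sequence. It is a simplified illustration under stated assumptions: bit stuffing, the CRC delimiter, ACK, EOF and IFS are left out, and the helper names are hypothetical. The CRC uses the 15-bit generator polynomial 0x4599 from the CAN specification.

```python
CAN_CRC15_POLY = 0x4599  # generator polynomial of the CAN CRC (x^15 term implicit)


def to_bits(value, width):
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def crc15(bits):
    """Bit-serial CRC-15 over a sequence of 0/1 values, starting from 0."""
    crc = 0
    for bit in bits:
        feedback = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if feedback:
            crc ^= CAN_CRC15_POLY
    return crc


def standard_frame_bits(can_id, data, rtr=0):
    """Bits from SOF through the CRC sequence of a standard frame (unstuffed)."""
    if not 0 <= can_id < 2 ** 11:
        raise ValueError("standard identifiers have 11 bits")
    if len(data) > 8:
        raise ValueError("a classic CAN frame carries at most 8 data bytes")
    bits = [0]                      # SOF (dominant)
    bits += to_bits(can_id, 11)     # Identifier
    bits += [rtr, 0, 0]             # RTR, IDE, r0
    bits += to_bits(len(data), 4)   # DLC
    for byte in data:
        bits += to_bits(byte, 8)    # DATA
    return bits + to_bits(crc15(bits), 15)


frame = standard_frame_bits(0x123, b"\x01\x02")
```

A receiver can run the same `crc15` function over the received bits including the CRC sequence; for an undamaged frame the result is 0.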

An extended CAN message has all the above fields in the data frame along with the
following additional ones:

• SRR: Substitute remote request. A recessive (1) bit that occupies the position
of the RTR bit of the standard frame.

• IDE: 1 in IDE implies there are more identifier bits to follow.

• r1: additional reserve bit.

The ID in the extended format of CAN has 29 bits as compared to the standard 11
bits. If the last bit of EOF is 1 then the message is error free and the frame is valid.
If 0 is the last bit in the EOF field then the transmission of the message is repeated.

In the CAN protocol, there are 4 different types of frames that can be transmitted.

• Data Frame: Initiates and maintains communication between partners. Transmits
and protects user data, and establishes the communication relationships
defined in the communication matrix. The RTR bit is 0.

• Remote Frame: Requests the transmission of data from another node. It has the
same format as a data frame but lacks the data field, and the RTR bit is set
to 1. CAN controllers respond to a remote frame by sending the requested data
frame. Remote frames are not used often.

• Error Frame: A special message that deliberately violates the formatting rules
of CAN messages. It is transmitted when a node encounters an error in a
message, in which case all other nodes are also forced to transmit an error
frame. The original transmitter then re-transmits the message. A node cannot
tie up the bus by repeatedly transmitting error frames.

• Overload Frame: Special message that is transmitted when a node becomes
too busy. An extra delay of messages is then implemented.

3.1.3 CAN Security and Vulnerabilities
The increasing connectivity of vehicles has exposed Controller Area Network (CAN)
networks to various security challenges. While CAN was originally designed with
a focus on reliability and efficiency, it lacks built-in security mechanisms. This
leaves CAN networks vulnerable to a range of attacks that can compromise the
confidentiality, integrity, and availability of communication [35].

One of the primary vulnerabilities in CAN networks is message injection. An at-
tacker can inject malicious messages into the network, either by compromising a
legitimate ECU or by gaining unauthorized access to the network. An attacker
can also impersonate a legitimate ECU by sending messages with the identifier
that ECU normally uses, since CAN frames carry no authenticated sender address.
This can deceive other ECUs into accepting and executing malicious commands,
potentially compromising the integrity and safety of the vehicle’s operations
[35].

An attacker can intercept valid CAN messages and replay them at a later time. This
can result in the re-execution of previously executed commands, leading to unwanted
or dangerous actions by the ECUs [35]. Denial of Service (DoS) attacks pose yet
another threat to CAN networks. By flooding the network with excessive messages
or by targeting specific ECUs, an attacker can disrupt the normal functioning of the
network. This can result in the loss of critical information or the unavailability of
vital vehicle functionalities [35]. Understanding these vulnerabilities is essential for
developing effective security mechanisms and intrusion detection systems.
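
The replay and flooding attacks described above can be sketched on a recorded log. The example below is only an illustration under stated assumptions: a log is modeled as a list of `(timestamp, can_id, data)` tuples, and the function names and parameters are hypothetical rather than taken from the tool developed in this thesis.

```python
def replay_attack(log, can_id, delay):
    """Copy every frame with `can_id` and re-send it `delay` seconds later."""
    replayed = [(t + delay, i, d) for t, i, d in log if i == can_id]
    return sorted(log + replayed, key=lambda frame: frame[0])


def flood_attack(log, start, count, period, can_id=0x000):
    """Insert `count` frames with a high-priority identifier, one every `period` s."""
    flood = [(start + n * period, can_id, bytes(8)) for n in range(count)]
    return sorted(log + flood, key=lambda frame: frame[0])


benign = [(0.00, 0x100, b"\x01"), (0.10, 0x200, b"\x02"), (0.20, 0x100, b"\x03")]
with_replay = replay_attack(benign, can_id=0x100, delay=1.0)
with_flood = flood_attack(benign, start=0.05, count=4, period=0.001)
```

Flooding with identifier 0x000 is effective because that identifier wins arbitration against every other message, so legitimate frames are delayed for as long as the flood lasts.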

3.2 Traffic Analysis and Pattern Recognition
Traffic analysis and pattern recognition techniques play a crucial role in identifying
anomalies and detecting attacks in Controller Area Network (CAN) networks. By
analyzing the parameters and contents of CAN messages, these techniques enable
the identification of patterns indicative of normal or malicious activities [36].

One aspect of traffic analysis is the examination of message parameters, such as the
identifier, data length code, and data field of CAN messages. Analyzing the identifier
can reveal patterns in the message sources or identify messages associated with
specific functionalities or ECUs within the network [36]. Analyzing DLC distribution
can help identify abnormal message sizes. Analyzing data field contents can provide
insights into the nature of the messages and the information being communicated
[36].
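
A first pass over these message parameters could look like the following sketch. It assumes a time-ordered log of `(timestamp, can_id, data)` tuples, where the payload length stands in for the DLC; the function names and the `expected_dlc` mapping are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(log):
    """Per-identifier frame counts, DLC distribution and mean inter-arrival time."""
    id_counts = Counter(can_id for _, can_id, _ in log)
    dlc_counts = Counter(len(data) for _, _, data in log)
    timestamps = defaultdict(list)
    for t, can_id, _ in log:
        timestamps[can_id].append(t)
    mean_gap = {}
    for can_id, ts in timestamps.items():
        if len(ts) > 1:
            # Average spacing between consecutive frames of the same identifier.
            mean_gap[can_id] = (ts[-1] - ts[0]) / (len(ts) - 1)
    return id_counts, dlc_counts, mean_gap


def unusual_dlc(log, expected_dlc):
    """Frames whose payload length differs from the DLC expected for their identifier."""
    return [f for f in log if f[1] in expected_dlc and len(f[2]) != expected_dlc[f[1]]]


sample_log = [
    (0.00, 0x100, b"\x01\x02"),
    (0.10, 0x100, b"\x01\x03"),
    (0.15, 0x200, b"\x00"),
    (0.20, 0x100, b"\x01"),
]
id_counts, dlc_counts, mean_gap = summarize(sample_log)
suspicious = unusual_dlc(sample_log, {0x100: 2})
```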

Furthermore, analyzing the contents of the data field can involve examining specific
data bytes or bit patterns within the payload. CAN messages often carry critical
information, such as sensor data, vehicle status, or control commands [37]. By ana-
lyzing the content of these messages, patterns associated with normal or malicious
behaviors can be identified. For example, abnormal values or ranges in sensor data
can indicate sensor tampering or spoofing attacks. Similarly, the presence of spe-
cific bit patterns or command sequences can be indicative of unauthorized access
attempts or malicious commands [37].
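
As a simple illustration of spotting abnormal values in the payload, the sketch below learns the range of every payload byte from benign traffic and flags frames that leave this range. It assumes the same `(timestamp, can_id, data)` tuples as before and treats each byte as an independent value, which is a simplification since real signals often span several bytes.

```python
def learn_byte_ranges(benign_log):
    """Record, per identifier and byte position, the minimum and maximum value seen."""
    ranges = {}
    for _, can_id, data in benign_log:
        per_id = ranges.setdefault(can_id, {})
        for pos, value in enumerate(data):
            low, high = per_id.get(pos, (value, value))
            per_id[pos] = (min(low, value), max(high, value))
    return ranges


def out_of_range(frame, ranges):
    """True if any payload byte lies outside the range learned for its identifier."""
    _, can_id, data = frame
    per_id = ranges.get(can_id, {})
    for pos, value in enumerate(data):
        if pos in per_id:
            low, high = per_id[pos]
            if not low <= value <= high:
                return True
    return False


benign = [(0.0, 0x100, bytes([10, 50])), (0.1, 0x100, bytes([20, 60]))]
ranges = learn_byte_ranges(benign)
flagged = out_of_range((0.2, 0x100, bytes([15, 200])), ranges)
```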

Pattern recognition techniques, including machine learning algorithms, can be ap-
plied to analyze the parameters and contents of CAN messages [38]. These algo-
rithms can learn from labeled data to identify patterns associated with normal or
malicious behavior. By training models on a dataset containing both normal and
attack traffic, the algorithms can learn to distinguish between them and detect new
or previously unseen attack patterns. Machine learning algorithms such as Support
Vector Machines (SVM), Random Forests, and Neural Networks have been applied
to analyze CAN message parameters and contents with good results [38].

Combining traffic analysis with pattern recognition techniques allows for a com-
prehensive understanding of the CAN network’s behavior and the ability to detect
anomalies and attacks in real time. By examining message parameters and contents,
researchers and practitioners can gain insights into the normal operation of the
network, identify deviations from expected behavior, and detect potential security
threats.

3.3 Intrusion Detection Systems (IDS)
Intrusion detection plays a vital role in safeguarding CAN networks by
monitoring network traffic and detecting suspicious activities or attacks. Two
main categories of systems are used in CAN networks: Intrusion Detection Systems
(IDS) and Intrusion Prevention Systems (IPS).

An Intrusion Detection System (IDS) in the context of CAN refers to a security
mechanism that monitors the network for any suspicious or malicious activities. Its
primary purpose is to identify potential intrusions or attacks targeting the CAN bus.
The IDS analyzes network traffic, examines CAN message payloads, and compares
them against predefined patterns or known attack signatures to detect any anomalies
or deviations from expected behavior. When an intrusion is detected, the IDS can
generate alerts or take preventive actions to mitigate the potential impact of the
attack [39].

On the other hand, an Intrusion Prevention System (IPS) goes beyond detection
and aims to actively prevent or block intrusions in real time. It operates in a similar
manner as an IDS by monitoring network traffic and analyzing message payloads.
However, an IPS is equipped with the ability to take immediate action to prevent or
mitigate potential threats. It can automatically generate and deploy security policies
or rules to block malicious messages or suspicious activities. By actively preventing
intrusions, an IPS enhances the overall security posture of the CAN network [40].

IDSs in CAN networks employ various techniques for traffic monitoring and
analysis. These include deep packet inspection, statistical analysis, machine
learning algorithms, and rule-based systems. Deep packet inspection allows for the
detailed analysis of CAN messages, examining their content and identifying any
suspicious patterns or anomalies [40]. Machine learning models can be trained on
labeled or unlabeled data, enabling the detection of unknown or emerging
attacks. Rule-based systems utilize predefined rules or conditions to
identify specific types of attacks based on their characteristic features [40].
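
A rule-based detector can be sketched with two simple rules: an alert is raised for identifiers that are not part of the known communication matrix, and for known identifiers that arrive faster than a configured minimum gap. The rules, thresholds and names below are assumptions chosen for illustration and do not describe any particular IDS.

```python
def rule_based_ids(log, known_ids, min_gap):
    """Return (timestamp, can_id, reason) alerts for two simple rules."""
    alerts = []
    last_seen = {}
    for t, can_id, _ in log:
        if can_id not in known_ids:
            alerts.append((t, can_id, "unknown identifier"))
        elif can_id in last_seen and t - last_seen[can_id] < min_gap.get(can_id, 0.0):
            alerts.append((t, can_id, "message rate too high"))
        last_seen[can_id] = t
    return alerts


traffic = [
    (0.000, 0x100, b"\x01"),
    (0.100, 0x100, b"\x01"),
    (0.101, 0x100, b"\xff"),  # injected shortly after a legitimate frame
    (0.150, 0x7FF, b"\x00"),  # identifier that no legitimate ECU uses
]
alerts = rule_based_ids(traffic, known_ids={0x100}, min_gap={0x100: 0.05})
```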

By deploying IDSs in CAN networks, organizations can enhance the security of their
vehicle systems, detect and respond to potential intrusions, and protect the integrity
and safety of the vehicles and their occupants [39].

3.4 Attack Traffic Generation
To analyze the security posture of CAN networks and gauge the efficacy of intrusion
detection systems, realistic and representative attack traffic must be generated.
There are numerous methods for producing attack traffic [41]. Attack templates
offer predetermined attack patterns or scenarios that mimic particular types of
attacks. Researchers can create traffic that simulates actual attack scenarios
using these templates, which capture the traits and behaviors of known attacks.
Regular traffic patterns can also be altered, for example by adding anomalies,
changing packet parameters, or introducing differences in timing and payload
content. One of these methods is used in this thesis as well [41].
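
Altering regular traffic in timing and payload content can be sketched as follows. This is a hypothetical illustration, not the generation procedure of this thesis: frames are `(timestamp, can_id, data)` tuples, a fraction of the frames of one target identifier gets one payload byte overwritten and a small timing jitter added, and every output frame is labeled so that an IDS can later be scored against the ground truth.

```python
import random


def alter_traffic(log, target_id, rng, probability=0.2, max_jitter=0.002):
    """Return (frame, is_attack) pairs with some `target_id` frames modified."""
    labeled = []
    for t, can_id, data in log:
        if can_id == target_id and data and rng.random() < probability:
            payload = bytearray(data)
            # Overwrite one randomly chosen payload byte.
            payload[rng.randrange(len(payload))] = rng.randrange(256)
            # Shift the frame slightly in time.
            t = t + rng.uniform(0.0, max_jitter)
            labeled.append(((t, can_id, bytes(payload)), True))
        else:
            labeled.append(((t, can_id, data), False))
    return sorted(labeled, key=lambda item: item[0][0])


benign = [(0.01 * n, 0x100 if n % 2 else 0x200, bytes([n, n])) for n in range(10)]
labeled = alter_traffic(benign, target_id=0x100, rng=random.Random(7))
```

Passing a seeded `random.Random` instance keeps a run reproducible, while a different seed yields a different output log.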

Using this method, researchers may simulate complex and nuanced attack variations
to assess how resistant IDSs are. Furthermore, by using machine learning algorithms
to study and imitate well-known attack patterns, attack traffic that closely mimics
actual attacks can be produced [41]. Researchers can create traffic that displays
similar traits and behavior as seen in actual attacks by employing machine learning
models trained on historical attack data [4]. Generating accurate and representa-
tive attack traffic is crucial for evaluating the detection capabilities of IDSs and
improving the overall security of CAN networks [41].

3.4.1 Characteristics of Attack Data
Understanding attack data characteristics is critical for generating realistic and rep-
resentative attack traffic. While it is challenging to guarantee an exact replication
of actual attack situations, certain factors of attack data, such as packet size, tim-
ing, payload content, and traffic patterns, are taken into account to generate attack
traffic that closely resembles real-world scenarios. These features can be better un-
derstood by analyzing previous attack data, which allows the building of models
that are suitable for producing attack traffic [42].

Studying the statistical properties of attack packet sizes, for example, can be helpful
in determining the size distribution that should be replicated in generated traffic.
Analyzing the historical trends of attack traffic, on the other hand, can help generate
traffic with realistic timing characteristics [42]. The created attack traffic can closely
match actual attack scenarios by capturing the nuanced aspects of attack data,
resulting in more accurate and appropriate evaluations of IDS and CAN network
security [42].
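
One way to reproduce realistic timing is to resample the gaps observed in a previous attack trace. The sketch below assumes that such a trace is available as a list of timestamps; the function names are hypothetical, and resampling from the empirical distribution is only one of several possible modeling choices.

```python
import random


def inter_arrival_times(timestamps):
    """Gaps between consecutive timestamps of an observed attack trace."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def generate_timestamps(observed, count, start, rng):
    """Draw `count` timestamps whose gaps are resampled from the observed gaps."""
    gaps = inter_arrival_times(observed)
    if not gaps:
        raise ValueError("need at least two observed timestamps")
    current, generated = start, []
    for gap in rng.choices(gaps, k=count):
        current += gap
        generated.append(current)
    return generated


observed_attack = [0.00, 0.01, 0.03, 0.04]
new_times = generate_timestamps(observed_attack, count=5, start=10.0, rng=random.Random(0))
```

The same idea applies to other characteristics, such as payload sizes, by resampling from their observed distributions instead.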

3.4.2 Experimental Evaluation and Validation
Controlled experiments and precise validation methodologies are required to
evaluate the efficacy of attack traffic creation techniques and intrusion
detection systems. In experimental evaluations, synthetic attack traffic is
created using a variety of methodologies, and the ability of the IDS to identify
and prevent such attacks is measured [43]. Performance metrics such as detection
accuracy, false positive rate, and reaction time quantify the effectiveness of
the IDS and other security measures.

These measurements also help validate the created attack patterns. For reliable
and accurate evaluations, adequate experimental design, including relevant
datasets and plausible attack scenarios, is essential [43]. The robustness and
reliability of the experimental evaluation procedure are further increased by
benchmarking against existing intrusion detection systems and checking the
results against accepted requirements [43].

3.4.3 Trace Files
Trace files record the communication that takes place over the CAN bus and store
the streams of bytes in human-understandable formats. Several file formats are of
high relevance to automotive systems. In this research, three file formats were
examined:

DBC (Database for CAN) is a widely used file format developed by Vector
Informatik GmbH for interpreting communication data in Controller Area Network
(CAN) networks. It serves as a protocol specification, converting raw CAN bus
data into understandable values by extracting signals from bytes. DBC files are
widely used for configuring ECUs, analyzing CAN bus traffic, and developing
software applications. They define message identifiers, signal definitions,
scaling factors, and message encoding rules. The DBC files for a given vehicle
are proprietary and specific to each manufacturer or tool provider.

ARXML (AUTOSAR XML) is another standardized file format used in CAN net-
works, particularly in the context of AUTOSAR (AUTOmotive open System AR-
chitecture) systems. It represents the configuration of an ECU and contains de-
tailed information about message structures, signal definitions, and parameter val-
ues. ARXML files are crucial for integrating and configuring AUTOSAR-based
systems, allowing the exchange of ECU configurations between different tools and
platforms. They follow the AUTOSAR architecture, ensuring compatibility and
interoperability across various automotive systems.

ASCII (American Standard Code for Information Interchange) files are a text-based
file format commonly used for storing CAN network communication data. Unlike DBC
and ARXML files, ASCII files do not provide detailed message and signal
definitions. Instead, they present CAN data as a sequential list of messages with
timestamps, making them human-readable. ASCII files capture the raw streams of
CAN messages, including message identifiers, data payloads, and timestamps. As
can be seen in Figure 3.3, the second column displays the timestamps, followed by
the bus number and the message identifiers in the subsequent columns. These
identifiers determine the priority of each CAN message. The remaining columns
give the payload length in bytes and, in the final column, the message payload in
hexadecimal format.

Figure 3.3: Snippet of an ASCII format tracefile


While they lack the decoding rules and structure provided by DBC and ARXML
files, ASCII files are useful for logging and analyzing CAN bus traffic. They offer a
convenient way to examine the sequence of messages, identify anomalies, and gain
insights into the behavior of the CAN network.

There are many tools that attackers can use to analyze captured CAN traffic;
CANalyzer, CANoe, and Raptor-CAN are some popular ones. The captured data
contains much the same information about the CAN network as an ASCII file, which
means an attacker would only have access to messages as they appear in an .asc
file. This is another reason why ASCII trace files were considered for this
research.

3.5 Attack Models
This thesis focuses on the study of the following selected attacks that are prevalent
on the CAN bus. Some of these attacks were introduced in the form of malicious
traffic into trace files for the purpose of simulating attack traffic and testing the
reaction of the Intrusion Detection Systems toward these attacks.

In Section 3.5.1 we introduce our envisioned attacker. In the following
subsections, we first describe the general attack and then how our attacker
would use it.

3.5.1 Attacker Approach

For the purpose of this thesis, let us consider an adversary named BhanuPratap that
is trying to exploit an automotive vehicle system in order to maximize the damage
caused. BhanuPratap is a skilled adversary that understands the architecture of the
CAN. The adversary has the ability to sniff the data frames being transmitted over
the bus and the tools to translate the messages into an understandable format. By
observing these messages, the attacker can understand the values set in the data
frame and can interpret signal values that are critical in determining the state of
the automotive system. BhanuPratap understands the impact that changing these
signal and data frame values will have on the system.

There are other ways to access raw CAN data and make sense of the information in
the messages. ARXML and DBC files give a detailed description of the data present
in the messages along with the different signal values present in them. These files
contain information on how one can convert raw CAN bus data into human-readable
values. However, it’s unlikely for an adversary to have access to these files as they are
proprietary, and hence the adversary’s access is limited to only the raw information.


Table 3.1: List of studied attacks

Sr.No. Attack Name
1 Replay Attack
2 Denial Of Service (DoS)
3 Spoofing Attack
4 Fuzzy Attack
5 Flooding Attack
6 Isolation Attack
7 Overwrite Attack

3.5.2 The Replay Attack

In a traditional TCP/IP network, a replay attack is a network attack in which
valid data transmissions are captured and then repeated or delayed with malicious
intent. The adversary intercepts data and retransmits it at a later point in
time. This attack is often part of a spoofing attack, where an adversary tries to
trick the system into believing that the attacker is a trusted entity in order to
gain illegitimate access.

CAN is a message-based protocol with no provision for addressing the nodes. This
makes the transmission of malicious messages easier and the identification of
their source difficult. Replay attacks also work against systems that use
encryption, because the attacker resends valid messages without needing to
decrypt them. This makes them all the more dangerous and a good choice for
adversaries like BhanuPratap. Timestamps and unique sequence numbers can be used
to prevent replay attacks.

3.5.3 Denial of Service

The aim of the DoS attack is to make the system resources unavailable to its intended
users. This is done by disrupting the services of a host connected to the network
either temporarily or indefinitely. Practically, this is achieved by flooding the target
machine with trivial requests to an extent where it is incapable of handling or
accepting new requests. By doing so the system becomes incapable of processing
requests coming from legitimate users. In the case where an adversary has control
over multiple systems, they can send requests to the victim through different sources
which produces the same effect but is a faster way of attacking the victim. This
type of attack is called Distributed Denial of Service or DDoS.

A CAN node broadcasts its messages over the bus, so every node receives the
messages sent by every other node. Broadcasting a high volume of messages
overloads the bus. Collisions are possible on the CAN bus. To detect and resolve
them, CAN uses the message identifier as the priority of each message: the lower
the value of the identifier, the higher the priority of the message. In case of a
collision, the lower-priority message loses arbitration and its sender has to
wait until the bus is free before trying again. An adversary like BhanuPratap can
use this to their advantage by crafting messages with small identifier values in
order to produce a Denial of Service attack.
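
The priority rule described above can be illustrated with a small sketch. The
helper arbitrate() and the identifier values are hypothetical and chosen only
for illustration; the sketch models the outcome of arbitration (the lowest
identifier wins the bus), not the bit-level mechanism.

```python
def arbitrate(pending_ids):
    """Return the ID that wins the bus and the IDs that lose arbitration."""
    winner = min(pending_ids)  # lowest identifier = highest priority
    losers = [i for i in pending_ids if i != winner]
    return winner, losers


# Hypothetical legitimate frames competing with an injected frame (ID 0x000).
legit = [0x1A4, 0x2C0, 0x3E8]
winner, waiting = arbitrate(legit + [0x000])
print(hex(winner), [hex(i) for i in waiting])
```

If the injected low identifier is present in every arbitration round, the
legitimate frames in this toy model never win the bus, which is the effect the
attacker is after.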

3.5.4 Spoofing Attack
In a spoofing attack, an adversary masquerades as a trusted entity. The aim is
to hide the adversary’s identity while exploiting resources the victim has access
to for personal gain. Spoofing can be done in a plethora of ways, such as sending
fake emails and text messages or IP spoofing. Man in the middle is also a variant
of the spoofing attack that involves three parties: the user, the server, and the
adversary. The adversary establishes separate connections with the server and the
user and relays the messages between them, tricking them into thinking that they
are communicating over a secure channel, whereas in reality the attacker can
listen to the entire conversation.

Since CAN provides no means of authentication, i.e. there is no mechanism to
determine which message comes from which CAN node, this type of attack becomes
easier to perform. BhanuPratap can compromise an ECU and send data frames
with a modified ID field with malicious intent in order to achieve a desired effect or
cause damage to the system.

3.5.5 Fuzzy Attack

Fuzzy attacks are used to gather information about the system in an attempt to
find an entry point to exploit. They are performed with the aim of stressing a
system until it behaves unexpectedly. In practice, fuzzing involves feeding the
system invalid or random data as input and observing the resulting changes in the
state of the system in an attempt to find a vulnerability. Professionals also use
this technique to assess the security of a system: identifying and reporting
vulnerabilities helps improve the security of the system. Fuzzing points out the
vulnerabilities and shows how an adversary can interact with them to exploit the
system. It also demonstrates the impact of fixing the found vulnerabilities on
the security of the system.

In an automotive system, an attacker like BhanuPratap focuses mainly on
randomizing the IDs and data of every message that they inject. Consider a
fuzzer, a program that repeatedly generates messages with randomized data and
IDs. The attacker observes the system after every injected packet. If the system
continues to behave normally, the randomized fields are not of interest. However,
if the system behaves unexpectedly due to the injected message, the randomized
field values are saved and injected again to monitor the changes closely.


3.5.6 Flooding Attack

Flooding attacks are a type of DoS attack that aims to overwhelm a network or a
server with a large number of requests or a large volume of traffic. By sending
traffic in such high volumes, the attacker pushes the server or network beyond
its processing limits, rendering it incapable of handling any more traffic. As a
result, legitimate users become unable to access the services the network or
server is supposed to provide. A DoS attack can take many forms, and a flooding
attack is one of them: most DoS attacks aim to exhaust the resources of a server
so that legitimate users cannot access them, and flooding attacks achieve this
through sheer traffic volume. All flooding attacks are DoS attacks, but not all
DoS attacks are flooding attacks. Furthermore, flooding attacks come in various
types, such as UDP, Ping, SYN, and HTTP floods, in which a large volume of the
respective request type is generated and sent to the system.

In the case of CAN, BhanuPratap sends more messages to the CAN bus than the
network can handle, resulting in a DoS attack. The impact of a CAN flooding
attack can be catastrophic, as it can lead to the loss of vehicle control or the
disruption of critical industrial processes. This can cause the system to
malfunction and can even create safety hazards that may lead to loss of life.

3.5.7 Isolation Attack

In an isolation attack, an attacker gains access to the victim and then attempts
to cut off its communication with the rest of the network. An adversary can
perform this attack in various ways, for example by disabling network interfaces
or by creating firewall rules that block traffic to and from the victim. The
objective of an isolation attack is to isolate the victim from other devices and
systems, increasing the difficulty of detecting and responding to the attack.

In the case of CAN, this attack aims to isolate one or more ECUs from the rest of
the network, thereby disrupting the functionality of the entire system.
BhanuPratap has various means of performing this attack on the target system.
Once an ECU is isolated, BhanuPratap can cause a lot of damage by modifying its
firmware or configuration. Since some ECUs rely on signals coming from other
ECUs, an isolation attack can have serious consequences: isolating an ECU that
sends out signals which other ECUs rely on can cause the entire system to fail.

3.5.8 Overwrite Attack

An overwrite attack is a type of attack where an attacker maliciously modifies
the content of a CAN message to overwrite or manipulate critical data within the
message [44]. This attack aims to disrupt the normal operation of the CAN network
and can have severe consequences in automotive systems. Each message has a data
field of fixed size, which contains the payload or information being
communicated. In in-vehicle networks, this attack is performed by creating a
malicious message with the same parameters as the targeted message but with an
altered data field.

This malicious message is sent right before the targeted message, so the system
accepts it and adopts its signal values instead of those of the legitimate
message, causing the system to behave differently than expected. The impact of
an overwrite attack in a CAN network depends on the nature of the modified data.
For example, in a vehicle system, an attacker like BhanuPratap could modify the
sensor data being transmitted over the network, leading to inaccurate or
misleading information being processed by the receiving nodes. An attacker could
also overwrite a message that contains signals for the speedometer reading of a
car, causing it to display incorrect speed information to the driver, which could
lead to speeding or excessive braking. Overwrite attacks can potentially result
in improper vehicle operation, compromised safety, or unauthorized control of
certain vehicle functionalities.



4 Design and Implementation

This chapter discusses the design and implementation of the final framework, as
well as the specific methods used in developing it. Additionally, the ways of
evaluating the framework’s performance and the results it produces are
elaborated on step by step.

4.1 System Overview
In order to understand the working of the framework, a high-level description of its
different components is essential. The objective of this thesis is to develop an attack
traffic generator that introduces attack traffic into the available trace files.

[Flowchart: a CAN trace file enters the framework. Trace File Analysis extracts
the message parameters and finds patterns using them (timestamp analysis and
message ID analysis). Attack Generation uses the patterns found in the analysis
to generate attacks (Attack 1 to Attack 4) and injects them into the trace file,
producing a trace file with injected attacks.]

Figure 4.1: System overview flowchart


The implementation of the framework revolves around ASCII trace files for the
reasons discussed in Section 3.4.3. Additionally, the attacks should be injected
in a clever manner by analyzing trace files for noticeable patterns and sequences
of messages. This analysis reduces the need for user input when generating and
inserting attacks into trace files. Its main intention is to improve attack
profiles by introducing randomness and observing existing patterns in order to
model an attacker more realistically. The framework contains a set of defined
attack profiles that are applied to a selection of trace files, and it must be
capable of generalizing this method to handle any trace file.

The capabilities of this framework can be divided into two distinct components that
will be discussed in detail in the sections to follow. These components are trace file
analysis and attack generation. Both of these components yield results that will
be evaluated along with the performance of the framework based on a few defined
metrics.

4.2 Trace File Analysis
This section provides a detailed overview of our analysis methodology and its
implementation in the framework. The analysis is based on two parameters of the
CAN message: the message ID and the timestamp. By examining these parameters,
we identified patterns and characteristics that could be exploited to attack the
system in a clever manner. Identifying patterns in the frequency of occurrence of
messages, repeating sequences of messages, and the intervals at which these
messages arrive gives a comprehensive understanding of the trace file’s
dynamics.

4.2.1 Extracting Message Parameters
To analyze the traffic effectively, the different parameters of CAN messages
must be extracted. To automate the extraction and analysis process, we developed
specific functions within our code. The framework reads all the lines of the
trace file and then splits each entry into its constituent parameters. The first
function, parse_files(), takes the path of the file as input and reads the
entire trace file. The file is then processed line by line: a second function,
split_entry(), splits each line using white space as the delimiter. The split
entry is stored as a list, from which a third function, get_parameters(),
extracts each parameter and adds it to its corresponding list. In the end, we
obtain one list per parameter covering the entire trace file.
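
The extraction steps described above can be sketched roughly as follows. This is
a minimal illustration, not the framework’s actual code: the function names
parse_files(), split_entry(), and get_parameters() come from the text, but
their signatures, the helper parse_lines(), the assumed column order
(timestamp, bus, ID, direction, frame marker "d", DLC, payload bytes), and the
sample lines are assumptions made for the example.

```python
def split_entry(line):
    """Split one trace line on white space into its raw fields."""
    return line.split()


def get_parameters(fields, params):
    """Append each field of a split entry to its per-parameter list.

    Assumed column order: timestamp, bus, ID, direction, frame marker,
    DLC, then the payload bytes.
    """
    dlc = int(fields[5])
    params["timestamp"].append(float(fields[0]))
    params["bus"].append(fields[1])
    params["id"].append(fields[2])
    params["dlc"].append(dlc)
    params["payload"].append(" ".join(fields[6:6 + dlc]))


def parse_lines(lines):
    """Build per-parameter lists and a timestamp-keyed dictionary."""
    params = {"timestamp": [], "bus": [], "id": [], "dlc": [], "payload": []}
    entries = {}
    for line in lines:
        fields = split_entry(line)
        # Skip headers, comments, and anything not shaped like a data frame.
        if len(fields) < 6 or fields[4] != "d":
            continue
        get_parameters(fields, params)
        entries[float(fields[0])] = line.rstrip("\n")
    return params, entries


def parse_files(path):
    """Read the whole trace file, then parse it line by line."""
    with open(path, encoding="utf-8") as handle:
        return parse_lines(handle.readlines())


# Hypothetical sample lines: one header line and two data frames.
sample = [
    "date Mon Jan 1 10:00:00 2023",
    "0.001000 1 123 Rx d 8 01 02 03 04 05 06 07 08",
    "0.002500 1 1A4 Rx d 2 FF 00",
]
params, entries = parse_lines(sample)
```

Keeping the per-parameter lists and the timestamp-keyed dictionary side by side
mirrors the two views of the trace that the later analysis and injection steps
rely on.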

CAN messages have a unique format that allows the framework to differentiate
between standard CAN messages, comments, and variants of CAN like CAN FD. Since
the timestamp is the only unique parameter in the trace file, a dictionary is
created to keep track of each entry with the timestamp as the key. Dictionaries
do not allow duplicate keys, which is a useful property. This makes the injection
of attacks and the manipulation of messages easier, as dictionaries can be sorted
easily and each message can be accessed using just its key. Having each parameter
available as a list makes their analysis easier for the framework.

Figure 4.2: Message ID frequency of a trace file

4.2.2 Message ID Analysis
Every message in the trace file is closely related to its message ID. The objective of
analyzing this parameter is to find which message IDs appear the most and if there
are groups of messages that appear together repeatedly throughout the trace file.
The existence of such messages shows their importance in the system and targeting
such messages might make attacks more effective and devastating.

4.2.2.1 Frequency of Occurrence

By analyzing the extracted list of message IDs, their frequency of occurrence can be
easily determined using the get_id_stats() function. This function takes the list
of IDs and simply counts the number of times each ID occurs in the list. By doing
so for all message IDs, a dictionary can be created which can be then used to easily
access the most frequently occurring messages.
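
The counting step can be sketched with the standard library as shown below. The
function name get_id_stats() is taken from the text, but its signature and the
sample IDs are assumptions for illustration only.

```python
from collections import Counter


def get_id_stats(id_list):
    """Count how often each message ID occurs in the trace."""
    return dict(Counter(id_list))


# Hypothetical list of message IDs extracted from a trace file.
ids = ["1A4", "123", "1A4", "2C0", "1A4", "123"]
stats = get_id_stats(ids)

# IDs ordered from most to least frequent.
most_frequent = sorted(stats, key=stats.get, reverse=True)
print(most_frequent)
```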

Figure 4.2 is a graphical representation of the dictionary obtained by using this
function. If malicious messages imitate the messages that appear frequently, the
probability of these malicious messages being accepted as valid CAN messages may
increase.

4.2.2.2 Repeating Sequences

In addition to frequency analysis, we searched for repeating sequences of message
IDs within the trace file. Identifying message sequences that frequently occur
together can reveal important relationships and dependencies between different
messages. These sequences may indicate specific system operations or processes
that can be targeted or exploited by an attacker. The establish_correlation()
function takes the list of IDs and returns a dictionary containing all sequences
of messages that occur together, along with the number of times each sequence is
observed. Sequences that frequently occur together suggest that the messages in
the sequence depend on each other.
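
One possible way to find such sequences is sketched below. The text does not
state how establish_correlation() defines a sequence, so this sketch makes an
assumption: it counts fixed-length windows of consecutive IDs and keeps those
seen at least twice. The parameters length and min_count are hypothetical.

```python
from collections import Counter


def establish_correlation(id_list, length=3, min_count=2):
    """Count windows of consecutive IDs that repeat in the trace."""
    windows = (
        tuple(id_list[i:i + length])
        for i in range(len(id_list) - length + 1)
    )
    counts = Counter(windows)
    # Keep only the sequences that occur at least min_count times.
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical ID stream in which A, B, C keep appearing together.
ids = ["A", "B", "C", "D", "A", "B", "C", "E", "A", "B", "C"]
repeats = establish_correlation(ids)
print(repeats)
```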

4.2.3 Timestamps Analysis
The timestamp is a unique parameter of each message and gives information about
when the message arrived on the CAN bus. A lot can be learned about a message
just by observing the times at which its different instances occur. To extract
meaningful information, the timestamps of all occurrences of each individual
message must be obtained. The function get_msgs_by_id() takes a message ID, the
timestamp list, and the ID list of the entire trace file and returns a list of
the timestamps of that particular message. This is done for each distinct ID in
the trace file, and the list returned by the function is used directly for
analysis.

Cyclic and Partially Cyclic Messages: To understand the purpose of timestamp
analysis, the concept of cyclic messages must be known. If the time difference
between every two consecutive instances of messages with the same ID is exactly
the same, then that message ID is said to occur in a cyclic manner in the system.
This difference in time is referred to as the cyclic difference for the purpose
of this thesis. Since not all messages are always cyclic, the concept of
partially cyclic messages arises. When messages with the same message ID appear
in a cyclic manner most of the time, i.e., their cyclic difference is the same
for the majority of their instances, we consider such messages to be partially
cyclic. In the framework, a message ID is partially cyclic if it exhibits cyclic
behavior at least 50% of the time. If a message ID is neither cyclic nor
partially cyclic, it is considered to be acyclic.


Figure 4.3: Graphical representation of different message IDs based on their cyclic
difference

The function analyse_timestamp() takes the timestamp list of a specific message
ID and returns a list of the time differences between consecutive instances of
messages with that message ID. The occurrences of each unique time difference
(delta) are counted and stored in a dictionary. This makes it easier to classify
message IDs as cyclic, partially cyclic, or acyclic (see Figure 4.3). Furthermore,
the period in which a message ID exhibits cyclic behavior can be obtained.
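
The classification can be sketched as follows. The name analyse_timestamp() and
the 50% threshold come from the text; the helper classify(), the rounding of
deltas (added here to absorb floating-point noise), and the sample timestamps
are assumptions made for this illustration.

```python
from collections import Counter


def analyse_timestamp(timestamps, digits=3):
    """Return the deltas between consecutive timestamps of one message ID."""
    return [
        round(later - earlier, digits)
        for earlier, later in zip(timestamps, timestamps[1:])
    ]


def classify(deltas, threshold=0.5):
    """Label an ID as cyclic, partially cyclic, or acyclic."""
    if not deltas:
        return "acyclic", None
    # The most common delta is the candidate cyclic difference.
    delta, count = Counter(deltas).most_common(1)[0]
    share = count / len(deltas)
    if share == 1.0:
        return "cyclic", delta
    if share >= threshold:
        return "partially cyclic", delta
    return "acyclic", None


# Hypothetical timestamp lists for three message IDs.
print(classify(analyse_timestamp([0.010, 0.020, 0.030, 0.040])))
print(classify(analyse_timestamp([0.0, 0.1, 0.2, 0.35, 0.45])))
print(classify(analyse_timestamp([0.0, 0.1, 0.3, 0.6])))
```

The second value returned for cyclic and partially cyclic IDs is the cyclic
difference, which the attack generation steps can reuse when computing
timestamps for injected messages.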

The regularity in the timing of cyclic messages allows for precise prediction of when
these messages will occur. The variability in the occurrence of partially cyclic
messages adds some level of unpredictability, making their timing less consistent
compared to fully cyclic ones. This information proved useful for exploiting these
patterns by targeting specific time windows or disrupting the regularity of cyclic
messages.

4.3 Attack Generation
Once the analysis was in place, the insights gained from it were used to inject
attacks. The first step was to select a smaller set of attacks from the list of
studied attacks presented in Table 3.1. For the scope of this thesis, four
attacks were selected and implemented: the fuzzy, spoofing, replay, and overwrite
attacks. The overwrite and spoofing attacks were selected due to their ease of
understanding and implementation. The fuzzy attack seems simple but, if
successful, can reveal a lot about any system. It is usually the first attack
performed by adversaries in an attempt to find an entry point into the system and
is thus an important attack to cover. The replay attack was selected due to its
effectiveness against encrypted systems. It appears easy to implement but can
have a devastating effect on a system. Furthermore, these attacks are commonly
encountered in real-world scenarios and are well documented in the related
research literature. This section discusses the implementation of each attack and
the functions that aided in the process. In the end, the average attack-to-benign
traffic ratio is presented for the output trace files generated for every
attack.

4.3.1 Fuzzy Attack
The fuzzy attack makes use of randomized IDs and message payloads, so the first
step in implementing this attack was to generate randomized IDs. However, the IDs
generated in this framework are not completely random. An algorithm was created
to mimic an attacker’s approach of manually randomizing IDs. The function
gen_random_ids() takes a list of IDs and performs a set of operations on each ID
in that list to create a unique set of IDs for fuzzing. The input list given to
this function consists of the top 33% of the most frequently occurring messages
obtained through message ID analysis. Every ID in the input list produces a
series of at least 6 IDs. After removing all the duplicates, a large list of
pseudo-random IDs is obtained. This algorithm generates both standard and
extended CAN message IDs in both decimal and hexadecimal formats.

Then, the bus and data length code (DLC) values are generated randomly. Using
the randomised DLC as input, another function called get_random_data() generates
a randomised payload whose size in bytes equals the DLC value. This randomised
payload is then converted into a specific format. The function get_duration()
returns the first and last timestamps of the trace file, which are used to
generate a random decimal timestamp between them. After obtaining all the
parameters required to form a malicious message, a malicious entry is created and
inserted into the dictionary with the generated random timestamp as the key. All
of these tasks come together in a single function called fuzzy_attack().
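
The message assembly described above can be sketched as follows. The function
names get_random_data(), get_duration(), and fuzzy_attack() come from the text,
but their signatures are assumptions, as are the dictionary layout, the bus
range of 1 to 2, and the candidate ID list (which stands in for the output of
gen_random_ids(), whose internal operations are not reproduced here).

```python
import random


def get_random_data(dlc, rng):
    """Return dlc random bytes formatted as space-separated hex pairs."""
    return " ".join(f"{rng.randrange(256):02X}" for _ in range(dlc))


def get_duration(entries):
    """Return the first and last timestamps of the trace."""
    return min(entries), max(entries)


def fuzzy_attack(entries, candidate_ids, count, rng):
    """Insert count randomized messages at random timestamps."""
    first, last = get_duration(entries)
    injected = []
    while len(injected) < count:
        stamp = round(rng.uniform(first, last), 6)
        if stamp in entries:
            continue  # never replace an existing message
        dlc = rng.randint(0, 8)  # classical CAN carries 0 to 8 data bytes
        entries[stamp] = {
            "bus": rng.randint(1, 2),  # assumed bus numbering
            "id": rng.choice(candidate_ids),
            "dlc": dlc,
            "payload": get_random_data(dlc, rng),
        }
        injected.append(stamp)
    return injected


# Hypothetical two-message trace keyed by timestamp.
trace = {
    0.001: {"bus": 1, "id": "123", "dlc": 2, "payload": "01 02"},
    0.500: {"bus": 1, "id": "1A4", "dlc": 1, "payload": "FF"},
}
rng = random.Random(0)  # seeded only to make the sketch repeatable
injected = fuzzy_attack(trace, ["7FF", "001", "1A5"], 5, rng)
for stamp in sorted(trace):
    print(stamp, trace[stamp])
```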

Figure 4.4: Injected fuzzy attack traffic into normal CAN traffic

Figure 4.4 is a snippet from the output file generated after injecting fuzzy attack
traffic into the trace file.

4.3.2 Replay Attack
To perform a replay attack, the target messages are retransmitted at a later
time in the system while all the other parameters of the target message remain
unchanged. The first step towards the implementation of this attack is to prepare
a list of target messages. To do so, the methods discussed for the analysis of
the trace file prove to be very handy. First, the top 3 most frequently occurring
message IDs are added to the list. Assume this list is called "MsgList". Then the
longest sequence of messages that appear together is obtained and all the message
IDs present in the sequence are added to "MsgList". Now, for all the unique
message IDs in "MsgList", the cyclic time difference is calculated.

For the messages in "MsgList", all other message parameters are obtained.
Malicious messages are created using these parameters but with a new timestamp,
which is the sum of the original timestamp of the target message and the cyclic
difference of the corresponding message ID. Additionally, the last occurrence of
each message ID present in "MsgList" is obtained and replayed with two different
timestamps. The first timestamp is the sum of the original message timestamp and
its cyclic difference, and the second is the sum of the original message
timestamp and a random decimal number between the first and last timestamp of the
trace file. This way, the messages are replayed in a cyclic as well as a
randomised manner. The replay_attack() function performs all of the tasks
mentioned above to inject replay traffic into the input trace file.
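
The replay of the last occurrence of a target ID, as described above, can be
sketched as follows. The helper replay_last() is hypothetical and covers only
this one step of replay_attack(); the dictionary layout and sample values are
assumptions, and setdefault() is used here so that an existing entry is never
replaced if a computed timestamp happens to be taken already.

```python
import random


def replay_last(entries, target_id, cyclic_diff, rng):
    """Replay the last message carrying target_id at two later times."""
    first, last = min(entries), max(entries)
    stamp = max(t for t, msg in entries.items() if msg["id"] == target_id)
    original = entries[stamp]
    new_stamps = [
        round(stamp + cyclic_diff, 6),               # cyclic replay
        round(stamp + rng.uniform(first, last), 6),  # randomised replay
    ]
    for new_stamp in new_stamps:
        # Copy every parameter unchanged; only the timestamp differs.
        entries.setdefault(new_stamp, dict(original))
    return new_stamps


# Hypothetical trace in which ID 1A4 repeats every 10 ms.
trace = {
    0.010: {"id": "1A4", "dlc": 1, "payload": "0A"},
    0.015: {"id": "123", "dlc": 1, "payload": "FF"},
    0.020: {"id": "1A4", "dlc": 1, "payload": "0B"},
}
stamps = replay_last(trace, "1A4", 0.010, random.Random(1))
for stamp in sorted(trace):
    print(stamp, trace[stamp])
```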

Figure 4.5: Injected replay attack traffic into normal CAN traffic

Figure 4.5 shows how the messages at lines 1 and 2 are replayed at lines 464 and
470, respectively. The parameters of the replayed messages are identical to those
of their target messages.

4.3.3 Overwrite Attack
The idea behind the overwrite attack is to inject a malicious message right
before a target message. This malicious message should have the same parameters
as the target but an altered payload. The implementation of this attack is
similar to that of the replay attack. Instead of considering only the most
frequently occurring message IDs, all unique message IDs are accounted for and
their cyclic time difference is calculated. Using the get_same_period_diff()
function, the timestamps of the message IDs are obtained for the period in which
they exhibit cyclic behavior. These timestamps are collected into a list.

To inject a malicious message right before the target message arrives on the CAN
bus, the least significant digit of the target’s timestamp is decreased by 1. All
the other parameters of the target message are retrieved and a randomised DLC is
used to generate a new random hexadecimal payload. Now that all the required
parameters have been obtained, the malicious messages are created and inserted
into the dictionary with the new timestamp (least significant digit decreased by
1) as the key. The function that performs all of the above-mentioned tasks is
called overwrite_attack().
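
The timestamp adjustment and injection can be sketched as follows. The name
overwrite_attack() comes from the text, but its signature, the helper
just_before(), and the use of Decimal keys (chosen here so that lowering the
last digit is exact) are assumptions made for this illustration.

```python
import random
from decimal import Decimal


def just_before(stamp):
    """Lower the least significant digit of a Decimal timestamp by one."""
    step = Decimal(10) ** stamp.as_tuple().exponent  # 0.000001 for 6 decimals
    return stamp - step


def overwrite_attack(entries, target_stamps, rng):
    """Insert an altered copy just before each targeted message."""
    for stamp in target_stamps:
        dlc = rng.randint(1, 8)
        payload = " ".join(f"{rng.randrange(256):02X}" for _ in range(dlc))
        # Same parameters as the target, but a new DLC and payload.
        entries[just_before(stamp)] = dict(
            entries[stamp], dlc=dlc, payload=payload
        )
    return entries


# Hypothetical trace keyed by timestamps with six decimal places.
trace = {
    Decimal("0.010000"): {"id": "123", "dlc": 1, "payload": "FF"},
    Decimal("0.020000"): {"id": "1A4", "dlc": 2, "payload": "00 10"},
}
overwrite_attack(trace, [Decimal("0.020000")], random.Random(0))
for stamp in sorted(trace):
    print(stamp, trace[stamp])
```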

Figure 4.6: Injected overwrite attack traffic into normal CAN traffic

In Figure 4.6, it can be observed that the attack messages and the target messages
share parameters such as the message ID but have completely different payloads.
The attack messages appear right before the target messages.

4.3.4 Spoofing Attack
The spoofing attack differs from the other attacks implemented in the framework.
The biggest difference is that the spoofing attack simply manipulates existing
messages by changing their payload. Whereas the other attacks create attack
traffic and inject it into the trace file, spoofing does not inject any new
messages. The procedure for cleverly selecting messages for spoofing is similar
to the one used in the overwrite attack. Instead of considering all of the
messages, the top 15 most frequent messages are considered and their timestamps
are obtained for the period in which they exhibit cyclic behavior.

All of the parameters corresponding to the targeted messages are retrieved and a
random payload is generated using a random DLC value. Since dictionaries do not
allow duplicate keys, inserting an entry with the same timestamp as the target
message but a manipulated payload replaces the target message with the malicious
one in place. The spoofing_attack() function performs all of the above-mentioned
tasks.
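
The in-place replacement can be sketched as follows. The name spoofing_attack()
comes from the text; its signature, the dictionary layout, and the sample
values are assumptions for illustration.

```python
import random


def spoofing_attack(entries, target_stamps, rng):
    """Replace the payload of the targeted messages in place."""
    for stamp in target_stamps:
        dlc = rng.randint(1, 8)
        payload = " ".join(f"{rng.randrange(256):02X}" for _ in range(dlc))
        # Reusing the existing key replaces the entry instead of adding one.
        entries[stamp] = dict(entries[stamp], dlc=dlc, payload=payload)
    return entries


# Hypothetical trace keyed by timestamp.
trace = {
    0.010: {"id": "1A4", "dlc": 1, "payload": "0A"},
    0.020: {"id": "123", "dlc": 1, "payload": "FF"},
}
spoofing_attack(trace, [0.010], random.Random(0))
print(trace)
```

Unlike the other sketches, the number of entries stays the same, which matches
the observation that spoofing manipulates messages rather than injecting new
ones.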

Figure 4.7: Spoofed messages compared to original attack-free trace file

Figure 4.7 compares the original attack-free trace file and the output generated
after introducing spoofing attacks to the same file. Lines 17 and 22
(highlighted in the image) show the same messages but with different payloads.

4.4 Evaluation Framework
In this thesis, three distinct types of results need to be evaluated in order to
gain meaningful insights: the output trace files containing attack traffic, the
performance of the framework, and a comparison of the traffic analysis of
real-world and synthetic trace files.

4.4.1 Comparison of Trace Files: Real world vs Synthetic traffic

Analysis of the trace files can reveal a lot of information about the behavior of the
traffic. The idea behind contrasting different trace files is to distinguish between
the trends followed in different systems. For this, a set of trace files containing
both real-world traces (obtained from real vehicles) and synthetic (artificially
generated) traces is used. The analysis results are represented graphically so that
it becomes easier to draw insights for discussion.

The trace files were subjected to multiple experiments that analyzed the differences
in the parameters and patterns that can be observed. These experiments make use of
the analysis functionalities implemented in the framework. The same parameters that
were used for clever attack traffic generation, namely the message ID and the
timestamps of messages, are analyzed to find the differences in patterns.

Analyses such as the frequency of message IDs, the standard deviation of timestamps,
and the ratio of cyclic to acyclic messages provided more information about these
trace files and highlighted noticeable differences between them. It was determined
that, for the trace files considered, the message ID and the length of the data had
the highest correlation. The magnitude of this correlation was determined using the
Spearman correlation coefficient.
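
For reference, the Spearman coefficient is the Pearson correlation computed on
the ranks of the two variables. The following standard-library sketch shows the
calculation for a list of message IDs and a list of data lengths. The function
names average_ranks(), pearson() and spearman() are illustrative and are not
taken from the framework, which may instead rely on a statistics library.

```python
from statistics import mean


def average_ranks(values: list) -> list:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks in the group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: list, y: list) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation of two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ranks enter the calculation, the coefficient captures any
monotonic relationship between message ID and data length, not just a linear
one.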

4.4.2 Output Trace File: Randomness
To evaluate the output trace file produced by the framework, the metric considered
is randomness. Randomness refers to the ability of the framework to generate varying
outputs, which is a desirable property for an attack traffic generator used to test
security mechanisms. It serves two purposes. First, in actual cyber-attacks, the
attacker's behavior is often unpredictable and can vary significantly. By
incorporating randomness into attack traffic generation, a more realistic and
diverse set of attack patterns can be simulated, making security testing more
robust. Second, randomness expands the coverage of security testing by exploring
different combinations within attacks. Relying solely on predetermined attack
patterns may overlook vulnerabilities or attack vectors that are not explicitly
defined.

In the framework, a certain level of randomness is introduced while inserting attack
traffic into the trace file. This randomness appears in the number of malicious
messages introduced and in the way they are scattered throughout the trace file.
The framework is designed so that the number of attacks generated can be increased
or decreased as required, yet the number of malicious messages introduced still
varies from one execution to the next.
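
One way such behavior could be realized is sketched below. The function
pick_attack_positions(), its parameters requested and spread, and the rule of
varying the count by a fixed percentage are hypothetical choices for
illustration; they are not taken from the framework.

```python
import random


def pick_attack_positions(n_messages, requested, spread=0.2, rng=None):
    """Choose where attacks are injected in a trace of `n_messages` (>= 1).

    The number of positions varies by roughly +/- `spread` around
    `requested`, and the positions are scattered uniformly over the trace.
    """
    rng = rng or random.Random()
    high = max(1, min(n_messages, int(requested * (1 + spread))))
    low = max(1, min(high, int(requested * (1 - spread))))
    count = rng.randint(low, high)
    # sample() draws without replacement, so no position is chosen twice.
    return sorted(rng.sample(range(n_messages), count))
```

Passing a seeded random.Random instance makes a run reproducible, while
omitting it yields a different count and placement on every execution.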

4.4.3 Framework: Execution time, Complexity
There are several ways in which a framework can be evaluated. Within the scope of this
thesis, the execution time for creating and inserting each attack is compared for
trace files of different lengths. The execution time is measured from the moment the
trace file is read until the output trace file with attack patterns is generated.
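
A measurement of this kind can be taken with a small wrapper around the whole
read, inject and write pipeline. In the sketch below, timed() is an
illustrative helper and not part of the framework; the callable passed to it is
assumed to perform all steps from reading the input to writing the output.

```python
import time


def timed(func, *args, **kwargs):
    """Run `func` and return its result together with the elapsed seconds."""
    start = time.perf_counter()  # monotonic clock suited to interval timing
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Repeating the measurement several times per trace file and reporting the mean
would reduce the influence of background load on the comparison.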

Additionally, the scalability of the framework must be measured to determine whether
it works as well for larger trace files as it does for smaller ones. To do so, a metric
called the "scaling factor" (this name is used for the pur