Attack Traffic Generation for Network-based Intrusion Detection System Master’s thesis in Computer science and engineering Chandrika Neelap Harsh Vardhan Bhandari Department of Computer Science and Engineering CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY OF GOTHENBURG Gothenburg, Sweden 2023 Master’s thesis 2023 Attack Traffic Generation for Network-based Intrusion Detection System Chandrika Neelap Harsh Vardhan Bhandari Department of Computer Science and Engineering Chalmers University of Technology University of Gothenburg Gothenburg, Sweden 2023 ii Attack Traffic Generation for Network-based Intrusion Detection System Chandrika Neelap Harsh Vardhan Bhandari © Chandrika Neelap, Harsh Vardhan Bhandari, 2023. Supervisor: Magnus Almgren, Department of Computer Science and Engineering Advisor: Hjalmar Wennerström, Joergen Nilsson, Robert Bosch AB Examiner: Magnus Almgren, Department of Computer Science and Engineering Master’s Thesis 2023 Department of Computer Science and Engineering Chalmers University of Technology and University of Gothenburg SE-412 96 Gothenburg Telephone +46 31 772 1000 Typeset in LATEX Gothenburg, Sweden 2023 iii Abstract The automotive industry is constantly coming up with technological advances mak- ing automotive vehicles a complex system consisting of a multitude of electronic, mechanical, and software components. A critical part of such systems is the elec- tronic control unit (ECU) which is responsible for controlling specific functions. Nowadays, automotive vehicles are equipped with more than 100 ECUs that control a wide range of functions, from essential (engine and power steering control) to com- fort (windows, seats, etc) to critical (airbags). The Controller Area Network (CAN) helps the ECUs communicate with one another using a common bus. The CAN bus is a message-based protocol that offers reliable, priority-driven communication of essential control data. The CAN bus, despite its reliability and efficiency, is prone to a variety of cyber attacks. The vulnerabilities of CAN towards cyber attacks can be reduced by the deployment of an Intrusion Detection System (IDS). IDS detects intrusions by ob- serving the events or by validating the range of different parameters in an attempt to identify malicious content that could potentially be an attack. This acts as a line of defence against cyber attacks and can play a huge role in safeguarding CAN based systems. In order to check the efficiency and reliability of security mechanisms like IDSs, they must be tested against malicious data to assess their ability to detect various types of attacks. However, the availability of malicious data is not ubiquitous. The objec- tive of this thesis is to investigate a methodology and develop a software that can manipulate and add known attack traffic into already existing data sets. The abili- ties and effectiveness of this attack traffic generating (ATG) software in mimicking real-life cyber attacks is evaluated through a series of experiments while highlighting its strengths and weaknesses. The experiments reveal that the developed software succeeds in introducing malicious traffic into benign traffic in a random fashion, which mimics real-life attack traffic. The time in which the software introduces these attacks is a function of O(n2). Keywords: Electronic Control Unit, Controller Area Network, Intrusion Detection System, Attack traffic generator. iv Acknowledgements We would like to acknowledge our supervisor from Chalmers University of Technol- ogy, Magnus Almgren, our supervisors from Robert Bosch AB, Hjalmar Wenner- ström, and Jörgen Nilsson for their support. We would also like to acknowledge Sravan Tatipala, PhD Candidate at Product Development Research Laboratory- BTH. Without their guidance, the success of this thesis would be unimaginable. Chandrika Neelap, Gothenburg, 2023-09-06 Harsh Vardhan Bhandari, Gothenburg, 2023-09-06 vi Contents List of Figures xi List of Tables xiii 1 Introduction 1 1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Aim and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Review of Literature 5 2.1 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.1 CAN attacks and countermeasures . . . . . . . . . . . . . . . 5 2.1.2 Security Mechanisms for CAN . . . . . . . . . . . . . . . . . . 6 2.1.3 Existing CAN Traffic Datasets . . . . . . . . . . . . . . . . . . 7 2.1.4 Analysing CAN Traffic . . . . . . . . . . . . . . . . . . . . . . 8 2.1.5 Traffic Generators for TCP/IP . . . . . . . . . . . . . . . . . . 9 2.1.6 Traffic Generators for CAN . . . . . . . . . . . . . . . . . . . 9 2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3 Conceptual Foundation 15 3.1 Controller Area Network (CAN) . . . . . . . . . . . . . . . . . . . . . 15 3.1.1 CAN Architecture . . . . . . . . . . . . . . . . . . . . . . . . 15 3.1.2 CAN Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1.3 CAN Security and Vulnerabilities . . . . . . . . . . . . . . . . 18 3.2 Traffic Analysis and Pattern Recognition . . . . . . . . . . . . . . . . 19 3.3 Intrusion Detection Systems (IDS) . . . . . . . . . . . . . . . . . . . 20 3.4 Attack Traffic Generation . . . . . . . . . . . . . . . . . . . . . . . . 20 3.4.1 Characteristics of Attack Data . . . . . . . . . . . . . . . . . . 21 3.4.2 Experimental Evaluation and Validation . . . . . . . . . . . . 21 3.4.3 Trace Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.5 Attack Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.5.1 Attacker Approach . . . . . . . . . . . . . . . . . . . . . . . . 23 3.5.2 The Replay Attack . . . . . . . . . . . . . . . . . . . . . . . . 24 vii Contents 3.5.3 Denial of Service . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.5.4 Spoofing Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.5.5 Fuzzy Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.5.6 Flooding Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.5.7 Isolation Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.5.8 Overwrite Attack . . . . . . . . . . . . . . . . . . . . . . . . . 26 4 Design and Implementation 29 4.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.2 Trace File Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.2.1 Extracting Message Parameters . . . . . . . . . . . . . . . . . 30 4.2.2 Message ID Analysis . . . . . . . . . . . . . . . . . . . . . . . 31 4.2.2.1 Frequency of Occurrence . . . . . . . . . . . . . . . . 31 4.2.2.2 Repeating Sequences . . . . . . . . . . . . . . . . . . 31 4.2.3 Timestamps Analysis . . . . . . . . . . . . . . . . . . . . . . . 32 4.3 Attack Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.3.1 Fuzzy Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.3.2 Replay Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.3.3 Overwrite Attack . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.3.4 Spoofing Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.4 Evaluation Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4.1 Comparison of Trace Files: Real world vs Synthetic traffic . . 37 4.4.2 Output Trace File: Randomness . . . . . . . . . . . . . . . . . 37 4.4.3 Framework: Execution time, Complexity . . . . . . . . . . . . 38 5 Experiments and Results 39 5.1 Comparison of Trace Files . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1 TF1: Ratio of Cyclic Messages . . . . . . . . . . . . . . . . . . 40 5.1.2 TF2: Message Frequency . . . . . . . . . . . . . . . . . . . . . 40 5.1.3 TF3: Standard Deviation . . . . . . . . . . . . . . . . . . . . . 42 5.1.4 TF4: Parameter Correlation . . . . . . . . . . . . . . . . . . . 42 5.2 Output Trace File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.2.1 R1: Attack to benign ratio . . . . . . . . . . . . . . . . . . . . 43 5.2.2 R2: Spread of attacks within trace file . . . . . . . . . . . . . 44 5.2.3 R3: Cyclic to acyclic ratio . . . . . . . . . . . . . . . . . . . . 46 5.3 Framework Performance . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.1 P1: Execution time . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.2 P2: Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.4.1 Comparison of Trace Files . . . . . . . . . . . . . . . . . . . . 50 5.4.2 Output Trace Files . . . . . . . . . . . . . . . . . . . . . . . . 51 5.4.3 Framework Performance . . . . . . . . . . . . . . . . . . . . . 52 5.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 5.6 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 5.7 Ethical concerns and sustainability . . . . . . . . . . . . . . . . . . . 54 5.7.1 Ethical issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.7.2 Sustainability . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 viii Contents 6 Conclusion 55 Bibliography 57 ix Contents x List of Figures 3.1 CAN architecture as compared to the OSI model, taken from [33] . . 16 3.2 (a) standard CAN frame, (b) extended CAN frame, taken from [34] . 17 3.3 Snippet of an ASCII format tracefile . . . . . . . . . . . . . . . . . . 22 4.1 System overview flowchart . . . . . . . . . . . . . . . . . . . . . . . . 29 4.2 Message ID frequency of a trace file . . . . . . . . . . . . . . . . . . . 31 4.3 Graphical representation of different message IDs based on their cyclic difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.4 Injected fuzzy attack traffic into normal CAN traffic . . . . . . . . . . 34 4.5 Injected replay attack traffic into normal CAN traffic . . . . . . . . . 35 4.6 Injected overwrite attack traffic into normal CAN traffic . . . . . . . 36 4.7 Spoofed messages compared to original attack-free trace file . . . . . 36 5.1 Message ID frequency of real-world trace files . . . . . . . . . . . . . 41 5.2 Message ID frequency of synthetically created trace files . . . . . . . 41 5.3 Standard deviation of time intervals of different messages . . . . . . . 42 5.4 Attack plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.5 Fuzzy attack randomness . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.6 Replay attack randomness . . . . . . . . . . . . . . . . . . . . . . . . 45 5.7 Overwrite attack randomness . . . . . . . . . . . . . . . . . . . . . . 46 5.8 Spoofing attack randomness . . . . . . . . . . . . . . . . . . . . . . . 46 5.9 Expected execution time of each attack . . . . . . . . . . . . . . . . . 49 5.10 Observed execution time of each attack . . . . . . . . . . . . . . . . . 49 xi List of Figures xii List of Tables 3.1 List of studied attacks . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.1 Size of trace files used in experiments . . . . . . . . . . . . . . . . . . 39 5.2 Cyclic to acyclic messages ratio . . . . . . . . . . . . . . . . . . . . . 40 5.3 Correlation between IDs and data lengths . . . . . . . . . . . . . . . 43 5.4 Cyclic to acyclic messages ratio of attack-free and infected trace files 47 5.5 Size of trace files used in experiments . . . . . . . . . . . . . . . . . . 47 5.6 Computation time for each attack . . . . . . . . . . . . . . . . . . . . 48 5.7 Scaling factor for each attack . . . . . . . . . . . . . . . . . . . . . . 48 5.8 Equation for each attack . . . . . . . . . . . . . . . . . . . . . . . . . 50 xiii List of Tables xiv 1 Introduction Automotive systems in vehicles nowadays have become a vital part of life all around the globe. The most common application of automotive systems is the transporta- tion of populations. Modern automotive systems have improved the quality of trans- portation not just by increasing the range of travel but also by making the journey more comfortable [1]. Vehicles today are data-powered with a plethora of customiz- able features that focus on improved safety, efficiency, and comfort for the driver and the passengers by communicating with other vehicles and systems. To improve the quality of transportation, automotive vehicles comprise multiple in-vehicular electronic systems. Each system controls a unique function in the vehicle. These electronic systems inside a vehicle are all controlled by Electronic Control Units (ECUs). Due to the increasing need for comfort, a large number of electronic sys- tems now exist inside a vehicle, each controlled by its own ECU [1]. This has led to a drastic increase in the number of ECUs that communicate not just with each other but also with the environment. The most common way for communication between ECUs used in vehicles today is Controller Area Network (CAN). Its real-time properties, simplicity, and low cost make it favorable to use in automotive vehicles. Improved connectivity is a boon and a bane as it opens up an avenue for adversaries to attack these systems which can lead to catastrophes . Therefore, the need for improved automotive security arises in order to protect the communication integrity inside these systems. The majority of real-world attacks target ECUs connected through CAN [1]. With the rapid development of electronic and smart appliances in modern vehicles, several ECUs are integrated into the conventional CAN bus. It was highlighted by Huang et al. in [2] that the CAN bus does not offer protection against most attacks, and it is vulnerable to a variety of manipulations. Vulnerabilities in an auxiliary onboard appliance can turn it into relays, giving attackers an opportunity to interact with the CAN bus [2]. With this in mind, a lot of security mechanisms such as intrusion detection systems (IDSs) are being developed. To improve the ability of such systems to detect known attacks, they are tested against data sets of attack traffic mixed with benign traffic. A major challenge, both in academia 1 1. Introduction and industry is the lack of available traffic data, especially data containing known attacks [1]. In this thesis, we aim to come up with a solution for this problem for a Network-based IDS developed for CAN. 1.1 Background and Motivation The Controller Area Network (CAN) is a network protocol that is commonly used in modern vehicles to communicate between electronic control units (ECUs). It is a message-based protocol that allows for real-time information transmission between various car systems such as engine control, brakes, and safety systems [3]. The CAN protocol has been shown to be efficient and successful in facilitating vehicle- to-vehicle communication. However, its security flaws have become an increasing source of concern [4]. The increasing complexity of modern automobiles, with their various interconnected ECUs, has resulted in an increase in the number of potential attack vectors. As automobiles become increasingly integrated and autonomous, they also become increasingly susceptible to attacks. Unauthorized access to the CAN network can jeopardize vehicle safety, privacy, and functioning, endangering drivers, passengers, and other road users [5]. Malicious actors can launch network attacks by exploiting different vulnerabilities in the CAN protocol, including message spoofing, injection, and alteration [6]. Intrusion Detection Systems (IDS) were created as an important line of protection against CAN-based attacks. IDS scans the network for suspicious activity and seeks to detect and neutralize any malicious behavior [7]. However, the efficacy of intrusion detection systems is dependent on their ability to effectively identify and respond to established attack patterns. The lack of a comprehensive data set of CAN-based attacks makes designing and evaluating IDS for this protocol difficult [8]. A critical component of testing network-based IDS for the CAN protocol is the generation of realistic attack traffic patterns. Finding an innovative approach to generate traffic data that contains established attack patterns, on the other hand, can be difficult due to the limited availability of existing attack traffic data. The goal of this thesis is to devise a methodology for designing and implementing a software tool for generating synthetic attack traffic patterns that can mimic the characteristics of real-world attacks. It is to analyze attack data characteristics, generate, and change attack traffic in order to test network-based Intrusion Detection Systems over the CAN network. The idea is to develop software that can take a typical traffic trace file as input and insert abnormal traffic patterns using an attack template. The ability to create traffic data with known attack patterns is critical for assessing the performance and efficacy of intrusion detection systems [9]. The developed software tool is tested on it’s ability to introduce malicious traffic patterns into normal CAN traffic. The proposed thesis aims to contribute to the field of CAN network security and the evaluation of intrusion detection systems. By conducting an in-depth analysis of attack data characteristics and developing a tool for generating attack traffic, this 2 1. Introduction thesis seeks to enhance the ability to detect and mitigate attacks, thereby improving the overall security of the CAN protocol in modern vehicles. 1.2 Aim and Scope The objective of this thesis is to analyze the characteristics of attack data, generate and modify attack traffic in order to test the network-based Intrusion Detection System (IDS) over CAN. The aim is to develop a framework that can take a trace file with normal traffic as input and insert known malicious traffic patterns using an attack template. Finding a novel way to generate traffic data that contains known attack patterns is difficult due to the limited availability of existing traffic data that contains documented attacks. Additionally, this area of study is not extensively explored in the literature. The scope of the thesis involves the development of the framework, to introduce attack traffic and its evaluation. Furthermore, the attack traffic in the files should be injected without the need for user inputs indicating where these attacks should be injected. 1.2.1 Objectives The objectives of the thesis are: • To perform a thorough literature review on automotive data-communication security, CAN standards and its weaknesses, state-of-the-art electrical and electronics architectures (involving CAN network), protocols such as CAN, network management, IDS, and known attacks on vehicles relating to CAN. • Review common attack patterns (such as Denial of Service, Flooding), under- stand their implementation, and select the most suitable patterns to implement in the framework. • Develop an overarching framework in incremental steps that is capable of a) analyzing trace files, b) generating a set of defined attack profiles, and c) introducing the attack profiles on the selected trace files. • Generalize the framework to handle any trace file and improve the attack profiles to model an attacker for example by introducing randomness, history, and learning. • Evaluate the developed framework using defined metrics. 1.2.2 Research Questions In this thesis, we aim to address three key research questions pertaining to the de- velopment of a software tool for generating CAN traffic with known attack patterns. 3 1. Introduction The objectives of the thesis are explored through the following research questions: • How can a software tool be developed to generate traffic data containing known attack patterns, based on a trace file containing normal traffic which addresses the challenge of limited availability of such data in the field of automotive security research? • What are the limitations and challenges of generating synthetic traffic data for CAN-based networks? • How can the attack profiles be improved to make detection challenging for a security mechanism? 1.2.3 Outline The subsequent chapters of this thesis are organized as follows. Chapter 1 introduces the research by providing the background and motivation, outlining the aim, scope, objectives, and research questions. Chapter 2 presents a comprehensive review of the literature, identifying gaps and areas for further investigation. In Chapter 3, the conceptual foundation is established, covering the Controller Area Network (CAN) architecture, CAN frames, security vulnerabilities, traffic analysis, pattern recogni- tion, and Intrusion Detection Systems (IDS). Chapter 4 focuses on the implemen- tation, providing an overview of the system, trace file analysis, attack generation techniques, and the evaluation framework. Chapter 5 presents the conducted ex- periments and their results, including comparisons of trace files, randomness of the output trace file, and performance metrics of the framework. Finally, Chapter 6 con- cludes the thesis, summarizing the findings, discussing limitations, and suggesting potential areas for future work. 4 2 Review of Literature In order to proceed with the project, relevant knowledge of various aspects of auto- motive systems was required. It was only after we familiarized ourselves with the know-how of autonomous communication protocols and standards that generating attacks became possible. The following ideas discussed in various scientific publi- cations proved useful in reaching the level of information sufficient to achieve the objective of this thesis. Below we first provide a literature review before discussing the most relevant work, compared to our own. 2.1 Literature Review Initially, a small set of papers were obtained from our supervisor at Bosch, which helped us understand the basic concepts of automotive systems and their security. Through the references of these papers, more related researches were explored that primarily focused on CAN architecture along with its vulnerabilities and the file formats associated with it. It was important to be able to understand the infor- mation stored in these files to be able to manipulate them. The various attacks that are prevalent in CAN were identified and how different security mechanisms work to detect these attacks revealed information that was important to identify the characteristics of each attack. To gain an idea of what CAN traffic looks like, vari- ous existing CAN datasets were explored that contained both normal and malicious data. Learning how to analyze CAN traffic to extract as much information as pos- sible help in finding different ways to introduce malicious traffic into benign traffic. Before proceeding with the implementation, looking at existing traffic generators for both TCP/IP networks and CAN gave us insights on how to develop the software. 2.1.1 CAN attacks and countermeasures After gaining enough understanding of the CAN bus, different known attacks were studied in order to gain insights on which attacks prevail in CAN buses and how they can be performed. Bozdal et al. [10] examine the security challenges associated with the CAN bus. The authors identify various security threats that CAN bus networks are vulnerable to, 5 2. Review of Literature such as message injection, replay, and denial-of-service attacks. They also discuss the limitations of current security mechanisms used in CAN bus networks. The pa- per emphasizes the importance of addressing these security challenges and proposes various approaches to improve the security of CAN bus networks, including intru- sion detection systems, protocols for secure authentication, and methods for data encryption. Overall, the paper highlights the need for robust security solutions for CAN bus networks to ensure the safety and reliability of critical systems. Jo and Choi in [11] present a survey of attacks on CAN. The authors discuss the various types of attacks that CAN networks are vulnerable to, such as message injection, replay, and denial-of-service (DoS) attacks. The paper reviews existing countermeasures used to prevent these attacks. These measures include message authentication, encryption, and even IDSs. The authors evaluate the effectiveness of these countermeasures and highlight their limitations. The authors suggest future research to improve the security of CAN networks. A related research carried out by Dibaei et al. in [12] covers a wide range of attacks that can be performed on intelligent connected vehicles and measures that can be used to mitigate those attacks. The paper presents attacks that can exploit the CAN bus, the vehicle’s ECU as well as the sensors and actuators. It also discusses the potential damage that these attacks can cause, such as risks to safety and fi- nancial damage. The paper then describes various defense mechanisms that can be employed to protect intelligent connected vehicles, including secure communication protocols, intrusion detection systems, and hardware-based security solutions. The implementation of attacks on the CAN bus is one of the most important parts of this thesis. Some research papers presented a few common CAN attacks and how they can be performed. The attack presented by Thirumavalavasethurayar and Ravi in [13] is called a replay attack. This research describes a method for simulating a replay attack on the CAN bus. The paper begins by explaining what a replay attack is and how it can be utilized by an adversary to compromise the security of a CAN bus. Then a detailed description of how the replay attack is implemented is discussed along with its different components like the attacker, the victim, and the replay device. The results of their simulation showed that the replay attack was successful in compromising the security of the CAN bus. 2.1.2 Security Mechanisms for CAN We also explored the different techniques used in various security mechanisms that focus on protecting the network by detecting malicious traffic. Laufenberg et al. in [14] have performed a research that presents an approach widely used in automotive and industrial systems for detecting attacks on CAN communi- cation. The proposed approach analyzes the CAN message structure and identifies suspicious patterns that can indicate the presence of an attack using a machine 6 2. Review of Literature learning-based classification method to distinguish normal traffic from malicious CAN messages. Another research presented by Lenard and Bolboaca in [15] proposes a stateful fire- wall and IDS for the CAN bus. The objective of these systems is to detect and prevent various security threats by inspecting and filtering CAN bus traffic. Addi- tionally, the authors propose a mechanism for storing the data and events generated by the firewall and IDS in a secure manner (secure logging). The paper lays empha- sis on the importance of secure logging in IDSs and suggests that future research could be done on improving the efficiency of the proposed system. Fürst and Bechter [16] talk about the use of AUTOSAR for connected and au- tonomous vehicles. The paper highlights the benefits of using AUTOSAR for con- nected and autonomous vehicles, including enhanced scalability, reliability, and se- curity. The authors also discuss the challenges of improving the capabilities of AUTOSAR in the case of such vehicles, like the need for advanced communication protocols. Overall, the paper suggests that AUTOSAR is a promising approach for developing software systems in connected and autonomous vehicles. 2.1.3 Existing CAN Traffic Datasets After developing techniques for analyzing the traffic, the next step was to see how the extracted information can be used to cleverly inject attack patterns into benign traffic. For this we turned to existing datasets of attack traffic, to get an idea of what normal and malicious CAN traffic looks like. Hollifield et al. in [17] provides a detailed list of existing datasets for CAN intru- sion detection along with their characteristics. It emphasizes on the importance of developing a standardized methodology for the evaluation of an IDS’s performance for CAN networks. The paper introduces a new dataset called "ROAD Dataset" for this purpose. The dataset includes both benign and malicious traffic patterns. The authors of this paper present an analysis of the contents of the dataset along with attack frequencies and noticeable patterns. Zago et al. in [18] evaluate a dataset of CAN messages for the purpose of reverse engineering. This dataset contains over 5,000 CAN messages captured from a real- world vehicular setting. Each message in this ReCAN dataset is unique. The paper describes how the dataset was put together and its characteristics like the distri- bution of different types of messages and the frequency in which they occur. The dataset helps reverse engineer CAN messages to message signals and types along with decoding the data in the payloads. Sharafaldin et al. in [19] focus on the development of a new data set for IDS and highlights the importance of such diverse and representative data sets that can be used to train IDS in an effective manner. It highlights the shortcomings of existing datasets for TCP/IP networks and discusses a unique methodology for generating 7 2. Review of Literature a new dataset that is more realistic. The process of collecting network traffic data from various sources and preprocessing it is explained in the paper. The authors of the paper use an algorithm to generate synthetic malicious traffic by combining real network traffic with simulated attacks in an attempt to capture real-world attacks in their datasets. The dataset is evaluated by comparing the performance enhancements of IDS when using this dataset against existing datasets. 2.1.4 Analysing CAN Traffic A crucial part of the project was to analyze the traffic of the CAN bus to find patterns and extract as much information about the traffic and its characteristics as possible and the following researches helped gain insights about the same. Ezeobi et al. in [20] explore unsupervised machine learning techniques to analyze and understand the meaning and function of CAN signals in the payload. In order to identify signal patterns in the data, clustering algorithms are used to create groups of similar messages. Anomaly detection is used to identify system malfunctions or harmful traffic. The paper shows unsupervised machine learning as a useful tool for reverse engineering CAN messages to establish signal boundaries and identify those signals. A similar study was presented by [21] which presented A Modular Four-Step Pipeline for Comprehensively Decoding Controller Area Network Data called CAN-D. The pipeline mentioned in this paper includes analysis in four modules: the physical layer, message layer, signal layer, and application layer analysis. Each module has a unique job like identifying the type of CAN messages or decoding the signals in the payload. This helps users gain valuable insights into the functioning of the vehicle’s ECUs. Verma et al. in [22] propose an approach for tokenizing and translating CAN mes- sages in vehicles. This approach is called Automotive CAN tokenization and Trans- lation (ACTT), which makes use of a machine learning algorithm to divide the CAN messages into tokens in an attempt to translate them into a human-readable format. This study describes how ACTT can be implemented and evaluates its per- formance against a dataset of CAN traffic. According to the results, ACTT shows high accuracy and effectiveness in translating CAN messages in real time. Young et al. in [23] propose a machine-learning-based solution to reverse engineer- ing of CAN messages. The process is described in four steps including data pre- processing, feature extraction, model selection, and performance evaluation. Then the models obtained by this process are evaluated using a dataset of CAN messages. The results reveal that this approach is effective in decoding CAN messages and per- forms better than existing methods in terms of accuracy and processing speed. This approach has the potential to enhance the accuracy and efficiency of the analysis of CAN messages. 8 2. Review of Literature Lestyan et al. in [24] gives a way of re-identifying drivers based on the patterns of their driving with the help of sensor data extracted from CAN bus logs. This method includes feature extraction, dimensionality reduction, and clustering of the sensor signals. The authors test the effectiveness of their proposed method in its ability to accurately re-identifying drivers from a dataset of real-world CAN logs. 2.1.5 Traffic Generators for TCP/IP To gain more insights about how to proceed with developing our own framework, we explored various existing work that was similar to what we intend to achieve in this thesis. These researches are performed for different protocols. Puketza et al. in [25] discuss a software platform developed for testing IDS. The plat- form tests IDS in a more controlled and realistic manner, focusing on the evaluation of the IDS’s performance against different attacks. The paper describes the design and architecture of the software, which has a traffic generator to produce realistic network traffic, an attack generator for generating a variety of distinct attacks, and a data collector for analyzing the IDS’s output. The authors discuss the challenges associated with testing of IDS, such as the shortage of standardized datasets and the problem of reproducing real-world attacks. Behal and Kumar in [26] covers a wide variety of DDoS attack tools, some popular ones are LOIC (Low Orbit Ion Cannon), HOIC (High Orbit Ion Cannon), and XerXes. Other more sophisticated tools like Slowloris and RUDY are also covered in this study. These tools are compared by the authors based on different criteria like supported attack vectors, the intensity of attacks, stealthiness, and ease of use. It analyzes tools like hping, TCPReplay, and Iperf, which are traffic generators used to simulate DDoS attacks for testing and evaluation purposes, and their ability to generate realistic attack traffic patterns. A study that addresses the objectives of this thesis was performed by Erlacher and Dressler in [27]. The paper presents the design and implementation of GENESIDS, along with its evaluation using several IDS. The system automates the process of generating attacks by utilizing an algorithm, which can generate new attacks by combining existing ones. The authors highlight the importance of testing IDS to detect security threats. The results show that the system is capable of generating a wide range of attacks for testing IDSs. 2.1.6 Traffic Generators for CAN After gaining insights on what attack traffic generators look like for traditional TCP/IP networks, we explored similar existing work done by researchers for CAN. A related tool that also generates attack traffic is presented by Huang et al. in [28]. The paper presents a tool called ATG (Attack Traffic Generation) that is designed for security testing of the CAN bus. The tool is developed to generate attack traffic to evaluate the security of CAN bus systems specifically found inside vehicles. The 9 2. Review of Literature authors discuss the limitations of current security testing tools and propose ATG as a solution that can generate various attack scenarios. Yang et al. in [29] focused on the development of a penetration testing platform that can be used to evaluate the security of embedded systems that make use of the CAN bus. The platform is designed to provide a comprehensive and automated testing process that includes various attack scenarios, such as message injection, replay, and flooding. The study highlights the importance of penetration testing platforms that can effectively evaluate their security. The review of different literature presented different ideas that could be implemented in this thesis and aided in defining the scope of this thesis. The information gained from these articles helped in devising an approach and finding resources that proved critical to the advancement of the project. 2.2 Related Work The CAN bus is the most commonly used solution in in-vehicle networks. Hence, several platforms for various CAN bus testing hardware and software implementa- tions are already maintained by several commercial vendors. For example, CANoe is a proprietary test software developed by Vector Informatik which is used for de- veloping, testing, and analyzing electronic systems inside of vehicles. It provides a simulation environment that supports multiple bus systems and protocols like Lo- cal Interconnect Network (LIN), FlexRay, ethernet, etc. Through CANoe, one can simulate ECUs within a network and analyze them individually to detect anomalies in communication. It comes with tools for monitoring and analyzing communica- tion on the CAN bus and allows users to create complex test scenarios. These test scenarios can be automated and the results can be analyzed. Another tool, closely related to CANoe is CANalyser, also developed by Vector Informatik. It provides similar functionality as CANoe does with the added capabilities of logging and the ease of integration with other tools. UDSim (Unified diagnostic services simulator) is another such tool developed by Vector Informatik. This tool was later extended to work on vehicular networks as well. There are a number of tools similar to the ones discussed above that have the same functionality and prove extremely useful when it comes to simulating and analyzing in-vehicular networks and systems. BusMaster is another popular lightweight open- source tool released by BOSCH. These tools focus on testing the functionality of the system but have little to no provision for security testing. The focus of this thesis is on generating and injecting attack traffic into the CAN traffic. There are several research works done on open-source software packages which work with cheap, commonly used hardware configurations. Tools like SavyCAN help record messages in different file formats and have provisions for critically analyzing the data payload of CAN messages which can be later represented visually in the form of plots. Hounsinou et al. in [30] developed a CAN network analysis tool 10 2. Review of Literature called CarShark that helps decode and distinguish control messages and provides means for visualizing these messages. SavyCAN allows message filtering based on specific criteria that allow users to focus on relevant data. OCTANE (Open Car Testbed and Network Experiments) is a platform that aims to provide an open and collaborative environment for developers and researchers in the industry. It provides a testbed infrastructure that mimics real-life automotive systems. It allows for its researchers to study and evaluate a variety of automotive network protocols and communication technologies. There are several attack tools that work over traditional TCP/IP networks. Hping, a command-line tool, generates and transmits customized network packets which can be utilized to create various malicious attacks that can lead to Denial of Service (DoS). A popular framework called Metasploit, which is used for vulnerability assess- ments and penetration testing, consists of modules that can generate known attacks for different systems from a wide range of attack vectors. Low orbit Ion Cannon (LOIC) is a network stress testing tool that generates high volumes of TCP, UDP and HTTP flood attacks. This tool is misused to perform Distributed DoS attacks (DDoS). Scapy is a python-based interactive packet manipulation tool. It is a power- ful tool that firsts sniff packets before manipulating them to perform reconnaissance. Similar to Scapy, Impacket is a python library that provides limited capabilities to the user for packet manipulation. Tor’s Hammer is a python-based DoS testing tool that works through the TOR network. It uses random source IP addresses to make it difficult to trace back the attacks to the source. Behal and Kumar in [26] cover many such tools and draw a comparison highlighting the capabilities of each tool. The above tools provide a means for attacking systems. These tools are attack tools that are different from traffic generators (what we aim to develop in this thesis). Some traffic generator tools for TCP/IP networks are discussed below. ByteBlower is a tool developed by Excentis for testing the performance of a network. The primary aim of this tool is to perform testing of the devices connected in the network bu generating different types of traffic like UDP and TCP with varying payload sizes and rates. The generated synthetic traffic can simulate VoIP, web browsing and even video streaming. Another tool, called Geist traffic generator (GTG) developed by Geist Technologies is capable of mimicking the traffic patterns of various applications. It supports the testing and analysis of network protocols like IPv4, IPv6, UDP, ICMP, TCP and many more. GTG has features that enable analysis of network characteristics and generating reports of its analysis. Harpoon is a vulnerability scanner tool that scans the network for common vulnerabilities and exposures and any other known security problems. Along with these, the tools that were initially developed for traditional TCP/IP networks are now being extended to work on CAN networks as well, for example, UDSim. Many python Libraries were developed to support development for CAN. Python-CAN is a widely used library that provides a simple mechanism for sending and receiving CAN messages. This library supports various CAN hardware inter- 11 2. Review of Literature faces like socketCAN and CANtact as discussed above. Other tools like CANard, pyvit and canmatrix make it easy for developers to interact with CAN networks to develop applications. CANtools is a library that helps work with CAN databases (DBC files). It gives its users the ability to parse database files in different for- mats. These database files contain information about decoding CAN messages into human-readable format. Sharafaldin et al. in [19] work on creating a single dataset that eliminates the shortcomings of older datasets which proved to be outdated and unreliable. This new dataset contains seven new and common attack patterns which are introduced into benign data by simulating them in order to meet real world criteria. This dataset is developed for TCP/IP networks. This work differs from the software tool developed in this thesis as it does not focus on CAN networks. Moreover, the software tool developed in this thesis generates a completely different output log or dataset on every iteration. There are a number of tools that have been developed to generate attack traffic specifically for in-vehicular networks. They have similar capabilities as the tools discussed above but were developed with CAN in mind. CANard is an open-source Python-based framework designed for CAN. It has the ability to send and receive CAN messages through hardware interfaces and can manipulate these messages as desired. This tool can generate custom CAN traffic and mimic different scenarios which makes it extremely useful for testing and developing CAN-based environments. SocketCAN is another such tool that have similar capabilities but additionally is equipped with mechanisms for message filtering and routing. The application can use CAN identifiers and other fields of the frame to set filters to gather messages of high relevance. CANtact is a tool developed by Erik Evenchick that has gained popularity among developers. It provides a USB-to-CAN interface that connects to the USB port to interface with CAN networks. Additionally, it enables its users to capture CAN traffic and analyze the traffic which proves extremely useful for debugging and security analysis of CAN networks. Apart from existing tools and libraries, there are several research projects that aim to address similar problems as discussed in this thesis. One such research was performed by Huang et al. in [28]. They work on developing an Attack Traffic Generation Tool (ATG). ATG provides a free and functional toolkit for automotive security researchers for easy and effective interaction with real or simulated CAN buses. This tool generates attack traffic for the evaluation of security mechanisms developed for CAN systems. The authors compare their work with other developed tools and highlight their strengths and weaknesses. The attack generation capabili- ties of ATG differ from that of the software tool discussed in this thesis. ATG takes a set of attacks and injects them into a log file which is then sent to the CAN bus in real time. Each attack is configured before hand and a fixed number of such attacks are injected into the log files in fixed intervals of time. The software tool developed in this thesis is capable of analysing the CAN log files and based on it’s analysis, a variable number of malicious attacks are injected at varying intervals of time. 12 2. Review of Literature Another study is performed by Erlacher and Dressler in [27]. The tool developed by the authors is called GENESIDS that automatically generates user defined HTTP attacks. This allows network traces to be created in a straightforward manner. The tool depends on the rules present in Snort (NIDS) to generate attacks that would trigger the corresponding rules. This tool differs from the software tool developed in this thesis as it is not designed for automotive networks. Additionally, the tool developed in this thesis does not rely on rules to generate attack traffic, instead it uses information obtained by analysing CAN logs to generate attack traffic. A similar study by Palanca et al. [31] explores a specific type of cyber-attack on vehicular networks that operates on the link layer which is responsible for the com- munication between devices connected in the network. This attack is shown to be stealthy as it is designed to be difficult to detect and trace back to the source. It severs communication between ECUs and provides a detailed explanation of how it does so along with some countermeasures against this attack. This attack se- lectively targets ECUs, which makes it all the more dangerous as it can disrupt communication between ECUs performing critical tasks which can cause a lot of damage. Radu and Andreea-Ina in [32] focus on the implementation of security measures in order to protect the in-vehicle network from unauthorized access, potential attacks, and data breaches. The paper discusses different layers of security, including the architecture of the network used, authentication mechanisms, and intrusion detec- tion systems. In any security-related research, the objective is to ensure the safety and integrity of communication within the in-vehicle network. The paper dives into different security techniques and best practices that can be adopted to safeguard any system against cyber threats, such as firewalls, IDS, secure boot processes, access control mechanisms, and secure update mechanisms for ECUs. There are many other pieces of research that have not been mentioned in this section that have contributed to the field of automotive security. All these studies have been a source of motivation to proceed and find answers to the problems discussed in this thesis. In order to know how the framework was developed the right knowledge of some concepts is required which are presented in the following chapter. 13 2. Review of Literature 14 3 Conceptual Foundation 3.1 Controller Area Network (CAN) The Controller Area Network was an idea birthed by engineers at the Robert Bosch GmbH in Germany in February of 1986. Their aim was to devise a system that would enable communication between multiple ECUs in vehicles. Since then, the protocol has found its way into every mode of transport from cars to trains to even ships. Every modern automotive system has at least one CAN network installed. CAN is a bus-based protocol that has proved to be a very reliable communication protocol for communication between vehicles and their surroundings [10]. The CAN bus uses serial communication which reduces the number of wires inside the system. Even though this was not the main intention when developing this protocol, it proved to be a useful by-product of the protocol. The use of multiple processors improves the performance of the system. The decrease in the cost of micro- controller chips at that time made a multi-processor architecture in a single system feasible. CAN was originally devised with automotive systems in mind but today it has found its way into all the fields where inter-microprocessor communication is required. 3.1.1 CAN Architecture CAN is a 2 wire half-duplex high-speed network system that is suitable for High- speed real-time applications as it has low memory and CPU requirements. It also provides collision detection and prevention as it makes use of CSMA/CD. Every ECU must wait for a specific period of inactivity on the bus before a message can be transmitted by the same ECU. Every message is given a unique identifier which also contains the priority of the message, which is used to resolve any collisions that might occur. Lower values in the identifier indicate higher-priority messages. The implementation of the CAN bus is straightforward. It bypasses 4 layers of the OSI model (presentation to the Network layer). By doing so it saves memory resources and gains performance. CAN operate both on a datalink and physical layer. 15 3. Conceptual Foundation Figure 3.1: CAN architecture as compared to the OSI model, taken from [33] Every entity participating in communication over the CAN bus is called a node. A CAN node is a functional ECU that participates in the CAN network. Each node can send/receive a different number of messages onto the bus. The frequency of sending and receiving differs greatly as well. To participate in CAN communication every ECU should have a CAN interface that comprises a CAN controller and transceiver. The CAN Controller is responsible for processing information to and from the CAN bus. The CAN bus is the physical transmission medium that links all the participating nodes via a CAN interface. The transceiver connects the controller to the physical transmission medium. In order to transmit signals onto the bus, the transceiver is equipped with 2 bus pins, the CAN (High) and CAN (Low) pins. In order to signal the logical 0 and 1 signals, the transceiver creates a differential voltage i.e., the difference in voltage between CAN (High) and CAN (Low). The dominant bit (Logical 0) is assigned when a differential voltage of 2 V is observed and the recessive bit (Logical 1) is assigned when a differential voltage of 0 V is seen. The transmission of CAN messages does not follow any time sequence but rather is event-driven. The communication channel is only busy when there is a message to be transmitted. This makes access to the bus quick. Every CAN message on the bus can be received by any connected CAN node (broadcasting) if the message can be identified using a valid unique message identifier. A node can choose to either accept or reject a CAN message based on the relevance of the message to that node. 3.1.2 CAN Frames Messages in CAN are enclosed in a frame. The frame consists of various fields that facilitate the successful transmission of data over the bus. Figure 3.2 shows the format of CAN frames. 16 3. Conceptual Foundation Figure 3.2: (a) standard CAN frame, (b) extended CAN frame, taken from [34] • SOF: start of frame marks the start of a message. • Identifier: 11 bits, establishes priority of each message. Lower the binary value, higher the priority. • RTR: 1-bit, Remote transmission request, set to 0 when information required from another node. • IDE: 1-bit, identifier extension. 0 means standard CAN msg with no exten- sions transmitted. • r0: Reserved bit. • DLC: 4 bits, Data length code. Number of bytes being transmitted. • DATA: 64 bits data. • CRC: 16 bits, checksum for error detection. • ACK: Originally contains 1, if the message is valid and error-free all recipients overwrite this to 0. If not, all nodes overwrite this bit, then the message is discarded, and the sender re-sends the message after corrections. All nodes acknowledge the integrity of message. • EOF: 7 bits, end of CAN data frame. • IFS: 7 bits, stores time required by controller to move a correctly received frame to its proper position in message buffer area. An extended CAN message has all the above fields in the data frame along with the following additional ones: • SRR: Substitute remote request. 0 means standard CAN message and 1 means extended CAN message. 17 3. Conceptual Foundation • IDE: 1 in IDE implies there are more identifier bits to follow. • r1: additional reserve bit. The ID in the extended format of CAN has 29 bits as compared to the standard 11 bits. If the last bit of EOF is 1 then the message is error free and the frame is valid. If 0 is the last bit in the EOF field then the transmission of the message is repeated. In the CAN protocol, there are 4 different types of frames that can be transmitted. • Data Frame: Initiates, and maintains communication between partners. Trans- mits and protects user data, and establishes communication relationships de- fined in the communication matrix. The RTR bit is 0. • Remote frame: Request for transmission of data from another node. Like data frames except do not carry any data. The RTR bit is set to 1. Not used often. Lacks the data field but otherwise the frame format is the same. CAN controllers respond to a remote frame by sending the desired data frame. • Error Frame: Violates formatting rules of CAN messages as it is a special message. Transmitted when an error in a message is encountered in which case all other nodes are also forced to transmit the error frame. The original transmitter then transmits the re-transmit message. A node cannot tie up a bus by repeatedly transmitting error frames. • Overload Frame: Special message that is transmitted when a node becomes too busy. An extra delay of messages is then implemented. 3.1.3 CAN Security and Vulnerabilities The increasing connectivity of vehicles has exposed Controller Area Network (CAN) networks to various security challenges. While CAN was originally designed with a focus on reliability and efficiency, it lacks built-in security mechanisms. CAN networks are vulnerable to a range of attacks that can compromise the confidentiality, integrity, and availability of communication. This makes it susceptible to a range of security vulnerabilities and attacks [35]. One of the primary vulnerabilities in CAN networks is message injection. An at- tacker can inject malicious messages into the network, either by compromising a legitimate ECU or by gaining unauthorized access to the network. An attacker can also impersonate a legitimate ECU by sending messages with a spoofed source identifier. This can deceive other ECUs into accepting and executing malicious com- mands, potentially compromising the integrity and safety of the vehicle’s operations [35]. An attacker can intercept valid CAN messages and replay them at a later time. This 18 3. Conceptual Foundation can result in the re-execution of previously executed commands, leading to unwanted or dangerous actions by the ECUs [35]. Denial of Service (DoS) attacks pose yet another threat to CAN networks. By flooding the network with excessive messages or by targeting specific ECUs, an attacker can disrupt the normal functioning of the network. This can result in the loss of critical information or the unavailability of vital vehicle functionalities [35]. Understanding these vulnerabilities is essential for developing effective security mechanisms and intrusion detection systems. 3.2 Traffic Analysis and Pattern Recognition Traffic analysis and pattern recognition techniques play a crucial role in identifying anomalies and detecting attacks in Controller Area Network (CAN) networks. By analyzing the parameters and contents of CAN messages, these techniques enable the identification of patterns indicative of normal or malicious activities [36]. One aspect of traffic analysis is the examination of message parameters, such as the identifier, data length code, and data field of CAN messages. Analyzing the identifier can reveal patterns in the message sources or identify messages associated with specific functionalities or ECUs within the network [36]. Analyzing DLC distribution can help identify abnormal message sizes. Analyzing data field contents can provide insights into the nature of the messages and the information being communicated [36]. Furthermore, analyzing the contents of the data field can involve examining specific data bytes or bit patterns within the payload. CAN messages often carry critical information, such as sensor data, vehicle status, or control commands [37]. By ana- lyzing the content of these messages, patterns associated with normal or malicious behaviors can be identified. For example, abnormal values or ranges in sensor data can indicate sensor tampering or spoofing attacks. Similarly, the presence of spe- cific bit patterns or command sequences can be indicative of unauthorized access attempts or malicious commands [37]. Pattern recognition techniques, including machine learning algorithms, can be ap- plied to analyze the parameters and contents of CAN messages [38]. These algo- rithms can learn from labeled data to identify patterns associated with normal or malicious behavior. By training models on a dataset containing both normal and attack traffic, the algorithms can learn to distinguish between them and detect new or previously unseen attack patterns. Machine learning algorithms such as Support Vector Machines (SVM), Random Forests, and Neural Networks have been applied to analyze CAN message parameters and contents with good results [38]. Combining traffic analysis with pattern recognition techniques allows for a com- prehensive understanding of the CAN network’s behavior and the ability to detect anomalies and attacks in real time. By examining message parameters and contents, researchers and practitioners can gain insights into the normal operation of the network, identify deviations from expected behavior, and detect potential security 19 3. Conceptual Foundation threats. 3.3 Intrusion Detection Systems (IDS) Intrusion Detection Systems play a vital role in safeguarding CAN networks by monitoring network traffic and detecting suspicious activities or attacks. There are two main categories of Intrusion Detection systems that are used in CAN networks: IDS and IPS. An Intrusion Detection System (IDS) in the context of CAN refers to a security mechanism that monitors the network for any suspicious or malicious activities. Its primary purpose is to identify potential intrusions or attacks targeting the CAN bus. The IDS analyzes network traffic, examines CAN message payloads, and compares them against predefined patterns or known attack signatures to detect any anomalies or deviations from expected behavior. When an intrusion is detected, the IDS can generate alerts or take preventive actions to mitigate the potential impact of the attack [39]. On the other hand, an Intrusion Prevention System (IPS) goes beyond detection and aims to actively prevent or block intrusions in real time. It operates in a similar manner as an IDS by monitoring network traffic and analyzing message payloads. However, an IPS is equipped with the ability to take immediate action to prevent or mitigate potential threats. It can automatically generate and deploy security policies or rules to block malicious messages or suspicious activities. By actively preventing intrusions, an IPS enhances the overall security posture of the CAN network [40]. IDSs in CAN networks employ various techniques for traffic monitoring and anal- ysis. These include deep packet inspection, statistical analysis, machine learning algorithms, and rule-based systems. Deep packet inspection allows for the detailed analysis of CAN messages, examining their content and identifying any suspicious patterns or anomalies [40]. Machine learning algorithms can be employed to train these models using labeled or unlabeled data, enabling the detection of unknown or emerging attacks. Rule-based systems utilize predefined rules or conditions to identify specific types of attacks based on their characteristic features [40]. By deploying IDSs in CAN networks, organizations can enhance the security of their vehicle systems, detect and respond to potential intrusions, and protect the integrity and safety of the vehicles and their occupants [39]. 3.4 Attack Traffic Generation To analyze the security posture of CAN networks and gauge the efficacy of intru- sion detection systems, realistic and representative attack traffic must be generated. There are numerous methods for producing attack traffic [41]. To mimic particular types of attacks, attack templates offer predetermined attack patterns or scenar- 20 3. Conceptual Foundation ios. Researchers can create traffic that simulates actual attack scenarios using these templates, which capture the traits and behaviors of known attacks. The addition of anomalies, changing packet parameters, or introducing differences in timing and payload content are examples of various methods that can be used to alter regular traffic patterns, one of which we’ll be using in our thesis as well [41]. Using this method, researchers may simulate complex and nuanced attack changes to assess how resistant IDSs are. Furthermore, by using machine learning algorithms to study and imitate well-known attack patterns, attack traffic that closely mimics actual attacks can be produced [41]. Researchers can create traffic that displays similar traits and behavior as seen in actual attacks by employing machine learning models trained on historical attack data [4]. Generating accurate and representa- tive attack traffic is crucial for evaluating the detection capabilities of IDSs and improving the overall security of CAN networks [41]. 3.4.1 Characteristics of Attack Data Understanding attack data characteristics is critical for generating realistic and rep- resentative attack traffic. While it is challenging to guarantee an exact replication of actual attack situations, certain factors of attack data, such as packet size, tim- ing, payload content, and traffic patterns, are taken into account to generate attack traffic that closely resembles real-world scenarios. These features can be better un- derstood by analyzing previous attack data, which allows the building of models that are suitable for producing attack traffic [42]. Studying the statistical properties of attack packet sizes, for example, can be helpful in determining the size distribution that should be replicated in generated traffic. Analyzing the historical trends of attack traffic, on the other hand, can help generate traffic with realistic timing characteristics [42]. The created attack traffic can closely match actual attack scenarios by capturing the nuanced aspects of attack data, resulting in more accurate and appropriate evaluations of IDS and CAN network security [42]. 3.4.2 Experimental Evaluation and Validation Controlled experiments and precise validation methodologies are required to evalu- ate the efficacy of attack traffic creation techniques and intrusion detection systems. In experimental evaluations, synthetic attack traffic is created using a variety of methodologies, and the effectiveness of IDS in identifying and preventing such at- tacks is evaluated [43]. The effectiveness of IDS and security measures is evaluated by performance assessment metrics like detection accuracy, false positive rates, and reaction time. These measurements also help validate the created attack patterns. For reliable and accurate evaluations, adequate experimental design, including relevant datasets and plausible attack scenarios, is essential [43]. The robustness and reliability of the 21 3. Conceptual Foundation experimental evaluation procedure are also increased by comparing it against current intrusion detection systems and comparing results with approved requirements [43]. 3.4.3 Trace Files Trace files help record communication that takes place over the CAN bus and store the streams of bytes in human-understandable formats. There are several trace file formats that are of high relevance to all automotive systems. In this research, 3 types of trace file formats were observed: DBC (Database for CAN) is a standardized file format developed by Vector In- formatik GmbH for interpreting communication data in Controller Area Network (CAN) networks. It serves as a protocol specification, converting raw CAN bus data into understandable values by extracting signals from bytes. DBC files are widely used for configuring ECUs, analyzing CAN bus traffic, and developing software ap- plications. They define message identifiers, signal definitions, scaling factors, and message encoding rules. DBC files are proprietary and specific to each manufacturer or tool provider. ARXML (AUTOSAR XML) is another standardized file format used in CAN net- works, particularly in the context of AUTOSAR (AUTOmotive open System AR- chitecture) systems. It represents the configuration of an ECU and contains de- tailed information about message structures, signal definitions, and parameter val- ues. ARXML files are crucial for integrating and configuring AUTOSAR-based systems, allowing the exchange of ECU configurations between different tools and platforms. They follow the AUTOSAR architecture, ensuring compatibility and interoperability across various automotive systems. ASCII (American Standard Code for Information Interchange) files are a text-based file format commonly used for storing CAN network communication data. Unlike DBC and ARXML files, ASCII files do not provide detailed message and signal definitions. Instead, they present CAN data as a sequential list of messages with timestamps, making them human-readable. ASCII files capture the raw streams of CAN messages, including message identifiers, data payloads, and timestamps. As can be seen in Figure 3.3, the second column displays the timestamps, followed by the bus number and message identifiers in the subsequent columns. These identifiers hold the priority assigned to each CAN message. The last columns represent the payload length in bytes, and the message payload is shown in the final column in hexadecimal format. Figure 3.3: Snippet of an ASCII format tracefile 22 3. Conceptual Foundation While they lack the decoding rules and structure provided by DBC and ARXML files, ASCII files are useful for logging and analyzing CAN bus traffic. They offer a convenient way to examine the sequence of messages, identify anomalies, and gain insights into the behavior of the CAN network. There are a lot of open-source tools that hackers use to analyze captured CAN traffic. CANalyser, CANoe, and Raptor-CAN are some popular ones. The captured data contains similar information about the CAN network as that of an ASCII file which means an attacker would only have access to messages as they appear in an .asc file, another reason why ASCII trace files were considered for this research. 3.5 Attack Models This thesis focuses on the study of the following selected attacks that are prevalent on the CAN bus. Some of these attacks were introduced in the form of malicious traffic into trace files for the purpose of simulating attack traffic and testing the reaction of the Intrusion Detection Systems toward these attacks. In section 3.5.1 we introduce we introduce our envisioned attacker. In the following subsections, we then first describe the general attack, followed by how our attacker would use it. 3.5.1 Attacker Approach For the purpose of this thesis, let us consider an adversary named BhanuPratap that is trying to exploit an automotive vehicle system in order to maximize the damage caused. BhanuPratap is a skilled adversary that understands the architecture of the CAN. The adversary has the ability to sniff the data frames being transmitted over the bus and the tools to translate the messages into an understandable format. By observing these messages, the attacker can understand the values set in the data frame and can interpret signal values that are critical in determining the state of the automotive system. BhanuPratap understands the impact that changing these signal and data frame values will have on the system. There are other ways to access raw CAN data and make sense of the information in the messages. ARXML and DBC files give a detailed description of the data present in the messages along with the different signal values present in them. These files contain information on how one can convert raw CAN bus data into human-readable values. However, it’s unlikely for an adversary to have access to these files as they are proprietary, and hence the adversary’s access is limited to only the raw information. 23 3. Conceptual Foundation Table 3.1: List of studied attacks Sr.No. Attack Name 1 Replay Attack 2 Denial Of Service (DoS) 3 Spoofing Attack 4 Fuzzy Attack 5 Flooding Attack 6 Isolation Attack 7 Overwrite Attack 3.5.2 The Replay Attack In the case of a traditional TCP network, a replay attack is defined as a network attack where valid transmissions of data are captured and repeated or delayed with malicious intent. It is performed by an adversary by intercepting data and re- transmitting it at a later instance in time. This attack is often a part of Spoofing attacks where an adversary tries to trick the system into believing that the attacker is a trusted entity in order to gain illegitimate access. CAN is a message-based protocol with no provision for addressing the nodes. This makes transmission of malicious messages easier and furthermore, the identification of the source of these malicious messages becomes difficult. Replay attacks can occur over secure systems that use encryption, and this makes them all the more dangerous and a good choice for adversaries like BhanuPratap. In order to prevent replay attacks, timestamps, and unique sequence numbers can be used. 3.5.3 Denial of Service The aim of the DoS attack is to make the system resources unavailable to its intended users. This is done by disrupting the services of a host connected to the network either temporarily or indefinitely. Practically, this is achieved by flooding the target machine with trivial requests to an extent where it is incapable of handling or accepting new requests. By doing so the system becomes incapable of processing requests coming from legitimate users. In the case where an adversary has control over multiple systems, they can send requests to the victim through different sources which produces the same effect but is a faster way of attacking the victim. This type of attack is called Distributed Denial of Service or DDoS. A CAN node broadcasts its message over the bus so every node receives the message sent by every other node. Broadcasting a high volume of messages causes the bus to overload. Collisions are possible over the CAN bus. In order to detect and resolve collisions, CAN uses message identifiers to assign priorities to every message. The lower the value of the identifier, the higher the priority of the message. In case of a collision, the lower-priority messages are simply discarded. An adversary like 24 3. Conceptual Foundation BhanuPratap can use this to their advantage to craft messages with small identifier values in order to produce a Denial of Service attack. 3.5.4 Spoofing Attack The aim of spoofing is for an adversary to masquerade as a trusted entity. The aim of such attacks is to hide the adversary’s identity while exploiting resources the victim has access to for personal gain. Spoofing can be done in a plethora of ways, by sending fake emails and text messages, IP spoofing, etc. Man in the middle is also a variant of spoofing attack that involves three parties, the user, the server, and the adversary. The adversary makes an individual connection with the server and the user and relays the messages between them, tricking them into thinking that the user and the server are communicating over a secure channel where in reality the attacker can listen to the entire conversation. Since CAN provides no means of authentication, i.e. there is no mechanism to determine which message comes from which CAN node, this type of attack becomes easier to perform. BhanuPratap can compromise an ECU and send data frames with a modified ID field with malicious intent in order to achieve a desired effect or cause damage to the system. 3.5.5 Fuzzy Attack Fuzzy attacks are used to gather information about the system in an attempt to find an entry point to exploit. Fuzzing attacks are performed with the aim of applying pressure to a system causing it to behave unexpectedly. In practice, fuzzing involves feeding the system invalid or random data as input and observing the changes in the state of the system caused by these inputs in an attempt to find a vulnerability. This attack can also be used by professionals to assess the security of a system by identifying vulnerabilities in systems and reporting them which in turn can help improve the security of the system. Fuzzing points out the vulnerabilities and shows how an adversary can interact with it to exploit the system. It also demonstrates the impact of fixing the found vulnerabilities on the security of the system. In an automotive system, an attacker like BhanuPratap focuses mainly on randomiz- ing the IDs and data of every message that they inject. Consider a fuzzer, which is a program that generates messages with randomized data and ID repeatedly. The at- tacker observes the system for every such injected packet. If the system proceeds to exhibit normal behavior the randomized fields do not seem interesting. However, if the system behaves unexpectedly due to the injected message, then the randomized field values are saved and injected again to monitor the changes closely. 25 3. Conceptual Foundation 3.5.6 Flooding Attack Flooding attacks are a type of DoS attacks which aim at overwhelming a network or a server with a large number of requests or traffic. By sending in traffic in such high volumes, the attacker makes the server or network exceed the limits of processing traffic rendering it incapable of handling any more traffic. By doing so, legitimate users become unable to access the services the network or server is supposed to provide. A DoS attack can take many forms and a flooding attack is one of them. The aim of most DoS attacks is to exhaust the resources of a server so that legitimate users cannot access them. The way flooding attacks achieve the same is through sheer traffic volume. All flooding attacks are DoS attacks but not all DoS attacks are flooding attacks. Furthermore, flooding attacks can be of various types like UDP, Ping, SYN, and HTTP flood where a large volume of their respective request types are generated and sent to the system. In the case of CAN, BhanuPratap sends a large number of messages to the CAN bus, more than what the network can handle resulting in a DoS attack. The impact of a CAN flooding can be catastrophic, as it can lead to the loss of vehicle control or the disruption of critical industrial processes. This can cause the system to malfunction and can even cause safety hazards which may lead to loss of life. 3.5.7 Isolation Attack In an isolation attack, an attacker gains access to the victim and then attempts to cut off its communication from the rest of the network. There are various methods in which an adversary can perform this attack, like, by disabling network interfaces or creating rules in the firewall that prevent incoming and outgoing traffic to and from the victim. The objective of an isolation attack is to isolate the victim from other devices and systems, increasing the difficulty of detecting and responding to the attack. In the case of CAN, this attack aims to isolate one or more ECUs from the rest of the network, thereby disrupting the functionality of the entire system. BhanuPratap has various means of performing this attack on the target system. Once an ECU is isolated, BhanuPratap can cause a lot of damage by modifying its firmware or configuration. Since, there might be some ECUs that rely on signals coming from other ECUs, an isolation attack can have serious consequences. Isolating an ECU that sends out signals which other ECUs rely on can cause the entire system to fail. 3.5.8 Overwrite Attack An overwrite attack is a type of attack where an attacker maliciously modifies the content of a CAN message to overwrite or manipulate critical data within the mes- sage [44]. This attack aims to disrupt the normal operation of the CAN network 26 3. Conceptual Foundation and can have severe consequences in automotive systems. Each message has a fixed data field size, which contains the payload or information being communicated. The way this attack is performed in the case of in-vehicular networks is by creating a malicious message with the same parameters as the targeted message. The data field of the malicious message is altered. This malicious message is sent right before the targeted message, so the system ac- cepts the malicious message and adopts the signal values of the malicious message instead of the legitimate message causing the system to exhibit a different behavior than expected. The impact of an overwrite attack in a CAN network depends on the nature of the modified data. For example, in a vehicle system, an attacker like BhanuPratap could modify the sensor data being transmitted over the network, lead- ing to inaccurate or misleading information being processed by the receiving nodes or an attacker could overwrite a message that contains signals for the speedometer reading of a car, causing it to display incorrect speed information to the driver which could lead to over speeding or excessive breaking. Overwrite attacks can potentially result in improper vehicle operation, compromised safety, or unauthorized control of certain vehicle functionalities. 27 3. Conceptual Foundation 28 4 Design and Implementation . This section of the report discusses the design and implementation, as well as the specific methods used in the process of developing the final framework. Additionally, ways of evaluating the framework’s performance along with the results it produces will be elaborated on in a step-by-step manner. 4.1 System Overview In order to understand the working of the framework, a high-level description of its different components is essential. The objective of this thesis is to develop an attack traffic generator that introduces attack traffic into the available trace files. Generate Attacks according to Patterns Attack 1 Attack 2 Attack 3 Attack 4 CAN Trace File Extract Message Parameters Find Patterns using Message Parameters Timestamp Analysis Message ID Analysis Trace File Analysis FRAMEWORK Attack Generation Use patterns found on analysis Trace File with Injected Attacks Inject attacks into trace file Figure 4.1: System overview flowchart 29 4. Design and Implementation The implementation of the framework revolves around ASCII trace files for the reasons discussed in 3.4.3 on page 22. Additionally, the attacks should be injected in a clever manner by analyzing trace files for noticeable patterns and sequences of messages. This analysis helps reduce the need for user input to generate and insert attacks into trace files. The main intention of this analysis is to improve attack profiles by introducing randomness and observing existing patterns to model an attacker more realistically. The framework contains a set of defined attack profiles that are applied to a selection of trace files. The framework must be capable of generalizing this method to handle any trace file. The capabilities of this framework can be divided into two distinct components that will be discussed in detail in the sections to follow. These components are trace file analysis and attack generation. Both of these components yield results that will be evaluated along with the performance of the framework based on a few defined metrics. 4.2 Trace File Analysis This section provides a detailed overview of our analysis methodology and its imple- mentation in the framework. The analysis is based on 2 parameters of the CAN mes- sage, the message IDs and timestamps. By examining these parameters, patterns, and characteristics were identified that could be exploited to attack the system in a clever manner. Identifying patterns in the frequency of occurrence of messages, repeating sequences of messages, and the interval in which these messages arrive can reveal a comprehensive understanding of the trace file’s dynamics. 4.2.1 Extracting Message Parameters To be able to analyze the traffic in an effective manner, the different parameters of CAN messages must be extracted. To automate the extraction and analysis process, we developed specific functions within our code. The way this is achieved in the framework is by reading all the lines of the trace file and then splitting each entry into its constituent parameters. The first function called parse_files() takes the path of the file as input and reads the entire trace file. Then the file is read line by line and a second function called split_entry() takes each line and splits it with white space as the delimiter. The entry after being split is stored as a list from which a third function called get_parameters() extracts each parameter and adds it to its corresponding list. In the end, we get a list for each parameter for the entire trace file. CAN messages have a unique format that allows the framework to differentiate between standard CAN messages and comments and variants of CAN like CANFD. Since the timestamps are the only unique parameter in the trace file, a dictionary is created to keep track of each entry with the timestamp as the key. Dictionaries do not allow duplicates to exist which is a useful property. This makes the injection of attacks and the manipulation of messages easier as dictionaries can be sorted 30 4. Design and Implementation Figure 4.2: Message ID frequency of a trace file easily and each message can be accessed using just the key. Having each parameter available as a list makes their analysis easier for the framework. 4.2.2 Message ID Analysis Every message in the trace file is closely related to its message ID. The objective of analyzing this parameter is to find which message IDs appear the most and if there are groups of messages that appear together repeatedly throughout the trace file. The existence of such messages shows their importance in the system and targeting such messages might make attacks more effective and devastating. 4.2.2.1 Frequency of Occurrence By analyzing the extracted list of message IDs, their frequency of occurrence can be easily determined using the get_id_stats() function. This function takes the list of IDs and simply counts the number of times each ID occurs in the list. By doing so for all message IDs, a dictionary can be created which can be then used to easily access the most frequently occurring messages. The figure 4.2 is a graphical representation of the dictionary obtained by using the function. If malicious messages imitate the messages that appear frequently, the probability of these malicious messages being accepted as valid CAN messages may increase. 4.2.2.2 Repeating Sequences In addition to frequency analysis, we searched for repeating sequences of message IDs within the trace file. Identifying message sequences that frequently occur together can reveal important relationships and dependencies between different messages. 31 4. Design and Implementation These sequences may indicate specific system operations or processes that can be targeted or exploited by an attacker. The establish_correlation() function takes the list of IDs and returns a dictionary containing all sequences of messages that occur together along with the number of times they are observed to be together. The sequences that occur together frequently imply the dependency of the messages in that sequence on each other. 4.2.3 Timestamps Analysis One of the unique parameters of messages is their timestamp which gives information about when the message arrived on the CAN bus. A lot can be said about a message just by observing the time at which different instances of that message occur. In order to be able to observe and extract meaningful information, the timestamp of the occurrence of each individual message must be obtained. The function get_msgs_by_id() takes a message ID, the timestamp list, and the ID list of the entire trace file and returns a list of timestamps of that particular message. This process is done for each distinct ID in the trace file and the list returned by the function is directly used for analysis. Cyclic and Partially Cyclic Messages: To understand the purpose of timestamp analysis, the concept of cyclic messages must be known. If the time difference between any two instances of messages that have the same ID is exactly the same, then that message ID is said to occur in a cyclic manner in the system. This difference in time is referred to as the cyclic difference for the purpose of this thesis. Since not all messages can always be cyclic, the concept of partially cyclic messages arises. When messages with the same Message ID appear in a cyclic manner most of the time, i.e, their cyclic difference is the same for the majority of their instances, then we consider such messages to be partially cyclic. In the case of the framework, for the message ID to be partially cyclic it must exhibit cyclic behavior at least 50% of the time. If a message ID is not cyclic or partially cyclic, it is considered to be acyclic. 32 4. Design and Implementation Figure 4.3: Graphical representation of different message IDs based on their cyclic difference The function analyse_timestamp() takes the timestamp list of specific message IDs and returns a list of time differences between 2 consecutive instances of messages with that message ID. All occurrences of the unique time differences (delta) are counted and converted into a dictionary. This process makes it easier to classify message IDs as cyclic, partially cyclic, or acyclic (see figure 4.3). Furthermore, the period in which a message ID exhibits cyclic behavior can be obtained. The regularity in the timing of cyclic messages allows for precise prediction of when these messages will occur. The variability in the occurrence of partially cyclic messages adds some level of unpredictability, making their timing less consistent compared to fully cyclic ones. This information proved useful for exploiting these patterns by targeting specific time windows or disrupting the regularity of cyclic messages. 4.3 Attack Generation Once the analysis was in place, it was time to use the insights gained from it to perform injections of attacks. Finalizing a smaller set of attacks from the list of studied attacks presented in table 3.1 was the first step. For the scope of this thesis, 4 attacks were selected and implemented. These attacks are fuzzy, spoofing, replay and overwrite attacks. The overwrite and spoofing attacks were selected due to their ease of understanding and implementation. The fuzzy attack seems simple but if successful can reveal a lot about any system. This attack is usually the first attack performed by adversaries in an attempt to find an entry point into the system and thus is an important attack to cover. The replay attack was selected due to its effectiveness against encrypted systems. It appears easy to implement but can have a devastating effect on a system. Furthermore, these attacks are commonly encountered in real-world scenarios and are well-documented in related 33 4. Design and Implementation research literature. This section discusses the details of the implementation of each attack and the functions that aided in the process of implementation. In the end, the average attack-to benign traffic ratio is presented for the output trace files generated for every attack. 4.3.1 Fuzzy Attack The fuzzy attack makes use of randomized IDs and message payloads, so the first step in implementing this attack was to generate randomized IDs. However, the IDs generated in this framework are not completely random. An algorithm to mimic an attacker’s approach of manually randomizing IDs was created. The function gen_random_ids() takes a list of IDs and performs a set of operations on each ID present in that list to create a unique set of IDs for fuzzing. The input list given to this function is created by taking the top 33% of the most frequently occurring messages obtained through message ID analysis. Every ID in the input list given to the function generates a series of at least 6 IDs. After removing all the duplicates a big list of pseudo-random IDs is obtained. This algorithm generates both standard and extended CAN message IDs in both decimal and hexadecimal formats. Then, the bus and data length code (DLC) values are generated in a random manner. Using the randomised DLC as input, another function called get_random_data() generates a randomised payload of the same size in bytes as the DLC value. This ran- domised payload is modified into a specific format. The function get_duration() returns the first and last timestamps of the trace file. These values are used to generate a random decimal timestamp between the first and last timestamp of the trace file. After obtaining all the parameters required to form a malicious message, a malicious entry is created and inserted into the dictionary with the generated ran- dom timestamp as key. All of these tasks come together in a single function called fuzzy_attack(). Figure 4.4: Injected fuzzy attack traffic into normal CAN traffic Figure 4.4 is a snippet from the output file generated after injecting fuzzy attack traffic into the trace file. 4.3.2 Replay Attack To perform a replay attack, the target messages are re-transmitted at a later interval in the system while all the other parameters of the target message remain unchanged. The first step towards the implementation of this attack is to prepare a list of target messages. To do so, the methods discussed for analysis of the trace file prove to be 34 4. Design and Implementation very handy. First, the top 3 most frequently occurring message IDs are added to the list. Assume this list is called "MsgList". Then the longest sequence of messages that appear together is obtained and all the message IDs present in the sequence are added to "MsgList". Now, for all the unique message IDs in "MsgList", the cyclic time difference is calculated. For the messages in "MsgList" all other message parameters are obtained. Malicious messages are created using these parameters but with a new timestamp, which is the sum of cyclic difference of the corresponding message and the original timestamp of the target message. Additionally, the last occurrence of each message ID present in "MsgList" is obtained and is replayed with two different timestamps. The first timestamp is the sum of the original message timestamp and its cyclic difference and the second timestamp is the sum of the original message timestamp and a random decimal number between the first and last timestamp of the trace file. This way the messages are replayed in a cyclic as well as a randomised manner. The replay_attack() function does all of the tasks mentioned above to successfully inject replay traffic into the input trace file. Figure 4.5: Injected replay attack traffic into normal CAN traffic Figure 4.5 shows how messages at line 1 and 2 are being replayed at lines 464 and 470 respectively. The parameters of the replayed messages are identical to their target messages. 4.3.3 Overwrite Attack The idea behind the overwrite attack is to inject a malicious message right before a target message. This malicious message should have identical parameters as com- pared to the target but with an altered payload. The implementation of this attack is similar to the Replay attack. Instead of considering only the more frequently occurring message IDs, all unique message IDs are accounted for and their cyclic time difference is calculated. Using the get_same_period_diff, the timestamp of the message IDs is obtained for the period in which they exhibit cyclic behavior. These timestamps are converted into a list. In order to be able to inject a malicious message right before the target messages arrive on the CAN bus, subtraction of the least significant digit of the timestamp by 1 is performed. All the other parameters of the target message are retrieved and a randomised DLC is used to generate a new random hexadecimal payload. Now, that all the required parameters have been obtained, the malicious messages are 35 4. Design and Implementation created and inserted into the dictionary with the new timestamp (least significant digit subtracted by 1) as key. The function that performs all of the above mentioned tasks is called overwrite_attack(). Figure 4.6: Injected overwrite attack traffic into normal CAN traffic In figure 4.6, it can be observed that the attack messages and the target messages have similar parameters like message ID but have completely different payloads. The timestamps of the attack messages appear right before the target messages. 4.3.4 Spoofing Attack The spoofing attack is different from the other attacks implemented in the framework. The biggest difference is that spoofing attack simply manipulates messages to change their payload parameter. Where the other attacks create attack traffic and inject it into the trace file, spoofing does not perform any injection of attacks. The procedure of cleverly selecting messages for spoofing is similar to how the overwrite attack is implemented. Instead of considering all of the messages, the top 15 most frequent messages are considered and their timestamps are obtained for the period in which they exhibit cyclic behavior. All of the parameters corresponding to the targeted messages are retrieved and a random payload is generated using a random DLC value. Since dictionaries do not allow duplicate key values to appear, using the same timestamp as the target message with a manipulated message payload replaces the target message with the malicious one in place. The spoofing_attack() function performs all of the above-mentioned tasks. Figure 4.7: Spoofed messages compared to original attack-free trace file Figure 4.7 compares the original attack-free trace file and the output generated after 36 4. Design and Implementation introducing spoofing attacks to the same file. Lines 17 and 22 (highlighted in the image) show the exact same messages but with varying payloads. 4.4 Evaluation Framework In this thesis, there are three distinct types of results that need to be evaluated in order to gain meaningful insights from them. These results are output trace files with attack traffic, the performance of the framework, and a comparison of traffic analysis of trace files between different real-world and synthetic trace files. 4.4.1 Comparison of Trace Files: Real world vs Synthetic traffic Analysis of the trace files can reveal a lot of information about the behavior of the traffic. The idea behind comparing the insights gained from drawing a contrast between different trace files is to be able to distinguish between the trends followed in different systems. For this, a set of trace files containing real-world (obtained from real-life vehicles) and synthetic (man-made) trace files are used. The analysis results are represented graphically so it becomes easier to draw insights for discussion. The trace files were subjected to multiple experiments that analyzed the differences in parameters and patterns that can be observed. These experiments make use of the analysis functionalities implemented in the framework. The same parameters that were used for clever attack traffic generation namely, Message ID and timestamps of messages are analyzed to find the differences in patterns. Analyses like the frequency of message IDs, the standard deviation of timestamps, and the ratio of cyclic to acyclic messages helped gain more information about these trace files that highlighted noticeable differences between them. It was determined that for the trace files considered, the message ID and the length of the data had the highest correlation. Using the spearman correlation coefficient, the magnitude of the correlation between the message ID and data length was determined. 4.4.2 Output Trace File: Randomness To evaluate the output trace file obtained after execution of the framework, the metric considered is called Randomness. Randomness refers to the ability of the framework to generate varying outputs. This is a desirable effect for attack traffic generators to test various security mechanisms. It serves two important perspectives. In actual cyber-attacks, the attacker’s behavior is often unpredictable and can vary significantly. By incorporating randomness into attack traffic generation, the simu- lation of a more realistic and diverse set of attack patterns becomes possible, making security testing more robust. Secondly, randomness expands the coverage of secu- rity testing by exploring different combinations within attacks. Relying solely on 37 4. Design and Implementation predetermined attack patterns may overlook some vulnerabilities or attack vectors that are not explicitly defined. In the framework, a certain level of randomness is introduced while inserting attack traffic into the trace file. This randomness appears in the form of variable number of malicious messages introduced and the way they are scattered throughout the trace file. The framework is designed in a way that lets the number of attacks generated be increased or decreased as required, yet, there still exists variations in the number of malicious messages introduced for every instance of execution. 4.4.3 Framework: Execution time, Complexity There are several ways in which a framework can be evaluated. For the scope of this thesis, the execution time for creating and inserting each attack is compared for trace files having different lengths. The execution time is the time from when the trace file is read till the time the output trace file with attack patterns is generated. Additionally, the scalability of the framework must be measured to see if it works well for larger trace files as well as it does for smaller ones. For doing so, a metric called the "scaling factor" (this name is used for the pur