Implementing a machine learning microservice for scoring and predicting vehicle driving attributes and their impact on operating costs

Master's thesis in Computer Science and Engineering

Johan Blom
Sam Sohrabpour

Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2022

© Johan Blom, Sam Sohrabpour 2022.

Supervisor: Carl-Johan Seger, Department of Computer Science and Engineering
Advisor: Alexander Crayvenn, Sigma Embedded Engineering
Examiner: Aarne Ranta, Department of Computer Science and Engineering

Master's Thesis 2022
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: A stock photo which represents the dashboard of a car [37].

Typeset in LaTeX
Gothenburg, Sweden 2022

Abstract

Minimizing automotive insurance costs and other forms of operating costs has become a new priority within the vehicle industry, as these costs are included in the expanding subscription-based business model for vehicles. Research has shown that automotive insurance costs can be decreased if the automotive company manages to prove to the insurance company that the driving behavior of its vehicles is better than expected. It has also been shown that additional operating costs, such as sending a replacement car during service, are expenses that would be reduced by better driving behavior. This study aims to provide an analysis that informs automobile companies of how driving behavior attributes (trip data) affect operating costs. Through the use of machine learning models, the question is: if an automobile brand has trip data available, is it possible to create an analysis that can accurately predict service costs and other operating costs for the vehicle?

The analysis was made with a machine learning model using a supervised algorithm called extreme gradient boosting. The model was trained on trip data and operating cost data, and processes trip data to predict the likelihood of additional operating costs exceeding 10,000 SEK. In the absence of real data, the data used in the thesis was generated based on car statistics, not collected from real cars. The work was done with a microservice structure, meaning multiple small services communicating with each other through APIs. The analysis of the final model demonstrated that it is possible to predict operating costs with fairly good accuracy according to the several evaluation metrics used to evaluate the model. The final model achieved a 57% accuracy in finding vehicles with additional operating costs and a 92% accuracy in finding vehicles without additional operating costs.
The results indicate that the dataset is too imbalanced due to the rarity of cars requiring additional operating costs. This was handled by using the average trip data per car instead of processing all trip data individually. The machine learning model's accuracy increased significantly once the imbalance ratio went above one car requiring additional operating costs for every 30 cars that do not.

Keywords: Python, Micro-services, Machine Learning, Kubernetes, Azure ML, XGBoost.

Acknowledgements

Throughout the master's thesis there have been several people who have advised us on what to look out for and, more precisely, what is needed. We would first like to thank our company supervisor Alexander Scott Crayvenn at Sigma Embedded Technology who, even though he was very busy, provided us with office space and weekly dinner. He also provided us with information about which types of frameworks and software most companies in the industry use today. This thesis would have been significantly worse if it were not for him.

We would also like to thank our academic supervisor Carl-Johan Seger, who has provided us with feedback on the report's structure and grammatical issues on a weekly basis; the report would not have been even close to how it is today if it were not for him. Carl has also provided us with advice that gave us an idea of exactly what was needed for the report to be of good quality, and warned us of potential problems that typically occur in master's thesis projects within the machine learning field. These warnings have not only saved us from a few heart attacks but also made the progress of the report feel significantly less ambiguous. Last but not least, we would also like to thank our examiner Aarne Ranta, who has been very open to talk to and has allowed us to be very flexible with our submission dates.

In addition, we are grateful for the extensive amount of high-quality documentation and information that the Python, Kubernetes and Docker communities provide. While technological enhancements from an outsider's perspective might not have expanded much over the past three years, the technological advancements in terms of learning and sharing software definitely have.

Contents

List of Figures
1 Introduction
  1.1 Background
  1.2 Aim
  1.3 Scope
  1.4 Contribution
  1.5 Overview
2 Theory
  2.1 Machine Learning
    2.1.1 Applications of Machine Learning
    2.1.2 Data Filtering
    2.1.3 Algorithms
    2.1.4 Classification Algorithms vs. Regression Algorithms
    2.1.5 Linear Regression and Logistic Regression Algorithms
    2.1.6 ARIMA Algorithm
    2.1.7 Decision-tree
    2.1.8 Ensemble Method
    2.1.9 Adaboost
    2.1.10 Gradient Boosting (GB)
    2.1.11 XGBoost
  2.2 Checking Quality of Algorithms
    2.2.1 Evaluation Metrics
  2.3 Car Evaluation
    2.3.1 Usage-based insurance
  2.4 Deployment of Machine Learning
  2.5 Micro-services
    2.5.1 RESTful API
    2.5.2 Swagger UI
    2.5.3 Docker
    2.5.4 Kubernetes
    2.5.5 Kubernetes Services
    2.5.6 Kubernetes Ingress
3 Methods
  3.1 Machine Learning Model Training Platform
    3.1.1 Generating Data
    3.1.2 Data Generation Algorithm
  3.2 Architecture Plan
    3.2.1 Step 1, Data Processing
    3.2.2 Step 2.1, Choosing Algorithm and Training the Model
    3.2.3 Step 2.2, Algorithms to Choose from
    3.2.4 Step 3, Testing the Model
      3.2.4.1 Testing Parameters
    3.2.5 Step 4.1, Deploying the Model to API
    3.2.6 Step 4.2, Setting up the Kubernetes Deployment
    3.2.7 Step 4.3, Setting up the Docker Container
    3.2.8 Step 4.4, Connecting the Domain to an External IP
  3.3 Tools
    3.3.1 Scrum
    3.3.2 Database
4 Results
  4.1 Machine Learning Model Prediction results
    4.1.1 Kubernetes resource usage
5 Conclusion
  5.1 Discussion
    5.1.1 Data Imbalance
    5.1.2 ML Result Interpretation
    5.1.3 Importance of result
    5.1.4 Limitation of results
    5.1.5 Use cases
      5.1.5.1 Safety score
      5.1.5.2 Hidden scores with positive affirmation
  5.2 Conclusion
    5.2.1 Potential Improvements
    5.2.2 Future Work
Bibliography

List of Figures

2.1 Example of a Sigmoid function [50]
2.2 Example of a linear regression graph.
2.3 Example of a logistic regression graph.
2.4 An example of how the auto-regressive model ARIMA forecasts its values compared to a normal linear regression model. The first picture represents the results made from the prediction of the linear regression model. The second picture shows the results made from the ARIMA model. The brown dots represent the data that were used and the green dots represent the predictions that were made.
2.5 An example of a decision tree where it tries to identify an animal
2.6 An example of a continuous tree [7].
2.7 An example of the structure of the bagging method [26]
2.8 An example of a ROC curve with the area under the curve marked in black [20].
2.9 A screenshot of the API documentation that is automatically generated by Swagger UI. In order to enter this site, one needs to append /docs to the URL.
2.10 Example of a Kubernetes architecture with a web application service which uses two replica pods connected to a database pod.
3.1 A sketch of the architectural plan for the machine learning process, along with the process to deploy the machine learning model in the API so that it can then be used by other micro-services. Each container either represents an object value or a pipeline, and the arrows represent the values required in order for the pipelines to start.
3.2 Depicts an example of testing that was made using different combinations for the maximum depth and minimum child weight of the trees.
3.3 The combination that gave the best prediction is returned by the function and is thus used in the final classifier.
3.4 A picture which represents an example of how a web application can communicate internally with a machine learning API through GET and POST API calls in the JSON format via a ClusterIP service.
3.5 The ideal Kubernetes architecture that is suggested for a real-world production case, where the machine learning platform is made on Azure ML instead of FastAPI and where other microservices use the machine learning framework internally, afterwards forwarding that internal connection into an external one with the added security coming from the Ingress service.
3.6 The microservice architecture that is deployed for the master's thesis.
3.7 The architecture of the microservice depicting the connection with the company's REST API and its front-end web application.
3.8 Running a terminal command to get all the services, which exposes all of the IP addresses and ports that are open.
3.9 Connecting the external IP address of the LoadBalancer used in the master's thesis to the A record in the DNS settings of a domain.
4.1 A stacked bar illustration of true/false ratio for the same dataset of 10000 cars running for 2 years. (Same dataset used in table 4.2)

1 Introduction

1.1 Background

As new business models arise, one which has become popular recently is the subscription-based business model.
Similarly to how this model has become a new standard within the TV industry, it has slowly crawled its way into the automobile industry, where additional costs such as insurance and service costs are also included [6]. Other consumer expenses, such as gas costs and service costs caused by at-fault accidents, are not included in the subscription model. While leasing a car offers similar options, the additional costs are typically not included. This means that the margin of error for subscription-based services is significantly higher than the margin of error for leasing models. Due to the tight profit margins used in the automobile industry [21], an analysis that could predict these margins would be very beneficial.

To illustrate the basic problem, suppose that a person named Ken is going to rent an Audi using a company's subscription service for $800 every month. Ken consistently drives below the speed limit, barely misuses the brakes, never rapidly accelerates the car and does not have any close collisions with other cars. Meanwhile we have Leo, who rapidly accelerates the car, constantly brakes hard and very often has close collisions with other cars. Should they really have to pay the same amount of money when they cause different costs to the company? It is true that the driver pays for at-fault accident service costs, but the automobile company that offered the subscription still has to pay for some hidden operating costs, such as wear and tear, increased insurance costs and sending the driver a replacement vehicle during repairs and maintenance. These could sometimes even go as far as to eat up the whole profit margin that the automobile company initially had expected for the car. This means that a driver with Ken's driving behavior in fact costs the company significantly less than a driver with Leo's driving behavior.

There are many automobile companies who have connected multiple sensors to their vehicles. Some of them also have these sensors connected to the cloud, where snapshots of the vehicle's current state are sent to the cloud. This has enabled automobile companies to track the driving behavior of the drivers. This data comes in many forms: it could be anything from a snapshot that is given upon request, to a list of attributes given for a specific driver within a specific trip (trip data). If an automobile brand has trip data available, is it possible to create an analysis that can accurately predict service costs and other operating costs for the vehicles?

1.2 Aim

The goal of our work is to provide an analysis that informs automobile companies of how driving behavior affects operating costs. The intention is to gather the results of the analysis, then discuss different measures the automobile company could take to decrease operating costs, and see if the accuracy of the results is enough to make the approach feasible.

1.3 Scope

The analysis will only be made through the use of machine learning models. Only a limited number of machine learning algorithms will be tested, namely supervised algorithms. Supervised means that the input data is labeled into categories, and the algorithm outputs predictions based on that data. Only the most functional machine learning model will be evaluated for the results. The only form of data that will be used is trip data, meaning any other form of snapshot grabbed by a vehicle sensor will not be considered.
Additional bugs in the sensory data that bypass data filtering may be discussed, but will be disregarded during the analysis.

1.4 Contribution

Currently, the automotive industry lacks important metrics when determining the value of a car. Poor evaluations could lead to selling cars for less than what they are truly worth, or make future car prices hard to predict. As of now, many car companies only use the mileage and the age of the car when evaluating it, leading to a sub-optimal evaluation. This work would contribute to a more precise evaluation of a car's true value and future value, thus providing an analysis for car companies to utilize when deciding the value of their cars.

1.5 Overview

The theory chapter begins with an introduction to machine learning which is made for readers who have an academic background but lack a machine learning background. The second half of the theory chapter goes through all of the important theoretical parts that were used specifically for the thesis. The method chapter explains the data generation process, the architectural plan for the creation of the machine learning model, and how the models are stored on Azure's servers through the use of its Kubernetes service. The results chapter provides an objective overview of the evaluation of the machine learning model that was created. The conclusion chapter begins with a discussion about different potential measures the company could take with the results. It also discusses whether the results are accurate enough to be used for those measures and whether there are any improvements that could be made to gain more accurate results.

2 Theory

2.1 Machine Learning

Machine learning (ML) is a way for humans to utilize the fast computations that a machine possesses [8]. An ML algorithm improves the accuracy of its results step by step by rerunning a basic algorithm a large number of times on data. This creates a model, which is trained to recognize specific types of patterns.

2.1.1 Applications of Machine Learning

Analyzing a small amount of data can easily be done by a human. For example, analyzing ten pictures to determine if there is a cat or a dog in the picture is easy for a human. But what if there were a million photos to analyze? It would be impossible for one person to analyze that amount of pictures, as it would take weeks or even months to complete. A computer with much faster computation capabilities can search through the photos and find patterns to determine the animal in the picture. The accuracy of the predictions of the type of animal depends on the quality of the algorithm. An accuracy as high as 97 percent can be reached [33].

ML is not just about analyzing data and checking the type of animal in a picture. More generally, ML can be used to find patterns and predict trends. This is called pattern recognition, and with the help of information acquired from patterns an algorithm groups data into different categories. An example of a pattern could be that when an animal has pointy ears, it is more likely that the animal is a cat than a dog. Machines are capable of working with larger quantities of data than humans, therefore ML algorithms can potentially find patterns that a human might not find.

At an abstract level, pattern recognition based on ML proceeds as follows. First a large amount of data is collected. A fraction of this data is separated out to be used for testing. The remaining data is used to train some kind of network by gradually adjusting a, usually very large, number of parameters. Once the training has been concluded, the testing data is used to determine how successful the training was.
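To make this procedure concrete, the following minimal sketch (our illustration, not taken from the thesis pipeline; it assumes the scikit-learn library and a synthetic labeled dataset) separates out a test fraction, trains on the rest, and measures how successful the training was:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # A synthetic labeled dataset: X holds the attributes, y the desired labels.
    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

    # Separate out a fraction (here 20%) of the data to be used for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Train on the remaining data, then judge the result on the held-out data.
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Share of correct test predictions:", model.score(X_test, y_test))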
A study has shown that ML can surpass humans in finding patterns in, for example, speech recognition or image classification [33]. In the study the humans learned faster than the machine during the first hours of the learning process, but eventually the humans stopped improving. The ML algorithms, on the other hand, continued to improve and surpassed the humans when dealing with a large amount of data. However, when the patterns required a deeper understanding, the algorithms struggled to find them. The use of pattern recognition could be useful for finding patterns in human driving behaviours and potentially predicting how behaviour affects the operating costs.

2.1.2 Data Filtering

When working with ML models it is desirable to have large data sets to achieve a good quality prediction [52]. A large data set is desired as the model generally improves the more data it gets to process. One has to be careful with training too much on a data set, however, as it can lead to overfitting the model. Overfitting means that the model has been trained so long on a data set that the model has essentially memorized it. Memorizing the data causes the model to struggle with new data and leads to erroneous predictions, which defeats the entire objective of the model.

Data sets can contain data that is unnecessary or irrelevant for the prediction one wants to make. For instance, consider a data set on students in a school with their respective personal info, where the goal is to predict the students' favourite subjects. Data on their grades would probably be relevant for the prediction, but their phone number would most certainly be irrelevant. That is why it is desirable to filter out some of the data so only the relevant part is left. This helps speed up the process of training the model given the smaller data size. Filtering also helps with visualizing and analyzing the data, as it is easier to see what is important when looking at a smaller set.

Data cleaning is a part of data filtering that is important for getting valid predictions [47]. This is when one removes irrelevant data as previously mentioned, but also data that is contrived through mistakes and inaccuracy. This kind of data could come from people in a survey who just answer quickly to be done with it. This type of behaviour makes their data useless, as they did not pay attention to the questions.

2.1.3 Algorithms

There are three types of ML algorithms: (1) supervised, (2) unsupervised and (3) reinforcement ML algorithms [4]. In supervised algorithms, the data is accompanied by labels denoting the desired result. These labels are used to train the model. In unsupervised algorithms, no such labels exist. Here the algorithms themselves try to partition the data into interesting groups [4]. In reinforcement algorithms, the model creates its own data by recurrently simulating an environment. This process is done until the model has learned to take the optimal long-term action in order to obtain a certain goal. The choices made to reach the optimal long-term action are derived from the algorithm [4]. In our work, we will only consider supervised algorithms. This is because the work needs labeled data in order to predict a certain class.

2.1.4 Classification Algorithms vs. Regression Algorithms
To understand which supervised machine learning algorithm would be optimal, it is necessary to understand the difference between regression and classification. These are the two processes most commonly used with supervised ML. Regression is the process of predicting a continuous value that can have an infinite number of possible variations, for instance, price, salary or height [19]. Classification, on the other hand, is the process of predicting a discrete value, such as true or false, or zero or one.

Consider a scenario where one would predict the weather by feeding in data where the date, month and hourly time are included on an hourly basis. A classification algorithm could predict whether the weather is going to be warm or cold during a given hour, where anything above 15 Celsius is considered warm and anything below 15 Celsius is cold. A regression model, on the other hand, is able to predict the temperature in Celsius for a given hour. While the regression model may provide a more specific number as output, the classifier model may have a higher likelihood of being correct in this case. The classifier model may also provide other useful insights, such as predicting the probability of something occurring rather than estimating a specific number. This is highly useful when the attributes of the trained dataset have only a weak implication for the output rather than a strong one. Algorithms such as the linear regression algorithm only take advantage of one of these processes, but there are numerous other algorithms that can use both classification and regression processes [19].

2.1.5 Linear Regression and Logistic Regression Algorithms

Regression analysis is a technique which focuses on the relation between a dependent and an independent variable. There are three main uses for regression: finding the strength of each predictor property, recognizing which variable relates most to the desired result value, and forecasting future values [51].

Logistic regression uses a function called the Sigmoid, \sigma(x) = 1/(1 + e^{-x}). The Sigmoid function can take any value and map it to between one and zero, as seen in Figure 2.1. Given that the values are between one and zero, the Sigmoid function is useful when dealing with probabilities or binary classification. The function will classify an element into a category or a discrete label based on a threshold [51].

Figure 2.1: Example of a Sigmoid function [50]

To illustrate two problems where linear regression and logistic regression would be the appropriate algorithm to use, two different examples about brake pads are used. For linear regression, consider trying to predict the cost of brake pad replacements given the number of hard brakes per month. Here there is hopefully a linear relation between the number of hard brakes and the wear and tear on the brake pads, as seen in Figure 2.2. Finding the best slope and y-intercept would allow us to quickly estimate the brake pad replacement cost given a particular driver profile.

Figure 2.2: Example of a linear regression graph.

On the other hand, consider the probability that a car ends up in a major collision requiring a replacement car, as a function of the average percent below or above the speed limit the car is driven. Fitting the Sigmoid function to these data points would give us an estimate for the probability of a serious crash, as seen in Figure 2.3.

Figure 2.3: Example of a logistic regression graph.
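As a concrete illustration of the second example (our sketch, assuming scikit-learn and made-up collision data, not figures from the thesis), fitting a logistic regression model yields exactly such a crash-probability estimate:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up data: average percent above the speed limit per driver, and
    # whether the car ended up in a major collision (1) or not (0).
    speed_over_limit = np.array([[-10], [-5], [0], [2], [5],
                                 [8], [12], [15], [20], [25]])
    major_collision = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])

    # Fitting the model fits the Sigmoid curve of Figure 2.3 to the points.
    clf = LogisticRegression().fit(speed_over_limit, major_collision)

    # The fitted curve estimates the crash probability for a new driver profile.
    print(clf.predict_proba([[10]])[0][1])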
Accuracy for linear regression is measured by loss, R² and adjusted R² [53], while logistic regression uses different methods such as accuracy, precision, recall and F1 score.

2.1.6 ARIMA Algorithm

Linear regression and logistic regression are popular as initial machine learning algorithms. The reason behind their popularity is that they are both easy to implement and to understand. However, when it comes to ML algorithms that are used in production, more sophisticated algorithms are used [31]. ARIMA is one of the most efficient simple algorithms [38]. The ARIMA algorithm is an amalgamation of two models, the auto-regressive (AR) [39] and moving average (MA) [36] models.

Figure 2.4: An example of how the auto-regressive model ARIMA forecasts its values compared to a normal linear regression model. The first picture represents the results made from the prediction of the linear regression model. The second picture shows the results made from the ARIMA model. The brown dots represent the data that were used and the green dots represent the predictions that were made.

2.1.7 Decision-tree

The decision tree is a type of supervised machine learning algorithm. A unique aspect of decision trees is that they do not always yield a single answer. The trees can produce several options, from which a human can then decide which answer to use. It is an efficient way of analyzing several alternatives and interpreting them, giving a better understanding of the model.

Trees consist of a root node, which is the tree's base. This root can split up into several sub-nodes, which can themselves split up into more sub-nodes. Each sub-node is represented by a decision; if that decision is fulfilled then the algorithm will follow that path. Furthest down in the tree are the leaf nodes, which mark the end of the sub-nodes and thus represent the tree's answer(s). An example is shown in Figure 2.5, in which an animal is identified.

Figure 2.5: An example of a decision tree where it tries to identify an animal

There exist two types of decision trees: categorical and continuous. In the categorical type the variables are binary, in that they are yes-or-no questions. In continuous trees more variables are utilized than in categorical ones: not just one yes-or-no question, but variables that can have an infinite number of different values, such as numbers. The categorical type is sufficient when dealing with binary questions but is lacking when the questions are more complex. Continuous trees are better to use with regression-type problems, as both utilize non-binary values. An example of a continuous tree can be seen in Figure 2.6, where the variables in the decisions are non-binary and can change.

Figure 2.6: An example of a continuous tree [7].

An advantage of decision trees is that they are simple to analyze, thus giving a clear picture of how to improve the model [30]. Another advantage is that trees can deal with both numerical regression types of problems and binary problems. Some disadvantages of the trees are that they are not optimal for large data sets, and they often take more time when training a model than other algorithms.
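That analyzability is easy to demonstrate: in the minimal sketch below (our example, assuming scikit-learn and its bundled iris data set rather than any data from the thesis), the fitted tree's decision rules can be printed and inspected directly:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)

    # A shallow continuous tree: each decision compares a numeric variable
    # against a learned threshold.
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

    # The fitted rules can be printed, which makes the model simple to analyze.
    print(export_text(tree, feature_names=["sepal length", "sepal width",
                                           "petal length", "petal width"]))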
2.1.8 Ensemble Method

An ensemble method is a technique used in machine learning which combines multiple weaker models to produce a single strong model [54]. These multiple models are called weak learners. The weak learners are typically simple ML algorithms, such as linear regression and logistic regression. Weak learners are often used in conjunction with decision trees and take advantage of the simple structure of the trees. Two common types of ensemble methods (a code sketch of both follows the list) are:

• Bagging: This method combines bootstrapping and aggregating, as seen in Figure 2.7. Bootstrapping is when one generates samples from a data set. The samples are created by randomly picking a subset of the larger data set. These samples are then observed in order to get an estimate of the entire data set. One can, for instance, calculate the mean of a sample and compare it to the mean of other samples, in order to get a clearer picture of the whole data set. Aggregating is when one has several models and combines their predictions. The idea is to use multiple independent models and take the average of their predictions, which results in a single model. A decision tree is created from each sample and then used with different algorithms for each tree. The output of all the trees is combined into one, which is then used in the final model. The advantage of bagging is that it enables a collection of poor learners to outperform one single strong learner. A disadvantage, however, is that interpreting the model is more difficult. This is because it is harder to determine how each individual variable affects the prediction, which makes it more difficult to identify poor variables [23].

• Random Forest (RF): This algorithm consists of several decision trees that work together as an ensemble. RF works by randomly picking a number of data points and building a decision tree from these data points. These two steps get repeated a certain number of times, depending on how many trees one wants to build. All the trees then predict values based on new data points. If it is a regression problem, the average of all the trees' predicted values will be the final value for that data point; if it is a classification problem, the most often selected class will be chosen. The key difference between RF and bagging is that RF only uses a random subset of the features, where features include variables or columns. RF can, for instance, choose only some columns in its samples, whilst bagging considers all the features. An advantage of RF is that it can handle both classification and regression problems. It can also reduce overfitting in the decision trees. A disadvantage of both ensemble methods, however, is that they demand higher computational power given the number of trees that have to be built [23].

Figure 2.7: An example of the structure of the bagging method [26]
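The sketch referenced above (our illustration, assuming scikit-learn and synthetic data) builds both ensembles; note that only the random forest restricts each split to a random subset of the features:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Bagging: one decision tree per bootstrap sample; the trees' outputs
    # are aggregated into a single prediction. All features are considered.
    bagging = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=50, random_state=0).fit(X, y)

    # Random forest: like bagging, but each split only considers a random
    # subset of the features (columns).
    forest = RandomForestClassifier(n_estimators=50, max_features="sqrt",
                                    random_state=0).fit(X, y)

    print(bagging.score(X, y), forest.score(X, y))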
2.1.9 Adaboost

Adaboost, short for adaptive boosting [41], is an algorithm that utilizes boosting, often with the help of decision trees. A common Adaboost technique is to use decision trees with only one level; these are called decision stumps. Decision stumps consist of just a root node and its two corresponding leaf nodes, which means stumps only use a single attribute when splitting the root. The Adaboost algorithm has the advantage that it is less susceptible to overfitting. This is because of the independent parameters that come with the ensemble structure. A problem with Adaboost, however, is that it requires a quality data set, one without big outliers and junk data. This means that the data has to be cleaned thoroughly before using the algorithm.

2.1.10 Gradient Boosting (GB)

Gradient boosting is one of the most popular machine learning ensemble methods in prize-winning machine learning models [42]. GB takes advantage of weak learners; these learners could, for instance, be a decision tree or a linear regression model. GB exploits the weak learners by running them sequentially. It functions like Adaboost, in that weak learners learn from the previous learners' errors [42].

The reason why gradient boosting is used over other ensemble methods like Adaboost is its high compatibility with most simple ML algorithms. Since it provides accurate results for most machine learning algorithms, it is also practical for most supervised machine learning use cases. Due to the strength of the gradient boosting algorithm, it is typically compared with neural networks rather than with other supervised machine learning algorithms. If one were to compare these two with each other, it would be noticed that neural networks may be able to create a higher quality model when using large data sets, whereas gradient boosting typically provides more accurate models when using a smaller data set.

2.1.11 XGBoost

XGBoost is a decision-tree-based ensemble method that builds upon a customized version of the gradient boosting algorithm. It is known to be one of the best supervised machine learning algorithms due to the low amount of training data one needs to create a model with high accuracy [29]. In the Adaboost algorithm, small decision trees called stumps are used; XGBoost uses the gradient boosting algorithm together with stumps as weak learners. This allows these decision trees to learn from each other's mistakes by comparing the outliers with each other [29].

The XGBoost algorithm has a flexibility similar to the gradient boosting algorithm, in that most simple classification and regression problems can be solved efficiently by it. On top of that, unlike other boosting algorithms, XGBoost also uses regularized boosting methods, which help the model avoid overfitting. It can be implemented to automatically handle missing values in data sets, which would normally result in errors or inaccurate results in other algorithms. One can also split up the training over a period of time, pausing the training of the model and resuming it later.

Consider a scenario where one has a dataset on how many minutes each worker has interacted with a tab each day (interact data), and a dataset on how much work each worker has done each day (work data). A hypothesis can now be made claiming that the number of minutes spent on the correct process tab strongly implies the amount of work that gets done. One can now use the interact data as input and the work data as output to train a machine learning model to potentially predict how much work each employee will do. Since the amount of data that is provided is most likely not more than a couple of thousand rows, an accurate machine learning model that is fast, flexible and easy to set up is suggested. While there are better alternatives than extreme gradient boosting, they typically need a large amount of modification to work properly.

The difficult part of XGBoost is tuning its hyper-parameters to fit the data set that is provided. Hyper-parameters are the parameters that control the model's learning process and decide the model's objective. Since gradient boosting learners are hard to properly understand, testing a series of assumptions by the use of tuners from certain libraries is necessary to prevent the machine learning algorithm from under- or over-fitting [27].
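A minimal sketch of such tuning (our example, assuming the xgboost Python package and synthetic placeholder data) grid-searches two of the hyper-parameters explained in the list below:

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import GridSearchCV

    # Synthetic placeholder data standing in for trip attributes and labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 6))
    y = (X[:, 0] + rng.normal(size=400) > 0).astype(int)

    # Test a grid of assumptions about max_depth and min_child_weight to
    # steer the model away from under- and over-fitting.
    search = GridSearchCV(
        xgb.XGBClassifier(learning_rate=0.3,  # eta
                          objective="binary:logistic"),
        param_grid={"max_depth": [2, 4, 6], "min_child_weight": [1, 5, 10]},
        cv=3,
    )
    search.fit(X, y)
    print(search.best_params_)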
Some of the most important hyper-parameters in XGBoost are:

• Booster: One can choose between the tree booster (gbtree) and the linear booster (gblinear), depending on whether one wants the weak learners to be decision trees or linear models.

• Objective function: One can choose between softmax, which allows multiple classifications, or softprob, which predicts the probability of a data point belonging to a certain class.

• Eta: The learning rate that the machine learning algorithm uses. The default value is 0.3.

• Max_depth: The maximum depth of a tree in the weak learners. If it is too small the model will end up being inaccurate, and if it is too high the model will be overfitted.

• Min_child_weight: Used to control overfitting; if it is too high the model will be underfitted and may end up with inaccurate results.

2.2 Checking Quality of Algorithms

When a machine learning model has been trained and is ready to be used, it is necessary to evaluate how accurate it is. This section discusses useful and common metrics used when evaluating models.

2.2.1 Evaluation Metrics

• Accuracy: Evaluates the accuracy of the predictions by checking how many times they are correct, which is good when dealing with binary predictions, for instance, when checking whether a picture contains a cat or a dog. However, given the binary nature of this metric, it is not useful when measuring the accuracy of continuous quantities. For example, when measuring prices, a difference of a
Adjusted r-squared is better when dealing with several variables in the model as the value can actually be affected by new variables, thus being more accu- rate. If there is an assurance that there is no bias and only two variables, then r-squared would be better, otherwise adjusted is preferred. • F1 Score: This metric uses recall and precision as it is trying to get the best of them both at the same time. Recall is the amount of correct predictions divided by the amount that should have been correctly predicted, it is the percentage of results relevant to the model. Precision is the percentage of how many of the predictions are correct. For example predicting if a silhouette is of a human, the model identifies five humans out of eleven human silhouettes and some non-human silhouettes. Only three of the model’s five predictions were correct while the others were false positives. The recall in this case is three out of eleven, while the precision is three out of five. 15 2. Theory F1 Score uses the concepts of precision and recall to its advantage by taking the harmonic mean of the precision and recall. F1 = 2 ∗ precision∗recall precision+recall This is good if either precision or recall is very small as it will result in a low overall score because it balances the two metrics, hence making it accurate. It also represents many aspects at once because it combines other metrics into just one. F1 is used to evaluate binary classification problems, which classify data into either negative or positive. It is also better to use with imbalanced data than accuracy, as it can measure the accuracy of the positive class and not just the overall accuracy of all predictions. • Brier score: This score measures the accuracy of a model handling classification problems with probabilistic predictions. It works by calculating the mean squared error between the predicted probability and the actual result. Because of this, the score can only be in the range of 0 to 1, meaning that a lower score means a more accurate model. • Area under the ROC curve (AUC): This metric combines the area under the curve metric with the receiver operating characteristic curve (ROC). ROC is the performance of a classification model at every threshold. ROC utilizes two parameters; true positive rate and false positive rate. True positive rate is the probability that a class 1 driver will be predicted as class 1. On the other hand, the false positive rate is the probability that a class 0 driver will be predicted as class 0. These parameters keep track on if the predictions are of the correct class. AUC is the area under the ROC curve, as seen in Figure 2.8. AUC measures performance of the thresholds in the classification model and shows how good the model is at distinguishing between the classes. A good AUC score indicates that the model does not make many false predictions, for instance, predicting class 1 as class 0. The score ranges from 0 to 1 and a higher score means a better result. 16 2. Theory Figure 2.8: An example of a ROC curve with the area under the curve marked in black [20]. 2.3 Car Evaluation 2.3.1 Usage-based insurance Usage based insurance (UBI) is a type of car insurance where cost of the insurance depends on the drivers driving behavior [28]. Similarly to how trip data is measured, the usage based insurance also focuses on measuring driving behavior factors such as mileage, speed, hard braking, acceleration and time. 
2.3 Car Evaluation

2.3.1 Usage-based insurance

Usage-based insurance (UBI) is a type of car insurance where the cost of the insurance depends on the driver's driving behavior [28]. Similarly to how trip data is measured, usage-based insurance also focuses on measuring driving behavior factors such as mileage, speed, hard braking, acceleration and time. An analysis is made to find the correlation between the driving behavior attributes and the additional service costs, which is then used to estimate the price of the insurance. The pricing of this type of insurance provided the drivers with a 5% to 10% decrease in costs for just sharing the data, and up to a 20% to 30% decrease in costs depending on the driving behavior and the insurance company that was used.

2.4 Deployment of Machine Learning

While machine learning is being discussed and actively researched, deployment of machine learning services in industry is not brought up very often. According to a study conducted at Algorithmia [3], a machine learning operations platform company, out of 750 companies that were developing machine learning solutions, only 51% had ever deployed a single machine learning model in practice. That means 49% of them have nothing practical to use and do not utilize machine learning to the fullest extent. There exists a lot of information about different machine learning algorithms, how they work and which one is the most effective, but there is not much information about the other steps in deploying a machine learning model: for example, what data is being collected, how the data can be used to predict what is desired, how the filtering of the data will be done, how the model will update itself after its first deployment, how the model is going to be implemented in the framework that is planned to be used, etc.

2.5 Micro-services

In order to deploy an ML model in practice, one will be required to consistently train new models and deploy them. Running the ML training on the same hardware as the framework that one wants to use it on would not be recommended, since the hardware would most likely be slowed down when the ML framework is training a model. This is where a micro-service architecture becomes interesting. In a micro-service architecture all the different frameworks are hosted on separate servers and communicate through a REST API [32]. A REST API is an interface that enables interaction between applications and services; the name stands for representational state transfer. Once the REST API is set up correctly, it can communicate between the micro-services through so-called HTTP POST and GET requests. One example of this is when the ML framework has finished creating its model: it can then send and retrieve information to and from the other frameworks using GET and POST requests.

Micro-services are used today for various reasons, for example, when producing a web application that needs to perform machine learning tasks while still working like a normal website with fast loading. The web application requires a server that has high bandwidth and uses CPUs that are capable of running quickly for a very short amount of time. The CPU should be available immediately but may be used for other things for most of its lifetime. The framework that creates the machine learning model requires a lot of CPU and GPU power for a few minutes, typically once a day or once a week. If one were to train a model on the web application server, the website would be significantly slower when the server is training the model compared to when it is not. Most cloud services typically offer resources that are designed for specific use cases. A CPU core that is designed for app services could be half the speed and cost one tenth of a CPU core that is designed for machine learning services.
Thus, dividing the resources of the services onto their own servers may both solve resource management problems and be less costly.

Other pros of micro-services are that they allow developers to use multiple platforms for the same application. Suppose that a group of developers wants to create a web application that can provide ML solutions. If the group wants a monolithic architecture that uses the same framework for the whole application, they may be forced to use a Python framework such as Django [22]. Suppose that these developers all have a good background in ReactJS and want to continue using it instead of switching to Django. The micro-service architecture solves this problem, since they can code the web application on the React platform and then create the machine learning models with a language such as Python.

With micro-services it is also much easier to make future updates to an individual micro-service after the rest of the services are already finished. If there, for instance, have been teams working on individual parts of an application, it would be very difficult for one of the teams to update their part after the other teams have finished theirs. If each part is split up into its own micro-service, however, a team can easily update one of the parts later, since the parts only interact through outputs and inputs.

While micro-services are very beneficial in many cases, they come at a cost. A downside is that one has to make sure that the output of the micro-service will be compatible with all of the other platforms that it is made for. Another downside is that one can easily end up with input and output parameters that are not compatible with the use case the service was made for. For example, suppose a development team is making an API that is designed to let others obtain driving behaviour data. The people that wrote the API did not have machine learning in mind when they wrote that micro-service: it could only give driving behavior data on how one car is behaving exactly when the API request is made, rather than providing the users with historical data of all of the cars directly.

2.5.1 RESTful API

A RESTful API is a programming interface which allows reading, writing, updating and deleting information on a server through the use of web requests and web responses [18]. The RESTful API is typically connected to a web domain to make communication between two sources possible. The standard format used to send information is to add input values to the website that one connects to. Typically these inputs are received in the JSON format and are then processed by the framework that the API is created with.

Suppose that a person has a fully customizable RGB light at home and that the person has connected and deployed a RESTful API to it. The person can now create a page on a website with the RESTful API framework and update the color of the light through that page. Since the RESTful API of the lamp already has its values initiated, the person must create a page that takes in 'PUT' requests to update the lamp using a format such as:

{
  "ID":   Integer,
  "RGB1": Integer,
  "RGB2": Integer,
  "RGB3": Integer,
  "isOn": Boolean
}

Suppose that the person knew that the ID for the lamp was 23 and wanted a red colour on the lamp; then the API request would have been the following:

www.domainName.com/api?ID=23&RGB1=255&RGB2=0&RGB3=0&isOn=True

This would have sent the following JSON request to the API:

{
  "ID":   23,
  "RGB1": 255,
  "RGB2": 0,
  "RGB3": 0,
  "isOn": True
}

Once the request has been made, the website will automatically run a function with the provided JSON elements as the function's input parameters to update the state of the software's instance. For every GET, POST, PUT or DELETE request that is made, code written by the creator of the API will process that information in any way the programmer sees fit.
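Such an endpoint takes only a few lines in a modern framework. The sketch below (our illustration using FastAPI, the framework used later in this thesis; the LampState model is hypothetical) implements the lamp's PUT handler:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # The expected JSON body, mirroring the lamp format above.
    class LampState(BaseModel):
        ID: int
        RGB1: int
        RGB2: int
        RGB3: int
        isOn: bool

    @app.put("/api")
    def update_lamp(state: LampState):
        # Here the programmer would update the physical lamp's state.
        return {"updated": state.ID, "isOn": state.isOn}

Running this with a server such as uvicorn also auto-generates interactive API documentation under /docs, which leads directly into the next section.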
2.5.2 Swagger UI

Since it may be difficult to understand how an API call works for a specific page, most RESTful API frameworks have built-in add-ons such as Swagger UI, which enables external users to see all possible API calls in a clearer manner. In Figure 2.9, an example of the Swagger interface can be seen. This makes it possible for API frameworks to have additional built-in features, such as providing a description of what every API call does. APIs also let the programmer modify the response one gets after a request. The response can avoid showing the user sensitive information whilst at the same time informing the user that everything is going well. Other features which Swagger UI enables are creating many different types of forms which users can fill in, showing which default values are going to be sent if nothing is entered. Swagger UI can also show users which types of errors could occur, and lets the developers inform the user of every type of error that occurs.

Figure 2.9: A screenshot of the API documentation that is automatically generated by Swagger UI. In order to enter this site, one needs to append /docs to the URL.

2.5.3 Docker

Docker is a platform for packaging applications into containers and managing them safely [48]. When an application is complete it may require multiple libraries in order to work. These libraries must be installed on the computer that needs to use the application. It can be tedious and difficult for the user to find everything and install it in a correct way. The application may demand certain settings in the system as well, further increasing the difficulty of running the application. These problems are what Docker handles with its containers and images.

A Docker image is a small piece of software that contains the code of the application and all its requirements, such as libraries that need to be installed. This enables the application to run quickly in any environment, without having to adjust the environment's settings. A Docker container is an instance of this image and enables the application to run with all of its requirements ready. There can be multiple running containers of a single image and each one can be managed separately. The advantage of containers is that they are lightweight in size, fast to deploy and do not take up much memory. Virtual machines, which are the alternative to Docker containers, are much slower to deploy and demand more memory in the system. This is because a single virtual machine requires a full copy of the operating system, whilst containers can share the kernel.
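As a small illustration (ours, assuming the docker Python SDK is installed, a Docker daemon is running, and a Dockerfile exists in the current directory; the ml-api tag is hypothetical), building an image and starting a container of it takes a few calls:

    import docker  # the Docker SDK for Python

    client = docker.from_env()

    # Build an image from the Dockerfile in the current directory. The image
    # bundles the application code together with all its requirements.
    image, build_logs = client.images.build(path=".", tag="ml-api:latest")

    # Run a container, i.e. an instance of the image. Several containers of
    # the same image can run at once and be managed separately.
    container = client.containers.run("ml-api:latest",
                                      ports={"80/tcp": 8080},
                                      detach=True)
    print(container.short_id)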
2.5.4 Kubernetes

Kubernetes is a platform for managing containers and workloads [49]. It is meant to simplify the use of multiple containers. Handling containers is not an easy task when they start to increase in number. If a container goes down, for instance, one would want another to start immediately to guarantee that there is no downtime for the application. Kubernetes can fix this problem by creating clusters of containers and managing the clusters so that they run as expected. If a container goes down, Kubernetes can start another one immediately, without the system owner having to start one manually. Containers that have failed and stopped running can be restarted by Kubernetes if possible; otherwise they get terminated so that a faulty container cannot disrupt the application. Kubernetes also handles workload balancing by splitting the traffic between several containers, resulting in them sharing the workload. All of these resources make handling containers efficient and simple.

Since Kubernetes may manage other types of units that are not containers, Kubernetes calls these units pods [12]. These pods run on a virtual or physical machine called a node [13]. This means that the node could either be a part of a computer with its own operating system or the whole computer itself. These nodes can either be rented from a cloud service provider or hosted on a physical machine, depending on the needs of the users.

One general issue which occurs in hosting is balancing the amount of resources needed for the intended use case. One example of this was the back-end servers of the consumer-electronics company Elgiganten, which during a Black Friday weekend in 2019 had both its webpage servers and its cash register systems overloaded due to too much traffic [25]. This resulted in most consumers not being able to buy anything, either from the physical stores or the e-commerce website. While it is true that having too many servers available at all times might be unnecessary for a company such as Elgiganten, temporarily gaining more servers with a click of a few buttons would have been extremely useful. What the Kubernetes framework allows one to do is easily scale the resources, both through replication of container instances and by changing the amount of resources used by each container on an hourly basis. If Elgiganten had had their servers on Azure Kubernetes, they could have just ordered more computing resources for a few days and then disabled them afterwards.

2.5.5 Kubernetes Services

In order to make a pod service available externally to other people, it is necessary to connect a set of pods into a Kubernetes network service [14]. There are three types of network services: (1) a ClusterIP service, which offers a single internal IP address that redirects to all pods connected to it, (2) a NodePort service, which offers an external IP address to a set of pods which can then be used online by other people, and (3) a LoadBalancer service, which offers both types. The service thus enables connecting a set of pods to a single IP address, and which pods the service chooses is dependent on a policy that is set by the person who created the pod.

Figure 2.10: Example of a Kubernetes architecture with a web application service which uses two replica pods connected to a database pod.

An example of such a policy is called limit ranges. This policy limits the computing resources a certain container can have. The policy prevents a container from taking all of the available resources and starving other containers in the process.

2.5.6 Kubernetes Ingress

While it may be possible to host a website by using an external IP address given by a LoadBalancer or a NodePort service, it should only be done for testing purposes.
2.5.6 Kubernetes Ingress

While it may be possible to host a website directly on an external IP address given by a LoadBalancer or NodePort service, this should only be done for testing purposes. To prevent unnecessary security vulnerabilities on a Kubernetes cluster, another feature called Ingress exists: an API object that manages the external communication between a server and a domain [10]. When entering most webpages it is typically expected that the connection to the domain is secured with TLS or SSL, and this is something which Ingress helps to provide. TLS and SSL stand for Transport Layer Security and Secure Sockets Layer, respectively. Once an Ingress object is set up, it can be run on an Ingress controller to make communication between a pod and a web domain possible.

3 Methods

3.1 Machine Learning Model Training Platform

The goal of this machine learning model is to create a micro-service which can, through the use of API requests, predict the operating costs of a vehicle. The FastAPI platform is used both for the training of the model and for the creation of the API. FastAPI is hosted on Azure's Kubernetes service, which is Azure's most popular platform for hosting applications [11]. Kubernetes also offers APIs connected to the model, which are labeled endpoints.

The ML algorithm has trained its model using driving behavior data (trip data), which is meant to measure how the driver has behaved during each trip. Initially, the company had promised a dataset of actual trip data to be used for the thesis. However, since the company that created the tools for gathering the trip data had delayed the release of their product by a year, the data that was received lacked most of its core features, such as hard braking count, overspeed count, rapid acceleration count, collision warning count and aggressive turn count. The trip data used for the ML model is therefore generated data, designed to mimic the trip data that will be available in the final release of the company's data gathering tool.

3.1.1 Generating Data

Generated data that covers driving behaviour needs to be accurate enough to be feasible in real-life scenarios. A driver cannot drive a million kilometers per hour, as that would not make sense; therefore, certain ranges for the possible values had to be decided beforehand. The data was generated with the programming language Python. The data parameters were given by the company and are the real parameters they use in their cars. These parameters stem from the measurements that usage-based car insurance companies use [28]. They are meant to reflect important factors in driving that can indicate the risk of accidents causing additional costs to the vehicle. The generated parameters were:

• VIN: The identification number for each car, which stands for Vehicle Identification Number. This was generated in the same format as real VINs, but even if a fake VIN matches a real one, they have nothing in common.

• Date_(days): Covers the date in days and indicates on which day in the timeframe the car made a trip or several trips. Each day could have several trips, and the number of trips was decided by a random distribution based on a statistic of 2.24 average trips per day [5]. The data was simulated over three months, or 90 days, with a random number of trips during each day for each car.

• TimeZoneLowRisk, TimeZoneMediumRisk, TimeZoneHighRisk: These are binary attributes indicating whether a certain trip is driven in a time zone with a low, medium or high risk of accident.
If one of these attributes is 1, the others must be 0, which means the data does not cover two time zones in one trip. The risk levels were based on the most dangerous times of the day to drive and were received from the company. For instance, 00:00 to 03:00 was considered the most dangerous and thus the high risk zone, 04:00 to 08:00 medium risk, and 09:00 to 23:00 low risk.

• TripMeterKm: Covers how far the car drives in kilometers during a trip. The parameter is randomly generated from a normal distribution with hard-coded numbers based on the type of driver the car has. The distribution was based on the European average trip length of 20 kilometers [16]. Since every other attribute is correlated with the number of kilometers driven, all attributes are scaled by the trip distance.

• OverSpeedCount: Covers how many times during a trip the car has driven faster than the allowed speed limit. The speed limit on the car's current road can be detected by the car, and the car's sensors keep track of every time the driver exceeds it. Fewer than three occurrences was considered acceptable, as most people drive faster than the limit at least a few times. Between three and five occurrences would indicate a medium-risk driver with a slightly higher risk of accidents. Anything over five is considered dangerous driving and labels the driver as aggressive, with a higher risk of accident.

• HardbrakingCount: Tells how many times the car performs a hard braking maneuver, for instance when braking hard in order to avoid a collision. The acceptable amount was set to a maximum of two during an average trip of twenty kilometers; if the attribute exceeds two, it is considered aggressive driving. The number two is arbitrary and could be changed, but it was deemed reasonable, as a good driver should not have to brake hard often.

• RapidAcceleration: Shows how many times the car accelerates rapidly, meaning close to full throttle. Sometimes a driver has to accelerate rapidly, for instance on ramps to a highway, so a few occurrences are acceptable. With ramps in mind, the acceptable amount is three; anything over is considered aggressive, as an average trip does not include that many highway ramps.

• CollisionWarnings: Covers how many times the car has to brake on its own in order to avoid a collision. The car has a sensor that detects imminent collisions and, if triggered, the car brakes automatically. One occurrence per average trip is acceptable, since the car in front can brake suddenly and the sensor has better reflexes than the driver. If it happens more than once, it may indicate that the driver often follows too closely, which is classified as aggressive driving.

• AggresiveTurnCount: The number of times a driver turns too fast, which the car can measure as angular velocity. This can be very dangerous behaviour, as fast turning can lead to driving off the road; it also shows that the driver does not plan ahead. Only one occurrence per trip is acceptable, as any driver may have to turn fast to avoid an unexpected event, for instance a child running out into the road.
A good driver could also simply make a mistake once, but more than one occurrence could indicate aggressive behaviour.

3.1.2 Data Generation Algorithm

To give a proper understanding of the data generation, an abbreviated version of the generation code is shown below. For each car, a VIN is first generated, and the program then randomly draws a number representing which type of driver the car has. Each car is then looped over 90 days, using a trip counter function that randomly generates how many trips the car will have each day. Each trip is given normally distributed values whose parameters are set by the car's driver type: passive drivers get good driving behavior, average drivers get average driving behavior and aggressive drivers get bad driving behavior.

import random
import numpy
import pandas

tripList = []

for car in range(numberOfCars):
    vin = vinGenerator()
    # Driver profile: 1 = aggressive, 2-9 = average, 10 = passive,
    # i.e. 10% aggressive, 80% average and 10% passive drivers.
    profile = random.randint(1, 10)
    if 1 < profile < 10:  # Average
        for date in range(90):  # the simulated three-month period
            for trip in range(genTripAmount()):  # random trips this day
                tripList.append({
                    "VIN": vin,
                    "Date_(days)": date,
                    "TripMeterKm": max(0, numpy.random.normal(20, 10)),
                    # ... the remaining attributes are generated analogously
                })
    elif profile == 1:  # Aggressive
        pass  # ... same loops, with distribution parameters for bad driving
    else:  # profile == 10, Passive
        pass  # ... same loops, with distribution parameters for good driving

carList = pandas.DataFrame(tripList)

3.2 Architecture Plan

This section explains the architecture of the machine learning process that was built with Azure Kubernetes Service. The architecture, as seen in Figure 3.1, outlines the different steps in the process as boxes, with each box representing a pipeline, from the creation to the deployment of a machine learning model.

3.2.1 Step 1, Data Processing

To create a model that produces good results, one needs to filter out as much false data as possible. An example of false data is that the value automatically assigned when a field is NULL may be unreasonably high or low. Unreasonable values can also be caused by bugs in the sensors that gather the data, so rows or columns with missing values may be removed as well. Once the data has been filtered, it is split into two categories: one for training the model so it can learn, and one for testing the model. The testing is done after training is completed and is meant to give an unbiased evaluation of the final model, by giving it new data and evaluating the predictions derived from it. The ratio of the split can vary and depends on the size and type of the dataset. If the dataset is medium-sized, for example fewer than one million rows, then a 70%-30% split is recommended [1]. For smaller datasets one wants around thirty percent for testing to get an accurate picture of the data, as the test set should be representative of the whole dataset. A smaller share than thirty percent increases the risk that outliers skew the results, which one wants to avoid for accurate results. The testing percentage cannot be too high, however, as that could lead to underfitting. Underfitting means that the model cannot learn the patterns correctly due to too little training, leading to poor predictions. For datasets larger than a million rows, the split can be 99%-1%, given that one percent of a million should be enough to accurately evaluate the model.
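As an illustration of this step, the following is a minimal sketch of such a split, assuming the generated trips live in the carList DataFrame from section 3.1.2 together with a hypothetical binary label column named OperatingCost. The stratification is an addition beyond the text, but standard practice for imbalanced data.

from sklearn.model_selection import train_test_split

X = carList.drop(columns=["OperatingCost"])  # hypothetical label column
y = carList["OperatingCost"]

# 70%-30% split as recommended for medium-sized datasets; stratifying on y
# keeps the rare cost-positive class represented in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)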
3.2.2 Step 2.1, Choosing Algorithm and Training the Model

Once the training dataset is filtered and split up, the data can be used together with a machine learning algorithm to train a model. A supervised machine learning algorithm is used, since this project takes advantage of the labeled driving behavior data. Which algorithm to choose typically depends on the input and output types of the data. If, for instance, one wants the output to be a category, a classification algorithm may be needed, while if one wants the output to be a continuous value, a regression algorithm may be needed [4].

Figure 3.1: A sketch of the architectural plan for the machine learning process, along with the process of deploying the machine learning model in the API so that it can then be used by other micro-services. Each box represents either an object value or a pipeline, and the arrows represent the values required for the pipelines to start.

Another key factor when choosing an algorithm is the problem that the machine learning model is going to solve. That solution is typically tied closely to a methodology which only a certain set of algorithms can use [4].

3.2.3 Step 2.2, Algorithms to Choose from

Since the chance of a vehicle requiring additional service cost is slim, predicting the likelihood of such a cost occurring is significantly more understandable than predicting an expected service cost score. The output is therefore a binary classification, returning a zero if the total service cost for a car is below 10,000 SEK during a three-month period and a one if it is above. XGBoost is the machine learning algorithm used for this thesis because of its library documentation and its accurate results without requiring large datasets [9]. The dataset used to create the machine learning model was highly imbalanced. Handling imbalanced data is typically a highly demanding process, but it is less demanding with XGBoost than with many other algorithms, due to its highly flexible library. If enough time is available to properly configure a machine learning model that supports a highly imbalanced dataset, there may be a possibility for improvement in ROC and AUC scoring accuracy [20].

3.2.4 Step 3, Testing the Model

When the model has been trained and is complete, it must first be tested to ensure that it works properly. Usually the goals of testing include ensuring that the software behaves as expected and finding any bugs, but machine learning requires more steps. Given that the model learns a certain logic depending on the input data, one must ensure that this logic stays consistent given new data, while guaranteeing that the logic is accurate. It is therefore not enough to just test the model for faults; one also needs to evaluate it in order to make sure the model's predictions make sense.

3.2.4.1 Testing Parameters

The XGBoost classifier has many parameters that can take many different values, and it can therefore be difficult to know which combination of these parameters gives the best result. We found the best parameters by testing a number of combinations of parameter values and used the combination that gave the best prediction in the final version. The combinations can be seen in Figure 3.2. In order to save time, only the uncertain parameters max depth and minimum child weight were tested, as seen in Figure 3.3. The parameters whose optimal values we could deduce logically did not have to be tested. One example is the parameter called objective, which decides the learning objective of the classifier and affects the predicted output. The desired output was a probability and only the objective "binary:logistic" fulfills this; therefore only one acceptable setting for that parameter existed and no others needed to be tested.
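As a sketch of this parameter sweep, the search can be written as below. The candidate value grids are assumptions for illustration, since the exact values tested in Figures 3.2 and 3.3 are not reproduced here; the objective is fixed to "binary:logistic" as motivated above.

from itertools import product
from xgboost import XGBClassifier
from sklearn.metrics import f1_score

bestScore, bestParams = 0.0, None
for maxDepth, minChildWeight in product([3, 5, 7, 9], [1, 3, 5]):
    model = XGBClassifier(objective="binary:logistic",
                          max_depth=maxDepth,
                          min_child_weight=minChildWeight)
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    if score > bestScore:
        bestScore, bestParams = score, (maxDepth, minChildWeight)
# bestParams now holds the combination used in the final classifier.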
Figure 3.2: Depicts an example of testing that was made using different combinations of the maximum depth and minimum child weight of the trees.

Figure 3.3: The combination that gave the best prediction is returned by the function and is thus used in the final classifier.

3.2.5 Step 4.1, Deploying the Model to API

When the model has been trained and evaluated it is ready for use; given the micro-service structure of the project, it is recommended to deploy it on Kubernetes. Kubernetes normally allows internal communication between microservices by having APIs that communicate within Kubernetes (endpoints). This makes communication between the microservices possible, since they can use these endpoints to make API calls that run, configure and edit other microservices within the system. If a developer has a microservice architecture with a web application microservice made with the framework React, a form can be created whose values are used as input. The web application microservice may then send an API call to an ML microservice, where the inputs of the form are the input of the API call. The ML microservice then responds with an output, which is processed by the web application. For a clearer picture of this explanation, see Figure 3.4.

Since the goal of this project is only to build an API and to let other people learn from it, there are two versions of the project: the first, where the project is used as an external API uploaded to a website available to everyone, as seen in Figure 3.6, and the second, where the machine learning microservice is designed to be used internally by a website that may be public, as seen in Figure 3.5. This allows readers both to test a machine learning micro-service and to see what types of use cases there are, so that the readers themselves may be encouraged to try it out in the future.
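To make the ML microservice side of Figure 3.4 concrete, the following is a minimal sketch of what such a prediction endpoint could look like in FastAPI. The route name, the saved model file and the exact input schema are assumptions for illustration; the field names mirror the generated trip attributes from section 3.1.1.

from fastapi import FastAPI
from pydantic import BaseModel
from xgboost import XGBClassifier

app = FastAPI()

model = XGBClassifier()
model.load_model("model.json")  # assumes the trained model was saved to disk

class TripData(BaseModel):
    TripMeterKm: float
    OverSpeedCount: int
    HardbrakingCount: int
    RapidAcceleration: int
    CollisionWarnings: int
    AggresiveTurnCount: int

@app.post("/predict")
def predict(trip: TripData):
    row = [[trip.TripMeterKm, trip.OverSpeedCount, trip.HardbrakingCount,
            trip.RapidAcceleration, trip.CollisionWarnings,
            trip.AggresiveTurnCount]]
    # predict_proba returns [P(no cost), P(cost)] for each input row
    return {"operating_cost_probability": float(model.predict_proba(row)[0][1])}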
3.2.6 Step 4.2, Setting up the Kubernetes Deployment

Since online applications offered to users typically have resource demands that vary with the time of day, Kubernetes is typically recommended as a deployment alternative. Kubernetes has different types of units, the smallest being the pod. These units run either on an external proxy server from a cloud provider or on a virtual server from a cloud provider, called a node. All Kubernetes units can be deployed either from the command line or through YAML configuration files. These deployment files create Kubernetes pods, either by importing a Docker image available on the computer where the command line is running or by importing it from a container repository such as Dockerhub [15]. Here is how the configuration of the master thesis pod deployment file looks:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-estimation
  labels:
    app: cost-estimation
spec:
  selector:
    matchLabels:
      app: cost-estimation
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: cost-estimation
    spec:
      containers:
      - name: cost-estimation
        image: dockerName/cost-estimation:0.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: 8000
          initialDelaySeconds: 5
          timeoutSeconds: 1
          periodSeconds: 600
          failureThreshold: 3
        resources:
          requests:
            memory: "64Mi"
            cpu: "50m"
          limits:
            memory: "1000Mi"
            cpu: "1000m"

As seen in the snippet above, the configuration file starts by declaring which type of object it describes. Then comes the metadata, where the instance is named and given a label which can later be used to identify the deployment. The first set of specifications mainly concerns how many pods are needed (replicas), how to handle a pod when it crashes or updates (strategy) and what the pod is going to contain (spec). The spec configuration focuses on the pod itself: the location and port of the container image it is going to import, the resources the pod is expected to use, and how to handle timeouts if they occur.

3.2.7 Step 4.3, Setting up the Docker Container

Whether a server runs a web application or a database, it typically runs on a specific operating system with a specific platform, and typically requires a few command prompt commands to start. Docker automates this by running any type of application on slimmed-down versions of MacOS, Linux or Windows inside a container. It then automatically runs a set of commands, chosen by the developer through a Dockerfile, and based on the Dockerfile's configuration creates an image which can be used to run a new instance of the server. These images can then be used, deployed and run automatically with Kubernetes, either by importing the image from a repository such as DockerHub or by uploading it directly to the source of the Kubernetes instance. Here is how the Dockerfile for the service cost estimation microservice looks:

# Importing python version 3.10
FROM python:3.10
# Copying the current directory over to the image folder /usr/src/app
COPY . /usr/src/app
# Changing the base directory of the image to /usr/src/app
WORKDIR /usr/src/app
# Installing all python libraries for the FastAPI application
RUN pip install --no-cache-dir -r requirements.txt
# Making port 8000 visible to Kubernetes
EXPOSE 8000
# Running a command to create a FastAPI instance on port 8000
CMD uvicorn working:app --host 0.0.0.0 --port 8000

3.2.8 Step 4.4, Connecting the Domain to an External IP

To connect a pod to an address that is available on the internet, the pods must be connected to a Kubernetes network service. The service used in this thesis is a LoadBalancer, which offers both an internal IP address that other Kubernetes services can connect to and an external IP address that is accessible from the web, as seen in Figure 3.8. An external IP address may also be connected to a domain by pointing the domain's DNS A record at the external IP address, as seen in Figure 3.9.
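Assuming the deployment above is saved as deployment.yaml (the file name is an assumption), the standard commands below would create the pods and reveal the external IP address that section 3.2.8 connects to a domain:

# Create or update the deployment from the configuration file:
kubectl apply -f deployment.yaml

# List all services with their internal and external IP addresses and
# open ports (the command shown in Figure 3.8):
kubectl get services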
3.3 Tools

This section describes the tools used in the project.

3.3.1 Scrum

The project was organized according to the agile framework Scrum. Scrum is a framework built for complex adaptive problems. Instead of the older way of organizing projects, where a final product is only showcased at the end, the core principle of Scrum is that the work is divided into small time fragments. Each fragment should result in something that can be delivered to a customer, instead of only delivering the final product at the end of the project. Each time fragment is called a sprint; the length of a sprint can vary, but in this project two weeks was chosen. Two weeks is long enough for a substantial amount of work to be done, while short enough to allow adaptation in case something changes in the project. Before each sprint, the work to be done during the sprint is planned. At the end of a sprint, a sprint review takes place in which the project members and the company supervisor review the work done. To display the planned work, a Scrum board was used in an application called Trello, which shows the tasks that need to be done and the progress made on each task. Scrum was used mainly because of its flexibility to adapt, which the frequent meetings enable. It also provides transparency for the company supervisor, as he could follow the process clearly and provide feedback during these meetings, as well as check the progress on Trello during a sprint.

3.3.2 Database

The large amount of data that the machine learning model processed needed to be stored somewhere, so a database was used. The database was built in Visual Studio Code with a database management system called SQLite [34]. Lite in this case means that it is lightweight in its setup: it is simple to set up and requires only a few resources. An advantage of SQLite is that it does not require a server to run, which keeps access to the data from becoming complex. Given that no server is required, no installation and configuration of SQLite is needed before it can be used. It is also self-contained in the sense that it needs little support from the operating system, which makes it flexible, as it can work in most environments. In the database one can add, delete and update the objects that are needed. The data was stored in tables with rows and columns. SQLite uses dynamic typing for its tables, which makes it possible to store a value of any type in a column, regardless of the declared data type. This is useful when dealing with several different data types in a database. Swagger UI could use the functions from Visual Studio Code that affect the database in real time, which made it simple to update the data.
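As a minimal sketch of how such a database can be used from Python (the file, table and column names here are assumptions for illustration, modeled on the generated trip attributes), SQLite's standard library module can create and fill a table in a few lines:

import sqlite3

# Opening a connection creates trips.db if it does not already exist;
# no server needs to be installed or configured.
con = sqlite3.connect("trips.db")
cur = con.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS trips
               (VIN TEXT, Date_days INTEGER, TripMeterKm REAL,
                OverSpeedCount INTEGER)""")
cur.execute("INSERT INTO trips VALUES (?, ?, ?, ?)",
            ("TESTVIN0000000001", 1, 18.4, 0))  # hypothetical sample row
con.commit()
con.close()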
Figure 3.4: An example of how a web application can communicate internally with a machine learning API through GET and POST API calls, using JSON via a ClusterIP service.

Figure 3.5: The ideal Kubernetes architecture suggested for a real-world production case, where the machine learning platform is built on Azure ML instead of FastAPI and other microservices use the machine learning framework internally, with that internal connection then forwarded externally with the added security of an Ingress service.

Figure 3.6: The microservice architecture that is deployed for the master thesis.

Figure 3.7: The architecture of the microservice, depicting the connection with the company's REST API and its front-end web application.

Figure 3.8: Running a terminal command to get all the services, which exposes all of the IP addresses and ports that are open.

Figure 3.9: Connecting the external IP address of the LoadBalancer used in the master thesis to the A record in the DNS settings of a domain.

4 Results

As mentioned in the introduction, the goal of this thesis is to make an analysis that can protect assets within the automobile industry from operating costs caused by driving. This is done through a machine learning micro-service which uses a model that can predict operating cost based on a vehicle driver's driving behavior. In order to ensure that the predictions of the driving behavior are valid, an evaluation of the machine learning model is presented. Once the predictions are available, car companies can analyse how customers drive their cars. How this analysis can be done is discussed in chapter 5.

4.1 Machine Learning Model Prediction Results

The evaluated results of the model are shown in Table 4.1, containing six different generated datasets with different amounts of trips per car, different numbers of cars and different likelihoods of causing service cost. To balance and find the best trade-off between false positives and true positives, a ROC curve is used, as seen in Figure 2.8, where each point shows the rates of false positives and true positives for each type of balancing that has been made. The AUC score measures the flexibility of prioritizing true positives against false positives without making the F1 score worse. See Section 2.2.1 for more information about these metrics.

The imbalance ratio in Table 4.1 represents the number of cars that do not require additional operating cost for every car that does. An imbalance ratio of 1:30 therefore means 30 cars without additional operating cost for every car that requires it. The imbalance ratio depends on the number of trips each car has driven, since the risk of needing additional service cost increases with every trip made. Since the dataset is generated rather than real, different likelihoods of causing operating cost were also tested, to see whether that would increase the evaluation scores of the dataset. The imbalance ratio also depends on the likelihood of actually needing operating cost; if the likelihood factor were to go from 1.0 to 3.0, the imbalance ratio would, for instance, go from 1:27 to 1:9.

Time driven     Amount    Likelihood of    Imbalance   Precision   F1      AUC
per car (days)  of cars   operating cost   ratio                   score
730             10000     1.0              1:27        0.91        0.71    0.77
730             10000     3.0              1:9         0.93        0.64    0.72
365             10000     1.0              1:40        0.91        0.41    0.58
365             20000     1.0              1:40        0.90        0.60    0.60
180             10000     1.0              1:75        0.96        0.20    0.57
90              10000     1.0              1:122       0.99        0.00    0.53

Table 4.1: The evaluation of six different imbalanced datasets trained with the same extreme gradient boosting machine learning algorithm.

In Table 4.2, the best true/false ratio of the predictions is shown. It shows the model's accuracy in predicting the correct class for both classes.
                  Predicted Negative   Predicted Positive
Actual Negative   2666                 237
Actual Positive   41                   56

Table 4.2: A confusion matrix showing the true/false ratio for both the positive and the negative class for the XGBoost model that was made. The generated dataset had 10000 cars with an average runtime of 2 years for each car.

In Tables 4.3 to 4.8, the true/false ratios of the other generated datasets are shown. The implications of these tables are discussed in chapter 5. An illustration of the true/false ratio in the form of a stacked bar can be seen in Figure 4.1.

Figure 4.1: A stacked bar illustration of the true/false ratio for the same dataset of 10000 cars running for 2 years (the dataset used in Table 4.2).

                  Predicted Negative   Predicted Positive
Actual Negative   2666                 237
Actual Positive   41                   56

Table 4.3: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 2 years, used 10000 cars and had 1.0 times the normal crash risk.

                  Predicted Negative   Predicted Positive
Actual Negative   2516                 176
Actual Positive   159                  149

Table 4.4: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 2 years, used 10000 cars and had 3.0 times the normal crash risk.

                  Predicted Negative   Predicted Positive
Actual Negative   2690                 242
Actual Positive   50                   18

Table 4.5: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 1 year, used 10000 cars and had 1.0 times the normal crash risk.

                  Predicted Negative   Predicted Positive
Actual Negative   5272                 579
Actual Positive   82                   67

Table 4.6: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 1 year, used 20000 cars and had 1.0 times the normal crash risk.

                  Predicted Negative   Predicted Positive
Actual Negative   2846                 109
Actual Positive   40                   5

Table 4.7: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 6 months, used 10000 cars and had 1.0 times the normal crash risk.

                  Predicted Negative   Predicted Positive
Actual Negative   2952                 26
Actual Positive   22                   0

Table 4.8: A confusion matrix illustrating the true/false ratio for both the negative and the positive class for the XGBoost model. The generated dataset had an average run-time of 3 months, used 10000 cars and had 1.0 times the normal crash risk.
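To connect these matrices to the accuracy figures quoted elsewhere in the thesis, the class-wise accuracies can be computed directly from Table 4.2; this worked example is an addition for clarity:

    recall (cost class)         = TP / (TP + FN) = 56 / (56 + 41)     ≈ 0.577 ≈ 57%
    specificity (no-cost class) = TN / (TN + FP) = 2666 / (2666 + 237) ≈ 0.918 ≈ 92%

These are the 57% and 92% figures referred to in chapter 5.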
4.1.1 Kubernetes Resource Usage

The Kubernetes cluster used for the thesis ran on Azure's Standard_E4s_v3 node server. The node included 4 vCPUs, equivalent to 4 threads/2 cores of an AMD EPYC 7763v processor, 32 GB of RAM and 150 GB of hard drive space. It was, however, discovered that significantly fewer resources were required for the Kubernetes cluster to work than previously thought, as seen in Table 4.9. This also means that more pods could be deployed once the API pods become overloaded.

Service         VCPU    Size     RAM       Cost (USD/month)
FastAPI Pod 1   0.650   2.86GB   1GB       31.04
FastAPI Pod 2   0.650   2.86GB   1GB       31.04
ClusterIP       0.080   <0.1GB   <0.1GB    2
Loadbalancer    0.100   <0.1GB   <0.1GB    2

Table 4.9: The cost of Azure's resource use with the most suitable type of virtual machine.

5 Conclusion

5.1 Discussion

In the hypothesis of the thesis, we asked whether it is possible to make an analysis that can predict service costs for vehicles if trip data is available. This hypothesis was tested through an evaluation of a machine learning model using data designed to emulate trip data.

5.1.1 Data Imbalance

The likelihood of an average person crashing during an average trip of 20 kilometers is not very high. Because of this, only a few driving profiles in the data result in operating costs, which means there is a substantial imbalance in the data: more than 99.95% of the data belongs to class 0, the no-cost class. This imbalance caused problems in the model training, as it is more difficult for the model to learn class 1 given the low numbers. Another problem with such a large imbalance is that the model can become biased towards the majority class. If, for instance, the model only predicts the majority class, it will have a very high accuracy but will fail to identify any costs, which is the purpose of the model.

To alleviate this issue, we combined each driver's multiple trips into one trip holding the average of all those trips. We also increased the number of simulated days, to widen the window in which possible accidents can occur. These two steps were taken to increase the share of operating costs in the data without altering the odds of crashing to unrealistic proportions. This led to a higher percentage of costs in the data than before, which improved the model's accuracy for class 1. The share of cost-positive data went from a tiny percentage (less than 0.05%) to approximately 0.5% to 8%, depending on the number of trips each car had. This means that a significant amount of data was sacrificed to reduce the dataset's imbalance issue. A data imbalance, while more acceptable now, still exists and must be considered when evaluating the model's performance. If one only tests the model's accuracy over all of the data, the result is a very high percentage, but this exceptional accuracy is misleading: it is this high because most of the data is no-cost, so it is logical that the model predicts as much. To truly test the model's accuracy, one wants to see how it performs when predicting the class with operating costs and evaluate that accuracy. That is why the main part of the testing was done with the metrics AUC ROC and F1 score, as they capture the accuracy of the class 1 predictions.
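The trip-averaging step can be sketched in a few lines of pandas, continuing from the generation code in section 3.1.2 with the carList DataFrame and its VIN column (the variable names are assumptions):

# Collapse each car's trips into a single row of per-trip averages.
# The dataset shrinks to one row per car, but the share of cost-positive
# rows rises, easing the class imbalance described above.
carAverages = carList.groupby("VIN").mean(numeric_only=True).reset_index()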
5.1.2 ML Result Interpretation

From all of the tests gathered and the results of the model evaluation (Table 4.1), we can see that the prediction accuracy improved once the imbalance ratio was better than one car needing additional operating cost for every 30 cars. We can also see that more data resulted in a significant increase in accuracy. This means that, with this dataset structure, once the imbalance ratio of the data falls below 1:30, the only way to increase the accuracy of the results is to increase the number of cars in the dataset. The best test case was the one where we generated 10000 cars that had been running for an average time of two years. The results showed an AUC score of approximately 0.77 and an F1 score of 0.71. The best model resulted in a 57% accuracy in finding vehicles with additional operating costs and a 92% accuracy in finding vehicles without additional operating costs (as seen in Figure 4.1). These are very good results, given that the generated data is not designed to be directly correlated with the output. Since our driving behavior data affects the likelihood of additional operating cost rather than causing the operating cost itself, driving behavior changes the likelihood of operating cost; there is no formula that determines exactly when additional operating cost will occur. The likelihood of operating cost used in the thesis is based on the driver profile multiplied by the driving mileage. The aggressive driver profile represents 10% of the dataset, and the total share of drivers predicted to incur operating costs was 9.6%. That the predicted share ends up below 10% reflects that the operating cost likelihood is correlated with factors other than the driver profile as well, which as of now is only the number of kilometers driven.

In response to the hypothesis of the thesis, it is possible to create an analysis that can predict the service cost of vehicles using trip data, as long as the trip data has any form of implication for additional service cost risk. According to the earlier sources from which we gathered the likelihoods for the generated dataset, the empirical evidence shows that bad driving behavior causes more accidents. This means that as long as the sensors behind the trip data can accurately capture driving behavior, the machine learning model in this thesis should be able to predict operating cost with good accuracy.

5.1.3 Importance of Result

The results of this thesis show that it is possible to make a mostly accurate ML model for predicting operating costs. With the model returning the odds of operating costs exceeding 10,000 SEK, the main value of the model lies in analysis. It can, for instance, be used by car companies who want to protect the value of their cars by identifying potentially bad drivers. The companies can analyse the percentages and act as they wish with the corresponding driver.

Another value of this work is that it has built a foundation for further work. If real data ever becomes available, the model can then easily be improved. If a better algorithm than XGBoost is developed in the future, that algorithm could simply replace XGBoost and potentially improve the accuracy. The results show that the machine learning algorithm extreme gradient boosting can provide accurate predictions on datasets containing attributes that depend on other attributes. We can also see that a set of data values can be used to predict the likelihood of a certain condition being met. While previous products such as Tesla's Safety Score [40] have focused on building scoring systems, these results demonstrate that correlating scoring systems with other factors, such as vehicle service cost, is also possible.

5.1.4 Limitation of Results

Due to the lack of real data collected from sensors, the results cannot confirm whether bad trip data gathered from said sensors will in fact lead to higher operating costs. This is an essential factor, since the machine learning model may be useless in its current state if the hypothesis is not true.
The choice of generating datasets instead of using real ones was also constrained by not knowing whether there was any quality assurance of the data gathered from the sensors. Another constraint of the analysis was that we did not know whether the data would be stored in a database or collected in real time through some form of network. It is beyond the scope of this study to compare other machine learning models to see which model provides the best accuracy.

5.1.5 Use Cases

While usage-based insurance exists for single cars, where insurance discounts can reach as high as 30%, it is still not common when purchasing car insurance in bulk. There it is instead used as a bargaining tool in contracts, decreasing insurance costs by a few percentage points. This means that lower service costs give a subscription-based automotive company more bargaining power for insurance discounts and allow the company to avoid additional costs, such as paying for a replacement car for a driver. In this subsection, some measures that would allow a subscription-based car company to decrease its service cost are mentioned.

5.1.5.1 Safety Score

The model could be used to calculate a safety score, instead of just a percentage risk of a cost occurring. This score could be a number ranging from 0-100 and be displayed in the car for the driver to see. Examples of companies that use safety scores are Tesla [45] and Toyota [46]. The performance of drivers can improve if they see a score of how well they drive. Many people are competitive and want a higher score than their peers; in order to get one, they need to improve their driving, which is an advantage of displaying the score to the drivers. Good scores could also result in small rewards; for instance, Tesla provides access to the Full Self-Driving beta for those with a safety score above 95 [2]. This kind of positive encouragement would encourage drivers to perform well and maintain that level.

There have been some issues with this scoring method, however, in regards to displaying the score to the drivers. If the drivers can see their score at all times and discover which driving maneuvers result in a good score, they can exploit it. This happened with Tesla, where some drivers found and posted an exploit that resulted in a perfect score [44]. The drivers could simply restart the system, and all their latest errors in traffic would be erased. They could also manipulate how the score was calculated by driving in a specific way, in order to increase it. These explo