Multi-objective optimization by Machine Learning

Master’s thesis in Computer science and engineering

Hampus Hagstrand

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2023

© Hampus Hagstrand, 2023.

Supervisor: Carl-Johan Seger
Examiner: Jean-Philippe Bernardy

Master’s Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Gothenburg, Sweden 2023

Abstract

Solving multi-objective optimization problems with machine learning can bring significant improvements in various fields, such as multi-junction traffic management or stock portfolio optimization. These are problems that can involve a large amount of both relevant and irrelevant data. This thesis targets one such problem area, focusing on multi-objective optimization in harness racing for trotting horses, specifically the V75. A large part of the project was data-related, covering data collection, preprocessing, and engineering. The prediction part is divided into two parts: single-race prediction and system prediction. The single-race prediction uses the large amount of collected data to train a neural network that predicts the percentage by which each horse finishes behind the winner. The system prediction uses the result from the neural network to pick a system.
During this process, a greedy algorithm selects more horses in the races that the machine learning model deems close and fewer in those it deems one-sided. The performance evaluation showed that the single-race prediction performed on par with the more advanced baseline and showed clear signs of finding a pattern between the data and the finishing result. The system prediction achieved some accuracy but did not surpass the odds baseline.

Keywords: Machine Learning, Artificial Intelligence, Multi-objective Optimization, Horse Racing, V75

Acknowledgements

Firstly, I would like to express my sincere gratitude to Carl-Johan Seger, who has been an outstanding supervisor. His guidance, expertise, and patience were instrumental in completing this thesis. Secondly, I would like to thank Jean-Philippe Bernardy for his insightful feedback on the half-time report. His critiques significantly improved the quality of my work. Thirdly, I want to extend my appreciation to Magnus Edvardsson, Per Klevmarken, and Anton Hålldén. They introduced me to the world of V75, the game that forms the foundation of this thesis. Their wisdom, expertise, and knowledge about the game played a crucial role in shaping this work. Lastly, I would like to acknowledge that Grammarly was used for spell and grammar checking, to support the readability and correctness of this thesis.

Hampus Hagstrand, Gothenburg, 2023-06-27

Contents

List of Figures
1 Introduction
    1.1 Horse Racing
        1.1.1 Trotting and V75
2 Background
    2.1 Machine learning
        2.1.1 Neural Networks
    2.2 Previous work
        2.2.1 Sport
3 Methods
    3.1 Data
        3.1.1 Scraping
            3.1.1.1 Process
        3.1.2 Preprocessing
            3.1.2.1 Process
    3.2 Validator
        3.2.1 Process
    3.3 Evaluation
        3.3.1 Evaluator
            3.3.1.1 Process
        3.3.2 Baselines
            3.3.2.1 Random Baseline
            3.3.2.2 Starting positions baseline
            3.3.2.3 Win percentage baseline
    3.4 Machine learning models
        3.4.1 TensorFlow and Keras
            3.4.1.1 Model
            3.4.1.2 Hyperparameter tuner
    3.5 Multi-Event Decision Making
        3.5.1 Greedy naive odds baseline
        3.5.2 Greedy naive picking algorithm
4 Results
    4.1 Single race predictions
        4.1.1 Baselines
            4.1.1.1 Random Baseline
        4.1.2 Starting track
        4.1.3 Win percentage
        4.1.4 Experiment 1
        4.1.5 Experiment 2
    4.2 System race predictions
        4.2.1 Experiment 1
            4.2.1.1 Greedy Algorithm
        4.2.2 Experiment 2
            4.2.2.1 Greedy Algorithm
        4.2.3 Baselines
            4.2.3.1 Greedy odds baseline
5 Conclusion
    5.1 Discussion
    5.2 Future work
    5.3 Conclusion
Bibliography
A Appendix 1
    A.1 Implementation
        A.1.1 Scraper
        A.1.2 Preprocessor
        A.1.3 Validator
        A.1.4 Evaluator

List of Figures

1.1 All options presented by ATG.se
1.2
1.3 Autostart
1.4 Voltstart
2.1
2.2 Overview of the ANN architecture
3.1 An overview of the whole process
3.2 An overview of the scraping process
3.3 An overview of the preprocessing process
3.4 An overview of the Validator process
3.5 An overview of the Evaluator process
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
4.13
4.14
4.15
4.16
4.17
5.1
5.2
1 Introduction

This thesis addresses optimization-under-constraints problems in which a large amount of both relevant and irrelevant data is available. An example of such a problem is traffic management. A city has numerous junctions (events), and each junction differs in traffic volume, location, speed, and much more (data). Different traffic patterns and control strategies can be used at each junction to affect its performance (actions). However, the city has a restricted budget and cannot take the best action at every junction (limiting factor).

Another example of this type of problem is portfolio optimization. In this case, there are multiple investment opportunities (events). Each stock or company has different attributes, such as price, P/E ratio, number of employees, and much more (data), and the limiting factor is the budget. A final example, and the one used as the driving example in this thesis, is V75 betting, where each race is an event, deciding which horses to pick in a race is the action, previous races are the data, and a budget is the limiting factor.

More generally, this type of problem consists of multiple events that are independent of each other, with several possible actions at each event. Every event has the same kinds of data points, but the values differ. The only thing the events have in common is their effect on the limiting factor, such as a budget. Supervised machine learning (ML) can be used to determine the best actions at each event, disregarding the limiting factor. Because of this, the ML model needs to return a metric that quantifies how crucial a specific action is, so that at a later stage an algorithm can determine whether the action is important enough to perform given its effect on the limiting factor.

There is a wide range of problems of this type, and they vary in importance. Solving these problems can be essential for multiple reasons.
Solving the traffic management problem can significantly improve the safety and satisfaction of the residents in that city. Another aspect is resource allocation: if the city can maintain the same level of performance at each junction with fewer resources, it can spend that money elsewhere. Solving these problems using machine learning can also uncover previously unknown connections between the data and the outcome. Perhaps the car’s color or the vehicle registration year will have an unexpected impact.

This thesis uses horse racing and, more specifically, trotting and V75 as the problem. Each race is considered an event, and the data is, for example, the horse’s age, carriage type, shoes, win percentage, and much more. The limiting factor is an artificial budget, and the aim is to retain as much money as possible when playing V75. Data availability is the primary reason for using trotting and V75 as the optimization-under-constraints problem, since over 100,000 races are stored in an easily accessible archive. The following section discusses the background of horse racing, trotting, and V75.

1.1 Horse Racing

Horse racing has been an established sport for over three centuries. It was first invented in England, but today it is common in many countries, including Sweden [1]. It is usually heavily associated with betting, and because of this, many people have attempted to find the perfect system to predict the winner [2]. Since betting on the sport is still thriving today, one can assume that no such method is widely known, or that none exists at all.

With the advancement of technology and analysis, the sport has evolved and now presents its players with more data. This is data related to the horses, such as speed, carriage, trainer, age, genes, and much more [3]. Figure 1.1 shows all the data points available for each horse on the Swedish betting site ATG.se.
One can argue that this is more than a person can effectively use, and it is hard for a human to determine which parameters are relevant [4]. Outside variables, such as start type, race distance, track type, and weather, must also be considered when predicting the winner. Different horses are good in different scenarios, and the same can be said about the “drivers”. To make it even more complicated, in some games the player has to predict multiple races and get most of them right to receive any return [5]. Finding the potential winners is thus more complex than it first appears, and since it involves both humans and animals, a perfect system is likely impossible to find.

Figure 1.1: All options presented by ATG.se

1.1.1 Trotting and V75

This thesis focuses on trotting with harness racing, specifically V75 [5]. Trotting means that the horse is only allowed to trot and not gallop, and harness racing means that the “driver” sits in a carriage behind the horse, as seen in Figure 1.2. The goal is straightforward: as in most races, the first to pass the finish line is the winner.

Figure 1.2

In Swedish trotting, there are two start types and four different distances. The exact distances can vary from track to track but are usually 1640, 2140, 2640, or 3140 meters. One English mile (1609 meters) is also used on rare occasions. The two start types are called autostart and voltstart. In autostart, a start car gathers the horses behind it and then steadily accelerates up to the start line, where it quickly accelerates away from the horses, signaling the start of the race. Voltstart is more complex than autostart; here, the horses are divided into two or three groups, and each group of horses runs in a loop. The horses start on a start command that goes “klart-ett-två-kör”. When the command finishes, the horses should be running so that they are all in the correct track position.
With this start type, restarts are common. Both start types can be seen in Figures 1.3 and 1.4.

V75 is a game mode where the player must guess the winner of seven races and get at least five correct to receive any winnings. The argument can be made that the game mode is unimportant and only the individual races matter, but that is not entirely true. Although the races do not affect one another, the number of horses the player picks in each race affects the system’s cost. In V75, the number of rows in a system is calculated by multiplying together the number of horses the player has picked in each race. Each row costs 0.5 KR, so the number of rows is multiplied by 0.5 to calculate the total cost. For example, if the player chooses four horses in each race, the cost is 4^7 × 0.5 = 8192 KR. If the player instead picks fewer horses in some races and more in others, the cost can drop even though more horses are picked in total. For example, 1 × 1 × 3 × 5 × 9 × 10 × 10 × 0.5 = 6750 KR, which is both cheaper and covers eleven more horses (39 instead of 28). This makes it clear that the player should not distribute the horses evenly between the races. Instead, they need to find races where they pick only a small number of horses, preferably only one, to be able to afford more horses in the more uncertain races.

The payout works a bit differently. As mentioned, there is only a payout when the player gets 5, 6, or 7 right, but the exact cash payout usually differs between systems. The payout is determined by the number of winning players and the amount of money in the pot. The pot can vary from just a few million to over 20 million Swedish crowns. This fluctuation arises because if there were no winners in the previous V75, that pot is combined with the next one. This means that some weekends are better to play on than others.
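The cost rule described earlier in this section (the number of rows is the product of the horses picked per race, at 0.5 KR per row) can be checked with a few lines of Python. This is a minimal sketch; the function name `system_cost` and the constant `ROW_PRICE_KR` are illustrative names and are not taken from the thesis code.

```python
from math import prod

ROW_PRICE_KR = 0.5  # price per row in V75, as stated in the text


def system_cost(horses_per_race):
    """Return (rows, cost in KR) for a system given the picks per race."""
    rows = prod(horses_per_race)
    return rows, rows * ROW_PRICE_KR


even = [4] * 7                    # four horses in every race
uneven = [1, 1, 3, 5, 9, 10, 10]  # few picks in "safe" races, many in open ones

even_rows, even_cost = system_cost(even)
uneven_rows, uneven_cost = system_cost(uneven)

print(even_rows, even_cost, sum(even))        # 16384 8192.0 28
print(uneven_rows, uneven_cost, sum(uneven))  # 13500 6750.0 39
```

The uneven system covers 39 horses against 28 for the even one while costing less, which is the motivation for concentrating picks in the uncertain races.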
The player also wants to be one of few winners: every player who gets seven right shares the pot, so the fewer winners, the better.

Figure 1.3: Autostart

Figure 1.4: Voltstart

2 Background

2.1 Machine learning

Machine learning (ML) is a sub-field of artificial intelligence (AI) that studies the development of models that are able to learn from data and make decisions based on it. An ML model can improve its performance by analyzing more data during the training process. This ability enables it to discover patterns and trends that would otherwise remain hidden, and it makes ML a versatile and essential tool in numerous fields.

ML is a broad area that can be divided into many subareas. The two most common are supervised and unsupervised learning. Supervised learning is when the model learns from labeled data and tries to map the input data (features) to the output. Each input is associated with one or more outputs, and during training, the algorithm uses the labeled data to find the connections between the features and the outputs. Unsupervised learning is when the model is trained on unlabeled data. In addition to supervised and unsupervised learning, there are reinforcement learning and semi-supervised learning. Reinforcement learning is when the model learns to make decisions by taking available actions to maximize a reward and minimize a penalty. This learning type is especially popular when models are tasked with, for example, learning how to walk. Semi-supervised learning is, as the name suggests, a combination of supervised and unsupervised learning. This means the model is trained on partly labeled and partly unlabeled data.

All machine learning models also have some measurement of how well they are performing. For example, this measurement can be accuracy, that is, how often the model made the right decision. This measurement is often used when the model solves a classification problem with a limited number of prediction options.
However, pure accuracy is seldom a good measure when the model solves a regression problem, such as predicting a number with unlimited possible values. For these types of problems, metrics such as mean absolute error (MAE) are much more suitable, since they give an average of how far off the model was; the lower the MAE, the better.

ML also comes with some common problems, the most common being overfitting and underfitting. Overfitting is when a model learns its training data too well and performs poorly on unseen data. Underfitting is the opposite and occurs when the model does not learn enough from the training data and performs poorly even on that. Figure 2.1 illustrates overfitting and underfitting.

Figure 2.1

To observe overfitting and underfitting easily, it is common practice to divide the available data into two parts, training and validation. The training data set is only used during training, while the validation data set is only used to evaluate the model’s actual performance. With this practice, overfitting and underfitting are easier to detect, and the measured performance is more realistic.

Machine learning also includes many types of models, such as K-nearest neighbors, decision trees, long short-term memory networks, artificial neural networks, and many more. In the following section, neural networks are explained in more detail.

2.1.1 Neural Networks

Artificial neural networks (ANNs) are models inspired by the structure of the neural networks in the brain. ANNs consist of interconnected layers, each tasked with applying a specific operation to its input data. During the training phase, the model learns to match input data to output data, which involves tuning the weights and biases of the model. This tuning aims to minimize a loss metric calculated by the given loss function. The loss indicates the difference between the actual output and the model’s predicted values.
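To make the idea of a loss concrete, the sketch below computes the mean absolute error mentioned earlier for a handful of made-up predictions. The function name and the numbers are illustrative assumptions; this passage does not state which loss function the thesis network actually uses.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and two sets of predictions.
targets = [0.0, 2.0, 4.0, 6.0]
close_predictions = [0.5, 2.0, 3.5, 6.0]
far_predictions = [2.0, 0.0, 6.0, 4.0]

mae_close = mean_absolute_error(targets, close_predictions)
mae_far = mean_absolute_error(targets, far_predictions)
print(mae_close, mae_far)  # 0.25 2.0
```

Training a network amounts to adjusting weights and biases so that a quantity of this kind decreases on the training data, while the same quantity on the validation data indicates whether the model generalizes.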
ANNs have two fundamental components: an input layer and an output layer. These layers mark the beginning and the end of the network. The input layer is the first layer and takes the training data the network will learn from in its rawest form. The number of nodes in the input layer typically corresponds to the number of features in the training data. The input layer performs no actions or modifications on the data and is only tasked with passing the data to the next layer. The output layer is the final layer, where the final predictions are generated. The number of nodes in this layer corresponds to how many values need to be predicted.

ANNs can also contain hidden layers. Hidden layers are layers of neurons between the input and output layers. They help the neural network find non-linear patterns between the input features and the output predictions. Figure 2.2 gives an overview of the ANN architecture. ANNs have multiple use cases, such as image recognition, natural language processing, game playing, and speech recognition.

Figure 2.2: Overview of the ANN architecture

2.2 Previous work

Using AI to solve optimization problems is not a new idea, and it has even been used to predict sports outcomes. There have also been a few attempts at predicting horse races, some of which have been reported in mainstream media [6]. For example, Ndiaye and Koffi demonstrated how to process horse racing data and how to use it to predict the winner with machine learning [7], [8]. They attempted two machine learning approaches, LightGBM and deep learning, and found success with them, turning a profit by betting on the winner and on whether a horse finishes in the top three. Cambell attempted a similar approach but saw less success and experienced significant overfitting during his process [9]. However, in his discussion, he mentions, “One shall study the domain knowledge before data analysis.
My lesson is I didn’t understand the horse racing terms.” He also warns that preprocessing probably takes longer than model building [3].

There are a few other works similar to the ones discussed above; some of them focus on a specific location, such as Hong Kong [10] or Poland [11], and some try to predict the top two [11]. What they all have in common is that they consider only one race at a time and predict only one potential winner or, in some cases, the top two. In this thesis, by contrast, the focus is on predicting the winner in multiple races and choosing several potential winners in races deemed close.

2.2.1 Sport

As discussed, predicting sports outcomes is not a new idea. However, it is still a complicated problem to solve. For example, there is a yearly competition to develop a machine learning algorithm that predicts the outcome of March Madness, the National Collegiate Athletic Association men’s and women’s college basketball tournaments [12]. This ML competition has over 1000 competitors and has had some very talented and educated winners. Since it is a yearly competition with different winners, it is not a problem that anyone has fully solved.

Microsoft has also been working on predicting the football World Cup using Bing [13]. In 2014, Bing did manage to predict all 16 of the games in the knockout stage. However, in the group stage, Bing only had a success rate of 60% and also failed to predict the winner of the final, so it is far from perfect. It is also worth noting that the 2014 World Cup had very few upset wins, and it was usually the statistically favored team that won. In addition, ML predictors developed by large companies such as Google or Microsoft are typically kept secret, since there is money to be made in sports betting. However, these are two-outcome sports in which either team A or team B wins.
There are many more possible outcomes in horse racing, and it is therefore harder to predict.

The article “The Future of Sports Betting: AI-Powered Predictive Analytics” discusses the use of AI in sports and also reflects on its ethics [14]. The article mentions that “the main concern is the potential for AI to be used to manipulate outcomes in favor of certain bettors or teams. This could lead to unfair advantages and an uneven playing field, which would be detrimental to the integrity of sports betting.” It also discusses that AI can find patterns between the data and the outcome that humans would not be able to identify, giving an unfair advantage to bettors who have access to this technology. The points raised in the article are valid; therefore, no actual bets are placed in this thesis.

On the other hand, some work indicates that artificial intelligence and machine learning do not have an unfair advantage over human predictions. In the paper “Human Decision Making and Artificial Intelligence: A Comparison in the Domain of Sports Prediction,” the authors compare the accuracy of human and AI predictions in the 2015 Rugby World Cup [15]. The paper found that the AI had an accuracy of 89.58% and the humans 85.52%, and it drew the conclusion “that for rugby, over the limited period of a specific tournament, the evidence was not strong enough to suggest that a human agent is superior in terms of accuracy when predicting match outcomes compared to a machine learning approach.”

3 Methods

At a high level, the project consists of five primary parts, all working with some form of data. The parts are the web scraper, the Preprocessor, the Validator, the machine learning models, and lastly, the Evaluator. Figure 3.1 shows that the data begins as a web archive, which the web scraper turns into raw data. The Preprocessor then processes that raw data. After this step, the Validator can check the data to confirm its accuracy.
The data is then ready to be used to train a machine learning model that generates predictions. Lastly, the predictions can be evaluated to determine the performance and accuracy of the machine learning models.

Figure 3.1: An overview of the whole process

3.1 Data

The project’s data-related work is divided into two parts, scraping and preprocessing. Each part has its own sub-parts and is explained in detail in this section. Initially, the Preprocessor was planned as a subclass of the Scraper, which was the case in early versions. However, after significant restructuring and added functionality, it grew and was separated from the Scraper class. Both are developed to be modular, so that new functionality can easily be added and existing mechanics can be tuned without much restructuring and without disrupting other parts.

3.1.1 Scraping

The Scraper is designed to collect horse racing data from the web. It starts by gathering race ids from a race archive. The program then uses these ids to access the specific races, collects all relevant data about each race and its horses, and stores it in a CSV file. CSV, or comma-separated values, is a commonly used format in artificial intelligence, machine learning, and general data storage. It is a text file in which values are separated by commas [16].

The Scraper class uses several libraries to streamline its process. It uses the requests library to handle POST and GET requests against websites. Json [17] is then used to extract the information from the scraped data that the request returned, and lastly, the Pandas library [18] is used to store and save the data as a CSV file. The Scraper also uses some custom classes to represent the data more intuitively. Primarily there are two classes, the race class and the horse class.
As the names suggest, the race class represents the race-related attributes and the horse class the horse-specific ones. Both classes can be seen in the code below with all their fields; note that the race class has a list of horse objects. The following section explains the scraping process from start to finish, step by step, at a reasonably low level.

    class race:
        id: int = None
        date: int = None
        track: int = None
        distance: int = None
        start_type: bool = None
        horses = []  # list of Horse objects


    class Horse:
        id: int = None
        name: str = None
        money: int = None
        money_per_start: int = None
        distance: int = None
        track: int = None
        win_percanteage: int = None
        shoes_front: bool = None
        shoes_back: bool = None
        shoes_change: bool = None
        carriage: bool = None
        carriage_change: bool = None
        age: int = None
        home_track: str = None
        at_home: bool = None
        top3_percentage: int = None
        place: int = None
        gallop: bool = None
        kmTime: float = None
        starts_life: int = None
        driver_id: int = None
        trainer_id: int = None
        starts_2023: int = None
        third_2023: int = None
        second_2023: int = None
        first_2023: int = None
        starts_2022: int = None
        third_2022: int = None
        second_2022: int = None
        first_2022: int = None
        average_odds: float = None
        V75_odds: float = None
        top3_odds: float = None
        points: int = None
        record_time: float = None

3.1.1.1 Process

This section follows a single race through the scraping process. An overview of the process can be seen in Figure 3.2. As mentioned in the previous section, an archive is first scraped to gather race ids. A race id is a string structured as YYYY-MM-DD_XX_ZZ. The date represents the day the race took place, XX represents the track id, and ZZ represents the race number. This id is unique for each race, and every id is saved, so no archive scraping is done twice. The program can now perform an additional scrape with the id. The result of this scrape is the data relevant to that specific race; however, it is in the form of a TXT file.
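The race id format described above can be parsed as in the following sketch. It assumes that the track id and race number are the two underscore-separated fields after an ISO date; the names `RaceId` and `parse_race_id` and the example ids are hypothetical and not taken from the thesis implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RaceId:
    day: date
    track_id: str
    race_number: str


def parse_race_id(raw):
    """Split 'YYYY-MM-DD_XX_ZZ' into date, track id, and race number."""
    day_part, track_id, race_number = raw.split("_")
    return RaceId(date.fromisoformat(day_part), track_id, race_number)


# Keep a set of ids already seen so that no race is scraped twice.
seen = set()
to_scrape = []
for raw in ["2023-01-14_07_05", "2023-01-14_07_06", "2023-01-14_07_05"]:
    if raw not in seen:
        seen.add(raw)
        to_scrape.append(parse_race_id(raw))

print(len(to_scrape))  # 2
```

Keeping the track id and race number as strings preserves any leading zeros, which matters if the id has to be rebuilt for a later request.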
The TXT file is then processed into a more suitable format and added to a CSV file. The process is then repeated for the next race. For a comprehensive explanation of the implementation, see Appendix A.1.1.

Figure 3.2: An overview of the scraping process

3.1.2 Preprocessing

The goal of the Preprocessor is to take the raw data from the Scraper and convert it into something more suitable for machine learning. More specifically, the Preprocessor class has multiple tasks that are cleanly separated, which makes them easy to tweak and simplifies adding and removing tasks. The sub-processes include the calculation of lap percentages, removal of unnecessary data, removal of data only available after the race, and the normalization of every column. The following section motivates and explains these sub-processes in more detail.

In addition to the data-related tasks, the Preprocessor also handles loading and saving the processed data. This means the same data does not need to be processed twice, which significantly reduces the execution time. To aid in these tasks, the class uses three libraries. The Pandas library helps handle the data, the math library assists in calculating the lap times, and the train test split library divides the data [19].

    def _preprocess(self):
        # Get the lap percentages in the form of a data frame
        df_y = self._get_lap_percentages()
        # Process the race data
        df_x = self._process_x()
        # Normalize all race data
        df_x = self._normalize_all_columns(df_x)
        # Save both the race data and the lap percentages
        self.save(df_x, df_y)

    def _process_x(self):
        df_x = self.df
        # Remove the data that is considered unnecessary
        df_x = self._remove_unnececary_data(df_x)
        # Remove data that is only available after the race
        df_x = self._remove_cheating_data(df_x)
        # Remove the columns that were explicitly given for exclusion
        df_x = self._remove_columns(df_x, self.columns_to_declude)
        return df_x
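Two of the sub-processes listed above, removing data that is only available after the race and normalizing every column, can be sketched without the author's class. Min-max scaling is an assumption here, since the text only states that every column is normalized. The column set in `POST_RACE_COLUMNS` and the sample values are made up for illustration.

```python
# Columns known only after the race. The names follow fields of the Horse
# class, but the exact set removed by the thesis code is an assumption.
POST_RACE_COLUMNS = {"place", "gallop", "kmTime"}


def drop_columns(table, columns):
    """Return a copy of the table without the given columns."""
    return {name: values for name, values in table.items() if name not in columns}


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical raw data: one list per column, one entry per horse.
raw = {
    "age": [4, 6, 8],
    "starts_life": [10, 30, 50],
    "place": [1, 3, 2],
    "kmTime": [73.5, 74.1, 73.9],
}

features = drop_columns(raw, POST_RACE_COLUMNS)
normalized = {name: min_max_normalize(col) for name, col in features.items()}
print(normalized)  # {'age': [0.0, 0.5, 1.0], 'starts_life': [0.0, 0.5, 1.0]}
```

Dropping the post-race columns before any other step keeps information about the result from leaking into the features the model is trained on.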
Methods 3.1.2.1 Process In Figure 3.3, the high-level data flow of the Preprocessor is shown. For an unpro- cessed race, it begins with the calculation of lap times. The lap times determine the finishing order of the horses and can also be used to calculate the lap percentages, which indicate how close a race was. The lap times are then stored in a separate CSV file. Several data points are now removed that are deemed unnecessary or cheating. For example, race id is removed since it has no impact on race. Similarly, the finishing time and if the horse galloped is also removed since it is a fact known only after the race is completed, and thus effectively would be cheating. Lastly, all the data points for the race are normalized and then saved in a separate CSV file. A more in-depth explanation of the implementation process can be found in Appendix A.1.2. Figure 3.3: An overview of the preprocessing process 3.2 Validator The Validator class is a relatively simple program but very important. As the name suggests, it is used to validate that the scraper and Preprocessor work correctly and catch any inconsistencies. It is not fully automated and is more of a tool to speed up validation and requires user input to determine if a data point is correct. The Validator is constructed in a modular and adaptive way, so if any changes are made to either the scraper or Preprocessor, the Validator does not require any modification. 14 3. Methods 3.2.1 Process The Validator can be given the races it should evaluate or get random races from a CSV. How it gets races are determined in the get races section in Figure 3.4. Once it has the races, it moves on to validate them. It presents the user with each data point in the race and asks whether that is correct. When all data points and races are checked, the results are saved in a TXT file. Appendix A.1.3 provides a more detailed explanation of the process. 
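The validation loop described above can be sketched roughly as follows. This is a minimal illustration, not the Validator from Appendix A.1.3: the names `validate_races`, `ask_on_console`, and `write_report`, the dictionary layout of a race, and the tab-separated report format are all assumptions made for the example.

```python
import random


def ask_on_console(race_id, field, value):
    """Hypothetical prompt; returns True when the reviewer confirms the value."""
    answer = input(f"{race_id}: {field} = {value!r} correct? [y/n] ")
    return answer.strip().lower().startswith("y")


def validate_races(races, ask=ask_on_console, sample_size=None, seed=0):
    """Ask about every data point of the given races, or of a random sample.

    `races` is assumed to map a race id to a dict of field name -> value.
    """
    race_ids = sorted(races)
    if sample_size is not None:
        # "Get random races": draw a reproducible sample of race ids
        rng = random.Random(seed)
        race_ids = rng.sample(race_ids, min(sample_size, len(race_ids)))
    results = []
    for race_id in race_ids:
        for field, value in races[race_id].items():
            results.append((race_id, field, value, ask(race_id, field, value)))
    return results


def write_report(results, path):
    """Save one line per checked data point to a TXT file."""
    with open(path, "w", encoding="utf-8") as report:
        for race_id, field, value, ok in results:
            status = "ok" if ok else "WRONG"
            report.write(f"{race_id}\t{field}\t{value}\t{status}\n")
```

Passing the question function in as a parameter keeps the loop independent of how the user is asked, which fits the stated goal that the Validator should not need changes when the scraper or Preprocessor changes.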
Figure 3.4: An overview of the Validator process

3.3 Evaluation

The evaluation part of the project is essential for determining how well a model is performing. The evaluation can be divided into two parts. One measures different metrics and calculates new ones. The other creates baselines against which the machine-learning models are compared. Both are equally important in determining the performance of a model.

3.3.1 Evaluator

The purpose of the Evaluator is to take predictions and correct answers, compare them and, from that comparison, calculate metrics and graphs that showcase the accuracy of the predictions. The predictions are given in a list, so the Evaluator cannot determine whether the predictions came from a neural network model or a trivial baseline. This is by design and means that the Evaluator assesses an advanced model and a baseline identically. The Evaluator also handles storing these metrics as well as the tables and graphs created, which makes it easier to compare models and baselines against each other. The Evaluator is also structured so that it is effortless to add additional metrics, which has been crucial since the Evaluator has evolved throughout the process.

3.3.1.1 Process

The race evaluation starts with translating the predicted and correct lap percentages to finishing positions for a more straightforward comparison. This comparison takes place during the Evaluate Races process in Figure 3.5. During this process, "totals" are also calculated, such as how many horses actually finished 5th, how many were correctly predicted 5th, and so on. The process then moves on to calculating the metrics. Here the totals from the previous step are used to calculate numerous percentages, such as accuracy. These metrics are then saved to disk. In addition, some are used to generate graphs that are also saved. For additional information regarding the implementation, see Appendix A.1.4.
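As an illustration of the steps above, the following sketch turns lap percentages into finishing positions and derives three of the metrics used later in the results: overall accuracy, average placement distance, and top x accuracy. It is a simplified stand-in rather than the Evaluator from Appendix A.1.4; the function names are hypothetical, and it assumes each race is given as a list containing only the horses that started.

```python
import numpy as np


def to_positions(lap_percentages):
    """Rank horses by lap percentage; the lowest value gets position 0."""
    order = np.argsort(np.asarray(lap_percentages), kind="stable")
    positions = np.empty(len(order), dtype=int)
    positions[order] = np.arange(len(order))
    return positions


def evaluate(predicted, actual):
    """Compare predicted and actual lap percentages race by race."""
    correct = total = distance = 0
    winner_ranks = []
    for pred, act in zip(predicted, actual):
        p, a = to_positions(pred), to_positions(act)
        correct += int(np.sum(p == a))            # exact position matches
        distance += int(np.sum(np.abs(p - a)))    # how far off each horse was
        total += len(a)
        # Predicted position of the horse that actually won
        winner_ranks.append(int(p[np.argmin(a)]))
    ranks = np.array(winner_ranks)
    largest_field = max(len(act) for act in actual)
    return {
        "accuracy": correct / total,
        "average_distance": distance / total,
        # Share of races whose winner is among the x best-predicted horses
        "top_x_accuracy": [float(np.mean(ranks < x))
                           for x in range(1, largest_field + 1)],
    }
```

Because the function only sees two lists of numbers, it treats a neural network and a baseline in exactly the same way, which is the property the Evaluator is designed around.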
Figure 3.5: An overview of the Evaluator process

3.3.2 Baselines

To determine how the developed machine-learning approaches performed, they were compared to trivial baselines. Since baselines differ vastly in accuracy, several were used, with varying degrees of complexity. The overall structure of each baseline is similar and also shares some similarities with the machine learning models. None of the baselines uses any form of learning. However, each baseline has a prediction method similar to that of the machine learning model: it takes race data and, depending on the baseline, uses that data to determine and return the race’s finishing order. These similarities make it possible to evaluate the baselines with the same Evaluator that the machine learning models use.

3.3.2.1 Random Baseline

As the name suggests, this baseline ignores all the given data and returns a random finishing order. It is the least complex and worst-performing baseline. It was primarily used early on as a sanity check and as a template for later baselines. The code for this baseline is shown below and gives an idea of how the other baselines are structured.

    import random

    class RandomBaseLine:
        @staticmethod
        def predict(race_data, horses):
            # One random permutation of finishing positions per race
            result = []
            for _ in range(len(race_data)):
                result.append(random.sample(range(horses), horses))
            return result

3.3.2.2 Starting positions baseline

This baseline determines the predicted finishing order from each horse’s starting position and aims to demonstrate a connection between the data and the result, since some starting positions are better than others. The baseline is still relatively simple: it predicts the horse with the first track position as the winner and the remaining horses in order of track position. It is, however, worth noting that the best track positions do not necessarily go from the first to the last.
It is usually best to have one of the first track positions; however, some horses have trouble running directly next to or behind the car, meaning that a middle position can be one of the worst depending on the horse. Also, in the case of a volt start, multiple horses can have the same track position, and in that case, the horse with the lowest number is prioritized, since that horse starts in the front row.

    import numpy as np

    def predict(race_data, horses):
        result = []
        track_position = race_data.filter(regex="_track")
        # Iterate over rows (one row per race), not over column labels
        for data in track_position.to_numpy(copy=True):
            race_result = [0] * horses
            order = np.sort(data)  # lowest track position first
            replacer = -2  # initialisation assumed, mirroring the win percentage baseline
            for i in range(horses):
                index = data.tolist().index(order[i])
                race_result[index] = i
                data[index] = replacer  # mark as used so ties resolve to the next horse
                replacer -= 1
            result += [race_result]
        return result

3.3.2.3 Win percentage baseline

This baseline uses the overall win percentage of each horse to determine the finishing order: it returns the horses ordered from the highest win percentage to the lowest. Unlike the previous baselines, this baseline effectively uses previous race results to determine the finishing order, presumably leading to a significant increase in accuracy and performance.

    class WinPercentageBaseLine(BaseLine):
        @staticmethod
        def predict(race_data, horses):
            result = []
            win_percentage = race_data.filter(regex="_win_percentage")
            # Iterate over rows (one row per race), not over column labels
            for data in win_percentage.to_numpy(copy=True):
                race_result = [0] * horses
                order = np.sort(data)[::-1]  # highest win percentage first
                replacer = -2  # marker value that cannot occur in the data
                for i in range(horses):
                    index = data.tolist().index(order[i])
                    race_result[index] = i
                    data[index] = replacer  # mark as used so ties resolve to the next horse
                    replacer -= 1
                result += [race_result]
            return result

3.4 Machine learning models

3.4.1 Tensorflow and Keras

TensorFlow[20] and, more specifically, Keras[21] will be used for the first implementation of the machine learning model. Both are open-source deep-learning libraries compatible with Python. They have a user-friendly and straightforward interface for building, training, tuning, and evaluating neural networks.
They are well suited to a first implementation and should give a good first indication of whether the model can find any patterns between the input and output.

In more detail, a Keras machine learning model comprises interconnected layers, each tasked with applying a specific operation to the input data. During the training phase, the model learns to match input data to the output data, which involves tuning the weights and biases of the model. This tuning aims to minimize a loss metric calculated by the given loss function. The loss function indicates the difference between the actual output and the model’s predicted values.

At a high level, a typical Keras model consists of a description of the different layers in the neural network. The architecture includes the input layer, which needs to have the shape of the input data, and the output layer, which must match the form of the output data. The architecture can also include multiple hidden layers, which perform specific operations on the data from the previous layers. The model then needs to be compiled, which includes defining the optimizer, the loss function, and the evaluation metrics. The task of the optimizer is to tune the model’s parameters (weights and biases). The loss function, as mentioned previously, measures the difference between the model prediction and the actual output values. Furthermore, evaluation metrics such as accuracy and precision help assess the model’s performance.

Training the model then commences and is usually the most time-consuming step in the process. The model receives input and output training data and starts iterating over it. The loss metric is computed during each iteration, and the model parameters are updated. The process is repeated for a number of so-called epochs, each of which is a complete pass through the dataset. After the training, the model can be evaluated using a test dataset.
The test dataset includes data the model has not encountered before, which indicates the model’s actual performance and can uncover overfitting and underfitting. Depending on the evaluation results, the model’s hyperparameters can be fine-tuned. These are parameters such as learning rate, batch size, layer configuration, and many more. This process involves a lot of trial and error.

The following sections describe the actual model used in this thesis. As mentioned before, the exact values of the hyperparameters are subject to trial and error, so precise values are presented in the results section.

3.4.1.1 Model

This model aims to get the input and output layers working properly and to have the machine learning model compile correctly. It is also expected to improve between training epochs and, subsequently, to perform better than the random baseline. However, the model is still simple enough that training and testing are relatively fast, which eases the tuning phase. Actual hyperparameter tuning will also be performed on this model. The code can be seen below.

The model consists of three layers: the input and output layers, which are separated to leave room for a dense layer between them. The dense layer has 128, 256, 512, or 1024 neurons and uses Tanh, ReLU, or Sigmoid as its activation function. The optimizer is Adam, with a learning rate of 0.001, 0.0001, or 0.00001, and the loss function is mean absolute error (MAE). Which hyperparameters perform best is presented in the results section.

To summarise, the goal of this model is to find which activation function works best, which number of neurons in the single hidden dense layer gives the best performance, and which learning rate gives the best accuracy. The input data for this model are all the features described in Section 3.1. The total dataset contains 10,000 races, of which 8,000 are used for training and 2,000 for validation. The data processing will be divided into two different experiments.
In experiment one, no normalization is done, and the data is in its rawest form. In experiment two, the data is normalized. Hyperparameter tuning will be performed in both experiments.

3.4.1.2 Hyperparameter tuner

The model discussed in the previous section requires significant tuning and testing. For example, the intermediate model has three options for learning rate, three for activation function, and at least four for the number of neurons. If all combinations are to be tested, that would be 36 different models, which takes quite a lot of time depending on the data set size. Doing this testing manually would be tedious and an inefficient way to search. However, Keras Tuner is an open-source, flexible, and powerful Python library that can manage the hyperparameter optimization process for Keras models.

Keras Tuner works by using different search algorithms, such as random search, Hyperband, and Bayesian optimization, to discover the best-performing hyperparameters. The library starts with the definition of all possible hyperparameter values and then iteratively evaluates different combinations. The library uses a user-defined objective function to determine which combination works best. The objective function can, for example, be the validation accuracy or loss. At the end of the process, the tuner returns the best-performing model according to the objective function. Below is the code for the intermediate model.

    # Imports assumed; the original listing omits them
    from keras_tuner import RandomSearch
    from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam

    def build_simpel_model_tuner(hp):
        model = Sequential()
        activation = hp.Choice("activation", values=["tanh", "sigmoid"])
        model.add(Dense(x_train.shape[1] + 1, activation=activation,
                        input_dim=x_train.shape[1]))
        model.add(Dense(units=hp.Choice("units", values=[128, 256, 512, 1024]),
                        activation=activation))
        model.add(Dense(15))
        model.compile(
            optimizer=Adam(learning_rate=0.0001),
            loss='mae',
        )
        return model

    earlyStopping = EarlyStopping(
        monitor='val_loss', patience=80, verbose=1, mode='min',
        restore_best_weights=True, min_delta=0.001)

    reduce_lr_loss = ReduceLROnPlateau(
        monitor='val_loss', factor=0.1, patience=40, verbose=1,
        min_delta=0.001, mode='min')

    tuner = RandomSearch(
        build_simpel_model_tuner,
        project_name="experiment_1_simple_model",
        objective="val_loss",
    )

    tuner.search(x_train, y_train, epochs=10000,
                 validation_data=(x_test, y_test), batch_size=200,
                 callbacks=[earlyStopping, reduce_lr_loss])

3.5 Multi-Event Decision Making

With a machine learning model that can predict individual races (see Section 4.1), the focus shifted to predicting systems of races, such as V75, as explained in Section 1.1.1. To achieve this, two picking algorithms of differing complexity were developed. Although the algorithms vary, they have a lot in common. For instance, they all require at least three input variables: a budget, a price, and a list. The budget is the maximum expense a system is allowed to incur. The price is what one row in the system costs. Lastly, the list contains races, and each race holds the predicted percentage of each horse together with its unique id. Each algorithm also returns the same type of structure. For each race given, the algorithm returns a list indicating by id which horses to pick. So, for example, in V75, the price would be 0.5 kr, and the list would contain seven races. The return structure would be a list of seven lists containing the ids of the horses to pick in each race.
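To make the input and return structures concrete, the following sketch shows one possible representation together with the cost of the resulting system. The dictionary layout, the horse ids, the percentages, and the name `system_cost` are hypothetical; the sketch assumes the usual rule that a system contains one row for every combination of picked horses, so the number of rows is the product of the number of picks per race.

```python
from math import prod

ROW_PRICE_V75 = 0.5  # kr per row, as stated in the text


def system_cost(picks, price):
    """Total cost of a system: rows (product of picks per race) times price."""
    return prod(len(race_picks) for race_picks in picks) * price


# Hypothetical input: seven races, each mapping horse id -> predicted
# percentage behind the winner (lower is better).
races = [
    {3: 0.0, 8: 1.2}, {1: 0.0, 7: 0.1}, {2: 0.0, 5: 2.4}, {4: 0.0, 9: 0.3},
    {6: 0.0, 12: 1.9}, {8: 0.0, 1: 3.1}, {10: 0.0, 11: 0.2},
]

# Hypothetical return value: one list of picked horse ids per race.
picks = [[3], [1, 7], [2], [4, 9], [6], [8], [10, 11]]

rows = prod(len(race_picks) for race_picks in picks)  # 1*2*1*2*1*1*2 = 8
cost = system_cost(picks, ROW_PRICE_V75)              # 8 rows * 0.5 kr = 4.0 kr
```

Because the cost grows multiplicatively, adding a second horse to a race doubles the number of rows, which is why a picker has to be selective about where it spends the budget.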
However, the algorithms can also handle other game modes, such as V64 or V86.

Since the picking algorithms only predict one system and do not handle dividing the races into systems, some preprocessing must be performed. The SystempredictionManager handles this. It takes two lists of races, one with the horse ids and the predicted percentage for each horse and the other with the horse ids and the actual percentages. Both lists are in the same order, meaning that the first race in the predicted list is the same race as the first in the list of actual percentages. The lists are then divided into systems of a given size, for example, 7. With the division complete, picks for each system in the prediction list are calculated using a given picking algorithm.

The following sections list and explain the picking algorithms that the SystempredictionManager uses.

3.5.1 Greedy naive odds baseline

The greedy naive odds baseline is similar to the greedy picking algorithm; however, instead of using the predicted percentage, the baseline uses the odds. So this baseline first picks, in each race, the horse that the odds favor to win. After that, it finds in each race the horse that is closest to the favorite according to the odds, and in the race where that gap is the smallest, it adds that horse. This process is repeated until the budget is reached. Presumably, this baseline will perform very well; however, using this approach will lead to a minimal payout, as described in Section 1.1. So although this baseline is highly likely to beat the ML model in the number of correct systems, that does not necessarily mean it surpasses it in payout. However, since this thesis only focuses on the number of correct systems, this will not be observed in the results.

3.5.2 Greedy naive picking algorithm

The greedy naive picker is a simple, straightforward algorithm. It begins by picking the best horse in each race, and since the ML model predicts the percentage by which each horse finishes behind the winner, that is the horse with the lowest percentage. It then looks at each race, finds the second-best horse, and calculates the distance between the 1st and 2nd. It then finds the horse with the smallest distance to the winner and calculates what the system’s total cost would be if that horse were added. If the cost is under the budget, the horse is added to the picks and removed from the list of available horses. However, if the total cost exceeds the budget, the horse is not added to the picks but is still removed from the list of available horses. This process is repeated until there are no more available horses to choose from.

4 Results

This chapter presents the results and performance of both single-race and system predictions. Additionally, some analyses and observations are discussed.

4.1 Single race predictions

Single race prediction is not the primary objective of this thesis and is merely a stepping stone towards the primary purpose of picking a system of races. However, it is a fundamental step, and selecting a system would be impossible without it. The following sections present the results of different race predictions, covering both different types of machine learning models and different kinds of data processing.

4.1.1 Baselines

4.1.1.1 Random Baseline

The random baseline, as expected, performed very poorly, with an overall accuracy of 6.8%; however, that is not a very good metric. The overall accuracy includes all positions, but predicting first place is what matters most. The placement accuracy figure (Figure 4.1) therefore shows the accuracy for each position. This figure also shows some volatility toward the later positions. For example, at position 11, the accuracy is 8.4%, while at 14 and 15, it is zero. These accuracy levels can be explained by the fact that there are few races with this many horses.
Therefore, a single horse at finishing position 14 or 15 represents a much larger percentage.

Since the ultimate goal is to predict multiple races and, in some cases, pick numerous horses in a single race, it is interesting to see whether the winner is included among the top x best-predicted horses, where x can be 1 up to the number of horses in the race. The accuracy when x is one is already known, since only the predicted winner is taken: 6.6%. Figure 4.2 shows the accuracy when, for example, the two best-predicted horses are included.

Additionally, a metric was calculated to show how far off the predicted finishing position was from the actual finishing position on average. So, for example, if the actual placement was first and the horse was predicted to finish fourth, that would be a distance of three. Figure 4.3 shows the average distance the prediction was off at each actual placement. The random baseline has an average distance of 4.53.

The random baseline does not perform well. However, it indicates what poor performance looks like and, by contrast, what good performance might be. This baseline can be seen as the floor of expected performance. The average distance and top x accuracy figures also show some interesting patterns. In the average distance figure, a U shape can be observed. This shape arises because a horse that actually finishes in the middle, for example seventh, can be mispredicted in both directions but never by many places, while a horse that finishes first can only be predicted lower, by up to the full size of the field. The top x accuracy figure is linear, which is to be expected with random picking.

Figure 4.1

Figure 4.2

Figure 4.3

4.1.2 Starting track

The starting position baseline performs better than the random baseline, indicating a correlation between the horse’s starting and finishing positions. It has an overall accuracy of 8.33% and an average distance of 3.78, both better than random. A few interesting observations can be made in Figures 4.4, 4.5, and 4.6.
In Figure 4.4, we see that the starting position is better at indicating some finishing positions than others, such as 1, 2, 8, 10, and 12. The U can still be observed in the average distance figure but is flatter, and its center has moved to the left, closer to the top positions. Lastly, the top x accuracy is starting to show a nonlinear pattern, indicating that the winner is more often found among the top predicted horses.

This baseline is still fairly poor, but it does indicate what finding a pattern between the data and the finishing result might look like. Although the pattern is weak, it is still better than the random baseline.

Figure 4.4

Figure 4.5

Figure 4.6

4.1.3 Win percentage

The win percentage baseline shows considerable improvements, indicating a strong correlation between the horse’s win percentage and its finishing position. It has an overall accuracy of 12.39% and an average placement distance of 2.99. However, the most significant improvement can be seen in the placement accuracy in Figure 4.7, where finding the winner has an accuracy of 24.3%, a massive improvement over the other baselines. Considering that this baseline uses the win percentage, this is not very surprising. Other observations that can be made in Figures 4.8 and 4.9 are that the U in the average distance graph is considerably flatter towards the top positions, and the top x figure is no longer linear but rather logarithmic.

This baseline demonstrates an acceptable level of performance, mainly when predicting the winner. Additionally, it highlights the patterns in the average distance and top x accuracy figures that should be aimed for.

Figure 4.7

Figure 4.8

Figure 4.9

4.1.4 Experiment 1

The best-performing hyperparameters for the model in Experiment 1, described in Section 3.4.1.1, were the "tanh" activation function, 512 neurons in the single hidden layer, and a learning rate of 0.0001.
Below is the exact implementation of this model:

    model = Sequential(name=name)
    model.add(Dense(input_dim + 1, activation="tanh", input_dim=input_dim))
    model.add(Dense(units=512, activation="tanh"))
    model.add(Dense(15))
    # MAE loss as described in Section 3.4.1.1
    model.compile(optimizer=Adam(learning_rate=0.0001), loss="mae")

The model had an MAE of 2.53 at its best, and the progression during the training process can be seen in Figure 4.10. However, the MAE does not give a good indication of how this translates into actual placement prediction, so with the help of the Evaluator described in Section 3.3.1, we see an overall accuracy of 11.34%. Figure 4.11 shows the accuracy for each position, and here it can be observed that the accuracy for predicting the winner, 16.9%, is significantly higher than the overall accuracy. It is also clear that the model has found some form of pattern, since it performs considerably better than the random baseline.

Figure 4.10

Figure 4.11

The top x accuracy graph (Figure 4.12) shows a logarithmic improvement similar to that of the win percentage baseline, and the average distance graph (Figure 4.13) shows a somewhat flattened U shape compared to the random baseline.

Figure 4.12

Figure 4.13

4.1.5 Experiment 2

In Experiment 2, the data is also normalized, meaning that every value is between 0 and 1. Normalizing means that each feature is treated equally; before, higher values had a larger effect than smaller values. The model that performed best during the tuning process is very similar to the previous one, except that it has 1024 neurons in its hidden layer instead of 512. Another difference is that this model trained significantly faster than the previous one, which can be observed by comparing Figures 4.14 and 4.10: the previous model needed more than 1500 epochs to stop learning, while this model needed fewer than 700.

With normalization enabled, we have an overall accuracy of 13.42%, which is an improvement but not a large one.
However, once again, this metric only shows part of the picture. If we look at the placement accuracy in Figure 4.15, we see that the accuracy for the winner is significantly higher, with an improvement of 7.5%. The same can be observed in predicting 2nd place, which increased by 3%. The rest of the placements grew by roughly 1%. A similar trend can be seen in the average distance in Figure 4.16, where placements 1 and 2 improved the most, with the exception of placement 14, which is probably an outlier due to only a few races having 14 horses. The top x accuracy also shows a significant improvement, as seen in Figure 4.17. With the top 3 horses included, this model reaches almost the same accuracy as the previous model did with the top 4.

Figure 4.14

Figure 4.15

Figure 4.16

Figure 4.17

4.2 System race predictions

Only the best-performing machine learning model from each experiment was used for system prediction. However, multiple picking algorithms, such as greedy, were tested with each model. The algorithms were also tested when they were forced to select only one horse in x races, where x can be 1 through 4. This allows the picker to pick more horses in the uncertain races and forces it to choose only one horse in the races where it thinks a clear winner is present.

To simulate a V75, the algorithms were tested with three budgets: 124, 1024, and 8192. The algorithms predict the outcomes of seven random races. Section 1.1.1 explains that a V75 only results in a payout if at least five races are correctly predicted. If the algorithm gets fewer than five correct, it is considered a complete miss. The pickers were tested on the test dataset, which is roughly 2000 races, which means 285 systems with a size of 7.

4.2.1 Experiment 1

4.2.1.1 Greedy Algorithm

With the data in its rawest form and a budget of only 124 kr, the greedy algorithm got five right in only 11 systems. This amount corresponds to 3.9% of the 285 systems it tried to predict.
The algorithm guessed, on average, 2.5 of the seven races correctly. The accuracy increased slightly when the algorithm was forced to pick only one horse in 3 races: it then got five right in 4.5% of the systems and found the winner in 2.6 of the seven races.

With a budget of 1024 kr and without being forced to pick one horse in any race, the greedy algorithm got three systems with six right and 37 systems with five right. In percentages, this corresponds to 1.1% and 13%. The average number of winners found in each system also increased to 3.16. However, when forced to pick one horse in 4 races, the algorithm got one system with seven correct guesses (0.35%), four systems with six (1.4%), and 35 systems with five (12%), and the average number of winners found in each system increased to 3.6.

Lastly, with a budget of 8192 kr, the algorithm got three systems with seven (1%), 22 with six (7.7%), and 90 with five (31.6%), and found the winner on average 3.9 times in each system. It also saw no improvement when forced to pick one horse in 1-4 races.

4.2.2 Experiment 2

4.2.2.1 Greedy Algorithm

With normalization enabled and a budget of 124 kr, the greedy algorithm gets one system with seven right (0.35%), 13 with six (4.6%), and 60 with five (21.1%), significantly better than Experiment 1. The same can be seen in the average number of correct races in each system, where this experiment gets 3.4, which is better than Experiment 1 with a budget of 1024 kr. However, it does not see any improvement when forced to pick only one horse in 1-4 races.

With the budget increased to 1024 kr, the same trend can be seen: this experiment gets seven systems with seven correct picks (2.5%), 39 with six (13.7%), and 115 with five (40.4%), which is significantly better than even the highest budget of Experiment 1. The average number of correct choices in each system is now up to 4.2. However, as with the lowest budget, no improvement was seen when the algorithm was forced to pick one horse.
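For reference, tallies of this kind can be computed along the following lines. This is a sketch under assumptions rather than the code used for the thesis: the names `correct_races` and `tally` are hypothetical, each system is assumed to be a pair of picks and actual winners, and the counts are treated as cumulative (a system with six correct races also counts as one with five), which is suggested by the fact that the counts reported for the larger budgets sum to more than 285.

```python
def correct_races(picks, winners):
    """Number of races in one system whose actual winner is among the picks."""
    return sum(1 for race_picks, winner in zip(picks, winners)
               if winner in race_picks)


def tally(systems, thresholds=(5, 6, 7)):
    """Summarise a list of (picks, winners) pairs.

    Returns the average number of correct races per system and, for each
    threshold, how many systems reached at least that many correct races
    together with the corresponding percentage.
    """
    counts = [correct_races(picks, winners) for picks, winners in systems]
    n = len(counts)
    at_least = {}
    for k in thresholds:
        hits = sum(1 for c in counts if c >= k)
        at_least[k] = (hits, 100 * hits / n)
    return {"average_correct": sum(counts) / n, "at_least": at_least}
```

With 285 systems, a single system corresponds to roughly 0.35%, which is a useful yardstick when reading the percentages in this section.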
For the highest budget of 8192 kr, Experiment 2 gets 23 systems with seven right (8.1%), 90 with six (31.6%), 186 with five (65.3%), and 4.9 correct picks on average in each system. This result is far better than any previous one; however, once again, there was no improvement when the algorithm was forced to pick one horse.

4.2.3 Baselines

4.2.3.1 Greedy odds baseline

The greedy odds baseline did unsurprisingly well, always taking the horse with the best odds until the budget was reached. With a budget of 128, it got 15 systems with seven correct races (5.3%), 81 with six correct (28.4%), and 179 with five (62.8%). Forcing the baseline to pick only one horse in two races increased both the number of systems with seven right and the number with five, to 21 (7.3%) and 181 (63.5%). There was a slight decrease in the number of systems with six correct, to 78 (27.3%).

With a budget of 1024, these increase to 51 (17.9%), 145 (50.8%), and 231 (81.1%) systems with seven, six, and five right. With this budget, there was no increase in performance when forcing the baseline to pick one horse in one to three races.

Lastly, with a budget of 8192, the baseline gets 99 (34.7%) with seven, 205 (72%) with six, and 258 (90.5%) with five. As with the previous budget, no improvement was seen when it was forced to pick only one horse in one to three races.

What needs to be remembered, however, is that although this baseline picks many of the correct horses, they are the horses with the lowest payout. This baseline may give less payout than a single seven-correct system with multiple upset wins.

5 Conclusion

5.1 Discussion

A few conclusions can be drawn from the results presented in Section 4. A significant one is that the accuracy of both the single race prediction and the system prediction indicates that the machine learning approach has found a connection between the input data and the finishing results, since both perform significantly better than the random baselines.
Each increasing step with the budget showed an unsurprising rise in overall accuracy and the number of systems with five, six, and seven correct races. However, Figure 5.1 shows that despite the increase in the budget, the algorithm’s performance did not have a proportional increase in any of the two experiments. The conclusion that can be drawn from this is that the algorithm efficiency will presumably decrease as the budget increases. Figure 5.1 Another significant conclusion that can be drawn is that normalizing the data had a considerable performance boost. The top two graphs in Figure 5.2 show the accuracy between the two experiments, and the two bottom shows how much the accuracy 35 5. Conclusion increased between the two budgets. Experiment 2 considerably outperforms Exper- iment 1. It even does this with a one-level lower budget in every case, which is extremely impressive considering the jump between budgets 2 and 3. The dispro- portion between the accuracy and the budget can again be seen in these figures, strengthening the presumption that efficiency decreases as the budget increase. Figure 5.2 Forcing the algorithm to only choose one horse in one to four races showed no measurable improvement in any case, which is interesting, as that is a common strategy among human players. This result suggests that it is too uncertain only to pick one horse if the data does not support it, even if it allows for more horses in other races. However, the result shows that the machine learning model has found a pattern between the data and the finishing result, indicating that solving Multi-objective optimization by Machine Learning is possible; however, a significant part of the project has been spent on data collection and processing showing that data quality and quantity are paramount. For this approach to be applicable to other problems, such as stock market or traffic management, sufficient high-quality data must be available. 36 5. 
5.2 Future work

One of the most obvious improvements is to use the existing V75 systems instead of combining seven random races when evaluating the system picker. This implementation should be straightforward but requires changes to the scraper and a new process that connects the related races. Doing this should give a better indication of how well the system actually performs. It should also be an excellent way to determine whether combining seven random races is a good way to evaluate the system pickers, by analyzing whether their performance is similar.

However, there are a few drawbacks with only using actual V75 races. Since there is roughly one every week, there are only about 52 every year, so having 200, as with the random ones, would require scraping roughly four years back. There is also the issue of not having any of these V75 races in the training data set. In the random system picker, this was solved by only taking races from the test data set; however, this is not possible here, seeing how small the test set is. One solution is to remove the V75 races that will be used to test the system picker from the training data set; however, this introduces the issue of testing on older races than those used for training. These drawbacks presumably mean that a combination of random and actual V75 races is optimal.

One way to address many of the issues with only using V75 is to include more systems than V75, such as V64, V65, and V86. Including these will increase the number of systems to test; however, these systems have different costs and payouts at different numbers of correct races. So presumably, some distinctions need to be made in the system evaluation to separate them.

With the addition of using real systems to evaluate, new metrics can also be calculated. One of the most interesting ones would be to measure the payout and profit/loss.
This metric gives a significantly better indication of how well the picker performs, since the payout can vary greatly depending on which horses won. For example, a system with five correct races can have a bigger payout than a system with seven correct, or a system with seven correct can have a payout 10000 times larger than a system with five correct. With these new metrics, the optimal budget can be calculated. This would be the budget that gives the most profit or the smallest loss.

Further future work could be additional feature engineering. Currently, the feature engineering is rather basic and only calculates the average position in the last five races and similar data. However, more advanced data could be calculated, such as previous performance on the same track, with the same starting position, or with the same driver, and much more. Performance could be defined by average speed, position, or win percentage. For instance, new data points could include the average position, win percentage, and average time on the current track. With these new features, some feature selection could also be performed to include only the beneficial features and not those that dilute the data.

Something also prevalent throughout this process was that increasing the size of the training data set increased performance. So finding the optimal size of the training data set, where the performance increases subside, would be interesting but time-consuming.

Lastly, more picking algorithms could be tested as well. The current greedy one is very simple and the most obvious one. For example, a similar one could check whether taking the best horse or the second and third best horses is cheaper, which would increase the number of good horses it takes. Presumably, most additional picking algorithms will be more advanced variants of the basic greedy one.
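As a starting point for such variants, a basic greedy picker could look roughly like the sketch below. It is one possible reading of the greedy idea and not the thesis implementation. It assumes that the model outputs a predicted percentage behind the winner for every horse (lower is better), that the cost of a system is its number of rows (the product of the number of horses picked in each race), and that the budget is expressed in rows. The names greedy_pick and system_rows are hypothetical.

```python
from math import prod


def system_rows(picks):
    """Number of rows in a system: the product of horses picked per race."""
    return prod(len(race_picks) for race_picks in picks)


def greedy_pick(predictions, budget_rows):
    """Greedily build a system from per-race predictions.

    predictions: one list per race with the predicted percentage behind the
                 winner for each horse (lower is better, at least one horse).
    budget_rows: maximum number of rows allowed (assumed cost model).
    Returns one sorted list of picked horse indexes per race.
    """
    # Horse indexes per race, best predicted horse first.
    ranked = [sorted(range(len(race)), key=race.__getitem__) for race in predictions]
    # Start with the best predicted horse in every race (a one-row system).
    picks = [[order[0]] for order in ranked]

    while True:
        best = None  # (predicted value, race index, horse index)
        for r, order in enumerate(ranked):
            if len(picks[r]) == len(order):
                continue  # every horse in this race is already picked
            horse = order[len(picks[r])]  # next-best unpicked horse in race r
            # Rows the system would have if this horse were added.
            rows = system_rows(picks) // len(picks[r]) * (len(picks[r]) + 1)
            if rows > budget_rows:
                continue
            candidate = (predictions[r][horse], r, horse)
            if best is None or candidate < best:
                best = candidate
        if best is None:  # nothing more fits within the budget
            return [sorted(race_picks) for race_picks in picks]
        picks[best[1]].append(best[2])
```

Because the horse with the smallest predicted gap is added first, races that the model deems close end up with more picks than one-sided races. A variant such as the one described above would only need to change how the candidate is chosen inside the loop.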
5.3 Conclusion

To conclude, this thesis explores the possibility of Multi-objective optimization by Machine Learning and, more specifically, analyzes it using horse racing. Somewhat encouraging results were found, since the machine learning model discovered a connection between the data and the finishing position. This resulted in an accuracy notably better than the random baseline and slightly better than the more advanced baselines. As for system prediction, the result was still good but did not beat the odds baseline. However, more research is needed, both analyzing which races the predictor got right and adding the profit metric. The strategy of choosing only one horse, commonly employed by human players, did not yield better results, suggesting that the ML strategy differs from traditional human ones.

Future work was also discussed, suggesting further system improvements and addressing challenges and limitations, such as expanding feature engineering and selection and testing new picking algorithms. In summary, this study opens up new opportunities for applying machine learning to multi-objective optimization problems and lays the basis for future exploration.
A Appendix 1

A.1 Implementation

A.1.1 Scraper

The scraping class is initialized with only a single value. This value represents the desired number of races that should be acquired and is subsequently used in naming the resulting CSV file. Because of this naming convention, the scraper can first check if a file with this name already exists and, if so, return that file without doing any actual scraping. This feature saves significant time; however, only the name is checked and not the content, meaning that if modifications have been made to the scraper class since the file was created, they will not be reflected in it.

When no previous file exists, the scraper invokes the method “scrape and save”. This method oversees the scraping and storing of race data, determines when to stop, and prints output to indicate how far along the scraping process is. The code for the method can be seen below.
More specifically, the method has a while loop that scrapes, saves, and keeps track of the total number of scraped races. The while loop stops when the total number of races reaches the desired size. In each iteration, the method scrapes an archive page. An archive page can be seen as a list of race IDs in chronological order from new to old, and each page consists of 50 IDs. These IDs are extracted with the method “get race ids” and are then used to scrape the individual races in the method “get races,” which will be explained in detail later; in essence, it returns the races as a list of race objects. Lastly, these races are saved with the “save races” method, and the page number is incremented. Saving inside the loop is crucial since it means the method saves continually throughout the process, and depending on the desired size, the execution time can be multiple hours. Saving like this mitigates the impact of unforeseeable events, such as power or internet outages and outliers in the data that cause errors. Now the different parts of the while loop will be explained in more detail, starting with the scraping of archives.

```python
def _scrape_and_save(self):
    page = 0  # assumed index of the first archive page; not shown in the original
    # Stop once the number of scraped races reaches the desired size.
    # self.size is assumed to be updated when the races are saved.
    while self.size < self.goal_size:
        print(str(self.size) + "/" + str(self.goal_size))  # progress output
        archive = self._scrape_archive_page(page)
        race_ids = self._get_race_ids(archive)
        races = self._get_races(race_ids)
        self._save_races(races, page)  # save after every page
        page += 1
```

The scrape archive page method uses the request library to generate a POST request acquiring the archive page at the given page number. The page and its content are then saved in a text file before being returned. This ensures that no unnecessary POST requests are ever performed: if the same page is asked for again, it is loaded from the file and returned instead of being requested. The returned archive page contains much information, most of which is not of interest to the scraper, and the page is also in a non-desirable format.
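The save-before-return behaviour of the archive scraping can be sketched as follows. This is a minimal illustration under stated assumptions, not the scraper's actual code: the function name load_or_fetch, the cache file naming, and the fetch callable are all hypothetical. In the scraper, fetch would wrap the POST request; here it is passed in so that the caching logic can be shown without any network access.

```python
from pathlib import Path


def load_or_fetch(page, cache_dir, fetch):
    """Return the text of an archive page, requesting it only on a cache miss.

    fetch is a callable that takes the page number and returns the page text;
    in the scraper it would wrap the POST request (hypothetical here).
    """
    cache_file = Path(cache_dir) / f"archive_{page}.txt"  # hypothetical file name
    if cache_file.exists():
        # The page was scraped before: load it instead of requesting it again.
        return cache_file.read_text(encoding="utf-8")
    text = fetch(page)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")  # save before returning
    return text
```

As with the file check in the scraper class, only the file name decides whether the cache is used, so a stale file has to be deleted manually if the page content is expected to have changed.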
The clutter and the inconvenient format of the archive page are what the “get race ids” method addresses. This method loops through all the races in the archive page, extracts each race’s ID using the json library, and returns them. These IDs are, as previously mentioned, then used in the “get races” method. In this method, the IDs are iterated through, and each one is used to scrape the corresponding race. This race scrape contains information about the race and all the participating horses. Similar to the archive pages, each race scrape is saved, so it only has to be performed once. Another similarity is that the scraped race contains some clutter and is in a non-desirable format. Therefore, the json library is used again in the “scrape to race” method to turn the scrape into a list of race objects and return them to the previously mentioned “scrape and save” method, which then saves them.

The “save races” method is relatively straightforward. It takes the given race objects and turns them into a data frame using the pandas library, which is then used again to concatenate the new race data frame with the previously saved races. After all the desired races have been scraped, extracted, and saved, they are sent to the preprocessing class, which is explained in detail in the next section.

A.1.2 Preprocessor

The DataPreprocessing class takes four initiating parameters:

• df: the unprocessed data in raw form
• A name used to save and load the processed data
• A path that states where the processed data should be stored
• A boolean value which indicates whether the data should be normalized

Before any preprocessing, the program checks whether a processed file with the same name already exists. If that is the case, the program assumes that no processing is necessary, loads the located files, and returns them. If no file is found, the program proceeds with preprocessing the given data in df with the method preprocess, which is seen in the code above.
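The conversion that preprocess starts with, multiplying each horse's kilometre time by its distance and expressing the result relative to the winner, can be sketched as follows. This is a minimal sketch under stated assumptions: the exact percentage formula is not given in the text, so the relative difference to the winner's time is an assumption, as is taking the winner to be the horse with the lowest total time. The function name is hypothetical.

```python
def percent_behind_winner(km_times, distances_km):
    """Convert per-horse kilometre times into percentages behind the winner.

    km_times:     seconds per kilometre for each horse in one race.
    distances_km: distance in kilometres for each horse (may differ per horse).
    """
    # Lap time = kilometre time multiplied by the distance run.
    lap_times = [km_time * distance for km_time, distance in zip(km_times, distances_km)]
    winner_time = min(lap_times)  # assumption: the winner has the lowest total time
    # Assumed formula: relative difference to the winner's time, in percent.
    return [100.0 * (lap_time - winner_time) / winner_time for lap_time in lap_times]
```

With this definition the winner always gets 0, and a race in which all values are small can be read as a close race.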
The preprocess method begins by calculating the lap time of each horse in every race. This calculation is done by taking the horse’s km_time in seconds and multiplying it by the distance the horse will run in kilometers. The distance is not always the same for every horse in the same race. These lap times are then used to calculate the percentage of each horse finishing behind the winner. Doing this conversion makes the races more comparable to each other and makes it easier to determine if a race is close. Both calculations are computationally heavy since they iterate through every horse in every race. The calculated lap percentages are then saved as a CSV file named “Y_*name*”.

The program then moves on to processing the race-related data. This is the data that should be available before the race and not after. Because of this constraint, the program first removes all data considered cheating, such as finishing order, speed, and whether the horse galloped; essentially, the data recorded during or after the race. Then, all data deemed unnecessary is removed. Unnecessary data can, for example, be the page number used during the scraping. A data point like this should not impact the actual race result and would only dilute the data set. Lastly, a list of custom features/columns is removed. This step was added to ease the implementation of the data points tester and will be discussed in Section 2.5. The complete data frame of x is then returned and saved accordingly.

A.1.3 Validator

The validator takes a data frame of races as an instance variable and has three primary methods and a few smaller helper methods. The crucial methods are “validate_specific”, “validate_random”, and “_validate_race”. Validate_specific takes a list of integers as a parameter, representing the indexes of races in the data frame. These are the races that should be validated; the method iterates through the integers, gets each race from the data frame, and calls _validate_race.
Validate_random calls validate_specific with a list of random integers and is used to remove any bias in the races chosen to be validated. _validate_race takes a race as a parameter and is tasked with validating it. The code for _validate_race can be seen below; it begins with creating and printing a comment stating the race’s ID and at what track it was run. The method then iterates through every data point in the given race, asking for input on whether the data is correct. Depending on the input, the comment is updated accordingly, and when all data points have been checked, the comment is returned. The comment is then saved at a given path.

```python
def _validate_race(self, race):
    # race is assumed to be one row of the data frame (a pandas Series).
    comment = str(race.id) + " at track: " + str(race.track_id)
    print(comment)
    for index, column in enumerate(race.index):
        datapoint = race.iloc[index]
        print("in race " + str(race.id) + " the " + column + " is " + str(datapoint))
        answer = self._get_input()
        if answer == "y":
            comment = comment + " " + column + " correct"
        elif answer == "n":
            comment = comment + " " + column + " incorrect"
        elif answer == "s":  # skip this data point
            print("skipping datapoint")
        elif answer == "sr":  # skip the rest of the race
            break
    return comment
```

A.1.4 Evaluator

The Evaluator class has no instance variables, and all necessary inputs are given to the three primary methods. The three methods are Evaluate_predictions, Show, and Save. The Evaluate_predictions method takes two parameters, the predicted and the correct answers, and is tasked with determining how good the predictions are compared to the correct answers. The method does this by first iterating through every race and every prediction, checking if the prediction was correct. In addition to the overall number of correct predictions, the number of correct predictions for each position is also saved. However, these stored values are only counts and are almost meaningless on their own, so the method then calculates the percentages, both overall and by position.
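The counting and the conversion to percentages could be sketched as follows. This is an illustrative sketch rather than the Evaluator's actual code. It assumes that both inputs hold one list per race with horse identifiers in predicted and actual finishing order; the input format and the function name are assumptions.

```python
def evaluate_predictions(predicted, correct):
    """Return (overall accuracy, accuracy per position), both in percent.

    predicted, correct: one list per race with horse identifiers in
    predicted and actual finishing order (assumed input format).
    """
    hits_by_position = {}
    totals_by_position = {}
    for predicted_race, correct_race in zip(predicted, correct):
        pairs = zip(predicted_race, correct_race)
        for position, (guess, actual) in enumerate(pairs, start=1):
            totals_by_position[position] = totals_by_position.get(position, 0) + 1
            if guess == actual:
                hits_by_position[position] = hits_by_position.get(position, 0) + 1

    # Raw counts say little on their own, so convert them to percentages.
    total = sum(totals_by_position.values())
    hits = sum(hits_by_position.values())
    overall = 100.0 * hits / total if total else 0.0
    by_position = {
        position: 100.0 * hits_by_position.get(position, 0) / count
        for position, count in totals_by_position.items()
    }
    return overall, by_position
```

Keeping the per-position totals separate from the hits means that races with different numbers of horses are handled without any special cases.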
The show method can now be called to print and show a bar graph of the different metrics. The save method can also be called with a path, and as the name suggests, it saves both the metrics and the graphs.