Implementing a machine learning microservice for scoring and predicting vehicle driving attributes and their impact on costs
Type
Master's Thesis
Abstract
Minimizing automotive insurance costs and other operating costs has become a priority within the vehicle industry, because these costs are included in the expanding subscription-based business model for vehicles. Research has shown that automotive insurance costs can be decreased if the automotive company can prove to the insurance company that the driving behavior of its vehicles is better than expected. It has also been shown that additional operating costs, such as providing a replacement car during service, would be reduced by better driving behavior. This study aims to provide an analysis that informs automobile companies how driving behavior attributes (trip data) affect operating costs. The research question is whether an automobile brand that has trip data available can use machine learning to accurately predict service costs and other operating costs for a vehicle. The analysis used a supervised machine learning algorithm called extreme gradient boosting (XGBoost). The model was trained on trip data and operating cost data, and it processes trip data to predict the likelihood that additional operating costs exceed 10,000 kr. In the absence of real data, the data used in the thesis was generated based on car statistics rather than collected from real cars. The work was implemented with a microservice structure, in which multiple small services communicate with each other through APIs. The analysis of the final model demonstrated that it is possible to predict operating costs with fairly good accuracy according to the several evaluation metrics that were used. The final model achieved 57% accuracy in finding vehicles with additional operating costs and 92% accuracy in finding vehicles without additional operating costs. The results indicate that the dataset is too imbalanced, because vehicles requiring additional operating costs are rare. This was handled by using the average trip data per car instead of processing every trip individually. The model's accuracy increased significantly once the imbalance ratio went above one car that requires additional operating costs for every 30 cars that do not.
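The two data-handling ideas in the abstract, averaging trip data per car and reporting accuracy separately for each class, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `(car_id, features)` input shape, and the feature names in the usage note are assumptions made for illustration, and the classifier itself is left out.

```python
from collections import defaultdict
from statistics import mean


def average_trips_per_car(trips):
    """Collapse many trip rows into one row of feature means per car.

    `trips` is an iterable of (car_id, feature_dict) pairs (assumed shape).
    Every trip of a car is assumed to carry the same feature names.
    """
    grouped = defaultdict(list)
    for car_id, features in trips:
        grouped[car_id].append(features)
    return {
        car_id: {name: mean(row[name] for row in rows) for name in rows[0]}
        for car_id, rows in grouped.items()
    }


def per_class_rates(y_true, y_pred):
    """Return (share of positive cars found, share of negative cars found).

    Label 1 means "additional operating costs", label 0 means none.
    Assumes both classes occur at least once in `y_true`.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return true_pos / positives, true_neg / negatives


def imbalance_ratio(y_true):
    """Number of negative cars per positive car (assumes a positive exists)."""
    positives = sum(1 for t in y_true if t == 1)
    return (len(y_true) - positives) / positives
```

As a usage example with made-up values, two trips for car `"A"` with `harsh_brakes` of 2 and 4 are averaged into a single row with `harsh_brakes` equal to 3. That per-car row, not the individual trips, would then be passed to the classifier. Reporting the two rates from `per_class_rates` separately, as the abstract does with 57% and 92%, keeps the rare positive class from being hidden behind a single overall accuracy figure.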
Subject/keywords
Python, Micro-services, Machine Learning, Kubernetes, Azure ML, XGBoost