Evaluation of models for forecasting traffic speed and classifying traffic delay in European cities
Master's thesis
Congestion and traffic delays are major challenges for cities today. In this thesis, we use traffic speed data from 2018 to forecast traffic speeds and classify traffic delays. The data is recorded at 15-minute intervals and covers 15 European cities. As a starting point, we investigate whether machine learning can be used to forecast traffic speeds, and specifically whether more advanced forecasting models improve upon simple statistical models. We also investigate whether weather variables such as precipitation and temperature improve the forecasting models. Forecasting models require continuous traffic speed input data in order to make predictions about the future. To create more generic models capable of making predictions regardless of the current traffic speed, we use binary classification models whose goal is to classify whether or not there is a delay at a given point in time. The traffic speed data is transformed into a binary value, 1 when there is a delay and 0 when there is none, which serves as the target variable of the models. For the classification models, we use temporal features, weather features, and graph-related centrality features that describe the road location. We also take into account the infrastructure along a road, as well as the area around a road, that might have an impact on traffic congestion. The larger portion of this work focuses on binary classification of traffic delay. The classification is done to answer three questions. What are the most important features for creating classification models? How well do the models perform on data from cities not seen during training? Do the models improve when the number of training cities increases? Regardless of whether weather data was used in conjunction with traffic speed data as input to the model, more advanced forecasting models did not improve performance significantly.
With regard to the classification results, the most important features were found to be related to public transportation, where bus stops were the most prominent feature, followed by schools. Moreover, it was shown that generalizing the trained models to new city data was indeed possible and that the models performed better than a random classifier (a classifier that guesses the class for an input). Finally, the results showed an overall increase in classifier performance when the number of training cities was increased, but more work is needed to optimize the models in the future.
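The transformation of traffic speeds into a binary delay target can be illustrated with a minimal sketch. The abstract does not state how a delay is defined, so the rule used here (speed below a fixed fraction of a free-flow reference speed) is an assumption made purely for illustration, as are the function name `label_delays`, the parameters `free_flow_kmh` and `ratio`, and the sample speeds.

```python
from typing import List


def label_delays(speeds_kmh: List[float], free_flow_kmh: float,
                 ratio: float = 0.7) -> List[int]:
    """Return 1 for intervals treated as delayed and 0 otherwise.

    Assumption (not taken from the thesis): an interval counts as delayed
    when its speed falls below `ratio` times a free-flow reference speed.
    """
    if free_flow_kmh <= 0:
        raise ValueError("free_flow_kmh must be positive")
    threshold = ratio * free_flow_kmh
    return [1 if speed < threshold else 0 for speed in speeds_kmh]


# Made-up speeds for six consecutive 15-minute intervals on one road.
speeds = [50.0, 48.0, 30.0, 22.0, 41.0, 49.0]
labels = label_delays(speeds, free_flow_kmh=50.0)
print(labels)  # [0, 0, 1, 1, 0, 0]
```

With labels of this form as the target, a classifier no longer needs the current traffic speed as input and can rely on temporal, weather, and location features alone.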
computer science, thesis, traffic data, machine learning, classification, forecasting, congestion, time series