Learning to Navigate Over Stochastic Transport Networks Using Multi-Armed Bandits: A Contextual Approach for Efficient Online Learning in Road Network Graphs with Multi-Armed Bandits to Minimize Long- Term Travel Time
Ladda ner
Typ
Examensarbete för masterexamen
Master's Thesis
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Publicerad
2024
Författare
Nilsson, Hannes
Johansson, Rikard
Modellbyggare
Tidskriftstitel
ISSN
Volymtitel
Utgivare
Sammanfattning
As part of the ongoing process of phasing out fossil fuel vehicles, attempts have been made to extend the effective range and adoption rate of electric vehicles through navigation systems focused on energy consumption. One way to approach this problem is by viewing route selections as a multi-armed bandit problem. This allows the system to adapt and recommend better routes over time, to minimize energy consumption. For navigation systems to be useful in practice, guiding vehicles from one point to another in minimal time is crucial. Therefore, this project examines the effectiveness of multi-armed bandit algorithms for time-efficient navigation in complex real-world environments, without initial information. For this purpose, we adapt a previously studied online learning framework developed for energy efficiency, and extract road segment travel time distributions from the traffic simulation software SUMO. The framework is applied to the Luxembourg road network and our results demonstrate that contextual multi-armed bandits using tree ensembles are highly effective. More specifically, we show that TEUCB and TETS, which we implement using both XGBoost and random forest, outperform state-of-the-art contextual multi-armed bandits based on neural networks and linear models. Further, by additional comparison of TEUCB and TETS to other bandit algorithms based on tree models, we identify at least two properties to explain their high level of performance. First, tree ensemble methods appear to offer relatively accurate travel time predictions from the contextual information available in this problem. Second, the ability to generalize over different arms and infer the travel time on one road segment from observations gathered on other ones, based on similarities, seems highly advantageous for this problem.
Beskrivning
Ämne/nyckelord
decision-making under uncertainty , online learning , machine learning , reinforcement learning , multi-armed bandits , navigation , shortest path problem