Towards Unknown Traffic Driving Pattern Discovery with Active Learning
Master's thesis
The promise of autonomous vehicles is frequently discussed, and the traffic landscape is expected to change drastically with autonomous driving (AD) technology. Rigorous testing is therefore essential for reliance on and trust in the system. The vast amounts of labelled data needed for testing are not trivial to obtain. One potential approach to this issue is active learning, whose purpose is to produce a robust data set with minimal human interaction. The aim of this project is to examine the effectiveness of active learning for annotation of scenario data collected by a Volvo Cars Corporation (VCC) vehicle. Active learning trains a classifier on a small initial annotated data set and uses it to determine which unlabelled data points need annotation by a human. The classifier is then retrained on the updated annotated set, and the process repeats until the query budget is spent. In this study, active learning is performed on the latent spaces produced by multivariate Time Series t-Distributed Stochastic Neighbor Embedding (mTSNE), Recurrent Autoencoder (RAE) and Variational Recurrent Autoencoder (VRAE). We investigate which embedding, classifier and query strategy are most suitable for performing active learning on VCC's trajectory data. We also study the impact of different degrees of class imbalance in the data. Area Under the Curve (AUC) and F1 score as functions of the number of queried points are used as measures of performance. In many cases, active learning has proven an effective tool. We conclude that the mTSNE embedding with a Support Vector Machine (SVM) classifier outperforms the other models, with both high AUC and F1 score in addition to a low run time and high stability. Entropy querying is observed to be the most suitable query method. The separability of the latent space generated by mTSNE allows for a less complex model, although the mTSNE transformation itself is computationally very heavy.
RAE also performs well, though combined with a Neural Network (NN) it struggles to detect the smaller class as the class imbalance increases. VRAE proves to be a suboptimal choice of embedding, since it performs worse than the other two. We conclude that for mTSNE, 50 queries are sufficient to reach a high AUC and F1 score for most class imbalances, and that for RAE the corresponding number is 125. The potential of active learning to act as an unknown class detector was also investigated using RAE and VRAE embedded data. Cut-in was regarded as the unknown class, and performance was measured in terms of the number of queried cut-ins. The results show that for budget sizes up to 200 queries, RAE with an SVM classifier queries the most cut-ins, while for larger budget sizes VRAE with SVM queries the most cut-ins.
Active Learning, Unknown Detection, Annotation, Time Series Analysis, mTSNE, SVM, Neural Network, query
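
The active learning loop described in the abstract (train on a small annotated set, query the most uncertain unlabelled point, retrain, repeat until the budget is spent) can be illustrated with a minimal sketch of pool-based active learning with entropy querying. This is not the thesis implementation: the classifier here is a simple nearest-centroid stand-in rather than the SVM or NN used in the study, the data is a toy one-dimensional pool rather than an mTSNE, RAE or VRAE embedding, and all names (`fit_centroids`, `predict_proba`, `entropy`, `active_learning`, `oracle`) are hypothetical and chosen for illustration only.

```python
import math


def fit_centroids(X, y):
    """Stand-in classifier (assumption, not the thesis SVM/NN): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_proba(centroids, x):
    """Class probabilities as a softmax over negative squared distances to the centroids."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu))
              for c, mu in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def entropy(probs):
    """Shannon entropy of a predicted class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def active_learning(X_pool, oracle, initial_idx, budget):
    """Pool-based loop: retrain, query the highest-entropy point, ask the oracle, repeat."""
    labelled = {i: oracle(i) for i in initial_idx}  # small initial annotated set

    def train():
        idx = list(labelled)
        return fit_centroids([X_pool[i] for i in idx], [labelled[i] for i in idx])

    for _ in range(budget):
        model = train()
        candidates = [i for i in range(len(X_pool)) if i not in labelled]
        if not candidates:  # nothing left to query
            break
        query = max(candidates,
                    key=lambda i: entropy(predict_proba(model, X_pool[i])))
        labelled[query] = oracle(query)  # the human annotator in a real setting
    return train(), labelled


# Toy pool: the point at 5.0 lies exactly between the two initial examples.
pool = [[0.0], [10.0], [5.0], [1.0], [9.0]]
true_labels = [0, 1, 1, 0, 1]
model, labelled = active_learning(pool, lambda i: true_labels[i],
                                  initial_idx=[0, 1], budget=1)
print(sorted(labelled))  # [0, 1, 2]: the ambiguous midpoint is queried first
```

With a single query in the budget, the loop selects the point that is equidistant from both class centroids, since its predicted class distribution is uniform and therefore has maximal entropy. In the study itself, `oracle` would correspond to a human annotator and the pool to embedded trajectory data.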