Modelling temporal context for traffic light recognition using RNNs
Master's thesis
Björnsson, David Freyr
Abstract: The purpose of this thesis is to investigate whether including temporal context through recurrent neural networks in real-time object detection systems can improve detection performance in traffic light recognition. This was investigated using the DriveU traffic light dataset. Two variations of the YOLOv4 object detection system were created. The first is an LSTM that takes the bounding boxes predicted by YOLOv4 as input and outputs updated predictions. The second is a modification of the YOLOv4 network in which convolutional layers are replaced with convolutional LSTMs. In a limited number of experiments, the baseline model outperformed the more complicated sequential models. However, there is evidence that this is due to the sequential training strategy: when the YOLOv4 baseline was itself trained with the sequential training strategy, it was outperformed by some of the sequential models. On a held-out test set, the baseline YOLOv4 model achieved the best performance, and the best sequential model achieved lower detection performance. Modelling temporal context using recurrent neural networks may improve detection performance, but answering that question requires an exhaustive search for a training strategy and model architecture. The analysis conducted in this thesis provides no evidence that modelling temporal context with YOLOv4 improves traffic light recognition performance on the DriveU dataset.
object detection; traffic light recognition; recurrent neural networks; temporal context; YOLO