Synthetic Data Generation Techniques for Automotive Machine Learning

Master's Thesis
Complex adaptive systems (MPCAS), MSc
Engineering mathematics and computational science (MPENM), MSc
Fredriksson, Jonny
Durgé, Rasmus
Seat belts drastically reduce the risk of injury or death, provided they are worn correctly. This thesis originates from Volvo Cars’ aspiration to address this risk using the growing potential of machine learning. The work builds on an earlier thesis at Volvo Cars, in which a semantic segmentation model was developed to identify and segment the seat belt in an image of a car occupant. Applying this segmentation approach requires the tedious and costly process of collecting and annotating data. The thesis explores the concept of using synthetic data, i.e., data that is generated by software and annotated in silico, as a substitute for, and a complement to, previously collected real-world data. Specifically, it examines different methods for generating and applying synthetic data, and which aspects of the data improve its quality as measured by the prediction accuracy of the segmentation model. Prediction accuracy is measured as the mean intersection over union (IoU) over a test set of real-world images. Several segmentation models with different architectures are evaluated to find the best-performing network. The thesis also explores three further techniques: domain randomization, which aims to narrow the domain gap between synthetic and real data; multiple label annotations, to investigate whether identifying other objects improves segmentation of the seat belt; and guided backpropagation, to explain predictions made by the segmentation model. The results show that, although the choice of network architecture has a relatively small effect on performance, the top-performing network combines a Unet++ decoder with a ResNet 34 encoder.
The results also suggest that when real-world data is scarce, introducing synthetic data can improve prediction accuracy, both by training the model on a mix of real and synthetic data and by pre-training the model on synthetic data before training it on real data. Furthermore, when the model is also trained to identify objects that often interact with the seat belt, e.g., the occupant’s shirt, its prediction accuracy on the seat belt can improve. The thesis identifies ways to make synthetic data better suited for training the seat belt segmentation model. It also demonstrates that there is considerable potential to further develop the application of synthetic data. One obvious approach would be to use a more powerful graphics engine, making the synthetic data even more realistic.
Seat belt, car occupant, semantic segmentation, neural networks, augmentations, synthetic data, domain gap