Synthetic Data Generation Techniques for Automotive Machine Learning
Type
Master's Thesis
Program
Complex adaptive systems (MPCAS), MSc
Engineering mathematics and computational science (MPENM), MSc
Published
2023
Authors
Fredriksson, Jonny
Durgé, Rasmus
Abstract
Seat belts drastically reduce the risk of injury or death, provided that they are worn
correctly. This thesis emanates from Volvo Cars’ aspiration to tackle this
risk, using the growing potential of machine learning. The foundation of this work
stems from another thesis at Volvo Cars, where a semantic segmentation model
was developed, for identifying and segmenting the seat belt in an image of a car
occupant. Applying this segmentation approach requires the tedious and costly
process of collecting and annotating data. The thesis explores the
concept of using synthetic data, i.e., data that is made by software and annotated
in silico, as a substitute for, and a complement to, previously collected real-world data.
Specifically, the thesis explores different methods for generating and applying synthetic data,
and which aspects improve its quality, as measured by the prediction accuracy of
the segmentation model. Prediction accuracy is measured as the mean intersection
over union (IoU) over a test set consisting of real-world images. Several
segmentation models, with different architectures, are evaluated to find the best-performing network. The thesis also explores the concept of domain randomization,
which aims to narrow the domain gap between the synthetic and real data, as well as
multiple label annotations to investigate whether identifying other objects improves
segmentation of the seat belt, and guided backpropagation to explain predictions
made by the segmentation model.
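The evaluation metric described above can be illustrated with a small, dependency-free sketch. It assumes binary seat belt masks stored as nested lists of 0/1 and assumes that the mean is taken over per-image IoU scores; the function names `binary_iou` and `mean_iou`, and the convention that two empty masks score 1.0, are illustrative choices and are not taken from the thesis.

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds, targets):
    """Average the per-image IoU over a (non-empty) test set."""
    scores = [binary_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Two tiny hypothetical test images.
pred_a = [[0, 1, 1], [0, 1, 0]]
true_a = [[0, 1, 0], [0, 1, 0]]
pred_b = [[1, 0], [0, 0]]
true_b = [[1, 0], [0, 0]]
score = mean_iou([pred_a, pred_b], [true_a, true_b])
```

In this toy case the first image has an intersection of 2 pixels and a union of 3, the second is a perfect match, so the mean is (2/3 + 1) / 2 = 5/6. A real evaluation would typically compute the same quantity on arrays with a numerical library.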
This thesis shows that the choice of network architecture has a relatively small
effect on performance; the top-performing network combines a
Unet++ decoder with a ResNet-34 encoder. The thesis also suggests that when there
is a scarcity of real-world data, introducing synthetic data can improve prediction
accuracy, both by training the model on a mix of real and synthetic data, and by
pre-training the model on synthetic data before training it on real data. The results
also suggest that when the model is trained to also identify objects which often
interact with the seat belt, e.g., the occupant’s shirt, its prediction accuracy on the
seat belt can improve.
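The two ways of using synthetic data mentioned above (training on a mix, and pre-training followed by training on real data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `mix_datasets` and `pretrain_then_finetune`, the `synthetic_fraction` parameter, and the `train_one_epoch` callback are all hypothetical, and the abstract does not state which mixing ratios or epoch counts were used.

```python
import random


def mix_datasets(real, synthetic, synthetic_fraction, seed=0):
    """Return a shuffled training list in which roughly `synthetic_fraction`
    of the samples are synthetic; all real samples are kept."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic))  # cannot draw more than is available
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed


def pretrain_then_finetune(train_one_epoch, real, synthetic, pre_epochs, fine_epochs):
    """Stage 1 trains on synthetic data only, stage 2 on real data only.
    `train_one_epoch` is a hypothetical callback taking a list of samples."""
    for _ in range(pre_epochs):
        train_one_epoch(synthetic)
    for _ in range(fine_epochs):
        train_one_epoch(real)


# Placeholder samples standing in for (image, mask) pairs.
real = [("real", i) for i in range(10)]
synthetic = [("syn", i) for i in range(100)]
mixed = mix_datasets(real, synthetic, synthetic_fraction=0.5)
```

With 10 real samples and a synthetic fraction of 0.5, the mixed set contains 10 real and 10 synthetic samples. Keeping every real sample and subsampling only the synthetic pool reflects the scarce-real-data setting the abstract describes, but that design choice is an assumption of this sketch.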
The thesis identifies ways to make synthetic data more suitable for training
the seat belt segmentation model and demonstrates that there is considerable
potential to develop the application of synthetic data further.
One obvious approach would be to use a more powerful graphics engine, making the
synthetic data even more realistic.
Subject/keywords
Seat belt, car occupant, semantic segmentation, neural networks, augmentations, synthetic data, domain gap