Data Hide and Seek is Over!
Type
Master's Thesis
Abstract
A complete pipeline is presented for automatic annotation and retrieval of multimodal vehicle data, using synchronized front-facing camera images and LiDAR point clouds. Driving scenes are classified across four categories: road condition, road type, lighting, and visibility. A dataset of 1,878 one-minute segments is constructed from over 200 hours of real-world driving. Segments are selected to give an even distribution across all scenario categories and manually labeled to provide ground-truth annotations for training and evaluation.
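As a concrete illustration, the four scenario categories can be encoded as a per-segment label schema. The sketch below is a minimal Python assumption of what such a schema might look like; the specific class values under each category are hypothetical and not taken from the thesis.

```python
# Minimal sketch of a per-segment label schema for the four scenario
# categories. Class values are illustrative assumptions, not the
# thesis's actual label sets.
from dataclasses import dataclass
from enum import Enum

class RoadCondition(Enum):
    DRY = "dry"
    WET = "wet"
    SNOW = "snow"

class RoadType(Enum):
    HIGHWAY = "highway"
    URBAN = "urban"
    RURAL = "rural"

class Lighting(Enum):
    DAY = "day"
    NIGHT = "night"

class Visibility(Enum):
    CLEAR = "clear"
    REDUCED = "reduced"

@dataclass
class SegmentAnnotation:
    """Ground-truth labels for one one-minute driving segment."""
    segment_id: str
    road_condition: RoadCondition
    road_type: RoadType
    lighting: Lighting
    visibility: Visibility
```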
Separate models are trained for each sensor: a VGG19-based CNN for image classification and a lightweight PointNet for LiDAR point clouds. The best-performing vision model achieves strong results across all categories, while the LiDAR model performs best on road condition and visibility. A fusion model, implemented as a small multilayer perceptron, combines outputs from both sensors and outperforms the individual models, particularly on more difficult scenarios and categories.
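To make the late-fusion idea concrete, the PyTorch sketch below concatenates class probabilities from the camera and LiDAR models and passes them through a small multilayer perceptron. The layer sizes, class count, and variable names are illustrative assumptions, not the thesis's exact architecture.

```python
# Sketch of a small fusion MLP over the outputs of the two sensor models.
# Dimensions are assumed for illustration only.
import torch
import torch.nn as nn

NUM_CLASSES = 10  # assumed total number of classes across the categories

class FusionMLP(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES, hidden: int = 64):
        super().__init__()
        # Input: camera probabilities and LiDAR probabilities, concatenated.
        self.net = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, cam_probs: torch.Tensor, lidar_probs: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([cam_probs, lidar_probs], dim=-1))

# Example: fuse one batch of per-frame outputs from the two sensor models.
fusion = FusionMLP()
cam = torch.softmax(torch.randn(4, NUM_CLASSES), dim=-1)    # stand-in camera outputs
lidar = torch.softmax(torch.randn(4, NUM_CLASSES), dim=-1)  # stand-in LiDAR outputs
logits = fusion(cam, lidar)  # fused per-frame class scores
```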
Sequence-level aggregation of predictions is applied to reduce frame-level variation and improve accuracy. A proof-of-concept data retrieval interface is also presented, enabling users to filter and inspect data based on predicted labels and confidence scores, and to explore both camera images and LiDAR point clouds for each retrieved segment.
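One simple way to realize sequence-level aggregation is to average per-frame class probabilities over a segment and report the resulting class together with a confidence score, which the retrieval interface can then filter on. Averaging is an assumed aggregation rule here; the thesis may use a different scheme, such as majority voting.

```python
# Sketch of sequence-level aggregation: average per-frame probabilities
# over a one-minute segment. Averaging is an assumption for illustration.
import numpy as np

def aggregate_segment(frame_probs: np.ndarray) -> tuple[int, float]:
    """frame_probs: (num_frames, num_classes) per-frame class probabilities."""
    mean_probs = frame_probs.mean(axis=0)   # smooths out frame-level variation
    label = int(mean_probs.argmax())        # segment-level prediction
    confidence = float(mean_probs.max())    # score usable as a retrieval filter
    return label, confidence

# Example: keep a segment only if its predicted label is confident enough.
probs = np.random.dirichlet(np.ones(4), size=60)  # stand-in: 60 frames, 4 classes
label, conf = aggregate_segment(probs)
if conf > 0.8:
    print(f"retrieve segment with label {label} (confidence {conf:.2f})")
```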
Subject/keywords
Automatic Annotation, Multimodal, Camera, LiDAR, CNN, PointNet, Data Retrieval