Multimodal Data Fusion for BEV Perception
Type
Master's Thesis (Examensarbete för masterexamen)
Program
Data science and AI (MPDSC), MSc
Published
2024
Authors
XUAN, YUNER
QU, YING
Abstract
In autonomous driving, sensors are mounted across different parts of the vehicle to
capture information about the surrounding environment, allowing autonomous
vehicles to address various tasks related to driving decisions, such as object detection,
semantic segmentation and path planning. Among the diverse approaches to perception,
bird's-eye-view (BEV) perception has progressed impressively over recent years. In
contrast to front-view or perspective-view modalities, BEV provides a comprehensive
representation of the vehicle's surrounding environment, which is fusion-friendly and
offers convenience for downstream applications.
As vehicle cameras are oriented outward and parallel to the ground, the captured
images are in a perspective view that is perpendicular to the BEV. Consequently,
a crucial part of BEV perception is the transformation of multi-sensor data from
perspective view (PV) to BEV. The quality and efficiency of this transformation
directly influence the performance of downstream tasks.
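The minimal sketch below (not the thesis implementation) illustrates the two coordinate spaces involved: ego-frame 3D points are projected into a camera image plane (perspective view) using an intrinsic matrix and an extrinsic transform, and the same points are rasterized into a top-down BEV grid. All names, matrix shapes, and the grid range and resolution are illustrative assumptions.

```python
# Illustrative sketch of the PV vs. BEV coordinate relationship;
# matrices, grid range, and resolution are assumed values.
import numpy as np

def project_to_image(points_ego, T_cam_from_ego, K):
    """Project Nx3 ego-frame points into pixel coordinates (perspective view)."""
    pts_h = np.hstack([points_ego, np.ones((len(points_ego), 1))])  # Nx4 homogeneous
    pts_cam = (T_cam_from_ego @ pts_h.T).T[:, :3]                   # Nx3 in camera frame
    in_front = pts_cam[:, 2] > 0.1                                  # keep points in front of the camera
    uv = (K @ pts_cam[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                                     # perspective divide -> pixels
    return uv, in_front

def scatter_to_bev(points_ego, bev_range=50.0, resolution=0.5):
    """Accumulate ego-frame points into a top-down grid (a trivial BEV 'feature map')."""
    size = int(2 * bev_range / resolution)
    bev = np.zeros((size, size), dtype=np.float32)
    ix = ((points_ego[:, 0] + bev_range) / resolution).astype(int)
    iy = ((points_ego[:, 1] + bev_range) / resolution).astype(int)
    valid = (ix >= 0) & (ix < size) & (iy >= 0) & (iy < size)
    np.add.at(bev, (iy[valid], ix[valid]), 1.0)                     # count points per BEV cell
    return bev
```

Learned PV-to-BEV methods replace these fixed geometric steps with trainable view transformations (e.g., depth lifting or cross-attention), but the underlying change of coordinate frame is the same.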
This thesis project aims to study comprehensive multimodal data fusion solutions for
PV-to-BEV transformation. We analyzed the common and unique characteristics of
existing approaches and assessed their performance against a selected downstream
perception task, focusing on short-range object detection. Additionally,
we implemented two main modules, Global Position Encoding (GPE) and Information
Enhanced Decoder (IED), to improve the performance of the multimodal data
fusion model.
Subject / keywords
Multi Modality, Sensor Fusion, BEV Perception, LiDAR, Camera, Transformer, Deep learning, 3D Object Detection, thesis