Multimodal Data Fusion for BEV Perception

dc.contributor.author: XUAN, YUNER
dc.contributor.author: QU, YING
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Axelson-Fisk, Marina
dc.contributor.supervisor: Selpi, Selpi
dc.date.accessioned: 2025-01-08T12:18:30Z
dc.date.available: 2025-01-08T12:18:30Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: In autonomous driving, sensors are placed across different parts of the vehicle to capture information from the surrounding environment, allowing the vehicle to address various tasks related to driving decisions, such as object detection, semantic segmentation, and path planning. Among the diverse approaches to perception, bird's-eye-view (BEV) perception has progressed impressively over recent years. In contrast to front-view or perspective-view modalities, BEV provides a comprehensive representation of the vehicle's surrounding environment, which is fusion-friendly and convenient for downstream applications. Because vehicle cameras are oriented outward and parallel to the ground, the captured images are in a perspective view that is perpendicular to the BEV. Consequently, a crucial part of BEV perception is the transformation of multi-sensor data from perspective view (PV) to BEV, and the quality and efficiency of this transformation strongly influence the performance of subsequent tasks. This thesis project studies comprehensive multimodal data fusion solutions for PV-to-BEV transformation. We analyzed the common and unique characteristics of existing approaches and assessed their performance on a selected downstream perception task, focusing on object detection within a short distance. Additionally, we implemented two main modules, Global Position Encoding (GPE) and Information Enhanced Decoder (IED), to enhance the performance of the multimodal data fusion model.
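The PV-to-BEV transformation the abstract describes is easiest to illustrate for the LiDAR modality: LiDAR points already live in a metric ego-vehicle frame, so they can be rasterized directly onto a ground-plane grid, whereas camera features additionally require a view transformation through the camera intrinsics and extrinsics. The sketch below is a minimal illustration of that rasterization and is not code from the thesis; the function name, the 100 m x 100 m range, and the 0.5 m cell resolution are assumptions chosen for the example.

```python
import numpy as np

def lidar_to_bev_grid(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                      resolution=0.5):
    """Rasterize LiDAR points (N, 3) in the ego frame into a BEV occupancy grid.

    A cell of the returned (H, W) grid is 1.0 if at least one point falls inside it.
    """
    x, y = points[:, 0], points[:, 1]
    # Keep only points inside the BEV field of view.
    mask = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[mask], y[mask]
    # Convert metric coordinates to integer grid indices.
    cols = ((x - x_range[0]) / resolution).astype(int)
    rows = ((y - y_range[0]) / resolution).astype(int)
    h = int((y_range[1] - y_range[0]) / resolution)
    w = int((x_range[1] - x_range[0]) / resolution)
    grid = np.zeros((h, w), dtype=np.float32)
    grid[rows, cols] = 1.0
    return grid

# Example: 1000 random points in a 100 m x 100 m area around the ego vehicle.
pts = np.random.uniform(-50.0, 50.0, size=(1000, 3))
bev = lidar_to_bev_grid(pts)
print(bev.shape)  # (200, 200)
```

In a learned fusion model, the per-cell occupancy would typically be replaced by pooled point features, and the camera branch would be lifted into the same grid so that both modalities share one BEV coordinate frame for the downstream detection head.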
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309060
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Multi Modality
dc.subject: Sensor Fusion
dc.subject: BEV Perception
dc.subject: LiDAR
dc.subject: Camera
dc.subject: Transformer
dc.subject: Deep learning
dc.subject: 3D Object Detection
dc.subject: thesis
dc.title: Multimodal Data Fusion for BEV Perception
dc.type.degree: Master's Thesis
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
Original bundle:
Name: CSE 24-56 YX YQ.pdf
Size: 1.61 MB
Format: Adobe Portable Document Format

License bundle:
Name: license.txt
Size: 2.35 KB
Description: Item-specific license agreed upon to submission