Multimodal Data Fusion for BEV Perception

dc.contributor.author: XUAN, YUNER
dc.contributor.author: QU, YING
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Axelson-Fisk, Marina
dc.contributor.supervisor: Selpi, Selpi
dc.date.accessioned: 2025-01-08T12:18:30Z
dc.date.available: 2025-01-08T12:18:30Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: In autonomous driving, sensors are placed across different parts of the vehicle to capture information from the surrounding environment, allowing the vehicle to address various tasks related to driving decisions, such as object detection, semantic segmentation, and path planning. Among the diverse approaches to perception, bird's-eye-view (BEV) perception has progressed impressively over recent years. In contrast to front-view or perspective-view modalities, BEV provides a comprehensive representation of the vehicle's surrounding environment, which is fusion-friendly and convenient for downstream applications. Because vehicle cameras are oriented outward and parallel to the ground, the captured images are in a perspective view that is perpendicular to the BEV. Consequently, a crucial part of BEV perception is the transformation of multi-sensor data from perspective view (PV) to BEV, and the quality and efficiency of this transformation strongly influence the performance of subsequent tasks. This thesis project studies comprehensive multimodal data fusion solutions for PV-to-BEV transformation. We analyzed the common and unique characteristics of existing approaches and assessed their performance on a selected downstream perception task, focusing on object detection within a short distance. Additionally, we implemented two main modules, Global Position Encoding (GPE) and Information Enhanced Decoder (IED), to enhance the performance of the multimodal data fusion model.
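The PV-to-BEV transformation the abstract describes is easiest to illustrate for the LiDAR modality: LiDAR points already live in a metric ego-vehicle frame, so they can be rasterized directly onto a ground-plane grid, whereas camera features additionally require a view transformation through the camera intrinsics and extrinsics. The sketch below is a minimal illustration of that rasterization and is not code from the thesis; the function name, the 100 m x 100 m range, and the 0.5 m cell resolution are assumptions chosen for the example.

```python
import numpy as np

def lidar_to_bev_grid(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                      resolution=0.5):
    """Rasterize LiDAR points (N, 3) in the ego frame into a BEV occupancy grid.

    A cell of the returned (H, W) grid is 1.0 if at least one point falls inside it.
    """
    x, y = points[:, 0], points[:, 1]
    # Keep only points inside the BEV field of view.
    mask = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[mask], y[mask]
    # Convert metric coordinates to integer grid indices.
    cols = ((x - x_range[0]) / resolution).astype(int)
    rows = ((y - y_range[0]) / resolution).astype(int)
    h = int((y_range[1] - y_range[0]) / resolution)
    w = int((x_range[1] - x_range[0]) / resolution)
    grid = np.zeros((h, w), dtype=np.float32)
    grid[rows, cols] = 1.0
    return grid

# Example: 1000 random points in a 100 m x 100 m area around the ego vehicle.
pts = np.random.uniform(-50.0, 50.0, size=(1000, 3))
bev = lidar_to_bev_grid(pts)
print(bev.shape)  # (200, 200)
```

In a learned fusion model, the per-cell occupancy would typically be replaced by pooled point features, and the camera branch would be lifted into the same grid so that both modalities share one BEV coordinate frame for the downstream detection head.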
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309060
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Multi Modality
dc.subject: Sensor Fusion
dc.subject: BEV Perception
dc.subject: LiDAR
dc.subject: Camera
dc.subject: Transformer
dc.subject: Deep learning
dc.subject: 3D Object Detection
dc.subject: thesis
dc.title: Multimodal Data Fusion for BEV Perception
dc.type.degree: Master's Thesis
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
Original bundle:
Name: CSE 24-56 YX YQ.pdf
Size: 1.61 MB
Format: Adobe Portable Document Format

License bundle:
Name: license.txt
Size: 2.35 KB
Description: Item-specific license agreed upon to submission