Diffusion models for novel view synthesis in autonomous driving
dc.contributor.author | Gasparyan, Artur | |
dc.contributor.author | Qiu, Ruiqi | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för elektroteknik | sv |
dc.contributor.examiner | Svensson, Lennart | |
dc.contributor.supervisor | Svensson, Lennart | |
dc.contributor.supervisor | Hess, Georg | |
dc.contributor.supervisor | Lindström, Carl | |
dc.contributor.supervisor | Tonderski, Adam | |
dc.date.accessioned | 2024-12-13T12:40:23Z | |
dc.date.available | 2024-12-13T12:40:23Z | |
dc.date.issued | 2024 | |
dc.date.submitted | ||
dc.description.abstract | Novel View Synthesis (NVS) generates target images from new camera poses using source images and their corresponding poses. It has gained prominence in autonomous driving (AD) as a tool for generating synthetic data to improve perception systems. Current NVS implementations, such as Neural Radiance Fields (NeRFs), excel at constructing 3D scenes from sensory inputs but struggle to accurately render sparsely observed or unseen views. This thesis addresses these limitations by integrating Diffusion Models (DMs) into the NVS pipeline to enhance reconstruction quality in such cases. We propose a pipeline inspired by ReconFusion, training NeuRAD, a NeRF-based NVS method designed for dynamic AD data, on additional poses not present in the original training set. A pretrained, open-source DM, Stable Diffusion, provides supervision by refining NeuRAD’s outputs for these unseen views. To improve the DM’s performance on AD scenes, we fine-tune it using Low-Rank Adaptation (LoRA), enabling efficient adaptation to small datasets. ControlNet is incorporated to extend the diffusion model with additional conditioning signals, improving alignment with scene-specific characteristics. Despite these enhancements, our experiments show mixed results: some metrics improve, while others remain inconsistent, particularly in challenging scenarios. We identify weak conditioning signals and limited LoRA rank as potential limitations. Future research should explore more robust conditioning signals, such as depth or temporal information, and training on diverse scenes to improve generalization and stability. These directions offer promising avenues for advancing NVS in AD applications. | |
dc.identifier.coursecode | EENX30 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/309029 | |
dc.language.iso | eng | |
dc.relation.ispartofseries | 00000 | |
dc.setspec.uppsok | Technology | |
dc.subject | Scene Reconstruction | |
dc.subject | Novel View Synthesis | |
dc.subject | Neural Radiance Fields | |
dc.subject | Autonomous Driving | |
dc.subject | Deep Learning | |
dc.subject | Generative Models | |
dc.subject | Diffusion Models | |
dc.subject | Latent Diffusion Models | |
dc.subject | Closed Loop Simulation | |
dc.title | Diffusion models for novel view synthesis in autonomous driving | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |