Diffusion models for novel view synthesis in autonomous driving

dc.contributor.author: Gasparyan, Artur
dc.contributor.author: Qiu, Ruiqi
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för elektroteknik (Chalmers University of Technology / Department of Electrical Engineering)
dc.contributor.examiner: Svensson, Lennart
dc.contributor.supervisor: Svensson, Lennart
dc.contributor.supervisor: Hess, Georg
dc.contributor.supervisor: Lindström, Carl
dc.contributor.supervisor: Tonderski, Adam
dc.date.accessioned: 2024-12-13T12:40:23Z
dc.date.available: 2024-12-13T12:40:23Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: Novel View Synthesis (NVS) generates target images from new camera poses using source images and their corresponding poses. It has gained prominence in the field of autonomous driving (AD) as a tool for generating synthetic data to improve perception systems. Current NVS implementations, such as Neural Radiance Fields (NeRFs), excel at constructing 3D scenes from sensory inputs but struggle to accurately render sparsely observed or unseen views. This thesis addresses these limitations by integrating Diffusion Models (DMs) into the NVS pipeline to enhance reconstruction quality in such cases. We propose a pipeline inspired by ReconFusion, training NeuRAD, a NeRF-based NVS method designed for dynamic AD data, on additional poses not present in the original training set. A pretrained, open-source DM, Stable Diffusion, provides supervision by refining NeuRAD's outputs for these unseen views. To improve the DM's performance on AD scenes, we finetune it using Low-Rank Adaptation (LoRA), enabling efficient adaptation to small datasets. ControlNet is incorporated to extend the diffusion model with additional conditioning signals, ensuring better alignment with scene-specific characteristics. Despite these enhancements, our experiments reveal mixed results: while some metrics show improvement, others remain inconsistent, particularly in challenging scenarios. We identify weak conditioning signals and limited LoRA rank as potential limitations. Future research should explore more robust conditioning signals, such as depth or temporal information, and training on diverse scenes to improve generalization and stability. These directions offer promising avenues for advancing NVS in AD applications.
dc.identifier.coursecode: EENX30
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309029
dc.language.iso: eng
dc.relation.ispartofseries: 00000
dc.setspec.uppsok: Technology
dc.subject: Scene Reconstruction
dc.subject: Novel View Synthesis
dc.subject: Neural Radiance Fields
dc.subject: Autonomous Driving
dc.subject: Deep Learning
dc.subject: Generative Models
dc.subject: Diffusion Models
dc.subject: Latent Diffusion Models
dc.subject: Closed Loop Simulation
dc.title: Diffusion models for novel view synthesis in autonomous driving
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master's Thesis
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
Original file: Artur_Ruiqi - Master_Thesis - v1.5.pdf (9.28 MB, Adobe Portable Document Format)