Hierarchical Architecture Optimization for Efficient Transformer-based Monte Carlo Denoising

dc.contributor.author: Gong, Xinglu
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Assarsson, Ulf
dc.contributor.supervisor: Sintorn, Erik
dc.date.accessioned: 2026-06-30T07:37:38Z
dc.date.issued: 2026
dc.date.submitted:
dc.description.abstract: Physically based Monte Carlo rendering relies heavily on deep learning denoisers to reconstruct photorealistic images from low samples-per-pixel (SPP) inputs. Recently, the Joint Self-Attention (JSA) framework, a Transformer-based method utilizing auxiliary G-buffers, has achieved state-of-the-art visual fidelity. However, the quadratic computational complexity of standard multi-head self-attention disqualifies it from interactive or real-time rendering applications. To bridge this efficiency gap, this thesis proposes a highly efficient denoising architecture that systematically eliminates the computational redundancy of standard JSA frameworks. The proposed optimization strategy is executed on two architectural levels. At the micro level, we adapt the Single-Head Joint Self-Attention (SH-JSA) module with a partial channel ratio to preserve high-frequency structural features while reducing computational cost. Furthermore, guided by hardware profiling, we replace early-stage attention blocks with optimized Convolutional Neural Networks (CNNs). At the macro level, we progressively streamline the global U-Net structure by implementing a symmetric decoder, reducing the network depth to three stages, and expanding the input patch size. Extensive evaluations demonstrate a substantial gain in efficiency. For high-resolution 1024 × 1024 inputs, the proposed framework reduces the network parameter count by 86.6% and the inference latency by 91.6%. While this acceleration introduces a minor 4.9% drop in Peak Signal-to-Noise Ratio (PSNR), it transitions the JSA-based denoiser from offline computation to interactive frame rates.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/311645
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Computer graphics, Monte Carlo rendering, real-time rendering, Monte Carlo denoising, transformer-based denoising
dc.title: Hierarchical Architecture Optimization for Efficient Transformer-based Monte Carlo Denoising
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master's Thesis
dc.type.uppsok: H
local.programme: High-performance computer systems (MPHPC), MSc
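
Note on the figures in the abstract: the percentage reductions can be restated as ratios, and the "quadratic complexity" remark can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis. The helper names are hypothetical, the patch sizes 8 and 16 are assumed values chosen for the example, and the pair count assumes global attention over non-overlapping patches, which may differ from the windowing actually used by JSA.

```python
def reduction_to_factor(reduction_percent: float) -> float:
    """Convert an 'X% reduction' into an 'N times smaller' factor."""
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


def attention_pair_count(height: int, width: int, patch_size: int) -> int:
    """Number of token pairs scored by global self-attention.

    Assumes the image is split into non-overlapping square patches and
    every token attends to every other token (quadratic in token count).
    """
    tokens = (height // patch_size) * (width // patch_size)
    return tokens * tokens


# Figures quoted in the abstract for 1024 x 1024 inputs.
latency_speedup = reduction_to_factor(91.6)   # roughly 11.9x faster
param_shrink = reduction_to_factor(86.6)      # roughly 7.5x fewer parameters

# Hypothetical patch sizes: doubling the patch edge quarters the token
# count, so the pairwise attention work drops by a factor of 16.
pairs_small_patch = attention_pair_count(1024, 1024, 8)
pairs_large_patch = attention_pair_count(1024, 1024, 16)

print(f"latency speedup: {latency_speedup:.1f}x")
print(f"parameter shrink: {param_shrink:.1f}x")
print(f"attention pair ratio: {pairs_small_patch // pairs_large_patch}x")
```

Read this way, a 91.6% latency reduction corresponds to roughly a twelvefold speedup, which is consistent with the abstract's claim of moving from offline to interactive rates.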

Download

Original bundle

Name: CSE 26-36 XG.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format
