Hierarchical Architecture Optimization for Efficient Transformer-based Monte Carlo Denoising

Gong, Xinglu

dc.contributor.author	Gong, Xinglu
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Assarsson, Ulf
dc.contributor.supervisor	Sintorn, Erik
dc.date.accessioned	2026-06-30T07:37:38Z
dc.date.issued	2026
dc.date.submitted
dc.description.abstract	Physically based Monte Carlo rendering relies heavily on deep learning denoisers to reconstruct photorealistic images from inputs rendered at low samples per pixel (SPP). Recently, the Joint Self-Attention (JSA) framework, a Transformer-based method utilizing auxiliary G-buffers, has achieved state-of-the-art visual fidelity. However, the quadratic computational complexity of standard multi-head self-attention disqualifies it from interactive or real-time rendering applications. To bridge this efficiency gap, this thesis proposes an efficient denoising architecture that systematically eliminates the computational redundancy of standard JSA frameworks. The proposed optimization strategy is executed on two architectural levels. At the micro level, we adapt the Single-Head Joint Self-Attention (SH-JSA) module with a partial channel ratio to preserve high-frequency structural features while reducing computational cost. Furthermore, guided by hardware profiling, we replace early-stage attention blocks with optimized Convolutional Neural Networks (CNNs). At the macro level, we progressively streamline the global U-Net structure by implementing a symmetric decoder, reducing the network depth to three stages, and expanding the input patch size. Evaluations demonstrate a substantial efficiency gain: for high-resolution 1024 × 1024 inputs, the proposed framework reduces the network parameter count by 86.6% and the inference latency by 91.6%. Although this acceleration introduces a minor 4.9% drop in Peak Signal-to-Noise Ratio (PSNR), it moves the JSA-based denoiser from offline computation to interactive frame rates.
dc.identifier.coursecode	DATX05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/311645
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Computer graphics, Monte Carlo rendering, real-time rendering, Monte Carlo denoising, transformer-based denoising.
dc.title	Hierarchical Architecture Optimization for Efficient Transformer-based Monte Carlo Denoising
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	High-performance computer systems (MPHPC), MSc
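
The quadratic cost named in the abstract can be illustrated with a short counting sketch. It is not taken from the thesis: it assumes a single global attention head over non-overlapping square patches, and the helper names `token_count` and `attention_pairs` are hypothetical. The thesis's actual attention layout (for example windowed attention) and its definition of "input patch size" may differ.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Tokens produced by splitting an image into non-overlapping patch x patch tiles.

    Assumption for illustration only: one token per tile.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def attention_pairs(tokens: int) -> int:
    """Entries in the token-by-token score matrix of one global self-attention head."""
    return tokens * tokens


if __name__ == "__main__":
    # 1024 x 1024 is the input resolution quoted in the abstract;
    # the patch sizes below are hypothetical.
    for patch in (8, 16, 32):
        n = token_count(1024, 1024, patch)
        print(f"patch={patch:2d}  tokens={n:6d}  score entries={attention_pairs(n):,}")
```

Under these assumptions, doubling the patch edge divides the token count by 4 and the number of score-matrix entries by 16, which is one reason a larger patch size can reduce attention cost.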

Original bundle

Name: CSE 26-36 XG.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format

Collections

Master's Theses (Examensarbeten för masterexamen)