ODR kommer att vara otillgängligt pga systemunderhåll onsdag 25 februari, 13:00 -15:00 (ca). Var vänlig och logga ut i god tid. // ODR will be unavailable due to system maintenance, Wednesday February 25, 13:00 - 15:00. Please log out in due time.
 

Efficient And Numerically Stable GPU Implementations of The Fast Walsh-Hadamard Transform

dc.contributor.authorAndersson, Joel
dc.contributor.departmentChalmers tekniska högskola / Institutionen för data och informationstekniksv
dc.contributor.departmentChalmers University of Technology / Department of Computer Science and Engineeringen
dc.contributor.examinerKemp, Graham
dc.contributor.supervisorKarppa, Matti
dc.date.accessioned2026-01-16T08:43:40Z
dc.date.issued2025
dc.date.submitted
dc.description.abstractThe Walsh-Hadamard Transform is a fundamental mathematical operation used for many applications. Recent findings from its applicability within Convolutional Neural Networks (CNNs) has sparked a lot of attention, leading to very efficient GPU implementations of the Fast Walsh-Hadamard Transform (FWHT) algorithm. However, current state-of-the-art FWHT implementations are purpose built for usage within CNNs, thus the input size is heavily limited and the precision supported is low, reducing the number of alternative applications. Additionally, recent algorithmic improvements by Alman and Rao have not been used in these implementations. This thesis tackles all of these problems, presenting GPU implementations of both the Folklore and Alman & Rao algorithms. The implementations support input sizes of up to 263 and the numerical formats: FP64, FP32, FP16 and BF16. Both implementations suffer heavily from being memory bound, only outperforming Nvidia’s implementation of the computationally heavier Fast Fourier Transform on L4/L40s GPUs when using FP64. We find no benefit of using Alman & Rao’s algorithm. Additionally, we employ techniques heavily inspired by Kahan and Neumaier summation, empirically reducing relative errors on a wide range of real world applications. For a given numerical format, the median reduction of the relative error over all experiments is 30% and 75% respectively, while using Alman & Rao’s algorithm leads to a 50% increase of the relative error. All implementations are available as free software: https://github.com/erwinia42/FWHT.
dc.identifier.coursecodeDATX05
dc.identifier.urihttp://hdl.handle.net/20.500.12380/310902
dc.language.isoeng
dc.setspec.uppsokTechnology
dc.subjectFWHT
dc.subjectWHT
dc.subjectGPU
dc.subjectCUDA
dc.subjectKahan
dc.subjectNeumaier
dc.subjectFloating Point Errors
dc.titleEfficient And Numerically Stable GPU Implementations of The Fast Walsh-Hadamard Transform
dc.type.degreeExamensarbete för masterexamensv
dc.type.degreeMaster's Thesisen
dc.type.uppsokH
local.programmeComplex adaptive systems (MPCAS), MSc

Ladda ner

Original bundle

Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
CSE 25-139 JA.pdf
Storlek:
6.01 MB
Format:
Adobe Portable Document Format

License bundle

Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
license.txt
Storlek:
2.35 KB
Format:
Item-specific license agreed upon to submission
Beskrivning: