Vectorizing FFT for faster AI Convolutions
Type: Master's Thesis (Examensarbete för masterexamen)
Abstract
The Fast Fourier Transform (FFT) is a widely used algorithm in signal processing, communications and image processing. In this thesis we implemented and investigated FFT convolutions that leverage vector length agnostic programming for convolutional neural networks with the ARM Scalable Vector Extension (SVE) and RISC-V "V" vector extensions. Our research aimed to address the limitations of traditional vectorisation techniques, which require non-portable fixed-length vector instructions. We analysed the performance of applying vector length agnostic instructions with different vector lengths and L2 cache sizes. Due to unforeseen issues with simulator programs, we were unable to run all benchmarks and investigate all vector lengths as originally planned. However, our results showed that code using either vector extension benefits from being portable, exhibiting increasing speedups as the simulated vector length grows. At best, there was a twofold speedup over the baseline using a short vector length of 512 bits, though vectorised implementations of the General Matrix Multiply (GeMM) and Winograd convolutions outperformed our FFT implementation by three to four times on the SVE architecture and three to eleven times on the RISC-V "V" architecture on a network with small kernel sizes unfavourable to FFT. In conclusion, while the tools for simulating these architectures may be immature, our investigation shows that the FFT convolution benefits from vector length agnostic programming.
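To illustrate the technique the thesis studies (this is a generic sketch, not the thesis's SVE or RISC-V implementation): an FFT convolution replaces a direct sliding-window convolution with pointwise multiplication in the frequency domain, which pays off for large kernels. A minimal 1-D version using NumPy:

```python
import numpy as np

def fft_conv1d(signal, kernel):
    """Linear convolution via FFT: zero-pad both operands to the
    full output length, multiply pointwise in the frequency
    domain, and transform back."""
    n = len(signal) + len(kernel) - 1  # full convolution length
    S = np.fft.rfft(signal, n)
    K = np.fft.rfft(kernel, n)
    return np.fft.irfft(S * K, n)

# Sanity check against NumPy's direct convolution.
x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])
print(np.allclose(fft_conv1d(x, k), np.convolve(x, k)))  # True
```

The direct method costs O(n·m) per output channel while the FFT route costs O(n log n), which is why small CNN kernels (as in the network benchmarked above) favour GeMM and Winograd over FFT.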
Keywords
Computer science, engineering, project, thesis, HPC, FFT, CNN, vector length agnostic programming, RISC-V "V", ARM SVE