Engineering an Efficient Implementation of Pagh’s Algorithm for Sparse Matrix Multiplication
Ladda ner
Publicerad
Författare
Typ
Examensarbete för masterexamen
Master's Thesis
Master's Thesis
Modellbyggare
Tidskriftstitel
ISSN
Volymtitel
Utgivare
Sammanfattning
This thesis explores approximate matrix multiplication through the algorithm proposed by R. Pagh in Compressed Matrix Multiplication (2013) and its implementation developed by J. Andersson and M. Karppa. We identify memory bandwidth as the primary performance bottleneck and implement support for handling sparse inputs and outputs.
We further introduce several optimizations to improve cache utilisation. While fitting the input data into the L3 caches of our CPU did not improve L3 hit rate, it did lead to the improvement in the L1 hit rate compared to the dense implementation. Our experimental evaluation demonstrates modest yet consistent runtime improvements, particularly on large-scale and sparse inputs, while retaining the mathematical guarantees of the original implementation.
Despite these optimisations, our implementations remain outperformed by the stateof- the-art libraries on all tested real-world datasets. However, our modifications reduce the asymptotic runtime of the compression step and significantly decrease memory usage when processing sparse matrices.
Based on our findings, we suggest potential directions for future research, including further optimization strategies. These should primarily focus on reducing memory traffic, as improvements in memory locality alone are unlikely to yield substantial gains.
Beskrivning
Ämne/nyckelord
Matrix Multiplication, Sparse Matrix, High Performance Computing, Fast Fourier Transform, Hashing.
