Engineering an Efficient Implementation of Pagh’s Algorithm for Sparse Matrix Multiplication

Loading...
Thumbnail Image

Date

Type

Examensarbete för masterexamen
Master's Thesis

Model builders

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

This thesis explores approximate matrix multiplication through the algorithm proposed by R. Pagh in Compressed Matrix Multiplication (2013) and its implementation developed by J. Andersson and M. Karppa. We identify memory bandwidth as the primary performance bottleneck and implement support for handling sparse inputs and outputs. We further introduce several optimizations to improve cache utilisation. While fitting the input data into the L3 caches of our CPU did not improve L3 hit rate, it did lead to the improvement in the L1 hit rate compared to the dense implementation. Our experimental evaluation demonstrates modest yet consistent runtime improvements, particularly on large-scale and sparse inputs, while retaining the mathematical guarantees of the original implementation. Despite these optimisations, our implementations remain outperformed by the stateof- the-art libraries on all tested real-world datasets. However, our modifications reduce the asymptotic runtime of the compression step and significantly decrease memory usage when processing sparse matrices. Based on our findings, we suggest potential directions for future research, including further optimization strategies. These should primarily focus on reducing memory traffic, as improvements in memory locality alone are unlikely to yield substantial gains.

Description

Keywords

Matrix Multiplication, Sparse Matrix, High Performance Computing, Fast Fourier Transform, Hashing.

Citation

Architect

Location

Type of building

Build Year

Model type

Scale

Material / technology

Index

Endorsement

Review

Supplemented By

Referenced By