Reducing MPI Communication Latency with FPGA-Based Hardware Compression
| dc.contributor.author | BOURBIA, ANIS | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Petersen Moura Trancoso, Pedro | |
| dc.contributor.supervisor | Vázquez Maceiras, Mateo | |
| dc.date.accessioned | 2026-01-19T09:01:07Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | High-performance computing (HPC) clusters face significant communication overhead in distributed deep learning, where frequent data exchanges via the Message Passing Interface (MPI) can bottleneck overall training. This thesis explores an FPGA-based hardware compression approach to reducing MPI communication latency. We prototype the integration of an FPGA compression module into the MPI stack, enabling on-the-fly compression of message payloads with the fast lossless algorithms LZ4, Snappy, and Zstd. This hardware-accelerated compression offloads work from CPUs and GPUs and shrinks data volume before network transmission, thereby speeding up inter-node communication. In our evaluation on representative deep learning workloads, LZ4, Snappy, and Zstd achieved compression ratios of 1.53x, 1.51x, and 1.84x and reduced communication time by 34.6%, 33.8%, and 45.7%, yielding end-to-end training speedups of 1.34x, 1.32x, and 1.50x, respectively. Among the tested compressors, Zstd achieved the highest compression ratio, translating into the greatest latency reduction and performance gain. These results show that FPGA-based compression can substantially improve throughput in distributed training by alleviating network delays, with negligible added overhead. The proposed method offers a practical path to accelerating HPC communications and scaling deep learning workloads more efficiently. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310924 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | computer science | |
| dc.subject | engineering | |
| dc.subject | project | |
| dc.subject | thesis | |
| dc.subject | compression | |
| dc.subject | FPGA | |
| dc.subject | GPU | |
| dc.subject | acceleration | |
| dc.subject | HPC | |
| dc.subject | DNN | |
| dc.subject | lz4 | |
| dc.subject | zstd | |
| dc.subject | snappy | |
| dc.subject | lossless compression | |
| dc.subject | MPI | |
| dc.subject | networking | |
| dc.subject | smart-NIC | |
| dc.title | Reducing MPI Communication Latency with FPGA-Based Hardware Compression | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | High-performance computer systems (MPHPC), MSc |
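The abstract's compress-before-send pipeline can be sketched in plain software. The thesis offloads compression to an FPGA module using LZ4, Snappy, or Zstd; the minimal sketch below stands in Python's stdlib `zlib` for the hardware compressor and a simple list for the network link, purely to illustrate the pattern of shrinking a payload before transmission and restoring it losslessly on receipt. The function names and the gradient-like payload are illustrative assumptions, not the thesis's actual API.

```python
import zlib

def compressed_send(payload: bytes, link: list) -> None:
    # Compress the message payload before it crosses the "network",
    # mirroring the compress-before-MPI_Send step in the thesis.
    # (zlib stands in for the FPGA-side LZ4/Snappy/Zstd engine.)
    link.append(zlib.compress(payload))

def compressed_recv(link: list) -> bytes:
    # Decompress on the receiving side, restoring the original bytes.
    return zlib.decompress(link.pop(0))

# Gradient-like payload: repetitive byte patterns compress well losslessly.
message = b"\x00\x3f\x80\x00" * 4096
link: list = []                    # stands in for the network link
compressed_send(message, link)
wire_bytes = len(link[0])          # bytes actually sent over the wire
received = compressed_recv(link)

assert received == message         # lossless round trip
print(f"compression ratio: {len(message) / wire_bytes:.2f}x")
```

The pattern matters because only `wire_bytes` traverse the network: any ratio above 1x directly cuts transfer time, which is the latency reduction the thesis measures for each compressor.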
