Reducing MPI Communication Latency with FPGA-Based Hardware Compression

dc.contributor.author: BOURBIA, ANIS
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Petersen Moura Trancoso, Pedro
dc.contributor.supervisor: Vázquez Maceiras, Mateo
dc.date.accessioned: 2026-01-19T09:01:07Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: High-performance computing (HPC) clusters face significant communication overhead in distributed deep learning, where frequent data exchanges over the Message Passing Interface (MPI) can bottleneck training. This thesis explores FPGA-based hardware compression as a way to reduce MPI communication latency. We prototype an FPGA compression module integrated into the MPI stack that compresses message payloads on the fly using the fast lossless algorithms LZ4, Snappy, and Zstd. Hardware-accelerated compression offloads work from CPUs and GPUs and shrinks the data volume before network transmission, which speeds up inter-node communication. In our evaluation on representative deep learning workloads, LZ4, Snappy, and Zstd achieved compression ratios of 1.53x, 1.51x, and 1.84x and reduced communication time by 34.6%, 33.8%, and 45.7%, yielding end-to-end training speedups of 1.34x, 1.32x, and 1.50x, respectively. Zstd achieved the highest compression ratio and therefore the largest latency reduction and performance gain. These results indicate that FPGA-based compression can substantially improve throughput in distributed training by alleviating network delays, with negligible added overhead. The proposed method offers a practical path to accelerating HPC communication and scaling deep learning workloads more efficiently.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310924
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Computer
dc.subject: science
dc.subject: computer science
dc.subject: engineering
dc.subject: project
dc.subject: thesis
dc.subject: compression
dc.subject: FPGA
dc.subject: GPU
dc.subject: acceleration
dc.subject: HPC
dc.subject: DNN
dc.subject: lz4
dc.subject: zstd
dc.subject: snappy
dc.subject: lossless compression
dc.subject: MPI
dc.subject: networking
dc.subject: smart-NIC
dc.title: Reducing MPI Communication Latency with FPGA-Based Hardware Compression
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master's Thesis
dc.type.uppsok: H
local.programme: High-performance computer systems (MPHPC), MSc
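
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the thesis. It assumes that transfer time is proportional to payload size and that compression cost is negligible, so a compression ratio r removes 1 - 1/r of the communication time. It also assumes an Amdahl-style relation, speedup = 1 / (1 - f * reduction), to back out the share f of training time spent communicating. All function names are illustrative.

```python
# Figures quoted in the abstract: compression ratio, communication-time
# reduction (%), and end-to-end training speedup for each compressor.
REPORTED = {
    "LZ4": (1.53, 34.6, 1.34),
    "Snappy": (1.51, 33.8, 1.32),
    "Zstd": (1.84, 45.7, 1.50),
}


def comm_reduction_pct(ratio):
    """Percent of bytes removed; ASSUMES transfer time scales with payload size."""
    return (1.0 - 1.0 / ratio) * 100.0


def implied_comm_fraction(reduction_pct, speedup):
    """Solve speedup = 1 / (1 - f * reduction) for f (ASSUMED Amdahl-style model)."""
    return (1.0 - 1.0 / speedup) / (reduction_pct / 100.0)


def summarize(reported=REPORTED):
    """Return {name: (reduction predicted from ratio, implied communication share)}."""
    rows = {}
    for name, (ratio, reduction, speedup) in reported.items():
        rows[name] = (
            round(comm_reduction_pct(ratio), 1),
            round(implied_comm_fraction(reduction, speedup), 2),
        )
    return rows


if __name__ == "__main__":
    for name, (pct, share) in summarize().items():
        print(f"{name}: {pct}% fewer bytes, implied communication share {share}")
```

Under these assumptions the reported compression ratios reproduce the reported communication-time reductions to one decimal place, and all three compressors imply a communication share of roughly 0.72 to 0.73 of the uncompressed training time. That share is an inference from the model, not a figure stated in this record.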

Download

Original bundle

Name: CSE 25-179 AB.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission