Reducing MPI Communication Latency with FPGA-Based Hardware Compression
Type
Master's Thesis
Abstract
High-performance computing (HPC) clusters face significant communication overhead in distributed deep learning, where frequent data exchanges via the Message Passing Interface (MPI) can bottleneck overall training. This thesis explores FPGA-based hardware compression as a way to reduce MPI communication latency. We prototype the integration of an FPGA compression module into the MPI stack, enabling on-the-fly compression of message payloads using the fast lossless algorithms LZ4, Snappy, and Zstd. This hardware-accelerated compression offloads work from CPUs and GPUs and shrinks data volume before network transmission, thereby speeding up inter-node communication. On representative deep learning workloads, LZ4, Snappy, and Zstd achieved compression ratios of 1.53x, 1.51x, and 1.84x and reduced communication time by 34.6%, 33.8%, and 45.7%, yielding end-to-end training speedups of 1.34x, 1.32x, and 1.50x, respectively. Among the tested compressors, Zstd achieved the highest compression ratio, translating to the greatest latency reduction and performance gain. These results show that FPGA-based compression can substantially improve throughput in distributed training by alleviating network delays, with negligible added overhead. The proposed method offers a practical path to accelerating HPC communication and scaling deep learning workloads more efficiently.
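The reported communication-time reductions follow directly from the measured compression ratios if one assumes (our simplification, for illustration) that transfer time scales with bytes on the wire and that compression overhead is negligible:

    reduction = 1 - 1/r

For Zstd's ratio r = 1.84 this gives 1 - 1/1.84 = 0.457, matching the reported 45.7%; likewise 1 - 1/1.53 = 34.6% for LZ4 and 1 - 1/1.51 = 33.8% for Snappy.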
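To illustrate the data path described above, the following C sketch wraps MPI point-to-point calls with software LZ4, standing in for the FPGA compressor the thesis integrates into the MPI stack. This is a minimal sketch under our own assumptions, not the thesis implementation; the helper names send_compressed and recv_compressed are hypothetical, and error handling is omitted.

#include <mpi.h>
#include <lz4.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thesis): compress len bytes of src
 * with LZ4, then send the original length followed by the compressed
 * payload. In the thesis, this compression step is offloaded to an FPGA. */
static void send_compressed(const char *src, int len, int dest, int tag,
                            MPI_Comm comm)
{
    int cap = LZ4_compressBound(len);            /* worst-case output size */
    char *dst = malloc((size_t)cap);
    int clen = LZ4_compress_default(src, dst, len, cap);
    MPI_Send(&len, 1, MPI_INT, dest, tag, comm); /* receiver buffer size */
    MPI_Send(dst, clen, MPI_BYTE, dest, tag, comm);
    free(dst);
}

/* Matching receive: probe for the compressed size, then decompress into
 * a caller-provided buffer of at least the original length. */
static void recv_compressed(char *out, int src_rank, int tag, MPI_Comm comm)
{
    int orig_len, clen;
    MPI_Status st;
    MPI_Recv(&orig_len, 1, MPI_INT, src_rank, tag, comm, MPI_STATUS_IGNORE);
    MPI_Probe(src_rank, tag, comm, &st);
    MPI_Get_count(&st, MPI_BYTE, &clen);
    char *buf = malloc((size_t)clen);
    MPI_Recv(buf, clen, MPI_BYTE, src_rank, tag, comm, MPI_STATUS_IGNORE);
    LZ4_decompress_safe(buf, out, clen, orig_len);
    free(buf);
}

Sending the uncompressed length ahead of the payload lets the receiver size its output buffer; the compressed length itself is recovered with MPI_Probe and MPI_Get_count rather than a second header field.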
Subject/keywords
Computer science, engineering, project, thesis, compression, FPGA, GPU, acceleration, HPC, DNN, LZ4, Zstd, Snappy, lossless compression, MPI, networking, smart-NIC
