Development of multi-GPU parallelization for a DEM solver: A parallelization extension for an existing state of the art DEM solver

Rasmusson, Fredrik

dc.contributor.author	Rasmusson, Fredrik
dc.contributor.department	Chalmers tekniska högskola / Institutionen för matematiska vetenskaper	sv
dc.contributor.examiner	Logg, Anders
dc.contributor.supervisor	Jareteg, Klas
dc.contributor.supervisor	Bilock, Adam
dc.date.accessioned	2023-06-30T16:38:24Z
dc.date.available	2023-06-30T16:38:24Z
dc.date.issued	2023
dc.date.submitted	2023
dc.description.abstract	This thesis presents a multi-GPU parallelization extension for an existing single-GPU Discrete Element Method (DEM) solver. The implementation extends the solver’s capability to simulate large particle populations, narrowing the gap between simulations and real-world particulate systems. The code is developed with HPC in mind: the overhead added by parallelization is kept low by minimizing the total number of communication points between the GPUs. The computational domain is divided among the GPUs by splitting physical space along one of the three Cartesian axes. Although topologically simple, this results in few communication points for each GPU as well as efficient transfers between GPUs, since memory locality is trivially achieved. The HPC GPU clusters targeted by the solver generally have 4-8 GPUs, for which the one-dimensional domain decomposition is well suited in most cases. A load balancing scheme has been developed that dynamically shifts the domain borders to distribute the computational load between the devices. The scheme is optimized for even simulation time across the GPUs. This is achieved by measuring and monitoring the execution time of some key operations in the DEM algorithm and incrementally shifting the domain borders until all solvers have close to equal execution times for these operations. Performance measurements were carried out on Amazon Web Services Accelerated Computing instances with systems ranging from 4 to 8 GPUs. The cost of the parallelization relative to total execution time ranges from 2.6% to 6.5% as the number of connected GPUs increases. The implementation of the parallelization scheme is therefore deemed efficient and successful. The algorithm is verified and benchmarked on three cases.
The verification shows that the physics of the single-GPU solver is preserved in the multi-GPU solver. The dynamic load balancing is shown to offer advantages over static decomposition, and the optimization scheme for the balancing is verified on a simulation case with dynamic particle behavior. The overall scaling of the algorithm is studied by benchmarking and monitoring the cost of the different steps of the DEM algorithm. For certain steps that are part of the original single-GPU solver, the scaling is shown to be worse than for the added implementation steps. This is analyzed and considered an effect of the memory schemes for peer-to-peer mode on the GPUs, and will require further attention in future work.
dc.identifier.coursecode	MVEX03
dc.identifier.uri	https://hdl.handle.net/20.500.12380/306522
dc.language.iso	eng
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	Discrete Element Method, Parallelization, GPU, multi-GPU, HPC, Domain decomposition, Dynamic domain decomposition
dc.title	Development of multi-GPU parallelization for a DEM solver: A parallelization extension for an existing state of the art DEM solver
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Engineering mathematics and computational science (MPENM), MSc
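
The abstract describes a load balancing scheme that incrementally shifts one-dimensional domain borders until all GPUs report close to equal execution times. The sketch below illustrates that general idea only; it is not the thesis implementation. The function name `shift_borders`, the `gain` parameter, and the proportional update rule are all assumptions made for illustration, and the timings are taken as already measured, non-negative values.

```python
from typing import List, Sequence


def shift_borders(borders: Sequence[float],
                  times: Sequence[float],
                  gain: float = 0.1) -> List[float]:
    """Hypothetical one-step border update for a 1D domain decomposition.

    borders: positions along the split axis, one more entry than devices;
             the first and last entries are the fixed outer walls.
    times:   measured execution time per device for the monitored operations.
    gain:    assumed step-size factor; kept below 0.5 so borders cannot cross.
    """
    if len(borders) != len(times) + 1:
        raise ValueError("need exactly one more border than devices")
    if not 0.0 < gain < 0.5:
        raise ValueError("gain must lie in (0, 0.5)")

    new_borders = list(borders)
    # Only interior borders move; each one separates device i-1 from device i.
    for i in range(1, len(borders) - 1):
        t_left, t_right = times[i - 1], times[i]
        total = t_left + t_right
        if total <= 0.0:
            continue  # no timing information for this pair, leave border as is
        # Relative imbalance in [-1, 1]; positive means the right device is slower.
        imbalance = (t_right - t_left) / total
        # Scale the step by the width of the domain that is about to shrink.
        if imbalance > 0.0:
            width = borders[i + 1] - borders[i]
        else:
            width = borders[i] - borders[i - 1]
        # Move the border into the slower device's domain to reduce its load.
        new_borders[i] = borders[i] + gain * imbalance * width
    return new_borders


if __name__ == "__main__":
    # Four devices on the interval [0, 4]; device 1 is three times slower.
    print(shift_borders([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 1.0, 1.0]))
```

In this example the two borders around the slow device each move inward by about 0.05, shrinking its domain, while the border between the two equally loaded devices stays put. Calling the function repeatedly with fresh timings would give the incremental behavior the abstract refers to.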

Original bundle

Name: Master_Thesis_Fredrik_Rasmusson_2023.pdf
Size: 20.42 MB
Format: Adobe Portable Document Format

Collections

Master's theses