Development of multi-GPU parallelization for a DEM solver: A parallelization extension for an existing state of the art DEM solver

dc.contributor.author: Rasmusson, Fredrik
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper [sv]
dc.contributor.examiner: Logg, Anders
dc.contributor.supervisor: Jareteg, Klas
dc.contributor.supervisor: Bilock, Adam
dc.date.accessioned: 2023-06-30T16:38:24Z
dc.date.available: 2023-06-30T16:38:24Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: The thesis presents a multi-GPU parallelization extension for an existing single-GPU Discrete Element Method (DEM) solver. The implementation extends the solver’s capability to simulate large particle populations, narrowing the gap between simulations and real-world particulate systems. The code is developed with HPC in mind: the overhead added by the parallelization is kept low by minimizing the total number of communication points between the GPUs. The computational domain is divided among the GPUs by splitting physical space along one of the three Cartesian axes. Although topologically simple, this decomposition gives each GPU few communication points and makes transfers between GPUs efficient, since memory locality is trivially achieved. The HPC GPU clusters targeted by the solver generally have 4-8 GPUs, which in most cases is well suited to the one-dimensional domain decomposition. A load balancing scheme has been developed which dynamically shifts the domain borders to distribute the computational load between the devices. The scheme is optimized for even simulation time between the GPUs. This is achieved by measuring and monitoring the execution time of some key operations performed in the DEM algorithm and incrementally shifting the domain borders to reach a state where all solvers have close to equal execution times for these operations. Performance measurements have been performed on Amazon Web Services Accelerated Computing instances with systems ranging from 4 to 8 GPUs. The total cost of the parallelization in relation to total execution time ranges from 2.6% to 6.5% with increasing number of connected GPUs. Thus, the implementation of the parallelization scheme is deemed efficient and successful. The chosen and defined algorithm is verified and benchmarked on three cases.
The verification shows that the physics of the single-GPU solver is preserved in the multi-GPU solver. The dynamic load balancing is shown to be advantageous over static decomposition, and the optimization scheme for the balancing is verified on a simulation case with dynamic particle behavior. The overall scaling of the algorithm is studied by benchmarking and monitoring the cost associated with the different steps of the DEM algorithm. It is shown that certain steps that are part of the original single-GPU solver scale worse than the added implementation steps. This is analyzed and considered to be an effect of the memory schemes for the peer-to-peer mode on the GPUs, and will require further attention in future work.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: http://hdl.handle.net/20.500.12380/306522
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Discrete Element Method, Parallelization, GPU, multi-GPU, HPC, Domain decomposition, Dynamic domain decomposition
dc.title: Development of multi-GPU parallelization for a DEM solver: A parallelization extension for an existing state of the art DEM solver
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
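
The abstract describes a one-dimensional domain decomposition whose borders are shifted incrementally until the measured execution times on all GPUs are close to equal. The following Python sketch illustrates that idea only; it is not the thesis implementation. The function names, the `gain` parameter, and the proportional update rule are assumptions introduced here for illustration, and the timings are plain numbers rather than real GPU measurements.

```python
from bisect import bisect_right


def assign_to_devices(coords, borders):
    """Map each particle coordinate (along the split axis) to a device index.

    `borders` holds the interior split positions in ascending order, so
    n borders define n + 1 subdomains (one per device).
    """
    return [bisect_right(borders, x) for x in coords]


def shift_borders(borders, times, domain_min, domain_max, gain=0.1):
    """Return new borders, each moved a small step toward the slower neighbor.

    `times[i]` is the measured execution time of device i. Border i separates
    device i (left) from device i + 1 (right). Moving a border toward the
    slower device shrinks that device's subdomain. The update rule is a
    hypothetical proportional step, chosen for illustration.
    """
    if len(times) != len(borders) + 1:
        raise ValueError("need exactly one timing per subdomain")
    if not 0.0 < gain < 0.5:
        # gain < 0.5 guarantees that neighboring borders never cross.
        raise ValueError("gain must be in (0, 0.5)")

    edges = [domain_min, *borders, domain_max]
    new_borders = []
    for i, border in enumerate(borders):
        t_left, t_right = times[i], times[i + 1]
        total = t_left + t_right
        # Relative imbalance in [-1, 1]; positive means the right side is slower.
        imbalance = 0.0 if total == 0 else (t_right - t_left) / total
        # Scale the step by the width of the subdomain that will shrink.
        if imbalance >= 0:
            width = edges[i + 2] - edges[i + 1]
        else:
            width = edges[i + 1] - edges[i]
        new_borders.append(border + gain * imbalance * width)
    return new_borders


if __name__ == "__main__":
    borders = [1.0, 2.0]                  # three devices on the interval [0, 3]
    print(assign_to_devices([0.5, 1.5, 2.5], borders))
    # Device 0 is measured as the slowest, so its right border moves left.
    print(shift_borders(borders, [2.0, 1.0, 1.0], 0.0, 3.0))
```

Applying `shift_borders` once per measurement interval, with a small `gain`, gives the kind of incremental adjustment the abstract describes. Splitting along a single axis also means each device has at most two neighbors to exchange data with.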

Download

Original bundle

Name: Master_Thesis_Fredrik_Rasmusson_2023.pdf
Size: 20.42 MB
Format: Adobe Portable Document Format
