Enabling Moldability in OpenMP
Type
Master's Thesis
Abstract
OpenMP has long been a ubiquitous technology in High-Performance Computing (HPC), making parallel programs simple to reason about and portable across many different systems. When an OpenMP runtime decides which threads should run which tasks, it often uses a simple work-stealing scheduler, since such schedulers distribute tasks evenly among cores. This is the method used by LLVM's OpenMP runtime. Today, however, HPC systems often consist of multiple sockets, each with many cores and non-uniform memory access (NUMA). This creates a complicated memory hierarchy that simple work-stealing schedulers do not account for. Another feature that simple work-stealing schedulers support poorly is nested parallelism, where each task runs multiple threads in parallel. It is not clear how many threads each task should be allocated, i.e. the width of the task. If the width is too high there will be over-subscription, while if it is too low there will be load imbalance. This can be solved by supporting moldable tasks, which are tasks whose width is decided by the scheduler. We extend LLVM's OpenMP runtime with support for moldable tasks scheduled using a locality-aware scheduler.
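To make the width problem concrete, the following is a minimal sketch in standard OpenMP, not code from the thesis. The names `task_body` and `run_tasks` and the `width` parameter are hypothetical. Each task opens a nested parallel region whose thread count is fixed by the programmer through `num_threads`; this is the decision that a runtime with moldable tasks would make on the programmer's behalf.

```c
/* Hypothetical task body: sums 0..n-1 in a nested parallel region.
   The width (thread count) is chosen by hand via num_threads. */
static long task_body(long n, int width)
{
    long sum = 0;
    #pragma omp parallel for num_threads(width) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical driver: spawns num_tasks tasks, all with the same
   fixed width, and accumulates their results. */
long run_tasks(int num_tasks, long n, int width)
{
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (int t = 0; t < num_tasks; t++) {
            #pragma omp task shared(total) firstprivate(n, width)
            {
                long s = task_body(n, width);
                #pragma omp atomic
                total += s;
            }
        }
        #pragma omp taskwait
    }
    return total;
}
```

With a hand-picked width, `num_tasks * width` can exceed the number of cores (over-subscription) or fall short of it (idle cores and load imbalance), and the right value changes from machine to machine. The computed result does not depend on the width. Whether the inner region actually receives more than one thread depends on the runtime's nested-parallelism settings, for example `OMP_MAX_ACTIVE_LEVELS`, and on compiling with OpenMP enabled (for example `-fopenmp` with GCC or Clang).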
Subject/keywords
Adaptive Scheduling, OpenMP, NUMA-aware scheduling, Moldable