Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Ulvinge, Niklas


Download

194157.pdf (1.58 MB)

Type

Master's thesis

Program

Computer science – algorithms, languages and logic (MPALG), MSc

Published

2014

Author

Ulvinge, Niklas

Abstract

GPGPU (general-purpose computing on graphics processing units) programming is one way to increase performance. Unfortunately it is not easy, because writing programs that are faster than their CPU counterparts requires extensive knowledge of the GPU's architecture. Obsidian is an embedded domain-specific language for writing GPGPU kernels that tries to make GPUs more programmable, but writing fast kernels in it still requires extensive knowledge of the GPU's architecture. This thesis demonstrates extensions to Obsidian that increase the programmability of graphics processors. They do so by using static analysis to give programmers feedback on their code about possible performance bottlenecks and common programming mistakes. The thesis also demonstrates how many of the decisions involved in optimizing kernels can be automated through code transformations. The resulting domain-specific language improves upon Obsidian by requiring less knowledge of GPU programming and making it easier to write correct programs, while the programs remain as fast and as expressive as those written in Obsidian. Static analysis provides several kinds of feedback. Out-of-bounds checking and race condition detection help establish the correctness of code. Memory access pattern analysis (for detecting coalescing and bank conflict issues), divergent branch detection, unnecessary synchronization detection, and a cost model help find bottlenecks. The code transformations used are scalar depromotion, unnecessary synchronization removal, and some traditional loop transformations that enable an arbitrarily structured program to be transformed into a kernel that runs efficiently on a GPU.
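
To make the memory access pattern analysis mentioned in the abstract more concrete, the following is a minimal illustrative sketch, not the analysis implemented in the thesis. It assumes a warp of 32 threads, 32 shared-memory banks that are one 4-byte word wide, and 128-byte global-memory segments; the function names and the `index_of` parameter are hypothetical. It also enumerates the index expression for each thread of one warp instead of reasoning about it symbolically.

```python
from collections import defaultdict

WARP_SIZE = 32      # assumption: threads per warp
NUM_BANKS = 32      # assumption: shared-memory banks, one 4-byte word wide
SEGMENT_WORDS = 32  # assumption: a 128-byte segment holds 32 four-byte words


def bank_conflict_degree(index_of, warp_size=WARP_SIZE, num_banks=NUM_BANKS):
    """Return the worst-case number of serialized shared-memory accesses.

    index_of maps a thread id to the word index that thread accesses.
    Threads reading the same word are counted once (broadcast), so only
    distinct words that fall into the same bank count as a conflict.
    """
    words_per_bank = defaultdict(set)
    for tid in range(warp_size):
        word = index_of(tid)
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


def segments_touched(index_of, warp_size=WARP_SIZE, segment_words=SEGMENT_WORDS):
    """Return how many distinct memory segments one warp touches.

    A value of 1 corresponds to a fully coalesced access under the
    assumptions above; larger values mean more memory transactions.
    """
    return len({index_of(tid) // segment_words for tid in range(warp_size)})


if __name__ == "__main__":
    patterns = {
        "unit stride": lambda tid: tid,
        "stride 2": lambda tid: 2 * tid,
        "stride 32": lambda tid: 32 * tid,
    }
    for name, pattern in patterns.items():
        print(name, bank_conflict_degree(pattern), segments_touched(pattern))
```

A tool along these lines could report the stride-32 pattern as a likely bottleneck, since under the stated assumptions every thread in the warp hits the same bank and a different memory segment.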

Subject/keywords

Information & Communication Technology, Computer Science

URI

https://hdl.handle.net/20.500.12380/194157

Collection

Master's theses
