Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Master's thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/194157
Type: Master's thesis
Title: Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis
Authors: Ulvinge, Niklas
Abstract: GPGPU (general-purpose computing on graphics processing units) programming is one way to increase performance, but it is not easy: writing programs that outperform their CPU counterparts requires extensive knowledge of the GPU's architecture. Obsidian is an embedded domain-specific language for writing GPGPU kernels that aims to make GPUs more programmable, yet writing fast kernels in it still requires that knowledge. This thesis presents extensions to Obsidian that increase the programmability of graphics processors. They do so in two ways: by using static analysis to give programmers feedback about possible performance bottlenecks and common programming mistakes in their code, and by automating many of the decisions involved in optimizing kernels through code transformations. The resulting domain-specific language improves upon Obsidian by requiring less knowledge of GPU programming and making it easier to write correct programs, while producing programs that are as fast and as expressive as those written in Obsidian. The static analysis provides several kinds of feedback. Out-of-bounds checking and race condition detection help establish the correctness of code. Memory access pattern analysis (to detect uncoalesced global memory accesses and shared memory bank conflicts), divergent branch detection, unnecessary synchronization detection, and a cost model help locate bottlenecks. The code transformations are scalar depromotion, removal of unnecessary synchronization, and several traditional loop transformations that together allow an arbitrarily structured program to be transformed into a kernel that runs efficiently on a GPU.
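To make the memory access pattern analysis mentioned in the abstract more concrete, here is a minimal sketch of the two checks it names: shared memory bank conflicts and global memory coalescing. It is not the thesis's implementation (which analyses Obsidian programs); it simply enumerates the thread IDs of one warp for a given index function. The hardware parameters (32-thread warps, 32 banks of 4-byte words, 128-byte global memory segments) are assumptions typical of GPUs from around the time of the thesis, and the function names `bank_conflict_degree` and `global_transactions` are hypothetical.

```python
from collections import defaultdict

# Assumed hardware parameters; real values depend on the GPU generation.
WARP_SIZE = 32       # threads per warp
NUM_BANKS = 32       # shared memory banks
WORD_BYTES = 4       # width of one bank word in bytes
SEGMENT_BYTES = 128  # size of one global memory transaction in bytes


def bank_conflict_degree(index, elem_bytes=4):
    """Return the largest number of distinct words any one bank must serve
    when a warp executes shared_array[index(tid)].

    1 means conflict-free; threads reading the same word are broadcast
    and therefore do not count as a conflict.
    """
    words_per_bank = defaultdict(set)
    for tid in range(WARP_SIZE):
        word = (index(tid) * elem_bytes) // WORD_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def global_transactions(index, elem_bytes=4):
    """Return how many distinct memory segments a warp touches when it
    executes global_array[index(tid)]. 1 means fully coalesced."""
    segments = {(index(tid) * elem_bytes) // SEGMENT_BYTES
                for tid in range(WARP_SIZE)}
    return len(segments)


if __name__ == "__main__":
    # Hypothetical access patterns, expressed as functions of the thread ID.
    patterns = [
        ("unit stride", lambda t: t),
        ("stride 2", lambda t: 2 * t),
        ("stride 32", lambda t: 32 * t),
        ("stride 33 (padded)", lambda t: 33 * t),
        ("broadcast", lambda t: 0),
    ]
    for name, idx in patterns:
        print(f"{name:20} bank conflict degree = {bank_conflict_degree(idx):2}"
              f"  global transactions = {global_transactions(idx):2}")
```

A unit-stride access is both conflict-free and coalesced, while a stride of 32 words sends every thread to the same bank and to a separate segment. A static analysis would ideally reach the same verdicts symbolically from the index expression rather than by enumerating threads; enumeration is used here only because it is easy to check.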
Keywords: Information & Communication Technology; Computer Science
Issue Date: 2014
Publisher: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/194157
Collection: Master Theses