Design and Evaluation of a Software Abstraction Layer for Heterogeneous Neural Network Accelerators
dc.contributor.author | Sreedhar, Aishwarya | |
dc.contributor.author | Nagarajan, Naga Sarayu | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Larsson-Edefors, Per | |
dc.contributor.supervisor | Pericas, Miquel | |
dc.date.accessioned | 2022-12-05T09:28:26Z | |
dc.date.available | 2022-12-05T09:28:26Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.description.abstract | Machine learning is becoming increasingly important across a wide range of hardware platforms. Current frameworks rely on vendor-specific operator libraries and cater to a small number of server-class GPUs. To support a variety of hardware accelerators from different suppliers, which may change over time, it is critical to abstract the hardware so that the core neural network algorithms can be deployed across this heterogeneous hardware with minimal effort. Various vendor-specific standards and consortium-driven frameworks are available on the market, but to make the software portable, an abstraction layer must be built on top of these proprietary standards. In this thesis, we use a compiler that provides a level of abstraction above CUDA and OpenCL, so that detailed knowledge of CUDA/OpenCL programming is not required. One such compiler is Apache TVM, an open-source machine learning compiler framework for CPUs, GPUs and other hardware accelerators. We perform a comprehensive comparison between models compiled with the Apache TVM framework and natively compiled models for two hardware vendors, Nvidia and Qualcomm. Framework models are fed into deep learning compilers, which generate optimised code for a range of deep learning hardware. TVM exposes graph-level and operator-level optimisations to give deep learning workloads performance portability across a variety of hardware backends. It tackles deep-learning-specific optimisation problems such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also uses an evolutionary, learning-based cost-modelling method for rapid code exploration, automating the optimisation of low-level programs to hardware features. Experiments show that TVM delivers performance comparable to state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs across hardware back-ends. TVM's ability to target new accelerator back-ends, such as GPU-based generic deep learning accelerators using CUDA and OpenCL, is also demonstrated. | |
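As an illustrative sketch only (not code from the thesis), the following Python snippet shows how a framework model might be compiled with Apache TVM for both a CUDA and an OpenCL back-end, assuming a TVM build with both runtimes enabled; the model file name, input tensor name, and input shape are hypothetical placeholders.

import onnx
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Hypothetical model file and input signature; replace with the real network.
onnx_model = onnx.load("resnet50.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Import the framework model into TVM's Relay intermediate representation.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile the same model for two different back-ends behind one abstraction layer.
for target_str, dev in [("cuda", tvm.cuda(0)), ("opencl", tvm.opencl(0))]:
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=tvm.target.Target(target_str), params=params)

    # Run inference through the generated graph executor on the chosen device.
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
    module.run()
    out = module.get_output(0).numpy()
    print(target_str, out.shape)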
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | https://odr.chalmers.se/handle/20.500.12380/305876 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | Deep machine learning | |
dc.subject | Apache TVM | |
dc.subject | GPU | |
dc.subject | OpenCL | |
dc.subject | CUDA | |
dc.subject | thesis | |
dc.subject | self-driving cars | |
dc.subject | Nvidia Jetson | |
dc.subject | Qualcomm | |
dc.subject | performance | |
dc.subject | native programming | |
dc.title | Design and Evaluation of a Software Abstraction Layer for Heterogeneous Neural Network Accelerators | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Embedded electronic system design (MPEES), MSc | |
local.programme | Communication Engineering (MPCOM), MSc |