Energy Efficiency of Convolutional Neural Network Inference on FPGAs and Accelerated GPUs
Type
Master's thesis
Abstract
The energy efficiency of convolutional neural networks (CNNs) can be improved by using
low-precision data types. FPGAs and GPUs are widely used to implement CNN
inference due to their parallel processing capabilities. Some GPU-based SoCs include
accelerator cores that perform low-precision operations efficiently for certain
data types. FPGAs, in contrast, can be configured to carry out operations of arbitrary bit width.
This thesis examines and compares the energy efficiency of FPGAs and accelerated
GPUs for low-precision CNN inference applications. We implemented convolution,
fully connected, and pooling building blocks for CNN inference on both platforms,
verified their functionality, and measured their performance, comparing the two
platforms with each other and with the state of the art. Accelerator cores on our
GPU-based SoC improved the energy efficiency for some design cases at the expense
of increased latency and base power consumption. Depending on the design parameters
and the type of layer, the FPGA provided up to 23.11 times better energy efficiency,
28.31 times lower power consumption, and 6.59 times lower latency than the
accelerated GPU, while the GPU provided up to 1.64 times better operational energy
efficiency. The FPGA achieved even higher energy efficiency for a variety of low
bit-width data types that cannot be processed by the accelerated GPU. The accelerated
GPU delivered reasonable energy efficiency and required comparatively less design
time. We also include a detailed analysis of the effects of the design parameters
on energy efficiency.
Subject/keywords
Accelerator, CNN, Convolution, Energy Efficiency, FPGA, Fully Connected, GPU, HLS, Pooling, TensorRT