Neural Networks, Edge Computing or Offloading — A study about how offloading neural network calculations stands in contrast to edge computing for embedded hardware
Type
Master's thesis
Published
2022
Authors
Frennborn, Erik
Oliv, Adam
Abstract
The use of machine learning and neural networks shows no signs of slowing down
in the embedded sector, but since embedded hardware often faces performance or
energy constraints that heavily limit the computational capacity of applications, it
might not always be optimal to perform computations locally on the device. This
study compares on-chip computing of neural networks on the embedded hardware
i.MX 8M Plus with offloading the computation to a remote server equipped with an
Intel i9-10850K CPU and a GeForce RTX 3070 GPU. The study also investigates to
what extent the common compression techniques quantization, pruning, and weight
clustering affect performance metrics such as latency and energy consumption for
embedded hardware.
After measuring latency and energy consumption for non-pipelined inference across
33 network variations, we discovered both a latency and an energy-consumption
threshold beyond which it is more effective to offload than to compute on the edge
device. These thresholds exist because the latency of the offloading scenario remains
almost constant. The latency threshold of 0.031 s is obtained from offloading via
Ethernet to the remote GPU and is strongly limited by the network latency.
This inference time can serve as a guideline during development: if the inference
time on the embedded device exceeds this threshold, it is probably more efficient
to offload the calculation to a remote server.
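A minimal Python sketch of that guideline follows. The threshold value comes from
this study, while the helper names (should_offload, model_fn) and the benchmarking
loop are illustrative assumptions rather than code from the thesis.

import time

# Latency threshold (seconds) found in this study for offloading via
# Ethernet to a remote GPU; it is dominated by network latency.
OFFLOAD_THRESHOLD_S = 0.031

def should_offload(model_fn, sample_input, n_runs=50):
    """Return True if mean on-device inference latency exceeds the threshold.

    model_fn is a hypothetical callable wrapping one on-device inference,
    e.g. an invocation of a TensorFlow Lite interpreter.
    """
    start = time.perf_counter()
    for _ in range(n_runs):
        model_fn(sample_input)
    mean_latency = (time.perf_counter() - start) / n_runs
    return mean_latency > OFFLOAD_THRESHOLD_S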
Other findings point towards the conclusion that the NPU of the i.MX 8M Plus
heavily favors compressed models, showing an average speedup of 188x on the NPU
when models are compressed using quantization and pruning.
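The abstract does not name the compression tooling; the sketch below assumes the
common TensorFlow Lite / TensorFlow Model Optimization Toolkit workflow for the
i.MX 8M Plus NPU, which executes int8 TensorFlow Lite models. The toy model and
random data are placeholders, not the 33 network variations from the study.

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder convolutional model and random data for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=64)

# Magnitude-based pruning to 50% sparsity (illustrative schedule).
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0),
)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
pruned.fit(x, y, epochs=1,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
final = tfmot.sparsity.keras.strip_pruning(pruned)

# Full-integer post-training quantization, as required by the NPU.
def representative_data():
    for sample in x[:32]:
        yield [sample[None, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(final)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()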
Keywords
Offloading, Convolutional neural networks (CNNs), Deep neural networks (DNNs), Embedded systems, Edge computing, Energy reduction, Optimization, Compression