Neural Networks, Edge Computing or Offloading — A study about how offloading neural network calculations stands in contrast to edge computing for embedded hardware

dc.contributor.author: Frennborn, Erik
dc.contributor.author: Oliv, Adam
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Pathan, Risat
dc.contributor.supervisor: Petersen Moura Trancoso, Pedro
dc.date.accessioned: 2022-06-15T08:15:54Z
dc.date.available: 2022-06-15T08:15:54Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: The use of machine learning and neural networks shows no signs of slowing down in the embedded sector, but since embedded hardware often faces performance or energy constraints that heavily limit the computational capacity of applications, it is not always optimal to perform computations locally on the device. This study compares on-chip computation of neural networks on the embedded hardware i.MX 8M Plus with offloading the computation to a remote server equipped with an Intel i9-10850K and a GeForce RTX 3070. The study also investigates to what extent the common compression techniques quantization, pruning and weight clustering affect performance metrics such as latency and energy consumption on embedded hardware. After measuring latency and energy consumption for non-pipelined inferences across 33 network variations, we have found both a latency threshold and an energy consumption threshold beyond which it is more effective to offload than to compute on the edge device. These thresholds exist because latency in the offloading scenario remains almost constant. The latency threshold of 0.031 s is obtained when offloading via Ethernet to the remote GPU and is strongly limited by the network latency. This inference time can be used as a guideline for development: if the inference time on the embedded device exceeds this threshold, it is probably more efficient to offload the calculations to a remote server. Other findings point towards the conclusion that the NPU of the i.MX 8M Plus heavily favors compressed models, showing an average speedup of 188x on the NPU when models are compressed using quantization and pruning.
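As background to the compression step mentioned in the abstract, the sketch below shows one common way to apply full-integer post-training quantization so that a model can target the NPU of the i.MX 8M Plus. It assumes a TensorFlow/TensorFlow Lite toolchain (the usual deployment path for this NPU, not stated in the abstract); the model file name, input shape and calibration data are hypothetical placeholders, not taken from the thesis.

import numpy as np
import tensorflow as tf

# Load a trained Keras model (placeholder file name, not from the thesis).
model = tf.keras.models.load_model("trained_model.h5")

# Representative dataset used to calibrate the integer quantization ranges.
def representative_data():
    for _ in range(100):
        # Assumed 224x224x3 input; replace with the real input shape and data.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Restrict to int8 ops so the whole graph can be delegated to the NPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())

A model prepared this way (optionally pruned beforehand, for example with the TensorFlow Model Optimization Toolkit) is the kind of compressed network for which the abstract reports the large NPU speedup; the 0.031 s offloading threshold can then serve as a simple rule of thumb for deciding whether to keep inference on-device.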
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/304696
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Offloading
dc.subject: Convolutional neural networks (CNNs)
dc.subject: Deep neural networks (DNNs)
dc.subject: Embedded systems
dc.subject: Edge computing
dc.subject: Energy reduction
dc.subject: Optimization
dc.subject: Compression
dc.title: Neural Networks, Edge Computing or Offloading — A study about how offloading neural network calculations stands in contrast to edge computing for embedded hardware
dc.type.degree: Master's thesis (Examensarbete för masterexamen)
dc.type.uppsok: H
Download
Original bundle: CSE 22-25 Frennborn Oliv.pdf (6.13 MB, Adobe Portable Document Format)