Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks
dc.contributor.author | SIKLUND, AMANDA | |
dc.contributor.author | SEDERSTEN, MAX | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för fysik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Physics | en |
dc.contributor.examiner | Granath, Mats | |
dc.contributor.supervisor | Wikman, Filip | |
dc.date.accessioned | 2024-06-18T13:07:34Z | |
dc.date.available | 2024-06-18T13:07:34Z | |
dc.date.issued | 2024 | |
dc.date.submitted | ||
dc.description.abstract | In recent years, machine learning has grown to become increasingly prevalent for a wide range of applications spanning multiple industries. For some of these applications, low latency can be critical, which may limit the types of hardware that can be used. Graphics Processing Units (GPUs) have long been the go-to hardware for machine learning tasks, often outperforming alternatives like Central Processing Units (CPUs), but they are not practical in all situations. We explore CPUs, leveraging modern optimization techniques like pruning and quantization, as a competitive alternative to GPUs with comparable predictive performance. This thesis provides a comparison of the two hardware types on a real-time latency-critical vision task. On the GPU side, TensorRT in combination with quantization is used to achieve state-of-the-art inference performance on the hardware. On the CPU side, the model is optimized using SparseML to introduce unstructured sparsity and quantization. This optimized model is then used by the DeepSparse runtime engine for optimized inference. Our findings show that the CPU approach can outperform the GPU hardware in certain situations. This suggests that CPU hardware could potentially be used in applications previously limited to GPUs. | |
dc.identifier.coursecode | TIFX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/307923 | |
dc.language.iso | eng | |
dc.setspec.uppsok | PhysicsChemistryMaths | |
dc.subject | machine learning, neural network, model compression, pruning, quantization, optimization, CPU, GPU, Neural Magic, NVIDIA | |
dc.title | Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |