Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks

dc.contributor.author: SIKLUND, AMANDA
dc.contributor.author: SEDERSTEN, MAX
dc.contributor.department[sv]: Chalmers tekniska högskola / Institutionen för fysik
dc.contributor.department[en]: Chalmers University of Technology / Department of Physics
dc.contributor.examiner: Granath, Mats
dc.contributor.supervisor: Wikman, Filip
dc.date.accessioned: 2024-06-18T13:07:34Z
dc.date.available: 2024-06-18T13:07:34Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: In recent years, machine learning has become increasingly prevalent across a wide range of applications spanning multiple industries. For some of these applications, low latency is critical, which may limit the types of hardware that can be used. Graphics Processing Units (GPUs) have long been the go-to hardware for machine learning tasks, often outperforming alternatives such as Central Processing Units (CPUs), but they are not practical in all situations. We explore CPUs, leveraging modern optimization techniques such as pruning and quantization, as a competitive alternative to GPUs with comparable predictive performance. This thesis compares the two hardware types on a real-time, latency-critical vision task. On the GPU side, TensorRT combined with quantization is used to achieve state-of-the-art inference performance on the hardware. On the CPU side, the model is optimized using SparseML to introduce unstructured sparsity and quantization; the optimized model is then executed by the DeepSparse runtime engine. Our findings show that the CPU approach can outperform the GPU hardware in certain situations, suggesting that CPU hardware could potentially be used in applications previously limited to GPUs.
dc.identifier.coursecode: TIFX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/307923
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: machine learning, neural network, model compression, pruning, quantization, optimization, CPU, GPU, Neural Magic, NVIDIA
dc.title: Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks
dc.type.degree[sv]: Examensarbete för masterexamen
dc.type.degree[en]: Master's Thesis
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
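The abstract's CPU-side pipeline pairs unstructured pruning with quantization. As a rough illustration of what those two compression steps do to a weight tensor (a minimal NumPy sketch of magnitude pruning and symmetric int8 quantization, not the thesis's actual SparseML recipe or the DeepSparse runtime itself):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)

# Unstructured magnitude pruning: zero the smallest-magnitude
# weights until a target sparsity level is reached.
sparsity = 0.5
threshold = np.quantile(np.abs(weights), sparsity)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

# Symmetric int8 quantization: map floats onto [-127, 127]
# with a single per-tensor scale factor.
scale = np.max(np.abs(pruned)) / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

# Dequantize to see how closely int8 approximates the pruned weights.
dequantized = quantized.astype(np.float32) * scale

print(f"sparsity achieved: {np.mean(pruned == 0):.2f}")
print(f"max quantization error: {np.max(np.abs(pruned - dequantized)):.4f}")
```

In a sparsity-aware engine such as DeepSparse, the zeroed weights let whole multiply-accumulates be skipped and the int8 representation shrinks memory traffic, which is where the CPU latency gains described in the abstract come from.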

Download

Original bundle
Name: Master_thesis_Max_Sedersten_Amanda_Siklund.pdf
Size: 1.95 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission