Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks
dc.contributor.author | SIKLUND, AMANDA | |
dc.contributor.author | SEDERSTEN, MAX | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för fysik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Physics | en |
dc.contributor.examiner | Granath, Mats | |
dc.contributor.supervisor | Wikman, Filip | |
dc.date.accessioned | 2024-06-18T13:07:34Z | |
dc.date.available | 2024-06-18T13:07:34Z | |
dc.date.issued | 2024 | |
dc.date.submitted | ||
dc.description.abstract | In recent years, machine learning has grown to become increasingly prevalent for a wide range of applications spanning multiple industries. For some of these applications, low latency can be critical, which may limit the types of hardware that can be used. Graphics Processing Units (GPUs) have long been the go-to hardware for machine learning tasks, often outperforming alternatives like Central Processing Units (CPUs), but they are not practical in all situations. We explore CPUs, leveraging modern optimization techniques like pruning and quantization, as a competitive alternative to GPUs with comparable predictive performance. This thesis provides a comparison of the two hardware types on a real-time latency-critical vision task. On the GPU side, TensorRT in combination with quantization is used to achieve state-of-the-art inference performance on the hardware. On the CPU side, the model is optimized using SparseML to introduce unstructured sparsity and quantization. This optimized model is then used by the DeepSparse runtime engine for optimized inference. Our findings show that the CPU approach can outperform the GPU hardware in certain situations. This suggests that CPU hardware could potentially be used in applications previously limited to GPUs. | |
dc.identifier.coursecode | TIFX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/307923 | |
dc.language.iso | eng | |
dc.setspec.uppsok | PhysicsChemistryMaths | |
dc.subject | machine learning, neural network, model compression, pruning, quantization, optimization, CPU, GPU, Neural Magic, NVIDIA | |
dc.title | Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |