Optimization of Deep Neural Networks for Efficient Resource Utilization
| dc.contributor.author | Sanjay, Namratha | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Petersen Moura Trancoso, Pedro | |
| dc.contributor.supervisor | Petersen Moura Trancoso, Pedro | |
| dc.date.accessioned | 2025-11-25T15:04:13Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | Deep neural networks (DNNs) are widely used in computer vision tasks such as image classification and semantic segmentation, but their high computational and memory demands limit deployment on resource-constrained edge devices. This thesis explores quantization as a model compression technique to improve inference efficiency while minimizing accuracy loss. Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) were applied to MobileNetV2 and ResNet50 for classification on the Mini-ImageNet dataset, and to FCN-ResNet18 for segmentation on the Cityscapes dataset. In addition, mixed-precision QAT was investigated using first-order gradient-based sensitivity analysis to assign per-layer bit-widths. Maintaining activation precision at or above 6 bits during mixed-precision QAT enabled substantial compression, up to 7.8×, while keeping accuracy degradation under 1%. ResNet50 and MobileNetV2 attained compression ratios of 6.3× and 5.2×, respectively, and FCN-ResNet18 preserved 57.3% mIoU at 7.8× compression with under 1% accuracy drop relative to the FP32 baseline. Conversely, reducing activation precision to 4 bits led to notable performance degradation, especially for lightweight models and segmentation tasks. Experiments were conducted on an NVIDIA Tesla T4 GPU. The results demonstrate strong potential for deploying quantized DNNs on integer-based hardware such as mobile devices, embedded systems, and FPGAs. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310771 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 25-69 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Neural Networks, Deep Learning, Network Compression, Quantization, Post-Training Quantization, Quantization-Aware Training, Mixed-Precision Quantization, Network Acceleration, Resource-Constrained, Edge Device | |
| dc.title | Optimization of Deep Neural Networks for Efficient Resource Utilization | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | High-performance computer systems (MPHPC), MSc |
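
The abstract describes assigning per-layer bit-widths via first-order gradient-based sensitivity analysis. The sketch below is not the thesis code; it is a minimal PyTorch illustration, under assumed details, of that general idea: score each weight layer by a first-order proxy (mean |gradient × weight|) over a few calibration batches, then give the most sensitive layers more bits. The `layer_sensitivity` and `assign_bit_widths` helpers, the quantile thresholds, and the 4/6/8-bit tiers are hypothetical choices for illustration only.

```python
# Illustrative sketch (not the thesis implementation): first-order
# gradient-based layer sensitivity scoring for mixed-precision
# bit-width assignment. Assumes a PyTorch model, a calibration
# DataLoader, and a classification loss; thresholds are hypothetical.
import torch
import torch.nn as nn
import torchvision


def layer_sensitivity(model, loader, loss_fn, device="cuda", batches=8):
    """Score each weight tensor by mean |grad * weight| over a few batches."""
    model.to(device).train()
    scores = {n: 0.0 for n, p in model.named_parameters() if p.dim() > 1}
    for i, (x, y) in enumerate(loader):
        if i >= batches:
            break
        model.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.dim() > 1 and p.grad is not None:
                # First-order sensitivity proxy: gradient-weight product.
                scores[n] += (p.grad * p).abs().mean().item()
    return scores


def assign_bit_widths(scores, low=4, mid=6, high=8):
    """Map low/medium/high sensitivity tertiles to 4/6/8-bit weights."""
    vals = sorted(scores.values())
    q1, q2 = vals[len(vals) // 3], vals[2 * len(vals) // 3]
    return {n: (low if s <= q1 else mid if s <= q2 else high)
            for n, s in scores.items()}


# Example usage with a MobileNetV2 backbone (calibration loader omitted):
# model = torchvision.models.mobilenet_v2(num_classes=100)
# bits = assign_bit_widths(
#     layer_sensitivity(model, calib_loader, nn.CrossEntropyLoss()))
```

In this sketch the resulting `bits` dictionary would feed a QAT configuration that fake-quantizes each layer at its assigned width; activations would be kept at 6 bits or above, in line with the abstract's finding that 4-bit activations degrade lightweight and segmentation models most.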
