Optimization of Deep Neural Networks for Efficient Resource Utilization
| dc.contributor.author | Sanjay, Namratha | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Petersen Moura Trancoso, Pedro | |
| dc.contributor.supervisor | Petersen Moura Trancoso, Pedro | |
| dc.date.accessioned | 2025-11-25T15:04:13Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | Deep neural networks (DNNs) are widely used in computer vision tasks such as image classification and semantic segmentation, but their high computational and memory demands limit deployment on resource-constrained edge devices. This thesis explores quantization as a model compression technique to improve inference efficiency while minimizing accuracy loss. Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) were applied to MobileNetV2 and ResNet50 for classification on the Mini-ImageNet dataset, and to FCN-ResNet18 for segmentation on the Cityscapes dataset. In addition, mixed-precision QAT was investigated using first-order gradient-based sensitivity analysis to assign per-layer bit-widths. Maintaining activation precision at or above 6 bits during mixed-precision QAT enabled substantial compression, up to 7.8×, while keeping accuracy degradation under 1%. ResNet50 and MobileNetV2 attained compression ratios of 6.3× and 5.2×, respectively, and FCN-ResNet18 preserved 57.3% mIoU at 7.8× compression with under 1% accuracy drop relative to the FP32 baseline. Conversely, reducing activation precision to 4 bits led to notable performance degradation, especially for lightweight models and segmentation tasks. Experiments were conducted on an NVIDIA Tesla T4 GPU. The results demonstrate strong potential for deploying quantized DNNs on integer-based hardware such as mobile devices, embedded systems, and FPGAs. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310771 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 25-69 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Neural Networks, Deep Learning, Network Compression, Quantization, Post-Training Quantization, Quantization-Aware Training, Mixed-Precision Quantization, Network Acceleration, Resource-Constrained, Edge Device | |
| dc.title | Optimization of Deep Neural Networks for Efficient Resource Utilization | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | High-performance computer systems (MPHPC), MSc |
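
The abstract describes assigning per-layer bit-widths via first-order gradient-based sensitivity analysis. The sketch below is not the thesis code; it is a minimal PyTorch illustration, under assumed details, of that general idea: score each weight layer by a first-order proxy (mean |gradient × weight|) over a few calibration batches, then give the most sensitive layers more bits. The `layer_sensitivity` and `assign_bit_widths` helpers, the quantile thresholds, and the 4/6/8-bit tiers are hypothetical choices for illustration only.

```python
# Illustrative sketch (not the thesis implementation): first-order
# gradient-based layer sensitivity scoring for mixed-precision
# bit-width assignment. Assumes a PyTorch model, a calibration
# DataLoader, and a classification loss; thresholds are hypothetical.
import torch
import torch.nn as nn
import torchvision


def layer_sensitivity(model, loader, loss_fn, device="cuda", batches=8):
    """Score each weight tensor by mean |grad * weight| over a few batches."""
    model.to(device).train()
    scores = {n: 0.0 for n, p in model.named_parameters() if p.dim() > 1}
    for i, (x, y) in enumerate(loader):
        if i >= batches:
            break
        model.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.dim() > 1 and p.grad is not None:
                # First-order sensitivity proxy: gradient-weight product.
                scores[n] += (p.grad * p).abs().mean().item()
    return scores


def assign_bit_widths(scores, low=4, mid=6, high=8):
    """Map low/medium/high sensitivity tertiles to 4/6/8-bit weights."""
    vals = sorted(scores.values())
    q1, q2 = vals[len(vals) // 3], vals[2 * len(vals) // 3]
    return {n: (low if s <= q1 else mid if s <= q2 else high)
            for n, s in scores.items()}


# Example usage with a MobileNetV2 backbone (calibration loader omitted):
# model = torchvision.models.mobilenet_v2(num_classes=100)
# bits = assign_bit_widths(
#     layer_sensitivity(model, calib_loader, nn.CrossEntropyLoss()))
```

In this sketch the resulting `bits` dictionary would feed a QAT configuration that fake-quantizes each layer at its assigned width; activations would be kept at 6 bits or above, in line with the abstract's finding that 4-bit activations degrade lightweight and segmentation models most.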
