Hybrid Compression: Exploiting Model and Data Compression for Deep Neural Network Workloads
Master's Thesis
Computer systems and networks (MPCSN), MSc, High-performance Computer Systems (MPHPC), MSc
Nowadays, various kinds of deep neural networks (DNNs) show human-level capabilities in their domains, but these networks usually have millions or billions of parameters and a significant computation cost. To bring artificial intelligence into people's daily lives, deploying DNNs efficiently on various hardware (e.g., resource-limited edge devices) has become a popular research topic. In this thesis, we explore model compression and data compression, and combine them into a hybrid compression scheme that reduces the computation and memory costs of DNNs and speeds up model inference. Starting from models with 30% of their floating-point operations (FLOPs) pruned, hybrid compression achieves compression ratios of 5.15 and 5.57, speedups of 2.66 and 3.93, and accuracy losses of 2.38% and 2.64% for MobileNetV1 and ResNet50, respectively. Starting from models with 50% of their FLOPs pruned, it achieves compression ratios of 6.29 and 7.53, speedups of 2.74 and 4.21, and accuracy drops of 7.71% and 9.27% for MobileNetV1 and ResNet50. To verify the effectiveness of hybrid compression, we evaluate the inference speed of the compressed models on two edge devices, the Nvidia Jetson Nano and the Nvidia Jetson Xavier NX. We find that the gains of hybrid compression are hardware-dependent, with the Xavier NX showing larger benefits than the Nano. Finally, we give users recommendations on how to apply hybrid compression from different perspectives.
Deep Neural Networks, CNNs, Model Compression, Data Compression, Model Inference