Low latency video analytics system with multi-exit neural networks

Loading...
Thumbnail Image

Date

Type

Examensarbete för masterexamen
Master's Thesis

Model builders

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Computer vision-based control systems have become increasingly powerful and promising in tackling real-world problems. This can be accredited to the use of deep learning methods in these systems with state-of-the-art performance sometimes outperforming humans in tasks which require subjective decision making. This has resulted in increased interest in these systems from Swedish industry, including Volvo. One example system where these systems are used is the Volvo GPSS system, where semantic segmentation is used to perform real-time decisions based on pixel level classification of a monitored area. However, such systems frequently deal with a trade-off between latency and accuracy. This is primarily due to the increasing number of model layers being used to develop Deep-Neural-Network models for vision systems, resulting in equal resource utilization regardless of input complexity. In this thesis, we develop an approach that employs input adaptive multi-exit strategy to exploit latency benefits of dynamic processing based on the input complexity. The proposed approach aims to have a reduced average inference time as the simple input samples takes an early exit and only the complex samples need more computation offered by all the model layers. The open source CityScapes dataset and the Volvo dataset were used in a number of multi-exit semantic segmentation experiments with HRNet architecture chosen as the backbone. The thesis work studies three novel exit strategies, including reinforcement learning, auxiliary models, and fast Fourier transform. Out of all the methods examined, the reinforcement learningbased exit strategy displayed the best performance advantages, with accuracy on par with unbranched HRNet and a significant decrease in latency and computation.

Description

Keywords

Multi-exit Neural Networks, Input Adaptive Inference, Semantic Segmentation, Inference Optimization

Citation

Architect

Location

Type of building

Build Year

Model type

Scale

Material / technology

Index

Endorsement

Review

Supplemented By

Referenced By