Generalization abilities of scene text detection models

Type

Master's thesis

Abstract

The field of scene text detection has seen massive improvements in the last year with the introduction of models based on deep convolutional networks. State-of-the-art performance on certain benchmark datasets is approaching human capability at detecting text. How well these text detection models generalize to images from different domains is of high interest for tasks that involve a wide variety of images. This thesis analyzes the generalization abilities of two scene text detection models, EAST and DBnet, by training various instances of both models on different combinations of benchmark datasets and evaluating their performance on several datasets, including both datasets used for training and datasets unseen during training. The results show that both models generalize to a certain degree, with instances of both models achieving an average F1-score >0.6 on a selection of benchmark datasets. Both models also achieved F1-scores >0.6 on a set of images collected from social media, provided by Recorded Future, which was not used for training. Results from some of the easier benchmark datasets are, however, not indicative of performance in a highly varied domain. Finally, it was shown that both models perform quite well as classifiers of whether an image contains text or not.
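
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1-score can be computed from detection counts, and of how a detector's output could be reduced to an image-level "contains text" decision. It is an illustration only, not the thesis's evaluation code: the function names, the matching of detections to ground truth that produces the counts, and the 0.5 confidence threshold are all assumptions.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives   # boxes the model produced
    actual = true_positives + false_negatives      # boxes in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def contains_text(detection_scores: list[float], threshold: float = 0.5) -> bool:
    """Hypothetical image-level rule: text is present if any detection
    reaches the (assumed) confidence threshold."""
    return any(score >= threshold for score in detection_scores)
```

Under these assumptions, a model that finds 6 of 12 ground-truth text regions while producing 2 spurious boxes has precision 0.75 and recall 0.5, which gives an F1-score of 0.6.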

Keywords

Scene text detection, computer vision, deep neural network, convolutional network
