Generalization abilities of scene text detection models
Type
Master's thesis
Abstract
The field of scene text detection has seen massive improvements in recent years with
the introduction of models based on deep convolutional networks. State-of-the-art
performance on certain benchmark datasets is approaching human-level text detection.
How well these text detection models generalize to text in images from other domains
is of high interest for tasks that involve a wide variety of images. This thesis
analyses the generalization abilities of two scene text detection models, EAST and
DBnet, by training several instances of each model on different combinations of
benchmark datasets and evaluating their performance both on datasets used for
training and on datasets unseen during training. The results show that both models
can generalize text detection to a certain degree, with instances of both models
achieving an average F1-score above 0.6 on a selection of benchmark datasets. Both
models also achieved F1-scores above 0.6 on a set of images collected from social
media, provided by Recorded Future, which was not used for training. Results on some
of the easier benchmark datasets are, however, not indicative of performance in a
highly varied domain. Finally, it was shown that both models perform quite well as
classifiers of whether an image contains text.
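For reference, the F1-score used as the evaluation metric above is the harmonic mean of precision and recall over detection matches. A minimal sketch of the computation (the function name and the IoU-matching convention mentioned in the comment are illustrative assumptions, not taken from the thesis):

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1-score: harmonic mean of precision and recall.

    In scene text detection benchmarks, a predicted box is commonly
    counted as a true positive when its IoU with a ground-truth box
    exceeds a threshold (often 0.5); unmatched predictions are false
    positives and unmatched ground-truth boxes are false negatives.
    """
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) else 0.0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 70 correct detections, 20 spurious, 10 missed ground-truth boxes
print(round(f1_score(70, 20, 10), 3))  # 0.824
```

An average F1-score above 0.6, as reported for both models, thus indicates a reasonable balance between missed text regions and spurious detections across the evaluated datasets.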
Subject/keywords
Scene text detection, computer vision, deep neural network, convolutional network