Generalization abilities of scene text detection models

dc.contributor.author: Gjesdal, Andreas F.T.
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Johansson, Richard
dc.contributor.supervisor: Norlund, Tobias
dc.date.accessioned: 2022-10-14T12:51:44Z
dc.date.available: 2022-10-14T12:51:44Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: The field of scene text detection has seen massive improvements in the last year with the introduction of models based on deep convolutional networks. State-of-the-art performance on certain benchmark datasets is approaching human capability in detecting text. How well these text detection models generalize to images from different domains is of high interest for tasks that involve a wide variety of images. This thesis analyzes the generalization abilities of two scene text detection models, EAST and DBnet, by training various instances of both models on different combinations of benchmark datasets and evaluating their performance on several datasets, both those used for training and those unseen during training. The results show that both models generalize text detection to a certain degree, with instances of both models achieving an average F1-score >0.6 on a selection of benchmark datasets. Both models also achieved F1-scores >0.6 on a set of images collected from social media, provided by Recorded Future, which was not used for training. Results from some of the easier benchmark datasets are, however, not indicative of performance in a highly varied domain. Finally, it was shown that both models perform quite well as classifiers of whether an image contains text or not.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/305715
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Scene text detection
dc.subject: computer vision
dc.subject: deep neural network
dc.subject: convolutional network
dc.title: Generalization abilities of scene text detection models
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc

Download

Original bundle

Name: CSE 22-123 Gjesdal.pdf
Size: 2.89 MB
Format: Adobe Portable Document Format