Vision Language Model Based Systems for Optical Character Recognition of Historical Swedish Newspaper Material
Published
Author
Type
Degree project for a master's degree
Master's Thesis
Abstract
The ability to digitize old scanned newspapers plays an important role in improving
searchability and making information accessible. To convert text in images into
machine-readable form, an Optical Character Recognition engine is employed. In
this thesis, a dataset of Swedish newspaper material from 1818-1904 is used. The
report investigates whether small to medium-sized open-source Vision Language
Models are competitive for Optical Character Recognition compared to traditional
models. It is found that a fine-tuned Qwen3-VL-8B-Instruct in combination with a
simple repetition trimmer is able to outperform the traditional OCR engine Abbyy
FineReader version 11.1.16 by 68.5% in terms of character error rate (CER) on this particular dataset
and set a new record at 1.930% CER. This thesis demonstrates that the current
generation of small open-source Vision Language Models is highly competitive
with traditional OCR engines for transcription of 19th-century Swedish newspaper
material. The thesis also thoroughly investigates the particular quirks and failure
modes of different OCR systems through a qualitative analysis. Our best model
performs no better on the training set than on the test set, suggesting that our fine-tuning
was bottlenecked by the LoRA adapter size and that one could potentially
achieve an even stronger model with a larger adapter.
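
For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the reference transcription and the OCR output, divided by the reference length. The sketch below illustrates that common definition only; the thesis's exact normalization and evaluation code are not given here, and the function names and the example strings are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: no text normalization (case, whitespace) is applied first.
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction of a new system over a baseline."""
    return (baseline_cer - new_cer) / baseline_cer


if __name__ == "__main__":
    # Hypothetical example: one misread character in a seven-letter word.
    print(cer("tidning", "tidnlng"))
```

Under this definition, a single substituted character in a seven-character reference gives a CER of 1/7. Whether the 68.5% figure above is a relative reduction of this kind is an assumption of the sketch, not something the abstract states explicitly.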
Description
Subject/keywords
Vision Language Model, VLM, machine learning, Optical Character Recognition, OCR, historical newspapers
