Vision Language Model Based Systems for Optical Character Recognition of Historical Swedish Newspaper Material

Johansson, Martin; Waginder, Selma


Download

CSE 26-12 SW MJ.pdf (17.08 MB)

Published

2026

Authors

Johansson, Martin

Waginder, Selma

Type

Master's Thesis

Program

Complex adaptive systems (MPCAS), MSc

Abstract

The ability to digitize old scanned newspapers plays an important role in improving searchability and making information accessible. To convert the text in images into machine-readable form, an Optical Character Recognition (OCR) engine is used. This thesis uses a dataset of Swedish newspaper material from 1818-1904 and investigates whether small to medium-sized open-source Vision Language Models (VLMs) are competitive with traditional OCR engines. We find that a fine-tuned Qwen3-VL-8B-Instruct combined with a simple repetition trimmer outperforms the traditional OCR engine Abbyy FineReader version 11.1.16 by 68.5% in terms of character error rate (CER) on this particular dataset, setting a new record of 1.930% CER. This demonstrates that the current generation of small open-source VLMs is highly competitive with traditional OCR engines for transcribing 19th-century Swedish newspaper material. The thesis also examines the particular quirks and failure modes of the different OCR systems through a qualitative analysis. Our best model performs no better on the training set than on the test set, meaning it shows no sign of overfitting. This suggests that fine-tuning was bottlenecked by the LoRA adapter size and that a larger adapter could potentially yield an even stronger model.
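The abstract reports its results as character error rates. The following is a minimal sketch, not the thesis's evaluation code, that relates the reported figures. It assumes the standard CER definition: the Levenshtein edit distance between the OCR output and the ground-truth transcription, divided by the number of reference characters. It also assumes that "by 68.5%" denotes a relative CER reduction. The function names (`levenshtein`, `cer`, `relative_reduction`, `implied_baseline`) and the toy strings are hypothetical illustrations, and the thesis may apply its own text normalization before scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance normalized by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` lowers the error rate relative to `baseline`."""
    return (baseline - new) / baseline


def implied_baseline(new: float, reduction: float) -> float:
    """Baseline error rate implied by a new rate and a relative reduction."""
    return new / (1.0 - reduction)


if __name__ == "__main__":
    # Toy line with one misread character: 1 edit over 11 reference characters.
    print(cer("Aftonbladct", "Aftonbladet"))
    # Reported figures: 1.930% CER, assumed to be a 68.5% relative reduction.
    print(round(implied_baseline(1.930, 0.685), 2))
```

Under the relative-reduction reading, a 1.930% CER implies a FineReader CER of roughly 6.13% on the same test set. This value is inferred from the two reported numbers and is not stated in the record.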

Subject/keywords

Vision Language Model, VLM, machine learning, Optical Character Recognition, OCR, historical newspapers

URI

https://hdl.handle.net/20.500.12380/311006

Collections

Master's theses
