Vision Language Model Based Systems for Optical Character Recognition of Historical Swedish Newspaper Material

Johansson, Martin; Waginder, Selma

dc.contributor.author	Johansson, Martin
dc.contributor.author	Waginder, Selma
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Angelov, Krasimir
dc.contributor.supervisor	Dannélls, Dana
dc.date.accessioned	2026-03-05T11:11:41Z
dc.date.issued	2026
dc.date.submitted
dc.description.abstract	The ability to digitize old scanned newspapers plays an important role in improving searchability and making information accessible. To convert text in images into machine-readable form, an Optical Character Recognition (OCR) engine is employed. In this thesis, a dataset of Swedish newspaper material from 1818-1904 is used. The report investigates whether small to medium-sized open-source Vision Language Models (VLMs) are competitive with traditional models for OCR. It is found that a fine-tuned Qwen3-VL-8B-Instruct, combined with a simple repetition trimmer, outperforms the traditional OCR engine Abbyy FineReader version 11.1.16 by 68.5% in terms of character error rate (CER) on this particular dataset, setting a new record of 1.930% CER. This thesis demonstrates that the current generation of small open-source VLMs is highly competitive with traditional OCR engines for transcribing 19th-century Swedish newspaper material. The thesis also investigates in detail the particular quirks and failure modes of different OCR systems through a qualitative analysis. Our best model performs no better on the training set than on the test set, suggesting that our fine-tuning was bottlenecked by the LoRA adapter size and that a larger adapter could potentially yield an even stronger model.
dc.identifier.coursecode	DATX05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/311006
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Vision Language Model
dc.subject	VLM
dc.subject	machine learning
dc.subject	Optical Character Recognition
dc.subject	OCR
dc.subject	historical newspapers
dc.title	Vision Language Model Based Systems for Optical Character Recognition of Historical Swedish Newspaper Material
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Complex adaptive systems (MPCAS), MSc
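
The abstract reports results in character error rate (CER) and mentions a simple repetition trimmer, but it defines neither. The sketch below is illustrative only and makes two assumptions. First, CER is taken here in its common form: the character-level Levenshtein edit distance between reference and hypothesis, divided by the reference length. Second, `trim_repetitions` and its parameters `min_run` and `max_unit` are hypothetical names of our own. The function collapses any substring of 2 to `max_unit` characters that repeats at least `min_run` times in a row, which is presumably the kind of looping output such a trimmer targets. The thesis's actual trimmer, and any text normalization applied before scoring, may differ.

```python
import re


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete ref[i-1]
                curr[j - 1] + 1,         # insert hyp[j-1]
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def trim_repetitions(text: str, min_run: int = 4, max_unit: int = 50) -> str:
    """Hypothetical trimmer: collapse a unit of 2..max_unit characters that
    occurs at least min_run times consecutively down to a single copy."""
    # Lazy group finds the shortest repeating unit; \1{n,} requires n more copies.
    pattern = re.compile(r"(.{2,%d}?)\1{%d,}" % (max_unit, min_run - 1), re.DOTALL)
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits / 6 reference characters = 0.5
    print(repr(trim_repetitions("Stockholm den 5 Maj. " + "och " * 20)))
```

In practice, a trimmer like this would run on the model output before CER is computed against the ground-truth transcription.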

Original bundle

Name: CSE 26-12 SW MJ.pdf
Size: 17.08 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed to upon submission

Collections

Examensarbeten för masterexamen (Master's theses)