Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles

Type
Master's Thesis
Program
Complex adaptive systems (MPCAS), MSc
Published
2022
Authors
Le, Jiahui
Zhou, Ruxin
Abstract
To extend Wikipedia's coverage to more topics at lower cost, the Wikimedia Foundation has proposed the Abstract Wikipedia project. The general architecture of the project's Natural Language Generation component, which automatically generates articles from Wikidata, is largely in place. However, the same Wikidata input may be rendered as several sentences with different sentence structures. This thesis builds multilingual data sets and applies Natural Language Processing techniques (e.g. an n-gram model and a RoBERTa model) to evaluate the quality of these sentences. The report concludes that a suitable language model is capable of evaluating and selecting auto-generated Abstract Wikipedia articles and has the potential to improve the Abstract Wikipedia project. Model performance varies slightly with the model architecture and the data set.
Subject/keywords
n-gram, RoBERTa, Language Model, Natural Language Processing, Abstract Wikipedia project