Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles

Le, Jiahui; Zhou, Ruxin

Download

CSE 22-114 Le Zhou.pdf (7.19 MB)

Published

2022

Authors

Le, Jiahui

Zhou, Ruxin

Type

Master's Thesis

Program

Complex adaptive systems (MPCAS), MSc

Abstract

To extend Wikipedia to more topics at lower cost, the Wikimedia Foundation has proposed the Abstract Wikipedia project. The general architecture of the project's Natural Language Generation component, which automatically generates articles from Wikidata, has largely been built. However, the same input data can be rendered as several sentences with different sentence structures. This thesis built multilingual data sets and used Natural Language Processing techniques (e.g. n-gram and RoBERTa models) to evaluate the quality of these sentences. The report concludes that a suitable language model is capable of evaluating and selecting auto-generated Abstract Wikipedia articles and has the potential to improve the Abstract Wikipedia project. Model performance varies slightly with the model architecture and the data set.
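The idea of using a language model to choose among several generated phrasings of the same fact can be illustrated with a small sketch. This is not the authors' implementation: it assumes a bigram model with add-one (Laplace) smoothing, uses per-token perplexity as the selection criterion, and trains on a tiny hypothetical corpus. The function names `train_bigram`, `perplexity` and `select_best` are illustrative only.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Each sentence is wrapped in <s> ... </s> boundary markers (an assumption
    of this sketch, not taken from the thesis).
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(sentence, unigrams, bigrams):
    """Per-token perplexity under an add-one smoothed bigram model.

    Unseen words are handled naively (they simply get the smoothed
    minimum count), which is enough for a toy example.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


def select_best(candidates, unigrams, bigrams):
    """Pick the candidate sentence the model finds most fluent (lowest perplexity)."""
    return min(candidates, key=lambda s: perplexity(s, unigrams, bigrams))


# Hypothetical toy corpus and two candidate renderings of the same fact.
corpus = [
    "marie curie was a physicist",
    "marie curie was born in warsaw",
    "she was a chemist",
]
candidates = ["marie curie was a physicist", "a physicist marie curie was"]

uni, bi = train_bigram(corpus)
print(select_best(candidates, uni, bi))  # → marie curie was a physicist
```

A neural model such as RoBERTa would replace the bigram probabilities with scores from a pretrained network, but the selection step (score every candidate, keep the best) stays the same in this framing.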

Subject/keywords

n-gram, RoBERTa, Language Model, Natural Language Processing, Abstract Wikipedia project

URI

https://odr.chalmers.se/handle/20.500.12380/305869

Collections

Master's theses
