Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles

dc.contributor.author: Le, Jiahui
dc.contributor.author: Zhou, Ruxin
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Ranta, Aarne
dc.contributor.supervisor: Ranta, Aarne
dc.date.accessioned: 2022-12-02T13:40:55Z
dc.date.available: 2022-12-02T13:40:55Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.description.abstract: To enrich Wikipedia with more topics at lower cost, the Abstract Wikipedia project, an initiative of the Wikimedia Foundation, has been proposed. The general architecture of the project's Natural Language Generation component, which automatically generates articles from Wikidata, is largely in place. However, the same Wikidata input may be rendered as several sentences with different sentence structures. This thesis builds multilingual data sets and uses Natural Language Processing techniques (e.g. n-gram and RoBERTa models) to evaluate the quality of these sentences. The report concludes that a suitable language model is capable of evaluating and selecting auto-generated Abstract Wikipedia articles and has the potential to improve the Abstract Wikipedia project. Model performance varies slightly with the model architecture and the data set.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://odr.chalmers.se/handle/20.500.12380/305869
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: n-gram
dc.subject: RoBERTa
dc.subject: Language Model
dc.subject: Natural Language Processing
dc.subject: Abstract Wikipedia project
dc.title: Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
Original bundle
Name: CSE 22-114 Le Zhou.pdf
Size: 7.19 MB
Format: Adobe Portable Document Format