Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles
dc.contributor.author | Le, Jiahui | |
dc.contributor.author | Zhou, Ruxin | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Ranta, Aarne | |
dc.contributor.supervisor | Ranta, Aarne | |
dc.date.accessioned | 2022-12-02T13:40:55Z | |
dc.date.available | 2022-12-02T13:40:55Z | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022 | |
dc.description.abstract | To extend Wikipedia to more topics at lower cost, the Wikimedia Foundation has proposed the Abstract Wikipedia project. The general architecture of the project's Natural Language Generation component, which automatically generates articles from Wikidata, is largely in place. However, the same input data may be realized as several sentences with different structures. This thesis built multilingual data sets and applied Natural Language Processing techniques (e.g. an n-gram model and a RoBERTa model) to evaluate the quality of these sentences. The report concludes that a suitable language model can evaluate and select auto-generated Abstract Wikipedia articles and has the potential to improve the Abstract Wikipedia project. Model performance varies slightly with the model architecture and the data set. | |
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | https://odr.chalmers.se/handle/20.500.12380/305869 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | n-gram | |
dc.subject | RoBERTa | |
dc.subject | Language Model | |
dc.subject | Natural Language Processing | |
dc.subject | Abstract Wikipedia project | |
dc.title | Multilingual Language Models for the Evaluation and Selection of auto-generated Abstract Wikipedia Articles | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |
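The abstract describes using a language model to pick the best of several alternative sentences generated from the same input. Below is a minimal sketch of that selection step. It assumes a bigram model with add-one smoothing, which is only a simple stand-in for the n-gram models the thesis evaluated, and it scores each candidate by its length-normalized log-probability. The RoBERTa approach is not covered. The function names and the toy corpus are hypothetical illustrations and do not come from the thesis.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count context unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])                   # counts of each token as a context
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # counts of (previous, next) pairs
    vocab = {t for s in corpus for t in s.lower().split()} | {"</s>"}
    return unigrams, bigrams, len(vocab) + 1           # +1 reserves a slot for unseen words

def avg_log_prob(sentence, unigrams, bigrams, vocab_size):
    """Average add-one-smoothed log-probability per bigram (length-normalized score)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(p)
    return total / (len(tokens) - 1)

def select_best(candidates, model):
    """Return the candidate sentence the model scores highest."""
    return max(candidates, key=lambda c: avg_log_prob(c, *model))

# Hypothetical toy training data and candidate realizations of one fact.
corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Rome is the capital of Italy",
]
model = train_bigram(corpus)
candidates = ["Stockholm is the capital of Sweden", "Stockholm the capital is of Sweden"]
print(select_best(candidates, model))
```

Length normalization stops the model from favoring shorter candidates just because they contain fewer factors. A real system would train on a large monolingual corpus per language and use stronger smoothing.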