Generation of Wikidata Descriptions with Grammatical Framework
Type
Master's Thesis
Abstract
Wikidata is a collaborative, multilingual knowledge base that serves a wide range of purposes, such as supporting Wikipedia, and plays a significant role in ensuring equitable access to information worldwide. However, the quality and consistency of entity descriptions in different languages vary greatly, and many entities lack descriptions altogether. This thesis presents a workflow based on the Grammatical Framework (GF) for the automated generation of multilingual Wikidata entity descriptions. The system integrates property extraction, grammar design, and automatic linearization, enabling the systematic generation of multilingual descriptions while reducing, to some extent, the need for manual intervention and GF-specific expertise. Manual evaluation shows that, compared to human-written Wikidata descriptions and those generated by large language models, GF-generated descriptions achieve higher cross-linguistic consistency and factual accuracy. The workflow also supports efficient extension to additional languages, as demonstrated by the Bengali case by Mohammad Rakib Imtiaz. These results highlight the potential of GF, the GF Resource Grammar Library, and the approach introduced in this thesis for scalable, verifiable, and reliable multilingual description generation.
Subject/keywords
Wikidata, Grammatical Framework, Natural Language, computer science
