A Computational Grammar and Lexicon for Maltese

Type
Master Thesis
Program
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2013
Author
Camilleri, John J.
Abstract
Maltese is the national language of Malta and an official language of the European Union. While classified as Semitic, Maltese has been heavily influenced by the Romance languages and English, and features both root-and-pattern and concatenative morphologies. Despite its active use, the language is highly under-resourced in digital terms. This thesis contributes two computational resources for Maltese: a grammar and an online full-form lexicon. The first part of this thesis deals with a computational grammar for Maltese, implemented using the Grammatical Framework (GF). GF is a multilingual grammar formalism that uses abstract syntax trees as language-independent semantic representations. Its Resource Grammar Library (RGL) already covers the morphology and basic syntax of some 27 languages from around the world. Maltese is the 28th addition to the RGL, and the first Semitic language in the library to be completed. The smart paradigms implemented in the morphological part of the grammar allow full inflection tables to be produced for any lexical unit, often requiring only a lemmatised form. This report looks at some of the more interesting implementation details of the grammar and discusses the compromises that had to be made along the way. The second part covers the collection of various Maltese lexical resources into a single searchable collection, using a schema-less database to accommodate partial data from heterogeneous sources. We then use the smart paradigms from the morphological part of the grammar to automatically produce some 4 million inflected forms and extend the collection into a full-form computational lexicon, which can be used for morphological lookup and spell checking. All the software and resources described in this thesis are open-source and free to use for any purpose.
Subject/keywords
Computer and Information Science