Unsupervised Disambiguation of Abstract

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/255307
File: 255307.pdf (Fulltext, 642.18 kB, Adobe PDF)
Type: Master Thesis
Title: Unsupervised Disambiguation of Abstract
Authors: Kalldal, Oscar
Ludvigsson, Maximilian
Abstract: Disambiguating natural text is the task of choosing the correct meaning among several possible interpretations. This thesis focuses on disambiguating parse trees created by Grammatical Framework, a formal language that represents the meaning of natural language sentences with abstract syntax trees in order to do machine translation. Since each tree represents one meaning, a sentence with several parse trees has several interpretations, of which the most probable one should be chosen. To achieve this, a language model on trees is defined. The model is then used to compare the possible trees and choose the one with the highest probability. To estimate the parameters of the model, the probability of the different meanings behind a word needs to be estimated. This is done using the Expectation Maximization algorithm. Experiments are run on seven different languages to show that the method generalizes. Different smoothing techniques as well as different dictionaries are evaluated. A novel merged Wordnet is constructed in order to avoid sparseness. The method is evaluated by doing word sense disambiguation (a subtask of tree disambiguation) on standard data sets. The model is shown to be comparable to other unsupervised methods in SemEval 2015.
Keywords: Computer and Information Science
Issue Date: 2018
Publisher: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/255307
Collection: Master Theses
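
Illustration (not taken from the thesis): the abstract states that the probabilities of the different meanings behind a word are estimated with the Expectation Maximization algorithm. The sketch below shows that idea on a deliberately simplified model. It assumes a plain unigram distribution over senses rather than the thesis's language model on trees, and the function names, the toy dictionary and the sense labels are all hypothetical.

```python
from collections import defaultdict


def em_sense_probabilities(tokens, candidates, iterations=50):
    """Estimate a unigram distribution over senses from unlabelled words.

    tokens     -- list of observed (ambiguous) words
    candidates -- dict mapping each word to its possible senses
    Assumption: a toy unigram sense model, not the thesis's tree model.
    """
    senses = sorted({s for w in tokens for s in candidates[w]})
    # Start from a uniform distribution over all senses.
    prob = {s: 1.0 / len(senses) for s in senses}
    for _ in range(iterations):
        # E-step: split each token's count over its candidate senses,
        # in proportion to the current sense probabilities.
        expected = defaultdict(float)
        for w in tokens:
            norm = sum(prob[s] for s in candidates[w])
            for s in candidates[w]:
                expected[s] += prob[s] / norm
        # M-step: renormalise the expected counts into probabilities.
        total = sum(expected.values())
        prob = {s: expected[s] / total for s in senses}
    return prob


def disambiguate(word, candidates, prob):
    """Pick the candidate sense of `word` with the highest probability."""
    return max(candidates[word], key=lambda s: prob[s])


# Hypothetical toy dictionary: senses are shared between synonymous words.
candidates = {
    "bank": ["financial_institution", "river_side"],
    "lender": ["financial_institution"],
    "shore": ["river_side"],
}
tokens = ["bank", "bank", "lender", "lender", "shore"]

prob = em_sense_probabilities(tokens, candidates)
best = disambiguate("bank", candidates, prob)
print(prob, best)
```

In this toy corpus the unambiguous words "lender" and "shore" pull probability mass toward their senses, and the ambiguous word "bank" is then resolved to whichever of its senses ends up more probable. Smoothing, which the abstract also mentions, is left out of the sketch.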


