Unsupervised Word-Sense Disambiguation for Product Description Texts
dc.contributor.author | Dahlberg, Andreas | |
dc.contributor.author | Olin, Oskar | |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | sv |
dc.contributor.examiner | Ranta, Aarne | |
dc.contributor.supervisor | Angelov, Krasimir | |
dc.date.accessioned | 2020-07-08T09:56:43Z | |
dc.date.available | 2020-07-08T09:56:43Z | |
dc.date.issued | 2020 | sv |
dc.date.submitted | 2020 | |
dc.description.abstract | As the name suggests, word-sense disambiguation is the task of determining the correct meaning, or sense, of words that can have multiple interpretations. Textual, a company whose product automatically generates product description texts in multiple languages, can use word-sense disambiguation to improve the quality of its texts. This project attempts to solve that task. Word alignment is used to define and label the senses of words as quadruples of translations in English, Swedish, French and Spanish, which turns word-sense disambiguation into a supervised task. Contextually similar quadruples are then merged using a permutation test and a novel merging algorithm. In word-sense disambiguation it is natural to represent a word, together with its context, as a vector in a higher-dimensional vector space. For this, different BERT models are used, as well as the simpler Bag-of-Words and contextual Word2Vec models. The results on 69 different word types show an average accuracy of 91.97%, compared to 58.35% for the baseline classifier, which always predicts the most frequent sense. On unseen data from new fashion sites, the average accuracy on 8 word types is 85.19%, compared to 56.89% for the baseline classifier. | sv |
dc.identifier.coursecode | DATX05 | sv |
dc.identifier.uri | https://hdl.handle.net/20.500.12380/301391 | |
dc.language.iso | eng | sv |
dc.setspec.uppsok | Technology | |
dc.subject | word-sense disambiguation | sv |
dc.subject | natural language processing | sv |
dc.subject | BERT | sv |
dc.subject | word alignment | sv |
dc.subject | machine learning | sv |
dc.subject | artificial intelligence | sv |
dc.title | Unsupervised Word-Sense Disambiguation for Product Description Texts | sv |
dc.type.degree | Master's thesis | sv |
dc.type.uppsok | H |