Unsupervised Word-Sense Disambiguation for Product Description Texts

Type

Master's thesis

Abstract

As the name suggests, word-sense disambiguation is the task of determining the correct meaning, or sense, of a word that has multiple interpretations. Textual, a company whose product automatically generates product description texts in multiple languages, can use word-sense disambiguation to improve the quality of its texts. This project attempts to solve that task. Word alignment is used to define and label the senses of words as quadruples of translations in English, Swedish, French and Spanish, which turns word-sense disambiguation into a supervised task. Contextually similar quadruples are then merged using a permutation test and a novel merging algorithm. In word-sense disambiguation it is natural to represent a word together with its context as a vector in a higher-dimensional vector space. For this, different BERT models are used, as well as the simpler Bag-of-Words and contextual Word2Vec models. Across 69 word types, the average accuracy is 91.97%, compared to 58.35% for the baseline classifier, which always predicts the most frequent sense. On unseen data from new fashion sites, the average accuracy across 8 word types is 85.19%, compared to 56.89% for the baseline classifier.
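
The baseline mentioned in the abstract, a classifier that always predicts the most frequent sense, can be illustrated with a minimal Python sketch. This is not the thesis code: the function name `most_frequent_sense_baseline` and the example sense quadruples are hypothetical, chosen only to show how such a baseline accuracy could be computed for a single word type.

```python
from collections import Counter


def most_frequent_sense_baseline(train_labels, test_labels):
    """Score a classifier that always predicts the most frequent training sense.

    Returns a tuple (predicted_sense, accuracy_on_test_labels).
    """
    if not train_labels or not test_labels:
        raise ValueError("both label lists must be non-empty")
    # The single sense that occurs most often in the training labels.
    predicted = Counter(train_labels).most_common(1)[0][0]
    # Accuracy is the share of test items whose true sense equals that prediction.
    correct = sum(1 for label in test_labels if label == predicted)
    return predicted, correct / len(test_labels)


# Hypothetical sense labels for one word type, written as quadruples of
# (English, Swedish, French, Spanish) translations. Illustrative only.
SENSE_A = ("top", "topp", "haut", "top")
SENSE_B = ("top", "topp", "sommet", "cima")

train = [SENSE_A, SENSE_A, SENSE_A, SENSE_B]
test = [SENSE_A, SENSE_B, SENSE_A, SENSE_A]

sense, accuracy = most_frequent_sense_baseline(train, test)
print(sense, accuracy)
```

Averaging this per-word accuracy over all word types would give a figure comparable in kind to the baseline percentages reported above, assuming the same train/test split is used for the baseline and the actual classifier.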

Subject/keywords

word-sense disambiguation, natural language processing, BERT, word alignment, machine learning, artificial intelligence
