Unsupervised Word-Sense Disambiguation for Product Description Texts

dc.contributor.author: Dahlberg, Andreas
dc.contributor.author: Olin, Oskar
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Ranta, Aarne
dc.contributor.supervisor: Angelov, Krasimir
dc.date.accessioned: 2020-07-08T09:56:43Z
dc.date.available: 2020-07-08T09:56:43Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.description.abstract: As the name suggests, word-sense disambiguation is the task of determining the correct meaning, or sense, of words that can have multiple interpretations. Textual, a company whose product automatically generates product description texts in multiple languages, can use word-sense disambiguation to improve the quality of its texts. This project attempts to solve that task. Word alignment is used to define and label the senses of words as quadruples of translations in English, Swedish, French and Spanish, making word-sense disambiguation a supervised task. Contextually similar quadruples are then merged using a permutation test and a novel merging algorithm. In word-sense disambiguation it is natural to represent a word together with its context as a vector in a high-dimensional vector space. For this, different BERT models are used, as well as the simpler Bag-of-Words and contextual Word2Vec models. The results on 69 different word types show an average accuracy of 91.97%, compared to 58.35% for the baseline classifier, which always predicts the most frequent sense. On unseen data from new fashion sites, the average accuracy on 8 word types is 85.19%, compared to 56.89% for the baseline classifier.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/301391
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: word-sense disambiguation
dc.subject: natural language processing
dc.subject: BERT
dc.subject: word alignment
dc.subject: machine learning
dc.subject: artificial intelligence
dc.title: Unsupervised Word-Sense Disambiguation for Product Description Texts
dc.type.degree: Master's thesis
dc.type.uppsok: H

Download

Original bundle

Name: CSE 20-68 Dahlberg Olin.pdf
Size: 8.38 MB
Format: Adobe Portable Document Format
