Text analysis for email multi label classification

dc.contributor.author: Harsha Kadam, Sanjit
dc.contributor.author: Paniskaki, Kyriaki
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.examiner: Dubhashi, Devdatt
dc.contributor.supervisor: Naili, Marwa
dc.date.accessioned: 2020-07-08T11:24:36Z
dc.date.available: 2020-07-08T11:24:36Z
dc.date.issued: 2020 [sv]
dc.date.submitted: 2020
dc.description.abstract: This master’s thesis studies a multi-label text classification task on a small data set of short bilingual (English and Swedish) texts, namely emails. The data set contains 5800 emails distributed among 107 classes, and the majority of the emails contain both languages at the same time. Several models are employed for this task: Support Vector Machines (SVM), Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN), Quasi-Recurrent Neural Networks (QRNN) and Transformers. The experiments show that, in terms of weighted-average F1 score, the SVM outperforms the other models with a score of 0.96, followed by the CNN with 0.89 and the QRNN with 0.80. [sv]
dc.identifier.coursecode: DATX05 [sv]
dc.identifier.uri: https://hdl.handle.net/20.500.12380/301402
dc.language.iso: eng [sv]
dc.setspec.uppsok: Technology
dc.subject: natural language processing [sv]
dc.subject: machine learning [sv]
dc.subject: multi label text classification [sv]
dc.subject: deep neural networks [sv]
dc.subject: bilingual texts [sv]
dc.subject: emails [sv]
dc.subject: short texts [sv]
dc.title: Text analysis for email multi label classification [sv]
dc.type.degree: Examensarbete för masterexamen (Master's thesis) [sv]
dc.type.uppsok: H
Download
Original bundle
Name: CSE 20-52 Kadam.pdf
Size: 2.03 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.14 KB
Format: Item-specific license agreed upon to submission