Evaluating Lexicon-Based Models versus BERT for Sentence-Level Sentiment Analysis in Swedish

dc.contributor.author: Nilsson, Erik
dc.contributor.author: Mansour, Ricardo
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (en)
dc.contributor.examiner: Johansson, Richard
dc.contributor.supervisor: Dannélls, Dana
dc.contributor.supervisor: Sievers, Erik
dc.date.accessioned: 2025-09-05T09:59:15Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: This thesis explores the development and evaluation of different approaches to sentiment analysis for the Swedish language, focusing on sentence-level sentiment detection. The study compares traditional rule- and lexicon-based models with modern machine learning approaches, particularly Bidirectional Encoder Representations from Transformers (BERT), as well as a hybrid model that combines the rule-based model with Support Vector Machines (SVM). Using the Sparv pipeline for linguistic analysis and breakdown in tandem with the sentiment lexicon SenSALDO, we aim to extend existing research on Swedish rule-based models by including linguistic features. The research also involves expanding the lexicon with neutral, positive, and negative entries to improve the coverage and accuracy of sentence-level sentiment analysis. The evaluation highlights the strengths and weaknesses of each model: the BERT model performed best overall, especially on neutral sentences, while the rule-based and hybrid models were much better on positive sentences. For negative sentiment detection, the hybrid SVM model performed best. Our thesis contributes to the ongoing discourse on effective sentiment analysis in non-English languages and offers insights for further advances in natural language processing (NLP) for Swedish.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310424
dc.language.iso: eng
dc.relation.ispartofseries: CSE-24-52
dc.setspec.uppsok: Technology
dc.subject: Computer science, Sentiment analysis, BERT, Lexicon-based, Rule-based, Support Vector Machines (SVM), Swedish language, Natural language processing (NLP), Sparv, SALDO, SenSALDO, Machine learning.
dc.title: Evaluating Lexicon-Based Models versus BERT for Sentence-Level Sentiment Analysis in Swedish
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Computer science – algorithms, languages and logic (MPALG), MSc

Download

Original bundle

Name: CSE 24-52 EN RM.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
