Leveraging Data Augmentation for Better Named Entity Recognition in Low-Resource Settings
| dc.contributor.author | Björnerud, Philip | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Bernardy, Jean-Philippe | |
| dc.contributor.supervisor | Dannélls, Dana | |
| dc.contributor.supervisor | Kokkinakis, Dimitrios | |
| dc.date.accessioned | 2024-02-21T09:39:17Z | |
| dc.date.available | 2024-02-21T09:39:17Z | |
| dc.date.issued | 2024 | |
| dc.date.submitted | 2023 | |
| dc.description.abstract | This thesis investigates challenges in Natural Language Processing (NLP), focusing on Named Entity Recognition (NER), the NLP subtask of identifying named entities in text and classifying them into predefined categories. To address data scarcity, which is particularly critical for non-English languages such as Swedish, the study evaluates several data augmentation methods by fine-tuning the transformer-based model KB-BERT. Following the work of X. Dai and H. Adel (2020) [1], low-resource settings are simulated using three training sets containing 50, 150, and 500 instances, respectively. The thesis also explores whether a newly developed state-of-the-art data augmentation method can outperform other data augmentation methods in improving an NLP model, comparing three methods: synonym replacement, mention replacement, and AugGPT, the last being the state-of-the-art method. The findings show that synonym replacement was the most effective data augmentation method across the low-resource settings, achieving the highest F1-score increase in all scenarios. AugGPT achieved the second-highest average F1-score, while mention replacement achieved the lowest across the tested settings. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/307588 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Named Entity Recognition | |
| dc.subject | Data Augmentation | |
| dc.subject | Low-Resource Settings | |
| dc.subject | Synonym Replacement | |
| dc.subject | Mention Replacement | |
| dc.subject | AugGPT | |
| dc.title | Leveraging Data Augmentation for Better Named Entity Recognition in Low-Resource Settings | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Computer science – algorithms, languages and logic (MPALG), MSc | |
