Deep Active Learning for Swedish Named Entity Recognition: An empiric evaluation of active learning algorithms for Named Entity Recognition
Type
Master's thesis
Abstract
Named entity recognition holds promise for numerous practical applications involving
text data, such as keyword extraction and automated anonymization. However,
successfully training a machine learning model for Named Entity Recognition is challenging
because of the amount of annotated data required, especially for languages that are
not widely spoken, such as Swedish. In such cases, combining a deep pre-trained
model such as BERT with active learning may be preferable. To gain insight into
how such an approach can be implemented, this thesis presents an empirical study of
various active learning strategies used in conjunction with BERT-based named entity
recognition. The main focus of the study is the performance of different active
learning algorithms and the effect of acquisition size on that performance. After
comparing and evaluating 17 different active learning methods, the study's empirical
results demonstrate that entropy sampling is the best-performing active learning
algorithm for Named Entity Recognition of Swedish texts, and that the choice of
acquisition size has a practically negligible effect on performance.
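To illustrate what entropy sampling means in a pool-based NER setting, here is a minimal sketch, not the thesis's implementation. It assumes the model (e.g. a BERT token classifier) already yields a softmax distribution over entity labels for every token of every unlabeled sentence. Averaging token entropies into a single sentence score is an assumption (summing or taking the maximum are common alternatives), and the function names `token_entropy`, `sentence_score` and `select_by_entropy` are hypothetical. The `acquisition_size` parameter plays the role of the acquisition size studied in the thesis: the number of sentences sent for annotation in each active learning round.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: assume each sentence is given as a list of per-token
# label probability distributions produced by the NER model.

def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sentence_score(token_probs: Sequence[Sequence[float]]) -> float:
    """Aggregate token entropies into one sentence score (assumption: mean)."""
    if not token_probs:
        return 0.0
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)

def select_by_entropy(pool: List[Sequence[Sequence[float]]],
                      acquisition_size: int) -> List[int]:
    """Return indices of the `acquisition_size` most uncertain sentences."""
    scores = [sentence_score(sentence) for sentence in pool]
    # Highest entropy first: these are the sentences the model is least sure about.
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:acquisition_size]

if __name__ == "__main__":
    pool = [
        [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident predictions
        [[0.34, 0.33, 0.33], [0.5, 0.5, 0.0]],     # near-uniform, uncertain
        [[0.7, 0.2, 0.1]],                         # moderately uncertain
    ]
    print(select_by_entropy(pool, acquisition_size=2))
```

In a full loop, the selected sentences would be annotated, added to the labeled set, and the model retrained before the next round; only the selection step is shown here.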
Subject/keywords
Active Learning, Deep Learning, Transformer, BERT, NLP, Named Entity Recognition, Diversity-Based Sampling, Uncertainty-Based Sampling, Pool-Based Sampling, Cumulative Training