Semi-Supervised Named Entity Recognition of Medical Entities in Swedish

Almgren, Simon; Pavlov, Sean

dc.contributor.author	Almgren, Simon
dc.contributor.author	Pavlov, Sean
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)	en
dc.date.accessioned	2019-07-03T14:27:27Z
dc.date.available	2019-07-03T14:27:27Z
dc.date.issued	2016
dc.description.abstract	The vast amount of data generated each day presents a major opportunity for today’s society. This is especially true in the health-care sector, where many patient journals are written daily and need to be processed to properly identify their content. Enter the field of Named Entity Recognition (NER), where text is analyzed to locate entities and classify them into predefined classes; in our case Disorder & Finding, Pharmaceutical Drug, and Body Structure. With a model that can do this with high accuracy, the analysis of medical texts could be automated, relieving people of having to read through them manually. Since journals and other medical texts are often very sensitive and must be handled with care for privacy reasons, a method for constructing these models without the need for real annotated journals would be a big step in the right direction. In this thesis we have implemented two models for solving the problem of NER for medical texts in Swedish. Both models were created from lists of seed terms, which consist of words and phrases found in medical taxonomies that we assume belong to one of the three categories. Training data were extracted from the health-care magazine Läkartidningen as well as a subset of Swedish Wikipedia. The first model is based on the work of Zhang and Elhadad [23], where a vector representation is calculated for each candidate word and compared against vectors calculated in the same way for the different categories. The results of our implementation are on par with those reported by Zhang and Elhadad, which suggests that this method works as well for Swedish as it does for English.
The second model is based on recurrent neural networks and is built from the same seed terms as the first model, but instead of using only vector calculations for classification, the network is trained to classify words automatically on a character basis, reading the text both forwards and backwards at the same time. Solving the problem of NER using only unsupervised methods is inherently hard, and techniques for solving the problem are not quite there yet. However, improving them bit by bit will in the end lead to great results.
dc.identifier.uri	https://hdl.handle.net/20.500.12380/248967
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Data- och informationsvetenskap
dc.subject	Computer and Information Science
dc.title	Semi-Supervised Named Entity Recognition of Medical Entities in Swedish
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master Thesis	en
dc.type.uppsok	H
local.programme	Computer science – algorithms, languages and logic (MPALG), MSc

Original bundle

Name: 248967.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
Description: Full text
