Semi-Supervised Named Entity Recognition of Medical Entities in Swedish

Master's Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/248967
Download file: 248967.pdf (full text, 1.06 MB, Adobe PDF)
Full metadata record
DC Field | Value | Language
dc.contributor.author | Almgren, Simon |
dc.contributor.author | Pavlov, Sean |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers) | sv
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers) | en
dc.date.accessioned | 2019-07-03T14:27:27Z | -
dc.date.available | 2019-07-03T14:27:27Z | -
dc.date.issued | 2016 |
dc.identifier.uri | https://hdl.handle.net/20.500.12380/248967 | -
dc.description.abstract | Today's society generates vast amounts of data every day, and this is a big opportunity. This is especially true in the health-care sector, where many patient records are written daily and need to be processed in some way to identify their content. This is the task of Named Entity Recognition (NER), in which text is analyzed to locate entities and classify them into predefined classes; in our case Disorder & Finding, Pharmaceutical Drug, and Body Structure. With a model that does this with high accuracy, the analysis of medical texts could be automated, relieving people of having to read through them manually. Since patient records and other medical texts are often highly sensitive and must be handled with care for privacy reasons, a method for constructing such models without real annotated records would be a big step in the right direction. In this thesis we implement two models for NER on Swedish medical texts. Both models are built from lists of seed terms: words and phrases found in medical taxonomies that we assume belong to one of the three categories. Training data were extracted from the health-care magazine Läkartidningen and from a subset of Swedish Wikipedia. The first model is based on the work of Zhang and Elhadad [23]: a vector representation is computed for each candidate word and compared against vectors computed the same way for each category. The results of our implementation are on par with those reported by Zhang and Elhadad, which suggests that the method works as well for Swedish as it does for English.
The second model is based on recurrent neural networks and is built from the same seed terms as the first. Instead of classifying with vector calculations alone, the network is trained to classify words automatically on a character basis, reading the text both forwards and backwards at the same time. Solving NER with only unsupervised methods is inherently hard, and current techniques are not quite there yet. However, improving them bit by bit should eventually lead to great results. |
dc.language.iso | eng |
dc.setspec.uppsok | Technology |
dc.subject | Data- och informationsvetenskap |
dc.subject | Computer and Information Science |
dc.title | Semi-Supervised Named Entity Recognition of Medical Entities in Swedish |
dc.type.degree | Examensarbete för masterexamen | sv
dc.type.degree | Master Thesis | en
dc.type.uppsok | H |
Collection: Examensarbeten för masterexamen // Master Theses
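The abstract describes the first model as computing a vector for each candidate word and comparing it with vectors computed the same way for each category. The Python sketch below illustrates that idea under simplifying assumptions of our own, which are not necessarily the thesis's or Zhang and Elhadad's exact choices: a word's vector is a bag of neighbouring words within a fixed window (the hypothetical `WINDOW = 2`), a category's vector is the sum of its seed terms' context vectors, similarity is cosine, and words scoring below a hypothetical threshold of 0.1 stay unlabeled. The toy Swedish sentences and seed lists are invented for illustration, and the original method may use additional weighting and candidate-phrase detection that this sketch omits.

```python
import math
from collections import Counter

# Hypothetical context window: words on each side counted as context.
WINDOW = 2


def context_vector(term, sentences, window=WINDOW):
    """Count the words surrounding every occurrence of a single-token term."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                # Add every neighbour in the window except the term itself.
                vec.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def category_vectors(seed_terms, sentences):
    """Build one vector per category by summing its seed terms' context vectors."""
    cats = {}
    for cat, terms in seed_terms.items():
        total = Counter()
        for term in terms:
            total.update(context_vector(term, sentences))
        cats[cat] = total
    return cats


def classify(word, cat_vecs, sentences, threshold=0.1):
    """Return the most similar category, or None if no score reaches the threshold."""
    vec = context_vector(word, sentences)
    scores = {cat: cosine(vec, cv) for cat, cv in cat_vecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Invented toy corpus and seed lists, for illustration only.
sentences = [s.split() for s in [
    "patienten fick paracetamol mot feber",
    "patienten fick ibuprofen mot smärta",
    "patienten har feber och hosta",
    "patienten har migrän och illamående",
    "läkaren ordinerade paracetamol dagligen",
    "läkaren ordinerade morfin dagligen",
]]
seed_terms = {
    "Pharmaceutical Drug": ["paracetamol", "ibuprofen"],
    "Disorder & Finding": ["feber", "hosta"],
}
cats = category_vectors(seed_terms, sentences)

print(classify("morfin", cats, sentences))  # → Pharmaceutical Drug
print(classify("migrän", cats, sentences))  # → Disorder & Finding
print(classify("okänd", cats, sentences))   # → None
```

Because cosine similarity ignores vector length, summing the seed terms' context vectors ranks candidates exactly as averaging them would; the sum is used here only because it keeps the code short.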



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.