Document Embeddings for Scientific Publications

dc.contributor.author: Schäfer, Florian
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers) [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers) [en]
dc.date.accessioned: 2019-07-03T14:42:41Z
dc.date.available: 2019-07-03T14:42:41Z
dc.date.issued: 2018
dc.description.abstract: As more and more research is published, it is becoming harder for humans to keep up with new results at the pace at which they emerge. Even traditional computational methods cannot adequately handle such large amounts of information. This motivates us to draw on state-of-the-art research to find a method that is, first and foremost, accurate enough to help researchers find relevant literature while also being computationally efficient enough to handle increasingly large amounts of data. In this work, we focus specifically on vector space models, since they allow us to use several efficient geometric computations. We first establish an evaluation framework comprising several metrics so that we can make a sound assessment. Then, we explain and evaluate several neural network-based vector space models in the context of scientific publications using our framework. To this end, we assembled two novel datasets: one for the large-scale multi-domain corpus of Iris AI AS and one for the systematic mapping study on autonomous vehicles conducted by Chalmers University of Technology, which serves as an example of a single domain. Lastly, we analyze how well the vector space models we obtained can aid researchers in conducting systematic mapping studies compared to traditional methods. We found that the evaluated approaches vary strongly in quality. Some performed barely above the random baseline, which indicates either that the method is unsuitable or that sufficient data or optimal hyperparameters were lacking. Sequential approaches and autoencoders in particular, as well as the combination of the two, yielded surprisingly good results, which makes these approaches worth considering for future studies.
Apart from the quantitative results of our evaluation framework, we also provide a qualitative solution that demonstrates, for the autonomous vehicles mapping study, how vector space models can offer benefits over traditional topic models.
dc.identifier.uri: https://hdl.handle.net/20.500.12380/254988
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Data- och informationsvetenskap
dc.subject: Computer and Information Science
dc.title: Document Embeddings for Scientific Publications
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master Thesis [en]
dc.type.uppsok: H
local.programme: Software engineering and technology (MPSOF), MSc
Download
Original bundle
Name: 254988.pdf
Size: 1.96 MB
Format: Adobe Portable Document Format
Description: Fulltext