Document Embeddings for Scientific Publications

Schäfer, Florian

Download

Primary file 254988.pdf (1.96 MB)

Published

2018

Author

Schäfer, Florian

Type

Master's thesis

Program

Software engineering and technology (MPSOF), MSc

Abstract

As more and more research is published, it is becoming harder for humans to keep up with new results at the pace at which they emerge. Even traditional computational methods cannot sufficiently handle such large amounts of information, which motivates us to draw on state-of-the-art research to find a method that is first and foremost accurate enough to aid researchers in finding relevant literature, while also being computationally efficient enough to handle ever larger amounts of data. In this work, we focus specifically on vector space models, since they enable us to use several efficient geometric computations. We first establish an evaluation framework including several metrics in order to make a sound assessment. Then, we explain and evaluate several neural network-based vector space models in the context of scientific publications using our framework. To this end, we assembled two novel datasets: one for the large-scale multi-domain corpus of Iris AI AS, and one for the systematic mapping study on autonomous vehicles conducted by Chalmers University of Technology, which serves as an example of a single domain. Lastly, we analyze how well the obtained vector space models can aid researchers in conducting systematic mapping studies compared to traditional methods. We found that the evaluated approaches vary strongly in quality. Some performed barely above the random baseline, indicating either that the method is unsuitable or that sufficient data or optimal hyperparameters were lacking. Sequential approaches and autoencoders in particular, as well as the combination of the two, yielded surprisingly good results, which makes these approaches worth considering for future studies.
Apart from the quantitative results of our evaluation framework, we also provided a qualitative solution demonstrating, for the autonomous vehicles mapping study, how vector space models can provide benefits over traditional topic models.

Subject/keywords

Computer and Information Science

URI

https://hdl.handle.net/20.500.12380/254988

Collections

Master's theses
