Nonparametric Evolutionary Short Text Topic Modeling

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/255512
Download file(s): 255512.pdf (Fulltext, 1.31 MB, Adobe PDF)
Type: Master Thesis
Title: Nonparametric Evolutionary Short Text Topic Modeling
Authors: Ejbyfeldt, Emil
Abstract: With the advent of social media, more information is published and more discussions happen in the form of short text. Tools are needed for detecting new topics and changes in existing topics, helping people understand and explore the vast amount of information available. Many current approaches do not handle short text well, and some require the number of topics to be specified beforehand. This thesis introduces a way of extending Dirichlet Process Mixture Models to handle temporal data and derives a collapsed Gibbs sampling algorithm for inference in the model. In the model, data is divided into epochs, and data within an epoch is exchangeable. The number of clusters in each epoch is unbounded, and the model can recover the birth, death and split of clusters. Topic modeling is done by assuming that each short text belongs to a single topic. The model is evaluated specifically on short text data to show its ability to discover topic evolution and the appearance of new topics. We also show that the model has better stability and less overfitting than previous solutions with the same abilities.
Keywords: Computer and Information Science
Issue Date: 2018
Publisher: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/255512
Collection: Master Theses
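
Illustrative note: the abstract states that the number of clusters in each epoch is unbounded. The sketch below is not taken from the thesis and is not its temporal model or its Gibbs sampler. It only draws cluster labels from the standard Chinese restaurant process, the prior over partitions that underlies Dirichlet process mixture models, to show how the number of clusters can grow with the data instead of being fixed in advance. The function name crp_assignments and the parameters alpha and seed are assumptions chosen for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make it
    more likely that an item opens a new cluster.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index of item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0, seed=0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

In a full mixture model, each item's weight for a cluster would also be multiplied by how well the cluster explains that item's words. That step is omitted here.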


