Twitter Topic Modeling

Type
Master Thesis
Program
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2014
Author
Bunyik, Karina
Abstract
Following social media discussions related to real-life events has been a topic of great interest. There is no general method for deciding whether social media discussions reflect the dynamics of the events or lead a life of their own. Existing methods for analyzing social media discussions rely on extensive manual work by domain experts and do not generalize well to discussions in languages other than English or to different kinds of events. Combining the domain expert's knowledge with data-driven approaches can lead to models that are applicable to different domains and at the same time are capable of handling large amounts of data from social media. In this research, we modeled the Twitter discussions about the Swedish party leader debate held in October 2013. We constructed a semiautomatic model based on Term Frequency-Inverse Document Frequency in order to identify and measure the debate topics on Twitter. To discover other discussions, we used Latent Dirichlet Allocation, an unsupervised learning algorithm. We evaluated the models manually with the help of a domain expert. We compared the Twitter discussions to the topics the politicians talked about in the debate. The correlation between the Twitter discussions and the debate topics corresponds to the results of ongoing political science research. The political science domain expert Linn Sandberg from the University of Gothenburg, Department of Political Science, contributed to the research by defining the research question and evaluating the models.
Subject/keywords
Computer and Information Science