Twitter Topic Modeling

dc.contributor.authorBunyik, Karina
dc.contributor.departmentChalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers) [sv]
dc.contributor.departmentChalmers University of Technology / Department of Computer Science and Engineering (Chalmers) [en]
dc.date.accessioned2019-07-03T13:30:46Z
dc.date.available2019-07-03T13:30:46Z
dc.date.issued2014
dc.description.abstractFollowing social media discussions related to real-life events has been a topic of great interest. There is no general method for deciding whether social media discussions reflect the dynamics of the events or whether they lead a life of their own. Existing methods for analyzing social media discussions rely on extensive manual work by domain experts and do not generalize well to discussions in languages other than English or to different kinds of events. Combining domain experts' knowledge with data-driven approaches can lead to models that are applicable to different domains and at the same time capable of handling large amounts of social media data. In this research, we modeled the Twitter discussions about the Swedish party leader debate held in October 2013. We constructed a semiautomatic model based on Term Frequency-Inverse Document Frequency (TF-IDF) to identify and measure the debate topics on Twitter. To discover other discussions, we used Latent Dirichlet Allocation, an unsupervised learning algorithm. We evaluated the models manually with the help of a domain expert. We compared the Twitter discussions to the topics the politicians discussed in the debate. The correlation between the Twitter discussions and the debate topics corresponds to the results of a still ongoing political science study. The political science domain expert Linn Sandberg from the Department of Political Science, University of Gothenburg, contributed to the research by defining the research question and evaluating the models.
dc.identifier.urihttps://hdl.handle.net/20.500.12380/202973
dc.language.isoeng
dc.setspec.uppsokTechnology
dc.subjectData- och informationsvetenskap
dc.subjectComputer and Information Science
dc.titleTwitter Topic Modeling
dc.type.degreeExamensarbete för masterexamen [sv]
dc.type.degreeMaster Thesis [en]
dc.type.uppsokH
local.programmeComputer science – algorithms, languages and logic (MPALG), MSc

Download

Original bundle

Name: 202973.pdf
Size: 8.71 MB
Format: Adobe Portable Document Format
Description: Full text