Text Analysis - Exploring latent semantic models for information retrieval, topic modeling and sentiment detection

Luotonen, Adam; Jalsborn, Erik

dc.contributor.author	Luotonen, Adam
dc.contributor.author	Jalsborn, Erik
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)	en
dc.date.accessioned	2019-07-03T12:43:25Z
dc.date.available	2019-07-03T12:43:25Z
dc.date.issued	2011
dc.description.abstract	With the increasing use of the Internet and social media, the amount of available data has exploded. As most of this data is natural language text, there is a need for efficient text analysis techniques that enable the extraction of useful data. This process is called text mining, and in this thesis some of these techniques are evaluated for the purpose of integrating them into the visual data mining software TIBCO Spotfire®. In total, five analysis models with different running time, memory use and performance have been analyzed, implemented and evaluated. The tf-idf vector space model was used as a baseline. It can be extended using Latent Semantic Analysis and random projection to find latent semantic relationships between documents. Finally, Latent Dirichlet Allocation (LDA), the Joint Sentiment/Topic model (JST) and Sentiment Latent Dirichlet Allocation (SLDA) are used to extract topics. The latter two are extensions of LDA that also detect positive and negative sentiment. Evaluation was done using the perplexity measure for topic modeling, average precision for searching, and classification accuracy on positive and negative reviews for the sentiment models. It was concluded that, for searching, a vector space model with tf-idf weighting performed similarly to the latent semantic models on the test corpus used. Topic modeling was shown to provide useful output, although at the expense of running time. The JST and SLDA sentiment detectors showed a small improvement over a baseline word-counting classifier, especially for a multiple-domain dataset. Finally, it was shown that their sentiment classification accuracy varied from run to run, indicating that further investigation is warranted.
dc.identifier.uri	https://hdl.handle.net/20.500.12380/149698
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Datavetenskap (datalogi)
dc.subject	Computer Science
dc.title	Text Analysis - Exploring latent semantic models for information retrieval, topic modeling and sentiment detection
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master Thesis	en
dc.type.uppsok	H

Name: 149698.pdf
Size: 2.63 MB
Format: Adobe Portable Document Format
Description: Fulltext

Collections

Master's theses