Developing Social Media Analytics by the Means of Machine Learning: The Case of the Diffusion of Virtual Reality Technology

Type
Master's thesis
Program
Management and economics of innovation (MPMEI), MSc
Published
2017
Authors
Berthold, Adam
Larsson, Daniel
Abstract
Social media analytics is concerned with analyzing data generated on social media platforms. It is commonly used within businesses to gain insights that improve decision-making. Social media analytics is also used within research, notably innovation research. Using social media data within research often entails reading large numbers of text posts. Since social media datasets can quickly become very large, there is a demand for computerized methods to replace manual analysis. This research explores ways to use machine learning within innovation research to replace manual analysis of social media data. The study applies machine learning to a case study of the diffusion of virtual reality technology. Virtual reality technology has created much online buzz lately. However, sales have been much lower than expected. As such, the case is concerned with why virtual reality technology is not more popular on the market. The case study uses a social media analytics approach complemented with a method utilizing machine learning. The study draws on several researchers’ theories of diffusion of innovation to analyze the case. The dataset used in the case study consists of approximately 6000 public text posts written in Swedish on Facebook, Twitter, forums and other platforms. The dataset was collected between August 2016 and August 2017. To investigate the barriers to diffusion, while also being able to apply machine learning algorithms, the posts are categorized into four categories based on the topic of the post. The categories, “Technological Utility”, “Network Externalities”, “Price” and “Trialability”, are derived from theories of diffusion of innovation. In addition, some posts are marked as “Spam” and not taken into account during analysis. The categorization is done manually as well as through a machine learning algorithm.
The machine learning program used is based on the SVM classifier, which is a supervised binary classifier. The hyperparameter “C” is set to 1.2 and the N-gram size to 1 (unigrams). The evaluation metrics used are accuracy and AUROC. Using k-fold cross-validation on the dataset, these evaluation metrics reach about 85 % and about 0.8 respectively. Comparing the results of the machine learning categorization and the manual categorization reveals that these evaluation metrics are too low for practical use in research, since classification errors at this level have the potential to significantly change the outcome of the study. The conclusion of the study is that in order to use machine learning in innovation research, the performance of algorithms needs to be very high, which is hard to achieve with the classifiers used in the study. In the future, more complex algorithms or different methods for feature selection should be explored. Furthermore, a larger dataset would naturally induce higher performance, and allow for other types of algorithms. The case study however suggests that a small testing set could be useful to apply on Big Data contexts where manual analysis is not feasible, but such a method would always compromise accuracy for time saving. The study concludes that perhaps the most feasible way to use machine learning in social media analytics innovation research would be either as a filter to pick out data to analyze, or in a temporal analysis that is interested in trends over time. The case concludes that the
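The two evaluation metrics named in the abstract, accuracy and AUROC under k-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the thesis's program: the function names, the contiguous fold layout, and the toy labels and scores below are assumptions made up for illustration, and the SVM itself is not reimplemented here.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of posts whose predicted label equals the manual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive post outscores a random negative one.

    Ties count as half a win (the rank-based formulation of AUROC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Toy data, invented for illustration only (1 = post belongs to the category).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical classifier decision values, one per post.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
# Threshold the scores to obtain hard binary predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
print("folds:", k_fold_indices(len(y_true), 4))
```

In a cross-validated setup, each metric would be computed on the test indices of every fold and then averaged. Accuracy depends on the chosen decision threshold, whereas AUROC summarizes the ranking quality of the scores across all thresholds, which is one reason the two are often reported together.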
Subject/keywords
Transport, Other industrial engineering and economics