Shorter build-measure-learn cycle in software development by using natural language to query big data sets

Type
Master Thesis
Program
Software engineering and technology (MPSOF), MSc
Published
2014
Author
Berget, Markus
Abstract
Background: Big data is used by many companies to gain insights and drive decisions. The data scientist is the role responsible for analyzing data and finding trends in it. In software product development these insights can be valuable for improving the quality of the product. Examples of such data include usage logs and social media data. However, the gap between stakeholders in software product development and data insights makes it difficult for those stakeholders to gain insights from data quickly.

Objective: This thesis explores which factors make it difficult for stakeholders in software product development to gain data insights in order to improve products. It also explores how these stakeholders can gain big-data insights without the involvement of data scientists.

Method: The research method chosen was action research. The research comprised five iterations with a collaborating company: rule-based parsing using a DSL, statistical parsing using machine learning, a web application prototype, a survey, and observations.

Results: The survey and the semi-structured observations showed a need to improve data insights for stakeholders in software product development. The main issues found were a lack of customizability and flexibility, the multiple data sources in use, and difficulties in exploring the data. A prototype was presented to address the identified issues. The prototype used natural language and machine learning for querying data, and it supported querying multiple data sources. In the observations, the prototype proved to be a simple way to query the data and allowed multiple data sources to be queried in one place.

Conclusion: The proposed prototype did not eliminate the need for data scientists. However, it worked as a structured communication channel that let data scientists gauge stakeholders' interest in different data queries and add missing functionality using a data-driven approach.
Subject/keywords
Computer and Information Science, Information & Communication Technology