Shorter build-measure-learn cycle in software development by using natural language to query big data sets

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/220541
File: 220541.pdf (Fulltext, 1.88 MB, Adobe PDF)
Type: Master Thesis
Title: Shorter build-measure-learn cycle in software development by using natural language to query big data sets
Authors: Berget, Markus
Abstract:
Background: Many companies use big data to gain insights and drive decisions. Data scientists are responsible for analyzing data and finding trends in it. In software product development, these insights can be valuable for improving the quality of the product. Examples of such data include usage logs and social media data. However, the gap between stakeholders in software product development and data insights makes it difficult for those stakeholders to gain insights from data quickly.
Objective: This thesis explores which factors make it difficult for stakeholders in software product development to gain data insights in order to improve products. It also explores how these stakeholders can gain big-data insights without the involvement of data scientists.
Method: The research method was action research, carried out in five iterations with a collaborating company: rule-based parsing using a DSL, statistical parsing using machine learning, a web application prototype, a survey, and observations.
Results: The survey and the semi-structured observations showed a need to improve data insights for stakeholders in software product development. The main issues found were a lack of customizability and flexibility, the use of multiple data sources, and difficulties in exploring the data. A prototype was presented to address the identified issues. It used natural language and machine learning for querying data and supported querying multiple data sources. In the observations, the prototype proved to be a simple way to query the data and allowed multiple data sources to be queried in one place.
Conclusion: The proposed prototype did not eliminate the need for data scientists. However, it worked as a structured communication channel through which data scientists could gauge stakeholders' interest in different data queries and add missing functionality using a data-driven approach.
Keywords: Computer and Information Science; Information & Communication Technology
Issue Date: 2014
Publisher: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/220541
Collection: Examensarbeten för masterexamen // Master Theses


