A Software Engineering Perspective on Data Quality Processes in Environmental Research - Recommendations Based on Software Engineering Practices Applied for Improving of Open Data Practices and Communication in Environmental Research
Type
Master's Thesis
Program
Software engineering and technology (MPSOF), MSc
Published
2024
Authors
MOEN, MARKUS
NORÉN, MAX
Abstract
Many fields within environmental research have been moving towards open science and, most importantly, open data. With the increase in available data, there are opportunities to apply data-driven and data-intensive methods, including recent developments such as machine learning. However, the success of applying machine learning depends significantly on the quality of the available training data. The purpose of this thesis was to investigate how data quality is currently viewed, practised, and communicated in environmental research, and to identify software engineering principles and practices that can form recommendations for improving data quality in the field. This process identified six challenges and proposed eight recommendations. The results show a great deal of effort towards open data, with the FAIR principles as the main framework for achieving it. Most identified challenges concern data quality handling, communication, and difficulties in achieving open science. We found suitable software engineering practices for four of the six challenges, with two key perspectives derived from open source software and requirements engineering practices. Our results demonstrate a willingness among environmental researchers to investigate and adopt software engineering practices. Importantly, there is broad agreement that open science is an improvement over previous methods, and the stated challenges and recommendations need to be handled in a way that preserves those advancements. The recommendations should be regarded as a first design iteration, and their applicability to different fields within environmental research should be explored further.
Subject/keywords
Software engineering, requirements engineering, data quality, environmental research, open science, open data, data-intensive, big data, FAIR, thesis