Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software
dc.contributor.author | ALAHDAB, MOHANNAD | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.examiner | Feldt, Robert | |
dc.contributor.supervisor | Calikli, Gul | |
dc.date.accessioned | 2019-10-31T09:24:29Z | |
dc.date.available | 2019-10-31T09:24:29Z | |
dc.date.issued | 2019 | sv |
dc.date.submitted | 2019 | |
dc.description.abstract | Development, deployment, and maintenance of Machine Learning (ML) based software products are costly, yet these costs are usually neglected. Sculley et al. [10] explained the challenges of maintaining ML software under the framework of "Hidden Technical Debt" (HTD), by analogy to technical debt in traditional software. HTD patterns arise from a group of ML software practices and activities that make future improvements to ML systems difficult and leave many errors unhandled in the long term; they are therefore considered main causes of increased maintenance cost. Moreover, some of these patterns keep expanding unnoticed, which is why they are called hidden patterns. ML systems are especially prone to accumulating technical debt because they have ML-specific issues at the system level in addition to all the problems of regular code. The aim of this thesis is to empirically analyze which HTD patterns emerge, and how, during the early development phase of ML software, namely the prototyping phase. For this purpose, we conducted a case study analyzing ML models that will go into production and then be integrated into the software system owned by Västtrafik, the public transportation agency in western Sweden. To investigate the generalizability of our case study findings, we conducted a workshop with practitioners consisting of data scientists and software engineers. In our case study, we were able to detect 12 of the 25 HTD patterns, one of which was observed only to a limited extent. The patterns observed during prototyping are mainly underutilized data dependencies (e.g., correlated and bundled features) and ML code smells (e.g., glue code, pipeline jungles, dead experimental code paths). We also observed entanglement, configuration debt, abstraction debt, prediction bias, multiple-language smell, data testing debt, and cultural debt to some extent. 
Of the 13 patterns that went undetected, 12 can only be detected after deployment of ML software. The remaining undetected HTD pattern is "Plain Old Data Type Smell", since we did not implement the ML algorithms from scratch but instead used existing ML libraries provided by an online cloud solution. Our workshop results indicate that the majority of our findings are applicable to other ML application domains. Practitioners also agreed that prototypes built by data scientists are not ideal in terms of software engineering (SE) practices. Hence, developers need to refactor the prototypes to prepare them for the production stage. | sv |
dc.identifier.coursecode | DATX05 | sv |
dc.identifier.uri | https://hdl.handle.net/20.500.12380/300501 | |
dc.language.iso | eng | sv |
dc.setspec.uppsok | Technology | |
dc.subject | Technical Debt | sv |
dc.subject | Hidden Technical Debt (HTD) Patterns | sv |
dc.subject | Machine Learning (ML) | sv |
dc.subject | Software Engineering (SE) | sv |
dc.subject | Maintainability | sv |
dc.subject | Feature Engineering | sv |
dc.title | Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software | sv |
dc.type.degree | Master's thesis | sv |
dc.type.uppsok | H |
Download
Original bundle
- Name:
- CSE 19-118 Alahdab.pdf
- Size:
- 4.66 MB
- Format:
- Adobe Portable Document Format
- Description:
- Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software