Empirical Analysis of Hidden Technical Debt Patterns in Machine Learning Software
Type
Master's thesis
Abstract
Development, deployment, and maintenance of Machine Learning (ML) based software
products are costly. However, these costs are usually neglected. Challenges
regarding maintainability of ML software were explained under the framework of
"Hidden Technical Debt" (HTD) by Sculley et al. [10] by making an analogy to
technical debt in traditional software. HTD patterns arise from a group of ML
software practices and activities that make future improvements to ML systems difficult
and leave many errors unhandled in the long term; they are hence considered the main
causes of the increase in maintenance cost. Moreover, some of those patterns keep
expanding unnoticed; for that reason, they are called hidden patterns. ML systems
are especially prone to incurring technical debt because they have ML-specific issues at the
system level in addition to all the problems of regular code. The aim of this
thesis is to empirically analyze which HTD patterns emerge, and how, during the
early development phase of ML software, namely the prototyping phase. For this
purpose, we conducted a case study to analyze ML models. These models will go
into production and then be integrated into the software system owned by Västtrafik,
the public transportation agency of western Sweden. In order to
investigate the generalizability of our case study findings, we conducted a workshop
with practitioners consisting of data scientists and software engineers. During our
case study, out of 25 HTD patterns, we were able to detect 12. One of the 12
patterns detected was only observed to a limited extent. The patterns observed during
prototyping are mainly underutilized data dependencies (e.g., correlated, bundled,
and ε-features) and ML code smells (e.g., glue code, pipeline jungles, dead experimental
code paths). We also observed entanglement, configuration debt, abstraction
debt, prediction bias, multiple language smell, data testing debt, and cultural debt
to some extent. Of the 13 HTD patterns that went undetected, 12 can only
be detected after deployment of ML software. The one remaining undetected HTD pattern
is "Plain Old Data Type Smell", since we did not implement the ML algorithms
from scratch, but instead used existing ML libraries provided by an online cloud solution.
Our workshop results indicate that the majority of our findings are applicable to
other ML application domains. Practitioners also agreed that prototypes built by
data scientists are not ideal in terms of software engineering (SE) practices. Hence,
developers need to refactor the prototypes in order to prepare for the production
stage.
Subject/keywords
Technical Debt, Hidden Technical Debt (HTD) Patterns, Machine Learning (ML), Software Engineering (SE), Maintainability, Feature Engineering