A Guide to Quality Assurance of Machine Learning Software: A Toolkit based on Goals, Contexts, and Testing Solutions
Type
Master's thesis
Abstract
The intersection of quality assurance (QA) and machine learning (ML) is a research
area that has sparked interest in recent years. In this thesis, the focus is on traditional
ML, which includes all types of ML algorithms except deep learning. To this
end, we set out to investigate the context of different ML-based systems (MLS) and
the QA goals that are applicable to ML. By snowballing three recent literature reviews
and surveys, and analysing the collected set of papers, we created a curated list
of papers that propose testing techniques for different MLSs. Each paper focuses on
one or more testing goals, such as fairness or robustness, to help developers achieve
these goals for their systems. In total, 14 goals have been defined and incorporated
as tags that classify the collected papers. With this knowledge, we created a
mapping of testing solutions that, given a testing goal and a system context as
input, returns a paper proposing a solution. This forms the foundation for
our toolkit that guides developers in finding appropriate testing solutions for their
particular ML project. This tool was implemented as a filterable table in Microsoft
Lists. Eight employees from Volvo Cars participated in a judgement study to evaluate
the effectiveness and user experience of applying the toolkit to two industrial
use cases. These eight subjects, each experienced in ML, were interviewed to
collect feedback about the mapping and toolkit. The evaluation shows that the
current implementation of the toolkit has some limitations but is capable of helping
developers find relevant papers for their use cases. By making it easier for ML developers
to find QA solutions, both current and future products and services that
utilise ML can become more reliable for their users.
Subject/keywords
traditional machine learning, testing, software engineering, toolkit, mapping, quality assurance, project, thesis