A Guide to Quality Assurance of Machine Learning Software: A Toolkit based on Goals, Contexts, and Testing Solutions

Type

Master's thesis

Abstract

The intersection of quality assurance (QA) and machine learning (ML) is a research area that has attracted growing interest in recent years. This thesis focuses on traditional ML, which includes all types of ML algorithms except deep learning. Within this scope, we set out to investigate the contexts of different ML-based systems (MLSs) and the QA goals that apply to ML. By snowballing from three recent literature reviews and surveys and analysing the resulting set of papers, we created a curated list of papers that propose testing techniques for different MLSs. Each paper addresses one or more testing goals, such as fairness or robustness, to help developers achieve these goals for their systems. In total, 14 goals were defined and used as tags to classify the collected papers. Based on this classification, we created a mapping of testing solutions that takes a testing goal and a system context as input and outputs a paper proposing a suitable solution. This mapping forms the foundation of our toolkit, which guides developers in finding appropriate testing solutions for their particular ML project. The toolkit was implemented as a filterable table in Microsoft Lists. Eight employees from Volvo Cars, each experienced in ML, participated in a judgement study to evaluate the effectiveness and user experience of applying the toolkit to two industrial use cases, and were interviewed to collect feedback on the mapping and the toolkit. The evaluation shows that the current implementation of the toolkit has some limitations but can help developers find relevant papers for their use cases. By making it easier for ML developers to find QA solutions, both current and future products and services that utilise ML can become more reliable for their users.
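
The lookup behind the mapping (goal and context in, matching papers out) can be illustrated with a minimal Python sketch. It assumes each row of the table stores a paper identifier together with a set of goal tags and a set of context tags, and that a query keeps the rows carrying both the requested goal and the requested context. The names `Entry` and `find_solutions`, the placeholder paper identifiers, and the context tags are hypothetical and do not reproduce the thesis's 14 goals, its context vocabulary, or its Microsoft Lists implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a hypothetical toolkit table: a paper plus its tags."""
    paper: str                # placeholder identifier, not a real citation
    goals: frozenset[str]     # QA goal tags, e.g. "fairness", "robustness"
    contexts: frozenset[str]  # system-context tags (hypothetical vocabulary)


def find_solutions(table: list[Entry], goal: str, context: str) -> list[str]:
    """Return the papers tagged with both the requested goal and context."""
    return [e.paper for e in table if goal in e.goals and context in e.contexts]


# Hypothetical rows for illustration only; not data from the thesis.
TABLE = [
    Entry("paper-A", frozenset({"fairness"}), frozenset({"classification"})),
    Entry("paper-B", frozenset({"robustness", "fairness"}), frozenset({"regression"})),
    Entry("paper-C", frozenset({"robustness"}), frozenset({"classification", "regression"})),
]

if __name__ == "__main__":
    print(find_solutions(TABLE, "robustness", "regression"))  # ['paper-B', 'paper-C']
```

Storing the tags as sets keeps each filter a simple membership test, which is roughly what combining two column filters in a table view amounts to.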

Subject/keywords

traditional machine learning, testing, software engineering, toolkit, mapping, quality assurance, project, thesis