Applying software engineering and machine learning practices to manage machine learning complexity
Type
Master's thesis
Published
2022
Authors
Hillström, Sara
Mejborn, Johan
Abstract
Today, software engineering (SE) and machine learning (ML) are both fairly
well-established areas within engineering. The field of software engineering for machine
learning (SE4ML) addresses how software engineering practices can be applied
to software that contains ML. Complexity is a term with many definitions,
and how it is defined and handled differs between traditional software
engineering and machine learning. In this thesis, complexity is defined as
the measure of the resources expended by another system in interacting with a piece
of software. If the interacting system is another machine, we call this resource
cost; if the interacting system is people (performing tasks such as debugging
and testing), we call it software complexity.
This thesis was conducted in close collaboration with a partner company and aims
to contribute to SE4ML by providing a framework intended to guide how software
complexity and resource cost can be addressed in different parts of the ML
development process. The framework should also provide insights into possible
trade-offs between software complexity and resource cost. To validate the
framework, interviews were held with practitioners and representatives from
academia, and the framework was applied to an existing problem at the partner
company. The latter was done by tweaking an existing ML model and developing two
other models for comparison.
In conclusion, the validation interviews and the application to an existing ML model
confirmed that the framework is useful for practitioners. There are trade-offs between
some of the activities that make up the framework, referred to as artifacts.
This means that practitioners, to some extent, need to balance conflicting
artifacts to optimize the trade-off between resource cost and software complexity,
depending on the specific use case at hand.
Subject/keywords
software engineering, machine learning, complexity, framework