A Domain-Specific Language for Cross-platform, Edge-deployed Machine Learning Models
A Model Interpretation-based Approach

Master’s thesis in Computer Science and Engineering

Albin Karlsson Landgren
Philip Perhult Johnsen

Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2024

© ALBIN KARLSSON LANDGREN, 2024.
© PHILIP PERHULT JOHNSEN, 2024.

Supervisor: Daniel Strüber, Computer Science and Engineering
Advisor: Ludwig Friborg, Wiretronic
Examiner: Hans-Martin Heyn, Computer Science and Engineering

Master’s Thesis 2024
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Gothenburg, Sweden 2024

Abstract

Deploying machine learning (ML) models on edge devices presents unique challenges. The challenges arise from the different environments used for developing ML models and those required for their deployment, leading to a gray area of competence and expertise between ML engineers and application developers. This thesis presents the design and implementation of a domain-specific language aimed at simplifying the deployment of ML models on edge devices, specifically smartphones. It aims to bridge the gap between ML engineers and application engineers, creating a shared platform for deploying ML models on edge devices. The study exists at the intersection of model-driven engineering, machine learning, and cross-platform smartphone development. It explores model-driven engineering in an environment where developers do not have full control over the deployment platform, using model interpretation to generate ML serving pipelines (pre- and postprocessing of data before and after inference) during runtime, thus removing the need to re-release an application upon changes to a pipeline. We follow a design science approach consisting of three research cycles. We elicited requirements through an initial literature study and interviews with engineers at the collaboration company. This was followed by designing and implementing an artifact within the domain presented above. Finally, we evaluated the proposed solution with engineers at the collaboration company through a controlled experiment and subsequent qualitative interviews. The developed artifact consists of a lightweight, JSON-based domain-specific language designed to describe ML serving pipelines, along with an accompanying Flutter library to generate the pipelines during runtime.
The evaluation showed that it increased development speed, decreased the amount of code required to make changes to an ML serving pipeline, and made engineers less experienced in mobile development more confident contributing to the domain.

Keywords: domain-specific language, model-driven engineering, machine learning, edge deployment, cross-platform development, Flutter.

Acknowledgements

We wish to thank our supervisor Daniel Strüber for the valuable feedback and guidance throughout the thesis work. We also wish to thank our examiner Hans-Martin Heyn for valuable report feedback during the study. We are grateful to Wiretronic AB and its engineers for their participation in interviews and experiments, and for providing us with the opportunity to conduct our thesis work at their company and office.

Albin Karlsson Landgren & Philip Perhult Johnsen, Gothenburg, June 2024

Contents

List of Figures
List of Tables

1 Introduction
1.1 Background and Motivation
1.2 Study Context
1.3 Problem Description
1.4 Research Questions
1.5 Purpose and Significance of the Study
1.6 Thesis Outline

2 Theory
2.1 Research Gap
2.2 Edge-deployed Machine Learning
2.3 Cross-Platform Mobile Development
2.3.1 ML in Cross-Platform Mobile Environments
2.4 Model-Driven Engineering
2.4.1 Code Generation and Model Interpretation
2.4.2 Metamodels
2.4.3 MDE in Edge Devices
2.5 Domain-specific Languages
2.5.1 Developing a Domain-Specific Language
2.5.2 Domain-Specific Languages in the Deployment of ML Models
2.5.3 JSON Schemas

3 Methods
3.1 Design Science Cycles
3.2 Cycle 1: Domain Understanding and Initial Artifact Definition
3.2.1 Literature Study
3.2.2 Repository Analysis and Program Comprehension
3.2.3 Interviews
3.2.4 Requirements Engineering
3.3 Cycle 2: Artifact Design and Development
3.3.1 Design and Technology Choices
3.4 Cycle 3: Artifact Evaluation
3.4.1 Controlled Experiment
3.4.1.1 Metrics
3.4.1.2 Tasks

4 Results
4.1 Initial Problem Exploration
4.1.1 Interview Findings
4.1.2 Impact on Artifact Development
4.2 Requirements
4.2.1 User Stories
4.2.2 Functional Requirements
4.2.2.1 Pipeline Specification (DSL)
4.2.2.2 Platform-Specific Model Interpretation (DSL + Architecture)
4.2.2.3 Support Pre-Existing and Custom Operations (DSL)
4.2.2.4 Support Dynamic Changes of the Pipeline (Architecture)
4.2.3 Non-Functional Requirements
4.2.3.1 Usability
4.2.3.2 Maintainability
4.2.3.3 Performance
4.2.3.4 Compatibility
4.3 Design and Implementation
4.3.1 Current Approach
4.3.2 Proposed Approach
4.3.2.1 Domain-Specific Language
4.3.3 Model Engine
4.3.4 DSL Development Tools
4.4 Artifact Evaluation
4.4.1 Experiment Results
4.4.1.1 Development Time
4.4.1.2 Lines of Code
4.4.1.3 Correctness
4.4.2 Hypothesis Testing
4.4.3 Requirements
4.4.3.1 Functional Requirements
4.4.3.2 Non-functional Requirements

5 Discussion
5.1 Research Question 1
5.2 Research Question 2
5.3 Research Question 3
5.4 Cross-Platform Communication
5.5 Threats to Validity
5.5.1 Internal Validity
5.5.2 External Validity
5.5.3 Construct Validity

6 Conclusion
6.1 Limitations
6.2 Future Work
6.2.1 DSL Expansion
6.2.2 Deeper Exploration Within the Domain
6.2.3 Expansion to Other Domains

Bibliography

A Appendix 1 - Initial Interviews
B Appendix 2 - Experiment Interviews
C Appendix 3 - Mann-Whitney U Test Code
D Appendix 4 - Fisher’s Exact Test Code
E Appendix 5 - Artifact Code

List of Figures

2.1 An example hierarchy displaying how artifacts, models, and metamodels relate to each other.
4.1 The preprocessing method in Java that Wiretronic uses for one of their models.
4.2 Abstract syntax of the DSL.
4.3 Illustration of how the model engine prepares an ML serving pipeline from a DSL instance.
4.4 Comparison of an example JSON schema as defined using Typebox (left) and the actual schema outputted by Typebox (right).
4.5 The mean time per task (in minutes) for the old and new approach, respectively.

List of Tables

3.1 List of interviewees participating in initial interviews.
3.2 Experiment setup.
3.3 The engineers from Wiretronic that participated in the experiment, with their respective experience levels.
4.1 The time (in minutes) it took for engineers A & B to complete the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.
4.2 The time (in minutes) it took for engineers C & D to complete the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.
4.3 The lines of code written by engineers A & B to complete the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.
4.4 The lines of code written by engineers C & D to complete the subtasks in the first task using the old approach, and the subtasks in the second task using the new approach.
4.5 The correctness for engineers A & B when completing the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.
4.6 The correctness for engineers C & D when completing the subtasks in the first task using the old approach, and the subtasks in the second task using the new approach.
4.7 The startup time (in milliseconds) of the application for the new and old approach, respectively.

1 Introduction

When deploying an ML model, the choice of target platform can have a significant impact on the ease of deployment. Models are often written, trained, and deployed in a Python environment using libraries such as TensorFlow or PyTorch [1], with the same group of engineers maintaining control during the entire process. However, there are cases where the same toolchain or platform is not available throughout the whole process. One such example arises when deploying machine learning models on edge devices, specifically smartphones. In such scenarios, there is a chance that the training and deployment environments are different and that the deployment environment only contains the already trained models. Thus, the engineers deploying the models to applications might not know how to format inputs and outputs from the model. Therefore, they might land in a gray area of competence and responsibility between those developing and deploying the models.
The ML engineers developing the models have most of the knowledge related to the actual models, while the application engineers know the deployment platform but might lack the crucial context required to effectively deploy the model. To add to the development and deployment scenario, further complexity arises since the development is happening in a cross-platform environment, a consequence of smartphone applications being developed with two separate platforms in mind (iOS and Android). By bridging the gap between the different engineering roles and simultaneously reducing the amount of equivalent code being written twice, the intention is to improve knowledge-sharing across engineering teams, improve developer experience, and facilitate experimentation and flexibility in development.

1.1 Background and Motivation

The study aims to simplify the process of deploying ML models on edge devices, specifically smartphones. When, for example, performing ML inference on image data from a smartphone camera, a series of pre- and post-processing steps is required before and after performing the inference, as different models expect inputs of particular shapes and produce outputs of particular shapes. These steps collectively form a pipeline, referred to as an ML serving pipeline. Our study and developed artifact aim to facilitate the development and maintenance of ML serving pipelines on smartphones using model-driven engineering (MDE). We propose a domain-specific language (DSL) and accompanying Flutter library that allows developers to easily specify and make changes to ML serving pipelines deployed on smartphones. The DSL should be able to specify an ML serving pipeline, and the accompanying library should be able to support the execution of the pipeline using platform-specific functionality, based on the contents of a DSL instance. The DSL would act as a platform to facilitate a shared understanding of an ML serving pipeline between the ML engineer and the application engineer.

1.2 Study Context

The study is conducted in collaboration with Wiretronic AB and their AI division in Gothenburg, where one of the group members is employed. Wiretronic develops image-based ML products (e.g. object detection and computer vision), including a suite of smartphone applications [2], [3]. As a part of this offering, the company deploys ML models directly on devices, where the variety in architecture and operating systems can make it more difficult to deploy software effectively.

1.3 Problem Description

This problem was introduced by Wiretronic. The applications and corresponding libraries developed at Wiretronic are written in Flutter [4], a framework for the Dart programming language enabling cross-platform development of native apps for iOS and Android. While a majority of the functionality can be developed in Flutter, some aspects require writing platform-specific code. This often entails working directly with hardware resources on the device, such as the device’s camera, or hardware-optimized ML libraries [5], [6]. Thus, the developer is required to write equivalent implementations for two platforms, underlining the challenges in deploying ML models on smartphones. Wiretronic believes that since the platform-specific code is constrained to a very specific domain, the workflow can be enhanced.
By lifting the development to a suitable level of abstraction and making use of model-driven engineering techniques, they could avoid having to write equivalent, domain-specific code for multiple platforms. The deployment of ML models on smartphones, or edge devices in general, can cause problems with maintenance and updates. When deploying an ML model on a centralized server, the developer can have near-full control over that server and perform updates as needed, without users noticing or requiring manual work. Meanwhile, when developing and deploying ML models for smartphones, this process is non-trivial. If the ML model and related functionality are bundled and shipped with the application when installed on a user’s device, we must re-publish the application to the app store of each platform upon making changes or updates, and the user must reinstall the application. An alternative is to treat the model as an asset that the application fetches, allowing for easier updates. However, this still requires developers and users to perform the update process if a new version of the ML model requires a different serving pipeline. In combination, these issues create problems with knowledge-sharing, developer experience, and flexibility when working with ML models on smartphones. It is unclear who is responsible and most suited to handle the deployment of the ML models, the code often has to be written for two separate platforms, and it must then be re-deployed to these separate platforms’ app stores. By using a DSL and model-driven engineering, we can address communication difficulties, create a single source of truth for the ML serving pipeline, and decrease development as well as deployment efforts. Subsequently, this can also improve the end-user experience, since updates to the ML performance can occur without users noticing or having to update the application.

1.4 Research Questions

The study is guided by the findings from studying the role of MDE in edge-deployed ML, applied to the specific context of cross-platform mobile development. Specifically, we will explore how a DSL can be designed and utilized to enhance this development, centered around the following research questions:

• RQ1: How can a domain-specific language (DSL) describe an ML model (and, for example, its required inputs, outputs, and pre- and post-processing stages)?
• RQ2: How can we best implement and utilize the DSL in a concrete setting, specifically in the development of cross-platform mobile applications?
• RQ3: To what extent does the introduction of a DSL and an accompanying library improve the developer experience in the aspects of maintenance, feature development, time-saving, and resource planning?

1.5 Purpose and Significance of the Study

The study aims to answer the questions asked in Section 1.4. The significance of the study lies in simplifying the process of deploying ML models on edge devices, specifically smartphones. The main challenge identified is the requirement of deploying equivalent solutions to multiple platforms, and we aim to solve this problem using a model-driven engineering approach. The contribution of the study is a DSL for describing the ML serving pipeline required to deploy an ML model on smartphones in a given context. The DSL will support knowledge-sharing across engineering teams, improve the developer experience, and facilitate experimentation and flexibility in development.
While domain-specific languages in machine learning are well-documented, we aim to use this study to contribute to their application in a cross-platform context, which is less explored. Chapter 2 expands on the relevant theoretical background and reviews the state of the art in this field.

1.6 Thesis Outline

Chapter 1: Introduces the thesis project, explaining our background and motivation, the problem description, and the research questions guiding the project.

Chapter 2: Consists of the groundwork done to grasp the theory and domain related to the project, explaining the most significant topics of the research, such as edge-deployed machine learning, model-driven engineering, and domain-specific languages.

Chapter 3: Explains the design science cycles around which the project is centered and how we conducted the work, drawing on design science and requirements engineering to define the requirements and the artifact through extensive interviews and informal conversations with the employees at Wiretronic.

Chapter 4: Presents our findings from the first research cycle and how we designed the artifact: how we obtained our insights, and how these, in combination with the background theory, resulted in our requirements and implementation.

Chapter 5: Discusses primarily the alternative design choices we considered for our artifact and our reasoning for not pursuing them. Secondly, we discuss potential improvements to our artifact.

Chapter 6: Concludes the thesis, summarizing our results and discussing future work related to our research and artifact.

2 Theory

2.1 Research Gap

We now present several relevant lines of research that, so far, have been developed independently, but whose combination has not yet been considered. At the time of conducting our research, there is no available research material about creating a DSL for machine learning models deployed on edge devices in a cross-platform environment. We aim to explore this area, where there are usually separate code bases for the ML models that run on Android and on iOS.

2.2 Edge-deployed Machine Learning

Edge-deployed ML refers to deploying ML models on edge devices instead of on a centralized server. An edge device can, for example, be a smartphone or an Internet of Things (IoT) device, which generally has far simpler hardware than a server in a data center. The deployment of ML models on edge devices has increased significantly in recent years thanks to advancements in both software and hardware [7]–[9]. Despite the less advanced hardware, deploying machine learning models on edge devices presents several advantages when compared to a centralized approach. Transmitting potentially sensitive or private data to a remote server introduces the risk of data leakage, with a fault in the remote system potentially leading to personal or financial consequences [8]. Additionally, eliminating the need for connecting to an external service for ML inference can improve both latency and reliability, as the potential bottleneck introduced by a weak network connection is removed. Despite these improvements, deploying ML models on edge devices, specifically smartphones, is not straightforward. One reason for this is the heterogeneity of the underlying architecture [10]. A wide selection of libraries to deploy ML models on smartphones exists, but they each perform differently depending on the device’s hardware configuration.
A difference in cache size or GPU capacity can cause two libraries accomplishing equivalent tasks to perform differently, and with the wide range of hardware configurations present in the market, it is difficult to develop a solution optimal for every device [10]. Additionally, opting to deploy an ML model on devices instead of in a centralized environment can create obstacles to improvement and maintenance. When deploying an ML model in a centralized environment, the developer has full control over software and hardware and can develop the artifacts surrounding the model for a single environment. If the development is instead targeted at smartphones, the model has to be deployable on both iOS and Android devices, which each have distinct underlying architectures for deploying custom ML models [5], [6].

2.3 Cross-Platform Mobile Development

Mobile developers targeting both iOS and Android users may opt for a cross-platform framework, which enables the creation of separate, native builds for both platforms from a single codebase. The two most widely used frameworks for this purpose are Flutter and React Native [11]. Flutter is an open-source framework for the Dart language maintained by Google, while React Native is a JavaScript framework maintained by Meta. While there are some differences in architecture, both frameworks abstract away platform-specific details, allowing developers to focus on a single, platform-agnostic codebase. This abstraction layer translates the shared code into platform-specific components, aiming to achieve native functionality and performance for both iOS and Android. When requiring access to platform-specific features, often hardware, developers can opt to write native code for each platform. Although React Native has a new architecture in development that will allow for easier communication between the cross-platform and native layers, current implementations of both Flutter and React Native require serialization of data for inter-layer communication [12], [13].

2.3.1 ML in Cross-Platform Mobile Environments

In the deployment of machine learning (ML) models within cross-platform mobile environments, it is often advantageous to utilize platform-optimized ML frameworks. An example of such a framework is Core ML for iOS [5]. The advantages presented by such an approach increase as the ML model requires interaction with other hardware functionalities, such as the camera of the device. In a scenario where continuous inference on a camera stream is required, significant processing time can be saved by performing the entire computation flow on the native layer, as it omits the data serialization introduced by inter-layer communication [12], as illustrated by the sketch below.
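To make the serialization point concrete, the following is a minimal Dart sketch of Flutter’s platform-channel mechanism; the channel name, method name, and data layout are hypothetical and not taken from any specific application.

    import 'package:flutter/services.dart';

    // Hypothetical channel to native code (Java/Kotlin on Android,
    // Swift/Objective-C on iOS). Arguments and results cross the layer
    // boundary as serialized messages, the overhead discussed above.
    const MethodChannel _channel = MethodChannel('example/ml_inference');

    Future<List<double>> runInference(List<double> input) async {
      // The input is serialized, handed to the native layer, and the
      // result is deserialized back into Dart objects.
      final result = await _channel.invokeMethod<List<dynamic>>(
          'runInference', <String, dynamic>{'input': input});
      return (result ?? const []).cast<double>();
    }

For continuous inference on a camera stream, every frame would incur such a serialization round-trip, which is why keeping the whole computation flow on the native layer is attractive.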
2.4 Model-Driven Engineering

Models are important in many scientific contexts to understand the basics of a field or domain. They are descriptive of a system or prescriptive for determining the scope or details of a problem. This is no different in software development, where models are adopted in model-driven engineering (MDE). MDE reshapes software engineering by emphasizing high-level abstraction and model-centric approaches. The abstractions offered by MDE facilitate easier adaptation of new technologies. They abstract away platform-specific details, making MDE viable when working in cross-platform environments [14]. MDE employs model-based approaches that can improve the daily practice of software professionals [14]. Through the focus on creating and examining models, MDE can capture various aspects of a problem-specific domain [15]. This allows complex systems to be more understandable and more easily translatable across different platforms through code generation or model interpretation [16].

Adjacent to MDE are concepts such as Model-Driven Development (MDD) and Model-Driven Architecture (MDA). MDD focuses on placing models at the center of the development process, and implementations of these models are often either fully or partially generated. MDA is a term first coined and used by the Object Management Group (OMG) to narrow the focus of their OMG standards to modeling and transformations. MDD serves as a superset of MDA, and MDE in turn acts as a superset that encompasses both MDD and MDA. MDE offers a broader perspective and application of model-based methodologies in software engineering. In addition, MDE is sometimes referred to as Model-Driven Software Engineering (MDSE), but this term is merely a synonym for MDE, both denoting software development through abstraction and modeling [14].

2.4.1 Code Generation and Model Interpretation

There are several approaches within MDE to go from a model to executable software. One commonly applied approach is code generation, where a model is transformed into a program in a suitable language that can subsequently be executed. This process allows for developer intervention where needed, since the generated code can be edited upon generation. Furthermore, as a consequence of the code being generated before execution, this approach does not introduce any run-time overhead. However, while avoiding performance overhead, a disadvantage of code generation is having to re-generate and re-deploy the software when making changes to a model [14]. In addition to code generation, another approach to automating software development is model interpretation [14]. Model interpretation does not generate code; instead, a generic engine, e.g. a library, parses and executes the model on the fly. This comes with several advantages, as noted by Brambilla et al. [14]: it allows making changes to the model or engine without an added code generation step, eases portability between platforms, and removes the need to interact directly with generated source code. There are also a few concerns about model interpretation. As it is a black-box approach, the application will be dependent on the library or tool serving as the engine. If this implementation is not written optimally, it can cause performance issues that are difficult to solve, as the developer may not have insight into how the engine operates. While this is a common reason for discarding this approach, it is not a concern for most applications [14].

While model interpretation is often viewed as an alternative to code generation within MDE, they are not exclusive alternatives. The two techniques can often be used in a hybrid manner [14]. For example, developers may use model interpretation during development for faster prototyping, while using code generation for production to eliminate runtime overhead or minimize bundle size.
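As a minimal illustration of the interpretation approach, consider a generic engine that parses a model, here a JSON description of processing steps, at runtime and resolves it against a registry of known operations. The Dart sketch below is ours, with purely illustrative names; it is not drawn from any particular tool.

    import 'dart:convert';

    typedef Operation = Object Function(Object input);
    typedef OperationFactory = Operation Function(Map<String, dynamic> params);

    // Parse the model once and build an executable pipeline from it.
    // Changing the JSON changes the behavior without regenerating or
    // recompiling any code.
    List<Operation> interpret(
        String modelJson, Map<String, OperationFactory> registry) {
      final steps =
          (jsonDecode(modelJson) as List).cast<Map<String, dynamic>>();
      return [
        for (final step in steps) registry[step['op'] as String]!(step),
      ];
    }

The registry is the “generic engine” part: the application ships with all supported operations, and the model merely selects and configures them. This is also where the black-box and performance concerns noted above originate, since the application depends on how the engine resolves and executes each step.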
2.4.2 Metamodels

Previously, we have described a model within MDE as an abstraction of how some software artifact is constructed. Moving one level of abstraction higher, we can define a metamodel as a high-level abstraction of a model [14], [15]. Designing a metamodel guides developers toward a structured approach when designing a model’s intended properties and behavior in a software system.

Figure 2.1: An example hierarchy displaying how artifacts, models, and metamodels relate to each other.

Figure 2.1 displays how artifacts conform to a model and how this model can conform to a metamodel. Instead of focusing on a specific platform, the model describes the respective artifacts on a higher level, and these models are in turn guided in their shape by the metamodel, which describes how all models should be structured.

2.4.3 MDE in Edge Devices

As hardware improves and new areas of applicability arise, the demand to deploy ML models on edge devices increases. However, integrating ML models into edge device environments still comes with many limitations in terms of computational resources, power constraints, and network communication [17]. Furthermore, there is significant heterogeneity in edge devices, spanning from low-memory microcontrollers to high-end smartphones. Working in this domain can therefore require familiarity with several techniques and operating systems. Within the specific domain of mobile development, Vaupel et al. [18] discuss how model-driven techniques can be used to create flexible, cross-platform mobile applications, stating that models should be "As abstract as possible and as concrete as needed." [18], [19]. By opting for model-driven techniques and using higher abstraction levels, we can create separate native builds from a single source, similarly to the techniques mentioned in Section 2.3.

2.5 Domain-specific Languages

DSLs are languages used within a particular domain or problem, unlike general-purpose languages like Python, C++, or Java. These DSLs are designed with domain-specific abstractions and notations, making them more concise and accessible, and thereby better suited to providing solutions within the domain [20]. MDE plays a significant role in the development of such domain-specific languages, as they often differ from general-purpose languages. In many cases, MDE is employed specifically for working with DSLs [21]. When defining a DSL, the terms abstract syntax and concrete syntax are often used, and they are relevant in this study. The abstract syntax refers to how the DSL is modeled, i.e. which features are available and how they combine and interact with each other, while the concrete syntax refers to the actual grammar of the DSL and how the source files are written [14]. While the concrete syntax is represented through the DSL itself, the abstract syntax is often represented through metamodels. As Paige et al. [22] state, a metamodel can be defined as the "description of the abstract syntax of a language, capturing its concepts and relationships, using modeling infrastructure".

2.5.1 Developing a Domain-Specific Language

The development of a DSL requires a structured approach [23]. Firstly, the necessity of a DSL has to be established, followed by an analysis of the domain to elicit the specific requirements; thereafter, the DSL syntax and semantics are designed to meet those requirements. Implementing the DSL requires appropriate tools and technologies to leverage its advantages. Lastly, testing and validation ensure that the DSL efficiently solves the domain-specific challenges it was developed to solve.
Developing DSLs can be done either in Language Workbenches (LWs), like Xtext [24], GEMOC, and MetaEdit+, or by using more lightweight approaches, such as a JSON Schema [25]. In the literature review by Korani et al. [26] from 2023, Xtext is still the most used language framework for developing textual languages; this is also backed by Moin et al. [27], who use the Xtext framework to develop ML-Quadrat [28], an open-source, MDE-based prototype for full code generation on IoT devices. Using LWs enables more comprehensive tooling and editor support, making the DSL work like a programming language. However, LWs require more time to develop, and developers have to learn a new programming language. On the other hand, using a JSON Schema enables defining specific data structures and can be written in JSON or YAML. Modern IDEs support features such as autocomplete and type-checking even for such lightweight tools. Fundamentally, there are two ways for DSLs and regular code to interact: as external or internal DSLs. External DSLs keep the DSL and GPL code in different files, where the DSL is transformed into a programming language and executed, as in Xtext. SQL is a well-known external query DSL used in relational databases [29]. Internal DSLs have both types of code in the same file, reusing the grammar of the General-Purpose Language (GPL) it is written in cohesion with.

2.5.2 Domain-Specific Languages in the Deployment of ML Models

DSLs can play a significant role in the deployment of ML models, especially on edge devices where computation power and memory are limited. A DSL can help ensure type and function compatibility, which is integral for models used in tasks such as image recognition and text processing, in addition to providing the ability to efficiently manage inputs and outputs. The paper by Zhao et al. [30] introduces a system that exemplifies the use of a DSL in such a context. Beyond development and deployment, MDE has also been used to support the management of orthogonal ML aspects, such as asset management [31] and dataset management [32]. Traditional version control systems (VCS) can struggle to handle complex assets such as ML models and datasets. In the paper by Idowu et al. [31], they address these asset management challenges by introducing the Experiment Management Meta-Model (EMMM): a metamodel that characterizes ML asset structures as concepts and their relationships, as observed in state-of-the-art tools, together with conceptual VCS structures that can hold both ML and traditional assets. Meanwhile, Giner-Miguelez et al. [32] present DescribeML, a tool utilizing a DSL to describe datasets. This tool aims to enable a more data-centric approach in ML, to handle issues like undesired model behaviors resulting from biased predictions.

2.5.3 JSON Schemas

JavaScript Object Notation (JSON) can be used to define DSLs, as DSLs can be embodied as configuration files in applications [33]. JSON is a data serialization format widely adopted to either store data physically or transfer it over the internet [34]. It is a semi-structured document format that is possibly the most popular format for data exchange over the internet [35], [36]. It allows developers and IT professionals to transfer data structures across programming languages and environments without having to worry about said environments; instead, the data can be serialized or parsed in any language. JSON Schemas and JSON documents differ in their purpose. A JSON document contains the data to be sent or stored, organized in JSON objects. Meanwhile, a JSON Schema defines the structure to which a JSON document should adhere, to ensure compatibility and consistency. The schema can then also be used to validate a JSON document [37].
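As a minimal illustration of this distinction (a hypothetical schema and document, not taken from the thesis artifact), the following JSON Schema requires an object with an integer inputSize and allows an optional boolean normalize:

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "inputSize": { "type": "integer", "minimum": 1 },
        "normalize": { "type": "boolean" }
      },
      "required": ["inputSize"]
    }

A JSON document such as { "inputSize": 300, "normalize": true } validates against this schema, while { "inputSize": "large" } does not, which is exactly the kind of consistency check a validator provides.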
JSON Schema is the standard schema language for structuring JSON data. It is based on a combination of structural operators that describe values, arrays, and objects, with logical operators like negation, conjunction, and disjunction [38]. JSON Schema validators have been developed for many programming languages, and they are used to make software and data transfer more reliable [34].

3 Methods

This study employs a design science approach, combining theoretical research with a practical application of the research into a specific artifact. Firstly, we conducted research into the potential role of MDE in the deployment of ML models on edge devices. We then applied these findings to develop an artifact that solves the specific problem of deploying ML models in cross-platform mobile environments. The problem is addressed through the three aforementioned research questions and three distinct cycles explained in Section 3.1.

3.1 Design Science Cycles

Following the process presented by Knauss [39], each cycle was centered around a specific phase in the design science process while iteratively advancing the understanding and progress of each research question. Below is a summary of the work conducted in each cycle of the study.

• Cycle 1 (RQ1): This cycle had a major focus on research and understanding of the domain. We studied existing literature covering topics relevant to the study, developed small proof-of-concept solutions, and evaluated their viability in this context. We identified clear requirements for our artifact, both functional and non-functional, to deepen our knowledge about the domain and Wiretronic’s needs. We employed tools such as interviews, frequent conversations with employees, and inspection of source code.
• Cycle 2 (RQ2): This cycle primarily focused on applying our findings through the concrete implementation of the artifact, iteratively verifying the development against the requirements specified by Wiretronic. This was complemented with research into our domain and specific implementation details.
• Cycle 3 (RQ3): We evaluated our developed artifact by conducting a two-fold evaluation. Firstly, we conducted an internal evaluation, focusing on the requirements defined in cycle 1. Secondly, we conducted an external evaluation in collaboration with our supervisor from Chalmers and the respondents at Wiretronic. To test the suitability of our artifact we conducted a controlled experiment at Wiretronic, where two groups performed a set of tasks using the existing approach and the new approach. The two groups were measured with respect to time, lines of code, and correctness. Additionally, we held a final interview to evaluate their experiences of implementing cross-platform-specific code using our new approach compared to the old approach.

3.2 Cycle 1: Domain Understanding and Initial Artifact Definition

As stated, we spent most of the first cycle researching the domain and its specific representation at Wiretronic, using this information to help define our requirements and the scope of the study. This section covers the methods applied during this phase.
3.2.1 Literature Study

The literature study was conducted as a foundation for our interviews and subsequent requirements elicitation. During cycle 1, the literature review provided insights into various design options for our artifact. We explored developing the DSL using Xtext as a language workbench versus the more lightweight approach of JSON or YAML. We assessed whether Wiretronic would benefit more from code generation or model interpretation, and, connected to this, we explored the benefits and drawbacks of a build-time versus a runtime approach to our model-to-code transformation, considering feasibility within our time frame. From our interviews in Section 3.2.3 we obtained clarifying answers that helped us decide our final approach.

3.2.2 Repository Analysis and Program Comprehension

We spent part of the first cycle examining an existing library developed at Wiretronic. This library powers all of Wiretronic’s machine learning operations on edge devices, here limited to smartphones. This was primarily done as a program comprehension activity, as we required a thorough understanding of the domain and current state of development to have informed discussions with engineers at the company, identify constraints for future requirements elicitation, and find potential areas of enhancement. The applications are written in Flutter, using Java for Android-specific functionality and Swift for iOS-specific functionality. This resulted in large parts of the program comprehension being conducted twice, as the machine learning code was implemented in both Java and Swift. This gave us two implementations through which to understand most of the relevant code instead of one. Aside from serving as a tool to inform our requirements elicitation and development, analysis of the library was used to better understand the Flutter architecture and how it handles communication between the cross-platform and native layers.

3.2.3 Interviews

At Wiretronic, there were two engineers with experience in applications and libraries relevant to the study; therefore, these two were chosen for qualitative interviews. The interviewees had experience both in developing the ML models and deploying them on devices, which allowed us to obtain a holistic view of their current processes with a small number of interviewees. The interviews, performed as part of our problem exploration in the first cycle, aimed to document workflows and challenges to inform our subsequent requirements elicitation. While the interviews were conducted during the initial phase of the study, they were not conducted immediately. We deemed it necessary to first grasp the theoretical concepts relevant to the study, in addition to performing program comprehension. This was done to ensure we would go into the interviews well-informed and that they would serve their purpose.

Label          Role                        Platforms     Experience
Interviewee A  Engineer/System Architect   ML + iOS      4 years
Interviewee B  Engineer/System Architect   ML + Android  4 years

Table 3.1: List of interviewees participating in initial interviews.

The interviews were conducted in a semi-structured format to obtain both quantitative data, such as tools currently in use, and deeper, qualitative insights into workflows and challenges. To allow for this structure, we used a set of pre-defined questions, available in Appendix A, in combination with follow-up questions to elicit more detailed information.
Each session began with a standardized introduction to maintain consistency across interviews, regardless of the participants’ prior knowledge of the study. When crafting the interview questions, we deliberately included some questions where we expected to already know the answer. This measure was taken to ensure that basic needs, or must-be requirements, were not overlooked. These requirements are often taken for granted and go unnoticed if fulfilled, but failing to fulfill them can render the artifact unusable [40]. Upon conducting the interviews, we analyzed them as part of our requirements identification. Since the majority of the interview content was equivalent across the two interviews, they could be directly compared to find challenges and pain points identified by both engineers. We also followed up the interviews with informal discussions, helping us draw conclusions and inform requirements when opinions or statements presented in the interviews were conflicting.

3.2.4 Requirements Engineering

We used the insights obtained from examining relevant repositories at Wiretronic and interviewing engineers when specifying our requirements. Analyzing repositories gave us a good overview of their existing systems and possible areas of improvement within our scope. Additionally, the interviews were valuable in providing context to our findings, ensuring that they align with the requirements and priorities of the company. The requirements identification was initiated by developing a set of user stories centered around a persona representing a developer at Wiretronic. This helped us bridge the requirements elicitation and requirements identification phases, consolidating the information we had obtained without getting caught up in implementation details. After defining a set of suitable user stories, we began defining requirements, both functional and non-functional, rooted in the user stories. Naturally, this phase helped in setting up a set of more concrete, measurable goals for the project. Defining the requirements was an important tool in defining the scope of our study and creating a mutual understanding of priorities among ourselves and with the engineers at Wiretronic. Furthermore, the requirements were vital for the third and final cycle of the study, when performing validation and verification of the developed artifact.

3.3 Cycle 2: Artifact Design and Development

The purpose of the second cycle was two-fold: it first involved transforming the data collected in the first cycle into well-informed technology and design choices, and secondly, it involved designing and implementing the artifact. This section covers how we conducted this transformation, as well as how the design and implementation phase was conducted.

3.3.1 Design and Technology Choices

The choices covered in this section can be divided into the two separate areas making up our artifact: the DSL and the accompanying library. When designing the DSL, the primary guiding factor was the interviews with engineers, as no similar project had been conducted at Wiretronic before. The interviews, informed by the initial research activities, helped us narrow down which specific problem we should aim to solve. The technologies chosen for the library are primarily rooted in practices already in place at Wiretronic, to avoid obstacles in the handover of the artifact at the end of the study and to ensure compatibility with relevant applications.
This information was elicited through the interviews and our analysis of existing repositories. The possibility of using a JSON Schema to define the DSL was explored before the actual study began. Through the interviews and subsequent requirements engineering, it was deemed a viable and preferable option during the first research cycle. We found that the main role of the DSL would be to describe an ML serving pipeline, not to express the actual implementation and logic, thus making a JSON Schema a fitting choice. After this decision was made, more focus was put into how to best describe the model metadata and the pre- and postprocessing steps. This involved going through the existing models and comparing which aspects of the current ML serving pipelines are shared, and which are unique to one or a set of specific models. After identifying the required content of the DSL, the concrete syntax had to be established. Thanks to the lightweight nature of JSON Schemas, in contrast to developing a programming language, we were able to iterate on the syntax quickly and try out several variations of the syntax in a single working session. Additionally, some of the choices in this process were dictated by limitations of the JSON specification, as highlighted in Section 4.3.2.

The library was designed in parallel with the DSL, ensuring both that any additions or changes made to the DSL would be feasible to implement in the library and that we could find a suitable place for them. When designing the functionality for preparing and running the actual ML serving pipeline, we made several choices based on our initial research and the interview feedback. It was clear that, since a camera-based ML application can receive data as a stream of images, the overhead introduced by our library must be minimal. This meant that we wanted to avoid parsing the DSL instance each time, and also avoid conditional statements during execution based on the parsed DSL instance. Thus, we implemented the pre- and postprocessing as a series of individually contained steps, all implementing an interface with the necessary method stubs. As a result, the pipeline lists consist of generic pre- and postprocessors, not the concrete implementations, according to the dependency inversion principle [41]. This helped us separate the preparation and execution of the pipeline, as each step had one method for setting it up with the correct parameters and a separate method for executing it. While this was mainly done to eliminate any DSL-related logic during execution, it also helped when designing the functionality to implement custom pre- or postprocessing steps. By allowing the developer to implement an anonymous class conforming to the interface directly in the consuming application, they can be confident that the step will be compatible with the pipeline, as long as the implementation is fault-free. In MDE, this functionality is considered part of the model engine, which is presented in more detail in Section 4.3.3.
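A minimal Dart sketch of this design follows; the names are illustrative and simplified, not the actual API of the library. Each step is prepared once from the parsed DSL instance and then executed repeatedly without any DSL-related logic on the hot path:

    // Generic step interface: the pipeline depends on this abstraction,
    // not on concrete implementations (dependency inversion).
    abstract class PipelineStep {
      void prepare(Map<String, dynamic> params); // called once, from the DSL
      Object execute(Object input); // called per frame, no DSL logic here
    }

    class ResizeStep implements PipelineStep {
      late int width;
      late int height;

      @override
      void prepare(Map<String, dynamic> params) {
        width = params['width'] as int;
        height = params['height'] as int;
      }

      @override
      Object execute(Object input) {
        // ...resize the incoming image to width x height...
        return input;
      }
    }

A custom one-off step can be supplied by the consuming application as its own class implementing the same interface, making it compatible with the pipeline by construction.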
3.4 Cycle 3: Artifact Evaluation

This section covers our evaluation of the developed artifact, which was the main focus of the third and final cycle. Here, we first conducted an internal evaluation of the developed artifact, comparing the result with the visions presented by Wiretronic and the set of requirements we developed as a result of our initial exploration. Secondly, we evaluated the artifact together with Wiretronic, performing a controlled experiment with two groups of engineers. In doing this evaluation, we covered both verification and validation, ensuring not only that the artifact had been built correctly, but also that it solves the correct problem.

3.4.1 Controlled Experiment

To perform the evaluation, we conducted a controlled experiment. The purpose behind this was two-fold: first, we aimed to identify the specific impact of our artifact, and second, maintaining increased control over the experiment helped ensure similar conditions for each trial, minimizing the impact of outside factors. The experiment was carried out using a Latin square design [42], where two groups each performed two tasks. One group used our artifact to solve the first task and not the second task, and vice versa, as displayed in Table 3.2. All sessions were performed in a 60-minute time slot, ensuring all participants had the same time to perform the tasks. Furthermore, the two groups received identical presentations and documentation for our artifact. Because of the small available sample size, we utilized stratified sampling [43]. The engineers were categorized into two groups: experienced and inexperienced, with both experienced engineers having four years of experience and the inexperienced engineers zero, not having worked in the environment at all. We then formed the two experiment groups with an equal number of experienced and inexperienced engineers, as visible in Table 3.3. The Latin square design aimed to minimize the learning bias that comes from participants improving their performance by repeating similar tasks. By alternating the order in which the tasks are performed, and the use of the artifact across both groups, we can effectively reduce this bias. Group 1 served as the control group for task 1 and the treatment group for task 2, and vice versa for group 2.

         Group 1             Group 2
Task 1   not using artifact  using artifact
Task 2   using artifact      not using artifact

Table 3.2: Experiment setup.

Group 1   Engineer A**  Experienced
          Engineer B    Inexperienced
Group 2   Engineer C*   Experienced
          Engineer D    Inexperienced

Table 3.3: The engineers from Wiretronic that participated in the experiment, with their respective experience levels. *Interviewee A, **Interviewee B.

With this experiment, we aimed to identify whether the introduction of our artifact improves the workflow of the specific process it is designed to improve, giving answers to RQ3. Opting for a contrived setting allowed us to identify the impact of the artifact, albeit at the cost of generalizability and realism [44]. To provide further nuance and compensate for the drawbacks of a controlled experiment, we conducted semi-structured interviews with the participants to gain qualitative insights.

3.4.1.1 Metrics

In RQ3 we wanted to answer to what extent our artifact improves the developer experience in aspects such as maintenance, feature development, time-saving, and resource planning. We used the experiment to obtain quantitative data and combined this with the interviews for qualitative data. The quantifiable metrics we observed through the experiment were the following:

• Time per completed task. Measured in minutes, extracted from commit timestamps.
• Lines of code written to complete each task. Measured in lines inserted and lines deleted for each commit.
• Correctness, a binary metric of whether the task was performed correctly or incorrectly. Measured by manual static analysis of the solution and the occurrence of runtime errors after the experiment.

The post-experiment interviews helped us obtain qualitative data about more subjective metrics, identifying how usable, intuitive, and useful the artifact can be for the engineers’ daily workflow. The questions asked in these interviews are available in Appendix B. Additionally, we performed hypothesis testing on our metrics, specifically time and correctness, to get a more comprehensive view of our results. We expected the data not to be normally distributed due to the small sample size, natural variations in human performance, and the variability in the experience levels of the engineers. For the development time, we utilized the Mann-Whitney U test [45]. This test is suitable because it is non-parametric and does not assume a normal distribution, making it appropriate for small sample sizes and discrete time data. For the correctness metric, we used Fisher’s exact test [46]. It is designed for categorical data, in our case correct and incorrect, and is ideal for small sample sizes.

The hypotheses for the Mann-Whitney U test:

• Null Hypothesis (H0): There is no statistically significant difference in the development time between the old and new approaches.
• Alternative Hypothesis (H1): There is a statistically significant difference in the development time between the old and new approaches.

The hypotheses for Fisher’s Exact Test:

• Null Hypothesis (H0): There is no statistically significant difference in correctness between the old and new approaches.
• Alternative Hypothesis (H1): There is a statistically significant difference in correctness between the old and new approaches.
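For reference, the statistics behind the two tests follow the standard formulations (the code we used is listed in Appendices C and D). For two groups of sizes $n_1$ and $n_2$ with rank sum $R_1$ for the first group, the Mann-Whitney statistic is

    U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 - U_1, \qquad U = \min(U_1, U_2).

For a 2x2 contingency table with cells $a$, $b$, $c$, $d$ and total $n$, Fisher’s exact test computes the probability of each table under fixed margins as

    p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}},

and sums these probabilities over all tables at least as extreme as the observed one.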
3.4.1.2 Tasks

As stated, we designed two example tasks to evaluate the artifact. Task 1 had three subtasks and Task 2 had two subtasks. These were designed with the pain points of Wiretronic in mind, identifying how effective the artifact can be in maintenance of both the pre- and postprocessing parts of an ML serving pipeline. Therefore, Task 1 is completely related to preprocessing and Task 2 is completely related to postprocessing.

Task 1 - Assessing preprocessing: Given an existing model with accompanying pre- and postprocessing methods implemented, the engineers will perform the following subtasks:

• Change the path from which the model is loaded.
• Modify the size of the input data, changing the dimensions the image is resized to from 300 by 300 to a new specified dimension, 380 by 380.
• Enable normalization for the input image.

Task 2 - Assessing postprocessing: The model that has the least trivial postprocessing is a multi-headed model used for several computer vision tasks. Being multi-headed, it can, for example, both indicate whether an item is visible in the frame and produce a bounding box for locating the item.

• Adjust the threshold of the binary classification head named is_visible to 0.5.
• Implement interpolation for the binary classification head called size. Set the size to 300 if below the threshold, otherwise set it to 500.

4 Results

This chapter presents the findings of our study, the subsequent artifact implementation, and the evaluation of the artifact. It lays out the requirements that guided the artifact implementation and evaluation, along with the reasoning behind each requirement.

4.1 Initial Problem Exploration

This section is dedicated to presenting our findings from the first cycle, focused on defining the artifact. This entails our literature study, repository analysis, and interviews.
The literature study primarily focused on RQ1, identifying how we can implement the DSL for this specific scenario. Meanwhile, the repository analysis and interviews were aimed at exploring RQ2 and RQ3, identifying how the introduction of a DSL for ML pipelines can improve development processes within Wiretronic.

4.1.1 Interview Findings

Through the engineer interviews, we primarily obtained insights into existing development processes and potential enhancements. Interviewee B stated that a DSL and accompanying tools would help in the development and testing of ML serving pipelines, specifically for iOS: since he does not use MacOS, a requirement for building iOS applications in Swift, he cannot currently develop for iOS. Instead, when making changes to an ML serving pipeline, he has to write and test the changes in Java and then hand development over to Interviewee A, who can implement the equivalent functionality for iOS in Swift. He mentioned that with a DSL he could instead define an ML serving pipeline using the DSL and then be confident that the iOS implementation will work, as long as the DSL instance is written correctly. Interviewee A independently pointed this out as well, underlining that native development and the related communication are obstacles in their current workflow. Furthermore, the two engineers agreed on an additional problem they would like to solve: having to publish a new version of the library whenever they make a change to an ML serving pipeline or implement a new model.

When asked about the language design, Interviewee B stated that he would prefer writing the pipeline steps in a format he is familiar with and can get used to quickly, rather than a completely custom DSL, since there are only two platforms. His reasoning was that the effort of learning a new DSL could just as well be spent learning the other platform (in his case, iOS/Swift).

The two interviewees presented slightly different approaches to implementing the DSL in an application. Interviewee B suggested that it could be part of the build process, i.e. generating platform-specific code for the ML pipeline when compiling the application. Interviewee A, however, noted that he would prefer that the DSL be bundled with the application, loaded and parsed during runtime, and then used to configure the pipeline. This suggestion can be classified as a model interpretation-based approach, as it parses and executes a model during runtime [14]. This process requires including all possible pipeline operations in the application bundle. He stated that the performance implications would be negligible, especially in comparison to loading an ML model from either the disk or over the network, which the applications already do. The suggestion by Interviewee B would address their shared pain point of having to republish the library when making a change to an ML serving pipeline, but it would still require publishing a new version of the application. Interviewee A's suggestion would remove this step as well, but it could prove less flexible if a developer needs to add currently non-existent functionality, or functionality not general enough to be part of the library.

4.1.2 Impact on Artifact Development

Here we present the decisions made after conducting our initial studies and interviews.
While Section 4.3 explores the design and implementation of our artifact in more detail, this section provides relevant context for Section 4.2, which lays out the requirements guiding the artifact development.

When re-examining the problem after our research and interview study, we decided to opt for an approach based on model interpretation. This decision was primarily driven by two factors. Firstly, the interviews, along with further discussions with engineers, confirmed that the set of operations used for image transformation is limited and overlaps significantly across pipelines, corroborating our earlier findings from examining repositories. This highlighted that the configuration of arguments would benefit more from abstraction than the development of completely new functionality. Secondly, by opting for a model interpretation-based approach, the need to release a new library or application version upon making changes to the pipeline is removed, as previously highlighted. Instead, the pipeline can be updated dynamically, for example by fetching it from a remote server, since the required functionality is bundled with the application in configurable modules.

While it seems suitable for this scenario, a model interpretation-based approach may bring drawbacks. As highlighted previously, if a new ML model that requires custom preprocessing functions is introduced, this functionality will not be present in the library. In this case, either the DSL and library have to be extended to include this functionality, or we would need a way for a developer to reference, with suitable syntax, one-off functions residing in the application from the DSL. This in turn introduces a problem of runtime safety: if we fetch a new DSL instance that references functionality not present in the application, the pipeline will not be configured correctly.

After discussions with the engineers, we still deemed the model interpretation-based approach to be the most suitable. If we had used code generation and avoided runtime configuration, completely new functionality not supported by the DSL would still require substantial maintenance work and manual updating of either the library or the applications consuming it.

As highlighted by our interview study, the DSL needs to be easy to learn and adopt compared to mastering a new platform. Because of this, combined with our specific context of defining a machine learning serving pipeline from pre-defined functionality, we decided that employing a JSON Schema for the DSL would be an effective approach. This choice seemed more suitable than a more complex and advanced tool like Xtext, since the primary goal is to describe pipeline steps and we do not require more detailed application logic within the DSL. Utilizing a JSON Schema offers several advantages: it simplifies the versioning of the DSL and allows for the validation of DSL instances against the schema. These validation abilities in turn provide syntax highlighting and integrated documentation within the developers' editors, increasing usability and easing adoption.

4.2 Requirements

This section presents the requirements identified through our requirements engineering process, described in further detail in Section 3.2.4. This entails the user stories, focused on creating a high-level view of the solutions provided by our artifact, along with our functional and non-functional requirements.
The requirements are presented together with a short description aimed at providing further context and the reasoning behind each requirement.

4.2.1 User Stories

User stories are features written from the perspective of a user, in our case a developer [47].

• UC1: As a developer, I want to be able to create and modify ML pipelines for multiple platforms without requiring platform-specific knowledge.
• UC2: As a developer, I seek to avoid writing equivalent, platform-specific code for multiple platforms when deploying ML models.
• UC3: As a developer, I want a configuration file in a format I recognize, like JSON, to quickly change ML model parameters for rapid experimentation to enhance efficiency.
• UC4: As a developer, I aim to dynamically adjust ML model configurations using the DSL at runtime, thus avoiding releasing new application or library versions for changes to the configuration.
• UC5: As a developer, I wish to use pre-built templates for common ML tasks, enabling me to concentrate on developing new and unique features for improving model performance.
• UC6: As a developer, I need a framework that makes it easier to identify potential failures in the ML pipeline, reducing manual debugging efforts.

4.2.2 Functional Requirements

The functional requirements specify the functions of the system, the features it is going to have, and how it handles data [48].

4.2.2.1 Pipeline Specification (DSL)

The ML serving pipeline refers to the set of processing steps required for serving an ML model. Each step in the process is a pipeline step that performs a specific operation or transformation on the data. As specified in FR1.1, this includes the pre- and postprocessing steps required before and after using the ML models. The preprocessing steps Wiretronic uses include cropping an image, rotating an image, changing image format, normalizing pixels, and initializing buffers for storing image data. The postprocessing steps include tensor conversion and extracting tensor data into other formats.

• FR1.1: The DSL should be able to specify which pre- and postprocessing steps are required for an ML model in a given context.
• FR1.2: The DSL should be able to be validated against a JSON Schema to ensure its correctness.

Given the need for a clear and flexible way to define these pipelines, we have chosen to use JSON Schemas for our DSL. JSON Schemas provide a structured yet lightweight approach to defining the syntax and validation rules for our DSL, ensuring compatibility and ease of use across different platforms.

4.2.2.2 Platform-Specific Model Interpretation (DSL + Architecture)

When the steps are specified in the DSL, the library should allow for model interpretation directly in Swift and Java. This ensures the application can run across different platforms, in this case iOS and Android, by abstracting away the complexities of writing platform-specific code, while also allowing for changes to the ML serving pipeline on the fly.

• FR2.1: The DSL should enable model interpretation in Swift and Java, initiating an ML serving pipeline from existing native functionality based on the steps defined in an instance of the DSL.

4.2.2.3 Support Pre-Existing and Custom Operations (DSL)

A tool like this needs to maintain the freedom of implementing specific operations if needed. Our tool already provides the existing operations mentioned in Section 4.2.2.1; however, these are still the pre-defined operations Wiretronic uses for their ML models.
When working with ML models, the preprocessing steps can significantly impact the predictions of the models, making development an iterative process in which different operations are tried and custom operations may be needed [49].

• FR3.1: The DSL should enable the developers to use local functions instead of those pre-defined in the DSL.

4.2.2.4 Support Dynamic Changes of the Pipeline (Architecture)

One of the advantages of implementing a DSL and library solution is that it enables dynamic changes during runtime. By having the ML serving pipeline set up dynamically through a configuration JSON file, we can change the model serving parameters without Wiretronic having to release new versions of their library. Since all functionality already exists in the library, we can dynamically load new model parameters when the configuration file changes, or initialize a new configuration file.

• FR4.1: The system should be able to switch between several configurations while the application is running, enabling A/B testing of pipelines.

4.2.3 Non-Functional Requirements

Non-functional requirements, or quality requirements, specify how well the system performs its functions. It is important to address these alongside the functional requirements, as they play a crucial role in what we want to achieve with the requirements stated in Section 4.2 [48].

4.2.3.1 Usability

Usability refers to how friendly the system is to its users [50]. The artifact aims to ease the workflow of the developers; hence it needs to be intuitive and have a low learning curve.

• NFR1.1: The system should be easy to learn, allowing developers to use it with minimal training.

4.2.3.2 Maintainability

Maintainability here refers to the ability to understand and improve software [50]. As the thesis project is in collaboration with Wiretronic, it is essential to make the artifact easy for the company to build further upon after the completion of the project. With documentation about our solution, the developers at Wiretronic should easily be able to understand our library and DSL in order to make changes or add new features.

• NFR2.1: The system should be easy to update, with clear documentation and guides.
• NFR2.2: It should facilitate the addition of new ML serving pipeline features without substantial modifications to the existing code.

4.2.3.3 Performance

Performance defines how fast a software system or component responds to actions [50]. As discussed in Section 2.4.1, performance may be a concern when using model interpretation. Through our research and implementation, we aim to show that using model interpretation does not negatively impact the application startup time when initiating the ML serving pipeline through the library and DSL.

• NFR3.1: The system should not add more than 50 ms to the application startup time when initiating an ML serving pipeline from an instance of the DSL.
• NFR3.2: The system should not cause performance overheads when running an application containing an ML serving pipeline dynamically set up by the library.

4.2.3.4 Compatibility

Compatibility refers to a system's ability to exist and interact with other systems in the same environment [50]. As the system operates in a cross-platform environment, it is important not to have any limitations due to different operating systems or IDEs.

• NFR4.1: The system should work across multiple platforms (MacOS, Windows, and Linux).
• NFR4.2: The system should work in Flutter codebases.
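To make FR1.2 and the schema's validation abilities concrete before turning to the design, the sketch below shows how a DSL instance could be checked against the published JSON Schema before it is shipped. The sketch is illustrative only: the file names are hypothetical, and the choice of the Ajv validator is our assumption rather than part of the artifact.

// Sketch: validating a DSL instance against the JSON Schema (FR1.2).
import Ajv from "ajv";
import { readFileSync } from "node:fs";

const schema = JSON.parse(readFileSync("pipeline.schema.json", "utf8"));
const instance = JSON.parse(readFileSync("multihead.pipeline.json", "utf8"));

const ajv = new Ajv({ allErrors: true });
const validate = ajv.compile(schema);

if (!validate(instance)) {
  // Surface every schema violation before the file is shipped to a device
  console.error(validate.errors);
  process.exit(1);
}
console.log("DSL instance is valid against the schema");

The same schema also drives the editor support mentioned above: pointing an IDE at it yields autocompletion and inline documentation for DSL instances.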
4.3 Design and Implementation

4.3.1 Current Approach

Figure 4.1 shows a code snippet from the existing library at Wiretronic, displaying how the preprocessing is written in Java for one of their models. The method performs cropping, rotation, and normalization of an image, with the parameters for image size being instance variables in the Java class. When implementing a new ML model or making changes to an existing ML serving pipeline, the developers also have to write this code in Swift to support iOS devices. As will be presented in this section, our DSL abstracts away the platform-specific details and provides the developer with a single interface for specifying the ML serving pipeline.

Figure 4.1: The preprocessing method in Java that Wiretronic uses for one of their models.

4.3.2 Proposed Approach

In this section, we propose an alternative approach to managing ML serving pipelines in cross-platform mobile environments, decoupling their configuration from the underlying platform. This proposal is the result of the previously outlined requirements definition and the work done to inform it. It consists of two separate but connected parts: the DSL, which aids developers in specifying ML serving pipelines in a single, familiar format, and the Flutter library, which supports the DSL and generates the pipelines at runtime.

4.3.2.1 Domain-Specific Language

The DSL provides definitions for three different aspects of the pipeline: the model metadata, the preprocessing, and the postprocessing. Figure 4.2 displays the abstract syntax of the language through a metamodel, showing the main concepts of the domain and their relationships.

Figure 4.2: Abstract syntax of the DSL.

"model": {
  "name": "Multihead",
  "path": {
    "android": "multihead.pt",
    "ios": "multihead.mlmodel"
  },
  "input": {
    "width": 380,
    "height": 380
  }
}

Listing 4.1: An example of how the DSL allows for specifying metadata about the model.

Using the DSL, a developer can provide metadata about the model, consisting of its name, the paths from which to fetch the model on iOS and Android respectively, and the required input size of the model, which any image fed to the pipeline can be resized to. How this metadata can be defined is displayed in Listing 4.1.

Preprocessing is divided into separate steps, called preprocessors. Each preprocessor supports one specific action and can receive arguments from the developer as necessary. The DSL provides built-in support for four preprocessors: cropping, resizing, rotating, and normalizing an image. These steps are commonly used when preprocessing images for ML tasks, as the image received from e.g. the camera can have different dimensions and orientation depending on the device configuration.

"preprocessors": [
  {
    "action": "crop",
    "mode": "square"
  },
  {
    "action": "resize",
    "input": "custom",
    "height": 380,
    "width": 380
  },
  {
    "action": "normalize"
  }
]

Listing 4.2: An example preprocessing configuration using the DSL.

"postprocessor": {
  "type": "segmentation",
  "format": {
    "height": 320,
    "width": 320
  }
}

Listing 4.3: Example of the postprocessing in Wiretronic's segmentation model using our DSL.

The order of preprocessing steps is important.
If, for example, an image received from the camera is 2000x2000 pixels after cropping, but the model requires an image with normalized colors of size 300x300, it would be a waste of time and computing power to apply the normalization before resizing the image, as it would require iterating over more than 40 times as many pixels (2000² / 300² ≈ 44). Since the JSON specification does not guarantee that the order of object entries is maintained, the preprocessors have to be defined in an array of objects rather than an object with a key for each preprocessor [51]. To accommodate this, each preprocessor is defined as an object with a key called action specifying the name of the step. The additional argument entries required for the preprocessor are then inferred by the schema from the value of the action key. The built-in preprocessors are defined below, and an example preprocessor configuration is displayed in Listing 4.2.

• crop: Allows the developer to specify a mode. If mode is square, it performs a square crop in the center of the image. If mode is custom, the DSL requires the additional arguments x, y, width, and height, specified as integers.
• resize: Resizes the input image. The developer can choose the input for the measurements: if it is custom, the image is resized according to the arguments specified by the developer for width and height; if it is model, the function uses the size specified in the model metadata.
• rotate: Rotates an image by the number of degrees specified in the argument degrees.
• normalize: Takes no additional arguments. Normalizes the image.

While we found the preprocessing steps to be generalizable, with a large overlap in usage across models, the postprocessing was close to the opposite. Here, instead of implementing support for specific functions that can be used for many different models, we had to implement model-centric solutions.

"postprocessor": {
  "type": "multihead",
  "heads": [
    {
      "name": "is_visible",
      "type": "binary",
      "threshold": 0.3
    },
    {
      "name": "centerpoint_x",
      "type": "regression"
    },
    {
      "name": "centerpoint_y",
      "type": "regression"
    }
  ]
}

Listing 4.4: Example of the postprocessing in Wiretronic's multi-headed model using our DSL, showcasing a subset of the 11 output heads.

Listing 4.4 displays the postprocessing of Wiretronic's multi-headed model and illustrates why model output can pose a challenge when defining postprocessing with our DSL. This model outputs 11 heads, specific to this model. Comparing this to Listing 4.3, we see how different two models' outputs and postprocessing can be. During our research, we implemented functionality for these models as proofs of concept, showing that the DSL can be utilized for both simple and advanced postprocessing tasks. However, if Wiretronic were to implement a completely new model, they would need to implement support for it in the DSL.

While extending the DSL can be suitable when introducing a new model with completely new postprocessing, there may be one-off situations where a model requires custom operations in either the pre- or postprocessing stages. To accommodate this, we implemented a pre- and postprocessor registry, which allows developers to introduce custom functionality without the DSL being an obstacle. A sketch of this pattern is given below.
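The registry itself is implemented natively in Swift and Java; the following TypeScript sketch only illustrates the pattern, with hypothetical names:

// Sketch of the pre-/postprocessor registry pattern.
type Image = unknown; // stand-in for the platform's image/tensor type
type Processor = (image: Image, args: Record<string, unknown>) => Image;

const registry = new Map<string, Processor>();

function register(action: string, processor: Processor): void {
  registry.set(action, processor);
}

// Called by the model engine for each step in the parsed DSL instance
function run(action: string, image: Image, args: Record<string, unknown>): Image {
  const processor = registry.get(action);
  if (processor === undefined) {
    // A DSL instance referencing an unknown step fails here, at pipeline setup
    throw new Error(`No processor registered for action "${action}"`);
  }
  return processor(image, args);
}

// An application can register a one-off custom step without extending the DSL:
register("custom_sepia", (image, _args) => {
  // ...apply an application-specific filter to the image...
  return image;
});

Keeping the built-in operations and any application-specific additions behind the same lookup mechanism allows the model engine to remain unaware of whether a step is built-in or custom.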
Contrary to Wiretronic's current approach, where everything ML-related is handled in a library, our DSL and library allow defining custom functionality directly in the application where it is required. If developers then encounter the same situation in more applications, they can decide to introduce the custom step into the DSL and library permanently. The main difference between a custom implementation and an existing one is that the custom implementation requires a re-release of the application, since it involves writing platform-specific code that needs to be bundled with the application.

4.3.3 Model Engine

As required when opting for a model interpretation-based approach, a model engine was implemented to handle the model-to-code transformation. This model engine is displayed in Figure 4.3. When starting the application, the developer can initialize the model engine in Flutter by providing a path to the correct DSL instance. This DSL instance is loaded and parsed, creating a nested dictionary referred to as the model instance. Performing the parsing in Flutter helps avoid discrepancies in parsing or file system access between platforms. After this, the model instance is fed through a MethodChannel into the platform-specific model engines. The model engine uses the model instance to fetch the correct pre- and postprocessing steps for the ML model from the processor registry. Additionally, it uses the path provided in the model instance to load the correct ML model from the file system. Once the pre- and postprocessing steps are fetched and the ML model is loaded, the ML serving pipeline is ready and can receive images from the device's camera. Since the model interpretation happens at startup, any performance overhead incurred is present at application startup and not when performing inference.

Figure 4.3: Illustration of how the model engine prepares an ML serving pipeline from a DSL instance.

4.3.4 DSL Development Tools

We used the TypeScript tool TypeBox to develop the JSON Schema and abstract syntax that define our DSL. TypeBox significantly reduces the amount of code that has to be written compared to defining a JSON Schema manually. Additionally, it improved developer ergonomics by providing set-theoretic type operators, allowing us to easily define complex conditional types. After the JSON Schema had been defined using TypeBox, we ran a TypeScript script that outputs the rendered JSON Schema to a JSON file. Figure 4.4 displays, using a mock example, how TypeBox allows for separation of concerns and significantly reduced code when defining a JSON Schema.

Figure 4.4: Comparison of an example JSON Schema as defined using TypeBox (left) and the actual schema outputted by TypeBox (right).

4.4 Artifact Evaluation

This section goes through the findings from our different evaluations of the artifact. This involves examining whether it fulfills the requirements set out at the beginning of the study, along with the experiment and accompanying interviews from the third cycle.

4.4.1 Experiment Results

The results from the experiment conducted as part of our evaluation are presented here. They are presented group-wise, giving the results from Group 1 and Group 2 for each metric. Each metric is reported per subtask.

4.4.1.1 Development Time

Overall, the artifact generated a substantial improvement in development time for all subtasks. As displayed in Tables 4.1 and 4.2, this was true for both the experienced engineers (A, C) and the inexperienced engineers (B, D).
Figure 4.5 displays the mean time for all participants, categorized by task and approach used. When comparing an inexperienced engineer not using the artifact with one using the artifact, the average improvement in development time was 344%. Comparing experienced to inexperienced engineers before introducing the artifact, the experienced engineers on average performed 141% better than the inexperienced engineers.

             New approach      Old approach
             1.1   1.2   1.3   2.1   2.2
Engineer A   1     1     1     3     4
Engineer B   2     1     2     9     21

Table 4.1: The time (in minutes) it took for engineers A & B to complete the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.

             Old approach      New approach
             1.1   1.2   1.3   2.1   2.2
Engineer C   2     5     4     1     2
Engineer D   7     5     15    2     6

Table 4.2: The time (in minutes) it took for engineers C & D to complete the subtasks in the first task using the old approach, and the subtasks in the second task using the new approach.

Figure 4.5: The mean time per task (in minutes) for the old and new approach, respectively.

                         New approach      Old approach
                         1.1   1.2   1.3   2.1   2.2
Engineer A   Insertions  2     2     3     1     3
             Deletions   2     2     0     1     3
Engineer B   Insertions  2     2     4     1     6
             Deletions   2     2     0     1     4

Table 4.3: The lines of code written by engineers A & B to complete the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.

                         Old approach      New approach
                         1.1   1.2   1.3   2.1   2.2
Engineer C   Insertions  2     1     4     1     1
             Deletions   2     1     0     1     0
Engineer D   Insertions  1     1     3     2     1
             Deletions   1     1     0     2     0

Table 4.4: The lines of code written by engineers C & D to complete the subtasks in the first task using the old approach, and the subtasks in the second task using the new approach.

4.4.1.2 Lines of Code

As displayed in Tables 4.3 and 4.4, there were no significant differences in the absolute number of lines of code required to complete the tasks. Only in subtask 2.2 was there a major difference, which is attributed to normalization being a built-in function in the DSL, thus only requiring it to be enabled instead of having to be performed manually. Note, however, that this experiment only included development on Android; some of the tasks would require performing equivalent operations on iOS as well, increasing the required lines of code when not using the DSL.

4.4.1.3 Correctness

When measuring correctness, we manually tested each commit to catch any runtime failures and statically analyzed the commits, ensuring that the commits using our artifact did not include any unnecessary code not required by the task description. As can be observed in Tables 4.5 and 4.6, six subtasks implemented using the old approach were considered correct, accounting for 60%. Meanwhile, for the new approach, eight solutions were deemed correct, representing 80%.

             New approach                   Old approach
             1.1      1.2      1.3          2.1      2.2
Engineer A   correct  correct  correct      correct  incorrect
Engineer B   correct  correct  incorrect    correct  correct

Table 4.5: The correctness for engineers A & B when completing the subtasks in the first task using the new approach, and the subtasks in the second task using the old approach.

             Old approach                     New approach
             1.1        1.2      1.3          2.1        2.2
Engineer C   correct    correct  incorrect    correct    correct
Engineer D   incorrect  correct  incorrect    incorrect  correct

Table 4.6: The correctness for engineers C & D when completing the subtasks in the first task using the old approach, and the subtasks in the second task using the new approach.
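These counts form a 2x2 contingency table (old approach: 6 correct, 4 incorrect; new approach: 8 correct, 2 incorrect), which is analyzed in the next subsection. As a transparency aid, the two-sided Fisher's exact p-value reported there can be reproduced from the table alone; the following TypeScript sketch is our illustration and was not part of the study's tooling:

// Sketch: two-sided Fisher's exact test for a 2x2 table.
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) result = (result * (n - k + i)) / i;
  return result;
}

// Hypergeometric probability of a table with top-left cell a,
// given row totals r1, r2 and first-column total c1
function tableProbability(a: number, r1: number, r2: number, c1: number): number {
  return (binomial(r1, a) * binomial(r2, c1 - a)) / binomial(r1 + r2, c1);
}

function fisherTwoSided(a: number, b: number, c: number, d: number): number {
  const r1 = a + b, r2 = c + d, c1 = a + c;
  const observed = tableProbability(a, r1, r2, c1);
  let p = 0;
  // Sum the probabilities of all tables at least as extreme as the observed one
  for (let k = Math.max(0, c1 - r2); k <= Math.min(r1, c1); k++) {
    const pk = tableProbability(k, r1, r2, c1);
    if (pk <= observed * (1 + 1e-9)) p += pk;
  }
  return p;
}

// Old approach: 6 correct, 4 incorrect; new approach: 8 correct, 2 incorrect
console.log(fisherTwoSided(6, 4, 8, 2).toFixed(4)); // ~0.6285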
4.4.2 Hypothesis Testing

We performed hypothesis testing on the development time and correctness metrics. As explained in Section 3.4.1.1, a Mann-Whitney U test was utilized for the development time, and Fisher's exact test was used for correctness. For a test to show significance, we required a p-value lower than 0.05, the conventional threshold for statistical significance [52].

• Mann-Whitney U statistic: 90.0
• P-value: 0.0023

The results of the Mann-Whitney U test indicate a statistically significant improvement in time efficiency with the new approach, as the p-value is lower than the 0.05 threshold. This finding supports rejecting the null hypothesis: the new approach reduces the time required to complete tasks.

• Odds ratio: 0.375
• P-value: 0.6285

Fisher's exact test results in a p-value greater than 0.05, showing no statistically significant difference between the new and old approaches; hence we retain the null hypothesis. This result is likely due to the small sample size.

4.4.3 Requirements

Here, we evaluate the artifact with respect to the requirements defined in the first cycle. The evaluation is split into functional and non-functional requirements and draws both on metrics from testing the software and on subjective opinions presented by engineers during the evaluative interviews held after the experiment.

4.4.3.1 Functional Requirements

Pipeline Specification: The DSL does enable developers to specify which pre- and postprocessing steps are required for an ML model. DSL instances are validated against the JSON Schema in the IDE, both in terms of what is required for an ML model in general and through autocompletion of all pre-existing operations.

Platform-Specific Model Interpretation: The DSL does enable model interpretation in Swift and Java using the model engine illustrated in Figure 4.3 and further explained in Section 4.3.3.

Support Pre-Existing and Custom Operations: Since the DSL was implemented based on the results of our interviews and repository studies, we were able to identify and implement support for the most commonly used operations in both pre- and postprocessing. We complemented this with the previously mentioned pre- and postprocessor registry, which allows developers to include custom functionality, thus fulfilling the requirement of supporting both pre-existing and custom operations.

Support Dynamic Swapping of Configuration: The ML serving pipeline is set up through the runtime parsing of a JSON file. Thus, developers can write code that changes which JSON file is loaded, and the library will instantiate a new pipeline. This is possible thanks to the model interpretation approach, which performs the model-to-code transformation at runtime.

4.4.3.2 Non-functional Requirements

Usability: The goal was to make the DSL easy to learn, allowing developers to use it with minimal training. During the second round of interviews, the participants were asked to rate the DSL in terms of intuitiveness, learnability, and usability on a scale from 1 to 5. Intuitiveness averaged 4.75, learnability 4.75, and usability 5. These answers indicate that we accomplished our goal.
Additionally, one of the inexperienced participants stated that the tool provides a lower barrier of entry for contributing to the code: "I would not dare to work in this environment otherwise, using the new method makes me feel more secure" - Engineer D. However, we did get feedback on the documentation being slightly confusing, with both Engineer C and Engineer D stating that we should improve the documentation and that the large amount of text in a single place made it difficult to get an overview. Based on this feedback, we made improvements to the documentation after the interviews.

Maintainability: The main feature of the new approach is enabling easier updates of ML serving pipelines: "I think it was much better compared to without, there are so many files I don't recognize and difficult navigating the file structure" - Engineer B. The DSL enables the developers to modify the models through a single configuration file, without having to make substantial changes to the existing code.

Performance: We conducted a test measuring how long the application takes from startup to readiness, including tasks like camera setup and model loading, and excluding build time. We started the application ten times using each approach; on average, the new approach increases the startup time by 24 ms. This fulfills requirement NFR3.1, which states that our approach should not add more than 50 ms to the startup time of an application consuming the library. The test was conducted on a single computer and OS, so the results might differ in other environments. The full results of the trial runs are displayed in Table 4.7.

New approach    Old approach
1314            1381
1323            1356
1393            1278
1427            1348
1404            1340
1392            1350
1410            1383
1387            1372
1402            1374
1371            1399

average: 1382.3    1358.1
median:  1392.5    1364

Table 4.7: The startup time (in milliseconds) of the application for the new and old approach, respectively.

Compatibility: We manually tested the new approach across Windows, Linux, and MacOS. As long as the system had all the necessary software installed, such as Android Studio and Flutter, there were no issues on any system during build or runtime.

5 Discussion

This chapter discusses different ways to support several of the ideas provided by the interviewees during the evaluations in the first and second cycles.

5.1 Research Question 1

RQ1: How can a domain-specific language (DSL) describe an ML model (and, for example, its required inputs, outputs, and pre- and postprocessing stages)?

It is important to clarify that our research is directed at making a DSL that describes the input, output, preprocessing, and postprocessing around ML models, not the models themselves. Through our literature review and the iterative process of working on the project, we found that how an ML model can be described through a DSL depends on how generalizable the description needs to be. During our research into the domain, we found that it is quite easy to describe what happens before the data is fed into the ML model; the difficult part is describing what happens after. Finding a balance here was one of the more challenging tasks of the study. In the end, the choice to implement the DSL using a JSON Schema with an accompanying library allowed us to provide both a simple interface for describing ML serving pipelines and a way to implement custom, one-off features without slowing down development. Our approach allows developers to specify metadata about the model, consisting of its name, path on the device, and input shape.
The preprocessing is described as a series of steps, for which we implemented support for the most common actions used in Wiretronic's current ML serving pipelines, in addition to the possibility of defining custom steps. Lastly, the DSL supports specifying the required postprocessing actions. While the preprocessing actions were found to be quite trivial and generalizable, the postprocessing steps usually differ between models, requiring custom functionality to handle the model outputs. Here, the ability to easily define custom postprocessing actions is necessary.

Summary: Representing an ML model's input, output, preprocessing, and postprocessing steps in a JSON configuration file requires, at a minimum, model metadata, preprocessing actions, and postprocessing actions. By utilizing JSON Schemas, the feature set can easily be expanded depending on the situation.

5.2 Research Question 2

RQ2: How can we best implement and utilize the DSL in a concrete setting, specifically in the development of cross-platform mobile applications?

One aim with the DSL was to create a unified interface not only for multiple platforms but also for engineers of different backgrounds. This meant that we did not want to make it overly tied to the underlying platforms, since this could cause confusion or unfamiliarity for ML-focused engineers. Furthermore, we did not want to make the DSL too restrictive, offering experienced engineers the possibility to combine the DSL with custom, platform-specific functionality.

While the DSL is aimed at cross-platform mobile development, we did not want to tie it to the technique currently used at Wiretronic, for example as an internal DSL written in Dart (for Flutter). This connects to the previously mentioned point of creating a unified interface across platforms and experience levels, but it also allows for porting or extending the DSL to additional platforms. We developed the DSL and accompanying library so that if Wiretronic decides to shift its cross-platform development to another technique, the DSL would not require any modifications.

To accommodate the initial requests made by engineers not to require a complete re-release of the application upon changes to the ML serving pipeline, we implemented the DSL using a model interpretation approach instead of code generation. Any application consuming our accompanying library can fetch a remote file written in our DSL and dynamically set up the ML serving pipeline, without the application having to be re-released.

Summary: The DSL and accompanying library were implemented to support different underlying techniques, engineers of different backgrounds, and making changes to the ML serving pipeline without re-releasing the application.

5.3 Research Question 3

RQ3: To what extent does the introduction of a DSL and an accompanying library improve the developer experience in the aspects of maintenance, feature development, time-saving, and resource planning?

From our controlled experiment and two rounds of interviews, it has become clear that a DSL designed in a familiar format can help lower the barrier of entry in this area of development. This may be the largest improvement over the previously used approach, as new engineers can contribute and experiment in development.
Through both objective and subjective metrics, our evaluation showed that the engineers worked faster and more confidently when using our approach, partly thanks to the ML-related functionality being isolated in a single file and format. In addition, the results in Section 4.4.1 show improvement in all metrics with the new approach. The average improvement in development time was 344%, which is also backed by the hypothesis testing performed in Section 4.4.2. Correctness improved by 20 percentage points in the controlled experiment, but we could not show statistical significance. The engineers did, however, state in the interviews following the experiment that they still felt more secure using the new approach. If the DSL can help more engineers contribute to this area of development, it can help in all the aspects stated in this research question: more engineers will be able to perform maintenance tasks and develop new features, further helping Wiretronic deliver features faster and easing their planning.

Summary: The DSL does lower the complexity of edge-deployed ML at Wiretronic. The DSL and accompanying library make the entry into the field quicker, while also enabling the engineers to work faster and more confidently, hence improving the developer experience in the aspects outlined in the research question.

5.4 Cross-Platform Communication

Due to the nature of cross-platform development, there are many instances of communication between the Flutter layer and the native layer through MethodChannels. During the development of our library, we ran into many situations that required debugging on both sides of a MethodChannel. This can become a very tedious and time-consuming task, and with our library handling the layer communication, we effectively reduce the need for this kind of debugging for the developers. In the future, we may see a shift away from using channels and data serialization for inter-layer communication. React Native explores this in its new architecture, which is under development at the time of writing. There, the native code is written in C++, and the cross-platform layer (in this case, JavaScript) can hold references to C++ objects and vice versa, calling functions directly on these objects [12].

5.5 Threats to Validity

In our project and research, we have three main threats to validity.

5.5.1 Internal Validity

Internal validity is of concern when we examine causal relations [53]. We aimed to ensure internal validity in our study by using a controlled setting, with the intent of eliminating any confounding factors. The elicitation of our requirements and scope was defined through only two