Camera-based State Estimation and
Autonomous Motion Control
Perception and Control of a Hauler Truck in a Demo Site

Master’s thesis in Systems, Control and Mechatronics

Kevin Bielecki
Rasmus Ekedahl

DEPARTMENT OF ELECTRICAL ENGINEERING

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2024
www.chalmers.se



Master’s thesis 2024

Camera-based State Estimation and
Autonomous Motion Control

Perception and Control of a Hauler Truck in a Demo Site

KEVIN BIELECKI
RASMUS EKEDAHL

Department of Electrical Engineering
Division of Systems and Control

Chalmers University of Technology
Gothenburg, Sweden 2024


Camera-based State Estimation and Autonomous Motion Control
Perception and Control of a Hauler Truck in a Demo Site
KEVIN BIELECKI
RASMUS EKEDAHL

© Kevin Bielecki, Rasmus Ekedahl, 2024.

Supervisor: Hanna Hermansson, B&R Industrial Automation
Examiner: Martin Fabian, Electrical Engineering

Master’s Thesis 2024
Department of Electrical Engineering
Division of Systems and Control
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: The demonstration area where the system is deployed.

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2024

Abstract
This thesis explores the development of an autonomous system, designed for B&R
Industrial Automation to demonstrate autonomous solutions on their products with-
out any operator input. The goal of this thesis is to develop an autonomous system
that can manoeuvre a mobile unit between different stations, in a collision-free and
smooth manner primarily to enhance sales demonstrations. With a single camera
mounted in the ceiling, the system can make well-informed decisions using percep-
tion, motion planning and motion control. Key components include a machine-
learning model for perceiving the environment, a path- and trajectory planner, and
a linear Model Predictive Control (MPC) system. The project resulted in a fully
functional autonomous system that could execute demonstration runs, offering op-
portunities for further development.

Keywords: Autonomous systems, Computer vision, Object Detection, YOLO,
Path-planning, Motion planning, Motion control, MPC, Machine learning.



Acknowledgements
This master’s thesis was carried out at B&R Industrial Automation during the spring
of 2024. We wish to thank our academic examiner and supervisor from Chalmers,
Professor Martin Fabian, for his continuous support and feedback throughout this
project.

We also want to give special thanks to our supervisor, Hanna Hermansson, and
everyone at B&R Industrial Automation who assisted us during this project. Your
dedication, support, and the opportunity to work on this thesis have been irreplace-
able.

Kevin Bielecki, Rasmus Ekedahl, Gothenburg, June 2024



List of Acronyms

The acronyms used throughout this thesis are listed below in alphabetical
order:

AMR Autonomous Mobile Robot
APC Automation Personal Computer
AVX2 Advanced Vector Extensions 2
CNN Convolutional Neural Network
COCO Common Objects in Context
CPU Central Processing Unit
GPU Graphics Processing Unit
HMI Human-Machine Interface
IoT Internet of Things
IoU Intersection over Union
mAP Mean Average Precision
MPC Model Predictive Control
PID Proportional – Integral – Derivative
SSD Single-shot Detector
TPU Tensor Processing Unit
YOLO You Only Look Once
ZOH Zero-Order Hold



Nomenclature

The nomenclature used throughout this thesis is presented below.

Indices

i Index for iterations
k Index for discrete time step

Parameters

∆t Time discretization step (time interval) [ms]
t Time [ms]
L Wheel base [m]
Lt Trailer length [m]
n Number of waypoints
N Control horizon

Variables

δ Steering angle [rad]
a Longitudinal acceleration [m/s²]
x x-coordinate [m]
y y-coordinate [m]
θ Heading angle [rad]
v Longitudinal velocity [m/s]
ψ Relative angle of trailer [rad]
x State vector
P Point including x and y coordinate [m]


Contents

List of Acronyms
Nomenclature
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Aim
  1.3 Limitations
  1.4 Research Questions
  1.5 Ethics and Sustainability

2 Preliminaries
  2.1 Computer Vision and Machine Learning
    2.1.1 You Only Look Once
    2.1.2 Model Quantization and Pruning
  2.2 Optimization problems
    2.2.1 Convex vs Non-convex optimization problems

3 Technical Concept
  3.1 Company Requirements
  3.2 Hardware Specifications
    3.2.1 Automation PC (APC)
    3.2.2 Camera
    3.2.3 The Mobile Unit
  3.3 Conceptual overview

4 Perception
  4.1 Choosing the Perception Framework
  4.2 Perception System Overview
  4.3 State Estimations
    4.3.1 Position and Velocity
  4.4 Model Performance and Training
    4.4.1 Data Acquisition
    4.4.2 Image Annotation
    4.4.3 Model Training
    4.4.4 Process Acceleration on CPU
      4.4.4.1 ONNX Runtime
      4.4.4.2 OpenVino
      4.4.4.3 DeepSparse

5 Motion Planning
  5.1 Path Planner
  5.2 Trajectory Planner

6 Motion Control
  6.1 Motion Models
    6.1.1 Rigid Motion Model
    6.1.2 Articulated Motion Model
  6.2 High-level Controller
    6.2.1 PID Control
    6.2.2 Model Predictive Control
      6.2.2.1 Cost and Constraints
      6.2.2.2 Problem Formulation
  6.3 Low-level Control

7 Results
  7.1 Perception
    7.1.1 Dataset and Model Training Evaluation
    7.1.2 State Estimation Accuracy
    7.1.3 Process Acceleration
  7.2 Motion Control
    7.2.1 Test Scenarios
    7.2.2 Control Tuning
    7.2.3 Simulation
    7.2.4 Hardware
  7.3 Full System Evaluation
    7.3.1 Hardware Assessment
    7.3.2 Solution Time
    7.3.3 Reliability and Accuracy Assessment

8 Discussion
  8.1 Perception Evaluation
  8.2 Motion Model Evaluation
  8.3 Motion Control Evaluation
  8.4 Latency and Hardware Performance

9 Conclusions
  9.1 Future Improvements and Development

Bibliography


List of Figures

1.1 Demo site with marked stations at B&R Industrial Automation in Malmö.

2.1 Different methods of object detection classification
2.2 YOLO architecture
2.3 Distinction between a convex and a non-convex function.

3.1 Mobile Automation PC 3100
3.2 ArkCam basic+ Mini 130 and table of specifications.
3.3 The two mobile units used in the project.
3.4 Camera setup in the demonstration area.
3.5 Full system overview.

4.1 Performance comparison of different YOLO models
4.2 Flowchart of the three-phase principle of the perception system.
4.3 A frame of the Mercedes Lego truck with active perception system,
    displaying the coordinate system and the detection area
4.4 Illustration of the position, velocity and timestamps for the mobile unit
4.5 Undesired detections and false negatives with a pre-trained YOLO
    model on the COCO dataset.
4.6 Sample of the labelled dataset used for the perception system for
    testing on the Mercedes truck.

5.1 Visual representation of the grid map with defined station nodes.
5.2 Flowchart of path generation in a grid map environment with a
    description for each process.

6.1 Simplified motion control overview.
6.2 Simplified model of the rigid mobile unit.
6.3 Simplified model of the articulated mobile unit.

7.1 Performance metrics of model training over 25 training epochs.
7.2 Average latency for different object detection frameworks during
    runtime.
7.3 Simple step response test.
7.4 Full cycle test between stations.
7.5 Simulator interface.
7.6 Controller comparison in simulation in a step test.
7.7 Controller comparison in simulation in a full cycle test.
7.8 Controller comparison on the target hardware for a step test.
7.9 Controller comparison on the target hardware for a full cycle test.


List of Tables

3.1 System requirements for the project.
3.2 Automation PC 3100 specifications
3.3 ArkCam basic+ Mini 130 specifications

4.1 Performance metrics for different YOLOv8 models.

7.1 Standard deviation of position measurements
7.2 Average accuracy measurements of different acceleration methods.
7.3 Tuning parameters for PID-controller.
7.4 Plot of reference path deviation error in simulation.
7.5 Reference path deviation error in simulation.
7.6 Reference path deviation error on the target hardware for a step test.
7.7 Reference path deviation error on the target hardware for a full cycle
    run.
7.8 Average update time, max update time and solver time in seconds for
    full cycle test with different controllers.


1
Introduction

As technology evolves, the pursuit of automation expands in several areas [1]. Within
the field of autonomous systems, perception and control are two fundamental chal-
lenges that play an important role in enabling systems to make well-informed de-
cisions based on their surroundings. By continuously monitoring and gathering
information from the surrounding environment, the system can update its internal
representation of the world. This technique is often referred to as state estimation
and its data can be used in decision-making systems such as a motion planner to
determine feasible or optimal routes from a starting point to a goal. Further, motion
controllers are used to ensure that the planned path is maintained. By combining
these components, autonomous systems could navigate through complex and dy-
namically changing environments to reach desired locations on time, efficiently and
safely.

1.1 Background
This master’s thesis is a collaborative project with B&R Industrial Automation. To
display pioneering technology and digital innovation with B&R’s products, a
showroom called OrangePoint is hosted at the main office in Malmö. Within
OrangePoint, one of the demonstrations showcased is a small-scale site containing a
miniature hauler truck referred to as a mobile unit, displayed in Figure 1.1. An
operator can control the mobile unit remotely via Bluetooth communication, using a
computer from B&R to drive the truck between different stations. To further
improve this showroom, B&R wants to implement a fully automated system that
can replace the operator, navigating the area and driving the mobile unit between
multiple stations. The goal is to complete various tasks and interact with different
products from B&R and external suppliers.


Figure 1.1: Demo site with marked stations at B&R Industrial Automation in
Malmö.

The demonstration site displays a fusion of Internet of Things (IoT) and Cloud
services, incorporating third-party solutions and enabling remote connectivity. It
showcases how various industries can develop a resilient, robust and future-ready
platform with B&R’s hardware and software solutions [2]. The proposed addition of
an autonomous feature aims to enable the mobile unit to autonomously navigate a
complex environment by leveraging real-time data analysis, machine learning algo-
rithms, and computer vision, deployed on a hardware control unit from B&R. This
feature is anticipated to ensure safe and efficient operation of the mobile unit while
travelling between different stations within the demonstration area. Its integration
is seen as a crucial step in enhancing the platform’s capabilities and displaying how
B&R’s technologies can be utilized at the forefront of mobile automation.

The demonstration area is a compact 2 × 2 meter square, designed to simulate a
sand-covered environment using a base layer and larger piles of orange plastic. This
area also contains a miniature mobile digger and various equipment with the poten-
tial of expanding with more features and products in the future. The layout features
static obstacles, sharp turns and areas where a combination of reversing and going
forward is necessary to navigate and effectively reach the desired positions, thus
posing several challenges when designing an autonomous system.

For this project, Figure 1.1 illustrates four pre-determined stations: loading of
material at station 1, camera and QR-code identification at station 2, weighing of
the loaded vehicle at station 3, and finally unloading at station 4. The objective is
for the mobile unit to autonomously navigate between these stations in a safe and
robust manner. At each station, the mobile unit will pause and wait for a “go-ahead”
signal, which indicates the completion of the required process before moving on to
the next station.


1.2 Aim
The primary goal of this thesis is to investigate different approaches, and develop a
system for the perception and control of an autonomous mobile unit with hardware
from B&R. The results should be presented on a mobile unit that will travel between
different stations inside a demonstration area without any operator input.

1.3 Limitations
To define the scope of the project, the following limitations are considered
within this thesis.

• The system is only intended for the demo site and therefore not general pur-
pose, meaning modifications of setup or different locations will require adjust-
ments and additional work to ensure the same performance.

• The primary goal of this project is to achieve smooth operation of the mobile
unit rather than optimal computational efficiency.

• The system is limited to only one mobile unit, thus not considering other
mobile units within the area.

• The perception system will not consider the location of anything but the mobile
unit. All other obstacles are considered static, and therefore their locations
are predefined.

1.4 Research Questions
This project aims to implement an autonomous system that enables the mobile unit
to navigate between different stations. To assess the system and guide development,
the following research questions will be explored.

1. To what extent can camera-based state estimation of a mobile unit be used
to determine different kinematic properties, including position, speed, and
relative angle of joints?

2. What type of control strategy will ensure sufficient motion planning and con-
trol of the autonomous mobile unit within the predefined area to ensure safe
and efficient navigation without operator input?

3. What are the key factors affecting the reliability and accuracy of the au-
tonomous system controlling the mobile unit, and how can these factors be
managed?


1.5 Ethics and Sustainability
This thesis aims to explore the field of perception and control of autonomous mo-
bile units, a technology that has many upsides when implemented on a larger scale.
This project is a proof of concept on a small scale within a secure area. However,
the same technologies and principles can be applied within other industries and on
a larger scale. Therefore it is important to consider the sustainability and ethical
implications of the project, as well as the potential consequences this technology
could have for the future. Below, some important ethical and sustainability aspects
are addressed further, focusing primarily on autonomous transportation vehicles.

The perception and control within the autonomous system must be reliable and
robust to ensure safety for the vehicle and its surroundings. However, in the case
of failure, the autonomous system must be able to hand over control or go into a
fail-safe mode in order to come to a collision-free and safe stop [3]. At the same
time, removing the "human factor" from critical operations could lead to a reduced
risk of errors and mitigate the risk of human injuries.

The perception functionality of an autonomous system generally collects sensitive
data about its environment. Whether the data is collected with cameras, lidars or
GPS, the contents must be protected from external operators with ill intent. It is
also important to ensure that the integrity of other people is upheld. To mitigate
this problem, it is critical that the data is properly protected and that data
destruction is defined as a continuous process [3].

While autonomous transportation could improve the social working conditions within
several industries by removing monotonous work and heavy lifting, it is important
to consider the risk of fewer jobs for humans. Truck and forklift drivers, along
with many other occupations, risk replacement in the future as autonomous
driving technology evolves [3]. Therefore, the impact of deploying autonomous
technology should be assessed in each separate industry, and a plan for relocating
the workforce should be made.

From an environmental perspective, autonomous mobile units’ capability to opti-
mize routes and driving behaviours can also result in enhanced energy efficiency
compared to manual operations. This efficiency could lead to decreased emissions,
contributing positively to environmental sustainability, while also enabling cost sav-
ings for companies applying these technologies [3].



2
Preliminaries

This section establishes the theoretical foundation for the report, providing the
necessary background on concepts and methods essential to this thesis.

2.1 Computer Vision and Machine Learning

Enabling machines to interpret and understand visual inputs often involves
identifying and locating objects within an image [4]. Object detection can be
achieved by classifying different parts of each image into various categories and us-
ing techniques like deep learning and convolutional neural networks (CNNs). There
are several different approaches within this field, where algorithms such as Region-
based Convolutional Neural Networks (R-CNN) or You Only Look Once (YOLO)
algorithms are commonly used. These models improve the speed and accuracy of
detection by focusing on specific regions of interest within the image and executing
classification in a single pass. These networks learn to recognize patterns and fea-
tures from larger datasets and labelled images and are widely applied within different
industries, where real-time accuracy is crucial. Ongoing research and development
in object detection focus on increasing the robustness and efficiency of these models.
Efforts include improving the training datasets to cover more diverse scenarios and
conditions, optimizing algorithms to reduce computational demands, and refining
accuracy to distinguish between closely similar objects [5].

2.1.1 You Only Look Once

You Only Look Once (YOLO) is an open-source, deep learning-based algorithm mainly
used for object detection, claimed to provide real-time performance and high
accuracy [6]. The YOLO model estimates bounding boxes and predicts object classes
simultaneously, while still maintaining high accuracy. The object detector used in
YOLO is a single-shot method, meaning that the entire frame of the image is
analyzed and all predictions are made in a single pass. This approach differs from
many other methods, such as R-CNN or Fast R-CNN, which first detect possible
regions of interest and then perform image recognition. Figure 2.1 classifies
different algorithms; the main distinction between the approaches is that
single-shot methods offer better real-time performance, while two-stage detection
yields a higher accuracy.


Figure 2.1: Different methods of object detection classification

Figure 2.1 describes the branching of the different object detector methods and
whether they are of two-stage or one-stage classification. The YOLO model is a
CNN that can be used to recognize and identify items with high speed and accuracy
[6]. The detection model consists of 24 convolutional layers, of which 20 are
pre-trained. These are followed by 2 fully connected layers, which yield a
7 × 7 × 30 tensor of predictions. This direct prediction mechanism is what
enables YOLO to achieve its high speed, differentiating it from other detection
systems that often employ separate steps for feature extraction, proposal
generation, and object classification. The YOLO architecture is shown in Figure 2.2
with the 7 × 7 × 30 tensor as the final output.

Figure 2.2: YOLO architecture [6].
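
As a brief illustration of where the 7 × 7 × 30 shape comes from, the sketch below assumes the setup of the original YOLO paper [6]: the image is divided into an S × S grid, and each grid cell predicts B bounding boxes with five values each (x, y, width, height and confidence) together with C class probabilities. The values S = 7, B = 2 and C = 20 are taken from that paper, not from this project, and the function name is hypothetical.

```python
def yolo_output_shape(grid_size: int, boxes_per_cell: int, num_classes: int) -> tuple:
    """Shape of a YOLOv1-style prediction tensor: S x S x (B * 5 + C)."""
    values_per_box = 5  # x, y, width, height, confidence
    depth = boxes_per_cell * values_per_box + num_classes
    return (grid_size, grid_size, depth)


# Assumed values from the original YOLO paper (S = 7, B = 2, C = 20).
shape = yolo_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20)
print(shape)  # (7, 7, 30)
```

A detector trained on a single class with the same grid and box count would, under the same assumptions, produce a tensor of depth 2 · 5 + 1 = 11 per cell.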

Since the first version of YOLO was published in [6], different versions of the
algorithm including YOLOv3, YOLOv4, all the way up to YOLOv9 have been developed.
Each iteration aims to improve its detection accuracy and speed while keeping a low
computational complexity to achieve real-time performance. Further information
and comparison of different versions of the YOLO algorithm can be found in [7, 8].

2.1.2 Model Quantization and Pruning
One main goal when designing new deep-learning models is improving the accuracy.
However, this commonly also results in larger model sizes. Consequently, with larger
models comes the need for more computational resources. Simultaneously, there is
a demand for the deployment of high-precision models on less powerful hardware,
both in terms of cost and scalability. Two effective strategies for achieving these
objectives are pruning and quantization. These techniques not only help in scaling
down the models but also ensure that their precision remains as close to the more
dense models as possible [9].

Many weights in deep-learning models are redundant once the model has been trained:
they are replicated more than once, carry little information, or belong to pathways
that are no longer important [10]. Pruning involves removing parts of the model that
are considered less important or redundant for deployment. The primary goal is to
reduce the complexity of the model without significantly impacting its performance
or accuracy. Reducing the number of model parameters can result in a model with
improved runtime performance and reduced computational complexity.
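
The idea of pruning can be sketched with magnitude-based pruning, one common criterion in which the weights with the smallest absolute values are set to zero. This is only an illustrative assumption; the thesis does not state which pruning criterion the evaluated frameworks use, and the function name and example weights are hypothetical.

```python
def magnitude_prune(weights: list, sparsity: float) -> list:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Hypothetical weights of a single layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Half of the weights are removed while the three largest-magnitude weights, which presumably contribute most to the output, are kept unchanged.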

Additionally, most weights in a model are typically too precise for runtime appli-
cations and this type of precision is generally not needed after the model has been
trained [10]. Quantization refers to the method of reducing the precision of the
numbers used to represent the model weights. With quantization, the computa-
tional power and memory needed to run the model can be reduced using lower
precision data types such as 8-bit integers instead of the standard high-precision
32-bit floating-point numbers.
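
A minimal sketch of quantization is given below, assuming a symmetric per-tensor scheme in which all weights share one scale factor and are mapped to integers in the range [-127, 127]. Real frameworks offer several schemes, and the scheme, function names and weights shown here are illustrative assumptions rather than the method used in this project.

```python
def quantize_int8(weights: list) -> tuple:
    """Map float weights to 8-bit integers using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list, scale: float) -> list:
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical 32-bit floating-point weights.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Each restored weight differs from the original by at most half a quantization step, which illustrates why small weights such as 0.003 are the ones most affected by the reduced precision.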

2.2 Optimization problems
An optimization problem seeks the optimal solution from a set of possible choices.
The primary objective of such problems is to find the minimum or maximum value
of a function, known as the objective function, under a set of constraints. These
constraints are typically expressed as inequalities and equalities that the solution
must satisfy [11]. An example of a mathematical formulation of an optimization
problem is presented in (2.1).

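\begin{equation}
\begin{aligned}
\min_{x} \quad & f_0(x) \\
\text{subject to} \quad & g_i(x) \le b_i, \quad i = 1, \dots, m \\
& h_j(x) = 0, \quad j = 1, \dots, p
\end{aligned}
\tag{2.1}
\end{equation}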

In (2.1), the vector x is the decision variable, and x∗ is the optimal solution to
the problem. The function f0 represents the objective function or cost, that is to
be minimized, while gi(x) and hj(x) represent inequality and equality constraints
that the solution must respect. The constraints ensure that the solution not only
optimizes the objective function but also remains within a feasible region defined by
the set limits. A solution to the optimization problem corresponds to a choice that
has a minimum cost, among all choices that meet the constraints [11].
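
To make the formulation in (2.1) concrete, the sketch below solves a small hypothetical instance: minimize f0(x) = (x − 3)² subject to the single inequality constraint x ≤ 2 and no equality constraints. The unconstrained minimum at x = 3 is infeasible, so the optimal solution lies on the constraint boundary, x∗ = 2. Projected gradient descent is used here only because it is simple to follow; it is not the solver used in this thesis, and all names and values are assumptions made for illustration.

```python
def solve_projected_gradient(x0: float, upper_bound: float,
                             step: float = 0.1, iterations: int = 200) -> float:
    """Minimize f0(x) = (x - 3)^2 subject to x <= upper_bound."""
    x = min(x0, upper_bound)  # start from a feasible point
    for _ in range(iterations):
        gradient = 2.0 * (x - 3.0)   # derivative of f0(x) = (x - 3)^2
        x = x - step * gradient      # unconstrained descent step
        x = min(x, upper_bound)      # projection onto the feasible set
    return x


x_star = solve_projected_gradient(x0=-5.0, upper_bound=2.0)
cost = (x_star - 3.0) ** 2
print(x_star, cost)  # 2.0 1.0
```

The resulting cost of 1 is the minimum among all feasible choices, even though a lower cost exists outside the feasible region.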

2.2.1 Convex vs Non-convex optimization problems
Depending on the nature of the objective function and the set over which the
optimization is performed, optimization problems can be divided into two types:
convex and non-convex, with their differences illustrated in Figure 2.3.

Figure 2.3: Distinction between a convex and a non-convex function.

A convex optimization problem is characterized by every local minimum also being
a global minimum within the feasible set. This property significantly simplifies
the search for an optimal solution, as a convex problem has either a unique
solution or multiple solutions forming a convex set. In comparison, non-convex
problems can contain local minima that are not global, posing a larger challenge
to find the optimal solution as the problem becomes more complex and
computationally heavy [12]. The general trade-off between these types of
optimization problems is the gain in accuracy compared to the added computational
complexity. Non-convex problems allow for more complex problem formulations that
could yield a higher accuracy than a convex formulation. However, solving these
types of problems generally requires more computational power to converge to the
optimal solution.
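
The practical consequence of this distinction can be sketched with plain gradient descent started from two different points. The two functions are hypothetical examples chosen for illustration: f(x) = x² is convex, while g(x) = x⁴ − 3x² + x is non-convex with two separate local minima.

```python
def gradient_descent(gradient, x0: float, step: float = 0.01,
                     iterations: int = 2000) -> float:
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(iterations):
        x = x - step * gradient(x)
    return x


def convex_gradient(x: float) -> float:
    return 2.0 * x                      # derivative of f(x) = x^2


def non_convex_gradient(x: float) -> float:
    return 4.0 * x**3 - 6.0 * x + 1.0   # derivative of g(x) = x^4 - 3x^2 + x


# Convex case: both starting points reach the same (global) minimum at x = 0.
convex_left = gradient_descent(convex_gradient, x0=-2.0)
convex_right = gradient_descent(convex_gradient, x0=2.0)

# Non-convex case: the result depends on the starting point.
non_convex_left = gradient_descent(non_convex_gradient, x0=-2.0)    # near -1.30
non_convex_right = gradient_descent(non_convex_gradient, x0=2.0)    # near 1.13

print(convex_left, convex_right)
print(non_convex_left, non_convex_right)
```

For the non-convex function, only the run started at x = −2 finds the global minimum; the run started at x = 2 stops in a local minimum with a higher cost, which is why non-convex solvers typically need multiple starts or more elaborate search strategies.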



3
Technical Concept

The main purpose of this thesis is to research and develop methodologies that enable
a mobile unit to navigate fully autonomously within a pre-defined area. This can be
divided into three sub-problems: state estimation, motion planning, and motion
control, which are open-ended problems that can be solved in multiple ways.
Thus, to highlight the aspects that form the basis of the chosen technical concept,
the following sections focus on how requirements and limitations affect the choice of
technical solutions for the project.

3.1 Company Requirements
Ensuring that the technical solution is adapted to its use case is an important as-
pect of the autonomous system which will primarily be used in sales demonstrations,
showcasing the capabilities of B&R’s hardware and software in autonomous applica-
tions. To achieve this alignment, a set of system requirements was defined together
with the company, ensuring that the solution covers all desired aspects. Table 3.1
presents the system requirements for the project.

Requirement | Priority | Details
Performance analysis of hardware | Must have | Explore the potential applications of B&R’s hardware technology with software that requires significant computational resources.
Modularity | Must have | All parameters that affect the system behaviour should be simple to change to display different scenarios.
State estimation with a camera | Must have | Estimate the position of the mobile unit, relative angle, and velocity.
Cropping of image | Must have | Used to specify what parts of the camera frame objects should be detected within. Should be easy to adjust based on the location of the demo area.
Go through production cycles | Must have | Go between different stations within the site, where the location of each station can be adjusted.
Defining restricted areas | Must have | A configurable area where the vehicle is allowed to move.
Efficient and smooth navigation | Must have | Opt for a visually appealing and smooth route to enhance the audience’s perception.
Automatic adjustment of cropping parameters | Nice to have | Automatic detection of demo location and orientation to adjust cropping parameters.
Adaptive visual mapping | Nice to have | Mapping with the camera obstacles and areas that are difficult to traverse within.

Table 3.1: System requirements for the project.

In Table 3.1, the tasks listed as must have are critical for the project and must be
implemented. The topics that are listed as nice to have are to be implemented if
time permits.

From the company’s perspective, the autonomous system actuating the mobile unit
should be appealing to customers, displaying a complex technical solution with
intuitive and smooth movements and decisions. The project therefore prioritises
a final product with the above-mentioned features rather than finding the optimal
solution in terms of shortest path, energy efficiency etc.


3.2 Hardware Specifications
Based on the requirements above, a technical analysis was performed on the given
hardware to further evaluate its capabilities. This was mainly done for benchmarking
purposes and to eliminate potential hardware bottlenecks later in the project.
Each hardware component used in the project is described below.

3.2.1 Automation PC (APC)
The processing unit used in this project is the APC Mobile 3100, a product from
B&R Industrial Automation displayed in Figure 3.1. The computer houses a PLC
and a Linux operating system, both using an Intel processor for computations. All
work done within this project is done on the Linux side of the APC. The objective
is to integrate the entire system within a single B&R hardware unit, along with
components from other suppliers. If necessary, the selected APC can be upgraded
to an APC Mobile 3100 with an Intel i7 central processing unit (CPU) and increased
RAM.

Parameters | Specifications
Material number | 5MPC3100.K038-000
CPU model | Intel Celeron 3965U
CPU speed | 2.2 GHz
RAM | 8 GB

Figure 3.1: Mobile Automation PC 3100 [13].

3.2.2 Camera
The camera selected for this project is an ArkCam image sensor, specifically de-
signed for monitoring both mobile and stationary industrial environments [14]. The
sensor choice by the company is strategic, primarily because of its widespread use
in mobile applications. The goal is therefore to incorporate it in the demonstration
area to show its capability to be integrated seamlessly with B&R’s hardware along-
side other components from various suppliers, aiming to deliver a complete system
solution. Figure 3.2 provides sensor specifications and shows the ArkCam sensor.

Parameters | Specifications
Max video stream | 1280x720 @ 60 fps
Latency | <100 ms
Power consumption | <3 W
Viewing angle | 130°

Figure 3.2: ArkCam basic+ Mini 130 and table of specifications.

3.2.3 The Mobile Unit
In this project, the mobile units used are miniature Lego trucks intended to
demonstrate the autonomous functionality of the system. The Lego trucks serve as a
proof of concept; the result of this thesis work is meant to be applicable across a
range of sectors, including the autonomous mobile robot (AMR) industry, the
automotive and construction industries, and other fields that employ similar
technologies. The two mobile units employed in this project are shown in Figure 3.3.

(a) Rigid truck for testing during
development.

(b) Articulated truck used in the
demonstration area.

Figure 3.3: The two mobile units used in the project.

Figure 3.3a shows a truck with rigid dynamics, i.e. the steering capabilities are
directly influenced by its wheelbase. In comparison, the truck in Figure 3.3b has
another degree of rotational freedom around the joint between the head of the truck
and the trailer. The latter truck is the main unit used in the demonstration area.
However, since it is commonly used for sales purposes, the rigid truck has been
utilized for testing the system during development.

Both mobile units are controlled with longitudinal drive motors and a motor
controlling the requested steering angle, where the steering geometry is Ackermann.
The main implication of using two different mechanical setups is the manoeuvrability
of the vehicle. Simply put, a rigid body provides simpler dynamics when performing
more complex actions such as reversing. However, when comparing turning
capabilities, the articulated mobile unit allows for better manoeuvrability as the
pivot point allows for sharper turns, making it more suitable when manoeuvring in
tight spaces such as the demonstration area [15]. The trade-off between complexity
and manoeuvrability is further analyzed mathematically in Chapter 6.1.

3.3 Conceptual overview
With the specified hardware and desired functionality of the system, a conceptual
overview could be determined, including operational behaviour and technical solu-
tions. With the current functionalities considered and the fact that the mobile unit
does not hold an adequate internal processing unit, a centralized approach was taken
where the APC estimates the position, plans the desired path and actuates the mo-
bile unit based on a single sensor, a camera mounted above the demonstration area
as shown in Figure 3.4.

(a) Camera mounted in the ceiling. (b) Field of view from the camera.

Figure 3.4: Camera setup in the demonstration area.

A simplified overview of the system architecture is presented in Figure 3.5, where
the system is divided into three larger sections: Perception, Motion Planner, and
Motion Control.

Figure 3.5: Full system overview.

With a centralized approach, all subsystems must run on a single CPU, limiting
the computational complexity and thereby what can be achieved in terms of real-
time performance. Therefore, this aspect must be considered throughout the whole
project.

With a single camera as the only available sensor, providing the system with
real-time updates, the desired states of the mobile unit can be estimated. This will be
achieved using a machine-learning model for object detection within the perception
system. Simultaneously, the desired goal state is given as an external input to the
system. The motion planner then combines this with the current state from the
perception system to plan the desired path and define a set of desired states at each
iteration. Finally, a motion controller utilizes all these inputs combined with a pre-
defined feasible area to decide how to manoeuvre the vehicle to reach the desired
goal state collision-free and smoothly. The upcoming chapters will delve deeper into
each of the subsystems discussed, providing detailed explanations of the reasoning
behind the decisions, methodologies, and implementation processes.

4 Perception

In robotics and autonomous systems, accurately perceiving the environment is im-
portant for effective and accurate operations. Perception involves the use of various
sensors to gather data, which is then processed to estimate the state of the system.
These states can be essential as they provide the system with an object’s position,
orientation, and other desired dynamic attributes critical to its decision-making pro-
cesses. Among the sensors available, cameras are particularly valuable due to their
rich data capture. To leverage this data effectively, object detection frameworks
can be employed, which can enable recognition and tracking of objects within the
camera’s field of view. There are several frameworks available for object detection,
each with its unique strengths and applications. Some of the most recognized are
R-CNN, SSD, and YOLO models [16].

4.1 Choosing the Perception Framework
In this project, a camera serves as the primary input sensor, capturing visuals of the
system’s environment. Utilizing computer vision techniques, the developed system
is designed to detect, recognize, and track the mobile unit, leveraging this data to es-
timate the desired states of the mobile unit. The challenge in accurately estimating
the state from visual inputs lies in dealing with varying visual conditions, potential
obstacles, undetectable objects and the necessity for processing the data at real-time
speed. A critical aspect of this project is identifying and consistently tracking the
targeted mobile unit in real time, which is essential for a reliable system. With these
aspects in mind, a one-stage method was desired to minimize the model complexity,
narrowing the selection down to YOLO or SSD. Based on prior comparisons the
YOLO framework was selected due to its smaller model size, comparable real-time
accuracy and its compatibility with other tools [17].

The trade-off between speed and accuracy became more nuanced as the YOLO
framework evolved. Each YOLO model is aimed at increasing either speed, accuracy,
or a balance of both. YOLOv4, for instance, is noted for its robustness and
efficiency in real-time settings, while YOLOv5 is noted for innovations such as new
network backbones, improved data augmentation techniques, and optimized training
strategies. YOLOv7 and YOLOv8 further push the boundaries in terms of accuracy,
integrating techniques from the latest research to improve detection performance
[18]. Figure 4.1 shows a comparison of various YOLO models, illustrating the
relationship between model size, speed, and accuracy for the different models.

(a) The relationship between model
complexity and detection accuracy.

(b) The tradeoff between infer-
ence speed and accuracy for the
same models.

Figure 4.1: Performance comparison of different YOLO models [18].

From Figure 4.1 it can be observed that YOLOv8 maintains the leading accuracy
while having only a slightly larger number of parameters than YOLOv6-2.0. This
indicates more efficient parameter utilization, since it achieves higher accuracy
with a comparable number of parameters. In addition, YOLOv8 displays the lowest
latency. For a real-time application, this is critical as it means the model can
process and analyze frames faster. With all this considered, YOLOv8 is the most
balanced option for the project’s real-time application: it achieves the highest
accuracy and does so with the lowest latency.

During the course of the project, a new iteration of the YOLO algorithm, YOLOv9,
was released. Comparisons between the previous version, YOLOv8, and YOLOv9
suggest that the latter offers reduced complexity and enhanced performance for ob-
ject detection [19]. However, as YOLOv9 is in the initial stages of deployment, and
compatibility with additional tools such as trackers, visual aids, and other function-
alities remains limited, this poses challenges for its integration. Therefore the choice
of using YOLOv8 for this project remains.

The perception system in this thesis is based on the YOLOv8 model, trained on a
custom dataset. The YOLOv8 model handles object detection, classification,
and segmentation tasks. YOLOv8 introduces improvements over previous YOLO
versions, such as better feature extraction, more sophisticated backbones and fea-
tures that make it easier to use and tailor for the project’s specific application. This
results in enhanced accuracy and lower latency, especially in challenging scenarios
like small object detection or in conditions with poor lighting or occlusions. In ad-
dition to this, YOLOv8’s architecture allows for efficient custom training with new
datasets. In comparison to other models like Faster R-CNN, SSD, or Mask R-CNN,
YOLOv8 offers a superior balance of speed, accuracy, and flexibility [18].

4.2 Perception System Overview
The perception system works in three phases: process video, process frame, and
annotate frame. Figure 4.2 shows a flowchart of the working principle. Each phase
of the system is structured as a distinct task within the ’State Estimation’ group.
This approach organizes the different parts of the program into specific, manageable
sections, each responsible for a particular aspect of the overall process.

Figure 4.2: Flowchart of the three-phase principle of the perception system.

Figure 4.2 presents a visualization of the three phases and how they communicate.
The perception system includes a dynamic configuration script through which the
settings can be adjusted. The process video phase captures raw footage from the
chosen video source. Then, the process frame phase uses the YOLOv8 model to
execute the computer vision tasks, enabling detection and tracking of objects. The
detections are updated for each frame to maintain accurate detection and tracking.
Furthermore, the annotate frame phase ensures that all detections are confined
within the pre-set detection zone. The system operates in a loop, consistently
refreshing the visual output in sync with the system’s frequency.

In this thesis, the detection zones have been specifically set to suit the sandbox area
at OrangePoint in Malmö. This configuration ensures that detections are limited
to objects within the sandbox, effectively filtering out irrelevant targets. The video
input for the system is streamed from a live camera feed, strategically placed in
the ceiling above the sandbox. Both the detection zone and video input source are
customizable and can be tailored to desired specifications through a configuration
file. The perception system is designed to accept inputs into the detection zone that
are proportionate to the camera’s field of view. This configuration assumes that the
sandbox at OrangePoint in Malmö remains unrotated. Any rotation would introduce
inaccuracies in the position calculations within the state estimation process.
The location of the unrotated sandbox can be configured depending on the desired
position.

4.3 State Estimations
State estimation is a process aimed at deducing the state of a system, such as
position, using observed data. It uses a mathematical model, later described in
Chapter 6.1, that describes how the system’s state changes over time and how this
state correlates with the observed data. In the real world, measurements often come
with noise and may not be complete. When implementing state estimation with a
camera and computer vision, the approach involves using a camera to capture images
or video frames, which act as observational data. These images may include objects
whose states (like position or orientation) are to be estimated. Computer vision
methods are then applied to identify and track features of objects across successive
frames. Techniques such as optical flow or object detection algorithms, for instance,
YOLO, are commonly used [7]. The detected features allow the system to estimate
movements or positional changes of the objects. Many applications require these
estimations to be performed in real-time, necessitating the use of efficient algorithms
and, in some cases, the support of hardware acceleration.

4.3.1 Position and Velocity
In this thesis, the YOLOv8 computer vision model is used together with a cam-
era to enable real-time tracking of a mobile unit. Object detection, the core task of
this model, involves specifying the location and categorizing objects within an image
or video stream. In this thesis, a video stream acts as the visual input for the model.

The detector’s output consists of bounding boxes that encompass the identified ob-
jects in each frame, together with class labels and confidence scores. To track the
mobile unit, separate tracking algorithms like BYTE, SORT (Simple Online and
Real-time Tracking) or DeepSORT can be used [20]. These algorithms take the de-
tections from YOLOv8 and apply a series of steps to achieve continuous tracking of
objects as they move across the video frames. The primary challenge is to maintain
the identity of each object from frame to frame, despite changes in position,
orientation, scale, or interaction with other objects. In this thesis, BYTE is used
to track the mobile unit because of its robust and accurate tracking performance.
BYTE associates not only the high-score detection boxes but also the low-score ones,
since a low-score box can still indicate the existence of an object, for example
when it is partially occluded. In this way, the detection outcomes are used more
fully to improve multi-object tracking [20].
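
The two-stage association idea behind BYTE can be illustrated with a minimal Python sketch. It is not the tracker used in this thesis nor the reference implementation: the motion prediction and the optimal assignment are replaced here by a simple greedy IoU matching, and the names `byte_style_associate`, `high_thresh` and `min_iou`, as well as the threshold values, are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def greedy_match(tracks: Dict[int, Box], boxes: List[Box],
                 min_iou: float) -> Dict[int, Box]:
    """Greedily pair each track with its best-overlapping unused box."""
    matches: Dict[int, Box] = {}
    used = set()
    for track_id, track_box in tracks.items():
        best_idx, best_iou = -1, min_iou
        for idx, box in enumerate(boxes):
            overlap = iou(track_box, box)
            if idx not in used and overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx >= 0:
            used.add(best_idx)
            matches[track_id] = boxes[best_idx]
    return matches


def byte_style_associate(tracks: Dict[int, Box],
                         detections: List[Tuple[Box, float]],
                         high_thresh: float = 0.5,
                         min_iou: float = 0.3) -> Dict[int, Box]:
    """Match tracks to (box, score) detections in two passes."""
    high = [box for box, score in detections if score >= high_thresh]
    low = [box for box, score in detections if score < high_thresh]
    # First pass: only confident detections.
    matches = greedy_match(tracks, high, min_iou)
    # Second pass: tracks still unmatched may claim a low-score box.
    remaining = {tid: box for tid, box in tracks.items() if tid not in matches}
    matches.update(greedy_match(remaining, low, min_iou))
    return matches
```

In this sketch, a track whose object is only detected with a low score in the current frame keeps its identity through the second pass instead of being dropped.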

The position is obtained by extracting the centre point of the detection box for
the tracked object, irrespective of its orientation. The x and y coordinates of the
detected object are updated with each new frame, i.e. once every update period
∆t. The x and y coordinates are expressed in a 2D plane, as the camera is oriented
directly over the demo area. Initially, the origin is placed at the top left corner of
the frame. However, upon configuring specific detection zones, the origin shifts to
the top left corner of the selected detection zone, as illustrated in Figure 4.3.

Figure 4.3: A frame of the Mercedes Lego truck with active perception system,
displaying the coordinate system and the detection area

Figure 4.3 shows a snapshot from the perception system’s output. It features the
detected object outlined in purple, along with its x and y centre coordinates. Addi-
tionally, there’s a red box that represents the pre-set detection zone. The origin of
the coordinate system is indicated in the top left corner of the detection area.
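
The centre-point extraction and the shift of origin can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: boxes and the detection zone are given as `(x1, y1, x2, y2)` in frame pixels, a detection is considered inside the zone when its centre is inside, and the function name `centre_in_zone` is hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in frame pixels


def centre_in_zone(box: Box, zone: Box) -> Optional[Tuple[float, float]]:
    """Return the box centre relative to the zone's top-left corner.

    Returns None when the centre lies outside the detection zone.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    zx1, zy1, zx2, zy2 = zone
    if not (zx1 <= cx <= zx2 and zy1 <= cy <= zy2):
        return None
    # Shift the origin from the frame corner to the zone corner.
    return cx - zx1, cy - zy1
```

For example, a box spanning (100, 200) to (140, 260) inside a zone spanning (50, 100) to (600, 500) has its centre at (120, 230) in the frame, which becomes (70, 130) in the zone coordinate system.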

To calculate the velocity of a moving vehicle using a camera, it is essential to track
how specific points on the vehicle shift over time. Here, the reference point is the
bounding box centre point, which moves at the same velocity and in the same direction
as the vehicle when it is in motion relative to the camera. In this project, the camera
is mounted in the ceiling, so the vehicle’s speed is measured in relation to the camera,
which corresponds to the speed relative to a stationary plane in the camera’s view.

To determine the vehicle’s speed, consecutive frames captured by the camera are
analyzed. This process allows for the measurement of the vehicle’s momentary speed.
The calculation is based on the change in the position of the reference point
between two consecutive frames according to:

$$v = \frac{\Delta P}{\Delta t}. \tag{4.1}$$

Here $v$ is the speed of the truck, and $\Delta P$ corresponds to the Euclidean
distance between two consecutive positions, displayed in (4.2). $\Delta t$ corresponds
to the measured time it takes for the vehicle to travel the distance $\Delta P$, i.e.
$\Delta t = t_k - t_{k-1}$.

$$\Delta P = \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2} \tag{4.2}$$

The velocity of the point is a vector in $\mathbb{R}^2$, i.e. in 2D space, since only
one camera is used in this thesis, and $v$ in (4.1) is its magnitude. The measured
time, $\Delta t$, is the time which passes between two processed video frames and is
equal to the update period.
To find the velocity of the vehicle, one point is not enough. For this reason, the
estimate of the velocity of the mobile unit can only be used after two time periods,
t > 2∆t, where at least two positions have been registered. Figure 4.4 illustrates
the points and timestamps necessary for calculating the velocity.

Figure 4.4: Illustration of the position, velocity and timestamps for the mobile
unit
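
As a minimal sketch of (4.1) and (4.2), the speed can be computed from consecutive timestamped positions as below. The class name `SpeedEstimator` and its interface are assumptions for illustration and are not taken from the thesis implementation; the result is in position units per second, i.e. pixels per second when image coordinates are used.

```python
import math
from typing import Optional, Tuple


class SpeedEstimator:
    """Finite-difference speed estimate from timestamped positions."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[float, float, float]] = None  # (t, x, y)

    def update(self, t: float, x: float, y: float) -> Optional[float]:
        """Add a sample and return the speed, or None if it is undefined."""
        previous, self._last = self._last, (t, x, y)
        if previous is None:
            return None  # only one position registered so far
        dt = t - previous[0]
        if dt <= 0.0:
            return None  # guard against repeated or out-of-order timestamps
        dp = math.hypot(x - previous[1], y - previous[2])  # Eq. (4.2)
        return dp / dt  # Eq. (4.1)
```

The first call returns no estimate, which mirrors the observation that a single registered position is not enough to compute a velocity.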

When dealing with image processing and tracking, the orientation and placement
of the camera play a crucial role in how to interpret and manipulate the captured
data. If a camera is not positioned in a bird’s-eye view, that is, directly overhead,
the resulting images can exhibit perspective distortion. This distortion will then
skew the perceived dimensions and positions of objects within the frame, making the
calculation of kinematic properties like velocity or distance impractical.

One common approach to this problem is to apply a coordinate transformation [21].
This process involves adjusting the image coordinates to reflect the true layout of the
frame. It corrects for the perspective-induced distortions, aligning the image closer
to what would be seen from a top-down view. The transformation is essential for
precise tracking and measurement, as it ensures that the calculations are based on
the actual arrangement of objects, rather than their distorted image representations.
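
For a planar surface such as a floor, this coordinate transformation is commonly expressed as a planar homography. The relation below is a sketch of that standard formulation; the symbols $H$, $(u, v)$ and $(x, y)$ are introduced here for illustration and are not taken from [21].

```latex
\begin{equation*}
\begin{bmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{bmatrix}
= H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\qquad
x = \frac{\tilde{x}}{\tilde{w}}, \quad
y = \frac{\tilde{y}}{\tilde{w}},
\qquad H \in \mathbb{R}^{3 \times 3}
\end{equation*}
```

Here $(u, v)$ is a pixel coordinate in the captured image and $(x, y)$ is the corresponding coordinate in the top-down plane. Since $H$ is only defined up to scale, it has eight degrees of freedom and can be estimated from four or more point correspondences between the image and the plane.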

This thesis is however fortunate to avoid these complexities. The camera setup is
strategically placed directly above the surface that is being tracked. This positioning
provides a bird’s-eye perspective, thus naturally eliminating significant distortion
that would otherwise be present. As a result, the images that are captured are
already in a desirable format for analysis, decreasing the complexity of the perception
system. No camera calibration has been performed in this project, although
calibration could improve accuracy since distortion due to the camera lens is still present.

4.4 Model Performance and Training

The YOLOv8 architecture provides a range of different-sized models. Some of these
models are presented in Table 4.1. When comparing each model at a set pixel
size, the mean average precision (mAP) and the latency can be evaluated. The
mAP measures the average precision of an object detection model over a range of
intersection over union (IoU) thresholds, in this case between 50% – 95% [22].
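
Written out, and assuming the COCO-style convention of ten IoU thresholds in steps of 0.05 (the step size is an assumption, as only the range is stated above), the two quantities are:

```latex
\begin{align*}
\mathrm{IoU}(B_p, B_g) &= \frac{|B_p \cap B_g|}{|B_p \cup B_g|}, \\
\mathrm{mAP}_{50\text{-}95} &= \frac{1}{10}
  \sum_{\tau \in \{0.50,\, 0.55,\, \dots,\, 0.95\}} \mathrm{mAP}_{\tau},
\end{align*}
```

where $B_p$ is a predicted bounding box, $B_g$ the matching ground-truth box, and $\mathrm{mAP}_{\tau}$ the average precision, averaged over all classes, when a prediction counts as correct only if its IoU is at least $\tau$.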

Table 4.1: Performance metrics for different YOLOv8 models.

Model | Pixel Size | mAP^val 50-95 | Speed CPU ONNX [ms]
YOLOv8n | 640 | 37.3 | 80.4
YOLOv8s | 640 | 44.9 | 128.4
YOLOv8m | 640 | 50.2 | 234.7
YOLOv8l | 640 | 52.9 | 375.2

Within object detection with YOLOv8, Table 4.1 displays a trade-off between the
model’s size and its performance characteristics on a CPU. A larger model typically
yields increased accuracy and precision but comes with a cost of decreased process-
ing speed and a higher demand on computational resources. To choose a suitable
model for the application one must achieve a balance between accuracy and speed,
taking into consideration the computational capacity available for the task.

Based on the data presented in Table 4.1 and considering the hardware specifications
outlined in Chapter 3.2, the YOLOv8n “nano” model was selected for the project.
The choice was made to keep inference time as low as possible, accepting a certain
trade-off in accuracy to ensure real-time performance.

Utilizing a CNN for the detection and tracking of an object is a crucial part of the
overall system as the perception lays the foundation for the other systems to make
well-informed decisions. Thus, the state estimation system must be robust, ensuring
that the tracking of the mobile unit is never lost. This was not the case when using
the pre-trained model provided with YOLOv8, resulting in the detection of multiple
undesired objects, tracking loss, and false negatives as shown in Figure 4.5. There-
fore the model had to be trained on a custom dataset suitable for the application.

Figure 4.5: Undesired detections and false negatives with a YOLO model
pre-trained on the COCO dataset.

To achieve desirable results, the model must be modified and trained on data such
that the mobile unit can be recognized and tracked at all positions within the demo
area. For the chosen YOLOv8-n model the following steps were taken to obtain a
robust model that could complete the tasks specified in Chapter 4.3.

4.4.1 Data Acquisition

The performance and efficiency of a YOLO model are highly dependent on the data
that it is trained on. Therefore, the pre-trained YOLOv8-n model was extended
and trained with a dataset including images of the mobile unit in various contexts
within the demo area. To save time and ensure a varied dataset, multiple videos
of the mobile unit were taken, covering different production cycles in various con-
ditions and placements around the demo area. The objective was to form a broad
dataset to enhance the model’s ability to generalize effectively and mitigate the risk
of overfitting. The gathered videos were then converted into images, with a selected
number of captured frames from each video to compile a large dataset. The final
dataset consisted of 1450 images, where a sample of the labelled dataset is shown in
Figure 4.6.

Figure 4.6: Sample of the labelled dataset used for the perception system for
testing on the Mercedes truck.

4.4.2 Image Annotation
Image annotation is the process of adding metadata to a set of images, i.e. annotat-
ing the desired objects within each frame with bounding boxes and labels. This is
done to guide the algorithm to learn from the provided data and emphasise certain
points.

The process of image annotation can be time and resource-consuming for a large
dataset. Therefore, to avoid manually annotating each image in the new dataset a
base model was employed to automate the process. A base model is a large founda-
tion model that can be applied for multiple purposes, trained on large datasets [23].
Within this project, the Grounded Segment Anything Model (SAM) is used, a model
that can segment out individual objects from an image [24]. The base model is
trained on over 11 million images and 1.1 billion masks, and when given prompts
of desired objects it can annotate a large dataset with bounding boxes and labels
quickly and without any other external inputs. A result of this is shown in Figure 4.6.

4.4.3 Model Training
Training a YOLOv8 computer vision model for real-time applications, particularly
for consistent detection and tracking of a specific object, is critical. When transfer
learning is used to train the model on a dataset tailored to the traits of a particular
object, the model becomes significantly better at detecting that object accurately. It learns
to identify unique features and variations of the object, effectively distinguishing it
from similar items or background interference. To achieve the distinction between
the target and other objects, a set of the early layers in the model are frozen, mean-
ing that they are not updated during the training process. Instead, only the deeper
layers are fine-tuned with the new data. This is an important aspect as it leverages
the generic features learned from the standard dataset and adapts more specific
features in the deeper layers. With this method the number of false positives and
negatives can be reduced, thereby improving the accuracy.
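
Layer freezing during fine-tuning can be sketched as follows, assuming the Ultralytics Python package is used for training. The dataset file name `mobile_unit.yaml`, the number of frozen layers and the number of epochs are hypothetical values chosen for illustration, not the settings used in this thesis.

```python
from typing import Any, Dict


def build_train_args(data_yaml: str, frozen_layers: int = 10) -> Dict[str, Any]:
    """Collect training arguments; all values are illustrative assumptions."""
    return {
        "data": data_yaml,        # dataset description file (hypothetical path)
        "epochs": 100,            # upper bound on the number of epochs
        "imgsz": 640,             # input size, matching Table 4.1
        "freeze": frozen_layers,  # keep the first N layers fixed
    }


def fine_tune(data_yaml: str = "mobile_unit.yaml"):
    """Fine-tune pre-trained YOLOv8n weights with the early layers frozen."""
    from ultralytics import YOLO  # lazy import; requires the ultralytics package

    model = YOLO("yolov8n.pt")  # pre-trained nano weights
    return model.train(**build_train_args(data_yaml))
```

Keeping the arguments in a plain dictionary makes it simple to vary the number of frozen layers between training runs and compare the resulting validation metrics.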

Moreover, the effectiveness of the model in real-time scenarios depends on its abil-
ity to swiftly and reliably re-identify the object in successive frames, adapting to
movement and partial obscurations. This training is key to maintaining consistent
tracking, regardless of changing conditions. Additionally, by refining the model’s
focus on a specific object, it becomes operationally more efficient. This efficiency
translates to reduced computational complexity, making the model a better fit for
systems with limited processing capabilities, such as the system developed in this
project. This targeted training approach not only elevates the model’s performance
in its primary task but also enhances its applicability and reliability for identifying
and tracking the desired object [25].

The effectiveness of the detection model is significantly impacted by how well the
training data is balanced to avoid underfitting and overfitting, particularly when the
model is trained for a singular objective. Underfitting occurs when the trained model
is too simplistic, failing to capture the complexity and variability in the data. This
can lead to poor performance as the model cannot generalize well to new, unseen
scenarios. On the other hand, overfitting occurs when the model is excessively tai-
lored to the training data, capturing noise and anomalies as if they were significant
patterns. This may result in a model that performs well on training data but poorly
on new, real-world data, as it becomes too specialized [26]. To avoid this, a varied
and large dataset is used for the model training, allocating 80% of the dataset to
training and 20% for validation. To avoid overfitting, an “early stopping” algorithm
is implemented when training the model. This entails continuously monitoring the
validation metrics and stopping the training of the model if the metrics indicate a
performance plateau over a set amount of epochs, i.e. the model does not display
improved performance over time with more training.
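
The data split and the stopping rule can be illustrated with a small framework-independent Python sketch. It assumes a validation score where higher is better (such as mAP) and a hypothetical `patience` parameter for the number of epochs without improvement; it is not the training code used in this project.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and split into training and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def early_stopping_epoch(val_scores: Sequence[float], patience: int) -> int:
    """Return the epoch index at which training would stop.

    Training stops once `patience` consecutive epochs pass without a new
    best validation score; otherwise it runs through the last epoch.
    """
    best, epochs_since_best = float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, epochs_since_best = score, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return len(val_scores) - 1
```

With the 1450 images of the dataset, an 80/20 split gives 1160 training images and 290 validation images.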

4.4.4 Process Acceleration on CPU
With the trained model implemented in the perception system, the maximum allowed
processing time per frame was set to 100 ms to ensure that each iteration of the
full system could be completed within 200 ms. However, without acceleration on the
target hardware, a single computational iteration (pre-processing, inference, and
post-processing) took approximately 600 ms.

To mitigate the problem of insufficient inference rates in real-time applications, ma-
chine learning models are typically deployed on GPUs (Graphics Processing Units),
or TPUs (Tensor Processing Units), which are capable of conducting numerous
parallel operations. Alternatively, strategies such as model acceleration or sparsi-
fication, including pruning and quantization, could be employed to speed up the
inference rate [10].

As stated in Section 3.2, the target hardware unit within this project is an APC
containing only a CPU. Thus, common hardware acceleration techniques such as
the use of GPUs are not available. Instead, the focus is shifted towards model
acceleration and sparsification to reduce the complexity of the model and in return
increase the model throughput. To achieve this without compromising accuracy to a
significant extent, different tools can be applied. Within this project, a few methods
were investigated and are briefly highlighted below.

4.4.4.1 ONNX Runtime

ONNX Runtime [27] is a machine-learning inference engine that runs on a wide range
of platforms and hardware and aims to increase throughput. To achieve this,
the engine analyzes the model’s graph and determines how it can be optimized for
execution. Then the model is partitioned and the engine can thereafter dynamically
assign computational tasks, thus ensuring efficient execution of individual tasks and
a holistic optimization of the entire model.
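
A minimal sketch of how CPU inference latency could be measured with ONNX Runtime is given below. It assumes that the trained model has been exported to an ONNX file (the path is supplied by the caller), that the `onnxruntime` and `numpy` packages are installed, and that the model takes a single float32 input of shape (1, 3, 640, 640); the helper names are hypothetical.

```python
import time
from statistics import median
from typing import Callable, Tuple


def median_latency_ms(run_once: Callable[[], object],
                      warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of `run_once` in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def make_onnx_runner(model_path: str,
                     input_shape: Tuple[int, ...] = (1, 3, 640, 640)):
    """Return a callable that runs one inference on a dummy input."""
    import numpy as np          # lazy imports: only needed for real inference
    import onnxruntime as ort

    session = ort.InferenceSession(model_path,
                                   providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    dummy = np.zeros(input_shape, dtype=np.float32)  # assumed input layout
    return lambda: session.run(None, {input_name: dummy})
```

Calling `median_latency_ms(make_onnx_runner(path))` on the target hardware gives a figure for the inference step alone; pre-processing and post-processing would have to be timed in the same way to compare against the full per-frame budget.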

4.4.4.2 OpenVINO

For an Intel-based system, OpenVINO [28], short for Open Visual Inference & Neural
Network Optimization, can be applied to optimize and improve inference on the
target hardware. Developed by Intel, the tool compresses deep learning models and
supports deployment and hardware optimization for a large number of Intel CPUs,
taking advantage of the specific hardware capabilities of each supported device.

4.4.4.3 DeepSparse

DeepSparse [29] is an engine that utilizes sparsity to accelerate inference within
neural networks on CPUs. By utilizing structured and unstructured sparsity, the
weights that have no impact on the output are known in advance, and the corresponding
computations can thereby be skipped during runtime. To further optimize for CPU
architectures, the runtime computations are organized into “Tensor-columns”, allowing
effective cache utilization. This is done by reducing the amount of data transportation
in and out of the larger cache memories, which usually is a large bottleneck for
memory-bound systems [29]. The DeepSparse tool facilitates acceleration for both dense
models and models sparsified through quantization and pruning. In this project, both
model types are evaluated to investigate the performance enhancements of a reduction
in model complexity. The evaluation and selection of the acceleration method is
presented in Chapter 7.1.3.


5 Motion Planning

The primary objective of a motion planner is to determine how a mobile unit should
navigate through a specified environment. This includes deciding the desired path
that the mobile unit should take, as well as its associated states such as position,
velocity, and pose at each point in time. To achieve this, the motion planning
problem is divided into two parts, a global path planner and a trajectory planner.
The global path planner effectively links the mobile unit’s initial state to a set of
specified goal states and the trajectory planner then locally plans the desired states
along the path, taking the physical constraints into consideration, similar to [30].
Within this project, the environment is considered static and all objects are mapped
beforehand within the 2D space.

5.1 Path Planner

The path planner aims at finding a path between the start and goal states. By
assuming no dynamic obstacles except for the mobile unit itself affecting the envi-
ronment, a predetermined map can be used to determine the desired path. This
involves the creation of a grid map of the demonstration area, where the environ-
ment is discretized into a series of nodes. These nodes serve as stations in defining
and facilitating the navigation of the mobile unit’s path.

With a grid map defined, multiple approaches can be taken to find an optimal path.
Conventional path planning techniques, such as search-based algorithms like A* or
sampling-based algorithms like RRT, are commonly used [31,32]. However, within this
project, the focus is not on finding the optimal path in terms of distance, energy
minimization or time, but rather on ensuring that the path is aesthetically
pleasing, easy to modify and smooth. Therefore a graph-based approach is used,
where consecutive nodes are connected by the shortest Euclidean path, i.e. a straight
line, which is interpolated and used as the desired path for the mobile unit. This was
done to reduce computational complexity and rely on the motion controller to maintain
a smooth and collision-free path. A simplified example of such a grid map with a path
is shown in Figure 5.1 to visualize how the system internally interprets the environment.

Figure 5.1: Visual representation of the grid map with defined station nodes.

To define a desired velocity and behaviour when approaching the different stations,
a schedule is also provided to the path planner. The schedule specifies when
the mobile unit should arrive at each station, allowing the system to incorporate
more aspects of the desired behaviour into the final motion plan. An overview of
the path planner is presented in Figure 5.2.

[Flowchart, steps in order: Load graph & map (define the grid map and environment, marking obstacles and nodes) -> Set schedule (incorporate schedule constraints into pathfinding) -> Create environment and generate node-to-node path (combine the environmental setup and the application of the pathfinding algorithm) -> Linear interpolation between nodes (smooth the path by calculating intermediate points between nodes) -> Trajectory planner]

Figure 5.2: Flowchart of path generation in a grid map environment with a description for each process.

The purpose of this system is to generate a navigable path within a predefined envi-
ronment using the Cartesian coordinates x, y. The algorithm comprises processing
a schedule, generating a node-to-node path, and performing linear interpolation
between nodes to smooth out the path.

5.2 Trajectory Planner

The purpose of a trajectory planner is to enable a robot to navigate its desired path
in a way that respects its physical limitations [33]. This involves determining a set
of reference states along the desired path that the mobile unit should adhere to. A
trajectory planner is designed as a subsystem within the greater motion planning
system, subsequently providing the reference states over the control horizon at each
time step, k. The trajectory planner translates the path information into references
from which the control system can derive detailed motor instructions, taking into
account the mobile unit’s current state, the planned node-to-node path,
and the dynamic constraints of the environment.

With the information from the path planner, a set of references can be defined for
the mobile unit based on the given path. The states of the model include the x and
y coordinates and the heading angle θ. Additionally, a longitudinal reference veloc-
ity, v, is incorporated into the state vector, where the velocity reference is based on
the distance to the next station and the desired arrival time. To keep the
references physically achievable, the linear velocity and heading angle references
are saturated according to the physical constraints of the mobile unit. For the same
reason, the x and y coordinates are limited to the boundary of the predefined
demonstration area.
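
The velocity reference and the saturation described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the behaviour when the arrival time has already passed, and the parameter `v_max` are assumptions made for the example.

```python
def reference_velocity(distance_to_station, time_to_arrival, v_max):
    """Speed needed to arrive on schedule, saturated to [0, v_max]."""
    if time_to_arrival <= 0.0:
        # Assumption: when behind schedule, drive at the velocity limit
        # as long as there is still distance left to cover.
        return v_max if distance_to_station > 0.0 else 0.0
    v = distance_to_station / time_to_arrival
    return min(max(v, 0.0), v_max)


def clamp_to_area(x, y, x_min, x_max, y_min, y_max):
    """Limit a reference position to the bounding box of the area."""
    return (min(max(x, x_min), x_max), min(max(y, y_min), y_max))
```

For example, 2 m to the next station with 4 s left gives a reference of 0.5 m/s, which is passed through unchanged when the limit is 1 m/s.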

The reference generation’s main goal is to discretize a continuous reference path
into a series of states over the control horizon, N . The framework for generating a
linear reference trajectory uses two principal functions: global path sampling, and
segment-wise interpolation. The global path sampling function systematically in-
vokes the segment-wise interpolation to create a path of uniformly spaced points
from a given set of waypoints.

The global path sampling function operates on a set of n waypoints, W = {W1,
W2, . . . ,Wn}, that define the trajectory. The objective is to construct a sequence of
points P that captures the essence of the path with a desired resolution.
The step size, ∆s, is calculated as the product of the vehicle’s velocity v and the
control system’s sampling time interval ∆t. The function iteratively samples each
segment of the trajectory, where the distance ∆s determines the gap to the next
node on the generated path. Within each segment between consecutive waypoints,
Wi and Wi+1, segment-wise interpolation is executed. For a segment of length L
where L = ∥Wi+1 −Wi∥, a series of intermediate points are computed based on the
linear interpolation principle:

\[ P(\lambda) = W_i + \lambda (W_{i+1} - W_i), \tag{5.1} \]

where λ is a parameter that increments in steps sized to maintain the spacing ∆s,
terminating once the segment is fully sampled.

The segment’s interpolated points are:

\[
P_k = W_i + \left( \frac{k \cdot \Delta s + R}{L} \right) (W_{i+1} - W_i) \tag{5.2}
\]

for k = 1, 2, . . . such that k · ∆s ≤ L, and R is the remainder from the previous
segment’s interpolation, ensuring that the spacing between points remains consis-
tent across the segment boundaries. Each segment-wise interpolation yields a set of
points and a new remainder, which is carried forward to the subsequent segment,
preserving the geometric resolution.
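
The global path sampling with a carried remainder can be sketched as below. This is a hedged illustration rather than the project code: the function name `sample_path` is hypothetical, and the carried quantity is defined here as the distance from the start of a segment to its first sample, which is one possible convention for the remainder R.

```python
import math


def sample_path(waypoints, ds):
    """Return points spaced ds apart, measured along the polyline."""
    if ds <= 0.0:
        raise ValueError("ds must be positive")
    points = [tuple(waypoints[0])]
    carry = ds  # distance from the segment start to its first sample
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        d = carry
        while d <= seg_len + 1e-12:  # small tolerance for rounding
            lam = d / seg_len  # interpolation parameter of (5.1)
            points.append((x0 + lam * (x1 - x0), y0 + lam * (y1 - y0)))
            d += ds
        carry = d - seg_len  # overshoot is carried into the next segment
    return points
```

With waypoints (0, 0), (3, 0), (3, 4) and a step of 2, the first segment yields a sample at (2, 0) and carries 1 into the second segment, so the spacing along the path stays constant across the corner.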

The linear reference generation transforms a continuous trajectory into a series of
discrete, equally spaced waypoints, each with an associated heading angle. The waypoints, together
with the heading angles, serve as a reference for the control system to dictate the
movement of the mobile unit along the predefined path. The algorithm uses linear
interpolation to ensure a predictable outcome. Given two known points, the inter-
polated points will always lie directly between them in a straight line, allowing for
a path that is both smooth and efficient.

In this project, several nodes are established to manage the operational location
of the mobile unit. Upon arriving at a designated node, the mobile unit stops its
movement for a predetermined duration to execute specific tasks at that station. As
an example of a station task, the unit could pause at a station equipped with QR
code identification technology until it is recognized, after which it will proceed to
the next designated node. The system used at OrangePoint has four such stations
which are predefined by entering the x and y coordinates of each station, along with
a priority level that determines the sequence in which the stations are visited. After
completing the sequence, the unit either resets and begins the cycle from the begin-
ning or terminates its operation at the final station, depending on the user input in
the configuration file.
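
The station handling described above can be sketched as follows. The `Station` type, the field names and the convention that a lower priority value is visited first are assumptions for illustration; the actual configuration file format is not shown in this chapter.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass(frozen=True)
class Station:
    name: str
    x: float
    y: float
    priority: int  # assumed convention: lower value is visited first


def station_sequence(stations, loop):
    """Iterate over stations in priority order, repeating if loop is set."""
    ordered = sorted(stations, key=lambda s: s.priority)
    return cycle(ordered) if loop else iter(ordered)


# Hypothetical stations; the coordinates are made up for the example.
stations = [
    Station("C", 2.0, 1.0, 3),
    Station("A", 0.0, 0.0, 1),
    Station("B", 1.0, 2.0, 2),
]
first_five = [s.name for s in islice(station_sequence(stations, loop=True), 5)]
```

With `loop=True` the sequence restarts from the first station after the last one, while `loop=False` ends at the final station.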

The trajectory planner is horizon-based, which allows for planning over a predefined
number of steps or time intervals into the future on the given path. This will be
referred to as the reference horizon. This approach enables the trajectory planner
to respond to future conditions and objectives, allowing for adjustments to the set
of references as new information becomes available [34]. The length of the reference
horizon is directly dependent on the control horizon in the motion control system,
balancing the benefits of foresight against the need for timely decision-making.
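
A reference horizon can be cut out of the sampled path as sketched below. This is an assumption-laden illustration: the function names are hypothetical, the heading is taken from the direction to the next sampled point, and the final point is repeated when the horizon extends past the end of the path.

```python
import math


def heading(points, i):
    """Heading of the path at index i, from the direction to the next point."""
    last = len(points) - 1
    if last == 0:
        return 0.0  # single point: no direction defined, assume zero
    j = min(i, last - 1)  # at the final point, reuse the last segment
    return math.atan2(points[j + 1][1] - points[j][1],
                      points[j + 1][0] - points[j][0])


def reference_horizon(points, start_index, horizon):
    """Return `horizon` references (x, y, theta) starting at start_index."""
    last = len(points) - 1
    refs = []
    for k in range(horizon):
        i = min(start_index + k, last)  # hold the final point at the end
        refs.append((points[i][0], points[i][1], heading(points, i)))
    return refs
```

Calling the function at every time step with an updated `start_index` gives the receding behaviour described above.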


6
Motion Control

To track the desired trajectory, a motion control system is developed. The system
can be further divided into two main segments, a high-level controller and a low-level
controller. The high-level controller is tasked with the core computations, aiming
to minimize the deviation between the reference and the estimated position. The
discrepancies from the intended path are then converted into actuation requests,
such as steering angle and longitudinal velocity to reduce the deviation over time.
The low-level controller then acts as an allocator, converting the requested actions
into motor commands and facilitating communication between the APC and the
remote mobile unit. A top-level overview of the motion control system, including
the flow and interaction of signals is presented in Figure 6.1.

Figure 6.1: Simplified motion control overview.

The motion control architecture of Figure 6.1 shows multiple interconnected sub-
systems, together resulting in the actuation of the mobile unit. Subsequent sections
will provide a more detailed description of each subsystem within this figure.

6.1 Motion Models
A motion model is derived to capture the dynamic behaviour of the mobile unit used
in this project and how current actions affect the system’s future states. The mo-
tion model is crucial for system validation because it is integrated into the simulator.
Additionally, it can be used in more complex controllers, enabling the prediction of
future states based on current states and actions.

Based on the design of the mobile units detailed in Chapter 3.2, two motion models
are derived: one with a rigid body and another with an additional degree of
freedom between the head and the trailer, known as an articulated body. Each
model is a simplification of reality and some approximations have been made. For
example, the Ackermann steering is considered parallel and the power distribution
between the front and rear axles is neglected.

6.1.1 Rigid Motion Model
Within this project, a simplified car model [35] is used to derive the dynamics
of the rigid mobile unit. Due to the wheel alignment and steering configuration,
certain constraints are imposed on the car, limiting the rotation around its z-axis
proportional to its wheelbase. Figure 6.2 shows the rigid motion model with its
position in two-dimensional space and heading orientation.

Figure 6.2: Simplified model of the rigid mobile unit.

From Figure 6.2, the discrete state vector can be represented by xk = [xk, yk, vk, θk]T .
The control inputs coupled to the states are denoted by uk = [ak, δk], where ak is
the longitudinal acceleration and δk is the steering angle deviation from its zero
position. To further model the mobile unit’s dynamics, a non-linear motion model
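in continuous time is derived in (6.1) with the mentioned states and control actions.

\[
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{v} \\ \dot{\theta} \end{bmatrix}
=
\begin{bmatrix}
v \cos(\theta) \\
v \sin(\theta) \\
a \\
\frac{v}{L} \tan(\delta)
\end{bmatrix} \tag{6.1}
\]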

To find numerical solutions to the differential equations of (6.1) in discrete time, the
system is discretized using forward Euler discretization:

\[ \hat{x}_{k+1} = \hat{x}_k + f(\hat{x}_k, u_k) \Delta t. \tag{6.2} \]

The non-linear motion model is thereafter implemented into the simulator.
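
A simulator step based on (6.1) and (6.2) can be sketched as below. The function names and the tuple layout are assumptions for illustration, and the numerical values in the usage note are made up; the simulator itself is not reproduced here.

```python
import math


def rigid_model(state, control, wheelbase):
    """Continuous-time derivatives of (6.1); state = (x, y, v, theta)."""
    _, _, v, theta = state
    a, delta = control
    return (
        v * math.cos(theta),
        v * math.sin(theta),
        a,
        v / wheelbase * math.tan(delta),
    )


def euler_step(state, control, wheelbase, dt):
    """Forward Euler update of (6.2)."""
    derivative = rigid_model(state, control, wheelbase)
    return tuple(s + d * dt for s, d in zip(state, derivative))
```

Driving straight ahead at 1 m/s with a sampling time of 0.1 s moves the unit 0.1 m along x and leaves the other states unchanged.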

To further simplify the model for linear control system applications, the motion
model is linearized. The physical limitations of the mobile unit restrict how much
the states can change within a small time frame, so the system stays close to its
current operating point between samples. Thus, a first-order Taylor expansion
around a nominal state (x̄, ū) is used, which keeps the deviation between the linear
and non-linear model small near that operating point. The final linearized motion
model becomes:

\[ \hat{x}_{k+1} = A x_k + B u_k + C. \tag{6.3} \]

When using the linear motion model in control algorithms, the operating point is
continuously updated to ensure the accuracy of the motion model. Below are the
A and B matrices, together with the correction matrix C presented at an arbitrary
operating point (v̄k, θ̄k, δ̄k).

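\[
A = (I + A' \Delta t) =
\begin{bmatrix}
1 & 0 & \cos(\bar{\theta}_k) \Delta t & -\bar{v}_k \cdot \sin(\bar{\theta}_k) \Delta t \\
0 & 1 & \sin(\bar{\theta}_k) \Delta t & \bar{v}_k \cdot \cos(\bar{\theta}_k) \Delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & \frac{\tan(\bar{\delta}_k)}{L} \Delta t & 1
\end{bmatrix} \tag{6.4}
\]

\[
B = (B' \Delta t) =
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
\Delta t & 0 \\
0 & \frac{\bar{v}_k}{L \cdot \cos^2(\bar{\delta}_k)} \Delta t
\end{bmatrix} \tag{6.5}
\]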

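\[
C =
\begin{bmatrix}
\bar{v}_k \cdot \sin(\bar{\theta}_k) \bar{\theta}_k \Delta t \\
-\bar{v}_k \cdot \cos(\bar{\theta}_k) \bar{\theta}_k \Delta t \\
0 \\
-\frac{\bar{v}_k \cdot \bar{\delta}_k}{L \cdot \cos^2(\bar{\delta}_k)} \Delta t
\end{bmatrix} \tag{6.6}
\]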

The C-matrix of (6.6) is a correction term that accounts for the difference between
the non-linear dynamics and their linear approximation at the operating point. It
is the constant offset that remains when the first-order Taylor expansion is taken
around a non-zero nominal state, scaled by the sampling time:

\[ C = \left( f(\bar{x}, \bar{u}) - A' \bar{x} - B' \bar{u} \right) \Delta t. \tag{6.7} \]
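
The linearized model can be checked numerically with the sketch below, which builds A, B and C at an operating point and applies one step of (6.3). It is an illustration under assumptions: the function names are hypothetical, plain nested lists are used instead of a matrix library, and the offset C is computed as the Taylor-expansion remainder f(x̄, ū) − A′x̄ − B′ū multiplied by ∆t, which gives a zero third entry and a negative last entry.

```python
import math


def linearize(v_bar, theta_bar, delta_bar, wheelbase, dt):
    """Return (A, B, C) of x[k+1] = A x[k] + B u[k] + C as nested lists."""
    c, s = math.cos(theta_bar), math.sin(theta_bar)
    sec2 = 1.0 / math.cos(delta_bar) ** 2
    A = [
        [1.0, 0.0, c * dt, -v_bar * s * dt],
        [0.0, 1.0, s * dt, v_bar * c * dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, math.tan(delta_bar) / wheelbase * dt, 1.0],
    ]
    B = [
        [0.0, 0.0],
        [0.0, 0.0],
        [dt, 0.0],
        [0.0, v_bar * sec2 / wheelbase * dt],
    ]
    C = [
        v_bar * s * theta_bar * dt,
        -v_bar * c * theta_bar * dt,
        0.0,
        -v_bar * delta_bar * sec2 / wheelbase * dt,
    ]
    return A, B, C


def linear_step(A, B, C, x, u):
    """One step of the affine model for state x and input u."""
    return [
        sum(a * xi for a, xi in zip(row_a, x))
        + sum(b * ui for b, ui in zip(row_b, u))
        + ci
        for row_a, row_b, ci in zip(A, B, C)
    ]
```

When the state and input equal the operating point, the linear step reproduces the forward Euler step of the non-linear model, which is a quick sanity check on the signs of the offset terms.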

6.1.2 Articulated Motion Model
To account for the additional degree of freedom between the head and trailer of the
mobile unit shown in Figure 3.3b, the motion model (6.3) is augmented with an
additional state ψ as shown in Figure 6.3.

Figure 6.3: Simplified model of the articulated mobile unit.

Figure 6.3 shows the articulated mobile unit, highlighting the variables affecting the
relative joint angle. Based on the model given in Figure 6.3 and [36], the relative
joint angle, ψ, can be derived as the deviation between the heading of the trailer
and the heading of the mobile unit. The rate of change of the relative joint angle is:

\[
\dot{\psi} = \dot{\theta}_t - \dot{\theta} = \frac{v}{L_t} \cdot \sin(\theta - \theta_t) - \frac{v}{L} \cdot \tan(\delta). \tag{6.8}
\]

As there is no wheelbase affecting the steering capabilities for the head of the mobile
unit and assuming small joint angle deviations, (6.8) can be approximated as:

\[
\dot{\psi} \approx \frac{v}{L_t} \sin(\psi - \delta). \tag{6.9}
\]

The rate of change in the relative joint angle can then be added to the current angle
and implemented in the augmented non-linear motion model as an additional state:

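\[
\begin{bmatrix} x_{k+1} \\ y_{k+1} \\ v_{k+1} \\ \theta_{k+1} \\ \psi_{k+1} \end{bmatrix}
=
\begin{bmatrix}
x_k + \Delta t \cdot v_k \cos(\theta_k) \\
y_k + \Delta t \cdot v_k \sin(\theta_k) \\
v_k + \Delta t \cdot a_k \\
\theta_k + \Delta t \cdot \frac{v_k}{L} \tan(\delta_k) \\
\psi_k + \Delta t \cdot \frac{v_k}{L_t} \sin(\psi_k - \delta_k)
\end{bmatrix} \tag{6.10}
\]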

The motion model is linearized as for the rigid motion model above.
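
The augmented update of (6.10) can be written as a single step function, as sketched below. The function name, the argument names and the numerical values in the usage note are assumptions for illustration.

```python
import math


def articulated_step(state, control, wheelbase, trailer_length, dt):
    """One discrete step of (6.10); state = (x, y, v, theta, psi)."""
    x, y, v, theta, psi = state
    a, delta = control
    return (
        x + dt * v * math.cos(theta),
        y + dt * v * math.sin(theta),
        v + dt * a,
        theta + dt * v / wheelbase * math.tan(delta),
        psi + dt * v / trailer_length * math.sin(psi - delta),
    )
```

With zero steering angle and zero joint angle the joint angle stays at zero, so the articulated model reduces to the rigid one when driving straight.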

6.2 High-level Controller
The primary objective of the high-level controller is to minimize the trajectory de-
viation from the setpoints given by the motion planner. Achieving this objective
involves a recurrent process of identifying a sequence of viable control signals that
ensure that the mobile unit adheres to the intended trajectory. The complexity of
this type of system can vary considerably, where the constraints and cost minimiza-
tion can be handled as two separate entities or incorporated into a controller that
can handle both. This section presents the implementation of two different con-
trol strategies, a classical PID controller, and a model-predictive controller (MPC)
with the intent of evaluating what different levels of complexity yield in terms of
performance.

6.2.1 PID Control
To ensure a fully working system and to set a baseline for trajectory tracking, a sim-
ple Proportional-Integral-Derivative (PID) controller was implemented. To do so,
the control objective was defined as a single input single output (SISO) system with
the sole objective of tracking the current desired position and correcting the steering
angle to minimize the deviation. The discrete expression for the PID controller is:

\[
u_\delta[k] = K_p e[k] + K_i \Delta t \sum_{i=0}^{k} e[i] + K_d \frac{e[k] - e[k-1]}{\Delta t} \tag{6.11}
\]

Here Kp, Ki and Kd are weights that were manually tuned to improve the perfor-
mance during testing.

The control action, denoted as uδ, for each iteration, k, is the weighted sum of
three terms: the current error e[k], the error accumulated over all samples so far,
and the rate of change between the current error e[k] and the previous error
e[k−1]. Together, these terms yield a control action that changes the
steering angle to mitigate the observed error. The error term, e[k], represents the
current deviation between the mobile unit’s estimated centre point and the reference
position. The calculated error at each sample is given by:

\[ e[k] = \| x_{ref,k} - \hat{x}_k \|_2, \tag{6.12} \]

where x̂k denotes the mobile unit’s estimated x and y coordinates and xref,k the
corresponding reference point. The error calculation serves as the foundation for
determining the steering angle in the next iteration. To ensure that the resulting
control action is within the mobile unit’s feasible operational bounds, the control
action is saturated, and an anti-windup solution is incorporated.
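
A discrete PID controller following (6.11) with output saturation can be sketched as below. The anti-windup scheme is not specified in the text, so conditional integration (freezing the integral while the output is saturated) is used here as an assumption. The class name and parameters are hypothetical, and the error is assumed to be a signed scalar supplied by the caller.

```python
class PID:
    """Discrete PID with output saturation and conditional integration."""

    def __init__(self, kp, ki, kd, dt, u_min, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0  # running value of dt * sum(e[i])
        self.prev_error = None

    def update(self, error):
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        candidate = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate + self.kd * derivative
        u_sat = min(max(u, self.u_min), self.u_max)
        if u == u_sat:
            # Assumed anti-windup: only keep the new integral value
            # when the output is not saturated.
            self.integral = candidate
        self.prev_error = error
        return u_sat
```

With a persistently large error the output stays at the limit and the integral term does not grow, so the controller recovers without delay once the error shrinks.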

Since the mobile unit operates at low speeds and efficiently reaches the desired
longitudinal velocity within a reasonable time on all surfaces in the demonstration
area, there is no necessity for a dedicated controller for the longitudinal velocity
input. The velocity command is instead directly based on the reference from the
trajectory planner.

6.2.2 Model Predictive Control
To further improve the motion control system, more advanced controllers were in-
vestigated, allowing the mobile unit to handle complex tasks and solve difficult
manoeuvres in tight environments. This could be achieved in multiple ways but a
method with forward-looking capabilities that can incorporate constraints and lim-
itations into the problem formulation was desired. A full-state feedback design and
the ability to model constraints could be more effective in navigating tight environ-
ments than previous controllers as the end goal is to achieve a smoothly controlled
mobile unit [37]. For these reasons, an MPC approach was chosen.

MPC is a control strategy that explicitly accounts for future events to make current
decisions. Unlike PID controllers, which react to present errors, MPC formulates an
optimization problem that predicts future system behaviours over a given prediction
horizon, solving for the optimal control inputs at each step [38]. This forward-looking
capability allows this type of controller to manage constraints and multiple input,
multiple output (MIMO) systems more effectively, which was desired in this project.

An MPC formulation can contain both linear and non-linear dynamics. Non-linear
problems, while potentially more precise, are also more computationally demand-
ing [39]. In contrast, linear problems, though approximations, can be kept convex,
requiring less computational power to solve. Based on the available computational
resources for this project and the fact that higher precision is deemed unnecessary
for this application, a linear quadratic control problem is formulated with the linear
motion models presented in Chapter 6.1. The motion model is chosen depending on
what mobile unit is used. The final problem formulation and the objective function
are further described below.

6.2.2.1 Cost and Constraints

To ensure that the system maintains the desired reference trajectories with smooth
behaviour, an objective function is formulated that incorporates a set of costs de-
signed to penalize undesired behaviours. Each cost component is treated as a soft
constraint. This approach requires fewer computational resources than methods with
hard constraints and inequalities, even if a harder constrained problem yields a
smaller feasible set [40]. However, it presents a trade-off between accuracy and
computational complexity. The problem formulation in this project aims to min-
imize the overall cost, requiring more tuning to achieve the desired results while
a harder-constrained problem is less dependent on the tuning but instead is more
computationally expensive. The formulation of each cost and constraint is further
described below, inspired by [41,42].

Similar to the PID controller, a state deviation cost is derived, in this case providing
full state feedback, where the deviation between the reference states xref =
[xref , yref , vref , θref ]T and the current state vector x̂ is found at each step k. Each
state deviation is penalized with a cost matrix Q and computed over the entire
prediction horizon, N. Additionally, a terminal cost, weighted by Qt, is added to the
final state deviation at the horizon N, together accumulating the total cost for
reference path deviation over the entire horizon:

\[
J_x = \| x_{ref,k} - \hat{x}_k \|_Q^2, \quad k \in \mathbb{N}_{[0,N]} \tag{6.13}
\]

\[
J_\tau = \| x_{ref,N} - \hat{x}_N \|_{Q_t}^2 \tag{6.14}
\]

Additionally, to maintain the desired velocity given by the motion planner and
minimize steering effort, an actuation cost is implemented (6.15), weighed with a
matrix denoted R. To avoid oscillations and fast action changes, acceleration and
steering jerk are also penalized as an additional cost (6.16).

\[
J_u = \| u_k \|_R^2, \quad k \in \mathbb{N}_{[0,N-1]} \tag{6.15}
\]

\[
J_{u'} = \| u_{k+1} - u_k \|_{R_d}^2, \quad k \in \mathbb{N}_{[0,N-1]} \tag{6.16}
\]
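
The cost terms (6.13) to (6.16) can be evaluated for a candidate trajectory as sketched below. This only computes the objective value and is not the solver formulation used in the project. Diagonal weight matrices given by their diagonals are an assumption made to keep the example free of dependencies, and the input-rate term is summed only where both u[k] and u[k+1] exist.

```python
def weighted_sq_norm(vec, weights):
    """Squared norm of vec weighted by a diagonal matrix (its diagonal)."""
    return sum(w * v * v for v, w in zip(vec, weights))


def diff(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def mpc_cost(x_ref, x_pred, u, q, q_t, r, r_d):
    """Total cost for N inputs u and N+1 reference and predicted states."""
    n = len(u)
    cost = 0.0
    for k in range(n):
        cost += weighted_sq_norm(diff(x_ref[k], x_pred[k]), q)  # J_x
        cost += weighted_sq_norm(u[k], r)  # J_u
        if k + 1 < n:
            cost += weighted_sq_norm(diff(u[k + 1], u[k]), r_d)  # J_u'
    cost += weighted_sq_norm(diff(x_ref[n], x_pred[n]), q_t)  # J_tau
    return cost
```

Increasing an entry of `q` relative to `r` makes deviations in the corresponding state more expensive than actuation effort, which is the tuning trade-off referred to above.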

To define the feasible region of the solution and determine the bounds, a set of
inequality constraints is defined. These constraints limit the feasible region, helping
the solver to find an optimum within these bounds. Specifically, the control inputs
uk are constrained by minimum and maximum allowable values:

\[ u_{min} \le u_k \le u_{max}, \tag{6.17} \]

ensuring that the vehicle’s actuators operate within safe and efficient limits.

To constrain the feasible solution to be within the bounding box of the demonstration
area, four additional inequalities were added as an upper and lower bound, denoted
xb and yb:

\[
\begin{aligned}
x_{b,min} &\le x_k \le x_{b,max} \\
y_{b,min} &\le y_k \le y_{b,max}.
\end{aligned} \tag{6.18}
\]

These constraints keep the positions predicted over the controller’s horizon inside
the area, thus avoiding a series of actions that could lead to collisions with the walls.

In addition to the inequalities, an equality constraint is defined to ensure that the
solution strictly adheres to the modelled system behaviour, ensuring feasible physical
solutions:

\[ x_{k+1} = f(x_k, u_k). \tag{6.19} \]

Together, these constraints ensure that the solution derived from minimizing the
cost of the objective function adheres to the physical and operational limits of the
mobile units. The main intent is to estimate and achieve feasible actions that
the mobile unit can perform.

6.2.2.2 Problem Formulation

With all terms in the objective function being quadratic and the constraints being
linear, the final optimization problem, given in (6.20) and (6.21), is formulated as a
quadratic minimization problem over the horizon N. Provided that the weight
matrices are positive semi-definite, such a problem is convex, meaning that any local
minimum is also a global minimum, so the solver returns an optimal solution at each iteration.

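\[
\min_{u} \; \sum_{k=0}^{N-1} \left[ J_x + J_u + J_{u'} \right] + J_\tau \tag{6.20}
\]

\[
\begin{aligned}
\text{s.t.} \quad & \forall k \in \mathbb{N}_{[0,N-1]} \\
& u_{min} \le u_k \le u_{max} \\
& x_{b,min} \le x_k \le x_{b,max} \\
& y_{b,min} \le y_k \le y_{b,max} \\
& x_{k+1} = f(x_k, u_k)
\end{aligned} \tag{6.21}
\]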

The quadratic problem formulation above entails a predictable and low-cost solution
in terms of computational complexity, as iterating through several local minima
can be avoided if the problem is convex. With all terms in the objective function
being quadratic and using only linear constraints, convexity can be guaranteed by
ensuring that the weight matrices Q, Qt, R and Rd are positive semi-definite.

There is a wide range of available solvers that can efficiently solve convex problems.
In this project, the interior-point solver ECOS is used through the CVXPY
library [43]. The interior point method transforms the original problem into a
sequence of approximate problems, which become progressively closer to the original
problem. Rather than handling the constraints directly, this method uses barrier
functions that make the cost of approaching the boundary of the feasible region
tend towards infinity [44]. This keeps the solution within the feasible
region without handling constraints in a way that would increase computational
complexity.
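
As a generic textbook illustration of the barrier idea (not the specific formulation used inside ECOS), a problem of minimizing f_0(z) subject to g_i(z) ≤ 0 for i = 1, ..., m can be replaced by a sequence of unconstrained problems with a logarithmic barrier. The symbols z, g_i, m and the barrier parameter t are introduced here only for the illustration:

\[
\min_{z} \; f_0(z) - \frac{1}{t} \sum_{i=1}^{m} \log\bigl(-g_i(z)\bigr)
\]

The logarithm grows without bound as any g_i(z) approaches zero from below, which keeps the iterates strictly inside the feasible region, and as t is increased the minimizer of the barrier problem approaches the solution of the original constrained problem.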

6.3 Low-level Control
As previously stated, the actuation of the mobile unit is facilitated by the imple-
mentation of a low-level controller. The subsystem serves an intermediate role by
processing the high-level motion requests into specific, executable motor commands
on the mobile unit. The available motor commands may vary based on the motor
configuration of the mobile unit. In this thesis, the mobile unit is controlled through
steering and velocity commands.

To maintain centralized control of the entire system, and because the internal
processing unit of the mobile unit is not directly accessible or adequate, the low-level
controller is located on the APC. From this setup, motor commands are transmit-
ted to the mobile unit via Bluetooth. This arrangement ensures reliable delivery
of actuation commands to the mobile unit once a Bluetooth connection is success-
fully established, eliminating the need for physical access to its internal components.
Each hub on a Lego Truck possesses a unique Bluetooth ID, so a connection to the
intended mobile unit can be established reliably. This centralized approach
also opens up the possibility of controlling multiple mobile units from a single APC
in the future.

The commands are discretized with zero-order hold (ZOH), so the mobile unit holds
the previous actuation command between samples. The physical limitations within
the mobile unit also result in additional latency between the requested and the fully
completed actuation. Without countermeasures, commands would accumulate in a
buffer, leading to increasing latency over time and to race conditions. The low-level
controller therefore does not send a command until the previous request has been
completed. To ensure this, a mutex lock is implemented to verify that the mobile
unit has processed and completed the previously requested command before allowing
another request to be sent. The buffer is handled at the same time by sending only
the most recent actuation request.
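
The locking and latest-request behaviour can be sketched as below. This is a simplified illustration under assumptions: the class and method names are hypothetical, the `send` callable stands in for the Bluetooth transmission, and completion is signalled by an explicit call rather than by the actual hub protocol.

```python
import threading


class CommandMailbox:
    """Send one command at a time and keep only the newest pending one."""

    def __init__(self, send):
        self._send = send  # placeholder for the Bluetooth transmission
        self._lock = threading.Lock()
        self._pending = None
        self._busy = False

    def submit(self, command):
        """Send immediately if idle, otherwise replace the pending command."""
        with self._lock:
            if self._busy:
                self._pending = command  # older pending request is dropped
                return False
            self._busy = True
        self._send(command)
        return True

    def on_completed(self):
        """Call when the unit reports that the last command is finished."""
        with self._lock:
            command, self._pending = self._pending, None
            if command is None:
                self._busy = False
                return None
        self._send(command)
        return command
```

Because only one pending command is stored, requests that arrive while the unit is busy overwrite each other, so the latency cannot accumulate over time.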


7
Results

The performance of the system and its subsystems was evaluated by a series of tests,
both in simulation and on the physical hardware. This chapter presents the results,
covering both a system overview and the test of each subsystem to ensure that it
meets the requirements presented in Table 3.1. All tests were performed on an Intel
i5-8350U CPU with 4 cores and a clock speed of 1.7 GHz, together with 16 GB of
RAM. The specifications are similar to the upgraded version of the APC presented
in Chapter 3.2.

7.1 Perception
In this project, the perception system is designed to process and interpret data
from a single sensor input, the camera, for use by the rest of the system. Thereby,
the precision and latency of the system largely depend on the performance of the
perception system. The perception subsystem is evaluated first separately and later
together with the entire system. The main applications of the subsystem are further
evaluated and presented below.

7.1.1 Dataset and Model Training Evaluation
To ensure consistent detection and tracking of the mobile unit, a dataset from the
demonstration area was used and the training results are shown in Figure 7.1.

Figure 7.1: Performance metrics of model training over 25 training epochs.

To assess the model’s learning and adaptation to the new dataset, some performance
metrics have been extracted and displayed in Figure 7.1. Analyzing the box_loss
and cls_loss shows that the model improves over the training epochs, both in
locating and in classifying objects correctly, which indicates high accuracy. The third
column in Figure 7.1 covers the distribution focal loss, denoted dfl_loss, indicating
an increased correlation between the estimation of the bounding box coordinates
and the ground truth specified in the dataset.

The four plots on the right in Figure 7.1 highlight the final model’s performance.
These results display high values in both Precision and Recall, indicating that the
model can distinguish well between desired objects and non-desired objects. Higher
values for these metrics imply fewer false positives and false negatives. Additionally,
based on the metrics mAP50 and mAP50-95, the results suggest a rapid improve-
ment in the model’s ability to predict the bounding boxes for objects with at least
50 % overlap, as well as IOU thresholds ranging from 50 to 95 %. These results
indicate that the model has achieved a high level of accuracy in consistent object
detection and tracking. However, some fluctuations in the precision metric suggest
that the model was more inconsistent in detecting true positives in the early stages
of training.

7.1.2 State Estimation Accuracy
Estimating the mobile unit’s state is the key component of the perception system
and should be done with high accuracy to yield a stable and responsive system.
Within this project, the main states estimated with the perception system were the
Cartesian x and y coordinates of the mobile unit and the linear velocity. The
remaining variables in the state vector, X, were deemed more suitable to estimate
with the internal motion model. By giving the system a set of initial states, the
states in the next iteration can be estimated by the current actions set by the mo-
tion controller, thus mitigating the need to estimate the pose with the perception
system.

As the estimated position in the 2D space is critical to ensure that the mobile unit
follows the desired trajectory as intended, the accuracy of the estimated x and y
coordinates is evaluated in a small test. The test involved comparing the measured
true position to the estimated one given by the perception system. Table 7.1 shows
the average deviation between the true and estimated position in the 2D space.

Table 7.1: Standard deviation of position measurements

Position measurement    Deviation from true position
x                       0.0366 m
y                       0.0133 m
|x, y|                  0.0389 m

The results presented in Table 7.1 show that the estimated position can deviate ap-
proximately 4 cm from the true position. The position can deviate in any direction.
The deviation test was performed with a camera height of 2.28 m above the ground,
facing directly downwards.
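
The combined value in Table 7.1 is consistent with the Euclidean norm of the two per-axis deviations. This is an observation from the reported numbers, since the computation is not stated explicitly:

\[
\sqrt{0.0366^2 + 0.0133^2} = \sqrt{0.00151645} \approx 0.0389 \ \text{m}.
\]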

7.1.3 Process Acceleration
To evaluate the performance of each acceleration method presented in Chapter 4.4.4,
a sample video similar to the application is used. The sample video was a unique
set of frames, different from the frames in the dataset used for training the model.
With this, the latency (including pre-processing, inference, and post-processing) of
the system is measured. Thereafter, each framework is compared to find which
acceleration method yields the lowest computational time without compromising
accuracy to a large extent. The results from the performed tests are presented in
Figure 7.2 and Table 7.2.

Figure 7.2: Average latency for different object detection frameworks during run-
time.

Table 7.2: Average accuracy measurements of different acceleration methods.

Method / Model          Accuracy
.pt                     74%
ONNX                    79%
OpenVINO                81%
ONNX Runtime            69%
DeepSparse Dense        91%
DeepSparse Quantized    54%

Table 7.2 presents different acceleration methods and their accuracy rating. The
accuracy rating is measured based on the average confidence score of the object
detection during the test, where the period and frames were the same for all meth-
ods. As shown in Figure 7.2 the DeepSparse framework provides the lowest latency
on average, particularly with a dense model configuration. When analyzing the re-
sults in Table 7.2, the comparison between dense and quantized DeepSparse models
reveals a significant insight into the trade-offs between speed and accuracy. Both
DeepSparse models indicate a reliable level of performance across different configu-
rations. However, the dense model stands out for its balance of speed and precision,
as the loss of accuracy for the quantized model was significantly higher. Thereby,
the dense model with the DeepSparse engine was used for the remaining tests.

7.2 Motion Control

In the motion control system, two controllers were implemented: a PID controller
and an MPC controller, each with significantly different levels of complexity. With
this evaluation, the main goal was to establish the complexity level required to
achieve the desired results navigating the tight demonstration area with the given
mobile units. To evaluate the controllers, the average deviation between the refer-
ence and the measured position of the mobile unit is computed at each point, as
shown by (7.1) over the simulation time, Ttot:

\[
e_{avg} = \frac{1}{T_{tot}} \sum_{k=0}^{T_{tot}} \sqrt{(x_{r,k} - \tilde{x}_k)^2 + (y_{r,k} - \tilde{y}_k)^2}. \tag{7.1}
\]

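Additionally, the maximum path deviation emax and the standard deviation σ are
computed. The standard deviation is given by (7.2), where ek is the path deviation
at sample k, i.e. the summand of (7.1). It is used to evaluate the reliability
and efficiency of the mobile unit navigation.

\[
\sigma = \sqrt{\frac{\sum_{k=0}^{T_{tot}} (e_k - e_{avg})^2}{T_{tot}}} \tag{7.2}
\]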
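
The three tracking metrics can be computed from logged positions as sketched below. The function name and the input layout (lists of (x, y) pairs) are assumptions, and the sums are normalized by the number of samples, which is one way to read Ttot in (7.1) and (7.2).

```python
import math


def tracking_metrics(reference, measured):
    """Return (e_avg, e_max, sigma) for paired reference/measured points."""
    errors = [
        math.hypot(xr - xm, yr - ym)
        for (xr, yr), (xm, ym) in zip(reference, measured)
    ]
    n = len(errors)
    e_avg = sum(errors) / n
    e_max = max(errors)
    sigma = math.sqrt(sum((e - e_avg) ** 2 for e in errors) / n)
    return e_avg, e_max, sigma
```

A low average with a high maximum points to isolated excursions, for example in sharp turns, whereas a high standard deviation indicates tracking that is inconsistent along the whole path.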

Other than evaluating the two motion controllers’ ability to follow the desired
reference states, the solution time of each controller, its ability to solve more complex
problems, and its ability to reach all desired stations within a given tolerance are investigated.

During the tests, both in simulation and on the hardware, the PID controller
followed a fixed reference velocity, while the MPC controller was set up for
adaptive velocity planning. Because the velocity is incorporated into the
optimization problem, the MPC controller could regulate its speed while
performing different manoeuvres.

7.2.1 Test Scenarios

To evaluate both motion control strategies used within the project, a set of
test scenarios was defined. Each controller is evaluated in all scenarios, both
in simulation and on the hardware, to verify its performance and to identify
discrepancies between the simulation results and the hardware performance. Some
of the tests are described in more detail below.

Figure 7.3 shows a simple test similar to a step response. However, due to the
turning capabilities of the mobile units used within this project, the reference
change is not a 90-degree turn. The test was done to evaluate the controllers'
settling time for a reasonable reference change.


(a) Demonstration area with stations and a helping node (•).

(b) Generated reference path from the motion planner.

Figure 7.3: Simple step response test.

The second test, presented in Figure 7.4, contains several stations and is
designed to resemble a realistic run, similar to what the company wants to use
for demonstration purposes in the future. This entails going to the stations
shown in Figure 7.4a and completing different tasks. Figure 7.4b shows the
reference states generated by the motion planner during the run.

(a) Demonstration area with several stations.

(b) Generated reference path from the motion planner.

Figure 7.4: Full cycle test between stations.

For the stations given in each test, Figures 7.3b and 7.4b show the reference
trajectory generated by the motion planner. This approach generates a simple and
feasible path while giving direct control over its exact shape, which makes it
easier to get the system to behave as intended in the demonstration area. One
example is displayed in Figure 7.3, where a helping node is added between the
two stations to indicate where the path should start its sharp turn.


7.2.2 Control Tuning
The tuning of the controllers was performed manually. The parameters of each
controller, both PID and MPC, were kept the same during all hardware tests and
simulations. The intent was to highlight the deviation between the simulator
and the hardware performance, and to see how a statically tuned system would
affect the results for different reference changes.

The PID's proportional gain (P) was set to an aggressive value to ensure quick
corrections and thereby enable sharp turns. The integral action (I) included an
anti-windup mechanism to reduce potential instability issues. The derivative
gain (D) was set to a low value to minimize oscillations and avoid excessive
changes in the control signal. The parameters used for the PID controller during
all physical tests and simulations are presented in Table 7.3.

Table 7.3: Tuning parameters for the PID controller.

Parameter   Value
Kp          1000
Ki          0.5
Kd          5
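
A discrete PID update with the gains from Table 7.3 could look like the sketch
below. The anti-windup scheme shown (conditional integration, which freezes the
integrator while the output is saturated and the error would push it further
into saturation) is an assumption, since this section does not specify which
scheme was used. The class name `PID`, the sample time and the output limits
`u_min` and `u_max` are likewise hypothetical.

```python
class PID:
    """Discrete PID controller with conditional-integration anti-windup."""

    def __init__(self, kp, ki, kd, u_min, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.u_min, self.u_max = u_min, u_max  # hypothetical actuator limits
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Derivative of the error; zero on the first call (no history yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        # Tentative integrator update and unsaturated control signal.
        candidate = self.integral + error * dt
        u_unsat = self.kp * error + self.ki * candidate + self.kd * derivative

        # Saturate the output to the actuator limits.
        u = min(max(u_unsat, self.u_min), self.u_max)

        # Anti-windup: keep the integrator update only if the output is not
        # saturated, or if the error drives the output back out of saturation.
        leaving_upper = u_unsat > self.u_max and error < 0
        leaving_lower = u_unsat < self.u_min and error > 0
        if u == u_unsat or leaving_upper or leaving_lower:
            self.integral = candidate
        return u


# Gains from Table 7.3; limits and sample time are made-up illustration values.
pid = PID(kp=1000, ki=0.5, kd=5, u_min=-100.0, u_max=100.0)
u_first = pid.step(error=1.0, dt=0.1)    # large error: output saturates
integral_after_first = pid.integral      # integrator frozen during saturation
u_second = pid.step(error=0.01, dt=0.1)  # small error: output within limits
print(u_first, integral_after_first, u_second)
```

With a proportional gain this large, the output saturates for all but small
errors, which is the situation in which an anti-windup mechanism matters most.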

For the MPC controller, a prediction horizon of N = 5 was chosen as a trade-off:
long enough for acceptable planning, yet short enough to avoid shortcuts that
could lead to missing stations. The constraints and cost functions were
implemented as described in Section 6.2.2.

The tuning of the MPC controller penalized the position errors in x and y the
most, to minimize the deviation between the true position and the reference position.
The weights for the velocity v and θ were configured to allow the mobile unit to
slow down and reverse if necessa