Camera-based State Estimation and Autonomous Motion Control
Perception and Control of a Hauler Truck in a Demo Site

Master’s thesis in Systems, Control and Mechatronics

KEVIN BIELECKI
RASMUS EKEDAHL

Department of Electrical Engineering
Division of Systems and Control
Chalmers University of Technology
Gothenburg, Sweden 2024
www.chalmers.se

© Kevin Bielecki, Rasmus Ekedahl, 2024.

Supervisor: Hanna Hermansson, B&R Industrial Automation
Examiner: Martin Fabian, Electrical Engineering

Master’s Thesis 2024
Department of Electrical Engineering
Division of Systems and Control
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: The demonstration area where the system is deployed.

Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2024

Abstract

This thesis explores the development of an autonomous system, designed for B&R Industrial Automation to demonstrate autonomous solutions on their products without any operator input. The goal of this thesis is to develop an autonomous system that can manoeuvre a mobile unit between different stations in a collision-free and smooth manner, primarily to enhance sales demonstrations.
With a single camera mounted in the ceiling, the system can make well-informed decisions using perception, motion planning and motion control. Key components include a machine-learning model for perceiving the environment, a path- and trajectory planner, and a linear Model Predictive Control (MPC) system. The project resulted in a fully functional autonomous system that could execute demonstration runs, offering opportunities for further development.

Keywords: Autonomous systems, Computer vision, Object Detection, YOLO, Path-planning, Motion planning, Motion control, MPC, Machine learning.

Acknowledgements

This master’s thesis was carried out at B&R Industrial Automation during the spring of 2024. We wish to thank our academic examiner and supervisor from Chalmers, Professor Martin Fabian, for his continuous support and feedback throughout this project.

We also want to give special thanks to our supervisor, Hanna Hermansson, and everyone at B&R Industrial Automation who assisted us during this project. Your dedication, support, and the opportunity to work on this thesis have been irreplaceable.

Kevin Bielecki, Rasmus Ekedahl, Gothenburg, June 2024

List of Acronyms

Below is the list of acronyms that have been used throughout this thesis, listed in alphabetical order:

AMR   Autonomous Mobile Robot
APC   Automation Personal Computer
AVX2  Advanced Vector Extensions 2
CNN   Convolutional Neural Network
COCO  Common Objects in Context
CPU   Central Processing Unit
GPU   Graphics Processing Unit
HMI   Human-Machine Interface
IoT   Internet of Things
IoU   Intersection over Union
mAP   Mean Average Precision
MPC   Model Predictive Control
PID   Proportional-Integral-Derivative
SSD   Single-shot Detector
TPU   Tensor Processing Unit
YOLO  You Only Look Once
ZOH   Zero-Order Hold

Nomenclature

Below, the nomenclature that has been used throughout this thesis is presented.
Indices
i    Index for iterations
k    Index for discrete time step

Parameters
∆t   Time discretization step (time interval) [ms]
t    Time [ms]
L    Wheel base [m]
L_t  Trailer length [m]
n    Number of waypoints
N    Control horizon

Variables
δ    Steering angle [rad]
a    Longitudinal acceleration [m/s²]
x    x-coordinate [m]
y    y-coordinate [m]
θ    Heading angle [rad]
v    Longitudinal velocity [m/s]
ψ    Relative angle of trailer [rad]
x    State vector
P    Point including x and y coordinate [m]

Contents

List of Acronyms
Nomenclature
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Aim
  1.3 Limitations
  1.4 Research Questions
  1.5 Ethics and Sustainability
2 Preliminaries
  2.1 Computer Vision and Machine Learning
    2.1.1 You Only Look Once
    2.1.2 Model Quantization and Pruning
  2.2 Optimization problems
    2.2.1 Convex vs Non-convex optimization problems
3 Technical Concept
  3.1 Company Requirements
  3.2 Hardware Specifications
    3.2.1 Automation PC (APC)
    3.2.2 Camera
    3.2.3 The Mobile Unit
  3.3 Conceptual overview
4 Perception
  4.1 Choosing the Perception Framework
  4.2 Perception System Overview
  4.3 State Estimations
    4.3.1 Position and Velocity
  4.4 Model Performance and Training
    4.4.1 Data Acquisition
    4.4.2 Image Annotation
    4.4.3 Model Training
    4.4.4 Process Acceleration on CPU
      4.4.4.1 ONNX Runtime
      4.4.4.2 OpenVino
      4.4.4.3 DeepSparse
5 Motion Planning
  5.1 Path Planner
  5.2 Trajectory Planner
6 Motion Control
  6.1 Motion Models
    6.1.1 Rigid Motion Model
    6.1.2 Articulated Motion Model
  6.2 High-level Controller
    6.2.1 PID Control
    6.2.2 Model Predictive Control
      6.2.2.1 Cost and Constraints
      6.2.2.2 Problem Formulation
  6.3 Low-level Control
7 Results
  7.1 Perception
    7.1.1 Dataset and Model Training Evaluation
    7.1.2 State Estimation Accuracy
    7.1.3 Process Acceleration
  7.2 Motion Control
    7.2.1 Test Scenarios
    7.2.2 Control Tuning
    7.2.3 Simulation
    7.2.4 Hardware
  7.3 Full System Evaluation
    7.3.1 Hardware Assessment
    7.3.2 Solution Time
    7.3.3 Reliability and Accuracy Assessment
8 Discussion
  8.1 Perception Evaluation
  8.2 Motion Model Evaluation
  8.3 Motion Control Evaluation
  8.4 Latency and Hardware Performance
9 Conclusions
  9.1 Future Improvements and Development
Bibliography

List of Figures

1.1 Demo site with marked stations at B&R industrial automation in Malmö.
2.1 Different methods of object detection classification
2.2 YOLO architecture
2.3 Distinction between a convex and a non-convex function.
3.1 Mobile Automation PC 3100
3.2 ArkCam basic+ Mini 130 and table of specifications.
3.3 The two mobile units used in the project.
3.4 Camera setup in the demonstration area.
3.5 Full system overview.
4.1 Performance comparison of different YOLO models
4.2 Flowchart of the three-phase principle of the perception system.
4.3 A frame of the Mercedes Lego truck with active perception system, displaying the coordinate system and the detection area
4.4 Illustration of the position, velocity and timestamps for the mobile unit
4.5 Undesired detections and false negatives with a pre-trained YOLO model on the COCO dataset.
4.6 Sample of the labelled dataset used for the perception system for testing on the Mercedes truck.
5.1 Visual representation of the grid map with defined station nodes.
5.2 Flowchart of path generation in a grid map environment with a description for each process.
6.1 Simplified motion control overview.
6.2 Simplified model of the rigid mobile unit.
6.3 Simplified model of the articulated mobile unit.
7.1 Performance metrics of model training over 25 training epochs.
7.2 Average latency for different object detection frameworks during runtime.
7.3 Simple step response test.
7.4 Full cycle test between stations.
7.5 Simulator interface.
7.6 Controller comparison in simulation in a step test.
7.7 Controller comparison in simulation in a full cycle test.
7.8 Controller comparison on the target hardware for a step test.
7.9 Controller comparison on the target hardware for a full cycle test.

List of Tables

3.1 System requirements for the project.
3.2 Automation PC 3100 specifications
3.3 ArkCam basic+ Mini 130 specifications
4.1 Performance metrics for different YOLOv8 models.
7.1 Standard deviation of position measurements
7.2 Average accuracy measurements of different acceleration methods.
7.3 Tuning parameters for PID-controller.
7.4 Plot of reference path deviation error in simulation.
7.5 Reference path deviation error in simulation.
7.6 Reference path deviation error on the target hardware for a step test.
7.7 Reference path deviation error on the target hardware for a full cycle run.
7.8 Average update time, max update time and solver time in seconds for full cycle test with different controllers.

1 Introduction

As technology evolves, the pursuit of automation expands in several areas [1]. Within the field of autonomous systems, perception and control are two fundamental challenges that play an important role in enabling systems to make well-informed decisions based on their surroundings. By continuously monitoring and gathering information from the surrounding environment, the system can update its internal representation of the world. This technique is often referred to as state estimation and its data can be used in decision-making systems such as a motion planner to determine feasible or optimal routes from a starting point to a goal. Further, motion controllers are used to ensure that the planned path is maintained. By combining these components, autonomous systems could navigate through complex and dynamically changing environments to reach desired locations on time, efficiently and safely.
1.1 Background

This master’s thesis is a collaborative project with B&R Industrial Automation. To display pioneering technology and digital innovation with B&R’s products, a showroom called OrangePoint has been set up at the main office in Malmö. Within OrangePoint, one of the demonstrations showcased is a small-scale site containing a miniature hauler truck, referred to as a mobile unit, displayed in Figure 1.1. The mobile unit can be controlled remotely by an operator via Bluetooth communication, using a computer from B&R to drive the truck between different stations. To further improve this showroom, B&R wants to implement a fully automated system that can replace the operator, navigating the area and driving the mobile unit between multiple stations. The goal is to complete various tasks and interact with different products from B&R and external suppliers.

Figure 1.1: Demo site with marked stations at B&R Industrial Automation in Malmö.

The demonstration site displays a fusion of Internet of Things (IoT) and Cloud services, incorporating third-party solutions and enabling remote connectivity. It showcases how various industries can develop a resilient, robust and future-ready platform with B&R’s hardware and software solutions [2]. The proposed addition of an autonomous feature aims to enable the mobile unit to autonomously navigate a complex environment by leveraging real-time data analysis, machine learning algorithms, and computer vision, deployed on a hardware control unit from B&R. This feature is anticipated to ensure safe and efficient operation of the mobile unit while travelling between different stations within the demonstration area. Its integration is seen as a crucial step in enhancing the platform’s capabilities and displaying how B&R’s technologies can be utilized at the forefront of mobile automation.
The demonstration area is a compact 2 × 2 meter square, designed to simulate a sand-covered environment using a base layer and larger piles of orange plastic. This area also contains a miniature mobile digger and various equipment, with the potential to expand with more features and products in the future. The layout features static obstacles, sharp turns and areas where a combination of reversing and going forward is necessary to navigate and effectively reach the desired positions, thus posing several challenges when designing an autonomous system.

For this project, Figure 1.1 illustrates four pre-determined stations: loading of material at station 1, camera and QR-code identification at station 2, weighing of the loaded vehicle at station 3, and finally unloading at station 4. The objective is for the mobile unit to autonomously navigate between these stations in a safe and robust manner. At each station, the mobile unit will pause and wait for a “go-ahead” signal, which indicates the completion of the required process, before moving on to the next station.

1.2 Aim

The primary goal of this thesis is to investigate different approaches and develop a system for the perception and control of an autonomous mobile unit with hardware from B&R. The results should be presented on a mobile unit that will travel between different stations inside a demonstration area without any operator input.

1.3 Limitations

To define the scope of the project, the following limitations and aspects are considered within this thesis.

• The system is only intended for the demo site and is therefore not general purpose, meaning that modifications of the setup or different locations will require adjustments and additional work to ensure the same performance.
• The primary goal of this project is to achieve smooth operation of the mobile unit rather than optimal computational efficiency.
• The system is limited to only one mobile unit, thus not considering other mobile units within the area.
• The perception system will not consider the location of anything but the mobile unit. All other obstacles are considered static, and therefore their locations are predefined.

1.4 Research Questions

This project aims to implement an autonomous system that enables the mobile unit to navigate between different stations. To assess the system and guide development, the following research questions will be explored.

1. To what extent can camera-based state estimation of a mobile unit be used to determine different kinematic properties, including position, speed, and relative angle of joints?
2. What type of control strategy will ensure sufficient motion planning and control of the autonomous mobile unit within the predefined area to ensure safe and efficient navigation without operator input?
3. What are the key factors affecting the reliability and accuracy of the autonomous system controlling the mobile unit, and how can these factors be managed?

1.5 Ethics and Sustainability

This thesis aims to explore the field of perception and control of autonomous mobile units, a technology that has many upsides when implemented on a larger scale. This project is a proof of concept on a small scale within a secure area. However, the same technologies and principles can be applied within other industries and on a larger scale. Therefore, it is important to consider the sustainability and ethical implications of the project, as well as the potential consequences this technology could have for the future. Below, some important ethical and sustainability aspects are addressed further, focusing primarily on autonomous transportation vehicles.

The perception and control within the autonomous system must be reliable and robust to ensure safety for the vehicle and its surroundings.
However, in the case of failure, the autonomous system must be able to hand over control or go into a fail-safe mode in order to come to a collision-free and safe stop [3]. At the same time, removing the "human factor" from critical operations could lead to a reduced risk of errors and mitigate the risk of human injuries.

The perception functionality of an autonomous system generally contains sensitive data about its environment. Whether the data is collected with cameras, lidars or GPS, the contents must be protected from external operators with ill intent. It is also important to ensure that the integrity of other people is upheld. To mitigate this problem, it is critical that this data is properly protected and that data destruction is defined as a continuous process [3].

While autonomous transportation could improve the social working conditions within several industries by removing monotonous work and heavy lifting, it is important to consider the aspect of an increased lack of jobs for humans. Truck and forklift drivers, among many other occupations, risk replacement in the future as autonomous driving technology evolves [3]. Therefore, the impact of deploying autonomous technology should be assessed in each separate industry and a plan for relocating the workforce should be made.

From an environmental perspective, autonomous mobile units’ capability to optimize routes and driving behaviours can also result in enhanced energy efficiency compared to manual operations. This efficiency could lead to decreased emissions, contributing positively to environmental sustainability, while also enabling cost savings for companies applying these technologies [3].

2 Preliminaries

This section establishes the theoretical foundation for the report, providing the necessary background on concepts and methods essential to this thesis.
2.1 Computer Vision and Machine Learning

Enabling machines to interpret and understand visual input often involves identifying and locating objects within an image [4]. Object detection can be achieved by classifying different parts of each image into various categories, using techniques such as deep learning and convolutional neural networks (CNNs). There are several different approaches within this field, where algorithms such as Region-based Convolutional Neural Networks (R-CNN) or You Only Look Once (YOLO) are commonly used. These models improve the speed and accuracy of detection either by focusing on specific regions of interest within the image, as in R-CNN, or by executing detection and classification in a single pass, as in YOLO. These networks learn to recognize patterns and features from large datasets of labelled images and are widely applied within different industries where real-time accuracy is crucial.

Ongoing research and development in object detection focus on increasing the robustness and efficiency of these models. Efforts include improving the training datasets to cover more diverse scenarios and conditions, optimizing algorithms to reduce computational demands, and refining accuracy to distinguish between closely similar objects [5].

2.1.1 You Only Look Once

You Only Look Once (YOLO) is an open-source, deep learning-based algorithm mainly used for object detection that aims to provide real-time performance and high accuracy [6]. The YOLO model estimates bounding boxes and predicts object classes simultaneously, while still maintaining high accuracy. The object detector used in YOLO is a single-shot method, meaning that the entire image frame is analyzed and all predictions are made in one pass. This approach differs from many other methods, such as R-CNN or Fast R-CNN, which first detect possible regions of interest and then perform image recognition.
Figure 2.1 classifies different algorithms, where the main distinction between the approaches is that single-shot (one-stage) methods result in improved real-time performance while two-stage detection yields a higher accuracy.

Figure 2.1: Different methods of object detection classification

Figure 2.1 describes the branching of the different object detection methods and whether they are of two-stage or one-stage classification. The YOLO model is a CNN that can be used to recognize and identify items with high speed and accuracy [6]. The detection model consists of 24 convolutional layers, of which 20 are pre-trained. These are followed by 2 fully connected layers, which in the end yield a 7 × 7 × 30 tensor of predictions (a 7 × 7 grid in which each cell predicts 2 bounding boxes of 5 values each and 20 class probabilities). This direct prediction mechanism is what enables YOLO to achieve its high speed, differentiating it from other detection systems that often employ separate steps for feature extraction, proposal generation, and object classification. The YOLO architecture is shown in Figure 2.2 with the 7 × 7 × 30 tensor as the final output.

Figure 2.2: YOLO architecture [6].

Since the first version of YOLO was published in [6], different versions of the algorithm including YOLOv3, YOLOv4, all the way up to YOLOv9 have been developed. Each iteration aims to improve detection accuracy and speed while keeping a low computational complexity to achieve real-time performance. Further information and a comparison of different versions of the YOLO algorithm can be found in [7, 8].

2.1.2 Model Quantization and Pruning

One main goal when designing new deep-learning models is improving the accuracy. However, this commonly also results in larger model sizes. Consequently, with larger models comes the need for more computational resources. Simultaneously, there is a demand for the deployment of high-precision models on less powerful hardware, both in terms of cost and scalability.
Two effective strategies for achieving these objectives are pruning and quantization. These techniques not only help in scaling down the models but also ensure that their precision remains as close as possible to that of the denser models [9].

Large sets of weights in deep-learning models are often redundant or carry little information and, together with some of the pathways through the network, are not important once the model has been trained [10]. Pruning involves removing parts of the model that are considered less important or redundant for deployment. The primary goal is to reduce the complexity of the model without significantly impacting its performance or accuracy. Reducing the number of model parameters can result in a model with improved runtime performance and reduced computational complexity.

Additionally, most weights in a model are typically stored with more precision than runtime applications need, and this level of precision is generally not required after the model has been trained [10]. Quantization refers to the method of reducing the precision of the numbers used to represent the model weights. With quantization, the computational power and memory needed to run the model can be reduced by using lower-precision datatypes, such as 8-bit integers, instead of the standard high-precision 32-bit floating-point numbers.

2.2 Optimization problems

An optimization problem seeks the optimal solution from a set of possible choices. The primary objective of such problems is to find the minimum or maximum value of a function, known as the objective function, under a set of constraints. These constraints are typically expressed as inequalities and equalities that the solution must satisfy [11]. An example of a mathematical formulation of an optimization problem is presented in (2.1).

\begin{equation}
\begin{aligned}
\min_{x} \quad & f_0(x) \\
\text{subject to} \quad & g_i(x) \le b_i, \quad i = 1, \dots, m \\
& h_j(x) = 0, \quad j = 1, \dots, p
\end{aligned}
\tag{2.1}
\end{equation}

In (2.1), the vector $x$ is the decision variable, and $x^*$ is the optimal solution to the problem.
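As a minimal illustration of the general form in (2.1), the sketch below solves a one-dimensional instance with projected gradient descent. The objective $f_0(x) = (x - 3)^2$, the single inequality constraint $x \le 2$, the step size and the iteration count are assumptions chosen purely for illustration and are not taken from this thesis. Since the unconstrained minimum at $x = 3$ violates the constraint, the optimal solution ends up on the constraint boundary.

```python
def f0(x: float) -> float:
    """Illustrative convex objective (assumed, not from the thesis)."""
    return (x - 3.0) ** 2


def grad_f0(x: float) -> float:
    """Derivative of the illustrative objective."""
    return 2.0 * (x - 3.0)


def project(x: float, b: float) -> float:
    """Project x onto the feasible set {x : x <= b}."""
    return min(x, b)


def projected_gradient_descent(x_init: float, b: float,
                               step: float = 0.1,
                               iterations: int = 200) -> float:
    """Minimize f0(x) subject to x <= b using a fixed step size."""
    x = project(x_init, b)
    for _ in range(iterations):
        # Take a gradient step, then move back into the feasible set.
        x = project(x - step * grad_f0(x), b)
    return x


x_star = projected_gradient_descent(x_init=-5.0, b=2.0)
print(f"x* = {x_star:.4f}, f0(x*) = {f0(x_star):.4f}")
# prints "x* = 2.0000, f0(x*) = 1.0000"
```

The projection step is what enforces the inequality constraint here; with equality constraints or several coupled constraints, a dedicated solver would normally be used instead of this hand-written loop.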
The function $f_0$ represents the objective function, or cost, that is to be minimized, while $g_i(x)$ and $h_j(x)$ represent the inequality and equality constraints that the solution must respect. The constraints ensure that the solution not only optimizes the objective function but also remains within a feasible region defined by the set limits. A solution to the optimization problem corresponds to a choice that has minimum cost among all choices that meet the constraints [11].

2.2.1 Convex vs Non-convex optimization problems

Depending on the nature of the objective function and of the set over which the optimization is performed, optimization problems can be divided into two types: convex and non-convex, with their differences illustrated in Figure 2.3.

Figure 2.3: Distinction between a convex and a non-convex function.

A convex optimization problem is characterized by every local minimum also being a global minimum within the feasible set. This property significantly simplifies the search for an optimal solution, as convex problems have either a unique solution or multiple solutions forming a convex set. In comparison, non-convex functions can contain local minima that are not global, posing a larger challenge to find the optimal solution as the problem becomes more complex and computationally heavy [12].

The general trade-off between these types of optimization problems is the gain in accuracy compared to the added computational complexity. Non-convex problems allow for more complex problem formulations that could yield a higher accuracy than a convex formulation. However, solving these types of problems generally requires more computational power to converge to the optimal solution.

3 Technical Concept

The main purpose of this thesis is to research and develop methodologies that enable a mobile unit to navigate fully autonomously within a pre-defined area.
This can be divided into three sub-problems: state estimation, motion planning, and motion control. These are open-ended problems that can be solved in multiple different ways. Thus, to highlight the aspects that form the basis of the chosen technical concept, the following sections focus on how requirements and limitations affect the choice of technical solutions for the project.

3.1 Company Requirements

Ensuring that the technical solution is adapted to its use case is an important aspect of the autonomous system, which will primarily be used in sales demonstrations showcasing the capabilities of B&R’s hardware and software in autonomous applications. To achieve this alignment, a set of system requirements was defined together with the company, ensuring that the solution covers all desired aspects. Table 3.1 presents the system requirements for the project.

Requirement | Priority | Details
Performance analysis of hardware | Must have | Explore the potential applications of B&R’s hardware technology with software that requires significant computational resources.
Modularity | Must have | All parameters that affect the system behaviour should be simple to change to display different scenarios.
State estimation with a camera | Must have | Estimate the position of the mobile unit, relative angle, and velocity.
Cropping of image | Must have | Used to specify within which parts of the camera frame objects should be detected. Should be easy to adjust based on the location of the demo area.
Go through production cycles | Must have | Go between different stations within the site, where the location of each station can be adjusted.
Defining restricted areas | Must have | A configurable area where the vehicle is allowed to move.
Efficient and smooth navigation | Must have | Opt for a visually appealing and smooth route to enhance the audience’s perception.
Automatic adjustment of cropping parameters | Nice to have | Automatic detection of demo location and orientation to adjust cropping parameters.
Adaptive visual mapping | Nice to have | Use the camera to map obstacles and areas that are difficult to traverse.

Table 3.1: System requirements for the project.

In Table 3.1, the tasks listed as must have are critical for the project and must be implemented. The topics that are listed as nice to have are to be implemented if time permits. From the company’s perspective, the autonomous system actuating the mobile unit should be appealing to customers, displaying a complex technical solution with intuitive and smooth movements and decisions. The project therefore prioritises a final product with the above-mentioned features rather than finding the optimal solution in terms of, for example, shortest path or energy efficiency.

3.2 Hardware Specifications

Based on the requirements above, a technical analysis was performed on the given hardware to further evaluate its capabilities. This was mainly done for benchmarking purposes and to eliminate any potential bottlenecks related to hardware specifications later on in the project. Below, each hardware component used in the project is described.

3.2.1 Automation PC (APC)

The processing unit used in this project is the APC Mobile 3100, a product from B&R Industrial Automation displayed in Figure 3.1. The computer houses a PLC and a Linux operating system, both using an Intel processor for computations. All work within this project is done on the Linux side of the APC. The objective is to integrate the entire system within a single B&R hardware unit, along with components from other suppliers. If necessary, the selected APC can be upgraded to an APC Mobile 3100 with an Intel i7 central processing unit (CPU) and increased RAM.
Parameters | Specifications
Material number | 5MPC3100.K038-000
CPU model | Intel Celeron 3965U
CPU speed | 2.2 GHz
RAM | 8 GB

Figure 3.1: Mobile Automation PC 3100 [13].

3.2.2 Camera

The camera selected for this project is an ArkCam image sensor, specifically designed for monitoring both mobile and stationary industrial environments [14]. The sensor choice by the company is strategic, primarily because of its widespread use in mobile applications. The goal is therefore to incorporate it in the demonstration area to show that it can be integrated seamlessly with B&R’s hardware alongside other components from various suppliers, aiming to deliver a complete system solution. Figure 3.2 provides sensor specifications and shows the ArkCam sensor.

Parameters | Specifications
Max videostream | 1280x720 @ 60 fps
Latency | <100 ms
Power consumption | <3 W
Viewing angle | 130°

Figure 3.2: ArkCam basic+ Mini 130 and table of specifications.

3.2.3 The Mobile Unit

In this project, the mobile units are miniature Lego trucks used to demonstrate the autonomous functionality of the system. The Lego trucks serve as a proof of concept; the result of this thesis work is meant to be applicable across a range of sectors, including the autonomous mobile robot (AMR) industry, the automotive and construction industries, and other fields that employ similar technologies. The two mobile units employed in this project are shown in Figure 3.3.

(a) Rigid truck for testing during development. (b) Articulated truck used in the demonstration area.
Figure 3.3: The two mobile units used in the project.

Figure 3.3a shows a truck with rigid dynamics, i.e. the steering capabilities are directly influenced by its wheelbase. In comparison, the truck in Figure 3.3b has another degree of rotational freedom around the joint between the head of the truck and the trailer. The latter truck is the main unit used in the demonstration area. However, since it is commonly used for sales purposes, the rigid truck has been utilized for testing the system during development.

Both mobile units are controlled with longitudinal drive motors and a motor controlling the requested steering angle, where the steering geometry is Ackermann. The main implication of using two different mechanical setups is the manoeuvrability of the vehicle. Simply put, a rigid body provides simpler dynamics when performing more complex actions such as reversing. However, when comparing turning capabilities, the articulated mobile unit allows for better manoeuvrability as the pivot point allows for sharper turns, making it more suitable when manoeuvring in tight spaces such as the demonstration area [15]. The trade-off between complexity and manoeuvrability is further analyzed mathematically in Chapter 6.1.

3.3 Conceptual overview

With the specified hardware and desired functionality of the system, a conceptual overview could be determined, including operational behaviour and technical solutions. Given the required functionality and the fact that the mobile unit does not hold an adequate internal processing unit, a centralized approach was taken where the APC estimates the position, plans the desired path and actuates the mobile unit based on a single sensor, a camera mounted above the demonstration area as shown in Figure 3.4.

(a) Camera mounted in the ceiling. (b) Field of view from the camera.
Figure 3.4: Camera setup in the demonstration area.

A simplified overview of the system architecture is presented in Figure 3.5, where the system is divided into three larger sections: Perception, Motion Planner, and Motion Control.

Figure 3.5: Full system overview.

With a centralized approach, all subsystems must run on a single CPU, limiting the computational complexity and thereby what can be achieved in terms of real-time performance.
Therefore, this aspect must be considered throughout the whole project.

With a single camera as the only available sensor, providing the system with real-time updates, the desired states of the mobile unit can be estimated. This is achieved using a machine-learning model for object detection within the perception system. Simultaneously, the desired goal state is given as an external input to the system. The motion planner then combines this with the current state from the perception system to plan the desired path and define a set of desired states at each iteration. Finally, a motion controller utilizes all these inputs combined with a predefined feasible area to decide how to manoeuvre the vehicle to reach the desired goal state collision-free and smoothly.

The upcoming chapters will delve deeper into each of the subsystems discussed, providing detailed explanations of the reasoning behind the decisions, methodologies, and implementation processes.

4 Perception

In robotics and autonomous systems, accurately perceiving the environment is important for effective and accurate operation. Perception involves the use of various sensors to gather data, which is then processed to estimate the state of the system. These states are essential as they provide the system with an object’s position, orientation, and other dynamic attributes critical to its decision-making processes. Among the sensors available, cameras are particularly valuable due to their rich data capture. To leverage this data effectively, object detection frameworks can be employed, which enable recognition and tracking of objects within the camera’s field of view. There are several frameworks available for object detection, each with its unique strengths and applications. Some of the most recognized are R-CNN, SSD, and YOLO models [16].
4.1 Choosing the Perception Framework

In this project, a camera serves as the primary input sensor, capturing visuals of the system’s environment. Utilizing computer vision techniques, the developed system is designed to detect, recognize, and track the mobile unit, leveraging this data to estimate the desired states of the mobile unit. The challenge in accurately estimating the state from visual inputs lies in dealing with varying visual conditions, potential obstacles, undetectable objects and the necessity of processing the data at real-time speed. A critical aspect of this project is identifying and consistently tracking the targeted mobile unit in real time, which is essential for a reliable system.

With these aspects in mind, a one-stage method was desired to minimize the model complexity, narrowing the selection down to YOLO or SSD. Based on prior comparisons, the YOLO framework was selected due to its smaller model size, comparable real-time accuracy and its compatibility with other tools [17].

The trade-off between speed and accuracy became more nuanced as the YOLO framework evolved. Each YOLO version aims to improve speed, accuracy, or the balance between them. YOLOv4, for instance, is noted for its robustness and efficiency in real-time settings. YOLOv5 is noted for its innovations, such as new network backbones, improved data augmentation techniques, and optimized training strategies. YOLOv7 and YOLOv8 further push the boundaries in terms of accuracy, integrating techniques from the latest research to improve detection performance [18]. Figure 4.1 shows a comparison of various YOLO models, illustrating the correlation between model size, speed, and accuracy for the different models.

(a) The relationship between model complexity and detection accuracy. (b) The trade-off between inference speed and accuracy for the same models.
Figure 4.1: Performance comparison of different YOLO models [18].
From Figure 4.1 it can be observed that YOLOv8 maintains a leading accuracy rate while it has a slightly larger number of parameters compared to YOLOv6-2.0. This indicates a higher efficiency in parameter utilization since it achieves higher accuracy with a comparable number of parameters. In addition to this, YOLOv8 displays the lowest latency. For a real-time application, this is critical as it means the model can process and analyze frames more efficiently. With all this considered, YOLOv8 is the most balanced option for the project’s real-time application: it achieves the highest accuracy and does so at the lowest latency.

During the course of the project, a new iteration of the YOLO algorithm, YOLOv9, was released. Comparisons between YOLOv8 and YOLOv9 suggest that the latter offers reduced complexity and enhanced performance for object detection [19]. However, YOLOv9 is in the initial stages of deployment, and its compatibility with additional tools such as trackers, visual aids, and other functionalities remains limited, which poses challenges for its integration. Therefore the choice of using YOLOv8 for this project remains.

The perception system in this thesis is based on the YOLOv8 model, trained on a custom dataset. The YOLOv8 model handles object detection, classification, and segmentation tasks. YOLOv8 introduces improvements over previous YOLO versions, such as better feature extraction, more sophisticated backbones and features that make it easier to use and tailor for the project’s specific application. This results in enhanced accuracy and lower latency, especially in challenging scenarios like small object detection or in conditions with poor lighting or occlusions. In addition to this, YOLOv8’s architecture allows for efficient custom training with new datasets. In comparison to other models like Faster R-CNN, SSD, or Mask R-CNN, YOLOv8 offers a superior balance of speed, accuracy, and flexibility [18].

4.2 Perception System Overview

The perception system works on a three-phase principle: process video, process frame, and annotate frame. Figure 4.2 shows a flowchart of the working principle. Each phase of the system is structured as a distinct task within the ’State Estimation’ group. This approach organizes the different parts of the program into specific, manageable sections, each responsible for a particular aspect of the overall process.

Figure 4.2: Flowchart of the three-phase principle of the perception system.

Figure 4.2 presents a visualization of the three-phase principle and how the phases of the perception system communicate. The perception system features a dynamic configuration script which allows settings to be adjusted. The process video phase captures raw footage from the chosen video source. Then, the process frame phase uses the YOLOv8 model to execute the computer vision tasks, enabling the detection and tracking of objects. The model output is updated for each frame to maintain accurate detection and tracking. Furthermore, the annotate frame phase ensures that all detections are confined within the pre-set detection zone. The system operates in a loop, consistently refreshing the visual output in sync with the system’s frequency.

In this thesis, the detection zones have been specifically set to suit the sandbox area at OrangePoint in Malmö. This configuration ensures that detections are limited to objects within the sandbox, effectively filtering out irrelevant targets. The video input for the system is streamed from a live camera feed, with the camera placed in the ceiling above the sandbox. Both the detection zone and video input source are customizable and can be tailored to desired specifications through a configuration file.

The perception system is designed to accept inputs into the detection zone that are proportionate to the camera’s field of view.
This configuration assumes that the sandbox at OrangePoint in Malmö remains unrotated. Any rotation would introduce inaccuracies in the position calculations within the state estimation process. The location of the unrotated sandbox can be configured depending on the desired position.

4.3 State Estimations

State estimation is a process aimed at deducing the state of a system, such as position, using observed data. It uses a mathematical model, later described in Chapter 6.1, that describes how the system’s state changes over time and how this state correlates with the observed data. In the real world, measurements often come with noise and may not be complete. When implementing state estimation with a camera and computer vision, the approach involves using a camera to capture images or video frames, which act as observational data. These images may include objects whose states (like position or orientation) are to be estimated. Computer vision methods are then applied to identify and track features of objects across successive frames. Techniques such as optical flow or object detection algorithms, for instance YOLO, are commonly used [7]. The detected features allow the system to estimate movements or positional changes of the objects. Many applications require these estimations to be performed in real time, necessitating the use of efficient algorithms and, in some cases, the support of hardware acceleration.

4.3.1 Position and Velocity

In this thesis, the YOLOv8 computer vision model is used together with a camera to enable real-time tracking of a mobile unit. Object detection, the core task of this model, involves specifying the location and categorizing objects within an image or video stream. In this thesis, a video stream acts as the visual input for the model.

The detector’s output consists of bounding boxes that encompass the identified objects in each frame, together with class labels and confidence scores.
To track the mobile unit, separate tracking algorithms like BYTE, SORT (Simple Online and Real-time Tracking) or DeepSORT can be used [20]. These algorithms take the detections from YOLOv8 and apply a series of steps to achieve continuous tracking of objects as they move across the video frames. The primary challenge is to maintain the identity of each object from frame to frame, despite changes in position, orientation, scale, or interaction with other objects. In this thesis, BYTE is used to track the mobile unit because of its robust and accurate performance. BYTE also associates low-score detection boxes rather than discarding them, since these can indicate the existence of objects, and in this way uses more of the detection outcomes to improve multi-object tracking [20].

The position is obtained by extracting the centre point of the detection box for the tracked object, irrespective of its orientation. The x and y coordinates of the detected object are updated with each new frame, at the update interval ∆t. The x and y coordinates are plotted on a 2D plane, as the camera is oriented directly over the demo area. Initially, the origin is placed at the top left corner of the frame. However, upon configuring specific detection zones, the origin shifts to the top left corner of the selected detection zone, as illustrated in Figure 4.3.

Figure 4.3: A frame of the Mercedes Lego truck with active perception system, displaying the coordinate system and the detection area.

Figure 4.3 shows a snapshot from the perception system’s output. It features the detected object outlined in purple, along with its x and y centre coordinates. Additionally, there is a red box that represents the pre-set detection zone. The origin of the coordinate system is indicated in the top left corner of the detection area.
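As a minimal sketch of the centre-point extraction and origin shift described above, the following Python helper takes a bounding box and a detection zone in full-frame pixel coordinates and returns the position relative to the zone’s top left corner. The names `Box`, `centre` and `position_in_zone` are hypothetical, and the rule that a detection counts as inside the zone when its centre lies inside is an assumption; the thesis does not state the exact criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in full-frame pixel coordinates (origin at top left)."""
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge


def centre(box):
    """Return the centre point (x, y) of a box."""
    return ((box.x1 + box.x2) / 2.0, (box.y1 + box.y2) / 2.0)


def position_in_zone(detection, zone):
    """Return the detection centre relative to the zone's top left corner.

    Returns None when the centre falls outside the zone (assumed filter rule).
    """
    cx, cy = centre(detection)
    inside = zone.x1 <= cx <= zone.x2 and zone.y1 <= cy <= zone.y2
    if not inside:
        return None
    # Shift the origin from the frame corner to the zone corner.
    return (cx - zone.x1, cy - zone.y1)


# Hypothetical numbers for illustration only.
zone = Box(200, 100, 1000, 600)
truck = Box(450, 300, 550, 380)
print(position_in_zone(truck, zone))  # (300.0, 240.0)
```

Because the centre point is used, the result does not depend on how the truck is oriented inside its bounding box, which matches the description above.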
To calculate the velocity of a moving vehicle using a camera, it is essential to track how specific points on the vehicle shift over time. These points, which remain fixed to the bounding box centre point, move at the same velocity and direction as the vehicle when it is in motion relative to the camera. In this project, the camera is mounted in the ceiling, so the vehicle’s speed is measured in relation to the camera, which corresponds to the speed relative to a stationary plane in the camera’s view.

To determine the vehicle’s speed, frames captured by the camera are analyzed. This process allows for the measurement of the vehicle’s momentary speed. The calculation of this velocity is based on the change in the position of the reference point between consecutive frames according to:

v = \frac{\Delta P}{\Delta t}.    (4.1)

Here v is the velocity of the truck, and \Delta P is the Euclidean distance between two points, given in (4.2). \Delta t is the measured time it takes for the vehicle to travel the distance \Delta P, i.e. \Delta t = t_k - t_{k-1}.

\Delta P = \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2}    (4.2)

The underlying velocity is that of a point moving in 2D space, v ∈ R^2, since only one camera is used in this thesis; (4.1) gives its magnitude. The measured time, ∆t, is the time that passes between two processed video frames, i.e. the update period (the inverse of the update frequency). To find the velocity of the vehicle, one point is not enough. For this reason, the estimate of the velocity of the mobile unit can only be used after two time periods, t > 2∆t, where at least two positions have been registered. Figure 4.4 illustrates the points and timestamps necessary for calculating the velocity.

Figure 4.4: Illustration of the position, velocity and timestamps for the mobile unit.

When dealing with image processing and tracking, the orientation and placement of the camera play a crucial role in how to interpret and manipulate the captured data.
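Returning briefly to (4.1) and (4.2), the speed estimate can be sketched in a few lines of Python. This is an illustration under assumptions rather than the thesis implementation: the function name `speed_between` is hypothetical, positions are taken to be in pixels and timestamps in seconds, so the result is in pixels per second.

```python
import math


def speed_between(p_prev, p_curr, t_prev, t_curr):
    """Speed from two tracked centre points, following (4.1) and (4.2).

    p_prev, p_curr: (x, y) positions at the previous and current frame.
    t_prev, t_curr: timestamps of those frames, in seconds.
    """
    dt = t_curr - t_prev
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    # Euclidean distance between the two positions, as in (4.2).
    dp = math.hypot(p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])
    # Distance divided by elapsed time, as in (4.1).
    return dp / dt


# Hypothetical values: the centre moves 3 px in x and 4 px in y in 0.5 s.
print(speed_between((0.0, 0.0), (3.0, 4.0), 0.0, 0.5))  # 10.0
```

Using the measured timestamps instead of a fixed nominal period keeps the estimate meaningful when frame processing time varies between iterations.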
If a camera is not positioned in a bird’s-eye view, that is, directly overhead, the resulting images can exhibit perspective distortion. This distortion skews the perceived dimensions and positions of objects within the frame, making the calculation of kinematic properties such as velocity or distance impractical.

One common approach to this problem is to apply a coordinate transformation [21]. This process involves adjusting the image coordinates to reflect the true layout of the frame. It corrects for the perspective-induced distortions, aligning the image closer to what would be seen from a top-down view. The transformation is essential for precise tracking and measurement, as it ensures that the calculations are based on the actual arrangement of objects, rather than their distorted image representations.

This thesis is, however, fortunate to avoid these complexities. The camera is placed directly above the surface that is being tracked. This positioning provides a bird’s-eye perspective, thus naturally eliminating significant perspective distortion that would otherwise be present. As a result, the captured images are already in a desirable format for analysis, decreasing the complexity of the perception system. No camera calibration has been performed in this project, but calibration could yield improved accuracy as distortion due to the camera lens is still present.

4.4 Model Performance and Training

The YOLOv8 architecture provides a range of different-sized models. Some of these models are presented in Table 4.1. When comparing each model at a set pixel size, the mean average precision (mAP) and the latency can be evaluated. The mAP measures the average precision of an object detection model over a range of intersection over union (IoU) thresholds, in this case between 50% and 95% [22].

Table 4.1: Performance metrics for different YOLOv8 models.

Model | Pixel size | mAP^val 50-95 | Speed CPU ONNX [ms]
YOLOv8n | 640 | 37.3 | 80.4
YOLOv8s | 640 | 44.9 | 128.4
YOLOv8m | 640 | 50.2 | 234.7
YOLOv8l | 640 | 52.9 | 375.2

For object detection with YOLOv8, Table 4.1 displays a trade-off between the model’s size and its performance characteristics on a CPU. A larger model typically yields increased accuracy and precision but comes at the cost of decreased processing speed and a higher demand on computational resources. To choose a suitable model for the application, one must achieve a balance between accuracy and speed, taking into consideration the computational capacity available for the task. Based on the data presented in Table 4.1 and considering the hardware specifications outlined in Chapter 3.2, the YOLOv8n “nano” model was selected for the project. The choice was made to keep inference time as low as possible, accepting a certain trade-off in accuracy to ensure real-time performance.

Utilizing a CNN for the detection and tracking of an object is a crucial part of the overall system, as the perception lays the foundation for the other systems to make well-informed decisions. Thus, the state estimation system must be robust, ensuring that the tracking of the mobile unit is never lost. This was not the case when using the pre-trained model provided with YOLOv8, which resulted in the detection of multiple undesired objects, tracking loss, and false negatives as shown in Figure 4.5. Therefore the model had to be trained on a custom dataset suitable for the application.

Figure 4.5: Undesired detections and false negatives with a pre-trained YOLO model on the COCO dataset.

To achieve desirable results, the model must be modified and trained on data such that the mobile unit can be recognized and tracked at all positions within the demo area. For the chosen YOLOv8n model, the following steps were taken to obtain a robust model that could complete the tasks specified in Chapter 4.3.
4.4.1 Data Acquisition

The performance and efficiency of a YOLO model are highly dependent on the data that it is trained on. Therefore, the pre-trained YOLOv8n model was extended and trained with a dataset including images of the mobile unit in various contexts within the demo area. To save time and ensure a varied dataset, multiple videos of the mobile unit were taken, covering different production cycles in various conditions and placements around the demo area. The objective was to form a broad dataset to enhance the model’s ability to generalize effectively and mitigate the risk of overfitting. The gathered videos were then converted into images, with a selected number of captured frames from each video, to compile a large dataset. The final dataset consisted of 1450 images; a sample of the labelled dataset is shown in Figure 4.6.

Figure 4.6: Sample of the labelled dataset used for the perception system for testing on the Mercedes truck.

4.4.2 Image Annotation

Image annotation is the process of adding metadata to a set of images, i.e. annotating the desired objects within each frame with bounding boxes and labels. This is done to guide the algorithm to learn from the provided data and emphasise certain points.

The process of image annotation can be time- and resource-consuming for a large dataset. Therefore, to avoid manually annotating each image in the new dataset, a base model was employed to automate the process. A base model is a large foundation model, trained on large datasets, that can be applied for multiple purposes [23]. Within this project, the Grounded Segment Anything Model (SAM) is used, a model that can segment out individual objects from an image [24]. The base model is trained on over 11 million images and 1.1 billion masks, and when given prompts of desired objects it can annotate a large dataset with bounding boxes and labels quickly and without any other external inputs. A result of this is shown in Figure 4.6.
4.4.3 Model Training

For real-time applications that require consistent detection and tracking of a specific object, how the YOLOv8 model is trained is critical. When transfer learning is used to train the model on a dataset tailored to the traits of a particular object, the model becomes significantly better at detecting that object accurately. It learns to identify unique features and variations of the object, effectively distinguishing it from similar items or background interference. To achieve the distinction between the target and other objects, a set of the early layers in the model are frozen, meaning that they are not updated during the training process. Instead, only the deeper layers are fine-tuned with the new data. This is an important aspect as it leverages the generic features learned from the standard dataset and adapts more specific features in the deeper layers. With this method the number of false positives and negatives can be reduced, thereby improving the accuracy.

Moreover, the effectiveness of the model in real-time scenarios depends on its ability to swiftly and reliably re-identify the object in successive frames, adapting to movement and partial occlusions. This training is key to maintaining consistent tracking, regardless of changing conditions. Additionally, by refining the model’s focus on a specific object, it becomes operationally more efficient. This efficiency translates to reduced computational complexity, making the model a better fit for systems with limited processing capabilities, such as the system developed in this project. This targeted training approach not only elevates the model’s performance in its primary task but also enhances its applicability and reliability for identifying and tracking the desired object [25].
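The layer-freezing approach described above can be sketched as follows, assuming the Ultralytics Python package is used for training (the thesis does not name the training tool in this section). The dataset file name, the number of frozen layers, the epoch count and the helper names `build_train_args` and `fine_tune` are all hypothetical placeholders chosen for illustration, not values from the thesis.

```python
def build_train_args(dataset_yaml, frozen_layers=10, epochs=100, image_size=640):
    """Collect training settings for fine-tuning with frozen early layers.

    All default values are illustrative assumptions, not thesis settings.
    """
    return {
        "data": dataset_yaml,     # dataset description file (hypothetical name)
        "epochs": epochs,         # upper bound on training epochs
        "imgsz": image_size,      # input image size in pixels
        "freeze": frozen_layers,  # number of early layers kept fixed
    }


def fine_tune(train_args, weights="yolov8n.pt"):
    """Fine-tune pre-trained weights; requires the ultralytics package."""
    # Imported lazily so that build_train_args works without the dependency.
    from ultralytics import YOLO

    model = YOLO(weights)  # start from the pre-trained nano model
    return model.train(**train_args)


args = build_train_args("truck.yaml")
print(args["freeze"])  # 10
```

Calling `fine_tune(args)` would start the actual training run; it is left out of the example so that the sketch can be read and executed without the training dependency or a dataset.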
The effectiveness of the detection model is significantly impacted by how well the training data is balanced to avoid underfitting and overfitting, particularly when the model is trained for a singular objective. Underfitting occurs when the trained model is too simplistic, failing to capture the complexity and variability in the data. This can lead to poor performance as the model cannot generalize well to new, unseen scenarios. On the other hand, overfitting occurs when the model is excessively tailored to the training data, capturing noise and anomalies as if they were significant patterns. This may result in a model that performs well on training data but poorly on new, real-world data, as it becomes too specialized [26]. To avoid both, a varied and large dataset is used for the model training, allocating 80% of the dataset to training and 20% to validation.

To further avoid overfitting, an “early stopping” algorithm is used when training the model. This entails continuously monitoring the validation metrics and stopping the training if the metrics indicate a performance plateau over a set number of epochs, i.e. the model does not display improved performance with more training.

4.4.4 Process Acceleration on CPU

With the trained model implemented in the perception system, the maximum allowed processing time per frame was set to 100 ms to ensure that each iteration of the full system could be completed within 200 ms. However, without accelerating the process on the target hardware, a single computational iteration (pre-processing, inference, and post-processing) took approximately 600 ms.

To mitigate the problem of insufficient inference rates in real-time applications, machine learning models are typically deployed on GPUs (Graphics Processing Units) or TPUs (Tensor Processing Units), which are capable of conducting numerous parallel operations.
Alternatively, strategies such as model acceleration or sparsification, including pruning and quantization, can be employed to speed up the inference rate [10].

As stated in Section 3.2, the target hardware unit within this project is an APC containing only a CPU. Thus, common hardware acceleration techniques such as the use of GPUs are not available. Instead, the focus is shifted towards model acceleration and sparsification to reduce the complexity of the model and in return increase the model throughput. To achieve this without compromising accuracy to a significant extent, different tools can be applied. Within this project, a few methods were investigated and are briefly highlighted below.

4.4.4.1 ONNX Runtime

ONNX Runtime [27] is a machine-learning engine aimed at executing inference on a wide range of platforms and hardware to accelerate the throughput. To obtain this, the engine analyzes the model’s graph and determines how it can be optimized for execution. The model is then partitioned, and the engine can dynamically assign computational tasks, thus ensuring efficient execution of individual tasks and a holistic optimization of the entire model.

4.4.4.2 OpenVINO

For an Intel-based system, OpenVINO [28], short for Open Visual Inference & Neural Network Optimization, can be applied to optimize and improve inference on a target hardware application. Developed by Intel, the tool compresses deep learning models and supports deployment and hardware optimization for a large number of Intel CPUs, taking advantage of the specific hardware capabilities of each supported device.

4.4.4.3 DeepSparse

DeepSparse [29] is an engine that utilizes sparsity to accelerate inference within neural networks on CPUs. By utilizing structured and unstructured sparsity, weights with no impact on the result are known in advance and can thereby be skipped during runtime.
To further optimize for CPU architectures, the runtime computations are organized into “Tensor-columns”, allowing effective cache utilization. This is done by reducing the amount of data movement in and out of the larger cache memories, which is usually a large bottleneck for memory-bound systems [29].

The DeepSparse tool facilitates acceleration for both dense models and models sparsified through quantization and pruning. In this project, both model types are evaluated to investigate the performance enhancements of a reduction in model complexity. The evaluation and selection of the acceleration method are presented in Chapter 7.1.3.

5 Motion Planning

The primary objective of a motion planner is to determine how a mobile unit should navigate through a specified environment. This includes deciding the desired path that the mobile unit should take, as well as its associated states such as position, velocity, and pose at each point in time. To achieve this, the motion planning problem is divided into two parts, a global path planner and a trajectory planner. The global path planner links the mobile unit’s initial state to a set of specified goal states, and the trajectory planner then locally plans the desired states along the path, taking the physical constraints into consideration, similar to [30]. Within this project, the environment is considered static and all objects are mapped beforehand within the 2D space.

5.1 Path Planner

The path planner aims at finding a path between the start and goal states. By assuming that no dynamic obstacles except for the mobile unit itself affect the environment, a predetermined map can be used to determine the desired path. This involves the creation of a grid map of the demonstration area, where the environment is discretized into a series of nodes. These nodes serve as stations in defining and facilitating the navigation of the mobile unit’s path.
With a grid map defined, multiple approaches can be taken to find an optimal path. Conventional techniques such as search-based algorithms like A*, or sampling-based algorithms such as RRT, are commonly used [31,32]. However, within this project, the focus is not on finding the optimal path in terms of distance, energy minimization or time, but rather on ensuring that the path is aesthetically pleasing, easy to modify and smooth. Therefore a graph-based approach is used, where consecutive nodes are connected by the shortest Euclidean distance, i.e. a straight line, which is interpolated and used as the desired path for the mobile unit. This was done to reduce computational complexity and rely on the motion controller to maintain a smooth and collision-free path. A simplified example of such a grid map with a path is shown in Figure 5.1 to visualize how the system internally interprets the environment.

Figure 5.1: Visual representation of the grid map with defined station nodes.

To define a desired velocity and behaviour when approaching the different stations, a schedule is also provided to the path planner. This entails a specification of when the mobile unit should arrive at each station, allowing the system to incorporate more aspects of the desired behaviour into the final motion plan. An overview of the path planner is presented in Figure 5.2.

[Figure 5.2, flowchart steps in order]
1. Load graph & map: Define the grid map and environment, marking obstacles and nodes.
2. Set schedule: Incorporate schedule constraints into pathfinding.
3. Create environment and generate node to node path: Combine the environmental setup and the application of the pathfinding algorithm.
4. Linear interpolation between nodes: Smooth the path by calculating intermediate points between nodes.
5. Trajectory planner.

Figure 5.2: Flowchart of path generation in a grid map environment with a description for each process.
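To make the role of the schedule concrete, the sketch below turns a list of station nodes and arrival times into straight-line legs with a nominal speed per leg. It is only an illustration: the thesis does not specify the schedule format, so the data layout, the function name `plan_legs` and the units (metres and seconds) are assumptions.

```python
import math


def plan_legs(stations, schedule):
    """Build straight-line legs between scheduled stations.

    stations: dict mapping station name -> (x, y) node position.
    schedule: list of (station name, time in seconds) in visiting order;
              the first entry is the start station and its departure time.
    """
    legs = []
    for (name_a, t_a), (name_b, t_b) in zip(schedule, schedule[1:]):
        if t_b <= t_a:
            raise ValueError("arrival times must be strictly increasing")
        # Shortest Euclidean distance between the two station nodes.
        length = math.dist(stations[name_a], stations[name_b])
        legs.append({
            "from": name_a,
            "to": name_b,
            "length": length,
            # Average speed needed to arrive on time (before any saturation).
            "nominal_speed": length / (t_b - t_a),
        })
    return legs


# Hypothetical stations and schedule for illustration only.
stations = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (3.0, 10.0)}
schedule = [("A", 0.0), ("B", 10.0), ("C", 22.0)]
for leg in plan_legs(stations, schedule):
    print(leg["from"], "->", leg["to"], leg["length"], leg["nominal_speed"])
```

Keeping the stations and the schedule as plain data fits the stated goal of a path that is easy to modify: moving a station or changing an arrival time only changes the inputs.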
The purpose of this system is to generate a navigable path within a predefined environment using the Cartesian coordinates x, y. The algorithm comprises processing a schedule, generating a node-to-node path, and performing linear interpolation between nodes to smooth out the path.

5.2 Trajectory Planner

The purpose of a trajectory planner is to enable a robot to navigate its desired path in a way that respects its physical limitations [33]. This involves determining a set of reference states along the desired path that the mobile unit should adhere to. The trajectory planner is designed as a subsystem within the greater motion planning system and provides the reference states over the control horizon at each time step, k. It translates the path information so that the control system can manage detailed motor instructions, taking into account the mobile unit’s current state, the planned node-to-node path, and the dynamic constraints of the environment.

With the information from the path planner, a set of references can be defined for the mobile unit based on the given path. The states of the model include the x and y coordinates and the heading angle θ. Additionally, a longitudinal reference velocity, v, is incorporated into the state vector, where the velocity reference is based on the distance to the next station and the desired arrival time. To ensure that the references are physically achievable, the linear velocity and heading angle references are saturated according to the physical constraints. For the same reason, the x and y coordinates are limited to the boundary of the predefined demonstration area.

The main goal of the reference generation is to discretize a continuous reference path into a series of states over the control horizon, N.
The framework for generating a linear reference trajectory uses two principal functions: global path sampling and segment-wise interpolation. The global path sampling function systematically invokes the segment-wise interpolation to create a path of uniformly spaced points from a given set of waypoints.

The global path sampling function operates on a set of n waypoints, $W = \{W_1, W_2, \ldots, W_n\}$, that define the trajectory. The objective is to construct a sequence of points P that captures the path with a desired resolution. The step size, ∆s, is calculated as the product of the vehicle’s velocity v and the control system’s sampling time interval ∆t. The function iteratively samples each segment of the trajectory, where the distance ∆s determines the gap to the next node on the generated path.

Within each segment between consecutive waypoints, $W_i$ and $W_{i+1}$, segment-wise interpolation is executed. For a segment of length L, where $L = \lVert W_{i+1} - W_i \rVert$, a series of intermediate points is computed based on the linear interpolation principle:

$$ P(\lambda) = W_i + \lambda\,(W_{i+1} - W_i), \tag{5.1} $$

where λ is a parameter that increments in steps sized to maintain the spacing ∆s, terminating once the segment is fully sampled. The segment’s interpolated points are:

$$ P_k = W_i + \left( \frac{k \cdot \Delta s + R}{L} \right) (W_{i+1} - W_i) \tag{5.2} $$

for k = 1, 2, . . . such that k · ∆s ≤ L, where R is the remainder from the previous segment’s interpolation, ensuring that the spacing between points remains consistent across the segment boundaries. Each segment-wise interpolation yields a set of points and a new remainder, which is carried forward to the subsequent segment, preserving the geometric resolution.

The linear reference generation transforms a continuous trajectory into a series of discrete, equally spaced waypoints and a heading angle. The waypoints, together with the heading angle, serve as a reference for the control system to dictate the movement of the mobile unit along the predefined path.
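The two functions described above, global path sampling and segment-wise interpolation, can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the project: the function names `sample_segment`, `sample_path` and `headings` are hypothetical, and the sign convention of the carried remainder is an assumption (here `carried` is the distance already travelled since the last emitted point, so the first sample of a new segment lies at ∆s minus that distance).

```python
import math


def sample_segment(w_start, w_end, step, carried):
    """Sample one straight segment at arc-length spacing `step`.

    `carried` is the distance travelled since the last emitted point
    (assumed convention, 0 <= carried < step). Returns the sampled
    points and the distance carried into the next segment.
    """
    dx, dy = w_end[0] - w_start[0], w_end[1] - w_start[1]
    length = math.hypot(dx, dy)
    points = []
    if length == 0.0:
        return points, carried
    dist = step - carried  # distance from w_start to the first sample
    while dist <= length + 1e-9:
        lam = min(dist / length, 1.0)  # interpolation parameter of (5.1)
        points.append((w_start[0] + lam * dx, w_start[1] + lam * dy))
        dist += step
    last_sample = dist - step  # equals -carried if no point was emitted
    return points, max(0.0, length - last_sample)


def sample_path(waypoints, v, dt):
    """Global path sampling: step size is velocity times sampling time."""
    step = v * dt
    if step <= 0.0:
        raise ValueError("v * dt must be positive")
    points = [tuple(waypoints[0])]
    carried = 0.0
    for w_start, w_end in zip(waypoints, waypoints[1:]):
        segment_points, carried = sample_segment(w_start, w_end, step, carried)
        points.extend(segment_points)
    return points


def headings(points):
    """Heading angle of each straight piece between consecutive points."""
    return [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(points, points[1:])]
```

For example, `sample_path([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], v=0.4, dt=1.0)` places points every 0.4 m along the path, and the remainder of 0.2 m left over at the corner is carried into the second segment so that the spacing along the path stays constant.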
The algorithm uses linear interpolation to ensure a predictable outcome. Given two known points, the interpolated points will always lie directly between them on a straight line, allowing for a path that is both smooth and efficient.

In this project, several nodes are established to manage the operational location of the mobile unit. Upon arriving at a designated node, the mobile unit stops its movement for a predetermined duration to execute specific tasks at that station. As an example of a station task, the unit could pause at a station equipped with QR code identification technology until it is recognized, after which it will proceed to the next designated node. The system used at OrangePoint has four such stations, which are predefined by entering the x and y coordinates of each station, along with a priority level that determines the sequence in which the stations are visited. After completing the sequence, the unit either resets and begins the cycle from the beginning or terminates its operation at the final station, depending on the user input in the configuration file.

The trajectory planner is horizon-based, which allows for planning over a predefined number of steps or time intervals into the future on the given path. This will be referred to as the reference horizon. This approach enables the trajectory planner to respond to future conditions and objectives, allowing for adjustments to the set of references as new information becomes available [34]. The length of the reference horizon is directly dependent on the control horizon in the motion control system, balancing the benefits of foresight against the need for timely decision-making.

6 Motion Control

To track the desired trajectory, a motion control system is developed. The system can be further divided into two main segments, a high-level controller and a low-level controller.
The high-level controller is tasked with the core computations, aiming to minimize the deviation between the reference and the estimated position. The deviations from the intended path are converted into actuation requests, such as steering angle and longitudinal velocity, to reduce the deviation over time. The low-level controller then acts as an allocator, converting the requested actions into motor commands and facilitating communication between the APC and the remote mobile unit. A top-level overview of the motion control system, including the flow and interaction of signals, is presented in Figure 6.1.

Figure 6.1: Simplified motion control overview.

The motion control architecture of Figure 6.1 shows multiple interconnected subsystems, together resulting in the actuation of the mobile unit. Subsequent sections provide a more detailed description of each subsystem within this figure.

6.1 Motion Models

A motion model is derived to capture the dynamic behaviour of the mobile unit used in this project and how current actions affect the system’s future states. The motion model is crucial for system validation because it is integrated into the simulator. Additionally, it can be used in more complex controllers, enabling the prediction of future states based on current states and actions.

Based on the design of the mobile units detailed in Chapter 3.2, two motion models are derived: one with a rigid body and another featuring an additional degree of freedom between the head and the trailer, known as an articulated body. Each model is a simplification of reality and some approximations have been made. For example, the Ackermann steering is considered parallel and the power distribution between the front and rear axles is neglected.

6.1.1 Rigid Motion Model

Within this project, a simplified car model [35] is used to derive the dynamics of the rigid mobile unit.
Due to the wheel alignment and steering configuration, certain constraints are imposed on the car, limiting the rotation around its z-axis proportional to its wheelbase. Figure 6.2 shows the rigid motion model with its position in two-dimensional space and heading orientation.

Figure 6.2: Simplified model of the rigid mobile unit.

From Figure 6.2, the discrete state vector can be represented by $\mathbf{x}_k = [x_k, y_k, v_k, \theta_k]^T$. The control inputs coupled to the states are denoted by $u_k = [a_k, \delta_k]$, where $a_k$ is the longitudinal acceleration and $\delta_k$ is the steering angle deviation from its zero position. To further model the mobile unit’s dynamics, a non-linear motion model in continuous time is derived in (6.1) with the mentioned states and control actions.

$$ \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{v} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} v \cos(\theta) \\ v \sin(\theta) \\ a \\ \frac{v}{L} \tan(\delta) \end{bmatrix} \tag{6.1} $$

To find numerical solutions to the differential equations of (6.1) in discrete time, the system is discretized using forward Euler discretization:

$$ \hat{\mathbf{x}}_{k+1} = \hat{\mathbf{x}}_k + f(\hat{\mathbf{x}}_k, u_k)\,\Delta t. \tag{6.2} $$

The non-linear motion model is thereafter implemented into the simulator.

To further simplify the model for linear control system applications, the motion model is linearized. Because of the physical limitations of the mobile unit, the states can only change by a small amount within a short time frame. A first-order Taylor expansion around a nominal state $(\bar{\mathbf{x}}, \bar{u})$ is therefore used, since it is accurate for small deviations from that state and keeps the mismatch between the linear and non-linear model small around the given operating point. The final linearized motion model becomes:

$$ \hat{\mathbf{x}}_{k+1} = A\mathbf{x}_k + Bu_k + C. \tag{6.3} $$

When using the linear motion model in control algorithms, the operating point is continuously updated to ensure the accuracy of the motion model. Below are the A and B matrices, together with the correction matrix C, presented at an arbitrary operating point $(\bar{v}_k, \bar{\theta}_k, \bar{\delta}_k)$.
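$$ A = (I + A'\Delta t) = \begin{bmatrix} 1 & 0 & \cos(\bar{\theta}_k)\Delta t & -\bar{v}_k \sin(\bar{\theta}_k)\Delta t \\ 0 & 1 & \sin(\bar{\theta}_k)\Delta t & \bar{v}_k \cos(\bar{\theta}_k)\Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{\tan(\bar{\delta}_k)}{L}\Delta t & 1 \end{bmatrix} \tag{6.4} $$

$$ B = (B'\Delta t) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \Delta t & 0 \\ 0 & \frac{\bar{v}_k}{L \cos^2(\bar{\delta}_k)}\Delta t \end{bmatrix} \tag{6.5} $$

$$ C = \begin{bmatrix} \bar{v}_k \sin(\bar{\theta}_k)\,\bar{\theta}_k \Delta t \\ -\bar{v}_k \cos(\bar{\theta}_k)\,\bar{\theta}_k \Delta t \\ 0 \\ -\frac{\bar{v}_k \bar{\delta}_k}{L \cos^2(\bar{\delta}_k)}\Delta t \end{bmatrix} \tag{6.6} $$

The C-matrix of (6.6) is a correction term that accounts for the difference between the actual system dynamics and their linear approximation at the operating point. With A′ and B′ denoting the continuous-time Jacobians used in (6.4) and (6.5), it is given by:

$$ C = \left( f(\bar{\mathbf{x}}, \bar{u}) - A'\bar{\mathbf{x}} - B'\bar{u} \right)\Delta t. \tag{6.7} $$

The third row of C is zero because the acceleration enters the model linearly, and the sign of the last row follows directly from evaluating (6.7) with the model (6.1).

6.1.2 Articulated Motion Model

To account for the additional degree of freedom between the head and trailer of the mobile unit shown in Figure 3.3b, the motion model (6.3) is augmented with an additional state ψ as shown in Figure 6.3.

Figure 6.3: Simplified model of the articulated mobile unit.

Figure 6.3 shows the articulated mobile unit, highlighting the variables affecting the relative joint angle. Based on the model given in Figure 6.3 and [36], the relative joint angle, ψ, can be derived as the deviation between the heading of the trailer and the heading of the mobile unit. The rate of change of the relative joint angle is:

$$ \dot{\psi} = \dot{\theta}_t - \dot{\theta} = \frac{v}{L_t} \sin(\theta - \theta_t) - \frac{v}{L} \tan(\delta). \tag{6.8} $$

As there is no wheelbase affecting the steering capabilities for the head of the mobile unit, and assuming small joint angle deviations, (6.8) can be approximated as:

$$ \dot{\psi} \approx \frac{v}{L_t} \sin(\psi - \delta). \tag{6.9} $$

The rate of change in the relative joint angle can then be added to the current angle and implemented in the augmented non-linear motion model as an additional state:

$$ \begin{bmatrix} x_{k+1} \\ y_{k+1} \\ v_{k+1} \\ \theta_{k+1} \\ \psi_{k+1} \end{bmatrix} = \begin{bmatrix} x_k + \Delta t \cdot v_k \cos(\theta_k) \\ y_k + \Delta t \cdot v_k \sin(\theta_k) \\ v_k + \Delta t \cdot a_k \\ \theta_k + \Delta t \cdot \frac{v_k}{L} \tan(\delta_k) \\ \psi_k + \Delta t \cdot \frac{v_k}{L_t} \sin(\psi_k - \delta_k) \end{bmatrix} \tag{6.10} $$

The motion model is linearized in the same way as the rigid motion model above.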
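As a complement to the rigid model of Section 6.1.1, the following NumPy sketch implements the continuous-time model (6.1), the forward Euler step (6.2) and the linearized matrices (6.4)-(6.6). It is an illustration under stated assumptions rather than the project code: the function names are hypothetical, `wheelbase` stands for L, and the state ordering [x, y, v, θ] follows the state vector given in the text.

```python
import numpy as np


def f(state, u, wheelbase):
    """Continuous-time rigid model (6.1); state = [x, y, v, theta], u = [a, delta]."""
    _, _, v, theta = state
    a, delta = u
    return np.array([
        v * np.cos(theta),
        v * np.sin(theta),
        a,
        v / wheelbase * np.tan(delta),
    ])


def euler_step(state, u, wheelbase, dt):
    """Forward Euler discretization (6.2)."""
    return np.asarray(state, dtype=float) + f(state, u, wheelbase) * dt


def linearize(v_bar, theta_bar, delta_bar, wheelbase, dt):
    """Discrete A, B and correction term C at the operating point, as in (6.4)-(6.6)."""
    A = np.eye(4)
    A[0, 2] = np.cos(theta_bar) * dt
    A[0, 3] = -v_bar * np.sin(theta_bar) * dt
    A[1, 2] = np.sin(theta_bar) * dt
    A[1, 3] = v_bar * np.cos(theta_bar) * dt
    A[3, 2] = np.tan(delta_bar) / wheelbase * dt

    B = np.zeros((4, 2))
    B[2, 0] = dt
    B[3, 1] = v_bar / (wheelbase * np.cos(delta_bar) ** 2) * dt

    C = np.array([
        v_bar * np.sin(theta_bar) * theta_bar * dt,
        -v_bar * np.cos(theta_bar) * theta_bar * dt,
        0.0,
        -v_bar * delta_bar / (wheelbase * np.cos(delta_bar) ** 2) * dt,
    ])
    return A, B, C
```

A useful sanity check is that the affine model reproduces the non-linear Euler step exactly at the operating point, that is, A x̄ + B ū + C equals x̄ + f(x̄, ū)∆t. Away from the operating point the two models drift apart, which is why the operating point is updated continuously.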
6.2 High-level Controller

The primary objective of the high-level controller is to minimize the trajectory deviation from the setpoints given by the motion planner. Achieving this objective involves a recurrent process of identifying a sequence of viable control signals that ensure that the mobile unit adheres to the intended trajectory. The complexity of this type of system can vary considerably, where the constraints and cost minimization can be handled as two separate entities or incorporated into a controller that can handle both. This section presents the implementation of two different control strategies, a classical PID controller and a model predictive controller (MPC), with the intent of evaluating what different levels of complexity yield in terms of performance.

6.2.1 PID Control

To ensure a fully working system and to set a baseline for trajectory tracking, a simple Proportional-Integral-Derivative (PID) controller was implemented. To do so, the control objective was defined as a single-input single-output (SISO) system with the sole objective of tracking the current desired position and correcting the steering angle to minimize the deviation. The discrete expression for the PID controller is:

$$ u_\delta[k] = K_p e[k] + K_i \Delta t \sum_{i=0}^{k} e[i] + K_d \frac{e[k] - e[k-1]}{\Delta t} \tag{6.11} $$

Here $K_p$, $K_i$ and $K_d$ are weights that were manually tuned to improve the performance during testing. The control action, denoted $u_\delta$, for each iteration, k, combines the current error e[k], the accumulated error, and the change from the previous error e[k−1], each scaled by its predefined weight. The sum yields a control action that changes the steering angle to mitigate the observed error. The error term, e[k], represents the current deviation between the mobile unit’s estimated centre point and the reference position.
The calculated error at each sample is given by:

$$ e[k] = \lVert \mathbf{x}_{ref,k} - \hat{\mathbf{x}}_k \rVert_2, \tag{6.12} $$

where $\hat{\mathbf{x}}_k$ contains the estimated x and y coordinates of the mobile unit and $\mathbf{x}_{ref,k}$ is the corresponding reference point. The error calculation serves as the foundation for determining the steering angle in the next iteration. To ensure that the resulting control action is within the mobile unit’s feasible operational bounds, the control action is saturated and an anti-windup solution is incorporated.

Since the mobile unit operates at low speeds and reaches the desired longitudinal velocity within a reasonable time on all surfaces in the demonstration area, there is no need for a dedicated controller for the longitudinal velocity input. The velocity command is instead directly based on the reference from the trajectory planner.

6.2.2 Model Predictive Control

To further improve the motion control system, more advanced controllers were investigated, allowing the mobile unit to handle complex tasks and solve difficult manoeuvres in tight environments. This could be achieved in multiple ways, but a method with forward-looking capabilities that can incorporate constraints and limitations into the problem formulation was desired. A full-state feedback design and the ability to model constraints could be more effective in navigating tight environments than previous controllers, as the end goal is to achieve a smoothly controlled mobile unit [37]. For these reasons, an MPC approach was chosen.

MPC is a control strategy that explicitly accounts for future events to make current decisions. Unlike PID controllers, which react to present errors, MPC formulates an optimization problem that predicts future system behaviour over a given prediction horizon, solving for the optimal control inputs at each step [38].
This forward-looking capability allows this type of controller to manage constraints and multiple-input multiple-output (MIMO) systems more effectively, which was desired in this project.

An MPC formulation can contain both linear and non-linear dynamics. Non-linear problems, while potentially more precise, are also more computationally demanding [39]. In contrast, linear problems, though approximations, can be kept convex, requiring less computational power to solve. Based on the available computational resources for this project and the fact that higher precision is deemed unnecessary for this application, a linear quadratic control problem is formulated with the linear motion models presented in Chapter 6.1. The motion model is chosen depending on which mobile unit is used. The final problem formulation and the objective function are further described below.

6.2.2.1 Cost and Constraints

To ensure that the system maintains the desired reference trajectories with smooth behaviour, an objective function is formulated that incorporates a set of costs designed to penalize undesired behaviours. Each cost component is treated as a soft constraint. This approach requires fewer computational resources than methods with hard constraints and inequalities, even if a harder constrained problem yields a smaller feasible set [40]. However, it presents a trade-off between accuracy and computational complexity. The problem formulation in this project aims to minimize the overall cost, requiring more tuning to achieve the desired results, while a harder-constrained problem is less dependent on the tuning but instead is more computationally expensive. The formulation of each cost and constraint is further described below, inspired by [41,42].
Similar to the PID controller, a state deviation cost is derived, in this case providing full state feedback where the deviation between the reference states $\mathbf{x}_{ref} = [x_{ref}, y_{ref}, v_{ref}, \theta_{ref}]^T$ and the current state vector $\hat{\mathbf{x}}$ is found at each step k. Each state deviation is penalized with a cost matrix Q and computed over the entire prediction horizon, N. Additionally, a terminal cost is added to the final state deviation at the horizon N, together accumulating the total cost for reference path deviation over the entire horizon:

$$ J_x = \lVert \mathbf{x}_{ref,k} - \hat{\mathbf{x}}_k \rVert_Q^2, \quad k \in \mathbb{N}_{[0,N]} \tag{6.13} $$

$$ J_\tau = \lVert \mathbf{x}_{ref,N} - \hat{\mathbf{x}}_N \rVert_{Q_t}^2 \tag{6.14} $$

Additionally, to maintain the desired velocity given by the motion planner and minimize steering effort, an actuation cost is implemented (6.15), weighted with a matrix denoted R. To avoid oscillations and fast action changes, acceleration and steering jerk are also penalized as an additional cost (6.16).

$$ J_u = \lVert u_k \rVert_R^2, \quad k \in \mathbb{N}_{[0,N-1]} \tag{6.15} $$

$$ J_{u'} = \lVert u_{k+1} - u_k \rVert_{R_d}^2, \quad k \in \mathbb{N}_{[0,N-1]} \tag{6.16} $$

To define the feasible region of the solution and determine the bounds, a set of inequality constraints is defined. These constraints limit the feasible region, helping the solver to find an optimum within these bounds. Specifically, the control inputs $u_k$ are constrained by minimum and maximum allowable values:

$$ u_{min} \leq u_k \leq u_{max}, \tag{6.17} $$

ensuring that the vehicle’s actuators operate within safe and efficient limits. To constrain the feasible solution to be within the bounding box of the demonstration area, four additional inequalities were added as upper and lower bounds, denoted $x_b$ and $y_b$:

$$ x_{b,min} \leq x_k \leq x_{b,max}, \qquad y_{b,min} \leq y_k \leq y_{b,max}. \tag{6.18} $$

These constraints limit the controller’s horizon to be inside the area, thus avoiding a series of actions that could lead to collisions with the walls.
In addition to the inequalities, an equality constraint is defined to ensure that the solution strictly adheres to the modelled system behaviour, ensuring physically feasible solutions:

$$ \mathbf{x}_{k+1} = f(\mathbf{x}_k, u_k). \tag{6.19} $$

Together, these constraints ensure that the solution derived from minimizing the cost of the objective function adheres to the physical and operational limits of the mobile units, with the main intent of estimating and achieving feasible actions that the mobile unit can perform.

6.2.2.2 Problem Formulation

With all terms in the objective function being quadratic and the constraints being linear, the final optimization problem, given in (6.20) and (6.21), is formulated as a quadratic minimization problem over the horizon N. When the quadratic problem is convex, any local minimum is also a global minimum, ensuring the optimal solution at each iteration.

$$ \min_{u} \; \sum_{k=0}^{N-1} \left[ J_x + J_u + J_{u'} \right] + J_\tau \tag{6.20} $$

$$ \begin{aligned} \text{s.t.} \quad \forall k \in \mathbb{N}_{[0,N-1]}: \quad & u_{min} \leq u_k \leq u_{max} \\ & x_{b,min} \leq x_k \leq x_{b,max} \\ & y_{b,min} \leq y_k \leq y_{b,max} \\ & \mathbf{x}_{k+1} = f(\mathbf{x}_k, u_k) \end{aligned} \tag{6.21} $$

The quadratic problem formulation above entails a predictable and low-cost solution in terms of computational complexity, as iterating through several local minima can be avoided if the problem is convex. With all terms in the objective function being quadratic and using only linear constraints, convexity can be guaranteed by ensuring that the weight matrices of the objective function are positive semi-definite.

There is a wide range of available solvers that can efficiently solve convex problems. In this project, the interior-point solver ECOS, accessed through the CVXPY library [43], is chosen. The interior-point method transforms the original problem into a sequence of approximate problems, which become progressively closer to the original problem. Rather than handling the constraints directly, this method uses barrier functions that make the cost of approaching the boundary of the feasible region tend towards infinity [44].
This keeps the solution within the feasible region without handling the constraints in a way that would increase computational complexity.

6.3 Low-level Control

As previously stated, the actuation of the mobile unit is facilitated by the implementation of a low-level controller. The subsystem serves an intermediate role by processing the high-level motion requests into specific, executable motor commands on the mobile unit. The available motor commands may vary based on the motor configuration of the mobile unit. In this thesis, the mobile unit is controlled through steering and velocity commands.

To maintain centralized control of the entire system, and because the internal processing unit of the mobile device is not directly accessible or adequate, the low-level controller is located on the APC. From this setup, motor commands are transmitted to the mobile unit via Bluetooth. This arrangement ensures reliable delivery of actuation commands to the mobile unit once a Bluetooth connection is successfully established, eliminating the need for physical access to its internal components. Each hub on a Lego Truck possesses a unique Bluetooth ID, so a connection to the intended mobile unit can be established. This centralized approach also opens up the possibility of controlling multiple mobile units from a single APC in the future.

Discretized with zero-order hold (ZOH), the mobile units hold the previous actuation commands between samples. The physical limitations within the mobile unit also resulted in additional latency between the requested and fully completed actuation. To mitigate a buffer build-up, which would result in increasing latency over time and race conditions, the low-level controller cannot send commands without the previous request being completed.
To ensure this, a mutex lock is implemented to verify that the mobile unit has processed and completed the previously requested command before another request is sent. At the same time, the buffer is handled by always selecting the most recent actuation request.

7 Results

The performance of the system and its subsystems was evaluated by a series of tests, both in simulation and on the physical hardware. This chapter presents the results, covering both a system overview and the test of each subsystem to ensure that it meets the requirements presented in Table 3.1. All tests were performed on an Intel i5-8350U CPU with 4 cores and a clock speed of 1.7 GHz, together with 16 GB of RAM. The specifications are similar to the upgraded version of the APC presented in Chapter 3.2.

7.1 Perception

In this project, the perception system is designed to process and interpret data from a single sensor input, the camera, for use by the rest of the system. The precision and latency of the system therefore largely depend on the performance of the perception system. The perception subsystem is evaluated first separately and later together with the entire system. The main applications of the subsystem are further evaluated and presented below.

7.1.1 Dataset and Model Training Evaluation

To ensure consistent detection and tracking of the mobile unit, a dataset from the demonstration area was used and the training results are shown in Figure 7.1.

Figure 7.1: Performance metrics of model training over 25 training epochs.

To assess the model’s learning and adaptation to the new dataset, some performance metrics have been extracted and displayed in Figure 7.1. The box_loss and cls_loss curves show that the model improves over the training epochs, both in locating and classifying objects correctly.
The third column in Figure 7.1 covers the distribution focal loss, denoted dfl_loss, indicating an increased correlation between the estimation of the bounding box coordinates and the ground truth specified in the dataset. The four plots on the right in Figure 7.1 highlight the final model’s performance. These results display high values in both Precision and Recall, indicating that the model can distinguish well between desired and non-desired objects. Higher values for these metrics imply fewer false positives and false negatives. Additionally, based on the metrics mAP50 and mAP50-95, the results suggest a rapid improvement in the model’s ability to predict the bounding boxes for objects with at least 50 % overlap, as well as for IoU thresholds ranging from 50 to 95 %. These results indicate that the model has achieved a high level of accuracy in consistent object detection and tracking. However, some fluctuations in the precision metric suggest that the model was more inconsistent in detecting true positives in the early stages of training.

7.1.2 State Estimation Accuracy

Estimating the mobile unit’s state is the key component of the perception system and should be done with high accuracy to yield a stable and responsive system. Within this project, the main states estimated with the perception system were the Cartesian x and y coordinates of the mobile unit and the linear velocity. The remaining variables in the state vector, X, were deemed better suited to estimation with the internal motion model. Given a set of initial states, the states in the next iteration can be estimated from the current actions set by the motion controller, thus mitigating the need to estimate the pose with the perception system.

As the estimated position in the 2D space is critical to ensure that the mobile unit follows the desired trajectory as intended, the accuracy of the estimated x and y coordinates is evaluated in a small test.
The test involved comparing the measured true position to the estimated one given by the perception system. Table 7.1 shows the average deviation between the true and estimated position in the 2D space.

Table 7.1: Standard deviation of position measurements

Position measurement    Deviation from true position
x                       0.0366 m
y                       0.0133 m
|x, y|                  0.0389 m

The results presented in Table 7.1 show that the estimated position can deviate approximately 4 cm from the true position. The position can deviate in any direction. The deviation test was performed with a camera height of 2.28 m above the ground, facing directly downwards.

7.1.3 Process Acceleration

To evaluate the performance of each acceleration method presented in Chapter 4.4.4, a sample video similar to the application is used. The sample video was a unique set of frames, different from the frames in the dataset used for training the model. With this, the latency of the system (including pre-processing, inference, and post-processing) is measured. Thereafter, the frameworks are compared to find which acceleration method yields the lowest computational time without compromising accuracy to a large extent. The results from the performed tests are presented in Figure 7.2 and Table 7.2.

Figure 7.2: Average latency for different object detection frameworks during runtime.

Table 7.2: Average accuracy measurements of different acceleration methods.

Method                  Model accuracy
.pt                     74 %
ONNX                    79 %
OpenVINO                81 %
ONNX Runtime            69 %
DeepSparse Dense        91 %
DeepSparse Quantized    54 %

Table 7.2 presents the different acceleration methods and their accuracy rating. The accuracy rating is measured as the average confidence score of the object detection during the test, where the period and frames were the same for all methods. As shown in Figure 7.2, the DeepSparse framework provides the lowest latency on average, particularly with a dense model configuration.
The results in Table 7.2 for the dense and quantized DeepSparse models reveal the trade-off between speed and accuracy. Both DeepSparse models indicate a reliable level of performance across different configurations. However, the dense model stands out for its balance of speed and precision, as the loss of accuracy for the quantized model was significantly higher. The dense model with the DeepSparse engine was therefore used for the remaining tests.

7.2 Motion Control

In the motion control system, two controllers were implemented: a PID controller and an MPC controller, each with significantly different levels of complexity. The main goal of this evaluation was to establish the complexity level required to achieve the desired results when navigating the tight demonstration area with the given mobile units. To evaluate the controllers, the average deviation between the reference and the measured position of the mobile unit is computed at each point, as shown by (7.1), over the simulation time, $T_{tot}$:

$$ e_{avg} = \frac{1}{T_{tot}} \sum_{k=0}^{T_{tot}} \sqrt{(x_{r,k} - \tilde{x}_k)^2 + (y_{r,k} - \tilde{y}_k)^2}. \tag{7.1} $$

Additionally, the maximum path deviation $e_{max}$ and the standard deviation σ are computed. The standard deviation is given by (7.2) and is used to evaluate the reliability and efficiency of the mobile unit navigation.

$$ \sigma = \sqrt{\frac{\sum_{k=0}^{T_{tot}} (e_k - e_{avg})^2}{T_{tot}}} \tag{7.2} $$

Other than evaluating the two motion controllers’ ability to follow the desired reference states, the solution time of each controller, its ability to solve more complex problems, and its ability to reach all desired stations within a given tolerance are investigated.

During the performed tests, both in simulation and on the hardware, the PID controller followed a set reference velocity while the setup for the MPC controller allowed for adaptive velocity planning.
This gave the MPC controller the ability to regulate its speed while performing different manoeuvres, incorporating it into the optimization problem.

7.2.1 Test Scenarios

To evaluate both motion control strategies used within the project, a set of test scenarios were defined. Each controller is evaluated in all scenarios, both in simulation and on the hardware, to verify performance and discrepancies between the simulation results and the hardware performance. Some of the tests are described in more detail below.

Figure 7.3 shows a simple test similar to a step response. However, due to the turning capabilities of the mobile units used within this project, the reference change is not a 90-degree turn. The test was done to evaluate the controllers’ settling time for a reasonable reference change.

(a) Demonstration area with stations and a helping node (•). (b) Generated reference path from the motion planner.
Figure 7.3: Simple step response test.

The second test, presented in Figure 7.4, contains several stations and is made to resemble a realistic run, constructed in a way similar to what the company wants to use for demonstration purposes in the future. This entails going to several stations, shown in Figure 7.4a, and completing different tasks. Figure 7.4b shows the generated reference states given by the motion planner during the run.

(a) Demonstration area with several stations. (b) Generated reference path from the motion planner.
Figure 7.4: Full cycle test between stations.

With the given stations in each test, Figures 7.3b and 7.4b show the reference trajectory generated by the motion planner. With this approach, a simple and feasible path can be generated with more direct control over the exact path, making it more flexible to get the system to behave as intended in the demonstration area. One example of this is displayed in Figure 7.3, where a helping node is added between the two stations, indicating where the path should start its sharp turn.
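The tracking metrics used to evaluate these scenarios, the average deviation (7.1), the maximum deviation and the standard deviation (7.2), can be computed from logged reference and measured positions as in the sketch below. The function name `tracking_metrics` is hypothetical, and dividing by the number of samples is an assumption about how the normalization by $T_{tot}$ is applied to sampled data.

```python
import math


def tracking_metrics(reference, measured):
    """Return (e_avg, e_max, sigma) for paired (x, y) reference and measured points."""
    errors = [
        math.hypot(xr - xm, yr - ym)
        for (xr, yr), (xm, ym) in zip(reference, measured)
    ]
    n = len(errors)
    if n == 0:
        raise ValueError("at least one sample is required")
    e_avg = sum(errors) / n                                        # cf. (7.1)
    e_max = max(errors)
    sigma = math.sqrt(sum((e - e_avg) ** 2 for e in errors) / n)   # cf. (7.2)
    return e_avg, e_max, sigma
```

For instance, with a reference along the x-axis at (0, 0), (1, 0), (2, 0) and measurements at (0, 0), (1, 0.3), (2, 0.6), the point-wise errors are 0, 0.3 and 0.6 m, so the average deviation is 0.3 m and the maximum deviation is 0.6 m.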
7.2.2 Control Tuning

The tuning of the controllers was performed manually. The parameters of each controller, both the PID and the MPC, were kept the same during all hardware tests and simulations. The intent was to highlight the deviation between the simulator and hardware performance and also to see how a statically tuned system would affect the results for different reference changes.

The PID’s proportional gain (P) was set to an aggressive value to ensure quick correction and enable sharp turns. The integral action (I) included an anti-windup mechanism to reduce potential instability issues. The derivative component (D) was set to a low value to minimize oscillations and avoid excessive system changes. The parameters used for the PID controller during all physical tests and simulations are presented in Table 7.3.

Table 7.3: Tuning parameters for PID-controller.

Kp    1000
Ki    0.5
Kd    5

For the MPC controller, a prediction horizon of N = 5 was chosen as a balance between acceptable planning and avoiding shortcuts that could lead to missing stations. The constraints and cost functions were implemented as described in Chapter 6.2.2. The tuning of the MPC controller aimed at penalizing the position in x and y the most to minimize the deviation between the true position and the reference position. The weights for the velocity v and θ were configured to allow the mobile unit to slow down and reverse if necessa