Drone Platform for Safety in Testing
Bachelor thesis in Electrical Engineering

Abdullah Arshad, Arvid Boisen, Jesper Eriksson, Viggo Forsell, Albin Hallander, Erik Rödin

Department of Electrical Engineering
Division of Systems and Control
Chalmers University of Technology
Gothenburg, Sweden 2025

© Abdullah Arshad, 2025. © Arvid Boisen, 2025. © Jesper Eriksson, 2025. © Viggo Forsell, 2025. © Albin Hallander, 2025. © Erik Rödin, 2025.

Supervisors: Emmanuel Dean, Department of Electrical Engineering; Mikael Enelund, Department of Mechanics and Maritime Sciences; Robert Brenick, AstaZero
Examiner: Knut Åkesson, Department of Electrical Engineering

Bachelor Thesis 2025
Department of Electrical Engineering
Chalmers University of Technology
SE-412 96 Gothenburg, Sweden
Telephone +46 31 772 1000

Cover: A close-up shot of a DJI Mavic drone. Photo by Pok Rie, Pexels (https://www.pexels.com/photo/a-close-up-shot-of-a-dji-mavic-drone-13310698/).

Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Gothenburg, Sweden 2025

Abstract

This thesis addresses the need for improved safety surveillance in autonomous vehicle testing by developing a drone platform for multiple drones. Extending single-drone systems, this work focuses on coordinating multiple drones to provide wider area coverage at testing facilities. The backend of the developed system calculates drone positions based on test trajectories from ATOS, an open-source platform for automated vehicle testing, ensuring the required sensor overlap. Real-time video is streamed from DJI drones via an Android app using Web Real-Time Communication to a backend system.
The backend processes the streams, performs image stitching to create a panoramic view, and applies object detection with estimated GPS coordinates for bikes, cars, and pedestrians. A web-based frontend allows users to monitor individual and merged video feeds, view detected objects, and control the drones. The system architecture is containerized using Docker. Experimental validations demonstrate the platform's ability to control drones, stream video in real time, and provide extended surveillance with object detection. Key challenges and future improvements, including addressing hardware dependencies and automating configuration, are identified. The project represents a collaborative effort between Chalmers University of Technology and Pennsylvania State University.

Keywords: Drone Surveillance, Autonomous Vehicle Testing, Multi-Drone Systems, Object Detection, YOLO, WebRTC, Image Stitching, ATOS, AstaZero.

Sammandrag

Detta kandidatarbete behandlar behovet av förbättrad säkerhetsövervakning vid testning av autonoma fordon genom att utveckla en plattform med flera drönare. Som en vidareutveckling av tidigare system med endast en drönare, fokuserar detta arbete på att koordinera flera drönare för att ge täckning över större områden vid testanläggningar som AstaZero. Systemet beräknar optimala drönarpositioner baserat på testbanor från ATOS-systemet och säkerställer nödvändig kameraöverlappning. Realtidsvideo strömmas från DJI-drönare till ett backend-system via en Android-applikation som använder WebRTC. Backend-systemet bearbetar videoströmmarna, utför bildsammansättning (image stitching) för att skapa en panoramavy, och tillämpar objektdetektering med skattade GPS-koordinater. Ett webbaserat gränssnitt (frontend) möjliggör för användare att övervaka individuella och sammanfogade videoflöden, se objektdetekteringar och styra drönarna. Systemarkitekturen är uppbyggd med Docker-containrar. Validering visade att plattformen kan styra drönare, strömma video i realtid och erbjuda utökad övervakning med objektdetektering. Centrala utmaningar och framtida förbättringsmöjligheter identifieras, såsom att hantera hårdvaruberoenden och automatisera konfiguration. Projektet är resultatet av ett samarbete mellan Chalmers tekniska högskola och Pennsylvania State University.

Nyckelord: Drönarövervakning, Testning av autonoma fordon, Flerdrönarsystem, Objektdetektering, YOLO, WebRTC, Bildsammansättning, ATOS, AstaZero.

Acknowledgements

We wish to express our sincere gratitude to everyone who supported us throughout this thesis project. Firstly, we are very grateful to our supervisors at Chalmers, Emmanuel Dean and Mikael Enelund.
Their invaluable guidance, insightful feedback, and unwavering support were instrumental to our progress. Secondly, we thank Robert Brenick from RISE AstaZero, who was consistently available to answer technical questions and generously provided us with the opportunity to visit AstaZero's proving ground at RISE. Lastly, our work benefited greatly from connections at Penn State. Our peers and collaborators there, Suryansh Agrawal, Shashank Bommareddy, Aidan Brown, Rahique Mirza, Jaden Peacock, Eugene Sosa, and Jack Volgren, deserve sincere appreciation. Their work is admirable, and their readiness to meet and collaborate, often at short notice, was especially valued. Further thanks are extended to Darryl Farber for the valued invitation to the "Strategic Foresight and Global Governance of Critical Technologies & Sociotechnical Systems: Implications for Peace and Security 2025-2050" conference.

Preface

The work presented in this thesis stems from a significant international collaboration between Chalmers University of Technology (Chalmers) in Gothenburg, Sweden, and The Pennsylvania State University (Penn State) in State College, United States. From January to May 2025, six students from Chalmers and seven students from Penn State collaborated to develop a multi-drone safety surveillance system for autonomous vehicles and a frontend interface for drone control.

This partnership relied on continuous communication through weekly virtual meetings, fostering a regular exchange of ideas, code, and footage. The collaboration was further strengthened by a visit from the Chalmers team to Penn State, where they participated in a productive joint workshop with their Penn State teammates.

The project's interdisciplinary nature is reflected in the participants' fields of study: Automation and Mechatronics, Mechanical Engineering, Computer Science, Computer Engineering, and Computational Mathematics. This diverse expertise proved crucial to the outcomes presented herein.

This thesis covers the work performed in the spring of 2025 within this collaboration.
Abdullah Arshad, Arvid Boisen, Jesper Eriksson, Viggo Forsell, Albin Hallander, Erik Rödin
Gothenburg, May 2025

List of Acronyms

Below are the acronyms used throughout this thesis, listed in alphabetical order:

AI: Artificial Intelligence
API: Application Programming Interface
ATOS: Autonomous Vehicle Test Operating System
CD: Continuous Deployment
CI: Continuous Integration
CPU: Central Processing Unit
CUDA: Compute Unified Device Architecture
DJI: Da-Jiang Innovations, drone designer and manufacturer
FOV: Field Of View
GitHub: Developer platform that allows developers to create, store, manage, and share their code
GPU: Graphics Processing Unit
GUI: Graphical User Interface
HTTP: Hypertext Transfer Protocol
ICE: Interactive Connectivity Establishment, used in WebRTC
JSON: JavaScript Object Notation
MAVLink: Messaging protocol for communicating with drones
ML: Machine Learning
NAT: Network Address Translation
OBB: Oriented Bounding Box
Open-source: Source code that is made freely available for possible modification and redistribution
RISE: Research Institutes of Sweden
ROS: Robot Operating System
SDK: Software Development Kit
SDP: Session Description Protocol, used in WebRTC
SIFT: Scale-Invariant Feature Transform
STUN server: Session Traversal Utilities for NAT server
TCP: Transmission Control Protocol
TURN: Traversal Using Relays around NAT, used in WebRTC
WebRTC: Web Real-Time Communication
YOLO: You Only Look Once

Contents

1 Introduction
  1.1 AstaZero
    1.1.1 Autonomous Vehicle Test Operating System (ATOS)
  1.2 Previous work
    1.2.1 Identified points of improvement
  1.3 Objectives
    1.3.1 Key project milestones
  1.4 Demarcations
2 Theory
  2.1 Docker
  2.2 Redis and WebSocket
  2.3 DJI SDK
    2.3.1 Control using DJI SDK
    2.3.2 Video extraction
  2.4 Android App
  2.5 MAVLink
  2.6 Robot Operating System
  2.7 ATOS
  2.8 Web Real-Time Communication
    2.8.1 Videostreaming with WebRTC
  2.9 Object detection
  2.10 Merging multiple video streams
  2.11 Threading
  2.12 Frontend
  2.13 CI/CD
  2.14 Hardware
  2.15 CUDA
3 Methodology
  3.1 Project management
  3.2 Testing & Verification
  3.3 Version control
  3.4 Software
    3.4.1 Docker
    3.4.2 Drone Positioning
    3.4.3 Real-time video streaming
    3.4.4 Extended video surveillance
    3.4.5 Android app
    3.4.6 Frontend
      3.4.6.1 Drone control using the frontend
4 Experimental Validation and Results
  4.1 Verifying the Docker Setup
    4.1.1 Container Startup and Communication
    4.1.2 Monitoring and Debugging with Logs
    4.1.3 Ensuring Consistent and Reproducible Experiments
    4.1.4 Continuous Integration and Delivery
  4.2 Drone Control
  4.3 Real-time Video Streaming Using DJI Drones
    4.3.1 Integrated Functionality For Testing and Debugging
  4.4 Extended Video Surveillance
    4.4.1 Testing the Subsystem
5 Discussion
  5.1 Drone Control
  5.2 Data Serialization
  5.3 Endpoint vs Message Type
  5.4 Real Time Video Streaming In the Developed System and Its Limitations
  5.5 Extended Video Surveillance
  5.6 Suggestions on Future Work
    5.6.1 Communication
    5.6.2 Extended Video Surveillance
  5.7 Social and Ethical Aspects
6 Conclusion
Bibliography
A Code
  A.1 Message type in backend

1 Introduction

In today's market, technical solutions to everyday problems and rising comfort standards are key to remaining competitive and compliant with regulations. One such market is the automotive industry, where advanced automated driving assistance and road safety are important [1].

Outside of Borås, Sweden, the company AstaZero operates a vehicle testing facility. At this site, autonomous vehicles can be tested in a wide range of traffic scenarios, including urban environments with multilane traffic and isolated country roads [2]. To ensure safe and efficient vehicle tests, the facility requires constant monitoring.
By leveraging applied machine learning, drones can be used to observe test vehicles and detect behaviors that deviate from normal patterns. This enables the identification of potential hazards during autonomous vehicle testing, issues that might otherwise be difficult for a human operator to detect on their own.

Drone surveillance systems facilitate object detection across a variety of simulated traffic scenarios. By enabling the simultaneous operation of multiple drones, the system can cover an extended testing area while maintaining sufficient video resolution in the footage.

This project is a collaboration between AstaZero and a team of students from both Chalmers University of Technology (hereafter Chalmers) and Pennsylvania State University (hereafter Penn State). The purpose of this thesis is to document the continued development of a drone surveillance system [3]. The project will involve cross-disciplinary work, including DevOps, front- and backend development, as well as data processing.

1.1 AstaZero

This project is a collaboration with AstaZero, an organization owned by Research Institutes of Sweden (hereafter RISE). RISE is an independent, state-owned research institute whose mission is to strengthen Sweden's competitiveness and contribute to sustainable growth [2]. AstaZero operates a test facility, illustrated in Fig. 1.1, specifically designed for the development and evaluation of autonomous vehicle technologies. The facility enables simulation of a wide range of traffic environments and testing of advanced safety systems under a wide variety of conditions [4].

Figure 1.1: Map of the AstaZero test facility [5].

1.1.1 Autonomous Vehicle Test Operating System (ATOS)

To operate the test facility, AstaZero has developed its own autonomous system, known as the Autonomous Vehicle Test Operating System (ATOS). This system functions as a central control hub and coordinator for scenario-based testing of autonomous vehicles. ATOS determines the trajectories of the test participants, such as vehicles and dummy pedestrians, as well as the location of surrounding infrastructure. To ensure precise and repeatable test scenarios, ATOS uses an internal Cartesian coordinate system to position each participant accurately [6]. ATOS is built on the open-source software Robot Operating System 2 (ROS 2), and the source code can be accessed on AstaZero's GitHub [7].

1.2 Previous work

The system previously developed by Mohi et al. [3] at Chalmers, illustrated in Fig. 1.2, comprises a single drone, a mobile app, a backend, and an ATOS module. The mobile app connects to the backend to enable comparison between the positions of objects detected by the drone camera and the corresponding positions provided by ATOS. If anomalies are detected, there is support for sending a command to ATOS to abort the test. In addition, the communication software (backend) calculates the desired position of the drone. The mobile app facilitates communication with the drone and provides a graphical user interface (GUI) that allows users to operate the drone within system constraints by inputting coordinates and adjusting camera settings [3].

Figure 1.2: An overview of the system developed in previous work [3]. Some of the modules, e.g., Compare Positions, are not yet fully developed. Image taken from [3].
1.2.1 Identified points of improvement

The existing system's monitoring capabilities are constrained by the limited area that a single drone can cover. When operating at close range, the monitored area is insufficient for comprehensive data collection, while operating at a greater distance results in inadequate image resolution for detailed analysis. A system that can use multiple drones, and thus multiple video streams, during test monitoring is therefore required. This will allow a larger surveillance image of the test area in real time. Expanding the monitored area will enable the monitoring of larger test scenarios at AstaZero's facility, thereby increasing the system's overall usefulness.

Furthermore, surveillance with the current drone system runs on proprietary software. This severely limits the applicability of the system, since specific software must be developed for each specific hardware platform. Migrating the control system to an open-source communication layer is therefore also needed, which will significantly increase the availability of such a drone monitoring system. In addition, the system currently lacks support for real-time video streaming between the drone and the backend, which is essential for enabling real-time object detection. Hence, a low-latency video streaming solution needs to be implemented.

1.3 Objectives

The purpose of this project is to improve the testing capabilities of the AstaZero test track. This improvement involves the development of an autonomous system capable of controlling multiple drones, thereby enabling effective supervision of large areas through the integration of multiple video streams with different Fields Of View (FOV). The drone control system, in conjunction with ATOS, will be utilized to efficiently plan drone waypoint missions and detect anomalies or unexpected behavior in test objects.

In previous work, students developed a system supporting a single drone, built on the existing software from Da-Jiang Innovations (DJI). This software allowed communication exclusively with DJI drones. To reduce reliance on foreign technology systems, which pose risks related to cybersecurity and information leaks, an open-source application will be developed to replicate the functionality of DJI's Software Development Kit (SDK). This will allow integration of drones from other manufacturers into a unified open-source platform. Additionally, a toggle switch will be implemented to enable seamless switching between multiple control platforms.

1.3.1 Key project milestones

The key project milestones are outlined below, highlighting the major phases and deliverables critical to the project's success:

1. Drone Formation: Optimally position the drones using test trajectories from ATOS.
   (a) Pose control of one drone using ATOS: Implement and validate the positioning of one drone from test trajectories from ATOS.
   (b) Pose control of multiple drones using ATOS: Extend the control system to manage multiple drones simultaneously from the same ATOS instance.
   (c) Positioning of drones for optimal test coverage: Develop and test strategies or algorithms to determine the relative positions and orientations of the drones required to maximize the visual coverage of a designated area or target object, considering overlap, resolution, and potential occlusions.
2. Combined Video Streams: From two drones, create one single panoramic view.
   (a) Merging of video streams: Combine the frames of each stream into one seamless video.
   (b) Object detection using merged video: Apply a computer vision algorithm to the combined video feed to detect objects of interest within the wider field of view provided by the multi-drone system.
3. Frontend Interface: Develop a frontend interface where drone position and telemetry data are displayed. The individual drone feeds and the stitched video are also presented, and drone control is possible from the frontend interface.
   (a) Drone control: Implement buttons which can arm, take off, and command the drone to return to its origin (home).
   (b) Video streaming: Receive individual video feeds from the drones.
   (c) Drone location and telemetry: Retrieve the position and telemetry from the drone, such as speed, height, and battery state. The data should be shown in the frontend interface.
4. Toggle Switch: Create an interface that can receive instructions and communicate with different drone application programming interfaces (APIs).
   • DJI Communication: Successfully implement and verify bidirectional communication with DJI drones. This includes sending commands and receiving telemetry and video data.
   • MAVLink Communication: Implement and verify communication using the MAVLink protocol, enabling interaction with drones running compatible flight stacks.
   • Control of one drone using simulation: Develop and test the drone control algorithms within a simulated environment.
   • Open Source Translation/Toggling Layer: Design and implement an abstraction layer within the software architecture. This layer should allow the core application logic to interact with different drone types through a unified interface.
   • Integrated control of multiple drones with merged video: Control two drones of different types while processing and stitching their respective video streams.

1.4 Demarcations

• This project will focus on software development exclusively.
• The detection software will be able to detect bikes, pedestrians, and motor vehicles.
• Since testing and usage primarily occur during daytime and in dry weather, the system will be tested only under such conditions.
• The drones will only be operated in an outdoor environment, and, in the testing phase, the system will only be evaluated with two drones due to limited access to drones.

2 Theory

This chapter presents the theoretical foundation necessary to carry out the project. It outlines key concepts, methods, and background knowledge relevant to the project's design and implementation.

2.1 Docker

The software platform Docker can be used to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have all the dependencies the software needs to run, including libraries, system tools, code, and runtime [8]. A Docker image is a read-only template with instructions for creating a Docker container [9]. The image contains all of the files, binaries, libraries, and configurations needed to run the container [10]. A Docker container is a runnable instance of an image. By default, a container is relatively well isolated from other containers and its host machine [9]. This isolation enhances security and ensures that the libraries and dependencies a container uses are the same across different hosts.

2.2 Redis and WebSocket

Redis is an in-memory key-value real-time database [11]. A key-value database uses a key-value pair to store data.
The key is a unique string, and the value is an arbitrarily large data field that should be stored [12]. Fig. 2.1 shows an example of a key-value pair. Redis communicates over a Transmission Control Protocol (TCP) connection, which allows clients on different machines to connect to the server. Redis supports a publish/subscribe paradigm, where publishers are not designed to send messages to specific receivers. Instead, messages are published on channels without awareness of whether there are subscribers. Subscribers, in turn, declare interest in one or more channels and receive only the messages relevant to those channels, without knowledge of the publishers' identities or activities [14]. This allows scalability across machines. WebSocket is a communication protocol providing a full-duplex communication channel [15]. Both client and server can send requests using WebSockets, which is visualized in Fig. 2.2.

Figure 2.1: Example of a key-value database [13]. Clescop, CC BY-SA 4.0, via Wikimedia Commons.

Figure 2.2: WebSocket connection [15]. Brivadeneira, CC BY-SA 4.0, via Wikimedia Commons.

2.3 DJI SDK

The DJI Mobile SDK V4 is a library of classes that integrates with Android apps to control drones from a mobile device [16]. The mobile SDK has a manager class that can be used to, for example, register the SDK and monitor nearby fly zones. The Base class contains a video feeder that manages the live feed from the DJI product to the mobile device. The Product and Component classes are used to get live telemetry and send commands to the products. The Mission class is used to construct custom missions for the drones, such as setting waypoints. The Misc class handles, for example, errors and diagnostics [16].

2.3.1 Control using DJI SDK

The drones utilize three primary flight modes: Arm, Take Off, and Return to Home. Arming the drone places it in a ready-for-takeoff state where the motors are enabled and safety checks are run to ensure the drone is ready to fly. The Take Off mode flies a preprogrammed waypoint mission. The waypoint coordinates are retrieved via a WebSocket connection from the backend. Return to Home mode flies the drone back to the position where it initially took off.

2.3.2 Video extraction

The DJICodecManager class is a part of the DJI Mobile SDK that provides methods for encoding and decoding video. By utilizing its functions enabledYuvData and setYuvDataCallback, YUV420 pixel matrices can be retrieved from the DJI drone's live video feed. The drone camera's video feed can be accessed using the VideoFeeder class provided by the DJI Mobile SDK. This is done in combination with VideoDataListener, which is a callback interface that processes incoming video frames via its onReceive method [17, 18].

2.4 Android App

In the work presented by Mohi et al. [3], an Android app was developed for communicating between the backend and DJI Mobile SDK V4. The Android app uses a WebSocket to establish an asynchronous bidirectional connection. The Android app connects to the backend, which requires exposing ports on the host. Information exchange between the app and the backend is done in string format. A data interchange format that could be used in similar systems is JavaScript Object Notation (JSON), a data format used to store and transfer data using "name-value" pairs [19].
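As an illustration of this name-value structure, a telemetry message could be serialized as follows. The field names here are illustrative assumptions rather than the exact schema used in the app; the msg_type values actually used in the developed system are listed in Table 3.1.

import json

# Hypothetical telemetry message from the Android app to the backend.
# Field names are illustrative; only msg_type = "Position" is taken from Table 3.1.
telemetry = {
    "msg_type": "Position",   # message type used for dispatching on the backend
    "drone_id": "drone_1",    # identifier of the originating drone
    "lat": 57.7089,           # latitude in decimal degrees
    "lng": 11.9746,           # longitude in decimal degrees
    "alt": 35.0,              # altitude in meters
    "battery": 87,            # battery state in percent
}

payload = json.dumps(telemetry)          # string sent over the WebSocket
print(json.loads(payload)["lat"])        # name-value access on the receiving side

On the receiving side, the msg_type field can be used to dispatch the message to the correct handler, as described in Section 3.4.5.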
2.5 MAVLink

MAVLink is a very lightweight, header-only message library for communication between drones and/or ground control stations [20]. MAVLink is published under the LGPL license [21], which makes it suitable to use with an open-source drone. Using an open-source drone offers multiple advantages, such as avoiding vendor lock-in, transparency, and customization. Compared to the DJI SDK described in Section 2.3, which is only compatible with drones made by DJI, MAVLink is compatible with drones from multiple different manufacturers.

2.6 Robot Operating System

Robot Operating System 2 (ROS 2) is a further development of the original ROS and is an open-source software package specifically designed for robotics applications [22]. It offers a unified platform for robotic software development and is widely used today in the robotics industry. Because ROS 2 is open source, users have complete control over how it is implemented and can customize it to their specific needs [22]. It also works seamlessly with other software stacks and can be easily integrated into existing systems. ROS 2 is fully open source, ensuring free access to an advanced and complete robotics SDK [22].

2.7 ATOS

AstaZero has developed the Autonomous Vehicle Test Operating System (ATOS) to function as a control center and coordinator for scenario-based testing of autonomous vehicles. ATOS controls the test participants, such as vehicles and dummy pedestrians, as well as the surrounding infrastructure. To ensure precise and repeatable executions, ATOS utilizes GPS [6]. ATOS is built on ROS 2 and is open-source. The ATOS code is published on AstaZero's GitHub [7].

An ATOS trajectory is defined as a dictionary of key-value pairs. The key is a string describing the test participant, and the value is a list of objects of the Coordinate type, with attributes lat, lng, and alt. The list of coordinates describes the trajectory of the vehicle. These coordinates are in the ATOS internal Cartesian coordinate system and can be mapped onto latitude and longitude using the drone origin coordinate. The code implementation of this process can be found in Appendix B in [3].

2.8 Web Real-Time Communication

Web Real-Time Communication (WebRTC) is a low-latency streaming framework commonly used in applications for video calls, such as Google Meet. WebRTC provides a JavaScript API that enables direct integration [23]. In Python, several libraries have been developed to follow this API. One of them is aiortc, which enables Python-based integration of WebRTC [24]. Another benefit of using WebRTC is that it allows multiple peer-to-peer connections to be established on the same network.

The connection between peers is established through what is commonly known as a "WebRTC handshake". The first step in this process is for each peer to connect to a signaling server; for this, a WebSocket could be used. The first peer connects to a Session Traversal Utilities for NAT (STUN) server to obtain its Network Address Translation (NAT) information. Using the signaling server, the first peer then requests a Traversal Using Relays around NAT (TURN) server. After this, the first peer sends a Session Description Protocol (SDP) offer to the second peer, which responds with an SDP answer, both utilizing the signaling server to send these messages. The first peer receives the SDP answer and sets up a remote description, which forms the basic network setup. This is followed by an Interactive Connectivity Establishment (ICE) candidate exchange between the peers. ICE candidates contain information about the peers and how they can discover one another on the network, enabling basic connectivity for video and audio transmission between the peers [23].
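As a minimal sketch of the answering peer's side of this handshake, the aiortc library mentioned above can be used as shown below. The signaling transport (for example, the WebSocket) is omitted, and the flow is a generic illustration rather than the exact sequence used in this project.

from aiortc import RTCPeerConnection, RTCSessionDescription

async def answer_offer(offer_sdp: str) -> str:
    """Receive an SDP offer (delivered via some signaling channel) and
    return the SDP answer to be sent back over the same channel."""
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        # A remote media track (e.g. a video stream) has been added.
        print("Receiving remote track:", track.kind)

    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    # aiortc gathers ICE candidates while setting the local description,
    # so the returned SDP already contains the candidate information.
    return pc.localDescription.sdp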
2.8.1 Videostreaming with WebRTC

Once the peer-to-peer connection is established, the WebRTC API provides support for streaming media between peers. The media stream is added asynchronously to the RTCPeerConnection by the first peer, attaching a media track. On the receiving side, the second peer retrieves the incoming stream using a listener event provided by the WebRTC API (the ontrack event), which is triggered when a new media track is added to the connection. This enables the transmission of real-time video between peers [25]. WebRTC supports several video formats, including VP8, VP9, and H.264. Encoding in one of these formats can be managed using the DefaultVideoEncoderFactory provided by the WebRTC API [26]. For example, H.264 supports various pixel formats such as YUV420, YUV422, and YUV444 [27].

2.9 Object detection

Object detection is one of the most central and challenging tasks in computer vision and has undergone significant development over the past decades. Early approaches relied heavily on handcrafted visual descriptors such as the Scale-Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG), which provided a degree of robustness to scale and orientation variations [28, 29].

A particularly influential method in this domain is You Only Look Once (YOLO), which introduced a novel paradigm for object detection. In contrast to earlier architectures, such as R-CNN, that employ multiple sequential steps, including region proposal, classification, and post-processing, YOLO treats detection as a single regression problem. The model predicts both object classes and bounding box coordinates directly from raw image pixels in a single forward pass [30].

One of YOLO's key strengths lies in its ability to analyze the entire image during both training and inference. This holistic approach allows the model to implicitly incorporate contextual information, thereby reducing the likelihood of misclassifying background regions. This characteristic makes YOLO particularly well-suited for use in complex and dynamic environments where multiple objects may appear simultaneously [30].

2.10 Merging multiple video streams

To extend the monitoring of the test area, the information from several cameras is combined, creating a panoramic view. This method aims to create a seamless video and is known as video stitching. As shown in Fig. 2.3 and Fig. 2.4, each camera captures a part of the scene from a different angle. To successfully generate the stitched panorama, see Fig. 2.5, an overlapping FOV between the two drone cameras is required. The overlap ensures that corresponding features can be aligned accurately to form a unified image.

Figure 2.3: Footage, left camera. Figure 2.4: Footage, right camera. Figure 2.5: Stitched image.

To create a smooth transition between the video streams, the overlapping area in the cameras' FOV needs to be smoothed and blended. Fig. 2.6 and Fig. 2.7 show the same area from two different perspectives and changes in angle. Fig. 2.8 presents the result after blending has been applied to combine them.

Figure 2.6: Left image before blending. Figure 2.7: Right image before blending. Figure 2.8: Image after blending.

Since the surveillance scenarios specifically involve the drones maintaining a static position, free from sharp changes in angle or perspective, a linear blending method can be used. A very computationally efficient linear blending method is alpha blending [31]. Alpha blending consists of gradually weighting each individual pixel in the overlapping area, where the coefficient α can take a value in the range [0, 1] and is determined by how close the pixel is to the left edge of the right image or the right edge of the left image. The extreme value 0 means that only the left image is used, and the extreme value 1 means that only the right image is used.

2.11 Threading

In video processing, especially when it comes to real-time requirements or processing large amounts of data, the use of multithreading becomes a key optimization technique. By dividing the workload among multiple threads, the CPU's resources can be utilized more efficiently, reducing wait times and increasing the overall performance of the system.

A common implementation strategy for multithreaded programming is the producer-consumer pattern [32]. This design model is based on two separate threads, a producer and a consumer, that communicate via a shared data cache. The producer thread is responsible for generating or loading data, while the consumer thread processes this data. A wait/notify mechanism is used to manage synchronization: if the cache is full, the producer is blocked, and if it is empty, the consumer is blocked. This pattern allows for continuous work in at least one of the threads, which prevents the system from stalling [32].

In video-based synthesis, multiple frames or video streams are processed and combined to generate new visual data, such as reconstructed frames, interpolated motion, or enhanced video output. This type of processing often involves several computational stages, including decoding, transformation, and rendering. Multithreading plays a critical role here by enabling these tasks to run in parallel. For example, while one thread decodes incoming video frames and stores them in a buffer, another thread can simultaneously process previously decoded frames to perform synthesis operations. By breaking down the workflow into concurrent threads, the system can maintain high throughput and responsiveness [32].

Another key use case for multithreading in video processing is motion vector computation, a fundamental step in video compression. Techniques such as optical flow computation, which estimates the motion between pixels in sequential frames, are particularly computationally intensive. Since each frame pair in this process is independent of the others, these operations can be effectively parallelized across multiple cores [33].
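A minimal sketch of the producer-consumer pattern applied to video frames is given below, using Python's standard threading and queue modules. The frame source and the processing function are placeholder assumptions, not the project's actual capture or detection code.

import queue
import threading

frame_queue = queue.Queue(maxsize=30)   # bounded shared cache between the threads
stop_event = threading.Event()

def producer(source):
    """Read frames from a source and place them in the shared cache."""
    while not stop_event.is_set():
        frame = source.read()            # placeholder frame source
        if frame is None:
            break
        frame_queue.put(frame)           # blocks when the cache is full

def consumer(process):
    """Take frames from the shared cache and process them."""
    while not stop_event.is_set():
        try:
            frame = frame_queue.get(timeout=1.0)  # blocks when the cache is empty
        except queue.Empty:
            continue
        process(frame)                   # e.g. stitching or object detection
        frame_queue.task_done()

# threading.Thread(target=producer, args=(camera,), daemon=True).start()
# threading.Thread(target=consumer, args=(detector,), daemon=True).start()

A bounded queue provides the wait/notify behavior described above directly: put() blocks when the cache is full and get() blocks when it is empty.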
Such frontend designs align with modern principles of human-robot interaction, which empha- size low-latency responsiveness, usability, and system modularity. A key goal in human-robot interaction interfaces is to support real-time monitoring and control while minimizing cognitive load for the operator [35]. Additionally, modular fron- tend architectures allow integration with heterogeneous backends such as robotic controllers, simulation environments, or mission management platforms, thereby supporting flexible and extensible robotic systems [36]. 2.13 CI/CD Continuous Integration (CI), is a practice in which members of a team integrate and merge development work (e.g., code) frequently, for example, multiple times per day. CI enables software companies to have shorter and more frequent release cycles, im- prove software quality, and increase their teams’ productivity. This practice includes automated software building and testing [37]. Continuous Deployment (CD); the goal of this practice is to automatically and steadily deploy every change into the production environment [37]. 2.14 Hardware The hardware used in this project consists of a DJI Mavic 2 Enterprise, which is shown in Fig. 2.9, and the Holybro x500 v2 Development Kit. The DJI drone communicates with the remote controller over a USB connection and is compatible with the DJI SDK described in Section 2.3 whereas the Holybro drone, sometimes referred to as an open-source drone, communicates over 915 MHz radio frequency using MAVLink packets, see Section 2.5. The camera FOV for the DJI drone is 16:9 which is used for determining drone area coverage. 2.15 CUDA CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on Graphical Processing Units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs [38]. 14 2. Theory Figure 2.9: Figure shows DJI Mavic 2 drone, ZLEA CC BY-SA 4.0, via Wikimedia Commons. 15 https://creativecommons.org/licenses/by-sa/4.0/ 2. Theory 16 3 Methodology This project will build upon previous work on identification models, which can identify various objects in the test area, including trucks, bicycles, and pedestrians [3]. The control system for the drones will be implemented using MAVLink [39], described in Section 2.5 and using the DJI SDK described in Section 2.3. The drone controller developed in previous work, which utilizes an Android app will be extended. A frontend interface will serve as the main control center for the drones. 3.1 Project management For this project, the team adopted the Scrum framework based on the Agile method- ology. The Scrum framework was well suited for this project due to the ability to adapt to changing requirements and unforeseen change [40]. Other methodologies, such as the Waterfall methodology, were not suitable since the project requires con- tinuous testing and being able to adapt to feedback during development. The work in this framework was split into Epics, also referred to as Milestones. Milestones were then further broken down into a number of smaller Stories, or sometimes called Issues [41]. All the Stories were initially placed in the To-do section of the team’s backlog. As a student began working on a Story, it was moved to the In progress section. Upon completion, the Story was moved to the Done section. This workflow is visually represented on the Scrum board, as illustrated in Fig. 3.1. 
Figure 3.1: Scrum board hosted on GitHub.

3.2 Testing & Verification

To test code during development, multiple techniques were used. Following the theory from Section 2.13, a CI pipeline was configured within GitHub Actions. This pipeline is automatically triggered whenever a developer initiates a pull request to merge changes into the main branch. The pipeline executes a number of unit tests. If any test fails, the merge is blocked, preventing the introduction of faulty code into the main branch.

Upon successful completion of the unit tests, a second developer examined the proposed changes for code quality, style adherence, potential bugs, and overall maintainability. The pull request required approval from the reviewer before it could be merged. If the code review identified issues or suggested improvements, the original developer addressed the feedback and updated the pull request.

Once code changes were successfully merged into the main branch, a CD pipeline was automatically initiated. The pipeline built and pushed a new Docker image to the container registry to incorporate the latest code changes. The updated Docker image was then deployed, either through a manual process or using a tool, for example Argo CD Image Updater, to automatically detect the new image and initiate deployment [42].

3.3 Version control

The project used Git for version control. The project's Git repository was hosted on GitHub, leveraging its features for code management, collaboration, and CI/CD. The repository is publicly available on GitHub [43]. For new features or bug fixes, developers created dedicated feature branches from the main branch. When development was completed, a pull request was initiated to merge the feature branch into main. This process follows the steps outlined in Section 3.2.

3.4 Software

The complete system, illustrated in Fig. 3.2, comprises various smaller subsystems, incorporating pre-existing components such as ATOS (discussed in Section 2.7) and the YOLO software (explored in Section 2.9). The system also builds upon previous work, as mentioned in Section 1.2. The development tasks were divided among the students as illustrated in Fig. 3.2.
The Chalmers team (green boxes) focused on the frontend interface, combining video streams (stitching), video transmission, and object detection, while the Penn State team (blue boxes) worked on making the system compatible with both the DJI SDK and the MAVLink platform. For the sake of clarity, detailed discussions about the toggle switch and the MAVLink platform are beyond the scope of this thesis. Interested readers are referred to the work by Penn State for further information [44].

Figure 3.2: UML diagram for the complete system; the gray boxes refer to external existing software provided by [6, 30].

3.4.1 Docker

The system architecture leveraged Docker to enhance modularity and deployment flexibility, see Section 2.1. Three separate Docker images were defined, corresponding to the primary functional units: the backend, the image stitching, and the frontend interface. This containerization strategy significantly simplified deployment onto any Linux AMD64 machine by abstracting away underlying system configurations. The deployment workflow supported obtaining pre-built images from a container registry or building the images directly from source using the provided Dockerfiles. All containers communicated with each other using an internal Docker network.

The backend container utilized the ATOS image, detailed in Section 2.7, as its base. This ATOS image, in turn, was built upon a ROS 2 base image, described in Section 2.6. A Dockerfile orchestrated the build process for the backend image:

1. It began by specifying the ROS 2-based ATOS image as its base.
2. Necessary Python libraries were then installed.
3. Subsequently, the project's source files were copied into the image.
4. The ROS 2 packages within the project were then compiled.
5. Finally, the ROS 2 workspace setup was sourced in the container's .bashrc file (a configuration file that defines the bash shell environment) to ensure the environment was correctly configured on startup.

To manage this multi-container application, a Docker Compose file was created. This file serves as a blueprint, defining how all Docker services, networks, and volumes are configured and orchestrated. Specifically, it:

• Defined the start command for the backend container, initiating the main loop within its ROS 2 package.
• Ensured all services communicated over a shared internal network named atos-net.
• Specified the necessary port mappings for external access:
  – Port 14500/tcp: Mapped for communication with the Android app.
  – Port 8000/tcp: Mapped for interaction with the frontend interface service.
  – Port 3478/udp: Exposed for the STUN server functionality.
  – Port 8765/udp: Exposed for WebRTC signaling.
• Facilitated the configuration of essential environment variables for the containers:
  – ENV_LATITUDE: Sets a custom latitude for the ATOS origin.
  – ENV_LONGITUDE: Sets a custom longitude for the ATOS origin.
  – DEBUG_MODE (True/False): Enables a debug mode, utilizing a predefined ATOS origin and outputting additional debug messages.
  – N_DRONES: Specifies the number of drones to be used.
  – OVERLAP (0.0-1.0): Defines the desired camera image overlap percentage between adjacent drones.
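As a small illustration of how a container could consume these variables, the following sketch reads them with Python's standard library. The default values and parsing details are assumptions made for the example and are not taken from the project's configuration code.

import os

# Read the container's environment variables (names from the list above).
# Defaults and parsing below are illustrative assumptions.
origin_lat = float(os.environ.get("ENV_LATITUDE", "0.0"))
origin_lng = float(os.environ.get("ENV_LONGITUDE", "0.0"))
debug_mode = os.environ.get("DEBUG_MODE", "False").lower() == "true"
n_drones = int(os.environ.get("N_DRONES", "2"))
overlap = float(os.environ.get("OVERLAP", "0.1"))   # fraction in [0.0, 1.0]

if debug_mode:
    print(f"ATOS origin ({origin_lat}, {origin_lng}), "
          f"{n_drones} drone(s), {overlap:.0%} camera overlap")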
3.4.2 Drone Positioning

The drone positioning is controlled through a Python script in the backend. It is based on the drone positioning function used in the previous work [3]; in this project, the function was extended to handle multiple drones instead of one. The goal of the script is to evaluate the ATOS trajectories and produce optimal coordinate placements for the drones. A placement is considered efficient when the drones cover the entire test area while making maximal effective use of their FOVs.

The drone-localization function extracts the ATOS trajectories into an array of coordinates C with respect to the internal ATOS Cartesian plane. A nested function is used to compute the Oriented Bounding Box (OBB) of the convex hull of the points array, see Fig. 3.3.

Figure 3.3: Oriented bounding box of a sample trajectory.

The OBB-calculating function that is applied is the one described in [45], Listing 1. The pseudocode was translated from C++ into Python using ChatGPT. The results were verified by testing the function with an arbitrary input trajectory dictionary and plotting the result using Matplotlib in Python.

Figure 3.4: Drone coverage area 1 (drone height: 36 m).
Figure 3.5: Drone coverage area 2 (drone height: 50 m).

The result was assessed and deemed satisfactory because the drone squares cover the entire area while also respecting the overlap parameter.

The attributes of the OBB are the axis matrix $A_{rect} \in \mathbb{R}^{2 \times 2}$, the center $c_{center} \in \mathbb{R}^{2 \times 1}$, the extent (the half-lengths of the sides of the rectangle, see Fig. 3.6) $e_{rect} \in \mathbb{R}^{2 \times 1}$, and the area $a_{rect} \in \mathbb{R}$.

Figure 3.6: Extent of the OBB.

The axis matrix stores the unit vectors of the coordinate system in which the OBB is defined. The center variable c_center stores the center position of the rectangle, and the extent refers to the half-lengths of the rectangle sides in the directions of the unit vectors. An important assumption is that the points are not collinear. These cases are handled by a function that returns a bool indicating whether the points are collinear; if they are, the rectangle is manually constructed to fit the coordinates using the following calculation.
First, the center is taken as the mean and the list of points is sorted,

$c_{center} = \mathrm{mean}(C, \mathrm{axis}=0)$,  (3.1)
$C_{sorted} = \mathrm{sorted}(C, \mathrm{key} = p_0)$,  (3.2)

where $p_0$ refers to the first value in the coordinate pair and axis=0 refers to operating on the first value of the coordinate pair. Then, the last point of the array is stored in a variable,

$c_{end} = C_n$.  (3.3)

A vector d is constructed between the center and the last point,

$d = c_{end} - c_{center}$.  (3.4)

The vector d is normalized and stored in $U_0$ as a unit vector. The perpendicular unit vector is then stored in $U_1$,

$U_0 = \mathrm{norm}(d)$,  (3.5)
$U_1 = \mathrm{perp}(U_0)$,  (3.6)

where perp(α) is a function that obtains the perpendicular vector of α, and norm(α) is a function that obtains the normalized vector of α. The maximum extent of the rectangle is defined as the absolute value of the vector d, and the short extent is set to half of $e_{max}$. These values are stored in an array,

$e_{max} = |d|$,  (3.7)
$e_{rect} = [\, e_{max}, \; e_{max}/2 \,]^{T}$.  (3.8)

The axis matrix $A_{rect}$ is stored in an array, and the area of the rectangle $a_{rect}$ is calculated,

$A_{rect} = [\, U_0 \;\; U_1 \,]$,  (3.9)
$a_{rect} = 4\, e_{rect,0}\, e_{rect,1}$.  (3.10)

When the OBB has been calculated, the drone boxes and positions $c_{drones}$ are extracted. The split axis $s_{axis}$ along which the drones are placed is set as the unit vector parallel to the side of the longest extent, and the axis for which the drone angles are calculated is defined as the unit vector perpendicular to the split axis. The measurements are visualized in Fig. 3.7.

Figure 3.7: Visualization of axes and offset.

The distance between the drones, $s_{offset}$, is calculated as follows for a certain overlap ratio $\phi$, where w and $h_{rect}$ refer to the width and height of the rectangle, respectively. The drone rectangles maintain a 16:9 aspect ratio, due to the aspect ratio of the drone's camera described in Section 2.14. The different parameters are calculated as

$r = 16/9$,  (3.11)
$h_{rect} = \sqrt{a_{rect} / (n_{drones}\, r)}$,  (3.12)
$w = h_{rect}\, r$,  (3.13)
$s_{offset} = w\, (1 + (1 - 2\phi))$.  (3.14)

The drone centers are calculated using a for-loop (written as a list comprehension), as shown in Algorithm 1.

Algorithm 1: Compute Drone Positions in a Line Formation
Require: number of drones $n_{drones}$, center position $c_{rect}$, offset scalar $s_{offset}$, axis vector $s_{axis}$
1: for i = 0 to $n_{drones} - 1$ do
2:   $c_{drones,i} \leftarrow c_{rect} + \left(i - \frac{n_{drones}-1}{2}\right) s_{offset}\, s_{axis}$
3: end for

The height is then retrieved from a height-calculating function from Section 3.2.3 in [3]. For this function, the drone FOV area $a_{drone} = w\, h_{rect}$ is used. If the computed height of the drones is below 30 meters, it is locked at 30 meters, as the object detection model is trained for this altitude. In this case, the calculation is done in reverse, starting by fixing the height and then calculating the optimal drone positions in order to obey the overlap. The drone coordinates are then converted to latitude and longitude relative to the test origin. The test origin is a coordinate in latitude and longitude that represents the origin of the ATOS coordinate system.

The purpose of this function is to position the drones in such a way that their FOVs collectively cover the entire OBB calculated from the ATOS trajectory points, while always maintaining the required overlap between the FOVs, regardless of the drones' altitude.
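A minimal sketch of this placement step, following Eqs. (3.11)-(3.14) and Algorithm 1, is given below. It assumes that the OBB area, center, and split axis have already been computed, and the function name and signature are illustrative rather than the project's actual implementation.

import numpy as np

def drone_line_formation(a_rect, c_rect, s_axis, n_drones, overlap):
    """Place n_drones along the split axis of the oriented bounding box.

    a_rect  : area of the OBB (Eq. 3.10)
    c_rect  : OBB center, shape (2,)
    s_axis  : unit vector along the longest extent, shape (2,)
    overlap : desired camera overlap ratio phi, in [0, 1]
    Returns the drone FOV width/height and the list of drone centers.
    """
    r = 16.0 / 9.0                              # camera aspect ratio, Eq. (3.11)
    h_rect = np.sqrt(a_rect / (n_drones * r))   # FOV height, Eq. (3.12)
    w = h_rect * r                              # FOV width, Eq. (3.13)
    s_offset = w * (1 + (1 - 2 * overlap))      # spacing between drones, Eq. (3.14)

    # Algorithm 1: centers spread symmetrically around the OBB center.
    centers = [c_rect + (i - (n_drones - 1) / 2) * s_offset * s_axis
               for i in range(n_drones)]
    return w, h_rect, centers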
3.4.3 Real-time video streaming

Using the DJICodecManager class of DJI's Mobile SDK V4, as described in Section 2.3.2, video frames could be extracted from the DJI drone in real time in YUV420 pixel format. As stated in Section 2.8.1, WebRTC supports H.264-encoded video, which is compatible with the YUV420 pixel format. Using this format, a WebRTC-based system for transmitting the drone's video stream from the application to the backend could be implemented, as illustrated in Fig. 3.8.

Figure 3.8: System architecture for the WebRTC-based video streaming system.

Since the Android app can connect to the backend via a WebSocket, as described in Section 2.2, this same WebSocket connection can also function as the signaling server for the WebRTC handshake, explained in Section 2.2. Because all devices are located on the same local network, peer-to-peer connections should be achievable without the need for either a STUN or a TURN server, explained in Section 2.8, enabling direct communication between peers. When running in Docker, as described in Section 3.4.1, port forwarding is necessary for this to work. However, since the server-side client (backend) was implemented in Python using the aiortc library, which requires a STUN server address during ICE configuration, an ICE server had to be specified. For this purpose, Google's public STUN server was used. When the system is offline, this server remains unused, as peers can find each other directly on the same local network.

Once video frames are received by the backend, they are converted into JPG format and stored in Redis with a key indicating their origin drone, as described in Section 2.1. This makes the frames accessible throughout the backend and frontend, and the formatting ensures compatibility with the video stitching and the frontend interface.

3.4.4 Extended video surveillance

Using the theory in Section 2.10, the live streams from both drones can be merged into one panoramic view. The two images I_left(x) and I_right(x) are combined by blending each pixel x according to Eq. (3.15),

I_blend(x) = (1 − α(x)) I_left(x) + α(x) I_right(x),    (3.15)

where I_blend(x) represents the blended pixel at position x, I_left(x) and I_right(x) are the pixel values in the left and right images at position x, respectively, and α(x) is the weight that controls how much each image contributes to the blended image.
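To make Eq. (3.15) concrete, the sketch below blends the overlapping columns of two equally sized frames with a weight α(x) that increases linearly across the overlap zone. It is a simplified illustration under the assumption of a fixed, known overlap width; it is not the exact stitching code used in the system.

```python
import numpy as np

def blend_overlap(left: np.ndarray, right: np.ndarray, overlap_px: int) -> np.ndarray:
    """Merge two HxWx3 frames whose last/first `overlap_px` columns overlap.

    Implements Eq. (3.15) with alpha ramping linearly from 0 to 1 across the
    overlap zone (sketch; assumes the frames are already rescaled to the same
    size and horizontally aligned).
    """
    h, w, _ = left.shape
    alpha = np.linspace(0.0, 1.0, overlap_px).reshape(1, overlap_px, 1)

    left_zone = left[:, w - overlap_px:, :].astype(np.float32)
    right_zone = right[:, :overlap_px, :].astype(np.float32)
    blended = (1.0 - alpha) * left_zone + alpha * right_zone   # Eq. (3.15)

    return np.hstack([left[:, :w - overlap_px, :],
                      blended.astype(left.dtype),
                      right[:, overlap_px:, :]])

# Example with two dummy frames and a 64-pixel overlap
panorama = blend_overlap(np.zeros((720, 1280, 3), np.uint8),
                         np.full((720, 1280, 3), 255, np.uint8), 64)
print(panorama.shape)  # (720, 2496, 3)
```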
Detecting objects from merged video streams requires both proper synchronization between the drones and effective processing and analysis. Using the theory from Section 2.9 on object detection principles, it is evident that real-time detection requires efficient computational models such as YOLO. YOLO performs direct object detection, minimizing the computational load compared to traditional network-based approaches [46]. By using YOLO, both stationary and moving objects can be detected and tracked across each camera view.

To enable tracking of objects detected by YOLO, the coordinates of the objects need to be calculated. This method was originally developed in previous work, described in Section 1.2, and is based on saving the center pixel of each bounding box. Using the drone's altitude, FOV, and GPS position, an estimated position for every object is calculated [3].

To allow positioning with multiple drones, a weighted GPS position needs to be calculated to locate an object between the two cameras in the merged image. By calculating a weight α, the position of the object is interpolated between the GPS coordinates of the left and right cameras, depending on where the object is located in the image: if the object is closer to the left camera, more weight is given to the left camera, and if it is closer to the right camera, more weight is given to the right camera. The methodology for transforming pixel coordinates into GPS coordinates was originally developed in previous work described in Section 1.2. As described in Eq. (3.16), this method enables the extraction of weighted GPS coordinates, including both latitude and longitude,

GPS = (1 − α) GPS_left + α GPS_right,    (3.16)

where α is the pixel's x-coordinate divided by the total width of the stitched image.

To enable simultaneous image processing from multiple video streams, parallel computing through multi-threading is required [47]. Following the architecture described in Section 2.11, each video stream is assigned a separate execution thread in which frames are continuously fetched and placed in a queue. Frame pairs are then retrieved from these queues by the main thread, which performs the merging and executes the object detection. The use of queues and thread events keeps the data flow stable and ensures that data is processed in real time.

To accelerate image processing and offload computations from the CPU, the GPU was utilized. Using the theory from Section 2.15, a special Docker Compose file was created to take advantage of CUDA acceleration, which reduced the CPU load from around 95% to 20%. The subsystem for combining the video streams can also be run on a separate machine, which is achieved by changing the Redis URL from the internal Redis hostname to the IP address of the separate machine.

3.4.5 Android app

The communication between the two hosts was changed from the string format implemented in previous work, described in Section 2.4, to JSON serialization. The field msg_type denotes the type of the incoming message and which function should be executed. The implementation in the backend is highlighted in Appendix A.1. All possible msg_type values are listed in Table 3.1.

Table 3.1: Description of the different msg_type values.

msg_type | Purpose
Coordinate_request | Exchange drone waypoint coordinates
Position | Send telemetry and position to the backend
Debug | Message type used for debugging, prints the JSON payload
candidate | Candidate for WebRTC
answer | Answer for WebRTC

The mobile phone receives the telemetry and position using the DJI SDK described in Section 2.3 and sends a message over the socket to the backend with msg_type set to Position, see Fig. 3.2. The backend stores the incoming data in Redis, and the data is published on a Redis Pub/Sub channel where each drone ID is assigned its own unique key. A sketch of this flow is shown below.
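The sketch shows how the backend might store an incoming Position message in Redis and publish it on a per-drone Pub/Sub channel. The key and channel naming ("drone:<id>:telemetry") and the field names in the example message are assumptions for illustration; the actual message handling is referenced in Appendix A.1.

```python
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def handle_position(raw: str) -> None:
    """Store a Position message and publish it on the drone's channel.

    Sketch only: the "drone:<id>:telemetry" naming is assumed,
    not taken from the project's code.
    """
    msg = json.loads(raw)
    if msg.get("msg_type") != "Position":
        return  # other msg_type values are handled elsewhere
    drone_id = msg["drone_id"]
    payload = json.dumps(msg["data"])                   # lat, lon, altitude, battery, ...
    r.set(f"drone:{drone_id}:telemetry", payload)       # latest value for the frontend
    r.publish(f"drone:{drone_id}:telemetry", payload)   # live updates via Pub/Sub

# Example message as it could arrive from the Android app (illustrative values)
handle_position(json.dumps({
    "msg_type": "Position",
    "drone_id": 1,
    "data": {"lat": 57.772956, "lon": 12.769956, "altitude": 42, "battery": 81},
}))
```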
3.4.6 Frontend

The frontend interface serves as the user-facing component of the system, allowing operators to control and monitor the drones in real time. It builds upon the architecture described in Section 2.12. The frontend pipeline, its integration with the rest of the system, and the associated communication and data transfer are visualized in Fig. 3.2.

The interface includes a responsive layout comprising a video display section, telemetry dashboard, control panel, and a live map, as shown in Fig. 3.9.

Figure 3.9: Frontend user interface with video display, telemetry dashboard, control panel, and live map.

Positioning and telemetry data for each drone are sent continuously from the backend and displayed in the browser. The data are obtained through a persistent WebSocket connection to the backend, which relays the live Redis-published telemetry to the frontend interface, where it is parsed and used to update each drone's position, altitude, speed, and battery status, as described in Section 2.2.

To visualize the drones on a map, Leaflet.js markers are updated with the latest GPS coordinates of each drone, taken from Redis. In this way, operators receive an accurate and up-to-date spatial representation of the drone positions relative to the test environment, as shown in Fig. 3.10.

For each live video feed, the frames are retrieved from Redis, served as MJPEG streams through Hypertext Transfer Protocol (HTTP) endpoints, and displayed in the frontend using HTML img tags. Users can select different layout configurations (e.g., single stream, side-by-side, or stitched panoramic view) via a dropdown menu, as shown in Fig. 3.11. The stitched video feed is generated in the backend using the methodology explained in Section 3.4.4 and delivered to the frontend via a dedicated endpoint.

Figure 3.10: Interactive map displayed in the frontend user interface, visualizing the live positioning of the drones.

Figure 3.11: Video stream with the single-view layout configuration in the frontend user interface.

3.4.6.1 Drone control using the frontend

The control of the drones builds on the theory described in Section 2.3.1 and is managed by the user from the frontend application. The interface includes command buttons for Arm, Take-Off, and Return to Home, as shown in Fig. 3.12. Each button dispatches a WebSocket message to the backend, which translates it into an actionable command for the drones and forwards it to the mobile phone.

The Arm function sends a message from the frontend interface to the backend via WebSocket. The message is received by the Android app's client handler, which decodes it and triggers the arming function in the flight manager. This function performs initial checks, such as verifying the battery state (a minimum of 20 % is required) and the GPS lock. It then clears any leftover waypoints and defines the new mission. A take-off waypoint is generated 10 meters above the drone to prevent collisions, and the subsequent mission waypoints are appended in order. The flight characteristics are configured, and the waypoint mission is uploaded.

The Take-Off command, when triggered from the frontend, initiates the mission start by calling a function in the DJI SDK. Similarly, the Return to Home command sends a corresponding message to return the drone to its launch location. These controls are also available as global commands that apply to all connected drones simultaneously, as shown in Fig. 3.12.

Additionally, operators can start or abort test sessions via the Start Test and Emergency Stop buttons, as shown in Fig. 3.12. These trigger WebSocket messages directed at the ATOS controller, which maintains the test state. The frontend interface's visual feedback (e.g., the test-active badge) is synchronized with the ATOS system response, as can be seen in Fig. 3.9, which shows TEST INACTIVE.
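A minimal sketch of the command path described above is given below: the backend receives a button command from the frontend over a WebSocket and forwards it as a JSON message to the connected phones. The JSON schema, the "command" msg_type, the port number, and the use of the websockets package are assumptions for the example and do not mirror the project's exact protocol (see Appendix A.1 for the real message handling).

```python
import asyncio
import json
import websockets  # third-party "websockets" package

# Connected Android-app sockets, keyed by drone ID (illustrative structure)
phone_sockets = {}

async def handle_frontend(ws, path=None):
    """Relay Arm / Take-Off / Return to Home commands to the phones.

    Sketch only: the payload {"command": ..., "drone_id": ...} and the
    "command" msg_type are assumed, not the project's actual schema.
    """
    async for raw in ws:
        cmd = json.loads(raw)  # e.g. {"command": "arm", "drone_id": 1}
        targets = (list(phone_sockets.values()) if cmd.get("drone_id") == "all"
                   else [phone_sockets[cmd["drone_id"]]])
        for phone in targets:
            await phone.send(json.dumps({"msg_type": "command",
                                         "payload": cmd["command"]}))

async def main():
    # Port 8000 is an assumption for the example, not the deployed setup.
    async with websockets.serve(handle_frontend, "0.0.0.0", 8000):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```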
Figure 3.12: Different command buttons and video layout configuration in the frontend user interface.

4 Experimental Validation and Results

This chapter outlines the outcomes of the project and explains the methods used to test the system's functionality. The testing approach was aimed at assessing critical performance parameters and confirming that the developed components met the specified requirements. Both qualitative insights and quantitative data are provided to support the system's validation. The main results are:

• Positioning algorithm for two drones.
• Real-time video streaming from drone to computer.
• Video merging of two drone video streams.
• Web-based GUI with system monitoring and control capabilities.

These results are further discussed and validated below.

4.1 Verifying the Docker Setup

The experiments relied heavily on Docker containers. Before we could validate anything else, we first had to confirm that this containerized setup was working correctly. The containers and the communication between them are shown in Fig. 4.1.

4.1.1 Container Startup and Communication

The three Docker containers for the backend, the image stitching module, and the frontend interface all started up as expected. We used the docker ps command to check that they were running and that their ports were correctly mapped. The output from docker ps is shown in Fig. 4.2.

A key check was ensuring smooth communication. The containers needed to talk to each other, to internal services like ATOS and Redis, and also to connect externally with the Android app and the frontend interface. This all happened over the dedicated atos-net Docker network, while services intended for external clients were made accessible using Docker's port forwarding to expose the necessary ports. We confirmed this by observing successful data exchanges: for instance, the backend correctly queried ATOS, and the frontend interface received data from the backend containers, which originated from the Android app.

Figure 4.1: Communication between Docker containers (frames, combined streams, drone control, and telemetry flowing between the backend, Redis, and the frontend).

4.1.2 Monitoring and Debugging with Logs

During the experiments, container logs were essential for tracking real-time behavior and for troubleshooting. We regularly used the docker logs command to pull logs from each service. This helped us to:

• Confirm that each container initialized properly.
• Monitor key processes and check that data was flowing correctly between services.
• Quickly spot and fix any errors or unexpected issues that came up during specific test scenarios. For example, if the backend was not receiving a message, its logs were our first stop.

An example of the logs is shown in Fig. 4.3. Sometimes, to dig deeper, the command docker exec -it <container> /bin/bash was used to open a shell inside a running container. This let us inspect its internal state, files, and running processes directly, which was helpful for tricky debugging issues that logs alone could not solve.

4.1.3 Ensuring Consistent and Reproducible Experiments

One of Docker's biggest benefits for our experiments was environmental consistency. By packaging all software dependencies and configurations into Docker images, we made sure every experimental run used the exact same software environment. This cut down on variations caused by different machine setups, making our results more reproducible and reliable.
We controlled specific parameters for each run, such as ENV_LATITUDE, ENV_LONGITUDE, or N_DRONES, through environment variables in the Docker Compose file, allowing us to systematically vary test conditions.

[inspirer@spectrex360 ~]$ docker ps --format json

Image: communication_software-frontend
Command: "/docker-entrypoint...."
Names: frontend
State: running, Status: Up 4 seconds
Ports: 0.0.0.0:8001->80/tcp, [::]:8001->80/tcp

Image: communication_software-backend
Command: "/ros_entrypoint.sh ..."
Names: backend
State: running, Status: Up 4 seconds
Ports: 5000, 8000, 3478, 8765, 14500 (tcp/udp mapped)

Image: communication_software-image_stitching
Command: "python3 main.py"
Names: image_stitching
State: running, Status: Up 5 seconds

Image: astazero/atos_docker_env:latest
Command: "/ros_entrypoint.sh ..."
Names: atos
State: running, Status: Up 4 seconds
Ports: 80, 443, 3000, 3443, 8080-8082, 9090, 55555 (tcp mapped)

Image: astazero/iso_object_demo:latest
Command: "/app/build/ISO_obje..."
Names: isoObject
State: running, Status: Up 5 seconds

Image: public.ecr.aws/docker/library/redis:latest
Command: "docker-entrypoint.s..."
Names: astazero-redis
State: running, Status: Up 5 seconds
Ports: 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp

Figure 4.2: Status of the running Docker containers. Some information, for example the container IDs, has been removed.

[inspirer@spectrex360 ~]$ docker compose logs backend
backend | Client connected.
backend | [DroneStream] Created RTCPeerConnection:
backend | [DroneStream] WebRTC offer created: v=0
backend | Assigned coordinate ('57.772956', '12.769956', '42', '-164') to client 125078439888544

Figure 4.3: Example of logs retrieved from the backend container. Some logs have been redacted.

4.1.4 Continuous Integration and Delivery

Our CI/CD pipeline, which automatically built and pushed Docker images for the backend and frontend interface to the GitHub Container Registry after every push to the main branch, was also a key element for the correct deployment of our solution. Fig. 4.4 illustrates the build steps for the backend within this pipeline. This agile approach meant our validation always used the latest stable software version, keeping our tests aligned with ongoing development.

Figure 4.4: Pipeline for building the Docker image for the backend (checkout, registry login, Docker Buildx setup, and the build-and-push step).

4.2 Drone Control

The drone control program, together with the drone positioning script, allows the drones to be controlled from the frontend interface and sent to their optimal positions.
The drone positioning script maintains the desired amount of overlap at all heights below 120 meters, which is the Swedish legal altitude limit according to [48]; if the required height exceeds this limit, an exception is raised in the code. The drone positioning system is presented in Section 3.4.2. The system covers the entire area in all cases with a height below 120 meters and respects the overlap that is passed as an argument to the function. In some edge cases, bugs are present in the positioning system where it selects the wrong axis to place the FOV rectangles against, but this is deemed acceptable since the system has worked in all tests and still respects the overlap given as input.

A test file was created to observe the coverage for one drone and for two drones. Ten simulated tests were performed using randomized ATOS trajectories, and the drone positioning functions from this project and from last year's project [3] were compared. The results are shown in Fig. 4.5. It can be observed that the height for the tests with two drones is lower in all cases; thus, two drones achieve sufficient coverage at a significantly lower height. This is valuable because the object detection model currently used is trained at a low altitude of 30 meters and works best at that height.

Figure 4.5: Covered area versus height.

The FlightController is effective for sending the drones to their desired locations. A Google Maps link was integrated into the code to test the positioning accuracy of the software, and it was deemed sufficient for the functioning of the system.

4.3 Real-time Video Streaming Using DJI Drones

The integrated WebRTC system connects the Android app to the backend client, as described in Section 3.4.3, once the WebSocket connection is established. Upon a successful peer connection, the Android client immediately begins streaming video captured from the drone to the backend, where the frames are written into Redis. These video streams can then be accessed and viewed separately by the user through the frontend interface, as illustrated in Fig. 4.6 and Fig. 4.7. If the drones are not connected or if a connection failure occurs, the frontend will receive and display black placeholder frames with an error message, as shown in Fig. 4.8 and Fig. 4.9.

Figure 4.6: Camera view in the GUI for the first drone.
Figure 4.7: Camera view in the GUI for the second drone.
Figure 4.8: Filler frame with error message in the GUI for the first drone.
Figure 4.9: Filler frame with error message in the GUI for the second drone.

4.3.1 Integrated Functionality for Testing and Debugging

The peer connection state of each drone can be monitored through the terminal using the Docker command docker compose logs comm_software, described in Section 3.4.1. This is possible because the WebRTC system is integrated into the backend with print statements that display information in the terminal. As discussed in Section 3.4.3, the server-side client is implemented in Python using the aiortc library. The library provides a connectionState attribute for each active peer connection, which was used to implement the drone stream state monitoring described in Table 4.1.
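A minimal sketch of how such monitoring can be wired up with aiortc is shown below, mapping connectionState values to the stream states of Table 4.1. The mapping dictionary and the log format are illustrative assumptions, not the backend's exact code.

```python
from aiortc import RTCPeerConnection

# Illustrative mapping from aiortc connection states to the stream states
# in Table 4.1; the exact mapping used in the backend may differ.
STATE_MAP = {
    "connected": "Online",
    "connecting": "Checking",
    "closed": "Closed",
    "failed": "Error",
}

def monitor_stream_state(pc: RTCPeerConnection, drone_id: int) -> None:
    """Print the drone stream state whenever the peer connection changes."""
    @pc.on("connectionstatechange")
    async def on_state_change() -> None:
        state = STATE_MAP.get(pc.connectionState, "Unknown")
        print(f"[DroneStream] drone {drone_id}: {state} "
              f"(connectionState={pc.connectionState})")

# Usage: called once per drone when the RTCPeerConnection is created
pc = RTCPeerConnection()
monitor_stream_state(pc, drone_id=1)
```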
An important observation from the system testing was that the drone stream state remains Unknown during the ICE candidate exchange, before it becomes Online when the peer-to-peer connection is complete, or Error if it fails. To supervise the RTC handshake process, which is important during development and debugging, each step of the handshake is logged in the terminal. The logs include details about ICE candidates and SDP offers. These logs, accessible via the same Docker command, provide visibility into the offer-answer negotiation and the ICE candidate exchange between peers.

Table 4.1: Description of the different drone stream states.

State | Meaning
Online | Peer connection was successful
Disconnecting | Client disconnected
Closed | Peer connection was removed successfully
Error | Peer-to-peer connection could not be established
Checking | Checking for peer connection
Unknown | State was not recognized by aiortc's connectionState

4.4 Extended Video Surveillance

Fig. 4.10 displays the results of the extended video surveillance: during system operation, a continuous video feed is shown on the system frontend, where the live video streams are merged into one wide panoramic view. In the image, detected objects are marked with rectangles and text labels indicating each object's unique ID and its calculated GPS coordinates. The image is updated in real time and continuously reflects changes in the scene as detected objects move and are tracked, or as new objects are detected. If the connection to a drone is lost, its part of the combined image is replaced with an information box indicating the connection status, as seen in Fig. 4.11.

Figure 4.10: System testing at AstaZero.

The video streams are retrieved from a Redis database, where each individual drone uploads its current frame in JPEG format. Using asynchronous functions, these frames are loaded in parallel for each drone and decoded into OpenCV images. The frames are then rescaled to a common size and stitched together into a merged panoramic image. Using alpha blending, the overlap zone between the two video streams is blended to create a seamless panorama with a natural transition between the two frames.

On the extended panorama image, object detection is applied using a YOLO model developed in previous work [3]. For each detected object, a GPS coordinate is calculated based on its pixel position and the geographical position and altitude of the drones. In order to track a detected object between frames, weighted GPS coordinates are calculated. Finally, the merged image is annotated, where each detected object is enclosed by a rectangle showing its individual ID and geographical GPS coordinates. Once the image is finalized, it is converted to JPEG format and sent to Redis to be accessed by the frontend interface.

Figure 4.11: Drones not connected.
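The per-frame loop just described can be sketched as follows: fetch the two latest JPEG frames from Redis, decode them with OpenCV, rescale, stitch, run detection and annotation, and write the panorama back to Redis. The key names, the common resolution, and the stitch/detect helpers are assumptions for the example, not the project's actual module.

```python
from typing import Optional

import cv2
import numpy as np
import redis

r = redis.Redis(host="redis", port=6379)

def decode_frame(key: str) -> Optional[np.ndarray]:
    """Fetch one JPEG frame from Redis and decode it into an OpenCV image."""
    buf = r.get(key)
    if buf is None:
        return None
    return cv2.imdecode(np.frombuffer(buf, dtype=np.uint8), cv2.IMREAD_COLOR)

def process_once(stitch, detect_and_annotate) -> None:
    """One iteration of the surveillance loop (sketch; key names assumed)."""
    left = decode_frame("drone:1:frame")
    right = decode_frame("drone:2:frame")
    if left is None or right is None:
        return  # a placeholder frame is shown instead when a drone is offline

    size = (1280, 720)  # common resolution before stitching (assumed)
    panorama = stitch(cv2.resize(left, size), cv2.resize(right, size))
    annotated = detect_and_annotate(panorama)  # YOLO detections + GPS labels

    ok, jpg = cv2.imencode(".jpg", annotated)
    if ok:
        r.set("panorama:frame", jpg.tobytes())
```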
4.4.1 Testing the Subsystem

To enable separate testing of the subsystem, locally stored video files were used. This approach was chosen because it enabled a more efficient development process by allowing the subsystem to be tested in parallel without being dependent on real-time streaming from drones. The use of local video files resulted in a significant increase in the number of possible tests, as the entire operational system did not need to be connected to obtain suitable footage. Development could thus proceed rapidly and iteratively, which is crucial in the early phases of system development.

A fundamental prerequisite for this type of testing was that the methods implemented for the subsystem's purpose, such as video handling and image merging, were designed to function independently of the video source. This means that the functionality tested using local files could be assumed to be representative of the functionality during real-time streaming from drones.

The testing is considered valid because the same code base and processing methods were used regardless of whether the video originated from a local file or from a live stream. The critical functions to be validated concerned how the subsystem handles, processes, and merges video streams, and these operations are identical regardless of the source. Furthermore, local files allowed for systematic and reproducible testing: the same conditions could be reused to isolate and analyze potential errors, significantly improving the precision of fault analysis and systematic improvement.

Since the subsystem was based on previous work developed by Mohi et al. [3], the initial phase of development focused on extending an existing object detection system that originally handled only a single video stream. The work then progressed to enable the handling of multiple streams simultaneously and to develop a method for seamlessly merging these into a unified panoramic image. Validating the subsystem's ability to handle multiple streams and perform accurate merging could be reliably done using local video files, as they provided controlled and repeatable test scenarios that captured the core aspects of the system's functionality.

After the tests with locally stored videos showed good results and a stable base system was established, the subsystem was adapted for integration with the other system components. This integration only required minor adjustments to the video handling logic. Finally, the system's functionality was verified in a full drone system with real-time data, where further optimizations and calibrations of the model's key parameters could be performed.

5 Discussion

This chapter discusses the performance of the individual subsystems and the system as a whole, highlighting both strengths and limitations identified during development and testing. Reflections on the overall design process, lessons learned, and potential areas for improvement are also presented. In addition, the chapter addresses ethical aspects related to the project and proposes directions for future work.

5.1 Drone Control

For a surveillance area that is relatively square in shape, the advantages of the system are not directly apparent, but for a test scenario that follows a long and narrow trajectory, the system is capable of covering a substantially larger area than a single-drone system. This is due to the 16:9 aspect ratio of the drones' FOV.

5.2 Data Serialization

In previous work, the code used plain strings for communication between the Android app and the backend [3]. To send a coordinate over the socket, the first ten digits represented the latitude, digits 11 to 20 denoted the longitude, digits 21 and 22 indicated the altitude, and the remaining digits specified the angle. A problem could arise, however, if the altitude had one or three digits. Instead, a new serialization format for data transfer was used, namely JSON, described in Section 2.4.
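To illustrate the difference, the snippet below contrasts the fixed-width string layout described above with an equivalent JSON message. The field names in the JSON object are illustrative; the actual schema is defined by the msg_type messages in Section 3.4.5.

```python
import json

# Old format (previous work): fixed character positions, which breaks down
# when the altitude does not have exactly two digits.
legacy = "57.7729560" + "12.7699560" + "42" + "-164"
lat, lon, alt, angle = legacy[:10], legacy[10:20], legacy[20:22], legacy[22:]

# New format: JSON with named fields, independent of digit counts.
# (Field names are assumed for this example.)
message = json.dumps({
    "msg_type": "Coordinate_request",
    "lat": 57.772956,
    "lon": 12.769956,
    "altitude": 42,
    "angle": -164,
})
print(lat, lon, alt, angle)
print(message)
```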
One of the disadvantages of JSON is that it does not guarantee type safety and does not automatically enforce a schema. It is also text-based, which is convenient for debugging but can lead to larger message sizes, and serialization and deserialization can be slower than with a binary format. An alternative is ProtoBuf, developed by Google [49]. Messages transmitted with ProtoBuf are serialized into a binary wire format that is compact as well as forward- and backward-compatible [50]. Since ProtoBuf uses a binary wire format, whereas JSON is text-based, ProtoBuf tends to be more performant. When transmitting large amounts of data, for example video and telemetry data from the drones, this could make a noticeable difference. Browser support for ProtoBuf is, however, more limited, while JSON is natively supported. It would therefore make sense to use ProtoBuf between the mobile phone and the backend and JSON for the frontend webpage.

5.3 Endpoint vs Message Type

For communication between the backend and the mobile phone, a message type was introduced, as explained in Section 2.4, to distinguish between different types of messages. An alternative approach could be to use different endpoints for different messages, e.g., /position for drone positions and /flightmanager for commands sent to the drone's flight manager. This would eliminate the need for message types, since only drone position messages would be sent over the /position endpoint. It would also make the code more structured and reduce the risk of bugs; for example, what happens if the mobile phone sends a message type that the backend does not recognize? The reason for the current implementation is that, to accommodate the different endpoints, the mobile phone would have to connect to multiple WebSockets, which would require changes to both the backend and the Android app.

5.4 Real-Time Video Streaming in the Developed System and Its Limitations

The subsystem for real-time video streaming, described in Section 2.8, is currently limited to DJI drones. This limitation arises because the Android app, described in Section 2.4, is specifically designed for DJI drones. To enable video streaming from other types of drones within the developed system, these drones or their control interfaces must support a WebRTC connection by implementing the logic to function as a WebRTC client, similar to the existing one in the current Android app.

Another limitation of the current video streaming setup is that the Android app supports video display either within the app or in the frontend, but not both simultaneously. The reason for keeping the function to display video in the app is debugging, such as verifying that the camera is properly connected before linking the drone to the backend system via the app.

5.5 Extended Video Surveillance

In the merged panoramic image, visual artifacts occasionally occur, particularly in connection with moving objects passing through the overlap zone between the two image frames. This phenomenon is well known in image stitching and is commonly referred to as "ghosting" [31], as it produces ghost-like duplicates of the moving objects. When such artifacts are not detected by the YOLO model, they primarily pose an aesthetic issue from the user's perspective, without affecting the system's core functionality or performance. However, problems may arise if these artifacts are falsely registered as valid objects by the model, which can lead to duplicate detections.
This, in turn, may result in erroneous triggers, reactions, or mismatches in object recognition, where the system responds to objects that do not actually exist.

To minimize the occurrence of these artifacts and mitigate the effects of ghosting, it is crucial to ensure correct overlap between the image frames. In the current system, the overlap is manually defined in the code file. As long as these settings accurately reflect real-world conditions, a stable panoramic image without visible artifacts can be produced. However, this approach requires the ability to consistently reproduce test scenarios, enabling calibration of the model to specific conditions. This methodology was employed during the evaluation of the subsystem in Section 4.4.1.

Similar to the overlap configuration, this subsystem also requires manual specification of the coordinates of both drones. This is necessary because the transformation from pixel coordinates to geographic coordinates, required to assign GPS positions to detected and annotated objects, depends on accurate knowledge of the relative positions of the drones.

The YOLO model used in this subsystem was developed in previous work [3], and it performs sufficiently well to provide a foundation for object detection and the assignment of unique IDs that can be tracked across the merged image surface. However, the model exhibits certain limitations, particularly in frame-by-frame detection, where it occasionally fails to accurately identify objects. To compensate for this, the model is currently configured to operate at a low confidence threshold, which increases its sensitivity to potential objects. While this improves the likelihood of detecting true positives, it also raises the risk of false positives, that is, detecting objects that are not actually present. As a result, both false positives and false negatives may occur, leading to discrepancies between the expected number of objects and the actual detections recorded by the system.

To merge the image frames into a coherent panoramic image, alpha blending is used. This method has proven sufficiently effective for the objectives of the project, as all test scenarios are two-dimensional. However, if scenarios involving a third dimension (i.e., incorporating depth) were introduced, more advanced image stitching techniques would be required. Such techniques include, for example, SIFT and homography-based methods, which aim to identify matching keypoints between image frames to enable robust image fusion. These methods were explored during the early development phase of the subsystem but could not be implemented in a usable way, most likely due to their increased computational demands. This resulted in unstable panoramic images that exhibited temporal instability and misalignment between frames, which did not meet the system's requirements for stability and precision. Given that the test scenarios are limited to 2D use cases, alpha blending has proven to be an adequate solution for achieving the object traceability across the panorama required for the drone system's full functionality.

5.6 Suggestions on Future Work

This subsection outlines potential directions for future work within the project, highlighting areas for improvement and proposing strategies to enhance the full system.

5.6.1 Communication

During testing at AstaZero, it was discovered that ATOS communication relies on an Ethernet cable connection.
In the current system setup, the key functionality depends on all devices being connected to the same local wireless network. A potential improvement is therefore to implement an adapter that bridges Ethernet to the wireless network, or to transition to a fully Ethernet-based solution. However, since the Android app requires a wired connection to the hand controller, this could present challenges in a fully Ethernet-based setup.

An inefficiency observed during testing was the time required to manually enter the IP address and port into the Android app to connect all devices. Another identified point of improvement is therefore to eliminate this step. This would be a crucial step if the system were to be deployed with a large number of drones, which has been identified as another key area for improvement. One solution is for the Android app to ping every possible private IP address on the network; if a server is running on an address, it responds with a message the Android app recognizes. However, a private IPv4 address can range from 10.0.0.0 to 10.255.255.255, i.e. the 10/8 CIDR prefix [51], which corresponds to 2^24 (about 16.7 million) possible addresses and therefore does not scale efficiently. An alternative is for running servers to publish their IP addresses to a list that the Android app can access; the app then checks the list, and the user can easily connect to one of the listed servers.

5.6.2 Extended Video Surveillance

Previous work [3] has identified several areas for improvement in the object detection model, particularly regarding its performance at varying flight altitudes and its ability to detect a broader range of object classes. Since the model has primarily been trained on image data captured from an altitude of approximately 40 meters, its accuracy decreases when applied at other heights [3]. This limitation is primarily attributed to insufficient training data, which could be addressed through extended data collection and additional training efforts [3]. Furthermore, efforts should focus on improving the model's robustness and reliability, as its current configuration fails to consistently deliver satisfactory detection results.

To extend the system's functionality to include three-dimensional interpretation of the environment, the implementation of depth estimation techniques is recommended. Algorithms such as SIFT and homography-based computations could be utilized to reconstruct depth information from multiple camera perspectives. This would notably improve the precision of the image stitching, especially when handling multiple simultaneous video streams, where accurate spatial understanding is critical for achieving a coherent composite. Also, the current system is designed to operate with two drones, which limits the area that can be covered by their camera views. Expanding the system to support a larger number of drones would overcome this limitation. However, integrating such a system introduces new challenges, including a more complex image stitching process.

Another proposed area for future development is the automation of parameter handling within the subsystem responsible for image stitching. In the present system, the user is required to manually specify the drone coordinates and image overlap settings directly in the source code. To enhance usability and reduce the risk of human error, these parameters should instead be retrieved automatically from other system modules.
Such automation would streamline the workflow and contribute to a more integrated and scalable system architecture.

In the current implementation, the coordinates of detected objects are not compared with the trajectories provided by ATOS. These trajectories typically include detailed information such as object type, position, and the expected time of arrival (e.g., "Volvo at coordinates X, Y after T seconds"). A potential improvement would involve extracting and classifying objects from these trajectories so that the YOLO model adopts the same classification scheme. This would enable the system to compare expected object positions and classes with those estimated by the detection algorithm. If a significant mismatch is detected, an abort signal is sent to ATOS via a ROS topic, terminating the scenario.

5.7 Social and Ethical Aspects

The relevant ethical concerns to consider during the project mostly fall within the scope of personal integrity. Since the project involves automated camera surveillance from autonomous drones, a risk of invasion of privacy exists. However, during the use of the system, the video stream provided by the drones does not have sufficient resolution to identify a particular individual. An important point from the social perspective is that drone flight operations will be limited to operators with an approved license, as this is a legal requirement [52]. The ethical benefit, if this project is used on a regular basis, is of course improved safety at AstaZero's testing facility, as the integrated video streams will provide a larger surveillance coverage of the test range.

Social aspects can also be related to privacy, where the collection of data by drones can pose a problem for the security of vital installations, for example installations requiring extra protection against sabotage, terrorism, espionage, and aggravated robbery. According to the Protection Act, a decision on a vital installation means that unauthorized persons do not have access to the object of protection. The prohibition of access also covers access by means of an unmanned aircraft system. A special decision may be issued prohibiting the taking of photographs, descriptions, or measurements of or within the vital installation [53]. If the system developed during the project is misused or falls into the hands of someone with hostile intentions, it could pose a significant risk to these installations.

A social aspect that benefits from the use of drones is the rescue service. With the help of a drone, both endangered people and fires can be located faster, and drones contribute important information throughout the course of a rescue operation through their information retrieval [54]. If the project were to be successful, it could contribute to the rescue service in the future, as drone surveillance and detection of dangers are at the forefront in this sector as well. During a discussion following the Strategic Foresight Workshop at Penn State, James E. Cartwright, a retired four-star general of the United States M