Autoencoder and Active Learning to Reduce False Positive Warnings in a Slippery Road Alert System
Using unlabeled multivariate time series

Master's thesis in Computer Science and Engineering

Philip Axenhamn
Andreas Greppe

Department of Computer Science and Engineering
Chalmers University of Technology
University of Gothenburg
Gothenburg, Sweden 2023

© Philip Axenhamn, 2023.
© Andreas Greppe, 2023.

Supervisor: Selpi, Department of Computer Science and Engineering
Advisor: Per-Olof Nilsson, Volvo Cars
Examiner: Peter Damaschke, Department of Computer Science and Engineering

Master's Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Printed by Chalmers Reproservice
Gothenburg, Sweden 2023

Abstract

Modern vehicles are commonly equipped with a slippery road alert (SRA) system that warns drivers of slippery roads. Current implementations of this system occasionally produce warnings when the road is not slippery. These warnings are called false positives and can harm the system's trustworthiness. In this thesis, we propose a false positive filter capable of reducing the number of false positive alerts generated by an SRA system based on two machine learning techniques: autoencoder and active learning. The available data was completely unlabeled and contained few informative features.
The analysis of this data showed that vehicles send bursts of data points forming sequences. Due to the limitations of the data, multivariate time series were constructed with the idea that sequential data might reveal more about the situation than a single-point measurement. Furthermore, the sequences were grouped into true- and false-positive classes based on assumptions about the causes of the alerts, such as driving on ice or over a speed bump. The sequential data was used to train GRU- and LSTM-based autoencoders and classifiers to detect sequences that correspond to false positive situations so that they can be removed. The hyperparameters of the models were tuned using Optuna, and the best-performing models were further evaluated. Since the data was not labeled, the actual performance of the proposed solution could not be assessed. Instead, the evaluation was based on computing the proportion of remaining assumed true positive (ATP) and assumed false positive (AFP) sequences after filtering. The results show that the LSTM autoencoder could find patterns in the sequential data and was able to remove 43% of the AFP sequences while retaining 90% of the ATP sequences. The active learning approach proved not to work well with the available data.

Keywords: slippery road alert, false positive, machine learning, unsupervised learning, semi-supervised learning, autoencoder, active learning, anomaly detection

Acknowledgements

We would like to express our gratitude to everyone who has supported us during the project. We want to thank our supervisor Selpi from Chalmers, who gave us valuable suggestions and helped us form ideas. We also thank the people at Volvo Cars who made the project possible. A special thanks to Fredrik and Per-Olof, who provided feedback and help throughout the project.
Philip Axenhamn & Andreas Greppe
Gothenburg, 2023-06-19

Contents

List of Figures
List of Tables
List of Acronyms

1 Introduction
  1.1 Problem Formulation
  1.2 Aim and Objectives
  1.3 Contributions
  1.4 Limitations
  1.5 Thesis Outline

2 Related Work

3 Theory
  3.1 Friction Between Tire and Road Surface
  3.2 Estimating The Coefficient of Friction
    3.2.1 Vehicle Operating Parameters
    3.2.2 Tire Properties
    3.2.3 Road Characteristics
    3.2.4 Environmental Factors
  3.3 Machine Learning
    3.3.1 Unsupervised Learning
    3.3.2 Semi-supervised Learning
    3.3.3 Metrics
    3.3.4 Hyperparameter Tuning
  3.4 DBSCAN
  3.5 Artificial Neural Network
    3.5.1 Recurrent Neural Network
    3.5.2 Long Short-Term Memory
    3.5.3 Gated Recurrent Unit
    3.5.4 Autoencoder
  3.6 Spatial Hashing
4 Data Description and Analysis
  4.1 Available Data
    4.1.1 Vehicle Data
    4.1.2 Slippery Road Alert Data
    4.1.3 Weather Data
    4.1.4 Road Data
  4.2 Data Analysis

5 Methods
  5.1 Defining True- and False Positive SRAs
  5.2 False Positive Filter
  5.3 Data Preprocessing
    5.3.1 Sequence Generation
    5.3.2 Sequence Mapping
    5.3.3 Sequence Grouping
  5.4 Final Data Sets
    5.4.1 Autoencoder Data Set
    5.4.2 Active Learning Data Set
  5.5 Machine Learning Methods
    5.5.1 Autoencoder
    5.5.2 Active Learning
  5.6 Evaluation

6 Results
  6.1 Autoencoder
  6.2 Active Learning

7 Discussion and Conclusion
  7.1 Autoencoder
  7.2 Active Learning
  7.3 Available Data
  7.4 Evaluation
  7.5 Ethical and Legal Considerations
  7.6 Future Work
  7.7 Final Conclusion

Bibliography

A Appendix
  A.1 Autoencoder Experiment 3 - All trials
  A.2 Autoencoder Experiment 3 - Hyperparameter importance

List of Figures

3.1 Simplified visualization of the forces acting on a rotating wheel.
3.2 Pool-based active learning.
3.3 How DBSCAN works.
3.4 A single layer perceptron with three input variables and one output.
3.5 A multi-layer perceptron.
3.6 Three common activation functions.
3.7 The unrolling of a recurrent neural network.
3.8 Long short-term memory (LSTM) cell.
3.9 Gated recurrent unit (GRU) cell.
3.10 The structure of an autoencoder.
3.11 Spatial hashing.
4.1 Sample of data points from the vehicle data plotted on a map.
4.2 Number of generated alerts for each season within a year.
4.3 Number of generated slippery road alerts in Gothenburg during the project (late winter and early spring).
4.4 Number of daily generated alerts together with the daily air temperature and precipitation during summer 2022 in Gothenburg.
4.5 Heatmap of generated slippery road alerts within an area in Gothenburg during summer 2022.
4.6 Number of friction measurements against the daily air temperature and daily precipitation levels.
5.1 High-level overview of the proposed false positive filter.
5.2 Temporal clustering of the raw vehicle data.
5.3 Sequence to speed bump matching.
5.4 Assumed outcomes of different weather conditions.
5.5 Structure of the GRU and LSTM autoencoders.
5.6 Baseline model for the active learning approach.
5.7 Final model for the active learning approach.
6.1 Mean MSE reconstruction losses of each class using the GRU- and LSTM autoencoders for settings 1 and 4.
6.2 Train- and validation MSE losses for settings 1 and 4 using the GRU- and LSTM autoencoders.
6.3 Distributions of reconstruction losses per sequence of each class using the GRU- and LSTM autoencoders for settings 1 and 4.
6.4 Distributions of the reconstruction losses for each class using MSE loss and scaled MSE loss.
6.5 Train and validation loss for each epoch using regular MSE loss and scaled MSE loss.
6.6 Validation loss and loss difference for each trial with the LSTM autoencoder using MAE as loss function.
6.7 Validation loss and loss difference for each trial with the LSTM autoencoder using MSE as loss function.
6.8 Distributions of reconstruction losses for each class and for each model trained with hyperparameters from the selected trials.
6.9 Train- and validation loss during training of models with hyperparameters from the selected trials.
6.10 Distributions of the reconstruction losses for each class and two different thresholds.
6.11 Proportion of remaining sequences after filtering using both thresholds.
6.12 Active learning iteration for least certain sampling, test 7.
6.13 Active learning iteration for margin sampling, test 8.
A.1 The final validation loss and loss difference between ATP-ICE val. and AFP-DRY for both the GRU- and LSTM autoencoders using MAE and MSE as loss function.
A.2 Hyperparameter importance for each model tuned using Optuna.

List of Tables

3.1 Factors affecting the estimated friction between tire and road surface.
4.1 Features of the vehicle data.
4.2 Features of the slippery road alert data.
4.3 Features of the weather data from Trafikverket.
4.4 Features of the weather data from SMHI.
4.5 Features of the road data.
5.1 Definition of true- and false positive slippery road alerts.
5.2 Classes for the autoencoder data set.
5.3 Certain classes for the active learning data set.
5.4 Uncertain true positive classes for the active learning data set.
5.5 Uncertain false positive classes for the active learning data set.
5.6 Settings for autoencoder experiment 1.
5.7 Hyperparameters used in autoencoder experiment 1.
5.8 Hyperparameters used in autoencoder experiment 2.
5.9 Hyperparameters tuned with Optuna for the autoencoders.
5.10 Initial hyperparameters for the classifiers.
5.11 Settings for active learning experiment 1.
5.12 Settings for active learning experiment 2.
5.13 Hyperparameters tuned with Optuna for the classifier model.
5.14 Settings for active learning experiment 3.
6.1 Results from experiment 1 - GRU autoencoder.
6.2 Results from experiment 1 - LSTM autoencoder.
6.3 Autoencoder - Best hyperparameters.
6.4 Proportion of remaining sequences for each class after filtering using threshold 1.
6.5 Proportion of remaining sequences for each class after filtering using threshold 2.
6.6 Results from active learning experiment 1.
6.7 Results from active learning experiment 2.
6.8 Active learning - Best hyperparameters.
6.9 Results from active learning experiment 3.
6.10 Proportion of remaining sequences for each class after filtering using the classifier models.

List of Acronyms

ABS Anti-lock Braking System.
ADAS Advanced Driver Assistance System.
AE Autoencoder.
AFP Assumed False Positive.
AL Active Learning.
ANN Artificial Neural Network.
ATP Assumed True Positive.
DBSCAN Density-Based Spatial Clustering of Applications with Noise.
GRU Gated Recurrent Unit.
LSTM Long Short-Term Memory.
MAE Mean Absolute Error.
ML Machine Learning.
MSE Mean Squared Error.
RNN Recurrent Neural Network.
SRA Slippery Road Alert.
TCS Traction Control System.

1 Introduction

Slippery road surfaces are a common cause of car crashes in northern countries. According to Trafa (the Swedish government agency for traffic analysis), 33% of all road-traffic accidents reported during 2022 in Sweden occurred on wet or icy roads [1]. To increase vehicle and traffic safety, modern vehicles are equipped with Advanced Driver Assistance Systems (ADAS) that aim to assist drivers in different driving situations, including difficult road conditions. A subsystem of ADAS is the slippery road alert (SRA) system, which warns drivers of nearby slippery roads via a cooperative intelligent transport system (C-ITS), in which data, including information about road slipperiness, is anonymously shared with nearby vehicles through a cloud-based solution. These warnings make the driver aware of a potentially slippery road ahead so that the vehicle can be manoeuvred to avoid slipping. Inaccurate estimations of road slipperiness in this system can lead to either missed slippery roads (false negatives) or false warnings (false positives). Hence, generating accurate slippery road alerts is important both for vehicle and traffic safety and for the reliability of the system.

1.1 Problem Formulation

In an SRA system, vehicles share friction data with the cloud, where it is processed to generate SRAs that are sent to other nearby vehicles. Current implementations of SRA systems rely on the aggregation of friction coefficients between tire and road surface estimated from in-vehicle sensor measurements. In [2], the authors proposed a sensor data fusion approach for friction estimation based on the brush model and a Kalman filter. Furthermore, a more accurate friction estimation can be made using systems such as the anti-lock braking system (ABS) and the traction control system (TCS). However, current implementations of SRA systems occasionally produce false positive alerts.
Hence, drivers receive warnings of slippery roads when the road is not slippery. Reducing false positive SRAs is important since it would increase the reliability and trustworthiness of the system, which in turn can lead to drivers reducing the speed of the vehicle instead of ignoring the warnings.

1.2 Aim and Objectives

This thesis aims to reduce the number of false positive alerts generated by an SRA system using two machine learning techniques: autoencoder and active learning. The proposed solution is a false positive filter intended to remove sequences that correspond to false positive situations, e.g. driving over a speed bump. The primary objectives of this thesis can be divided into two parts. First, the causes of potential false positive alerts are investigated by analyzing the available data, in order to better understand in which situations the assumed false positive alerts commonly occur and to motivate decisions taken in the study. Second, we aim to improve an SRA system by reducing the number of false positive alerts through the use of the previously mentioned machine learning approaches.

1.3 Contributions

Numerous studies have investigated techniques for estimating the friction between tire and road surface. However, reducing false positive SRAs with machine learning models trained solely on unlabeled, estimated friction data in combination with weather and road data had not been done before this study. In this thesis, we use two machine learning approaches, based on an autoencoder and on active learning, to reduce the number of false positive SRAs. Both approaches are evaluated using unlabeled data only. Furthermore, the study provides an analysis of potential causes of false positive alerts in an SRA system based on the available data. Finally, an approach for creating sequential data from spatio-temporal data points based on time series clustering and DBSCAN is proposed.
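The full sequence-generation procedure is described in Chapter 5. As a rough, simplified sketch of the underlying idea, grouping time-stamped measurements into bursts can be done with a one-dimensional, density-style rule on the time axis; the 30-second gap threshold and the timestamps below are hypothetical illustrations, not values from this study:

```python
# Rough sketch: group time-stamped data points into sequences by
# placing points within `max_gap` seconds of each other in the same
# burst, loosely mimicking density-based clustering in one dimension.
# The gap threshold and the timestamps are hypothetical examples.

def group_into_sequences(timestamps, max_gap=30.0):
    """Split sorted timestamps (seconds) into bursts separated by > max_gap."""
    sequences = []
    current = [timestamps[0]]
    for t in timestamps[1:]:
        if t - current[-1] <= max_gap:
            current.append(t)
        else:
            sequences.append(current)
            current = [t]
    sequences.append(current)
    return sequences

# Two bursts of measurements separated by a long pause.
ts = [0, 5, 11, 18, 300, 306, 315]
print([len(s) for s in group_into_sequences(ts)])  # [4, 3]
```

A time-gap rule like this is the 1-D special case of what DBSCAN does in higher dimensions; the actual method in Chapter 5 also takes the spatial positions of the points into account.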
1.4 Limitations

The limitations of this study were mostly related to the available data and the evaluation of the proposed solution.

• Spatial and temporal limitations in the data: A data set was not available at hand and therefore had to be constructed during the project. The data sets used for training the machine learning models were gathered within a limited time frame, more specifically during winter and early spring. Hence, the generated data sets lack data from summer and autumn, where false positive alerts are probably easier to distinguish from true positive alerts. Additionally, the data was gathered from vehicles in Gothenburg, Sweden.

• Feature limitations in the data: The most important features in the available data were the estimated friction coefficient and the friction quality. Properties of the vehicle, such as its velocity, weight, and the lateral and longitudinal forces on the wheels, were not available but would probably be beneficial for detecting false positive alerts.

• Unlabeled data: The available friction data was unlabeled, meaning that ground truths for true and false positives were not available. Creating a labeled data set would be a time-consuming process, and due to the limited time for the project, this was avoided. Hence, unsupervised and semi-supervised methods were used, and some of the choices made in this study were based on assumptions motivated by the literature study and the data analysis.

• Evaluation limitations: Evaluating unsupervised machine learning models is generally challenging since there is no ground truth to compare with. Usually, at least a small set of labeled data is collected to have something to evaluate against. However, a labeled data set could not be created due to practical difficulties and time constraints. Hence, determining whether the proposed solution reduces false positive alerts was not feasible.
Instead, the evaluation of the proposed solution was mostly based on comparing the machine learning models on assumed true- and false-positive sequences using metrics that do not rely on ground truths.

1.5 Thesis Outline

The rest of this thesis is structured as follows:

• Chapter 2 presents previous work and research related to this study.
• Chapter 3 provides a theoretical background on the most important concepts and techniques used in this project.
• Chapter 4 provides a description and analysis of the available data.
• Chapter 5 describes the methods used and how the project was conducted, including data processing, choice of architecture, development of the machine learning models, and finally the evaluation of the proposed solution.
• Chapter 6 presents the results of the experiments and the final evaluation of the autoencoder and active learning approaches.
• Chapter 7 discusses the results and concludes the thesis with a discussion of ethical and legal considerations, insights on the topic, and future work.

2 Related Work

False positive alerts can be considered anomalies, and numerous studies have been made on how to detect anomalies in time series. In [3], the authors discussed common complexities and challenges in the field of anomaly detection using deep learning techniques. First, some anomalies might be unknown until they occur, which makes them difficult to prevent. Second, there might be heterogeneous anomaly classes, i.e. anomalies with different root causes and abnormal characteristics. Third, anomalies are typically rare and only occur on special occasions, which can lead to class imbalance. In [4], the authors pointed out that anomaly detection differs from traditional machine learning, where the goal is to find similarities rather than dissimilarities in the data.
Furthermore, the authors mentioned that the evaluation of anomaly detection systems is crucial but difficult to perform, and even claimed that it is more challenging than developing the system itself. One of the most challenging aspects of evaluating anomaly detection systems is the lack of a labeled data set. In other fields of machine learning, there are often public data sets available that make results directly comparable with those of previous studies. In contrast, researchers in the field of anomaly detection are forced to create their own data sets, which means that their results are not directly comparable with results from previous studies. For the reasons previously mentioned, anomaly detection is typically defined as an unsupervised learning task. In [5], k-means clustering was used to reduce anomalies in the field of computer networks. Using this approach, it was possible to achieve an anomaly detection rate of 82.92%. Clustering-based methods are typically less complex than other machine learning models and can therefore be easier to implement. However, other methods might be more suitable than clustering depending on the available data. Autoencoders are commonly used as an unsupervised method for anomaly detection and can be applied to many different types of complex data, including images and time series. In [6], autoencoders were applied to reduce false positives in a handgun detection system. The results showed that the autoencoder could reduce the number of false positives of detected guns by 78%. In [7], the authors trained LSTM-based autoencoders on time series for anomaly detection. The LSTM autoencoders achieved an accuracy of 87% in detecting anomalies. A more recent approach for anomaly detection in time series involves the utilization of transformers, as proposed in [8].
The experimental results revealed an improvement in F1-score of up to 17% and a remarkable 99% decrease in training time compared to the baseline models. While transformer-based autoencoders share similarities with traditional RNN-based autoencoders, they offer advantages such as improved performance and execution time. However, these benefits come at the expense of a more intricate architecture, which may pose implementation challenges. Moreover, research efforts have also been directed toward the prediction of road states using various machine learning techniques. Multiple studies have explored approaches utilizing data from external sensors such as GPS receivers, accelerometers, and gyroscopes attached to vehicles. In [9], the authors employed unsupervised methods, including self-organizing maps (SOM) and k-means clustering, in combination with the aforementioned sensor data to group roads based on their roughness and condition. Conversely, in [10] the authors implemented a supervised learning approach to predict road obstacles using the same type of sensor data. These studies show the potential of using sensor data to directly estimate and identify road obstacles and conditions from data available in the vehicles.

3 Theory

This chapter provides a theoretical framework for the key concepts of the project. Section 3.1 describes the fundamentals of friction and how it occurs between tire and road surface. Section 3.2 describes the most critical factors affecting the estimated friction coefficient. Section 3.3 describes important machine learning concepts relevant to the study. Section 3.4 describes the DBSCAN clustering algorithm used in the sequence generation process. Section 3.5 provides a theoretical background on artificial neural networks, recurrent neural networks, LSTM, GRU, and autoencoders. Finally, Section 3.6 describes spatial hashing, a method that was used in the creation of the data sets.
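The friction definition given in Section 3.1 below, µ = F / F_N, lends itself to a small numeric illustration. The sketch assumes a wheel on a flat horizontal surface, where the normal force is the carried mass times the gravitational acceleration; all numbers are hypothetical examples, not measurements from this study:

```python
# Illustrative sketch of the coefficient of friction, mu = F / F_N,
# for a wheel on a flat horizontal surface. All values are
# hypothetical examples.

G = 9.81  # gravitational acceleration (m/s^2)

def friction_coefficient(friction_force_n: float, mass_kg: float) -> float:
    """Estimate the coefficient of friction, where F_N = m * g."""
    normal_force_n = mass_kg * G
    return friction_force_n / normal_force_n

# Example: a wheel carrying 400 kg subject to a 2000 N friction force.
mu = friction_coefficient(2000.0, 400.0)
print(round(mu, 3))  # roughly 0.51
```

The point of the sketch is only the ratio itself: a higher friction force at the same load yields a higher µ, bounded in practice between 0 (no friction) and 1 (full friction).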
3.1 Friction Between Tire and Road Surface

Friction is the force that counteracts the relative motion between two surfaces that are in contact with each other. The coefficient of friction (COF), often written as µ, is defined as the ratio between the impressed force and the normal force acting on an object and ranges from 0 (no friction) to 1 (full friction). For a moving object, e.g. a rotating wheel on a flat horizontal surface, the coefficient of friction is defined as the quotient between the horizontal friction force (commonly called rolling resistance) F and the vertical normal force F_N [11]:

µ = F / F_N    (3.1)

As shown in Figure 3.1, the friction force F from the ground acts in the direction opposite to the direction of motion of the wheel at the point of contact. On the vertical axis of the wheel, the normal force F_N, which is perpendicular to the road surface, is the force with which the ground acts on the wheel. Conversely, according to Newton's third law of motion, the wheel acts with an equal force F_W on the ground, calculated by multiplying the mass by the acceleration due to gravity [11]. The friction force F between tire and road surface enables the vehicle to move forward and is essential for maneuvering the vehicle in both the lateral and longitudinal directions. In general, higher friction means more control over the vehicle. When the coefficient of friction is low, the vehicle starts to slip and the driver loses control of the vehicle. Hence, accurate estimation of the friction between tire and road surface is important for improving road and traffic safety.

Figure 3.1: Simplified visualization of the forces acting on a rotating wheel.

3.2 Estimating The Coefficient of Friction

The coefficient of friction (COF) can be estimated for each wheel or as the average tire-road friction coefficient over all wheels of the vehicle.
In the latter case, the coefficient of friction is assumed to be equal for all wheels; for individual tire-road friction coefficient estimation, this assumption is not made [12]. In this thesis, the coefficient of friction between tire and road surface was estimated as the average friction over all wheels. Average friction coefficient estimation is accomplished using slip-slope methods, extended Kalman filters, or lateral-dynamics-based methods [12].

Slip-slope methods estimate the COF between tire and road surface by utilizing the relationship between the normalized longitudinal forces acting on the wheels and the relative motion between tire and road surface, referred to as slip [12]. Tire slip has been proven to correlate strongly with the COF and can be described as the amount of wheel lock, where 0% slip corresponds to a free-rolling wheel and 100% corresponds to a fully locked wheel [13]. The coefficient of friction increases rapidly until the peak friction is reached, which typically occurs at 10-20% slip. Then, the coefficient of friction decreases slowly up to 100% slip, i.e. when the wheel stops rotating. To avoid sliding when braking on icy roads, the relationship between the coefficient of friction and slip is used in the anti-lock braking system (ABS) to determine when it needs to activate [13].

The interaction between tire and road surface is a complex event that is affected by a variety of parameters, including vehicle weight, slip, road condition, and surface texture. Additionally, surrounding environmental factors such as ice, snow, slush, water, and gravel have an impact on the friction that occurs between tire and road surface. Some of the most influential factors are divided into four categories presented in Table 3.1, modified from [13].

Table 3.1: Factors affecting the estimated friction between tire and road surface.
Vehicle operating parameters:
• Longitudinal motions: vehicle speed, braking action
• Lateral motions: turning, overtaking

Tire properties:
• Tread design and condition
• Inflation pressure
• Rubber composition and stiffness

Road characteristics:
• Road type: asphalt, brick, gravel
• Unevenness: potholes, speed bumps

Environmental factors:
• Weather: temperature, precipitation, humidity, wind speed
• Contaminants: anti-skid materials (salt, sand); dirt, mud, oil spills

3.2.1 Vehicle Operating Parameters

The estimated friction coefficient between tire and road surface depends on how the vehicle is maneuvered. As previously mentioned, the friction coefficient is estimated from the forces acting on the wheels. Since driving maneuvers such as accelerating, braking, and turning affect these forces, the estimated friction coefficient inherently depends on how the vehicle is maneuvered [13].

3.2.2 Tire Properties

When the wheel spins, the tire is deformed and recovered in repeated cycles due to the loaded weight. Hence, the friction force between the tire and the road surface has a hysteresis characteristic that depends on the stiffness of the tire rubber [13]. The tire stores deformation energy within the rubber and releases it in the form of a pushing force and heat, resulting in a net friction force that contributes to the reduction in forward motion. Additionally, friction occurs due to adhesion, which is the result of small-scale bindings of the rubber molecules with the road surface. These molecular bonds, called Van der Waals bonds, are weak attractive forces lasting for a short amount of time.
Thus, the total friction force F can be defined as the sum of the hysteresis and adhesion friction forces, F_H and F_A respectively, as follows [13]:

F = F_H + F_A (3.2)

Both the hysteresis and adhesion force components depend on the properties of the tire itself but also on the characteristics of the road surface that are described in the next section. The hysteresis effect mostly occurs on macro-level asperities and is the dominant component on wet and rough-textured road surfaces. Conversely, adhesion occurs on micro-level asperities and is dominant on dry and smooth-textured road surfaces [13].

3.2.3 Road Characteristics

The properties of the road surface have a direct effect on the estimated tire-road friction coefficient. The type of road surface affects the maximal friction achievable between the road and the tire. In [14], the authors studied how different road conditions affect the friction coefficient. On rough roads, the friction coefficient was measured to be in the range 0.8-0.9, whereas smooth roads resulted in the range 0.35-0.45. Finally, gravel and mud surfaces resulted in friction coefficients between 0.1 and 0.3. Hence, the type of road surface influences the friction coefficient regardless of other parameters.

3.2.4 Environmental Factors

Both temperature and precipitation have an indirect effect on the friction between tire and road surface. Water on the road reduces the friction between tire and road, resulting in reduced traction and control over the car. In [15], it was shown that even a 0.05 mm layer of water on the road surface can result in up to 30% less tire-road friction compared to dry conditions. More severe cases, when thicker layers of water build up on the road surface, result in an increased risk of hydroplaning, where the friction coefficient can drop close to zero [13]. Snow and ice are the most hazardous environmental factors when it comes to driving.
In the experiments of [14], the friction coefficient of ice surfaces was below 0.1. In cases when ice or snow blocks the direct contact between tire and road surface, any sudden changes like braking or changing direction will cause the wheels to lock and slide [13].

3.3 Machine Learning

Machine learning (ML) is a sub-field of artificial intelligence (AI) in which algorithms are used to find patterns in empirical data, inspired by how humans learn [16]. The concept of learning in ML is accomplished by gradually improving the accuracy of predictions in an iterative process commonly referred to as training. The end product of a machine learning pipeline is a trained model capable of making predictions based on new data that the model has never been exposed to before [16]. ML has proven to perform well in a variety of tasks such as classification, regression, clustering, and association tasks within many different fields including self-driving cars, banking and financial services, and healthcare. In recent decades, there have been tremendous advances in the field of artificial neural networks (ANNs), especially deep neural networks (DNNs), which have proven to outperform previously state-of-the-art methods in many fields [17]. Furthermore, depending on both the available data and the task at hand, machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning [18]. In this study, unsupervised and semi-supervised learning methods were used and are therefore described more thoroughly next.

3.3.1 Unsupervised Learning

Unsupervised learning is a machine learning technique in which algorithms are trained to find patterns in unlabeled data [18]. In contrast to supervised learning, where each data point is tagged with a target variable, unsupervised methods rely solely on capturing patterns in the underlying structure of the data [18].
Labeling a large amount of data can be both time-consuming and expensive but is often necessary to train models capable of making accurate predictions that reflect real-world situations. Additionally, evaluating the results of models trained in an unsupervised fashion can be challenging since there is no ground truth to compare with, especially in classification and regression tasks [19]. However, unsupervised learning is still used in a variety of tasks including data visualization, association, anomaly detection, dimensionality reduction, and data generation. Data clustering methods such as k-means, DBSCAN, and hierarchical clustering are common unsupervised learning techniques in which data points are clustered into groups based on some predefined similarity measure such as Euclidean distance, Haversine distance, or cosine similarity [20]. For anomaly detection tasks, ANNs are commonly trained to find patterns in "normal data" such that anomalous data can be detected and removed [21]. An autoencoder is a type of generative ANN that was used in this study for anomaly detection and is therefore described later in this chapter.

3.3.2 Semi-supervised Learning

Semi-supervised learning methods use a combination of both labeled and unlabeled data, where the former is commonly smaller in size [18]. Examples of semi-supervised learning techniques are active learning, self-training, and co-training. Each of these techniques involves the model labeling or pseudo-labeling samples from a larger unlabeled data set. Hence, a smaller amount of labeled data is needed, which saves both time and resources. However, unlike in unsupervised learning, these models can still be trained in a supervised fashion using labels as ground truths to compute the loss.

Active Learning

Active learning (AL) is a semi-supervised machine learning approach used to create labeled data sets efficiently when annotating data is either difficult or expensive.
There are two methods for implementing AL: pool-based and stream-based. In stream-based AL, a model is trained on an initial small labeled data set, and new samples from the unlabeled data set are then passed successively to the model, which predicts a label. For predictions made with high confidence, the samples are added to the labeled data set, whereas the more uncertain samples are instead labeled by the oracle (a human labeler). Pool-based active learning involves additional steps, including sampling data points and using a query to select new samples, as shown in Figure 3.2. The core concept in pool-based AL is that a smaller labeled data set is used to train a model that samples data from the larger unlabeled data set using a query function. The new data samples are labeled using either the model or human annotators. Finally, the labeled samples are added to the labeled data set. The model is then trained using the extended data set, and the cycle is repeated until a data set of sufficient size has been achieved [22].

Figure 3.2: Pool-based active learning.

There are multiple methods for querying new samples in active learning, including query by committee, uncertainty-based sampling, and expected model change/error reduction. The committee sampling methods score the samples using multiple models trained on the labeled data. There are several uncertainty-based methods, but the main goal is to efficiently select the samples that the model struggles to classify accurately and manually label them instead. Some examples are margin-based sampling, entropy sampling, and least confidence sampling [23]. Finally, expected model change and expected error reduction are similar, computationally heavy methods. The change or error reduction is calculated when adding a new sample, leading to many additional computations [24].

3.3.3 Metrics

Metrics are used to measure the performance of machine learning models.
For classification tasks, some of the most widely used metrics are accuracy, F1-score, precision, and recall. Accuracy can be used to assess the overall performance of the model. For binary classification tasks, accuracy is defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (3.3)

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives. However, accuracy can be misleading if the data set is imbalanced, i.e. when there are more instances of some class. Additionally, accuracy is not a good measure of the model's performance if false positives and false negatives have different costs. Therefore, other metrics such as the F1-score are usually more useful than accuracy. The F1-score is computed from the precision and recall of a model. Precision measures the proportion of positive predictions that were correct and is computed as follows:

Precision = TP / (TP + FP) (3.4)

In contrast, recall measures the completeness of the positive predictions and is computed as follows:

Recall = TP / (TP + FN) (3.5)

Finally, with precision and recall, the F1-score is computed as follows:

F1-score = 2 · (Precision · Recall) / (Precision + Recall) (3.6)

3.3.4 Hyperparameter Tuning

Hyperparameter tuning is essential when optimizing the performance of a machine learning algorithm. A hyperparameter is a parameter that directly affects the learning process, such as the batch size, learning rate, and number of epochs. The optimal hyperparameters for a model differ depending on the data that the model is being trained on. The goal of hyperparameter optimization is to find a combination of hyperparameters that yields the optimal value of an objective function, e.g. loss or accuracy [25]. The most common approaches for hyperparameter tuning are grid search and random search. Grid search is an exhaustive search method in which the best-performing model is found by training the model on each combination of the provided hyperparameters.
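As a small illustration of exhaustive grid search, the hyperparameter combinations can be enumerated with `itertools.product`. The search space and the stand-in objective below are hypothetical; in practice, the objective would train and evaluate a model:

```python
from itertools import product

# Hypothetical search space (illustrative values, not the thesis's actual grid):
grid = {"learning_rate": [1e-3, 1e-4], "batch_size": [32, 64], "epochs": [10, 20]}

def validation_loss(params):
    # Placeholder objective; a real run would train a model and return its
    # validation loss for this hyperparameter combination.
    return params["learning_rate"] * params["batch_size"] / params["epochs"]

# Evaluate every combination and keep the best one:
best = min(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=validation_loss,
)
print(best)  # lowest stand-in loss: lr=1e-4, batch_size=32, epochs=20
```

The number of evaluations grows multiplicatively with each added hyperparameter (here 2 × 2 × 2 = 8 runs), which is what makes grid search expensive.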
In random search, the combinations of hyperparameters are randomized, which in some cases can lead to better hyperparameters in fewer iterations. Even though both grid and random search can often be computed in parallel, they are time-consuming methods, especially for large numbers of hyperparameters [25]. Another approach for hyperparameter tuning is to use a hyperparameter optimization framework such as Optuna, which is designed to find good parameters automatically and efficiently. A benefit of using Optuna is that the search space is constructed dynamically using a sampling strategy. Both grid search and random search can be used as the sampling method in Optuna. Additionally, other model-based sampling methods available in Optuna are the tree-structured Parzen estimator (TPE), Gaussian processes (GP), and the covariance matrix adaptation evolution strategy (CMA-ES). In this thesis, the TPE sampling method was used, which is based on independent sampling of previously evaluated hyperparameters and is known to generally perform well. Another benefit of using Optuna is that trials can be terminated early if they do not seem promising. This means that more time can be spent on better trials during the search for the optimal hyperparameters. Finally, after the hyperparameter tuning process has finished, Optuna can extract the hyperparameters that had the most impact on minimizing or maximizing the objective value [26]. Additionally, Optuna can handle multiple objectives during the hyperparameter tuning process. In multi-objective optimization, it is common to end up with more than one optimal combination of hyperparameters, and trade-offs must therefore be made. With Optuna, the optimal trials for multiple objectives can be extracted by computing the Pareto front [27].
3.4 DBSCAN

Density-based spatial clustering of applications with noise (DBSCAN) is a clustering algorithm used to group spatial data points by how densely packed they are [28]. Points that are closely packed together will form a cluster, while points located in low-density regions will be classified as outliers. Hence, DBSCAN can detect outliers and handle noisy data well, unlike other well-known clustering algorithms such as k-means. The main idea of DBSCAN is to cluster the given data based on a concept called the ϵ-neighbourhood, defined as follows:

Definition 1. Let D be a data set containing points. A point q ∈ D is a neighbour of p ∈ D if its distance to p is smaller than the radius ϵ:

N_ϵ(p) = {q ∈ D | d(p, q) < ϵ}

where d is the distance measure. Examples of distance measures are the Euclidean, Manhattan, and Haversine distances. The latter is defined as the angular distance between two points (longitude and latitude) on the surface of a sphere. Unlike the Euclidean and Manhattan distances, the Haversine distance takes the curvature of the sphere into account. In DBSCAN, the radius ϵ determines the maximum distance between two points for one of them to be considered a neighbour of the other. Choosing a too small epsilon would lead to many points being classified as outliers. Conversely, a too high epsilon value would result in merging clusters that should have been separated. Another tunable parameter in DBSCAN is the minimum number of points within the radius ϵ needed for the center point to be considered a core point.

Figure 3.3: How DBSCAN works.

DBSCAN starts by defining core points, as shown in Figure 3.3. Core points are used to extend the clusters further. Ultimately, when all core points have been assigned a cluster, non-core points (border points) that are within the ϵ radius of a core point are added to its cluster. Border points can only be added to clusters and cannot be used to add new points to the clusters.
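The procedure just described — identifying core points, expanding clusters, attaching border points, and marking the rest as outliers — can be sketched in pure Python. This is an illustrative, naive O(n²) version; library implementations such as scikit-learn's are far more efficient:

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for outliers."""
    def neighbours(i):
        # Definition 1: all points with d(p, q) < eps (naive linear scan)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) < eps]

    labels = [None] * len(points)          # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:           # i is not a core point
            labels[i] = -1                 # provisional outlier (may become border)
            continue
        cluster += 1                       # start a new cluster from this core point
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:            # previously-seen non-core: border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours(j)) >= min_pts:   # j is a core point: keep expanding
                queue.extend(neighbours(j))
    return labels

# Illustrative data: two dense groups and one isolated point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
       (10.0, 0.0)]
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```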
Finally, the remaining points are considered to be outliers since they are not close to any core point.

3.5 Artificial Neural Network

An artificial neural network (ANN) is a machine learning algorithm inspired by the biological neural networks that exist in the human brain [29]. Similar to the human brain, ANNs are built upon a set of interconnected nodes/neurons that send and receive signals in a complex system. In an ANN, these signals are real numbers that are processed by the nodes to form an output computed by a non-linear function, called an activation function. The simplest ANN is a single-layer perceptron consisting of multiple inputs and one output, as shown in Figure 3.4.

Figure 3.4: A single-layer perceptron with three input variables and one output.

A perceptron with n input variables is a type of linear classifier that computes the output ŷ in two steps. First, each input variable x_1, x_2, ..., x_n is multiplied with its corresponding weight w_1, w_2, ..., w_n, as shown in Equation 3.7:

z = Σ_{i=1}^{n} x_i w_i + b (3.7)

Additionally, a bias term b is added to this weighted sum and serves as an additional parameter that can be tuned to improve the performance of the perceptron. Second, the weighted sum is passed through an activation function f that determines whether the node should be activated, as shown in Equation 3.8:

ŷ = f(z) = 1 if z > 0, 0 otherwise (3.8)

Common activation functions are sigmoid, tanh, and the rectified linear unit (ReLU) [30]. In this simple example, the activation function activates if the input is positive and deactivates if the input is negative. Hence, the output of this perceptron will be binary (0 or 1). The single-layer perceptron can handle linear classification tasks but struggles when the data is not linearly separable [29]. Therefore, the multi-layer perceptron (MLP), also referred to as a feed-forward neural network (FNN), was developed with the idea of creating a non-linear model by combining multiple single-layer perceptrons in a network.

Figure 3.5: A multi-layer perceptron.

An MLP is a fully-connected neural network, meaning that each neuron in one layer is connected to each neuron in the neighbouring layers, as shown in Figure 3.5. A connection between two neurons enables the output signal from one neuron to be passed as input to the next neuron. The layers can be divided into an input layer, hidden layers, and an output layer. Both the input and output layers consist of a single layer each. Between them are the hidden layers, which can contain any number of layers and neurons. When calculating the depth of an ANN, only the layers with adjustable weights are counted, i.e. the hidden layers and the output layer. Increasing the number of hidden layers increases the number of parameters and the complexity of the network. Hence, training a network with many layers will be more time-consuming, and the network might overfit to the training data and thus perform worse when exposed to new data. Conversely, a network containing too few hidden layers will struggle to learn the underlying representation of the data. ANNs containing many layers are referred to as deep neural networks (DNNs) and will be discussed later in this chapter. The network shown in Figure 3.5 has four neurons in the input layer, one for each input. The input data is then passed to the two hidden layers containing three neurons each. Finally, the signals from the three neurons in the last hidden layer are passed to the output layer containing two neurons to form the output. Similar to humans, ANNs learn by making initial guesses and improving these guesses when new information is given and processed. The goal of training an ANN is that it should learn how to make accurate predictions by finding the optimal weights and biases for each neuron in the network.
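The two-step computation of Equations 3.7 and 3.8 can be sketched in a few lines (the input values and weights below are illustrative):

```python
def perceptron(x, w, b):
    """Single-layer perceptron: weighted sum plus step activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # Equation 3.7
    return 1 if z > 0 else 0                      # Equation 3.8 (step function)

# Illustrative inputs and weights:
print(perceptron([1.0, 2.0, 3.0], [0.5, -1.0, 0.2], 1.0))  # z = 0.1 > 0 -> 1
```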
Training an ANN is an iterative process that consists of two main steps: forward propagation and backpropagation. In forward propagation, the input data is passed forward through each neuron in the network. As previously mentioned, the output of each neuron is computed using an activation function that suppresses the input values to a bounded range such that the problem of exploding gradients is mitigated. The most commonly used activation functions are sigmoid, tanh, and ReLU, as shown in Figure 3.6.

Figure 3.6: Three common activation functions.

The sigmoid activation function is a non-linear function that transforms the input to a value between 0 and 1 and is thus suitable for producing probabilities [30]. Since many loss functions expect the input values to be in the range 0 to 1, the sigmoid activation function is commonly used for neurons in the output layer. The tanh function is another non-linear function that pushes the input values to the range -1 to 1. Both the sigmoid and tanh activation functions are differentiable everywhere, meaning that both can be used for gradient-based backpropagation. However, both functions can cause the vanishing gradient problem [30]. This means that the weights and biases are updated with small changes, resulting in slow learning. The ReLU activation function is commonly used in the hidden layers instead of sigmoid and tanh to fix this problem. Unlike sigmoid and tanh, ReLU is not differentiable at zero, which means that a sub-derivative is needed in the backpropagation when using ReLU as an activation function. Additionally, ReLU introduces sparsity in the hidden layers, which has been proven to be beneficial in ANNs and is achieved when negative values are converted to zero by the ReLU function. However, ReLU suffers from the dying ReLU problem that occurs when too many activations fall below zero, resulting in the deactivation of neurons and thus reduced learning [31].
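The three activation functions can be written directly from their definitions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes input into (0, 1)

def tanh(z):
    return math.tanh(z)                # squashes input into (-1, 1)

def relu(z):
    return max(0.0, z)                 # zero for negative inputs (sparsity)
```

For example, `sigmoid(0.0)` is exactly 0.5, while `relu` zeroes out every negative activation, which is the source of both the sparsity benefit and the dying ReLU problem discussed above.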
The final output from the neurons in the output layer is a predicted vector ŷ that is compared with the target vector y using a loss function. The loss is a measure of how far the predicted value is from the target, and the choice of loss function depends on the type of task the network tries to solve, e.g. classification or regression. For binary classification tasks, the binary cross entropy (BCE) is commonly used and is computed as follows [32]:

L_BCE = -(1/n) Σ_{i=1}^{n} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ] (3.9)

The BCE loss function expects the predicted values ŷ_i and target values y_i to be interpreted as probabilities ranging from 0 to 1. Additionally, the BCE loss is differentiable at all points, making it suitable for gradient computations. Another commonly used loss function for binary classification tasks is the hinge loss, defined as follows [33]:

L_HINGE = (1/n) Σ_{i=1}^{n} max(0, 1 − ŷ_i · y_i) (3.10)

Similarly to BCE, the hinge loss function also expects the predicted values to be between 0 and 1. However, the hinge loss is not differentiable at one, meaning that sub-derivatives are needed. For regression tasks, the mean absolute error (MAE), also known as L1 loss, is a commonly used loss function and is calculated by taking the absolute value of the difference between the predicted and target values as follows [34]:

L_MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i| (3.11)

Another loss function for regression tasks that is similar to MAE is the mean squared error (MSE), which is computed by taking the square of the difference between the predicted and target values [35]:

L_MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² (3.12)

Unlike MAE, MSE penalizes large errors since the difference is squared. If most predictions result in low losses, then MSE is preferable. However, if the model makes a few poor predictions, the MSE loss will magnify the error due to the squaring of the difference. This will result in an overall high error using MSE even though the majority of predictions were accurate.
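To make the MAE/MSE comparison concrete, both losses can be evaluated on a small set of illustrative predictions where a single prediction is poor:

```python
def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)    # Eq. 3.11

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)  # Eq. 3.12

y     = [1.0, 2.0, 3.0, 4.0]   # targets (illustrative values)
y_hat = [1.1, 1.9, 3.0, 9.0]   # three good predictions, one off by 5

print(mae(y, y_hat))  # ≈ 1.3
print(mse(y, y_hat))  # ≈ 6.255: squaring magnifies the single large error
```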
In that case, MAE might be the better alternative. The goal of training an ANN is to reduce the loss by updating the weights and biases of the network during backpropagation. Finding the smallest loss is an optimization problem in multi-dimensional space, tackled by the use of an optimizer. Most optimizers are based on gradient descent, which is an algorithm that converges to a local minimum of the loss function. Some of the most well-known optimizers are stochastic gradient descent (SGD), root mean squared propagation (RMSprop), and adaptive moment estimation (Adam) [36]. During backpropagation, the weights w_i and biases b of the network are updated using gradient descent as follows [37]:

w_i = w_i − η ∂L/∂w_i
b = b − η ∂L/∂b (3.13)

The learning rate η is a tunable hyperparameter that determines the size of each step taken towards a local minimum of the loss function in gradient descent [37]. A too high learning rate might result in overshooting the local minimum. Conversely, a too low learning rate can result in slow convergence or getting stuck at an undesirable local minimum. Commonly, the learning rate is set to a value between 1×10⁻³ and 1×10⁻⁵ but needs to be tuned for optimal performance.

3.5.1 Recurrent Neural Network

A recurrent neural network (RNN) is a type of artificial neural network (ANN) that handles sequential data as input [38]. Similar to vanilla ANNs, RNNs consist of an input layer, hidden layers, and an output layer. However, traditional ANNs do not take the order of the input data into account and can therefore not find temporal or structural patterns in sequences. Therefore, RNNs use an additional memory mechanism in the form of a feedback loop that allows the model to reuse past information in the computation of the next output [38], as shown in Figure 3.7.
RNNs can be used for different types of sequence modeling tasks including sentiment classification, text generation, image captioning, translation, and forecasting. For classification tasks, the output of the RNN will not be a sequence but rather a vector of probabilities. For sequence generation tasks, commonly referred to as sequence-to-sequence (seq-to-seq) tasks, the relationship between the input and output is many-to-many since both the input and output are sequences [39].

Figure 3.7: The unrolling of a recurrent neural network. The RNN is unrolled across time into a full network, one copy for each element in the sequence.

An RNN cell is similar to a neuron in a feed-forward neural network in that it computes an output vector by applying an activation function to the weighted sum of the inputs. However, in addition to the output vector, RNN cells produce a hidden state h_t that is passed to the next RNN cell in the network such that the output is influenced by both current and previous values. The weights and biases are shared across time, meaning that the number of parameters does not change with an increasing number of unrolls of the network. During forward propagation of the network, the hidden state h_t and output ŷ_t are updated for each time step t as follows [40]:

z_t = W h_{t−1} + U x_t + b
h_t = tanh(z_t)
o_t = V h_t + c
ŷ_t = softmax(o_t) (3.14)

where U is the input-to-hidden weight matrix, V the hidden-to-output weight matrix, W the hidden-to-hidden weight matrix, and b, c are bias vectors. During the backpropagation of an RNN, the gradients flow backward for each time step and across all time steps starting from time t. This process is referred to as backpropagation through time (BPTT) [38]. The gradients are computed with respect to each of the weight matrices U, V, W and the bias vectors b and c. Each parameter is then updated similarly to a regular ANN. Due to the unrolling of the RNN, the network usually becomes deep, resulting in many repeated gradient computations.
This can in turn result in vanishing or exploding gradients, i.e. gradients becoming too small or too large respectively [38]. Hence, the network will not be able to update the weights and biases properly, and the learning will be reduced or stopped entirely. In other words, vanilla RNNs suffer from short-term memory and struggle to carry information from earlier time steps, especially for long sequences. A solution to exploding gradients is gradient clipping, where large gradients are clipped back to a feasible scale. Vanishing gradients are a more common problem in RNNs and can be somewhat mitigated by choosing a suitable activation function, initializing the weights carefully, or changing the network architecture [38].

3.5.2 Long Short-Term Memory

As previously described, vanilla RNNs commonly face the vanishing and exploding gradient problems, which inherently prohibit the network from learning. To counteract this issue, long short-term memory (LSTM) cells were introduced with the idea of including both long- and short-term memory when computing the outputs of each cell in the RNN [41]. The long-term memory of an LSTM cell is represented by the cell state c that flows through the top of the LSTM cell in Figure 3.8. In contrast, the short-term memory is reflected in the hidden state h. The main idea of LSTMs is that the gradients for the cell states neither vanish nor explode since they are not computed directly with respect to the weights. LSTMs are more complex compared to regular RNNs but are better at learning the inter-dependencies in sequential data [41] through the use of gates. The first gate of an LSTM cell is the forget gate, where the input vector x_t at the current time step t and the previous hidden state h_{t−1} are passed through a sigmoid activation function that outputs a value ranging from 0 to 1 (a probability) as follows:

f_t = σ(W_f [h_{t−1}, x_t] + b_f) (3.15)

The first hidden state is usually initialized by setting each value to zero.
The output from the sigmoid activation function is then multiplied with the previous cell state c_{t−1}. Hence, the output of the sigmoid activation function determines the amount of long-term memory that should be forgotten [41].

Figure 3.8: Long short-term memory (LSTM) cell.

Next, in the input gate, the previous hidden state and the current input are passed to both a sigmoid and a hyperbolic tangent activation function as follows:

i_t = σ(W_i [h_{t−1}, x_t] + b_i)
c̃_t = tanh(W_c [h_{t−1}, x_t] + b_c) (3.16)

The latter is used to compute potential long-term memory, while the former decides the fraction of that long-term memory to be added to the cell state. The current cell state c_t is updated based on the outputs from both the forget gate and the input gate as follows [42]:

c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t (3.17)

The final gate in an LSTM cell is the output gate, where the values of the current cell state c_t are passed to a hyperbolic tangent function to compute potential short-term memory. Similar to the other gates, the sigmoid activation function is applied to the previous hidden state and the current input to compute a probability that determines the amount of short-term memory to be passed to the next LSTM cell, as follows [42]:

o_t = σ(W_o [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t) (3.18)

3.5.3 Gated Recurrent Unit

A gated recurrent unit (GRU) is a type of RNN cell that is similar to the LSTM in that gates are used to update the hidden states of the network at each time step. GRUs are less complex than LSTMs, which makes them less computationally expensive while still maintaining performance comparable to that of LSTMs. Unlike LSTMs, GRUs consist of only two gates and do not use an internal cell state, i.e. the long-term memory state. Hence, in theory, LSTM networks should outperform GRUs when the input sequences are long [43].

Figure 3.9: Gated recurrent unit (GRU) cell.
A GRU cell consists of a reset gate and an update gate, as shown in Figure 3.9. The input vector x_t and the previous hidden state h_{t−1} are first passed to the sigmoid function in the reset gate as follows [43]:

r_t = σ(W_r [h_{t−1}, x_t] + b_r) (3.19)

The sigmoid function outputs a value between 0 and 1. This value is used later in the update of the current hidden state h_t and determines how much of the previous information should be forgotten. Next, the input vector x_t and the previous hidden state h_{t−1} are passed to the sigmoid function in the update gate, where the output is computed as follows [43]:

z_t = σ(W_z [h_{t−1}, x_t] + b_z) (3.20)

This value is used to update the hidden state. The candidate hidden state h̃_t is computed using a hyperbolic tangent function: the output from the reset gate, r_t, is multiplied element-wise with the previous hidden state h_{t−1}, this product together with the input vector x_t is multiplied with the weight matrix, the bias term is added, and the result is passed to the hyperbolic tangent function as follows [43]:

h̃_t = tanh(W_h [r_t ⊙ h_{t−1}, x_t] + b_h) (3.21)

Furthermore, to compute the current hidden state h_t, the candidate hidden state is multiplied with the output from the sigmoid function in the update gate, z_t. The complement (1 − z_t) is multiplied with the previous hidden state h_{t−1}. The sum of these element-wise products forms the current hidden state as follows [43]:

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t (3.22)

3.5.4 Autoencoder

An autoencoder (AE) is an unsupervised learning technique based on an artificial neural network that is trained to reconstruct its high-dimensional input data. Autoencoders can be used in a variety of applications including feature extraction, dimensionality reduction, image generation, and anomaly detection.
Using autoencoders for anomaly detection is a special use case where the reconstruction error, or loss, is used to classify and filter out anomalies [3]. Examples of applications where autoencoders are used for anomaly detection are [44], [7], [6]. The structure of an autoencoder consists of three main parts: the encoder, the bottleneck, and the decoder, as shown in Figure 3.10. The encoder can be viewed as a function g(x) that transforms the input data x into a latent vector representation z. The encoded vector z is located in the bottleneck of the network and is a compressed representation of the input data. This vector is then up-scaled to the same dimension as the input data in the decoder part f(z) of the network. Ideally, after training the autoencoder, the input x should be identical to the reconstructed output x̂ [45].

Figure 3.10: The structure of an autoencoder.

As previously mentioned, autoencoders can be used for anomaly detection tasks. The main idea is to train the autoencoder on a data set that contains a majority of normal data points. Ideally, this data set should not include any anomalies. However, this can be challenging to achieve, especially when the causes of the anomalies are unknown. The autoencoder encodes the normal data into a latent vector representation that is decoded back to its original dimension. The autoencoder is trained to minimize the reconstruction loss such that the reconstructed output will be similar to the input after training. Hence, smaller reconstruction losses will in general correspond to reconstructed outputs that are more similar to the input. After training, the autoencoder should have learned the latent distribution of the input data such that when anomalous data is given as input, the reconstruction loss will be higher than for normal data. A threshold on the reconstruction loss can therefore be used to classify whether the input data is normal or anomalous.
Autoencoders for anomaly detection tasks can be summarised in the following four steps:
1. Train the autoencoder on normal data.
2. Feed normal data as input to the trained autoencoder and compute the distribution of the reconstruction loss.
3. Select a threshold on the reconstruction loss, e.g. such that the majority of normal data is classified as normal.
4. Test and evaluate the trained autoencoder with the given threshold by observing the number of anomalies being detected.
One of the most important hyperparameters for autoencoders is the embedding size, which determines the size of the compressed output vector of the encoder. The embedding size can have a large impact on the model's ability to distinguish false positive from true positive sequences. Too large an embedding will allow the autoencoder to learn the identity function, so that the model generates low losses for any input, making it unreliable in detecting anomalies [46]. Conversely, too small an embedding size will make it too difficult for the model to reconstruct the data since too much information is lost. Traditionally in machine learning, the objective of hyperparameter tuning is to minimize the validation loss or maximize the accuracy. However, finding optimal hyperparameters for autoencoders in the setting of anomaly detection is a non-trivial task because the loss is not a rigorous measure of performance on its own. For example, increasing the embedding size will result in the input data not being compressed much, and the decoder will easily decode the latent vector representation and reconstruct the input data such that the reconstruction loss is small for all inputs. However, the autoencoder will then struggle to find and learn patterns specific to the data it is trained on and thus will not perform well in detecting anomalies [46].
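Steps 2–4 above can be sketched given a set of reconstruction losses computed on held-out normal data. This is a generic illustration, not the thesis's code; the 90% quantile is an arbitrary example of "the majority of normal data classified as normal".

```python
import numpy as np

def fit_threshold(normal_losses, quantile=0.90):
    """Steps 2-3: pick a loss threshold so that `quantile` of the
    normal data falls below it and is classified as normal."""
    return np.quantile(normal_losses, quantile)

def classify(losses, threshold):
    """Step 4: flag inputs whose reconstruction loss exceeds the
    threshold as anomalous (True = anomaly)."""
    return np.asarray(losses) > threshold
```

A typical usage would be `thr = fit_threshold(losses_on_normal_validation_data)` followed by `classify(losses_on_test_data, thr)`; the quantile then directly controls the fraction of normal data sacrificed as false alarms.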
3.6 Spatial Hashing Spatial hashing is a method used to reduce the number of comparisons needed to find neighbouring points [47]. Assume the goal is to match each point pi from the data set D1 = {p1, p2, . . . , pn} with points qj from the data set D2 = {q1, q2, . . . , qm} based on their spatial relationship. A naive solution would be to compare the locations of every pair of points. This results in a time complexity of O(nm), where n and m are the numbers of points in D1 and D2, respectively. Spatial hashing is a significantly more efficient solution, especially as the number of points in the data sets grows. The main idea of spatial hashing is to reduce the number of comparisons needed to find neighbouring points in two steps. First, each point pi ∈ D1 is assigned to a cell in a grid using a hash function, as shown in Figure 3.11. The hash function can, for example, map latitude and longitude coordinates to cell indices in 2D or 3D space [47]. Figure 3.11: Spatial hashing. Then, to find neighbouring points of a point qj ∈ D2, the point is hashed to a cell index using the same hash function. The distance is computed between qj and each point from D1 that was assigned to an adjacent cell, including the cell to which qj itself was assigned. Hence, instead of computing the distance between every pair of points as in the naive solution, qj is only compared with points in neighbouring cells. The cell size influences the number of points mapped to the same cell and is a parameter that needs to be tuned for optimal performance. Too large a cell size results in too many points hashing to the same cell, which increases the number of comparisons needed. Conversely, too small a cell size spreads points over many cells, so that nearby points may end up in non-adjacent cells and are never compared [47].
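The two-step scheme can be sketched for points in the plane. This is a simplified illustration assuming planar (x, y) coordinates and Euclidean distance, rather than the geographic coordinates used later in the thesis.

```python
from collections import defaultdict
from math import floor, hypot

def build_grid(points, cell_size):
    """Step 1: hash each (x, y) point from D1 into a grid cell."""
    grid = defaultdict(list)
    for p in points:
        key = (floor(p[0] / cell_size), floor(p[1] / cell_size))
        grid[key].append(p)
    return grid

def neighbours(q, grid, cell_size, radius):
    """Step 2: compare q only against points in its own cell and the
    8 adjacent cells, instead of against every point in D1."""
    cx, cy = floor(q[0] / cell_size), floor(q[1] / cell_size)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for p in grid.get((cx + dx, cy + dy), []):
                if hypot(p[0] - q[0], p[1] - q[1]) <= radius:
                    out.append(p)
    return out
```

Note that for the 3x3 neighbourhood to be sufficient, the cell size should be at least as large as the query radius; this is one reason the cell size needs tuning.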
The time complexity of spatial hashing depends on how the points are distributed among the cells. When all points hash to distinct cells, each neighbour lookup takes O(1) time. The worst case occurs when all points are assigned to the same cell, which results in a time complexity of O(nm), the same as the naive approach [47]. 4 Data Description and Analysis This chapter provides a description and analysis of the available data for this project. Section 4.1 presents the available data in this study, which originated from three sources: Volvo Cars Corporation (VCC), Trafikverket, and SMHI. This is followed by an analysis of the data in Section 4.2. Since the data was unlabeled, the analysis was essential to find potential root causes of false positive alerts and to justify choices made in the methodology. 4.1 Available Data The available data used in this study is divided into the following four sets: vehicle data, slippery road alert (SRA) data, weather data, and road data. The vehicle and SRA data originated from VCC, while the weather and road data were requested and gathered from the Swedish Transport Administration (Trafikverket). Additionally, the weather data used in the analysis was gathered from the Swedish Meteorological and Hydrological Institute (SMHI). 4.1.1 Vehicle Data The vehicle data was gathered from the cloud for 10 weeks (72 days) between the 12th of January and the 24th of March 2023, during winter and early spring in Gothenburg, Sweden. Ultimately, after the data gathering process, the vehicle data contained approximately 115 million data points. Due to GDPR, the vehicle data was anonymized such that unique vehicles could not be identified. The most important features of the vehicle data that were used in this thesis are presented in Table 4.1. This data contained bursts of data points sent from Volvo vehicles across Gothenburg and was transformed into sequences in the preprocessing step described in Chapter 5.
The sequential vehicle data was then used to train and evaluate the machine learning models.

Table 4.1: Features of the vehicle data.

  #  Feature                        Abbreviation  Data type
  1  Friction coefficient           FC            float
  2  Friction quality               FQ            int
  3  Creation time                  CT            timestamp
  4  Coordinate                     COORD         geolocation
  5  Temperature                    Tv            float
  6  Wiper speed                    WS            int
  7  Vehicle identification number  VIN           string
  8  Software versions              SV            string
  9  Road max speed                 RMS           int

1. Friction coefficient
The estimated friction coefficient was the main feature of interest in this thesis and is a value ranging from 0 to 1, where 0 represents no friction and 1 represents maximum friction. Friction coefficients less than or equal to 0.35 are defined as low friction, while friction coefficients above 0.35 are defined as high friction. This estimated value is based on the longitudinal brush model, the lateral tire-force model, and limit handling measurements based on the ABS and traction control system.

2. Friction quality
The friction quality is an integer ranging from 4 to 7 indicating the quality of the estimated friction coefficient. The calculation of this value is based on the variance of slip and multiple input signals to the vehicle, such as the lateral and longitudinal forces on the wheels.

3. Creation time
The creation time represents the time at which the friction data was created in the database and is expressed as a date and time with a resolution from hours down to milliseconds.

4. Coordinate
The coordinates in the friction data correspond to the longitude and latitude coordinates of each data point.

5. Temperature
The temperature measurement originates from the vehicle temperature sensor and is expressed in degrees Celsius.

6. Wiper speed
The wiper speed is an integer value ranging from 0 to 6 indicating the speed of the car wiper. This value correlates with the precipitation.
However, since the wiper speed can be set manually, it is neither an accurate nor a reliable indicator of precipitation.

7. Vehicle identification number
The vehicle identification number (VIN) is a unique identification number for each vehicle and was only used for creating the sequential data. This number was removed and was not present in the data sets due to GDPR.

8. Software versions
Software versions represent multiple versions of the software in the electronic control unit (ECU).

9. Road max speed
The maximum legal speed for the road, ranging from 5 km/h to 120 km/h.

Figure 4.1 shows a sample of data points from the vehicle data plotted on a map. Each data point corresponds to a row in the vehicle data with each of the previously mentioned features as columns. As can be seen in the upper right corner, vehicles in motion send bursts of data points to the cloud, resulting in the formation of sequences with both temporal and spatial properties. This was an important discovery that was exploited in the creation of the data sets for the machine learning models. Figure 4.1: Sample of data points from the vehicle data plotted on a map. 4.1.2 Slippery Road Alert Data The slippery road alert (SRA) data consisted of both the location and time of specific SRAs generated in Gothenburg, Sweden within a year, from the 25th of March 2022 to the 24th of March 2023. This data was only used in the analysis to find root causes of potential false positive SRAs. Furthermore, the SRA data had been aggregated, meaning that unique vehicles could not be identified in this data. The features of the SRA data are shown in Table 4.2. The start and end times are expressed as dates and times with a resolution from hours down to milliseconds. The locations of the SRAs are indicated by longitude and latitude coordinates. Table 4.2: Features of the slippery road alert data.
  #  Feature   Data type
  1  Start time  timestamp
  2  End time    timestamp
  3  Location    coordinate

4.1.3 Weather Data The weather data used to create the data sets was requested from Trafikverket and consisted of both temperature and precipitation measurements from 11 weather stations located near roads scattered across Gothenburg, Sweden. The features of this weather data are shown in Table 4.3. The station id is an integer value identifying each of the 11 weather stations. The road temperature was measured in degrees Celsius and the road precipitation level was measured in millimeters. The time interval of this data was 30 minutes. This data was used to filter the vehicle data to form the data sets used for training and testing the machine learning models. Additionally, this data was also used to train some of the models using the active learning approach.

Table 4.3: Features of the weather data from Trafikverket.

  #  Feature             Abbreviation  Data type
  1  Station id          SI            int
  2  Date and time       DT            timestamp
  3  Road temperature    Tr            float
  4  Road precipitation  Pr            float

The weather data used in the data analysis originated from SMHI. This data was used since the weather data from Trafikverket was not available at the time the analysis was conducted. Additionally, the weather data from SMHI was aggregated per day and computed using more sophisticated algorithms than simply taking the mean for each day, which would have been the alternative if the weather data from Trafikverket were used instead. The temperature was measured in degrees Celsius while the precipitation levels were measured in millimeters. The data from SMHI originated from a weather station in Gothenburg and was gathered via SMHI's public website. The features of the weather data from SMHI are presented in Table 4.4. Table 4.4: Features of the weather data from SMHI.
  #  Feature                Data type
  1  Date and time          timestamp
  2  Daily air temperature  float
  3  Daily precipitation    float

4.1.4 Road Data The road data was gathered from Trafikverket's site Lastkajen [48] within the area of Gothenburg. Lastkajen is a web-based application open to the public where one can order and download Sweden's road and railway data. For this thesis, the most valuable road data was the speed bump locations and the road types, as shown in Table 4.5. Each speed bump location consisted of a single longitude and latitude coordinate pair, whereas the road types were represented by polygons of multiple longitude and latitude coordinates.

Table 4.5: Features of the road data.

  #  Feature              Data type
  1  Speed bump location  coordinate
  2  Asphalt road         coordinates
  3  Gravel road          coordinates

4.2 Data Analysis An initial data analysis of the unprocessed data from VCC and Trafikverket was conducted for three main reasons. First, since the available data was unlabeled, this initial analysis was essential to get an intuition of the circumstances in which true and false positive warnings might occur. One hypothesis was that most of the alerts generated during a year are true positives and occur when the probability of ice formation on road surfaces is high, i.e. when precipitation levels are high and the temperature is below 0 ◦C. Conversely, some false positive alerts might occur when driving over speed bumps or when making a sharp turn, for example. Second, the insights from this analysis were used to motivate the choices made in this project, including how the raw data was processed and how the models were constructed. Third, this initial analysis might reveal correlations between features and limitations of the data that need to be taken into consideration in the creation of the final data sets used to train and test the machine learning models.
Ultimately, the goal of this initial data analysis was to answer the following questions:
• Are there seasonal patterns in the number of generated slippery road alerts?
• What are the root causes of true and false positive slippery road alerts?
• In which weather conditions are low and high values of the estimated friction coefficient sent to the cloud?
The majority (56%) of the alerts were generated during spring (March to May), as shown in Figure 4.2. This was somewhat unexpected since winter is more commonly associated with slippery roads. However, further investigation into the data revealed that the maximum number of alerts occurred during a week in spring when it snowed heavily and the temperature was below 0 ◦C. Furthermore, 34% of the alerts occurred during winter, while 7% occurred during autumn and only 3% during summer. Figure 4.2: Number of generated alerts for each season within a year. Most of the alerts generated during spring and winter are believed to be true positives caused by ice formation on the road surfaces. Hence, finding false positive alerts during these seasons can be challenging. On the contrary, most of the alerts generated during autumn and summer are probably false positives. However, both true and false positives can occur in any season. For example, some true positives during summer and autumn could be caused by hydroplaning. Conversely, during spring and winter, hard braking or driving over a speed bump can potentially cause false positive alerts. The vehicle data used to create the data sets for the machine learning models was gathered during late winter and early spring. Hence, the number of generated alerts from the SRA data during this period is of high interest. Figure 4.3 shows the number of generated alerts (orange) against the air temperature (grey) and precipitation levels (blue) for each day from 2023-01-12 (winter) to 2023-03-24 (early spring).
Figure 4.3: Number of generated slippery road alerts in Gothenburg during the project (late winter and early spring). As expected, when the daily air temperature dropped below 0 ◦C and the precipitation levels were above 0 millimeters, the number of alerts increased. Note that the daily air temperature dropped below 0 ◦C four times within this period. The first two drops occurred on the 22nd of January and the 27th of January. Even though the daily air temperature was below 0 ◦C, no or few alerts were generated. The likely reason is the absence of precipitation. The next drop in air temperature below 0 ◦C occurred on the 4th of February, when the precipitation levels were above 0 millimeters. As expected, this increased the number of generated alerts. Similarly, the number of generated alerts increased during the final drop in air temperature, which occurred around the 8th of March. The precipitation levels were above 0 millimeters during this period as well. The maximum number of alerts (approximately 130 000) occurred on the 7th of March. In general, when the daily air temperature was above 0 ◦C, few alerts were generated. Some of these alerts might be false positives. An increase in precipitation levels during those days did not seem to drastically increase the number of generated alerts. The number of daily generated alerts during summer 2022 in Gothenburg is shown in Figure 4.4. Figure 4.4: Number of daily generated alerts together with the daily air temperature and precipitation during summer 2022 in Gothenburg. As previously mentioned, the number of alerts generated during summer was much lower compared to the number generated during winter and spring. High precipitation levels and air temperatures above 0 ◦C increase the risk of hydroplaning, which could explain the increase in generated alerts on the 25th of July and the 4th of August.
Most of the alerts generated during days with no or a small amount of precipitation are probably false positives. There are many possible explanations for these false positives, including hard braking, rapid acceleration, or driving over speed bumps. To investigate this further, the locations of alerts generated during summer 2022 are visualized in Figure 4.5. The selected area shown in this figure is where the majority of the alerts were generated in Gothenburg during the summer of 2022. Figure 4.5: Heatmap of generated slippery road alerts within an area in Gothenburg during summer 2022. The vast majority of the generated alerts, marked in red, are close to speed bumps in this specific area. The reason why some of the false positive alerts are generated close to speed bumps could be hard braking or driving over the speed bump, since both cause the estimated friction to change. A large number of alerts were generated close to a specific speed bump shown in the upper left corner. The location of this speed bump was included in the data from Trafikverket and could therefore be visualized as a black dot. However, speed bumps that were temporarily or recently placed, such as those shown in the bottom right corner, were not included in the road data and could therefore not be used for further analysis and processing. Figure 4.6: Number of friction measurements against the daily air temperature and daily precipitation levels. As can be seen in Figure 4.6, both the daily air temperature and the daily precipitation from SMHI correlate with the number of low estimated friction coefficients sent to the cloud each day. The number of low friction values increases drastically only when the daily air temperature is below 0 ◦C and the daily precipitation is above 0 mm. These days were the 3rd of February and the 7th to the 10th of March. This is reflected in the number of generated alerts previously shown in Figure 4.3.
For other days, even though the daily precipitation was above 0 mm, the number of low friction values sent to the cloud did not increase much since the daily air temperature was above 0 ◦C. Hydroplaning might explain some of the true positives in these cases. Furthermore, some low friction values were sent to the cloud even though the daily air temperature was above 0 ◦C and the daily precipitation was 0 mm. These data points most likely correspond to false positive situations. 5 Methods This chapter describes the methods used in this thesis and how the project was carried out based on the data analysis from the previous chapter. Section 5.1 begins by defining true and false positives in the context of slippery road alerts. In Section 5.2, the proposed solution, a false positive filter, is presented. Section 5.3 describes how the raw vehicle data was processed, including sequence generation, sequence mapping, and sequence grouping. Section 5.4 describes the final data sets used to train, test, and evaluate the machine learning models for each approach. Section 5.5 describes the architectures of the machine learning models and the experiments conducted in this thesis. Finally, Section 5.6 describes the evaluation process of the final models for both the autoencoder and the active learning approach. 5.1 Defining True- and False Positive SRAs The first step of the project was to define true and false positives in the context of slippery road alerts (SRAs). There are many possible causes for SRAs; some of them are ambiguous and therefore more challenging to classify. One can, for example, argue that an SRA generated when a vehicle is driven on a gravel road should be classified as a true positive alert since the situation caused the vehicle to slip.
However, since alerts caused by these types of situations depend on many other parameters and rarely occur in the same location, the alert might not be useful for the driver. Ultimately, defining true and false positive SRAs can be divided into two types of definitions: one where the actual slipperiness is central, and another that focuses more on the value of receiving the alert. For this project, true positive alerts are defined as alerts generated in situations that lead to loss of traction. Conversely, false positive alerts are defined as alerts generated in situations when the vehicle did not slip.

Table 5.1: Definition of true- and false positive slippery road alerts.

  #  True positives      False positives
  1  Ice/snow slip       Driving on speed bump
  2  Hydroplaning        Driving on road crack/pothole
  3  Gravel slip         Making a sharp turn
  4  Mud slip            Rapid braking
  5  Slip on wet leaves  High acceleration

Some of the most common causes of true and false positive alerts under this definition are shown in Table 5.1. In this definition, true positive SRAs depend on road type and environmental factors. Conversely, false positive SRAs are more related to road quality, road design, and vehicle maneuvering. The available data for this thesis sets the limit for which of the true and false positive alerts can be considered in developing the false positive filter. Hence, false positive causes 2-5 could not directly be classified since the available data lacks the features needed to classify them. 5.2 False Positive Filter The main goal of this thesis was to create a false positive (FP) filter capable of reducing the number of false positive alerts generated in an SRA system, as shown in Figure 5.1. Filtering solely on the current weather conditions for each specific situation is not enough for a rigorous solution. For example, some situations when the probability of ice formation on the road surface is high can still be caused by false positive situations, e.g.
driving on a speed bump or a pothole. The idea was to create sequences from the provided vehicle data, train ML models on that data, and finally remove sequences corresponding to false positive situations, e.g. driving on a speed bump or making a sharp turn. The sequential data corresponding to true positive situations is then passed to an SRA system where it is processed to create SRAs that are sent to other nearby vehicles. There are two main reasons why we focused on creating a false positive filter. First, implementing an entire SRA system was impractical due to the limited time of the project. A false positive filter that acts alongside a current SRA system is less time-consuming to implement. Second, the bursts of data points are further processed and filtered in current SRA systems. Hence, the number of valuable features for training the machine learning models is reduced the further into the system the data travels. Therefore, the filter was placed as close to the raw vehicle data as possible, i.e. at the point where the data is sent to the cloud. Figure 5.1: High-level overview of the proposed false positive filter. 5.3 Data Preprocessing The process of generating a data set from the raw vehicle data is divided into the following three main parts: sequence generation, sequence mapping, and sequence grouping. First, the unprocessed vehicle data was clustered based on both time and location to form sequences for unique vehicles. Second, each sequence was mapped to weather stations, speed bump locations, and road types. Finally, the sequential vehicle data was grouped to create the final data sets used to train, test, and evaluate the machine learning models. 5.3.1 Sequence Generation The available vehicle data consisted of unlabeled spatio-temporal data points including the estimated friction coefficient, friction quality, temperature, and wiper speed.
Based on the initial analysis of this data, it seemed difficult to distinguish false positives from true positives based solely on point measurements with the available features. A single point measurement consisting of the friction coefficient and friction quality is not informative enough to determine whether the point corresponds to a true or false positive situation. Many other variables would be needed to make accurate predictions about the situation, including the vehicle speed and the forces acting on the wheels. Additionally, the available data was unlabeled, which increases the complexity of the task of detecting and reducing false positive alerts. The data analysis revealed that vehicles generate sequences of data points. A hypothesis was that both the estimated friction coefficient and the friction quality change over time differently depending on whether the situation corresponds to a true positive situation, e.g. driving on ice, or a false positive situation, e.g. driving on a speed bump. Furthermore, creating sequences from the vehicle data introduces dependencies between consecutive data points. This makes sequences more informative than single point measurements. The idea was therefore to train machine learning models to learn patterns in the sequential vehicle data such that sequences corresponding to false positive situations could be removed. The process of creating these sequences was based on both temporal and spatial clustering. The temporal clustering method is shown in Figure 5.2. First, the data points were grouped by the vehicle identification number (VIN) and sorted by creation time. The creation time represents when each specific data point was created in the cloud and was the only accessible time variable in the vehicle data. The main goal of the temporal clustering was to form rough initial clusters that would reduce the execution time of the spatial clustering later on. Figure 5.2: Temporal clustering of the raw vehicle data.
For a specific vehicle, two consecutive points p and q are considered to belong to the same sequence if the difference in creation time between the points, tp and tq, is within n seconds. Otherwise, the points are placed into separate sequences. Too small a value for n would split sequences belonging to the same situation. Conversely, too large a value for n could result in spatial collisions of sequences for a specific vehicle. For example, a vehicle might generate a sequence of points at time t0 at location l and generate another sequence at a later time t1 at the same location l. The spatial clustering later on would then merge these sequences into one single sequence, which should be avoided since they do not correspond to the same situation. For this initial clustering, n was set to 10 seconds to ensure that each point for a specific situation is captured while also minimizing the risk of spatial collisions of sequences. After adding the next consecutive point q to the current sequence, or creating a new sequence starting with q, the next consecutive pair of data points consists of the point q and the next point in the list of points sorted by creation time. The loop continues until each data point for each vehicle belongs to a sequence. The sequences generated from the initial temporal clustering were further clustered using the DBSCAN clustering algorithm with the Haversine distance as the distance measure, which is covered in Section 3.4. The value of epsilon in DBSCAN should be as small as possible such that points related to the same situation are clustered together. However, too small an epsilon would split sequences that relate to the same situation into different clusters. Conversely, too large a value for epsilon would have the opposite effect, i.e. larger clusters containing sequences from multiple situations.
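The gap-based temporal clustering described above can be sketched as follows. The tuple layout and field names are illustrative assumptions, not the thesis's actual schema; `gap_seconds` corresponds to n.

```python
from datetime import datetime, timedelta
from itertools import groupby

def temporal_sequences(points, gap_seconds=10):
    """Split each vehicle's time-sorted points into sequences whenever
    the creation-time gap between consecutive points exceeds the limit.
    `points` is a list of (vin, creation_time) tuples (illustrative)."""
    sequences = []
    points = sorted(points, key=lambda p: (p[0], p[1]))
    for vin, group in groupby(points, key=lambda p: p[0]):
        current = []
        prev_t = None
        for _, t in group:
            if prev_t is not None and (t - prev_t) > timedelta(seconds=gap_seconds):
                sequences.append(current)  # gap too large: close the sequence
                current = []
            current.append((vin, t))
            prev_t = t
        sequences.append(current)  # close the last sequence for this vehicle
    return sequences
```

Each returned sequence would then be passed on to the spatial DBSCAN step.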
Creating sequences from the vehicle data proved to be a challenging task since many unknown variables such as vehicle speed, road surface, and road construction affect the properties of a sequence, including its length, sparseness, and form (e.g. straight or curved). During the analysis of the vehicle data, it was discovered that data points on highways were more sparse compared to data points on other roads. Using a fixed small value for epsilon in DBSCAN would lead to multiple clusters for data points that relate to the same situation on highways. Increasing epsilon would solve this issue on highways; however, it would introduce issues on other roads where points were more densely packed. Instead, the value of epsilon was computed based on the road maximum speed (RMS) in km/h for each data point, as follows: ϵ = (RMS / 3.6) · k (5.1) The road maximum speed was divided by 3.6 to convert from km/h to m/s and multiplied with a constant k that was tuned visually and finally set to 1. For example, if the vehicle was driven on a road with a maximum road speed of 80 km/h, the value of epsilon would be set to 22 meters. Hence, for two points to be considered neighbors in DBSCAN, the distance between the points would need to be less than or equal to 22 meters when driving on a road with a maximum legal speed of 80 km/h. Ultimately, after running DBSCAN on the entire vehicle data, a total of 6 million sequences were generated. 5.3.2 Sequence Mapping After the sequence generation process, each sequence was mapped to weather stations, speed bump locations, and road types. The mapping was implemented using spatial hashing (described in Section 3.6), which significantly reduced the number of comparisons needed: the gathered vehicle data contained approximately 115 million data points, which would result in billions of comparisons using a naive approach. Figure 5.3: Sequence to speed bump matching.
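Looking back at Equation (5.1) and the Haversine distance used as the DBSCAN distance measure, both can be sketched directly. The Haversine formula below is the standard one; the mean Earth radius value is an assumption of this sketch, not taken from the thesis.

```python
from math import asin, cos, radians, sin, sqrt

def epsilon_metres(road_max_speed_kmh, k=1.0):
    """Eq. (5.1): eps = (RMS / 3.6) * k, converting km/h to m/s."""
    return (road_max_speed_kmh / 3.6) * k

def haversine_metres(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in metres."""
    r = 6_371_000.0  # mean Earth radius in metres (assumed value)
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))
```

With this, `epsilon_metres(80)` gives roughly 22 metres, matching the 80 km/h example above, and `haversine_metres` can serve as the pairwise distance DBSCAN compares against epsilon.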
Weather station mapping
Each sequence was mapped to the closest weather station such that the weather data would more accurately represent the current weather conditions for each specific sequence. However, it was impractical and unnecessary to compare the distance between every point within a sequence and the locations of the weather stations. Instead, a single point from the sequence was sampled to represent the location of the entire sequence. The distance between this point and the location of each weather station was computed, and the weather station closest to the point was mapped to the sequence to which the point belonged. This procedure was applied to each sequence in the sequential vehicle data. Finally, since the weather data had a time interval of 30 minutes, the current temperature and precipitation levels were mapped to each sequence using the closest half-hour.

Speed bump location mapping
The mapping of speed bump locations was implemented in two steps. First, a sequence was considered to be close to a speed bump if its points were within a radius of 15 meters of the speed bump, as shown in Figure 5.3. Points outside this radius were removed from the sequence since they were assumed not to relate to the same situation. However, this alone could lead to wrong mappings, e.g. sequences being mapped to speed bumps on a different road. Second, to counteract this issue, at least one point of the sequence needed to be within a radius of 4 meters of the speed bump.

Road type mapping
The mapping of sequences to road types was more complicated than the other mappings since the road type data contained polygons, in contrast to the singular points of the weather stations and speed bump locations. The road type data contained data for asphalt and gravel roads. Hence, it was only necessary to map points within each sequence to gravel roads, since the remaining sequences could be assumed to belong to asphalt roads.
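The weather station mapping described above can be sketched as a nearest-neighbour lookup over a sequence's representative point, together with rounding a timestamp to the closest half-hour to join with the 30-minute weather data. Names and data layout here are illustrative assumptions; the distance function would in practice be a Haversine implementation.

```python
from datetime import datetime, timedelta

def nearest_station(rep_point, stations, distance_fn):
    """Map a sequence's representative point to the closest weather
    station. `stations` maps station id -> (lat, lon); `distance_fn`
    is any point-to-point distance, e.g. Haversine."""
    return min(stations, key=lambda sid: distance_fn(rep_point, stations[sid]))

def nearest_half_hour(t):
    """Round a timestamp to the closest half hour so it can be joined
    with weather data reported at a 30-minute interval."""
    half = timedelta(minutes=30)
    n = round((t - datetime.min) / half)
    return datetime.min + n * half
```

A timestamp of 12:14 would thus be joined with the 12:00 weather row and 12:16 with the 12:30 row.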
To map the sequences to gravel roads, the polygons were first converted to sets of coordinates. Points within a sequence were then mapped to a gravel road if any point was within a radius of 30 meters of a gravel road point. 5.3.3 Sequence Grouping The final data sets used to train, test, and evaluate the machine learning models were created by grouping the sequential vehicle data based on the current weather conditions, speed bump locations, and road types. The sequential vehicle data was grouped based on the current weather conditions from Trafikverket using the flowchart shown in Figure 5.4. The flowchart was created based on assumptions under w