
Deep Normal Driving
A deep dive into driver modelling using a data-driven approach

Master’s thesis in Systems, Control and Mechatronics

GUSTAV ANDERSSON
EDVIN ASPELIN

Department of Electrical Engineering

CHALMERS UNIVERSITY OF TECHNOLOGY

Gothenburg, Sweden 2019



© GUSTAV ANDERSSON, EDVIN ASPELIN 2019.

Supervisor: Anders Ödblom, Volvo Car Corporation
Supervisor: Carl Toft, Computer Vision and Medical Image Analysis
Examiner: Fredrik Kahl, Computer Vision and Medical Image Analysis

Master’s Thesis 2019
Department of Electrical Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: An example of a multi-lane road-state.

Typeset in LaTeX, template by David Frisk
Printed by Chalmers Reproservice
Gothenburg, Sweden 2019


Abstract

Self-driving cars seek to relieve the driver from the task of driving, and the new
features that will come from the development of autonomous vehicles will introduce
a whole new paradigm for the automotive industry. Developing these features calls
for smart and efficient testing. These tests are increasingly performed in simulation
environments to decrease costs and increase efficiency. Hence, new technologies
related to autonomous driving research will need more developed toolboxes for
testing through simulation. Enabling the simulation of normal drivers would thus
be beneficial in this process.

This research proposes a methodology for modelling human behaviour in a multi-lane
highway environment using a data-driven approach. The model makes use of a Long
Short-Term Memory (LSTM) network that takes ego, object and road variables as
input and predicts the future trajectory of the ego vehicle. To navigate through
traffic, the model must be able both to plan the direct path on the road and to
identify future actions, such as changing lanes. Further, performance measures are
introduced in order to investigate the behavioural outcome of the developed model.
Performance is measured both by the quality of the prediction given the input and
by the ability to predict future manoeuvres that have not yet been initiated.

Thus, the research topics not only reflect an investigation of methods for modelling
a normal driver but also underscore the methods of verifying the result. The
proposed model is shown to predict, with high precision, the future trajectory of
the ego vehicle as well as future lane changes. The model is also verified to be
aware of surrounding objects in its predictions.

Keywords: Trajectory Prediction, Driver Intention, Machine Learning, Deep Learn-
ing, Artificial Neural Networks, RNN, LSTM.



Acknowledgements

Hi,

When you are reading this report, keep in mind that the work done would not have
been possible without some great people’s involvement and their interest in our work.

We have had the pleasure of working with two tremendous supervisors from both
Volvo Cars Corporation and Chalmers University of Technology. Anders Ödblom
from Volvo Cars Corporation has been our main contact throughout the work, and
with his keen eyes, he has always asked the hard questions that made us both apply
more critical thinking and strive for a higher level. Carl Toft, our supervisor at
Chalmers, is a bright spirit who always gave great support and input to our work
and was always there to answer any of our questions. We were always amazed by
the level of engagement from our supervisors and we are very grateful that we had
the opportunity to work with them.

We would also like to highlight the work done by Alexander Bükk and Rickard
Johansson, who created the starting point of this research. Without their help, we
would still be in the cave of segmenting tons of data sequences.

Finally, we would like to thank everyone in the office at Volvo Cars Corporation
who made this last part of our degrees a truly memorable time. With that said,
we hope you will enjoy the results we produced during this time period.

Best regards,

Gustav Andersson & Edvin Aspelin, Gothenburg, June 2019



Contents

List of Figures
List of Tables

1 Introduction
1.1 Background
1.2 Aim
1.3 Limitations
1.4 Related work

2 Theory
2.1 Recurrent Neural Networks
2.2 Long Short-Term Memory
2.3 Linear layer
2.4 Data normalisation

3 Methods
3.1 Dataset
3.1.1 Data distribution
3.1.2 Summary
3.2 Network architecture
3.2.1 Network modelling
3.2.2 Data input
3.2.3 Network training
3.3 Performance measures
3.3.1 Driver intention
3.3.2 Safety evaluation
3.3.3 Prediction fluctuation

4 Results
4.1 Network performance and training progress
4.2 Object dependence in model
4.3 Feature correlations in prediction performance
4.4 Output sequence length
4.5 Lane change predictions
4.6 Safety assessment of predicted trajectory
4.7 Evaluation of prediction fluctuation

5 Conclusion

6 Future work

Bibliography


List of Figures

2.1 Two visualisations of a single recurrent neuron. The left part represents a
rolled RNN for each time step and the right part the corresponding unrolled RNN.

2.2 A basic LSTM cell with linear operators (grey), and sigmoid (blue) and tanh
(green) activation functions.

2.3 Visual representation of a linear layer in a network.

3.1 The used data are sampled with an integrated sensor system, where the field
of view of the equipment covers the front part of the ego vehicle and can detect
objects up to R_max meters ahead.

3.2 Top view of a single lane with ego vehicle (red) and a single vehicle (grey).
Note that relative coordinates of surrounding objects, (x_k, y_k), are given in the
ego vehicle's body frame, where the x-axis is along the forward direction of the
ego vehicle and the y-axis perpendicular to it in a right-handed coordinate system.

3.3 Histograms showing the distribution of ego velocity and the exposure to
multiple objects throughout the dataset. The statistics are made from analysing
each time sample in the complete training dataset.

3.4 Histograms showing the distribution of ego velocity and the exposure to
multiple objects throughout the dataset. The statistics are made from analysing
each time sample in the complete validation dataset.

3.5 Each time sample in the training set is analysed to produce a heat map over
where surrounding objects are located throughout the training dataset.

3.6 To emphasise how objects travel in relation to the ego vehicle, the heat map
is produced by observing in what coordinates there has been an object throughout
each single sequence and then summarising the data sequence-wise. This is done on
the training set.

3.7 Heat map showing where objects are detected relative to the ego vehicle at
the moment the ego vehicle crosses a lane marker, analysed through the training set.

3.8 Top view of an overtake, comparing the ground truth trajectory and a
prediction. Each dot on the lines represents a coordinate point, sampled and
predicted with a frequency of 4 Hz.

3.9 Data flow in an LSTM network with n cells, hidden dimension h and a linear
output layer.

3.10 Benchmark performance of LSTM network with 2 cells and with hidden dimension
of 100, trained on raw dataset including the objects using Mean Square Error (MSE)
loss function. Showing results both including and excluding objects in the input
of the predictions.

3.11 The lane markers are adjusted to not have an initial lateral offset relative
to the ego vehicle.

3.12 Example of a lane change scenario. Note that the last and second last ground
truth data point are in different lanes.

3.13 Visualisation of the fluctuations between two following predictions. The
distance d can also be divided in its lateral and longitudinal distances to further
describe the fluctuations.

4.1 The training progress of the network over number of epochs for the training
and validation dataset, compared with the benchmark model.

4.2 Mean losses per prediction after each epoch for the training set and the
validation set with and without objects as an input to the network. The figure
highlights that the objects are necessary to predict the next movement. When
training the network without any objects, the network fails to predict the next
movement.

4.3 The density of data points with the error after 5 seconds into the future in
relation to the object group in the scenario. Each point inside the distribution
represents a data point with the white circle being the mean error. Observe from
Fig. 3.4 the exposure of all object groups. The presence of cases with g8−10 is
considered negligible and it can also be seen that g6 and g7 have lower exposure
than the rest.

4.4 Comparing the closest distance to surrounding objects versus the error, 5
seconds into the future. Each point represents a single prediction in the
validation set.

4.5 The longitudinal and the lateral error in relation to the current velocity of
the ego car. Each point represents a single prediction in the validation set.

4.6 The longitudinal and lateral error in relation to the curvature of the road.
Each point represents a single prediction in the validation set. The curvature is
defined as the lateral distance between the first lane marker point and the last
lane marker point given as input to the network.

4.7 Longitudinal error between the prediction and the ground truth data, at the
last point of the predictions. The error tends to be higher the further into the
future one predicts. All errors are evaluated on the validation set.

4.8 Lateral error between the prediction and the ground truth data, at the last
point of the predictions. The error tends to be higher the further into the future
one predicts. All errors are evaluated on the validation set.

4.9 Root mean square error of the model output from a network model trained to
predict the trajectory 5 seconds into the future and with a sample rate of 4 Hz.
The error is the total error in the specific time instance, not coupled with the
other time instances. The errors are evaluated on the validation set.

4.10 Histograms over the longitudinal and lateral error respectively, in scenarios
with and without lane changes, using the proposed model.

4.11 Histograms over the longitudinal and lateral error respectively, in scenarios
with and without lane changes, using the benchmark model.

4.12 Relative object position data points in the body frame of the ego vehicle,
when following the predicted trajectory from the network model. Each point
represents an object position relative to ego car (0,0) at some time in the entire
validation set. A point is thus only the object detection point with no spatial
information other than the point position.

4.13 The fluctuation of the predictions, given both longitudinally and laterally.
The distance of the fluctuations is here presented in relation to the frequency of
the distances represented when predicting throughout the validation set.


List of Tables

3.1 Summarising the average number of left and right ego vehicle lane changes in
each sequence.

3.2 Summary of the categorisation in the input channels to the network.

3.3 Summary of learning parameters used to train the network.

4.1 The lowest mean square error of an epoch evaluated on the validation set,
trained for 500 epochs with different number of cells and hidden dimensions. Note
that all cells share the same hidden dimension size in each separate model.


1 Introduction

In recent years, the idea of self-driving cars has motivated researchers all over the
world and is a popular topic in technological media. Relieving the driver from
the task of driving will be an entirely new paradigm for the automotive industry
and is expected to radically transform means of transportation. Furthermore,
self-driving cars shift the responsibility for driving safely and in an
environmentally friendly way from the driver to the car manufacturers, opening up
for more sophisticated work towards increasing safety and effectiveness.

As always, we find ourselves in a position where verification is necessary to deliver
safe and reliable products. With new technology raising the level of automation,
there is a need to create appropriate methods in order to verify the safety level of
these technical solutions. Modelling a normal driver will enable verification
methods where new vehicles can be tested in software by letting them interact in a
synthetic traffic environment.

These new tests will depend heavily on the behaviour model for interacting
vehicles, and the goal of this research is to propose a data-driven machine
learning approach to create a driver model which mimics human behaviour and
interaction in traffic.

1.1 Background

A data-driven approach to trajectory predictions using machine learning has shown
exceptional results in previous works, [1–3]. In fact, neural networks in general have
proven to be a prevalent choice in almost any prediction-based task. Nowadays
machine learning can be seen in tasks ranging everywhere from image classifications
to teaching a robot complex movement skills, [4,5]. Also, mimicking physical motion
in humans is a popular usage of machine learning, for example, networks that imitate
human handwriting and speech, [6, 7].

In other words, neural networks are excellent at learning the how. How to talk,
how to walk, and how to write. However, teaching someone letters and sentences
does not mean that they can write a novel. The complexity of teaching a network
to write a novel and evaluating the result would grow rampantly in comparison to
the letters and sentences. And how would you measure the performance of such a
network and the produced novel? What is a good novel? If you knew exactly how
to write the best novel, the Pulitzer prize would not be far away - if you have the
answer, please let us know.

Making the same analogy in driving, there is a difference between knowing how
to overtake another vehicle and knowing when to do it. In autonomous vehicles, this
could be a crucial distinction when interacting with other vehicles. It is therefore
essential to investigate how to predict the intention of a driver. We would like to
introduce a network that innately combines both prediction and intention, using
the network structure and processing of the input data to achieve that. Uniting
trajectory prediction and intention prediction in a network like this has become
the vision of this project, and it seeks to investigate both the how and when in a
network’s performance.

1.2 Aim

The aim of this research is to model driver intention and trajectory prediction,
using a supervised deep learning approach. The goal is to design a model which
makes use of sequential input of data collected from the surroundings of a vehicle to
mimic human behaviour. The research will use a vanilla Long Short-Term Memory
(LSTM) network as a starting point and further develop and evaluate it to propose
a structured methodology for creating a network suited for highway driving. This
research attempts to evaluate how the LSTM network can benefit from altering the
network structure and preprocessing the dataset to skew its focus towards
surrounding objects. Thus, there are two focus points of the research: data
preprocessing and network structure. How the processing of the dataset affects the
performance of the network will need to be tested empirically, as the dataset is
unique with regard to sensor disruptions, data structures etc. Evaluating the
network structure will include performance measures of different network structures
and hyper-parameters. Developing means of evaluating both the quality of trajec-
tory predictions and driver intentions is key to analysing the performance.

It is a necessity that the data is interpretable and properly invariant for the net-
work to function as intended. Thus the aim includes exploring possibilities of using
alternative methods of expressing the road state. In the dataset used, surrounding
objects are described solely by their longitudinal and lateral coordinates relative
to the ego vehicle. A challenge with the dataset is that it uses gathered sensor
data as ground
truth. Even though the measurements may be close to the true value, the network
may inherit any faults from the data. Hence we will investigate if the performance
can be improved with other methods of describing the input data.

The essential challenges and research topics to investigate are:
• Data transformation and pre-processing.
• Implementation of neural network models.
• Investigating performance of network structures.
• Training and evaluation of models.
• From the results, determine what network has the best performance and evaluate
its strengths and weaknesses.
• Evaluate the model’s performance correlations in regards to input features and
make an assessment of the environmental importance of surrounding objects.

1.3 Limitations

The project limits itself to observing data from highway scenarios only. In other
words, the data is selected from datasets where the ego vehicle operates in a
multi-lane environment with a velocity greater than 70 km/h. The dataset contains
6500 recorded driving logs with a solely frontal field of view, where each log
contains roughly 80 seconds of data and includes at least one lane change.
Currently, a subset of the collected data parameters has been processed and
restored, for example by interpolating missing lane markers. The processed dataset
contains: ego yaw rate and velocity, ego lateral distance from lane markers, and
relative spatial displacement of surrounding objects.

Deciding on network architecture is also limited to only investigating structures
of LSTM networks.

1.4 Related work

Machine learning is not the only way to approach vehicle trajectory prediction.
In [8], Houenou et al. instead use motion modelling to predict the trajectory of
surrounding objects. The vehicles are given a constant yaw rate and constant
acceleration motion model, which is then combined with a manoeuvre recognition
model to make predictions about future positions. Rather than having a neural
network help with decisions, they use a deterministic model for manoeuvre
recognition. The model is solely based on kinematic measurements and road geometry
detections. A downside of this model is that it will not scale with more data,
which may limit the performance of the trajectory predictions.

Other models, such as hidden Markov models and dynamic Bayesian networks,
have also been used for various prediction tasks. Ye et al. [9] suggest a hidden
Markov model in combination with linear layers to make vehicle trajectory
predictions, and Wöllmer et al. [10] use a dynamic Bayesian network combined with
an LSTM network for artificial listening. Both these implementations use a model
other than a neural network to handle some sub-task in the tool-chain.

Another similar work has been done by Altché and La Fortelle [11], where a
Long Short-Term Memory is used to predict a vehicle’s future trajectory. It shows
the possibility of predicting trajectories based on real driver log data. The results
show that they can do so with a reasonable error, and the implications from the
network performance seem promising. However, there is no further analysis of the
results other than the loss. Alexander Bükk and Rickard Johansson have also
created an implementation of an LSTM network to model the driver, which proved
potent in predicting future trajectories [12]. It was however shown that the model
was not interaction aware, in the sense that surrounding objects did not affect the
prediction. We now want to investigate if it is possible to achieve a high-performance
neural network that can mimic those interactions from human driving.



2 Theory

Neural networks are a convincing alternative for almost any analytic task. The
results are continuously improving and new network structures are invented by the
day. According to the universal approximation theorem [13], a feed-forward network
with a single hidden layer can approximate any continuous function on a compact
domain to arbitrary accuracy, given enough hidden units. The size of the network
would in many cases be infeasibly large and the network may fail to learn from the
input data, but the theorem somewhat reflects the great expressiveness of neural
networks.

This paper will not attempt to improve on the massively iterated explanation
of the fundamentals in neural networks. Instead, this chapter will go through the
underlying theory used to model, train and evaluate the Long Short-Term Memory
(LSTM) networks that are used in this research. The desired outcome is to give the
reader an understanding of how LSTM networks can be implemented to perform in
a satisfying manner in situations related to the research topics.

However, if you are unfamiliar with Artificial Neural Networks in general, or just
need a quick refresher, Ian Goodfellow et al. [14] made an exceptional introduction
to deep learning and the basics of neural networks.

2.1 Recurrent Neural Networks

Humans don't start their thinking from scratch every second. [...]
Your thoughts have persistence.

— Christopher Olah, [15]

A major shortcoming of traditional neural networks is their inability to remember
previous inputs. Remembering the input is necessary for any sequence-based
information input, e.g. semantics in text or approximating velocity from position
coordinates. Recurrent Neural Networks (RNNs) seek to address this issue by
introducing a continuous memory in each cell. With its internal memory, the cell
is able to remember the input history by updating the memory with each sequential
input. It does not remember the history exactly, per se, but instead expresses it
in its own high-dimensional space through the sequence.

Figure 2.1: Two visualisations of a single recurrent neuron. The left part represents
a rolled RNN for each time step and the right part the corresponding unrolled RNN.

The RNN cell is visualised in Fig. 2.1 both as a rolled version, with a loop going
from the output back as an input, and as an unrolled version where the sequential
input and output can be seen more clearly. Note that the two visualisations are
only different visual takes on the same cell structure. The memory content is
passed through each iteration and is used to determine the output, thus allowing
for content to be conserved from one step to another. The cells use the sigmoid
function, σ, as activation function. Each step in the sequence can be computed by
the following equations, with W and b being the weight matrices and bias vectors
respectively.

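\[ \sigma(x) = \frac{e^{x}}{1 + e^{x}} \tag{2.1} \]

\[ h_t = \sigma\left( W^{(hh)} h_{t-1} + W^{(hx)} x_t + b^{(h)} \right) \tag{2.2} \]

\[ o_t = \sigma\left( W^{(oh)} \cdot h_t + b^{(o)} \right) \tag{2.3} \]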
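
As an illustration of Eq. (2.1)-(2.3), the following minimal NumPy sketch runs a single recurrent cell over a short sequence. The function name `rnn_forward`, the layer sizes and the random weights are assumptions made for this example only and are not taken from the implementation used in this research.

```python
import numpy as np


def sigmoid(x):
    """Logistic function of Eq. (2.1), in the equivalent form 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))


def rnn_forward(xs, h0, W_hh, W_hx, W_oh, b_h, b_o):
    """Apply Eq. (2.2) and (2.3) to every element of the input sequence xs."""
    h = h0
    outputs = []
    for x_t in xs:
        h = sigmoid(W_hh @ h + W_hx @ x_t + b_h)  # Eq. (2.2): memory update
        outputs.append(sigmoid(W_oh @ h + b_o))   # Eq. (2.3): output from memory
    return np.stack(outputs), h


# Illustrative sizes and random weights (assumptions, not values from the thesis).
rng = np.random.default_rng(0)
T, n_in, n_hidden, n_out = 5, 3, 4, 2
xs = rng.normal(size=(T, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_hx = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_oh = rng.normal(scale=0.5, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_o = np.zeros(n_out)

outputs, h_T = rnn_forward(xs, np.zeros(n_hidden), W_hh, W_hx, W_oh, b_h, b_o)
print(outputs.shape, h_T.shape)  # one output per time step, plus the final memory
```

The same weights are reused at every time step, which is what the unrolled view in Fig. 2.1 depicts.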

Each weight matrix W is optimised in the training process, and it can be seen in
Eq. (2.2)-(2.3) how the memory, $h_t$, affects the output, $o_t$. In supervised
learning, the desired output of the network is known. By penalising an error in
the prediction with a loss function, for example the Mean Square Error (MSE), the
weights can be adjusted to make better predictions. Weight optimisation is then
done by back-propagating the error through the network through time, which is a
method to calculate the gradient needed for the weight update. The derivation of
back-propagation through time is rather long and lies outside the topics of this
research, but it is fundamental to weight optimisation for RNNs, and B. Mehlig
presents an exquisite formulation of it in [16].
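
To make the idea of penalising the prediction error concrete, the sketch below computes an MSE loss and performs one gradient-descent update. It deliberately uses a toy linear predictor instead of an RNN, so it does not show back-propagation through time; the values of `W`, `x`, `target` and the learning rate are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np


def mse_loss(pred, target):
    """Mean Square Error between a prediction and the known target."""
    return np.mean((pred - target) ** 2)


def mse_grad(pred, target):
    """Gradient of the MSE with respect to the prediction."""
    return 2.0 * (pred - target) / pred.size


# Toy linear predictor pred = W @ x (an assumption, standing in for the network).
W = np.zeros((2, 3))
x = np.array([1.0, 2.0, 0.5])
target = np.array([1.0, -1.0])
learning_rate = 0.1

pred = W @ x
loss_before = mse_loss(pred, target)

# Chain rule: dL/dW = outer(dL/dpred, x), followed by a gradient-descent step.
grad_W = np.outer(mse_grad(pred, target), x)
W = W - learning_rate * grad_W

loss_after = mse_loss(W @ x, target)
print(loss_before, loss_after)  # the loss shrinks after the update
```

In an RNN the same principle applies, except that the gradient has to be propagated backwards through every time step of the sequence.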

In specific fields, RNNs have been truly groundbreaking, and the tremendous
results are touched upon in Andrej Karpathy’s The Unreasonable Effectiveness of
Recurrent Neural Networks [1]. But a simple RNN does not come without some
uncomfortable bumps, the most adverse being the vanishing and exploding gradient
problem. It essentially boils down to the result of back-propagation through time,
where the weight matrix $W^{(hh)}$ decides whether the gradient goes towards zero
or infinity. At time t, the gradient with respect to some previous time step k is
calculated as follows:

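\[
\frac{\partial h_t}{\partial h_k}
= \prod_{t \geq i \geq k} \frac{\partial h_i}{\partial h_{i-1}}
= \prod_{t \geq i \geq k} {W^{(hh)}}^{T} \operatorname{diag}\left( \frac{d}{dh} \sigma(h_{i-1}) \right)
\tag{2.4}
\]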

In conclusion from Eq. (2.4), when increasing the time frame of the sequence,
$W^{(hh)}$ will make the gradient either increase or decrease exponentially. This
quickly causes problems when increasing the sequence length of the input and/or
output. Trying to find an optimal weight matrix with a tiny gradient will result in
a large increase in training time, and with a huge gradient, the optimiser has a
hard time finding optimal points. One can reduce the impact of the exploding
gradient by various methods, for example clipping the gradient at a certain
threshold. However, the vanishing gradient problem persists and one needs to look
at other cell structures to address this problem.
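
The exponential behaviour can be checked numerically. The sketch below multiplies together the factors of Eq. (2.4) for a fixed number of time steps and also shows one common form of gradient clipping (rescaling by the norm). The helper names, the diagonal weight matrices and the choice of evaluating the sigmoid derivative at zero are assumptions made to keep the example deterministic; they are not values from this research.

```python
import numpy as np


def sigmoid_prime(h):
    """Derivative of the logistic function evaluated at h."""
    s = 1.0 / (1.0 + np.exp(-h))
    return s * (1.0 - s)


def jacobian_product_norm(W_hh, hs):
    """Spectral norm of the product of factors in Eq. (2.4), one per row of hs."""
    J = np.eye(W_hh.shape[0])
    for h in hs:
        J = J @ (W_hh.T @ np.diag(sigmoid_prime(h)))
    return np.linalg.norm(J, 2)


def clip_by_norm(grad, threshold):
    """Rescale grad so that its Euclidean norm never exceeds threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad


n_hidden, steps = 3, 20
hs = np.zeros((steps, n_hidden))  # the sigmoid derivative at 0 is 0.25

# Each factor is 0.25 * a * I, so the norm scales as (0.25 * a) ** steps.
vanishing = jacobian_product_norm(2.0 * np.eye(n_hidden), hs)  # 0.5 per step
exploding = jacobian_product_norm(8.0 * np.eye(n_hidden), hs)  # 2.0 per step
print(vanishing, exploding)

clipped = clip_by_norm(np.array([3.0, 4.0]), threshold=1.0)
print(clipped, np.linalg.norm(clipped))
```

Clipping bounds the size of an exploding gradient but cannot restore a gradient that has already shrunk towards zero, which is why it does not help against the vanishing case.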

2.2 Long Short-Term Memory

In 1997, Hochreiter and Schmidhuber presented their work in [17], which aims to
address the issues of classic RNN structures. More precisely, they proposed a
solution for the gradient problems. The fundamental idea of the proposed solution,
the LSTM cell, is to truncate the back-propagated gradient at points where the
truncation does no harm to performance. LSTM cells thereby enforce a constant
error flow through the sequence by means of the Constant Error Carousel (CEC),
which is the central feature of the LSTM cell. The special structure of the LSTM
cell allows multiplicative gate units to be trained to open and close access from
the input and memory cell to the constant error flow, which significantly improves
its ability to learn long-range dependencies compared to a classic RNN.

Figure 2.2: A basic LSTM cell with linear operators (grey), and sigmoid (blue)
and tanh (green) activation functions.

The implementation of an LSTM cell in Fig. 2.2 is not very different from im-
plementing a recurrent neuron. It basically proposes an alteration of the memory
update equation, Eq. (2.2).

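\[ i_t = \sigma\left( W_{ih} h_{t-1} + W_{ix} x_t + b_i \right) \tag{2.5} \]

\[ f_t = \sigma\left( W_{fh} h_{t-1} + W_{fx} x_t + b_f \right) \tag{2.6} \]

\[ o_t = \sigma\left( W_{oh} h_{t-1} + W_{ox} x_t + b_o \right) \tag{2.7} \]

\[ \tilde{c}_t = \tanh\left( W_{ch} h_{t-1} + W_{cx} x_t + b_c \right) \tag{2.8} \]

\[ c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t \tag{2.9} \]

\[ h_t = o_t \cdot \tanh(c_t) \tag{2.10} \]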

With the altered equations, the cell is able to internally pass its constant error
flow in the CEC-loop, which can be observed in Fig. 2.2. How the cell updates its
internal states can be described step by step. Firstly, the cell decides what it
is going to forget from its former internal content state, $c_{t-1}$. This is done
by the forget gate, $f_t$, which is seen in both Eq. (2.6) and (2.9). Then the
cell decides what information should be added to the cell state with the use of
both the candidate $\tilde{c}_t$ and the input gate, $i_t$, in Eq. (2.5) and
(2.8). Finally, it calculates the output $h_t$ in Eq. (2.10), which is also
affected by the internal cell state, $c_t$.
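
The step-by-step update can be written down almost line by line from Eq. (2.5)-(2.10). The following NumPy sketch is one possible reading of those equations, where the products in Eq. (2.9) and (2.10) are taken element-wise. The parameter dictionary `p`, the helper `init_params` and all sizes are hypothetical and serve only to make the example self-contained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Eq. (2.5)-(2.10); p holds weights and biases."""
    i_t = sigmoid(p["W_ih"] @ h_prev + p["W_ix"] @ x_t + p["b_i"])      # (2.5)
    f_t = sigmoid(p["W_fh"] @ h_prev + p["W_fx"] @ x_t + p["b_f"])      # (2.6)
    o_t = sigmoid(p["W_oh"] @ h_prev + p["W_ox"] @ x_t + p["b_o"])      # (2.7)
    c_tilde = np.tanh(p["W_ch"] @ h_prev + p["W_cx"] @ x_t + p["b_c"])  # (2.8)
    c_t = f_t * c_prev + i_t * c_tilde  # (2.9): forget old content, add new
    h_t = o_t * np.tanh(c_t)            # (2.10): gated output of the cell state
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for the four gates (illustrative)."""
    p = {}
    for gate in "ifoc":
        p[f"W_{gate}h"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"W_{gate}x"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p


rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 4, 6
params = init_params(n_in, n_hidden, rng)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(T, n_in)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Because $c_t$ is updated additively in Eq. (2.9), information can be carried across many steps as long as the forget gate stays close to one.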

The main features of the LSTM cell have made it a popular choice for
sequence-based data inputs because of its innate ability to store the cell memory
over a long period of time. There exist some variations of the LSTM cell
structure, but the one used in this research is the one presented in this chapter,
which is also the one mainly referred to as the standard LSTM cell.


2.3 Linear layer

A linear layer is used as a simple network layer that only applies a linear transforma-
tion to the input, with the weight matrix W and the bias vector b. The linear layer
is sometimes also called a dense layer or a fully connected layer, but the structure
and the linear operation is still the same and can be seen in Fig. 2.3 and Eq. (2.11)
respectively.

Figure 2.3: Visual representation of a linear layer in a network.

\[ y = W \cdot x + b \tag{2.11} \]

From the previous walk-through of the LSTM cell structure, we learned that the
LSTM cell outputs the hidden state, $h_t$. With the hidden state most likely not
having the same dimension as the desired output, a final layer is applied to map
the LSTM output to the correct form. The input x to the linear layer is then set
to $h_t$. The weight matrix W and the bias vector b are then optimised along with
all the other parameters in the network training process.
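
A minimal sketch of Eq. (2.11) is given below, mapping a five-dimensional hidden state to three outputs as in Fig. 2.3. The numerical values of `h_t`, `W` and `b` are made up for illustration and have no connection to the trained network.

```python
import numpy as np


def linear(x, W, b):
    """Linear layer of Eq. (2.11): y = W x + b."""
    return W @ x + b


# Hypothetical hidden state with 5 elements mapped to 3 outputs (cf. Fig. 2.3).
h_t = np.arange(5.0)            # stands in for the LSTM output h_t
W = np.ones((3, 5))             # shape: (output dimension, hidden dimension)
b = np.array([0.0, 1.0, 2.0])

y = linear(h_t, W, b)
print(y)
```

The shape of W alone decides the output dimension, which is why this layer can adapt the hidden state to whatever form the prediction requires.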

2.4 Data normalisation

Normalisation in neural networks is a method that aims to increase learning speed,
make the network more robust, and increase its performance. It ensures that no
activations become very large or very small relative to others, which would
otherwise create a rippling effect of poorly scaled comparisons through the
network. When the ranges of the different input channels are not far apart, the
normalised input circumvents this behaviour. Making it easier for the network to
compare values also allows for higher learning rates and works as a catalyst for
better performance.

In neural networks, the input data is often normalised to have zero mean and unit
variance, and the normalisation is done separately on each input channel. For a
dataset with n input channels, X = {x(1),x(2), ...,x(n)}, where x(i) contains all
values for channel i in the dataset, with a total number of data sample points K,
the new normalised input channels are calculated by the following equations.

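\[ \mathbf{x}^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ \vdots \\ x^{(i)}_K \end{bmatrix} \tag{2.12} \]

\[ \bar{x}^{(i)} = \frac{1}{K} \sum_{k=1}^{K} x^{(i)}_k, \quad \forall i \in \{1, 2, ..., n\} \tag{2.13} \]

\[ \sigma^{(i)\,2} = \frac{(\mathbf{x}^{(i)} - \bar{x}^{(i)})^T (\mathbf{x}^{(i)} - \bar{x}^{(i)})}{K - 1} \tag{2.14} \]

\[ \tilde{\mathbf{x}}^{(i)} = \frac{\mathbf{x}^{(i)} - \bar{x}^{(i)}}{\sqrt{\sigma^{(i)\,2}}} \tag{2.15} \]

\[ \Rightarrow \tilde{X} = \{\tilde{\mathbf{x}}^{(1)}, \tilde{\mathbf{x}}^{(2)}, ..., \tilde{\mathbf{x}}^{(n)}\} \tag{2.16} \]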

Thus, X̃ is the new normalised input to the network. The normalisation parameters
are calculated solely from the training set, and the same parameters are later used
to normalise the validation data. The validation data is excluded from the
calculation of the normalisation parameters to ensure that it does not by any
means affect the training.
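The procedure in Eq. (2.13)-(2.15) can be sketched for a single input channel as follows. The channel values are hypothetical, and the function names are only chosen for this illustration. Note that the mean and standard deviation are fitted on the training channel and then reused unchanged for the validation channel.

```python
import math


def fit_normaliser(channel):
    """Return (mean, std) of one channel, with the K - 1 denominator of Eq. (2.14)."""
    k = len(channel)
    mean = sum(channel) / k
    variance = sum((v - mean) ** 2 for v in channel) / (k - 1)
    return mean, math.sqrt(variance)


def normalise(channel, mean, std):
    """Apply Eq. (2.15) with already fitted parameters."""
    return [(v - mean) / std for v in channel]


# Hypothetical velocity channel [km/h] for the training and validation sets.
train_channel = [70.0, 90.0, 110.0, 130.0]
val_channel = [100.0, 120.0]

mean, std = fit_normaliser(train_channel)           # training data only
train_norm = normalise(train_channel, mean, std)
val_norm = normalise(val_channel, mean, std)        # same parameters reused
```

With this approach the validation channel will in general not have exactly zero mean and unit variance, which is expected since its statistics never enter the parameters.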


3
Methods

The goal is to present a network that reflects the abilities of a human driver. Doing
so includes the whole tool-chain, from raw dataset to performance measures and
behavioural analysis of the proposed model. Thus, this chapter presents the methods
used to develop the driver model and walks through each step of the process. It
gives a description of the dataset used to produce the results and the methodology
for constructing the network model. Furthermore, it covers the methods of evaluation
and all of the performance metrics.

3.1 Dataset

Figure 3.1: The used data are sampled with an integrated sensor system, where
the field of view of the equipment covers the front part of the ego vehicle and can
detect objects up to Rmax meters ahead.


The data later used for training the model comprises data logs gathered from
driver expeditions made to various destinations around the globe. The dataset
consists of 6500 recorded sequences, roughly 80 seconds each, in which both the
driver and the environment around the vehicle vary. Each sequence contains roughly
320 data samples, collected with a sampling frequency of 4 Hz. From the total number
of gathered logs, sequences have been specifically selected to only consist of a
multi-lane environment, with a speed constantly above 70 km/h and with at least one
overtake present in each sequence. Objects are represented by xy-coordinates, which
are sensor estimations of where the objects are located relative to the ego vehicle.
Also, the sampled data only contains forward-looking object detections, which means
that only objects in front of the car, up to a set distance Rmax ahead, will be a part
of the dataset. A top view of the sensor vision used to gather the data logs in the
dataset is represented in Fig. 3.1.

Furthermore, in the classification, an object is one of the following object types:
car, truck, motorbike, or an unclassified road object. Lane markers in the dataset
are approximated as N -degree polynomials with parameters a, as in Eq. (3.1), where
x is the longitudinal distance along the road in ego body frame coordinates, starting
from ego position x = 0. Then for each lane marker, j:

\[ y_j = \sum_{i=0}^{N} a_{i,j} x^i = f_j(x) \tag{3.1} \]
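As a small illustration of Eq. (3.1), the lateral position of a lane marker at a given longitudinal distance can be evaluated with Horner's scheme. The coefficients below are hypothetical and only serve as an example of a gently curving lane marker.

```python
def lane_marker_y(coeffs, x):
    """Evaluate y_j = sum_i a_{i,j} x^i, with coeffs = [a_0, a_1, ..., a_N]."""
    y = 0.0
    for a in reversed(coeffs):   # Horner's scheme, highest order first
        y = y * x + a
    return y


# Hypothetical second-degree lane marker: 1.75 m lateral offset at x = 0.
coeffs = [1.75, 0.01, 0.0005]
y_at_ego = lane_marker_y(coeffs, 0.0)     # lateral offset at the ego position
y_ahead = lane_marker_y(coeffs, 20.0)     # lateral position 20 m ahead
```

The constant coefficient a_0 is the lateral distance to the lane marker at the ego position, since f(0) = a_0.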

Lane marker approximations are calculated and updated in each time step, ensuring
that only the latest approximation is used. The approximation is done for each lane
marker present on the road. Summarising the contents of the collected data gives
the following features:

• Yaw rate of ego vehicle, ψ̇
• Ego forward velocity, v
• Object positions relative to ego vehicle, (xi, yi)
• Lane marker polynomials, yj = fj(x)

[Figure labels: v, ψ, (0, 0), (xi, yi)]

Figure 3.2: Top view of a single lane with ego vehicle (red) and a single surrounding
vehicle (grey). Note that relative coordinates of surrounding objects, (xi, yi), are
given in the ego vehicle’s body frame, where the x-axis is along the forward direction
of the ego vehicle and the y-axis perpendicular to it in a right-handed coordinate
system.


The logs also underwent some data processing, in which missing road line estimations
were interpolated to maintain the integrity of the lane markers. Also, objects
identified outside of the outer lane markers were discarded. The dataset is split with
a 90/10 ratio, where 90% of the dataset is placed in the training set and the
remaining 10% in the validation set.

3.1.1 Data distribution

In this section, the complete training and validation dataset is analysed to explore
what behaviour the network may inherit from fitting to the training data. It also
enables an investigation of whether the data is enough to properly train the network
to make vehicle trajectory predictions.

Analysing ego velocity and object detections for each time sample in the training
set and validation set respectively, the distribution of said data in the dataset is
presented in Fig. 3.3 and 3.4. The velocity is shown to lie between 70 km/h and
160 km/h. In order to mask the performance of the sensor setup’s object detection,
objects have been grouped together. This means that each group, gi, can include
various object types and/or numbers of objects. However, times where the sensor
setup detects no objects are not masked.

[Figure: two histograms. “Velocity distribution”: x-axis Velocity [km/h] (80 to 160), y-axis Frequency (0% to 15%). “Distribution of number of objects”: x-axis Number of objects (0, g1 to g10), y-axis Frequency (0% to 30%).]

Figure 3.3: Histograms showing the distribution of ego velocity and the exposure
to multiple objects throughout the dataset. The statistics are made from analysing
each time sample in the complete training dataset.


[Figure: two histograms. “Velocity distribution”: x-axis Velocity [km/h] (80 to 160), y-axis Frequency (0% to 20%). “Distribution of number of objects”: x-axis Number of objects (0, g1 to g10), y-axis Frequency (0% to 30%).]

Figure 3.4: Histograms showing the distribution of ego velocity and the exposure
to multiple objects throughout the dataset. The statistics are made from analysing
each time sample in the complete validation dataset.

[Figure: heat map “Surrounding objects in relation to the ego car”. x-axis: Longitudinal position [m], from the ego car at 0 to Rmax. y-axis: Lateral position [m]. Colour scale: Low, Mid, High.]

Figure 3.5: Each time sample in the training set is analysed to produce a heat
map over where surrounding objects are located throughout the training dataset.


In both Fig. 3.5 and 3.6 the positions of surrounding objects are presented. In
Fig. 3.5, the intensity displayed is based on each time sample. However, a single
object travelling at the same speed right in front of the ego vehicle has a greater
time exposure than an object which the ego vehicle overtakes, but in reality, they
are still just one object each. Instead, Fig. 3.6 shows where objects have travelled
on a sequence-based time measure. This means that each sequence in the dataset is
analysed to create a binary grid of locations where objects have been located in that
specific sequence, and then the location grids of all sequences are added together.
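The difference between the two heat maps can be sketched as two ways of counting grid cells. The grid resolution, the value of Rmax, the lateral extent and the data layout (a sequence as a list of samples, each sample a list of object positions) are all assumptions made for this illustration.

```python
def cell_of(x, y, cell_size, r_max, half_width):
    """Map an object position to a grid cell, or None if it is outside the grid."""
    if not (0.0 <= x < r_max and -half_width <= y < half_width):
        return None
    return int(x // cell_size), int((y + half_width) // cell_size)


def sample_based_counts(sequences, **grid):
    """Count every detection in every time sample (as in Fig. 3.5)."""
    counts = {}
    for sequence in sequences:
        for sample in sequence:
            for x, y in sample:
                cell = cell_of(x, y, **grid)
                if cell is not None:
                    counts[cell] = counts.get(cell, 0) + 1
    return counts


def sequence_based_counts(sequences, **grid):
    """Add one binary grid per sequence (as in Fig. 3.6)."""
    counts = {}
    for sequence in sequences:
        visited = {cell_of(x, y, **grid) for sample in sequence for x, y in sample}
        visited.discard(None)
        for cell in visited:
            counts[cell] = counts.get(cell, 0) + 1
    return counts


# Hypothetical grid and data: the first sequence follows one object for 3 samples.
grid = dict(cell_size=5.0, r_max=100.0, half_width=10.0)
sequences = [
    [[(12.0, 1.0)], [(12.0, 1.0)], [(12.0, 1.0)]],
    [[(12.0, 1.0)], [(40.0, -3.0)]],
]
per_sample = sample_based_counts(sequences, **grid)
per_sequence = sequence_based_counts(sequences, **grid)
```

In the sample-based count the followed object dominates its cell, whereas in the sequence-based count each sequence contributes at most once per cell.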

[Figure: heat map “Surrounding objects in relation to the ego car per sequence”. x-axis: Longitudinal position [m], from the ego car at 0 to Rmax. y-axis: Lateral position [m]. Colour scale: Low, Mid, High.]

Figure 3.6: To emphasise how objects travel in relation to the ego vehicle, the
heat map is produced by observing at what coordinates there has been an object
throughout each single sequence and then summarising the data sequence-wise.
This is done on the training set.

Fig. 3.7 shows the distribution of surrounding objects at the time instance of
the lane change. The distinct spread in lateral offset over distance is caused by
the varying curvature of the road and the yaw angle of the vehicle, as the data is
presented in ego-vehicle body coordinates and not in the road frame.

[Figure: heat map “Surrounding objects in relation to ego car exactly on lane change”. x-axis: Longitudinal position [m], from the ego car at 0 to Rmax. y-axis: Lateral position [m]. Colour scale: Low, Mid, High.]

Figure 3.7: Heat map showing where objects are detected relative to the ego vehicle
at the moment the ego vehicle crosses a lane marker, analysed through the training
set.


Lastly, the number of lane changes is presented in Tab. 3.1. A lane change is
considered as the ego vehicle crossing a lane marker.

Table 3.1: Summarising the average number of left and right ego vehicle lane
changes in each sequence.

Turn type    Average per sequence
Left         0.895
Right        0.895
Total        1.79

As presented in Tab. 3.1, each sequence of 80 seconds contains an average of nearly
two lane changes, with equally many left and right lane changes.

3.1.2 Summary

From the dataset analysis, it can be seen that the ego vehicle has plenty of
multi-object exposure throughout the dataset. The speed of the ego vehicle is not
uniformly distributed, but it still somewhat follows a normal distribution. Also, the
ego vehicle tends to have other cars in its right lane more often than in its left.

The question still stands whether the data is sufficient to properly train a network
to mimic the driver’s movements. But the conclusion is that there is no obvious lack
of data in the dataset, other than the low exposure of some object groups, that may
inhibit the training process.

3.2 Network architecture

Designing a neural network could almost be considered half science half art. There
are a lot of parameters that can be optimised and they can vary in vast ranges. And
when something structural changes, such as hidden dimensions or the number of
cells in the network, the optimisation probably needs to be readjusted. This section
describes the methods used in this research to model the final vehicle trajectory
prediction network. It will also go through how the network is trained to achieve a
satisfying result.

3.2.1 Network modelling

From the dataset, ground truth positions of where the ego vehicle travelled in each
sequence have been extracted. Positions are given as longitudinal and lateral
coordinates, such that each position corresponds to two data points, (xi, yi). These
positions are then used as the target output of the network. The network is thereby
trained to predict future positions relative to the current position. An overview of
the scenario can be seen in Fig. 3.8.

[Figure legend: Ground truth, Prediction]

Figure 3.8: Top view of an overtake, comparing the ground truth trajectory and
a prediction. Each dot on the lines represents a coordinate point, sampled and
predicted with a frequency of 4 Hz.

An LSTM network has the feature that it can be used to model input-output
as sequence to sequence. This means that when the sequential input goes through
the network, the output can also be given as a sequence. In vehicle trajectory
prediction, the input is a sequence of data gathered from the previous history of
the ego vehicle and its surroundings. From this history, the network then outputs
a sequence of predicted future positions. The sequential input and output do not
necessarily need to be of the same length, which allows for the history and the
prediction to cover different time intervals. The relationship between the sequence
lengths will later be analysed in terms of performance to investigate how time
affects the performance of the network model, see section 4.4.

The time interval of the output is set to 5 seconds. The reason is that the
network analysis includes performance measurements of driver intention, which
require that the network can predict manoeuvres that have not yet been initiated.
Setting the output time interval is a somewhat subjective decision, but it is based
on research on driver behaviour in a highway environment [18]. There, the average
lane change manoeuvre is about 7 seconds long, and 5 seconds should therefore be
sufficient to ensure that the network has to predict far into a manoeuvre that has
not yet been initiated. As the sampling frequency of the data is 4 Hz, the output
length is then set to 20.
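How a recorded sequence could be cut into history and prediction windows is sketched below. This is a simplified illustration, not the implementation used in this research: the history length of 8 samples is hypothetical, the feature vectors are placeholders, and the targets are made relative to the current position by a plain translation, which neglects the rotation into the ego vehicle's body frame.

```python
SAMPLING_FREQUENCY_HZ = 4
HORIZON_SECONDS = 5
HORIZON_LEN = SAMPLING_FREQUENCY_HZ * HORIZON_SECONDS   # 20 future positions


def make_windows(features, positions, history_len, horizon_len=HORIZON_LEN):
    """Return (history, target) pairs for one sequence.

    features[t]  : input feature vector at time sample t
    positions[t] : ego position (x, y) at time sample t
    """
    windows = []
    for t in range(history_len - 1, len(positions) - horizon_len):
        x0, y0 = positions[t]
        history = features[t - history_len + 1:t + 1]
        future = positions[t + 1:t + 1 + horizon_len]
        target = [(x - x0, y - y0) for x, y in future]   # relative to position at t
        windows.append((history, target))
    return windows


# Hypothetical sequence of 320 samples driving straight ahead, 1 m per sample.
positions = [(float(i), 0.0) for i in range(320)]
features = list(range(320))                 # placeholder feature vectors
windows = make_windows(features, positions, history_len=8)
```

Since the history and the target are sliced independently, their lengths can be chosen separately, as described above.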


[Figure: Input → LSTM 1 (h) → LSTM 2 (h) → ... → LSTM n (h) → Linear layer → Output]

Figure 3.9: Data flow in an LSTM network with n number of cells, hidden dimension
h and a linear output layer.

The LSTM network itself is constructed as a sequence of LSTM cells, all sharing
the same hidden dimension. Before the output vector, the output of the last LSTM
cell is passed through a linear layer which maps the hidden state to the final output
in ego-vehicle coordinates. The number of cells and the size of the hidden dimension
will later be analysed to find the best performance. Data flow through a network
with hidden dimension h and n number of cells can be seen in Fig. 3.9. The size
of the linear layer is set by the hidden dimension of the LSTM cells and the output
length. With a hidden dimension h and an output length of m, the linear layer will
have the size m× h.
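To give a feeling for how the number of cells and the hidden dimension affect the model size, the number of trainable parameters can be estimated as below. The sketch assumes that the n cells are stacked so that each cell feeds the next, that each of the four gate blocks of a cell has one weight matrix for the input, one for the hidden state and one bias vector, and that the linear layer has a bias of length m. The input dimension of 10 is hypothetical, and some libraries keep additional bias vectors, which gives slightly larger counts.

```python
def lstm_cell_params(input_dim, hidden_dim):
    """Parameters of one LSTM cell: four gate blocks, each with input weights,
    recurrent weights and one bias vector (assumption, see lead-in)."""
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


def network_params(input_dim, hidden_dim, n_cells, output_len):
    """Parameters of n stacked LSTM cells followed by an m x h linear layer."""
    total = lstm_cell_params(input_dim, hidden_dim)                  # first cell
    total += (n_cells - 1) * lstm_cell_params(hidden_dim, hidden_dim)
    total += output_len * hidden_dim + output_len                    # W and b
    return total


# Hypothetical input dimension of 10 channels, output length m = 20.
small = network_params(input_dim=10, hidden_dim=100, n_cells=2, output_len=20)
large = network_params(input_dim=10, hidden_dim=300, n_cells=3, output_len=20)
```

The count grows roughly quadratically with the hidden dimension and linearly with the number of cells.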

3.2.2 Data input

In order to justify the changes done to the network input, a benchmark model is
used to explain the fundamental problem with using the raw dataset as input to
the network model. The benchmark model will also be used to compare the final
performance of the later proposed network model and preprocessing methods.

Benchmark model

In order to compare the results, a benchmark model is used. This model uses
the unprocessed data from the dataset, such that performance improvements
from the preprocessing are also captured in the results.

The model is a vanilla LSTM model with 2 LSTM cells, where both cells have
a hidden dimension of 100.

From the dataset, the data is preprocessed such that information about the road
state is either transformed or removed. Starting with the objects on the road, the
first input channels to the network describe objects in front of the vehicle. With
one object position described by two values, x and y, two channels correspond to
one object. Although the input to the network then goes up to a certain number of
vehicles, the road state is not always exposed to the maximum number of objects.
Thus, the network will need to have a null input to the empty channels at those
times, but as the input needs to be scalars one has to choose another value than null.
Instead, the empty channels are set to a distance which the dataset never reaches,
(x, y) = (Rmax, Rmax). The value (0, 0) is not chosen as the empty value because the
network should easily learn that low values are of more importance, as objects very
close to the ego vehicle should imply greater importance. Also, an unnecessarily
large input may interfere with the weight scaling of the network.
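The padding of empty object channels can be sketched as follows, using the channel layout of Tab. 3.2 with all x-positions first and all y-positions after. The value of Rmax, the maximum number of objects and the truncation of surplus detections are assumptions made for this illustration.

```python
R_MAX = 100.0   # hypothetical sensor range in metres


def object_channels(objects, n_max, r_max=R_MAX):
    """Lay out up to n_max objects as [x_1..x_n, y_1..y_n].

    Empty object slots are filled with (r_max, r_max) instead of a null value.
    """
    kept = list(objects)[:n_max]                       # assumption: drop surplus
    padded = kept + [(r_max, r_max)] * (n_max - len(kept))
    xs = [x for x, _ in padded]
    ys = [y for _, y in padded]
    return xs + ys


# One detected object 30 m ahead in the right lane, three object slots in total.
channels = object_channels([(30.0, -3.5)], n_max=3)
```

With no detections at all, every object channel holds Rmax, which keeps the input well defined while staying far from the small values that indicate nearby objects.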

Through training an LSTM network with the raw dataset as input and then
removing the objects in the validation step, the object dependency is analysed. In
Fig. 3.10, it is clear that the benchmark network performs similarly both with and
without the use of surrounding objects. The conclusion is thus that the network can
make accurate predictions with the use of only internal states of the vehicle, such
as yaw rate and lateral velocity. A normal driver would certainly care about the
current status of the road. Hence, both the yaw rate and the lateral distance to the
lane markers are removed as in-data to the network in order to force the network to
navigate with respect to other objects.

[Figure: “Mean losses per predictions on benchmark network model”. x-axis: Epochs (0 to 240). y-axis: MSE [m^2], logarithmic scale (10^-1 to 10^4). Curves: Training (min = 0.29), Validation with objects (min = 0.61), Validation without objects (min = 0.60).]

Figure 3.10: Benchmark performance of an LSTM network with 2 cells and a hidden
dimension of 100, trained on the raw dataset including the objects using the Mean
Square Error (MSE) loss function. Results are shown both including and excluding
objects in the input of the predictions.

In the raw dataset, each lane marker is given as an N -degree polynomial. The
goal is to remove the initial lateral distance from each lane marker relative to the ego
vehicle, and to let the lane marker input to the network consist of four x- and
y-coordinates for each lane marker. Hence, to remove the lateral velocities, or the
lateral offsets, from the input data, the polynomials are converted to real-world
coordinates. This is done for each sequence, by firstly calculating the closest distance
to the lane marker for each time sample, y = f(0), and then calculating global lane
marker positions for the whole sequence. As the global vehicle position is also known,
the x- and y-coordinates of the lane markers at a set longitudinal distance can be
extracted at all vehicle positions. However, the x- and y-positions still carry the
information about where the ego vehicle is on the road. Thus, each lane marker is
offset such that the first point of each lane marker is (0, 0), as seen in Fig. 3.11.
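The final offset step can be sketched for a single time sample as below. This is a simplification: it samples the polynomial directly in the ego frame and skips the conversion to global coordinates described above. The polynomial coefficients, the longitudinal distances and the interpretation of the four coordinates as four points per lane marker are assumptions made for this illustration.

```python
def sample_lane_marker(coeffs, distances):
    """Sample the lane marker polynomial y = f(x) at the given distances."""
    return [(x, sum(a * x ** i for i, a in enumerate(coeffs))) for x in distances]


def remove_initial_offset(points):
    """Shift the points so that the first point of the lane marker is (0, 0)."""
    x0, y0 = points[0]
    return [(x - x0, y - y0) for x, y in points]


# Hypothetical lane marker 1.75 m to the left, sampled at four distances.
coeffs = [1.75, 0.0, 0.001]
points = sample_lane_marker(coeffs, distances=[0.0, 20.0, 40.0, 60.0])
offset_points = remove_initial_offset(points)
```

After the shift, the points still describe the shape of the road ahead, but no longer reveal how far the ego vehicle is from the lane marker.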

[Figure labels: (0, 0), (0, y0)]

Figure 3.11: The lane markers are adjusted to not have an initial lateral offset
relative to the ego vehicle.

The final input will thus not include any lateral offsets or lateral velocity for the
ego vehicle, forcing the network to adjust its prediction with regard to the movement
of surrounding objects. Furthermore, as the global lane marker positions are known,
object detections outside of the set road width can be filtered out. To summarise
the data input to the network after the preprocessing:

Before preprocessing:
• Ego yaw rate
• Ego forward velocity
• Object positions
• Lane marker polynomial

After preprocessing:
• Ego forward velocity
• Object positions
• Lane marker positions without lateral offset at first point

Table 3.2: Summary of the categorisation in the input channels to the network.

Channels    x1 − xn              xn+1 − x2n           x2n+1       x2n+2 − xend
Content     Object x-positions   Object y-positions   Velocity    Lane markers

A summary of the input channels after the preprocessing is presented in Tab.
3.2. A subset of the input channels are considered object channels. There are 2n
object channels, which can handle n objects at once, as one position takes two
coordinates. All channels are coupled such that the x- and y-positions of a
single object are always related through the input indices. Throughout the dataset,
objects fluctuate between different object channels, as no tracking has been
implemented to keep the same object in a single pair of channels. In order for
the network to learn the object channels more easily, a simple tracking algorithm is
applied to all objects. If an object position changes by less than a distance ε between
two time samples, those two positions are considered the same object and are set to
be in the same input channel.

\[ \sqrt{x^2 + y^2} < \varepsilon \tag{3.2} \]

The tracking in Eq. (3.2), where x and y are understood as the change in an object’s
position between two time samples, is then applied to the whole dataset.
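A possible form of such a tracking step is sketched below. The matching strategy is an assumption: each detection is greedily assigned to the closest previous object within the distance ε, and unmatched detections are placed in free channel pairs. The threshold value and the positions are hypothetical.

```python
import math


def track_channels(prev_slots, detections, eps):
    """Assign detections to channel pairs so that tracked objects stay in place.

    prev_slots : list with one (x, y) or None per channel pair, previous sample
    detections : list of (x, y) positions in the current sample
    """
    slots = [None] * len(prev_slots)
    unmatched = []
    for det in detections:
        best, best_dist = None, eps
        for i, prev in enumerate(prev_slots):
            if prev is None or slots[i] is not None:
                continue
            dist = math.hypot(det[0] - prev[0], det[1] - prev[1])
            if dist < best_dist:                 # condition of Eq. (3.2)
                best, best_dist = i, dist
        if best is None:
            unmatched.append(det)
        else:
            slots[best] = det
    free = [i for i, slot in enumerate(slots) if slot is None]
    for det, i in zip(unmatched, free):          # surplus detections are dropped
        slots[i] = det
    return slots


prev_slots = [(30.0, 0.0), None, (60.0, 3.5)]
detections = [(61.0, 3.4), (29.0, 0.1), (80.0, -3.5)]
slots = track_channels(prev_slots, detections, eps=5.0)
```

In the example, the two previously seen objects keep their channel pairs even though the detections arrive in a different order, and the new object is placed in the empty pair.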

3.2.3 Network training

When training the network, the weights are updated as the network encounters
data. That means that with every prediction in the training, the weights are directly
updated with respect to the outcome. It is important to note that this is not done
in the validation step, as it would let the network cheat in its validation metrics.
The learning rate is updated at set epochs in the training, to let the network
converge smoothly and quickly to its optimum. At these epochs, the learning rate is
scaled by a certain factor. In all of the training, Adam is chosen as the optimisation
algorithm and the loss is chosen to be the Mean Square Error (MSE) between the
ground truth trajectory points and the predicted points. All parameters set when
training the network are summarised in Tab. 3.3.

Table 3.3: Summary of learning parameters used to train the network.

Parameters Values
Epochs 500
Optimiser Adam
Initial learning rate 0.001
Learning rate update epochs [50, 100, 200, 400]
Learning rate update factor 0.5
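The learning rate schedule in Tab. 3.3 can be written as a small function. Whether the scaling takes effect at the listed epoch itself or at the following one is not stated; the sketch assumes that it applies from the listed epoch onwards, with epochs counted from zero.

```python
def learning_rate(epoch, initial=0.001, update_epochs=(50, 100, 200, 400), factor=0.5):
    """Learning rate at a given epoch for the stepwise schedule in Tab. 3.3."""
    n_updates = sum(1 for e in update_epochs if epoch >= e)
    return initial * factor ** n_updates


schedule = [learning_rate(epoch) for epoch in range(500)]
```

With four updates by a factor of 0.5, the learning rate during the last 100 epochs is one sixteenth of the initial value.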

A potential problem with the object channels is that certain channels may only be
exposed to certain scenarios, while similar objects are not certain to appear in the
same channel in the future. The problem is addressed by shuffling the object channels
at each epoch, such that all channels are exposed to all object inputs. The results of
the network training are presented in section 4.1.
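The channel shuffling can be sketched as a permutation of the object slots that keeps the x- and y-channel of each object paired, following the layout of Tab. 3.2. Drawing one permutation and applying it to all samples of a sequence is an assumption made here so that tracked objects stay in their channel pair within the sequence. The sample values are hypothetical.

```python
import random


def random_permutation(n_objects, rng):
    """Draw a random ordering of the n object slots."""
    perm = list(range(n_objects))
    rng.shuffle(perm)
    return perm


def permute_object_channels(sample, n_objects, perm):
    """Reorder object slots in a sample laid out as [x_1..x_n, y_1..y_n, rest]."""
    xs = [sample[i] for i in perm]
    ys = [sample[n_objects + i] for i in perm]
    return xs + ys + list(sample[2 * n_objects:])


rng = random.Random(0)
# Three objects (x-positions, then y-positions) followed by the ego velocity.
sample = [10.0, 20.0, 30.0, 1.0, 2.0, 3.0, 25.0]
perm = random_permutation(3, rng)
shuffled = permute_object_channels(sample, 3, perm)
```

Only the object channels are reordered; the velocity and lane marker channels after them keep their positions.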


3.3 Performance measures

When constructing any kind of network, the evaluation methods are crucial in
determining if the model behaved as intended or not. To begin with, a differentiable
objective function, or loss function, is the backbone of the network’s performance
evaluation. It is natural for the network to adjust towards the loss function as it is
the measurement that it uses to adjust its weights, hence the need for
differentiability.

However, sometimes it is not enough to evaluate a network’s performance by
only looking at one number. Manoeuvring a vehicle requires multiple crucial actions
from the driver which, when done poorly, may severely impede the safety in traffic.
As unintended actions may compromise the safety on the road, it is vital to test
the algorithm further. Thus, the network is evaluated with several performance
measurements, which are presented and elaborated upon in this section.

3.3.1 Driver intention

Evaluating if the network knows when to take actions is done by analysing the
performance of lane changes. The network predicts the future trajectory from a time
point where the lane change manoeuvre has not yet been initiated, and as the ground
truth data points are known, the quality of the prediction can then be calculated.
This performance number is then compared to the same performance measure when
driving straight. Thus, there is a comparison of how the performance of lane
changes and straight driving differ from one another.

For the performance measure to function properly, the lane change manoeuvre must
not yet have been initiated when the network predicts its trajectory. A lane change
is thus defined as a case where the last ground truth trajectory point is in
another lane than the second last, in other words, where the car crosses the lane
marker in the last time step of the ground truth trajectory, as seen in Fig. 3.12.
Cases not classified as lane changes are considered straightway driving.

Figure 3.12: Example of a lane change scenario. Note that the last and second
last ground truth data point are in different lanes.
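The classification rule can be sketched as below. For simplicity, the sketch assumes straight lane markers at constant lateral offsets given in the same frame as the trajectory, which is a simplification of the polynomial lane markers in the dataset. The lane width of 3.5 m and the trajectories are hypothetical.

```python
from bisect import bisect_right


def lane_index(y, marker_offsets):
    """Index of the lane containing lateral position y (markers sorted by offset)."""
    return bisect_right(sorted(marker_offsets), y)


def is_lane_change(ground_truth, marker_offsets):
    """True if the last and second last ground truth points are in different lanes."""
    (_, y_prev), (_, y_last) = ground_truth[-2], ground_truth[-1]
    return lane_index(y_prev, marker_offsets) != lane_index(y_last, marker_offsets)


# Hypothetical lane markers for three lanes with a width of 3.5 m.
markers = [-5.25, -1.75, 1.75, 5.25]
crossing = [(0.0, 0.0), (120.0, 1.0), (127.0, 1.9)]    # crosses the marker at 1.75 m
straight = [(0.0, 0.0), (120.0, 0.1), (127.0, 0.2)]

change_detected = is_lane_change(crossing, markers)
straight_detected = is_lane_change(straight, markers)
```

A trajectory where the lane marker is crossed earlier than in the last time step is not classified as a lane change by this rule, which matches the requirement that the manoeuvre has not yet started at prediction time.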

The formula for the performance measure calculates the absolute distance between
the last point in the prediction and the last point of the ground truth trajectory.
If (x, y) is the last ground truth data point and (x̂, ŷ) is the last predicted
data point, then the lane change performance is given by Eq. (3.3)-(3.5).

\[ l = \lVert (x, y) - (\hat{x}, \hat{y}) \rVert_2 \tag{3.3} \]
\[ l_x = \lVert x - \hat{x} \rVert \tag{3.4} \]
\[ l_y = \lVert y - \hat{y} \rVert \tag{3.5} \]
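Eq. (3.3)-(3.5) can be computed for one prediction as follows. The trajectories are hypothetical and only contain the points needed for the illustration.

```python
import math


def endpoint_errors(ground_truth, prediction):
    """Return (l, l_x, l_y) between the last ground truth and predicted points."""
    x, y = ground_truth[-1]
    x_hat, y_hat = prediction[-1]
    l_x = abs(x - x_hat)            # longitudinal error, Eq. (3.4)
    l_y = abs(y - y_hat)            # lateral error, Eq. (3.5)
    l = math.hypot(l_x, l_y)        # Euclidean distance, Eq. (3.3)
    return l, l_x, l_y


ground_truth = [(0.0, 0.0), (150.0, 3.5)]
prediction = [(0.0, 0.0), (147.0, -0.5)]
l, l_x, l_y = endpoint_errors(ground_truth, prediction)
```

Here the longitudinal error is 3 m and the lateral error is 4 m, which gives a total error of 5 m.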

When driving on a highway, and as seen in the dataset, the forward velocity
is often over 100 km/h. The forward velocity is thus much larger than the
lateral velocity the ego vehicle will reach, for example in a lane change. It
is therefore logical that the error should also be much larger longitudinally than
laterally. Thus, the individual errors are of interest as well. In the performance
analysis, both the longitudinal and lateral errors will be investigated to
draw a conclusion about the model’s range of error. It is also important to note
that an increase in lateral error is often more significant with regard to safety
than the same increase in longitudinal error, as a lateral difference could mean that
the ego vehicle is in a different lane.

3.3.2 Safety evaluation

To assess the safety aspect of the driver model, the output of
the network is analysed to verify that there are no collisions in the prediction.
The distances to surrounding vehicles are therefore calculated with respect to
the predicted trajectory instead of the ground truth trajectory, for each prediction
separately. Following the predicted trajectory thus tells how close the model would
manoeuvre around the other vehicles on the road if it took control itself.
The relative object positions are therefore adjusted to match the object distances
in the vehicle frame if the vehicle were to follow the predicted trajectory. If another
vehicle, at time step k, has a position of (xk, yk) relative to the ego vehicle, its
relative position is adjusted by the difference between the ground truth and the
prediction, ∆(xk, yk).

(x̂k, ŷk) = (xk, yk) − ∆(xk, yk) (3.6)

The new object positions from Eq. (3.6), (x̂k, ŷk), can then tell if the model is
predicting a trajectory close to already existing objects, or if it will collide. Creating
a plot with the results makes it possible to see any differences in the
behaviour of the ground truth data and the predicted trajectories, see section 4.6.
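A sketch of the adjustment in Eq. (3.6) for a single surrounding vehicle is given below. It assumes that ∆(xk, yk) is taken as the predicted ego position minus the ground truth ego position at time step k, and it neglects any difference in heading between the two trajectories. All positions are hypothetical.

```python
import math


def adjusted_object_positions(objects, ground_truth, prediction):
    """Apply Eq. (3.6) per time step k for one surrounding vehicle.

    objects[k]      : object position relative to the ego vehicle (ground truth)
    ground_truth[k] : ground truth ego position at step k
    prediction[k]   : predicted ego position at step k
    """
    adjusted = []
    for (xo, yo), (xg, yg), (xp, yp) in zip(objects, ground_truth, prediction):
        dx, dy = xp - xg, yp - yg          # assumed definition of Delta
        adjusted.append((xo - dx, yo - dy))
    return adjusted


def min_distance(adjusted):
    """Smallest distance between the ego vehicle and the object."""
    return min(math.hypot(x, y) for x, y in adjusted)


objects = [(20.0, 0.0), (18.0, 0.0)]
ground_truth = [(5.0, 0.0), (10.0, 0.0)]
prediction = [(5.0, 0.0), (12.0, 1.0)]

adjusted = adjusted_object_positions(objects, ground_truth, prediction)
closest = min_distance(adjusted)
```

In the example, the prediction is 2 m further ahead than the ground truth in the second step, so the object appears 2 m closer longitudinally when seen from the predicted trajectory.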


3.3.3 Prediction fluctuation

Now, the robustness of the predictions is analysed by investigating the spatial
fluctuations between subsequent predictions. For each prediction, there will be some
difference between the current prediction and the one in the next time step. As
all coordinates are relative to the ego vehicle, the distance between the
predictions is not affected by the vehicle moving from sample to sample.
Rather, the predictions change based on the environmental change around the
vehicle, i.e. changes in surrounding objects and lane markers. For each prediction,
p = {(x1, y1), (x2, y2), ..., (xn, yn)}, the fluctuation is the distance visualised in Fig.
3.13.

[Figure labels: pt, pt+1, d]

Figure 3.13: Visualisation of the fluctuations between two following predictions.
The distance d can also be divided in its lateral and longitudinal distances to further
describe the fluctuations.

As predictions are made with a sampling frequency of 4 Hz, the distance between
two consecutive predictions can be expressed as a velocity, vp. With f as the sampling
frequency, the velocity between each sample can be calculated as follows.

vp = d · f (3.7)

Eq. (3.7) is then applied to all predictions in all sequences. The prediction
velocity is only calculated for predictions that have a temporal relationship to each
other, i.e. consecutive predictions within the same sequence. Consequently, there will
not be any spikes in the velocity between different data log sequences. These results
are presented in section 4.7.
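The calculation in Eq. (3.7) can be sketched as below. How the distance d between two predictions is measured is not specified in detail here, so the sketch assumes the mean Euclidean distance between corresponding points of the two predictions. The predictions are hypothetical.

```python
import math


def prediction_velocity(p_t, p_next, f=4.0):
    """Return v_p = d * f for two consecutive predictions (Eq. 3.7).

    d is assumed to be the mean distance between corresponding points.
    """
    d = sum(math.hypot(xb - xa, yb - ya)
            for (xa, ya), (xb, yb) in zip(p_t, p_next)) / len(p_t)
    return d * f


def sequence_velocities(predictions, f=4.0):
    """Velocities between consecutive predictions within one sequence only."""
    return [prediction_velocity(a, b, f)
            for a, b in zip(predictions, predictions[1:])]


# Three consecutive predictions from one sequence, two points each.
predictions = [
    [(1.0, 0.0), (2.0, 0.0)],
    [(1.0, 0.5), (2.0, 0.5)],
    [(1.0, 0.5), (2.0, 0.5)],
]
velocities = sequence_velocities(predictions)
```

Calling `sequence_velocities` separately for each sequence avoids comparing the last prediction of one data log with the first prediction of the next.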


4
Results

To revisit the question of whether it is possible for a network to learn how to drive
a car, the proposed network model is evaluated on its performance measures in order
to draw a conclusion about its driving abilities. The key features of interest are
analysed in an effort to answer the question of whether the network, in fact, can
manoeuvre a vehicle. Furthermore, the characteristics of the network will be
thoroughly analysed to give a perspective of how well the network performs in
various environments.

Firstly, the most general performance of the network, the loss, is evaluated by
analysing the Mean Square Error (MSE) of the predictions. This is to show how
well the network model can predict the ground truth data, by calculating the
average error between all data points in the prediction. Secondly, there will be an
investigation of the object dependencies of the network, in other words, how
the surrounding objects around the vehicle affect the network performance. Thirdly,
a study is made of how the characteristics of the performance measures change with
regard to how far into the future the prediction is made.

Throughout the results, the longitudinal direction is regarded as the x-axis of the
ego vehicle body frame, and the lateral direction is along the y-axis.

4.1 Network performance and training progress

Tuning an LSTM-network with different parameters includes seeking a balance
between the ability to learn complex scenarios and the risk of overfitting the model
to the data. Increasing the complexity, or expressiveness, of the model often means
increasing the hidden dimension of the cells and/or increasing the number of cells.
A drawback of increasing the model complexity is an increase in computation time
due to more trainable parameters, but once again, the aim is not to optimise the
model with regard to computation time. Rather, the focus lies in finding a model
which is able to mimic human behaviour. With that said, the validation loss of
various LSTM model sizes is presented in Tab. 4.1. All models have in common
that all cells share the same hidden dimension size in each separate model.


Hidden        Number of cells
dimensions    1        2        3        4        5
25            0.553    0.415    0.425    0.409    0.427
50            0.432    0.425    0.409    0.422    0.43
75            0.441    0.413    0.427    0.428    0.405
100           0.437    0.418    0.434    0.423    0.432
150           0.41     0.417    0.448    0.433    0.425
200           0.398    0.423    0.422    0.432    0.434
300           0.419    0.411    0.424    0.441    0.448

Table 4.1: The lowest mean square error of an epoch evaluated on the validation
set, trained for 500 epochs with different number of cells and hidden dimensions.
Note that all cells share the same hidden dimension size in each separate model.

From here on, the results are based on the network with a hidden dimension of 300
and 3 cells. The decision is based on the fact that the model performs roughly on par
with the others, but has higher complexity. The model could thus scale better with
more training data, and its results may improve when the size of the dataset is
increased. This way, the model analysed is also ready to be trained with more data
than is currently available.

After the data processing has been executed and the network model has been trained,
a comparison with the benchmark network is made. Fig. 4.1 shows the training
progress for both the chosen network and the benchmark. As can be seen, the
proposed model performs marginally better than the benchmark.


[Figure: “Mean prediction loss per epoch”. x-axis: Epochs (0 to 500). y-axis: MSE [m^2], logarithmic scale (10^-1 to 10^3). Curves: Training (min = 0.25), Validation (min = 0.44), Benchmark - Validation (min = 0.61).]

Figure 4.1: The training progress of the network over number of epochs for the
training and validation dataset, compared with the benchmark model.

Fig. 4.1 shows that the model is overfitting the training data, which can be seen
where the training loss goes down while the validation loss increases. This is caused
by the over-exposure to the training data, and as the network can only measure its
own performance on the training data, it gets more and more specialised in its
skills. But as the network gets more skilled on the training data, it also trades off
its ability to generalise to the new data presented in the validation set. As this
is an unwanted behaviour, the best performing model on the validation
set is saved for further analysis.
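Selecting the best performing model can be sketched as keeping track of the lowest validation loss over the epochs. The loss values below are hypothetical; in a training loop, the network weights would be stored whenever a new minimum is found.

```python
def best_epoch(validation_losses):
    """Return (epoch, loss) for the lowest validation loss."""
    best_idx, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:            # a new best model would be saved here
            best_idx, best_loss = epoch, loss
    return best_idx, best_loss


# Hypothetical validation losses: improving at first, then overfitting.
validation_losses = [0.90, 0.60, 0.44, 0.47, 0.52]
epoch, loss = best_epoch(validation_losses)
```

The model from the selected epoch is then used for the analysis, regardless of how far the training loss continues to decrease afterwards.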

4.2 Object dependence in model

When driving a car, a human’s choices are largely affected by surrounding
vehicles. Drivers have to compromise with others about the drivable space on the
road. Mimicking human behaviour thus calls for the objects to have an importance
in the trajectory prediction. To analyse whether the surrounding objects are
important for achieving a low loss, the mean losses for each epoch are presented
in Fig. 4.2. The figure shows the mean losses of the network when it is trained
with objects, and when the objects are then excluded in the validation
predictions. Recall from the preprocessing that a non-existing object is set to
the coordinate (Rmax, Rmax). It is clear that without the objects, the network
is not able to predict the next movement at all. However, this does not imply
that surrounding objects must be present to achieve a low loss; rather, it shows
that the network takes the objects into account in its predictions. That means
the object data is not superfluous for the network when learning to drive like a
normal driver. The opposite can be observed in the benchmark model, where the
objects do not make any difference in the training progress, and are almost
completely redundant as an input.
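The object exclusion described above can be sketched as follows. This is only
an illustration under stated assumptions: the sample layout (a dictionary with
"objects" and "target"), the predict callable, the value of R_MAX and the exact
form of the MSE are hypothetical and not taken from the implementation.

```python
def mask_objects(sample, r_max):
    """Return a copy of the sample with every object moved to (r_max, r_max)."""
    masked = dict(sample)
    masked["objects"] = [(r_max, r_max) for _ in sample["objects"]]
    return masked


def mean_squared_error(predicted, target):
    """MSE over a trajectory given as a list of (x, y) points (assumed form)."""
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target)]
    return sum(squared) / len(squared)


def ablation_losses(predict, samples, r_max):
    """Mean loss with the objects present and with the objects masked out."""
    with_objects = [mean_squared_error(predict(s), s["target"]) for s in samples]
    without_objects = [
        mean_squared_error(predict(mask_objects(s, r_max)), s["target"])
        for s in samples
    ]
    n = len(samples)
    return sum(with_objects) / n, sum(without_objects) / n


R_MAX = 100.0  # hypothetical value standing in for Rmax
toy_sample = {"objects": [(10.0, 0.0), (25.0, 3.5)],
              "target": [(1.0, 0.0), (2.0, 0.0)]}


def toy_predict(sample):
    """Stand-in for the trained network: only correct if a real object is visible."""
    sees_object = any(x < R_MAX for x, _ in sample["objects"])
    if sees_object:
        return sample["target"]
    return [(0.0, 0.0)] * len(sample["target"])


loss_with, loss_without = ablation_losses(toy_predict, [toy_sample], R_MAX)
```

A large gap between the two returned losses indicates that the predictor relies
on the object input, which is the property examined in Fig. 4.2.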

[Plot for Figure 4.2, “Mean losses per prediction”. x-axis: Epochs (0 to 500);
y-axis: MSE [m2], logarithmic (10^-1 to 10^3). Curves: Validation with objects,
Validation without objects, Benchmark - Validation with objects,
Benchmark - Validation without objects.]

Figure 4.2: Mean losses per prediction after each epoch for the training set and
the validation set with and without objects as an input to the network. The figure
highlights that the objects are necessary to predict the next movement. When
training the network without any objects, the network fails to predict the next
movement.

To confirm the importance of objects, an analysis of how the number of objects
affects the predictions is necessary. Indeed, the complexity of a traffic
scenario increases with more objects. It is therefore interesting to analyse
whether scenarios with more objects increase the average loss, or whether the
absence of surrounding objects makes the network useless, as could be inferred
from Fig. 4.2. Fig. 4.3 presents a violin plot where the loss is compared across
the object groups. A violin plot is very similar to a box plot, but the
distribution of the data points is also visualised. The conclusion is that the
mean loss does not increase significantly between scenarios with zero objects
and object groups g1 to g5. Note that the average loss is higher for object
groups g6 and g7. However, this is understandable given the lack of scenarios
with those object groups; there is a risk that the network is not well exposed
to them during training. One can also observe spikes (outliers) for all
scenarios, but no correlation can be seen between the occurrence of the spikes
and the object groups, other than that the maximum error seems to be lower when
no other objects are present.

Thus, it is confirmed that the proposed network model considers surrounding
objects in its predictions, while not depending on their presence. It implies
that the normal driver changes his or her behaviour when other objects are
around and that the network model can identify that.
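The per-group comparison can be sketched with a small summary routine. The
record layout, the group labels as plain strings and the toy values are
assumptions made for illustration; the sketch does not reproduce how the object
groups are defined or how the violin plot was generated.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_group(records):
    """Summarise errors per object group.

    records: iterable of (group_label, error_in_metres).
    Returns {group_label: (count, mean_error, max_error)}.
    """
    grouped = defaultdict(list)
    for label, error in records:
        grouped[label].append(error)
    return {label: (len(errs), mean(errs), max(errs))
            for label, errs in grouped.items()}


# Toy records (made-up values): group label and error after 5 seconds.
toy_records = [("0", 0.5), ("g1", 1.0), ("g1", 3.0), ("g6", 4.0)]
summary = summarise_by_group(toy_records)
```

Reporting the count next to the mean makes it visible when a group, such as g6
or g7, is represented by too few scenarios for its mean to be trusted.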

[Plots for Figure 4.3, “Error/Object groups”. x-axis in both panels: Object
groups (0, g1 to g7). y-axes: Longitudinal error [m] (0 to 15) and Lateral
error [m] (0 to 2).]

Figure 4.3: The density of data points with the error 5 seconds into the future
in relation to the object group in the scenario. Each point inside the
distribution represents a data point, with the white circle being the mean
error. See Fig. 3.4 for the exposure of all object groups. The presence of cases
with g8 to g10 is considered negligible, and it can also be seen that g6 and g7
have lower exposure than the rest.

Furthermore, based on Fig. 4.4, closer objects lead to worse predictions. This
behaviour could be explained by the increasing level of compromise about the
drivable surface around the vehicle when another object gets closer. As an aside
to the results in Fig. 4.4, the network may reflect the driver in the sense that
close objects are of greater importance and that the driver changes his or her
behaviour according to those objects. Note that the errors presented in the
figures are the errors of the predictions after 5 seconds, not the mean error
over each prediction.

[Plots for Figure 4.4. Panel “Error/distance to closest object - Longitudinal”:
x-axis Distance to closest object [m] (0 to Rmax), y-axis Longitudinal error [m]
(0 to 15). Panel “Error/distance to closest object - Lateral”: x-axis Distance
to closest object [m] (0 to Rmax), y-axis Lateral error [m] (0 to 2).]

Figure 4.4: Comparing the closest distance to surrounding objects versus the
error, 5 seconds into the future. Each point represents a single prediction in
the validation set.

It can be seen in both Fig. 4.3 and Fig. 4.4 that there are some troublesome
outliers where the error is large compared to the mean error. A large error in
the prediction could cause a serious accident in traffic. Thus, one must
identify when these worst-case predictions occur, and whether it is possible to
prevent them. However, the large error values often come from the same scenario,
in a sequence of 1-2 seconds. These cases have been individually analysed, and
the error stems from cases where another vehicle acts with high
unpredictability, for example cut-ins or fast deceleration. When analysing the
video feed from the specific sequences, it is humanly impossible to predict the
movement of the object that causes the error. As the normal driver would not be
able to predict this movement pattern in surrounding vehicles, it is possible
that these cases are not learnable for the network given the data input.
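Finding the sequences that the outliers belong to can be sketched as a grouping
of consecutive large-error predictions. The threshold value and the assumption
that predictions are spaced at the 4 Hz sampling rate are illustrative choices,
not details of the analysis performed in this work.

```python
def outlier_sequences(errors, threshold, sample_rate_hz=4.0):
    """Group consecutive samples with error above threshold into runs.

    Returns a list of (start_index, end_index, duration_in_seconds).
    """
    runs = []
    start = None
    for i, err in enumerate(errors):
        if err > threshold and start is None:
            start = i                              # a new run begins
        elif err <= threshold and start is not None:
            runs.append((start, i - 1, (i - start) / sample_rate_hz))
            start = None                           # the run has ended
    if start is not None:                          # run reaching the end of the data
        runs.append((start, len(errors) - 1,
                     (len(errors) - start) / sample_rate_hz))
    return runs


# Toy error series in metres (made-up values) with a hypothetical 5 m threshold.
toy_errors = [0.5, 0.4, 6.0, 7.0, 8.0, 6.5, 0.3, 0.2, 9.0, 0.1]
runs = outlier_sequences(toy_errors, threshold=5.0)
```

Each run can then be matched against the recorded video of the same time span
to inspect what the surrounding vehicles did.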

4.3 Feature correlations in prediction performance

The relation between the longitudinal and lateral errors and the velocity of the
ego car is presented in Fig. 4.5. No obvious correlation can be observed, and no
conclusion can be drawn that the current velocity affects the quality of the
predictions. Note that since more events occur where the ego car drives at
around 120 km/h, it is probable that there are also more cases with large error
in that velocity interval. Once again, the errors presented in the figures are
the errors 5 seconds into the future in the predictions.
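A visual impression of the scatter can be complemented with a correlation
coefficient. The sketch below computes the Pearson coefficient with the
standard formula; it was not part of the reported analysis, and the toy values
are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data: ego velocity in km/h and error in metres (made-up values).
velocities = [90.0, 100.0, 110.0, 120.0]
errors_uncorrelated = [1.0, -1.0, -1.0, 1.0]
errors_linear = [0.9, 1.0, 1.1, 1.2]

r_uncorrelated = pearson_r(velocities, errors_uncorrelated)
r_linear = pearson_r(velocities, errors_linear)
```

Because the coefficient is normalised, it is not inflated by the larger number
of samples around 120 km/h in the way the raw count of large errors is.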

[Plots for Figure 4.5: two scatter panels, no axis labels recoverable. x-axis
ticks 80 to 160 in both panels; y-axis ticks 0 to 25 in the first panel and 0 to
4 in the second.]

Figure 4.5: The longitudinal and the lateral error, respectively, in relation to
the current velocity of the ego-car. Each point represents a single prediction
in the validation set.

Furthermore, the correlation between the curvature of the road and the error is
presented in Fig. 4.6. This is done to analyse whether the error correlates with
the steepness of the curves.

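[Plots for Figure 4.6: two scatter panels, no axis labels recoverable. x-axis
ticks -20 to 20 in both panels; y-axis ticks 0 to 25 in the first panel and 0 to
4 in the second.]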

Figure 4.6: The longitudinal and lateral error in relation to the curvature of the
road. Each point represents a single prediction in the validation set. The curvature
is defined as the lateral distance between the first lane marker point and the last
lane marker point given as input to the network.

As seen in the left plot in Fig. 4.6, the loss actually tends to decrease with
greater steepness of the curve. A possible cause is that large errors occur in
situations where other vehicles intrude on the planned path, and that other
vehicles may be less likely to make such manoeuvres in a curve.
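The curvature measure used in Fig. 4.6, the lateral distance between the first
and the last lane marker point given as input, can be written as a short
sketch. The point layout (x longitudinal, y lateral) and the sign convention
are assumptions for illustration.

```python
def curvature_measure(lane_marker_points):
    """Lateral offset between the first and the last lane marker point.

    lane_marker_points: list of (x, y) with x longitudinal and y lateral,
    an assumed layout. The sign indicates the direction of the curve.
    """
    first_lateral = lane_marker_points[0][1]
    last_lateral = lane_marker_points[-1][1]
    return last_lateral - first_lateral


# Toy lane markers in metres (made-up values): a gentle curve and a straight road.
curved = [(0.0, 1.75), (20.0, 2.0), (40.0, 3.25)]
straight = [(0.0, 1.75), (20.0, 1.75), (40.0, 1.75)]

curve_offset = curvature_measure(curved)
straight_offset = curvature_measure(straight)
```

The measure only uses the two end points, so an S-shaped stretch of road can
give a value near zero even though the road is not straight.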

4.4 Output sequence length

Predicting the future trajectory has to this point been limited to predicting 5
seconds into the future, or 20 samples forward with a sampling frequency of 4
Hz. This limit is set by the need to analyse the network’s ability to predict
the driver’s intention; it is not necessarily needed to always plan the
trajectory 5 seconds ahead. With a shorter time horizon, however, the
performance is not eligible for the same analysis as previously presented. It is
still interesting to investigate the performance when the time horizon is
changed while keeping the input history of 5 seconds. As it is harder to predict
further into the future, the error is expected to follow that trend.
Characterising the error curve for various time horizons is nevertheless
important to verify that the error is smaller when predicting less far ahead in
time. Also, when implementing the algorithm, one may want to update the
prediction more often than once every 5 seconds.

Fig. 4.7 and Fig. 4.8 show the longitudinal and the lateral error, respectively,
between the ground truth and the predictions for different percentiles and time
horizons. As previously mentioned, the error does increase with the output
length, i.e. it is more difficult for the network to predict far ahead in the
future.
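The percentile curves can be sketched as below for one time horizon. The
nearest-rank percentile definition is an assumed choice, since the definition
used for the figures is not stated, and the toy errors are made up. The
relation between horizon and number of samples follows the 4 Hz sampling
frequency given in the text.

```python
import math

SAMPLE_RATE_HZ = 4  # from the text: 5 seconds ahead corresponds to 20 samples


def samples_for_horizon(horizon_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of output samples needed to cover a horizon given in seconds."""
    return horizon_s * sample_rate_hz


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100 (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_curve(final_point_errors, qs=(70, 75, 80, 85, 90, 95, 100)):
    """Map each percentile to the final-point error at that percentile."""
    return {q: percentile(final_point_errors, q) for q in qs}


# Toy final-point errors in metres for one time horizon (made-up values).
toy_errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
curve = percentile_curve(toy_errors)
```

Repeating the computation for each time horizon gives one curve per horizon,
and the 100th percentile corresponds to the largest error observed.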

[Plot for Figure 4.7, “Longitudinal error”. x-axis: Percentile (70 to 100);
y-axis: Longitudinal error [m] (0 to 25). Curves: prediction horizons of 1 s,
2 s, 3 s, 4 s, 5 s and 6 s.]

Figure 4.7: Longitudinal error between the prediction and the ground truth data,
at the last point of the predictions. The error tends to be higher the further into
the future one predicts. All errors are evaluated on the validation set.

[Plot for Figure 4.8, “Lateral error”. x-axis: Percentile (70 to 100); y-axis:
Lateral error [m] (0 to 4). Curves: prediction horizons of 1 s, 2 s, 3 s, 4 s,
5 s and 6 s.]

Figure 4.8: Lateral error between the prediction and the ground truth data, at
the last point of the predictions. The error tends to be higher the further into the
future one predicts. All errors are evaluated on the validation set.

Remember that during each prediction, the network is not updated with any new
information. As the predictions are made seconds in advance, a large error is
possibly caused by the traffic situation on the road changing substantially
within these seconds, or at least substantially enough that the ego driver needs
to react accordingly, which introduces an error in the previous prediction. The
previous prediction may be justified for that specific traffic situation, but as
the situation changes it may become infeasible.

As the driver intention cannot be analysed through the previous means with a
shorter time frame in the prediction, an analysis of a model that has been
trained to predict 5 seconds ahead is needed. Thus, in Fig. 4.9, the network has
been trained to predict 5 seconds into the future, or 20 samples ahead. The
error in each time sample is then analysed to give the performance of the model
over time. One can observe that the error increases exponentially over time and
that an implementation of the model may benefit from rapidly updating the
prediction when driving.
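The error per time sample can be sketched as a root mean square error computed
independently for each step of the predicted trajectory. Treating the total
error as the Euclidean distance between predicted and true position is an
assumption, and the trajectory layout and toy values are illustrative.

```python
from math import sqrt


def rmse_per_time_step(predictions, targets):
    """RMSE over all predictions, computed separately for each time step.

    predictions, targets: lists of trajectories, each a list of (x, y) points
    of equal length. The error at a step is the Euclidean distance between the
    predicted and the true point (assumed definition of the total error).
    """
    n_steps = len(targets[0])
    rmse = []
    for k in range(n_steps):
        squared = [(p[k][0] - t[k][0]) ** 2 + (p[k][1] - t[k][1]) ** 2
                   for p, t in zip(predictions, targets)]
        rmse.append(sqrt(sum(squared) / len(squared)))
    return rmse


# Toy data: two predictions with two time steps each (made-up values).
toy_predictions = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (2.0, 0.0)]]
toy_targets = [[(1.0, 0.0), (5.0, 4.0)], [(1.0, 0.0), (5.0, 4.0)]]
rmse_curve = rmse_per_time_step(toy_predictions, toy_targets)
```

With 20 output samples at 4 Hz, the returned list has 20 entries, one for every
0.25 seconds of the prediction.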

[Plot for Figure 4.9, “Root mean square error over time”. x-axis: Time [s] (0 to
5); y-axis: Root mean square error [m] (0 to 2). Curve: Error.]

Figure 4.9: Root mean square error of the model output from a network model
trained to predict the trajectory 5 seconds into the future and with a sample rate
of 4 Hz. The error is the total error in the specific time instance, not coupled with
the other time instances. The errors are evaluated on the validation set.

4.5 Lane change predictions

An important quality of the network is the ability to mimic a real driver’s
ability to make decisions in different scenarios and, to some extent, predict
the future of the traffic flow. So far, the analysis has indicated that it is
possible, to some point into the future, to perform highway driving in the set
environment. However, a distinct difference between a real driver and a network
is the ability to know how to make a lane change and when to do it. This calls
for an analysis of whether the network knows when to perform a lane change.

Fig. 4.10 shows the error for lane change driving as well as straightway driving,
represented as histograms. The error is evaluated for predictions five seconds in the
future and the method used to calculate the error is covered in section 3.3.1.

[Histograms for Figure 4.10. All panels: x-axis Error [m] (0 to 4), y-axis
Frequency (0% to 60%). Panel means: Lane change - Long. 0.91; Lane change - Lat.
0.26; Straightway driving - Long. 1.10; Straightway driving - Lat. 0.28.]

Figure 4.10: Histograms over the longitudinal and lateral error respectively, in
scenarios with and without lane changes, using the proposed model.

[Histograms for Figure 4.11. All panels: x-axis Error [m] (0 to 4), y-axis
Frequency (0% to 60%). Panel means: Lane change - Long. 1.43; Lane change - Lat.
0.25; Straightway driving - Long. 1.44; Straightway driving - Lat. 0.30.]

Figure 4.11: Histograms over the longitudinal and lateral error respectively, in
scenarios with and without lane changes, using the benchmark model.

From Fig. 4.10, the proposed network model actually performs marginally better
both longitudinally and laterally in lane changes than in straightway driving.
As previously stated, predicting a lane change 5 seconds into the future means
that the lane change is not yet initiated at the moment of prediction. Thus, the
network shows its ability to know when to make a lane change. Also, when
comparing the lane change performance with the benchmark model in Fig. 4.11, the
proposed model performs better in almost all measures. The exception is the
lateral error when changing lanes, where the models perform near equally.
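The comparison can be made explicit with the mean errors stated in Fig. 4.10
and Fig. 4.11. The sketch below only rearranges those reported means; the
dictionary layout and the relative-reduction measure are choices made for this
illustration.

```python
# Mean errors in metres after 5 seconds, as stated in Fig. 4.10 (proposed model)
# and Fig. 4.11 (benchmark model).
MEAN_ERROR = {
    "proposed": {
        ("lane change", "long"): 0.91,
        ("lane change", "lat"): 0.26,
        ("straightway", "long"): 1.10,
        ("straightway", "lat"): 0.28,
    },
    "benchmark": {
        ("lane change", "long"): 1.43,
        ("lane change", "lat"): 0.25,
        ("straightway", "long"): 1.44,
        ("straightway", "lat"): 0.30,
    },
}


def relative_reduction(proposed, benchmark):
    """Fraction by which the proposed error is lower than the benchmark error."""
    return (benchmark - proposed) / benchmark


reductions = {
    key: relative_reduction(MEAN_ERROR["proposed"][key],
                            MEAN_ERROR["benchmark"][key])
    for key in MEAN_ERROR["proposed"]
}

# Measures where the proposed model has the lower mean error.
proposed_better = sorted(key for key, value in reductions.items() if value > 0)
```

A positive value means that the proposed model has the lower mean error for
that measure. Only the lateral lane change error gives a negative value, and
there the two means differ by 0.01 m.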

The important note here is that the model performs better in lane changes than
in straightway driving. If we can conclude from the network training progress,
in section 4.1, that we are able to mimic human driving with high precision in
the general case, then we can also conclude that the model predicts future lane
changes as well as, if not better than, straightway driving. Thus, with that
conclusion, the model is able to mimic the driver intention from the data logs.

4.6 Safety assessment of predicted trajectory

From the model output, the predicted trajectory is compared to the ground truth
target trajectory in each prediction separately. The question is whether the
vehicle, if it followed the predicted path, would manoeuvre into or close to
another object. The comparison is done by calculating the new relative object
positions obtained when following the predicted trajectory instead of the ground
truth, by the methodology presented in section 3.3.2. This is to verify that the
model does not make dangerous manoeuvres that may compromise safety in the road
environment. As the error distributions presented in section 4.3 show, large
deviations from the ground truth position may imply a safety risk.

Figure 4.12: Relative object position data points in the body frame of the ego
vehicle, when following the predicted trajectory from the network model. Each
point represents an object position relative to ego car (0,0) at some time in the
entire validation set. A point is thus only the object detection point with no spatial
information other than the point position.

In Fig. 4.12, the ego-car is placed in the origin (0,0), and all surrounding
objects are expressed relative to this point and in the ego vehicle body frame.
When analysing the relative object positions, the closest object longitudinally
to the ego vehicle is 7.5 metres away and the closest laterally is 3.5 metres
away; thus there are no direct collisions with surrounding objects. This shows
that while the network still has some predictions with large error, there are no
collisions in those predictions.
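The recalculation of relative object positions can be sketched as follows. This
is a simplified illustration and not the methodology of section 3.3.2: it
assumes that all positions are expressed in one common frame fixed at the time
of prediction, and it ignores any difference in heading between the predicted
and the true trajectory.

```python
from math import hypot


def relative_to_predicted(ego_truth, ego_predicted, object_rel_truth):
    """Object position relative to the ego car if it follows the prediction.

    ego_truth, ego_predicted: (x, y) ego positions at the same time step.
    object_rel_truth: (x, y) object position relative to the true ego position.
    Heading differences are ignored (simplifying assumption).
    """
    dx = ego_truth[0] - ego_predicted[0]
    dy = ego_truth[1] - ego_predicted[1]
    return (object_rel_truth[0] + dx, object_rel_truth[1] + dy)


def closest_distance(ego_truth_traj, ego_pred_traj, objects_rel_truth_traj):
    """Smallest ego-to-object distance over all steps when following the prediction."""
    best = float("inf")
    for truth, pred, objects in zip(ego_truth_traj, ego_pred_traj,
                                    objects_rel_truth_traj):
        for obj in objects:
            x, y = relative_to_predicted(truth, pred, obj)
            best = min(best, hypot(x, y))
    return best


# Toy example (made-up values): the prediction is 3 m ahead of the ground truth,
# so an object 6 m ahead and 4 m to the side ends up 3 m ahead and 4 m to the side.
shifted = relative_to_predicted((10.0, 0.0), (13.0, 0.0), (6.0, 4.0))
gap = closest_distance([(10.0, 0.0)], [(13.0, 0.0)], [[(6.0, 4.0)]])
```

A collision check would compare the shifted positions against the vehicle
dimensions, which are not covered by this sketch.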

4.7 Evaluation of prediction fluctuation

To investigate the behaviour of the predictions with regard to time, the change
between each subsequent sample is calculated. This is done to see whether the
predictions tend to fluctuate a lot and to be able to say something about the
robustness of the predictions. The fluctuation distance is calculated from the
last point in the 5 second prediction, by the method presented in section 3.3.3.
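A fluctuation measure of this kind can be sketched as the change in the last
predicted point between consecutive predictions. The sketch is not the method
of section 3.3.3: it assumes that the final points of consecutive predictions
are already expressed in comparable coordinates, and the toy values are made
up.

```python
def fluctuation(final_points):
    """Change of the final predicted point between consecutive predictions.

    final_points: list of (longitudinal, lateral) positions of the last point
    of each prediction, in comparable coordinates (assumption).
    Returns a list of (d_longitudinal, d_lateral), one per consecutive pair.
    """
    return [(b[0] - a[0], b[1] - a[1])
            for a, b in zip(final_points, final_points[1:])]


# Toy final points in metres for three consecutive predictions (made-up values).
toy_final_points = [(100.0, 0.0), (101.5, 0.25), (101.0, -0.25)]
changes = fluctuation(toy_final_points)
```

Collecting the longitudinal and lateral changes over the validation set and
binning them gives histograms of the kind shown in Fig. 4.13.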

[Histograms for Figure 4.13. Panel “Longitudinal”: x-axis Distance [m] (-10 to
10), y-axis Frequency (0% to 40%). Panel “Lateral”: x-axis Distance [m] (-2 to
2), y-axis Frequency (0% to 40%).]

Figure 4.13: The fluctuation of the predictions, given both longitudinally and
laterally. The distances of the fluctuations are here presented in relation to
the frequency with which they occur when predicting throughout the validation
set.

The fluctuations shown in Fig. 4.13 are an interesting part of the behavioural
outcome of the network model. Whether a large fluctuation in the predictions is
a downside or not is not obvious. As previously seen in the error distribution,
there are a few sequences that produce a large error compared to others. The
conclusion from these sequences was that the large errors were caused by other
vehicles on the road acting with high unpredictability. Even though there was a
large error for a short period of time, the network model was then able to
properly assess the traffic situation and predict accordingly. What follows from
this observation is that a high fluctuation in the predictions is not
necessarily a bad property. It could also mean that the model is able to quickly
adapt to an unpredictable situation, just like a normal driver would.

5
Conclusion

This study presents a data-driven approach using neural networks for modelling
normal driving behaviour in a multi-lane highway environment. The driver model
was evaluated on selected performance measures, leading to the conclusion that
the model reflects the innate behaviours that are present in human driving. In
other words, it mimics the driving in data logs gathered on expeditions in
various places around the world.

The chosen network model made use of Long Short-Term Memory (LSTM) cells,
structured as a sequence-to-sequence model, in order to make a trajectory prediction
of the future data points. To make these predictions, the model used surrounding
objects’ relative positions, ego velocity, and road lane marker positions. With the
help of these data features, the network model was able to make predictions about
the future trajectory mostly with high precision. Further, the behaviour of the
network was analysed to verify the intended performance. For instance, the model
was tested to investigate the object importance in the predictions. It was shown
that the network model made use of surrounding objects when these were present
and that it never predicted the trajectory into a collision. A comprehensive study
of correlations between prediction error and data features was made to investigate
any deficiencies or cases that the network may have trouble dealing with.

The analysis of the network model’s ability to predict the driver’s intention in
traffic extended to evaluating whether the network model was able to predict a
lane change before it was initiated. The conclusion was that, when comparing
lane change performance with performance in straightway driving, the network
model could properly reflect the driver’s intention of making future actions.

The conclusion is thus that the proposed network model can predict the future
trajectory of a human-driven vehicle with a low error range, and that it can to some
extent also predict the driver’s intention regarding lane changes.

6
Future work

To progress towards the goal of modelling the normal driver, this chapter
highlights ideas and further extensions that serve this purpose.

Dataset

Future work should expand the dataset to include other kinds of events and
environments. This could be driving in city traffic or on country roads, or
including a wider range of velocities of the ego-vehicle. Also, expanding the
dataset to 360-degree vision would be beneficial for capturing the whole vision
of the driver.

Network architecture

This research used an LSTM approach to find features in the dataset. Future work
could investigate other kinds of network structures or any combination of other
methods, like the ones mentioned in related work.

Training networks of this complexity and with this amount of data takes a long
time. Thus, future work could investigate how to train LSTM networks in a more
efficient way. This could involve data structure analysis as well as optimising
the input structure to the network.

Also, since there is not just one correct answer in general driving, an
interesting extension of the network model would be a statistical approach to
the network output. The network output would then reflect the various choices
the driver is faced with when driving.

Closed-loop analysis

So far, the network has been tested and evaluated open-loop, meaning that the
network has always been updated with the ground truth after each prediction. An
interesting future work would be to implement the network in a closed-loop
environment. The driver model could then be further analysed.

Bibliography

[1] A. Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks.”
[Online]. Available: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

[2] X. B. Peng and G. Berseth, “DeepLoco: Dynamic Locomotion Skills Using
Hierarchical Deep Reinforcement Learning,” ACM Trans. Graph, vol. 36, p. 41,
2017. [Online]. Available: https://doi.org/10.1145/3072959.3073602

[3] I. H. Kim, J. H. Bong, J. Park, and S. Park, “Prediction of drivers intention
of lane change by augmenting sensor information using machine learning
techniques,” Sensors (Switzerland), 2017.

[4] M. Manoj krishna, M. Neelima, M. Harshali, and M. Venu Gopala Rao, “Image
classification using Deep learning,” International Journal of Engineering
& Technology, vol. 7, no. 2.7, p. 614, 3 2018. [Online]. Available:
https://www.sciencepubco.com/index.php/ijet/article/view/10892

[5] Z. Wang, J. Merel, S. Reed, G. Wayne, N. De Freitas, and N. H. Deepmind,
“Robust Imitation of Diverse Behaviors,” Tech. Rep. [Online]. Available:
https://arxiv.org/pdf/1707.02747.pdf

[6] K. M. Kumar, H. Kandala, and N. S. Reddy, “Synthesizing and
Imitating Handwriting Using Deep Recurrent Neural Networks and Mixture
Density Networks,” in 2018 9th International Conference on Computing,
Communication and Networking Technologies (ICCCNT). IEEE, 7 2018, pp.
1–6. [Online]. Available: https://ieeexplore.ieee.org/document/8493843/

[7] Y. Lee, T. Kim, and S.-Y. Lee, “Voice Imitating Text-to-Speech Neural
Networks,” Tech. Rep. [Online]. Available:
https://arxiv.org/pdf/1806.00927.pdf

[8] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao, “Vehicle trajectory
prediction based on motion model and maneuver recognition,” in IEEE
International Conference on Intelligent Robots and Systems, 2013, pp. 4363–4369.

[9] N. Ye, Y. Zhang, and R. Wang, “Vehicle trajectory prediction based on hidden
Markov model,” KSII Transactions on Internet and Information Systems,
vol. 10, no. 7, pp. 3150–3170, 7 2016.

[10] M. Wöllmer, B. Schuller, F. Eyben, and G. Rigoll, “Combining long short-term
memory and dynamic bayesian networks for incremental emotion-sensitive
artificial listening,” IEEE Journal on Selected Topics in Signal Processing,
vol. 4, no. 5, pp. 867–881, 10 2010.

[11] F. Altché and A. de La Fortelle, “An LSTM Network for Highway Trajectory
Prediction,” 1 2018. [Online]. Available: http://arxiv.org/abs/1801.07962

[12] A. Bükk and R. Johansson, “Private Communication,” 2019.

[13] B. Hanin, “Universal Function Approximation by Deep Neural Nets with
Bounded Width and ReLU Activations,” 8 2017. [Online]. Available:
http://arxiv.org/abs/1708.02691

[14] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning.” [Online].
Available: http://www.deeplearningbook.org/

[15] C. Olah, “Understanding LSTM Networks.” [Online]. Available:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[16] B. Mehlig, “Artificial Neural Networks,” Tech. Rep., 2019. [Online]. Available:
https://arxiv.org/pdf/1901.05639.pdf

[17] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Tech.
Rep. 8, 1997. [Online]. Available:
http://www7.informatik.tu-muenchen.de/~hochreit and
http://www.idsia.ch/~juergen

[18] G. Li, S. E. Li, L. Jia, W. Wang, B. Cheng, and F. Chen, “Driving Maneuvers
Analysis Using Naturalistic Highway Driving Data,” in IEEE Conference
on Intelligent Transportation Systems, Proceedings, ITSC, vol. 2015-October.
Institute of Electrical and Electronics Engineers Inc., 10 2015, pp. 1761–1766.
