Mission Management for Fuel Cell
Electric Trucks – a Bilevel Approach
Optimise power split, speed and refuelling in the presence of
uncertain external factors
Master’s thesis in Systems, Control and Mechatronics

ILYA KUANGALIYEV

OSCAR MARK

DEPARTMENT OF ELECTRICAL ENGINEERING

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2024
www.chalmers.se

www.chalmers.se


Master’s thesis 2024

Mission Management for Fuel Cell Electric Trucks
- a Bilevel Approach

Optimise power split, speed and refuelling in the presence of
uncertain external factors

ILYA KUANGALIYEV
OSCAR MARK

Department of Electrical Engineering
Division of Systems and control

Chalmers University of Technology
Gothenburg, Sweden 2024


Mission Management for Fuel Cell Electric Trucks - a Bilevel Approach
Optimise power split, speed and refuelling in the presence of uncertain external
factors
ILYA KUANGALIYEV, OSCAR MARK

© ILYA KUANGALIYEV, OSCAR MARK 2024.

Supervisors: Saurabh Suman, Olof Lindgärde, Volvo Group Technology
Examiner: Nikolce Murgovski, Department of Electrical Engineering

Master’s Thesis 2024
Department of Electrical Engineering
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Illustration of mission management for fuel cell electric trucks

Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2024

iv


Mission Management for Fuel Cell Electric Trucks - a Bilevel Approach
Optimise power split, speed and refuelling in the presence of uncertainty
ILYA KUANGALIYEV, OSCAR MARK
Department of Electrical Engineering
Chalmers University of Technology

Abstract
Fuel cell electric vehicles (FCEVs) have a crucial role in the transition to fossil-
free travel and transport. In order to fully leverage their potential, proper mission
planning of such vehicles should be conducted. This thesis aims to solve the mixed-
integer problem of mission management with a hierarchical bilevel approach. The
first layer solves a simplified mixed-integer problem of choosing hydrogen stations to
refuel at and the amount to refuel with the help of stochastic dynamic programming,
allowing us to account for uncertain external factors to obtain a more robust solution.
The second layer becomes a smooth problem that can be solved with more complex
and comprehensive dynamics. In addition, the benefits of using non-equidistant
stage sampling were evaluated. To evaluate the robustness and optimality of the
proposed approach 3 tests were conducted. The simulations were conducted with
a 45-ton truck using 3 routes: Zagreb–Paris (1559 km), Gothenburg–Kiruna (1793
km), and Gothenburg–Rødby (461 km). Overall, according to the results, the bilevel
approach can yield solutions comparable in optimality to the ones that were obtained
with different strategies. In some cases, the proposed approach produced better
results when it comes to robustness in the presence of disturbances, but the results
were not conclusive enough to answer either of the research questions with a high
level of confidence. Even though the non-equidistant stage sampling did not show
any substantial increase in simulation accuracy, it was shown that stage and DP
stage resolutions can be reduced without significant losses in optimality, making the
computation time almost 3 times smaller.

Keywords: Fuel cell electric trucks, Mission Management, Stochastic optimal con-
trol, Mixed-integer nonlinear program.

v


Acknowledgements
We would like to extend our heartfelt gratitude to our supervisors, Saurabh Suman
and Olof Lindgärde. Their support and advice throughout the project has been
invaluable. By giving us a good understanding of the background and inspiring
us with a lot of great ideas for implementation they helped the project to advance
without a single hitch. We also want to thank our examiner, Nikolce Murgovski, for
being so dedicated to helping us with every issue we had and sharing all his experi-
ence in optimal control. Finally, we want to thank our peer, Isac Borghed, who was
doing his thesis on a similar topic together with us at Volvo Group Technology, for
great and effective collaboration and partnership.

Ilya Kuangaliyev, Gothenburg, June 2024
Oscar Mark, Gothenburg, June 2024

vii


List of Acronyms

Below is the list of acronyms that have been used throughout this thesis listed in
alphabetical order:

DC Direct current
GPS Global positioning system
ICE Internal combustion engine
IVP Initial value problem
BEV Battery electric vehicle
EM Electric machine
ESS Energy storage system
HRS Hydrogen refuelling station
MSE Mean squared error
MPC Model predictive control
MINLP Mixed-integer nonlinear program
NLP Nonlinear program
NMPC Nonlinear model predictive control
FC Fuel cell
FCEV Fuel cell electric vehicle
HEV Hybrid electric vehicle
PEM Predictive energy management
PEMFC Proton-exchange membrane fuel cell
RPM Revolutions per minute
DP Dynamic programming
SDP Stochastic dynamic programming
SoC State of charge

ix


Nomenclature

Below is the nomenclature of indices, sets, parameters, and variables that have been
used throughout this thesis.

Indices

i State index
n Stage number in the NLP
k Stage number in the MINLP

Sets

X State constraint set
Xf Terminal state constraint set
U Control input constraint set
Kstation Set of stages in MINLP that have a hydrogen station

Parameters

x0 Initial condition
xtarget Target terminal state value
m Vehicle mass
mb Battery mass
Af Vehicle effective frontal surface area
Cd Vehicle aerodynamic drag coefficient
Crr Vehicle rolling resistance coefficient
ktm Gear ratio of the transmission
kamb Coefficient of heat exchange between the battery’s case and air

xi


g Free fall acceleration
ρair Air density
ηem Electric machine efficiency
ηtm Transmission efficiency
ηth Battery thermal system efficiency
Pfc,max Fuel cell maximum power output
Pfc,min Fuel cell minimum power output
Ṗfc,max Fuel cell maximum power output rate
Ṗfc,min Fuel cell minimum power output rate
FT,max Maximum traction force
Fbr,max Maximum friction brake force
H̄2,min Minimum normalised hydrogen level
H̄2,max Maximum normalised hydrogen level
SoCmin Minimum SoC level
SoCmax Maximum SoC level
Cb Battery electric capacity
CH2 Hydrogen tank capacity
cb Battery specific heat capacity
N Number of stages in the NLP
K Number of stages used in the MINLP
M Number of quantised state values used in SDP
R Tyre effective radius
trf Full tank refuelling time
tq Station queue time
vmax Maximum vehicle speed
Tb,min Minimum battery temperature
Tb,max Maximum battery temperature
λ Discount factor
∆s Spatial step
∆H Hydrogen’s lower heating value
κ Risk tolerance (Maximum risk of failure)

Variables

xii


x State vector
u Control input vector
y Outputs vector
t Time
tK Total travel time in NLP
tN Total travel time in MINLP
v Vehicle longitudinal velocity
vrel Vehicle speed relative to the air
s Vehicle position along the path
q Battery charge
SoC Battery state of charge
Ib Battery current
H2 Hydrogen level in the vehicle tank
H2,rf Hydrogen refuelled at a hydrogen station
H̄2 Normalised hydrogen level in the vehicle tank
H̄2,rf Normalised hydrogen refuelled at a hydrogen station
Tb Battery temperature
Pem Electric machine power demand
Pfc Fuel cell power output
Pb Battery power output
Pbr Friction brake power
Pth Battery thermal system power
Ebr Energy lost from friction breaking
Qact Battery active heat generation rate
Qpass Battery passive heat generation rate
Qamb Battery dissipated heat rate
FT Traction force
Ffc Traction force from the fuel cell
Fb Traction force from the battery
Fbr Friction brake force
Frr Rolling resistance force
Fair Aerodynamic drag force
Fg Gravitational force due to road slope
Fz Normal reaction force of the road

xiii


sH2 Slack variable for H̄2 bound relaxation
sSoC Slack variable for SoC bound relaxation
st Slack variable for t bound relaxation
τem Electric machine torque
ωem Electric machine angular velocity
δπ Binary variable used for feasibility check in the MINLP
δst Binary variable identifying a hydrogen station’s presence
ωt Cost of time in the NLP
ωH2 Cost of hydrogen in the NLP
ωs Cost of slack variables in the NLP
ωbr Cost of friction breaking in the NLP
εe Cost of deviating from desired state targets
εbr Cost of friction braking in the MINLP
εfuel Cost of fuel (hydrogen) in the MINLP
εfail Cost of mission failure in the MINLP
εstop Cost of making a stop to refuel in the MINLP
εtr Cost of breaching the fuel cell’s transient’s maximum allowed level

in the MINLP
εllim,xi

Cost of breaching a state’s lower bound in the MINLP
εulim,xi

Cost of breaching a state’s upper bound in the MINLP
εC Cost of mission completion in the MINLP
µPem Power demand’s mean
σPem Power demand’s standard deviation
γ Vehicle’s gear

Functions

p(s) Road parameters
α(s) Road grade
φ(s) Vehicle heading angle
θ(s) Wind heading angle
Tamb(s) Ambient temperature
w(s) Wind speed
β(·) Wind effective angle of attack

xiv


vlim(s) Road speed limit
∆vtr(s) Traffic speed reduction
Paux(Tamb) Auxiliary power demand
Rb(SoC, Tb) Battery internal resistance
Pem,loss(Pem) Electric machine loss
Pb,max(SoC, Tb) Battery maximum power output
Pb,min(SoC, Tb) Battery minimum power output
Uoc(SoC, Tb) Battery open circuit voltage
H2,con(Pfc) Hydrogen consumption rate
J(·) Objective function to minimise in the nonlinear problem
Pb,loss(·) Battery heat loss
Pfc,heat(Pfc) Fuel cell heat loss
Pr(·) Probability function
f(·) Probability density function of the uncertainty
f(·) Continuous state dynamics
g(·) Nonlinear inequality constraints
kem(γ) Gear ratio of the motor
L(·) Cost function used in the MINLP
L(·) Cost function used in the MINLP
L(·)dr Cost function of driving used in the MINLP
L(·)lim Cost function of leaving the state constraint set used in the MINLP
L(·)tr Cost function of breaching the maximum allowed level of the fuel

cell’s transient used in the MINLP
L(·)rf Cost function of refuelling used in the MINLP
T (·) Terminal cost function used in the MINLP
Q(·) Bellman action-value function
V (·) Bellman value function
π Optimal policy from dynamic programming
δDirac(s) Dirac delta function
ηfc(Pfc) Fuel cell efficiency

xv


xvi


Contents

List of Acronyms ix

Nomenclature xi

List of Figures xix

List of Algorithms xxi

List of Tables xxiii

1 Introduction 1
1.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Main research questions . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Theory 5
2.1 Vehicle modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.1 Longitudinal dynamics . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Powertrain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Stochastic optimal control . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Mixed-integer optimisation . . . . . . . . . . . . . . . . . . . . 16

2.3 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Stochastic dynamic programming . . . . . . . . . . . . . . . . 17

3 Methods 19
3.1 Solution strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Discretisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.1 Space sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Data downsampling . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3 Problem formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.1 Top layer – stochastic dynamic programming . . . . . . . . . . 22
3.3.2 Bottom layer – non-linear programming . . . . . . . . . . . . 27

3.4 Simulation setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4.1 Test 1 – Robustness analysis . . . . . . . . . . . . . . . . . . . 29
3.4.2 Test 2 – Iterative method . . . . . . . . . . . . . . . . . . . . 31
3.4.3 Test 3 – Resolution analysis . . . . . . . . . . . . . . . . . . . 32

xvii


Contents

4 Results 33
4.1 Downsampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Test 1 – Robustness analysis . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Test 2 – Iterative method . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Test 3 – Resolution analysis . . . . . . . . . . . . . . . . . . . . . . . 42

5 Discussion 47
5.1 Downsampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Robustness Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Resolution analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6 Conclusion 51

Bibliography 53

A Algorithms used in solution and evaluation I
A.1 Speed preprocessing algorithm . . . . . . . . . . . . . . . . . . . . . . I
A.2 Heuristic algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . II

B Nonlinear problem formulation III

xviii


List of Figures

2.1 Illustration of wind’s relative speed. . . . . . . . . . . . . . . . . . . . 7
2.2 Transmission efficiency maps as a function of motor torque and RPM

and at optimal gear. . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Illustration of PEMFC’s principle of operation. [27], CC0 . . . . . . . 9
2.4 Hydrogen power and fuel cell efficiency, as functions of fuel cell power. 10
2.5 Normalised charge and discharge power bounds, as functions of bat-

tery temperature and state of charge. . . . . . . . . . . . . . . . . . . 12
2.6 (a): Normalised open circuit voltage and internal resistance, as func-

tions of battery temperature and SoC. . . . . . . . . . . . . . . . . . 13
2.7 Schemes representing the cases of the vehicle’s power flow. . . . . . . 14

3.1 Block diagram illustrating the hierarchical approach. . . . . . . . . . 20
3.2 Comparison of interpolation methods in discontinuous region. . . . . 25
3.3 Illustration of stochastic variables’ sampling. . . . . . . . . . . . . . . 26
3.4 Illustration of normal approximation of the stochastic power demand.

Probability density function of total power demand (v = 70 km/h,
α = 0, v′ = 0). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1 Comparison of equidistant and non-equidistant sampling for two dif-
ferent routes, with original resolutions of N = 15000 and N = 6000,
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2 exponential fit of MSE data for route one. . . . . . . . . . . . . . . . 34
4.3 H̄2 and SoC trajectories together with the power split for the Gothenburg–

Kiruna route with equidistant sampling with N = K = 1600. Sub-
figures (a)–(c) correspond to different numbers of quantised state
values in SDP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.4 H̄2 and SoC trajectories together with the power split for the Gothenburg–
Kiruna route with non-equidistant sampling with N = K = 1600.
Subfigures (a)–(c) correspond to different numbers of quantised state
values in SDP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

xix


List of Figures

xx


List of Algorithms

1 Speed preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . I
2 Heuristic station selection . . . . . . . . . . . . . . . . . . . . . . . . II

xxi


List of Algorithms

xxii


List of Tables

3.1 Station positions and hydrogen prices for the Zagreb–Paris route. . . 30
3.2 Station positions and hydrogen prices for the Gothenburg–Kiruna route. 31
3.3 Initial and terminal state constraints, stage quantisation and SDP

regeneration decision for simulations 1-12. . . . . . . . . . . . . . . . 32

4.1 Simulation results with different methods for the route Zagreb–Paris
with no uncertainty applied. . . . . . . . . . . . . . . . . . . . . . . . 35

4.2 Simulation results with different methods for the route Zagreb–Paris,
with uncertainty applied after the station selection. . . . . . . . . . . 36

4.3 Simulation results with different methods for the route Gothenburg–
Kiruna with no uncertainty applied. . . . . . . . . . . . . . . . . . . . 37

4.4 Simulation results with different methods for the route Gothenburg–
Kiruna, with uncertainty applied after the station selection. . . . . . 38

4.5 Results from simulations for route one, Zagreb–Paris, using the same
seed for every simulation. . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.6 Results from simulations for route two, Gothenburg–Kiruna, using
the same seed for every simulation. . . . . . . . . . . . . . . . . . . . 40

4.7 Results from simulations for route three, Gothenburg–Rødby, using
the same seed for every simulation. . . . . . . . . . . . . . . . . . . . 40

4.8 Results from simulations for route one, Zagreb–Paris, using a different
seed for every simulation. . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.9 Results from simulations for route two, Gothenburg–Kiruna, using a
different seed for every simulation. . . . . . . . . . . . . . . . . . . . . 41

4.10 Results from simulations for route three, Gothenburg–Rødby, using
a different seed for every simulation. . . . . . . . . . . . . . . . . . . 41

4.12 Simulation results with different SDP state quantisation and stage
resolution with non-equidistant stage sampling for the route Gothenburg–
Kiruna. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.11 Simulation results with different SDP state quantisation and stage
resolution with equidistant stage sampling for the route Gothenburg–
Kiruna. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

xxiii


List of Tables

xxiv


1
Introduction

Humanity has set ambitious goals for reducing worldwide carbon emissions to fight
climate change. As transport is responsible for about a quarter of the EU’s total
CO2 emissions, 27% of which come from heavy-duty trucks [1], the market actors
can not ignore this challenge. From a lifecycle perspective, most of the emissions
occur during the use phase of vehicles. Therefore, one of the ways that companies
can reduce emissions is to manufacture and encourage the use of vehicles running
on more sustainable sources of energy.

Today plenty of resources are spent on developing alternatives to the internal com-
bustion engine. An example of such is fuel cell electric vehicles. One challenge in
bringing these vehicles to the market is the lack of infrastructure required to charge
or refuel. To ensure the adoption of these new technologies in the commercial sector
additional services need to be provided that plan routes and suggest where to stop
to refuel, as such decisions might have unintuitive optimal solutions.

A common way of approaching the mission management task is viewing it as an op-
timisation problem and solving it using nonlinear model predictive control (NMPC)
[2]. Some of the previous work in this field make it evident that mission man-
agement for fuel cell electric vehicles using NMPC is not a trivial task for many
reasons. Firstly, problems arise from the specific behaviour of the fuel cell, which
makes it necessary to use an additional short-term energy storage system (ESS), in
other words, a battery. The ESS lets the vehicle brake regeneratively, increasing en-
ergy efficiency, and provides an additional power source during acceleration and hill
climbing, reducing power fluctuations for the fuel cell, which increases its lifetime
[3]. This, however, in turn leads to some complications. To fully utilise the ESS
the battery state of charge (SoC) must be used optimally. For instance, the battery
SoC should be sufficiently high before a large hill climb, and nearly empty before
a steep descent, in order to efficiently assist the fuel cell and recuperate otherwise
lost energy. Having two energy sources also complicates the optimisation task as
it increases the number of states in the vehicle, where one state tracks the level of
hydrogen in the tank and another tracks the battery’s charge. This in turn means
the controller needs to know the road topography ahead of time and plan accordingly.

Another complication arises from the fact that the task contains some binary deci-
sion variables, such as whether to stop at a hydrogen station to fuel or not. This
makes the optimisation problem mixed-integer which is computationally demanding
to approach using only NMPC as then it has to be solved for many combinations of

1


1. Introduction

the binary variables. In this project, a two-layer hierarchical approach is developed,
where the discrete decision variables are determined at the top level using stochastic
dynamic programming (SDP) leaving the residual problem to be solved at the lower
level as a smooth nonlinear program (NLP).

1.1 Previous work
One of the main works that this project is based on is that by Sundberg and Bragde
[4]. In their thesis, the same problem is approached only from the NMPC perspec-
tive, where some smoothening techniques are used to omit the integer part of the
objective function. Also, some complexity-reducing assumptions such as simplified
thermal dynamics of the battery were made in that work which will be challenged
in this work. Another example of using optimal control applied to the modelling of
FCEVs is [5] where a supervisory control is conducted to ensure an optimal power
split between the battery and the fuel cell. In [6] one can find how power split
optimisation is conducted with model predictive control (MPC).

The area of vehicle routing and planning has been studied for decades. Optimal ve-
hicle routing problems consider single or multiple vehicles travelling along a graph,
potentially loading/unloading goods or passengers at certain points and minimis-
ing a cost often associated with fuel, time, or more recently, environmental impact
[7]. For an overview, see the book by Toth and Vigo [8]. For the single vehicle
mission, eliminating the routing problem and instead considering a fixed route de-
creases model complexity, allowing instead for more complex vehicle modelling, e.g.,
hybrid vehicles where a powertrain split can be optimised [9]. Early work consider-
ing hybrid electric vehicles (HEVs), by Rajagopalan and Washington [10] and Lin
et al. [11], utilise GPS technology to optimise the power split between the inter-
nal combustion engine (ICE) and the electric machine (EM) for a short horizon
based on topography and traffic data. In papers by Kolmanovsky et al. [12] and
by Johannesson et al. [13], certain parameters are modelled as stochastic. In Sund-
berg and Bragde’s thesis, and in the paper by Johannesson et al. the problem is
discretised spatially, as space-dependent dynamics become easier to formulate than
time-dependent dynamics and maintain problem convexity. In her master thesis,
Katsagyri [14] samples the road at non-equidistant points to capture more detail in
the road characteristics and in doing so improves accuracy.

When it comes to thermal modelling of the battery, Hamednia et al. [15] discusses
optimal (battery) thermal management for battery electric vehicles (BEVs) from the
mission planning perspective in their paper. In it, they also separate driving and
charging dynamics into a hybrid dynamical system, where the charging dynamics
are modelled in a time domain, whereas the driving dynamics are modelled in a
spatial domain. Another example is a paper by Murashko et al. [16], in which they
describe the modelling thermal management system for the lithium-ion battery pack
in hybrid electric vehicles (HEVs).

Different hierarchical approaches have also been incorporated quite extensively in

2


1. Introduction

past works. One example is a paper by Johannesson et al. [17], in which the solution
is based on a 3-layer structure to approach the mixed-integer problem, where each
layer operates on different timescales with different update frequencies, prediction
horizons, and model abstractions. Another example of such an approach is described
in a paper by Ganesan et al. [18], where the lower layer is intended to solve the
mixed-integer problem of gear selection using receding horizon MPC with different
numerical relaxations.

Methods accounting for stochastic factors have also been used in quite a few works,
one example being a paper by Wang et al. [19], where optimal energy schedul-
ing for a hybrid fuel cell/PV/battery system in the presence of uncertain factors is
done based on stochastic dynamic programming. In their paper, Ritter et al. [20]
incorporate stochastic model predictive control for the energy management of HEVs.

1.2 Limitations
In order to reduce the complexity of the problem, no other vehicle dynamics than
the longitudinal ones are considered in this work, which means that only the forces
that influence a vehicle’s movement in the direction of travel are accounted for. For
the same reason, no fluid dynamics of hydrogen during refuelling is considered in
the work.

According to EU laws, truck drivers must take a 45-minute long break after 41
2

hours of driving at the latest, and must not drive for more than 9 hours a day,
including breaks, except for twice a week where it may be extended to 10 hours
[21]. This factor significantly complicates the solution of the mission management
problem, therefore an optimisation problem considering these constraints will be left
for future work.

1.3 Main research questions
Through implementing a hierarchical approach for handling the mixed-integer prob-
lem, the authors hope to answer the following research questions:

• Can splitting the mixed-integer problem into two layers, solving the integer
decisions in the higher layer, thus making the second layer a smooth non-linear
problem, yield a similar optimal solution as the MINLP problem?

• Further, can modelling certain dynamics as stochastic processes improve the
robustness of the refuelling station selection?

• Can non-equidistant stage sampling help reduce the computational heaviness
of the problem without significant losses in accuracy and optimality?

3


1. Introduction

4


2
Theory

This chapter details how the vehicle and its powertrain are modelled, detailing the
background to help understand the equations used in modelling.

2.1 Vehicle modelling
As previously stated only the longitudinal dynamics are considered as the truck is
driving, the problem can thus be considered 2-dimensional, one being the spatial
dimension and the other being the temporal dimension.

2.1.1 Longitudinal dynamics
The equation of motion of the truck in the longitudinal direction can be written as

mv̇ = FT − Frr − Fair − Fg − Fbr, (2.1)

where, on the left side, m is the mass of the truck and v̇ is the time derivative of
the truck’s speed. On the right-hand side, FT is the traction effort generated by the
EM, Frr is the sum of all rolling resistance forces acting on each tyre, Fair is the
aerodynamic drag force, and Fg is the gradient force due to the road slope. The
equation of motion is missing longitudinal slip losses, which appear when the wheel
rotation speed differs from the corresponding vehicle speed. Modelling slip losses
require tracking wheel rotation speed, which significantly increases the complexity
of the model, and as these losses are mostly relevant during acceleration and decel-
eration modelling them provides little increase in accuracy, especially in the mission
management context. A more detailed description of traction effort FT is done in
section 2.1.2 whereas the other forces are discussed here.

Rolling resistance

Rolling resistance is the resistive force that arises as a wheel is rolling on the surface
of the road. The deformation of the tyre during rolling is both elastic and viscous, so-
called viscoelastic, due to the properties of rubber material. In the elastic component
of the deformation, no permanent (plastic) deformations occur, which is, however,
not the case for the viscous component as this deformation can not be fully recovered
after unloading. During viscous deformation, friction between the molecules causes
a conversion of the mechanical energy transferred to the tyre into heat, hence leading

5


2. Theory

to energy losses [22]. Mathematically these losses are expressed as a rolling resistance
force that can be written as

Frr = CrrFz, (2.2)

where Crr is the rolling resistance coefficient of the tyres and Fz is the sum of all
the normal reaction forces of the road acting on each tyre. The rolling resistance
coefficient Crr is a measure of the force needed for a vehicle to maintain a constant
speed with zero road grade and zero air resistance per unit weight. The rolling
resistance coefficient is usually found empirically and is approximately equal to
0.0035 for a 40-tonne Volvo FH truck operating in the summer [23]. External factors
such as rain, snow and the road surface can have significant effects on the rolling
resistance coefficient, increasing it by upwards of 50% [24][25]. For a truck driving
on a road, the normal force is Fz = mg cos α, where α is the road grade. The rolling
resistance force can then be rewritten as

Frr = Crrmg cos α. (2.3)

Aerodynamic drag

Aerodynamic drag forces, Fair arise as an object moves through a fluid. The equation
is given by

Fair = 1
2ρairCdAfv2, (2.4)

where ρair is the density of air, v the vehicle’s longitudinal speed, and CdAf , often
combined as one parameter denoted drag area, is the product of the drag coefficient
and the effective frontal area of the vehicle respectively. Considering the dynamics
of wind speed and heading the equation can be expanded into

Fair = 1
2ρairCdAf (β)v2

rel, (2.5)

where the parameter CdAf is now a function of the effective angle of attack β and
the vehicle’s speed relative to the air around it, vrel, both of which are given by the
equations

β = arctan vrel,y

vrel,x

− φ− π (2.6)

vrel =
√

v2
rel,x + v2

rel,y, (2.7a)

vrel,x = w cos θ − v cos φ, (2.7b)

vrel,y = w sin θ − v sin φ. (2.7c)

The parameter w is the wind speed, φ is the vehicle’s heading angle and θ is the
wind heading angle, both relative to a set coordinate frame. An illustration of all
those speeds can be seen in Figure 2.1. The value of CdAf (β) has been determined
empirically for a range of common angles β.

6


2. Theory

Figure 2.1: Illustration of wind’s relative speed.

Gradient force

The part of the gravitational force acting in the direction of the vehicle’s heading is
given by

Fg = mg sin α, (2.8)
where a positive slope angle α, signifying an increase in altitude ahead, results in
a resistive gradient force and a negative slope angle results in an additive gradient
force.

Braking force

Braking of the vehicle is done with the help of either friction brakes or regenerative
brakes, however, in the context of this work, all the variables that have br as an
index, will refer to the friction brakes only. More about the regenerative brake force
will be discussed further in this chapter.

2.1.2 Powertrain
The vehicle considered in this thesis is a heavy-duty fuel-cell electric truck. A
fuel cell converts hydrogen gas to electric energy which, assisted by a lithium-ion
battery, drives an electric machine. The battery can be charged by the fuel cell or
by regenerative braking but cannot be charged by plug-in.

Electric machine

The EM converts electric energy into tractive effort, driving the vehicle in the desired
direction. The conversion from motor torque τem to traction force FT depends on
the gear ratio kem of the selected gear γ, the fixed gear ratio of the transmission
ktm, the transmission efficiency ηtm and the effective tyre radius R as such

FT = ηtmkem(γ)ktmτem

R
. (2.9)

The efficiency ηtm depends on τem and the motor RPM. An efficiency map can
be seen in Figure 2.2a. In turn, the RPM depends on the vehicle speed and the
gear selection. The field of optimising gear selection has been well studied [18].

7


2. Theory

Inherently it is an integer problem which brings complexity to the NLP layer. It is
subsequently assumed that a low-level controller performs gear selection to achieve
close to optimal efficiency. By doing so, the efficiency mapping can be made directly
from vehicle speed to traction effort, as in Figure 2.2b.

0.60.6
0.6 0.6

0.80.8

0
.8

0
.8

0.8 0.8

0.860.86

0
.8

6

0
.8

6

0.86 0.86

0.90.9

0
.9

0
.9

0.9 0.9

0.91

0
.9

1

0.91 0.91

0.92

0
.9

2

0.92 0.92

0.9
3

0.93

0
.9

3

0.93

0
.9

3

0.93

0.9
4

0
.9

4

0 500 1000 1500 2000 2500

-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0.6

0.65

0.7

0.75

0.8

0.85

0.9

(a) As a function of motor torque and
RPM. (b) At optimal gear.

Figure 2.2: Transmission efficiency maps as a function of motor torque and RPM
and at optimal gear.

The direct relationship between τem and motor power Pem is given by the angular
speed of the wheels as

τem = ηemPem

ωem

, (2.10)

which, under the assumption that there is zero slip between the wheels and the road,
is a function of the vehicle speed as such

τem = ηemPemR

kemktm(γ)v . (2.11)

Putting this into equation (2.9) shows the relation between motor input power Pem

from the DC-bus, vehicle speed v, and traction force FT to be

FT = ηtmηem
Pem

v
. (2.12)

For more convenience when solving the problem using numerical methods, the effi-
ciency can be represented by an EM loss term

FT = Pem − Pem,loss

v
, (2.13)

where Pem,loss = (1 − ηtmηem)Pem and Pem,loss > 0 for both positive and negative
motor powers.

8


2. Theory

Fuel cell

A fuel cell is an electrochemical system that is intended to convert the chemical
energy of a fuel (hydrogen in this case) and an oxidising agent (oxygen in this case)
into electricity through a pair of redox reactions. Polymer electrolyte membrane
fuel cells (PEMFC) utilise a membrane that is permeable for protons, but not for
the reactant gases and electrons [26]. Thanks to that property, this kind of fuel cell
is also known as a proton-exchange membrane fuel cell. As shown in Figure 2.3,
hydrogen enters the fuel cell at the anode and splits into protons and electrons. The
electrical chemical reaction that occurs at the anode is

H2 −−→ 2 H+ + 2 e−. (2.14)

The protons pass through the membrane while the electrons flow through an external
circuit, producing an electric current. After that, the oxygen entering the cathode
is combined with the resulting protons and electrons to produce water (either gas
or liquid) and heat. The reaction on the cathode’s side is thus

O2 + 4 H+ + 4 e− −−→ 2 H2O. (2.15)

Figure 2.3: Illustration of PEMFC’s principle of operation. [27], CC0

The entire reaction that occurs in the PEMFC is

2 H2 + O2 −−→ 2 H2O. (2.16)

In the context of the problem, knowing how a fuel cell’s hydrogen consumption and
efficiency vary depending on its power is important. According to [28], hydrogen

9


2. Theory

consumption can be approximated as a quadratic function of the fuel cell’s electrical
power Pfc as follows

H2,con(Pfc) = −(a2P
2
fc + a1Pfc + a0), (2.17)

where a0, a1, a2 ≥ 0. Hydrogen power can then be written as

PH2(Pfc) = H2,con(Pfc)∆H = −(a2P
2
fc + a1Pfc + a0)∆H, (2.18)

where ∆H is hydrogen’s lower heating value.

The efficiency of a fuel cell represents the share of chemical (hydrogen) energy that
is converted into electrical energy. The heat that is emitted during the chemical
reaction is considered to be a loss and most of it is removed by the cooling system.
Thus, fuel cell efficiency can be written as

ηfc(Pfc) = Pfc

|PH2(Pfc)|
= Pfc

(a2P 2
fc + a1Pfc + a0)∆H

. (2.19)

The fuel cell hydrogen power and efficiency curves can be seen in Figures 2.4a and
2.4b, respectively.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

-1

-0.9

-0.8

-0.7

-0.6

-0.5

-0.4

-0.3

-0.2

-0.1

0

(a) Hydrogen power.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

(b) Fuel cell efficiency.

Figure 2.4: Hydrogen power and fuel cell efficiency, as functions of fuel cell power.

Accounting for the hydrogen consumption by the fuel cell and the amount refuelled
at hydrogen stations, the differential equation of the hydrogen level in the tank can
be written as follows

Ḣ2 =


H2,con, H2,rf > 0

H2,con + δst
CH2

trf

, H2,rf = 0 , (2.20)

which after substitution becomes

Ḣ2 =


−(a2P

2
fc + a1Pfc + a0), H2,rf > 0

−(a2P
2
fc + a1Pfc + a0) + δst

CH2

trf

, H2,rf = 0 , (2.21)

10


2. Theory

where H2,rf is the amount of hydrogen refuelled at a hydrogen station, CH2 is the
capacity of the vehicle’s hydrogen tank, trf is the time needed to refuel the hydrogen
tank to the fullest, δst is a binary variable identifying the presence of a hydrogen
station nearby at the current time. The refuelling time of FCEVs is usually below 15
minutes [29], which makes it reasonable to neglect the time dynamics of refuelling
and consider the refuelling rate constant.

It is important to note that operation with low fuel cell power Pfc should be avoided
as it leads to degradation of the catalyst in the fuel cell [30]. At the same time,
operating at high power levels is also detrimental to the fuel cell as it may incur fuel
starvation, which degrades both the catalyst and gas diffusion layers in the fuel cell
through carbon oxidation [31]. Constraining the fuel cell to work within a desired
operating region can thus lengthen its lifespan.

Since the fuel cell must operate in a very narrow temperature range around 70 ◦C,
the heat generated during its operation must be systematically removed [32]. The
heat generated by the fuel cell system can be approximated as

Pfc,heat(Pfc) = c2P
2
fc + c1Pfc + c0 (2.22)

where c0, c1, c2 ≥ 0.

Battery

The battery model is comprised of lithium-titanium-oxide (LTO) cells. These offer
a faster charge/discharge cycle and a longer life cycle than other lithium-based
batteries at the cost of lower terminal voltage [33]. Connecting several of these cells
in parallel yields a battery pack with a capacity Cb. The normalised battery charge,
the so-called SoC is defined as the relation between the current battery charge q and
its capacity

SoC = q

Cb

. (2.23)

The SoC’s dynamics can then be described as

˙SoC = Ib

Cb

, (2.24)

where Ib is the battery current. As the power of any circuit equals voltage multiplied
by the current in it, the relation between the normalised battery discharge rate ˙SoC
and the output power of the battery Pb is given by

˙SoC = −Pb

CbUoc(SoC, Tb)
, (2.25)

where the battery open circuit voltage Uoc is a function of SoC and the bat-
tery’s internal temperature Tb that is shown in Figure 2.6a, and Pb is the bat-
tery power that lies within an interval that is determined by SoC and Tb: Pb ∈
[Pb,min(SoC, Tb), Pb,max(SoC, Tb)]. Pb is positive when the battery is discharging,
and negative when charging. The upper bounds on the battery’s discharge and

11


2. Theory

charge powers as functions of battery temperature and SoC are shown in Figures
2.5a and 2.5b, respectively.

(a) Discharge power bound. (b) Charge power bound.

Figure 2.5: Normalised charge and discharge power bounds, as functions of battery
temperature and state of charge.

Battery temperature can be modelled with the help of the heat balance equation

Ṫb = Qpass + Qact + Qamb

cbmb

(2.26)

where cb and mb are the battery’s specific heat capacity and total mass, respectively,
Qpass is the rate of the heat generated by passive heat sources affecting the battery’s
temperature, Qact is the active heat rate supplied to or removed from the battery by
external auxiliary components that are intended to control battery’s temperature,
and Qamb is the rate of the heat dissipated into the ambient air. It is worth noting
that the incorporated model assumes a uniform distribution of temperature along
the battery, which substantially reduces the complexity of the problem.

Passive heat ultimately consists of the heat losses generated due to internal re-
sistance in the battery. These losses can be modelled according to Joule’s law as

Qpass = Rb(SoC, Tb)I2
b = Rb(SoC, Tb)

P 2
b

U2
oc(SoC, Tb)

, (2.27)

where Rb is the battery’s internal resistance which significantly decreases at higher
battery temperatures as one can see in Figure 2.6b.

12


2. Theory

(a) Open circuit voltage. (b) Internal resistance.

Figure 2.6: (a): Normalised open circuit voltage and internal resistance, as func-
tions of battery temperature and SoC.

Active heat generation rate can be defined as

Qact = ηthPth, (2.28)

where Pth is the power of the battery thermal system and ηth is its efficiency. Pth

is positive when heat is supplied to and negative when heat is removed from the
battery.

Heat dissipated into the ambient air can be modelled using Fourier’s law of heat
conduction

Qamb = kamb(Tamb − Tb), (2.29)
where Tamb is the temperature of the ambient air and kamb > 0 is the coefficient of
heat exchange between the battery’s case and air, which normally is dependent on
the vehicle’s speed but in this work will be chosen as constant for simplicity.

The battery modelling is based on [15] and [34], so for more detailed derivations
check out those.

Power balance equations

The fuel cell and battery supply power to the DC Bus, which then passes it to the
EM, battery and fuel cell thermal systems and other auxiliary loads. The battery
can be charged either through regenerative braking or by the fuel cell. The battery
cannot be charged and discharged at the same time. There are 4 different cases of
power flow in the vehicle:

(a) Both the battery and the fuel cell provide power to the EM and for the aux-
iliaries and thermal management needs as in Figure 2.7a.

(b) The vehicle is braking regeneratively but all the power goes to the auxiliaries
and thermal management needs as in Figure 2.7b. This case is not realistic as
the fuel cell’s minimum operating power is always larger than the auxiliaries
and thermal management needs.

13


2. Theory

(c) The battery is charging from the fuel cell as in Figure 2.7c.
(d) The battery is charging from the regenerative brakes as in Figure 2.7d.

(a) Both the battery and the fuel cell
act as power sources.

(b) All the power goes to the
auxiliaries and thermal management.

(c) Battery charging from the fuel cell.
(d) Battery charging from the

regenerative brakes.

Figure 2.7: Schemes representing the cases of the vehicle’s power flow.

The power balance equation can be written as

Pem = Pfc + Pb − Pfc,heat − |Pth| −Qpass − Paux, (2.30)

where Paux is the auxiliary power mainly used for air conditioning in the cabin.
Combining this with equations (2.13) and (2.30) gives an equation for traction effort

FT = 1
v

(Pfc + Pb − Pfc,heat − |Pth| −Qpass − Paux − Pem,loss). (2.31)

2.2 Optimal control
A common way of describing a system with nonlinear dynamics in continuous time
is by using a system of differential equations of the form

ẋ(t) = f(t, x(t), u(t)), (2.32)

where t is time, x is the state vector and u is the (control) input vector. The
state vector contains, at minimum, all the information necessary to unambiguously
describe the current state of the system. The input vector represents an external
impact coming from an actuator that is intended to affect the state of the system.

14


2. Theory

Optimal control aims to find an optimal control input trajectory u∗(t) and a corre-
sponding optimal state trajectory x∗(t) that minimises objective, or also called cost,
function J . Usually, the actuator is limited by some region of operation, control
constraint set U ∈ Rm, which the control trajectory must fulfil. The same applies
to the state vector which must lie within a constraint set X ∈ Rn. In addition,
if the system is analysed in a set time interval [t0, tN ], there will also appear an
initial state x(t0) = x0 and a terminal set Xf which x(tN) must belong to. Those
control trajectories, which meet the control and state constraints and lead the state
trajectory to the target set Xf are called feasible. It is worth mentioning that the
control constraint set U largely depends on the initial state x0, but also that it is
generally not fixed as it may vary with time t and the state value x. The same
goes for the state constraints X as they can also be time-dependent. All the afore-
mentioned constraints can be expressed as general inequality constraints of the form
g(t, x(t), u(t)) ≤ 0. So finally we can mathematically formulate the continuous-time
optimal control problem as

minimizex, u J(t, x(t), u(t)) (2.33a)

subject to ẋ(t) = f(t, x(t), u(t)), (2.33b)
x(t0) = x0, (2.33c)

g(t, x(t), u(t)) ≤ 0, (2.33d)

Optimal control problems are typically solved numerically with the help of compu-
tational software, therefore the dynamics described in (2.32) must be discretised,
which can be done by solving the initial value problem (IVP) described in (2.34) us-
ing integration methods such as Euler or other higher order Runge-Kutta methods.
After discretisation, the continuous objective function, dynamics and constraints
need to be evaluated at N + 1 time instances t = t0, t1, ..., tN only.

x(tn) = xn, ẋ = f(t, x(t), u(t)), for t ∈ [tn, tn+1] (2.34)

It is assumed in this work that u is a piece-wise constant vector, that is

u(t) = u(tn), ∀ t ∈ [tn, tn+1). (2.35)

2.2.1 Stochastic optimal control

Commonly, system dynamics are modelled to contain parameters representing some
sort of uncertainty. These can stem from external factors or internal mismatches in
the modelled and actual dynamics. In the presence of disturbance with uncertain
behaviour, here denoted w, the aim is to find the control input trajectory u∗(t) that
minimises the expected value of the cost function.

15


2. Theory

minimizex, u E [J(t, x(t), u(t))] (2.36a)

subject to ẋ(t) = f(t, x(t), u(t), w(t)), (2.36b)
x(0) = x0, (2.36c)

g(t, x(t), u(t)) ≤ 0, (2.36d)
w(t) ∼ fw(w(t)), (2.36e)

where fw is the probability density function (pdf) of the uncertainty.

2.2.2 Mixed-integer optimisation
If the problem described in equation (2.36) has one or several variables that are
restricted to be an integer, for instance, ui ∈ Ui ⊂ Z, the cost function becomes
non-smooth and non-differentiable, forcing one to solve a separate optimal control
problem for many combinations of the discrete variables, which is, of course, very
computationally demanding. However, dynamic programming is one of the algo-
rithms that can handle such problems without any increase in the computational
burden and more about it is written in the next subsection.

2.3 Dynamic programming
Invented by Richard Bellman, dynamic programming (DP) has been used as an
optimisation method since the 1950s and, naturally, has also found application in
optimal control [35, 36]. As with optimal control, its purpose is to find a feasible
optimal control trajectory u∗(t) for any initial state x0 over the time interval [t0, tN ].
As there are infinite time instances in the considered time interval, discretising the
system is necessary, as mentioned in section 2.2.

By discretising dynamical system for K+1 time instances t in [t0, tK ] the problem can
be considered to be split into K subproblems, where for problem k = 0, . . . , K − 1
the initial state x0 = xk is constrained by X(tk), likewise, for the terminal state
xN = xk+1 ∈ X(tk+1), and the optimal input u∗

k minimises the cost function J(t, x, u)
over the interval [tk, tk+1]. This requires that the cost function J can also be split,
such that over a time interval [tk, tk+1], J only depends on the input u and state x
trajectories within that interval. In more abstract terms, the cost at time tk can be
expressed as the cost of getting from xk to xk+1 and the cost associated with getting
from xk+1 to xK .

If it is possible splitting the problem in such a way the necessary conditions for
DP are fulfilled, and an action-value equation can be introduced, defined by the
Bellman equation

Q(t, x, u) = L(t, x, u) + V (t+, x+), (2.37)
where the first term L is commonly referred to as the stage cost, that is the cost
associated with taking the action u at time t in state x and V is the optimal cost of

16


2. Theory

the remaining subproblems, meaning V is the optimal solution to the action-value
function Q at the next time instance t+ and the resulting state values x+ from taking
control action u, expressed as

V (t, x) = min
u

Q(t, x, u). (2.38)

The control input u that minimises the action-value function at time t in state x is
referred to as the optimal policy, given by

π(t, x) = argmin
u

Q(t, x, u), (2.39)

which, when combined with equation (2.38) gives the Bellman equation for the value
function as

V (t, x) = L(t, x, π(t, x)) + V (t+, x+), (2.40)
expressing the value in state x at time t.

The purpose of DP is to consecutively determine the value function and the pol-
icy for all states and for each time step to then save this policy as a look-up table
allowing one to determine the optimal input trajectory u∗(t) for any given initial
value x(t0) by iteratively looking up the policy π(x) and integrating the dynamics
to attain x+. Noting how the action-value function depends on the value for future
actions but is independent from actions take at previous time instances one can
conclude that the policy must be determined recursively, starting at tK and step-
ping backwards in time, commonly known as backward induction. The backward
induction is initialized by defining the value at the final time instance tK as

V (tK , x) = T (x), (2.41)

where T is common notation for the terminal cost function. Thus, the cost function
J has been split according to

J(t, x(t), u(t)) =
K−1∑
i=1

L(ti, x(ti), u(ti)) + T (x(tK)). (2.42)

2.3.1 Stochastic dynamic programming
In the presence of external stochastic factors in the system, the model dynamics
become uncertain such that the continuous system dynamics become

ẋ(t) = f(t, x(t), u(t), w(t)), (2.43)

thus making x+ and therefore V (t+, x+) stochastic variables in the value function
in equation (2.40), which require additional measures to be taken in the approach
described in the previous section. In order to still fulfill the criteria for DP the un-
certainty w of the system must have the Markov property, meaning the uncertainty
of the system at a certain time instance can only depend on the state x and input
u at that time instance. This makes the optimal control problem a Markov decision
process, and using DP to solve it is referred to as stochastic dynamic programming

17


2. Theory

(SDP)[37].

Since the value V (x) is now stochastic, its values in the Bellman equations (2.37)
and (2.40) are represented by the expected values

Q(t, x, u) = L(t, x, u) + E
[
V (t+, x+) | t, x, u

]
. (2.44a)

V (t, x) = L(t, x, π(x)) + E
[
V (t+, x+) | t, x, π

]
, (2.44b)

To calculate the expected value it is convenient to introduce a probability density
function that would describe the conditional probability that the system’s next state
value will be x+ given that the system currently is at state x, the action u is taken
and the stochastic uncertainty w affects the system, at time instance t. Written
mathematically, that would be fx+| t,x,u(x+), which together with the definition of
mathematical expected value can be used to write the Bellman equations for the
stochastic case as

Q(t, x, u) = L(t, x, u) +
∫ ∞

-∞
V (t+, x+)fx+| t,x,u(x+)dx+, (2.45a)

V (t, x) = L(t, x, π(x)) +
∫ ∞

-∞
V (t+, x+)fx+| t,x,u(x+)dx+. (2.45b)

18


3
Methods

This chapter details how the hierarchical strategy was set up, how the route was
sampled, how cost functions were modelled and how the strategy evaluation was set
up.

3.1 Solution strategy
Before going to concrete mathematical formulations of the problems to solve, it is
worth describing the general strategy that was used to solve the stated problem of
mission management.

As mentioned before, the problem contains binary variables corresponding to deci-
sions to stop at hydrogen stations along the route. These decision variables make the
objective function discontinuous and, therefore non-differentiable and unapproach-
able as an NLP. Thus it becomes a mixed-integer nonlinear program (MINLP) that
has a combinatorial nature and is very computationally expensive to solve as the
solution lies in solving a separate NLP problem for many combinations of the binary
variables as stated in 2.2.2.

In this context, the work in this thesis seeks to investigate the benefits and draw-
backs of an approach with the problem decomposed into two parts. The first part,
the so-called top layer, was to solve the MINLP problem with a method that can
handle binary variables, in this case SDP. Since dynamic programming explores ev-
ery possible combination of states and inputs its computational complexity increases
exponentially with the number of states and inputs, as well as stochastic variables.
Due to this the mathematical model used in this layer had to undergo some sim-
plifications to account for that. One of the main simplifications was the exclusion
of the thermal management of the battery from the scope of the optimisation, thus
making battery temperature a fixed parameter for the entire route. As illustrated
in Figure 3.1, the layer takes mission and vehicle parameters as input and outputs
the optimal station selection, eliminating this decision from the second part, the
so-called bottom layer.

Since the binary decisions are taken the problem is no longer mixed-integer and
a global NLP optimisation can be conducted with more accurate and comprehen-
sive dynamics, more inputs and more states. This layer outputs the final speed,
battery temperature and power split trajectories. Unlike the top layer, the bottom

19


3. Methods

layer does not account for any uncertain parameters.

Figure 3.1: Block diagram illustrating the hierarchical approach.

It is essential to note that in order to reduce the number of states, speed was not
optimised in the top layer and a predefined base speed profile was calculated instead.
Therefore, a feedback loop with an optimised speed profile from the bottom layer
to the top layer was implemented as an option. This feedback is illustrated with a
dashed line in Figure 3.1.

3.2 Discretisation
As mentioned in section 2.2, the problem needs to be discretised to be solved nu-
merically by a computer. The problem is split into a number of stages, K for the
top layer, and N for the bottom layer, for which an optimisation problem is solved
separately. The route thus has to be divided into K and N road segments respec-
tively, over which the dynamics are integrated in the solving process. This section
describes how the problem is discretised and how the data is sampled to reach the
desired number of data points.

3.2.1 Space sampling
Since the positions of hydrogen stations are fixed along the route, it poses a more
convenient solution to solve the problem in the spatial domain, meaning that all the
time derivatives written in section 2.1 have to be rewritten as spatial ones. For the
rest of this paper, the notation x′ will represent the space derivative of an arbitrary
variable x, i.e. x′ = dx/ds, where s is the vehicle’s position along the route. The
transition from the time derivative ẋ to the space domain is done using the equation

x′ = dx

ds
= dx

dt

dt

ds
= ẋ

v
. (3.1)

The differential equations (2.1), (2.21), (2.25) and (2.26) thus become

mv′ = 1
v

(FT − Frr − Fair − Fg − Fbr), (3.2a)

H′
2 = −1

v
(a2P

2
fc + a1Pfc + a0) + δstδDirac(s)H2,rf , (3.2b)

20


3. Methods

SoC ′ = −Pb

vCbUoc(SoC, Tb)
, (3.2c)

T ′
b = Qpass + Qact + Qamb

vcbmb

. (3.2d)

δDirac(s) in (3.2b) is the Dirac delta function needed to demonstrate that refuelling
happens instantly in the space domain. Spatial sampling will be used for both layers
in the control hierarchy. The Bellman equations for stochastic dynamic program-
ming, given in (2.44), will also have to be rewritten in the space domain

Q(s, x, u) = L(s, x, u) + E
[
V (s+, x+) | s, x, u

]
. (3.3a)

V (s, x) = L(s, x, π(x)) + E
[
V (s+, x+) | s, x, π

]
, (3.3b)

3.2.2 Data downsampling
Originally, road data parameters such as position, altitude, speed limits and weather
are provided at a much higher resolution than the desired number of stages in the
solver. Hence, the data must be downsampled to K and N data points, a process
that can have a significant effect on the simulation’s accuracy. Unless the original
number of data points is evenly divisible by the desired data points the data can
not be downsampled into segments of equal distance, so-called equidistantly, with-
out some sort of mathematical interpolation. The process of interpolating and then
downsampling data introduces data points that were originally not part of the data
set which risks decreasing the accuracy of the data.

Additionally, different road segments affect the dynamics differently. A long flat
road segment requires only two data points to be accurately described, while a
hilly segment of the same length requires more data points to be described with
equal accuracy. The parameter with the largest effect on dynamics is altitude. By
downsampling the road data using an algorithm that preserves the highest detail
in the altitude profile, while not introducing interpolated data points, the simula-
tion should in theory become more accurate with the same number of data points N .

Such an algorithm would be computationally intensive, however, an algorithm aim-
ing to achieve close to optimal results can have a much lower computation time at
the trade-off of optimality. One created by the authors of [4] was used to downsam-
ple data non-equidistantly to preserve the detail in altitude with the intention of
decreasing computation time. In essence, the algorithm splits the data into windows
and eliminates redundant data points. A way to evaluate the detail retention is to
upsample to the original resolution using first-order hold and calculate the mean
squared error (MSE). Another proposed metric is the cumulative error in energy
consumption when simulating a truck driving the downsampled routes compared to
the original route.

A comparison of equidistant and non-equidistant downsampling was done for two
routes. For the first route, of length 1793 km, the data was downsampled to

21


3. Methods

N = 100, . . . , 1000 stages, while for the second route, of length 461 km, the data
was downsampled to N = 100, . . . , 500 stages. To calculate the energy consumption
a speed profile was generated by generating a noisy speed profile around 70 km/h
and integrating equation (3.2a) using explicit Euler.

Additional parameters such as speed limits and station positions were considered,
but ultimately no changes were made to the algorithm. This means that currently
speed limit changes and hydrogen stations are positioned at the longitudinally clos-
est data point post-downsampling.

3.3 Problem formulations
In this section the models and cost functions of the two layers are detailed.

3.3.1 Top layer – stochastic dynamic programming
The purpose of the top layer is to select stations from the available ones at which to
stop and fuel during the mission and decide how much hydrogen should be refuelled.

Mathematical model

States and inputs of the system will thus be

x(s) =
[
SoC

H̄2

]
, u(s) =

[
Pfc

H̄2,rf

]
, (3.4)

where H̄2 = H2/CH2 is the normalised hydrogen level in the tank and H̄2,rf =
H2,rf/CH2 is the normalised amount of hydrogen refuelled at a hydrogen station.
Note that if H̄2,rf = 0, the vehicle should pass the station without refuelling, which
implies that this input doubles as a binary variable.

As vehicle speed is neither a state nor an input it must be deterministic, or mod-
elled as a stochastic variable. Preprocessing of vehicle and road data, such as speed
limits, power output limits, road grade, desired arrival time and maximum risk of
failure is done to produce a predefined base speed profile. The algorithm for this
can be found in Appendix A.1.

The dynamics used to find x+ in the SPD algorithm in equation (3.3) start with the
power input to the EM, given by combining equations (3.2a) and (2.12)

Pem = ηtmηemv (v′mv + Frr + Fg + Fair) . (3.5)

As road parameters are known, as well as a predefined speed trajectory, the input
power to the EM can be determined and is henceforth referred to as the power de-
mand, as it governs how much power needs to be output by the DC bus to meet the
demand from the machine to fullfil the differential equations. One can notice that

22


3. Methods

Pem depends on several external factors, such as road grade, traffic speed, wind di-
rection and speed and rolling resistance which all display a random behaviour in real
life, or are impossible to predict flawlessly hours in advance. This makes modelling
Pem as a single stochastic variable a suitable choice, to represent the uncertainty
while reducing the computational complexity. In short, a number of vehicle and
mission-specific parameters determine the shape of the probability distribution of
Pem, which differs for each spatial step. A more detailed description of Pem mod-
elling is done later in this section.

Assuming constant battery temperature, the vehicle’s power balance (??) can be
simplified to

Pfc + Pb = Pfc,heat + Paux + Pb,loss + Pem + Pem,loss, (3.6)

where, as battery temperature is outside of the optimisation scope in this layer,
battery losses are modelled as a quadratic fit independent of the battery temperature

Pb,loss = (b0 − b1SoC)P 2
b , b0, b1 ≥ 0. (3.7)

This makes equation (3.6) quadratic in Pb and can for a given Pem and a selected Pfc

be solved to find Pb. If, at a certain spatial step, no solution exists, or the solution
exceeds the state of charge-dependent power output limit Pb,max(SoC) the power
demand can not be met for the given Pfc and it is not considered for the policy
π. If the solution exceeds the regenerative charging power limit Pb,min(SoC), or the
power needed to recharge the battery fully, given from inverting (3.9), another term
is added to the power balance, namely the braking power Pbr, to allow for a solution
to be found, given the braking power is minimised. Equation (3.6) then becomes

Pfc + Pb − Pbr = Pfc,heat + Paux + Pb,loss + Pem + Pem,loss. (3.8)

It should be noted that friction brakes Pbr are applied independently of the EM and
do not induce additional losses in the EM, thus this representation is not perfectly
accurate. Finally, after Pb has been calculated, the next state values x+ can be
found by integrating differential equations

x′ =
[
SoC ′

H̄′
2

]
=


− Pb

vCbUoc(SoC)
− 1

vCH2

(
a2P

2
fc + a1Pfc + a0

)
+ δstδDiracH̄2,rf

 . (3.9)

Objective function

The dynamics in section 3.3.1 return three outputs: the battery power consumption
Pb, the braking power required Pbr, and a binary variable δπ indicating whether the
input should be considered for the policy or not, determined by its feasibility, as
mentioned above.

y(s) =

 Pb

Pbr

δπ

 (3.10)

23


3. Methods

Additionally, a separate policy is determined for each of the control inputs, as

π(s, x(s)) =
[

P ∗
fc

H̄∗
2,rf

]
. (3.11)

The terminal cost function is defined as

T (x(sN)) = x⊤
e εexe + εC , xe = x(sN)− xtarget, (3.12)

where
εe =

[
εe,SoC 0

0 εe,H2

]
(3.13)

are the costs of deviating from the desired SoC and H2 targets and εC is the cost
of mission completion. The cost at any other spatial point is the sum of the cost
of driving and refuelling, as long as the risk of failure is below or equal our risk
tolerance κ, as such

L(s) =

Ldr(s, x(s), u(s)) + Lrf (s, x(s), u(s)), Pr (δπ(s) | u(s)) ≤ κ

∞, Pr (δπ(s) | u(s)) > κ
, (3.14)

The cost function of driving is in turn the sum of three costs

Ldr(s, x(s), u(s)) = Llim(x(s)) + Ltr(x(s), u(s)) + Lbr(y(s)), (3.15)

where the cost of leaving the state constraint set is

Llim(x(s)) = Llim,SoC(x(s)) + Llim,H2(x(s)), (3.16)

Llim,xi
(x(s)) =



εllim,xi

xi,min − xi(s)
xi,min

, xi(s) < xi,min

εulim,xi

xi(s)− xi,max

xi,max

, xi(s) > xi,max

εfail, xi(s) /∈ Xi =
0, otherwise

, (3.17)

where εulim,xi
and εllim,xi

are the costs of breaching a state’s upper and lower bounds
{xi,min, xi,max} ⊆ Xi, respectively, and εfail is the cost of failing the mission by
exiting the feasible state space Xi, the cost of breaching the maximum allowed level
of the fuel cell’s transient is

Ltr(x(s), u(s)) =

εtr, |P ∗
fc(s+, x+(s))− Pfc(s)| > P ′

fc,max∆s(s)
0, otherwise

, (3.18)

where εtr is the cost of exceeding the fuel cell’s transient’s maximum allowed level,
and the cost of using friction brakes

Lbr(y(s)) = εbr
Pbr(s)∆s(s)
Fbr,maxv(s) , (3.19)

24


3. Methods

where εbr is the cost of friction braking. The cost function of refuelling is calculated
as follows

Lr(s, x(s), u(s)) =

δst(s)
(
εfuel(s)H̄2,rf (s)CH2 + εstop(s)

)
H̄2,rf (s) > 0

0, H̄2,rf (s) = 0
, (3.20)

where εfuel and εstop are the costs of fuel (hydrogen) and stopping, respectively.

Since the output y(s) is stochastic, making L(s) stochastic, both terms in the action-
value function in equation (2.44a) are stochastic and the action-value function is
calculated as

Q(s, x(s), u(s)) = E
[
L(s, x(s), u(s)) + V (s+, x+(s)) | s, x(s), u(s)

]
. (3.21)

The value function and optimal policy for each input is saved at every stage as

V [k, x, u] = min
u

Q(s[k], x, u) (3.22a)

P ∗
fc[k, x] = argmin

Pfc

Q(s[k], x, u), k = 1, . . . , K (3.22b)

H̄∗
2,rf [k, x] = argmin

H̄2,rf

Q(s[k], x, u), δst(s[k]) = 1 (3.22c)

As these are only saved for a discrete set of stages, states and inputs linear inter-
polation is done to calculate the value and policy at any point in space and for any
state and input values with the exception of refuelling decisions H̄2,rf . As this input
has an inherent binary interpretation its value is often discontinuous with regard
to state values. Hence, it is important to identify discontinuities and interpolate
using a different method around these. An example of this is shown in Figure 3.2
where an estimation of the discontinuity is illustrated and the difference between
linear and nearest interpolation is shown. An intuitive explanation of this is that
the truck will either pass the station or stop and refuel enough to finish the mission
without stopping again. Refuelling less than the policy causes the truck to need
another stop later on, which is not cost-effective.

Figure 3.2: Comparison of interpolation methods in discontinuous region.

25


3. Methods

Stochastic variables

Implementing dynamic programming means quantising states, inputs and distur-
bances in a range of interest as xi = x1

i , . . . , x
Mxi
i ∈ Xi where Mxi

represents the
number of discrete values at which to determine a policy for state xi. Likewise,
Mui

denotes the number of inputs to consider for the policy of input ui, and Mwi

the number of samples from the random distribution of wi to perturb the system
with. The number of calculations to be made for each spatial sample can thus be
calculated by ∏

j Mj. Because of this, the number of states and inputs is already
kept to a minimum; likewise, this is desired for the number of disturbances. Hence,
it is of interest to approximate the disturbances with a single variable and associated
probability density function.

The variables that were modelled as stochastic were wind speed w, wind angle
θ, rolling resistance coefficient Crr due to road conditions, mass m, and speed re-
duction due to traffic ∆vtr. As all of these eventually affect the power demand Pem

it was chosen as the variable which the single disturbance should affect.

The vehicle dynamics are not linear in all stochastic variables, nor are all of the
variables modelled as Gaussian distributions, hence it is not possible to implicitly
derive a density function for a desired combined variable. However, by sampling
each stochastic variable as shown in Figure 3.3 and calculating the power demand
for the sampled values, akin to a Monte Carlo method, it is possible to determine a
combined probability distribution from the resulting data points. The distribution
could then be approximated using a normal distribution as shown in Figure 3.4,
allowing it to be represented mathematically using only two parameters, a mean
and a variance.

0 10 20 30
0

0.02

0.04

0.06

0.08

0.1

39.5 40 40.5
0

0.5

1

1.5
10

-3

0.0035 0.004 0.0045
0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50
0

0.02

0.04

0.06

0.08
probability

sampled points

Figure 3.3: Illustration of stochastic variables’ sampling.

26


3. Methods

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.002

0.004

0.006

0.008

0.01

0.012

simulation

normal fit

base demand

Figure 3.4: Illustration of normal approximation of the stochastic power demand.
Probability density function of total power demand (v = 70 km/h, α = 0, v′ = 0).

As some stochastic variables were modelled as a distribution around a given base
value, i.e. mass was normally distributed around a known approximate vehicle mass
m and resulting speed was the product of the base speed v and the speed reduction
from traffic, the resulting power demand’s mean µPem and standard deviation σPem

become a function of some base values, as well as some spatially varying parameters
such as road grade and acceleration, which have a large effect on the power demand.

µPEM
(s) = µPEM

(Crr0, m, v(s), v′(s), α(s)) (3.23a)

σPEM
(s) = σPEM

(Crr0, m, v(s), v′(s), α(s)) (3.23b)

A joint probability distribution for vehicle speed given power demand was also de-
termined, with covariance 1, as the speed is used in the integration step to calculate
the total power consumption, see equation (3.9). Consequently, the distribution of
energy consumption varies less than that of power demand, as outliers in power
demand correlate with a decreased speed, and a change in speed entails an implicit
change in the step size in the Euler integration.

3.3.2 Bottom layer – non-linear programming

Mathematical model

After refuelling decisions have been made at the upper level and the problem is no
longer mixed-integer, an NLP with more accurate and comprehensive dynamics can
be formulated. The states and inputs of the system in discrete distance space thus

27


3. Methods

become

x[n] =



SoC

H̄2
t
v2

Tb

Ffc[n− 1]


, u[n] =



sH2

sSoC

st

Ffc

Fb

Fbr

Pth


, (3.24)

where sH2 , sSoC, st are slack variables that are used for bound relaxation for H2, SoC, t,
respectively.

Using distance sampling substantially increases the complexity of the state dynam-
ics, especially for the vehicle’s speed. This complexity can be partly eliminated by
using v2 instead of v as a state representing the vehicle’s speed and that is thanks
to the following property

dv2

ds
= 2v

dv

ds
= 2v

dv

dt

dt

ds
= 2v̇. (3.25)

In order to make the equation of speed dynamics linear in control inputs, some
power control inputs have been replaced by their force counterparts. The relation
between the two can be described as

Ffc = Pfc

v
, (3.26a)

Fb = Pb

v
, (3.26b)

Fbr = Pbr

v
. (3.26c)

Using the mathematical relations written before and accounting for some transfor-
mation made specifically for this layer, the state dynamics become

SoC ′ = −Fb

CbUoc(SoC, Tb)
, (3.27a)

H̄′
2 = − 1√

v2CH2

(
a2F

2
fcv

2 + a1Ffc

√
v2 + a0

)
+ δstδDiracH̄2,rf , (3.27b)

t′ = 1√
v2

+ δstδDirac · (trf + tq), (3.27c)

(v2)′ = 2
m

(
Ffc + Fb − Fbr−

− Qpass + PEM,loss + |Pth|+ Pfc,heat + Paux√
v2

− Frr − Fair − Fg

)
, (3.27d)

T ′
b = Qpass + Qact + Qamb√

v2cbmb

. (3.27e)

28


3. Methods

tq in equation (3.27c) is queing time of a hydrogen station. The complete discretised
mathematical formulation of the problem can be found in Appendix B. It is solved
numerically with the help of Embotech’s FORCESPRO solver [38] which utilises a
Newton method-based approach by the name of nonlinear interior point (IP). More
information about the solver can be found in [4] and [39]. The problem is solved to
minimise the objective function described below.

Objective function

The objective function for the bottom layer is defined as follows

J(x, u) = ωtt[N ]+

+
N∑

n=1
(ωs(sH2 [n] + sSoC [n] + st[n]) + ωH2δst[n]H2,rf [n] + ωbrFbr[n]∆s[n]) , (3.28)

where time cost ωt is used to minimise the time spent en route, hydrogen cost ωH2 is
used to minimise the refuelled amount of hydrogen, slack variables costs ωs is used
to minimise their use and enforce constraint fulfilment and cost of friction braking
ωbr is used to minimise the use of friction brakes.

3.4 Simulation setups
Reiterating the research questions asked:

• Can splitting the mixed-integer problem into two layers, solving the integer
decisions in the higher layer, thus making the second layer a smooth non-linear
problem, yield a similar optimal solution as the smoothened MINLP problem?

• Further, can modelling certain dynamics as stochastic processes improve the
robustness of the refuelling station selection?

Aiming to answer these questions a number of simulation tests were set up, each
quantifying the claims above or additional metrics that were deemed interesting.
The test cases aim to answer the following questions:
Test 1 Does the bilevel approach produce results of equal or lower total cost, and

if so, how consistently? In the case of disturbances, does the stochastic
strategy find a feasible solution more consistently than the deterministic
strategy?

Test 2 Are there any possible benefits to be had when iterating the two layers until
convergence?

Test 3 What is the tradeoff between computational time and optimality or robust-
ness? Can sampling the route non-equidistantly assist the strategy in any
of the above questions?

3.4.1 Test 1 – Robustness analysis
The first set of simulations was done to evaluate the optimality and robustness
of station selection using the two-layer approach. The following strategies were
compared:

29


3. Methods

SDPκ Use a bilevel approach with stochastic dynamic programming selecting sta-
tions, with a maximum failure risk of κ.

DP Use a bilevel approach with deterministic dynamic programming selecting sta-
tions.

Sigm Use a sigmoid activation function, similar to the one in [4], to smoothen the
MINLP problem allowing stations to be selected directly in one layer.

Heur Use a bilevel approach with a heuristic algorithm based on estimated power
consumption selecting stations. The algorithm can be found in Appendix A.2.

The simulations were conducted on two routes: one from Zagreb, Croatia to Paris,
France (1559 km) and another from Gothenburg, Sweden to Kiruna, Sweden (1793
km). In Tables 3.1 and 3.2, the parameters of the hydrogen stations that lie along
the chosen routes are listed. Using the different strategies listed above as the top
layer, selecting stations and refuelling amount, the evaluation was then based on the
total cost J , given from running the top layer results in the bottom layer (NLP).
The same simulation was then done but with a constant disturbance applied to the
rolling resistance to simulate wet road conditions due to precipitation. The purpose
of this was to see whether the station selection and refuelling amount were sufficient
in an uncertain scenario. The parameters used for the top and bottom layer were
K = 600, N = 800, Mx1 = Mx2 = 31, Mu = 21, Mw = 13 and with initial and
terminal conditions x1[0] = x2[0] = 0.90 and x1[N ] = 0.30, x2[N ] = 0.15.

Table 3.1: Station positions and hydrogen prices for the Zagreb–Paris route.

Station Position [km] Price [SEK/kg] Station Position [km] Price [SEK/kg]
1 93.64 60.3 17 587.18 67.0
2 96.56 83.0 18 602.79 62.9
3 109.24 59.7 19 748.12 74.5
4 141.43 84.7 20 821.28 72.8
5 156.06 69.6 21 824.20 73.5
6 223.36 73.9 22 877.85 61.8
7 249.70 79.3 23 922.72 62.5
8 273.11 70.6 24 977.34 61.4
9 310.17 62.9 25 1056.34 64.0
10 323.83 59.1 26 1084.62 57.5
11 342.36 55.3 27 1122.66 58.3
12 376.50 78.3 28 1152.91 86.2
13 474.04 67.9 29 1164.61 86.1
14 503.30 58.0 30 1362.61 59.1
15 549.14 83.3 31 1392.85 53.4
16 565.72 62.1 32 1426.01 59.7

30


3. Methods

Table 3.2: Station positions and hydrogen prices for the Gothenburg–Kiruna route.

Station Position [km] Price [SEK/kg] Station Position [km] Price [SEK/kg]
1 6.73 67.1 19 667.27 59.4
2 13.46 77.7 20 670.63 80.5
3 48.22 52.5 21 712.12 86.4
4 50.47 63.1 22 745.77 63.5
5 70.65 57.6 23 753.62 76.7
6 172.70 55.7 24 802.96 83.2
7 176.07 59.0 25 855.67 83.8
8 235.51 64.6 26 905.01 55.5
9 259.06 66.4 27 907.26 53.9
10 288.21 71.4 28 908.38 58.4
11 327.46 67.2 29 1025.01 83.2
12 344.29 76.5 30 1160.71 55.9
13 373.44 59.7 31 1166.31 67.2
14 382.42 83.2 32 1167.43 86.0
15 513.63 53.5 33 1304.25 71.2
16 527.08 67.1 34 1503.87 76.7
17 529.33 72.1 35 1790.96 63.5
18 617.92 57.4 36 1792.09 76.5

3.4.2 Test 2 – Iterative method
The premise of this test is that on the one hand, the optimal station selection is
dependent on the speed travelled, as that determines the efficiency of the energy
consumption and the total trip time, while on the other hand, the optimal speed
profile is dependent on which stations were selected, i.e. when the truck needs to
lower the speed to have sufficient energy until the next station is reached. This
two-way dependency is one reason why the hierarchical strategy can not be con-
sidered globally optimal, as it optimises one variable at a time. One strategy to
mitigate this drawback of the bilevel approach is to iterate the two layers until the
solution potentially converges to a solution upon which no further improvements
can be made. This is no guarantee that the solution will be globally optimal, but
rather improving the solution marginally.

The idea is thus to run the mission optimiser, save the speed profile and rerun
the mission optimiser using this optimal speed profile to potentially get a new opti-
mal station selection, for which a new optimal speed profile is found, as illustrated in
Figure 3.1. When doing so, a question arises: Should the SDP policy be regenerated
using the new speed profile as well, or just the forward simulation? Generating the
SDP policy makes up a majority of the computation time of the mission optimiser,
incentivising reusing the policy generated initially. On the contrary, the policy is
determined for a certain speed profile, meaning if this changes significantly the pol-
icy is no longer optimal and might not give desirable results.

31


3. Methods

The simulations run in this test were done for three different routes, with two sets
of parameter values, specifically the stage resolution for the two layers, K and N ,
and trying both approaches of regenerating the SDP policy for each iteration and
utilising the initial SDP policy. This comes out to 12 simulations in total, for which
the parameters are detailed in Table 3.3. All simulations use non-equidistant sam-
pling, and a risk tolerance of κ = 0.05. As conditions for convergence, the stations
selected should be the same for three consecutive iterations and the total cost J
should not increase or decrease by more than 1% from the previous iteration, with
a maximum of 10 iterations allowed.

Table 3.3: Initial and terminal state constraints, stage quantisation and SDP
regeneration decision for simulations 1-12.

simulation x1[0] x2[0] x1[N ] x2[N ] K N reg. SDP
Zagreb – Paris

1 0.90 0.90 0.30 0.15 250 500 no
2 0.90 0.90 0.30 0.15 250 500 yes
3 0.90 0.90 0.30 0.15 500 1000 no
4 0.90 0.90 0.30 0.15 500 1000 yes

Gothenburg – Kiruna
5 0.90 0.90 0.30 0.15 250 500 no
6 0.90 0.90 0.30 0.15 250 500 yes
7 0.90 0.90 0.30 0.15 500 1000 no
8 0.90 0.90 0.30 0.15 500 1000 yes

Gothenburg – Rødby
9 0.90 0.90 0.75 0.75 250 500 no
10 0.90 0.90 0.75 0.75 250 500 yes
11 0.90 0.90 0.75 0.75 500 1000 no
12 0.90 0.90 0.75 0.75 500 1000 yes

3.4.3 Test 3 – Resolution analysis
Another set of simulations was be aimed at investigating how different resolutions
of the state vector quantisation in the MINLP M and stage quantisation in both
problems K, N can affect the optimality of the overall solution. The purpose of this
test is to determine whether the two-layer solution can perform competitively at a
similar computation time as the other solutions. The simulations were conducted
based on a route from Gothenburg, Sweden to Kiruna, Sweden (1793 km). All
combinations of the following parameters were tested:

• Mx1 = Mx2 = {11, 21, 31},
• N = K = {400, 800, 1200, 1600},

and the parameters total cost J , the total amount of refuelled hydrogen, the selected
stations, the travel time tN , and the computation time will be evaluated. Initial and
terminal conditions were x1[0] = x2[0] = 0.90, x1[N ] = x2[N ] = 0.2. One set
of simulations were run using equidistant sampling, and one using non-equidistant
sampling.

32


4
Results

4.1 Downsampling

The results for the comparison between the equidistant and non-equidistant down-
sampling of the road data is shown in Figure 4.1. An exponential fit was made to
the MSE data for route one, illustrated in Figure 4.2. This shows that for any N
over 175, a 10% increase in N is required to achieve the same MSE for equidistant
as when using non-equidistant. Looking at total energy consumption error the non-
equidistant sampling performs better on average than equidistant for route one and
worse on average for route two. The total error in energy consumption is notably
less than 0.3% for as low as 100 stages using both methods.

(a) MSE for route one. (b) Energy consumption for route one.

(c) MSE for route two. (d) Energy consumption for route two.

Figure 4.1: Comparison of equidistant and non-equidistant sampling for two dif-
ferent routes, with original resolutions of N = 15000 and N = 6000, respectively.

33


4. Results

Figure 4.2: exponential fit of MSE data for route one.

34


4. Results

4.2 Test 1 – Robustness analysis

The results of the simulations for the two chosen routes with and without uncertainty
added can be seen in Tables 4.1 - 4.4 below. A dagger (†) signifies a failure of
convergence of the solver in the bottom layer, determined either from the solver’s
output message or by observing the results. Failure to converge can be a consequence
of the station selection being infeasible, but such deductions should be made with
caution. Due to the non-linearity of the problem tiny changes in the parameters can
cause the solver to not converge to a solution.

Table 4.1: Simulation results with different methods for the route Zagreb–Paris
with no uncertainty applied.

method J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

variable H2 cost, equidistant sampling
SDPκ=0.05 394664 1301 18 31 0.7463 1598 68.61
SDPκ=0.50 394592 1301 18 31 0.7471 1598 55.65
DP † † 22 28 0.4473 † 56.67
sigm 559505 1864 16 0.3857 1599 80.76
heur 394739 1302 20 31 0.7437 1598 20.09

constant H2 cost, equidistant sampling
SDPκ=0.05 393973 1301 18 31 0.7379 1598 54.01
SDPκ=0.50 394612 1302 18 31 0.7518 1598 55.75
DP 385738 1283 17 0.4479 4488 56.74
sigm 536466 1859 16 0.3931 1599 40.01
heur 395381 1302 20 31 0.7437 1598 19.65

constant H2 cost, non-equidistant sampling
SDPκ=0.05 394281 1302 17 24 0.8058 1598 67.09
SDPκ=0.50 400229 1322 17 28 30 0.7861 1598 68.73
DP † † 18 0.4445 † 56.53
sigm 504915 2307 16 0.3687 1599 39.92
heur 394106 1302 20 31 0.7437 1598 19.67

variable H2 cost, non-equidistant sampling
SDPκ=0.05 397766 1307 19 32 0.7341 1598 67.41
SDPκ=0.50 402050 1308 19 32 0.7372 1598 67.97
DP † † 23 0.4209 † 56.93
sigm 486618 2133 17 26 0.3973 1599 40.93
heur 398233 1309 21 32 0.7279 1598 38.13

35


4. Results

Table 4.2: Simulation results with different methods for the route Zagreb–Paris,
with uncertainty applied after the station selection.

method J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

variable H2 cost, equidistant sampling
SDPκ=0.05 701179 2015 17 28 0.9448 1602 44.59
SDPκ=0.50 707140 2018 17 28 0.9439 1602 45.88
DP 516429 1710 20 21 26 0.5479 1598 34.03
sigm 509478 1689 11 16 0.4200 1598 123.04
heur 682114 1976 18 28 0.9357 1602 15.44

constant H2 cost, equidistant sampling
SDPκ=0.05 724601 2063 14 28 0.9538 1603 68.86
SDPκ=0.50 708528 2019 13 25 0.9554 1602 70.32
DP 504097 1671 20 0.5819 1598 33.65
sigm 513491 1689 11 16 0.4198 1598 82.87
heur 681484 1976 18 28 0.9357 1602 16.21

constant H2 cost, non-equidistant sampling
SDPκ=0.05 † † 15 26 0.9568 † 72.05
SDPκ=0.50 † † 15 25 0.9486 † 42.91
DP 510064 1691 18 21 0.5914 1598 32.60
sigm 509803 1690 11 16 0.4208 1598 81.09
heur 681523 1976 18 28 0.9357 1602 15.43

variable H2 cost, non-equidistant sampling
SDPκ=0.05 † † 18 29 0.9359 † 70.91
SDPκ=0.50 † † 18 29 0.9417 † 67.98
DP 581831 1718 21 27 29 0.5762 1601 55.19
sigm 563614 1708 17 27 0.5559 1600 76.90
heur † † 19 29 0.9186 † 38.60

In Tables 4.1 and 4.2 the deterministic DP strategy performs the best, cost-wise,
while failing to converge three out of eight times. The heuristics algorithm performs
on par with the SDP strategy for the deterministic simulations and outperform it
for the uncertain simulations. The sigmoid strategy is the only one to converge
every time, performing the worst in the deterministic set, but the best out of all
the strategies in the uncertain set. Perhaps this is a consequence of the uncertainty
being part of the bottom layer optimisation and not simulated in an additional
plant-controller model, indicating that the method of evaluation is flawed.

36


4. Results

Table 4.3: Simulation results with different methods for the route Gothenburg–
Kiruna with no uncertainty applied.

method J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

constant H2 cost, equidistant sampling
SDPκ=0.05 397956 1294 18 0.7042 16002 67.24
SDPκ=0.50 397826 1293 18 0.7016 16002 69.00
DP 396182 1292 23 0.4725 16001 58.51
sigm 481240 1383 17 0.4297 160020 18.80
heur 409092 1330 20, 31 0.7424 16005 44.11

variable H2 cost, equidistant sampling
SDPκ=0.05 415933 1352 18, 28 0.8137 16009 52.03
SDPκ=0.50 403478 1314 19, 20 0.7285 16003 73.81
DP 396013 1292 19 0.4948 16001 64.27
sigm 481137 1383 20, 31 0.4305 160020 16.60
heur 408950 1330 20, 31 0.7424 16005 45.21

variable H2 cost, non-equidistant sampling
SDPκ=0.05 397340 1294 19 0.7286 16003 65.88
SDPκ=0.50 397701 1294 18 0.6979 16002 65.56
DP 395816 1292 17 0.4704 16001 60.35
sigm 866507 1394 17 0.4534 3982674 16.57
heur 409507 1330 20, 21 0.7424 16005 50.17

constant H2 cost, non-equidistant sampling
SDPκ=0.05 934964 1300 19 0.7057 5365848 70.13
SDPκ=0.50 648533 1302 19 0.7071 2496247 68.39
DP 647081 1301 24 0.4848 2496246 58.76
sigm 866605 1393 17 0.4551 3980611 18.01
heur 655704 1326 21, 32 0.7337 2496249 44.22

37


4. Results

Table 4.4: Simulation results with different methods for the route Gothenburg–
Kiruna, with uncertainty applied after the station selection.

method J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

variable H2 cost, equidistant sampling
SDPκ=0.05 505343 1648 4 23 30 31 2.0834 1598 52.52
SDPκ=0.50 505400 1649 4 23 30 31 2.0830 1598 15.00
DP † † 4 26 31 1.7397 † 28.22
sigm † † 9 14 15 22 29 31 1.8598 † 51.05
heur 498882 1628 5 23 30 31 2.0250 1598 2.90

constant H2 cost, equidistant sampling
SDPκ=0.05 503405 1648 4 16 29 31 2.0948 1598 7.40
SDPκ=0.50 503238 1649 4 19 30 31 2.1028 1598 13.09
DP 556520 2468 3 24 30 1.7538 693787385 26.10
sigm † † 14 15 22 29 31 1.7359 † 56.88
heur 499019 1628 5 23 30 31 2.0250 1598 53.96

constant H2 cost, non-equidistant sampling
SDPκ=0.05 505391 1649 4 22 30 31 2.0857 1598 66.82
SDPκ=0.50 504057 1649 1 18 29 31 2.0996 1598 70.11
DP 569126 1808 4 26 31 1.7445 1595 89.24
sigm † † 14 15 22 29 31 1.7520 † 61.44
heur 498183 1628 5 23 30 31 2.0250 1598 55.24

variable H2 cost, non-equidistant sampling
SDPκ=0.05 510085 1664 5 25 34 36 2.0994 1598 100.08
SDPκ=0.50 516100 1684 5 25 32 33 36 2.0997 1598 110.06
DP † † 5 29 36 1.7496 † 76.08

sigm † † 1 3 4 15 18
25 27 28 33 36 2.0541 † 48.36

heur 503535 1643 7 26 34 36 2.0250 1598 46.46

For Tables 4.3 and 4.4 the SDP strategy is outperformed by the DP strategy in
the deterministic case, while performing better in the uncertain case, albeit being
outperformed by the heuristics algorithm there. The sigmoid strategy has the worst
performance, failing to find a solution in the uncertain case.

38


4. Results

4.3 Test 2 – Iterative method

Presented below are the results from running the simulations in test 2. The first two
simulations for each route used K = 250 and N = 500 while the latter used K = 500
and N = 1000. Even-numbered simulations, on the right-hand side, regenerated a
new SDP policy for each iteration while odd-numbered simulations, on the left-
hand side, reused the initial policy. The first set of 12 simulations, seen in Tables
4.5-4.7, were run using the seed rng(5) for the random number generator, which
determines how the hydrogen prices vary between stations. The second set, seen
in Tables 4.8-4.10, used a new seed for each simulation resulting in prices varying
between simulations run with and without regenerating the SDP policy. Note that
for route two, Gothenburg-Kiruna, the number of stations available decreases with
the lower stage resolution, as nearby stations are combined, resulting in the indices
not being comparable between stage resolutions and the randomly generated costs
differing.

Table 4.5: Results from simulations for route one, Zagreb–Paris, using the same
seed for every simulation.

simulation 1
iter. J tN [min] stations

∑
H̄2,rf

1 403718 1299 18 0.7084
2 423368 1295 18 0.7138
3 406036 1301 18 0.7139
4 398754 1295 18 0.7141
5 406004 1301 18 0.7138
6 467642 1300 18 0.7141
7 484385 1304 18 0.7139
8 403719 1298 18 0.7182
9 468267 1295 18 0.7138
10 476583 1300 18 0.7137

simulation 2
iter. J tN [min] stations

∑
H̄2,rf

1 403718 1299 18 0.7084
2 † † † †

simulation 3
iter. J tN [min] stations

∑
H̄2,rf

1 413230 1326 19 32 0.7168
2 417976 1331 18 27 0.8146

3 409419 1341 18 27
29 0.8122

4 405712 1323 18 27 0.8089
5 403655 1322 18 27 0.8089
6 403320 1321 18 27 0.8101

simulation 4
iter. J tN [min] stations

∑
H̄2,rf

1 413230 1326 19 32 0.7168
2 † † † †

39


4. Results

Table 4.6: Results from simulations for route two, Gothenburg–Kiruna, using the
same seed for every simulation.

simulation 5
iter. J tN [min] stations

∑
H̄2,rf

1 1.1 · 107 2630 16 28 0.9520
2 745434 2461 15 26 1.2761
3 † † † †

simulation 6
iter. J tN [min] stations

∑
H̄2,rf

1 1.1 · 107 2630 16 28 0.9520
2 711221 2318 15 26 1.2674
3 † † † †

simulation 7
iter. J tN [min] stations

∑
H̄2,rf

1 490930 1619 18 32 0.9892
2 492082 1619 18 32 1.0158
3 492070 1618 18 32 1.0160

simulation 8
iter. J tN [min] stations

∑
H̄2,rf

1 490930 1619 18 32 0.9892
2 492052 1617 18 32 1.0166
3 492312 1618 18 32 1.0163

Table 4.7: Results from simulations for route three, Gothenburg–Rødby, using the
same seed for every simulation.

simulation 9
iter. J tN [min] stations

∑
H̄2,rf

1 116464 383 9 0.3193
2 120941 397 9 0.3319
3 118311 389 9 0.3270
4 119430 392 9 0.3272

simulation 10
iter. J tN [min] stations

∑
H̄2,rf

1 116464 383 9 0.3193
2 699926 1597 9 0.5409
3 † † † †

simulation 11
iter. J tN [min] stations

∑
H̄2,rf

1 122220 400 9 0.3289
2 136394 448 9 0.3384
3 142515 468 9 0.3440
4 150348 490 9 0.3497
5 165948 537 9 0.3630
6 201265 612 9 0.3922
7 † † † †

simulation 12
iter. J tN [min] stations

∑
H̄2,rf

1 122220 400 9 0.3289
2 † 1758 12 0.6063
3 † † † †

Table 4.8: Results from simulations for route one, Zagreb–Paris, using a different
seed for every simulation.

simulation 1
iter. J tN [min] stations

∑
H̄2,rf

1 437415 1299 18 0.6918

2 421984 1358 17 20
24 30 0.7145

3 410299 1319 17 20 0.7524
4 410465 1319 17 20 0.7522

simulation 2
iter. J tN [min] stations

∑
H̄2,rf

1 405938 1315 16 30 0.7131
2 † † † †

simulation 3
iter. J tN [min] stations

∑
H̄2,rf

1 409438 1322 19 32 0.7163
2 † 2298 16 26 0.8237
3 406646 1326 16 26 0.8721
4 404271 1324 16 26 0.8218

simulation 4
iter. J tN [min] stations

∑
H̄2,rf

1 432181 1336 19 32 0.7284
2 † † † †

40


4. Results

Table 4.9: Results from simulations for route two, Gothenburg–Kiruna, using a
different seed for every simulation.

simulation 5
iter. J tN [min] stations

∑
H̄2,rf

1 487589 1611 17 26 0.9702
2 487641 1612 17 26 0.9771
3 406646 1326 16 26 0.8721
4 404271 1324 16 26 0.8218

simulation 6
iter. J tN [min] stations

∑
H̄2,rf

1 487653 1611 17 27 0.9616
2 487499 1611 17 27 0.9688
3 406646 1326 16 26 0.8721
4 404271 1324 16 26 0.8218

simulation 7
iter. J tN [min] stations

∑
H̄2,rf

1 496783 1639 21 25
33 0.9736

2 497745 1638 21 25
33 0.9691

3 406646 1326 16 26 0.8721
4 404271 1324 16 26 0.8218

simulation 8
iter. J tN [min] stations

∑
H̄2,rf

1 496804 1639 23 32
33 0.9698

2 491992 1619 23 32 0.9886
3 406646 1326 16 26 0.8721
4 404271 1324 16 26 0.8218

Table 4.10: Results from simulations for route three, Gothenburg–Rødby, using a
different seed for every simulation.

simulation 9
iter. J tN [min] stations

∑
H̄2,rf

1 114569 377 10 0.2819
2 123673 407 10 11 0.3288
3 122332 402 10 11 0.3252
4 122295 402 10 11 0.3233

simulation 10
iter. J tN [min] stations

∑
H̄2,rf

1 115177 378 12 0.3176
2 4438936 1651 12 0.5764
3 † † † †

simulation 11
iter. J tN [min] stations

∑
H̄2,rf

1 121791 398 10 0.3284
2 126756 415 10 0.3332
3 124980 409 10 0.3322
4 122295 402 10 11 0.3233

simulation 12
iter. J tN [min] stations

∑
H̄2,rf

1 118808 389 12 0.3262
2 433304 1255 12 0.5202
3 † † † †

From the results, there is a general pattern of simulations where the SDP policy
is regenerated ending up infeasible, as illustrated by the † symbols. The cause of
this is presumed to be the speed profile determined by the deterministic bottom
layer optimization not being robust enough for a SDP policy to be found at the risk
tolerance of κ = 0.05 used. A possible way to mitigate this is to use the algorithm
to preprocess the speed profile given by the bottom layer, as done initially. Simula-
tions 5 and 6 in Table 4.9 seem to suggest that the stage quantization K = 250 and
N = 500 is insufficient to consistently find an optimal solution.

Notably, numerous simulations converges to a lower trip cost, as seen in simula-
tion 3 in Table 4.5, simulations 1 and 3 in Table 4.8 and all simulations in Table
4.9. In total, 8 out of 24 simulations resulted in a lower cost after iterating. In sim-
ulation 3 in Table 4.5 and simulation 5-8 in Table 4.9 the station selection changes
which brings a significant improvement, as seen in the decrease in cost. For other
simulations, the trip cost instead diverges or stays stagnant. Simulation 1 in Table

41


4. Results

4.5 illustrates the inconsistent behaviour of the NLP solver, as the only parameter
that changes between attempts at finding the optimal solution is the initial guess
and the refuel amount ∑ H̄2,rf which differs by at most 1.4%, with total trip cost J
differing by as much as 19.5% between solutions.

4.4 Test 3 – Resolution analysis
This section contains the results of the test that was conducted based on the de-
scription in subsection 3.4.3. In Figure 4.3 and Table 4.11, one can find test results
obtained with equidistant stage sampling, and in Figure 4.4 and Table 4.11 the ones
obtained with non-equidistant stage sampling.

Results presented in Tables 4.11 and 4.12 demonstrate that solving with Mx1 =
Mx2 = 11 yielded a solution with two more refuelling stops than with Mx1 = Mx2 =
21 and Mx1 = Mx2 = 31, which in turn resulted in a higher travel time and total
cost. Similar results were seen when using non-equidistant sampling. However, there
was no difference in the stations selected and no significant difference in the amount
of refuelled hydrogen with 21 and 31 state values in SDP, whereas the computation
time differed by between 30 and 50 seconds depending on the stage resolution. Sta-
tions 15 and 30 were selected for refuelling which, according to Table 3.2, have the
lowest prices in their respective parts of the route. This further suggests that the
SDP strategy reduces the total cost of operation as desired.

When it comes to stage sampling N and K, as shown in Table 4.11, using a lower
stage resolution consistently yielded lower energy losses which turned out to be al-
most linear to the number of stages N , meaning that it most probably was by caused
scaling in cost function and specific behaviour of the solver rather than resolution of
the stage discretisation. Thus, the difference in cost and brake energy, and implicitly
accuracy, is small relative to the difference in computation time, which is approx-
imately halved when going from N = K = 1600 to N = K = 800. Worth noting
is that in one of the simulations, a resolution of N = K = 400 was insufficient for
the solver to converge. This suggests a trade-off between reliability and accuracy is
necessary to achieve the desired results, as ultimately the mission optimiser needs
to be reliable in providing a solution.

The results when running the same simulations using non-equidistant stage sam-
pling, which are demonstrated in Table 4.12, suggest that the difference in perfor-
mance is minimal. Losses from friction braking are identical to the losses when using
equidistant sampling. Noticeably, the simulation which was infeasible in Table 4.11
was solvable when using non-equidistant sampling. However, no conclusive assump-
tions can be made based on this as the sample size is too low and only one route
and vehicle is considered. Although the stations selected were the same, the trip
time was slightly longer for the non-equidistant simulations, differing by an average
of 1.2% which is reflected in the trip cost as well. This can be explained by an in-

42


4. Results

(a) Simulation results for Mx1 = Mx2 = 31

(b) Simulation results for Mx1 = Mx2 = 21

(c) Simulation results for Mx1 = Mx2 = 11

Figure 4.3: H̄2 and SoC trajectories together with the power split for the
Gothenburg–Kiruna route with equidistant sampling with N = K = 1600. Sub-
figures (a)–(c) correspond to different numbers of quantised state values in SDP.

43


4. Results

Table 4.12: Simulation results with different SDP state quantisation and stage
resolution with non-equidistant stage sampling for the route Gothenburg–Kiruna.

Mx1 = Mx2 N = K J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

31

1600 493493 1620

15, 30

1.056 3198 231.01
1200 497093 1622 1.042 2398 183.8
800 492097 1618 1.030 1598 121.76
400 495038 1634 0.991 798 66.6

21

1600 493511 1619

15, 30

1.058 3198 186.5
1200 497282 1623 1.045 2398 138.3
800 491902 1617 1.038 1598 90.6
400 495222 1634 1.018 798 61.3

11

1600 506863 1662
15, 27,
30, 34

1.159 3198 147.8
1200 510501 1665 1.139 2398 125.9
800 504525 1658 1.123 1598 80.1
400 507578 1674 1.060 798 42.33

creased accuracy of the solution, although such claims would have to be investigated
further.

Table 4.11: Simulation results with different SDP state quantisation and stage
resolution with equidistant stage sampling for the route Gothenburg–Kiruna.

Mx1 = Mx2 N = K J tN [min] Selected
stations

∑ H̄2,rf Ebr [J ] tcomp [sec]

31

1600 488376 1605

15, 30

1.048 3198 236.9
1200 487542 1605 1.038 2398 159.8
800 485240 1598 1.022 1598 95.2
400 485467 1604 1.011 798 56.3

21

1600 488236 1605

15, 30

1.047 3198 172.6
1200 487559 1605 1.043 2398 134.3
800 485305 1598 1.033 1598 77.56
400 † † 1.029 † †

11

1600 501133 1646
15, 27,
30, 34

1.148 3198 120.5
1200 500238 1645 1.131 2398 91.0
800 497800 1639 1.103 1598 63.0
400 498178 1645 1.096 798 39.6

44


4. Results

(a) Simulation results for Mx1 = Mx2 = 31.

(b) Simulation results for Mx1 = Mx2 = 21.

(c) Simulation results for Mx1 = Mx2 = 11.

Figure 4.4: H̄2 and SoC trajectories together with the power split