Predicting Finite Element Simulation output using Machine Learning
A study to understand the potential of Graph Neural Networks in predicting vehicle occupant pre-crash simulation kinematics
Master's thesis in Biomedical Engineering
CHIARA ROSANNA FICHERA
DEPARTMENT OF MECHANICS AND MARITIME SCIENCES
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2022
www.chalmers.se

Master's thesis in Biomedical Engineering
Predicting Finite Element Simulation output using Machine Learning
A study to understand the potential of Graph Neural Networks in predicting vehicle occupant pre-crash simulation kinematics
Chiara Rosanna Fichera
Department of Mechanics and Maritime Sciences
Division of Vehicle Safety, Injury Prevention
Chalmers University of Technology
Gothenburg, Sweden 2022

Predicting Finite Element Simulation output using Machine Learning
A study to understand the potential of Graph Neural Networks in predicting vehicle occupant pre-crash simulation kinematics
Chiara Rosanna Fichera
© Chiara Rosanna Fichera, 2022.
Supervisor: Jobin John, Division of Vehicle Safety, Chalmers
Co-supervisor: Johan Iraeus, Division of Vehicle Safety, Chalmers
Examiner: Johan Iraeus
Master's Thesis 2022:70
Department of Mechanics and Maritime Sciences
Division of Vehicle Safety, Injury Prevention
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Cover: Simplified Hybrid III 50th percentile dummy model.
Typeset in LaTeX, template by Magnus Gustaver
Printed by Chalmers Reproservice
Gothenburg, Sweden 2022

Predicting Finite Element Simulation output using Machine Learning
A study to understand the potential of Graph Neural Networks in predicting vehicle occupant pre-crash simulation kinematics
Chiara Rosanna Fichera
Department of Mechanics and Maritime Sciences
Chalmers University of Technology

Abstract
Each year, around 1.35 million people die in traffic crashes and many more get injured.
Human Body Models have become a popular tool to understand and prevent injuries and fatalities, as they simulate the human body's response during crash scenarios. With improvements in the accuracy and bio-fidelity of Human Body Models, the demand for computational resources has increased to the point that a single pre-crash simulation can require days. Therefore, Dimensionality Reduction methods can be used to create a surrogate, that is, a smaller representation of the model. The reduced representation is translated to a latent space that still retains the complexity and the features of the simulation. To build the surrogate, supervised learning tasks were implemented on the reduced dimension to map the simulation to its parameters. To create a compressed version of the input simulation, Graph Neural Networks were considered, since they incorporate the geometrical structure of the model. The Graph Neural Network was compared to an equivalent Convolutional Neural Network architecture and to Principal Component Analysis. Random Forest, Gradient Boosting, and XGBoost were chosen and compared to build the surrogate model. Due to computational limitations, and since pre-crash simulations are already time-consuming, a simplified Hybrid III dummy model was chosen in place of a full Human Body Model.
For all Dimensionality Reduction methods, the accuracy of the reconstruction improves when both the number of samples and the latent space size are increased. Principal Component Analysis shows better performance, in terms of lower errors, than the Graph Neural Network. Moreover, the Graph Neural Network structure is comparable to an equivalent Convolutional Neural Network architecture in terms of performance. Random Forest and Gradient Boosting proved to be better than XGBoost. Principal Component Analysis also requires less computational time and fewer resources.
Although the Graph Neural Network was outperformed in this study, further improvements in the development of the method may still reveal its potential.

Keywords: HBM, Dimensionality Reduction, Graph Neural Network, Principal Component Analysis, Machine Learning.

Acknowledgements
I would like to thank my supervisors, Johan and Jobin, for all the input during the weekly meetings. Thanks to Alex and Julia for the stimulating discussions on the topic during the spring.
To Luca, for encouraging me from the very beginning, without any reservation, to live out and experience this long-held dream.
A special thanks to my family, and above all to my mother, for cheering me on and for allowing me to live this unique adventure. And to my grandmother, who always hopes I can come home.
A loving thanks to Thomas, the best supporter ever, for encouraging me all the way and for believing in me, without a doubt, from the very first moment. Words cannot express enough gratitude for you.
Chiara Rosanna Fichera, Gothenburg, August 2022

Contents
List of Figures xi
List of Tables xiii
1 Introduction 1
1.1 Background . . . 1
1.1.1 The crash phase . . . 2
1.1.2 The evolution of test subjects . . . 2
1.2 Motivation . . . 3
1.3 Aim and limitations . . . 5
2 Theory 7
2.1 Finite Elements Methods . . . 7
2.2 Machine Learning . . . 10
2.2.1 What is a Neural Network? . . . 10
2.2.2 Convolutional Neural Networks . . . 12
2.2.3 Graph Convolutional Networks . . . 12
2.2.4 Unsupervised methods . . . 15
2.2.5 Supervised methods . . . 19
2.2.6 Error Metrics . . . 23
3 Methods 25
3.1 Data extraction . . . 27
3.2 Dimensionality Reduction I - GNN . . . 28
3.3 Dimensionality Reduction II - CNN . . . 32
3.4 Dimensionality Reduction III - PCA . . . 33
3.5 Regression . . . 34
3.6 Animation on META post-processor . . . 35
4 Results 37
4.1 Beam model . . . 37
4.1.1 Dimensionality Reduction . . . 37
4.1.2 Regression . . . 41
4.2 Pre-crash Simulation . . . 41
4.2.1 Dimensionality Reduction . . . 42
4.2.2 Regression . . . 45
5 Discussion 49
5.1 Dataset . . . 49
5.2 Architecture . . . 50
5.3 Comparison of Dimensionality Reduction . . . 51
5.4 Comparison of Regression . . . 51
5.5 From the dummy model to the HBM . . . 52
5.6 Computational resources . . . 52
6 Conclusion 57
7 Future Work 59
A Appendix I

List of Figures
1.1 Events occurring during crash phase. . . . 2
1.2 Hybrid III 50th percentile male dummy model used for frontal impact test. . . . 3
1.3 HBM v.9 Female model developed by SAFER . . . 4
1.4 Surrogate model process . . . 4
2.1 Hybrid III 50th percentile FE Dummy model . . . 7
2.2 Single step Euler approximation . . . 8
2.3 Different FE elements [37] . . . 10
2.4 Structure of biological and artificial neuron [1]. . . . 11
2.5 An example of a Feed-forward Neural Network composed of three layers [17]. . . . 11
2.6 CNN convolution operation process [24]. . . . 13
2.7 General representation of a graph with nodes and edges [20] . . . 13
2.8 An example of a graph [21] . . . 14
2.9 Graph convolution operation [20] . . . 15
2.10 General architecture of an autoencoder [25]. . . . 16
2.11 Principal component analysis on a sample dataset . . . 17
2.12 Example of Linear regression. . . . 19
2.13 Model of a decision tree [28]. . . . 20
2.14 Random forest method [29]. . . . 21
2.15 The additive process of the Gradient Boosting algorithm [30]. . . . 21
3.1 LS-DYNA Element shell . . . 25
3.2 The beam model . . . 26
3.3 Pre-crash simulation . . . 27
3.4 The edge list connections for an imaginary structure. . . . 29
4.1 GNN evaluation on Node 99 with respect to different latent space sizes. . . . 38
4.2 GNN Train and Test MAE for different sample sizes . . . 38
4.3 PCA - explained variance with respect to the number of components for the beam model . . . 39
4.4 Number of components and explained variance . . . 39
4.5 PCA beam model: Train MAE with respect to the number of components . . . 40
4.6 Beam model: node 44 with respect to the different number of components. . . . 40
4.7 Beam model: final MAE with respect to sample size. . . . 41
4.8 Beam model: node 99 after the RF regression . . . 41
4.9 Pre-crash simulation - GNN: Node 2004014 for different latent sizes . . . 42
4.10 CNN performance on training set - Node 2009276 . . . 43
4.11 Pre-crash PCA: Explained variance vs. number of components. . . . 44
4.12 Amount of variance expressed by each component, increasing the number of components . . . 44
4.13 PCA - varying size of components - Node 2005770 . . . 45
4.14 MAE for Random Forest, Gradient Boosting, and XGBoost in test set . . . 45
4.15 Pre-crash Regression: Node 2004014 from test set . . . 46
4.16 PCA-GNN comparison: after 729 ms . . . 46
4.17 CNN - GNN comparison: after 729 ms. . . . 47
5.1 Time consumption of different Dimensionality Reduction methods . . . 53
5.2 Time comparison to reproduce 500 simulations. . . . 54
5.3 Comparison of training time between GNN, CNN, and PCA. . . . 54
5.4 Pre-crash simulation: Comparison of training time between methods used and LS-DYNA . . . 55
7.1 Hourglass-like anomaly observed on the right foot of the model . . . 60
7.2 Hourglass-like anomaly located in the forehead of the dummy . . . 60
A.1 Head node - 2005184 . . . I
A.2 Chest Node - 2002195 . . . II
A.3 Left hand - 2002263 . . . II
A.4 Right foot - 2009159 . . . III
A.5 PCA-GNN comparison: after 929 ms . . . III
A.6 PCA-GNN comparison: after 954 ms . . . IV
A.7 PCA-GNN comparison: after 979 ms . . . IV
A.8 CNN - GNN comparison: after 929 ms . . . IV
A.9 CNN - GNN comparison: after 954 ms . . . V
A.10 CNN - GNN comparison: after 979 ms . . . V

List of Tables
3.1 FE simulation parameters for beam object . . . 26
3.2 Pre-crash FE parameters . . . 27
3.3 Number of edges in the models . . . 29
3.4 Graph Autoencoder Architecture for the beam model . . . 30
3.5 Graph Autoencoder Architecture for the Pre-crash simulation . . . 31
3.6 Training hyperparameters: epochs and learning rate . . . 32
3.7 CNN Architecture for the Pre-crash simulation . . . 33
3.8 Regression: FE simulation parameters and Latent space . . . 35
4.1 MAE and RMSE for different latent space sizes . . . 37
4.2 Number of components and explained variance . . . 39
4.3 MAE and RMSE for different latent space sizes . . . 42
4.4 MAE for max timesteps scaling and standardization . . . 43
4.5 Training and Test error for CNN architecture. . . . 43
4.6 Amount of variance expressed by each component, increasing the number of components . . . 44
5.1 Beam model: Time required for different regressions for training and prediction . . . 53
List of Acronyms
AI Artificial Intelligence.
ANN Artificial Neural Network.
ASIRT Association for Safe International Road Travel.
CNN Convolutional Neural Network.
DR Dimensionality Reduction.
FE Finite Elements.
FE-HBM Finite Element Human Body Model.
FEM Finite Elements Methods.
FFNN Feed Forward Neural Network.
GNN Graph Neural Network.
HBM Human Body Model.
MAE Mean Absolute Error.
ML Machine Learning.
MSE Mean Square Error.
NN Neural Network.
PCA Principal Component Analysis.
RMSE Root Mean Square Error.

1 Introduction
Finite Element simulations are used in automotive development to predict and investigate injuries, as they are a useful tool to save resources in crash testing. However, producing many simulations is computationally demanding and thus time-consuming. Machine learning is an innovative method applied in many fields to predict states based on large quantities of data. Here, it is applied to a Finite Element (FE) model representing a human body in a pre-crash scenario, to reduce computing time and increase efficiency. These models aid the study of crash kinematics and the understanding of injuries from vehicle crashes, in order to find ways to mitigate and prevent these injuries.

1.1 Background
According to the Annual Global Road Crash Statistics of the Association for Safe International Road Travel (ASIRT), around 1.35 million people die each year due to traffic accidents [2]. Furthermore, every year between 20 and 50 million people suffer non-fatal injuries. Even though Europe has the lowest road-traffic mortality rate, and that rate tends to fall, road traffic is still a leading external cause of death and injury. Sweden's fatality rate in traffic is one of the lowest in the EU (27 per million population in 2015) [3]. The total cost of road accident casualties (both fatalities and injuries) in the European Union is estimated at 48.5 billion euros [3].
The aim of traffic safety research is to decrease the number of accidents in general, but especially fatalities and injuries. As a highly ambitious vision, Vision Zero aims to eliminate all traffic fatalities and severe injuries by increasing safety [2, 5]. The goal is a holistic approach to traffic safety, including traffic environments, injury and fatality prevention, the dynamics of car crashes, as well as the design of new safety systems.
In the automotive field, active and passive safety are the two branches that focus on car safety and crashes. In active safety, new features (e.g., driver assistance) are designed to prevent the crash from occurring. In parallel, passive safety investigates and develops measures to mitigate the severity of the crash. In this regard, the focus of injury prevention is to assess and reduce the injury severity of the vehicle's occupants and to understand the mechanical response during the impact. Commonly, these findings can be used to develop new safety features (e.g., roof-rail airbags) or improve existing ones (e.g., seat belt pre-tensioners). Moreover, analyzing the few moments before a crash can give a better understanding of what can be done and improved before the crash happens. During the pre-crash phase, the driver might undertake maneuvers or adjustments to avoid the impact. These can bring the driver out of the ideal position, resulting in more severe injuries during the crash. This behaviour occurs to a higher extent in automated driving, where the driver is partially or completely disengaged from the driving activity and therefore unaware of the forthcoming maneuvers.

1.1.1 The crash phase
In order to understand the dynamics of an impact, a brief explanation of the different phases of a crash follows. The phases are shown in Figure 1.1. A car crash event can be split into three different phases: pre-crash, in-crash and post-crash [6].
In the pre-crash phase, the probability of a crash occurring is high. In this time frame, usually on the order of seconds, the accident can still be avoided, or its severity reduced, if actions are taken. At the end of this phase, if the crash cannot be avoided, a point of no return is reached, as shown in Figure 1.1. Usually, the main cause of car crashes is human error, but external factors such as road conditions can also contribute substantially [7]. The next step is the in-crash phase, which lasts around 150 ms. In this stage, the impact occurs; nothing can be done anymore to avoid it. The occupant of the vehicle is subjected to forces that can cause severe injuries if the delta velocity (the change of velocity during the impact) is high and the occupant comes into contact with the inner parts of the vehicle. Lastly, the post-crash phase follows the in-crash phase. Both the vehicle and the occupants are halted, and rescue takes place. In this stage, the severity of the crash can be assessed and the fatality risk can be calculated.

Figure 1.1: Events occurring during crash phase.

1.1.2 The evolution of test subjects
To better understand the dynamics of the impact, crash scenarios are reproduced using a human substitute as the vehicle occupant. At an early stage, volunteers, animals, and human cadavers were used as subjects to conduct studies [4]. In the 1950s, ethical concerns, along with anatomical differences between subjects, moved the focus to the development of mechanical dummies. A dummy, as shown in Figure 1.2, is a mechanical substitute that resembles the human body during tests and is able to record the forces and deformations during an impact [4].

Figure 1.2: Hybrid III 50th percentile male dummy model used for frontal impact test.

At first, crash test dummies were incorporated in FE simulation environments to run crash tests in a virtual environment.
This was done to reduce the effort and cost of destructive physical tests, and it allows more extensive testing of different crash scenarios with a wider variation of crash parameters. Thanks to the high precision of FE simulations, it is possible to investigate local deformations and strains. At a later stage, Finite Element Human Body Models (FE-HBMs) were developed (see the example in Figure 1.3) to reach a higher bio-fidelity of the human body and yield a more accurate prediction of injuries. In the most recent research, Active HBMs have been developed and used to incorporate the response of muscle activation in the model [39, 40]. In addition, morphing has been used to shape and adapt varying or outlying anthropometric features of HBMs [41].

Figure 1.3: HBM v.9 Female model developed by SAFER

1.2 Motivation
Pre-crash simulations are computationally expensive and time-consuming. The simulation time of pre-crash scenarios through an FE solver is extremely long due to the long duration of the event. State-of-the-art FE models are based on complex Partial Differential Equations, which involve a high number of variables and boundary conditions. Hence, obtaining a reduced representation of complex data, also called a surrogate or meta-model, can be a solution to such challenges. Dimensionality Reduction entails building a surrogate model that retains most of the complexity and features of the original model. DR reduces the number of features to a small number of meaningful components. It is convenient when working with big and complex models that have many redundant features. As a result, it is possible to obtain a clear and straightforward data visualization. A smaller model requires fewer computational resources, reducing training and computation time. The dimensionality reduction process is presented in Figure 1.4.
Figure 1.4: Surrogate model process

Specifically, the process can be divided into two main parts:
• building a low-dimensional input, also called latent space, in which the high-dimensional input is encoded;
• creating a surrogate model on the reduced dimension by mapping the model parameters to the simulation.
The model parameters (e.g., braking or steering acceleration) govern and influence the simulation outcomes. These two steps are performed using ML algorithms. The first step is an unsupervised learning task, whereas the second is a supervised task. In this project, different unsupervised methods are used to encode the input into a lower-dimensional space.

1.3 Aim and limitations
This project aims to implement a Dimensionality Reduction method based on Graph Neural Networks, to build and evaluate a surrogate model that can predict the kinematics of the pre-crash phase using input data of a Human Body Model. Consequently, a model that predicts the kinematic history of a pre-crash simulation can be used as a substitute for FE-solver-based simulations. Below follows a list specifying the research objectives for this thesis.
• Identify and investigate a GNN architecture for Dimensionality Reduction
• Compare the GNN architecture to an equivalent CNN
• Compare the GNN to Principal Component Analysis
• Evaluate different supervised methods to build the surrogate model
• Analyze the time consumption and the computational resources of the analyzed methods compared to an FE solver
Due to limited computational resources, a Hybrid III 50th percentile dummy model was used instead of the SAFER HBM v9.

2 Theory
In this chapter, a brief introduction is given to the methods used in the project: Finite Elements Methods (FEM), Dimensionality Reduction methods, Graph Neural Networks (GNN), Principal Component Analysis (PCA), and supervised learning algorithms.
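The two-step surrogate process described in the Introduction (encode the simulations into a latent space, then regress from the model parameters) can be sketched end to end. The sketch below is a hypothetical illustration on synthetic data: PCA via NumPy's SVD stands in for the Dimensionality Reduction step, and plain linear least squares stands in for the Random Forest / Gradient Boosting regressors used in the thesis; all shapes and parameter names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 simulations, each flattened to 600 values
# (e.g. node coordinates over time), driven by 2 input parameters
# (e.g. braking and steering acceleration). Shapes are illustrative only.
params = rng.uniform(-1.0, 1.0, size=(50, 2))             # FE input parameters
basis = rng.normal(size=(2, 600))
X = params @ basis + 0.01 * rng.normal(size=(50, 600))    # simulation outputs

# Step 1 (unsupervised): PCA via SVD encodes each simulation into a
# small latent space of k components.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 5
latent = (X - mean) @ Vt[:k].T                            # shape (50, k)

# Step 2 (supervised): map the input parameters to the latent coordinates.
A = np.hstack([params, np.ones((50, 1))])                 # add a bias column
W, *_ = np.linalg.lstsq(A, latent, rcond=None)

# Surrogate prediction: parameters -> latent -> reconstructed simulation.
p_new = np.array([[0.3, -0.2, 1.0]])                      # new parameters + bias
x_pred = p_new @ W @ Vt[:k] + mean
print(x_pred.shape)                                       # (1, 600)
```

Once trained, the surrogate replaces the FE solver for new parameter combinations: only the small latent vector is predicted, and the full field is recovered through the inverse of the encoding.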
2.1 Finite Elements Methods
The Finite Element Method is a numerical approach that overcomes the difficulty of solving complex Differential Equations (DEs) analytically. The analytical approach requires many simplifying hypotheses for the set of equations to be solvable for the global system, which makes it unfeasible to accurately investigate local effects. In the numerical approach, the space is considered as a collection of subdomains, which form a mesh, called Finite Elements, illustrated in Figure 2.1. For every Finite Element, a set of coupled equations is formulated and an approximate numerical solution can be found.

Figure 2.1: Hybrid III 50th percentile FE Dummy model

One of the simplest approaches to develop a numerical solution is based on Euler methods. A simple, generic case is presented to show how it is formulated. For a given equation in the time domain, the initial condition at t = t_0 is known, as shown in Equation 2.1:

ẋ(t) = f(x, t),  x(t_0) = x_0   (2.1)

Consider the n-th time step, x(t = t_n). The equation is written as:

ẋ_n = f(x_n, t_n)   (2.2)

The derivative ẋ_n is approximated by the Euler approximation:

ẋ_n ≈ (x_{n+1} − x_n) / (t_{n+1} − t_n) = (x_{n+1} − x_n) / ∆t   (2.3)

From Equations 2.2 and 2.3, it is possible to obtain:

(x_{n+1} − x_n) / ∆t = f(x_n, t_n)   (2.4)

Rewriting the equation and isolating the term x_{n+1}:

x_{n+1} = x_n + ∆t f(x_n, t_n)   (2.5)

Equation 2.5 is known as the explicit Euler method, since the next iterate is obtained from known terms, x_n and f(x_n, t_n), and is isolated on one side of the equation. This formula approximates the next value of the function from the previous step values by taking a step of size ∆t in the tangential direction. The geometrical meaning of this approximation is shown in Figure 2.2 [42]. The Euler approximation is derived from the truncated Taylor expansion of the derivative definition and therefore has a local error of O(∆t²).
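The explicit update of Equation 2.5 can be checked numerically on a problem with a known solution. The following is a minimal sketch (the test problem ẋ = −2x and the step sizes are arbitrary choices for illustration); since the local error is O(∆t²), the accumulated global error shrinks roughly in proportion to ∆t:

```python
import numpy as np

def forward_euler(f, x0, t0, t_end, dt):
    """Integrate x' = f(x, t), x(t0) = x0, with x_{n+1} = x_n + dt*f(x_n, t_n)."""
    n = round((t_end - t0) / dt)   # number of steps
    x, t = x0, t0
    for _ in range(n):
        x = x + dt * f(x, t)
        t += dt
    return x

# Test problem: x'(t) = -2*x, x(0) = 1, exact solution x(t) = exp(-2*t).
f = lambda x, t: -2.0 * x
exact = np.exp(-2.0)

for dt in (0.1, 0.05, 0.025):
    err = abs(forward_euler(f, 1.0, 0.0, 1.0, dt) - exact)
    print(dt, err)   # halving dt roughly halves the global error
```

The same loop illustrates why explicit solvers need small steps: with too large a ∆t the iteration can become unstable, which is the trade-off discussed for LS-DYNA below.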
Figure 2.2: Single step Euler approximation

Alternatively, another formulation can be derived. Rewriting Equation 2.3 with respect to the previous time step gives:

ẋ_n = (x_n − x_{n−1}) / ∆t   (2.6)

and thus:

x_n = x_{n−1} + ∆t f(x_n, t_n)   (2.7)

Re-indexing from n to n + 1:

x_{n+1} = x_n + ∆t f(x_{n+1}, t_{n+1})   (2.8)

Equation 2.8 is the implicit, or backward, Euler formula. It is important to notice that in this version the term x_{n+1} appears on both sides of the equation; therefore, knowing the current step is not sufficient to solve the equation. The implicit and explicit Euler formulas are two different methods to approximate and solve a differential equation, and numerical solvers can be based on either. Implicit methods are generally used for static and dynamic analyses when investigating linear or quasi-linear effects [43]. However, multiple iterations and matrix inversions are needed to solve the equations, so the approach is computationally heavy on memory. In contrast, the explicit method, due to its iterative formulation, is lighter on memory usage and suited for highly non-linear geometrical and material effects. However, since the method can be unstable, small steps need to be taken to ensure convergence, making it suitable only for short simulations [43]. Overall, there are different trade-offs between choosing an implicit or an explicit solver, and the choice depends on the field of application. In this project, an explicit solver such as LS-DYNA is more suitable, due to the high non-linearity and short duration of crash simulations.

The mesh
The fundamental building blocks that constitute a mesh are shell, solid, beam, and discrete elements. The different element types are shown in Figure 2.3. In particular:
• Shell elements are composed of 4 nodes, or vertices.
• Solid elements are defined by 8 nodes. They can be tetrahedrons, pentahedrons, or hexahedrons.
• Beam elements are defined by 3 nodes, where the first two describe the geometry and the third the orientation.
• Discrete elements are defined by 2 nodes; they can be either a spring or a damper between two nodes.
In addition, there is one more category, called constrained parts or elements. The constrained category applies constraints within the model and its structural parts. The distance between two constrained nodes is fixed and does not vary throughout the simulation. They can be used, for instance, to model a joint or to restrict the degrees of freedom of parts.

Figure 2.3: Different FE elements [37]: (a) shell elements, (b) solid elements, (c) beam element, (d) discrete elements.

2.2 Machine Learning
In recent years, Machine Learning (ML) has gained importance and found application in many fields. Machine Learning is a sub-field of Artificial Intelligence (AI) that aims to use data and algorithms to imitate the way humans learn, which entails the ability to improve over time [9]. ML can be classified into different categories based on the learning rule: supervised or unsupervised. A supervised algorithm uses labeled data during the training process to validate the predictions made. It requires a 'supervisor' to determine and label the data before they are used in the training process. Supervised models are generally used for classification or regression tasks. In contrast, unsupervised models do not require labels but seek patterns in the dataset itself. They are typically used for dimensionality reduction, clustering, and association tasks [18]. Often, these algorithms are based on Artificial Neural Networks (ANNs).

2.2.1 What is a Neural Network?
The name Neural Network comes from the neuron, the biological element of the brain's network. In the brain, a neuron receives and processes inputs and eventually transmits output signals to other neurons. ANNs are designed inspired by this physiological process.
Similarly, the building block of an ANN is called a neuron; neurons form an intricate, interconnected structure that can be extended up to millions of nodes, as found in a brain. The similarities between a biological and an artificial neuron are highlighted in Figure 2.4 [1]. As shown in the figure, a weight is associated with every input. In the case of multiple inputs, the neuron outputs a weighted sum of the inputs. To continue the process, the output of the first layer of neurons is propagated to the next layer, and so on [16]. This structure, shown in Figure 2.5, is called a Feed Forward Neural Network (FFNN), as the output is always forward-propagated to the next layer. The layers located between the input and output are called hidden layers. The FFNN shown in Figure 2.5 is also said to consist of fully connected layers, meaning that each node of a layer is connected to all the units of the previous and next layers.

Figure 2.4: Structure of biological and artificial neuron [1].

The learning process is driven by different parameters. An ANN has trainable parameters (e.g., the weights of the connections between the layers) that are adjusted during the training process. In contrast, the other category of parameters, called hyperparameters, control the training process and are set beforehand (e.g., learning rate, number of epochs, number of layers, and many others). ANNs learn tasks from empirical samples, or data, during the training phase. No previous knowledge or theory is used by the model to learn, which is why a considerable amount of data is required.

Figure 2.5: An example of a Feed-forward Neural Network composed of three layers [17].

The training process updates the weights at every cycle, or epoch, based on the performance of the network. The weights are adjusted to minimize the cost function, or loss. The loss is a function that relates the model output at a given step to the corresponding target value.
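The weighted-sum neuron and the loss described above can be made concrete with a minimal NumPy forward pass through a two-layer fully connected network. The layer sizes, random weights, and ReLU activation here are illustrative assumptions, not the architecture used in the thesis:

```python
import numpy as np

def relu(z):
    # A common activation function: passes positive values, zeroes out the rest.
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # input features

# Each neuron outputs a weighted sum of its inputs plus a bias.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

h = relu(W1 @ x + b1)                           # hidden activations
y_pred = W2 @ h + b2                            # network output (forward pass)

# The loss relates the prediction to the target ("ground truth");
# here the Mean Square Error is used.
y_true = np.array([1.0])
mse = np.mean((y_pred - y_true) ** 2)
print(float(mse))
```

Training would repeatedly evaluate this loss and adjust `W1`, `b1`, `W2`, `b2` to decrease it (via backpropagation), which is the weight update the text refers to.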
A low loss generally indicates that the model prediction is close to the ground truth.

2.2.2 Convolutional Neural Networks
The structure of fully connected layers is not applicable to large inputs, such as images, as connections would need to be stored for every pixel. In addition, such a network does not use any spatial information. To overcome this challenge, CNNs have been developed and have become widely applied in the computer vision field. Convolutional Neural Networks are a type of ANN that uses the convolution operation. The idea is to consider the data structure as a grid and to share information within a smaller area, called the receptive field. An image is a good example of a grid structure, where every pixel of the image represents one node of the grid. The convolution operation extracts meaningful features from the data structure. The convolution starts with a kernel, which is a small matrix of learnable parameters, or weights. Figure 2.6 shows the different phases of the convolution process. The kernel slides over a 2D plane and computes the element-wise product between the weights and the pixels (see Subfigures a and b of 2.6). The weighted products are then summed into an output pixel. The process is repeated for the entire grid. By sliding over the input plane, the convolution process results in a feature map, which is a weighted sum of the input features and a smaller representation of the input (as shown in Figure 2.6c).

2.2.3 Graph Convolutional Networks
To introduce Graph Convolutional Networks, a few steps need to be taken. First, a definition of a graph is presented, followed by what a GNN is and, lastly, how convolution is performed on a graph.

What is a graph?
Formally, a graph, shown in Figure 2.7a, is an object composed of nodes (see Subfigure 2.7b) that are connected by edges (see Subfigure 2.7c). The connections indicate a relationship between the nodes.
Mathematically, a graph is expressed as:

G = (V, E) (2.9)

Here, V indicates the vertices, or nodes, and E the edges between them. A direction of the edges can be defined for sharing information between the nodes. Given the graph structure, the information can be stored in the nodes and/or the edges. This means that the relationships expressed by the edges are part of the information too. To store a graph structure and its information, an adjacency matrix can be derived. The adjacency matrix is a binary matrix of size n × n, where n is the number of nodes. If, for example, node j is connected to node k, a 1 is placed at entries (j, k) and (k, j) of the matrix; otherwise, if there is no connection, the entry is zero. With this convention, the adjacency matrix is a hollow matrix (i.e., zeros on the diagonal), since a node does not have a connection to itself.

Figure 2.6: CNN convolution operation process [24].

Figure 2.7: General representation of a graph with (a) the full graph, (b) nodes highlighted, and (c) edges highlighted [20]

Besides the matrix, there is another way to represent the relationships expressed by the edges: the adjacency list. The list is of size [2, num_edges], and it stores the link between each pair of connected nodes. For example, if nodes j and k are connected, the list contains the pair [j, k] [21]. In Figure 2.8, an example of a graph structure and its derived edge list are presented. The number inside each node indicates the label of the node. The node feature contains the information stored in the node, e.g., -1 for the node x1 [21].

Figure 2.8: An example of a graph [21]

The adjacency list and matrix of Figure 2.8, highlighting the node connections, would be¹:

    adj_list = [[0, 1, 1, 2],
                [1, 0, 2, 1]]

    adj_matrix = [[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]]

A Graph Neural Network

A Graph Neural Network is a neural network that operates on graph-structured data [20].
The peculiarity of a graph is its lack of an ordered grid structure: the nodes of a graph are unordered. A GNN operates on a graph in a way similar to how a neural network operates on images, for example. This has enabled problems in many fields to be approached from a new perspective. To cite a few, GNNs have found wide application and success in areas such as chemistry and social networks. In the former case, the nodes and edges of a graph encode a molecule, made of atoms and bonds [23]. In the latter, the nodes represent people's profiles and the edges their connections, such as ads or friend recommendations.

¹From: https://pytorch-geometric.readthedocs.io/en/latest/, "Introduction by example"

Graph Convolution

In Graph Neural Networks, convolution works differently than in a standard CNN. A graph, by definition, lacks an ordered structure. Compared to an image, in which the position of each pixel is unique and determined, in a graph there is no such thing. There are, however, other ways to aggregate information. The method is called Message Passing, and it was first introduced by T. Kipf and M. Welling; it is illustrated in Figure 2.9 [22]. Starting from the left, given a selected node, the surrounding nodes (represented by the hollow circles) within h hops are gathered and their features are aggregated. The adjacency matrix is used to gather the information about neighboring connections. The mathematical operation with which the node features are combined is called the aggregation function; typically the sum, maximum, or mean is used. As a result, in the next layer, shown on the right of Figure 2.9, the node features are updated.

Figure 2.9: Graph convolution operation [20]

In this project, the convolution layer is the GCN class of [22]. The aggregation function adopted is the sum, meaning that the node features are summed with each other, as shown in Figure 2.9.
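A minimal NumPy sketch of one sum-aggregation message-passing step, using the three-node path graph from the earlier adjacency example (the feature values are arbitrary):

```python
import numpy as np

# Adjacency list in [2, num_edges] form for the path graph 0 - 1 - 2
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])

# One 3-dimensional feature vector per node (arbitrary values)
x = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def message_passing_sum(x, edge_index):
    """For every node, sum the feature vectors of its neighbors."""
    out = np.zeros_like(x)
    for src, dst in zip(edge_index[0], edge_index[1]):
        out[dst] += x[src]   # each edge sends the source features to the target
    return out

h = message_passing_sum(x, edge_index)
# node 1 has neighbors 0 and 2, so its new features are x[0] + x[2]
```

A real GCN layer additionally applies a learned weight matrix and degree normalization to the aggregated features; this sketch only illustrates the neighborhood aggregation itself.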
The convolution also entails self-loops. This implies that, when aggregating the features of the surrounding nodes, the features of the selected node itself are aggregated as well.

2.2.4 Unsupervised methods

An unsupervised neural network works with unlabeled data. Because of this characteristic, an unsupervised model is used to find similarities or differences in data, to cluster a dataset, or to reduce its dimensionality. In this project, three different Dimensionality Reduction methods have been investigated: Graph Convolutional Autoencoder, Convolutional Neural Network, and Principal Component Analysis (PCA).

Graph Convolutional Autoencoder

An autoencoder is a type of neural network that encodes the input data into a latent space, a reduced representation of the original input, and then decodes it back to its original form. One of the strengths of the autoencoder is its capacity to accurately reconstruct the input data from the latent space. The variant used in this project is called a convolutional autoencoder, due to the use of convolutional layers in both the encoding and decoding parts.

The architecture

The autoencoder, shown in Figure 2.10, is composed of two parts: the encoder, on the left, and the decoder. The former encodes the input into the latent space, whereas the latter reconstructs the input from the latent space. The latent space, also called the bottleneck, is usually a smaller representation of the input.

Figure 2.10: General architecture of an autoencoder [25].

Specifically, the encoder is composed of a few different building blocks: convolutional layers, a flatten layer, a pooling layer, and linear, or fully connected, layers. Correspondingly, the decoder is composed of convolutional layers, an unflatten layer, an unpooling layer to make it consistent with the encoding part, and linear layers. The graph convolutional layer implements the Message Passing function.
The flatten layer flattens a multidimensional input tensor to a 1D tensor. This layer is fundamental since the latent space is a 1D tensor. The unflatten layer is implemented using a reshape function to achieve the opposite outcome. A pooling layer reduces the dimension of the input by a selected factor, according to a selection criterion. The most common pooling layers are Max Pooling and Average Pooling. In Max Pooling, only the maximum feature within the kernel window is kept and the others are discarded, whereas in Average Pooling the average of the features within the window is computed. The pooling layer belongs to the encoder, where the features are reduced. The linear layer is a fully connected layer. In the encoder, it is placed after the pooling layer to reduce the number of features. This layer allows the model to go from a few thousand features to only a few, forming the latent space. In the decoder, conversely, it increases the number of features from the latent layer to the next one.

Principal Component Analysis

PCA is an ML method for dimensionality reduction that geometrically projects the observations onto a reduced set of variables called Principal Components (PCs). It is often used with large datasets that are computationally hard to handle. The advantage is a smaller dataset that is easier to process and still contains most of the information [19]. The idea behind the method is to preserve as much of the variability of the dataset as possible [19]. This is achieved by finding new variables, the PCs, that are linear functions of the original variables. PCA minimizes the distance between the data and their projection onto the PCs and, as a result, maximizes the retained variance. For this reason, few components are needed to represent the majority of the variance in the dataset.
The PCs found are uncorrelated with each other, and they are sorted so that the first one contains the highest variance. Figure 2.11 below shows an example: the observations and the PCs are drawn on a 2D plane. The dataset is represented by two principal components (the two orthogonal vectors). The first component lies in the direction that contains most of the information, i.e., where the variance is maximized.

Figure 2.11: Principal component analysis on a sample dataset

To further explain the method, the process can be divided into the following steps:

1. Standardize the dataset
2. Compute the covariance matrix to identify correlations
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
4. Create a feature vector to decide which principal components to keep
5. Recast the data along the principal component axes

Standardize the dataset

The standardization, or Z-score, consists of centering the data using the mean and scaling by the standard deviation. The transformation prevents variables with larger scales from dominating the analysis. The observations are scaled according to the following formula:

z = (value − mean) / standard_deviation (2.10)

Compute the covariance matrix and identify the principal components

The second step consists of computing the covariance matrix of the standardized data to find possible correlations. The covariance matrix is a symmetric matrix of size p × p, where p is the number of variables. From the covariance matrix, using the eigendecomposition, the principal components are obtained as the eigenvectors. Assume that the dataset contains p variables with n observations each, so that it forms a p × n matrix X, X ∈ R^(p×n). First, we look for the linear combination of the variables that maximizes the variance.
The linear combination is given by:

a^T x = a_1 x_1 + a_2 x_2 + ... + a_p x_p (2.11)

where a is the vector of coefficients. The variance of the linear combination can be expressed as

var(a^T x) = a^T S a (2.12)

where a^T is the transpose of a and S is the covariance matrix. To solve the maximization problem, we first assume that a is a unit-norm vector, namely a^T a = 1. The problem then becomes maximizing:

a^T S a − λ(a^T a − 1) (2.13)

By differentiating with respect to a and setting the derivative to zero:

S a − λ a = 0  ⇒  S a = λ a (2.14)

From this equation we see that a corresponds to an eigenvector and λ to an eigenvalue of the covariance matrix S. As an example, the covariance matrix S for a dataset with three variables is:

    S = | Cov(x, x)  Cov(x, y)  Cov(x, z) |
        | Cov(y, x)  Cov(y, y)  Cov(y, z) |   (2.15)
        | Cov(z, x)  Cov(z, y)  Cov(z, z) |

Once the eigenvalues have been found, the largest eigenvalue corresponds to the first principal component, since it represents the most variance [19].

Create a feature vector to decide which principal components to keep

In this step the dimensionality reduction takes place. Once the components have been found and the most significant ones identified, a feature vector is created. The feature vector contains the kept eigenvectors in its columns. Usually, only some of the components are kept, since they already explain enough of the variance. To give an idea, the review by Jolliffe and Cadima [19] reports that 70% of the variance is a good cut-off or that, depending on the case, 2-3 PCs are enough, since most of the variance is usually contained in the first components.

Recast the data along the principal component axes

In the last step, the dataset is reoriented from the original axes to the axes identified by the principal components. This is done by multiplying the transposed feature vector with the transposed standardized dataset.
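The eigendecomposition route described above (standardize, covariance, eigenvectors, recast) can be sketched with NumPy on an arbitrary toy dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))          # p = 3 variables, n = 100 observations
X[1] += 0.8 * X[0]                     # introduce some correlation

# Step 1: standardize each variable (z-score)
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Step 2: covariance matrix (p x p) of the standardized data
S = np.cov(Z)

# Step 3: eigendecomposition; sort components by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Steps 4-5: keep the first two components and recast the data
feature_vector = eigvecs[:, :2]        # kept eigenvectors as columns
scores = feature_vector.T @ Z          # projected (recast) dataset, shape (2, n)

explained = eigvals[:2].sum() / eigvals.sum()   # fraction of variance kept
```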
feature_vector^T · standardized_dataset^T (2.16)

2.2.5 Supervised methods

The supervised methods implemented in this project are regression algorithms. A regression relates a dependent variable to one or more independent variables, or predictors. It is considered a supervised algorithm because the dependent variable acts as the label, or target. A simple example is Linear Regression, shown in Figure 2.12, which finds the best linear fit between two variables. The best possible fit is achieved when the residual sum, i.e., the sum of squared distances between the points and the straight line, is minimal. The points represent the observations, whereas the line is the best fit found by the algorithm [26].

Figure 2.12: Example of Linear regression.

However, there are also other methods based on different algorithms. Ensemble methods combine different models to increase the accuracy of the results. There are three groups of ensemble methods: bagging, boosting, and stacking. Bagging is short for bootstrap aggregating. Bootstrapping is a sampling technique that uses sampling with replacement.² Aggregation consists of combining the individual predictions when making the final prediction. The bagging method reduces the variance and the risk of over-fitting, i.e., the situation in which the model learns the training data so well that it also learns the noise, worsening the predictions on unseen data. The Random Forest method belongs to this category. Boosting methods, instead, base their predictions on learning from previous mistakes, which improves the accuracy of the prediction over time. Both Gradient Boosting and XGBoost are boosting methods. Lastly, stacking, or stacked generalization, combines several bagging or boosting models by training a further model on their predictions.

²Sampling with replacement means that when an item is drawn from the distribution, it is placed back before the next draw.
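The three ensemble families named above map directly onto scikit-learn estimators; a minimal sketch on synthetic data (the model choices and hyperparameters are illustrative only):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

bagging = RandomForestRegressor(n_estimators=50, random_state=0)       # bagging
boosting = GradientBoostingRegressor(n_estimators=50, random_state=0)  # boosting
stacking = StackingRegressor(                                          # stacking
    estimators=[("rf", bagging), ("gb", boosting)],
    final_estimator=Ridge())

for model in (bagging, boosting, stacking):
    model.fit(X, y)

preds = stacking.predict(X[:3])
```

XGBoost is distributed as a separate package (`xgboost`) with a scikit-learn-compatible interface, and can be slotted into the same workflow.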
Random Forest

To explain the Random Forest methodology, a brief introduction to decision trees is given first. A decision tree is a tool that uses binary logical operations to make decisions based on the given data. Visually, it is represented by a flowchart of decisions with binary outcomes, as shown in Figure 2.13. The structure is simple: the root and internal nodes are decisions, and every decision has an outcome, which can either be another decision (an internal node) or a leaf node. The leaf node represents the end of the branch, namely its outcome. The depth of the tree is determined by the minimum number of questions needed to reach an accurate prediction.

Figure 2.13: Model of a decision tree [28].

Random Forest combines several decision trees built on various sub-samples of the dataset. During the training phase, the algorithm trains multiple decision trees in parallel. When making a prediction, the results of all the trees are averaged to improve accuracy and limit overfitting, as shown in Figure 2.14.

Figure 2.14: Random forest method [29].

Gradient Boosting

Gradient Boosting belongs to the boosting category. Like Random Forest, it is based on decision trees. However, instead of building the trees in parallel, the trees are built consecutively. As the name suggests, a gradient-descent algorithm is used: each new tree is fitted so as to reduce the gradient of the loss. The algorithm is shown in Figure 2.15. First, it builds a tree and computes the error, i.e., the residuals. The second tree is then built on the errors of the previous one. The process iterates the same steps, so the error decreases over time, since the algorithm learns from previous mistakes. Moreover, there is one important hyperparameter, the learning rate: every tree is scaled by it. In this way, not every tree contributes in the same way, and only the more significant ones contribute strongly to the final prediction.
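The residual-fitting loop described above can be sketched with scikit-learn decision trees (a toy illustration of the principle, not the library's optimized implementation; data and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from the mean prediction
trees = []

for _ in range(100):
    residuals = y - prediction                     # errors of the current model
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                         # next tree fits the residuals
    prediction += learning_rate * tree.predict(X)  # scaled by the learning rate
    trees.append(tree)

mse = np.mean((y - prediction) ** 2)   # training error shrinks over iterations
```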
Figure 2.15: The additive process of the Gradient Boosting algorithm [30].

XGBoost

XGBoost stands for Extreme Gradient Boosting. It can be considered a development of Gradient Boosting, optimized for faster computation and better performance. Here, the building of each individual tree is parallelized, while the boosting iterations remain sequential. The goal, as for the other algorithms, is to reduce the residuals between the predictions of the trees and the target values. The process consists of a few steps [32]. First, a tree is created and its prediction is made; the average of the predictions is computed and the residuals are calculated with the loss function of choice. Next, a similarity score is calculated as follows:

similarity score = (Σ residuals)² / (n_residuals + λ) (2.17)

where λ is the L2 regularization term and n_residuals the number of residuals in the leaf. Using the similarity score, the information gain is calculated. The information gain measures the quality of a tree structure, in particular which feature or split-point maximizes the gain. Mathematically, it is expressed as:

L_gain = 1/2 [ (Σ_{i∈I_L} g_i)² / (Σ_{i∈I_L} h_i + λ) + (Σ_{i∈I_R} g_i)² / (Σ_{i∈I_R} h_i + λ) − (Σ_{i∈I} g_i)² / (Σ_{i∈I} h_i + λ) ] − γ (2.18)

where the g terms are the gradients of the loss function, so that the numerators are squared sums of residuals. The sums in the denominators give H, the Hessian of the loss function, which for the squared loss corresponds to the number of residuals. The first term is the information gathered from the left child node, the second term from the right child node, and the last one from the root node. γ is a hyperparameter representing a fixed threshold on the gain improvement required to keep a split [31]. The equation can therefore be summarized as:

L_gain = Left_child_similarity + Right_child_similarity − Root_node_similarity (2.19)

During the tree-building process, the algorithm stops for two reasons.
The first occurs when the tree has reached its predefined maximum depth, whereas the second takes place when the sum of the Hessians in a node falls below a minimum threshold. In addition, at the end of the building process, a pruning step takes place.³ A node is pruned if its gain value is lower than γ. If, instead, the gain value is higher than γ, the pruning process stops and does not check the parent nodes of that particular branch. The final steps involve predicting the residual values of the trees and scaling them by the learning rate.

³Pruning decreases the size of the trees. In the process, every split is evaluated; if given conditions are satisfied, the split is considered redundant and removed.

2.2.6 Error Metrics

To evaluate the performance of the different methods, different error metrics are used. They relate the prediction of the method to the target value in order to quantify the accuracy and the performance of the model.

1. Mean Square Error (MSE)
Also known as the L2 loss, it is defined as the mean of the squared residuals:

MSE = (1/n) Σ_{i=1}^{n} (y_true,i − ŷ_predicted,i)² (2.20)

where n is the number of samples. It is expressed in mm².

2. Root Mean Square Error (RMSE)
It is the square root of the MSE. It is useful since the metric is expressed in mm:

RMSE = √( (1/n) Σ_{i=1}^{n} (y_true,i − ŷ_predicted,i)² ) (2.21)

3. Mean Absolute Error (MAE)
Also known as the L1 loss, it computes the expected value of the absolute error:

MAE = (1/n) Σ_{i=1}^{n} |y_true,i − ŷ_predicted,i| (2.22)

3 Methods

Firstly, an overview of the models and their features is given, to better understand the workflow and the reasons for some choices. Then, the workflow itself is presented. This includes data acquisition, extraction, and arrangement for the dimensionality reduction methods used to build the meta-model. The methods compared were Graph Autoencoder, Convolutional Neural Network, and Principal Component Analysis.
Supervised regression tasks, introduced in Section 2.2.5, namely Random Forest, Gradient Boosting, and XGBoost, have been implemented on the reduced dimensional space to link the simulation parameters to the latent space. Lastly, from the surrogate model, the latent space for the test data was reconstructed back to its original size to compare it with the original data and to estimate the error.

At the beginning of this project, a simplified model was used to study and implement the previously mentioned workflow. The reason for this choice is that a simpler simulation is faster to run and analyze. The simulation models a beam object moving in 3D space. The beam starts from an equilibrium position (see Figure 3.2a) and is first exposed to a force on one side (see the arrow in Figure 3.2b), along with another force, labeled the release force, that prevents the object from moving. At a certain point, the latter is removed and the object is free to move in space. Since the object was initially blocked on one side, when released it moves through space with high-frequency oscillations. Geometrically, the beam is meshed with quad elements,¹ as shown in Figure 3.1. It contains 104 nodes. The simulation outputs 102 discrete timesteps, corresponding to a duration of 100 ms.

Figure 3.1: LS-DYNA Element shell

In the simulation, multiple parameters regulate the forces, the component setup, and the dynamics of the parts during the events. These will be referred to as FE simulation parameters. The values of the parameters were chosen arbitrarily following empirical judgment.

¹4 vertices per element

Figure 3.2: The beam model. (a) Initial position. (b) After 70 ms, the constraint is removed and the object moves freely in 3D space.

For the beam model, the FE parameters were the acceleration applied to the extremity of the beam and the release force; they were varied as shown in Table 3.1.
FE Beam model
Parameters     Unit     min   max
Release        [N]      1     15
Acceleration   [m/s²]   0.1   1.2

Table 3.1: FE simulation parameters for the beam object

Once the methods had been implemented and the first results obtained, the developed pipeline was applied to pre-crash phase simulations of a dummy model. The pre-crash simulation is composed of a Hybrid III dummy model of a 50th percentile male seated in the driver's position inside the vehicle cockpit. The portion of the cockpit in the simulation contains only the driver's side, as shown in Figure 3.3a. The simulation starts with the driver being positioned in the seat, also known as the settling phase, which lasts 300 ms. Afterward, a pulse is applied and the vehicle moves at a constant velocity, approaching the pre-crash phase, until a braking or steering maneuver takes place. The pre-crash phase lasts 2.3 seconds. The simulation is controlled by 5 FE parameters (see Table 3.2). The pre-pretensioner tightens the seat belt and makes it adhere to the dummy; a tense seatbelt limits the possibility of the shoulder belt sliding off in case of critical events. Different scaling factors have been chosen (see Table 3.2) that influence the kinematics of the dummy during the pre-crash or in-crash phase. At 640 ms, the retractors of the shoulder belt and the lap belt are locked. The retractor forces the belt to lock in case of a sudden change in the delta velocity. The delay of the braking or steering acceleration was varied between 0 ms, corresponding to the end of the settling phase, and 700 ms, i.e., 1 second after the start of the simulation. This parameter changes the initiation of the pre-crash phase, where maneuvers such as steering or braking occur. The acceleration values of the two maneuver parameters change the range of motion during the pre-crash phase: a higher braking acceleration increases the forward motion of the dummy. For every simulation, only braking or only steering is applied.
This was chosen to limit the complexity of the simulation and thus aim at a lower training time for the NN models.

FE Dummy model
Parameters                               Unit    min     max
Scaling factor pre-pretensioner force    -       0.001   2
Braking or steering initiation²          -       0       1
Delay braking or steering initiation     [ms]    0       700
Braking acceleration                     [g]     0.05    0.7
Steering acceleration                    [g]     0.05    0.7
Lock time belt pretensioner              [ms]    300     300

Table 3.2: Pre-crash FE parameters

Figure 3.3: Pre-crash simulation. (a) Initial position. (b) After 600 ms, the dummy is subjected to a steering acceleration.

The node history of the dummy is sampled every 25 milliseconds for a total of 81 timesteps. For the purpose of the project, only the nodes belonging to the dummy were taken into consideration, leaving out all the ones belonging to the vehicle (including the seat belt). The dummy model is composed of 7444 nodes. Its geometrical structure is more complex than that of the beam simulation: it is composed of solid, shell, discrete, and beam elements (see Section 2.1).

3.1 Data extraction

As mentioned above, two different models have been used. The simulations were performed using the LS-DYNA solver, version R11.1.0 for the beam model and R9.3.0 for the pre-crash simulations. To train the ML methods, multiple simulations with varying parameter values are needed. For this purpose, the Dynakit Python library³ was utilized. The parameter values were obtained by random sampling from a uniform distribution. After running the simulations, the FE simulation parameters were extracted from the d3hsp file, an output file of the LS-DYNA solver.

²The braking or steering initiation range is within [0, 1], where 0 corresponds to only steering and 1 to only braking.
³Available at https://github.com/dynapy/dynakit

The data has been extracted from the D3plot of each simulation using the lasso-python open-source library from LASSO GmbH [34].
The nodal history acquired in this step constitutes the dataset used in the project. The D3plot stores the time history of every node in the simulated object. Since the focus of this project is the kinematics of the beam and the dummy model, only nodes belonging to these were extracted. For the pre-crash simulation, since the dummy and the vehicle move at a constant velocity, a reference point belonging to the dummy was selected and centered to form a new coordinate system. Had this step not been taken, the relative displacements of the nodes due to the pre-crash adjustments of the dummy would have been obscured by the distance covered at constant velocity. In both cases, the relative displacement was selected; specifically, the initial position of each node was subtracted at every timestep to obtain the relative values. In both models, the displacement is measured in millimeters. To sum up, the node history extracted from the D3plot is stored in a 4D tensor of shape:

[n_simulation, n_timesteps, n_nodes, xyz_displacement] (3.1)

and the simulation parameters are stored in a tensor of size:

[n_simulation, n_FE-parameters] (3.2)

80% of the displacement and parameter tensors was kept for training and 20% for validation.

3.2 Dimensionality Reduction I - GNN

The displacement tensor extracted from the D3plot was structured as a 4D tensor of the following dimensions:

[n_simulation, n_timesteps, n_nodes, xyz_displacement] (3.3)

The following approach was taken:

• The simulation and timestep dimensions were stacked together to create a sample dimension of size n_samples = n_simulation · n_timesteps. As a result, one sample is equivalent to one timestep.
• The nodes and their displacements were extracted, to be later stored as node features.

For this part of the project, PyG, the PyTorch Geometric library, was used [21].
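The relative-displacement computation, the stacking of simulations and timesteps, and the 80/20 split described above amount to simple tensor operations; a NumPy sketch with arbitrary toy dimensions:

```python
import numpy as np

n_sim, n_t, n_nodes = 4, 10, 104          # toy sizes (beam-like model)
rng = np.random.default_rng(0)
positions = rng.normal(size=(n_sim, n_t, n_nodes, 3))  # absolute xyz positions

# Relative displacement: subtract each node's initial position (timestep 0)
displacement = positions - positions[:, :1, :, :]

# Stack simulations and timesteps into one sample dimension:
# [n_sim, n_t, n_nodes, 3] -> [n_sim * n_t, n_nodes, 3]
samples = displacement.reshape(n_sim * n_t, n_nodes, 3)

# 80/20 train/validation split along the simulation axis (whole simulations)
n_train = int(0.8 * n_sim)
train = displacement[:n_train].reshape(-1, n_nodes, 3)
val = displacement[n_train:].reshape(-1, n_nodes, 3)
```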
As required by the library, the node labels were relabeled from the range [1, 205] to [1, 104] for the beam model, and from [1000001, 1011769] to [1, 7444] for the pre-crash simulation. In addition, in the pre-crash simulation, 3 nodes used for positioning were removed. These nodes were polluting the dataset of relative displacements: since they moved at a constant velocity, they had high displacement values. They were used only to position the dummy during the settling phase, and therefore their removal does not alter the final results. The model thus ended up being composed of 7441 nodes.

Lastly, one more step was needed to create the dataset. Since GNNs also take the structure of the model into consideration, this structure had to be extracted from the FE model definition. As mentioned in the Theory, the nodes belonging to solid, beam, discrete, and shell elements were extracted. From the node connectivity, an adjacency list was created. Notably, not only the directly adjacent nodes were inserted, but all the nodes within one element of the selected node. As an example, Figure 3.4 shows the adjacency list for node 1 of an imaginary quad structure. Following this criterion, the adjacency list results in:

node 1 : → [2, 3, 4, 5, 6, 7, 8, 9]

Figure 3.4: The edge list connections for an imaginary structure.

In fact, if we were to pull node 3, the entire structure would deform, and not only nodes 2 and 8 would be influenced; hence the choice of also including node 1 among the connections of node 3. Following this principle, the resulting number of edges is presented in Table 3.3.⁴

Model                  Number of edges
Beam model             [2, 656]
Pre-crash simulation   [2, 96 928]

Table 3.3: Number of edges in the models

⁴The first dimension, 2, indicates the two nodes connected by each edge. See the adjacency list of Figure 2.8 in the Theory.

Next, once the data preparation was completed, a new dataset needed to be built by joining the nodal history and the edge information. The PyTorch Geometric library offers a class, named Data, that builds a geometric dataset.
This class enables a graph to be defined following the definition in Equation 2.9, namely by specifying the node features and the connections together:

data = Data(x=x, edge_index=edge_index) (3.4)

Data.x is the feature tensor, of shape [n_nodes, 3], where n_nodes corresponds to the node labels and the three dimensions are the x, y, z displacements of each node. Data.edge_index stores the edge connections of the entire model. For the graph convolution, the simulation and timestep dimensions are stacked together into one dimension. As a result, one graph is created for every timestep: the geometric dataset is a list of Data objects of length n_sim · n_timesteps.

In addition, the data has been normalized or standardized. In the first case, all values were scaled by the maximum increment between two timesteps. The scale factor was 13.13 mm for the beam model and 25.52 mm for the pre-crash simulation. The other method, performed only for the pre-crash simulation, was the z-score: it centers the dataset around the mean value and scales it by the standard deviation, as shown in Equation 2.10.

The model and the features of an autoencoder were introduced in Section 2.2.4. Different architectures have been tested and studied, by varying combinations of layer sizes and network depths. The reason why the architectures of the two models differ lies in the fact that they work with different dataset sizes, leading to slightly different approaches. For example, in the pre-crash simulation, the linear layer was combined with a pooling layer due to high memory and RAM usage: linear layers, by their structure, take a considerable amount of RAM, having to store a weight and a value for every node. The architectures adopted are shown in Tables 3.4 and 3.5.
Beam model

Encoder
Layers            In-channel   Out-channel   Output shape
Linear            3            3             [104, 3]
GCN convolution   3            3             [104, 3]
ReLU              -            -             -
GCN convolution   3            3             [104, 3]
ReLU              -            -             -
Flatten           -            -             312
Linear            312          150           150
ReLU              -            -             -
Linear            150          12            12

Decoder
Linear            12           150           150
ReLU              -            -             -
Linear            150          312           312
ReLU              -            -             -
Reshape           -            -             [104, 3]
GCN convolution   3            3             [104, 3]
ReLU              -            -             -
GCN convolution   3            3             [104, 3]
ReLU              -            -             -
Linear            3            3             [104, 3]

Table 3.4: Graph Autoencoder architecture for the beam model

Different setups have been implemented and will be presented in the Results section. A study of the latent space size was performed to investigate how it influences the performance of the model. For the beam model, the number of samples and the latent space size were varied and the performance of the model was evaluated.

Pre-crash simulation

Encoder
Layers            In-channel   Out-channel      Output shape
Linear            3            3                [7441, 3]
ReLU              -            -                -
GCN convolution   3            3                [7441, 3]
ReLU              -            -                -
GCN convolution   3            3                [7441, 3]
ReLU              -            -                -
Flatten           -            -                22323
MaxPooling        -            in-channel / 2   11161
Linear            11161        20               20

Decoder
Linear            20           11161            11161
Upsample          -            2 · in-channel   22323
ReLU              -            -                -
Reshape           -            -                [7441, 3]
GCN convolution   3            3                [7441, 3]
ReLU              -            -                -
GCN convolution   3            3                [7441, 3]
ReLU              -            -                -
Linear            3            3                [7441, 3]

Table 3.5: Graph Autoencoder architecture for the pre-crash simulation

The sample sizes considered and analyzed were 25, 50, 75, and 100 simulations. For the pre-crash simulation, the comparison of sample sizes was not performed due to computational limits; the number of samples was 75 and the number of varied parameters was 5. In addition, different methods such as scaling and standardization have been tested. The network was trained on a Google Colab PRO Tesla T4 GPU. The models were trained with the Adam optimizer, and the hyperparameters are shown in Table 3.6.
Table 3.6: Training hyperparameters (epochs and learning rate)

  Hyperparameter   Beam model   Pre-crash simulation
  Learning rate    1e-5         0.001
  Epochs           100          500

After training, for each implementation the dataset was re-scaled back to its initial values to estimate the real error. MAE and MSE were computed to evaluate the result of the autoencoder.

3.3 Dimensionality Reduction II - CNN

The aim of developing a Convolutional Neural Network model is to investigate the advantage of using the geometrical information of the model to achieve similar outcomes. The architecture was chosen to be as similar as possible to the GNN autoencoder, but 2D convolution was used instead of graph convolution. To make the comparison meaningful, the receptive field of the convolution was kept similar to that of the GNN: in Table 3.5, two layers of graph convolution are used, which corresponds to aggregating and convoluting nodes up to a distance of two nodes. To achieve a similar result in this architecture, a 2x2 kernel was used. Note that the CNN does not store the geometry of the model, so two consecutive nodes in the input are not necessarily adjacent in the FE structure. The design of the architecture was inspired by image-pixel applications: the x, y, z displacements represent the channels of the input layer, similar to the RGB color channels of an image. The data arrangement was slightly different: the input to the network is one entire simulation instead of a single time sample. In the convolution, the number of nodes and the time dimension represent the height and width of the input window. The input batch therefore has the following shape:

[n_simulation, xyz_disp, n_nodes, n_timesteps] (3.5)

The architecture used is shown in Table 3.7. Some of the layers have already been introduced in the first architecture.
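The shrinking effect of a 2x2 "valid" convolution on the node-by-timestep input plane can be checked with a small NumPy sketch. The sizes below are toys; the thesis network applies torch.nn.Conv2d to the [1, 3, 7441, 81] batches of Table 3.7:

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2D convolution of a single channel: each output axis shrinks
    by (kernel_size - 1), i.e. H_out = H_in - (kh - 1), W_out = W_in - (kw - 1)."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

# Toy plane: height = nodes, width = timesteps (7441 x 81 in the thesis).
n_nodes, n_timesteps = 20, 10
x = np.random.default_rng(0).normal(size=(n_nodes, n_timesteps))
y = conv2d_valid(x, np.ones((2, 2)))               # 2x2 kernel, as in Table 3.7
assert y.shape == (n_nodes - 1, n_timesteps - 1)   # e.g. 7441x81 -> 7440x80
```

Each application of the 2x2 kernel removes one row and one column, which is exactly the shape progression 7441x81 -> 7440x80 -> 7439x79 seen in the encoder of Table 3.7.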
However, there are two new layers: 2D convolution (see Section 2.2.2, Figure 2.9) and MaxPool2D (see Section 2.2.4). The 2D convolution works on a 4D input [N, C, H, W], where N is the batch size, C denotes the number of channels, and H and W are the height and width of the input planes.

Table 3.7: CNN architecture for the pre-crash simulation

Encoder:
  Layer            In-channel  Out-channel     Kernel  Output shape
  Linear           81          81              -       [1, 3, 7441, 81]
  ReLU             -           -               -       [1, 3, 7441, 81]
  2D convolution   3           3               [2,2]   [1, 3, 7440, 80]
  ReLU             -           -               -       [1, 3, 7440, 80]
  2D convolution   3           3               [2,2]   [1, 3, 7439, 79]
  MaxPooling 2D    -           scale factor 4  -       [1, 3, 1859, 19]
  Flatten          -           -               -       [1, 105963]
  Linear           105963      20              -       [1, 20]

Decoder:
  Layer            In-channel  Out-channel     Kernel  Output shape
  Linear           20          105963          -       [1, 105963]
  ReLU             -           -               -       [1, 105963]
  Reshape          -           -               -       [1, 3, 1859, 19]
  Upsample         -           [7439, 79]      -       [1, 3, 7439, 79]
  2D convolution   3           3               [2,2]   [1, 3, 7440, 80]
  ReLU             -           -               -       [1, 3, 7440, 80]
  2D convolution   3           3               [2,2]   [1, 3, 7441, 81]
  ReLU             -           -               -       [1, 3, 7441, 81]
  Linear           81          81              -       [1, 3, 7441, 81]

To maintain the same methodology, the batch size was kept at one, as in the GNN architecture; the model is thus trained with one sample at a time. As already mentioned, the number of channels is three, for the x, y, z displacements. The convolution kernel slides along the window formed by the height and width of the input. With every convolution, the size of the input shrinks. The output size is given by:

H_out = H_in - (kernel_size - 1) (3.6)
W_out = W_in - (kernel_size - 1) (3.7)

MaxPool2D downsamples the input plane by a factor of 4 by extracting the maximum feature over the kernel window. The output size of the max pooling is given by the same formula as the convolution, presented in Equation 3.7.

3.4 Dimensionality Reduction III - PCA

As mentioned in Section 2.2.4, the dataset obtained from the D3plot was standardized using the z-score method. The data then needed to be arranged in a form suitable for the PCA implementation of the sklearn Python library [35].
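The PCA workflow of this section can be sketched with sklearn. The shapes below are toys (the thesis dataset is far larger), and the reshape shown is the same flattening described in the text: simulations and timesteps become rows, node histories become columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy stand-in for the 4D tensor [n_sim, n_timesteps, n_nodes, 3].
n_sim, n_timesteps, n_nodes = 8, 5, 30
data = rng.normal(size=(n_sim, n_timesteps, n_nodes, 3))

# Flatten to [n_samples, n_features] (Eq. 3.8): one row per simulation
# timestep, one column per node coordinate.
X = data.reshape(n_sim * n_timesteps, n_nodes * 3)

# A float n_components asks sklearn to keep just enough components to
# explain that fraction of variance (99% was used for the dummy model).
pca = PCA(n_components=0.99)
Z = pca.fit_transform(X)                # reduced space [n_samples, n_components]
X_hat = pca.inverse_transform(Z)        # back to the input shape, for the error
mae = np.abs(X - X_hat).mean()

assert Z.shape[0] == n_sim * n_timesteps
assert pca.explained_variance_ratio_.sum() >= 0.99
assert mae < 0.5
```

The `inverse_transform` call is what the thesis uses to map the reduced representation back to node space so that MSE and MAE can be computed against the original dataset.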
The PCA function implemented works with inputs of shape

[n_samples, n_features] (3.8)

From the 4D tensor, n_simulation and n_timesteps were stacked together into the sample dimension, and n_nodes together with the x, y, z displacements into the feature dimension, forming a 2D tensor. With the input dataset in this 2D shape, the next step was to decide the number of components, or equivalently the explained variance. The percentage of explained variance chosen was 99% for the dummy model, resulting in 5 components. After feeding these data to the PCA function, the reduced-dimensional output had the shape:

[n_samples, n_components] (3.9)

Dimensionality reduction with PCA consists of reducing the number of features to the number of components; in this particular case, the node history of every sample is reduced to a few components. To compute the error of the PCA, the inverse PCA transformation back to the input shape was applied, and MSE and MAE were calculated over the entire dataset.

3.5 Regression

Both PCA and GNN use the same data arrangement for regression, so it is treated once for both methods. After obtaining the latent space of the model, the next step consists of linking it to the FE parameters. The regression takes the FE parameters as input and relates them to the target, the latent space. The FE parameters, extracted from the d3hsp file (an output file from LS-DYNA), were stored in a tensor of shape

[n_sims, n_parameters] (3.10)

This tensor also needed to be arranged to match the shape of the latent space. Two steps were taken towards this setup:
• The first dimension was expanded to match that of the reduced representation, i.e. n_sim * n_timesteps.
• Due to the simulation * timestep arrangement, one low-dimensional vector is obtained for every timestep. Time therefore becomes an additional input parameter, connecting the k-th latent vector with the k-th timestep.
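The two arrangement steps above can be sketched as follows; sizes and variable names are illustrative:

```python
import numpy as np

n_sim, n_timesteps, n_params, latent = 5, 8, 2, 7
rng = np.random.default_rng(0)
params = rng.normal(size=(n_sim, n_params))        # e.g. alpha, beta per sim
latents = rng.normal(size=(n_sim, n_timesteps, latent))

# Step 1: repeat each simulation's FE parameters once per timestep so the
# first dimension becomes n_sim * n_timesteps.
# Step 2: append the timestep index as an extra input feature, linking the
# k-th latent vector to the k-th timestep.
t = np.tile(np.arange(n_timesteps), n_sim)[:, None]
X = np.hstack([np.repeat(params, n_timesteps, axis=0), t])
y = latents.reshape(n_sim * n_timesteps, latent)

assert X.shape == (n_sim * n_timesteps, n_params + 1)
assert y.shape == (n_sim * n_timesteps, latent)
```

X and y are then what the regressors are fitted on: FE parameters plus time as input, latent vectors as target.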
As an example, given α and β, two FE parameters of a general instance, the new FE tensor is arranged as shown in Table 3.8.

Three algorithms were used: Random Forest, Gradient Boosting, and XGBoost. The first two are from the sklearn library [35] and the last one from the XGBoost library [31]. For all the regression methods, the hyperparameters were tuned using two different optimization frameworks: RandomizedSearchCV from sklearn [35] and the Optuna library [36].

RandomizedSearchCV performs a random search over a fixed set of parameters, where every setting is drawn from a distribution. It also allows k-fold cross-validation to be performed. The number of folds was chosen to be 5, i.e. the training dataset was divided into 5 folds, with 80% used for training and 20% for validation in each split. The best set of parameters is chosen using a fit-and-score method.

Table 3.8: Regression: FE simulation parameters and latent space

  Simulation   Input               Target
  1            α1, β1, t1          Latent space t1
               α1, β1, t2          Latent space t2
               ...                 ...
               α1, β1, tk          Latent space tk
  n-1          αn−1, βn−1, t1      Latent space t1
               αn−1, βn−1, t2      Latent space t2
               ...                 ...
               αn−1, βn−1, tk      Latent space tk
  n            αn, βn, t1          Latent space t1
               αn, βn, t2          Latent space t2
               ...                 ...
               αn, βn, tk          Latent space tk

Optuna uses a Bayesian approach to sample the hyperparameters. In contrast to the random search, where the parameters are sampled from a distribution, the choice of each set of parameters is based on previous evaluations. For simplicity, it can be summarized as a two-step process: the study and the trials. First, a study is initiated to evaluate the objective function; in the study, Optuna considers a subset of the given hyperparameters. After initiating a study, the trials take place: in each trial, the algorithm evaluates a set of parameters and assigns it a score. After testing and evaluating for n trials, the framework outputs the set of parameters with the best score.
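The random-search tuning described above can be sketched with sklearn's RandomizedSearchCV. The data and parameter ranges below are illustrative, not the ones used in the thesis:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy regression problem standing in for (FE parameters + time) -> latent space.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ rng.normal(size=3) + 0.1 * rng.normal(size=40)

# Random search over an illustrative grid with 5-fold cross-validation:
# each of the 5 folds uses 80% of the data for training and 20% for validation,
# and the best setting is picked by fit-and-score.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [10, 50, 100],
                         "max_depth": [2, 4, None]},
    n_iter=5, cv=5, random_state=0)
search.fit(X, y)
assert set(search.best_params_) == {"n_estimators", "max_depth"}
```

An Optuna version replaces the fixed grid with a study: `optuna.create_study()` followed by `study.optimize(objective, n_trials=n)`, where each trial suggests new hyperparameter values guided by the scores of previous trials.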
After finding the best set of parameters for each regression method, the regressors were fitted to the training set and the validation set was predicted. For the PCA, the inverse PCA transform is used: it restores the number of features for every sample and scales them back. For the GNN, the output of the regression is instead decoded with the trained decoder. Finally, the MAE and MSE are computed on the original test set.

3.6 Animation in the META post-processor

The last part of the pipeline consists of recovering the predicted data and arranging them to animate the reconstructed simulation in the META post-processor; the goal is to obtain an animation of the model. To accomplish this, a script by Karl-Johan Larsson (PhD, Chalmers) was used. The script enables the META post-processor to read LS-DYNA format files and build the animation from a static base model. To achieve this, the script was slightly modified to read a multi-step simulation. In addition, the simulation needed to be written in LS-DYNA format: every timestep of the simulation was written as a separate file containing the total displacement of every node. The total displacement was obtained by adding the relative displacement output by the surrogate models to the initial position, separately for the x, y, z displacements. In the case of the GNN, the reference coordinate system was also restored to the initial one.

4 Results

The performance of the different models is compared and presented in this chapter. The chapter is divided into two sections, the first regarding the beam model and the second regarding the pre-crash simulation. For both models, the dimensionality reduction results are presented first, followed by the regression methods.

4.1 Beam model

4.1.1 Dimensionality Reduction

GNN

The structure of the graph autoencoder is presented in Table 3.4.
Table 4.1 below shows the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) on the training and test data. The latent space dimensions tested were 10, 25, and 50; the sizes were chosen arbitrarily. The error on both the training and test sets decreases as the latent space is enlarged. Since the epochs and learning rate were kept the same, the only changing parameter is the size of the linear layer, which can be held responsible for this outcome: a reduced latent space can reduce the capacity of the layer to learn throughout training. Indeed, the latent space size is one of the hyperparameters that needs to be optimized in an architecture.

Table 4.1: MAE and RMSE for different latent space sizes

  Latent space size          Train   Test
  10                 RMSE    7.85    9.78
                     MAE     5.06    6.373
  25                 RMSE    7.82    9.14
                     MAE     5.03    5.763
  50                 RMSE    6.83    8.49
                     MAE     4.41    5.07

In Figure 4.1, the trajectories of node 99, located on the upper edge of the beam, are displayed: the x, y, z displacements are plotted as functions of time. In Figure 4.1c, the prediction is closer to the true values than in Figures 4.1a and 4.1b. Moreover, it seems to better interpolate the large displacements along the y axis, whereas the high-frequency deflections are better captured in Figure 4.1a.

Figure 4.1: GNN evaluation on node 99 for different latent space sizes: (a) latent space of 10, (b) latent space of 25, (c) latent space of 50.

In addition, a comparison of the test and training errors was made for a fixed latent size, learning rate, and number of epochs. Figure 4.2 shows the MAE for sample sizes of 25, 50, 75, and 100. The error decreases as the sample size increases. The maximum test-set error of 21.552 mm for 25 samples is caused by an unrepresentative test dataset of only 5 samples: there are enough samples for the training set, but the model struggles to generalize its predictions to the test set.
By increasing the number of samples to 50, this issue is overcome and the model reaches a good fit. From 50 up to 100 samples the model seems to have reached a plateau: increasing the number of samples further does not yield evident differences.

Figure 4.2: GNN training and test MAE for different sample sizes

PCA

Different numbers of components capture different percentages of the variance in the dataset; Figure 4.3 highlights this relationship. Interestingly, 3 components already capture 90% of the variance, whereas going from 99% to 99.99% increases the number of components from 7 to 13.

Figure 4.3: PCA - explained variance with respect to the number of components for the beam model

Table 4.2 and Figure 4.4 present the number of components and the variance explained by each component. As introduced in Section 2.2.4, the first component captures the majority of the variance. It can also be seen that from the 8th component onward the total additional variance captured is 1%, with a contribution of 0.41% from the 9th component.

Table 4.2: Number of components and explained variance

  N. of components   Total variance
  3                  90%
  7                  99%
  13                 99.99%

Figure 4.4: Number of components and explained variance

Figure 4.5 shows the relation between the number of components and the MAE on the training dataset. Capturing more variance, and therefore using more components, results in a lower training error. The graph also shows the same trend for sample sizes of 25, 50, 75, and 100: it starts at 7.84 for 100 samples, 7.8 for 75 samples, 7.26 for 50 samples, and 7.037 for 25 samples. The minimum error, reached with 13 components, is 0.0023 for a sample size of 50; analogous results are achieved by the other sample sizes, with values differing only after the third decimal. As shown in Figure 4.5, different numbers of components yield different training errors.
In Figure 4.6, node 44, located on the lower edge of the beam, is plotted for the training set with different numbers of components. In Figure 4.6a the prediction is clearly inaccurate from time 0, particularly for the z axis. On the contrary, when 13 components are used, the predicted and true values overlap, with the model capturing the high-frequency deflection around 0 on the y axis.

Figure 4.5: PCA beam model: training MAE with respect to the number of components

Figure 4.6: Beam model: node 44 for different numbers of components: (a) 3 components, (b) 7 components, (c) 13 components.

4.1.2 Regression

For the beam model, Figure 4.7 presents a comparison between the number of samples, the regression methods, and the validation error for the graph autoencoder and PCA. The latent size chosen was 7 for the PCA and 25 for the GNN. The error plotted is the average MAE of the validation set.

Figure 4.7: Beam model: final MAE with respect to sample size: (a) PCA regression, (b) GNN regression.

While for the autoencoder the decreasing trend is clearly visible, the same cannot be said for the PCA; however, a decline from a sample size of 50 can be noticed. The minimum error is reached by PCA with the Random Forest algorithm. Lastly, Figure 4.8 shows the final trajectory of a sample node for both methods.

Figure 4.8: Beam model: node 99 after the RF regression: (a) PCA-RF, (b) GNN-RF.

4.2 Pre-crash Simulation

In this section the results for the pre-crash simulation are presented. The number of simulations used in the dataset is 75, representing all the simulations available.

4.2.1 Dimensionality Reduction

GNN

For the pre-crash simulation, a slightly different architecture was used, presented in Table 3.5. Table 4.3 shows the training and test Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for different latent space sizes.
Specifically, the chosen sizes were 5, 25, and 100. The lowest test-set error of 5.11 was reached with a latent size of 100. Overall, Table 4.3 shows that latent spaces of 5 and 100 achieve better results than 25.

Table 4.3: MAE and RMSE for different latent space sizes

  Latent space size          Train   Test
  5                  RMSE    6.83    7.38
                     MAE     4.47    5.174
  25                 RMSE    8.22    8.95
                     MAE     5.48    6.01
  100                RMSE    6.67    7.40
                     MAE     4.63    5.11

The plots for the different latent sizes of training sample 1 for node 2004041, located in the leg of the dummy, are presented in Figure 4.9. In Figure 4.9a the model reconstruction is far from the ground truth, especially for the y and x displacements. Of particular interest is the prediction of the x axis (in red) in Figure 4.9c: in the first 20 timesteps, the reconstruction converges to almost overlap the ground truth. In Figure 4.9b the y displacement (dotted line) is closer to the ground truth than in Figure 4.9c. However, Table 4.3 shows that the global MAE of the simulation is lower, indicating that even if the prediction for a single point may be worse, the general performance of the model improved.

Figure 4.9: Pre-crash simulation - GNN: node 2004014 for different latent sizes: (a) latent size of 5, (b) latent size of 25, (c) latent size of 100.

Scaling vs. Standardization

Table 4.4 shows the training and test MAE for the two scaling methods, computed at the end of training. The difference between the two is clear: standardization performs better than max scaling. In Appendix A, different nodes from various parts of the model are plotted.

Table 4.4: MAE for max-timestep scaling and standardization

                   Max-time scaling   Standardization
  Train MAE [mm]   4.63               3.61
  Test MAE [mm]    5.11               3.95

Convolutional Neural Network

The CNN architecture is presented in Table 3.7.
The network was trained for 500 epochs with a learning rate of 0.001. The training and test errors achieved are presented in Table 4.5:

Table 4.5: Training and test error for the CNN architecture

         MAE [mm]
  Train  5.044
  Test   5.770

Node 2009276, located in the right foot of the dummy, is plotted for the training and test sets in Figures 4.10a and 4.10b below.

Figure 4.10: CNN performance on node 2009276: (a) training set, (b) test set.

PCA

Similarly to the beam model, the relationship between the variance and the number of components is presented in Figure 4.11. In this case, 2 components are enough to capture 90% of the variance; however, 26 components are needed to capture 99.99% of the variance of the pre-crash simulation.

Figure 4.11: Pre-crash PCA: explained variance vs. number of components.

Moreover, Table 4.6 and Figure 4.12 show the number of components along with the variance captured by each component. Interestingly, in this case one percent of the variance is already expressed from the fourth component onward. With 26 components, the amount of variance carried by the last components is negligible.

Table 4.6: Number of components and explained variance

  N. of components   Total variance
  2                  90%
  5                  99%
  26                 99.99%

Figure 4.12: Amount of variance expressed by each component as the number of components increases

Next, in Figure 4.13, node 2005770, located in the right leg of the dummy, from the training set is plotted in relation to the variance captured. It is interesting to notice in Figure 4.13b that going from 2 to 7 components increases the accuracy to the point of an almost complete overlap between true and predicted values, whereas the difference between Figures 4.13b and 4.13c is less pronounced.
Figure 4.13: PCA with varying numbers of components, node 2005770: (a) 90% explained variance - 2 components, (b) 99% - 7 components, (c) 99.99% - 26 components.

4.2.2 Regression

Figure 4.14 presents a comparison of the test-set MAE for the three methods. For the PCA the best method is Random Forest, whereas for the GNN it is Gradient Boosting. PCA paired with Random Forest achieves the lowest error, 4.075 mm. XGBoost has the highest error when paired with PCA, and the same holds for the GNN, with an error of 6.669 mm.

Figure 4.14: MAE for Random Forest, Gradient Boosting, and XGBoost on the test set: (a) PCA, (b) GNN.

In addition, node 2004014, located in the left leg of the dummy, is plotted for both methods paired with GB in Figure 4.15. Overall, the displacement in Figure 4.15a is closer to the ground truth than in Figure 4.15b. For the y displacement (the magenta line), the reconstructions are similar; however, the behaviour differs for the x (red line) and z (blue line) displacements. In Figure 4.15a, prediction and ground truth overlap for the first 20 steps of the x displacement, indicating the good performance of the model in reconstructing the latent space. In contrast, in Figure 4.15b the reconstructed displacement does not converge to the ground truth as closely, yielding an increase of the MAE for this node.

Figure 4.15: Pre-crash regression, node 2004014 from the test set: (a) PCA-RF, (b) GNN-GB.

PCA vs. GNN

Lastly, in Figures 4.16 - A.7, a comparison of the animated simulations is presented. Both methods used gradient boosting regression. In green is the ground truth, i.e. the original simulation; in orange, the reconstructed simulation. The timesteps selected were the 17th and the sequence from the 25th to the 29th, corresponding to time 729.99 ms and the interval from 929 to 1029 ms.
The sequence of frames was chosen to display the reconstruction in subsequent frames where the braking and steering event reaches its maximum. As shown in Figure 4.16, most of the reconstruction error is located in the upper and lower limbs, the head, and the pelvis area. PCA achieved better results, consistently with what is shown in Figure 4.14. In this chapter only the first frame is shown; for a better comparison, the other frames are shown in Appendix A.

Figure 4.16: PCA-GNN comparison after 729 ms: (a) PCA-GB, (b) GNN-RF.

CNN vs. GNN

Additionally, a comparison between the CNN-GB network and the GNN-GB is presented, using the same frames. At 729 ms, the two models give analogous results. In the frame sequence, the CNN shows better performance and lower error in both the upper limbs and the back area.

Figure 4.17: CNN-GNN comparison after 729 ms: (a) CNN-GB, (b) GNN-RF.

5 Discussion

The initial questions and goal of the project were to study and analyze in depth the potential of GNNs. Unlike CNNs, GNNs store and use geometrical and spatial information in the convolution, and therefore have the potential to achieve better results. In particular, the graph built from the adjacency matrix is a close copy of the FE mesh of the model. Different DR methods were implemented, evaluated, and compared. The outcome and the performance of the models were determined by many factors; in the following sections, some of the most important aspects are addressed.

5.1 Dataset

Starting from the dataset, different scaling techniques can be implemented: normalization, scaling, and standardization are the most common, and they all act on the dataset differently. The presence of outliers1 can influence the entire workflow, and therefore different methods are adopted to minimize their effect.
For example, in the beam model, 10 isolated nodes were not moving throughout the simulation. These nodes did not belong to the beam object but were used by the solver during the simulation. Initially, these nodes contaminated the dataset and the network predicted their trajectories very poorly. When the stationary nodes were removed from the beam object, the network achieved lower training and test errors. Similarly, in the pre-crash simulation, three nodes extracted from the D3plot were only used during the settling phase. The average relative displacement of the nodes in the dummy is 25 mm, while the relative displacements of these outliers were 20 m - a thousand times larger. When normalizing or scaling, these high values offset the computed mean of the dataset from the true mean, undermining the loss function throughout the learning process. The solution adopted was to create a sub-graph that left out the outliers.

A comparison of different scaling methods was described in Section 4.2.1. Standardization performed slightly better than max scaling, with a difference in error of around 1 mm. This behavior can be explained by the fact that the polluting nodes were the only clear outliers, while the remaining relative displacements did not differ noticeably from each other, which is logical since the nodes move together. From the very first step, then, every choice determines the outcome of a method, showing that data cleaning is a fundamental part of an ML pipeline.

1 An outlier is a value that differs significantly from the rest of the dataset and interferes with the learning process. In ML, several approaches are taken to reduce its effect on the dataset in order not to affect the training quality.

In addition, another factor influences the results: the sampling rate. For the beam model, the sampling rate was 1000 Hz (one sample per ms); for the pre-crash simulation, the solver sampled every 5 ms.
However, in the data extraction process, two consecutive timesteps differed by 25 ms. These choices were the result of computational limitations: the denser the sampling, the higher the memory usage to store the dataset. Extracting at a sampling interval of 5 ms was not feasible for all the simulations during this project. Yet increasing the sampling rate, and hence the number of timesteps, increases the number of samples for training, and, as shown by the sample-size comparison for the beam model, the higher the number of samples, the lower the error.

5.2 Architecture

The arrangement of the dataset determines the ML architecture. Choosing an input size of one timestep or one entire simulation affects the hyperparameters and the learning process of the network. The hyperparameters, such as learning rate, epochs, layer sizes, and batch size, are also of great importance and challenging to tune.

Batch size

In GNNs, the choice of the batch size is fundamental. At the beginning of the project, a small CNN was implemented on the beam model to learn the PyTorch API, and its hyperparameters were tuned for that case and architecture. Using the same batch size of 5 in the GNN architecture resulted in the network not learning any features during training. Hyperparameters are therefore unique choices for each application, and time-consuming to tune manually.

Over-smoothing

Generally, for datasets such as the Hybrid III dummy, deeper NNs are used as they have more trainable parameters. A limitation for GNNs is the so-called over-smoothing, a phenomenon very common in the GNN field [44]. Over-smoothing occurs when several layers of GNN convolution are stacked: the node features are convoluted many times and, as a result, become similar to each other due to the repeated aggregations [44].
This phenomenon, observed at an early stage of the implementation, limited the depth of the network; the solution adopted was to reduce the number of convolution layers.

GNN limitations

There is a rather well-known and successful architecture in other applications, which achieved good results in the project of Gurram and Venkata [8]: the convolutional U-net, named after the shape of the network. U-nets are based on convolutional autoencoders, i.e. architectures alternating convolution and pooling layers, where the embedded space is obtained by the combined contribution of the convolutional and pooling layers. Unfortunately, in the GNN field, pooling layers are a challenging feature and not yet fully established. The challenge derives from the shape of a graph and its lack of rigid structure, unlike images: when reducing the input features, there are no fixed criteria for aggregating them. More importantly, no current implementation can unpool the graph back to its original form, because the edge information is lost when the graph is pooled and cannot be reconstructed afterwards. A further limitation is the maximum number of nodes in a graph that can be processed by the four most common pooling methods: according to Grattarola et al., this limit is 10000 nodes for dense adjacency matrices and 30000 for sparse ones [38]2. This imposes a limitation that might be lifted in future work. As a solution, a simpler architecture with no graph pooling was chosen. This became one of the weaknesses of the architecture implemented in this project: the encoding power lies exclusively in the Linear and MaxPool1D layers. In fact, the pooling layer used is a MaxPool1D on a one-dimensional tensor, and no convolution is applied after pooling.
5.3 Comparison of Dimensionality Reduction

Given the two different datasets, the number of components needed to capture the same variance varies, as shown in Figures 4.3 and 4.11. For a smaller dataset, like the beam model, only 13 components are necessary to capture 99.99% of the variance (see the blue line in Figure 4.3); in contrast, the pre-crash simulation needs 26 components. To reach 90% of the variance, only three components are needed for the beam model and five for the pre-crash model. This result confirms that a reasonable cut-off for PCA is three components, as presented in [19] (see Section 2.2.4). While a higher number of components can yield an improvement, it becomes more time-consuming, because in the regression each FE parameter must be mapped to each variable of the latent space.

As seen in the results, PCA proved to be the best dimensionality reduction method, reaching the lowest error. The GNN has a higher MAE on the training and test datasets in both the beam and pre-crash simulations, due to the limitation of the missing pooling layer. Surprisingly, the CNN proved to be as good as the GNN with a latent space of 25.

5.4 Comparison of Regression

Of the three regression methods tested, Random Forest achieved the best test results for both PCA and the graph autoencoder on the pre-crash simulation, whereas XGBoost was the weakest. The advantage of Random Forest is that it has fewer parameters to tune, which makes tuning more efficient. On the contrary, XGBoost has the potential to outperform Gradient Boosting, as it is an enhanced implementation of the latter; however, it is more challenging to tune due to its number of hyperparameters. Regarding this issue, hyperparameter optimization frameworks are helpful.

2 A dense matrix is one in which the majority of the entries are non-zero.
The remaining challenge lies in the need to choose the range of values for each hyperparameter arbitrarily, leading to a trade-off between optimal parameters and time consumption (see Section 3.5). For XGBoost, a hyperparameter study had to be performed to understand which parameters had the largest influence on the training process.

5.5 From the dummy model to the HBM

It would be possible to replace the dummy model with an actual HBM in the pre-crash simulation. The adaptation of the methods would be straightforward, but the simulations would become computationally demanding; in particular, creating the training datasets with the FE solver would require considerable time3. This was not feasible within this project, hence the use of a simpler model. The issue could be resolved by using more CPU cores and generating simulations in parallel. Working with the dataset would also require more RAM, both for processing and for using the data in the training process. For example, the dataset of the simple dummy model used for the pre-crash simulation already had a size of 2 gigabytes for 75 samples; an HBM dataset with 500 000 nodes would be considerably larger, about 60 times more. Challenges would arise when processing the data for the methods and during the training phase, especially in the DR part, since the regression works with an already reduced dataset. As the focus of this project was on method development, a simpler model enabled a more thorough investigation of feasible methods.

5.6 Computational resources

In this section, an overview of the resource and time consumption of the methods is given.

Beam model

The number of simulations produced for the project was 100. Each simulation took 95 seconds with 4 CPU cores, so the overall time to produce 100 FE simulations was 2.6 hours. The stored node displacement dataset occupies 12.1 MB. In Figure 5.1 below, a comparison of the training time between PCA and the GNN is given.
3 A single simulation with an HBM requires 112 hours (4.6 days) with 64 CPU cores. For the dummy, only 4 CPU cores were used.

Figure 5.1: Time consumption of the different dimensionality reduction methods

PCA is the fastest, with only 1.7 seconds, whereas the GNN, trained for 100 epochs, takes 3500 times longer. As a comparison, suppose 500 simulations were to be reproduced with the LS-DYNA solver. The time required for the solver would b