Development of electrolyte descriptors
for predicting cycling performance of electrochemical cells

Master’s thesis in Complex Adaptive Systems + Systems, Control and Mechatronics

Victor Haugaard
Gustav Silverstam

DEPARTMENT OF PHYSICS

CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2023
www.chalmers.se



Master’s thesis 2023

Development of electrolyte descriptors for
predicting cycling performance of electrochemical cells

Victor Haugaard
Gustav Silverstam

Department of Physics
Chalmers University of Technology

Gothenburg, Sweden 2023


Development of electrolyte descriptors for predicting cycling performance of electrochemical cells
VICTOR HAUGAARD
GUSTAV SILVERSTAM

© VICTOR HAUGAARD, GUSTAV SILVERSTAM 2023.

Supervisor: Rasmus Andersson, Compular
Examiner: Giovanni Volpe, Department of Physics, Gothenburg University

Master’s Thesis 2023
Department of Physics
Chalmers University of Technology
SE-412 96 Gothenburg

Cover: Structural descriptor containing graph-level features such as concentration,
diffusivity and ionic conductivity for each molecular compound in a simulated
representation of an electrolyte.

Typeset in LaTeX, template by Kyriaki Antoniadou-Plytaria
Printed by Chalmers Reproservice
Gothenburg, Sweden 2023



Development of electrolyte descriptors for predicting cycling performance of electrochemical cells
VICTOR HAUGAARD, GUSTAV SILVERSTAM
Department of Physics
Chalmers University of Technology

Abstract
This master’s thesis presents the development of structural electrolyte descriptors
for predicting the cycling performance of electrochemical cells such as lithium-ion
batteries (LIBs). The research is conducted in collaboration with Compular, a startup
company focusing on materials development and battery technology improvement. Through
molecular dynamics (MD) simulations and trajectory analysis facilitated by Compular’s
software, electrolyte descriptors are developed that integrate structural and
electrochemical molecular properties. The simulated electrolytes are based on data from
battery performance tests, collected and annotated in this project. We propose to use
machine learning (ML) to model the relationship between the chemical structure of
electrolytes and their performance characteristics. The developed descriptors serve as
input to a graph neural network (GNN) and thereby offer a novel and efficient method for
evaluating electrolyte performance and optimizing electrochemical cells. The findings of
this thesis confirm that the descriptors successfully extract the necessary information
from electrolytes using Compular’s analysis software, CHAMPION, and demonstrate their
compatibility with the GNN. Moreover, the discussion highlights the importance of
annotated data, the complexity of electrolyte descriptors and their predictive
abilities. Limitations, challenges and potential enhancements are also addressed,
underscoring the need for a larger dataset and exploring possible actions to improve the
performance of the model. In conclusion, this research bridges the gap between empirical
experiments and theoretical understanding of battery cycling performance while reducing
the need for extensive manual testing. It provides a foundation for further
investigations into electrolyte performance prediction and represents a significant step
towards more efficient and sustainable battery technologies.

Keywords: lithium-ion batteries, electrolytes, descriptors, machine learning, graph
neural network, molecular dynamics



Acknowledgements
We would like to express sincere gratitude to our supervisor at Compular, Rasmus
Andersson, and co-supervisors Magnus Rahm and Fabian Årén, for their supportive
and helpful guidance throughout this master’s thesis. We would also like to thank
our colleagues and friends at Compular for giving us this opportunity and providing
us with their expertise, enthusiasm, constructive feedback and a welcoming atmosphere
at the office. We would also like to thank Professor Giovanni Volpe for taking on this
thesis as supervisor and examiner; we sincerely appreciate it.

Victor Haugaard and Gustav Silverstam, Gothenburg, May 2023



List of Acronyms

Below is a list, in alphabetical order, of the acronyms used throughout this thesis:

ANN Artificial Neural Network
CHAMPION Chalmers hierarchical atomic, molecular, polymeric, and ionic analysis toolkit
CNN Convolutional Neural Network
CV Cyclic Voltammetry
DoD Depth of Discharge
EC Ethylene Carbonate
EMC Ethyl Methyl Carbonate
EoL End-of-Life
ESW Electrochemical Stability Window
GCN Graph Convolutional Network
GNN Graph Neural Network
HPC High-Performance Computing
LIB Lithium-Ion Battery
LiFSI Lithium Bis(fluorosulfonyl)imide
LiPF6 Lithium Hexafluorophosphate
MD Molecular Dynamics
ML Machine Learning
MLP Multi-Layer Perceptron
MSE Mean Squared Error
ReLU Rectified Linear Unit
SEI Solid-Electrolyte Interphase



Contents

List of Acronyms

List of Figures

1 Introduction
1.1 Purpose and Scope
1.2 Aim
1.3 Limitations

2 Theory
2.1 Lithium-Ion Batteries
2.1.1 Electrolytes
2.1.2 Battery Performance Testing
2.2 Molecular Dynamics
2.3 Graph Theory
2.4 Artificial Neural Networks
2.4.1 Graph Neural Networks
2.4.2 Descriptors

3 Methodology
3.1 Data Collection
3.1.1 Target Value Estimation
3.2 Simulation
3.2.1 Simulation Pre-processing
3.2.1.1 Electrolyte Components Annotation
3.2.1.2 Molar Ratio Calculation
3.2.1.3 Geometry Creation Using CHAMPION
3.2.2 Molecular Dynamics Simulation
3.2.3 Analysis Using CHAMPION
3.3 Descriptor Development
3.3.1 Node Embedding
3.3.2 Graph Construction
3.3.3 Graph Feature Extraction
3.3.4 Graph Batching
3.4 Model Development
3.4.1 Model Design

4 Results & Discussion
4.1 Annotated Data
4.2 Descriptors
4.3 Model Evaluation

5 Conclusions & Future Work

Bibliography


List of Figures

2.1 Schematic of a Lithium-Ion battery.
2.2 Illustration of capacity fade compared to the number of cycles for three different electrolyte solvents.
2.3 MD simulation schematic.
2.4 Graph representation of nodes and their respective connections, showing the connection rules of a CHAMPION output.
2.5 The architecture of a simple four-layered fully connected ANN with three input neurons x, weights w^(n), two hidden layers l^(n) of four neurons with corresponding biases b^(n) and two output neurons y.
2.6 Schematic overview of the operations performed from input to output within a neuron in an ANN. Input values x_n are multiplied by weights w_nj, summed together with an added bias b and then passed through an activation function φ.
2.7 Typical GNN architecture design.
2.8 Example of how graph convolutions propagate features to target node A.
2.9 An example of how a simple descriptor of four features describes the structure of a given input. In this example, each row of the descriptor is an atom in the molecule, and all four features are represented numerically for each atom.

3.1 Illustration of capacity fade compared to the number of cycles for three different electrolytes, including a line for battery failure (70% capacity). The initial drop in capacity is due to equilibration cycles, performed to create the SEI.
3.2 Representation of EC in a Python dictionary.
3.3 An illustration of the created starting geometry using CHAMPION.
3.4 An illustration of the geometry after an electrolyte simulation using CP2K, final timestep. New molecular compounds have formed from molecules and atoms.
3.5 Illustration of what the electrolyte descriptors represent. The features in this example are arbitrary.
3.6 Schematic overview of the purpose of the GNN. Electrolyte descriptors as DataElectrolyte objects are sent into the network, where each molecular compound becomes a graph with each molecule as an embedded node. The node-embedded graphs are sent into graph convolutions that perform two layers of message passing within each graph, followed by pooling and summarizing each graph into a weighted value.
3.7 Illustration of the full network architecture from the developed electrolyte descriptor input to the predicted cycle performance output. The network is divided into graph convolutions, graph pooling, graph feature concatenation, sorting and padding, a graph-wise fully connected layer, fully connected layers and prediction. Input vector and matrix sizes are shown between all network operations.

4.1 Cycle performance of 104 annotated battery cycling tests, showing the number of cycles to 70% capacity for each electrolyte test. The temperature used in each test is also shown. The relevant data from test 81 is presented in the bottom part of the figure.
4.2 Illustration of all attributes of a DataElectrolyte object.
4.3 Representation of an electrolyte including all molecular compounds in it.
4.4 Plot of graph number 12 in the electrolyte along with its corresponding graph features. In this example, the included graph features are charge, mass, concentration, molar conductivity, electrical mobility, ionic conductivity and diffusivity.
4.5 Training and validation loss of the GNN over 1000 epochs.


1 Introduction

A societal trend towards sustainable solutions, green innovations and environmentally
friendly processes is clearly flourishing in the industrial market [1]. This change has
been accompanied by a parallel surge in the development and implementation of
artificial intelligence [2], which opens up a new universe of problem-solving methods.
Battery development is of particular relevance to this thesis. However, most methods
currently employed to improve battery technology require a laborious process of manual
testing and iteration, followed by an extended period of analysis to determine
performance and lifespan-related properties [3].

Compular [4] is a startup company and university spinout from the battery research
group of Professor Patrik Johansson at the Physics Department of Chalmers University of
Technology, with the vision to digitalize materials development. Starting with liquid
electrolytes for lithium-ion batteries (LIBs) and similar chemistries, Compular develops
a software tool, including its own patented analysis methods, that automates
high-quality molecular dynamics (MD) simulations and trajectory analysis to accelerate
the work of industrial battery R&D departments.

Compular’s software can predict a much richer set of properties of liquid electrolytes
than is obtainable from common experimental or other computational techniques. A
notable feature of their technology is that it discovers the emergent structures in
electrolytes by detecting bonds between atoms based on which pairs of atoms move
together. The analysis thus finds both the covalent bonds holding molecules together
and other cohesive interactions, including ion-ion and ion-solvent coordination and
hydrogen bonding. Based on these bonds, the tool creates a time-dependent global bond
graph of the simulated system. Most of the subsequent analysis is based on partitioning
this graph into smaller structures, using either connected components or node
embeddings, and on using statistical physics to characterize the properties of each
topologically distinct structure. Overall system properties can then be computed by
aggregating structure properties, which makes it possible to explain how the
distribution of structures gives rise to the system properties [5].
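To make the idea of a bond graph that is partitioned into structures more concrete, the Python sketch below builds a graph from a toy trajectory and splits it into connected components. Compular’s actual bond-detection criterion is not described here, so the rule used in the sketch, that two atoms count as bonded if they stay within a distance cutoff in at least a given fraction of the frames, is a hypothetical stand-in for "moving together". The function names `co_moving_pairs` and `connected_components` are illustrative only, periodic boundary conditions are ignored, and the brute-force pairwise distances are only suitable for toy systems.

```python
import numpy as np


def co_moving_pairs(trajectory, cutoff, min_fraction=0.9):
    """Hypothetical bond criterion: atoms i and j are considered bonded if their
    distance is below `cutoff` in at least `min_fraction` of the frames.

    trajectory: array of shape (n_frames, n_atoms, 3), no periodic boundaries.
    Returns a list of (i, j) pairs with i < j.
    """
    positions = np.asarray(trajectory, dtype=float)
    diffs = positions[:, :, None, :] - positions[:, None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)         # (n_frames, n_atoms, n_atoms)
    close_fraction = (distances < cutoff).mean(axis=0)  # fraction of frames each pair is close
    bonded = np.triu(close_fraction >= min_fraction, k=1)  # upper triangle, no self-pairs
    i, j = np.nonzero(bonded)
    return list(zip(i.tolist(), j.tolist()))


def connected_components(n_atoms, edges):
    """Partition the bond graph into structures using a union-find."""
    parent = list(range(n_atoms))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two structures

    groups = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


# Toy trajectory: 3 identical frames, 4 atoms forming two well-separated pairs
frames = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                    [10.0, 0.0, 0.0], [11.0, 0.0, 0.0]]] * 3)
bonds = co_moving_pairs(frames, cutoff=1.5)
structures = connected_components(frames.shape[1], bonds)
print(bonds, structures)
```

Applying the same procedure to successive time windows would give a sequence of graphs, i.e. a time-dependent bond graph. Counting structures by size, e.g. with `collections.Counter(len(s) for s in structures)`, is one simple example of aggregating structure-level information into a system-level quantity.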

1.1 Purpose and Scope
A common problem for battery cell developers is predicting which electrolyte
compositions will give rise to good battery performance. Performance is measured by
building a coin cell and subjecting it to repeated charge and discharge cycles
according to a predetermined schedule. This is called cycle testing, herein referred
to as cycling. During cycling, the battery capacity generally fades over time, and for
most possible combinations of electrolytes and electrodes, the cell abruptly stops
functioning after a modest number of cycles, rendering it unusable. In addition, the
faster the cell is charged and discharged, the less energy can be extracted per cycle;
this capacity loss normally returns close to baseline when returning to lower
charge/discharge rates. These types of cycling experiments are the main way cells are
tested [6]. In electrolyte development, most R&D departments keep the other cell
components constant while varying the electrolyte composition. It has been found
empirically that small variations in electrolyte composition can affect battery
performance significantly [7].

The physics determining cell cycling performance as a function of electrolyte
composition is much too complicated to be modeled accurately, in part because
electrolytes in LIBs usually operate outside of their electrochemical stability window
(ESW) [8]. Cell operation is then enabled only by the formation of a protective film at
the electrode-electrolyte interface, which permits lithium ions to pass through while
preventing the continuous breakdown of electrolyte components. This film is known as
the solid-electrolyte interphase (SEI) and is believed to be both heterogeneous and
dynamic [9]. While predictions of the structure and dynamics of the SEI are in very high
demand, making them reliably from first principles is an unsolved problem.

Nevertheless, one can more straightforwardly reason about which electrolyte properties
should in principle determine the nature of the SEI, even if the SEI itself cannot be
predicted: the distribution of local structures, their formation, breakage and
recombination dynamics, transport properties and electrochemical stabilities, perhaps
in addition to a few system-wide properties, e.g. density and viscosity. These
properties coincide with those that Compular’s methods predict well. This could enable
the development of a set of electrolyte descriptors for predicting cycling performance
based on supervised learning, where the molecular compounds and cycling data come from
experiments, while the rest of the descriptors are based on Compular’s simulation and
analysis techniques.

Such descriptors, combined with experimental data on molecular compounds and cycling
performance, could enable the creation of predictive models for cycling performance.
This approach would bridge the gap between empirical experiments and theoretical
understanding, facilitating battery analysis and reducing the need for extensive manual
testing. This raises the question: is it possible to develop efficient electrolyte
descriptors that capture relevant data from Compular’s analysis software and include
the decisive information that enables a machine learning (ML) model to produce accurate
cycling predictions?


1.2 Aim
The aim of this thesis is to develop structural electrolyte descriptors that could be
utilized to predict the cycling performance of electrochemical cells.

1.3 Limitations
The main objective is to focus exclusively on the development of structural electrolyte
descriptors. As a primary limitation, only an initial version of a neural network will be
outlined, with the purpose of confirming the predictive capabilities of the descriptors
rather than predicting cycling performance accurately. Consequently, the scope of tuning
and model evaluation is confined to analyzing the utility of the descriptors for
prediction purposes. Moreover, the development of electrolyte descriptors is subject to
limitations imposed by the availability of data in open-source datasets and the time
allocated for data collection. Finally, this project is a preliminary study, which
inherently limits the depth of the analysis and the extent to which its outcomes can
draw on existing research findings.



2 Theory

In this chapter, the fundamentals of lithium-ion batteries are explained along with
the properties of electrolytes in such batteries. Furthermore, the techniques used to
analyze battery performance are presented, as well as MD, which is used to simulate
electrolytes. Additionally, the basics of graph theory and the theoretical background of
artificial neural networks (ANNs) and graph neural networks (GNNs) are introduced.

2.1 Lithium-Ion Batteries
LIBs are extensively employed as power sources in a wide range of applications owing
to their high energy densities, high Coulombic efficiencies, wide electrochemical
stability windows, high ionic conductivity, low self-discharge and the range of voltages
accessible with diverse electrode designs [10, 11]. LIBs have a higher energy density
than other rechargeable batteries such as nickel-metal hydride and lead-acid batteries,
owing in part to their higher operating voltages. While energy density is one of the most
important factors for portable electronics, cycle life is equally relevant in many
applications, especially when the lifespan of the battery is considered [12].

The structure of a LIB is presented in Figure 2.1. The battery consists of two
electrodes, a separator, two current collectors and an electrolyte. The negative and
positive electrodes, also known as the anode and cathode, are usually separated by a
porous separator. The electrolyte typically consists of a lithium salt dissolved in a
liquid mixture of one or several carbonates, such as ethylene carbonate (EC) and ethyl
methyl carbonate (EMC). During charge, lithium ions move from the positive electrode
(cathode) to the negative electrode (anode) through the electrolyte, while electrons are
driven by the external power source through an external circuit in the same direction,
balancing the charge. The opposite happens when the battery is discharged: the lithium
ions move through the electrolyte from the anode to the cathode, while electrons flow
through the external circuit from the anode to the cathode [13], creating a current that
powers the connected electronic device. This process is possible due to the attributes
of the electrolyte, which plays an important role in ensuring the functionality of the
battery, e.g. enabling ion transport while preventing electron transport between the
electrodes [14].


Figure 2.1: Schematic of a Lithium-Ion battery.

2.1.1 Electrolytes
Electrolytes play a crucial role in batteries, serving as the medium through which
ions flow between the electrodes and facilitating the conversion of chemical energy
into electrical energy. In general, an electrolyte is a solution of a salt in a solvent,
e.g. water, in which the dissolved salt forms ions that make the solution electrically
conductive. In a battery, the electrolyte is typically a liquid or gel through which
ions move between the two electrodes as the battery is charged and discharged. During
discharge, electrochemical reactions at the electrodes cause ions to flow through the
electrolyte from the anode to the cathode, while electrons flow through the external
circuit, generating an electric current as seen in Figure 2.1. In rechargeable
batteries, these reactions are reversible, meaning that the ions flow back from the
cathode to the anode during charging, restoring the battery’s state of charge [15,16].

The choice of electrolyte in a battery is critical due to its effect on electrochemical
performance, safety and lifespan [11]. Common electrolytes for LIBs consist of a lithium
salt such as lithium hexafluorophosphate (LiPF6) or lithium bis(fluorosulfonyl)imide
(LiFSI) dissolved in an organic solvent [17,18]. The properties of a specific electrolyte
play an important role in determining the attainable cycle life of the cell and its
behavior under given conditions. From a cycling perspective, this thesis identifies the
following properties as essential to consider when creating electrolyte descriptors:

• Ionic conductivity: The ability of the electrolyte to conduct ions is crucial
for the overall performance of the battery, as it affects the rates of charge and
discharge (during discharge, the rate at which electrons spontaneously flow through
the external circuit from the negative to the positive electrode). In this context,
the partial conductivity of the active ion (Li+ in LIBs) is especially important, since
this is the ion involved in the electrochemical reactions that drive the cell [19,20].


• Electrical mobility: The drift velocity per unit field strength that charged
particles, such as ions, acquire in the electric field created by the electrodes [21].

• Electrochemical Stability Window (ESW): The range of potentials or voltages
within which an electrolyte is stable against oxidation and reduction. At too
high or too low voltages, the oxidation or reduction of the electroactive species
may cause undesirable side reactions, such as the breakdown of the solvent or
electrode materials, gas evolution, or the formation of unwanted byproducts.
The ESW is determined by the redox potentials of the species involved in the
electrochemical reaction, as well as by the chemical and physical properties of
the solvent, electrode and the reaction environment [22,23].

• Solvent stability: The stability of the solvent in the electrolyte is important
for maintaining the homogeneity of the solution and avoiding degradation over
time [24].

• Redox stability: The stability of the electrolyte against oxidation-reduction
reactions is important for maintaining the stability and consistency of the
battery voltage over time [25].

• Viscosity: The viscosity of the electrolyte affects the ion transport within the
battery and therefore affects the overall rate performance of the battery [23].

• Concentration: The concentrations of the atoms, molecules and molecular compounds
in an electrolyte determine how abundant each species is, and thereby how much it
contributes to bulk properties such as ionic conductivity.

• Thermal stability: The thermal stability of the electrolyte is important for
ensuring that the electrolyte does not decompose at high temperatures, which
can cause battery degradation [26].

• Electrolyte/electrode interfacial resistance: The resistance at the interface between
the electrolyte and the electrode affects the rate and efficiency of ion transport and,
therefore, the overall performance of the battery [24,26].

• Electrolyte/electrode compatibility: The compatibility between the electrolyte
and the electrodes is important for reducing the formation of unwanted byproducts,
which can affect the performance and longevity of the battery, and for enabling the
formation of a sufficient SEI layer, further explained in section 2.1.2.
The compatibility depends on a combination of the properties stated above [24].
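Several of the transport properties listed above are linked. As an illustration, the Python sketch below estimates the electrical mobility of an ion from its diffusivity through the Einstein relation, μ = |z|eD/(k_B T), and an ideal ionic conductivity through the Nernst-Einstein relation, σ = F²/(RT) Σ c_i z_i² D_i. Both relations assume uncorrelated ion motion, so real concentrated electrolytes, where ion pairing and other correlations matter, will deviate from these estimates. The function names and the diffusivities in the usage are illustrative assumptions, not values or methods taken from this thesis.

```python
# Physical constants in SI units (CODATA 2018 values)
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
FARADAY = 96485.33212                # C/mol
GAS_CONSTANT = 8.314462618           # J/(mol K)


def electrical_mobility(charge_number, diffusivity, temperature):
    """Einstein relation mu = |z| e D / (k_B T), returned in m^2/(V s).

    diffusivity in m^2/s, temperature in K.
    """
    return abs(charge_number) * ELEMENTARY_CHARGE * diffusivity / (BOLTZMANN * temperature)


def nernst_einstein_conductivity(species, temperature):
    """Ideal ionic conductivity in S/m, ignoring ion-ion correlations.

    species: iterable of (concentration in mol/m^3, charge number z, diffusivity in m^2/s).
    """
    prefactor = FARADAY**2 / (GAS_CONSTANT * temperature)
    return prefactor * sum(c * z**2 * d for c, z, d in species)


# Hypothetical 1 M (1000 mol/m^3) monovalent salt at 25 degrees C
T = 298.15
ions = [(1000.0, +1, 1.0e-10), (1000.0, -1, 1.0e-10)]
sigma = nernst_einstein_conductivity(ions, T)
mu_cation = electrical_mobility(+1, 1.0e-10, T)
print(f"sigma = {sigma:.3f} S/m, mu(cation) = {mu_cation:.3e} m^2/(V s)")
```

With these hypothetical diffusivities the estimate is roughly 0.75 S/m (7.5 mS/cm). The two functions are consistent with each other: summing c_i |z_i| F μ_i over all ions gives the same conductivity.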

These properties play a crucial role in determining the performance and longevity of
an electrolyte, and capturing them is therefore highly desirable when developing
electrolyte descriptors [19, 20, 22, 27, 28]. Moreover, to analyze the properties of an
electrolyte, the behavior of the battery needs to be studied in order to understand how
it is influenced by the electrolyte properties. Therefore, battery performance testing is
essential.


2.1.2 Battery Performance Testing
Traditional battery testing involves key parameters such as C-rates, Depth of Discharge
(DoD) and temperature, and the use of electrochemical techniques such as Cyclic
Voltammetry (CV) to study battery behavior within different cell chemistries. These
parameters are important for understanding the lifespan and characteristics of
different batteries. When the performance of a battery is analyzed, cycle life is often
an important aspect. The cycle life represents the number of charge-discharge cycles a
battery can undergo before its capacity falls below an acceptable level. It should be
noted that the cycle life of a battery is highly dependent on the DoD [29].

DoD is a term commonly used in battery testing and refers to the amount of charge that
has been removed from the battery during discharge, i.e. the capacity removed from the
battery. It is typically expressed as a percentage of the battery’s total capacity. For
example, if a 100 mAh battery is discharged at a current of 50 mA for 20 minutes, the
DoD will be:

$$\mathrm{DoD} = \frac{50\,\mathrm{mA} \cdot 20\,\mathrm{min}}{100\,\mathrm{mA} \cdot 60\,\mathrm{min}} \approx 16.7\,\%$$

DoD is an important parameter in battery testing because it directly affects the
battery’s overall lifespan and performance [30]. In general, deeper discharge cycles can
reduce the battery’s lifespan and performance, while shallower discharge cycles can help
to extend them [30]. Therefore, understanding the DoD of a battery is critical when
designing battery systems and performing battery testing. By testing a battery or
electrolyte at a controlled DoD, one can evaluate its performance and optimize its use
for specific applications.
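The DoD example above can be expressed as a small Python helper. It assumes a constant discharge current, so that the removed charge is simply the current multiplied by the discharge time; the function name is illustrative.

```python
def depth_of_discharge(current_mA, duration_min, capacity_mAh):
    """Return the depth of discharge in percent for a constant-current discharge."""
    removed_mAh = current_mA * duration_min / 60.0  # convert mA*min to mAh
    return 100.0 * removed_mAh / capacity_mAh


# The example from the text: 50 mA for 20 minutes from a 100 mAh battery
dod = depth_of_discharge(current_mA=50.0, duration_min=20.0, capacity_mAh=100.0)
print(f"DoD = {dod:.1f} %")  # prints "DoD = 16.7 %"
```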

Another important parameter in battery performance testing is the C-rate, a measure of
the rate at which a battery is charged or discharged relative to its rated capacity,
which is typically measured in ampere-hours (Ah) [31]. For example, if the capacity is
100 Ah and the battery is charged or discharged at a C-rate of 1C, the current is 100 A,
corresponding to a full charge or discharge in approximately one hour. At a C-rate of
2C, the current is 200 A. The C-rate has an impact on the cycle life of the battery
cell, and the lifespan typically decreases at higher C-rates [32]. If the battery is
used in a power tool or to drive a vehicle motor, the C-rate will inherently be high,
since these applications draw electricity at a high rate. However, vehicles often have
a power management system that limits the power drawn from the battery in order to
avoid rapid degradation of the battery [32].
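The C-rate arithmetic can be sketched in Python as follows. The nominal duration is an idealization that assumes the full rated capacity is delivered at any rate, which in practice is not the case since usable capacity drops at higher C-rates; the function names are illustrative.

```python
def current_from_c_rate(capacity_Ah, c_rate):
    """Charge/discharge current in A for a given C-rate and rated capacity in Ah."""
    return c_rate * capacity_Ah


def nominal_duration_h(c_rate):
    """Idealized time in hours for a full charge or discharge at a constant C-rate."""
    return 1.0 / c_rate


# The examples from the text: a 100 Ah cell at 1C and 2C
for rate in (1.0, 2.0):
    print(f"{rate:g}C -> {current_from_c_rate(100.0, rate):g} A, "
          f"about {nominal_duration_h(rate):g} h for a full discharge")
```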

Temperature is an additional parameter that can affect the performance of a battery.
A direct consequence of a higher temperature is a higher rate of chemical reactions in
the battery [33], which can lead to increased capacity loss. On the other hand, when the
ambient temperature is low, the capacity retention of a battery decreases, since the
ionic conductivity of lithium salt electrolytes declines [34], reducing the overall
performance of the battery cell. Therefore, the optimal temperature range for a LIB to
operate with good performance is generally 15 to 35 °C [35]. Temperature is thus an
important factor when measuring the performance of a battery, and operating outside this
range could shorten the lifespan of the LIB.
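The temperature dependence of reaction rates mentioned above is often approximated by the Arrhenius equation, k = A exp(−E_a/(RT)). The Python sketch below, which is not part of the methodology of this thesis, compares the rate of a hypothetical side reaction at two temperatures. The activation energy of 50 kJ/mol used in the usage is an arbitrary illustrative value, not a property of any specific electrolyte.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)


def arrhenius_rate_ratio(activation_energy_J_per_mol, t1_celsius, t2_celsius):
    """Ratio k(T2) / k(T1) for a reaction following k = A exp(-Ea / (R T)).

    The pre-exponential factor A cancels in the ratio.
    """
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return math.exp(activation_energy_J_per_mol / GAS_CONSTANT * (1.0 / t1 - 1.0 / t2))


# Hypothetical side reaction with Ea = 50 kJ/mol, compared at 25 and 45 degrees C
ratio = arrhenius_rate_ratio(50e3, 25.0, 45.0)
print(f"Rate at 45 C is {ratio:.1f} times the rate at 25 C")
```

Under this assumed activation energy, a 20 °C increase speeds up the reaction by a factor of about 3.5, which illustrates why elevated temperatures can accelerate capacity loss.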

CV [36] is a useful technique for analyzing the performance of electrolytes in battery
cells. In CV, the potential of an electrode is swept linearly back and forth between two
limits, and the resulting current is measured as a function of potential. CV can be used
to study the ion transport properties of the battery cell, as well as its ESW. It can
also be used to evaluate the electrochemical behavior of the electrolyte, such as its
redox behavior, and can provide information about the rate and mechanism of
electrochemical reactions.
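The potential program of a CV experiment can be sketched as a triangular waveform. The Python function below assumes a simple setup: it sweeps linearly from a start potential to a vertex potential at a constant scan rate, sweeps back, and repeats for subsequent cycles. The function name and the parameter values in the usage are illustrative, and the measured current response is not modeled.

```python
def cv_potential(t, e_start, e_vertex, scan_rate):
    """Electrode potential in V at time t (s) for a repeated triangular CV sweep.

    scan_rate is in V/s and must be positive; the sweep direction is set by
    whether e_vertex lies above or below e_start.
    """
    if scan_rate <= 0:
        raise ValueError("scan_rate must be positive")
    half_period = abs(e_vertex - e_start) / scan_rate  # time for one linear sweep
    direction = 1.0 if e_vertex > e_start else -1.0
    phase = t % (2.0 * half_period)                    # position within the current cycle
    if phase <= half_period:
        return e_start + direction * scan_rate * phase  # forward sweep
    return e_vertex - direction * scan_rate * (phase - half_period)  # reverse sweep


# Hypothetical sweep between 3.0 V and 4.2 V at 1 mV/s, sampled every 300 s
potentials = [cv_potential(t, 3.0, 4.2, 0.001) for t in range(0, 2401, 300)]
print([round(e, 2) for e in potentials])
```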

A further aspect considered when testing the cycling performance of batteries is the
choice of cutoff voltages, i.e. the specified lower and upper voltage limits of the
battery during cycling [37]. The battery test is generally conducted with an upper
cutoff voltage that avoids overcharging [38], which could have undesirable consequences
for the battery cell. These cutoff voltages will be referred to as the voltage range in
this thesis.

When testing the performance of batteries, and specifically of electrolytes, the usual
method is to cycle the batteries for a certain number of cycles or until failure. The
point of failure can be described as the end-of-life (EoL), which is when the capacity
has faded to the point that the battery is no longer considered to perform acceptably
for its area of use. EoL varies between batteries; for a LIB it is usually around 70%
of its initial capacity [39]. During battery performance testing, initial equilibration
cycles are performed to form a protective SEI on the electrodes, which effectively
widens the electrochemical stability window of the electrolyte. The SEI is a thin, solid
layer that acts as a protective film between the electrode and the electrolyte, with the
primary role of stabilizing the interface between them. The SEI layer forms as a result
of electrochemical reactions between the electrolyte and the electrode material. This
layer is crucial since it helps prevent further reactions between the electrolyte and
the electrode, reducing the degradation of the battery’s performance [40].
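Determining when a cell reaches EoL from a capacity-versus-cycle series can be sketched in Python as below. The 70% threshold follows the text; the option to normalize against a later reference cycle, e.g. to exclude the initial equilibration cycles, and the function name are assumptions for illustration, and the capacity values in the usage are made up.

```python
def cycles_to_threshold(capacities, threshold=0.7, reference_cycle=0):
    """Return the 1-based cycle number at which the capacity retention first
    falls to or below `threshold`, or None if it never does.

    capacities: discharge capacity per cycle, in any consistent unit.
    reference_cycle: 0-based index of the cycle taken as 100 % capacity.
    """
    reference = capacities[reference_cycle]
    for index in range(reference_cycle, len(capacities)):
        if capacities[index] / reference <= threshold:
            return index + 1
    return None


capacities = [1.00, 0.95, 0.90, 0.80, 0.72, 0.69, 0.60]  # made-up example data
print(cycles_to_threshold(capacities))  # prints 6
```

Returning `None` when the threshold is never reached keeps tests that were stopped after a set number of cycles distinguishable from tests that failed.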

The parameters described in this section are vital, as they affect the lifespan of the
battery under test. Many studies cycle batteries for a set number of cycles using
varying values of DoD, C-rate and temperature, and with different electrolyte
components [41–50]. An illustration of how the results of a cycling test can look is
presented in Figure 2.2, which is how results are usually presented when testing
different electrolytes and analyzing their capacity. In most studies, the salt,
solvent, electrodes, C-rates, DoD, temperature and voltage range are reported along
with the results.


Figure 2.2: Illustration of capacity fade compared to the number of cycles for three
different electrolyte solvents.

2.2 Molecular Dynamics
MD is a computational method used to study the behavior of molecules and their
interactions over time. It simulates the motion of atoms and molecules using classical
mechanics and statistical thermodynamics, providing insight into the physical and
chemical properties of materials at the molecular level. MD simulations can be used
to investigate a wide range of phenomena, from the behavior of simple gases and
liquids to the folding of proteins and the dynamics of chemical reactions. They are
widely used to predict the properties and behavior of molecules and materials under
different conditions and to design new materials with specific properties [51–53].
In MD, the force acting on each atom is calculated, and the resulting acceleration follows from Newton's second law of motion, as in (2.1).

$$\mathbf{a} = \frac{\mathbf{F}}{m} \tag{2.1}$$

There are many ways of integrating the motion of the atoms, which differ in accuracy, computational cost and stability [22]. Given the force acting on each atom, the spatial position of each atom can be predicted as a function of time. Within a simulation, the force is recalculated at every timestep ∆t, and the acceleration from (2.1) is used to update the position and velocity of each atom, as shown in Figure 2.3.
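The update loop in Figure 2.3 can be sketched in a few lines. This is a minimal sketch that assumes the velocity Verlet integrator, a common MD integrator used here only for illustration and not necessarily the CP2K setting used in this thesis. A single particle in a one-dimensional harmonic potential stands in for the interatomic forces. The names `harmonic_force` and `velocity_verlet` are hypothetical.

```python
from typing import Callable, List, Tuple


def harmonic_force(x: float, k: float = 1.0) -> float:
    """Illustrative force F = -k x from the harmonic potential U = k x^2 / 2."""
    return -k * x


def velocity_verlet(
    x: float,
    v: float,
    m: float,
    dt: float,
    n_steps: int,
    force: Callable[[float], float],
) -> List[Tuple[float, float]]:
    """Integrate a = F / m (2.1) and return (position, velocity) at every step."""
    a = force(x) / m                       # initial acceleration from (2.1)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt**2   # update the position
        a_new = force(x) / m               # recompute the force at the new position
        v = v + 0.5 * (a + a_new) * dt     # update the velocity with the mean acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Unit mass and spring constant, starting at rest one unit from equilibrium.
traj = velocity_verlet(x=1.0, v=0.0, m=1.0, dt=0.01, n_steps=1000, force=harmonic_force)
```

Velocity Verlet is popular in MD because it is time-reversible and keeps the total energy close to constant over long runs. In a real simulation, the scalars `x` and `v` become arrays of 3N coordinates, and the force comes from the interatomic potential.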


Figure 2.3: MD simulation schematic.

MD simulation has several strengths. Firstly, it computes the spatial position and motion of every atom at each moment in time, information that is difficult to obtain through experimental techniques. Secondly, MD is highly controllable and can be applied to specific dynamics, which makes it an effective way of analyzing molecular systems [54]. In this thesis, MD is performed with CP2K using the semi-empirical tight-binding method GFN-xTB [55]. CP2K is a software package for quantum chemistry and condensed matter physics. It provides a general framework for various modeling approaches and can perform atomistic simulations of diverse systems, such as solid-state, liquid, molecular, crystal and biological systems [56].

When MD is applied to the analysis of electrolytes, the molecular structure needs to be represented in a way that retains all of its complex inherent information. Consequently, the data from MD are represented using graphs.

2.3 Graph Theory
Graph theory is a branch of mathematics that deals with the study of graphs, which
are structures consisting of a set of vertices (also called nodes) and edges (the links)
connecting pairs of vertices. One of the key concepts in graph theory is the degree of
a vertex, which is the number of edges that connect to it. Graphs can be classified
based on their degree sequence, which is the list of degrees of the vertices in the
graph. For example, a graph in which all vertices have the same degree is called a
regular graph.

Another important concept in graph theory is connectivity, which refers to how easily information can flow between the vertices of a graph. A graph is said to be connected if there is a path between any two vertices. If a graph consists of two or more components with no path between them, it is called a disconnected graph [57, 58]. The connections within a graph can be encoded in an adjacency matrix of 0's and 1's. In this matrix, the entry in row i and column j is 1 if vertices i and j are connected by an edge and 0 otherwise.
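The concepts above can be illustrated with a small, hypothetical graph stored as an adjacency matrix. The plain-Python sketch below computes the degree sequence, checks regularity and tests connectivity with a breadth-first search. The function names and the example graph are illustrative.

```python
from collections import deque
from typing import List

# Adjacency matrix of an illustrative 4-vertex path graph 0-1-2-3:
# entry [i][j] is 1 if vertices i and j share an edge, else 0.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def degree_sequence(adj: List[List[int]]) -> List[int]:
    """Degree of each vertex, i.e. the number of edges incident to it (row sum)."""
    return [sum(row) for row in adj]


def is_regular(adj: List[List[int]]) -> bool:
    """A graph is regular if all vertices have the same degree."""
    return len(set(degree_sequence(adj))) <= 1


def is_connected(adj: List[List[int]]) -> bool:
    """Breadth-first search from vertex 0; connected if every vertex is reached."""
    n = len(adj)
    if n == 0:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
```

Removing the edge between vertices 1 and 2 splits the path into two components, so `is_connected` then returns `False`.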

When handling subgraphs, for instance a molecule in a molecular compound, these can be encoded into a vector space with graph embedding. This represents the connections of all subgraphs in a form that a network can use. Graph embedding is a technique used in graph analysis to transform graphs into a vector or a set of vectors, with the goal of capturing the structural information of the graph [59]. In practice, subgraphs can be embedded using PyTorch Geometric [60], which allows them to be passed to a network such as a GNN, explained in section 2.4.1. Such graphs can be represented using NetworkX [61], a Python module for the creation, manipulation and study of the structure, dynamics and functions of complex networks. NetworkX provides functions that add nodes and edges from the simulated input data to create a structured graph representation.

A graph representation prevents the molecular representation from losing information about the connections between molecules and atoms. This is significant because the inherent associations between atoms, molecules and molecular compounds need to be preserved and analyzed. Molecular graph representations can be obtained from MD simulation data using the Chalmers hierarchical atomic, molecular, polymeric, and ionic analysis toolkit (CHAMPION) [5,22], as presented in Figure 2.4.

Figure 2.4: Graph representation of nodes and their respective connections, show-
ing the connection rules of a CHAMPION output.


2.4 Artificial Neural Networks
An ANN is an ML method inspired by the way the human brain makes complex perceptual and cognitive decisions [62,63]. Similar to a brain, the network consists of neurons that receive, process and transmit information to a new layer of neurons. The neurons are organized into layers, where each layer receives input from the previous layer and propagates its output to the next layer. An ANN can contain multiple layers of neurons between the input and output layers, called hidden layers. When each neuron in a layer receives input from all neurons in the previous layer and sends its output to all neurons in the next layer, the layer is called fully connected, as shown in Figure 2.5. Each input is multiplied by an individual numerical weight, which determines the strength of influence of the connection. The weights are updated and adjusted during the training process of the network. The weighted inputs are then summed, and a bias term is added before the activation function is applied. The bias term allows the neuron to influence its output even if all of its inputs are zero [64]. Finally, the result is passed through a non-linear activation function, which determines whether, and how strongly, each neuron is activated, as illustrated in Figure 2.6.

Figure 2.5: The architecture of a simple four-layered fully connected ANN with
three input neurons x, weights w(n), two hidden layers l(n) of four neurons with
corresponding biases b(n) and two output neurons y.


Figure 2.6: Schematic overview of the operations performed from input to output within a neuron in an ANN. Input values x_n are multiplied by weights w_nj, summed together with a bias b and then passed through an activation function φ.

The purpose of the activation function is to introduce non-linearity into the network, which allows it to model complex relationships between inputs and outputs. Without it, the ANN would reduce to a linear regression model, since stacking layers that only compute weighted sums plus biases still yields a weighted sum plus a bias. There are many different types of activation functions, such as the sigmoid function, the rectified linear unit (ReLU) function and the tanh function, each with its own strengths and weaknesses [64]. In this thesis, ReLU is used.

Altogether, given an activation function φ, input values $x_j$, weights $W_j$ and a bias b, the output value z of a neuron, as illustrated in Figure 2.6, is calculated as

$$z = \varphi\Big(\sum_j W_j x_j + b\Big) \tag{2.2}$$
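Equation (2.2) can be evaluated directly. The following is a minimal sketch that assumes ReLU as φ, the activation used in this thesis, and plain Python lists instead of a tensor library. The function names are illustrative.

```python
from typing import Sequence


def relu(s: float) -> float:
    """Rectified linear unit: max(0, s)."""
    return max(0.0, s)


def neuron_output(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output z = relu(sum_j w_j x_j + b) of a single neuron, as in (2.2)."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    s = sum(wj * xj for wj, xj in zip(w, x)) + b  # weighted sum plus bias
    return relu(s)


# Weighted sum 0.5 - 0.5 + 0.5 = 0.5; with bias 0.25 the output is 0.75.
z = neuron_output([1.0, 2.0, 4.0], [0.5, -0.25, 0.125], 0.25)
```

With a sufficiently negative bias, the pre-activation becomes negative and ReLU outputs 0. This shows how the bias shifts the point at which the neuron activates.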
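To assess the predictions of the model, a loss function is used. The loss function can be defined in different ways, depending on how the difference between the ground truth $\hat{y}_i$ and the network output $y_i$ should be quantified. A common choice is the squared error, written here with a factor 1/2 that cancels when the loss is differentiated:

$$L = \frac{1}{2}\sum_i (\hat{y}_i - y_i)^2 \tag{2.3}$$

Up to a constant factor, this is the mean squared error (MSE), $\frac{1}{N}\sum_i (\hat{y}_i - y_i)^2$ over N outputs, and both losses have the same minimizer.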
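The squared-error loss with the factor 1/2 and the conventional MSE differ only by a constant factor for a fixed number of outputs. A small plain-Python sketch makes this relation explicit. The function names are illustrative.

```python
from typing import Sequence


def _check(y_hat: Sequence[float], y: Sequence[float]) -> None:
    """Ground truth and predictions must be non-empty and of equal length."""
    if len(y_hat) != len(y) or len(y) == 0:
        raise ValueError("need equally long, non-empty sequences")


def half_squared_error(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """L = 1/2 * sum_i (yhat_i - y_i)^2, where yhat is the ground truth."""
    _check(y_hat, y)
    return 0.5 * sum((t - p) ** 2 for t, p in zip(y_hat, y))


def mean_squared_error(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """MSE = (1/N) * sum_i (yhat_i - y_i)^2 over N outputs."""
    _check(y_hat, y)
    return sum((t - p) ** 2 for t, p in zip(y_hat, y)) / len(y)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 1.0, 5.0]   # errors 0, 1, -2 -> squared errors 0, 1, 4
```

For these values, the first loss is 2.5 and the MSE is 5/3. Their ratio, N/2 = 1.5, does not depend on the predictions, so minimizing either loss gives the same parameters.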

For an ANN, the weights and biases of each neuron are randomly initialized and then updated at each iteration of a training process. The gradients needed for these updates are computed by backpropagation, which calculates the gradient (derivative) of the loss function with respect to each weight for the current state of the network. This gradient indicates how much, and in which direction, each internal parameter should change. Updating the parameters in the direction opposite to the gradient moves the network toward a local minimum of the loss function. A common update method is stochastic gradient descent (SGD), which applies a small update to each weight [65]:

$$\delta W^{(k)}_{mn} = -\alpha\,\frac{\partial L}{\partial W^{(k)}_{mn}} \tag{2.4}$$


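where the constant α > 0 is a predefined learning rate. Consider the output layer (layer 3), whose outputs are $y_i = \varphi\big(\sum_j W^{(3)}_{ij} z^{(2)}_j + b^{(3)}_i\big)$. Here, $z^{(2)}_j$ is the output of the jth neuron in the last hidden layer (layer 2), and $b^{(3)}_i$ is the bias of the ith output neuron. Let $W^{(3)}_{mn}$ be the weight connecting the nth neuron in layer 2 to the mth neuron in the output layer. Applying the chain rule to (2.3) gives the gradient for this output weight:

$$\frac{\partial L}{\partial W^{(3)}_{mn}} = -\sum_i (\hat{y}_i - y_i)\,\frac{\partial y_i}{\partial W^{(3)}_{mn}} \tag{2.5}$$

where the derivative of the output is taken through the output layer:

$$\frac{\partial y_i}{\partial W^{(3)}_{mn}} = \frac{\partial}{\partial W^{(3)}_{mn}}\,\varphi\Big(\sum_j W^{(3)}_{ij} z^{(2)}_j + b^{(3)}_i\Big) \tag{2.6}$$

which is equal to

$$\frac{\partial y_i}{\partial W^{(3)}_{mn}} = \varphi'\Big(\sum_j W^{(3)}_{ij} z^{(2)}_j + b^{(3)}_i\Big)\,\delta_{im}\,z^{(2)}_n \tag{2.7}$$

where $\delta_{im}$ is the Kronecker delta, which is 1 when i = m and 0 otherwise. Inserting (2.7) into (2.5), only the term i = m remains in the sum, so that $\partial L/\partial W^{(3)}_{mn} = -(\hat{y}_m - y_m)\,\varphi'\big(\sum_j W^{(3)}_{mj} z^{(2)}_j + b^{(3)}_m\big)\,z^{(2)}_n$. The bias is updated in the same way:

$$\delta b^{(k)}_m = -\alpha\,\frac{\partial L}{\partial b^{(k)}_m} \tag{2.8}$$

which, for the bias of the mth output neuron, gives

$$\frac{\partial L}{\partial b^{(3)}_m} = -(\hat{y}_m - y_m)\,\varphi'\Big(\sum_j W^{(3)}_{mj} z^{(2)}_j + b^{(3)}_m\Big) \tag{2.9}$$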
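The output-layer gradients can be checked numerically. Comparing an analytic gradient against a central finite difference is a standard way to validate a backpropagation implementation. The following is a minimal sketch for a single output layer that assumes:

- ReLU as φ, with its derivative taken as 0 at 0;
- the squared-error loss (2.3);
- plain Python lists.

All names and numbers are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(s: float) -> float:
    return max(0.0, s)


def relu_prime(s: float) -> float:
    """Derivative of ReLU; taken as 0 at s = 0 by convention."""
    return 1.0 if s > 0 else 0.0


def pre_activation(W: Matrix, b: List[float], z: List[float], i: int) -> float:
    """s_i = sum_j W_ij z_j + b_i for output neuron i."""
    return sum(W[i][j] * z[j] for j in range(len(z))) + b[i]


def forward(W: Matrix, b: List[float], z: List[float]) -> List[float]:
    """Output layer y_i = relu(s_i); z holds the outputs of the last hidden layer."""
    return [relu(pre_activation(W, b, z, i)) for i in range(len(W))]


def loss(y_hat: List[float], y: List[float]) -> float:
    """Squared-error loss L = 1/2 sum_i (yhat_i - y_i)^2."""
    return 0.5 * sum((t - p) ** 2 for t, p in zip(y_hat, y))


def output_layer_grads(W: Matrix, b: List[float], z: List[float],
                       y_hat: List[float]) -> Tuple[Matrix, List[float]]:
    """Chain rule for dL/dW_mn and dL/db_m: only the i = m term of the sum survives."""
    y = forward(W, b, z)
    dW = [[0.0] * len(z) for _ in W]
    db = [0.0] * len(W)
    for m in range(len(W)):
        delta_m = -(y_hat[m] - y[m]) * relu_prime(pre_activation(W, b, z, m))
        db[m] = delta_m                    # ds_m / db_m = 1
        for n in range(len(z)):
            dW[m][n] = delta_m * z[n]      # ds_m / dW_mn = z_n
    return dW, db


def sgd_step(W: Matrix, b: List[float], dW: Matrix, db: List[float],
             alpha: float) -> Tuple[Matrix, List[float]]:
    """Gradient-descent update: each parameter changes by -alpha * gradient."""
    W_new = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(W, dW)]
    b_new = [bi - alpha * g for bi, g in zip(b, db)]
    return W_new, b_new


# Illustrative values: 3 hidden outputs feeding 2 output neurons.
z0 = [0.5, 1.0, 0.2]
W0 = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
b0 = [0.1, 0.05]
y_hat0 = [1.0, 0.0]
dW0, db0 = output_layer_grads(W0, b0, z0, y_hat0)
```

Perturbing a single weight by a small ε and computing the change in the loss should reproduce the corresponding entry of `dW0`. One SGD step with a small learning rate should then lower the loss.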

2.4.1 Graph Neural Networks
A GNN is a deep learning method that operates on graphs such as those described in section 2.3. GNNs are used in many areas, such as social networks [66], physical systems [67] and knowledge graphs [68]. Before designing a GNN, the graph structure of the problem must be identified. There are two types of scenarios, non-structural and structural. In a non-structural scenario, the graphs are implicit, and the task is first to build the graphs and then to design a GNN that fits this graph structure. In a structural scenario, the graphs are explicit and already suited as input for the GNN [69].

Tasks on graphs are usually of three types: node-level, edge-level and graph-level tasks. These are typically either classification tasks, where the goal is to categorize nodes, edges or graphs into different classes, or regression tasks, where the aim is to predict a continuous value for a node, edge or graph [69]. The GNN is then designed to fit the specific problem using three kinds of computational modules: propagation, sampling and pooling. Propagation spreads information between nodes, usually by convolution or recurrence, which enables the network to capture relations between nodes and their features. Sampling is often combined with propagation when dealing with large graphs. Pooling is applied when high-level graph features need to be extracted from the node features [69].


A typical GNN architecture with the features described in this section is presented
in Figure 2.7.

Figure 2.7: Typical GNN architecture design.

Graph convolutions are used in graph convolutional networks (GCNs) to propagate information between nodes, similar to how convolutional neural networks (CNNs) combine neighboring pixels in an image [70]. Stacking convolutions allows each node to learn feature information from nodes that are further away. Each additional convolutional layer propagates information one more step along the edges, so the depth of the network determines how far the receptive field of each node grows [71]. Figure 2.8 illustrates how graph convolutions propagate features in a graph. In this example, two convolutions are performed to propagate features to node A from nodes up to two edges away in its neighborhood.
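The growth of the receptive field can be demonstrated with a toy example. The sketch makes the following assumptions:

- The graph is a hypothetical path A-B-C-D.
- The propagation rule is simplified: it sums a node's own features with its neighbors' features and omits the learnable weights, normalization and non-linearity of an actual GCN layer [70].
- Each node starts with a one-hot feature marking its own identity.

Under these assumptions, the non-zero entries of node A after k rounds show which nodes lie inside its receptive field. A sum pooling readout is included to show how node features are reduced to one graph-level vector. All names are illustrative.

```python
from typing import Dict, List, Set

Features = Dict[str, List[float]]

# Hypothetical path graph A - B - C - D, stored as neighbor lists.
NEIGHBORS: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
ORDER = ["A", "B", "C", "D"]


def one_hot_features(order: List[str]) -> Features:
    """Each node starts with a one-hot vector marking its own identity."""
    return {node: [1.0 if other == node else 0.0 for other in order] for node in order}


def propagate(features: Features, neighbors: Dict[str, List[str]]) -> Features:
    """One message-passing round: new feature = own feature + sum of neighbor features."""
    new: Features = {}
    for node, own in features.items():
        agg = list(own)                      # self-loop keeps the node's own features
        for nb in neighbors[node]:
            agg = [a + f for a, f in zip(agg, features[nb])]
        new[node] = agg
    return new


def receptive_field(features: Features, node: str, order: List[str]) -> Set[str]:
    """Nodes whose initial one-hot entry has reached `node` (non-zero positions)."""
    return {order[k] for k, value in enumerate(features[node]) if value != 0.0}


def sum_pool(features: Features) -> List[float]:
    """Graph-level readout: sum all node feature vectors into one vector."""
    return [sum(col) for col in zip(*features.values())]


h0 = one_hot_features(ORDER)
h1 = propagate(h0, NEIGHBORS)   # one layer: A sees {A, B}
h2 = propagate(h1, NEIGHBORS)   # two layers: A sees {A, B, C}
```

After two rounds, node D is still outside the receptive field of A. A third round brings it in, which matches the one-step-per-layer growth described above.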

Figure 2.8: Example of how graph convolutions propagate features to target
node A.


2.4.2 Descriptors
The performance of an ML model depends strongly on how its input data is represented. For example, when compounds are used as data, their representations are called descriptors [72]. The aim is to select good descriptors that fit the ML model, that is, descriptors that include data properties correlated with the target property [72]. Developing such descriptors is the primary objective of this thesis, as mentioned in section 1.2.

When handling complex data, such as databases containing information about chemical compounds, the data may not fit directly as input to a neural network. Therefore, some kind of feature extraction needs to be performed to reduce the dimensionality and create representative descriptors [73]. This transformation of the input data into suitable descriptors for an ML model is often referred to as feature engineering. The ability to adapt the descriptors to different ML models can also be important. Therefore, flexible descriptors that can be decoupled from the model are preferred, since some descriptors otherwise need to be altered to fit each specific model [74].

Figure 2.9: An example of how a simple descriptor of four features describes the
structure of a given input. In this example, each row of the descriptor is an atom
in the molecule. All four features are numerically represented for each atom in this
case.



3
Methodology

This chapter provides an overview of the approach taken to conduct the research and achieve the objectives outlined in the introduction. It describes in detail the procedures and techniques used in each of the four parts of the project: data collection, simulation, descriptor development and model development.

3.1 Data Collection
To develop electrolyte descriptors for predicting cycling performance, open-source work and journal articles were examined for data collection and annotation [41–50]. Unfortunately, the open-source battery databases did not contain the necessary data on cycling performance testing. However, the journal articles selected for analysis contained comprehensive results for the essential parameters and measurements obtained from classical battery performance testing, as described in section 2.1.2. The articles were analyzed numerically and graphically, and annotations were made accordingly.

The annotation process involved keyword searches for relevant terms such as long-term battery cycle testing, cycling performance, CV, ESW, cycle capacity, C-rates, DoD, voltage range and temperature. The aim was to find battery cycling tests that shared similar testing criteria and setup. Once battery testing with appropriate parameters was identified, the primary focus was to locate how the specific discharge capacity declined over the number of cycles, as illustrated in Figure 3.1. In some cases, the article also reported the electrodes surrounding the electrolyte, the ratios of the electrolyte salt and solvent, the C-rate, the voltage range and the long-term cycle number. Whenever these data were given, the electrolyte data were annotated in an Excel file. In cases where the capacity decline was not explicitly provided, it was obtained through a careful analysis and interpretation of the graph.



3. Methodology

Figure 3.1: Illustration of capacity fade compared to the number of cycles for three
different electrolytes, including a line for battery failure (70% capacity). The initial
drop in capacity is due to equilibration cycles, performed to create the SEI.

Hence, for each unique electrolyte tested, the following measurements, ratios and
essential parameters used were annotated:

• Salt
• Solvent
• Molar ratio of salt and solvent
• Electrodes
• Temperature
• C-rate
• Voltage range
• Cycles
• Capacity

All annotated cycling tests were performed with 100% DoD. After a sufficient amount of data had been annotated for initial testing, all annotated electrolyte data needed to be brought to a common scale for a reasonable comparison, since most electrolytes were cycled differently.
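For each test, the annotated quantities form one record. The following is a minimal sketch of such a record as a Python dataclass, with some basic consistency checks. It assumes that capacity is stored as a fraction of the initial capacity and that voltages are in volts. The class name, the field names and the example values are hypothetical and do not come from the annotated dataset.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CyclingTest:
    """One annotated cycling test (illustrative fields, not the thesis' spreadsheet columns)."""

    salt: str
    solvents: Tuple[str, ...]
    molar_ratio: Dict[str, float]         # amount of each component relative to the others
    electrodes: Tuple[str, str]           # (positive electrode, negative electrode)
    temperature_c: float                  # test temperature in degrees Celsius
    c_rate: float
    voltage_range_v: Tuple[float, float]  # (lower, upper) cut-off voltage in volts
    cycles: int                           # number of cycles reported in the test
    capacity_retention: float             # capacity after `cycles` / initial capacity
    dod: float = 1.0                      # all annotated tests used 100% DoD

    def __post_init__(self) -> None:
        low, high = self.voltage_range_v
        if not low < high:
            raise ValueError("voltage range must be given as (lower, upper)")
        if not 0.0 < self.capacity_retention <= 1.0:
            raise ValueError("capacity retention must lie in (0, 1]")
        if self.cycles <= 0:
            raise ValueError("number of cycles must be positive")


# Hypothetical example record.
example = CyclingTest(
    salt="LiPF6",
    solvents=("EC", "EMC"),
    molar_ratio={"LiPF6": 1.0, "EC": 5.0, "EMC": 5.0},
    electrodes=("NMC", "graphite"),
    temperature_c=25.0,
    c_rate=1.0,
    voltage_range_v=(3.0, 4.2),
    cycles=500,
    capacity_retention=0.85,
)
```

Validating each record when it is created catches transcription errors, such as a swapped voltage range, before the data reach the target value estimation.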

3.1.1 Target Value Estimation
A battery has generally reached its EoL when its capacity has faded to 70% of its initial capacity. Hence, the number of cycles until 70% capacity is estimated for all annotated data and used as a common target value (label) for the neural network.


The target value Cycles To 70% is calculated from the annotated Cycles and Capacity. Tests commonly start with equilibration cycles that form the SEI (section 2.1.2), which causes the quick drop in capacity at the start of the curves in Figure 3.1. The graphs therefore needed to be interpreted from how the curves behave after the equilibration cycles. All curves were interpreted graphically and annotated with an individual gradient factor. This gradient factor is the slope of the cycling curve at the end of the test and is used to calculate the number of cycles it takes for a test to fade down to 70% capacity. Cycles To 70% was then used as the label for cycling prediction.
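The extrapolation with the gradient factor can be written as a one-line formula. The following is a minimal sketch that assumes:

- the curve is linear after the equilibration cycles;
- capacity is expressed as a fraction of the initial capacity;
- the gradient factor is the capacity lost per cycle at the end of the test.

The function name and arguments are hypothetical. Tests that already fell below 70% are left to graphical reading, which this sketch does not cover.

```python
def cycles_to_70(cycles_tested: int, retention_end: float, fade_per_cycle: float,
                 threshold: float = 0.70) -> float:
    """Linearly extrapolate the cycle at which capacity retention reaches `threshold`.

    cycles_tested:  number of cycles in the reported test
    retention_end:  capacity at the end of the test / initial capacity
    fade_per_cycle: magnitude of the end-of-test slope (gradient factor),
                    i.e. fraction of initial capacity lost per cycle
    """
    if fade_per_cycle <= 0:
        raise ValueError("a positive fade rate is needed to reach the threshold")
    if retention_end <= threshold:
        raise ValueError("threshold already reached; read the crossing from the curve")
    # Remaining capacity above the threshold divided by the loss per cycle.
    return cycles_tested + (retention_end - threshold) / fade_per_cycle
```

For example, a test that ends after 500 cycles at 85% retention and loses 0.02% per cycle needs about 0.15 / 0.0002 = 750 further cycles. Its label is therefore roughly 1250 cycles.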

3.2 Simulation
The simulation step was performed in order to gather further information about
the electrolytes. This includes pre-processing of the electrolyte data collected, sim-
ulation in CP2K as well as data analysis using CHAMPION. The last step was
necessary since the descriptors developed in this thesis are based on the simulated
and analyzed electrolytes using Compular’s technique.

3.2.1 Simulation Pre-processing
In order to simulate the electrolytes gathered from the performance tests, certain
pre-processing steps were required. The components of the electrolyte had to be
annotated, the molar ratio of the electrolyte needed to be calculated and a simulation
geometry was necessary.

3.2.1.1 Electrolyte Components Annotation

The data collected in section 3.1 were further annotated, since molecular information was necessary for the electrolyte simulations. Each electrolyte to be simulated consists of a set of molecules. These molecules were compiled in a Python file in which each component is a dictionary of essential attributes: name, atom types, number of atoms, density and molecular mass. A Python dictionary representation of EC is presented in Figure 3.2.

Figure 3.2: Representation of EC in a Python dictionary.
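A component entry of this kind can also be used to derive quantities such as the number of atoms and the molar mass. The following is a minimal sketch under that assumption. The dictionary keys, the helper functions and the rounded density value are illustrative and need not match the layout in Figure 3.2. The composition of EC (C3H4O3) and the standard atomic masses are literature values.

```python
from typing import Dict

# Standard atomic masses in g/mol.
ATOMIC_MASS: Dict[str, float] = {"H": 1.008, "C": 12.011, "O": 15.999}

# Hypothetical layout of one component entry.
EC = {
    "name": "EC",                         # ethylene carbonate, C3H4O3
    "atoms": {"C": 3, "H": 4, "O": 3},    # atom types and their counts
    "density": 1.32,                      # g/cm^3, approximate literature value
}


def number_of_atoms(molecule: dict) -> int:
    """Total number of atoms in one molecule."""
    return sum(molecule["atoms"].values())


def molar_mass(molecule: dict) -> float:
    """Molar mass in g/mol computed from the atom counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in molecule["atoms"].items())
```

Deriving the molar mass from the atom counts, rather than storing it separately, keeps the two attributes consistent. This is a design choice of the sketch, not necessarily of the thesis code.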

Using this Python file as a database for molecular components in the electrolytes
chosen for simulation, the molar ratio for the specific electrolyte could be computed.


3.2.1.2 Molar Ratio Calculation

Using the electrolyte component information, the molar ratio of each electrolyte
could be determined. Since the different solvents in the electrolytes were specified
either by weight, mole per liter or weight percentage, the molar ratio was calculated
depending on what was presented in the data collected. The goal of this calculation
was to determine the number of molecules for each component of the electrolyte
while the number of total atoms was between 800 and 1200. The reason for the
constraint on the number of atoms was that the simulation of molecular dynamics
is complex and computationally expensive. Therefore, simulating over 1200 atoms
would take more than 3 days which was not possible in this project.

Since the molar ratio calculation differed between electrolytes, several functions were implemented in Python. The main function calculates the number of molecules of each component such that the total number of atoms is around 1000. Another function calculates the solvent ratios based on the molecular weights of the salt and the solvent. This is needed when the solvent ratio of the electrolyte is defined but the salt is given in moles per liter. The last two functions handle electrolyte solvents that are defined by weight or weight percentage. The first converts weight ratios of the solvents into molar ratios to match the ratio to the salt, while the second adds solvent components based on weight percentage.
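Two of these functions can be sketched as follows under simplifying assumptions. The first scales a molar ratio and rounds each component to the nearest whole number of molecules, which is an assumed rounding rule. The second converts solvent weight ratios into mole fractions. The function names, the species table and the example ratio LiPF6 : EC : EMC = 1 : 5 : 5 are illustrative and are not taken from the thesis code. Li+ and PF6- are counted as separate species, following the molecule list in section 3.3.1.

```python
from typing import Dict

# Atoms per molecule (EC: C3H4O3, EMC: C4H8O3; PF6- has 7 atoms).
ATOMS: Dict[str, int] = {"Li+": 1, "PF6-": 7, "EC": 10, "EMC": 15}
# Molar masses in g/mol.
MOLAR_MASS: Dict[str, float] = {"EC": 88.06, "EMC": 104.10}


def molecule_counts(ratio: Dict[str, float], atoms: Dict[str, int],
                    target_atoms: int = 1000) -> Dict[str, int]:
    """Scale a molar ratio so that the total number of atoms is close to `target_atoms`."""
    atoms_per_unit = sum(ratio[s] * atoms[s] for s in ratio)
    scale = target_atoms / atoms_per_unit
    # At least one molecule of every component listed in the ratio.
    return {s: max(1, round(ratio[s] * scale)) for s in ratio}


def total_atoms(counts: Dict[str, int], atoms: Dict[str, int]) -> int:
    """Total number of atoms in the simulation box."""
    return sum(n * atoms[s] for s, n in counts.items())


def weight_to_mole_fractions(weights: Dict[str, float],
                             molar_mass: Dict[str, float]) -> Dict[str, float]:
    """Convert solvent weight ratios (or weight percentages) into mole fractions."""
    moles = {s: w / molar_mass[s] for s, w in weights.items()}
    total = sum(moles.values())
    return {s: n / total for s, n in moles.items()}


# 1 : 5 : 5 molar ratio of salt ions and solvents (hypothetical example).
counts = molecule_counts({"Li+": 1, "PF6-": 1, "EC": 5, "EMC": 5}, ATOMS)
```

In this example, rounding gives 8 ion pairs, 38 EC and 38 EMC molecules, or 1014 atoms, which is inside the 800-1200 window. Because Li+ and PF6- share the same ratio, they are rounded identically and the box stays charge neutral.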

These methods were used to compute the amount of substance for each component
in the selected electrolyte. Subsequently, this information was used to create the ge-
ometry in CHAMPION, which was the last pre-processing step before the molecular
dynamics simulation.

3.2.1.3 Geometry Creation Using CHAMPION

To simulate the molecular dynamics, a starting geometry of each electrolyte was needed, since CP2K requires the initial atomic coordinates to be defined. The geometry was created in CHAMPION from the amount of substance of each component of the electrolyte, calculated as described in section 3.2.1.2. The created geometry defines the position and rotation of all molecular species at the initial step and is used as input to CP2K. An example of such a starting geometry is presented in Figure 3.3.


Figure 3.3: An illustration of the created starting geometry using CHAMPION.

3.2.2 Molecular Dynamics Simulation

Once an initial geometry with correct molar ratios and number of atoms had been prepared, the MD simulations could be started on high-performance computing (HPC) resources. Approximately five electrolytes were run at a time on parallel computing nodes using Amazon Web Services (AWS).

Each electrolyte was simulated for 60,000 timesteps with ∆t = 1 femtosecond, corresponding to 60 picoseconds of simulated time. The forces acting on the atoms were computed with CP2K as described in section 2.2. The final structure of a simulated electrolyte of approximately 1000 atoms is shown in Figure 3.4. After simulation, the structure is more ordered and reflects the interactions between different species in the system, resulting in a clearer arrangement of closest neighbors. In this thesis, structures that consist of more than one molecule are referred to as molecular compounds.

Figure 3.4: An illustration of the geometry after an electrolyte simulation using CP2K, final timestep. New molecular compounds have been formed from molecules and atoms.


3.2.3 Analysis Using CHAMPION
From the simulated electrolyte structure, a configuration file for the CHAMPION analysis was set up. The CHAMPION analysis was run on each electrolyte to obtain essential properties and characteristics of the electrolytes, as well as structural graph data describing the connections as nodes and edges. The output of the analysis was an SQLite database, which was used for descriptor development. Each molecular compound in the database has a graph number. The database also includes a list of edges between all molecules within each molecular compound, together with the corresponding parent graph. In addition, it contains a list of the nodes belonging to each graph, that is, the molecules of the corresponding molecular compound. With this information, the development of structural descriptors for each graph within the electrolyte could begin.
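Reading one graph from such a database amounts to two queries, one for the nodes and one for the edges of a given graph number. The actual CHAMPION schema is not described here. The sketch below therefore uses Python's standard `sqlite3` module with an in-memory database. The table names (`nodes`, `edges`), the column names and the rows are all assumptions.

```python
import sqlite3
from typing import List, Tuple

Node = Tuple[int, str]
Edge = Tuple[int, int]


def load_graph(conn: sqlite3.Connection, graph_id: int) -> Tuple[List[Node], List[Edge]]:
    """Return the (node_id, species) rows and (source, target) edges of one graph."""
    nodes = conn.execute(
        "SELECT node_id, species FROM nodes WHERE graph_id = ? ORDER BY node_id",
        (graph_id,),
    ).fetchall()
    edges = conn.execute(
        "SELECT source, target FROM edges WHERE graph_id = ? ORDER BY source, target",
        (graph_id,),
    ).fetchall()
    return nodes, edges


def demo_database() -> sqlite3.Connection:
    """In-memory database with a hypothetical schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (graph_id INTEGER, node_id INTEGER, species TEXT);
        CREATE TABLE edges (graph_id INTEGER, source INTEGER, target INTEGER);
        """
    )
    # Graph 1: a Li+ ion connected to two EC molecules and one PF6- anion.
    # Graph 2: a single free EMC molecule without edges.
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, 0, "Li+"), (1, 1, "EC"), (1, 2, "EC"), (1, 3, "PF6-"), (2, 4, "EMC")],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [(1, 0, 1), (1, 0, 2), (1, 0, 3)],
    )
    return conn
```

Parameterized queries (`?` placeholders) avoid building SQL strings by hand. A single-molecule graph simply returns an empty edge list.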

3.3 Descriptor Development
The development of the descriptors was the central part of this thesis. It involves extracting descriptive information about the simulated and analyzed electrolytes that can be used in an ML model. Producing detailed descriptors for the electrolytes required several processing steps: creating node embeddings and adding nodes, edges, graph measurements and graph attributes.

3.3.1 Node Embedding
As described in section 3.2.3, the molecular compounds in the produced database represent graphs with molecules as nodes. Since all molecular species of each graph are specified by name in the database, they needed to be converted into a numerical representation, in this case using one-hot encoding. The first step in performing one-hot encoding was to list all molecular species present in all simulated electrolytes. An example of such a list of molecules is presented below.

["Li+", "EMC", "EC", "PC", "PF6−", "FMES", "VC"]

When the nodes of a graph were embedded, each node was compared to a molecule list similar to the one above, and its one-hot encoding was created. The one-hot encoding for a node that is EC would therefore look like this:

[0, 0, 1, 0, 0, 0, 0]
This way, each molecule (node) of each graph gets a vector representation instead of
a name, which is useful since neural networks handle numerical values rather than
categorical text strings.
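The encoding step can be sketched in a few lines of Python. The species list mirrors the example above, with ASCII "PF6-" in place of the typographic minus. The function name is illustrative. Raising an error for unknown species, rather than returning an all-zero vector, is an assumed design choice.

```python
from typing import List, Sequence

SPECIES: List[str] = ["Li+", "EMC", "EC", "PC", "PF6-", "FMES", "VC"]


def one_hot(species: str, vocabulary: Sequence[str] = SPECIES) -> List[int]:
    """One-hot vector with a 1 at the position of `species` in `vocabulary`."""
    if species not in vocabulary:
        raise KeyError(f"unknown species: {species}")
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(species)] = 1
    return vec
```

The length of every vector equals the number of distinct species across all simulated electrolytes. The list must therefore be fixed before any graph is encoded, so that all electrolytes share the same positions.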


3.3.2 Graph Construction
When the node embedding was completed, the graphs consisting of these nodes were constructed using the databases generated from the analysis. As described in section 3.2.3, every molecular compound in the electrolyte has a corresponding graph number, which is linked to all node and edge numbers of that graph. The interrelated nodes and edges could therefore be extracted from the database accordingly.

The first step in constructing the graphs was to create empty graphs using the NetworkX module in Python, described in section 2.3. A new class called MolGraph was created in this project. It inherits all features of a NetworkX graph and contains additional methods. The first additional method adds the nodes and edges to the graph from the corresponding database. This method is essential for structuring the molecular compound graph, with its molecules as nodes and the connections between them as edges. Additionally, this method generates the corresponding one-hot encoding for each molecule in the graph, as described in section 3.3.1.
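The construction step can be illustrated without depending on NetworkX. The sketch below uses a hypothetical `MolGraphSketch` class that builds an undirected graph from node rows (node_id, species) and edge rows (source, target). It also attaches a one-hot vector to each node. In the thesis, MolGraph instead subclasses the NetworkX graph class, whose own node and edge methods would take the place of the dictionaries used here.

```python
from typing import Dict, Iterable, List, Set, Tuple

SPECIES: List[str] = ["Li+", "EMC", "EC", "PC", "PF6-", "FMES", "VC"]


class MolGraphSketch:
    """Stand-in for MolGraph: molecules as nodes, connections as undirected edges."""

    def __init__(self) -> None:
        self.node_attrs: Dict[int, Dict[str, object]] = {}
        self.adj: Dict[int, Set[int]] = {}

    def add_from_rows(self, nodes: Iterable[Tuple[int, str]],
                      edges: Iterable[Tuple[int, int]]) -> None:
        """Add nodes with one-hot features, then the edges between them."""
        for node_id, species in nodes:
            if species not in SPECIES:
                raise KeyError(f"unknown species: {species}")
            onehot = [1 if s == species else 0 for s in SPECIES]
            self.node_attrs[node_id] = {"species": species, "x": onehot}
            self.adj.setdefault(node_id, set())
        for u, v in edges:
            if u not in self.adj or v not in self.adj:
                raise KeyError(f"edge ({u}, {v}) refers to an unknown node")
            self.adj[u].add(v)   # undirected: store the edge in both directions
            self.adj[v].add(u)

    def number_of_nodes(self) -> int:
        return len(self.adj)

    def number_of_edges(self) -> int:
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2


# Made-up compound: Li+ connected to two EC molecules and one PF6- anion.
g = MolGraphSketch()
g.add_from_rows([(0, "Li+"), (1, "EC"), (2, "EC"), (3, "PF6-")], [(0, 1), (0, 2), (0, 3)])
```

The node and edge rows have the same shape as those returned by a database query on one graph number. This keeps the reading and construction steps independent of each other.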

3.3.3 Graph Feature Extraction
Following graph construction, the features of the graphs were extracted from the electrolyte analysis database. This is necessary since all molecular compounds in the electrolyte have features that affect the behavior and performance of the electrolyte. The class MolGraph has supplementary methods that add features to the graph. These methods extract attributes such as charge, mass, diffusivity, concentration, molar conductivity, electrical mobility and ionic conductivity for each molecular compound in the electrolyte and append them as graph features. To avoid including irrelevant information from the database, graphs that are not present in the final simulation step are excluded from the electrolyte representation. In the database, these graphs have a concentration of zero.
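Extracting the graph features and excluding absent graphs can be sketched as a filter over per-graph rows. The key names below mirror the attributes listed in this section but are assumptions. The input rows stand in for whatever the database query returns.

```python
from typing import Dict, List

# Graph-level attributes listed in this section; key names are illustrative.
FEATURE_NAMES: List[str] = [
    "charge", "mass", "diffusivity", "concentration",
    "molar_conductivity", "electrical_mobility", "ionic_conductivity",
]


def graph_features(rows: Dict[int, Dict[str, float]]) -> Dict[int, List[float]]:
    """Map graph id -> feature vector, skipping graphs absent from the final step."""
    features: Dict[int, List[float]] = {}
    for graph_id, row in rows.items():
        if row["concentration"] <= 0.0:   # zero concentration: not present in the final step
            continue
        features[graph_id] = [float(row[name]) for name in FEATURE_NAMES]
    return features


# Made-up rows for two graphs; graph 2 has zero concentration and is dropped.
rows = {
    1: {"charge": 0.0, "mass": 7.0, "diffusivity": 1.0, "concentration": 0.5,
        "molar_conductivity": 2.0, "electrical_mobility": 3.0, "ionic_conductivity": 4.0},
    2: {"charge": 1.0, "mass": 95.0, "diffusivity": 0.5, "concentration": 0.0,
        "molar_conductivity": 1.0, "electrical_mobility": 1.5, "ionic_conductivity": 0.0},
}
features = graph_features(rows)
```

Fixing the order of the feature names gives every graph vector the same layout, which later layers of the model rely on.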

This procedure was performed on each graph in the electrolyte and the resulting
graph objects acquired the necessary features to describe a molecular compound.
Subsequently, the graphs needed to be assembled to represent the whole electrolyte,
which was done using graph batching.

3.3.4 Graph Batching
The constructed graphs were not yet assembled, so they needed to be aggregated into a suitable object that represents an electrolyte and can be used as input to an ML model. For this, the PyTorch Geometric batch object was used, which aggregates graphs into a batch that functions as one large graph. A class called Electrolyte was created that inherits the features of a batch object. This electrolyte object represents all graphs in the electrolyte and includes the corresponding graph features for each graph within the electrolyte. This way, graph embedding methods such as a GNN can take the electrolyte as input. Furthermore, three additional methods were added to the electrolyte class:

- a method for plotting the electrolyte;
- a method for plotting a specific molecular compound from an electrolyte;
- a print method that outputs the graph feature values of all molecular compounds in an electrolyte.

The methods are flexible with respect to which molecular compound is plotted or printed and which graph features are visualized.
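PyTorch Geometric forms a mini-batch in three steps. It concatenates the node features, shifts each graph's edge indices by the number of nodes that precede it, and records in a `batch` vector which graph each node belongs to. The plain-Python sketch below reproduces this bookkeeping for illustration only. It is not the PyTorch Geometric implementation, and it stores edges as (source, target) pairs rather than a 2 × E tensor. The function name is hypothetical.

```python
from typing import List, Tuple

# A graph as (node feature vectors, edges with graph-local node indices).
Graph = Tuple[List[List[float]], List[Tuple[int, int]]]


def batch_graphs(graphs: List[Graph]) -> Tuple[List[List[float]], List[Tuple[int, int]], List[int]]:
    """Merge graphs into one large, disconnected graph."""
    x: List[List[float]] = []
    edge_index: List[Tuple[int, int]] = []
    batch: List[int] = []
    offset = 0
    for graph_idx, (features, edges) in enumerate(graphs):
        x.extend(features)
        # Shift local node indices by the number of nodes in the preceding graphs.
        edge_index.extend((u + offset, v + offset) for u, v in edges)
        batch.extend([graph_idx] * len(features))
        offset += len(features)
    return x, edge_index, batch


g1: Graph = ([[1.0], [2.0], [3.0]], [(0, 1), (0, 2)])
g2: Graph = ([[4.0], [5.0]], [(0, 1)])
x, edge_index, batch = batch_graphs([g1, g2])
```

The merged graph has no edges between different compounds, so message passing stays within each molecular compound. The `batch` vector later tells the pooling step which nodes belong together.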

The resulting electrolyte objects therefore function as descriptors of the electrolyte representations in the database. This gives the information about the simulated and analyzed electrolytes a concrete form and allows the descriptors to function as input to an ML model. Additionally, the electrolytes can be analyzed further after processing, and specific aspects of their features can be inspected in detail. An illustration of what the electrolyte descriptor represents is presented in Figure 3.5.

Figure 3.5: Illustration of what the electrolyte descriptors are representing. The
features in this example are arbitrary.

3.4 Model Development

The ML model was developed as a limited initial version, intended to confirm the predictive capability of the descriptors rather than to predict cycling performance accurately. This section describes the model development step by step, including its parameters, operations and architecture, and lastly the training and evaluation procedure.


3.4.1 Model Design
The model receives the electrolyte descriptors as PyTorch Geometric batch objects, referred to as DataElectrolyte objects. As described in section 3.3.4, these descriptors consist of molecular compounds represented as batched graphs with associated graph features. The model's objective is to find patterns in the node connections within the graphs and then to learn from the graph-level features. To achieve this, the network is divided into two parts: a GNN with message-passing convolutions and a multi-layer perceptron (MLP) that makes the prediction.

The first part of the model consists of a GNN with message passing. The electrolyte descriptor inputs were transformed into node-embedded graphs, where each node embedding represents a molecule and its connections. Message passing between the nodes was implemented with two layers of graph convolutions, followed by pooling all nodes of each graph into weighted graph values. The pooled graph values were then concatenated with the respective graph features for the MLP and prediction. This part of the model is illustrated schematically in Figure 3.6.
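Pooling with a batch vector and concatenating graph features can be sketched in plain Python. The exact pooling operator behind the weighted graph values is not restated here. The sketch therefore uses global mean pooling as a stand-in, which is a swapped-in technique and not necessarily the one in the thesis model. The function names and numbers are illustrative.

```python
from typing import List, Sequence


def global_mean_pool(x: List[List[float]], batch: List[int], num_graphs: int) -> List[List[float]]:
    """Average the node feature vectors of each graph, using the batch vector."""
    if len(x) != len(batch):
        raise ValueError("x and batch must have the same length")
    dim = len(x[0]) if x else 0
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feats, g in zip(x, batch):
        counts[g] += 1
        sums[g] = [s + f for s, f in zip(sums[g], feats)]
    # Graphs without nodes keep a zero vector.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def concat_graph_features(pooled: List[List[float]],
                          graph_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append each graph's own features (e.g. concentration) to its pooled vector."""
    return [p + list(f) for p, f in zip(pooled, graph_feats)]


# Three nodes: the first two belong to graph 0, the last one to graph 1.
pooled = global_mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1], num_graphs=2)
combined = concat_graph_features(pooled, [[0.1], [0.2]])
```

After concatenation, each molecular compound is described by a single vector. This vector combines what the convolutions learned from its structure with its simulated graph-level properties.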

Figure 3.6: Schematic overview of the GNN part of the network. Electrolyte descriptors as DataElectrolyte objects are sent into the network, where each molecular compound becomes a graph with each molecule as an embedded node. The node-embedded graphs are passed through graph convolutions that perform two layers of message passing within each graph, followed by pooling that summarizes each graph into a weighted value.

The second part of the model consists of an MLP with linear fully connected layers. The graph features were concatenated with the pooled graph values. The graphs were then sorted by concentration so that the molecular compounds with the highest concentration come first. From these sorted values, P graphs were padded to the same size to be used in the MLP. The following layer is a graph-wise fully connected layer that learns from the graph features and outputs fully connected values. These are passed through several fully connected layers, with a ReLU activation function after each layer. Lastly, the model predicts the cycle performance, and the prediction is compared against the true target value Cycles To 70%, which is normalized around zero. The prediction is evaluated with MSE loss, a common choice in regression tasks, where the goal is to minimize the error between the target value and the predicted value. The complete network architecture from input to prediction, including the network operations and dimensions, is shown in Figure 3.7.
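The sorting, padding and target normalization steps can be sketched as follows. The sketch makes the following assumptions:

- The P graphs with the highest concentration are kept.
- Zero vectors are appended when an electrolyte has fewer than P graphs.
- "Normalized around zero" means z-score standardization.

These are interpretations for illustration, not confirmed details of the thesis model. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def sort_and_pad(graph_vectors: Sequence[Sequence[float]], concentrations: Sequence[float],
                 P: int, dim: int, pad_value: float = 0.0) -> List[List[float]]:
    """Keep the P most concentrated graphs and pad to exactly P vectors of length dim."""
    if len(graph_vectors) != len(concentrations):
        raise ValueError("one concentration per graph vector is required")
    order = sorted(range(len(graph_vectors)), key=lambda i: concentrations[i], reverse=True)
    selected = [list(graph_vectors[i]) for i in order[:P]]
    selected += [[pad_value] * dim for _ in range(P - len(selected))]
    return selected


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score standardization: subtract the mean and divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize constant values")
    return [(v - mean) / std for v in values]


vectors = [[1.0], [2.0], [3.0]]
conc = [0.1, 0.5, 0.3]
top2 = sort_and_pad(vectors, conc, P=2, dim=1)     # two most concentrated graphs
padded = sort_and_pad(vectors, conc, P=5, dim=1)   # all three plus two zero vectors
z = standardize([100.0, 200.0, 300.0])
```

A fixed input size of P × dim lets every electrolyte pass through the same fully connected layers, however many molecular compounds it contains. The mean and standard deviation used for the target should come from the training set only, so that no information leaks from the validation data.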

Figure 3.7: Illustration of the full network architecture from developed electrolyte
descriptor input to predicted cycle performance output. The network is divided
into graph convolutions, graph pooling, graph feature concatenation, sorting and
padding, graph-wise fully connected layer, fully connected layers and prediction.
Input vector and matrix sizes are shown in between all network operations.

To obtain an initial indication of whether the descriptors are predictive, the data were divided into training and validation sets. The Adam optimizer, MSE loss and a low learning rate were used, considering that the amount of data is limited. Training was performed until the network converged or reached a minimum, and the training loss was compared with the validation loss. Hyperparameter tuning, evaluation metrics and visual interpretation were used to evaluate the results and to analyze whether the descriptors were sufficient as input for the model.
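The data split and the stopping criterion can be sketched as below. The 20% validation fraction, the fixed random seed and the patience-based convergence test are illustrative assumptions; the text does not specify how the split was made or how convergence was judged, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(items: Sequence[T], val_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and hold out a fraction of the items for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def has_converged(losses: Sequence[float], patience: int = 20, tol: float = 1e-4) -> bool:
    """True if the best loss of the last `patience` epochs beat the earlier best by less than tol."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < tol


train, val = train_val_split(range(14))  # e.g. indices of 14 electrolyte descriptors
print(len(train), len(val))  # 11 3
```

With so few electrolytes, a validation set of three samples makes the validation loss very sensitive to which electrolytes happen to be held out.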

4
Results & Discussion

This chapter analyzes the research efforts and interprets the obtained results. The annotated data, the developed electrolyte descriptors and the results from the GNN are presented, with emphasis on the complexity of the electrolyte descriptors and their predictive ability. The methods employed are evaluated, limitations are acknowledged and the broader implications of electrolyte descriptors for cycle prediction are discussed.

4.1 Annotated Data
The collected data consisted of 104 battery cell cycle performance tests. These tests involved various electrolyte compositions, electrodes, temperatures, C-rates and voltage ranges. The cycle performance of each test was annotated, and the target value Cycles To 70% was estimated as described in Section 3.1.1. The resulting target values are presented in Figure 4.1.

Figure 4.1: Cycle performance of 104 annotated battery cycle tests, given as Cycles to 70% capacity for each electrolyte test. The temperature used in each test is also shown. The relevant data from test 81 are presented in the bottom part of the figure.

The annotated data are profoundly important in developing electrolyte descriptors, as they determine the fundamental properties of the electrolytes that are described and ultimately influence the characteristics, complexity and dimensions of the descriptors, which serve as the network input. Consequently, the annotated data strongly influence the predictive capabilities of the descriptors. The optimal approach would therefore be to obtain a large amount of structural data from a single reliable source. This was not an option, since no such publicly available dataset exists, which limited the data collection to manual collection and annotation. While the amount of data collected in this project may not be sufficient for extensive model training, it does provide enough data for developing functional electrolyte descriptors and testing a functional prediction model.

One of the main challenges associated with collecting battery data from open sources is the complexity and variability of cycling test characteristics. It is challenging not only to find comparable research papers but also to find papers that report all the information used in their battery testing. Additionally, testing conditions such as C-rate and voltage range vary significantly among these papers. This variability adds complexity and increases the dimensionality, thereby weakening the comparability between the different battery tests. Parameters that profoundly affect the results include temperature, C-rate, voltage range and the electrodes used. For instance, a battery test conducted over a wider voltage range may face more challenging conditions and consequently yield worse cycling results than a similar test performed over a narrower voltage range. Furthermore, openly published battery tests can easily mislead readers when test parameters are tuned or results are presented from selected viewpoints [75]. This makes it challenging to differentiate and compare performance across battery tests, which in turn makes it harder for an ML model to identify patterns.

This complexity could be reduced by limiting the number of varying parameters and performance metrics, for example by performing every battery test under the same conditions, i.e., in the same dimensions. Ideally, the electrolyte composition would be the only varying parameter. However, as mentioned in the introduction, this lies beyond the scope of this project and is constrained by the availability of open-source battery testing resources.

Furthermore, the estimated target value Cycles to 70% is a convenient way to compare battery performance, as it assigns standardized labels to battery tests within the same dimension. The 70% mark is based on the recognition that a battery has reached its EoL at this point, which is commonly used as an indicator of a finished test. This target value provides a consistent reference across all tests, making it a suitable label for ML. However, it is a simplified measure of performance and may lose accuracy, since the reported data vary. Various factors can affect the behavior of battery tests, including the initial equilibration cycles explained in Section 3.1.1. These are performed differently across research papers, which was handled by multiplying by an individual gradient factor. Although efforts were made to determine this factor through interpretation and analysis, it should be considered arbitrarily chosen. In conclusion, the labels are only as accurate as visual interpretation allows, which makes prediction harder.
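For reference, reading a threshold crossing off a capacity curve can be sketched as follows. This is a generic linear-interpolation sketch, not the procedure of Section 3.1.1: it does not model the equilibration cycles or the gradient factor, and it simply returns None when a test ends before reaching 70%. The function name, the `ref_index` parameter and the example capacities are hypothetical.

```python
from typing import Optional, Sequence


def cycles_to_threshold(cycles: Sequence[float], capacities: Sequence[float],
                        threshold: float = 0.70, ref_index: int = 0) -> Optional[float]:
    """Interpolated cycle at which capacity first falls to threshold * reference capacity.

    ref_index selects the reference point, e.g. the first cycle after equilibration.
    Returns None if the threshold is never reached within the measured cycles.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    reference = capacities[ref_index]
    if reference <= 0:
        raise ValueError("reference capacity must be positive")
    limit = threshold * reference
    for i in range(ref_index + 1, len(capacities)):
        prev, cur = capacities[i - 1], capacities[i]
        if cur <= limit:
            # prev > limit >= cur here, so the denominator is positive
            frac = (prev - limit) / (prev - cur)
            return cycles[i - 1] + frac * (cycles[i] - cycles[i - 1])
    return None


print(cycles_to_threshold([1, 2, 3, 4, 5, 6], [100.0, 95.0, 85.0, 75.0, 65.0, 60.0]))  # 4.5
```

The choice of reference cycle is one place where test-to-test differences enter, which is why such labels remain sensitive to interpretation.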

4.2 Descriptors

The resulting electrolyte descriptors are DataElectrolyte objects that the intended user can adapt flexibly. Their attributes are presented and explained in Figure 4.2. Each attribute is significant for the descriptor to be viable: all of them are needed for the neural network to process it correctly, except the molecule list, which is only necessary for plotting the electrolyte.

Figure 4.2: Illustration of all attributes of a DataElectrolyte object.

The graph attributes of the electrolyte descriptor, namely x, edge index and batch, are essential to ensure that the neural network can accept it as input. The nodes, represented by their one-hot encodings, are stored in x, and the edge index ensures that every connection between nodes is accounted for. Batch keeps track of which subgraph each node belongs to, which is essential when pooling the molecular compounds in the neural network. However, the most important traits of the descriptor are the graph features, which are the values produced by the analysis with CHAMPION. These features distinguish this project from other studies on electrolyte performance prediction, since they add information about the simulated behavior of the electrolytes.
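To make the roles of x, edge index and batch concrete, the sketch below assembles PyTorch Geometric-style attributes for two molecular compounds in plain Python. The class `ElectrolyteGraphs`, the `build` function, the species names and the feature values are hypothetical and only mirror the conventions described above; they are not the DataElectrolyte implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Compound = Tuple[List[str], List[Tuple[int, int]]]  # (species of each node, local undirected edges)


@dataclass
class ElectrolyteGraphs:
    """Hypothetical container mirroring PyTorch Geometric attribute conventions."""
    x: List[List[int]] = field(default_factory=list)  # one-hot embedding per node
    edge_index: List[List[int]] = field(default_factory=lambda: [[], []])  # [sources, targets]
    batch: List[int] = field(default_factory=list)  # graph (compound) id of every node
    graph_features: List[Dict[str, float]] = field(default_factory=list)  # one dict per graph


def build(compounds: Sequence[Compound], species: Sequence[str],
          features: Sequence[Dict[str, float]]) -> ElectrolyteGraphs:
    index = {name: i for i, name in enumerate(species)}
    data = ElectrolyteGraphs()
    for graph_id, (nodes, edges) in enumerate(compounds):
        offset = len(data.x)  # shift local node ids so they are unique across graphs
        for name in nodes:
            one_hot = [0] * len(species)
            one_hot[index[name]] = 1
            data.x.append(one_hot)
            data.batch.append(graph_id)
        for a, b in edges:  # store each undirected edge in both directions
            data.edge_index[0] += [offset + a, offset + b]
            data.edge_index[1] += [offset + b, offset + a]
        data.graph_features.append(dict(features[graph_id]))
    return data


species = ["Li+", "EC", "PF6-"]
compounds = [
    (["Li+", "EC", "EC"], [(0, 1), (0, 2)]),  # a Li+ coordinated by two EC molecules
    (["Li+", "PF6-"], [(0, 1)]),  # a contact ion pair
]
features = [{"concentration": 0.8}, {"concentration": 0.3}]  # illustrative values only
data = build(compounds, species, features)
print(data.batch)  # [0, 0, 0, 1, 1]
print(data.edge_index)  # [[0, 1, 0, 2, 3, 4], [1, 0, 2, 0, 4, 3]]
```

Offsetting the node indices of each compound is what lets several graphs share one object: edges never cross between compounds, and pooling over nodes with the same `batch` value yields one summary per compound.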

The resulting electrolyte descriptor can also plot a representation of the electrolyte or of one of its molecular compounds. Figure 4.3 shows a plot of an electrolyte, which enables the user to inspect the electrolyte further and locate specific molecules in it.

Figure 4.3: Representation of an electrolyte including all molecular compounds in
it.

Figure 4.4 presents a plot of a specific molecular compound along with its graph features. Since this plotting method is attached to the descriptor, the user can analyze individual molecular compounds within the electrolyte and investigate their features in more detail.

Figure 4.4: Plot of graph number 12 in the electrolyte along with its corresponding
graph features. In this example, the included graph features are charge, mass, con-
centration, molar conductivity, electrical mobility, ionic conductivity and diffusivity.

As discussed, the produced electrolyte descriptors are flexible and incorporate the attributes required for a GNN to accept them as input. This flexibility is especially evident when choosing graph features from the analysis with CHAMPION. Several features that were not included in this project can easily be added using the MolGraph class discussed in Section 3.3.2. Since the main aim was to include features that affect battery performance, further electrolyte attributes relevant in this respect could be added later. Furthermore, the ability to include multiple molecular compounds as separate graphs is favorable in this project, since all electrolytes contain several molecules and molecular compounds that are usually represented as graphs. The descriptors are therefore suitable for a GNN and can include the graph features needed to predict cycle performance. Which features are relevant for the prediction remains to be determined, but the user can easily switch between graph attributes to find the most suitable ones for the task.

As mentioned, the user can add further features to the descriptors. This is valuable for battery-level features such as the electrodes, the C-rates for charge and discharge, and the cycling temperature. These attributes were annotated in the collected data, as described in Section 4.1. Since they can affect the performance of the electrolyte, they could be valuable as input to the GNN as well. In this project, these features were not added to the descriptors, but they could easily be introduced as electrolyte-level features in the DataElectrolyte objects.
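One possible way to introduce such battery-level features is to append them to the input of the fully connected part of the model, after the sorted and padded graph block. The sketch assumes numeric temperature and C-rates, a one-hot encoding for the anode material, and hypothetical key names and electrode vocabulary; in practice, the numeric features would also need scaling comparable to the other inputs.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical feature keys and anode vocabulary; a fixed order gives each input slot a stable meaning.
NUMERIC_KEYS = ("temperature_C", "charge_C_rate", "discharge_C_rate")
ANODES = ("graphite", "lithium_metal", "silicon")


def with_cell_features(padded_graphs: Sequence[Sequence[float]],
                       cell: Dict[str, Union[float, str]]) -> List[float]:
    """Flatten the padded graph block and append cell-level features."""
    flat = [float(v) for row in padded_graphs for v in row]
    numeric = [float(cell[key]) for key in NUMERIC_KEYS]
    anode = [1.0 if cell["anode"] == name else 0.0 for name in ANODES]  # categorical -> one-hot
    return flat + numeric + anode


vector = with_cell_features(
    [[0.5, 3.0], [0.0, 0.0]],
    {"temperature_C": 25, "charge_C_rate": 0.5, "discharge_C_rate": 1.0, "anode": "graphite"},
)
print(vector)  # [0.5, 3.0, 0.0, 0.0, 25.0, 0.5, 1.0, 1.0, 0.0, 0.0]
```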

4.3 Model Evaluation
The developed GNN accepts an electrolyte descriptor as input in order to predict the cycle performance of the electrolyte. The training and validation loss over epochs, which reflect the predictive capability of the GNN, are presented in Figure 4.5.

Figure 4.5: Training and validation loss of the GNN over 1000 epochs.

As can be seen in Figure 4.5, the training loss decreases and converges within the 1000 epochs, which indicates that the model learns from the training set. However, the validation loss remains high compared to the training loss, which shows that the model does not predict unseen data accurately. Since the amount of training data is limited, it does not represent the full variety of possible inputs, and consequently the model overfits the training data. Overfitting can also occur when the dataset contains noisy data, for instance if some of the features of the electrolyte data points are not relevant for prediction. Within the context of this thesis, it is not possible to determine whether this is the case, since the amount of data is limited. Furthermore, because the simulations are complex and time-consuming, only 20 of the 104 battery tests were simulated and converted into electrolyte descriptors. Of these 20 electrolytes, 6 contained inherently faulty data and had to be removed. The remaining 14 electrolytes provided enough molecular compound data to enable pattern recognition on the training set but, as expected, they do not accurately represent all possible input data points.

One way to expand the dataset for an ML model is data augmentation [76], a procedure in which existing data are modified so that they appear as new samples to the model. In this way, both the amount and the diversity of the data could increase. Applying data augmentation to this dataset could be an option, but would probably be difficult, since the data are produced by analysis with CHAMPION and the target values are based on battery testing. Another way to achieve better performance with limited data is transfer learning [77]. In transfer learning, the model is first pre-trained, typically on a large and general dataset. By transferring the learned weights and representations, the model can benefit from the knowledge acquired during the pre-training phase, leading to improved performance and reduced data requirements for the target task. However, this may be difficult to apply to this GNN, since there are no similar networks pre-trained for the same purpose as this project. Nonetheless, it may be applicable for generating more descriptor data, by using descriptors of molecules from publicly available databases if they are similar enough.

Another critical aspect of the model evaluation is node embedding. Proper node embeddings are essential in a GNN, since node features are propagated through the graphs and the model learns from them. If the node embeddings are not diverse enough, the model cannot make use of the relevant information about each node. In this project, one-hot encodings were used as node representations to specify which molecular species each node represents. Molecular fingerprints [78] are another option for node embedding. A molecular fingerprint encodes the structure of a molecule in order to extract relevant features. The most common fingerprint is a series of binary digits (bits) that represent the presence or absence of particular substructures within a molecular species. Such fingerprints could enrich the feature representation of the molecules in the electrolyte and might work better as node embeddings. They can be obtained by giving a text representation of the molecular structure as input to open-source cheminformatics toolkits such as RDKit [79]. The molecular text representations could be accessed through the database produced after the CHAMPION analysis step. However, the limited time available in this project prevented further investigation of fingerprints as node embeddings.
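The potential benefit can be illustrated with a toy comparison. One-hot codes treat every pair of distinct species as equally different, whereas fingerprints let structurally related species share bits. In the sketch, the 8-bit fingerprints are invented placeholders rather than real RDKit output, and the species list and function names are hypothetical; a real pipeline would compute the fingerprints from the molecules' text representations. Similarity is measured with the Tanimoto coefficient, a standard choice for bit-vector fingerprints.

```python
from typing import Dict, List, Sequence

SPECIES = ["Li+", "EC", "PC"]
# Toy 8-bit fingerprints invented for illustration; not produced by any cheminformatics toolkit.
FINGERPRINTS: Dict[str, List[int]] = {
    "Li+": [0, 0, 0, 0, 0, 0, 0, 1],
    "EC": [1, 1, 0, 1, 0, 0, 1, 0],
    "PC": [1, 1, 0, 1, 1, 0, 1, 0],
}


def one_hot(name: str, species: Sequence[str]) -> List[int]:
    return [1 if s == name else 0 for s in species]


def node_embedding(name: str) -> List[int]:
    """Concatenate the species one-hot code with its fingerprint bits."""
    return one_hot(name, SPECIES) + FINGERPRINTS[name]


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two bit vectors: shared on-bits / on-bits in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


print(tanimoto(one_hot("EC", SPECIES), one_hot("PC", SPECIES)))  # 0.0
print(tanimoto(FINGERPRINTS["EC"], FINGERPRINTS["PC"]))  # 0.8
```

Keeping the one-hot part in the concatenated embedding preserves the exact species identity, while the fingerprint part adds a notion of structural similarity between species.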

Although there are several approaches to increase the performance of the model, these options could only be examined further with a more comprehensive dataset. Without a sufficient amount of data, it is difficult to determine whether an alteration genuinely improves model performance or improves it only by coincidence. With an extended dataset, hyperparameters such as model complexity and learning rate could also be examined more thoroughly. Adjusting the number of layers, the learning rate and the optimizer could increase the predictive capability and consequently enhance model performance. Nevertheless, the current hyperparameter configuration ensures that the model accepts electrolyte descriptors as input and generates predictions based on the training set. Since this was the primary scope of the thesis, the results are considered satisfactory.
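With a larger dataset, such a study could start from an exhaustive grid over the hyperparameters mentioned above. The sketch only enumerates candidate configurations; the parameter names and values are hypothetical, and each configuration would then have to be trained and compared on the same validation set.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical search space; the values are illustrative, not those used in the thesis.
SEARCH_SPACE: Dict[str, Sequence] = {
    "num_layers": [2, 3, 4],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "optimizer": ["adam", "sgd"],
}


def grid(space: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Expand a search space into every combination of its values (a full grid)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


configs = grid(SEARCH_SPACE)
print(len(configs))  # 18
print(configs[0])  # {'num_layers': 2, 'learning_rate': 0.0001, 'optimizer': 'adam'}
```

Even this small grid yields 18 configurations; with 14 electrolytes, differences between them would be hard to separate from chance, which is the concern raised above.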

5
Conclusions & Future Work

This thesis aims to bridge the gap between empirical experiments and theoretical understanding of battery cycling performance, and to reduce the need for extensive manual testing, by determining cell cycling performance as a function of electrolyte composition. Specifically, this project investigated the development of structural electrolyte descriptors for the purpose of cycling prediction. The descriptors were developed based on electrolyte MD simulations and CHAMPION analysis, and successfully capture structural information about molecular compounds together with their inherent chemical and physical features. The electrolyte descriptors are structured as DataElectrolyte objects that can easily be modified based on the desired features. The developed descriptors function successfully as input to a neural network, namely a GNN, for cycle prediction. In line with the project limitations, only an initial version of the network was developed. Although the model exhibited some indications of predictive capability, it is challenging to draw definitive conclusions about its performance.

To improve model performance and obtain more conclusive results, further analysis requires a larger dataset. The importance of data volume can be attributed to several factors. Firstly, a larger dataset provides a broader representation of the diverse range of electrolyte compositions and properties, allowing the models to capture the nuances of electrolyte behavior more effectively. Additionally, a substantial dataset helps overcome issues related to data sparsity and improves the generalizability of the developed models. By training on a larger and more diverse dataset, the GNN can better account for variations in electrolyte systems, leading to enhanced predictive capabilities.

Building upon our findings, there are several promising avenues for future research
in the field of predictive electrolyte descriptors. Primarily, efforts should focus
on expanding and diversifying the existing electrolyte databases. By collaborating
with researchers and industry partners, data-sharing initiatives can be established
to consolidate and standardize electrolyte information from various sources. This would reduce the need for manual data collection and annotation, which is highly time-consuming. Furthermore, the development of experimental techniques and automated data acquisition methods, such as those employed by Compular, has the potential to enable the rapid generation of large-scale electrolyte datasets.

Future work also involves exploring advanced ML and data analysis techniques. Deep learning architectures, such as recurrent neural networks or transformers, can be employed to leverage large amounts of data and extract intricate patterns from electrolyte descriptors. Additionally, incorporating domain knowledge and physical principles into the predictive models can further enhance their accuracy and interpretability. Moreover, it is crucial to validate and refine the developed predictive models experimentally. Conducting targeted experiments on electrolyte compositions selected based on the model predictions can provide valuable feedback and enable iterative improvements.

By addressing these areas of future work, we can continue to harness the power of data and predictive models to propel advancements in electrolyte research, leading to more efficient, safer and more sustainable energy storage technologies.

Bibliography

[1] Arora, N. K., Fatima, T., Mishra, I., Verma, M., Mishra, J., & Mishra, V.
(2018). Environmental sustainability: challenges and viable solutions. Environ-
mental Sustainability, 1, 309-340.

[2] Verdecchia, R., Sallou, J., & Cruz, L. (2023). A Systematic Review of Green
AI. arXiv preprint arXiv:2301.11047.

[3] Borah, R., Hughson, F. R., Johnston, J., & Nann, T. (2020). On battery ma-
terials and methods. Materials Today Advances, 6, 100046.

[4] Compular. (n.d.). Compular Tech. Retrieved May 26, 2023, from
https://compulartech.com/

[5] Andersson, R., Årén, F., Franco, A. A., & Johansson, P. (2021). CHAMPION:
Chalmers hierarchical atomic, molecular, polymeric and ionic analysis toolkit.
Journal of Computational Chemistry, 42(23), 1632-1642.

[6] Mabbott, G. A. (1983). An introduction to cyclic voltammetry. Journal of
Chemical education, 60(9), 697.

[7] Huang, J., Dong, X., Wang, N., & Wang, Y. (2022). Building low-temperature
batteries: Non-aqueous or aqueous electrolyte?. Current Opinion in Electro-
chemistry, 100949.

[8] Dou, Q., Wang, Y., Wang, A., Ye, M., Hou, R., Lu, Y., ... & Yan, X. (2020).
“Water in salt/ionic liquid” electrolyte for 2.8 V aqueous lithium-ion capacitor.
Science Bulletin, 65(21), 1812-1822.

[9] Quintans De Souza, G. (2021). A comparison between aqueous and organic
electrolytes for lithium ion batteries.

[10] Kim, T., Song, W., Son, D. Y., Ono, L. K., & Qi, Y. (2019). Lithium-ion
batteries: outlook on present, future, and hybridized technologies. Journal of
materials chemistry A, 7(7), 2942-2964.

[11] Angulakshmi, N., & Stephan, A. M. (2015). Efficient electrolytes for
lithium–sulfur batteries. Frontiers in Energy Research, 3, 17.

[12] Manthiram, A. (2017). An outlook on lithium ion battery technology. ACS
central science, 3(10), 1063-1069.

[13] Xie, J., & Lu, Y. C. (2020). A retrospective on lithium-ion batteries. Nature
communications, 11(1), 2499.

[14] Boz, B., Dev, T., Salvadori, A., & Schaefer, J. L. (2021). Electrolyte and elec-
trode designs for enhanced ion transport properties to enable high performance
lithium batteries. Journal of The Electrochemical Society, 168(9), 090501.

[15] Li, J., Mazzola, M. S., Gafford, J., Jia, B., & Xin, M. (2012). Bandwidth based
electrical-analogue battery modeling for battery modules. Journal of Power
Sources, 218, 331-340. https://doi.org/10.1016/j.jpowsour.2012.07.006

[16] Meng, Y. S., Srinivasan, V., & Xu, K. (2022). Designing better electrolytes.
Science, 378(6624), eabq3750.

[17] Xu, K. (2004). Nonaqueous liquid electrolytes for lithium-based rechargeable
batteries. Chemical reviews, 104(10), 4303-4418.

[18] Kang, S. J., Park, K., Park, S. H., & Lee, H. (2018). Unraveling the role of LiFSI
electrolyte in the superior performance of graphite anodes for Li-ion batteries.
Electrochimica Acta, 259, 949-954.

[19] Chen, K., & Xue, D. (2016). Materials chemistry toward electrochemical energy
storage. Journal of Materials Chemistry A, 4(20), 7522-7537.

[20] Armand, M., & Tarascon, J. M. (2008). Building better batteries. nature,
451(7179), 652-657.

[21] Ashrafizadeh, S. N., Seifollahi, Z., Ganjizade, A., & Sadeghi, A. (2020). Elec-
trophoresis of spherical soft particles in electrolyte solutions: A review. Elec-
trophoresis, 41(1-2), 81-103.

[22] Andersson, R. (2020). Dynamic Structure Discovery and Ion Transport in Liquid Battery Electrolytes. Chalmers Tekniska Högskola (Sweden).

[23] Lewandowski, A., & Świderska-Mocek, A. (2009). Ionic liquids as electrolytes
for Li-ion batteries—An overview of electrochemical studies. Journal of Power
sources, 194(2), 601-609.

[24] Chen, X., & Zhang, Q. (2020). Atomic insights into the fundamental inter-
actions in lithium battery electrolytes. Accounts of Chemical Research, 53(9),
1992-2002.

[25] Schwietert, T. K., Arszelewska, V. A., Wang, C., Yu, C., Vasileiadis, A., de
Klerk, N. J., ... & Wagemaker, M. (2020). Clarifying the relationship between
redox activity and electrochemical stability in solid electrolytes. Nature mate-
rials, 19(4), 428-435.

[26] Ping, P., Wang, Q., Sun, J., Xiang, H., & Chen, C. (2010). Thermal stabilities
of some lithium salts and their electrolyte solutions with and without contact to
a LiFePO4 electrode. Journal of the Electrochemical Society, 157(11), A1170.

[27] Li, Q., Chen, J., Fan, L., Kong, X., & Lu, Y. (2016). Progress in electrolytes
for rechargeable Li-based batteries and beyond. Green Energy & Environment,
1(1), 18-42.

[28] Newman, J., & Balsara, N. P. (2021). Electrochemical systems. John Wiley &
Sons.

[29] Borah, R., Hughson, F. R., Johnston, J., & Nann, T. (2020). On battery ma-
terials and methods. Materials Today Advances, 6, 100046.

[30] Guena, T., & Leblanc, P. (2006, September). How depth of discharge affects the
cycle life of lithium-metal-polymer batteries. In INTELEC 06-Twenty-Eighth
International Telecommunications Energy Conference (pp. 1-8). IEEE.

[31] Ahn, D., & Raj, R. (2011). Cyclic stability and C-rate performance of amor-
phous silicon and carbon based anodes for electrochemical storage of lithium.
Journal of Power Sources, 196(4), 2179-2186.

[32] Mothilal Bhagavathy, S., Budnitz, H., Schwanen, T., & McCulloch, M. (2021).
Impact of charging rates on electric vehicle battery life. Findings, 2021(March).

[33] Laidler, K. J. (1984). The development of the Arrhenius equation. Journal of
chemical Education, 61(6), 494.

[34] Zhang, S., Xu, K., & Jow, T. (2003). Low-temperature performance of Li-ion
cells with a LiBF4-based electrolyte. Journal of Solid State Electrochemistry,
7, 147-151.

[35] Ma, S., Jiang, M., Tao, P., Song, C., Wu, J., Wang, J., ... & Shang, W. (2018).
Temperature effect and thermal impact in lithium-ion batteries: A review.
Progress in Natural Science: Materials International, 28(6), 653-666.

[36] Kissinger, P. T., & Heineman, W. R. (1983). Cyclic voltammetry. Journal of
chemical education, 60(9), 702.

[37] Togasaki, N., Yokoshima, T., Oguma, Y., & Osaka, T. (2020). Prediction of
overcharge-induced serious capacity fading in nickel cobalt aluminum oxide
lithium-ion batteries using electrochemical impedance spectroscopy. Journal of
Power Sources, 461, 228168.

[38] Belov, D., & Yang, M. H. (2008). Failure mechanism of Li-ion battery at over-
charge conditions. Journal of Solid State Electrochemistry, 12, 885-894.

[39] Meegoda, J. N., Malladi, S., & Zayas, I. C. (2022). End-of-Life Management of
Electric Vehicle Lithium-Ion Batteries in the United States. Clean Technologies,
4(4), 1162-1174.

[40] Wang, A., Kadam, S., Li, H., Shi, S., & Qi, Y. (2018). Review on modeling
of the anode solid electrolyte interphase (SEI) for lithium-ion batteries. npj
Computational Materials, 4(1), 15.

[41] Yu, Z., Wang, H., Kong, X., Huang, W., Tsao, Y., Mackanic, D. G., ... & Bao,
Z. (2020). Molecular design for electrolyte solvents enabling energy-dense and
long-cycling lithium metal batteries. Nature Energy, 5(7), 526-533.

[42] Xia, J., Ma, L., & Dahn, J. R. (2015). Improving the long-term cycling perfor-
mance of lithium-ion batteries at elevated temperature with electrolyte addi-
tives. Journal of Power Sources, 287, 377-385.

[43] Su, C. C., He, M., Shi, J., Amine, R., Zhang, J., Guo, J., & Amine, K.
(2021). Superior long-term cycling of high-voltage lithium-ion batteries enabled
by single-solvent electrolyte. Nano Energy, 89, 106299.

[44] Nagpure, S. C., Tanim, T. R., Dufek, E. J., Viswanathan, V. V., Crawford, A.
J., Wood, S. M., ... & Liaw, B. (2018). Impacts of lean electrolyte on cycle life
for rechargeable Li metal batteries. Journal of Power Sources, 407, 53-62.

[45] Li, Q., Jiao, S., Luo, L., Ding, M. S., Zheng, J., Cartmell, S. S., ... & Xu, W.
(2017). Wide-temperature electrolytes for lithium-ion batteries. ACS applied
materials & interfaces, 9(22), 18826-18835.

[46] Cao, X., Jia, H., Xu, W., & Zhang, J. G. (2021). Localized high-concentration
electrolytes for lithium batteries. Journal of The Electrochemical Society,
168(1), 010522.

[47] Chae, S., Kwak, W. J., Han, K. S., Li, S., Engelhard, M. H., Hu, J., ... &
Zhang, J. G. (2021). Rational design of electrolytes for long-term cycling of Si
anodes over a wide temperature range. ACS Energy Letters, 6(2), 387-394.

[48] Li, W., Dolocan, A., Li, J., Xie, Q., & Manthiram, A. (2019). Ethylene
carbonate-free electrolytes for high-nickel layered oxide cathodes in lithium-ion
batteries. Advanced Energy Materials, 9(29), 1901152.

[49] Brox, S., Röser, S., Husch, T., Hildebrand, S., Fromm, O., Korth, M., ... &
Cekic-Laskovic, I. (2016). Alternative Single-Solvent Electrolytes Based on Cya-
noesters for Safer Lithium-Ion Batteries. ChemSusChem, 9(13), 1704-1711.

[50] Sharova, V., Moretti, A., Diemant, T., Varzi, A., Behm, R. J., & Passerini, S.
(2018). Comparative study of imide-based Li salts as electrolyte additives for
Li-ion batteries. Journal of Power Sources, 375, 43-52.

[51] Hansson, T., Oostenbrink, C., & van Gunsteren, W. (2002). Molecular dynam-
ics simulations. Current opinion in structural biology, 12(2), 190-196.

[52] Frenkel, D., Smit, B., & Ratner, M. A. (1996). Understanding molecular simu-
lation: from algorithms to applications (Vol. 2). San Diego: Academic Press.

[53] Leach, A. R. (2001). Molecular modelling: principles and applications (2nd
ed.). Prentice Hall.

[54] Hollingsworth, S. A., & Dror, R. O. (2018). Molecular Dynamics Simulation for
All. Neuron, 99(6), 1129-1143. https://doi.org/10.1016/j.neuron.2018.08.011

[55] Grimme, S., Bannwarth, C., & Shushkov, P. (2017). A robust and accurate
tight-binding quantum chemical method for structures, vibrational frequencies,
and noncovalent interactions of large molecular systems parametrized for all
spd-block elements (Z= 1–86). Journal of chemical theory and computation,
13(5), 1989-2009.

[56] Kühne, T. D., Iannuzzi, M., Del Ben, M., Rybkin, V. V., Seewald, P., Stein,
F., ... & Hutter, J. (2020). CP2K: An electronic structure and molecular dy-
namics software package-Quickstep: Efficient and accurate electronic structure
calculations. The Journal of Chemical Physics, 152(19), 194103.

[57] Beezer, R. A. (2008). Review of: Graph Theory by JA Bondy and USR Murty.

[58] Chartrand, G., Lesniak, L., & Zhang, P. (2010). Graphs & digraphs (Vol. 39). CRC press.

[59] Goyal, P., & Ferrara, E. (2018). Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 78-94.

[60] Fey, M., & Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428.

[61] Hagberg, A., Swart, P., & Schult, D. (2008). Exploring network structure, dynamics, and function using NetworkX (No. LA-UR-08-05495; LA-UR-08-5495). Los Alamos National Lab. (LANL), Los Alamos, NM (United States).

[62] Hopfield, J. J. (1988). Artificial neural networks. IEEE Circuits and Devices Magazine, 4(5), 3-10.

[63] Abraham, A. (2005). Artificial neural networks. Handbook of measuring system
design.

[64] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. https://doi.org/10.1038/nature14539

[65] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521(7553),
436-444.

[66] Wu, Y., Lian, D., Xu, Y., Wu, L., & Chen, E. (2020, April). Graph con-
volutional networks with markov random field reasoning for social spammer
detection. In Proceedings of the AAAI conference on artificial intelligence (Vol.
34, No. 01, pp. 1054-1061).

[67] Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller,
M., Hadsell, R., & Battaglia, P. (2018, July). Graph networks as learnable
physics engines for inference and control. In International Conference on Ma-
chine Learning (pp. 4470-4479). PMLR.

[68] Hamaguchi, T., Oiwa, H., Shimbo, M., & Matsumoto, Y. (2017). Knowledge
transfer for out-of-knowledge-base entities: A graph neural network approach.
arXiv preprint arXiv:1706.05674.

[69] Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... & Sun, M. (2020).
Graph neural networks: A review of methods and applications. AI open, 1,
57-81.

[70] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., ... & Chen, T.
(2018). Recent advances in convolutional neural networks. Pattern recognition,
77, 354-377.

[71] Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., & Weinberger, K. (2019,
May). Simplifying graph convolutional networks. In International conference
on machine learning (pp. 6861-6871). PMLR.

[72] Seko, A., Togo, A., & Tanaka, I. (2018). Descriptors for machine learning of
materials data. Nanoinformatics, 3-23.

[73] Amar, Y., Schweidtmann, A. M., Deutsch, P., Cao, L., & Lapkin, A. (2019).
Machine learning and molecular descriptors enable rational solvent selection in
asymmetric catalysis. Chemical science, 10(27), 6697-6706.

[74] Himanen, L., Jäger, M. O., Morooka, E. V., Canova, F. F., Ranawat, Y. S., Gao,
D. Z., ... & Foster, A. S. (2020). DScribe: Library of descriptors for machine
learning in materials science. Computer Physics Communications, 247, 106949.

[75] Johansson, P., Alvi, S., Ghorbanzade, P., Karlsmo, M., Loaiza, L., Thangavel,
V., ... & Årén, F. (2021). Ten ways to fool the masses when presenting battery
research. Batteries & Supercaps, 4(12), 1785-1788.

[76] Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmen-
tation for deep learning. Journal of big data, 6(1), 1-48.

[77] Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer
learning. Journal of Big data, 3(1), 1-40.

[78] Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T.,
Aspuru-Guzik, A., & Adams, R. P. (2015). Convolutional networks on graphs
for learning molecular fingerprints. Advances in neural information processing
systems, 28.

[79] RDKit: Open-source cheminformatics. https://www.rdkit.org
