Deep autoencoder for condition monitoring of wind turbines – Detecting and diagnosing anomalies

Master's thesis in Complex Adaptive Systems

JOHANNA RENMAN

Department of Physics
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2019

© JOHANNA RENMAN, 2019.

Supervisor: Pramod Bangalore, Greenbyte AB
Examiner: Kristian Gustafsson, Department of Physics, University of Gothenburg

Department of Physics
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Wind turbine

Typeset in LaTeX
Printed by Chalmers Reproservice
Gothenburg, Sweden 2019

Abstract

Over the last decade, energy production from wind turbines has grown by 400%, prompted by public investments and climate change awareness as well as advances in technology. With the subsidies offered to wind farms dwindling, the owners and operators of wind farms are forced to cut operational costs to stay profitable. This has led to a renewed focus on predictive and preventive maintenance, targeting not only the traditionally well-monitored large components in the wind turbine but also smaller, more easily replaced components. Advances in the use of machine learning to model complex systems, combined with growing access to data, have allowed advanced methods for condition monitoring and anomaly detection to be developed. These have been applied to the field of condition monitoring for wind turbines in various research projects, utilizing data from the Supervisory Control and Data Acquisition (SCADA) system available in modern wind turbines.
Most of these systems have modeled just one component at a time and are therefore not able to provide a complete condition monitoring system. Using autoencoders for condition monitoring of wind turbines enables the whole wind turbine to be modeled as one system, by learning the internal connections between the SCADA signals. This has previously been studied, but most studies focus on the anomaly detection step and leave out the important part of diagnosing which signals are most affected by the anomaly. By finding these signals, the source of the fault can be located, which is important for giving recommendations on where to perform maintenance. This thesis investigates the application of deep autoencoders to detect and diagnose developing faults. The autoencoder has been used to produce a residual, taken as the error between the input to the autoencoder and its reconstructed signal. For fault detection, the Mahalanobis distance has been used on the residual. For fault diagnosis, the residual for each signal has been standardized and analyzed to examine which signals are most affected by the fault. This was tested on eight known faults found in five different components: the gearbox, the cooling system, the hydraulic system, the yaw encoder and the generator slipring. The proposed condition monitoring system was successful in detecting and diagnosing all faults but one. This thesis also presents an approach to understanding what the autoencoder has learned, with the use of simulated faults. The study provides a method for discovering which connections between the SCADA signals the autoencoder has learned, as well as information about how the residual is affected when one signal is experiencing a fault.
Keywords: wind turbine, condition monitoring, anomaly detection, preventive maintenance, SCADA, autoencoder, neural networks, fault diagnosis, anomaly diagnosis, fault detection

Acknowledgements

I would like to thank the whole of Greenbyte for making this thesis possible. In particular I want to thank my supervisor, Pramod Bangalore, for his guidance during all stages of the process: thank you for always taking time to answer my questions and for your support when I needed it. I would also like to thank Niklas Renström, who not only provided me with his research in the area but was also open to discussing ideas and results. Thank you Edmund for offering help and for your expertise in English grammar, and thank you Thomas for your support and openness. I would also like to thank Kristian Gustafsson for being my examiner and for his interest in the research. Finally, a big thank you to all of my friends and family, not only for this thesis but for being there throughout my whole education. Without you this would not have been possible and I am forever grateful.

Johanna Renman, Gothenburg, December 2019

Contents

1 Introduction
  1.1 Background and problem overview
  1.2 Previous work
  1.3 Aim and limitations
  1.4 Structure of the thesis
2 Theory
  2.1 Wind turbines
    2.1.1 Basics about wind turbines
    2.1.2 SCADA system for wind turbines
  2.2 Artificial Neural Networks
    2.2.1 Introduction
    2.2.2 Feedforward neural networks
      2.2.2.1 Forward propagation
      2.2.2.2 Backpropagation
      2.2.2.3 Activation function
    2.2.3 Autoencoder
      2.2.3.1 Anomaly detection using autoencoder
  2.3 Preprocessing data
    2.3.1 Transforming data
      2.3.1.1 Standardizing data
      2.3.1.2 ZCA whitening
  2.4 Data postprocessing
    2.4.1 Mahalanobis distance
    2.4.2 Exponentially Weighted Moving Average
3 Design of a condition monitoring system for wind turbines
  3.1 Data description
    3.1.1 Training and validation data
    3.1.2 Inference data
  3.2 Data preprocessing
    3.2.1 Data cleaning
    3.2.2 Transforming data
  3.3 Autoencoder
    3.3.1 Model design
    3.3.2 Training the autoencoder
  3.4 Data postprocessing
    3.4.1 Residual
      3.4.1.1 Standardizing residual
    3.4.2 Exponentially Weighted Moving Average
  3.5 Condition monitoring for wind turbines
    3.5.1 Fault detection
      3.5.1.1 Mahalanobis distance
    3.5.2 Fault diagnosis
      3.5.2.1 Boxplot for fault diagnosis
4 Validation study
  4.1 Decoding the autoencoder
    4.1.1 Positive vs negative fault
      4.1.1.1 Method
      4.1.1.2 Results and discussion
    4.1.2 Connection map
      4.1.2.1 Generating the connection map
      4.1.2.2 Analysis of the connection map
    4.1.3 Conclusion on simulated faults
  4.2 Validation of the condition monitoring system
    4.2.1 Test cases
    4.2.2 Method
    4.2.3 Results and discussion per fault type
      4.2.3.1 Healthy data
      4.2.3.2 Cooling system failures
      4.2.3.3 Generator slipring failures
      4.2.3.4 Hydraulic system issue
      4.2.3.5 Gearbox failure
      4.2.3.6 Yaw encoder failure
    4.2.4 Conclusion from the validation on real world faults
5 Closure
  5.1 Conclusion
  5.2 Future work
Bibliography
A Appendix 1
  A.1 Comparing data transformers
    A.1.1 Connection map for ZCA whitened data
    A.1.2 Test on real faults using ZCA whitened data
      A.1.2.1 Cooling system issue
      A.1.2.2 Generator slipring issue
      A.1.2.3 Gearbox failure
    A.1.3 Conclusion
B Gaussian residuals

1 Introduction

1.1 Background and problem overview

Over the last decade, energy production from wind turbines has grown by 400% [1]. In 2018, wind power accounted for a large part of the world's renewable electricity, with 24% of the total capacity [2]. This growth is prompted by public investments and climate change awareness as well as advances in technology. Wind farms are increasingly being built offshore for several reasons, including less disturbance of urban areas, stronger and more stable winds, and the possibility of larger units, which in turn may produce more energy [3]. The cost of maintenance for offshore wind farms is, however, significant, which increases the need for good condition monitoring systems to allow for preventive maintenance. A good condition monitoring system is of course not only valuable for offshore wind farms; onshore wind farms can also greatly benefit from predicting and planning maintenance. In fact, over an operating life of 20 years, maintenance costs may reach 15% of the total income for onshore wind farms and 30% of the total income for offshore wind farms. This cost can be greatly reduced with a good condition monitoring system that allows for predictive and preventive maintenance [4]. Condition monitoring involves observing the components of a wind turbine in order to identify changes in the system that could indicate a developing fault. Traditionally, condition monitoring relies on e.g. visual inspection, vibration analysis, strain measurement, thermography and acoustic emissions.
Some of these techniques are intrusive and impose wear on the component being monitored [5]. The advances in data-driven modelling using machine learning and the recent developments in sensors and signal processing systems have allowed for new types of data-driven condition monitoring systems, often utilizing SCADA (Supervisory Control And Data Acquisition) data. Data-driven models are an effective way of creating a model without having to consider the mathematical model of the physical system; instead, the internal connections between measured input and output signals are learned to model the system. This thesis aims to design such a condition monitoring system for wind turbines with the use of SCADA data. The system should (i) be able to detect developing faults and (ii) be able to diagnose the fault, i.e. find the degrading component(s). In particular, the use of deep autoencoders, a type of Artificial Neural Network, for condition monitoring purposes is examined.

1.2 Previous work

Various research projects have investigated the use of SCADA data to build a condition monitoring system for wind turbines. Many of the methods aim to create a model that learns the complex internal relationships between SCADA signals, such as component temperatures, rotational speed, power produced, electrical quantities etc. These types of models typically reconstruct a SCADA signal from a subset of the other signals, and the residual between the original data and the reconstructed data can be used to detect faults in the wind turbine. The capability to model highly nonlinear relationships makes Deep Neural Networks (DNN) a popular choice. In References [4] and [6], DNN models are used for condition monitoring of the gearbox by learning how temperature and oil pressure signals in the gearbox behave. Since these models are focused on the gearbox, they will not react to a fault somewhere else in the wind turbine.
In References [7] and [8], an Artificial Neural Network (ANN) model for each subsystem of the wind turbine was proposed to get more complete coverage. The system was successful in detecting a variety of faults, but deep knowledge of the physical connections between the parts of the wind turbine was needed for a correct signal mapping. Since the wind turbines on the market today operate in a variety of ways, with different signals being monitored for different turbines, this method is difficult to employ across different types of wind turbines. The use of a different kind of ANN model, called the autoencoder (AE), has shown success in detecting developing faults, as shown in References [9] and [10]. By learning to reproduce all its inputs, the AE has the capability to model complex relations and to monitor the whole system at once. AEs have successfully been applied in the field of condition monitoring for wind turbines, as shown in Reference [11], where the AE was able to detect blade issues in the wind turbine, and in References [12] and [13], where AEs detected various types of faults, such as sensor faults and yaw encoder faults. While these studies have applied the AE to a variety of faults in the wind turbine and successfully shown that the AE can be used to detect the faults, the focus lies on the detection step: finding out whether the wind turbine is experiencing a fault. The next step is to identify which signals are responsible for causing the alarm and thereby help determine the root cause of the fault and provide recommendations on where to perform maintenance. In Reference [14], the Mahalanobis distance, D^2, computed on the residual between the input and the output of the AE, was used to find periods with a potential fault in the wind turbine. In order to diagnose the fault, a method for determining the relative contribution of each separate residual signal to the Mahalanobis distance was proposed.
This method calculated the partial Mahalanobis distance, D^2_{(i)}, on all residual signals except the i-th signal, and used the difference d_i = D^2 - D^2_{(i)} to determine the contribution of signal i to the overall Mahalanobis distance. A large contribution suggested that the fault strongly affected that signal. The results of this study suggest that the method might give some information about what the fault in the turbine is, but due to a lack of information in the turbine service logs it is unclear whether some of the results are false positives. The authors suggest alternative approaches to the problem of fault diagnosis, and this thesis proposes one such approach.

1.3 Aim and limitations

This thesis aims to develop a condition monitoring system based on an AE that is able both to find periods when the wind turbine operates with a fault and to diagnose which components are faulty, by finding which measured signals are most affected by the fault. Since most of the previous studies based on AEs have focused on the fault detection step [11] [12] [13], the main focus of this thesis is fault diagnosis. ANNs, and hence AEs, are seen as black-box models, meaning that it is difficult to analyze why an ANN comes to a certain conclusion or gives a certain result. To understand what the autoencoder has learned, this thesis presents a novel method for examining what the AE has learned with the help of simulated faults. This provides information about which internal connections between the SCADA data signals the AE has learned, as well as about how the residual (the difference between the input signal and the reconstructed signal) is affected when there is a fault in a certain signal. The aims of the thesis are as follows:
1. Design a condition monitoring system based on an autoencoder that (i) detects developing faults and (ii) is able to diagnose the fault, i.e. find the degrading component(s).
2.
Examine what internal connections the autoencoder has found, in order to understand how the residual between the input and output of the autoencoder behaves.
This thesis builds on previous studies for the design of the AE, in particular the studies made in Reference [13].

1.4 Structure of the thesis

The thesis is structured as follows:
• Chapter 2 provides the theoretical background to the methods and models used in this thesis. Basic theory about wind turbines and SCADA data is provided, as well as information about artificial neural networks in general and autoencoders in particular. The chapter also provides theoretical information regarding pre- and post-processing of data, such as data transformation and the Mahalanobis distance for multivariate distance measurement.
• Chapter 3 describes in detail the proposed method of using deep autoencoders for condition monitoring of wind turbines. It explains how the data was pre- and post-processed, as well as the design and training of the autoencoder. It also presents the proposed method for condition monitoring of wind turbines with the use of an autoencoder, which is tested in the following chapter.
• Chapter 4 presents the validation tests that were done to examine the performance of the condition monitoring system proposed in the previous chapter. The first part of the chapter describes the usage of simulated faults to understand what the autoencoder has learned, and the knowledge gained from the following discussion is applied to real-world faults in the second part of the chapter, in which eight faults are examined with the proposed condition monitoring system.
• Chapter 5 discusses the findings of the thesis and proposes future work. Finally, the thesis is concluded.

2 Theory

This chapter provides the theoretical background to the methods and models used in this thesis.
The chapter starts with basic theory about wind turbines and the Supervisory Control and Data Acquisition (SCADA) system and its usage. This is followed by an overview of Artificial Neural Networks (ANN), starting with an explanation of one of the most basic ANNs, the single hidden layer feedforward network. This leads to an explanation of the autoencoder, which is the type of ANN used in this thesis. Theory for preprocessing data, including data transformation, is presented in the following section. After this, two tools that are used to postprocess the data are explained.

2.1 Wind turbines

A wind turbine has multiple parts functioning together. Here follows an overview of how a conventional horizontal-axis wind turbine operates and a description of its main components, followed by a description of the SCADA system and its usage.

2.1.1 Basics about wind turbines

The conventional horizontal-axis wind turbine consists of a rotor and a nacelle, which are placed on top of a tower that rests on a foundation. The tower is usually 70–120 meters high, to allow for long blades and to capture the faster, less turbulent wind present at higher altitudes. The blades are mounted on the rotor and their length ranges from 20 to 80 meters. The rotor is connected to the low-speed shaft within the nacelle, which in turn is connected to the high-speed shaft via the gearbox. This increases the rotational speed from about 30–60 RPM to about 1000–1800 RPM, which is the rotational speed required by the generator to produce electricity. The generator is connected to a converter that transforms the electricity to lie within the grid frequency. The electricity is then transported through the grid. To avoid unnecessary strain on the components, the wind turbine operates within the wind speed range of 4–25 m/s. See Figure 2.1 for a sketch of the turbine, with the most important parts marked.
The most high-maintenance part of the wind turbine is the gearbox, which is also the part causing the most downtime [15]. The many wheels and bearings in the gearbox suffer great stress because of wind turbulence, and a fault in any part of the gearbox may lead to a halt. Due to this, wind turbines without a gearbox are being developed. These wind turbines instead use direct drive and are said to improve reliability. In this thesis, all the wind turbines examined have a gearbox.

[Figure 2.1: A sketch showing many of the important parts of the wind turbine. 1) Blades, 2) Rotor, 3) Tower, 4) Nacelle, 5) Gearbox, 6) Low-speed shaft, 7) High-speed shaft, 8) Generator, 9) Converter]

2.1.2 SCADA system for wind turbines

The Supervisory Control and Data Acquisition (SCADA) system is an important part of the wind turbine. It is used on the one hand to control the wind turbine remotely and on the other hand to collect data to monitor both the current and the historical behaviour of the wind turbine. The recorded data is sensor data collected as 10-minute averages. Signals recorded for the wind turbine include, for example, wind speed, wind direction and power, as well as component-specific signals like bearing temperatures and lubrication oil temperatures and pressures. Signals from electrical components, like currents and voltages, are also recorded. By monitoring these signals, it is possible to gain knowledge about how the turbine is currently operating and to compare it to how it has historically behaved. There are multiple ways to monitor these signals, from setting static thresholds on signals to developing a model of how the data looks for a healthy turbine. This thesis will describe one such model.
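As a minimal illustration of the static-threshold approach mentioned above, an alarm check on 10-minute averages could look as follows. The temperature values and the limit are made up for the example and are not taken from any real turbine.

```python
import numpy as np

# Hypothetical 10-minute averages of a bearing temperature signal (deg C).
temps = np.array([62.1, 63.0, 64.2, 71.5, 72.8, 65.0])
LIMIT = 70.0  # illustrative static threshold, not a manufacturer value

alarms = temps > LIMIT          # one boolean per 10-minute period
n_alarms = int(alarms.sum())    # number of periods breaching the limit
print(n_alarms)                 # 2
```

A fixed limit like this is simple but ignores operating context (e.g. ambient temperature and load), which is a key motivation for the model-based approach developed in this thesis.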
2.2 Artificial Neural Networks

In this section a brief overview of artificial neural networks is given, with the aim of providing the background needed for discussing the autoencoder, the type of artificial neural network used in this thesis for condition monitoring of wind turbines. Readers already familiar with the basics of neural networks can skip to Section 2.2.3 to read about the autoencoder.

2.2.1 Introduction

Artificial neural networks are inspired by how the biological brain learns. The brain contains neurons that are wired together with synapses in complex, ever-evolving patterns, allowing the brain to learn. Instead of having to be given a set of rules, the brain learns from examples and experience. This is the mechanism that artificial neural networks try to copy, which is why they consist of artificial neurons that can be wired together to learn. The value of one neuron depends on the values of the neurons connected to it as well as the weights of the connections between the neurons. For a better overview and easier understanding, artificial neural networks are usually visualized as directed graphs. One of the simplest versions of a neural network is the single hidden layer feedforward network, whose structure is shown as a directed graph in Figure 2.2 and explained in more detail in the following subsection.

2.2.2 Feedforward neural networks

This section explains the basics of one of the simplest forms of neural networks, the single hidden layer feedforward network. Many of the concepts are very similar or identical to how the autoencoder network is used and trained, but are more easily explained using this simple network. A graph of a single hidden layer feedforward network is shown in Figure 2.2. This network consists of one input layer, one hidden layer and one output layer. Each layer consists of multiple neurons.
The neurons in the input layer are all connected to the neurons in the hidden layer, and all the neurons in the hidden layer are connected to the neurons in the output layer. This is called a fully connected network. Neurons within a layer are not connected, which is an important aspect of a feedforward network. What connects the neurons are weighted edges, usually just called weights. Training a network means finding the optimal values of the weights, and one of the most common ways of doing this is explained in Section 2.2.2.2. When the optimal values of the weights are found, the weights are fixed and will not change. Information in the network can flow in two ways: forward, which is used for calculating the output of the network, and backward, which is only used during training of the network. In the following two sections both of these are explained in more detail.

2.2.2.1 Forward propagation

Calculating the output of the network involves first calculating the values of the neurons in the hidden layer, Equation (2.1),

V_i = g^{(h)} \left( \sum_j W^{(h)}_{i,j} x_j + b^{(h)}_i \right)   (2.1)

[Figure 2.2: Architecture of a single hidden layer feedforward neural network.]

where V_i is the value of hidden neuron i, W^{(h)}_{i,j} is the weight between neuron i in the hidden layer and neuron j in the input layer, x_j is the value of input neuron j, b^{(h)}_i is a bias for neuron i in the hidden layer and g^{(h)}(·) is the activation function for the hidden layer (see Section 2.2.2.3 for an explanation of activation functions). The superscript (h) specifies that the weights, biases and activation function are specific to the hidden layer.
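The hidden-layer computation in Equation (2.1) can be sketched in a few lines of NumPy. The layer sizes, the random weights and the choice of a sigmoid activation are illustrative assumptions, not the configuration used in this thesis.

```python
import numpy as np

# Sketch of Equation (2.1): V_i = g^(h)( sum_j W^(h)_{i,j} x_j + b^(h)_i ).
rng = np.random.default_rng(0)

m, n = 4, 3                     # number of input and hidden neurons
x = rng.normal(size=m)          # input vector
W_h = rng.normal(size=(n, m))   # hidden-layer weights W^(h)
b_h = rng.normal(size=n)        # hidden-layer biases b^(h)

def g_h(z):
    # Sigmoid, used here as an example hidden-layer activation.
    return 1.0 / (1.0 + np.exp(-z))

V = g_h(W_h @ x + b_h)          # values of the hidden neurons
print(V.shape)                  # (3,)
```

The matrix-vector product `W_h @ x` carries out the sum over j for all hidden neurons i at once.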
The output of the network is then calculated in a similar way, Equation (2.2),

y_i = g^{(o)} \left( \sum_j W^{(o)}_{i,j} V_j + b^{(o)}_i \right)   (2.2)

where y_i is the value of output neuron i and the superscript (o) specifies that the weights, biases and activation function are specific to the output layer. If there is more than one hidden layer, the procedure is the same; the values of the neurons are calculated layer by layer until the output layer is reached.

2.2.2.2 Backpropagation

Backpropagation is a common way of training the network by changing its weights to reach optimal results. The optimal result is reached by minimizing a function called the loss function. The loss function measures how close the output of the network is to the wanted output of the network (called the target values, t_i). In this thesis the mean squared error, Equation (2.3), is used as the loss function.

L = \frac{1}{n} \sum_{i=1}^{n} (t_i - y_i)^2   (2.3)

The goal is to minimize the loss function, which is done with the use of an optimizer. One common optimizer is gradient descent, which requires the activation functions used on the layers to be differentiable. Gradient descent repeatedly updates the weights by adding increments calculated from the error, see Equation (2.4), where η is the learning rate and the superscript (µ) specifies the layer.

W^{(\mu)}_{m,n} \leftarrow W^{(\mu)}_{m,n} - \eta \, \frac{\partial L}{\partial W^{(\mu)}_{m,n}}   (2.4)

The name backpropagation refers to the process of updating the weights closest to the output layer and then propagating the loss backwards, updating the weights in each layer until the input layer is reached [16].

2.2.2.3 Activation function

There are many types of activation functions, but for networks trained using backpropagation it is important that the activation function is differentiable. Here three types of activation functions are presented: linear, sigmoid and ReLU. The simplest activation function is linear, see Equation (2.5).
g(x) = x   (2.5)

The sigmoid activation function, usually implemented as the logistic function, see Equation (2.6), is often used when the goal is to calculate probabilities, since it outputs a value between 0 and 1.

g(x) = \frac{1}{1 + e^{-x}}   (2.6)

The ReLU activation function [17] is defined as the positive part of its argument, see Equation (2.7). It is the most used activation function in deep learning. ReLU is not differentiable at x = 0, but a common convention is to set the derivative to zero at x = 0.

g(x) = \max(0, x)   (2.7)

2.2.3 Autoencoder

The autoencoder is the type of artificial neural network used in this thesis. It is a feedforward neural network with a layout that allows the network to learn the most important features of the input signals, allowing for dimensionality reduction and reconstruction of the data. The training of an autoencoder can be seen as an unsupervised or self-supervised training algorithm, meaning that no labeled dataset is used as targets for the network. Instead the network is trained to reproduce its inputs as correctly as possible, which is done by using the input values as targets. The network is trained with the use of backpropagation, with a loss function depending on both the input and the output of the network, L(x, x̂). In Figure 2.3 a graph of a deep autoencoder is shown. A deep neural network is a network that contains multiple hidden layers, which allows the network to learn both linear and non-linear relationships. The autoencoder can be seen as consisting of two sub-networks, where one is called the encoder, since it lowers the dimension of the data. The other sub-network is called the decoder, since it reconstructs the data into the original dimension. The encoder and decoder contain the same number of layers and neurons, but in opposite order. The middle layer, which is the layer with the smallest number of neurons, is called the code layer.
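The encoder–code–decoder layout described above can be sketched as a forward pass through mirror-image fully connected layers. The layer sizes below follow Figure 2.3 (five inputs, code size three); the random, untrained weights and the ReLU/linear activation choice are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mirror-image layer sizes: input -> encoder -> code -> decoder -> output.
sizes = [5, 4, 3, 4, 5]
params = [(0.1 * rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def relu(z):
    return np.maximum(0.0, z)

def forward(x):
    v = x
    for k, (W, b) in enumerate(params):
        z = W @ v + b
        # ReLU on the hidden layers, linear activation on the output layer.
        v = z if k == len(params) - 1 else relu(z)
    return v  # reconstruction x_hat, same dimension as the input

x = rng.normal(size=5)
x_hat = forward(x)
residual = x - x_hat  # the reconstruction error used later for anomaly detection
print(x_hat.shape)    # (5,)
```

In practice the weights would of course be trained with backpropagation to minimize the reconstruction error between x and x̂; here only the architecture is shown.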
The bottleneck structure of the autoencoder forces it to learn the most significant features of the input data, in order to be able to reconstruct it well. Among other things, autoencoders can be used for non-linear dimensionality reduction and for anomaly detection. In this thesis, they are used for the latter.

[Figure 2.3: Architecture of a deep autoencoder with 5 hidden layers and code size 3.]

2.2.3.1 Anomaly detection using autoencoder

When training an autoencoder, the model is forced to find the most important features and relations of the input data in order to reconstruct the data well. This is due to the bottleneck structure of the autoencoder, where as much information as possible about the data has to be kept in a lower dimension. When the autoencoder is trained on data from healthy operating conditions it will learn to reconstruct normal data very well, but when faced with data from abnormal working conditions it will fail to do so. By looking at the reconstruction error, called the residual, i.e. the error between the input data and the reconstructed data, see Equation (3.3), it is possible to find anomalies. More information about how this is done in the application to wind turbines is presented in Section 3.5.1.

2.3 Preprocessing data

Raw data is often incomplete and in need of preprocessing to transform it into a more suitable format. There are multiple important steps included in data preprocessing, such as data cleaning, feature selection and data transformation. During the data cleaning step, missing values and outliers are handled, either by removal or by, for example, interpolation of nearby values. Feature selection is the process of selecting which features to use for the model.
This can be done manually, by applying knowledge of which features are important for the model. It can also be done automatically, by applying feature selection techniques such as Lasso [18], which forces the coefficients for some of the features to zero, or by feature fusion techniques like PCA [19], which transforms the data into a set of uncorrelated principal components, allowing a subset of the components to capture a large part of the variance. In this thesis, the feature selection is a combination of manual and automatic selection: multiple features are manually selected with knowledge of which signals are available and important in the wind turbine. The autoencoder then performs feature selection and feature fusion automatically, as its architecture forces it to keep only the important information.

2.3.1 Transforming data

Data transformation is used to ensure that the data is in the wanted format, which can differ between applications. This can include removing biases and trends and ensuring that all features have the same amplitude. When features are measured in different units, the values can differ by many orders of magnitude, which is why it is good to scale them to a particular range. In this thesis, two types of data transformation have been examined: standardizing data and ZCA whitening.

2.3.1.1 Standardizing data

Standardizing the data ensures that the features are scaled and that some bias is removed. This is done per feature by removing the mean value from the feature and dividing it by its standard deviation, as seen in Equation (2.8), where x represents the original feature and y the transformed feature.
y = (x − μ)/σ, where μ = Mean[x] and σ^2 = Var[x] (2.8)

2.3.1.2 ZCA whitening

Whitening is a linear transformation that converts the vector x with mean μ and a positive definite covariance matrix Σ_x into a new vector y of the same dimension as x with identity covariance matrix Σ_y = I, according to Equation (2.9), where W is the transformation matrix.

y = Wx (2.9)

The transformation matrix needs to fulfill W^T W = Σ_x^(−1) in order to meet the requirement Σ_y = I. As long as this condition is met, there are infinitely many transformation matrices that will whiten x. ZCA whitening gives one such transformation matrix, which minimizes the total squared distance between the original and whitened variables [20]. The transformation matrix for ZCA is shown in Equation (2.10).

W_ZCA = Σ_x^(−1/2) (2.10)

2.4 Data postprocessing

This section provides the theory of two data postprocessing tools that are used in this thesis: the multivariate distance measure Mahalanobis distance and the signal smoothing method Exponentially Weighted Moving Average. The application of these will be further explained in the next chapter.

2.4.1 Mahalanobis distance

The Mahalanobis distance [21] is a multivariate distance measure which measures the distance between a point, x, and a distribution. It is calculated as shown in Equation (2.11), where μ is a vector of mean values of the independent variables from the distribution and Σ^(−1) is the inverse covariance matrix of the independent variables from the distribution. If the true distribution is not known, μ and Σ can be estimated from data.

D(x) = sqrt((x − μ)^T Σ^(−1) (x − μ)) (2.11)

The Mahalanobis distance can be used as a test statistic for determining the likelihood that a point comes from the distribution. A high Mahalanobis distance means that it is less likely that the point comes from the distribution.
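The distance in Equation (2.11) can be sketched in a few lines of numpy, with the mean vector and covariance matrix estimated from data as described above (an illustration, not the thesis implementation):

```python
import numpy as np

def mahalanobis(X, mu, sigma_inv):
    # Equation (2.11): D(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)),
    # evaluated row-wise for a matrix of points X.
    d = X - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', d, sigma_inv, d))

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2))          # reference distribution
mu = data.mean(axis=0)
sigma_inv = np.linalg.inv(np.cov(data, rowvar=False))

# A point far from the distribution gets a large distance.
points = np.array([[0.0, 0.0], [10.0, 10.0]])
D = mahalanobis(points, mu, sigma_inv)
```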
When the features are uncorrelated and have unit variance, the covariance matrix is the identity matrix and the Mahalanobis distance reduces to the Euclidean distance with respect to the mean of the data.

2.4.2 Exponentially Weighted Moving Average

The exponentially weighted moving average (EWMA) can be used on time series data to lower the importance of temporary spikes and noise and allow underlying trends to be more easily detected [22]. The EWMA is defined as shown in Equation (2.12), where x is the original data, z is the EWMA of the data and λ ∈ (0, 1] is a constant. A starting value for z is needed for the first iteration and is in this thesis set to the starting value of x; z_0 = x_0.

z_i = λ x_i + (1 − λ) z_{i−1} (2.12)

The EWMA is a weighted average of all previous samples, where the value of λ decides how important the past values are in relation to the present values. If λ = 0.2, the weight given to the current value is 0.2 and the weights given to the preceding values are 0.16, 0.128, 0.1024 and so on. This exponential decay in the weights is the reason why the method is called the exponentially weighted moving average.

3 Design of a condition monitoring system for wind turbines

This chapter describes in detail the proposed method of using deep autoencoders for condition monitoring of wind turbines. It starts with a description of which SCADA signals were used for training and testing of the method, followed by an explanation of how the data was cleaned and transformed in the preprocessing step. Thereafter comes a section describing the design and the training of the autoencoder. Next, the postprocessing of the data is explained: how the residual is computed and standardized and how the Exponentially Weighted Moving Average can be used to find trends in noisy data.
The final section explains in detail how the residual is used to find faults in the wind turbine with the use of the Mahalanobis distance, as well as how the residual for individual signals can be used for fault diagnosis.

3.1 Data description

The data used to train and test the models in this thesis was SCADA data collected from multiple wind turbines in two different wind farms; see Section 2.1.2 for more information about SCADA data. The data was collected as 10-minute averages, where each feature is a signal from a sensor. The signals that were used are listed in Table 3.1.

3.1.1 Training and validation data

For each turbine, 12 months of data was collected as training data and the following 6–9 months were used as validation data. This provided a training dataset of about 50,000 data points and a validation dataset of 25,000–40,000 data points per turbine. The data for training and validation should ideally come from a healthy turbine, since the goal of the autoencoder is to learn to correctly reconstruct data from healthy conditions. A healthy turbine refers to a turbine with no faulty component that is operating as expected during normal conditions. To ensure that the data used in this thesis came from healthy turbines, service logs were examined to see if the turbine had experienced any fault during the period.

3.1.2 Inference data

The goal of training the autoencoders was to be able to evaluate new data, called inference data. This data was preprocessed in the same way as the training and validation data.
Power (kW)
Wind speed (m/s)
Wind direction (°)
Generator bearing front temperature (°C)
Generator bearing rear temperature (°C)
Generator phase 1 temperature (°C)
Generator phase 2 temperature (°C)
Generator phase 3 temperature (°C)
Generator slip ring temperature (°C)
Hydraulic oil temperature (°C)
Gear oil temperature (°C)
Gear bearing temperature (°C)
Nacelle temperature (°C)
Ambient temperature (°C)
Grid inverter temperature L1 (°C)
Top controller temperature (°C)
Hub controller temperature (°C)
Spinner temperature (°C)
Rotor inverter temperature L1 (°C)
Rotor inverter temperature L2 (°C)
Rotor inverter temperature L3 (°C)
Grid busbar temperature (°C)
Gear bearing temperature (°C)
Voltage L1 (V)
Voltage L2 (V)
Voltage L3 (V)
Current L1 (A)
Current L2 (A)
Current L3 (A)
Generator RPM (RPM)
Rotor speed (RPM)
Blade angle (pitch position) (°)

Table 3.1: The 32 SCADA-data signals that were used for the turbines examined.

3.2 Data preprocessing

This section explains how the data was preprocessed; the theory behind this process can be found in Section 2.3. The data was handled as tables, in the format of dataframes, meaning that the columns represented the signals and each timestamp was represented as a row. Note that the data was cleaned and transformed per turbine, since it was later used to train or test autoencoders, which is also done per turbine.

3.2.1 Data cleaning

If a signal could not be recorded it would show up as a NaN value in the data. This could happen for longer periods, which could leave a signal with too few values to be used at all. If a signal had more than 20% NaN values, the column containing this signal was dropped and that signal was ignored for this turbine. Missing values could also occur temporarily, resulting in fewer than 20% NaN values. In this case all data on these timestamps was dropped, to ensure that every timestamp had valid data for all signals.
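The two cleaning rules above can be sketched with pandas (an illustration; the 20% threshold is the one stated in the text, and the toy dataframe is made up):

```python
import numpy as np
import pandas as pd

def clean_nans(df, max_nan_frac=0.2):
    # Drop any signal (column) with more than 20% missing values,
    # then drop the remaining timestamps (rows) that still contain
    # a NaN, so every kept timestamp has valid data for all signals.
    df = df.loc[:, df.isna().mean() <= max_nan_frac]
    return df.dropna()

df = pd.DataFrame({
    'Power':      [1.0, 2.0, np.nan, 4.0, 5.0],            # 20% NaN -> kept
    'Wind speed': [5.0, np.nan, np.nan, np.nan, 6.0],      # 60% NaN -> dropped
})
cleaned = clean_nans(df)
```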
It was possible to see when the turbine was shut down by looking at the timestamps where the power was zero. Since the turbine behaves differently when it is shut down, the signals for these timestamps were unwanted. Therefore all timestamps where the power was zero were removed, together with 30 minutes before and after the shutdown of the turbine, to avoid abnormal signals due to start/stop behaviour.

3.2.2 Transforming data

Two types of transformations were initially examined in this thesis: standardizing data and ZCA whitening. The theory behind these transformations can be found in Section 2.3.1. Standardizing data was tried since this is one of the most common ways of transforming data for neural networks. It ensures that the data is centered around zero and that the signals are scaled appropriately. ZCA whitening takes this one step further, by also decoupling the signals from each other, i.e. removing the correlation between the signals. The idea is that removing correlation between signals can help the autoencoder separate the signals from each other and thereby facilitate training.

When transforming the data, the transformation was done with parameters calculated on the training data. For example, when standardizing the data, the mean and the variance were needed. These parameters were taken from the training data, and the same parameters were used when transforming the training, validation and inference data.

To examine whether the data should be standardized or ZCA whitened, both types of transformation were used on the simulated faults and the real faults presented in Sections 4.1 and 4.2. The results showed that ZCA whitening was less suitable when the goal is to perform fault diagnosis. The autoencoders trained and applied on ZCA-transformed data provided less information about the fault than standardizing the data did. When trying to analyze what the autoencoder had learned by creating the connection map explained in Section 4.1.2, it was clear that the autoencoder trained on ZCA-whitened data had learned relationships that made it difficult to use for fault diagnosis. One explanation for these results is that ZCA whitening actually removes information from the data and stores it in the transformation matrix instead, which means that there is less information in the ZCA-whitened data to draw conclusions from. Appendix A.1 provides more details about the test cases, results and further discussion of this. From these conclusions it was clear that the data should be standardized for best results, and this is the transformation that has been used in this thesis.

3.3 Autoencoder

The thesis Condition monitoring system for wind turbines by Renström [13] examined the use of autoencoders for anomaly detection in wind turbine data. Different autoencoder designs and hyperparameters¹ were examined, and the best performing model has been reproduced in this thesis for continued examination. This section describes the design and training of the autoencoder, including which hyperparameters were used. For a thorough examination of the design process the reader is referred to Reference [13].

3.3.1 Model design

The layout of the autoencoder is shown in Equation (3.1), where the numbers represent neurons per layer and n_signals is the number of available signals from Table 3.1 (see Section 3.2.1 for an explanation of why not all signals might be available). If all signals were available, n_signals = 32.

n_signals × 144 × 96 × 64 (encoder) × 18 (code) × 64 × 96 × 144 × n_signals (decoder) (3.1)

The activation function used between each layer was ReLU [17], except for the last layer, which had a linear activation function. Batch normalization was used, which is a way of standardizing the output of each layer to increase the stability of the neural network [23].
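The layer layout in Equation (3.1) can be sketched framework-agnostically in numpy (the thesis implementation used PyTorch with batch normalization and trained weights; both are omitted here, and the uniform initialization range is an arbitrary choice for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def init_autoencoder(n_signals, rng):
    # Layer widths from Equation (3.1):
    # encoder n -> 144 -> 96 -> 64, code 18, decoder 64 -> 96 -> 144 -> n.
    sizes = [n_signals, 144, 96, 64, 18, 64, 96, 144, n_signals]
    return [(rng.uniform(-0.05, 0.05, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    # ReLU between layers, linear activation on the last layer.
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

rng = np.random.default_rng(0)
layers = init_autoencoder(32, rng)
x = rng.normal(size=(4, 32))   # four standardized samples, 32 signals
x_hat = forward(layers, x)     # the reconstruction has the input's shape
```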
The weights were initialized from a uniform distribution. The autoencoder was implemented and trained using the programming language Python 3.7 with the package PyTorch (torch version 1.2.0) [24].

3.3.2 Training the autoencoder

When training the autoencoder, the standardized training and validation data for the turbine of interest was used. The autoencoder was trained using backpropagation with the Mean Squared Error (MSE) as loss function. This means that the output of the autoencoder was calculated using the training data, and the MSE between the reconstructed data and the training data was then used to update the weights, using a version of gradient descent called ADAM [25]. The MSE for the validation data was also calculated, but this error was never used to update the weights. Instead it was used as a criterion for early stopping, meaning that the training would stop if the autoencoder started to overfit to the training data and thereby worsen the reconstruction of the validation data. The training was done using batch training, with batch size 256, which combined with the ADAM optimizer is a preferred way of training deep neural networks since it lowers the risk of getting stuck in a local minimum.

¹ Hyperparameters in machine learning are parameters that are set before training, for example the size of the network or the learning rate.

3.4 Data postprocessing

When the autoencoder has been trained it can be used to examine new data, called inference data, coming from the same turbine it was trained on. The inference data, x_inf, needs to be transformed using the same transformation as was applied to the training data, producing y_inf. It is thereafter used as input to the autoencoder, which in turn produces a reconstruction, ŷ_inf. This reconstructed data is then inverse-transformed to its original format, x̂_inf. This process is described in Equation (3.2).
x_inf → y_inf → Autoencoder → ŷ_inf → x̂_inf (3.2)

The difference between the original data and the reconstructed data is called the residual and is examined further, with the goal of using it for fault detection and diagnosis.

3.4.1 Residual

The residual is the difference between the original data and the reconstructed data, see Equation (3.3).

res(x, x̂) = x − x̂ (3.3)

The residual is close to zero for signals that are reconstructed well and large for signals that are not. Some signals might always be reconstructed better or worse than the other signals, due to what the autoencoder has learned. This makes it hard to compare the residual of one signal to that of another. To make comparison of residuals between the signals possible, the residual is standardized with respect to the validation data.

3.4.1.1 Standardizing residual

To make it possible to compare the residuals of the signals, each residual has to be standardized with respect to how well the signal is usually reconstructed. For this reason, the validation data is used as a benchmark for how good the reconstruction usually is for each signal. The residual for the inference data is transformed according to Equation (3.4), where reŝ_inf is the transformed residual for the inference data. When discussing the residual in the following sections and chapters of this thesis, it is always the standardized residual that is used.

reŝ_inf = (res_inf − μ)/σ, where μ = Mean[res_val] and σ^2 = Var[res_val] (3.4)

3.4.2 Exponentially Weighted Moving Average

The residual from the autoencoder can be noisy even when it is calculated on data from a healthy wind turbine. Instabilities in the grid, wind gusts and other environmental events are usually very short-lived, but can still cause high spikes in the residual. To prevent these temporary events from affecting the condition monitoring system it is necessary to process the residual.
Real faults in the wind turbine are longer-lived and should therefore not be suppressed by a method that mainly targets short-lived disturbances. One way of achieving this is to use the exponentially weighted moving average (EWMA), described in Section 2.4.2. This method removes noise and allows for easier detection of underlying trends. If the underlying trend shows a large residual, it is more likely that this is due to a real fault in the wind turbine. In Figure 3.1b, the Mahalanobis distance of a residual is shown before and after applying EWMA. To calculate the EWMA the parameter λ is needed, which decides how important previous samples are in relation to the current sample. It was set to λ = 0.004, which corresponds to a response time of approximately 42 hours for the 10-minute data.

3.5 Condition monitoring for wind turbines

The condition monitoring system for wind turbines that is proposed in this thesis consists of two parts: detecting time periods when the wind turbine is operating with a developing fault, and diagnosing which signals are affected by the fault. This is examined in the next chapter, first on simulated faults in Section 4.1 and then on data from turbines with known faults in Section 4.2.

3.5.1 Fault detection

Detecting when the wind turbine is operating in an abnormal way is crucial for a good condition monitoring system. The residual produced from the autoencoder can be used to find out how well the autoencoder has reproduced the signals. Since the autoencoder is trained on healthy data, it has learned the relations between the signals under healthy conditions. If the residual is larger for new data than it is for known healthy data, there is potentially a fault in the wind turbine. To detect if the residual is larger than normal, the Mahalanobis distance can be used as a metric.
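The detection step can be sketched end to end: estimate the healthy residual distribution from validation data, compute the Mahalanobis distance of each inference timestamp, smooth it with EWMA and apply a threshold. This is a minimal illustration; the threshold value and the synthetic residuals below are purely made up.

```python
import numpy as np

def detect_anomalies(res_inf, res_val, lam=0.004, threshold=10.0):
    # Healthy distribution estimated from the validation residual.
    mu = res_val.mean(axis=0)
    sigma_inv = np.linalg.inv(np.cov(res_val, rowvar=False))
    # Mahalanobis distance per timestamp, Equation (2.11).
    d = res_inf - mu
    D = np.sqrt(np.einsum('ij,jk,ik->i', d, sigma_inv, d))
    # EWMA smoothing, Equation (2.12), started at z_0 = D_0.
    z = np.empty_like(D)
    z[0] = D[0]
    for i in range(1, len(D)):
        z[i] = lam * D[i] + (1.0 - lam) * z[i - 1]
    # Timestamps whose smoothed distance exceeds the threshold.
    return z > threshold

rng = np.random.default_rng(2)
res_val = rng.normal(size=(2000, 4))     # healthy validation residual
healthy = rng.normal(size=(500, 4))      # healthy inference residual
faulty = healthy + 20.0                  # a large shift in every signal
```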
3.5.1.1 Mahalanobis distance

The Mahalanobis distance (described in Section 2.4.1) measures the distance between a point and a distribution and can be used to determine the likelihood that a point is from the distribution. When applied to anomaly detection, this translates to determining whether a point is an anomaly or not.

Figure 3.1: A comparison of how a signal can look before and after applying EWMA with λ = 0.004. (a) The Mahalanobis distance of the validation residual and inference residual before applying EWMA. (b) The Mahalanobis distance of the validation residual and inference residual after applying EWMA with λ = 0.004.

When the Mahalanobis distance is applied to the residual of an autoencoder, the residual of the validation data is used to estimate the distribution, since this represents how the residual is expected to look under healthy conditions. The Mahalanobis distance of the inference data is calculated for each timestamp according to Equation (2.11), with the covariance matrix and mean vector estimated from the residual of the validation data.

By setting a threshold on the Mahalanobis distance it is possible to classify timestamps from the inference data as anomalies. Since the Mahalanobis distance is noisy, mainly due to noise in the original data, the threshold might be passed for short time periods even when there is no fault present in the data. To ensure the threshold is only passed when there is an underlying trend of a large Mahalanobis distance, the Mahalanobis distance is averaged using EWMA (see Section 2.4.2) before applying the threshold, as seen in Figure 3.1. The value of the threshold can be determined by using datasets with known healthy and faulty time periods for the wind turbine.
Thereafter the threshold can be varied to find the value that gives the largest number of true positives (alarms for real faults), while minimizing the number of false positives (alarms when there is no fault). The best trade-off is often task-specific, where the importance of finding all faults is weighed against having few false alarms. This was examined extensively in Reference [13], and setting thresholds on the Mahalanobis distance will not be further examined in this thesis.

3.5.2 Fault diagnosis

It is usually not enough to only detect that the wind turbine is operating with a developing fault. Without knowledge about where in the turbine the fault is located, it is difficult to give recommendations on how to solve the problem. When a component is operating with a fault, the signals measured from, or in connection to, that component are affected. This section presents the proposed way of diagnosing which signals are affected the most by the fault, which in turn can help in finding the source of the fault.

When examining which signals are affected the most by the fault, the residuals of the signals are kept separate, as opposed to using the Mahalanobis distance to merge them together. To allow for comparison between the residual signals, the residual is standardized as described in Section 3.4.1.1. The idea is that signals not affected by the fault will have a small residual, while signals that are affected should have a large residual. This assumes that the residual for data from a healthy wind turbine approximately follows a Gaussian distribution, which was tested and shown to be a valid approximation, as can be seen in Appendix B.

3.5.2.1 Boxplot for fault diagnosis

How the separate signals are affected by the fault can be seen by looking at the standardized residual for each separate signal. In order to compare the effect on the different signals, the mean value of each residual is taken over a time period defined as the test period.
These mean values can then be compared to find out which signals were affected the most. To make sure that the results gained from such a study are not due to something just one autoencoder has learned, multiple autoencoders with the same hyperparameters were trained on the same data. Due to randomness in the initialization of the weights and in the training of the network, each autoencoder can produce slightly different results. Each autoencoder was therefore used to produce a residual, from which a mean value was taken per signal. These sets of mean values per signal were then compared by plotting the result as a boxplot. The resulting boxplot shows both how similar the results are between different autoencoders and which signals, by majority vote, were most affected by the fault.

To clarify how a boxplot is created, here follows an example of how to calculate the values needed for one box. In this example, 10 autoencoders with the same hyperparameters have been trained for the same turbine, and all 10 autoencoders have been used to create 10 sets of residuals. The specific residual signal examined in this example is the residual for Power. There are 10 residuals for the power signal, where each residual is a time series of how the residual for the power varies over the specified test period. For each residual, take the mean value of the power over the test period, resulting in 10 mean values for the power, one for each autoencoder. These 10 values can now be used to create one box, as in Figure 3.2.
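The statistics behind such a box can be sketched as follows, using percentile-based quartiles and 1.5·IQR whiskers; the ten per-autoencoder mean values are made up for illustration:

```python
import numpy as np

def box_stats(values):
    # Q1/Q3 are the 25th/75th percentiles, IQR = Q3 - Q1, and the
    # whiskers reach 1.5*IQR beyond the box; points outside the
    # whiskers are treated as outliers.
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lo or v > hi]
    return med, q1, q3, outliers

# Hypothetical mean residuals for one signal from 10 autoencoders;
# one model disagrees strongly and shows up as an outlier.
means = np.array([0.1, 0.2, 0.15, 0.12, 0.18, 0.22, 0.11, 0.19, 0.14, 0.9])
med, q1, q3, outliers = box_stats(means)
```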
The box is created with a function in the programming language Python (pandas.DataFrame.boxplot from the package Pandas [26]) and is defined as follows: the line in the middle of the box is the median of the values (in this case a median of means), and the outer edges of the box are the 25th and 75th percentiles of the values, called Q1 and Q3 respectively. The lines extending from the box are called whiskers and are calculated using the length of the box, the interquartile range IQR = Q3 − Q1. The upper whisker extends to Q3 + 1.5·IQR and the lower whisker to Q1 − 1.5·IQR. Any values outside of the whiskers are represented as circles and are considered outliers. This is done for all signals, to create multiple boxes and make it possible to compare the residual signals with each other.

Figure 3.2: Example of a boxplot with one signal. The median, Q2, is represented by the green line inside the box. The lower and upper edges of the box, Q1 and Q3, are determined by the 25th and 75th percentiles of the values used to create the box. The whiskers are calculated from the values of Q1 and Q3, as displayed in the figure. The circle represents an outlier.

4 Validation study

This chapter consists of two main sections, both aiming to test the method for condition monitoring that was described in the previous chapter. The first section describes the method and results of using simulated faults to examine what the autoencoder has learned. The results in this section explain which connections between data signals the autoencoder has found and provide information about how to interpret the residual for each signal. This information is valuable for the next section, in which eight test cases from real-world faults are examined with the proposed condition monitoring method.
4.1 Decoding the autoencoder

The tests described in this section aim to examine how the autoencoder reacts when one signal experiences a fault and how that in turn affects the reconstruction of the data. This helps in interpreting the residual and in understanding which internal connections between the signals the autoencoder has learned. To do this, simulated faults were introduced to data from a healthy turbine. The data used for this was the same data as the training data, which in the following subsections is called the original data.

Simulated faults are distortions manually added to the data in the preprocessing step. An advantage of manually adding faults is that the exact fault is known, which is not the case for the real-world faults examined in the following section, 4.2. Two types of simulated faults were examined. One was generated as an exponential function and was used to determine what happens when the added fault is positive or negative, see Section 4.1.1. The other type of fault was a positive constant, proportional to the standard deviation of the signal to be distorted. This fault was used to compare how adding the fault to different signals would affect the other signals and is described further in Section 4.1.2. The simulated faults used in this section only affect one signal at a time and thereby differ from the types of faults a turbine would normally experience, which usually affect multiple signals. The exception would be sensor faults, where one sensor produces a faulty signal that could be similar to the simulated faults discussed here.

Figure 4.1: A positive and a negative simulated fault added to the original signal Rotor inverter temperature L1. (a) Positive simulated fault, generated as shown in Equation (4.1). (b) Negative simulated fault, generated by multiplying the distortion in Equation (4.1) by −1. (c) The original signal (blue) and the distorted signal (green), where the distorted signal is created by adding the distortion in Figure 4.1a to the original signal. (d) The original signal (blue) and the distorted signal (green), where the distorted signal is created by adding the distortion in Figure 4.1b to the original signal. The signals have been smoothed with EWMA for visualisation.

4.1.1 Positive vs negative fault

This section describes how the residual is affected by adding a positive or a negative distortion to one signal, making the signal either higher or lower than the original. This is done in order to examine whether it is possible to find out which signal was distorted by looking at the residual, and to see if the autoencoder has learned to find connections between signals.

4.1.1.1 Method

A positive distortion was generated as shown in Equation (4.1), where z_i is the distortion at timestamp i and x_i is taken from a set of evenly spaced values between 5 and 10 (this set is generated using the function numpy.linspace in the programming language Python [27]). The positive distortion is displayed in Figure 4.1a. A negative distortion was generated by multiplying the positive distortion by −1 and is displayed in Figure 4.1b.

z_i = 0.3 + 2^(x_i)/300 + N(0.2, 0.1) (4.1)

The positive distortion was added to the signal Rotor inverter temperature L1 in the original data. This would symbolize a turbine operating with a too high Rotor inverter temperature L1. For comparison, the negative distortion was also added to the signal. The original and the distorted signal for both faults are shown in Figures 4.1c and 4.1d. After the data was distorted, it was standardized and fed into a trained autoencoder. The autoencoder reconstructed the data and, after the reconstructed data was inverse-transformed back to its original shape, the residual was calculated and standardized as explained in Section 3.4.1.
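The distortion can be sketched with numpy. The exponential term below is a reconstruction of Equation (4.1), and N(0.2, 0.1) is interpreted as Gaussian noise with mean 0.2 and standard deviation 0.1; both readings of the garbled original are assumptions:

```python
import numpy as np

def positive_distortion(n, rng):
    # Equation (4.1), as reconstructed: z_i = 0.3 + 2^(x_i)/300 + N(0.2, 0.1),
    # with x_i taken from n evenly spaced values between 5 and 10, giving
    # a slowly growing exponential trend plus noise.
    x = np.linspace(5.0, 10.0, n)
    noise = rng.normal(loc=0.2, scale=0.1, size=n)
    return 0.3 + 2.0 ** x / 300.0 + noise

rng = np.random.default_rng(3)
z = positive_distortion(1000, rng)
negative = -z        # the negative fault is the positive one multiplied by -1
```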
The results for the distorted signal and two other signals, Rotor inverter temperature L2 and Rotor inverter temperature L3, were then analyzed together by plotting the original, distorted and reconstructed signals as well as the residual, for the three signals. These signals were analyzed together since they are measured on the same component and thereby vary together when everything is working as expected. By distorting just one of the three signals, it is possible to see if the autoencoder has learned the underlying relationship between the signals and how that affects the residual of the signals.

Due to randomness in the training of the autoencoder, two autoencoders with the same hyperparameters trained on the same data could produce different results. Therefore, to verify that the results were not unique to just one autoencoder, 10 autoencoders with the same hyperparameters were trained on the same training data. The data was then distorted as described above and reconstructed using the 10 autoencoders. This allowed boxplots for the positive and the negative fault to be created, as described in Section 3.5.2.1. The boxplot also provides an overview of how the distortion affects all signals and not just the rotor inverter temperatures.

4.1.1.2 Results and discussion

Figure 4.2 shows the effect on the signals Rotor inverter temperature L1, L2 and L3 when distorting Rotor inverter temperature L1 with a positive distortion. The residual is clearly positive for the distorted signal, while it is negative for the other two signals. This result could be interpreted as a large positive residual implying a fault connected to that signal. To test this assumption, the same signal was distorted with the negative distortion, with the result shown in Figure 4.3. From the result in this figure it is clear that Rotor inverter temperature L2 and L3 have large positive residuals while the distorted signal has a large negative residual.
If the assumption were correct, this result would mean that Rotor inverter temperature L2 and L3 were the distorted signals, which is wrong. This means that it is hard to draw conclusions about which signal is behaving abnormally just by looking at which signal has the largest positive residual.

Another way to interpret the residuals is to view all residuals, positive and negative, as a result of the fault the wind turbine is experiencing. A faulty component will affect all the signals directly or indirectly connected to it, which will be seen in the residual if the autoencoder has correctly learned how the signals are connected. The results in Figures 4.2 and 4.3 show that the autoencoder has learned that the signals Rotor inverter temperature L1, L2 and L3 vary together during normal conditions. The autoencoder reconstructs each rotor inverter temperature signal from the input of all three signals. As can be seen in Figure 4.2, the reconstructed signal for Rotor inverter temperature L1 is higher than the original signal, since it tries to reconstruct the distorted signal, but lower than the distorted signal, since it is created from a function that also includes Rotor inverter temperature L2 and L3, which are lower than the distorted signal. The same is seen by looking at the reconstructed signals for Rotor inverter temperature L2 and L3, which are higher than their respective original signals, due to the reconstruction also including the higher value of the distorted signal. This is why the residual is positive for the distorted signal and negative for the other signals. A result like the one shown in Figure 4.2 could be interpreted as Rotor inverter temperature L1 being too high compared to Rotor inverter temperature L2 and L3, according to what the autoencoder knows. The same reasoning can be applied to the negative distortion in Figure 4.3.
To compare the results from multiple autoencoders, and to visually show how the residuals for all signals, not just the rotor inverter temperatures, are affected, the boxplot method explained in Section 3.5.2.1 was used on the data from the two distortions as well as on the original, healthy data for comparison. The result for the positive distortion is shown in Figure 4.4, for the negative distortion in Figure 4.5 and for the original healthy data in Figure 4.6. These figures show, apart from the same result as discussed above, that the distortion also affects Power and Current L1, L2 and L3. For the positive distortion the residual for these signals is slightly lower than zero, and for the negative distortion it is slightly higher than zero. This implies that the autoencoder expects the wind turbine to produce more power when the rotor inverter temperature is high and less when it is low. More power means higher currents, which the autoencoder has also learned. This discussion leads to two important conclusions: (i) the autoencoder can learn relationships between signals which can be explained by how the signals are physically connected, and (ii) the residuals for the signals connected to the distorted signal, and the distorted signal itself, were affected the most by the distortion. This means that the signals with the largest residuals should be analyzed when trying to diagnose the fault. A fault in a component will likely show as large residuals for all signals measured directly on the faulty component, but could also show in the residuals for signals that are indirectly connected through relationships learned by the autoencoder.

(a) Original signal (blue), distorted signal (green), reconstructed signal by autoencoder (dashed orange). (b) Residual between the distorted signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red).
(c) Original signal (blue), reconstructed signal by autoencoder (dashed orange). (d) Residual between the original signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red). (e) Original signal (blue), reconstructed signal by autoencoder (dashed orange). (f) Residual between the original signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red).

Figure 4.2: The effect on the rotor inverter temperatures from adding a positive exponential distortion to the signal Rotor inverter temperature L1, as shown in Figure 4.1c. The residual for the simulated fault is compared with the residual when there was no distorted signal in figures b), d) and f). All signals have been smoothed with EWMA for visualisation.

(a) Original signal (blue), distorted signal (green), reconstructed signal by autoencoder (dashed orange). (b) Residual between the distorted signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red). (c) Original signal (blue), reconstructed signal by autoencoder (dashed orange). (d) Residual between the original signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red). (e) Original signal (blue), reconstructed signal by autoencoder (dashed orange). (f) Residual between the original signal and the reconstructed signal, with distortion (blue) and without distortion (dashed red).

Figure 4.3: The effect on the rotor inverter temperatures from adding a negative exponential distortion to the signal Rotor inverter temperature L1, as shown in Figure 4.1d. The residual for the simulated fault is compared with the residual when there was no distorted signal in figures b), d) and f). The signals have been smoothed with EWMA for visualisation.
Figure 4.4: Boxplot from the results of 10 autoencoders on the positive exponential distortion of the original signal Rotor inverter temperature L1, as shown in Figure 4.1c.

Figure 4.5: Boxplot from the results of 10 autoencoders on the negative exponential distortion of the original signal Rotor inverter temperature L1, as shown in Figure 4.1d.

Figure 4.6: Boxplot from the results of 10 autoencoders on healthy data.

4.1.2 Connection map

The results in the previous section showed that by distorting one signal, it is possible to see what other signals are affected. It showed that the autoencoder had learned that Rotor inverter temperature L1, L2 and L3 change together and that they in turn also affect the signals Power and Current L1, L2 and L3. The aim of this section is to expand this into understanding how each signal affects all the other signals. As before, the data distorted was the data that was used to train the autoencoder, and it is referred to as the original data in the following text and plots.

4.1.2.1 Generating the connection map

In a similar manner as when distorting the signal Rotor inverter temperature L1 in Section 4.1.1, a simulated fault was added to one signal while the other signals were left unchanged. The effect on the residual for this simulated fault was recorded as the mean value of the residual for each signal over a certain time period, resulting in one mean value per signal. The data was then reset to the original and another signal was distorted. This was repeated until all signals had been distorted once. The result of this is multiple sets of mean values, where each set belongs to the signal that was distorted. This can be displayed as a matrix, where the sets are inserted as column vectors, resulting in an n × n matrix (where n is the number of signals) where each column tells which signal was distorted and the rows tell how the distortion affected the signal on each row.
To make sure the results are not unique to one autoencoder, this was done for the 10 autoencoders that were trained on the original data, and the result is a matrix like the one described above, where the values are the means of the results from the 10 autoencoders. The fault that was added to each signal was a positive translation of the signal that depended on the standard deviation $\sigma$ of the signal, see Equation (4.2), where $\vec{x}_i$ is the original data for signal $i$ and $\hat{\vec{x}}_i$ is the distorted signal $i$. In Figure 4.7 such a distortion is shown.

$$\hat{\vec{x}}_i = \vec{x}_i + \sigma_i, \qquad \sigma_i = \sqrt{\mathrm{Var}[\vec{x}_i]} \tag{4.2}$$

4.1.2.2 Analysis of the connection map

The resulting matrix, called a connection map, can be seen in Figure 4.8. It is coloured according to which values are high (red) or low (blue). The colouring is done per column, meaning that the highest value in each column is dark red and the lowest is dark blue. Since the colouring is done per column, the colours should not be compared between columns. In the connection map in Figure 4.8 it is clear that the largest value in each column belongs to the signal that was distorted, which shows as a diagonal of dark red values in the map. This result means that when adding a positive distortion to one signal, the same signal has the largest positive residual. It is also possible to see which other signals are affected by a distortion.

Figure 4.7: Distortion of a signal by using Equation (4.2). The lower blue line is the original signal and the upper orange line is the distorted signal.

If the mean value of the residual for a certain signal is non-zero, it means that the autoencoder has found a connection between this signal and the distorted signal. When the value is negative, it means that the autoencoder expects the two signals to be proportionally connected (high together or low together) when the wind turbine is healthy, as was shown and discussed in the previous section for the signals connected to the rotor inverter.
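The connection-map construction described above can be expressed compactly. The sketch below is a minimal illustration under stated assumptions: `models` is assumed to be a list of trained autoencoders callable on a data matrix, and the one-standard-deviation shift of Equation (4.2) is applied to one signal at a time. The toy "autoencoder" used in the check is a stand-in, not the thesis model.

```python
import numpy as np

# Sketch of the connection-map construction: distort each signal in turn by
# one standard deviation (Equation 4.2), average the residuals over the
# model ensemble, and store the per-signal mean residual as one column.

def connection_map(models, X):
    n_samples, n_signals = X.shape
    C = np.zeros((n_signals, n_signals))
    for j in range(n_signals):                  # distort signal j
        X_dist = X.copy()
        X_dist[:, j] = X[:, j] + X[:, j].std()  # Eq. (4.2): x_j + sigma_j
        res = np.mean([X_dist - m(X_dist) for m in models], axis=0)
        C[:, j] = res.mean(axis=0)              # mean residual per signal
    return C

# Toy check with a stand-in "autoencoder" that reconstructs every signal as
# the row mean, so correlated signals pull on each other as in the thesis.
toy = lambda X: np.tile(X.mean(axis=1, keepdims=True), (1, X.shape[1]))
rng = np.random.default_rng(1)
X = np.repeat(rng.normal(0.0, 1.0, (500, 1)), 3, axis=1)
C = connection_map([toy], X)
# The largest value in each column sits on the diagonal, matching the dark
# red diagonal of Figure 4.8.
```

For the toy model, distorting signal j yields a residual of 2σ/3 for signal j and −σ/3 for the others, so the diagonal dominates each column, just as in the real map.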
The opposite is true when the value is positive (for a signal other than the distorted one), meaning that the signals are inversely proportional. The result that was shown in the previous section can be seen by looking at the column Rotor inverter temperature L1. The mean residual value is large and positive for Rotor inverter temperature L1 and large but negative for Rotor inverter temperature L2 and L3. Note that the values differ compared to the values shown in the previous section, since the signals were distorted with different distortions. By looking at the columns for Current L1, L2 and L3 it is visible that they affect both each other and Power. The same is seen in the column for Power: when distorting Power, the currents are also affected. For these distortions, the mean residual values for the affected signals discussed are negative, except for the distorted signal. This means that they are proportionally connected, a result that confirms physical relations. The Generator bearing front temperature seems to be related to the Generator phase 1-3 temperatures, which is also expected since they are all measured on the generator. The column Ambient temperature shows that when the ambient temperature is high, the residual for Generator slipring temperature is negative, meaning that those two temperatures usually vary together. What is notable in this column is that the residuals for Generator phase 1-3 temperature are also positive when the ambient temperature is high, which is probably due to their relation to the Generator slipring temperature. This relationship can be seen in the column for Generator slipring temperature, where a high Generator slipring temperature results in a negative residual for Generator phase 1-3 temperature. As shown by this discussion, the connection map can be used to analyze what the autoencoder has learned. It shows which signals are connected in the autoencoder and that it seems to have learned relations that can be explained physically in the turbine.
The connection map could be further investigated, for example by trying out different distortions, changing the amplitude of the distortion or distorting multiple signals at once. Since the autoencoder is highly non-linear, the relations shown in this section might change for other types of distortions. This would help in understanding how stable the results in the connection map are, as well as potentially give more knowledge about what the autoencoder has learned. This will not be further investigated in this thesis, but is left as a suggestion for future work.

4.1.3 Conclusion on simulated faults

The analysis of simulated faults helps in understanding how the autoencoder reacts to faults in one signal and how that affects the reconstruction of the data. It has been shown that a fault in one signal affects the reconstruction of multiple signals, which is seen as a large residual amplitude for the affected signals. It has also been shown that when a signal is too high, the residual for that signal is large and positive, while the residual for proportionally connected signals is usually large but negative. The opposite is also true: a too low signal results in a large negative residual for the distorted signal and a large positive residual for proportionally connected signals. This shows that it is hard to know whether it is one signal being too low or another signal being too high that is the source of the fault. It suggests that rather than pointing out a single faulty signal, the analysis of the residual tells which signals are affected by the fault. The connection map in Section 4.1.2 helps in understanding which internal connections between the signals the autoencoder has learned. Many of the connections can be explained by physical relations, which shows that the autoencoder has learned real relationships.
This method of explaining what the autoencoder has learned does give some insight, but it is important to remember that the autoencoder is highly non-linear because of its deep architecture. When distorting signals together, or by other amounts, other relationships could arise that are not visible in the current connection map. The simulated faults discussed have been added to only one signal at a time, which is usually not what a fault would look like in the real world, unless it is a sensor fault. To know whether the autoencoder can be used to find real-world faults, and how those would affect the residual, data from known faults needs to be examined. This is done in the following section, 4.2.

Figure 4.8: The connection map created by adding a distortion to each signal one by one. The columns represent the distorted signal and the rows are the effect on the signals. The colouring is done per column, where the highest value in each column is coloured dark red and the lowest value dark blue. The colours are not comparable between columns. The connection map is created by using 10 autoencoders and the map is the mean of the results from the 10 autoencoders.

4.2 Validation of the condition monitoring system

In this section, the proposed method for condition monitoring of wind turbines, presented in Section 3.5, is tested on data from wind turbines with known faults. The condition monitoring system consists of two parts: the fault detection part, described in Section 3.5.1, and the fault diagnosis part, described in Section 3.5.2.

4.2.1 Test cases

The condition monitoring system was tested on data from turbines with known faults. Most of the faults were found by an existing condition monitoring system that is based on the work presented in [28], and were confirmed by looking at service logs.
The other faults were found only by looking at service logs, since the existing condition monitoring system did not produce a warning for these faults. The existing condition monitoring system uses one small feed-forward neural network per signal it monitors, which allows it to produce warnings when the monitored signals are higher than expected, but it does not incorporate the connections between the different signals. In the following, the existing condition monitoring system is called ANN-CMS, while the novel condition monitoring system proposed in this thesis is called AE-CMS. The test cases are presented in Table 4.1. The column Warning signal (ANN-CMS) shows the signals that were causing warnings according to the ANN-CMS. If the cell is empty, the ANN-CMS did not give a warning for the fault. In the column Cause, the cause of the alarm is specified. This information, and the information in Component breakdown, which specifies when the component broke down, both come from service logs. The time period for each test is 60 days per fault, and is shown in the column Test time period. As seen in the table there are eight faults, of which five are unique. The faults came from six different turbines, where two turbines experienced two different faults at different times. There is also one test case with healthy data, to compare the results from healthy data to results from a faulty turbine.

4.2.2 Method

The proposed method for condition monitoring of wind turbines is described in Section 3.5 and includes fault detection and fault diagnosis. The fault detection part is represented by one plot per test case that shows the Mahalanobis distance calculated on the residual from the validation data and inference data from one autoencoder. In the Mahalanobis distance plot, the test period for the fault is marked with a blue square.
To tell whether the fault could be detected, the Mahalanobis distance must be larger for the data within the marked test period than it is for the validation data; otherwise the fault could not be found by setting a threshold on the Mahalanobis distance. The fault diagnosis is done by comparing the residuals for the individual signals, by examining a boxplot showing the mean residual over the test period. The boxplots were created as described in Section 3.5.2.1, with the use of 10 autoencoders per turbine.

Wind farm | Turbine | Warning signal (ANN-CMS) | Cause | Component breakdown | Test time period
A | 3 | No fault | No fault | | 2018-04-01 – 2018-06-01
A | 1 | Rotor inverter temp L1, L2, L3, Grid inverter temperature L1 | Cooling system issue | | 2018-12-03 – 2019-02-01
A | 5 | Rotor inverter temp L1, L2, L3, Grid inverter temperature L1 | Cooling system failure | | 2019-01-19 – 2019-03-20
A | 5 | Generator slipring temperature | Generator slipring hose failure | | 2018-07-14 – 2018-09-12
A | 2 | Generator slipring temperature | Slipring brush failure | | 2019-09-01 – 2019-10-31
A | 2 | Hydraulic oil temperature | Hydraulic system component failure | | 2018-09-16 – 2018-11-15
A | 3 | | Gearbox | 2019-01-17 | 2018-11-18 – 2019-01-17
A | 4 | | Gearbox | 2018-11-15 | 2018-09-16 – 2018-11-15
B | 1 | | Yaw encoder failure | | 2018-03-02 – 2018-05-01

Table 4.1: Table of the test cases. The column Warning signal (ANN-CMS) shows the signals that were causing a warning according to an existing condition monitoring system. The column Cause shows the cause of the fault according to service logs, and the column Component breakdown shows, if known, when the component broke down and needed to be replaced.

The result of the fault diagnosis is compared with which signals caused warnings according to the ANN-CMS. If the same signals seemed to be responsible for the fault, the fault diagnosis proposed here was deemed successful for the specific fault.
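The detection step above can be sketched as follows. This is a hedged illustration of the general technique, not the thesis code: the mean and covariance of the healthy (validation) residuals are estimated, and the Mahalanobis distance of each inference residual vector is computed against them. All variable names and the simulated data are assumptions made for the example.

```python
import numpy as np

# Sketch of the fault-detection step: Mahalanobis distance of each residual
# vector, measured against the mean and covariance of healthy residuals.

def mahalanobis(residuals, mu, cov_inv):
    """Distance per time step, for a residual matrix of shape (T, n_signals)."""
    d = residuals - mu
    return np.sqrt(np.einsum("ti,ij,tj->t", d, cov_inv, d))

# Illustrative data: healthy validation residuals, then inference residuals
# with a simulated fault onset halfway through the inference period.
rng = np.random.default_rng(2)
R_val = rng.normal(0.0, 1.0, (1000, 4))   # healthy validation residuals
R_inf = rng.normal(0.0, 1.0, (200, 4))
R_inf[100:] += 5.0                         # simulated fault onset

mu = R_val.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(R_val, rowvar=False))
dist = mahalanobis(R_inf, mu, cov_inv)
# The distance rises sharply after the simulated onset, so a threshold set
# above the validation-period distances would flag the fault.
```

A detection threshold can then be chosen from the distribution of distances over the validation period, as the text describes.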
For each fault, the results are also compared with knowledge about which component the signals were measured on. This was especially important for the two types of faults that the ANN-CMS had not warned for, since in these cases there was no known faulty signal to compare the result to.

4.2.3 Results and discussion per fault type

Here the results from the test cases are presented. The results are displayed as plots and are described and discussed under the subsection for each fault type. Note that both the range of the plotted data and the values on the y-axes differ between plots. In Table 4.2, the largest positive and negative residuals are shown in comparison with the results from the ANN-CMS and the service logs.

4.2.3.1 Healthy data

The result of the AE-CMS on healthy data can be seen in Figure 4.9. The Mahalanobis distance is low, and the amplitudes of the residuals shown in the boxplot are also low. It looks as if the wind turbine might be producing less power than expected, judging by the low residuals for power and current, but since the overall amplitude is so low this is not severe enough to be classified as a fault.

4.2.3.2 Cooling system failures

Two turbines experienced faults in the cooling system connected to the rotor inverter, which resulted in too high temperatures in the rotor inverter and the grid inverter according to the ANN-CMS. The results for Wind farm A: Turbine 1 and Turbine 5 can be seen in Figures 4.10 and 4.11, respectively. The Mahalanobis distance is high for both turbines during the faulty time period, showing that the fault is detected. The boxplots are very similar for the two faults, both showing that the residuals for the signals Rotor inverter temperature L1, L2, L3 and Grid inverter temperature L1 are large and positive, which means that these temperatures are higher than expected. This matches the result from the ANN-CMS.
The residuals for Power and for the currents are negative, which is likely due to the produced power being expected to be higher when the wind turbine is operating at the current temperatures for the grid and rotor inverters. The connection between the grid and rotor inverter temperatures and the signals for power and current can also be seen in the connection map, Figure 4.8, which confirms the suspicion that the autoencoder expects more power to be produced at these high temperatures.

4.2.3.3 Generator slipring failures

There were two turbines that experienced problems with the generator slipring, Wind farm A: Turbine 5 and Turbine 2, with results shown in Figures 4.12 and 4.13, respectively. Both results show a clear rise in the Mahalanobis distance during the test period, meaning that the faults can be found. Both also show a large residual for the signal Generator slipring temp, which is the same signal the ANN-CMS warned for. This signal is also the only signal that is measured directly on the generator slipring, and a rise in this signal is expected when there is a fault in the generator slipring. In both boxplots it is also visible that the residuals for the signals Generator phase 1-3 temp are negative, which could mean either that they are lower than they should be, or that the autoencoder has learned that they are usually more similar to the Generator slipring temperature. The latter would mean that the autoencoder has found a relationship between these signals, and this relationship can in fact be seen in the connection map in Figure 4.8. This relationship can also be explained by these temperatures all being measured on the generator.

4.2.3.4 Hydraulic system issue

Wind farm A: Turbine 2 experienced a problem in the hydraulic oil system, and the result from the AE-CMS can be seen in Figure 4.14. As seen in the figure, the Mahalanobis distance is large during the test period, meaning the fault detection system can find the fault.
The boxplot shows a large positive residual for the signal Hydraulic oil temperature, which matches the result from the ANN-CMS. The residual for Gear bearing temperature is also large and positive. In the connection map, Figure 4.8, these two signals are shown to vary together. The reason for this can be explained physically, since the hydraulic oil system and the gearbox cooling system use the same oil. A fault in a component in the hydraulic system may therefore affect both the Hydraulic oil temperature and the Gear bearing temperature.

4.2.3.5 Gearbox failure

Two wind turbines in the test set experienced gearbox failures according to service logs. These faults were not found by the ANN-CMS, so there is no ground truth telling which signals were high or abnormal. Figure 4.15 shows the result for Wind farm A: Wind turbine 3. The Mahalanobis distance shows a clear peak during the test period and hence the fault can be detected. The boxplot shows large positive residuals for Gear oil temperature and Gear bearing temperature. In Figure 4.16, the result for Wind farm A: Wind turbine 4 is shown. The Mahalanobis distance plot shows one large, but thin, spike during the test period, but for the main part of the test period the Mahalanobis distance is relatively low. This means that the fault is difficult to detect by using the Mahalanobis distance. The boxplot shows a large positive residual for Gear bearing temperature and a large negative residual for Gear oil temperature. As seen by comparing the results from the two wind turbines, both have large positive residuals for Gear bearing temperature, while the residual for Gear oil temperature is positive for the first turbine and negative for the second. These temperatures are both measured in the gearbox, which is why a fault in the gearbox is likely to affect these signals.
The difference in the sign for Gear oil temperature could possibly be due to different components in the gearbox being responsible for the fault; the exact faults were not specified in the service logs. Both boxplots also show that the power and the currents have negative residuals, which is probably because the wind turbine produces more power at such a high Gear bearing temperature when it is operating in a healthy condition.

4.2.3.6 Yaw encoder failure

The yaw encoder turns the wind turbine in the direction of the wind, to maximize power production and ensure optimal operating conditions for the turbine. If it is not operating correctly the wind turbine cannot turn in the right direction, which Wind farm B: Wind turbine 1 experienced. The ANN-CMS did not produce a warning for this issue. The result from the AE-CMS can be seen in Figure 4.17. The Mahalanobis distance is high for the test period, while the amplitudes of the residuals shown in the boxplot are as low as they were for the healthy wind turbine shown in Figure 4.9. From the boxplot it is hard to diagnose by eye what fault occurred in the wind turbine. There may be a pattern in the residuals that could be found and used to classify yaw encoder failures, for example with the use of a neural network classifier.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.9: Data from Wind farm A: Turbine 3, when it is operating under healthy conditions. In (a), the test period for this test is marked. The gearbox failure described in Figure 4.15 can also be seen as the rise in the Mahalanobis distance at 2019-01.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.10: Data from Wind farm A: Turbine 1.
The turbine was experiencing cooling system issues according to service logs. According to the existing anomaly detection method, the signals Rotor inverter temperature L1, L2, L3 and Grid inverter temperature L1 were high. In (a), the test period for this test is marked.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.11: Data from Wind farm A: Turbine 5. The turbine was experiencing cooling system issues according to service logs. According to the existing anomaly detection method, the signals Rotor inverter temperature L1, L2, L3 and Grid inverter temperature L1 were high. In (a), the test period for this test is marked. The fault described in Figure 4.12 can also be seen as the rise in the Mahalanobis distance at 2018-07 – 2018-09.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.12: Data from Wind farm A: Turbine 5. The turbine was experiencing a failure in the generator slipring hose according to service logs. According to the existing anomaly detection method, the signal Generator slipring temperature was high. In (a), the test period for this test is marked. The fault described in Figure 4.11 can also be seen as the rise in the Mahalanobis distance at 2019-01 – 2019-04.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.13: Data from Wind farm A: Turbine 2. The turbine was experiencing a failure in the generator slipring brush according to service logs. According to the existing anomaly detection method, the signal Generator slipring temperature was high. In (a), the test period for this test is marked.
The fault described in Figure 4.14 can also be seen as the rise in the Mahalanobis distance at 2018-07 – 2018-11.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.14: Data from Wind farm A: Turbine 2. A component in the hydraulic system failed according to service logs. According to the existing anomaly detection method, the signal Hydraulic oil temperature was high. In (a), the test period for this test is marked. The fault described in Figure 4.13 can also be seen as the rise in the Mahalanobis distance at 2019-08 – 2019-11.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.15: Data from Wind farm A: Turbine 3. Gearbox failure according to service logs. In (a), the test period for this test is marked.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.16: Data from Wind farm A: Turbine 4. Gearbox failure according to service logs. In (a), the test period for this test is marked.

(a) Mahalanobis distance on validation and inference residual. Test period marked. (b) Boxplot of the mean values from the test period. Results from 10 autoencoders.

Figure 4.17: Data from Wind farm B: Turbine 1. Yaw encoder failure according to service logs. In (a), the test period for this test is marked.

Wind farm | Turbine | Warning signal (ANN-CMS) | Cause | Result figure | Large positive residual (median val.) (AE-CMS) | Large negative residual (median val.) (AE-CMS)
A | 3 | No fault | No fault | 4.9 | – | –
A | 1 | Rotor inverter temp L1, L2, L3, Grid inverter temperature L1 | Cooling system issue | 4.10 | Rotor inverter temp L1 (5.2), L2 (4.8), L3 (5.3), Grid inverter temperature L1 (6.2) | Nacelle temperature (-6.6)
A | 5 | Rotor inverter temp L1 (10.2), L2 (11), L3, Grid inverter temperature L1 | Cooling system failure | 4.11 | Rotor inverter temp L1 (4.2), L2 (4.6), L3 (4.6), Grid inverter temperature L1 (6.7) | –
A | 5 | Generator slipring temperature | Generator slipring hose failure | 4.12 | Generator slipring temperature (6.0) | –
A | 2 | Generator slipring temperature | Slipring brush failure | 4.13 | Generator slipring temp (5.7) | Phase 1 temperature (-3.5)
A | 2 | Hydraulic oil temperature | Hydraulic system component failure | 4.14 | Hydraulic oil temperature (9.5), Gear bearing temperature (6.1) | –
A | 3 | | Gearbox | 4.15 | Gear oil temperature (2.8), Gear bearing temperature (1.5) | –
A | 4 | | Gearbox | 4.16 | Gear bearing temperature (1.6) | Gear oil temperature (-0.8), Phase 2 temperature (-0.7)
B | 1 | | Yaw encoder failure | 4.17 | Hub controller temp (0.3) | Power (-0.4)

Table 4.2: Table of the test cases showing faults and warnings according to the ANN-CMS and service logs, and the corresponding results obtained with the AE-CMS. In the two columns Large positive / negative residual (median val.) (AE-CMS), the largest positive and negative residuals are presented for each fault, with the median values displayed within parentheses.

4.2.4 Conclusion from the validation on real-world faults

Out of eight test cases, it was clear that seven faults could be detected by monitoring the rise in the Mahalanobis distance, which is two more faults than the ANN-CMS had detected. The fault that could not be detected by the AE-CMS (nor the ANN-CMS) was the gearbox failure for Wind farm A: Wind turbine 4, since the Mahalanobis distance was low for almost the whole test period, as seen in Figure 4.16.
This fault might be detectable if the separate residuals were monitored instead, since it was clear in the boxplot for the fault that the separate residuals for the gear temperatures were large. It was possible to diagnose the fault for seven of the eight test cases by looking at the boxplots. The boxplots showed large residuals for all the signals the ANN-CMS had produced warnings for, and could also be used to diagnose the two gearbox failures, since the residuals for the signals measured in the gearbox were large compared to the other signals. The fault that the AE-CMS failed to diagnose was the yaw encoder issue, Figure 4.17, as the amplitudes of the separate residuals in the boxplot were as low as for a healthy wind turbine and showed no clear pattern. Since there was no signal that directly measured the yaw encoder, it might be hard to see the fault in the residual for a specific signal. But since the Mahalanobis distance was large for the test period, there must be some anomaly in the separate residuals that should be possible to find, though perhaps not by just analysing the boxplot by eye. An algorithm could potentially be trained to find patterns in the residual and thereby help in diagnosing this and other types of faults. This is suggested as a continuation of the work presented in this thesis. The connection map, Section 4.1.2, helped when interpreting the results seen in the boxplots. For example, it helped explain why the residuals for the power and currents were low for the cooling issues shown in Figures 4.10 and 4.11. It also helped explain the low residuals for Generator phase 1-3 temperatures when there was a fault in the generator slipring, as discussed in Section 4.2.3.3.
5 Closure

5.1 Conclusion

The aim of this thesis was to (i) design a condition monitoring system based on an autoencoder that can detect and diagnose developing faults, and (ii) examine which internal connections the autoencoder has found, in order to understand how the residual between the input and output of the autoencoder behaves. The first aim was met by the proposed condition monitoring system, as was shown in the validation study of eight known faults found in five different components: gearbox issue, cooling issue, hydraulic oil issue, yaw encoder issue and generator slipring issue. The autoencoder was used to produce a residual, taken as the error between the input to the autoencoder and its reconstructed signal. For fault detection, the Mahalanobis distance was used on the residual, and it was shown that when the wind turbine experienced a fault, the Mahalanobis distance was large for seven of the eight faults. For fault diagnosis, the residual for each signal was standardized with respect to how large the residual is for healthy data. The standardized residual was analyzed to examine which signals are most affected by the fault.
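The diagnosis step just summarized can be sketched as follows. This is an illustrative sketch only: `diagnose`, the signal names and the simulated residuals are assumptions introduced for the example, not the thesis implementation. Each signal's residual is standardized against its healthy mean and standard deviation, and the signals are ranked by the magnitude of their median standardized residual over the test period.

```python
import numpy as np

# Sketch of the fault-diagnosis step: standardize each signal's residual
# against healthy behaviour, then rank signals by how far their median
# standardized residual lies from zero.

def diagnose(R_test, R_healthy, signal_names, top_k=3):
    z = (R_test - R_healthy.mean(axis=0)) / R_healthy.std(axis=0)
    score = np.median(z, axis=0)            # median standardized residual
    order = np.argsort(-np.abs(score))      # most affected signals first
    return [(signal_names[i], float(score[i])) for i in order[:top_k]]

# Illustrative data: healthy residuals, then a test period where one
# signal's residual is shifted far from its healthy distribution.
rng = np.random.default_rng(3)
names = ["Gear oil temp", "Gear bearing temp", "Power", "Ambient temp"]
R_healthy = rng.normal(0.0, 1.0, (1000, 4))
R_test = rng.normal(0.0, 1.0, (200, 4))
R_test[:, 1] += 6.0                         # simulated fault on one signal
print(diagnose(R_test, R_healthy, names, top_k=2))
```

The top-ranked signals then point toward the component to inspect, in the spirit of the boxplot analysis used throughout Chapter 4.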