Department of Energy and Environment, CHALMERS UNIVERSITY OF TECHNOLOGY, Gothenburg, Sweden 2015

Institute of Aircraft Design, UNIVERSITY OF STUTTGART, Stuttgart, Germany 2015

 
SCADA-Data Analysis for Condition Monitoring of Wind Turbines

 
Master’s thesis in Energy Engineering 

Simon Letzgus 

 
MASTER’S THESIS 

 
SCADA Data Analysis for Condition Monitoring of Wind Turbine Components

 
Master’s Thesis within the Energy Engineering program 

SIMON LETZGUS 

 
EXAMINERS: 

Prof. Dr. Po-Wen Cheng 
Ph.D. Jimmy Ehnberg 

 
SUPERVISORS: 

Lic. Pramod Bangalore 
Dipl. Ing. Kolja Müller 

 
Department of Energy and Environment 

Division of Electric Power Engineering 

CHALMERS UNIVERSITY OF TECHNOLOGY 

Gothenburg, Sweden 2015 

 
Institute of Aircraft Design 

Stuttgart Wind Energy (SWE) 

UNIVERSITY OF STUTTGART 

Stuttgart, Germany 2015 



 
Abstract 

Wind energy, the world’s fastest growing renewable energy technology, is developing towards a major utility source. Turbines are growing in size and are located at more remote sites, sometimes even offshore, to benefit from better wind conditions. These developments help to maximize the output per turbine but come with challenges for operation and maintenance (O&M). Unexpected failures result in longer downtimes and consequently higher revenue losses. Hence, maintenance management promises considerable cost saving potential, and the analysis of data from the turbine’s built-in supervisory control and data acquisition (SCADA) system can effectively support maintenance decisions.

This thesis investigates how SCADA data can be used for early failure detection in critical components of wind turbines (WTs). To this end, an existing condition monitoring approach is further developed and applied. The method uses artificial neural networks to model target parameters under normal operating conditions and analyzes the deviations between modelled and measured values with statistical tools such as the Mahalanobis distance (MHD). To increase the robustness and accuracy of the approach, several data pre-processing methods are developed and presented. Two anomaly detection philosophies are investigated by building two different models: a gearbox model, which monitors local variables to indicate component malfunctions, and a power model, which predicts the turbine’s power output to indicate problems from a system perspective.

Based on the available data, both monitoring approaches were applied to investigate gearbox failures in indirect drive WTs and generator bearing failures in direct drive WTs. Furthermore, the power model was found to be an effective method for ice detection on WT blades. The successful detection of gearbox anomalies long before the final component breakdown is presented. However, the model was not able to detect all gear-related problems investigated. It was concluded that the availability of parameters that are potentially affected by component malfunctions plays a decisive role in this approach. The power model application showed that a different anomaly detection approach might be better suited for the investigated failure cases. However, the power model is well suited for the detection of icing, and recommendations for further studies are derived.

 
Keywords: Artificial neural networks (ANN), condition monitoring, supervisory control and data acquisition (SCADA), failure detection, wind power, gearbox monitoring, turbine monitoring, icing detection



 
Zusammenfassung (Summary)

Wind energy, the fastest growing technology among the renewable energies, is gaining importance worldwide. Ever larger turbines are being erected at sites that are in part difficult to access, for example offshore, to benefit from good wind conditions and to maximize energy yields. This development, however, brings challenges for the operation and maintenance of the turbines. An intelligent, cost-minimizing maintenance strategy is therefore particularly important. The analysis of data from the turbines’ SCADA system can provide valuable information to support maintenance planning.

This thesis investigates possibilities for using SCADA data for early failure detection in wind turbines. A monitoring method is further developed and applied which uses neural networks to model turbine parameters under normal conditions and examines deviations from measured values with statistical methods such as the Mahalanobis distance. The approach is applied both to the monitoring of a single component and to the monitoring of the complete turbine. In addition, several data pre-processing methods are presented to increase the accuracy and robustness of the approach.

Based on the available data, the development and application of the component-related approach concentrates on the gearbox of the wind turbines. The analysis of several failure cases shows that the method can detect gearbox faults long before they result in complete gearbox damage. Within the system approach, the turbine performance is monitored. The application to turbines with faults in the generator bearings primarily reveals the challenges in assessing performance deviations. Furthermore, it is shown that ice formation on the rotor blades can be detected with this approach.



 
Acknowledgement 

 
This research work has been carried out with support through the Professor Dr.-Ing. Erich Müller-Stiftung. The financial support is gratefully acknowledged.

I would like to sincerely express my gratitude to my supervisor Pramod Bangalore, who made my stay at Chalmers possible and supported me throughout the research project.

I would also like to thank both of my examiners, Prof. Po-Wen Cheng and Jimmy Ehnberg, as well as my supervisor at my home institute, Kolja Müller, for their support and the uncomplicated arrangement of the research exchange.

Special thanks go to the employees of the industrial partner Stena Renewable, especially to Thomas Svensson and Johannes Lundvall, whose support through data and expertise in wind turbine operation has contributed substantially to the outcomes of this work.

Furthermore, I want to thank Daniel Karlsson for the vivid discussions around the common topic; Tobias Zengel, Sumit Kumar and especially Fabian Hufgard for proofreading parts of my report; and all the fellow master thesis students for having made the time in the office a good memory.

Finally, I would like to thank Katarzyna Leszek for the warm support during these last busy weeks and my family for their loving support throughout my studies.

 
Stuttgart, 2015-08-11 



 
Declaration of Originality 

I hereby certify that I am the sole author of this thesis and that no part of this thesis has been published or submitted for publication.

Furthermore, I certify that, to the best of my knowledge, my thesis does not infringe upon anyone’s copyright nor violate any proprietary rights and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices.

 
Stuttgart, 2015-08-11 

 

 
Table of Contents

Abstract
Zusammenfassung
Acknowledgement
Declaration of Originality
Table of Contents
Preface
List of Figures
List of Tables
Abbreviations

1 Introduction
1.1 Background
1.2 Task Description
1.3 WT Data and Project Partner

2 Theoretical Background
2.1 Wind Turbines and SCADA
2.1.1 The SCADA system
2.1.2 Gearbox
2.2 Reliability and Maintenance in Wind Turbines
2.2.1 Wind Turbine Reliability
2.2.2 Maintenance Management in Wind Turbines
2.2.3 Condition Monitoring in Wind Turbines
2.2.4 SCADA based CM using Normal Behavior Models
2.3 Artificial Neural Networks
2.3.1 Building blocks of the Artificial Neural Network
2.3.2 Network Training Methods
2.3.3 Application of Artificial Neural Networks in Wind Turbines
2.3.4 Neural Networks in MATLAB
2.4 Statistical Background
2.4.1 Basic Statistical Measures
2.4.2 Distributions
2.4.3 Mahalanobis Distance
2.5 Gearbox Condition Monitoring Approach
2.5.1 Gearbox Model
2.5.2 Anomaly Detection Approach
2.5.3 Anomaly Detection Application

3 Model Development
3.1 Model Development Process
3.2 Parameter selection
3.3 Model Architecture
3.4 Data Pre-Processing
3.5 Model Training
3.5.1 Training Period and Turbine Individual Networks
3.5.2 ANN Training
3.5.3 Inconsistencies in ANN Training
3.5.4 Lag and Normalization Consideration
3.6 Model Evaluation and Validation
3.6.1 Training Evaluation
3.6.2 Healthy Turbine Application
3.6.3 Faulty Turbine Application

4 Gearbox Model
4.1 Model Development and Training
4.1.1 Parameter Selection
4.1.2 Data Pre-Processing
4.2 Validation and Comparison
4.3 Model Application
4.3.1 Gearbox Study Case 1
4.3.2 Gearbox Study Case 2
4.4 Discussion

5 Power Model
5.1 Model Development and Training
5.1.1 Parameter Selection
5.1.2 Data Pre-Processing
5.1.3 Data post-processing
5.1.4 Model Training
5.2 Validation and Comparison
5.3 Model Application
5.3.1 Power Study Case Gearbox Failure
5.3.2 Power Study Case Generator Bearing Failure
5.4 Discussion

6 Closure
6.1 Summary
6.2 Discussion and Conclusions
6.3 Future Work

References

 

 
Preface 

The Swedish Wind Power Technology Centre (SWPTC) is a research centre for the design of wind turbines. The purpose of the Centre is to support Swedish industry with knowledge of design techniques as well as maintenance in the field of wind power. The research in the Centre is carried out in six theme groups that represent the design and operation of wind turbines: Power and Control Systems, Turbine and Wind Loads, Mechanical Power Transmission and System Optimisation, Structure and Foundation, Maintenance and Reliability, and Cold Climate.

 
This Master’s Thesis was performed within the main project in Theme group 5.  

 
SWPTC’s work is funded by the Swedish Energy Agency and by three academic and thirteen industrial partners. The Region Västra Götaland also contributes to the Centre through several collaboration projects.



 
List of Figures 

Figure 2-1: Cut-away view of a typical wind turbine (adopted from [9])
Figure 2-2: Measurements available in a typical SCADA system [4]
Figure 2-3: Schematic structure of a three stage planetary gearbox typically used in WTs [4]
Figure 2-4: Average number of failures per turbine and year by component and the resulting downtimes
Figure 2-5: Contribution of each component to the annual turbine downtime
Figure 2-6: ANN based CM approach [4]
Figure 2-7: The sigmoid function plotted with varying shaping parameters
Figure 2-8: Examples for different ANN architectures [4]
Figure 2-9: Normal distribution with different parameter configurations
Figure 2-10: Weibull distribution with different parameter configurations
Figure 2-11: Mahalanobis distances based on a sample (white) with its center (red)
Figure 2-12: The averaged MHD violates the threshold several days in advance to a gearbox bearing failure in [4]
Figure 3-1: Schematic flow chart of the iterative model development process
Figure 3-2: Correlation matrix between different SCADA-parameters
Figure 3-3: Example for ‘correct’ prediction of abnormally high bearing temperature by normal behavior model due to incorrect choice of input parameters
Figure 3-4: Turbine specific behavior profile of gear bearing temperatures throughout a year [4]
Figure 3-5: Bearing temperature measured and modelled with different trainings for healthy (top) and faulty (bottom) turbine
Figure 3-6: Structure of Model Training and Application
Figure 4-1: Visualization of final gearbox model parameter configuration with inputs (blue) and targets (violet)
Figure 4-2: Gear bearing temperature depending on power output and rotor rpm
Figure 4-3: Gearbox related parameter correlations averaged over more than 10 healthy turbine years
Figure 4-4: Relative performance gear bearing model based on the MSE for different model input configurations and indication of the model’s anomaly detection ability.
Figure 4-5: Visualization of the different filters applied within the gearbox model
Figure 4-6: Visualization of the General Boundary Filter
Figure 4-7: Visualization of the General Cluster Filter
Figure 4-8: Temperature overestimation after large data gaps
Figure 4-9: Visualization of the Skip Filter
Figure 4-10: Performance of different configurations for skip filter and skip parameter
Figure 4-11: Measured versus modelled temperatures for a healthy turbine
Figure 4-12: Training error histogram (100 bins) for the bearing temperature (left) and the gear oil temperature (right) model
Figure 4-14a: Gear bearing anomaly detection for healthy turbine
Figure 4-14b: Gear oil anomaly detection for healthy turbine
Figure 4-15: Modelled and measured temperatures before gearbox failure
Figure 4-16: Anomaly detection of both models before gearbox failure
Figure 4-17: Output versus measured temperatures during the period of model alarm
Figure 4-18: Rotor RPM and power input signals and SCADA alarms during model alarm
Figure 4-19: Modelled and measured temperatures before gearbox failure
Figure 4-20: Anomaly detection of both models before gearbox failure in SC02
Figure 4-21: ANN input signals and their extreme values in the training data set for the period when the model triggered alarms
Figure 5-1: Visualization of final power model parameter configuration with inputs (blue) and targets (violet)
Figure 5-2: Curtailment data points filtered from a training set.
Figure 5-3: Big deviation between model output and measured power due to averaging before and after turbine shutdown.
Figure 5-4: Measured versus modelled power output in February (left). Training data (black) and measured power (magenta), right.
Figure 5-5: Shift of power curve with seasons (left) towards less efficient power production with lower temperatures (right).
Figure 5-6: Modelled versus measured turbine power output over one day
Figure 5-7: Power model application for anomaly detection in a gearbox failure case
Figure 5-8: Modelled versus measured power for three threshold violation periods
Figure 5-9: Power curve of training and application dataset
Figure 5-10: Shifted errors during application
Figure 5-11: MHD measure for both turbines until failure occurrence
Figure 5-12: Measured values in relation to training data set and model

 

 
List of Tables 

Table 2-1: Overview of CM techniques applied in WTs based on [7]
Table 2-2: Specification of the present gearbox model
Table 3-1: ANN architecture specification for all developed models
Table 4-1: Overview of filters of the gearbox model
Table 4-2: GBF-boundaries for parameters of the gearbox model
Table 4-3: Specification of parameters for clustering of data set and parameters used for filtering with MHD
Table 4-4: Model performance for healthy turbine application averaged over 20 ANNs in comparison with literature values.
Table 4-5: Results of anomaly detection for LSS-bearing failure
Table 4-6: Summary of gearbox model specifications
Table 4-7: Summary of gearbox study case 1
Table 4-8: Summary of gearbox study case two
Table 4-9: Overview over investigated gearbox study cases
Table 5-1: Overview of filters of the power model
Table 5-2: GBF-boundaries for parameters of the power model
Table 5-3: Model performance for healthy turbine application averaged over 20 ANNs in comparison with literature values.
Table 5-4: Summary of power model specifications

 

 
Abbreviations 

ANN Artificial Neural Network 

CBM Condition Based Maintenance 

CDF Cumulative Distribution Function 

CM Condition Monitoring 

COE Cost of Energy 

GBF General Boundary Filter 

LMA Levenberg-Marquardt Algorithm

MAE Mean Absolute Error

MHD Mahalanobis Distance 

MSE Mean Square Error 

O&M Operation and Maintenance 

PDF Probability Density Function 

RMSE Root Mean Square Error 

SC Study Case 

SCADA Supervisory Control And Data Acquisition 

WT Wind Turbine 

 

 
1 Introduction 

1.1 Background 

Wind energy is currently the fastest growing renewable generation technology and an important pillar of the transition to more sustainable energy systems in many countries. Global generation capacity reached 370 GW in 2014, enough to supply nearly 5 % of the world’s electricity demand [1]. In Europe, wind is the leading technology in terms of new power capacity installations, far ahead of conventional generation technologies. Today approximately 10 % of European electricity consumption is generated by wind power, and this share is expected to grow further in the coming years [2]. In other words, wind power is developing towards a major utility source.

With this massive penetration, wind energy has to compete with various generation technologies, and cost of energy (COE) has become an important issue. Several developments to cut generation costs can therefore be observed in recent years. Turbine size is increasing steadily to maximize each turbine’s output. In addition, turbines are erected at sites with the best possible wind conditions, which are more and more often found in remote locations, onshore or even offshore. These trends come with new challenges in O&M. Due to difficult logistics, unexpected failures can be costly to repair and lead to long turbine downtimes, entailing production losses, which can have a significant impact on the economics of a project [3].

Hence, maintenance management promises considerable cost saving potential and has received increasing attention in recent years. Efforts have focused on early failure detection in critical components of the WT; see for example [4, 5, 6]. Condition monitoring (CM) concepts provide valuable information and can contribute significantly to increasing turbine reliability. A smart integration of CM information into the O&M strategy, resulting in so-called condition based maintenance (CBM), can therefore help to minimize O&M costs. Among the different CM approaches, the analysis of SCADA data with appropriate algorithms has shown promising results [4, 7].

The intention of this thesis is to contribute to early failure detection by analyzing data from the turbine’s SCADA system. To this end, the approach presented in [4] is further developed and applied to critical WT components.



 
1.2 Task Description 

The wind industry has seen rapid growth in recent years, with countries striving for more sustainable energy sources in the electric power system. One of the obstacles to the growth of the wind industry is the high maintenance cost and long downtimes of WTs, especially in offshore wind farms [8]. Hence, the focus on early failure detection in critical WT components and on condition based maintenance has increased in recent times. Traditional condition monitoring using vibration signals has proven to be a useful tool for monitoring the health of components. Furthermore, the use of information-rich Supervisory Control and Data Acquisition (SCADA) data has received increased attention in recent years. This thesis aims to contribute to early failure detection by analyzing data from the turbine’s SCADA system.

Within the framework for a wind power maintenance management tool, a methodology based on artificial neural networks for anomaly detection in gearboxes was presented in [4]. The gearbox is a critical component of the WT in terms of reliability, and the approach is to be further developed and applied to new turbine data in study cases. Moreover, the project analyzes the potential of monitoring the overall turbine performance to detect degradation in one of the subcomponents. In particular, the detection of generator bearing failures in direct drive turbines is investigated.

1.3 WT Data and Project Partner 

This master’s thesis project was carried out in cooperation with Stena Renewable as an industrial partner. Stena Renewable operates multiple wind farms in Sweden and provided data extracted from their SCADA systems. Moreover, Stena Renewable contributed to the project through their expertise in wind farm O&M. The outcome of the project relies on both the correct application of appropriate methods and the quality of the input data. Thus, the most promising data sets were carefully selected. With the analysis of the provided data, we hope to contribute to the understanding of the recorded problems as well as to the early detection of future failures.

In addition, SCADA data for different failure cases was provided by a WT manufacturer. Unfortunately, not much additional information regarding the turbines’ condition and maintenance activities was available for these data sets. However, the data has been investigated and conclusions were drawn where possible.



 
2 Theoretical Background 

This chapter provides the theoretical background required to understand and critically discuss the analysis conducted within this master’s thesis. The first section gives an introduction to WTs and the relevant components, followed by sections focusing on reliability and maintenance in WTs. Furthermore, the concept of neural networks, the statistical tools used within this thesis and the approach for anomaly detection in WTs are presented. References are given where a more detailed explanation would exceed the scope of the chapter.

2.1 Wind Turbines and SCADA 

WTs have long been used to harness the kinetic energy of the wind. Nowadays, mainly three-bladed horizontal-axis WTs are used for power generation. The turbines consist of typical subcomponents, which are briefly described below (based on [9]):

• Rotor: usually consists of three blades flanged to the hub, which is mounted on

the front end of the rotor shaft outside the nacelle. The rotor converts the kinetic 

energy of the wind into mechanical energy and transmits the rotation to the 

shaft. 

• Mechanical Drive Train: describes all rotating mechanical components between the rotor hub and the generator. Its design can vary significantly depending on the turbine’s drive concept. Direct drive turbines are able to operate without the most complex drive train component, the gearbox, but come with special

requirements for the generator. The drive philosophy also influences the shaft 

bearing concept. 

• Electrical System: covers all components for the conversion of mechanical into electrical energy, with the generator as the main component. Conventional synchronous and asynchronous generators can be found in WTs depending on the grid connection concept. A common configuration is a synchronous generator in combination with a converter, which decouples the generator from the grid.

• Nacelle: protects the whole drive train and the electrical system against environmental impacts. It can be turned by the yaw system so that the rotor is always facing the main wind direction. Furthermore, the nacelle contains various auxiliary systems such as brakes, cooling system or measuring equipment to ensure safe operation.



 
• Tower: The whole previously described configuration is mounted on top of a

tower to benefit from higher wind speeds above ground. 

Figure 2-1 shows the typical arrangement of the described components. 

 
Figure 2-1: Cut-away view of a typical wind turbine (adopted from [9]) 

2.1.1 The SCADA system 

Contrary to conventional power plants, WTs are unmanned and often situated in remote 

locations. Nevertheless, a wind power plant also needs to be controlled and monitored. 

For this purpose, the turbines are equipped with monitoring and data evaluation systems, so-called Supervisory Control and Data Acquisition (SCADA) systems. On the one hand, SCADA enables remote control of the power plant. Turbines can be switched on or off, power output can be curtailed and the power factor adjusted if necessary. On the other hand, the SCADA system collects measurements from various sensors placed all over the

WT. Technical parameters, such as bearing and lubrication oil temperatures, electric 

quantities and power output are measured as well as environmental parameters like 

wind speed, wind direction or ambient and nacelle temperature. In fact, each WT manu-

facturer has an individual concept of how to set up the SCADA system of their turbines. 

Figure 2-2 gives an overview of the basic measurements typically collected.



 
Figure 2-2: Measurements available in a typical SCADA system [4] 

Although these systems are highly individual, they all have in common that large quantities of data are

extracted and stored in databases. Modern turbines store hundreds of data points every 

ten minutes, which leads to a tremendous amount of data over the years. A complete 

yearly SCADA data set of one of the turbines analyzed in this thesis, for example, con-

tained more than half a million single measurements. Extracting them from the database 

for analysis can be time-consuming work, depending on the user-friendliness of the in-

terface and the available hardware. 

The collected measurements give an insight into the turbine’s instantaneous operating 

conditions and thus enable remote turbine monitoring. The SCADA system is, for in-

stance, able to automatically generate alarms and warnings, if a parameter exceeds a 

pre-selected threshold value. However, the information about turbine condition which is 

hidden in SCADA data is not fully utilized by turbine operators nowadays. This is par-

tially due to the fact that the system indicates impending failures too late and generates 

a vast number of alarms and warnings giving operators a hard time to distinguish be-

tween serious and negligible error messages [4]. Nevertheless, information from 

SCADA data can be extracted using more advanced mathematical and statistical meth-

ods. 

2.1.2 Gearbox 

A gearbox is typically used to step up the low rotational speed of a WT’s rotor to the higher speed required by the electrical generator. Modern gearboxes can achieve gear ratios of more than 1:100 and lose only a few percent of the transmitted power [9].

There are two main forms of toothed-wheel gearboxes: parallel-shaft systems and the 

technically more advanced planetary gearing. WTs generally require multiple stage gear systems, and combined planetary-parallel systems can be found (compare Figure 2-3).

The integrated planetary solution shows clear advantages in size, mass and relative cost 



 
and is thus superior in large WTs. Nevertheless, cheap parallel-shaft solutions, which 

are widely available from different manufacturers, are often preferred in small turbines 

[9]. 

 
Figure 2-3: Schematic structure of a three stage planetary gearbox typically used in 

WTs [4] 

As in other gearbox applications, WT gearboxes contain a gear oil system to ensure lubrication and steady temperatures of gears and bearings. For this purpose, the multiple circuit system is equipped with heat exchangers for cooling at high temperatures and heating at low temperatures. It is controlled based on the gear oil temperature, which is usually measured in the oil sump and recorded by the SCADA system. Furthermore, oil purity is an important factor for the service life of a gearbox, and automated oil filtering is implemented in most gearboxes. Nevertheless, the gear oil is usually subject to regular inspections and has to be replaced during the lifetime of a gearbox [9].

Despite experience of almost two decades of WT technology, gearboxes are still a ma-

jor source for turbine failures (compare 2.2.1). Due to difficult dynamic operating con-

ditions and the high number of operating hours throughout a turbine’s lifetime gearbox 

dimensioning is a challenging task. Especially gearbox bearings, the gearwheels and the 

lubrication system are subjects of concern [8]. Unforeseen repairs or replacements of 

bearings, which sometimes necessitate the disassembly of the entire turbine, can be very 

expensive. Therefore, vibrations, temperatures and oil quality of roller bearings are 

normally subjected to online condition monitoring in modern turbines (compare 2.2.3) 

[9]. Moreover, the SCADA-system usually records gearbox bearing temperatures de-

pending on the manufacturer’s practice, the turbine generation and the requirements 

specified by the operator. 



 
2.2 Reliability and Maintenance in Wind Turbines 

As shown in the previous sections, WTs contain conventional components and subassemblies of mechanical-electrical energy conversion, such as shafts, bearings, gearboxes and generators. Like other technical systems, they have to undergo regular service to guarantee their correct operation. Maintenance is particularly important for a wind power plant, because WTs have to withstand harsh environmental conditions, and component failures can have a decisive impact on a project’s economic success. The following sections provide information about the reliability of modern turbines and highlight the current state of the art in WT O&M.

2.2.1 Wind Turbine Reliability 

Once a WT is commissioned it has to operate properly for a design lifetime of at least 

20 years. Unlike other technical systems the turbines operate for several thousand hours 

each year while being exposed to a wide range of wind speeds and temperatures, includ-

ing extreme weather situations such as storms, lightning strikes and hail [9]. In fact, the 

site location has a significant impact on turbine reliability through the prevailing climate 

[10]. These rough environmental conditions result in heavy dynamic loads, making WT

components prone to fatigue failures. In consequence, reliable turbine design and opera-

tion is a challenging task [9]. 

On a system level, reliability is often characterized by the turbine availability $A$, which is calculated by dividing the mean time to failure $MTTF$ by the sum of $MTTF$ and the mean down time $MDT$ (compare equation 2-1):

$$A = \frac{MTTF}{MTTF + MDT} \tag{2-1}$$

Despite the rough operating conditions average availability of today’s onshore turbines 

is usually above 95 % [11]. However, this high availability can only be guaranteed by a 

costly maintenance organization [12]. 

When analyzing turbine reliability in greater detail, it has been observed that some components of a WT fail more frequently than others, indicating that they are particularly sensitive. The frequency of a specific failure’s occurrence is typically reported as its average failure rate $\lambda$ in failures per turbine and year. For this purpose, the absolute number of failures $N$ which occurred in a specific component, summed up over a certain period, is divided by the observation time $T$ in turbine years (compare equation 2-2) [13]:

$$\lambda = \frac{N}{T} \tag{2-2}$$



 
However, the reliability of a turbine cannot be judged by looking at the failure frequency only, because this measure does not indicate the severity of a failure. Therefore, the average downtime $\bar{d}$ per failure caused by a specific component is calculated by summing up the individual downtimes $d_i$ and dividing them by the total number of observed failures $N$ (compare equation 2-3) [13]. The result is a measure for the average severity and production loss related to a certain component’s failure.

$$\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i \tag{2-3}$$

Both measures, the average failure frequency of a component and the average down-

time of such a failure, are combined to calculate the average annual downtime caused 

by the turbine component, which indicates the severity of a failure and corresponds to 

the lost revenue due to a malfunction. This number is suggested as an indirect indica-

tor for the economic damage of a failure, in case no financial information is available 

[5]. 
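The relation between these reliability measures can be illustrated with a short calculation. The following sketch uses invented numbers for a single hypothetical component; the function names and all input values are assumptions made for illustration and are not taken from the data analyzed in this thesis.

```python
"""Illustrative reliability measures for one component (all numbers invented)."""


def availability(mttf_hours, mdt_hours):
    """Availability as MTTF divided by the sum of MTTF and MDT."""
    return mttf_hours / (mttf_hours + mdt_hours)


def failure_rate(n_failures, turbine_years):
    """Average number of failures per turbine and year."""
    return n_failures / turbine_years


def average_downtime(downtimes_hours):
    """Mean downtime per failure in hours."""
    return sum(downtimes_hours) / len(downtimes_hours)


# Hypothetical observations: four failures within 40 turbine years.
downtimes_hours = [250.0, 310.0, 190.0, 270.0]
turbine_years = 40.0

rate = failure_rate(len(downtimes_hours), turbine_years)
mean_downtime = average_downtime(downtimes_hours)

# Combining both measures gives the average annual downtime per turbine.
annual_downtime = rate * mean_downtime

print(f"failure rate:    {rate:.2f} failures per turbine and year")
print(f"mean downtime:   {mean_downtime:.1f} h per failure")
print(f"annual downtime: {annual_downtime:.1f} h per turbine and year")
print(f"availability for MTTF = 1900 h, MDT = 100 h: {availability(1900.0, 100.0):.2%}")
```

A component with a low failure rate can still dominate the annual downtime if its mean downtime per failure is long, which is the effect described for gearboxes.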

In this thesis, the analysis of turbine reliability is based on the data presented in [14], which covers more than 620 turbines between 1997 and 2005, as well as on a database containing 28 additional WTs with more recent data. Together, the data represents almost 3200 years of turbine operation. All of the turbines are located in Sweden and their size ranges from several hundred kW up to multiple MW. The results are presented in Figure 2-4 in the form of the average number of failures per turbine and year, grouped by component, and the resulting average downtimes:

 
Figure 2-4:  Average number of failures per turbine and year by component and the 

resulting downtimes  

The highest failure rate can be found in electrical components, the control system, 

including sensors, and the hydraulic system. However, these failures can often be 



 
fixed by a simple restart of the turbine system whereas other components cause much 

longer downtimes due to repair work and maintenance logistics. Breakdowns of main 

turbine components can lead to standstill periods of several weeks. That is why par-

ticularly gearbox failures cause long downtimes even though their average failure rate 

is not exceptionally high. 

It has also been observed that the majority of a turbine’s annual downtime is caused by

failures of few components. The failures were primarily related to gearboxes, electric 

systems, the blade/pitch- and the yaw system which account for more than 60% of an-

nual turbine downtime (compare Figure 2-5). Therefore, they are identified as critical 

for system reliability and the economic success of a wind project. 

 
Figure 2-5: Contribution of each component to the annual turbine downtime 

Publications presenting data on WT field failures show similar results and thus draw 

similar conclusions regarding component reliability (compare [12, 13, 15, 16, 17], and 

[8]). 

2.2.2 Maintenance Management in Wind Turbines 

Reliability problems in WTs can lead to high cost for operators. Component degradation 

and failures can result in severe performance degradation, costly repair or replacement 

actions and long turbine downtimes. These risks can be a serious threat to the economic 

success of a wind project. That is why especially small and medium size WT operators 

outsource maintenance and are willing to pay insurance premiums to maintenance specialists, who then guarantee a certain turbine availability. However, O&M cost can account for up to 20 % of a wind project’s total cost of energy (COE) and influences this measure in different ways, as can be seen in equation 2-4 [3].



 
$$COE = \frac{ICC \cdot FCR + LRC}{AEP} + O\&M \tag{2-4}$$

ICC represents the initial capital cost, usually the most important factor in the equa-

tion, which is multiplied with the fixed charge rate (FCR) and added to the levelized 

replacement cost (LRC), which is determined by turbine reliability. Moreover, reliabil-

ity influences the COE directly through O&M costs as well as indirectly by affecting 

the Annual Energy Production (AEP), which can be severely affected by failure 

caused downtime. Therefore, reducing reliability related costs shows great overall cost 

reduction potential and maintenance management aims to determine the optimal 

maintenance strategy to minimize these costs [3]. 
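The influence of reliability on the COE can be made tangible with a small calculation. The sketch below assumes that equation 2-4 has the commonly used form in which the annualized capital cost and the levelized replacement cost are divided by the annual energy production and the O&M cost is added per unit of energy. This form, the function name and all numbers are assumptions for illustration only.

```python
"""Illustrative cost of energy calculation (form of the equation and all numbers assumed)."""


def cost_of_energy(icc, fcr, lrc, aep_kwh, om_per_kwh):
    """COE per kWh, assuming COE = (ICC * FCR + LRC) / AEP + O&M."""
    return (icc * fcr + lrc) / aep_kwh + om_per_kwh


# Hypothetical turbine: 3 million capital cost, 10 % fixed charge rate,
# 30 000 per year levelized replacement cost, 6 GWh annual energy production.
base = cost_of_energy(icc=3_000_000, fcr=0.10, lrc=30_000,
                      aep_kwh=6_000_000, om_per_kwh=0.010)

# Same turbine with 5 % of the production lost to failure-caused downtime.
with_downtime = cost_of_energy(icc=3_000_000, fcr=0.10, lrc=30_000,
                               aep_kwh=5_700_000, om_per_kwh=0.010)

print(f"COE without downtime losses: {base:.4f} per kWh")
print(f"COE with 5 % lost AEP:       {with_downtime:.4f} per kWh")
```

The comparison shows the indirect effect mentioned above: a lower AEP raises the COE even if the O&M cost itself stays unchanged.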

In maintenance management two main strategies can be distinguished, and the goal of intelligent maintenance management is to identify a cost-optimal strategy between those two traditional approaches [7] (compare Figure 2-1).

 
Figure 2-1: Costs associated with traditional maintenance strategies (Adopted from 

[7]) 

• Corrective maintenance, sometimes also called reactive maintenance, is a run-to-failure concept. Maintenance actions are initiated after failure occurrence and detection.

Thus, cost of repair is potentially high as only minimal failure prevention efforts 

are made. Also, this concept can lead to long turbine downtimes, in case compo-

nents with a long lead time need to be replaced. However, a corrective mainte-

nance approach allows utilizing the component lifetime to its maximum. 

• Preventive maintenance, on the other hand, intends to prevent an equipment breakdown through regularly scheduled maintenance or condition based maintenance (CBM) actions. CBM is a subcategory of preventive maintenance which takes additional information about the turbine components into account. With knowledge about the component’s condition, actions can be initiated to mitigate the consequences of a failure even before the failure occurs. Therefore, it is necessary to detect the change in machinery condition in time and to be able to interpret the observed change correctly [18]. Preventive maintenance aims to reduce repair cost, but this reduction is partially offset by the increased prevention effort.

[Figure 2-1 content: total maintenance cost plotted over the number of failures, with curves for prevention cost, repair cost and total cost; the total cost optimum (intelligent maintenance) lies between preventive and corrective maintenance.]

2.2.3 Condition Monitoring in Wind Turbines 

For successful maintenance management, information about the turbine condition is 

essential. Based on that, the appropriate maintenance actions can be arranged. Tradi-

tionally, the information was acquired through manual onsite inspections. However, 

with the increasing number of installed turbines in remote sites, frequent inspections become more challenging and expensive. Therefore, new CM strategies are being developed, combining new sensor technology with online or offline data analysis. Table 2-1 gives

an overview of traditional and state-of-the-art condition monitoring approaches and 

their potential applications in WTs based on [7]. Furthermore, selected techniques are 

introduced in the following paragraphs. 

Table 2-1: Overview of CM techniques applied in WTs based on [7]

Monitoring Approach | Detectable Failures | WT Subsystem
Visual Inspection | Cracking, Adjustment Error, Spalling, Fire | Rotor, Blades, Tower, Nacelle
Acoustic Emission | (not recoverable) | (not recoverable)
Temperature Measurement | Lubrication Problems, Bearing Damages, Bad Connections | Drive Train, Electrical System, Generator
Thermography | Bearing Damages, Winding Damage, Broken Sensors, Electrical Problems | Drive Train, Electrical System, Generator
Vibration Analysis | (not recoverable) | Rotor, Drive Train, Tower
Oil Analysis | Oil Leakage, Lubrication Problems, Braking in Teeth | (not recoverable)
Strain Measurement | Fatigue Information, Crack Information, Deterioration | Rotor, Blades, Shaft, Tower
Power Signal Analysis | Displacement, Eccentricity of Wheels, Rotor Asymmetries | (not recoverable)

Entries without a recoverable row assignment: High Vibration, Defects in Rotating Elements (detectable failures); Drive Train, Drive Train, Generator, Gearbox (WT subsystems).

• Temperature Monitoring: A standard approach for WT CM, which can be conducted with thermometers as well as infra-red thermography. It is one of the most popular CM tools applied in WTs. As every component has a maximum operational temperature which is usually exceeded only in case of abnormally high friction, it is a reliable criterion for failure detection. Furthermore, temperatures are rather slowly changing measurements due to the thermal inertia of the components. This can be an advantage when analyzing data with a low sample rate, for example the 10 minute average values stored in a SCADA system. For temperatures this resolution can be sufficient for condition monitoring. On the other hand, slowly changing measures have only limited value in early failure prediction because they indicate a failure too late. Nevertheless, temperatures are often used as a secondary criterion, for example when the vibration monitoring shows an alarm.

• Vibration Monitoring: One of the well-established technologies for rotating machinery is the analysis of vibration signals, since changes in mechanical equipment can lead to abnormal vibration signals long before a failure occurs. The vibration signals, recorded by different sensors, are usually transformed into the frequency domain and then analyzed. In WTs vibration analysis is applied to

monitor shafts, bearings, gearboxes and blades. Shortcomings of this technology 

are the requirement of additional equipment and difficulties in detecting low-

frequency faults. 

• Oil Analysis: Another broadly applied monitoring technique, especially in turbines with gearboxes. As shown in 2.2.1, gearboxes are especially critical in terms of reliability, and gear oil analysis is therefore commonly used for gearbox monitoring, as it is the only method for detecting cracks inside the gearbox. Usually the oil’s viscosity, oxidation, water content, particle content and temperature are recorded either through offline sample analysis or online monitoring. Even though modern online sensing methods, such as electromagnetic, flow or pressure-drop and optical debris sensing, are available, offline sample monitoring is often used due to the high cost of the online equipment.

• Strain and Optical Monitoring: Recently, strain measurement and optical fiber monitoring for WT structures have received increasing attention, as they allow estimating the fatigue loads the turbine is exposed to. The measurements of strain gauges, which can be placed randomly on the structure, are processed with the help of the finite element method to monitor the effects of the high dynamic loads.

However, strain gauges are not very long lasting and these techniques require 

expensive measurement equipment. New approaches try to connect available 

SCADA-data measurements and short term strain measurements to extrapolate 

strain estimations. Such applications might help the technology to a broader ap-

plication in the future [19]. 
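The frequency-domain transformation mentioned under vibration monitoring can be sketched as follows. The example builds a synthetic signal from two sine components and recovers their frequencies with a plain discrete Fourier transform. The sampling rate, the signal and the function name are assumptions for illustration; real vibration monitoring systems use far higher sampling rates and fast FFT implementations.

```python
"""Minimal frequency analysis of a synthetic vibration signal (all values assumed)."""
import cmath
import math


def dft_amplitudes(signal):
    """Return the single-sided amplitude spectrum of a real-valued signal."""
    n = len(signal)
    amplitudes = []
    for k in range(n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        amplitudes.append(2.0 * abs(coefficient) / n)
    return amplitudes


SAMPLING_RATE_HZ = 64   # hypothetical sampling rate
N_SAMPLES = 64          # one second of data, so each bin is 1 Hz wide

# Synthetic signal: a 5 Hz component (amplitude 1.0) plus a weaker 12 Hz component.
signal = [
    1.0 * math.sin(2 * math.pi * 5 * t / SAMPLING_RATE_HZ)
    + 0.3 * math.sin(2 * math.pi * 12 * t / SAMPLING_RATE_HZ)
    for t in range(N_SAMPLES)
]

amplitudes = dft_amplitudes(signal)
bin_width_hz = SAMPLING_RATE_HZ / N_SAMPLES
dominant_bin = max(range(1, len(amplitudes)), key=lambda k: amplitudes[k])
dominant_hz = dominant_bin * bin_width_hz

print(f"dominant frequency: {dominant_hz} Hz")
print(f"amplitude at 12 Hz: {amplitudes[12]:.3f}")
```

In a monitoring context, a growing amplitude at a frequency related to a bearing or gear stage would be the quantity of interest rather than the raw time signal.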

The technologies presented in the previous paragraphs are mainly used to monitor a 

specific subsystem within the turbine. Other approaches widen the system boundary and

aim for monitoring the global WT system. Different mechanical and electrical faults for 



 
example lead to disturbances in the mechanical as well as in the electrical energy flow. 

Consequently mechanical torque oscillation can also be detected on the electrical side of 

the power train through power signal analysis. That way blade or rotor imbalances can 

be detected. A comparably simple method is the monitoring of process parameters. 

There, the values and relationships of temperatures, power, wind and rotor speed or 

blade angles are compared with specifications and limits determined by manufacturers. 

For this kind of analysis for example SCADA-signals can be used. More advanced ap-

proaches based on parameter prediction and trending are not common today. 

However, the importance of condition monitoring is expected to further increase in the 

future, due to the earlier mentioned developments in the wind industry. The more ma-

ture the new techniques become, the cheaper their application gets. Also, the cost of 

condition monitoring can be compensated by lower premiums from insurers rewarding such systems [9]. Developing towards more reliable, cost-effective, integrated and smart solutions, condition monitoring is about to become an integral part of modern maintenance strategies [7].

2.2.4 SCADA based CM using Normal Behavior Models 

Today’s turbines are not necessarily equipped with sensors for stress, vibration or power 

analysis, but with numerous units collecting data for the SCADA system (compare 

2.1.1). The SCADA system collects information about the turbine key features, which 

can be analyzed for condition monitoring purposes. Thus, the analysis of SCADA data 

can be a cost effective integrated way to monitor several critical components of a WT 

[5]. 

Different techniques, ranging from simple threshold checks to complex statistical 

analyses are used to detect anomalies. A comprehensive overview of publications and 

their proposed methods to analyze SCADA data for CM of WTs is provided by [20]. 

A common approach is the application of normal behavior models. Based on inputs 

extracted from SCADA data the model should be able to predict a target parameter 

under normal operating conditions. For anomaly detection the real time signal is com-

pared with the estimated model output. The success of the approach is determined by 

the accuracy of the developed model. Here, artificial intelligence methods have proven to be a suitable tool for modelling complex systems, such as WT components [21].

Among different approaches neural networks showed particularly good results and were 

successfully applied in WT fault detection [22]. 
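The principle of a normal behavior model can be sketched in a few lines. In the example below a simple least-squares line stands in for the neural network, and a fixed three-sigma limit on the residual stands in for the statistical evaluation; both are deliberate simplifications and not the method used in this thesis. The signal names and all values are invented.

```python
"""Residual-based anomaly detection with a stand-in normal behavior model."""
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical healthy training data: normalized power vs. bearing temperature,
# with a small alternating disturbance standing in for measurement noise.
powers = [i / 20 for i in range(21)]
temps = [40.0 + 30.0 * p + (0.5 if i % 2 == 0 else -0.5) for i, p in enumerate(powers)]

slope, intercept = fit_line(powers, temps)

# Spread of the residuals under normal operation defines the alarm limit.
residuals = [t - (slope * p + intercept) for p, t in zip(powers, temps)]
sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
threshold = 3.0 * sigma

# New measurements: one consistent with the model, one about 7 K too warm.
new_samples = [(0.5, 55.2), (0.5, 62.0)]
flags = [abs(t - (slope * p + intercept)) > threshold for p, t in new_samples]

print(f"model: temp = {slope:.2f} * power + {intercept:.2f}, limit = {threshold:.2f} K")
print(f"anomaly flags: {flags}")
```

The second sample is flagged although 62 degrees would be a normal temperature at full power, which illustrates why comparing against a model output is more sensitive than a fixed absolute threshold.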



 
Figure 2-6: ANN based CM approach [4] 

However, the utilization of SCADA data for CM comes with some challenges. Since 

the SCADA system was not originally designed for CM, not all parameters for a full 

turbine CM are available. Also, the data rate of 10 minute average values is too slow 

for some condition monitoring techniques [7]. Moreover, it can be difficult to trace 

back an anomaly in the data to its origin. Therefore, it is important to understand a 

failure’s specific impact on SCADA data. This knowledge can be achieved either 

through the analysis of data along with maintenance reports or with the help of data 

mining approaches, depending on data availability [21]. Nevertheless, exploitation of 

SCADA data for WT condition monitoring has successfully been demonstrated in 

several studies; see [4, 5, 6, 21, 23, 24 and 25]. 

2.3 Artificial Neural Networks 

Artificial neural networks (ANN) are a concept of computing inspired by the biological structure of the brain. In analogy, an ANN is able to acquire knowledge in a learning process. After training, it can recall the learned patterns and input/output relations. Since the training data presented to the ANN can be theoretical, experimental, empirical or a combination of these, ANNs can be used for a broad range of applications [26]. Moreover, the network is able to generalize its knowledge to a certain extent and apply it to

new input data it has never seen before. This makes it a powerful tool, well suited to 

model real world non-linear systems in engineering and science [27]. For problems which are too complex for an analytical approach, ANNs can deliver a close approximation based on the experience drawn from the training data. However, this

lack of analytical background comes with difficulties in explaining and judging the 

ANN’s output [26]. Even though the ANN is a black box model, it was demonstrated 

to be a useful tool in various applications [27]. The following sections give a general 

introduction into structure and functionality of ANNs based on [28]. 



 
2.3.1 Building blocks of the Artificial Neural Network 

The fundamental information processing unit of an ANN is called a neuron. A neuron 

generates an output based on its input signals and consists of three basic elements: A set 

of synapses, an adder and an activation function (compare Figure 2-2). 

 
Figure 2-2: Model of a neuron [4] 

Synapses are characterized by a weight or strength, which is determined during model training. A neuron’s input signal $x_j$ at synapse $j$ is multiplied by the synaptic weight $w_{kj}$. Subsequently, it is added to all other weighted input signals and a fixed bias value $b_k$ by a linear combiner (compare equation 2-5, where $m$ is the number of input signals). This sum $v_k$ is the input for the activation function $\varphi$, which then determines the neuron’s output $y_k$ (compare equation 2-6).

$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k \tag{2-5}$$

$$y_k = \varphi(v_k) \tag{2-6}$$

Two types of activation functions are distinguished here: threshold and sigmoid functions. A threshold function is discontinuous and can assume a value of either 0 or 1, whereas a sigmoid function can assume any value between 0 and 1. Sigmoid functions are well balanced between linear and nonlinear behavior and are the most common activation functions used in neural networks. Their shape can be influenced by varying the slope parameter $a$. Note that the sigmoid function becomes a threshold function for an infinite $a$ (compare equation 2-7). Figure 2-7 shows the corresponding graph for different shape parameters.

$$\varphi(v) = \frac{1}{1 + \exp(-a v)} \tag{2-7}$$



 
Figure 2-7: The sigmoid function plotted with varying shaping parameters 
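The neuron model described above, a weighted sum plus bias passed through a sigmoid activation, can be written down directly. The following is a minimal sketch; the function names, weights and inputs are chosen for illustration only.

```python
"""Single neuron: linear combiner followed by a sigmoid activation (values assumed)."""
import math


def sigmoid(v, a=1.0):
    """Logistic function with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))


def neuron_output(inputs, weights, bias, a=1.0):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(v, a)


# Weighted sum is 0.5 * 1.0 - 0.25 * 2.0 + 0.0 = 0.0, so the output is 0.5.
example_output = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)

# The larger the slope parameter, the closer the sigmoid gets to a threshold.
slope_demo = {a: sigmoid(0.1, a) for a in (5, 25, 100)}

print(f"neuron output: {example_output}")
for a, value in slope_demo.items():
    print(f"a = {a:3d}: sigmoid(0.1) = {value:.5f}")
```

For a fixed small positive input the output moves from about 0.62 towards 1 as the slope parameter grows, which mirrors the transition from a smooth sigmoid to a threshold function.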

Neurons can be arranged in different architectures depending on the network’s purpose. 

A single-layer network, as the name suggests, consists of only one single layer of neu-

rons which directly connect inputs and outputs. Multi-layer networks on the other hand 

contain one or more hidden layers. Outputs of the previous layer are used as input for 

the next layer. The elements of those layers, the hidden neurons, cannot be directly seen 

from either input or output of the network. Through hidden layers the network is able to 

model the higher order non-linearity in the input output relationship. 
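A classic illustration of what a hidden layer adds is the exclusive-or (XOR) relation, which no single-layer network can represent. The sketch below uses hand-picked weights instead of trained ones and a steep sigmoid; all names and values are assumptions for illustration.

```python
"""Two-layer feed-forward network with hand-set weights reproducing XOR."""
import math


def sigmoid(v, a=20.0):
    """Steep logistic function, close to a threshold."""
    return 1.0 / (1.0 + math.exp(-a * v))


def layer(inputs, weights, biases):
    """Outputs of one layer: each row of weights belongs to one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hidden neuron 1 acts like OR, hidden neuron 2 like AND (hand-picked values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [-0.5, -1.5]
# The output neuron fires if OR is active but AND is not.
OUTPUT_WEIGHTS = [[1.0, -1.0]]
OUTPUT_BIASES = [-0.5]


def network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


outputs = [round(network(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

The outputs of the hidden layer serve as inputs to the output layer, exactly as described for multi-layer networks; in practice the weights would result from training rather than being set by hand.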

In general, feed-forward and recurrent networks can be distinguished. In contrast to a feed-forward network, a recurrent network has at least one feedback loop. Through

feedback loops, non-linear dynamic behavior can be implemented and the performance 

of a network can be improved significantly. Figure 2-8 shows examples of different 

network structures. 

 
Figure 2-8: Examples for different ANN architectures [4] 
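The effect of a feedback loop can be shown with a single neuron. In the sketch below the recurrent neuron receives its own previous output as an additional input, so its response to a constant input changes over time, whereas the feed-forward neuron always answers identically. The tanh activation and the weights are assumptions for illustration.

```python
"""Feed-forward vs. recurrent neuron for a constant input sequence (values assumed)."""
import math


def feed_forward(inputs, w_in=0.5):
    """Each output depends only on the current input."""
    return [math.tanh(w_in * x) for x in inputs]


def recurrent(inputs, w_in=0.5, w_fb=0.8):
    """Each output also depends on the previous output via a feedback weight."""
    outputs = []
    previous = 0.0
    for x in inputs:
        previous = math.tanh(w_in * x + w_fb * previous)
        outputs.append(previous)
    return outputs


constant_input = [1.0, 1.0, 1.0]
ff_out = feed_forward(constant_input)
rec_out = recurrent(constant_input)

print("feed-forward:", [round(y, 3) for y in ff_out])
print("recurrent:   ", [round(y, 3) for y in rec_out])
```

The recurrent output builds up over the sequence, which is the kind of dynamic behavior, comparable to thermal inertia in a measured temperature, that a purely feed-forward structure cannot express without additional delayed inputs.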

[Figure 2-7 plot: $\varphi(x)$ from 0 to 1 over $x$ from -1 to 1 for the slope parameters a = 5, a = 25, a = 100 and a = infinite]

 
Neural network design is a challenging task, because of the lack of well-developed the-

ory for network optimization. An architecture which is able to predict with accuracy 

must be found through experimental studies for a specific case. Two approaches are 

common to find the optimal network structure. The first option is to start with an over-

sized network and remove synapses or entire neurons, if they are not active or carry 

only little weight. Starting with a small network and increasing the number of neurons 

until satisfactory solutions are achieved is the second option. Both approaches involve trial and error to find the network which suits the application best. However, when

modelling real world non-linear relationships generally two hidden layers lead to suffi-

cient results [4]. 

2.3.2 Network Training Methods 

ANNs are intelligent systems, which are able to learn from their environment. 

Knowledge about input/output relations is acquired through a learning process and 

stored in the form of the network’s synaptic weights. After successful training, the ANN is able to use this information to interpret and predict parameters consistently with the outside world. Depending on the network’s purpose, it can be trained for different tasks, such as pattern association, pattern recognition, function approximation or control purposes. There are two conceptually different learning methods for ANN training: supervised and unsupervised learning.

Supervised Learning 

In supervised learning input/output examples are presented to the network. The training 

data contains labeled data sets. Input parameters represent different environmental con-

ditions and output parameters their desired network responses. A vector of input varia-

bles is presented to the network and its actual response is compared with the optimal 

response of the training data set. In an iterative process, the difference between actual 

and desired response is minimized by adjusting the synaptic weights. Through this pro-

cess of error-correction learning, knowledge which was previously stored in the pre-

defined training data is transferred to the network. A scheme of supervised learning is 

displayed in Figure 2-3. 

Within supervised learning two classes of training methods are distinguished: batch and online learning. In batch learning all training data samples are presented to the network simultaneously, which is called an epoch. Multiple epochs are generated through random shuffling for feed-forward networks and through splitting for recurrent networks in order to also train the weights of the feedback synapses. Once the performance shows no further improvement, the training is finished. Through this parallel learning process, batch learning is fast and ensures convergence to a local minimum. However, reaching a global minimum is not guaranteed. Online learning, on the other hand, optimizes the synaptic weights sample by sample. Once all samples have been presented to the network, one epoch is completed. Here, the number of training epochs is also based on the performance improvement from epoch to epoch. Online learning is slower than batch learning but simpler to implement and more responsive to redundancies in the training data.

 
Figure 2-3: Scheme of supervised learning [4] 
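The difference between batch and online learning lies only in when the weights are updated. The following sketch trains a single linear neuron with plain gradient descent in both modes on noise-free example data; the data, learning rate and number of epochs are assumptions for illustration, and gradient descent is used here instead of the training algorithm applied in this thesis.

```python
"""Batch vs. online error-correction learning for one linear neuron (values assumed)."""

# Labeled training data generated from the relation d = 2 * x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ds = [2.0 * x + 1.0 for x in xs]


def train_batch(xs, ds, lr=0.5, epochs=500):
    """One weight update per epoch, based on the error averaged over all samples."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - d for x, d in zip(xs, ds)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def train_online(xs, ds, lr=0.5, epochs=500):
    """One weight update per sample; an epoch ends after all samples were shown."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, d in zip(xs, ds):
            error = (w * x + b) - d
            w -= lr * error * x
            b -= lr * error
    return w, b


w_batch, b_batch = train_batch(xs, ds)
w_online, b_online = train_online(xs, ds)

print(f"batch:  w = {w_batch:.6f}, b = {b_batch:.6f}")
print(f"online: w = {w_online:.6f}, b = {b_online:.6f}")
```

Both variants recover the underlying relation here; with noisy or redundant data their convergence behavior differs, as discussed above.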

Unsupervised Learning 

In case no labeled examples of the function to be learned by the network are available, 

unsupervised learning can be conducted. During the learning process a task independent 

measure of the desired network quality is optimized using competitive learning rules to 

adjust the synaptic weights. Consequently the network becomes tuned due to statistical 

regularities of the input data. 

Levenberg-Marquardt Algorithm 

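There are multiple algorithms available to optimize the synaptic weights during model
training. Within this thesis the Levenberg-Marquardt training algorithm (LMA) was used
because it is MATLAB's fastest and at the same time most accurate algorithm for networks
of up to a few hundred weights [29]. The LMA updates the synaptic weights according to
equation 2-8.

$\mathbf{w}_{k+1} = \mathbf{w}_k - [\mathbf{H} + \lambda \mathbf{I}]^{-1} \mathbf{g}$    (2-8)

The regularization parameter $\lambda$ is used to combine Newton's method (for
$\lambda \to 0$) and the gradient descent method (for $\lambda \mathbf{I}$ overpowering
$\mathbf{H}$) for a fast convergence. $\mathbf{H}$ is the approximated Hessian matrix,
$\mathbf{I}$ the identity matrix with the same dimensions and $\mathbf{g}$ the gradient
vector of the cost function $E_{av}(\mathbf{w})$ (compare equations 2-9 to 2-11).

$\mathbf{H} \approx \frac{\partial^2 E_{av}(\mathbf{w})}{\partial \mathbf{w}^2}$    (2-9)

$\mathbf{g} = \frac{\partial E_{av}(\mathbf{w})}{\partial \mathbf{w}}$    (2-10)

$E_{av}(\mathbf{w}) = \frac{1}{2N} \sum_{i=1}^{N} [d(i) - F(\mathbf{x}(i); \mathbf{w})]^2$    (2-11)

$\{\mathbf{x}(i), d(i)\}_{i=1}^{N}$ is the training sample and the approximating function
$F(\mathbf{x}(i); \mathbf{w})$ represents the network. For additional information about
optimization algorithms for network training refer to [28].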
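A minimal sketch of a Levenberg-Marquardt iteration is given below. It assumes a two-parameter toy model F(x; w) = w0 * exp(w1 * x) instead of a neural network, approximates the Hessian by the averaged outer product of the model gradients, and adapts the regularization parameter by a factor of ten after each accepted or rejected step. These choices, as well as all names, are illustrative assumptions and do not reproduce MATLAB's implementation.

```python
import math

def model(x, w):
    return w[0] * math.exp(w[1] * x)

def cost(samples, w):
    """Mean squared error cost with the conventional factor 1/2."""
    return sum((d - model(x, w)) ** 2 for x, d in samples) / (2 * len(samples))

def lm_fit(samples, w, lam=1e-2, iterations=200):
    n = len(samples)
    for _ in range(iterations):
        g = [0.0, 0.0]                        # gradient of the cost function
        h = [[0.0, 0.0], [0.0, 0.0]]          # approximated Hessian
        for x, d in samples:
            e = math.exp(w[1] * x)
            jac = (e, w[0] * x * e)           # dF/dw0, dF/dw1
            err = d - w[0] * e
            for a in range(2):
                g[a] -= err * jac[a] / n
                for b in range(2):
                    h[a][b] += jac[a] * jac[b] / n
        # Solve (H + lam * I) * step = g for the 2x2 case (Cramer's rule)
        a11, a12 = h[0][0] + lam, h[0][1]
        a21, a22 = h[1][0], h[1][1] + lam
        det = a11 * a22 - a12 * a21
        step = ((a22 * g[0] - a12 * g[1]) / det,
                (a11 * g[1] - a21 * g[0]) / det)
        candidate = [w[0] - step[0], w[1] - step[1]]
        if cost(samples, candidate) < cost(samples, w):
            w, lam = candidate, lam / 10.0    # accept: move towards Newton's method
        else:
            lam *= 10.0                       # reject: move towards gradient descent
    return w

# Toy data (assumption): generated with w0 = 2.0 and w1 = 0.5
samples = [(0.25 * i, 2.0 * math.exp(0.5 * 0.25 * i)) for i in range(9)]
w_fit = lm_fit(samples, [1.0, 0.0])
```

A small regularization parameter gives nearly Newton-like steps close to the minimum, while a large one shortens the step along the negative gradient, so a rejected step is retried more cautiously.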

2.3.3 Application of Artificial Neural Networks in Wind Turbines 

ANNs have the ability to model very complex non-linear relations and are therefore 

well suited for applications in WTs. They are mainly used to analyze the large sets of 

measurements from CM-sensors or the SCADA system. Also, they are applied to pre-

dict or optimize the power output and give information about turbine or component 

condition. Some of these approaches are highlighted in the following paragraphs. 

An approach for optimizing the power factor and production of a WT was presented 

by [30]. A control approach based on different data mining algorithms was generated 

to optimize settings of the blade pitch and yaw angle. ANNs with different configura-

tions were tested against a classification and regression tree as well as a support vector 

machine regression. The ANN based model showed the best results and it was shown 

that information drawn from historical SCADA data can significantly improve a tur-

bine’s power output. 

A methodology analyzing SCADA data with four data mining algorithms to predict
turbine failures was presented in [31]. Here the turbine's power curve was modelled
by each of the algorithms and used to determine turbine health. Failures were classified
by occurrence, severity and the specific fault. The model was able to detect failures in
advance and the approach using ANNs was identified as the best. A similar team
subsequently used ANNs for normal behavior modelling of bearing temperatures in WTs
[32].

An intelligent system for predictive maintenance for WT monitoring was subject of 

[33]. Within this framework multilayer perceptron ANNs were used to create normal 

behavior models for failure detection. This knowledge captured by the networks was 

then combined with a fuzzy expert system for fault diagnosis and maintenance optimi-

zation for WTs. Based on this, an on-line health condition monitoring tool, called 

SIMAP was developed and its application was presented for WT gearbox monitoring. 

Following a similar method, an ANN based normal behavior model for gearbox- and 

generator bearing temperatures was developed and presented in [21]. Gearbox bearing 

temperature and generator winding temperature were predicted and used for fault de-

tection.  



 
A comparative analysis of neural network and regression based condition monitoring
approaches for WT fault detection is conducted in [22]. The developed models are
applied to five real measured faults. The comparison between the approaches reveals
that ANN based models are best suited for failure detection, because they give earlier
and clearer indication of damages. Moreover, it was found that the investigated
bearing failures were easier to detect than the stator anomalies. The same authors
describe the development and application of a method combining ANN based normal
behavior models and fuzzy logic in [23] and [34]. Such an adaptive neuro-fuzzy
inference system allows implementation of expert knowledge in addition to ANN data
analysis. A large number of normal behavior models is developed using 33 SCADA
standard signals. The comparison with an ANN model shows that the selected approach
has advantages in model training speed and that fault diagnosis can be conducted
using the fuzzy inference system.

2.3.4 Neural Networks in MATLAB 

Within this thesis, the numerical computing environment MATLAB was used for data 

processing and the ANN based analysis. Therefore, the WT data, which was extracted 

from the SCADA-system in the txt-format, was converted into csv-files and then im-

ported into the MATLAB environment for processing and analysis. The following sec-

tions give a quick overview of the features and inbuilt functions used within this thesis. 

MATLAB offers a so called Neural Network Toolbox, which contains functions and 

apps for ANN-modelling and application. The program provides a graphical user inter-

face which facilitates model design and training through visualization and predefined 

figures. However, all implemented functions can also be manually called and modified 

within a MATLAB-script. 

The toolbox supports different supervised and unsupervised network architectures, 

ranging from relatively simple feedforward networks to complex dynamic or pattern 

recognition networks and thus allows choosing the most suitable configuration for the 

specific application. Also, several training algorithms are implemented, including gradi-

ent descent methods, conjugate gradient methods and the LMA. Moreover, the toolbox 

features various pre- and post-processing tools [35]. 

Throughout the thesis the software was found to be a useful tool for data processing 

and neural network analysis. The wide range of implemented functions facilitates the 

application of complex mathematical concepts significantly. However, using these 

pre-defined functions for a complex analysis still requires a complete understanding of 

the theoretical background, to be able to appropriately assess and judge the corre-

sponding outcomes. The current and the following chapter should be seen in this con-

text. 



 
2.4 Statistical Background 

Statistics helps us to understand and learn from data with the ultimate goal to translate
data into knowledge [36]. Within this thesis, large data sets are analyzed with the help of
statistical tools to gain knowledge about the condition of technical components of a
WT. The statistical tools applied for this purpose are introduced in the following
sections.

2.4.1 Basic Statistical Measures 

The following paragraphs give a short introduction of the statistical standard measures 

which are used in this thesis either directly or as an input for more advanced analysis. If 

not referenced otherwise, the explanations are based on [36]. 

Mean Absolute and Mean Square Error 

For model performance evaluation two measures are used in this thesis: the mean absolute
error (MAE) and the mean squared error (MSE); both are commonly reported numbers in the
evaluation of time series prediction [37]. The MAE is calculated as the average deviation
of the predicted variable from the target value without taking their direction into
account (compare equation 2-12) and it provides an intuitive indication of the model's
quality. The MSE, however, is the most common performance function used to train neural
networks [29] and is calculated as shown in equation 2-13. Both equations are used for
model assessment, where $f_i$ represents the model's output and $y_i$ the actual target
measurement for the time step i for a total number of n time steps.

$MAE = \frac{1}{n} \sum_{i=1}^{n} |f_i - y_i|$    (2-12)  and  $MSE = \frac{1}{n} \sum_{i=1}^{n} (f_i - y_i)^2$    (2-13)
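As a short worked example, both measures can be computed for a handful of values. The temperature values below are invented for illustration only and are not taken from the analyzed turbines.

```python
def mean_absolute_error(outputs, targets):
    """Average absolute deviation between model output and measurement."""
    return sum(abs(f - y) for f, y in zip(outputs, targets)) / len(targets)

def mean_squared_error(outputs, targets):
    """Average squared deviation between model output and measurement."""
    return sum((f - y) ** 2 for f, y in zip(outputs, targets)) / len(targets)

model_output = [60.0, 62.0, 65.0, 63.0]   # hypothetical model outputs f_i
measurement = [61.0, 62.0, 63.0, 66.0]    # hypothetical measurements y_i

mae = mean_absolute_error(model_output, measurement)   # (1 + 0 + 2 + 3) / 4
mse = mean_squared_error(model_output, measurement)    # (1 + 0 + 4 + 9) / 4
```

The MSE weights the largest deviation (3 K) much more strongly than the MAE does, which is why the two measures can rank models differently.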

Variance and Standard Deviation 

When the variability of a parameter is analyzed it is usually reported as a deviation from
the mean. Hereby the average squared deviation from the mean, normalized by n - 1 for a
sample, is called variance $s^2$ (compare equation 2-14). Since the variance uses squared
units it is much easier to interpret its square root, the standard deviation $s$ (compare
equation 2-15).

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$    (2-14)  and  $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$    (2-15)

In both equations n represents the number of points and $\bar{x}$ is the mean of the
sample x. Looking at equation 2-15, it is obvious that the larger the standard deviation,
the higher is the variance.



 
Covariance and Correlation 

Also, the association between variables is of interest, especially when explanatory
variables are required in modelling. The covariance and the correlation describe the
strength of the linear association between two quantitative variables. The covariance
can be calculated with equation 2-16. For multidimensional parameter associations, the
covariance matrix is a helpful tool, where the matrix element at position m,n is
$c_{m,n} = \mathrm{cov}(x_m, x_n)$.

$\mathrm{cov}(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$    (2-16)

N represents the number of points and $\bar{x}$ and $\bar{y}$ are the means of the
samples x and y. The indicator commonly used to assess parameter relations is the
correlation coefficient r, which is the normalized covariance. The correlation
coefficient can be calculated by equation 2-17.

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$    (2-17)

Here, n is the total number of elements, $s_x$ and $s_y$ are the standard deviations and
$\bar{x}$ and $\bar{y}$ the means of the samples x and y. The correlation coefficient
shows the following properties:

• r is always in the range of -1 to +1 and the stronger the linear association, the
closer its absolute value is to 1.

• A negative r indicates a negative and a positive r a positive association.

• r has no unit and is identical, no matter which one is the explanatory and which
the response variable.
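The listed properties can be checked numerically with a small sketch. It assumes the sample normalization with n - 1 for covariance and standard deviation; the two short signals are invented for illustration and do not stem from the SCADA data used in this thesis.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance with n - 1 normalization (assumption)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def std_dev(x):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(covariance(x, x))

def correlation(x, y):
    """Correlation coefficient as the normalized covariance."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))

power = [0.2, 0.5, 0.9, 1.4, 2.0]        # hypothetical power values
temp = [41.0, 44.0, 50.0, 55.0, 63.0]    # hypothetical bearing temperatures

r_xy = correlation(power, temp)                      # close to +1
r_yx = correlation(temp, power)                      # same value, roles swapped
r_neg = correlation(power, [-t for t in temp])       # sign flips
r_lin = correlation(power, [2.0 * p + 1.0 for p in power])   # exactly linear
```

Because the normalization cancels in the ratio, r does not depend on whether n or n - 1 is used, as long as covariance and standard deviations use the same convention.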

In case two signals are strongly associated but shifted relative to each other, caused by
a delay for example, a simple correlation analysis might not be able to detect the
relation. Therefore, the correlation between two signals is calculated while one signal is
shifted step by step relative to the other. This so-called cross-correlation analysis
allows identifying correlations even if the signals are shifted and is widely used in
signal analysis.
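The shifting procedure can be sketched as follows. The example assumes an artificial signal and a copy delayed by three samples; the function names and the restriction to non-negative lags are illustrative assumptions.

```python
import math

def correlation(x, y):
    """Correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag == 0:
        return correlation(x, y)
    return correlation(x[:-lag], y[lag:])

def best_lag(x, y, max_lag):
    """Lag with the highest correlation within 0..max_lag."""
    return max(range(max_lag + 1), key=lambda lag: lagged_correlation(x, y, lag))

delay = 3                                        # assumed delay in samples
x = [math.sin(0.3 * t) + 0.05 * t for t in range(60)]
y = [x[0]] * delay + x[:-delay]                  # y follows x with the delay

lag_found = best_lag(x, y, max_lag=10)
r_unshifted = lagged_correlation(x, y, 0)
r_shifted = lagged_correlation(x, y, lag_found)
```

The correlation at the detected lag is higher than the unshifted one, which is the effect the cross-correlation analysis exploits.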

2.4.2 Distributions 

When analyzing the outcome of a model, not only the absolute values but also the
frequency of occurrence of these values can be important. A variable's probability
distribution gives answers to both questions. This information can be used to separate
more frequent regular outcomes from rare irregular ones, for example by defining a
threshold based on a value's frequency of occurrence. The theoretical background of the
distributions used within this thesis is explained in the following sections based on
[38] and [36].

The probability distribution of a variable is typically specified by a probability density
function (PDF) $f(x)$, which for a continuous variable determines the probability that the
variate takes a value within a given interval (compare equation 2-18). It is practical to
normalize the PDF with the total area under the curve. Then the area under the curve above
any particular interval corresponds to the interval's probability of occurrence and the
total area below the curve equals a probability of 1. The integration of the PDF results
in the cumulative distribution function (CDF) (compare equation 2-19). The CDF represents
the probability that the variable takes a value less than or equal to x.

$\Pr[a \le X \le b] = \int_a^b f(x)\,dx$    (2-18)

$F(x) = \Pr[X \le x] = \int_{-\infty}^{x} f(t)\,dt$    (2-19)

Visualization of a variable's distribution can be done with the help of histograms or by
approximated continuous distribution functions. Within this thesis the normal distribution
and a two-parameter Weibull distribution were used.

Normal Distribution 

The normal distribution is the most important distribution in statistics, partially because
many variables appear to be normally distributed by nature but mainly because of the
central limit theorem. It says that the sampling distribution of the mean becomes
approximately normal for sufficiently large samples, even if the original variable was not
normally distributed. The normal distribution is characterized by a symmetric, bell-shaped
curve and can be described with two parameters, the mean µ and the standard deviation σ
(compare equation 2-20).

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$    (2-20)

 
Figure 2-9: Normal distribution with different parameter configurations 

[Plot: density (0 to 1.4) over values (-2 to 3) for µ = 0 and σ = 0.3, µ = 0 and σ = 1, µ = 1 and σ = 0.5]



 
One of its important characteristics is that the probability of occurrence within any 

number of standard deviations from the mean is identical for all normal distributions. 

Also, it describes the distribution of continuous, random variables. Therefore, the error 

is often assumed to be normally distributed in modelling applications. 
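The statement about identical probabilities within a given number of standard deviations can be verified with a short sketch. It uses the closed-form CDF of the normal distribution via the error function; the two parameter sets are taken from the legend of Figure 2-9, while the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of the normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function, expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu, sigma):
    """Probability of a value within k standard deviations of the mean."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

p1_narrow = prob_within(1, mu=0.0, sigma=0.3)
p1_shifted = prob_within(1, mu=1.0, sigma=0.5)
p2_narrow = prob_within(2, mu=0.0, sigma=0.3)
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=0.3)
```

Independent of µ and σ, about 68 % of the values lie within one and about 95 % within two standard deviations of the mean.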

Weibull Distribution 

Another widely applicable distribution is the Weibull distribution. It plays an important
role in reliability and it is also used to describe site wind resources. The Weibull is a
flexible distribution and its shape can be influenced by the shape parameter γ, its
location parameter μ and its scale parameter α (compare equation 2-21) [38]. In case the
location parameter equals zero (μ = 0) it results in the two-parameter Weibull distribution
used in this thesis (compare Figure 2-10). Also, it includes the Extreme Value Distribution
as well as the Rayleigh distribution as special cases for particular parameter choices [38].

$f(x) = \frac{\gamma}{\alpha} \left(\frac{x - \mu}{\alpha}\right)^{\gamma - 1} \exp\left(-\left(\frac{x - \mu}{\alpha}\right)^{\gamma}\right), \quad x \ge \mu;\ \gamma, \alpha > 0$    (2-21)

The CDF for the two-parameter Weibull distribution can be calculated following equation
2-22.

$F(x) = 1 - \exp\left(-\left(\frac{x}{\alpha}\right)^{\gamma}\right), \quad x \ge 0$    (2-22)

Within this thesis the parameters of the Weibull distribution function are estimated us-

ing the MATLAB inbuilt function wblfit, which uses the maximum likelihood method 

for approximation. The parameters are then inputs for the MATLAB function wblcdf, 

which calculates the CDF function based on the PDF-parameters. 
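To illustrate what such a maximum likelihood fit involves, the following sketch estimates the two Weibull parameters from synthetic data. It is a plain Python stand-in and not the implementation behind MATLAB's wblfit; the bisection bounds, the sample size and the function names are assumptions made for this example.

```python
import math
import random

def weibull_cdf(x, scale, shape):
    """CDF of the two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def weibull_fit(data, lo=0.05, hi=50.0, iterations=100):
    """Maximum likelihood estimate (scale, shape) via bisection on the shape."""
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Derivative of the log-likelihood with respect to the shape,
        # rearranged so that it increases with k and is zero at the estimate
        powers = [x ** k for x in data]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / len(data)) ** (1.0 / shape)
    return scale, shape

# Synthetic sample (assumption): scale 2.0 and shape 1.5
rng = random.Random(1)
data = [x for x in (rng.weibullvariate(2.0, 1.5) for _ in range(2000)) if x > 0]
scale_hat, shape_hat = weibull_fit(data)
```

With the estimated parameters, weibull_cdf returns the probability that a value does not exceed x, which is the quantity needed for a threshold based on frequency of occurrence.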

 
Figure 2-10: Weibull distribution with different parameter configurations 

[Plot: density (0 to 1) over values (0 to 3) for three parameter configurations with the values 1 and 1, 1 and 1.5, 2 and 3]



 
2.4.3 Mahalanobis Distance 

The anomaly detection methodology applied in this thesis is based on the MHD and a
good comprehension of the measure is therefore useful (based on [39]). The MHD is a
unitless, multidimensional distance. It is calculated similarly to the better known
Euclidean distance but takes the covariance of its values into account, which allows
capturing the correlation between the variables (compare equation 2-23).

$MHD_i = \sqrt{(X_i - \mu)\, C^{-1}\, (X_i - \mu)^T}$  for i = 1 to n    (2-23)

Here, $X_i$ is the i-th observation vector from a total of n observations, $\mu$ is the
vector of its means and $C$ is the covariance matrix of the sample.

The graphical interpretation of the MHD in a two-dimensional variable space shows
elliptic lines representing equivalent MHDs from the sample center. The shape of the
ellipses is influenced by the correlation between the variables (compare Figure 2-11).

Figure 2-11: Mahalanobis distances based on a sample (white) with its center (red)

Figure 2-11 shows relative MHDs based on a basis sample (white data points). It can be
observed that the distance measure reacts much more sensitively to data points which are
not ‘in line’ with the basis sample. This feature makes the MHD useful for outlier
detection, where it was successfully applied in many fields.
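The sensitivity to points that are not in line with the basis sample can be reproduced with a small two-dimensional sketch. The basis sample below is invented for illustration, the covariance is normalized with n - 1 (an assumption), and the 2x2 matrix inverse is written out explicitly to keep the example free of external libraries.

```python
import math

def mean_and_cov(points):
    """Mean vector and 2x2 covariance matrix of a list of (x1, x2) points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return (m1, m2), ((s11, s12), (s12, s22))

def mahalanobis(point, mu, cov):
    """Mahalanobis distance of a 2D point, using the explicit 2x2 inverse."""
    d1, d2 = point[0] - mu[0], point[1] - mu[1]
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)

# Hypothetical, strongly correlated basis sample centered at the origin
basis = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9),
         (2.0, 2.0), (0.5, 0.2), (-0.5, -0.2)]
mu, cov = mean_and_cov(basis)

d_in_line = mahalanobis((2.0, 2.0), mu, cov)     # follows the correlation
d_off_line = mahalanobis((2.0, -2.0), mu, cov)   # contradicts the correlation
```

Both test points have the same Euclidean distance from the sample center, but the point that contradicts the correlation receives a much larger MHD.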

[Plot: X2 (-3 to 3) over X1 (-4 to 4) with the basis sample and its center; color scale: Mahalanobis distance, 50 to 250]



 
2.5 Gearbox Condition Monitoring Approach 

Successful condition monitoring using normal behavior models consists of two main
parts. Firstly, a model is required that is able to predict the target variable with high
accuracy. Secondly, an approach has to be developed which is able to reliably distinguish
between model inaccuracies and abnormal conditions. Among others, a promising approach
for condition monitoring based on SCADA data was presented in [4] (compare 2.3.3).
Because the present thesis aims to further develop and apply this approach, it is
introduced in more detail in the following sections.

2.5.1 Gearbox Model 

The present approach uses a NARX ANN to model the normal behavior of gearbox 

bearing temperatures. The ANN contains 20 neurons with sigmoid activation functions 

in the hidden and one neuron with a threshold function in the output layer. The tempera-

ture of the monitored bearing is modelled using the five input parameters displayed in 

Table 2-2. 

Table 2-2: Specification of the present gearbox model 

ANN Type              NARX
Layer                 Hidden     Output
Neurons               20         1
Activation Function   Sigmoid    Threshold
Inputs                Power, Rotor RPM, Nacelle Temperature, Gear Oil Temperature, LSS Bearing Temperature
Outputs               HSS Bearing Temperature

 
An automated approach for training data selection is also presented in [4] to prevent
overfitting and speed up the training process. This training data selection procedure,
however, was not followed in this thesis, as overfitting did not occur and moderate
training times were achieved. Moreover, a basic pre-filtering was conducted, which
was found to be crucial to prevent false network training and thus was extended in the
present work.

2.5.2 Anomaly Detection Approach 

One of the challenges in the application of ANNs for condition monitoring is the
appropriate judgement of model output. When is a prediction error due to inaccurate
modelling and when does a deviation from the measured value indicate a component failure?
As the ANN lacks physical understanding of the modelled component, this question has to
be answered with the help of statistical tools. Therefore, the RMSE and the Mahalanobis
distance were compared in [4], in which the latter was found to be the more robust and
thus the more adequate measure to detect malfunctions in WT components.

For calculating the MHD during the condition monitoring stage, the data set containing the
SCADA measurements of the target variable $T_{cm,i}$ and the corresponding model errors
$e_{cm,i}$ are combined (compare equation 2-24). Afterwards their MHD values are calculated
using equation 2-25, where $\mu_{tr}$ is the vector of means and $C_{tr}$ is the covariance
matrix for the healthy data during model training.

$X_{cm,i} = [T_{cm,i},\ e_{cm,i}]$    (2-24)

$MHD_{cm,i} = \sqrt{(X_{cm,i} - \mu_{tr})\, C_{tr}^{-1}\, (X_{cm,i} - \mu_{tr})^T}$    (2-25)

Threshold Definition

To decide whether a data point reflects abnormal behavior, an appropriate threshold
value has to be defined. As a prerequisite, the training data of the normal behavior
model has to be free of failures and represent the healthy component condition. Under
that assumption it can be concluded that errors during model training are due to
inaccuracies of the ANN model. This information is taken into account when deciding
the threshold value for anomaly detection. The threshold value is therefore calculated
based on the model errors during the training stage, and data points in the monitoring
stage which show a high MHD compared to the MHDs obtained during the training stage
can be labeled as outliers.

The MHD values during the healthy turbine state, namely during network training, are
calculated using equations 2-26 and 2-27. $e_{tr,i}$ represents the model's training
errors and $T_{tr,i}$ the SCADA measurements of the target parameter during the
training period.

$X_{tr,i} = [T_{tr,i},\ e_{tr,i}]$    (2-26)

$MHD_{tr,i} = \sqrt{(X_{tr,i} - \mu_{tr})\, C_{tr}^{-1}\, (X_{tr,i} - \mu_{tr})^T}$    (2-27)

$1 - F(MHD_{cm,i}) < 0.01$    (2-28)

The distribution of the MHD values during training was found to be accurately
represented by a two-parameter Weibull probability distribution function (compare
2.4.2). Hence any data point during the condition monitoring stage is defined as an
outlier if the probability of occurrence of its MHD in a healthy turbine is less than 1 %
(compare equation 2-28) [4], where F denotes the CDF of the Weibull distribution fitted to
the training MHD values. In addition, gearbox-related SCADA alarms were taken into account
to judge the turbine condition.
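The 1 % criterion can be turned into an explicit threshold value by inverting the Weibull CDF. The sketch below assumes that scale and shape have already been fitted to the MHD values of the training period; the numerical parameter values and the monitoring data are hypothetical.

```python
import math

def weibull_cdf(x, scale, shape):
    """CDF of the two-parameter Weibull distribution."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def weibull_quantile(p, scale, shape):
    """Value below which a fraction p of the distribution lies (inverse CDF)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Hypothetical Weibull parameters fitted to the training MHD values
scale, shape = 1.2, 1.8

# MHD value that is exceeded with a probability of only 1 % in healthy state
threshold = weibull_quantile(0.99, scale, shape)

# Hypothetical MHD values from the condition monitoring stage
monitoring_mhd = [0.4, 1.1, 2.5, 3.6, 0.9]
outliers = [m for m in monitoring_mhd
            if 1.0 - weibull_cdf(m, scale, shape) < 0.01]
```

Comparing an MHD value with the threshold and checking the exceedance probability against 1 % are equivalent formulations of the same test.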

2.5.3 Anomaly Detection Application 

The presented approach was applied in [4] to a turbine with a gearbox bearing failure,
which was detected several days before the vibration monitoring alarm that led to
an inspection where the failure was discovered. For anomaly detection the MHD was
averaged over three days and then compared to the calculated threshold, since the
MHD reacts much more sensitively to outliers than, for example, the RMSE. The
averaging ensured that the threshold is only violated in case of high MHD values over a
longer period, from which it can be concluded that the health of the monitored component
is seriously affected. Therefore, false alarms based on model errors are excluded,
which increases the robustness of the approach. Figure 2-12 shows the development of
the averaged MHD measure and the threshold value in a successful failure detection
case presented in [4].
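The effect of the averaging can be sketched with a trailing moving average. The window length is given here in samples; for 10-minute SCADA averages (an assumption about the sampling interval) three days would correspond to 432 samples, while the toy example uses a window of six samples and invented MHD values.

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged

def first_alarm(values, window, threshold):
    """Index of the first averaged value above the threshold, or None."""
    for i, avg in enumerate(moving_average(values, window)):
        if avg > threshold:
            return i
    return None

# A single spike (e.g. an isolated model error) does not trigger an alarm
single_spike = [1.0] * 5 + [10.0] + [1.0] * 14
alarm_spike = first_alarm(single_spike, window=6, threshold=3.0)

# A sustained increase violates the threshold after a few samples
sustained = [1.0] * 10 + [5.0] * 10
alarm_sustained = first_alarm(sustained, window=6, threshold=3.0)
```

The isolated spike is smoothed out, whereas the persistent increase raises the averaged value above the threshold with a short delay.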

 
Figure 2-12: The averaged MHD violates the threshold several days in advance to a 

gearbox bearing failure in [4] 



 
3 Model Development 

Against the presented background, this thesis aims to further develop and apply the
anomaly detection methodology introduced in chapter 2.5. Therefore, the following
chapters describe the general model development process followed within this thesis
and explain its subtasks. In chapters 4 and 5 the described approach is applied
for CM of WTs.

3.1 Model Development Process 

The process of developing an ANN based normal behavior model can be divided into 

multiple subtasks which together represent an iterative development process. Before a 

first model training the input and output parameters have to be selected according to the 

desired application. Moreover, a suitable ANN architecture has to be specified. Lastly, a 

data pre-processing approach has to be developed, to enable appropriate model training. 

After completing these tasks the model can be trained and the result should be verified 

during a testing and validation process, where the model is applied to healthy and faulty 

WTs. When developing an ANN, it can be difficult to find the optimal network configu-

ration for a specific application, since the performance depends on all the previously 

described factors and processes. Thus, finding a suitable ANN for an engineering appli-

cation is always an iterative process, where the pre-training configurations are varied 

until a sufficient result is achieved (compare Figure 3-1) [26]. The following sections 

describe the general approaches followed by this thesis in the development of the ANN 

based normal behavior models. 

 
Figure 3-1: Schematic flow chart of the iterative model development process 



 
3.2 Parameter Selection

The selection of appropriate input and output parameters is an essential part of ANN 

development. In a first step the target parameter has to be selected. Potential component 

failures should manifest themselves in the chosen measurement, to enable failure detec-

tion. This shows the importance of target parameter selection for successful anomaly 

detection. In many cases there is only little choice because of the limited availability of 

measurements addressing the malfunction. In fact, the applicability of the approach of-

ten depends on the availability of potential target measurements. 

The selection of input parameters, on the other hand, is more complex. Relevant input
parameters have to be chosen such that the model is able to predict the target
parameter under normal operating conditions with sufficient accuracy. This ensures a
detectable deviation between the model output and the actual parameter measurement
during a malfunction in the corresponding component. In contrast to the target parameter
selection, there usually is a large number of potential input measurements to choose
from. Here, the physical relations between the turbine components, which result in
correlations between the corresponding parameters, play a key role. However, only few
works have considered correlations between parameters of the SCADA system at the
stage of parameter selection [24]. In this thesis a comprehensive study of the
correlations between component related parameters has been conducted. Figure 3-2 shows
the correlation coefficients between selected parameters. Data representing almost 10 WT
years has been analyzed and the results have been taken into account when selecting the
model inputs.

 
Figure 3-2: Correlation matrix between different SCADA-parameters 

[Correlation matrix plot "Average Correlation Between SCADA-Parameters" for turbine WTG19; color scale: average correlation coefficient R, 0.1 to 1. Parameters: (1) Generator Bearing Temp. Avg., (2) Generator Phase1 Temp. Avg., (3) Generator RPM Avg., (4) Hydraulic Oil Temp. Avg., (5) Gear Bearing Temp. Avg., (6) Gear Oil Temp. Avg., (7) Nacelle Temp. Avg., (8) Rotor RPM Avg., (9) Ambient Temp. Avg., (10) Ambient WindSpeed Avg., (11) Grid Production Power Avg.]



 
When choosing the model inputs and outputs, two main objectives were considered. 

Firstly, the performance of the normal behavior model was optimized to achieve a suffi-

cient accuracy and secondly, the model has to correctly indicate failures as well as pre-

vent false alarms during the application stage. Both conditions were evaluated in the 

validation process described in chapter 3.6. 

It was found that the choice of input parameters should not be based on statistics
only. Even though an input parameter with a high correlation to the target parameter
will probably result in a performance improvement, it can lead to problems in anomaly
detection. This is especially critical if two parameters show high correlation and similar
behavior in case of a component failure. Due to the high correlation the input parameter
is likely to get highly weighted during model training. Thus the parameter will have a
big influence on the model output and improve the model's performance significantly,
since it gives a clear indication of the target parameter. In case of a failure, however,
this results in a ‘correct’ prediction of the abnormal target parameter behavior, which is
then labeled as ‘normal’. Figure 3-3 gives an example of such a case. Hence, the turbine's
physical system relations have to be taken into account during the selection process to
avoid such model behavior.

 
Figure 3-3: Example for ‘correct’ prediction of abnormally high bearing temperature
by normal behavior model due to incorrect choice of input parameters

[Plot "Model Validation Gearbox": measured and modelled temperature in °C, vertical axis from 40 to 80]

3.3 Model Architecture

As mentioned earlier, there is no established standard method for neural network design
and thus a suitable and stable network has to be found in a trial and error process
(compare 2.3.1). After defining the input and output parameters, which in engineering
applications are often defined by the technical problem itself, a network topology has to
be determined [26]. Within this thesis, the model architecture was selected based on
findings of related projects. In [4] a NARX network with 20 hidden neurons was
successfully applied for detection of a gearbox failure (compare 2.5.1). The same
configuration was found to be sufficient in [40], where parametric studies were carried
out to find the best model architecture for modelling the power output of a turbine. This
is why this configuration was chosen for both models presented later within this thesis.
Table 3-1 sums up the selected ANN topology.

Table 3-1: ANN architecture specification for all developed models 

ANN Type              NARX
Layer                 Hidden     Output
Neurons               20         1
Activation Function   Sigmoid    Threshold

 
3.4 Data Pre-Processing 

After successful determination of model architecture and parameters, the network needs 

to be trained. To build a functioning normal behavior model, the training data presented 

to the ANN has to represent normal operating conditions of the turbine. This is especial-

ly important since the synaptic weights are decided solely based on the training data, 

without any physical understanding of the system. If a model has been trained with er-

roneous data, it might not be able to identify abnormal behavior as such and thus fail its 

purpose. 

Unfortunately, data extracted from the SCADA system is usually not ‘clean’. Malfunctions
in the SCADA communication system, sensor or signal processing errors and standstill
during maintenance and repair actions lead to missing and faulty data points, hidden in
the large data sets. Also, it cannot be guaranteed that the complete data set selected for
training does not contain any traces of minor errors during this period. To make sure
that the ANN training is not distorted by such measurements, faulty data is removed
from the training data set by applying an initial data screening and filtering process.

In general, it was found that SCADA systems from different manufacturers report the
measurements with variable reliability. Some systems reported more than 95 % of the
yearly operational data points correctly, whereas in others only around 50 % of the data
sets were complete. This also depends on the recording philosophy. Some systems keep
recording measurements when the turbine is out of operation, others do not. However,
sufficient model training was found to be possible also in cases with only half of the
training subsets available, provided that the training data set covers the whole range of
normal operation throughout the application period.



 
Since the data pre-processing is model specific, it is described in the corresponding 

model sections (compare chapter 4.1.2 and 5.1.2). 

3.5 Model Training 

Model training is a crucial factor for the successful application of ANN based normal
behavior models, since the application performance highly relies on the training data
presented to the net. The data pre-processing ensures that unhealthy data is removed
from the training sets. However, it is not guaranteed that the training data covers the
full range of normal operating conditions. This is particularly important because at
present ANNs are not good at extrapolating information beyond the training domain [26]. On
the other hand, too much training data leads to extensive training times and overfitting,
which again results in a decrease of the model's application performance. This is why it
is important to select appropriate training periods.

3.5.1 Training Period and Turbine Individual Networks 

For sufficient model training, it is very important, that the training data presented to the 

network covers the complete scope of the relevant parameters as well as their combina-

tions and patterns for healthy turbine behavior. For the turbines located in Sweden, dis-

tinctive seasonal variations of operating parameters, especially temperatures, were ob-

served (compare Figure 3-4). Consequently, training data representing the period of a 

whole year was used to train the networks, if available. 70 % out of this data is used for 

model training, 15 % for testing and 15 % for an initial validation. 
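The division into the three subsets can be sketched as follows. The sketch assumes a random assignment of the samples, which is one possible division scheme; whether the samples are divided randomly or in contiguous blocks is not specified here, and the function name and the seed are illustrative.

```python
import random

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly divide sample indices into training, testing and validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(fractions[0] * n_samples))
    n_test = int(round(fractions[1] * n_samples))
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]      # remaining samples
    return train, test, validation

train_idx, test_idx, val_idx = split_indices(1000)
```

Every sample index ends up in exactly one of the three subsets, so no data point is used both for training and for evaluating the model.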

 
Figure 3-4: Turbine specific behavior profile of gear bearing temperatures throughout 

a year [4] 

Figure 3-4 also explains why it was decided to train one individual model for each tur-

bine instead of developing one general model which can be applied to several turbines. 



 
It can be observed that the same parameter, in this case a gear bearing temperature, 

shows significantly different behavior from turbine to turbine, even though all of them 

are located in the same geographical area and are facing similar environmental condi-

tions. Therefore, individual models can approximate the selected operational parameters 

more precisely and thus are better suited for accurate normal behavior modelling. Indi-

vidual turbine behavior can be modelled with the help of ANNs by training the network 

with data from a particular turbine, resulting in a unique, turbine specific model. 

3.5.2 ANN Training 

The LMA, which is used within this thesis to train the ANNs, starts model training with 

a random initialization of synaptic weights, which are then optimized (compare chapter 

2.3.2). This means that networks which are exposed to the same training data sets get 

slightly different synaptic weights assigned during the training process. In general, this 

is not a problem, since the differences are marginal, but it is possible that the training 

process gets stuck in a local minimum, which leads to a relatively poorly performing

model. In order to prevent this, n ANNs are trained with the same input data and

the model with the best performance is subsequently chosen. This ensures that the

model which will later be applied for anomaly detection does not show particularly bad

performance. Within this thesis, the number of trainings from which the best ANN is

chosen was arbitrarily set to three. A larger number can be chosen, but at the cost of

additional computation time.
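The best-of-n selection can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this thesis: a very small network is trained with plain stochastic gradient descent instead of the LMA, the data is synthetic, and "best performance" is assumed to mean the lowest mean squared error on a held-out validation subset. All function names are hypothetical.

```python
import math
import random


def predict(model, x):
    """Output of a one-input network with a tanh hidden layer and a linear output."""
    w1, b1, w2, b2 = model
    return sum(w * math.tanh(a * x + b) for a, b, w in zip(w1, b1, w2)) + b2


def mse(model, xs, ys):
    """Mean squared error of the model on the given samples."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_once(xs, ys, seed, hidden=3, epochs=200, lr=0.05):
    """One training run from seeded random weights (plain SGD, not the LMA)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(a * x + b) for a, b in zip(w1, b1)]
            err = sum(w * hj for w, hj in zip(w2, h)) + b2 - y
            for j in range(hidden):
                grad_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
    return (w1, b1, w2, b2)


def best_of_n(train_x, train_y, val_x, val_y, n=3):
    """Train n networks on identical data and keep the one with the lowest validation error."""
    runs = []
    for seed in range(n):  # each seed gives a different random initialization
        model = train_once(train_x, train_y, seed)
        runs.append((mse(model, val_x, val_y), seed, model))
    best_error, _, best_model = min(runs, key=lambda run: run[0])
    return best_model, best_error, [run[0] for run in runs]


# Synthetic stand-in for SCADA data: a smooth target on [-1, 1].
xs = [i / 20.0 - 1.0 for i in range(41)]
ys = [math.sin(2.0 * x) for x in xs]
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

best_model, best_error, all_errors = best_of_n(train_x, train_y, val_x, val_y, n=3)
```

Increasing `n` widens the choice of candidate networks, while the computation time grows linearly with the number of trainings.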

3.5.3 Inconsistencies in ANN Training 

The random initialization of the synaptic weights at the beginning of the training pro-

cess leads to a unique ANN at the end of each training session. The best-of-three-

trainings practice, described in the previous paragraph, excludes the possibility of an 

unusually bad training result. It has been observed that different trainings lead to net-

works which model the target parameter with only small variations, if the application 

input is in the range the network has been trained for (compare top chart Figure 3-5). 

Nevertheless, the random synaptic weight initialization can cause problems in the anomaly

detection stage. Since the network has been trained with healthy turbine data only, a

malfunction can plausibly present it with data it has not seen during training, and ANNs

have weaknesses in extrapolating beyond the training domain. This can lead to model

responses which differ significantly from training to training in case of a malfunction

(compare bottom chart Figure 3-5).
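One simple way to quantify how strongly the model responses differ from training to training is the range of the modelled values at each time step. The following Python sketch assumes this measure; the function name is hypothetical and the temperature values are invented purely for illustration, they are not results from this thesis.

```python
def spread_across_trainings(predictions):
    """Per-time-step range (max - min) across the outputs of several trainings.

    predictions: one list of modelled values per trained network, all of equal length.
    """
    if not predictions or any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("need at least one training and equal-length outputs")
    return [max(step) - min(step) for step in zip(*predictions)]


# Invented bearing temperatures in degrees Celsius from three trainings (illustration only).
healthy = [[60.1, 61.0, 62.2], [60.3, 60.8, 62.0], [59.9, 61.1, 62.4]]
faulty = [[70.5, 74.0, 79.0], [66.0, 68.5, 71.0], [73.0, 78.5, 85.0]]

healthy_spread = spread_across_trainings(healthy)  # small: inputs inside the training range
faulty_spread = spread_across_trainings(faulty)    # large: networks extrapolate differently
```

A small spread indicates that the trainings agree, as expected for inputs within the training range, whereas a large spread indicates that the networks extrapolate differently.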



 
Figure 3-5:  Bearing temperature measured and modelled with different trainings for 

healthy (top) and faulty (bottom) turbine 

This behavior can lead to inconsistent results in anomal