SCADA-Data Analysis for Condition Monitoring of Wind Turbines
Master’s thesis in Energy Engineering
Simon Letzgus

Department of Energy and Environment, Division of Electric Power Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
Gothenburg, Sweden 2015

Institute of Aircraft Design, Stuttgart Wind Energy (SWE)
UNIVERSITY OF STUTTGART
Stuttgart, Germany 2015

MASTER’S THESIS
SCADA Data Analysis for Condition Monitoring of Wind Turbine Components
Master’s Thesis within the Energy Engineering program

EXAMINERS: Prof. Dr. Po-Wen Cheng, Ph.D. Jimmy Ehnberg
SUPERVISORS: Lic. Pramod Bangalore, Dipl. Ing. Kolja Müller

Abstract

Wind energy, the world’s fastest growing renewable energy technology, is developing towards a major utility source. Turbines are growing in size and are located at more remote sites, sometimes even offshore, to benefit from better wind conditions. These developments help to maximize the output per turbine but come with challenges for operation and maintenance (O&M). Unexpected failures result in longer downtimes and consequently higher revenue losses. Hence, maintenance management promises considerable cost saving potential, and the analysis of data from the turbine’s built-in supervisory control and data acquisition (SCADA) system can effectively support maintenance decisions.

This thesis investigates possibilities to utilize SCADA data for early failure detection in critical wind turbine (WT) components. To this end, a condition monitoring approach is further developed and applied. The method uses artificial neural networks to model target parameters under normal operating conditions and analyzes deviations from the measured values with the help of statistical tools, such as the Mahalanobis distance (MHD) measure.
In order to increase the robustness and accuracy of the approach, the development of several data pre-processing methods is presented. Two different anomaly detection philosophies are investigated by building two different models: a gearbox model, which monitors local variables to indicate component malfunctions, and a power model, which predicts the turbine’s power output to indicate problems from a system perspective.

Based on the available data, both monitoring approaches were applied to investigate gearbox failures in indirect drive WTs and generator bearing failures in direct drive WTs. Furthermore, the power model was found to be an effective method for ice detection on WT blades. The successful detection of gearbox anomalies long before a final component breakdown is presented. However, the model was not able to detect all gear-related problems investigated. It was concluded that the availability of parameters which are potentially affected by component malfunctions plays a decisive role in this approach. The power model application showed that a different anomaly detection approach might be better suited for the investigated cases. However, this approach is well suited for the detection of icing, and recommendations for further studies are derived.

Keywords: Artificial neural networks (ANN), condition monitoring, supervisory control and data acquisition (SCADA), failure detection, wind power, gearbox monitoring, turbine monitoring, icing detection

Zusammenfassung (Summary)

Wind energy, the fastest growing technology among the renewable energies, is gaining importance worldwide. Ever larger turbines are being erected at sites that are sometimes difficult to access, for example offshore, in order to benefit from good wind conditions and to maximize energy yields. This development, however, brings challenges for the operation and maintenance of the turbines. An intelligent, cost-minimizing maintenance strategy is therefore particularly important. The analysis of data from the SCADA system of the wind turbines can provide valuable information to support maintenance planning.

This thesis investigates possibilities for using SCADA data for early fault detection in wind turbines. A monitoring method is further developed and applied which uses neural networks to model turbine parameters under normal conditions and examines deviations from the measured values using statistical methods such as the Mahalanobis distance. The approach is applied both to the monitoring of a single component and to the monitoring of the complete turbine. Furthermore, several data pre-processing methods are presented in order to increase the accuracy and robustness of the approach.

Based on the available data, the development and application of the component-related approach focuses on the gearbox of the wind turbines. The analysis of several failure cases shows that the method can detect gearbox faults long before they result in complete gearbox damage. Within the system approach, the turbine performance is monitored. The application to turbines with faults in the generator bearings primarily shows the challenges in assessing performance deviations. Furthermore, it is shown that ice formation on the rotor blades can be detected with this approach.

Acknowledgement

This research work has been carried out with support through the Professor Dr.-Ing. Erich Müller-Stiftung. The financial support is gratefully acknowledged.

I would like to sincerely express my gratitude to my supervisor Pramod Bangalore, who has made my stay at Chalmers possible and has supported me throughout the research project. I would also like to thank both of my examiners Prof.
Po-Wen Cheng and Jimmy Ehnberg as well as my supervisor at my home institute Kolja Müller for their support and the uncomplicated arrangement of the research exchange.

Special thanks go to the employees of the industrial partner Stena Renewable, especially to Thomas Svensson and Johannes Lundvall, whose support through data and expertise in wind turbine operation has contributed substantially to the outcomes of this work.

Furthermore, I want to thank Daniel Karlsson for the vivid discussions around the common topic; Tobias Zengel, Sumit Kumar and especially Fabian Hufgard for proofreading parts of my report; and in addition all the fellow master’s thesis students for having made the time in the office a good memory.

Finally, I would like to thank Katarzyna Leszek for the warm support during these last busy weeks and my family for their loving support throughout my studies.

Stuttgart, 2015-08-11

Declaration of Originality

I hereby certify that I am the sole author of this thesis and that no part of this thesis has been published or submitted for publication.

Furthermore, I certify that, to the best of my knowledge, my thesis does not infringe upon anyone’s copyright nor violate any proprietary rights and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices.

Stuttgart, 2015-08-11

Table of Contents

Abstract
Zusammenfassung
Acknowledgement
Declaration of Originality
Table of Contents
Preface
List of Figures
List of Tables
Abbreviations
1 Introduction
1.1 Background
1.2 Task Description
1.3 WT Data and Project Partner
2 Theoretical Background
2.1 Wind Turbines and SCADA
2.1.1 The SCADA system
2.1.2 Gearbox
2.2 Reliability and Maintenance in Wind Turbines
2.2.1 Wind Turbine Reliability
2.2.2 Maintenance Management in Wind Turbines
2.2.3 Condition Monitoring in Wind Turbines
2.2.4 SCADA based CM using Normal Behavior Models
2.3 Artificial Neural Networks
2.3.1 Building blocks of the Artificial Neural Network
2.3.2 Network Training Methods
2.3.3 Application of Artificial Neural Networks in Wind Turbines
2.3.4 Neural Networks in MATLAB
2.4 Statistical Background
2.4.1 Basic Statistical Measures
2.4.2 Distributions
2.4.3 Mahalanobis Distance
2.5 Gearbox Condition Monitoring Approach
2.5.1 Gearbox Model
2.5.2 Anomaly Detection Approach
2.5.3 Anomaly Detection Application
3 Model Development
3.1 Model Development Process
3.2 Parameter selection
3.3 Model Architecture
3.4 Data Pre-Processing
3.5 Model Training
3.5.1 Training Period and Turbine Individual Networks
3.5.2 ANN Training
3.5.3 Inconsistencies in ANN Training
3.5.4 Lag and Normalization Consideration
3.6 Model Evaluation and Validation
3.6.1 Training Evaluation
3.6.2 Healthy Turbine Application
3.6.3 Faulty Turbine Application
4 Gearbox Model
4.1 Model Development and Training
4.1.1 Parameter Selection
4.1.2 Data Pre-Processing
4.2 Validation and Comparison
4.3 Model Application
4.3.1 Gearbox Study Case 1
4.3.2 Gearbox Study Case 2
4.4 Discussion
5 Power Model
5.1 Model Development and Training
5.1.1 Parameter Selection
5.1.2 Data Pre-Processing
5.1.3 Data post-processing
5.1.4 Model Training
5.2 Validation and Comparison
5.3 Model Application
5.3.1 Power Study Case Gearbox Failure
5.3.2 Power Study Case Generator Bearing Failure
5.4 Discussion
6 Closure
6.1 Summary
6.2 Discussion and Conclusions
6.3 Future Work
References

Preface

The Swedish Wind Power Technology Centre (SWPTC) is a research centre for design of wind turbines. The purpose of the Centre is to support Swedish industry with knowledge of design techniques as well as maintenance in the field of wind power. The research in the Centre is carried out in six theme groups that represent design and operation of wind turbines: Power and Control Systems, Turbine and Wind loads, Mechanical Power Transmission and System Optimisation, Structure and Foundation, Maintenance and Reliability, as well as Cold Climate.

This Master’s Thesis was performed within the main project in Theme group 5.

SWPTC’s work is funded by the Swedish Energy Agency as well as by three academic and thirteen industrial partners. The Region Västra Götaland also contributes to the Centre through several collaboration projects.

List of Figures

Figure 2-1: Cut-away view of a typical wind turbine (adopted from [9])
Figure 2-2: Measurements available in a typical SCADA system [4]
Figure 2-3: Schematic structure of a three stage planetary gearbox typically used in WTs [4]
Figure 2-4: Average number of failures per turbine and year by component and the resulting downtimes
Figure 2-5: Contribution of each component to the annual turbine downtime
Figure 2-6: ANN based CM approach [4]
Figure 2-7: The sigmoid function plotted with varying shaping parameters
Figure 2-8: Examples for different ANN architectures [4]
Figure 2-9: Normal distribution with different parameter configurations
Figure 2-10: Weibull distribution with different parameter configurations
Figure 2-11: Mahalanobis distances based on a sample (white) with its center (red)
Figure 2-12: The averaged MHD violates the threshold several days in advance to a gearbox bearing failure in [4]
Figure 3-1: Schematic flow chart of the iterative model development process
Figure 3-2: Correlation matrix between different SCADA-parameters
Figure 3-3: Example for ‘correct’ prediction of abnormally high bearing temperature by normal behavior model due to incorrect choice of input parameters
Figure 3-4: Turbine specific behavior profile of gear bearing temperatures throughout a year [4]
Figure 3-5: Bearing temperature measured and modelled with different trainings for healthy (top) and faulty (bottom) turbine
Figure 3-6: Structure of Model Training and Application
Figure 4-1: Visualization of final gearbox model parameter configuration with inputs (blue) and targets (violet)
Figure 4-2: Gear bearing temperature depending on power output and rotor rpm
Figure 4-3: Gearbox related parameter correlations averaged over more than 10 healthy turbine years
Figure 4-4: Relative performance gear bearing model based on the MSE for different model input configurations and indication of the model’s anomaly detection ability.
Figure 4-5: Visualization of the different filters applied within the gearbox model
Figure 4-6: Visualization of the General Boundary Filter
Figure 4-7: Visualization of the General Cluster Filter
Figure 4-8: Temperature overestimation after large data gaps
Figure 4-9: Visualization of the Skip Filter
Figure 4-10: Performance of different configurations for skip filter and skip parameter
Figure 4-11: Measured versus modelled temperatures for a healthy turbine
Figure 4-12: Training error histogram (100 bins) for the bearing temperature (left) and the gear oil temperature (right) model
Figure 4-14a: Gear bearing anomaly detection for healthy turbine
Figure 4-14b: Gear oil anomaly detection for healthy turbine
Figure 4-15: Modelled and measured temperatures before gearbox failure
Figure 4-16: Anomaly detection of both models before gearbox failure
Figure 4-17: Output versus measured temperatures during the period of model alarm
Figure 4-18: Rotor RPM and power input signals and SCADA alarms during model alarm
Figure 4-19: Modelled and measured temperatures before gearbox failure
Figure 4-20: Anomaly detection of both models before gearbox failure in SC02
Figure 4-21: ANN input signals and their extreme values in the training data set for the period when the model triggered alarms
Figure 5-1: Visualization of final power model parameter configuration with inputs (blue) and targets (violet)
Figure 5-2: Curtailment data points filtered from a training set.
Figure 5-3: Big deviation between model output and measured power due to averaging before and after turbine shutdown.
Figure 5-4: Measured versus modelled power output in February (left). Training data (black) and measured power (magenta), right.
Figure 5-5: Shift of power curve with seasons (left) towards less efficient power production with lower temperatures (right).
Figure 5-6: Modelled versus measured turbine power output over one day
Figure 5-7: Power model application for anomaly detection in a gearbox failure case
Figure 5-8: Modelled versus measured power for three threshold violation periods
Figure 5-9: Power curve of training and application dataset
Figure 5-10: Shifted errors during application
Figure 5-11: MHD measure for both turbines until failure occurrence
Figure 5-12: Measured values in relation to training data set and model

List of Tables

Table 2-1: Overview of CM techniques applied in WTs based on [7]
Table 2-2: Specification of the present gearbox model
Table 3-1: ANN architecture specification for all developed models
Table 4-1: Overview of filters of the gearbox model
Table 4-2: GBF-boundaries for parameters of the gearbox model
Table 4-3: Specification of parameters for clustering of data set and parameters used for filtering with MHD
Table 4-4: Model performance for healthy turbine application averaged over 20 ANNs in comparison with literature values.
Table 4-5: Results of anomaly detection for LSS-bearing failure
Table 4-6: Summary of gearbox model specifications
Table 4-7: Summary of gearbox study case 1
Table 4-8: Summary of gearbox study case two
Table 4-9: Overview over investigated gearbox study cases
Table 5-1: Overview of filters of the power model
Table 5-2: GBF-boundaries for parameters of the power model
Table 5-3: Model performance for healthy turbine application averaged over 20 ANNs in comparison with literature values.
Table 5-4: Summary of power model specifications

Abbreviations

ANN     Artificial Neural Network
CBM     Condition Based Maintenance
CDF     Cumulative Distribution Function
CM      Condition Monitoring
COE     Cost of Energy
GBF     General Boundary Filter
LMA     Levenberg-Marquardt Algorithm
MAE     Mean Absolute Error
MHD     Mahalanobis Distance
MSE     Mean Square Error
O&M     Operation and Maintenance
PDF     Probability Density Function
RMSE    Root Mean Square Error
SC      Study Case
SCADA   Supervisory Control And Data Acquisition
WT      Wind Turbine

1 Introduction

1.1 Background

Wind energy is currently the fastest growing renewable generation technology and is an important pillar for the transition to more sustainable energy systems in many countries. The global generation capacity reached 370 GW in 2014, enough to supply nearly 5 % of the world’s electricity demand [1]. In Europe, wind is the leading technology in terms of new power capacity installations, far ahead of conventional technologies. Today approximately 10 % of the European electricity consumption is generated by wind power, and this share is expected to grow further in the coming years [2]. In other words, wind power is developing towards a major utility source.

With this massive penetration, wind energy has to compete with various generation technologies, and cost of energy (COE) has become an important issue. Therefore, different developments to cut down generation cost can be observed in recent years. Turbine size is increasing steadily to maximize each turbine’s output.
In addition, the turbines are erected at sites with the best possible wind conditions, which are increasingly found in remote locations, onshore or even offshore. These trends come with new challenges in O&M. Due to difficult logistics, unexpected failures can be costly to repair and lead to long turbine downtimes, entailing production losses, which can have a significant impact on the economics of a project [3].

Hence, maintenance management promises considerable cost saving potential and has received increasing attention in recent years. Efforts have focused on early failure detection in critical components of the WT; see for example [4, 5, 6]. Condition monitoring (CM) concepts provide valuable information and can contribute significantly to increasing turbine reliability. Hence, a smart integration of CM information in the O&M strategy, resulting in so-called condition based maintenance (CBM), can help to minimize O&M costs. Among the different CM approaches, analysis of SCADA data with appropriate algorithms has shown promising results [4, 7].

The intention of this thesis is to contribute to early failure detection by analyzing data from the turbine’s SCADA system. To this end, the approach presented in [4] will be further developed and applied to critical WT components.

1.2 Task Description

The wind industry has seen rapid growth in recent years, with countries striving to have more sustainable energy sources in the electric power system. One of the obstacles for the growth of the wind industry is high maintenance cost and long downtimes for WTs, especially for offshore wind farms [8]. Hence, the focus on early detection of failures of critical components in the WT and on condition based maintenance has increased in recent times.

Traditional condition monitoring using vibration signals has proven to be a useful tool for monitoring the health of components.
Furthermore, the use of information-rich Supervisory Control and Data Acquisition (SCADA) data has received increased attention in recent years. This thesis aims to contribute to early failure detection by analyzing data from the turbine’s SCADA system.

Within the framework for a wind power maintenance management tool, a methodology based on artificial neural networks for anomaly detection in gearboxes was presented in [4]. The gearbox is a critical component of the WT in terms of reliability, and the approach is to be further developed and applied to new turbine data in study cases. Moreover, the project will analyze the potential of monitoring the overall turbine performance to detect degradation in one of the subcomponents. In particular, the detection of generator bearing failures in direct drive turbines is investigated.

1.3 WT Data and Project Partner

This master’s thesis project was carried out in cooperation with Stena Renewable as an industrial partner. Stena Renewable operates multiple wind farms in Sweden and provided data extracted from their SCADA systems. Moreover, Stena Renewable contributed to the project through their expertise in wind farm O&M. The outcome of the project relies on both the correct application of appropriate methods and the quality of the input data. Thus, the most promising data sets were carefully selected. With the analysis of the provided data, we hope to be able to contribute to the understanding of the recorded problems, as well as to an early detection of future failures.

In addition, SCADA data was provided by a WT manufacturer for different failure cases. Unfortunately, not much additional information regarding the turbine’s condition and maintenance activities was available for these data sets. However, the data has been investigated and conclusions were drawn when possible.

2 Theoretical Background

This chapter provides the theoretical background knowledge which is required to understand and critically discuss the analysis conducted within this master’s thesis. The first section gives an introduction to WTs and the relevant components, followed by sections focusing on reliability and maintenance in WTs. Furthermore, the concept of neural networks, the statistical tools used within this thesis and the approach for anomaly detection in WTs are presented. References are given when a more detailed explanation would exceed the scope of the chapter.

2.1 Wind Turbines and SCADA

WTs have long been used to utilize the kinetic energy of the wind. Nowadays mainly three-bladed horizontal axis WTs are used for power generation. The turbines consist of typical subcomponents, which are briefly described below (based on [9]):

• Rotor: Usually consists of three blades flanged to the hub, which is mounted on the front end of the rotor shaft outside the nacelle. The rotor converts the kinetic energy of the wind into mechanical energy and transmits the rotation to the shaft.
• Mechanical Drive Train: Describes all rotating mechanical components between the rotor hub and the generator. Its design can vary significantly depending on the turbine’s drive concept. Direct drive turbines are able to operate without the most complex drive train component, the gearbox, but come with special requirements for the generator. The drive philosophy also influences the shaft bearing concept.
• Electrical System: Covers all components for the conversion of mechanical into electrical energy, with the generator as the main component. Conventional synchronous and asynchronous generators can be found in WTs depending on the grid connection concept. A common configuration is a synchronous generator in combination with a converter, which decouples the generator from the grid.
• Nacelle: Protects the whole drive train and the electrical system against environmental impacts. It can be turned by the yaw system so that the rotor is always facing the main wind direction. Furthermore, the nacelle contains various auxiliary systems such as brakes, cooling system or measuring equipment to ensure safe operation.
• Tower: The whole previously described configuration is mounted on top of a tower to benefit from higher wind speeds above ground.

Figure 2-1 shows the typical arrangement of the described components.

Figure 2-1: Cut-away view of a typical wind turbine (adapted from [9])

2.1.1 The SCADA system

Contrary to conventional power plants, WTs are unmanned and often situated in remote locations. Nevertheless, a wind power plant also needs to be controlled and monitored. Therefore, the turbines are equipped with monitoring and data evaluation systems, so-called Supervisory Control and Data Acquisition (SCADA) systems. On the one hand, SCADA enables remote control of the power plant. Turbines can be switched on or off, power output can be curtailed and the power factor adjusted if necessary. On the other hand, the SCADA system collects measurements from various sensors placed all over the WT. Technical parameters, such as bearing and lubrication oil temperatures, electric quantities and power output, are measured as well as environmental parameters like wind speed, wind direction or ambient and nacelle temperature. In fact, each WT manufacturer has an individual concept of how to set up the SCADA system of their turbines. Figure 2-2 gives an overview of the basic measurements typically collected.

Figure 2-2: Measurements available in a typical SCADA system [4]

Although the systems are highly individual, all of them have in common that large quantities of data are extracted and stored in databases. Modern turbines store hundreds of data points every ten minutes, which leads to a tremendous amount of data over the years.
A complete yearly SCADA data set of one of the turbines analyzed in this thesis, for example, contained more than half a million single measurements. Extracting them from the database for analysis can be time-consuming work, depending on the user-friendliness of the interface and the available hardware.

The collected measurements give an insight into the turbine's instantaneous operating conditions and thus enable remote turbine monitoring. The SCADA system is, for instance, able to automatically generate alarms and warnings if a parameter exceeds a pre-selected threshold value. However, the information about turbine condition which is hidden in SCADA data is not fully utilized by turbine operators nowadays. This is partially due to the fact that the system indicates impending failures too late and generates a vast number of alarms and warnings, which makes it hard for operators to distinguish between serious and negligible error messages [4]. Nevertheless, information can be extracted from SCADA data using more advanced mathematical and statistical methods.

2.1.2 Gearbox

A gearbox is typically used to increase the rotational speed of a WT's rotor in order to utilize it for a higher speed electrical generator. Modern gearboxes can achieve gear ratios of more than 1:100 and lose only a few percent of the transmitted power [9]. There are two main forms of toothed-wheel gearboxes: parallel-shaft systems and the technically more advanced planetary gearing. WTs generally require multiple stage gear systems, and combined planetary-parallel systems can be found (compare Figure 2-3). The integrated planetary solution shows clear advantages in size, mass and relative cost and is thus superior in large WTs. Nevertheless, cheap parallel-shaft solutions, which are widely available from different manufacturers, are often preferred in small turbines [9].
Figure 2-3: Schematic structure of a three stage planetary gearbox typically used in WTs [4]

Like in other gearbox applications, WT gearboxes contain a gear oil system to ensure lubrication and steady temperatures of gears and bearings. Therefore, the multiple circuit system is equipped with heat exchangers for cooling at high temperatures and heating at low temperatures. It is controlled based on the gear oil temperature, which is usually measured in the oil sump and recorded by the SCADA system. Furthermore, oil purity is an important factor for the service life of a gearbox, and automated oil filtering is implemented in most gearboxes. Nevertheless, the gear oil is usually subject to regular inspections and has to be replaced during the lifetime of a gearbox [9].

Despite experience of almost two decades of WT technology, gearboxes are still a major source of turbine failures (compare 2.2.1). Due to difficult dynamic operating conditions and the high number of operating hours throughout a turbine's lifetime, gearbox dimensioning is a challenging task. Especially gearbox bearings, the gearwheels and the lubrication system are subjects of concern [8]. Unforeseen repairs or replacements of bearings, which sometimes necessitate the disassembly of the entire turbine, can be very expensive. Therefore, vibrations, temperatures and oil quality of roller bearings are normally subjected to online condition monitoring in modern turbines (compare 2.2.3) [9]. Moreover, the SCADA system usually records gearbox bearing temperatures, depending on the manufacturer's practice, the turbine generation and the requirements specified by the operator.

2.2 Reliability and Maintenance in Wind Turbines

As shown in the previous sections, WTs contain conventional components and subassemblies of mechanical-electrical energy conversion, such as shafts, bearings, gearboxes and generators.
Like other technical systems, they have to undergo regular service to guarantee their correct operation. Nevertheless, maintenance is particularly important for a wind power plant, because WTs have to withstand harsh environmental conditions where component failures can have a decisive impact on a project's economic success. The following sections provide information about the reliability of modern turbines and highlight the current state of the art in WT O&M.

2.2.1 Wind Turbine Reliability

Once a WT is commissioned it has to operate properly for a design lifetime of at least 20 years. Unlike other technical systems, the turbines operate for several thousand hours each year while being exposed to a wide range of wind speeds and temperatures, including extreme weather situations such as storms, lightning strikes and hail [9]. In fact, the site location has a significant impact on turbine reliability through the prevailing climate [10]. These rough environmental conditions result in heavy dynamic loads, making WT components prone to fatigue failures. In consequence, reliable turbine design and operation is a challenging task [9]. On a system level, reliability is often characterized by the turbine availability $A$, which is calculated by dividing the mean time to failure $\mathrm{MTTF}$ by the sum of $\mathrm{MTTF}$ and the mean down time $\mathrm{MDT}$ (compare equation 2-1).

$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MDT}} \tag{2-1}$$

Despite the rough operating conditions, the average availability of today's onshore turbines is usually above 95 % [11]. However, this high availability can only be guaranteed by a costly maintenance organization [12].

When analyzing turbine reliability in greater detail, it has been observed that some components of a WT fail more frequently than others, indicating that they are particularly sensitive. The frequency of a specific failure's occurrence is typically reported as its average failure rate in failures per turbine and year.
Therefore, the absolute number of failures $N_i$ which occurred in a specific component in each reporting interval $i$ is summed up over a certain period and then divided by the observation time $T$ in turbine years (compare equation 2-2) [13].

$$f = \frac{\sum_{i} N_i}{T} \tag{2-2}$$

However, the reliability of a turbine cannot be judged by looking at the failure frequency only, because the measure does not indicate the severity of a failure. Therefore, the average downtime per failure $\bar{d}$ caused by a specific component is calculated by summing up the individual downtimes $d_j$ and dividing them by the total number of observed failures (compare equation 2-3) [13]. The result is a measure for the average severity and production loss related to a certain component's failure.

$$\bar{d} = \frac{\sum_{j} d_j}{\sum_{i} N_i} \tag{2-3}$$

Both measures, the average failure frequency of a component and the average downtime of such a failure, are multiplied to obtain the average annual downtime $f \cdot \bar{d}$ caused by the turbine component, which indicates the severity of a failure and corresponds to the lost revenue due to a malfunction. This number is suggested as an indirect indicator for the economic damage of a failure in case no financial information is available [5].

In this thesis, the data presented in [14], covering more than 620 turbines between 1997 and 2005, as well as a database containing 28 additional WTs with more recent data were used for the analysis of turbine reliability. Together, the data represents almost 3200 years of turbine operation. All of the turbines are located in Sweden and their size ranges from several hundred kW up to multiple MW. The results are presented in Figure 2-4 in the form of the average number of failures per turbine and year, grouped by components, and their subsequent average downtimes.

Figure 2-4: Average number of failures per turbine and year by component and the resulting downtimes

The highest failure rate can be found in electrical components, the control system, including sensors, and the hydraulic system.
However, these failures can often be fixed by a simple restart of the turbine system, whereas other components cause much longer downtimes due to repair work and maintenance logistics. Breakdowns of main turbine components can lead to standstill periods of several weeks. That is why gearbox failures in particular cause long downtimes even though their average failure rate is not exceptionally high.

It has also been observed that the majority of a turbine's annual downtime is caused by failures of a few components. The failures were primarily related to gearboxes, electric systems, the blade/pitch system and the yaw system, which together account for more than 60 % of the annual turbine downtime (compare Figure 2-5). Therefore, they are identified as critical for system reliability and the economic success of a wind project.

Figure 2-5: Contribution of each component to the annual turbine downtime

Publications presenting data on WT field failures show similar results and thus draw similar conclusions regarding component reliability (compare [12, 13, 15, 16, 17] and [8]).

2.2.2 Maintenance Management in Wind Turbines

Reliability problems in WTs can lead to high costs for operators. Component degradation and failures can result in severe performance degradation, costly repair or replacement actions and long turbine downtimes. These risks can be a serious threat to the economic success of a wind project. That is why especially small and medium size WT operators outsource maintenance and are willing to pay insurance premiums to maintenance specialists, who then guarantee a certain turbine availability. However, O&M cost can account for up to 20 % of a wind project's total cost of energy (COE) and influences the measure in different ways, as can be seen in equation 2-4 [3].
$$\mathrm{COE} = \frac{\mathrm{ICC} \cdot \mathrm{FCR} + \mathrm{LRC}}{\mathrm{AEP}} + \mathrm{O\&M} \tag{2-4}$$

ICC represents the initial capital cost, usually the most important factor in the equation, which is multiplied by the fixed charge rate (FCR) and added to the levelized replacement cost (LRC), which is determined by turbine reliability. Moreover, reliability influences the COE directly through O&M costs as well as indirectly by affecting the Annual Energy Production (AEP), which can be severely reduced by failure-caused downtime. Therefore, reducing reliability related costs shows great overall cost reduction potential, and maintenance management aims to determine the optimal maintenance strategy to minimize these costs [3].

In maintenance management two main strategies can be distinguished, and the goal of intelligent maintenance management is to identify a cost optimal strategy between those two traditional approaches [7] (compare Figure 2-1).

Figure 2-1: Costs associated with traditional maintenance strategies (adapted from [7]). The figure plots total maintenance cost over the number of failures, with curves for total cost, prevention cost and repair cost, and marks corrective maintenance, preventive maintenance and the optimum reached by intelligent maintenance.

• Corrective maintenance, sometimes also called reactive maintenance, is a run-to-failure concept. Maintenance actions are initiated after failure occurrence and detection. Thus, the cost of repair is potentially high, as only minimal failure prevention efforts are made. Also, this concept can lead to long turbine downtimes in case components with a long lead time need to be replaced. However, a corrective maintenance approach allows utilizing the component lifetime to its maximum.
• Preventive maintenance, on the other hand, intends to prevent an equipment breakdown through regular scheduled maintenance or condition based maintenance (CBM) actions. CBM is a subcategory of preventive maintenance which takes additional information about the turbine components into account.
With the knowledge about the component's condition, actions can be initiated to mitigate the consequences of a failure even before failure occurrence. Therefore, it is necessary to detect the change in machinery condition in time and to be able to interpret the observed change correctly [18]. However, the reduction of repair cost which preventive maintenance aims for is partially offset by the increasing prevention efforts.

2.2.3 Condition Monitoring in Wind Turbines

For successful maintenance management, information about the turbine condition is essential. Based on that, the appropriate maintenance actions can be arranged. Traditionally, the information was acquired through manual onsite inspections. However, with the increasing number of installed turbines in remote sites, frequent inspections become more challenging and expensive. Therefore, new CM strategies are developed, combining new sensor technology with online or offline data analysis. Table 2-1 gives an overview of traditional and state-of-the-art condition monitoring approaches and their potential applications in WTs based on [7]. Furthermore, selected techniques are introduced in the following paragraphs.

Table 2-1: Overview of CM techniques applied in WTs based on [7]

Monitoring Approach     | Detectable Failures                                                   | WT Subsystem
Visual Inspection       | Cracking, Adjustment Error, Spalling, Fire                            | Rotor Blades, Tower, Nacelle
Acoustic Emission       | Defects in Rotating Elements                                          | Drive Train
Temperature Measurement | Lubrication Problems, Bearing Damages, Bad Connections                | Drive Train, Electrical System, Generator
Thermography            | Bearing Damages, Winding Damage, Broken Sensors, Electrical Problems  | Drive Train, Electrical System, Generator
Vibration Analysis      | High Vibration                                                        | Rotor, Drive Train, Tower
Oil Analysis            | Oil Leakage, Lubrication Problems, Breaking in Teeth                  | Gearbox
Strain Measurement      | Fatigue Information, Crack Information, Deterioration                 | Rotor Blades, Shaft, Tower
Power Signal Analysis   | Displacement, Eccentricity of Wheels, Rotor Asymmetries               | Drive Train, Generator

• Temperature Monitoring: A standard approach for WT CM, which can be conducted with thermometers as well as infra-red thermography. It is one of the most popular CM tools applied in WTs. As every component has a maximum operational temperature which is usually exceeded only in case of abnormally high friction, it is a reliable criterion for failure detection. Furthermore, temperatures are rather slowly changing measurements due to the thermal inertia of the components. This can be an advantage when analyzing data with a low sample rate, for example the 10 minute average values stored in a SCADA system. For temperatures this can be a sufficient resolution for condition monitoring. On the other hand, slowly changing measures have only limited value in early failure prediction because they simply indicate a failure too late. Nevertheless, temperatures are often used as a secondary criterion in case, for example, the vibration monitoring shows an alarm.
• Vibration Monitoring: One of the well-established technologies for rotating machinery is the analysis of vibration signals, since changes in mechanical equipment can lead to abnormal vibration signals long before a failure occurs. The vibration signals, recorded by different sensors, are usually transformed into the frequency domain and then analyzed. In WTs, vibration analysis is applied to monitor shafts, bearings, gearboxes and blades. Shortcomings of this technology are the requirement of additional equipment and difficulties in detecting low-frequency faults.
• Oil Analysis: Another broadly applied monitoring technique, especially in turbines with gearboxes. As shown in 2.2.1, gearboxes are especially critical in terms of reliability, and therefore gear oil analysis is commonly used for gearbox monitoring, as it is the only method for detecting cracks inside the gearbox. Usually the oil's viscosity, oxidation, water content, particles and temperature are recorded either through offline sample analysis or online monitoring. Even though modern online sensing methods, such as electromagnetic, flow or pressure-drop and optical debris sensing, are available, offline sample monitoring is often used due to the high cost of the online equipment.
• Strain and Optical Monitoring: Recently, strain measurement and optical fiber monitoring for WT structures have received increasing attention, as the fatigue loads the turbine is exposed to can be estimated. The measurements of strain gauges, which can be placed randomly on the structure, are processed with the help of the finite element method to monitor the effects of the high dynamic loads. However, strain gauges are not very long lasting and these techniques require expensive measurement equipment. New approaches try to connect available SCADA data measurements and short term strain measurements to extrapolate strain estimations. Such applications might help the technology to a broader application in the future [19].

The technologies presented in the previous paragraphs are mainly used to monitor a specific subsystem within the turbine. Other approaches widen the system boundary and aim at monitoring the global WT system. Different mechanical and electrical faults, for example, lead to disturbances in the mechanical as well as in the electrical energy flow. Consequently, mechanical torque oscillations can also be detected on the electrical side of the power train through power signal analysis. That way blade or rotor imbalances can be detected.
A comparably simple method is the monitoring of process parameters. There, the values and relationships of temperatures, power, wind and rotor speed or blade angles are compared with specifications and limits determined by manufacturers. For this kind of analysis, SCADA signals can be used, for example. More advanced approaches based on parameter prediction and trending are not common today.

However, the importance of condition monitoring is expected to further increase in the future, due to the earlier mentioned developments in the wind industry. The more mature the new techniques become, the cheaper their application gets. Also, the cost of condition monitoring can be compensated with lower premiums for insurances rewarding such systems [9]. Developing towards more reliable, cost effective, integrated and smart solutions, condition monitoring is about to become an integral part of modern maintenance strategies [7].

2.2.4 SCADA based CM using Normal Behavior Models

Today's turbines are not necessarily equipped with sensors for stress, vibration or power analysis, but with numerous units collecting data for the SCADA system (compare 2.1.1). The SCADA system collects information about the turbine key features, which can be analyzed for condition monitoring purposes. Thus, the analysis of SCADA data can be a cost effective, integrated way to monitor several critical components of a WT [5]. Different techniques, ranging from simple threshold checks to complex statistical analyses, are used to detect anomalies. A comprehensive overview of publications and their proposed methods to analyze SCADA data for CM of WTs is provided by [20].

A common approach is the application of normal behavior models. Based on inputs extracted from SCADA data, the model should be able to predict a target parameter under normal operating conditions. For anomaly detection, the real time signal is compared with the estimated model output.
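The normal behavior idea described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the thesis: a least-squares line stands in for the ANN, the data values are invented, and an anomaly is flagged when the residual deviates by more than three standard deviations of the training residuals. This simple threshold is a hypothetical choice and not the Mahalanobis distance based analysis applied in this thesis.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept (stand-in for the ANN)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(model, x, y):
    """Measured value minus model estimate for every sample."""
    slope, intercept = model
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


def detect_anomalies(model, train_x, train_y, new_x, new_y, k=3.0):
    """Flag new samples whose residual lies outside k standard deviations
    of the residuals observed on healthy training data (assumed rule)."""
    train_res = residuals(model, train_x, train_y)
    mu, sigma = mean(train_res), stdev(train_res)
    return [abs(r - mu) > k * sigma for r in residuals(model, new_x, new_y)]


# Invented 10 minute averages: power in kW -> bearing temperature in deg C
train_power = [200, 400, 600, 800, 1000, 1200, 1400, 1600]
train_temp = [41.0, 44.2, 46.9, 50.1, 53.0, 55.8, 59.1, 62.0]
model = fit_line(train_power, train_temp)

new_power = [500, 900, 1300]
new_temp = [45.6, 51.4, 66.0]  # the last sample runs hotter than expected
flags = detect_anomalies(model, train_power, train_temp, new_power, new_temp)
print(flags)
```

In this sketch only the last sample is flagged, because its measured temperature lies several degrees above the value the model estimates for that power level, while the first two samples stay within the scatter of the training residuals.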
The success of the approach is determined by the accuracy of the developed model. Here, artificial intelligence methods have proven to be a suitable tool for modelling complex systems, such as WT components [21]. Among different approaches, neural networks showed particularly good results and were successfully applied in WT fault detection [22].

Figure 2-6: ANN based CM approach [4]

However, the utilization of SCADA data for CM comes with some challenges. Since the SCADA system was not originally designed for CM, not all parameters for a full turbine CM are available. Also, the data rate of 10 minute average values is too slow for some condition monitoring techniques [7]. Moreover, it can be difficult to trace back an anomaly in the data to its origin. Therefore, it is important to understand a failure's specific impact on SCADA data. This knowledge can be achieved either through the analysis of data along with maintenance reports or with the help of data mining approaches, depending on data availability [21]. Nevertheless, exploitation of SCADA data for WT condition monitoring has successfully been demonstrated in several studies; see [4, 5, 6, 21, 23, 24 and 25].

2.3 Artificial Neural Networks

Artificial neural networks (ANNs) are a concept of computing inspired by the biological structure of the brain. In analogy, an ANN is able to acquire knowledge in a learning process. After training it can recall the learned patterns and input/output relations. Since the training data presented to the ANN can be theoretical, experimental, empirical or a combination of these, ANNs can be used for a broad range of applications [26]. Moreover, the network is able to generalize its knowledge to a certain extent and apply it to new input data it has never seen before. This makes it a powerful tool, well suited to model real world non-linear systems in engineering and science [27].
For problems which are too complex for an analytical approach, ANNs can deliver a very close approximation based on the experience drawn from the training data. However, this lack of analytical background comes with difficulties in explaining and judging the ANN's output [26]. Even though the ANN is a black box model, it was demonstrated to be a useful tool in various applications [27]. The following sections give a general introduction to the structure and functionality of ANNs based on [28].

2.3.1 Building blocks of the Artificial Neural Network

The fundamental information processing unit of an ANN is called a neuron. A neuron generates an output based on its input signals and consists of three basic elements: a set of synapses, an adder and an activation function (compare Figure 2-2).

Figure 2-2: Model of a neuron [4]

Synapses are characterized by a weight or strength, which is determined during model training. The input signal $x_j$ of neuron $k$ at synapse $j$ is multiplied by the synaptic weight $w_{kj}$. Subsequently, it is added to all other weighted input signals and a fixed bias value $b_k$ by a linear combiner (compare equation 2-5). This sum $v_k$ is the input for the activation function $\varphi$, which then determines the neuron's output $y_k$ (compare equation 2-6).

$$v_k = \sum_{j=1}^{m} w_{kj}\, x_j + b_k \tag{2-5}$$

$$y_k = \varphi(v_k) \tag{2-6}$$

There are two different types of activation functions: threshold and sigmoid functions. A threshold function is discontinuous and can assume a value of either 0 or 1, whereas a sigmoid function can assume any value between 0 and 1. Sigmoid functions are well balanced between linear and nonlinear behavior and are the most common activation functions used in neural networks. Their shape can be influenced by variation of the slope parameter $a$. Note that the sigmoid function becomes a threshold function for an infinite slope parameter $a$ (compare equation 2-7). Figure 2-7 shows the corresponding graph for different shape parameters.
$$\varphi(v) = \frac{1}{1 + e^{-a v}} \tag{2-7}$$

Figure 2-7: The sigmoid function plotted with varying shaping parameters (φ(x) over x from -1 to 1, with curves for a = 5, a = 25, a = 100 and infinite a)

Neurons can be arranged in different architectures depending on the network's purpose. A single-layer network, as the name suggests, consists of only one single layer of neurons which directly connect inputs and outputs. Multi-layer networks, on the other hand, contain one or more hidden layers. Outputs of the previous layer are used as input for the next layer. The elements of those layers, the hidden neurons, cannot be directly seen from either input or output of the network. Through hidden layers the network is able to model the higher order non-linearity in the input/output relationship.

In general, feed-forward and recurrent networks can be distinguished. In contrast to a feed-forward network, a recurrent network has at least one feedback loop. Through feedback loops, non-linear dynamic behavior can be implemented and the performance of a network can be improved significantly. Figure 2-8 shows examples of different network structures.

Figure 2-8: Examples for different ANN architectures [4]

Neural network design is a challenging task because of the lack of a well-developed theory for network optimization. An architecture which is able to predict with sufficient accuracy must be found through experimental studies for a specific case. Two approaches are common to find the optimal network structure. The first option is to start with an oversized network and remove synapses or entire neurons if they are not active or carry only little weight. Starting with a small network and increasing the number of neurons until satisfactory solutions are achieved is the second option. Both approaches involve trial and error to find the network which suits the application best. However, when modelling real world non-linear relationships, two hidden layers generally lead to sufficient results [4].
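The neuron model of this section can be written down in a few lines. The following Python sketch is only an illustration under stated assumptions: the function names, weights and input values are hypothetical, and the sigmoid is evaluated in a numerically safe form so that very large slope parameters do not cause an overflow.

```python
import math


def sigmoid(v, a=1.0):
    """Logistic activation with slope parameter a (equation 2-7)."""
    z = a * v
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp
    ez = math.exp(z)
    return ez / (1.0 + ez)


def threshold(v):
    """Threshold activation: limit of the sigmoid for an infinite slope."""
    return 1.0 if v >= 0 else 0.0


def neuron_output(inputs, weights, bias, a=1.0):
    """Weighted sum plus bias (equation 2-5) passed through the sigmoid
    activation (equation 2-6)."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(v, a)


# Hypothetical neuron with two synapses
y = neuron_output(inputs=[1.0, 0.5], weights=[1.0, 2.0], bias=-1.0)
print(y)

# A growing slope parameter moves the sigmoid towards the threshold function
for a in (5, 25, 100):
    print(a, sigmoid(0.1, a), sigmoid(-0.1, a))
```

The loop at the end mirrors the curves of Figure 2-7: for the same input, the output approaches 1 or 0 more closely as the slope parameter increases.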
2.3.2 Network Training Methods

ANNs are intelligent systems which are able to learn from their environment. Knowledge about input/output relations is acquired through a learning process and stored in the form of a network's synaptic weights. After successful training the ANN is able to use this information to interpret and predict parameters consistently with the outside world. Depending on the network's purpose, it can be trained for different tasks, such as pattern association, pattern recognition, function approximation or control purposes. There are two conceptually different learning methods for ANN training: supervised and unsupervised learning.

Supervised Learning

In supervised learning, input/output examples are presented to the network. The training data contains labeled data sets. Input parameters represent different environmental conditions and output parameters their desired network responses. A vector of input variables is presented to the network and its actual response is compared with the optimal response of the training data set. In an iterative process, the difference between actual and desired response is minimized by adjusting the synaptic weights. Through this process of error-correction learning, knowledge which was previously stored in the pre-defined training data is transferred to the network. A scheme of supervised learning is displayed in Figure 2-3.

Within supervised learning two classes of training methods are distinguished: batch and online learning. In batch learning all training data samples are presented to the network simultaneously, which is called an epoch. Multiple epochs are generated through random shuffling for feedforward networks and through splitting for recurrent networks, in order to also train the weights of the feedback synapses. Once the performance shows no further improvement, the training is finished. Through this parallel learning process, batch learning is fast and ensures convergence to a local minimum.
However, achievement of a global minimum is not guaranteed. Online learning, on the other hand, optimizes the synaptic weights sample by sample. Once all samples have been presented to the network, one epoch is completed. Here, the number of training epochs is also based on the performance improvement from epoch to epoch. Online learning is slower than batch learning but simpler to implement and more responsive to redundancies in the training data.

Figure 2-3: Scheme of supervised learning [4]

Unsupervised Learning

In case no labeled examples of the function to be learned by the network are available, unsupervised learning can be conducted. During the learning process a task-independent measure of the desired network quality is optimized using competitive learning rules to adjust the synaptic weights. Consequently, the network becomes tuned to statistical regularities of the input data.

Levenberg-Marquardt Algorithm

There are multiple algorithms available to optimize the synaptic weights during model training. Within this thesis the Levenberg-Marquardt training algorithm (LMA) was used, due to the fact that it is MATLAB's fastest and at the same time most accurate algorithm for networks of up to a few hundred weights [29]. The LMA updates the synaptic weights according to equation 2-8.

$$\Delta \mathbf{w} = -\left[\mathbf{H} + \lambda \mathbf{I}\right]^{-1} \mathbf{g} \tag{2-8}$$

The regularization parameter $\lambda$ is used to combine Newton's method (for $\lambda \to 0$) and the gradient descent method (for $\lambda \mathbf{I}$ overpowering $\mathbf{H}$) for fast convergence. $\mathbf{H}$ is the approximated Hessian matrix, $\mathbf{I}$ the identity matrix with the same dimensions and $\mathbf{g}$ the gradient vector of the cost function $E_{av}$ (compare equations 2-9 to 2-11).

$$E_{av}(\mathbf{w}) = \frac{1}{2N} \sum_{i=1}^{N} \left[ d(i) - F(\mathbf{x}(i); \mathbf{w}) \right]^2 \tag{2-9}$$

$$\mathbf{g}(\mathbf{w}) = \frac{\partial E_{av}(\mathbf{w})}{\partial \mathbf{w}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ d(i) - F(\mathbf{x}(i); \mathbf{w}) \right] \frac{\partial F(\mathbf{x}(i); \mathbf{w})}{\partial \mathbf{w}} \tag{2-10}$$

$$\mathbf{H}(\mathbf{w}) \approx \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{\partial F(\mathbf{x}(i); \mathbf{w})}{\partial \mathbf{w}} \right] \left[ \frac{\partial F(\mathbf{x}(i); \mathbf{w})}{\partial \mathbf{w}} \right]^{T} \tag{2-11}$$

$\{\mathbf{x}(i), d(i)\}_{i=1}^{N}$ is the training sample and the approximating function $F(\mathbf{x}(i); \mathbf{w})$ represents the network. For additional information about optimization algorithms for network training refer to [28].

2.3.3 Application of Artificial Neural Networks in Wind Turbines

ANNs have the ability to model very complex non-linear relations and are therefore well suited for applications in WTs.
They are mainly used to analyze the large sets of measurements from CM sensors or the SCADA system. Also, they are applied to predict or optimize the power output and to give information about turbine or component condition. Some of these approaches are highlighted in the following paragraphs.

An approach for optimizing the power factor and production of a WT was presented by [30]. A control approach based on different data mining algorithms was generated to optimize the settings of the blade pitch and yaw angle. ANNs with different configurations were tested against a classification and regression tree as well as a support vector machine regression. The ANN based model showed the best results, and it was shown that information drawn from historical SCADA data can significantly improve a turbine's power output.

A methodology analyzing SCADA data with four data mining algorithms to predict turbine failures was presented in [31]. Here, the turbine's power curve was modelled by each of the algorithms and used to determine turbine health. Failures were classified by occurrence, severity and the specific fault. The model was able to detect failures in advance, and the approach using ANNs was identified as the best. A similar team subsequently used ANNs for normal behavior modelling of bearing temperatures in WTs [32].

An intelligent system for predictive maintenance for WT monitoring was the subject of [33]. Within this framework, multilayer perceptron ANNs were used to create normal behavior models for failure detection. The knowledge captured by the networks was then combined with a fuzzy expert system for fault diagnosis and maintenance optimization for WTs. Based on this, an online health condition monitoring tool called SIMAP was developed and its application was presented for WT gearbox monitoring.

Following a similar method, an ANN based normal behavior model for gearbox and generator bearing temperatures was developed and presented in [21].
Gearbox bearing temperature and generator winding temperature were predicted and used for fault detection.

A comparative analysis of neural network and regression based condition monitoring approaches for WT fault detection is conducted in [22]. The developed models are applied to five real measured faults. The comparison between the approaches reveals that ANN based models are best suited for failure detection, because they give earlier and clearer indication of damage. Moreover, it was found that the investigated bearing failures were easier to detect than the stator anomalies. The same authors describe the development and application of a method combining ANN based normal behavior models and fuzzy logic in [23] and [34]. Such an adaptive neuro-fuzzy inference system allows the implementation of expert knowledge in addition to ANN data analysis. A large number of normal behavior models is developed using 33 standard SCADA signals. The comparison with an ANN model shows that the selected approach has advantages in model training speed and that fault diagnosis can be conducted using the fuzzy inference system.

2.3.4 Neural Networks in MATLAB

Within this thesis, the numerical computing environment MATLAB was used for data processing and the ANN based analysis. For this purpose, the WT data, which was extracted from the SCADA system in txt format, was converted into csv files and then imported into the MATLAB environment for processing and analysis. The following sections give a quick overview of the features and inbuilt functions used within this thesis.

MATLAB offers a so-called Neural Network Toolbox, which contains functions and apps for ANN modelling and application. The program provides a graphical user interface which facilitates model design and training through visualization and predefined figures. However, all implemented functions can also be called manually and modified within a MATLAB script.
The toolbox supports different supervised and unsupervised network architectures, ranging from relatively simple feedforward networks to complex dynamic or pattern recognition networks, and thus allows choosing the most suitable configuration for the specific application. Several training algorithms are implemented as well, including gradient descent methods, conjugate gradient methods and the LMA. Moreover, the toolbox features various pre- and post-processing tools [35].

Throughout the thesis the software was found to be a useful tool for data processing and neural network analysis. The wide range of implemented functions significantly facilitates the application of complex mathematical concepts. However, using these predefined functions for a complex analysis still requires a thorough understanding of the theoretical background in order to assess and judge the corresponding outcomes appropriately. The current and the following chapter should be seen in this context.

2.4 Statistical Background

Statistics helps us to understand and learn from data, with the ultimate goal of translating data into knowledge [36]. Within this thesis, large data sets are analyzed with the help of statistical tools to gain knowledge about the condition of technical components of a WT. The statistical tools applied for this purpose are introduced in the following sections.

2.4.1 Basic Statistical Measures

The following paragraphs give a short introduction to the standard statistical measures which are used in this thesis either directly or as an input for more advanced analysis. If not referenced otherwise, the explanations are based on [36].

Mean Absolute and Mean Squared Error

For model performance evaluation two measures are used in this thesis: the mean absolute error (MAE) and the mean squared error (MSE); both are commonly reported numbers in the evaluation of time series prediction [37].
The MAE is calculated as the average deviation of the predicted variable from the target value without taking the direction of the deviation into account (compare equation 2-12), and it provides a vivid indication of the model's quality. The MSE, however, is the most common performance function used to train neural networks [29] and is calculated as shown in equation 2-13. Both equations are used for model assessment, where $f_i$ represents the model's output and $y_i$ the actual target measurement for time step $i$, for a total number of $n$ time steps.

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| f_i - y_i \right| \tag{2-12} $$

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} \left( f_i - y_i \right)^2 \tag{2-13} $$

Variance and Standard Deviation

When the variability of a parameter is analyzed, it is usually reported as a deviation from the mean. The average of the squared deviations from the mean is called the variance (compare equation 2-14). Since the variance uses squared units, its square root, the standard deviation, is much easier to interpret (compare equation 2-15).

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \tag{2-14} $$

$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2} \tag{2-15} $$

In both equations $n$ represents the number of points and $\bar{x}$ is the mean of the sample $x$. Equation 2-15 shows directly that the larger the standard deviation, the higher the variance.

Covariance and Correlation

The association between variables is also of interest, especially when explanatory variables are required in modelling. The covariance and the correlation describe the strength of the linear association between two quantitative variables. The covariance can be calculated with equation 2-16. For multidimensional parameter associations the covariance matrix is a helpful tool, where the matrix element at position $(m, n)$ is the covariance $c_{m,n}$ between parameters $m$ and $n$.

$$ c_{x,y} = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right) \tag{2-16} $$

$N$ represents the number of points and $\bar{x}$ and $\bar{y}$ are the means of the samples $x$ and $y$. The indicator commonly used to assess parameter relations is the correlation coefficient $r$, which is the normalized covariance. The correlation coefficient can be calculated by equation 2-17.
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \tag{2-17} $$

Here, $n$ is the total number of elements, $s_x$ and $s_y$ are the standard deviations and $\bar{x}$ and $\bar{y}$ the means of the samples $x$ and $y$. The correlation coefficient shows the following properties:

- r is always in the range of -1 to +1, and the stronger the linear association, the closer its absolute value is to 1.
- A negative r indicates a negative and a positive r a positive association.
- r has no unit and is identical, no matter which variable is the explanatory and which the response variable.

If two signals are strongly associated but shifted relative to each other, caused by a delay for example, a simple correlation analysis might not be able to detect the relation. Therefore, the correlation between two signals is calculated while one signal is shifted step by step relative to the other. This so-called cross-correlation analysis allows correlations to be identified even if the signals are shifted, and it is widely used in signal analysis.

2.4.2 Distributions

When analyzing the outcome of a model, not only the absolute values but also the frequency of occurrence of these values can be important. A variable's probability distribution answers both questions. This information can be used to separate more frequent regular outcomes from rare irregular ones, for example by defining a threshold based on a value's frequency of occurrence. The theoretical background of the distributions used within this thesis is explained in the following sections based on [38] and [36].

The probability distribution of a variable is typically specified by a probability density function (PDF), which determines the probability that a variate takes the value x (compare equation 2-18). It is practical to normalize the PDF with the total area under the curve. Then the area under the curve above any particular interval corresponds to the interval's probability of occurrence, and the total area below the curve equals a probability of 1.
The integration of the PDF results in the cumulative distribution function (CDF) (compare equation 2-19). The CDF represents the probability that the variable takes a value less than or equal to x.

$$ \int_{a}^{b} f(x) \, dx = P(a \le X \le b) \tag{2-18} $$

$$ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt \tag{2-19} $$

A variable's distribution can be visualized with the help of histograms or by an approximated continuous distribution function. Within this thesis the normal distribution and a two-parameter Weibull distribution were used.

Normal Distribution

The normal distribution is the most important distribution in statistics, partially because many variables appear to be normally distributed by nature but mainly because of the central limit theorem. It states that, for sufficiently large samples, the sampling distribution of the mean becomes approximately normal even if the original variable is not normally distributed. The normal distribution is characterized by a symmetric, bell-shaped curve and can be described with two parameters, the mean µ and the standard deviation σ (compare equation 2-20).

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \tag{2-20} $$

Figure 2-9: Normal distribution with different parameter configurations (density over values for µ = 0 and σ = 0.3, µ = 0 and σ = 1, µ = 1 and σ = 0.5)

One of its important characteristics is that the probability of occurrence within any given number of standard deviations from the mean is identical for all normal distributions. It also describes the distribution of continuous random variables. Therefore, the error is often assumed to be normally distributed in modelling applications.

Weibull Distribution

Another widely applicable distribution is the Weibull distribution. It plays an important role in reliability analysis and is also used to describe site wind resources. The Weibull distribution is flexible: its form is determined by the shape parameter γ, the location parameter μ and the scale parameter α (compare equation 2-21) [38].
If the location parameter equals zero (μ = 0), the result is the two-parameter Weibull distribution used in this thesis (compare Figure 2-10). For particular parameter choices it also includes the extreme value distribution as well as the Rayleigh distribution as special cases [38].

$$ f(x) = \frac{\gamma}{\alpha} \left( \frac{x - \mu}{\alpha} \right)^{\gamma - 1} \exp\left( -\left( \frac{x - \mu}{\alpha} \right)^{\gamma} \right), \quad x \ge \mu; \; \gamma, \alpha > 0 \tag{2-21} $$

The CDF of the two-parameter Weibull distribution can be calculated following equation 2-22.

$$ F(x) = 1 - \exp\left( -\left( \frac{x}{\alpha} \right)^{\gamma} \right), \quad x \ge 0 \tag{2-22} $$

Within this thesis the parameters of the Weibull distribution function are estimated using the MATLAB inbuilt function wblfit, which uses the maximum likelihood method for approximation. The parameters are then inputs for the MATLAB function wblcdf, which evaluates the CDF for the fitted parameters.

Figure 2-10: Weibull distribution with different parameter configurations (density over values)

2.4.3 Mahalanobis Distance

The anomaly detection methodology applied in this thesis is based on the MHD, and a good comprehension of the measure is therefore useful (based on [39]). The MHD is a unitless, multidimensional distance. It is calculated similarly to the better-known Euclidean distance but takes the covariance of the values into account, which allows capturing the correlation between the variables (compare equation 2-23).

$$ MHD_i = \sqrt{ \left( X_i - \mu \right) C^{-1} \left( X_i - \mu \right)^T } \quad \text{for } i = 1 \text{ to } n \tag{2-23} $$

Here, $X_i$ is the i-th vector from a total of n observations, $\mu$ is the vector of its means and $C$ is the covariance matrix. The graphical interpretation of the MHD in a two-dimensional variable space shows elliptic lines representing equivalent MHDs from the sample center. The shape of the ellipses is influenced by the correlation between the variables (compare Figure 2-11).

Figure 2-11: Mahalanobis distances based on a sample (white) with its center (red)

Figure 2-11 shows relative MHDs based on a basis sample (white data points). It can be observed that the distance measure reacts much more sensitively to data points which are not 'in line' with the basis sample.
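The behavior described for the MHD can be illustrated with a small numerical sketch. It assumes the common square-root form of the distance and uses synthetic, strongly correlated data; the function name `mahalanobis_distances` and all numbers are illustrative and not taken from the thesis.

```python
import numpy as np

def mahalanobis_distances(samples, reference):
    """MHD of each row in `samples` from the centre of `reference`."""
    mean = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = samples - mean
    # Row-wise quadratic form diff_i * C^-1 * diff_i^T
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(squared)

# Synthetic basis sample with a strong positive correlation (assumed values)
rng = np.random.default_rng(0)
basis = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)

# Two points with the same Euclidean distance from the origin:
# one 'in line' with the basis sample, one perpendicular to it
points = np.array([[2.0, 2.0], [2.0, -2.0]])
d_in_line, d_off_line = mahalanobis_distances(points, basis)
print(d_in_line, d_off_line)
```

Although both points are equally far from the origin in the Euclidean sense, the point that contradicts the correlation of the basis sample receives a clearly larger MHD.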
This sensitivity to points that do not follow the pattern of the basis sample makes the MHD useful for outlier detection, where it has been applied successfully in many fields.

2.5 Gearbox Condition Monitoring Approach

Successful condition monitoring using normal behavior models consists of two main parts. Firstly, a model is required that is able to predict the target variable with high accuracy. Secondly, an approach has to be developed which is able to distinguish reliably between model inaccuracies and abnormal conditions. Among others, a promising approach for condition monitoring based on SCADA data was presented in [4] (compare 2.3.3). Because the present thesis aims to further develop and apply this approach, it is introduced in more detail in the following sections.

2.5.1 Gearbox Model

The present approach uses a NARX ANN to model the normal behavior of gearbox bearing temperatures. The ANN contains 20 neurons with sigmoid activation functions in the hidden layer and one neuron with a threshold function in the output layer. The temperature of the monitored bearing is modelled using the five input parameters displayed in Table 2-2.

Table 2-2: Specification of the present gearbox model

  ANN type               NARX
  Layer                  Hidden      Output
  Neurons                20          1
  Activation function    Sigmoid     Threshold
  Inputs                 Power; Rotor RPM; Nacelle Temperature; Gear Oil Temperature; LSS Bearing Temperature
  Output                 HSS Bearing Temperature

An automated approach for training data selection is also presented in [4] to prevent overfitting and to speed up the training process. This training data selection procedure, however, was not followed in this thesis, as overfitting did not occur and moderate training times were achieved. Moreover, a basic pre-filtering was conducted in [4], which was found to be crucial to prevent faulty network training and was therefore extended in the present work.
2.5.2 Anomaly Detection Approach

One of the challenges in the application of ANNs for condition monitoring is the appropriate judgement of the model output. When is a prediction error due to inaccurate modelling, and when does a deviation from the measured value indicate a component failure? As the ANN lacks physical understanding of the modelled component, this question has to be answered with the help of statistical tools. For this purpose, the RMSE and the Mahalanobis distance were compared in [4], where the latter was found to be the more robust and thus the more adequate measure to detect malfunctions in WT components.

To calculate the MHD during the condition monitoring stage, the SCADA measurements of the target variable $T_{CM}$ and the corresponding model errors $E_{CM}$ are combined into one data set (compare equation 2-24). Afterwards the MHD values are calculated using equation 2-25, where $\mu_{tr}$ is the mean vector (mean error and mean target value) during training and $C_{tr}$ is the covariance matrix of the healthy data during model training.

$$ X_{CM} = \left[ E_{CM}, \; T_{CM} \right] \tag{2-24} $$

$$ MHD_{CM,i} = \sqrt{ \left( X_{CM,i} - \mu_{tr} \right) C_{tr}^{-1} \left( X_{CM,i} - \mu_{tr} \right)^T } \tag{2-25} $$

Threshold Definition

To decide whether a data point reflects abnormal behavior, an appropriate threshold value has to be defined. As a prerequisite, the training data of the normal behavior model has to be free of failures and represent the healthy component condition. Under that assumption it can be concluded that errors during model training are due to inaccuracies of the ANN model. This information is taken into account when deciding the threshold value for anomaly detection. Therefore, the threshold value is calculated based on the model errors during the training stage, and data points in the monitoring stage which show a high MHD compared to the MHDs obtained during the training stage can be labeled as outliers.

The MHD values during the healthy turbine state, namely during network training, are calculated using equations 2-26 and 2-27.
$E_{tr}$ represents the model's training errors and $T_{tr}$ the SCADA measurements of the target parameter during the training period.

$$ X_{tr} = \left[ E_{tr}, \; T_{tr} \right] \tag{2-26} $$

$$ MHD_{tr,i} = \sqrt{ \left( X_{tr,i} - \mu_{tr} \right) C_{tr}^{-1} \left( X_{tr,i} - \mu_{tr} \right)^T } \tag{2-27} $$

The distribution of the MHD values during training was found to be accurately represented by a two-parameter Weibull probability distribution function (compare 2.4.2). Hence any data point during the condition monitoring stage is defined as an outlier if the probability of occurrence of its MHD in a healthy turbine is less than 1 % (compare equation 2-28) [4], where $F_{tr}$ denotes the Weibull CDF fitted to the training MHD values.

$$ 1 - F_{tr}\left( MHD_{CM,i} \right) < 0.01 \tag{2-28} $$

In addition, gearbox-related SCADA alarms were taken into account to judge the turbine condition.

2.5.3 Anomaly Detection Application

In [4] the presented approach was applied to a turbine with a gearbox bearing failure. The failure was detected several days before the vibration monitoring alarm, which led to an inspection during which the failure was discovered. For anomaly detection the MHD was averaged over three days and then compared to the calculated threshold, since the MHD reacts much more sensitively to outliers than, for example, the RMSE. The averaging ensures that the threshold is only violated in case of high MHD values over a longer period, from which it can be concluded that the health of the monitored component is seriously affected. False alarms based on model errors are thereby excluded, which increases the robustness of the approach. Figure 2-12 shows the development of the averaged MHD measure and the threshold value in a successful failure detection case presented in [4].

Figure 2-12: The averaged MHD violates the threshold several days in advance of a gearbox bearing failure in [4]

3 Model Development

Against the presented background, this thesis aims to further develop and apply the anomaly detection methodology introduced in chapter 2.5. The following chapters therefore describe the general model development process followed within this thesis and explain its subtasks. In chapters 4 and 5 the described approach is applied for CM of WTs.
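As a recap of the threshold logic from chapter 2.5 that the following chapters build on, the sketch below fits a two-parameter Weibull distribution to training MHD values, derives the 1 % threshold and compares a three-day moving average against it. It is only an illustration under stated assumptions: a simple bisection-based maximum likelihood fit stands in for MATLAB's wblfit, the data are synthetic, and the window of 432 samples assumes 10-minute SCADA averages. All function names are hypothetical.

```python
import numpy as np

def fit_weibull_2p(x, lo=1e-2, hi=50.0, iters=100):
    """Maximum likelihood fit of a two-parameter Weibull: (scale, shape)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    z = x / x.max()  # rescaling avoids overflow in z**k

    def score(k):
        # MLE condition for the shape k; its root is the estimate
        w = z ** k
        return np.sum(w * log_x) / np.sum(w) - 1.0 / k - log_x.mean()

    for _ in range(iters):  # bisection: score() increases with k
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = x.max() * np.mean(z ** shape) ** (1.0 / shape)
    return scale, shape

def mhd_threshold(training_mhd, outlier_probability=0.01):
    """MHD value exceeded with `outlier_probability` in the healthy state."""
    scale, shape = fit_weibull_2p(training_mhd)
    # Invert the Weibull CDF F(x) = 1 - exp(-(x / scale)**shape)
    return scale * (-np.log(outlier_probability)) ** (1.0 / shape)

def moving_average(values, window):
    return np.convolve(values, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(1)
training_mhd = 1.5 * rng.weibull(2.0, size=5000)   # synthetic healthy MHDs
threshold = mhd_threshold(training_mhd)

monitoring_mhd = 1.5 * rng.weibull(2.0, size=3000)
monitoring_mhd[100] = 20.0       # isolated spike, e.g. a single model error
monitoring_mhd[2000:] += 4.0     # persistent deviation, e.g. a developing fault

window = 3 * 144                 # three days of 10-minute values (assumption)
averaged = moving_average(monitoring_mhd, window)
alarm = averaged > threshold
print(threshold, alarm[:1500].any(), alarm[-1])
```

In this synthetic case the isolated spike does not lift the averaged MHD above the threshold, whereas the persistent deviation does, which mirrors the robustness argument made for the averaging step.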
3.1 Model Development Process

The process of developing an ANN based normal behavior model can be divided into multiple subtasks which together represent an iterative development process. Before a first model training the input and output parameters have to be selected according to the desired application. Moreover, a suitable ANN architecture has to be specified. Lastly, a data pre-processing approach has to be developed, to enable appropriate model training. After completing these tasks the model can be trained and the result should be verified during a testing and validation process, where the model is applied to healthy and faulty WTs.

When developing an ANN, it can be difficult to find the optimal network configuration for a specific application, since the performance depends on all the previously described factors and processes. Thus, finding a suitable ANN for an engineering application is always an iterative process, where the pre-training configurations are varied until a sufficient result is achieved (compare Figure 3-1) [26]. The following sections describe the general approaches followed by this thesis in the development of the ANN based normal behavior models.

Figure 3-1: Schematic flow chart of the iterative model development process

3.2 Parameter selection

The selection of appropriate input and output parameters is an essential part of ANN development. In a first step the target parameter has to be selected. Potential component failures should manifest themselves in the chosen measurement, to enable failure detection. This shows the importance of target parameter selection for successful anomaly detection. In many cases there is only little choice because of the limited availability of measurements addressing the malfunction. In fact, the applicability of the approach often depends on the availability of potential target measurements.

The selection of input parameters, on the other hand, is more complex.
Relevant input parameters have to be chosen such that the model is able to predict the target parameter under normal operating conditions with sufficient accuracy. This ensures a detectable deviation between the model output and the actual parameter measurement during a malfunction in the corresponding component. In contrast to the target parameter selection, there is usually a large number of potential input measurements to choose from. Here, the physical relations between the turbine components, which result in correlations between the corresponding parameters, play a key role. However, only few works have considered correlations between parameters of the SCADA system at the stage of parameter selection [24]. In this thesis a comprehensive study of the correlations between component-related parameters has been conducted. Figure 3-2 shows the correlation coefficients between selected parameters. Data representing almost 10 WT years has been analyzed and the results have been taken into account when selecting the model inputs.

Figure 3-2: Correlation matrix between different SCADA parameters (average correlation coefficient R between the following parameters of turbine WTG19: (1) Generator Bearing Temp. Avg., (2) Generator Phase1 Temp. Avg., (3) Generator RPM Avg., (4) Hydraulic Oil Temp. Avg., (5) Gear Bearing Temp. Avg., (6) Gear Oil Temp. Avg., (7) Nacelle Temp. Avg., (8) Rotor RPM Avg., (9) Ambient Temp. Avg., (10) Ambient WindSpeed Avg., (11) Grid Production Power Avg.)

When choosing the model inputs and outputs, two main objectives were considered. Firstly, the performance of the normal behavior model was optimized to achieve sufficient accuracy, and secondly, the model has to indicate failures correctly as well as prevent false alarms during the application stage.
Both conditions were evaluated in the validation process described in chapter 3.6. It was found that the choice of input parameters should not be based on statistics only. Even though an input parameter with a high correlation to the target parameter will probably result in a performance improvement, it can lead to problems in anomaly detection. This is especially critical if two parameters show high correlation and similar behavior in case of a component failure. Due to the high correlation the input parameter is likely to be weighted highly during model training. Thus the parameter will have a big influence on the model output and improve the model's performance significantly, since it gives a clear indication of the target parameter. In case of a failure, however, this results in a 'correct' prediction of the abnormal target parameter behavior, which is then labelled as 'normal'. Figure 3-3 gives an example of such a case. Hence, the turbine's physical system relations have to be taken into account during the selection process to avoid such model behavior.

Figure 3-3: Example for 'correct' prediction of abnormally high bearing temperature by normal behavior model due to incorrect choice of input parameters

3.3 Model Architecture

As mentioned earlier, there is no established standard method for neural network design, and thus a suitable and stable network has to be found in a trial and error process (compare 2.3.1). After defining the input and output parameters, which in engineering applications are often defined by the technical problem itself, a network topology has to be determined [26]. Within this thesis, the model architecture was selected based on findings of related projects. In [4] a NARX network with 20 hidden neurons was successfully applied for detection of a gearbox failure (compare 2.5.1).
The same configuration was found to be sufficient in [40], where parametric studies were carried out to find the best model architecture for modelling the power output of a turbine. This is why this configuration was chosen for both models presented later within this thesis. Table 3-1 sums up the selected ANN topology.

Table 3-1: ANN architecture specification for all developed models

  ANN type               NARX
  Layer                  Hidden      Output
  Neurons                20          1
  Activation function    Sigmoid     Threshold

3.4 Data Pre-Processing

After the model architecture and parameters have been determined, the network needs to be trained. To build a functioning normal behavior model, the training data presented to the ANN has to represent normal operating conditions of the turbine. This is especially important since the synaptic weights are determined solely based on the training data, without any physical understanding of the system. If a model has been trained with erroneous data, it might not be able to identify abnormal behavior as such and thus fail its purpose.

Unfortunately, data extracted from the SCADA system is usually not 'clean'. Malfunctions in the SCADA communication system, sensor or signal processing errors and standstill during maintenance and repair actions lead to missing and faulty data points, hidden in the large data sets. It also cannot be guaranteed that the complete data set selected for training is free of traces of minor errors during this period. To make sure that the ANN training is not distorted by such measurements, faulty data is removed from the training data set by applying an initial data screening and filtering process.

In general, it was found that SCADA systems from different manufacturers report the measurements with variable reliability. Some systems reported more than 95 % of the yearly operational data points correctly, whereas in others only around 50 % of the data sets were complete. This also depends on the recording philosophy.
Some systems keep recording measurements when the turbine is out of operation, others do not. However, sufficient model training was found to be possible even in cases with only half of the training subsets available, provided that the training data set covers the whole range of normal operation throughout the application period.

Since the data pre-processing is model specific, it is described in the corresponding model sections (compare chapters 4.1.2 and 5.1.2).

3.5 Model Training

Model training is a crucial factor for the successful application of ANN based normal behavior models, since the application performance relies heavily on the training data presented to the net. The data pre-processing ensures that unhealthy data is removed from the training sets. However, it is not guaranteed that the training data covers the full range of normal operating conditions. This is particularly important because at present ANNs are not good at extrapolating information beyond the training domain [26]. On the other hand, too much training data leads to extensive training times and overfitting, which again results in a decrease of the model's application performance. This is why it is important to select appropriate training periods.

3.5.1 Training Period and Turbine Individual Networks

For sufficient model training, it is very important that the training data presented to the network covers the complete scope of the relevant parameters as well as their combinations and patterns for healthy turbine behavior. For the turbines located in Sweden, distinctive seasonal variations of operating parameters, especially temperatures, were observed (compare Figure 3-4). Consequently, training data representing the period of a whole year was used to train the networks, if available. 70 % of this data is used for model training, 15 % for testing and 15 % for an initial validation.
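The 70/15/15 division of one year of data can be sketched as follows. The sketch assumes a random assignment of samples to the three subsets and 10-minute values (52,560 samples per year); neither assumption is stated in the text, and the function name `split_indices` is hypothetical.

```python
import random

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly divide sample indices into training, test and validation sets."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)   # assumption: random division
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    validation = order[n_train + n_test:]  # remainder, about 15 %
    return train, test, validation

# One year of 10-minute values: 365 days * 144 samples per day (assumption)
train, test, validation = split_indices(365 * 144)
print(len(train), len(test), len(validation))
```

Every sample ends up in exactly one subset, so the test and validation data remain unseen during the weight optimization.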
Figure 3-4: Turbine specific behavior profile of gear bearing temperatures throughout a year [4]

Figure 3-4 also explains why it was decided to train one individual model for each turbine instead of developing one general model which can be applied to several turbines. It can be observed that the same parameter, in this case a gear bearing temperature, shows significantly different behavior from turbine to turbine, even though all of them are located in the same geographical area and face similar environmental conditions. Therefore, individual models can approximate the selected operational parameters more precisely and are thus better suited for accurate normal behavior modelling. Individual turbine behavior can be modelled with the help of ANNs by training the network with data from a particular turbine, resulting in a unique, turbine specific model.

3.5.2 ANN Training

The LMA, which is used within this thesis to train the ANNs, starts model training with a random initialization of the synaptic weights, which are then optimized (compare chapter 2.3.2). This means that networks which are exposed to the same training data sets are assigned slightly different synaptic weights during the training process. In general, this is not a problem, since the differences are marginal, but it is possible that the training process gets stuck in a local minimum, which leads to a relatively poorly performing model. In order to prevent this, n ANNs are trained with the same input data and the model with the best performance is subsequently chosen. This ensures that the model which will later be applied for anomaly detection does not show particularly bad performance. Within this thesis, the number of trainings from which the best ANN is chosen was arbitrarily set to three. A larger number can be chosen, but at the cost of computation time.
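The best-of-n practice can be illustrated on a toy problem. The sketch below does not train an ANN; instead, plain gradient descent on a one-dimensional cost function with two minima stands in for the training process, so that the effect of the random initialization becomes visible. The cost function, step size and all names are assumptions made for illustration.

```python
import random

def cost(w):
    """Toy cost with a global minimum near w = -1.30 and a local one near w = 1.13."""
    return w**4 - 3.0 * w**2 + w

def train_once(rng, steps=500, learning_rate=0.01):
    w = rng.uniform(-2.0, 2.0)            # random weight initialization
    for _ in range(steps):
        gradient = 4.0 * w**3 - 6.0 * w + 1.0
        w -= learning_rate * gradient     # plain gradient descent step
    return w

def best_of_n(n_trainings=3, seed=0):
    """Run several trainings and keep the result with the lowest cost."""
    rng = random.Random(seed)
    candidates = [train_once(rng) for _ in range(n_trainings)]
    return min(candidates, key=cost), candidates

best, candidates = best_of_n()
print(best, [round(c, 3) for c in candidates])
```

Each run ends in one of the two minima depending on its starting point; keeping the best of several runs lowers the risk of ending up with the poorer one, at the price of additional computation.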
3.5.3 Inconsistencies in ANN Training

The random initialization of the synaptic weights at the beginning of the training process leads to a unique ANN at the end of each training session. The best-of-three-trainings practice described in the previous section excludes the possibility of an unusually bad training result. It has been observed that different trainings lead to networks which model the target parameter with only small variations, if the application input is in the range the network has been trained for (compare top chart of Figure 3-5). Nevertheless, the random synaptic weight initialization can cause problems in the anomaly detection stage. If data is presented to the network which it has not seen during training, the ANNs might have weaknesses in extrapolating beyond the training domain. This is a plausible scenario in case of a malfunction, since the network has been trained with healthy turbine data only. It can lead to model responses which differ significantly from training to training in case of a malfunction (compare bottom chart of Figure 3-5).

Figure 3-5: Bearing temperature measured and modelled with different trainings for healthy (top) and faulty (bottom) turbine

This behavior can lead to inconsistent results in anomal