Recognizing safety-critical events from naturalistic driving data
Master’s Thesis in the Master’s programme of Automotive Engineering
NIEVES PAÑEDA GONZÁLEZ
Department of Applied Mechanics
Division of Vehicle Safety
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden 2011
Master’s thesis 2011:38

© NIEVES PAÑEDA GONZÁLEZ, 2011
Master’s Thesis 2011:38
ISSN 1652-8557
Department of Applied Mechanics
Division of Vehicle Safety
Chalmers University of Technology
SE-412 96 Göteborg, Sweden
Telephone: + 46 (0)31-772 1000
Cover: Curso de amaxofobia para profesores de autoescuela en Córdoba, 2011, Nicole Kidman, Available at: [Accessed 20 May 2011].
Chalmers Reproservice / Department of Applied Mechanics
Göteborg, Sweden 2011

ABSTRACT
New trends in research on traffic accidents involve conducting Naturalistic Driving Studies (NDS). NDS are based on large-scale data collection of driver, vehicle and environment information in real-traffic. NDS provide large data sets which have proven to be extremely valuable for the analysis of safety-critical events such as near crashes and incidents. NDS data needs to be filtered to recognize safety-critical events. Filtering safety-critical events has been traditionally achieved by using kinematics triggers (e.g.
searching for deceleration below a certain threshold signifying harsh braking). The low sensitivity and specificity of this filtering procedure, however, require manual annotation of video data to decide whether the events identified by the triggers are actually safety-critical. This reviewing procedure is based on subjective decisions, time-consuming, and often tedious for the analysts. This project looked into improving this reviewing procedure using video data collected from 100 Volvo cars during one year in Gothenburg within an NDS called euroFOT. More than 400 videos from the triggered events were reviewed, leading to the conclusion that the driver’s reaction may be the key to discriminating safety-critical events. In fact, whether an event is safety-critical or not depends on the driver. Several statistical procedures were then applied to automatically recognize driver reaction from video data. In this project, we showed how combining automated video analysis with kinematic triggers increases the sensitivity of near-crash recognition from NDS data. These results open up new ways to use video frames in NDS.

Key words: naturalistic driving, driver behavior, traffic safety, near crashes, safety-critical events, driver’s reaction, euroFOT

Contents
INTRODUCTION
Naturalistic Field Operational Tests: real-traffic data
State of the art of N-FOTs: EuroFOT
Available data from VCC (euroFOT)
Data reduction approach: triggering data
What is safety-critical? Driver behaviour in NDS
Purpose
METHODS
Driver’s reaction recognition. General assumptions
Definition of training sample
Recognition of driver’s reaction. General structure
Data description & Image pre-processing
Recognition of driver’s reaction in sequences
Silhouette detection in STD of Jerk images
Evaluation criteria. Data set definition
RESULTS
Performance in the training sample
Optical Flow
Mean criterion
Harmonic mean
Mean&General mask
GLCM properties
Results in the validation data set
Mean criterion
Harmonic mean
Ranges of jerk from OF
GLCM properties
Analysis of false negatives and positives
Mean criterion in motion’s detection
Comparison
DISCUSSION & CONCLUSIONS
Where did the idea of recognizing driver’s reaction come from? Triggering in euroFOT based on the 100-Car study algorithms
Recognizing drivers’ reaction as potential trigger
Final conclusions
REFERENCES
APPENDIX 1
APPENDIX 2
APPENDIX 3
APPENDIX 4

CHALMERS, Applied Mechanics, Master’s Thesis 2011:38

Preface
This project concludes the academic education that I have pursued in Spain over the last years. “Recognizing safety-critical events from naturalistic driving data” has given me the opportunity to learn about traffic safety in the multicultural environment of an open area at SAFER. Personally and professionally, I will never forget this experience in Sweden. There are many thanks that I would like to share:
Thanks to those who make the naturalistic driving studies possible. In particular, thanks to Volvo Cars for allowing me access to their database in this research.
Thanks to all the participants who have been recorded while driving, for their collaboration in gaining knowledge about driver behaviour. Without them this project would not have been possible.
Thanks to SAFER, where I worked during the last months. It was a pleasure to be part of this family and of the incredible work this group does to save lives.
It is said that a good teacher teaches, and the best teacher inspires. To my supervisor Marco Dozza, thanks for inspiring me during this project. Thanks for this opportunity, for trusting me from the beginning and for your guidance during these months.
I feel very lucky to have worked not only with a great professional, but also with a great person.
Thanks to the University of Oviedo for letting me participate in this international exchange. In particular, thanks to my supervisor in Spain, Ramón Rubio.
To my friends and to everyone with whom I have shared this experience, thanks for making this year in Göteborg unforgettable.
To my family, muchas gracias for your unconditional support in my life plan.

Göteborg, June 2011
Nieves Pañeda González

1 Introduction
This chapter presents the reader with an overview of Naturalistic Driving Studies (NDS) and their implementation together with Field Operational Tests (FOTs). In particular, the euroFOT project is introduced as the basis of this project. This chapter also covers the limitations found in previous studies and formulates the research question and objectives for the present project.

1.1 Naturalistic Field Operational Tests: real-traffic data
Statistics show that more than 1.2 million people die on the roads in traffic accidents every year (WHO, 2009). Technological advances allow the development of new in-car systems that mitigate road accidents by automatically detecting risk situations. To make this possible, it is essential to know the real causes of accidents. New trends in research on traffic accidents involve conducting Naturalistic Driving Studies.
A Naturalistic Driving Study (NDS) refers to a “method of observation that captures driver behaviour in a way that does not interfere with the various influences that govern those behaviours” (Boyle et al., 2009). Statistics and crash investigations rarely provide information about behavioural issues before the incident. In simulations, test subjects are well aware of the experimental conditions. Thus, NDS aim to collect data on driver behaviour in a natural setting.
In these naturalistic observations, drivers preferably use their own cars, equipped with cameras, during their daily driving. Experience in this field shows that drivers quickly forget the presence of cameras.
In addition, new technologies enable the collection of a large amount of data, such as vehicle dynamics or the environment, in real traffic within large-scale testing programmes called Field Operational Tests (FOTs). FOTs are studies undertaken to evaluate the efficiency of intelligent in-vehicle systems as well as their impact on safety and driver acceptance, among others (ERTICO, 2009). The main purpose of these systems is to assist and inform drivers while driving. Applied to the field of safety, this means alerting the driver or automatically acting on the car in the presence of what the system understands to be a risky situation. To sum up, FOTs are a complementary step in the development of intelligent in-vehicle systems. The procedure is mainly based on:
-Instrumenting cars with loggers to collect information from the CAN bus (signals from accelerometers, gyroscopes, turn indicators, etc.), GPS and/or extra sensors.
-Driving the equipped cars to collect data.
-Performing analysis of the collected data.
FOTs and NDS have traditionally pursued different objectives, but this view is changing. Their combination, called a Naturalistic Field Operational Test (N-FOT), allows the unobtrusive observation of drivers to be used to evaluate their relationship with the car and the environment under crash risk, as well as the effectiveness of intelligent in-vehicle systems.

1.1.1 State of the art of N-FOTs: EuroFOT
In recent years, FOTs and N-FOTs have been conducted in the United States, Asia and, more recently, in Europe. In particular, the US has extensive experience in NDS, with programs such as the 100-Car study, the 250-Truck study, the Commercial Vehicle Operation study and the Strategic Highway Research Programme (SHRP2).
The 100-Car Naturalistic Driving Study (Dingus et al., 2006) was the first large-scale program in which data from 100 drivers were collected during one year. The main goal of this research project was the study of contributing and associative factors (such as driver behavior, kinematic characteristics and corrective actions) in critical situations. In the ongoing SHRP2 project (TRB, 2011), data from 3000 volunteer drivers in instrumented cars will be collected. Its main goals are to redesign highways (congestion reduction, planning, environmental conditions) and to study human behavior for a safer highway.
Within the European experience in this field, the contributions of SAFER, the Vehicle and Traffic Safety Centre at Chalmers University in Sweden, stand out. Programs such as SeMiFOT (Victor et al., 2010), in collaboration with Michigan, developed an N-FOT methodology. Data were collected from 14 vehicles during six months, with the participation of 39 drivers who made 12,571 trips. The methodology is widely used in accident research and in the evaluation of safety and acceptance. The ongoing second version, SeMiFOT2, is using the data collected in the first version of the program. New statistical methods, such as extreme value theory, are being explored to identify and model outliers. This provides useful information for insurance companies, for instance, to establish a link between rare events and catastrophic consequences (García, 2004). In addition, the analysis of visual motion in drivers is one of the main lines of research.
Other ongoing European projects are TeleFOT, 2BeSafe NDS, INTERACTION, TSSFOT, simTD and euroFOT (ERTICO, 2010). In particular, this research has accessed the data collected in euroFOT. The characteristics of this program are explained further below.
Co-funded by the European Commission, euroFOT began in May 2008 and will last until February 2012, supported by 28 partners (vehicle manufacturers, automotive suppliers and research institutes, among others). As stated in the previous section, intelligent in-vehicle systems are tested to explore potential ways to improve European road traffic. The tested applications in euroFOT may be classified as (ERTICO, 2010):
• Assisting the driver in forward/rear directional safety:
- Adaptive cruise control
- Forward collision warning
- Speed Control System
• Assisting the driver to detect hazards at the sides of the car:
- Blind Spot Information System
- Lane departure warning / Lane Assist / Impairment Warning
• Advanced applications:
- Curve Speed Warning
- Fuel Efficiency Adviser
- Safe Human/Machine Interface
These functions have been tested in a fleet of 1000 instrumented cars from nine different brands across France, Germany, Italy and Sweden. This has led to one of the largest and most complete FOT databases in Europe for public research. As can be seen in Figure 1.1, FOTs are operated on fleets managed by different OEMs around Europe.
Figure 1.1 Geographical coverage of euroFOT: OEMs and operation sites. (Mure S., 2010, EuroFOT [electronic print] Available at: [Accessed May 2011]).
Depending on the project and the OEM, various devices are part of the test equipment used to collect data. These may be classified according to the source of the recorded signals:
-CAN bus.
-CAN bus and video cameras.
-CAN bus, video cameras and extra sensors (such as an eye tracker).
In addition to the test and evaluation of intelligent in-vehicle systems, some research focuses on naturalistic observation, hence the installation of cameras in the cars. In any case, the resources for data collection and storage are common to both types of projects. Another type of driver data comes from interviews and questionnaires.
Both the kinematics of the car from loggers and the camera images have proved very useful when studying the interaction between driver, vehicle and environment during a crash-risk situation. Knowledge of driver behaviour and of the dynamics of the car before an accident allows possible causes to be hypothesised. This is a step towards the inclusion of new measures in accident prevention.

1.1.2 Available data from VCC (euroFOT)
In particular, this research has accessed the data collected from 100 Volvo cars driven for a year in Gothenburg within the euroFOT program. After a certain period of continuous data collection, information from the loggers was downloaded and transferred to a network. These signals were then post-processed and stored in MatLab variables. The available data are mostly signals from the CAN bus sampled at 10 Hz, GPS information, video images and signals from the eye tracker. These provide information on, for example, kinematic values (such as speed, lateral and longitudinal acceleration, brake pressure, yaw rate and steering wheel jerk, among others) or signals from intelligent in-vehicle systems and turn indicators.
A total of four cameras are installed in each of the instrumented cars. Two are located at the front and back of the car, mainly to reconstruct rear-end crashes and evaluate the traffic flow. One is located under the steering wheel to record the pedals and the movements of the feet. Finally, another camera is located in the rear-view mirror, pointing at the driver. Eye-tracking data are also available.

1.2 Data reduction approach: triggering data
To understand the causes of road accidents and to be able to develop countermeasures, it is essential to analyze safety-critical situations. Identifying safety-critical situations among hours of normal driving is a challenge when loggers and cameras are recording continuously.
Therefore, once data are collected, a filtering process is carried out before performing the analysis (see Figure 1.2). This process is commonly called triggering the data. The main goal of this data reduction approach is to discriminate between normal driving situations (negative situations) and critical events (positive situations) while driving.
Figure 1.2 General steps before the evaluation of safety in FOTs. [Flow chart steps: Instrumenting cars; Driving: data collection; Triggering data; Analysis; Evaluating the impacts on safety. Labels in the triggering step: Thresholds, Positives, Negatives, False triggered, True triggered, Crash Relevant Events.]
A more precise definition of what those critical situations are is given in the first large-scale FOT conducted in the US, the 100-Car study. The distinction is made as follows (Dingus et al., 2006):
-Crash: situations in which there is physical contact between the subject vehicle and another vehicle, fixed object, pedestrian, cyclist or animal.
-Near-Crash: situations requiring a rapid, severe, evasive maneuver to avoid a crash.
-Incident: situations requiring an evasive maneuver of lesser magnitude than a near-crash.
These safety-critical situations are grouped under the name Crash Relevant Events (CREs). Once they are located in the database, the next steps are to produce a detailed description (of the driver behavior, the environment, traffic conditions, etc.), draw conclusions and evaluate possible solutions.
Conventionally, CREs from naturalistic driving data have been isolated from the large database using kinematic triggers. These are pieces of code that run through the database and record situations with certain kinematic values. Most of these triggers are associated with common evasive maneuvers and acceleration peaks.
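As a rough illustration of what such a kinematic trigger might look like, the sketch below scans a longitudinal acceleration signal and records the samples at which it first drops below a threshold. It is written in Python rather than in the MatLab used for the thesis scripts. The function names, the merging of consecutive samples into one event and the default values are assumptions made for illustration (the -4 m/s² threshold and the 10 Hz sampling rate are borrowed from examples given elsewhere in this thesis). It is not the trigger implementation used in euroFOT or in the 100-Car study.

```python
from typing import List, Sequence


def deceleration_trigger(
    accel_mps2: Sequence[float],
    threshold_mps2: float = -4.0,  # illustrative threshold (assumption)
) -> List[int]:
    """Return the sample indices where acceleration first drops below the threshold.

    Consecutive samples below the threshold are merged into a single
    triggered event, which is an assumption of this sketch.
    """
    triggered: List[int] = []
    was_below = False
    for index, value in enumerate(accel_mps2):
        is_below = value < threshold_mps2
        if is_below and not was_below:
            triggered.append(index)
        was_below = is_below
    return triggered


def index_to_seconds(index: int, sample_rate_hz: float = 10.0) -> float:
    """Convert a sample index into a time stamp (10 Hz assumed by default)."""
    return index / sample_rate_hz


# Made-up signal in m/s^2 containing two separate harsh-braking episodes.
signal = [0.1, -1.2, -4.5, -5.1, -2.0, 0.0, -4.2, -0.5]
events = deceleration_trigger(signal)
```

Each index in `events` marks a candidate only; whether it corresponds to a real CRE still has to be decided afterwards.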
For example, one of the most typical driver responses is to slam on the brakes to avoid a rear-end collision, which leads to peaks in longitudinal acceleration. Therefore, situations in which the longitudinal acceleration falls below a certain threshold[1] may indicate a CRE. In that case, the recorded situations are said to be triggered and constitute a list of CRE candidates.
However, triggering with kinematic values misses some CREs (positives that have not been triggered, usually called false negatives) and wrongly triggers many normal driving situations (false positives). This is mainly because some cutoff kinematic values related to evasive maneuvers may be identical to those obtained during normal driving, owing to the diversity of drivers and driving styles. For instance, the same acceleration value may or may not be indicative of risk depending on the aggressiveness of the driver and his/her driving experience. If signals such as braking are taken as the reference, incidents in which the driver is distracted are lost. Hence the importance of a precise definition of what a CRE is and of the development of intelligent triggers.
Among all the possible types of CRE, crashes may be the most likely to be detected, because contact is likely to cause sudden changes in kinematic parameters. Near-crashes and incidents, however, are closer to normal actions while driving. Thus, trying to locate these situations, which are also relevant from a safety and statistical point of view, results in a high rate of wrongly triggered normal driving situations.
The low sensitivity and specificity of triggering with kinematic values require the intervention of reviewers, who decide whether the situation is critical by watching the video segments of the CRE candidates. Therefore, only the triggered events that the annotators have considered positive pass into the analysis phase.
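The notions of sensitivity and specificity used above can be made concrete with a small calculation. The following Python sketch counts the four possible outcomes of a trigger against the reviewers' annotations and derives both measures from them. The function name and the example labels are made up for illustration and are not results from euroFOT.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    triggered: Sequence[bool], critical: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) of a trigger against annotations.

    triggered[i] is True when the trigger fired on situation i;
    critical[i] is True when reviewers judged situation i to be a CRE.
    """
    if len(triggered) != len(critical):
        raise ValueError("both sequences must have the same length")
    pairs = list(zip(triggered, critical))
    tp = sum(1 for t, c in pairs if t and c)          # CREs that were triggered
    fn = sum(1 for t, c in pairs if not t and c)      # CREs that were missed
    fp = sum(1 for t, c in pairs if t and not c)      # normal driving, triggered
    tn = sum(1 for t, c in pairs if not t and not c)  # normal driving, not triggered
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up example: six situations, two of which are real CREs.
fired = [True, True, False, True, False, False]
is_cre = [True, False, True, False, False, False]
sens, spec = sensitivity_specificity(fired, is_cre)
```

In this made-up example one of the two CREs is missed and two of the four normal situations are wrongly triggered, so both measures equal 0.5.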
This reviewing procedure is mostly based on subjective decisions, and is time-consuming and often tedious for the annotators.
[1] Values taken as references for each trigger to save results (if decelerations below -4 m/s² are kept, then acceleration is the trigger and -4 is the threshold).

1.3 What is safety-critical? Driver behaviour in NDS
The 100-Car study defines CRE as: “A subjective judgment of any circumstance that requires, but is not limited to, a crash avoidance response on the part of the subject-vehicle driver, any other vehicle, pedestrian, cyclist, or animal that is less severe than a rapid evasive maneuver (as defined in near-crash event), but greater in severity than a normal maneuver to avoid a crash(...)” (Klauer et al., 2006)
When annotators review the list of CRE candidates from the triggering process, their subjective judgment is based primarily on their perception of how critical the situation seems. This is consistent with the above definition, since annotators should evaluate whether the circumstance requires a crash avoidance response from the driver or from others involved. Given the difficulty of answering this question by just checking the kinematic values of the car or its proximity to other vehicles (objective judgment), each annotator mostly bases his/her opinion on his/her own driving experience. This raises a question: is what I think is critical also critical for you? It may be that the fairest answer requires some empathy with the subject-vehicle driver. This changes the question into: does the driver think that the situation is safety-critical? The answers to this question in previous studies were based, for instance, on the force with which the driver depresses the brake pedal[2] or on changes in speech under threatening conditions (Malta et al., 2009).
This is also related to the fact that around 60% of drivers brake before a crash (Molinero et al., 2009). The main limitation arises in those critical situations that are closer to normal driving in kinematic terms, such as near-crashes and incidents. These provide a large source of information and a definite benefit in safety and statistical analysis concerning NDS (Guo et al., 2010).
There is extensive literature on how driving is affected by factors such as country, gender, age or lifestyle, among others (Evans, 2004). These factors imply a diversity of driving styles, hence the importance of including the driver in the analysis. This conclusion was also pointed out in the 100-Car study (Klauer et al., 2006).
The analysis of driver behaviour in NDS has been used, for instance, in the development of a model based on multi-modal signals (Takeda, 2010), or in the study of situations in which drivers approach intersections. In the latter case, a relationship has been found between the distance to other vehicles and the location at which drivers cover the brake pedal (Sato and Akamatsu, 2007). Head and eye movements are also studied in connection with distraction at the wheel (Nagase et al., 2009).
Regarding driver behaviour prior to a CRE, Molinero et al. (2009) define key events in situations with failed or absent manoeuvres. These include excessive speed and inappropriate reaction, which they relate to driver panic. This concept is present in the so-called oops reactions in SeMiFOT, used in the study of driver inattention associated with poor driving performance (Victor et al., 2010). They also highlight the importance of optimizing the CRE triggers.
[2] The brake pressure signal in combination with speed is a potential trigger detected while triggering an initial euroFOT dataset (see Appendix 1).
The main limitations in the identification of CREs in a large data set are the variety of drivers and the wide range of situations. A procedure based on what the driver is expected to do, such as evasive maneuvers, misses CREs and results in a high rate of negative situations among the triggered events. Although the perception of what is risky and what can be done depends on the person, there may be a common feeling when someone realizes that something is wrong. This feeling may materialize in a particular body language before any evasive action, if there is one.

1.4 Purpose
Conventional triggering does not seem very efficient at finding critical situations among hours of normal driving in a large database. Although kinematic filters can run automatically over the database, the high rate of false events requires the manual intervention of reviewers. This reviewing procedure is mostly based on the drivers’ reactions in images from cameras inside the cars. In addition, the procedure is time-consuming and often tedious for analysts. Furthermore, comparison of results between different NDSs may be inaccurate, given that the validations are subjective decisions of reviewers, which raises inter-subject and intra-subject reliability concerns.
A traditional triggering procedure applied to the initial euroFOT data set suggested the hypothesis that there is a relationship between driver motion and CREs. This idea came after watching more than 400 videos containing 40 positive situations[3]. The main objective of this thesis is to test this hypothesis by creating an algorithm able to automatically identify CREs among the events triggered with kinematic values in the euroFOT database. The algorithm is based on the recognition of the driver’s reaction from video images. After defining a training sample from the initial triggering procedure, several methods were applied to recognise the driver’s reaction using images from cameras inside the car.
Once possible algorithms had been defined and tested on the training sample, the next step was to evaluate them on a larger data set. Conclusions from these procedures and suggestions for future research are addressed in the last chapters of this thesis.
The scope of this thesis excludes 1) the use of images other than those of the driver’s body and 2) the search for kinematic values related to the driver’s reactions. Further, this thesis takes a first step toward the integration of video information for triggering CREs, focusing on the driver reaction and not on the current possibilities of image-processing algorithms.
[3] Further information in Appendix 1.

2 Methods
This chapter presents the algorithms employed in this thesis to recognize drivers’ reactions from cameras inside the cars. The different algorithms were tested on a training sample containing two normal driving situations and one CRE for each of eleven different drivers. The intermediate goal was to find a method that allowed automatic discrimination between true and false CREs.
Figure 2.1 Methodology.
Figure 2.1 contains a schema of the methodology followed, whose steps are addressed in more detail throughout the following sections. As an overview, they can be summarized as follows:
1 33 sequences, containing positive and negative situations, were extracted from the euroFOT database to define a training sample.
2 Then, three methods were applied to the training sample to discriminate between positives and negatives: the t-test & vartest, the Optical Flow calculation and the STD of jerk. The last two were identified as potential algorithms and entered the next phase.
3 The STD of jerk required an intermediate step to convert its graphical information into numerical information. Among several methods, the mean, harmonic mean and GLCM properties were used as three different convertors that allow automatic detection.
4 The three convertors of the STD of jerk, together with the Optical Flow criterion, defined four potential algorithms based on driver’s reaction recognition for the identification of CREs.
5 Finally, the four algorithms were tested first on the training sample and then on the validation data set. The latter was formed by 120 situations (101 negatives and 19 positives) extracted from the euroFOT database. The results from this phase are explained in the next chapter.
[Figure 2.1 flow chart labels: euroFOT database; training sample (11 positives, 22 negatives); methods (t-test & vartest, optical flow, STD of jerk); from graphical to numerical information (mean, harmonic mean, GLCM props., Nº pixels in STD, edge detection); algorithms based on driver’s reaction recognition; validation data set (19 positives, 101 negatives); steps numbered 1 to 5.]

2.1 Driver’s reaction recognition. General assumptions
As suggested by viewing the videos of CRE candidates triggered in an initial euroFOT data set, the key to discriminating between normal driving situations and CREs may be the driver’s reaction. In fact, it is the driver who decides whether the situation is critical (positive event) or not (negative event). For instance, harsh braking is one of the most typical responses when drivers face a critical situation, and high deceleration is used as a trigger to detect such CREs. However, some driving styles are more aggressive, so the same deceleration level may be reached by drivers who are fully aware of the situation. Because of the diversity of drivers and personalities, reviewers examine the driver’s attitude in the videos to judge whether the situation is critical for him/her. In euroFOT, these images are taken from cameras located in the rear-view mirror inside the cars. They are oriented toward the driver, making it possible to observe his/her torso[4]. In the CRE sequences, a rigid body motion common to all drivers is observed.
This reaction is characterized by sudden movements, such as suddenly grabbing the steering wheel with both hands and tilting the body forward. This also fits the findings of a study of emotions and associated motions, which relates surprise to an acceleration of whole-body portions (Kobayashi, 2008).
Before beginning the search for possible methods, assumptions and requirements should be defined. Based on the findings of the initial triggering procedure, the assumptions are:
1) Driver reaction is an indicator of CREs.
2) Motion of the driver’s body in the images from the euroFOT cameras can be used to detect driver reaction. Given the presence of kinematic changes while driving, driver reaction implies movement (it may not be just a change in facial expression).
3) On the basis of the second assumption, individual movements may not be sufficiently self-explanatory. Thus, a sequence of movements seems the best indicator of the driver’s reaction.
The main requirement is that the greatest possible number of positive events should be detected together with the smallest possible number of negative events. This means increasing the sensitivity of CRE recognition. The main challenge at this point is to identify near-crashes and incidents, since they have kinematic values closer to those of normal driving situations.
[4] Front-seat passengers are not included in the camera’s field of vision.
Other issues to consider are the variety of drivers and the computational time. It is important to create an algorithm able to detect different drivers’ reactions in the shortest possible time. This can be generalized by treating the images as matrices of numbers (pixel intensities). In addition, a statistical approach can help to measure changes in these matrices and to save computational time.
Due to privacy and ethical issues, the faces of drivers are hidden throughout this project so that they remain anonymous.
The tools used in the analysis were placed in locked rooms, following the requirements on personal data handling. Only authorized analysts were able to see the data displayed in such rooms. An information document was signed before getting access to ensure that individual drivers could not be identified by anyone except authorized persons. Finally, the extracted data have been reviewed before being included in this report.

2.2 Definition of training sample
The training sample is a part of the entire data set used for testing methods. This involves testing and searching for alternatives to recognize the driver’s reaction within a limited collection of data. Results from this procedure allow the development of potential algorithms able to identify CREs among negative and positive situations. These are further evaluated on a larger data set.
For the results to be sufficiently consistent, the training sample should be representative of the population. In this case, it contains two-second sequences from eleven different drivers, randomly selected among positive events. These positive events were obtained by applying the kinematic triggers defined in the 100-Car study (Dingus et al., 2006) to an initial euroFOT data set. By watching those videos, it is possible to identify the whole driver’s reaction within two seconds (starting half a second before the triggered time). The events are further described in Appendix 2.
The training sample also contains two additional negative events for each driver (see Figure 2.2). These were taken from the sequences that take place four and two seconds before the positive event. Such sequences correspond to normal driving, and are thus defined as negative events.
Figure 2.2 Procedure to define the training sample.
To sum up, the training sample contains 33 situations, of which eleven are positive events. The choice of events experienced by different drivers follows from the requirements established for the algorithm, given that the database comprises 100 drivers.
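The sampling scheme described in this section can be sketched as a small calculation of time windows. The Python function below returns, for one trigger time, the two-second positive window (starting half a second before the trigger) and two negative windows. It assumes that "four and two seconds before the positive event" refers to the start of the positive window, so that the three windows are contiguous; this reading, the function name and the parameter names are assumptions of the sketch and not a description of the original MatLab scripts.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds


def training_windows(
    trigger_time_s: float,
    lead_s: float = 0.5,       # the positive window starts 0.5 s before the trigger
    duration_s: float = 2.0,   # every sequence lasts two seconds
    negative_offsets_s: Tuple[float, ...] = (4.0, 2.0),  # assumed reading of the text
) -> Tuple[Window, List[Window]]:
    """Return the positive window and the negative windows for one trigger."""
    positive_start = trigger_time_s - lead_s
    positive = (positive_start, positive_start + duration_s)
    negatives = [
        (positive_start - offset, positive_start - offset + duration_s)
        for offset in negative_offsets_s
    ]
    return positive, negatives


# Hypothetical trigger at t = 10 s within a trip.
positive, negatives = training_windows(10.0)

# One positive and two negatives per driver, eleven drivers in total.
events_in_sample = 11 * (1 + len(negatives))
```

Under this reading the negative windows end exactly where the positive window begins, so no frame belongs to two sequences.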
[Figure 2.2 flowchart: initial euroFOT data set → triggering based on the 100-Car study plus validation by video viewing → 11 true positive events from 11 different drivers → collection of frames from 2-second video sequences for each driver (2 negative events + 1 positive event) → training sample of 33 events (11 positives, 22 negatives)]

2.3 Recognition of driver’s reaction. General structure

Several methods were applied to the training sample in order to:
1) Recognize the positive events among the rest of the sample. What are the differences in the driver’s reactions between positive and negative events?
2) Once the differences were established, find a way to automatically detect as many positive events as possible together with the lowest possible number of negative events. At this point, it was important to save computational time.

Possible solutions to both research questions are presented below, together with some initial steps to prepare the images. Note that this research aims to identify the reactions of the drivers in a rough and fast way. Therefore, more specific and accurate image processing methods, such as defining specific features in the images and analyzing their movement, were not considered. This choice was mainly dictated by the size of the database and the variety of drivers.

2.3.1 Data description & Image pre-processing

As specified in previous sections, issues such as computational time and the diversity of drivers play an important role in correctly identifying the driver’s reaction. Therefore, images were treated as matrices containing pixel intensity values. Since the data collected in euroFOT is available in MATLAB, the scripts to access and evaluate the data were also written in that language.

This section covers technical issues about the structure of the data and the initial steps to extract and prepare the images (see Figure 2.3).

Figure 2.3 Steps of data acquisition and image pre-processing.
[Figure 2.3 flowchart: euroFOT data set → extracting sequences from videos → cutting images → removing flashes → mask in window]

Extracting sequences from videos
Three video sequences of two seconds each were extracted from .avi files for each of the drivers of the training sample (footnote 5). The original images are in grayscale with 288x352 pixels. The frame rate is 12.5 fps (footnote 6). Each frame was saved as one element of an array, each element being defined by two further structures: cdata and colormap (see Figure 2.4).

Figure 2.4 Unfold of array structure and frames information.
[Figure 2.4 diagram: Frames 1, 2, 3, ... each contain cdata (matrix with pixel intensity values) and colormap (array containing RGB values)]

Footnote 5: some problems in accessing the frames of long trips with the MATLAB function aviread were solved by using an application called videoIO, developed by Gerald Dalley (2006).
Footnote 6: frames per second.

Cutting images
Since the images are in grayscale, one plane was enough to define the pixel values (color images are defined by three planes). Only a certain area of the matrix stored in cdata, around the driver’s torso, was kept in order to remove superfluous information (see Figure 2.5). Thus, the final size of the images was 283x231 pixels.

Figure 2.5 Clipping the torso of the driver (footnote 7).

Removing flashes
Over-bright images were removed from the sequences to avoid false changes in pixel intensity. Observations from the training sample indicate a constant frequency of one flashed frame in every five. This effect was observed only in some of the drivers, but the pre-filtering script was applied to all the sequences without distinction. Although this means discarding valid information in some cases and makes the sequences faster than in reality, it is preferable to false intensity changes.

Mask in window
Superfluous information, such as movement outside the vehicle visible in the window area, may affect the results by generating changes in pixel intensity that are not related to the driver’s motion.
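The cropping and flash-removal steps described above can be sketched as follows. This is a hedged Python illustration rather than the thesis’s MATLAB pre-filtering script: frames are represented as lists of rows, the crop offsets in the usage comment are hypothetical (the source gives only the final 283x231 size), and it is an assumption that flash removal simply drops every fifth frame at a fixed phase.

```python
def crop(frame, top, left, height, width):
    """Keep a rectangular region of a frame given as a list of pixel rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


def drop_periodic_flashes(frames, period=5, phase=0):
    """Remove every `period`-th frame, starting at index `phase`.

    Assumption: flashed frames recur at a fixed phase, so they can be
    dropped by position without inspecting their brightness.
    """
    return [f for i, f in enumerate(frames) if (i - phase) % period != 0]


# Usage with hypothetical offsets for a 288x352 frame:
#   torso = crop(frame, top=0, left=60, height=283, width=231)
#   filtered = drop_periodic_flashes(sequence_of_25_frames)  # 20 frames remain
```

Dropping by position is what makes the filtered sequences shorter, and therefore apparently faster, than the originals.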
To suppress such movement, a binary mask set the pixel intensities in the window area to zero. A situation in which the driver’s body leans forward was taken as the dimensional reference, so as not to lose information about the driver’s motion.

Figure 2.6 Mask polygon in window’s area.

Footnote 7: the driver’s face has been covered due to confidentiality issues.

2.3.2 Recognition of driver’s reaction in sequences

After this first phase, the images were preprocessed and the training sample consisted of a collection of frames for each situation, stored in 33 arrays, of which 11 were positive events. The three different methods applied to identify those events, based on recognition of the driver’s reaction, are explained below.

2.3.2.1 T-test and Vartest. Comparison of false & positive events

Frames are defined as matrices containing pixel intensity values. These values change depending on the motion in the scene. Since negative and positive situations are recorded for each driver, it is possible to compare both and see how different the distribution of pixel values is in each case.

One way to use this information is to conduct a t-test of the null hypothesis that the data at a certain pixel position, taken along the arrays of two sequences, come from the same normal distribution. This idea was applied using two different functions in MATLAB:
- ttest2: tests the null hypothesis that the values for each pixel position come from populations with equal means, against the alternative that the means are unequal (unequal variance is assumed).
- vartest2: tests the null hypothesis that the values for each pixel position come from populations with equal variance, against the alternative that the variances are unequal.

At the default 5% significance level, the functions return h=1 if the null hypothesis is rejected and h=0 otherwise, so the results can be represented as binary images.
In addition, the functions return a p-matrix containing, for each pixel position, the probability of observing values at least as extreme as those obtained if the null hypothesis were true. Three different populations were considered in this calculation:
- Intensities at the same pixel position over time in both sequences.
- First derivative values for each pixel position over time in both sequences: the derivative also takes the time changes into account. The most obvious changes (the largest change in intensity in the shortest time) were expected to appear as white areas in the h-matrix.
- Square of the first derivative values for each pixel position over time in both sequences: if the pixel intensity decreases during the sequence, the first derivative becomes negative. The squared values were used to check whether this sign affects the results.

For each population, two binary images resulted from two different t-tests: one comparing two negative events and the other comparing a positive and a negative event. The procedure was the same when performing a vartest. The binary image resulting from the comparison of a negative and a positive event was expected to contain more white areas than the one resulting from the two negatives. This would mean that the intensities in those pixels had changed more and, therefore, that the null hypothesis was rejected. According to this theory, the positive event could be recognized with the following steps (see Figure 2.7):

Figure 2.7 Procedure of recognition using t-test & vartest as potential triggers.

First, the t-test and the vartest were applied to compare a negative and a positive sequence for the three different populations. The aim was to detect which population and which test were most suitable for recognizing the driver’s motion. Results of this first approach are shown below (see Figures 2.8 and 2.9):

Figure 2.8 Binary images from vartest of: intensities, first derivative of intensities and square of first derivative of intensities (driver A686).
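The per-pixel comparison of two sequences can be sketched as below. This is a simplified Python illustration and not a reimplementation of MATLAB’s ttest2: it computes Welch’s t statistic on the squared first derivative at each pixel and rejects the null hypothesis when the statistic exceeds a fixed critical value. The default `t_crit=2.0` is an assumed rough stand-in for the exact 5% threshold, which depends on the degrees of freedom; all function names are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def squared_first_derivative(series):
    """Squared frame-to-frame differences of one pixel's intensity series."""
    return [(b - a) ** 2 for a, b in zip(series, series[1:])]


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    diff = mean(a) - mean(b)
    if se2 == 0:
        # Both samples are constant: no evidence if equal, unbounded otherwise.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(se2)


def h_image(frames_a, frames_b, t_crit=2.0):
    """Binary image: 1 where the two sequences differ at that pixel, else 0.

    `t_crit` is an assumed approximate critical value, not an exact p-value test.
    """
    rows, cols = len(frames_a[0]), len(frames_a[0][0])
    h = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a = squared_first_derivative([f[r][c] for f in frames_a])
            b = squared_first_derivative([f[r][c] for f in frames_b])
            h[r][c] = 1 if abs(welch_t(a, b)) > t_crit else 0
    return h
```

Counting the ones in the returned image gives the white-area measure that the procedure in Figure 2.7 compares against a threshold.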
Figure 2.9 Binary images from vartest of: intensities, first derivative of intensities and square of first derivative of intensities (driver A686).

[Figure 2.7 flowchart: Sequence 1 and Sequence 2 → t-test / vartest → binary image (h values) → group and measure white areas → threshold → positive event or negative event]

As can be seen in Figure 2.8, the binary image from the t-test of the square of the first derivative of intensities seemed to be the most representative of the driver’s motion. It highlights the areas that underwent major changes: the hand and the driver’s head.

Using the square of the first derivative of intensities as the population, t-tests were performed on two negative sequences, and on a positive and a negative sequence, applying the procedure detailed in Figure 2.7. When comparing the positive and the negative events, more white areas were expected in the resulting binary image of h values. Finally, grouping and measuring these white areas might be used to determine which of the two comparisons involves a positive event (driver’s reaction).

Figure 2.10 Binary images from t-test of square of first derivative of intensities.
[Figure 2.10 panels: Driver A686, Driver A484, Driver A567]

For most of the drivers, the t-test comparing positive and negative sequences (event vs. 2 seconds before) resulted in more white areas than the test comparing two negative sequences. This effect is observed for drivers A686 and A484 in Figure 2.10, where the images on the left have more white areas than the images on the right, which result from the comparison of two negative sequences. Nevertheless, the opposite occurred for driver A567. Similar results were also observed for some other drivers of the training sample.

The binary images are quite noisy, which makes it difficult to relate the driver’s reactions to the white areas where the null hypothesis was rejected.
It is therefore unclear how to define a white-area threshold that highlights the reaction. The main problem is that some maneuvers while driving may be related to a broad change in the rate of pixel intensities. For instance, turning the steering wheel creates a larger white area when it is compared via t-test with a sequence of driving on a straight road. Alternative methods were therefore explored to distinguish the driver’s reactions.

2.3.2.2 Standard Deviation of Jerk

Given that the results from the t-test were unclear in discriminating between positive and negative events, possible alternatives are discussed below.

As noted with the conventional triggering procedure, the reactions of drivers are mostly related to sudden, quick motions of the driver’s torso. Therefore, the time in which the action takes place appears to be an important factor. One way to take the time of the motion into account is to differentiate the intensities at each pixel position over time.

Velocity (first derivative) and acceleration (second derivative) represent the rate of change of position and of velocity over time, respectively. If each level of differentiation represents the rate of change of the quantity being differentiated, then the third derivative represents the rate of change of acceleration. Young relates control to the third derivative in “The Reflexive Universe” to explain facts of daily life. He illustrates that controlling a car can be expressed with the third derivative, since it is related to changes in acceleration (Young, 2004).

The third derivative is also called jerk, and its application can be extrapolated to various fields of mathematics and engineering (Iradier, 2006). The jerk of the intensity values at each pixel location over time can give an idea of how sudden these changes are. Calculating the jerk for each driving sequence makes it possible to study whether the positive sequences have values significantly different from the negative ones.
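As a rough illustration of the jerk calculation, the third derivative can be approximated by taking finite differences three times along the time axis of each pixel. The Python sketch below assumes a unit time step between frames (so removed flash frames are ignored) and uses hypothetical function names; it is not the thesis’s MATLAB code.

```python
def diff(series):
    """First finite difference of a sequence."""
    return [b - a for a, b in zip(series, series[1:])]


def jerk(series):
    """Third finite difference of an intensity series (unit time step assumed)."""
    return diff(diff(diff(series)))


def jerk_frames(frames):
    """Apply `jerk` to every pixel; n input frames give n - 3 jerk matrices."""
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[[0] * cols for _ in range(rows)] for _ in range(len(frames) - 3)]
    for r in range(rows):
        for c in range(cols):
            for k, value in enumerate(jerk([f[r][c] for f in frames])):
                out[k][r][c] = value
    return out
```

A pixel whose intensity changes at a constant rate, or with constant acceleration, gives zero jerk; only abrupt changes in the rate of change survive the third difference.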
From this calculation, an array containing matrices of jerk values was obtained for each sequence. Two ways to look into these arrays were considered:
- Computing the standard deviation (STD): the wider the spread of the distribution of jerk values, the more the values have differed over time. It was expected from this analysis that the largest changes in acceleration would be represented as whiter areas in a grayscale image.
- Representing the maximum of the squared jerk values: peaks in the jerk distribution can also be represented as white areas in a grayscale image, without taking into account how different these values have been within the distribution. The squared values are used to avoid any influence from the negative numbers that arise in the derivative when the pixel intensity changes to a lower value.

Prior to computing the jerk and its variance, over-bright images were removed from the sequence to avoid false changes in pixel intensity. Observations from the training sample indicated a constant frequency of one flashed frame in every five for most of the drivers. This pre-filtering algorithm was applied to all the sequences without distinction. As a consequence, information could be lost by removing valid frames, and the sequences became faster than in reality.

The images resulting from both calculations on the three sequences of one of the drivers of the sample (two negatives and one positive) are presented below. The goal was to distinguish the positive event (simply called the event from now on) from the other two situations:

Figure 2.11 Maximum of square of jerk values: negative, negative and positive events.

Figure 2.12 STD of jerk values: negative, negative and positive events (driver A241).

The rates of change of acceleration during the event were not significantly different from those obtained in the negative sequences. In fact, the maxima were mainly reached when drivers were maneuvering in the sequences preceding the event.
Thus, the driver’s reaction might not be related to peaks in jerk values. However, the differences between negative and positive images from the standard deviation of jerk (STD of jerk) seemed noticeable, as shown in Figure 2.12.

As can be seen in the image of the positive event in Figure 2.12, a white silhouette of the driver appeared when calculating the STD of jerk during the event. When the driver remained in the same position, a dark image was obtained as a result. On the other hand, maneuvers seemed to generate a certain white area in the image. This effect can be observed in the middle image of Figure 2.12, when the driver turns the steering wheel.

In this context, the case of one of the drivers of the sample is of particular interest:

Figure 2.13 STD of jerk values: negative, negative and positive events (driver A1064).

As mentioned before, the sequences of the training sample were chosen randomly from the previous triggering procedure. In this specific case, the driver’s reaction was not expected to be clearly recognizable, since the motion was almost imperceptible in the video sequence. However, as Figure 2.13 shows, the driver’s silhouette obtained in the positive sequence makes it possible to distinguish it from the two previous states.

These results support the theory that the driver’s reaction during a CRE seems to involve the whole body (rigid body motion of the driver’s torso), while maneuvers seem to involve just a certain part. This makes it important to consider the area in which the changes take place in order to make the distinction.

For most of the drivers of the sample, the driver’s silhouette resulting from the STD of jerk allowed an intuitive recognition of the driver’s reaction and hence of the positive events.
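The two summary images discussed in this section, the STD of jerk and the maximum of the squared jerk, can be sketched per pixel as follows. This Python illustration uses the sample standard deviation and a unit time step between frames, both of which are assumptions; the function names are hypothetical and at least five frames are needed.

```python
from statistics import stdev


def third_difference(series):
    """Approximate jerk: apply a finite difference three times."""
    for _ in range(3):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def jerk_summary_images(frames):
    """Return (STD-of-jerk image, max-squared-jerk image) for one sequence.

    Needs at least 5 frames so that each pixel has two or more jerk values.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    std_img = [[0.0] * cols for _ in range(rows)]
    max_sq_img = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            j = third_difference([f[r][c] for f in frames])
            std_img[r][c] = stdev(j)                  # spread of jerk over time
            max_sq_img[r][c] = max(v * v for v in j)  # single largest peak
    return std_img, max_sq_img
```

A static pixel yields zero in both images, which corresponds to the dark image obtained when the driver stays in the same position.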
The options were therefore either to post-process the STD-of-jerk images to identify the threshold that relates the graphic silhouette to the driver’s motion (a conversion from graphical to numerical information), or to keep trying other methods.

2.3.2.3 Optical Flow

A numerical alternative to the graphical method discussed in the previous section was the calculation of the optical flow. Its original formulation came from Horn and Schunck (1981), who defined optical flow as “the distribution of apparent velocities of movement of brightness patterns in an image”. This distribution provides information about the object motion in terms of spatial allocation and rate of change. The optical flow constraint equation is defined as:

$$I_x\,u + I_y\,v + I_t = 0 \qquad (2.1)$$

where $I_x$, $I_y$ and $I_t$ are the spatiotemporal image brightness derivatives, $u$ is the horizontal optical flow and $v$ is the vertical optical flow.

This equation relates the intensity changes in a sequence of images to three-dimensional object motion. Nevertheless, this relationship can be unclear in some cases. For instance, the optical flow is zero at every point of a rotating sphere. However, assuming that the surfaces are flat, motion may be related to changes in brightness. Therefore, the velocities of object motion can be estimated by solving for u and v (footnote 8).

In recent years, several studies have been carried out to improve the performance of the classical formulation. In this context, D. Sun, S. Roth and M. J. Black (2010a,b) have recently contributed a more accurate optical flow. They published a public MATLAB code to compute this new optical flow formulation for educational purposes (Sun, 2010). The outputs are two matrices with the horizontal and vertical components of the optical flow (OF from now on) for each pair of processed images.

Estimating flow in drivers
The OF code developed by Sun et al.
(2010) was applied to the training sample with the default parameters (footnote 9). The main objective was to assess whether the motion of the driver’s body can be estimated from changes in brightness in a two-dimensional image.

The script returned two matrices containing speed components for each pair of processed images. In this case, the two matrices were combined into a single one holding the magnitude of the speed, since this value seemed more significant than the direction of the flow. Each collection of matrices was kept in an array for each of the sequences in the training sample.

As mentioned in the previous section, the largest changes in acceleration in the STD of jerk images were not reached during the driver’s reaction. This led to considering alternatives other than peaks in speed along the arrays to make the distinction. As happened in the calculation of jerk, pre-filtering alters the results, since intermediate values are lost. However, this was preferable to false intensity changes due to flashes.

Taking one of the drivers of the sample as reference, the initial estimates consisted of using the speed data from the OF calculation in each array (with and without filtering frames) to calculate:
- Local maximum from Optical Flow (OF): peak speed of the optical flow over the array of each sequence.
- Maximum sum of the whole array: maximum value in the matrix resulting from the sum of the individual velocity matrices of the entire OF array for each sequence.
- Number of pixels above or equal to 95% of the peak in the matrix resulting from the sum of the individual matrices of the entire OF array for each sequence.

Footnote 8: see further information about Optical Flow and its formulation in Appendix 3.
Footnote 9: it is also possible to test different methodologies of flow calculation; by default the program uses “Classic+NL-Fast” (only pixels from certain regions are weighted to save computational time).
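The three quantities listed above can be sketched as follows, assuming the horizontal and vertical flow components have already been computed by some optical flow routine. This Python illustration does not call the Sun et al. code; the function names are hypothetical and frames are plain lists of rows.

```python
from math import hypot


def speed_magnitude(u, v):
    """Combine horizontal and vertical flow matrices into one speed matrix."""
    return [[hypot(a, b) for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]


def summarize_flow(speed_frames, fraction=0.95):
    """Return (local max, max of summed matrix, pixel count near that max)."""
    # Peak speed anywhere in the array of speed matrices.
    local_max = max(max(max(row) for row in frame) for frame in speed_frames)
    # Sum the speed matrices over time, pixel by pixel.
    rows, cols = len(speed_frames[0]), len(speed_frames[0][0])
    total = [[sum(f[r][c] for f in speed_frames) for c in range(cols)]
             for r in range(rows)]
    peak_sum = max(max(row) for row in total)
    # Pixels whose summed speed reaches at least `fraction` of the peak.
    n_high = sum(1 for row in total for x in row if x >= fraction * peak_sum)
    return local_max, peak_sum, n_high
```

Applied once to the original and once to the flash-filtered sequence, this gives the six values reported per sequence in a comparison such as Table 2.1.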
Table 2.1 Comparison of OF values in negative and positive events (driver A686). “Max. of sum” is the maximum of the matrix obtained by summing the whole OF array; “Pixels ≥ 95%” is the number of pixels with speed greater than or equal to 95% of the highest speed in the total sum matrix.

                          Original sequence                        Filtered sequence
Sequence (driver A686)    Local max.  Max. of sum  Pixels ≥ 95%    Local max.  Max. of sum  Pixels ≥ 95%
Negative (4 s before)     39.79       74.5798      1               38.98       59.7046      155
Negative (2 s before)     30.49       93.219       122             28.65       84.4021      277
Positive event            18.88       112.603      206             11.01       96.6913      298

Table 2.1 shows some differences between the calculations with the original and the filtered sequences. In both cases, however, the maximum values were reached in the same categories. Local peaks of speed were not recorded during the positive event. However, the positive event registered the maximum when the calculation considered a single matrix containing the sum of speeds over time. Besides, this matrix had more pixels with a high sum of velocities than those of the negative sequences.

Even so, the differences were not consistent enough to establish these values as a criterion of discrimination. For instance, the number of pixels containing the highest sum of velocities was 277 in the negative filtered sequence (with a peak summed speed of 84.4021) and 298 in the positive one (with a peak summed speed of 96.6913). This suggested analyzing the speed of the optical flow between single frames instead of using a matrix containing the sum of values along the array. The results of performing the same calculation on single frames from each of the sequences of the same driver are presented below.

Average speed on single frames
As can be seen in Figure 2.14, the average speed was significantly greater during the event (red line in the graph) than in the negative sequences (obtained two and four seconds before the event). The peak was reached in the 16th matrix of the OF array that resulted from processing the images of the positive sequence.
Figure 2.14 Speed average over OF frames in positive and negative sequences.

Peak speed on single frames
However, the peak speed was not obtained in the OF of the positive event. As can be seen in Figure 2.15, the highest peak speed was recorded in the negative sequence that takes place four seconds before the event.

Figure 2.15 Peak speed over OF frames in positive and negative sequences.

Number of pixels sharing highest peak speeds on single frames
As shown in Figure 2.16, at the time of the maximum average speed (16th matrix of the event sequence, see Figure 2.14) the number of pixels with at least 95% of the peak speed was 760. Another maximum was observed in the 13th matrix of the array, with 847 pixels. However, in comparison with the other sequences, the overall maximum was registered in the negative sequence two seconds before the event.

Figure 2.16 Peak speed over OF frames in positive and negative sequences.

The most significant speed rate during the event was recorded in the 16th matrix of the OF array. It resulted from the estimation of speeds between the 16th and 17th frames of the filtered sequence. Note that OF interpolation warps the second image and its derivative toward the first (footnote 10). Those frames are shown below to illustrate the significance of this peak speed:

Footnote 10: see optical flow formulation in Appendix 3.

Figure 2.17 Frames of the event sequence corresponding to peak speed (driver A686).

As can be seen in Figure 2.17, Frame 16 corresponded to the biggest change in motion during the reaction, when the body leaned forward. In Frame 17 the driver was back in the original position. The change in motion was evident in this part of the sequence.

The greatest differences between the event and the preceding sequences seemed to be related to the average speed on single frames. Accordingly, sudden changes might be recognized by differentiating this average speed.
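The single-frame analysis can be sketched as a per-frame average of the speed matrices followed by a second finite difference of that average over time. This is a hedged Python illustration with hypothetical function names and a unit time step between OF matrices; it assumes the speed matrices are already available.

```python
def frame_mean(frame):
    """Average speed over all pixels of one OF speed matrix."""
    values = [x for row in frame for x in row]
    return sum(values) / len(values)


def mean_speed_profile(speed_frames):
    """One average speed per OF matrix, in time order."""
    return [frame_mean(f) for f in speed_frames]


def second_difference(series):
    """Second finite difference (unit time step assumed)."""
    first = [b - a for a, b in zip(series, series[1:])]
    return [b - a for a, b in zip(first, first[1:])]


def index_of_largest_change(series):
    """Index i of the largest |second difference|; it spans series[i..i+2]."""
    d2 = second_difference(series)
    return max(range(len(d2)), key=lambda i: abs(d2[i]))
```

Because each OF matrix is reduced to a single number first, this step is cheap compared with the flow estimation itself.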
Figure 2.18 shows the second derivative values of the average speed vectors for each sequence.

Jerk of average speed on single frames
The biggest slope matched the change in motion between the 14th and 15th values during the event sequence. At that moment, the driver’s body leaned forward due to the inertia of harsh braking. The rates of change of acceleration recorded two seconds before the event stayed within a narrow range over the whole array. However, another significant change occurred between the 4th and 5th derivative values four seconds before the event.

Figure 2.18 Jerk of average speeds over OF frames in sequences.

As can be seen in Figure 2.19, the second biggest slope, in the sequence that occurs four seconds before the event, was due to a change in the driver’s position:

Figure 2.19 Frames of the negative sequence corresponding to peak speed (driver A686).

Given these findings, it seemed that peaks in the distribution of jerk from OF velocities were associated with the driver’s motion. Nevertheless, the main obstacle is the computational time required to estimate the OF.

2.3.3 Silhouette detection in STD of Jerk images

In the last section, several methods were applied to the recognition of the driver’s reaction in the presence of CREs. Among these methods, the OF and the STD of jerk were identified as potential algorithms. The main limitation of the OF was the computational time it required. Although this did not affect the STD of jerk, its results were graphical. This graphical information therefore had to be converted into numerical information to enable an automatic detection method. This automatic detection mostly involved studying the properties that distinguish the STD of jerk images of the event from those of the negative sequences.

Figure 2.20 Converters from graphical to numerical information in STD of jerk.
[Figure 2.20 diagram: STD of jerk images give graphical information that maps to observations: a dark image (remaining in the same position), a certain white area such as the steering wheel (maneuvering), and a white driver’s silhouette (reacting in a CRE). Converters to numerical information: mean, harmonic mean, counting pixels in intensity intervals, edge detection, GLCM properties, general silhouette]

2.3.3.1 Mean

A first alternative for using the graphical information from the STD of jerk was to plot the sum of the values along rows. The aim was to find a silhouette in the images corresponding to the driver’s reaction. According to this hypothesis, the mean (footnote 11) was expected to be higher during the event than in the previous sequences, because the pixels that generate the driver’s silhouette are dispersed over the image.

Taking a driver from the training sample as reference, the STD of jerk values were summed along rows for each of the sequences. The resulting vectors are plotted below together with the STD of jerk images of the driver (see Figures 2.21 and 2.22).

Figure 2.21 STD of jerk values: negative, negative and positive events (driver A936).

As evidenced in Figure 2.21, the sudden motion of the driver’s reaction generated a white silhouette in the event sequence. Its distribution of STD values along rows reached a mean of 11900, while the means in the previous sequences are 2598 and 7299, respectively. The peak in the distributions of STD corresponded to the image from two seconds before the event, where a bright white area is concentrated in the middle of the figure.

Figure 2.22 Distribution of sum of STD of jerk values along rows (driver A936).

This result was unexpected, since over-bright images had been removed in the pre-filtering procedure. Reviewing the video confirmed that this area corresponds to a movement of the driver, who moves an arm from the steering wheel to the mouth.
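The row-sum conversion described in this section amounts to very little code. The Python sketch below uses hypothetical function names and a small made-up image; it also returns the index of the largest row sum, to show how a single bright row can dominate the distribution.

```python
def row_sums(image):
    """Sum the STD-of-jerk values along each row of the image."""
    return [sum(row) for row in image]


def mean_of_row_sums(image):
    """Arithmetic mean of the row sums."""
    sums = row_sums(image)
    return sum(sums) / len(sums)


def peak_row(image):
    """Index of the row with the largest sum."""
    sums = row_sums(image)
    return max(range(len(sums)), key=sums.__getitem__)


# Made-up example: a local bright patch in row 1 raises the mean on its own.
toy = [[1, 1, 1], [1, 90, 1], [1, 1, 1]]
```

In the toy image the mean of the row sums is driven almost entirely by one row, which is the sensitivity to local peaks noted in the text.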
Footnote 11: arithmetic average of the values over the distribution.

2.3.3.2 Harmonic mean

In educational fields, the harmonic mean is commonly used to calculate the final grades of students, to ensure a reasonable level of work during the academic year (Wilson, 2006). The present case is somewhat similar, given that the mean was affected by local peaks in the distributions.

Hasna and Alouini (2002) used this formulation to study the performance of wireless communication. They defined the harmonic mean as follows:

“Given two numbers X1 and X2, the harmonic mean of X1 and X2, μH(X1,X2), is defined as the reciprocal of the arithmetic mean of the reciprocals of X1 and X2, that is:

$$\mu_H(X_1, X_2) = \frac{2}{\dfrac{1}{X_1} + \dfrac{1}{X_2}} = \frac{2\,X_1 X_2}{X_1 + X_2} \qquad (2.2)$$

It is clear that the harmonic mean of two numbers is equal to the square of their geometric mean divided by their arithmetic mean.” (Hasna and Alouini, 2002).

The harmonic mean is less affected by large outliers (here related to maneuvers), and it is also useful when the data result from indirect calculations. In this case, jerk is obtained from derivatives of the intensity changes in pixels over time. Therefore, this method was considered as an alternative to the mean calculation when the distributions of STD of jerk were affected by normal driving maneuvers and changes in the driver’s position.

As can be seen in Table 2.2, the mean of the negative sequence that takes place 4 seconds before the event was 7299, which corresponds to 61.33% of the mean reached during the event (11900). This influence was lower in the case of the harmonic mean. Taking the harmonic mean reached during the event (6393) as 100%, the value recorded in the sequence 4 seconds before the event (1190.5) represents 18.62%.

Table 2.2 Comparison between mean and harmonic mean in the distribution of sum of STD of jerk values along rows in sequences of driver A936:

Criterion        2 s before   4 s before   Event
Mean             2598         7299         11900
Harmonic mean    369.8246     1190.5       6393

2.3.3.3 Counting pixels in intensity intervals

Since the maximum values of STD of jerk were not reached during the event, another option was to consider the number of pixels whose STD lies within a certain interval. The same concept underlies the calculation of the image histogram, which represents the intensity levels against the number of pixels that share those intensities.

Histograms can be used to obtain the parameters of a texture (Alba et al., 2006). In some way, the driver’s silhouette is related to a texture, given that it is defined by a relationship between pixels. Some of the properties of histograms can be summarized as follows (Olmos, 2008):
- Images cannot be rebuilt from their histograms.
- Two images can be associated with the same histogram.
- Histograms do not contain spatial information about the image.

In the next trial, driver A686 was taken as reference to analyze the intensity levels generated by changes in the driver’s position. The STD of jerk images from the event and from a previous sequence are represented below in three-dimensional graphs (see Figures 2.23 and 2.24). This graphical representation gave an idea of the rates of STD and their location over the image in a negative and in a positive situation.

Figure 2.23 Three-dimensional distribution of STD of jerk over the image in one of the negative sequences of driver A686.

Figure 2.24 Three-dimensional distribution of STD of jerk over the image of the positive sequence (event) of driver A686.

As shown in the graphs above, the negative sequence concentrated the highest values of STD in the area generated by the hand movement. It seems that the widest variances might not be related to the driver’s reaction.
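Two of the numerical converters discussed so far, the harmonic mean of the row sums and the counting of pixels within STD intervals, can be sketched as follows. This Python illustration uses the standard library’s `statistics.harmonic_mean`; the interval convention (lower bound exclusive, upper bound inclusive) and the function names are assumptions, since the source does not state how interval boundaries were treated.

```python
from statistics import harmonic_mean


def harmonic_mean_of_row_sums(image):
    """Harmonic mean of the row sums; large isolated peaks weigh less."""
    return harmonic_mean([sum(row) for row in image])


def count_in_intervals(image, intervals):
    """Count pixels with lower < value <= upper for each (lower, upper) pair.

    Assumption: intervals are half-open on the lower side.
    """
    values = [v for row in image for v in row]
    return [sum(1 for v in values if lo < v <= hi) for lo, hi in intervals]


# Hypothetical interval edges in the spirit of the STD ranges used here.
intervals = [(50, 300), (20, 50), (10, 20), (5, 10)]
counts = count_in_intervals([[0, 7, 15], [25, 60, 120]], intervals)
```

Like a histogram, the resulting counts carry no spatial information, so two very different images can produce the same list.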
The results of counting the number of pixels within certain intervals of STD of jerk in each image are presented as follows.

As can be seen in Figure 2.25, the differences were most significant in the fourth and fifth intervals of STD. These intervals group the pixels with STD of jerk between 50 and 10. In this range of STD, a higher number of pixels was counted during the event than in the previous sequences.

Figure 2.25 Distributions of number of pixels within intervals of STD in the event and in previous sequences for the driver A686.

The numerical values associated with Figure 2.25 are given in Table 2.3:

Table 2.3 Number of pixels within intervals of STD of jerk during the event and previous sequences (driver A686).

Interval no. (STD range)   Event    2 s before   4 s before
1 (300-150)                1542     871          5229
2 (150-100)                4178     2430         8195
3 (100-50)                 11708    6167         12788
4 (50-20)                  19422    11030        17063
5 (20-10)                  26384    20147        20705
6 (10-5)                   32199    30255        28007
7 (0)                      28559    30152        28962

Adding the values of the fourth and fifth intervals (rows “4 (50-20)” and “5 (20-10)” in Table 2.3), the numerical difference between the event and the previous sequences was not significant enough to discriminate between both situations. The number of pixels during the event was 45806, while 31177 and 37768 pixels were counted in the previous sequences, respectively. This suggested taking the spatial distribution of the pixels into account in the next tests.

2.3.3.4 Edge detection: Hough transform

The Image Processing Toolbox in MATLAB contains several procedures to detect edges in an image. In the following test, the Hough transform method was applied to a positive and a negative situation for the same driver.
This method is based on the parametric representation of lines in a plane (MathWorks, 2011):

$$\rho = x\cos\theta + y\sin\theta \tag{2.3}$$

The general procedure starts by detecting edges using the Sobel or Canny algorithms. The resulting images may contain open shapes and isolated points. These can be corrected by taking an initial point and drawing straight lines through it in a polar coordinate system. The $\rho$ and $\theta$ values are accumulated in a matrix called the Standard Hough Transform (SHT) to estimate which pixels are most likely to belong to each edge. Peaks in the SHT represent potential lines in the input image. Finally, the houghlines command finds the end points of the lines and fills small gaps.

This method was applied to a pair of images from the same driver, one obtained from an event and another from 2 seconds before the event. The Hough transform was plotted and its peaks (potential lines) were marked with squares. Then, the detected lines were colored on the input images.

Figure 2.26 Hough transform and detected lines from sequence 2-sec. before the event

Figure 2.27 Hough transform and detected lines from the event’s sequence.

Looking at Figures 2.26 and 2.27, the detected lines were not different enough to discriminate between both situations. This method is usually useful for detecting roads in aerial images. However, straight lines do not seem to fit the driver’s silhouette. The method can also be applied to curved lines by defining a reference shape beforehand. In this case such a shape might not be clearly definable, due to the variety of drivers and camera positions.

2.3.3.5 General silhouette

Since maneuvers and changes in position seem to generate white areas in the STD of jerk images, another possibility in the identification of events was to define a certain area where the reactions were likely to take place to avoid false positives.
To define this area, several silhouettes obtained from different drivers of the sample were combined into a single one. Three different procedures were applied using the command wfusimg in MATLAB. This command merges two images using fusion methods. Thus, the images containing the drivers’ silhouettes from STD of jerk during the events were merged in pairs (see procedure in Figure 2.28). In Figure 2.28, x and y are sub-images from intermediate fusions of pairs of images, and zt represents the final merged image.

Figure 2.28 Schema of combination of silhouettes. [Diagram: silhouettes 1 to 10 from STD of jerk (2-second sequences during events) are merged in pairs into the intermediate images x, x2, x3, x4, x5, y, y2 and z, and finally into zt.]

The command wfusimg allows the fusion method to be defined separately for approximations and for details. The following are the zt images resulting from variations in these parameters:

Figure 2.29 Resulting images using different inputs in the fusion command. [Panels: maximum for approximations and minimum for details; maximum absolute for approximations and details; mean for approximations and details.]

Matrices of STD of jerk contain different values depending, for instance, on the movement of the driver during the sequence and on the illumination conditions. Merging images based on mean values for approximations and details (see image on the right in Figure 2.29) tends to highlight the drivers with the widest variances in such matrices.

However, the merged silhouette should give an idea of the area in which reactions commonly take place, regardless of STD values. Since the merged images with maximum and minimum levels were quite similar, one of them was selected and a freehand region was drawn around the driver’s place (see image on the left in Figure 2.29). Areas of windows, rear seats and the steering wheel were not taken into account to avoid false positives.
Although several reactions and evasive maneuvers involve turning the steering wheel, this area tends to cause confusion when discriminating between positive and negative events. This procedure was just an approximation to facilitate the study of changes in pixel intensities in a given area. The position of this Region Of Interest (ROI) was saved into an N-by-2 array in MATLAB. This ROI could be applied as a binary mask to the image in combination with the rest of the methods, aiming to improve their performance.

2.3.3.6 Gray level co-occurrence matrix

Texture filters often use the image’s histogram to evaluate the texture statistically. Although this provides information about the properties of the texture, its shape and spatial distribution over the image remain unknown (IZMIRAN, 2005). Another statistical procedure of texture analysis, one that does consider the spatial distribution, is the Gray Level Co-occurrence Matrix (GLCM). The GLCM records how often each combination of pixel intensities occurs in pairs of related pixels in an image (see procedure in Figure 2.30). This texture analysis originates from Haralick et al. (1973) and is today commonly used in medical image processing, in modeling forest attributes and in studying sea ice, among other fields. In this case, the method was intended to identify the driver’s silhouette in images of STD of jerk based on its distribution.

Figure 2.30 Process Used to Create the GLCM, [electronic print] Available at [Accessed May 2011].

This method can be applied in two main steps:

Definition of GLCM: Calculating the frequency of a certain relationship between pixels requires the choice of:
- Offset: distance between the related pair of pixels.
- Direction of offset: direction in which the pair of pixels is evaluated. This choice is based on a visual examination of what is likely to be most characteristic of the texture.
- Gray levels: the input image is scaled to a certain number of intensity levels.
The fewer the levels, the shorter the computational time. In addition, the statistical study improves when the number of levels is reduced.

Calculation of statistics using GLCM: Once the GLCM is defined, several statistical methods can be used to identify the texture’s properties. Hall-Beyer (2007) has created an online tutorial on how to define a GLCM and what it can be used for. She defines three main groups of measures derived from GLCM calculations, which are summarized below together with the properties offered in MATLAB:

Measures group 1: distance to the GLCM diagonal (contrast)
- Contrast: the diagonal of the GLCM contains pairs of pixels with the same gray level. If these combinations occur with high frequency, the image does not have much contrast. This measure (the sum of squares variance) gives more weight to entries farther from the diagonal (= 0 for a constant image).
- Homogeneity: closeness of the distribution of combinations in the GLCM to its diagonal. It increases with less contrast (= 1 for a diagonal GLCM).

Measures group 2: how regular the pixels are within the image
- Energy: uniformity of the image, measured by adding the squared elements of the GLCM (angular second moment) (= 1 for a uniform image).

Measures group 3: descriptive statistics of GLCM
- GLCM correlation: dependency of gray levels between neighboring pixels (= +1 or -1 for a perfectly correlated image). This does not take into account how often a pixel value occurs, but how often it occurs together with a given pixel value.

Table 2.4 contains some results of testing the GLCM together with a Fusion mask on two drivers of the sample. The drivers were turning the steering wheel and moving the hand in the previous sequences. These cases were chosen as problematic situations that could interfere with the recognition of the driver’s silhouette.

Table 2.4 Properties in four directions of the GLCM in drivers A241 and A686 (offset=200).
Sequence      Driver   Contrast                          Correlation                          Energy
Event         A241     [0.1598 0.0007 0.2925 0.0031]     [0.3671 -0.0004 0.2036 -0.0016]      [0.8380 0.9984 0.6473 0.9938]
Event         A686     [0.6412 2.1862 1.9693 0]          [-0.0565 -0.2229 -0.0400 NaN]        [0.7625 0.3703 0.3482 1]
2-sec. bef.   A241     [0.0595 0 0.1335 0]               [-0.0085 NaN -0.0289 NaN]            [0.9580 1 0.8710 1]
2-sec. bef.   A686     [0.0283 0.0696 0.3453 0]          [-0.0143 -0.0360 -0.0402 NaN]        [0.9451 0.8705 0.6109 1]
4-sec. bef.   A241     [0.0304 0 0.1766 0]               [-0.0085 NaN -0.0356 NaN]            [0.9625 1 0.8439 1]
4-sec. bef.   A686     [0 0 1.4572 0]                    [NaN NaN 0.0481 NaN]                 [1 1 0.5422 1]

Contrast was one of the properties that showed the most significant differences between the event and the previous sequences. These differences became more evident when the offset between the pair of pixels was increased. This might be due to the size of the driver’s torso in the silhouette: the white areas produced by maneuvering or moving the hand might not be spread widely enough to show the same effect. With respect to the correlation, in driver A241 the value recorded in the first direction (horizontal) was positive during the event and negative in the previous sequences. However, this effect was not observed in driver A686, since the values were quite similar in both negative and positive sequences (see footnote 12). In the case of the energy, some values were higher in the previous sequences than during the event, depending on the direction of the GLCM. This might indicate a higher uniformity in images from negative situations.

12 Note that the NaN values obtained in some directions when calculating the correlation mean that the GLCM variance is zero. The image is therefore completely uniform with respect to the defined combination of pairs of pixels.

2.4 Evaluation criteria.
Data set definition

The goal of this project is to create an algorithm able to run through the triggered events of the database and to save only those in which the drivers react in the presence of a CRE. Potential methods for recognizing the driver’s reaction have been discussed in previous sections using some positive and negative situations. Their performance in the training sample gives an idea of which combinations are more likely to identify the CREs. Nevertheless, evaluating the tested methods requires a larger data set containing situations different from those used previously. This data set, hereafter called the validation data set, comes from a triggering process with kinematic triggers in the euroFOT database and a subsequent evaluation by the annotators. The validation data set contains 120 different situations chosen randomly among the events that were either considered positive or rejected by the annotators when watching the videos of CRE candidates.

Figure 2.31 Schema of procedure of algorithms’ evaluation.

Several thresholds have been considered when implementing the algorithms in the baseline. If the threshold is not strict, it results in a greater number of true positives (CREs rightly triggered), but also of false positives (normal driving situations wrongly triggered as positives). The ideal situation would capture only the 19 positive events and none of the negative ones (19 true positives and 101 true negatives). Since this would only be possible in a further study, with adequate adjustments building on this preliminary project, a compromise is needed between the true positives achieved and the false positives accepted. Such a compromise can be represented in terms of specificity and sensitivity using Receiver Operating Characteristic (ROC) curves.
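The effect of a more or less strict threshold can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the function name `roc_points`, the rule that an event is triggered when its score is strictly above the threshold, and the toy scores are all hypothetical and are not taken from the euroFOT data. It assumes that both classes are present in the labels.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    scores: one numeric value per event (hypothetical criterion output).
    labels: True for annotated positive events, False for negative ones.
    An event is triggered when its score is strictly above the threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # True positives: positive events that are triggered.
        tp = sum(1 for s, y in zip(scores, labels) if y and s > t)
        # False positives: negative events that are triggered.
        fp = sum(1 for s, y in zip(scores, labels) if not y and s > t)
        points.append((fp / negatives, tp / positives))
    return points

# Toy example (synthetic values): three positive and three negative events.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [True, True, True, False, False, False]
points = roc_points(scores, labels, thresholds=[0.75, 0.35, 0.1])
```

In this toy example, lowering the threshold raises the true positive rate and the false positive rate together, which is the compromise described above.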
The ROC curve is a graphical representation of the rate of true positives against the rate of false positives for different thresholds in a diagnostic test (Tape, n.d.). The method was originally developed during World War II for radar signal detection (Mason and Graham, 2002). Nowadays it is widely used in the medical field for the diagnosis of diseases. There is a trade-off between sensitivity (rate of positives correctly diagnosed by the test) and specificity (rate of negatives correctly diagnosed by the test): if the sensitivity increases, the specificity decreases, and vice versa. In this case, true positives are CREs rightly triggered (Y-axis) and false positives are normal driving situations wrongly triggered (X-axis).

[Figure 2.31 diagram: the algorithm (potential trigger) is applied to the validation data set of 120 events (19 positive, 101 negative); results: ROC curve.]

The closer the curve is to the upper left corner, the more accurate the test (see representation in Figure 2.32). Conversely, the less accurate the test, the closer the curve is to the diagonal. The area under the ROC curve (AUC) is commonly used as a measure of accuracy. The following values can be used as a guide (Tape, n.d.):

.90-1 = excellent (A)
.80-.90 = good (B)
.70-.80 = fair (C)
.60-.70 = poor (D)
.50-.60 = fail (F)

Figure 2.32 Tape T., The Area Under an ROC Curve, [electronic print] Available at [Accessed on May 2011].

In this case, the main limitation of using the AUC when comparing methods is that it is more important to save true positives, even if this means an increased number of false positives. Thus, evaluating the methods over the whole range of false positive rates (the area under the entire ROC curve) does not seem the most appropriate approach. One alternative is to analyze a portion of the ROC curve (Katzman, 1989; Cleveland, 2011). The relevant portion of the curve can be estimated as the region with a false positive rate below 60% and a true positive rate above 80%.
The main reason is to keep almost all the true positives (sensitivity) even if it means increasing the false positives (1-specificity). Therefore, the negative events in the database may be reduced by at least 40% without losing more than 20% of the positive events. For the validation data set, this means removing about 40 of the 101 negative events while keeping at least 16 of the 19 positives.

3 Results

In the previous chapter, several algorithms were considered and tuned on a training sample of eleven drivers, with three different situations for each driver, with the aim of distinguishing CREs within a collection of negative and positive events. This chapter covers the performance of these algorithms in the training sample and their validation on a larger data set. The ideal algorithm should identify as many positive events as possible while triggering as few negative situations as possible. Results are presented below using ROC curves.

3.1 Performance in the training sample

Throughout the previous chapter, several methods were applied to recognize the driver’s reaction and thereby identify safety-critical situations. Initial approaches, such as classification based on t-test results, seem to generate noisy images and an unclear definition of the state of the driver’s motion. Nevertheless, analyzing changes in pixel intensities over time suggests that the sought motion may be related to a sudden change in the intensities of a group of pixels. The same concept lies behind the images of STD of jerk and the OF calculations. In most cases, it is possible to identify the positive event just by looking at the grayscale images of STD of jerk, without any additional information. The key to this identification is the silhouette of the driver, which means that there is a group of pixels that share a wide variance of jerk distribution over time.
In the case of the OF, peaks in the jerk distribution obtained from the average of OF velocities in each frame help to discriminate between the previous sequences and the event. The calculation is based on average speeds, so a peak indicates that a group of pixels changes quickly between frames. To assess the validity of this theory, these calculations must be performed throughout the entire training sample. The following is an example of the results obtained for one of the drivers. Jerk distributions are presented together with the images of STD of jerk, which are also evaluated as distributions of the sum of values along rows. Appendix 4 covers the same calculations for all the drivers of the training sample.

It is expected that the driver’s silhouette appears during the event when plotting the STD of jerk. The distribution of the sum of values in rows for each column aims to distinguish between normal driving maneuvers (just certain white areas in the images) and reactions in CREs. Thus, normal maneuvering may be related to local peaks in these curves, while a higher mean may be related to the positive events. This is because the white areas are spread across the image to reproduce the driver’s silhouette.

Figure 3.1 STD of jerk: 4-sec. before the event, 2-sec. bef. and event (driver A484).

The driver remains in the same position in the sequences before the event. Therefore, the images of STD of jerk seem to be a clear indicator of when the driver reacts. As can be seen in Figure 3.2, the sum of values along the rows for each column is also significantly higher during the event than in the previous sequences.

Figure 3.2 Distribution of STD of jerk values along rows and columns.

Since the driver remains in the same position over time, distributions of jerk should be relatively constant before the event. Some unexpected results were obtained four seconds before the event at the 8th iteration, as shown in Figure 3.3. In any case, the maximum jerk is reached during the event.
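The range of jerk used to compare sequences can be sketched as the difference between the largest and smallest jerk values of a sequence. The Python sketch below is an illustration only: approximating jerk as the second finite difference of the per-frame mean OF speed, the unit time step and the function names are assumptions, not the definition or code used in the thesis.

```python
import numpy as np

def jerk_from_speeds(mean_speeds, dt=1.0):
    """Approximate jerk as the second finite difference of the mean speed.

    mean_speeds: one average optical-flow speed per frame (hypothetical input).
    dt: time step between frames (assumed constant).
    """
    v = np.asarray(mean_speeds, dtype=float)
    return np.diff(v, n=2) / dt ** 2

def value_range(values):
    """Return the maximum minus the minimum of a sequence of values."""
    values = np.asarray(values, dtype=float)
    return float(values.max() - values.min())

# Synthetic speeds, not measured data: a sudden change produces a wide range.
steady = jerk_from_speeds([1.0, 1.0, 1.0, 1.0, 1.0])
sudden = jerk_from_speeds([1.0, 1.0, 4.0, 1.0, 1.0])
steady_range = value_range(steady)
sudden_range = value_range(sudden)
```

A sequence with a steady driver gives a range close to zero, while a single abrupt change in the mean speed produces both a positive and a negative jerk peak and therefore a wide range.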
Figure 3.3 Distribution of jerk from OF velocities. Range: 1.694-(-3.745)=5.439

3.1.1 Optical Flow

Findings in the training sample support the hypothesis that peaks in the distribution of jerk from OF velocities are related to the driver’s reaction in the presence of a CRE. This hypothesis holds for ten of the eleven drivers of the sample, for whom the ranges of jerk are significantly higher during the event than in the previous sequences. Two uncertainties remain: which range of jerk corresponds to the driver’s reaction, since these values differ for each driver (see Figure 3.4), and the computational cost of running the optical flow code. Given the dimensions of the database, the computational time is an important limitation.

Figure 3.4 Ranges of jerk for drivers of the training sample.

To sum up, jerk peaks from OF velocities and images of STD of jerk were identified as potential indicators of positive events. Both methods base the discrimination on the presence of the driver’s reaction when CREs occur. The main limitation of the OF calculation is its computational time. On the other hand, a relationship was observed between the driver’s reaction and the images of STD of jerk. In most cases it was possible to identify the positive event just by looking at the driver’s silhouette. As this is graphical information, several conversions were examined in the previous chapter to turn it into numerical form.

3.1.2 Mean criterion

The figure below shows the distribution of mean values for each sequence for all the drivers of the training sample:

Figure 3.5 Mean of distribution of sum of STD of jerk values along rows for all the drivers of the training sample

As shown in Figure 3.5, the mean values are higher during the event than in previous sequences in nine of the eleven drivers of the sample.
In both exceptions (5th and 10th position in the sample), maneuvering in the sequences before the event generates peaks in the distribution of STD of jerk values and, consequently, the mean value increases. Images of STD of jerk for both cases are presented below to analyze why the mean value differs from those obtained in the rest of the sample.

Figure 3.6 STD of jerk in negative sequences of drivers A567 and A686.

The marked areas, produced by moving the hand and turning the steering wheel, contain pixels with a wide variance in jerk values over time. This causes the mean to be higher in such situations than the value obtained from the positive event. Thus, the mean criterion does not seem consistent enough on its own to discriminate between positive and negative situations. Since the values were added only along rows, the distributions can also be tested in the other direction. This involves calculating the mean of the distribution of the sum of STD of jerk values along columns instead of rows. Distributions in both directions for one of the two exceptions are plotted in the figures below together with the number of zeros (black color) in the images:

Figure 3.7 Distribution of sum of STD of jerk values along rows (driver A567).

Figure 3.8 Distribution of sum of STD of jerk values along columns (driver A567).

As can be seen in Figures 3.7 and 3.8, the peaks of the sum of STD values are higher in the previous sequence than during the event, along both rows and columns. The white area produced when turning the steering wheel therefore still generates a higher mean in this case. Besides, the number of zeros is very similar in all the sequences. Comparing this result with those obtained for the rest of the sample, it is observed that in both exceptions (drivers A567 and A686) the mean is also higher in other sequences than during the event (see Table 3.1).
Regarding the number of zeros, this value is not significant enough to distinguish between positive and negative sequences.

Table 3.1 Mean of distributions of sum of STD over columns and number of non-zero values in the training sample.

          Mean of STD distribution over columns      Number of non-zero values
Driver    4-sec. bef.   2-sec. bef.    Event         4-sec. bef.   2-sec. bef.    Event
A34              4190          8092    17920               144.2         148.4    172.5
A241             2885          3869     6198               126.8         130.5    152.7
A481             2505          1892     5295               183.7         181.9    191
A501             6204         10410    12010               162.9         178.7    181.9
A686             7393          3485     5489               128.7         124.5    130.1
A1064            2071          1916     3656               122.8         124.6    132.9
A131             9519         10330    20950               168.4         163.6    200.2
A352             3342          4045     8982               205.2         206.9    217.9
A484             1505          1381     7214               158.6         156      172
A567             3767          6708     6287               154.8         156.2    154.7
A936             2121          5958     9711               98.91         108.2    142.9

In conclusion, the mean values of the distributions of the sum of STD of jerk along rows and columns were calculated to recognize the driver’s silhouette as a wider dispersion of intensities over the images. Since this value is affected by concentrated areas in the image caused by maneuvering and changes in position, other statistical measures are taken into account.

3.1.3 Harmonic mean

As was done with the mean in the previous section, the harmonic mean is estimated for the distributions of the sum of STD of jerk values along rows and columns. Again, the aim is to locate the silhouette through the dispersion of pixels in the image, with the difference that the harmonic mean is less affected by large outliers. The sum of each pair of values (harmonic means in rows and columns) is presented below as “Combination of harmonic means” for all the drivers of the training sample.

As shown in Figure 3.9, the combinations of harmonic means reach higher values during the event than in previous sequences for all the drivers of the training sample.
This result is also observed for the drivers in 5th and 10th position in the sample, the exceptions to the mean criterion, who in this case register a higher sum of harmonic means during the event.

Figure 3.9 Distribution of combination of harmonic means in the training sample.

Given the range of values of the harmonic means across drivers (see Figure 3.9), the main issue is to establish a threshold able to identify as many events as possible while triggering as few negative situations as possible.

3.1.4 Mean & General mask

The mean value was also calculated using the Fusion mask, a binary mask created from the combination of several drivers’ silhouettes. Figure 3.10 shows the mean values obtained by applying the binary Fusion mask to the images of STD of jerk. Although the training sample is not large enough to be statistically meaningful, these results suggest higher means in the images from events than in those from previous sequences.

Figure 3.10 Mean values by applying Fusion mask in STD of jerk images.

3.1.5 GLCM properties

The contrast and the energy of the GLCM have also been tested in the entire training sample using an offset of 200:

Figure 3.11 Sum of contrast in four directions of the GLCM in the training sample.

Figure 3.12 Energy of the GLCM in the training sample

Figure 3.11 shows the sum of contrasts in four different directions of the GLCM with an offset of 200 pixels for the sequences of the training sample. This value appears to be greater in some of the positive events than in the previous sequences, but it is not a clear discriminator in some cases. The same occurs when energy is used as the GLCM property under study. Values seem to be generally lower during the events than in the previous sequences. The main limitation would be to set a value that discriminates between both situations.
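The GLCM contrast and energy discussed in this section can be sketched in a few lines. The code below is a pure NumPy illustration, not the MATLAB routines used in the project; the function names, the (row, column) offset convention and the assumption that the image is already scaled to integer gray levels are choices made for the example. Contrast is computed as the sum of the GLCM entries weighted by the squared gray-level difference, and energy as the sum of the squared entries.

```python
import numpy as np

def glcm(image, offset, levels):
    """Normalized co-occurrence matrix for one (row, col) offset.

    image: 2-D array of integer gray levels in [0, levels).
    offset: (dr, dc) displacement from reference pixel to neighbor.
    """
    img = np.asarray(image, dtype=int)
    dr, dc = offset
    rows, cols = img.shape
    # Keep only reference pixels whose neighbor lies inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = img[r0:r1, c0:c1]
    nb = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nb.ravel()), 1)
    total = counts.sum()
    return counts / total if total > 0 else counts

def glcm_contrast(p):
    """Sum of p(i, j) * (i - j)^2; zero for a constant image."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def glcm_energy(p):
    """Sum of squared GLCM entries; one for a constant image."""
    return float(np.sum(p ** 2))

# Synthetic two-level images, not data from the study.
uniform = glcm([[1, 1], [1, 1]], offset=(0, 1), levels=2)
checker = glcm([[0, 1, 0], [1, 0, 1]], offset=(0, 1), levels=2)
```

For the uniform image all pairs fall on the GLCM diagonal, so the contrast is zero and the energy is one; for the alternating pattern every pair differs by one level, so the contrast rises and the energy drops.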
3.2 Results in the validation data set

As mentioned previously, the training sample is used to test different methods and to identify potential features for discriminating between positive and negative events. However, it is not large enough to be statistically meaningful. Therefore, the evaluation requires the use of the validation data set.

Below, ROC curves are plotted for each method with different combinations of image masks and thresholds. The range of threshold values has been chosen according to the results of the training sample. The resulting points are represented as dots on the graph over the entire range of false positive rates, which gives an idea of the accuracy of the curve. However, only the area under a certain portion of the curve is relevant. It is bounded by two lines on the graphs. The largest area within these boundaries determines which method is the most accurate according to the requirements specified in the Evaluation criteria in Chapter 2.

Another consideration when comparing the methods is the computational time. This is estimated as the time (in seconds) needed to process each second of trip. It is calculated from the time taken to compute all the iterations over the threshold values, considering the two-second duration of each file in the baseline.

3.2.1 Mean criterion

The mean criterion evaluates the presence of the driver’s silhouette in images of STD of jerk by adding STD values along rows and columns. Both vectors of partial sums are combined into a single vector, and the mean of this combined distribution is calculated. Sixty iterations have been considered by changing the threshold values with a step of one unit. Three different input images have been considered:

- Without mask: original image cropped around the torso.
- BW mask: binary mask hiding the window.
- Fusion mask: binary mask around the area in which drivers’ silhouettes commonly appear.
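The mean criterion described above can be sketched as follows. This Python/NumPy code is a minimal illustration under stated assumptions: the function names `mean_criterion` and `is_triggered` are hypothetical, the two vectors of partial sums are assumed to be combined by concatenation, and masked-out pixels are assumed to be set to zero. It is not the MATLAB implementation used in the project.

```python
import numpy as np

def mean_criterion(std_image, mask=None):
    """Mean of the combined row-sum and column-sum distribution.

    std_image: 2-D array of STD-of-jerk values (hypothetical input).
    mask: optional binary array of the same shape; pixels where the mask
          is zero are ignored by setting them to zero (assumption).
    """
    img = np.asarray(std_image, dtype=float)
    if mask is not None:
        img = img * np.asarray(mask, dtype=bool)
    row_sums = img.sum(axis=1)   # one partial sum per row
    col_sums = img.sum(axis=0)   # one partial sum per column
    combined = np.concatenate([row_sums, col_sums])
    return float(combined.mean())

def is_triggered(std_image, threshold, mask=None):
    """An event is triggered when the criterion exceeds the threshold."""
    return mean_criterion(std_image, mask) > threshold

# Synthetic values for illustration only.
image = [[1, 2],
         [3, 4]]
fusion_like_mask = [[1, 0],
                    [0, 1]]
plain = mean_criterion(image)
masked = mean_criterion(image, fusion_like_mask)
```

Sweeping the threshold passed to `is_triggered` over a range of values and recording the true and false positive rates at each step is one way to obtain the points of a ROC curve for this criterion.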
Compared with the evaluation criteria discussed in Chapter 2, the curves in Figure 3.13 come close to the shape associated with good accuracy. However, better results would be obtained if the curves were closer to the upper left corner. The area under the bounded portion of the curve seems largest when no mask is used. The second best option according to this area is the fusion mask.

Figure 3.13 ROC curves of thresholding with different combinations of mean criterion.

3.2.2 Harmonic mean

The distribution of partial sums of STD of jerk along rows and columns is now evaluated using the harmonic mean. Unlike the previous test, this method discounts the outliers of that distribution, which are mostly related to maneuvers. The threshold values vary between harmonic means of 400 and 10000, resulting in a stepped ROC curve.

Figure 3.14 ROC curves of thresholding with harmonic mean criteria.

The harmonic mean criterion emerged as an alternative to the mean. Among the 33 cases of the training sample, the harmonic mean was higher in the 11 that were positive. However, as can be seen in Figure 3.14, the bounded area is null. The slope at the beginning of the ROC curve is favorable, in that sensitivity increases faster than (1-specificity). Nevertheless, beyond a certain threshold value, further variations do not seem to affect the rates of true and false triggers. The best result obtained with this method according to the initial criterion is achieved with harmonic means above 2635. In that case, the false positive rate is 60.4%, while the true positive rate is 89.4%.

3.2.3 Ranges of jerk from OF

The calculation of the optical flow (OF) is a numerical alternative to the use of STD images for estimating rates of change in pixel intensities. Events in the baseline have been triggered above three different intermediate values of range of jerk (1, 2 and 6).
This low sampling rate is mainly due to the computational cost of implementing the OF.

OF velocities have been calculated using the original images and their combinations with binary masks. For the three cases shown in Figure 3.15, the ROC curves are close to the diagonal. Although the results are slightly better with the fusion mask, the method seems inaccurate for identifying positive events. This contrasts with the results of evaluating the OF in the training sample. In ten of the eleven drivers, the peak in the jerk distribution in positive situations was clearly significant in comparison with those obtained in negative sequences. The main limitation arises in the threshold value, since it changes for each driver.

Figure 3.15 ROC curves of thresholding with OF Criteria.

In any case, the potential application of this method together with the fusion mask reduces the number of negatives by nearly 40% (38 of 101) while triggering 17 of the 19 positive events (see Table 3.2).

Table 3.2 Results of triggering with ranges of jerk of OF speeds above one in combination with fusion mask over the original images.

Range>1     TRUE    FALSE
Positive      17        2
Negative      38       63

3.2.4 GLCM properties

A statistical approach to identifying the driver’s silhouette in images of STD can be based on the spatial distribution of pairs of pixels over the images. This involves estimating how the properties of the GLCM change in positive and negative events. As can be seen in Figure 3.16, the trend at the beginning of the ROC curve when thresholding with Energy values is better than that obtained using the Contrast. This is due to an incre