Recognizing safety-critical events from 
naturalistic driving data

Master’s Thesis in the Master’s programme of Automotive Engineering  

NIEVES PAÑEDA GONZÁLEZ

Department of Applied Mechanics 
Division of Vehicle Safety 
CHALMERS UNIVERSITY OF TECHNOLOGY 
Göteborg, Sweden 2011 
Master’s thesis 2011:38 




Recognizing safety-critical events from naturalistic driving data
Master’s Thesis in the Master’s programme of Automotive Engineering
NIEVES PAÑEDA GONZÁLEZ

© NIEVES PAÑEDA GONZÁLEZ, 2011

Master’s Thesis 2011:38
ISSN 1652-8557
Department of Applied Mechanics
Division of Vehicle Safety
Chalmers University of Technology
SE-412 96 Göteborg
Sweden 
Telephone: + 46 (0)31-772 1000

Cover:
Curso de amaxofobia para profesores de autoescuela en Córdoba, 2011, Nicole Kidman, 
Available at: <http://www.blogdelaautoescuela.com/blog/wp-content/uploads/2009/02/
amaxofobia.jpg> [Accessed 20 May 2011].  

Chalmers Reproservice / Department of Applied Mechanics 
Göteborg, Sweden 2011



Recognizing safety-critical events from naturalistic driving data
Master’s Thesis in the Master’s programme of Automotive Engineering
NIEVES PAÑEDA GONZÁLEZ
Department of Applied Mechanics
Division of Vehicle Safety
Chalmers University of Technology

ABSTRACT

New trends in research on traffic accidents involve conducting Naturalistic Driving 
Studies (NDS). NDS are based on large-scale collection of driver, vehicle and 
environment data in real traffic. They provide large data sets that have proven 
extremely valuable for the analysis of safety-critical events such as near crashes 
and incidents.

NDS data need to be filtered to recognize safety-critical events. Such filtering has 
traditionally been achieved with kinematic triggers (e.g. searching for longitudinal 
acceleration below a certain threshold, signifying harsh braking). The low sensitivity and 
specificity of this filtering procedure, however, require manual annotation of video data 
to decide whether the events identified by the triggers are actually safety-critical. 
This reviewing procedure is based on subjective decisions, is time-consuming, and is 
often tedious for the analysts.

This project looked into improving this reviewing procedure using video data collected 
from 100 Volvo cars during one year in Gothenburg within an NDS called euroFOT. 
More than 400 videos of triggered events were reviewed, leading to the conclusion that 
the driver’s reaction may be the key to discriminating safety-critical events. In fact, 
whether an event is safety-critical or not depends on the driver. Several statistical 
procedures were then applied to automatically recognize driver reaction from video data. 
In this project, we showed how combining automated video analysis with kinematic 
triggers increases the sensitivity of near-crash recognition from NDS data. These results 
open up new ways to use video frames in NDS.

Key words: naturalistic driving, driver behavior, traffic safety, near crashes, safety-
critical events, driver’s reaction, euroFOT



Contents

INTRODUCTION

Naturalistic Field Operational Tests: real-traffic data
State of the art of N-FOTs: EuroFOT
Available data from VCC (euroFOT)

Data reduction approach: triggering data

What is safety-critical? Driver behaviour in NDS

Purpose

METHODS

Driver’s reaction recognition. General assumptions

Definition of training sample

Recognition of driver’s reaction. General structure
Data description & Image pre-processing
Recognition of driver’s reaction in sequences
Silhouette detection in STD of Jerk images

Evaluation criteria. Data set definition

RESULTS

Performance in the training sample
Optical Flow
Mean criterion
Harmonic mean
Mean & General mask
GLCM properties

Results in the validation data set
Mean criterion
Harmonic mean
Ranges of jerk from OF
GLCM properties
Analysis of false negatives and positives
Mean criterion in motion’s detection
Comparison

DISCUSSION & CONCLUSIONS

Where did the idea of recognizing driver’s reaction come from? Triggering in euroFOT based on the 100-Car study algorithms

Recognizing drivers’ reaction as potential trigger

Final conclusions

CHALMERS, Applied Mechanics, Master’s Thesis 2011:38



REFERENCES

APPENDIX 1

APPENDIX 2

APPENDIX 3

APPENDIX 4


Preface

This project concludes the academic education that I have pursued in Spain over the last 
years. “Recognizing safety-critical events from naturalistic driving data” has given me 
the opportunity to learn about traffic safety in the multicultural environment of the open 
area at SAFER. Personally and professionally, I will never forget this experience in 
Sweden. There are many people I would like to thank: 

Thanks to those who make naturalistic driving studies possible. In particular, thanks to 
Volvo Cars for allowing me access to their database for this research. Thanks to all the 
participants who were recorded while driving for their contribution to knowledge about 
driver behaviour. Without them this project would not have been possible. 

Thanks to SAFER, where I worked during the last months. It was a pleasure to be part of 
this family and of this group’s incredible work to save lives. 

It is said that a good teacher teaches, and the best teacher inspires. To my supervisor 
Marco Dozza, thanks for inspiring me during this project. Thanks for this opportunity, 
for trusting me from the beginning and for your guidance during these months. I feel 
very lucky to have worked not only with a great professional, but with a great person.

Thanks to the University of Oviedo for letting me participate in this international 
exchange. In particular, thanks to my supervisor in Spain, Ramón Rubio.

To my friends and to everyone with whom I have shared this experience, thanks for making 
this year in Göteborg unforgettable.

To my family, muchas gracias for your unconditional support in my life plan.

Göteborg June 2011

Nieves Pañeda González



1 Introduction

This chapter gives the reader an overview of Naturalistic Driving Studies (NDS) and 
their implementation together with Field Operational Tests (FOTs). In particular, the 
euroFOT project is introduced as the basis of this project. The chapter also covers the 
limitations found in previous studies and formulates the research question and 
objectives of the present project.

1.1 Naturalistic Field Operational Tests: real-traffic data

Statistics show that more than 1.2 million people die on the roads in traffic accidents 
every year (WHO, 2009). Technological advances allow the development of new in-car 
systems that mitigate road accidents by automatically detecting risk situations. To 
make this possible, it is essential to know the real causes of accidents. 

New trends in research on traffic accidents involve conducting Naturalistic Driving 
Studies. A Naturalistic Driving Study (NDS) is a “method of observation that captures 
driver behaviour in a way that does not interfere with the various influences that govern 
those behaviours” (Boyle et al., 2009). Statistics and crash investigations rarely provide 
information about behavioural issues before the incident, and in simulations test subjects 
are well aware of the experimental conditions. NDS therefore aim to collect data on 
driver behaviour in a natural setting. In these naturalistic observations drivers use, 
preferably, their own car equipped with cameras during their daily driving. Experience 
in this field shows that drivers quickly forget the presence of cameras.

On the other hand, new technologies enable the collection of large amounts of data, 
such as vehicle dynamics or environment data, in real traffic within large-scale 
testing programmes called Field Operational Tests (FOTs). FOTs are studies 
undertaken to evaluate the efficiency of intelligent in-vehicle systems as well as their 
impact on safety and their acceptance by drivers, among others (ERTICO, 2009). The main 
purpose of these systems is to assist and inform drivers while driving. Applied to the 
field of safety, this means alerting the driver or automatically acting on the car in the 
presence of what the system understands as a risky situation. 

To sum up, FOTs are a complementary step in the development of intelligent in-vehicle 
systems. The procedure is mainly based on: 

-Instrumenting cars with loggers to collect information from the CAN bus (signals 
from accelerometers, gyroscopes, turn indicators, etc.), GPS and/or extra sensors.

-Driving the equipped cars to collect data.

-Analysing the collected data.

Although FOTs and NDS have traditionally pursued different objectives, this view is 
changing. Their combination, called Naturalistic Field Operational Test (N-FOT), allows 
the unobtrusive observation of drivers to be used to evaluate both their relationship 
with the car and the environment under crash risk and the effectiveness of intelligent 
in-vehicle systems. 



1.1.1 State of the art of N-FOTs: EuroFOT

In recent years, FOTs and N-FOTs have been conducted in the United States, Asia and, 
more recently, Europe. In particular, the US has extensive experience in NDS with 
programs such as the 100-Car study, the 250-Truck study, the Commercial Vehicle 
Operation study and the Strategic Highway Research Programme (SHRP2). 

The 100-Car Naturalistic Driving Study (Dingus et al., 2006) was the first large-scale 
program, in which data from 100 drivers were collected during one year. The main goal of 
this research project was the study of contributing and associative factors (such as driver 
behavior, kinematic characteristics and corrective actions) in critical situations. In the 
ongoing SHRP2 project (TRB, 2011), data from 3000 volunteer drivers in instrumented 
cars will be collected. Its main goals are to redesign highways (congestion reduction, 
planning, environmental conditions) and to study human behavior for a safer highway.

Within the European experience in this field, the contributions of SAFER, the Vehicle 
and Traffic Safety Centre at Chalmers University in Sweden, stand out. Programs such as 
SeMiFOT (Victor et al., 2010), in collaboration with Michigan, developed an N-FOT 
methodology. Data were collected from 14 vehicles during six months, with the 
participation of 39 drivers who made 12,571 trips. The methodology is widely used in 
accident research and in the evaluation of safety and acceptance.

The ongoing second version, SeMiFOT2, uses the data collected in the first version of 
the program. New statistical methods, such as extreme value theory, are being explored 
to identify and model outliers. This provides useful information for insurance 
companies, for instance, to establish a link between rare events and catastrophic 
consequences (García, 2004). In addition, the analysis of visual motion in drivers is 
one of the main lines of research.

Other ongoing European projects are TeleFOT, 2BeSafe NDS, INTERACTION, 
TSSFOT, simTD and euroFOT (ERTICO, 2010). This research has accessed the data 
collected in euroFOT, whose characteristics are explained below.

Co-funded by the European Commission, euroFOT began in May 2008 and will last 
until February 2012, supported by 28 partners (vehicle manufacturers, automotive 
suppliers and research institutes, among others). As stated in the previous section, 
intelligent in-vehicle systems are tested to explore potential ways to improve European 
road traffic.

The tested applications in euroFOT may be classified as (ERTICO, 2010):

• Assisting the driver in forward/rear directional safety: 
- Adaptive cruise control
- Forward collision warning
- Speed Control System

• Assisting the driver to detect hazards at the sides of the car:
- Blind Spot Information System
- Lane departure warning / Lane Assist / Impairment Warning

• Advanced applications:
- Curve Speed Warning
- Fuel Efficiency Adviser
- Safe Human/Machine Interface



These functions have been tested in a fleet of 1000 instrumented cars from nine 
different brands across France, Germany, Italy and Sweden. This has led to one of the 
largest and most complete FOT databases in Europe for public research.

As can be seen in Figure 1.1, FOTs are operated on fleets managed by different OEMs 
around Europe.

Figure 1.1 Geographical coverage of euroFOT: OEMs and operation sites. (Mure 
S., 2010, EuroFOT [electronic print] Available at: 
<http://wiki.fot-net.eu/index.php?title=File:Eurofot.jpg> [Accessed May 2011]).

Depending on the project and the OEM, various devices are part of the test equipment 
used to collect data. These may be classified according to the source of the recorded 
signals:

-CAN bus.

-CAN bus and video cameras.

-CAN bus, video cameras and extra sensors (such as an eye tracker).

In addition to the test and evaluation of intelligent in-vehicle systems, some research 
focuses on naturalistic observation, hence the installation of cameras in the cars. In 
any case, the resources for data collection and storage are common to both types of 
project. Further driver data come from interviews and questionnaires. 

Both the kinematics of the car from loggers and the camera images have proved very 
useful when studying the interaction between driver, vehicle and environment during a 
crash-risk situation. Knowledge of driver behaviour and of the dynamics of the car before 
an accident allows possible causes to be hypothesised. This is a step towards the 
inclusion of new measures in accident prevention.



1.1.2 Available data from VCC (euroFOT) 

In particular, this research has accessed the data collected from 100 Volvo cars driving 
for a year in Gothenburg within the euroFOT program. After a certain period of 
continuous data collection, information from the loggers was downloaded and transferred 
to a network. These signals were then post-processed and stored in MATLAB variables. 

The available data are mostly signals from the CAN bus sampled at 10 Hz, GPS 
information, video images and signals from the eye tracker. These provide information 
on, for example, kinematic values (such as speed, lateral and longitudinal acceleration, 
brake pressure, yaw rate and steering wheel jerk, among others) or signals from 
intelligent in-vehicle systems and turn indicators.

A total of four cameras are installed in each of the instrumented cars. Two are located 
at the front and back of the car, mainly to reconstruct rear-end crashes and evaluate the 
traffic flow. One is located under the steering wheel to record the pedals and the feet 
movements. Finally, another camera is located in the rear-view mirror, facing the driver. 
Eye tracking data are also available.

1.2 Data reduction approach: triggering data

To understand the causes of road accidents and to further develop countermeasures, it is 
essential to analyze safety-critical situations. Identifying safety-critical situations 
among hours of normal driving is a challenge when loggers and cameras record 
continuously. Therefore, once data are collected, a filtering process is carried out 
before the analysis (see Figure 1.2). This process is commonly called triggering the 
data. The main goal of this data reduction approach is to discriminate between normal 
driving situations (negative situations) and critical events (positive situations) 
while driving.

Figure 1.2 General steps before the evaluation of safety in FOTs. [Diagram labels: 
Instrumenting cars; Driving: data collection; Triggering data; Thresholds; True 
triggered (positives, negatives); False triggered (positives, negatives); Analysis; 
Crash Relevant Events; Evaluating the impacts on safety.]

A more precise definition of what those critical situations are is given in the first 
large-scale FOT conducted in the US, the 100-Car study. The distinction is made as 
follows (Dingus et al., 2006):


-Crash: situations in which there is physical contact between the subject vehicle 
and another vehicle, fixed object, pedestrian, cyclist or animal. 
-Near-Crash: situations requiring a rapid, severe, evasive maneuver to avoid a 
crash.
-Incident: situations requiring an evasive maneuver occurring at less magnitude 
than a near crash. 

These safety-critical situations are grouped under the name Crash Relevant Events 
(CREs). Once they are located in the database, the next steps are to produce a detailed 
description (of the driver behavior, the environment, traffic conditions, etc.), draw 
conclusions and evaluate possible solutions.

Conventionally, CREs from naturalistic driving data have been isolated from the large 
database using kinematic triggers. These are pieces of code that run through the 
database and record situations with certain kinematic values. Most of these triggers are 
associated with common evasive maneuvers and acceleration peaks. For example, one 
of the most typical driver responses is to slam on the brakes to avoid a rear-end 
collision, which leads to peaks in longitudinal acceleration. Therefore, situations in 
which longitudinal acceleration falls below a certain threshold¹ may indicate a CRE. In 
that case, the recorded situations have been true triggered and constitute a list of 
CRE candidates.
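
The threshold trigger described above can be sketched as follows. This is a minimal illustration and not the euroFOT or 100-Car implementation: the function name `find_trigger_times` and the synthetic signal are hypothetical, while the 10 Hz sampling rate and the -4 m/s² threshold are the values mentioned elsewhere in this chapter.

```python
import numpy as np

SAMPLE_RATE_HZ = 10     # CAN bus sampling rate reported in Section 1.1.2
THRESHOLD_MS2 = -4.0    # example threshold given in the footnote


def find_trigger_times(long_accel, threshold=THRESHOLD_MS2, rate_hz=SAMPLE_RATE_HZ):
    """Return the start time (s) of each episode with acceleration below threshold.

    Hypothetical helper: consecutive samples below the threshold are grouped
    into one episode so that a single braking maneuver yields one candidate.
    """
    accel = np.asarray(long_accel, dtype=float)
    below = accel < threshold
    # An episode starts where a sample is below threshold and the previous one is not.
    previous = np.concatenate(([False], below[:-1]))
    starts = np.flatnonzero(below & ~previous)
    return starts / rate_hz


# Synthetic longitudinal acceleration (m/s^2), for illustration only.
signal = [0.0, -0.5, -1.0, -4.5, -5.2, -3.0, 0.0, -4.1, 0.2]
candidates = find_trigger_times(signal)
print(candidates)
```

Grouping consecutive samples is a design choice made here for the sketch; a real trigger would also need rules for merging nearby episodes and for combining several signals.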

However, triggering with kinematic values misses some CREs (positives that are not 
triggered, usually called false negatives) and wrongly triggers many normal driving 
situations (false positives). This is mainly because some cutoff kinematic values 
related to evasive maneuvers may be identical to those obtained during normal driving, 
owing to the diversity of drivers and driving styles. For instance, the same 
acceleration value may or may not be indicative of risk depending on the aggressiveness 
of the driver and his/her driving experience. Likewise, if signals such as braking are 
taken as reference, incidents in which the driver is distracted would be lost. Hence 
the importance of a precise definition of what a CRE is and of the development of 
intelligent triggers.

Among all the possible types of CRE, crashes may be the most likely to be detected, 
because physical contact is likely to cause sudden changes in kinematic parameters. 
Near-crashes and incidents, however, are closer to normal actions while driving. Thus, 
trying to locate these situations, which are also relevant from a safety and 
statistical point of view, creates a high rate of false positive events.

The low sensitivity and specificity of triggering with kinematic values require the 
intervention of reviewers, who decide whether a situation is critical by watching the 
video segments of the CRE candidates. Therefore, only the true triggered events that 
the annotators consider positive pass into the analysis phase. This reviewing procedure 
is mostly based on subjective decisions, time-consuming and often tedious for the 
annotators.
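
For reference, sensitivity and specificity can be computed from a set of annotated events as in the sketch below. The function name and the toy labels are hypothetical and are not results from this thesis; the sketch only assumes that each event carries a reviewer label (CRE or not) and a trigger decision.

```python
def sensitivity_specificity(is_cre, is_triggered):
    """Return (sensitivity, specificity) for paired boolean labels.

    is_cre:       True where the reviewers consider the event a CRE.
    is_triggered: True where the trigger selected the event.
    """
    tp = fn = tn = fp = 0
    for cre, triggered in zip(is_cre, is_triggered):
        if cre and triggered:
            tp += 1      # CRE found by the trigger
        elif cre:
            fn += 1      # CRE missed by the trigger (false negative)
        elif triggered:
            fp += 1      # normal driving wrongly triggered (false positive)
        else:
            tn += 1      # normal driving correctly ignored
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only.
reviewed = [True, True, True, False, False, False, False, False]
triggered = [True, True, False, True, True, False, False, False]
sens, spec = sensitivity_specificity(reviewed, triggered)
```

In practice, events that were never triggered are usually not reviewed, so the number of missed CREs (and therefore the sensitivity) can only be estimated.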


¹ Values taken as reference by each trigger to save results (if decelerations below 
-4 m/s² are kept, then acceleration is the trigger and -4 m/s² is the threshold).


1.3 What is safety-critical? Driver behaviour in NDS

The 100-Car study defines CRE as:

  “A subjective judgment of any circumstance that requires, but is not limited to, a crash 
avoidance response on the part of the subject-vehicle driver, any other vehicle, 
pedestrian, cyclist, or animal that is less severe than a rapid evasive maneuver (as 
defined in near-crash event), but greater in severity than a normal maneuver to avoid a 
crash(...)” (Klauer et al., 2006)

When annotators review the list of CRE candidates from the triggering process, their 
subjective judgment is based primarily on their perception of how critical the situation 
seems. This is consistent with the above definition, since annotators should evaluate 
whether the circumstance requires a crash avoidance response from the driver or others 
involved. 

Given how difficult it is to answer this question just by checking the kinematic values 
of the car or its proximity to other vehicles (objective judgment), each annotator 
mostly bases his/her opinion on his/her own driving experience. This raises a question: 
is what I think is critical also critical for you? It may be that the fairest answer to 
this issue requires some empathy with the subject-vehicle driver. This changes the 
question into: does the driver think that the situation is safety-critical?

The answers to this question in previous studies were based, for instance, on the force 
with which the driver depresses the brake pedal² or on changes in speech under 
threatening conditions (Malta et al., 2009). This is also related to the fact that 
around 60% of drivers brake before a crash (Molinero et al., 2009). The main limitation 
arises in those critical situations closer to normal driving in kinematic terms, such as 
near-crashes and incidents. These provide a large source of information and a definite 
benefit in safety and statistical analysis concerning NDS (Guo et al., 2010). 

There is extensive literature on how driving is affected by factors such as country, 
gender, age or lifestyle, among others (Evans, 2004). These factors imply a diversity of 
driving styles, hence the importance of using the driver as part of the analysis. This 
conclusion was also pointed out in the 100-Car study (Klauer et al., 2006).

The analysis of driver behaviour in NDS has been used, for instance, in the development 
of a model based on multi-modal signals (Takeda, 2010) and in the study of situations 
in which drivers approach intersections. In the latter case, a relationship was found 
between the distance to other vehicles and the location at which the driver covers the 
brake pedal (Sato and Akamatsu, 2007). Head and eye movements are also studied in 
relation to distraction at the wheel (Nagase et al., 2009). 

Regarding driver behaviour prior to a CRE, Molinero et al. (2009) define key events in 
situations with failed or absent manoeuvres. These include excessive speed and 
inappropriate reaction, which they relate to driver panic. This concept is present in 
the so-called oops reactions in SeMiFOT, used in the study of driver inattention 
associated with poor driving performance (Victor et al., 2010). They also highlight the 
importance of optimizing the CRE triggers.


² The brake pressure signal in combination with speed is a potential trigger detected 
while triggering an initial euroFOT dataset (see Appendix 1).


The main limitations in the identification of CREs in a large data set are the variety 
of drivers and the wide range of situations. A procedure based on what the driver is 
expected to do, such as evasive maneuvers, misses CREs and triggers a high rate of 
negative situations. Although the perception of what is risky and what can be done 
depends on the person, there may be a common feeling when someone realizes that 
something is wrong. This feeling may materialize in a particular body language that 
precedes any evasive action, if there is one.

1.4 Purpose

Conventional triggering does not seem very efficient at finding critical situations 
among hours of normal driving in a large database. Although kinematic filters can run 
automatically over the database, the high rate of false events requires the manual 
intervention of reviewers. This reviewing procedure is mostly based on the drivers’ 
reactions in images from cameras inside the cars. In addition, the procedure is 
time-consuming and often tedious for analysts. Furthermore, comparison of results 
between different NDSs may be inaccurate, given that the validations are subjective 
decisions of reviewers, which raises inter-subject and intra-subject reliability 
concerns.

A traditional triggering procedure applied to the initial euroFOT data set suggested the 
hypothesis that there is a relationship between driver motion and CREs. This idea arose 
after watching more than 400 videos containing 40 positive situations³.

The main objective of this thesis is to test this hypothesis by creating an algorithm 
able to automatically identify CREs among the events triggered with kinematic values in 
the euroFOT database. The algorithm is based on the recognition of the driver’s reaction 
from video images.

After a training sample was defined from the initial triggering procedure, several 
methods were applied to recognise the driver’s reaction using images from cameras inside 
the car. Once possible algorithms had been defined and tested on the training sample, 
the next step was to evaluate them on a larger data set. Conclusions from these 
procedures and suggestions for future research are addressed in the last chapters of 
this thesis.

The scope of this thesis excludes 1) the use of images other than those of the driver’s 
body and 2) the search for kinematic values related to the driver’s reactions. Further, 
this thesis takes a first step toward the integration of video information for 
triggering CREs, focusing on the driver’s reaction and not on the current possibilities 
of image-processing algorithms.

³ Further information in Appendix 1.


2 Methods

The following chapter presents the algorithms employed in this thesis to recognize 
drivers’ reactions from cameras inside the cars. The different algorithms were tested on 
a training sample containing two normal driving situations and one CRE for each of 
eleven different drivers. The intermediate goal was to find a method that allowed for 
automatic discrimination between true and false CREs.

Figure 2.1 Methodology. [Diagram labels: euroFOT database; training sample (11 
positives, 22 negatives); methods (t-test & vartest, optical flow, STD of jerk); from 
graphical to numerical information (mean, harmonic mean, GLCM props., Nº pixels in STD, 
edge detection); algorithms based on driver’s reaction recognition; validation data set 
(19 positives, 101 negatives); steps numbered 1 to 5.]

Figure 2.1 shows a schema of the methodology followed, whose steps are addressed in more 
detail in the following sections. In overview, they can be summarized as follows:

1 33 sequences, containing positive and negative situations, were extracted from 
the euroFOT database to define a training sample.

2 Then, three methods were applied to the training sample to discriminate between 
positives and negatives: the t-test & vartest, the Optical Flow calculation and the STD 
of jerk. The last two were identified as potential algorithms and entered the next phase. 

3 STD of jerk required an intermediate step to convert its graphical information into 
numerical information. Among several methods, the mean, the harmonic mean and GLCM 
properties were used as three different convertors that allow automatic detection. 

4 The three convertors of STD of jerk, together with the Optical Flow criterion, 
defined four potential algorithms based on driver’s reaction recognition for the 
identification of CREs.  

5 Finally, the four algorithms were tested first on the training sample and then on 
the validation data set. The latter was formed by 120 situations (101 negatives and 19 
positives) extracted from the euroFOT database. The results from this phase are explained 
in the next chapter.
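
The idea of a convertor in step 3, which reduces an image to a single number that can be compared against a criterion, can be illustrated with the sketch below. It assumes that the graphical output is available as a 2-D array of non-negative values; the function names and the small offset `eps` (added to avoid division by zero) are assumptions of this sketch and not details taken from the thesis.

```python
import numpy as np


def mean_score(image):
    """Arithmetic mean of all pixel values in the image."""
    return float(np.mean(np.asarray(image, dtype=float)))


def harmonic_mean_score(image, eps=1e-6):
    """Harmonic mean of all pixel values.

    eps is a hypothetical offset that keeps the reciprocal finite
    when the image contains zero-valued pixels.
    """
    values = np.asarray(image, dtype=float).ravel() + eps
    return float(values.size / np.sum(1.0 / values))


# Toy 2x2 "image" for illustration only.
toy = [[1.0, 2.0],
       [4.0, 4.0]]
print(mean_score(toy), harmonic_mean_score(toy, eps=0.0))
```

The harmonic mean is dominated by small values, so the two scores respond differently to an image with a few bright regions on a dark background; which of them separates positives from negatives better is an empirical question addressed in the results.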

2.1 Driver’s reaction recognition. General assumptions

As suggested by viewing the videos of CRE candidates triggered in an initial euroFOT 
data set, the key to discriminating between normal driving situations and CREs may be 
the driver’s reaction. In fact, it is the driver who decides whether the situation is 
critical (positive event) or not (negative event).

For instance, harsh braking is one of the most typical responses when drivers face a 
critical situation, and a high deceleration is used as a trigger to detect such CREs. 
However, some driving styles are more aggressive, so the same deceleration level may be 
reached by drivers who are totally aware of the situation. Due to the diversity of 
drivers and personalities, reviewers examine the driver’s attitude in the videos to 
guess whether the situation is critical for him/her. 

In euroFOT, these images are taken from cameras located in the rear-view mirror inside 
the cars. These are oriented toward the driver, making it possible to observe his/her 
torso⁴. In the sequences of CREs, a rigid body motion common to all drivers is observed. 
This reaction is characterized by sudden movements, such as suddenly grabbing the 
steering wheel with both hands and tilting the body forward. This observation also fits 
the findings of a study of emotions and associated motions, which relates surprise to an 
acceleration of whole-body portions (Kobayashi, 2008). 

Before searching for possible methods, assumptions and requirements should be defined. 
Based on the findings of the initial triggering procedure, the assumptions are:

1) Driver reaction is an indicator of CREs.

2) Motion of the driver’s body in images from the euroFOT cameras can be used to detect 
driver reaction. Given the presence of kinematic changes while driving, driver 
reaction implies movement (it may not be just a change in facial expression).

3) On the basis of the second assumption, individual movements may not be sufficiently 
self-explanatory. Thus, a sequence of movements seems the best indicator of the 
driver’s reaction.

The main requirement is that the greatest number of positive events should be detected 
with the least possible number of negative events. This means increasing the 
sensitivity of CRE recognition. The main challenge at this point is to identify near 
crashes and incidents, since they have kinematic values closer to normal driving 
situations. 

⁴ Front-seat passengers are not included in the camera’s field of vision.

Other issues to consider are the variety of drivers and the computational time. It is 
important to create an algorithm able to detect different drivers’ reactions in the 
shortest possible time. The problem can be generalized by considering the images as 
matrices of numbers (pixel intensities). In addition, a statistical approach can help 
to measure changes in these matrices and to save computational time.
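
As a generic illustration of treating frames as matrices and summarising their change with a statistic, the sketch below computes the mean absolute difference between consecutive frames. This is plain frame differencing, chosen here only for its simplicity; it is not one of the methods evaluated in this thesis, and the function name and the synthetic frames are hypothetical.

```python
import numpy as np


def motion_score(frames):
    """Mean absolute intensity change between consecutive frames.

    frames: array of shape (n_frames, height, width) with grayscale intensities.
    Returns one value per frame transition (n_frames - 1 values).
    """
    # Convert to float first so that unsigned 8-bit images do not wrap around.
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.mean(axis=(1, 2))


# Synthetic 5-frame sequences of 4x4 pixels, for illustration only.
still = np.zeros((5, 4, 4), dtype=np.uint8)
moving = still.copy()
moving[3:] = 10          # the whole image changes between frame 2 and frame 3

print(motion_score(still))
print(motion_score(moving))
```

A single number per frame transition is cheap to compute and to threshold, which is the kind of saving in computational time that a statistical summary of the image matrices can offer.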

Due to privacy and ethical issues, the faces of drivers are hidden throughout this 
project so that they remain anonymous. The tools used in the analysis were placed in 
locked rooms, following the requirements on personal data handling. Only authorized 
analysts were able to see the data displayed in such rooms. An information document was 
signed before access was granted, to ensure that individual drivers could not be 
identified by anyone except authorized persons. Finally, the extracted data were 
reviewed before being included in this report. 

2.2 Definition of training sample 

The training sample is a part of the entire data set used for testing methods. This 
involves searching for and testing alternatives to recognize the driver’s reaction within a 
limited collection of data. Results from this procedure allow the development of 
potential algorithms able to identify CREs among negative and positive situations. These 
algorithms will be further evaluated on a larger data set. 

For the results to be sufficiently consistent, the training sample should be representative 
of the population. In this case, it contains two-second sequences of eleven different drivers 
randomly selected among positive events. These positive events were obtained by applying the 
kinematic triggers defined in the 100-Car study (Dingus et al., 2006) to an initial euroFOT 
data set. By watching those videos, it is possible to identify the whole driver’s reaction 
within two seconds (starting half a second before the triggered time). The events are further 
described in Appendix 2.

The training sample also contains two additional negative events for each driver (see 
Figure 2.2). These were taken from the sequences that take place four and two 
seconds before the positive event. Such sequences correspond to normal driving and are 
therefore defined as negative events.

Figure 2.2 Procedure to define the training sample.

To sum up, the training sample contains 33 situations, of which eleven are positive 
events. The events come from different drivers because of the requirements established 
for the algorithm, given that the database comprises 100 drivers.  
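As an illustration, the selection of the three sequences per driver can be sketched in Python (the scripts of this project were written in MATLAB, so this is not the original code). The sketch uses the frame rate of 12.5 fps reported in Section 2.3.1. It assumes that the two negative windows start four and two seconds before the start of the positive window, which is one possible reading of the description above. The names `window_frames` and `training_windows` are hypothetical.

```python
FPS = 12.5       # frame rate reported for the euroFOT video
WINDOW_S = 2.0   # length of every sequence in seconds
LEAD_S = 0.5     # the event window starts half a second before the trigger


def window_frames(start_s, fps=FPS, length_s=WINDOW_S):
    """Return the frame indices covering [start_s, start_s + length_s)."""
    first = round(start_s * fps)
    return list(range(first, first + round(length_s * fps)))


def training_windows(trigger_s):
    """One positive window and two negative windows for a single driver.

    Assumption: the negative windows start four and two seconds before
    the start of the positive window.
    """
    event_start = trigger_s - LEAD_S
    return {
        "negative_4s": window_frames(event_start - 4.0),
        "negative_2s": window_frames(event_start - 2.0),
        "positive": window_frames(event_start),
    }
```

For a trigger at 10.5 s, each window holds 25 frames: the positive window covers frames 125 to 149, and the negative windows cover frames 75 to 99 and 100 to 124.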

[Figure 2.2 flowchart: initial euroFOT data set; triggering based on the 100-Car study plus validation by video viewing; 11 true positive events by 11 different drivers; frame collection of a 2-second video sequence for each driver (2 negative events + 1 positive event); training sample of 33 events (11 positives, 22 negatives).]


2.3 Recognition of driver’s reaction. General structure

Several methods were applied to the training sample in order to:

1) Recognize the positive events among the rest of the sample. What are the 
differences in the driver’s reactions between positive and negative events?
2) Once the differences were established, efforts were focused on finding a way to 
automatically detect as many positive events as possible with as few negative events 
as possible. At this point, it was important to save computational time.  

Possible solutions to both research questions are presented below, together with 
some initial steps to prepare the images. Note that this research aims to identify the 
reactions of the drivers in a rough and fast way. Therefore, more specific and accurate 
image processing methods, such as defining specific features in the images and 
analyzing their movement, were not considered. This choice is mainly due to the size of 
the database and the variety of drivers.

2.3.1 Data description & Image pre-processing

As specified in previous sections, the computational time and the diversity of 
drivers play an important role in correctly identifying the driver’s reaction. Therefore, 
images were treated as matrices containing pixel intensity values. Since the data 
collected in euroFOT are available in MATLAB, the scripts to access and evaluate the data 
were also programmed in that language. This section covers technical issues about the 
structure of the data and the initial steps to extract and prepare the images (see Figure 2.3).

Figure 2.3 Steps of data acquisition and image pre-processing.

Extracting sequences from videos: Three video sequences of two seconds each 
were extracted from .avi files for each of the drivers of the training sample (see footnote 5). 
The original images are in grayscale with 288x352 pixels. The frame rate is 12.5 fps (see 
footnote 6). Each frame was stored as one element of a structure array with two fields: 
cdata and colormap (see Figure 2.4).

Figure 2.4 Unfold of array structure and frames information.


5 Some problems accessing the frames of long trips with the MATLAB function aviread were 
solved by using an application called videoIO, developed by Gerald Dalley (2006).

6 Frames per second.

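[Figure 2.4 diagram: the array holds frames 1, 2, 3, ...; each frame has the fields cdata (matrix with pixel intensity values) and colormap (array containing RGB values).]

[Figure 2.3 flowchart: euroFOT data set; extracting sequences from videos; cutting images; removing flashes; mask in window.]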

Cutting images: Since the images are in grayscale, one plane was enough to define 
the pixel values (color images are defined by three planes). Only the area of the 
matrix stored in cdata around the driver’s torso was kept, in order to remove superfluous 
information (see Figure 2.5). Thus, the final size of the images was 283x231 pixels.

Figure 2.5 Clipping the torso of the driver (see footnote 7). 

Removing flashes: Over-bright images were removed from the sequences to avoid 
false changes in pixel intensity. Observations from the training sample indicate a 
constant frequency of one flashed frame in every five. This effect was observed only in 
some of the drivers, but the pre-filtering script was applied to all the sequences without 
distinction. Although this eliminates valid information in some cases and makes the 
sequences faster than in reality, it is preferable to false intensity changes.
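A minimal sketch of such a pre-filter is given below in Python, for illustration only. The text does not state how over-bright frames were detected, so the criterion used here is an assumption: a frame is dropped when its mean intensity exceeds a multiple of the median frame brightness of the sequence. The function names and the value of `ratio` are hypothetical.

```python
from statistics import mean, median


def frame_brightness(frame):
    """Mean intensity of one frame given as a list of pixel rows."""
    return mean(value for row in frame for value in row)


def remove_flashes(frames, ratio=1.3):
    """Drop frames whose brightness exceeds `ratio` times the median brightness.

    `ratio` is a hypothetical tuning value, not taken from the thesis.
    """
    levels = [frame_brightness(f) for f in frames]
    limit = ratio * median(levels)
    return [f for f, level in zip(frames, levels) if level <= limit]
```

Because the median is used as the reference level, one flashed frame in every five does not shift the limit itself.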

Mask in window: Superfluous information, such as movements outside the vehicle seen 
through the window, may affect the results by generating changes in pixel intensities 
that are not related to the driver’s motion. Therefore, a binary mask set the pixel 
intensities in the window area to zero. A situation in which the driver’s body leans 
forward was taken as the dimensional reference so that no information on the driver’s 
motion was lost.
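The masking step can be sketched as follows, assuming that the window area is described by the vertices of a polygon in (column, row) pixel coordinates. The point-in-polygon test uses ray casting, which is a generic technique and not necessarily the one used in the original scripts. The function names are hypothetical.

```python
def inside_polygon(x, y, vertices):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count the edges crossed by a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_window(frame, vertices):
    """Return a copy of `frame` with the pixels inside the polygon set to zero."""
    return [
        [0 if inside_polygon(col, row, vertices) else value
         for col, value in enumerate(pixels)]
        for row, pixels in enumerate(frame)
    ]
```

Since the camera is fixed, the polygon test only needs to be evaluated once per pixel position; the resulting binary mask can then be reused for every frame.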

Figure 2.6 Mask polygon in window’s area.

7 the driver’s face has been covered due to confidentiality issues. 


2.3.2 Recognition of driver’s reaction in sequences

After this first phase, the images were preprocessed and the training sample consisted 
of a collection of frames for each situation, stored in 33 arrays, of which 
11 were positive events. The three methods applied to identify those events, based on 
recognition of the driver’s reaction, are explained below.

2.3.2.1 T-test and Vartest. Comparison of negative and positive events

Frames are defined as matrices containing values of pixel intensity. These values change 
depending on the motion in the scene. Since negative and positive situations are 
recorded for each driver, it is possible to compare both to see how different the 
distribution of pixel values in each case is.

One way to use this information is to conduct a t-test of the null hypothesis that the 
data at a given pixel position in the arrays of the two sequences come from the same 
normal distribution. This idea was applied using two different functions in MATLAB:

- ttest2: tests the null hypothesis that the values for each pixel position come from 
populations with equal means, against the alternative that the means are unequal 
(unequal variances are assumed).

- vartest2: tests the null hypothesis that the values for each pixel position come from 
populations with equal variances, against the alternative that the variances are unequal.
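The idea of a per-pixel test of equal means can be sketched in Python as follows. This is only an approximation of what MATLAB's ttest2 does: the Welch statistic is computed exactly, but the decision uses a fixed critical value of 1.96 (normal approximation of the two-sided 5% level) instead of the Student t distribution, which makes the sketch slightly too permissive for short sequences. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

T_CRITICAL = 1.96  # assumption: normal approximation of the two-sided 5% level


def welch_t(a, b):
    """Welch t statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Constant samples: identical means give 0, different means give infinity.
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def h_map(seq_a, seq_b):
    """Binary image: 1 where the pixel means differ significantly, else 0.

    seq_a, seq_b: lists of frames, each frame a list of pixel rows.
    """
    rows, cols = len(seq_a[0]), len(seq_a[0][0])
    result = []
    for r in range(rows):
        line = []
        for c in range(cols):
            a = [frame[r][c] for frame in seq_a]
            b = [frame[r][c] for frame in seq_b]
            line.append(1 if abs(welch_t(a, b)) > T_CRITICAL else 0)
        result.append(line)
    return result
```

Each pixel position is tested independently, so the output has the same size as one frame and can be displayed directly as a black and white image.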

At the default significance level of 5%, the functions return h=1 if the null hypothesis 
is rejected and h=0 otherwise, so the results can be represented as binary images. In 
addition, they return a p-matrix containing, for each pixel, the probability of observing 
values at least as extreme as those obtained if the null hypothesis were true. Three 
different populations were considered in this calculation:

- Intensities in the same pixel position over time in both sequences.

- First derivative values for each pixel position over time in both sequences: the 
derivative also takes the time changes into account. The most obvious changes (the 
largest change in intensity in the shortest time) were expected to appear as white areas 
in the h-matrix. 

- Square of the first derivative values for each pixel position over time in both 
sequences: if the pixel intensity decreases during the sequence, the first derivative 
becomes negative. The squared values were used to check whether this sign affects 
the results.
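The three populations can be illustrated for a single pixel position with the short Python sketch below. It assumes that the derivative is approximated by the difference between consecutive frames, with the frame index as the time unit; the original scripts may have used a different discretization. The function names are hypothetical.

```python
def pixel_series(frames, row, col):
    """Intensity of one pixel position over time."""
    return [frame[row][col] for frame in frames]


def first_difference(series):
    """Discrete first derivative: change between consecutive frames."""
    return [b - a for a, b in zip(series, series[1:])]


def populations(frames, row, col):
    """The three populations compared by the tests for one pixel position."""
    intensity = pixel_series(frames, row, col)
    derivative = first_difference(intensity)
    squared = [d * d for d in derivative]
    return intensity, derivative, squared
```

For a pixel whose intensity goes from 10 to 14 and then to 11, the three populations are [10, 14, 11], [4, -3] and [16, 9].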

For each population, two binary images resulted from two different t-tests: one comparing 
two negative events and the other comparing a positive and a negative event. 
The procedure was the same for the vartest. 
The binary image resulting from the comparison of a negative and a positive event was 
expected to contain more white areas than the one resulting from the two negatives. 
This would mean that the intensities in those pixels changed more and, therefore, that 
the null hypothesis was rejected. According to this idea, the positive event could 
be recognized with the following steps (see Figure 2.7):



Figure 2.7 Procedure of recognition using t-test&vartest as potential triggers.

 
First, the t-test and the vartest were applied to compare a negative and a positive sequence 
for the three different populations. The aim was to determine which population and which 
test were most suitable for recognizing the driver’s motion. The results of this first 
approach are shown below (see Figures 2.8 and 2.9):

Figure 2.8 Binary images from t-test of: intensities, first derivative of intensities 
and square of first derivative of intensities (driver A686).

Figure 2.9 Binary images from vartest of: intensities, first derivative of intensities 
and square of first derivative of intensities (driver A686).

[Figure 2.7 flowchart: sequence 1 and sequence 2; t-test / vartest; binary image (h values); grouping and measuring of white areas; threshold; positive event or negative event.]


As can be seen in Figure 2.8, the binary image from the t-test of square-of-first-
derivative of intensities seemed to be the most representative of the driver’s motion. It 
highlights the areas that have undergone major changes: the hand and the driver’s head. 

Using the square of the first derivative of intensities as population, t-tests were 
performed on two negative sequences and on a positive and a negative sequence, applying 
the procedure detailed in Figure 2.7. When comparing the positive and the negative events, 
more white areas were expected in the resulting binary image of h values. 
Finally, grouping and measuring these white areas might be used to determine which 
of the two comparisons involves a positive event (driver’s reaction).
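One possible way to group and measure white areas is sketched below in Python. The text does not specify how the areas were grouped, so the sketch makes two assumptions: white pixels are grouped by 4-connectivity, and the decision compares the size of the largest group with a threshold `min_area`. Both the function names and the threshold are hypothetical.

```python
def largest_white_area(binary):
    """Size in pixels of the largest 4-connected group of ones."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood fill starting from this unvisited white pixel.
            stack, size = [(r0, c0)], 0
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                size += 1
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    in_image = 0 <= nr < rows and 0 <= nc < cols
                    if in_image and binary[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            best = max(best, size)
    return best


def is_positive(binary, min_area):
    """Flag the comparison as a positive event if the area reaches `min_area`."""
    return largest_white_area(binary) >= min_area
```

Measuring the largest connected group rather than the total number of white pixels is a design choice that reduces the influence of isolated noisy pixels.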

Figure 2.10 Binary images from t-test of square of first derivative of intensities.

[Figure 2.10 panels: drivers A686, A484 and A567; left images: event vs. 2 seconds before; right images: comparison of two negative sequences.]


For most of the drivers, the t-test comparing positive and negative sequences (event vs. 
two seconds before) resulted in more white areas than the one obtained from two negative 
sequences. This effect is observed for drivers A686 and A484 in Figure 2.10, where the 
images on the left have more white areas than the images on the right, which result from 
the comparison of two negative sequences. Nevertheless, the opposite occurred for driver 
A567, and similar results were observed for some other drivers of the training sample.

The binary images are quite noisy, which makes it difficult to relate the driver’s 
reactions to the white areas where the null hypothesis was rejected. It is therefore 
unclear how to define a white-area threshold that highlights the reaction. The main 
problem is that some maneuvers while driving may produce a broad change in the rate of 
pixel intensities. For instance, turning the steering wheel creates a larger white area 
when it is compared via t-test with a sequence of driving on a straight road. Alternative 
methods were therefore explored to distinguish the driver’s reactions.

2.3.2.2 Standard Deviation of Jerk

Given that  results from the t-test were unclear in the discrimination between positive 
and negative events, possible alternatives are discussed below. 

As noted with the conventional triggering procedure, the reactions of drivers are mostly 
related to sudden quick motions of the driver’s torso. Therefore, the time in which the 
action takes place appears to be an important factor. One way to take the motion’s time 
into account is by deriving intensities in each pixel position over time. 

Velocity (first derivative) and acceleration (second derivative) represent the rate of 
change of position and of velocity over time, respectively. Since each level of 
differentiation gives the rate of change of the quantity being differentiated, the third 
derivative represents the rate of change of acceleration. In “The Reflexive Universe”, 
Young relates control to the third derivative, using examples from everyday life. He 
illustrates that controlling a car can be expressed with the third derivative, since it 
is related to changes in acceleration (Young, 2004). The third derivative is also called 
jerk, and its application extends to various fields of mathematics and engineering 
(Iradier, 2006).

The jerk of the intensity values at each pixel location over time gives an idea of how 
sudden these changes are. Calculating the jerk for each driving sequence makes it possible 
to study whether the positive sequences have values significantly different from the 
negative ones. The calculation yields, for each sequence, an array of matrices with jerk 
values. Two ways to examine these arrays were considered:

- Computing the standard deviation (STD): the wider the spread of the distribution of 
jerk values at a pixel, the more those values have varied over time. From this analysis, 
the largest changes in acceleration were expected to appear as whiter areas in a 
grayscale image. 

- Representing the maximum square of jerk values: peaks in the jerk distribution can 
also be represented as white areas in a grayscale image, without taking into 
account how much the values have varied within the distribution. The squared 
values are used to avoid any influence of the negative numbers that arise when the 
pixel intensity changes to a lower value.
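Both representations can be sketched in Python as follows, for illustration only. The jerk is approximated by the third finite difference of the intensity of each pixel over the frames, and the standard deviation is the population standard deviation; both are assumptions, since the original MATLAB implementation is not reproduced here. A sequence needs at least four frames. The function names are hypothetical.

```python
from statistics import pstdev


def third_difference(series):
    """Discrete third derivative (jerk) of a pixel's intensity over time."""
    for _ in range(3):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def jerk_images(frames):
    """Return (STD-of-jerk image, maximum-squared-jerk image) for a sequence."""
    rows, cols = len(frames[0]), len(frames[0][0])
    std_img = [[0.0] * cols for _ in range(rows)]
    max_img = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            jerk = third_difference([f[r][c] for f in frames])
            std_img[r][c] = pstdev(jerk)
            max_img[r][c] = max(j * j for j in jerk)
    return std_img, max_img
```

A pixel that keeps the same intensity over the whole sequence yields zero in both images (a dark pixel), while a pixel with a brief intensity change yields positive values in both.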



Before computing the jerk and its variance, over-bright images were removed from the 
sequence to avoid false changes in pixel intensity. Observations from the training 
sample indicated a constant frequency of one flashed frame in every five for most of the 
drivers. This pre-filtering algorithm was applied to all the sequences without distinction. 
Thus, information may have been lost by removing valid frames, and the sequences also 
became faster than in reality.

The images resulting from both calculations for the three sequences (two negatives and 
one positive) of one of the drivers of the sample are presented below. The goal was to 
distinguish the positive event (referred to simply as the event from now on) from the 
other two situations:

Figure 2.11 Maximum of square of jerk values: negative, negative and positive events.

Figure 2.12 STD of jerk values: negative, negative and positive events (driver A241).

The rates of change of acceleration during the event were not significantly different from 
those obtained in the negative sequences. In fact, the maxima were mainly reached when 
drivers were maneuvering in the sequences before the event. Thus, the driver’s reaction 
might not be related to peaks in jerk values. However, the differences between negative 
and positive images of the standard deviation of jerk (STD of jerk) seemed noticeable, as 
shown in Figure 2.12.

As can be seen in the image of the positive event in Figure 2.12, a white silhouette 
of the driver appeared when calculating the STD of jerk during the event. When the driver 
remained in the same position, a dark image was obtained. Maneuvers, on the other hand, 
seemed to generate a limited white area in the image. This effect can be 
observed in the middle image of Figure 2.12, where the driver turns the steering 
wheel. In this context, the case of one of the drivers of the sample is interesting:

Figure 2.13 STD of jerk values: negative, negative and positive events (driver A1064).

As mentioned before, the sequences of the training sample were chosen randomly from 
the previous triggering procedure. In this specific case, the driver’s reaction was not 
expected to be clearly recognizable, since the motion was almost imperceptible in the 
video sequence. However, in Figure 2.13 the driver silhouette obtained in the 
positive sequence makes it possible to distinguish it from the two previous states.

These results support the idea that the driver’s reaction during a CRE seems to 
involve the whole body (rigid body motion of the driver’s torso), while maneuvers 
seem to involve only a certain part. It is therefore important to consider the area in 
which the changes take place in order to make the distinction. 

For most of the drivers of the sample, the driver’s silhouette resulting from the STD of 
jerk allowed an intuitive recognition of the driver’s reaction and hence of the positive 
events. Therefore, the options were either to post-process the STD-of-jerk images to 
identify the threshold that relates the graphic silhouette to the driver’s motion (a 
conversion from graphical to numerical information), or to keep trying other methods.

2.3.2.3 Optical Flow

A numerical alternative to the graphical method discussed in the previous section was 
the calculation of the optical flow. Its original formulation came from Horn and 
Schunck (1981), who defined optical flow as “the distribution of apparent velocities of 
movement of brightness patterns in an image”. This distribution provides information 
about the object motion in terms of spatial arrangement and rate of change. The optical 
flow constraint equation is defined as:

            I_x \cdot u + I_y \cdot v + I_t = 0                                              (2.1)

where I_x, I_y and I_t are the spatiotemporal image brightness derivatives, u is the 
horizontal optical flow and v is the vertical optical flow.


This equation relates the intensity changes in a sequence of images to three-
dimensional object motion. Nevertheless, this relationship can be unclear in some 
cases. For instance, a uniform sphere rotating under constant illumination produces no 
change in brightness, so the optical flow is zero at every point although the sphere is 
moving. However, assuming that the surfaces are flat, motion may be related to changes in 
brightness. Therefore, the velocities of object motion can be estimated by solving for u 
and v (see footnote 8). 
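The constraint equation can be checked numerically on a synthetic example. The Python sketch below assumes a brightness pattern that varies linearly in space and translates with a known velocity; the pattern, its coefficients and the function names are hypothetical and serve only as an illustration.

```python
def intensity(x, y, t, true_u=2.0, true_v=-1.0, a=3.0, b=0.5):
    """Linear brightness pattern translating with velocity (true_u, true_v)."""
    return a * (x - true_u * t) + b * (y - true_v * t)


def constraint_residual(x, y, t, u, v):
    """Evaluate Ix*u + Iy*v + It with forward differences of unit step."""
    ix = intensity(x + 1, y, t) - intensity(x, y, t)
    iy = intensity(x, y + 1, t) - intensity(x, y, t)
    it = intensity(x, y, t + 1) - intensity(x, y, t)
    return ix * u + iy * v + it
```

For this pattern the residual vanishes for the true velocity (2, -1), but also for other candidates such as (1.5, 2). A single constraint equation per pixel therefore does not determine u and v uniquely, which is why optical flow methods such as that of Horn and Schunck add a further assumption (smoothness of the flow field).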

 
In recent years, several studies have been carried out to improve the performance of 
the classical formulation. In this context, D. Sun, S. Roth and M. J. Black (2010a,b) 
have recently contributed a more accurate optical flow method. They published 
MATLAB code that computes this new optical flow formulation for educational 
purposes (Sun, 2010). The outputs are two matrices with the horizontal and vertical 
components of the optical flow (OF from now on) for each pair of processed images.

Estimating flow in drivers: The OF code developed by Sun et al. (2010) was 
applied to the training sample with the default parameters (see footnote 9). The 
main objective was to assess whether the motion of the driver’s body can be 
estimated from changes in brightness in a two-dimensional image.

The script returned two matrices containing the speed components for each pair of 
computed images. These matrices were combined into a single matrix that keeps the 
magnitude of the speed, since this value seemed more significant than the direction of 
the flow. The collection of matrices for each sequence in the training sample was kept 
in an array. 

As mentioned in the previous section, the largest changes in acceleration in the STD-of-
jerk images were not reached during the driver’s reaction. This led to considering 
alternatives other than peaks in speed along the arrays to make the distinction. 

As in the calculation of jerk, pre-filtering alters the results since intermediate 
values are lost. However, this was preferable to false intensity changes due to flashes. 
Taking one of the drivers of the sample as reference, the initial estimates consisted of 
using the speed data from the OF calculation in each array (with and without filtering 
frames) to calculate:

- Local maximum from Optical Flow (OF): peak speed of the optical flow over the 
array of each sequence.

- Maximum sum of the whole array: maximum value in the matrix that results from 
summing the individual speed matrices of the entire OF array for each 
sequence.

- Number of pixels greater than or equal to 95% of the peak in the matrix that results 
from summing the individual matrices of the entire OF array for each sequence.
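The combination of the flow components into a speed and the three measures listed above can be sketched in Python as follows. The flow components are assumed to be already available as matrices (lists of rows); the optical flow estimation itself is not reproduced. The function names are hypothetical.

```python
from math import hypot


def speed(u, v):
    """Per-pixel flow magnitude from the horizontal (u) and vertical (v) parts."""
    return [[hypot(a, b) for a, b in zip(ur, vr)] for ur, vr in zip(u, v)]


def summary_measures(speeds, share=0.95):
    """Local maximum, maximum of the summed matrix and count of near-peak pixels.

    speeds: list of speed matrices, one per pair of consecutive frames.
    """
    local_max = max(max(max(row) for row in m) for m in speeds)
    rows, cols = len(speeds[0]), len(speeds[0][0])
    total = [[sum(m[r][c] for m in speeds) for c in range(cols)]
             for r in range(rows)]
    max_sum = max(max(row) for row in total)
    near_peak = sum(value >= share * max_sum for row in total for value in row)
    return local_max, max_sum, near_peak
```

Note that the count of near-peak pixels depends on the peak of the same summed matrix, so it measures how widely the highest values are spread rather than how large they are.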


8 See further information about optical flow and its formulation in Appendix 3.

9 It is also possible to test different methods of flow calculation; by default the program 
uses “Classic+NL-Fast” (only pixels from certain regions are weighted to save computational time). 


Table 2.1 Comparison of OF values in negative and positive events (driver A686).

                           Original sequence                   Filtered sequence
Sequence (driver A686)     Local max.  Max. sum   N pixels     Local max.  Max. sum   N pixels
Negative (4 sec. before)   39.79       74.5798    1            38.98       59.7046    155
Negative (2 sec. before)   30.49       93.219     122          28.65       84.4021    277
Positive event             18.88       112.603    206          11.01       96.6913    298

Max. sum: maximum of the sum of the whole array. N pixels: number of pixels with 
speed >= 95% of the highest speed in the total sum matrix.

Table 2.1 shows some differences between the calculations with the original and the 
filtered sequence. Nevertheless, the maximum values were reached in the same categories. 
The local peak of speed was not recorded during the positive event. However, the positive 
event registered the maximum when the calculation used a single matrix containing the sum 
of speeds over time. In addition, this matrix had more pixels with a high sum of 
velocities than those of the negative sequences.

Even so, the differences were not consistent enough to establish these values as a 
discrimination criterion. For instance, the number of pixels containing the highest sum of 
velocities was 277 in the filtered negative sequence two seconds before the event (with a 
peak of 84.4021) and 298 in the positive one (with a peak of 96.6913). This suggested 
analyzing the speed of the optical flow between single frames instead of using a matrix 
containing the sum of values along the array.

Results of performing the same calculation on single frames from each of the sequences 
for the same driver are presented below:

 Average speed on single frames

As can be seen in Figure 2.14, the average speed was significantly greater during the 
event (red line in the graph) than in the negative sequences (obtained two and four 
seconds before the event). The peak was reached in the 16th matrix of the OF array 
resulting from processing the images of the positive sequence.

Figure 2.14 Speed average over OF frames in positive and negative sequences.



 Peak speed on single frames

However, the peak speed was not obtained in the OF of the positive event. As can be seen 
in Figure 2.15, the highest peak speed was recorded in the negative sequence that takes 
place four seconds before the event.

Figure 2.15 Peak speed over OF frames in positive and negative sequences.

 Number of pixels sharing highest peak speeds on single frames

As shown in Figure 2.16, at the time of maximum average speed (16th matrix of the event 
sequence, see Figure 2.14) the number of pixels with at least 95% of the peak speed was 
760. Another maximum was observed in the 13th matrix of the array, with 847 pixels. 
However, in comparison with the other sequences, the overall maximum was registered in 
the negative sequence two seconds before the event. 

Figure 2.16 Number of pixels sharing the highest peak speeds over OF frames in positive and negative sequences.

The most significant speed during the event was recorded in the 16th matrix of the 
OF array. It resulted from the estimation of speeds between the 16th and 17th frames of 
the filtered sequence. Note that the OF interpolation warps the second image and its 
derivative toward the first (see footnote 10). Those frames are shown below to illustrate 
the significance of this peak speed:

10 see optical flow formulation in Appendix 3.


Figure 2.17 Frames of the event sequence corresponding to peak speed (driver A686).

As can be seen in Figure 2.17, Frame 16 corresponded to the biggest change in motion 
during the reaction, when the body leaned forward. In Frame 17 the driver was back in the 
original position. The change in motion was evident in this part of the sequence. 

The greatest differences between the event and the preceding sequences seemed to be 
related to the average speed on single frames. Accordingly, sudden changes might 
be recognized by differentiating this average speed. Figure 2.18 shows the second 
derivative of the average speed vector for each sequence.
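The calculation behind this figure can be sketched in Python as follows, assuming that the second derivative is approximated by the second finite difference of the vector of average speeds. The function names are hypothetical and the speed matrices are assumed to be available as lists of rows.

```python
from statistics import mean


def average_speed(speeds):
    """Mean flow magnitude of each speed matrix in the array."""
    return [mean(value for row in m for value in row) for m in speeds]


def second_difference(series):
    """Discrete second derivative of a vector."""
    return [c - 2 * b + a for a, b, c in zip(series, series[1:], series[2:])]


def sharpest_change(speeds):
    """Index and value of the largest absolute second difference."""
    curve = second_difference(average_speed(speeds))
    index = max(range(len(curve)), key=lambda i: abs(curve[i]))
    return index, curve[index]
```

Since the second difference at position i uses the values i, i+1 and i+2 of the average speed vector, the returned index is shifted with respect to the frame numbering of the sequence.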

 Jerk of average speed on single frames

The biggest slope matched the change in motion between the 14th and 15th values during 
the event sequence. At that moment, the driver’s body leaned forward due to the inertia 
of harsh braking. 

The rates of acceleration change recorded two seconds before the event stayed within a 
narrow range over the whole array. However, another significant change occurred 
between the 4th and the 5th derivative values four seconds before the event.

Figure 2.18 Jerk of average speeds over OF frames in sequences.

As can be seen in Figure 2.19, the second biggest slope in the sequence that occurs four 
seconds before the event was due to a change in the driver’s position:



Figure 2.19 Frames of the negative corresponding to peak speed (driver A686).

Given these findings, it seemed that peaks in the distribution of jerk from OF 
velocities were associated with the driver’s motion. Nevertheless, the main obstacle is 
the computational time required to estimate the OF.

2.3.3 Silhouette detection in STD of Jerk images

In the previous section, several methods were applied to recognize the driver’s reaction 
in the presence of CREs. Among these methods, the OF and the STD of jerk were identified 
as potential algorithms. The main limitation of the OF was its computational time. 
Although this did not affect the STD of jerk, its results were graphical. This 
graphical information therefore had to be converted into numerical information to enable 
an automatic detection method. This mostly involved studying the properties that 
distinguish the STD-of-jerk images of the event from those of the negative 
sequences.  

Figure 2.20 Converters from graphical to numerical information in STD of jerk.

[Figure 2.20 diagram: observations in STD of jerk images (graphical information): a dark image when remaining in the same position, a certain white area (e.g. steering wheel) when maneuvering, and a white driver’s silhouette when reacting in a CRE. Converters to numerical information: mean, harmonic mean, counting pixels in intensity intervals, edge detection, GLCM properties, general silhouette.]


2.3.3.1 Mean 

A first way to use the graphical information from the STD of jerk was to plot the sum 
of the values along rows. The aim was to find a silhouette in the images corresponding 
to the driver’s reaction. According to this hypothesis, the mean (see footnote 11) was 
expected to be higher during the event than in the previous sequences, because the pixels 
that form the driver’s silhouette are spread over the image. 

Taking a driver from the training sample as reference, the STD of jerk values were summed 
along rows for each of the sequences. The resulting vectors are plotted below together 
with the STD of jerk images of the driver (see Figures 2.21 and 2.22). 
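As a minimal illustration, the row sums and their mean can be computed as in the Python sketch below, where the STD of jerk image is assumed to be a matrix given as a list of rows. The function names are hypothetical.

```python
from statistics import mean


def row_sums(image):
    """Sum of the STD-of-jerk values along each row of the image."""
    return [sum(row) for row in image]


def row_sum_mean(image):
    """Arithmetic mean of the row sums, used as a single score per sequence."""
    return mean(row_sums(image))
```

The vector returned by `row_sums` corresponds to the kind of distribution that is plotted for each sequence, and `row_sum_mean` reduces it to one number per sequence.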

Figure 2.21 STD of jerk values: negative, negative and positive events (driver A936).

As shown in Figure 2.21, the sudden motion of the driver’s reaction generated a white 
silhouette in the event sequence. Its distribution of STD values along rows reached a 
mean of 11900, while the means in the previous sequences were 2598 (two seconds before) 
and 7299 (four seconds before). The peak in the distributions of STD corresponded to the 
image two seconds before the event, where a bright white area is concentrated in the 
middle of the figure.

Figure 2.22 Distribution of sum of STD of jerk values along rows (driver A936).

This result was unexpected, since over-bright images had been removed in the pre-filtering 
procedure. Reviewing the video confirmed that this area corresponds to a 
movement of the driver, who moves the arm from the steering wheel to the mouth.

11 arithmetic average of the values over the distribution.


2.3.3.2 Harmonic mean

In education, the harmonic mean is commonly used to calculate the final grades 
of students, to ensure a reasonable level of work throughout the academic year (Wilson, 
2006). The present case is somewhat similar, given that the mean was affected by local 
peaks in the distributions. Hasna and Alouini (2002) used this formulation to study the 
performance of wireless communication. They defined the harmonic mean as follows:

“Given two numbers X1 and X2, the harmonic mean of X1 and X2, μH(X1,X2), is defined 
as the reciprocal of the arithmetic mean of the reciprocals of X1 and X2, that is:

    \mu_H(X_1, X_2) = \frac{2}{\frac{1}{X_1} + \frac{1}{X_2}} = \frac{2 X_1 X_2}{X_1 + X_2}        (2.2)

It is clear that the harmonic mean of two numbers is equal to the square of their 
geometric mean divided by their arithmetic mean.” (Hasna and Alouini, 2002).

The harmonic mean is much less affected than the arithmetic mean by large outliers (here 
related to maneuvers), and it is also useful when the data result from indirect 
calculations. In this case, the jerk is obtained by differentiating the intensity changes 
of pixels over time. Therefore, this method was considered as an alternative to the mean 
when the distributions of STD of jerk were affected by normal driving maneuvers and 
changes in the driver’s position.
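The different sensitivity of both means to a local peak can be illustrated with the Python standard library. The profiles below are invented numbers chosen only for the illustration, not data from the study, and the function name is hypothetical. Note that the harmonic mean requires positive values and is pulled strongly toward the smallest ones.

```python
from statistics import harmonic_mean, mean

flat = [100, 110, 90, 105, 95]        # hypothetical profile without a peak
peaked = [100, 110, 90, 105, 5000]    # same profile with one large local peak


def peak_influence(values):
    """Return (arithmetic mean, harmonic mean) of a profile of positive values."""
    return mean(values), harmonic_mean(values)
```

With these numbers, the single peak raises the arithmetic mean from 100 to 1081, while the harmonic mean only moves from about 99.5 to about 125.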

As can be seen in Table 2.2, the mean of the negative sequence that takes place four 
seconds before the event was 7299, which corresponds to about 61.3% of the mean 
reached during the event (11900). This influence was lower in the case of the harmonic 
mean. Taking the harmonic mean reached during the event (6393) as 100%, the 
value recorded in the sequence four seconds before the event (1190.5) represents 18.62%. 

Table 2.2 Comparison between mean and harmonic mean in the distribution of sum 
of STD of jerk values along rows in sequences of driver A936:

Criterion        2-sec. bef.   4-sec. bef.   Event
Mean             2598          7299          11900
Harmonic mean    369.8246      1190.5        6393

2.3.3.3 Counting pixels in intensity intervals

Since the maximum values of STD of jerk were not reached during the event, another 
option was to count the number of pixels whose STD falls within a given interval. The 
same concept underlies the image histogram, which represents the intensity levels against 
the number of pixels that share each intensity. Histograms can be used to obtain the 
parameters of a texture (Alba et al., 2006). In a way, the driver’s silhouette is related 
to a texture, given that it is defined by a relationship between pixels. Some properties 
of histograms can be summarized as follows (Olmos, 2008):



- Images cannot be rebuilt from their histograms.

- Two different images can have the same histogram.

- Histograms do not contain spatial information about the image.

In the next trial, driver A686 was taken as a reference to analyze the intensity levels 
generated by changes in the driver’s position. Images of STD of jerk from the event and 
from a previous sequence are represented below as three-dimensional graphs (see 
Figures 2.23 and 2.24). This representation gives an idea of the STD values and their 
location over the image in a negative and in a positive situation.

Figure 2.23 Three-dimensional distribution of STD of jerk over the image in one of the 
negative sequences of driver A686.

Figure 2.24 Three-dimensional distribution of STD of jerk over the image of the positive 
sequence (event) of driver A686.

As shown in the graphs above, the highest values of STD in the negative sequence were 
concentrated in the area generated by the hand movement. The widest variances 
therefore do not seem to be related to the driver’s reaction. The results of counting the 
number of pixels within certain intervals of STD of jerk in each image are presented as 
follows:



As can be seen in Figure 2.25, the differences were more significant in the fourth and 
fifth intervals of STD. These intervals group the pixels with STD of jerk between 50 and 
10. In this range of STD, a higher number of pixels was counted during the event than in 
the previous sequences.

Figure 2.25 Distributions of number of pixels within intervals of STD in the event and
in previous sequences for the driver A686.

The numerical values associated with Figure 2.25 are listed in Table 2.3:

Table 2.3 Number of pixels within intervals of STD of jerk during the event and
previous sequences.

Driver A686: number of pixels within a certain interval of STD of jerk

Nº of interval (STD)   1 (300-150)   2 (150-100)   3 (100-50)   4 (50-20)   5 (20-10)   6 (10-5)   7 (0)
Event                  1542          4178          11708        19422       26384       32199      28559
2-second before        871           2430          6167         11030       20147       30255      30152
4-second before        5229          8195          12788        17063       20705       28007      28962

Adding the values of the fourth and fifth intervals (columns “4 (50-20)” and “5 
(20-10)” in Table 2.3), the numerical difference between the event and the previous 
sequences was not significant enough to discriminate between both situations. The 
number of pixels during the event was 45806, while 31177 and 37768 pixels were 
counted in the sequences 2 and 4 seconds before the event, respectively. This suggested 
taking the spatial distribution of pixels into account in the next tests.
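
The counting procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name count_in_intervals and the toy image are hypothetical, and treating the lower bound of each interval as inclusive and the upper bound as exclusive is an assumption, since the text does not specify how the boundaries were handled. The second part of the sketch reproduces the sums of the fourth and fifth intervals from the counts in Table 2.3.

```python
def count_in_intervals(std_image, intervals):
    """Count the pixels whose value v satisfies low <= v < high, per interval."""
    counts = [0] * len(intervals)
    for row in std_image:
        for value in row:
            for index, (low, high) in enumerate(intervals):
                if low <= value < high:
                    counts[index] += 1
                    break
    return counts


# Interval bounds follow the column headers of Table 2.3 (intervals 1 to 6);
# the inclusive/exclusive convention is an assumption.
INTERVALS = [(150, 300), (100, 150), (50, 100), (20, 50), (10, 20), (5, 10)]

# Tiny made-up STD image (not data from the study).
toy_image = [[0, 7, 12, 25], [30, 60, 120, 200], [15, 45, 5, 0]]
toy_counts = count_in_intervals(toy_image, INTERVALS)

# Counts reported in Table 2.3 for driver A686 (intervals 1 to 7).
table_2_3 = {
    "event": [1542, 4178, 11708, 19422, 26384, 32199, 28559],
    "2 s before": [871, 2430, 6167, 11030, 20147, 30255, 30152],
    "4 s before": [5229, 8195, 12788, 17063, 20705, 28007, 28962],
}

# Sum of the fourth and fifth intervals for each sequence.
band_totals = {name: row[3] + row[4] for name, row in table_2_3.items()}
print(toy_counts, band_totals)
```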

2.3.3.4 Edge detection: Hough transform

The Image Processing Toolbox in MatLab contains several procedures to detect edges in 
an image. In the following test, the Hough transform method was applied to a positive 
and a negative situation for the same driver. This method is based on the parametric 
representations of lines in a plane (MathWorks, 2011): 

$$\rho = x \cos\theta + y \sin\theta \qquad (2.3)$$



The general procedure starts by detecting edges with the Sobel or Canny algorithms. The 
resulting images may contain open shapes and isolated points. These are corrected by 
taking each edge point and drawing the straight lines that pass through it in a polar 
coordinate system. The ρ and θ values are accumulated in a matrix called the Standard 
Hough Transform (SHT) to estimate which pixels are most likely to belong to each 
edge. Peaks in the SHT represent potential lines in the input image. Finally, the 
houghlines command finds the end points of the lines and fills the small gaps. 
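
The voting step of the Hough transform can be sketched as follows. This is a simplified Python illustration and not the MatLab implementation used in this work: the function name hough_votes, the one-degree angular step and the rounding of ρ to the nearest integer are all assumptions. Each edge point votes for every (ρ, θ) pair that satisfies the parametric line equation, so collinear points pile up in the same accumulator cell.

```python
import math


def hough_votes(edge_points, theta_degrees=range(-90, 90)):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points."""
    votes = {}
    for x, y in edge_points:
        for theta_deg in theta_degrees:
            theta = math.radians(theta_deg)
            # Parametric line: rho = x*cos(theta) + y*sin(theta)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            key = (rho, theta_deg)
            votes[key] = votes.get(key, 0) + 1
    return votes


# Ten made-up edge points on the vertical line x = 5.
points = [(5, y) for y in range(10)]
votes = hough_votes(points)

# All ten points agree on rho = 5 at theta = 0 degrees.
best = max(votes.values())
print(votes[(5, 0)], best)
```

Because ρ is quantized, neighboring angles can collect the same number of votes, which is one reason why a peak detection step is needed after the accumulation.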

This method was applied to a pair of images from the same driver, one obtained from an 
event and the other from 2 seconds before the event. The Hough transform was 
represented in a graph, with its peaks (potential lines) marked with squares. Then, the 
detected lines were colored on the input images.

   
Figure 2.26 Hough transform and detected lines from sequence 2-sec. before the event

Figure 2.27 Hough transform and detected lines from the event’s sequence.

As Figures 2.26 and 2.27 show, the detected lines were not different enough to 
discriminate between both situations. This method is usually useful for detecting roads 
in aerial images. However, straight lines do not seem to fit the driver’s silhouette. The 
method can also be applied to curved lines by previously defining a reference shape. 
Such a shape may be difficult to define in this case, due to the variety of drivers and 
camera positions.



2.3.3.5 General silhouette

Since maneuvers and changes in position seem to generate white areas in the STD of 
jerk images, another possibility for identifying events was to define an area where the 
reactions were likely to take place, in order to avoid false positives.

To define this area, several silhouettes obtained from different drivers of the sample 
were combined into a single one. Three different procedures were applied using the 
wfusimg command in MatLab, which merges two images using fusion methods. Thus, 
the images containing the drivers’ silhouettes from STD of jerk during the events were 
merged in pairs (see procedure in Figure 2.28):

In Figure 2.28, x and y are sub-images from intermediate fusions of pairs of images, and 
zt represents the final merged image. 

Figure 2.28 Schema of combination of silhouettes.

The wfusimg command allows the fusion methods for approximations and details to be 
defined. The following are the zt images resulting from variations in these parameters:

Figure 2.29 Resulting images using different inputs in the fusion command. 

Matrices of STD of jerk contain different values depending on, for instance, the 
movement of the driver during the sequence and the illumination conditions. Merging 
images based on mean values for approximations and details (see image on the right in 
Figure 2.29) tends to highlight the drivers with the widest variances in such matrices. 

[Figure 2.28 diagram labels: silhouettes 1 to 10 from STD of jerk (2-second sequences 
during events); intermediate fusions x, x2, x3, x4, x5, y, y2 and z; final merged image zt.]


[Figure 2.29 panel titles: maximum for approximations and minimum for details; 
maximum absolute for approximations and details; mean for approximations and details.]


However, the merged silhouette should indicate the area in which reactions commonly 
take place, regardless of the STD values. 

Since the merged images with maximum and minimum levels were quite similar, one of 
them was selected and a freehand region was drawn around the driver’s position (see 
image on the left in Figure 2.29). The areas of the windows, the rear seats and the 
steering wheel were excluded to avoid false positives. Although several reactions and 
evasive maneuvers involve turning the steering wheel, this area tends to cause 
confusion when discriminating between positive and negative events. 

This procedure was only an approximation to facilitate the study of changes in pixel 
intensities in a given area. The position of this Region Of Interest (ROI) was saved in 
an N-by-2 array in MatLab. This ROI could be applied as a binary mask to the image in 
combination with the rest of the methods, with the aim of improving their performance. 
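
How an N-by-2 list of ROI vertices can be turned into a binary mask is sketched below in Python. This is an illustration only and not the MatLab code used in this work: the ray-casting test, the use of pixel centres and the names point_in_polygon, polygon_mask and masked_mean are assumptions, and the ROI and image are made up.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal ray that starts at (x, y)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(height, width, vertices):
    """Binary mask that is True for pixels whose centre lies inside the ROI."""
    return [[point_in_polygon(col + 0.5, row + 0.5, vertices)
             for col in range(width)] for row in range(height)]


def masked_mean(image, mask):
    """Mean of the image values selected by the mask (0.0 if none selected)."""
    selected = [v for img_row, mask_row in zip(image, mask)
                for v, keep in zip(img_row, mask_row) if keep]
    return sum(selected) / len(selected) if selected else 0.0


# Hypothetical square ROI given as (x, y) vertices, on a 6-by-6 image.
roi = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = polygon_mask(6, 6, roi)
image = [[10 if mask[r][c] else 0 for c in range(6)] for r in range(6)]
roi_pixels = sum(sum(row) for row in mask)
roi_mean = masked_mean(image, mask)
print(roi_pixels, roi_mean)
```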

2.3.3.6 Gray level co-occurrence matrix 

Texture filters often use the image’s histogram to evaluate the texture statistically. 
Although this provides information about its properties, the shape or spatial 
distribution over the image remains unknown (IZMIRAN, 2005). 

Another statistical procedure of texture analysis, which does consider the spatial 
distribution, is the Gray Level Co-occurrence Matrix (GLCM). The GLCM records how 
often the different combinations of pixel intensities occur in pairs of pixels in an image 
(see procedure in Figure 2.30). This texture analysis was introduced by Haralick et al. 
(1973) and is now commonly used in medical image processing, modeling of forest 
attributes and the study of sea ice, among others. In this case, the method was intended 
to identify the driver’s silhouette in images of STD of jerk based on its distribution. 

Figure 2.30 Process Used to Create the GLCM, [electronic print] Available at 
<http://matlab.izmiran.ru/help/toolbox/images/enhanc15.html>[Accessed May 2011].



This method can be applied in two main steps: 

Definition of the GLCM: calculating the frequency of a certain relationship between 
pixels requires the choice of:

-Offset: distance between the related pair of pixels.

-Direction of offset: direction in which the pairs of pixels are evaluated. This choice is 
based on a visual examination of what is likely to be most characteristic of the texture.

-Gray levels: the input image is scaled to a certain number of intensity levels. The fewer 
the levels, the shorter the computational time. Besides, reducing the number of levels 
improves the statistical study. 

Calculation of statistics using the GLCM: once the GLCM is defined, several statistical 
measures can be used to identify the texture’s properties. Hall-Beyer (2007) has created 
an online tutorial on how to define a GLCM and on its possibilities. She defines three 
main groups of measures derived from GLCM calculations, which are summarized as 
follows together with the properties available in MatLab:

Measures group 1: distance to the GLCM diagonal (contrast)

-Contrast: the diagonal of the GLCM contains the pairs of pixels with the same gray 
level. If these combinations are very frequent, the image does not have much contrast. 
This measure weights each element of the GLCM by the squared difference between its 
two gray levels, so it increases away from the diagonal (=0 for a constant image).

-Homogeneity: closeness of the distribution of combinations in the GLCM to its 
diagonal. It increases with less contrast (=1 for a diagonal GLCM).

Measures group 2: how regular the pixels are within the image

-Energy: uniformity of the image, measured by adding the squared elements of the 
GLCM (angular second moment) (=1 for a uniform image).

Measures group 3: descriptive statistics of GLCM

-GLCM correlation: dependency of gray levels between neighboring pixels (=+1 or -1 
for a perfectly correlated image). This measure does not consider how often a pixel 
value occurs on its own, but how often it occurs together with a given neighboring pixel 
value. 
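
The two steps can be illustrated with a small Python sketch. It is not the MatLab implementation used in this work: the function names glcm and glcm_properties are hypothetical, the images are made up, and the formulas are the usual textbook definitions of the four properties, written under the assumption that the GLCM is normalized so that its elements add up to one.

```python
import math


def glcm(image, levels, offset=(0, 1)):
    """Normalized co-occurrence matrix for pixel pairs at offset (d_row, d_col)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    d_row, d_col = offset
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset is larger than the image: no pixel pairs")
    return [[v / total for v in row] for row in counts]


def glcm_properties(p):
    """Contrast, homogeneity, energy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i == 0 or var_j == 0:
        correlation = math.nan  # undefined when the GLCM variance is zero
    else:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}


# Made-up images already scaled to two gray levels (0 and 1).
constant = glcm_properties(glcm([[1, 1, 1], [1, 1, 1]], levels=2))
striped = glcm_properties(glcm([[0, 1, 0, 1]], levels=2))
print(constant, striped)
```

For the constant image the sketch returns a contrast of 0, a homogeneity and an energy of 1 and an undefined (NaN) correlation, which matches the limit values quoted in the list above.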



Table 2.4 contains some results of testing the GLCM together with the Fusion mask for 
two drivers of the sample. In the previous sequences, the drivers were turning the 
steering wheel and moving a hand. These situations were chosen as problematic cases 
that could interfere with the recognition of the driver’s silhouette.

Table 2.4 Properties in four directions of the GLCM in drivers A241 and A686 
(offset=200).

Sequence      Driver   Contrast                        Correlation                        Energy
Event         A241     [0.1598 0.0007 0.2925 0.0031]   [0.3671 -0.0004 0.2036 -0.0016]    [0.8380 0.9984 0.6473 0.9938]
Event         A686     [0.6412 2.1862 1.9693 0]        [-0.0565 -0.2229 -0.0400 NaN]      [0.7625 0.3703 0.3482 1]
2-sec. bef.   A241     [0.0595 0 0.1335 0]             [-0.0085 NaN -0.0289 NaN]          [0.9580 1 0.8710 1]
2-sec. bef.   A686     [0.0283 0.0696 0.3453 0]        [-0.0143 -0.0360 -0.0402 NaN]      [0.9451 0.8705 0.6109 1]
4-sec. bef.   A241     [0.0304 0 0.1766 0]             [-0.0085 NaN -0.0356 NaN]          [0.9625 1 0.8439 1]
4-sec. bef.   A686     [0 0 1.4572 0]                  [NaN NaN 0.0481 NaN]               [1 1 0.5422 1]

Contrast was one of the properties that showed the most significant differences between 
the event and the previous sequences. These differences were more evident when 
increasing the offset between the pair of pixels. This might be due to the size of the 
driver’s torso in the silhouette: the dispersion in the areas generated by maneuvering or 
moving the hand might not be large enough to have the same effect. 

With respect to the correlation, in driver A241 the value recorded in the first direction 
(horizontal) was positive during the event and negative in the previous sequences. 
However, this effect was not observed in driver A686, since the values were quite 
similar in both negative and positive sequences (see footnote 12). In the case of the 
energy, some values were higher in the previous sequences than during the event, 
depending on the direction of the GLCM. This might indicate a higher uniformity in 
images from negative situations.


12 Note that the NaN values obtained in some directions when calculating the correlation 
mean that the GLCM variance is zero, i.e. the image is completely uniform for the 
defined combination of pixel pairs.


2.4 Evaluation criteria. Data set definition

The goal of this project is to create an algorithm able to run through the triggered 
events of the database and to save only those in which the drivers react to a CRE. 
Potential methods for recognizing the driver’s reaction have been discussed in the 
previous sections using some positive and negative situations. Their performance in the 
training sample will indicate which combinations are more likely to identify the CREs.

Nevertheless, the evaluation of the tested methods requires a larger data set containing 
situations different from those used previously. This data set, called the validation data 
set from now on, comes from a triggering process with kinematic triggers in the 
euroFOT database and a subsequent evaluation by the annotators. The validation data 
set contains 120 different situations chosen randomly among the events that were 
considered positive or were rejected by the annotators when watching the videos of 
CRE candidates.

Figure 2.31 Schema of procedure of algorithms’ evaluation.

Several thresholds have been considered when implementing the algorithms in the 
baseline. If the threshold is not strict, it will result in a greater number of true positives 
(CREs rightly triggered), but also of false positives (normal driving situations wrongly 
triggered as positives). The ideal situation would capture only the 19 positive events 
without any negative (19 true positives and 101 true negatives). Since this would only 
be possible in a further study, with adequate adjustments to this preliminary project, 
there must be a compromise between the true positives achieved and the false positives 
admitted. This compromise can be represented in terms of specificity and sensitivity 
using Receiver Operating Characteristic (ROC) curves.

The ROC curve is a graphical representation of the rate of true positives against the rate 
of false positives for different thresholds in a diagnostic test (Tape, n.d.). This method 
was originally developed during World War II for radar signal detection (Mason and 
Graham, 2002). Nowadays it is widely used in the medical field for the diagnosis of 
diseases.

There is a trade-off between sensitivity (rate of positives correctly identified by the test) 
and specificity (rate of negatives correctly identified by the test). If the sensitivity 
increases, the specificity decreases and vice versa. In this case, true positives are CREs 
rightly triggered (Y-axis) and false positives are normal driving situations wrongly 
triggered (X-axis).
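
The construction of the points of a ROC curve from a set of thresholds can be sketched in Python as follows. The scores and labels are made up, the function name roc_points is hypothetical, and the rule that an event is triggered when its score reaches the threshold is an assumption made for illustration.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    An event is counted as triggered when its score is >= the threshold.
    labels: 1 for a positive event (CRE), 0 for a negative event.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up scores for eight events (four positive, four negative).
scores = [9, 8, 7, 6, 5, 4, 3, 2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

# Strict, intermediate and permissive thresholds.
points = roc_points(scores, labels, thresholds=[10, 6, 0])
print(points)
```

A strict threshold triggers nothing, while a permissive one triggers every event; the intermediate thresholds trace the curve between both extremes.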


[Figure 2.31 diagram: the algorithm (potential trigger) is applied to the validation data 
set (120 events: 19 positive, 101 negative); results: ROC curve.]


The closer the curve is to the upper left corner, the more accurate the test (see 
representation in Figure 2.32). Conversely, the closer the curve is to the diagonal, the 
less accurate the test. The area under the ROC curve (AUC) is commonly used as a 
measure of accuracy. The following values can be used as a guide (Tape, n.d.):

.90-1 = excellent (A)

.80-.90 = good (B)

.70-.80 = fair (C)

.60-.70 = poor (D)

.50-.60 = fail (F)

Figure 2.32 Tape T., The Area Under an ROC Curve, [electronic print] Available at 
<http://gim.unmc.edu/dxtests/ROC3.htm>[Accessed on May 2011].

In this case, the main limitation of using the area under the curve (AUC) to compare 
methods is that it is more important to keep true positives, even if this means an 
increased number of false positives. Thus, evaluating the methods over the whole range 
of false positive rates (the area under the entire ROC curve) does not seem the most 
appropriate approach. One alternative is to analyze a portion of the ROC curve 
(Katzman, 1989; Cleveland, 2011). 

The relevant portion of the curve can be estimated as the region with a false positive 
rate below 60% and a true positive rate above 80%. The main reason is to keep almost 
all the true positives (sensitivity), even if this means increasing the false positives 
(1-specificity). Therefore, the negative events in the database may be reduced by at 
least 40% without losing more than 20% of the positive events. In terms of the 
validation data set, this means removing about 40 of the 101 negative events while 
keeping at least 16 of the 19 positives.
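
One possible way of quantifying the bounded portion of the curve is sketched below in Python. The text does not state how the area was computed, so this is an assumption: the ROC curve is treated as a staircase that holds the true positive rate of each point until the next one, and only the part above 80% and to the left of 60% is accumulated. The names in_target_region and bounded_area and the example points are hypothetical.

```python
def in_target_region(fpr, tpr, max_fpr=0.6, min_tpr=0.8):
    """True if an operating point meets both bounds of the evaluation criteria."""
    return fpr < max_fpr and tpr > min_tpr


def bounded_area(points, max_fpr=0.6, min_tpr=0.8):
    """Area between the ROC staircase and the line TPR = min_tpr, for FPR < max_fpr.

    points: (false positive rate, true positive rate) pairs. Each point holds
    its TPR until the next point (staircase assumption).
    """
    ordered = sorted(points)
    area = 0.0
    for (fpr0, tpr0), (fpr1, _) in zip(ordered, ordered[1:]):
        right = min(fpr1, max_fpr)
        if right > fpr0:
            area += (right - fpr0) * max(tpr0 - min_tpr, 0.0)
    return area


# Made-up ROC points for illustration.
example = [(0.0, 0.0), (0.2, 0.9), (0.5, 1.0), (1.0, 1.0)]
example_area = bounded_area(example)
print(example_area, in_target_region(0.5, 0.9))
```

With this definition, a curve that never enters the bounded region yields an area of zero, and larger values indicate curves that reach high sensitivity at lower false positive rates.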




3 Results

In the previous chapter, several algorithms were considered and tuned on a training 
sample of eleven drivers, with three different situations for each driver, with the aim of 
distinguishing CREs within a collection of negative and positive events. This chapter 
covers the performance of these algorithms in the training sample and their validation 
on a larger data set. The ideal algorithm should identify as many positive events as 
possible while triggering as few negative situations as possible. Results are presented 
below using ROC curves.

3.1 Performance in the training sample

Throughout the previous chapter, several methods were applied to recognize the 
driver’s reaction in order to identify safety-critical situations. Initial approaches, such as 
the classification based on t-test results, seemed to generate noisy images and an unclear 
picture of the driver’s motion. Nevertheless, analyzing changes in pixel intensities over 
time suggests that the sought motion may be related to a sudden change in the 
intensities of a group of pixels. 

The same concept is behind the images of STD of jerk and the OF calculations. In most 
cases, it is possible to identify the positive event just by looking at the grayscale images 
of STD of jerk, without any additional information. The key to this identification is the 
silhouette of the driver, which means that there is a group of pixels that share a wide 
variance of the jerk distribution over time. In the case of the OF, peaks in the jerk 
distribution derived from the average of the OF velocities in each frame help to 
discriminate between the previous sequences and the event. The calculation is based on 
average speeds, so a peak indicates that a group of pixels changes quickly between 
frames. 

To assess the validity of this theory, these calculations must be performed throughout 
the entire training sample. The following is an example of the results obtained for one 
of the drivers. Jerk distributions are presented together with the images of STD of jerk, 
which are also evaluated as distributions of the sum of values along rows. Appendix 4 
covers the same calculations for all the drivers of the training sample.

A driver’s silhouette is expected to appear during the event when plotting the STD of 
jerk. The distribution of the sum of values in rows for each column is intended to 
distinguish between normal driving maneuvers (just certain white areas in the images) 
and reactions in CREs. Thus, normal maneuvering may be related to local peaks in these 
curves, while a higher mean may be related to the positive events. This is because the 
white areas are spread across the image, reproducing the driver’s silhouette.



Figure 3.1 STD of jerk: 4-sec. before the event, 2-sec. bef.  and event (driver A484).

The driver remains in the same position in the sequences before the event. Therefore, 
the images of STD of jerk seem to be a clear indicator of when the driver reacts. As can 
be seen in Figure 3.2, the sum of values along the rows for each column is also 
significantly higher during the event than in the previous sequences.

 
Figure 3.2 Distribution of STD of jerk values along rows and columns.

Since the driver remains in the same position over time, the distributions of jerk should 
be relatively constant before the event. Some unexpected results were obtained four 
seconds before the event at the 8th iteration, as shown in Figure 3.3. In any case, the 
maximum jerk is reached during the event.

Figure 3.3 Distribution of jerk from OF velocities.

Range: 1.694-(-3.745)=5.439



3.1.1 Optical Flow

Findings in the training sample support the hypothesis that peaks in the distribution of 
jerk from OF velocities are related to the driver’s reaction to a CRE. This hypothesis 
holds for ten of the eleven drivers of the sample, for whom the ranges of jerk are 
significantly higher during the event than in the previous sequences.
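
The range of jerk used as a criterion can be sketched in Python as follows. This is an illustration under assumptions: jerk is taken here as the second finite difference of the mean OF velocity per frame, the time step is set to one frame, and the velocity series are made up. The function names finite_difference and jerk_range are hypothetical.

```python
def finite_difference(series, dt=1.0):
    """First-order finite difference of a sampled signal."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]


def jerk_range(mean_velocities, dt=1.0):
    """Range (max - min) of jerk computed from mean OF velocities per frame.

    Assumption: jerk is the second difference of the mean velocity.
    """
    acceleration = finite_difference(mean_velocities, dt)
    jerk = finite_difference(acceleration, dt)
    return max(jerk) - min(jerk)


# Made-up mean OF velocities per frame.
steady = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]      # driver keeps the same position
reaction = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0]    # sudden movement in one frame

steady_range = jerk_range(steady)
reaction_range = jerk_range(reaction)
print(steady_range, reaction_range)
```

A sequence without movement gives a range of zero, while a single sudden movement produces a positive and a negative jerk peak and therefore a wide range.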

The remaining uncertainties are which range of jerk corresponds to the driver’s 
reaction, since these values differ for each driver (see Figure 3.4), and the computational 
cost of running the optical flow code. Given the dimensions of the database, the 
computational time is an important limitation. 

Figure 3.4 Ranges of jerk for drivers of the training sample.

To sum up, jerk peaks from OF velocities and images of STD of jerk were identified as 
potential indicators of positive events. Both methods base the discrimination on the 
presence of the driver’s reaction when CREs occur. The main limitation of the OF 
calculation is its computational time. On the other hand, a relationship was observed 
between the driver’s reaction and the images of STD of jerk. In most cases, it was 
possible to identify the positive event just by looking at the driver’s silhouette. As this is 
graphical information, several methods were addressed in the previous chapter to 
convert it into numerical information.

3.1.2 Mean criterion

The figure below includes the distribution of mean values for each sequence in all the 
drivers of the training sample:

Figure 3.5 Mean of distribution of sum of STD of jerk values along rows for all the
drivers of the training sample



As shown in Figure 3.5, the mean values are higher during the event than in the 
previous sequences for nine of the eleven drivers of the sample. In both exceptions (5th 
and 10th position in the sample), maneuvering in the sequences before the event 
generates peaks in the distribution of STD of jerk values and, consequently, the mean 
value increases. Images of STD of jerk for both cases are presented below to analyze 
why the mean value differs from those obtained in the rest of the sample.

Figure 3.6 STD of jerk in negative sequences of drivers A567 and A686. 

The marked areas, corresponding to the hand movement and the turning of the steering 
wheel, contain pixels with a wide variance in jerk values over time. This causes the 
mean to increase in such situations in comparison with the value obtained from the 
positive event. Thus, the mean criterion does not seem consistent enough by itself to 
discriminate between positive and negative situations. 

Since the values were added only along rows, the distributions can also be tested in the 
other direction. This involves calculating the mean of the distribution of the sum of STD 
of jerk values along columns instead of rows. The distributions in both directions for 
one of the two exceptions are plotted in the figures below, together with the number of 
zeros (black color) in the images:

Figure 3.7 Distribution of sum of STD of jerk values along rows (driver A567).



Figure 3.8 Distribution of sum of STD of jerk values along columns (driver A567).

As can be seen in Figures 3.7 and 3.8, the peaks of the sum of STD values are higher in 
the previous sequence than during the event, along both rows and columns. Thus, the 
white area generated when turning the steering wheel still produces a higher mean in 
this case. Besides, the number of zeros is very similar in all the sequences. 

Comparing this result with those obtained for the rest of the sample, it is observed that 
in both exceptions (drivers A567 and A686) the mean is also higher in other sequences 
than during the event (see Table 3.1). The number of zeros is not significant enough to 
distinguish between positive and negative sequences.

Table 3.1 Mean of distributions of sum of STD over columns and number of non-zero 
values in the training sample.

          Mean of STD distribution over columns    Number of non-zero values
Driver    4-sec. bef.   2-sec. bef.   Event        4-sec. bef.   2-sec. bef.   Event
A34       4190          8092          17920        144,2         148,4         172,5
A241      2885          3869          6198         126,8         130,5         152,7
A481      2505          1892          5295         183,7         181,9         191
A501      6204          10410         12010        162,9         178,7         181,9
A686      7393          3485          5489         128,7         124,5         130,1
A1064     2071          1916          3656         122,8         124,6         132,9
A131      9519          10330         20950        168,4         163,6         200,2
A352      3342          4045          8982         205,2         206,9         217,9
A484      1505          1381          7214         158,6         156           172
A567      3767          6708          6287         154,8         156,2         154,7
A936      2121          5958          9711         98,91         108,2         142,9

In conclusion, the mean values of the distributions of the sum of STD of jerk along rows 
and columns were calculated to recognize the driver’s silhouette as a wider dispersion 
of intensities over the images. Since this value is affected by concentrated areas in the 
image caused by maneuvering and changes in position, other statistical measures are 
taken into account.



3.1.3 Harmonic mean

As with the mean in the previous section, the harmonic mean is calculated for the 
distributions of the sum of STD of jerk values along rows and columns. Again, the aim 
is to locate the silhouette through the dispersion of pixels in the image, with the 
difference that the harmonic mean is not as affected by outliers. The sum of each pair of 
values (harmonic means in rows and columns) is presented below as “Combination of 
harmonic means” for all the drivers of the training sample: 

As shown in Figure 3.9, the combinations of harmonic means reach higher values 
during the event than in the previous sequences for all the drivers of the training 
sample. This result is also observed for the drivers in 5th and 10th position in the 
sample, the exceptions to the mean criterion, who in this case register a higher sum of 
harmonic means during the event.

Figure 3.9 Distribution of combination of harmonic means in the training sample.

Given the range of harmonic mean values across drivers (see Figure 3.9), the main issue 
is to establish a threshold able to identify as many events as possible while triggering as 
few negative situations as possible. 

3.1.4 Mean&General mask

The mean value was also calculated using the Fusion mask, a binary mask created from 
the combination of several drivers’ silhouettes. 

Figure 3.10 shows the mean values obtained by applying the binary Fusion mask to the 
images of STD of jerk. Although the training sample is not large enough to be 
statistically meaningful, these results suggest higher means in the images from events 
than in those from previous sequences.  

Figure 3.10 Mean values by applying Fusion mask in STD of jerk images.



3.1.5 GLCM properties

The contrast and the energy of the GLCM were also tested on the entire training sample 
with an offset of 200:

Figure 3.11 Sum of contrast in four directions of the GLCM in the training sample.

Figure 3.12 Energy of the GLCM in the training sample

Figure 3.11 shows the sum of contrasts in four different directions of the GLCM with an 
offset of 200 pixels for the sequences of the training sample. This value appears to be 
greater in some of the positive events than in the previous sequences, but it is not a clear 
discriminator in some cases. The same occurs when using energy as the GLCM 
property. Energy values seem to be generally lower during the events than in the 
previous sequences. The main limitation would be setting a value that discriminates 
between both situations. 

 


3.2 Results in the validation data set

As mentioned previously, the training sample is used to test different methods and 
identify potential features to discriminate between positive and negative events. 
However, it is not large enough to be statistically meaningful. Therefore, the evaluation 
requires the use of the validation data set. 

Below, ROC curves are plotted for each method with different combinations of image 
masks and thresholds. The range of threshold values was chosen according to the results 
of the training sample. The thresholds are represented as dots on the graph over the 
entire range of false positive rates, which gives an idea of the accuracy of the curve. 
However, only the area under a certain portion of the curve is relevant. It is bounded by 
two lines on the graphs. The largest area within these boundaries determines which 
method is the most accurate according to the requirements specified in the evaluation 
criteria in Chapter 2. 

Another consideration when comparing the methods is the computational time. It is 
estimated as the processing time (in seconds) required for each second of trip. It is 
calculated from the time taken to compute all the iterations over the threshold values, 
considering the two-second duration of each file in the baseline.

3.2.1 Mean criterion

The mean criterion evaluates the presence of the driver’s silhouette in images of STD of 
jerk by adding the STD values along rows and columns. The two vectors of partial sums 
are combined into a single vector, and the mean of its distribution is calculated. 
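
The mean criterion can be sketched in Python as follows. This is a minimal illustration and not the code used in this work: combining both vectors of partial sums by concatenation is an assumption, the image and threshold are made up, and the names mean_criterion and is_triggered are hypothetical.

```python
from statistics import mean


def mean_criterion(std_image):
    """Mean of the row sums and column sums of an STD of jerk image.

    Assumption: both vectors of partial sums are concatenated before averaging.
    """
    row_sums = [sum(row) for row in std_image]
    column_sums = [sum(column) for column in zip(*std_image)]
    return mean(row_sums + column_sums)


def is_triggered(std_image, threshold):
    """An event is triggered when the mean criterion reaches the threshold."""
    return mean_criterion(std_image) >= threshold


# Made-up 2-by-2 STD image.
toy_image = [[1, 2], [3, 4]]
toy_value = mean_criterion(toy_image)  # row sums [3, 7], column sums [4, 6]
print(toy_value, is_triggered(toy_image, threshold=4))
```

Sweeping the threshold over a range of values and recording the true and false positive rates at each step then yields one point of the ROC curve per threshold.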

Sixty iterations were run, changing the threshold value in steps of one unit. Three 
different input images were considered:

-without mask: original image cropped around the torso.

-BW mask: binary mask hiding the window.

-Fusion mask: binary mask around the area in which drivers’ silhouettes commonly 
appear. 

Compared with the evaluation criteria discussed above, the curves in Figure 3.13 are 
close to the shape corresponding to good accuracy. However, better results would be 
obtained if the curves were closer to the upper left corner. The area under the bounded 
portion of the curve seems larger without any mask. The second best option according 
to this area is the Fusion mask.

Figure 3.13 ROC curves of thresholding with different combinations of mean 
criterion.

3.2.2 Harmonic mean

The distribution of partial sums of STD of jerk along rows and columns is now evaluated 
using the harmonic mean. Unlike the previous test, this method discriminates against the 
outliers of that distribution, which are mostly related to maneuvers. 

The threshold values vary between harmonic means of 400 and 10000, resulting in a stepped 
ROC curve.

Figure 3.14 ROC curves of thresholding with harmonic mean criteria.

The harmonic mean criterion emerged as an alternative to the mean. Among the 33 cases of 
the training sample, the harmonic mean was higher in the 11 that were positive. However, 
as can be seen in Figure 3.14, the bounded area is zero. The slope at the beginning of 
the ROC curve is favorable, in that sensitivity increases faster than (1-specificity). 
Nevertheless, beyond a certain threshold value, further variations do not seem to affect 
the rates of true and false triggers. The best result obtained with this method according 
to the initial criterion is achieved with harmonic means above 2635. In that case, the 
false positive rate is 60.4%, while the true positive rate is 89.4%.
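
For reference, the harmonic mean of a vector of partial sums can be sketched as below. This is a hypothetical helper written for illustration, with made-up values. It shows the property exploited here: the harmonic mean is dominated by the smaller values, so a single large outlier changes it far less than it changes the arithmetic mean.

```python
def harmonic_mean(values):
    """Harmonic mean n / sum(1/x) of non-negative values."""
    if any(v < 0 for v in values):
        raise ValueError("harmonic mean requires non-negative values")
    if any(v == 0 for v in values):
        return 0.0  # convention: a zero entry forces the result to zero
    return len(values) / sum(1.0 / v for v in values)


# Illustrative partial sums with one large outlier (not real data).
sums = [100.0, 100.0, 100.0, 10000.0]
arithmetic = sum(sums) / len(sums)
harmonic = harmonic_mean(sums)

# Assumption: an event is triggered when the harmonic mean exceeds the threshold.
threshold = 2635
is_triggered = harmonic > threshold
```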

3.2.3 Ranges of jerk from OF

The calculation of the optical flow (OF) is a numerical alternative to the use of STD 
images for estimating rates of change in pixel intensities.

Events in the baseline have been triggered above three different intermediate values of 
the range of jerk (1, 2 and 6). This small number of threshold values is mainly due to 
the computational cost of implementing the OF.
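
The triggering rule can be sketched as follows, under several assumptions that are not taken from this work: the OF output is reduced to one velocity value per frame, jerk is approximated by the second finite difference of that velocity series, and the time step `dt` and the function name are hypothetical.

```python
def jerk_range(velocities, dt):
    """Range (max - min) of jerk estimated from a series of OF velocities."""
    if len(velocities) < 3:
        raise ValueError("at least three velocity samples are needed")
    # First difference: acceleration. Second difference: jerk.
    acc = [(b - a) / dt for a, b in zip(velocities, velocities[1:])]
    jerk = [(b - a) / dt for a, b in zip(acc, acc[1:])]
    return max(jerk) - min(jerk)


# Toy velocity series with unit time step, for illustration only.
toy_velocities = [0.0, 0.0, 1.0, 0.0, 0.0]
value = jerk_range(toy_velocities, dt=1.0)

# The three threshold values evaluated in the text.
triggers = {t: value > t for t in (1, 2, 6)}
```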

Figure 3.15 ROC curves of thresholding with OF Criteria.

OF velocities have been calculated using the original images and their combinations with 
binary masks. For the three cases observed in Figure 3.15, the ROC curves lie closer to 
the diagonal. Although the results are slightly better with the fusion mask, the method 
seems inaccurate for identifying positive events.

This contrasts with the results of evaluating the OF in the training sample. For ten of 
the eleven drivers, the peak in the jerk distribution in positive situations was clearly 
significant in comparison with those obtained in negative sequences. The main limitation 
arises in the threshold value, since it changes for each driver. 

In any case, applying this method together with the fusion mask reduces the number of 
negatives by roughly 40% while triggering 17 of the 19 positive events (see Table 3.2). 

Table 3.2 Results of triggering with ranges of Jerk of OF speeds above one in 
combination with fusion mask over the original images.

            Range > 1
            TRUE    FALSE
Positive    17      2
Negative    38      63
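
The rates implied by these counts can be checked with a few lines of Python. The sketch assumes that the TRUE column counts correctly classified events and the FALSE column misclassified ones. This reading is an assumption, chosen because it is consistent with the text (17 of the 19 positive events triggered, and roughly 40% of the negatives discarded).

```python
# Counts from the table, read as correct (TRUE) / incorrect (FALSE) decisions.
true_positives, false_negatives = 17, 2
true_negatives, false_positives = 38, 63

n_positive = true_positives + false_negatives   # 19 positive events
n_negative = true_negatives + false_positives   # 101 negative events

sensitivity = true_positives / n_positive            # about 0.895
negatives_removed = true_negatives / n_negative      # about 0.376
false_positive_rate = false_positives / n_negative   # about 0.624
```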

3.2.4 GLCM properties

A statistical approach to identifying the driver’s silhouette in images of STD can be 
based on the spatial distribution of pairs of pixels over the images. This involves 
estimating how the properties of the GLCM change between positive and negative events. 
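
As a reminder of what these properties measure, the sketch below builds a normalized co-occurrence matrix for horizontally adjacent pixel pairs and derives Energy and Contrast from it. It is an illustration under stated assumptions rather than the implementation used in this work: the image is assumed to be already quantized to integer gray levels, only the offset (0, 1) is considered, the function names are hypothetical, and Energy is taken as the sum of squared entries (some libraries use its square root instead).

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def energy(p):
    """Sum of squared GLCM entries; high for uniform textures."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Intensity difference between paired pixels, weighted by frequency."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))


# Two toy 2x2 images with two gray levels, for illustration only.
flat_rows = glcm([[0, 0], [1, 1]], levels=2)    # neighbours always equal
alternating = glcm([[0, 1], [0, 1]], levels=2)  # neighbours always differ
```

In this toy case the first image has zero Contrast and the second has a Contrast of one, which is the kind of difference a threshold on these properties would exploit.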

As can be seen in Figure 3.6, the trend at the beginning of the ROC curve when 
thresholding with Energy values is better than that obtained using the Contrast. This is 
due to an 
incre