Threat assessment from naturalistic video-data: How to detect, classify, and estimate the position of multiple road users from cameras
Type
Master's Thesis
Program
Mobility engineering (MPMOB), MSc
Published
2024
Authors
Fang, Luhan
Wu, Yahui
Abstract
Each year, over one million people die in traffic-related crashes, with more than
half involving vulnerable road users (VRUs) such as pedestrians, cyclists, and
e-scooterists. To mitigate these crashes, it is important to understand their causation
mechanisms and to analyze the interactions and behaviors of VRUs.
Using knowledge of these crash mechanisms and of rider behavior, threat assessment
algorithms can be developed to identify anomalies in rider behavior and the likelihood of
a crash. Naturalistic data has proven to be an effective tool in traffic safety research
for identifying road user interactions and crash mechanisms. Because such datasets
tend to be large, safety-critical events have traditionally been identified using
kinematic triggers. However, in some safety-critical scenarios the vehicles may not
exhibit any change in kinematics, so these events are often excluded from analysis.
Given the sheer volume of data, manually reviewing each video would require significant
time and resources, making it unviable. Advances in computer vision make it possible
to analyze the video footage automatically: computer vision methods can accurately
identify the different road users within the footage.
Once a road user is identified, machine learning models can be used to convert
its 2D position in the video frame into a real-world position and kinematics.
Threat assessment algorithms can then use these positions and kinematics
to determine safety-critical scenarios. This approach makes video-based identification
of safety-critical scenarios feasible and makes efficient use of the collected data.
This thesis proposes a system that first uses version 7 of the You Only Look
Once (YOLOv7) model for object detection, providing bounding boxes
for the target road users: pedestrians, cars, cyclists, e-scooterists, and
micromobility vehicles such as bicycles and e-scooters. Machine learning regressors
then take features of the YOLOv7 bounding boxes as input to
estimate the positions of the target objects. Finally, a threat assessment algorithm,
based on the range and time gap between the ego vehicle and the target object,
identifies whether the scenario in the video footage is a critical event.
The object detection model generates detection results for pedestrians, cars, cyclists,
e-scooterists, bicycles, and e-scooters with a mean Average Precision (mAP)
of 0.893 for all classes combined. Among the machine learning regressors selected
for position estimation, the RandomForestRegressor achieves the highest R² score,
exceeding 0.9. With the positional information of the road users, the threat
assessment algorithm can apply safety metrics to detect critical events.
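
As an illustration of the final step, the following is a minimal sketch of a threat check based on range and time gap. It is not the thesis's implementation: the definition of time gap as range divided by ego speed, the rule that both conditions must hold, the threshold values, and all names (Target, time_gap_s, is_critical) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Estimated state of one detected road user (hypothetical structure)."""
    label: str       # detected class, e.g. "cyclist"
    range_m: float   # estimated distance from the ego vehicle, in metres


def time_gap_s(range_m: float, ego_speed_mps: float) -> float:
    """Time the ego vehicle needs to cover the range at its current speed.

    Assumption: time gap = range / ego speed. The thesis may define it
    differently.
    """
    if ego_speed_mps <= 0.0:
        return float("inf")  # a stationary ego vehicle never closes the gap
    return range_m / ego_speed_mps


def is_critical(target: Target, ego_speed_mps: float,
                max_range_m: float = 10.0,
                max_time_gap_s: float = 1.5) -> bool:
    """Flag a frame as critical when the target is both close and soon reached.

    The default thresholds and the AND combination are illustrative
    assumptions, not values taken from the thesis.
    """
    gap = time_gap_s(target.range_m, ego_speed_mps)
    return target.range_m <= max_range_m and gap <= max_time_gap_s
```

In this sketch, a cyclist estimated at 8 m while the ego vehicle travels at 10 m/s has a time gap of 0.8 s and would be flagged; the same cyclist at an ego speed of 2 m/s (time gap 4 s) would not.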
Subject/keywords
VRU, Micromobility, Computer Vision, Position Estimation, Threat Assessment