Threat assessment from naturalistic video-data: How to detect, classify, and estimate the position of multiple road users from cameras

Type
Master's Thesis
Program
Mobility engineering (MPMOB), MSc
Published
2024
Authors
Fang, Luhan
Wu, Yahui
Abstract
Each year, over one million people die in traffic-related crashes, with more than half involving vulnerable road users (VRUs) such as pedestrians, cyclists, and e-scooterists. To mitigate these crashes, it is important to understand their causation mechanisms and to analyze the interactions and behaviors of VRUs. Using knowledge of these crash mechanisms and of rider behavior, threat assessment algorithms can be developed to identify anomalies in rider behavior and the likelihood of a crash. Naturalistic data has proven to be an effective tool in traffic safety research for identifying road user interactions and crash mechanisms. Because such datasets tend to be large, safety-critical events have traditionally been identified using kinematic triggers. However, in some safety-critical scenarios the vehicles may not exhibit any change in kinematics, which often leads to the exclusion of these scenarios from analysis. Given the sheer volume of data, manually reviewing each video requires significant time and resources, making it unviable. With the development of computer vision, automated analysis of the video footage has become an effective alternative: computer vision methods can accurately identify the different road users within the footage. Once a road user is identified, machine learning models can be used to convert its 2D position in the video frame into a real-world position and kinematics. Threat assessment algorithms can then use these positions and kinematics to determine safety-critical scenarios. This approach makes video-based identification of safety-critical scenarios feasible and makes efficient use of the collected data. This thesis proposes a system that first uses version 7 of the You Only Look Once model (YOLOv7) for object detection, providing bounding boxes for the target road users, including pedestrians, cars, cyclists, e-scooterists, and other micromobility vehicles such as bicycles and e-scooters.
Machine learning regressors then take features of the YOLOv7 bounding boxes as input to estimate the positions of the target objects. Finally, a threat assessment algorithm, based on the range and time gap between the ego vehicle and the target object, identifies whether the scenario in the video footage is a critical event. The object detection model detects pedestrians, cars, cyclists, e-scooterists, bicycles, and e-scooters with a mean Average Precision (mAP) of 0.893 over all classes combined. Among the selected machine learning regressors for position estimation, the RandomForestRegressor has the highest R² score, exceeding 0.9. With the positional information of the road users, the threat assessment algorithm can apply safety metrics to detect critical events.
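The final stage of the pipeline described above, a check based on range and time gap, could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the thesis implementation: the `TargetObservation` structure, the threshold values, the definition of time gap as range divided by ego speed, and the rule that both conditions must hold are all hypothetical choices, since the abstract does not specify them. The range is assumed to come from the position-estimation stage.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration only; the abstract gives no values.
RANGE_THRESHOLD_M = 30.0
TIME_GAP_THRESHOLD_S = 1.5


@dataclass
class TargetObservation:
    """One detected road user in one video frame (hypothetical structure)."""
    label: str            # e.g. "pedestrian", "cyclist", "e-scooterist"
    range_m: float        # estimated distance from the ego vehicle, in metres
    ego_speed_mps: float  # ego vehicle speed, in metres per second


def time_gap_s(obs: TargetObservation) -> Optional[float]:
    """Time the ego vehicle would need to cover the current range at its
    current speed (assumed definition). Undefined for a stationary ego."""
    if obs.ego_speed_mps <= 0.0:
        return None
    return obs.range_m / obs.ego_speed_mps


def is_critical(
    obs: TargetObservation,
    range_threshold_m: float = RANGE_THRESHOLD_M,
    time_gap_threshold_s: float = TIME_GAP_THRESHOLD_S,
) -> bool:
    """Flag an observation as critical when the target is both close and
    reachable within a short time gap (assumed combination rule)."""
    gap = time_gap_s(obs)
    if gap is None:
        return False
    return obs.range_m < range_threshold_m and gap < time_gap_threshold_s


if __name__ == "__main__":
    observations = [
        TargetObservation("cyclist", range_m=20.0, ego_speed_mps=20.0),
        TargetObservation("pedestrian", range_m=20.0, ego_speed_mps=5.0),
    ]
    for obs in observations:
        print(obs.label, time_gap_s(obs), is_critical(obs))
```

Requiring both conditions keeps a slow-moving or stopped ego vehicle near a road user from being flagged on range alone; a real system might instead use relative speed or a different combination of the two metrics.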
Subject/keywords
VRU, Micromobility, Computer Vision, Position Estimation, Threat Assessment