Using machine learning to estimate road-user kinematics from video data

Type
Project Report, advanced level
Published
2024
Authors
Fang, Luhan
Malm, Oskar
Wu, Yahui
Xiao, Tianshuo
Zhao, Minxiang
Abstract
Each year, over one million people sustain fatal injuries in traffic-related crashes, and vulnerable road users (VRUs) are involved in more than half of these crashes. In the context of road safety, VRUs are mainly pedestrians, cyclists, motorcyclists, and e-scooterists. To mitigate these crashes, it is essential to understand their causation mechanisms. Naturalistic data has been recognised as a valuable tool for understanding road-user behaviour and addressing safety concerns within traffic safety research. Traditionally, critical events are identified in naturalistic data using kinematic information from the sensors on board the vehicles, and the video footage of the corresponding trips is then used to validate and annotate the events. While this is a reliable method for identifying crashes, near-crashes may not produce any anomalies in the sensor readings and can therefore go undetected. Such near-crash events would be visible in the video footage, but identifying them manually by watching the videos is not feasible because the volume of video is far too large to review by hand. Developing tools that can identify different road users and estimate their positions from video footage is therefore essential and will enable automation of the process of identifying critical events. This report describes such models and also discusses how machine learning could be applied in the future to determine the severity of imminent critical interactions among road users.

This project investigates how models can be developed to estimate and predict the position and kinematics of various road users from video data recorded by a camera mounted on an e-scooter. Bounding boxes and categories for road users were initially generated with the You Only Look Once (YOLOv7) algorithm. Cyclists were detected with a simple rule-based model that computes the overlap area between the pedestrian and bicycle bounding boxes detected by YOLOv7, and e-scooterists were detected by combining the YOLOv7 and MobileNetV2 models. Separate machine learning models were trained to estimate the distance to each of the four road-user types, pedestrians, cyclists, e-scooterists, and cars, using LiDAR and GPS data as the position ground truth; the inputs for these models were derived from the bounding boxes extracted from the videos. A DBSCAN-based noise remover was then applied to the output of the distance estimation model to filter out points with excessive errors, and a Rauch-Tung-Striebel smoother was applied to the filtered output to improve the distance estimation accuracy and to generate both the relative position and the velocity of the target road user.

It was concluded that the object detection models could achieve an accuracy of over 90%, and that distance estimation was most accurate for all road users when polar coordinates were used rather than Cartesian coordinates. The highest R2 score for distance estimation was obtained with a k-nearest-neighbours regression model (with n = 2) using the pixel coordinates of the bottom-centre point of the bounding box, together with the bounding box height and width, as input. With this configuration, the e-scooterist model achieved an R2 score of 0.978, while the cyclist and car models attained scores of 0.92 and 0.96, respectively, meaning that the distances predicted by the models are highly accurate.
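As a rough illustration of the distance-estimation step, the sketch below shows a k-nearest-neighbours regressor (n = 2) trained on bounding-box features, assuming a scikit-learn-style workflow; the library choice, variable names, and numerical values are illustrative assumptions rather than details taken from the report.

```python
# Minimal sketch of the distance-estimation step (illustrative values only).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical training data: one row per detection, with the pixel coordinates
# (u, v) of the bottom-centre point of the YOLOv7 bounding box followed by its
# height and width.
X_train = np.array([
    [640.0, 540.0, 260.0, 120.0],
    [700.0, 480.0, 140.0,  60.0],
    [400.0, 450.0, 100.0,  45.0],
])
# Ground-truth distance to the road user in metres (LiDAR/GPS in the project).
y_train = np.array([4.2, 9.8, 15.5])

# k-nearest-neighbours regression with n_neighbors=2, as reported for the
# best-performing configuration.
model = KNeighborsRegressor(n_neighbors=2)
model.fit(X_train, y_train)

# Estimate the distance for a new detection.
new_box = np.array([[620.0, 500.0, 180.0, 80.0]])
print(model.predict(new_box))  # predicted distance in metres
```

The bottom-centre point of the bounding box is a natural feature choice because, for a forward-facing camera, road users whose ground contact point appears lower in the image are generally closer to the camera.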
These models can now be used to detect critical interactions among road users in the naturalistic data collected with e-scooters.
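The noise-removal step can be sketched in a similar way, assuming scikit-learn's DBSCAN is run on the per-frame distance estimates; the eps and min_samples values below are illustrative assumptions and would need to be tuned to the real data.

```python
# Minimal sketch of the DBSCAN-based noise removal (illustrative values only).
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical per-frame distance estimates (metres); frames 3 and 6 contain
# obvious outliers from the distance estimation model.
t = np.arange(10, dtype=float)
d = np.array([5.0, 5.1, 5.3, 12.0, 5.6, 5.8, 0.5, 6.1, 6.2, 6.4])

# Cluster in (frame, distance) space; in practice the two axes would be scaled
# to comparable ranges before clustering.
points = np.column_stack([t, d])
labels = DBSCAN(eps=2.1, min_samples=3).fit_predict(points)

# Points labelled -1 are treated as noise and removed before smoothing.
mask = labels != -1
t_clean, d_clean = t[mask], d[mask]
print(d_clean)
```

The remaining estimates would then be passed to the Rauch-Tung-Striebel smoother to obtain the smoothed relative position and velocity of the target road user.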