From Data to Descriptions: Efficient Data Retrieval in Autonomous Vehicle Development using Generative AI

Knapp, Jesper; Moberg, Klas

Download

CSE 24-24 JK KM.pdf (4.65 MB)

Published

2024

Authors

Knapp, Jesper

Moberg, Klas

Type

Master's Thesis

Program

Complex adaptive systems (MPCAS), MSc

Abstract

The software that enables autonomous driving relies on a variety of sensors that generate large amounts of data. Data collected from a vehicle are often stored for later reference, either to test new software components or to analyze the fleet of vehicles on a larger scale. Because the data are so large and varied, finding a specific vehicle scenario in a collection of vehicle logs is difficult. Current solutions mainly use SQL to query the database of logs; this, however, requires knowledge of both SQL and the specifics of the data being sought. This thesis was carried out in collaboration with Zenseact and aims to create an artifact called "Genius" that enables searching their logged vehicle data using natural language, by applying generative AI to generate scenario descriptions. A scenario is a 30-second snippet of data from the vehicle logs that contains signals, which are the result of processed sensor data from the vehicles. Videos recorded from the roof of the vehicles complement the signal data. The descriptions are created in two parts. First, images from the video are fed to LLaVA 1.5 7b, a multi-modal LLM that describes the scenario based on the image. Then a selection of key signals extracted from each log (not all signals fit inside the context size of the deployed LLM) is combined with the image description and fed to a second LLM, Gemma 7b, to create a combined description. After the descriptions have been generated, they are embedded using BGE-large, a text embedding model, and stored in ChromaDB to create a vector database. This database is then used to search the logs semantically by comparing their distances in vector space to a natural language query. The study follows the design science research (DSR) methodology with three regulative cycles of five phases each; each cycle is followed by a learnings section whose insights are used in the subsequent cycle.
Initial results on a smaller set of 8 scenarios are promising, both in how well the scenarios were separated in vector space and in the ability to search them using natural language. When scaling up to 100 scenarios, most scenarios are still searchable; however, scenarios that are not very distinct are hard to find because there are many similar matches. To counteract this, several ways of evaluating whether the returned scenarios are correct were implemented, such as comparing keywords and evaluating the scenario distances. The generated descriptions were evaluated by engineers working with the vehicle logs at the collaborating company; on a scale of 1-5, the descriptions achieved a mean score of 3.3125. Overall, the solution cannot replace existing solutions in its current form, because not all data are available in the generated descriptions and because the LLM and embedding models have limited capabilities with numerical data.
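
The search step described in the abstract, ranking stored descriptions by their distance to a natural language query in vector space, can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis embeds descriptions with BGE-large and stores them in ChromaDB, whereas the sketch below substitutes a toy bag-of-words vector and an in-memory dictionary so that it runs with the Python standard library only. The function names, scenario IDs, and description texts are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity (0.0 = same direction, 1.0 = no shared terms)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def search(query: str, descriptions: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank scenario IDs by the distance between query and description vectors."""
    q = embed(query)
    ranked = sorted(
        ((sid, cosine_distance(q, embed(text))) for sid, text in descriptions.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:top_k]


# Hypothetical scenario descriptions, not taken from the thesis data.
DESCRIPTIONS = {
    "scenario_a": "The vehicle drives on a highway in heavy rain and overtakes a truck.",
    "scenario_b": "The vehicle stops at a red traffic light in a city at night.",
    "scenario_c": "The vehicle follows a cyclist on a narrow rural road in sunny weather.",
}

results = search("driving in rain on the highway", DESCRIPTIONS, top_k=2)
for scenario_id, distance in results:
    print(f"{scenario_id}: {distance:.3f}")
```

A word-count vector only matches exact tokens, so it cannot capture the semantic similarity that a learned embedding model provides; the sketch shows the ranking-by-distance mechanism only. It also hints at the difficulty the abstract reports: descriptions that share most of their wording end up at nearly equal distances from a query.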

Subject/keywords

Large Language Model, Generative Pre-trained Transformer, Semantic search, Embedding, Big data, Vehicle logs, Autonomous driving

URI

http://hdl.handle.net/20.500.12380/308923

Collections

Master's theses
