Real-time Relevance: RAG with Dynamic Context for Improved Natural Language Responses

Publicerad

Typ

Examensarbete för masterexamen
Master's Thesis

Modellbyggare

Tidskriftstitel

ISSN

Volymtitel

Utgivare

Sammanfattning

Today’s Retrieval Augmented Generation (RAG) systems often struggle when trying to answer questions that require complex multi-hop reasoning. In this thesis we investigate an autoregressive Large Language Model (LLM) architecture which can generate a real-time relevant dense search vector for every token generation step. To facilitate this we also develop a synthetic data generation technique to acquire search query vector labels on a token-by-token level, requiring only a generating LLM and a document database. We investigate the quality of the synthetic data, and provide an attention based relabeling method which decreases hallucinations, improving the correctness of the labels by 67%. The architecture is able to produce query vectors 27 times faster than a separate embedder at the cost of retrieval accuracy. Finally, we train and employ the model in an active retrieval question-answering setting.

Beskrivning

Ämne/nyckelord

LLM, RAG, active retrieval, synthetic data generation, master thesis

Citation

Arkitekt (konstruktör)

Geografisk plats

Byggnad (typ)

Byggår

Modelltyp

Skala

Teknik / material

Index

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced