Real-time Relevance: RAG with Dynamic Context for Improved Natural Language Responses

Loading...
Thumbnail Image

Date

Type

Examensarbete för masterexamen
Master's Thesis

Model builders

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Today’s Retrieval Augmented Generation (RAG) systems often struggle when trying to answer questions that require complex multi-hop reasoning. In this thesis we investigate an autoregressive Large Language Model (LLM) architecture which can generate a real-time relevant dense search vector for every token generation step. To facilitate this we also develop a synthetic data generation technique to acquire search query vector labels on a token-by-token level, requiring only a generating LLM and a document database. We investigate the quality of the synthetic data, and provide an attention based relabeling method which decreases hallucinations, improving the correctness of the labels by 67%. The architecture is able to produce query vectors 27 times faster than a separate embedder at the cost of retrieval accuracy. Finally, we train and employ the model in an active retrieval question-answering setting.

Description

Keywords

LLM, RAG, active retrieval, synthetic data generation, master thesis

Citation

Architect

Location

Type of building

Build Year

Model type

Scale

Material / technology

Index

Endorsement

Review

Supplemented By

Referenced By