Data Access Control for RAG-Based Chatbots

Type

Master's Thesis

Abstract

The increasing demand for chatbots that access enterprise data has created a need for both secure data access control and high-quality responses. This project therefore aims to implement data access control mechanisms, along with fine-tuning techniques, to develop a chatbot capable of generating such responses. Three approaches were explored: two using an agentic Retrieval-Augmented Generation (RAG) architecture with pre-trained Large Language Models (LLMs), with and without fine-tuning, and a standalone fine-tuned LLM. The RAG architecture with fine-tuning also employed a response filter, whilst the standalone LLM was fine-tuned on data with incorporated data access control restrictions to prevent information leakage. Performance was determined by assessing the semantic and linguistic correctness of the responses and the amount of information leaked beyond a user's access. A combination of the pre-trained LLMs Mistral-NeMo-Instruct-2407 and Qwen2.5-32B-Instruct, applied in the agentic RAG setup, produced the best-performing chatbot, with no data leakage and high-quality responses. Fine-tuning LLMs was found to introduce potential data leakage risks, even when access restrictions are integrated into the training process. Therefore, to guarantee the protection of confidential information, it is advised to use pre-trained LLMs in a RAG setup with access control.
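
The abstract does not describe how access control is enforced inside the RAG pipeline, so the following is only a minimal sketch of one common pattern: filtering documents by the user's permissions before retrieval, so that restricted text never reaches the LLM prompt. The `Document` class, the `allowed_roles` field, the word-overlap ranking, and the `leaked_ids` check are all hypothetical names and simplifications chosen for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical access-control metadata


def retrieve(query, docs, user_roles, k=2):
    """Return up to k documents the user may read, ranked by word overlap."""
    # Enforce access control before ranking, so restricted text can never
    # be placed in the prompt that is sent to the LLM.
    permitted = [d for d in docs if d.allowed_roles & user_roles]
    query_words = set(query.lower().split())

    def overlap(doc):
        # Toy relevance score (assumption): number of shared words.
        return len(query_words & set(doc.text.lower().split()))

    ranked = sorted(permitted, key=overlap, reverse=True)
    return [d for d in ranked[:k] if overlap(d) > 0]


def leaked_ids(retrieved, user_roles):
    """List ids of retrieved documents that lie outside the user's access."""
    return [d.doc_id for d in retrieved if not (d.allowed_roles & user_roles)]


DOCS = [
    Document("hr-1", "salary bands for engineering staff", frozenset({"hr"})),
    Document("pub-1", "office opening hours and contact details",
             frozenset({"hr", "employee"})),
    Document("eng-1", "engineering onboarding guide", frozenset({"employee"})),
]

if __name__ == "__main__":
    hits = retrieve("engineering salary bands", DOCS, {"employee"})
    print([d.doc_id for d in hits], leaked_ids(hits, {"employee"}))
```

Because the filter runs before ranking, a user with only the `employee` role cannot retrieve the `hr-1` document even though it is the closest match to the query. A real system would replace the word-overlap score with an embedding-based retriever and take roles from an identity provider.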

Subject/keywords

Data access control, RAG, chatbot, data security, response quality, fine-tuning, LLM, NLP.
