Data Access Control for RAG-Based Chatbots

Type

Master's Thesis

Abstract

The increasing demand for chatbots that access enterprise data has created a need to ensure both secure data access control and high-quality responses. This project therefore aims to implement data access control mechanisms, along with fine-tuning techniques, to develop a chatbot capable of generating such responses. Three approaches were explored: two using an agentic Retrieval-Augmented Generation (RAG) architecture with pre-trained Large Language Models (LLMs), one with and one without fine-tuning, and a third using a standalone fine-tuned LLM. The RAG architecture with fine-tuning also employed a response filter, while the standalone LLM was fine-tuned on data with incorporated data access control restrictions to prevent information leakage. Performance was evaluated by assessing the semantic and linguistic correctness of the responses and the amount of information leaked beyond a user's access rights. A combination of the pre-trained LLMs Mistral-NeMo-Instruct-2407 and Qwen2.5-32B-Instruct, applied in the agentic RAG setup, produced the best-performing chatbot, with no data leakage and high-quality responses. Fine-tuning LLMs was shown to introduce potential data leakage risks, even when access restrictions were integrated into the training process. Therefore, to guarantee the protection of confidential information, it is advised to use pre-trained LLMs in a RAG setup with access control.
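The abstract does not state how access control is enforced at retrieval time. A common pattern is to filter the document index by the user's permissions before ranking, so that restricted chunks never enter the LLM's context. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the `Document` schema with an `allowed_roles` field, the function names, and the role labels are all hypothetical, and simple keyword-overlap (Jaccard) scoring stands in for a real embedding-based retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A chunk in the retrieval index (hypothetical schema): text plus the roles allowed to read it."""
    doc_id: str
    text: str
    allowed_roles: frozenset


def tokenize(text: str) -> set:
    # Lowercased word set with basic punctuation stripped; a stand-in for an embedding model.
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def score(query: str, doc: Document) -> float:
    # Jaccard overlap between query tokens and document tokens (illustrative relevance score).
    q, d = tokenize(query), tokenize(doc.text)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_with_access_control(query: str, user_roles: set, index: list, k: int = 3) -> list:
    # Access control is applied BEFORE ranking: documents the user may not read are never scored,
    # so they cannot leak into the prompt regardless of how relevant they are.
    visible = [doc for doc in index if doc.allowed_roles & user_roles]
    ranked = sorted(visible, key=lambda doc: score(query, doc), reverse=True)
    # Keep at most k documents, dropping those with no overlap at all.
    return [doc for doc in ranked[:k] if score(query, doc) > 0]


def build_prompt(query: str, docs: list) -> str:
    # Only the permitted, retrieved chunks are placed in the LLM's context.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical example index: one HR-only chunk and one chunk readable by all employees.
index = [
    Document("hr-1", "Salary bands for engineers are reviewed every April.", frozenset({"hr"})),
    Document("pub-1", "The office opens at 8 am on weekdays.", frozenset({"hr", "employee"})),
]

question = "When are salary bands reviewed?"
employee_docs = retrieve_with_access_control(question, {"employee"}, index)
hr_docs = retrieve_with_access_control(question, {"hr"}, index)
print([d.doc_id for d in employee_docs])  # the HR-only chunk is never visible to an employee
print(build_prompt(question, hr_docs))
```

Filtering before ranking, rather than filtering the generated answer afterwards, means the model cannot reproduce content it never saw; a response filter such as the one mentioned in the abstract would act as an additional layer on top of this, not a replacement for it.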

Keywords

Data access control, RAG, chatbot, data security, response quality, fine-tuning, LLM, NLP.