Data Access Control for RAG-Based Chatbots

dc.contributor.author: Strandroth Frid, Anton
dc.contributor.author: Wramdemark, Filip
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (en)
dc.contributor.examiner: Tatar, Kıvanç
dc.contributor.supervisor: Xuechen, Liu
dc.date.accessioned: 2025-10-15T10:52:01Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: The increasing demand for chatbots accessing enterprise data has resulted in the need to ensure secure data access control and high-quality responses. Thus, this project aims to implement data access control mechanisms, along with fine-tuning techniques, to develop a chatbot capable of generating such responses. Three approaches were explored: two utilizing an agentic Retrieval-Augmented Generation (RAG) architecture with pre-trained Large Language Models (LLMs), with and without fine-tuning, as well as a standalone fine-tuned LLM. The RAG architecture with fine-tuning also employed a response filter, whilst the standalone LLM was fine-tuned on data with incorporated data access control restrictions to prevent information leakage. Their performance was determined by assessing the semantic and linguistic correctness of responses and the amount of information leakage beyond a user's access. A combination of the pre-trained LLMs Mistral-NeMo-Instruct-2407 and Qwen2.5-32B-Instruct, applied in the agentic RAG setup, achieved the best-performing chatbot, having no data leakage and high-quality responses. Fine-tuning LLMs has proven to introduce potential data leakage risks, even when access restrictions are integrated into the training process. Therefore, to guarantee the protection of confidential information, it is advised to use pre-trained LLMs in a RAG setup with access control.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310639
dc.language.iso: eng
dc.relation.ispartofseries: CSE 25-21
dc.setspec.uppsok: Technology
dc.subject: Data access control, RAG, chatbot, data security, response quality, fine-tuning, LLM, NLP
dc.title: Data Access Control for RAG-Based Chatbots
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
