Data Access Control for RAG-Based Chatbots
| dc.contributor.author | Strandroth Frid, Anton | |
| dc.contributor.author | Wramdemark, Filip | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Tatar, Kıvanç | |
| dc.contributor.supervisor | Xuechen, Liu | |
| dc.date.accessioned | 2025-10-15T10:52:01Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | The increasing demand for chatbots that access enterprise data has created a need for both secure data access control and high-quality responses. This project therefore implements data access control mechanisms, along with fine-tuning techniques, to develop a chatbot capable of generating such responses. Three approaches were explored: two using an agentic Retrieval-Augmented Generation (RAG) architecture with pre-trained Large Language Models (LLMs), with and without fine-tuning, and one using a standalone fine-tuned LLM. The RAG architecture with fine-tuning also employed a response filter, whilst the standalone LLM was fine-tuned on data with incorporated data access control restrictions to prevent information leakage. Performance was determined by assessing the semantic and linguistic correctness of responses and the amount of information leaked beyond a user's access. A combination of the pre-trained LLMs Mistral-NeMo-Instruct-2407 and Qwen2.5-32B-Instruct, applied in the agentic RAG setup, yielded the best-performing chatbot, with no data leakage and high-quality responses. Fine-tuning LLMs proved to introduce potential data leakage risks, even when access restrictions are integrated into the training process. Therefore, to guarantee the protection of confidential information, it is advised to use pre-trained LLMs in a RAG setup with access control. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310639 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 25-21 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Data access control, RAG, chatbot, data security, response quality, fine-tuning, LLM, NLP | |
| dc.title | Data Access Control for RAG-Based Chatbots | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Data science and AI (MPDSC), MSc | |