Privacy Risks in Text Masking Models for Anonymization

dc.contributor.author: Reimer, Amandus
dc.contributor.department [sv]: Chalmers tekniska högskola / Institutionen för fysik
dc.contributor.department [en]: Chalmers University of Technology / Department of Physics
dc.contributor.examiner: Volpe, Giovanni
dc.contributor.supervisor: Östman, Johan
dc.date.accessioned: 2025-02-27T12:49:26Z
dc.date.available: 2025-02-27T12:49:26Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: Large Language Models (LLMs) are increasingly employed to anonymize texts containing Personal Identifiable Information (PII), often relying on Named Entity Recognition (NER) to identify and remove sensitive data. This thesis explores the privacy risks associated with such text masking models by evaluating their vulnerability to Membership Inference Attacks (MIAs) and extraction attacks. An MIA attempts to determine whether a given data point was part of the training dataset; in certain scenarios, knowledge of membership is itself a breach of privacy. Two state-of-the-art MIAs are used to attack text masking models. The thesis also proposes a framework based on multi-armed bandits for performing extraction attacks and evaluates two strategies within this framework. The MIA results indicate some risk of revealing information about the training data. The extraction attacks did not perform well, but the results indicate that the concept could be useful if developed further.
dc.identifier.coursecode: TIFX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/309171
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Membership Inference Attack, Model Integrity, Personal Identifiable Information, Data Extraction Attack, Text Anonymization.
dc.title: Privacy Risks in Text Masking Models for Anonymization
dc.type.degree [sv]: Examensarbete för masterexamen
dc.type.degree [en]: Master's Thesis
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
Download
Original bundle
Name: Amandus Reimer.pdf
Size: 1.61 MB
Format: Adobe Portable Document Format