Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks

dc.contributor.author: Glänte, Gabriel
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (en)
dc.contributor.examiner: Johansson, Fredrik
dc.contributor.supervisor: Matsson, Anton
dc.date.accessioned: 2026-01-28T07:15:17Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: Training deep-learning models requires large amounts of data. When this data is sensitive, e.g., when it contains personal information, it is important to ensure that no sensitive information can be extracted from the trained models. In a membership inference attack (MIA), an adversary is assumed to have access to a trained model θ and a data sample d drawn from the same distribution as the unknown training data. The adversary's objective is to construct an algorithm A(θ, d) → {0, 1} whose binary output guesses whether d was part of the training data. It is commonly assumed that the attacker can obtain loss values from θ for different prompts; such loss-based signals are crucial for membership checks, even under black-box conditions. For text, the notion of membership is not clear-cut: distinct strings can share the same semantics, so many MIAs fail when they test only exact strings. Recent work reports near-random performance across models and domains (15). This suggests the need to incorporate semantics, i.e., to probe a text together with semantic neighbours that preserve its meaning under small, context-appropriate edits. This thesis explores and strengthens such attacks and evaluates them with two standard metrics: the area under the ROC curve (AUC) and the true positive rate at a 1% false positive rate (TPR@1%FPR). The work builds on the context-aware membership inference attack (CAMIA), which uses per-token loss sequences rather than a single average loss to construct signals for membership inference (11). The contributions of this thesis are: (i) a custom reimplementation of CAMIA, (ii) the integration of a neighbourhood comparison signal that perturbs a text with its semantic neighbours (16), and (iii) novel signals designed to improve loss-informed neighbour generation.
Experiments on Pythia-deduped and GPT-Neo models across six subsets of The Pile (19) (streamed via the MIMIR repository (15)) show that these semantics-aware extensions often increase the true positive rate at low false positive rates while keeping AUC stable. Overall, modest, loss-guided semantic edits make MIAs more effective for text under realistic black-box conditions.
dc.identifier.coursecode: Datx05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310950
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: membership inference attack
dc.subject: large language model
dc.subject: privacy
dc.subject: semantic perturbation
dc.subject: neighbourhood comparison
dc.title: Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
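
The abstract describes a neighbourhood comparison signal and two evaluation metrics (AUC and TPR@1%FPR). The following is a minimal sketch of how such quantities are commonly computed, not the thesis's implementation. It assumes that loss values for each text and its semantic neighbours have already been obtained from the model, and that a higher score means "more likely a member". The function names `neighbourhood_score`, `auc` and `tpr_at_fpr` are hypothetical and chosen for illustration only.

```python
import numpy as np


def neighbourhood_score(target_loss, neighbour_losses):
    """Assumed signal: mean neighbour loss minus the target's own loss.

    A text whose loss is clearly below that of its neighbours gets a
    high score, which this sketch treats as evidence of membership.
    """
    return float(np.mean(neighbour_losses) - target_loss)


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as one half, which matches the area under the ROC curve.
    """
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """True positive rate at a threshold whose FPR does not exceed max_fpr.

    A sample is predicted "member" when its score is strictly above the
    threshold. The threshold is the (k+1)-th largest non-member score,
    where k is the number of false positives that max_fpr allows.
    """
    nonmember = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    k = int(np.floor(max_fpr * len(nonmember)))
    threshold = nonmember[k]
    return float(np.mean(np.asarray(member_scores, dtype=float) > threshold))


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    members = [3.0, 4.0, 5.0]
    nonmembers = [0.0, 1.0, 2.0]
    print(neighbourhood_score(1.0, [2.0, 3.0]))
    print(auc(members, nonmembers))
    print(tpr_at_fpr(members, nonmembers, max_fpr=0.01))
```

With few non-member samples, the 1% budget allows zero false positives, so the threshold is simply the highest non-member score. Reporting TPR at a low FPR alongside AUC reflects the abstract's point that an attack can improve at the low-FPR end while AUC stays roughly unchanged.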

Download

Original bundle

Name: CSE 25-164 GG.pdf
Size: 2.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission