Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks

Glänte, Gabriel

dc.contributor.author	Glänte, Gabriel
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Johansson, Fredrik
dc.contributor.supervisor	Matsson, Anton
dc.date.accessioned	2026-01-28T07:15:17Z
dc.date.issued	2025
dc.date.submitted
dc.description.abstract	Training deep-learning models requires large amounts of data. When this data is sensitive, e.g., containing personal information, it is important to ensure that no sensitive information can be extracted from the trained models. In a membership inference attack (MIA), an adversary is assumed to have access to a trained model θ and a data sample d, drawn from the same distribution as the unknown training data. The objective of the adversary is to construct an algorithm A(θ, d) → {0, 1} whose binary output is a guess of whether d was part of the unknown training data. It is commonly assumed that the attacker can access loss values from θ for different prompts; such loss-based signals are crucial for membership checks, even under black-box conditions. For text, the notion of membership is not clear-cut: distinct strings can share the same semantics. Many MIAs therefore fail when they only test exact strings. Recent work reports near-random performance across models and domains (15). This suggests the need to incorporate semantics, i.e., to probe a text together with semantic neighbours that preserve meaning under small, context-appropriate edits. This thesis explores and strengthens such attacks and evaluates them with two standard metrics: area under the ROC curve (AUC) and true positive rate at a low false positive rate (TPR@1%FPR). Building on the context-aware membership inference attack (CAMIA), which uses per-token loss sequences rather than a single average loss to construct signals for membership inference (11), the contributions of this thesis are: (i) a custom reimplementation of CAMIA, (ii) the integration of a neighbourhood comparison signal that compares a text against perturbed semantic neighbours (16), and (iii) novel signals designed to improve loss-informed neighbour generation.
Experiments on Pythia-deduped and GPT-Neo models across six subsets of The Pile (19) (streamed via the MIMIR repository (15)) show that these semantics-aware extensions often increase true positive rates at low false positive rates while keeping AUC stable. Overall, modest, loss-guided semantic edits make MIAs more effective for text under realistic black-box conditions.
dc.identifier.coursecode	Datx05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/310950
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	membership inference attack
dc.subject	large language model
dc.subject	privacy
dc.subject	semantic perturbation
dc.subject	neighbourhood comparison
dc.title	Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Data science and AI (MPDSC), MSc
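
The abstract describes a membership score derived from model losses and two evaluation metrics, AUC and TPR@1%FPR. The following is a minimal sketch of how such quantities could be computed, not the thesis's implementation. All function names are hypothetical, and the score used here (mean neighbour loss minus target loss) is an assumed, simple form of neighbourhood comparison; the thesis's actual signals are not specified in this record.

```python
from typing import Sequence


def neighbourhood_score(target_loss: float, neighbour_losses: Sequence[float]) -> float:
    """Assumed signal: mean loss of the semantic neighbours minus the target's loss.

    A larger value means the model fits the exact text better than its
    neighbours, which is treated here as evidence of membership.
    """
    return sum(neighbour_losses) / len(neighbour_losses) - target_loss


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    max_fpr: float = 0.01,
) -> float:
    """Highest true positive rate over thresholds whose false positive rate is <= max_fpr.

    A sample is predicted to be a member when its score is >= the threshold.
    """
    # Every observed score is a candidate threshold; +inf predicts "non-member" for all.
    thresholds = list(member_scores) + list(nonmember_scores) + [float("inf")]
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr
```

In this sketch, an attack would compute `neighbourhood_score` for each candidate text, then pass the scores of known members and non-members to `auc` and `tpr_at_fpr` to evaluate it. The low-FPR metric matters because an attacker who rarely raises false alarms can make confident claims about individual samples even when AUC is close to 0.5.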

Original bundle

Name: CSE 25-164 GG.pdf
Size: 2.06 MB
Format: Adobe Portable Document Format

Collections

Master's theses