Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks
| dc.contributor.author | Glänte, Gabriel | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Johansson, Fredrik | |
| dc.contributor.supervisor | Matsson, Anton | |
| dc.date.accessioned | 2026-01-28T07:15:17Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | ||
| dc.description.abstract | Training deep-learning models requires large amounts of data. When this data is sensitive, e.g., containing personal information, it is important to ensure that no sensitive information can be extracted from the trained models. In a membership inference attack (MIA), an adversary is assumed to have access to a trained model θ and a data sample d drawn from the same distribution as the unknown training data. The adversary's objective is to construct an algorithm A(θ, d) → {0, 1} whose binary output guesses whether d was part of the unknown training data. It is commonly assumed that the attacker can access loss values from θ for different prompts; such loss-based signals are crucial for membership checks, even under black-box conditions. For text, the notion of membership is not clear-cut: distinct strings can share the same semantics. Many MIAs therefore fail when they only test exact strings. Recent work reports near-random performance across models and domains (15). This suggests the need to incorporate semantics, i.e., to probe a text together with semantic neighbours that preserve meaning under small, context-appropriate edits. This thesis explores and strengthens such attacks and evaluates them with two standard metrics: area under the ROC curve (AUC) and true positive rate at a low false positive rate (TPR@1%FPR). Building on the context-aware membership inference attack (CAMIA), which uses per-token loss sequences rather than a single average loss to construct signals for membership inference (11), the contributions of this thesis are: (i) a custom reimplementation of CAMIA, (ii) the integration of a neighbourhood comparison signal that compares a text with its semantic neighbours (16), and (iii) novel signals designed to improve loss-informed neighbour generation.
Experiments on Pythia-deduped and GPT-Neo models across six subsets of The Pile (19), streamed via the MIMIR repository (15), show that these semantics-aware extensions often increase true positive rates at low false positive rates while keeping AUC stable. Overall, modest, loss-guided semantic edits make MIAs more effective for text under realistic black-box conditions. | |
| dc.identifier.coursecode | Datx05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310950 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | membership inference attack | |
| dc.subject | large language model | |
| dc.subject | privacy | |
| dc.subject | semantic perturbation | |
| dc.subject | neighbourhood comparison | |
| dc.title | Semantically Aware Attacks on Text-based Models: An Extension of Context-aware and Neighbourhood Comparison-based Membership Inference Attacks | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Data science and AI (MPDSC), MSc |
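
The abstract describes an attack A(θ, d) → {0, 1} built from loss values, a neighbourhood comparison signal, and evaluation by AUC and TPR@1%FPR. The sketch below illustrates these ideas on precomputed loss values only. It is not the thesis's implementation: the function names, the sign convention (mean neighbour loss minus target loss, so larger means more member-like), and the thresholding rule are all assumptions made for illustration, and no model or neighbour generator is included.

```python
import numpy as np


def neighbourhood_score(target_loss, neighbour_losses):
    """Hypothetical signal: mean neighbour loss minus the loss of the text itself.

    Assumed convention: a text the model has seen tends to have a lower loss
    than its semantic neighbours, so a larger score suggests membership.
    """
    return float(np.mean(neighbour_losses) - target_loss)


def attack(target_loss, neighbour_losses, threshold):
    """A(theta, d) -> {0, 1}: guess 'member' (1) if the score exceeds the threshold."""
    return int(neighbourhood_score(target_loss, neighbour_losses) > threshold)


def auc(member_scores, nonmember_scores):
    """Area under the ROC curve as the probability that a member outscores
    a non-member, counting ties as one half."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """True positive rate when the threshold flags at most max_fpr of non-members."""
    neg = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]  # descending
    # Number of false positives we are allowed to make.
    k = min(int(max_fpr * len(neg)), len(neg) - 1)
    # Flag scores strictly above the (k+1)-th largest non-member score,
    # so at most k non-members are flagged.
    threshold = neg[k]
    return float(np.mean(np.asarray(member_scores, dtype=float) > threshold))
```

In this sketch the decision threshold is chosen from non-member scores alone, which is one common way to fix a false positive budget; other calibration schemes are possible.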
