LLMs for SDVs: Automated Software Vulnerability Detection and Repair
Type
Master's Thesis
Abstract
Software-Defined Vehicles (SDVs) increasingly rely on large-scale C/C++ software
stacks to implement safety-critical functionalities. While these languages provide
the deterministic performance and hardware control required in automotive systems,
they are also susceptible to memory-safety vulnerabilities such as buffer overflows,
out-of-bounds accesses, NULL pointer dereferences, and resource management errors. Existing vulnerability analysis approaches remain essential in industrial practice but face limitations in scalability, coverage, and manual remediation effort when
applied to modern automotive-scale software systems. Recent advances in Large
Language Models (LLMs) have motivated increasing research interest in automated
vulnerability detection and repair.
This thesis presents an experimental study of a two-stage pipeline for
function-level detection and repair of C/C++ memory-safety vulnerabilities
in software relevant to SDVs. For the detection stage, the study evaluates how
classification strategy, pre-trained code model selection, and inference-time
threshold selection affect detection performance for four vulnerability
categories: Common Weakness Enumeration (CWE)-787, CWE-476, CWE-399, and
CWE-125. Detection experiments compare CodeBERT, GraphCodeBERT, and UniXcoder
across specialised binary classifiers and a shared multiclass classifier. For
the repair stage, the study evaluates how detection-augmented prompting with
vulnerability guidance affects LLM-based automated vulnerability repair
performance. Repair experiments evaluate three prompting strategies with
increasing levels of vulnerability guidance.
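As an illustration of what prompting with increasing levels of vulnerability guidance could look like, the following is a minimal sketch only. The level names (`unguided`, `cwe_label`, `detailed`), the prompt wording, and the `CWE_HINTS` table are assumptions made for this example; the abstract does not state the prompt templates used in the thesis.

```python
from typing import Optional

# Short CWE descriptions used as hypothetical guidance text.
# The guidance wording actually used in the thesis is not given in the abstract.
CWE_HINTS = {
    "CWE-787": "out-of-bounds write",
    "CWE-476": "NULL pointer dereference",
    "CWE-399": "resource management error",
    "CWE-125": "out-of-bounds read",
}


def build_repair_prompt(function_src: str, level: str,
                        cwe: Optional[str] = None) -> str:
    """Build a repair prompt at one of three assumed guidance levels."""
    base = "Repair the following C/C++ function and return only the fixed function.\n"
    if level == "unguided":
        # Baseline: the model receives no information from the detector.
        guidance = ""
    elif level in ("cwe_label", "detailed"):
        if cwe not in CWE_HINTS:
            raise ValueError("guided prompts need a supported detected category")
        # Light guidance: only the detected category label.
        guidance = f"A detector flagged this function as {cwe}.\n"
        if level == "detailed":
            # Detailed guidance: label plus a description of what to look for.
            guidance += (
                f"This category denotes a {CWE_HINTS[cwe]}. "
                "Locate the affected statement and fix it without changing "
                "the function's intended behaviour.\n"
            )
    else:
        raise ValueError(f"unknown guidance level: {level}")
    return base + guidance + "\n" + function_src
```

In a pipeline of this shape, the `cwe` argument would come from the detection stage, so a wrong detection would feed misleading guidance into the repair prompt.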
Experiments on the BigVul and PrimeVul datasets show that the specialised binary
classifiers outperform the multiclass classifier for all model-CWE combinations, with
per-CWE F1-score improvements ranging from +0.13 to +0.35. The results also
show that no evaluated pre-trained code model is strongest across all four CWE
types. Thresholds selected on validation F1 make the detector more permissive,
increasing the rate at which the ground-truth CWE reaches the repair stage by
11.6 to 21.4 percentage points; UniXcoder achieves the highest detection rate
of 85.9%. In the repair stage, detection-augmented prompting raises the
vulnerability pattern removal rate from 28.22% under the unguided baseline to
48.43% under the detailed guided prompting strategy, while maintaining high
code quality.
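Selecting a decision threshold on validation F1 can be sketched as follows. This is a minimal illustration, assuming a binary classifier that outputs a probability per function and a simple grid of candidate thresholds. The function names and the grid are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Sequence, Tuple


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 for the positive (vulnerable) class; 0.0 if nothing is positive."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(probs: Sequence[float], labels: Sequence[int],
                     candidates: Optional[List[float]] = None
                     ) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest validation F1."""
    if candidates is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95.
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(labels, preds)
        if score > best_f1:  # ties keep the lowest threshold seen first
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation set: two vulnerable functions score below 0.5.
val_probs = [0.10, 0.30, 0.35, 0.80]
val_labels = [0, 1, 1, 1]
threshold, score = select_threshold(val_probs, val_labels)
```

On this toy data the selected threshold falls below 0.5, so functions that a fixed 0.5 cut-off would reject are passed on to the repair stage. That is the sense in which a threshold tuned this way can make a detector more permissive.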
The results indicate that specialised binary classifiers are the strongest evaluated
architecture, while model selection and threshold selection still affect how these
classifiers perform within the pipeline. Moreover, incorporating detection results
into repair prompts proves an effective strategy for improving vulnerability repair
quality, though the improvement is bounded by upstream detection accuracy.
Subject/keywords
Software-Defined Vehicles, Vulnerability Detection, Automated Program Repair, Large Language Models, Pre-trained Code Models
