LLMs for SDVs: Automated Software Vulnerability Detection and Repair

Gong, Wenkang; Yan, Jieman

Download

CSE 26-121 WG JY.pdf (5.45 MB)

Published

2026

Authors

Gong, Wenkang

Yan, Jieman

Type

Master's Thesis

Program

Software engineering and technology (MPSOF), MSc

Abstract

Software-Defined Vehicles (SDVs) increasingly rely on large-scale C/C++ software stacks to implement safety-critical functionality. While these languages provide the deterministic performance and hardware control required in automotive systems, they are also susceptible to memory-safety vulnerabilities such as buffer overflows, out-of-bounds accesses, NULL pointer dereferences, and resource management errors. Existing vulnerability analysis approaches remain essential in industrial practice but face limitations in scalability, coverage, and manual remediation effort when applied to modern automotive-scale software systems. Recent advances in Large Language Models (LLMs) have motivated growing research interest in automated vulnerability detection and repair. This thesis presents an experimental study of a two-stage pipeline for function-level detection and repair of C/C++ memory-safety vulnerabilities in software relevant to SDVs. For the detection stage, the study evaluates how classification strategy, pre-trained code model selection, and inference-time threshold selection affect detection performance for four vulnerability categories: Common Weakness Enumeration (CWE)-787, CWE-476, CWE-399, and CWE-125. Detection experiments compare CodeBERT, GraphCodeBERT, and UniXcoder across specialised binary classifiers and a shared multiclass classifier. For the repair stage, the study evaluates how detection-augmented prompting with vulnerability guidance affects LLM-based automated vulnerability repair performance. Repair experiments evaluate three prompting strategies with increasing levels of vulnerability guidance. Experiments on the BigVul and PrimeVul datasets show that the specialised binary classifiers outperform the multiclass classifier for all model-CWE combinations, with per-CWE F1-score improvements ranging from +0.13 to +0.35. The results also show that no evaluated pre-trained code model is strongest across all four CWE types. Thresholds selected on validation F1 make the detector more permissive, increasing the rate at which the ground-truth CWE reaches the repair stage by 11.6 to 21.4 percentage points; UniXcoder achieves the highest detection rate, 85.9%. For vulnerability repair, detection-augmented prompting improves repair performance, increasing the vulnerability pattern removal rate from 28.22% under the unguided baseline to 48.43% under the detailed guided prompting strategy, while maintaining high code quality. The results indicate that specialised binary classifiers are the strongest evaluated architecture, while model selection and threshold selection still affect how these classifiers perform within the pipeline. Moreover, incorporating detection results into repair prompts proves an effective strategy for improving vulnerability repair quality, though the improvement is bounded by upstream detection accuracy.
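As an illustration of what selecting a threshold on validation F1 can look like for one binary per-CWE classifier, the following is a minimal Python sketch. It is not taken from the thesis: the function names, the candidate grid (0.05 to 0.95 in steps of 0.05), the tie-breaking rule, and the toy scores are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 for the positive class (1 = vulnerable); 0.0 when undefined."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(
    labels: Sequence[int],
    scores: Sequence[float],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, f1) maximising F1 on the validation set.

    Assumption: a fixed grid of candidates; ties go to the lowest threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(candidates):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # strict comparison keeps the lowest tied threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation data, invented for illustration only.
val_labels = [1, 1, 1, 0, 0, 0, 1, 0]
val_scores = [0.92, 0.41, 0.33, 0.30, 0.12, 0.55, 0.61, 0.08]

threshold, f1 = select_threshold(val_labels, val_scores)
print(f"selected threshold = {threshold:.2f}, validation F1 = {f1:.2f}")
```

On this toy data the selected threshold falls below the conventional 0.5 cut-off, so more functions are flagged and passed on to a repair stage. This mirrors the "more permissive" behaviour described in the abstract, but the numbers here say nothing about the thresholds actually chosen in the thesis.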

Subject/keywords

Software-Defined Vehicles, Vulnerability Detection, Automated Program Repair, Large Language Models, Pre-trained Code Models

URI

https://hdl.handle.net/20.500.12380/311810

Collections

Master's Theses
