Autoformalization for Agda via Fine-tuning Large Language Models

Huang, Pei

Autoformalization for Agda via Fine-tuning Large Language Models

Ladda ner

Primär fil CSE 25-89 PH.pdf (845.19 KB)

Publicerad

2025

Författare

Huang, Pei

Typ

Examensarbete för masterexamen
Master's Thesis

Program

Data science and AI (MPDSC), MSc

Sammanfattning

Autoformalization, translating informal mathematical statements into fully formal machine-checkable code, has recently attracted interest thanks to advances in large language models (LLMs). However, progress has been hindered by the lack of highquality parallel corpora pairing natural language with formal code, especially for less-popular systems such as Agda. We address this gap by introducing SMAD (Synthetic Multilanguage Autoformalization Dataset), a 400K 4-to-3 parallel corpus covering four formal languages (Dedukti, Agda, Coq, Lean) and three natural languages (English, French, Swedish), generated via the Informath project. We finetune the open-source Qwen2.5-7B-Instruct model on SMAD and achieve substantial gains: BLEU-4 improves from 32.90 to 76.16, and Agda syntax error rate falls from 98.43 % to under 8 %. We further explore joint training across multiple formal and natural languages, demonstrating that multilingual and multi-formal regimes yield notable improvements in low-resource Agda settings. Our work establishes the first LLM-based Agda autoformalization system and provides systematic insights into model scaling, multilinguality, and dataset construction for future research.

Ämne/nyckelord

Autoformalization, Multilingual Formalization, Fine-tuning, LoRA, Large Language Models, Qwen2.5-7B-Instruct, Agda, Formal Proof Assistants, Synthetic Multilanguage Autoformalization Dataset (SMAD), GF

URI

http://hdl.handle.net/20.500.12380/310852

Samlingar

Examensarbeten för masterexamen

Visa fullständig post

Autoformalization for Agda via Fine-tuning Large Language Models

Ladda ner

Publicerad

Författare

Typ

Program

Modellbyggare

Tidskriftstitel

ISSN

Volymtitel

Utgivare

Sammanfattning

Beskrivning

Ämne/nyckelord

Citation

Arkitekt (konstruktör)

Geografisk plats

Byggnad (typ)

Byggår

Modelltyp

Skala

Teknik / material

Index

URI

Samlingar

item.page.endorsement

item.page.review

item.page.supplemented

item.page.referenced