Intent-Driven Code Generation for Android Application Testing Using Large Language Models
| dc.contributor.author | Gholamhosseinpour, Ali | |
| dc.contributor.author | Zhang, Xiaoran | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Berger, Christian | |
| dc.contributor.supervisor | Yu, Yinan | |
| dc.date.accessioned | 2025-10-06T12:58:27Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | Modern Android interfaces evolve rapidly, and conventional UI test automation struggles to keep pace with this change. This thesis presents an intent-driven framework that leverages large language models (LLMs) together with multi-modal UI representations to translate natural-language testing goals into executable Android tests. While inspired by crawler-based exploration, the framework adopts a modular architecture that separates planning, selection, execution, and observation stages. It incorporates memory for state tracking and includes an evaluator-optimizer loop that refines LLM outputs dynamically during execution. A hybrid screen representation that combines XML hierarchies with screenshots enables the system to reason over both the structural and the visual elements of the UI, while a Python-based control layer drives actions on physical devices. The framework is evaluated on three production-grade Volvo Group applications (Alarm Clock, System Settings, and Load Indicator). Across 45 reference scenarios, the generated tests achieve a 60% aggregate pass rate (compared with 87% for manual tests), reach up to 88% functional correctness, and reduce the amount of written code by as much as 70% relative to manually implemented baselines. Ablation studies show that visual input in addition to XML consistently supports task success and rarely confuses the model, contributing to improved reasoning across a wide range of UI challenges; XML remains valuable for precise element localization, especially where structural anchors are critical. A reasoning analysis over 42 planner steps yields an average correctness score of 4.3 out of 5, indicating strong semantic alignment between global testing goals and the selected local actions. The framework exhibits weaknesses on dynamic screens, complex seekbar interactions, and backend-dependent states, where test reliability remains limited. This work contributes a modular LLM-based system for intent-driven UI testing, empirical evidence of its effectiveness and conciseness on industrial applications without model fine-tuning, and practical design guidelines for future intelligent testing tools, including prompt structures, tool-invocation patterns, and memory-based tracking heuristics. Overall, the study shows that combining multi-modal LLM reasoning with structured UI representations moves automated mobile testing toward more adaptive, maintainable, and goal-aligned workflows. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310592 | |
| dc.language.iso | eng | |
| dc.relation.ispartofseries | CSE 25-09 | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Android UI Testing, Large Language Models, Intent-Driven Code Generation, Automated Software Testing, Multi-Modal Models, Test Script Generation, Semantic Reasoning | |
| dc.title | Intent-Driven Code Generation for Android Application Testing Using Large Language Models | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Software engineering and technology (MPSOF), MSc |
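
The abstract describes a planner-selector-executor-observer loop with an evaluator-optimizer step, memory-based state tracking, a hybrid XML-plus-screenshot screen representation, and a Python control layer. The following is a minimal, hypothetical Python sketch of such a loop, offered only as a reading aid: every name here (ScreenState, Action, TestSession, plan_step, execute, observe, evaluate) is an assumption of this sketch, not an identifier from the thesis, and the thesis's actual prompts, tooling, and device-control code are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical sketch of the intent-driven testing loop described in the
# abstract. The concrete planner (LLM-backed), device driver, and evaluator
# are injected as callables so the stages stay modular and swappable.


@dataclass
class ScreenState:
    """Hybrid UI representation: XML hierarchy plus a screenshot."""
    xml: str           # structural view (UI element tree dump)
    screenshot: bytes  # visual view (e.g. PNG bytes)


@dataclass
class Action:
    kind: str                   # e.g. "tap", "swipe", "set_seekbar"
    target: str                 # element id or description chosen by the selector
    value: Optional[str] = None


@dataclass
class TestSession:
    goal: str                                               # natural-language testing intent
    plan_step: Callable[[str, ScreenState, list], Action]   # planner (LLM call, stubbed here)
    execute: Callable[[Action], None]                       # control layer driving the device (e.g. via ADB)
    observe: Callable[[], ScreenState]                      # captures XML dump + screenshot
    evaluate: Callable[[str, ScreenState], float]           # scores progress toward the goal in [0, 1]
    memory: list = field(default_factory=list)              # state/action history for the planner

    def run(self, max_steps: int = 20, pass_threshold: float = 0.9) -> bool:
        state = self.observe()
        for _ in range(max_steps):
            # Planner: turn the global goal into the next local action,
            # conditioned on the hybrid screen state and the session memory.
            action = self.plan_step(self.goal, state, self.memory)

            # Executor: perform the action on the device.
            self.execute(action)

            # Observer: capture the resulting screen and record it in memory.
            state = self.observe()
            self.memory.append((action, state.xml))

            # Evaluator-optimizer step: score progress; a low score lets the
            # next plan_step call refine its output using the recorded memory.
            if self.evaluate(self.goal, state) >= pass_threshold:
                return True
        return False
```

The dependency-injected callables mirror the modular separation of planning, selection, execution, and observation that the abstract emphasizes; under these assumptions, the LLM planner or the device backend could be replaced without touching the loop itself.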
