Evaluating Machine Learning Algorithms in Design Pattern Recognition - Exploring the Performance of Classification and Clustering Algorithms in Design Pattern Recognition Utilising Large Language Models
dc.contributor.author | Andersson, Simon | |
dc.contributor.author | Berggren, Viktor | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Heyn, Hans-Martin | |
dc.contributor.supervisor | Horkoff, Jennifer | |
dc.date.accessioned | 2024-10-17T13:27:01Z | |
dc.date.available | 2024-10-17T13:27:01Z | |
dc.date.issued | 2024 | |
dc.date.submitted | | |
dc.description.abstract | Design Pattern Recognition (DPR) is an ongoing research challenge in software engineering, motivated by the goal of improving software maintainability. Recent work has used Large Language Models (LLMs) to extract semantic information from code. This study follows up on that research and evaluates the performance of multiple classification and clustering algorithms applied to embeddings extracted from LLMs. Performance is compared across contexts that differ in LLM, design pattern, and programming language. Design pattern implementations were gathered for Java, Python, and C# from GitHub and the P-MARt repository. Each algorithm was run with tuned hyperparameters, and average performance across multiple runs was compared. The results show that the performance of individual algorithms varies between contexts, but the overall performance ranking of the algorithms remains the same. Classification algorithms outperformed clustering algorithms, which scored low on the measured metrics across all tests. The results also showed a difference in performance between behavioral, creational, and structural design patterns. This study shows further promise for the use of LLMs in DPR and recognises the need for larger studies. | |
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/308924 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | computer science | |
dc.subject | design patterns | |
dc.subject | machine learning | |
dc.subject | large language models | |
dc.subject | software engineering | |
dc.subject | design pattern recognition | |
dc.subject | DPR | |
dc.subject | LLM | |
dc.title | Evaluating Machine Learning Algorithms in Design Pattern Recognition - Exploring the Performance of Classification and Clustering Algorithms in Design Pattern Recognition Utilising Large Language Models | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Software engineering and technology (MPSOF), MSc |