Evaluating Machine Learning Algorithms in Design Pattern Recognition - Exploring the Performance of Classification and Clustering Algorithms in Design Pattern Recognition Utilising Large Language Models

dc.contributor.author: Andersson, Simon
dc.contributor.author: Berggren, Viktor
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Heyn, Hans-Martin
dc.contributor.supervisor: Horkoff, Jennifer
dc.date.accessioned: 2024-10-17T13:27:01Z
dc.date.available: 2024-10-17T13:27:01Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: Design Pattern Recognition (DPR) is an ongoing research challenge in software engineering, aimed at improving software maintainability. Recent work has utilised Large Language Models (LLMs) to extract semantic information from code. This study follows up on previous research and evaluates the performance of multiple classification and clustering algorithms when applied to embeddings extracted from LLMs. Performance is compared across contexts that differ in LLM, design pattern, and programming language. Design pattern implementations were gathered for Java, Python, and C# from GitHub and the P-MARt repository. Each algorithm was run with tuned hyperparameters, and average performance across multiple runs was compared. The results indicate that the performance of individual algorithms varies, but the overall performance order of the algorithms remains the same. Classification algorithms outperformed clustering algorithms, and clustering algorithms performed poorly on the measured metrics across all tests. The results also showed a difference in performance between behavioral, creational, and structural design patterns. This study shows further promise for the use of LLMs for DPR and recognises the need for larger studies utilising LLMs for DPR.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/308924
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: computer science
dc.subject: design patterns
dc.subject: machine learning
dc.subject: large language models
dc.subject: software engineering
dc.subject: design pattern recognition
dc.subject: DPR
dc.subject: LLM
dc.title: Evaluating Machine Learning Algorithms in Design Pattern Recognition - Exploring the Performance of Classification and Clustering Algorithms in Design Pattern Recognition Utilising Large Language Models
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Software engineering and technology (MPSOF), MSc