Self-Supervised Vision Transformers for Steel Surface Defect Detection - An Empirical Investigation of Fine-Tuning Strategies and Data Efficiency

Hemmingsson, Nora; Olsson, Alexander

dc.contributor.author	Hemmingsson, Nora
dc.contributor.author	Olsson, Alexander
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	en
dc.contributor.examiner	Wedelin, Dag
dc.contributor.supervisor	Selpi
dc.date.accessioned	2026-06-30T06:48:39Z
dc.date.issued	2026
dc.date.submitted
dc.description.abstract	Industrial defect classification is a critical task in quality control, where accurate detection of surface defects is essential for ensuring product reliability. However, obtaining large amounts of labeled data is often costly and time-consuming, motivating the use of self-supervised learning (SSL) to leverage unlabeled data. This thesis investigates the effectiveness of SSL for defect classification using Vision Transformer-based methods, with a focus on Masked Autoencoders (MAE) and Distillation with No labels (DINO). The study evaluates the performance of these methods under different conditions, including fine-tuning vs. linear probing, ImageNet initialization vs. training from scratch, and varying amounts of labeled data. A comprehensive experimental setup is used to assess both overall performance and label efficiency, and results are compared to a supervised You Only Look Once (YOLO) baseline. The results show that both MAE and DINO learn transferable representations that achieve high classification performance after fine-tuning. DINO consistently outperforms MAE, indicating that distillation-based approaches produce more discriminative features for this task. Fine-tuning significantly improves performance compared to linear probing, highlighting the importance of adapting the full model to the downstream task. Additionally, ImageNet initialization provides a strong advantage over training from scratch, demonstrating the importance of large-scale pretraining. Under limited labeled data conditions during the fine-tuning stage, both methods remain effective, achieving competitive performance even at low label fractions such as 1% or 5%. However, performance improves steadily as more labeled data becomes available. Analysis of the results reveals that most misclassifications occur when non-defective samples are classified into defect classes. The confusion between the defect classes, however, is minimal, which indicates that the key challenge is avoiding false positives, i.e., identifying non-defective samples as defective. Overall, the findings demonstrate that self-supervised learning is a viable and scalable approach for industrial defect classification, particularly in scenarios where labeled data is scarce. While fully supervised methods still achieve the highest performance when sufficient labeled data is available, SSL provides a strong alternative with reduced reliance on annotations.
dc.identifier.coursecode	DATX05
dc.identifier.uri	https://hdl.handle.net/20.500.12380/311639
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Self-supervised learning (SSL), industrial defect classification, computer vision, Vision Transformer, MAE, DINO, label efficiency, transfer learning.
dc.title	Self-Supervised Vision Transformers for Steel Surface Defect Detection - An Empirical Investigation of Fine-Tuning Strategies and Data Efficiency
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Complex adaptive systems (MPCAS), MSc
local.programme	Data science and AI (MPDSC), MSc

Original bundle

Name:: CSE 26-30.pdf
Size:: 9.86 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 2.35 KB
Format:: Item-specific license agreed upon to submission

Collections

Master's theses