Self-Supervised Vision Transformers for Steel Surface Defect Detection - An Empirical Investigation of Fine-Tuning Strategies and Data Efficiency

Hemmingsson, Nora; Olsson, Alexander

Download

CSE 26-30.pdf (9.86 MB)

Published

2026

Authors

Hemmingsson, Nora

Olsson, Alexander

Type

Master's Thesis

Programme

Complex adaptive systems (MPCAS), MSc
Data science and AI (MPDSC), MSc

Abstract

Industrial defect classification is a critical task in quality control, where accurate detection of surface defects is essential for ensuring product reliability. However, obtaining large amounts of labeled data is often costly and time-consuming, which motivates the use of self-supervised learning (SSL) to leverage unlabeled data. This thesis investigates the effectiveness of SSL for defect classification using Vision Transformer-based methods, with a focus on Masked Autoencoders (MAE) and Distillation with No Labels (DINO). The study evaluates these methods under different conditions: fine-tuning versus linear probing, ImageNet initialization versus training from scratch, and varying amounts of labeled data. A comprehensive experimental setup is used to assess both overall performance and label efficiency, and the results are compared to a supervised You Only Look Once (YOLO) baseline. The results show that both MAE and DINO learn transferable representations that achieve high classification performance after fine-tuning. DINO consistently outperforms MAE, indicating that distillation-based approaches produce more discriminative features for this task. Fine-tuning significantly improves performance compared to linear probing, highlighting the importance of adapting the full model to the downstream task. Additionally, ImageNet initialization provides a strong advantage over training from scratch, demonstrating the importance of large-scale pretraining. When labeled data is limited during the fine-tuning stage, both methods remain effective, achieving competitive performance even at low label fractions such as 1% or 5%, although performance improves steadily as more labeled data becomes available. Analysis of the results reveals that most misclassifications arise from non-defective samples being assigned to defect classes. Confusion between the defect classes, by contrast, is minimal, which indicates that the key challenge is avoiding false positives, i.e. identifying non-defective samples as defective. Overall, the findings demonstrate that self-supervised learning is a viable and scalable approach for industrial defect classification, particularly in scenarios where labeled data is scarce. While fully supervised methods still achieve the highest performance when sufficient labeled data is available, SSL provides a strong alternative with reduced reliance on annotations.

Subject/keywords

Self-supervised learning (SSL), industrial defect classification, computer vision, Vision Transformer, MAE, DINO, label efficiency, transfer learning.

URI

https://hdl.handle.net/20.500.12380/311639

Collections

Master's Theses
