Self-Supervised Vision Transformers for Steel Surface Defect Detection - An Empirical Investigation of Fine-Tuning Strategies and Data Efficiency
Type
Master's Thesis
Abstract
Industrial defect classification is a critical task in quality control, where accurate detection of surface defects is essential for ensuring product reliability. However, obtaining large amounts of labeled data is often costly and time-consuming, which motivates the use of self-supervised learning (SSL) to leverage unlabeled data. This thesis investigates the effectiveness of SSL for defect classification using Vision Transformer-based methods, with a focus on Masked Autoencoders (MAE) and self-distillation with no labels (DINO). The study evaluates these methods under different conditions, including fine-tuning versus linear probing, ImageNet initialization versus training from scratch, and varying amounts of labeled data. A comprehensive experimental setup is used to assess both overall performance and label efficiency, and the results are compared to a supervised You Only Look Once (YOLO) baseline.
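
The abstract does not say how the reduced label sets were drawn. One common way to build them is class-stratified subsampling, so that every class keeps roughly the same share of its labels. The following is a minimal sketch under that assumption; the function name `stratified_label_subset`, the class names, and the class counts are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_label_subset(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified subset of `labels`.

    Roughly `fraction` of each class is kept, with at least one sample
    per class so that no class vanishes at very small fractions.
    (Assumed sampling scheme, not necessarily the one used in the thesis.)
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical label list: one non-defective class and two defect classes.
labels = ["no_defect"] * 600 + ["scratch"] * 200 + ["inclusion"] * 200
subset_1 = stratified_label_subset(labels, 0.01)  # 1% label fraction
subset_5 = stratified_label_subset(labels, 0.05)  # 5% label fraction
```

Fixing the seed keeps the subsets reproducible across runs, which makes comparisons between pretraining methods at the same label fraction fairer.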
The results show that both MAE and DINO learn transferable representations that achieve high classification performance after fine-tuning. DINO consistently outperforms MAE, indicating that distillation-based approaches produce more discriminative features for this task. Fine-tuning significantly improves performance compared to linear probing, highlighting the importance of adapting the full model to the downstream task. Additionally, ImageNet initialization provides a strong advantage over training from scratch, demonstrating the importance of large-scale pretraining. When labeled data is limited during the fine-tuning stage, both methods remain effective, achieving competitive performance even at low label fractions such as 1% or 5%, although performance improves steadily as more labeled data becomes available. Analysis of the results reveals that most misclassifications occur when non-defective samples are assigned to defect classes. In contrast, confusion between the defect classes is minimal, which indicates that the key challenge is avoiding false positives, i.e., non-defective samples identified as defective.
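
The distinction drawn here between false positives and confusion among defect classes can be read directly off a confusion matrix. The sketch below shows one way to separate the error types; the helper `error_breakdown`, the class ordering, and the matrix values are invented for illustration and are not results from the thesis.

```python
def error_breakdown(confusion, normal_index=0):
    """Split the off-diagonal errors of a confusion matrix into three groups.

    `confusion[i][j]` is the number of samples whose true class is i and
    whose predicted class is j. `normal_index` marks the non-defective class.
    """
    n = len(confusion)
    # Non-defective samples predicted as some defect class.
    false_positives = sum(
        confusion[normal_index][j] for j in range(n) if j != normal_index
    )
    # Defective samples predicted as non-defective.
    missed_defects = sum(
        confusion[i][normal_index] for i in range(n) if i != normal_index
    )
    # Defective samples predicted as a different defect class.
    defect_confusion = sum(
        confusion[i][j]
        for i in range(n)
        for j in range(n)
        if i != j and i != normal_index and j != normal_index
    )
    return {
        "false_positives": false_positives,
        "missed_defects": missed_defects,
        "defect_confusion": defect_confusion,
    }


# Illustrative matrix only (rows: true class, columns: predicted class).
# Assumed class order: non-defective, defect A, defect B.
example = [
    [90, 6, 4],
    [1, 48, 1],
    [2, 0, 48],
]
breakdown = error_breakdown(example)
```

In this made-up example, 10 of the 14 errors are false positives and only 1 is a confusion between defect classes, which is the kind of pattern the abstract describes.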
Overall, the findings demonstrate that self-supervised learning is a viable and scalable approach for industrial defect classification, particularly in scenarios where labeled data is scarce. While fully supervised methods still achieve the highest performance when sufficient labeled data is available, SSL provides a strong alternative with reduced reliance on annotations.
Subject/keywords
Self-supervised learning (SSL), industrial defect classification, computer vision, Vision Transformer, MAE, DINO, label efficiency, transfer learning.
