Self-Supervised Vision Transformers for Steel Surface Defect Detection - An Empirical Investigation of Fine-Tuning Strategies and Data Efficiency

Type

Master's Thesis

Abstract

Industrial defect classification is a critical task in quality control, where accurate detection of surface defects is essential for ensuring product reliability. However, obtaining large amounts of labeled data is often costly and time-consuming, which motivates the use of self-supervised learning (SSL) to leverage unlabeled data. This thesis investigates the effectiveness of SSL for defect classification using Vision Transformer-based methods, with a focus on Masked Autoencoders (MAE) and self-distillation with no labels (DINO). The study evaluates these methods under different conditions: fine-tuning versus linear probing, ImageNet initialization versus training from scratch, and varying amounts of labeled data. A comprehensive experimental setup is used to assess both overall performance and label efficiency, and the results are compared to a supervised You Only Look Once (YOLO) baseline. The results show that both MAE and DINO learn transferable representations that achieve high classification performance after fine-tuning. DINO consistently outperforms MAE, indicating that distillation-based approaches produce more discriminative features for this task. Fine-tuning significantly improves performance compared to linear probing, highlighting the importance of adapting the full model to the downstream task. Additionally, ImageNet initialization provides a strong advantage over training from scratch, demonstrating the importance of large-scale pretraining. When labeled data is limited during the fine-tuning stage, both methods remain effective, achieving competitive performance even at low label fractions such as 1% or 5%, although performance improves steadily as more labeled data becomes available. Analysis of the results reveals that most misclassifications occur when non-defective samples are assigned to defect classes. Confusion between the defect classes themselves is minimal, which indicates that the key challenge is avoiding false positives, i.e. identifying non-defective samples as defective. Overall, the findings demonstrate that self-supervised learning is a viable and scalable approach for industrial defect classification, particularly in scenarios where labeled data is scarce. While fully supervised methods still achieve the highest performance when sufficient labeled data is available, SSL provides a strong alternative with reduced reliance on annotations.
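The label-efficiency experiments mentioned above depend on drawing small labeled subsets (for example 1% or 5%) from the full training set. The abstract does not say how these subsets were selected, so the following is only a minimal sketch under the assumption of per-class (stratified) sampling. The function name `stratified_label_subset` and the class names `ok`, `scratch` and `pit` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_label_subset(labels, fraction, seed=0):
    """Return sorted indices of a subset holding `fraction` of each class.

    Assumption: sampling is done per class so that rare defect classes
    are still represented at very low label fractions.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one example per class so no class vanishes at 1%.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical toy label list: 600 non-defective, 300 "scratch", 100 "pit".
labels = ["ok"] * 600 + ["scratch"] * 300 + ["pit"] * 100

for fraction in (0.01, 0.05, 1.0):
    subset = stratified_label_subset(labels, fraction)
    print(f"fraction={fraction}: {len(subset)} labeled samples")
```

Only the samples at the returned indices would keep their labels for fine-tuning. The self-supervised pretraining stage can still use every image, since it does not need labels.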

Subject/keywords

Self-supervised learning (SSL), industrial defect classification, computer vision, Vision Transformer, MAE, DINO, label efficiency, transfer learning.
