Prototype Based Segmentation of Bone Tissue Microscopy Images
Date
Authors
Type
Master's Thesis
Model builders
Abstract
Segmentation of microscopy images is a fundamental task in biomedical research
and clinical analysis. This thesis investigates whether pretrained self-supervised
Vision Transformers (ViTs) can be used for prototype-based similarity
segmentation of unlabeled bone tissue microscopy images. The framework
developed and presented uses pretrained DINOv2 backbones to extract feature
embeddings from microscopy image patches. Positive and negative reference points
are used to construct prototype embeddings, enabling similarity-based segmentation
within the learned feature space.
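The prototype idea described above can be illustrated with a minimal sketch. The abstract does not state how prototypes are built or which similarity measure is used, so the choices here are assumptions made for illustration only: each prototype is the mean of its reference embeddings, similarity is cosine similarity, and a patch is marked as foreground when it is more similar to the positive prototype than to the negative one. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for real backbone features.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed prototype rule)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def segment_patches(patch_embeddings, positive_refs, negative_refs):
    """Score each patch against both prototypes and threshold at zero.

    Returns a (heatmap, mask) pair with one entry per patch. The score is
    the positive-prototype similarity minus the negative-prototype
    similarity; this decision rule is an assumption, not the thesis method.
    """
    pos_proto = mean_vector(positive_refs)
    neg_proto = mean_vector(negative_refs)
    heatmap = [
        cosine_similarity(e, pos_proto) - cosine_similarity(e, neg_proto)
        for e in patch_embeddings
    ]
    mask = [score > 0 for score in heatmap]
    return heatmap, mask


# Toy embeddings standing in for patch features from a pretrained backbone.
positive_refs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
negative_refs = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
patches = [[0.8, 0.2, 0.0], [0.1, 1.0, 0.0], [0.95, 0.05, 0.05]]

heatmap, mask = segment_patches(patches, positive_refs, negative_refs)
```

In this toy setting the first and third patches lie close to the positive references and the second lies close to the negative ones, so only the second patch is excluded from the mask.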
To evaluate how model capacity influences the learned feature space and segmentation
performance, all available DINOv2 backbone sizes were included in the experiments.
Feature space visualizations and prototype transfer experiments further
enabled evaluation of representation quality as well as the robustness and generalization
capabilities of the proposed framework. In addition, the DINO heatmaps were
used as input to a U-Net to investigate whether they could improve segmentation
quality in supervised learning.
The results show that pretrained ViTs extract feature representations in which tissue
and background regions become partially separable within the learned feature
space. PCA and UMAP visualizations, together with clustering metrics, indicate
that structurally similar image patches tend to form clusters in the embedding space.
The Giant backbone achieved the strongest segmentation performance, with a mean
Dice score of 0.690 and an IoU of 0.534. Prototype transfer performed well within
the same sample (mean Dice score of 0.644), but performance decreased when transferring
prototypes across samples (mean Dice score of 0.542), indicating that the
framework is sensitive to biological variability and domain shift. Providing a U-Net
with the DINO output for refinement improved the Dice scores while also reducing
boundary alignment errors.
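For readers unfamiliar with the two overlap metrics reported above, the following is a minimal sketch of the standard Dice and IoU definitions for binary masks. The helper name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is an assumed convention. For a single pair of masks the two metrics are related by IoU = Dice / (2 - Dice); the reported values need not satisfy this exactly if they are averages over several images (0.690 / (2 - 0.690) is about 0.527).

```python
def dice_and_iou(pred, target):
    """Dice score and IoU for two flat binary masks of equal length."""
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - overlap
    if union == 0:
        # Both masks are empty; treating this as perfect agreement
        # is an assumed convention.
        return 1.0, 1.0
    dice = 2 * overlap / (pred_area + target_area)
    iou = overlap / union
    return dice, iou


# Toy masks: two of three predicted pixels overlap the three target pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
```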
The study demonstrates that pretrained self-supervised Vision Transformers can be
used for prototype-based segmentation of bone tissue microscopy images. Despite
being trained on natural RGB images rather than microscopy data, the evaluated
DINOv2 backbones produced feature representations that enabled segmentation of
bone structures without any task-specific training.
Description
Keywords
self-supervised representation learning, DINOv2, prototype-based segmentation, microscopy image segmentation, Vision Transformers, feature space similarity, bone tissue analysis
