Prototype Based Segmentation of Bone Tissue Microscopy Images

Type

Master's Thesis

Abstract

Segmentation of microscopy images is a fundamental task in biomedical research and clinical analysis. This thesis investigates whether pretrained self-supervised Vision Transformers (ViTs) can be used for prototype-based similarity segmentation of unlabeled bone tissue microscopy images. The presented framework uses pretrained DINOv2 backbones to extract feature embeddings from microscopy image patches. Positive and negative reference points are used to construct prototype embeddings, enabling similarity-based segmentation within the learned feature space. To evaluate how model capacity influences the learned feature space and segmentation performance, all available DINOv2 backbone sizes were included in the experiments. Feature space visualizations and prototype transfer experiments were used to evaluate representation quality as well as the robustness and generalization capabilities of the proposed framework. In addition, the DINO heatmaps were used as input to a U-Net to investigate whether they could improve segmentation quality in supervised learning. The results show that pretrained ViTs extract feature representations in which tissue and background regions become partially separable. PCA and UMAP visualizations, together with clustering metrics, indicate that structurally similar image patches tend to form clusters in the embedding space. The Giant backbone achieved the strongest segmentation performance, with a mean Dice score of 0.690 and an IoU of 0.534. Prototype transfer performed well within the same sample (mean Dice score of 0.644), but performance decreased when prototypes were transferred across samples (mean Dice score of 0.542), indicating that the framework is sensitive to biological variability and domain shift. Providing a U-Net with the DINO output for refinement improved the Dice scores while also reducing boundary alignment errors.
The study demonstrates that pretrained self-supervised Vision Transformers can be used for prototype-based segmentation of bone tissue microscopy images. Despite being trained on natural RGB images rather than microscopy data, the evaluated DINOv2 backbones produced feature representations that enabled segmentation of bone structures without any task-specific training.
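
The prototype step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not state how prototypes are built or compared, so the sketch assumes each prototype is the mean of the L2-normalized patch embeddings at the reference points, that similarity is cosine similarity, and that the mask is obtained by thresholding the difference between positive and negative similarity at zero. The function names (`prototype_heatmap`, `dice_and_iou`) are hypothetical, and the patch embeddings are assumed to have already been extracted by a backbone into an array of shape (H, W, D).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps guards against zero norm)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def prototype_heatmap(patch_embeddings, pos_idx, neg_idx):
    """Score every patch by similarity to a positive vs. a negative prototype.

    patch_embeddings: float array of shape (H, W, D), one embedding per patch.
    pos_idx, neg_idx: flat (row-major) patch indices of the reference points.
    Returns an (H, W) array; values > 0 are closer to the positive prototype.
    Assumption: prototypes are mean embeddings compared by cosine similarity.
    """
    h, w, d = patch_embeddings.shape
    feats = l2_normalize(patch_embeddings.reshape(-1, d))
    pos_proto = l2_normalize(feats[pos_idx].mean(axis=0))
    neg_proto = l2_normalize(feats[neg_idx].mean(axis=0))
    # Dot products of unit vectors are cosine similarities.
    score = feats @ pos_proto - feats @ neg_proto
    return score.reshape(h, w)


def dice_and_iou(pred, target):
    """Dice score and IoU for two binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)
```

Given a heatmap `heat = prototype_heatmap(embeddings, pos_idx, neg_idx)`, a binary mask under these assumptions is `heat > 0`, which can then be compared with a reference annotation using `dice_and_iou`.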

Keywords

self-supervised representation learning, DINOv2, prototype based segmentation, microscopy image segmentation, Vision Transformers, feature space similarity, bone tissue analysis.
