On the Role of Attention Maps in Visual Transformers - A Clustering Perspective

Anttila Ryderup, Erik; Hsu, Yu-Ping

Download

CSE 24-155 EAR YH.pdf (15.54 MB)

Published

2024

Authors

Anttila Ryderup, Erik

Hsu, Yu-Ping

Type

Master's Thesis

Program

Data science and AI (MPDSC), MSc

Abstract

This thesis explores a novel research question: whether attention maps from a single-layer Vision Transformer model exhibit a clustering structure. The discovery of such a structure would imply that tokens with similar semantic information tend to cluster together. We extract an attention map from a one-layer Vision Transformer model that takes image patches as input. Values below a set threshold are pruned from the attention map, and a graph is created from the remaining entries. Various community detection algorithms are then applied to this graph and evaluated based on modularity. We visualize the patches belonging to each cluster and compare classification performance when removing salient and non-salient clusters. The method reveals a significant clustering structure, which was discovered by the Louvain algorithm. Tokens cluster with other tokens that carry similar semantic information, effectively separating parts of the image. For specific images, the classification logit values improve when tokens belonging to unimportant clusters are removed, whereas removing tokens from important clusters degrades performance. This work suggests that a Vision Transformer's attention layer clusters tokens based on their semantic information, but further research is needed to confirm the generality of this result.
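
The pipeline described in the abstract (prune the attention map, build a graph, score communities by modularity) can be illustrated with a minimal sketch. This is not the thesis code: the 4-token attention matrix, the threshold of 0.1, the averaging used to symmetrize the map, and all function names are assumptions made for illustration. The sketch only scores hand-picked candidate partitions by weighted modularity; it does not implement the Louvain search itself.

```python
# Toy sketch (not the thesis implementation): prune an attention map,
# build an undirected weighted graph, and score partitions by modularity.

def prune_attention(attn, threshold):
    """Zero out attention entries below the threshold (threshold is hypothetical)."""
    return [[a if a >= threshold else 0.0 for a in row] for row in attn]

def to_undirected_edges(attn):
    """Symmetrize by averaging A[i][j] and A[j][i] (an assumption); drop self-loops."""
    n = len(attn)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            w = 0.5 * (attn[i][j] + attn[j][i])
            if w > 0.0:
                edges[(i, j)] = w
    return edges

def modularity(edges, communities):
    """Weighted modularity: Q = sum_c [ W_c / m - (S_c / (2m))^2 ],
    where W_c is the edge weight inside community c, S_c is the total
    strength of its nodes, and m is the total edge weight."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    label = {node: c for c, nodes in enumerate(communities) for node in nodes}
    intra = [0.0] * len(communities)
    strength = [0.0] * len(communities)
    for (i, j), w in edges.items():
        strength[label[i]] += w
        strength[label[j]] += w
        if label[i] == label[j]:
            intra[label[i]] += w
    return sum(intra[c] / m - (strength[c] / (2 * m)) ** 2
               for c in range(len(communities)))

# Hypothetical attention map over 4 tokens: tokens 0-1 attend to each
# other, as do tokens 2-3. Each row sums to 1, like a softmax output.
attn = [
    [0.45, 0.45, 0.05, 0.05],
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
    [0.05, 0.05, 0.45, 0.45],
]

edges = to_undirected_edges(prune_attention(attn, threshold=0.1))
q_good = modularity(edges, [{0, 1}, {2, 3}])  # partition matching the blocks
q_bad = modularity(edges, [{0, 2}, {1, 3}])   # partition cutting across them
print(sorted(edges), q_good, q_bad)
```

In this toy case the partition that follows the two attention blocks scores higher than the one that cuts across them, which is the kind of signal a modularity-maximizing method such as Louvain searches for.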

Subject/keywords

Vision Transformer, attention layer, attention map, clustering, interpretability, Louvain, visualization

URI

https://hdl.handle.net/20.500.12380/310470

Collections

Master's Theses
