Self-supervised learning of musical representations using VICReg; a comprehensive study of the VICReg loss function for self-supervised representation learning in the music domain
Type
Master's Thesis
Program
Sound and vibration (MPSOV), MSc
Published
2023
Authors
Hesse, Cody
Löf, Sebastian
Abstract
Self-supervised learning has emerged as a promising method for learning informative
representations suitable for many machine learning tasks. However, while self-supervised
representation learning has been instrumental in various fields, its adoption in music
information retrieval has only recently gained momentum. This thesis
investigates the potential of the VICReg loss function for self-supervised learning
in the music domain by comparing its performance against the established CLMR
model. Following the evaluation protocol of CLMR, we train our VICReg model
on the publicly available Free Music Archive and GTZAN datasets. We then evaluate
the learned representation on the downstream task of music classification on
the MagnaTagATune dataset by training a linear logistic classifier and a two-layer
MLP classifier atop the representations generated by a frozen, pre-trained VICReg
model. In our transfer learning experiments, VICReg achieves a ROC-AUC score of
89.15 and a PR-AUC score of 35.85, compared to 88.12 and 33.83, respectively, for
CLMR, showing that VICReg is competitive with CLMR.
With more robust training and further tuning, we believe that
VICReg can achieve superior performance compared to established loss functions for
self-supervised representation learning in the music domain and advocate continued
exploration in this direction.
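For readers unfamiliar with the loss studied here, the following is a minimal, dependency-free sketch of the three VICReg terms (invariance, variance, covariance) as they are commonly described. It is not the thesis implementation: the function name `vicreg_loss`, the default weights (25, 25, 1), the hinge target `gamma`, the stabilising `eps`, and the normalisation of each term are assumptions chosen for illustration, and a practical implementation would use a tensor library rather than Python lists.

```python
import math


def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    """Illustrative VICReg loss for two batches of embeddings.

    za and zb are lists of n rows with d floats each, holding the embeddings
    of two augmented views of the same n inputs. Weights and normalisation
    are assumptions for this sketch, not values taken from the thesis.
    """
    n, d = len(za), len(za[0])

    # Invariance: mean squared difference between paired embeddings.
    invariance = sum(
        (a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb)
    ) / (n * d)

    def centered_columns(z):
        # Transpose to one list per embedding dimension and subtract its mean.
        cols = [list(c) for c in zip(*z)]
        return [[v - sum(c) / n for v in c] for c in cols]

    def variance_term(cols):
        # Hinge that penalises dimensions whose standard deviation is below gamma.
        total = 0.0
        for c in cols:
            std = math.sqrt(sum(v * v for v in c) / (n - 1) + eps)
            total += max(0.0, gamma - std)
        return total / d

    def covariance_term(cols):
        # Sum of squared off-diagonal covariances, scaled by the dimension d.
        total = 0.0
        for i in range(d):
            for j in range(d):
                if i != j:
                    cov = sum(x * y for x, y in zip(cols[i], cols[j])) / (n - 1)
                    total += cov ** 2
        return total / d

    ca, cb = centered_columns(za), centered_columns(zb)
    variance = variance_term(ca) + variance_term(cb)
    covariance = covariance_term(ca) + covariance_term(cb)
    return sim_w * invariance + var_w * variance + cov_w * covariance


# Hypothetical toy batches: identical views that are either spread out or collapsed.
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
collapsed = [[1.0, 1.0]] * 4
loss_spread = vicreg_loss(spread, spread)
loss_collapsed = vicreg_loss(collapsed, collapsed)
```

In this toy example the collapsed batch is penalised only through the variance term, while the spread, decorrelated batch incurs no penalty. This illustrates how the loss discourages representational collapse without relying on negative pairs.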
Subject/keywords
Self-supervised learning, Contrastive learning, Music Information Retrieval, Representation learning, VICReg, CLMR, SampleCNN