Self-supervised learning of musical representations using VICReg; a comprehensive study of the VICReg loss function for self-supervised representation learning in the music domain

dc.contributor.author: Hesse, Cody
dc.contributor.author: Löf, Sebastian
dc.contributor.department: Chalmers tekniska högskola / Institutionen för arkitektur och samhällsbyggnadsteknik (ACE) [sv]
dc.contributor.department: Chalmers University of Technology / Department of Architecture and Civil Engineering (ACE) [en]
dc.contributor.examiner: Ahrens, Jens
dc.contributor.supervisor: Lordelo, Carlos
dc.contributor.supervisor: Thomé, Carl
dc.date.accessioned: 2023-10-09T11:21:55Z
dc.date.available: 2023-10-09T11:21:55Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: Self-supervised learning has emerged as a promising method for learning informative representations suitable for many machine learning tasks. However, while self-supervised representation learning has been instrumental in various fields, it has only recently gained momentum in music information retrieval. This thesis investigates the potential of the VICReg loss function for self-supervised learning in the music domain by comparing its performance against the established CLMR model. Following the evaluations performed in CLMR, we train our VICReg model on the publicly available Free Music Archive and GTZAN datasets. We then evaluate the learned representation on the downstream task of music classification on the MagnaTagATune dataset by training a linear logistic classifier and a two-layer MLP classifier atop the representations generated by a frozen, pre-trained VICReg model. In our transfer learning experiments, VICReg achieves a ROC-AUC score of 89.15 and a PR-AUC score of 35.85, compared to 88.12 and 33.83, respectively, for CLMR, showing that VICReg performs competitively with CLMR. With more robust training and further tuning, we believe that VICReg can outperform established loss functions for self-supervised representation learning in the music domain, and we advocate continued exploration in this direction.
dc.identifier.coursecode: ACEX30
dc.identifier.uri: http://hdl.handle.net/20.500.12380/307201
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Self-supervised learning, Contrastive learning, Music Information Retrieval, Representation learning, VICReg, CLMR, SampleCNN
dc.title: Self-supervised learning of musical representations using VICReg; a comprehensive study of the VICReg loss function for self-supervised representation learning in the music domain
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Sound and vibration (MPSOV), MSc
Original bundle
Name: ACEX30 - Cody Hesse and Sebastian Löf.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format