Designing Loss Functions for Learning Sound Timbre Audio Representations in Variational Autoencoders

dc.contributor.author: Korkmaz, Ipek
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Olsson, Simon
dc.contributor.supervisor: Tatar, Kivanc
dc.date.accessioned: 2026-01-15T14:33:29Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: This study investigates the effect of audio-related loss functions, audio feature extraction methods, and the addition of a synthesis layer on the reconstruction quality and latent space organization of variational autoencoders (VAEs). Three experiments were conducted to address these questions. The first experiment suggests that different audio-related loss functions do not lead to significant differences in performance, aside from requiring different training durations. In the second experiment, adding a synthesis layer does not substantially improve reconstruction quality, but it generally helps the model converge faster during training. Finally, in the third experiment, which focuses on feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC) performed slightly better in terms of reconstruction quality. These findings can potentially guide architectural choices for effective audio representation learning in VAE-based models.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310888
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: timbre representation
dc.subject: audio feature extraction
dc.subject: generative models
dc.subject: variational autoencoder
dc.title: Designing Loss Functions for Learning Sound Timbre Audio Representations in Variational Autoencoders
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc

Download

Original bundle

Name: CSE 25-124 IK.pdf
Size: 6.55 MB
Format: Adobe Portable Document Format