Swedish Dialect Classification using Artificial Neural Networks and Gaussian Mixture Models

Type
Master's thesis
Published
2017
Authors
Blomqvist, Viktor
Lidberg, David
Abstract
Variation due to speaker dialect is one of the main problems in automatic speech recognition. A possible solution is to have a separate classifier identify the speaker's dialect and then load an appropriate speech recognition system. This thesis investigates classification of seven Swedish dialects based on the SweDia2000 database. Classification was done using Gaussian mixture models, a widely used technique in speech processing. Inspired by recent progress in deep learning techniques for speech recognition, convolutional neural networks and multi-layered perceptrons were also implemented. Data was preprocessed using both mel-frequency cepstral coefficients and a novel feature extraction technique based on path signatures. Results showed high variance in classification accuracy across cross-validation runs, even for simple models, suggesting that the amount of available data is a limiting factor for the classification problems formulated in this project. The Gaussian mixture models reached the highest accuracy, 61.3% on the test set, based on single-word classification. Performance improved greatly when multiple words were included, reaching around 80% classification accuracy with 12 words.
Subject/keywords
Basic Sciences, Mathematics