Swedish Dialect Classification using Artificial Neural Networks and Gaussian Mixture Models

Type

Master's thesis

Abstract

Variation due to speaker dialect is one of the main problems in automatic speech recognition. A possible solution is to have a separate classifier identify the speaker's dialect and then load an appropriate speech recognition system. This thesis investigates classification of seven Swedish dialects based on the SweDia2000 database. Classification was done using Gaussian mixture models, a widely used technique in speech processing. Inspired by recent progress in deep learning techniques for speech recognition, convolutional neural networks and multilayer perceptrons were also implemented. Data was preprocessed using both mel-frequency coefficients and a novel feature extraction technique based on path signatures. Results showed high variance in classification accuracy during cross-validation, even for simple models, suggesting that the amount of available data is a limitation for the classification problems formulated in this project. The Gaussian mixture models reached the highest accuracy, 61.3% on the test set, based on single-word classification. Performance improved greatly when multiple words were included, reaching around 80% classification accuracy with 12 words.

Subject/keywords

Basic Sciences, Mathematics
