Keyword Spotting Within an Automotive Environment

Type

Master's Thesis

Abstract

The aim of this project is to implement a keyword spotting (KWS) system, i.e., a system that detects keywords within speech, that performs well in an automotive environment. The system extracts features from sound waves captured by a microphone and classifies them using machine learning (ML). The models are developed on the Google Speech Commands (GSC) dataset in combination with audiobook samples from the LibriSpeech (LS) dataset. This combination of datasets is unique and was chosen to increase the robustness of the models. In addition, data augmentation and the insertion of background noise are key tools for targeting the system towards an automotive environment. Aside from standard performance metrics, model complexity, which the user experiences as a time delay, is an important consideration for real-time usage. Performance is examined using recorded speech and in real-world settings, in both a noise-free and a noisy automotive environment. The best-performing model was a temporal convolutional residual neural network (TC-ResNet) using mel-frequency cepstral coefficient (MFCC) features, which achieved an accuracy of 95.34% on the validation dataset. The model complexity is low compared to models in previous studies, with 152.7 K parameters and 3.22 M multiplications performed by the model. In an automotive environment the model's performance drops substantially, to an average accuracy of 83.71%, but the result is considered promising because the capturing and filtering of the speech signals can be improved in multiple ways by using the car's hardware instead of the laptop that was used. Because the models performed poorly on continuous, coherent speech, the suggestion is that the system should be implemented with a voice activity detection system or a "push-to-talk" button rather than as a constantly running process.
Data collection is proposed as the main focus for future improvements, as more labeled audio segments are needed to build a higher-quality model with wider functionality.
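
The insertion of background noise mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual mixing procedure, so everything below is an assumption made for illustration: the function names `rms` and `mix_at_snr`, the idea of scaling the noise to a target signal-to-noise ratio (SNR), the 16 kHz sample rate, and the 10 dB level. Synthetic signals stand in for a keyword clip and a recording of cabin noise.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the result has the requested SNR in dB.

    Hypothetical helper, not taken from the thesis. The noise is repeated
    or cropped to the length of the speech clip before scaling.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    repeats = len(speech) // len(noise) + 1
    noise = (list(noise) * repeats)[: len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        return list(speech)  # nothing to scale against
    # SNR_dB = 20 * log10(speech_rms / noise_rms_after_scaling)
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 440 Hz tone as "speech" and Gaussian noise as "cabin noise".
random.seed(0)
sample_rate = 16000  # assumed sample rate
speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In an augmentation pipeline, a helper of this kind would typically be called with an SNR drawn at random for each training clip, so that the model sees the same keyword under a range of noise conditions.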

Subject/keywords

Keyword spotting, machine learning, artificial neural networks, automotive environment.
