Keyword Spotting Within an Automotive Environment
dc.contributor.author | Hammaräng Grip, Elias | |
dc.contributor.author | Nilsson, Elias | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för matematiska vetenskaper | sv |
dc.contributor.examiner | Sagitov, Serik | |
dc.contributor.supervisor | Eghbali, Amir | |
dc.date.accessioned | 2023-06-12T11:57:12Z | |
dc.date.available | 2023-06-12T11:57:12Z | |
dc.date.issued | 2023 | |
dc.date.submitted | 2023 | |
dc.description.abstract | The aim of this project is to implement a system to detect keywords within speech, i.e., keyword spotting (KWS), that performs well in an automotive environment. The system combines feature extraction from sound waves captured by a microphone with machine learning (ML). The Google Speech Commands (GSC) dataset is used to develop the models in combination with audiobook samples from the LibriSpeech (LS) dataset. The combination of these two datasets is unique and was done with the goal of increasing the robustness of the models. In addition, data augmentation and the insertion of background noise are key tools within this project, used to target the system towards an automotive environment. Aside from standard performance metrics, model complexity, which the user experiences as a time delay, is also important for real-time use. Performance is examined using recorded speech and in real-world settings, both in a noise-free and a noisy automotive environment. The best-performing model was found to be a temporal convolutional residual neural network (TC-ResNet) using mel-frequency cepstral coefficient (MFCC) features, which achieved an accuracy of 95.34% on the validation dataset. The model complexity is low compared to models in previous studies, with 152.7 K parameters and 3.22 M multiplications performed by the model. The model’s performance drops substantially in an automotive environment, to an average accuracy of 83.71%, but it is considered promising because capturing and filtering the speech signals with the car’s hardware, instead of the laptop that was used, offers several possible improvements. Due to low performance when evaluating the models on coherent speech, the suggestion is that the system should be implemented with a voice activity detection system or a "push-to-talk" button and not as a continuously running process.
Data collection is proposed as the main focus for future improvement, as more labeled audio segments are needed to build a higher-quality model with broader functionality. | |
dc.identifier.coursecode | MVEX03 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/306175 | |
dc.language.iso | eng | |
dc.setspec.uppsok | PhysicsChemistryMaths | |
dc.subject | Keyword spotting, machine learning, artificial neural networks, automotive environment. | |
dc.title | Keyword Spotting Within an Automotive Environment | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Engineering mathematics and computational science (MPENM), MSc |
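
The abstract names the insertion of background noise as a key tool for targeting the system towards an automotive environment, but this record does not say how the noise was mixed in. As an illustration only, the sketch below shows one common way to add noise to a speech clip at a chosen signal-to-noise ratio (SNR). The function names `rms` and `mix_at_snr`, the 16 kHz sample rate, the toy sine and Gaussian signals, and the 10 dB target are all assumptions made for this example and are not taken from the thesis.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        # Silent noise clip: nothing to add.
        return list(speech)
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals (assumed): one second at 16 kHz of a 440 Hz tone plus Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

In an augmentation pipeline of this kind, the SNR is often drawn at random per training example so that the model sees a range of noise levels; whether the thesis did so is not stated in this record.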