Keyword Spotting Within an Automotive Environment

Hammaräng Grip, Elias; Nilsson, Elias

dc.contributor.author	Hammaräng Grip, Elias
dc.contributor.author	Nilsson, Elias
dc.contributor.department	Chalmers University of Technology / Department of Mathematical Sciences	en
dc.contributor.examiner	Sagitov, Serik
dc.contributor.supervisor	Eghbali, Amir
dc.date.accessioned	2023-06-12T11:57:12Z
dc.date.available	2023-06-12T11:57:12Z
dc.date.issued	2023
dc.date.submitted	2023
dc.description.abstract	The aim of this project is to implement a keyword spotting (KWS) system, i.e., a system that detects keywords within speech, that performs well in an automotive environment. The system extracts features from sound waves captured by a microphone and classifies them using machine learning (ML). The models are developed on the Google Speech Commands (GSC) dataset combined with audiobook samples from the LibriSpeech (LS) dataset. This combination of datasets is unique and was chosen to increase the robustness of the models. Data augmentation and the insertion of background noise are also key tools in this project, used to target the system towards an automotive environment. Besides standard performance metrics, model complexity is an important consideration for real-time use, since it appears to the user as a time delay. Performance is examined on recorded speech and in real-world settings, in both a noise-free and a noisy automotive environment. The best-performing model was a temporal convolutional residual neural network (TC-ResNet) using mel-frequency cepstral coefficient (MFCC) features, which achieved an accuracy of 95.34% on the validation dataset. Its complexity is low compared with models in previous studies: 152.7 K parameters and 3.22 M multiplications. Performance drops substantially in an automotive environment, to an average accuracy of 83.71%, but the model is still considered promising, since capturing and filtering the speech signals with the car's own hardware, instead of the laptop used in this work, offers several possible improvements. Because the models performed poorly when evaluated on continuous speech, the system should be combined with a voice activity detection system or a "push-to-talk" button rather than run as a constantly ongoing process.
Data collection is proposed as the main focus for future improvements, as more labeled audio segments are needed to build a higher-quality model with broader functionality.
dc.identifier.coursecode	MVEX03
dc.identifier.uri	https://hdl.handle.net/20.500.12380/306175
dc.language.iso	eng
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	Keyword spotting, machine learning, artificial neural networks, automotive environment.
dc.title	Keyword Spotting Within an Automotive Environment
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Engineering mathematics and computational science (MPENM), MSc
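The abstract names the insertion of background noise as a key augmentation tool but does not state how the noise level is controlled. A common approach, assumed here and not confirmed by the thesis, is to mix a noise recording into each training clip at a chosen signal-to-noise ratio (SNR). The following is a minimal NumPy sketch of that idea; the function name `mix_at_snr` and the parameter `snr_db` are hypothetical and do not come from the thesis.

```python
from typing import Optional

import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float,
               rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return `speech` with a random segment of `noise` added at `snr_db` dB SNR.

    Hypothetical helper for illustration; not the thesis authors' code.
    """
    rng = rng if rng is not None else np.random.default_rng()
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise recording if it is shorter than the speech clip.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))

    # Pick a random noise segment with the same length as the speech clip.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(segment ** 2)
    if p_noise == 0.0:
        return speech.copy()  # silent noise: nothing to add

    # Choose `scale` so that 10 * log10(p_speech / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * segment


# Usage: a synthetic 1 s "clip" at 16 kHz mixed with white noise at 5 dB SNR.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clip = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = mix_at_snr(clip, rng.normal(size=8000), snr_db=5.0, rng=rng)
```

In a training pipeline one would typically draw `snr_db` at random for each clip (the range is a design choice, not a value from the thesis) so that the model sees a spread of noise levels, with recordings of car cabin noise as the noise source for the automotive target.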

Original bundle

Name: Master Thesis Elias Nilsson_Nils Hammaräng Grip 2023.pdf
Size: 2.43 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon at submission

Collections

Master's theses