Keyword Spotting Within an Automotive Environment

dc.contributor.author: Hammaräng Grip, Elias
dc.contributor.author: Nilsson, Elias
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper [sv]
dc.contributor.examiner: Sagitov, Serik
dc.contributor.supervisor: Eghbali, Amir
dc.date.accessioned: 2023-06-12T11:57:12Z
dc.date.available: 2023-06-12T11:57:12Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: The aim of this project is to implement a keyword spotting (KWS) system, i.e., a system that detects keywords within speech, that performs well in an automotive environment. The system combines features extracted from sound waves captured by a microphone with machine learning (ML). The models are developed on the Google Speech Commands (GSC) dataset in combination with audiobook samples from the LibriSpeech (LS) dataset. The combination of these two datasets is unique and was chosen to increase the robustness of the models. In addition, data augmentation and the insertion of background noise are key tools for targeting the system towards an automotive environment. Aside from standard performance metrics, model complexity, which the user experiences as a time delay, is an important aspect for enabling real-time usage. Performance is examined using recorded speech and in real-world settings, in both a noise-free and a noisy automotive environment. The best-performing model was a temporal convolutional residual neural network (TC-ResNet) using mel-frequency cepstral coefficient (MFCC) features, which achieved an accuracy of 95.34% on the validation dataset. The model complexity is low compared to models in previous studies, with 152.7 K parameters and 3.22 M multiplications. In an automotive environment the model's performance drops substantially, to an average accuracy of 83.71%, but the result is considered promising because the capture and filtering of the speech signals could be improved in several ways by using the car's hardware instead of the laptop that was used. Because performance was low when the models were evaluated on coherent speech, the suggestion is to implement the system together with a voice activity detection system or a "push-to-talk" button rather than as a constantly running process.
Data collection is proposed as the main focus for future improvements, as more labeled audio segments are needed to build a higher-quality model with wider functionality.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: http://hdl.handle.net/20.500.12380/306175
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Keyword spotting, machine learning, artificial neural networks, automotive environment.
dc.title: Keyword Spotting Within an Automotive Environment
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
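
The abstract names the insertion of background noise as a key tool for targeting the models towards an automotive environment, but this record does not say how the noise was mixed in. As an illustration only, the sketch below assumes one common approach: scaling a noise clip so that the mixture reaches a chosen signal-to-noise ratio (SNR). The function name `mix_at_snr`, its parameters, and the idea of controlling the mix by SNR are assumptions for this example and are not taken from the thesis.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech so the speech-to-noise power ratio equals snr_db.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    # Repeat and trim the noise so that it covers the whole speech clip.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        # Silent noise clip: nothing to add.
        return speech.copy()

    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

A lower `snr_db` gives louder noise relative to the speech, so sampling it from a range during training would expose a model to both quiet and noisy cabin conditions.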

Download

Original bundle

Name: Master Thesis Elias Nilsson_Nils Hammaräng Grip 2023.pdf
Size: 2.43 MB
Format: Adobe Portable Document Format
