Audio Based Road Type Classification Using CNNs and AST: Development of Audio Based Road Type Classification Models with Focus on Convolutional Neural Networks and The Audio Spectrogram Transformer Model

dc.contributor.author: Kohestani, Faisal
dc.contributor.author: Mehrzad, Niloofar
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Smallbone, Nicholas
dc.contributor.supervisor: Gholamzadeh Khoee, Arsham
dc.date.accessioned: 2025-09-24T11:32:53Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: This thesis investigates the use of machine learning models for classifying road types based on vehicle audio recordings. The goal is to evaluate the effectiveness of different model architectures, specifically Convolutional Neural Networks (CNNs) and the transformer-based Audio Spectrogram Transformer (AST) model, in distinguishing between road surface types such as smooth asphalt, rough asphalt and uneven surfaces. Audio data was pre-processed using feature extraction techniques such as Mel-spectrograms and Mel-frequency cepstral coefficients (MFCCs). Multiple CNN models were developed and trained, while a pre-trained AST model was fine-tuned for the task. All models were evaluated using stratified 5-fold cross-validation, with performance measured through accuracy, F1-score, precision, recall, confusion matrices and inference metrics. The results show that the AST model achieved the highest classification performance, while the CNN models offered advantages in inference speed and memory usage. Post-training quantization was applied to all models using Qualcomm’s AI Hub to determine their viability for deployment on mobile or embedded systems. The findings highlight the potential of audio-based road type classification as a composite sensor for automotive applications. Limitations related to the dataset, feature representation, and recording conditions are discussed, along with recommendations for future improvements and deployment strategies.
dc.identifier.coursecode: LMTX38
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310522
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Classification
dc.subject: CNN
dc.subject: AST
dc.subject: Audio
dc.subject: Mel-spectrogram
dc.subject: MFCC
dc.subject: Cross-validation
dc.subject: Road-type
dc.subject: Quantization
dc.title: Audio Based Road Type Classification Using CNNs and AST: Development of Audio Based Road Type Classification Models with Focus on Convolutional Neural Networks and The Audio Spectrogram Transformer Model
dc.type.degree (sv): Examensarbete på kandidatnivå
dc.type.degree (en): Bachelor Thesis
dc.type.uppsok: M2
local.programme: Datateknik 180 hp (högskoleingenjör)

Download

Original bundle

Name: CSE 25-14 FK NM.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format
