Audio Based Road Type Classification Using CNNs and AST: Development of Audio Based Road Type Classification Models with Focus on Convolutional Neural Networks and The Audio Spectrogram Transformer Model
| dc.contributor.author | Kohestani, Faisal | |
| dc.contributor.author | Mehrzad, Niloofar | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Smallbone, Nicholas | |
| dc.contributor.supervisor | Gholamzadeh Khoee, Arsham | |
| dc.date.accessioned | 2025-09-24T11:32:53Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | | |
| dc.description.abstract | This thesis investigates the use of machine learning models for classifying road types based on vehicle audio recordings. The goal is to evaluate the effectiveness of different model architectures, specifically Convolutional Neural Networks (CNNs) and the transformer-based Audio Spectrogram Transformer (AST) model, in distinguishing between road surface types such as smooth asphalt, rough asphalt and uneven surfaces. Audio data was pre-processed using feature extraction techniques such as Mel-spectrograms and Mel-frequency cepstral coefficients (MFCCs). Multiple CNN models were developed and trained, while a pre-trained AST model was fine-tuned for the task. All models were evaluated using stratified 5-fold cross-validation, with performance measured through accuracy, F1-score, precision, recall, confusion matrices and inference metrics. The results show that the AST model achieved the highest classification performance, while the CNN models offered advantages in inference speed and memory usage. Post-training quantization was applied to all models using Qualcomm’s AI Hub to determine their viability for deployment on mobile or embedded systems. The findings highlight the potential of audio-based road type classification as a composite sensor for automotive applications. Limitations related to the dataset, feature representation, and recording conditions are discussed, along with recommendations for future improvements and deployment strategies. | |
| dc.identifier.coursecode | LMTX38 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310522 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Classification | |
| dc.subject | CNN | |
| dc.subject | AST | |
| dc.subject | Audio | |
| dc.subject | Mel-spectrogram | |
| dc.subject | MFCC | |
| dc.subject | Cross-validation | |
| dc.subject | Road-type | |
| dc.subject | Quantization | |
| dc.title | Audio Based Road Type Classification Using CNNs and AST: Development of Audio Based Road Type Classification Models with Focus on Convolutional Neural Networks and The Audio Spectrogram Transformer Model | |
| dc.type.degree | Examensarbete på kandidatnivå | sv |
| dc.type.degree | Bachelor Thesis | en |
| dc.type.uppsok | M2 | |
| local.programme | Datateknik 180 hp (högskoleingenjör) |
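
The abstract states that all models were evaluated with stratified 5-fold cross-validation. As an illustration only, the following is a minimal sketch of how such a split can be constructed. It is not the thesis authors' code: the function name `stratified_k_fold`, the label strings, and the clip counts are hypothetical, and the sketch omits the shuffling that would normally be applied before splitting.

```python
from collections import defaultdict


def stratified_k_fold(labels, k=5):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full set.

    Illustrative sketch only; names and data are assumptions, not taken
    from the thesis.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices round-robin across the k folds so that
    # every fold receives a similar share of every class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 10 clips per road type (not the thesis dataset).
labels = ["smooth_asphalt"] * 10 + ["rough_asphalt"] * 10 + ["uneven"] * 10
splits = list(stratified_k_fold(labels, k=5))
```

With these hypothetical labels, each of the five test folds holds the same number of clips from every road type, so per-fold metrics such as accuracy and F1-score are computed on comparable class distributions.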
