Clustering and Sentiment Analysis of Customer NVH Feedback in the Automotive Domain: A Machine Learning Pipeline to Facilitate Extraction of Relevant Information from Large-Scale Textual Data
Type
Master's Thesis
Abstract
Noise, Vibration, and Harshness (NVH) attributes play a significant role in shaping
overall customer satisfaction with a vehicle. Automotive manufacturers often
collect large volumes of textual customer feedback through surveys, offering valuable
insights into how various vehicle attributes are perceived. This information is
intended to help engineering teams make informed decisions about where to
focus vehicle improvement efforts. However, its unstructured nature and scale make
it difficult for individual teams to extract the feedback relevant to them.
This thesis investigated the feasibility of a clustering and sentiment analysis pipeline
to support NVH teams in making better use of customer feedback. The proposed
pipeline combined sentence embeddings, dimensionality reduction, and clustering to
group semantically similar feedback. Cluster labels were automatically generated
using a large language model and manually refined when necessary. Both sentence-based
and aspect-based sentiment analysis were applied to quantify sentiment and
extract relevant subtopics for each cluster.
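
The embed-then-cluster idea can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the bag-of-words `embed` function is a toy stand-in for a neural sentence-embedding model, the dimensionality-reduction step is omitted, a small hand-written k-means stands in for the clustering algorithm, and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def embed(sentences):
    """Toy stand-in for a sentence-embedding model (assumption):
    L2-normalised bag-of-words vectors over the shared vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for s in sentences:
        vec = [0.0] * len(vocab)
        for word, count in Counter(s.lower().split()).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up feedback sentences, for illustration only.
feedback = [
    "wind noise is loud on the highway",
    "loud wind noise at highway speed",
    "the seats are very comfortable",
    "very comfortable seats",
]
labels = kmeans(embed(feedback), k=2)
for sentence, label in zip(feedback, labels):
    print(label, sentence)
```

In a realistic setting the toy `embed` would be replaced by a pretrained sentence encoder, and a reduction step would precede clustering. The structure stays the same: sentences become vectors, and nearby vectors end up in the same group.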
The final configuration produced 15 semantically coherent clusters from approximately
36,000 customer feedback sentences. These clusters captured distinct themes,
ranging from high-level impressions of driving and ownership to specific issues regarding
individual components. Sentence-level sentiment analysis successfully distinguished
between positive and negative feedback, showing its potential to guide
improvement efforts. In contrast, aspect-based sentiment analysis was less reliable:
although per-cluster aspect distributions often aligned with cluster themes,
individual aspect terms were too frequently inaccurate. Nonetheless, the method
shows potential, and its effectiveness could likely be substantially enhanced through
domain-specific fine-tuning. Overall, the pipeline effectively facilitated the identification
of relevant feedback and could aid future data-driven design and product
improvement efforts.
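
As a hedged illustration of how sentence-level sentiment could be summarized per cluster, the sketch below computes the share of each sentiment label within each cluster. The function name `sentiment_by_cluster`, the cluster names, and the labels are hypothetical and made up for illustration. In practice the sentiment labels would come from a sentiment classifier, which is not shown.

```python
from collections import Counter, defaultdict


def sentiment_by_cluster(cluster_labels, sentiments):
    """Return, for each cluster, the fraction of its sentences
    carrying each sentiment label."""
    counts = defaultdict(Counter)
    for cluster, sentiment in zip(cluster_labels, sentiments):
        counts[cluster][sentiment] += 1
    return {
        cluster: {label: n / sum(c.values()) for label, n in c.items()}
        for cluster, c in counts.items()
    }


# Made-up cluster assignments and sentiment labels, one pair per sentence.
clusters = ["wind noise", "wind noise", "wind noise", "seat comfort"]
sentiments = ["negative", "negative", "positive", "positive"]

shares = sentiment_by_cluster(clusters, sentiments)
for cluster, dist in shares.items():
    print(cluster, dist)
```

A summary of this kind makes it easy to rank clusters by their share of negative feedback when deciding where to look first.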
Subject/keywords
Clustering, Deep Learning, HDBSCAN, K-Means, LCF-ATEPC, Machine Learning, Natural Language Processing, PyABSA, Sentiment Analysis.
