Optimising Multimodal Learning with a System-Aware Perspective

dc.contributor.author: Alves Henriques e Silva, Hugo Manuel
dc.contributor.author: Chen, Hongguang
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Schneider, Gerardo
dc.contributor.supervisor: Selpi, Selpi
dc.date.accessioned: 2025-10-16T12:41:01Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: Multimodal learning in machine learning (ML) aims to improve model performance by integrating multiple modalities, mirroring human perception. However, training such multimodal models from scratch poses several challenges: (1) modality imbalance, where certain modalities dominate the learning process and limit the contribution of others; (2) modality prediction bias, where models rely disproportionately on specific modalities during inference despite the presence of equally informative alternatives; and (3) inefficient resource allocation, resulting from the overtraining of specific modalities and suboptimal utilisation of available hardware. This thesis tackles these issues with a system-aware approach that combines algorithmic modality-rebalancing techniques, optimisations driven by high-performance computing (HPC), and dynamic adjustment of modality contributions to improve multimodal model performance, training efficiency, and scalability. Focusing on these three challenges, we introduce novel optimisation strategies to detect and mitigate modality imbalance arising from divergent learning curves. We prevent modality suppression by addressing modality bias during prediction. Finally, we address resource inefficiencies by designing adaptive training procedures, load-aware parallelisation, and dynamic scheduling to optimise GPU utilisation and reduce unnecessary computation. The proposed methods are evaluated on several multimodal datasets, including CREMA-D, AVE, and IEMOCAP. Experimental results demonstrate the effectiveness of our system-aware optimisations, showing substantial improvements in both model performance and computational efficiency. The research also investigates multi-GPU optimisation strategies to further enhance the scalability of multimodal learning systems in HPC environments. This work provides a comprehensive approach to advancing multimodal learning by addressing both algorithmic and system-level challenges, contributing to the development of more efficient, scalable, and energy-conscious AI models. (A minimal code sketch of the modality-rebalancing idea follows this record.)
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310644
dc.language.iso: eng
dc.relation.ispartofseries: CSE 25-25
dc.setspec.uppsok: Technology
dc.subject: Machine Learning, Multimodal Learning, Modality Imbalance, Modality Prediction Bias, High-Performance Computing (HPC), Training Efficiency, Scalability, GPU Utilisation
dc.title: Optimising Multimodal Learning with a System-Aware Perspective
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
local.programme: High-performance computer systems (MPHPC), MSc
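
Illustrative sketch (not from the thesis): the abstract mentions algorithmic modality rebalancing driven by divergent learning curves. The Python sketch below shows one common form of this idea, per-modality gradient scaling in a toy audio-visual late-fusion classifier, in the spirit of on-the-fly gradient-modulation methods. The model, dimensions, names (LateFusionModel, rebalanced_step) and the scaling rule are assumptions made for illustration only and are not taken from the thesis.

# A minimal sketch, assuming a toy audio/visual late-fusion setup; all names,
# dimensions, and the scaling rule are illustrative, not the thesis's method.
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Toy audio/visual late-fusion classifier (hypothetical architecture)."""
    def __init__(self, audio_dim=128, video_dim=512, num_classes=6):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, 64), nn.ReLU())
        self.audio_head = nn.Linear(64, num_classes)
        self.video_head = nn.Linear(64, num_classes)

    def forward(self, audio, video):
        a = self.audio_head(self.audio_enc(audio))
        v = self.video_head(self.video_enc(video))
        return a + v, a, v  # fused logits plus per-modality logits

def rebalanced_step(model, optimizer, audio, video, labels, alpha=1.0):
    """One training step with per-modality gradient scaling (illustrative)."""
    criterion = nn.CrossEntropyLoss()
    fused, a_logits, v_logits = model(audio, video)
    loss = criterion(fused, labels)

    optimizer.zero_grad()
    loss.backward()

    # Estimate which modality currently dominates from per-modality losses.
    with torch.no_grad():
        la = criterion(a_logits, labels)
        lv = criterion(v_logits, labels)
        ratio = (la / lv).item()  # > 1 means audio is lagging behind video

    # Down-weight gradients of the stronger branch; leave the weaker one intact.
    audio_coef = min(1.0, ratio ** alpha)           # shrink audio if it leads
    video_coef = min(1.0, (1.0 / ratio) ** alpha)   # shrink video if it leads
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name.startswith(("audio_enc", "audio_head")):
            p.grad.mul_(audio_coef)
        elif name.startswith(("video_enc", "video_head")):
            p.grad.mul_(video_coef)

    optimizer.step()
    return loss.item()

# Usage with random data, purely to show the call pattern:
model = LateFusionModel()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
audio, video = torch.randn(8, 128), torch.randn(8, 512)
labels = torch.randint(0, 6, (8,))
print(rebalanced_step(model, opt, audio, video, labels))

The design choice illustrated here is that the rebalancing signal comes from per-modality losses computed on the fly, so the dominant modality is slowed down rather than the weaker one being upsampled or retrained separately.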

Download

Original bundle
Name: CSE 25-25 HA HC.pdf
Size: 8.36 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission