Optimising Multimodal Learning with a System-Aware Perspective

Type

Master's Thesis

Abstract

Multimodal learning in machine learning (ML) aims to improve model performance by integrating multiple modalities, mirroring human perception. However, training such multimodal models from scratch raises several challenges: (1) modality imbalance, where certain modalities dominate the learning process and limit the contribution of others; (2) modality prediction bias, where models rely disproportionately on specific modalities during inference even when equally informative alternatives are available; and (3) inefficient resource allocation, resulting from the overtraining of specific modalities and suboptimal utilisation of the available hardware. This thesis tackles these issues with a system-aware approach that combines algorithmic modality rebalancing techniques, optimisations driven by high-performance computing (HPC), and dynamic adjustment of modality contributions to improve multimodal model performance, training efficiency, and scalability. Targeting these three challenges, we introduce novel optimisation strategies that detect and mitigate modality imbalance arising from divergent learning curves. We address modality bias during prediction to prevent modality suppression. Finally, we address resource inefficiencies by designing adaptive training procedures, load-aware parallelisation, and dynamic scheduling that improve GPU utilisation and reduce unnecessary computation. The proposed methods are evaluated on several multimodal datasets, including CREMA-D, AVE, and IEMOCAP. Experimental results demonstrate the effectiveness of our system-aware optimisations, showing substantial improvements in both model performance and computational efficiency. The research also investigates multi-GPU optimisation strategies to further enhance the scalability of multimodal learning systems in HPC environments.
This work provides a comprehensive approach to advancing multimodal learning by addressing both algorithmic and system-level challenges, contributing to the development of more efficient, scalable, and energy-conscious AI models.
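
The abstract does not specify how imbalance is detected or mitigated. As an illustration only, the Python sketch below assumes a simple ratio-based heuristic of the kind used by gradient-modulation methods in the multimodal learning literature; it is not the method developed in this thesis. Each modality's "learning progress" score (for example, its mean confidence on the correct class) is compared with the mean score of the other modalities, imbalance is flagged when the largest-to-smallest ratio exceeds a threshold, and the dominant modality's gradients are damped with a tanh-based coefficient. The function names `is_imbalanced` and `modulation_coefficients` and the parameters `threshold` and `alpha` are hypothetical.

```python
import math

EPS = 1e-8  # guards against division by zero when a modality has not learned yet


def is_imbalanced(scores, threshold=1.5):
    """Hypothetical check: flag imbalance when the best-learning modality's
    score exceeds the worst one's by more than `threshold` (a ratio)."""
    if len(scores) < 2:
        return False
    best, worst = max(scores.values()), min(scores.values())
    return best / max(worst, EPS) > threshold


def modulation_coefficients(scores, alpha=1.0):
    """Hypothetical rebalancing rule: return a gradient scale per modality.

    scores: dict mapping modality name -> learning-progress score
            (higher means the modality is learning faster).
    alpha:  assumed hyperparameter; larger values damp the dominant
            modality more strongly.
    """
    if len(scores) < 2:
        return {m: 1.0 for m in scores}
    coeffs = {}
    for m, s in scores.items():
        others = [v for k, v in scores.items() if k != m]
        # Ratio of this modality's score to the mean of the others;
        # a ratio above 1 means this modality is ahead (dominant).
        ratio = s / max(sum(others) / len(others), EPS)
        if ratio > 1.0:
            # Damp the dominant modality: the coefficient lies in (0, 1)
            # and equals 1.0 exactly when the ratio is 1.
            coeffs[m] = 1.0 - math.tanh(alpha * (ratio - 1.0))
        else:
            # Lagging or balanced modalities keep their full gradient.
            coeffs[m] = 1.0
    return coeffs


if __name__ == "__main__":
    scores = {"audio": 0.8, "visual": 0.4}  # illustrative values only
    print(is_imbalanced(scores))            # audio is learning twice as fast
    print(modulation_coefficients(scores))  # audio damped, visual unchanged
```

Under the same assumptions, a PyTorch training loop could compute these coefficients once per step and multiply each modality encoder's non-None gradients by its coefficient (for example with `p.grad.mul_(coeff)`) between `loss.backward()` and `optimizer.step()`. Subtracting 1 from the ratio keeps the coefficient at exactly 1.0 when the modalities learn at the same pace, so balanced training is left unchanged.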

Subject/keywords

Machine Learning, Multimodal Learning, Modality Imbalance, Modality Prediction Bias, High-Performance Computing (HPC), Training Efficiency, Scalability, GPU Utilisation