Optimising Multimodal Learning with a System-Aware Perspective
Type
Master's Thesis
Abstract
Multimodal learning in machine learning (ML) aims to improve model performance by integrating multiple modalities, mirroring human perception. However, there are several challenges when training such multimodal models from scratch. These challenges include (1) modality imbalance, where certain modalities dominate the learning process and limit the contribution of others; (2) modality prediction bias, where models disproportionately rely on specific modalities during inference despite the presence of equally informative alternatives; and (3) inefficient resource allocation, resulting from the overtraining of specific modalities and suboptimal utilisation of available hardware. This thesis tackles these issues by proposing a system-aware approach that incorporates algorithmic modality rebalancing techniques, optimisations driven by high-performance computing (HPC), and dynamic adjustments of modality contributions to improve multimodal model performance, training efficiency, and scalability.
To address these three challenges, we introduce novel optimisation strategies that detect and mitigate modality imbalance arising from divergent learning curves. We counter modality bias at prediction time so that no modality is suppressed. Finally, we address resource inefficiencies by designing adaptive training procedures, load-aware parallelisation, and dynamic scheduling to optimise GPU utilisation and reduce unnecessary computation.
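The abstract does not state how divergent learning curves are detected or how modality contributions are adjusted. As an illustration only, the sketch below assumes one simple rule: measure each modality's recent rate of loss decrease and scale down the modalities that learn faster than average. The function names `learning_speed` and `rebalance_weights`, the window size, and the weighting rule are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def learning_speed(losses: List[float], window: int = 3) -> float:
    """Average per-step loss decrease over the last `window` steps (assumed metric)."""
    if len(losses) < 2:
        return 0.0
    recent = losses[-(window + 1):]
    return (recent[0] - recent[-1]) / (len(recent) - 1)


def rebalance_weights(
    history: Dict[str, List[float]], window: int = 3, eps: float = 1e-8
) -> Dict[str, float]:
    """Return a gradient/loss scaling weight in (0, 1] for each modality.

    Hypothetical rule: a modality whose loss falls faster than the mean
    speed is scaled down by mean_speed / own_speed; all others keep 1.0.
    """
    # Negative speeds (loss going up) are treated as zero progress.
    speeds = {m: max(learning_speed(l, window), 0.0) for m, l in history.items()}
    mean_speed = sum(speeds.values()) / len(speeds)
    if mean_speed <= eps:
        # No modality is making progress, so there is nothing to rebalance.
        return {m: 1.0 for m in history}
    return {m: min(1.0, mean_speed / (s + eps)) for m, s in speeds.items()}


# Illustrative, made-up loss curves: audio converges quickly, video lags.
history = {
    "audio": [2.0, 1.4, 1.0, 0.7],
    "video": [2.0, 1.95, 1.9, 1.85],
}
weights = rebalance_weights(history)
```

In a training loop, such weights could multiply each modality's loss term or encoder gradients, so that the faster modality is slowed while the lagging one catches up. Whether the thesis uses loss-based, gradient-based, or another signal is not specified in this abstract.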
The proposed methods are evaluated on several multimodal datasets, including CREMA-D, AVE, and IEMOCAP. Experimental results demonstrate the effectiveness of our system-aware optimisations, showing substantial improvements in both model performance and computational efficiency. The research also investigates multi-GPU optimisation strategies to further enhance the scalability of multimodal learning systems in HPC environments.
This work provides a comprehensive approach to advancing multimodal learning by addressing both algorithmic and system-level challenges, contributing to the development of more efficient, scalable, and energy-conscious AI models.
Subject/keywords
Machine Learning, Multimodal Learning, Modality Imbalance, Modality Prediction Bias, High-Performance Computing (HPC), Training Efficiency, Scalability, GPU Utilisation
