Explaining Deep Neural Networks using Information Theory and Geometry

Type

Master's Thesis

Abstract

Deep neural networks (DNNs) have achieved remarkable success in various machine learning tasks, yet a satisfactory explanation for their generalisation performance remains elusive. In this thesis, we stand on the shoulders of giants and explore two extensions to the information bottleneck (IB) principle, which formalises the idea that DNNs are ultimately compression machines that turn input data into minimal sufficient representations. The first extension is based on geometry and replaces the information-theoretic IB with neural collapse (NC), a phenomenon in which the penultimate-layer representations of the training data converge to a regular simplex, a configuration of points that maximises pairwise distance. The second extension enforces more realistic constraints on the computation of the mutual information used in IB by taking into account the functional family used to decode said information; this leads to the natural definition of 𝒱-information. We build on these ideas by extending them to all layers of a DNN. Motivated by real-world computational constraints on computing NC and 𝒱-information, we develop a (to our knowledge) novel algorithm for measuring NC, and (re)discover theoretical motivations behind many current practices in DNN training. Experiments indicate that compression, both geometric and information-theoretic, is not necessary for generalisation, yet DNNs do compress in a predictable manner, with compression increasing as one moves deeper into the network.
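
Neural collapse is commonly quantified through the within- and between-class scatter of penultimate-layer features. The sketch below illustrates one standard such measure (often referred to as NC1); it is not the thesis's own measurement algorithm, and the function name nc1_metric and the assumed array shapes are illustrative.

```python
# Minimal sketch, assuming `features` holds penultimate-layer activations of
# shape (n_samples, d) and `labels` holds integer class labels of shape (n_samples,).
import numpy as np

def nc1_metric(features: np.ndarray, labels: np.ndarray) -> float:
    """Standard NC1-style measure: within-class scatter relative to between-class
    scatter of class means. Values near zero indicate collapse of each class
    onto its mean (the configuration described in the abstract)."""
    global_mean = features.mean(axis=0)
    d = features.shape[1]
    sigma_w = np.zeros((d, d))   # within-class scatter
    sigma_b = np.zeros((d, d))   # between-class scatter
    classes = np.unique(labels)
    for c in classes:
        class_feats = features[labels == c]
        mu_c = class_feats.mean(axis=0)
        centered = class_feats - mu_c
        sigma_w += centered.T @ centered / len(features)
        diff = (mu_c - global_mean)[:, None]
        sigma_b += (diff @ diff.T) * len(class_feats) / len(features)
    # trace(Sigma_W · pinv(Sigma_B)) / K tends towards 0 as within-class
    # variability vanishes and class means spread apart.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))
```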

Subject / keywords

Keywords: neural collapse, information theory, decodable information bottleneck, compression, generalisation
