Explaining Deep Neural Networks using Information Theory and Geometry

dc.contributor.author: Cheong, Ziyong
dc.contributor.department: Chalmers tekniska högskola / Institutionen för elektroteknik
dc.contributor.examiner: Durisi, Giuseppe
dc.date.accessioned: 2025-08-04T12:24:51Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: Deep neural networks (DNNs) have achieved remarkable success in various machine learning tasks, yet a satisfactory explanation for their generalisation performance remains elusive. In this thesis, we stand on the shoulders of giants and explore two extensions to the information bottleneck (IB) principle, which formalises the idea that DNNs are ultimately compression machines that turn input data into minimal sufficient representations. The first extension is based on geometry and replaces the information-theoretic IB with neural collapse (NC), a phenomenon where the penultimate-layer representations of the training data converge to a regular simplex, a configuration of points that maximises pairwise distances. The second extension enforces more realistic constraints on the computation of the mutual information used in IB by taking into account the functional family used to decode said information. This leads to the natural definition of 𝒱-information. We build on these ideas by extending them to all layers of a DNN. Motivated by real-world computational constraints on computing NC and 𝒱-information, we develop a (to our knowledge) novel algorithm for measuring NC, and (re)discover theoretical motivations behind many current practices in DNN training. Experiments indicate that compression, both geometric and information-theoretic, is not necessary for generalisation, yet DNNs do compress in a predictable manner, with compression increasing as one moves deeper into the network.
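The abstract's two central quantities can be made concrete. 𝒱-information (Xu et al., 2020) restricts the decoder to a predictive family 𝒱, so that I_𝒱(X → Y) = H_𝒱(Y | ∅) − H_𝒱(Y | X): information only counts if some model in 𝒱 can actually extract it. For neural collapse, the sketch below computes the standard NC1 variability-collapse metric of Papyan, Han and Donoho (2020) on penultimate-layer features. It is an illustrative baseline only, not the thesis's own (novel, layer-wise) measurement algorithm, and all function and variable names here are our own.

    import numpy as np

    def nc1_variability(features: np.ndarray, labels: np.ndarray) -> float:
        """NC1 metric: within-class scatter measured against between-class scatter.

        features: (n, d) penultimate-layer activations for the training set
        labels:   (n,) integer class labels
        Returns tr(Sigma_W Sigma_B^+) / C; values approaching 0 indicate that
        per-class features have collapsed onto their class means.
        """
        classes = np.unique(labels)
        n, d = features.shape
        global_mean = features.mean(axis=0)
        sigma_w = np.zeros((d, d))  # within-class covariance
        sigma_b = np.zeros((d, d))  # between-class covariance
        for c in classes:
            feats_c = features[labels == c]
            mu_c = feats_c.mean(axis=0)
            centred = feats_c - mu_c
            sigma_w += centred.T @ centred / n
            diff = (mu_c - global_mean)[:, None]
            sigma_b += diff @ diff.T / len(classes)
        # Pseudo-inverse: Sigma_B is rank-deficient (rank at most C - 1).
        return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))

    # Example: random features show no collapse; features equal to their
    # class means collapse fully (NC1 metric ~ 0).
    rng = np.random.default_rng(0)
    y = rng.integers(0, 10, size=1000)
    print(nc1_variability(rng.normal(size=(1000, 64)), y))  # large value
    means = rng.normal(size=(10, 64))
    print(nc1_variability(means[y], y))                     # ~0: NC1 holds

Tracking such a quantity at every layer, rather than only at the penultimate one, is what lets the thesis measure geometric compression as a function of depth.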
dc.identifier.coursecode: EENX30
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310273
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: neural collapse, information theory, decodable information bottleneck, compression, generalisation
dc.title: Explaining Deep Neural Networks using Information Theory and Geometry
dc.type.degree: Master's Thesis
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc

Downloads

Original bundle (showing 1 - 1 of 1)
Name: Master's Thesis.pdf
Size: 771.15 KB
Format: Adobe Portable Document Format

License bundle (showing 1 - 1 of 1)
Name: license.txt
Size: 2.35 KB
Description: Item-specific license agreed to upon submission