Unstable gradients in deep neural nets

Storm, Ludvig

Download

Master_thesis_Ludvig_Storm.pdf (1.09 MB)

Published

2020

Author

Storm, Ludvig

Type

Master's thesis

Programme

Complex adaptive systems (MPCAS), MSc

Abstract

In the past decade, deep learning algorithms have grown in popularity due to their ability to detect and represent abstract features in complex data sets. One of the most prominent deep learning algorithms is the deep neural network (DNN), which has outperformed many state-of-the-art machine learning techniques. While its success can largely be attributed to its depth, this feature also makes it difficult to train. One of the main obstacles is the vanishing gradient problem, a phenomenon in which updates to the network vanish exponentially with depth. The problem is severe enough to have been referred to as a fundamental problem of deep learning [18]. However, simulations reveal that DNNs are able to escape the vanishing gradient problem after having been trained for some time, but the dynamics of this escape are still not understood. In this work, the underlying dynamics of the escape from the vanishing gradient problem in deep neural networks are explored by means of dynamical systems theory. In particular, the concept of Lyapunov exponents is used to analyse how signals propagating through the network evolve, and whether this evolution is connected to the vanishing gradient problem. The study is based on results by [16] and [19]. Furthermore, a method to circumvent the vanishing gradient problem, developed in [14] for very wide neural networks, is explored for narrow networks. The results of this thesis suggest that the escape from the vanishing gradient problem does not depend on which data set the deep neural network is trained on, but is rather a consequence of the training algorithm. Furthermore, it is found that the escape is characterised by the maximal Lyapunov exponent of the network growing from a negative value to a value close to 0. To further explore the underlying dynamics, it is suggested to study the training algorithms in the absence of data. The method of avoiding the vanishing gradient problem presented by [14] is found to work poorly for narrow neural networks.
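To make the role of the maximal Lyapunov exponent concrete, the following is a minimal sketch, not the thesis's own code. It assumes a fully connected network with tanh activations, no biases, and Gaussian weights of variance `weight_std**2 / width`; the function name `max_lyapunov_exponent` and all parameters are hypothetical choices for illustration. A tangent vector is pushed through the layer Jacobians and renormalised at each layer, and the mean logarithmic growth rate gives a finite-depth estimate of the maximal exponent. A negative value means perturbations shrink exponentially with depth; because backpropagation multiplies by the transposes of the same Jacobians, this is the regime in which gradients would be expected to vanish.

```python
import math
import random


def max_lyapunov_exponent(depth, width, weight_std, seed=0):
    """Estimate the maximal Lyapunov exponent of a random tanh network.

    Assumed layer map (illustrative): x_{l+1} = tanh(W_l x_l), with the
    entries of W_l drawn from N(0, weight_std**2 / width) and no biases.
    """
    rng = random.Random(seed)
    scale = weight_std / math.sqrt(width)

    # Random input signal and a random unit tangent (perturbation) vector.
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    u = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]

    log_growth = 0.0
    for _ in range(depth):
        W = [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]

        # Forward pass through one layer.
        pre = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        x = [math.tanh(p) for p in pre]

        # Layer Jacobian applied to u: J u = diag(1 - tanh(pre)^2) (W u).
        Wu = [sum(w * uj for w, uj in zip(row, u)) for row in W]
        u = [(1.0 - xi * xi) * c for xi, c in zip(x, Wu)]

        # Accumulate the log growth factor, then renormalise the tangent vector.
        norm = math.sqrt(sum(c * c for c in u))
        log_growth += math.log(norm)
        u = [c / norm for c in u]

    return log_growth / depth


if __name__ == "__main__":
    for std in (0.5, 1.0, 3.0):
        lam = max_lyapunov_exponent(depth=50, width=20, weight_std=std)
        print(f"weight_std={std}: estimated maximal exponent {lam:.3f}")
```

With small weights the estimate should come out clearly negative, while larger weights push it upward. Tracking the same quantity during training, rather than at random initialisation as here, is closer to the kind of analysis the abstract describes.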

Subject/keywords

Vanishing gradient, dynamical system, Lyapunov exponent, dynamical isometry

URI

https://hdl.handle.net/20.500.12380/300713

Collections

Master's theses
