Unstable gradients in deep neural nets
Type
Master's thesis
Program
Complex adaptive systems (MPCAS), MSc
Published
2020
Author
Storm, Ludvig
Abstract
In the past decade, deep learning algorithms have gained increased popularity due
to their ability to detect and represent abstract features in complex data sets. One
of the most prominent deep learning algorithms is the deep neural network (DNN),
which has managed to outperform many state-of-the-art machine learning techniques.
While its success can largely be attributed to its depth, this feature also makes it
difficult to train. One of the main obstacles is the vanishing gradient problem: a
phenomenon causing updates to the network to vanish exponentially with depth.
The problem is severe enough to have been referred to as a fundamental problem
of deep learning [18]. However, simulations reveal that DNNs are able to escape
the vanishing gradient problem after having been trained for some time, but the
dynamics of this escape are still not understood.
In this work, the underlying dynamics of the escape from the vanishing gradient
problem in deep neural networks are explored by means of dynamical systems theory.
In particular, the concept of Lyapunov exponents is used to analyse how signals
propagating through the network evolve, and whether this evolution is connected to
the vanishing gradient problem. The study builds on results from [16] and [19]. Furthermore,
a method to circumvent the vanishing gradient problem, developed in [14] for
very wide neural networks, is explored for narrow networks.
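To give a sense of the quantity involved (the precise definition used in the thesis may differ; the network, depth, and renormalisation scheme below are assumptions for illustration), the maximal Lyapunov exponent of forward signal propagation can be estimated by following a small perturbation through the layers and averaging the logarithm of its per-layer growth.

# Sketch (assumed setup): estimate the maximal Lyapunov exponent of signal
# propagation as the mean log growth rate of a small perturbation per layer.
import numpy as np

rng = np.random.default_rng(1)
depth, width, eps = 100, 50, 1e-6
weights = [rng.normal(0, 1.5 / np.sqrt(width), (width, width)) for _ in range(depth)]

x = rng.normal(size=width)
d = rng.normal(size=width)
d *= eps / np.linalg.norm(d)

log_growth = 0.0
for W in weights:
    x_new = np.tanh(W @ x)
    y_new = np.tanh(W @ (x + d))
    diff = y_new - x_new
    log_growth += np.log(np.linalg.norm(diff) / eps)
    # Renormalise the perturbation so it stays small.
    d = diff * (eps / np.linalg.norm(diff))
    x = x_new

lyapunov_max = log_growth / depth  # negative: signals contract; near 0: marginal
print(f"estimated maximal Lyapunov exponent: {lyapunov_max:.3f}")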
The results of this thesis suggest that the escape from the vanishing gradient problem
is unrelated to the data set the deep neural network is trained on, and is instead
a consequence of the training algorithm. Furthermore, it is found that the escape
is characterised by the maximal Lyapunov exponent of the network growing from a
negative value to a value close to 0. To further explore the underlying dynamics, it
is suggested to study the training algorithms in the absence of data. The method for
avoiding the vanishing gradient problem presented in [14] is found to work poorly
for narrow neural networks.
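For context on the keyword "dynamical isometry" below (this sketch is illustrative only and is not claimed to be the method of [14]): dynamical isometry is commonly described as the singular values of the end-to-end Jacobian concentrating around 1. In a deep linear network, orthogonal weights achieve this exactly, while Gaussian weights of the same scale give a widely spread spectrum.

# Illustrative sketch (not the thesis's method): singular values of the
# end-to-end Jacobian of a deep linear network under two initialisations.
import numpy as np

rng = np.random.default_rng(2)
depth, width = 20, 100

def jacobian_singular_values(weights):
    J = np.eye(width)
    for W in weights:
        J = W @ J
    return np.linalg.svd(J, compute_uv=False)

gaussian = [rng.normal(0, 1.0 / np.sqrt(width), (width, width)) for _ in range(depth)]
orthogonal = [np.linalg.qr(rng.normal(size=(width, width)))[0] for _ in range(depth)]

for name, ws in [("Gaussian", gaussian), ("orthogonal", orthogonal)]:
    s = jacobian_singular_values(ws)
    print(f"{name:10s} init: singular values in [{s.min():.2e}, {s.max():.2e}]")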
Subject/keywords
Vanishing gradient, dynamical system, Lyapunov exponent, dynamical isometry