Unstable gradients in deep neural nets

dc.contributor.author: Storm, Ludvig
dc.contributor.department: Chalmers University of Technology / Department of Physics
dc.contributor.examiner: Mehlig, Bernhard
dc.contributor.supervisor: Mehlig, Bernhard
dc.date.accessioned: 2020-02-26T06:51:05Z
dc.date.available: 2020-02-26T06:51:05Z
dc.date.issued: 2020
dc.date.submitted: 2019
dc.description.abstract: In the past decade, deep learning algorithms have gained popularity due to their ability to detect and represent abstract features in complex data sets. One of the most prominent deep learning algorithms is the deep neural network (DNN), which has outperformed many state-of-the-art machine learning techniques. While its success can largely be attributed to its depth, this feature also makes it difficult to train. One of the main obstacles is the vanishing gradient problem: a phenomenon causing updates to the network to vanish exponentially with depth. The problem is severe enough to have been referred to as a fundamental problem of deep learning [18]. However, simulations reveal that DNNs are able to escape the vanishing gradient problem after having been trained for some time, yet the dynamics of this escape are still not understood. In this work, the underlying dynamics of the escape from the vanishing gradient problem in deep neural networks are explored by means of dynamical systems theory. In particular, the concept of Lyapunov exponents is used to analyse how signals propagating through the network evolve, and whether this has a connection to the vanishing gradient problem. The study is based on results from [16] and [19]. Furthermore, a method to circumvent the vanishing gradient problem, developed in [14] for very wide neural networks, is explored for narrow networks. The results of this thesis suggest that the escape from the vanishing gradient problem is unrelated to the data set the deep neural network is trained on, but is rather a consequence of the training algorithm. Furthermore, it is found that the escape is characterised by the maximal Lyapunov exponent of the network growing from a negative value to a value close to 0. To further explore the underlying dynamics, it is suggested to study the training algorithms in the absence of data. The method of avoiding the vanishing gradient problem, presented in [14], is found to work poorly for narrow neural networks.
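The central quantity in the abstract, the maximal Lyapunov exponent of signal propagation through the network, can be illustrated with a minimal numerical sketch. This is not the thesis code: the width, depth, activation function, and weight scale sigma_w below are illustrative assumptions. A small input perturbation is propagated through a random tanh network and its average logarithmic growth rate per layer is measured; a negative value means perturbations (and hence backpropagated gradients) shrink exponentially with depth, i.e. the vanishing-gradient regime the abstract describes.

```python
import numpy as np

# Illustrative sketch (not the thesis implementation): estimate the maximal
# Lyapunov exponent of signal propagation through a deep random tanh network
# by tracking how a small input perturbation grows or shrinks per layer.
rng = np.random.default_rng(0)
width, depth, sigma_w = 50, 100, 1.0   # assumed, illustrative values
eps = 1e-6                             # perturbation magnitude

x = rng.standard_normal(width)              # input signal
delta = rng.standard_normal(width)
delta *= eps / np.linalg.norm(delta)        # small perturbation of size eps

log_growth = 0.0
for _ in range(depth):
    W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    x_pert = np.tanh(W @ (x + delta))       # propagate perturbed signal
    x = np.tanh(W @ x)                      # propagate unperturbed signal
    delta = x_pert - x                      # propagated perturbation
    norm = np.linalg.norm(delta)
    log_growth += np.log(norm / eps)        # per-layer expansion rate
    delta *= eps / norm                     # renormalise to avoid under/overflow

lyapunov_max = log_growth / depth
print(f"estimated maximal Lyapunov exponent: {lyapunov_max:.3f}")
# Negative: perturbations and gradients vanish exponentially with depth.
# Near zero: signals are propagated without exponential growth or decay.
```

With sigma_w = 1.0 the estimate is typically negative (vanishing regime); in the dynamical-isometry literature one tunes the weight scale so that this exponent sits close to zero, which matches the abstract's observation that the escape coincides with the maximal Lyapunov exponent approaching 0.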
dc.identifier.coursecode: TIFX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/300713
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Vanishing gradient
dc.subject: dynamical system
dc.subject: Lyapunov exponent
dc.subject: dynamical isometry
dc.title: Unstable gradients in deep neural nets
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
Original bundle: Master_thesis_Ludvig_Storm.pdf (1.09 MB, Adobe Portable Document Format)