Asynchronous Parallel Stochastic Gradient Descent: A study of the influence of synchronization methods and hyperparameters

dc.contributor.author: Ek, Hampus
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Papatriantafilou, Marina
dc.contributor.supervisor: Tsigas, Philippas
dc.date.accessioned: 2021-11-16T13:31:09Z
dc.date.available: 2021-11-16T13:31:09Z
dc.date.issued: 2021
dc.date.submitted: 2020
dc.description.abstract: Artificial Neural Networks (ANNs) can solve complex tasks and are found in applications such as language translation and object recognition. The underlying optimization algorithm for training ANNs is often Stochastic Gradient Descent (SGD), a first-order numerical optimization algorithm that repeatedly takes a step in the negative gradient direction of a loss function. Training ANNs generally requires a large amount of data, and with more data, more complex tasks can be learned. However, more data also increases the training time, since several passes through the data are required. The training time can be reduced by parallelizing the training process, and parallelizing the SGD iterations is one way to do so. This can mainly be done in two ways: synchronously or asynchronously. Asynchronous parallel SGD has shown performance benefits over the synchronous approach and has therefore gained increased attention in recent literature. However, asynchrony makes the execution and the convergence criteria of SGD harder to analyze, because it allows gradient updates to be applied based on stale (old) views of the state. Parallel SGD, and asynchronous parallel SGD in particular, can also make hyperparameter tuning more time-consuming and challenging than for regular sequential SGD, since parallelization introduces an additional dimension, the level of parallelism, to the training phase. Increased parallelism also increases the risk of crashed executions.

This work aims to increase the understanding of the convergence properties of SGD under asynchronous parallelism. This is done by (i) analyzing how the memory model affects convergence under different levels of parallelism, (ii) analyzing the impact of batch size and step size on convergence under varying levels of parallelism, (iii) analyzing how the staleness distribution is affected by different batch sizes, and (iv) testing backoff methods to reduce contention for the lock-based algorithms and Leashed-SGD. In addition, an alternative lock-based approach to the regular mutex lock is proposed, using a read-write lock. The read-write lock, the mutex lock, and two lock-free algorithms, Leashed-SGD [1] and HOGWILD! [2], are compared along all of the dimensions (i-iv) above.

We empirically observe that the memory model impacts convergence, with NUMA converging faster than UMA for both lock-based and lock-free AsyncSGD. Further, the level of parallelism strongly influences which hyperparameters should be used. The level of parallelism is also related to the number of crashed and diverged executions, with higher parallelism increasing the risk of crashed executions.
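As a rough illustration of the setting described in the abstract: each SGD iteration applies an update of the form w ← w − η·∇L(w; B) for a mini-batch B, and in asynchronous parallel SGD several threads apply such updates to a shared parameter vector, either under a lock (lock-based AsyncSGD) or without any synchronization (HOGWILD!-style), so gradients may be computed on stale views of the state. The sketch below is a minimal toy example on a least-squares problem, not the thesis implementation; the function names, hyperparameter values, and use of Python threads are assumptions chosen for illustration, and it does not cover read-write locks or Leashed-SGD.

```python
# Minimal, illustrative sketch (not the thesis implementation): lock-based vs.
# lock-free (HOGWILD!-style) asynchronous SGD on a toy least-squares problem.
# Python's GIL limits true parallelism, so this only illustrates the control
# flow and the synchronization difference, not actual speedup.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + noise
n_samples, n_features = 10_000, 20
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

w = np.zeros(n_features)      # shared parameter vector (the "state")
lock = threading.Lock()       # used only by the lock-based variant


def gradient(batch_idx, w_view):
    """Gradient of the batch-averaged squared-error loss (1/(2b)) * ||X_b w - y_b||^2."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return Xb.T @ (Xb @ w_view - yb) / len(batch_idx)


def worker(n_iters, batch_size, step_size, use_lock):
    global w
    local_rng = np.random.default_rng()          # per-thread RNG
    for _ in range(n_iters):
        batch_idx = local_rng.integers(0, n_samples, size=batch_size)
        if use_lock:
            # Lock-based AsyncSGD: read the state and apply the update atomically.
            with lock:
                w -= step_size * gradient(batch_idx, w)
        else:
            # HOGWILD!-style: compute the gradient on a possibly stale snapshot
            # of the state and apply it without any synchronization.
            g = gradient(batch_idx, w.copy())
            w -= step_size * g


# Run a few asynchronous workers; pass use_lock=False for the HOGWILD!-style variant.
threads = [threading.Thread(target=worker, args=(500, 32, 0.01, True))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to w_true:", np.linalg.norm(w - w_true))
```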
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/304363
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Stochastic gradient descent
dc.subject: parallelism
dc.subject: asynchrony
dc.subject: machine learning
dc.title: Asynchronous Parallel Stochastic Gradient Descent: A study of the influence of synchronization methods and hyperparameters
dc.type.degree: Master's thesis (Examensarbete för masterexamen)
dc.type.uppsok: H
Download / Original bundle (showing 1 - 1 of 1)
Name: CSE 21-151 Ek.pdf
Size: 3.99 MB
Format: Adobe Portable Document Format