A Version Oriented Parallel Asynchronous Evolution Strategy for Deep Learning

Type
Master's thesis
Programme
Computer systems and networks (MPCSN), MSc
Published
2021
Author
JANG, MYEONG-JIN
Abstract
In this work, we propose a new parallel asynchronous Evolution Strategy (ES) that outperforms existing ESs, including the canonical ES and the steady-state ES. ES has been considered a competitive alternative for optimizing neural networks in deep reinforcement learning, in place of an optimizer combined with backpropagation. In this thesis, three ES systems were implemented to compare their performance. Two are based on existing ES designs: the canonical ES and the steady-state ES. The third is the proposed system, called Version Oriented Parallel Asynchronous Evolution Strategy (VOPAES). The canonical ES replaces every individual in the population at each generation, whereas the steady-state ES replaces only the weakest individual with a newly created one. Because it replaces the whole population at once, the canonical ES can optimize the network faster than the steady-state ES. However, a parallel canonical ES requires synchronization at the end of each generation, which can increase CPU idle time. In contrast, a parallel steady-state ES does not require synchronization, but it may learn more slowly than the parallel canonical ES. We therefore propose VOPAES as an ES that combines the benefits of the parallel canonical ES and the parallel steady-state ES. The test results of this work demonstrate that the canonical ES can be implemented asynchronously by using versions. Moreover, by combining these benefits, VOPAES was able to reduce CPU idle time while maintaining the high optimization accuracy and speed of the parallel canonical ES. In conclusion, VOPAES achieved the fastest training speed among the implemented ES systems.
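The two replacement strategies contrasted above can be sketched as follows. This is a minimal, generic illustration under stated assumptions, not the thesis implementation: the toy objective `fitness`, the helpers `mutate`, `canonical_step` and `steady_state_step`, the number of parents and the mutation strength `sigma` are all hypothetical, and individuals are plain lists of floats rather than neural-network weights. The sketch also shows where the synchronization cost comes from: the canonical step must rank every individual before the next generation can start, whereas the steady-state step only needs one new child.

```python
import random

# Hypothetical toy objective standing in for an RL episode return:
# maximize -sum(x_i^2), whose optimum is the origin.
def fitness(x):
    return -sum(v * v for v in x)

def mutate(parent, sigma, rng):
    # Gaussian perturbation of every parameter, applied to a new list.
    return [v + rng.gauss(0.0, sigma) for v in parent]

def canonical_step(population, sigma, rng, n_parents=2):
    # Generational replacement: every individual is evaluated and ranked first
    # (a synchronization point when evaluations run in parallel), then the whole
    # population is replaced by mutated offspring of the best n_parents.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:n_parents]
    return [mutate(rng.choice(parents), sigma, rng) for _ in population]

def steady_state_step(population, sigma, rng):
    # Steady-state replacement: create one child from a randomly chosen parent
    # and overwrite only the weakest individual with it. No full-population
    # barrier is needed, but only one individual changes per step.
    # (Fitness is recomputed here for brevity; a real system would cache it.)
    child = mutate(rng.choice(population), sigma, rng)
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    population[worst] = child
    return population

if __name__ == "__main__":
    rng = random.Random(0)
    start = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(8)]

    canonical = [list(ind) for ind in start]
    for _ in range(50):  # 50 generations x 8 children = 400 children
        canonical = canonical_step(canonical, 0.3, rng)

    steady = [list(ind) for ind in start]
    for _ in range(400):  # 400 steps x 1 child = 400 children
        steady = steady_state_step(steady, 0.3, rng)

    print("canonical best:", max(fitness(ind) for ind in canonical))
    print("steady-state best:", max(fitness(ind) for ind in steady))
```

In a parallel setting, each worker could evaluate one child and apply a steady-state replacement independently, while a canonical generation has to wait for all workers to finish. The version mechanism that VOPAES uses to remove that wait is not modeled in this sketch.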
Subject/keywords
Reinforcement Learning, Parallelism, Evolution Strategy, Back-propagation, Asynchronous, Optimization