Evaluation of Conditional Recurrent Generative Adversarial Networks for Multivariate Time-Series Augmentation
Master's thesis
Engineering mathematics and computational science (MPENM), MSc
The successful application of any machine learning algorithm depends on a sufficiently large training dataset, preferably class-balanced and correctly labeled. However, in many applications, collecting and labeling data is time-consuming and expensive, and may require special security precautions if the data is sensitive. Therefore, different types of augmentation methods are commonly used. For time-series data, however, traditional augmentation methods such as rotation, translation, and flipping are not applicable, so other augmentation methods are of interest. In this thesis, the use of generative adversarial networks (GANs) as an augmentation method for univariate and multivariate time-series data is investigated. Both recurrent and conditional recurrent GANs are examined. Apart from constructing architectures for time-series generation, the thesis focuses on finding suitable methods for evaluating the quality of the generated data. To monitor the training progress and select a suitable generator model from which to simulate synthetic data, two distance-based kernel metrics are used: maximum mean discrepancy (MMD) and energy distance (ED). To evaluate the sample quality and diversity of the generated data, several experiments are performed in which a classifier is trained on real data and tested on synthetic data (TRTS), trained on synthetic data and tested on real data (TSTR), and lastly trained and tested on a mixture of real and synthetic data (TMTM). Furthermore, experiments are performed to examine the use of synthetic samples from conditional recurrent GANs to augment a real dataset. The results indicate that the GANs successfully generate highly realistic samples, both of simpler time-series and of more complex multivariate time-series. However, the synthetic time-series do not seem to aid a classifier to any large extent when added to real data, even when larger proportions of synthetic data are added. A possible explanation for this is that the synthetic data, although consisting of realistic samples, suffers from loss of in-class diversity and boundary distortion.
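As an illustration of the two monitoring metrics named in the abstract, the following is a minimal NumPy sketch of sample-based estimates of the squared MMD and the energy distance between a real and a synthetic set. It is not the implementation used in the thesis. The function names, the Gaussian (RBF) kernel, the median-heuristic bandwidth, and the flattening of each time-series into one row vector are all assumptions made for this example.

```python
import numpy as np


def _sq_dists(a, b):
    """Pairwise squared Euclidean distances between the rows of a and b."""
    d = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negative values caused by round-off


def mmd2_rbf(x, y, bandwidth=None):
    """Biased (V-statistic) estimate of the squared MMD with an RBF kernel.

    x, y: arrays of shape (n_samples, n_features); each time-series is
    assumed to be flattened into one row (an assumption of this sketch).
    """
    if bandwidth is None:
        # Median heuristic on the pooled sample (assumed choice of bandwidth).
        z = np.vstack([x, y])
        bandwidth = np.sqrt(0.5 * np.median(_sq_dists(z, z)))

    def k(a, b):
        return np.exp(-_sq_dists(a, b) / (2.0 * bandwidth ** 2))

    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()


def energy_distance(x, y):
    """V-statistic estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""

    def d(a, b):
        return np.sqrt(_sq_dists(a, b))

    return 2.0 * d(x, y).mean() - d(x, x).mean() - d(y, y).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 24))               # stand-in for real series
    synthetic = rng.normal(loc=1.0, size=(200, 24))  # stand-in for generated series
    print("MMD^2:", mmd2_rbf(real, synthetic))
    print("ED:   ", energy_distance(real, synthetic))
```

Under these assumptions, both quantities are close to zero when the two samples come from the same distribution and grow as the distributions move apart, which is what makes them usable for comparing generator checkpoints during training.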
deep learning, generative adversarial networks, generative models, multivariate time-series classification, maximum mean discrepancy, energy distance, covariate shift, boundary distortion