Evaluation of Conditional Recurrent Generative Adversarial Networks for Multivariate Time-Series Augmentation

Type

Master's thesis

Abstract

The successful application of any machine learning algorithm depends on a sufficiently large training dataset, preferably class-balanced and correctly labeled. However, in many applications, the collection and labeling of data is time-consuming, expensive, and might require special security precautions if the data is of a sensitive nature. Therefore, different types of augmentation methods are commonly used. For time-series data, traditional augmentation methods such as rotation, translation, and flipping are not applicable, so other augmentation methods are of interest. In this thesis, the use of generative adversarial networks (GANs) as an augmentation method for univariate and multivariate time-series data is investigated. Both recurrent and conditional recurrent GANs are examined. Apart from constructing architectures for time-series generation, the thesis focuses on finding suitable methods for evaluating the quality of the generated data. To monitor the training progress and select a suitable generator model from which to simulate synthetic data, two distance-based kernel metrics are used: maximum mean discrepancy (MMD) and energy distance (ED). To evaluate the sample quality and diversity of the generated data, several experiments are performed in which a classifier is trained on real data and tested on synthetic data (TRTS), trained on synthetic data and tested on real data (TSTR), and lastly trained and tested on a mixture of real and synthetic data (TMTM). Furthermore, experiments are performed that examine the use of synthetic samples from conditional recurrent GANs to augment a real dataset. The results indicate that the GANs successfully generate highly realistic samples, both of simpler time-series and of more complex multivariate time-series. However, the synthetic time-series do not seem to aid a classifier to any large extent when added to real data, even when larger proportions of synthetic data are added. A possible explanation for this is that the synthetic data, although consisting of realistic samples, suffers from loss of in-class diversity and boundary distortion.
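
As a rough illustration of the two metrics named in the abstract, the sketch below estimates the squared MMD and the energy distance between a set of real and a set of synthetic series. It is not the thesis implementation: the Gaussian RBF kernel, the `bandwidth` value, the biased (V-statistic) estimators, the flattening of each series into a plain vector, and the `toy_series` helper are all assumptions made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_pairwise(xs, ys, fn):
    """Average of fn(x, y) over all pairs (x, y) drawn from xs and ys."""
    return sum(fn(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(real, synth, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian RBF kernel (assumed kernel)."""
    def k(a, b):
        return math.exp(-sq_dist(a, b) / (2.0 * bandwidth ** 2))
    return (mean_pairwise(real, real, k)
            + mean_pairwise(synth, synth, k)
            - 2.0 * mean_pairwise(real, synth, k))


def energy_distance(real, synth):
    """Biased estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'| (Euclidean norm)."""
    def d(a, b):
        return math.sqrt(sq_dist(a, b))
    return (2.0 * mean_pairwise(real, synth, d)
            - mean_pairwise(real, real, d)
            - mean_pairwise(synth, synth, d))


def toy_series(n, length, shift, rng):
    """Hypothetical toy data: n noisy sine waves, each stored as a flat vector."""
    return [[math.sin(0.3 * t) + shift + rng.gauss(0.0, 0.1) for t in range(length)]
            for _ in range(n)]


rng = random.Random(0)
real = toy_series(30, 20, 0.0, rng)
close = toy_series(30, 20, 0.0, rng)  # drawn from the same distribution as `real`
far = toy_series(30, 20, 1.0, rng)    # drawn from a shifted distribution

print("MMD^2 close:", mmd_squared(real, close), "far:", mmd_squared(real, far))
print("ED    close:", energy_distance(real, close), "far:", energy_distance(real, far))
```

Both estimates should be small when the synthetic set resembles the real set and larger when it does not, which is what makes such metrics usable for monitoring training and choosing a generator checkpoint. A multivariate series would first have to be reshaped into a single vector per sample for this sketch to apply.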

Keywords

deep learning, generative adversarial networks, generative models, multivariate time-series classification, maximum mean discrepancy, energy distance, covariate shift, boundary distortion
