2017 Generative Adversarial Networks (GANs) Research Milestones

Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues, are considered by many to be the next big revolution in deep learning. Why? One reason is their usefulness for semi-supervised learning, where the vast majority of the data is unlabelled. GANs can also carry out image generation and related tasks such as converting sketches to images, converting satellite images to maps, and many others. They can generate realistic samples in a wide range of settings. For instance, given some text written in a particular handwriting as input, a generative model can produce more text in similar handwriting.

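What sets generative models apart from discriminative models is that a discriminative model only learns the conditional probability of a label given the data, whereas a generative model learns the (joint) probability distribution of the data itself, which lets it draw new, likely samples. In short, GANs go beyond discriminative models: they pair a discriminative network with a generative one that learns to produce such samples.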
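For reference, the original GAN paper (Goodfellow et al., 2014) frames this pairing as a two-player minimax game between a generator G, which maps noise z drawn from a prior p_z to samples, and a discriminator D, which outputs the probability that its input came from the real data distribution p_data:

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is pushed to assign high probability to real samples and low probability to generated ones, while the generator is pushed to fool it.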

Let’s explore some of the research papers that are contributing to further advancements in GANs.

CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

This paper introduces CycleGANs, a class of Generative Adversarial Networks that perform image-to-image translation: capturing the special characteristics of one image collection and figuring out how these characteristics could be translated into another image collection, all in the absence of any paired training examples. The CycleGAN method can be applied to a variety of tasks such as collection style transfer, object transfiguration, season transfer and photo enhancement.

Figure: CycleGAN architecture (Source: GitHub)

CycleGANs build on the advantages of the pix2pix architecture. The key advantage of the CycleGAN model is that it can be pointed at two discrete, unpaired collections of images. For example, one collection, say Group A, could consist of photos of landscapes in summer, whereas Group B would include photos of landscapes in winter. The CycleGAN model can learn to translate images between these two aesthetics without the paired, tightly correlated X/Y training images that pix2pix needs.

Figure source: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

CycleGANs learn these translations without explicit X/Y training pairs by introducing the idea of a full translation cycle: an image translated from collection X to collection Y and back again should match the original, and likewise for Y to X and back. Measuring this cycle consistency tells us how good the entire translation system is, and it improves both generators at the same time.

Figure source: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
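A minimal sketch of the cycle-consistency idea, assuming toy one-dimensional "images" (flat lists of pixel values). The functions `brighten`, `darken` and `darken_too_much` are hypothetical stand-ins for the two generators, not the paper's networks. The loss is the mean absolute (L1) reconstruction error after a round trip X to Y to X plus Y to X to Y; in the paper this term is added to the adversarial losses, which are omitted here.

```python
from typing import Callable, List

Image = List[float]  # toy 1-D "image": a flat list of pixel values


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(
    xs: List[Image],
    ys: List[Image],
    g_x_to_y: Callable[[Image], Image],
    f_y_to_x: Callable[[Image], Image],
) -> float:
    """Forward cycle x -> G(x) -> F(G(x)) plus backward cycle y -> F(y) -> G(F(y))."""
    forward = sum(l1(f_y_to_x(g_x_to_y(x)), x) for x in xs) / len(xs)
    backward = sum(l1(g_x_to_y(f_y_to_x(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical translators: "brighten" maps X to Y, "darken" maps Y back to X.
def brighten(img: Image) -> Image:
    return [p + 0.5 for p in img]


def darken(img: Image) -> Image:
    return [p - 0.5 for p in img]


def darken_too_much(img: Image) -> Image:
    return [p - 0.75 for p in img]


summer = [[0.0, 0.25], [0.5, 0.5]]
winter = [[0.5, 0.75], [1.0, 1.0]]
print(cycle_consistency_loss(summer, winter, brighten, darken))           # 0.0
print(cycle_consistency_loss(summer, winter, brighten, darken_too_much))  # 0.5
```

A consistent pair of translators reconstructs every image exactly and gets zero loss; a mismatched pair is penalized even though no summer/winter pairs were ever provided.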

Current applications of CycleGANs include image-to-image and video translation, for example animal transfiguration and turning portrait faces into doll faces. Further ahead, we could potentially see the approach applied to audio, text and other data, where it could help generate new data for training.

Although this method produces compelling results, it also has some limitations:

Geometric changes within an image are not handled well (for instance, the cat-to-dog transformation achieved little success). This may be caused by the generator architecture choices, which are tailored for good performance on appearance changes. Handling more varied and extreme transformations, especially geometric changes, therefore remains an important problem.
Failures caused by the distribution characteristics of the training datasets. For instance, in the horse-to-zebra transfiguration, the model got confused by an image of a person riding a horse, because it was trained on the wild horse and zebra synsets of ImageNet, which do not contain images of a person riding a horse or zebra.

These and other limitations are described in the research paper. To read more about CycleGANs in detail, see the paper itself.

Wasserstein GAN

This paper introduces Wasserstein GANs (WGANs) and shows how they overcome some of the drawbacks of the original GANs.

Although GANs have been remarkably successful at realistic image generation, training them is not easy: the process is slow and unstable. The WGAN paper shows empirically that WGANs largely address these training problems.

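Wasserstein distance, also known as Earth Mover’s (EM) distance, is a measure of distance between two probability distributions: intuitively, the minimum cost of moving probability mass to transform one distribution into the other. The basic idea in WGAN is to replace the loss function with one that still provides a useful, non-zero gradient when the generator distribution and the data distribution barely overlap. This is done by using the Wasserstein distance between the generator distribution and the data distribution.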
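To make the definition concrete: for two one-dimensional empirical distributions with the same number of equally weighted samples, the EM distance reduces to the mean absolute difference between the sorted samples. The sketch below (plain Python, illustrative only, with made-up sample values) uses that special case to show the intuition behind WGAN: as one distribution is shifted by theta, the EM distance grows smoothly as |theta|. By contrast, the Jensen-Shannon divergence implicitly used by the original GAN stays at the constant log 2 whenever the two distributions do not overlap, so it gives no hint of which direction to move.

```python
from typing import List


def wasserstein_1d(a: List[float], b: List[float]) -> float:
    """EM (Wasserstein-1) distance between two 1-D empirical distributions.

    Assumes both samples have the same size and equal weights, in which case
    the optimal transport plan simply matches the i-th smallest values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


real = [0.0, 1.0, 2.0]
for theta in (0.0, 0.5, 2.0, 4.0):
    fake = [x + theta for x in real]
    print(theta, wasserstein_1d(real, fake))
# Each line prints theta followed by an equal distance: 0.0 0.0, 0.5 0.5, 2.0 2.0, 4.0 4.0
```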

Training WGANs does not require maintaining a careful balance between training the discriminator and the generator, nor does it require a careful design of the network architecture. One of the most compelling practical benefits of WGANs is the ability to continuously estimate the EM distance by training the discriminator (called the critic in the paper) to optimality. Plotting these estimates gives learning curves that are useful for debugging and hyperparameter searches, and these curves correlate well with the observed sample quality. The paper also reports improved stability of the optimization process.
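In the paper's formulation, the critic f is trained to maximize the difference between its average score on real samples and its average score on generated samples; under a Lipschitz constraint, that difference estimates the EM distance (up to a constant scaling factor), and it is this estimate that is plotted as the learning curve. A minimal sketch of the two losses, assuming the critic's scores have already been computed (the score lists are made-up numbers, not results):

```python
from statistics import fmean
from typing import List


def em_estimate(real_scores: List[float], fake_scores: List[float]) -> float:
    """Critic's estimate of the EM distance; the value logged for learning curves."""
    return fmean(real_scores) - fmean(fake_scores)


def critic_loss(real_scores: List[float], fake_scores: List[float]) -> float:
    """Loss the critic minimizes: the negated EM-distance estimate."""
    return -em_estimate(real_scores, fake_scores)


def generator_loss(fake_scores: List[float]) -> float:
    """Loss the generator minimizes: push the critic's score on fakes up."""
    return -fmean(fake_scores)


# Hypothetical critic outputs on a batch of real and generated samples.
real_scores = [2.0, 1.5, 2.5]
fake_scores = [0.5, 0.0, 1.0]
print(em_estimate(real_scores, fake_scores))  # 1.5
print(critic_loss(real_scores, fake_scores))  # -1.5
print(generator_loss(fake_scores))            # -0.5
```

Unlike the original discriminator, the critic outputs an unbounded score rather than a probability, which is why no logarithms appear.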

Thus, Wasserstein GANs are an alternative to traditional GAN training with features such as:

Improved stability of learning
Elimination of problems like mode collapse
Meaningful learning curves useful for debugging and hyperparameter searches

Furthermore, the paper shows that the corresponding optimization problem is sound and provides extensive theoretical work highlighting its deep connections to other distances between distributions.

WGANs have also been applied to machine translation, to learn a mapping between the word embeddings of two languages when no parallel data between them is available. They have been used in this way for English-Russian and English-Chinese mappings.

Limitations of WGANs:

WGAN training can become unstable when a momentum-based optimizer or a high learning rate is used.
Weight clipping can lead to slow convergence, especially when the clipping window is too large.
Conversely, a clipping window that is too small can cause vanishing gradients.
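The original WGAN enforces the critic's Lipschitz constraint by clamping every critic weight into a window [-c, c] after each update (the paper's experiments use c = 0.01). The sketch below, with a hypothetical flat list standing in for the critic's parameters, shows what the clipping window does to the weights; the trade-offs listed above come from choosing c too large or too small.

```python
from typing import List


def clip_weights(weights: List[float], c: float) -> List[float]:
    """Clamp each critic weight into [-c, c], as done after every critic update."""
    if c <= 0:
        raise ValueError("clipping window c must be positive")
    return [max(-c, min(c, w)) for w in weights]


# Hypothetical critic parameters after an optimizer step.
weights = [0.3, -0.02, 0.005, -1.2]
print(clip_weights(weights, c=0.01))  # [0.01, -0.01, 0.005, -0.01]
print(clip_weights(weights, c=1.0))   # [0.3, -0.02, 0.005, -1.0]
```

With a small window most weights are pinned to the boundary, which restricts what the critic can express; with a large window the constraint barely binds.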

For a detailed understanding of WGANs, have a look at the research paper.

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that can learn disentangled representations in a completely unsupervised manner.

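In a traditional GAN, the learned representation is entangled, i.e., the factors of variation in the data are encoded in a complex way across the noise input rather than in separate, interpretable dimensions. A disentangled representation, in which individual latent dimensions correspond to meaningful factors, is easier to interpret and to use for downstream tasks. InfoGAN addresses this by maximizing the mutual information between a small subset of the latent variables and the generator's output.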
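In the paper's notation, the generator input is split into incompressible noise z and a latent code c, and the usual GAN value function V(D, G) is regularized with a variational lower bound L_I on the mutual information between c and the generated sample, computed with an auxiliary network Q(c|x):

```latex
\min_{G,Q} \max_{D} V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda \, L_I(G, Q)

L_I(G, Q) = \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c) \;\le\; I\big(c;\, G(z, c)\big)
```

Here λ is a weighting hyperparameter. Because H(c) is fixed by the chosen code distribution, maximizing L_I amounts to training Q to recover c from the generated sample.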

Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting in 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hairstyles, presence or absence of eyeglasses, and emotions on the CelebA face dataset.
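As an illustration of how the generator input is structured for the MNIST experiment, the sketch below builds one input vector from noise, a 10-way categorical code (intended to capture digit identity) and two continuous codes drawn uniformly from [-1, 1] (intended to capture style factors such as rotation and stroke width). The noise size of 62 and the standard-normal noise are assumptions chosen for illustration; check the paper's appendix for the exact configuration.

```python
import random
from typing import List


def sample_infogan_input(
    rng: random.Random,
    noise_dim: int = 62,      # assumed noise size, for illustration only
    n_categories: int = 10,   # categorical code: one slot per digit class
    n_continuous: int = 2,    # continuous codes, e.g. rotation and width
) -> List[float]:
    """Concatenate noise z with a one-hot categorical code and continuous codes."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # assumed Gaussian noise
    category = rng.randrange(n_categories)
    one_hot = [1.0 if i == category else 0.0 for i in range(n_categories)]
    continuous = [rng.uniform(-1.0, 1.0) for _ in range(n_continuous)]
    return z + one_hot + continuous


rng = random.Random(0)
vec = sample_infogan_input(rng)
print(len(vec))  # 74 = 62 noise + 10 categorical + 2 continuous
```

Only the code part (the last 12 entries here) takes part in the mutual-information term; the noise stays unconstrained.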

InfoGAN does not require any kind of supervision. According to the paper, the only other unsupervised method that learns disentangled representations is hossRBM, a higher-order extension of the spike-and-slab restricted Boltzmann machine, which disentangles emotion from identity on the Toronto Face Dataset. However, hossRBM can only disentangle discrete latent factors, and its computational cost grows exponentially in the number of factors.

InfoGAN, in contrast, can disentangle both discrete and continuous latent factors, scales to complicated datasets, and typically requires no more training time than a regular GAN.
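For a discrete (categorical) code, the mutual-information bound used by InfoGAN becomes something familiar: L_I is the code's entropy H(c) minus the cross-entropy between the sampled code and the auxiliary network's prediction Q(c|x). A minimal sketch, assuming a uniform categorical code and assuming Q's predicted class probabilities are already available (the probability vectors below are hypothetical):

```python
import math
from typing import List, Sequence


def mutual_info_lower_bound(
    codes: Sequence[int], q_probs: Sequence[List[float]], n_categories: int
) -> float:
    """Estimate L_I = E[log Q(c|x)] + H(c) for a uniform categorical code c.

    codes[i] is the category fed to the generator for sample i, and
    q_probs[i] is Q's predicted distribution over categories for that sample.
    """
    entropy_c = math.log(n_categories)  # H(c) for a uniform categorical prior
    avg_log_q = sum(math.log(p[c]) for c, p in zip(codes, q_probs)) / len(codes)
    return avg_log_q + entropy_c


k = 10
perfect = [[1.0 if i == c else 0.0 for i in range(k)] for c in (3, 7)]
uninformative = [[1.0 / k] * k for _ in range(2)]
print(mutual_info_lower_bound([3, 7], perfect, k))        # log(10), about 2.3026
print(mutual_info_lower_bound([3, 7], uninformative, k))  # approximately 0.0
```

The bound reaches its maximum H(c) when Q recovers the code perfectly from the generated sample and drops to zero when Q's prediction carries no information about it.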

The paper's experiments first compare InfoGAN with prior approaches on relatively clean datasets, and then show that InfoGAN can learn interpretable representations on complex datasets where no previous unsupervised approach is known to learn representations of comparable quality.

Thus, InfoGAN is completely unsupervised and learns interpretable, disentangled representations on challenging datasets. Additionally, it adds only negligible computational cost on top of a GAN and is easy to train. The core idea of using mutual information to induce representations could be applied to other methods such as the VAE (Variational AutoEncoder) in the future. Other future possibilities for InfoGAN include learning hierarchical latent representations, improving semi-supervised learning with better codes, and using InfoGAN as a high-dimensional data discovery tool.

To learn more about this research paper in detail, read the full paper.

Progressive Growing of GANs for Improved Quality, Stability, and Variation

This paper describes a new method for training Generative Adversarial Networks. The basic idea is to grow both the generator and the discriminator progressively: training starts at a low resolution, and new layers are added so that the model produces increasingly fine details as training progresses. This speeds up training and stabilizes it considerably, which in turn produces images of unprecedented quality. For instance, the authors create a higher-quality version of the CelebA dataset (CelebA-HQ) and generate images at output resolutions up to 1024×1024 pixels.
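The growth schedule can be pictured as a sequence of training phases. The sketch below assumes the setup of the paper's high-resolution experiments, which start at 4×4 and double the resolution up to 1024×1024, with every new resolution first faded in and then trained on its own. The phase names "fade-in" and "stabilize" are hypothetical labels, and how long each phase lasts is deliberately left out.

```python
from typing import List, Tuple


def growth_schedule(start: int = 4, final: int = 1024) -> List[Tuple[int, str]]:
    """List (resolution, phase) pairs for progressive growing.

    The first resolution is only trained ("stabilize"); each later resolution
    is first blended in ("fade-in") and then trained on its own ("stabilize").
    """
    if start <= 0 or final < start:
        raise ValueError("need 0 < start <= final")
    schedule = [(start, "stabilize")]
    res = start
    while res * 2 <= final:
        res *= 2
        schedule.append((res, "fade-in"))
        schedule.append((res, "stabilize"))
    return schedule


for resolution, phase in growth_schedule():
    print(f"{resolution}x{resolution}: {phase}")
```

With the defaults this yields 17 phases, ending with 1024x1024 in the "stabilize" phase.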

Figure source: https://arxiv.org/pdf/1710.10196.pdf

When new layers are added to the networks, they are faded in smoothly, which avoids sudden shocks to the already well-trained, lower-resolution layers. Progressive training also has other benefits:

The generation of smaller images is substantially more stable because there is less class information and there are fewer modes.
By increasing the resolution little by little, we are continually asking a much simpler question than the end goal of discovering a mapping from latent vectors to, e.g., 1024×1024 images.
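The smooth fade-in described above can be sketched as a weighted blend. While a new higher-resolution block is being added, the output of the previous stage is upsampled by pixel doubling (nearest-neighbour filtering, as in the paper) and mixed with the new block's output using a weight alpha that rises linearly from 0 to 1. The 2×2 and 4×4 grids below are hypothetical single-channel toy images, not the paper's network.

```python
from typing import List

Grid = List[List[float]]  # toy single-channel image


def upsample_nearest(img: Grid) -> Grid:
    """Double height and width by repeating each pixel (nearest neighbour)."""
    out: Grid = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(old_low_res: Grid, new_high_res: Grid, alpha: float) -> Grid:
    """Blend (1 - alpha) * upsampled old output + alpha * new block output."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    old_up = upsample_nearest(old_low_res)
    return [
        [(1.0 - alpha) * o + alpha * n for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_up, new_high_res)
    ]


old = [[0.0, 1.0], [1.0, 0.0]]        # output of the already trained 2x2 stage
new = [[0.5] * 4 for _ in range(4)]   # output of the freshly added 4x4 block
print(fade_in(old, new, alpha=0.0)[0])  # [0.0, 0.0, 1.0, 1.0]: old stage only
print(fade_in(old, new, alpha=1.0)[0])  # [0.5, 0.5, 0.5, 0.5]: new block only
```

At alpha = 0 the network behaves exactly like the smaller one, so adding the layer causes no sudden change; by alpha = 1 the new block has fully taken over.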

Progressive growing of GANs also reduces training time: most of the iterations are done at lower resolutions, and comparable result quality is often obtained 2 to 6 times faster, depending on the resolution of the final output.

Thus, training GANs progressively results in better quality, stability, and variation in images, and this may bring true photorealism within reach in the near future. The paper concludes by acknowledging that the method still has limitations: there is a long way to go toward semantic sensibility and toward understanding dataset-dependent constraints (such as certain objects being straight rather than curved). This leaves a lot to be desired from GANs, and there is also room for improvement in the micro-structure of the images.

For a thorough understanding of this research paper, read the full paper.