
[box type="note" align="" class="" width=""]We have come to you with another guest post by Indra den Bakker, an experienced deep learning engineer and a mentor on Udacity for many budding data scientists. Indra has also written one of our best selling titles, Python Deep Learning Cookbook, which covers solutions to various problems in modeling deep neural networks.[/box]

In 2014, we took a significant step in AI with the introduction of Generative Adversarial Networks – better known as GANs – by Ian Goodfellow and his co-authors. The real breakthrough of GANs didn't come until 2016; however, the original paper already included many novel ideas that would be exploited in the years to follow. By then, deep learning had already revolutionized many industries by achieving above-human performance on a range of tasks. Yet many critics argued that these deep learning models couldn't compete with human creativity. With the introduction of GANs, Ian Goodfellow showed that these critics could be wrong.

Figure 1: example of style transfer with deep learning

The idea behind GANs is to create new examples based on a training set – for example, new paintings or new handwritten digits. In GANs, two competing deep learning models are trained simultaneously. One model, called the generator, tries to generate new, realistic examples. The other model, called the discriminator, tries to classify whether an example originates from the training set or from the generator. In other words, the generator tries to mislead the discriminator by generating new examples. In the figure below we can see the general structure of GANs.

Figure 2: GAN structure with X as training examples and Z as noise input.

GANs are fundamentally different from other machine learning applications. The task of a GAN is unsupervised: we try to extract patterns and structure from data without additional information, so we don't have ground-truth labels. GANs shouldn't be confused with autoencoder networks. With autoencoders we know what the output should be: the same as the input. In the case of GANs, however, we try to create new examples that look like the training examples but are different. It's a new way of teaching an agent to learn complex tasks by imitating an "expert". If the generator is able to fool the discriminator, one could argue that the agent has mastered the task – think of the Turing test.

The best way to explain GANs is with images, and the resulting output can be fascinating. The most commonly used dataset for GANs is the popular MNIST dataset of handwritten digits, which has appeared in many deep learning papers, including the original Generative Adversarial Nets paper.

Figure 3: example of MNIST training images

Let's say we have a bunch of handwritten digits as input. We want our model to take these examples and create new handwritten digits – written in such a way that they look like human handwriting. Note that we don't care which digit the model creates, as long as it looks like one of the digits from 0 to 9.
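
To make this concrete, below is a minimal sketch of loading and preparing the MNIST training images. The choice of Keras (TensorFlow) is our own assumption – the article prescribes no framework – and the scaling to [-1, 1] anticipates the TanH output of the generator we define later.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

# Load the MNIST training images; the labels are not needed for a GAN
(X_train, _), (_, _) = mnist.load_data()

# Flatten the 28x28 images to vectors of 784 values and scale the
# pixel values from [0, 255] to [-1, 1], matching the TanH output
# of the generator defined further below
X_train = X_train.reshape(-1, 784).astype("float32")
X_train = (X_train - 127.5) / 127.5
```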

As you may suspect, there is a thin line between generating examples that are exact copies of the training set and generating genuinely new images. We need to make sure that the generator creates new images that follow the distribution of the training examples while still being slightly different. This is where the creativity comes in. In Figure 2, we showed that the generator takes noise – random values – as input. This randomness ensures that the generator creates different output each time.

Now that we know what we need and what we want to achieve, let's have a closer look at both model architectures, starting with the generator. We feed the generator random noise: a vector of 100 values randomly drawn between -1 and 1. Next, we stack multiple fully connected layers with the Leaky ReLU activation function. Our training images are grayscale and sized 28×28, so, flattened, we need 784 units in the final layer of our generator – the output of the generator should match the size of the training images. As the activation function of the final layer we use TanH, to make sure the resulting values are squeezed between -1 and 1. The final model architecture of our generator looks as follows:

Figure 4: model architecture of the generator
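
A minimal Keras sketch of such a generator could look as follows; the number and sizes of the hidden layers are illustrative choices, not the exact architecture from the figure:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

def build_generator(noise_dim=100):
    # Stack fully connected layers with Leaky ReLU activations;
    # the hidden layer sizes here are illustrative
    model = Sequential([
        Dense(256, input_dim=noise_dim),
        LeakyReLU(alpha=0.2),
        Dense(512),
        LeakyReLU(alpha=0.2),
        Dense(1024),
        LeakyReLU(alpha=0.2),
        # Final layer: 784 units (28x28 flattened) with TanH,
        # so the outputs lie between -1 and 1 like the scaled images
        Dense(784, activation="tanh"),
    ])
    return model
```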

Next, we define our discriminator model. Most common is to use a mirrored version of the generator, with 784 input values and, as the final layer, a fully connected layer with a single neuron and a sigmoid activation function for binary classification. Keep in mind that the generator and discriminator are trained at the same time. The model looks like this:

Figure 5: model architecture of the discriminator
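
A matching sketch of the discriminator, again with illustrative layer sizes; we leave compilation to the training snippet further below:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

def build_discriminator():
    # Mirrored version of the generator: 784 inputs, narrowing down
    # to a single sigmoid neuron for real-vs-generated classification
    model = Sequential([
        Dense(1024, input_dim=784),
        LeakyReLU(alpha=0.2),
        Dense(512),
        LeakyReLU(alpha=0.2),
        Dense(256),
        LeakyReLU(alpha=0.2),
        Dense(1, activation="sigmoid"),
    ])
    return model
```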

In general, generating new images is the harder task. Therefore, it can sometimes be beneficial to train the generator twice for each training step, while the discriminator is only trained once. Another option is to set the learning rate for the discriminator a bit smaller than the learning rate for the generator. Tracking the performance of GANs can be tricky: a lower loss doesn't always mean better output. That's why it's a good idea to output the generated images during the training process. In the following figure we can see the digits generated by a GAN after 20 epochs; a sketch of such a training loop follows the figure.

Figure 6: example output of generated MNIST images
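
Below is a minimal sketch of such an alternating training loop, reusing the models and data from the snippets above. The combined generator-discriminator model, the learning rates, the batch size, and the number of steps are all illustrative assumptions, not the author's exact settings:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

noise_dim, batch_size = 100, 128

# A slightly smaller learning rate for the discriminator, as discussed above
discriminator = build_discriminator()
discriminator.compile(optimizer=Adam(learning_rate=1e-4),
                      loss="binary_crossentropy")

generator = build_generator(noise_dim)

# Combined model: noise -> generator -> discriminator. Freezing the
# discriminator here means only the generator is updated through it.
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer=Adam(learning_rate=2e-4), loss="binary_crossentropy")

for step in range(10000):
    # Train the discriminator once on a half-real, half-generated batch
    real = X_train[np.random.randint(0, X_train.shape[0], batch_size)]
    noise = np.random.uniform(-1, 1, (batch_size, noise_dim))
    fake = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))

    # Train the generator twice per step, labelling its output as "real"
    for _ in range(2):
        noise = np.random.uniform(-1, 1, (batch_size, noise_dim))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

    if step % 1000 == 0:
        print(f"step {step}: d_loss={d_loss_real + d_loss_fake:.3f}, "
              f"g_loss={g_loss:.3f}")
        # Also plot a few generated digits here (e.g. with matplotlib),
        # since the losses alone don't tell you how good the samples look
```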

As stated in the introduction, GANs didn't get much traction until 2016. They were mostly unstable and hard to train: small adjustments to the model or the training parameters led to unsatisfying results. Advancements in model architecture and other improvements fixed some of these limitations and unlocked the real potential of GANs. An important improvement was introduced by Deep Convolutional GANs (DCGANs). DCGAN is a network architecture in which both the discriminator and the generator are fully convolutional. The output tends to be more stable, also for datasets with higher translation invariance, like the Fashion MNIST dataset.

Figure 7: example of Fashion MNIST images generated by a Deep Convolutional Generative Adversarial Network (DCGAN)
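
As a rough illustration, a DCGAN-style generator for 28×28 images replaces the fully connected stack with transposed convolutions. The configuration below is an assumption for illustration, not the reference DCGAN architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Reshape, Conv2DTranspose,
                                     BatchNormalization, LeakyReLU)

def build_dcgan_generator(noise_dim=100):
    # Project the noise to a small spatial feature map, then upsample
    # with strided transposed convolutions instead of dense layers
    model = Sequential([
        Dense(7 * 7 * 128, input_dim=noise_dim),
        Reshape((7, 7, 128)),
        BatchNormalization(),
        LeakyReLU(alpha=0.2),
        Conv2DTranspose(64, kernel_size=4, strides=2, padding="same"),  # 14x14
        BatchNormalization(),
        LeakyReLU(alpha=0.2),
        Conv2DTranspose(1, kernel_size=4, strides=2, padding="same",
                        activation="tanh"),                             # 28x28
    ])
    return model
```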

There is so much more to discover with GANs, and there is huge potential still to be unlocked. According to Yann LeCun – one of the fathers of deep learning – GANs are the most important advancement in machine learning in the last 20 years. GANs can be used for many different applications, ranging from 3D face generation to upscaling the resolution of images and text-to-image synthesis. GANs might be the stepping stone we have been waiting for to add creativity to machines.

[author title="Author's Bio"]Indra den Bakker is an experienced deep learning engineer and mentor on Udacity. He is the founder of 23insights, a machine learning start-up and part of NVIDIA's Inception program, building solutions that transform the world's most important industries. For Udacity, he mentors students pursuing a Nanodegree in deep learning and related fields, and he is also responsible for reviewing student projects. Indra has a background in computational intelligence and worked for several years as a data scientist for IPG Mediabrands and Screen6 before founding 23insights.[/author]

