Generative Adversarial Networks (GANs): The Next Milestone in Deep Learning

With the rise in popularity of deep learning as a concept and a paradigm, neural networks have captured the interest of machine learning enthusiasts and developers alike. Loosely inspired by the human brain, they power predictions, image recognition, text recognition, and much more. But can these neural networks do more than make predictions? Can they generate new data by learning from a training dataset? Generative Adversarial Networks (GANs) are here to answer these questions.

So, what are GANs all about?

Unlike traditional neural networks, which are trained on labeled examples, Generative Adversarial Networks follow an unsupervised approach to machine learning. When a neural network is taught to identify a bird, it is fed a huge number of images, including birds, as training data, and each picture must be labeled before it can be used to train the model. This labeling is both costly and time-consuming. So, how can you train a neural network with less labeled data? This is where GANs help: they learn directly from the raw training data, greatly reducing the effort needed to prepare it, with no labeling required.

The architecture of a GAN consists of two networks: a generator (G), which produces fake images or text, and a discriminator (D), which tries to tell real samples from fake ones. The discriminator is shown both samples from the training data and samples produced by the generator, and it learns to distinguish between them. The two networks are trained together with opposing goals: the generator tries to make its output indistinguishable from real data, while the discriminator tries to catch the fakes. This competition is what the word "adversarial" refers to.


Source: Learning Generative Adversarial Networks
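The competing goals of the generator and the discriminator can be written as a two-player minimax game. The formula below is the value function from Goodfellow et al.'s original GAN paper, where p_data is the distribution of the training data and p_z is the noise distribution (for example, a standard Gaussian) that the generator turns into samples:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator raises V by pushing D(x) toward 1 on real data and D(G(z)) toward 0 on fakes; the generator lowers V by pushing D(G(z)) toward 1. For a fixed generator whose samples follow a distribution p_g, the paper shows that the best possible discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), which equals 1/2 everywhere once p_g = p_data.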

GANs in action

GANs were introduced in 2014 by Ian Goodfellow and his colleagues at the Université de Montréal; Goodfellow later became a researcher at Google Brain. He compares the generator and the discriminator to a counterfeiter and a police officer. “You can think of this being like a competition between counterfeiters and the police,” Goodfellow said. “Counterfeiters want to make fake money and have it look real, and the police want to look at any particular bill and determine if it’s fake.”

The discriminator and the generator are trained together, usually in alternating steps, and this competition is what makes the GAN architecture powerful. Here is how a GAN model is trained:

Specify the problem statement and the type of manipulation the GAN model is expected to carry out.
Collect data based on the problem statement. For instance, image manipulation requires collecting a large number of images to feed in.
Feed the discriminator two kinds of images: real ones from the training set and fake ones produced by the generator.
Train the discriminator to return 1 (real) for the training images and 0 (fake) for the generated ones; in practice it outputs a probability between 0 and 1.
Train the generator to fool the discriminator, so that the discriminator returns 1 for each generated image.
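The steps above can be sketched end to end on a deliberately tiny problem. This is a toy illustration under loud assumptions, not a practical GAN: the "images" are single numbers drawn from a Gaussian with mean 4, the generator is the linear map G(z) = a*z + b, the discriminator is a logistic regression D(x) = sigmoid(w*x + c), and the gradients are worked out by hand so that only the Python standard library is needed. The function name `train_toy_gan` and all hyperparameters are illustrative choices.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=20, batch=64, lr=0.05, real_mean=4.0, real_std=1.0, seed=0):
    """Hypothetical toy 1-D GAN: G(z) = a*z + b, D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters
    for _ in range(steps):
        real = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0
        # by minimising -log D(x_real) - log(1 - D(x_fake)).
        gw = gc = 0.0
        for xr, xf in zip(real, fake):
            sr, sf = sigmoid(w * xr + c), sigmoid(w * xf + c)
            gw += -(1.0 - sr) * xr + sf * xf
            gc += -(1.0 - sr) + sf
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: push D(G(z)) toward 1 by minimising -log D(G(z)),
        # using the discriminator that was just updated.
        ga = gb = 0.0
        for zi in z:
            sf = sigmoid(w * (a * zi + b) + c)
            dx = -(1.0 - sf) * w   # derivative of -log D(x) with respect to x
            ga += dx * zi          # chain rule: dx/da = z
            gb += dx               # chain rule: dx/db = 1
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c

a, b, w, c = train_toy_gan()
```

With the default settings, the discriminator weight w turns positive (it learns that larger values look real) and the generator offset b climbs from 0 toward the real mean of 4. Run for many more steps, a toy this simple can overshoot and oscillate rather than settle, a small preview of the training difficulties discussed later.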

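At the beginning of training, the generator's output is easy to spot, so the discriminator loss (its error in telling real data from fake data) is low while the generator loss is high. As training advances, the generator loss tends to decrease and the discriminator loss to increase. This means the generator is now producing increasingly realistic images. In the ideal end state, the discriminator can do no better than guessing and outputs 0.5 for every input.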
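To make this loss behaviour concrete, here is a small sketch that assumes the standard binary cross-entropy losses: the discriminator scores real samples as 1 and fakes as 0, and the generator uses the common "non-saturating" loss -log D(G(z)). Other GAN variants define their losses differently, and the outputs 0.95 and 0.05 are made-up values standing in for a confident early-training discriminator.

```python
import math

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy: real sample labeled 1, generated sample labeled 0
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: the generator wants D(G(z)) close to 1
    return -math.log(d_fake)

# Early in training: D confidently separates real (0.95) from fake (0.05)
print(round(discriminator_loss(0.95, 0.05), 3), round(generator_loss(0.05), 3))  # 0.103 2.996
# Ideal end state: D outputs 0.5 for everything
print(round(discriminator_loss(0.5, 0.5), 3), round(generator_loss(0.5), 3))    # 1.386 0.693
```

In the ideal case the losses drift toward 2 ln 2 and ln 2; real training curves are noisy and rarely settle there exactly.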

Real-world applications of GANs

The most basic application of GANs is generating photo-realistic images, but GANs can do more. Some of the areas where GANs are chiefly used include:

Image synthesis

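Image synthesis is one of the primary use cases of GANs. Here, multilayer perceptron (MLP) models, as in the original GAN formulation, can be used for both the generator and the discriminator to generate photo-realistic images based on a training dataset of images; for images, convolutional variants such as DCGANs are now more common.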
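A forward-pass-only sketch of the multilayer perceptron setup, in plain Python: the generator maps a random noise vector to a flattened image, and the discriminator maps an image to a probability that it is real. The layer sizes, the tanh output (a common choice when training images are scaled to [-1, 1]) and the random, untrained weights are all assumptions; a real model would be trained and built with a deep learning framework.

```python
import math
import random

def dense(x, weights, biases, activation):
    # Fully connected layer: activation(W x + b), one output per weight row
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases (untrained)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def relu(t):
    return max(0.0, t)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mlp_generator(z, layers):
    # Noise vector -> hidden layer -> flattened image with pixels in [-1, 1]
    (w1, b1), (w2, b2) = layers
    return dense(dense(z, w1, b1, relu), w2, b2, math.tanh)

def mlp_discriminator(image, layers):
    # Flattened image -> hidden layer -> single probability that the image is real
    (w1, b1), (w2, b2) = layers
    return dense(dense(image, w1, b1, relu), w2, b2, sigmoid)[0]

rng = random.Random(0)
noise_dim, hidden, pixels = 16, 32, 8 * 8   # an 8x8 grayscale "image"
gen_layers = [init_layer(rng, noise_dim, hidden), init_layer(rng, hidden, pixels)]
disc_layers = [init_layer(rng, pixels, hidden), init_layer(rng, hidden, 1)]

z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
fake = mlp_generator(z, gen_layers)
score = mlp_discriminator(fake, disc_layers)
```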

Text-to-image synthesis

Generative Adversarial Networks can also be used for text-to-image synthesis, for example, generating a photo-realistic image from a caption. To do this, a dataset of images paired with their captions is used as training data. The captions are first encoded with a hybrid character-level convolutional-recurrent neural network (char-CNN-RNN), which learns a joint representation of text and images in a multimodal space. Both the generator and the discriminator are then trained on this encoded data, with the caption encoding given to each of them as input.
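One common way to condition both networks on a caption is to concatenate the text embedding onto their inputs. The sketch below assumes this concatenation scheme; the embedding is random stand-in data rather than the output of a real char-CNN-RNN encoder, and the function names and vector sizes are hypothetical.

```python
import random

def generator_input(noise, text_embedding):
    # The generator sees random noise plus the encoded caption: the caption
    # fixes what to draw, the noise varies how it is drawn.
    return list(noise) + list(text_embedding)

def discriminator_input(image_features, text_embedding):
    # The discriminator also sees the caption, so it can reject images that
    # look realistic but do not match the text.
    return list(image_features) + list(text_embedding)

rng = random.Random(0)
caption_embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for an encoder output
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
image_features = [0.0] * 32                                     # hypothetical image features

g_in = generator_input(noise, caption_embedding)                # 16 + 8 = 24 values
d_in = discriminator_input(image_features, caption_embedding)   # 32 + 8 = 40 values
```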

Image inpainting

Images that have missing parts or too much noise are given as input to the generator, which produces a near-realistic, completed image. For instance, DCGANs (Deep Convolutional GANs) implemented with the TensorFlow framework can generate a complete image from a broken one. DCGANs are GANs whose generator and discriminator are convolutional neural networks (CNNs), built following architectural guidelines that make GAN training more stable.
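GAN-based inpainting typically needs two ingredients: a measure of how well a generated image agrees with the pixels that survived, and a way to paste generated content into the holes. The sketch below shows both on a flattened four-pixel "image", assuming a binary mask (1 for known pixels, 0 for missing ones). One published approach searches the generator's input noise to minimise a loss like `contextual_loss` and then composites the result with something like `blend_inpainting`; both names are illustrative, not a TensorFlow API.

```python
def contextual_loss(damaged, generated, mask):
    # Distance between the generated and damaged images, measured only on
    # known pixels (mask == 1); missing pixels are ignored.
    return sum(m * abs(g - d) for d, g, m in zip(damaged, generated, mask))

def blend_inpainting(damaged, generated, mask):
    # Keep every known pixel and take missing pixels from the generator output.
    return [m * d + (1 - m) * g for d, g, m in zip(damaged, generated, mask)]

damaged = [0.2, 0.0, 0.0, 0.9]    # pixels 1 and 2 are missing (stored as 0.0)
generated = [0.1, 0.4, 0.5, 0.8]  # hypothetical generator output
mask = [1, 0, 0, 1]               # 1 = known pixel, 0 = missing pixel

print(round(contextual_loss(damaged, generated, mask), 3))  # 0.2
print(blend_inpainting(damaged, generated, mask))           # [0.2, 0.4, 0.5, 0.9]
```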

Video generation

Static images can be transformed into short scenes with plausible motion using GANs. These GANs learn scene dynamics in order to add motion to static images. The videos they generate are synthesized illusions, not recorded footage.

Drug discovery

Beyond text and image manipulation, Insilico Medicine uses GANs for AI-driven drug discovery. Here, the generator is trained to propose candidate drugs, including for diseases that were previously incurable, while the discriminator judges whether the proposed candidates resemble the real compounds in the training data. Whether a candidate actually cures a disease still has to be established experimentally.

Challenges in training a GAN

In GANs, two models compete against each other, and keeping that competition balanced is what makes them difficult to train. Here are some challenges faced while training GANs:

Fair training: While training both models, care must be taken that the discriminator does not overpower the generator. If it does, the discriminator rejects every generated sample with high confidence, the generator receives almost no useful feedback, and it fails to train effectively. On the other hand, if the discriminator is too lenient, it lets poor-quality, unrealistic content pass as real.

Counting and perspective: GANs can fail to understand how many objects of a kind an image should contain and what their dimensions are. This usually occurs during the initial learning phase. For instance, a GAN may output a face with more than two eyes, which does not occur in the real world, or present a 3D scene as if it were flat, because it cannot differentiate between the two.
Holistic structure: GANs can fail to capture the overall structure of an object and generate images that contradict how things look in reality, for instance a cat with an elongated body, or a cow standing on its hind legs.
Mode collapse: Real-world data has complex, multimodal distributions, in which the data falls into several distinct, concentrated sub-groups (modes). Mode collapse occurs when the generator learns to produce samples from only one or a few of these sub-groups, so its output lacks variety and fails to represent the full dataset.
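Mode collapse is easiest to see on a toy dataset whose sub-groups are known in advance. This sketch assumes one-dimensional data with three known mode centres and a hypothetical distance threshold, and counts how many modes a batch of generated samples actually reaches; the sample values are made up for illustration.

```python
def modes_covered(samples, mode_centers, radius):
    # Count how many known modes have at least one sample within `radius`.
    covered = set()
    for x in samples:
        nearest = min(range(len(mode_centers)), key=lambda i: abs(x - mode_centers[i]))
        if abs(x - mode_centers[nearest]) <= radius:
            covered.add(nearest)
    return len(covered)

centers = [-4.0, 0.0, 4.0]                     # three sub-groups in the real data
healthy = [-4.1, -3.8, 0.2, -0.1, 3.9, 4.2]    # samples spread over all modes
collapsed = [3.9, 4.1, 4.0, 3.8, 4.2, 4.05]    # every sample from one mode

print(modes_covered(healthy, centers, 1.0), modes_covered(collapsed, centers, 1.0))  # 3 1
```

On real data the modes are rarely known, so this kind of check is mostly useful on synthetic benchmarks such as mixtures of Gaussians.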

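To tackle these and other challenges that arise while training GANs, researchers have come up with variants such as DCGANs (Deep Convolutional GANs) and Wasserstein GANs (WGANs), which aim to make training fairer and more stable and to improve output quality, while AdaGANs were proposed to mitigate the mode collapse problem. Other variants, such as CycleGANs, extend GANs to new tasks like translating images between domains without paired training examples.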
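For a flavour of how a Wasserstein GAN changes the training objective, here is a sketch of the losses from the original WGAN paper, assuming the critic (WGAN's name for the discriminator) outputs unbounded real-valued scores rather than probabilities. The weight-clipping step, with a clip value of 0.01, is the original WGAN's simple way of keeping the critic well-behaved; later variants replace it with a gradient penalty. The function names and score values are illustrative.

```python
def critic_loss(critic_real, critic_fake):
    # The critic wants high scores for real samples and low scores for fakes,
    # so it minimises mean(fake scores) - mean(real scores).
    return sum(critic_fake) / len(critic_fake) - sum(critic_real) / len(critic_real)

def wgan_generator_loss(critic_fake):
    # The generator wants the critic to score its samples highly.
    return -sum(critic_fake) / len(critic_fake)

def clip_weights(weights, clip=0.01):
    # Original WGAN: after each critic update, clamp every weight to
    # [-clip, clip] to (roughly) enforce a Lipschitz constraint.
    return [max(-clip, min(clip, w)) for w in weights]

print(critic_loss([2.0, 1.0], [0.5, 1.5]))   # -0.5
print(wgan_generator_loss([0.5, 1.5]))       # -1.0
print(clip_weights([0.5, -0.003, -0.2]))     # [0.01, -0.003, -0.01]
```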

Conclusion

Although GANs are not yet as widely adopted as one might imagine, there’s no doubt that they could change the way unsupervised machine learning is used today. It is not far-fetched to think that they could find practical applications not just in image or text processing, but also in domains such as cryptography and cybersecurity. Developing newer GAN models with improved accuracy and shorter training times is the key here, and it is something surely worth keeping an eye on.