
Note: This article is an excerpt from the book Learning Generative Adversarial Networks by Kuntal Ganguly, which gives complete coverage of generative adversarial networks.

The article highlights some of the common challenges that a developer might face while using GAN models.

Common challenges faced while working with GAN models

Training a GAN essentially pits two networks, the generator G(z) and the discriminator D(z), against each other in a race to reach an optimum, more specifically a Nash equilibrium. As per Wikipedia, a Nash equilibrium is (in economics and game theory) a stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged.

1. Setup failure and bad initialization

If you think about it, this is exactly what a GAN is trying to do: the generator and discriminator reach a state where neither can improve further while the other is kept unchanged. Gradient descent, however, simply takes a step in a direction that reduces the loss defined on the problem; we are by no means enforcing that the networks reach a Nash equilibrium in a GAN, which has a non-convex objective with continuous, high-dimensional parameters. The networks take successive steps to minimize this non-convex objective and can end up in an oscillating process rather than decreasing the underlying true objective.
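To make this concrete, here is a minimal sketch of that alternating gradient-descent setup. It uses PyTorch with toy stand-in networks and placeholder data (our own assumptions, not the book's code); each network takes a step against the shared objective while the other is effectively held fixed:

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumptions, not the book's models): replace with your own
# generator, discriminator, and real data loader.
latent_dim = 100
generator = nn.Sequential(nn.Linear(latent_dim, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())
real_loader = [torch.randn(64, 784) for _ in range(10)]   # placeholder "real" batches

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real_images in real_loader:
    batch = real_images.size(0)
    noise = torch.randn(batch, latent_dim)

    # Discriminator step: push D(real) towards 1 and D(G(z)) towards 0,
    # with the generator held fixed for this step (hence .detach()).
    opt_d.zero_grad()
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(generator(noise).detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: push D(G(z)) towards 1, with the discriminator's
    # weights left untouched in this step.
    opt_g.zero_grad()
    g_loss = bce(discriminator(generator(noise)), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```

Each iteration alternates one discriminator update and one generator update; nothing in this procedure guarantees convergence to an equilibrium, which is exactly why the oscillations described above can occur.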

In most cases, when your discriminator attains a loss very close to zero, you can tell right away that something is wrong with your model. The biggest pain point, however, is figuring out what that is.

Another practical technique used during the training of GANs is to purposefully make one of the networks stall or learn more slowly, so that the other network can catch up. In most scenarios it is the generator that lags behind, so we usually let the discriminator wait. This might be fine to some extent, but remember that for the generator to get better it requires a good discriminator, and vice versa. Ideally, both networks should learn at a rate where each keeps getting better over time. The ideal minimum loss for the discriminator is close to 0.5; this is where the generated images are indistinguishable from the real images from the discriminator's perspective.
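One simple way to implement this kind of balancing, sketched here on top of the toy loop above (the threshold value is an arbitrary assumption, not a recommendation from the book), is to skip the discriminator update whenever its loss is already very low, giving the generator time to catch up:

```python
d_loss_threshold = 0.3   # arbitrary cut-off; tune for your own loss scale

for real_images in real_loader:
    batch = real_images.size(0)
    noise = torch.randn(batch, latent_dim)

    # Always measure the discriminator loss, but only update its weights
    # when it is not already far ahead of the generator.
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(generator(noise).detach()), torch.zeros(batch, 1))
    if d_loss.item() > d_loss_threshold:
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

    # The generator is updated on every iteration so it can catch up.
    opt_g.zero_grad()
    g_loss = bce(discriminator(generator(noise)), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
```

Whether a fixed threshold, a ratio of update steps, or different learning rates works best depends on the problem; the point is only that the two networks' progress needs to stay roughly matched.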

2. Mode collapse

One of the main failure modes in training a generative adversarial network is called mode collapse, or sometimes the Helvetica scenario. The basic idea is that the generator can accidentally start to produce several copies of exactly the same image. The reason is related to the game theory setup: we can think of training a GAN as first maximizing with respect to the discriminator and then minimizing with respect to the generator. If we fully maximize with respect to the discriminator before we start to minimize with respect to the generator, everything works out fine. But if we go the other way around, minimizing with respect to the generator and then maximizing with respect to the discriminator, everything breaks: holding the discriminator constant, it describes a single region in space as the point most likely to be real rather than fake, and the generator then maps all noise input values to that same most-likely-to-be-real point.
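While no fix is shown here, a crude but practical way to notice mode collapse early is to track how diverse each generated batch is: if the mean pairwise distance between samples collapses towards zero even though the noise inputs are random, the generator is probably mapping many different z vectors to the same output. The helper below is only an illustrative sketch (reusing the toy generator and latent_dim from the earlier snippet), not something from the book:

```python
import torch

def batch_diversity(samples: torch.Tensor) -> float:
    """Mean pairwise L2 distance between flattened generated samples.

    A value that collapses towards zero over training is a warning sign that
    many different noise vectors are being mapped to near-identical images.
    """
    flat = samples.reshape(samples.size(0), -1)
    distances = torch.cdist(flat, flat, p=2)          # (batch, batch) distance matrix
    n = flat.size(0)
    return (distances.sum() / (n * (n - 1))).item()   # average of off-diagonal entries

# Example: check the diversity of a freshly generated batch.
with torch.no_grad():
    noise = torch.randn(64, latent_dim)
    diversity = batch_diversity(generator(noise))
print(f"generated-batch diversity: {diversity:.4f}")
```

Logging this value alongside the generator and discriminator losses makes a sudden drop in sample diversity much easier to spot than by eyeballing generated images alone.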

3. Problems with counting

GANs can sometimes be far-sighted and fail to differentiate the number of particular objects that should occur at a location. In the book's example images, the generated faces have more eyes in the head than are originally present.

4. Problems with perspective

GANs are sometimes not capable of differentiating between front and back views, and hence fail to adapt well to 3D objects when generating 2D representations of them, as the book's examples show.

5. Problems with global structures

Similar to the problems with perspective, GANs do not understand holistic structure. For example, one of the book's generated images shows a "quadruple cow", that is, a cow standing on its hind legs and simultaneously on all four legs. That is definitely unrealistic and not possible in real life!

Training GAN models is demanding in practice, and some common challenges come up during execution. The major ones are setup failure and, above all, mode collapse (sometimes called the Helvetica scenario), along with problems with counting, perspective, and global structure.

The above are some of the major issues faced while training a GAN model. To read more about their solutions with real-world examples, check out the book Learning Generative Adversarial Networks.
