In machine learning, a generative model is one that captures the observable data distribution. The objective of deep neural generative models is to disentangle different factors of variation in data and be able to generate new or similar-looking samples of the data. For example, an ideal generative model on face images disentangles all different factors of variation such as illumination, pose, gender, skin color, and so on, and is also able to generate a new face by the combination of those factors in a very non-linear way. Figure 1 shows a trained generative model that has learned different factors, including pose and the degree of smiling. On the x-axis, as we go to the right, the pose changes and on y-axis as we move upwards, smiles turn to frowns. Usually these factors are orthogonal to one another, meaning that changing one while keeping the others fixed leads to a single change in data space; e.g. in the first row of Figure 1, only the pose changes with no change in the degree of smiling. The figure is adapted from here.
Based on the assumption that these underlying factors of variation have a very simple distribution (unlike the data itself), to generate a new face, we can simply sample a random number from the assumed simple distribution (such as a Gaussian). In other words, if there are k different factors, we randomly sample from a k-dimensional Gaussian distribution (aka noise).
In this post, we will take a look at one of the recent models in the area of deep learning and generative models, called generative adversarial network. This model can be seen as a game between two agents: the Generator and the Discriminator. The generator generates images from noise and the discriminator discriminates between real images and those images which are generated by the generator. The objective is then to train the model such that while the discriminator tries to distinguish generated images from real images, the generator tries to fool the discriminator.
To train the model, we need to define a cost. In the case of GAN, the errors made by the discriminator are considered as the cost. Consequently, the objective of the discriminator is to minimize the cost, while the objective for the generator is to fool the discriminator by maximizing the cost. A graphical illustration of the model is shown in Figure 2.
Formally, we define the discriminator as a binary classiffier D : Rm ! f0; 1g and the generator as the mapping G : Rk ! Rm in which k is the dimension of the latent space that represents all of the factors of variation. Denoting the data by x and a point in latent space by z, the model can be trained by playing the following minmax game:
Note that the rst term encourages the discriminator to discriminate between generated images and real ones, while the second term encourages the generator to come up with images that would fool the discriminator. In practice, the log in the second term can be saturated, which would hurt the row of the gradient. As a result, the cost may be reformulated equivalently as:
At the time of generation, we can sample from a simple k-dimensional Gaussian distribution with zero mean and unit variance and pass it onto the generator. Among different models that can be used as the discriminator and generator, we use deep neural networks with parameters D and G for the discriminator and generator, respectively. Since the training boils down to updating the parameters using the backpropagation algorithm, the update rule is defined as follows:
If we use a convolutional network as the discriminator and another convolutional network with fractionally strided convolution layers as the generator, the model is called DCGAN (Deep Convolutional Generative Adversarial Network). Some samples of bedroom im-age generation from this model are shown in Figure 3.
The generator can also be a sequential model, meaning that it can generate an image using a sequence of images with lower-resolution or details. A few examples of the generated images using such a model are shown in Figure 4.
The GAN and later variants such as the DCGAN are currently considered to be among the best when it comes to the quality of the generated samples. The images look so realistic that you might assume that the model has simply memorized instances of the training set, but a quick KNN search reveals this not to be the case.
About the author
Mohammad Pezeshk is a master’s student in the LISA lab of Universite de Montreal working under the supervision of Yoshua Bengio and Aaron Courville. He obtained his bachelor’s in computer engineering from Amirkabir University of Technology (Tehran Polytechnic) in July 2014 and then started his master’s in September 2014. His research interests lie in the fields of artificial intelligence, machine learning, probabilistic models and specifically deep learning.