[box type=”note” align=”” class=”” width=””]This article is an excerpt from a book written by Rajdeep Dua and Manpreet Singh Ghotra titled Neural Network Programming with Tensorflow. In this book, you will use TensorFlow to build and train neural networks of varying complexities, without any hassle.[/box]
In today’s tutorial, we will learn about generative models, and their types. We will also look into how discriminative models differs from generative models.
Introduction to Generative models
Generative models are the family of machine learning models that are used to describe how data is generated. To train a generative model we first accumulate a vast amount of data in any domain and later train a model to create or generate data like it.
In other words, these are the models that can learn to create data that is similar to data that we give them. One such approach is using Generative Adversarial Networks (GANs).
There are two kinds of machine learning models: generative models and discriminative models. Let’s examine the following list of classifiers: decision trees, neural networks, random forests, generalized boosted models, logistic regression, naive bayes, and Support Vector Machine (SVM). Most of these are classifiers and ensemble models. The odd one out here is Naive Bayes. It’s the only generative model in the list. The others are examples of discriminative models.
The fundamental difference between generative and discriminative models lies in the underlying probability inference structure. Let’s go through some of the key differences between generative and discriminative models.
Discriminative versus generative models
Discriminative models learn P(Y|X), which is the conditional relationship between the target variable Y and features X. This is how least squares regression works, and it is the kind of inference pattern that gets used. It is an approach to sort out the relationship among variables.
Generative models aim for a complete probabilistic description of the dataset. With generative models, the goal is to develop the joint probability distribution P(X, Y), either directly or by computing P(Y | X) and P(X) and then inferring the conditional probabilities required to classify newer data. This method requires more solid probabilistic thought than regression demands, but it provides a complete model of the probabilistic structure of the data. Knowing the joint distribution enables you to generate the data; hence, Naive Bayes is a generative model.
Suppose we have a supervised learning task, where xi is the given features of the data points and yi is the corresponding labels. One way to predict y on future x is to learn a function f() from (xi,yi) that takes in x and outputs the most likely y. Such models fall in the category of discriminative models, as you are learning how to discriminate between x’s from different classes. Methods like SVMs and neural networks fall into this category. Even if you’re able to classify the data very accurately, you have no notion of how the data might have been generated.
The second approach is to model how the data might have been generated and learn a function f(x,y) that gives a score to the configuration determined by x and y together. Then you can predict y for a new x by finding the y for which the score f(x,y) is maximum. A canonical example of this is Gaussian mixture models.
Another example of this is: you can imagine x to be an image and y to be a kind of object like a dog, namely in the image. The probability written as p(y|x) tells us how much the model believes that there is a dog, given an input image compared to all possibilities it knows about. Algorithms that try to model this probability map directly are called discriminative models.
Generative models, on the other hand, try to learn a function called the joint probability p(y, x). We can read this as how much the model believes that x is an image and there is a dog y in it at the same time. These two probabilities are related and that could be written as p(y, x) = p(x) p(y|x), with p(x) being how likely it is that the input x is an image. The p(x) probability is usually called a density function in literature.
The main reason to call these models generative ultimately connects to the fact that the model has access to the probability of both input and output at the same time. Using this, we can generate images of animals by sampling animal kinds y and new images x from p(y, x).
We can mainly learn the density function p(x) which only depends on the input space. Both models are useful; however, comparatively, generative models have an interesting
advantage over discriminative models, namely, they have the potential to understand and
explain the underlying structure of the input data even when there are no labels available.
This is very desirable when working in the real world.
Types of generative models
Discriminative models have been at the forefront of the recent success in the field of machine learning. Models make predictions that depend on a given input, although they are not able to generate new samples or data.
The idea behind the recent progress of generative modeling is to convert the generation problem to a prediction one and use deep learning algorithms to learn such a problem.
One way to convert a generative to a discriminative problem can be by learning the mapping from the input space itself. For example, we want to learn an identity map that, for each image x, would ideally predict the same image, namely, x = f(x), where f is the predictive model.
This model may not be of use in its current form, but from this, we can create a generative model.
Here, we create a model formed of two main components: an encoder model q(h|x) that maps the input to another space, which is referred to as hidden or the latent space represented by h, and a decoder model q(x|h) that learns the opposite mapping from the hidden input space.
These components–encoder and decoder–are connected together to create an end-to-end trainable model. Both the encoder and decoder models are neural networks of different architectures, for example, RNNs and Attention Nets, to get desired outcomes.
As the model is learned, we can remove the decoder from the encoder and then use them separately. To generate a new data sample, we can first generate a sample from the latent space and then feed that to the decoder to create a new sample from the output space.
As seen with autoencoders, we can think of a general concept to create networks that will work together in a relationship, and training them will help us learn the latent spaces that allow us to generate new data samples.
Another type of generative network is GAN, where we have a generator model q(x|h) to map the small dimensional latent space of h (which is usually represented as noise samples from a simple distribution) to the input space of x. This is quite similar to the role of decoders in autoencoders.
The deal is now to introduce a discriminative model p(y| x), which tries to associate an input instance x to a yes/no binary answer y, about whether the generator model generated the input or was a genuine sample from the dataset we were training on.
Let’s use the image example done previously. Assume that the generator model creates a new image, and we also have the real image from our actual dataset. If the generator model was right, the discriminator model would not be able to distinguish between the two images easily. If the generator model was poor, it would be very simple to tell which one was a fake or fraud and which one was real.
When both these models are coupled, we can train them end to end by assuring that the generator model is getting better over time to fool the discriminator model, while the discriminator model is trained to work on the harder problem of detecting frauds. Finally, we desire a generator model with outputs that are indistinguishable from the real data that we used for the training.
Through the initial parts of the training, the discriminator model can easily detect the samples coming from the actual dataset versus the ones generated synthetically by the generator model, which is just beginning to learn. As the generator gets better at modeling the dataset, we begin to see more and more generated samples that look similar to the dataset. The following example depicts the generated images of a GAN model learning over time:
If the data is temporal in nature, then we can use specialized algorithms called Sequence Models. These models can learn the probability of the form p(y|x_n, x_1), where i is an index signifying the location in the sequence and x_i is the ith input sample.
As an example, we can consider each word as a series of characters, each sentence as a series of words, and each paragraph as a series of sentences. Output y could be the sentiment of the sentence.
Using a similar trick from autoencoders, we can replace y with the next item in the series or sequence, namely y = x_n + 1, allowing the model to learn.
To summarize, we learned generative models are a fast advancing area of study and research. As we proceed to advance these models and grow the training and datasets, we can expect to generate data examples that depict completely believable images. This can be used in several applications such as image denoising, painting, structured prediction, and exploration in reinforcement learning.
To know more about how to build and optimize neural networks using TensorFlow, do checkout this book Neural Network Programming with Tensorflow.