You might have worked with the popular MNIST dataset before – but in this article, we will be generating new MNIST-like images with a Keras GAN. It can take a very long time to train a GAN; however, this problem is small enough to run on most laptops in a few hours, which makes it a great example.
The following excerpt is taken from the book Deep Learning Quick Reference, authored by Mike Bernico.
The network architecture that we will be using here has been found by, and optimized by, many folks, including the authors of the DCGAN paper and people like Erik Linder-Norén, who’s excellent collection of GAN implementations called Keras GAN served as the basis of the code we used here.
Loading the MNIST dataset
The MNIST dataset consists of 60,000 hand-drawn numbers, 0 to 9. Keras provides us with a built-in loader that splits it into 50,000 training images and 10,000 test images. We will use the following code to load the dataset:
from keras.datasets import mnist def
load_data(): (X_train, _), (_, _) = mnist.load_data() X_train = (X_train.astype(np.float32) - 127.5) / 127.5 X_train = np.expand_dims(X_train, axis=3) return X_train
As you probably noticed, We’re not returning any of the labels or the testing dataset. We’re only going to use the training dataset. The labels aren’t needed because the only labels we will be using are 0 for fake and 1 for real. These are real images, so they will all be assigned a label of 1 at the discriminator.
Building the generator
The generator uses a few new layers that we will talk about in this section. First, take a chance to skim through the following code:
def build_generator(noise_shape=(100,)): input = Input(noise_shape) x = Dense(128 * 7 * 7, activation="relu")(input) x = Reshape((7, 7, 128))(x) x = BatchNormalization(momentum=0.8)(x) x = UpSampling2D()(x) x = Conv2D(128, kernel_size=3, padding="same")(x) x = Activation("relu")(x) x = BatchNormalization(momentum=0.8)(x) x = UpSampling2D()(x) x = Conv2D(64, kernel_size=3, padding="same")(x) x = Activation("relu")(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(1, kernel_size=3, padding="same")(x) out = Activation("tanh")(x) model = Model(input, out) print("-- Generator -- ") model.summary() return model
We have not previously used the UpSampling2D layer. This layer will take increases in the rows and columns of the input tensor, leaving the channels unchanged. It does this by repeating the values in the input tensor. By default, it will double the input. If we give an UpSampling2D layer a 7 x 7 x 128 input, it will give us a 14 x 14 x 128 output.
Typically when we build a CNN, we start with an image that is very tall and wide and uses convolutional layers to get a tensor that’s very deep but less tall and wide. Here we will do the opposite. We’ll use a dense layer and a reshape to start with a 7 x 7 x 128 tensor and then, after doubling it twice, we’ll be left with a 28 x 28 tensor. Since we need a grayscale image, we can use a convolutional layer with a single unit to get a 28 x 28 x 1 output.
This sort of generator arithmetic is a little off-putting and can seem awkward at first but after a few painful hours, you will get the hang of it!
Building the discriminator
The discriminator is really, for the most part, the same as any other CNN. Of course, there are a few new things that we should talk about. We will use the following code to build the discriminator:
def build_discriminator(img_shape): input = Input(img_shape) x =Conv2D(32, kernel_size=3, strides=2, padding="same")(input) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = Conv2D(64, kernel_size=3, strides=2, padding="same")(x) x = ZeroPadding2D(padding=((0, 1), (0, 1)))(x) x = (LeakyReLU(alpha=0.2))(x) x = Dropout(0.25)(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(128, kernel_size=3, strides=2, padding="same")(x) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(256, kernel_size=3, strides=1, padding="same")(x) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = Flatten()(x) out = Dense(1, activation='sigmoid')(x)
model = Model(input, out) print("-- Discriminator -- ") model.summary() return model
First, you might notice the oddly shaped zeroPadding2D() layer. After the second convolution, our tensor has gone from 28 x 28 x 3 to 7 x 7 x 64. This layer just gets us back into an even number, adding zeros on one side of both the rows and columns so that our tensor is now 8 x 8 x 64.
More unusual is the use of both batch normalization and dropout. Typically, these two layers are not used together; however, in the case of GANs, they do seem to benefit the network.
Building the stacked model
Now that we’ve assembled both the generator and the discriminator, we need to assemble a third model that is the stack of both models together that we can use for training the generator given the discriminator loss.
To do that we can just create a new model, this time using the previous models as layers in the new model, as shown in the following code:
discriminator = build_discriminator(img_shape=(28, 28, 1)) generator = build_generator() z = Input(shape=(100,)) img = generator(z) discriminator.trainable = False real = discriminator(img) combined = Model(z, real)
Notice that we’re setting the discriminator’s training attribute to False before building the model. This means that for this model we will not be updating the weights of the discriminator during backpropagation. We will freeze these weights and only move the generator weights with the stack. The discriminator will be trained separately.
Now that all the models are built, they need to be compiled, as shown in the following code:
gen_optimizer = Adam(lr=0.0002, beta_1=0.5) disc_optimizer = Adam(lr=0.0002, beta_1=0.5) discriminator.compile(loss='binary_crossentropy', optimizer=disc_optimizer, metrics=['accuracy']) generator.compile(loss='binary_crossentropy', optimizer=gen_optimizer) combined.compile(loss='binary_crossentropy', optimizer=gen_optimizer)
If you’ll notice, we’re creating two custom Adam optimizers. This is because many times we will want to change the learning rate for only the discriminator or generator, slowing one or the other down so that we end up with a stable GAN where neither is overpowering the other. You’ll also notice that we’re using beta_1 = 0.5. This is a recommendation from the original DCGAN paper that we’ve carried forward and also had success with. A learning rate of 0.0002 is a good place to start as well, and was found in the original DCGAN paper.
The training loop
We have previously had the luxury of calling .fit() on our model and letting Keras handle the painful process of breaking the data apart into mini batches and training for us.
Unfortunately, because we need to perform the separate updates for the discriminator and the stacked model together for a single batch we’re going to have to do things the old-fashioned way, with a few loops. This is how things used to be done all the time, so while it’s perhaps a little more work, it does admittedly leave me feeling nostalgic. The following code illustrates the training technique:
num_examples = X_train.shape num_batches = int(num_examples / float(batch_size)) half_batch = int(batch_size / 2)
for epoch in range(epochs + 1): for batch in range(num_batches): # noise images for the batch noise = np.random.normal(0, 1, (half_batch, 100)) fake_images = generator.predict(noise) fake_labels = np.zeros((half_batch, 1)) # real images for batch
idx = np.random.randint(0, X_train.shape, half_batch) real_images = X_train[idx] real_labels = np.ones((half_batch, 1)) # Train the discriminator (real classified as ones and generated as zeros) d_loss_real = discriminator.train_on_batch(real_images, real_labels) d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) noise = np.random.normal(0, 1, (batch_size, 100)) # Train the generator g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1))) # Plot the progress print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch,batch, num_batches, d_loss, 100 * d_loss, g_loss)) if batch % 50 == 0: save_imgs(generator, epoch, batch)
There is a lot going on here, to be sure. As before, let’s break it down block by block. First, let’s see the code to generate noise vectors:
noise = np.random.normal(0, 1, (half_batch, 100)) fake_images = generator.predict(noise) fake_labels = np.zeros((half_batch, 1))
This code is generating a matrix of noise vectors called z) and sending it to the generator. It’s getting a set of generated images back, which we’re calling fake images. We will use these to train the discriminator, so the labels we want to use are 0s, indicating that these are in fact generated images.
Note that the shape here is half_batch x 28 x 28 x 1. The half_batch is exactly what you think it is. We’re creating half a batch of generated images because the other half of the batch will be real data, which we will assemble next. To get our real images, we will generate a random set of indices across X_train and use that slice of X_train as our real images, as shown in the following code:
idx = np.random.randint(0, X_train.shape, half_batch) real_images = X_train[idx] real_labels = np.ones((half_batch, 1))
Since we are using these images to train the discriminator, and because they are real images, we will assign them 1s as labels, rather than 0s. Now that we have our discriminator training set assembled, we will update the discriminator. Also, note that we aren’t using the soft labels. That’s because we want to keep things as easy as they can be to understand. Luckily the network doesn’t require them in this case. We will use the following code to train the discriminator:
# Train the discriminator (real classified as ones and generated as zeros) d_loss_real = discriminator.train_on_batch(real_images, real_labels) d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels) d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
Notice that here we’re using the discriminator’s train_on_batch() method. The train_on_batch() method does exactly one round of forward and backward propagation. Every time we call it, it updates the model once from the model’s previous state.
Also, notice that we’re making the update for the real images and fake images separately. This is advice that is given on the GAN hack Git we had previously referenced in the Generator architecture section. Especially in the early stages of training, when real images and fake images are from radically different distributions, batch normalization will cause problems with training if we were to put both sets of data in the same update.
Now that the discriminator has been updated, it’s time to update the generator. This is done indirectly by updating the combined stack, as shown in the following code:
noise = np.random.normal(0, 1, (batch_size, 100)) g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
To update the combined model, we create a new noise matrix, and this time it will be as large as the entire batch. We will use that as an input to the stack, which will cause the generator to generate an image and the discriminator to evaluate that image. Finally, we will use the label of 1 because we want to backpropagate the error between a real image and the generated image.
Lastly, the training loop reports the discriminator and generator loss at the epoch/batch and then, every 50 batches, of every epoch we will use save_imgs to generate example images and save them to disk, as shown in the following code:
print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch,batch, num_batches, d_loss, 100 * d_loss, g_loss)) if batch % 50 == 0: save_imgs(generator, epoch, batch)
The save_imgs function uses the generator to create images as we go, so we can see the fruits of our labor. We will use the following code to define save_imgs:
def save_imgs(generator, epoch, batch): r, c = 5, 5 noise = np.random.normal(0, 1, (r * c, 100)) gen_imgs = generator.predict(noise) gen_imgs = 0.5 * gen_imgs + 0.5 fig, axs = plt.subplots(r, c) cnt = 0 for i in range(r): for j in range(c): axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray') axs[i, j].axis('off') cnt += 1 fig.savefig("images/mnist_%d_%d.png" % (epoch, batch)) plt.close()
It uses only the generator by creating a noise matrix and retrieving an image matrix in return. Then, using matplotlib.pyplot, it saves those images to disk in a 5 x 5 grid.
Performing model evaluation
Good is somewhat subjective when you’re building a deep neural network to create images. Let’s take a look at a few examples of the training process, so you can see for yourself how the GAN begins to learn to generate MNIST.
Here’s the network at the very first batch of the very first epoch. Clearly, the generator doesn’t really know anything about generating MNIST at this point; it’s just noise, as shown in the following image:
But just 50 batches in, something is happening, as you can see from the following image:
And after 200 batches of epoch 0 we can almost see numbers, as you can see from the following image:
And here’s our generator after one full epoch. These generated numbers look pretty good, and we can see how the discriminator might be fooled by them. At this point, we could probably continue to improve a little bit, but it looks like our GAN has worked as the computer is generating some pretty convincing MNIST digits, as shown in the following image:
Thus, we see the power of GANs in action when it comes to image generation using the Keras library.
If you found the above article to be useful, make sure you check out our book Deep Learning Quick Reference, for more such interesting coverage of popular deep learning concepts and their practical implementation.