Training Deep Convolutional GANs to generate Anime Characters [Tutorial]

Convolution layers are really good at processing images. They can learn important features, such as edges, shapes, and complex objects, as demonstrated by networks such as Inception, AlexNet, Visual Geometry Group (VGG), and ResNet.

In this tutorial, we will use a DCGAN architecture to generate anime characters. We will learn how to prepare the dataset for training, implement a DCGAN in Keras, and train the DCGAN on the anime character dataset.

The development of Deep Convolutional Generative Adversarial Networks (DCGANs) was an important step towards using CNNs for image generation. A DCGAN uses convolutional layers instead of dense layers. DCGANs were proposed by Alec Radford, Luke Metz, and Soumith Chintala in their paper, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Since then, DCGANs have been widely used for various image generation tasks.

This tutorial is an excerpt taken from the book ‘Generative Adversarial Networks Projects’ written by Kailash Ahirwar. The book explores unsupervised techniques for training neural networks and includes seven end-to-end projects in the GAN domain.

Downloading and preparing the anime characters dataset

To train a DCGAN network, we need a dataset of anime characters containing cropped faces of the characters. In this tutorial, we will be scraping images for educational and demonstration purposes only. We have scraped images from pixiv.net using a crawler tool called gallery-dl. This is a command-line tool that can be used to download image collections from websites, such as pixiv.net, exhentai.org, danbooru.donmai.us, and more. It is available at the following link: https://github.com/mikf/gallery-dl.

Downloading the dataset

In this section, we will cover the different steps required to install the dependencies and download the dataset. Before executing the following commands, activate the virtual environment created for this project:

  1. Execute the following command to install gallery-dl:
pip install --upgrade gallery-dl
  1. Alternatively, you can install the latest development version of gallery-dl using the following command:
pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.zip
  1. If the preceding commands don’t work, follow the instructions given in the official repository:
# Official gallery-dl Github repo
https://github.com/mikf/gallery-dl
  1. Finally, execute the following command to download the images from danbooru.donmai.us using gallery-dl (the URL is quoted so that the shell does not interpret the ? character):
gallery-dl "https://danbooru.donmai.us/posts?tags=face"

Download images at your own risk. The information given is for educational purposes only and we don’t support illegal scraping. We don’t have copyright of the images, as the images are hosted by their respective owners. For commercial purposes, please contact the respective owner of the website or the content that you are using.

Exploring the dataset

Before we crop or resize the images, take a look at the downloaded images:

As you see, some images contain other body parts as well, which we don’t want in our training images. In the next section, we will crop out only the face part of these images. Also, we will resize all images to a size required for the training.

Cropping and resizing images in the dataset

In this section, we will crop out faces from images. We will be using python-animeface to detect the faces in the images. This is an open source library, hosted on GitHub, that detects faces in anime images. It is publicly available at the following link: https://github.com/nya3jp/python-animeface.

Execute the following steps to crop and resize the images:

  1. First of all, install python-animeface:
pip install animeface
  1. Next, import the modules required for the task:
import glob
import os

import animeface
from PIL import Image
  1. Next, define the parameters:
total_num_faces = 0
  1. Next, iterate over all images to crop and resize them one by one:
for index, filename in enumerate(glob.glob('/path/to/directory/containing/images/*.*')):
  1. Inside the loop, open the current image and detect a face inside it:
    try:
        # Open image
        im = Image.open(filename)

        # Detect faces
        faces = animeface.detect(im)
    except Exception as e:
        print("Exception:{}".format(e))
        continue
  1. Next, get the coordinates of the first face detected in the image (skip the image if no face was found):
    # Skip images in which no face was detected
    if len(faces) == 0:
        continue

    fp = faces[0].face.pos

    # Get coordinates of the face detected in the image
    coordinates = (fp.x, fp.y, fp.x+fp.width, fp.y+fp.height)
  1. Now, crop the face out of the image:
    # Crop image
    cropped_image = im.crop(coordinates)
  1. Next, resize the cropped face image to have a dimension of (64, 64). Image.LANCZOS is the filter that older Pillow versions also exposed under the name Image.ANTIALIAS:
    # Resize image
    cropped_image = cropped_image.resize((64, 64), Image.LANCZOS)
  1. Finally, save the cropped and resized image to the desired directory, reusing the base name of the input file so that each face gets its own file:
    name = os.path.splitext(os.path.basename(filename))[0]
    cropped_image.save("/path/to/directory/to/store/cropped/images/{}.png".format(name))

The complete script appears as follows:

import glob
import os

import animeface
from PIL import Image

total_num_faces = 0

for index, filename in enumerate(glob.glob('/path/to/directory/containing/images/*.*')):
    # Open image and detect faces
    try:
        im = Image.open(filename)
        faces = animeface.detect(im)
    except Exception as e:
        print("Exception:{}".format(e))
        continue

    # If no faces found in the current image
    if len(faces) == 0:
        print("No faces found in the image")
        continue

    fp = faces[0].face.pos

    # Get coordinates of the face detected in the image
    coordinates = (fp.x, fp.y, fp.x+fp.width, fp.y+fp.height)

    # Crop image
    cropped_image = im.crop(coordinates)

    # Resize image (Image.LANCZOS was called Image.ANTIALIAS in older Pillow versions)
    cropped_image = cropped_image.resize((64, 64), Image.LANCZOS)

    # Show cropped and resized image
    # cropped_image.show()

    # Save it in the output directory under the base name of the input file
    name = os.path.splitext(os.path.basename(filename))[0]
    cropped_image.save("/path/to/directory/to/store/cropped/images/{}.png".format(name))

    print("Cropped image saved successfully")
    total_num_faces += 1
    print("Number of faces detected till now:{}".format(total_num_faces))

print("Total number of faces:{}".format(total_num_faces))

The preceding script will load all of the images from the folder containing downloaded images, detect faces using the python-animeface library, and crop out the face part from the initial image. Then, the cropped images will be resized to a size of 64 x 64. If you want to change the dimensions of the images, change the architecture of the generator and the discriminator accordingly. We are now ready to work on our network.
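Two small pieces of the script are easy to get wrong: the crop box and the output file name. The following is a minimal sketch of both, using only the standard library. The helper names `face_box` and `cropped_path` and the example paths are illustrative assumptions; they are not part of python-animeface or Pillow.

```python
import os


def face_box(x, y, width, height):
    """Convert a face position (top-left corner plus size) into the
    (left, upper, right, lower) tuple that an image crop expects."""
    return (x, y, x + width, y + height)


def cropped_path(input_path, output_dir):
    """Build an output path that reuses the input file's base name with a .png extension."""
    base_name = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, base_name + ".png")


# Hypothetical values, used only to illustrate the two helpers
box = face_box(40, 25, 100, 100)
out_path = cropped_path("/data/raw/12345.jpg", "/data/cropped")
print(box, out_path)
```

Deriving the output name from the input name keeps one file per source image, so a later image never overwrites an earlier one.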

Implementing a DCGAN using Keras

In this section, we will write an implementation of a DCGAN in the Keras framework. Keras is a meta-framework that uses TensorFlow or Theano as a backend. It provides high-level APIs for working with neural networks. Let’s start by writing the implementation of the generator network.

Generator

The generator network consists of some 2D convolutional layers, upsampling layers, a reshape layer, and a batch normalization layer. In Keras, every operation can be specified as a layer. Even activation functions are layers in Keras and can be added to a model just like a normal dense layer.

Perform the following steps to create a generator network:

  1. Let’s start by creating a Sequential Keras model:
gen_model = Sequential()
  1. Next, add a dense layer that has 2,048 nodes and takes a 100-dimensional noise vector as input, followed by an activation layer, tanh:
gen_model.add(Dense(units=2048, input_dim=100))
gen_model.add(Activation('tanh'))
  1. Next, add the second layer, which is also a dense layer that has 16,384 neurons. This is followed by a batch normalization layer with default hyperparameters and tanh as the activation function:
gen_model.add(Dense(256*8*8))
gen_model.add(BatchNormalization())
gen_model.add(Activation('tanh'))

The output of the second dense layer is a tensor of a size of (16384,). Here, 256 * 8 * 8 = 16,384 is the number of neurons in the dense layer.

  1. Next, add a reshape layer to the network to reshape the tensor from the last layer to a tensor of a shape of (batch_size, 8, 8, 256):
# Reshape layer
gen_model.add(Reshape((8, 8, 256), input_shape=(256*8*8,)))
  1. Next, add a 2D upsampling layer to alter the shape from (8, 8, 256) to (16, 16, 256). The upsampling size is (2, 2), which increases the size of the tensor to double its original size. Here, we have 256 tensors of a dimension of 16 x 16:
gen_model.add(UpSampling2D(size=(2, 2)))
  1. Next, add a 2D convolutional layer. This applies 2D convolutions on the tensor using a specified number of filters. Here, we are using 128 filters and a kernel of a shape of (5, 5):
gen_model.add(Conv2D(128, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))
  1. Next, add a 2D upsampling layer to change the shape of the tensor from (batch_size, 16, 16, 128) to (batch_size, 32, 32, 128):
gen_model.add(UpSampling2D(size=(2, 2)))

A 2D upsampling layer repeats the rows and columns of the tensor size[0] and size[1] times, respectively.

  1. Next, add a second 2D convolutional layer with 64 filters and a kernel size of (5, 5) followed by tanh as the activation function:
gen_model.add(Conv2D(64, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))
  1. Next, add a 2D upsampling layer to change the shape from (batch_size, 32, 32, 64) to (batch_size, 64, 64, 64):
gen_model.add(UpSampling2D(size=(2, 2)))
  1. Finally, add the third 2D convolutional layer with three filters and a kernel size of (5, 5) followed by tanh as the activation function:
gen_model.add(Conv2D(3, (5, 5), padding='same'))
gen_model.add(Activation('tanh'))

The generator network will output a tensor of a shape of (batch_size, 64, 64, 3). One image tensor from this batch of tensors is similar to an image of a dimension of 64 x 64 with three channels: Red, Green, and Blue (RGB).
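The shape changes described in the steps above can be checked with a few lines of plain Python. This is a sketch of the shape arithmetic only, not Keras code: the helpers `upsample` and `conv_same` are hypothetical names, and they assume stride-1 convolutions with padding='same', as used in the generator.

```python
def upsample(shape, size=(2, 2)):
    """UpSampling2D repeats rows and columns, so height and width are multiplied."""
    height, width, channels = shape
    return (height * size[0], width * size[1], channels)


def conv_same(shape, filters):
    """A stride-1 convolution with padding='same' keeps height and width
    and sets the number of channels to the number of filters."""
    height, width, _ = shape
    return (height, width, filters)


shape = (8, 8, 256)              # shape after the reshape layer
trace = [shape]
for filters in (128, 64, 3):     # filter counts of the three convolutional layers
    shape = upsample(shape)
    trace.append(shape)
    shape = conv_same(shape, filters)
    trace.append(shape)

for step in trace:
    print(step)
```

Each upsampling layer doubles the spatial size and each convolution only changes the channel count, which is how an 8 x 8 feature map grows into a 64 x 64 image with three channels.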

The complete code for the generator network wrapped in a Python function looks as follows:

def get_generator():
    gen_model = Sequential()

    gen_model.add(Dense(units=2048, input_dim=100))
    gen_model.add(Activation('tanh'))

    gen_model.add(Dense(256 * 8 * 8))
    gen_model.add(BatchNormalization())
    gen_model.add(Activation('tanh'))

    gen_model.add(Reshape((8, 8, 256), input_shape=(256 * 8 * 8,)))
    gen_model.add(UpSampling2D(size=(2, 2)))

    gen_model.add(Conv2D(128, (5, 5), padding='same'))
    gen_model.add(Activation('tanh'))

    gen_model.add(UpSampling2D(size=(2, 2)))

    gen_model.add(Conv2D(64, (5, 5), padding='same'))
    gen_model.add(Activation('tanh'))

    gen_model.add(UpSampling2D(size=(2, 2)))

    gen_model.add(Conv2D(3, (5, 5), padding='same'))
    gen_model.add(Activation('tanh'))
    return gen_model

Now that we have created the generator network, let’s work on creating the discriminator network.

Discriminator

The discriminator network has three 2D convolutional layers, each followed by an activation function and a max-pooling layer. The tail of the network contains two fully-connected (dense) layers that work as a classification layer.

Perform the following steps to create a discriminator network:

  1. Let’s start by creating a Sequential Keras model:
dis_model = Sequential()
  1. Add a 2D convolutional layer that takes an input image of a shape of (64, 64, 3), followed by LeakyReLU with an alpha value of 0.2 as the activation function. The hyperparameters for this layer are the following:
    • Filters: 128
    • Kernel size: (5, 5)
    • Padding: same
dis_model.add(Conv2D(filters=128, kernel_size=5, padding='same', 
              input_shape=(64, 64, 3)))
dis_model.add(LeakyReLU(alpha=0.2))
  1. Next, add a 2D max pooling layer with a pool size of (2, 2). Max pooling is used to downsample an image representation and it is applied by using a max-filter over non-overlapping sub-regions of the representation:
dis_model.add(MaxPooling2D(pool_size=(2, 2)))

The shape of the output tensor from the first layer will be (batch_size, 32, 32, 128).

  1. Next, add another 2D convolutional layer with the following configurations:
    • Filters: 256
    • Kernel size: (3, 3)
    • Activation function: LeakyReLU with alpha 0.2
    • Pool size in 2D max pooling: (2, 2):
dis_model.add(Conv2D(filters=256, kernel_size=3))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(MaxPooling2D(pool_size=(2, 2)))

The shape of the output tensor from this layer will be (batch_size, 15, 15, 256): the unpadded 3 x 3 convolution reduces 32 to 30, and max pooling halves that to 15.

  1. Next, add the third 2D convolutional layer with the following configurations:
    • Filters: 512
    • Kernel size: (3, 3)
    • Activation function: LeakyReLU with alpha 0.2
    • Pool size in 2D Max Pooling: (2, 2):
dis_model.add(Conv2D(512, (3, 3)))
dis_model.add(LeakyReLU(alpha=0.2))
dis_model.add(MaxPooling2D(pool_size=(2, 2)))

The shape of the output tensor from this layer will be (batch_size, 6, 6, 512): the unpadded 3 x 3 convolution reduces 15 to 13, and max pooling reduces that to 6.

  1. Next, add a flatten layer. This flattens the input without affecting the batch size. It produces a two-dimensional tensor:
dis_model.add(Flatten())

The output shape of the tensor from the flatten layer will be (batch_size, 18432), because 6 * 6 * 512 = 18,432.

  1. Next, add a dense layer with 1024 neurons and LeakyReLU with alpha 0.2 as the activation function:
dis_model.add(Dense(1024))
dis_model.add(LeakyReLU(alpha=0.2))
  1. Finally, add a dense layer with one neuron for binary classification. The sigmoid function suits binary classification, as it maps the output to a probability between 0 and 1:
dis_model.add(Dense(1))
dis_model.add(Activation('sigmoid'))

The network will generate an output tensor of a shape of (batch_size, 1). Each value in the output tensor is the predicted probability that the corresponding input image is real.
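The intermediate shapes of the discriminator follow from two rules: a stride-1 convolution without padding shrinks each spatial dimension by kernel size minus one, and 2 x 2 max pooling halves it (rounding down). The sketch below applies those rules in plain Python. The helper names `conv2d` and `max_pool` are hypothetical and only model output shapes under these assumptions; they are not Keras calls.

```python
def conv2d(shape, filters, kernel, padding):
    """Output shape of a stride-1 2D convolution."""
    height, width, _ = shape
    if padding == "valid":           # no padding, so the feature map shrinks
        height, width = height - kernel + 1, width - kernel + 1
    return (height, width, filters)


def max_pool(shape, pool=2):
    """Output shape of non-overlapping max pooling (floor division)."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


shape = (64, 64, 3)
block1 = max_pool(conv2d(shape, 128, 5, "same"))
block2 = max_pool(conv2d(block1, 256, 3, "valid"))
block3 = max_pool(conv2d(block2, 512, 3, "valid"))
flattened = block3[0] * block3[1] * block3[2]
print(block1, block2, block3, flattened)
```

The flattened size is what the first dense layer receives, so it is a convenient value to check when you change the input resolution or kernel sizes.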

The complete code for the discriminator network wrapped inside a Python method looks as follows:

def get_discriminator():
    dis_model = Sequential()
    dis_model.add(
        Conv2D(128, (5, 5),
               padding='same',
               input_shape=(64, 64, 3))
    )
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Conv2D(256, (3, 3)))
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Conv2D(512, (3, 3)))
    dis_model.add(LeakyReLU(alpha=0.2))
    dis_model.add(MaxPooling2D(pool_size=(2, 2)))

    dis_model.add(Flatten())
    dis_model.add(Dense(1024))
    dis_model.add(LeakyReLU(alpha=0.2))

    dis_model.add(Dense(1))
    dis_model.add(Activation('sigmoid'))

    return dis_model

In this section, we have successfully implemented the discriminator and generator networks. Next, we will train the model on the dataset that we prepared in the Downloading and preparing the anime characters dataset section.

Training the DCGAN

Again, training a DCGAN is similar to training a Vanilla GAN network. It is a four-step process:

  1. Load the dataset.
  2. Build and compile the networks.
  3. Train the discriminator network.
  4. Train the generator network.

We will work on these steps one by one in this section.

Let’s start by defining the variables and the hyperparameters:

dataset_dir = "/Path/to/dataset/directory/*.*"
batch_size = 128
z_shape = 100
epochs = 10000
dis_learning_rate = 0.0005
gen_learning_rate = 0.0005
dis_momentum = 0.9
gen_momentum = 0.9
dis_nesterov = True
gen_nesterov = True

Here, we have specified different hyperparameters for the training. We will now see how to load the dataset for the training.

Loading the samples

To train the DCGAN network, we need to load the dataset in memory and we need to define a mechanism to load batches of images. Perform the following steps to load the dataset:

  1. Start by loading all images that you cropped, resized, and saved in the cropped folder. Specify the path of the directory correctly, so that the glob.glob method can create a list of all files in it. To read an image, use the Image.open method from Pillow and convert the result to a NumPy array (the imread function from the scipy.misc module, used in older code, was removed in SciPy 1.2). The following code shows the different steps to load all images inside the directory:
# Loading images
all_images = []
for index, filename in enumerate(glob.glob('/Path/to/cropped/images/directory/*.*')):
    image = np.array(Image.open(filename).convert('RGB'))
    all_images.append(image)
  1. Next, create an ndarray of all the images. The shape of the final ndarray will be (total_num_images, 64, 64, 3). Also, normalize all the images so that pixel values lie in the range [-1, 1]:
# Convert to Numpy ndarray
X = np.array(all_images)
X = (X - 127.5) / 127.5
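The normalization step maps pixel values from [0, 255] to [-1, 1], which is the output range of a tanh activation. The following sketch checks this on random stand-in data rather than on the real dataset; the function name `normalize` and the array `fake_dataset` are illustrative assumptions.

```python
import numpy as np


def normalize(images):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return (images.astype(np.float32) - 127.5) / 127.5


# Stand-in for the real dataset: four random 64 x 64 RGB images
fake_dataset = np.random.randint(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
X_demo = normalize(fake_dataset)
print(X_demo.shape, X_demo.min(), X_demo.max())
```

Casting to a floating-point type before subtracting avoids unsigned integer wrap-around when the input array has dtype uint8.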

Now that we have loaded the dataset, we will see how to build and compile the networks.

Building and compiling the networks

In this section, we will build and compile our networks required for the training:

  1. Start by defining the optimizers required for the training, as shown here:
# Define optimizers
dis_optimizer = SGD(lr=dis_learning_rate, momentum=dis_momentum, nesterov=dis_nesterov)
gen_optimizer = SGD(lr=gen_learning_rate, momentum=gen_momentum, nesterov=gen_nesterov)
  1. Next, create an instance of the generator model, and compile it (compiling configures the optimizer, the loss function, and other settings required to train the network):
gen_model = get_generator()
gen_model.compile(loss='binary_crossentropy', optimizer=gen_optimizer)

Use binary_crossentropy as the loss function for the generator network and gen_optimizer as the optimizer.

  1. Next, create an instance of the discriminator model, and compile it, as shown here:
dis_model = get_discriminator()
dis_model.compile(loss='binary_crossentropy', optimizer=dis_optimizer)

Similarly, use binary_crossentropy as the loss function for the discriminator network and  dis_optimizer as the optimizer.

  1. Next, create an adversarial model. An adversarial model contains both networks in a single model. The architecture of the adversarial model will be as follows:
    • input -> generator->discriminator->output

The code to create and compile an adversarial model is as follows:

adversarial_model = Sequential()
adversarial_model.add(gen_model)
dis_model.trainable = False
adversarial_model.add(dis_model)

When we train this network, we don’t want to train the discriminator network, so make it non-trainable before we add it to the adversarial model.

Compile the adversarial model, as follows:

adversarial_model.compile(loss='binary_crossentropy', optimizer=gen_optimizer)

Use binary_crossentropy as the loss function and gen_optimizer as the optimizer for the adversarial model.

Before starting the training, add TensorBoard to visualize the losses, as follows:

tensorboard = TensorBoard(log_dir="logs/{}".format(time.time()), write_images=True, write_grads=True, write_graph=True)
tensorboard.set_model(gen_model)
tensorboard.set_model(dis_model)

We will train the network for a specified number of iterations, so create a loop that should run for a specified number of epochs. Inside each epoch, we will train our networks on a mini-batch of a size of 128. Calculate the number of batches that need to be processed:

for epoch in range(epochs):
    print("Epoch is", epoch)
    number_of_batches = int(X.shape[0] / batch_size)
    print("Number of batches", number_of_batches)
    for index in range(number_of_batches):

We will now take a closer look at the training process. The following points explain the different steps involved in the training of DCGAN:

  • Initially, both of the networks are naive and have random weights.
  • The standard process to train a DCGAN network is to first train the discriminator on the batch of samples.
  • To do this, we need fake samples as well as real samples. We already have the real samples, so we now need to generate the fake samples.
  • To generate fake samples, create a latent vector of a shape of (100,) sampled from a normal distribution. Feed this latent vector to the untrained generator network. The generator network will generate fake samples that we use to train our discriminator network.
  • Concatenate the real images and fake images to create a new set of sample images. We also need to create an array of labels: label 1 for real images and label 0 for fake images.
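The last two points can be sketched with NumPy alone. The arrays below are random stand-ins with the right shapes: `real_images` takes the place of a batch from the dataset and `fake_images` takes the place of generator output. Both names, and the variable names for the combined arrays, are assumptions made for illustration.

```python
import numpy as np

batch_size, z_shape = 128, 100

# Latent vectors: one 100-dimensional noise vector per fake sample
z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))

# Stand-ins for real images and generator output (random values, for shape only)
real_images = np.random.uniform(-1, 1, size=(batch_size, 64, 64, 3))
fake_images = np.random.uniform(-1, 1, size=(batch_size, 64, 64, 3))

# Stack both sets and build matching labels: 1 for real, 0 for fake
combined_images = np.concatenate([real_images, fake_images], axis=0)
combined_labels = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
print(z_noise.shape, combined_images.shape, combined_labels.shape)
```

Training on the real and fake batches separately, with one update per batch, is an equally common alternative to concatenating them into one set.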

Training the discriminator network

Perform the following steps to train the discriminator network:

  1. Start by sampling a batch of noise vectors from a normal distribution, as follows:
z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))

To sample the values, use the normal() method from the np.random module in the NumPy library.

  1. Next, sample a batch of real images from the set of all images:
image_batch = X[index * batch_size:(index + 1) * batch_size]
  1. Next, generate a batch of fake images using the generator network:
generated_images = gen_model.predict_on_batch(z_noise)
  1. Next, create real labels and fake labels:
y_real = np.ones(batch_size) - np.random.random_sample(batch_size) * 0.2
y_fake = np.random.random_sample(batch_size) * 0.2
  1. Next, train the discriminator network on real images and real labels:
dis_loss_real = dis_model.train_on_batch(image_batch, y_real)
  1. Similarly, train it on fake images and fake labels:
dis_loss_fake = dis_model.train_on_batch(generated_images, y_fake)
  1. Next, calculate the average loss and print it to the console:
d_loss = (dis_loss_real+dis_loss_fake)/2
print("d_loss:", d_loss)
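The labels created in the steps above are not hard 0s and 1s: a small random amount is subtracted from the real labels and added to the fake labels, a trick often described as using soft (or smoothed) labels. The sketch below reproduces those two lines in isolation and prints the resulting ranges; it assumes nothing beyond NumPy.

```python
import numpy as np

batch_size = 128

# Soft labels: real targets fall between 0.8 and 1.0, fake targets between 0.0 and 0.2
y_real = np.ones(batch_size) - np.random.random_sample(batch_size) * 0.2
y_fake = np.random.random_sample(batch_size) * 0.2
print(y_real.min(), y_real.max(), y_fake.min(), y_fake.max())
```

Because the targets are never exactly 0 or 1, the discriminator is discouraged from producing extremely confident outputs.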

Up until now, we have been training the discriminator network. In the next section, let’s train the generator network.

Training the generator network

To train the generator network, we have to train the adversarial model. When we train the adversarial model, it trains the generator network only but freezes the discriminator network. We won’t train the discriminator network, as we have already trained it. Perform the following steps to train the adversarial model:

  1. Start by creating a batch of noise vectors again. Sample these noise vectors from a Gaussian/Normal distribution:
z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))
  1. Next, train the adversarial model on this batch of noise vectors, as follows:
g_loss = adversarial_model.train_on_batch(z_noise, [1] * batch_size)

We train the adversarial model on the batch of noise vectors and real labels. Here, the real labels form a vector with all values equal to 1. Labeling the generated images as real trains the generator to fool the discriminator network: the loss is low only when the discriminator classifies the generated images as real. In this step, the generator receives feedback from the discriminator network and improves itself accordingly.

  1. Finally, print the generator loss to the console to keep track of the losses:
print("g_loss:", g_loss)
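To see why a target of 1 pushes the generator to fool the discriminator, it helps to evaluate binary cross-entropy by hand for a single prediction. The function below is a plain-Python sketch of the standard formula, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy for a single prediction, clipped to avoid log(0)."""
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


# With a target of 1, the loss shrinks as the discriminator's output for a
# generated image approaches 1, that is, as the generator fools it more convincingly.
loss_when_fooled = binary_crossentropy(1, 0.9)
loss_when_caught = binary_crossentropy(1, 0.1)
print(loss_when_fooled, loss_when_caught)
```

Since the discriminator's weights are frozen inside the adversarial model, the only way to lower this loss is to change the generator.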

A simple, manual way to monitor the training process is to generate fake images every 10 epochs and check their quality by eye. Each image is saved under its own file name so that the second image does not overwrite the first:

    if epoch % 10 == 0:
        z_noise = np.random.normal(0, 1, size=(batch_size, z_shape))
        gen_images1 = gen_model.predict_on_batch(z_noise)

        for i, img in enumerate(gen_images1[:2]):
            save_rgb_img(img, "results/one_{}_{}.png".format(epoch, i))

These images will help you to decide whether to continue the training or to stop it early. Stop the training if the quality of the generated images is good; otherwise, continue the training until the results improve. After this step, we can further evaluate the trained model and visualize the generated images.
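The save_rgb_img helper is not defined in this excerpt. Whatever it does, a generated image has to be mapped back to 8-bit pixel values before it can be written to disk. The following is a hypothetical sketch of that conversion only, assuming the generator output lies in [-1, 1]; the name `to_uint8` is an illustrative assumption and not the book's helper.

```python
import numpy as np


def to_uint8(img):
    """Map a generated image from [-1, 1] back to uint8 pixel values in [0, 255]."""
    scaled = (np.asarray(img, dtype=np.float32) + 1.0) * 127.5
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Stand-in for one generator output (random values in [-1, 1])
sample = np.random.uniform(-1, 1, size=(64, 64, 3))
pixels = to_uint8(sample)
print(pixels.shape, pixels.dtype, pixels.min(), pixels.max())
```

The resulting array can then be handed to an image library, for example Pillow's Image.fromarray, to be saved as a PNG file.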

We have successfully trained a DCGAN network on the anime character dataset. Now we can use the model to generate images of anime characters.

To summarize, in this tutorial, we looked at the different steps required to download and prepare the dataset. We then prepared a Keras implementation of the network and trained it on our dataset. If you enjoyed the tutorial and want to explore how to further evaluate the trained model, and optimize the networks by optimizing the hyperparameters, be sure to check out the book ‘Generative Adversarial Networks Projects’.

Natasha Mathur
Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.
