
In this tutorial, we will learn to build both simple and deep convolutional GAN models with the help of TensorFlow and Keras deep learning frameworks.

Note: This article is an excerpt from the book Mastering TensorFlow 1.x, written by Armando Fandango.

Simple GAN with TensorFlow

For building the GAN with TensorFlow, we build three networks: two discriminator models and one generator model, with the following steps:

  1. Start by adding the hyper-parameters for defining the network:
import numpy as np
import tensorflow as tf

# graph hyperparameters
g_learning_rate = 0.00001
d_learning_rate = 0.01
n_z = 256  # number of dimensions in the noise vector (matches the model summaries below)
n_x = 784  # number of pixels in the MNIST image

# number of hidden layers for generator and discriminator
g_n_layers = 3
d_n_layers = 1
# neurons in each hidden layer
g_n_neurons = [256, 512, 1024]
d_n_neurons = [256]

# define parameter dictionary
d_params = {}
g_params = {}

activation = tf.nn.leaky_relu
w_initializer = tf.glorot_uniform_initializer
b_initializer = tf.zeros_initializer
  2. Next, define the generator network:
z_p = tf.placeholder(dtype=tf.float32, name='z_p', 
        shape=[None, n_z])
layer = z_p

# add generator network weights, biases and layers
with tf.variable_scope('g'):
    for i in range(0, g_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        g_params[w_name] = tf.get_variable(
            name=w_name,
            shape=[n_z if i == 0 else g_n_neurons[i - 1], 
                    g_n_neurons[i]],
            initializer=w_initializer())
        b_name = 'b_{0:04d}'.format(i)
        g_params[b_name] = tf.get_variable(
            name=b_name, shape=[g_n_neurons[i]], 
            initializer=b_initializer())
        layer = activation(
            tf.matmul(layer, g_params[w_name]) + g_params[b_name])
    # output (logit) layer
    i = g_n_layers
    w_name = 'w_{0:04d}'.format(i)
    g_params[w_name] = tf.get_variable(
        name=w_name,
        shape=[g_n_neurons[i - 1], n_x],
        initializer=w_initializer())
    b_name = 'b_{0:04d}'.format(i)
    g_params[b_name] = tf.get_variable(
        name=b_name, shape=[n_x], initializer=b_initializer())
    g_logit = tf.matmul(layer, g_params[w_name]) + g_params[b_name]
    g_model = tf.nn.tanh(g_logit)
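As a quick sanity check (an optional aside, not part of the book's code), you can confirm that the generator tensor has the expected shape before moving on:

# the generator maps a [None, n_z] noise batch to a [None, n_x] image batch
print(g_model.get_shape())  # expected: (?, 784)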
  3. Next, define the weights and biases for the two discriminator networks that we shall build. Defining them once ensures that both discriminator instances share, and therefore jointly train, the same parameters:
with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        d_params[w_name] = tf.get_variable(
            name=w_name,
            shape=[n_x if i == 0 else d_n_neurons[i - 1], 
                    d_n_neurons[i]],
            initializer=w_initializer())

        b_name = 'b_{0:04d}'.format(i)
        d_params[b_name] = tf.get_variable(
            name=b_name, shape=[d_n_neurons[i]], 
            initializer=b_initializer())

    #output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    d_params[w_name] = tf.get_variable(
        name=w_name, shape=[d_n_neurons[i - 1], 1], 
        initializer=w_initializer())

    b_name = 'b_{0:04d}'.format(i)
    d_params[b_name] = tf.get_variable(
        name=b_name, shape=[1], initializer=b_initializer())
  4. Now, using these parameters, build the discriminator that takes the real images as input and outputs the classification:
# define discriminator_real

# input real images
x_p = tf.placeholder(dtype=tf.float32, name='x_p', 
        shape=[None, n_x])

layer = x_p

with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        b_name = 'b_{0:04d}'.format(i)

        layer = activation(
            tf.matmul(layer, d_params[w_name]) + d_params[b_name])
        layer = tf.nn.dropout(layer, 0.7)  # keep_prob=0.7, keeps 70% of units
    #output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    b_name = 'b_{0:04d}'.format(i)
    d_logit_real = tf.matmul(layer, 
        d_params[w_name]) + d_params[b_name]
    d_model_real = tf.nn.sigmoid(d_logit_real)
  5. Next, build another discriminator network, with the same parameters, but providing the output of the generator as input:
# define discriminator_fake

# input generated fake images
z = g_model
layer = z

with tf.variable_scope('d'):
    for i in range(0, d_n_layers):
        w_name = 'w_{0:04d}'.format(i)
        b_name = 'b_{0:04d}'.format(i)
        layer = activation(
            tf.matmul(layer, d_params[w_name]) + d_params[b_name])
        layer = tf.nn.dropout(layer, 0.7)  # keep_prob=0.7, keeps 70% of units
    #output (logit) layer
    i = d_n_layers
    w_name = 'w_{0:04d}'.format(i)
    b_name = 'b_{0:04d}'.format(i)
    d_logit_fake = tf.matmul(layer, 
        d_params[w_name]) + d_params[b_name]
    d_model_fake = tf.nn.sigmoid(d_logit_fake)
  6. Now that we have the three networks built, connect them using the loss, optimizer, and training operations. While training the generator we only train the generator's parameters, and while training the discriminator we only train the discriminator's parameters. We specify this using the var_list parameter of the optimizer's minimize() function. Here is the complete code for defining the loss, optimizer, and training operation for both kinds of network:
g_loss = -tf.reduce_mean(tf.log(d_model_fake))
d_loss = -tf.reduce_mean(tf.log(d_model_real) + tf.log(1 - d_model_fake))

g_optimizer = tf.train.AdamOptimizer(g_learning_rate)
d_optimizer = tf.train.GradientDescentOptimizer(d_learning_rate)

g_train_op = g_optimizer.minimize(g_loss, 
                var_list=list(g_params.values()))
d_train_op = d_optimizer.minimize(d_loss, 
                var_list=list(d_params.values()))
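As an aside, the log-based losses above can produce log(0) when the sigmoid saturates; an equivalent, numerically more stable formulation (an alternative sketch, not the book's code) computes the same losses directly from the logits:

# equivalent losses computed from logits via sigmoid cross-entropy
d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_real, labels=tf.ones_like(d_logit_real)))
d_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_fake, labels=tf.zeros_like(d_logit_fake)))
d_loss_stable = d_loss_real + d_loss_fake  # same value as d_loss above
g_loss_stable = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_fake, labels=tf.ones_like(d_logit_fake)))  # same as g_loss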
  7. Now that we have defined the models, we have to train them. The training is done as per the following algorithm:
For each epoch:
  For each batch:
    get real images x_batch
    generate noise z_batch
    train discriminator using z_batch and x_batch
    generate noise z_batch
    train generator using z_batch

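The loop below relies on a few helpers defined earlier in the notebook that are not shown in this excerpt: the mnist dataset object, norm(), z_test, pixel_size, and display_images(). As a rough sketch of what they might look like (assuming MNIST pixels arrive in [0, 1] and are rescaled to [-1, 1] to match the generator's tanh output; n_test is a hypothetical sample count):

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

# TF's bundled MNIST helper; one plausible setup
mnist = input_data.read_data_sets('MNIST_data')

pixel_size = 28  # MNIST images are 28 x 28
n_test = 8       # number of samples to visualize (arbitrary choice)

def norm(x):
    # rescale pixels from [0, 1] to [-1, 1] to match the tanh output range
    return 2.0 * x - 1.0

# fixed noise, reused at every print epoch to track the generator's progress
z_test = np.random.uniform(-1.0, 1.0, size=[n_test, n_z])

def display_images(images):
    # show a row of grayscale images
    for i, image in enumerate(images):
        plt.subplot(1, len(images), i + 1)
        plt.imshow(image, cmap='gray')
        plt.axis('off')
    plt.show()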
The complete code for training from the notebook is as follows:

n_epochs = 400
batch_size = 100
n_batches = int(mnist.train.num_examples / batch_size)
n_epochs_print = 50

with tf.Session() as tfs:
    tfs.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        epoch_d_loss = 0.0
        epoch_g_loss = 0.0
        for batch in range(n_batches):
            x_batch, _ = mnist.train.next_batch(batch_size)
            x_batch = norm(x_batch)
            z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
            feed_dict = {x_p: x_batch,z_p: z_batch}
            _,batch_d_loss = tfs.run([d_train_op,d_loss], 
                                    feed_dict=feed_dict)
            z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
            feed_dict={z_p: z_batch}
            _,batch_g_loss = tfs.run([g_train_op,g_loss], 
                                    feed_dict=feed_dict)
            epoch_d_loss += batch_d_loss 
            epoch_g_loss += batch_g_loss
            
        if epoch%n_epochs_print == 0:
            average_d_loss = epoch_d_loss / n_batches
            average_g_loss = epoch_g_loss / n_batches
            print('epoch: {0:04d}   d_loss = {1:0.6f}  g_loss = {2:0.6f}'
                  .format(epoch,average_d_loss,average_g_loss))
            # predict images using generator model trained            
            x_pred = tfs.run(g_model,feed_dict={z_p:z_test})
            display_images(x_pred.reshape(-1,pixel_size,pixel_size))   

We printed the generated images every 50 epochs:

[Image: generated images every 50 epochs]

As we can see, the generator produced only noise at epoch 0, but by epoch 350 it had learned to produce much better shapes of handwritten digits. You can experiment with epochs, regularization, network architecture, and other hyper-parameters to see if you can produce faster and better results.

Simple GAN with Keras

Now let us implement the same model in Keras:

  1. The hyper-parameter definitions remain the same as in the last section:
# graph hyperparameters
g_learning_rate = 0.00001
d_learning_rate = 0.01
n_x = 784  # number of pixels in the MNIST image 
# number of hidden layers for generator and discriminator
g_n_layers = 3
d_n_layers = 1
# neurons in each hidden layer
g_n_neurons = [256, 512, 1024]
d_n_neurons = [256]
  2. Next, define the generator network:
# define generator

import keras
from keras.models import Sequential, Model
from keras.layers import Dense, LeakyReLU, Dropout, Input

g_model = Sequential()
g_model.add(Dense(units=g_n_neurons[0], 
                  input_shape=(n_z,),
                  name='g_0'))
g_model.add(LeakyReLU())
for i in range(1,g_n_layers):
    g_model.add(Dense(units=g_n_neurons[i],
                      name='g_{}'.format(i)
                     ))
    g_model.add(LeakyReLU())
g_model.add(Dense(units=n_x, activation='tanh',name='g_out'))
print('Generator:')
g_model.summary()
g_model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(lr=g_learning_rate)
             )

This is what the generator model looks like:

[Image: generator model summary]

  3. In the Keras example, we do not define two discriminator networks as we did in the TensorFlow example. Instead, we define one discriminator network and then stitch the generator and discriminator together into the GAN network. The GAN network is then used to train the generator parameters only, and the discriminator network is used to train the discriminator parameters:
# define discriminator

d_model = Sequential()
d_model.add(Dense(units=d_n_neurons[0],  
                  input_shape=(n_x,),
                  name='d_0'
                 ))
d_model.add(LeakyReLU())
d_model.add(Dropout(0.3))
for i in range(1,d_n_layers):
    d_model.add(Dense(units=d_n_neurons[i], 
                      name='d_{}'.format(i)
                     ))
    d_model.add(LeakyReLU())
    d_model.add(Dropout(0.3))
d_model.add(Dense(units=1, activation='sigmoid',name='d_out'))
print('Discriminator:')
d_model.summary()
d_model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.SGD(lr=d_learning_rate)
             )

This is what the discriminator model looks like:

Discriminator:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
d_0 (Dense)                  (None, 256)               200960    
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
d_out (Dense)                (None, 1)                 257       
=================================================================
Total params: 201,217
Trainable params: 201,217
Non-trainable params: 0
_________________________________________________________________
  4. Next, define the GAN network, and set the trainable property of the discriminator model to False, since the GAN will only be used to train the generator. Note that in Keras the trainable flag takes effect when a model is compiled: d_model was compiled while still trainable, so d_model.train_on_batch() updates the discriminator's weights, whereas gan_model is compiled after freezing d_model, so training the GAN updates only the generator:
# define GAN network
d_model.trainable=False
z_in = Input(shape=(n_z,),name='z_in')
x_in = g_model(z_in)
gan_out = d_model(x_in)

gan_model = Model(inputs=z_in,outputs=gan_out,name='gan')
print('GAN:')
gan_model.summary()
gan_model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(lr=g_learning_rate)
             )

This is what the GAN model looks like:

GAN:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
z_in (InputLayer)            (None, 256)               0         
_________________________________________________________________
sequential_1 (Sequential)    (None, 784)               1526288   
_________________________________________________________________
sequential_2 (Sequential)    (None, 1)                 201217    
=================================================================
Total params: 1,727,505
Trainable params: 1,526,288
Non-trainable params: 201,217
_________________________________________________________________
  5. Great, now that we have defined the three models, we have to train them. The training follows this algorithm:
For each epoch:
  For each batch:
    get real images x_batch
    generate noise z_batch
    generate images g_batch using generator model
    combine g_batch and x_batch into x_in and create labels y_out

    set discriminator model as trainable
    train discriminator using x_in and y_out
    
    generate noise z_batch
    set x_in = z_batch and labels y_out = 1
    
    set discriminator model as non-trainable
    train gan model using x_in and y_out, 
        (effectively training generator model)

For the labels, we use 0.9 for real images and 0.1 for fake images, a simple form of label smoothing. Generally, it is suggested that you use label smoothing by picking a random value from 0.0 to 0.3 for fake data and from 0.8 to 1.0 for real data (see the sketch below).
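If you want to try that randomized form of label smoothing, a minimal sketch (replacing the fixed 0.9/0.1 labels used in the loop below) might be:

# hypothetical randomized label smoothing: real images first, then fakes,
# matching the order of x_in = np.concatenate([x_batch, g_batch])
y_out = np.concatenate([
    np.random.uniform(0.8, 1.0, size=batch_size),   # real images
    np.random.uniform(0.0, 0.3, size=batch_size)])  # generated images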

Here is the complete code for training from the notebook:

n_epochs = 400
batch_size = 100
n_batches = int(mnist.train.num_examples / batch_size)
n_epochs_print = 50

for epoch in range(n_epochs+1):
    epoch_d_loss = 0.0
    epoch_g_loss = 0.0
    for batch in range(n_batches):
        x_batch, _ = mnist.train.next_batch(batch_size)
        x_batch = norm(x_batch)
        z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
        g_batch = g_model.predict(z_batch)
        
        x_in = np.concatenate([x_batch,g_batch])
        
        y_out = np.ones(batch_size*2)
        y_out[:batch_size]=0.9
        y_out[batch_size:]=0.1
        
        d_model.trainable=True
        batch_d_loss = d_model.train_on_batch(x_in,y_out)

        z_batch = np.random.uniform(-1.0,1.0,size=[batch_size,n_z])
        x_in=z_batch
        
        y_out = np.ones(batch_size)
            
        d_model.trainable=False
        batch_g_loss = gan_model.train_on_batch(x_in,y_out)
        
        epoch_d_loss += batch_d_loss 
        epoch_g_loss += batch_g_loss 
    if epoch%n_epochs_print == 0:
        average_d_loss = epoch_d_loss / n_batches
        average_g_loss = epoch_g_loss / n_batches
        print('epoch: {0:04d}   d_loss = {1:0.6f}  g_loss = {2:0.6f}'
              .format(epoch,average_d_loss,average_g_loss))
        # predict images using generator model trained            
        x_pred = g_model.predict(z_test)
        display_images(x_pred.reshape(-1,pixel_size,pixel_size))

We printed the results every 50 epochs; here they are up to 350 epochs:

[Image: generated images, epochs 0 to 350]

The model slowly learns to generate good-quality images of handwritten digits from random noise. There are so many variations of GANs that it would take another book to cover them all; however, the implementation techniques are almost always similar to what we have shown here.

Deep Convolutional GAN with TensorFlow and Keras

In DCGAN, both the discriminator and generator are implemented using a deep convolutional network:

  1. In this example, we decided to implement the generator as the following network (a reconstruction sketch follows the summary):

Generator:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
g_in (Dense)                 (None, 3200)              822400    
_________________________________________________________________
g_in_act (Activation)        (None, 3200)              0         
_________________________________________________________________
g_in_reshape (Reshape)       (None, 5, 5, 128)         0         
_________________________________________________________________
g_0_up2d (UpSampling2D)      (None, 10, 10, 128)       0         
_________________________________________________________________
g_0_conv2d (Conv2D)          (None, 10, 10, 64)        204864    
_________________________________________________________________
g_0_act (Activation)         (None, 10, 10, 64)        0         
_________________________________________________________________
g_1_up2d (UpSampling2D)      (None, 20, 20, 64)        0         
_________________________________________________________________
g_1_conv2d (Conv2D)          (None, 20, 20, 32)        51232     
_________________________________________________________________
g_1_act (Activation)         (None, 20, 20, 32)        0         
_________________________________________________________________
g_2_up2d (UpSampling2D)      (None, 40, 40, 32)        0         
_________________________________________________________________
g_2_conv2d (Conv2D)          (None, 40, 40, 16)        12816     
_________________________________________________________________
g_2_act (Activation)         (None, 40, 40, 16)        0         
_________________________________________________________________
g_out_flatten (Flatten)      (None, 25600)             0         
_________________________________________________________________
g_out (Dense)                (None, 784)               20071184  
=================================================================
Total params: 21,162,496
Trainable params: 21,162,496
Non-trainable params: 0
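The full model definition is in the book's notebook. As a hedged reconstruction, a Keras model matching the summary above could be built as follows; the 5x5 kernel size is inferred from the parameter counts, while the tanh activations are an assumption based on the layer names:

from keras.models import Sequential
from keras.layers import (Dense, Activation, Reshape, UpSampling2D,
                          Conv2D, Flatten)

n_z = 256  # noise dimensions, per the GAN summary below
n_x = 784  # MNIST pixels

g_model = Sequential(name='g')
g_model.add(Dense(3200, input_shape=(n_z,), name='g_in'))  # 3200 = 5 * 5 * 128
g_model.add(Activation('tanh', name='g_in_act'))           # activation type assumed
g_model.add(Reshape((5, 5, 128), name='g_in_reshape'))
for i, n_filters in enumerate([64, 32, 16]):
    g_model.add(UpSampling2D(name='g_{}_up2d'.format(i)))  # doubles height and width
    g_model.add(Conv2D(n_filters, kernel_size=(5, 5), padding='same',
                       name='g_{}_conv2d'.format(i)))
    g_model.add(Activation('tanh', name='g_{}_act'.format(i)))  # assumed
g_model.add(Flatten(name='g_out_flatten'))
g_model.add(Dense(n_x, activation='tanh', name='g_out'))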
  2. The generator is a stronger network, having three convolutional layers followed by a tanh activation. We define the discriminator network as follows (a matching sketch follows this summary too):
Discriminator:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
d_0_reshape (Reshape)        (None, 28, 28, 1)         0         
_________________________________________________________________
d_0_conv2d (Conv2D)          (None, 28, 28, 64)        1664      
_________________________________________________________________
d_0_act (Activation)         (None, 28, 28, 64)        0         
_________________________________________________________________
d_0_maxpool (MaxPooling2D)   (None, 14, 14, 64)        0         
_________________________________________________________________
d_out_flatten (Flatten)      (None, 12544)             0         
_________________________________________________________________
d_out (Dense)                (None, 1)                 12545     
=================================================================
Total params: 14,209
Trainable params: 14,209
Non-trainable params: 0
_________________________________________________________________
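Continuing the same hedged reconstruction for the discriminator (the activation is again an assumption; the 5x5 kernel and 'same' padding are inferred from the parameter counts and output shapes):

from keras.layers import MaxPooling2D

d_model = Sequential(name='d')
d_model.add(Reshape((28, 28, 1), input_shape=(n_x,), name='d_0_reshape'))
d_model.add(Conv2D(64, kernel_size=(5, 5), padding='same', name='d_0_conv2d'))
d_model.add(Activation('relu', name='d_0_act'))  # activation type assumed
d_model.add(MaxPooling2D(name='d_0_maxpool'))    # halves height and width
d_model.add(Flatten(name='d_out_flatten'))       # 14 * 14 * 64 = 12544
d_model.add(Dense(1, activation='sigmoid', name='d_out'))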
  3. The GAN network is composed of the discriminator and generator, as demonstrated previously:
GAN:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
z_in (InputLayer)            (None, 256)               0         
_________________________________________________________________
g (Sequential)               (None, 784)               21162496  
_________________________________________________________________
d (Sequential)               (None, 1)                 14209     
=================================================================
Total params: 21,176,705
Trainable params: 21,162,496
Non-trainable params: 14,209
_________________________________________________________________

When we run this model for 400 epochs, we get the following output:

[Image: generated images over 400 epochs of DCGAN training]

As you can see, the DCGAN is able to generate high-quality digits starting from epoch 100 itself. The DCGAN has been used for style transfer, generation of images and titles, and for image algebra, namely taking parts of one image and adding them to parts of another image.

We built a simple GAN in TensorFlow and Keras and applied it to generate images from the MNIST dataset. We also built a DCGAN where the generator and discriminator consisted of convolutional networks.

Do check out the book Mastering TensorFlow 1.x  to explore advanced features of TensorFlow 1.x and obtain in-depth knowledge of TensorFlow for solving artificial intelligence problems.


Read Next:

5 reasons to learn Generative Adversarial Networks (GANs) in 2018

Implementing a simple Generative Adversarial Network (GANs)

Getting to know Generative Models and their types

