
In this second and third part of this series on Chainer, we are going to train an autoencoder. These two parts mostly consist of code and explanations of the internal architecture of the framework. The autoencoder that we will implement has one hidden layer with 2 nodes, and 3 nodes in each of the input and output layers. The dimension of the output layer needs to equal the dimension of the input layer since we are training an autoencoder. The goal is to train this model to compress the 3-dimensional data into 2 dimensions in the hidden layer and then reconstruct the original data from that hidden representation.

Preparing the Data

Let’s create 1000 training samples, each one with 3 random floating-point values, matching the dimension of the input layer of the model. This data is stored in a single NumPy array of shape (1000, 3). A copy of this data is created so that it can be used as the target values during the training phase. None of this is specific to Chainer; it is plain Python using NumPy. It is worth noting that this is all the data preparation we need. Later, during the training phase, we will convert this data into Chainer variables as described in Part 1 of this series.

import numpy as np

input_size = 3
train_size = 1000

x_train = np.random.rand(train_size, input_size).astype(np.float32)
y_train = x_train.copy()

Creating a Model

Defining a model in Chainer is done in code, in contrast to frameworks such as Caffe, where models are defined in configuration files (.prototxt). It is therefore quite easy to debug. You might, however, need to get used to the assertion errors that Chainer throws at you when the model layers aren’t compatible or variable dimensions aren’t what the framework expects. You can wrap all of the network definitions, such as the layers and the loss function, in a class that inherits from chainer.Chain as follows.

from chainer import Chain
from chainer import links as L
from chainer import functions as F

class Autoencoder(Chain):

    def __init__(self):
        # Layer / Connection definitions in the constructor
        super().__init__(
            l1=L.Linear(3, 2),
            l2=L.Linear(2, 3)
        )
        self.train = True

    def __call__(self, x, t=None):
        # Forward pass; t is only required when train is True
        h = self.l1(x)
        y = self.l2(h)

        if self.train:
            # Training mode: compute and return the loss
            self.loss = F.mean_squared_error(y, t)
            return self.loss
        else:
            # Inference mode: return the reconstruction
            return y

This is elegant and readable. It is not the only way to implement a model in Chainer, but it is a commonly seen pattern. For instance, notice that the __call__ method, which is invoked directly on an instance of this class (for example, model = Autoencoder(); model(x, t)), acts as the loss function when the model's train property is set to True. It takes both the input and the target, performs a forward pass, and then computes and returns the loss. The model can be used as a regular feed-forward network by setting the train property to False and skipping the target argument. Let’s take a closer look at the connections and the loss function.
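The snippet below calls the model in both modes. It is a minimal usage sketch that assumes the training data from the previous section is in scope; chainer.Variable is described in Part 1 and also appears in the training code further down.

from chainer import Variable

model = Autoencoder()

x = Variable(x_train[:5])
t = Variable(y_train[:5])

# Training mode (the default): the call returns the mean squared error loss
loss = model(x, t)
print(loss.data)

# Inference mode: the call returns the reconstruction of the input
model.train = False
y = model(x)
print(y.data)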

Links

Links, or in this case the layer connections defined by chainer.links.Linear, are subclasses of chainer.link.Link, the basic building block of a network. The linear link included in this autoencoder is usually referred to as a fully connected layer. When no other arguments are passed to the constructor of the link, a bias vector is created behind the scenes with an initial value of 0. You may pass nobias=True to skip the bias altogether or set the initial bias vector to arbitrary values. Other links besides the linear one include convolution layers, inception layers from GoogLeNet and LSTM layers, to mention a few. The model itself inherits from chainer.link.Chain, which is a subclass of chainer.link.Link and is basically a container for multiple links.
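As a small illustration of the bias behaviour described above, the sketch below constructs a linear link with the default zero-initialized bias and one without any bias at all, using the nobias=True option mentioned in the text.

# Default: a (2, 3) weight matrix plus a bias vector initialized to zeros
fc = L.Linear(3, 2)
print(fc.W.data.shape)  # (2, 3)
print(fc.b.data)        # a length-2 vector of zeros

# With nobias=True, no bias parameter is created for the link
fc_no_bias = L.Linear(3, 2, nobias=True)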

The weights and biases of each layer can be accessed directly through chainer.links.Linear.W and chainer.links.Linear.b, so we'd access model.l1.b to get the bias vector of the first layer. If a model is trained using another framework such as Caffe, those parameters can be loaded into memory and then copied over to any Chainer model by directly writing to the weight and bias values. It is therefore quite easy to convert a Caffe model to a Chainer model.
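A minimal sketch of this, using the autoencoder defined earlier; the arrays written into the parameters below are just placeholders standing in for values loaded from another framework.

model = Autoencoder()

# Read the current parameters of the first layer
print(model.l1.W.data)  # (2, 3) weight matrix
print(model.l1.b.data)  # bias vector of length 2

# Overwrite them in place with externally loaded values (placeholders here)
model.l1.W.data[...] = np.full((2, 3), 0.5, dtype=np.float32)
model.l1.b.data[...] = np.zeros(2, dtype=np.float32)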

Functions

The function module, chainer.functions, contains various loss functions such as the mean squared error used in the example, as well as activation functions such as sigmoid, tanh, ReLU and softmax. It also contains dropout, pooling functions, accuracy evaluation and basic arithmetic. It is a wide set of functions, but they all inherit from the chainer.function.Function base class, which means that they all implement the forward pass and backpropagation logic. If, for instance, you want to implement your own loss function in Chainer, you'd also have to inherit from chainer.function.Function and implement those methods. More functions are introduced in later sections.
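To get a feel for the module, the short snippet below applies a few of these functions to a small Variable; relu, sigmoid and mean_squared_error are all part of chainer.functions.

from chainer import Variable

x = Variable(np.array([[-1.0, 0.5, 2.0]], dtype=np.float32))

print(F.relu(x).data)                   # [[ 0.   0.5  2. ]]
print(F.sigmoid(x).data)
print(F.mean_squared_error(x, x).data)  # 0.0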

Training a Model

The code below shows the full training loop. It runs for 1000 epochs, meaning that it goes through the complete training set 1000 times. The order in which the training samples are iterated over is randomly shuffled in each epoch. We also split up the training samples into 10 batches so the weights are only updated 10 times in each epoch.

# Assume that this code follows the previous data preparation and model
# definition code

from chainer import Variable, optimizers

model = Autoencoder()

learning_rate = 0.1

# Introducing the optimizer. It will be explained in the next section
optimizer = optimizers.SGD(lr=learning_rate)
optimizer.setup(model)

epochs = 1000
batch_size = int(train_size / 10)

for epoch in range(epochs):

    # Randomly change the order of the training samples in each epoch
    indexes = np.random.permutation(train_size)

    # Accumulate the loss over the epoch
    epoch_sum_loss = 0

    for i in range(0, train_size, batch_size):
        batch_indexes = indexes[i : i + batch_size]
        batch_x_data = np.asarray(x_train[batch_indexes])
        batch_y_data = np.asarray(y_train[batch_indexes])

        x = Variable(batch_x_data)
        t = Variable(batch_y_data)

        optimizer.update(model, x, t)
        epoch_sum_loss += model.loss * batch_size

    epoch_avg_loss = epoch_sum_loss / train_size

    print('Epoch: {} Loss: {}'.format(epoch, epoch_avg_loss.data))

Running the code above might output something like this. You can see that the average loss is decreasing.

Epoch: 0 Loss: 0.48210659623146057
Epoch: 1 Loss: 0.1797855794429779
Epoch: 2 Loss: 0.1128358468413353
Epoch: 3 Loss: 0.08458908647298813
Epoch: 4 Loss: 0.07391669601202011
Epoch: 5 Loss: 0.06650342792272568
Epoch: 6 Loss: 0.05791966989636421
Epoch: 7 Loss: 0.0553070604801178
Epoch: 8 Loss: 0.05461772903800011
Epoch: 9 Loss: 0.05078549310564995
...
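Once the loop has finished, the trained model can be switched to inference mode and asked to reconstruct a sample, which is exactly what the autoencoder was trained to do. This is a minimal check rather than part of the training code itself.

# Reconstruct one training sample from its 2-dimensional hidden representation
model.train = False
x = Variable(x_train[:1])
y = model(x)

print(x.data)  # original 3-dimensional sample
print(y.data)  # its reconstruction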

Most of the code should be familiar, but you might wonder what the optimizer is. It is covered in Part 3, along with batches, more complex networks, running the code on the GPU, and saving and loading data.

Summary

Defining and training neural networks with Chainer is intuitive and requires little code, and its design makes it easy to maintain models and experiment with various hyperparameters. In this second part of the series, we implemented a neural network and trained it on randomly generated data, introducing common patterns along the way, such as how to design the model and the loss function. Stay tuned for Part 3, where we cover the optimizer, batches, more complex networks and running the code on the GPU.

About the Author

Hiroyuki Vincent Yamazaki is a graduate student at KTH, Royal Institute of Technology in Sweden, currently conducting research in convolutional neural networks at Keio University in Tokyo, partially using Chainer as a part of a double-degree programme.

GitHub

LinkedIn 
