
In this post, I will show you how to construct your own network model in Chainer. First, I will introduce Chainer’s basic concepts as its building blocks. Then, you will see how to construct a network model from them. Finally, I will define a multi-class classifier bound to the network model.

Concepts

Let’s start with the basic concepts that Chainer provides as building blocks.

Procedural abstractions

  • Chains
  • Links
  • Functions

Data abstraction

  • Variables

Chainer has three procedural abstractions, from the highest level down: chains, links, and functions. Chains are at the highest level and represent entire network models. They consist of links and/or other chains. Links are like layers in a network model; they hold learnable parameters to be optimized through training. Functions are the most fundamental of Chainer’s procedural abstractions: they take inputs and return outputs. Links use functions to apply their parameters to their inputs and produce their outputs.
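
To make the distinction concrete, here is a minimal sketch; the layer sizes are arbitrary and the snippet is only illustrative. A Linear link carries a learnable weight matrix and bias, while relu is a parameter-free function:

import numpy as np
import chainer.functions as F
import chainer.links as L

fc = L.Linear(4, 3)                    # a link: holds weight W and bias b
x = np.ones((1, 4), dtype=np.float32)  # one 4-dimensional input vector
h = fc(x)                              # the link applies W and b to x
y = F.relu(h)                          # a function: no parameters at all
print(fc.W.shape)                      # (3, 4), the learnable weights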

At the data abstraction level, Chainer has variables. They represent the inputs and outputs of chains, links, and functions. As their actual data representation, they wrap numpy and cupy n-dimensional arrays.
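
As a minimal sketch of the data abstraction (the shapes here are arbitrary), a variable simply wraps a numpy array; on a GPU it would wrap a cupy array the same way:

import numpy as np
from chainer import Variable

x = Variable(np.zeros((2, 784), dtype=np.float32))
print(x.shape)  # (2, 784): a batch of two 784-dimensional vectors
print(x.data)   # the wrapped numpy array itself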

In the following sections, we will construct our own network models with these concepts.

Construct network models in Chainer

In this section, I describe how to construct a multi-layer perceptron (MLP) with three layers on top of the basic concepts shown in the previous section. It is very simple.

from chainer import Chain
import chainer.functions as F
import chainer.links as L

class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),  # input layer: 784 -> 100
            l2=L.Linear(100, 100),  # hidden layer: 100 -> 100
            l3=L.Linear(100, 10),   # output layer: 100 -> 10
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

We define the MLP class, derived from the Chain class. Then we implement two methods, __init__ and __call__.

The __init__ method initializes the links that the chain has. This chain has three fully connected layers: chainer.links.Linear, or L.Linear above. The first layer, named l1, takes 784-dimensional input vectors, each representing a 28×28-pixel grayscale handwritten digit image. The last layer, named l3, returns 10-dimensional output vectors, which correspond to the ten digits from 0 to 9. Between them is a hidden layer, named l2, whose inputs and outputs are both 100-dimensional.
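
For instance, before entering l1, a 28×28 image has to be laid out as a 784-dimensional vector (784 = 28 × 28). A hypothetical illustration with dummy data:

import numpy as np

image = np.random.rand(28, 28).astype(np.float32)  # a dummy 28x28 "image"
x = image.reshape(1, 784)                          # a batch of one flattened image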

The __call__ method is called on forward propagation. It takes an input x as a Chainer variable and applies the three Linear layers to it, with ReLU activation functions after the l1 and l2 layers. It then returns a Chainer variable y, the output of the l3 layer. Because the two ReLU activation functions are Chainer functions, they have no learnable parameters, so you do not have to initialize them in the __init__ method.

This code does forward propagation as well as network construction behind the scenes. That is Chainer’s magic: backward propagation is automatically computed based on the network constructed here when we optimize it.
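
As a quick smoke test (with dummy inputs, not real MNIST data), we can instantiate the MLP above and push a small batch through it:

import numpy as np

model = MLP()
x = np.random.rand(2, 784).astype(np.float32)  # two dummy input vectors
y = model(x)    # forward propagation; the network is built as a side effect
print(y.shape)  # (2, 10): one 10-dimensional score vector per input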

Of course, you may write the __call__ method as follows. The local variables h1 and h2 above are just for clarity.

def __call__(self, x):
    return self.l3(F.relu(self.l2(F.relu(self.l1(x)))))

Define a multi-class classifier

Another chain is Classifier, which will be bound to the MLP chain defined in the previous section. Classifier is for general multi-class classification: it computes the loss value and accuracy of a network model for a given input vector and its ground truth.

from chainer import Chain
import chainer.functions as F

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y, t)
        self.accuracy = F.accuracy(y, t)
        return self.loss

We define the Classifier class, which is derived from the Chain class just like the MLP class, because it holds a chain as its predictor. We similarly implement the __init__ and __call__ methods.

The __init__ method takes a predictor parameter, which is a chain (a link is also acceptable), to initialize the class.

The __call__ method takes two inputs, x and t. These are the input vectors and their ground-truth labels, respectively. Chains passed to Chainer optimizers should follow this protocol. First, we give the input vector x to the predictor, which is the MLP model in this post, to get the result y of forward propagation. Then the loss value and accuracy are computed with the softmax_cross_entropy and accuracy functions against the given ground truth t. Finally, it returns the loss value. The accuracy can be accessed as an attribute at any time.

We initialize this classifier bound to the MLP model as follows. The resulting model is what gets passed to a Chainer optimizer.

model = Classifier(MLP())
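
As a minimal check of this protocol (using dummy inputs and arbitrary int32 labels, which softmax_cross_entropy expects), we can compute a loss and run backward propagation on it:

import numpy as np

model = Classifier(MLP())
x = np.random.rand(2, 784).astype(np.float32)  # two dummy input vectors
t = np.array([3, 7], dtype=np.int32)           # arbitrary ground-truth labels

loss = model(x, t)          # forward propagation returning the loss
loss.backward()             # backward propagation, computed automatically
print(loss.data)            # the scalar loss value
print(model.accuracy.data)  # mini-batch accuracy, kept as an attribute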

Conclusion

Now we have our own network model. In this post I introduced the basic concepts of Chainer. Then we implemented the MLP class, which models a multi-layer perceptron, and the Classifier class, which computes the loss value and accuracy to be optimized against a given ground truth.

from chainer import Chain
import chainer.functions as F
import chainer.links as L

class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.loss = F.softmax_cross_entropy(y, t)
        self.accuracy = F.accuracy(y, t)
        return self.loss

model = Classifier(MLP())

As some next steps, you may want to learn:

  • How to optimize the model.
  • How to train and test the model on the MNIST dataset.
  • How to accelerate training of the model using a GPU.

Chainer provides an MNIST example program in the chainer/examples/mnist directory, which will help you.

About the author

Masayuki Takagi is an entrepreneur and software engineer from Japan. His professional experience spans advertising and deep learning, serving large Japanese corporations. His personal interests include fluid simulation, GPU computing, FPGAs, compilers, and programming language design. Common Lisp is his most beloved programming language, and he is the author of the cl-cuda library. Masayuki is a graduate of the University of Tokyo and lives in Tokyo with his buddy Plum, a long-coat Chihuahua, and aco.
