How to start Chainer

October 7, 2016 - 12:00 am

Chainer is a powerful, flexible, and intuitive framework for deep learning. In this post, I will help you to get started with Chainer.

Why is Chainer important to learn? With its “Define by run” approach, it provides flexible and easy ways to construct network models, and to debug them when you run into trouble. It also works efficiently with well-tuned NumPy. Additionally, it provides transparent GPU access through the fairly sophisticated CuPy library, and it is actively developed.
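
To make “Define by run” concrete: the computation graph is recorded while ordinary Python code executes, so loops and conditionals shape the network directly. The following is a toy sketch of that idea in plain Python, not Chainer's implementation; the Var class and the forward function are hypothetical names invented for this illustration.

```python
class Var(object):
    """A scalar that remembers which values it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # Walk back along the recorded graph, applying the chain rule.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def forward(x, w, n_layers):
    h = x
    for _ in range(n_layers):  # a plain Python loop decides the graph depth
        h = h * w
    return h


x, w = Var(2.0), Var(3.0)
y = forward(x, w, n_layers=2)  # y = x * w * w
y.backward()
print(y.value, x.grad, w.grad)  # 18.0 9.0 12.0
```

Because the graph is just a by-product of running forward, changing n_layers or adding an if statement changes the network without any separate graph definition step, and a debugger or print statement can inspect intermediate values as they are computed.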

How to install Chainer

Here I’ll show you how to install Chainer. Chainer has CUDA and cuDNN support. If you want to use Chainer with them, follow along in the “Install Chainer with CUDA and cuDNN support” section.

Install Chainer via pip

You can install Chainer easily via pip, which is the officially recommended way.

$ pip install chainer

Install Chainer from source code

You can also install it from its source code.

$ tar zxf chainer-x.x.x.tar.gz
$ cd chainer-x.x.x
$ python setup.py install

Install Chainer with CUDA and cuDNN support

Chainer has GPU support with Nvidia CUDA. If you want to use Chainer with CUDA support, install the related software in the following order.

Install CUDA Toolkit
(optional) Install cuDNN
Install Chainer

Chainer has cuDNN support as well as CUDA support. cuDNN is a library provided by Nvidia for accelerating deep neural networks. If you want to enable cuDNN support, install cuDNN in an appropriate path before installing Chainer. The recommended install path is the CUDA Toolkit directory.

$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64

After that, if the CUDA Toolkit (and cuDNN if you want) is installed in its default path, the Chainer installer finds them automatically.

$ pip install chainer

If CUDA Toolkit is in a directory other than the default one, setting the CUDA_PATH environment variable helps.

$ CUDA_PATH=/opt/nvidia/cuda pip install chainer

In case you have already installed Chainer before setting up CUDA (and cuDNN), reinstall Chainer after setting them up.

$ pip uninstall chainer
$ pip install chainer --no-cache-dir
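
To confirm that the reinstall picked up CUDA, you can ask Chainer itself. The sketch below is a hypothetical helper (describe_chainer is not part of Chainer). It assumes the flags are exposed as chainer.cuda.available and chainer.cuda.cudnn_enabled, as in Chainer releases of this era; other releases may expose them elsewhere, so the helper reports None for anything it cannot find.

```python
import importlib


def describe_chainer():
    """Report whether Chainer is importable and whether it sees CUDA/cuDNN.

    Hypothetical helper for illustration; unknown values are left as None.
    """
    info = {"installed": False, "version": None, "cuda": None, "cudnn": None}
    try:
        chainer = importlib.import_module("chainer")
    except Exception:
        return info  # Chainer is missing or fails to import here.
    info["installed"] = True
    info["version"] = getattr(chainer, "__version__", None)
    cuda = getattr(chainer, "cuda", None)
    # Assumed attribute names; getattr keeps this safe if they are absent.
    info["cuda"] = getattr(cuda, "available", None)
    info["cudnn"] = getattr(cuda, "cudnn_enabled", None)
    return info


print(describe_chainer())
```

If "cuda" comes back False after you have installed the CUDA Toolkit, that is a hint to check CUDA_PATH and reinstall as described above.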

Troubleshooting

If you have some trouble installing Chainer, using the -vvvv option with the pip command may help you. It shows all logs during installation.

$ pip install chainer -vvvv

Train multi-layer perceptron with MNIST dataset

Now I’ll cover how to train a multi-layer perceptron (MLP) model on the MNIST handwritten digit dataset with Chainer. It is very simple because Chainer provides an example program to do it. Additionally, we run the program on GPU to compare training efficiency between CPU and GPU.

Run MNIST example program

Chainer provides an example program to train the multi-layer perceptron (MLP) model with the MNIST dataset.

$ cd chainer/examples/mnist
$ ls *.py
data.py net.py train_mnist.py

data.py, which is used by train_mnist.py, is a script for downloading the MNIST dataset from the Internet. net.py is also used by train_mnist.py; it is the script where the MLP network model is defined.

train_mnist.py is a script we run to train the MLP model. It does the following things:

Download and convert the MNIST dataset.
Prepare an MLP model.
Train the MLP model with the MNIST dataset.
Test the trained MLP model.

Here we run the train_mnist.py script on the CPU. It may take a couple of hours depending on your machine spec.

$ python train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple

load MNIST dataset
Downloading train-images-idx3-ubyte.gz...
Done
Downloading train-labels-idx1-ubyte.gz...
Done
Downloading t10k-images-idx3-ubyte.gz...
Done
Downloading t10k-labels-idx1-ubyte.gz...
Done
Converting training data...
Done
Converting test data...
Done
Save output...
Done
Convert completed
epoch 1
graph generated
train mean loss=0.192189417146, accuracy=0.941533335938, throughput=121.97765842 images/sec
test mean loss=0.108210202637, accuracy=0.966000005007
epoch 2
train mean loss=0.0734026790201, accuracy=0.977350010276, throughput=122.715585263 images/sec
test mean loss=0.0777539889357, accuracy=0.974500003457
...
epoch 20
train mean loss=0.00832913763473, accuracy=0.997666668793, throughput=121.496046895 images/sec
test mean loss=0.131264564424, accuracy=0.978300007582
save the model
save the optimizer

After training for 20 epochs, the MLP model is well trained to achieve a loss value of around 0.131 and accuracy of around 0.978 in testing. The trained model and state files are stored as mlp.model and mlp.state respectively, so we can resume training or testing with them.
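
If you want to compare epochs programmatically rather than by eye, the log lines are easy to parse. The following is a minimal sketch that assumes the "key=value" layout shown in the output above; parse_log_line is a hypothetical helper, not something the example program provides.

```python
import re

# Matches "key=value" pairs such as "loss=0.19" or "throughput=121.9".
PAIR = re.compile(r"(\w+)=([0-9.eE+-]+)")


def parse_log_line(line):
    """Return (phase, metrics) for a 'train ...' or 'test ...' log line."""
    phase = line.split()[0]
    metrics = {key: float(value) for key, value in PAIR.findall(line)}
    return phase, metrics


# The two epoch-20 lines copied from the run above.
log = [
    "train mean loss=0.00832913763473, accuracy=0.997666668793, "
    "throughput=121.496046895 images/sec",
    "test mean loss=0.131264564424, accuracy=0.978300007582",
]
for line in log:
    phase, metrics = parse_log_line(line)
    print(phase, round(metrics["loss"], 3), round(metrics["accuracy"], 3))
```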

Accelerate training MLP with GPU

The train_mnist.py script has a --gpu option to train the MLP model on GPU. To use it, specify which GPU to use by giving its index, an integer beginning from 0. A value of -1 means the CPU is used.

$ python train_mnist.py --gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
Network type: simple

load MNIST dataset
epoch 1
graph generated
train mean loss=0.189480076165, accuracy=0.942750002556, throughput=12713.8872884 images/sec
test mean loss=0.0917134844698, accuracy=0.97090000689
epoch 2
train mean loss=0.0744868403107, accuracy=0.976266676188, throughput=14545.6445472 images/sec
test mean loss=0.0737037020434, accuracy=0.976600006223
...
epoch 20
train mean loss=0.00728972356146, accuracy=0.9978333353, throughput=14483.9658281 images/sec
test mean loss=0.0995463995047, accuracy=0.982400006056
save the model
save the optimizer

After training for 20 epochs, the MLP model trained on GPU performs as well as the one trained on CPU, with a loss value of around 0.0995 and accuracy of around 0.982 in testing.

Comparing average throughput, the GPU processes more than 10,000 images/sec, while the CPU run in the previous section managed only around 100 images/sec. In fairness, the CPU of my machine is rather weak compared with its GPU, and the CPU version runs in a single thread, but this result should be enough to illustrate the efficiency of training on the GPU.
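
To put those throughput figures in terms of wall-clock time, here is a back-of-the-envelope calculation. It assumes the standard MNIST training set of 60,000 images and counts only the training passes, ignoring testing and data loading, so treat the results as rough estimates.

```python
TRAIN_IMAGES = 60000  # size of the MNIST training set
EPOCHS = 20           # matches "# epoch: 20" in the logs above

# Epoch-20 throughput values copied from the two runs above.
cpu_throughput = 121.496046895   # images/sec
gpu_throughput = 14483.9658281   # images/sec


def training_seconds(throughput, images=TRAIN_IMAGES, epochs=EPOCHS):
    """Time spent on training passes only (testing and I/O are ignored)."""
    return images * epochs / throughput


cpu_s = training_seconds(cpu_throughput)
gpu_s = training_seconds(gpu_throughput)
print("CPU: %.1f hours" % (cpu_s / 3600))
print("GPU: %.1f minutes" % (gpu_s / 60))
print("speedup: about %.0fx" % (gpu_throughput / cpu_throughput))
```

Under these assumptions the CPU run needs roughly 2.7 hours of training time, the GPU run well under two minutes, and the ratio is a little under 120 times.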

Little code required to run Chainer on GPU

Little code is required to run Chainer on GPU. In the train_mnist.py script, only three steps are needed:

Select the GPU device to use.
Transfer the network model to GPU.
Allocate the multi-dimensional arrays on the GPU device.

The following lines from train_mnist.py are the relevant ones.

...
if args.gpu >= 0:
   cuda.get_device(args.gpu).use()
   model.to_gpu()
xp = np if args.gpu < 0 else cuda.cupy
...

and

...
for i in six.moves.range(0, N, batchsize):
   x = chainer.Variable(xp.asarray(x_train[perm[i:i + batchsize]]))
   t = chainer.Variable(xp.asarray(y_train[perm[i:i + batchsize]]))
...
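
The xp idiom in the two excerpts above keeps the training loop identical on CPU and GPU: the same code calls xp.asarray, and only the module bound to xp changes. Here is a self-contained sketch of that pattern without Chainer. It assumes cupy can be imported as a standalone package (in the script above it is reached through cuda.cupy), and select_array_module and iter_minibatches are hypothetical helper names.

```python
import numpy as np

try:
    import cupy  # only present when CUDA support is installed
except ImportError:
    cupy = None


def select_array_module(gpu):
    """Return cupy for a non-negative GPU index when available, else numpy."""
    if gpu >= 0 and cupy is not None:
        return cupy
    return np


def iter_minibatches(x_train, y_train, batchsize, xp):
    """Yield shuffled (x, t) minibatches as arrays of the chosen module."""
    n = len(x_train)
    perm = np.random.permutation(n)  # shuffle indices on the host
    for i in range(0, n, batchsize):
        idx = perm[i:i + batchsize]
        yield xp.asarray(x_train[idx]), xp.asarray(y_train[idx])


# Toy stand-ins for the MNIST arrays: 10 samples of 784 pixels each.
x_train = np.zeros((10, 784), dtype=np.float32)
y_train = np.arange(10, dtype=np.int32)

xp = select_array_module(gpu=-1)  # -1 selects the CPU, as in the script
for x, t in iter_minibatches(x_train, y_train, batchsize=4, xp=xp):
    print(type(x).__module__, x.shape, t.shape)
```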

Run Caffe reference models on Chainer

Chainer has a powerful feature to interpret pre-trained Caffe reference models. This lets us use those models easily on Chainer without enormous training effort. In this part we import the pre-trained GoogLeNet model and use it to predict what an input image shows.

Model Zoo directory

Chainer has a Model Zoo directory in the following path.

$ cd chainer/examples/modelzoo

Download Caffe reference model

First, download a Caffe reference model from the BVLC Model Zoo. Chainer provides a simple script for that. This time we use the pre-trained GoogLeNet model.

$ python download_model.py googlenet
Downloading model file...
Done
$ ls *.caffemodel
bvlc_googlenet.caffemodel

Download ILSVRC12 mean file

We also download the ILSVRC12 mean image file, which Chainer uses. The mean image is subtracted from each image we want to classify before prediction.

$ python download_mean_file.py
Downloading ILSVRC12 mean file for NumPy...
Done
$ ls *.npy
ilsvrc_2012_mean.npy
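
Mean subtraction itself is a simple element-wise operation. The sketch below uses a constant array as a stand-in for a real image and the per-channel constants (104, 117, 123) that the prediction script later in this post uses; a full mean image of matching shape would be subtracted in the same way, element by element.

```python
import numpy as np

# Per-channel constants used later in this post, in channel-first layout.
channel_mean = np.array([104, 117, 123], dtype=np.float32).reshape(3, 1, 1)

# A fake 3x224x224 "image" whose pixels are all 128, standing in for real data.
image = np.full((3, 224, 224), 128, dtype=np.float32)

# Broadcasting subtracts each channel's mean from every pixel of that channel.
centered = image - channel_mean
print(centered[:, 0, 0])
```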

Get ILSVRC12 label descriptions

Because the Caffe reference model identifies categories only by integer label index, its output alone does not tell us which index means which category of images. So we get label descriptions from BVLC. The row numbers correspond to the label indices of the model.

$ wget -c http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
$ tar -xf caffe_ilsvrc12.tar.gz
$ cat synset_words.txt | awk '{$1=""; print}' > labels.txt
$ cat labels.txt
tench, Tinca tinca
goldfish, Carassius auratus
great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
tiger shark, Galeocerdo cuvieri
hammerhead, hammerhead shark
electric ray, crampfish, numbfish, torpedo
stingray
cock
hen
ostrich, Struthio camelus
...

Mac OS X does not have a wget command, so you can use the curl command instead.

$ curl -O http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
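
If awk is not at hand, the same conversion can be done in Python. This is a sketch that assumes each line of synset_words.txt has the layout "<id> <description>", which is what the awk command above relies on; the two sample lines are illustrative and parse_synset_lines is a hypothetical helper.

```python
def parse_synset_lines(lines):
    """Drop the leading ID from each line and return descriptions in order."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split once on the first space: "<id> <description>".
        _, description = line.split(" ", 1)
        labels.append(description)
    return labels


# Illustrative lines in the assumed "<id> <description>" layout.
sample = [
    "n01440764 tench, Tinca tinca",
    "n01443537 goldfish, Carassius auratus",
]
labels = parse_synset_lines(sample)
print(labels[1])  # goldfish, Carassius auratus
```

The position of each description in the returned list is the label index, so labels[i] describes model output i.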

Python script to run Caffe reference model

While there is an evaluate_caffe_net.py script in the modelzoo directory, it is for evaluating the accuracy of interpreted Caffe reference models against the ILSVRC12 dataset. Here we want to input a single image and predict what it shows, so we use another script, predict_caffe_net.py, which is not included in the modelzoo directory. Here is the code listing.

#!/usr/bin/env python

from __future__ import print_function
import argparse
import sys

import numpy as np
from PIL import Image

import chainer
import chainer.functions as F
from chainer.links import caffe

in_size = 224

# Parse command line arguments.
parser = argparse.ArgumentParser()
parser.add_argument('image', help='Path to input image file.')
parser.add_argument('model', help='Path to pretrained GoogLeNet Caffe model.')
args = parser.parse_args()

# Constant mean over spatial pixels.
mean_image = np.ndarray((3, in_size, in_size), dtype=np.float32)
mean_image[0] = 104
mean_image[1] = 117
mean_image[2] = 123

# Prepare input image.
def resize(image, base_size):
   # Scale so that the shorter side becomes base_size.
   # Integer division keeps the sizes as ints on both Python 2 and 3.
   width, height = image.size
   if width > height:
       new_width = base_size * width // height
       new_height = base_size
   else:
       new_width = base_size
       new_height = base_size * height // width
   return image.copy().resize((new_width, new_height))

def clip_center(image, size):
   # Crop a size x size square from the center of the image.
   width, height = image.size
   width_offset = (width - size) // 2
   height_offset = (height - size) // 2
   box = (width_offset, height_offset,
           width_offset + size, height_offset + size)
   image1 = image.crop(box)
   image1.load()
   return image1

image = Image.open(args.image)
image = resize(image, 256)
image = clip_center(image, in_size)
image = np.asarray(image).transpose(2, 0, 1).astype(np.float32)
image -= mean_image

# Make input data from the image.
x_data = np.ndarray((1, 3, in_size, in_size), dtype=np.float32)
x_data[0] = image

# Load Caffe model file.
print('Loading Caffe model file ...', file=sys.stderr)
func = caffe.CaffeFunction(args.model)
print('Loaded', file=sys.stderr)

# Predict input image.
def predict(x):
   y, = func(inputs={'data': x}, outputs=['loss3/classifier'],
             disable=['loss1/ave_pool', 'loss2/ave_pool'],
             train=False)
   return F.softmax(y)

x = chainer.Variable(x_data, volatile=True)
y = predict(x)

# Print prediction scores.
categories = np.loadtxt("labels.txt", str, delimiter="\t")
top_k = 20
# Sort (score, name) pairs by score, highest first.
result = sorted(zip(y.data[0].tolist(), categories),
                key=lambda pair: pair[0], reverse=True)
for rank, (score, name) in enumerate(result[:top_k], start=1):
   print('#%d | %4.1f%% | %s' % (rank, score * 100, name))

You may need to install Pillow to run this script. Pillow is a Python imaging library and we use it to manipulate input images.

$ pip install pillow
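
The geometry in resize and clip_center can be checked without Pillow or an image file. The sketch below repeats the same arithmetic on plain integers; resized_size and center_box are hypothetical names, and the 640x480 input is just an example.

```python
def resized_size(width, height, base_size):
    """Scale so the shorter side equals base_size, keeping the aspect ratio."""
    if width > height:
        return base_size * width // height, base_size
    return base_size, base_size * height // width


def center_box(width, height, size):
    """Return the (left, upper, right, lower) box of a centered square crop."""
    left = (width - size) // 2
    upper = (height - size) // 2
    return left, upper, left + size, upper + size


# A hypothetical 640x480 input, resized to 256 on the short side, cropped to 224.
new_width, new_height = resized_size(640, 480, 256)
print((new_width, new_height))                  # (341, 256)
print(center_box(new_width, new_height, 224))   # (58, 16, 282, 240)
```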

Run Caffe reference model

Now you can run the pre-trained GoogLeNet model on Chainer for prediction. This time, we use the following JPEG image of a dog, specifically a long coat Chihuahua.

$ ls *.jpg
image.jpg

Sample image

To predict, just run the predict_caffe_net.py script.

$ python predict_caffe_net.py image.jpg bvlc_googlenet.caffemodel
Loading Caffe model file bvlc_googlenet.caffemodel...
Loaded
#1 | 48.2% | Chihuahua
#2 | 24.5% | Japanese spaniel
#3 | 24.2% | papillon
#4 | 1.1% | Pekinese, Pekingese, Peke
#5 | 0.5% | Boston bull, Boston terrier
#6 | 0.4% | toy terrier
#7 | 0.2% | Border collie
#8 | 0.2% | Pomeranian
#9 | 0.1% | Shih-Tzu
#10 | 0.1% | bow tie, bow-tie, bowtie
#11 | 0.0% | feather boa, boa
#12 | 0.0% | tennis ball
#13 | 0.0% | Brabancon griffon
#14 | 0.0% | Blenheim spaniel
#15 | 0.0% | collie
#16 | 0.0% | quill, quill pen
#17 | 0.0% | sunglasses, dark glasses, shades
#18 | 0.0% | muzzle
#19 | 0.0% | Yorkshire terrier
#20 | 0.0% | sunglass

Here GoogLeNet predicts that the input image is most likely a Chihuahua, which is the correct answer.

Although its prediction score is only 48.2%, the next candidates are Japanese spaniel (a.k.a. Japanese Chin) and papillon, which are hard to distinguish from a Chihuahua at a glance, even for human eyes.
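
The percentages come from a softmax over the model's raw scores, so they sum to 100% across all labels and similar-looking classes end up sharing the probability mass. The sketch below shows that step and the top-k ranking with NumPy alone; the four scores are made up for illustration, whereas the real GoogLeNet output has 1000 entries.

```python
import numpy as np


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def top_k(probabilities, names, k):
    """Return the k (probability, name) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(float(probabilities[i]), names[i]) for i in order]


# Made-up scores for four labels, not taken from the model.
names = ["Chihuahua", "Japanese spaniel", "papillon", "tennis ball"]
scores = np.array([2.0, 1.4, 1.3, -3.0])

for rank, (p, name) in enumerate(top_k(softmax(scores), names, k=3), start=1):
    print("#%d | %4.1f%% | %s" % (rank, p * 100, name))
```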

Conclusion

In this post, I showed you some basic entry points to start using Chainer. The first is how to install Chainer. Then we saw how to train a multi-layer perceptron with the MNIST dataset on Chainer, illustrating the efficiency of training on GPU, which you can get easily on Chainer. Finally, I showed how to run a Caffe reference model on Chainer, which enables you to use pre-trained Caffe models out of the box.

As next steps, you may want to learn:

Playing with other Chainer examples.
Defining your own network models.
Implementing your own functions.

For details, Chainer’s official documentation will help you.

About the author

Masayuki Takagi is an entrepreneur and software engineer from Japan. His professional experience spans advertising and deep learning, serving big Japanese corporations. His personal interests are fluid simulation, GPU computing, FPGA, and compiler and programming language design. Common Lisp is his most beloved programming language and he is the author of the cl-cuda library. Masayuki is a graduate of the University of Tokyo and lives in Tokyo with his buddy Plum, a long coat Chihuahua, and aco.