Computer Vision with Keras, Part 2

If you followed along in Part 1, you saw how we used Keras to create our model for tackling the German Traffic Sign Recognition Benchmark (GTSRB). In Part 2, you will see how we achieve close to human-level performance, and how to improve the accuracy of the model by augmenting the training data.

Training

Now, our model is ready to train. During the training, our model will iterate over batches of the training set, each of size batch_size. For each batch, gradients will be computed and updates will be made to the weights of the network automatically. One iteration over all of the training set is referred to as an epoch. Training is usually run until the loss converges to a constant.
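To make the batch and epoch terminology concrete, here is a small sketch of the bookkeeping. It assumes the 31,367 training samples, the batch size of 32, and the 30 epochs used for the training run in this post; the variable names are illustrative only.

```python
import math

num_samples = 31367   # training samples left after the validation split in this post
batch_size = 32
num_epochs = 30

# One weight update per batch; the final batch of an epoch may be smaller.
updates_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * num_epochs

print(updates_per_epoch, last_batch_size, total_updates)  # 981 7 29430
```

So a single epoch already corresponds to close to a thousand weight updates.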

We will add a couple of features to our training:

Learning rate scheduler: Decaying the learning rate over the epochs usually helps the model learn better.
Model checkpoint: We will save the model with the best validation score (ModelCheckpoint monitors val_loss by default). This is useful because our network might start overfitting after a certain number of epochs, but we want the best model.

These are not strictly necessary, but they improve the model's accuracy. Both are implemented via the callbacks feature of Keras. Callbacks are a set of functions that are applied at given stages of the training procedure, such as the end of an epoch. Keras provides built-in callbacks for both learning rate scheduling and model checkpointing.

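from keras.callbacks import LearningRateScheduler, ModelCheckpoint


def lr_schedule(epoch):
    # lr is the initial learning rate, assumed to be defined earlier (Part 1);
    # it is divided by 10 every 10 epochs
    return lr * (0.1 ** int(epoch / 10))


batch_size = 32
nb_epoch = 30

# Keras 1.x API: nb_epoch was renamed to epochs in Keras 2
model.fit(X, Y,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_split=0.2,
          callbacks=[LearningRateScheduler(lr_schedule),
                     ModelCheckpoint('model.h5', save_best_only=True)]
          )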
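To see what the step-decay schedule does, the sketch below evaluates it at a few epochs. The initial learning rate of 0.01 is an assumed value for illustration (the post takes lr from Part 1), and it is passed as a hypothetical base_lr argument so that the snippet is self-contained.

```python
def lr_schedule(epoch, base_lr=0.01):
    """Step decay: divide base_lr by 10 every 10 epochs (base_lr is an assumed value)."""
    return base_lr * (0.1 ** int(epoch / 10))


# Keras passes a 0-based epoch index to the schedule function.
for epoch in (0, 9, 10, 19, 20, 29):
    print("epoch index {:2d}: lr = {:.4f}".format(epoch, lr_schedule(epoch)))
```

With these assumptions, the first ten epochs run at the initial rate, the next ten at a tenth of it, and the last ten at a hundredth.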

Once model.fit is running, you'll see the model log the losses and accuracies for each epoch:

Train on 31367 samples, validate on 7842 samples
Epoch 1/30
31367/31367 [==============================] - 30s - loss: 1.1502 - acc: 0.6723 - val_loss: 0.1262 - val_acc: 0.9616
Epoch 2/30
31367/31367 [==============================] - 32s - loss: 0.2143 - acc: 0.9359 - val_loss: 0.0653 - val_acc: 0.9809
Epoch 3/30
31367/31367 [==============================] - 31s - loss: 0.1342 - acc: 0.9604 - val_loss: 0.0590 - val_acc: 0.9825
...
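The idea behind saving only the best model can be sketched in a few lines. This is a simplified illustration of a save-best-only policy on the validation loss, not Keras's actual implementation; the function name is hypothetical, the first three losses come from the log above, and the last two are made up to show an epoch that does not improve.

```python
def epochs_saved(val_losses):
    """Return the 1-based epochs at which a save-best-only policy writes a checkpoint."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:   # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved


# First three values come from the log above; the last two are hypothetical.
val_losses = [0.1262, 0.0653, 0.0590, 0.0611, 0.0574]
print(epochs_saved(val_losses))  # [1, 2, 3, 5]
```

Epoch 4 is skipped because its validation loss is worse than the best seen so far, so the file on disk always holds the best model up to that point.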

This might take a while, especially if you are running on a CPU. If you have an Nvidia GPU, you should install CUDA; it speeds up training dramatically. For example, on my MacBook Air, it takes 10 minutes per epoch, while on a machine with an Nvidia Titan X GPU, it takes 30 seconds. Even modest GPUs offer an impressive speedup because neural networks are inherently parallelizable. This makes GPUs practically necessary for deep learning at any serious scale. Grab a coffee while you wait for training to complete ;).
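As a rough back-of-the-envelope check, the per-epoch timings quoted above translate into the following totals for a 30-epoch run. The figures are the ones from this post; actual timings will vary with hardware.

```python
epochs = 30
cpu_seconds_per_epoch = 10 * 60   # MacBook Air figure quoted above
gpu_seconds_per_epoch = 30        # Titan X figure quoted above

cpu_total_hours = epochs * cpu_seconds_per_epoch / 3600
gpu_total_minutes = epochs * gpu_seconds_per_epoch / 60
speedup = cpu_seconds_per_epoch / gpu_seconds_per_epoch

print(cpu_total_hours, gpu_total_minutes, speedup)  # 5.0 15.0 20.0
```

That is about five hours on the CPU against a quarter of an hour on the GPU.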

Congratulations! You have just trained your first deep learning model.

Evaluation

Let’s quickly load test data and evaluate our model on it:

import os

import numpy as np
import pandas as pd
from skimage import io  # assumed source of io.imread, as in Part 1

test = pd.read_csv('GT-final_test.csv', sep=';')

# Load test dataset (preprocess_img is the helper defined in Part 1)
X_test = []
y_test = []
for file_name, class_id in zip(list(test['Filename']), list(test['ClassId'])):
    img_path = os.path.join('GTSRB/Final_Test/Images/', file_name)
    X_test.append(preprocess_img(io.imread(img_path)))
    y_test.append(class_id)

X_test = np.array(X_test)
y_test = np.array(y_test)

# Predict and evaluate
y_pred = model.predict_classes(X_test)
acc = np.mean(y_pred == y_test)  # fraction of correct predictions
print("Test accuracy = {}".format(acc))

This outputs the following on my system (results may vary a bit because the weights of the neural network are randomly initialized):

12630/12630 [==============================] - 2s
Test accuracy = 0.9792557403008709

97.93%! That’s great! It’s not far from average human performance (98.84%) [1].

A lot of things can be done to squeeze out extra performance from the neural net. I’ll implement one such improvement in the next section.

Data Augmentation

You might think 40,000 images is a lot of images. Think again. Our model has 1,358,155 parameters (try model.count_params() or model.summary()). That’s roughly 35 times the number of training images.
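A quick check of the parameter-to-image ratio, using the parameter count quoted above and the sample counts from the training log (31,367 training plus 7,842 validation images):

```python
num_params = 1358155             # from model.count_params(), as quoted above
num_train_images = 31367 + 7842  # training + validation samples from the training log

ratio = num_params / num_train_images
print("{:.1f} parameters per training image".format(ratio))  # 34.6 parameters per training image
```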

If we can generate new training images from the existing ones, that will be a great way to increase the size of the dataset. This can be done by slightly:

Translating the image
Rotating the image
Shearing the image
Zooming in/out of the image
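To illustrate the simplest of these transformations, here is a minimal pure-Python sketch of a random translation on a tiny image stored as a list of rows. It is only meant to show the idea: the function names are hypothetical, vacated pixels are filled with zeros, and this is not how Keras implements its augmentation.

```python
import random


def translate(image, dx, dy, fill=0):
    """Shift a 2-D image (list of rows) dx pixels right and dy pixels down."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            # Pixels that would land outside the frame are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def random_translate(image, max_shift, rng=random):
    """Apply a random shift of at most max_shift pixels in each direction."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(translate(image, dx=1, dy=0))  # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

Each call to random_translate yields a slightly different view of the same sign, so the network rarely sees exactly the same input twice.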

Rather than generating and saving such images to hard disk, we will generate them on the fly during training. This can be done directly using built-in functionality of Keras.

from keras.preprocessing.image import ImageDataGenerator
# Older scikit-learn versions: from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Hold out 20% of the training data for validation
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=42)

# Random shifts, zooms, shears and rotations applied on the fly
datagen = ImageDataGenerator(featurewise_center=False,
                             featurewise_std_normalization=False,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.2,
                             shear_range=0.1,
                             rotation_range=10.)

datagen.fit(X_train)

# Reinitialize model and compile
model = cnn_model()
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Train again (Keras 1.x API: fit_generator with samples_per_epoch and nb_epoch)
nb_epoch = 30
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch,
                    validation_data=(X_val, Y_val),
                    callbacks=[LearningRateScheduler(lr_schedule),
                               ModelCheckpoint('model.h5', save_best_only=True)]
                    )

With this model, I get 98.29% accuracy on the test set.
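To put these accuracies in perspective, the sketch below converts them into approximate numbers of misclassified test images. It uses the 12,630 test images from the evaluation output; the augmented and human figures are derived from rounded percentages, so treat those counts as approximate.

```python
num_test_images = 12630  # size of the test set, from the evaluation output

accuracies = {
    "baseline": 0.9792557403008709,  # test accuracy printed earlier
    "augmented": 0.9829,             # rounded figure quoted above
    "human": 0.9884,                 # average human performance [1]
}

# Number of misclassified images implied by each accuracy
errors = {name: round(num_test_images * (1 - acc)) for name, acc in accuracies.items()}
print(errors)  # {'baseline': 262, 'augmented': 216, 'human': 147}
```

Under these numbers, augmentation removes roughly 46 errors, and the gap to average human performance is on the order of 70 images.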

Frankly, I haven’t done much parameter tuning. I’ll make a small list of things which can be tried to improve the model:

Try different network architectures. Try deeper and shallower networks
Try adding BatchNormalization layers to the network
Experiment with different weight initializations
Try different learning rates and schedules
Make an ensemble of models
Try normalization of input images
More aggressive data augmentation
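As an illustration of the ensemble idea from the list above, one common approach is to average the class probabilities predicted by several models and pick the class with the highest average. The sketch below assumes this averaging scheme; the function name and the probabilities are hypothetical.

```python
def ensemble_predict(per_model_probs):
    """Average class probabilities across models; return (predicted_class, averaged_probs)."""
    num_models = len(per_model_probs)
    num_classes = len(per_model_probs[0])
    averaged = [sum(probs[c] for probs in per_model_probs) / num_models
                for c in range(num_classes)]
    predicted = max(range(num_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical softmax outputs of three models for one image and three classes
per_model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]
predicted, averaged = ensemble_predict(per_model_probs)
print(predicted)  # 1
```

Here the first model alone would have picked class 0, but the averaged vote settles on class 1.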

This is but a model for beginners. For state-of-the-art solutions to this problem, you can have a look at this, where the authors achieve 99.61% accuracy with a specialized layer called a Spatial Transformer layer.

Conclusion

In this two-part post, you have learned how to use convolutional networks to solve a computer vision problem. We used the Keras deep learning framework to implement CNNs in Python and achieved close to human-level performance. We have also seen a way to improve the accuracy of the model by augmenting the training data.

References:

[1] Stallkamp, Johannes, et al. “Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition.” Neural Networks 32 (2012): 323-332.

About the author

Sasank Chilamkurthy works at Qure.ai. His work involves deep learning on medical images obtained from radiology and pathology. He completed his undergraduate degree at the Indian Institute of Technology, Bombay, in Mumbai. He can be found on GitHub here.

Sasank Chilamkurthy