Introduction to Keras

Keras is a high-level library for deep learning, built on top of theano and tensorflow. It is written in Python and provides a scikit-learn-style API for building neural networks. It lets developers build neural networks quickly without worrying about the mathematical details of tensor algebra, optimization methods, and numerical methods. The key idea behind keras is to facilitate fast prototyping and experimentation. In the words of Francois Chollet, the creator of keras, “Being able to go from idea to result with the least possible delay is the key to doing good research.”

Key features of keras:

  • Either the theano or the tensorflow backend can be used.
  • Supports both CPU and GPU.
  • Keras is modular in nature in the sense that each component of a neural network model is a separate, standalone module, and these modules can be combined to create new models. New modules are easy to add.
  • Write only Python code.


Keras has the following dependencies:

  • numpy
  • scipy
  • pyyaml
  • hdf5 (for saving/loading models)
  • theano (for the theano backend)
  • tensorflow (for the tensorflow backend)

The easiest way to install keras is from the Python Package Index (PyPI) using pip:

sudo pip install keras

Example: MNIST digits classification using keras

We will learn the basic functionality of keras through an example: a simple neural network that classifies hand-written digits from the MNIST dataset. Classification of hand-written digits was the first big problem where deep learning outshone all the other known methods, and this set deep learning on its successful track.

Let’s start by importing the data. We will use the sample of hand-written digits that ships with scikit-learn; each image is 8×8 pixels, flattened into a vector of 64 values:

from sklearn import datasets

mnist = datasets.load_digits()
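X = mnist.data    # feature matrix: one row of 64 pixel values per image
Y = mnist.target  # integer labels, 0 through 9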

Let’s examine the data:

print(X.shape, Y.shape)
print(X[0])
print(Y[0])

Since we are working with numpy arrays, let’s import numpy:

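import numpy as np

# set seed for numpy's random number generator
# (the value 0 is an arbitrary choice; any fixed integer works)
np.random.seed(0)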

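Now, we’ll split the data into training and test sets by randomly picking 70% of the data points as the training set and keeping the remaining 30% as the test set:

# train_test_split lives in sklearn.model_selection;
# the old sklearn.cross_validation module has been removed
from sklearn.model_selection import train_test_split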

train_X, test_X, train_y, test_y = train_test_split(X, Y, train_size=0.7, random_state=0)
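
To make the 70/30 split concrete, here is a dependency-free sketch of what a shuffled split does. The helper name simple_split and its round-down rule are assumptions made for illustration; this is not scikit-learn's implementation, so the rows it selects will differ from those picked by train_test_split:

```python
import random

def simple_split(n_samples, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) for a shuffled split.

    Hypothetical helper for illustration only; scikit-learn's
    train_test_split selects different rows for the same seed.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = int(train_fraction * n_samples)  # round down
    return indices[:n_train], indices[n_train:]

# 1797 is the number of images in the scikit-learn digits sample.
train_idx, test_idx = simple_split(1797)
print(len(train_idx), len(test_idx))
```

Every index ends up in exactly one of the two lists, which is the property that keeps the test set unseen during training.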

With the categorical cross-entropy loss used in this example, keras requires the labels to be one-hot encoded, i.e., the labels 0, 1, 2, etc., need to be converted to vectors like [1,0,0,…], [0,1,0,…], [0,0,1,…], respectively:

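import numpy as np
from keras.utils import np_utils  # imported here because the helper below needs it right away

def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))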

# One hot encode labels for training and test sets.
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
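
If you want to see what one-hot encoding produces without installing anything, the following plain-Python sketch does the same job for integer labels. The function name one_hot is a hypothetical stand-in and is not part of keras:

```python
def one_hot(labels, n_classes):
    """Turn integer labels into one-hot rows (hypothetical helper, plain Python)."""
    rows = []
    for label in labels:
        row = [0.0] * n_classes
        row[label] = 1.0  # a single 1 at the position of the class
        rows.append(row)
    return rows

encoded = one_hot([0, 2, 1], n_classes=3)
for row in encoded:
    print(row)
```

Each row has exactly one non-zero entry, and its position is the class label.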

We are now ready to build a neural network model. Start by importing the relevant classes from keras:

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import np_utils

In keras, we have to specify the structure of the model before we can use it. A Sequential model is a linear stack of layers. Keras offers other alternatives, but we will stick with Sequential for simplicity:

model = Sequential()

This creates an empty Sequential model; it does not contain any layers yet. As stated previously, keras is modular, so we build the model by adding components one at a time. Let’s add a fully connected layer with 32 units. Each unit receives an input from every unit in the input layer, and since each input vector has 64 dimensions, the input shape must be 64. Keras uses the Dense module to create a fully connected layer:

model.add(Dense(32, input_shape=(64,)))
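
To see what a fully connected layer computes, here is a small sketch in plain Python. The function names dense_forward and dense_param_count are hypothetical, and the sketch assumes the layer has one bias per unit, which is the default for Dense:

```python
def dense_forward(x, weights, biases):
    """One fully connected layer: out[j] = sum_i x[i] * weights[i][j] + biases[j]."""
    n_units = len(biases)
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        for j in range(n_units)
    ]

def dense_param_count(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Tiny example: 3 inputs feeding 2 units.
x = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape (3, 2)
biases = [0.5, -0.5]
out = dense_forward(x, weights, biases)
print(out)

# Size of the layer added above: 64 inputs, 32 units.
print(dense_param_count(64, 32))
```

Under that assumption, the 64-input, 32-unit layer has 64 × 32 weights plus 32 biases, i.e., 2,080 trainable parameters.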

Next, we add an activation function after the first layer. We will use the sigmoid activation; other choices, such as relu, are also possible:

model.add(Activation('sigmoid'))
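
For intuition about the two activations just mentioned, here is a minimal sketch of each for a single number. It uses the textbook formulas and is meant for small inputs; it is not how keras implements them internally:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: passes positives through, clips negatives to 0."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z))
```

Sigmoid squashes every input into the range between 0 and 1, while relu leaves positive inputs unchanged and zeroes out negative ones.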


We can add any number of layers this way, but for simplicity we will restrict ourselves to one hidden layer. Next comes the output layer. Since the output is a 10-dimensional vector (one entry per digit class), the output layer needs 10 units:

model.add(Dense(10))


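Finally, add an activation for the output layer. In classification tasks, we use the softmax activation, which gives the outputs a probabilistic interpretation: each of the 10 values can be read as the probability of the corresponding label:

model.add(Activation('softmax'))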
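
The probabilistic interpretation is easy to check by hand. Below is a minimal plain-Python sketch of softmax; subtracting the largest score before exponentiating is a common numerical-stability trick and an implementation choice of this sketch:

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The outputs are all positive, sum to 1, and keep the ordering of the raw scores, so the largest score becomes the most probable label.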


Next, we need to configure the model. There are some more choices we need to make before we can run the model, e.g., choose an optimization method, loss function, and metric of evaluation:

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
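
To give a feel for the loss chosen here, the sketch below computes categorical cross-entropy for a single sample in plain Python. The eps clamp is an assumption of this sketch to avoid log(0); keras works on whole batches and averages the per-sample values:

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum_k y_true[k] * log(y_pred[k])."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]        # one-hot label: class 1
confident = [0.05, 0.90, 0.05]  # most probability on the right class
unsure = [0.40, 0.30, 0.30]     # probability spread out

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)
print(loss_unsure)
```

Because the label is one-hot, only the probability assigned to the true class matters, and the loss shrinks toward 0 as that probability approaches 1.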

The compile method configures the model, and the model is now ready to be trained on data. Similar to sklearn, keras has a fit method for training:

# in keras 1.x this argument is called nb_epoch instead of epochs
model.fit(train_X, train_y_ohe, epochs=10, batch_size=30)

Training neural networks often involves the concept of minibatching, which means showing the network a subset of the data, adjusting the weights, and then showing it another subset of the data. When the network has seen all the data once, that’s called an “epoch”. Tuning the minibatch/epoch strategy is a somewhat problem-specific issue.
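
The relationship between batch size and epochs can be sketched in a few lines. The generator name minibatches is hypothetical, and unlike a real training loop this sketch neither shuffles the data nor updates any weights:

```python
def minibatches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(100))  # stand-in for 100 training rows
batches = list(minibatches(samples, batch_size=30))
print([len(b) for b in batches])
```

One pass over all the batches is one epoch. With batch_size=30, every epoch performs one weight adjustment per batch, and the last batch is smaller whenever the number of samples is not a multiple of 30.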

After the model has been trained, we can compute its accuracy on the test set:

loss, accuracy = model.evaluate(test_X, test_y_ohe)
print(accuracy)
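
The accuracy metric itself is simple: a prediction counts as correct when the most probable class matches the label. Here is a plain-Python sketch with made-up predictions; the helper names are hypothetical (given fresh names so they do not clash with the accuracy variable above) and the numbers are not results from the model:

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_score_ohe(y_true_ohe, y_pred_probs):
    """Fraction of samples whose most probable class matches the one-hot label."""
    hits = sum(
        argmax(t) == argmax(p) for t, p in zip(y_true_ohe, y_pred_probs)
    )
    return hits / len(y_true_ohe)

# Made-up labels and predicted probabilities for four samples.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
score = accuracy_score_ohe(y_true, y_pred)
print(score)
```

In this toy example the third sample is misclassified, so three of the four predictions are correct.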


We have seen how a neural network can be built using keras, and how easy and intuitive the keras API is. This is just an introduction, a hello-world program, if you will. There is a lot more functionality in keras, including convolutional neural networks, recurrent neural networks, language modeling, deep dream, etc.

About the author

Janu Verma is a Researcher in the IBM T.J. Watson Research Center, New York. His research interests are in mathematics, machine learning, information visualization, computational biology and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and Indian Statistical Institute. He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, etc. His current focus is on the development of visual analytics systems for prediction and understanding. He advises startups and companies on data science and machine learning in the Delhi-NCR area; email to schedule a meeting.

