How to Create a Neural Network in TensorFlow

This article is an excerpt from the book Principles of Data Science by Sinan Ozdemir. With a unique approach that bridges the gap between mathematics and computer science, the book takes you through the entire data science pipeline, beginning with cleaning and preparing data and moving on to effective data mining strategies and techniques that help you get to grips with machine learning.

In this article, we’re going to learn how to create a neural network whose goal will be to classify images.

TensorFlow is an open-source machine learning library that is used primarily for deep learning and neural networks. I would like to take some time to introduce it and solve a few quick problems using tensorflow.

Let’s begin with some imports:

from sklearn import datasets, metrics

import tensorflow as tf

import numpy as np

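# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)

from sklearn.model_selection import train_test_split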

%matplotlib inline

Loading our iris dataset:

# Our data set of iris flowers

iris = datasets.load_iris()

# Load datasets and split them for training and testing

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)


Creating the Neural Network:

# Specify that all features have real-value data

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

optimizer = tf.train.GradientDescentOptimizer(learning_rate=.1)

# Build 3 layer DNN with 10, 20, 10 units respectively.

classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,

    hidden_units=[10, 20, 10],

    optimizer=optimizer,

    n_classes=3)

# Fit model.



Notice that our code really hasn’t changed from the last segment. We still have our feature_columns from before, but now we introduce, instead of a linear classifier, a DNNClassifier, which stands for Deep Neural Network Classifier.

This is TensorFlow’s syntax for implementing a neural network. Let’s take a closer look:


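classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,

    hidden_units=[10, 20, 10],

    optimizer=optimizer,

    n_classes=3)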



We are passing in the same feature_columns, n_classes, and optimizer as before, but notice the new parameter called hidden_units. This list gives the number of nodes in each layer between the input layer and the output layer. All in all, this neural network will have five layers:

  • The first layer will have four nodes, one for each of the iris feature variables. This layer is the input layer.
  • A hidden layer of 10 nodes.
  • A hidden layer of 20 nodes.
  • A hidden layer of 10 nodes.
  • The final layer will have three nodes, one for each possible outcome of the network. This is called our output layer.
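To make the five-layer structure concrete, here is a minimal plain-Python sketch of a network with those layer sizes. It is not what DNNClassifier does internally: the random initial weights, the ReLU activation on the hidden layers, the softmax output, and the helper names (init_layers, count_parameters, forward) are all assumptions made for illustration.

```python
import math
import random

# Layer sizes from the text: 4 inputs, hidden layers of 10, 20 and 10 nodes, 3 outputs.
LAYER_SIZES = [4, 10, 20, 10, 3]


def init_layers(sizes, seed=0):
    """Create one (weights, biases) pair for each pair of adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """Push one example through every layer and return class probabilities."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        scores = [sum(w * a for w, a in zip(row, activations)) + b
                  for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Assumption: ReLU on hidden layers, softmax on the output layer.
        activations = softmax(scores) if is_output else [max(0.0, s) for s in scores]
    return activations


layers = init_layers(LAYER_SIZES)
probabilities = forward(layers, [5.1, 3.5, 1.4, 0.2])  # one iris-like measurement
print(count_parameters(layers))  # 513
print(len(probabilities))        # 3
```

Even this small network has 513 trainable numbers (50 + 220 + 210 + 33), which is a lot of flexibility for a dataset of 150 flowers.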

Now that we’ve trained our model, let’s evaluate it on our test set:

# Evaluate accuracy.

accuracy_score = classifier.evaluate(x=X_test, y=y_test)["accuracy"]


print('Accuracy: {0:f}'.format(accuracy_score))

Accuracy: 0.921053
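Accuracy here is simply the share of test examples the model labels correctly. The sketch below computes it by hand. The labels are made up: 38 test flowers with 3 mistakes is an assumption, chosen only because 35/38 reproduces the printed score and 38 is what a 25% split of the 150 iris samples would give.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 38 test flowers, three of them misclassified.
y_true = [0] * 13 + [1] * 13 + [2] * 12
y_pred = list(y_true)
y_pred[5] = 1    # a class-0 flower predicted as class 1
y_pred[20] = 2   # a class-1 flower predicted as class 2
y_pred[30] = 1   # a class-2 flower predicted as class 1

print('Accuracy: {0:f}'.format(accuracy(y_true, y_pred)))  # Accuracy: 0.921053
```

With so few test examples, each single mistake moves the score by more than 2.6 percentage points.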

Hmm, our neural network didn’t do so well on this dataset, but perhaps that is because the network is a bit too complicated for such a simple dataset. Let’s introduce a new dataset that has a bit more to it…

The MNIST dataset consists of 70,000 images of handwritten digits (0-9), and the goal is to recognize each handwritten digit and output which digit was written. Tensorflow has a built-in mechanism for downloading and loading these images.

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

Extracting MNIST_data/train-images-idx3-ubyte.gz

Extracting MNIST_data/train-labels-idx1-ubyte.gz

Extracting MNIST_data/t10k-images-idx3-ubyte.gz

Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Notice that one of our inputs for downloading mnist is called one_hot. This parameter controls whether the dataset’s target variable (which is the digit itself) comes in as a single number or as a set of dummy variables, one per possible digit.

For example, if the first digit were a 7, the target would either be:

  • 7: If one_hot is False
  • 0 0 0 0 0 0 0 1 0 0: If one_hot is True (notice that, counting from 0, the 1 sits at index 7)

We will encode our target the former way, as this is what our tensorflow neural network and our sklearn logistic regression will expect.
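The two encodings described above can be converted into each other with a few lines of plain Python. This is a minimal sketch, not the code the MNIST loader uses; the helper names one_hot and from_one_hot are made up for illustration.

```python
def one_hot(label, n_classes=10):
    """Encode an integer label as a list of zeros with a single 1."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be between 0 and n_classes - 1")
    vector = [0] * n_classes
    vector[label] = 1
    return vector


def from_one_hot(vector):
    """Recover the integer label from a one-hot vector."""
    return vector.index(1)


encoded = one_hot(7)
print(encoded)                # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(from_one_hot(encoded))  # 7
```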

The dataset is split up already into a training and test set, so let’s create new variables to hold them:

x_mnist = mnist.train.images

y_mnist = mnist.train.labels.astype(int)

For the y_mnist variable, I specifically cast every target as an integer (by default they come in as floats) because otherwise tensorflow would throw an error at us.

Out of curiosity, let’s take a look at a single image:

import matplotlib.pyplot as plt

plt.imshow(x_mnist[10].reshape(28, 28))

[Image: Neural network in tensorflow]

And hopefully our target variable matches at index 10 as well:

y_mnist[10]



Excellent! Let’s now take a peek at how big our dataset is:


x_mnist.shape

(55000, 784)



Our training set, then, contains 55,000 images and their target variables.
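The 784 columns are just a flattened 28 × 28 pixel grid (28 × 28 = 784), which is why the reshape(28, 28) call earlier recovers a viewable image. As a rough sketch of that layout in plain Python (the helper to_grid and the single lit pixel are made up for illustration):

```python
def to_grid(flat_pixels, width=28):
    """Split a flat list of pixels into rows of `width` values (row-major order)."""
    if len(flat_pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of the row width")
    return [flat_pixels[start:start + width]
            for start in range(0, len(flat_pixels), width)]


# Hypothetical image: all dark except one pixel at row 3, column 5.
flat = [0.0] * 784
flat[3 * 28 + 5] = 1.0

grid = to_grid(flat)
print(len(grid), len(grid[0]))  # 28 28
print(grid[3][5])               # 1.0
```

Pixel (row, column) sits at position row * 28 + column of the flat vector, which is the same row-major order that NumPy’s reshape uses by default.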

Let’s fit a deep neural network to our images and see if it will be able to pick up on the patterns in our inputs:

# Specify that all features have real-value data

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=784)]


optimizer = tf.train.GradientDescentOptimizer(learning_rate=.1)

# Build 3 layer DNN with 10, 20, 10 units respectively.

classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,

    hidden_units=[10, 20, 10],

    optimizer=optimizer,

    n_classes=10)

# Fit model.



# Warning this is veryyyyyyyy slow

This code is very similar to our previous segment using DNNClassifier; however, look how in our first line of code, I have changed the number of columns to be 784 while in the classifier itself, I changed the number of output classes to be 10. These are manual inputs that tensorflow must be given to work.

The preceding code runs very slowly. The network adjusts itself little by little in order to get the best possible performance on our training set. Of course, the ultimate test is how our network performs on an unseen test set, which tensorflow also gives us:

x_mnist_test = mnist.test.images

y_mnist_test = mnist.test.labels.astype(int)


x_mnist_test.shape

(10000, 784)



So we have 10,000 images to test on; let’s see how our network was able to adapt to the dataset:

# Evaluate accuracy.

accuracy_score = classifier.evaluate(x=x_mnist_test, y=y_mnist_test)["accuracy"]


print('Accuracy: {0:f}'.format(accuracy_score))

Accuracy: 0.920600

Not bad, 92% accuracy on our dataset. Let’s take a second and compare this performance to a standard sklearn logistic regression now:

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score

logreg = LogisticRegression()

# Warning this is slow

logreg.fit(x_mnist, y_mnist)

# predict on our test set, to avoid overfitting!

y_predicted = logreg.predict(x_mnist_test)

# get our accuracy score

accuracy = accuracy_score(y_mnist_test, y_predicted)



Success! Our neural network performed better than the standard logistic regression. This is likely because the network is attempting to find relationships between the pixels themselves and using these relationships to map them to the digit being written. Logistic regression, by contrast, is a linear model: each pixel contributes to the prediction on its own, so the model has a tough time capturing relationships between pixels.

There are ways of making our neural network learn differently:

  • We could make our network wider, that is, increase the number of nodes in the hidden layers instead of having several layers of a smaller number of nodes:

[Image: Neural Network in Tensorflow]

# A wider network

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=784)]

optimizer = tf.train.GradientDescentOptimizer(learning_rate=.1)

# Build 3 layer DNN with 10, 20, 10 units respectively.

classifier = tf.contrib.learn.DNNClassifier(feature_




# Fit model.



# Warning this is veryyyyyyyy slow

# Evaluate accuracy.

accuracy_score = classifier.evaluate(x=x_mnist_test, y=y_mnist_test)["accuracy"]

print('Accuracy: {0:f}'.format(accuracy_score))

Accuracy: 0.898400
  • We could increase our learning rate, forcing the network to attempt to converge on an answer faster. As mentioned before, we run the risk of the model skipping over the answer entirely if we go down this route. It is usually better to stick with a smaller learning rate.
  • We can change the method of optimization. Gradient descent is very popular; however, there are other algorithms for minimizing the error. One example is the Adam optimizer. The difference is in the way they traverse the error function, and therefore the way that they approach the optimization point. Different problems in different domains call for different optimizers.
  • There is no replacement for a good old-fashioned feature selection phase instead of attempting to let the network figure everything out for us. We can take the time to find relevant and meaningful features that will actually allow our network to find an answer quicker!

There you go! You’ve now learned how to build a neural net in Tensorflow! If you liked this tutorial and would like to learn more, head over and grab a copy of Principles of Data Science.

If you want to take things a bit further and learn how to classify Irises using multi-layer perceptrons, head over here.
