In this article by **Giancarlo Zaccone**, the author of Getting Started with TensorFlow, we will learn about **artificial neural networks** (**ANNs**), information processing systems whose operating mechanism is inspired by biological neural circuits. Thanks to their characteristics, neural networks are the protagonists of a real revolution in machine learning systems and, more generally, in the context of Artificial Intelligence. An artificial neural network consists of many simple processing units connected to each other according to various architectures. If we look at the schema of an ANN shown below, it can be seen that the *hidden units* communicate with the external layers, both in input and output, while the *input* and *output units* communicate only with the *hidden layer* of the network.

Each unit or node simulates the role of the neuron in biological neural networks. A node, called an *artificial neuron*, performs a very simple operation: it becomes *active* if the total quantity of signal that it receives exceeds its activation threshold, defined by the so-called activation function. If a node becomes *active*, it emits a signal that is transmitted along the transmission channels to the other units to which it is connected. Each connection point acts as a filter that converts the message into an inhibitory or excitatory signal, increasing or decreasing its intensity according to the characteristics of the connection. The connection points simulate the biological synapses and have the fundamental function of *weighing* the intensity of the transmitted signals, by multiplying them by weights whose values depend on the connection itself.
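The behavior of a single artificial neuron can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `neuron_output`, the example signals, the weights, and the threshold value are assumptions chosen for the example, not part of the model description above.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 (active) if the weighted input signal exceeds the threshold, else 0."""
    # Each connection multiplies the incoming signal by its own weight
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical values: a positive weight is excitatory, a negative one inhibitory
signals = [1.0, 0.5, 1.0]
weights = [0.8, -0.4, 0.3]

print(neuron_output(signals, weights, threshold=0.5))
```

Raising the threshold above the weighted sum (about 0.9 for these values) would leave the neuron inactive.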

ANN schematic diagram

### Neural network architectures

The *way the nodes are connected* and the *total number* of layers, that is, the levels of nodes between input and output, define the *architecture* of a neural network. For example, in a **multilayer network**, the artificial neurons are organized in layers such that:

- Each neuron is connected with all those of the next layer
- There are no connections between neurons belonging to the same layer
- The number of layers and of neurons per layer depends on the problem to be solved
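As a rough illustration of these properties, the following plain-Python sketch propagates an input through a small, fully connected multilayer network. The layer sizes, the weight and bias values, and the helper names `step`, `layer_forward`, and `network_forward` are hypothetical, and a simple step activation is assumed.

```python
def step(z):
    """Step activation: the unit is active (1) only when its net input is positive."""
    return 1 if z > 0 else 0


def layer_forward(inputs, weights, biases):
    """Compute one layer: every unit receives all outputs of the previous layer."""
    # weights[j][i] is the connection from input unit i to unit j of this layer;
    # there are no connections between units of the same layer.
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def network_forward(inputs, layers):
    """Feed the input through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit
hidden = ([[0.5, -0.6, 0.1], [0.9, 0.2, -0.3]], [0.0, -0.5])
output = ([[1.0, 1.0]], [-1.5])

print(network_forward([1, 0, 1], [hidden, output]))
```

Adding another `(weights, biases)` pair to the list is all it takes to add a layer, which is why the number of layers and of neurons per layer can be adapted to the problem.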

Now we start our exploration of neural network models by introducing the simplest one: the **Single Layer Perceptron**, also known as Rosenblatt’s Perceptron.

## Single Layer Perceptron

The Single Layer Perceptron was the first neural network model, proposed in 1958 by *Frank Rosenblatt*. In this model, the content of the local memory of the neuron consists of a vector of weights, *W = (w1, w2, …, wn)*. The computation is a weighted sum: each element of the input vector *X = (x1, x2, …, xn)* is multiplied by the corresponding element of the vector of weights, and the products are added together. The resulting value is then the input of an activation function. This function returns *1* if the result is greater than a certain threshold; otherwise it returns *-1*. In the following figure, the activation function is the so-called sign function:

```
sign(x) = +1   if x > 0
          −1   otherwise
```

It is possible to use other activation functions, preferably non-linear (such as the *sigmoid* function that we will see in the next section). The learning procedure of the net is iterative: in each learning cycle (called an *epoch*), it slightly modifies the synaptic weights by using a selected set of examples called the training set. At each cycle, the weights must be modified so as to minimize a cost function, which is specific to the problem under consideration. Finally, once the perceptron has been trained on the training set, it can be tested on other inputs (the test set) in order to verify its capacity for generalization.
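To make the iterative procedure more concrete, here is a minimal sketch of a perceptron with the sign activation, trained in plain Python. It assumes the classic perceptron update rule (each weight is moved in proportion to the error and to its input); the toy training set (logical AND with targets in {-1, +1}), the learning rate, the number of epochs, and the function names are all hypothetical choices made for this illustration.

```python
def sign(z):
    """Sign activation as defined above: +1 if z > 0, otherwise -1."""
    return 1 if z > 0 else -1


def predict(weights, bias, x):
    """Weighted sum of the inputs followed by the sign activation."""
    return sign(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, targets, learning_rate=0.5, epochs=20):
    """Adjust the weights a little at every epoch using the training set."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)  # 0 when correct
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training set: logical AND with targets in {-1, +1}
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, x) for x in samples])
```

Because this toy problem is linearly separable, the weights stop changing after a few epochs and the remaining passes leave them untouched.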

Schema of Rosenblatt’s Perceptron

Let’s now see how to implement a single layer neural network for an image classification problem using TensorFlow.

## The logistic regression

Despite its name, this algorithm is not used like canonical linear regression to predict a continuous value; it is an algorithm that allows us to solve supervised classification problems. In fact, to estimate the dependent variable, we now make use of the so-called logistic function, or sigmoid. It is precisely because of this feature that we call this algorithm logistic regression. The sigmoid function, *σ(z) = 1 / (1 + e^(−z))*, has a characteristic S-shaped pattern.

The dependent variable takes values strictly between *0* and *1*, which is precisely what we need. In the case of *logistic regression*, we want our function to tell us the *probability* that an element belongs to a particular class. We recall again that *supervised* learning by the neural network is configured as an *iterative process of optimization of the weights*; these are modified on the basis of the network’s performance on the training set. Indeed, the aim is to minimize the *loss function*, which indicates the degree to which the behavior of the network deviates from the desired one. The performance of the network is then verified on a test set, consisting of images other than those used for training.
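The sigmoid itself is easy to reproduce in plain Python. The following is a small sketch; the function name `sigmoid` and the sample inputs are our own choices, and the two-branch form is simply one common way to avoid numerical overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# The output moves from near 0 to near 1 as z grows, passing through 0.5 at z = 0
for z in (-5.0, 0.0, 5.0):
    print("sigmoid(%.1f) = %.4f" % (z, sigmoid(z)))
```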

The basic steps of training that we’re going to implement are as follows:

- The weights are initialized with random values at the beginning of the training.
- For each element of the training set, the *error* is calculated, that is, the difference between the desired output and the actual output. This error is used to adjust the weights.
- The process is repeated, resubmitting to the network, in a random order, all the examples of the training set until the error made on the entire training set falls below a certain threshold or until the maximum number of iterations is reached.
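The steps above can be sketched, under some assumptions, as a tiny logistic regression trained in plain Python on one-dimensional data. The toy dataset, the learning rate, the error threshold, the use of the mean squared error as the stopping criterion, and the function names are all hypothetical; the TensorFlow implementation that follows uses a different cost function and mini-batches.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, targets, learning_rate=0.5, error_threshold=0.05,
          max_epochs=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the weight and the bias with random values
    weight = rng.uniform(-0.5, 0.5)
    bias = rng.uniform(-0.5, 0.5)
    data = list(zip(samples, targets))
    for _ in range(max_epochs):
        # Step 3: present the examples in a random order at every pass
        rng.shuffle(data)
        for x, target in data:
            # Step 2: error = desired output - actual output
            error = target - sigmoid(weight * x + bias)
            weight += learning_rate * error * x
            bias += learning_rate * error
        # Stop as soon as the error on the whole training set is small enough
        mse = sum((t - sigmoid(weight * x + bias)) ** 2
                  for x, t in data) / len(data)
        if mse < error_threshold:
            break
    return weight, bias


# Hypothetical training set: small inputs belong to class 0, large ones to class 1
samples = [0.0, 1.0, 2.0, 3.0]
targets = [0, 0, 1, 1]

weight, bias = train(samples, targets)
print([1 if sigmoid(weight * x + bias) > 0.5 else 0 for x in samples])
```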

Let’s now see in detail how to implement logistic regression with TensorFlow. The problem we want to solve is, once again, to classify images from the MNIST dataset.

### The TensorFlow implementation

First of all, we have to import all the necessary libraries:

```
import input_data
import tensorflow as tf
import matplotlib.pyplot as plt
```

We use the *input_data.read_data_sets* function to load the images for our problem:

`mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)`

Then we set the total number of epochs for the training phase:

`training_epochs = 25`

We must also define the other parameters needed to build the model:

```
learning_rate = 0.01
batch_size = 100
display_step = 1
```

Now we move on to the construction of the model.

### Building the model

Define *x* as the input tensor; it represents an MNIST image of *28 x 28 = 784* pixels:

`x = tf.placeholder("float", [None, 784])`

We recall that our problem consists of assigning a probability value to each of the possible classes of membership (the digits from 0 to 9). At the end of this calculation, we will have a probability distribution, which tells us how confident we are in our prediction.

So the output will be a tensor of *10* probabilities, each one corresponding to a digit (of course, the sum of the probabilities must be one). We define *y* as the placeholder, with the same shape, that holds the desired output for each image:

`y = tf.placeholder("float", [None, 10])`

To assign probabilities to each image, we will use the so-called softmax activation function.

The softmax function is specified in two main steps:

- Calculate the *evidence* that a certain image belongs to a particular class.
- Convert the evidence into *probabilities* of belonging to each of the 10 possible classes.

To evaluate the evidence, we first define the weights tensor *W*:

`W = tf.Variable(tf.zeros([784, 10]))`

For a given image, we could evaluate the evidence for each class *i* by simply multiplying the input tensor *x* by the tensor *W*. Using TensorFlow, we should have something like this:

`evidence = tf.matmul(x, W)`

In general, models include an extra parameter, the bias, which adds an offset to the evidence that does not depend on the input; in our case, the final formula for the evidence is:

`evidence = tf.matmul(x, W) + b`

This means that for every class *i* (from 0 to 9) we have a weight vector *Wi* of *784 (28 × 28)* elements; each element *j* of *Wi* is multiplied by the corresponding component *j* of the input image, the 784 products are summed, and the corresponding bias element *bi* is added.

So to define the evidence, we must define the following tensor of biases:

`b = tf.Variable(tf.zeros([10]))`
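To see what the expression `tf.matmul(x, W) + b` computes for a single image, here is a plain-Python sketch of the same arithmetic. It is only illustrative: the function name `evidence` is ours, and the image is shrunk to a hypothetical 4 pixels and 3 classes so that the numbers can be checked by hand.

```python
def evidence(x, W, b):
    """evidence[i] = sum over j of x[j] * W[j][i], plus the bias b[i]."""
    n_classes = len(b)
    return [sum(x[j] * W[j][i] for j in range(len(x))) + b[i]
            for i in range(n_classes)]


# Hypothetical tiny example: 4 "pixels" and 3 classes instead of 784 and 10
x = [1.0, 0.0, 2.0, 1.0]
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
b = [0.5, 0.0, -1.0]

print(evidence(x, W, b))
```

Here `W` has one row per pixel and one column per class, mirroring the `[784, 10]` shape used in the TensorFlow code.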

The second step is, finally, to use the *softmax* function to obtain the output vector of probabilities, namely *activation*:

`activation = tf.nn.softmax(tf.matmul(x, W) + b)`
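The conversion from evidence to probabilities can also be written out by hand. The sketch below is a plain-Python version of the softmax formula (exponentiate each value, then normalize by the sum); the function name and the sample evidence values are hypothetical, and subtracting the maximum is a common numerical-stability trick that we add here as an assumption, not something shown in the TensorFlow code above.

```python
import math


def softmax(evidence):
    """Turn a list of evidence values into probabilities that sum to one."""
    shift = max(evidence)  # subtracting the maximum does not change the result
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical evidence for three classes
probabilities = softmax([2.5, 1.0, 2.0])
print(probabilities)
print(sum(probabilities))
```

The class with the largest evidence always receives the largest probability, so softmax preserves the ranking of the classes.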

TensorFlow’s *tf.nn.softmax* function provides a probability-based output from the input evidence tensor. Once we have implemented the model, we can specify the code needed to find the weights *W* and biases *b* of the network through the iterative training algorithm. In each iteration, the training algorithm takes the training data, applies the neural network, and compares the result with the expected one.

In order to train our model and to know when we have a good one, we must define a measure of how good it is. Our goal is to find values of the parameters *W* and *b* that minimize the value of a metric that indicates how bad the model is.

Different metrics calculate the degree of error between the desired output and the output obtained on the training data. A common measure of error is the mean squared error, or *Squared Euclidean Distance*. However, some research findings suggest using other metrics for a neural network like this one.

In this example, we use the so-called cross-entropy error function, which is defined as follows:

`cross_entropy = y*tf.log(activation)`

In order to minimize the *cross_entropy*, we could use the following combination of *tf.reduce_mean* and *tf.reduce_sum* to build the *cost* function:

```
cost = tf.reduce_mean(
    -tf.reduce_sum(cross_entropy, reduction_indices=1))
```
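The same cost can be reproduced in plain Python to check what the combination of the sum and the mean does: for each example we sum `y * log(activation)` over the classes and change its sign, then we average over the batch. The function name, the small `eps` guard against `log(0)`, and the sample labels and predictions are assumptions of this sketch.

```python
import math


def cross_entropy_cost(labels, predictions, eps=1e-12):
    """Mean over the batch of -sum(y * log(p)) computed per example."""
    per_example = []
    for ys, ps in zip(labels, predictions):
        # eps avoids log(0); it is an assumption of this sketch
        per_example.append(-sum(y * math.log(p + eps) for y, p in zip(ys, ps)))
    return sum(per_example) / len(per_example)


# Hypothetical batch: two one-hot labels and two sets of predicted probabilities
labels = [[0, 1, 0], [1, 0, 0]]
good = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
bad = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]

print(cross_entropy_cost(labels, good))
print(cross_entropy_cost(labels, bad))
```

Predictions that put more probability on the correct class give a lower cost, which is what the optimizer exploits.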

Then we must minimize it using the gradient descent optimization algorithm:

```
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
```
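The idea behind gradient descent can be illustrated on a much simpler problem. In the sketch below, which is not the TensorFlow optimizer but a hand-written stand-in, a single parameter is repeatedly moved against the gradient of a hypothetical cost *(w - 3)^2*; the function name, the starting point, and the number of steps are assumptions made for the example.

```python
def gradient_descent(gradient, start, learning_rate=0.01, steps=1000):
    """Repeatedly move the parameter against the gradient of the cost."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


# Hypothetical cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_min)
```

In the MNIST model the same kind of update is applied to every element of *W* and *b*, with the gradient of the cross-entropy cost computed automatically by TensorFlow.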

With just a few lines of code, we have built a neural network model!

### Launching the session

It’s time to build the session and launch our neural net model. We define these lists to visualize the training session:

```
avg_set = []
epoch_set=[]
```

Then we initialize the TensorFlow variables:

`init = tf.initialize_all_variables()`

Start the session:

```
with tf.Session() as sess:
    sess.run(init)
```

As explained, each epoch is a training cycle:

```
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
```

Then we loop over all batches:

```
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
```

Fit training using batch data:

`sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})`

Compute the average loss by running the *cost* function with the given image values (*x*) and the real output (*y*):

```
            avg_cost += sess.run(cost, feed_dict={x: batch_xs,
                                                  y: batch_ys})/total_batch
```
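The bookkeeping done by the two loops can be tried out without TensorFlow. In the following sketch, `iterate_batches` is a hypothetical stand-in for `mnist.train.next_batch` that simply slices a list, and `average_cost` accumulates `batch_cost / total_batch` in the same way as the training loop; the example data and the per-batch costs are made up for the illustration.

```python
def iterate_batches(examples, batch_size):
    """Yield consecutive mini-batches, dropping any incomplete final batch."""
    total_batch = int(len(examples) / batch_size)
    for i in range(total_batch):
        yield examples[i * batch_size:(i + 1) * batch_size]


def average_cost(batch_costs):
    """Accumulate cost / total_batch for each batch, as in the training loop."""
    total_batch = len(batch_costs)
    avg_cost = 0.
    for batch_cost in batch_costs:
        avg_cost += batch_cost / total_batch
    return avg_cost


# Hypothetical data: 10 examples split into batches of 3
examples = list(range(10))
batches = list(iterate_batches(examples, batch_size=3))
print(batches)

# Hypothetical per-batch costs
print(average_cost([1.0, 2.0, 3.0]))
```

Dividing each batch cost by `total_batch` before adding it means that `avg_cost` already holds the mean when the inner loop ends, with no separate division step.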

During the computation, we display a log per epoch step:

```
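        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1), \
                  "cost=", "{:.9f}".format(avg_cost)
        # Record the cost of this epoch so that it can be plotted later
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print " Training phase finished"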
```

Let’s get the accuracy of our model. A prediction is correct if the index with the highest value in the output vector is the same as the index of the real digit in the label vector; the mean of *correct_prediction* gives us the accuracy. We need to run the accuracy function with our test set (*mnist.test*).

We use the keys *images* and *labels* for *x* and *y*:

```
    correct_prediction = tf.equal(tf.argmax(activation, 1),
                                  tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "MODEL accuracy:", accuracy.eval({x: mnist.test.images,
                                            y: mnist.test.labels})
```
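The accuracy computation can be mirrored in plain Python to make the role of the argmax explicit. The helper names and the three hypothetical predictions below are our own; the logic follows the same idea of comparing the index of the largest predicted probability with the index of the 1 in the one-hot label.

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of examples whose predicted class matches the labeled class."""
    correct = [argmax(p) == argmax(y) for p, y in zip(predictions, labels)]
    return sum(correct) / float(len(correct))


# Hypothetical outputs for three images and three classes
predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(accuracy(predictions, labels))
```

In this example the first two images are classified correctly and the third is not, so two out of three predictions count as correct.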

### Test evaluation

We have seen the training phase in the preceding sections; for each epoch, we printed the corresponding value of the *cost* function:

```
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ======================= RESTART ============================
>>>
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 1.174406662
Epoch: 0002 cost= 0.661956009
Epoch: 0003 cost= 0.550468774
Epoch: 0004 cost= 0.496588717
Epoch: 0005 cost= 0.463674555
Epoch: 0006 cost= 0.440907706
Epoch: 0007 cost= 0.423837747
Epoch: 0008 cost= 0.410590841
Epoch: 0009 cost= 0.399881751
Epoch: 0010 cost= 0.390916621
Epoch: 0011 cost= 0.383320325
Epoch: 0012 cost= 0.376767031
Epoch: 0013 cost= 0.371007620
Epoch: 0014 cost= 0.365922904
Epoch: 0015 cost= 0.361327561
Epoch: 0016 cost= 0.357258660
Epoch: 0017 cost= 0.353508228
Epoch: 0018 cost= 0.350164634
Epoch: 0019 cost= 0.347015593
Epoch: 0020 cost= 0.344140861
Epoch: 0021 cost= 0.341420144
Epoch: 0022 cost= 0.338980592
Epoch: 0023 cost= 0.336655581
Epoch: 0024 cost= 0.334488012
Epoch: 0025 cost= 0.332488823
Training phase finished
```

As we saw, the cost function is minimized during the training phase. At the end of the test, we show how accurate the trained model is:

```
Model Accuracy: 0.9475
>>>
```

Finally, using these lines of code, we can visualize the training phase of the net:

```
plt.plot(epoch_set,avg_set, 'o',
label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
```

Training phase in logistic regression

# Summary

In this article, we learned about artificial neural networks and the Single Layer Perceptron, and how to implement a single layer network with TensorFlow. We also learned how to build the model and launch the session.