What is a convolutional neural network, exactly?

Well, let’s start with the basics: a convolutional neural network (CNN) is a type of neural network that is most often applied to image processing problems. You’ve probably seen one in action anywhere a computer is identifying objects in an image. But you can also use convolutional neural networks in natural language processing projects.

The fact that they are useful in these fast-growing areas is one of the main reasons they’re so important in deep learning and artificial intelligence today.

What makes a convolutional neural network unique?

Once you understand how a convolutional neural network works and what makes it different from other neural networks, you can see why it’s so effective for processing and classifying images.

But let’s first take a regular neural network. A regular neural network has an input layer, hidden layers and an output layer. The input layer accepts inputs in different forms, while the hidden layers perform calculations on these inputs. The output layer then delivers the outcome of those calculations. Each of these layers contains neurons that are connected to every neuron in the previous layer, and each connection has its own weight.

This means you aren’t making any assumptions about the data being fed into the network – great usually, but not if you’re working with images or language.

Convolutional neural networks work differently, as they treat data as spatial. Instead of being connected to every neuron in the previous layer, each neuron is connected only to the neurons close to it, and the same set of weights is reused at every position. This simplification in the connections means the network preserves the spatial aspect of the data set. It means your network doesn’t think an eye is all over the image.
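To get a feel for how much this simplifies things, here’s a rough back-of-the-envelope sketch in Python. The image size, number of hidden neurons, filter size and number of filters are all made-up values chosen for illustration, and biases are ignored.

```python
# Hypothetical sizes, chosen only for illustration.
height, width = 28, 28   # a small greyscale image
hidden_units = 100       # neurons in a fully connected hidden layer
kernel_size = 3          # a 3x3 convolutional filter
num_filters = 100        # filters in a convolutional layer

# Fully connected: every neuron has one weight per input pixel.
dense_weights = height * width * hidden_units

# Convolutional: each filter has kernel_size * kernel_size weights,
# and those same weights are reused at every position in the image.
conv_weights = kernel_size * kernel_size * num_filters

print(dense_weights)  # 78400
print(conv_weights)   # 900
```

Under these assumptions the convolutional layer needs far fewer weights, because it only ever looks at a small neighbourhood and reuses the same filter everywhere.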

The word ‘convolutional’ refers to the filtering process that happens in this type of network. Think of it this way: an image is complex, and a convolutional neural network simplifies it so it can be better processed and ‘understood.’

What’s inside a convolutional neural network?

Like a normal neural network, a convolutional neural network is made up of multiple layers.

Two layers make it unique: the convolutional layer and the pooling layer. However, like other neural networks, it will also have a ReLU (rectified linear unit) layer and a fully connected layer. The ReLU layer acts as an activation function, introducing non-linearity as the data moves through each layer in the network. Without it, a stack of layers would behave like a single linear transformation, however many layers you added, and the network couldn’t learn complex patterns. The fully connected layer, meanwhile, allows you to perform classification on your dataset.
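The rectified linear unit itself is very simple: it keeps positive values and replaces negative ones with zero. A minimal sketch in plain Python, where the function name `relu` and the input values are just illustrative:

```python
def relu(values):
    """Replace every negative value with zero; leave the rest unchanged."""
    return [max(0.0, v) for v in values]


activated = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```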

The convolutional layer

The convolutional layer is the most important, so let’s start there. It works by sliding a filter over an array of image pixels – this then creates what’s called a convolved feature map. It’s a bit like looking at an image through a window, which allows you to identify specific features you might not otherwise be able to see.
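Here’s a minimal sketch of that filtering step in plain Python, assuming no padding and a stride of one. The tiny image and the 2x2 filter are invented for illustration; in a real network the filter values are learned during training. Like most deep learning libraries, the sketch doesn’t flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter by the patch underneath it and sum up.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is only non-zero in the middle column, which is exactly where the image switches from dark to bright.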

The pooling layer

Next we have the pooling layer – this downsamples a feature map, reducing its width and height. This also makes processing much faster, as it reduces the number of values the later layers need to process. The output of this is a pooled feature map. There are two common ways of doing this: max pooling, which takes the maximum value in each small region of the convolved feature map, and average pooling, which simply takes the average. These steps amount to feature extraction, whereby the network builds up a picture of the image data according to its own mathematical rules.
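Both kinds of pooling can be sketched in a few lines of plain Python. This version assumes non-overlapping 2x2 windows; the function name `pool2d` and the feature map values are made up for illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample using non-overlapping size x size windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[4, 2], [2, 8]]
print(avg_pooled)  # [[2.5, 1.0], [0.75, 6.5]]
```

Either way, a 4x4 feature map becomes a 2x2 one, so the next layer has a quarter as many values to deal with.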

If you want to perform classification, you’ll need to move into the fully connected layer. To do this, you’ll need to flatten things out – remember, a fully connected layer connects every input to every neuron, so it expects a flat, one-dimensional list of values rather than a two-dimensional feature map.
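As a rough sketch of flattening followed by a fully connected layer, here is a plain Python version with two output neurons, one per class. The pooled feature map, the weights and the biases are all hypothetical values; in a trained network the weights and biases would be learned.

```python
def flatten(feature_map):
    """Turn a 2D feature map into a flat list, row by row."""
    return [value for row in feature_map for value in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of every input, plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


pooled = [[4, 2], [2, 8]]  # a hypothetical 2x2 pooled feature map
vector = flatten(pooled)   # [4, 2, 2, 8]

# Hypothetical weights: one row of four weights per output class.
weights = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
biases = [0.0, 0.0]

scores = fully_connected(vector, weights, biases)
predicted_class = scores.index(max(scores))
print(scores)           # [3.0, 5.0]
print(predicted_class)  # 1
```

The class with the highest score is the network’s prediction. Real networks usually pass these scores through a softmax function to turn them into probabilities.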

How to train a convolutional neural network

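There are a number of ways you can train a convolutional neural network. If you have labelled data, the most common approach is supervised learning, where the network’s weights are adjusted through backpropagation to reduce its classification error.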

If you’re working with unlabelled data, you can use unsupervised learning methods. One of the most popular ways of doing this is using auto-encoders – this allows you to squeeze data into a low-dimensional space, performing calculations in the first part of the convolutional neural network. Once this is done you’ll then need to reconstruct the original input with additional layers that upsample the data you have.
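The squeeze-then-reconstruct idea can be sketched without any training at all. In the plain Python below, fixed 2x2 averaging stands in for the learned compression layers and simple value repetition stands in for the learned upsampling layers, so this is only an illustration of how the data flows, not a real auto-encoder. The image values are made up. The reconstruction error at the end is the kind of quantity that training would try to minimise.

```python
def encode(image):
    """Compress by averaging each non-overlapping 2x2 block.

    A fixed stand-in for the learned layers of a real auto-encoder.
    """
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def decode(code):
    """Upsample by repeating each value over a 2x2 block."""
    output = []
    for row in code:
        wide_row = [value for value in row for _ in range(2)]
        output.append(wide_row)
        output.append(list(wide_row))
    return output


def mean_squared_error(a, b):
    """Average squared difference between two equally sized grids."""
    diffs = [
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 4],
    [0, 0, 2, 4],
]

code = encode(image)            # 16 values squeezed down to 4
reconstruction = decode(code)   # back up to 16 values
error = mean_squared_error(image, reconstruction)

print(code)   # [[1.0, 0.0], [0.0, 3.0]]
print(error)  # 0.25
```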

Another option is to use generative adversarial networks, or GANs. With a GAN, you train two networks. The first, the generator, gives you artificial data samples that should resemble data in the training set, while the second is a ‘discriminative network’ – it should distinguish between the artificial samples and the ‘true’ ones.

What’s the difference between a convolutional neural network and a recurrent neural network?

Although there’s a lot of confusion about the difference between a convolutional neural network and a recurrent neural network, it’s actually simpler than many people realise.

Whereas a convolutional neural network is a feedforward network that filters spatial data, a recurrent neural network, as the name implies, feeds data back into itself. From this perspective recurrent neural networks are better suited to sequential data.

Think of it like this: a convolutional network is able to perceive patterns across space – a recurrent neural network can see them over time.
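To make ‘feeds data back into itself’ a little more concrete, here’s a minimal sketch in plain Python that compares a feedforward pass with a recurrent one over the same short sequence. The weights `w_input` and `w_state` and the input values are arbitrary numbers picked for illustration, and each ‘network’ is just a single neuron.

```python
import math


def feedforward_pass(sequence, w_input=0.5):
    """Each output depends only on the current input."""
    return [math.tanh(w_input * x) for x in sequence]


def recurrent_pass(sequence, w_input=0.5, w_state=0.8):
    """Each output also depends on the previous output (the state)."""
    state = 0.0
    states = []
    for x in sequence:
        # The previous state is fed back in alongside the new input.
        state = math.tanh(w_input * x + w_state * state)
        states.append(state)
    return states


sequence = [1.0, 0.0, 0.0]
feedforward_outputs = feedforward_pass(sequence)
recurrent_outputs = recurrent_pass(sequence)

print(feedforward_outputs)
print(recurrent_outputs)
```

The feedforward outputs drop to zero as soon as the input does, while the recurrent outputs stay above zero because the first input is still echoing through the state. That memory of earlier steps is what makes recurrent networks a natural fit for sequences.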

How to get started with convolutional neural networks

If you want to get started with convolutional neural networks, Python and TensorFlow are great tools to begin with. It’s worth exploring the MNIST dataset too. This is a database of handwritten digits that you can use to get started with building your first convolutional neural network.

To learn more about convolutional neural networks, artificial intelligence, and deep learning, visit Packt’s store for eBooks and videos.

Co-editor of the Packt Hub. Interested in politics, tech culture, and how software and business are changing each other.