If you are a deep learning practitioner or someone who wants to get into the world of deep learning, you are probably well acquainted with neural networks already. Neural networks, inspired by biological neural networks, are very useful for solving complex, multi-layered computational problems. Deep learning has stood out in several high-profile research fields, including facial and speech recognition, natural language processing, machine translation, and more.

In this article, we look at the top 5 popular and widely used deep learning architectures you should know to advance your knowledge or your deep learning research.

Convolutional Neural Networks

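Convolutional Neural Networks, or CNNs for short, are the popular choice of neural network for computer vision tasks such as image recognition. The name ‘convolution’ comes from the mathematical operation of the same name, which combines two functions by sliding one over the other and summing their products at every position. In a CNN, small learned filters (kernels) are slid across the input in this way.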
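As a minimal illustration of the operation itself (plain Python, hypothetical helper names), the sketch below computes a discrete 1D convolution in ‘valid’ mode, where the kernel is flipped before it is slid along the signal. Most deep learning libraries actually compute cross-correlation, which skips the flip, and still call the layer a convolution; both are shown for comparison.

```python
def convolve1d(signal, kernel):
    """True discrete convolution in 'valid' mode: the kernel is flipped."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def cross_correlate1d(signal, kernel):
    """What CNN layers usually compute: slide the kernel without flipping it."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [0, 1, 2, 3, 4]
kernel = [1, 0, -1]
print(convolve1d(signal, kernel))         # [2, 2, 2]
print(cross_correlate1d(signal, kernel))  # [-2, -2, -2]
```

For a symmetric kernel the two operations give identical results; for a learned kernel the difference is irrelevant in practice, since the network simply learns the flipped weights.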

There are 4 primary steps or stages in designing a CNN:

  • Convolution: Learned filters are slid across the input signal (for example, an image), producing feature maps that highlight local patterns such as edges
  • Subsampling: The feature maps from the convolution layer are downsampled, typically by pooling, which shrinks them and makes the network less sensitive to noise and small variations
  • Activation: A non-linear function such as ReLU controls how the signal flows from one layer to the next, loosely similar to the neurons in our brain
  • Fully connected: In this stage, every neuron in one layer is connected to every neuron in the subsequent layer, typically to produce the final output, such as class scores
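The four stages can be sketched end to end in plain Python. This is a toy under stated assumptions: a single-channel 5x5 ‘image’, one hand-picked 2x2 edge filter standing in for learned weights, ReLU as the activation, 2x2 max pooling as the subsampling step, and an arbitrary fully connected layer. The sketch applies the activation before pooling, as is common in practice; for ReLU and max pooling the order does not change the result, because ReLU never reverses the order of two values.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, as computed by typical CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation: keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Subsampling: non-overlapping max pooling (ragged edges are dropped)."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(values, weights, bias):
    """Every input value feeds every output unit."""
    return [sum(v * w for v, w in zip(values, row)) + b
            for row, b in zip(weights, bias)]


# A 5x5 image with a vertical dark-to-bright edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1],
               [-1, 1]]                       # responds to dark-to-bright steps

features = relu(conv2d(image, edge_kernel))   # 4x4 feature map
pooled = max_pool2d(features)                 # 2x2 after subsampling
flat = [v for row in pooled for v in row]     # flatten to 4 values
scores = fully_connected(flat, weights=[[1, 1, 1, 1], [-1, -1, -1, -1]], bias=[0, 0])

print(features)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)    # [[2, 0], [2, 0]]
print(scores)    # [4, -4]
```

In a real CNN the filter and fully connected weights are learned by backpropagation, and each convolution layer uses many filters at once.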

Here is an in-depth look at the CNN architecture and how it works, as explained by the popular AI researcher Giancarlo Zaccone.

A sample CNN in action

Advantages of CNN

  • Very good for visual recognition
  • Because the same filter is applied at every position, once the CNN learns to detect a pattern in one part of an image, it can recognize that pattern anywhere else in the image
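This advantage follows from weight sharing: one filter is applied at every position, so when a pattern moves within the input, the filter's strongest response moves with it. A small 1D sketch, with a hypothetical hand-picked filter standing in for a learned one:

```python
def feature_map(signal, kernel):
    """Slide one shared kernel over every position ('valid' cross-correlation)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


pattern_detector = [1, 2, 1]          # hypothetical "learned" filter
early = [1, 2, 1, 0, 0, 0, 0, 0]      # pattern at the start
late = [0, 0, 0, 0, 1, 2, 1, 0]       # same pattern, shifted right by 4

resp_early = feature_map(early, pattern_detector)
resp_late = feature_map(late, pattern_detector)

# Same filter, same peak strength, but the peak position follows the pattern.
print(resp_early.index(max(resp_early)))  # 0
print(resp_late.index(max(resp_late)))    # 4
```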

Disadvantages of CNN

  • CNNs are highly dependent on the size and quality of the training data
  • They can be highly susceptible to noise

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been very popular in areas where the order in which information is presented is crucial. As a result, they find a lot of applications in real-world domains such as natural language processing, speech synthesis, and machine translation.

RNNs are called ‘recurrent’ because they perform the same task for every element of a sequence, with the output depending on the previous computations as well. Think of these networks as having a memory: information computed at earlier steps is captured in a hidden state, carried forward, and used to calculate the final outcome.
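The recurrence can be sketched with a scalar hidden state. This assumes the simple Elman-style update h_t = tanh(w_x·x_t + w_h·h_{t-1} + b) with hand-picked weights; practical RNNs use vector states and weight matrices, but the structure is the same: one set of parameters reused at every step, and a hidden state that carries information forward.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    The same three parameters (w_x, w_h, b) are reused at every time step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Same inputs in a different order give a different final state, because
# each step depends on the hidden state carried over from earlier steps.
seq_first = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
seq_last = rnn_forward([0.0, 0.0, 1.0], w_x=1.0, w_h=0.5, b=0.0)
print(seq_first[-1], seq_last[-1])
```

In the first sequence the early 1.0 is still faintly "remembered" at the last step; in the second it has only just arrived.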

Over the years, quite a few varieties of RNNs have been researched and developed:

  • Bidirectional RNN – The output at each step depends not only on the past elements of the sequence but also on the future ones
  • Deep RNN – Multiple RNN layers are stacked at each step, which lets the network learn more complex representations and can improve accuracy
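A bidirectional RNN can be sketched as two independent RNNs, one reading the sequence left to right and one reading it right to left, whose states are paired at each position. The scalar update and hand-picked weights below are assumptions for illustration; the last lines show the deep variant in the same spirit, where one layer's hidden states become the next layer's inputs.

```python
import math


def run_rnn(inputs, w_x, w_h):
    """Scalar tanh RNN; returns the hidden state after every step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_rnn(inputs, fwd_params, bwd_params):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = run_rnn(inputs, *fwd_params)
    backward = run_rnn(inputs[::-1], *bwd_params)[::-1]  # re-align with input order
    return list(zip(forward, backward))


sequence = [0.0, 0.0, 1.0]
outputs = bidirectional_rnn(sequence, fwd_params=(1.0, 0.5), bwd_params=(1.0, 0.5))
# At step 0 the forward state has only seen a 0.0, but the backward state
# already reflects the 1.0 that appears later in the sequence.
print(outputs[0])

# Deep RNN: feed one layer's hidden states in as the next layer's inputs.
layer1 = run_rnn(sequence, 1.0, 0.5)
layer2 = run_rnn(layer1, 1.0, 0.5)
print(layer2)
```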

RNNs can be used to build industry-standard chatbots that interact with customers on websites. Given a sequence of signals from an audio wave, RNNs can also predict the most likely sequence of phonetic segments, along with its probability.

Simplest representation of how an RNN works

Advantages of RNN

  • Unlike a traditional feedforward neural network, an RNN shares the same parameters across all time steps. This greatly reduces the number of parameters that need to be learned
  • RNNs can be combined with CNNs to generate accurate descriptions of unlabeled images

Disadvantages of RNN

  • RNNs find it difficult to track long-term dependencies. This is especially true for long sentences and paragraphs with many words between, say, a noun and the verb that refers to it.
  • RNNs are hard to stack into very deep models. Saturating activation functions such as tanh and sigmoid have derivatives of at most 1 and 0.25 respectively, so the gradient tends to shrink as it is propagated back through many layers or time steps (the vanishing gradient problem).
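The vanishing-gradient effect can be seen with a little arithmetic. For a scalar tanh RNN, the chain rule multiplies the gradient by w_h·(1 - h_t^2) at every step, and since the tanh derivative is at most 1, a recurrent weight below 1 makes the product shrink geometrically with sequence length. This sketch (hypothetical function name, hand-picked weights) computes that product exactly for a constant input:

```python
import math


def gradient_through_time(inputs, w_x, w_h, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    Each step multiplies the running gradient by w_h * tanh'(a_t), and
    tanh'(a_t) = 1 - h_t**2 is never larger than 1.
    """
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        grad *= w_h * (1.0 - h * h)
    return grad


grads = {n: gradient_through_time([0.5] * n, w_x=1.0, w_h=0.9) for n in (5, 20, 50)}
for n, g in grads.items():
    print(f"{n:>2} steps: gradient factor {g:.3e}")
```

By 50 steps the signal reaching the first step is vanishingly small, which is why plain RNNs struggle with long-range dependencies and why gated designs such as LSTMs were introduced.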

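Autoencoders

Autoencoders apply backpropagation in an unsupervised setting: the network is trained to reproduce its own input, so no labels are needed. Interestingly, autoencoders closely resemble PCA (Principal Component Analysis), except that they are more flexible. A purely linear autoencoder trained with squared error learns the same subspace as PCA, while non-linear activations let autoencoders capture more complex structure.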

One popular application of autoencoders is anomaly detection, for example detecting fraudulent financial transactions in banks. The core task of the autoencoder is to learn what constitutes regular, normal data; inputs that it reconstructs poorly can then be flagged as outliers or anomalies.

Autoencoders usually represent data through one or more hidden layers, often including a narrow ‘bottleneck’ layer, and are trained so that the output signal is as close as possible to the input signal.
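To make the compress-and-reconstruct idea and the anomaly-detection use concrete, here is a minimal sketch of a linear autoencoder with a one-unit bottleneck, trained by hand-written backpropagation on toy 2D points that lie on the line y = x. Everything here is an assumption for illustration: the data, the fixed initialisation, the learning rate, and the anomaly threshold. Because the model is linear, it effectively learns the same direction PCA would find; points off that direction reconstruct poorly, which is what an anomaly detector exploits.

```python
def train_linear_autoencoder(data, lr=0.1, steps=500):
    """Linear autoencoder: 2 inputs -> 1-unit bottleneck -> 2 outputs.

    Trained with plain gradient descent on the mean squared reconstruction
    error. No labels are used: the target is the input itself.
    """
    w_enc = [0.5, 0.5]  # fixed initialisation keeps the example deterministic
    w_dec = [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            code = w_enc[0] * x[0] + w_enc[1] * x[1]        # encode
            recon = [w_dec[0] * code, w_dec[1] * code]      # decode
            err = [recon[i] - x[i] for i in range(2)]
            for i in range(2):                              # dLoss/dw_dec
                g_dec[i] += 2 * err[i] * code / n
            d_code = sum(2 * err[i] * w_dec[i] for i in range(2))
            for j in range(2):                              # dLoss/dw_enc
                g_enc[j] += d_code * x[j] / n
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec


def reconstruction_error(x, w_enc, w_dec):
    """Squared distance between a point and its reconstruction."""
    code = w_enc[0] * x[0] + w_enc[1] * x[1]
    return sum((w_dec[i] * code - x[i]) ** 2 for i in range(2))


# "Normal" data lies on the line y = x.
normal = [(t, t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_enc, w_dec = train_linear_autoencoder(normal)

threshold = 0.1  # hypothetical cut-off for flagging anomalies
for point in [(0.8, 0.8), (1.0, -1.0)]:
    score = reconstruction_error(point, w_enc, w_dec)
    print(point, round(score, 4), "anomaly" if score > threshold else "normal")
```

The on-line point (0.8, 0.8) is reconstructed almost perfectly, while (1.0, -1.0) cannot be squeezed through the learned bottleneck and gets a large error. Adding non-linear hidden layers is what gives autoencoders their extra flexibility over PCA.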

There are 4 major types of autoencoders being used today:

  • Vanilla autoencoder – the simplest form of autoencoder there is, i.e. a neural net with one hidden layer
  • Multilayer autoencoder – when one hidden layer is not enough, an autoencoder can be extended to include more hidden layers
  • Convolutional autoencoder – In this type, convolutions are used in the autoencoder instead of fully connected layers
  • Regularized autoencoder – this type of autoencoder uses a special loss function that gives the model properties beyond the basic ability to copy a given input to the output

A separate article demonstrates training an autoencoder using H2O, a popular machine learning and AI platform.

A basic representation of Autoencoder

Advantages of Autoencoders

  • Autoencoders produce a model that is learned primarily from the data rather than built from predefined filters
  • Their relatively low complexity makes them easier to train

Disadvantages of Autoencoders

  • Training can sometimes take a very long time
  • If the training data is not representative of the testing data, the information that comes out of the model can be obscured and unclear
  • Some autoencoders, especially variational autoencoders, introduce a deterministic bias into the model

Generative Adversarial Networks

The basic premise of Generative Adversarial Networks (GANs) is to train two deep learning models simultaneously. The two networks compete with each other: the generator tries to produce new instances or examples, while the discriminator tries to classify whether a particular instance comes from the training data or from the generator.
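The adversarial setup can be sketched with deliberately tiny, hypothetical models: 1D toy data, a linear generator G(z) = a·z + b and a logistic-regression discriminator D(x) = sigmoid(w·x + c), with gradients worked out by hand. It performs one alternating round of training: a discriminator step on the binary cross-entropy loss, then a generator step. For the generator it uses the ‘non-saturating’ loss -log D(G(z)), which Goodfellow et al. suggest in the original paper as a practical substitute for minimizing log(1 - D(G(z))). Real GANs repeat such rounds many times, with deep networks on both sides.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def discriminator(x, w, c):
    """Logistic 'discriminator': estimated probability that x is real."""
    return sigmoid(w * x + c)


def generator(z, a, b):
    """Linear 'generator': maps a noise value z to a fake sample."""
    return a * z + b


def d_loss(real, noise, w, c, a, b):
    """Binary cross-entropy that rewards D(real) -> 1 and D(fake) -> 0."""
    fake = [generator(z, a, b) for z in noise]
    real_term = -sum(math.log(discriminator(x, w, c)) for x in real) / len(real)
    fake_term = -sum(math.log(1.0 - discriminator(x, w, c)) for x in fake) / len(fake)
    return real_term + fake_term


def g_loss(noise, w, c, a, b):
    """Non-saturating generator loss that rewards D(fake) -> 1."""
    return -sum(math.log(discriminator(generator(z, a, b), w, c))
                for z in noise) / len(noise)


def d_step(real, noise, w, c, a, b, lr):
    """One gradient-descent step on (w, c) with the generator held fixed."""
    gw = gc = 0.0
    for x in real:                    # d/ds[-log(sigmoid(s))] = sigmoid(s) - 1
        g = discriminator(x, w, c) - 1.0
        gw += g * x / len(real)
        gc += g / len(real)
    for z in noise:                   # d/ds[-log(1 - sigmoid(s))] = sigmoid(s)
        x = generator(z, a, b)
        g = discriminator(x, w, c)
        gw += g * x / len(noise)
        gc += g / len(noise)
    return w - lr * gw, c - lr * gc


def g_step(noise, w, c, a, b, lr):
    """One gradient-descent step on (a, b) with the discriminator held fixed."""
    ga = gb = 0.0
    for z in noise:
        x = generator(z, a, b)
        g = (discriminator(x, w, c) - 1.0) * w   # chain rule through D's logit
        ga += g * z / len(noise)
        gb += g / len(noise)
    return a - lr * ga, b - lr * gb


real = [3.8, 4.0, 4.2]            # toy "training data"
noise = [-1.0, 0.0, 1.0]          # fixed noise samples z
w, c, a, b = 0.1, 0.0, 0.5, 0.0   # hypothetical starting parameters

# Discriminator turn: get better at telling real from fake.
before_d = d_loss(real, noise, w, c, a, b)
w, c = d_step(real, noise, w, c, a, b, lr=0.05)
after_d = d_loss(real, noise, w, c, a, b)

# Generator turn: get better at fooling the updated discriminator.
b_before = b
before_g = g_loss(noise, w, c, a, b)
a, b = g_step(noise, w, c, a, b, lr=0.05)
after_g = g_loss(noise, w, c, a, b)

print(before_d, after_d)   # discriminator loss goes down
print(before_g, after_g)   # generator loss goes down
print(b_before, b)         # fake samples shift toward the real data
```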

GANs, a recent breakthrough in deep learning, were introduced by the deep learning expert Ian Goodfellow and his colleagues in 2014. They have found important applications in computer vision, especially image generation.

Read more about the structure and functionality of GANs in the original paper by Ian Goodfellow and his colleagues.

General architecture of GAN (Source: deeplearning4j)

Advantages of GAN

  • Per Goodfellow, GANs allow for efficient training of classifiers in a semi-supervised manner
  • A well-trained generator can produce data that is almost indistinguishable from the original data
  • Unlike variational autoencoders, GANs do not introduce a deterministic bias

Disadvantages of GAN

  • The success of a GAN depends on both the generator and the discriminator working well; if either one fails, the whole system fails
  • The generator and discriminator are separate networks trained with different loss functions, so the time required to train the entire system can get quite high

Interested in learning more about GANs? Here’s what you need to know about them.

Residual Networks

Ever since they gained popularity in 2015, ResNets, or Deep Residual Networks, have been widely adopted by data scientists and AI researchers.

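As you already know, CNNs are highly useful for image classification and visual recognition problems. As these tasks become more complex, more layers are needed to improve the model's accuracy, but deeper networks also become much harder to train: beyond a certain depth, simply adding layers can make even the training accuracy worse. Residual learning is a concept designed to tackle this very problem, and the resulting architecture is popularly known as a ResNet. Instead of asking a stack of layers to learn a target mapping H(x) directly, a residual block learns only the residual F(x) = H(x) - x and adds the input back through a shortcut connection, so its output is F(x) + x.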

A ResNet consists of a number of stacked residual modules (blocks). Each block contains a few layers that apply a set of functions, such as convolutions and activations, to the input, plus the shortcut connection that carries the input past them. The depth of a ResNet can vary greatly: the one developed by Microsoft researchers for an image classification problem had 152 layers!
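A residual block can be sketched in a few lines. This is a simplified, hypothetical version: dense layers instead of convolutions, no batch normalization, and no ReLU after the addition (the original ResNet applies one there; later ‘pre-activation’ variants drop it). It shows the key property of the shortcut: when the learned layers output zero, the block passes its input through unchanged.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def residual_block(x, w1, w2):
    """y = x + F(x), with F(x) = W2 · relu(W1 · x).

    The identity shortcut carries x around the two layers unchanged, so the
    layers only have to learn the residual F(x) = H(x) - x.
    """
    f = matvec(w2, relu(matvec(w1, x)))
    return [xi + fi for xi, fi in zip(x, f)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) = 0, so even 100 stacked blocks return x exactly:
# extra depth does not have to destroy what earlier layers computed.
y = x
for _ in range(100):
    y = residual_block(y, zeros, zeros)
print(y)  # [1.0, -2.0, 3.0]

# With non-zero weights, the block adds a learned correction on top of x.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(residual_block(x, identity, identity))  # [2.0, -2.0, 6.0]
```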

A basic building block of ResNet (Source: Quora)

Advantages of ResNets

  • ResNets can be more accurate and require fewer weights than LSTMs and RNNs in some cases
  • They are highly modular: hundreds or even thousands of residual layers can be stacked to create a network and then trained
  • ResNets can be designed to determine how deep a particular network needs to be

Disadvantages of ResNets

  • If a ResNet is too deep, errors can be hard to detect and may not propagate back quickly and correctly. At the same time, if the layers are too narrow, learning might not be very efficient.

Apart from the ones above, a few more deep learning models are being increasingly adopted and preferred by data scientists. These definitely deserve an honorable mention:

  • LSTM: LSTMs are a special kind of Recurrent Neural Network that include a memory cell that can hold information for long periods of time. A set of gates determines when particular information enters the memory and when it is forgotten.
  • SqueezeNet: One of the newer but very powerful deep learning architectures, which is extremely efficient for low-bandwidth platforms such as mobile.
  • CapsNet: Capsule Networks (CapsNets) are a recent breakthrough in deep learning and neural network modeling. They are mainly used for accurate image recognition and are an advanced variation of CNNs.
  • SegNet: A popular deep learning architecture especially used to solve the image segmentation problem.
  • Seq2Seq: An up-and-coming deep learning architecture increasingly used for machine translation and building efficient chatbots.
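For the LSTM entry above, one step of a scalar LSTM cell can be sketched as follows. It assumes the common formulation with forget, input and output gates plus a tanh candidate value; the parameter layout and the hand-picked weights are hypothetical, chosen to show the memory cell holding a stored value over many steps rather than to reflect learned values.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p maps each gate name to (input weight, recurrent weight, bias):
    'f' forget, 'i' input, 'o' output, 'g' candidate value.
    """
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # how much of the old memory to keep
    i = sigmoid(pre("i"))      # how much new information to write
    o = sigmoid(pre("o"))      # how much of the memory to expose
    g = math.tanh(pre("g"))    # candidate new information
    c = f * c_prev + i * g     # updated memory cell
    h = o * math.tanh(c)       # hidden state / output
    return h, c


# Forget gate biased fully open, input gate nearly closed: a value already
# in the cell survives many steps of zero input.
params = {"f": (0.0, 0.0, 10.0), "i": (0.0, 0.0, -10.0),
          "o": (0.0, 0.0, 0.0), "g": (1.0, 0.0, 0.0)}
h, c = 0.0, 1.0                # start with 1.0 stored in the memory cell
for _ in range(50):
    h, c = lstm_step(0.0, h, c, params)
print(round(c, 3))
```

Because the cell state is updated additively and scaled only by the forget gate, the stored value barely decays, unlike the repeated tanh squashing of a plain RNN.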

So there you have it! Thanks to intense research efforts in deep learning and AI, we now have a variety of deep learning models at our disposal to solve a variety of problems, both functional and computational. What’s even better is that we have the liberty to choose the most appropriate deep learning architecture based on the problem at hand.

Editor’s Tip: It is very important to know the best deep learning frameworks you can use to train your models. Here are the top 10 deep learning frameworks for you.

In contrast to the traditional programming approach, where we tell the computer exactly what to do, deep learning models learn from data how to solve a problem, even a very complex one. No wonder these deep learning architectures are being researched and deployed on a large scale by major market players such as Google, Facebook, Microsoft, and many others.
