Dr. Brandon explains 'Transfer Learning' to Jon

[box type="shadow" align="" class="" width=""]Dr. Brandon: Hello and welcome to another episode of 'Date with Data Science'. Today we are going to talk about a topic that is all the rage these days in the data science community: Transfer Learning.

Jon: 'Transfer learning' sounds all sci-fi to me. Is it like the thing that Prof. X does in X-men reading other people's minds using that dome-like headset thing in his chamber?

Dr. Brandon: If we are going to get X-men involved, what Prof. X does is closer to deep learning. We will talk about that another time. Transfer learning is simpler to explain. It's what you actually do everytime you get into some character, Jon.

Say, you are given the role of Jack Sparrow to play. You will probably read a lot about pirates, watch a lot of pirate movies and even Jonny Depp in character and form your own version of Jack Sparrow. Now after that acting assignment is over, say you are given the opportunity to audition for the role of Captain Hook, the famous pirate from Peter Pan. You won't do your research from ground zero this time. You will retain general mannerisms of a Pirate you learned from your previous role, but will only learn the nuances of Captain Hook, like acting one-handed.

Jon: That's pretty cool! So you say machines can also learn this way?

Dr.Brandon: Of course, that's what transfer learning is all about: learn something, abstract the learning sufficiently, then apply it to another related problem. The following is an excerpt from a book by Kuntal Ganguly titled Learning Generative Adversarial Networks.[/box]

Pre-trained models are not optimized for tackling user specific datasets, but they are extremely useful for the task at hand that has similarity with the trained model task.

For example, a popular model, InceptionV3, is optimized for classifying images on a broad set of 1000 categories, but our domain might be to classify some dog breeds. A well-known technique used in deep learning that adapts an existing trained model for a similar task to the task at hand is known as Transfer Learning.

And this is why Transfer Learning has gained a lot of popularity among deep learning practitioners and in recent years has become the go-to technique in many real-life use cases. It is all about transferring knowledge (or features) among related domain.

Purpose of Transfer Learning

Let say you have trained a deep neural network to differentiate between fresh mango and rotten mango. During training, the network requires thousands of rotten and fresh mango images and hours of training to learn knowledge like if any fruit is rotten, a liquid will ooze out of the fruit and it produce a bad odor. Now with this training experience the network, can be used for different task/use-case to differentiate between a rotten apple and fresh apple using the knowledge of rotten features learned during training of mango images.

The general approach of Transfer Learning is to train a base network and then copy its first n layers to the first n layers of a target network. The remaining layers of the target network are initialized randomly and trained toward the targeted use-case.

The main scenarios for using Transfer Learning in your deep learning workflow are as follows:

Smaller datasets: When you have a smaller dataset, building a deep learning model from scratch won't work well. Transfer Learning provides the way to apply a pre-trained model to new classes of data. Let's say a pre-trained model built from one million images of ImageNet data will converge to a decent solution (after training on just a fraction of the available smaller training data, for example, CIFAR-10) compared to a deep learning model built with a smaller dataset from scratch.
Less resource: Deep learning process (such as convolution) requires a significant amount of resource and time. Deep learning process are well suited to run on high graded GPU-based machines. But with pre-trained models, you can easily train across a full training set (let's say 50000 images) in less than a minute using your laptop/notebook without GPU, since the majority of time a model is modified in the final layer with a simple update of just a classifier or regressor.

Various approaches of using pre-trained models

Using pre-trained architecture: Instead of transferring weights of the trained model, we can only use the architecture and initialize our own random weights to our new dataset.
Feature extractor: A pre-trained model can be used as a feature extraction mechanism just by simply removing the output layer of the network (that gives the probabilities for being in each of the n classes) and then freezing all the previous layers of the network as a fixed feature extractor for the new dataset.
Partially freezing the network: Instead of replacing only the final layer and extracting features from all previous layers, sometime we might train our new model partially (that is, to keep the weights of initial layers of the network frozen while retraining only the higher layers). Choice of the number of frozen layers can be considered as one more hyper-parameter.

Next, read about how transfer learning is being used in the real world.

If you enjoyed the above excerpt, do check out the book it is from.