
Machine learning has gained a lot of traction over the years because of the predictive solutions it provides, including the development of intelligent and reliable models. However, training these models is laborious: it takes time to curate labeled data and then to get the model ready. The time spent on labeling and training can be reduced with transfer learning, a smarter and more effective form of machine learning in which the knowledge gained in one scenario is applied to a different but related problem.

How exactly does Transfer Learning work?

Transfer learning reduces the effort of building a model from scratch by taking the fundamental logic or base algorithms learned in one domain and applying them to another. For instance, in the real world, the balancing skill learned while riding a bicycle can be transferred to riding other two-wheeled vehicles. Similarly, in machine learning, transfer learning can be used to transfer the learned logic from one ML model to another.

Let’s look into some of the possible use cases of transfer learning.


Real-world Simulations

Digital simulation is often preferable to building a physical prototype for real-world implementations. Training a robot in real-world surroundings is both time-consuming and costly. To minimize this, robots can be trained in simulation and the knowledge acquired can then be transferred to a real-world robot. This is done using progressive networks, which are well suited to transferring policies from simulation to the real world in robot control domains. These networks have the features needed to learn numerous tasks in sequence while enabling transfer, and they are resistant to catastrophic forgetting, the tendency of Artificial Neural Networks (ANNs) to completely forget previously learned information upon learning new information.
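
To make the idea concrete, here is a minimal sketch of a two-column progressive network in plain Python. It is not the architecture of any specific robot system: the `Column` class, the layer sizes, the random weights, and the manual weight nudge that stands in for a training step are all assumptions made for illustration. What it shows is that the simulation column stays frozen while a new column reads its hidden activations through lateral weights, so learning the new task cannot overwrite the old one.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class Column:
    """One column of a small progressive network (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, rng, n_lateral=0):
        self.W1 = rand_matrix(n_hidden, n_in, rng)
        self.W2 = rand_matrix(n_out, n_hidden, rng)
        # Lateral weights read the hidden layer of an earlier, frozen column.
        self.U = rand_matrix(n_out, n_lateral, rng) if n_lateral else None

    def hidden(self, x):
        return relu(matvec(self.W1, x))

    def forward(self, x, lateral_hidden=None):
        out = matvec(self.W2, self.hidden(x))
        if self.U is not None and lateral_hidden is not None:
            lateral = matvec(self.U, lateral_hidden)
            out = [o + l for o, l in zip(out, lateral)]
        return out

rng = random.Random(0)
# Column trained in simulation; treated as frozen from here on.
sim_column = Column(n_in=3, n_hidden=4, n_out=2, rng=rng)
# New column for the real robot, with lateral input from the frozen column.
real_column = Column(n_in=3, n_hidden=4, n_out=2, rng=rng, n_lateral=4)

x = [0.5, -0.2, 0.1]
sim_before = sim_column.forward(x)

# Stand-in for a training step on the real-world task:
# only the new column's parameters are touched.
real_column.W2[0][0] += 0.1
real_column.U[0][0] += 0.1

real_out = real_column.forward(x, lateral_hidden=sim_column.hidden(x))
sim_after = sim_column.forward(x)
```

Because the update touches only `real_column`, `sim_before` and `sim_after` are identical, which is the property that protects the first task from catastrophic forgetting.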


Another application of simulation can be seen in training self-driving cars, which can be trained in simulations built on video games. Udacity has open sourced its self-driving car simulator, and video games such as GTA 5 have also been used as training environments. However, not all features of a simulation are replicated successfully when they are brought into the real world, as real-world interactions are more complex.



Gaming

The adoption of Artificial Intelligence has taken gaming to an altogether new level. DeepMind’s neural network program AlphaGo is a testament to this, as it successfully defeated a professional Go player. AlphaGo is a master at Go but fails when tasked with playing other games, because its algorithm is tailored to Go. So the disadvantage of using ANNs in gaming is that they cannot master all games the way a human brain does. To play a different game, AlphaGo would have to forget Go entirely and adapt itself to the algorithms and techniques of the new game. With transfer learning, the tactics learned in one game can be reapplied to play another.


An example of how transfer learning is implemented in gaming can be seen in MadRTS, a commercial real-time strategy game developed to carry out military simulations. MadRTS uses CARL (CAse-based Reinforcement Learner), a multi-tiered architecture that combines case-based reasoning (CBR) and reinforcement learning (RL). CBR provides an approach to tackling unseen but related problems based on past experiences within each level of the game. RL algorithms, on the other hand, allow the model to compute good approximations of the value of a situation based on the agent’s experience in its environment, which is modeled as a Markov Decision Process. These CBR/RL transfer learning agents are evaluated on how effectively they learn the tasks given in MadRTS, and they should learn better across tasks by transferring experience.
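
The combination of case retrieval and reinforcement learning can be illustrated with a small sketch. This is not CARL’s actual implementation, whose internals are not described here: the `CaseBase` class, the two-number state features, the action names, and the learning-rate and discount values are all hypothetical. It only shows the general pattern of seeding a value estimate from the nearest past case and then refining it with a standard Q-learning update.

```python
import math

class CaseBase:
    """Stores (state features, action, value) triples from earlier tasks."""

    def __init__(self):
        self.cases = []

    def add(self, features, action, value):
        self.cases.append((features, action, value))

    def estimate(self, features, action, default=0.0):
        # Retrieve the value of the nearest stored case that used this action.
        matches = [
            (math.dist(features, f), v)
            for f, a, v in self.cases
            if a == action
        ]
        if not matches:
            return default
        return min(matches)[1]

def q_update(q_old, reward, best_next, alpha=0.5, gamma=0.9):
    # Standard Q-learning temporal-difference update.
    return q_old + alpha * (reward + gamma * best_next - q_old)

# Experience gathered on a source task (hypothetical values).
source_cases = CaseBase()
source_cases.add((0.9, 0.1), "attack", 1.0)
source_cases.add((0.1, 0.9), "retreat", 0.8)

# A state from a new but related task.
new_state = (0.8, 0.2)
q_start = source_cases.estimate(new_state, "attack")    # seeded from the nearest case
q_unknown = source_cases.estimate(new_state, "scout")   # no matching case, falls back to default
q_new = q_update(q_start, reward=0.0, best_next=1.0)    # refined by experience in the new task
```

Starting from a retrieved value instead of zero is what lets the agent benefit from earlier tasks; the Q-learning step then corrects that estimate wherever the new task differs.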


Image Classification

Neural networks excel at recognizing objects within an image, but only after being trained on huge datasets of labeled images, which is time-consuming. Transfer learning reduces the training time by starting from a model pre-trained on ImageNet, which contains millions of images from different categories.

Let’s assume that a convolutional neural network, for instance a VGG-16 ConvNet, has to be trained to recognize images within a dataset. First, it is pre-trained on ImageNet. Then it is trained layer-wise, starting by replacing the final layer with a softmax layer and training it until the training saturates. After that, the other dense layers are trained progressively. By the end of the training, the ConvNet has learned to classify images from the dataset provided. In cases where the dataset is not similar to the data the model was pre-trained on, one can fine-tune the weights in the higher layers of the ConvNet using backpropagation. The lower layers capture generic features such as edges and textures, so tuning only the higher layers leaves this base logic intact. Such convolutional neural networks can be trained in Keras, using TensorFlow as a backend.
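
The core pattern, a frozen pre-trained base with a freshly trained softmax layer on top, can be sketched without any deep learning library. The example below is a toy stand-in rather than a real VGG-16 workflow: `FROZEN_BASE` is a hypothetical 2x2 weight matrix playing the role of the pre-trained convolutional layers, and the four two-dimensional samples play the role of the new image dataset. In Keras, the same idea is usually expressed by marking the pre-trained layers as non-trainable before fitting the new head.

```python
import math

# Stand-in for a pre-trained convolutional base (e.g. VGG-16 on ImageNet).
# These weights are treated as frozen and are never updated below.
FROZEN_BASE = [[1.0, 0.0], [0.0, 1.0]]

def base_features(x):
    # Linear map followed by ReLU, using only the frozen weights.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_BASE]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_logits(f, W, b):
    return [sum(w * fi for w, fi in zip(W[k], f)) + b[k] for k in range(len(W))]

def train_head(samples, labels, n_classes, lr=0.5, epochs=50):
    """Train only the new softmax layer with per-sample gradient descent."""
    n_feat = len(base_features(samples[0]))
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = base_features(x)
            p = softmax(head_logits(f, W, b))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to logit k.
                grad = p[k] - (1.0 if k == y else 0.0)
                W[k] = [w - lr * grad * fi for w, fi in zip(W[k], f)]
                b[k] -= lr * grad
    return W, b

def predict(x, W, b):
    logits = head_logits(base_features(x), W, b)
    return logits.index(max(logits))

# Toy "new dataset" with two classes.
samples = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1]

W, b = train_head(samples, labels, n_classes=2)
predictions = [predict(x, W, b) for x in samples]
```

Only `W` and `b` change during training. Fine-tuning the higher layers, as described above, would correspond to also allowing gradient updates to the last few layers of the base, typically with a smaller learning rate.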

An example of image classification can be seen in the field of medical imaging, where a convolutional model trained on ImageNet is used to solve a kidney detection problem in ultrasound images.


Zero Shot translation

Zero-shot translation is an extension of supervised learning in which the goal of the model is to predict novel values that are not present in the training dataset. The most prominent working example of zero-shot translation is Google’s Neural Machine Translation model (GNMT), which allows for effective cross-lingual translation.

Prior to zero-shot translation, two distinct languages had to be translated through a pivot language. For instance, to translate Korean to Japanese, Korean first had to be translated into English, and then English into Japanese. Here, English is the pivot language that acts as a medium between Korean and Japanese. This resulted in translated text that was full of distortions introduced by the first language pair.

Zero-shot translation removes the need for a pivot language. It uses the available training data to learn translational knowledge that can then be applied to a new language pair. Another instance of the zero-shot approach can be seen in Image2Emoji, which combines visuals and text to predict unseen emoji icons.
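
The contrast between pivot-based and zero-shot translation can be sketched in a few lines. This shows only the bookkeeping around the model, not a translation system. The `<2ja>`-style target-language token follows the convention described for Google’s multilingual system, in which a single shared model is told which language to produce; the function names, the language codes, and the set of trained pairs are assumptions made for illustration.

```python
# Language pairs for which parallel training data exists (illustrative).
TRAINED_PAIRS = {("ko", "en"), ("en", "ko"), ("ja", "en"), ("en", "ja")}

def tag_source(sentence, target_lang):
    # Prepend an artificial token that tells the shared model
    # which language it should translate into.
    return f"<2{target_lang}> {sentence}"

def is_zero_shot(src, tgt):
    # A pair is zero-shot if the model never saw parallel data for it.
    return (src, tgt) not in TRAINED_PAIRS

def pivot_route(src, tgt, pivot="en"):
    # The older approach: two separate translation steps through a pivot.
    return [(src, pivot), (pivot, tgt)]

# Korean to Japanese was never trained directly.
model_input = tag_source("annyeonghaseyo", "ja")
zero_shot = is_zero_shot("ko", "ja")
old_route = pivot_route("ko", "ja")
```

With the pivot approach, errors from the first step carry over into the second. With the tagged input, the shared model is asked for Japanese output directly, even though it only ever saw Korean and Japanese paired with English.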


Sentiment Classification

Businesses can know their customers better by implementing sentiment analysis, which helps them understand the emotions and polarity (negative or positive) underlying feedback and product reviews. Building a sentiment analyzer for a new text corpus is difficult, because training models to detect different emotions is hard. A solution to this is transfer learning.

This involves training the models on one domain, Twitter feeds for instance, and fine-tuning them on another domain you wish to perform sentiment analysis on, say movie reviews. Here, deep learning models are trained on Twitter feeds by carrying out sentiment analysis of the text corpus and detecting the polarity of each statement.

Once the model has learned to understand emotions through the polarity of the Twitter feeds, its underlying language model and learned representations are transferred to the model tasked with analyzing sentiment in movie reviews. Here, an RNN model trained with logistic regression techniques carries out sentiment analysis on the Twitter feeds. The word embeddings and the recurrent weights learned from the source domain (Twitter feeds) are reused in the target domain (movie reviews) to classify sentiment there.
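
As a rough illustration of reusing source-domain representations, the sketch below keeps a set of word embeddings fixed and trains only a small logistic regression classifier on target-domain reviews. It is a deliberate simplification: it averages word vectors instead of running an RNN, so no recurrent weights are transferred, and the two-dimensional `SOURCE_EMBEDDINGS`, the four reviews, and the hyperparameters are all made up for the example.

```python
import math

# Hypothetical word embeddings standing in for those learned
# on the source domain (Twitter feeds). They stay frozen below.
SOURCE_EMBEDDINGS = {
    "great": (1.0, 0.1), "love": (0.9, 0.0),
    "awful": (-1.0, 0.1), "boring": (-0.8, 0.0),
    "movie": (0.0, 0.5), "plot": (0.0, 0.4),
}

def encode(text):
    # Average the frozen embeddings of known words; unknown words are skipped.
    vecs = [SOURCE_EMBEDDINGS[w] for w in text.lower().split() if w in SOURCE_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(reviews, labels, lr=1.0, epochs=100):
    """Train a logistic regression classifier on top of the frozen embeddings."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, y in zip(reviews, labels):
            x = encode(text)
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(text, w, b):
    x = encode(text)
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0

# Small labeled set from the target domain (movie reviews): 1 = positive, 0 = negative.
reviews = ["great movie", "love the plot", "awful movie", "boring plot"]
labels = [1, 1, 0, 0]

w, b = fine_tune(reviews, labels)
predictions = [predict(r, w, b) for r in reviews]
```

Because the embeddings already separate positive from negative words, a handful of target-domain examples is enough to fit the classifier, which is the practical benefit of transferring representations in this setting.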


Transfer learning has brought in a new wave of learning in machines by reusing algorithms and applied logic, thus speeding up their learning process. This directly reduces both the capital investment and the time needed to train a model, which is why many organizations are looking to bring this kind of learning to their own machine learning models. Transfer learning has already been carried out successfully in image processing, simulations, gaming, and other fields. How it affects the learning curve of machines in other sectors in the future is worth watching.


