

How Does Transfer Learning Benefit AI Models in Leveraging Knowledge?


Training AI models and collecting data can be complex and challenging because of regulatory compliance, cost, and ethical concerns. Transfer learning is an effective way to train AI models to recognize objects and perform actions, as case studies show. It also has limitations that can impede its progress, which this article discusses.

Artificial intelligence (AI) and machine learning (ML) are demanding technologies: training AI models and regularly collecting vast volumes of data are complex tasks. Training relies on data collection, but access to large, new datasets is difficult because of legal compliance, costs, and ethical concerns. Transfer learning is a technique used to address this issue by allowing a pre-trained model to serve as the starting point for training a new model.


What is transfer learning?

Transfer learning is akin to a master chef who draws on the expertise and techniques learned from previous dishes to create new ones. An AI model is first trained on a dataset so that it can recognize the objects in that dataset based on their characteristics and patterns. When the same model is later applied to a different but related recognition task, it needs significantly less training and data to learn the new objects, which reduces costs.

How does transfer learning work?

Transfer learning for AI models follows a standard, well-defined process.

  • An AI model is trained on a large dataset so that it learns to recognize the elements in that dataset. An example of a large dataset is ImageNet, a database of labeled images that is freely available for non-commercial research. The knowledge the model acquires about the objects in the dataset is later used to recognize different but related objects.
  • The AI model has different layers that perform different tasks. For example, one layer could recognize the overall shape of an object such as a bird, while another could recognize the bird's body parts, such as wings, beaks, and eyes. In the process, the model acquires significant knowledge about birds. This phase is known as pre-training.
  • The AI model is then applied to a different but related task. For example, if the model learned to recognize smaller birds like sparrows and kingfishers during pre-training, it could now be trained to recognize bigger birds like vultures. This phase is known as fine-tuning.
  • During fine-tuning, the layers trained during pre-training are usually frozen or only lightly trained, while the remaining layers, typically new layers added for the new task, are trained extensively on the new dataset. The model thus combines the knowledge from pre-training with the knowledge gained during fine-tuning to identify objects accurately.
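
The freeze-and-fine-tune step above can be illustrated with a minimal sketch. It uses plain Python rather than a deep learning framework, and everything in it is an assumption made for illustration: the "pre-trained" weights in PRETRAINED_WEIGHTS are hand-picked stand-ins for layers learned on a large dataset, NEW_TASK_DATA is a toy dataset, and the function names are hypothetical. The point it shows is only that the frozen layer is never updated while a small new head is trained on the new data.

```python
import math
import random

# Hypothetical stand-in for layers learned during pre-training.
# In a real system these would come from a model trained on a large dataset.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

# Toy dataset for the new, related task: (input, label) pairs.
NEW_TASK_DATA = [
    ([2.0, 0.5], 1),
    ([1.5, 0.2], 1),
    ([0.3, 1.8], 0),
    ([0.1, 1.2], 0),
]

def extract_features(x, frozen_weights):
    """Frozen layer: a fixed linear map followed by tanh. Never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]

def predict(x, frozen_weights, head):
    """New head: logistic regression on top of the frozen features."""
    feats = extract_features(x, frozen_weights)
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(data, frozen_weights, epochs=200, lr=0.5, seed=0):
    """Train only the head with gradient descent; frozen weights stay fixed."""
    rng = random.Random(seed)
    head = {"w": [rng.uniform(-0.1, 0.1) for _ in frozen_weights], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x, frozen_weights)
            err = predict(x, frozen_weights, head) - y  # log-loss gradient w.r.t. the logit
            head["w"] = [w - lr * err * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * err
    return head

head = fine_tune(NEW_TASK_DATA, PRETRAINED_WEIGHTS)
for x, y in NEW_TASK_DATA:
    print(x, "label:", y, "predicted probability:", round(predict(x, PRETRAINED_WEIGHTS, head), 3))
```

Because only the head has trainable parameters, the new task needs far fewer updates and examples than training every layer from scratch, which is the cost saving the steps above describe. Lightly training the earlier layers, rather than freezing them entirely, would mean updating PRETRAINED_WEIGHTS as well with a much smaller learning rate.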

Case study

Transfer learning has helped doctors identify diseases quickly and accurately in at least one case. In 2017, researchers at Stanford University used transfer learning to improve imaging-based diagnosis. They developed a deep learning model called CheXNet, trained on over 100,000 chest X-ray images, to detect various diseases of the chest. The images came from a dataset released by the National Institutes of Health (NIH) Clinical Center.

CheXNet leveraged a pre-trained AI model known as DenseNet, which accelerated its learning on the X-ray images. CheXNet can identify 14 different pathologies, including lung nodules, pneumonia, and pneumothorax. The researchers, however, wanted to verify rigorously that CheXNet's findings were accurate and could be used for treatment. They therefore compared CheXNet's findings on a sample of 420 chest X-ray images with the observations of expert radiologists. They found that CheXNet's findings closely matched those of the radiologists, which was considered a significant advancement.

Limitations of transfer learning

Transfer learning is a great way to train AI models because it saves time and expedites learning. However, the process has certain limitations, described below.



Overfitting

Overfitting occurs when the pre-trained model becomes too closely fitted to a specific type of dataset. Think of it as the model becoming so specialized that later models that build on it find it difficult to recognize similar but different objects. For example, a pre-trained model may recognize all types of small birds yet fail to pass on to later models the ability to recognize bigger birds based on the fundamental characteristics of birds.

Negative transfer

Negative transfer occurs when the dataset used for the pre-trained AI model differs too much from the datasets of the succeeding models that build on it. Such differences can defeat the purpose of transfer learning. All models must be given datasets that match closely enough, or the engineering team must define a logical set of parameters to ensure that the datasets match. Otherwise, the later models are very likely to identify objects differently from the pre-trained model. This creates overhead for the engineering and development team.
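
One simple way to check for negative transfer, sketched here under assumptions, is to compare the fine-tuned model against a baseline trained from scratch on the new dataset alone. The function names, the tolerance parameter, and the accuracy figures below are hypothetical and used only for illustration; they are not results from any real experiment.

```python
def transfer_gain(transferred_accuracy, baseline_accuracy):
    """Validation accuracy gained (or lost) by starting from the pre-trained model."""
    return transferred_accuracy - baseline_accuracy

def is_negative_transfer(transferred_accuracy, baseline_accuracy, tolerance=0.0):
    """True when the transferred model is worse than the baseline by more than `tolerance`."""
    return transfer_gain(transferred_accuracy, baseline_accuracy) < -tolerance

# Hypothetical validation accuracies on the new task.
print(is_negative_transfer(transferred_accuracy=0.71, baseline_accuracy=0.78))
print(is_negative_transfer(transferred_accuracy=0.85, baseline_accuracy=0.78))
```

A check like this only detects the problem after the fact. It does not replace the upfront work of confirming that the pre-training and fine-tuning datasets are related closely enough.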


Transfer learning is an efficient way to train AI models to recognize things and perform actions, as case studies corroborate. However, its limitations can seriously hinder its progress. Engineers need to overcome overfitting; otherwise, succeeding AI models will not succeed in identifying objects. Similarly, organizations need a cohesive policy and a checklist of parameters to ensure that the pre-trained model and all succeeding models work on sufficiently similar datasets. Consolidating that data, though, is going to be a challenge.


Kaushik is a technical architect and software consultant, having over 23 years of experience in software analysis, development, architecture, design, testing and training industry. He has an interest in new technology and innovation areas. He focuses on web architecture, web technologies, Java/J2EE, open source, WebRTC, big data and semantic technologies. He has demonstrated his expertise in requirement analysis, architecture design & implementation, technical use case preparation, and software development. His experience has spanned different domains like insurance, banking, airlines, shipping, document management and product development, etc. He has worked with a wide variety of technologies starting from mainframe (IBM S/390),…