Transfer Learning

Transfer learning is a machine learning technique in which a model pre-trained on one task or dataset is adapted for a different but related task, enabling high performance with significantly less data and compute than training from scratch.

Last updated May 2026

Transfer learning is a training paradigm in machine learning where a model that has been pre-trained on a large, general-purpose dataset is subsequently adapted to perform well on a different but related task, typically with a smaller specialised dataset. Rather than initialising all model parameters randomly and learning from scratch, transfer learning starts from representations already acquired during pre-training, giving the model a head start that reduces both the volume of task-specific labelled data required and the total training time and compute expenditure. Transfer learning has become the dominant paradigm for applied machine learning in both computer vision and natural language processing, underpinning most modern AI products and research systems.

Conceptual Origins

The intuition underlying transfer learning is that knowledge acquired in one context is often applicable in another. A person who already speaks one Romance language typically learns another far more quickly because the languages share grammar, vocabulary, and structure. Similarly, the features a neural network learns in order to distinguish cats from dogs (curves, textures, part-whole relationships) are also useful for distinguishing other visual categories. This analogy motivated early work on transfer learning for neural networks, which showed in the 2010s that features learned by convolutional neural networks (CNNs) on ImageNet transfer effectively to other image classification tasks that have far less target-domain data.

The development of BERT (Bidirectional Encoder Representations from Transformers) in 2018 extended transfer learning decisively into natural language processing. Pre-training a transformer on large text corpora using self-supervised objectives produced representations general enough that fine-tuning on a wide range of downstream NLP tasks (sentiment analysis, named entity recognition, question answering) consistently outperformed task-specific models trained from scratch. GPT and its successors showed that autoregressive language-modelling pre-training also transfers to open-ended generation and to in-context learning, where a model performs a new task from examples given in its prompt without any weight updates.

How Transfer Learning Works

Transfer learning involves two distinct phases. In the pre-training phase, a model is trained on a large, diverse dataset — ImageNet for vision, a multi-hundred-billion-token text corpus for language — using an objective that encourages the model to learn rich, generalisable representations. In the fine-tuning phase, the pre-trained model is updated on a smaller target dataset using a task-specific objective, adjusting the representations to the demands of the new domain.
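The two phases can be sketched with a deliberately tiny example: a one-parameter model y = w * x fitted by gradient descent. The datasets, learning rate, step counts, and function names are illustrative assumptions rather than part of any real system; a real pipeline would use a neural network and a far larger source corpus.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr=0.05):
    """Full-batch gradient descent starting from the given weight."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Toy stand-ins: a larger source task and a small, related target task.
source = [(x, 2.0 * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
target = [(x, 2.2 * x) for x in (-1.0, 1.0)]

# Phase 1: pre-training on the source task from a zero initialisation.
w_pre = train(0.0, source, steps=200)

# Phase 2: fine-tuning on the target task, starting from the pre-trained weight.
w_ft = train(w_pre, target, steps=5)

# Baseline: the same small training budget on the target task, from scratch.
w_scratch = train(0.0, target, steps=5)

print("fine-tuned loss:  ", mse(w_ft, target))
print("from-scratch loss:", mse(w_scratch, target))
```

Under these assumptions the fine-tuned model ends with a lower target loss than the from-scratch baseline for the same number of steps, because its starting weight is already close to the target optimum.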

The degree to which pre-trained weights are updated during fine-tuning varies. In feature extraction (called linear probing when the new head is a single linear layer), all pre-trained layers are frozen and only a new output head is trained on top of the fixed representations; this requires very little target-domain data but cannot adapt the model's internal representations to domain-specific patterns. Full fine-tuning updates all layers and usually achieves the best task performance when enough target data is available, but it requires more data and compute. Intermediate approaches, such as freezing early layers and fine-tuning later ones, balance the two extremes.
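The difference between a frozen and an unfrozen backbone can be illustrated with a small pure-Python sketch. The model is a single tanh layer (the "backbone") followed by a logistic output head; the weights, data, and names such as `fit` and `freeze_backbone` are hypothetical and chosen only for illustration.

```python
import copy
import math

def features(x, backbone):
    """Backbone: a single tanh layer that maps an input to a feature vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]

def forward(x, model):
    """Return the backbone features and the predicted probability of class 1."""
    f = features(x, model["backbone"])
    z = sum(h * fj for h, fj in zip(model["head"], f)) + model["bias"]
    return f, 1.0 / (1.0 + math.exp(-z))

def fit(model, data, freeze_backbone, epochs=200, lr=0.5):
    """SGD on log-loss. The head is always trained; the backbone only if unfrozen."""
    model = copy.deepcopy(model)  # leave the pre-trained weights untouched
    for _ in range(epochs):
        for x, y in data:
            f, p = forward(x, model)
            g = p - y  # gradient of the log-loss with respect to the logit
            if not freeze_backbone:
                # Full fine-tuning: backpropagate through the head and tanh.
                for j, row in enumerate(model["backbone"]):
                    gj = g * model["head"][j] * (1.0 - f[j] ** 2)
                    model["backbone"][j] = [w - lr * gj * xi for w, xi in zip(row, x)]
            model["head"] = [h - lr * g * fj for h, fj in zip(model["head"], f)]
            model["bias"] -= lr * g
    return model

def accuracy(model, data):
    """Fraction of examples whose thresholded prediction matches the label."""
    return sum((forward(x, model)[1] > 0.5) == bool(y) for x, y in data) / len(data)

# Hypothetical "pre-trained" backbone plus a freshly initialised output head.
pretrained = {"backbone": [[1.0, 0.0], [0.0, 1.0]], "head": [0.0, 0.0], "bias": 0.0}

# Small labelled target dataset: label 1 when x0 + x1 > 0.
target = [([1.0, 1.0], 1), ([2.0, 0.5], 1), ([-1.0, -1.0], 0), ([-0.5, -2.0], 0)]

probe = fit(pretrained, target, freeze_backbone=True)   # feature extraction
full = fit(pretrained, target, freeze_backbone=False)   # full fine-tuning

print("backbone unchanged by probing:    ", probe["backbone"] == pretrained["backbone"])
print("backbone changed by full training:", full["backbone"] != pretrained["backbone"])
```

With more than one backbone layer, the same idea extends to the intermediate approach by keeping a separate freeze flag per layer and skipping the update for the early ones.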

Domain Adaptation and Negative Transfer

When the source domain (where the model was pre-trained) and the target domain (the application task) are very similar, transfer is highly effective. As the domains diverge, the benefit of transfer diminishes. When domains are sufficiently different, transfer learning can occasionally hurt performance compared to random initialisation — a phenomenon called negative transfer. In practice, negative transfer is rare with large pre-trained models because the scale and diversity of pre-training data tends to produce representations that generalise broadly.
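Negative transfer is defined relative to a from-scratch baseline, so one way to check for it is to compare the two directly under the same training budget. The toy sketch below does this for a one-parameter model y = w * x; the slopes, budget, and the helper name `transfer_gain` are assumptions made for illustration, and real evaluations would compare held-out task metrics across several runs.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr=0.05):
    """Full-batch gradient descent starting from the given weight."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def transfer_gain(source_slope, target_slope=2.2, budget=5):
    """From-scratch target loss minus transferred target loss.

    A positive value means pre-training helped; a negative value
    indicates negative transfer under this training budget.
    """
    source = [(x, source_slope * x) for x in (-2.0, -1.0, 1.0, 2.0)]
    target = [(x, target_slope * x) for x in (-1.0, 1.0)]
    w_pre = train(0.0, source, steps=200)
    loss_transfer = mse(train(w_pre, target, budget), target)
    loss_scratch = mse(train(0.0, target, budget), target)
    return loss_scratch - loss_transfer

# A source task close to the target versus one that points the opposite way.
for slope in (2.0, 0.5, -2.0):
    print(f"source slope {slope:+.1f}: gain = {transfer_gain(slope):+.3f}")
```

In this toy setting the gain is positive for the similar source task and negative for the opposing one, which mirrors the definition above: pre-training on a sufficiently mismatched source leaves the model further from the target optimum than a neutral initialisation.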

Domain adaptation methods specifically address the case where labelled target-domain data is scarce. One widely used method is continued pre-training (also called domain-adaptive pre-training): the general model is trained further on unlabelled target-domain text, with the original self-supervised objective, before it is fine-tuned. This often improves downstream performance on specialised tasks such as biomedical NLP (BioBERT), legal text analysis (Legal-BERT), or financial document understanding.
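The effect of training further on unlabelled in-domain text can be illustrated with a toy count-based unigram language model, used here as a stand-in for a neural language model. The three text snippets and the function names are invented for illustration; the point is only that no labels are needed for this stage.

```python
import math
from collections import Counter

def update(counts, text):
    """'Train' a unigram model by accumulating token counts (no labels needed)."""
    new_counts = Counter(counts)
    new_counts.update(text.lower().split())
    return new_counts

def cross_entropy(counts, text, vocab_size):
    """Average negative log-likelihood per token, with add-one smoothing."""
    tokens = text.lower().split()
    total = sum(counts.values())
    nll = -sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return nll / len(tokens)

# Invented toy corpora: general text, unlabelled domain text, held-out domain text.
general_text = "the cat sat on the mat the dog ran in the park"
domain_text = "the patient received the drug the drug reduced the tumour"
held_out = "the drug helped the patient"

vocab_size = len(set((general_text + " " + domain_text + " " + held_out).split()))

general_model = update(Counter(), general_text)      # general pre-training
adapted_model = update(general_model, domain_text)   # continued pre-training

print("general model, held-out domain text:", cross_entropy(general_model, held_out, vocab_size))
print("adapted model, held-out domain text:", cross_entropy(adapted_model, held_out, vocab_size))
```

The adapted model assigns higher probability to the held-out domain text than the general model does. In a real system the adapted network would then be fine-tuned on the small labelled dataset for the downstream task.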

Applications

In computer vision, transfer learning enables the rapid deployment of image classification, object detection, and medical image analysis systems with as few as a few hundred labelled examples, by fine-tuning models pre-trained on ImageNet. ResNet, EfficientNet, and Vision Transformer (ViT) checkpoints are common starting points.

In NLP, virtually all production language models, including the Claude, GPT, and Gemini families, rely on transfer learning at their core: a general pre-trained model is fine-tuned for instruction following or specific domains using RLHF, supervised fine-tuning, or parameter-efficient methods such as LoRA. In speech recognition, acoustic representations learned from large multilingual corpora are transferred to models for low-resource languages.

In tabular and structured data domains, transfer learning is less mature but growing: models pre-trained on diverse tabular datasets have been shown to transfer effectively to downstream prediction tasks.

References

  1. Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
  2. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT 2019.
  3. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? NeurIPS 2014.
  4. Gururangan, S., Marasović, A., Swayamdipta, S., et al. (2020). Don't stop pretraining: Adapt language models to domains and tasks. ACL 2020.
  5. IBM. (2025). What is transfer learning? IBM Think. https://www.ibm.com/think/topics/transfer-learning
  6. Malaysia Digital Economy Corporation. (2024). Malaysia AI Governance Framework. MDEC, Putrajaya.