
Domain Adaptation

Domain adaptation is a machine learning technique that transfers a model trained on a labelled source domain to perform effectively on a related but distinct target domain with limited or no labelled target data, addressing distribution shift between domains.

Last updated June 2026 | Foundations

Domain adaptation is a machine learning approach that addresses the performance degradation that occurs when a model trained on data from one distribution (the source domain) is applied to data from a different but related distribution (the target domain). The underlying cause of this degradation is domain shift: differences in statistical properties between source and target data that cause the assumptions embedded in the trained model to be violated. Domain adaptation methods use unlabelled or minimally labelled target domain data to adapt a source-trained model to perform well in the target setting without requiring a fully labelled target dataset.

The Domain Shift Problem

In supervised learning, standard theory assumes that training and test data are drawn independently from the same underlying distribution. In practice, this assumption is frequently violated. A spam filter trained on one organisation's email may perform poorly on another organisation's email, because the two organisations' messages exhibit different patterns. A sentiment classifier trained on movie reviews does not perform as well on product reviews. A medical image analysis model trained on data from one hospital often generalises poorly to data from another hospital that uses different imaging equipment. These are all instances of domain shift.

Domain shift can manifest in several forms. Covariate shift occurs when the marginal distribution of inputs P(X) differs between domains while the conditional distribution P(Y|X) remains the same, for example when the same object categories appear under different lighting conditions. Label shift (or prior shift) occurs when the marginal distribution of labels P(Y) differs while the class-conditional distribution P(X|Y) remains the same, for example when a disease is more prevalent in one hospital's patient population than in another's. Concept drift occurs when the relationship between inputs and labels, P(Y|X), itself changes over time; it is discussed more fully in the context of continual learning.
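In symbols, writing P_S and P_T for the source and target distributions (shorthand introduced here for illustration; for concept drift, read them as earlier and later data), the three forms differ in which factor of the joint distribution changes:

```latex
\begin{aligned}
&\text{Covariate shift:} && P_S(X) \neq P_T(X), && P_S(Y \mid X) = P_T(Y \mid X) \\
&\text{Label shift:}     && P_S(Y) \neq P_T(Y), && P_S(X \mid Y) = P_T(X \mid Y) \\
&\text{Concept drift:}   && P_S(Y \mid X) \neq P_T(Y \mid X) &&
\end{aligned}
```

Because P(X, Y) = P(Y | X) P(X) = P(X | Y) P(Y), covariate shift and label shift each keep one conditional fixed while a marginal changes, whereas concept drift changes the input-label relationship itself.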

Types of Domain Adaptation

Domain adaptation settings are categorised by how much target-domain data, labelled or unlabelled, is available during training.

Unsupervised domain adaptation (UDA) assumes labelled source data and unlabelled target data only. The model must adapt using the statistical structure of the unlabelled target data, without any target labels. This is the most widely studied setting, and it is practically relevant because unlabelled target data are often easy to collect while labelling them is costly.

Semi-supervised domain adaptation assumes a small number of labelled target examples in addition to the unlabelled target data. Even a handful of labelled examples can significantly improve adaptation quality.

Few-shot domain adaptation is a related setting in which only a very small number of labelled target examples, often between one and five per class, are available. It is closely related to few-shot learning, where the goal is to generalise to new settings from minimal examples.

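Domain generalisation is a related and harder problem in which the model must generalise to target domains not seen during training at all. Unlike domain adaptation, it assumes no access to target domain data during training; instead, models are typically trained on several source domains in the hope of learning features that transfer to unseen ones.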

Methods

Feature Alignment

Feature alignment methods learn representations in which source and target domain data are statistically indistinguishable. If a shared feature extractor maps both domains to the same latent representation, a classifier trained on source features should generalise to target features. Alignment is achieved through several mechanisms.
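Most alignment methods can be read as instances of one generic objective. A sketch, in notation introduced here rather than taken from a specific paper: g is the shared feature extractor, h the classifier, ℓ a supervised loss over the n_s labelled source examples, D a discrepancy between the source and target feature distributions, and λ ≥ 0 a trade-off weight.

```latex
\min_{g,\,h} \;
\frac{1}{n_s} \sum_{i=1}^{n_s} \ell\big(h(g(x_i^{s})),\, y_i^{s}\big)
\;+\; \lambda \, D\big(g(X_S),\, g(X_T)\big)
```

MMD-based methods plug an MMD estimate in for D directly; adversarial methods such as DANN replace D with the loss of a learned domain discriminator, which the feature extractor tries to maximise.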

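Domain-adversarial training, as formulated in the Domain-Adversarial Neural Network (DANN) framework of Ganin et al. (2016), trains a shared feature extractor with two objectives at once: its features must support accurate label prediction on the labelled source data, and they must fool a domain discriminator that tries to predict whether each feature vector came from the source or the target domain. The gradient reversal layer implements this adversarial objective inside ordinary backpropagation. It acts as the identity in the forward pass, but in the backward pass it multiplies the gradient flowing from the domain discriminator to the feature extractor by a negative constant (-λ). The discriminator therefore learns to separate the domains while the feature extractor learns to confuse it, all within a single optimisation loop rather than alternating separate updates.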
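The gradient reversal trick can be seen in miniature without a deep learning framework. The sketch below is a hypothetical, pure-Python illustration, assuming a scalar feature extractor f = w·x and a scalar logistic discriminator p = sigmoid(v·f); the names `GradientReversal`, `domain_loss` and `adversarial_step` are ours, and the source label predictor is omitted to keep the focus on the reversal. In a framework such as PyTorch the layer would normally be written as a custom autograd function instead.

```python
import math


class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip (and scale) the gradient


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, x, d):
    """Binary cross-entropy of the discriminator p = sigmoid(v * (w * x)); d = 1 source, 0 target."""
    p = sigmoid(v * (w * x))
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def adversarial_step(w, v, x, d, lr=0.1, lam=1.0):
    """One SGD step on one example (hypothetical scalar models); returns the updated (w, v)."""
    grl = GradientReversal(lam)
    f = grl.forward(w * x)             # feature extractor output
    p = sigmoid(v * f)                 # discriminator's probability of "source"
    dloss_dlogit = p - d               # derivative of BCE w.r.t. the discriminator logit
    grad_v = dloss_dlogit * f          # discriminator descends its own loss
    grad_f = dloss_dlogit * v          # gradient arriving at the features
    grad_w = grl.backward(grad_f) * x  # reversed before it reaches the extractor
    return w - lr * grad_w, v - lr * grad_v


# Usage: a single source-domain example.
w, v, x, d = 0.5, 1.0, 2.0, 1
before = domain_loss(w, v, x, d)
w_new, v_new = adversarial_step(w, v, x, d)
print(domain_loss(w, v_new, x, d) < before)  # True: the discriminator improved
print(domain_loss(w_new, v, x, d) > before)  # True: the features became harder to classify
```

A single step lowers the discriminator's loss through its own update and raises it through the reversed extractor update, which is the min-max behaviour domain-adversarial training relies on.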

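Maximum Mean Discrepancy (MMD) measures the distance between two distributions as the distance between their mean embeddings in a reproducing kernel Hilbert space (RKHS); with a characteristic kernel such as the Gaussian kernel, it is zero only when the two distributions are identical. The Deep Adaptation Network (DAN) of Long et al. (2015) minimises a multi-kernel variant of MMD (MK-MMD) between source and target feature distributions at several task-specific layers of the network, encouraging domain-invariant representations.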
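The quantity being minimised can be estimated directly from two samples. The sketch below computes the standard biased empirical estimate of squared MMD, assuming features are given as plain lists of floats; as a simplification, it sums Gaussian kernels over a fixed set of bandwidths with equal weights, whereas DAN's MK-MMD learns the kernel weights. The function names and the `sigmas` values are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multi_kernel(a, b, sigmas):
    """Equal-weight sum of Gaussian kernels with several bandwidths
    (a simplified stand-in for MK-MMD's learned kernel combination)."""
    return sum(gaussian_kernel(a, b, s) for s in sigmas)


def mmd2_biased(xs, xt, sigmas=(0.5, 1.0, 2.0)):
    """Biased estimate: mean k(xs, xs') + mean k(xt, xt') - 2 * mean k(xs, xt)."""
    def mean_k(p, q):
        total = sum(multi_kernel(a, b, sigmas) for a in p for b in q)
        return total / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(xt, xt) - 2.0 * mean_k(xs, xt)


# Usage: the estimate grows as the "target" features drift further from the source.
source = [[0.0, 0.0], [0.5, -0.2], [-0.3, 0.4]]
near = [[a + 0.1, b] for a, b in source]
far = [[a + 2.0, b] for a, b in source]
print(mmd2_biased(source, source))                               # 0.0
print(mmd2_biased(source, near) < mmd2_biased(source, far))      # True
```

In a deep model the same estimate would be computed on mini-batches of extracted features and added to the source task loss, so that gradients push the two feature distributions together.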

Instance Reweighting

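Instance reweighting approaches assign importance weights to source domain training examples based on their relevance to the target domain. Under covariate shift, the ideal weight for a source input x is the density ratio w(x) = P_T(x) / P_S(x): source examples that are typical of the target distribution are up-weighted and atypical ones are down-weighted, so that the weighted source loss approximates the target loss. Kernel Mean Matching (KMM) estimates these weights directly, choosing them so that the weighted mean of the source data matches the target mean in a reproducing kernel Hilbert space, without estimating either density separately. Importance Weighted Cross-Validation (IWCV) uses such weights for model selection, so that validation error computed on source data reflects expected error on the target domain.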
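KMM itself requires solving a quadratic programme, so the sketch below swaps in a much cruder estimator of the same ratio w(x) = P_T(x) / P_S(x): it bins one-dimensional inputs into a histogram and divides target bin frequencies by source bin frequencies. This is not KMM. The names `histogram_importance_weights`, `importance_weighted_risk` and `bin_width` are hypothetical, and the approach only makes sense for low-dimensional inputs with enough samples per bin.

```python
from collections import Counter


def histogram_importance_weights(source_x, target_x, bin_width=1.0):
    """Estimate w(x) = p_T(x) / p_S(x) for each 1-D source point via histogram bins."""
    def bin_of(x):
        return int(x // bin_width)

    src_counts = Counter(bin_of(x) for x in source_x)
    tgt_counts = Counter(bin_of(x) for x in target_x)
    n_s, n_t = len(source_x), len(target_x)
    weights = []
    for x in source_x:
        b = bin_of(x)
        p_s = src_counts[b] / n_s  # always > 0, since x itself lies in bin b
        p_t = tgt_counts[b] / n_t  # 0 if no target data falls in this bin
        weights.append(p_t / p_s)
    return weights


def importance_weighted_risk(losses, weights):
    """Weighted empirical risk (1/n) * sum_i w_i * loss_i over the source examples."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Usage: target data sit further to the right than most source data.
source = [0.1, 0.4, 1.2, 1.5, 1.8, 2.3]
target = [1.1, 1.6, 2.2, 2.4, 2.9]
weights = histogram_importance_weights(source, target)
print(weights[:2])  # [0.0, 0.0]: no target data near these source points
```

The lone source point in the bin where target data concentrate receives the largest weight (about 3.6 here), which illustrates why weight estimates can have high variance when source coverage of the target region is thin.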

Self-Training and Pseudo-Labels

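Self-training adapts to the target domain by iteratively generating pseudo-labels for unlabelled target data from the current model's predictions and then retraining on these pseudo-labels alongside the labelled source data. Typically only high-confidence predictions are used at first, and the confidence threshold may be relaxed as the model improves. Mean Teacher produces more stable pseudo-labels (or consistency targets) from a teacher model whose parameters are an exponential moving average of the student's parameters, while Noisy Student trains a student on teacher-generated pseudo-labels under injected noise, such as data augmentation and dropout, and then uses that student as the next teacher.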
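A minimal sketch of two ingredients described above, assuming a model that outputs per-class probabilities as plain Python lists: confidence-thresholded pseudo-label selection, and the Mean Teacher exponential moving average applied to a flat list of parameters. The function names, the linear threshold schedule and its 0.95 to 0.8 range are hypothetical choices for illustration, not values from any particular paper.

```python
def select_pseudo_labels(probabilities, threshold):
    """Return (index, label) pairs for target examples whose top class
    probability reaches the confidence threshold."""
    selected = []
    for i, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


def relaxed_threshold(round_idx, start=0.95, end=0.8, num_rounds=5):
    """Hypothetical linear schedule that lowers the threshold each round."""
    frac = min(round_idx / max(num_rounds - 1, 1), 1.0)
    return start + (end - start) * frac


def ema_update(teacher, student, decay=0.99):
    """Mean Teacher-style update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Usage: predicted class probabilities for three unlabelled target examples.
probs = [[0.97, 0.03], [0.6, 0.4], [0.1, 0.9]]
print(select_pseudo_labels(probs, relaxed_threshold(0)))  # [(0, 0)]
print(select_pseudo_labels(probs, relaxed_threshold(4)))  # [(0, 0), (2, 1)]
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)   # approximately [0.9, 0.1]
```

In a full loop, the selected pairs would be added to the training set for the next round; with a Mean Teacher, the pseudo-labels would come from the EMA teacher rather than from the student being trained.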

Prompt-Based Adaptation for Language Models

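For large pre-trained language models, domain adaptation is increasingly performed with lightweight methods rather than full fine-tuning. Prompt tuning and prefix tuning learn a small set of continuous ("soft") prompt or prefix vectors while the model's weights stay frozen, and LoRA adds trainable low-rank update matrices to selected weight matrices; each trains only a small fraction of the model's parameters while still achieving strong domain-specific performance. Domain-specific retrieval-augmented generation takes a different route: it changes no parameters at all and instead supplies relevant domain documents to the model at inference time.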
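LoRA's parameter saving comes from its low-rank form. The sketch below assumes the usual LoRA formulation y = Wx + (α/r)·BAx, with W frozen, A randomly initialised and B initialised to zero, and uses plain Python lists instead of a tensor library. `LoRALinear`, `lora_trainable_fraction`, the initialisation scale and the chosen r and α values are illustrative assumptions, not the API of any library.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Forward: y = W x + (alpha / r) * B (A x). Because B starts at zero, the
    adapted layer initially reproduces the frozen layer exactly."""

    def __init__(self, weight, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during adaptation
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, trainable
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


def lora_trainable_fraction(d_out, d_in, r):
    """Share of a layer's parameters that LoRA trains: r * (d_in + d_out) / (d_in * d_out)."""
    return r * (d_in + d_out) / (d_in * d_out)


# Usage
layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))               # [3.0, 7.0], identical to W x at initialisation
print(lora_trainable_fraction(4096, 4096, 8))  # 0.00390625
```

For a 4096 × 4096 weight matrix with rank 8, only about 0.4% of that layer's parameters are trainable, and the zero-initialised B means adaptation starts from the pre-trained model's behaviour.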

| Method | Labelled Target Data | Approach | Best For |
|---|---|---|---|
| DANN | None | Adversarial feature alignment | Visual domains |
| MMD/DAN | None | Statistical alignment | Feature-rich domains |
| Self-training | None | Pseudo-labelling | High-confidence settings |
| Fine-tuning | Few/many | Gradient updates | NLP, medium-size models |
| Prompt tuning | Few | Soft prompt optimisation | Large language models |

Applications

Domain adaptation has broad practical applications. In medical imaging, models trained on data from well-resourced hospitals are adapted to data from smaller clinics that use different scanners. In NLP, sentiment and classification models trained on English are adapted to other languages or dialects. In autonomous driving, perception models trained mostly on data from sunny regions such as California are adapted to rain and other conditions found in different geographies. In robotics, manipulation policies trained in simulation are adapted to real-world physical environments (sim-to-real transfer).

References

  1. Ganin, Y., Ustinova, E., Ajakan, H., et al. (2016). Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research, 17(59), 1-35.
  2. Pan, S. J., and Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359.
  3. Long, M., Cao, Y., Wang, J., and Jordan, M. I. (2015). Learning Transferable Features with Deep Adaptation Networks. Proceedings of the 32nd International Conference on Machine Learning (ICML).
  4. Wilson, G., and Cook, D. J. (2020). A Survey of Unsupervised Deep Domain Adaptation. ACM Transactions on Intelligent Systems and Technology, 11(5), 1-46.
  5. Ministry of Health Malaysia. (2024). AI in Healthcare Malaysia: Pilot Programme Report. Ministry of Health Malaysia.