Meta-Learning
A machine learning paradigm in which models learn how to learn, acquiring inductive biases across a distribution of tasks so they can adapt rapidly to new tasks with minimal data.
Meta-learning, often described as "learning to learn", is a subfield of machine learning in which a model is trained across a distribution of tasks so that it can adapt to new, previously unseen tasks using only a small number of examples or gradient steps. Whereas conventional supervised learning treats each task in isolation and learns from scratch, meta-learning explicitly optimises for transferability: the outer training loop searches for parameters, representations, or learning rules that generalise well to held-out tasks.
Conceptual framework
A meta-learning problem is typically formalised over a distribution of tasks p(T). Each task T_i contains a small support set used for adaptation and a query set used to evaluate the adapted model. The meta-learner observes many such tasks during training and aims to minimise the expected query loss after adaptation. This bi-level structure — an inner loop that adapts to a specific task and an outer loop that updates the meta-parameters — is shared by most modern meta-learning algorithms.
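The bi-level objective described above can be written compactly. This is a sketch in assumed notation: θ for the meta-parameters, Adapt for the inner-loop procedure, and the support and query sets of task T_i written as D_i^support and D_i^query are labels introduced here for illustration, not taken from a specific paper.

```latex
\min_{\theta} \; \mathbb{E}_{T_i \sim p(T)}
\left[ \mathcal{L}_{T_i}\!\left( \mathrm{Adapt}\!\left(\theta,\, D_i^{\mathrm{support}}\right);\; D_i^{\mathrm{query}} \right) \right]
```

The inner loop is the call to Adapt, which sees only the support set. The outer loop is the minimisation over θ, which is scored only on the query set.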
Three broad families of approaches dominate the literature. Optimisation-based methods learn an initialisation or learning rule that enables rapid fine-tuning. Metric-based methods learn an embedding space in which simple nearest-neighbour rules suffice for classification. Model-based methods rely on architectures, often with external memory or recurrence, that internalise the adaptation procedure within the forward pass.
Model-Agnostic Meta-Learning (MAML)
The most influential optimisation-based algorithm is Model-Agnostic Meta-Learning (MAML), introduced by Chelsea Finn, Pieter Abbeel and Sergey Levine in 2017. MAML searches for an initial parameter vector such that, for any task drawn from p(T), a small number of gradient descent steps on the support set yields strong query-set performance. The outer-loop update therefore differentiates through the inner-loop gradient steps, which requires second-order derivatives in its exact form. First-order alternatives reduce memory and compute cost: FOMAML drops the second-order term from the meta-gradient, while Reptile avoids it altogether by moving the initialisation toward the task-adapted parameters. Both are reported to retain much of MAML's benefit on standard few-shot benchmarks.
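The inner and outer loops of MAML can be made concrete on a deliberately tiny problem. The following is a minimal sketch, not the authors' implementation: it assumes a one-parameter model y_hat = w * x, a hypothetical task family y = a * x, a single inner gradient step, and hand-derived derivatives in place of automatic differentiation. All function names and hyperparameter values are illustrative.

```python
import random


def loss(w, data):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """First derivative of the loss with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in data) / len(data)


def hessian(data):
    """Second derivative of the loss with respect to w (constant in w here)."""
    return sum(2.0 * x * x for x, _ in data) / len(data)


def adapt(w, support, inner_lr):
    """Inner loop: one gradient descent step on the support set."""
    return w - inner_lr * grad(w, support)


def meta_grad(w, support, query, inner_lr, first_order=False):
    """Derivative of the post-adaptation query loss with respect to the initial w."""
    w_adapted = adapt(w, support, inner_lr)
    g_query = grad(w_adapted, query)
    if first_order:
        # FOMAML: treat d(w_adapted)/dw as 1, ignoring the second-order term.
        return g_query
    # Exact MAML: chain rule through the inner step,
    # d(w_adapted)/dw = 1 - inner_lr * hessian(support).
    return g_query * (1.0 - inner_lr * hessian(support))


def sample_task(rng, n_support=5, n_query=5):
    """Hypothetical task family: y = a * x with a task-specific slope a."""
    a = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_support + n_query)]
    points = [(x, a * x) for x in xs]
    return points[:n_support], points[n_support:]


def meta_train(steps=500, tasks_per_step=8, inner_lr=0.5, outer_lr=0.05,
               seed=0, first_order=False):
    """Outer loop: average the meta-gradient over a batch of tasks and step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        tasks = [sample_task(rng) for _ in range(tasks_per_step)]
        g = sum(meta_grad(w, s, q, inner_lr, first_order) for s, q in tasks)
        w -= outer_lr * g / len(tasks)
    return w
```

In this toy setting the only difference between the exact and first-order variants is the factor `1 - inner_lr * hessian(support)`. In a deep network that factor becomes a Hessian-vector product, which is where the extra memory and compute cost of exact MAML comes from.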
Metric-based and model-based methods
Metric-based methods include Matching Networks, Prototypical Networks and Relation Networks. Prototypical Networks, for instance, compute a class prototype as the mean embedding of its support examples and classify query points by their distance to those prototypes. Model-based methods include Memory-Augmented Neural Networks and SNAIL, the latter of which combines temporal convolutions with attention.
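The prototype rule used by Prototypical Networks can be sketched in a few lines. This is a minimal illustration that assumes the embeddings have already been computed by some network (no embedding model is trained here) and that distance is squared Euclidean; the function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_prototypes(support):
    """support maps each class label to a list of embedding vectors."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


def class_probabilities(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -squared_distance(query, p) for label, p in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Two classes with two support embeddings each (made-up numbers).
support = {
    "cat": [[0.0, 0.0], [0.2, 0.0]],
    "dog": [[1.0, 1.0], [1.0, 0.8]],
}
prototypes = build_prototypes(support)
prediction = classify([0.2, 0.1], prototypes)  # nearest prototype is "cat"
```

Adaptation to a new task here involves no gradient steps: the support set only changes the prototypes, which is why metric-based methods are cheap at test time.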
Relationship to large language models
Modern large language models exhibit meta-learning behaviour implicitly. Few-shot and in-context learning, in which a model conditions on a handful of demonstrations within its prompt and produces correct outputs without parameter updates, can be viewed as emergent meta-learning. On this view, pre-training exposes the model to such a wide distribution of tasks that its forward pass comes to behave like an adaptive learner, with the demonstrations playing the role of a support set. This perspective links meta-learning to prompt engineering, retrieval-augmented generation and in-context fine-tuning.
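The analogy between a few-shot prompt and a meta-learning episode can be illustrated by how such a prompt is assembled. The sketch below is purely illustrative: the function name, the "Input"/"Output" labels and the layout are assumptions, not a format required by any particular model or API.

```python
def build_few_shot_prompt(demonstrations, query,
                          input_label="Input", output_label="Output"):
    """Format (input, output) demonstrations followed by an unanswered query.

    The demonstrations act like a support set; the final, unanswered
    input acts like a query example.
    """
    blocks = [
        f"{input_label}: {x}\n{output_label}: {y}" for x, y in demonstrations
    ]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Made-up demonstrations for an English-to-French toy task.
prompt = build_few_shot_prompt(
    [("cheese", "fromage"), ("bread", "pain")],
    "apple",
)
```

The model's weights are untouched by this procedure. Any adaptation to the task happens inside the forward pass over the assembled prompt.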
Applications
Meta-learning is applied wherever labelled data per task is scarce. In computer vision it is widely used for few-shot image classification, with benchmarks such as miniImageNet and Meta-Dataset. In robotics it allows policies to adapt to new objects, terrains, or dynamics after only a handful of demonstrations. In drug discovery and medical imaging it is used to help models transfer from one data-poor setting, such as a rare disease cohort, to another. In natural language processing it supports cross-lingual transfer and domain adaptation.
Limitations
Meta-learning algorithms can be sensitive to the choice of task distribution and prone to memorisation when training tasks are too narrow. Second-order methods are memory intensive, and bi-level optimisation can be unstable. The community continues to debate the extent to which meta-learning offers benefits beyond well-tuned multi-task pre-training followed by standard fine-tuning.
References
- Finn, C., Abbeel, P. and Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Proceedings of the 34th International Conference on Machine Learning.
- Snell, J., Swersky, K. and Zemel, R. (2017). Prototypical Networks for Few-shot Learning. Advances in Neural Information Processing Systems.
- Hospedales, T., Antoniou, A., Micaelli, P. and Storkey, A. (2021). Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Ministry of Science, Technology and Innovation Malaysia. (2021). National Artificial Intelligence Roadmap 2021–2025. MOSTI / MDEC.