In-Context Learning
In-context learning is the ability of large language models to perform new tasks by conditioning on examples or instructions provided within the input prompt, without updating model weights.
In-context learning (ICL) is the capacity of a large language model to perform novel tasks by observing demonstrations or instructions placed directly in the input context, without any modification to the model's parameters. The technique was prominently demonstrated by OpenAI's GPT-3 in 2020 and has since become one of the most widely studied phenomena in modern natural language processing. It lets practitioners adapt a general-purpose model to specialised tasks through prompt design alone, reducing or eliminating the need for labelled training data and dedicated fine-tuning pipelines.
How In-Context Learning Works
A language model is trained to predict the next token in a sequence. When presented with an input that includes several input-output examples of a task (for instance, several English sentences followed by their French translations), the model uses its learned representations of language patterns to infer the task structure and generate an appropriate output for the final, unlabelled input. No gradient update occurs; the model's weights remain fixed. The "learning" takes place entirely within the forward pass: attention over the demonstrations lets the model pick up regularities in them and condition its output accordingly.
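The demonstration-then-query layout described above can be sketched as a plain string. This is a minimal sketch only: the `Input:`/`Output:` labels, the `format_prompt` helper and the example sentences are illustrative assumptions, not a format prescribed by any particular model.

```python
# Hypothetical helper: lays out demonstrations followed by an unlabelled query.
def format_prompt(demonstrations, query):
    """Return a few-shot prompt; each demonstration is an (input, output) pair."""
    blocks = [f"Input: {src}\nOutput: {tgt}" for src, tgt in demonstrations]
    # The final block has no output; the model is expected to continue from here.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Illustrative English-to-French demonstrations (assumed data).
demos = [
    ("Good morning.", "Bonjour."),
    ("Thank you very much.", "Merci beaucoup."),
    ("Where is the station?", "Où est la gare ?"),
]
prompt = format_prompt(demos, "I would like a coffee.")
print(prompt)
```

The resulting string would be sent to the model unchanged; nothing about the model itself is modified between one task and the next.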
This contrasts with conventional supervised machine learning, where adapting a model to a new task requires a dataset, a training loop, and a new set of weights. In-context learning collapses that pipeline into a prompt.
Variants by Example Count
In-context learning encompasses several operating modes distinguished by how many examples are included in the prompt.
Zero-shot learning provides no examples at all, relying on a natural-language instruction to specify the task. A prompt such as "Translate the following sentence from English to Bahasa Malaysia:" followed by the input sentence is a zero-shot ICL scenario.
One-shot learning provides a single example before the target input, giving the model one demonstration of the desired input-output mapping.
Few-shot learning provides several examples before the target input, anywhere from a handful to several dozen, with the upper bound set by the model's context window. GPT-3's few-shot performance on tasks such as translation, question answering, and arithmetic was what first drew wide attention to ICL as a practical technique.
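The three modes differ only in how many demonstrations precede the target input, so a single function can produce all of them. The sketch below is a hedged illustration: `build_prompt`, `rough_token_count`, the `English:`/`Malay:` labels and the `max_tokens` budget are assumptions made for the example, and whitespace word counts stand in for a real tokenizer.

```python
def rough_token_count(text):
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def build_prompt(instruction, examples, query, k, max_tokens=200):
    """Build a k-shot prompt, dropping trailing shots until it fits max_tokens.

    Returns the prompt and the number of demonstrations actually used.
    """
    k = min(k, len(examples))
    while True:
        shots = [f"English: {en}\nMalay: {ms}" for en, ms in examples[:k]]
        parts = [instruction, *shots, f"English: {query}\nMalay:"]
        prompt = "\n\n".join(parts)
        if k == 0 or rough_token_count(prompt) <= max_tokens:
            return prompt, k
        k -= 1  # too long for the budget: retry with one fewer demonstration


instruction = "Translate the following sentence from English to Bahasa Malaysia:"
examples = [
    ("Good morning.", "Selamat pagi."),
    ("Thank you.", "Terima kasih."),
    ("How are you?", "Apa khabar?"),
]
query = "See you again."

zero_shot, _ = build_prompt(instruction, examples, query, k=0)
one_shot, _ = build_prompt(instruction, examples, query, k=1)
few_shot, used = build_prompt(instruction, examples, query, k=3)
# A deliberately small budget forces some demonstrations to be dropped.
tight, used_tight = build_prompt(instruction, examples, query, k=3, max_tokens=25)
print(used, used_tight)
```

Dropping demonstrations from the end is only one possible policy; a real system might instead keep the shots most similar to the query.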
Emergent Behaviour and Scale
In-context learning is often described as an emergent ability: it is weak or absent in small models and becomes reliable in larger ones trained with the same objective (Wei et al., 2022). GPT-3 (175 billion parameters) demonstrated markedly stronger few-shot performance than GPT-2 (1.5 billion parameters), even though neither model was explicitly trained to do ICL. Reported thresholds vary by task, but reliable ICL is generally observed somewhere above roughly 10 to 100 billion parameters, while smaller models often gain little from in-context examples and can even perform worse with them. This scale dependence has been a major driver of the scaling-law research agenda in AI.
The mechanism underlying ICL remains an active research question. One influential hypothesis is that large language models implicitly implement gradient-descent-like algorithms internally during the forward pass, effectively running a learned optimisation over the demonstrations. Another view holds that ICL is primarily a form of task retrieval: the model identifies which of the patterns it has seen during pretraining is most similar to the provided demonstrations and applies that pattern to the target input.
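A toy case shows why the gradient-descent hypothesis is at least arithmetically plausible. For linear regression with squared loss, one gradient step from zero weights gives a prediction equal to an unnormalised linear-attention readout in which the demonstration inputs act as keys, the labels as values, and the target input as the query. The sketch below checks that identity numerically. The data, the learning rate `eta` and the function names are illustrative assumptions, and the example says nothing about what a trained transformer actually computes.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_step_gd_prediction(xs, ys, x_query, eta):
    """Predict with the weights obtained after one gradient step from w = 0.

    Loss: 0.5 * sum_i (w . x_i - y_i)^2, so the gradient at w = 0 is
    -sum_i y_i * x_i and the updated weights are eta * sum_i y_i * x_i.
    """
    dim = len(x_query)
    w = [eta * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(dim)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, eta):
    """Unnormalised linear attention: sum_i value_i * (key_i . query)."""
    return sum((eta * y) * dot(x, x_query) for x, y in zip(xs, ys))


# Assumed toy demonstrations (inputs xs, labels ys) and a target input.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]
x_query = [0.5, 2.0]

gd = one_step_gd_prediction(xs, ys, x_query, eta=0.1)
attn = linear_attention_prediction(xs, ys, x_query, eta=0.1)
print(gd, attn)  # the two values agree up to floating-point rounding
```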
Relationship to Other Techniques
In-context learning is closely related to but distinct from several other techniques.
Prompt engineering is the practice of crafting prompts to elicit desired ICL behaviour. Effective prompt design — including the choice of examples, their ordering, and the phrasing of the task instruction — can substantially affect ICL performance.
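Because example ordering can change results, one simple way to probe sensitivity is to generate every ordering of a small demonstration set and evaluate each resulting prompt. The following is a minimal sketch assuming a sentiment-classification task; `ordering_variants`, the `Review:`/`Sentiment:` labels and the sample reviews are hypothetical, and the scoring step is omitted because it needs a model.

```python
from itertools import permutations


def ordering_variants(instruction, examples, query):
    """Yield one prompt per ordering of the demonstrations."""
    for order in permutations(examples):
        shots = [f"Review: {text}\nSentiment: {label}" for text, label in order]
        yield "\n\n".join([instruction, *shots, f"Review: {query}\nSentiment:"])


# Assumed demonstrations for illustration.
examples = [
    ("The plot was gripping.", "positive"),
    ("I fell asleep halfway through.", "negative"),
    ("A forgettable sequel.", "negative"),
]
variants = list(
    ordering_variants(
        "Classify the sentiment of each review.", examples, "Surprisingly moving."
    )
)
print(len(variants))  # 3 demonstrations give 3! = 6 orderings
```

The number of orderings grows factorially, so with more than a few demonstrations a random sample of permutations is the more practical choice.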
Chain-of-thought prompting is an extension of ICL in which examples include not just input-output pairs but also intermediate reasoning steps. Models that receive chain-of-thought examples tend to produce their own step-by-step reasoning before giving a final answer, which often improves accuracy on mathematical and logical tasks.
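A chain-of-thought demonstration adds a reasoning field between the question and the answer, and the final answer then has to be read back out of the model's longer completion. The sketch below shows both halves. The `Q:`/`Reasoning:`/`Answer:` layout and the helper names are assumptions, and `completion` is a hand-written stand-in for model output, not a real response.

```python
def cot_prompt(demonstrations, question):
    """Each demonstration is a (question, reasoning, answer) triple."""
    blocks = [
        f"Q: {q}\nReasoning: {steps}\nAnswer: {a}"
        for q, steps, a in demonstrations
    ]
    # End on an open "Reasoning:" so the model writes its steps first.
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Return the text after the last 'Answer:' marker, or None if absent."""
    marker = "Answer:"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[1].strip()


demos = [
    (
        "A shop has 3 boxes of 12 pens and sells 10 pens. How many pens remain?",
        "3 boxes of 12 pens is 36 pens. 36 minus 10 is 26.",
        "26",
    )
]
prompt = cot_prompt(
    demos,
    "A train has 5 carriages of 40 seats and 35 seats are taken. How many seats are free?",
)

# Hand-written stand-in for what a model might return (assumption).
completion = " 5 carriages of 40 seats is 200 seats. 200 minus 35 is 165.\nAnswer: 165"
print(extract_answer(completion))
```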
Fine-tuning updates model weights on a labelled dataset and typically outperforms ICL when sufficient labelled data is available. ICL is valuable precisely in the low-data regime, where collecting labelled examples is expensive.
Retrieval-augmented generation (RAG) combines ICL with dynamic retrieval: relevant documents or examples are fetched from a knowledge base at inference time and placed in the context, giving the model access to information not in its weights.
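The retrieve-then-prompt flow can be sketched with a toy retriever. This is a hedged illustration: word overlap stands in for the embedding or keyword search a real system would use, and `retrieve`, `rag_prompt`, the `Context:` layout and the three-sentence knowledge base are all assumptions made for the example.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    query_terms = tokenize(query)
    ranked = sorted(
        documents, key=lambda d: len(query_terms & tokenize(d)), reverse=True
    )
    return ranked[:top_k]


def rag_prompt(query, documents, top_k=2):
    """Place the retrieved passages in the context ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Assumed knowledge base for illustration.
knowledge_base = [
    "The context window of a model limits how many tokens fit in a prompt.",
    "Fine-tuning updates model weights on a labelled dataset.",
    "Few-shot prompts place several demonstrations before the target input.",
]
query = "How many tokens fit in the context window?"
print(rag_prompt(query, knowledge_base))
```

The retrieved passages occupy the same context budget as any demonstrations, so `top_k` trades coverage against room for everything else in the prompt.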
References
- Brown, T., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
- Dong, Q., et al. (2022). A Survey for In-Context Learning. arXiv:2301.00234.
- Wei, J., et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research.
- Lakera. (2024). What is In-Context Learning, and How Does It Work?. lakera.ai/blog.