Beam Search
Beam search is a heuristic search algorithm used in sequence generation that keeps a fixed number of the most promising partial sequences at each step, balancing output quality against computational cost.
Overview
Beam search is a heuristic algorithm for generating output sequences in tasks such as machine translation, speech recognition and text generation. When a model produces text one token at a time, the number of possible sequences grows exponentially with length, so examining every possibility is infeasible. Beam search offers a practical compromise: at each step it retains a fixed number of the most probable partial sequences, called the beam, and expands only those.
How it works
Generation proceeds from left to right. At each step the model assigns a probability to every possible next token for each candidate sequence currently in the beam. Beam search scores all the resulting extended sequences and keeps only the top candidates, where the number kept is the beam width, often a small value such as 4 or 5. The process repeats until sequences reach an end-of-sequence marker or a maximum length.
A beam width of 1 reduces beam search to greedy decoding, which simply picks the single most likely token at each step and can miss better overall sequences. Wider beams explore more alternatives and usually improve quality up to a point, after which returns diminish and computation grows. Because longer sequences accumulate more probability penalties, implementations typically apply length normalisation, dividing the accumulated score by a function of sequence length to avoid an unfair bias toward short outputs.
Relationship to sampling
Beam search aims to find a high-probability, often near-deterministic output, which suits tasks with a single correct answer such as translation. For open-ended generation, however, it can produce repetitive or generic text. Modern large language models therefore frequently use stochastic decoding methods such as top-k sampling, nucleus (top-p) sampling and temperature scaling, which trade some probability for diversity and creativity. Beam search and sampling are sometimes combined.
Strengths and limitations
Beam search reliably finds sequences with higher overall probability than greedy decoding while remaining far cheaper than exhaustive search. Its limitations include a tendency to favour bland or repetitive text in open-ended settings, sensitivity to the chosen beam width, and the well-documented observation that the highest-probability sequence is not always the highest-quality one, a phenomenon sometimes called the beam search curse.
Applications
Beam search remains standard in neural machine translation systems, automatic speech recognition, image captioning and structured prediction tasks where output accuracy matters more than diversity. It is implemented in widely used libraries such as Hugging Face Transformers.
References
- Sutskever, I., Vinyals, O. and Le, Q. (2014). Sequence to Sequence Learning with Neural Networks. NeurIPS.
- Freitag, M. and Al-Onaizan, Y. (2017). Beam Search Strategies for Neural Machine Translation. Workshop on Neural Machine Translation.
- Holtzman, A., Buys, J., Du, L., Forbes, M. and Choi, Y. (2020). The Curious Case of Neural Text Degeneration. ICLR.