Tree of Thoughts
Tree of Thoughts (ToT) is a prompting framework that lets large language models explore and evaluate multiple intermediate reasoning paths in a branching search, improving performance on tasks requiring planning.
Tree of Thoughts (ToT) is a prompting framework, introduced in 2023 by Yao et al., that enhances the reasoning ability of large language models by letting them explore multiple lines of reasoning in a structured, branching manner rather than committing to a single chain. It generalises chain-of-thought prompting, in which a model generates one linear sequence of intermediate steps, by treating each reasoning step as a node that can branch into several alternatives, forming a tree of possible solution paths.
Motivation
Standard chain-of-thought prompting works well for many problems but is fundamentally left-to-right and greedy: the model produces one reasoning path and cannot easily reconsider an early decision that turns out to be wrong. For tasks that require planning, exploration, or search, such as puzzles, mathematical games, or constraint problems, a single chain often fails because a poor early choice dooms the rest of the reasoning. Tree of Thoughts addresses this by enabling deliberate decision-making, in which the model considers several candidate next steps, evaluates them, and can look ahead or backtrack to make better global choices.
How it works
A thought in ToT is a coherent unit of intermediate text that represents a partial step toward a solution, such as one line of an equation or one paragraph of a writing plan. The framework operates in three stages. First, the problem is decomposed into steps and, at each step, the model generates several candidate thoughts that branch from the current state. Second, an evaluation prompt has the model assess these candidates, scoring how promising each one is, for instance by judging whether a partial solution can still reach a valid answer. Third, a search algorithm uses those scores to decide which branches to expand and which to prune.
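The three stages can be sketched as a single expand, score, and prune step. This is a minimal illustration, not the paper's implementation: the names `tot_step`, `propose`, `evaluate`, and `TARGET` are hypothetical, and the two toy functions stand in for what would be model prompts in a real system. The toy task is to build a sequence of numbers that sums to exactly 10.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution: the numbers chosen so far


def tot_step(
    frontier: List[State],
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    keep: int,
) -> List[State]:
    """Expand every state in the frontier, score the candidates, keep the best."""
    # Stage 1: generate candidate thoughts branching from each current state.
    candidates = [child for state in frontier for child in propose(state)]
    # Stage 2: score each candidate (higher means more promising).
    ranked = sorted(candidates, key=evaluate, reverse=True)
    # Stage 3: prune everything except the top `keep` candidates.
    return ranked[:keep]


# Toy stand-ins for the generation and evaluation prompts (assumptions).
TARGET = 10


def propose(state: State) -> List[State]:
    # Branch three ways by appending 1, 2, or 3 to the partial solution.
    return [state + (n,) for n in (1, 2, 3)]


def evaluate(state: State) -> float:
    # A state that overshoots the target can never recover, so rank it last.
    total = sum(state)
    return float("-inf") if total > TARGET else float(total)


frontier: List[State] = [()]
for _ in range(4):
    frontier = tot_step(frontier, propose, evaluate, keep=2)

print(frontier)
```

After four steps both surviving states sum to 10. In a real system `propose` and `evaluate` would each wrap one or more model calls, and the scores would be noisy judgments rather than exact values.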
ToT can use different search strategies over the resulting tree. Breadth-first search expands the states at one level before going deeper; in the original paper it keeps only a fixed number of the most promising states per level, which makes it similar in effect to beam search. Depth-first search follows the most promising path until it reaches a solution or a state the evaluator judges unlikely to succeed, then backtracks to try an alternative. By combining generation, self-evaluation, and search, the model behaves less like a single forward pass and more like a deliberate problem-solver.
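The difference between these strategies is easier to see on a small example. The sketch below is an assumption-laden toy, not the paper's code: `breadth_first_search`, `depth_first_search`, `propose`, `evaluate`, and `is_goal` are hypothetical names, and the evaluator is a deterministic function rather than a model prompt. The task is to append 5s and 3s until the sum is exactly 11, where the greedy first choice (5, 5) is a dead end.

```python
from typing import List, Optional, Tuple

State = Tuple[int, ...]
TARGET = 11


def propose(state: State) -> List[State]:
    # Two candidate thoughts per state: append 5 or append 3.
    return [state + (5,), state + (3,)]


def evaluate(state: State) -> float:
    # Overshooting the target is unrecoverable, so it scores below zero.
    total = sum(state)
    return -1.0 if total > TARGET else float(total)


def is_goal(state: State) -> bool:
    return sum(state) == TARGET


def breadth_first_search(width: int, max_depth: int) -> Optional[State]:
    """Level-by-level search that keeps the `width` best states per level."""
    frontier: List[State] = [()]
    for _ in range(max_depth):
        candidates = [c for s in frontier for c in propose(s)]
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        frontier = sorted(candidates, key=evaluate, reverse=True)[:width]
    return None


def depth_first_search(
    state: State, threshold: float, max_depth: int
) -> Optional[State]:
    """Follow the best-scored child first; backtrack when a branch fails."""
    if is_goal(state):
        return state
    if max_depth == 0:
        return None
    for child in sorted(propose(state), key=evaluate, reverse=True):
        if evaluate(child) < threshold:
            continue  # prune: the evaluator judges this branch hopeless
        found = depth_first_search(child, threshold, max_depth - 1)
        if found is not None:
            return found
    return None  # every child failed, so the caller tries its next option


print(breadth_first_search(width=2, max_depth=4))       # (5, 3, 3)
print(breadth_first_search(width=1, max_depth=4))       # None
print(depth_first_search((), threshold=0.0, max_depth=4))  # (5, 3, 3)
```

With a width of 1 the breadth-first variant degenerates into a single greedy chain and gets stuck on (5, 5), which mirrors the failure mode of a linear chain of reasoning. A width of 2, or depth-first search with backtracking, recovers by keeping or revisiting the (5, 3) branch.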
Performance
The most cited demonstration of Tree of Thoughts is the Game of 24, an arithmetic puzzle in which four given numbers must be combined using addition, subtraction, multiplication, and division to reach 24. Using GPT-4, standard chain-of-thought prompting solved only around 4 percent of instances, whereas Tree of Thoughts, keeping five candidate states per step, raised the success rate to 74 percent. The original paper also reports gains on creative writing and mini crossword tasks, illustrating that the benefit of ToT is largest precisely where linear reasoning struggles most.
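To make the puzzle concrete, the following is a plain exhaustive solver, not Tree of Thoughts itself. It is included only to show the structure of the search space: every step picks two of the remaining numbers and replaces them with the result of one operation, so four numbers are reduced to one in three steps. The function names `solve_24` and `solve` are hypothetical, and exact fractions are used so that intermediate divisions do not introduce rounding error.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

Item = Tuple[Fraction, str]  # (value, expression that produced the value)


def solve_24(items: List[Item], target: int = 24) -> Optional[str]:
    """Return an expression equal to `target`, or None if none exists."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two remaining items and combine them with one operation.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        options: List[Item] = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            options.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            options.append((b / a, f"({eb} / {ea})"))
        for option in options:
            found = solve_24(rest + [option], target)
            if found is not None:
                return found
    return None


def solve(numbers: List[int]) -> Optional[str]:
    return solve_24([(Fraction(n), str(n)) for n in numbers])


print(solve([4, 9, 10, 13]))  # some expression that evaluates to 24
print(solve([1, 1, 1, 1]))    # None: no combination of four 1s reaches 24
```

A program can afford to enumerate every branch. A language model cannot, which is why ToT relies on an evaluator to decide which of these intermediate states are worth expanding.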
Relationship to other methods
Tree of Thoughts sits within a broader family of techniques that structure model reasoning. It builds directly on chain-of-thought prompting and is related to self-consistency, which samples multiple chains and takes a majority vote, and to graph-of-thoughts variants that allow more general connections than a tree. It also connects conceptually to classical search and planning in artificial intelligence, applying ideas such as state evaluation and pruning to the space of natural-language reasoning steps. The added capability comes at a cost: exploring many branches requires substantially more model calls than a single chain, so ToT is typically reserved for difficult problems where the accuracy gain justifies the extra computation.
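The extra cost can be estimated with simple arithmetic. The sketch below assumes a breadth-limited search in which each kept state needs one generation call that returns all of its candidates, and each candidate is scored by a fixed number of evaluation calls. The function `estimate_calls` and the parameter values are illustrative assumptions, not figures from any paper.

```python
def estimate_calls(
    depth: int,
    width: int,
    branching: int,
    evals_per_candidate: int = 1,
) -> int:
    """Rough count of model calls for a breadth-limited tree search.

    Assumes one generation call per kept state (returning `branching`
    candidates) and `evals_per_candidate` evaluation calls per candidate.
    """
    frontier = 1  # the search starts from the bare problem statement
    total = 0
    for _ in range(depth):
        total += frontier                          # generation calls
        candidates = frontier * branching
        total += candidates * evals_per_candidate  # evaluation calls
        frontier = min(width, candidates)          # pruning caps the next level
    return total


# Hypothetical setting: 3 steps, 5 states kept, 8 candidates per state,
# each candidate evaluated 3 times.
print(estimate_calls(depth=3, width=5, branching=8, evals_per_candidate=3))  # 275
```

Under these assumptions the search needs 275 calls where a single chain of thought needs one, which is why the width and the number of evaluations per candidate are the main levers for trading accuracy against cost.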
References
- Yao, S., et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601 (NeurIPS 2023).
- IBM. (2024). What is Tree of Thoughts Prompting?
- Learn Prompting. (2024). Tree of Thoughts (ToT): Enhancing Problem-Solving in LLMs.