AI Planning
AI planning is the discipline of automatically generating a sequence of actions that an intelligent agent can execute to move from an initial state to a goal. It is increasingly used inside LLM-based agents to decompose and reason about complex tasks.
Planning in artificial intelligence is the problem of automatically producing a sequence of actions that, when executed by an agent in some environment, transforms an initial state into a state that satisfies a stated goal. Planning is one of the founding subfields of AI: it underpins game-playing, robotics, logistics, autonomous vehicles, and, more recently, the orchestration of multi-step behaviour inside large language model (LLM) agents.
Classical planning
Classical planning assumes a fully observable, deterministic environment with a finite state space and instantaneous actions. A planning problem is specified by an initial state, a goal description, and a set of action schemas with preconditions and effects. The dominant representation languages are STRIPS (Stanford Research Institute Problem Solver), introduced in 1971, and its successor PDDL (Planning Domain Definition Language), which has been the lingua franca of the International Planning Competition since 1998. Solvers explore the state space using heuristic search algorithms such as A* and greedy best-first search, guided by heuristics derived from problem relaxations such as ignoring delete effects. Well-known planners in this family include FF, LAMA, and Fast Downward.
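The STRIPS-style representation described above can be illustrated with a minimal sketch. It assumes states are sets of ground facts and uses uninformed breadth-first search in place of the heuristic search a real planner would use. The `Action` type, the `plan_bfs` function, and the two-room robot domain are all hypothetical names invented for this example.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def plan_bfs(init: FrozenSet[str], goal: FrozenSet[str],
             actions: List[Action]) -> Optional[List[str]]:
    """Return a shortest sequence of action names reaching the goal, or None."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return path
        for action in actions:
            if action.pre <= state:  # preconditions satisfied
                nxt = (state - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action.name]))
    return None


def fs(*facts: str) -> FrozenSet[str]:
    return frozenset(facts)


# Hypothetical toy domain: a robot carries a box from room A to room B.
ACTIONS = [
    Action("move_A_B", fs("robot_at_A"), fs("robot_at_B"), fs("robot_at_A")),
    Action("move_B_A", fs("robot_at_B"), fs("robot_at_A"), fs("robot_at_B")),
    Action("pick_A", fs("robot_at_A", "box_at_A"), fs("holding"), fs("box_at_A")),
    Action("drop_B", fs("robot_at_B", "holding"), fs("box_at_B"), fs("holding")),
]

plan = plan_bfs(fs("robot_at_A", "box_at_A"), fs("box_at_B"), ACTIONS)
print(plan)  # ['pick_A', 'move_A_B', 'drop_B']
```

Breadth-first search keeps the sketch short and guarantees a shortest plan, but it does not scale. Production planners replace the plain queue with a priority queue ordered by a heuristic estimate of the remaining cost.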
Extensions cover non-classical settings: temporal planning with durations and concurrency, probabilistic planning modelled as Markov decision processes, partially observable planning as POMDPs, multi-agent planning, and hierarchical task network (HTN) planning, where high-level tasks are decomposed into lower-level subtasks.
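As a rough illustration of HTN-style decomposition, the sketch below expands a high-level task into primitive steps. It is deliberately simplified: it assumes exactly one method per compound task and no preconditions, whereas real HTN planners choose among several methods and backtrack when one does not apply. The task names and the `METHODS` table are hypothetical.

```python
from typing import Dict, List

# Hypothetical method table: each compound task maps to an ordered list of subtasks.
# Any task that does not appear as a key is treated as primitive.
METHODS: Dict[str, List[str]] = {
    "deliver_package": ["load_truck", "drive_to_destination", "unload_truck"],
    "load_truck": ["pick_up_package", "place_in_truck"],
    "unload_truck": ["take_from_truck", "put_down_package"],
}


def decompose(task: str, methods: Dict[str, List[str]]) -> List[str]:
    """Recursively expand a task into a flat list of primitive tasks."""
    if task not in methods:
        return [task]  # primitive: directly executable
    plan: List[str] = []
    for subtask in methods[task]:
        plan.extend(decompose(subtask, methods))
    return plan


primitive_plan = decompose("deliver_package", METHODS)
print(primitive_plan)
```

Here the top-level task expands to five primitive steps, with `drive_to_destination` passed through unchanged because no method is defined for it.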
Planning with large language models
The rise of LLMs has produced a new generation of planners that treat natural-language task descriptions as planning problems. Several patterns recur:
| Pattern | Idea |
|---|---|
| Chain-of-thought | Prompt the model to reason step by step before answering |
| Plan-and-Solve | Generate a full plan first, then execute step by step |
| ReAct | Interleave reasoning thoughts with tool actions and observations |
| Tree of Thoughts | Explore multiple candidate plans as a tree with self-evaluation |
| Graph of Thoughts | Allow merging and revisiting of partial plans in a graph |
| Reflexion | Reflect on past failures and revise the plan in the next attempt |
| LLM-as-planner with verifier | Use an LLM to generate plans and a separate verifier (symbolic or neural) to check feasibility |
| LLM + PDDL | Translate natural-language tasks into PDDL and use classical solvers |
These approaches differ in whether they commit to a full plan up front or plan one step at a time, and in whether they require an external symbolic component. In practice, stepwise approaches tend to handle dynamic environments and tool failures better because each decision can take the latest observation into account, while one-shot approaches are cheaper when the task is well structured because they need fewer model calls.
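The contrast between one-shot and stepwise planning can be sketched with a stubbed policy in place of a real model. Everything here is an assumption made for illustration: `scripted_policy` stands in for an LLM call, the two search tools are hypothetical, and the primary tool is hard-coded to fail so that the recovery behaviour is visible.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
History = List[Tuple[str, str, str]]  # (tool name, argument, observation)


def primary_search(query: str) -> str:
    raise RuntimeError("service unavailable")  # simulated tool failure


def backup_search(query: str) -> str:
    return f"result for {query}"


TOOLS: Dict[str, Tool] = {
    "primary_search": primary_search,
    "backup_search": backup_search,
}


def run_fixed_plan(plan: List[Tuple[str, str]], tools: Dict[str, Tool]) -> Optional[str]:
    """One-shot style: execute a precomputed plan with no replanning."""
    observation = None
    for tool, arg in plan:
        try:
            observation = tools[tool](arg)
        except Exception:
            return None  # the plan has no alternative, so the run fails
    return observation


def scripted_policy(task: str, history: History) -> Tuple[str, str]:
    """Hypothetical stand-in for a model that picks the next action."""
    if not history:
        return ("primary_search", task)
    last_observation = history[-1][2]
    if last_observation.startswith("error"):
        return ("backup_search", task)  # react to the failure
    return ("finish", last_observation)


def react_loop(policy, tools: Dict[str, Tool], task: str,
               max_steps: int = 5) -> Tuple[Optional[str], History]:
    """Stepwise style: choose each action after seeing the previous observation."""
    history: History = []
    for _ in range(max_steps):  # step cap guards against endless loops
        tool, arg = policy(task, history)
        if tool == "finish":
            return arg, history
        try:
            observation = tools[tool](arg)
        except Exception as exc:
            observation = f"error: {exc}"  # failures become observations
        history.append((tool, arg, observation))
    return None, history


one_shot = run_fixed_plan([("primary_search", "demo query")], TOOLS)
stepwise, trace = react_loop(scripted_policy, TOOLS, "demo query")
print(one_shot)   # None
print(stepwise)   # result for demo query
```

The stepwise loop spends one extra policy call to recover, which reflects the cost trade-off: it is more robust here but would be wasted effort if the first tool always succeeded.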
Task decomposition
A practical concern in agentic systems is task decomposition: how to break a high-level goal into subtasks small enough for reliable execution. Recent work suggests that aggressive decomposition combined with per-step verification can greatly improve agent reliability, with some published systems (for example, Cognizant AI Lab, 2025) reporting near-zero error rates across millions of reasoning steps when each step is small and locally checkable. Hierarchical decomposition also fits naturally with tool-using agents that delegate subtasks to specialised tools, smaller models, or human reviewers.
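A minimal sketch of per-step verification with retries follows. It is not the method of any specific published system. It assumes each step exposes a cheap local check, and it uses a toy running-sum task with a deliberately faulty worker so the retry path is exercised. `Step`, `run_verified`, and `make_add_step` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    execute: Callable[[int, int], int]  # (state, attempt) -> candidate new state
    check: Callable[[int, int], bool]   # (old state, candidate) -> accepted?


def run_verified(steps: List[Step], state: int,
                 max_retries: int = 3) -> Tuple[int, int]:
    """Run steps in order, accepting each result only if its local check passes."""
    retries = 0
    for step in steps:
        for attempt in range(max_retries):
            candidate = step.execute(state, attempt)
            if step.check(state, candidate):
                state = candidate  # commit only verified results
                break
            retries += 1
        else:
            raise RuntimeError(f"step {step.name!r} failed verification")
    return state, retries


def make_add_step(x: int, flaky: bool = False) -> Step:
    """Toy step: add x to the state. A flaky step is off by one on its first attempt."""
    def execute(state: int, attempt: int) -> int:
        wrong = flaky and attempt == 0
        return state + x + (1 if wrong else 0)

    def check(old: int, new: int) -> bool:
        return new - old == x

    return Step(f"add_{x}", execute, check)


STEPS = [make_add_step(3), make_add_step(4, flaky=True), make_add_step(5)]
total, retries = run_verified(STEPS, 0)
print(total, retries)  # 12 1
```

Because an error is caught at the step where it occurs, it cannot propagate into later steps. This is the property that small, locally checkable subtasks are meant to provide.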
Evaluation
Planning systems are evaluated on success rate, plan length or cost, generalisation across problem instances, robustness to perturbations, and computational efficiency. Modern LLM agent benchmarks (AgentBench, GAIA, WebArena, SWE-bench, OSWorld) all stress planning behaviour in addition to single-step reasoning.
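Several of these metrics can be computed from a simple log of runs. The sketch below assumes a hypothetical `Episode` record and uses made-up numbers rather than results from any benchmark.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Episode:
    task_id: str
    success: bool
    plan_length: int  # number of actions taken
    seconds: float    # wall-clock time for the run


def summarise(episodes: List[Episode]) -> Dict[str, Optional[float]]:
    """Aggregate success rate, mean plan length (solved runs only), and mean runtime."""
    solved = [e for e in episodes if e.success]
    return {
        "success_rate": len(solved) / len(episodes) if episodes else 0.0,
        "mean_plan_length": mean(e.plan_length for e in solved) if solved else None,
        "mean_seconds": mean(e.seconds for e in episodes) if episodes else None,
    }


# Illustrative data only.
EPISODES = [
    Episode("t1", True, 4, 1.0),
    Episode("t2", True, 6, 2.0),
    Episode("t3", False, 10, 3.0),
    Episode("t4", True, 8, 2.0),
]

report = summarise(EPISODES)
print(report)
```

Plan length is averaged over solved runs only, since the length of a failed attempt is not comparable to that of a valid plan. Runtime is averaged over all runs because failures still consume compute.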
Limitations and open problems
Even capable LLM planners struggle with long-horizon planning, irreversible actions, partial observability, and adversarial environments. Hallucinated steps, infinite loops, and brittle recovery from tool errors remain common failure modes. Research directions include neuro-symbolic hybrids that combine LLMs with classical planners or constraint solvers, world-model learning to simulate consequences before acting, and self-improvement through replay of past trajectories.
References
- Fikes, R. and Nilsson, N. (1971). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence.
- Ghallab, M., Nau, D., and Traverso, P. (2004). Automated Planning: Theory and Practice. Morgan Kaufmann.
- Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR.
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS.
- Cognizant AI Lab. (2025). MAKER Achieves Million-Step, Zero-Error LLM Reasoning.