Tool Use
Tool use in AI refers to the capability of language models to invoke external functions, APIs, or services to retrieve information, perform actions, or extend their abilities beyond text generation.
Tool use is the ability of a large language model (LLM) to take actions beyond producing text by calling externally defined functions, APIs, or services. When a model is granted tools, the application supplies a schema describing each tool name, parameters, and return type; during inference the model decides when to invoke a tool, produces a structured call, and receives the tool result back as part of the conversation. The model can then continue reasoning, call additional tools, or return a final answer to the user. Tool use is the foundational mechanism behind agentic AI systems, retrieval-augmented generation pipelines, and most production deployments of LLM-driven automation.
Tool use is closely related to function calling, a term popularised by OpenAI in 2023, but the concept predates that release. Earlier prompting techniques such as ReAct (Reason + Act) and Toolformer demonstrated that LLMs could be trained or prompted to interleave reasoning steps with tool invocations to solve tasks that required external information or computation. Modern instruction-tuned models from OpenAI, Anthropic, Google, Meta, Mistral, and Alibaba include native support for structured tool calls and are evaluated on benchmarks that measure tool selection accuracy and argument fidelity.
How tool use works
A typical tool-use interaction proceeds in three phases. First, the application supplies tool definitions to the model alongside the user prompt. Each definition includes a name, a natural-language description of when to use it, and a JSON schema for its parameters. Second, the model produces either a final text response or a structured tool call, indicated by a special token sequence or message field. Third, the application executes the requested tool, returns the result to the model as a tool-result message, and the model continues from that point. This loop may iterate many times before the model produces a final answer.
The reliability of tool use depends on several factors. The model must select the correct tool from a possibly long list, produce arguments that match the schema exactly, and reason about the returned data. Mistakes include calling the wrong tool, hallucinating non-existent parameters, mishandling errors returned by tools, or failing to call a tool when one is needed. Vendors have iterated on training procedures and prompt formats to improve these behaviours, and have introduced features such as parallel tool calling, where a model may issue multiple independent calls in a single turn, and forced tool choice, where the application requires the next response to be a specific tool call.
Common tool categories
Tools fall into several broad categories. Retrieval tools query a vector database, search engine, or document store and return relevant text chunks for grounding. Computation tools include calculators, Python interpreters, SQL engines, and code sandboxes that execute arbitrary code and return results. Action tools modify external state — sending email, creating calendar events, updating CRM records, posting to messaging platforms, or controlling devices. Memory tools allow the model to store and retrieve information across conversations or sessions.
A growing ecosystem of standards aims to make tool definitions portable. The Model Context Protocol (MCP), introduced by Anthropic in late 2024, defines an open protocol for exposing tools and resources from external servers to any compliant LLM client. MCP has been adopted by Claude, OpenAI clients, and several open-source agent frameworks, reducing the per-vendor effort required to wire models to enterprise systems.
| System | Tool format | Notable features | |---|---|---| | OpenAI | JSON schema, parallel calls | Native function calling since June 2023 | | Anthropic Claude | JSON tools, MCP-native | Structured tool blocks, computer-use tools | | Google Gemini | JSON schema, code execution | Built-in code interpreter | | Mistral | JSON schema | Native function calling in Mixtral 8x22B and Large | | Open models | Function-calling templates | Hermes, NousResearch, Functionary fine-tunes |
Risks and design considerations
Tool use expands the surface area for harm. A model that can read email, schedule meetings, or move money can be exploited through prompt injection, in which adversarial text within a tool result instructs the model to take an unintended action. Defences include separating untrusted input from trusted instructions, requiring human confirmation for sensitive actions, allowlisting tool destinations, and applying capability-aware policies that restrict which tools are available in which contexts. Enterprise deployments commonly couple tool use with audit logging, rate limiting, and per-user authorisation.
References
- Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
- Schick, T., et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761.
- Anthropic. (2024). Introducing the Model Context Protocol. anthropic.com/news/model-context-protocol.
- Bank Negara Malaysia. (2024). Discussion Paper on Artificial Intelligence in the Financial Services Industry. https://www.bnm.gov.my.