AIWiki
Malaysia

Function Calling

Function calling is the structured mechanism by which a large language model returns a JSON-formatted invocation of a named function with typed arguments, enabling reliable integration of LLMs with external systems.

6 min readLast updated May 2026Applications

Function calling is a feature of modern large language models that allows the model to return a structured invocation of a named function with typed arguments, rather than producing free-form text. The application developer supplies a list of function definitions, each described by a name, a natural-language description, and a JSON Schema for its parameters. During inference the model decides whether to respond with text or to emit a function call; if a call is emitted, the application executes the requested function and returns the result to the model, which may then continue reasoning or produce a final response.

Function calling was popularised by OpenAI in June 2023 with the release of gpt-3.5-turbo-0613 and gpt-4-0613, which were fine-tuned to detect when a function should be called and to format arguments as valid JSON conforming to the supplied schema. The term has since become a generic name for the same capability across vendors, even when the underlying API uses different message types or naming. Tool use is a related and broader term that encompasses function calling along with built-in tools (such as code execution or web search) provided by the vendor.

How function calling works

A function definition supplied to an LLM API typically contains three fields: a name, a description, and a parameters schema. The name is a short identifier such as get_weather or create_invoice. The description is a natural-language explanation of when and how to use the function; it is read by the model at inference time and plays a critical role in selection accuracy. The parameters schema follows the JSON Schema standard and lists each argument with its type, description, and constraints.

When the model decides to call a function, it produces an output containing the function name and a JSON object of arguments. The API returns this output to the application, which is responsible for executing the function — typically a database query, an HTTP request, or a local computation — and returning the result. The result is appended to the conversation as a tool-result message, and the model is invoked again to continue the dialogue. This loop may be repeated several times within a single user turn.

Reliability and structured output

Function calling is a special case of structured-output generation, in which the model is constrained to produce text matching a schema. Vendors employ several techniques to improve reliability: constrained decoding that masks tokens incompatible with the schema, fine-tuning on synthetic data with valid and invalid calls, and dedicated tokens that delimit function-call blocks. Recent model releases support strict mode, in which the API guarantees that the returned JSON validates against the schema, eliminating a common class of integration bugs.

Parallel function calling, supported by OpenAI, Anthropic, Google, and other vendors since 2024, allows the model to emit multiple independent function calls in a single response. The application executes the calls — possibly in parallel — and returns all results together. Forced function choice allows the application to constrain the next response to a specific named function or to disallow text-only responses, useful in deterministic pipelines.

Patterns and pitfalls

| Pattern | Use case | Risk | |---|---|---| | Tool routing | Choose 1 of N tools by intent | Misrouting on ambiguous prompts | | Iterative tool loop | Multi-step research and action | Runaway loops, token cost | | Structured extraction | Pull fields from free text | Schema drift on edge cases | | Forced tool choice | Always emit a specific call | Loss of conversational fallback |

Common pitfalls include hallucinated arguments (the model invents parameters not in the schema), prompt injection through tool outputs (adversarial text in returned data instructing the model to take unintended actions), and silent failures when a tool returns an error the model does not understand. Mitigations include strict schema validation, return-value sanitisation, and explicit error-handling instructions in the system prompt.

Standards and portability

Several efforts aim to standardise function-calling formats across vendors. OpenAPI schemas can be transformed automatically into LLM function definitions. The Model Context Protocol (MCP), released by Anthropic in late 2024, defines an open protocol by which tools, prompts, and resources are exposed by servers and consumed by any compliant LLM client. MCP is supported by Claude, OpenAI clients, and a growing set of open-source agents, reducing the lock-in associated with vendor-specific function-calling formats.

References

  1. OpenAI. (2023). Function calling and other API updates. openai.com/blog.
  2. Anthropic. (2024). Tool Use with Claude — API Documentation. docs.anthropic.com.
  3. Anthropic. (2024). Introducing the Model Context Protocol. anthropic.com/news/model-context-protocol.
  4. Bank Negara Malaysia. (2024). Discussion Paper on Artificial Intelligence in the Financial Services Industry. https://www.bnm.gov.my.