Structured Outputs
A capability that forces a language model to return responses that conform exactly to a specified format such as a JSON schema, making model output reliable for software pipelines.
Structured outputs constrain a large language model to produce responses matching a predefined format, most commonly JSON that conforms to a supplied schema. Ordinary language-model output is free-form text, which is difficult for software to consume reliably because the model may add commentary, omit required fields, or emit malformed syntax. Structured outputs remove this fragility by guaranteeing that the response is valid and complete according to the schema, which is essential when a model's answer must feed directly into a database, an API call, or another program.
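As a minimal sketch of what conforming to a schema means in practice, the following Python checks a response against a small invoice schema. The schema, its field names, and the `conforms` helper are hypothetical and chosen only for illustration. The checker handles flat objects with string, number, and boolean fields; a production system would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical schema for an extracted invoice, written in JSON Schema style.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "paid": {"type": "boolean"},
    },
    "required": ["vendor", "total", "paid"],
    "additionalProperties": False,
}

# Mapping from the schema type names used above to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def conforms(text, schema):
    """Return True if `text` is JSON matching the small schema subset above."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # malformed syntax or surrounding commentary
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    if any(key not in data for key in schema["required"]):
        return False  # a required field was omitted
    if not schema.get("additionalProperties", True):
        if any(key not in props for key in data):
            return False  # an unexpected field was added
    for key, value in data.items():
        if key not in props:
            continue
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for "number".
        if type_name == "number" and isinstance(value, bool):
            return False
        if not isinstance(value, PYTHON_TYPES[type_name]):
            return False
    return True


good = '{"vendor": "Acme", "total": 12.5, "paid": true}'
missing_field = '{"vendor": "Acme", "total": 12.5}'
with_commentary = 'Here is the invoice: {"vendor": "Acme", "total": 12.5, "paid": true}'
```

Only `good` passes the check. The other two strings illustrate the omitted-field and added-commentary failures described above.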
The problem it solves
When a developer asks a model to return data as JSON through prompting alone, the model usually complies but occasionally fails: it may wrap the JSON in explanatory prose, use the wrong field names, or produce a truncated object. Even a low failure rate is costly in an automated pipeline processing thousands of requests, since each malformed response must be caught, retried, or repaired. Structured outputs replace best-effort prompting with a guarantee that is enforced during generation itself.
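The defensive code this forces on a pipeline can be sketched as follows. The sample responses and the `parse_or_repair` helper are hypothetical, and the repair heuristic (keeping the span between the first `{` and the last `}`) is just one common workaround, not a recommended practice.

```python
import json


def parse_or_repair(text):
    """Return (data, status), where status is 'ok', 'repaired', or 'failed'."""
    try:
        return json.loads(text), "ok"
    except json.JSONDecodeError:
        pass
    # Repair attempt: keep only the span from the first '{' to the last '}'.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1]), "repaired"
        except json.JSONDecodeError:
            pass
    return None, "failed"  # the caller would have to retry the request


# Three illustrative model responses to the same prompt.
responses = [
    '{"name": "Ada", "age": 36}',                                 # clean JSON
    'Sure! Here is the data:\n{"name": "Ada", "age": 36}\nDone.',  # wrapped in prose
    '{"name": "Ada", "ag',                                        # truncated object
]

statuses = [parse_or_repair(r)[1] for r in responses]
print(statuses)
```

The heuristic recovers the prose-wrapped response but not the truncated one, which would still need a retry. It also cannot detect wrong field names, so a separate schema check would be needed on top.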
How constrained decoding works
The core mechanism is constrained decoding. A language model generates text one token at a time, and at each step it assigns probabilities to every possible next token in its vocabulary. Left unconstrained, the model can select any token, which is what allows invalid output. Constrained decoding restricts the choice at each step to only those tokens that keep the output valid under the target format. If the schema requires a closing brace or a specific field name next, tokens that would violate that requirement are masked out before sampling. Because validity is enforced token by token, the final output is guaranteed to parse and to match the schema structure, provided generation runs to completion rather than being cut off by a length limit.
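The masking step can be illustrated with a toy decoder. Everything here is an assumption made for illustration: the nine-token vocabulary, the random scores standing in for a model's forward pass, and the format, which is defined as a fixed list of valid strings. Real implementations compile a JSON schema or grammar into a state machine rather than enumerating outputs.

```python
import math
import random

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'Sure', '!', '<eos>']

# The "format": every complete output must be one of these strings.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']


def allowed_tokens(prefix):
    """Return the tokens that keep `prefix` extendable to a valid output."""
    allowed = set()
    for token in VOCAB:
        if token == '<eos>':
            # Stopping is only valid once the output is complete.
            if prefix in VALID_OUTPUTS:
                allowed.add(token)
        elif any(v.startswith(prefix + token) for v in VALID_OUTPUTS):
            allowed.add(token)
    return allowed


def fake_logits(rng):
    """Stand-in for a model forward pass: one random score per token."""
    return {token: rng.uniform(-1.0, 1.0) for token in VOCAB}


def sample(logits, rng):
    """Sample from the softmax over logits; -inf scores get probability 0."""
    weights = [math.exp(score) for score in logits.values()]
    return rng.choices(list(logits.keys()), weights=weights, k=1)[0]


def constrained_generate(seed=0):
    rng = random.Random(seed)
    output = ''
    while True:
        logits = fake_logits(rng)
        allowed = allowed_tokens(output)
        # Mask: tokens that would break the format are set to -inf.
        masked = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
        token = sample(masked, rng)
        if token == '<eos>':
            return output
        output += token


print(constrained_generate(seed=0))
```

Although the scores are random and the vocabulary contains tokens such as `Sure` and `!`, every run ends in one of the two valid strings, because invalid tokens never receive non-zero probability.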
OpenAI released Structured Outputs in its API in August 2024. The feature is exposed through a response format that accepts a JSON schema. OpenAI trained a model version to follow complex schemas accurately, then layered deterministic constrained decoding on top to reach full reliability. Google added a comparable response-schema feature to Gemini, and Anthropic later introduced constrained decoding for Claude. Open-weight ecosystems offer the same idea through grammar-based decoding libraries that compile a schema or grammar into token masks.
Relationship to function calling
Structured outputs and function calling are closely linked. In function calling, a model is given the signature of a tool and must produce arguments that match it; enforcing the argument schema is the same constrained-decoding problem. The distinction is one of intent: structured outputs generally describe returning data to the application in a fixed shape, while function calling describes selecting and parameterising an action. Modern APIs apply the same strict-schema guarantees to both.
| Feature | Purpose | Output |
| --- | --- | --- |
| Structured outputs | Return data in a fixed shape | Schema-conforming JSON |
| Function calling | Select and parameterise a tool | Schema-conforming arguments |
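To make the function-calling side concrete, here is a minimal sketch of an application dispatching a tool call. The `get_weather` tool, the `TOOLS` registry, and the shape of the tool-call JSON are hypothetical. The check uses Python's `inspect.signature(...).bind`, which verifies argument names only, not types, so it is a much weaker check than a strict schema.

```python
import inspect
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Weather for {city} in {unit}"


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json):
    """Parse a model-produced tool call and invoke the matching function."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    # bind() raises TypeError when a required argument is missing
    # or an unexpected argument name is supplied.
    bound = inspect.signature(func).bind(**call["arguments"])
    return func(*bound.args, **bound.kwargs)


print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

If the model emitted `"town"` instead of `"city"`, `bind` would raise a `TypeError` before the tool ran. With strict schema enforcement on tool arguments, that class of error is ruled out at generation time rather than caught at dispatch.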
Research has noted trade-offs. In some settings, constraining generation can slightly degrade reasoning quality or suppress certain behaviours compared with unconstrained generation, a cost sometimes called a format tax, so practitioners weigh strict enforcement against flexibility. Nonetheless, structured outputs have become foundational to building reliable AI agents, data-extraction systems, and any workflow where model responses must integrate cleanly with conventional software.
References
- OpenAI. (2024). Introducing Structured Outputs in the API. openai.com.
- Willard, B., and Louf, R. (2023). Efficient Guided Generation for Large Language Models. arXiv.
- Databricks. (2025). Introducing Structured Outputs for Batch and Agent Workflows. databricks.com/blog.