Structured output prompting with JSON mode + Pydantic

Surya Koritala

The JSON-extraction hack a lot of teams wrote in 2023 is no longer the best default. Structured output prompting gives you schema-constrained responses, typed Python objects, and a cleaner production path than regexing JSON out of markdown. In this tutorial, we’ll build a small ticket-triage flow with OpenAI’s structured outputs, validate it with Pydantic, and map the same pattern to Anthropic tool use and Google Gemini response schemas. Prerequisites: Python 3.10+, the openai and pydantic packages (pip install openai pydantic), and an OpenAI API key.

Stage 1: Know what you’re building

This tutorial builds a support-ticket triage step that returns a typed object instead of free-form text. The goal is simple: send a user message, get back a validated structure with category, priority, summary, and a suggested response. That is the practical center of structured output prompting: you are not asking the model to “please format this as JSON” and hoping for the best; you are defining the shape you need up front.

OpenAI’s structured outputs guide describes two related patterns. One is returning data that conforms to a JSON schema. The other is tool calling, where the schema describes arguments for an action the model should invoke. In both cases, the schema is the contract. OpenAI’s docs also note that strict mode constrains decoding so invalid JSON is not produced, which removes the old class of parsing failures.

For Python builders, the ergonomic win is that the OpenAI Python client can accept a Pydantic model directly in client.beta.chat.completions.parse. That means your application state can stay typed from the model boundary onward, instead of bouncing through raw strings and ad hoc cleanup code.

Use Python 3.10+ and install openai and pydantic. You’ll also need an OpenAI API key in your environment.

python -m venv .venv
source .venv/bin/activate
pip install openai pydantic

Stage 2: Define the schema in Pydantic

Pydantic is a good fit here because it already gives you typed models, validation, and JSON schema generation. The Pydantic docs cover both model validation and schema generation in depth. In this pattern, your Pydantic class becomes the single source of truth for what the model is allowed to return.

A useful rule is to keep the schema narrow. Enumerations with Literal are better than open-ended strings when your downstream system expects a fixed set of values. That is one of the biggest practical gains from structured output prompting: you stop spending time normalizing near-miss labels like “urgent-ish” or “billing issue” after the fact.

Pros
  • Clear enum values for downstream routing
  • Typed object is easy to test
  • Schema can be reused across prompts and services
Cons
  • You need to think through edge cases up front
  • Very flexible outputs may need a broader schema
  • Unsupported schema features require simplification

from typing import Literal

from pydantic import BaseModel


class TicketTriage(BaseModel):
    # Literal fields become enums in the generated JSON schema.
    category: Literal["billing", "technical", "account", "spam"]
    priority: Literal["low", "medium", "high", "urgent"]
    # Free-text fields written by the model.
    summary: str
    suggested_response: str
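
As a quick check of what the model class actually produces, the sketch below prints the JSON schema Pydantic generates for TicketTriage and shows validation rejecting near-miss labels. It assumes Pydantic v2 (model_json_schema); the sample field values are made up for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "account", "spam"]
    priority: Literal["low", "medium", "high", "urgent"]
    summary: str
    suggested_response: str


# Pydantic v2 derives a JSON schema from the type hints.
schema = TicketTriage.model_json_schema()
category_enum = schema["properties"]["category"]["enum"]
required_fields = schema["required"]
print(category_enum)
print(required_fields)

# Near-miss labels fail validation instead of leaking downstream.
try:
    TicketTriage(
        category="billing issue",
        priority="urgent-ish",
        summary="Charged twice",
        suggested_response="We are looking into it.",
    )
except ValidationError as exc:
    rejected = sorted(err["loc"][0] for err in exc.errors())
    print(rejected)
```

Printing the schema once during development is a cheap way to see exactly which enum values and required keys the model will be held to.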

| Need                          | Best fit                       |
|-------------------------------|--------------------------------|
| Any valid JSON blob           | JSON mode                      |
| Specific response shape       | Structured outputs with schema |
| Model should invoke an action | Tool calling                   |

A practical decision rule for output control

Stage 3: Call the model and get back a typed object

Best default: Pydantic + parse

It keeps the schema in Python, lets the SDK derive JSON schema for you, and returns a typed object without a manual parsing layer.

Here is the canonical working pattern. OpenAI’s Python client supports passing a Pydantic class directly as response_format, and the SDK returns a parsed Pydantic instance. This is the cleanest version of structured output prompting for Python because it removes the manual json.loads() step entirely.

Use client.beta.chat.completions.parse when you want a parsed Pydantic object back directly.

from openai import OpenAI
from pydantic import BaseModel
from typing import Literal

class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "account", "spam"]
    priority: Literal["low", "medium", "high", "urgent"]
    summary: str
    suggested_response: str

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Triage support tickets into a structured form."},
        {"role": "user", "content": "My credit card was charged twice for the same order #4521."},
    ],
    response_format=TicketTriage,
)

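message = response.choices[0].message
if message.parsed is None:
    # parsed is None when the model refuses; the reason is in message.refusal
    raise RuntimeError(f"No parsed output: {message.refusal}")

triage: TicketTriage = message.parsed
print(triage.category)        # expected 'billing' for this ticket, not guaranteed
print(triage.priority)        # one of the four allowed values
print(triage.summary)         # model-written summary string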
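
Because parse hands back a TicketTriage instance, downstream code can be written and tested against the type without calling the API. The following is a minimal sketch, assuming Pydantic v2; route_ticket and the queue names are hypothetical, and the JSON string stands in for a model response.

```python
from typing import Literal

from pydantic import BaseModel


class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "account", "spam"]
    priority: Literal["low", "medium", "high", "urgent"]
    summary: str
    suggested_response: str


# Hypothetical queue names; replace with whatever your helpdesk uses.
QUEUES = {
    "billing": "finance-queue",
    "technical": "support-queue",
    "account": "support-queue",
    "spam": "discard",
}


def route_ticket(triage: TicketTriage) -> str:
    """Pick a destination queue; urgent non-spam tickets are escalated."""
    if triage.priority == "urgent" and triage.category != "spam":
        return "escalation-queue"
    return QUEUES[triage.category]


# A canned payload standing in for a model response, so no API call is needed.
canned = (
    '{"category": "billing", "priority": "high", '
    '"summary": "Charged twice for order #4521", '
    '"suggested_response": "We will refund the duplicate charge."}'
)
triage = TicketTriage.model_validate_json(canned)
print(route_ticket(triage))
```

The routing logic never touches raw strings, so a unit test can cover every category and priority combination with plain fixtures.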

Stage 4: Use raw JSON schema when you need tighter control

There are cases where you may want to define the schema yourself instead of relying on automatic conversion from Pydantic. OpenAI’s docs show a raw schema path using response_format with a json_schema payload and strict enabled. That is useful when you need explicit control over the schema name or want to inspect the exact contract being sent to the API.

This is also the point where it helps to separate terms that often get blurred together. JSON mode is for getting valid JSON. Structured outputs are for getting JSON that matches a specific schema. If your system depends on exact keys and allowed values, structured output prompting should be the default, not generic JSON mode.

JSON mode does not by itself guarantee your preferred schema. Structured outputs do.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Triage support tickets into a structured form."},
        {"role": "user", "content": "I can't log in after resetting my password."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ticket_triage",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {
                        "type": "string",
                        "enum": ["billing", "technical", "account", "spam"]
                    },
                    "priority": {
                        "type": "string",
                        "enum": ["low", "medium", "high", "urgent"]
                    },
                    "summary": {"type": "string"},
                    "suggested_response": {"type": "string"}
                },
                "required": ["category", "priority", "summary", "suggested_response"],
                "additionalProperties": False
            }
        }
    }
)

print(response.choices[0].message.content)
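
When you hand-write the schema, it is easy to forget one of the two requirements the raw example above satisfies: every property listed in required, and additionalProperties set to False on each object. The sketch below is a hypothetical lint helper (strict_schema_problems is not part of any SDK) that walks a schema dict and reports violations of those two rules only.

```python
from typing import Any


def strict_schema_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable problems for the two rules checked here.

    Hypothetical helper: it checks only `required` coverage and
    `additionalProperties`, not every strict-mode restriction.
    """
    problems: list[str] = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be False")
        for name, subschema in properties.items():
            problems.extend(strict_schema_problems(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


ticket_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "account", "spam"]},
        "summary": {"type": "string"},
    },
    "required": ["category"],  # deliberately incomplete for the demo
}

for problem in strict_schema_problems(ticket_schema):
    print(problem)
```

Running a check like this in a unit test catches contract mistakes before the API rejects the request.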

Stage 5: Design around the limitations before they bite you

The happy path is straightforward, but there are a few constraints worth designing for early. OpenAI’s structured outputs guide documents schema features that strict mode does not support, such as oneOf and allOf. Earlier versions of the guide also excluded regex patterns on strings and format constraints like date-time or email, and capped nesting at five levels; these limits have shifted between models and docs revisions, so check the current guide before you rely on any of them. If your first schema mirrors a deeply nested internal object graph, simplify it before you wire it into production.

Another practical constraint is required fields. In strict mode every property must be listed as required, so the model cannot simply omit a key. If a field can truly be absent, model that intentionally in Python, for example as a value that may be null, rather than assuming the model will sometimes omit it cleanly. A flatter schema with explicit enums and strings tends to be more robust than a highly expressive one. Teams adopting structured output prompting often get the best results when they treat schemas as product interfaces, not as a dump of every possible field they might someday want.
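
One way to model a field that can be absent is to keep it required but nullable, so the absence is stated explicitly rather than expressed by a dropped key. The following is a minimal sketch, assuming Pydantic v2; the order_id field is a hypothetical addition to the tutorial's schema.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class TicketTriageWithOrder(BaseModel):
    category: Literal["billing", "technical", "account", "spam"]
    summary: str
    # No default value: the key is required, but its value may be null.
    order_id: Optional[str]


schema = TicketTriageWithOrder.model_json_schema()
print(schema["required"])

# An explicit null is accepted.
no_order = TicketTriageWithOrder.model_validate(
    {"category": "account", "summary": "Locked out", "order_id": None}
)
print(no_order.order_id)

# A missing key is rejected.
try:
    TicketTriageWithOrder.model_validate({"category": "account", "summary": "Locked out"})
    omitted_ok = True
except ValidationError:
    omitted_ok = False
print(omitted_ok)
```

Downstream code then handles exactly one "no value" case, None, instead of guessing between a missing key and an empty string.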

Latency is the other tradeoff to keep in mind. OpenAI’s docs note that the first request with a new schema carries extra latency while the API processes that schema, and that later requests with the same schema do not pay this cost again. That overhead is often worth it because you avoid malformed-output retries and cleanup passes, but it still matters for user-facing flows, so measure it against your own schema before you ship.

“The old failure mode was malformed JSON. The new failure mode is usually a schema that tries to do too much.”

Alatirok editorial guidance

| Limitation                                   | What to do instead                           |
|----------------------------------------------|----------------------------------------------|
| Unsupported schema features like oneOf/allOf | Split into simpler schemas or separate calls |
| Regex and format constraints not supported   | Validate after receipt in application code   |
| Deep nesting                                 | Flatten the response shape                   |
| Strict mode latency overhead                 | Use only where exact shape matters           |

Common strict-mode constraints and practical workarounds
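
For the "validate after receipt" row, checks that the schema cannot express can run in application code once the response arrives. The sketch below uses only the standard library; the order_reference and due_at fields and the pattern are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical format for order references such as "#4521".
ORDER_REFERENCE = re.compile(r"#\d{1,10}")


def post_validate(payload: dict[str, str]) -> list[str]:
    """Return problems found in fields the schema left as plain strings."""
    problems: list[str] = []
    if not ORDER_REFERENCE.fullmatch(payload.get("order_reference", "")):
        problems.append("order_reference must look like #4521")
    try:
        datetime.fromisoformat(payload.get("due_at", ""))
    except ValueError:
        problems.append("due_at must be an ISO 8601 timestamp")
    return problems


good = {"order_reference": "#4521", "due_at": "2024-08-06T12:30:00+00:00"}
bad = {"order_reference": "order 4521", "due_at": "next Tuesday"}
print(post_validate(good))
print(post_validate(bad))
```

Keeping these checks outside the schema means the wire contract stays simple while the application still enforces its own formats.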

Stage 6: Pick structured outputs or tool calling on purpose

A lot of confusion comes from using one pattern where the other is a better fit. If the model is returning data for your application to store, rank, display, or pass to another service, use structured outputs. If the model is selecting and invoking an action, use tool calling. Anthropic’s tool use guide frames this clearly: tools let Claude produce structured arguments for external functions. That is an action boundary, not just a formatting preference.

This distinction matters in agent systems. A triage object is state. A refund request, calendar update, or database write is an action. In practice, many teams use both: structured output prompting for internal state transitions and tool calling for side effects. That split keeps your orchestration logic easier to reason about and test.

Return data with structured outputs. Trigger actions with tool calling.

| Scenario                                     | Recommended pattern |
|----------------------------------------------|---------------------|
| Classify a support ticket                    | Structured outputs  |
| Choose a CRM update action                   | Tool calling        |
| Generate a report object for downstream code | Structured outputs  |
| Call a refund API with validated args        | Tool calling        |

Choosing the right control surface
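
On the tool-calling side, the schema describes arguments and your code decides whether to run the action. The sketch below shows a tool definition in the shape OpenAI's Chat Completions API accepts, plus a small dispatcher. The issue_refund tool, its parameters, and dispatch_tool_call are hypothetical, and the arguments string stands in for what a model would return.

```python
import json
from typing import Any, Callable

# Tool definition: the schema describes arguments for an action.
REFUND_TOOL = {
    "type": "function",
    "function": {
        "name": "issue_refund",
        "description": "Refund a duplicate charge on an order.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount_cents": {"type": "integer"},
            },
            "required": ["order_id", "amount_cents"],
            "additionalProperties": False,
        },
    },
}


def issue_refund(order_id: str, amount_cents: int) -> str:
    """Placeholder side effect; a real handler would call a payments API."""
    return f"refunded {amount_cents} cents on order {order_id}"


HANDLERS: dict[str, Callable[..., Any]] = {"issue_refund": issue_refund}


def dispatch_tool_call(name: str, arguments_json: str) -> Any:
    """Run the handler for a tool call; arguments arrive as a JSON string."""
    if name not in HANDLERS:
        raise KeyError(f"unknown tool: {name}")
    return HANDLERS[name](**json.loads(arguments_json))


print(dispatch_tool_call("issue_refund", '{"order_id": "4521", "amount_cents": 1999}'))
```

The dispatcher is the action boundary: it is the one place to add authorization checks, logging, or a human approval step before a side effect runs.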

Stage 7: Add the production error-handling path

Production takeaway

Your error handling shifts from string cleanup to refusal handling, retries, and normal application validation.

The SDK path is cleaner than manual parsing, but production code still needs to handle refusals and API failures. OpenAI’s Python client exposes refusal information on the message object, and API-level exceptions should be retried or surfaced according to your application’s policy. The key shift is that you are no longer writing retry logic for malformed JSON; you are handling operational errors and model refusals instead.

If you want an abstraction layer around this pattern, the Instructor library is a well-known option in the Python ecosystem. It wraps structured extraction workflows around Pydantic models across providers. Even if you stick to the official SDK, it is useful as a reference for how many teams package structured output prompting into reusable application code.

from openai import OpenAI, APIError
from pydantic import BaseModel, ValidationError
from typing import Literal

class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "account", "spam"]
    priority: Literal["low", "medium", "high", "urgent"]
    summary: str
    suggested_response: str

client = OpenAI()

def handle_refusal(refusal: str) -> None:
    print(f"Model refusal: {refusal}")

def handle_retry(error: Exception) -> None:
    print(f"Retryable error: {error}")

try:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "Triage support tickets into a structured form."},
            {"role": "user", "content": "My account was locked after I changed my email."},
        ],
        response_format=TicketTriage,
    )
    if response.choices[0].message.refusal:
        handle_refusal(response.choices[0].message.refusal)
    else:
        triage = response.choices[0].message.parsed
        print(triage.model_dump())
except APIError as e:
    handle_retry(e)
except ValidationError as e:
    print(f"Validation error: {e}")
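
The handle_retry stub above only logs. One common way to fill it in is exponential backoff around the call. The sketch below is generic and uses only the standard library; call_with_retries and its parameters are hypothetical, and in real code you would pass the SDK's retryable exception types instead of ConnectionError.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


# Demonstration with a fake call that fails twice, then succeeds.
attempts = {"count": 0}
delays: list[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# delays.append replaces time.sleep so the demo records waits instead of sleeping.
result = call_with_retries(flaky_call, sleep=delays.append)
print(result, delays)
```

The OpenAI client also has its own max_retries setting, so check what the SDK already retries before layering a wrapper like this on top.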

Where to go from here

Once the basic pattern works, the next step is to standardize it across providers. Anthropic supports structured arguments through tool use for Claude 3 and later, documented in its tool use guide. Google documents schema-constrained generation for Gemini in its structured output guide. The surface area differs, but the architectural idea is the same: define a schema, constrain the model to it, and keep your application state typed.

If you are building agents, this pattern is worth treating as infrastructure rather than prompt craft. Typed objects make state transitions auditable, testable, and easier to route across services. The old markdown-code-block extraction trick solved a short-term problem. Structured output prompting is the longer-term replacement because it turns output shape into an explicit contract. Start with one narrow schema, wire it into your tests, and expand only when the downstream system truly needs more fields.

Try adding a second schema for escalation decisions, then compare a pure structured-output path with a tool-calling path for actions.

| Provider  | Primary mechanism                               | Reference                      |
|-----------|-------------------------------------------------|--------------------------------|
| OpenAI    | Structured outputs via JSON schema or SDK parse | OpenAI structured outputs docs |
| Anthropic | Tool use with structured arguments              | Anthropic tool use guide       |
| Google    | Gemini response schema                          | Google structured output guide |

Cross-model map for schema-constrained generation
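
To make the first two rows of that map concrete, the sketch below wraps one JSON schema in the request shapes OpenAI and Anthropic expect. The helper names are hypothetical and the payloads are only built, never sent. Gemini is left out here; check Google's guide for the schema features its response schema accepts.

```python
from typing import Any

TICKET_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "account", "spam"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high", "urgent"]},
        "summary": {"type": "string"},
        "suggested_response": {"type": "string"},
    },
    "required": ["category", "priority", "summary", "suggested_response"],
    "additionalProperties": False,
}


def openai_response_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Shape for the response_format parameter of Chat Completions."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def anthropic_forced_tool(name: str, description: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Shape for the tools and tool_choice parameters of the Messages API."""
    return {
        "tools": [{"name": name, "description": description, "input_schema": schema}],
        # Forcing the tool makes the model answer with arguments for it.
        "tool_choice": {"type": "tool", "name": name},
    }


openai_kwargs = {"response_format": openai_response_format("ticket_triage", TICKET_SCHEMA)}
anthropic_kwargs = anthropic_forced_tool(
    "record_triage", "Record the triage result for a support ticket.", TICKET_SCHEMA
)
print(sorted(openai_kwargs), sorted(anthropic_kwargs))
```

Keeping the schema in one place and adapting only the wrapper is what lets the same contract travel across providers.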

Frequently asked questions

What is the difference between JSON mode and structured outputs?

JSON mode is for getting valid JSON, while structured outputs are for getting JSON that conforms to a specific schema. OpenAI’s structured outputs guide recommends schema-constrained output when your application depends on exact fields and values.

Can I use Pydantic directly with the OpenAI Python SDK?

Yes. The OpenAI Python client supports client.beta.chat.completions.parse, where response_format can be a Pydantic model class. Pydantic itself is documented at docs.pydantic.dev.

When should I use tool calling instead of structured outputs?

Use tool calling when the schema represents arguments for an action the model should invoke, not just a data object it should return. Anthropic’s tool use guide is a good reference for that action-oriented pattern.

Last updated: May 22, 2026. Related: Agent Infrastructure.