Recently, while implementing OpenAI structured outputs using json_schema together with MCP (Model Context Protocol), I ran into a serious performance issue.
The model would:
- Start streaming normally
- Then call mcp.list
- Then suddenly pause for 30 seconds
- Only after that would the next chunk appear
After investigation, the root cause turned out to be a missing tool_choice parameter. Once I added tool_choice explicitly, the pause dropped to 8-10 seconds.
What is json_schema in OpenAI?
OpenAI now allows enforcing structured outputs using:
response_format: {
  type: "json_schema",
  json_schema: { ... }
}
With strict mode enabled, this guarantees that the output conforms to the supplied schema, so the JSON can be parsed safely in production.
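As a sketch, a chat-completions request body using a strict json_schema response format might look like the following. The model name and the schema here are illustrative stand-ins, not the ones from my project:

```typescript
// Illustrative request body for OpenAI structured outputs.
// The "event" schema is a made-up example for demonstration.
const request = {
  model: "gpt-4o-2024-08-06", // assumed: any model that supports structured outputs
  messages: [{ role: "user", content: "Extract the event details." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "event",
      strict: true, // strict mode: the output must match the schema exactly
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          date: { type: "string" },
        },
        required: ["title", "date"],
        additionalProperties: false, // required when strict is true
      },
    },
  },
};

console.log(request.response_format.type);
```

This object is the shape you would pass to `client.chat.completions.create(...)` in the official openai Node SDK.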
What is MCP?
MCP allows models to list tools, call tools, fetch schemas, and interact with external systems.
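Under the hood, listing tools is a JSON-RPC call. A minimal sketch of the tools/list request an MCP client sends over the wire (the request id is arbitrary; SDKs such as the official MCP client libraries wrap this for you):

```typescript
// Wire-level shape of an MCP tools/list request (JSON-RPC 2.0).
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1, // arbitrary request id chosen by the client
  method: "tools/list",
};

console.log(JSON.stringify(listToolsRequest));
```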
The Real Problem: 30-Second Pause
Without an explicit tool choice, the model evaluates too many possible tool paths while simultaneously honoring the schema constraints. Adding one setting removed that decision overhead:

tool_choice: "auto"

With this in place, the pause dropped significantly.
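Putting it together, the fix amounts to one extra field on the same request. A sketch, where the tool definition wrapping MCP is illustrative and the json_schema payload is the one shown earlier (omitted here for brevity):

```typescript
// Same chat-completions request, now with tool_choice set explicitly.
const requestWithToolChoice = {
  model: "gpt-4o-2024-08-06", // assumed model; use whatever your project targets
  messages: [{ role: "user", content: "List the available tools." }],
  tools: [
    {
      type: "function",
      function: {
        name: "mcp_list", // hypothetical wrapper exposing MCP tools/list as a function tool
        description: "List tools exposed by the MCP server",
        parameters: { type: "object", properties: {} },
      },
    },
  ],
  tool_choice: "auto", // explicit, instead of leaving the field unset
  // response_format: { ... } // json_schema payload as earlier in the post
};

console.log(requestWithToolChoice.tool_choice);
```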