Recently, while implementing OpenAI structured outputs using json_schema with MCP (Model Context Protocol), I ran into a serious performance issue.
The model would:
- Start streaming normally
- Then call `mcp.list`
- And suddenly… pause for 30 seconds
- After that, the next chunk appeared
After investigation, the root cause turned out to be that I never specified tool_choice. When I added tool_choice, the pause dropped to 8-10 seconds.
Let's break down why this happens.
🔍 What is json_schema in OpenAI?
OpenAI now allows enforcing structured outputs using:
```javascript
response_format: {
  type: "json_schema",
  json_schema: { ... }
}
```
This guarantees:
- Strict JSON
- No malformed outputs
- Predictable structure
- Production-ready parsing
This is much more reliable than "please return JSON" prompting.
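The fragment above can be sketched as a small request builder. A minimal sketch, assuming a Chat Completions-style request body; the model name and the schema fields (`support_reply`, `answer`, `confidence`) are made-up examples, not from the original post.

```javascript
// Builds a request body that enforces structured output via json_schema.
// Schema name and fields are hypothetical placeholders.
function buildStructuredRequest(userMessage) {
  return {
    model: "gpt-4o", // assumption: any model that supports json_schema
    messages: [{ role: "user", content: userMessage }],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "support_reply",
        strict: true, // reject any output that deviates from the schema
        schema: {
          type: "object",
          properties: {
            answer: { type: "string" },
            confidence: { type: "number" },
          },
          required: ["answer", "confidence"],
          additionalProperties: false,
        },
      },
    },
  };
}
```

With `strict: true`, the parsed response is guaranteed to match the schema, so downstream code can read `answer` and `confidence` without defensive checks.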
🔍 What is MCP?
MCP (Model Context Protocol) allows models to dynamically:
- List tools (`mcp.list`)
- Call tools
- Fetch tool schemas
- Interact with external systems
If you're using something like https://mcp.botsify.com/mcp/list_actions, the model dynamically evaluates available tools before deciding what to call.
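To make the evaluation step concrete, here is an assumed shape of a tool-listing result. This is illustrative only; the real wire format of `mcp.list` / `list_actions` may differ, and the tool names are hypothetical.

```javascript
// Assumed (not official) shape of an MCP tool-listing result.
const listedTools = [
  {
    name: "update_bot_settings",
    description: "Update a bot's configuration",
    inputSchema: { type: "object" },
  },
  {
    name: "get_bot_status",
    description: "Fetch a bot's current status",
    inputSchema: { type: "object" },
  },
];

// Every entry carries a schema the model must read and weigh before it
// can decide whether (and what) to call. This is the mid-stream
// evaluation step that shows up as latency.
const schemasToEvaluate = listedTools.map((t) => t.inputSchema);
```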
🚨 The Real Problem: 30-Second Pause
Here's what was happening:
- Model begins streaming.
- It internally triggers `mcp.list`.
- It evaluates all tools.
- It thinks deeply.
- It decides whether to call a tool.
- Streaming pauses for ~30 seconds.
Why? Because I did not define:
```javascript
tool_choice: "auto"
```
Or:
```javascript
tool_choice: { "type": "function", "function": { "name": "myTool" } }
```

🧠 Why Missing tool_choice Causes Delay
When tool_choice is NOT provided, the model must:
- Evaluate all available tools
- Decide whether to call one
- Compare with schema requirements
- Validate output format
- Possibly retry internally
This internal reasoning phase is expensive, especially when:
- Using `json_schema`
- Using MCP dynamic tools
- Streaming responses
- Large tool lists
The model enters a "decision paralysis" loop: it has to satisfy both the structured output contract and the possibility of calling tools, so it spends a long time reasoning before emitting the next token.
⚡ Why Adding tool_choice Reduced It to 8-10s
When I added tool_choice, the model:
- No longer had to evaluate whether to use a tool
- Skipped tool selection reasoning
- Directly executed the intended flow
- Reduced internal retries
The pause dropped from 30s to 8-10s.
There's still some delay because MCP still loads, tool schema validation still occurs, and structured output validation still runs, but the heavy decision-making phase is reduced.
🔍 What's Happening Internally (Advanced)
When using json_schema, MCP tools, streaming, and no tool_choice, the model must satisfy two constraints at once:
- It must follow the strict JSON schema for the final response.
- It must decide whether a tool call is needed (and which one).
If tool output also needs to match the schema, the model may internally "simulate" or plan tool outputs before streaming. That planning step adds significant latency. By setting tool_choice, you remove the need for that decision step.
🔍 Best Practices If You're Using MCP + json_schema
✅ 1. Always Define tool_choice
If you know the tool:
```javascript
tool_choice: {
  "type": "function",
  "function": { "name": "updateBotSettings" }
}
```
If you want auto:
```javascript
tool_choice: "auto"
```
Even "auto" is better than leaving it undefined: it gives the model a clear instruction to consider tools without re-evaluating from scratch.
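Putting the pieces together, a single request can pin both the output contract and the tool decision up front. A minimal sketch; the model name, tool parameters, and result schema are assumptions for illustration.

```javascript
// One streaming request that sets tool_choice AND response_format,
// so the model skips the "should I call a tool?" reasoning phase.
const request = {
  model: "gpt-4o", // assumption
  stream: true,
  messages: [{ role: "user", content: "Turn off the welcome message." }],
  tools: [
    {
      type: "function",
      function: {
        name: "updateBotSettings",
        description: "Update a bot setting",
        parameters: {
          type: "object",
          properties: { key: { type: "string" }, value: { type: "string" } },
          required: ["key", "value"],
        },
      },
    },
  ],
  // Pin the decision: no tool-selection reasoning needed.
  tool_choice: { type: "function", function: { name: "updateBotSettings" } },
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "settings_update_result", // hypothetical schema
      strict: true,
      schema: {
        type: "object",
        properties: { status: { type: "string" } },
        required: ["status"],
        additionalProperties: false,
      },
    },
  },
};
```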
✅ 2. Reduce Tool Count
If you expose many tools (e.g. 20+) with complex schemas and nested properties, the model's reasoning time increases. Keep only the necessary tools per request when possible.
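A simple way to do this is to filter the catalog before each request. A sketch under the assumption that you know (e.g. from routing or intent detection) which tools a given request can need; the tool names are hypothetical.

```javascript
// Keeps only the tools relevant to this request instead of forwarding
// the full MCP catalog to the model.
function selectTools(allTools, allowedNames) {
  const allowed = new Set(allowedNames);
  return allTools.filter((t) => allowed.has(t.function.name));
}

// Usage: a 3-tool catalog narrowed down to the one tool this request needs.
const catalog = [
  { type: "function", function: { name: "updateBotSettings" } },
  { type: "function", function: { name: "deleteBot" } },
  { type: "function", function: { name: "getBotStatus" } },
];
const tools = selectTools(catalog, ["updateBotSettings"]);
```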
✅ 3. Avoid Huge JSON Schemas
Large json_schema definitions increase validation time, retry loops, and token usage. Flatten or simplify where you can.
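One cheap guardrail is to measure nesting depth before shipping a schema. Depth is only a rough proxy (property count and `anyOf`/`oneOf` use matter too), and the example schemas are made up, but it catches the worst offenders.

```javascript
// Returns the maximum nesting depth of a JSON value.
function schemaDepth(node) {
  if (node === null || typeof node !== "object") return 0;
  const children = Array.isArray(node) ? node : Object.values(node);
  let max = 0;
  for (const child of children) max = Math.max(max, schemaDepth(child));
  return 1 + max;
}

// Hypothetical schemas for comparison: a flat one and a nested one.
const flatSchema = { type: "object", properties: { a: { type: "string" } } };
const nestedSchema = {
  type: "object",
  properties: {
    a: { type: "object", properties: { b: { type: "string" } } },
  },
};
```

If the depth keeps climbing, that is a signal to flatten: hoist nested objects into top-level properties where the structure allows it.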
✅ 4. Measure Streaming Gaps
Don't just measure total time. Measure first-token time, time before a tool call, and time between chunks: that's where hidden latency shows up.
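Per-chunk gaps can be captured by wrapping the stream. A sketch that works with any async iterable (a streamed SDK response is one); the fake stream below just simulates a delayed middle chunk for demonstration.

```javascript
// Records the gap (ms) before each chunk of an async-iterable stream.
async function measureGaps(stream) {
  const gaps = [];
  let last = Date.now();
  for await (const _chunk of stream) {
    const now = Date.now();
    gaps.push(now - last); // a large value here = hidden decision latency
    last = now;
  }
  return gaps;
}

// Demo: a fake stream of three chunks where the second one is delayed,
// standing in for the mid-stream pause described above.
async function* fakeStream() {
  yield "a";
  await new Promise((r) => setTimeout(r, 50));
  yield "b";
  yield "c";
}
```

In the demo, the delay before "b" shows up as one large entry in the gaps array, which is exactly the pattern the 30-second pause produced in production.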
💡 Key Insight
The delay was not the network, not MCP server speed, and not streaming itself. It was model decision latency. The fix was giving the model clearer instructions: the more freedom you give the model (e.g. no tool_choice), the slower it may think before producing the next token.