Understanding OpenAI json_schema, MCP, and the 30-Second Streaming Pause

Why missing tool_choice when using OpenAI structured outputs with MCP causes a long streaming pause, and how to fix it.

Recently, while implementing OpenAI structured outputs using json_schema with MCP (Model Context Protocol), I ran into a serious performance issue.

The model would:

  • Start streaming normally
  • Then call mcp.list
  • And suddenly… pause for ~30 seconds
  • Only then would the next chunk appear

After investigating, the root cause turned out to be a missing tool_choice parameter. When I added tool_choice, the pause dropped to 8–10 seconds.

Let's break down why this happens.

πŸ”Ž What is json_schema in OpenAI?

OpenAI now allows enforcing structured outputs using:

response_format: {
  type: "json_schema",
  json_schema: { ... }
}

This guarantees:

  • Strict JSON
  • No malformed outputs
  • Predictable structure
  • Production-ready parsing

This is much more reliable than "please return JSON" prompting.
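As a concrete sketch, a strict response_format payload looks roughly like this (the schema name and fields here are illustrative placeholders, not from a real integration):

```typescript
// Hedged sketch: the response_format payload for strict structured outputs.
// "bot_settings" and its fields are illustrative placeholders.
const responseFormat = {
  type: "json_schema" as const,
  json_schema: {
    name: "bot_settings",
    strict: true, // ask the model to match the schema exactly
    schema: {
      type: "object",
      properties: {
        status: { type: "string" },
        updated: { type: "boolean" },
      },
      required: ["status", "updated"],
      additionalProperties: false, // strict mode requires this to be false
    },
  },
};
```

Note that strict mode forces you to be explicit: every property must be listed in required, and additionalProperties must be false.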

πŸ”Œ What is MCP?

MCP (Model Context Protocol) allows models to dynamically:

  • List tools (mcp.list)
  • Call tools
  • Fetch tool schemas
  • Interact with external systems

If you're using something like https://mcp.botsify.com/mcp/list_actions, the model dynamically evaluates available tools before deciding what to call.

🚨 The Real Problem: 30-Second Pause

Here's what was happening:

  • Model begins streaming.
  • It internally triggers mcp.list.
  • It evaluates all tools.
  • It reasons at length about whether (and which) tool to call.
  • Streaming pauses for ~30 seconds.

Why? Because I did not define:

tool_choice: "auto"

Or:

tool_choice: { "type": "function", "function": { "name": "myTool" } }

🧠 Why Missing tool_choice Causes Delay

When tool_choice is NOT provided, the model must:

  • Evaluate all available tools
  • Decide whether to call one
  • Compare with schema requirements
  • Validate output format
  • Possibly retry internally

This internal reasoning phase is expensive β€” especially when:

  • Using json_schema
  • Using MCP dynamic tools
  • Streaming responses
  • Large tool lists

The model enters a "decision paralysis" loop: it has to satisfy both the structured output contract and the possibility of calling tools, so it spends a long time reasoning before emitting the next token.

⚑ Why Adding tool_choice Reduced It to 8–10s

When I added tool_choice, the model:

  • No longer had to evaluate whether to use a tool
  • Skipped tool selection reasoning
  • Directly executed the intended flow
  • Reduced internal retries

The pause dropped from 30s β†’ 8–10s.

There's still some delay because MCP still loads, tool schema validation still occurs, and structured output validation still runs β€” but the heavy decision-making phase is reduced.

πŸ“Š What's Happening Internally (Advanced)

When using json_schema, MCP tools, streaming, and no tool_choice, the model must satisfy two constraints at once:

  • It must follow the strict JSON schema for the final response.
  • It must decide whether a tool call is needed (and which one).

If tool output also needs to match the schema, the model may internally "simulate" or plan tool outputs before streaming. That planning step adds significant latency. By setting tool_choice, you remove the need for that decision step.

πŸ›  Best Practices If You're Using MCP + json_schema

βœ… 1. Always Define tool_choice

If you know the tool:

tool_choice: {
  "type": "function",
  "function": { "name": "updateBotSettings" }
}

If you want auto:

tool_choice: "auto"

Even "auto" is better than undefined β€” it gives the model a clear instruction to consider tools without re-evaluating from scratch.
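Putting the pieces together, a streaming request that pins the tool choice might look like this sketch (updateBotSettings is the tool named above; the model name and the tool's parameter schema are my own illustrative assumptions):

```typescript
// Sketch of a streaming request body that forces a specific tool call.
// The model name and the tool's parameter schema are illustrative assumptions.
const requestBody = {
  model: "gpt-4o",
  stream: true,
  messages: [{ role: "user" as const, content: "Enable the welcome message" }],
  tools: [
    {
      type: "function" as const,
      function: {
        name: "updateBotSettings",
        description: "Update a bot's settings",
        parameters: {
          type: "object",
          properties: { enabled: { type: "boolean" } },
          required: ["enabled"],
        },
      },
    },
  ],
  // Pinning tool_choice skips the model's tool-selection reasoning phase.
  tool_choice: {
    type: "function" as const,
    function: { name: "updateBotSettings" },
  },
};
```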

βœ… 2. Reduce Tool Count

If you expose many tools (e.g. 20+) with complex schemas and nested properties, the model's reasoning time increases. Keep only the necessary tools per request when possible.
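One simple way to keep the per-request tool list small is an allow-list filter. This is a sketch; ToolDef just mirrors the Chat Completions tool shape, and the tool names are invented examples:

```typescript
// Sketch: expose only the tools a given request actually needs.
interface ToolDef {
  type: "function";
  function: { name: string; description?: string; parameters?: object };
}

function selectTools(all: ToolDef[], allowed: string[]): ToolDef[] {
  const allow = new Set(allowed);
  return all.filter((t) => allow.has(t.function.name));
}

// Example: out of a larger registry, send only one tool with this request.
const registry: ToolDef[] = [
  { type: "function", function: { name: "updateBotSettings" } },
  { type: "function", function: { name: "deleteBot" } },
  { type: "function", function: { name: "listConversations" } },
];
const toolsForRequest = selectTools(registry, ["updateBotSettings"]);
```

Filtering by request intent keeps the model's tool-evaluation step cheap even when your full registry is large.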

βœ… 3. Avoid Huge JSON Schemas

Large json_schema definitions increase validation time, retry loops, and token usage. Flatten or simplify where you can.
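For illustration (the field names are invented), the same information can often be expressed one level shallower:

```typescript
// Illustrative only: a nested schema vs a flattened equivalent.
const nestedSchema = {
  type: "object",
  properties: {
    settings: {
      type: "object",
      properties: {
        theme: { type: "string" },
        language: { type: "string" },
      },
      required: ["theme", "language"],
      additionalProperties: false,
    },
  },
  required: ["settings"],
  additionalProperties: false,
};

// Same information, one level shallower and cheaper to validate.
const flatSchema = {
  type: "object",
  properties: {
    settings_theme: { type: "string" },
    settings_language: { type: "string" },
  },
  required: ["settings_theme", "settings_language"],
  additionalProperties: false,
};
```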

βœ… 4. Measure Streaming Gaps

Don't just measure total time. Measure first-token time, time before a tool call, and time between chunks β€” that's where hidden latency shows up.
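A minimal way to surface those gaps is to timestamp every chunk and compute the deltas. In this sketch, fakeStream stands in for the async iterator a streaming request returns; swap in the real stream to measure your own pauses:

```typescript
// Sketch: timestamp each streamed chunk, then report first-token latency
// and the largest gap between consecutive chunks.

function gapStats(
  startedAt: number,
  stamps: number[]
): { firstTokenMs: number; maxGapMs: number } {
  const firstTokenMs = stamps.length ? stamps[0] - startedAt : 0;
  let maxGapMs = 0;
  for (let i = 1; i < stamps.length; i++) {
    maxGapMs = Math.max(maxGapMs, stamps[i] - stamps[i - 1]);
  }
  return { firstTokenMs, maxGapMs };
}

// fakeStream stands in for the async iterable returned by a streaming request.
async function* fakeStream(): AsyncGenerator<string> {
  yield "chunk-1";
  yield "chunk-2";
  yield "chunk-3";
}

async function measure(stream: AsyncIterable<string>) {
  const startedAt = Date.now();
  const stamps: number[] = [];
  for await (const _chunk of stream) stamps.push(Date.now());
  return gapStats(startedAt, stamps);
}
```

A 30-second pause shows up as a single huge maxGapMs even when total response time looks tolerable.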

πŸ’‘ Key Insight

The delay was not the network, not MCP server speed, and not streaming itself. It was model decision latency. The fix was giving the model clearer instructions: the more freedom you give the model (e.g. no tool_choice), the slower it may think before producing the next token.