This article focuses on the inherent instability of JSON output from large language models and presents a three-layer governance strategy: pre-generation guidance, in-generation hard constraints, and post-generation repair. It addresses redundant narration, missing fields, syntax errors, and parsing failures. Keywords: LLM, structured output, JSON repair.
Technical Specifications at a Glance
| Item | Details |
|---|---|
| Topic | Structured output governance for LLMs |
| Language | Python |
| API Protocol | HTTP / OpenAI-Compatible API |
| Output Target | Valid, parseable, field-complete JSON |
| Core Dependencies | openai, pydantic, json_repair, re |
| Applicable Models | General-purpose LLMs; prefer hard constraints on mainstream models that support them |
| Typical Scenarios | Moderation classification, information extraction, Agent tool argument generation |
JSON instability in LLM generation is an engineering reality
At their core, LLMs generate the next token probabilistically, while business systems require deterministic structure. This mismatch directly causes issues such as extra commentary around JSON, Markdown code fences wrapping the payload, missing brackets, and drifting field names.
Once downstream code executes json.loads(), even a minor deviation can trigger a parsing exception. In moderation, extraction, and workflow orchestration scenarios, JSON is not just a display format. It is a program contract.
AI Visual Insight: The image illustrates the most common signs of unstable JSON output from LLMs: natural-language text before or after the payload, Markdown code fences around the JSON body, and unclear structural boundaries. These outputs remain readable to humans but are not directly consumable by programs, making them a primary cause of parsing failures.
Common failures fall into four categories
The first category is redundant narration, such as “Here is the result.” The second is Markdown wrapping. The third is syntax errors. The fourth includes field-level issues such as missing fields, misspellings, or type drift. The first two are primarily text contamination, while the last two are structural distortion.
import json

raw = 'Okay, here is the result: {"status": true}'
try:
    data = json.loads(raw)  # Parsing fails because of the prefixed narration
except json.JSONDecodeError as e:
    print(f"Parsing failed: {e}")
This snippet shows a simple fact: if the output is not pure JSON, a standard parser will fail immediately.
Pre-generation guidance can significantly reduce format drift
Prompt optimization is a soft constraint. It cannot guarantee 100% compliance, but it is the lowest-cost solution that works across all models. The key is not merely telling the model to “output JSON,” but clearly defining fields, allowed values, boundary conditions, and prohibited behavior.
A high-quality prompt usually contains four parts: role definition, field rules, boundary constraints, and few-shot examples. Examples are especially effective at reducing field-name rewrites and list-type mistakes.
AI Visual Insight: This image highlights the role of prompt engineering in structured output. By making the role, constraints, and examples more explicit, you compress the model’s output space from open-ended generation toward a near-fixed template, which reduces JSON deviations caused by randomness.
A reusable prompt skeleton is critical
system_prompt = """
You are a strict violation classification assistant.
Output JSON only. Do not include any explanatory text.
The JSON fields must be exactly:
{
    "is_illegal": <boolean>,
    "illegal_type": <string>,
    "illegal_words": <list>
}
Before responding, verify the following:
1. Do not modify key names
2. Do not add fields
3. If there are no illegal terms, illegal_words must be []
"""
The purpose of this prompt is to narrow the model’s output freedom in advance and reduce the chance of unstructured content leaking into the response.
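As an illustration, a minimal sketch of wiring this skeleton into an OpenAI-compatible chat call follows; the model name and the few-shot exchange are assumptions added for demonstration, not part of the original skeleton.

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; swap in whatever your stack uses
    messages=[
        {"role": "system", "content": system_prompt},
        # A single few-shot exchange to anchor key names and the empty-list rule
        {"role": "user", "content": "Have a great day"},
        {"role": "assistant", "content": '{"is_illegal": false, "illegal_type": "", "illegal_words": []}'},
        {"role": "user", "content": "What an idiot"},
    ],
)
print(resp.choices[0].message.content)  # Still a soft constraint: validate downstream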
In-generation hard constraints are the core of stable production systems
When your business requires high availability, prompt engineering alone is not enough. A more reliable approach is to apply constraints during generation so the model can only follow token paths that satisfy the required rules.
These mechanisms mainly include JSON Mode, Structured Outputs, and Function Calling. The difference is straightforward: JSON Mode protects the format, while the latter two protect both format and fields.
JSON Mode is the right starting point for parseability
JSON Mode can force the model to return valid JSON text. It is ideal for quickly eliminating redundant narration and Markdown wrapping. However, it does not guarantee correct field names or complete fields.
from openai import OpenAI
import json

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return the moderation result in JSON"},
        {"role": "user", "content": "What an idiot"}
    ],
    response_format={"type": "json_object"}  # Force a JSON object response
)
data = json.loads(resp.choices[0].message.content)  # Parse pure JSON
This code guarantees, at minimal cost, that the returned result is at least parseable JSON.
Structured Outputs can lock down the field contract
If you care about field completeness, correct types, and forbidding extra properties, Structured Outputs are better suited for production. They use JSON Schema to define the output contract so the model result strictly matches the predefined structure.
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "illegal_judge_result",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "is_illegal": {"type": "boolean"},
                "illegal_type": {"type": "string"},
                "illegal_words": {"type": "array"}
            },
            "required": ["is_illegal", "illegal_type", "illegal_words"],
            "additionalProperties": False  # Forbid extra fields
        }
    }
}
This schema turns “acceptable output” into a machine-executable structural constraint.
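For reference, a minimal sketch of passing this contract to an OpenAI-compatible endpoint that supports Structured Outputs might look like the following, reusing the client from the JSON Mode example; the model name is an assumption.

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model with Structured Outputs support
    messages=[
        {"role": "system", "content": "Return the moderation result"},
        {"role": "user", "content": "What an idiot"}
    ],
    response_format=schema  # The JSON Schema contract defined above
)
data = json.loads(resp.choices[0].message.content)  # Fields and types match the schema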
Function Calling is especially effective for Agents and toolchains
If your system already invokes tools, Function Calling is often the more natural solution. The model is no longer “writing JSON.” Instead, it is “filling function arguments,” which usually provides better stability.
tools = [{
    "type": "function",
    "function": {
        "name": "process_illegal_judge",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "is_illegal": {"type": "boolean"},
                "illegal_type": {"type": "string"},
                "illegal_words": {"type": "array"}
            },
            "required": ["is_illegal", "illegal_type", "illegal_words"],
            "additionalProperties": False
        }
    }
}]
The core value of this configuration is that it turns the JSON result directly into tool arguments, reducing the risk of secondary parsing.
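A minimal sketch of driving this definition is shown below, again reusing the client from earlier; the model name and the forced tool_choice are illustrative assumptions.

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model with Function Calling support
    messages=[{"role": "user", "content": "What an idiot"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "process_illegal_judge"}}  # Force the tool call
)
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # Arguments already follow the parameter schema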
Post-processing is the final fuse that prevents production incidents
In reality, not all models support hard constraints, and API fluctuations or compatibility-layer differences can still distort output. For that reason, post-processing is not optional. It is the safety net of a production system.
A practical post-processing pipeline should include four steps: extract first, repair second, validate third, and retry last. This separation keeps each step focused and makes failures easier to diagnose.
Regex extraction removes noisy text first
import re

def extract_json_from_text(text: str):
    block = re.search(r'```json\s*(\{[\s\S]*?\})\s*```', text)
    if block:
        return block.group(1).strip()  # Prefer JSON extracted from a code block
    outer = re.search(r'(\{[\s\S]*\})', text)
    return outer.group(1).strip() if outer else text
This function cleans up the most common contamination pattern: extra narration before or after the payload plus Markdown code fences.
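For example, a contaminated response like the one below (an invented input, not a real model reply) is reduced to the bare JSON object:

noisy = 'Sure! Here you go:\n```json\n{"is_illegal": true}\n```'
print(extract_json_from_text(noisy))  # -> {"is_illegal": true}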
Syntax repair and field validation must run in layers
import json
import json_repair
from pydantic import BaseModel
from typing import Literal

class IllegalJudgeResult(BaseModel, extra="forbid"):
    is_illegal: bool
    illegal_type: Literal["porn", "violence", "abuse", "other", ""]
    illegal_words: list

def safe_parse(raw: str):
    fixed = json_repair.repair_json(raw)  # Repair minor syntax errors
    data = json.loads(fixed)
    return IllegalJudgeResult(**data).model_dump()  # Validate the field contract
This code repairs syntax first and validates fields second, which covers most non-fatal JSON exceptions.
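The last step of the pipeline is retrying. A minimal sketch, assuming the extract_json_from_text and safe_parse helpers above and a caller-supplied call_model function, could combine the four steps with exponential backoff as follows; the retry count and backoff base are illustrative defaults.

import json
import time
from pydantic import ValidationError

def parse_with_retry(call_model, max_retries: int = 3):
    # call_model is an assumed zero-argument callable returning the raw model text
    for attempt in range(max_retries):
        raw = call_model()
        try:
            candidate = extract_json_from_text(raw)  # Step 1: strip surrounding noise
            return safe_parse(candidate)             # Steps 2-3: repair syntax, validate fields
        except (json.JSONDecodeError, ValidationError):
            time.sleep(2 ** attempt)                 # Step 4: exponential backoff before retrying
    raise RuntimeError("No valid structured output after retries")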
The best end-to-end practice is layered composition, not single-point dependence
For low-risk scenarios, use “prompt engineering + regex extraction.” For medium-risk scenarios, use “JSON Mode + validation.” For high-risk production scenarios, prioritize “Structured Outputs or Function Calling + Pydantic + retry mechanisms.”
At a deeper level, soft constraints reduce the probability of failure, hard constraints guarantee structural correctness, and post-processing absorbs edge-case anomalies. You need all three layers together to move JSON from “sometimes usable” to “stable and reliable.”
FAQ on structured output
Q1: Can prompt engineering alone completely eliminate LLM JSON errors?
No. Prompting can only reduce the probability of errors. It cannot enforce compliance at the generation level. For production systems, you should at least add validation and fallback repair.
Q2: How should I choose between JSON Mode and Structured Outputs?
If your only goal is parseable JSON, use JSON Mode. If you also require complete fields, accurate types, and no extra properties, choose Structured Outputs first.
Q3: What if the model does not support structured output?
Use a compatibility stack based on “strong prompting + regex extraction + json_repair + Pydantic validation + exponential backoff retries.” In practice, this is sufficient for most general business scenarios.
Core takeaway: This article reconstructs a complete JSON output governance strategy for LLMs, covering prompt guidance, JSON Mode, Structured Outputs, Function Calling, regex extraction, json_repair, Pydantic validation, and retry mechanisms. It helps developers systematically solve formatting errors, missing fields, and parsing failures.