How to Stabilize LLM JSON Output: Prompt Engineering, Structured Outputs, and Automatic Repair

This article focuses on the inherent instability of JSON output from large language models and presents a three-layer governance strategy: pre-generation guidance, in-generation hard constraints, and post-generation repair. It addresses redundant narration, missing fields, syntax errors, and parsing failures. Keywords: LLM, structured output, JSON repair.

Technical Specifications at a Glance

Topic: Structured output governance for LLMs
Language: Python
API protocol: HTTP / OpenAI-compatible API
Output target: Valid, parseable, field-complete JSON
Core dependencies: openai, pydantic, json_repair, re
Applicable models: General-purpose LLMs, with hard constraints preferred on mainstream models
Typical scenarios: Moderation classification, information extraction, Agent tool argument generation

JSON instability in LLM generation is an engineering reality

At their core, LLMs generate the next token probabilistically, while business systems require deterministic structure. This mismatch directly causes issues such as extra commentary around JSON, Markdown code fences wrapping the payload, missing brackets, and drifting field names.

Once downstream code executes json.loads(), even a minor deviation can trigger a parsing exception. In moderation, extraction, and workflow orchestration scenarios, JSON is not just a display format. It is a program contract.

[Figure: the most common signs of unstable JSON output from LLMs: natural-language text before or after the payload, Markdown code fences around the JSON body, and unclear structural boundaries. Such outputs remain readable to humans but are not directly consumable by programs, making them a primary cause of parsing failures.]

Common failures fall into four categories

The first category is redundant narration, such as “Here is the result.” The second is Markdown wrapping. The third is syntax errors. The fourth includes field-level issues such as missing fields, misspellings, or type drift. The first two are primarily text contamination, while the last two are structural distortion.

import json

raw = 'Okay, here is the result: {"status": true}'

try:
    data = json.loads(raw)  # Parsing fails because of the prefixed narration
except json.JSONDecodeError as e:
    print(f"Parsing failed: {e}")

This snippet shows a simple fact: if the output is not pure JSON, a standard parser will fail immediately.

Pre-generation guidance can significantly reduce format drift

Prompt optimization is a soft constraint. It cannot guarantee 100% compliance, but it is the lowest-cost solution that works across all models. The key is not merely telling the model to “output JSON,” but clearly defining fields, allowed values, boundary conditions, and prohibited behavior.

A high-quality prompt usually contains four parts: role definition, field rules, boundary constraints, and few-shot examples. Examples are especially effective at reducing field-name rewrites and list-type mistakes.

[Figure: the role of prompt engineering in structured output. Making the role, constraints, and examples explicit compresses the model's output space from open-ended generation toward a near-fixed template, reducing JSON deviations caused by randomness.]

A reusable prompt skeleton is critical

system_prompt = """
You are a strict violation classification assistant.
Output JSON only. Do not include any explanatory text.
The JSON fields must be exactly:
{
  "is_illegal": <boolean>,
  "illegal_type": <string>,
  "illegal_words": <list>
}
Before responding, verify the following:
1. Do not modify key names
2. Do not add fields
3. If there are no illegal terms, illegal_words must be []
"""

The purpose of this prompt is to narrow the model’s output freedom in advance and reduce the chance of unstructured content leaking into the response.
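The few-shot part mentioned earlier can be appended as example turns after the system prompt. A minimal sketch, assuming the moderation contract above; the example texts and labels are illustrative, not from a real dataset:

```python
import json

# Hypothetical few-shot turns. Each assistant turn is a pure-JSON string
# that matches the field contract exactly, with no surrounding narration.
few_shot = [
    {"role": "user", "content": "You are such an idiot"},
    {"role": "assistant", "content": json.dumps(
        {"is_illegal": True, "illegal_type": "辱骂", "illegal_words": ["idiot"]},
        ensure_ascii=False)},
    {"role": "user", "content": "Have a nice day"},
    {"role": "assistant", "content": json.dumps(
        {"is_illegal": False, "illegal_type": "", "illegal_words": []},
        ensure_ascii=False)},
]

def build_messages(system_prompt: str, user_text: str) -> list:
    """Assemble system prompt + few-shot examples + the real query."""
    return [{"role": "system", "content": system_prompt}, *few_shot,
            {"role": "user", "content": user_text}]
```

Keeping the assistant turns as pure JSON strings shows the model the exact target shape, which is what reduces field-name rewrites and list-type mistakes.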

In-generation hard constraints are the core of stable production systems

When your business requires high availability, prompt engineering alone is not enough. A more reliable approach is to apply constraints during generation so the model can only follow token paths that satisfy the required rules.

These mechanisms mainly include JSON Mode, Structured Outputs, and Function Calling. The difference is straightforward: JSON Mode protects the format, while the latter two protect both format and fields.

JSON Mode is the right starting point for parseability

JSON Mode can force the model to return valid JSON text. It is ideal for quickly eliminating redundant narration and Markdown wrapping. However, it does not guarantee correct field names or complete fields.

from openai import OpenAI
import json

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Return the moderation result in JSON"},
        {"role": "user", "content": "What an idiot"}
    ],
    response_format={"type": "json_object"}  # Force a JSON object response
)

data = json.loads(resp.choices[0].message.content)  # Parse pure JSON

This code guarantees, at minimal cost, that the returned result is at least parseable JSON.

Structured Outputs can lock down the field contract

If you care about field completeness, correct types, and forbidding extra properties, Structured Outputs are better suited for production. They use JSON Schema to define the output contract so the model result strictly matches the predefined structure.

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "illegal_judge_result",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "is_illegal": {"type": "boolean"},
                "illegal_type": {"type": "string"},
                "illegal_words": {"type": "array"}
            },
            "required": ["is_illegal", "illegal_type", "illegal_words"],
            "additionalProperties": False  # Forbid extra fields
        }
    }
}

This schema turns “acceptable output” into a machine-executable structural constraint.
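To wire it in, the dict is passed as response_format=schema in the same create() call shown earlier. As a local illustration of what strict mode enforces server-side, here is a simplified stand-in validator (not the OpenAI implementation) checking a candidate payload against the same contract:

```python
# The schema from the article, repeated so this snippet runs standalone
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "illegal_judge_result",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "is_illegal": {"type": "boolean"},
                "illegal_type": {"type": "string"},
                "illegal_words": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["is_illegal", "illegal_type", "illegal_words"],
            "additionalProperties": False,
        },
    },
}

TYPE_MAP = {"boolean": bool, "string": str, "array": list}

def check_against_schema(data: dict, schema: dict) -> list:
    """Return violations for a minimal object schema:
    required keys, primitive types, additionalProperties."""
    node = schema["json_schema"]["schema"]
    errors = []
    for key in node["required"]:
        if key not in data:
            errors.append(f"missing field: {key}")
    if node.get("additionalProperties") is False:
        for key in data:
            if key not in node["properties"]:
                errors.append(f"extra field: {key}")
    for key, spec in node["properties"].items():
        if key in data and not isinstance(data[key], TYPE_MAP[spec["type"]]):
            errors.append(f"wrong type for {key}")
    return errors
```

With strict Structured Outputs, these checks happen during generation, so a conforming payload is the only thing the model can emit.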

Function Calling is especially effective for Agents and toolchains

If your system already invokes tools, Function Calling is often the more natural solution. The model is no longer “writing JSON.” Instead, it is “filling function arguments,” which usually provides better stability.

tools = [{
    "type": "function",
    "function": {
        "name": "process_illegal_judge",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "is_illegal": {"type": "boolean"},
                "illegal_type": {"type": "string"},
                "illegal_words": {"type": "array"}
            },
            "required": ["is_illegal", "illegal_type", "illegal_words"],
            "additionalProperties": False
        }
    }
}]

The core value of this configuration is that it turns the JSON result directly into tool arguments, reducing the risk of secondary parsing.
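When the response comes back, the arguments arrive as a JSON string attached to the tool call rather than as free text. A sketch of that extraction step; the plain dict below is a stand-in for the SDK object, which exposes the same data via resp.choices[0].message.tool_calls[0].function.arguments:

```python
import json

def extract_tool_arguments(tool_call: dict) -> dict:
    """Parse the JSON argument string attached to a function tool call."""
    fn = tool_call["function"]
    if fn["name"] != "process_illegal_judge":
        raise ValueError(f"unexpected tool: {fn['name']}")
    return json.loads(fn["arguments"])

# Simulated tool call (the real SDK returns an object with attribute access)
fake_call = {
    "function": {
        "name": "process_illegal_judge",
        "arguments": '{"is_illegal": true, "illegal_type": "辱骂", "illegal_words": ["idiot"]}',
    }
}
```

Even with strict mode, the arguments field is still a string, so one json.loads remains; what Function Calling removes is the risk of narration or fences around it.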

Post-processing is the final fuse that prevents production incidents

In reality, not all models support hard constraints, and API fluctuations or compatibility-layer differences can still distort output. For that reason, post-processing is not optional. It is the safety net of a production system.

A practical post-processing pipeline should include four steps: extract first, repair second, validate third, and retry last. This separation keeps each step focused and makes failures easier to diagnose.

Regex extraction removes noisy text first

import re

def extract_json_from_text(text: str):
    block = re.search(r'```json\s*(\{[\s\S]*?\})\s*```', text)
    if block:
        return block.group(1).strip()  # Prefer JSON extracted from a code block
    outer = re.search(r'(\{[\s\S]*\})', text)
    return outer.group(1).strip() if outer else text

This function cleans up the most common contamination pattern: extra narration before or after the payload plus Markdown code fences.
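To see the cleanup in action, here is a quick demo on the two contamination patterns, with the function repeated so the snippet runs standalone:

```python
import re

def extract_json_from_text(text: str):
    block = re.search(r'```json\s*(\{[\s\S]*?\})\s*```', text)
    if block:
        return block.group(1).strip()  # Prefer JSON extracted from a code block
    outer = re.search(r'(\{[\s\S]*\})', text)
    return outer.group(1).strip() if outer else text

fence = "`" * 3  # built dynamically only to keep this snippet printable
fenced = f'Sure! {fence}json\n{{"status": true}}\n{fence} Hope that helps.'
bare = 'Here is the result: {"status": true} Let me know!'

print(extract_json_from_text(fenced))  # {"status": true}
print(extract_json_from_text(bare))    # {"status": true}
```

Note that the second pattern relies on a greedy match from the first opening brace to the last closing brace, which is deliberate: it tolerates nested objects at the cost of assuming a single JSON payload per response.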

Syntax repair and field validation must run in layers

import json
import json_repair
from pydantic import BaseModel
from typing import Literal

class IllegalJudgeResult(BaseModel, extra="forbid"):
    is_illegal: bool
    # Allowed categories: 色情 (pornography), 暴力 (violence), 辱骂 (abuse), 其他 (other)
    illegal_type: Literal["色情", "暴力", "辱骂", "其他", ""]
    illegal_words: list

def safe_parse(raw: str):
    fixed = json_repair.repair_json(raw)  # Repair minor syntax errors
    data = json.loads(fixed)
    return IllegalJudgeResult(**data).model_dump()  # Validate the field contract

This code repairs syntax first and validates fields second, which covers most non-fatal JSON exceptions.
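Putting extraction, repair, and field validation together, here is a sketch of the layered pipeline. json_repair is treated as optional: if it is not installed, a deliberately minimal fallback only strips trailing commas, which is a stand-in rather than a real repair engine:

```python
import json
import re

try:
    from json_repair import repair_json  # preferred: real syntax repair
except ImportError:
    def repair_json(raw: str) -> str:
        # Minimal fallback: drop trailing commas before } or ]
        return re.sub(r',\s*([}\]])', r'\1', raw)

REQUIRED = {"is_illegal", "illegal_type", "illegal_words"}

def safe_parse_pipeline(raw: str) -> dict:
    """Extract -> repair -> parse -> validate required fields."""
    match = re.search(r'(\{[\s\S]*\})', raw)       # step 1: strip narration
    candidate = match.group(1) if match else raw
    data = json.loads(repair_json(candidate))       # steps 2-3: repair, parse
    missing = REQUIRED - data.keys()                # step 4: field contract
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```

Each layer only handles one failure class, so when the pipeline raises, the exception tells you whether the problem was contamination, syntax, or a missing field.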

The best end-to-end practice is layered composition, not single-point dependence

For low-risk scenarios, use “prompt engineering + regex extraction.” For medium-risk scenarios, use “JSON Mode + validation.” For high-risk production scenarios, prioritize “Structured Outputs or Function Calling + Pydantic + retry mechanisms.”

At a deeper level, soft constraints reduce the probability of failure, hard constraints guarantee structural correctness, and post-processing absorbs edge-case anomalies. You need all three layers together to move JSON from “sometimes usable” to “stable and reliable.”

FAQ on structured output

Q1: Can prompt engineering alone completely eliminate LLM JSON errors?

No. Prompting can only reduce the probability of errors. It cannot enforce compliance at the generation level. For production systems, you should at least add validation and fallback repair.

Q2: How should I choose between JSON Mode and Structured Outputs?

If your only goal is parseable JSON, use JSON Mode. If you also require complete fields, accurate types, and no extra properties, choose Structured Outputs first.

Q3: What if the model does not support structured output?

Use a compatibility stack based on “strong prompting + regex extraction + json_repair + Pydantic validation + exponential backoff retries.” In practice, this is sufficient for most general business scenarios.
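The exponential-backoff retry in that stack can be sketched as a thin wrapper. Here call and validate are placeholders for your model call and parser, and the delay is kept short purely for illustration:

```python
import time

def with_retries(call, validate, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a model call with exponential backoff until validation passes."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return validate(call())
        except Exception as exc:  # parse/validation failure: back off, retry
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

On each failure the wait doubles (base_delay, then 2x, then 4x), which spreads retries out under transient API errors instead of hammering the endpoint.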

Core takeaway: This article reconstructs a complete JSON output governance strategy for LLMs, covering prompt guidance, JSON Mode, Structured Outputs, Function Calling, regex extraction, json_repair, Pydantic validation, and retry mechanisms. It helps developers systematically solve formatting errors, missing fields, and parsing failures.