Learn how structured generation in LLMs lets you extract clean, typed JSON directly from AI models, eliminating the need for fragile regex parsers and post-processing hacks.

You have a working LLM integration. The model is smart, the responses look good in testing, and then on Tuesday morning a ticket lands: the parser broke in production. The model added a polite prefix like "Sure! Here's the JSON you asked for:" and your regex fell apart.
You patch it. A week later, the model changes its phrasing slightly and the whole thing breaks again. You add more regex. Your code turns into a graveyard of edge case handlers that nobody wants to touch.
This is the exact problem that structured generation solves. Instead of begging the model to "return only JSON, no extra text" in your prompt and hoping for the best, structured generation forces the model to output data that matches a schema you define. No parser. No regex. Just clean, typed data every single time.
Structured generation (also called constrained decoding) is a technique where the model's output tokens are restricted during generation to only produce valid output according to a schema.
It works at the inference level. The model is not just instructed to output JSON. It is mathematically constrained so that it cannot produce anything else. Think of it as putting guardrails on the token sampling process itself.
The result: you get a Python dict, a typed object, or a validated JSON payload back from the API. No string parsing needed.
| Approach | How It Works | Reliability |
|---|---|---|
| Prompt engineering | Ask nicely in the system prompt | Model can ignore it |
| Regex parsing | Extract patterns from raw text | Breaks with any format change |
| JSON mode (basic) | Ask model to output JSON | No schema enforcement |
| Structured generation | Schema-constrained token sampling | Reliable, typed output every time |
The old way is fragile by design. The new way bakes reliability into the generation process itself.
At inference time, the model generates one token at a time. Normally, any token in the vocabulary can come next. With structured generation, a grammar or schema is used to compute a mask over the vocabulary at each step, blocking any token that would make the output invalid.
For example, if your schema says the output must be a JSON object with a name field (string) and a score field (integer), the model cannot produce a boolean, an array, or a random sentence. The decoding engine enforces this at every single token.
Libraries like Outlines, Guidance, and LM Format Enforcer implement this approach. Cloud providers like OpenAI and Anthropic expose it through simpler API-level controls.
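The masking step described above can be illustrated with a toy decoder. This is a minimal sketch under loud assumptions: the vocabulary, the logits, and the hand-written per-step "grammar" are all made up for illustration, and real libraries compile a schema into a state machine over actual tokenizer vocabularies rather than using a fixed list of steps.

```python
import json
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure!", "true", "{", "}", ":", ",",
         '"name"', '"score"', '"Ada"', '"Bob"', "7", "42"]

# Stand-in for model logits (made-up numbers). Without constraints,
# the chatty token "Sure!" has the highest score.
LOGITS = [9.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 0.5, 1.5]

STRING_TOKENS = {'"Ada"', '"Bob"'}
INTEGER_TOKENS = {"7", "42"}

# Hand-written "grammar" for {"name": <string>, "score": <integer>}:
# the set of tokens that keep the output valid at each position.
ALLOWED_BY_STEP = [
    {"{"}, {'"name"'}, {":"}, STRING_TOKENS, {","},
    {'"score"'}, {":"}, INTEGER_TOKENS, {"}"},
]


def pick_token(logits, allowed=None):
    """Greedy pick; tokens outside `allowed` are masked to -inf."""
    masked = [
        logit if allowed is None or token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]
    return VOCAB[masked.index(max(masked))]


unconstrained = pick_token(LOGITS)
constrained = "".join(pick_token(LOGITS, allowed) for allowed in ALLOWED_BY_STEP)

print(unconstrained)  # Sure!
print(constrained)    # {"name":"Bob","score":42}
```

The model's preferences still decide between valid options (here "Bob" over "Ada"), but the mask removes every token that would break the schema.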
OpenAI supports structured outputs via the response_format parameter using JSON Schema.
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ],
    response_format=MovieReview,
)

review = completion.choices[0].message.parsed
print(review.title)   # Inception
print(review.rating)  # 9
```

No regex. No string parsing. review is already a typed MovieReview object.
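To see what "using JSON Schema" means for a model like MovieReview, it can help to print the schema Pydantic derives from the class. This sketch assumes Pydantic v2 (`model_json_schema`) and only inspects the schema locally; it makes no API call, and the exact schema a provider receives may be adjusted by the SDK.

```python
from pydantic import BaseModel


class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str


# Pydantic v2 derives a JSON Schema dict from the type annotations.
schema = MovieReview.model_json_schema()

print(schema["properties"]["rating"]["type"])  # integer
print(sorted(schema["required"]))              # ['rating', 'summary', 'title']
```

Each annotated field becomes a typed property, and fields without defaults are listed as required.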
Instructor is a Python library that wraps any LLM client and adds structured output support via Pydantic.
```bash
pip install instructor
```

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Extract: John Doe, 34, john@example.com"}
    ],
)

print(user.name)  # John Doe
print(user.age)   # 34
```

Instructor also supports automatic retries with validation feedback when the model gets it wrong.
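The retry idea can be sketched without any library: validate the output, and if it fails, send the errors back to the model and ask again. This is a rough illustration of the pattern, not Instructor's actual implementation. `fake_model`, `validate_user`, and `extract_with_retries` are hypothetical names, and the fake model stands in for a real API call.

```python
import json


def validate_user(data):
    """Return a list of problems; an empty list means the payload is fine."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    if not isinstance(data.get("name"), str):
        errors.append("name must be a string")
    if not isinstance(data.get("age"), int):
        errors.append("age must be an integer")
    return errors


def extract_with_retries(call_model, messages, max_retries=2):
    """Call the model, feeding validation errors back until the output passes."""
    messages = list(messages)
    for _ in range(max_retries + 1):
        raw = call_model(messages)
        try:
            data = json.loads(raw)
            errors = validate_user(data)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc}"]
        if not errors:
            return data
        # Feed the failed answer and the error list back to the model.
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "user",
                         "content": "Fix these errors: " + "; ".join(errors)})
    raise ValueError(f"no valid output after {max_retries + 1} attempts")


def fake_model(messages):
    """Stand-in for an LLM: wrong on the first try, correct after feedback."""
    if any("Fix these errors" in m["content"] for m in messages):
        return '{"name": "John Doe", "age": 34}'
    return '{"name": "John Doe", "age": "thirty-four"}'


user = extract_with_retries(
    fake_model, [{"role": "user", "content": "Extract: John Doe, 34"}]
)
print(user)  # {'name': 'John Doe', 'age': 34}
```

In Instructor itself, the number of attempts is typically set with a `max_retries` argument on the `create` call.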
Outlines is the go-to library for structured generation with local models (Mistral, LLaMA, etc.).
```bash
pip install outlines
```

```python
import outlines
from pydantic import BaseModel

# Note: Outlines' API has changed across releases; this follows the
# 0.x-style interface. Check the docs for the version you have installed.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

generator = outlines.generate.json(model, Product)
result = generator("Describe the product: Blue running shoes, $89.99, available")

print(result.name)      # Blue running shoes
print(result.price)     # 89.99
print(result.in_stock)  # True
```

Once the model weights are downloaded, this works offline. No API key needed.
Many developers confuse these two. They are not the same.
```python
# JSON Mode: tells model to output valid JSON, but no schema enforcement
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # Only guarantees valid JSON
    messages=[...]
)

# Structured Outputs: enforces a specific schema
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    response_format=MyPydanticModel,  # Guarantees schema compliance
    messages=[...]
)
```

JSON Mode guarantees valid JSON syntax. Structured Outputs guarantees that the JSON matches your exact schema.
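The gap between the two guarantees is easy to demonstrate offline. In this sketch, both example payloads and the hand-rolled `matches_schema` check are hypothetical stand-ins (a real pipeline would use Pydantic or a JSON Schema validator). The first payload is the kind of output JSON Mode permits: it parses, but the field names and types are not the ones the code expects.

```python
import json

# Hypothetical schema: field name -> required Python type.
MOVIE_REVIEW_FIELDS = {"title": str, "rating": int, "summary": str}


def matches_schema(payload: str) -> bool:
    """True if payload is a JSON object with exactly the expected typed fields."""
    data = json.loads(payload)  # raises ValueError on invalid JSON
    if not isinstance(data, dict) or set(data) != set(MOVIE_REVIEW_FIELDS):
        return False
    return all(isinstance(data[key], typ)
               for key, typ in MOVIE_REVIEW_FIELDS.items())


# Valid JSON, wrong shape: allowed by JSON Mode.
json_mode_output = '{"movie": "Inception", "stars": "9/10"}'
# Valid JSON, right shape: what schema enforcement is meant to guarantee.
structured_output = (
    '{"title": "Inception", "rating": 9, "summary": "A heist inside dreams."}'
)

print(matches_schema(json_mode_output))   # False
print(matches_schema(structured_output))  # True
```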
A clean way to organize an extraction or parsing pipeline using structured generation:
```text
my_extraction_app/
├── schemas/
│   ├── __init__.py
│   ├── invoice.py      # Pydantic models for invoice data
│   ├── contact.py      # Pydantic models for contact info
│   └── product.py      # Pydantic models for product listings
├── extractors/
│   ├── __init__.py
│   ├── base.py         # Base extractor class
│   └── document.py     # Document extraction logic
├── prompts/
│   └── system.txt      # System prompt templates
├── main.py
└── requirements.txt
```

Keeping schemas in their own folder makes them reusable across different extractors and easy to version.
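As one possible shape for extractors/base.py, the sketch below pairs a schema with a system prompt and delegates the actual LLM call to an injected function. Everything here is an assumption for illustration: `BaseExtractor`, the `complete` callable, and `fake_complete` are hypothetical names, and a dataclass stands in for a Pydantic model so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Type, TypeVar

T = TypeVar("T")


class BaseExtractor(Generic[T]):
    """Pairs one schema with one system prompt (hypothetical base class)."""

    def __init__(
        self,
        schema: Type[T],
        complete: Callable[[str, str, Type[T]], T],
        system_prompt: str = "",
    ) -> None:
        self.schema = schema
        self.complete = complete  # e.g. a thin wrapper around a structured LLM call
        self.system_prompt = system_prompt

    def extract(self, text: str) -> T:
        """Run the injected completion function against this extractor's schema."""
        return self.complete(self.system_prompt, text, self.schema)


@dataclass
class Contact:  # stand-in for a model from schemas/contact.py
    name: str
    email: Optional[str] = None


def fake_complete(system_prompt: str, text: str, schema):
    """Stand-in for a real structured-output call; parses 'name, email'."""
    name, _, email = text.partition(",")
    return schema(name=name.strip(), email=email.strip() or None)


extractor = BaseExtractor(Contact, fake_complete,
                          system_prompt="Extract contact details.")
contact = extractor.extract("John Doe, john@example.com")
print(contact)  # Contact(name='John Doe', email='john@example.com')
```

Injecting the completion function keeps schemas and extractors testable without network access, and lets the same extractor run against different backends.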
| Use Case | Recommended Tool |
|---|---|
| Using OpenAI or Azure OpenAI | Instructor or native Structured Outputs |
| Self-hosted or local models | Outlines or LM Format Enforcer |
| Simple JSON output, no strict schema | JSON Mode |
| Complex nested schemas with validation | Instructor + Pydantic validators |
| Production pipelines needing retries | Instructor (built-in retry logic) |
Required fields can force hallucination. If a field is required but the input does not contain that data, the model may invent a value. Use Optional[str] = None for fields that might not always be present.

```python
from typing import Optional
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: Optional[str] = None  # Safe: won't force hallucination
    phone: Optional[str] = None
```

Nested schemas increase latency. The more complex the schema, the longer constrained decoding takes. Flatten where possible.
Not all models support all schema features. Recursive schemas and certain union types may not be supported depending on the model and library. Test your schema before deploying.
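One lightweight way to test a schema before deploying is to run a few representative payloads through it offline, including one with the optional fields missing. This sketch assumes Pydantic v2 and made-up sample payloads. It only exercises Pydantic validation; whether a given provider or decoding library accepts the schema still has to be checked against that backend.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Contact(BaseModel):
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None


def accepts(model, payload: str) -> bool:
    """True if the JSON payload validates against the model."""
    try:
        model.model_validate_json(payload)
        return True
    except ValidationError:
        return False


# Hypothetical sample payloads paired with the expected verdict.
SAMPLES = [
    ('{"name": "Ada", "email": "ada@example.com"}', True),
    ('{"name": "Ada"}', True),                    # optional fields omitted
    ('{"email": "ada@example.com"}', False),      # required name missing
]

results = [accepts(Contact, payload) for payload, _ in SAMPLES]
print(results)  # [True, True, False]
```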
Here is a practical example that shows how structured generation replaces a messy regex-based invoice parser.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    vendor_name: str
    total_amount: float
    line_items: List[LineItem]

invoice_text = """
Invoice #INV-2024-0042
From: Acme Supplies Ltd
Items:
- 5x Widget A @ $12.00 each = $60.00
- 2x Widget B @ $35.00 each = $70.00
Total: $130.00
"""

result = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,
    messages=[
        {"role": "user", "content": f"Extract invoice data:\n{invoice_text}"}
    ],
)

print(result.invoice_number)             # INV-2024-0042
print(result.total_amount)               # 130.0
print(result.line_items[0].description)  # Widget A
```

Before structured generation, this would require at least 20 lines of regex and would still break on formatting variations.
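A schema guarantees the shape of the invoice, not that the numbers add up, so a cheap arithmetic check after extraction is worth considering. The sketch below uses plain dataclasses as stand-ins for the Pydantic models above so it runs without an API call. `consistency_errors` and the tolerance value are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float
    total: float


@dataclass
class Invoice:
    invoice_number: str
    vendor_name: str
    total_amount: float
    line_items: List[LineItem]


def consistency_errors(invoice: Invoice, tolerance: float = 0.01) -> List[str]:
    """Return arithmetic mismatches between line items and the invoice total."""
    errors = []
    for item in invoice.line_items:
        expected = item.quantity * item.unit_price
        if abs(expected - item.total) > tolerance:
            errors.append(f"{item.description}: expected {expected}, got {item.total}")
    items_sum = sum(item.total for item in invoice.line_items)
    if abs(items_sum - invoice.total_amount) > tolerance:
        errors.append(f"line items sum to {items_sum}, total is {invoice.total_amount}")
    return errors


items = [
    LineItem("Widget A", 5, 12.00, 60.00),
    LineItem("Widget B", 2, 35.00, 70.00),
]
good = Invoice("INV-2024-0042", "Acme Supplies Ltd", 130.00, items)
bad = Invoice("INV-2024-0042", "Acme Supplies Ltd", 140.00, items)

print(consistency_errors(good))  # []
print(consistency_errors(bad))   # ['line items sum to 130.0, total is 140.0']
```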
1. Does structured generation work with all LLM providers?
Not all providers support it natively. OpenAI has first-class support. For other providers, use Instructor, which works with any OpenAI-compatible API, or use Outlines for local models.
2. Is structured generation slower than regular generation?
Slightly, yes. Constrained decoding adds a small overhead at each token step. In practice, the difference is often negligible compared to network latency on cloud APIs, but it can be noticeable with very large schemas on local models.
3. Can I still use a system prompt alongside structured generation?
Yes, absolutely. A good system prompt still improves the quality of the extracted data. Structured generation just guarantees the format, not the accuracy of the content.
4. What happens if the model cannot find the required data in the input?
With required fields, the model will attempt to fill them, which can lead to hallucination. Always mark fields as Optional when the data might not be present in all inputs.
5. Is Instructor free to use?
Yes. Instructor is an open-source Python library. You still pay for the underlying API calls (e.g., to OpenAI), but Instructor itself is free.
6. Can structured generation handle nested or recursive schemas?
Nested schemas (like a list of objects inside an object) work well. Recursive schemas are less reliable and depend heavily on the library and model. Test carefully before using in production.
7. How is this different from function calling?
Function calling is a related concept where the model selects and calls a function with structured arguments. Structured outputs and function calling often use the same underlying mechanism, and many structured generation libraries are built on top of function calling APIs.
8. What is the best library for beginners?
Instructor is the most beginner-friendly. It integrates directly with Pydantic, which most Python developers already know, and it works with OpenAI's API without needing to run anything locally.
9. Can I validate the extracted data beyond just schema types?
Yes. Pydantic supports custom validators using @field_validator decorators. For example, you can enforce that an email field matches a specific pattern or that a price is always positive.
```python
from pydantic import BaseModel, field_validator

class Product(BaseModel):
    name: str
    price: float

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v):
        # Reject zero and negative prices at validation time
        if v <= 0:
            raise ValueError("Price must be positive")
        return v
```

10. Should I completely stop using regex?
Regex is still useful for simple, deterministic text patterns like extracting a phone number format or validating an email address. The point is to stop using regex as a replacement for proper output parsing from LLMs. Use the right tool for each job.
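The email case mentioned in question 9 shows regex in the role described above: checking a small, deterministic pattern inside a validator rather than parsing raw model output. This is a minimal sketch assuming Pydantic v2. The pattern is deliberately loose and hypothetical, not a complete email grammar.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator

# Loose illustrative pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


class Contact(BaseModel):
    name: str
    email: str

    @field_validator("email")
    @classmethod
    def email_must_look_valid(cls, v: str) -> str:
        if not EMAIL_RE.fullmatch(v):
            raise ValueError("email does not look like an address")
        return v


def is_valid(payload: dict) -> bool:
    """True if the payload passes both type and pattern validation."""
    try:
        Contact(**payload)
        return True
    except ValidationError:
        return False


print(is_valid({"name": "John Doe", "email": "john@example.com"}))  # True
print(is_valid({"name": "John Doe", "email": "not-an-email"}))      # False
```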
Structured Outputs in the API - https://platform.openai.com/docs/guides/structured-outputs
Outlines: Structured Text Generation - https://github.com/dottxt-ai/outlines
Instructor: Top Multi-Language Library for Structured LLM Outputs - https://python.useinstructor.com/
Willard, B. T., & Louf, R. (2023). Efficient Guided Generation for Large Language Models - https://arxiv.org/abs/2307.09702
Pydantic V2 Documentation: Validators - https://docs.pydantic.dev/latest/concepts/validators/