Structured Generation: Why AI Output Parsing No Longer Needs Regex

Learn how structured generation in LLMs lets you extract clean, typed JSON directly from AI models, eliminating the need for fragile regex parsers and post-processing hacks.

You have a working LLM integration. The model is smart, the responses look good in testing, and then on Tuesday morning a ticket lands: the parser broke in production. The model added a polite prefix like "Sure! Here's the JSON you asked for:" and your regex fell apart.

You patch it. A week later, the model changes its phrasing slightly and the whole thing breaks again. You add more regex. Your code turns into a graveyard of edge case handlers that nobody wants to touch.

This is the exact problem that structured generation solves. Instead of begging the model to "return only JSON, no extra text" in your prompt and hoping for the best, structured generation forces the model to output data that matches a schema you define. No parser. No regex. Just clean, typed data every single time.


What Is Structured Generation?

Structured generation (also called constrained decoding) is a technique where the model's output tokens are restricted during generation to only produce valid output according to a schema.

It works at the inference level. The model is not just instructed to output JSON. It is mathematically constrained so that it cannot produce anything else. Think of it as putting guardrails on the token sampling process itself.

The result: you get a Python dict, a typed object, or a validated JSON payload back from the API. No string parsing needed.


The Old Way vs. The New Way

| Approach | How It Works | Main Problem or Benefit |
|---|---|---|
| Prompt engineering | Ask nicely in the system prompt | Model can ignore it |
| Regex parsing | Extract patterns from raw text | Breaks with any format change |
| JSON mode (basic) | Ask model to output JSON | No schema enforcement |
| Structured generation | Schema-constrained token sampling | Reliable, typed output every time |

The old way is fragile by design. The new way bakes reliability into the generation process itself.
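To make the fragility concrete, here is a minimal sketch of the regex row in the table. The pattern and the `parse_reply` helper are made up for illustration; they assume the reply is a JSON object and nothing else, which is exactly the assumption a chatty model breaks.

```python
import json
import re

# Hypothetical parser that assumes the reply is a JSON object and nothing else.
STRICT_JSON = re.compile(r"^\s*(\{.*\})\s*$", re.DOTALL)

def parse_reply(text):
    """Return the parsed object, or None when the reply has any extra text."""
    match = STRICT_JSON.match(text)
    return json.loads(match.group(1)) if match else None

clean = '{"title": "Inception", "rating": 9}'
chatty = "Sure! Here's the JSON you asked for:\n" + clean
trailing = clean + "\nLet me know if you need anything else!"

print(parse_reply(clean))     # {'title': 'Inception', 'rating': 9}
print(parse_reply(chatty))    # None
print(parse_reply(trailing))  # None
```

You can loosen the pattern to survive these two cases, but every loosening is another guess about what the model will say next.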


How Structured Generation Actually Works

At inference time, the model generates one token at a time. Normally, any token in the vocabulary can come next. With structured generation, a grammar or schema is used to compute a mask over the vocabulary at each step, blocking any token that would make the output invalid.

For example, if your schema says the output must be a JSON object with a name field (string) and a score field (integer), the model cannot produce a boolean, an array, or a random sentence. The decoding engine enforces this at every single token.
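The masking step is easier to see in a toy version. The sketch below is not how any real library implements it: the twelve-token vocabulary, the fixed scores, and the hand-written grammar are all assumptions made for illustration. What it does show is the core idea: tokens the grammar does not allow get a score of negative infinity before the next token is picked.

```python
import json

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure!", "true", "{", "}", ":", ",", '"name"', '"score"', '"Ada"', '"Bob"', "42", "7"]

# Fixed stand-in scores. A real model recomputes its logits at every step.
LOGITS = {tok: 1.0 for tok in VOCAB}
LOGITS.update({"Sure!": 9.0, "true": 8.0, '"Ada"': 3.0, '"Bob"': 2.0, "42": 3.0, "7": 2.0})

# Hand-written grammar for {"name": <string>, "score": <integer>}:
# entry i is the set of tokens allowed at step i.
GRAMMAR = [
    {"{"}, {'"name"'}, {":"}, {'"Ada"', '"Bob"'}, {","},
    {'"score"'}, {":"}, {"42", "7"}, {"}"},
]

def pick(allowed):
    """Greedy pick after masking: disallowed tokens get a score of -inf."""
    masked = {tok: (score if tok in allowed else float("-inf")) for tok, score in LOGITS.items()}
    return max(masked, key=masked.get)

# With no mask, the chatty token wins the very first step.
unconstrained_first = pick(set(VOCAB))
print(unconstrained_first)  # Sure!

# With the mask, only schema-valid tokens can ever be chosen.
constrained = "".join(pick(allowed) for allowed in GRAMMAR)
print(constrained)              # {"name":"Ada","score":42}
print(json.loads(constrained))  # {'name': 'Ada', 'score': 42}
```

Notice that the model's preference still matters inside the mask: it chose "Ada" over "Bob" because that token scored higher. The grammar decides what is allowed, and the model decides which allowed option to take.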

Libraries like Outlines, Guidance, and LM Format Enforcer implement this approach. Cloud providers like OpenAI and Anthropic expose it through simpler API-level controls.


Practical Setup: Four Ways to Use Structured Generation

1. OpenAI Structured Outputs (API)

OpenAI supports structured outputs via the response_format parameter using JSON Schema.

python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Review the movie Inception"}
    ],
    response_format=MovieReview,
)

review = completion.choices[0].message.parsed
print(review.title)   # e.g. Inception
print(review.rating)  # e.g. 9 (the rating itself can vary between runs)

No regex. No string parsing. review is already a typed MovieReview object.


2. Instructor (Works with Any OpenAI-Compatible API)

Instructor is a Python library that wraps any LLM client and adds structured output support via Pydantic.

bash
pip install instructor

python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class UserInfo(BaseModel):
    name: str
    age: int
    email: str

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Extract: John Doe, 34, john@example.com"}
    ],
)

print(user.name)   # John Doe
print(user.age)    # 34

Instructor also supports automatic retries with validation feedback when the model gets it wrong.
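The retry idea is simple enough to sketch without any library. This is not Instructor's implementation: `fake_model`, `validate_user`, and `extract_with_retries` are hypothetical stand-ins, and the "model" is a stub that only gets the age right after it has seen an error message. The loop shows the pattern: validate, and if validation fails, send the error back as the next message.

```python
import json

def fake_model(messages):
    """Stand-in for an LLM call: returns a bad age until it sees error feedback."""
    saw_feedback = any("Validation error" in m["content"] for m in messages)
    age = 34 if saw_feedback else "thirty-four"
    return json.dumps({"name": "John Doe", "age": age})

def validate_user(data):
    """Minimal hand-rolled check standing in for a Pydantic model."""
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    if type(data.get("age")) is not int:
        raise ValueError("age must be an integer")
    return data

def extract_with_retries(prompt, max_retries=2):
    """Call the model, validate, and feed any validation error back to it."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        reply = fake_model(messages)
        try:
            return validate_user(json.loads(reply)), attempt + 1
        except ValueError as exc:
            # Show the model its own reply plus the reason it was rejected.
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Validation error: {exc}. Please fix."})
    raise RuntimeError("model never produced valid output")

user, attempts = extract_with_retries("Extract: John Doe, 34, john@example.com")
print(user)      # {'name': 'John Doe', 'age': 34}
print(attempts)  # 2
```

In Instructor the same behavior is controlled by passing max_retries to the create call, so you rarely need to write this loop yourself.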


3. Outlines (Local / Self-Hosted Models)

Outlines is the go-to library for structured generation with local models (Mistral, LLaMA, etc.).

bash
pip install outlines

python
import outlines
from pydantic import BaseModel

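# Note: this is the Outlines 0.x-style API. Newer releases changed these entry
# points, so check the documentation for the version you have installed.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")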

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

generator = outlines.generate.json(model, Product)

result = generator("Describe the product: Blue running shoes, $89.99, available")

print(result.name)      # Blue running shoes
print(result.price)     # 89.99
print(result.in_stock)  # True

This runs fully offline once the model weights have been downloaded. No API key needed.


4. JSON Mode vs. Structured Outputs: Know the Difference

Many developers confuse these two. They are not the same.

python
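from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class MovieReview(BaseModel):
    title: str
    rating: int

# JSON mode requires the word "JSON" to appear somewhere in the messages.
messages = [{"role": "user", "content": "Review the movie Inception. Reply in JSON."}]

# JSON Mode: tells the model to output valid JSON, but enforces no schema
json_mode_response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # Only guarantees valid JSON
    messages=messages,
)

# Structured Outputs: enforces a specific schema
structured_response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    response_format=MovieReview,  # Guarantees schema compliance
    messages=messages,
)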

JSON Mode guarantees valid JSON syntax. Structured Outputs guarantees that the JSON matches your exact schema.
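Here is what that difference looks like on the receiving end. The two reply strings are invented examples, and `conforms` is a hypothetical hand-rolled check rather than a real JSON Schema validator. Both replies parse as JSON; only one is usable by code that expects title, rating, and summary.

```python
import json

# Hypothetical schema: field name -> required Python type.
MOVIE_REVIEW_SCHEMA = {"title": str, "rating": int, "summary": str}

def conforms(payload, schema):
    """Return True if payload has exactly the schema's keys with the right types."""
    if not isinstance(payload, dict) or set(payload) != set(schema):
        return False
    return all(type(payload[key]) is expected for key, expected in schema.items())

# Valid JSON, but the keys and types are whatever the model felt like.
json_mode_reply = '{"movie": "Inception", "stars": "nine out of ten"}'
# Valid JSON that also matches the schema.
structured_reply = '{"title": "Inception", "rating": 9, "summary": "A heist inside dreams."}'

print(conforms(json.loads(json_mode_reply), MOVIE_REVIEW_SCHEMA))   # False
print(conforms(json.loads(structured_reply), MOVIE_REVIEW_SCHEMA))  # True
```

With JSON Mode you still have to write and maintain a check like this yourself. With Structured Outputs the check is built into generation.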


Project Structure for a Structured Generation Pipeline

A clean way to organize an extraction or parsing pipeline using structured generation:

my_extraction_app/
├── schemas/
│   ├── __init__.py
│   ├── invoice.py        # Pydantic models for invoice data
│   ├── contact.py        # Pydantic models for contact info
│   └── product.py        # Pydantic models for product listings
├── extractors/
│   ├── __init__.py
│   ├── base.py           # Base extractor class
│   └── document.py       # Document extraction logic
├── prompts/
│   └── system.txt        # System prompt templates
├── main.py
└── requirements.txt

Keeping schemas in their own folder makes them reusable across different extractors and easy to version.
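As one possible shape for extractors/base.py, here is a minimal sketch. Everything in it is an assumption: the `BaseExtractor` class, the injected `call_model` function, and the dataclass standing in for a Pydantic model so the example runs without dependencies. In a real project, `call_model` would wrap your Instructor or Outlines call.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Type, TypeVar

T = TypeVar("T")

class BaseExtractor(Generic[T]):
    """Pairs one schema with one system prompt; the model call is injected."""
    schema: Type[T]
    system_prompt: str = "Extract the requested fields."

    def __init__(self, call_model: Callable[[str, str, Type[T]], T]):
        self._call_model = call_model

    def extract(self, text: str) -> T:
        return self._call_model(self.system_prompt, text, self.schema)

@dataclass
class Contact:  # stand-in for the Pydantic model in schemas/contact.py
    name: str
    email: Optional[str] = None

class ContactExtractor(BaseExtractor[Contact]):
    schema = Contact
    system_prompt = "Extract contact details."

def fake_call(system_prompt, text, schema):
    """Stub model call: takes the text before the first comma as the name."""
    return schema(name=text.split(",")[0].strip())

contact = ContactExtractor(fake_call).extract("Jane Roe, jane@example.com")
print(contact)  # Contact(name='Jane Roe', email=None)
```

Injecting the model call keeps extractors testable: unit tests pass a stub like `fake_call`, and production passes the real client.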


When to Use Which Approach

| Use Case | Recommended Tool |
|---|---|
| Using OpenAI or Azure OpenAI | Instructor or native Structured Outputs |
| Self-hosted or local models | Outlines or LM Format Enforcer |
| Simple JSON output, no strict schema | JSON Mode |
| Complex nested schemas with validation | Instructor + Pydantic validators |
| Production pipelines needing retries | Instructor (built-in retry logic) |

Common Pitfalls and How to Avoid Them

Required fields can force hallucination. If your schema marks a field as required but the model cannot find that data in the input, it may invent a value. Use Optional[str] = None for fields that might not always be present.

python
from typing import Optional
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: Optional[str] = None   # Safe: won't force hallucination
    phone: Optional[str] = None

Complex schemas add latency. The larger and more deeply nested the schema, the more work constrained decoding has to do, both when the schema is first compiled and at each token step. Flatten where possible.

Not all models support all schema features. Recursive schemas and certain union types may not be supported depending on the model and library. Test your schema before deploying.
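One cheap way to act on that advice is to scan a schema for features worth testing before you ship it. The helper below is a rough sketch: `find_risky_features` is a made-up name, and the keyword list is an assumption, not a statement about what any particular provider rejects. Adjust it to whatever your own testing turns up.

```python
# Assumption: examples of keywords an engine might not support. Edit to taste.
RISKY_KEYWORDS = {"oneOf", "patternProperties", "not"}

def find_risky_features(schema, path="$"):
    """Walk a JSON Schema dict and report keywords and self-references to test."""
    findings = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            here = f"{path}.{key}"
            if key in RISKY_KEYWORDS:
                findings.append(f"{here}: keyword '{key}'")
            if key == "$ref" and value == "#":
                findings.append(f"{here}: recursive reference")
            findings.extend(find_risky_features(value, here))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            findings.extend(find_risky_features(item, f"{path}[{index}]"))
    return findings

# A tree node whose children are more tree nodes: a recursive schema.
tree_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

print(find_risky_features(tree_schema))
# ['$.properties.children.items.$ref: recursive reference']
```

A finding here does not mean the schema will fail. It means that part of the schema deserves a real test against the model and library you plan to deploy.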


Real-World Example: Extracting Invoice Data

Here is a practical example that shows how structured generation replaces a messy regex-based invoice parser.

python
import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    vendor_name: str
    total_amount: float
    line_items: List[LineItem]

invoice_text = """
Invoice #INV-2024-0042
From: Acme Supplies Ltd
Items:
- 5x Widget A @ $12.00 each = $60.00
- 2x Widget B @ $35.00 each = $70.00
Total: $130.00
"""

result = client.chat.completions.create(
    model="gpt-4o",
    response_model=Invoice,
    messages=[
        {"role": "user", "content": f"Extract invoice data:\n{invoice_text}"}
    ],
)

print(result.invoice_number)        # INV-2024-0042
print(result.total_amount)          # 130.0
print(result.line_items[0].description)  # Widget A

Before structured generation, this would require at least 20 lines of regex and would still break on formatting variations.
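Structured output guarantees the shape of the invoice, not that the numbers add up. A small arithmetic check after extraction catches the cases where the model misread a digit. The `check_invoice` helper below is a hypothetical sketch that works on a plain dict, which you could get from the example above with result.model_dump() in Pydantic v2.

```python
import math

def check_invoice(invoice):
    """Return a list of arithmetic inconsistencies found in an extracted invoice."""
    problems = []
    for index, item in enumerate(invoice["line_items"]):
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["total"], abs_tol=0.005):
            problems.append(f"line {index}: expected {expected:.2f}, got {item['total']:.2f}")
    line_sum = sum(item["total"] for item in invoice["line_items"])
    if not math.isclose(line_sum, invoice["total_amount"], abs_tol=0.005):
        problems.append(f"total: lines sum to {line_sum:.2f}, got {invoice['total_amount']:.2f}")
    return problems

good = {
    "invoice_number": "INV-2024-0042",
    "vendor_name": "Acme Supplies Ltd",
    "total_amount": 130.0,
    "line_items": [
        {"description": "Widget A", "quantity": 5, "unit_price": 12.0, "total": 60.0},
        {"description": "Widget B", "quantity": 2, "unit_price": 35.0, "total": 70.0},
    ],
}
# Same invoice, but with the grand total misread as 140.
bad = {**good, "total_amount": 140.0}

print(check_invoice(good))  # []
print(check_invoice(bad))   # ['total: lines sum to 130.00, got 140.00']
```

An empty list means the extracted numbers are internally consistent. It still does not prove they match the original document, so keep a human review step for high-value invoices.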


Q&A

1. Does structured generation work with all LLM providers?

Not all providers support it natively. OpenAI has first-class support. For other providers, use Instructor, which works with any OpenAI-compatible API, or use Outlines for local models.

2. Is structured generation slower than regular generation?

Slightly, yes. Constrained decoding adds a small overhead at each token step. In practice, the difference is often negligible compared to network latency on cloud APIs, but it can be noticeable with very large schemas on local models.

3. Can I still use a system prompt alongside structured generation?

Yes, absolutely. A good system prompt still improves the quality of the extracted data. Structured generation just guarantees the format, not the accuracy of the content.

4. What happens if the model cannot find the required data in the input?

With required fields, the model will attempt to fill them, which can lead to hallucination. Always mark fields as Optional when the data might not be present in all inputs.

5. Is Instructor free to use?

Yes. Instructor is an open-source Python library. You still pay for the underlying API calls (e.g., to OpenAI), but Instructor itself is free.

6. Can structured generation handle nested or recursive schemas?

Nested schemas (like a list of objects inside an object) work well. Recursive schemas are less reliable and depend heavily on the library and model. Test carefully before using in production.

7. How is this different from function calling?

Function calling is a related concept where the model selects and calls a function with structured arguments. Structured outputs and function calling often use the same underlying mechanism, and many structured generation libraries are built on top of function calling APIs.
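To see how close the two are, here is a sketch of a schema wrapped as a function-calling tool in the OpenAI tool format. The function name record_invoice and the field choices are made up for illustration, and the arguments string is simulated rather than returned by a real API call.

```python
import json

# The same kind of JSON Schema you would use for structured outputs.
invoice_parameters = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
    "additionalProperties": False,
}

# Wrapped as a tool definition. "record_invoice" is a hypothetical name.
tool = {
    "type": "function",
    "function": {
        "name": "record_invoice",
        "description": "Store the fields extracted from an invoice.",
        "parameters": invoice_parameters,
        "strict": True,
    },
}

# Simulated tool-call arguments: the API hands these back as a JSON string.
arguments = '{"invoice_number": "INV-2024-0042", "total_amount": 130.0}'
parsed = json.loads(arguments)
print(parsed["invoice_number"])  # INV-2024-0042
```

Either way the model fills in a schema. With function calling the schema describes arguments to something your code will run; with structured outputs it describes the reply itself.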

8. What is the best library for beginners?

Instructor is the most beginner-friendly. It integrates directly with Pydantic, which most Python developers already know, and it works with OpenAI's API without needing to run anything locally.

9. Can I validate the extracted data beyond just schema types?

Yes. Pydantic supports custom validators using @field_validator decorators. For example, you can enforce that an email field matches a specific pattern or that a price is always positive.

python
from pydantic import BaseModel, field_validator

class Product(BaseModel):
    name: str
    price: float

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v: float) -> float:
        # Reject zero and negative prices before the object is created.
        if v <= 0:
            raise ValueError("Price must be positive")
        return v

10. Should I completely stop using regex?

Regex is still useful for simple, deterministic text patterns like extracting a phone number format or validating an email address. The point is to stop using regex as a replacement for proper output parsing from LLMs. Use the right tool for each job.
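A good division of labor: let structured generation pull the field out, then let regex tidy the value. The sketch below assumes ten-digit, US-style phone numbers, and `normalize_phone` is a hypothetical helper, so adapt the pattern to the formats you actually see.

```python
import re

def normalize_phone(raw):
    """Return the number as XXX-XXX-XXXX, or None if it is not ten digits."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if not re.fullmatch(r"\d{10}", digits):
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(normalize_phone("(555) 010-4477"))  # 555-010-4477
print(normalize_phone("call me later"))   # None
```

Here the regex runs on a single, already-isolated field, which is the kind of small deterministic job it handles well.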


Made with ❤️ by Mun Bock Ho

Copyright ©️ 2026