BAML vs Instructor: Structured LLM Outputs
Type-safe LLM outputs with BAML and Instructor
When working with Large Language Models in production, getting structured, type-safe outputs is critical. Two popular frameworks, BAML and Instructor, take different approaches to solving this problem.
This comparison helps you choose the right tool for your Python LLM applications.

Understanding Structured Output Challenges
LLMs naturally generate unstructured text, but modern applications need predictable, parseable data. Whether you’re building chatbots, data extraction pipelines, or AI agents, you need JSON objects, validated data types, and error handling—not free-form responses.
Both BAML and Instructor address this challenge but with fundamentally different philosophies: BAML uses a contract-first approach with code generation, while Instructor leverages Python's type system with runtime validation. For broader context, the comparison of structured output approaches across popular LLM providers (linked below) shows how these two frameworks fit into the wider ecosystem.
BAML: Domain-Specific Language for LLMs
BAML (BoundaryML’s language) introduces a dedicated DSL for defining LLM interactions. You write .baml files that declare your prompts, types, and functions, then BAML generates type-safe client code for multiple languages including Python.
Key Features of BAML
Type Safety Across Languages: BAML generates clients for Python, TypeScript, and Ruby from the same .baml definitions, ensuring consistency across your stack.
Version Control for Prompts: Your prompts live in .baml files, making them easy to track, review, and test independently from application code.
Built-in Testing Framework: BAML includes testing tools to validate prompt behavior before deployment, catching issues early in development.
Playground Interface: The BAML playground lets you iterate on prompts visually with immediate feedback, accelerating development cycles.
BAML Example Implementation
// First, define your schema in a .baml file:
// persona.baml
class Person {
  name string
  age int
  occupation string
  skills string[]
}

function ExtractPerson(text: string) -> Person {
  client GPT4
  prompt #"
    Extract person information from: {{ text }}
    Return structured data.
  "#
}
The generated Python client provides type-safe access:
from baml_client import b
from baml_client.types import Person

# Use the generated client
text = "John Smith, 34, software engineer skilled in Python and Go"
result: Person = b.ExtractPerson(text)

print(f"{result.name} is {result.age} years old")
print(f"Skills: {', '.join(result.skills)}")
BAML’s approach shines when you have multiple services consuming the same LLM contracts or when you need strong guarantees about data shapes across language boundaries.
Instructor: Pydantic-Native Python Framework
Instructor takes a Python-first approach, extending Pydantic models with LLM capabilities. It feels natural to Python developers already using Pydantic for validation and type hints.
Key Features of Instructor
Zero Boilerplate: Instructor works directly with your existing Pydantic models by patching your LLM client. No code generation or build steps required.
Rich Validation: Leverage Pydantic’s entire validation ecosystem—custom validators, field constraints, computed fields, and complex nested structures.
Multiple Provider Support: Works seamlessly with OpenAI, Anthropic, Google, and Ollama through a unified interface.
Streaming Support: First-class support for streaming responses with incremental Pydantic model updates.
Retry Logic: Built-in retry mechanisms with exponential backoff and validator-based error recovery.
Instructor Example Implementation
from pydantic import BaseModel, Field
from instructor import from_openai
from openai import OpenAI

# Define your Pydantic model
class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=120, description="Age in years")
    occupation: str
    skills: list[str] = Field(description="List of professional skills")

# Patch the OpenAI client
client = from_openai(OpenAI())

# Extract structured data
text = "John Smith, 34, software engineer skilled in Python and Go"
result = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[
        {"role": "user", "content": f"Extract person info: {text}"}
    ],
)

print(f"{result.name} is {result.age} years old")
print(f"Skills: {', '.join(result.skills)}")
Instructor’s strength lies in its simplicity and integration with Python’s ecosystem. If you’re already using Pydantic, the learning curve is minimal. For developers new to Python or needing quick reference for Python-specific patterns, our Python cheatsheet provides helpful syntax reminders alongside these frameworks.
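The streaming support mentioned above follows the same pattern. A minimal sketch, assuming instructor's create_partial helper from recent 1.x releases and the client, model, and text from the example above:

# Stream partially validated Person objects as the response arrives
for partial in client.chat.completions.create_partial(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": f"Extract person info: {text}"}],
):
    print(partial)  # fields fill in incrementally as tokens stream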
Detailed Comparison: BAML vs Instructor
Development Experience
BAML requires an additional build step and tooling setup. You write .baml files, run the generator, then import the generated code. This creates a clear separation between prompt engineering and application logic, which can be beneficial for larger teams.
Instructor has zero setup friction—pip install and you’re ready. Your prompts live alongside your code, making rapid iteration easier for smaller projects or prototypes.
Type Safety and Validation
BAML provides compile-time type checking in the generated code. Your IDE knows exactly what fields are available before you run anything. Cross-language consistency is guaranteed since the same .baml file generates clients for all supported languages.
Instructor offers runtime validation through Pydantic. While Python type hints provide IDE support, errors surface during execution. This is standard for Python but means less static guarantee than BAML’s generated code.
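To make that concrete, here is a minimal sketch of where Instructor-style errors surface, reusing the Person model from the earlier example:

from pydantic import ValidationError

try:
    # Fails at runtime when the data contains a non-numeric age
    Person.model_validate({"name": "John", "age": "thirty-four",
                           "occupation": "engineer", "skills": []})
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")  # caught only at execution time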
Working with Local LLMs
Both frameworks support local models, which is crucial for privacy, cost control, and offline development. When using Ollama or other local LLM providers, you maintain the same structured output benefits without external API dependencies. For a deeper dive into constraining LLMs with structured output using Ollama, Qwen3, and Python or Go, see the related article linked below; both frameworks provide production-ready abstractions over those lower-level APIs.
BAML connects to Ollama by configuring the client in your .baml file:
// In your .baml file:
client<llm> OllamaLocal {
  provider ollama
  options {
    model "llama2"
    base_url "http://localhost:11434"
  }
}
Instructor works with Ollama through the OpenAI-compatible API:
from openai import OpenAI
from instructor import from_openai

client = from_openai(OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # dummy key; Ollama ignores it
))
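Calls then look identical to the hosted-API version. One caveat: many local models lack reliable tool or function calling, so JSON mode is often the safer choice. A sketch assuming instructor's Mode.JSON setting and the Person model from earlier, with llama2 pulled locally:

import instructor
from openai import OpenAI

# JSON mode asks for raw JSON instead of relying on tool calling
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

result = client.chat.completions.create(
    model="llama2",
    response_model=Person,
    messages=[{"role": "user", "content": "Extract person info: Jane Doe, 29, data scientist"}],
)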
Note that when working with local models, you should be aware of potential structured output issues with Ollama and GPT-OSS models, as not all models handle structured outputs with equal reliability.
Error Handling and Retries
BAML handles retries at the framework level with configurable strategies. Errors in schema validation trigger automatic reprompting with error context.
Instructor offers hooks for custom retry behavior and composes cleanly with decorator-based libraries such as tenacity:
from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(3))
def extract_with_retry(text: str) -> Person:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=Person,
        messages=[{"role": "user", "content": text}],
    )
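Validators close the loop: when validation fails, Instructor feeds the error message back to the model and retries. A minimal sketch using Instructor's built-in max_retries parameter, the client from earlier, and a hypothetical Contact model:

from pydantic import BaseModel, field_validator

class Contact(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def must_look_like_email(cls, v: str) -> str:
        if "@" not in v:
            # This message is sent back to the model on the retry
            raise ValueError("email must contain an @ sign")
        return v

contact = client.chat.completions.create(
    model="gpt-4",
    response_model=Contact,
    max_retries=2,  # re-prompt up to twice with the validation error
    messages=[{"role": "user", "content": "Extract the email: reach me at jane at example dot com"}],
)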
Testing and Observability
BAML includes a testing framework where you can write test cases directly in .baml files, validating prompt behavior across different inputs. The playground provides visual debugging.
Instructor integrates with standard Python testing frameworks. You can use pytest fixtures, mocking libraries, and assertion helpers just like any Python code.
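For example, you can pin the schema down against recorded model outputs without any network calls. A small sketch using pytest and the Person model from earlier:

import pytest
from pydantic import ValidationError

CANNED = '{"name": "John Smith", "age": 34, "occupation": "software engineer", "skills": ["Python", "Go"]}'

def test_schema_accepts_recorded_output():
    person = Person.model_validate_json(CANNED)
    assert person.skills == ["Python", "Go"]

def test_schema_rejects_out_of_range_age():
    # age is constrained to 0-120 in the model definition
    with pytest.raises(ValidationError):
        Person.model_validate_json(CANNED.replace('"age": 34', '"age": 999'))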
Performance Considerations
Runtime performance is comparable—both frameworks ultimately make the same LLM API calls. The overhead for validation and parsing is negligible compared to network latency and model inference time.
Development velocity differs significantly:
- BAML’s code generation means better autocomplete and earlier error detection but requires a build step
- Instructor’s decorator approach means faster iteration but runtime error discovery
For production systems processing millions of requests, both frameworks handle load equally well. Your choice depends more on development workflow preferences than performance characteristics.
When to Choose BAML
Select BAML when you need:
- Multi-language support: Accessing the same LLM contracts from Python, TypeScript, and Ruby services
- Contract-first development: API-style development where LLM interfaces are designed before implementation
- Team collaboration: Separate prompt engineering workflows from application development
- Strong typing guarantees: Compile-time checks across your entire stack
- Visual prompt development: Playground-driven iteration on prompts
When to Choose Instructor
Choose Instructor when you want:
- Python-only projects: No need for cross-language consistency
- Rapid prototyping: Minimum setup to get structured outputs working
- Pydantic integration: Leveraging existing Pydantic models and validators
- Simple deployment: No build steps or generated code to manage
- Rich Python ecosystem: Using Python-specific libraries and patterns
Combining Approaches
Some projects benefit from using both frameworks. For example, you might use BAML for customer-facing APIs that need cross-language clients, while using Instructor for internal Python services that need rapid iteration.
You can also transition between frameworks as your project matures—starting with Instructor for quick validation, then moving to BAML when you need broader language support or stricter contracts.
Real-World Use Cases
Data Extraction Pipeline (BAML)
A document processing system uses BAML to extract structured data from invoices, contracts, and receipts. The .baml definitions serve as contracts between the ML team and the backend services, with TypeScript clients for the web dashboard and Python clients for batch processing.
Customer Support Bot (Instructor)
A support bot uses Instructor to classify tickets, extract user intents, and generate responses. The team iterates quickly on prompts using Pydantic models, with validators ensuring extracted phone numbers, emails, and ticket IDs meet format requirements.
Multi-Modal AI Agent (Both)
An AI agent system uses BAML for core agent-to-agent communication contracts, ensuring type safety across the distributed system, while individual agents use Instructor internally for flexible, Python-native processing of user inputs. Similar patterns apply when building MCP servers in Python, where structured outputs enable reliable tool integration with AI assistants.
Migration and Integration Paths
If you’re already using basic JSON parsing with LLMs, both frameworks offer straightforward migration paths:
From JSON to BAML: Convert your JSON schemas to BAML type definitions, move prompts into .baml files, generate clients, and replace manual parsing with generated types.
From JSON to Instructor: Add Pydantic models matching your JSON structure, install instructor, patch your OpenAI client, and replace JSON parsing with response_model parameters.
Both migrations can be incremental—you don’t need to convert your entire codebase at once.
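For the Instructor path, the diff is often only a few lines. A before-and-after sketch reusing the Person model (prompt text is illustrative):

import json
import instructor
from openai import OpenAI

prompt = "Extract person info as JSON: John Smith, 34, software engineer"

# Before: manual parsing, no validation
raw = OpenAI().chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
data = json.loads(raw.choices[0].message.content)  # raises on malformed JSON

# After: the patched client returns a validated Person directly
client = instructor.from_openai(OpenAI())
person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,
    messages=[{"role": "user", "content": prompt}],
)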
Future Outlook and Community
Both frameworks are actively developed with strong communities:
BAML (BoundaryML) focuses on expanding language support, improving the playground, and enhancing testing capabilities. The commercial backing suggests long-term stability.
Instructor maintains a strong open-source presence with frequent updates, extensive documentation, and growing adoption. The project is well-maintained by Jason Liu and contributors.
Conclusion
BAML and Instructor represent two excellent but distinct approaches to structured LLM outputs. BAML’s contract-first, multi-language philosophy suits teams building distributed systems with strict type requirements. Instructor’s Python-native, Pydantic-based approach fits rapid development and Python-centric stacks.
Neither is universally better—your choice depends on your team’s size, language preferences, development workflow, and type safety requirements. Many teams will find that starting with Instructor for prototyping, then adopting BAML for production multi-service architectures, offers the best of both worlds.
Useful links
Related articles on this site
- Constraining LLMs with Structured Output: Ollama, Qwen3 & Python or Go
- Structured output comparison across popular LLM providers - OpenAI, Gemini, Anthropic, Mistral and AWS Bedrock
- Ollama GPT-OSS Structured Output Issues
- Building MCP Servers in Python: WebSearch & Scrape
- Python Cheatsheet
- Ollama Cheatsheet