Ollama GPT-OSS Structured Output Issues
Not very nice.
Ollama’s GPT-OSS models have recurring issues with structured output, especially when used through frameworks such as LangChain, the OpenAI SDK, vLLM, and others.
Many users report failures to generate valid JSON or other structured formats, hallucinated format elements, and inconsistent or empty response content. These problems stem from compatibility gaps, response-format changes (such as Harmony), and incomplete enforcement of output schemas by both Ollama and third-party APIs.
About GPT-OSS
This is a new and very interesting LLM family from OpenAI. Just look at these parameters:
| Model | gpt-oss-120b | gpt-oss-20b |
|---|---|---|
| Layers | 36 | 24 |
| Total Params | 117B | 21B |
| Active Params Per Token | 5.1B | 3.6B |
| Total Experts | 128 | 32 |
| Active Experts Per Token | 4 | 4 |
| Context Length | 128k | 128k |
The release notes say (here and here):
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
What’s not to love? The behaviour of structured output, that’s what. Overall, this issue is very disappointing, especially because structured output works so well with Ollama and Qwen3.
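For reference, this is roughly what a schema-constrained request to Ollama’s `/api/chat` endpoint looks like. A minimal sketch: the schema and prompt are made up, and it only builds the payload — in practice you would POST it to a local Ollama server (since Ollama 0.5, `format` accepts a full JSON schema, not just `"json"`):

```python
import json

# Hypothetical example schema -- any JSON schema works here.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}

# Payload for POST http://localhost:11434/api/chat (Ollama's chat endpoint).
payload = {
    "model": "gpt-oss:20b",
    "messages": [
        {"role": "user", "content": "Name any country and its capital."}
    ],
    "format": schema,   # ask Ollama to constrain output to this schema
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

With Qwen3 this kind of request reliably yields schema-compliant JSON; with gpt-oss the issues below appear.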
Common Issues
- Models like gpt-oss:20b frequently fail to produce strict JSON or schema-compliant output, with responses often containing extra commentary or incomplete objects.
- Integration with LangChain and OpenAI SDK tends to throw parsing/validation errors due to non-structured output, making pipelines unusable in production environments.
- Harmony format in gpt-oss introduces reasoning traces even when not requested, complicating schema parsing compared to other models such as Qwen3.
- With vLLM, structured output enforcement mechanisms are either missing or deprecated, so the output is frequently “unguided” and must be parsed manually.
- There are reports of the model producing the correct structured output, then continuing with unrelated content, breaking standard parsers.
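The trailing-content failure in the last bullet can often be mitigated with a lenient extractor that decodes only the first complete JSON object and ignores whatever the model appends afterwards. A minimal stdlib-only sketch (the sample string imitates the reported failure mode):

```python
import json

def extract_first_json(text: str) -> dict:
    """Decode the first complete JSON object in `text`, ignoring any
    commentary the model appends after it. Raises ValueError if none found."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object here; try the next opening brace.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

# Typical gpt-oss failure mode: valid JSON followed by unrelated prose.
raw = '{"name": "France", "capital": "Paris"} Sure! Let me know if...'
print(extract_first_json(raw))  # {'name': 'France', 'capital': 'Paris'}
```

This does not fix the model, but it keeps pipelines from crashing on the “correct JSON plus extra chatter” pattern.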
Workarounds and Fixes
- Some users suggest explicitly specifying the JSON schema in the prompt and attempting manual parsing of model outputs, sometimes using pre- and post-split markers.
- Another approach is to run a post-processing layer or a smaller LLM to reformat GPT-OSS output to the desired schema, though this is resource-intensive.
- A few bugfixes and pull requests (PRs) have incrementally improved Harmony format compliance, particularly with newer Ollama releases, but complete parity with previous models is not achieved yet.
- When using vLLM, patching specific functions may help, but robust schema enforcement generally isn’t supported at this time.
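The marker-based workaround from the first bullet can be sketched like this. The marker strings and prompt wording are my own; any distinctive pre/post pair works:

```python
import json

BEGIN, END = "<<<JSON>>>", "<<<END>>>"

# {schema} is a str.format placeholder to be filled with your JSON schema.
PROMPT_TEMPLATE = (
    "Answer ONLY with a JSON object matching this schema: {schema}. "
    "Wrap it between " + BEGIN + " and " + END + " with nothing else inside."
)

def parse_marked_output(text: str) -> dict:
    """Pull the JSON between the pre/post markers and parse it strictly."""
    try:
        inner = text.split(BEGIN, 1)[1].split(END, 1)[0]
    except IndexError:
        raise ValueError("markers not found in model output")
    return json.loads(inner)

# Simulated model reply: reasoning before the markers, chatter after.
reply = ('Let me think... <<<JSON>>>'
         '{"name": "Japan", "capital": "Tokyo"}<<<END>>> Done.')
print(parse_marked_output(reply))  # {'name': 'Japan', 'capital': 'Tokyo'}
```

Because gpt-oss tends to emit Harmony reasoning traces around the payload, explicit markers are often more reliable than hoping the whole response is parseable JSON.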
Recommendations
- Avoid relying solely on GPT-OSS for strict structured output until full compatibility is restored in Ollama and downstream frameworks.
- Where structured output is critical, use additional parsing or a model better known for schema compliance.
- Monitor relevant GitHub issues (ollama/ollama, langchain-ai/langchain, vllm-project/vllm) for fixes and integration updates.
In summary, GPT-OSS with Ollama currently struggles with structured output, largely due to incomplete format enforcement, Harmony format changes, and uneven support across toolchains. Manual workarounds may help, but consistent success is not guaranteed.
Useful links
- https://www.reddit.com/r/MachineLearning/comments/1n37qnu/d_ollamagptoss20b_cant_seem_to_generate/
- https://github.com/vllm-project/vllm/issues/23120
- https://github.com/ollama/ollama/issues/11691
- https://huggingface.co/openai/gpt-oss-20b/discussions/111
- https://github.com/langchain-ai/langchain/issues/33116
- https://ollama.com/library/gpt-oss
- https://openai.com/index/introducing-gpt-oss/
Other Ollama Articles
- LLMs and Structured Output: Ollama, Qwen3 & Python or Go
- Structured output comparison across popular LLM providers - OpenAI, Gemini, Anthropic, Mistral and AWS Bedrock
- Ollama cheatsheet
- Test: How Ollama is using Intel CPU Performance and Efficient Cores
- How Ollama Handles Parallel Requests
- LLM Performance and PCIe Lanes: Key Considerations