Choosing the Right LLM for Cognee: Local Ollama Setup
Thoughts on LLMs for self-hosted Cognee
Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints. Cognee works best with larger, low-hallucination models (32B+) served via Ollama, but mid-size options can work for lighter setups.

Key Cognee Requirements
Cognee relies on the LLM for entity extraction, relation inference, and metadata generation. Models under 32B often produce noisy graphs, and high hallucination rates (e.g., 90%+) pollute nodes and edges, degrading retrieval. The official docs recommend deepseek-r1:32b or llama3.3-70b-instruct-q3_K_M paired with Mistral embeddings.
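If you want to start from the doc-recommended pairing, pulling the models looks roughly like this. The exact Ollama tags are an assumption on my part; verify them on ollama.com/library before pulling.

```bash
# Pull the models the Cognee docs point to (pick whichever fits your VRAM).
# Tags are assumptions; check the Ollama library page for the ones actually published.
ollama pull deepseek-r1:32b
ollama pull llama3.3:70b-instruct-q3_K_M
ollama pull avr/sfr-embedding-mistral   # one Mistral-based embedding option (see the table further down)
```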
Model Comparison Table
| Model | Params | Hallucination (SimpleQA/est.) | VRAM (quantized) | Cognee Strengths | Weaknesses |
|---|---|---|---|---|---|
| gpt-oss:20b | 20B | 91.4% | ~16GB | Fast inference, tool-calling | Severe graph noise |
| Qwen3:14b | 14B | ~40-45% | ~12-14GB | Efficient on modest hardware | Limited depth for graphs |
| Devstral Small 2 | 24B | ~8-10% | ~18-20GB | Coding focus, clean entities | Higher VRAM than Qwen3 |
| Llama3.3-70b | 70B | ~30-40% | ~40GB+ | Optimal graph quality | Heavy resource needs |
| Deepseek-r1:32b | 32B | Low (recommended) | ~24-32GB | Best for reasoning/graphs | Slower on consumer GPUs |
Data synthesized from Cognee docs, model cards, and benchmarks. The hallucination figures may look out of whack, but they are probably not far off…
Recommendations by Hardware
- High-end (32GB+ VRAM): Deepseek-r1:32b or Llama3.3-70b. These yield the cleanest graphs per Cognee guidance.
- Mid-range (16-24GB VRAM): Devstral Small 2. Low hallucination and coding prowess suit structured memory tasks.
- Budget (12-16GB VRAM): Qwen3:14b over gpt-oss:20b; avoid the ~91% hallucination pitfall.
- I'm inclined to avoid gpt-oss:20b for Cognee; there are notes that its errors amplify in unfiltered graph construction. Still, its inference speed on my GPU is 2+ times faster…. A quick way to check which tier your card falls into is sketched below.
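A minimal VRAM check, assuming an NVIDIA card with nvidia-smi on the PATH (adjust for AMD or Apple silicon):

```bash
# Total and free VRAM in MiB; compare against the tiers above.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# Once a model is loaded, Ollama shows how much of it runs on GPU vs CPU.
ollama ps
```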
Quick Ollama + Cognee Setup
```bash
# 1. Pull model (e.g., Devstral)
ollama pull devstral-small-2:24b   # or qwen3:14b, etc.

# 2. Install Cognee
pip install "cognee[ollama]"

# 3. Env vars
export LLM_PROVIDER="ollama"
export LLM_MODEL="devstral-small-2:24b"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"   # 768 dims
export EMBEDDING_DIMENSIONS=768

# 4. Test graph
cognee add --file "your_data.txt" --name "test_graph"
```
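Before pointing Cognee at the model, it can save time to confirm Ollama is actually serving it. This assumes the default localhost endpoint and uses jq only for readability:

```bash
# List pulled models and confirm the tag matches LLM_MODEL above.
ollama list

# Ask the model for a one-off completion via Ollama's native API.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "devstral-small-2:24b", "prompt": "Reply with OK.", "stream": false}' \
  | jq -r '.response'
```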
Match embedding dims (e.g., 768, 1024) across config and vector store. Qwen3 Embeddings (unproven in Cognee) could work at 1024-4096 dims if Ollama-supported.
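One way to verify the dimension before writing it into EMBEDDING_DIMENSIONS, assuming Ollama is running locally and jq is installed:

```bash
# Request one embedding from Ollama and count its length; the number printed
# is what EMBEDDING_DIMENSIONS (and the vector store schema) must match.
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "dimension check"}' \
  | jq '.embedding | length'
# nomic-embed-text should print 768
```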
Prioritize low-hallucination models for production Cognee pipelines—your graphs will thank you. Test on your hardware and monitor graph coherence.
Embedding models
I didn’t think much about this one, but here is a table I put together for future reference.
| Ollama Model | Size (GB) | Embedding Dimensions | Context Length |
|---|---|---|---|
| nomic-embed-text:latest | 0.274 | 768 | 2k |
| jina-embeddings-v2-base-en:latest | 0.274 | 768 | 8k |
| nomic-embed-text-v2-moe | 0.958 | 768 | 512 |
| qwen3-embedding:0.6b | 0.639 | 1024 | 32K |
| qwen3-embedding:4b | 2.5 | 2560 | 32K |
| qwen3-embedding:8b | 4.7 | 4096 | 32K |
| avr/sfr-embedding-mistral:latest | 4.4 | 4096 | 32K |
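As a sketch of how a row from this table would plug into the earlier setup (env var names reused from the setup above; the dimension must match the table and the vector store):

```bash
# Example: switching Cognee's embeddings to qwen3-embedding:0.6b.
ollama pull qwen3-embedding:0.6b
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="qwen3-embedding:0.6b"
export EMBEDDING_DIMENSIONS=1024   # per the table above
```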
Useful links
- https://docs.cognee.ai/how_to_guides/local_models
- https://docs.cognee.ai/setup-configuration/embedding-providers
- https://arxiv.org/html/2508.10925v1
- https://github.com/vectara/hallucination-leaderboard
- https://ollama.com/library/nomic-embed-text-v2-moe
- Qwen3 Embedding
- How to Move Ollama Models to Different Drive or Folder
- Ollama cheatsheet