What makes an LLM suitable for Cognee?

Cognee requires LLMs with low hallucination rates (ideally under 40%) and sufficient parameter count (32B+ recommended) for quality entity extraction, relation inference, and metadata generation. Models with high hallucination rates (90%+) produce noisy graphs that degrade retrieval quality.

Which LLM should I choose for my hardware setup?

For high-end setups (32GB+ VRAM), use Deepseek-r1:32b or Llama3.3-70b. For mid-range (16-24GB VRAM), Devstral Small 2 offers low hallucination and coding focus. For budget setups (12-16GB VRAM), Qwen3:14b is preferable over gpt-oss:20b due to much lower hallucination rates.

Why avoid gpt-oss:20b for Cognee?

Despite fast inference speeds, gpt-oss:20b has a 91.4% hallucination rate that severely pollutes graph nodes and edges, making it unsuitable for production Cognee pipelines where graph quality is critical.

What embedding models work with Cognee on Ollama?

Popular options include nomic-embed-text (768 dims, 2k context), jina-embeddings-v2-base-en (768 dims, 8k context), and qwen3-embedding models (1024-4096 dims, 32K context). Ensure embedding dimensions match your vector store configuration.

Where does Ollama for Cognee fit in the LLM hosting landscape?

Ollama is one local option. Our main LLM Hosting guide compares it with vLLM, Docker Model Runner, LocalAI and cloud providers, including cost and infrastructure trade-offs.

Can I run Cognee with a backend other than Ollama?

Cognee supports Ollama and other providers. For comparing local and cloud backends, see the LLM Hosting guide.

Choosing the Right LLM for Cognee: Local Ollama Setup

Thoughts on LLMs for self-hosted Cognee

Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints. Cognee works best with larger, low-hallucination models (32B+) via Ollama, but mid-size options work for lighter setups.

For a broader comparison of Ollama with vLLM, Docker Model Runner, LocalAI and cloud providers—including cost and infrastructure trade-offs—see LLM Hosting: Local, Self-Hosted & Cloud Infrastructure Compared.

Key Cognee Requirements

Cognee relies on the LLM for entity extraction, relation inference, and metadata generation. Models under 32B often produce noisy graphs, while high hallucination (e.g., 90%+) pollutes nodes/edges, degrading retrieval. Official docs recommend deepseek-r1:32b or llama3.3-70b-instruct-q3_K_M paired with Mistral embeddings.

Model Comparison Table

Model	Params	Hallucination (SimpleQA/est.)	VRAM (quantized)	Cognee Strengths	Weaknesses
gpt-oss:20b	20B	91.4%	~16GB	Fast inference, tool-calling	Severe graph noise
Qwen3:14b	14B	~40-45%	~12-14GB	Efficient on modest hardware	Limited depth for graphs
Devstral Small 2	24B	~8-10%	~18-20GB	Coding focus, clean entities	Higher VRAM than Qwen3
Llama3.3-70b	70B	~30-40%	~40GB+	Optimal graph quality	Heavy resource needs
Deepseek-r1:32b	32B	Low (recommended)	~24-32GB	Best for reasoning/graphs	Slower on consumer GPUs

Data synthesized from Cognee docs, model cards, and benchmarks. The hallucination figures may look out of whack, but they are probably not far off.

Recommendations by Hardware

High-end (32GB+ VRAM): Deepseek-r1:32b or Llama3.3-70b. These yield the cleanest graphs per Cognee guidance.
Mid-range (16-24GB VRAM): Devstral Small 2. Low hallucination and coding prowess suit structured memory tasks.
Budget (12-16GB VRAM): Qwen3:14b over gpt-oss:20b, which avoids the 91.4% hallucination pitfall.
I'm inclined to avoid gpt-oss:20b for Cognee; there are reports that its errors amplify in unfiltered graph construction. That said, its inference speed on my GPU is more than 2x faster.
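The tiers above can be sketched as a small shell helper. This is only an illustration: the function name pick_cognee_model is hypothetical, and the 20GB cut-off for Devstral Small 2 is my assumption, taken from the upper end of the table's ~18-20GB estimate to leave some headroom and to resolve the overlap between the budget and mid-range tiers at 16GB.

```shell
# Suggest an Ollama model tag for Cognee from available VRAM (whole GB).
# Thresholds are assumptions derived from the tiers and table above.
pick_cognee_model() {
  vram_gb="$1"
  # Reject empty or non-numeric input.
  case "$vram_gb" in
    ''|*[!0-9]*)
      echo "usage: pick_cognee_model <vram_gb>" >&2
      return 2
      ;;
  esac
  if [ "$vram_gb" -ge 32 ]; then
    echo "deepseek-r1:32b"          # high-end tier
  elif [ "$vram_gb" -ge 20 ]; then
    echo "devstral-small-2:24b"     # mid-range tier
  elif [ "$vram_gb" -ge 12 ]; then
    echo "qwen3:14b"                # budget tier
  else
    echo "no recommended model for ${vram_gb}GB VRAM" >&2
    return 1
  fi
}
```

For example, pick_cognee_model 24 prints devstral-small-2:24b, which can then be passed to ollama pull and exported as LLM_MODEL.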

Quick Ollama + Cognee Setup

# 1. Pull the LLM (e.g., Devstral) and the embedding model
ollama pull devstral-small-2:24b  # or qwen3:14b, etc.
ollama pull nomic-embed-text      # embedding model used in step 3

# 2. Install Cognee
pip install "cognee[ollama]"

# 3. Env vars
export LLM_PROVIDER="ollama"
export LLM_MODEL="devstral-small-2:24b"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"  # 768 dims
export EMBEDDING_DIMENSIONS=768

# 4. Test graph
cognee-cli add your_data_file.txt --dataset-name "test_graph"

Match the embedding dimensions (e.g., 768 or 1024) across the Cognee config and the vector store. Qwen3 embeddings are unproven with Cognee, but they could work at 1024-4096 dims provided your Ollama install supports them.
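One way to confirm the dimension a model really returns is to request a single embedding from Ollama and count its values. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and exposes the /api/embed endpoint, whose response carries an "embeddings" array. The function names and the OLLAMA_URL variable are hypothetical, and the JSON handling is rough text parsing rather than a real parser.

```shell
# Print the length of the first vector in an /api/embed JSON response
# read from stdin. Assumes plain numeric arrays with no nested brackets.
embedding_dim_from_json() {
  tr -d ' \n' \
    | sed -n 's/.*"embeddings":\[\[\([^]]*\)\].*/\1/p' \
    | tr ',' '\n' \
    | grep -c .
}

# Ask a running Ollama server for one embedding and print its dimension.
# OLLAMA_URL is a hypothetical override for the default address.
ollama_embedding_dim() {
  model="$1"
  curl -s "${OLLAMA_URL:-http://localhost:11434}/api/embed" \
    -d "{\"model\": \"${model}\", \"input\": \"dimension probe\"}" \
    | embedding_dim_from_json
}

# Compare the model's actual dimension with the configured one.
check_embedding_dims() {
  model="$1"
  expected="$2"
  actual="$(ollama_embedding_dim "$model")"
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: ${model} returns ${actual} dims"
  else
    echo "mismatch: ${model} returns ${actual} dims, config says ${expected}" >&2
    return 1
  fi
}
```

With the setup above, check_embedding_dims "$EMBEDDING_MODEL" "$EMBEDDING_DIMENSIONS" should report ok before you start building a graph.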

Prioritize low-hallucination models for production Cognee pipelines—your graphs will thank you. Test on your hardware and monitor graph coherence. To see how Ollama fits with other local and cloud LLM options, check our LLM Hosting: Local, Self-Hosted & Cloud Infrastructure Compared guide.

Embedding models

I didn't think much about this one, but here is a table I put together for future reference.

Ollama Model	Size (GB)	Embedding Dimensions	Context Length (tokens)
nomic-embed-text:latest	0.274	768	2K
jina-embeddings-v2-base-en:latest	0.274	768	8K
nomic-embed-text-v2-moe	0.958	768	512
qwen3-embedding:0.6b	0.639	1024	32K
qwen3-embedding:4b	2.5	2560	32K
qwen3-embedding:8b	4.7	4096	32K
avr/sfr-embedding-mistral:latest	4.4	4096	32K
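To keep EMBEDDING_DIMENSIONS in step with the chosen model, the table above can be encoded as a lookup. This is a convenience sketch only: the function name embedding_dims_for is hypothetical, and the values are copied from the table rather than queried from Ollama.

```shell
# Map an Ollama embedding model tag to its vector size, per the table above.
# A trailing ":latest" is ignored so both tag forms resolve the same way.
embedding_dims_for() {
  case "${1%:latest}" in
    nomic-embed-text|jina-embeddings-v2-base-en|nomic-embed-text-v2-moe)
      echo 768 ;;
    qwen3-embedding:0.6b)
      echo 1024 ;;
    qwen3-embedding:4b)
      echo 2560 ;;
    qwen3-embedding:8b|avr/sfr-embedding-mistral)
      echo 4096 ;;
    *)
      echo "unknown embedding model: $1" >&2
      return 1 ;;
  esac
}

# Example: derive the dimension setting from the model name.
export EMBEDDING_MODEL="qwen3-embedding:0.6b"
export EMBEDDING_DIMENSIONS="$(embedding_dims_for "$EMBEDDING_MODEL")"
```

Deriving the number from the model name avoids the two settings drifting apart when you switch embedding models.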
