AI

How Ollama Handles Parallel Requests

How Ollama Handles Parallel Requests

Understand Ollama concurrency, queueing, and how to tune OLLAMA_NUM_PARALLEL for stable parallel requests.

This guide explains how Ollama handles parallel requests (concurrency, queuing, and resource limits), and how to tune it using the OLLAMA_NUM_PARALLEL environment variable (and related knobs).

Testing Deepseek-R1 on Ollama

Testing Deepseek-R1 on Ollama

Comparing two deepseek-r1 models to two base ones

DeepSeek’s first-generation of reasoning models with comparable performance to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.