Memory allocation and model scheduling in the new Ollama version - v0.12.1
My own test of ollama model scheduling
Here I compare how much VRAM the new version of Ollama allocates for a model versus the previous Ollama version. The new version is worse.
As stated on the official website, the new Ollama release has new model scheduling aimed at maximizing GPU utilization:

> Ollama’s new memory management allocates more memory to the GPU, increasing token generation and processing speeds

Some examples are given, for instance:
Long context
GPU: 1x NVIDIA GeForce RTX 4090
Model: gemma3:12b
Context length: 128k

Metric | Old | New |
---|---|---|
Token generation speed | 52.02 tokens/s | 85.54 tokens/s |
VRAM used | 19.9GiB | 21.4GiB |
Layers loaded on GPU | 48/49 | 49/49 |
Here I test how this works on my PC. My results are very different from the official tests; they are completely the opposite. I have a slightly different hardware configuration and tested different models, but the results are not better at all, and often worse. This echoes the post First Signs of Ollama Enshittification.
The numbers above are from the blog post on the Ollama website.
TL;DR
I’ve tested how the new version of Ollama schedules LLMs that do not fit into my 16GB of VRAM:
- mistral-small3.2:24b
- qwen3:30b-a3b
- gemma3:27b
- qwen3:32b
I ran `ollama run <modelname>`, asked a simple question like “who are you?”, and in a separate terminal checked the output of `ollama ps` and `nvidia-smi`. All pretty simple.
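The procedure above can be sketched as a small shell helper. This is a hypothetical `probe_model` function of my own naming, not something Ollama ships; it assumes `ollama` and `nvidia-smi` are on your PATH, and the `--query-gpu` flags are standard nvidia-smi options:

```shell
# Hypothetical helper: load one model, ask a trivial question,
# then report Ollama's CPU/GPU split and the card's VRAM usage.
probe_model() {
  local m="$1"
  echo "=== $m ==="
  # load the model and send one non-interactive prompt
  echo "who are you?" | ollama run "$m" > /dev/null
  # CPU/GPU split as reported by Ollama itself (PROCESSOR column)
  ollama ps
  # VRAM actually in use on the GPU
  nvidia-smi --query-gpu=memory.used --format=csv,noheader
  # unload the model to free VRAM before the next run
  ollama stop "$m"
}

# probe_model mistral-small3.2:24b   # uncomment to run against one model
```

Running it once per model in the list gives the per-model rows in the table below.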
Only qwen3:30b-a3b showed the same CPU/GPU split; the other three models were pushed more to the CPU by the new version. In my tests, to my disappointment, the new version of Ollama is worse, and these results contradict the post on the Ollama blog.
Detailed comparison data
Model | Old ver: VRAM allocated | Old ver: CPU/GPU | New ver: VRAM allocated | New ver: CPU/GPU |
---|---|---|---|---|
mistral-small3.2:24b | 14489MiB | 41%/59% | 14249MiB | 44%/56% |
qwen3:30b-a3b | 15065MiB | 21%/79% | 14867MiB | 21%/79% |
gemma3:27b | 13771MiB | 28%/72% | 14817MiB | 29%/71% |
qwen3:32b | 14676MiB | 30%/70% | 15139MiB | 32%/68% |
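To make the trend easier to see, the per-model VRAM delta (new minus old) can be computed with a short awk one-liner; the values below are copied from the table above, and `awk` is assumed to be available:

```shell
# VRAM delta (new - old, in MiB) per model; fields are model|old|new
awk -F'|' '{ printf "%s %+d MiB\n", $1, $3 - $2 }' <<'EOF'
mistral-small3.2:24b|14489|14249
qwen3:30b-a3b|15065|14867
gemma3:27b|13771|14817
qwen3:32b|14676|15139
EOF
# Output:
#   mistral-small3.2:24b -240 MiB
#   qwen3:30b-a3b -198 MiB
#   gemma3:27b +1046 MiB
#   qwen3:32b +463 MiB
```

So two models actually got less VRAM under the new scheduler, and the two that got more were still pushed slightly further onto the CPU.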
Disappointed.