Memory allocation and model scheduling in the new Ollama version - v0.12.1

My own test of Ollama model scheduling


Here I am comparing how much VRAM the new version of Ollama allocates for a model versus the previous version. In my tests, the new version is worse.

As stated on the official website, the new Ollama release has new model scheduling with:

Maximizing GPU utilization:
Ollama’s new memory management allocates more memory to the GPU,
increasing token generation and processing speeds

and some examples are given, such as:

Long context

    GPU: 1x NVIDIA GeForce RTX 4090
    Model: gemma3:12b
    Context length: 128k

|                        | Old            | New            |
|------------------------|----------------|----------------|
| Token generation speed | 52.02 tokens/s | 85.54 tokens/s |
| VRAM used              | 19.9 GiB       | 21.4 GiB       |
| Layers loaded on GPU   | 48/49          | 49/49          |

Here I am testing how it works on my PC. My results are very different from the official ones; they are essentially the opposite. I have a slightly different hardware configuration and tested different models, but the results are not better at all, and often worse. This echoes the post First Signs of Ollama Enshittification.

Image: Ollama llamas, from the blog post on the Ollama website.

TL;DR

I’ve tested how the new version of Ollama schedules LLMs that do not fit into my 16 GB of VRAM:

  • mistral-small3.2:24b
  • qwen3:30b-a3b
  • gemma3:27b
  • qwen3:32b

I ran `ollama run <modelname>`, asked some simple question like “who are you?”, and in a separate terminal checked the output of `ollama ps` and `nvidia-smi`. All pretty simple; a script automating it is sketched below.
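For reproducibility, here is a minimal shell sketch of that test loop. It is my own automation rather than anything from the Ollama docs: the non-interactive prompt form of `ollama run` and the `ollama stop` cleanup between models are assumptions matching my manual procedure, and it assumes a single NVIDIA GPU with `ollama` and `nvidia-smi` on PATH.

```sh
#!/bin/sh
# Sketch of the manual test loop: load each model, ask a trivial
# question, then record how Ollama split the model between CPU and GPU.
for model in mistral-small3.2:24b qwen3:30b-a3b gemma3:27b qwen3:32b; do
  echo "=== $model ==="

  # Non-interactive equivalent of typing the question into `ollama run`
  ollama run "$model" "who are you?" > /dev/null

  # PROCESSOR column shows the CPU/GPU split, e.g. "41%/59% CPU/GPU";
  # the model stays loaded for a few minutes after `ollama run` exits
  ollama ps

  # VRAM actually allocated on the GPU, in MiB
  nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits

  # Unload the model so the next measurement starts clean
  ollama stop "$model"
done
```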

Only qwen3:30b-a3b showed the same CPU/GPU split; the three other models were pushed further onto the CPU in the new version. In my tests, to my disappointment, the new version of Ollama is worse, and these results contradict the post on the Ollama blog.

Detailed comparison data

| Model                | Old ver: VRAM allocated | Old ver: CPU/GPU | New ver: VRAM allocated | New ver: CPU/GPU |
|----------------------|-------------------------|------------------|-------------------------|------------------|
| mistral-small3.2:24b | 14489 MiB               | 41%/59%          | 14249 MiB               | 44%/56%          |
| qwen3:30b-a3b        | 15065 MiB               | 21%/79%          | 14867 MiB               | 21%/79%          |
| gemma3:27b           | 13771 MiB               | 28%/72%          | 14817 MiB               | 29%/71%          |
| qwen3:32b            | 14676 MiB               | 30%/70%          | 15139 MiB               | 32%/68%          |

Disappointed.