Memory allocation and model scheduling in the new Ollama version - v0.12.1

My own test of Ollama model scheduling


Here I am comparing how much VRAM the new version of Ollama allocates for a model versus the previous version. In my tests, the new version is worse.

As stated on the official website, the new Ollama release has new model scheduling with:

Maximizing GPU utilization:
Ollama’s new memory management allocates more memory to the GPU,
increasing token generation and processing speeds

and some examples are given, such as:

Long context

    GPU: 1x NVIDIA GeForce RTX 4090
    Model: gemma3:12b
    Context length: 128k

|                        | Old            | New            |
|------------------------|----------------|----------------|
| Token generation speed | 52.02 tokens/s | 85.54 tokens/s |
| VRAM used              | 19.9 GiB       | 21.4 GiB       |
| Layers loaded on GPU   | 48/49          | 49/49          |

Here I am testing how it works on my PC. My results are very different from the official ones; they are essentially the opposite. I have a slightly different hardware configuration and tested different models, but the results are not better at all, and often worse. This echoes the post First Signs of Ollama Enshittification.

Image: Ollama llamas, from the blog post on the Ollama website.

TL;DR

I’ve tested how the new version of Ollama schedules LLMs that do not fit into my 16 GB of VRAM:

  • mistral-small3.2:24b
  • qwen3:30b-a3b
  • gemma3:27b
  • qwen3:32b

I ran `ollama run <modelname>`, asked some simple question like “who are you?”, and in a separate terminal checked the output of `ollama ps` and `nvidia-smi`. All pretty simple; a script automating it is sketched below.
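For reproducibility, here is a minimal shell sketch of that test loop. It is my own automation rather than anything from the Ollama docs: the non-interactive prompt form of `ollama run` and the `ollama stop` cleanup between models are assumptions matching my manual procedure, and it assumes a single NVIDIA GPU with `ollama` and `nvidia-smi` on PATH.

```sh
#!/bin/sh
# Sketch of the manual test loop: load each model, ask a trivial
# question, then record how Ollama split the model between CPU and GPU.
for model in mistral-small3.2:24b qwen3:30b-a3b gemma3:27b qwen3:32b; do
  echo "=== $model ==="

  # Non-interactive equivalent of typing the question into `ollama run`
  ollama run "$model" "who are you?" > /dev/null

  # PROCESSOR column shows the CPU/GPU split, e.g. "41%/59% CPU/GPU";
  # the model stays loaded for a few minutes after `ollama run` exits
  ollama ps

  # VRAM actually allocated on the GPU, in MiB
  nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits

  # Unload the model so the next measurement starts clean
  ollama stop "$model"
done
```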

Only qwen3:30b-a3b showed the same CPU/GPU split; the three other models were pushed further onto the CPU in the new version. In my tests, to my disappointment, the new version of Ollama is worse, and these results contradict the post on the Ollama blog.

Detailed comparison data

| Model                | Old ver: VRAM allocated | Old ver: CPU/GPU | New ver: VRAM allocated | New ver: CPU/GPU |
|----------------------|-------------------------|------------------|-------------------------|------------------|
| mistral-small3.2:24b | 14489 MiB               | 41%/59%          | 14249 MiB               | 44%/56%          |
| qwen3:30b-a3b        | 15065 MiB               | 21%/79%          | 14867 MiB               | 21%/79%          |
| gemma3:27b           | 13771 MiB               | 28%/72%          | 14817 MiB               | 29%/71%          |
| qwen3:32b            | 14676 MiB               | 30%/70%          | 15139 MiB               | 32%/68%          |

Disappointed.