Adding NVIDIA GPU Support to Docker Model Runner
Enable GPU acceleration for Docker Model Runner with NVIDIA CUDA support
Docker Model Runner is Docker’s official tool for running AI models locally, but enabling NVIDIA GPU acceleration requires specific configuration.
Unlike standard docker run commands, docker model run doesn’t support --gpus or -e flags, so GPU support must be configured at the Docker daemon level and during runner installation.
If you’re looking for an alternative LLM hosting solution with easier GPU configuration, consider Ollama, which has built-in GPU support and simpler setup. However, Docker Model Runner offers better integration with Docker’s ecosystem and OCI artifact distribution.
Prerequisites
Before configuring GPU support, ensure you have:
- NVIDIA GPU with compatible drivers installed. For help choosing the right GPU for AI workloads, see our guide on Comparing NVidia GPU specs suitability for AI.
- NVIDIA Container Toolkit installed (see NVIDIA RTX support section)
- Docker Model Runner installed (can be reinstalled with GPU support)
Verify your GPU is accessible:
nvidia-smi
Test Docker GPU access:
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubi8 nvidia-smi
For more Docker commands and configuration options, see our Docker Cheatsheet.
Step 1: Configure Docker Daemon for NVIDIA Runtime
Docker Model Runner requires the NVIDIA runtime to be set as the default runtime in the Docker daemon configuration.
Find NVIDIA Container Runtime Path
First, locate where nvidia-container-runtime is installed:
which nvidia-container-runtime
This typically outputs /usr/bin/nvidia-container-runtime. Note this path for the next step.
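If you plan to script the configuration, you can capture the path in a shell variable and fail early when the toolkit is missing. A minimal sketch for use inside a script (the RUNTIME_PATH variable name is only illustrative):
# Locate the NVIDIA container runtime, abort if it is not installed
RUNTIME_PATH=$(which nvidia-container-runtime)
if [ -z "$RUNTIME_PATH" ]; then
  echo "nvidia-container-runtime not found - install the NVIDIA Container Toolkit first" >&2
  exit 1
fi
echo "Using NVIDIA container runtime at: $RUNTIME_PATH"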
Configure Docker Daemon
Create or update /etc/docker/daemon.json to set NVIDIA as the default runtime:
sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
EOF
Important: If which nvidia-container-runtime returned a different path, update the "path" value in the JSON configuration accordingly.
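Also note that the tee command above replaces any existing /etc/docker/daemon.json. If the file already contains other settings (log options, registry mirrors, and so on), a safer sketch is to merge the new keys instead, assuming jq is installed and the file already exists:
# Back up the current config, then recursively merge in the NVIDIA runtime keys
sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
sudo jq '. * {"default-runtime": "nvidia", "runtimes": {"nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}}}' /etc/docker/daemon.json.bak | sudo tee /etc/docker/daemon.json > /dev/null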
Restart Docker Service
Apply the configuration by restarting Docker:
sudo systemctl restart docker
Verify Configuration
Confirm the NVIDIA runtime is configured:
docker info | grep -i runtime
You should see Default Runtime: nvidia in the output.
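On recent Docker versions you can also query just that field with a Go template, which is easier to use in scripts; it should print nvidia:
docker info --format '{{.DefaultRuntime}}'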
Step 2: Install Docker Model Runner with GPU Support
Docker Model Runner must be installed or reinstalled with explicit GPU support. The runner container itself needs to be the CUDA-enabled version.
Stop Current Runner (if running)
If Docker Model Runner is already installed, stop it first:
docker model stop-runner
Install/Reinstall with CUDA Support
Install or reinstall Docker Model Runner with CUDA GPU support:
docker model reinstall-runner --gpu cuda
This command:
- Pulls the CUDA-enabled version (docker/model-runner:latest-cuda) instead of the CPU-only version
- Configures the runner container to use the NVIDIA runtime
- Enables GPU acceleration for all models
Note: If you’ve already installed Docker Model Runner without GPU support, you must reinstall it with the --gpu cuda flag; configuring the Docker daemon alone is not enough, because the runner container itself needs to be the CUDA-enabled image.
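To double-check which image the runner is actually using, inspect its container (docker-model-runner is the same container name used in the verification steps below); with CUDA support the image tag should end in -cuda:
docker inspect docker-model-runner --format '{{.Config.Image}}'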
Available GPU Backends
Docker Model Runner supports multiple GPU backends:
- cuda - NVIDIA CUDA (most common for NVIDIA GPUs)
- rocm - AMD ROCm (for AMD GPUs)
- musa - Moore Threads MUSA
- cann - Huawei CANN
- auto - Automatic detection (default, may not work correctly)
- none - CPU only
For NVIDIA GPUs, always use --gpu cuda explicitly.
Step 3: Verify GPU Access
After installation, verify that Docker Model Runner can access your GPU.
Check Runner Container GPU Access
Test GPU access from within the Docker Model Runner container:
docker exec docker-model-runner nvidia-smi
This should display your GPU information, confirming the container has GPU access.
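For a more compact check, nvidia-smi can report just the fields you care about in CSV form:
docker exec docker-model-runner nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv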
Check Runner Status
Verify Docker Model Runner is running:
docker model status
You should see the runner is active with llama.cpp support.
Step 4: Test Model with GPU
Run a model and verify it’s using the GPU.
Run a Model
Start a model inference:
docker model run ai/qwen3:14B-Q6_K "who are you?"
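Besides the CLI, Docker Model Runner also exposes an OpenAI-compatible HTTP API, which is handy for exercising GPU-accelerated inference from other tools. The base URL below is an assumption (port 12434 is a common default, but both the port and path depend on how the runner was installed, so verify them against the Docker Model Runner documentation for your setup):
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/qwen3:14B-Q6_K", "messages": [{"role": "user", "content": "who are you?"}]}'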
Verify GPU Usage in Logs
Check the Docker Model Runner logs for GPU confirmation:
docker model logs | grep -i cuda
You should see messages indicating GPU usage:
- using device CUDA0 (NVIDIA GeForce RTX 4080) - GPU device detected
- offloaded 41/41 layers to GPU - Model layers loaded on the GPU
- CUDA0 model buffer size = 10946.13 MiB - GPU memory allocated for model weights
- CUDA0 KV buffer size = 640.00 MiB - Key-value cache on the GPU
- CUDA0 compute buffer size = 306.75 MiB - Compute buffer on the GPU
Monitor GPU Usage
In another terminal, monitor GPU usage in real-time:
nvidia-smi -l 1
You should see GPU memory usage and utilization increase when the model is running.
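If you prefer a scriptable view with just the relevant numbers, the query interface works too (refreshing once per second):
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1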
For more advanced GPU monitoring options and tools, see our guide on GPU monitoring applications in Linux / Ubuntu.
Troubleshooting
Model Still Using CPU
If the model is still running on CPU:
- Verify the Docker daemon configuration:
docker info | grep -i runtime
Should show Default Runtime: nvidia
- Check the runner container runtime:
docker inspect docker-model-runner | grep -A 2 '"Runtime"'
Should show "Runtime": "nvidia"
- Reinstall the runner with GPU support:
docker model reinstall-runner --gpu cuda
- Check the logs for errors:
docker model logs | tail -50
GPU Not Detected
If the GPU is not detected:
- Verify the NVIDIA Container Toolkit is installed (a configuration helper is shown after this list):
dpkg -l | grep nvidia-container-toolkit
- Test GPU access with standard Docker:
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubi8 nvidia-smi
For troubleshooting Docker issues, refer to our Docker Cheatsheet.
- Check the NVIDIA drivers:
nvidia-smi
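If the toolkit is installed but Docker was never configured to use it, the toolkit ships a helper that writes the nvidia runtime entry into /etc/docker/daemon.json for you; note that it does not necessarily set default-runtime, so re-check Step 1 and restart Docker afterwards:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker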
Performance Issues
If GPU performance is poor:
- Check GPU utilization:
nvidia-smi
Look for a high GPU utilization percentage
- Verify model layers are on the GPU:
docker model logs | grep "offloaded.*layers to GPU"
All layers should be offloaded to the GPU
- Check for memory issues:
nvidia-smi
Ensure GPU memory isn’t exhausted
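When working through these checks, it can save time to run them all at once. A minimal diagnostic sketch, using the same container name and commands as the steps above:
#!/usr/bin/env bash
# Quick GPU diagnostics for Docker Model Runner
echo "== Host GPU =="
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv,noheader || echo "nvidia-smi failed - check drivers"
echo "== Docker default runtime =="
docker info --format '{{.DefaultRuntime}}'
echo "== Runner container runtime =="
docker inspect docker-model-runner --format '{{.HostConfig.Runtime}}' 2>/dev/null || echo "runner container not found"
echo "== Layers offloaded to GPU (latest model load) =="
docker model logs | grep -i "offloaded.*layers to GPU" | tail -1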
Best Practices
- Always specify the GPU backend explicitly: use --gpu cuda instead of --gpu auto for NVIDIA GPUs to ensure correct configuration.
- Verify configuration after changes: always check docker info | grep -i runtime after modifying Docker daemon settings.
- Monitor GPU usage: use nvidia-smi to monitor GPU memory and utilization during model inference. For more advanced monitoring tools, see our guide on GPU monitoring applications in Linux / Ubuntu.
- Check logs regularly: review docker model logs to ensure models are using GPU acceleration.
- Use appropriate model sizes: ensure your GPU has sufficient memory for the model, and prefer quantized models (Q4, Q5, Q6, Q8) for better GPU memory efficiency; a rough sizing example follows this list. For help choosing the right GPU for your AI workloads, see our guide on Comparing NVidia GPU specs suitability for AI.
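As a rough sizing rule for that last point: weight memory is approximately parameter count × bits per weight / 8. A 14B-parameter model at Q6_K (about 6.6 bits per weight) therefore needs roughly 14 × 10⁹ × 6.6 / 8 ≈ 11.5 GB for the weights alone, which lines up with the ~10.9 GiB CUDA0 model buffer seen in the logs above; the KV cache and compute buffers come on top of that, so leave some headroom.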
Useful Links
- Docker Model Runner Cheatsheet
- Docker Model Runner Official Documentation
- NVIDIA Container Toolkit Installation Guide
- Docker Model Runner vs Ollama Comparison
- Ollama Cheatsheet - Alternative LLM hosting solution with built-in GPU support
- Docker Cheatsheet - Complete reference for Docker commands and configuration
- GPU monitoring applications in Linux / Ubuntu - List and comparison of NVIDIA GPU monitoring tools
- Comparing NVidia GPU specs suitability for AI - Guide to choosing the right GPU for AI workloads