Ollama Cheatsheet - most useful commands - 2026 update
Here is the list, with examples, of the most useful Ollama commands (an Ollama commands cheatsheet) that I compiled some time ago and last updated in January 2026. Hopefully it will be useful to you too.

This Ollama cheatsheet focuses on CLI commands, model management, and customization, but it also includes a few curl calls.
If you’re comparing different local LLM hosting solutions, check out our comprehensive comparison of Ollama, vLLM, LocalAI, Jan, LM Studio and more. For those seeking alternatives to command-line interfaces, Docker Model Runner offers a different approach to LLM deployment.
Installation
- Option 1: Download from Website
- Visit ollama.com and download the installer for your operating system (Mac, Linux, or Windows).
- Option 2: Install via Command Line
- For Mac and Linux users, use the command:
curl -fsSL https://ollama.com/install.sh | sh
- Follow the on-screen instructions and enter your password if prompted.
System Requirements
- Operating System: Mac, Linux, or Windows
- Memory (RAM): 8GB minimum, 16GB or more recommended
- Storage: At least ~10GB of free space (model files can be very large; see Move Ollama Models to Different Drive for more)
- Processor: A relatively modern CPU (from the last 5 years). If you’re curious about how Ollama utilizes different CPU architectures, see our analysis of how Ollama uses Intel CPU Performance and Efficient Cores.
For serious AI workloads, you might want to compare hardware options. We’ve benchmarked NVIDIA DGX Spark vs Mac Studio vs RTX-4080 performance with Ollama, and if you’re considering investing in high-end hardware, our DGX Spark pricing and capabilities comparison provides detailed cost analysis.
Basic Ollama CLI Commands
| Command | Description |
|---|---|
| ollama serve | Starts Ollama on your local system. |
| ollama create <new_model> | Creates a new model from an existing one for customization or training. |
| ollama show <model> | Displays details about a specific model, such as its configuration and release date. |
| ollama run <model> | Runs the specified model, making it ready for interaction. |
| ollama pull <model> | Downloads the specified model to your system. |
| ollama list | Lists all the downloaded models. The same as ollama ls. |
| ollama ps | Shows the currently running models. |
| ollama stop <model> | Stops the specified running model. |
| ollama rm <model> | Removes the specified model from your system. |
| ollama help | Provides help about any command. |
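Most of these CLI commands have REST counterparts on the local Ollama API (port 11434 by default). Here is a minimal Python sketch, assuming Ollama is running locally with default settings, that uses the /api/tags and /api/ps endpoints backing ollama list and ollama ps:

import requests

BASE = 'http://localhost:11434'

# Installed models (what `ollama list` shows)
for m in requests.get(f'{BASE}/api/tags').json().get('models', []):
    print(m['name'], m.get('size'))

# Currently loaded models (what `ollama ps` shows)
for m in requests.get(f'{BASE}/api/ps').json().get('models', []):
    print('running:', m['name'])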
Model Management
- Download a Model:
  ollama pull mistral-nemo:12b-instruct-2407-q6_K
  This command downloads the specified model (e.g., Gemma 2B or mistral-nemo:12b-instruct-2407-q6_K) to your system. Model files can be very large, so keep an eye on the space they take up on your hard drive or SSD. You might even want to move all Ollama models from your home directory to a bigger drive (see Move Ollama Models to Different Drive). A scripted pull is sketched after this list.
- Run a Model:
  ollama run qwen2.5:32b-instruct-q3_K_S
  This command starts the specified model and opens an interactive REPL for interaction. Want to understand how Ollama manages multiple concurrent requests? Learn more about how Ollama handles parallel requests in our detailed analysis.
- List Models:
  ollama list
  The same as:
  ollama ls
  This command lists all the models that have been downloaded to your system, for example:
  $ ollama ls
  NAME                                                   ID            SIZE    MODIFIED
  deepseek-r1:8b                                         6995872bfe4c  5.2 GB  2 weeks ago
  gemma3:12b-it-qat                                      5d4fa005e7bb  8.9 GB  2 weeks ago
  LoTUs5494/mistral-small-3.1:24b-instruct-2503-iq4_NL   4e994e0f85a0  13 GB   3 weeks ago
  dengcao/Qwen3-Embedding-8B:Q4_K_M                      d3ca2355027f  4.7 GB  4 weeks ago
  dengcao/Qwen3-Embedding-4B:Q5_K_M                      7e8c9ad6885b  2.9 GB  4 weeks ago
  qwen3:8b                                               500a1f067a9f  5.2 GB  5 weeks ago
  qwen3:14b                                              bdbd181c33f2  9.3 GB  5 weeks ago
  qwen3:30b-a3b                                          0b28110b7a33  18 GB   5 weeks ago
  devstral:24b                                           c4b2fa0c33d7  14 GB   5 weeks ago
- Stop a Model:
  ollama stop llama3.1:8b-instruct-q8_0
  This command stops the specified running model.
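Model downloads can also be scripted against the REST API. A minimal sketch, assuming POST /api/pull accepts a "model" field (older versions used "name") and streams one JSON status object per line; qwen3:8b is just an example model:

import json
import requests

# Stream download progress for a model, similar to `ollama pull`
with requests.post(
    'http://localhost:11434/api/pull',
    json={'model': 'qwen3:8b'},
    stream=True,
) as r:
    for line in r.iter_lines():
        if line:
            status = json.loads(line)
            print(status.get('status'), status.get('completed', ''), status.get('total', ''))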
Releasing Model from VRAM
When a model is loaded into VRAM (GPU memory), it stays there for a while after you finish using it (by default, about five minutes). To explicitly release a model from VRAM and free up GPU memory, you can send a request to the Ollama API with keep_alive: 0.
- Release Model from VRAM using curl:
curl http://localhost:11434/api/generate -d '{"model": "MODELNAME", "keep_alive": 0}'
Replace MODELNAME with your actual model name, for example:
curl http://localhost:11434/api/generate -d '{"model": "qwen3:14b", "keep_alive": 0}'
- Release Model from VRAM using Python:
import requests

# An empty generate request with keep_alive=0 unloads the model from VRAM
response = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'qwen3:14b', 'keep_alive': 0},
)
This is particularly useful when:
- You need to free up GPU memory for other applications
- You’re running multiple models and want to manage VRAM usage
- You’ve finished using a large model and want to release resources immediately
Note: The keep_alive parameter controls how long (in seconds, or as a duration string like "10m") a model stays loaded in memory after the last request. Setting it to 0 immediately unloads the model from VRAM.
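Beyond immediate unloading, the same parameter can keep a model resident longer than the default. A minimal sketch, assuming the keep_alive formats described in the Ollama docs (a duration string, a number of seconds, 0 to unload, or a negative value to keep the model loaded indefinitely):

import requests

def set_keep_alive(model: str, keep_alive):
    # An empty generate request just loads/unloads the model and applies keep_alive
    return requests.post(
        'http://localhost:11434/api/generate',
        json={'model': model, 'keep_alive': keep_alive},
    ).json()

set_keep_alive('qwen3:14b', '30m')  # keep loaded for 30 minutes after the last request
set_keep_alive('qwen3:14b', -1)     # keep loaded indefinitely
set_keep_alive('qwen3:14b', 0)      # unload immediately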
Customizing Models
- Set System Prompt: Inside the Ollama REPL, you can set a system prompt to customize the model’s behavior:
  >>> /set system For all questions asked answer in plain English avoiding technical jargon as much as possible
  >>> /save ipe
  >>> /bye
  Then, run the customized model:
  ollama run ipe
  This sets a system prompt and saves the customized model for future use (a per-request Python equivalent is sketched after this list).
- Create Custom Model File: Create a text file (e.g., custom_model.txt) with the following structure:
  FROM llama3.1
  SYSTEM [Your custom instructions here]
  Then, run:
  ollama create mymodel -f custom_model.txt
  ollama run mymodel
  This creates a customized model based on the instructions in the file.
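If you prefer not to save a separate model, a system prompt can also be supplied per request from Python. A minimal sketch using the ollama package introduced in the Python section below (the user question is just an example):

import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[
        # Same instruction as the /set system example above
        {'role': 'system', 'content': 'For all questions asked answer in plain English avoiding technical jargon as much as possible'},
        {'role': 'user', 'content': 'Explain how SSDs wear out.'},
    ],
)
print(response['message']['content'])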
Using Ollama with Files
- Summarize Text from a File:
  ollama run llama3.2 "Summarize the content of this file in 50 words." < input.txt
  This command summarizes the content of input.txt using the specified model.
- Log Model Responses to a File:
  ollama run llama3.2 "Tell me about renewable energy." > output.txt
  This command saves the model’s response to output.txt (a Python version of this file round trip is sketched below).
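The same file round trip can be done from Python instead of shell redirection. A minimal sketch, assuming input.txt exists in the current directory and the ollama package is installed:

import ollama

# Read the source text and ask the model to summarize it
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

response = ollama.generate(
    model='llama3.2',
    prompt=f'Summarize the following text in 50 words:\n\n{text}',
)

# Save the model's response, mirroring the shell redirection above
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(response['response'])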
Common Use Cases
- Text Generation:
  - Summarizing a large text file:
    ollama run llama3.2 "Summarize the following text:" < long-document.txt
  - Generating content:
    ollama run llama3.2 "Write a short article on the benefits of using AI in healthcare." > article.txt
  - Answering specific questions:
    ollama run llama3.2 "What are the latest trends in AI, and how will they affect healthcare?"
- Data Processing and Analysis:
  - Classifying text into positive, negative, or neutral sentiment:
    ollama run llama3.2 "Analyze the sentiment of this customer review: 'The product is fantastic, but delivery was slow.'"
  - Categorizing text into predefined categories: Use similar commands to classify or categorize text based on predefined criteria. (A batched Python version of the sentiment example is sketched below.)
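When you have more than a couple of reviews, the same classification is easy to script. A minimal sketch, with a hypothetical list of reviews and the ollama package from the next section:

import ollama

reviews = [  # hypothetical sample data
    'The product is fantastic, but delivery was slow.',
    'Terrible support, I want a refund.',
]

for review in reviews:
    response = ollama.generate(
        model='llama3.2',
        prompt=f'Classify the sentiment of this review as positive, negative, or neutral. '
               f'Answer with one word only.\n\nReview: {review}',
    )
    print(review, '->', response['response'].strip())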
Using Ollama with Python
- Install Ollama Python Library:
  pip install ollama
- Generate Text Using Python:
  This code snippet generates text using the specified model and prompt (a streaming variant is sketched below).
  import ollama
  response = ollama.generate(model='gemma:2b', prompt='what is a qubit?')
  print(response['response'])
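For longer answers you may prefer to stream tokens as they arrive instead of waiting for the full response. A minimal sketch using the same library with stream=True:

import ollama

# Stream the answer chunk by chunk instead of waiting for the full response
stream = ollama.generate(model='gemma:2b', prompt='what is a qubit?', stream=True)
for chunk in stream:
    print(chunk['response'], end='', flush=True)
print()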
For advanced Python integration, explore using Ollama’s Web Search API in Python, which covers web search capabilities, tool calling, and MCP server integration. If you’re building AI-powered applications, our AI Coding Assistants comparison can help you choose the right tools for development.
Looking for a web-based interface? Open WebUI provides a self-hosted interface with RAG capabilities and multi-user support. For high-performance production deployments, consider vLLM as an alternative.
Useful links
Alternatives and Comparisons
- Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More
- vLLM Quickstart: High-Performance LLM Serving
- Docker Model Runner vs Ollama: Which to Choose?
- First Signs of Ollama Enshittification
Performance and Hardware
- How Ollama Handles Parallel Requests
- How Ollama is using Intel CPU Performance and Efficient Cores
- NVIDIA DGX Spark vs Mac Studio vs RTX-4080: Ollama Performance Comparison
- DGX Spark vs. Mac Studio: A Practical, Price-Checked Look at NVIDIA’s Personal AI Supercomputer
Integration and Development
- Using Ollama Web Search API in Python
- AI Coding Assistants Comparison
- Open WebUI: Self-Hosted LLM Interface
- Open-Source Chat UIs for LLMs on Local Ollama Instances
- Constraining LLMs with Structured Output: Ollama, Qwen3 & Python or Go
- Integrating Ollama with Python: REST API and Python Client Examples
- Go SDKs for Ollama - comparison with examples