Ollama CLI Cheatsheet: ls, serve, run, ps + commands (2026 update)
Updated Ollama command list - ls, ps, run, serve, etc
This Ollama CLI cheatsheet focuses on the commands you use every day (ollama ls, ollama serve, ollama run, ollama ps, model management, and common workflows), with examples you can copy/paste.
It also includes a short “performance knobs” section to help you discover (and then deep-dive) OLLAMA_NUM_PARALLEL and related settings.

This Ollama cheatsheet focuses on CLI commands, model management, and customization, but we also include a few curl calls against the API.
For a full picture of where Ollama fits among local, self-hosted and cloud options—including vLLM, Docker Model Runner, LocalAI and cloud providers—see LLM Hosting: Local, Self-Hosted & Cloud Infrastructure Compared. If you’re comparing different local LLM hosting solutions, check out our comprehensive comparison of Ollama, vLLM, LocalAI, Jan, LM Studio and more. For those seeking alternatives to command-line interfaces, Docker Model Runner offers a different approach to LLM deployment.
Ollama installation (download and CLI install)
- Option 1: Download from Website
- Visit ollama.com and download the installer for your operating system (Mac, Linux, or Windows).
- Option 2: Install via Command Line
- For Mac and Linux users, use the command:
curl -fsSL https://ollama.com/install.sh | sh
- Follow the on-screen instructions and enter your password if prompted.
Ollama system requirements (RAM, storage, CPU)
- Operating System: Mac, Linux, or Windows
- Memory (RAM): 8GB minimum, 16GB or more recommended
- Storage: At least ~10GB of free space (model files can be really big; for more on this, see Move Ollama Models to Different Drive)
- Processor: A relatively modern CPU (from the last 5 years). If you’re curious about how Ollama utilizes different CPU architectures, see our analysis of how Ollama uses Intel CPU Performance and Efficient Cores.
For serious AI workloads, you might want to compare hardware options. We’ve benchmarked NVIDIA DGX Spark vs Mac Studio vs RTX-4080 performance with Ollama, and if you’re considering investing in high-end hardware, our DGX Spark pricing and capabilities comparison provides detailed cost analysis.
Basic Ollama CLI Commands
| Command | Description |
|---|---|
| ollama serve | Starts Ollama on your local system. |
| ollama create <new_model> | Creates a new model from an existing one for customization or training. |
| ollama show <model> | Displays details about a specific model, such as its configuration and release date. |
| ollama run <model> | Runs the specified model, making it ready for interaction. |
| ollama pull <model> | Downloads the specified model to your system. |
| ollama list | Lists all the downloaded models; the same as ollama ls. |
| ollama ps | Shows the currently running models. |
| ollama stop <model> | Stops the specified running model. |
| ollama rm <model> | Removes the specified model from your system. |
| ollama help | Provides help about any command. |
Jump links: Ollama serve command · Ollama run command · Ollama ps command · Ollama CLI basics · Performance knobs (OLLAMA_NUM_PARALLEL) · Parallel requests deep dive
Ollama CLI (what it is)
Ollama CLI is the command-line interface to manage models and run/serve them locally. Most workflows boil down to:
- Start the server: ollama serve
- Run a model: ollama run <model>
- See what’s loaded/running: ollama ps
- Manage models: ollama pull, ollama list, ollama rm
Ollama model management: pull and list models commands
List Models:
ollama list
the same as:
ollama ls
This command lists all the models that have been downloaded to your system, with their file sizes on your HDD/SSD, like:
$ ollama ls
NAME ID SIZE MODIFIED
deepseek-r1:8b 6995872bfe4c 5.2 GB 2 weeks ago
gemma3:12b-it-qat 5d4fa005e7bb 8.9 GB 2 weeks ago
LoTUs5494/mistral-small-3.1:24b-instruct-2503-iq4_NL 4e994e0f85a0 13 GB 3 weeks ago
dengcao/Qwen3-Embedding-8B:Q4_K_M d3ca2355027f 4.7 GB 4 weeks ago
dengcao/Qwen3-Embedding-4B:Q5_K_M 7e8c9ad6885b 2.9 GB 4 weeks ago
qwen3:8b 500a1f067a9f 5.2 GB 5 weeks ago
qwen3:14b bdbd181c33f2 9.3 GB 5 weeks ago
qwen3:30b-a3b 0b28110b7a33 18 GB 5 weeks ago
devstral:24b c4b2fa0c33d7 14 GB 5 weeks ago
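If you want to know how much disk space all your local models use in total, you can pipe that listing through awk. A minimal sketch: the sample listing is embedded in a heredoc here so you can try the pipeline without a live Ollama install (on a real system, replace the heredoc with `ollama ls`), and it assumes all sizes are reported in GB:

```shell
# Sum the SIZE column of `ollama ls`-style output (assumes GB units).
# Replace the heredoc with `ollama ls` on a real install.
total=$(cat <<'EOF' | awk 'NR > 1 { sum += $3 } END { printf "%.1f GB total", sum }'
NAME ID SIZE MODIFIED
deepseek-r1:8b 6995872bfe4c 5.2 GB 2 weeks ago
qwen3:8b 500a1f067a9f 5.2 GB 5 weeks ago
qwen3:14b bdbd181c33f2 9.3 GB 5 weeks ago
EOF
)
echo "$total"
```

With the three sample models above, this prints 19.7 GB total.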
Download a Model: ollama pull
ollama pull mistral-nemo:12b-instruct-2407-q6_K
This command downloads the specified model (e.g., gemma:2b or mistral-nemo:12b-instruct-2407-q6_K) to your system. Model files can be quite large, so keep an eye on the space they use on your hard drive or SSD. You might even want to move all Ollama models from your home directory to another, bigger drive.
Ollama serve command
ollama serve starts the local Ollama server (default HTTP port 11434).
ollama serve
“ollama serve” command with environment variables (shell example):
# set env vars, then start the server
# make ollama available on the host's IP address
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_NUM_PARALLEL=2
ollama serve
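If Ollama runs as a systemd service on Linux, the equivalent settings go into a drop-in unit rather than shell exports. A sketch, assuming the standard ollama.service unit name:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Apply with: sudo systemctl daemon-reload && sudo systemctl restart ollama
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_NUM_PARALLEL=2"
```

You can also create this drop-in interactively with sudo systemctl edit ollama.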
Ollama run command
Run a Model:
ollama run gpt-oss:20b
This command starts the specified model and opens an interactive REPL for interaction. Want to understand how Ollama manages multiple concurrent requests? Learn more about how Ollama handles parallel requests in our detailed analysis.
ollama run starts an interactive session, so in the case of gpt-oss:120b you would see something like:
$ ollama run gpt-oss:120b
>>> Send a message (/? for help)
You can type your questions or commands, and the model will reply:
>>> who are you?
Thinking...
The user asks "who are you?" Simple question. Should respond as ChatGPT, an AI language model, trained by OpenAI,
etc. Provide brief intro. Probably ask if they need help.
...done thinking.
I’m ChatGPT, an AI language model created by OpenAI. I’ve been trained on a wide range of text so I can help
answer questions, brainstorm ideas, explain concepts, draft writing, troubleshoot problems, and much more. Think
of me as a versatile virtual assistant—here to provide information, support, and conversation whenever you need
it. How can I help you today?
>>> Send a message (/? for help)
To exit the interactive Ollama session, press Ctrl+D or type /bye; both give the same result:
>>> /bye
$
Ollama run command examples
To run a model and ask a single question in a non-interactive mode:
printf "Give me 10 bash one-liners for log analysis.\n" | ollama run llama3.2
If you want to see detailed performance statistics for the reply in an Ollama session, run the model with the --verbose (or -v) flag:
$ ollama run gpt-oss:20b --verbose
>>> who are you?
Thinking...
We need to respond to a simple question: "who are you?" The user is asking "who are you?" We can answer that we
are ChatGPT, a large language model trained by OpenAI. We can also mention capabilities. The user likely expects
a brief introduction. We'll keep it friendly.
...done thinking.
I’m ChatGPT, a large language model created by OpenAI. I’m here to help answer questions, offer explanations,
brainstorm ideas, and chat about a wide range of topics—everything from science and history to creative writing
and everyday advice. Just let me know what you’d like to talk about!
total duration: 1.118585707s
load duration: 106.690543ms
prompt eval count: 71 token(s)
prompt eval duration: 30.507392ms
prompt eval rate: 2327.30 tokens/s
eval count: 132 token(s)
eval duration: 945.801569ms
eval rate: 139.56 tokens/s
>>> /bye
$
Yes, that’s right: 139 tokens per second. gpt-oss:20b is very fast. If you, like me, have a GPU with 16GB VRAM, see the speed comparison details in Best LLMs for Ollama on 16GB VRAM GPU.
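The rate lines in the verbose output are just counts divided by durations; you can sanity-check the eval rate from the stats above with awk:

```shell
# eval rate = eval count / eval duration
# 132 tokens / 0.945801569 s
rate=$(awk 'BEGIN { printf "%.2f", 132 / 0.945801569 }')
echo "$rate tokens/s"
```

This prints 139.56 tokens/s, matching the eval rate Ollama reports.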
Tip: If you want the model available over HTTP for multiple apps, start the server with ollama serve and use the API client instead of long interactive sessions.
Ollama stop command
This command stops the specified running model.
ollama stop llama3.1:8b-instruct-q8_0
Ollama evicts idle models automagically after a timeout.
You can configure this timeout; by default it is 5 minutes.
If you don’t want to wait out the remaining time, use the ollama stop command.
You can also kick the model out of VRAM by calling the /generate API endpoint with the parameter keep_alive=0; see below for the description and example.
Ollama ps command
ollama ps shows currently running models and sessions (useful to debug “why is my VRAM full?”).
ollama ps
An example of the ollama ps output is below:
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gpt-oss:20b 17052f91a42e 14 GB 100% GPU 4096 4 minutes from now
As you can see, on my PC gpt-oss:20b fits into my GPU’s 16GB VRAM very well, occupying only 14 GB.
If I execute ollama run gpt-oss:120b and then call ollama ps, the outcome is not that bright:
78% of the layers are on the CPU, and that is with a context window of just 4096 tokens. The split would get even worse if I needed to increase the context.
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gpt-oss:120b a951a23b46a1 66 GB 78%/22% CPU/GPU 4096 4 minutes from now
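If you script around this, a grep-style check on the PROCESSOR column is enough to flag CPU spill. A sketch: the sample output is embedded in a heredoc so it can be tried offline (replace the heredoc with `ollama ps` on a real install):

```shell
# Flag any loaded model that is not 100% on the GPU.
# Replace the heredoc with `ollama ps` on a real install.
spilled=$(cat <<'EOF' | awk 'NR > 1 && !/100% GPU/ { print $1 }'
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gpt-oss:120b a951a23b46a1 66 GB 78%/22% CPU/GPU 4096 4 minutes from now
EOF
)
[ -n "$spilled" ] && echo "partially on CPU: $spilled"
```

With the sample output above, this prints "partially on CPU: gpt-oss:120b".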
Performance knobs (OLLAMA_NUM_PARALLEL)
If you see queueing or timeouts under load, the first knob to learn is OLLAMA_NUM_PARALLEL.
- OLLAMA_NUM_PARALLEL — how many requests Ollama executes in parallel.
- A higher value can increase throughput, but may increase VRAM pressure and latency spikes.
Quick example:
OLLAMA_NUM_PARALLEL=2 ollama serve
For a full explanation (including tuning strategies and failure modes), see How Ollama Handles Parallel Requests.
Releasing Ollama model from VRAM (keep_alive)
When a model is loaded into VRAM (GPU memory), it stays there even after you finish using it. To explicitly release a model from VRAM and free up GPU memory, you can send a request to the Ollama API with keep_alive: 0.
- Release Model from VRAM using curl:
curl http://localhost:11434/api/generate -d '{"model": "MODELNAME", "keep_alive": 0}'
Replace MODELNAME with your actual model name, for example:
curl http://localhost:11434/api/generate -d '{"model": "qwen3:14b", "keep_alive": 0}'
- Release Model from VRAM using Python:
import requests

response = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'qwen3:14b', 'keep_alive': 0}
)
This is particularly useful when:
- You need to free up GPU memory for other applications
- You’re running multiple models and want to manage VRAM usage
- You’ve finished using a large model and want to release resources immediately
Note: The keep_alive parameter controls how long (in seconds) a model stays loaded in memory after the last request. Setting it to 0 immediately unloads the model from VRAM.
Customizing Ollama models (system prompt, Modelfile)
- Set System Prompt: Inside the Ollama REPL, you can set a system prompt to customize the model’s behavior:
>>> /set system For all questions asked answer in plain English avoiding technical jargon as much as possible
>>> /save ipe
>>> /bye
Then, run the customized model:
ollama run ipe
This sets a system prompt and saves the model for future use.
- Create Custom Model File: Create a text file (e.g., custom_model.txt) with the following structure:
FROM llama3.1
SYSTEM [Your custom instructions here]
Then, run:
ollama create mymodel -f custom_model.txt
ollama run mymodel
This creates a customized model based on the instructions in the file.
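A slightly fuller Modelfile sketch: FROM, SYSTEM, and PARAMETER are standard Modelfile directives, but the base model and values here are just example assumptions:

```
# custom_model.txt (Modelfile format)
FROM llama3.1
SYSTEM You are a concise assistant. Answer in plain English.
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

After ollama create mymodel -f custom_model.txt, you can verify the result with ollama show mymodel.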
Using Ollama run command with files (summarize, redirect)
- Summarize Text from a File:
ollama run llama3.2 "Summarize the content of this file in 50 words." < input.txt
This command summarizes the content of input.txt using the specified model.
- Log Model Responses to a File:
ollama run llama3.2 "Tell me about renewable energy." > output.txt
This command saves the model’s response to output.txt.
Ollama CLI use cases (text generation, analysis)
- Text Generation:
  - Summarizing a large text file: ollama run llama3.2 "Summarize the following text:" < long-document.txt
  - Generating content: ollama run llama3.2 "Write a short article on the benefits of using AI in healthcare." > article.txt
  - Answering specific questions: ollama run llama3.2 "What are the latest trends in AI, and how will they affect healthcare?"
- Data Processing and Analysis:
  - Classifying text into positive, negative, or neutral sentiment: ollama run llama3.2 "Analyze the sentiment of this customer review: 'The product is fantastic, but delivery was slow.'"
  - Categorizing text into predefined categories: use similar commands to classify or categorize text based on predefined criteria.
Using Ollama with Python (client and API)
- Install the Ollama Python Library:
pip install ollama
- Generate Text Using Python (this snippet generates text using the specified model and prompt):
import ollama

response = ollama.generate(model='gemma:2b', prompt='what is a qubit?')
print(response['response'])
For advanced Python integration, explore using Ollama’s Web Search API in Python, which covers web search capabilities, tool calling, and MCP server integration. If you’re building AI-powered applications, our AI Coding Assistants comparison can help you choose the right tools for development.
Looking for a web-based interface? Open WebUI provides a self-hosted interface with RAG capabilities and multi-user support. For high-performance production deployments, consider vLLM as an alternative. To compare Ollama with other local and cloud LLM infrastructure choices, see LLM Hosting: Local, Self-Hosted & Cloud Infrastructure Compared.
Useful links
Configuration and Management
Alternatives and Comparisons
- Local LLM Hosting: Complete 2026 Guide - Ollama, vLLM, LocalAI, Jan, LM Studio & More
- vLLM Quickstart: High-Performance LLM Serving
- Docker Model Runner vs Ollama: Which to Choose?
- First Signs of Ollama Enshittification
Performance and Hardware
- How Ollama Handles Parallel Requests
- How Ollama is using Intel CPU Performance and Efficient Cores
- NVIDIA DGX Spark vs Mac Studio vs RTX-4080: Ollama Performance Comparison
- DGX Spark vs. Mac Studio: A Practical, Price-Checked Look at NVIDIA’s Personal AI Supercomputer
Integration and Development
- Using Ollama Web Search API in Python
- AI Coding Assistants Comparison
- Open WebUI: Self-Hosted LLM Interface
- Open-Source Chat UIs for LLMs on Local Ollama Instances
- Constraining LLMs with Structured Output: Ollama, Qwen3 & Python or Go
- Integrating Ollama with Python: REST API and Python Client Examples
- Go SDKs for Ollama - comparison with examples