Ollama

How Ollama Handles Parallel Requests

Configuring Ollama for parallel request execution.

When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.
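A quick way to see this in practice is to start the server with OLLAMA_NUM_PARALLEL set (e.g. `OLLAMA_NUM_PARALLEL=2 ollama serve`) and fire two requests at once. Below is a minimal Python sketch, assuming a local Ollama on the default port and a model tag you have already pulled (the tag here is an assumption):

```python
import json
import threading
import urllib.request

# Default local Ollama endpoint; OLLAMA_NUM_PARALLEL on the server side
# controls how many requests a loaded model serves concurrently.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1"  # assumed model tag; use any model you have pulled

def ask(prompt: str) -> None:
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"][:80])

# Fire two requests at the same time; with OLLAMA_NUM_PARALLEL=2 they are
# served in parallel, otherwise the second one waits in the queue.
threads = [threading.Thread(target=ask, args=(p,))
           for p in ("Why is the sky blue?", "Why is grass green?")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the default settings you should see the two responses arrive one after the other; with parallelism enabled (and enough VRAM) they come back interleaved.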

Testing Deepseek-R1 on Ollama

Comparing two DeepSeek-R1 models with their two base counterparts.

DeepSeek’s first generation of reasoning models, with performance comparable to OpenAI o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
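A side-by-side comparison is easy to script against the Ollama API. Here is a sketch, assuming the Llama- and Qwen-based distills are paired with their base models; the exact tags and the pairings are assumptions, so adjust them to whatever you have pulled locally:

```python
import json
import urllib.request

URL = "http://localhost:11434/api/generate"

# Assumed pairings: deepseek-r1:8b distilled from Llama, deepseek-r1:7b
# from Qwen; substitute the tags you actually have pulled.
PAIRS = [("deepseek-r1:8b", "llama3.1:8b"),
         ("deepseek-r1:7b", "qwen2.5:7b")]

PROMPT = ("A bat and a ball cost $1.10 in total; the bat costs $1 more "
          "than the ball. How much does the ball cost?")

def generate(model: str, prompt: str) -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Ask each distilled model and its base the same question and eyeball the answers.
for distilled, base in PAIRS:
    for model in (distilled, base):
        print(f"--- {model} ---")
        print(generate(model, PROMPT)[:200])
```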

Self-hosting Perplexica - with Ollama

Running a copilot-style service locally? Easy!

That’s very exciting! Instead of calling Copilot or perplexity.ai and telling the whole world what you are after, you can now host a similar service on your own PC or laptop!
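For illustration, this is roughly what a query against a locally running Perplexica instance could look like. The port, endpoint, and field names below are assumptions based on Perplexica’s documented search API and may differ in the version you deploy, so check its docs:

```python
import json
import urllib.request

# Assumed endpoint of a local Perplexica deployment.
URL = "http://localhost:3001/api/search"

payload = {
    # Model names and payload shape are assumptions; see Perplexica's docs.
    "chatModel": {"provider": "ollama", "name": "llama3.1"},
    "embeddingModel": {"provider": "ollama", "name": "nomic-embed-text"},
    "focusMode": "webSearch",
    "query": "What is Perplexica?",
    "history": [],
}

req = urllib.request.Request(URL, data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())
    print(answer["message"])  # the generated answer; key name is an assumption
```

The nice part is that both the search frontend and the model it talks to stay on your machine.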

Gemma2 vs Qwen2 vs Mistral Nemo vs...

Testing logical fallacy detection

Several new LLMs have been released recently. Exciting times. Let’s test how well they perform at detecting logical fallacies.
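The test itself can be a simple loop over models via the Ollama API: give each one the same flawed argument and see whether it names the fallacy. A minimal sketch, with the model tags as assumptions (use whatever you have pulled):

```python
import json
import urllib.request

URL = "http://localhost:11434/api/generate"

# Assumed model tags; substitute the ones available locally.
MODELS = ["gemma2", "qwen2", "mistral-nemo"]

STATEMENT = ("Every time I wash my car it rains, "
             "so washing my car causes rain.")

PROMPT = ("Does the following argument contain a logical fallacy? "
          f"If so, name it and explain briefly.\n\n{STATEMENT}")

def generate(model: str) -> str:
    payload = json.dumps({"model": model, "prompt": PROMPT, "stream": False}).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Same statement to every model; a good answer names post hoc ergo propter hoc.
for model in MODELS:
    print(f"--- {model} ---")
    print(generate(model))
```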

LLM Frontends

Not so many to choose from, but still...

When I started experimenting with LLMs, their UIs were still under active development; by now, some of them are really good.