Ollama Enshittification - the Early Signs

My view on the current state of Ollama development

Ollama has quickly become one of the most popular tools for running LLMs locally. Its simple CLI and streamlined model management have made it a go-to option for developers who want to work with AI models outside the cloud.

Chat UIs for Local Ollama Instances

A quick overview of the most prominent UIs for Ollama in 2025

A locally hosted Ollama instance lets you run large language models on your own machine, but using it from the command line isn’t user-friendly. Several open-source projects provide ChatGPT-style interfaces that connect to a local Ollama instance.

Search vs Deepsearch vs Deep Research

How different are they?

  • Search is best for quick, straightforward information retrieval using keywords.
  • Deep Search excels at understanding context and intent, delivering more relevant and comprehensive results for complex queries.
  • Deep Research goes further: it plans a multi-step investigation, reads many sources, and synthesizes the findings into a structured report.

How Ollama Handles Parallel Requests

Understand Ollama concurrency, queueing, and how to tune OLLAMA_NUM_PARALLEL for stable parallel requests.

This guide explains how Ollama handles parallel requests (concurrency, queuing, and resource limits) and how to tune that behavior with the OLLAMA_NUM_PARALLEL environment variable and related settings.
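As a minimal sketch of the tuning described above, the relevant environment variables can be exported before starting the server. The values shown are illustrative only; the right numbers depend on your model sizes and available memory.

```shell
# Let each loaded model serve up to 4 requests concurrently
# (illustrative value; raise or lower to fit your VRAM).
export OLLAMA_NUM_PARALLEL=4

# Cap the number of queued requests; overflow requests are
# rejected once the queue is full (illustrative value).
export OLLAMA_MAX_QUEUE=256

# Restart the server so the new settings take effect:
# ollama serve
```

Because these settings are read at startup, changing them requires restarting the Ollama server process.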