Notes on the margins
Rost Glukhov. Personal site and technical blog
Claude Code install and config for Ollama, llama.cpp, pricing

Agentic coding, now with local model backends.

Claude Code is not autocomplete with better marketing. It is an agentic coding tool: it reads your codebase, edits files, runs commands, and integrates with your development tools.

Hermes AI Assistant - Install, Setup, Workflow, and Troubleshooting

Hermes Agent install and quickstart for devs

Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.

TGI - Text Generation Inference - Install, Config, Troubleshoot

Install TGI, ship fast, debug faster

Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference block, but it is the one that has already learned how production breaks…
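
To make that concrete, here is a minimal Python call against TGI's native /generate endpoint, assuming a container already listening on localhost:8080 (the usual -p 8080:80 docker mapping); the prompt and parameters are illustrative:

```python
import requests

# Assumes a TGI container published on localhost:8080; adjust for your setup.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain the KV cache in one sentence.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```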

16 GB VRAM LLM benchmarks with llama.cpp (speed and context)

llama.cpp token speed on 16 GB VRAM (tables).

Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.
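
A rough sanity check outside formal benchmarks: time a completion against llama.cpp's bundled llama-server (OpenAI-compatible API, default port 8080) and divide by the reported completion tokens. The model field is a placeholder, since llama-server serves whatever model it was launched with:

```python
import time
import requests

# Quick tokens-per-second probe against a local llama-server instance.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "local",  # placeholder; llama-server uses the loaded model
    "messages": [{"role": "user", "content": "Write 200 words about GPUs."}],
    "max_tokens": 256,
}
start = time.perf_counter()
data = requests.post(url, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start
tokens = data["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.1f} tok/s")
```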

RTX 5090 in Australia, March 2026: Pricing and Stock Reality

RTX 5090 in AU is scarce and overpriced

Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.

Remote Ollama access via Tailscale or WireGuard, no public ports

Remote Ollama access without public ports

Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
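
Once the daemon is reachable over the tailnet, a quick probe like this confirms your apps still get through while nothing is exposed publicly. The 100.x address is a placeholder for the node's address from `tailscale ip`:

```python
import requests

# List installed models over the tailnet; replace the placeholder IP
# with your node's address from `tailscale ip -4`.
OLLAMA = "http://100.101.102.103:11434"

resp = requests.get(f"{OLLAMA}/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"])
```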

Structured Logging in Go with slog for Observability and Alerting

Queryable JSON logs that connect to traces.

Logs are a debugging interface you can still use when the system is on fire. The problem is that plain text logs age poorly: as soon as you need filtering, aggregation, and alerting, you start parsing sentences.

Ollama in Docker Compose with GPU and Persistent Model Storage

Compose-first Ollama server with GPU and persistence.

Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
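
As a sketch of that "service" mindset from the client side, this waits for the Compose-managed endpoint and then pulls a model into the persistent volume; the model tag is just an example:

```python
import json
import requests

# After `docker compose up -d`: confirm the endpoint is up, then pull a
# model into the persistent volume. The tag is an example; use your own.
OLLAMA = "http://localhost:11434"

requests.get(f"{OLLAMA}/api/version", timeout=10).raise_for_status()

with requests.post(f"{OLLAMA}/api/pull",
                   json={"model": "llama3.1:8b"}, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))
```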

Ollama behind a reverse proxy with Caddy or Nginx for HTTPS streaming

HTTPS Ollama without breaking streaming responses.

Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
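
A quick way to verify the proxy is not buffering: stream a generation through the HTTPS hostname and watch tokens arrive incrementally rather than in one burst. ollama.example.com and the model tag are placeholders:

```python
import json
import requests

# Tokens should print as they arrive; a long pause followed by the full
# answer usually means the proxy is buffering the response.
resp = requests.post(
    "https://ollama.example.com/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Count to ten."},
    stream=True,
    timeout=120,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)
```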

Text embeddings for RAG and search - Python, Ollama, OpenAI-compatible APIs

RAG embeddings - Python, Ollama, OpenAI APIs.

If you are working through retrieval-augmented generation (RAG), this post walks through text embeddings in plain terms: what they are, how they fit search and retrieval, and how to call two common local setups from Python, using Ollama or an OpenAI-compatible HTTP API (the kind many llama.cpp-based servers expose).
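
A minimal sketch of the Ollama path: embed two strings and compare them with cosine similarity. nomic-embed-text is one common embedding model; substitute whatever you have pulled locally:

```python
import math
import requests

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint; assumes the model is already pulled.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine(embed("local LLM hosting"),
             embed("running models on your own GPU")))
```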

Netlify for Hugo & static sites: pricing, free tier, and alternatives

Git-based deploys, CDN, credits, and trade-offs.

Netlify is one of the most developer-friendly ways to ship Hugo sites and modern web apps with a production-grade workflow: preview URLs for every pull request, atomic deploys, a global CDN, and optional serverless and edge capabilities.

Apache Flink on K8s and Kafka: PyFlink, Go, ops, and managed pricing

Stateful streaming, checkpoints, K8s, PyFlink, Go.

Apache Flink is a framework for stateful computations over unbounded and bounded data streams.
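
A minimal PyFlink sketch of that idea, with a bounded in-memory collection standing in for a Kafka source so the pipeline shape is visible without a cluster:

```python
from pyflink.datastream import StreamExecutionEnvironment

# Count log levels: map to (level, 1), key by level, reduce to sums.
env = StreamExecutionEnvironment.get_execution_environment()

(env.from_collection(["error", "info", "error", "warn"])
    .map(lambda level: (level, 1))
    .key_by(lambda pair: pair[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
    .print())

env.execute("log-level-counts")
```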

Neo4j graph database for GraphRAG, install, Cypher, vectors, ops

Graphs, Cypher, vectors, and ops hardening.

Neo4j is what you reach for when the relationships are the data. If your domain looks like a whiteboard of circles and arrows, forcing it into tables is painful.
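
A small sketch with the official Python driver shows the circles-and-arrows model directly in Cypher; the URI and credentials are placeholders:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "your-password"))

with driver.session() as session:
    # Relationships are the data: record that one service depends on another.
    session.run(
        "MERGE (a:Service {name: $a}) "
        "MERGE (b:Service {name: $b}) "
        "MERGE (a)-[:DEPENDS_ON]->(b)",
        a="api", b="postgres",
    )
    result = session.run(
        "MATCH (s:Service)-[:DEPENDS_ON]->(d) "
        "RETURN s.name AS svc, d.name AS dep")
    for record in result:
        print(record["svc"], "->", record["dep"])

driver.close()
```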

IndexNow explained - notify search engines when you publish

Push URL updates to search engines after deploy.

Static sites and blogs change whenever you deploy. Search engines that support IndexNow can learn about those changes without waiting for the next blind crawl.
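
The protocol itself is one JSON POST after deploy. In this sketch, example.com and the key are placeholders, and the key must also be served from your site (for example at https://example.com/your-indexnow-key.txt) so the endpoint can verify ownership:

```python
import requests

# Submit a batch of changed URLs to the shared IndexNow endpoint.
payload = {
    "host": "example.com",
    "key": "your-indexnow-key",
    "urlList": [
        "https://example.com/posts/new-article/",
        "https://example.com/posts/updated-article/",
    ],
}
resp = requests.post("https://api.indexnow.org/indexnow",
                     json=payload, timeout=15)
print(resp.status_code)  # 200 or 202 means the submission was accepted
```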

Hosted email for custom domains compared - Workspace, Microsoft 365, Zoho, Proton, WorkMail

Pick hosted email for your domain without regret.

Putting email on your own domain sounds like a weekend DNS task. In practice it is a small distributed system with a twenty-year legacy.

SGLang QuickStart: Install, Configure, and Serve LLMs via OpenAI API

Serve open models fast with SGLang.

SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
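
Because SGLang speaks the OpenAI API, the standard openai Python client works against it unchanged; the port assumes the launch_server default, and the model name is a placeholder for whatever you loaded:

```python
from openai import OpenAI

# Assumes `python -m sglang.launch_server --model-path ...` on port 30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "One-line summary of SGLang?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```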
