Self-Hosting

Run Docker Compose as a Linux Service with systemd

Docker Compose on a Linux server should start on boot, stop cleanly on shutdown, and survive reboots without manual intervention.

Install Docker on Ubuntu: APT, Snap, Rootless — Complete Guide 2026

Installing Docker on Ubuntu should be simple, but in practice several Docker-shaped options compete for the same command name, each with different packaging, upgrade behavior, and security implications.

Ubuntu APT Troubleshooting: Fix Broken Packages, Holds, and GPG Errors

APT failures are common on long-lived Ubuntu machines, and they usually appear after a release upgrade, a third-party repository change, a removed PPA, a manually installed .deb, or an interrupted package installation.

Unload All llama.cpp Router Models Without Restarting

llama.cpp router mode is one of the most useful changes to llama-server in years. It finally gives local LLM operators something close to the model management experience people expect from Ollama, while keeping the raw performance and low-level control that make llama.cpp worth using in the first place.

Agentic LLM Inference Parameters Reference for Qwen 3.6 and Gemma 4

This page is a practical reference for agentic LLM inference tuning (temperature, top_p, top_k, penalties, and how they interact in multi-step and tool-heavy workflows).

You already chat with Hermes Agent from your phone by text. Now you want to talk to it directly and get spoken replies back. That is usually the right move, especially if you already use Hermes as a persistent self-hosted assistant. Typing long prompts on a small screen is slow and error-prone.

NemoClaw practical guide for secure OpenClaw operations in 2026

Most AI agent stacks still treat security as a post-demo fix. NemoClaw starts from the opposite assumption and makes isolation, policy, and routing day-zero defaults.

Knowledge Management in 2026: PKM Tools, Self-Hosted Wikis & Digital Systems

Personal knowledge management spans Obsidian, Logseq, DokuWiki, Zettelkasten, and PARA — the right choice depends on whether you want a local note graph, a self-hosted wiki, or an outliner-driven workflow.

Claude, OpenClaw, and the End of Flat Pricing for Agents

The quiet loophole that powered a wave of agent experimentation is now closed.

Vane (Perplexica 2.0) Quickstart With Ollama and llama.cpp

Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.

TGI - Text Generation Inference - Install, Config, Troubleshoot

Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference street, but it is the one that already learned how production breaks.

16 GB VRAM LLM benchmarks with llama.cpp (speed and context)

Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.

RTX 5090 in Australia, March 2026: Pricing and Stock Reality

Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.

Remote Ollama access via Tailscale or WireGuard, no public ports

Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.

Ollama in Docker Compose with GPU and Persistent Model Storage

Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.

Ollama behind a reverse proxy with Caddy or Nginx for HTTPS streaming

Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.