Run Docker Compose as a Linux Service with systemd
Docker Compose on boot, managed by systemd.
Docker Compose on a Linux server should start on boot, stop cleanly on shutdown, and survive reboots without manual intervention.
Pick the right Docker install path on Ubuntu.
Installing Docker on Ubuntu should be simple, but in practice several Docker-shaped options compete for the same command name, each with different packaging, upgrade behavior, and security implications.
Fix Ubuntu APT without guesswork.
APT failures are common on long-lived Ubuntu machines, and they usually appear after a release upgrade, a third-party repository change, a removed PPA, a manually installed .deb, or an interrupted package installation.
Free VRAM without killing llama-server.
llama.cpp router mode is one of the most useful changes to llama-server in years. It finally gives local LLM operators something close to the model management experience people expect from Ollama, while keeping the raw performance and low-level control that make llama.cpp worth using in the first place.
Agentic LLM tuning reference
This page is a practical reference for agentic LLM inference tuning (temperature, top_p, top_k, penalties, and how they interact in multi-step and tool-heavy workflows).
Talk to Hermes from your phone
You already chat with Hermes Agent from your phone by text. Now you want to talk to it directly and get spoken replies back. That is usually the right move, especially if you already use Hermes as a persistent self-hosted assistant. Typing long prompts on a small screen is slow and error-prone.
Run OpenClaw safely with NemoClaw
Most AI agent stacks still treat security as a post-demo fix. NemoClaw starts from the opposite assumption and makes isolation, policy, and routing day-zero defaults.
PKM tools, methods, and self-hosted wikis compared.
Personal knowledge management spans Obsidian, Logseq, DokuWiki, Zettelkasten, and PARA — the right choice depends on whether you want a local note graph, a self-hosted wiki, or an outliner-driven workflow.
Claude subscriptions no longer power agents
The quiet loophole that powered a wave of agent experimentation is now closed.
Self-hosted AI search with local LLMs
Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.
Install TGI, ship fast, debug faster
Text Generation Inference (TGI) has a very specific energy. It is not the newest kid in the inference street, but it is the one that already learned how production breaks.
llama.cpp token speed on 16 GB VRAM (tables).
Here I compare the token speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.
RTX 5090 in AU is scarce and overpriced
Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.
Remote Ollama access without public ports
Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.