Claude, OpenClaw, and the End of Flat Pricing for Agents
Claude subscriptions no longer power agents
The quiet loophole that powered a wave of agent experimentation is now closed.
Self-hosted AI search with local LLMs
Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.
Agentic coding, now with local model backends.
Claude Code is not autocomplete with better marketing. It is an agentic coding tool: it reads your codebase, edits files, runs commands, and integrates with your development tools.
Hermes Agent install and quickstart for devs
Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.
Install TGI, ship fast, debug faster
Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference street, but it is the one that already learned how production breaks.
llama.cpp token speed on 16 GB VRAM (tables).
Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and choose the best one for self-hosting.
RTX 5090 in AU is scarce and overpriced
Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.
Remote Ollama access without public ports
Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
Queryable JSON logs that connect to traces.
Logs are a debugging interface you can still use when the system is on fire. The problem is that plain text logs age poorly: as soon as you need filtering, aggregation, and alerting, you start parsing sentences.
Compose-first Ollama server with GPU and persistence.
Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
HTTPS Ollama without breaking streaming responses.
Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
RAG embeddings - Python, Ollama, OpenAI APIs.
If you are working through retrieval-augmented generation (RAG), this section walks through text embeddings in plain terms: what they are, how they fit search and retrieval, and how to call two common local setups from Python, using either Ollama or an OpenAI-compatible HTTP API (as many llama.cpp-based servers expose).
Git-based deploys, CDN, credits, and trade-offs.
Netlify is one of the most developer-friendly ways to ship Hugo sites and modern web apps with a production-grade workflow: preview URLs for every pull request, atomic deploys, a global CDN, and optional serverless and edge capabilities.
Stateful streaming, checkpoints, K8s, PyFlink, Go.
Apache Flink is a framework for stateful computations over unbounded and bounded data streams.
Graphs, Cypher, vectors, and ops hardening.
Neo4j is what you reach for when the relationships are the data. If your domain looks like a whiteboard of circles and arrows, forcing it into tables is painful.
Push URL updates to search engines after deploy.
Static sites and blogs change whenever you deploy. Search engines that support IndexNow can learn about those changes without waiting for the next blind crawl.