AI - Page 2 - Rost Glukhov | Personal site and technical blog

AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability

A production AI assistant is not “an LLM with a prompt”. It is a system that accepts intent, keeps state, decides when to retrieve or act, and exposes enough runtime detail to debug failures.

AI for Knowledge Management: Real Workflows That Hold Up

AI is not replacing knowledge management; it is changing the shape of it for both individuals and teams.

OpenClaw vs Hermes Agent: Stars, Downloads & Usage 2026

Open-source AI agent frameworks are exploding in popularity on GitHub. Two projects at the core of the self-hosted AI systems ecosystem — OpenClaw and Hermes Agent — have pulled so far ahead that the rest of the field is fighting for a distant third place.

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

I tested speculative decoding (Multi-Token Prediction, MTP) performance in Qwen 3.6 27B and 35B on an RTX 4080 with 16 GB VRAM.

Unload All llama.cpp Router Models Without Restarting

llama.cpp router mode is one of the most useful changes to llama-server in years. It finally gives local LLM operators something close to the model management experience people expect from Ollama, while keeping the raw performance and low-level control that make llama.cpp worth using in the first place.

LLM Wiki - Compiled Knowledge That RAG Cannot Replace

The premise is simple: compiled knowledge is more reusable than retrieved fragments. RAG became the default answer to a straightforward question: how do I give an LLM access to external knowledge?

PKM vs RAG vs Wiki vs Memory Systems Explained Clearly

PKM, RAG, wikis, AI memory systems, and now practical AI-assisted workflows are often discussed as if they solve the same problem. They do not. They all deal with knowledge, but they operate at different layers:

LLM Structured Output Validation in Python That Holds Up

Most LLM “structured output” tutorials are unserious. They teach you to ask for JSON politely and then hope the model behaves. That is not validation. That is optimism with braces.

Agentic LLM Inference Parameters Reference for Qwen 3.6 and Gemma 4

This page is a practical reference for agentic LLM inference tuning (temperature, top_p, top_k, penalties, and how they interact in multi-step and tool-heavy workflows).

You already chat with Hermes Agent from your phone by text. Now you want to talk to it directly and get spoken replies back. That is usually the right move, especially if you already use Hermes as a persistent self-hosted assistant. Typing long prompts on a small screen is slow and error-prone.

Kanban in Hermes Agent for Self Hosted LLM Workflows

Hermes Agent ships with a Kanban-style board and the Hermes Gateway, which together can saturate your self-hosted LLM if too many tasks are dispatched at once.

Hermes Agent Skill Authoring — SKILL.md Structure and Best Practices

Hermes Agent treats skills as the default way to teach repeatable workflows. Official documentation describes them as on-demand knowledge documents aligned with the open agentskills.io shape, loaded through progressive disclosure so the model sees a small index first and only pulls full instructions when a task actually needs them.

Hermes Agent CLI cheat sheet — commands, flags, and slash shortcuts

Hermes Agent from Nous Research is a model-agnostic, tool-using assistant you run locally or on a VPS.

NemoClaw practical guide for secure OpenClaw operations in 2026

Most AI agent stacks still treat security as a post-demo fix. NemoClaw starts from the opposite assumption and makes isolation, policy, and routing day-zero defaults.

Agent Memory Providers Compared — Honcho, Mem0, Hindsight, and Five More

Modern assistants still forget everything when you close the tab unless something persists beyond the context window. Agent memory providers are services or libraries that hold facts and summaries across sessions — often wired in as plugins so the framework stays thin while memory scales.

AI Systems Memory — Persistent Knowledge and Agent Memory

This section collects guides on persistent knowledge and memory for AI systems — how assistants keep facts, preferences, and distilled context across sessions without stuffing every token into one prompt. Here, memory means intentional retention (user facts, summaries, plugin-backed stores), not GPU RAM or model weights.