RAG - Rost Glukhov | Personal site and technical blog

PARA Method for Engineers: Organize Knowledge by Action

Organizing notes by topic sounds logical until you have notes on PostgreSQL in five different folders and cannot find the one that matters for today’s problem.

Memory turns assistants from reactive to persistent, but it is also where many systems quietly rot. Surveys argue the short-term versus long-term split is no longer enough for modern agent memory; OpenAI and LangGraph SDKs point to a simpler stack — working memory, durable state, and retrieval.

AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability

A production AI assistant is not “an LLM with a prompt”. It is a system that accepts intent, keeps state, decides when to retrieve or act, and exposes enough runtime detail to debug failures.

AI for Knowledge Management: Real Workflows That Hold Up

AI is not replacing knowledge management; it is changing the shape of it for both individuals and teams.

Retrieval vs Representation in Knowledge Systems

Most modern knowledge systems optimize retrieval, and that is understandable. Search is visible, easy to demo, and feels magical when it works. Type a question, get an answer.

LLM Wiki - Compiled Knowledge That RAG Cannot Replace

The premise is simple: compiled knowledge is more reusable than retrieved fragments. RAG became the default answer to a straightforward question: how do I give an LLM access to external knowledge?

PKM vs RAG vs Wiki vs Memory Systems Explained Clearly

PKM, RAG, wikis, AI memory systems, and now practical AI-assisted workflows are often discussed as if they solve the same problem. They do not. They all deal with knowledge, but they operate at different layers.

Second Brain Explained for Engineers and Knowledge Workers

Information overload is less about sheer volume than about unresolved inputs. Modern knowledge work leaves a trail of tabs, chat threads, docs, highlights, snippets, transcripts, screenshots, and half-written notes.

LLM Structured Output Validation in Python That Holds Up

Most LLM “structured output” tutorials are unserious. They teach you to ask for JSON politely and then hope the model behaves. That is not validation. That is optimism with braces.

Text embeddings for RAG and search - Python, Ollama, OpenAI-compatible APIs

If you are working through retrieval-augmented generation (RAG), this section walks through text embeddings in plain terms — what they are, how they fit search and retrieval, and how to call two common local setups from Python using Ollama or an OpenAI-compatible HTTP API (as many llama.cpp-based servers expose).

Neo4j graph database for GraphRAG, install, Cypher, vectors, ops

Neo4j is what you reach for when the relationships are the data. If your domain looks like a whiteboard of circles and arrows, forcing it into tables is painful.

AI Systems: Self-Hosted Assistants, RAG, and Local Infrastructure

Most local AI setups start with a model and a runtime.

OpenClaw Quickstart: Install with Docker (Ollama GPU or Claude + CPU)

OpenClaw is a self-hosted AI assistant designed to run with local LLM runtimes like Ollama or with cloud-based models such as Claude Sonnet.

OpenClaw: Examining a Self-Hosted AI Assistant as a Real System

Most local AI setups start the same way: a model, a runtime, and a chat interface.

Chunking Strategies in RAG Comparison: Alternatives, Trade-offs, and Examples

Chunking is the most underestimated hyperparameter in Retrieval-Augmented Generation (RAG): it silently determines what your LLM “sees”, how expensive ingestion becomes, and how much of the LLM’s context window you burn per answer.

Retrieval-Augmented Generation (RAG) Tutorial: Architecture, Implementation, and Production Guide

Production-focused guide to building RAG systems: chunking, vector stores, hybrid retrieval, reranking, evaluation, and when to choose RAG over fine-tuning.