Notes on the margins
Rost Glukhov. Personal site and technical blog
Claude Code install and config for Ollama, llama.cpp, pricing

Agentic coding, now with local model backends.

Claude Code is not autocomplete with better marketing. It is an agentic coding tool: it reads your codebase, edits files, runs commands, and integrates with your development tools.

Hermes AI Assistant - Install, Setup, Workflow, and Troubleshooting

Hermes Agent install and quickstart for devs

Hermes Agent is a self-hosted, model-agnostic AI assistant that runs on a local machine or low-cost VPS, works through terminal and messaging interfaces, and improves over time by turning repeated tasks into reusable skills.

TGI - Text Generation Inference - Install, Config, Troubleshoot

Install TGI, ship fast, debug faster

Text Generation Inference (TGI) has a very specific energy. It is not the newest kid on the inference block, but it is the one that has already learned how production breaks…
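
To make that concrete, here is a minimal Python call against TGI's native /generate endpoint, assuming a container already listening on localhost:8080 (the usual -p 8080:80 docker mapping); the prompt and parameters are illustrative:

```python
import requests

# Assumes a TGI container published on localhost:8080; adjust for your setup.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain the KV cache in one sentence.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```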

16 GB VRAM LLM benchmarks with llama.cpp (speed and context)

llama.cpp token speed on 16 GB VRAM (tables).

Here I compare the speed of several LLMs running on a GPU with 16 GB of VRAM and pick the best one for self-hosting.
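
A rough sanity check outside formal benchmarks: time a completion against llama.cpp's bundled llama-server (OpenAI-compatible API, default port 8080) and divide by the reported completion tokens. The model field is a placeholder, since llama-server serves whatever model it was launched with:

```python
import time
import requests

# Quick tokens-per-second probe against a local llama-server instance.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "local",  # placeholder; llama-server uses the loaded model
    "messages": [{"role": "user", "content": "Write 200 words about GPUs."}],
    "max_tokens": 256,
}
start = time.perf_counter()
data = requests.post(url, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start
tokens = data["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.1f} tok/s")
```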

RTX 5090 in Australia, March 2026: Pricing and Stock Reality

RTX 5090 in AU is scarce and overpriced

Australia has RTX 5090 stock. Barely. And if you find one, you will pay a premium that feels detached from reality.

Remote Ollama access via Tailscale or WireGuard, no public ports

Remote Ollama access without public ports

Ollama is at its happiest when it is treated like a local daemon: the CLI and your apps talk to a loopback HTTP API, and the rest of the network never finds out it exists.
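
Once the daemon is reachable over the tailnet, a quick probe like this confirms your apps still get through while nothing is exposed publicly. The 100.x address is a placeholder for the node's address from `tailscale ip`:

```python
import requests

# List installed models over the tailnet; replace the placeholder IP
# with your node's address from `tailscale ip -4`.
OLLAMA = "http://100.101.102.103:11434"

resp = requests.get(f"{OLLAMA}/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"])
```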

Structured Logging in Go with slog for Observability and Alerting

Queryable JSON logs that connect to traces.

Logs are a debugging interface you can still use when the system is on fire. The problem is that plain text logs age poorly: as soon as you need filtering, aggregation, and alerting, you start parsing sentences.

Ollama in Docker Compose with GPU and Persistent Model Storage

Compose-first Ollama server with GPU and persistence.

Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.
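
As a sketch of that "service" mindset from the client side, this waits for the Compose-managed endpoint and then pulls a model into the persistent volume; the model tag is just an example:

```python
import json
import requests

# After `docker compose up -d`: confirm the endpoint is up, then pull a
# model into the persistent volume. The tag is an example; use your own.
OLLAMA = "http://localhost:11434"

requests.get(f"{OLLAMA}/api/version", timeout=10).raise_for_status()

with requests.post(f"{OLLAMA}/api/pull",
                   json={"model": "llama3.1:8b"}, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))
```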

Ollama behind a reverse proxy with Caddy or Nginx for HTTPS streaming

HTTPS Ollama without breaking streaming responses.

Running Ollama behind a reverse proxy is the simplest way to get HTTPS, optional access control, and predictable streaming behaviour.
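
A quick way to verify the proxy is not buffering: stream a generation through the HTTPS hostname and watch tokens arrive incrementally rather than in one burst. ollama.example.com and the model tag are placeholders:

```python
import json
import requests

# Tokens should print as they arrive; a long pause followed by the full
# answer usually means the proxy is buffering the response.
resp = requests.post(
    "https://ollama.example.com/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Count to ten."},
    stream=True,
    timeout=120,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)
```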

Text embeddings for RAG and search - Python, Ollama, OpenAI-compatible APIs

RAG embeddings - Python, Ollama, OpenAI APIs.

If you are working through retrieval-augmented generation (RAG), this post walks through text embeddings in plain terms: what they are, how they fit search and retrieval, and how to call two common local setups from Python, using Ollama or an OpenAI-compatible HTTP API (the kind many llama.cpp-based servers expose).
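
A minimal sketch of the Ollama path: embed two strings and compare them with cosine similarity. nomic-embed-text is one common embedding model; substitute whatever you have pulled locally:

```python
import math
import requests

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint; assumes the model is already pulled.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine(embed("local LLM hosting"),
             embed("running models on your own GPU")))
```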

Netlify for Hugo & static sites: pricing, free tier, and alternatives

Git-based deploys, CDN, credits, and trade-offs.

Netlify is one of the most developer-friendly ways to ship Hugo sites and modern web apps with a production-grade workflow: preview URLs for every pull request, atomic deploys, a global CDN, and optional serverless and edge capabilities.

Apache Flink on K8s and Kafka: PyFlink, Go, ops, and managed pricing

Stateful streaming, checkpoints, K8s, PyFlink, Go.

Apache Flink is a framework for stateful computations over unbounded and bounded data streams.
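
A minimal PyFlink sketch of that idea, with a bounded in-memory collection standing in for a Kafka source so the pipeline shape is visible without a cluster:

```python
from pyflink.datastream import StreamExecutionEnvironment

# Count log levels: map to (level, 1), key by level, reduce to sums.
env = StreamExecutionEnvironment.get_execution_environment()

(env.from_collection(["error", "info", "error", "warn"])
    .map(lambda level: (level, 1))
    .key_by(lambda pair: pair[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
    .print())

env.execute("log-level-counts")
```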

Neo4j graph database for GraphRAG, install, Cypher, vectors, ops

Graphs, Cypher, vectors, and ops hardening.

Neo4j is what you reach for when the relationships are the data. If your domain looks like a whiteboard of circles and arrows, forcing it into tables is painful.
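
A small sketch with the official Python driver shows the circles-and-arrows model directly in Cypher; the URI and credentials are placeholders:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "your-password"))

with driver.session() as session:
    # Relationships are the data: record that one service depends on another.
    session.run(
        "MERGE (a:Service {name: $a}) "
        "MERGE (b:Service {name: $b}) "
        "MERGE (a)-[:DEPENDS_ON]->(b)",
        a="api", b="postgres",
    )
    result = session.run(
        "MATCH (s:Service)-[:DEPENDS_ON]->(d) "
        "RETURN s.name AS svc, d.name AS dep")
    for record in result:
        print(record["svc"], "->", record["dep"])

driver.close()
```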

IndexNow explained - notify search engines when you publish

Push URL updates to search engines after deploy.

Static sites and blogs change whenever you deploy. Search engines that support IndexNow can learn about those changes without waiting for the next blind crawl.
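
The protocol itself is one JSON POST after deploy. In this sketch, example.com and the key are placeholders, and the key must also be served from your site (for example at https://example.com/your-indexnow-key.txt) so the endpoint can verify ownership:

```python
import requests

# Submit a batch of changed URLs to the shared IndexNow endpoint.
payload = {
    "host": "example.com",
    "key": "your-indexnow-key",
    "urlList": [
        "https://example.com/posts/new-article/",
        "https://example.com/posts/updated-article/",
    ],
}
resp = requests.post("https://api.indexnow.org/indexnow",
                     json=payload, timeout=15)
print(resp.status_code)  # 200 or 202 means the submission was accepted
```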

Hosted email for custom domains compared - Workspace, Microsoft 365, Zoho, Proton, WorkMail

Pick hosted email for your domain without regret.

Putting email on your own domain sounds like a weekend DNS task. In practice it is a small distributed system with a twenty-year legacy.

SGLang QuickStart: Install, Configure, and Serve LLMs via OpenAI API

Serve open models fast with SGLang.

SGLang is a high-performance serving framework for large language models and multimodal models, built to deliver low-latency and high-throughput inference across everything from a single GPU to distributed clusters.
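
Because SGLang speaks the OpenAI API, the standard openai Python client works against it unchanged; the port assumes the launch_server default, and the model name is a placeholder for whatever you loaded:

```python
from openai import OpenAI

# Assumes `python -m sglang.launch_server --model-path ...` on port 30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "One-line summary of SGLang?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```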
