Self-Hosting Cognee: Choosing an LLM on Ollama
Testing Cognee with local LLMs - real results
Cognee is a Python framework for building knowledge graphs from documents using LLMs. But does it work with self-hosted models?
Thoughts on LLMs for self-hosted Cognee
Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints. Cognee works best with larger, low-hallucination models (32B+) via Ollama, but mid-size options can serve lighter setups.
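For a sense of what that setup involves, here is a minimal sketch of pointing Cognee at a local Ollama server. It assumes cognee's environment-variable configuration (exact variable names can differ between cognee releases) and uses an example 32B model tag:

```python
import os

# Assumed cognee env-var config; check your cognee version's docs for exact names.
os.environ["LLM_PROVIDER"] = "ollama"
os.environ["LLM_MODEL"] = "qwen2.5:32b"                   # example model tag
os.environ["LLM_ENDPOINT"] = "http://localhost:11434/v1"  # default Ollama port
os.environ["LLM_API_KEY"] = "ollama"                      # placeholder; Ollama ignores it

import asyncio
import cognee

async def main():
    await cognee.add("Cognee builds knowledge graphs from documents.")
    await cognee.cognify()  # graph extraction runs on the local model
    print(await cognee.search("What does Cognee build?"))

asyncio.run(main())
```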
Build AI search agents with Python and Ollama
Ollama’s Python library now includes native web search capabilities. With just a few lines of code, you can augment your local LLMs with real-time information from the web, reducing hallucinations and improving accuracy.
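For a taste, a minimal sketch, assuming an ollama-python release that ships the web_search helper and an OLLAMA_API_KEY exported for the hosted search service:

```python
import ollama

# Hosted search call; requires OLLAMA_API_KEY in the environment.
response = ollama.web_search("What is Cognee?")  # example query
for result in response.results:                  # field names per current ollama-python
    print(result.title, result.url)
```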
Pick the right vector DB for your RAG stack
Choosing the right vector store can make or break your RAG application’s performance, cost, and scalability. This comprehensive comparison covers the most popular options in 2024-2025.
Build AI search agents with Go and Ollama
Ollama’s Web Search API lets you augment local LLMs with real-time web information. This guide shows you how to implement web search capabilities in Go, from simple API calls to full-featured search agents.
RAM prices surge 163-619% as AI demand strains supply
The memory market is experiencing unprecedented price volatility in late 2025, with RAM prices surging dramatically across all segments.
Master local LLM deployment with 12+ tools compared
Local deployment of LLMs has become increasingly popular as developers and organizations seek enhanced privacy, reduced latency, and greater control over their AI infrastructure.
AI-Suitable Consumer GPU Prices - RTX 5080 and RTX 5090
Let’s compare prices for top-level consumer GPUs that are suitable for LLMs in particular and AI in general. Specifically, I’m looking at RTX 5080 and RTX 5090 prices.
Deploy enterprise AI on budget hardware with open models
The democratization of AI is here. With open-source LLMs like Llama 3, Mixtral, and Qwen now rivaling proprietary models, teams can build powerful AI infrastructure using consumer hardware - slashing costs while maintaining complete control over data privacy and deployment.
Set up robust infrastructure monitoring with Prometheus
Prometheus has become the de facto standard for monitoring cloud-native applications and infrastructure, offering metrics collection, querying, and integration with visualization tools.
Master Grafana setup for monitoring & visualization
Grafana is the leading open-source platform for monitoring and observability, transforming metrics, logs, and traces into actionable insights through stunning visualizations.
Deploy stateful apps with ordered scaling & persistent data
Kubernetes StatefulSets are the go-to solution for managing stateful applications that require stable identities, persistent storage, and ordered deployment patterns—essential for databases, distributed systems, and caching layers.
Speed-up FLUX.1-dev with GGUF quantization
FLUX.1-dev is a powerful text-to-image model that produces stunning results, but its 24GB+ memory requirement makes it challenging to run on many systems. GGUF quantization of FLUX.1-dev offers a solution, reducing memory usage by approximately 50% while maintaining excellent image quality.
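The loading pattern follows diffusers' GGUF support; a minimal sketch, using one community quantization as an example checkpoint (swap in whichever quant level fits your VRAM):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Example community GGUF quantization of the FLUX.1-dev transformer.
ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle layers to system RAM
image = pipe("a lighthouse at dawn, detailed oil painting").images[0]
image.save("flux-gguf.png")
```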
Configure context sizes in Docker Model Runner with workarounds
Configuring context sizes in Docker Model Runner is more complex than it should be.
AI model for augmenting images with text instructions
Black Forest Labs has released FLUX.1-Kontext-dev, an advanced image-to-image AI model that augments existing images using text instructions.
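Running it locally looks roughly like this, assuming a diffusers release that includes FluxKontextPipeline; input.png stands in for your own image:

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

image = load_image("input.png")  # placeholder input image
edited = pipe(image=image, prompt="make the sky a stormy sunset").images[0]
edited.save("edited.png")
```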
Enable GPU acceleration for Docker Model Runner with NVIDIA CUDA support
Docker Model Runner is Docker’s official tool for running AI models locally, but enabling NVIDIA GPU acceleration in Docker Model Runner requires specific configuration.