RAG

Text embeddings for RAG and search - Python, Ollama, OpenAI-compatible APIs

If you are working with retrieval-augmented generation (RAG), this section explains text embeddings in plain terms: what they are, how they fit into search and retrieval, and how to call them from Python using two common local setups, Ollama and an OpenAI-compatible HTTP API (the kind many llama.cpp-based servers expose).
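
A minimal sketch of the two request shapes, using only the Python standard library. It assumes Ollama's `/api/embed` endpoint on its default port 11434 (request fields `model` and `input`, response field `embeddings`) and an OpenAI-compatible `/v1/embeddings` endpoint (response field `data`, a list of objects carrying `index` and `embedding`). The base URLs are placeholders; point them at your own servers and use whatever embedding model you have pulled.

```python
import json
import urllib.request

def build_ollama_request(texts, model, base_url="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_openai_request(texts, model, base_url="http://localhost:8080", api_key=None):
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers often need no key; hosted ones do
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings", data=body, headers=headers, method="POST"
    )

def parse_ollama_response(payload):
    """Ollama returns {"embeddings": [[...], ...]}, one vector per input."""
    return payload["embeddings"]

def parse_openai_response(payload):
    """OpenAI-style servers return {"data": [{"index": i, "embedding": [...]}, ...]}."""
    items = sorted(payload["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]

def embed(request, parser, timeout=30.0):
    """Send a prepared request and return the parsed list of vectors."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parser(json.load(resp))

# Example call (needs a running server; "nomic-embed-text" is just an example model):
# vectors = embed(build_ollama_request(["hello", "world"], "nomic-embed-text"),
#                 parse_ollama_response)
```

Keeping request building and response parsing separate from the network call makes it easy to swap backends: only the builder and parser differ between Ollama and an OpenAI-compatible server.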

Chunking Strategies in RAG Comparison: Alternatives, Trade‑offs, and Examples

Chunking is the most underestimated hyperparameter in Retrieval-Augmented Generation (RAG): it silently determines what your LLM “sees”, how expensive ingestion becomes, and how much of the LLM’s context window you burn per answer.
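
To make the hyperparameter concrete, here is a minimal sketch of the simplest strategy, fixed-size chunking with overlap. It counts whitespace-separated words as a stand-in for tokens (an assumption; real pipelines usually count tokens with the model's tokenizer), and `chunk_words`, `chunk_size`, and `overlap` are illustrative names.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of at most chunk_size words; each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window reaches the end, so no trailing chunk
        # contains only words that were already covered.
        if start + chunk_size >= len(words):
            break
    return chunks

# 10 words, windows of 4 words with 1 word of overlap.
print(chunk_words("w0 w1 w2 w3 w4 w5 w6 w7 w8 w9", chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Both knobs map onto the costs named above: smaller chunks give more precise hits but less context per hit, and more overlap lowers the chance of splitting an answer across a boundary while increasing the number of chunks to embed and store.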

Retrieval-Augmented Generation (RAG) Tutorial: Architecture, Implementation, and Production Guide

Production-focused guide to building RAG systems: chunking, vector stores, hybrid retrieval, reranking, evaluation, and when to choose RAG over fine-tuning.
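
Hybrid retrieval needs a way to merge a keyword ranking (for example BM25) with a vector-similarity ranking. One widely used option, shown here as an illustration rather than as the guide's own method, is Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, with k = 60 as the commonly cited default. The document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs (best first) into one ranking.

    Each document gets sum(1 / (k + rank)) over every list containing it,
    with ranks starting at 1. A larger k flattens the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a BM25 index
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from an embedding index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.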

Vector Stores for RAG Comparison

Choosing the right vector store can make or break your RAG application’s performance, cost, and scalability. This comprehensive comparison covers the most popular options in 2024-2025.
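
Whatever product you choose, the core contract of a vector store is the same: store vectors under IDs, then return the ones nearest to a query vector. The toy class below (a hypothetical `TinyVectorStore`, not any product's API) does exact brute-force search over unit-length vectors, so a dot product equals cosine similarity. Real stores add persistence, metadata filtering, and approximate indexes such as HNSW so search stays fast at scale.

```python
import math

class TinyVectorStore:
    """Hypothetical minimal vector store: exact (brute-force) cosine search."""

    def __init__(self):
        self._ids = []
        self._vectors = []  # unit-length copies of the added vectors

    def _normalize(self, vector):
        # All vectors must share one dimension, or similarities are meaningless.
        if self._vectors and len(vector) != len(self._vectors[0]):
            raise ValueError("vector dimension does not match the store")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("a zero vector has no direction to compare")
        return [x / norm for x in vector]

    def add(self, doc_id, vector):
        unit = self._normalize(vector)  # validate before mutating the store
        self._ids.append(doc_id)
        self._vectors.append(unit)

    def query(self, vector, top_k=3):
        """Return up to top_k (doc_id, cosine_similarity) pairs, best first."""
        q = self._normalize(vector)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, v)))
            for doc_id, v in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

store = TinyVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.8, 0.6])
store.add("cars", [0.0, 1.0])
print([doc_id for doc_id, _ in store.query([1.0, 0.1], top_k=2)])
# ['cats', 'dogs']
```

Brute force scans every vector per query, which is exactly the cost that approximate indexes trade a little recall to avoid.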

Cross-Modal Embeddings: Bridging AI Modalities

Cross-modal embeddings map different data types, such as text and images, into one unified representation space, so a model can understand and reason across them.

Advanced RAG: LongRAG, Self-RAG and GraphRAG Explained

Retrieval-Augmented Generation (RAG) has evolved far beyond simple vector similarity search. LongRAG, Self-RAG, and GraphRAG are at the cutting edge of that evolution.

Reranking documents with Ollama and Qwen3 Reranker model - in Go

Since standard Ollama doesn’t have a direct rerank API, you’ll need to implement reranking with the Qwen3 Reranker in Go by generating embeddings for query-document pairs and scoring them.

Reranking texts with Ollama and Qwen3 Embedding LLM - in Go

This small Go example calls Ollama to generate embeddings for the query and for each candidate document, then sorts the documents in descending order of cosine similarity.
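
The same procedure as a Python sketch (the article's own code is in Go): embed the query and every candidate, score each candidate by cosine similarity to the query, and sort descending. The embedding function is passed in; in practice it would call Ollama, but the `letter_counts` embedder here is a hypothetical toy so the example runs offline.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rerank(query, documents, embed):
    """Return (document, score) pairs sorted by descending cosine similarity.

    `embed` maps a list of strings to a list of vectors, one per string,
    so the query and all candidates are embedded in a single batch.
    """
    vectors = embed([query] + list(documents))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(documents, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def letter_counts(texts):
    """Hypothetical toy embedder (counts of letters a-z); replace with a real model."""
    return [[t.lower().count(chr(c)) for c in range(ord("a"), ord("z") + 1)] for t in texts]

results = rerank("cat", ["dog", "act", "car"], letter_counts)
print([doc for doc, _ in results])
# ['act', 'car', 'dog']
```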

Qwen3 Embedding & Reranker Models on Ollama: State-of-the-Art Performance

The Qwen3 Embedding and Reranker models are the latest releases in the Qwen family, specifically designed for advanced text embedding, retrieval, and reranking tasks.

Search vs Deep Search vs Deep Research in 2026

Search is best for quick, straightforward information retrieval using keywords.
Deep Search excels at understanding context and intent, delivering more relevant and comprehensive results for complex queries.

Reranking is a second stage in Retrieval-Augmented Generation (RAG) systems, sitting between retrieval and generation.
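
Where reranking sits can be sketched as a three-stage pipeline: a cheap retriever casts a wide net, a more expensive scorer reorders only those candidates, and just the top few reach the prompt. Everything here is an illustrative assumption: the word-overlap retriever stands in for BM25 or vector search, the `jaccard` scorer stands in for a real reranker model, and the function names, `retrieve_k`/`keep_n` parameters, and prompt template are hypothetical.

```python
def retrieve(query, corpus, retrieve_k):
    """Stage 1 (cheap, high recall): rank documents by shared-word count."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [doc for _, doc in scored[:retrieve_k]]

def rerank_candidates(query, candidates, score_pair, keep_n):
    """Stage 2 (costlier, high precision): rescore only the retrieved candidates."""
    ranked = sorted(candidates, key=lambda doc: score_pair(query, doc), reverse=True)
    return ranked[:keep_n]

def build_prompt(query, context_docs):
    """Stage 3 input: only the reranked survivors go into the LLM prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def jaccard(query, doc):
    """Hypothetical stand-in for a real reranker (cross-encoder or embedding cosine)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

corpus = [
    "reranking reorders retrieved documents before generation",
    "vector search retrieves documents by embedding similarity",
    "chunking splits documents into passages",
    "bananas are yellow",
]
query = "how does reranking use retrieved documents"
candidates = retrieve(query, corpus, retrieve_k=3)
context = rerank_candidates(query, candidates, jaccard, keep_n=2)
print(build_prompt(query, context))
```

In this run the second stage promotes the chunking passage above the vector-search passage, which the first stage had tied, showing how a better scorer changes what the generator sees without rescanning the whole corpus.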

Reranking with embedding models