RAG

Reduce LLM Costs: Token Optimization Strategies

Token optimization is the critical skill separating cost-effective LLM applications from budget-draining experiments.

Converting HTML to Markdown with Python: A Comprehensive Guide

Converting HTML to Markdown is a fundamental task in modern development workflows, particularly when preparing web content for Large Language Models (LLMs), documentation systems, or static site generators like Hugo.
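
To make the idea concrete, here is a minimal sketch of such a conversion using only Python's standard library. It is not the approach the guide itself describes: the class name `MiniMarkdown` and the function `html_to_markdown` are hypothetical, and only headings, paragraphs, links, bold and italic are handled. Whitespace between tags is passed through unchanged, so real pages would need more care (or a dedicated library).

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MiniMarkdown(HTMLParser):
    """Hypothetical, deliberately tiny HTML-to-Markdown converter."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # <h2> becomes "## ", <h3> becomes "### ", and so on
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            # block elements end with a blank line
            self.parts.append("\n\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    """Convert a small HTML fragment to Markdown text."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()
```

Calling `html_to_markdown('<h1>Title</h1><p>Some <b>bold</b> text.</p>')` yields a Markdown heading followed by a paragraph.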

Go clients for Ollama: SDK comparison and Qwen3/GPT-OSS examples

This guide provides a comprehensive overview of available Go SDKs for Ollama and compares their feature sets.

This comparison of Qwen3:30b and GPT-OSS:20b focuses on instruction following, performance parameters, specs, and speed.

Integrating Ollama with Python: REST API and Python Client Examples

In this post, we’ll explore two ways to connect your Python application to Ollama: 1. Via HTTP REST API; 2. Via the official Ollama Python library.
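
As a rough illustration of the first option, the sketch below builds and sends a request to Ollama's `/api/generate` endpoint using only the standard library. It assumes a local server on the default port 11434; the helper names `build_generate_request` and `generate`, and the model name used in the usage note, are placeholders and not taken from the post.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, url=OLLAMA_URL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request) as resp:
        body = json.loads(resp.read())
    # With "stream": False the reply is a single JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

With a model already pulled, `generate("qwen3", "Why is the sky blue?")` would return the model's answer as a string; the model name here is only an example.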

Structured output comparison across popular LLM providers - OpenAI, Gemini, Anthropic, Mistral and AWS Bedrock

Here’s a side-by-side comparison of structured output support (getting reliable JSON back) across popular LLM providers, plus minimal Python examples.

Constraining LLMs with Structured Output: Ollama, Qwen3 & Python or Go

Large Language Models (LLMs) are powerful, but in production we rarely want free-form paragraphs. Instead, we want predictable data: attributes, facts, or structured objects you can feed into an app. That’s LLM Structured Output.
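
One way to picture "predictable data" is the consuming side: the application parses the model's JSON reply into a typed object and rejects anything that does not fit. The sketch below shows only that validation step, however the reply was constrained. The `CityInfo` type, its fields, and `parse_city` are hypothetical examples, not taken from the article.

```python
import json
from dataclasses import dataclass


@dataclass
class CityInfo:
    """Hypothetical target structure for the model's reply."""
    name: str
    country: str
    population: int


# Expected field names and their Python types.
EXPECTED_FIELDS = {"name": str, "country": str, "population": int}


def parse_city(reply):
    """Parse a JSON reply into CityInfo, raising ValueError if it does not fit."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} is missing or has the wrong type")
    # Ignore any extra keys the model may have added.
    return CityInfo(**{key: data[key] for key in EXPECTED_FIELDS})
```

A free-form reply such as "Paris is the capital of France" fails with `ValueError`, which is the point: the application only ever sees well-formed objects.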

Reranking documents with Ollama and Qwen3 Reranker model - in Go

Since standard Ollama doesn’t have a direct rerank API, you’ll need to implement reranking with the Qwen3 Reranker in Go by generating embeddings for query-document pairs and scoring them.

Reranking texts with Ollama and Qwen3 Embedding LLM - in Go

This small Go reranking example calls Ollama to generate embeddings for the query and for each candidate document, then sorts the candidates in descending order of cosine similarity.
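
The ranking step itself is small enough to sketch. The version below is in Python, not the article's Go code, and it assumes the embeddings have already been obtained (for example from Ollama) and are passed in as plain lists of floats. The function names `cosine_similarity` and `rerank` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, docs):
    """Sort (text, embedding) pairs by similarity to the query, best first.

    Returns a list of (score, text) tuples in descending score order.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Because cosine similarity ignores vector length, documents are ranked purely by how closely their direction in embedding space matches the query's.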

Qwen3 Embedding & Reranker Models on Ollama: State-of-the-Art Performance

The Qwen3 Embedding and Reranker models are the latest releases in the Qwen family, specifically designed for advanced text embedding, retrieval, and reranking tasks.

If you’re looking for a Beautiful Soup equivalent in Go, several libraries offer similar HTML parsing and scraping functionality:

For a direct Beautiful Soup analogue in Go, use soup.
For CSS selector support, consider goquery.
For XPath queries, use htmlquery.
For another Beautiful Soup-inspired option, look at Node.

Convert HTML content to Markdown using LLM and Ollama

The Ollama model library includes models that can convert HTML content to Markdown, which is useful for content conversion tasks.

Using LLMs is not very expensive, so there may be no need to buy a powerful new GPU. Here is a list of cloud LLM providers and the LLMs they host.

This guide explains how Ollama handles parallel requests (concurrency, queuing, and resource limits), and how to tune it using the OLLAMA_NUM_PARALLEL environment variable (and related knobs).
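
As a small illustration of the tuning knob, the sketch below starts an Ollama server from Python with `OLLAMA_NUM_PARALLEL` set in its environment. The helper names `ollama_env` and `start_server` are hypothetical, the value 4 in the usage note is only an example, and the sketch assumes the `ollama` binary is on the PATH.

```python
import os
import subprocess


def ollama_env(num_parallel, base=None):
    """Return a copy of the environment with OLLAMA_NUM_PARALLEL set."""
    env = dict(os.environ if base is None else base)
    # Environment variable values must be strings.
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    return env


def start_server(num_parallel):
    """Launch `ollama serve` with the given parallelism setting."""
    return subprocess.Popen(["ollama", "serve"], env=ollama_env(num_parallel))
```

For instance, `start_server(4)` launches the server with `OLLAMA_NUM_PARALLEL=4`; a suitable value depends on available memory and the models being served.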

DeepSeek’s first generation of reasoning models offers performance comparable to OpenAI-o1 and includes six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

Reranking is the second step in Retrieval Augmented Generation (RAG) systems, sitting between retrieval and generation.

Comparison: Qwen3:30b vs GPT-OSS:20b

Beautiful Soup Alternatives for Go

Cloud LLM Providers

How Ollama Handles Parallel Requests

Testing Deepseek-R1 on Ollama

Reranking with embedding models