Reduce LLM Costs: Token Optimization Strategies
Cut LLM costs by 80% with smart token optimization
Token optimization is the critical skill separating cost-effective LLM applications from budget-draining experiments.
Cut LLM costs by 80% with smart token optimization
Token optimization is the critical skill separating cost-effective LLM applications from budget-draining experiments.
Python for converting HTML to clean, LLM-ready Markdown
Converting HTML to Markdown is a fundamental task in modern development workflows, particularly when preparing web content for Large Language Models (LLMs), documentation systems, or static site generators like Hugo.
Integrate Ollama with Go: SDK guide, examples, and production best practices.
This guide provides a comprehensive overview of available Go SDKs for Ollama and compares their feature sets.
Comparing Speed, parameters and performance of these two models
Here is a comparison between Qwen3:30b and GPT-OSS:20b focusing on instruction following and performance parameters, specs and speed:
+ Specific Examples Using Thinking LLMs
In this post, we’ll explore two ways to connect your Python application to Ollama: 1. Via HTTP REST API; 2. Via the official Ollama Python library.
Slightly different APIs require special approach.
Here’s a side-by-side support comparison of structured output (getting reliable JSON back) across popular LLM providers, plus minimal Python examples
A couple of ways to get structured output from Ollama
Large Language Models (LLMs) are powerful, but in production we rarely want free-form paragraphs. Instead, we want predictable data: attributes, facts, or structured objects you can feed into an app. That’s LLM Structured Output.
Implementing RAG? Here are some Go code bits - 2...
Since standard Ollama doesn’t have a direct rerank API, you’ll need to implement reranking using Qwen3 Reranker in GO by generating embeddings for query-document pairs and scoring them.
Implementing RAG? Here are some codesnippets in Golang..
This little Reranking Go code example is calling Ollama to generate embeddings for the query and for eache candidate document, then sorting descending by cosine similarity.
New awesome LLMs available in Ollama
The Qwen3 Embedding and Reranker models are the latest releases in the Qwen family, specifically designed for advanced text embedding, retrieval, and reranking tasks.
Continuing the topic of extracting data from html
If you’re looking for a Beautiful Soup equivalent in Go, several libraries offer similar HTML parsing and scraping functionality:
LLM to extract text from HTML...
In the Ollama models library there are models that able convert HTML content to Markdown, which is useful for content conversion tasks.
Short list of LLM providers
Using LLMs is not very expensive, might be no need to buy new awesome GPU. Here is a list if LLM providers in the cloud with LLMs they host.
Configuring ollama for parallel requests executions.
When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.
Comparing two deepseek-r1 models to two base ones
DeepSeek’s first-generation of reasoning models with comparable performance to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.