Comparison of Hugo Page Translation quality - LLMs on Ollama
qwen3 8b, 14b and 30b, devstral 24b, mistral small 24b
In this test I’m comparing how different LLMs hosted on Ollama translate a Hugo page from English to German.
Implementing RAG? Here are some code snippets in Golang...
This small Go reranking example calls Ollama to generate embeddings for the query and for each candidate document, then sorts the documents by cosine similarity in descending order.
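As a rough illustration of that flow, here is a minimal Go sketch (not the post’s actual code). It assumes a local Ollama instance on port 11434 and an embedding model such as nomic-embed-text already pulled; it embeds the query and each candidate document, then sorts by cosine similarity.

```go
// Minimal reranking sketch. Assumptions: Ollama runs locally on :11434
// and an embedding model (here nomic-embed-text) has been pulled.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"math"
	"net/http"
	"sort"
)

// embed asks Ollama's /api/embeddings endpoint for a vector for the given text.
func embed(model, text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{"model": model, "prompt": text})
	resp, err := http.Post("http://localhost:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-12)
}

func main() {
	model := "nomic-embed-text" // assumed model name; any embedding model from the Ollama library works
	query := "how does Ollama handle parallel requests"
	docs := []string{"Ollama concurrency guide", "Vibe coding overview", "PCIe lanes and LLM performance"}

	q, err := embed(model, query)
	if err != nil {
		panic(err)
	}
	type scored struct {
		Text  string
		Score float64
	}
	ranked := make([]scored, 0, len(docs))
	for _, d := range docs {
		v, err := embed(model, d)
		if err != nil {
			panic(err)
		}
		ranked = append(ranked, scored{d, cosine(q, v)})
	}
	// Sort candidates by similarity, highest first.
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].Score > ranked[j].Score })
	for _, r := range ranked {
		fmt.Printf("%.4f  %s\n", r.Score, r.Text)
	}
}
```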
Thinking of installing a second GPU for LLMs?
How do PCIe lanes affect LLM performance? It depends on the task: for training and multi-GPU inference the performance drop is significant.
LLM to extract text from HTML...
The Ollama models library includes models that can convert HTML content to Markdown, which is useful for content conversion tasks.
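As a quick sketch of how such a conversion could be called, the Go snippet below sends raw HTML to Ollama’s /api/generate endpoint. The model name reader-lm and the local port are assumptions; any HTML-to-Markdown model from the library should work the same way.

```go
// Sketch: converting HTML to Markdown via Ollama's /api/generate endpoint.
// Assumptions: Ollama runs locally on :11434 and an HTML-to-Markdown model
// such as reader-lm has been pulled first.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	html := `<h1>Hello</h1><p>Some <b>bold</b> text and a <a href="https://example.com">link</a>.</p>`

	// stream=false makes Ollama return a single JSON object instead of a stream.
	payload, _ := json.Marshal(map[string]any{
		"model":  "reader-lm", // assumed model name
		"prompt": html,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response) // Markdown version of the input HTML
}
```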
Which AI research mode fits your task?
Cursor AI vs GitHub Copilot vs Cline AI vs...
Listed here are some AI-assisted coding tools and AI coding assistants, along with their strong points.
Short list of LLM providers
Using LLMs in the cloud is not very expensive, so there may be no need to buy an awesome new GPU. Here is a list of cloud LLM providers and the LLMs they host.
Ollama on Intel CPU: Efficient vs Performance cores
I’ve got a theory to test: does utilising ALL cores on an Intel CPU raise LLM speed? It’s bugging me that the new Gemma3 27B model (gemma3:27b, 17GB on Ollama) does not fit into the 16GB VRAM of my GPU and partially runs on the CPU.
AI requires a lot of power...
In the midst of the modern world’s turmoil, here I’m comparing the tech specs of different cards suitable for AI tasks (Deep Learning, Object Detection and LLMs). They are all incredibly expensive, though.
Understand Ollama concurrency, queueing, and how to tune OLLAMA_NUM_PARALLEL for stable parallel requests.
This guide explains how Ollama handles parallel requests (concurrency, queuing, and resource limits), and how to tune it using the OLLAMA_NUM_PARALLEL environment variable (and related knobs).
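A rough way to see this behaviour is to fire several requests at once and watch how many complete together. The Go sketch below does just that; the model name, port, and the OLLAMA_NUM_PARALLEL value are assumptions, and the variable must be set on the server side before starting ollama serve.

```go
// Sketch: sending several requests to Ollama concurrently to observe queueing.
// Assumptions: Ollama runs locally on :11434 with the model below pulled, and
// the server was started with something like OLLAMA_NUM_PARALLEL=4 ollama serve.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// generate sends one non-streaming prompt to Ollama and waits for the reply.
func generate(prompt string) error {
	payload, _ := json.Marshal(map[string]any{
		"model":  "qwen3:8b", // assumed model; use anything you have pulled
		"prompt": prompt,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	var out struct {
		Response string `json:"response"`
	}
	return json.NewDecoder(resp.Body).Decode(&out)
}

func main() {
	var wg sync.WaitGroup
	start := time.Now()
	// With OLLAMA_NUM_PARALLEL=4, roughly four of these run together; the rest queue.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if err := generate(fmt.Sprintf("Say hello #%d in one word.", n)); err != nil {
				fmt.Println("request", n, "failed:", err)
				return
			}
			fmt.Printf("request %d done after %s\n", n, time.Since(start).Round(time.Millisecond))
		}(i)
	}
	wg.Wait()
}
```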
AI-assisted coding where you describe, not type
Vibe coding is an AI-driven programming approach where developers describe desired functionality in natural language and let AI tools generate the code automatically — with minimal direct writing.
The whole set of MM* tools is at EOL...
I’ve used MMDetection (mmengine, mmdet, mmcv) quite a bit, and now it looks like it’s out of the game. It’s a pity; I liked its model zoo.
Comparing two deepseek-r1 models to two base ones
DeepSeek’s first generation of reasoning models offers performance comparable to OpenAI-o1, and includes six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
Updated Ollama command list - ls, ps, run, serve, etc
This Ollama CLI cheatsheet focuses on the commands you use every day (ollama ls, ollama serve, ollama run, ollama ps, model management, and common workflows), with examples you can copy/paste.
Next round of LLM tests
Mistral Small was released not long ago. Let’s catch up and test how it performs compared to other LLMs.
A Python code example for RAG reranking