AI - Rost Glukhov | Personal site and technical blog

LLM Performance and PCIe Lanes: Key Considerations

How do PCIe lanes affect LLM performance? It depends on the task. For training and multi-GPU inference, the performance drop is significant.

Convert HTML content to Markdown using LLM and Ollama

The Ollama model library includes models that can convert HTML content to Markdown, which is useful for content conversion tasks.

Search is best for quick, straightforward information retrieval using keywords.
Deep Search excels at understanding context and intent, delivering more relevant and comprehensive results for complex queries.

Here I list some AI-assisted coding tools and AI coding assistants, along with their strengths.

Using LLMs is not very expensive, so there may be no need to buy an awesome new GPU. Here is a list of cloud LLM providers and the LLMs they host.

Test: How Ollama is using Intel CPU Performance and Efficient Cores

I’ve got a theory to test: would utilising ALL cores on an Intel CPU raise the speed of LLMs? It bugs me that the new gemma3 27-billion-parameter model (gemma3:27b, 17GB on Ollama) does not fit into the 16GB of VRAM on my GPU and runs partially on the CPU.

In the midst of the modern world’s turmoil, here I’m comparing the tech specs of different cards suitable for AI tasks (deep learning, object detection and LLMs). They are all incredibly expensive, though.

When the Ollama server receives two requests at the same time, its behavior depends on its configuration and available system resources.

Vibe coding is an AI-driven programming approach where developers describe desired functionality in natural language, allowing AI tools to generate code automatically.

I’ve used MMDetection (mmengine, mmdet, mmcv) quite a bit, and now it looks like it’s out of the game. It’s a pity. I liked its model zoo.

DeepSeek’s first generation of reasoning models, with performance comparable to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

Here is a list, with examples, of the most useful Ollama commands (an Ollama commands cheatsheet) that I compiled some time ago. Hopefully it will be useful to you.

Mistral Small was released not long ago. Let’s catch up and test how it performs compared to other LLMs.

Reranking is a second step in Retrieval Augmented Generation (RAG) systems, right between Retrieving and Generating.

Recently Black Forest Labs published a set of text-to-image AI models. These models are said to have much higher output quality. Let’s try them out.

AI

LLM Performance and PCIe Lanes: Key Considerations

Convert HTML content to Markdown using LLM and Ollama

Search vs Deepsearch vs Deep Research

AI Coding Assistants comparison

Cloud LLM Providers

Test: How Ollama is using Intel CPU Performance and Efficient Cores

Comparing NVidia GPU suitability for AI

How Ollama Handles Parallel Requests

Vibe Coding - Meaning and Description

MMdetection is not supported anymore

Testing Deepseek-R1 on Ollama

Ollama Cheatsheet

Mistral Small, Gemma 2, Qwen 2.5, Mistral Nemo, LLama3 and Phi - LLM Test

Reranking with embedding models

Flux text to image AI model