Comparison of Hugo Page Translation Quality - LLMs on Ollama

qwen3 8b, 14b and 30b, devstral 24b, mistral small 24b


In this test I’m comparing how different LLMs hosted on Ollama translate a Hugo page from English to German. The three pages I tested covered different topics and had nicely structured markdown: headers, lists, tables, links, etc.

All these models ran on an NVIDIA GPU with 16GB of VRAM. Some of them fit completely into VRAM; some (the 24b and 30b ones) didn’t and spilled over to the CPU.

Nevertheless, I tested those too.

For two of the pages, the translation is still in progress.

The Ollama models I’ve tested:

  • qwen3:8b
  • qwen3:14b
  • qwen3:30b-a3b
  • devstral:24b
  • mistral-small3.1:24b

After running the translation prompts on all of those models, I asked a smart Anthropic model to analyse the quality of the translations, compare them, and provide metrics for each model.
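
For reference, here is a minimal sketch of how such a translation prompt can be sent to a locally running Ollama model through its /api/generate endpoint. The model name, prompt wording and file paths are illustrative, not the exact ones used in these tests:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read the English source page (hypothetical path).
	page, err := os.ReadFile("content/post/example/index.md")
	if err != nil {
		panic(err)
	}

	// Build a non-streaming request for Ollama's /api/generate endpoint.
	// Model name and prompt wording are illustrative only.
	reqBody, err := json.Marshal(map[string]any{
		"model": "qwen3:14b",
		"prompt": "Translate the following Hugo markdown page from English to German. " +
			"Keep the front matter, markdown structure and Hugo shortcodes intact.\n\n" + string(page),
		"stream": false,
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(reqBody))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With stream=false the full model output arrives in the "response" field.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}

	// Save as the German variant, e.g. index.de.qwen3-14b.md (hypothetical name).
	if err := os.WriteFile("index.de.qwen3-14b.md", []byte(out.Response), 0o644); err != nil {
		panic(err)
	}
	fmt.Println("translation written")
}
```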

I also tried magistral:24b, but it was thinking too much, so it was slow and ate up all the context. So: not for my GPU.

The simple conclusion: TL;DR

The best one is Mistral Small 3.1 24b. It doesn’t fit into 16GB of VRAM, so it was a bit slow on my hardware.

The next two best: Qwen 3 14b and Qwen 3 30b. The 14b fits nicely into VRAM, while the 30b is an MoE model, so it is quite fast too. On my hardware their speed was similar.

Test 1: Philosophical topic

The markdown of a Hugo-based website page on an awesome philosophical topic - a description of the Ad Baculum logical fallacy, with examples.

index.de.devstral-24b.md - Rating: 7/10 Strengths:

  • Good technical terminology translation (“argumentum ad baculum Fehlschluss”)
  • Maintains proper link formatting and structure
  • Accurate translation of complex philosophical concepts
  • Proper German grammar and syntax

Weaknesses:

  • “verunstaltet” for “tainted” is somewhat awkward; “getrübt” would be more natural
  • Some phrases feel slightly stiff or overly literal
  • Minor flow issues in certain sentences

index.de.mistral-small3.1-24b.md - Rating: 8.5/10 Strengths:

  • Very natural German language flow
  • Excellent translation of technical terms
  • Good preservation of original meaning and tone
  • Proper grammar and sentence structure
  • Uses “getrübt” for “tainted” which is more natural

Weaknesses:

  • Very minor inconsistencies in terminology
  • Occasional slight deviations from source structure

index.de.qwen3-8b.md - Rating: 6.5/10 Strengths:

  • Accurate translation of main concepts
  • Technical terms properly handled
  • Maintains article structure

Weaknesses:

  • Some awkward phrasing (“kann sich auch in alltäglichen Interaktionen breiten”)
  • Less natural German flow in several passages
  • Some grammatical constructions feel forced
  • Missing word in “ein Werkzeug zur Unterdrückung von Widerstand und zur Gestaltung von Ergebnissen in Gunst der Machtbesitzer darstellt” (grammar error)

index.de.qwen3-14b.md - Rating: 8/10 Strengths:

  • Natural German language flow
  • Excellent handling of technical terminology
  • Good preservation of original meaning
  • Proper grammar and syntax
  • Consistent style throughout

Weaknesses:

  • Minor variations in terminology consistency
  • Some sentences could be more concise

index.de.qwen3-30b-a3b.md - Rating: 7.5/10 Strengths:

  • Good overall translation quality
  • Natural German expression
  • Technical terms well translated
  • Maintains readability and flow
  • Good preservation of meaning

Weaknesses:

  • Some minor inconsistencies in word choice
  • Occasional slightly awkward phrasing
  • Missing article in final paragraph “über das [Terrain der] Argumentation”

Summary Rankings

| File                             | Translation Quality Score |
|----------------------------------|---------------------------|
| index.de.mistral-small3.1-24b.md | 8.5/10                    |
| index.de.qwen3-14b.md            | 8.0/10                    |
| index.de.qwen3-30b-a3b.md        | 7.5/10                    |
| index.de.devstral-24b.md         | 7.0/10                    |
| index.de.qwen3-8b.md             | 6.5/10                    |

Best Translation: index.de.mistral-small3.1-24b.md - Most natural German flow with excellent technical accuracy

Areas for Improvement: The Qwen3-8b version needs grammatical corrections and more natural phrasing, while others mainly need minor refinements in word choice and consistency.

Test 2: Translation of the page about Qwen3 Embedding & Reranker Models on Ollama.

Based on my analysis of the German translations compared to the original English text, here are the quality ratings on a scale of 1-10:

| File | LLM Model | Translation Quality Score | Comments |
|------|-----------|---------------------------|----------|
| index.de.devstral-24b.md | Devstral 24B | 8.5/10 | Excellent overall translation with natural German flow, proper technical terminology, and complete Hugo shortcode preservation. Minor deductions for some slightly awkward phrasing. |
| index.de.mistral-small3.1-24b.md | Mistral Small 3.1 24B | 8.7/10 | Very high quality translation with consistent terminology, natural German language, and perfect formatting preservation. Slightly better flow than Devstral. |
| index.de.qwen3-8b.md | Qwen3 8B | 6.5/10 | Mixed quality - contains English-German language mixing in the opening paragraph (“The [Qwen3 Embedding and Reranker models]” instead of full German). The rest is well translated but inconsistent. |
| index.de.qwen3-14b.md | Qwen3 14B | 8.2/10 | Good German translation with proper technical terminology and natural flow. Well-structured and consistent throughout. |
| index.de.qwen3-30b-a3b.md | Qwen3 30B | 9.0/10 | Excellent translation quality with very natural German language, perfect technical terminology consistency, and outstanding readability. Best overall translation. |

Key Evaluation Criteria:

Strengths observed across translations:

  • Hugo shortcode preservation ({{< ref >}} syntax)
  • Technical terminology accuracy
  • Markdown formatting maintenance
  • Table structure preservation
  • Image reference consistency

Quality differentiators:

  1. Language consistency - Qwen3 8B had English-German mixing issues
  2. Natural German flow - Qwen3 30B achieved the most natural German expression
  3. Technical precision - All models handled technical terms well
  4. Formatting preservation - All maintained Hugo structure perfectly (a simple mechanical check is sketched below)
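
The review above is an LLM-based, qualitative assessment; as a purely mechanical complement, a tiny script can compare the number of Hugo shortcodes in the source page and in a translated file. This is only a sketch with hypothetical file names, not part of the evaluation done here:

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// countShortcodes counts occurrences of Hugo shortcodes like {{< ref "..." >}} in a file.
func countShortcodes(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	re := regexp.MustCompile(`\{\{<[^>]*>\}\}`)
	return len(re.FindAll(data, -1)), nil
}

func main() {
	// Hypothetical file names: the English source and one translated variant.
	src, err := countShortcodes("index.md")
	if err != nil {
		panic(err)
	}
	dst, err := countShortcodes("index.de.qwen3-14b.md")
	if err != nil {
		panic(err)
	}

	fmt.Printf("shortcodes: source=%d, translation=%d\n", src, dst)
	if src != dst {
		fmt.Println("warning: shortcode count differs - some formatting may have been lost")
	}
}
```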

Ranking by Quality:

  1. Qwen3 30B (9.0/10) - Best overall
  2. Mistral Small 3.1 24B (8.7/10) - Very strong
  3. Devstral 24B (8.5/10) - Excellent
  4. Qwen3 14B (8.2/10) - Good
  5. Qwen3 8B (6.5/10) - Adequate with issues

The larger models (Qwen3 30B and Mistral Small 3.1 24B) produced the highest quality translations, while the smaller Qwen3 8B model showed clear limitations in consistency.

Test 3: Reranking with Ollama and Qwen3 Embedding model - in Go

Basically, it is a previous post on this website: Reranking text documents with Ollama and Qwen3 Embedding model - in Go.

Based on my detailed analysis of the German translations, here’s the comparative assessment:

Translation Quality Analysis (1-10 Scale)

| File | Model | Score | Key Assessment |
|------|-------|-------|----------------|
| index.de.devstral-24b.md | DeVistral 24B | 9/10 | Excellent - Most consistent and professional |
| index.de.mistral-small3.1-24b.md | Mistral Small 3.1 24B | 8.5/10 | Very Good - Natural but minor inconsistencies |
| index.de.qwen3-14b.md | Qwen3 14B | 7/10 | Good - Solid with some mixed elements |
| index.de.qwen3-30b-a3b.md | Qwen3 30B | 6.5/10 | Fair - Contains untranslated English |
| index.de.qwen3-8b.md | Qwen3 8B | 5.5/10 | Poor - Multiple translation failures |

Detailed Comparative Analysis:

🥇 DeVistral 24B (9/10) - Winner Strengths:

  • Technical Precision: Uses “ollama API-Fehler” (lowercase, more natural)
  • Consistent Translation: “handelt mit parallelen Anfragen” (sophisticated phrasing)
  • Professional Quality: Complete German translation throughout
  • Formatting: Perfect preservation of markdown structure

Minor Issues: Very few, mostly stylistic preferences

🥈 Mistral Small 3.1 24B (8.5/10) Strengths:

  • Natural German flow
  • Good technical terminology: “Ollama-API-Fehler” (hyphenated, formal style)
  • Consistent “verarbeitet parallele Anfragen”

Issues:

  • Slightly less sophisticated phrasing in some technical contexts
  • Minor formatting variations

🥉 Qwen3 14B (7/10) Strengths:

  • Generally accurate meaning preservation
  • Proper German grammar structure

Issues:

  • Some awkward constructions
  • Mixed quality in technical terminology
  • Contains “```” at start and formatting issues

Qwen3 30B (6.5/10) Critical Flaw:

  • Contains untranslated English: “This little” appears directly in German text
  • Despite larger model size, shows translation inconsistencies
  • Technical accuracy is present but delivery is flawed

Qwen3 8B (5.5/10) Major Issues:

  • Untranslated sections: “This little” left in English
  • Inconsistent terminology: Uses “RANGORDNUNG” vs standard “RANGIERUNG”
  • Poor flow: Less natural German constructions
  • Shows clear limitations of smaller model size

Key Differentiators Found:

  1. API Error Translation:

    • DeVistral: ollama API-Fehler (natural, lowercase)
    • Others: Ollama-API-Fehler (formal, hyphenated)
  2. Parallel Requests:

    • DeVistral: “handelt mit parallelen Anfragen” (sophisticated)
    • Mistral/Qwen3-14B/30B: “verarbeitet parallele Anfragen” (standard)
    • Qwen3-8B: “behandelt parallele Anfragen” (less precise)
  3. Section Headers:

    • Most: “RANGIERUNG NACH ÄHNLICHKEIT” (standard)
    • Qwen3-8B: “RANGORDNUNG DURCH ÄHNLICHKEIT” (awkward)
  4. English Contamination:

    • Qwen3-30B & Qwen3-8B: Leave “This little” untranslated
    • Others: Properly translate to “Dieses kleine”

Winner: DeVistral 24B - Shows the most consistent professional-level German translation with sophisticated technical terminology and complete language conversion.


Hehe DeVistral :). But that’s another LLM’s error.