What is the best embedding model?

Qwen3 Embedding on Ollama is probably the best embedding model right now.

Qwen3 Embedding & Reranker Models on Ollama: State-of-the-Art Performance

New awesome LLMs available in Ollama


The Qwen3 Embedding and Reranker models are the latest releases in the Qwen family, specifically designed for advanced text embedding, retrieval, and reranking tasks.

Figure: Qwen3 Embedding and Reranker context length and embedding dimensions.

The Qwen3 Embedding and Reranker models are a significant advance in multilingual natural language processing (NLP), with state-of-the-art performance in text embedding and reranking tasks. Part of the Qwen series developed by Alibaba, they support a wide range of applications, from semantic retrieval to code search. Ollama is a popular open-source platform for hosting and deploying large language models (LLMs), but the official Qwen3 documentation does not describe Ollama integration in detail. The models are published on Hugging Face, GitHub, and ModelScope, which makes local deployment through Ollama or similar tools possible.

Examples using these models

Please see sample code in Go using ollama with these models:

Overview of New Qwen3 Embedding and Reranker Models on Ollama

These models are now available for deployment on Ollama in various sizes, providing state-of-the-art performance and flexibility for a wide range of language and code-related applications.

Key Features and Capabilities

Model Sizes and Flexibility
- Available in multiple sizes: 0.6B, 4B, and 8B parameters for both embedding and reranking tasks.
- The 8B embedding model ranked No. 1 on the MTEB multilingual leaderboard as of June 5, 2025, with a score of 70.58.
- Supports a range of quantization options (Q4, Q5, Q8, etc.) for balancing performance, memory usage, and speed. Q5_K_M is recommended for most users as it preserves most model performance while being resource efficient.
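To get a feel for the memory side of that trade-off, here is a back-of-the-envelope sketch in Go. It assumes the weight size is roughly parameters times bits per weight divided by 8, and it uses nominal bit widths (4, 5, 8, 16) that are an assumption for illustration only. Real GGUF files are somewhat larger, and the context cache needs memory on top of the weights.

```go
package main

import "fmt"

// weightBytes returns a rough lower bound on the size of the model weights:
// parameters * bits per weight / 8. Quantized files are usually a bit larger,
// because they also store per-block scales and keep some tensors at higher precision.
func weightBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

func main() {
	// Nominal bit widths: an assumption for illustration, not measured file sizes.
	quants := []struct {
		name string
		bits float64
	}{{"Q4", 4}, {"Q5", 5}, {"Q8", 8}, {"F16", 16}}

	// Parameter counts taken from the model sizes listed above.
	sizes := []struct {
		name   string
		params float64
	}{{"0.6B", 0.6e9}, {"4B", 4e9}, {"8B", 8e9}}

	for _, s := range sizes {
		for _, q := range quants {
			gb := weightBytes(s.params, q.bits) / 1e9
			fmt.Printf("%-5s %-4s ~%.1f GB\n", s.name, q.name, gb)
		}
	}
}
```

Treat the numbers as a floor when picking a tag for your hardware, not as exact download sizes.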
Architecture and Training
- Built on the Qwen3 foundation, leveraging both dual-encoder (for embeddings) and cross-encoder (for reranking) architectures.
- Embedding model: Processes a single text segment and takes the final-layer hidden state of the last ([EOS]) token as its semantic representation.
- Reranker model: Takes text pairs (e.g., query and document) and outputs a relevance score using a cross-encoder approach.
- Embedding models use a three-stage training paradigm: contrastive pre-training, supervised training with high-quality data, and model merging for optimal generalization and adaptability.
- Reranker models are trained directly with high-quality labeled data for efficiency and effectiveness.
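In the dual-encoder setup, the query and each document are embedded independently, and relevance is then a cheap vector comparison. A minimal Go sketch of that comparison step, assuming cosine similarity as the metric and using toy 3-dimensional vectors in place of real embeddings:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if the lengths differ, the vectors are empty,
// or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Toy vectors standing in for real embeddings.
	query := []float64{1, 0, 1}
	docA := []float64{2, 0, 2} // same direction as the query
	docB := []float64{0, 3, 0} // orthogonal to the query

	fmt.Printf("query vs docA: %.3f\n", cosine(query, docA))
	fmt.Printf("query vs docB: %.3f\n", cosine(query, docB))
}
```

A cross-encoder reranker has no such shortcut: it has to read the query and the document together for every pair, which is why it is usually applied only to a short candidate list.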
Multilingual and Multitask Support
- Supports over 100 languages, including programming languages, enabling robust multilingual, cross-lingual, and code retrieval capabilities.
- Embedding models allow flexible vector definitions (you can choose the output embedding dimension) and accept user-defined instructions to tailor performance to specific tasks or languages.
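Both knobs can be sketched in a few lines of Go. The instruction template below ("Instruct: ...", newline, "Query:...") is an assumption based on my reading of the Qwen3 Embedding model card, so check it against the card for the build you use. The helper names `withInstruction` and `truncateAndNormalize` are hypothetical, and shortening a vector by keeping its leading components and re-normalizing is only a reasonable guess at how a smaller dimension would be applied client-side.

```go
package main

import (
	"fmt"
	"math"
)

// withInstruction prefixes a query with a task description.
// The template is an assumption; verify it against the model card.
func withInstruction(task, query string) string {
	return "Instruct: " + task + "\nQuery:" + query
}

// truncateAndNormalize keeps the first dim components of v and rescales
// the result to unit length. It returns nil if dim is out of range.
func truncateAndNormalize(v []float64, dim int) []float64 {
	if dim <= 0 || dim > len(v) {
		return nil
	}
	out := make([]float64, dim)
	copy(out, v[:dim])

	var sum float64
	for _, x := range out {
		sum += x * x
	}
	if sum == 0 {
		return out // an all-zero vector cannot be normalized
	}
	norm := math.Sqrt(sum)
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	task := "Given a web search query, retrieve relevant passages that answer the query"
	fmt.Println(withInstruction(task, "What is Ollama?"))

	full := []float64{3, 4, 12, 84} // toy vector standing in for an embedding
	fmt.Println(truncateAndNormalize(full, 2))
}
```

Instructions would typically go on the query side only, with documents embedded as plain text, but that is also worth confirming against the model card.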
Performance and Use Cases
- State-of-the-art results in text retrieval, code retrieval, classification, clustering, and bitext mining.
- Reranker models excel in various text retrieval scenarios and can be seamlessly combined with embedding models for end-to-end retrieval pipelines.
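The end-to-end pipeline mentioned above usually has two stages: a cheap first pass that pulls the top k candidates, then a more expensive pass that reorders only those. Here is a hedged Go sketch of that shape. The `Scorer` type and both toy scoring functions are hypothetical stand-ins; in a real setup the first would compare embedding vectors and the second would call a reranker model.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Scorer returns a relevance score for a (query, document) pair; higher is better.
type Scorer func(query, doc string) float64

// topK returns up to k documents with the highest scores, best first.
func topK(query string, docs []string, score Scorer, k int) []string {
	type scored struct {
		doc   string
		score float64
	}
	items := make([]scored, 0, len(docs))
	for _, d := range docs {
		items = append(items, scored{d, score(query, d)})
	}
	sort.SliceStable(items, func(i, j int) bool { return items[i].score > items[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(items) {
		k = len(items)
	}
	out := make([]string, 0, k)
	for _, it := range items[:k] {
		out = append(out, it.doc)
	}
	return out
}

// retrieveThenRerank keeps the k best documents under the coarse scorer,
// then returns the n best of those under the fine scorer.
func retrieveThenRerank(query string, docs []string, coarse, fine Scorer, k, n int) []string {
	return topK(query, topK(query, docs, coarse, k), fine, n)
}

func main() {
	// Toy stand-ins: word overlap plays the embedding stage,
	// exact phrase match plays the reranker stage.
	overlap := func(q, d string) float64 {
		n := 0
		for _, w := range strings.Fields(strings.ToLower(q)) {
			if strings.Contains(strings.ToLower(d), w) {
				n++
			}
		}
		return float64(n)
	}
	phrase := func(q, d string) float64 {
		if strings.Contains(strings.ToLower(d), strings.ToLower(q)) {
			return 1
		}
		return 0
	}
	docs := []string{
		"A model of the solar system",
		"Choosing an embedding model for search",
		"Embedding fonts in a PDF",
		"Gardening tips",
	}
	fmt.Println(retrieveThenRerank("embedding model", docs, overlap, phrase, 3, 1))
}
```

The point of the split is cost: the coarse scorer sees every document, while the fine scorer only sees k of them.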

How to Use on Ollama

You can run these models on Ollama with commands like:

ollama run dengcao/Qwen3-Embedding-8B:Q5_K_M
ollama run dengcao/Qwen3-Reranker-0.6B:F16

Choose the quantization version that best fits your hardware and performance needs.
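For programmatic use, embeddings are normally fetched over Ollama's HTTP API rather than through an interactive session. The sketch below posts to the `/api/embed` endpoint on the default local address `http://localhost:11434`. It assumes the model tag from the command above has already been pulled and that this particular community build supports the embed endpoint, which I have not verified. The helper names are my own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
}

// buildEmbedRequest encodes the JSON body for POST /api/embed.
func buildEmbedRequest(model string, texts []string) ([]byte, error) {
	return json.Marshal(embedRequest{Model: model, Input: texts})
}

// parseEmbedResponse decodes the response body and rejects empty results.
func parseEmbedResponse(body []byte) ([][]float64, error) {
	var r embedResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if len(r.Embeddings) == 0 {
		return nil, fmt.Errorf("response contains no embeddings")
	}
	return r.Embeddings, nil
}

// embed sends texts to an Ollama server and returns one vector per text.
func embed(baseURL, model string, texts []string) ([][]float64, error) {
	payload, err := buildEmbedRequest(model, texts)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/api/embed", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s: %s", resp.Status, body)
	}
	return parseEmbedResponse(body)
}

func main() {
	// Assumes a local Ollama server with this model tag already pulled.
	vecs, err := embed("http://localhost:11434", "dengcao/Qwen3-Embedding-8B:Q5_K_M",
		[]string{"What is the best embedding model?"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "embed failed:", err)
		return
	}
	fmt.Printf("got %d vector(s) with %d dimensions\n", len(vecs), len(vecs[0]))
}
```

Building the request and parsing the response are kept as separate functions so they can be tested without a running server.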

Summary Table

Model Type	Sizes Available	Key Strengths	Multilingual Support	Quantization Options
Embedding	0.6B, 4B, 8B	Top MTEB scores, flexible, efficient, SOTA	Yes (100+ languages)	Q4, Q5, Q6, Q8, etc.
Reranker	0.6B, 4B, 8B	Excels at text pair relevance, efficient, flexible	Yes	F16, Q4, Q5, etc.

Awesome news!

The Qwen3 Embedding and Reranker models on Ollama represent a significant leap in multilingual, multitask text and code retrieval capabilities. With flexible deployment options, strong benchmark performance, and support for a wide range of languages and tasks, they are well-suited for both research and production environments.

Model zoo - pleasure for the eye now

Qwen3 Embedding

- Qwen3 Embedding 8B: https://ollama.com/dengcao/Qwen3-Embedding-8B
- Qwen3 Embedding 4B: https://ollama.com/dengcao/Qwen3-Embedding-4B/tags
- Qwen3 Embedding 0.6B: https://ollama.com/dengcao/Qwen3-Embedding-0.6B/tags

Qwen3 Reranker

- Qwen3 Reranker 8B: https://ollama.com/dengcao/Qwen3-Reranker-8B (tags include dengcao/Qwen3-Reranker-8B:Q3_K_M and dengcao/Qwen3-Reranker-8B:Q5_K_M)
- Qwen3 Reranker 4B: https://ollama.com/dengcao/Qwen3-Reranker-4B/tags (tags include dengcao/Qwen3-Reranker-4B:Q5_K_M)
- Qwen3 Reranker 0.6B: https://ollama.com/dengcao/Qwen3-Reranker-0.6B/tags

Nice!