Reranking text documents with Ollama and Qwen3 Embedding model - in Go
Implementing RAG? Here are some code snippets in Golang.
This small reranking example in Go calls Ollama to generate embeddings for the query and for each candidate document, then sorts the documents by cosine similarity in descending order.
We already did a similar exercise - Reranking with embedding models - but that was in Python, with a different LLM, and almost a year ago.
TL;DR
The results look very good, and the speed is about 0.128s per document. The query embedding is included in that time, and so are sorting and printing.
LLM memory consumption:
Even though the model size on disk (ollama ls) is less than 3 GB:
dengcao/Qwen3-Embedding-4B:Q5_K_M 7e8c9ad6885b 2.9 GB
in GPU VRAM it takes quite a bit more: 5.5 GB (ollama ps):
NAME ID SIZE
dengcao/Qwen3-Embedding-4B:Q5_K_M 7e8c9ad6885b 5.5 GB
If you have an 8 GB GPU, it should be OK.
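If the model is not present locally yet, it can be pulled first with the usual Ollama command (same tag as in the listings above):
ollama pull dengcao/Qwen3-Embedding-4B:Q5_K_M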
Testing Reranking with Embeddings on Ollama - Sample Output
In all three test cases, reranking with embeddings from the dengcao/Qwen3-Embedding-4B:Q5_K_M Ollama model worked very well. See for yourself.
We have 7 files, each containing text describing what its filename says:
- ai_introduction.txt
- machine_learning.md
- qwen3-reranking-models.md
- ollama-parallelism.md
- ollama-reranking-models.md
- programming_basics.txt
- setup.log
Test runs:
Reranking test: What is artificial intelligence and how does machine learning work?
./rnk example_query.txt example_docs/
Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query.txt, target directory: example_docs/
Query: What is artificial intelligence and how does machine learning work?
Found 7 documents
Extracting query embedding...
Processing documents...
=== RANKING BY SIMILARITY ===
1. example_docs/ai_introduction.txt (Score: 0.451)
2. example_docs/machine_learning.md (Score: 0.388)
3. example_docs/qwen3-reranking-models.md (Score: 0.354)
4. example_docs/ollama-parallelism.md (Score: 0.338)
5. example_docs/ollama-reranking-models.md (Score: 0.318)
6. example_docs/programming_basics.txt (Score: 0.296)
7. example_docs/setup.log (Score: 0.282)
Processed 7 documents in 0.899s (avg: 0.128s per document)
Reranking test: How ollama handles parallel requests?
./rnk example_query2.txt example_docs/
Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query2.txt, target directory: example_docs/
Query: How ollama handles parallel requests?
Found 7 documents
Extracting query embedding...
Processing documents...
=== RANKING BY SIMILARITY ===
1. example_docs/ollama-parallelism.md (Score: 0.557)
2. example_docs/qwen3-reranking-models.md (Score: 0.532)
3. example_docs/ollama-reranking-models.md (Score: 0.498)
4. example_docs/ai_introduction.txt (Score: 0.366)
5. example_docs/machine_learning.md (Score: 0.332)
6. example_docs/programming_basics.txt (Score: 0.307)
7. example_docs/setup.log (Score: 0.257)
Processed 7 documents in 0.858s (avg: 0.123s per document)
Reranking test: How can we do the reranking of the document with ollama?
./rnk example_query3.txt example_docs/
Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query3.txt, target directory: example_docs/
Query: How can we do the reranking of the document with ollama?
Found 7 documents
Extracting query embedding...
Processing documents...
=== RANKING BY SIMILARITY ===
1. example_docs/ollama-reranking-models.md (Score: 0.552)
2. example_docs/ollama-parallelism.md (Score: 0.525)
3. example_docs/qwen3-reranking-models.md (Score: 0.524)
4. example_docs/ai_introduction.txt (Score: 0.369)
5. example_docs/machine_learning.md (Score: 0.346)
6. example_docs/programming_basics.txt (Score: 0.316)
7. example_docs/setup.log (Score: 0.279)
Processed 7 documents in 0.882s (avg: 0.126s per document)
Go Source Code
Put it all into one folder and compile it like this:
go build -o rnk
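Before building, the folder also needs a go.mod, since main.go pulls in github.com/spf13/cobra. A minimal setup might look like this (the module name rnk is just a placeholder):
go mod init rnk
go get github.com/spf13/cobra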
Feel free to use it for any fun or commercial purpose, or upload it to GitHub if you like. MIT license.
main.go
package main

import (
    "fmt"
    "log"
    "os"
    "sort"
    "time"

    "github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
    Use:   "rnk [query-file] [target-directory]",
    Short: "RAG system using Ollama embeddings",
    Long:  "A simple RAG system that extracts embeddings and ranks documents using Ollama",
    Args:  cobra.ExactArgs(2),
    Run:   runRnk,
}

var (
    embeddingModel string
    ollamaBaseURL  string
)

func init() {
    rootCmd.Flags().StringVarP(&embeddingModel, "model", "m", "dengcao/Qwen3-Embedding-4B:Q5_K_M", "Embedding model to use")
    rootCmd.Flags().StringVarP(&ollamaBaseURL, "url", "u", "http://localhost:11434", "Ollama base URL")
}

func main() {
    if err := rootCmd.Execute(); err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
}

func runRnk(cmd *cobra.Command, args []string) {
    queryFile := args[0]
    targetDir := args[1]
    startTime := time.Now()

    fmt.Printf("Using embedding model: %s\n", embeddingModel)
    fmt.Printf("Ollama base URL: %s\n", ollamaBaseURL)
    fmt.Printf("Processing query file: %s, target directory: %s\n", queryFile, targetDir)

    // Read query from file
    query, err := readQueryFromFile(queryFile)
    if err != nil {
        log.Fatalf("Error reading query file: %v", err)
    }
    fmt.Printf("Query: %s\n", query)

    // Find all text files in target directory
    documents, err := findTextFiles(targetDir)
    if err != nil {
        log.Fatalf("Error finding text files: %v", err)
    }
    fmt.Printf("Found %d documents\n", len(documents))

    // Extract embeddings for query
    fmt.Println("Extracting query embedding...")
    queryEmbedding, err := getEmbedding(query, embeddingModel, ollamaBaseURL)
    if err != nil {
        log.Fatalf("Error getting query embedding: %v", err)
    }

    // Process documents
    fmt.Println("Processing documents...")
    validDocs := make([]Document, 0)
    for _, doc := range documents {
        embedding, err := getEmbedding(doc.Content, embeddingModel, ollamaBaseURL)
        if err != nil {
            fmt.Printf("Warning: Failed to get embedding for %s: %v\n", doc.Path, err)
            continue
        }
        similarity := cosineSimilarity(queryEmbedding, embedding)
        doc.Score = similarity
        validDocs = append(validDocs, doc)
    }

    if len(validDocs) == 0 {
        log.Fatalf("No documents could be processed successfully")
    }

    // Sort by similarity score (descending)
    sort.Slice(validDocs, func(i, j int) bool {
        return validDocs[i].Score > validDocs[j].Score
    })

    // Display results
    fmt.Println("\n=== RANKING BY SIMILARITY ===")
    for i, doc := range validDocs {
        fmt.Printf("%d. %s (Score: %.3f)\n", i+1, doc.Path, doc.Score)
    }

    totalTime := time.Since(startTime)
    avgTimePerDoc := totalTime / time.Duration(len(validDocs))
    fmt.Printf("\nProcessed %d documents in %.3fs (avg: %.3fs per document)\n",
        len(validDocs), totalTime.Seconds(), avgTimePerDoc.Seconds())
}
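The embedding model and the Ollama base URL default to the values registered in init() above and can be overridden with the -m and -u flags, for example:
./rnk -m dengcao/Qwen3-Embedding-4B:Q5_K_M -u http://localhost:11434 example_query.txt example_docs/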
documents.go
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

func readQueryFromFile(filename string) (string, error) {
    content, err := os.ReadFile(filename)
    if err != nil {
        return "", err
    }
    return strings.TrimSpace(string(content)), nil
}

func findTextFiles(dir string) ([]Document, error) {
    var documents []Document
    err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.IsDir() && isTextFile(path) {
            content, err := os.ReadFile(path)
            if err != nil {
                fmt.Printf("Warning: Could not read file %s: %v\n", path, err)
                return nil
            }
            documents = append(documents, Document{
                Path:    path,
                Content: string(content),
            })
        }
        return nil
    })
    return documents, err
}

func isTextFile(filename string) bool {
    ext := strings.ToLower(filepath.Ext(filename))
    textExts := []string{".txt", ".md", ".rst", ".csv", ".json", ".xml", ".html", ".htm", ".log"}
    for _, textExt := range textExts {
        if ext == textExt {
            return true
        }
    }
    return false
}
embeddings.go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func getEmbedding(text string, model string, ollamaBaseURL string) ([]float64, error) {
    req := OllamaEmbeddingRequest{
        Model:  model,
        Prompt: text,
    }
    jsonData, err := json.Marshal(req)
    if err != nil {
        return nil, err
    }

    resp, err := http.Post(ollamaBaseURL+"/api/embeddings", "application/json", bytes.NewBuffer(jsonData))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        body, _ := io.ReadAll(resp.Body)
        return nil, fmt.Errorf("ollama API error: %s", string(body))
    }

    var embeddingResp OllamaEmbeddingResponse
    if err := json.NewDecoder(resp.Body).Decode(&embeddingResp); err != nil {
        return nil, err
    }
    return embeddingResp.Embedding, nil
}
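getEmbedding talks to Ollama's /api/embeddings endpoint, which takes a model and a prompt and returns a single embedding vector. The request and response shapes the code expects can be checked directly with curl (the prompt text here is just an example):
curl http://localhost:11434/api/embeddings -d '{"model": "dengcao/Qwen3-Embedding-4B:Q5_K_M", "prompt": "What is artificial intelligence?"}'
which answers with a JSON object of the form {"embedding": [ ... ]}.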
similarity.go
package main

import "math"

// cosineSimilarity returns the cosine similarity of two equal-length vectors.
// Mismatched lengths and zero-norm vectors score 0.
func cosineSimilarity(a, b []float64) float64 {
    if len(a) != len(b) {
        return 0
    }
    var dotProduct, normA, normB float64
    for i := range a {
        dotProduct += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dotProduct / (math.Sqrt(normA) * math.Sqrt(normB))
}
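A quick sanity check for cosineSimilarity as a standard Go test; the file name similarity_test.go and the test vectors are mine, not part of the original tool:
similarity_test.go
package main

import (
    "math"
    "testing"
)

func TestCosineSimilarity(t *testing.T) {
    // Identical vectors should score (numerically) 1.
    if s := cosineSimilarity([]float64{1, 2, 3}, []float64{1, 2, 3}); math.Abs(s-1) > 1e-9 {
        t.Errorf("expected ~1.0, got %f", s)
    }
    // Orthogonal vectors should score 0.
    if s := cosineSimilarity([]float64{1, 0}, []float64{0, 1}); s != 0 {
        t.Errorf("expected 0.0, got %f", s)
    }
    // Mismatched lengths are treated as 0 by design.
    if s := cosineSimilarity([]float64{1, 2}, []float64{1, 2, 3}); s != 0 {
        t.Errorf("expected 0.0, got %f", s)
    }
}
Run it with go test in the same folder.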
types.go
package main

// OllamaEmbeddingRequest represents the request payload for Ollama embedding API
type OllamaEmbeddingRequest struct {
    Model  string `json:"model"`
    Prompt string `json:"prompt"`
}

// OllamaEmbeddingResponse represents the response from Ollama embedding API
type OllamaEmbeddingResponse struct {
    Embedding []float64 `json:"embedding"`
}

// Document represents a document with its metadata
type Document struct {
    Path    string
    Content string
    Score   float64
}
Useful links
- Ollama cheatsheet
- Qwen3 Embedding & Reranker Models on Ollama: State-of-the-Art Performance
- https://en.wikipedia.org/wiki/Retrieval-augmented_generation
- Install and Configure Ollama models location
- How Ollama Handles Parallel Requests
- Writing effective prompts for LLMs
- Testing LLMs: gemma2, qwen2 and Mistral Nemo on Ollama
- LLMs comparison: Mistral Small, Gemma 2, Qwen 2.5, Mistral Nemo, LLama3 and Phi - On Ollama
- Test: How Ollama is using Intel CPU Performance and Efficient Cores
- Reranking with embedding models on Ollama in Python
- Comparing LLM Summarising Abilities
- Cloud LLM Providers