Reranking text documents with Ollama and Qwen3 Embedding model - in Go

Implementing RAG? Here are some code snippets in Go.


This little Go reranking example calls Ollama to generate embeddings for the query and for each candidate document, then sorts the documents by cosine similarity in descending order.

We already did a similar exercise - Reranking with embedding models - but that was in Python, with a different LLM, and almost a year ago.

llamas of different heights - reranking with ollama

TL;DR

The results look very good, and the speed is about 0.128s per document. The query embedding is included in that timing, as are sorting and printing.

Model memory consumption: the model size on disk (ollama ls) is less than 3GB:

dengcao/Qwen3-Embedding-4B:Q5_K_M           7e8c9ad6885b    2.9 GB

But in GPU VRAM it takes quite a bit more: 5.5GB (ollama ps):

NAME                                 ID              SIZE
dengcao/Qwen3-Embedding-4B:Q5_K_M    7e8c9ad6885b    5.5 GB 

If you have an 8GB GPU, you should be OK.

Testing Reranking with Embeddings on Ollama - Sample Output

In all three test cases, reranking with embeddings using the dengcao/Qwen3-Embedding-4B:Q5_K_M Ollama model was awesome! See for yourself.

We have 7 files, each containing text describing what its filename says:

  • ai_introduction.txt
  • machine_learning.md
  • qwen3-reranking-models.md
  • ollama-parallelism.md
  • ollama-reranking-models.md
  • programming_basics.txt
  • setup.log

Test runs:

Reranking test: What is artificial intelligence and how does machine learning work?

./rnk example_query.txt example_docs/

Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query.txt, target directory: example_docs/
Query: What is artificial intelligence and how does machine learning work?
Found 7 documents
Extracting query embedding...
Processing documents...

=== RANKING BY SIMILARITY ===
1. example_docs/ai_introduction.txt (Score: 0.451)
2. example_docs/machine_learning.md (Score: 0.388)
3. example_docs/qwen3-reranking-models.md (Score: 0.354)
4. example_docs/ollama-parallelism.md (Score: 0.338)
5. example_docs/ollama-reranking-models.md (Score: 0.318)
6. example_docs/programming_basics.txt (Score: 0.296)
7. example_docs/setup.log (Score: 0.282)

Processed 7 documents in 0.899s (avg: 0.128s per document)

Reranking test: How ollama handles parallel requests?

./rnk example_query2.txt example_docs/

Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query2.txt, target directory: example_docs/
Query: How ollama handles parallel requests?
Found 7 documents
Extracting query embedding...
Processing documents...

=== RANKING BY SIMILARITY ===
1. example_docs/ollama-parallelism.md (Score: 0.557)
2. example_docs/qwen3-reranking-models.md (Score: 0.532)
3. example_docs/ollama-reranking-models.md (Score: 0.498)
4. example_docs/ai_introduction.txt (Score: 0.366)
5. example_docs/machine_learning.md (Score: 0.332)
6. example_docs/programming_basics.txt (Score: 0.307)
7. example_docs/setup.log (Score: 0.257)

Processed 7 documents in 0.858s (avg: 0.123s per document)

Reranking test: How can we do the reranking of the document with ollama?

./rnk example_query3.txt example_docs/

Using embedding model: dengcao/Qwen3-Embedding-4B:Q5_K_M
Ollama base URL: http://localhost:11434
Processing query file: example_query3.txt, target directory: example_docs/
Query: How can we do the reranking of the document with ollama?
Found 7 documents
Extracting query embedding...
Processing documents...

=== RANKING BY SIMILARITY ===
1. example_docs/ollama-reranking-models.md (Score: 0.552)
2. example_docs/ollama-parallelism.md (Score: 0.525)
3. example_docs/qwen3-reranking-models.md (Score: 0.524)
4. example_docs/ai_introduction.txt (Score: 0.369)
5. example_docs/machine_learning.md (Score: 0.346)
6. example_docs/programming_basics.txt (Score: 0.316)
7. example_docs/setup.log (Score: 0.279)

Processed 7 documents in 0.882s (avg: 0.126s per document)

Go Source Code

Put it all into a folder and compile it like this:

go build -o rnk
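
Note that main.go imports github.com/spf13/cobra, so the project needs a Go module before it will build. If you don't have one yet, something like this works (the module name rnk is just an example):

go mod init rnk
go get github.com/spf13/cobra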

Feel free to use it for any entertaining or commercial purpose, or upload it to GitHub if you like. MIT license.

main.go

package main

import (
	"fmt"
	"log"
	"os"
	"sort"
	"time"

	"github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
	Use:   "rnk [query-file] [target-directory]",
	Short: "RAG system using Ollama embeddings",
	Long:  "A simple RAG system that extracts embeddings and ranks documents using Ollama",
	Args:  cobra.ExactArgs(2),
	Run:   runRnk,
}

var (
	embeddingModel string
	ollamaBaseURL  string
)

func init() {
	rootCmd.Flags().StringVarP(&embeddingModel, "model", "m", "dengcao/Qwen3-Embedding-4B:Q5_K_M", "Embedding model to use")
	rootCmd.Flags().StringVarP(&ollamaBaseURL, "url", "u", "http://localhost:11434", "Ollama base URL")
}

func main() {
	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

func runRnk(cmd *cobra.Command, args []string) {
	queryFile := args[0]
	targetDir := args[1]

	startTime := time.Now()

	fmt.Printf("Using embedding model: %s\n", embeddingModel)
	fmt.Printf("Ollama base URL: %s\n", ollamaBaseURL)
	fmt.Printf("Processing query file: %s, target directory: %s\n", queryFile, targetDir)

	// Read query from file
	query, err := readQueryFromFile(queryFile)
	if err != nil {
		log.Fatalf("Error reading query file: %v", err)
	}
	fmt.Printf("Query: %s\n", query)

	// Find all text files in target directory
	documents, err := findTextFiles(targetDir)
	if err != nil {
		log.Fatalf("Error finding text files: %v", err)
	}
	fmt.Printf("Found %d documents\n", len(documents))

	// Extract embeddings for query
	fmt.Println("Extracting query embedding...")
	queryEmbedding, err := getEmbedding(query, embeddingModel, ollamaBaseURL)
	if err != nil {
		log.Fatalf("Error getting query embedding: %v", err)
	}

	// Process documents
	fmt.Println("Processing documents...")
	validDocs := make([]Document, 0)

	for _, doc := range documents {
		embedding, err := getEmbedding(doc.Content, embeddingModel, ollamaBaseURL)
		if err != nil {
			fmt.Printf("Warning: Failed to get embedding for %s: %v\n", doc.Path, err)
			continue
		}

		similarity := cosineSimilarity(queryEmbedding, embedding)
		doc.Score = similarity
		validDocs = append(validDocs, doc)
	}

	if len(validDocs) == 0 {
		log.Fatalf("No documents could be processed successfully")
	}

	// Sort by similarity score (descending)
	sort.Slice(validDocs, func(i, j int) bool {
		return validDocs[i].Score > validDocs[j].Score
	})

	// Display results
	fmt.Println("\n=== RANKING BY SIMILARITY ===")
	for i, doc := range validDocs {
		fmt.Printf("%d. %s (Score: %.3f)\n", i+1, doc.Path, doc.Score)
	}

	totalTime := time.Since(startTime)
	avgTimePerDoc := totalTime / time.Duration(len(validDocs))

	fmt.Printf("\nProcessed %d documents in %.3fs (avg: %.3fs per document)\n",
		len(validDocs), totalTime.Seconds(), avgTimePerDoc.Seconds())
}
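
The loop above sends the embedding requests one at a time. If your Ollama server is set up to accept parallel requests (see OLLAMA_NUM_PARALLEL), you could overlap the HTTP calls. Below is a hypothetical sketch of such a variant - rankConcurrently and the worker count are my own names, it is not part of the tool above, and it needs "sync" added to main.go's imports:

// rankConcurrently is a hypothetical parallel alternative to the sequential
// document loop in runRnk. It embeds up to `workers` documents at a time.
func rankConcurrently(documents []Document, queryEmbedding []float64, model, baseURL string, workers int) []Document {
	sem := make(chan struct{}, workers) // bounds the number of in-flight requests
	var mu sync.Mutex
	var wg sync.WaitGroup
	valid := make([]Document, 0, len(documents))

	for _, doc := range documents {
		wg.Add(1)
		go func(doc Document) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done

			embedding, err := getEmbedding(doc.Content, model, baseURL)
			if err != nil {
				fmt.Printf("Warning: Failed to get embedding for %s: %v\n", doc.Path, err)
				return
			}
			doc.Score = cosineSimilarity(queryEmbedding, embedding)

			mu.Lock()
			valid = append(valid, doc)
			mu.Unlock()
		}(doc)
	}
	wg.Wait()
	return valid
}

Whether this actually speeds things up depends on how many requests the Ollama server processes concurrently; otherwise the requests just queue on the server side.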

documents.go

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readQueryFromFile reads the query text from a file and trims surrounding whitespace.
func readQueryFromFile(filename string) (string, error) {
	content, err := os.ReadFile(filename)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(content)), nil
}

// findTextFiles walks the target directory and collects every readable text file as a Document.
func findTextFiles(dir string) ([]Document, error) {
	var documents []Document

	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}

		if !info.IsDir() && isTextFile(path) {
			content, err := os.ReadFile(path)
			if err != nil {
				fmt.Printf("Warning: Could not read file %s: %v\n", path, err)
				return nil
			}

			documents = append(documents, Document{
				Path:    path,
				Content: string(content),
			})
		}

		return nil
	})

	return documents, err
}

// isTextFile reports whether the file has one of the whitelisted text extensions.
func isTextFile(filename string) bool {
	ext := strings.ToLower(filepath.Ext(filename))
	textExts := []string{".txt", ".md", ".rst", ".csv", ".json", ".xml", ".html", ".htm", ".log"}
	for _, textExt := range textExts {
		if ext == textExt {
			return true
		}
	}
	return false
}

embeddings.go

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// getEmbedding calls Ollama's /api/embeddings endpoint and returns the embedding vector for the given text.
func getEmbedding(text string, model string, ollamaBaseURL string) ([]float64, error) {
	req := OllamaEmbeddingRequest{
		Model:  model,
		Prompt: text,
	}

	jsonData, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}

	resp, err := http.Post(ollamaBaseURL+"/api/embeddings", "application/json", bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("ollama API error: %s", string(body))
	}

	var embeddingResp OllamaEmbeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&embeddingResp); err != nil {
		return nil, err
	}

	return embeddingResp.Embedding, nil
}
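
getEmbedding above uses the single-prompt /api/embeddings endpoint, one HTTP call per text. Newer Ollama versions also expose /api/embed, which accepts a list of inputs and returns one embedding per input. The tool above doesn't use it, but a minimal sketch of a batch helper could look like this (the function name is my own; check the field names against your Ollama version):

// getEmbeddingsBatch is a hypothetical helper for Ollama's /api/embed endpoint,
// which embeds several texts in a single request. Not used by the tool above.
func getEmbeddingsBatch(texts []string, model string, ollamaBaseURL string) ([][]float64, error) {
	payload := struct {
		Model string   `json:"model"`
		Input []string `json:"input"`
	}{Model: model, Input: texts}

	jsonData, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}

	resp, err := http.Post(ollamaBaseURL+"/api/embed", "application/json", bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("ollama API error: %s", string(body))
	}

	var result struct {
		Embeddings [][]float64 `json:"embeddings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	return result.Embeddings, nil
}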

similarity.go

package main

import "math"

// cosineSimilarity returns the cosine of the angle between two vectors.
// It returns 0 when the lengths differ or either vector has zero norm.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}

	var dotProduct, normA, normB float64

	for i := range a {
		dotProduct += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}

	if normA == 0 || normB == 0 {
		return 0
	}

	return dotProduct / (math.Sqrt(normA) * math.Sqrt(normB))
}
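
If you want a quick sanity check of cosineSimilarity, a tiny test could look like this (the file name similarity_test.go is my choice; run it with go test): parallel vectors should score about 1 and orthogonal vectors about 0.

package main

import (
	"math"
	"testing"
)

// TestCosineSimilarity checks the two easy cases: vectors pointing the same
// way score ~1, orthogonal vectors score ~0.
func TestCosineSimilarity(t *testing.T) {
	if got := cosineSimilarity([]float64{1, 2, 3}, []float64{2, 4, 6}); math.Abs(got-1) > 1e-9 {
		t.Errorf("parallel vectors: got %f, want 1", got)
	}
	if got := cosineSimilarity([]float64{1, 0, 0}, []float64{0, 1, 0}); math.Abs(got) > 1e-9 {
		t.Errorf("orthogonal vectors: got %f, want 0", got)
	}
}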

types.go

package main

// OllamaEmbeddingRequest represents the request payload for Ollama embedding API
type OllamaEmbeddingRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

// OllamaEmbeddingResponse represents the response from Ollama embedding API
type OllamaEmbeddingResponse struct {
	Embedding []float64 `json:"embedding"`
}

// Document represents a document with its metadata
type Document struct {
	Path    string
	Content string
	Score   float64
}