Reranking with embedding models

Python code for reranking in a RAG pipeline


Reranking is the second step in Retrieval-Augmented Generation (RAG) systems, sitting right between retrieval and generation.

(Header image: “Electric cubes in digital space”, as imagined by Flux-1 dev.)

Retrieval with reranking

If we store the documents as embeddings in a vector DB from the start, retrieval gives us a list of similar documents right away.
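
As a minimal sketch of that path (assuming a local FAISS index and Ollama embeddings; the texts, query and model name are placeholders), the vector store already returns the closest matches, so no separate reranking step is needed:

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Placeholder corpus and model name, for illustration only
emb = OllamaEmbeddings(model="bge-large:335m-en-v1.5-fp16")
db = FAISS.from_texts(
    ["LangChain helps build LLM apps", "Bananas are rich in potassium"],
    emb,
)

# similarity_search returns the most similar documents first
hits = db.similarity_search("frameworks for LLM applications", k=2)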

Standalone reranking

But if we first download the documents from the internet, the search results can be skewed by the search provider's preferences and algorithms, sponsored content, SEO optimisation, etc., so we need to do post-search reranking ourselves.

What I was doing:

  • got embeddings for the search query
  • got embeddings for each document (a document wasn’t expected to exceed 8k tokens anyway)
  • computed the similarity between the query and each document’s embedding
  • sorted the documents by that similarity

No vector db here, let’s go.

Sample code

Using LangChain to connect to Ollama, plus LangChain’s cosine_similarity helper. You can also filter by the similarity score itself, but keep in mind that the right threshold differs between domains and embedding models.

I will be glad if this bit of code is useful to you in any way. Copy/Paste/UseAnyWayYouWant license. Cheers.

from langchain_core.documents import Document
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.utils.math import cosine_similarity
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Cosine distance = 1 - cosine similarity; lower means more similar
    return 1.0 - cosine_similarity(a, b)

def compute_score(vectors: list[np.ndarray]) -> float:
    # vectors[0] is the query embedding, vectors[1] is a document embedding
    score = cosine_distance(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
    return score

def list_to_array(lst):
    return np.array(lst, dtype=float)

def compute_scorel(lists) -> float:
    # Ollama returns embeddings as plain Python lists; convert them to numpy arrays first
    v1 = list_to_array(lists[0])
    v2 = list_to_array(lists[1])
    return compute_score([v1, v2])

def filter_docs(emb_model_name: str, docs: list[Document], query: str, num_docs: int) -> list[Document]:
    content_arr = [doc.page_content for doc in docs]

    ollama_emb = OllamaEmbeddings(
        model=emb_model_name
    )

    docs_embs = ollama_emb.embed_documents(content_arr)
    query_embs = ollama_emb.embed_query(query)
    sims = []
    for i, emb in enumerate(docs_embs):
        idx = docs[i].id
        s = compute_scorel([query_embs, emb])
        simstr = str(round(s, 4))
        docs[i].metadata["sim"] = simstr
        sim = {
            "idx": idx,
            "i": i,
            "sim": s,
        }
        sims.append(sim)

    # Sort ascending: compute_scorel returns cosine distance, so lower = more similar
    sims.sort(key=lambda x: x["sim"])

    sorted_docs = [docs[x["i"]] for x in sims]
    filtered_docs = sorted_docs[:num_docs]
    return filtered_docs
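
A quick usage example, plus a sketch of the similarity-threshold filtering mentioned above. The documents, query and the 0.6 cutoff are placeholders, not recommendations; calibrate the threshold for your own embedding model and domain. Note that metadata["sim"] stores the cosine distance, so lower means closer to the query.

# filter_docs reads doc.id, which exists as an optional field on recent langchain_core Documents
docs = [
    Document(id="1", page_content="LangChain helps build LLM apps", metadata={}),
    Document(id="2", page_content="Bananas are rich in potassium", metadata={}),
]

top_docs = filter_docs("bge-large:335m-en-v1.5-fp16", docs, "frameworks for LLM applications", num_docs=2)

# Variant: keep only documents whose cosine distance to the query is below a cutoff.
# 0.6 is an arbitrary example value.
MAX_DISTANCE = 0.6
kept = [d for d in top_docs if float(d.metadata["sim"]) <= MAX_DISTANCE]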

Best Embedding Models

For my tasks, the best embedding model currently is bge-large:335m-en-v1.5-fp16.

Second place goes to nomic-embed-text:137m-v1.5-fp16 and jina/jina-embeddings-v2-base-en:latest.

But do your own tests for your own domain and queries.
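
One way to run such a test is to rerank the same documents and query with each candidate model and eyeball the resulting order and scores. A sketch, reusing the docs list from the usage example above (query and top-N are placeholders):

candidates = [
    "bge-large:335m-en-v1.5-fp16",
    "nomic-embed-text:137m-v1.5-fp16",
    "jina/jina-embeddings-v2-base-en:latest",
]

for model_name in candidates:
    ranked = filter_docs(model_name, docs, "frameworks for LLM applications", num_docs=3)
    print(model_name)
    for d in ranked:
        # metadata["sim"] holds the cosine distance; lower means closer to the query
        print(" ", d.metadata["sim"], d.page_content[:60])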