Issue #47 defines the `Reranker` protocol for rescoring routing candidates after initial retrieval. The default implementation will be `NoOpReranker` (pass-through). This issue adds an LLM-powered reranker that uses a language model to assess semantic relevance between the query and candidate tools.
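For orientation, a pass-through default under a protocol like #47's might look as follows. This is a hypothetical sketch: the method name `rerank` and its exact signature are assumptions, not the actual API from #47; only the `list[tuple[str, float]]` candidate shape comes from this issue.

```python
from typing import Protocol


class Reranker(Protocol):
    # Hypothetical method name and signature; the real protocol lives in #47.
    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]: ...


class NoOpReranker:
    """Default pass-through: returns candidates in their original order."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        return list(candidates)
```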
Current state
`Router.route()` scores candidates using `TfIdfScorer` (cosine similarity on tokenized text).
TF-IDF is fast but misses semantic relevance: "schedule a meeting" scores low against "calendar event creation" because the vocabulary doesn't overlap.
A reranker sits after retrieval and before navigation, using a more expensive model to reorder the top-k candidates by true relevance.
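The vocabulary mismatch described above is easy to demonstrate. A minimal sketch using cosine similarity over raw term counts (a stand-in for `TfIdfScorer`, whose implementation is not shown in this issue):

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over term counts (stand-in for TF-IDF weights).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


query = Counter("schedule a meeting".split())
tool = Counter("calendar event creation".split())
print(cosine(query, tool))  # 0.0: no shared vocabulary, so lexical scoring fails
```

Identical meaning, zero lexical score: exactly the gap a reranker with a real language model is meant to close.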
Why it matters
Accuracy at scale — Cross-encoder and LLM-based reranking consistently outperform bi-encoder and lexical scoring in IR benchmarks (by 10-30% on nDCG).
Pillar 3 — "Use an LM to better understand the relationship between tools" — reranking is the most targeted application: given this specific query, which 5 of these 20 candidates are truly relevant?
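The "which 5 of these 20 candidates" framing suggests a selection-style prompt. A hypothetical sketch (the function name, prompt wording, and `keep` parameter are all assumptions; the issue does not specify a prompt format):

```python
def build_rerank_prompt(query: str, tool_names: list[str], keep: int = 5) -> str:
    # Number the candidates so the LLM can answer with indices, not free text.
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(tool_names, 1))
    return (
        f"Query: {query}\n"
        f"Candidate tools:\n{numbered}\n"
        f"List the numbers of the {keep} most relevant tools, most relevant first."
    )


print(build_rerank_prompt("schedule a meeting", ["calendar_create", "email_send"]))
```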
Context
Related retrieval issues: contextweaver[retrieval] extra with BM25 and fuzzy matching backends #55 adds BM25/fuzzy matching, which is still lexical; [routing] Add optional embedding-based retrieval backend for improved recall at scale #8 adds embedding retrieval, which is semantic but coarse.
Acceptance Criteria
- `LLMReranker` class implementing the `Reranker` protocol (from [routing] Add EngineRegistry with pluggable Retriever, Reranker, and ClusteringEngine protocols #47)
- Takes an `llm_fn: Callable[[str], str]`; no dependency on any LLM provider
- Returns `list[tuple[str, float]]`
- `top_k` parameter to limit how many candidates are sent to the LLM (cost control)
- Registered in `EngineRegistry` as the `"llm"` reranker
- Tests with a stubbed `llm_fn` (valid response, invalid response, empty candidates)

Implementation Notes
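The acceptance criteria above could be satisfied by something like the sketch below. Only `LLMReranker`, `llm_fn: Callable[[str], str]`, `top_k`, and the `list[tuple[str, float]]` input/output types come from this issue; the prompt format, the comma-separated-scores parsing, and the fallback-on-invalid-response behavior are assumptions for illustration.

```python
from typing import Callable


class LLMReranker:
    """Rerank routing candidates by asking an LLM to score their relevance.

    Hypothetical sketch: prompt shape and response parsing are assumptions.
    """

    def __init__(self, llm_fn: Callable[[str], str], top_k: int = 20):
        self.llm_fn = llm_fn  # any str -> str callable; no provider dependency
        self.top_k = top_k    # cost control: only top_k candidates go to the LLM

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        if not candidates:
            return []
        head = candidates[: self.top_k]   # sent to the LLM
        tail = candidates[self.top_k :]   # kept in original order
        names = [name for name, _ in head]
        prompt = (
            f"Query: {query}\nTools: {', '.join(names)}\n"
            "Score each tool's relevance 0-1, comma-separated."
        )
        try:
            scores = [float(s) for s in self.llm_fn(prompt).split(",")]
            if len(scores) != len(head):
                raise ValueError("score count mismatch")
        except ValueError:
            return candidates  # invalid LLM response: fall back to input order
        reranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
        return reranked + tail
```

Because `llm_fn` is a plain callable, the three test cases in the criteria fall out naturally: a stub returning well-formed scores, a stub returning garbage (exercising the fallback), and an empty candidate list.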
Files likely touched:
- `src/contextweaver/engines.py` (or a new `src/contextweaver/extras/reranker_llm.py`)
- `tests/test_engines.py`

Dependencies
- `Reranker` protocol and `EngineRegistry` (issue #47)
- contextweaver[retrieval] extra with BM25 and fuzzy matching backends #55 (BM25 retrieval) and [routing] Add optional embedding-based retrieval backend for improved recall at scale #8 (embedding retrieval): reranking refines their output