Context
Issue #47 defines the ClusteringEngine protocol for partitioning tools into groups. The default implementation will be JaccardClusterer, wrapping the existing tag/Jaccard-based logic from TreeBuilder. However, no LLM-powered implementation is planned — and this is where the core Pillar 3 value lies: "use an LM to better understand the relationship between tools."
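Current state
- TreeBuilder clusters tools by tag overlap using Jaccard similarity, which is purely lexical.
- The ClusteringEngine protocol (#47, "[routing] Add EngineRegistry with pluggable Retriever, Reranker, and ClusteringEngine protocols") will create the extension point. This issue creates the intelligence that plugs into it.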
Why it matters
- Vision centerpiece — "Use an LM to understand the relationship between tools" requires semantic understanding of tool descriptions, not just tag matching.
- Better ChoiceGraph quality — LLM-grouped trees present more intuitive navigation to the agent: "Communication tools" vs "Data tools" vs "Admin tools" instead of arbitrary tag-based splits.
- Scale — At 500+ tools, manual tagging breaks down. Semantic clustering scales without human curation.
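To make the "not just tag matching" point concrete, here is a minimal sketch of Jaccard similarity over tag sets. The tool names and tags are invented for illustration and are not taken from contextweaver; the actual TreeBuilder logic may differ in detail.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical tools: both send messages to people, but they share no tags.
send_email_tags = {"smtp", "mail"}
post_slack_tags = {"chat", "webhook"}
# Hypothetical tool that shares a tag with send_email for an unrelated reason.
parse_mbox_tags = {"mail", "archive", "parser"}

print(jaccard(send_email_tags, post_slack_tags))  # 0.0
print(jaccard(send_email_tags, parse_mbox_tags))  # 0.25
```

Under these assumed tags, the two communication tools score 0.0 while the mail sender and the mailbox parser score 0.25, which is the kind of split a description-aware clusterer is meant to avoid.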
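Acceptance Criteria
- LLMClusteringEngine class implementing the ClusteringEngine protocol (from #47)
- llm_fn: Callable[[str], str] parameter, so there is no dependency on any LLM provider
- Parses the LLM response into dict[str, list[SelectableItem]]
- Falls back to JaccardClusterer when the response cannot be parsed
- Registered in EngineRegistry as the "llm" clustering engine
- Tests with a stubbed llm_fn (valid response, invalid response, empty tools, single group)
- pyproject.toml: no new runtime dependencies (the LLM is user-provided via a callable)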
Implementation Notes
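from typing import Callable

# ClusteringEngine, SelectableItem and ParseError are assumed to be defined elsewhere in the package.
class LLMClusteringEngine:
    def __init__(self, llm_fn: Callable[[str], str], *, fallback: ClusteringEngine | None = None) -> None:
        self.llm_fn = llm_fn
        self.fallback = fallback

    def cluster(self, items: list[SelectableItem], k: int) -> dict[str, list[SelectableItem]]:
        # Ask the LLM for k named groups, then validate its reply against the input items.
        prompt = self._build_prompt(items, k)
        response = self.llm_fn(prompt)
        try:
            return self._parse_response(response, items)
        except ParseError:
            # Malformed reply: degrade to the fallback engine if one was provided.
            if self.fallback is not None:
                return self.fallback.cluster(items, k)
            raise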
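The sketch above leaves _build_prompt and _parse_response undefined. One possible shape is shown below as standalone functions. Everything here is an assumption made for illustration: the JSON reply format, the prompt wording, and the Item dataclass, which is a hypothetical stand-in for SelectableItem with only an id and a description.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for SelectableItem; only id and description are assumed."""
    id: str
    description: str


class ParseError(ValueError):
    """Raised when the LLM reply cannot be turned into a valid grouping."""


def build_prompt(items: list[Item], k: int) -> str:
    # One line per tool so the model sees descriptions, not just tags.
    lines = [f"- {item.id}: {item.description}" for item in items]
    return (
        f"Group the following tools into at most {k} named groups.\n"
        'Reply with JSON only, shaped like {"<group name>": ["<tool id>", ...]}.\n'
        + "\n".join(lines)
    )


def parse_response(response: str, items: list[Item]) -> dict[str, list[Item]]:
    by_id = {item.id: item for item in items}
    try:
        raw = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ParseError("reply is not valid JSON") from exc
    if not isinstance(raw, dict):
        raise ParseError("reply is not a JSON object")

    groups: dict[str, list[Item]] = {}
    seen: set[str] = set()
    for name, ids in raw.items():
        if not isinstance(ids, list):
            raise ParseError(f"group {name!r} is not a list")
        for tool_id in ids:
            # Reject non-string, invented, or repeated ids so the result is a true partition.
            if not isinstance(tool_id, str) or tool_id not in by_id or tool_id in seen:
                raise ParseError(f"unexpected or repeated tool id: {tool_id!r}")
            seen.add(tool_id)
        groups[name] = [by_id[tool_id] for tool_id in ids]
    if seen != set(by_id):
        raise ParseError("some tools were not assigned to a group")
    return groups


items = [Item("send_email", "Send an email"), Item("run_sql", "Run a SQL query")]
reply = '{"Communication": ["send_email"], "Data": ["run_sql"]}'
groups = parse_response(reply, items)
print({name: [item.id for item in members] for name, members in groups.items()})
# {'Communication': ['send_email'], 'Data': ['run_sql']}
```

Validating that every tool appears exactly once means a partially correct reply raises ParseError and takes the fallback path instead of silently dropping tools. Whether that strictness is wanted is a design choice for the real implementation.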
Files likely touched:
- src/contextweaver/engines.py (or new src/contextweaver/extras/clustering_llm.py)
- tests/test_engines.py
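Because llm_fn is just a Callable[[str], str], tests can run without any LLM provider. A minimal sketch of a stub follows; make_stub_llm is a hypothetical helper name, not part of contextweaver, and the canned replies are invented.

```python
from typing import Callable


def make_stub_llm(replies: list[str]) -> tuple[Callable[[str], str], list[str]]:
    """Return (llm_fn, prompts): llm_fn yields canned replies in order and records each prompt."""
    prompts: list[str] = []
    remaining = iter(replies)

    def llm_fn(prompt: str) -> str:
        prompts.append(prompt)
        return next(remaining)

    return llm_fn, prompts


# One well-formed reply and one malformed reply, to drive the happy path and the fallback path.
llm_fn, prompts = make_stub_llm(['{"Communication": ["send_email"]}', "not json"])
print(llm_fn("first prompt"))   # {"Communication": ["send_email"]}
print(llm_fn("second prompt"))  # not json
print(prompts)                  # ['first prompt', 'second prompt']
```

Recording the prompts lets a test assert on what was sent to the model as well as on how the reply was handled.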
Dependencies
- ClusteringEngine protocol and EngineRegistry (#47)