feat(routing): add cognitive roles (debugging/orchestration/evaluation) and fix critique tradeoff #21
Open
Joi wants to merge 1 commit into
…tique tradeoff
Implements joi-90y. Extends the routing matrix with three new cognitive roles
for high-stakes work, and corrects the critique role's capability-vs-thinking-
budget tradeoff.
NEW ROLES (added to anthropic.yaml, balanced.yaml, quality.yaml):
- debugging: Opus + high (bug-hunter, session-analyst, incident analysis)
- orchestration: Opus + medium (root session, coordinator work)
- evaluation: Opus + high (comparing parallel agent outputs)
CHANGED:
- critique: Sonnet+xhigh → Opus+high. xhigh produces longer outputs of the
same model class, not higher-quality outputs. For critique tasks, capability
(model class) > thinking budget. Inline comment captures the rationale.
- writing: added reasoning_effort: medium for coherence across long outputs.
The proposed removal of reasoning_effort from creative is a no-op (it already has none).
All 3 files have their updated: field bumped to 2026-05-08. Tests pass.
Generated with Amplifier (https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Summary
Three new cognitive roles for high-stakes work, plus a correction to the `critique` role's model-vs-thinking-budget tradeoff. Applied to `anthropic.yaml`, `balanced.yaml`, and `quality.yaml`.

Why these roles
The current matrix has `reasoning` (deep analysis), `critique` (finding flaws), and `creative` (generative). It lacks targeted roles for:
- `debugging`: bug-hunter, session-analyst, incident analysis
- `orchestration`: root session, coordinator work
- `evaluation`: comparing parallel agent outputs
All three default to Opus because the cost of a wrong call in these roles compounds across the session: a bad debugging hypothesis cascades, a bad orchestration decision wastes parallel work, and a bad evaluation picks the worse output.

Why critique changes
Current: Sonnet + reasoning_effort: xhigh.
New: Opus + reasoning_effort: high.
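The change above can be sketched as a before/after pair of role entries. This is only an illustration: the key names `model` and `reasoning_effort`, the lowercase model labels, and the helper `changed_fields` are assumptions made for this sketch, since the actual schema of the `routing/*.yaml` files is not shown in this description.

```python
# Hypothetical role entries. The real routing/*.yaml schema may use
# different keys or model identifiers; these are illustrative only.
critique_before = {"model": "sonnet", "reasoning_effort": "xhigh"}
critique_after = {"model": "opus", "reasoning_effort": "high"}


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


diff = changed_fields(critique_before, critique_after)
for field, (old, new) in diff.items():
    print(f"{field}: {old} -> {new}")
```

Both fields move at once: the model class goes up while the thinking budget goes down one step, which is the tradeoff this section argues for.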
The hypothesis was that thinking budget could compensate for model strength. In practice, `xhigh` produces longer Sonnet outputs, not higher-quality outputs. Critique is a discrimination task: it benefits more from Opus's stronger judgment than from extra thinking on a weaker model. `reasoning_effort` is orthogonal to capability.

Why writing changes
Added `reasoning_effort: medium` to the `writing` role for coherence across long outputs. Long-form content (documentation, marketing, case studies) benefits from medium thinking: not high (which slows it down without proportional gain), and not none (which produces less coherent long outputs).
The companion proposal to remove `reasoning_effort` from `creative` was a no-op, since it already has none.

Files changed
- `routing/anthropic.yaml`: single-provider, Opus across all new roles
- `routing/balanced.yaml`: multi-provider, mirrors existing reasoning chain
- `routing/quality.yaml`: multi-provider, mirrors existing reasoning chain
`updated:` bumped to 2026-05-08 in all three.

Verification
- `pytest tests/`: 5/5 pass
- `yaml.safe_load()`: all three files parse

References