feat(routing): add cognitive roles (debugging/orchestration/evaluation) and fix critique tradeoff#21

Open
Joi wants to merge 1 commit into
microsoft:mainfrom
Joi:feat/cognitive-roles-and-critique-fix

@Joi Joi commented May 8, 2026

Summary

Three new cognitive roles for high-stakes work, plus a correction to the critique role's model-vs-thinking-budget tradeoff. Applied to anthropic.yaml, balanced.yaml, and quality.yaml.

Why these roles

The current matrix has reasoning (deep analysis), critique (finding flaws), and creative (generative). It lacks targeted roles for:

  • debugging — hypothesis-driven investigation of failures (bug-hunter, session-analyst, incident analysis). Different from reasoning: more iterative, more evidence-driven, less open-ended.
  • orchestration — root-session coordination across multiple agents. Needs strong planning + judgment but not the same "think hard" depth as reasoning.
  • evaluation — comparing parallel agent outputs, judging quality across candidates. The "which of these is better" task that needs Opus-level discrimination.

All three default to Opus because the cost of a wrong call in these roles compounds across the session (a bad debugging hypothesis cascades; a bad orchestration decision wastes parallel work; a bad evaluation picks the worse output).

Why critique changes

Current: Sonnet + reasoning_effort: xhigh.
New: Opus + reasoning_effort: high.

The original hypothesis was that a larger thinking budget could compensate for a weaker model. In practice, xhigh produces longer Sonnet outputs, not higher-quality ones. Critique is a discrimination task: it benefits more from Opus's stronger judgment than from extra thinking on a weaker model. reasoning_effort is orthogonal to capability.

Why writing changes

Added reasoning_effort: medium to the writing role for coherence across long outputs. Long-form content (documentation, marketing, case studies) benefits from medium thinking — not high (which slows it down without proportional gain), not none (which produces less-coherent long outputs).

The companion proposal to remove reasoning_effort from creative required no change, because creative already has no reasoning_effort setting.

Files changed

  • routing/anthropic.yaml — single-provider, Opus across all new roles
  • routing/balanced.yaml — multi-provider, mirrors existing reasoning chain
  • routing/quality.yaml — multi-provider, mirrors existing reasoning chain

The updated: field is bumped to 2026-05-08 in all three files.

Verification

  • pytest tests/ — 5/5 pass
  • yaml.safe_load() — all three files parse
  • Role count: 13 → 16 in each file
  • Programmatic verification: new roles present, critique is Opus+high, writing has reasoning_effort: medium
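
The programmatic checks listed above could look roughly like the following. This is a minimal sketch, not the script actually used for this PR. The schema is an assumption: the key names `roles`, `model`, and `reasoning_effort` as flat per-role fields, the bare model labels `opus` and `placeholder`, and the `placeholder_N` role names are all hypothetical, and the multi-provider files may well use candidate chains instead of a single model per role. The dict stands in for what `yaml.safe_load()` would return so the sketch runs without PyYAML.

```python
# Expected reasoning_effort for the three new roles, taken from the commit message.
EXPECTED_NEW_ROLES = {
    "debugging": "high",
    "orchestration": "medium",
    "evaluation": "high",
}


def check_matrix(matrix: dict, expected_role_count: int = 16) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    roles = matrix.get("roles", {})

    # Role count: 13 -> 16 after this change.
    if len(roles) != expected_role_count:
        problems.append(f"expected {expected_role_count} roles, found {len(roles)}")

    # New roles must exist, use an Opus model, and carry the stated effort.
    for name, effort in EXPECTED_NEW_ROLES.items():
        role = roles.get(name)
        if role is None:
            problems.append(f"{name}: role is missing")
            continue
        if "opus" not in str(role.get("model", "")).lower():
            problems.append(f"{name}: expected an Opus model, got {role.get('model')!r}")
        if role.get("reasoning_effort") != effort:
            problems.append(f"{name}: expected reasoning_effort {effort!r}")

    # critique moves from Sonnet + xhigh to Opus + high.
    critique = roles.get("critique", {})
    if "opus" not in str(critique.get("model", "")).lower():
        problems.append(f"critique: expected an Opus model, got {critique.get('model')!r}")
    if critique.get("reasoning_effort") != "high":
        problems.append("critique: expected reasoning_effort 'high'")

    # writing gains reasoning_effort: medium.
    if roles.get("writing", {}).get("reasoning_effort") != "medium":
        problems.append("writing: expected reasoning_effort 'medium'")

    return problems


def sample_matrix() -> dict:
    """Build a hypothetical 16-role matrix in the assumed (flat) schema."""
    roles = {
        "reasoning": {"model": "placeholder"},
        "creative": {"model": "placeholder"},  # no reasoning_effort, as the PR notes
        "critique": {"model": "opus", "reasoning_effort": "high"},
        "writing": {"model": "placeholder", "reasoning_effort": "medium"},
        "debugging": {"model": "opus", "reasoning_effort": "high"},
        "orchestration": {"model": "opus", "reasoning_effort": "medium"},
        "evaluation": {"model": "opus", "reasoning_effort": "high"},
    }
    # Pad with placeholder names for the roles this PR does not list.
    for i in range(1, 10):
        roles[f"placeholder_{i}"] = {"model": "placeholder"}
    return {"updated": "2026-05-08", "roles": roles}


if __name__ == "__main__":
    for problem in check_matrix(sample_matrix()):
        print(problem)
```

Returning a list of problems rather than raising on the first failure makes it easy to report every mismatch across all three files in one run.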

References

  • Source issue: joi-90y (filed 2026-04-29)
  • Aligns with strategic routing decision documented in dotfiles 2026-04-30: "root session uses Opus, critique role and reviewers Opus, Sonnet for generative leaf nodes"

…tique tradeoff

Implements joi-90y. Extends the routing matrix with three new cognitive roles
for high-stakes work, and corrects the critique role's capability-vs-thinking-
budget tradeoff.

NEW ROLES (added to anthropic.yaml, balanced.yaml, quality.yaml):
  - debugging      Opus + high   — bug-hunter, session-analyst, incident analysis
  - orchestration  Opus + medium — root session, coordinator work
  - evaluation     Opus + high   — comparing parallel agent outputs

CHANGED:
  - critique: Sonnet+xhigh → Opus+high. xhigh produces longer outputs of the
    same model class, not higher-quality outputs. For critique tasks, capability
    (model class) > thinking budget. Inline comment captures the rationale.
  - writing: added reasoning_effort: medium for coherence across long outputs.
    No-op for creative (already has no reasoning_effort).

All 3 files have updated: bumped to 2026-05-08. Tests pass.

Generated with Amplifier (https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>