
chore(ai-framework): bump .ai to subagent-cost-guidance + statusline#3

Merged
23min merged 1 commit into main from chore/ai-framework-bump-subagent-guidance
Apr 20, 2026


23min (Owner) commented Apr 20, 2026

Summary

Bumps the .ai framework submodule from 247fdde to 90ef1ef, picking up:

  • PR #6 (ours, 23min/ai-workflow): subagent cost discipline rules + Explore quick default
    • rules.md: new Subagent Cost Discipline section — keep Opus for code gen / planning / review, prefer Sonnet for general-purpose research, prefer Explore over general-purpose for scans
    • agents/{builder,planner,reviewer}.md: default Explore thoroughness to quick, escalate only after a real gap
  • PR #7 (upstream, 23min/ai-workflow): framework-managed Claude Code status line
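
The routing policy summarized in the bullets above can be sketched as a small lookup. This is an illustration only, not the contents of rules.md: the task labels (`code_gen`, `planning`, `review`, `research`, `scan`), the `Dispatch` type, the `choose_dispatch` function, the mapping of task kinds to the builder/planner/reviewer agents, and the use of "thorough" as the escalation level are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from task kind to agent; the PR only says that
# code gen, planning, and review stay on Opus.
OPUS_AGENTS = {"code_gen": "builder", "planning": "planner", "review": "reviewer"}


@dataclass(frozen=True)
class Dispatch:
    agent: str
    model: str
    thoroughness: Optional[str] = None  # only meaningful for Explore


def choose_dispatch(task: str, gap_found: bool = False) -> Dispatch:
    """Pick an agent and model tier for a hypothetical task label."""
    if task in OPUS_AGENTS:
        # Code generation, planning, and review stay on Opus.
        return Dispatch(agent=OPUS_AGENTS[task], model="opus")
    if task == "scan":
        # Prefer Explore for scans; default to quick, escalate only
        # after a real gap has been found.
        level = "thorough" if gap_found else "quick"
        return Dispatch(agent="Explore", model="haiku", thoroughness=level)
    if task == "research":
        # Prefer Sonnet for general-purpose research.
        return Dispatch(agent="general-purpose", model="sonnet")
    raise ValueError(f"unknown task kind: {task!r}")
```

Raising on an unknown label, rather than silently falling back to the most expensive model, keeps an unclassified task from quietly defeating the cost rules.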

Why

Grounded in a 108-call subagent audit across 30 sessions on this repo:

| Agent | Calls | Tokens | Model |
|---|---|---|---|
| Explore | 92 (85%) | 155M | haiku-4-5 |
| general-purpose | 11 | 9.6M | opus-4-6 |
| Plan / builder / reviewer / planner | 7 total | ~13M | opus-4-6 |

Explore already runs on Haiku, so the dominant lever has been pulled. The remaining savings are (a) general-purpose defaulting to Opus for what is mostly research and lookup, and (b) 46 Explore prompts using "thorough" or "very thorough", at an average of 1.7M tokens per call, where quick would often suffice.
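
As a quick check of the arithmetic behind these figures, the sketch below recomputes tokens per call and call share from the audit table. The numbers are copied from the table and the 108-call headline; the names `AUDIT`, `TOTAL_CALLS`, `tokens_per_call`, and `call_share` are made up for the example, and "~13M" is treated as 13.0.

```python
# Figures copied from the audit table (tokens in millions).
AUDIT = {
    "Explore": {"calls": 92, "tokens_m": 155.0},
    "general-purpose": {"calls": 11, "tokens_m": 9.6},
    # The table gives "~13M" for this row; 13.0 is an approximation.
    "Plan / builder / reviewer / planner": {"calls": 7, "tokens_m": 13.0},
}
TOTAL_CALLS = 108  # headline figure from the PR description


def tokens_per_call(agent: str) -> float:
    """Average tokens per call for one agent row, in millions."""
    row = AUDIT[agent]
    return row["tokens_m"] / row["calls"]


def call_share(agent: str) -> float:
    """Fraction of the 108 audited calls made by this agent."""
    return AUDIT[agent]["calls"] / TOTAL_CALLS


if __name__ == "__main__":
    for agent in AUDIT:
        print(
            f"{agent}: {tokens_per_call(agent):.2f}M tokens/call, "
            f"{call_share(agent):.0%} of calls"
        )
```

Across all 92 Explore calls this works out to roughly 1.7M tokens per call, and 92 of 108 calls is about 85%.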

Code generation, planning, and review stay on Opus per explicit directive.

Test plan

  • git submodule status shows .ai at 90ef1ef
  • bash .ai/sync.sh runs cleanly and regenerates .claude/agents/*.md adapters
  • No behavior change in existing CI (rules-doc-only bump)
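
The first test-plan item could be automated along these lines. This is a minimal sketch that assumes the standard `git submodule status` line layout (a one-character state flag, the commit SHA, the path, then an optional description); the helper names `submodule_sha` and `ai_submodule_at` are hypothetical and not part of the framework.

```python
import subprocess
from typing import Optional


def submodule_sha(status_output: str, path: str) -> Optional[str]:
    """Return the SHA recorded for `path` in `git submodule status` output."""
    for line in status_output.splitlines():
        if not line:
            continue
        # The first character is a state flag: ' ' (in sync),
        # '-' (not initialized), '+' (differs from index), 'U' (conflicts).
        fields = line[1:].split()
        if len(fields) >= 2 and fields[1] == path:
            return fields[0]
    return None


def ai_submodule_at(expected_prefix: str, path: str = ".ai") -> bool:
    """Run git in the current repo and compare the submodule SHA prefix."""
    out = subprocess.run(
        ["git", "submodule", "status"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    sha = submodule_sha(out, path)
    return sha is not None and sha.startswith(expected_prefix)
```

Run from the repository root, `ai_submodule_at("90ef1ef")` would return True once the bump has landed.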

- PR #6 (ours): subagent cost discipline rules + Explore quick default
  - rules.md: new "Subagent Cost Discipline" section (Opus for
    code/plan/review, Sonnet for general-purpose research, Explore
    over general-purpose for scans)
  - agents/{builder,planner,reviewer}.md: default Explore to `quick`
  - Grounded in 108-call subagent audit across 30 sessions
- PR #7 (upstream): framework-managed Claude Code status line

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
23min merged commit a196553 into main Apr 20, 2026
1 check passed
23min added a commit that referenced this pull request May 1, 2026
…-E21-07 AC14)

- 9 Playwright specs cover AC14 per Q4=D hybrid: spec #2 drives bytes through POST /v1/run end-to-end (real-bytes AC1 wire-format round-trip); specs #1, #3-#9 use page.route mocks for deterministic UI-behaviour coverage (multi-severity, severity-max collapse, multi-warning-per-node, view-switch persistence)
- Real-bytes spec #2 uses edge_behavior_violation_lag analyser branch (any edge with lag > 0) — deterministic trigger of any analyser branch with no series-shape dependency; lag YAML inline at LAG_TRIGGER_YAML
- Test-runs isolation: spec #2 skips with copy-pasteable recipe when FLOWTIME_E2E_TEST_RUNS env var unset, so default invocation does not pollute data/runs/. With env var set + API restarted to data/test-runs/, all 9 pass; gracefully skips when infra is down (mirrors svelte-heatmap.spec.ts pattern)
- Selectors used (all already shipped by earlier AC chunks): data-testid="validation-panel", data-testid="validation-row", data-row-kind, data-row-key, data-row-match, data-warning-indicator, data-warning-node-id, data-warning-edge-id, data-warning-severity, data-warning-dot, data-selected
- Verified end-to-end: 8 passed / 1 skipped against live infra without env var (no pollution); 9 passed when env var set + API on data/test-runs/. ui-vitest 879 baseline unchanged
- Quirks captured in spec for future maintainers: state_window requires grid.start; edge clicks need force: true (transparent hit-path stroke)
