Skip to content

ace-doctor: probe Nova API key scopes (HQ Read/Write), not just bearer-accepted #174

@jjackson

Description

@jjackson

Symptom

/ace:doctor currently reports nova_auth: ace-nova authed (POST initialize → HTTP 200) whenever the bearer token is accepted at the MCP transport layer. That's a necessary but not sufficient check — the token can authenticate cleanly and still lack the per-tool scopes that ACE's Phase 2 actually needs.

Concrete miss that motivated this: turmeric/20260508-1951 halted at Phase 2 / app-deploy with error_type: scope_missing, required_scope: nova.hq.read on get_hq_connection. Doctor was green on nova_auth at the time. The first signal of the missing scope was mid-run, after Phase 1 + Step 1 of Phase 2 had already consumed real budget.

Proposed probe

Add a nova_scopes probe that exercises the actual scope ACE needs at the cheapest possible cost:

  1. Call mcp__plugin_nova_nova__get_hq_connection (or the equivalent direct HTTP tools/call).

  2. Branch on the typed response:

    • {configured: true, ...}pass nova_scopes: HQ Read scope granted; bound to <domain.name> (bonus signal: confirms which HQ domain Nova is bound to — could replace the manual ACE_HQ_DOMAIN-vs-Nova reconciliation in app-deploy's pre-flight)
    • {configured: false}warn nova_scopes: HQ Read granted but no HQ key bound in Nova settings — paste an HQ API key at https://commcare.app/settings before /ace:run
    • error_type: scope_missingfail nova_scopes: NOVA_API_KEY missing scope <required_scope>; edit at https://commcare.app/settings → API tokens → grant HQ Read + HQ Write (the latter is needed for upload_app_to_hq; both should be requested together)
    • error_type: auth_failed / 401 → already covered by existing nova_auth probe; this case is a sanity-check the existing probe still works
    • other / 5xx → warn nova_scopes: unexpected response <body>; investigate Nova MCP health

The probe is a single MCP tool call (~100ms), happens after nova_auth so we know the bearer is good first, and only runs when NOVA_API_KEY is present (skipped silently otherwise — same pattern as the existing nova_auth probe).

Why this matters

ACE's Phase 2 is the longest single-phase budget burn (Nova architect builds ~10–15 min each + deploy + release). A scope check that catches the failure pre-Phase-1 saves the full Phase-1 + Phase-2-Step-1 cost on every machine that has a partial-scope key. The 30-second fix at https://commcare.app/settings is fine; the lost ~25-min iteration when it surfaces mid-run is not.

Class-level fit

Per ACE's "class-level preventers > instance-level fixes" convention (CLAUDE.md), this turns "key configured wrong" from a silent mid-run halt into a structurally-impossible-to-miss doctor signal. Same shape as the 0.7.1 ocs_shared_collection_team doctor probe — small HTTP probe at the boundary so the failure mode never reaches the skill.

Out of scope

  • Auto-rotating / requesting scopes — UI-only operation on Nova's side; doctor only diagnoses, doesn't mutate.
  • A general "all scopes ACE might ever need" enumeration — start with nova.hq.read (covers get_hq_connection + upload_app_to_hq's prerequisite) and add others if/when a different scope-missing failure surfaces.

Acceptance

  • bin/ace-doctor runs nova_scopes after nova_auth, gated on NOVA_API_KEY being present
  • On a key with full scopes: pass nova_scopes: HQ Read scope granted; bound to <domain>
  • On a key missing nova.hq.read: fail nova_scopes: ... with the exact remediation URL + scope names
  • /ace:doctor exit code reflects fail per existing convention (mark_fail)
  • Update playbook/integrations/nova-integration.md § Auth liveness lines to mention nova_scopes

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions