From a9fb64c209b98174bfdd0e2ee44a16af151e1caf Mon Sep 17 00:00:00 2001
From: Jonathan Jackson
Date: Fri, 22 May 2026 02:56:42 -0600
Subject: [PATCH] docs: cleanup historical artifacts (~17k lines)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Audit pass over docs/ found a lot of accumulated detritus that no code,
skill, or agent references. Removing the unreferenced historical
artifacts and fixing the broken cross-references that result.

Deletions (~17,300 lines)

- docs/superpowers/plans/ — wholesale (11 files, ~13k lines). These were
  post-shipment implementation plans, zero references from any
  skill/agent/CLAUDE.md. The PR diff is the truth, not the plan
  checkboxes.
- docs/superpowers/specs/ — 8 unreferenced design docs (~2,100 lines):
  ace-solicitations-phase-design, app-multimedia-coverage-design,
  skills-audit-findings, mobile-cloud-runner-poc + api-gaps,
  ace-sweep-atom-contracts + design, work-order-skill-design. Kept the
  5 specs that ARE referenced from skills/CLAUDE.md
  (shallow-deep-qa-split, decisions-log, qa-eval-migration,
  state-consolidation, focus-group-archetype-redefinition).
- docs/generated/playbook.md — 16-day-stale derived artifact (claimed
  "8-phase orchestration"; pipeline has been 10-phase for weeks).
  Regenerated by `/ace:docs` when next needed.

Broken cross-references fixed

- skills/README.md — two refs to deleted 2026-04-01-ace-design.md
  replaced with pointers to CLAUDE.md + agents/orchestrator-reference.md
  and (for the dry-run paragraph) absorbed inline.
- skills/upload-transcript/SKILL.md — ref to deleted
  2026-05-02-ace-run-multi-run-revival-design.md dropped; the sentence
  stands on its own.
- README.md — Documentation section refreshed; broken refs to
  ace-design.md and ace-web-harness-design.md replaced with pointers to
  CLAUDE.md, agents/orchestrator-reference.md, the integrations
  playbook, and the ace-web sibling repo.

Followups not in this PR (called out in the analysis but deferred):

- SKILL.md ## Change Log compression (every skill carries 5-13
  historical entries that bloat its context every time it dispatches).
  Bigger code change; deserves its own PR.
- .claude/pm/runs/ early-April compaction. Light touch; folds into one
  durable learning when someone gets to it.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude-plugin/marketplace.json               |    4 +-
 .claude-plugin/plugin.json                    |    2 +-
 README.md                                     |   13 +-
 VERSION                                       |    2 +-
 docs/generated/playbook.md                    |  564 ----
 .../2026-05-04-ace-solicitations-phase.md     | 2425 -----------------
 .../plans/2026-05-04-shallow-deep-qa-split.md | 1202 --------
 .../2026-05-05-app-multimedia-coverage.md     | 2105 --------------
 .../plans/2026-05-08-decisions-log-pr1.md     | 1124 --------
 .../plans/2026-05-08-decisions-log-pr2.md     | 1059 -------
 .../plans/2026-05-08-decisions-log-pr3.md     | 1201 --------
 .../plans/2026-05-08-decisions-log-pr4.md     |  294 --
 .../plans/2026-05-08-decisions-log-pr5.md     |  133 -
 ...026-05-10-orchestrator-structural-split.md |  818 ------
 ...5-15-ace-sweep-pr1-foundation-and-drive.md | 1148 --------
 .../plans/2026-05-21-work-order-skill.md      | 1454 ----------
 ...26-05-04-ace-solicitations-phase-design.md |  378 ---
 ...26-05-05-app-multimedia-coverage-design.md | Bin 21583 -> 0 bytes
 .../specs/2026-05-06-skills-audit-findings.md |  316 ---
 .../2026-05-09-mobile-cloud-runner-poc.md     |  180 -
 ...2026-05-11-mobile-cloud-runner-api-gaps.md |  141 -
 .../2026-05-15-ace-sweep-atom-contracts.md    |  186 -
 .../specs/2026-05-15-ace-sweep-design.md      |  171 -
 .../2026-05-21-work-order-skill-design.md     |  233 --
 package.json                                  |    2 +-
 skills/README.md                              |    4 +-
 skills/upload-transcript/SKILL.md             |    4 +-
 27 files changed, 16 insertions(+), 15147 deletions(-)
 delete mode 100644 docs/generated/playbook.md
 delete mode 100644 docs/superpowers/plans/2026-05-04-ace-solicitations-phase.md
 delete mode 100644 docs/superpowers/plans/2026-05-04-shallow-deep-qa-split.md
 delete mode 100644
docs/superpowers/plans/2026-05-05-app-multimedia-coverage.md delete mode 100644 docs/superpowers/plans/2026-05-08-decisions-log-pr1.md delete mode 100644 docs/superpowers/plans/2026-05-08-decisions-log-pr2.md delete mode 100644 docs/superpowers/plans/2026-05-08-decisions-log-pr3.md delete mode 100644 docs/superpowers/plans/2026-05-08-decisions-log-pr4.md delete mode 100644 docs/superpowers/plans/2026-05-08-decisions-log-pr5.md delete mode 100644 docs/superpowers/plans/2026-05-10-orchestrator-structural-split.md delete mode 100644 docs/superpowers/plans/2026-05-15-ace-sweep-pr1-foundation-and-drive.md delete mode 100644 docs/superpowers/plans/2026-05-21-work-order-skill.md delete mode 100644 docs/superpowers/specs/2026-05-04-ace-solicitations-phase-design.md delete mode 100644 docs/superpowers/specs/2026-05-05-app-multimedia-coverage-design.md delete mode 100644 docs/superpowers/specs/2026-05-06-skills-audit-findings.md delete mode 100644 docs/superpowers/specs/2026-05-09-mobile-cloud-runner-poc.md delete mode 100644 docs/superpowers/specs/2026-05-11-mobile-cloud-runner-api-gaps.md delete mode 100644 docs/superpowers/specs/2026-05-15-ace-sweep-atom-contracts.md delete mode 100644 docs/superpowers/specs/2026-05-15-ace-sweep-design.md delete mode 100644 docs/superpowers/specs/2026-05-21-work-order-skill-design.md diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 6987c3d3..447d125c 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -6,13 +6,13 @@ "url": "https://github.com/jjackson" }, "metadata": { - "version": "0.13.330" + "version": "0.13.331" }, "plugins": [ { "name": "ace", "source": "./", - "version": "0.13.330", + "version": "0.13.331", "description": "AI Connect Engine — orchestrates the CRISPR-Connect lifecycle from idea through app building, Connect setup, LLO management, and closeout" } ] diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 13ae57c7..d519303a 100644 
--- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "ace", - "version": "0.13.330", + "version": "0.13.331", "description": "AI Connect Engine — orchestrates the CRISPR-Connect lifecycle from idea through app building, Connect setup, LLO management, and closeout", "author": { "name": "Jonathan Jackson", diff --git a/README.md b/README.md index ef567e41..9ac9973c 100644 --- a/README.md +++ b/README.md @@ -217,11 +217,14 @@ delegates app-build to `/nova:autobuild`. See ## Documentation -- [Design Spec](docs/superpowers/specs/2026-04-01-ace-design.md) — full architecture and rationale -- [Generated Playbook](docs/generated/playbook.md) — human-readable process flow (generated from agent/skill definitions) -- [Integration Specs](playbook/integrations/) — what APIs exist vs. need to be built -- [ACE Web Harness Design](docs/superpowers/specs/2026-04-07-ace-web-harness-design.md) — cross-cutting architecture spec for the browser-based ACE frontend -- [PDD Stress-Test Observations](docs/examples/pdd-stress-test-observations.md) — how to validate an PDD and verify LLO execution, with two sample PDDs worked through end-to-end +- [CLAUDE.md](CLAUDE.md) — agent guide, phase pipeline, conventions, gotchas +- [agents/orchestrator-reference.md](agents/orchestrator-reference.md) — state schemas, phase write-back contract, pause points, fork points +- [Integration Specs](playbook/integrations/) — per-MCP integration reference and durable gotcha records (OCS, Nova, Connect, CommCare, labs, mobile, slides) +- [Generated Playbook](docs/generated/playbook.md) — derived process flow regenerated by `/ace:docs` (run the command to (re)create it) +- [Design Specs](docs/superpowers/specs/) — date-stamped design docs for in-flight or recently-shipped work +- [Durable Learnings](docs/learnings/) — cross-session lessons (Nova bugs, demo-user mechanics, Phase 6 validation arc, etc.) 
+- [PDD Stress-Test Observations](docs/examples/pdd-stress-test-observations.md) — how to validate a PDD and verify LLO execution, with two sample PDDs worked through end-to-end +- **`ace-web`** sibling repo — design spec for the browser-based ACE frontend lives in that repo, not here ## Related projects diff --git a/VERSION b/VERSION index 8bf173bc..7a026946 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.13.330 +0.13.331 diff --git a/docs/generated/playbook.md b/docs/generated/playbook.md deleted file mode 100644 index a6142596..00000000 --- a/docs/generated/playbook.md +++ /dev/null @@ -1,564 +0,0 @@ -# ACE Playbook — CRISPR-Connect Process - -_Generated: 2026-05-06 (ACE 0.13.43 — 8-phase orchestration)_ - -Derived from `agents/*.md`, `skills/*/SKILL.md`, and -`playbook/integrations/*.md`. Regenerate with `/ace:docs` after changing any -of those sources. - -## Overview - -ACE (AI Connect Engine) orchestrates the full CRISPR-Connect lifecycle for -a Connect opportunity. The `ace-orchestrator` agent dispatches to **eight -phase agents** in order: - -1. **design-review** — iterate idea → approved PDD + opp-specific test prompts -2. **commcare-setup** — Nova-build Learn + Deliver apps, deploy + release -3. **connect-setup** — create Program + Opportunity in Connect -4. **ocs-setup** — clone OCS chatbot template, attach RAG collection, smoke-test -5. **qa-and-training** — capture screenshots, generate per-artifact training docs -6. **solicitation-management** — publish solicitation, invite candidate LLOs -7. **execution-manager** — onboard awarded LLO, UAT, go-live, recurring monitor -8. **closeout** — invoices, feedback, learnings, cycle grade - -**Phases 1–5 run end-to-end with zero LLO involvement.** Phase 6 publishes a -public solicitation listing (no targeted contact unless the PDD names -preferred LLOs). 
Phase 7 is the first 1-1 LLO contact and starts only when -`opp.yaml.selected_llo.org_slug` is populated by the manual `solicitation-review` -skill that runs between Phase 6 and Phase 7. - -### Execution modes - -- **Auto (default)** — run all phases sequentially, log gates but don't - enforce them. -- **Review** — pause at gate steps and use `AskUserQuestion` to get operator - approval before continuing. Gate steps: - - After `idea-to-pdd` (Phase 1) — PDD approval - - After `app-deploy` (Phase 2) — apps verified before Connect setup - - After `ocs-chatbot-eval --deep` (Phase 4) — OCS quality clears the - pre-launch bar - - After `solicitation-review` (manual, between Phase 6 and 7) — awardee - explicitly approved - - After `llo-launch` (Phase 7) — opportunity activation verified - -### Agent topology - -ACE has one architectural rule: **anything that calls `Agent` must run at -level 0.** The orchestrator and the Phase 2 `commcare-setup` are procedure -docs the top-level session reads inline; the other seven agents are -subagents dispatched via `Agent(...)` from level 0. There are never two -levels of `Agent` dispatch. See `CLAUDE.md § Agent topology` for the full -rule and history. 
- -## Process Flow - -| Phase | Agent | Skills | Gate | -|---|---|---|---| -| 1 | design-review | idea-to-pdd → pdd-to-test-prompts + pdd-to-app-journeys | PDD approved (review mode) | -| 2 | commcare-setup | pdd-to-learn-app + pdd-to-deliver-app → app-connect-coverage → app-test-cases → app-deploy → app-release | Apps deployed + released | -| 3 | connect-setup | connect-program-setup → connect-opp-setup | — | -| 4 | ocs-setup | ocs-agent-setup → ocs-chatbot-qa --quick → ocs-chatbot-eval --quick | OCS quality (deep gate at end) | -| 5 | qa-and-training | app-screenshot-capture → training-* (6 per-artifact) → training-deck-build | — | -| 6 | solicitation-management | solicitation-create → llo-invite → solicitation-monitor (recurring) | **HALT** — manual `solicitation-review` populates `selected_llo` | -| 7 | execution-manager | llo-onboarding → llo-uat → llo-launch → timeline-monitor + flw-data-review + ocs-chatbot-qa --monitor (recurring) | UAT sign-off + launch verified | -| 8 | closeout | opp-closeout → llo-feedback → learnings-summary → cycle-grade | — | - -Standalone skills (not part of the default `/ace:run`): -- `app-multimedia-coverage` — manual post-Phase-2, attaches display images -- `connect-baseline-screenshots` — cross-opp Connect-walkthrough capture -- `ocs-tester` (agent) — ad-hoc OCS quality probe -- `email-communicator` — utility skill, called by other skills -- `upload-transcript` — uploads CLI stream-json to ace-web - -In-flow skills with a removal trigger: -- `commcare-form-patch` — Phase 2 Step 2.8 workaround for - voidcraft-labs/nova-plugin#7 (Nova emits ``/`` - wrappers in Learn-app form XML that break the AVD's CommCare - runtime). Idempotent + no-op when zero wrappers found. Whole skill - + the backing `commcare_patch_xform` MCP atom self-delete the day - nova-plugin#7 ships and a clean `/ace:run` produces a wrapper-free - Learn CCZ. 
- ---- - -## Phase 1: Design Review - -**Agent:** `design-review` - -> Phase 1 of the CRISPR-Connect lifecycle: iterate an initial idea into an -> approved Program Design Document (PDD) and derive opp-specific test -> prompts for later OCS chatbot evaluation. - -### Skills - -#### idea-to-pdd -Develop a Program Design Doc (PDD) for a Connect intervention from source -material. Iterates a 5-question stress-test rubric until approved. - -#### idea-to-pdd-eval -Independently grade a PDD against the source idea pack — re-runs the -stress test from outside and cross-checks reviewer-comment fidelity. - -#### pdd-to-test-prompts -Derive opp-specific Q&A test prompts from an approved PDD. Produces the -ground-truth suite for the Phase 4 OCS chatbot deep gate. - -#### pdd-to-app-journeys -Derive opp-specific expected user journeys from an approved PDD. Produces -the UX-intent ground truth consumed by app-test-cases and app-ux-eval. - ---- - -## Phase 2: CommCare Setup - -**Agent:** `commcare-setup` - -> Phase 2 of the CRISPR-Connect lifecycle: translate the approved PDD into -> Learn and Deliver apps via Nova, deploy them to CommCare HQ, and test. - -### Skills - -#### pdd-to-learn-app -Build the CommCare Learn (training) app from the PDD via Nova's -/nova:autobuild. Captures nova_app_id and writes a structure summary. - -#### pdd-to-learn-app-eval -Grade a Nova-built Learn app against the PDD that specified it — module -count, order, Assessment Score wiring, content coverage. - -#### pdd-to-deliver-app -Build the CommCare Deliver (service-delivery) app from the PDD via Nova's -/nova:autobuild. Captures nova_app_id and writes a structure summary. - -#### pdd-to-deliver-app-eval -Grade a Nova-built Deliver app against the PDD that specified it — field -count, ordering, conditional logic, Connectify wiring. - -#### app-connect-coverage -Verify every form in a Nova-built Learn or Deliver app has the right -CommCare Connect markers, auto-fix via Nova edits, loop until clean. 
- -#### app-test-cases -Bind each PDD user journey to the Nova-built app structure and emit a -Maestro recipe per journey with real selectors. Use after Nova finishes -building, before app-release. - -#### app-deploy -Upload Nova-built Learn + Deliver apps to CommCare HQ as draft builds via -/nova:upload_to_hq. Captures HQ app IDs and writes a deploy summary. - -#### app-release -Build and release the Learn + Deliver CommCare apps on CCHQ so Connect -can read their form schema and surface deliver units. - -#### app-release-eval -Verify every Learn + Deliver build was actually released so Connect can -read deliver units. Provisional rubric pending 3+ real releases. - -#### app-multimedia-coverage (manual, not part of /ace:run) -Attach display-only images to Connect app questions where they -meaningfully help FLWs. Manual gate; not part of /ace:run. - -#### commcare-form-patch (Phase 2 Step 2.8, removal-tracked) -Apply surgical CCHQ form-XML patches when Nova's `compile_app` emits -output the AVD's CommCare runtime can't parse, then re-build + re-release. -Wired into Phase 2 as Step 2.8 in 0.13.66 — auto-runs after `app-release` -with `targets: auto` (no-op when zero wrappers in the released Learn CCZ). -Whole skill self-deletes when voidcraft-labs/nova-plugin#7 ships per its -own `## Removal criteria`. Workaround for jjackson/ace#115 finding 1. - ---- - -## Phase 3: Connect Setup - -**Agent:** `connect-setup` - -> Orchestrates Connect platform setup for a CRISPR-Connect opportunity: -> program creation, opportunity shell, verification flags, and payment -> units. Atom-driven via the ace-connect MCP (no HITL). - -### Skills - -#### connect-program-setup -Create or reuse a Connect Program for the opportunity, archetype-matched -to the PDD. Captures program_id for downstream skills. - -#### connect-program-setup-eval -Grade Connect Program + Opportunity configuration against the PDD — -reuse-vs-create, verification rules, delivery units, payment units. 
- -#### connect-opp-setup -Create and fully configure a Connect Opportunity — opp shell, verification -flags, payment units, ACE test-user pre-invite for emulator testing. - ---- - -## Phase 4: OCS Setup - -**Agent:** `ocs-setup` - -> Phase 4 of the CRISPR-Connect lifecycle: clone the ACE golden template, -> build the opp-specific RAG collection, smoke-test the bot via a thin -> quick chat suite, and stage the widget credentials for Connect. - -### Skills - -#### ocs-agent-setup -Clone the ACE OCS template into a per-opp chatbot, attach a RAG -collection from PDD + training + app summaries, publish, return embed -credentials. - -#### ocs-chatbot-qa -Exercise the per-opp OCS chatbot via its anonymous widget and capture a -transcript with structural checks. Modes: --quick / --deep / --monitor. - -#### ocs-chatbot-eval -LLM-as-Judge grader for OCS chatbot transcripts. Modes: --quick (1-dim -smoke), --deep / --monitor (5-dim calibrated; emits gate brief). - -#### ocs-widget-handoff-eval -Grade the OCS widget-handoff staging artifact for HITL paste-in — widget -URL, embed key, opportunity-binding instructions. - ---- - -## Phase 5: QA and Training - -**Agent:** `qa-and-training` - -> Phase 5 of the CRISPR-Connect lifecycle: produce per-opp QA test plan + -> walkthrough screenshots + training materials. All derived from the -> design docs (PDD, app summaries, opp identifiers, OCS chatbot URL) so -> the Phase runs from artifacts; no live LLO contact. - -### Skills - -#### app-screenshot-capture -Run app smoke recipes against a local AVD and capture per-step -screenshots for the training deck. Per-opp content only. - -#### app-ux-eval (deep-only, /ace:qa-deep) -Grade the FLW experience of the built apps via LLM-as-Judge over -captured screenshots. Deep-only — runs from /ace:qa-deep. - -#### training-llo-guide -Generate the LLO-facing operations document for overseeing FLW -deployment. Owns one artifact: llo-manager-guide.md. 
- -#### training-flw-guide -Generate the FLW-facing step-by-step guide for the Learn and Deliver -apps. Owns one artifact: flw-training-guide.md. - -#### training-quick-reference -Generate the one-page printable pocket-card summary for FLWs in the -field. Owns one artifact: quick-reference.md. - -#### training-faq -Generate anticipated LLO + FLW questions with authoritative answers. -Owns one artifact: faq.md. - -#### training-deck-outline -Generate the slide-by-slide markdown outline that training-deck-build -renders into a Google Slides deck. Owns one artifact. - -#### training-deck-build -Render training-deck-outline.md into a Google Slides deck using the -ACE template. Produces a presentable Slides URL. - -#### training-onboarding-email -Generate the LLO onboarding email body, consumed by llo-onboarding -and personalized per LLO at send time. Owns one artifact. - -#### connect-baseline-screenshots (cross-opp, manual) -Capture the per-Connect-version baseline of "how Connect works" -screenshots reused across every training deck. Manual, cross-opp. - ---- - -## Phase 6: Solicitation Management - -**Agent:** `solicitation-management` - -> Phase 6 of the CRISPR-Connect lifecycle: publish a solicitation derived -> from the PDD, invite PDD-named candidate LLOs to it by email, and stop. -> The review-and-award lifecycle continues via the manually-invoked -> `solicitation-review` skill (gated on a human-in-the-loop checkpoint -> before `award_response` is called). Phase 7 starts once an awardee is -> recorded in `opp.yaml.selected_llo`. - -### Skills - -#### solicitation-create -Translate the PDD into a solicitation payload, derive evaluation criteria, -and publish via connect-labs MCP. Captures solicitation_id. - -#### solicitation-create-eval -Grade a published solicitation against its source PDD — scope fidelity, -field completeness, deadline sensibility. - -#### llo-invite -Email each PDD-named candidate LLO the public solicitation URL. 
No-op -when PDD has no preferred_llos. - -#### solicitation-monitor (recurring) -Recurring poll for solicitation responses. Modes: --quick (count only) / ---monitor (full pull, default) / --close (final pull). - -#### solicitation-review (manual; HALT-and-resume) -Score solicitation responses, recommend an awardee, and (after HITL -approval) call award_response and populate opp.yaml.selected_llo. - -#### solicitation-review-eval -Compare ACE's top-ranked solicitation recommendation against the human's -actual award. Detection-rate metric. - ---- - -## Phase 7: Execution Management - -**Agent:** `execution-manager` - -> Phase 7 of the CRISPR-Connect lifecycle: execute the awarded LLO's run -> of the opportunity — onboarding, UAT, go-live, and recurring monitoring. -> Phase 7 entry is gated on `opp.yaml.selected_llo.org_slug` being -> populated by Phase 6's `solicitation-review` skill. - -### Skills - -#### llo-onboarding -Issue the Connect program invite and send the awarded LLO the ACE -onboarding email with training materials and OCS widget link. - -#### llo-uat -Coordinate User Acceptance Testing with onboarded LLOs. Send UAT -instructions, monitor feedback, compile results with sign-off status. - -#### llo-launch -Activate the opportunity for live use. Verifies UAT sign-offs and -deep-QA verdicts, activates in Connect, notifies LLOs of go-live. - -#### llo-launch-eval -Grade an llo-launch activation against PDD launch preconditions — UAT -sign-off, Connect activation, app-publish, go-live notify. - -#### timeline-monitor (recurring) -Watch whether LLOs are hitting expected milestones on schedule. Email -prompts when behind. Recurring during active opp. - -#### flw-data-review (recurring) -Analyze FLW submissions to identify quality issues, trends, and -improvement opportunities. Recurring during active opp. 
- -#### flw-data-review-eval -Grade an flw-data-review report — signal coverage, outlier rigor, -recommendation actionability, evidence citation, trajectory awareness. - -#### ocs-chatbot-qa --monitor (recurring) -See Phase 4. Phase 7 invokes recurring `--monitor` mode. - -#### ocs-chatbot-eval --monitor (recurring) -See Phase 4. Phase 7 invokes recurring `--monitor` mode. - ---- - -## Phase 8: Closeout - -**Agent:** `closeout` - -> Orchestrates opportunity closeout: invoice processing, LLO feedback -> collection, learnings summary, and overall cycle grading. Triggered -> when the opportunity reaches its end date. - -### Skills - -#### opp-closeout -Pull invoices from the completed opportunity and create a Jira ticket to -issue payment to the LLO. - -#### llo-feedback -Prompt LLOs for feedback on application, process, and next-step -suggestions. Collect and document responses for closeout. - -#### learnings-summary -Synthesize learnings from a completed opportunity. Drafts a new PDD to -seed the next cycle when iteration is warranted. - -#### cycle-grade -Grade the closed CRISPR-Connect cycle end-to-end with concrete -improvement recommendations for the next cycle. - -#### cycle-grade-eval -Independently re-grade a closed cycle's cycle-grade output. Detects -self-eval inflation, missing learnings, vague recommendations. - ---- - -## Cross-cutting Skills - -#### opp-eval (umbrella aggregator) -Umbrella aggregator that rolls every per-skill -eval verdict into a -run-level scorecard. Modes: --quick / --deep / --monitor. - -#### eval-calibration (methodology reference) -Methodology reference for calibrating ACE's per-skill -eval rubrics — -ground-truth catalogues, variance protocol, detection-rate metric. - -#### email-communicator (utility, called by other skills) -Send/receive email via GOG CLI using the ACE Gmail account. Utility -skill — other skills delegate here for any Gmail operation. 
- -#### upload-transcript (utility) -Upload a Claude CLI stream-json transcript (.jsonl) to a deployed -ace-web via /api/ingest/upload. Used by /ace:run --ace-web-url. - ---- - -## External Integrations - -### Connect API -ACE talks to Connect through **two** MCP servers, scoped to distinct -domains: - -1. **`connect-labs`** (lives in [`connect-labs` repo](https://github.com/dimagi/connect-labs)) - — solicitations, reviews, awards, funds. Production-ready and - unrelated to the Programs/Opportunities lifecycle ACE manages. - Consumed via a thin local stdio proxy (`mcp/connect-labs-server.ts`) - that forwards JSON-RPC frames to the remote HTTP MCP at - `https://labs.connect.dimagi.com/mcp/`. -2. **`ace-connect`** — composite Connect backend over `connect.dimagi.com`, - authenticated as `ace@dimagi-ai.com` via OAuth-with-CommCareHQ. 21 - atoms today: 8 authoring atoms route to the REST automation API - (commcare-connect#1135); the rest still drive HTML form pages via - Playwright. - -See `playbook/integrations/connect-api.md` for the atom inventory. - -### CommCare API -Production-ready CommCare HQ tools live in the `connect-labs` MCP -(`list_apps`, `get_app_structure`, etc.). ACE calls them for app -inspection during Phase 2. - -See `playbook/integrations/commcare-api.md`. - -### OCS (Open Chat Studio) -Composite MCP backend with **22 atomic capabilities** at -`mcp/ocs-server.ts` → `ace-ocs`. REST + Playwright + composite backends. -Authenticate with `/ace:ocs-login` before calling tools that hit live OCS. - -See `playbook/integrations/ocs-integration.md`. - -### Nova (CommCare app builder) -Live as a sibling Claude Code plugin (`voidcraft-labs/nova-marketplace`). -End-to-end smoke test passed 2026-04-28. ACE consumes Nova's -`/nova:autobuild` slash command via the Nova plugin; OAuth on first use. - -See `playbook/integrations/nova-integration.md`. 
- -### Mobile (CommCare Android emulation) -The `ace-mobile` MCP server drives a local Android AVD on the operator's -Mac via Maestro + adb + Playwright. **Mac-only, dev-machine-only** — no -cloud device farms. Bootstrap with `/ace:mobile-bootstrap`. - -See `playbook/integrations/mobile-integration.md`. - -### Slides (Google Slides API) -Slides atoms (`slides_get`, `slides_batch_update`, `slides_copy_template`) -shipped 0.10.78. Back the `training-deck-build` skill, which renders -markdown deck-outlines into editable Google Slides decks. - -See `playbook/integrations/slides-integration.md`. - ---- - -## Current Limitations - -`## Current Workaround` blocks across SKILL.md files document HITL -fallbacks for capabilities not yet automated. As of 0.13.43, no skills -ship with active workaround blocks — all previously-blocked Phase 3 / 5 / -7 paths are atom-driven via the `ace-connect` MCP (since 0.10.47). - -The `commcare-form-patch` and `app-multimedia-coverage` skills ARE -documented workarounds but for the Nova upstream, not for Connect/CCHQ. -Both have explicit `## Removal criteria` sections naming the upstream -ticket whose resolution will retire the skill. - ---- - -## Skill Reference - -54 ACE skills + 3 reference docs. All skills ship with -`disable-model-invocation: true` (orchestrator-dispatched, never -free-text invoked). See `skills/README.md` for the author contract. - -| Skill | Phase | Description (≤200 chars) | -|---|---|---| -| app-connect-coverage | 2 | Verify every form in a Nova-built Learn or Deliver app has the right CommCare Connect markers, auto-fix via Nova edits, loop until clean. | -| app-deploy | 2 | Upload Nova-built Learn + Deliver apps to CommCare HQ as draft builds via /nova:upload_to_hq. Captures HQ app IDs and writes a deploy summary. | -| app-multimedia-coverage | 2 (manual) | Attach display-only images to Connect app questions where they meaningfully help FLWs. Manual gate; not part of /ace:run. 
| -| app-release | 2 | Build and release the Learn + Deliver CommCare apps on CCHQ so Connect can read their form schema and surface deliver units. | -| app-release-eval | 2 | Verify every Learn + Deliver build was actually released so Connect can read deliver units. Provisional rubric pending 3+ real releases. | -| app-screenshot-capture | 5 | Run app smoke recipes against a local AVD and capture per-step screenshots for the training deck. Per-opp content only. | -| app-test-cases | 2 | Bind each PDD user journey to the Nova-built app structure and emit a Maestro recipe per journey with real selectors. Use after Nova finishes building, before app-release. | -| app-ux-eval | 5 (deep) | Grade the FLW experience of the built apps via LLM-as-Judge over captured screenshots. Deep-only — runs from /ace:qa-deep. | -| commcare-form-patch | 2 (workaround) | Apply surgical CCHQ form-XML patches when Nova's compile_app emits output Connect rejects, then re-build + re-release. Workaround skill. | -| connect-baseline-screenshots | xcut | Capture the per-Connect-version baseline of "how Connect works" screenshots reused across every training deck. Manual, cross-opp. | -| connect-opp-setup | 3 | Create and fully configure a Connect Opportunity — opp shell, verification flags, payment units, ACE test-user pre-invite for emulator testing. | -| connect-program-setup | 3 | Create or reuse a Connect Program for the opportunity, archetype-matched to the PDD. Captures program_id for downstream skills. | -| connect-program-setup-eval | 3 | Grade Connect Program + Opportunity configuration against the PDD — reuse-vs-create, verification rules, delivery units, payment units. | -| cycle-grade | 8 | Grade the closed CRISPR-Connect cycle end-to-end with concrete improvement recommendations for the next cycle. | -| cycle-grade-eval | 8 | Independently re-grade a closed cycle's cycle-grade output. Detects self-eval inflation, missing learnings, vague recommendations. 
| -| email-communicator | xcut | Send/receive email via GOG CLI using the ACE Gmail account. Utility skill — other skills delegate here for any Gmail operation. | -| eval-calibration | xcut | Methodology reference for calibrating ACE's per-skill -eval rubrics — ground-truth catalogues, variance protocol, detection-rate metric. | -| flw-data-review | 7 | Analyze FLW submissions to identify quality issues, trends, and improvement opportunities. Recurring during active opp. | -| flw-data-review-eval | 7 | Grade an flw-data-review report — signal coverage, outlier rigor, recommendation actionability, evidence citation, trajectory awareness. | -| idea-to-pdd | 1 | Develop a Program Design Doc (PDD) for a Connect intervention from source material. Iterates a 5-question stress-test rubric until approved. | -| idea-to-pdd-eval | 1 | Independently grade a PDD against the source idea pack — re-runs the stress test from outside and cross-checks reviewer-comment fidelity. | -| learnings-summary | 8 | Synthesize learnings from a completed opportunity. Drafts a new PDD to seed the next cycle when iteration is warranted. | -| llo-feedback | 8 | Prompt LLOs for feedback on application, process, and next-step suggestions. Collect and document responses for closeout. | -| llo-invite | 6 | Email each PDD-named candidate LLO the public solicitation URL. No-op when PDD has no preferred_llos. | -| llo-launch | 7 | Activate the opportunity for live use. Verifies UAT sign-offs and deep-QA verdicts, activates in Connect, notifies LLOs of go-live. | -| llo-launch-eval | 7 | Grade an llo-launch activation against PDD launch preconditions — UAT sign-off, Connect activation, app-publish, go-live notify. | -| llo-onboarding | 7 | Issue the Connect program invite and send the awarded LLO the ACE onboarding email with training materials and OCS widget link. | -| llo-uat | 7 | Coordinate User Acceptance Testing with onboarded LLOs. 
Send UAT instructions, monitor feedback, compile results with sign-off status. | -| ocs-agent-setup | 4 | Clone the ACE OCS template into a per-opp chatbot, attach a RAG collection from PDD + training + app summaries, publish, return embed credentials. | -| ocs-chatbot-eval | 4, 7 | LLM-as-Judge grader for OCS chatbot transcripts. Modes: --quick (1-dim smoke), --deep / --monitor (5-dim calibrated; emits gate brief). | -| ocs-chatbot-qa | 4, 7 | Exercise the per-opp OCS chatbot via its anonymous widget and capture a transcript with structural checks. Modes: --quick / --deep / --monitor. | -| ocs-widget-handoff-eval | 4 | Grade the OCS widget-handoff staging artifact for HITL paste-in — widget URL, embed key, opportunity-binding instructions. | -| opp-closeout | 8 | Pull invoices from the completed opportunity and create a Jira ticket to issue payment to the LLO. | -| opp-eval | xcut | Umbrella aggregator that rolls every per-skill -eval verdict into a run-level scorecard. Modes: --quick / --deep / --monitor. | -| pdd-to-app-journeys | 1 | Derive opp-specific expected user journeys from an approved PDD. Produces the UX-intent ground truth consumed by app-test-cases and app-ux-eval. | -| pdd-to-deliver-app | 2 | Build the CommCare Deliver (service-delivery) app from the PDD via Nova's /nova:autobuild. Captures nova_app_id and writes a structure summary. | -| pdd-to-deliver-app-eval | 2 | Grade a Nova-built Deliver app against the PDD that specified it — field count, ordering, conditional logic, Connectify wiring. | -| pdd-to-learn-app | 2 | Build the CommCare Learn (training) app from the PDD via Nova's /nova:autobuild. Captures nova_app_id and writes a structure summary. | -| pdd-to-learn-app-eval | 2 | Grade a Nova-built Learn app against the PDD that specified it — module count, order, Assessment Score wiring, content coverage. | -| pdd-to-test-prompts | 1 | Derive opp-specific Q&A test prompts from an approved PDD. 
Produces the ground-truth suite for the Phase 4 OCS chatbot deep gate. | -| solicitation-create | 6 | Translate the PDD into a solicitation payload, derive evaluation criteria, and publish via connect-labs MCP. Captures solicitation_id. | -| solicitation-create-eval | 6 | Grade a published solicitation against its source PDD — scope fidelity, field completeness, deadline sensibility. | -| solicitation-monitor | 6 | Recurring poll for solicitation responses. Modes: --quick (count only) / --monitor (full pull, default) / --close (final pull). | -| solicitation-review | 6 (manual) | Score solicitation responses, recommend an awardee, and (after HITL approval) call award_response and populate opp.yaml.selected_llo. | -| solicitation-review-eval | 6 | Compare ACE's top-ranked solicitation recommendation against the human's actual award. Detection-rate metric. | -| timeline-monitor | 7 | Watch whether LLOs are hitting expected milestones on schedule. Email prompts when behind. Recurring during active opp. | -| training-deck-build | 5 | Render training-deck-outline.md into a Google Slides deck using the ACE template. Produces a presentable Slides URL. | -| training-deck-outline | 5 | Generate the slide-by-slide markdown outline that training-deck-build renders into a Google Slides deck. Owns one artifact. | -| training-faq | 5 | Generate anticipated LLO + FLW questions with authoritative answers. Owns one artifact: faq.md. | -| training-flw-guide | 5 | Generate the FLW-facing step-by-step guide for the Learn and Deliver apps. Owns one artifact: flw-training-guide.md. | -| training-llo-guide | 5 | Generate the LLO-facing operations document for overseeing FLW deployment. Owns one artifact: llo-manager-guide.md. | -| training-onboarding-email | 5 | Generate the LLO onboarding email body, consumed by llo-onboarding and personalized per LLO at send time. Owns one artifact. 
| -| training-quick-reference | 5 | Generate the one-page printable pocket-card summary for FLWs in the field. Owns one artifact: quick-reference.md. | -| upload-transcript | xcut | Upload a Claude CLI stream-json transcript (.jsonl) to a deployed ace-web via /api/ingest/upload. Used by /ace:run --ace-web-url. | - -### Reference docs (`skills/_*-template.md`) - -Three reference documents extract shared boilerplate so skills don't -duplicate it. Excluded from the skill catalog because filenames start -with `_`. - -- `_eval-template.md` — verdict YAML contract, severity rules, inflation - guard, stock blocks for `## MCP Tools Used / ## Mode Behavior / - ## Dry-Run Behavior`. Referenced by all 12 `*-eval` skills. -- `_training-template.md` — per-artifact decomposition rationale, sibling - map, common Drive paths. Referenced by the 7 `training-*` skills. -- `_solicitation-template.md` — `opp.yaml.solicitation` and - `opp.yaml.selected_llo` contract, connect-labs MCP atom inventory, - Phase 6 → Phase 7 boundary rule. Referenced by all 5 solicitation - skills + `llo-invite`. diff --git a/docs/superpowers/plans/2026-05-04-ace-solicitations-phase.md b/docs/superpowers/plans/2026-05-04-ace-solicitations-phase.md deleted file mode 100644 index 48b00109..00000000 --- a/docs/superpowers/plans/2026-05-04-ace-solicitations-phase.md +++ /dev/null @@ -1,2425 +0,0 @@ -# ACE Solicitations Phase Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Insert a new Phase 7 (Solicitation Management) into the ACE lifecycle, renumber the existing Phase 7 (LLO Management → Execution Management) and Phase 8 (Closeout) to 8 and 9, and wire ACE to consume the existing connect-labs remote MCP for solicitation/review/award atoms.
- -**Architecture:** New `solicitation-management` subagent owns Phase 7. Two skills (`solicitation-create`, `llo-invite` — moved from old Phase 7 and rewritten) run in default `/ace:run`; one recurring skill (`solicitation-monitor`); one manual skill (`solicitation-review`). ACE consumes connect-labs's remote MCP at `https://labs.connect.dimagi.com/mcp/` via a thin local stdio proxy (`mcp/connect-labs-server.ts`) that forwards JSON-RPC and injects the bearer PAT. No new atoms in `ace-connect`. - -**Tech Stack:** TypeScript (`npx tsx` MCP subprocesses), vitest, Markdown SKILL.md prompt files, `.claude-plugin/plugin.json` for MCP wiring, 1Password for secrets. - -**Spec:** [`docs/superpowers/specs/2026-05-04-ace-solicitations-phase-design.md`](../specs/2026-05-04-ace-solicitations-phase-design.md) - ---- - -## File Structure - -**New files:** -- `mcp/connect-labs-server.ts` — stdio MCP proxy forwarding to `labs.connect.dimagi.com/mcp/` -- `agents/solicitation-management.md` — Phase 7 subagent -- `skills/solicitation-create/SKILL.md` -- `skills/solicitation-create-eval/SKILL.md` -- `skills/solicitation-monitor/SKILL.md` -- `skills/solicitation-review/SKILL.md` -- `skills/solicitation-review-eval/SKILL.md` -- `test/mcp/connect-labs/proxy.test.ts` — unit test for the proxy -- `test/mcp/connect-labs/integration/e2e.integration.test.ts` — `LABS_INTEGRATION=1` end-to-end -- `test/skills/solicitation/*.test.ts` — fixture-driven validation -- `test/fixtures/CRISPR-Test-004-Solicitation/` — golden fixture - -**Renamed files:** -- `agents/llo-manager.md` → `agents/execution-manager.md` - -**Heavily modified:** -- `skills/llo-invite/SKILL.md` — rewritten; phase moves Phase 8 → Phase 7; behavior changes from Connect-roster prep to solicitation-invite email -- `skills/llo-onboarding/SKILL.md` — reads `selected_llo` from `opp.yaml`, fails fast if empty -- `agents/ace-orchestrator.md` — phases block, pause-points, prose -- `lib/artifact-manifest.ts` — drop 
`connect-setup/invites.md` artifacts, add solicitation/* artifacts -- `bin/ace-doctor` — new `[Connect Labs]` section -- `templates/pdd-template.md` — three new optional fields -- `CLAUDE.md` — phase order list, plugin overview, pause-points - -**Search/replace pass (low-content edits):** -- `agents/connect-setup.md`, `agents/ocs-setup.md`, `agents/qa-and-training.md` — Phase 7/8 references -- `skills/training-onboarding-email/SKILL.md`, `skills/training-deck-build/SKILL.md`, `skills/llo-launch-eval/SKILL.md`, `skills/cycle-grade-eval/SKILL.md`, `skills/connect-opp-setup/SKILL.md`, `skills/ocs-widget-handoff-eval/SKILL.md` -- `commands/run.md`, `commands/step.md` - ---- - -### Task 1: Renumber phase ordinals across the codebase - -**Files:** -- Modify: `agents/ace-orchestrator.md` -- Modify: `agents/connect-setup.md` -- Modify: `agents/ocs-setup.md` -- Modify: `agents/qa-and-training.md` -- Modify: `agents/llo-manager.md` (will be renamed in Task 2) -- Modify: `skills/training-onboarding-email/SKILL.md` -- Modify: `skills/training-deck-build/SKILL.md` -- Modify: `skills/llo-launch-eval/SKILL.md` -- Modify: `skills/cycle-grade-eval/SKILL.md` -- Modify: `skills/connect-opp-setup/SKILL.md` -- Modify: `skills/ocs-widget-handoff-eval/SKILL.md` -- Modify: `bin/ace-doctor` - -- [ ] **Step 1: Inspect every Phase 7/8 reference** - -Run: `grep -rn "Phase 7\|Phase 8\|phase_ordinal: 6\|phase_ordinal: 7\|phase: 6\|phase: 7\|phase_6\|phase_7\|phases.llo_management\|llo-management" agents/ skills/ commands/ bin/ lib/ CLAUDE.md README.md` - -Expected: ~50 matches across the files listed above. Read each match in context — some are descriptive prose ("Phase 7 (LLO Management)"), some are frontmatter (`phase_ordinal: 6`), some are state-key names (`phase_6_backlog`). - -- [ ] **Step 2: Apply mechanical replacements** - -The renumbering is a context-aware substitution.
For each file, apply: -- `phase_ordinal: 7` (where it was 7 in `closeout`) → `phase_ordinal: 8` -- `phase_ordinal: 6` (where it was 6 in `llo-manager`) → `phase_ordinal: 7` -- `Phase 7` (where it referred to llo-management) → `Phase 8` -- `Phase 8` (where it referred to closeout) → `Phase 9` -- `phase_6_backlog` (orchestrator state key for old Phase 7) → `phase_7_backlog` -- `phase_7_backlog` (orchestrator state key for old Phase 8) → `phase_8_backlog` -- `phases.llo_management` → `phases.execution_management` - -**Do not yet touch:** -- The `Phase 7` references in `solicitation-management` (it doesn't exist yet) -- The `phases:` block in `ace-orchestrator.md` (handled in Task 18 — must add new entry and renumber atomically) - -For `bin/ace-doctor`: any `phase_6_*` / `phase_7_*` health check identifiers shift up by one. The ` [LLO Management]` section header becomes ` [Execution Management]`. - -- [ ] **Step 3: Verify no leftover stale references** - -Run: `grep -rn "phase_6_backlog\|phases.llo_management\|Phase 7 (LLO\|Phase 8 (Closeout)" agents/ skills/ commands/ bin/ lib/` - -Expected: zero matches. (The orchestrator's `phases:` block will still have `phase_ordinal` integers; that's fine — they get rewritten in Task 18.) - -- [ ] **Step 4: Run vitest to verify nothing structural broke** - -Run: `npm test -- --run` - -Expected: tests pass at the same rate as on `main` for this branch. Renumbering is documentation-level; nothing in code references phase numbers as integers except `lib/artifact-manifest.ts` (which doesn't number phases — it uses string names like `'design'`, `'connect'`). - -- [ ] **Step 5: Commit** - -```bash -git add agents/ skills/ bin/ace-doctor -git commit -m "refactor(phases): renumber Phase 7→8, Phase 8→9 (no behavior change) - -Pure rename pass: prepares the topology for the new Phase 7 -(Solicitation Management) added in subsequent commits. 
Touches phase -ordinals, run-state backlog keys, and prose references in agents, -skills, and doctor sections. The orchestrator's phases: block is -rewritten atomically in a later task. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 2: Rename `llo-manager` agent to `execution-manager` - -**Files:** -- Rename: `agents/llo-manager.md` → `agents/execution-manager.md` - -- [ ] **Step 1: Rename the file via git** - -Run: `git mv agents/llo-manager.md agents/execution-manager.md` - -- [ ] **Step 2: Rewrite the frontmatter and opening prose** - -Edit `agents/execution-manager.md`. Replace the frontmatter block: - -```yaml ---- -name: execution-manager -description: > - Phase 8 of the CRISPR-Connect lifecycle: execute the awarded LLO's run - of the opportunity — onboarding, UAT, go-live, and recurring monitoring. - Phase 8 entry is gated on `opp.yaml.selected_llo.org_slug` being populated - by Phase 7's solicitation-review skill (which the run halts before). -model: inherit -phase: execution-management -phase_display: Execution Management -phase_ordinal: 7 -skills: - - { name: llo-onboarding, has_judge: false } - - { name: llo-uat, has_judge: false } - - { name: llo-launch, has_judge: true, eval_skill: llo-launch-eval } -recurring_skills: - - { name: timeline-monitor, has_judge: true } - - { name: flw-data-review, has_judge: true, eval_skill: flw-data-review-eval } - - { name: ocs-chatbot-qa, has_judge: false } - - { name: ocs-chatbot-eval, has_judge: true } ---- -``` - -Note: `llo-invite` is removed from the skills list (it moves to Phase 7 in Task 14). The remainder of the agent body keeps its existing prose for `llo-onboarding`, `llo-uat`, `llo-launch`, and the recurring skills — only the phase numbering and the "first LLO contact" framing get rewritten. - -- [ ] **Step 3: Update the agent body's opening paragraph** - -Replace the opening "You run the first LLO-facing phase..." 
paragraph with: - -> You run the execution phase of a CRISPR-Connect opportunity. By the time this phase starts, Phase 7 (Solicitation Management) has published a solicitation, collected responses, and (via the manual `solicitation-review` skill) awarded an org. The awardee is recorded in `opp.yaml.selected_llo` — that's the LLO this phase onboards, supports through UAT, takes to go-live, and monitors during execution. - -- [ ] **Step 4: Strip Step 1 (LLO Invitation List) from the body** - -The existing agent body has a "### Step 1: LLO Invitation List" section that calls `llo-invite`. Delete that section. Renumber the remaining steps so what was Step 2 (LLO Onboarding) becomes Step 1, Step 3 → Step 2, Step 4 → Step 3, Step 5 → Step 4. Update internal cross-references ("Step 1" / "Step 2" / etc.) accordingly. - -- [ ] **Step 5: Update the orchestrator-side dispatch reference** - -Run: `grep -rn "llo-manager\|ace:llo-manager" agents/ commands/ CLAUDE.md` - -For each match, replace `llo-manager` → `execution-manager` (preserve case and the `ace:` prefix where applicable). The orchestrator's `Agent(llo-manager)` calls will be rewritten when the phases block is updated in Task 18. - -- [ ] **Step 6: Commit** - -```bash -git add agents/execution-manager.md agents/llo-manager.md commands/ CLAUDE.md -git commit -m "refactor(agent): rename llo-manager → execution-manager - -Phase 8 (was Phase 7) is no longer 'first LLO contact' — that role moves -to the new Phase 7 (Solicitation Management) which publishes solicitations -and invites candidate LLOs. Phase 8 takes over once an awardee exists. - -Drops the llo-invite skill from the agent's skill list (it moves to -Phase 7 in a later commit). Renumbers internal step numbering. 
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 3: Build the connect-labs MCP stdio proxy - -**Files:** -- Create: `mcp/connect-labs-server.ts` -- Create: `test/mcp/connect-labs/proxy.test.ts` - -**Why a proxy.** The labs MCP runs as remote HTTP at `https://labs.connect.dimagi.com/mcp/`. ACE's existing MCP wiring (`.claude-plugin/plugin.json`) only uses stdio MCPs (`command + args`). Rather than experiment with whether plugin.json supports `type: "http"` mcpServers, write a thin local stdio proxy that forwards JSON-RPC frames to labs over HTTP and injects the bearer PAT. Same shape as `mcp/google-drive-server.ts`, `mcp/ocs-server.ts`, etc. - -- [ ] **Step 1: Write the failing test** - -Create `test/mcp/connect-labs/proxy.test.ts`: - -```typescript -import { describe, it, expect, beforeEach, vi } from 'vitest'; -import { spawn } from 'node:child_process'; -import path from 'node:path'; - -const PROXY_PATH = path.resolve(__dirname, '../../../mcp/connect-labs-server.ts'); - -describe('connect-labs-server (stdio → HTTP proxy)', () => { - beforeEach(() => { - vi.restoreAllMocks(); - }); - - it('forwards a JSON-RPC frame to labs with Bearer auth and returns the body', async () => { - const fetchSpy = vi.spyOn(global, 'fetch').mockResolvedValue( - new Response(JSON.stringify({ jsonrpc: '2.0', id: 1, result: { ok: true } }), { - status: 200, - headers: { 'Content-Type': 'application/json' }, - }), - ); - - // The proxy is launched as a subprocess in real use; here we import its forward() - // function directly for unit testing. 
- const { forward } = await import('../../../mcp/connect-labs-server'); - const out = await forward( - { jsonrpc: '2.0', id: 1, method: 'tools/list', params: {} }, - { token: 'test-token', url: 'https://labs.example/mcp/' }, - ); - - expect(fetchSpy).toHaveBeenCalledOnce(); - const [calledUrl, init] = fetchSpy.mock.calls[0]; - expect(calledUrl).toBe('https://labs.example/mcp/'); - expect(init?.method).toBe('POST'); - expect(init?.headers).toMatchObject({ - 'Authorization': 'Bearer test-token', - 'Content-Type': 'application/json', - }); - expect(JSON.parse(init?.body as string)).toEqual({ - jsonrpc: '2.0', - id: 1, - method: 'tools/list', - params: {}, - }); - expect(out).toEqual({ jsonrpc: '2.0', id: 1, result: { ok: true } }); - }); - - it('returns a JSON-RPC error envelope when the upstream returns 401', async () => { - vi.spyOn(global, 'fetch').mockResolvedValue( - new Response(JSON.stringify({ error: { code: 'PERMISSION_DENIED', message: 'bad token' } }), { - status: 401, - }), - ); - const { forward } = await import('../../../mcp/connect-labs-server'); - const out = await forward( - { jsonrpc: '2.0', id: 2, method: 'tools/list', params: {} }, - { token: 'bad', url: 'https://labs.example/mcp/' }, - ); - expect(out).toMatchObject({ - jsonrpc: '2.0', - id: 2, - error: { - code: -32000, - message: expect.stringContaining('401'), - }, - }); - }); - - it('throws if LABS_MCP_TOKEN is empty when invoked without an explicit token', async () => { - const { forward } = await import('../../../mcp/connect-labs-server'); - await expect( - forward({ jsonrpc: '2.0', id: 3, method: 'tools/list' }, { token: '', url: 'https://labs.example/mcp/' }), - ).rejects.toThrow(/LABS_MCP_TOKEN/); - }); -}); -``` - -- [ ] **Step 2: Run the test to verify it fails** - -Run: `npm test -- --run test/mcp/connect-labs/proxy.test.ts` - -Expected: FAIL with "Cannot find module '../../../mcp/connect-labs-server'". 
- -- [ ] **Step 3: Implement the proxy** - -Create `mcp/connect-labs-server.ts`: - -```typescript -#!/usr/bin/env tsx -/** - * connect-labs-server: stdio MCP proxy to labs.connect.dimagi.com/mcp/. - * - * Reads LABS_MCP_TOKEN from ${CLAUDE_PLUGIN_DATA}/.env (legacy fallback: - * plugin root .env), then forwards every JSON-RPC frame received on stdin - * over HTTPS to the labs MCP, injecting `Authorization: Bearer `. - * The HTTP response body is written back to stdout as a single line. - * - * Stays a stdio MCP because ACE's plugin.json only wires stdio mcpServers - * (verified via grep: every existing entry uses `command + args`). When - * Claude Code's plugin.json gains first-class HTTP MCP support, this - * proxy can be deleted in favor of a direct `type: "http"` entry. - */ - -import { readFileSync } from 'node:fs'; -import { join } from 'node:path'; -import { createInterface } from 'node:readline'; - -export interface JsonRpcFrame { - jsonrpc: '2.0'; - id?: number | string; - method?: string; - params?: unknown; - result?: unknown; - error?: { code: number; message: string; data?: unknown }; -} - -export interface ForwardOpts { - token: string; - url: string; -} - -export async function forward(frame: JsonRpcFrame, opts: ForwardOpts): Promise<JsonRpcFrame> { - if (!opts.token) { - throw new Error('LABS_MCP_TOKEN is required to forward to labs MCP'); - } - const res = await fetch(opts.url, { - method: 'POST', - headers: { - 'Authorization': `Bearer ${opts.token}`, - 'Content-Type': 'application/json', - }, - body: JSON.stringify(frame), - }); - if (!res.ok) { - return { - jsonrpc: '2.0', - id: frame.id, - error: { - code: -32000, - message: `labs MCP returned ${res.status}: ${await res.text()}`, - }, - }; - } - return (await res.json()) as JsonRpcFrame; -} - -function loadEnvFile(path: string): Record<string, string> { - try { - const txt = readFileSync(path, 'utf8'); - const out: Record<string, string> = {}; - for (const line of txt.split('\n')) { - if (!line || line.startsWith('#')) continue; - const eq
= line.indexOf('='); - if (eq <= 0) continue; - out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim().replace(/^['"]|['"]$/g, ''); - } - return out; - } catch { - return {}; - } -} - -function loadToken(): string { - if (process.env.LABS_MCP_TOKEN) return process.env.LABS_MCP_TOKEN; - const dataDir = process.env.CLAUDE_PLUGIN_DATA; - if (dataDir) { - const fromData = loadEnvFile(join(dataDir, '.env')).LABS_MCP_TOKEN; - if (fromData) return fromData; - } - const rootEcho = process.env.CLAUDE_PLUGIN_ROOT_ECHO; - if (rootEcho) { - const fromRoot = loadEnvFile(join(rootEcho, '.env')).LABS_MCP_TOKEN; - if (fromRoot) return fromRoot; - } - return ''; -} - -async function main() { - const token = loadToken(); - const url = process.env.LABS_MCP_URL || 'https://labs.connect.dimagi.com/mcp/'; - - const rl = createInterface({ input: process.stdin, crlfDelay: Infinity }); - for await (const line of rl) { - const trimmed = line.trim(); - if (!trimmed) continue; - let frame: JsonRpcFrame; - try { - frame = JSON.parse(trimmed) as JsonRpcFrame; - } catch (e) { - process.stdout.write(JSON.stringify({ - jsonrpc: '2.0', - id: null, - error: { code: -32700, message: `Parse error: ${(e as Error).message}` }, - }) + '\n'); - continue; - } - try { - const reply = await forward(frame, { token, url }); - process.stdout.write(JSON.stringify(reply) + '\n'); - } catch (e) { - process.stdout.write(JSON.stringify({ - jsonrpc: '2.0', - id: frame.id, - error: { code: -32000, message: (e as Error).message }, - }) + '\n'); - } - } -} - -if (import.meta.url === `file://${process.argv[1]}`) { - main().catch((e) => { - process.stderr.write(`connect-labs-server fatal: ${(e as Error).stack || e}\n`); - process.exit(1); - }); -} -``` - -- [ ] **Step 4: Run the test to verify it passes** - -Run: `npm test -- --run test/mcp/connect-labs/proxy.test.ts` - -Expected: 3/3 pass. 
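The `.env` parsing inside `loadEnvFile` above is easy to misread in its compact form, so here is a minimal sketch of the same rules applied to an in-memory string. `parseEnvText` and the sample values are hypothetical names introduced only for illustration; they are not part of the plan's proxy.

```typescript
// Hypothetical helper (not in the plan): mirrors the line-parsing rules of
// loadEnvFile, but takes the file contents as a string so no disk I/O is needed.
function parseEnvText(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split('\n')) {
    // Skip blank lines and lines that start with a comment marker.
    if (!line || line.startsWith('#')) continue;
    // Split on the FIRST '=' only, so values may themselves contain '='.
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no '=' at all, or nothing before it
    const key = line.slice(0, eq).trim();
    // Strip one leading and one trailing quote character, if present.
    const value = line.slice(eq + 1).trim().replace(/^['"]|['"]$/g, '');
    out[key] = value;
  }
  return out;
}

// Illustrative input only; the token and URL are made-up sample values.
const sampleEnv = [
  '# rendered by op inject',
  'LABS_MCP_TOKEN="abc123"',
  'LABS_MCP_URL=https://labs.example/mcp/?a=b',
  '=orphan',
].join('\n');

const parsedEnv = parseEnvText(sampleEnv);
console.log(parsedEnv);
```

Two behaviors worth noting under these rules: a value keeps everything after the first `=`, and a line with an empty key is dropped rather than raising an error.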
- -- [ ] **Step 5: Commit** - -```bash -git add mcp/connect-labs-server.ts test/mcp/connect-labs/proxy.test.ts -git commit -m "feat(mcp): add connect-labs stdio proxy to labs MCP - -Forwards JSON-RPC frames from Claude Code (stdio) to -labs.connect.dimagi.com/mcp/ (HTTP) with Bearer PAT injected from -LABS_MCP_TOKEN. Same shape as the other ACE MCP servers. - -Plugin.json wiring + .env.tpl + doctor checks land in subsequent commits. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 4: Wire LABS_MCP_TOKEN into `.env.tpl` and `plugin.json` - -**Files:** -- Modify: `.env.tpl` -- Modify: `.claude-plugin/plugin.json` - -- [ ] **Step 1: Add the env var to `.env.tpl`** - -Append to `.env.tpl` (after the existing Connect block): - -``` -# Connect Labs (solicitations / reviews / awards) — labs.connect.dimagi.com -# Bearer PAT for the labs MCP, scoped to the ace@dimagi-ai.com labs user. -# To rotate: a labs admin runs: -# python manage.py mcp_create_token --user ace@dimagi-ai.com --name ACE-plugin --ttl-days 0 -# then drops the printed token into the 1Password item below. -LABS_MCP_TOKEN=op://Dimagi/labs-mcp-pat-ace/credential -``` - -- [ ] **Step 2: Add the MCP entry to `plugin.json`** - -In `.claude-plugin/plugin.json`, append a new entry under `mcpServers` (after the existing `ace-mobile` entry): - -```jsonc -"connect-labs": { - "command": "npx", - "args": ["tsx", "${CLAUDE_PLUGIN_ROOT}/mcp/connect-labs-server.ts"], - "env": { - "CLAUDE_PLUGIN_DATA": "${CLAUDE_PLUGIN_DATA}", - "CLAUDE_PLUGIN_ROOT_ECHO": "${CLAUDE_PLUGIN_ROOT}" - } -} -``` - -Note: the proxy reads `LABS_MCP_TOKEN` from `${CLAUDE_PLUGIN_DATA}/.env` itself (see Task 3 step 3) — passing `CLAUDE_PLUGIN_DATA` is enough; we do not put the token directly in plugin.json's `env` block. - -- [ ] **Step 3: Verify the manifest still parses** - -Run: `node -e "JSON.parse(require('fs').readFileSync('.claude-plugin/plugin.json', 'utf8'))"` - -Expected: no output, exit 0. 
- -- [ ] **Step 4: Verify the marketplace mirror is in sync** - -Run: `node -e "const m = JSON.parse(require('fs').readFileSync('.claude-plugin/marketplace.json', 'utf8'));"` - -Expected: no output, exit 0. (The version-sync hook keeps marketplace.json in sync with plugin.json on commit.) - -- [ ] **Step 5: Commit** - -```bash -git add .env.tpl .claude-plugin/plugin.json -git commit -m "feat(mcp): wire connect-labs MCP into plugin manifest - -Adds the connect-labs stdio proxy to mcpServers and LABS_MCP_TOKEN to -.env.tpl. After op inject, ACE skills can call mcp__connect-labs__* -atoms (create_solicitation, list_responses, award_response, etc.). - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 5: Add `[Connect Labs]` doctor section - -**Files:** -- Modify: `bin/ace-doctor` -- Create: `test/doctor/connect-labs.test.ts` - -- [ ] **Step 1: Find the existing pattern** - -Read `bin/ace-doctor` and locate the `[Connect]` section (anchor: a line beginning with `## ` or printing a `[Connect]` header). Note the style: each check prints a tag (e.g. `connect_env`, `connect_session`) followed by `OK | WARN | FAIL` and a one-liner explanation. - -- [ ] **Step 2: Write the failing test** - -Create `test/doctor/connect-labs.test.ts`: - -```typescript -import { describe, it, expect, vi, beforeEach } from 'vitest'; - -// The doctor module exports check functions (assumes bin/ace-doctor has been -// refactored to expose ts-importable helpers; if not, this test invokes them -// via subprocess). 
- -describe('doctor [Connect Labs] checks', () => { - beforeEach(() => vi.restoreAllMocks()); - - it('connect_labs_env: FAIL when LABS_MCP_TOKEN missing', async () => { - const { checkConnectLabsEnv } = await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsEnv({ envFile: 'test/fixtures/empty.env' }); - expect(result.tag).toBe('connect_labs_env'); - expect(result.status).toBe('FAIL'); - expect(result.message).toMatch(/LABS_MCP_TOKEN/); - }); - - it('connect_labs_env: OK when token present', async () => { - const { checkConnectLabsEnv } = await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsEnv({ envFile: 'test/fixtures/with-labs-token.env' }); - expect(result.status).toBe('OK'); - }); - - it('connect_labs_mcp_reachable: FAIL on 401 (PAT bad)', async () => { - vi.spyOn(global, 'fetch').mockResolvedValue(new Response('', { status: 401 })); - const { checkConnectLabsReachable } = await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsReachable({ token: 'bad', url: 'https://labs.example/mcp/' }); - expect(result.status).toBe('FAIL'); - expect(result.message).toMatch(/PAT|401/); - }); - - it('connect_labs_mcp_reachable: OK on 200', async () => { - vi.spyOn(global, 'fetch').mockResolvedValue( - new Response(JSON.stringify({ jsonrpc: '2.0', id: 1, result: {} }), { status: 200 }), - ); - const { checkConnectLabsReachable } = await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsReachable({ token: 'good', url: 'https://labs.example/mcp/' }); - expect(result.status).toBe('OK'); - }); - - it('connect_labs_connect_oauth: WARN with actionable hint when tool returns PERMISSION_DENIED', async () => { - vi.spyOn(global, 'fetch').mockResolvedValue( - new Response(JSON.stringify({ - jsonrpc: '2.0', - id: 1, - error: { code: -32000, message: 'PERMISSION_DENIED: connect oauth required' }, - }), { status: 200 }), - ); - const { checkConnectLabsConnectOAuth } = 
await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsConnectOAuth({ token: 'good', url: 'https://labs.example/mcp/' }); - expect(result.status).toBe('WARN'); - expect(result.message).toMatch(/Connect OAuth/); - }); - - it('connect_labs_connect_oauth: OK on a successful tools/call list_solicitations', async () => { - vi.spyOn(global, 'fetch').mockResolvedValue( - new Response(JSON.stringify({ - jsonrpc: '2.0', - id: 1, - result: { content: [{ type: 'text', text: '[]' }] }, - }), { status: 200 }), - ); - const { checkConnectLabsConnectOAuth } = await import('../../bin/checks/connect-labs'); - const result = await checkConnectLabsConnectOAuth({ token: 'good', url: 'https://labs.example/mcp/' }); - expect(result.status).toBe('OK'); - }); -}); -``` - -Also create test fixture files: -- `test/fixtures/empty.env` — empty file -- `test/fixtures/with-labs-token.env` — single line `LABS_MCP_TOKEN=test-token` - -- [ ] **Step 3: Run the test to verify it fails** - -Run: `npm test -- --run test/doctor/connect-labs.test.ts` - -Expected: FAIL with "Cannot find module '../../bin/checks/connect-labs'". 
- -- [ ] **Step 4: Implement the check helpers** - -Create `bin/checks/connect-labs.ts`: - -```typescript -import { readFileSync } from 'node:fs'; - -export interface CheckResult { - tag: string; - status: 'OK' | 'WARN' | 'FAIL'; - message: string; -} - -function parseEnvFile(path: string): Record<string, string> { - try { - const out: Record<string, string> = {}; - for (const line of readFileSync(path, 'utf8').split('\n')) { - if (!line || line.startsWith('#')) continue; - const eq = line.indexOf('='); - if (eq <= 0) continue; - out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim().replace(/^['"]|['"]$/g, ''); - } - return out; - } catch { - return {}; - } -} - -export async function checkConnectLabsEnv(opts: { envFile: string }): Promise<CheckResult> { - const env = parseEnvFile(opts.envFile); - const token = env.LABS_MCP_TOKEN; - if (!token || token.startsWith('op://')) { - return { - tag: 'connect_labs_env', - status: 'FAIL', - message: 'LABS_MCP_TOKEN missing or unrendered. Run: op inject -i .env.tpl -o "$CLAUDE_PLUGIN_DATA/.env" --account dimagi.1password.com', - }; - } - return { tag: 'connect_labs_env', status: 'OK', message: 'LABS_MCP_TOKEN present' }; -} - -export async function checkConnectLabsReachable(opts: { token: string; url: string }): Promise<CheckResult> { - try { - const res = await fetch(opts.url, { - method: 'POST', - headers: { 'Authorization': `Bearer ${opts.token}`, 'Content-Type': 'application/json' }, - body: JSON.stringify({ jsonrpc: '2.0', id: 1, method: 'initialize', params: {} }), - }); - if (res.status === 401) { - return { tag: 'connect_labs_mcp_reachable', status: 'FAIL', message: 'Labs MCP returned 401 — PAT invalid or revoked. Rotate via mcp_create_token.'
}; - } - if (!res.ok) { - return { tag: 'connect_labs_mcp_reachable', status: 'FAIL', message: `Labs MCP returned ${res.status}` }; - } - return { tag: 'connect_labs_mcp_reachable', status: 'OK', message: 'Labs MCP reachable + PAT accepted' }; - } catch (e) { - return { tag: 'connect_labs_mcp_reachable', status: 'FAIL', message: `Cannot reach labs MCP: ${(e as Error).message}` }; - } -} - -export async function checkConnectLabsConnectOAuth(opts: { token: string; url: string }): Promise<CheckResult> { - try { - const res = await fetch(opts.url, { - method: 'POST', - headers: { 'Authorization': `Bearer ${opts.token}`, 'Content-Type': 'application/json' }, - body: JSON.stringify({ - jsonrpc: '2.0', - id: 1, - method: 'tools/call', - params: { name: 'list_solicitations', arguments: {} }, - }), - }); - const body = await res.json() as { error?: { message: string }; result?: unknown }; - if (body.error?.message?.includes('PERMISSION_DENIED') || body.error?.message?.includes('connect')) { - return { - tag: 'connect_labs_connect_oauth', - status: 'WARN', - message: 'Labs accepts the PAT but the ace user has not completed Connect OAuth linkage. Have ace@dimagi-ai.com sign into labs once and authorize Connect.', - }; - } - if (body.error) { - return { tag: 'connect_labs_connect_oauth', status: 'FAIL', message: `list_solicitations error: ${body.error.message}` }; - } - return { tag: 'connect_labs_connect_oauth', status: 'OK', message: 'list_solicitations responded — Connect OAuth bridge is live' }; - } catch (e) { - return { tag: 'connect_labs_connect_oauth', status: 'FAIL', message: `Probe failed: ${(e as Error).message}` }; - } -} -``` - -- [ ] **Step 5: Wire the helpers into `bin/ace-doctor`** - -Locate the section in `bin/ace-doctor` that prints `[Connect]` (right before or after the OCS section). After it, add a new `[Connect Labs]` section that calls the three helpers via a small TypeScript invocation.
Mirror the existing pattern — if `bin/ace-doctor` is already a bash script that shells into a TS helper for the Connect section, do the same for Connect Labs. If it's pure bash that does HTTP via curl, port the three checks to bash equivalents that call the same `tag/status/message` shape. - -If `bin/ace-doctor` is a thin bash wrapper around `tsx`, add a single block: - -```bash -echo "" -echo "[Connect Labs]" -npx tsx -e " -import { checkConnectLabsEnv, checkConnectLabsReachable, checkConnectLabsConnectOAuth } from './bin/checks/connect-labs'; -import { join } from 'node:path'; -const dataDir = process.env.CLAUDE_PLUGIN_DATA || process.env.HOME + '/.ace'; -const envFile = join(dataDir, '.env'); -const tokenEntry = (await checkConnectLabsEnv({ envFile })); -console.log(\` \${tokenEntry.tag.padEnd(36)} \${tokenEntry.status.padEnd(4)} \${tokenEntry.message}\`); -if (tokenEntry.status === 'OK') { - const env = require('fs').readFileSync(envFile, 'utf8'); - const token = env.match(/LABS_MCP_TOKEN=(.+)/)?.[1]?.trim() || ''; - const url = process.env.LABS_MCP_URL || 'https://labs.connect.dimagi.com/mcp/'; - for (const check of [checkConnectLabsReachable, checkConnectLabsConnectOAuth]) { - const r = await check({ token, url }); - console.log(\` \${r.tag.padEnd(36)} \${r.status.padEnd(4)} \${r.message}\`); - } -} -" -``` - -(Adjust to match the actual style and indentation of the existing `[Connect]` section. The principle is: read the env file, run the three checks in order, print one line per check.) - -- [ ] **Step 6: Run the test to verify it passes** - -Run: `npm test -- --run test/doctor/connect-labs.test.ts` - -Expected: 6/6 pass. - -- [ ] **Step 7: Smoke `/ace:doctor` locally** - -Run: `bin/ace-doctor` - -Expected output includes a `[Connect Labs]` section with three `OK/WARN/FAIL` lines. With no PAT yet provisioned, `connect_labs_env` should be FAIL with the actionable `op inject` hint. 
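The wiring block in Step 5 formats each check line inline, twice. A minimal sketch of pulling that into one helper follows. The name `formatCheckLine` and the two-space indent are assumptions for illustration, not part of the plan; the 36 and 4 column widths are taken from the `padEnd` calls in Step 5.

```typescript
// Shape copied from bin/checks/connect-labs.ts in Step 4.
interface CheckResult {
  tag: string;
  status: 'OK' | 'WARN' | 'FAIL';
  message: string;
}

// Hypothetical helper (not in the plan): render one doctor line with the
// tag padded to 36 columns and the status padded to 4, then the message.
function formatCheckLine(r: CheckResult): string {
  return `  ${r.tag.padEnd(36)} ${r.status.padEnd(4)} ${r.message}`;
}

const sampleResult: CheckResult = {
  tag: 'connect_labs_env',
  status: 'FAIL',
  message: 'LABS_MCP_TOKEN missing or unrendered.',
};
console.log(formatCheckLine(sampleResult));
```

With a helper like this, the `[Connect Labs]` section would only need to call it once per check result, which keeps the column layout in one place if the existing `[Connect]` section uses different widths.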
- -- [ ] **Step 8: Commit** - -```bash -git add bin/ace-doctor bin/checks/connect-labs.ts test/doctor/connect-labs.test.ts test/fixtures/empty.env test/fixtures/with-labs-token.env -git commit -m "feat(doctor): add [Connect Labs] section with token + OAuth probes - -Three checks, mirroring the [Connect] section pattern: -- connect_labs_env: LABS_MCP_TOKEN present and rendered (not op://) -- connect_labs_mcp_reachable: PAT accepted by labs MCP (distinguishes - 401 / network / OK) -- connect_labs_connect_oauth: list_solicitations probe distinguishes - PAT-level 401 from tool-level PERMISSION_DENIED (Connect OAuth missing - on the ace user's labs account) - -Class-level preventer for silent labs misconfig. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 6: Add three optional fields to the PDD template - -**Files:** -- Modify: `templates/pdd-template.md` -- Modify: `skills/idea-to-pdd/SKILL.md` - -- [ ] **Step 1: Append the new fields to the PDD template** - -Read `templates/pdd-template.md`. After the existing `total_budget:` field (or at the end of the PDD frontmatter / metadata block, wherever budget lives), add: - -```yaml -# ── Solicitation (optional, drives Phase 7) ──────────────────────────── -# These fields are read by `solicitation-create` to build the solicitation -# published to labs.connect.dimagi.com. Safe to omit — defaults below. -solicitation_type: EOI # 'EOI' (Expression of Interest) | 'RFP' (Request for Proposals) -solicitation_deadline_days: 14 # response window from publish date -llo_questions: # optional response template - - "Describe your prior experience deploying CHW programs in this archetype." - - "How will you recruit and train FLWs for this scope?" - - "What is your timeline for fielding once awarded?" - - "What is your supervision model?" - - "Do you have local-language capacity matching the target geography?" - - "Provide a budget breakdown for the proposed scope." 
- -# ── Preferred LLOs (optional, used by Phase 7 llo-invite) ────────────── -preferred_llos: [] # list of { name, contact_email, organization_slug } -``` - -(If `preferred_llos` already exists in the PDD template under another section, do not duplicate — only add the three new solicitation fields. Run `grep -n "preferred_llos" templates/pdd-template.md` first to verify.) - -- [ ] **Step 2: Update `idea-to-pdd` SKILL.md** - -In `skills/idea-to-pdd/SKILL.md`, locate the section that walks the agent through PDD field collection. Add a paragraph noting the three new optional fields and that they default sensibly: - -> **Solicitation fields (optional, Phase 7).** If the user names preferred LLOs or a non-default solicitation type/deadline, capture them. Defaults: `solicitation_type: EOI`, `solicitation_deadline_days: 14`, a generic 6-question response template. Skipping these is fine — Phase 7 will use the defaults. Always ask once whether a custom deadline or response template is needed; if not, leave the defaults. - -- [ ] **Step 3: Verify existing PDD fixtures still validate** - -Run: `npm test -- --run test/fixtures/` - -Expected: pass. The new fields are optional, so existing fixtures (`CRISPR-Test-001`, `-002`, `-003`) without them remain valid. - -- [ ] **Step 4: Commit** - -```bash -git add templates/pdd-template.md skills/idea-to-pdd/SKILL.md -git commit -m "feat(pdd): add three optional solicitation fields to PDD template - -solicitation_type (EOI|RFP, default EOI), solicitation_deadline_days -(default 14), llo_questions (default 6-question template). All optional; -existing PDDs without them continue to validate. Drives the new Phase 7 -solicitation-create skill. 
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 7: Implement `solicitation-create` skill - -**Files:** -- Create: `skills/solicitation-create/SKILL.md` -- Modify: `lib/artifact-manifest.ts` (add solicitation/draft.md, solicitation/published.md, opp.yaml.solicitation block) - -- [ ] **Step 1: Add the artifact manifest entries** - -In `lib/artifact-manifest.ts`, locate the `ARTIFACT_MANIFEST` array. After the last existing entry, add: - -```typescript -// ── Solicitation Management (Phase 7) ────────────────────────── - -{ - path: 'solicitation/draft.md', - producedBy: 'solicitation-create', - consumedBy: ['solicitation-create-eval'], - phase: 'design', // produced once, audit-only - required: false, - description: 'Solicitation payload pre-publish: title, type, scope, criteria, response template, deadline. Audit trail for what solicitation-create proposed before posting to labs.', -}, -{ - path: 'solicitation/published.md', - producedBy: 'solicitation-create', - consumedBy: ['solicitation-monitor', 'solicitation-review', 'solicitation-create-eval', 'llo-invite'], - phase: 'design', - required: false, - description: 'Snapshot of the published solicitation: solicitation_id, public_url, manage_url, deadline, criteria. Read by every downstream Phase 7 skill and by Phase 7 llo-invite for the URL to email.', -}, -{ - path: 'solicitation/invitations.md', - producedBy: 'llo-invite', - consumedBy: ['solicitation-monitor', 'solicitation-review-eval'], - phase: 'design', - required: false, - description: 'Per-recipient log: who got emailed the solicitation URL, when, and send status. Empty when PDD has no preferred_llos.', -}, -{ - path: 'solicitation/responses/', - producedBy: 'solicitation-monitor', - consumedBy: ['solicitation-review'], - phase: 'design', - required: false, - description: 'One file per solicitation response, written incrementally as responses arrive. 
Each file contains the response content plus metadata returned by labs.', -}, -{ - path: 'solicitation/review/scoring-rubric.md', - producedBy: 'solicitation-review', - consumedBy: ['solicitation-review-eval'], - phase: 'design', - required: false, - description: 'Per-response, per-criterion scores produced by solicitation-review.', -}, -{ - path: 'solicitation/review/recommendation.md', - producedBy: 'solicitation-review', - consumedBy: ['solicitation-review-eval'], - phase: 'design', - required: false, - description: 'Ranked candidates + reasoning. Input to the HITL gate before award_response is called.', -}, -{ - path: 'solicitation/award-record.md', - producedBy: 'solicitation-review', - consumedBy: ['solicitation-review-eval', 'opp-closeout'], - phase: 'design', - required: false, - description: 'Written when award_response is called (success or failure). Includes response_id, awarded_at, awarded_org_slug, and any error envelope on failure.', -}, -``` - -Also drop the existing entry for `connect-setup/invites.md` (the old `llo-invite` artifact). Locate it in the manifest (`grep -n "connect-setup/invites" lib/artifact-manifest.ts`) and delete that block. - -- [ ] **Step 2: Run the manifest validation test** - -Run: `npm test -- --run test/fixtures/artifact-manifest.test.ts` - -Expected: PASS (or fail with a clear "skill `solicitation-create` referenced by artifact but no SKILL.md found" — that's the next step). If it passes, the validation isn't strict enough to catch missing skills; that's fine, we'll add the skill next. - -- [ ] **Step 3: Write the SKILL.md** - -Create `skills/solicitation-create/SKILL.md`: - -```markdown ---- -name: solicitation-create -description: > - Phase 7 step 1 (auto, default run). Translate the approved PDD into a - solicitation payload, derive evaluation criteria via labs's - generate_criteria endpoint, and publish the solicitation via the - connect-labs MCP. Captures solicitation_id and public_url for downstream - skills. 
---- - -# Solicitation Create - -Phase 7 default-run skill. Builds and publishes the solicitation in one -shot — ACE always publishes, never drafts. The solicitation can be edited -post-publish via the labs UI without affecting responses. - -## Inputs - -- `ACE//inputs/pdd.md` — approved PDD (scope, success criteria, total_budget, optional solicitation fields) -- `ACE//opp.yaml` — program_id, archetype, opp display name - -## Process - -1. **Read the PDD.** Extract the fields per the table below. For optional - PDD fields (`solicitation_type`, `solicitation_deadline_days`, - `llo_questions`), use defaults when missing. - -2. **Build the solicitation payload:** - - | Field | Source | - |---|---| - | `title` | `: ` | - | `solicitation_type` | PDD `solicitation_type` (default `EOI`) | - | `description` | PDD `intervention_summary` + `target_flw_profile` (concatenate with a newline) | - | `scope_of_work` | PDD `visit_structure` + `success_criteria` | - | `budget` | PDD `total_budget` | - | `deadline` | `now() + (solicitation_deadline_days || 14)` days, ISO-8601 | - | `evaluation_criteria` | derived by `generate_criteria` (see step 3) | - | `response_template` | PDD `llo_questions` or the default 6-question set | - | `status` | `published` | - | `program_id` | `opp.yaml.program_id` | - -3. **Derive evaluation criteria.** Call: - - ``` - mcp__connect-labs__generate_criteria( - scope_text: , - archetype: - ) - ``` - - Capture the structured rubric (criteria + weights) into the payload's - `evaluation_criteria` field. - -4. **Write the draft for traceability.** Save the full payload + the AI-derived rubric to: - - ``` - ACE//solicitation/draft.md - ``` - -5. **Publish.** Call: - - ``` - mcp__connect-labs__create_solicitation() - ``` - - Capture the returned `solicitation_id`, `public_url`, and `manage_url`. - -6. 
**Write `published.md`.** Save: - - ``` - ACE//solicitation/published.md - ``` - - Body includes the full payload, the returned IDs/URLs, and the deadline - in absolute ISO-8601 form. - -7. **Update `opp.yaml`.** Add a `solicitation:` block: - - ```yaml - solicitation: - solicitation_id: - public_url: - manage_url: - type: - published_at: - deadline: - status: open - awarded: - response_id: null - awarded_at: null - awarded_org_slug: null - awarded_org_name: null - awarded_contact_email: null - award_amount: null - ``` - - Also stub a `selected_llo:` block: - - ```yaml - selected_llo: - org_slug: null - contact_email: null - source: null - response_id: null - ``` - - These will be populated by `solicitation-review` on award. - -## Error handling - -- **Labs MCP unreachable** (proxy returns transport error): halt with a - doctor-style message pointing at `/ace:doctor`'s `[Connect Labs]` - section. -- **`create_solicitation` returns 4xx**: preserve `draft.md`, halt, surface - the error verbatim. Do not retry — most 4xx is a payload schema mismatch - or the program_id is wrong. -- **`generate_criteria` returns degenerate output** (empty list, single - criterion): write what was returned, mark `evaluation_criteria` as - `needs-review` in `published.md`, still publish. Criteria are editable - post-publish via labs UI without losing responses. - -## Output - -- `ACE//solicitation/draft.md` (audit) -- `ACE//solicitation/published.md` (live state) -- `opp.yaml.solicitation.{solicitation_id, public_url, deadline, status: open}` populated -- `opp.yaml.selected_llo.*` stubbed -``` - -- [ ] **Step 4: Run the manifest validation test** - -Run: `npm test -- --run test/fixtures/artifact-manifest.test.ts` - -Expected: PASS. 
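The `deadline` row in the `solicitation-create` payload table (`now() + (solicitation_deadline_days || 14)` days, ISO-8601) can be sketched as below. The function name `computeDeadline` and the choice to pass the publish time in as a parameter are assumptions made so the example is deterministic; the 14-day default comes from the PDD template.

```typescript
const DEFAULT_DEADLINE_DAYS = 14; // default from the PDD template
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper (not in the plan): add the response window to the
// publish time and return an absolute ISO-8601 UTC timestamp.
// `||` mirrors the table: a missing or zero value falls back to 14 days.
function computeDeadline(publishedAt: Date, deadlineDays?: number): string {
  const days = deadlineDays || DEFAULT_DEADLINE_DAYS;
  return new Date(publishedAt.getTime() + days * MS_PER_DAY).toISOString();
}

const publishedAt = new Date('2026-05-04T12:00:00Z');
console.log(computeDeadline(publishedAt));     // 2026-05-18T12:00:00.000Z
console.log(computeDeadline(publishedAt, 21)); // 2026-05-25T12:00:00.000Z
```

Working in UTC milliseconds avoids local-time and daylight-saving surprises, and the same string can be written to both `published.md` and `opp.yaml.solicitation.deadline`.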
- -- [ ] **Step 5: Commit** - -```bash -git add skills/solicitation-create/ lib/artifact-manifest.ts -git commit -m "feat(skill): add solicitation-create (Phase 7, default run) - -Translates the approved PDD into a solicitation payload, derives -evaluation criteria via labs's generate_criteria, and publishes via -mcp__connect-labs__create_solicitation. Writes draft.md (audit) and -published.md (live state), populates opp.yaml.solicitation. - -Manifest entries: drops connect-setup/invites.md (moves to Phase 7 in -a later commit), adds solicitation/* artifacts. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 8: Add `solicitation-create-eval` rubric - -**Files:** -- Create: `skills/solicitation-create-eval/SKILL.md` - -- [ ] **Step 1: Read the existing eval rubric pattern** - -Read `skills/connect-program-setup-eval/SKILL.md` to align style and structure with the existing `-eval` family. - -- [ ] **Step 2: Write the SKILL.md** - -Create `skills/solicitation-create-eval/SKILL.md`: - -```markdown ---- -name: solicitation-create-eval -description: > - Provisional LLM-as-Judge rubric for solicitation-create. Grades whether - the published solicitation faithfully reflects the PDD's intervention - scope, has complete fields, and ships a sensible deadline. Calibrated - per skills/eval-calibration once 3+ real solicitations have shipped. ---- - -# Solicitation Create — Eval - -Cross-artifact LLM-as-Judge eval. Reads the source PDD plus -`solicitation/draft.md` and `solicitation/published.md`, scores the -result, and writes a verdict YAML in the shared QA/eval shape so -`opp-eval` can aggregate it. - -**Status:** Provisional. Calibration TBD until 3+ real solicitations have -shipped — see `skills/eval-calibration/SKILL.md`. - -## Inputs - -- `ACE//inputs/pdd.md` -- `ACE//solicitation/draft.md` -- `ACE//solicitation/published.md` - -## Rubric - -Score each dimension 0-10. Hard-deduct rules listed inline. - -1. 
**PDD-fidelity (weight 0.4).** Does the solicitation's `description` - and `scope_of_work` actually carry the PDD's intervention summary, - target FLW profile, and visit structure forward? Hard-deduct -3 if - either field paraphrases away a PDD constraint (e.g. PDD says "weekly - visits" and solicitation says "regular visits"). Hard-deduct -5 if a - key PDD element is missing entirely. - -2. **Field completeness (weight 0.2).** All required fields present? - `evaluation_criteria` non-empty (or marked `needs-review`)? - `response_template` non-empty? - -3. **Deadline sanity (weight 0.1).** Deadline is `now + 7..30 days`. Hard- - deduct -5 if deadline is in the past or > 90 days out. - -4. **Criteria alignment (weight 0.3).** Do the evaluation criteria reflect - what the PDD actually cares about (e.g. archetype-specific capabilities, - geographic fit, language capacity)? Penalize generic criteria like - "demonstrate experience" when the PDD has specific archetype demands. - -## Verdict shape - -Write `verdicts/solicitation-create-.yaml` per the `lib/verdict-schema.ts` -shape (see `skills/README.md` § QA vs Eval). - -```yaml -schema_version: 1 -skill: solicitation-create -mode: deep -overall_score: <0-10 weighted> -overall_verdict: pass | fail | partial -dimensions: - - { name: pdd-fidelity, score: <0-10>, weight: 0.4, notes: "..." } - - { name: field-completeness, score: <0-10>, weight: 0.2, notes: "..." } - - { name: deadline-sanity, score: <0-10>, weight: 0.1, notes: "..." } - - { name: criteria-alignment, score: <0-10>, weight: 0.3, notes: "..." } -hard_deduct_triggered: [ ... ] -recommendations: [ ... ] -``` -``` - -- [ ] **Step 3: Run vitest to verify the manifest doesn't break** - -Run: `npm test -- --run test/fixtures/artifact-manifest.test.ts` - -Expected: PASS. 
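The `overall_score: <0-10 weighted>` field in the verdict shape is a weighted sum of the four dimension scores. A minimal sketch, assuming the weights must sum to 1 and that any hard-deducts have already been applied to the per-dimension scores (the rubric does not spell out that ordering); `overallScore` and `Dimension` are illustrative names, not part of `lib/verdict-schema.ts`.

```typescript
interface Dimension {
  name: string;
  score: number;  // 0-10, after any hard-deducts (assumption)
  weight: number; // rubric weights: 0.4, 0.2, 0.1, 0.3
}

// Hypothetical helper (not in the plan): weighted 0-10 score.
// Throws if the weights do not sum to 1 within a small tolerance.
function overallScore(dimensions: Dimension[]): number {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (Math.abs(totalWeight - 1) > 1e-9) {
    throw new Error(`weights sum to ${totalWeight}, expected 1`);
  }
  return dimensions.reduce((sum, d) => sum + d.score * d.weight, 0);
}

const exampleDimensions: Dimension[] = [
  { name: 'pdd-fidelity', score: 8, weight: 0.4 },
  { name: 'field-completeness', score: 10, weight: 0.2 },
  { name: 'deadline-sanity', score: 10, weight: 0.1 },
  { name: 'criteria-alignment', score: 6, weight: 0.3 },
];
console.log(overallScore(exampleDimensions).toFixed(1)); // 8.0
```

Here 8 × 0.4 + 10 × 0.2 + 10 × 0.1 + 6 × 0.3 = 3.2 + 2 + 1 + 1.8 = 8.0. The mapping from this number to `pass | fail | partial` is not defined in the rubric, so it is left out.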
- -- [ ] **Step 4: Commit** - -```bash -git add skills/solicitation-create-eval/ -git commit -m "feat(skill): add solicitation-create-eval rubric (provisional) - -PDD-fidelity, field completeness, deadline sanity, criteria alignment. -Provisional rubric — calibration TBD per skills/eval-calibration once 3+ -real solicitations have shipped. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 9: Implement `solicitation-monitor` skill - -**Files:** -- Create: `skills/solicitation-monitor/SKILL.md` - -- [ ] **Step 1: Write the SKILL.md** - -Create `skills/solicitation-monitor/SKILL.md`: - -```markdown ---- -name: solicitation-monitor -description: > - Phase 7 recurring skill. Polls labs for new responses while the - solicitation is open, writes one file per response to - ACE//solicitation/responses/, and appends a tick line to the - observation log. Three modes: --quick (count only), --monitor (full - pull, default), --close (final pull when deadline passes). ---- - -# Solicitation Monitor - -Recurring skill that runs while `opp.yaml.solicitation.status == open`. -Mirrors the `ocs-chatbot-qa` recurring pattern (`--quick`/`--monitor`). - -## Modes - -- **`--quick`**: just count responses; do not pull bodies. Cheap. - Suitable for the orchestrator's recurring check. -- **`--monitor`** (default): for each new response, pull the body and - write `solicitation/responses/.md`. -- **`--close`**: same as `--monitor` but also flips `opp.yaml.solicitation.status` - from `open` to `closed`. Run once when the deadline passes. - -## Inputs - -- `opp.yaml.solicitation.solicitation_id` -- `opp.yaml.solicitation.deadline` - -## Process (--monitor) - -1. **List responses.** Call: - - ``` - mcp__connect-labs__list_responses(solicitation_id: ) - ``` - -2. **Diff against local state.** Read existing files in - `ACE//solicitation/responses/` (each is named - `.md`). 
For each new response: - - ``` - mcp__connect-labs__get_response(response_id: ) - ``` - - Write the body to `solicitation/responses/.md`. Body - includes: response_id, submitted_at, organization, contact, the answers - to each question in the response template, and any attachments. - -3. **Summarize inflow.** Compute: - - Total responses received - - Responses received since the last monitor tick - - Time-to-deadline (delta between `now()` and `solicitation.deadline`) - - If `solicitation/invitations.md` exists: list of invitees who have - not yet responded. - -4. **Append observation.** Append a single line to - `ACE//comms-log/observations.md`: - - ``` - solicitation-monitor total responses (<+N> new since last tick), h to deadline - ``` - -5. **Update `opp.yaml`.** If mode is `--close` AND `now() > deadline`, set - `opp.yaml.solicitation.status: closed`. - -## Process (--quick) - -Steps 1, 3 (counts only), 4. Skip body pulls and per-response file writes. - -## Error handling - -Read-only skill from labs's perspective; failures are non-fatal. -Log "monitor failed: " to `comms-log/observations.md` and exit -without halting the orchestrator. Next tick will retry. - -## Output - -- New files in `ACE//solicitation/responses/` -- Tick line in `ACE//comms-log/observations.md` -- (`--close` only) `opp.yaml.solicitation.status: closed` - -## No eval companion - -`solicitation-monitor` is read-only and recurring. Quality bar is captured -by `solicitation-review-eval` downstream. -``` - -- [ ] **Step 2: Run the manifest validation** - -Run: `npm test -- --run test/fixtures/artifact-manifest.test.ts` - -Expected: PASS. - -- [ ] **Step 3: Commit** - -```bash -git add skills/solicitation-monitor/ -git commit -m "feat(skill): add solicitation-monitor (Phase 7 recurring) - -Polls labs for responses, writes one file per response, appends a tick -line to comms-log/observations.md. Three modes: --quick, --monitor -(default), --close (flip status to closed when deadline passes). 
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 10: Implement `solicitation-review` skill (manual, with HITL gate) - -**Files:** -- Create: `skills/solicitation-review/SKILL.md` - -- [ ] **Step 1: Write the SKILL.md** - -Create `skills/solicitation-review/SKILL.md`: - -```markdown ---- -name: solicitation-review -description: > - Phase 7 manual skill. Reads all solicitation responses, scores each - against the published rubric, presents a recommendation to the human, - and (after explicit HITL approval) calls award_response and populates - opp.yaml.selected_llo. The only path that unblocks Phase 8. ---- - -# Solicitation Review - -Manual skill — never runs in default `/ace:run`. Only via: - -``` -/ace:step solicitation-review --opp -``` - -This is the only skill that calls `award_response` (irreversible) and the -only skill that populates `opp.yaml.selected_llo` (which gates Phase 8). - -## Inputs - -- `opp.yaml.solicitation.solicitation_id` -- `opp.yaml.solicitation.public_url` -- `ACE//solicitation/published.md` (rubric) -- `ACE//solicitation/responses/*.md` (all responses) - -## Process - -1. **Pull all responses fresh.** Call: - - ``` - mcp__connect-labs__list_responses(solicitation_id: ) - ``` - - For each response, call `get_response` even if the local cache exists - (responses may have been edited). - -2. **Score each response.** Read the rubric from `published.md` (the - `evaluation_criteria` block). For each response, score every criterion - on its declared scale (typically 1-10) and compute a weighted total. - -3. **Optionally write to labs.** For each response, call: - - ``` - mcp__connect-labs__create_review( - response_id: , - scores: { : , ... }, - notes: "" - ) - ``` - - This puts ACE's scores in the labs audit trail. Idempotent — call - `list_reviews` first and skip if a review by `ace@dimagi-ai.com` already - exists for this response. - -4. 
**Write `scoring-rubric.md`.** Save the per-response, per-criterion - scores to: - - ``` - ACE//solicitation/review/scoring-rubric.md - ``` - -5. **Write `recommendation.md`.** Save: - - ``` - ACE//solicitation/review/recommendation.md - ``` - - Body: ranked list of candidates with reasoning. Top candidate gets a - `Recommended awardee` callout. - -6. **HITL gate.** Present `recommendation.md` to the human and ask: - - > "Confirm awarding response_id= ($) to ? Reply - > 'award $' to confirm, or 'cancel' to halt." - - Wait for an explicit reply. **Do not call `award_response` without one.** - If the human picks a different response_id or amount, use those. - -7. **Call `award_response`.** On confirm: - - ``` - mcp__connect-labs__award_response( - response_id: , - amount: - ) - ``` - -8. **Write `award-record.md`.** - - ``` - ACE//solicitation/award-record.md - ``` - - Body: `response_id`, `awarded_at`, `awarded_org_slug`, `awarded_org_name`, - `awarded_contact_email`, `award_amount`, and (if labs returned an error) - `status: failed` + the error envelope. - -9. **Populate `opp.yaml.selected_llo`.** Only on a successful award: - - ```yaml - selected_llo: - org_slug: - contact_email: - source: solicitation - response_id: - ``` - - Also flip `opp.yaml.solicitation.status: awarded` and populate the - `solicitation.awarded.*` block. - -## Error handling - -- **HITL gate timeout / no reply**: do not call `award_response`. Do not - mutate `opp.yaml`. Exit cleanly so the human can re-run the skill. -- **`award_response` returns 4xx after approval**: write `award-record.md` - with `status: failed` and the error envelope. **Do not** populate - `selected_llo` (Phase 8 stays gated). Surface the error to the human - and suggest contacting a labs admin if the award call must succeed - out-of-band. -- **`list_reviews` shows ACE already reviewed all responses**: skip the - scoring step (we don't re-score), proceed to step 4 from the existing - reviews. 
- -## Output - -- `ACE//solicitation/review/scoring-rubric.md` -- `ACE//solicitation/review/recommendation.md` -- `ACE//solicitation/award-record.md` -- `opp.yaml.selected_llo.*` populated (only on success) -- `opp.yaml.solicitation.status: awarded` (only on success) -``` - -- [ ] **Step 2: Commit** - -```bash -git add skills/solicitation-review/ -git commit -m "feat(skill): add solicitation-review (Phase 7 manual, HITL-gated) - -Scores all responses against the published rubric, presents a -recommendation, and (after explicit human approval) calls award_response -and populates opp.yaml.selected_llo. The only path that unblocks Phase 8. - -The award call is gated on a literal 'award \$' -reply from the human — no auto-award. Never runs in default /ace:run. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 11: Add `solicitation-review-eval` rubric - -**Files:** -- Create: `skills/solicitation-review-eval/SKILL.md` - -- [ ] **Step 1: Write the SKILL.md** - -Create `skills/solicitation-review-eval/SKILL.md`: - -```markdown ---- -name: solicitation-review-eval -description: > - Provisional LLM-as-Judge rubric for solicitation-review. Compares ACE's - top-ranked recommendation against the human's actual award decision. - Detection-rate metric: did ACE's recommended awardee match the human's - pick? Calibrated per skills/eval-calibration once 3+ awards have shipped. ---- - -# Solicitation Review — Eval - -Cross-artifact LLM-as-Judge eval. Compares ACE's recommendation in -`solicitation/review/recommendation.md` against the actual outcome -in `solicitation/award-record.md`. - -**Status:** Provisional. Calibration TBD until 3+ real awards have shipped. - -## Inputs - -- `ACE//solicitation/review/scoring-rubric.md` -- `ACE//solicitation/review/recommendation.md` -- `ACE//solicitation/award-record.md` -- `ACE//solicitation/published.md` (rubric reference) - -## Rubric - -1. 
**Recommendation alignment (weight 0.4).** Did ACE's top-ranked - recommendation match the awarded response_id? Score 10 if yes, 5 if - awardee was in ACE's top 3, 0 otherwise. Hard-deduct -3 if - `award-record.md` has `status: failed` while `selected_llo` is populated - (data-integrity violation — that path should be impossible per the - skill's contract, and any verdict must flag it). - -2. **Scoring rationale quality (weight 0.3).** Are the scores in - `scoring-rubric.md` traceable to the criteria in `published.md`? Are - the per-criterion notes specific or generic? Penalize one-line "good - experience" justifications. - -3. **Recommendation specificity (weight 0.2).** Does `recommendation.md` - surface concrete differentiators between candidates, or is it a - ranked list with no narrative? Higher score for surfacing the close - calls. - -4. **Edge case coverage (weight 0.1).** Did the recommendation flag any - responses that were structurally unscoreable (incomplete answers, - wrong-archetype)? Penalize silent skipping. - -## Verdict shape - -Write `verdicts/solicitation-review-.yaml` per `lib/verdict-schema.ts`. -``` - -- [ ] **Step 2: Commit** - -```bash -git add skills/solicitation-review-eval/ -git commit -m "feat(skill): add solicitation-review-eval rubric (provisional) - -Detection-rate metric (recommendation alignment with actual award) plus -scoring rationale, specificity, and edge-case coverage. Calibration TBD -until 3+ real awards have shipped. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 12: Transform `llo-invite` skill (move to Phase 7, rewrite behavior) - -**Files:** -- Modify: `skills/llo-invite/SKILL.md` (substantial rewrite) - -- [ ] **Step 1: Replace the SKILL.md content** - -Replace the entire content of `skills/llo-invite/SKILL.md` with: - -```markdown ---- -name: llo-invite -description: > - Phase 7 step 2 (auto, default run). 
For each PDD-named candidate LLO, - send an invitation email with the public solicitation URL. No-op when - the PDD has no preferred_llos (long-term solicitation flow). Makes no - Connect API calls — those happen for the awardee only, in - llo-onboarding (Phase 8). ---- - -# LLO Invite - -Phase 7 default-run skill. Runs after `solicitation-create` has captured -`opp.yaml.solicitation.public_url`. Sends each PDD-named candidate LLO an -email containing the solicitation URL, deadline, and a scope summary. - -This skill replaces the previous Phase-7 (was Phase-6) `llo-invite` that -prepared a Connect-side invite roster. The Connect program-level invite -(`connect_send_llo_invite`) is now `llo-onboarding`'s responsibility and -fires only for the awardee. - -## Inputs - -- `ACE//inputs/pdd.md` (specifically `preferred_llos:`) -- `opp.yaml.solicitation.public_url` -- `opp.yaml.solicitation.deadline` - -## Process - -1. **Read `preferred_llos`** from the PDD. - -2. **If empty:** write `ACE//solicitation/invitations.md`: - - ```markdown - # Solicitation Invitations - - Status: empty (long-term solicitation flow — no PDD-named candidates). - The solicitation is publicly listed at ; orgs find it on the - labs portal. - ``` - - Exit successfully. - -3. **For each preferred LLO**, compose an email: - - ``` - Subject: Invitation to respond — - To: - - Hi , - - - - We are inviting your organization to respond to a solicitation for - . The full description, scope of work, and response template - are at: - - - - Responses are due by (UTC). - - To respond, sign into labs.connect.dimagi.com with your organization - account, open the solicitation linked above, and click "Submit Response." - - Questions? Reply to this email. - - - ``` - - Send via the `email-communicator` skill (uses ACE's Gmail account - `ace@dimagi-ai.com`). - -4. 
**Log every send** to `ACE//solicitation/invitations.md`: - - ```markdown - # Solicitation Invitations - - Solicitation: - Deadline: - - ## Recipients - - | Recipient | Org | Sent at | Status | - |---|---|---|---| - | | | | sent | - | | | | failed: | - ``` - -## Review-mode gate - -If invoked under `/ace:run --review` mode, present the prepared email list -to the human before sending and pause. Default mode sends without a gate -(the orchestrator's gate is the Phase 7→8 boundary, not here). - -## Error handling - -- Per-recipient email failure: log `status: failed: ` for that - row, continue with the rest. -- All recipients fail: halt with a surfaced error. -- PDD has no `preferred_llos`: no-op per Step 2 above. -- `opp.yaml.solicitation.public_url` empty: halt with "run - solicitation-create first" message. - -## Output - -- `ACE//solicitation/invitations.md` — recipient log -``` - -- [ ] **Step 2: Verify the manifest now matches** - -Run: `npm test -- --run test/fixtures/artifact-manifest.test.ts` - -Expected: PASS. The `solicitation/invitations.md` entry from Task 7 lists `producedBy: 'llo-invite'`, which now matches. - -- [ ] **Step 3: Commit** - -```bash -git add skills/llo-invite/ -git commit -m "refactor(skill): llo-invite — move to Phase 7, rewrite for solicitations - -Previously Phase 7 (was Phase 8 after renumbering): identified PDD-named -candidates and prepared a Connect-side invite roster. Now Phase 7: same -candidate identification, but emails each one a link to the public -solicitation URL. Makes no Connect API calls — the Connect program-level -invite fires only for the awardee inside llo-onboarding. - -Empty PDD preferred_llos → no-op (long-term flow: solicitation is public, -orgs find it via the labs portal). 
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 13: Update `llo-onboarding` to read `selected_llo` - -**Files:** -- Modify: `skills/llo-onboarding/SKILL.md` - -- [ ] **Step 1: Read the current SKILL.md** - -Run: `cat skills/llo-onboarding/SKILL.md` - -Note where it currently reads from `connect-setup/invites.md` (the old roster). The change replaces that with reading `opp.yaml.selected_llo`. - -- [ ] **Step 2: Edit the inputs and process** - -In `skills/llo-onboarding/SKILL.md`: - -(a) In the Inputs section, replace any reference to `connect-setup/invites.md` with: - -``` -- `opp.yaml.selected_llo` — populated by Phase 7 solicitation-review on award. - Halt with a clear error if `org_slug` is null (Phase 8 must not start - without an awardee). -``` - -(b) In Process step 1 (or wherever the roster gets read), replace the roster-loading logic with: - -``` -1. Read `opp.yaml.selected_llo`. If `org_slug` is null: - ``` - FATAL: Phase 8 cannot start — opp.yaml.selected_llo.org_slug is empty. - Run `/ace:step solicitation-review --opp ` to score responses - and award an awardee. The orchestrator's pre-Phase-7 gate should have - caught this; if you're seeing this from a manual /ace:step invocation, - the gate was bypassed. - ``` - Halt. -2. Use `selected_llo.org_slug` as the target for `connect_send_llo_invite` - and `selected_llo.contact_email` as the recipient for the ACE - onboarding email. -``` - -(c) Drop any prose that talks about iterating a multi-LLO roster — Phase 8 onboards exactly one awardee. - -- [ ] **Step 3: Smoke a fixture** - -Run: `npm test -- --run test/fixtures/` - -Expected: PASS. Fixtures don't currently set `selected_llo`, but the SKILL.md change is prose; tests don't execute the skill. 
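The fail-fast rule described in Step 2 above can be sketched in TypeScript. This is only an illustration under stated assumptions: the skill itself is prose, the `SelectedLlo` interface and the `requireAwardee` helper are hypothetical names, and only the `org_slug` and `contact_email` fields come from the plan.

```typescript
// Hypothetical shape of the `selected_llo` block in opp.yaml. Only
// `org_slug` and `contact_email` are named in the plan; the rest is assumed.
interface SelectedLlo {
  org_slug: string | null;
  contact_email: string | null;
}

// Hypothetical helper: returns the invite target and email recipient for the
// awardee, or throws when Phase 8 is reached without one.
function requireAwardee(selected: SelectedLlo | undefined): {
  orgSlug: string;
  contactEmail: string | null;
} {
  if (!selected || !selected.org_slug) {
    throw new Error(
      "FATAL: Phase 8 cannot start: opp.yaml.selected_llo.org_slug is empty. " +
        "Run solicitation-review to score responses and record an awardee.",
    );
  }
  return { orgSlug: selected.org_slug, contactEmail: selected.contact_email };
}
```

Throwing early keeps the Connect invite and the onboarding email from ever being attempted against a missing awardee.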
- -- [ ] **Step 4: Commit** - -```bash -git add skills/llo-onboarding/ -git commit -m "refactor(skill): llo-onboarding reads opp.yaml.selected_llo - -Replaces the connect-setup/invites.md roster read with a single -selected_llo lookup populated by Phase 7 solicitation-review. Fails fast -with an actionable message if Phase 8 is reached without an awardee. - -The Connect program-level invite (connect_send_llo_invite) and the ACE -onboarding email both target selected_llo.org_slug / -selected_llo.contact_email. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 14: Add the `solicitation-management` subagent - -**Files:** -- Create: `agents/solicitation-management.md` - -- [ ] **Step 1: Read an existing subagent for shape** - -Run: `cat agents/closeout.md` (it's the simplest subagent in the codebase). - -- [ ] **Step 2: Write the new agent** - -Create `agents/solicitation-management.md`: - -```markdown ---- -name: solicitation-management -description: > - Phase 7 of the CRISPR-Connect lifecycle: publish a solicitation derived - from the PDD, invite PDD-named candidate LLOs to it by email, and stop. - The review-and-award lifecycle continues via the manually-invoked - solicitation-review skill (gated on a human-in-the-loop checkpoint - before award_response is called). Phase 8 starts once an awardee is - recorded in opp.yaml.selected_llo. -model: inherit -phase: solicitation-management -phase_display: Solicitation Management -phase_ordinal: 6 -skills: - - { name: solicitation-create, has_judge: true, eval_skill: solicitation-create-eval } - - { name: llo-invite, has_judge: false } -recurring_skills: - - { name: solicitation-monitor, has_judge: false } -manual_skills: - - { name: solicitation-review, has_judge: true, eval_skill: solicitation-review-eval } ---- - -# Solicitation Management Agent (Phase 7) - -You run the solicitation phase of a CRISPR-Connect opportunity. 
By the -time this phase starts, Phases 1–5 have produced an approved PDD, -deployed CommCare apps, a configured Connect opportunity, a quality-gated -OCS chatbot, and per-opp training materials. The opportunity is fully -prepared on the ACE side — what's missing is an LLO to run it. - -This phase publishes a solicitation that potential LLOs can respond to. -In default `/ace:run` mode, you publish the solicitation and email the -PDD-named candidate LLOs (if any), then stop. The review-and-award -lifecycle requires explicit human approval and is run manually via -`/ace:step solicitation-review`. - -## Workflow (default run) - -### Step 1: Solicitation Create - -Run the `solicitation-create` skill. It translates the PDD into a -solicitation payload, derives evaluation criteria via labs's -`generate_criteria` endpoint, and publishes the solicitation via the -`connect-labs` MCP. Captures `solicitation_id` and `public_url` into -`opp.yaml.solicitation`. - -- Input: approved PDD, opp.yaml (program_id, total_budget) -- Output: `solicitation/published.md`, `opp.yaml.solicitation` populated -- Eval (unless `--no-evals`): `solicitation-create-eval` - -### Step 2: LLO Invite - -Run the `llo-invite` skill. For each PDD-named candidate LLO, send an -invitation email pointing at the public solicitation URL. - -- Input: PDD `preferred_llos`, `opp.yaml.solicitation.public_url` -- Output: `solicitation/invitations.md` -- No-op when PDD has no `preferred_llos` (long-term solicitation flow). - -### Recurring: Solicitation Monitor - -While `opp.yaml.solicitation.status == open`, the orchestrator's recurring -loop calls `solicitation-monitor` to pull new responses, write one file -per response to `solicitation/responses/`, and append a tick line to -`comms-log/observations.md`. - -This loop runs OUTSIDE the default `/ace:run` invocation (which exits -after Step 2). It is meant to be scheduled (cron or manual `/ace:step -solicitation-monitor`) until the deadline passes. 
- -### Manual: Solicitation Review - -Once the deadline has passed (or whenever a human decides to award), the -human runs: - -``` -/ace:step solicitation-review --opp -``` - -This skill scores all responses, presents a recommendation, gates on -explicit human approval, then calls `award_response` and populates -`opp.yaml.selected_llo`. Only this skill unblocks Phase 8. - -## Pause-points - -- **End of Step 2** (default `/ace:run` exit): `/ace:run` halts here. Phase 8 - cannot start until `solicitation-review` has populated `selected_llo`. -- **Inside `solicitation-review`**: HITL gate before `award_response`. - -## Outputs at phase end (default run) - -- `ACE//solicitation/draft.md` -- `ACE//solicitation/published.md` -- `ACE//solicitation/invitations.md` -- `opp.yaml.solicitation.{solicitation_id, public_url, deadline, status: open}` -- `opp.yaml.selected_llo.*` (stubbed, null until award) - -## Completion - -The phase is "complete" in the orchestrator's sense after Step 2. The -recurring monitor and manual review are NOT part of phase completion — -they happen post-`/ace:run` and gate Phase 8 entry. -``` - -- [ ] **Step 3: Commit** - -```bash -git add agents/solicitation-management.md -git commit -m "feat(agent): add solicitation-management subagent (Phase 7) - -Owns the new Phase 7: solicitation-create + llo-invite (auto, default -run), solicitation-monitor (recurring), solicitation-review (manual, -HITL-gated). Default /ace:run halts at the end of llo-invite; Phase 8 is -gated on opp.yaml.selected_llo being populated by solicitation-review. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 15: Update `ace-orchestrator` phases block and pause-points - -**Files:** -- Modify: `agents/ace-orchestrator.md` - -- [ ] **Step 1: Locate the `phases:` block** - -Run: `grep -n "^phases:" agents/ace-orchestrator.md` - -Read the block (lines 66-105 in the current file). Note the existing entries. 
- -- [ ] **Step 2: Rewrite the phases block** - -Replace the existing `phases:` block in `agents/ace-orchestrator.md` with: - -```yaml -phases: - design-review: # Phase 1 - # (existing entries, unchanged) - commcare-setup: # Phase 2 - # (existing entries, unchanged) - connect-setup: # Phase 3 - # (existing entries, unchanged) - ocs-setup: # Phase 4 - # (existing entries, unchanged) - qa-and-training: # Phase 5 - # (existing entries, unchanged) - solicitation-management: # Phase 7 (NEW) - solicitation-create: pending - llo-invite: pending - # solicitation-monitor and solicitation-review run outside /ace:run. - execution-management: # Phase 8 (was llo-management, Phase 7) - llo-onboarding: pending - llo-uat: pending - llo-launch: pending - closeout: # Phase 9 (was Phase 8) - # (existing entries, unchanged) -``` - -(Preserve the existing pending/skip values inside each unchanged phase — the example above shows only the structure, not literal replacement of inner entries. Use `Edit` with surgical replacements for each block; do not blow away the inner state.) - -- [ ] **Step 3: Update the pause-points list** - -Locate the pause-points list in `agents/ace-orchestrator.md` (around lines 274-307 — search for "Phase 5→6 transition" and "After `llo-invite`"). Replace the existing pause-points text with: - -```markdown -**Pause-points:** -- After `idea-to-pdd` (Phase 1) — PDD must be approved before building apps -- After `app-deploy` (Phase 2) — apps must be verified before Connect setup -- After `ocs-chatbot-eval --deep` (Phase 4) — OCS quality must clear pre-launch bar -- **Phase 7 → 8 boundary** — `/ace:run` halts here in default mode. Phase 8 - cannot start until `opp.yaml.selected_llo.org_slug` is populated, which - only happens via the manual `solicitation-review` skill. This is the new - external-communication boundary (Phase 8 sends the first email to the - awardee LLO).
-- After `solicitation-review` (Phase 7, manual) — HITL gate before - `award_response` is called. -- After `llo-launch` (Phase 8) — activation verified before monitoring -- Phase 5 → 6 is **no longer a mandatory pause**. Solicitation publication - is passive (labs portal listing); the active-outreach boundary moves - to Phase 7 → 8. -``` - -- [ ] **Step 4: Update the agent dispatch references** - -Run: `grep -n "Agent(llo-manager)\|Agent('llo-manager')" agents/ace-orchestrator.md` - -For each match, replace with `Agent(execution-manager)` / `Agent('execution-manager')`. Then add a new dispatch reference for `solicitation-management` between Phase 5 and Phase 8 dispatch sites: - -```markdown -After Phase 5 (qa-and-training) completes, dispatch Phase 7: - - Agent(solicitation-management) - -Wait for it to return. After Phase 7 completes, the orchestrator HALTS in -default mode. The next phase requires manual intervention -(/ace:step solicitation-review). Resume Phase 8 only after -opp.yaml.selected_llo.org_slug is populated: - - Agent(execution-manager) -``` - -- [ ] **Step 5: Update prose references to phase numbering** - -Throughout `ace-orchestrator.md`, update prose references: -- "Phase 5→6 transition: always pause" → moved to Phase 7→8 (covered above) -- "Phase 7 is where LLOs first hear from ACE" → "Phase 8 is where the awardee LLO first hears from ACE; Phase 7 publishes the public solicitation but does not contact specific LLOs unless the PDD names preferred_llos." - -- [ ] **Step 6: Commit** - -```bash -git add agents/ace-orchestrator.md -git commit -m "feat(orchestrator): wire Phase 7 (solicitation-management) - -phases: block now lists solicitation-management between qa-and-training -and execution-management. Pause-points: Phase 5→6 no longer mandatory; -Phase 7→8 is the new external-comms boundary. Agent dispatch now calls -Agent(solicitation-management) and Agent(execution-manager).
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 16: Update commands/run.md and commands/step.md - -**Files:** -- Modify: `commands/run.md` -- Modify: `commands/step.md` - -- [ ] **Step 1: Find references to llo-manager** - -Run: `grep -n "llo-manager\|llo_management\|llo-management" commands/` - -- [ ] **Step 2: Apply replacements** - -For each match: `llo-manager` → `execution-manager`, `llo_management` → `execution_management`, `llo-management` → `execution-management`. - -If `commands/step.md` documents the `/ace:step` command's valid skill list, add the four new Phase 7 skills (`solicitation-create`, `llo-invite`, `solicitation-monitor`, `solicitation-review`) and the two new eval skills. - -- [ ] **Step 3: Commit** - -```bash -git add commands/run.md commands/step.md -git commit -m "chore(commands): rename llo-manager → execution-manager in command docs - -Also adds the four new Phase 7 skills to /ace:step's documented skill list. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 17: Add CRISPR-Test-004-Solicitation fixture - -**Files:** -- Create: `test/fixtures/CRISPR-Test-004-Solicitation/inputs/pdd.md` -- Create: `test/fixtures/CRISPR-Test-004-Solicitation/opp.yaml` -- Modify: `test/fixtures/artifact-manifest.test.ts` - -- [ ] **Step 1: Read an existing fixture for shape** - -Run: `ls test/fixtures/CRISPR-Test-001*/`, then read its `pdd.md` and `opp.yaml` for structure. - -- [ ] **Step 2: Create the fixture** - -Create `test/fixtures/CRISPR-Test-004-Solicitation/inputs/pdd.md`: - -```markdown ---- -title: "FLW Outreach for Maternal Health — Niger" -archetype: atomic-visit -intervention_summary: > - CHWs visit pregnant women and new mothers monthly to provide ANC/PNC - guidance, basic screening, and referrals. The program targets districts - with low facility-delivery rates. -target_flw_profile: > - Existing community-elected health volunteers, primarily women, with - basic literacy in Hausa or French. 
6-month engagement, ~30 visits per - month per FLW. -visit_structure: > - Single-visit data collection at each woman's home. Form covers - demographics, pregnancy status, danger signs screening, and referral - log. ~15 minutes per visit. -success_criteria: - - "≥80% of pregnant women in catchment receive at least 1 ANC visit" - - "≥60% of identified danger-sign cases referred to facility" - - "FLW retention ≥85% over 6 months" -total_budget: 75000 - -# Solicitation fields -solicitation_type: EOI -solicitation_deadline_days: 21 -llo_questions: - - "Describe your prior experience deploying CHW programs in West Africa" - - "How will you recruit and train 40 FLWs across 3 districts?" - - "What is your timeline for fielding once awarded?" - - "What is your supervision model for FLW visits?" - - "Do you have local-language capacity (Hausa or French)?" - - "Provide a budget breakdown for the proposed scope" - -preferred_llos: - - { name: "Niger Health Initiative", contact_email: "ops@niger-health.example", organization_slug: "niger-health-initiative" } - - { name: "Sahel Maternal Care", contact_email: "info@sahel-maternal.example", organization_slug: "sahel-maternal-care" } ---- -``` - -Create `test/fixtures/CRISPR-Test-004-Solicitation/opp.yaml`: - -```yaml -display_name: "Niger Maternal Health Pilot" -slug: niger-maternal-health-pilot -program_id: 42 -created_at: 2026-05-04T12:00:00Z -created_by: ace@dimagi-ai.com -last_run_id: null -tags: [solicitation-fixture, atomic-visit] -``` - -- [ ] **Step 3: Update the manifest validation test** - -In `test/fixtures/artifact-manifest.test.ts`, add `'CRISPR-Test-004-Solicitation'` to the list of fixtures that get walked. (Look for an array like `const FIXTURES = ['CRISPR-Test-001-...', ...]` and append.) - -- [ ] **Step 4: Run the test** - -Run: `npm test -- --run test/fixtures/` - -Expected: PASS for all 4 fixtures. 
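Before running the suite, it can help to sanity-check the fixture's `preferred_llos` entries by hand. The sketch below is a hypothetical validator, not part of `artifact-manifest.test.ts`. The three field names are taken from the fixture PDD above, while `PreferredLlo` and `checkPreferredLlos` are illustrative names.

```typescript
// Field names mirror the fixture PDD; the validator itself is an assumption
// made for illustration and does not exist in the repository.
interface PreferredLlo {
  name: string;
  contact_email: string;
  organization_slug: string;
}

// Returns human-readable problems; an empty array means every entry carries
// the three fields the plan's fixture uses.
function checkPreferredLlos(entries: Partial<PreferredLlo>[]): string[] {
  const problems: string[] = [];
  entries.forEach((entry, i) => {
    for (const key of ["name", "contact_email", "organization_slug"] as const) {
      if (!entry[key]) problems.push(`preferred_llos[${i}] is missing ${key}`);
    }
    // Deliberately loose email check: just enough to catch an empty or
    // obviously malformed address in a fixture.
    if (entry.contact_email && !entry.contact_email.includes("@")) {
      problems.push(`preferred_llos[${i}].contact_email has no "@"`);
    }
  });
  return problems;
}
```

Run against the two entries in the fixture, this returns an empty list. An entry without `organization_slug` would be reported by index.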
- -- [ ] **Step 5: Commit** - -```bash -git add test/fixtures/CRISPR-Test-004-Solicitation/ test/fixtures/artifact-manifest.test.ts -git commit -m "test(fixture): add CRISPR-Test-004-Solicitation - -PDD with all three new optional solicitation fields populated, two -preferred_llos. Used by Phase 7 skill tests and the LABS_INTEGRATION -e2e test. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 18: Update opp-eval to include solicitation category - -**Files:** -- Modify: `skills/opp-eval/SKILL.md` - -- [ ] **Step 1: Read the current category list** - -Run: `grep -n "category\|categories" skills/opp-eval/SKILL.md` - -Note the existing 6 categories (likely: design, commcare, connect, ocs, operate, closeout). - -- [ ] **Step 2: Add the solicitation category** - -In `skills/opp-eval/SKILL.md`, add a new category between `connect` and `ocs` (or wherever the phase-ordering sits in the document): - -- Category: `solicitation` -- Eval rubrics aggregated: `solicitation-create-eval`, `solicitation-review-eval` (when present) -- Phase: 6 -- Coverage tier rule: full coverage requires verdicts from both rubrics; partial coverage requires `solicitation-create-eval` only. - -If the SKILL.md has a "category coverage tier" table that says "6 of 6 = full", update to "7 of 7 = full". The "full" threshold lifts by one. - -- [ ] **Step 3: Commit** - -```bash -git add skills/opp-eval/ -git commit -m "feat(eval): add solicitation category to opp-eval - -opp-eval now aggregates verdicts from solicitation-create-eval and -solicitation-review-eval as a 7th category. Full coverage threshold -lifts from 6 → 7. 
- -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 19: Add LABS_INTEGRATION end-to-end test - -**Files:** -- Create: `test/mcp/connect-labs/integration/e2e.integration.test.ts` - -- [ ] **Step 1: Write the test** - -Create `test/mcp/connect-labs/integration/e2e.integration.test.ts`: - -```typescript -import { describe, it, expect, beforeAll } from 'vitest'; -import { forward } from '../../../../mcp/connect-labs-server'; - -const RUN = process.env.LABS_INTEGRATION === '1'; -const URL = process.env.LABS_MCP_URL || 'https://labs.connect.dimagi.com/mcp/'; -const TOKEN = process.env.LABS_MCP_TOKEN || ''; - -describe.runIf(RUN)('connect-labs MCP — live integration', () => { - beforeAll(() => { - if (!TOKEN) throw new Error('LABS_MCP_TOKEN required for LABS_INTEGRATION=1'); - }); - - it('lists tools (sanity)', async () => { - const reply = await forward( - { jsonrpc: '2.0', id: 1, method: 'tools/list' }, - { token: TOKEN, url: URL }, - ); - expect(reply.error).toBeUndefined(); - expect((reply.result as any)?.tools?.length).toBeGreaterThan(0); - const names = (reply.result as any).tools.map((t: any) => t.name); - expect(names).toEqual(expect.arrayContaining([ - 'list_solicitations', - 'create_solicitation', - 'list_responses', - 'award_response', - ])); - }); - - it('list_solicitations returns at least an empty list (Connect OAuth bridge live)', async () => { - const reply = await forward( - { - jsonrpc: '2.0', - id: 2, - method: 'tools/call', - params: { name: 'list_solicitations', arguments: {} }, - }, - { token: TOKEN, url: URL }, - ); - expect(reply.error).toBeUndefined(); - // Result is the labs-side serialized list; shape may be a JSON-encoded - // string or a structured array depending on the labs MCP transport. - expect(reply.result).toBeDefined(); - }); - - it('create_solicitation → list_responses → cleanup (smoke)', async () => { - // Create a draft solicitation in a test program. 
The fixture's program_id - // must point at a "Solicitation Test" program in labs that has no real - // responders. Skip the test (don't fail) if the env doesn't provide one. - const programId = process.env.LABS_TEST_PROGRAM_ID; - if (!programId) { - console.warn('LABS_TEST_PROGRAM_ID unset — skipping create_solicitation smoke'); - return; - } - const create = await forward( - { - jsonrpc: '2.0', - id: 3, - method: 'tools/call', - params: { - name: 'create_solicitation', - arguments: { - program_id: programId, - title: `ACE integration test ${new Date().toISOString()}`, - solicitation_type: 'EOI', - description: 'integration test — please ignore', - scope_of_work: 'integration test', - budget: 1, - deadline: new Date(Date.now() + 24 * 3600 * 1000).toISOString(), - evaluation_criteria: [{ id: 'fit', weight: 1.0, scale: 10 }], - response_template: ['Why are you interested?'], - status: 'draft', // never publish from a test - }, - }, - }, - { token: TOKEN, url: URL }, - ); - expect(create.error).toBeUndefined(); - }); -}); -``` - -- [ ] **Step 2: Verify the test is skipped without the env var** - -Run: `npm test -- --run test/mcp/connect-labs/integration/` - -Expected: 0 tests run (the `runIf` skips when `LABS_INTEGRATION` is unset). - -- [ ] **Step 3: Run the test with the env var (manual verification)** - -Run: `LABS_INTEGRATION=1 LABS_MCP_TOKEN= npm test -- --run test/mcp/connect-labs/integration/` - -Expected: 3/3 pass against live labs (only if you have a labs PAT and a `LABS_TEST_PROGRAM_ID` to target). - -If you don't have a PAT yet, skip this verification — the test exists for CI / future runs. - -- [ ] **Step 4: Commit** - -```bash -git add test/mcp/connect-labs/integration/ -git commit -m "test(integration): add LABS_INTEGRATION e2e for connect-labs MCP - -Three checks: tools/list, list_solicitations (verifies Connect OAuth -bridge), create_solicitation smoke (skipped without LABS_TEST_PROGRAM_ID). 
-Gated like OCS_INTEGRATION — does not run in default npm test. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 20: Update CLAUDE.md and README.md prose - -**Files:** -- Modify: `CLAUDE.md` -- Modify: `README.md` - -- [ ] **Step 1: Update CLAUDE.md phase order list** - -Locate the phase listing in `CLAUDE.md` (search for "Orchestration runs 7 phases"). Update to: - -> **Orchestration runs 8 phases as of .** Phase order: (1) design-review → (2) commcare-setup → (3) connect-setup → (4) ocs-setup → (5) qa-and-training → (6) solicitation-management → (7) execution-management → (8) closeout. Phase 7 (new) publishes a solicitation derived from the PDD and emails PDD-named candidate LLOs the public URL. Phase 8 (renamed from llo-management) onboards the awardee chosen by the manual solicitation-review skill. - -- [ ] **Step 2: Update CLAUDE.md MCP section** - -Add a new bullet under the existing MCP-server description block: - -> - `connect-labs-server.ts` → `connect-labs` (stdio proxy forwarding to `labs.connect.dimagi.com/mcp/`). 10 atoms consumed: `list/get/create/update_solicitation`, `list/get_responses`, `create_review`, `list_reviews`, `award_response`, `generate_criteria`. Source under `mcp/connect-labs-server.ts` is a thin proxy — the real catalog lives in connect-labs (`commcare_connect/mcp/tools/`). Auth: Bearer PAT in `LABS_MCP_TOKEN` (1Password). Provisioned per-machine via `op inject -i .env.tpl`. - -- [ ] **Step 3: Update CLAUDE.md gotchas section** - -Add a new bullet under "Gotchas": - -> - **Connect Labs MCP is HTTP, but ACE consumes it via a stdio proxy.** `mcp/connect-labs-server.ts` reads `LABS_MCP_TOKEN` from `${CLAUDE_PLUGIN_DATA}/.env` and forwards JSON-RPC frames to `labs.connect.dimagi.com/mcp/`. If the labs MCP gains first-class HTTP support in `plugin.json` later, the proxy can be removed. 
-> - **`solicitation` and `selected_llo` are separate blocks in `opp.yaml`.** `solicitation` is the audit trail (URLs, deadline, status); `selected_llo` is the narrow contract Phase 8 reads. Only `solicitation-review` populates `selected_llo`. If you see `selected_llo` set without a corresponding `solicitation` block, that's a contract violation. - -- [ ] **Step 4: Update README.md (if present)** - -If `README.md` lists phases, update to the 8-phase order. If it has a "What ACE does" summary, add a sentence about Phase 7 being solicitation-driven. - -- [ ] **Step 5: Commit** - -```bash -git add CLAUDE.md README.md -git commit -m "docs: update CLAUDE.md + README for 8-phase order - -Phase 7 (Solicitation Management) added between qa-and-training and the -renamed Execution Management. New connect-labs MCP entry, new gotchas -on the stdio proxy + opp.yaml.solicitation/selected_llo split. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -### Task 21: Bump version, update CHANGELOG, final smoke - -**Files:** -- Modify: `VERSION` -- Modify: `CHANGELOG.md` - -- [ ] **Step 1: Bump VERSION (worktree-safe)** - -Run: `scripts/version-bump.sh` - -Expected: prints something like `bumped 0.11.9 → 0.12.0` (the bumper picks `max(local, origin) + patch+1`; the actual minor-vs-patch depends on the current state). For a feature this size, manually override to a minor bump if the script picked patch: - -If needed, edit `VERSION` to `0.12.0` and run the pre-commit-style sync: - -```bash -echo "0.12.0" > VERSION -bash scripts/sync-version.sh -``` - -- [ ] **Step 2: Update CHANGELOG.md** - -Prepend to `CHANGELOG.md`: - -```markdown -## 0.12.0 — Solicitation Management (new Phase 7) - -**Phase topology shifts.** Inserts Phase 7 (Solicitation Management) between -qa-and-training and the renamed Execution Management (was llo-management, -Phase 7). Closeout shifts to Phase 9. 
- -**New phase: Solicitation Management.** -- Default `/ace:run` publishes a solicitation derived from the PDD via the - new `connect-labs` MCP, then emails PDD-named candidate LLOs the public - URL. `/ace:run` halts at the Phase 7→8 boundary. -- Recurring `solicitation-monitor` polls labs for responses; runs outside - `/ace:run`. -- Manual `solicitation-review` (HITL-gated) scores responses, presents a - recommendation, and on human approval calls `award_response` and - populates `opp.yaml.selected_llo`. The only path that unblocks Phase 8. - -**New MCP: `connect-labs`.** A thin stdio proxy at -`mcp/connect-labs-server.ts` forwards JSON-RPC frames to -`labs.connect.dimagi.com/mcp/` with a Bearer PAT (`LABS_MCP_TOKEN`). -Consumes 10 atoms; no new code in `ace-connect`. - -**Phase 8 changes:** `llo-manager` agent renamed to `execution-manager`. -`llo-invite` skill moved to Phase 7 with rewritten behavior (sends -solicitation invites instead of preparing a Connect roster). -`llo-onboarding` reads `opp.yaml.selected_llo` and fails fast if empty. - -**Pause-points:** -- Phase 5→6 no longer mandatory pause. -- Phase 7→8 is the new external-communication boundary (where `/ace:run` - halts in default mode). -- HITL gate inside `solicitation-review` before `award_response`. - -**Doctor:** new `[Connect Labs]` section with three checks -(env / reachable / Connect OAuth bridge). - -**Provisional eval rubrics:** `solicitation-create-eval`, -`solicitation-review-eval`. Calibration TBD per `eval-calibration` once -3+ real solicitations + awards have shipped. - -**No migration script.** In-flight opps finish on the old code; new opps -use the new schema. -``` - -- [ ] **Step 3: Run the full test suite** - -Run: `npm test -- --run` - -Expected: full pass. - -- [ ] **Step 4: Run /ace:doctor smoke** - -Run: `bin/ace-doctor` - -Expected: all sections OK or WARN. 
The new `[Connect Labs]` section will -likely FAIL on `connect_labs_env` until a PAT is provisioned in 1Password -— that's expected and not a blocker for the merge. The doctor exit -status should reflect the FAIL, but the operator will provision the PAT -during/after the merge. - -- [ ] **Step 5: Commit** - -```bash -git add VERSION package.json .claude-plugin/plugin.json .claude-plugin/marketplace.json CHANGELOG.md -git commit -m "release: 0.12.0 — Solicitation Management (new Phase 7) - -Inserts Phase 7 (Solicitation Management) between qa-and-training and -the renamed Execution Management (was llo-management, Phase 7). Closeout -shifts to Phase 9. Adds the connect-labs stdio proxy MCP, four new -skills (solicitation-create, llo-invite-rewritten, solicitation-monitor, -solicitation-review), two provisional eval rubrics, and a new doctor -section. - -See CHANGELOG.md for full details. - -Co-Authored-By: Claude Opus 4.7 (1M context) " -``` - ---- - -## Self-review - -**Spec coverage:** Walked each section of the spec — - -- ✅ Phase topology (renumbering): Tasks 1, 2, 14, 15 -- ✅ MCP integration (proxy, auth, doctor): Tasks 3, 4, 5 -- ✅ PDD additions: Task 6 -- ✅ Solicitation skills: Tasks 7, 9, 10 -- ✅ Eval rubrics: Tasks 8, 11 -- ✅ Transformed llo-invite: Task 12 -- ✅ Updated llo-onboarding: Task 13 -- ✅ Solicitation-management agent: Task 14 -- ✅ Orchestrator phases + pause-points: Task 15 -- ✅ Commands updates: Task 16 -- ✅ Fixture: Task 17 -- ✅ opp-eval coverage: Task 18 -- ✅ Integration test: Task 19 -- ✅ Doc updates: Task 20 -- ✅ Version + CHANGELOG: Task 21 - -**Placeholder scan:** No "TBD", "TODO", "implement later", or -"add appropriate error handling" without specifics. Provisional rubrics -are explicitly flagged with calibration plans. - -**Type consistency:** Skill names match between manifest entries -(Task 7), the agent's `skills:` block (Task 14), and orchestrator -references (Task 15). 
Atom names (`create_solicitation`, `list_responses`, -`award_response`, etc.) match what `connect-labs/commcare_connect/mcp/tools/solicitations.py` -registers. - -**Known limitation:** The doctor wiring in Task 5 step 5 ("if `bin/ace-doctor` -is a thin bash wrapper around `tsx`") branches on the current shape of -`ace-doctor`. The implementation step should inspect the file first and -match its existing convention. This is documented in the task itself. - ---- - -Plan complete and saved to `docs/superpowers/plans/2026-05-04-ace-solicitations-phase.md`. Two execution options: - -1. **Subagent-Driven (recommended)** — I dispatch a fresh subagent per task, review between tasks, fast iteration. - -2. **Inline Execution** — Execute tasks in this session using executing-plans, batch execution with checkpoints. - -Which approach? diff --git a/docs/superpowers/plans/2026-05-04-shallow-deep-qa-split.md b/docs/superpowers/plans/2026-05-04-shallow-deep-qa-split.md deleted file mode 100644 index 3df5a419..00000000 --- a/docs/superpowers/plans/2026-05-04-shallow-deep-qa-split.md +++ /dev/null @@ -1,1202 +0,0 @@ -# Shallow / Deep QA Split Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Make `/ace:run` shallow-by-default (~5 LLM judge calls vs ~90 today), introduce a manual `/ace:qa-deep` command for quality assessment, and move QA-plan generation upstream to phases that know design intent (Phase 1) and built structure (Phase 2). Add a Phase 7 gate that prevents activation without fresh deep verdicts. - -**Architecture:** Add two new artifact-producing skills upstream (`pdd-to-app-journeys` in Phase 1, `app-test-cases` in Phase 2) so Phase 5 can become a thin executor. 
Add one new eval skill (`app-ux-eval`) plus a top-level `/ace:qa-deep` command that wraps deep OCS + deep app eval. Thin OCS `--quick` to a 3-prompt × 1-dimension smoke check. Drop Phase 4's `--deep` gate. Wire the deep-verdict requirement into `llo-launch` so go-live can't ship without it. Retire `qa-plan` and `app-test` once their successors are live. - -**Tech Stack:** TypeScript (MCP atoms, lib/), prompt-based skills (.md files), Vitest tests, Google Drive artifact layout under `ACE//runs//`. - -**Spec:** `docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md` - ---- - -## File Structure - -**New files:** -- `skills/pdd-to-app-journeys/SKILL.md` — Phase 1 producer for `expected-journeys.md` -- `skills/app-test-cases/SKILL.md` — Phase 2 producer for `app-test-cases.yaml` -- `skills/app-ux-eval/SKILL.md` — deep-only LLM-as-Judge over screenshots + journeys -- `commands/qa-deep.md` — `/ace:qa-deep ` slash command -- `templates/expected-journeys-template.md` — markdown skeleton consumed by `pdd-to-app-journeys` -- `templates/app-test-cases-template.yaml` — yaml skeleton consumed by `app-test-cases` -- `migrations/0.x.0-shallow-deep-qa.md` — migration notes for in-flight opps - -**Modified files:** -- `skills/ocs-chatbot-qa/SKILL.md` — thin `--quick` to 3 prompts; gate calls only from `/ace:qa-deep` -- `skills/ocs-chatbot-eval/SKILL.md` — `--quick` collapses to 1 dimension (`overall_quality`) -- `skills/app-screenshot-capture/SKILL.md` — read `app-test-cases.yaml` instead of `qa-plan/`; add 1-question UX smoke judge -- `skills/llo-launch/SKILL.md` — gate activation on fresh deep verdicts -- `agents/design-review.md` — add `pdd-to-app-journeys` step -- `agents/commcare-setup.md` — add `app-test-cases` step after Nova builds, before `app-release` -- `agents/qa-and-training.md` — drop `qa-plan` step, point at new artifacts -- `agents/ocs-setup.md` — drop `--deep` gate; only `--quick` runs in Phase 4 -- `agents/llo-manager.md` — note the new gate in 
`llo-launch` -- `lib/artifact-manifest.ts` — add new artifacts; drop `qa-plan/` and `app-test` artifacts; update `consumedBy` lists -- `bin/ace-doctor` — add freshness check for deep verdicts -- `commands/run.md` — note that deep QA is no longer part of `/ace:run` -- `VERSION` — bump - -**Retired files (deleted at end):** -- `skills/qa-plan/SKILL.md` and directory -- `skills/app-test/SKILL.md` and directory -- `test-results/` artifact entries in manifest - ---- - -## Task 1: New Phase 1 skill — `pdd-to-app-journeys` - -**Goal:** Phase 1 emits `expected-journeys.md` describing UX intent. Nothing reads it yet (Task 3 does), so this lands cleanly without breaking anything. - -**Files:** -- Create: `skills/pdd-to-app-journeys/SKILL.md` -- Create: `templates/expected-journeys-template.md` -- Modify: `agents/design-review.md` (add a step that dispatches the new skill) -- Modify: `lib/artifact-manifest.ts` (add `expected-journeys.md` entry under `phase: 'design'`, `required: true`, `producedBy: 'pdd-to-app-journeys'`, `consumedBy: ['app-test-cases', 'app-ux-eval']`) -- Test: `test/fixtures/artifact-manifest.test.ts` (existing — re-run after manifest edit) - -- [ ] **Step 1: Read the existing Phase 1 skill `pdd-to-test-prompts` for structure** - -Read: `skills/pdd-to-test-prompts/SKILL.md`. The new skill mirrors its frontmatter/process layout, including the `## Archetypes` branching. - -- [ ] **Step 2: Write the template** - -Create `templates/expected-journeys-template.md`: - -```markdown -# Expected User Journeys — {{opp_name}} - -Derived from: pdd.md (rev {{pdd_rev_date}}) -Archetype: {{archetype}} - -## Persona - -{{persona_summary — pulled verbatim from PDD's "Target FLW" section}} - -## Journey 1 — {{journey_name}} - -**Goal:** {{one-line goal of the journey}} - -**Happy path narrative:** -{{2-4 sentences describing what the FLW does, in user-outcome language — -not field/form mechanics. 
Example: "FLW arrives at a household, opens -the Deliver app, confirms the household by name and phone, completes -the screening, photographs the MTN card, and submits. They see a -confirmation that their visit has been recorded."}} - -**Edge cases (UX outcomes, not error codes):** -- {{e.g., "FLW understands why a duplicate-household submission was - rejected and how to proceed"}} -- {{e.g., "FLW understands they cannot submit without GPS"}} - -**Pass criteria:** -- {{e.g., "Journey completes in <3 minutes including form fill"}} -- {{e.g., "Required-field errors are recoverable in-form"}} - -## Journey 2 — {{journey_name}} -... -``` - -- [ ] **Step 3: Write the skill file** - -Create `skills/pdd-to-app-journeys/SKILL.md`. Frontmatter: - -```markdown ---- -name: pdd-to-app-journeys -description: > - Derive opp-specific expected user journeys from an approved PDD. - Output is `expected-journeys.md`, the UX-intent ground truth for - `app-test-cases` (Phase 2) and `app-ux-eval` (deep QA). Mirrors - pdd-to-test-prompts but for the apps, not the chatbot. ---- -``` - -Body must include: -- A `## Process` section with steps: read PDD, branch on archetype, generate journeys per persona, self-evaluate coverage, write file -- An `## Archetypes` section that mirrors `pdd-to-test-prompts`: - - `atomic-visit`: 2-4 journeys covering visit-flow, eligibility-edge, data-quality-error, duplicate-handling - - `focus-group`: 2-4 journeys covering session-setup, recruitment-failure, consent-handling, output-coherence - - `multi-stage`: per-stage journeys + cross-stage transition -- A `## Coverage rules` section requiring at least one `error_recovery`-flavored edge case per journey (so `app-ux-eval`'s rubric has signal) -- A `## Failure modes` and `## Mode behavior` block matching `pdd-to-test-prompts` -- A `## Change log` entry - -The skill writes to `ACE//runs//expected-journeys.md`. (Use the run-scoped path — see `lib/run-paths.ts`.) 
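The coverage rule above (at least one edge case per journey) can be sketched as a small structural check. This is a minimal sketch, not code from the repo: `journeyCoverage` and `journeysMissingEdgeCases` are hypothetical names, and the parser assumes the layout of `templates/expected-journeys-template.md` (`## Journey N` headings, a bold `**Edge cases` label, `- ` bullets). It only counts edge-case bullets; judging whether a bullet is `error_recovery`-flavored would still fall to the skill's self-evaluation step.

```typescript
// Hypothetical helper, not part of the ACE repo. Assumes the layout of
// templates/expected-journeys-template.md.
interface JourneyCoverage {
  heading: string;
  edgeCaseCount: number;
}

function journeyCoverage(markdown: string): JourneyCoverage[] {
  const results: JourneyCoverage[] = [];
  let current: JourneyCoverage | null = null;
  let inEdgeCases = false;

  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith("## Journey ")) {
      // Start a new journey section; drop the leading "## ".
      current = { heading: trimmed.slice(3), edgeCaseCount: 0 };
      results.push(current);
      inEdgeCases = false;
    } else if (trimmed.startsWith("## ")) {
      // Any other H2 (for example "## Persona") ends the current journey.
      current = null;
      inEdgeCases = false;
    } else if (trimmed.startsWith("**")) {
      // A bold label starts a new block; only "Edge cases" bullets count.
      inEdgeCases = trimmed.startsWith("**Edge cases");
    } else if (current && inEdgeCases && trimmed.startsWith("- ")) {
      current.edgeCaseCount += 1;
    }
  }
  return results;
}

// Headings of journeys that have no edge-case bullets at all.
function journeysMissingEdgeCases(markdown: string): string[] {
  return journeyCoverage(markdown)
    .filter((j) => j.edgeCaseCount === 0)
    .map((j) => j.heading);
}
```

A non-empty result from `journeysMissingEdgeCases` would be a reasonable trigger for the skill to regenerate or halt before writing the file.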
- -- [ ] **Step 4: Wire the skill into `design-review` agent** - -Modify `agents/design-review.md`. Find the existing `pdd-to-test-prompts` dispatch step. Add a parallel step right after it (same level — Phase 1 Step 3 or 4): - -```markdown -### Step : Generate expected user journeys - -Dispatch `pdd-to-app-journeys`: -- Reads: `pdd.md` -- Writes: `expected-journeys.md` -- Halts on missing/empty PDD or missing target-FLW persona section - -This skill is the UX-intent ground truth for downstream app QA. Phase 5 -shallow execution and `/ace:qa-deep` both read it. -``` - -- [ ] **Step 5: Add to artifact manifest** - -Modify `lib/artifact-manifest.ts`. Add this entry inside the `// ── Design phase (Phase 1) ─────────` block, alongside `test-prompts.md`: - -```typescript -{ - path: 'expected-journeys.md', - producedBy: 'pdd-to-app-journeys', - consumedBy: ['app-test-cases', 'app-ux-eval', 'app-screenshot-capture'], - phase: 'design', - required: true, - description: 'PDD-derived user journeys + UX edge cases. Ground truth for app-test-cases (Phase 2) and app-ux-eval (deep). Each journey carries a goal, happy-path narrative, edge cases phrased as UX outcomes, and pass criteria.', -}, -``` - -- [ ] **Step 6: Run manifest tests** - -Run: `npm test -- test/fixtures/artifact-manifest.test.ts` -Expected: PASS. If it fails saying a fixture is missing the file, that's expected — the fixture-update lands in Task 8. - -(If the test enforces strict fixture coverage and fails, mark this as a known-blocker for Task 8 and proceed; do not block this task on it.) - -- [ ] **Step 7: Commit** - -```bash -git add skills/pdd-to-app-journeys/ templates/expected-journeys-template.md \ - agents/design-review.md lib/artifact-manifest.ts -git commit -m "feat(phase-1): add pdd-to-app-journeys skill + expected-journeys.md artifact - -Mirror of pdd-to-test-prompts for the app side. 
Emits the UX-intent -ground truth that app-test-cases (Phase 2) and app-ux-eval (deep) will -consume in subsequent commits. Artifact-manifest gets the new entry -under phase=design, required=true. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 2: New Phase 2 skill — `app-test-cases` - -**Goal:** Phase 2 emits `app-test-cases.yaml` after Nova builds. Binds Phase 1 journeys to real built structure + Maestro recipe stubs. Doesn't replace `qa-plan` yet — `qa-plan` keeps running in Phase 5 until Task 5 retires it. - -**Files:** -- Create: `skills/app-test-cases/SKILL.md` -- Create: `templates/app-test-cases-template.yaml` -- Modify: `agents/commcare-setup.md` (add dispatch step after Nova builds, before `app-release`) -- Modify: `lib/artifact-manifest.ts` (add `app-test-cases.yaml` entry) - -- [ ] **Step 1: Read existing Phase 2 producers for context** - -Read in parallel: -- `skills/pdd-to-learn-app/SKILL.md` — how Phase 2 skills read `nova_app_id` and Nova's blueprint -- `skills/qa-plan/SKILL.md` — the recipe-composition pattern we'll inherit (Steps 2 + 3) -- `mcp/mobile/recipes/static/connect-login.yaml` (and siblings in that directory) — the static-recipe palette - -- [ ] **Step 2: Write the template** - -Create `templates/app-test-cases-template.yaml`: - -```yaml -# app-test-cases.yaml — bindings of Phase 1 journeys to Phase 2 built structure. 
-# Producer: app-test-cases (Phase 2) -# Consumers: app-screenshot-capture (Phase 5 shallow), /ace:qa-deep (manual deep) - -opp: {{opp_name}} -run_id: {{run_id}} -generated_at: {{ISO}} -pdd_rev: {{pdd_rev_date}} -nova_apps: - learn: {{learn_nova_app_id}} - deliver: {{deliver_nova_app_id}} - -# Each entry binds one Journey from expected-journeys.md to: -# - the actual forms/fields it exercises (real IDs, not placeholders) -# - a Maestro recipe filled with concrete selectors (no REPLACE_*) -# - the structural pass criteria (boot, no crash, submit confirmation) -# -# `is_smoke: true` marks the recipe Phase 5 runs in shallow mode (one -# per app — the cheapest representative happy path). - -journeys: - - id: J1 - name: {{journey_name from expected-journeys.md}} - app: deliver # or learn - is_smoke: false - forms_exercised: - - {{form_id_or_name}} - fields_exercised: - - {{field_id}} - recipe_path: app-test-cases/recipes/J1.yaml - structural_pass_criteria: - - app_boots - - no_crash - - submission_confirmed # or "assessment_complete" for Learn - pdd_time_budget_seconds: {{from PDD if specified, else null}} -``` - -- [ ] **Step 3: Write the skill file** - -Create `skills/app-test-cases/SKILL.md`: - -```markdown ---- -name: app-test-cases -description: > - After Nova builds the Learn and Deliver apps, bind each user journey - from expected-journeys.md to the actual built structure, emit a - Maestro recipe stub per journey with real selectors (not REPLACE_*), - and write the consolidated app-test-cases.yaml. Phase 5 reads this - for shallow execution; /ace:qa-deep reads it for full execution. - Successor to qa-plan (which is retired in this same release). ---- - -# App Test Cases - -Binds Phase 1 UX intent to Phase 2 built structure. Runs after Nova -finishes both apps, before `app-release` — so the recipes exist when -Phase 5 needs them. 
- -## Process - -### Step 1: Read inputs - -- `expected-journeys.md` -- `app-summaries/learn-app-summary.md` -- `app-summaries/deliver-app-summary.md` -- The Nova blueprints (call `mcp__plugin_nova_nova__get_app` with each - app id) for real form/field IDs -- The static-recipe library at `mcp/mobile/recipes/static/` - -### Step 2: For each journey, decide its app + smoke flag - -Map each journey from `expected-journeys.md` to either Learn or Deliver -based on whether the journey describes assessment behavior (Learn) or -visit/delivery behavior (Deliver). Multi-stage opps may have both. - -**Smoke flag rules:** -- Exactly ONE journey per app gets `is_smoke: true` -- The smoke journey is the simplest happy-path that exercises the - app's primary submission/completion flow -- If two journeys could plausibly be the smoke, pick the one with the - smallest `pdd_time_budget_seconds` - -### Step 3: For each journey, compose the Maestro recipe - -Use the same composition pattern as the retired `qa-plan` skill (read -`skills/qa-plan/SKILL.md` § Step 3 for the static-recipe palette). -Differences: - -- Recipes here are journey-keyed, not module-keyed (`J1.yaml`, `J2.yaml`) -- Each journey's recipe MUST include a final - `takeScreenshot: "sc-J-final"` for the deep UX judge to grade -- Validate via `mobile_validate_recipe` before writing - -Write recipes to `ACE//runs//app-test-cases/recipes/J.yaml`. - -### Step 4: Emit the consolidated yaml - -Write `ACE//runs//app-test-cases.yaml` per the template -in `templates/app-test-cases-template.yaml`. - -### Step 5: Self-evaluate coverage - -(Same shape as pdd-to-test-prompts.) Verify: -- Every journey from `expected-journeys.md` has a binding -- Exactly one `is_smoke: true` per app -- Every recipe passes `mobile_validate_recipe` -- Every `forms_exercised` entry resolves to a real Nova form ID - -If any check fails, halt with a `[BLOCKER]` verdict. 
- -## Mode behavior - -- Auto: write everything, halt on blocker -- Review: pause to show the journey→form bindings before composing recipes -- Dry-run: write the yaml + journey bindings; stub recipe paths; state - tracks as `dry-run-success` - -## Failure modes - -- expected-journeys.md missing or empty → Phase 1 hasn't completed; halt -- Nova blueprint missing for one of the apps → Phase 2 build hasn't - succeeded; halt with pointer to upstream skill -- mobile_validate_recipe rejects more than 2× per journey → escalate - with the validator output - -## MCP tools used - -- ace-gdrive: drive_read_file, drive_create_file, drive_create_folder -- ace-mobile: mobile_resolve_selectors, mobile_validate_recipe -- nova: mcp__plugin_nova_nova__get_app - -## Change log - -| Date | Change | Author | -|------|--------|--------| -| {{today}} | Initial version. Phase 2 producer for app-test-cases.yaml; binds expected-journeys.md to Nova-built structure with Maestro recipe stubs. Successor to qa-plan (retired in same release). | ACE team | -``` - -- [ ] **Step 4: Wire the skill into `commcare-setup` agent** - -Modify `agents/commcare-setup.md`. Find the dispatch chain that goes: -`pdd-to-learn-app` → `pdd-to-deliver-app` → `app-deploy` → `app-release` → `app-test`. - -Insert `app-test-cases` between `app-deploy` and `app-release` (Nova builds are uploaded via app-deploy, so the blueprint IDs are stable by then; app-release is when we can no longer rebuild the apps cheaply, so it's also the natural cutoff for "the apps are now what they are"). Step text: - -```markdown -### Step : Generate app-test-cases.yaml - -Dispatch `app-test-cases`: -- Reads: expected-journeys.md, both app summaries, Nova blueprints -- Writes: app-test-cases.yaml + recipes/J*.yaml under app-test-cases/ -- Halts on missing inputs or recipe-validation failure - -Phase 5 shallow runs the smoke recipes; /ace:qa-deep runs them all. 
-``` - -- [ ] **Step 5: Add to artifact manifest** - -Modify `lib/artifact-manifest.ts`. Add inside the CommCare phase block, after `deployment-summary.md`: - -```typescript -{ - path: 'app-test-cases.yaml', - producedBy: 'app-test-cases', - consumedBy: ['app-screenshot-capture', 'app-ux-eval'], - phase: 'commcare', - required: true, - description: 'Bindings of expected-journeys.md to Phase-2-built app structure: per-journey form/field IDs, Maestro recipe paths, smoke flags, structural pass criteria. Phase 5 shallow uses is_smoke: true entries; /ace:qa-deep uses all entries.', -}, -``` - -Update consumed-by lists already in the file: -- `expected-journeys.md` → confirm `consumedBy` includes `'app-test-cases'` -- `app-summaries/learn-app-summary.md` → add `'app-test-cases'` -- `app-summaries/deliver-app-summary.md` → add `'app-test-cases'` - -- [ ] **Step 6: Run manifest tests** - -Run: `npm test -- test/fixtures/artifact-manifest.test.ts` -Expected: PASS (modulo the fixture-coverage warning carried from Task 1). - -- [ ] **Step 7: Commit** - -```bash -git add skills/app-test-cases/ templates/app-test-cases-template.yaml \ - agents/commcare-setup.md lib/artifact-manifest.ts -git commit -m "feat(phase-2): add app-test-cases skill + app-test-cases.yaml artifact - -Phase 2 producer for the journey→build binding layer. Composes Maestro -recipes per journey with real selectors (not REPLACE_*), marks one -smoke recipe per app for Phase 5 shallow execution. Successor to -qa-plan; qa-plan keeps running in Phase 5 until Task 5 swaps over. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 3: New deep eval skill — `app-ux-eval` - -**Goal:** New LLM-as-Judge skill that grades captured screenshots against `expected-journeys.md`. Deep-only — no `--quick` mode. Used by `/ace:qa-deep` (Task 4) and the Phase 7 gate (Task 7). 
- -**Files:** -- Create: `skills/app-ux-eval/SKILL.md` -- Modify: `lib/artifact-manifest.ts` (add `verdicts/app-ux-eval-deep.yaml`) -- Modify: `lib/verdict-schema.ts` (only if the existing schema doesn't already cover the dimensions; prefer reusing) - -- [ ] **Step 1: Read existing eval skills + verdict schema** - -Read in parallel: -- `skills/ocs-chatbot-eval/SKILL.md` — uniform verdict shape, hard-deduction pattern -- `lib/verdict-schema.ts` — confirm the schema is dimension-agnostic (it should be; it just stores `dimensions: { name: string; score: number; reason: string }[]`) -- `lib/parse-verdict.ts` — confirm parser is generic - -If the schema is already generic, no schema edits are needed. - -- [ ] **Step 2: Write the skill file** - -Create `skills/app-ux-eval/SKILL.md`: - -```markdown ---- -name: app-ux-eval -description: > - LLM-as-Judge over captured screenshots + expected-journeys.md. - Per-journey verdict on UX dimensions: clarity, flow_predictability, - error_recovery, time_budget, journey_completion. Deep-only — runs from - /ace:qa-deep, never from /ace:run. Writes verdicts/app-ux-eval-deep.yaml - in the uniform verdict shape so opp-eval can aggregate. ---- - -# App UX Eval - -Grades the FLW experience of the built apps. Asks: "would this be a -good experience for the user?" and pins each judgment to concrete -PDD-derived ground truth (the journey's stated goal, time budget, edge -cases) so the rubric isn't unmoored. 
- -## Process - -### Step 1: Read inputs - -- `expected-journeys.md` — ground truth -- `app-test-cases.yaml` — journey↔recipe bindings -- The captured screenshots from the recent execution run (look up by - the run id passed in) -- `pdd.md` — for persona context (the FLW the rubric is judging "good - experience" against) - -### Step 2: For each journey, score 5 dimensions (1-3) - -| Dimension | What to look for | Hard deduction → fail | -|---|---|---| -| `clarity` | Field labels and prompts unambiguous to the persona from PDD's "Target FLW" section | Any field name only a developer would understand (e.g., `q3_v2_optional`) | -| `flow_predictability` | Conditional branches go where FLW expects; skip patterns don't surprise | A screen appears or disappears with no apparent cause from the user's perspective | -| `error_recovery` | Validation errors tell the FLW what's wrong and how to fix | Dead-end errors with no recovery path | -| `time_budget` | Step count + estimated input time vs. journey's `pdd_time_budget_seconds` | Recipe step count × 5s exceeds 2× the budget | -| `journey_completion` | Recipe accomplishes the journey's stated goal end-to-end | Recipe ends without confirmation / stuck screen | - -### Step 3: Aggregate - -- Per-journey verdict: weighted average of dimensions, hard-deduction - on any single dimension clamps the journey to fail -- Phase verdict: pass = all journeys pass; fail = any journey fails, - with summary of which journeys failed which dimensions - -### Step 4: Write verdict - -Write `ACE//runs//verdicts/app-ux-eval-deep.yaml` per the -uniform verdict shape (see `skills/README.md § Eval verdict shape` or -`lib/verdict-schema.ts`). 
Required fields: - -- skill: app-ux-eval -- mode: deep -- timestamp: ISO with timezone -- artifact_refs: { learn_build_id, deliver_build_id } — read from - deployment-summary.md so the Phase 7 gate can timestamp-compare -- dimensions: per-dimension scores + reasons -- per_unit_verdicts: per-journey verdicts -- overall_score, status (pass | fail), failing_units - -Also append a row to `eval-calibration/app-ux-eval-runs.md` so -calibration metrics keep accumulating. - -## Mode behavior - -- Deep only. There is no `--quick`. - -## Failure modes - -- Screenshots missing for a journey marked in app-test-cases.yaml → - halt with a `[BLOCKER]` saying which recipe didn't run -- expected-journeys.md missing → upstream Phase 1 or migration gap; - halt with pointer -- Nova builds older than the screenshots → screenshots are stale; halt - -## MCP tools used - -- ace-gdrive: drive_read_file, drive_list_folder, drive_create_file -- (No mobile/MCP — this is pure judging over already-captured artifacts) - -## Change log - -| Date | Change | Author | -|------|--------|--------| -| {{today}} | Initial version. Deep-only LLM-as-Judge for app UX. Used by /ace:qa-deep and the Phase 7 gate. | ACE team | -``` - -- [ ] **Step 3: Add to artifact manifest** - -Modify `lib/artifact-manifest.ts`. Add to the operate phase block (mirroring `verdicts/ocs-chatbot-eval-deep.yaml`): - -```typescript -{ - path: 'verdicts/app-ux-eval-deep.yaml', - producedBy: 'app-ux-eval', - consumedBy: ['llo-launch', 'opp-eval'], - phase: 'operate', - required: false, - description: 'Machine-readable verdict from app-ux-eval (deep). Read by llo-launch (Phase 7 activation gate) for freshness check vs. latest released CommCare build, and by opp-eval for cross-skill aggregation. 
Required to be fresh and passing for go-live; absent if /ace:qa-deep has not been run.', -}, -``` - -- [ ] **Step 4: Run manifest tests + verdict-schema tests** - -Run in parallel: -- `npm test -- test/fixtures/artifact-manifest.test.ts` -- `npm test -- test/lib/verdict-schema.test.ts` (if the path exists) - -Expected: PASS. The new skill produces the same shape so no test changes needed. - -- [ ] **Step 5: Commit** - -```bash -git add skills/app-ux-eval/ lib/artifact-manifest.ts -git commit -m "feat: add app-ux-eval skill for deep app UX grading - -LLM-as-Judge over captured screenshots + expected-journeys.md. Five -dimensions (clarity, flow_predictability, error_recovery, time_budget, -journey_completion), each with a hard-deduction rule. Deep-only — -called from /ace:qa-deep (next task) and gated by Phase 7 in Task 7. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 4: New `/ace:qa-deep ` slash command - -**Goal:** Manual deep-QA surface. Thin wrapper that dispatches deep-mode versions of the existing OCS qa+eval pair plus the new `app-ux-eval`. - -**Files:** -- Create: `commands/qa-deep.md` -- Modify: `commands/run.md` (add a one-line "deep QA is no longer part of /ace:run; see /ace:qa-deep") - -- [ ] **Step 1: Read existing slash commands for the format** - -Read in parallel: -- `commands/run.md` -- `commands/step.md` -- `commands/eval.md` - -Note the frontmatter schema (`description`, `argument-hint`, etc.) and how multi-arg commands handle flags. - -- [ ] **Step 2: Write `commands/qa-deep.md`** - -```markdown ---- -description: Run deep QA (OCS + apps) against an existing opportunity. Manual gate, not part of /ace:run. -argument-hint: [--ocs-only | --apps-only] [--since=] ---- - -# /ace:qa-deep — Manual Deep QA - -Triggers a full LLM-as-Judge quality assessment of an opportunity that -already has a successful /ace:run behind it. 
- -## Inputs read from Drive (`ACE/$1/`) - -- `pdd.md`, `test-prompts.md` (OCS deep ground truth) -- `expected-journeys.md`, `app-test-cases.yaml` (app deep ground truth) -- The published OCS chatbot's current configuration -- The latest released CommCare builds (Learn + Deliver) - -## What this does - -Run the following dispatches in this order: - -### Stage A — OCS deep (skip if `--apps-only`) - -1. Dispatch `ocs-chatbot-qa --deep` for $1 -2. Dispatch `ocs-chatbot-eval --deep` for $1 - -Writes: -- qa-captures/-ocs-chat-deep.md -- verdicts/ocs-chatbot-eval-deep.yaml -- gate-briefs/ocs-chatbot-eval-deep.md - -### Stage B — Apps deep (skip if `--ocs-only`) - -1. Read `app-test-cases.yaml` for the run. -2. If `--since=` is provided: filter to journeys whose - `recipe_path` mtime is newer than the prior verdict at - `verdicts/app-ux-eval-deep.yaml@`. Otherwise run all. -3. For each journey: call `mobile_run_recipe` against a fresh AVD, - capture screenshots, upload to Drive under - `screenshots/qa-deep//`. -4. Dispatch `app-ux-eval` to grade the captured set. - -Writes: -- screenshots/qa-deep/J*/*.png -- verdicts/app-ux-eval-deep.yaml -- eval-calibration/app-ux-eval-runs.md (appended row) - -## What this does NOT do - -- No /ace:run side effects. No Phase 7 activation, no app rebuild, no - training-material regeneration. -- No FLW invites, no LLO emails. - -## After completion - -Both verdicts go to `verdicts/*-deep.yaml`. The Phase 7 `llo-launch` -gate reads them and refuses activation if either is missing or stale. - -If you ran this and want to proceed to go-live, re-enter Phase 7 via -/ace:step llo-launch $1 (or let /ace:run resume from where it left off). -``` - -- [ ] **Step 3: Update `commands/run.md`** - -Find the "What this does" section. Add a single bullet noting the change: - -```markdown -- Phase 4 (OCS) and Phase 5 (apps) run **shallow** QA only. Deep - quality assessment is a separate command — see /ace:qa-deep . 
- Phase 7 activation will refuse to proceed without fresh deep - verdicts (run /ace:qa-deep before go-live). -``` - -- [ ] **Step 4: Sanity-test the command file lints** - -Run: `npx tsx scripts/sync-version.sh --dry-run` (or whatever the repo's command-validator is — check `bin/ace-doctor` for hints). - -If the repo doesn't have a command linter, skip — the command is just markdown frontmatter that Claude Code parses. - -- [ ] **Step 5: Commit** - -```bash -git add commands/qa-deep.md commands/run.md -git commit -m "feat: add /ace:qa-deep command for manual deep quality assessment - -Thin wrapper that dispatches OCS deep qa+eval + new app-ux-eval. Read- -and-grade only — no run side effects. Supports --ocs-only / --apps-only -for surgical re-runs, --since= for incremental app grading. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 5: Switch Phase 5 to executor-only; retire `qa-plan` - -**Goal:** Phase 5 stops synthesizing test plans. `app-screenshot-capture` reads `app-test-cases.yaml`, runs only smoke recipes, and adds a thin UX judge per app. The `qa-plan` skill becomes dead code (deleted in Task 8). - -**Files:** -- Modify: `skills/app-screenshot-capture/SKILL.md` -- Modify: `agents/qa-and-training.md` -- Modify: `lib/artifact-manifest.ts` (drop qa-plan/* artifacts; add app-ux-shallow verdict) - -- [ ] **Step 1: Read current Phase 5 wiring** - -Read in parallel: -- `agents/qa-and-training.md` -- `skills/app-screenshot-capture/SKILL.md` - -Confirm the existing dispatch order. Phase 5 should look like: -1. `qa-plan` (will be removed) -2. `app-screenshot-capture` (modified to read new artifact) -3. Per-artifact training skills in parallel -4. 
`training-deck-build` - -- [ ] **Step 2: Modify `skills/app-screenshot-capture/SKILL.md`** - -Edits: -- Replace the current input list ("reads `qa-plan/test-matrix.md`, - `qa-plan/walkthrough-recipes/manifest.yaml`...") with: - - `expected-journeys.md` - - `app-test-cases.yaml` -- Add a new Step labeled "Filter to smoke recipes": - ```markdown - ### Step : Select smoke recipes only - Read `app-test-cases.yaml`. Filter `journeys[]` to entries with - `is_smoke: true`. There MUST be exactly two (one per app — Learn - and Deliver). Halt with a clear pointer to `app-test-cases` if - fewer or more are found. - ``` -- After the existing screenshot-capture loop, add a new Step labeled - "Thin UX smoke judge": - ```markdown - ### Step : Thin UX smoke judge - - For each smoke recipe (Learn + Deliver), assemble the captured - screenshot set into a single LLM-as-Judge call: - - Prompt: "These screenshots are from a smoke run of the {{app}} - app. The target FLW persona (from PDD) is: {{persona_summary}}. - Looking at these screenshots in order, would this person be able - to complete the journey without confusion? Rate 0-3 + one-line - reason. 0 = a typical persona-matching FLW would get stuck; 3 = - obviously usable." - - Threshold: ≥ 2/3 per app. Below → halt with verdict. - ``` -- Update the verdict-writing section to also write - `ACE//runs//verdicts/app-screenshot-capture-shallow.yaml` - with the smoke-judge dimension. - -- [ ] **Step 3: Modify `agents/qa-and-training.md`** - -Find the `qa-plan` dispatch step. Delete it. 
Adjust the now-first -step (`app-screenshot-capture`) to note that its inputs come from -upstream phases: - -```markdown -### Step 1: Capture smoke screenshots + thin UX judge - -Dispatch `app-screenshot-capture`: -- Reads: expected-journeys.md (Phase 1), app-test-cases.yaml (Phase 2) -- Writes: screenshots/J*/*.png + verdicts/app-screenshot-capture-shallow.yaml -- Halts on smoke-recipe failure or UX judge < 2/3 -``` - -Confirm downstream training-skill dispatches still consume -`screenshots/manifest.yaml` (they do — `app-screenshot-capture` still -emits it). - -- [ ] **Step 4: Update artifact manifest** - -Modify `lib/artifact-manifest.ts`: - -(a) Drop the `qa-plan/*` entries (test-matrix, walkthrough-recipes/*, screenshot-manifest, uat-checklist, verdicts/qa-plan.yaml). -(b) Drop the `test-results/*` entries produced by `app-test`. -(c) Add the new shallow verdict: - -```typescript -{ - path: 'verdicts/app-screenshot-capture-shallow.yaml', - producedBy: 'app-screenshot-capture', - consumedBy: ['opp-eval'], - phase: 'operate', - required: true, - description: 'Shallow smoke verdict from /ace:run Phase 5 — smoke recipe pass/fail + thin UX judge ≥ 2/3 per app. Always present after a successful /ace:run.', -}, -``` - -(d) Update consumed-by lists: anything that listed `qa-plan` or `app-test` as consumer/producer needs the references removed. - -- [ ] **Step 5: Run manifest tests** - -Run: `npm test -- test/fixtures/artifact-manifest.test.ts` -Expected: PASS. Fixtures may need updating if they reference the dropped paths — handle in Task 8 if so. - -- [ ] **Step 6: Commit** - -```bash -git add skills/app-screenshot-capture/ agents/qa-and-training.md \ - lib/artifact-manifest.ts -git commit -m "refactor(phase-5): executor-only — drop qa-plan synthesis - -app-screenshot-capture now reads expected-journeys.md (Phase 1) and -app-test-cases.yaml (Phase 2) as inputs. 
Runs the two smoke recipes -flagged is_smoke: true (one per app), captures screenshots, runs a -single-question UX judge per app (~2 LLM calls total). Drops qa-plan -artifacts from the manifest; the qa-plan skill itself is deleted in -Task 8 once retirement settles. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 6: Thin OCS `--quick`; drop Phase 4 deep gate - -**Goal:** OCS shallow (Phase 4 default) collapses to 3 prompts × 1 dimension. Deep no longer runs in Phase 4 — it lives only in `/ace:qa-deep`. - -**Files:** -- Modify: `skills/ocs-chatbot-qa/SKILL.md` -- Modify: `skills/ocs-chatbot-eval/SKILL.md` -- Modify: `agents/ocs-setup.md` - -- [ ] **Step 1: Read current OCS skill files** - -Read: -- `skills/ocs-chatbot-qa/SKILL.md` -- `skills/ocs-chatbot-eval/SKILL.md` -- `agents/ocs-setup.md` - -Find the section in each that defines `--quick` behavior. - -- [ ] **Step 2: Thin `ocs-chatbot-qa` `--quick`** - -Modify `skills/ocs-chatbot-qa/SKILL.md`: -- In the `--quick` mode section, change "5 smoke prompts" to "3 smoke prompts" (universal Connect-domain questions: 1 about claiming an opp, 1 about syncing data, 1 about getting paid) -- Tighten the timeout: total cap = 90s × 3 = 270s -- Note in the change log: "Thinned from 5 to 3 prompts (0.x.0). Phase 4 cost reduction; multi-dimensional judging moves to deep-only." - -- [ ] **Step 3: Thin `ocs-chatbot-eval` `--quick`** - -Modify `skills/ocs-chatbot-eval/SKILL.md`: -- In the `--quick` mode section, replace the 5-dimension grading rubric with a single-dimension `overall_quality_0_to_3` -- Pass criterion: every prompt's `overall_quality` ≥ 2/3 -- Verdict path stays `verdicts/ocs-chatbot-eval-quick.yaml` but the - dimensions array now has 1 entry -- Note in the change log - -- [ ] **Step 4: Drop the `--deep` gate from `agents/ocs-setup.md`** - -In `agents/ocs-setup.md`, find the "Step 3: Deep eval" section (or -equivalent). Delete the entire deep-eval step. 
Adjust step numbering -in the rest of the agent. - -In the Phase 4 gate-brief section, change "deep verdict" references -to "quick verdict" — Phase 4 → 5 only requires the quick gate now. - -Add a paragraph at the end of the agent's overview: - -```markdown -**Note:** Deep OCS evaluation moved out of Phase 4 in 0.x.0. Run -/ace:qa-deep after /ace:run completes to grade chatbot quality -before go-live. The Phase 7 llo-launch gate refuses to proceed -without a fresh, passing deep verdict. -``` - -- [ ] **Step 5: Run OCS-related tests** - -Run in parallel: -- `npm test -- test/mcp/ocs/` (unit tests, no live OCS) -- `npm test -- test/fixtures/artifact-manifest.test.ts` - -Expected: PASS. (Integration tests OCS_INTEGRATION=1 are out of scope here — those exercise live OCS and are a separate CI concern.) - -- [ ] **Step 6: Commit** - -```bash -git add skills/ocs-chatbot-qa/ skills/ocs-chatbot-eval/ agents/ocs-setup.md -git commit -m "refactor(phase-4): thin --quick to 3 prompts × 1 dim; drop Phase 4 deep gate - -OCS shallow collapses from 5 prompts × 5 dims (~25 calls) to 3 prompts -× 1 dim (overall_quality_0_to_3, 3 calls). Phase 4 → 5 gate is now -quick-pass-only. Deep OCS eval moves entirely to /ace:qa-deep. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 7: Wire Phase 7 deep-verdict gate - -**Goal:** `llo-launch` reads both deep verdicts before activation. Refuses if missing, stale, or failing. Adds an override flag with audit trail. - -**Files:** -- Modify: `skills/llo-launch/SKILL.md` -- Modify: `bin/ace-doctor` (add a freshness check that mirrors the gate) -- Modify: `agents/llo-manager.md` (note the new gate behavior) - -- [ ] **Step 1: Read current `llo-launch`** - -Read: `skills/llo-launch/SKILL.md`. Identify the step that calls -`connect_activate_opportunity`. 
- -- [ ] **Step 2: Add the gate to `llo-launch`** - -Insert a new step **immediately before** the activation call: - -```markdown -### Step : Verify deep-QA verdicts before activation - -Read these two files from `ACE//runs//`: -- `verdicts/ocs-chatbot-eval-deep.yaml` -- `verdicts/app-ux-eval-deep.yaml` - -For each verdict, require: - -1. File exists. -2. `status: pass`. -3. Verdict timestamp is newer than the relevant artifact: - - OCS verdict: newer than the chatbot's last `published_at` - (read via `ocs_get_chatbot`) - - App verdict: newer than the latest released CommCare build - timestamp (read from `deployment-summary.md`) - -If ANY check fails, halt with [BLOCKER]: - -> Deep QA verdicts missing or stale. -> Run /ace:qa-deep before activation. -> Missing: - -### Step : Override (operator-only, audited) - -If the activation includes the flag `--override-deep-qa-gate=`, -skip the gate. Required: -- The flag must include a non-empty reason -- /ace:run cannot pass this flag (only /ace:step llo-launch can) -- Append to `comms-log/observations.md`: - > YYYY-MM-DD HH:MM TZ — Deep-QA gate overridden during activation. - > Reason: . Operator: . Verdicts at time of override: - > / . - -### Step : Activate the opportunity (existing step) - -(Preserve the existing connect_activate_opportunity call.) -``` - -Also update the skill's frontmatter description to mention the new gate. - -- [ ] **Step 3: Add a freshness check to `bin/ace-doctor`** - -Read `bin/ace-doctor`. Find the section reporting on per-opp verdicts -(if any; if not, this becomes a new section). - -Add a new check `[deep-qa-freshness]` that, given an opp name: -1. Reads `verdicts/ocs-chatbot-eval-deep.yaml` if present -2. Reads `verdicts/app-ux-eval-deep.yaml` if present -3. For each: compare timestamp to the artifact it grades -4. Reports: PASS / WARN (one is missing) / FAIL (one is stale) - -This is advisory in doctor (WARN-level), not a blocker. The actual -enforcement is the gate in `llo-launch`. 
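The `[deep-qa-freshness]` check described in Step 3 can be sketched roughly as below. This is a minimal illustration, not the contents of `bin/ace-doctor`: the `timestamp` field name, the `FreshnessInput` shape, and both function names are assumptions introduced here. A missing verdict maps to WARN, a verdict that is not newer than the artifact it grades maps to FAIL, and the overall result is the worst level seen.

```typescript
// Hypothetical sketch of the [deep-qa-freshness] doctor check.
// Type names, field names, and function names are illustrative assumptions.

type FreshnessLevel = 'PASS' | 'WARN' | 'FAIL';

interface VerdictFile {
  // ISO-8601 timestamp stored in the verdict YAML (field name assumed).
  timestamp: string;
}

interface FreshnessInput {
  name: string;                 // e.g. "ocs-chatbot-eval-deep"
  verdict: VerdictFile | null;  // null when the verdict file is absent
  artifactTimestamp: string;    // when the graded artifact last changed
}

// Classify a single verdict against the artifact it grades.
function checkOne(input: FreshnessInput): FreshnessLevel {
  if (input.verdict === null) return 'WARN'; // missing verdict
  const verdictMs = Date.parse(input.verdict.timestamp);
  const artifactMs = Date.parse(input.artifactTimestamp);
  // Unparseable timestamps are treated as stale rather than trusted.
  if (Number.isNaN(verdictMs) || Number.isNaN(artifactMs)) return 'FAIL';
  return verdictMs > artifactMs ? 'PASS' : 'FAIL';
}

// Overall result is the worst individual level: FAIL beats WARN beats PASS.
function deepQaFreshness(inputs: FreshnessInput[]): FreshnessLevel {
  const levels = inputs.map(checkOne);
  if (levels.includes('FAIL')) return 'FAIL';
  if (levels.includes('WARN')) return 'WARN';
  return 'PASS';
}
```

Because doctor is advisory, a caller would only print the level; the blocking behavior stays in the `llo-launch` gate.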
- -- [ ] **Step 4: Update `agents/llo-manager.md`** - -Find the description of the `llo-launch` dispatch. Add a note: - -```markdown -**Note:** llo-launch enforces a deep-QA-verdict freshness gate before -activation in 0.x.0+. If /ace:qa-deep hasn't been run since the most -recent app release / chatbot publish, llo-launch halts with a -[BLOCKER] and the operator must run /ace:qa-deep before resuming. -``` - -- [ ] **Step 5: Run tests** - -Run: `npm test -- test/` -Expected: PASS. Existing tests don't cover the new gate (it's prompt-side); integration coverage comes from a manual test on a fixture opp in Task 9. - -- [ ] **Step 6: Commit** - -```bash -git add skills/llo-launch/ bin/ace-doctor agents/llo-manager.md -git commit -m "feat(phase-7): gate llo-launch on fresh deep-QA verdicts - -Before connect_activate_opportunity, llo-launch reads both deep -verdicts (OCS + apps), checks they exist + pass + are newer than the -artifacts they grade. Halts with [BLOCKER] otherwise. Override flag ---override-deep-qa-gate= bypasses with audit trail in -comms-log/observations.md (only available via /ace:step, not /ace:run). -Doctor adds a WARN-level freshness check. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 8: Retire `qa-plan` and `app-test`; migration script - -**Goal:** Delete the dead skills, update fixtures, write the migration doc, bump version. 
- -**Files:** -- Delete: `skills/qa-plan/` (entire directory) -- Delete: `skills/app-test/` (entire directory) -- Create: `migrations/0.x.0-shallow-deep-qa.md` -- Modify: `test/fixtures/...` (remove references to retired artifacts) -- Modify: `VERSION` (bump per `scripts/version-bump.sh`) -- Modify: `CHANGELOG.md` (entry for 0.x.0) - -- [ ] **Step 1: Find every reference to retired skills** - -Run in parallel: -- `git grep -l 'qa-plan' -- skills/ agents/ commands/ lib/ test/` -- `git grep -l 'app-test' -- skills/ agents/ commands/ lib/ test/ ':!skills/app-test-cases'` - -Note: `app-test-cases` (the new skill) shouldn't be matched by the -second grep — that's why we exclude its directory. - -- [ ] **Step 2: Remove references** - -For each file that references `qa-plan` or `app-test` (the retired ones): -- Skill markdown files: delete the line / paragraph referencing them -- Agent markdown files: confirm they were already updated in Tasks 1, 2, 5; if not, update now -- `lib/artifact-manifest.ts`: confirm Task 5's edits removed all entries for these skills (no `producedBy: 'qa-plan'` or `producedBy: 'app-test'` remaining) -- Tests: update fixtures so they no longer expect `qa-plan/*` or `test-results/*` files - -- [ ] **Step 3: Delete the skill directories** - -```bash -git rm -r skills/qa-plan/ skills/app-test/ -``` - -- [ ] **Step 4: Write the migration doc** - -Create `migrations/0.x.0-shallow-deep-qa.md`: - -```markdown -# Migration: 0.x.0 — Shallow / Deep QA Split - -**Date:** YYYY-MM-DD - -## What changed - -- New skills: `pdd-to-app-journeys` (Phase 1), `app-test-cases` (Phase 2), - `app-ux-eval` (deep, manual) -- New artifacts: `expected-journeys.md`, `app-test-cases.yaml`, - `verdicts/app-ux-eval-deep.yaml`, `verdicts/app-screenshot-capture-shallow.yaml` -- New command: `/ace:qa-deep ` -- Modified: OCS `--quick` thinned to 3×1 dim; `app-screenshot-capture` - reads new artifacts; `llo-launch` gates on deep verdicts -- Retired: `qa-plan`, `app-test` skills + 
their artifacts - -## In-flight opportunities (mid-/ace:run when 0.x.0 lands) - -If an opp's run had completed Phase 1 but not Phase 2 before this update: -- Re-run Phase 1 just for the new artifact: - `/ace:step pdd-to-app-journeys ` -- Resume from where Phase 2 left off - -If an opp had completed Phase 5 (qa-plan + app-screenshot-capture) on -the old shape: -- The old artifacts (qa-plan/*) remain in Drive; nothing reads them. - Safe to leave. -- For deep QA, run `/ace:qa-deep ` to populate the new verdicts. - -## Activation gate (Phase 7) - -Existing opps that completed Phase 5 on the old shape but have NOT yet -been activated will hit the new deep-QA gate. Run `/ace:qa-deep ` -before `/ace:step llo-launch`. If you must bypass for emergency -activation: `/ace:step llo-launch --override-deep-qa-gate=""` -(reason is required; gets logged to `comms-log/observations.md`). - -## Cost impact - -- /ace:run shallow QA: ~5 LLM judge calls (was ~90) -- /ace:qa-deep (manual, optional): ~65 OCS + per-journey app -- Net: /ace:run cycles cheaper; deep grading is now opt-in - -## Rollback - -Revert to . The old qa-plan + app-test skills -return; the new artifacts in Drive are ignored. No Drive data loss. -``` - -- [ ] **Step 5: Bump version** - -Run: `bash scripts/version-bump.sh` - -This fetches origin/main, picks `max(local, origin) + patch+1`, and -syncs the four version files. Capture the new version (e.g., `0.x.0`). - -Edit `migrations/0.x.0-shallow-deep-qa.md` to replace `0.x.0` with the -real version. Same for the change-log entries inside skill files -(Task 1 step 7 commit body, Task 2 step 7 commit body, etc. — for the -log table dates). If those changelog tables already have the literal -`0.x.0`, replace via `git grep '0\.x\.0' | grep -v 0.x.0-shallow` and -inspect. 
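The version-selection rule in Step 5 (`max(local, origin)` followed by a patch increment) can be illustrated with a short sketch. This is not the logic of `scripts/version-bump.sh` itself, which is a shell script not shown here; the function names and the strict `MAJOR.MINOR.PATCH` format are assumptions made for the example.

```typescript
// Hypothetical illustration of "max(local, origin) + patch+1".
// Names and the strict MAJOR.MINOR.PATCH format are assumptions.

type SemVer = [number, number, number];

function parseSemVer(v: string): SemVer {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`not a MAJOR.MINOR.PATCH version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, positive if a > b, zero if equal.
// Components are compared numerically, so 0.10.x sorts above 0.9.x.
function compareSemVer(a: SemVer, b: SemVer): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  if (a[1] !== b[1]) return a[1] - b[1];
  return a[2] - b[2];
}

// Take whichever version is higher, then bump its patch component.
function nextPatchVersion(local: string, origin: string): string {
  const a = parseSemVer(local);
  const b = parseSemVer(origin);
  const [major, minor, patch] = compareSemVer(a, b) >= 0 ? a : b;
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, `nextPatchVersion('0.9.3', '0.10.1')` yields `'0.10.2'`: the origin version wins the comparison, so a stale local checkout cannot produce a version that already exists upstream.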
- -- [ ] **Step 6: Update `CHANGELOG.md`** - -Add a section at the top: - -```markdown -## 0.x.0 — Shallow / Deep QA Split - -- New: /ace:qa-deep for manual deep quality assessment -- New: pdd-to-app-journeys (Phase 1), app-test-cases (Phase 2), - app-ux-eval (deep) skills -- Changed: /ace:run does shallow QA only — ~5 LLM judge calls vs ~90 - before. Phase 7 llo-launch refuses activation without fresh deep - verdicts (override available with audit reason). -- Retired: qa-plan, app-test skills (replaced by upstream producers) -- Migration: see migrations/0.x.0-shallow-deep-qa.md -- Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md -``` - -- [ ] **Step 7: Run the full test suite** - -Run: `npm test` - -Expected: PASS. Any fixture-coverage failures from earlier tasks should -now be fixed (since fixtures were updated in Step 2 above). - -- [ ] **Step 8: Commit** - -```bash -git add migrations/ CHANGELOG.md VERSION package.json \ - .claude-plugin/plugin.json .claude-plugin/marketplace.json \ - skills/qa-plan/ skills/app-test/ \ - test/fixtures/ -git commit -m "chore(0.x.0): retire qa-plan + app-test, migration doc, version bump - -Wraps the shallow/deep QA split. qa-plan and app-test skills + their -artifacts are removed (their jobs moved to pdd-to-app-journeys, -app-test-cases, and app-ux-eval). Migration notes for in-flight opps -in migrations/. Version bumped via scripts/version-bump.sh. - -Spec: docs/superpowers/specs/2026-05-04-shallow-deep-qa-split-design.md" -``` - ---- - -## Task 9: End-to-end smoke against a test fixture - -**Goal:** Verify the new path runs end-to-end on a fixture opp before -shipping. Doesn't replace integration tests (those are CI-time); this -is a one-time confidence check that the wiring works. 
- -**Files:** -- Read: `test/fixtures/CRISPR-Test-001/...` (atomic-visit golden fixture) -- Possibly modify: fixture files to include the new artifacts - -- [ ] **Step 1: Pick the smallest existing fixture** - -Read: `test/fixtures/` — find the CRISPR-Test-001 atomic-visit fixture. - -- [ ] **Step 2: Verify or backfill the fixture's new artifacts** - -Confirm or create: -- `expected-journeys.md` -- `app-test-cases.yaml` (with at least one `is_smoke: true` per app) -- Sample `verdicts/app-ux-eval-deep.yaml` (passing) - -If any are missing, hand-write minimal versions matching the templates -from Tasks 1+2. - -- [ ] **Step 3: Run manifest validation against the fixture** - -Run: `npm test -- test/fixtures/artifact-manifest.test.ts` -Expected: PASS for the updated fixture. - -- [ ] **Step 4: Dry-run the new skills against the fixture** - -The dry-run paths from each skill write under `comms-log/dry-run-*`. -Confirm: -- `pdd-to-app-journeys` dry-run produces a non-empty journeys file -- `app-test-cases` dry-run produces yaml with at least the bindings - (recipes can be stubbed) -- `/ace:qa-deep` dry-run prints the planned dispatches without running them - -(If dry-run plumbing isn't wired in your skill body for a given step, -that's fine — we're checking inputs/outputs, not exhaustive dry-run -coverage.) - -- [ ] **Step 5: Commit fixture updates if any** - -```bash -git add test/fixtures/ -git commit -m "test(fixture): add new shallow-deep-qa artifacts to CRISPR-Test-001 - -expected-journeys.md, app-test-cases.yaml, app-ux-eval-deep verdict -sample. Lets manifest validation pass and provides a known-good fixture -for future regressions." 
-``` - ---- - -## Self-Review - -After writing this plan I checked it against the spec: - -**Spec coverage:** -- §1 Artifact ownership → Tasks 1, 2, 5 (artifact moves + manifest edits) -- §2 Skill changes (new/retired/changed) → Tasks 1–8 cover every entry -- §3 Shallow path (OCS + apps) → Tasks 5, 6 -- §4 Deep app UX rubric → Task 3 -- §5 /ace:qa-deep → Task 4 -- §6 Phase 7 deep-verdict gate → Task 7 -- Migration / rollout → Task 8 -- Open questions (1)–(4) noted in the spec are intentionally not - blockers; they get iterated post-ship. - -**Placeholder scan:** No `TBD` / `TODO` / "implement later" / "add validation". The dimension table is fully filled in (Task 3 step 2). Each skill body has its `## Process` section spelled out. Version numbers are intentionally `0.x.0` until Task 8 step 5 resolves the actual bump. - -**Type consistency:** -- Verdict file names align across tasks: `app-ux-eval-deep.yaml` (Tasks 3, 4, 7), `ocs-chatbot-eval-deep.yaml` (Tasks 4, 7), `app-screenshot-capture-shallow.yaml` (Task 5) -- Smoke flag spelled `is_smoke: true` consistently (Tasks 2, 5) -- Skill names: `pdd-to-app-journeys`, `app-test-cases`, `app-ux-eval` consistent across all tasks -- Artifact paths use `runs//` shape consistently (matches `lib/run-paths.ts` convention) - -**Order dependency check:** -- Tasks 1–4 are additive and don't break the running pipeline -- Task 5 swaps Phase 5 over to the new artifacts — depends on Tasks 1, 2, 3 having shipped first -- Task 6 thins OCS — independent of Tasks 1–5; can run in any order after Task 4 -- Task 7 wires Phase 7 — depends on Task 3 (verdict producer must exist) and Task 4 (gate references /ace:qa-deep in error messages) -- Task 8 retires dead code — must come last -- Task 9 verifies — must come last - -The plan is implementable end-to-end as written. Migration ordering preserves a working pipeline at every commit. 
diff --git a/docs/superpowers/plans/2026-05-05-app-multimedia-coverage.md b/docs/superpowers/plans/2026-05-05-app-multimedia-coverage.md deleted file mode 100644 index f05459fb..00000000 --- a/docs/superpowers/plans/2026-05-05-app-multimedia-coverage.md +++ /dev/null @@ -1,2105 +0,0 @@ -# app-multimedia-coverage Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Ship a manually-invoked ACE skill that takes Nova-built CommCare apps, uses an LLM judge to pick which fields deserve display-only images, generates them via Dimagi's Content Generator API, patches form XML and bundles assets into the CCZ, then re-builds and re-releases — closing the loop on multimedia attachment. - -**Architecture:** Sibling of `commcare-form-patch`. New skill `skills/app-multimedia-coverage/`, five new helper modules (`lib/multimedia-judge.ts`, `lib/content-generator-client.ts`, `lib/multimedia-manifest.ts`, `lib/multimedia-prompt-hash.ts`, `lib/multimedia-xform-patch.ts`), one new MCP atom (`commcare_upload_multimedia`), `.env.tpl` additions for Content Generator credentials, doctor env-drift checks, a smoke fixture, and a Nova feature request filed against `voidcraft-labs/nova-plugin`. Spec at `docs/superpowers/specs/2026-05-05-app-multimedia-coverage-design.md`. - -**Tech Stack:** TypeScript / vitest / Anthropic SDK (Sonnet 4.6) / Playwright for CCHQ I/O / Zod for schemas / @xmldom/xmldom (already in repo for form-XML manipulation). 
- ---- - -## Task 0: Branch verification & worktree confirmation - -**Files:** none - -- [ ] **Step 1: Verify branch and clean tree** - -```bash -git rev-parse --git-dir | grep -q worktrees && echo "in worktree: ok" -git status --short # expect empty -git log --oneline -1 # expect the design-spec commit (638a855 or later) -``` - -Expected: in worktree, clean tree, the spec commit at HEAD. - -- [ ] **Step 2: Read the spec** - -Read `docs/superpowers/specs/2026-05-05-app-multimedia-coverage-design.md` end-to-end before continuing. The plan below assumes you've read it. - ---- - -## Task 1: Probe the Content Generator API contract - -**Files:** -- Create: `scripts/probe-content-generator.ts` - -This is investigative; no test, no commit yet (the script gets committed alongside the client in Task 6 once the contract is documented). The goal is to discover: does the API return PNG bytes inline or a signed URL? What's the exact request body and auth header shape? - -- [ ] **Step 1: Pull credentials from 1Password** - -```bash -op item get "Content Generator API" --vault AI-Agents --account dimagi.1password.com --format json -``` - -Expected: JSON containing fields like `url`, `apikey` (or `credential`, etc.). Note the exact field names — they go into `.env.tpl` in Task 3. - -- [ ] **Step 2: Write the probe script** - -```typescript -// scripts/probe-content-generator.ts -// -// Probes Dimagi's Content Generator API to document the live contract. 
-// Purely investigative — outputs: -// - Request shape that worked -// - Response shape (Content-Type, body size, structure) -// - Total wall-clock for one image -// -// Run: npx tsx scripts/probe-content-generator.ts - -import { writeFileSync } from 'node:fs'; - -const URL = process.env.CONTENT_GENERATOR_URL!; -const KEY = process.env.CONTENT_GENERATOR_API_KEY!; -if (!URL || !KEY) { - console.error('Set CONTENT_GENERATOR_URL and CONTENT_GENERATOR_API_KEY'); - process.exit(1); -} - -const body = { - application_context: - 'Frontline workers in Africa teaching mothers to care for Small Vulnerable Newborns with Kangaroo Mother Care. Modestly dressed, representative of context.', - form_text: 'Show the mother how to support the baby\'s head and neck while skin-to-skin.', - image_directives: - 'Frontline worker assisting a mother holding a small newborn skin-to-skin against her chest, head supported, warm lighting.', -}; - -const t0 = Date.now(); -const res = await fetch(URL, { - method: 'POST', - headers: { - Authorization: `Bearer ${KEY}`, - 'Content-Type': 'application/json', - }, - body: JSON.stringify(body), -}); -const elapsed = Date.now() - t0; - -console.log({ status: res.status, contentType: res.headers.get('content-type'), elapsedMs: elapsed }); - -const buf = Buffer.from(await res.arrayBuffer()); -writeFileSync('/tmp/content-gen-probe-response.bin', buf); - -if (res.headers.get('content-type')?.startsWith('image/')) { - console.log('Response is image bytes inline. Saved to /tmp/content-gen-probe-response.bin (open to confirm).'); -} else if (res.headers.get('content-type')?.includes('json')) { - console.log('Response is JSON:', buf.toString('utf-8').slice(0, 500)); -} else { - console.log('Unexpected content type. 
Body bytes 0..200:', buf.slice(0, 200).toString()); -} -``` - -- [ ] **Step 3: Run the probe and capture findings** - -```bash -export CONTENT_GENERATOR_URL= -export CONTENT_GENERATOR_API_KEY= -npx tsx scripts/probe-content-generator.ts -``` - -If status is non-200, iterate on auth header (`Authorization: Bearer X` vs `X-API-Key: X` vs `?api_key=`), body wrapper, etc., until 200. - -- [ ] **Step 4: Document the contract** - -Append a top-of-file comment block to `scripts/probe-content-generator.ts` documenting the live contract verbatim: - -``` -// LIVE CONTRACT (probed YYYY-MM-DD): -// Method: POST -// URL: -// Auth:
: -// Request body: { application_context, form_text, image_directives } -// Response: -// Wall-clock: ~Xs low-res / ~Ys upscaled -``` - -This block becomes the source of truth for `lib/content-generator-client.ts` in Task 6. - ---- - -## Task 2: Probe the CCHQ multimedia upload endpoint - -**Files:** -- Create: `scripts/probe-multimedia-upload.ts` - -Same shape as Task 1: discover the live endpoint, document it, no commit yet. - -- [ ] **Step 1: Read existing CCHQ atoms for the auth pattern** - -Read `mcp/connect/backends/commcare.ts` lines 285–360 (the `patchXform` implementation). The probe script needs to use the same Playwright-session auth. - -- [ ] **Step 2: Write the probe script** - -```typescript -// scripts/probe-multimedia-upload.ts -// -// Probes CCHQ's multimedia upload endpoint to document the live contract. -// Uses the same authenticated Playwright session as commcare_patch_xform. -// -// Run: npx tsx scripts/probe-multimedia-upload.ts -// -// The endpoint is best-guess `/a//apps//multimedia/uploaded/`; -// CCHQ may use a different path. Iterate until 200. 
- -import { commcareClient } from '../mcp/connect/backends/commcare.js'; // adjust if needed -import { readFileSync } from 'node:fs'; - -const [, , domain, appId] = process.argv; -if (!domain || !appId) { - console.error('Usage: npx tsx scripts/probe-multimedia-upload.ts '); - process.exit(1); -} - -// 1x1 PNG (smallest valid PNG, ~67 bytes) -const TINY_PNG = Buffer.from( - 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=', - 'base64' -); - -const candidatePaths = [ - `/a/${domain}/apps/${appId}/multimedia/uploaded/`, - `/a/${domain}/apps/${appId}/multimedia_upload/`, - `/a/${domain}/multimedia/upload_multimedia/${appId}/`, - `/a/${domain}/apps/multimedia/${appId}/uploaded/`, -]; - -const client = await commcareClient(); -for (const path of candidatePaths) { - const form = new FormData(); - form.set('Filedata', new Blob([TINY_PNG], { type: 'image/png' }), 'probe.png'); - form.set('media_type', 'image'); - form.set('file_name', 'probe.png'); - - // Probe: try the path with a multipart POST through the existing session. - // Adjust the POST helper to match commcare.ts's actual API. - const res = await client.rawPost(path, form); // <-- helper to add if missing - console.log({ path, status: res.status, body: (await res.text()).slice(0, 300) }); - if (res.status === 200) break; -} -``` - -- [ ] **Step 3: Run against a test HQ project** - -Use an existing ACE smoke opp's HQ domain + app_id from a recent `2-commcare/app-deploy_summary.md`. Iterate the candidate paths and form-field names until one returns 200 with a multimedia_id-shaped response. - -- [ ] **Step 4: Document the contract** - -Top-of-file comment block in `scripts/probe-multimedia-upload.ts`: - -``` -// LIVE CONTRACT (probed YYYY-MM-DD against ): -// Method: POST -// Path: /a////... 
-// Content-Type: multipart/form-data -// Required fields: Filedata=, media_type=image, file_name= -// Optional fields: -// CSRF: required via X-CSRFToken header (per CCHQ standard) -// Auth: same Playwright session as patchXform (login_or_digest) -// Response: 200 application/json { multimedia_id, sha1, ... } -// Errors: 400 on bad media_type, 403 on csrf miss -``` - -This is the source of truth for `commcare_upload_multimedia` in Task 9. - ---- - -## Task 3: Add Content Generator credentials to `.env.tpl` - -**Files:** -- Modify: `.env.tpl` (append a new section after CommCare HQ block) - -- [ ] **Step 1: Append new section** - -Append to `.env.tpl` (use the field names discovered in Task 1 step 1): - -```bash -# ── Content Generator (image gen for app-multimedia-coverage) ─────── -# -# Dimagi's internal image-generation service (Cloud Run, Gemini-3-Flash). -# Used by the app-multimedia-coverage skill to attach display-only images -# to CommCare app questions. -# -# 1Password item: "Content Generator API" in AI-Agents vault. - -CONTENT_GENERATOR_URL=op://AI-Agents/Content Generator API/url -CONTENT_GENERATOR_API_KEY=op://AI-Agents/Content Generator API/credential -``` - -(Adjust `url` / `credential` to match the actual 1Password field names from Task 1 step 1.) - -- [ ] **Step 2: Regenerate local `.env`** - -```bash -op inject -i .env.tpl -o "$CLAUDE_PLUGIN_DATA/.env" --account dimagi.1password.com 2>&1 | tail -5 -# Verify new vars are present: -grep -c CONTENT_GENERATOR "$CLAUDE_PLUGIN_DATA/.env" # expect 2 -``` - -- [ ] **Step 3: Commit** - -```bash -git add .env.tpl -git commit -m "feat(env): add CONTENT_GENERATOR_URL / _API_KEY for multimedia skill" -``` - ---- - -## Task 4: `lib/multimedia-prompt-hash.ts` — content-addressed cache key - -**Files:** -- Create: `lib/multimedia-prompt-hash.ts` -- Test: `lib/multimedia-prompt-hash.test.ts` - -Pure function. Used to cache-skip image regeneration when inputs haven't changed. 
- -- [ ] **Step 1: Write the failing test** - -```typescript -// lib/multimedia-prompt-hash.test.ts -import { describe, it, expect } from 'vitest'; -import { promptHash } from './multimedia-prompt-hash.js'; - -describe('promptHash', () => { - it('returns the same hash for identical inputs', () => { - const a = promptHash({ appContext: 'X', formText: 'Y', directive: 'Z' }); - const b = promptHash({ appContext: 'X', formText: 'Y', directive: 'Z' }); - expect(a).toBe(b); - }); - - it('returns a different hash when any field changes', () => { - const base = { appContext: 'X', formText: 'Y', directive: 'Z' }; - const h = promptHash(base); - expect(promptHash({ ...base, appContext: 'X2' })).not.toBe(h); - expect(promptHash({ ...base, formText: 'Y2' })).not.toBe(h); - expect(promptHash({ ...base, directive: 'Z2' })).not.toBe(h); - }); - - it('is whitespace-insensitive on leading/trailing whitespace', () => { - const a = promptHash({ appContext: 'X', formText: 'Y', directive: 'Z' }); - const b = promptHash({ appContext: ' X ', formText: '\nY\n', directive: ' Z ' }); - expect(a).toBe(b); - }); - - it('treats null/undefined directive as the same as empty string', () => { - const a = promptHash({ appContext: 'X', formText: 'Y', directive: '' }); - const b = promptHash({ appContext: 'X', formText: 'Y', directive: null }); - const c = promptHash({ appContext: 'X', formText: 'Y', directive: undefined }); - expect(a).toBe(b); - expect(b).toBe(c); - }); - - it('returns a 64-char hex string (SHA-256)', () => { - const h = promptHash({ appContext: 'X', formText: 'Y', directive: 'Z' }); - expect(h).toMatch(/^[0-9a-f]{64}$/); - }); -}); -``` - -- [ ] **Step 2: Run test, verify it fails** - -```bash -npm test -- lib/multimedia-prompt-hash.test.ts -``` - -Expected: FAIL — module does not exist. 
- -- [ ] **Step 3: Implement** - -```typescript -// lib/multimedia-prompt-hash.ts -import { createHash } from 'node:crypto'; - -export interface PromptHashInput { - appContext: string; - formText: string; - directive: string | null | undefined; -} - -export function promptHash(input: PromptHashInput): string { - const norm = (s: string | null | undefined) => (s ?? '').trim(); - // Join with a NUL separator so field boundaries stay unambiguous - // (otherwise 'ab' + 'c' would hash the same as 'a' + 'bc'). - const payload = [norm(input.appContext), norm(input.formText), norm(input.directive)].join('\u0000'); - return createHash('sha256').update(payload, 'utf-8').digest('hex'); -} -``` - -- [ ] **Step 4: Run tests, verify pass** - -```bash -npm test -- lib/multimedia-prompt-hash.test.ts -``` - -Expected: PASS (all 5 tests). - -- [ ] **Step 5: Commit** - -```bash -git add lib/multimedia-prompt-hash.ts lib/multimedia-prompt-hash.test.ts -git commit -m "feat(lib): multimedia prompt-hash helper for cache-skip" -``` - ---- - -## Task 5: `lib/multimedia-manifest.ts` — Zod schema + I/O helpers - -**Files:** -- Create: `lib/multimedia-manifest.ts` -- Test: `lib/multimedia-manifest.test.ts` - -The manifest is the authoritative source of truth for what's been generated. Stored as YAML in Drive at `2-commcare/app-multimedia-coverage_manifest.yaml`. 
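How the prompt hash from Task 4 and the manifest from Task 5 could combine into a cache-skip decision is sketched below. The field names (`app`, `form_unique_id`, `field_id`, `prompt_hash`) come from the manifest sample in this plan; the `findCachedImage` function and the `FieldKey` type are hypothetical names introduced only for illustration, and the skill may wire the lookup differently.

```typescript
// Hypothetical cache-skip lookup. Field names mirror the manifest in this
// plan; findCachedImage and FieldKey are illustrative assumptions.

interface ManifestImageEntry {
  app: 'learn' | 'deliver';
  form_unique_id: string;
  field_id: string;
  prompt_hash: string;
  file_path: string;
}

interface FieldKey {
  app: 'learn' | 'deliver';
  form_unique_id: string;
  field_id: string;
}

// Returns the existing entry when the same field was already generated from
// identical inputs (same prompt hash); undefined means "generate again".
function findCachedImage(
  images: ManifestImageEntry[],
  key: FieldKey,
  currentPromptHash: string,
): ManifestImageEntry | undefined {
  return images.find(
    img =>
      img.app === key.app &&
      img.form_unique_id === key.form_unique_id &&
      img.field_id === key.field_id &&
      img.prompt_hash === currentPromptHash,
  );
}
```

Matching on the hash as well as the field identity means that editing the app context, form text, or directive invalidates the cached image for that field without touching any other entry.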
- -- [ ] **Step 1: Write the failing test** - -```typescript -// lib/multimedia-manifest.test.ts -import { describe, it, expect } from 'vitest'; -import { - multimediaManifestSchema, - parseManifest, - serializeManifest, - type MultimediaManifest, -} from './multimedia-manifest.js'; - -const sample: MultimediaManifest = { - app_context_hash: 'a'.repeat(64), - images: [ - { - app: 'learn', - form_unique_id: 'f'.repeat(32), - field_id: 'kmc_position_demo', - prompt_hash: 'b'.repeat(64), - file_path: - 'app-multimedia-coverage_generated/learn/ffffffffffffffffffffffffffffffff/kmc_position_demo__bbbbbbbb.png', - ccz_filename: 'kmc_position_demo.png', - cchq_multimedia_id: 'mm_123', - cchq_sha1: 'c'.repeat(40), - generated_at: '2026-05-05T20:00:00.000Z', - }, - ], -}; - -describe('multimediaManifestSchema', () => { - it('accepts a well-formed manifest', () => { - expect(multimediaManifestSchema.parse(sample)).toEqual(sample); - }); - - it('rejects an unknown app value', () => { - const bad = { ...sample, images: [{ ...sample.images[0], app: 'feedback' }] }; - expect(() => multimediaManifestSchema.parse(bad)).toThrow(); - }); - - it('rejects a non-32-char form_unique_id', () => { - const bad = { ...sample, images: [{ ...sample.images[0], form_unique_id: 'short' }] }; - expect(() => multimediaManifestSchema.parse(bad)).toThrow(); - }); - - it('round-trips through YAML serialize/parse', () => { - const yaml = serializeManifest(sample); - expect(parseManifest(yaml)).toEqual(sample); - }); -}); -``` - -- [ ] **Step 2: Run test, verify it fails** - -```bash -npm test -- lib/multimedia-manifest.test.ts -``` - -Expected: FAIL — module does not exist. 
- -- [ ] **Step 3: Implement** - -```typescript -// lib/multimedia-manifest.ts -import { z } from 'zod'; -import { dump as yamlDump, load as yamlLoad } from 'js-yaml'; - -export const multimediaImageSchema = z.object({ - app: z.enum(['learn', 'deliver']), - form_unique_id: z.string().regex(/^[0-9a-f]{32}$/, '32-char hex'), - field_id: z.string().min(1), - prompt_hash: z.string().regex(/^[0-9a-f]{64}$/, '64-char hex SHA-256'), - file_path: z.string().min(1), - ccz_filename: z.string().min(1), - cchq_multimedia_id: z.string().nullable(), - cchq_sha1: z.string().regex(/^[0-9a-f]{40}$/).nullable(), - generated_at: z.string().datetime(), -}); - -export const multimediaManifestSchema = z.object({ - app_context_hash: z.string().regex(/^[0-9a-f]{64}$/), - images: z.array(multimediaImageSchema), -}); - -export type MultimediaImage = z.infer<typeof multimediaImageSchema>; -export type MultimediaManifest = z.infer<typeof multimediaManifestSchema>; - -export function parseManifest(yaml: string): MultimediaManifest { - return multimediaManifestSchema.parse(yamlLoad(yaml)); -} - -export function serializeManifest(m: MultimediaManifest): string { - multimediaManifestSchema.parse(m); // throw on invalid - return yamlDump(m, { noRefs: true, lineWidth: 100 }); -} -``` - -- [ ] **Step 4: Run tests, verify pass** - -```bash -npm test -- lib/multimedia-manifest.test.ts -``` - -Expected: PASS. - -- [ ] **Step 5: Commit** - -```bash -git add lib/multimedia-manifest.ts lib/multimedia-manifest.test.ts -git commit -m "feat(lib): multimedia-manifest Zod schema + YAML I/O" -``` - ---- - -## Task 6: `lib/content-generator-client.ts` — typed wrapper for the API - -**Files:** -- Create: `lib/content-generator-client.ts` -- Test: `lib/content-generator-client.test.ts` - -Wrapper around the live contract documented in Task 1 step 4. 
- -- [ ] **Step 1: Write the failing test** - -```typescript -// lib/content-generator-client.test.ts -import { describe, it, expect, vi, beforeEach } from 'vitest'; -import { ContentGeneratorClient, ContentGeneratorAuthError } from './content-generator-client.js'; - -const PNG_MAGIC = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]); - -describe('ContentGeneratorClient', () => { - let fetchMock: ReturnType<typeof vi.fn>; - beforeEach(() => { - fetchMock = vi.fn(); - vi.stubGlobal('fetch', fetchMock); - }); - - it('returns PNG bytes on 200 image/png', async () => { - fetchMock.mockResolvedValueOnce( - new Response(PNG_MAGIC, { status: 200, headers: { 'content-type': 'image/png' } }), - ); - const c = new ContentGeneratorClient({ url: 'https://x.test/gen', apiKey: 'k' }); - const out = await c.generateImage({ applicationContext: 'A', formText: 'F', imageDirectives: 'D' }); - expect(out.subarray(0, 8)).toEqual(Buffer.from(PNG_MAGIC)); - }); - - it('sends Authorization Bearer header with the API key', async () => { - fetchMock.mockResolvedValueOnce( - new Response(PNG_MAGIC, { status: 200, headers: { 'content-type': 'image/png' } }), - ); - const c = new ContentGeneratorClient({ url: 'https://x.test/gen', apiKey: 'k123' }); - await c.generateImage({ applicationContext: 'A', formText: 'F' }); - expect(fetchMock.mock.calls[0][1].headers.Authorization).toBe('Bearer k123'); - }); - - it('retries once on 5xx then succeeds', async () => { - fetchMock - .mockResolvedValueOnce(new Response('fail', { status: 503 })) - .mockResolvedValueOnce( - new Response(PNG_MAGIC, { status: 200, headers: { 'content-type': 'image/png' } }), - ); - const c = new ContentGeneratorClient({ url: 'https://x.test/gen', apiKey: 'k', retryDelayMs: 1 }); - const out = await c.generateImage({ applicationContext: 'A', formText: 'F' }); - expect(out.subarray(0, 4).toString('hex')).toBe('89504e47'); - expect(fetchMock).toHaveBeenCalledTimes(2); - }); - - it('throws ContentGeneratorAuthError on 401/403', 
 async () => { - fetchMock.mockResolvedValueOnce(new Response('forbidden', { status: 403 })); - const c = new ContentGeneratorClient({ url: 'https://x.test/gen', apiKey: 'bad', retryDelayMs: 1 }); - await expect(c.generateImage({ applicationContext: 'A', formText: 'F' })).rejects.toBeInstanceOf( - ContentGeneratorAuthError, - ); - }); - - it('does not retry on 4xx (other than 408/429)', async () => { - fetchMock.mockResolvedValueOnce(new Response('bad request', { status: 400 })); - const c = new ContentGeneratorClient({ url: 'https://x.test/gen', apiKey: 'k', retryDelayMs: 1 }); - await expect(c.generateImage({ applicationContext: 'A', formText: 'F' })).rejects.toThrow(); - expect(fetchMock).toHaveBeenCalledTimes(1); - }); -}); -``` - -- [ ] **Step 2: Run test, verify it fails** - -```bash -npm test -- lib/content-generator-client.test.ts -``` - -Expected: FAIL — module does not exist. - -- [ ] **Step 3: Implement (adapt to live contract from Task 1)** - -```typescript -// lib/content-generator-client.ts -// -// Wrapper around Dimagi's internal Content Generator API. Live contract -// documented in scripts/probe-content-generator.ts. - -export class ContentGeneratorAuthError extends Error { - constructor(public status: number, body: string) { - super(`Content Generator auth failed (${status}): ${body.slice(0, 200)}`); - this.name = 'ContentGeneratorAuthError'; - } -} - -export class ContentGeneratorClient { - constructor( - private opts: { - url: string; - apiKey: string; - timeoutMs?: number; // default 60_000 - retryDelayMs?: number; // default 1_000 - }, - ) {} - - async generateImage(input: { - applicationContext: string; - formText: string; - imageDirectives?: string; - }): Promise<Buffer> { - const body = { - application_context: input.applicationContext, - form_text: input.formText, - image_directives: input.imageDirectives ?? 
'', - }; - - const attempt = async (): Promise<Response> => { - const ac = new AbortController(); - const t = setTimeout(() => ac.abort(), this.opts.timeoutMs ?? 60_000); - try { - return await fetch(this.opts.url, { - method: 'POST', - headers: { - Authorization: `Bearer ${this.opts.apiKey}`, - 'Content-Type': 'application/json', - }, - body: JSON.stringify(body), - signal: ac.signal, - }); - } finally { - clearTimeout(t); - } - }; - - let res = await attempt(); - if (res.status >= 500 || res.status === 408 || res.status === 429) { - await new Promise(r => setTimeout(r, this.opts.retryDelayMs ?? 1_000)); - res = await attempt(); - } - - if (res.status === 401 || res.status === 403) { - throw new ContentGeneratorAuthError(res.status, await res.text()); - } - if (res.status !== 200) { - throw new Error(`Content Generator HTTP ${res.status}: ${(await res.text()).slice(0, 300)}`); - } - - const ct = res.headers.get('content-type') ?? ''; - if (ct.startsWith('image/')) { - return Buffer.from(await res.arrayBuffer()); - } - if (ct.includes('json')) { - // Live contract may return {url: signed} — fetch it inline. - const j = await res.json(); - if (typeof j?.url === 'string') { - const r2 = await fetch(j.url); - if (r2.status !== 200) throw new Error(`signed URL fetch ${r2.status}`); - return Buffer.from(await r2.arrayBuffer()); - } - throw new Error(`Content Generator JSON response had no .url: ${JSON.stringify(j).slice(0, 200)}`); - } - throw new Error(`Content Generator unexpected content-type: ${ct}`); - } -} -``` - -- [ ] **Step 4: Run tests, verify pass** - -```bash -npm test -- lib/content-generator-client.test.ts -``` - -Expected: PASS. 
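The retry rule in `generateImage` can be hard to read off the control flow, so here is a minimal sketch of it as a pure function. `classifyStatus` and `Outcome` are hypothetical names introduced only for illustration; they are not part of the client above.

```typescript
// Sketch only: `classifyStatus` and `Outcome` are hypothetical, not part of the client.
type Outcome = 'ok' | 'retry' | 'auth-error' | 'fail';

// `isRetry` is true when the status comes from the second (and last) attempt.
function classifyStatus(status: number, isRetry: boolean): Outcome {
  if (status === 200) return 'ok';
  // Transient failures (5xx, 408, 429) get exactly one more attempt.
  const transient = status >= 500 || status === 408 || status === 429;
  if (transient && !isRetry) return 'retry';
  // 401/403 map to ContentGeneratorAuthError in the client.
  if (status === 401 || status === 403) return 'auth-error';
  // Everything else, including a transient status on the retry, is a plain error.
  return 'fail';
}
```

Keeping the decision in one place like this would make the "retry once, never on other 4xx" behavior testable without mocking `fetch`, though the plan's tests cover the same cases through the client.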
- -- [ ] **Step 5: Commit (probe + client together)** - -```bash -git add lib/content-generator-client.ts lib/content-generator-client.test.ts scripts/probe-content-generator.ts -git commit -m "feat(lib): content-generator-client + probe script" -``` - ---- - -## Task 7: `lib/multimedia-judge.ts` — LLM judge for "image-worthy" fields - -**Files:** -- Create: `lib/multimedia-judge.ts` -- Test: `lib/multimedia-judge.test.ts` - -Single Anthropic SDK call per field. App Context goes in an `ephemeral`-cached block. Sonnet 4.6. - -- [ ] **Step 1: Write the failing test** - -```typescript -// lib/multimedia-judge.test.ts -import { describe, it, expect, vi } from 'vitest'; -import { judgeField, type JudgeInput } from './multimedia-judge.js'; - -const fakeAnthropic = (responseText: string) => ({ - messages: { - create: vi.fn().mockResolvedValue({ - content: [{ type: 'text', text: responseText }], - usage: { input_tokens: 10, output_tokens: 5 }, - }), - }, -}); - -const baseInput: JudgeInput = { - appContext: 'African FLWs teaching mothers KMC for SVN newborns. 
Modestly dressed.', - appType: 'learn', - formName: 'KMC positioning', - formPosition: 'module 1, form 0 (instructional)', - field: { - id: 'kmc_position_demo', - kind: 'label', - label: "Show the mother how to support the baby's head and neck while skin-to-skin.", - hint: null, - options: [], - }, - surroundingFields: [], -}; - -describe('judgeField', () => { - it('parses a valid yes-self-use response', async () => { - const fake = fakeAnthropic( - JSON.stringify({ - generate: true, - use_case: 'flw_self_use', - why: 'FLW uses this to demonstrate KMC positioning.', - directive: 'Frontline worker assisting a mother holding a small newborn skin-to-skin.', - }), - ); - const out = await judgeField(baseInput, fake as any); - expect(out.generate).toBe(true); - expect(out.use_case).toBe('flw_self_use'); - }); - - it('parses a valid no response', async () => { - const fake = fakeAnthropic(JSON.stringify({ generate: false, why: 'numeric input', directive: null })); - const out = await judgeField(baseInput, fake as any); - expect(out.generate).toBe(false); - }); - - it('throws on schema-invalid LLM output', async () => { - const fake = fakeAnthropic(JSON.stringify({ generate: 'maybe', why: 42 })); - await expect(judgeField(baseInput, fake as any)).rejects.toThrow(); - }); - - it('throws on non-JSON LLM output', async () => { - const fake = fakeAnthropic('I am sorry, I cannot'); - await expect(judgeField(baseInput, fake as any)).rejects.toThrow(); - }); - - it('places appContext in a cache_control:ephemeral block', async () => { - const fake = fakeAnthropic( - JSON.stringify({ generate: false, why: 'x', directive: null }), - ); - await judgeField(baseInput, fake as any); - const callArgs = fake.messages.create.mock.calls[0][0]; - const sysBlocks = Array.isArray(callArgs.system) ? 
callArgs.system : []; - const ephemeral = sysBlocks.find((b: any) => b.cache_control?.type === 'ephemeral'); - expect(ephemeral).toBeDefined(); - expect(ephemeral.text).toContain(baseInput.appContext); - }); -}); -``` - -- [ ] **Step 2: Run test, verify it fails** - -```bash -npm test -- lib/multimedia-judge.test.ts -``` - -Expected: FAIL — module does not exist. - -- [ ] **Step 3: Implement** - -```typescript -// lib/multimedia-judge.ts -import { z } from 'zod'; -import type Anthropic from '@anthropic-ai/sdk'; - -export const judgeOutputSchema = z.object({ - generate: z.boolean(), - use_case: z.enum(['flw_self_use', 'flw_shows_client', 'both']).optional().nullable(), - why: z.string().min(1).max(500), - directive: z.string().max(800).nullable(), -}); - -export type JudgeOutput = z.infer<typeof judgeOutputSchema>; - -export interface JudgeInput { - appContext: string; - appType: 'learn' | 'deliver'; - formName: string; - formPosition: string; - field: { - id: string; - kind: string; - label: string; - hint: string | null; - options: string[]; - }; - surroundingFields: Array<{ id: string; kind: string; label: string }>; -} - -const SYSTEM_HEAD = `You decide whether to generate a display-only image for a single CommCare app question. - -Criterion (yes if EITHER applies): -1. The frontline worker (FLW) would use this image themselves to do their job — e.g. a step-by-step demonstration, a labeled diagram of an anatomy or device. -2. The FLW would show the image to a client to communicate something — e.g. a visual choice card, a "what does X look like" reference. - -Skip if the question is purely numeric (weight, age), date/time, or a yes/no without ambiguity. Skip if the question's text alone is unambiguous and concrete. 
- -Return STRICT JSON only, matching this schema: -{ - "generate": boolean, - "use_case": "flw_self_use" | "flw_shows_client" | "both" | null, - "why": "short rationale, ≤200 chars", - "directive": "draft Image Directive for the generator, ≤500 chars, or null if generate=false" -} - -Image Directive guidance: be specific about the subject, action, environment, lighting, and any modesty/representation cues from the application context. The directive will be passed verbatim to an image generator.`; - -export async function judgeField( - input: JudgeInput, - anthropic: Anthropic, - model = 'claude-sonnet-4-6', -): Promise<JudgeOutput> { - const userPayload = { - app_type: input.appType, - form_name: input.formName, - form_position: input.formPosition, - field: input.field, - surrounding_fields: input.surroundingFields, - }; - - const res = await anthropic.messages.create({ - model, - max_tokens: 600, - system: [ - { type: 'text', text: SYSTEM_HEAD }, - { - type: 'text', - text: `Application Context (constant for this opp):\n${input.appContext}`, - cache_control: { type: 'ephemeral' }, - }, - ], - messages: [{ role: 'user', content: JSON.stringify(userPayload) }], - }); - - const text = (res.content[0] as { type: string; text?: string }).text ?? ''; - const trimmed = text.trim().replace(/^```(?:json)?\s*|\s*```$/g, ''); - let parsed: unknown; - try { - parsed = JSON.parse(trimmed); - } catch { - throw new Error(`judge returned non-JSON: ${text.slice(0, 200)}`); - } - return judgeOutputSchema.parse(parsed); -} -``` - -- [ ] **Step 4: Run tests, verify pass** - -```bash -npm test -- lib/multimedia-judge.test.ts -``` - -Expected: PASS (5 tests). 
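To make the parse-and-validate tail of `judgeField` easier to reason about in isolation, the sketch below reproduces it without the SDK or zod. Hand-written checks stand in for `judgeOutputSchema`, and `parseJudgeReply` and `JudgeReply` are hypothetical names; treat this as an illustration of the behavior the five tests pin down, not as the module itself.

```typescript
// Sketch only: hand-rolled checks stand in for the zod schema.
// `parseJudgeReply` and `JudgeReply` are hypothetical names.
interface JudgeReply {
  generate: boolean;
  use_case?: 'flw_self_use' | 'flw_shows_client' | 'both' | null;
  why: string;
  directive: string | null;
}

const USE_CASES = ['flw_self_use', 'flw_shows_client', 'both'];
// Built at runtime so the pattern matches a Markdown code fence (three backticks),
// mirroring the fence-stripping regex in judgeField.
const FENCE = '`'.repeat(3);
const FENCE_RE = new RegExp(`^${FENCE}(?:json)?\\s*|\\s*${FENCE}$`, 'g');

function parseJudgeReply(text: string): JudgeReply {
  const trimmed = text.trim().replace(FENCE_RE, '');
  let parsed: any;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    throw new Error(`judge returned non-JSON: ${text.slice(0, 200)}`);
  }
  if (typeof parsed?.generate !== 'boolean') throw new Error('generate must be a boolean');
  if (typeof parsed.why !== 'string' || parsed.why.length < 1 || parsed.why.length > 500) {
    throw new Error('why must be a string of 1 to 500 chars');
  }
  if (parsed.directive !== null && (typeof parsed.directive !== 'string' || parsed.directive.length > 800)) {
    throw new Error('directive must be null or a string of at most 800 chars');
  }
  if (parsed.use_case != null && !USE_CASES.includes(parsed.use_case)) {
    throw new Error('use_case is not one of the allowed values');
  }
  return parsed as JudgeReply;
}
```

A reply wrapped in a JSON code fence parses the same as a bare one, while `{ generate: 'maybe', why: 42 }` and free-form apologies both throw, which matches the schema-invalid and non-JSON test cases.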
- -- [ ] **Step 5: Commit** - -```bash -git add lib/multimedia-judge.ts lib/multimedia-judge.test.ts -git commit -m "feat(lib): multimedia-judge LLM rubric for image-worthy fields" -``` - ---- - -## Task 8: `lib/multimedia-xform-patch.ts` — add `` itext to a form - -**Files:** -- Create: `lib/multimedia-xform-patch.ts` -- Test: `lib/multimedia-xform-patch.test.ts` -- Test fixture: `test/fixtures/cchq/multimedia-sample-form.xml` (a minimal CommCare XForm with a label question, 30-50 lines) - -Pure XML-DOM manipulation. No I/O. - -- [ ] **Step 1: Create the test fixture** - -```xml - - - - - KMC positioning - - - - - - - - - - Show the mother how to support the baby's head and neck. - - - - - - - - - - -``` - -- [ ] **Step 2: Write the failing test** - -```typescript -// lib/multimedia-xform-patch.test.ts -import { describe, it, expect } from 'vitest'; -import { readFileSync } from 'node:fs'; -import { join } from 'node:path'; -import { addImageItext } from './multimedia-xform-patch.js'; - -const FIXTURE = readFileSync( - join(__dirname, '../test/fixtures/cchq/multimedia-sample-form.xml'), - 'utf-8', -); - -describe('addImageItext', () => { - it('adds an jr:// value to the matching itext text node', () => { - const out = addImageItext(FIXTURE, [ - { fieldId: 'kmc_position_demo', cczFilename: 'kmc_position_demo.png' }, - ]); - expect(out.patched).toBe(true); - expect(out.xml).toContain('jr://file/commcare/image/kmc_position_demo.png'); - // The original label value must remain intact. 
- expect(out.xml).toContain("Show the mother how to support the baby's head and neck."); - }); - - it('is idempotent — re-applying does not duplicate the entry', () => { - const once = addImageItext(FIXTURE, [ - { fieldId: 'kmc_position_demo', cczFilename: 'kmc_position_demo.png' }, - ]); - const twice = addImageItext(once.xml, [ - { fieldId: 'kmc_position_demo', cczFilename: 'kmc_position_demo.png' }, - ]); - const occurrences = (twice.xml.match(/jr:\/\/file\/commcare\/image\/kmc_position_demo\.png/g) ?? []).length; - expect(occurrences).toBe(1); - expect(twice.patched).toBe(false); - }); - - it('returns patched=false when the field has no matching itext entry', () => { - const out = addImageItext(FIXTURE, [{ fieldId: 'no_such_field', cczFilename: 'x.png' }]); - expect(out.patched).toBe(false); - }); - - it('handles multiple fields in one pass', () => { - // Build a form with two label-text entries - const twoFieldForm = FIXTURE.replace( - /[\s\S]*?<\/text>/, - `AB`, - ); - const out = addImageItext(twoFieldForm, [ - { fieldId: 'a', cczFilename: 'a.png' }, - { fieldId: 'b', cczFilename: 'b.png' }, - ]); - expect(out.xml).toContain('jr://file/commcare/image/a.png'); - expect(out.xml).toContain('jr://file/commcare/image/b.png'); - }); -}); -``` - -- [ ] **Step 3: Run test, verify it fails** - -```bash -npm test -- lib/multimedia-xform-patch.test.ts -``` - -Expected: FAIL — module does not exist. - -- [ ] **Step 4: Check available XML library** - -```bash -grep -E '"@xmldom/xmldom"|"xmldom"|"fast-xml-parser"' package.json -``` - -If `@xmldom/xmldom` is present, use it. Otherwise `npm install --save @xmldom/xmldom` then commit `package.json` + `package-lock.json` changes alongside the implementation in step 6. - -- [ ] **Step 5: Implement** - -```typescript -// lib/multimedia-xform-patch.ts -// -// Pure XML transformation: given a CommCare XForm and a list of -// (fieldId, cczFilename) pairs, add a `jr://...` -// child to the matching `` node in itext. 
-// Idempotent: skips fields whose value is already present. - -import { DOMParser, XMLSerializer } from '@xmldom/xmldom'; - -export interface ImageBinding { - fieldId: string; - cczFilename: string; -} - -export interface PatchResult { - patched: boolean; - xml: string; - applied: string[]; // field ids that were modified - skipped: string[]; // field ids whose itext was already up-to-date - notFound: string[]; // field ids with no matching itext text -} - -export function addImageItext(xml: string, bindings: ImageBinding[]): PatchResult { - const doc = new DOMParser().parseFromString(xml, 'text/xml'); - const applied: string[] = []; - const skipped: string[] = []; - const notFound: string[] = []; - - // Find every node anywhere; loose match handles - // multi-translation forms (each has its own copy). - const texts = Array.from(doc.getElementsByTagName('text')); - - for (const b of bindings) { - const targetId = `${b.fieldId}-label`; - const matches = texts.filter(t => t.getAttribute('id') === targetId); - if (matches.length === 0) { - notFound.push(b.fieldId); - continue; - } - - const jrUrl = `jr://file/commcare/image/${b.cczFilename}`; - let modifiedThisField = false; - for (const t of matches) { - const existing = Array.from(t.getElementsByTagName('value')).some( - v => v.getAttribute('form') === 'image' && (v.textContent ?? '').trim() === jrUrl, - ); - if (existing) continue; - - const valueEl = doc.createElement('value'); - valueEl.setAttribute('form', 'image'); - valueEl.appendChild(doc.createTextNode(jrUrl)); - t.appendChild(valueEl); - modifiedThisField = true; - } - - if (modifiedThisField) applied.push(b.fieldId); - else skipped.push(b.fieldId); - } - - const out = new XMLSerializer().serializeToString(doc); - return { patched: applied.length > 0, xml: out, applied, skipped, notFound }; -} -``` - -- [ ] **Step 6: Run tests, verify pass** - -```bash -npm test -- lib/multimedia-xform-patch.test.ts -``` - -Expected: PASS (4 tests). 
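For readers without the fixture at hand, the following sketch shows the itext shape the patch is aiming for, using plain string handling instead of `@xmldom/xmldom`. It is deliberately a stand-in: `appendImageValue` is a hypothetical helper, the `<text>` fragment is illustrative rather than copied from the real fixture, and the real serializer may format the output differently.

```typescript
// Sketch only: string-level stand-in for the DOM patch, to show the target itext shape.
// `appendImageValue` is hypothetical; the fragment below is illustrative, not the real fixture.
const jrImageUrl = (cczFilename: string): string => `jr://file/commcare/image/${cczFilename}`;

function appendImageValue(textXml: string, cczFilename: string): { xml: string; patched: boolean } {
  const imageValue = `<value form="image">${jrImageUrl(cczFilename)}</value>`;
  // Idempotency: an identical image value means there is nothing to do.
  if (textXml.includes(imageValue)) return { xml: textXml, patched: false };
  // Append as the last child of <text>, as the DOM version does with appendChild.
  const closeIdx = textXml.lastIndexOf('</text>');
  if (closeIdx === -1) return { xml: textXml, patched: false };
  return { xml: textXml.slice(0, closeIdx) + imageValue + textXml.slice(closeIdx), patched: true };
}

const before =
  `<text id="kmc_position_demo-label"><value>Show the mother how to support the baby's head and neck.</value></text>`;
const once = appendImageValue(before, 'kmc_position_demo.png');
const twice = appendImageValue(once.xml, 'kmc_position_demo.png');
```

Here `once.patched` is `true` and `twice.patched` is `false`, with `twice.xml` identical to `once.xml`; that is the same contract the idempotency test asserts against `addImageItext`.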
- -- [ ] **Step 7: Commit** - -```bash -git add lib/multimedia-xform-patch.ts lib/multimedia-xform-patch.test.ts test/fixtures/cchq/multimedia-sample-form.xml -# include package.json/lock if dependency was added -git commit -m "feat(lib): multimedia-xform-patch — add itext entries" -``` - ---- - -## Task 9: `commcare_upload_multimedia` MCP atom - -**Files:** -- Modify: `mcp/connect/backends/commcare.ts` (add `uploadMultimedia` method) -- Modify: `mcp/connect/capability-map.ts` (add capability) -- Test: `test/mcp/connect/unit/commcare-upload-multimedia.test.ts` -- Integration test: `test/mcp/connect/integration/commcare-upload-multimedia.test.ts` - -**Live contract** (probed 2026-05-05; see `scripts/probe-multimedia-upload.ts` header): - -- **Method**: `POST` -- **Path**: `/a/<domain>/apps/<app_id>/multimedia/uploaded/<media_type>/` where `<media_type>` ∈ `{image, audio, video, text}` derived from the `content_type` MIME prefix -- **Body**: multipart/form-data; **required** `Filedata` (bytes) and `path` (`jr://file/commcare/<media_type>/<name>.<ext>`); **optional** `originalPath`, `shared='t'`, `license`, `author`, `attribution-notes` -- **Headers**: `X-CSRFToken` (from session cookie), `Referer: <base_url>/a/<domain>/apps/view/<app_id>/` -- **Response 200**: `Content-Type: text/html` (lies; body is JSON): - ```json - { - "ref": { - "path": "jr://file/commcare/image/foo.png", - "uid": "<32-hex md5>", // → file_hash_md5 (CCHQ dedupes on this) - "m_id": "<32-hex couch _id>", // → multimedia_id - "url": "/hq/multimedia/file/CommCareImage/<m_id>/", - "updated": false, - "media_type": "Image" - }, - "errors": [] - } - ``` -- **Failure**: `400` with non-empty `errors[]`; `302 → /accounts/login/` on session expiry; `403` on CSRF miss - -**CRITICAL gotcha — orphan pruning.** CCHQ's `clean_paths()` strips multimedia entries that no form references on the next `make_build`. 
So this atom alone does NOT bundle the file into the released CCZ — the form-XML must already reference the `jr://...` path before `make_build` runs. 
The skill (Task 12) is responsible for ordering: patch form XML → upload media → make_build → release. The atom only owns the upload step. - -- [ ] **Step 1: Write the failing unit test** - -```typescript -// test/mcp/connect/unit/commcare-upload-multimedia.test.ts -import { describe, it, expect, vi } from 'vitest'; -import { CommcareBackend } from '../../../../mcp/connect/backends/commcare.js'; - -function fakeRequest(handler: (url: string, init: any) => { status: number; body: string; contentType?: string }) { - return { - get: vi.fn().mockImplementation(async () => ({ - status: () => 200, text: async () => '', headers: () => new Headers(), - })), - post: vi.fn().mockImplementation(async (url: string, init: any) => { - const r = handler(url, init); - const headers = new Headers({ 'content-type': r.contentType ?? 'text/html' }); - return { status: () => r.status, text: async () => r.body, headers: () => headers }; - }), - storageState: async () => ({ cookies: [{ name: 'csrftoken', value: 'TOKEN' }] }), - }; -} - -const SUCCESS_BODY = JSON.stringify({ - ref: { - path: 'jr://file/commcare/image/x.png', - uid: 'd'.repeat(32), // md5 hex - m_id: '9'.repeat(32), - url: '/hq/multimedia/file/CommCareImage/' + '9'.repeat(32) + '/', - updated: false, - media_type: 'Image', - }, - errors: [], -}); - -describe('commcare uploadMultimedia', () => { - it('POSTs to /multimedia/uploaded/image/ for image content types', async () => { - let postedUrl = ''; - const fake = fakeRequest((url) => { - postedUrl = url; - return { status: 200, body: SUCCESS_BODY }; - }); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - await backend.uploadMultimedia({ - domain: 'demo', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/image/x.png', - file_bytes: Buffer.from('PNG'), content_type: 'image/png', - }); - expect(postedUrl).toBe('https://test.cchq/a/demo/apps/' + 'a'.repeat(32) + '/multimedia/uploaded/image/'); - }); - - it('returns 
multimedia_id (m_id) and file_hash_md5 (uid) from ref', async () => { - const fake = fakeRequest(() => ({ status: 200, body: SUCCESS_BODY })); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - const out = await backend.uploadMultimedia({ - domain: 'demo', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/image/x.png', - file_bytes: Buffer.from('PNG'), content_type: 'image/png', - }); - expect(out.multimedia_id).toBe('9'.repeat(32)); - expect(out.file_hash_md5).toBe('d'.repeat(32)); - }); - - it('routes audio content types to /multimedia/uploaded/audio/', async () => { - let postedUrl = ''; - const fake = fakeRequest((url) => { - postedUrl = url; - return { status: 200, body: SUCCESS_BODY }; - }); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - await backend.uploadMultimedia({ - domain: 'd', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/audio/x.mp3', - file_bytes: Buffer.from('MP3'), content_type: 'audio/mpeg', - }); - expect(postedUrl).toMatch(/\/multimedia\/uploaded\/audio\/$/); - }); - - it('sets X-CSRFToken from cookies and uses the app-view page as Referer', async () => { - let init: any = null; - const fake = fakeRequest((_url, _init) => { - init = _init; - return { status: 200, body: SUCCESS_BODY }; - }); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - await backend.uploadMultimedia({ - domain: 'demo', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/image/x.png', - file_bytes: Buffer.from('PNG'), content_type: 'image/png', - }); - expect(init.headers['X-CSRFToken']).toBe('TOKEN'); - expect(init.headers.Referer).toBe('https://test.cchq/a/demo/apps/view/' + 'a'.repeat(32) + '/'); - }); - - it('throws with errors[] payload when CCHQ returns 400', async () => { - const fake = fakeRequest(() => ({ - status: 400, - body: JSON.stringify({ ref: null, errors: ['File extension does not match 
content_type'] }), - })); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - await expect( - backend.uploadMultimedia({ - domain: 'd', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/image/x.png', - file_bytes: Buffer.from('x'), content_type: 'image/png', - }), - ).rejects.toThrow(/extension does not match/); - }); - - it('throws on 302 redirect (session expired)', async () => { - const fake = fakeRequest(() => ({ status: 302, body: 'login' })); - const backend = new CommcareBackend({ request: fake as any, baseUrl: 'https://test.cchq' }); - await expect( - backend.uploadMultimedia({ - domain: 'd', app_id: 'a'.repeat(32), - media_path: 'jr://file/commcare/image/x.png', - file_bytes: Buffer.from('x'), content_type: 'image/png', - }), - ).rejects.toThrow(/302|session/i); - }); -}); -``` - -- [ ] **Step 2: Run test, verify it fails** - -```bash -npm test -- test/mcp/connect/unit/commcare-upload-multimedia.test.ts -``` - -Expected: FAIL — `uploadMultimedia` does not exist. - -- [ ] **Step 3: Add the backend method** - -In `mcp/connect/backends/commcare.ts`, add (place after `patchXform` around line 360+): - -```typescript -export interface UploadMultimediaArgs { - domain: string; - app_id: string; - media_path: string; // jr://file/commcare//. - file_bytes: Buffer; - content_type: string; // "image/png" | "image/jpeg" | "audio/mpeg" | ... 
-} - -export interface UploadMultimediaResult { - multimedia_id: string; // CouchDB doc _id (CCHQ's ref.m_id) - file_hash_md5: string; // md5 hex of the file bytes (CCHQ's ref.uid) -} - -// Inside CommcareBackend class: -async uploadMultimedia(args: UploadMultimediaArgs): Promise<UploadMultimediaResult> { - const mediaType = mediaTypeFromContentType(args.content_type); - const path = `/a/${args.domain}/apps/${args.app_id}/multimedia/uploaded/${mediaType}/`; - const refreshPath = `/a/${args.domain}/apps/view/${args.app_id}/`; - - // Refresh CSRF + session via the app view page (same pattern as patchXform). - await this.opts.request.get(`${this.opts.baseUrl}${refreshPath}`); - const csrf = await this.csrfFromCookies(); - - // Derive filename from media_path (last URI segment). - const filename = args.media_path.split('/').pop() ?? 'unnamed'; - - const form = new FormData(); - form.set('Filedata', new Blob([args.file_bytes], { type: args.content_type }), filename); - form.set('path', args.media_path); - - const res = await this.opts.request.post(`${this.opts.baseUrl}${path}`, { - multipart: form as any, - headers: { - 'X-CSRFToken': csrf ?? '', - Referer: `${this.opts.baseUrl}${refreshPath}`, - }, - maxRedirects: 0, - }); - const status = res.status(); - const body = await res.text(); - - if (status === 302) { - throw new Error( - `commcare_upload_multimedia POST ${path} returned 302 — session expired. Re-run /ace:connect-login.`, - ); - } - if (status !== 200) { - let errs: string[] = []; - try { - const j = JSON.parse(body); - if (Array.isArray(j?.errors)) errs = j.errors; - } catch { /* fall through */ } - const errMsg = errs.length ? 
errs.join('; ') : body.slice(0, 300); - throw new Error( - `commcare_upload_multimedia POST ${path} returned ${status}: ${errMsg}`, - ); - } - - let parsed: { ref?: { m_id?: string; uid?: string }; errors?: string[] } = {}; - try { - parsed = JSON.parse(body); - } catch { - throw new Error(`commcare_upload_multimedia non-JSON response: ${body.slice(0, 200)}`); - } - if (parsed.errors && parsed.errors.length > 0) { - throw new Error(`commcare_upload_multimedia errors: ${parsed.errors.join('; ')}`); - } - if (!parsed.ref?.m_id || !parsed.ref?.uid) { - throw new Error( - `commcare_upload_multimedia response missing ref.m_id / ref.uid: ${body.slice(0, 200)}`, - ); - } - return { - multimedia_id: parsed.ref.m_id, - file_hash_md5: parsed.ref.uid, - }; -} - -function mediaTypeFromContentType(ct: string): 'image' | 'audio' | 'video' | 'text' { - if (ct.startsWith('image/')) return 'image'; - if (ct.startsWith('audio/')) return 'audio'; - if (ct.startsWith('video/')) return 'video'; - if (ct.startsWith('text/')) return 'text'; - throw new Error(`commcare_upload_multimedia: unsupported content_type ${ct}`); -} -``` - -- [ ] **Step 4: Run tests, verify pass** - -```bash -npm test -- test/mcp/connect/unit/commcare-upload-multimedia.test.ts -``` - -Expected: PASS (6 tests). - -- [ ] **Step 5: Add integration test** - -```typescript -// test/mcp/connect/integration/commcare-upload-multimedia.test.ts -// -// Verifies the upload atom against a live CCHQ project. NOTE: this test -// only verifies the upload step (200 + parseable response). It does NOT -// build or release because orphan multimedia (no form reference) gets -// pruned by clean_paths() — that's expected CCHQ behavior, not a bug. -// Bundling-into-CCZ is the skill's responsibility, exercised end-to-end -// in Task 14's live smoke test. 
-import { describe, it, expect } from 'vitest'; -import { commcareClient } from '../../../../mcp/connect/backends/commcare.js'; - -const RUN = process.env.CONNECT_INTEGRATION === '1'; - -const TINY_PNG = Buffer.from( - 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=', - 'base64', -); - -describe.skipIf(!RUN)('commcare_upload_multimedia (integration)', () => { - it('uploads a tiny PNG and returns multimedia_id + file_hash_md5', async () => { - const domain = process.env.ACE_HQ_DOMAIN!; - const appId = process.env.ACE_SMOKE_APP_ID!; - expect(domain).toBeTruthy(); - expect(appId).toBeTruthy(); - - const c = await commcareClient(); - const filename = `probe-${Date.now()}.png`; - const out = await c.uploadMultimedia({ - domain, app_id: appId, - media_path: `jr://file/commcare/image/${filename}`, - file_bytes: TINY_PNG, content_type: 'image/png', - }); - expect(out.multimedia_id).toMatch(/^[0-9a-f]{32}$/); - expect(out.file_hash_md5).toMatch(/^[0-9a-f]{32}$/); - }, 30_000); - - it('is idempotent — same bytes return the same multimedia_id', async () => { - const domain = process.env.ACE_HQ_DOMAIN!; - const appId = process.env.ACE_SMOKE_APP_ID!; - const c = await commcareClient(); - const filename = `probe-idem-${Date.now()}.png`; - const a = await c.uploadMultimedia({ - domain, app_id: appId, - media_path: `jr://file/commcare/image/${filename}`, - file_bytes: TINY_PNG, content_type: 'image/png', - }); - const b = await c.uploadMultimedia({ - domain, app_id: appId, - media_path: `jr://file/commcare/image/${filename}`, - file_bytes: TINY_PNG, content_type: 'image/png', - }); - expect(b.multimedia_id).toBe(a.multimedia_id); - expect(b.file_hash_md5).toBe(a.file_hash_md5); - }, 30_000); -}); -``` - -- [ ] **Step 6: Run integration test (gated)** - -```bash -CONNECT_INTEGRATION=1 \ - ACE_HQ_DOMAIN=connect-ace-prod \ - ACE_SMOKE_APP_ID=4e20ddf5beca42278c4d2c20383eb943 \ - npm test -- 
test/mcp/connect/integration/commcare-upload-multimedia.test.ts -``` - -(Domain + app_id values above are from the live probe in Task 2; replace with current smoke target if those are stale.) - -Expected: PASS. If it fails because the contract has drifted, re-run the probe script to confirm the live shape and adjust the atom + test together. - -- [ ] **Step 7: Commit** - -```bash -git add mcp/connect/backends/commcare.ts \ - test/mcp/connect/unit/commcare-upload-multimedia.test.ts \ - test/mcp/connect/integration/commcare-upload-multimedia.test.ts \ - scripts/probe-multimedia-upload.ts -git commit -m "feat(connect): commcare_upload_multimedia atom backend" -``` - ---- - -## Task 10: Wire the atom into the MCP server + capability map - -**Files:** -- Modify: `mcp/connect-server.ts` (add `server.tool('commcare_upload_multimedia', ...)`) -- Modify: `mcp/connect/capability-map.ts` (add the capability entry) - -- [ ] **Step 1: Add tool registration** - -In `mcp/connect-server.ts`, after the `commcare_patch_xform` block (line ~413), add: - -```typescript -// commcare_upload_multimedia — POST a binary multimedia asset to CCHQ. -// Required companion to commcare_patch_xform: the form-XML patch makes the -// build *reference* the asset; this atom puts the *bytes* into CouchDB so -// CCHQ's clean_paths() doesn't prune the reference on the next make_build. -// -// Endpoint: POST /a//apps//multimedia/uploaded// -// derives from content_type MIME prefix. -// Auth: same Playwright session as commcare_patch_xform; X-CSRFToken header. -// Returns: { multimedia_id, file_hash_md5 } — see backends/commcare.ts. -// -// CRITICAL ORDER OF OPERATIONS: -// 1. patch form XML to reference jr://file/commcare// -// 2. commcare_upload_multimedia (this atom) -// 3. 
commcare_make_build + commcare_release_build -// Reversing 1 and 2 still works (uploads are idempotent), but skipping -// step 1 means the upload is silently no-op for FLW devices because -// CCHQ's clean_paths() prunes orphaned media on every build. -server.tool('commcare_upload_multimedia', - { - domain: z.string(), - app_id: z.string().regex(/^[0-9a-f]{32}$/, '32-char hex'), - media_path: z.string().regex(/^jr:\/\/file\/commcare\/(image|audio|video|text)\/[^\/]+$/), - file_bytes_base64: z.string().min(1).describe('Asset bytes, base64-encoded'), - content_type: z.string().regex(/^(image|audio|video|text)\//), - }, - async (args) => - runAtom(async () => { - const { file_bytes_base64, ...rest } = args; - return (await commcareClient()).uploadMultimedia({ - ...rest, - file_bytes: Buffer.from(file_bytes_base64, 'base64'), - }); - }), -); -``` - -- [ ] **Step 2: (skipped) capability-map** - -`mcp/connect/capability-map.ts` is **Connect-side-only** — it lists the -21 atoms targeting `connect.dimagi.com`. The four CommCare atoms -(`make_build`, `release_build`, `download_ccz`, `patch_xform`) all -target `commcarehq.org` via the separate `commcareClient()` factory -and are NOT tracked in capability-map. Adding an `'upload_multimedia'` -entry there would either break the typed `Record` or -imply wrong routing. - -Earlier draft of this plan said "add a parallel entry" based on an -incorrect read of capability-map's scope. **No edit to capability-map -is needed for CommCare atoms.** Tool registration in `connect-server.ts` -(Step 1 above) is sufficient. - -- [ ] **Step 3: Smoke-test the MCP server boots** - -```bash -npm run mcp:connect 2>&1 | head -20 & -sleep 2 -kill %1 || true -``` - -Expected: server starts without throwing. If a Zod schema or capability-map entry is malformed, the import fails noisily. 
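Because the tool takes `file_bytes_base64` rather than raw bytes, a caller has to encode the asset and build a `media_path` that satisfies the Zod regexes above. The sketch below shows one way that could look on the caller side. `buildUploadArgs` is a hypothetical helper, not something the server exports, and it assumes a Node runtime where `Buffer` is available.

```typescript
// Sketch only: `buildUploadArgs` is a hypothetical caller-side helper, not part of the server.
// The two patterns are copied from the tool's Zod schema.
const MEDIA_PATH_RE = /^jr:\/\/file\/commcare\/(image|audio|video|text)\/[^\/]+$/;
const APP_ID_RE = /^[0-9a-f]{32}$/;

function buildUploadArgs(
  domain: string,
  appId: string,
  filename: string,
  bytes: Buffer,
  contentType: string,
) {
  // The media type segment is the MIME prefix, e.g. "image" for "image/png".
  const mediaType = contentType.split('/')[0];
  const mediaPath = `jr://file/commcare/${mediaType}/${filename}`;
  if (!APP_ID_RE.test(appId)) throw new Error('app_id must be 32-char hex');
  if (!MEDIA_PATH_RE.test(mediaPath)) throw new Error(`invalid media_path: ${mediaPath}`);
  return {
    domain,
    app_id: appId,
    media_path: mediaPath,
    file_bytes_base64: bytes.toString('base64'),
    content_type: contentType,
  };
}

const uploadArgs = buildUploadArgs('demo', 'a'.repeat(32), 'x.png', Buffer.from('PNG'), 'image/png');
```

Validating before the call means a bad filename or content type fails locally instead of as a schema error from the MCP server; the server-side checks remain the source of truth.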
- -- [ ] **Step 4: Commit** - -```bash -git add mcp/connect-server.ts mcp/connect/capability-map.ts -git commit -m "feat(connect): register commcare_upload_multimedia tool" -``` - ---- - -## Task 11: Doctor checks for new env vars - -**Files:** -- Modify: `bin/ace-doctor` (add CONTENT_GENERATOR_URL / CONTENT_GENERATOR_API_KEY to the env-drift check) - -- [ ] **Step 1: Find the existing env-drift check** - -```bash -grep -n "CONTENT_GENERATOR\|env_drift\|env_file" bin/ace-doctor | head -20 -grep -n "OCS_API_TOKEN\|ACE_HQ_USERNAME" bin/ace-doctor | head -10 -``` - -The env-drift block enumerates expected `.env` keys and reports any that are missing. - -- [ ] **Step 2: Add the two new keys** - -In `bin/ace-doctor`, locate the array / list of expected env var names and add `CONTENT_GENERATOR_URL` and `CONTENT_GENERATOR_API_KEY`. - -If there's a separate "service health" section that probes each integration, add a passive check: present-and-non-empty only (no live HTTP call — image generation is too slow / costly to ping on every doctor run). - -- [ ] **Step 3: Run doctor and verify** - -```bash -/ace:doctor 2>&1 | grep -i content -``` - -Expected: output mentions `CONTENT_GENERATOR_URL` and `CONTENT_GENERATOR_API_KEY` either as `OK` (if `.env` was regenerated in Task 3) or `MISSING` (otherwise). - -- [ ] **Step 4: Commit** - -```bash -git add bin/ace-doctor -git commit -m "feat(doctor): check Content Generator env vars" -``` - ---- - -## Task 12: `skills/app-multimedia-coverage/SKILL.md` — the orchestration prose - -**Files:** -- Create: `skills/app-multimedia-coverage/SKILL.md` - -This is a prompt, not code. Mirror the structure of `skills/commcare-form-patch/SKILL.md` (process, mode behavior, dry-run, failure modes, MCP tools used, change log). - -- [ ] **Step 1: Read the reference skill** - -Open `skills/commcare-form-patch/SKILL.md` end-to-end and `skills/app-connect-coverage/SKILL.md` for the verify+fix pattern. 
- -- [ ] **Step 2: Author SKILL.md** - -```markdown ---- -name: app-multimedia-coverage -description: > - Post-Phase-2 enhancement skill that attaches display-only images to - Connect Learn / Deliver app questions. Uses an LLM judge to pick which - fields deserve images (criterion: FLW uses it OR shows it to a client), - generates them via Dimagi's Content Generator API, patches the form - XML to add `` itext references, uploads the assets to CCHQ via - `commcare_upload_multimedia`, and re-builds + re-releases the apps. - Manual gate; not part of `/ace:run`. Sibling of `commcare-form-patch`. - Delete when Nova ships first-class field-level multimedia (see § Removal - criteria). ---- - -# App Multimedia Coverage - -Generate and attach display-only images to Connect app questions where -they meaningfully help frontline workers. This skill closes the loop -that Nova doesn't today: schema for media on a field, asset generation, -CCZ bundling, form-XML reference, and a release. Mirrors the -end-to-end pattern of `commcare-form-patch`. - -## Why this skill exists - -CommCare apps render images on questions via standard `` itext -references and CCZ-bundled assets at `commcare/multimedia/image/...`. -Nova has no schema for this — its `image`/`audio`/`video` field kinds -are *input capture*, not *display*. Until Nova ships field-level media -(see § Removal criteria), this skill is the only path from "PDD" to -"images on screen." - -## Removal criteria - -Delete this skill (and the supporting helpers + atom) when ALL of: - -1. Nova ships a field-level `media: { image_url, alt_text, image_directives }` - schema and round-trips it through `compile_app`. -2. Nova's compile bundles linked media into the produced CCZ at - `commcare/multimedia/image/...`. -3. A clean `/ace:run` against `CRISPR-Test-004-KMC-multimedia` produces - images-attached apps without this skill firing. -4. Each affected opp's `run_state.yaml` has empty - `phase_2_backlog.app-multimedia-coverage`. 
- -## Process - -Inputs: -- `` — positional, required -- `--app=learn|deliver|both` — default `both` -- `--max-images=N` — default `100` (runaway guard) -- `--dry-run` — investigate without generating or patching - -For each app in scope: - -1. **Read deployment summary** `2-commcare/app-deploy_summary.md` → - `hq_domain`, `learn_app_id` / `deliver_app_id`, latest released - `build_id`. Read PDD for App Context derivation. - -2. **Derive Application Context.** If - `2-commcare/app-multimedia-coverage_app-context.md` exists, use as-is - (operator override wins). Otherwise synthesize from the PDD's - intervention description + a target-FLW one-liner + the standard - Dimagi guidance ("People should be dressed modestly. All of the - users and participants should be representative of the context."). - Write the synthesized version back so the operator can edit. - -3. **LLM-judge each visible field** via `lib/multimedia-judge.ts`. Skip - `hidden` and `calculate` kinds. Skip kinds with no displayed label. - Application Context goes in a prompt-cached system block — every - per-field call benefits from cache hit on the constant block. - -4. **Write candidates YAML** to - `2-commcare/app-multimedia-coverage_candidates-.yaml`. If the - file already exists, **operator hand-edits win** — load as-is. The - judge runs only on first creation; re-run with `--rejudge` to - refresh. - -5. **Cost preview** — print - `Will generate {N} images for ; ~30s each ≈ M minutes.` - Halt if `N > --max-images`. - -6. **Generate images.** For each `generate: true` candidate: - - Compute `prompt_hash` via `lib/multimedia-prompt-hash.ts`. - - Cache hit (PNG present at expected path) → skip. - - Cache miss → `ContentGeneratorClient.generateImage(...)` → save - PNG to - `2-commcare/app-multimedia-coverage_generated///__.png` - → update `app-multimedia-coverage_manifest.yaml`. - - Default: serial. Bounded parallelism may be added later. - -7. 
**Patch form XML** for each form with ≥1 image: - - `commcare_download_ccz` to fetch the released form XML. - - `addImageItext()` from `lib/multimedia-xform-patch.ts` to add the - `` itext entries. - - `commcare_patch_xform` to POST the patched XML. - - Re-fetch via `commcare_download_ccz` to confirm the patch stuck. - - **WHY this happens before the upload:** CCHQ's `clean_paths()` prunes - any multimedia binary that no form references on the next - `make_build`. The form-XML reference is what causes CCHQ to retain - the asset in the build's multimedia map. Reverse this order and the - asset lands in CouchDB but never reaches FLW devices. - -8. **Upload multimedia to CCHQ** via `commcare_upload_multimedia` per - image. Record returned `multimedia_id` (CCHQ couch _id) and - `file_hash_md5` (CCHQ's md5 of the bytes) into the manifest. - -9. **Build + release** — `commcare_make_build` then - `commcare_release_build`. Capture new `build_id` + `version`. - -10. **Verify** — re-download the released CCZ. Assert every manifest - image is at `commcare/multimedia/image/` AND every patched - form XML references its `jr://file/...` URI. Halt on mismatch — if - the file is missing despite a successful upload, the most likely - cause is step 7 didn't land before step 9 (orphan-prune). - -11. **Report.** Write - `2-commcare/app-multimedia-coverage_report-.md` - (frontmatter + per-form table; see spec § 4 step 11). - -12. **Update `run_state.yaml`** with status + per-app counts under - `phases.manual.app-multimedia-coverage`. - -## Mode behavior - -- **Auto** (default): walk → judge → generate → patch → upload → build - → release → verify → report. No human gate. -- **Review**: pause after step 4 (candidates YAML written) and after - step 7 (form-XML diff staged) for operator approval. -- **Dry-run** (`--dry-run`): execute steps 1–4 + cost preview only. - Outputs candidates YAML for inspection. State tracks - `dry-run-success`. 
- -## Failure modes - -| Mode | Cause | Behavior | -|---|---|---| -| `judge.error` ≥1 field | LLM Zod validation failed | Skip that field, log to candidates, continue. Status: `partial`. | -| Content Generator 5xx | Service hiccup | One retry with backoff, then halt. | -| `ContentGeneratorAuthError` | Bad/missing API key | Halt; point at `/ace:doctor`. | -| `XformConflictError` | CCHQ form sha1 changed | Halt the form, surface live sha1. | -| `commcare_upload_multimedia` HTTP 500 | CCHQ rejected the binary | Halt skill; surface response slice. | -| Verify (step 10) fails | Patch or upload silently dropped | Halt with per-form diff. Status: `blocked`. | -| `--max-images` exceeded | Runaway opp | Halt before generation. | -| Nova MCP unavailable | Step 1 fallback | Use released-CCZ XML walk for field discovery. | - -## MCP tools used - -- **Google Drive:** `drive_read_file`, `drive_create_file`, `drive_update_file`, `drive_create_folder`, `drive_list_folder` -- **ace-connect (CCHQ atoms):** `commcare_download_ccz`, `commcare_patch_xform`, `commcare_upload_multimedia` (new), `commcare_make_build`, `commcare_release_build` -- **Nova:** `nova_get_app`, `nova_get_form`, `nova_get_field` (read-only — for field metadata when blueprint is available) -- **Anthropic SDK:** Sonnet 4.6 via `@anthropic-ai/sdk` (judge calls) -- **HTTP:** Content Generator API via `lib/content-generator-client.ts` - -## Change log - -| Date | Change | Author | -|------|--------|--------| -| 2026-05-05 | Initial version. Manual gate, sibling of `commcare-form-patch`. 
| ACE team | -``` - -- [ ] **Step 3: Commit** - -```bash -git add skills/app-multimedia-coverage/SKILL.md -git commit -m "feat(skills): app-multimedia-coverage SKILL.md" -``` - ---- - -## Task 13: Smoke fixture — `CRISPR-Test-004-KMC-multimedia` - -**Files:** -- Create: `test/fixtures/CRISPR-Test-004-KMC-multimedia/pdd.md` -- Create: `test/fixtures/CRISPR-Test-004-KMC-multimedia/expected-multimedia-candidates-learn.yaml` (golden expected output for the judge) - -The fixture exists for two purposes: (a) demo target for live runs, (b) Nova-feature-request removal-criteria check (when Nova ships, the same PDD should produce images-attached apps without this skill). - -- [ ] **Step 1: Author the PDD** - -```markdown - ---- -name: KMC Multimedia Smoke -archetype: atomic-visit -target_flws: African community health workers in low-resource settings ---- - -# Kangaroo Mother Care for Small Vulnerable Newborns - -## Intervention - -Frontline workers visit mothers of small or vulnerable newborns (SVN — -under 2.5 kg or born preterm) and teach Kangaroo Mother Care (KMC): -continuous skin-to-skin contact, exclusive breastfeeding, and early -recognition of warning signs. Each visit is a single in-person -encounter with one structured assessment and several teaching points. - -## Learn app — module structure - -1. **What is KMC?** - - Form: instructional. Label-only fields explaining benefits, - positioning, duration, and indications. -2. **How to position the baby** - - Form: instructional. Step-by-step visual demonstration: head and - neck support, skin contact, wrapping the baby securely. -3. **Recognising danger signs** - - Form: instructional with a quiz. Visual cues for jaundice, apnea, - poor feeding, hypothermia. -4. **Knowledge check** - - Form: quiz. Single-select questions on positioning, signs, etc. 
- -## Deliver app — visit structure - -Single registration form per visit: -- Mother's name, age, contact -- Baby weight at birth, gestational age, current weight -- Direct observation: is baby positioned correctly? (yes/no with photo) -- Triage: any danger signs present? (multi-select with visual choices) -- Counselling delivered: which teaching points? (multi-select) -- Follow-up date - -## Preferred LLOs - -(none — smoke fixture, runs without solicitation) -``` - -- [ ] **Step 2: Author the expected-judge-output YAML (golden ground truth)** - -```yaml -# test/fixtures/CRISPR-Test-004-KMC-multimedia/expected-multimedia-candidates-learn.yaml -# -# Golden ground truth for what the LLM judge SHOULD emit on a clean run. -# Used to spot regressions in the judge prompt over time. Form/field IDs -# match the structure the live Nova build will produce; if Nova field -# naming changes, regenerate this file from a known-good run. - -# ~8-12 candidates expected: -# - "What is KMC" intro screen → generate=true (FLW shows mother) -# - Positioning step labels (3-5 of them) → generate=true (FLW demonstrates) -# - Danger sign visual cues (jaundice, apnea, hypothermia) → generate=true -# - "Mother's name" text field → generate=false -# - "Baby weight" numeric → generate=false -# - "Follow-up date" → generate=false -``` - -(The actual ground-truth file is filled in after the first live run against this fixture; for now it ships as a placeholder describing what's expected.) - -- [ ] **Step 3: Commit** - -```bash -git add test/fixtures/CRISPR-Test-004-KMC-multimedia/ -git commit -m "test(fixture): CRISPR-Test-004-KMC-multimedia smoke PDD" -``` - ---- - -## Task 14: Live smoke test against the fixture - -**Files:** none (this is a manual verification step) - -- [ ] **Step 1: Verify env is wired** - -```bash -/ace:doctor 2>&1 | grep -E 'CONTENT_GENERATOR|env_file' | head -10 -``` - -Expected: both `CONTENT_GENERATOR_*` keys reported as OK. 
- -- [ ] **Step 2: Pick (or create) a smoke opp that has Nova-built apps released** - -The skill needs an existing opp where Phase 1 + Phase 2 have completed. The simplest path: pick the most recent passing smoke opp from `~/.ace/` or Drive, and run against it. (Standing up a new opp from `CRISPR-Test-004` is a longer-form smoke that comes with the Nova-feature-request validation cycle; not required for first live run of this skill.) - -```bash -/ace:status 2>&1 | tail -30 -``` - -Choose an opp with `phase_2: clean` and noted `learn_app_id` + `deliver_app_id`. - -- [ ] **Step 3: Dry-run first** - -```bash -/ace:step app-multimedia-coverage --dry-run -``` - -Expected: candidates YAML written, cost preview printed, no API calls to Content Generator, no patches. - -- [ ] **Step 4: Inspect the candidates YAML** - -Read `2-commcare/app-multimedia-coverage_candidates-learn.yaml` and `_candidates-deliver.yaml` from Drive. Sanity-check: do the `generate: true` choices look right? Are the directives reasonable? - -If choices look bad: iterate the judge prompt in `lib/multimedia-judge.ts` (`SYSTEM_HEAD`), regenerate the candidates with `--rejudge`, repeat until the operator is happy. - -- [ ] **Step 5: Live run on Learn only first** - -```bash -/ace:step app-multimedia-coverage --app=learn --max-images=10 -``` - -Expected: 10 images generated, form XML patched, multimedia uploaded, app re-built and re-released, verify step passes, report written. - -- [ ] **Step 6: Manual visual check** - -Open the new build in CommCare HQ's app preview or pull the CCZ and inspect: - -```bash -gh attestation download ... # OR via CCHQ's preview UI -``` - -Confirm at least one image renders alongside the expected question. - -- [ ] **Step 7: Iterate** - -If something failed: address the failure mode, re-run, repeat. **Convergence is the goal of this task.** Each iteration should commit fixes (judge prompt tweaks, atom edge cases, XML patcher edge cases) as small focused commits. 
- -- [ ] **Step 8: Run on Deliver after Learn passes** - -```bash -/ace:step app-multimedia-coverage --app=deliver --max-images=10 -``` - -- [ ] **Step 9: Capture the golden run for the fixture's expected YAML** - -Once a live run looks right, copy the produced `app-multimedia-coverage_candidates-learn.yaml` from Drive into `test/fixtures/CRISPR-Test-004-KMC-multimedia/expected-multimedia-candidates-learn.yaml`, replacing the placeholder. - -```bash -git add test/fixtures/CRISPR-Test-004-KMC-multimedia/expected-multimedia-candidates-learn.yaml -git commit -m "test(fixture): capture golden judge output from first live run" -``` - ---- - -## Task 15: Update CLAUDE.md and CHANGELOG - -**Files:** -- Modify: `CLAUDE.md` -- Modify: `CHANGELOG.md` - -- [ ] **Step 1: Add a CLAUDE.md "Current state" bullet** - -After the existing post-Nova-skill paragraphs (the section that mentions `app-connect-coverage` and `commcare-form-patch`), add: - -```markdown -- **`app-multimedia-coverage` — manual post-Phase-2 multimedia attach.** - Sibling of `commcare-form-patch`. LLM-judges each Nova-built field - (criterion: would the FLW use it OR show it to a client?), calls - Dimagi's Content Generator API for the chosen ones, patches form XML - with `` itext entries, uploads PNGs via the new - `commcare_upload_multimedia` atom, and re-builds + re-releases. **Not - part of `/ace:run`** — invoked manually with `/ace:step - app-multimedia-coverage `. Spec at - `docs/superpowers/specs/2026-05-05-app-multimedia-coverage-design.md`; - delete when Nova ships first-class field-level multimedia (see the - removal criteria in the SKILL.md). -``` - -- [ ] **Step 2: Add a CHANGELOG entry under the next version** - -(Pick the next minor or patch version per `CLAUDE.md` § "Plugin updates".) 
- -```markdown -## 0.13.4 — app-multimedia-coverage skill - -- New skill `app-multimedia-coverage` (manual gate, post-Phase 2): - attaches display-only images to Connect Learn/Deliver app questions - via Dimagi's Content Generator + post-Nova CCZ patching. -- New CCHQ atom `commcare_upload_multimedia` to bundle binary assets - into the released CCZ. -- New helpers under `lib/`: `multimedia-judge`, `content-generator-client`, - `multimedia-manifest`, `multimedia-prompt-hash`, `multimedia-xform-patch`. -- New `.env.tpl` keys: `CONTENT_GENERATOR_URL`, `CONTENT_GENERATOR_API_KEY`. -- Filed Nova feature request `voidcraft-labs/nova-plugin#` for - field-level multimedia; this skill has explicit removal criteria. -``` - -- [ ] **Step 3: Bump version** - -```bash -scripts/version-bump.sh -``` - -- [ ] **Step 4: Commit** - -```bash -git add CLAUDE.md CHANGELOG.md VERSION package.json .claude-plugin/plugin.json .claude-plugin/marketplace.json -git commit -m "docs: app-multimedia-coverage in CLAUDE.md + CHANGELOG (0.13.4)" -``` - ---- - -## Task 16: File the Nova feature request - -**Files:** none in this repo - -- [ ] **Step 1: Verify gh CLI auth** - -```bash -gh auth status -``` - -If not authed for `voidcraft-labs/nova-plugin`, sort that out before proceeding. - -- [ ] **Step 2: File the issue** - -```bash -gh issue create --repo voidcraft-labs/nova-plugin \ - --title "Field-level multimedia (display-only) on Learn/Deliver questions" \ - --body "$(cat <<'EOF' -## Problem - -Nova has no schema for **display-only** multimedia on a question (the -image / audio / video that the FLW sees alongside a question label). -The existing `image` / `audio` / `video` field kinds are *input -capture* (FLW takes a photo / records audio), not display. - -Standard CommCare apps render display media via: - -- `` / `