diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index a770885..b3ac8cd 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -6,13 +6,13 @@ "url": "https://github.com/jjackson" }, "metadata": { - "version": "0.13.334" + "version": "0.13.335" }, "plugins": [ { "name": "ace", "source": "./", - "version": "0.13.334", + "version": "0.13.335", "description": "AI Connect Engine — orchestrates the CRISPR-Connect lifecycle from idea through app building, Connect setup, LLO management, and closeout" } ] diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 17ed81e..2b35152 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "ace", - "version": "0.13.334", + "version": "0.13.335", "description": "AI Connect Engine — orchestrates the CRISPR-Connect lifecycle from idea through app building, Connect setup, LLO management, and closeout", "author": { "name": "Jonathan Jackson", diff --git a/CLAUDE.md b/CLAUDE.md index 38dc2e8..6c2b460 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -105,6 +105,20 @@ Hook setup if needed: `git config core.hooksPath scripts/hooks`. Two hooks ship: Cache dir is keyed by version: `~/.claude/plugins/cache/ace/ace//`. On session start, Claude Code pulls the marketplace repo, compares `plugin.json` version against the installed version, and re-installs if different. +### MCP changes need a full Claude restart (not just `/reload-plugins`) + +Editing MCP code in `mcp/` (or a new schema deploying upstream — e.g. labs publishes a new `tools/list`) does NOT take effect in the current Claude Code session via `/ace:update` + `/reload-plugins` alone. **MCP subprocesses bind their tool list, schemas, and module code at subprocess startup** — that's tied to the parent Claude Code process, not to plugin reloads. After: + +- Editing anything under `mcp/` (atom handlers, capability maps, backend wiring, `tools/list` shape) and merging the change +- An upstream MCP server (labs, OCS, Connect) deploying a new schema while your session was running +- `/ace:update` bumping the plugin to a version whose MCP code differs + +…the running MCP subprocess is still holding the OLD code/schema. `/reload-plugins` reloads agents + skills + hooks; it does NOT respawn MCP subprocesses. **Quit and reopen Claude Code (full process restart)** to pick up the new MCP behavior. + +Symptom of skipping this: payloads that match the documented schema get `INVALID_SCHEMA` rejections; new atoms don't appear in `ToolSearch`; deprecated atoms still resolve. The on-disk code is right; the running subprocess is just stale. Verified live on the `solicitation-create` schema-drift class (2026-05-22) where ACE was reading a pre-labs-PR-#211 `{data: {...}}` shape while the live deployed schema had been flat for weeks. + +When in doubt, validate by curling the live MCP's `tools/list` directly (e.g. `curl ... https://labs.connect.dimagi.com/mcp/ -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'`) — that response is the canonical contract. If it disagrees with what `ToolSearch` shows you in-session, restart Claude. + ## Conventions - **Skills are stateless.** Per-opportunity state lives in Drive `ACE//`. Don't introduce local state in `SKILL.md` files. diff --git a/VERSION b/VERSION index d7660db..aec6a7e 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.13.334 +0.13.335 diff --git a/package.json b/package.json index 89d49af..69a620f 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "ace", - "version": "0.13.334", + "version": "0.13.335", "description": "AI Connect Engine - orchestrator for building Connect Opps using AI", "type": "module", "scripts": { diff --git a/skills/solicitation-create/SKILL.md b/skills/solicitation-create/SKILL.md index ca5d2ce..48102e3 100644 --- a/skills/solicitation-create/SKILL.md +++ b/skills/solicitation-create/SKILL.md @@ -158,11 +158,16 @@ contract. configuration. The Phase 4 payment band overrides the work order's if they differ. -2. **Build the solicitation `data` payload using the labs canonical - schema.** Labs's `solicitations/models.py` declares the field names - the public-detail template renders. Drift kills the public page - (silently — the API echoes back whatever you send). The canonical - shape is: +2. **Build the solicitation payload using the labs canonical schema.** + **The labs MCP's `tools/list` is the canonical contract** — it + returns the live `inputSchema` for `create_solicitation` and + `update_solicitation`. If this SKILL.md ever disagrees with + `tools/list`, `tools/list` wins; re-read it and update the skill. + See § Stale-schema gotcha below for why this matters. + + The deployed schema is **flat** — every solicitation field is a + top-level property of the atom's argument object. There is no + `data: {...}` envelope. The canonical shape is: | data-object field | Type | Source / composition | |---|---|---| @@ -175,8 +180,8 @@ contract. | `expected_end_date` | string `YYYY-MM-DD` | Phase 4 opp `end_date` if available, else PDD `## Timeline` → end. **NOT** `anticipated_end`. | | `estimated_scale` | string | Human-readable summary of expected reach, e.g. "30–50 verified HH visits per LLO; 2–3 LLOs total (90–150 HH end-to-end)". Sourced from PDD `## Target Population` → `Expected reach`. **NOT** `sample_target`. | | `contact_email` | string | Operator-monitored address. **NOT** `ace@dimagi-ai.com` (that's the service-account bot). Use `${ACE_SOLICITATION_CONTACT_EMAIL}` env var; halt with `[BLOCKER]` if unset rather than defaulting to the bot inbox. | - | `evaluation_criteria` | array of `{id, name, description, weight, scoring_guide, linked_questions}` | Composed locally — see Step 3. **NOT** `rubric`. **NOT** `[{dimension, criterion, weight}]`. **`id` is REQUIRED** — kebab-case identifier (e.g. `field-ops-realism`), unique within the rubric. Server rejects criteria without `id` even though the labs atom's documented inputSchema doesn't flag it (see `jjackson/connect-labs#212`). Each criterion MUST also have a populated `scoring_guide` and at least one `linked_questions` id. Weights sum to 100 (integers). | - | `questions` | array of `{id, text, required, type}` (+ inlined framing — see below) | Composed locally — see Step 4. **NOT** `response_questions`. Field is `text`, not `question`. `id` is kebab-case (`field-ops-realism`). `type` is one of `"textarea"` (default), `"multiple_choice"`, `"number"`. **Framing handling — current labs reality:** the deployed labs server (as of 2026-05-22) rejects a `framing` key with `INVALID_SCHEMA`. Until `jjackson/connect-labs#212` ships, inline the framing as a prefix in the `text` field using the literal convention `Why we're asking: \n\n`. Two-line gap between framing and prompt is load-bearing — `solicitation-review` parses the prefix back out on this anchor. Once `connect-labs#212` lands, drop the inline prefix and emit `framing` as a structured field. | + | `evaluation_criteria` | array of `{id, name, weight, description, scoring_guide, linked_questions}` | Composed locally — see Step 3. **NOT** `rubric`. **NOT** `[{dimension, criterion, weight}]`. The deployed schema requires `id`, `name`, `weight` (verified via live `tools/list`); the ACE convention adds `description`, `scoring_guide`, and `linked_questions` as content-quality requirements. `id` is kebab-case (e.g. `field-ops-realism`), unique within the rubric — derive `slugify(name)` if you're tempted to elide it; explicit ids are required because duplicate-named criteria would otherwise silently collide. Weights sum to 100 (integers). | + | `questions` | array of `{id, text, type, framing, required, options}` | Composed locally — see Step 4. **NOT** `response_questions`. Field is `text`, not `question`. The deployed schema requires `id`, `text`, `type` (verified via live `tools/list`); `framing` is an optional structured key that ACE always populates because it's the rubric anchor `solicitation-review` consumes when scoring responses. `id` is kebab-case (e.g. `field-ops-realism`). `type` is one of `"textarea"` (default), `"multiple_choice"`, `"number"`. Empty `framing` is a `[BLOCKER]` — the review path can't score responses against a missing anchor. | | `status` | string | `'active'` (publishes immediately; `'draft'` for dry-run mode). | | `is_public` | bool | `true` (so unsolicited orgs can find it on the public marketplace). | | `connect_opportunity_id` | int | Phase 4 opp internal id (not the UUID). Stored on the record for downstream solicitation-review linkage. | @@ -322,16 +327,17 @@ contract. linked_questions: [string] # one or more question `id`s from the questions block; each criterion links to ≥1 question ``` - **Every field is required, including `id`.** The deployed labs server - enforces `id` even though the MCP atom's documented inputSchema does - not currently flag it (tracked in `jjackson/connect-labs#212`). - Derive `id` from `slugify(name)` if you're tempted to elide it. - `scoring_guide` and `linked_questions` are NOT optional — the labs - template renders both, and an empty `scoring_guide` makes the rubric - uninterpretable to responding LLOs. **Weights must sum to 100** - (integers — the labs template formats them as `%`). 5-8 - criteria total; more than 8 dilutes the signal, fewer than 4 misses - dimensions. + **`id`, `name`, `weight` are required by the labs schema.** + `description`, `scoring_guide`, and `linked_questions` are ACE + content-quality requirements layered on top. `id` is kebab-case and + unique within the rubric — derive `slugify(name)` if needed, but + emit it explicitly (auto-derivation would silently collide on + duplicate names). `scoring_guide` and `linked_questions` MUST be + non-empty; an empty `scoring_guide` makes the rubric uninterpretable + to responding LLOs and silently ships an unscoreable solicitation. + **Weights must sum to 100** (integers — the labs template formats + them as `%`). 5-8 criteria total; more than 8 dilutes the + signal, fewer than 4 misses dimensions. **`scoring_guide` shape — what a strong response looks like:** Each `scoring_guide` MUST describe (a) what a strong (8-10) answer @@ -390,25 +396,26 @@ contract. Required shape per question: ```yaml - - id: string # short kebab-case, e.g. "field-ops-realism" - text: string # framing prefix + prompt — see "Framing convention" below - required: bool # default true - type: string # "textarea" (default), "multiple_choice", "number" + - id: string # kebab-case, e.g. "field-ops-realism" — REQUIRED by labs schema + text: string # the actual prompt — REQUIRED by labs schema + type: string # "textarea" (default), "multiple_choice", "number" — REQUIRED by labs schema + framing: string # 1-2 sentence "why we're asking" preface — optional in labs schema, ALWAYS populated by ACE + required: bool # default true (optional in labs schema) + options: [string] # required when type=multiple_choice ``` - **Framing convention (current labs reality).** The deployed labs - server rejects a top-level `framing` key on questions (`INVALID_SCHEMA`). - Until `jjackson/connect-labs#212` ships, inline the framing inside - `text` using the literal anchor `Why we're asking: ` followed by the - framing sentence(s), a blank line, then the prompt: + Labs's public-detail template renders `framing` above the prompt + in a muted "Why we're asking" preface block; `solicitation-review` + consumes `framing` directly as the rubric anchor for response + scoring. Example: ```yaml - id: field-ops-realism + framing: | + A strong response names supervisor:FLW ratios, handles the + photo-heavy visit logistics, and treats the mid-pilot checkpoint + as a real planning anchor. text: | - Why we're asking: a strong response names supervisor:FLW ratios, - handles the photo-heavy visit logistics, and treats the mid-pilot - checkpoint as a real planning anchor. - Propose a week-by-week schedule from award through Week 10 closeout, including LLO mobilization, Connect onboarding, Learn calibration, field launch, mid-pilot checkpoint, and closeout. @@ -416,11 +423,8 @@ contract. type: textarea ``` - The `Why we're asking: ` prefix + blank-line gap is load-bearing — - `solicitation-review` parses the framing back out on this exact - anchor. Once `connect-labs#212` lands and the server accepts - `framing` as a structured field, drop the inline prefix and emit - the framing separately. Until then this is the canonical convention. + **Empty `framing` is a `[BLOCKER]`** — `solicitation-review` can't + score responses against a missing anchor. **Dedupe by intent.** The PDD's `## Solicitation` → `Response template` may list opp-specific questions that overlap with the @@ -505,37 +509,35 @@ contract. shadow programs on first opportunity sync). Surface the Connect program name and the list of labs programs the caller can see. -6. **Publish.** - - **Wire-shape note (current labs reality).** The MCP atom's - `tools/list` inputSchema declares `{program_id, data: {...fields...}}` - (fields wrapped in `data`). The deployed labs server (as of - 2026-05-22) instead expects the fields **flat** at the JSON-RPC - params level and rejects the `data` envelope with `unknown fields: - ['data']`. This is a drift between the documented atom schema and - the server validator, tracked in `jjackson/connect-labs#212`. - - Two paths exist while the drift is live: - - - **Preferred:** call the MCP atom as documented (`data: {...}` wrap) - and let the proxy translate; the labs MCP server will be updated - to accept the wrapped shape once `connect-labs#212` lands. If the - MCP call succeeds, you're done. - - **Fallback (if MCP atom returns `unknown fields: ['data']`):** POST - the JSON-RPC frame directly to `https://labs.connect.dimagi.com/mcp/` - with the fields flat at `params.arguments`. Set - `Authorization: Bearer ${LABS_MCP_TOKEN}` and - `Content-Type: application/json`. Body shape: - ```json - {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{ - "name":"create_solicitation", - "arguments":{"program_id":"153","title":"...","description":"...", /* flat fields */} - }} - ``` - Parse the `result` envelope to extract `solicitation_id`. - - Either way, the payload field SHAPE (names + types + required-ness) - is the same. Only the outer wrap differs. +6. **Publish.** Call `mcp__connect-labs__create_solicitation` with the + flat-fields shape from § Step 2 (every solicitation field as a + top-level property — no `data: {...}` envelope). Labs validates the + canonical schema server-side and rejects drift with + `INVALID_SCHEMA` + per-field `error.details.fields` (JSON-path + keyed). Read the error and fix the composition; do not retry with + the same payload, and do not stuff extras into a free-form field. + + **Stale-schema gotcha.** If you see `INVALID_SCHEMA` errors that + reference fields you swear are valid per the labs canonical schema + (or vice versa — a payload that should be wrapped works when sent + flat, or fields you expected to be optional are flagged required), + the MCP subprocess in this session is probably holding a stale + `tools/list` view of the labs schema. The labs MCP schema can + evolve faster than this skill's documentation, and the proxy's + schema gets cached at MCP-subprocess startup. Recovery: + + 1. Curl the live `tools/list` directly: + ```bash + curl -sS -X POST https://labs.connect.dimagi.com/mcp/ \ + -H "Authorization: Bearer $LABS_MCP_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' + ``` + Treat the response as truth. If it disagrees with this SKILL.md, + this SKILL.md is wrong — re-read the schema and update. + 2. Restart Claude Code (full process restart, not `/reload-plugins`) + so the MCP subprocess re-fetches `tools/list` at startup. + 3. Update this SKILL.md if the live schema has moved. Call: @@ -625,7 +627,7 @@ contract. - `expected_start_date` + `expected_end_date` both parse and match. - `estimated_scale` is non-empty. - `contact_email` matches. - - `len(questions)` matches sent; every question has `id`, `text`, `required`, `type`; `text` contains the `Why we're asking: ` framing anchor. + - `len(questions)` matches sent; every question has `id`, `text`, `type`, `framing` (non-empty), plus `required` if sent. - `len(evaluation_criteria)` matches sent; every criterion has `id`, `name`, `weight`, `scoring_guide`, `linked_questions` (non-empty); `sum(weights) == 100`. - `is_public: true`; `status: 'active'`; `connect_opportunity_id` matches sent. @@ -636,15 +638,14 @@ contract. proceed to write `published.md` against a half-persisted solicitation. **Why round-trip and not curl-the-public-URL.** The labs - `/solicitations//` detail page currently 302s to `/labs/login/` - for unauthenticated visitors even when `is_public: true` — tracked - in `jjackson/connect-labs#212`. A vanilla curl gets the login page, - not the solicitation. Once `connect-labs#212` ships the public-page - ACL fix, add a second verifier step that curls the public URL and - greps for the rendered sections (the original Step 7a, surfaced as - the structural backstop for the *rendered* layer rather than just - the persistence layer). Until then, `get_solicitation` round-trip - is the canonical structural check. + `/solicitations//` detail page requires a labs login by design. + `is_public: true` controls marketplace listing visibility for + logged-in labs users; it does NOT mean "anonymously readable." A + vanilla curl gets the labs login page, not the solicitation. The + `get_solicitation` round-trip is the canonical structural check at + the persistence layer. (For visual confirmation of rendered HTML, + log into labs and hit `/solicitations//` in a browser; ACE + skills don't need to script that.) This catches the class of bugs where the labs MCP accepts the create cleanly but persisted state diverges from sent payload. @@ -810,5 +811,6 @@ Each row this skill writes uses `phase: 8-solicitation-management` and | Date | Change | Author | |------|--------|--------| | 2026-05-21 | **Work-order-as-primary-input + canonical-schema field names + comprehensive-content shape.** Three bundled rewrites prompted by solicitation 3130 on `malaria-itn-app/20260521-1400` where the public page rendered blank Description, "TBD" timeline, "No deadline," Python-list-repr Scope, and zero questions / zero rubric simultaneously. (1) Inputs now read Phase 1's work order (`1-design/pdd-to-work-order.gdoc`) as the primary content source + `decisions.yaml` for later run decisions, alongside the PDD (now used for problem-framing only, not for scope). (2) Field names migrated to the labs canonical schema (`description` not `overview`, `application_deadline` not `response_window_days`, `expected_start_date/_end_date` not `anticipated_*`, `estimated_scale` not `sample_target`, `questions[].text` not `response_questions[].question`, `evaluation_criteria[].name/.scoring_guide/.linked_questions` not `rubric[].dimension/.criterion`; `solicitation_type: 'eoi'` lowercase). Top-level fields not in `solicitations/models.py` (`pass_bar`, `eligibility_criteria`, `geographic_scope`, `per_hh_payment_band_usd`, `budget`) folded into `description`/`scope_of_work` prose. (3) Content shape demands comprehensive prose: `description` 500-800 words foundation-pitch tone; `scope_of_work` 600-1000+ words derived section-by-section from the work order with explicit de-prescription rules (exact dollars → ranges, exact weeks → windows); every question has a required `framing` field; every evaluation criterion has a required `scoring_guide` + `linked_questions`. (4) Added Step 7a — a curl-the-public-URL structural verifier that catches field-name drift at write time instead of at human-eye time. | ACE team | -| 2026-05-22 | **Align with current labs reality + document the ideal end-state (`jjackson/connect-labs#212`).** Surfaced during the malaria-itn-app `20260521-1400` Phase 8 republish (solicitation 3140). Three labs-side gaps required inline workarounds: (a) atom inputSchema `{data: {...}}` shape vs deployed server's flat-fields validator — added Step 6's wire-shape fallback documenting both paths; (b) `questions[].framing` rejected as unknown key — adopted the literal `Why we're asking: \n\n` inline anchor convention, load-bearing for `solicitation-review` parsing; (c) `evaluation_criteria[].id` required by server but undocumented in atom inputSchema — added `id` to the required criterion shape with `slugify(name)` as the derivation fallback. Step 7a verifier rewritten from "curl the public URL" to "`get_solicitation` round-trip" because the labs public-detail page now 302s to login for unauthenticated visitors even when `is_public: true` (tracked in `connect-labs#212`). When all four labs items in #212 ship, drop the inline workarounds: emit `framing` as a structured key, drop the wire-shape fallback paragraph, restore the curl-the-public-URL verifier as a second post-round-trip check. | ACE team | +| 2026-05-22 | **(superseded — see 2026-05-22 correction below)** Align with current labs reality + document the ideal end-state (`jjackson/connect-labs#212`). Three labs-side gaps surfaced during the malaria-itn-app `20260521-1400` Phase 8 republish required inline workarounds. Three of the four "labs-side gaps" I diagnosed turned out to be ACE-side stale-schema reads, not labs bugs — see the correction entry. | ACE team | +| 2026-05-22 | **Correction: 3 of 4 "labs gaps" in #212 were actually ACE reading a stale schema.** The labs maintainer triaged `jjackson/connect-labs#212` against the live `tools/list` and found: (a) wire-shape "drift" — NOT a labs bug; deployed schema is flat (no `data` envelope), the wrapped shape this skill called out came from ACE's stale view; (b) `evaluation_criteria[].id` undocumented — NOT a labs bug; live schema declares `evaluation_criteria.items.required: [id, name, weight]`, ACE was reading a stale schema; (c) public-detail page 302s to login — WORKING AS INTENDED; `is_public` controls marketplace listing for logged-in users, NOT anonymous readability. Only #2 (`framing` on questions) was a real ask, and it turns out the deployed schema already has `framing` as an optional property — the live MCP accepts it as a structured key. Skill rewritten to match the live schema: dropped Step 6's wire-shape fallback paragraph (call the atom as documented, flat fields); dropped the criterion-id hedge (it's documented + required); emit `framing` as a structured key (drop the `Why we're asking: \n\n` inline anchor convention); dropped the "Step 7a-bis curl-the-public-URL once #212 ships" plan (won't work — `is_public` isn't anonymity). Added new **Stale-schema gotcha** under Step 6 explaining the recovery: curl `tools/list` against the live labs MCP, treat that as truth, restart Claude Code (full process restart, not `/reload-plugins`) to pick up the fresh schema. Added "live `tools/list` is the canonical contract; if this SKILL.md disagrees, the SKILL.md is wrong" at the top of Step 2. The underlying lesson: when an MCP server can evolve faster than this skill's documentation, treat the live `tools/list` as the source of truth — never trust a cached view in the local subprocess. | ACE team | | 2026-05-22 | **Architecture decision: ACE owns composition; labs validates.** PR #396 had floated a future labs-side `create_solicitation_from_brief` MCP tool that would compose content server-side via labs's `solicitation_agent`. Walked back — operator chose to keep composition in ACE so this skill retains full control over voice, archetype-branched scope, framing/scoring_guide quality, and decisions-log integration (all of which are ACE-context that labs would have to learn). Labs's tightened MCP (forthcoming deploy: `create_solicitation` + `update_solicitation` now validate the canonical schema and fail loudly with `INVALID_SCHEMA` + `error.details.fields` on drift) is the right server-side contribution: schema enforcement, not content generation. This skill is the long-term home for solicitation composition; Step 6's payload shape is bound to labs's `tools/list` inputSchema rather than to a future composer call. Removal of the prior "Removal criteria" line. | ACE team |