diff --git a/plugins/compound-engineering/AGENTS.md b/plugins/compound-engineering/AGENTS.md
index 0cefbd52..96793827 100644
--- a/plugins/compound-engineering/AGENTS.md
+++ b/plugins/compound-engineering/AGENTS.md
@@ -93,6 +93,10 @@ When adding or modifying skills, verify compliance with the skill spec:
   This resolves relative to the SKILL.md and substitutes content before the model sees it. If a file is over ~150 lines, prefer a backtick path even if it is always needed
 - [ ] For files the agent needs to *execute* (scripts, shell templates), always use backtick paths -- `@` would inline the script as text content instead of keeping it as an executable file
+### Conditional and Late-Sequence Extraction
+
+Skill content loaded at trigger time is carried in every subsequent message — every tool call, agent dispatch, and response. This carrying cost compounds across the session. For skills that orchestrate many tool or agent calls, extract blocks to `references/` when they are conditional (executed only under specific conditions) or late-sequence (needed only after many prior calls) and represent a meaningful share of the skill (~20%+). The more tool or agent calls a skill makes, the more aggressively its blocks should be extracted. Replace extracted blocks with a 1-3 line stub stating the condition and a backtick path reference (e.g., "Read `references/deepening-workflow.md`"). Never use `@` for extracted blocks — it inlines content at load time, defeating the extraction.
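+
+For example, a stub left in place of an extracted block might read as follows (a sketch; the exact wording and condition are illustrative, only the reference path comes from the guidance above):
+
+```markdown
+**Deepening (conditional):** If the confidence gate determined that deepening
+is warranted, read `references/deepening-workflow.md`, execute its steps, then
+return here.
+```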
+ ### Writing Style - [ ] Use imperative/infinitive form (verb-first instructions) diff --git a/plugins/compound-engineering/skills/ce-plan/SKILL.md b/plugins/compound-engineering/skills/ce-plan/SKILL.md index a5cc624c..ca74a42b 100644 --- a/plugins/compound-engineering/skills/ce-plan/SKILL.md +++ b/plugins/compound-engineering/skills/ce-plan/SKILL.md @@ -587,35 +587,7 @@ For larger `Deep` plans, extend the core template only when useful with sections #### 4.4 Visual Communication in Plan Documents -Section 3.4 covers diagrams about the *solution being planned* (pseudo-code, mermaid sequences, state diagrams). The existing Section 4.3 mermaid rule encourages those solution-design diagrams within Technical Design and per-unit fields. This guidance covers a different concern: visual aids that help readers *navigate and comprehend the plan document itself* -- dependency graphs, interaction diagrams, and comparison tables that make plan structure scannable. - -Visual aids are conditional on content patterns, not on plan depth classification -- a Lightweight plan about a complex multi-unit workflow may warrant a dependency graph; a Deep plan about a straightforward feature may not. - -**When to include:** - -| Plan describes... 
| Visual aid | Placement | -|---|---|---| -| 4+ implementation units with non-linear dependencies (parallelism, diamonds, fan-in/fan-out) | Mermaid dependency graph | Before or after the Implementation Units heading | -| System-Wide Impact naming 3+ interacting surfaces or cross-layer effects | Mermaid interaction or component diagram | Within the System-Wide Impact section | -| Problem/Overview involving 3+ behavioral modes, states, or variants | Markdown comparison table | Within Overview or Problem Frame | -| Key Technical Decisions with 3+ interacting decisions, or Alternative Approaches with 3+ alternatives | Markdown comparison table | Within the relevant section | - -**When to skip:** -- The plan has 3 or fewer units in a straight dependency chain -- the Dependencies field on each unit is sufficient -- Prose already communicates the relationships clearly -- The visual would duplicate what the High-Level Technical Design section already shows -- The visual describes code-level detail (specific method names, SQL columns, API field lists) - -**Format selection:** -- **Mermaid** (default) for dependency graphs and interaction diagrams -- 5-15 nodes, no in-box annotations, standard flowchart shapes. Use `TB` (top-to-bottom) direction so diagrams stay narrow in both rendered and source form. Source should be readable as fallback in diff views and terminals. -- **ASCII/box-drawing diagrams** for annotated flows that need rich in-box content -- file path layouts, decision logic branches, multi-column spatial arrangements. More expressive than mermaid when the diagram's value comes from annotations within nodes. Follow 80-column max for code blocks, use vertical stacking. -- **Markdown tables** for mode/variant comparisons and decision/approach comparisons. -- Keep diagrams proportionate to the plan. A 6-unit linear chain gets a simple 6-node graph. A complex dependency graph with fan-out and fan-in may need 10-15 nodes -- that is fine if every node earns its place. 
-- Place inline at the point of relevance, not in a separate section. -- Plan-structure level only -- unit dependencies, component interactions, mode comparisons, impact surfaces. Not implementation architecture, data schemas, or code structure (those belong in Section 3.4). -- Prose is authoritative: when a visual aid and its surrounding prose disagree, the prose governs. - -After generating a visual aid, verify it accurately represents the plan sections it illustrates -- correct dependency edges, no missing surfaces, no merged units. +When the plan contains 4+ implementation units with non-linear dependencies, 3+ interacting surfaces in System-Wide Impact, 3+ behavioral modes/variants in Overview or Problem Frame, or 3+ interacting decisions in Key Technical Decisions or alternatives in Alternative Approaches, read `references/visual-communication.md` for diagram and table guidance. This covers plan-structure visuals (dependency graphs, interaction diagrams, comparison tables) — not solution-design diagrams, which are covered in Section 3.4. ### Phase 5: Final Review, Write File, and Handoff @@ -701,323 +673,12 @@ Build a risk profile. Treat these as high-risk signals: If the plan already appears sufficiently grounded and the thin-grounding override does not apply, report "Confidence check passed — no sections need strengthening" and skip to Phase 5.3.8 (Document Review). Document-review always runs regardless of whether deepening was needed — the two tools catch different classes of issues. -##### 5.3.3 Score Confidence Gaps - -Use a checklist-first, risk-weighted scoring pass. 
- -For each section, compute: -- **Trigger count** - number of checklist problems that apply -- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk -- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans - -Treat a section as a candidate if: -- it hits **2+ total points**, or -- it hits **1+ point** in a high-risk domain and the section is materially important - -Choose only the top **2-5** sections by score. If deepening a lightweight plan (high-risk exception), cap at **1-2** sections. - -If the plan already has a `deepened:` date: -- Prefer sections that have not yet been substantially strengthened, if their scores are comparable -- Revisit an already-deepened section only when it still scores clearly higher than alternatives - -**Section Checklists:** - -**Requirements Trace** -- Requirements are vague or disconnected from implementation units -- Success criteria are missing or not reflected downstream -- Units do not clearly advance the traced requirements -- Origin requirements are not clearly carried forward - -**Context & Research / Sources & References** -- Relevant repo patterns are named but never used in decisions or implementation units -- Cited learnings or references do not materially shape the plan -- High-risk work lacks appropriate external or internal grounding -- Research is generic instead of tied to this repo or this plan - -**Key Technical Decisions** -- A decision is stated without rationale -- Rationale does not explain tradeoffs or rejected alternatives -- The decision does not connect back to scope, requirements, or origin context -- An obvious design fork exists but the plan never addresses why one path won - -**Open Questions** -- Product blockers are hidden as assumptions -- Planning-owned questions are incorrectly deferred to implementation -- Resolved 
questions have no clear basis in repo context, research, or origin decisions -- Deferred items are too vague to be useful later - -**High-Level Technical Design (when present)** -- The sketch uses the wrong medium for the work -- The sketch contains implementation code rather than pseudo-code -- The non-prescriptive framing is missing or weak -- The sketch does not connect to the key technical decisions or implementation units - -**High-Level Technical Design (when absent)** *(Standard or Deep plans only)* -- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle -- Key technical decisions would be easier to validate with a visual or pseudo-code representation -- The approach section of implementation units is thin and a higher-level technical design would provide context - -**Implementation Units** -- Dependency order is unclear or likely wrong -- File paths or test file paths are missing where they should be explicit -- Units are too large, too vague, or broken into micro-steps -- Approach notes are thin or do not name the pattern to follow -- Test scenarios are vague (don't name inputs and expected outcomes), skip applicable categories (e.g., no error paths for a unit with failure modes, no integration scenarios for a unit crossing layers), or are disproportionate to the unit's complexity -- Feature-bearing units have blank or missing test scenarios (feature-bearing units require actual test scenarios; the `Test expectation: none` annotation is only valid for non-feature-bearing units) -- Verification outcomes are vague or not expressed as observable results - -**System-Wide Impact** -- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing -- Failure propagation is underexplored -- State lifecycle, caching, or data integrity risks are absent where relevant -- Integration coverage is weak for cross-layer work - -**Risks & Dependencies / Documentation / 
Operational Notes** -- Risks are listed without mitigation -- Rollout, monitoring, migration, or support implications are missing when warranted -- External dependency assumptions are weak or unstated -- Security, privacy, performance, or data risks are absent where they obviously apply - -Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap. - -##### 5.3.4 Report and Dispatch Targeted Research - -Before dispatching agents, report what sections are being strengthened and why: - -```text -Strengthening [section names] — [brief reason for each, e.g., "decision rationale is thin", "cross-boundary effects aren't mapped"] -``` - -For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**. - -Use fully-qualified agent names inside Task calls. 
- -**Deterministic Section-to-Agent Mapping:** - -**Requirements Trace / Open Questions classification** -- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps -- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks - -**Context & Research / Sources & References gaps** -- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems -- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior -- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance -- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing - -**Key Technical Decisions** -- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs -- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence - -**High-Level Technical Design** -- `compound-engineering:review:architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps -- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions -- Add `compound-engineering:research:best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation - -**Implementation Units / Verification** -- `compound-engineering:research:repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues -- 
`compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns -- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness - -**System-Wide Impact** -- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact -- Add the specific specialist that matches the risk: - - `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis - - `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review - - `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks - -**Risks & Dependencies / Operational Notes** -- Use the specialist that matches the actual risk: - - `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk - - `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries - - `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk - - `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification - - `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns - -**Agent Prompt Shape:** - -For each selected section, pass: -- The scope prefix from the mapping above when the agent supports scoped invocation -- A short plan summary -- The exact section text -- Why the section was selected, including which checklist triggers fired -- The plan depth and risk profile -- A specific question to answer - -Instruct the agent to return: -- findings that change planning quality -- stronger rationale, sequencing, verification, risk 
treatment, or references -- no implementation code -- no shell commands - -##### 5.3.5 Choose Research Execution Mode - -Use the lightest mode that will work: - -- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline. -- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure. - -Signals that justify artifact-backed mode: -- More than 5 agents are likely to return meaningful findings -- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful -- The topic is high-risk and likely to attract bulky source-backed analysis - -If artifact-backed mode is not clearly warranted, stay in direct mode. - -Artifact-backed mode uses a per-run scratch directory under `.context/compound-engineering/ce-plan/deepen/`. - -##### 5.3.6 Run Targeted Research - -Launch the selected agents in parallel using the execution mode chosen above. If the current platform does not support parallel dispatch, run them sequentially instead. - -Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources. - -If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents. - -**Direct mode:** Have each selected agent return its findings directly to the parent. Keep the return payload focused: strongest findings only, the evidence or sources that matter, the concrete planning improvement implied by the finding. - -**Artifact-backed mode:** For each selected agent, instruct it to write one compact artifact file in the scratch directory and return only a short completion summary. Each artifact should contain: target section, why selected, 3-7 findings, source-backed rationale, the specific plan change implied by each finding. 
No implementation code, no shell commands. - -If an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section. - -If agent outputs conflict: -- Prefer repo-grounded and origin-grounded evidence over generic advice -- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior -- If a real tradeoff remains, record it explicitly in the plan - -##### 5.3.6b Interactive Finding Review (Interactive Mode Only) - -Skip this step in auto mode — proceed directly to 5.3.7. - -In interactive mode, present each agent's findings to the user before integration. For each agent that returned findings: - -1. **Summarize the agent and its target section** — e.g., "The architecture-strategist reviewed Key Technical Decisions and found:" -2. **Present the findings concisely** — bullet the key points, not the raw agent output. Include enough context for the user to evaluate: what the agent found, what evidence supports it, and what plan change it implies. -3. **Ask the user** using the platform's blocking question tool when available (see Interaction Method): - - **Accept** — integrate these findings into the plan - - **Reject** — discard these findings entirely - - **Discuss** — the user wants to talk through the findings before deciding - -If the user chooses "Discuss", engage in brief dialogue about the findings and then re-ask with only accept/reject (no discuss option on the second ask). The user makes a deliberate choice either way. - -When presenting findings from multiple agents targeting the same section, present them one agent at a time so the user can make independent decisions. Do not merge findings from different agents before showing them. - -After all agents have been reviewed, carry only the accepted findings forward to 5.3.7. - -If the user accepted no findings, report "No findings accepted — plan unchanged." 
If artifact-backed mode was used, clean up the scratch directory before continuing. Then proceed directly to Phase 5.4 (skip document-review and synthesis — the plan was not modified). This interactive-mode-only skip does not apply in auto mode; auto mode always proceeds through 5.3.7 and 5.3.8. - -If findings were accepted and the plan was modified, proceed through 5.3.7 and 5.3.8 as normal — document-review acts as a quality gate on the changes. - -##### 5.3.7 Synthesize and Update the Plan - -Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure. - -**In interactive mode:** Only integrate findings the user accepted in 5.3.6b. If some findings from different agents touch the same section, reconcile them coherently but do not reintroduce rejected findings. - -Allowed changes: -- Clarify or strengthen decision rationale -- Tighten requirements trace or origin fidelity -- Reorder or split implementation units when sequencing is weak -- Add missing pattern references, file/test paths, or verification outcomes -- Expand system-wide impact, risks, or rollout treatment where justified -- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change -- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak -- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious -- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved - -Do **not**: -- Add implementation code — no imports, exact method signatures, or framework-specific syntax. 
Pseudo-code sketches and DSL grammars are allowed -- Add git commands, commit choreography, or exact test command recipes -- Add generic `Research Insights` subsections everywhere -- Rewrite the entire plan from scratch -- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly - -If research reveals a product-level ambiguity that should change behavior or scope: -- Do not silently decide it here -- Record it under `Open Questions` -- Recommend `ce:brainstorm` if the gap is truly product-defining - -##### 5.3.8 Document Review - -After the confidence check (and any deepening), run the `document-review` skill on the plan file. Pass the plan path as the argument. When this step is reached, it is mandatory — do not skip it because the confidence check already ran. The two tools catch different classes of issues. - -The confidence check and document-review are complementary: -- The confidence check strengthens rationale, sequencing, risk treatment, and grounding -- Document-review checks coherence, feasibility, scope alignment, and surfaces role-specific issues - -If document-review returns findings that were auto-applied, note them briefly when presenting handoff options. If residual P0/P1 findings were surfaced, mention them so the user can decide whether to address them before proceeding. - -When document-review returns "Review complete", proceed to Final Checks. - -**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, run `document-review` with `mode:headless` and the plan path. Headless mode applies auto-fixes silently and returns structured findings without interactive prompts. Address any P0/P1 findings before returning control to the caller. 
- -##### 5.3.9 Final Checks and Cleanup - -Before proceeding to post-generation options: -- Confirm the plan is stronger in specific ways, not merely longer -- Confirm the planning boundary is intact -- Confirm origin decisions were preserved when an origin document exists - -If artifact-backed mode was used: -- Clean up the temporary scratch directory after the plan is safely updated -- If cleanup is not practical on the current platform, note where the artifacts were left - -#### 5.4 Post-Generation Options - -**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, skip the interactive menu below and return control to the caller immediately. The plan file has already been written, the confidence check has already run, and document-review has already run — the caller (e.g., lfg, slfg) determines the next step. - -After document-review completes, present the options using the platform's blocking question tool when available (see Interaction Method). Otherwise present numbered options in chat and wait for the user's reply before proceeding. - -**Question:** "Plan ready at `docs/plans/YYYY-MM-DD-NNN---plan.md`. What would you like to do next?" - -**Options:** -1. **Start `/ce:work`** - Begin implementing this plan in the current environment (recommended) -2. **Open plan in editor** - Open the plan file for review -3. **Run additional document review** - Another pass for further refinement -4. **Share to Proof** - Upload the plan for collaborative review and sharing -5. **Start `/ce:work` in another session** - Begin implementing in a separate agent session when the current platform supports it -6. 
**Create Issue** - Create an issue in the configured tracker - -Based on selection: -- **Open plan in editor** → Open `docs/plans/.md` using the current platform's file-open or editor mechanism (e.g., `open` on macOS, `xdg-open` on Linux, or the IDE's file-open API) -- **Run additional document review** → Load the `document-review` skill with the plan path for another pass -- **Share to Proof** → Upload the plan: - ```bash - CONTENT=$(cat docs/plans/.md) - TITLE="Plan: " - RESPONSE=$(curl -s -X POST https://www.proofeditor.ai/share/markdown \ - -H "Content-Type: application/json" \ - -d "$(jq -n --arg title "$TITLE" --arg markdown "$CONTENT" --arg by "ai:compound" '{title: $title, markdown: $markdown, by: $by}')") - PROOF_URL=$(echo "$RESPONSE" | jq -r '.tokenUrl') - ``` - Display `View & collaborate in Proof: ` if successful, then return to the options -- **`/ce:work`** → Call `/ce:work` with the plan path -- **`/ce:work` in another session** → If the current platform supports launching a separate agent session, start `/ce:work` with the plan path there. Otherwise, explain the limitation briefly and offer to run `/ce:work` in the current session instead. -- **Create Issue** → Follow the Issue Creation section below -- **Other** → Accept free text for revisions and loop back to options - -## Issue Creation - -When the user selects "Create Issue", detect their project tracker from `AGENTS.md` or, if needed for compatibility, `CLAUDE.md`: - -1. Look for `project_tracker: github` or `project_tracker: linear` -2. If GitHub: +##### 5.3.3–5.3.7 Deepening Execution - ```bash - gh issue create --title ": " --body-file <plan_path> - ``` +When deepening is warranted, read `references/deepening-workflow.md` for confidence scoring checklists, section-to-agent dispatch mapping, execution mode selection, research execution, interactive finding review, and plan synthesis instructions. Execute steps 5.3.3 through 5.3.7 from that file, then return here for 5.3.8. -3. 
If Linear: +##### 5.3.8–5.4 Document Review, Final Checks, and Post-Generation Options - ```bash - linear issue create --title "<title>" --description "$(cat <plan_path>)" - ``` - -4. If no tracker is configured: - - Ask which tracker they use using the platform's blocking question tool when available (see Interaction Method) - - Suggest adding the tracker to `AGENTS.md` for future runs - -After issue creation: -- Display the issue URL -- Ask whether to proceed to `/ce:work` +When reaching this phase, read `references/plan-handoff.md` for document review instructions (5.3.8), final checks and cleanup (5.3.9), post-generation options menu (5.4), and issue creation. Do not load this file earlier. Document review is mandatory — do not skip it even if the confidence check already ran. NEVER CODE! Research, decide, and write the plan. diff --git a/plugins/compound-engineering/skills/ce-plan/references/deepening-workflow.md b/plugins/compound-engineering/skills/ce-plan/references/deepening-workflow.md new file mode 100644 index 00000000..85ad6928 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-plan/references/deepening-workflow.md @@ -0,0 +1,238 @@ +# Deepening Workflow + +This file contains the confidence-check execution path (5.3.3-5.3.7). Load it only when the deepening gate at 5.3.2 determines that deepening is warranted. + +## 5.3.3 Score Confidence Gaps + +Use a checklist-first, risk-weighted scoring pass. 
+ +For each section, compute: +- **Trigger count** - number of checklist problems that apply +- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk +- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans + +Treat a section as a candidate if: +- it hits **2+ total points**, or +- it hits **1+ point** in a high-risk domain and the section is materially important + +Choose only the top **2-5** sections by score. If deepening a lightweight plan (high-risk exception), cap at **1-2** sections. + +If the plan already has a `deepened:` date: +- Prefer sections that have not yet been substantially strengthened, if their scores are comparable +- Revisit an already-deepened section only when it still scores clearly higher than alternatives + +**Section Checklists:** + +**Requirements Trace** +- Requirements are vague or disconnected from implementation units +- Success criteria are missing or not reflected downstream +- Units do not clearly advance the traced requirements +- Origin requirements are not clearly carried forward + +**Context & Research / Sources & References** +- Relevant repo patterns are named but never used in decisions or implementation units +- Cited learnings or references do not materially shape the plan +- High-risk work lacks appropriate external or internal grounding +- Research is generic instead of tied to this repo or this plan + +**Key Technical Decisions** +- A decision is stated without rationale +- Rationale does not explain tradeoffs or rejected alternatives +- The decision does not connect back to scope, requirements, or origin context +- An obvious design fork exists but the plan never addresses why one path won + +**Open Questions** +- Product blockers are hidden as assumptions +- Planning-owned questions are incorrectly deferred to implementation +- Resolved 
questions have no clear basis in repo context, research, or origin decisions +- Deferred items are too vague to be useful later + +**High-Level Technical Design (when present)** +- The sketch uses the wrong medium for the work +- The sketch contains implementation code rather than pseudo-code +- The non-prescriptive framing is missing or weak +- The sketch does not connect to the key technical decisions or implementation units + +**High-Level Technical Design (when absent)** *(Standard or Deep plans only)* +- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle +- Key technical decisions would be easier to validate with a visual or pseudo-code representation +- The approach section of implementation units is thin and a higher-level technical design would provide context + +**Implementation Units** +- Dependency order is unclear or likely wrong +- File paths or test file paths are missing where they should be explicit +- Units are too large, too vague, or broken into micro-steps +- Approach notes are thin or do not name the pattern to follow +- Test scenarios are vague (don't name inputs and expected outcomes), skip applicable categories (e.g., no error paths for a unit with failure modes, no integration scenarios for a unit crossing layers), or are disproportionate to the unit's complexity +- Feature-bearing units have blank or missing test scenarios (feature-bearing units require actual test scenarios; the `Test expectation: none` annotation is only valid for non-feature-bearing units) +- Verification outcomes are vague or not expressed as observable results + +**System-Wide Impact** +- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing +- Failure propagation is underexplored +- State lifecycle, caching, or data integrity risks are absent where relevant +- Integration coverage is weak for cross-layer work + +**Risks & Dependencies / Documentation / 
Operational Notes** +- Risks are listed without mitigation +- Rollout, monitoring, migration, or support implications are missing when warranted +- External dependency assumptions are weak or unstated +- Security, privacy, performance, or data risks are absent where they obviously apply + +Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap. + +## 5.3.4 Report and Dispatch Targeted Research + +Before dispatching agents, report what sections are being strengthened and why: + +```text +Strengthening [section names] — [brief reason for each, e.g., "decision rationale is thin", "cross-boundary effects aren't mapped"] +``` + +For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**. + +Use fully-qualified agent names inside Task calls. 
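+
+For example, a dispatch might be shaped like this (a sketch only: the exact Task tool signature varies by platform, and the section, trigger, and question shown are hypothetical):
+
+```text
+Task(
+  subagent_type: "compound-engineering:review:architecture-strategist",
+  description: "Strengthen Key Technical Decisions",
+  prompt: "Plan summary: <summary>. Section text: <exact section text>.
+           Trigger fired: decision stated without rationale.
+           Plan depth: Standard; risk profile: <profile>.
+           Question: what tradeoffs justify this decision over alternatives?"
+)
+```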
+ +**Deterministic Section-to-Agent Mapping:** + +**Requirements Trace / Open Questions classification** +- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps +- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks + +**Context & Research / Sources & References gaps** +- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems +- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior +- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance +- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing + +**Key Technical Decisions** +- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs +- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence + +**High-Level Technical Design** +- `compound-engineering:review:architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps +- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions +- Add `compound-engineering:research:best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation + +**Implementation Units / Verification** +- `compound-engineering:research:repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues +- 
`compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns +- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness + +**System-Wide Impact** +- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact +- Add the specific specialist that matches the risk: + - `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis + - `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review + - `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks + +**Risks & Dependencies / Operational Notes** +- Use the specialist that matches the actual risk: + - `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk + - `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries + - `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk + - `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification + - `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns + +**Agent Prompt Shape:** + +For each selected section, pass: +- The scope prefix from the mapping above when the agent supports scoped invocation +- A short plan summary +- The exact section text +- Why the section was selected, including which checklist triggers fired +- The plan depth and risk profile +- A specific question to answer + +Instruct the agent to return: +- findings that change planning quality +- stronger rationale, sequencing, verification, risk 
treatment, or references +- no implementation code +- no shell commands + +## 5.3.5 Choose Research Execution Mode + +Use the lightest mode that will work: + +- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline. +- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure. + +Signals that justify artifact-backed mode: +- More than 5 agents are likely to return meaningful findings +- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful +- The topic is high-risk and likely to attract bulky source-backed analysis + +If artifact-backed mode is not clearly warranted, stay in direct mode. + +Artifact-backed mode uses a per-run scratch directory under `.context/compound-engineering/ce-plan/deepen/`. + +## 5.3.6 Run Targeted Research + +Launch the selected agents in parallel using the execution mode chosen above. If the current platform does not support parallel dispatch, run them sequentially instead. + +Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources. + +If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents. + +**Direct mode:** Have each selected agent return its findings directly to the parent. Keep the return payload focused: strongest findings only, the evidence or sources that matter, the concrete planning improvement implied by the finding. + +**Artifact-backed mode:** For each selected agent, instruct it to write one compact artifact file in the scratch directory and return only a short completion summary. Each artifact should contain: target section, why selected, 3-7 findings, source-backed rationale, the specific plan change implied by each finding. 
No implementation code, no shell commands. + +If an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section. + +If agent outputs conflict: +- Prefer repo-grounded and origin-grounded evidence over generic advice +- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior +- If a real tradeoff remains, record it explicitly in the plan + +## 5.3.6b Interactive Finding Review (Interactive Mode Only) + +Skip this step in auto mode — proceed directly to 5.3.7. + +In interactive mode, present each agent's findings to the user before integration. For each agent that returned findings: + +1. **Summarize the agent and its target section** — e.g., "The architecture-strategist reviewed Key Technical Decisions and found:" +2. **Present the findings concisely** — bullet the key points, not the raw agent output. Include enough context for the user to evaluate: what the agent found, what evidence supports it, and what plan change it implies. +3. **Ask the user** using the platform's blocking question tool when available (see Interaction Method): + - **Accept** — integrate these findings into the plan + - **Reject** — discard these findings entirely + - **Discuss** — the user wants to talk through the findings before deciding + +If the user chooses "Discuss", engage in brief dialogue about the findings and then re-ask with only accept/reject (no discuss option on the second ask). The user makes a deliberate choice either way. + +When presenting findings from multiple agents targeting the same section, present them one agent at a time so the user can make independent decisions. Do not merge findings from different agents before showing them. + +After all agents have been reviewed, carry only the accepted findings forward to 5.3.7. + +If the user accepted no findings, report "No findings accepted — plan unchanged." 
If artifact-backed mode was used, clean up the scratch directory before continuing. Then proceed directly to Phase 5.4 (skip document-review and synthesis — the plan was not modified). This interactive-mode-only skip does not apply in auto mode; auto mode always proceeds through 5.3.7 and 5.3.8. + +If findings were accepted and the plan was modified, proceed through 5.3.7 and 5.3.8 as normal — document-review acts as a quality gate on the changes. + +## 5.3.7 Synthesize and Update the Plan + +Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure. + +**In interactive mode:** Only integrate findings the user accepted in 5.3.6b. If some findings from different agents touch the same section, reconcile them coherently but do not reintroduce rejected findings. + +Allowed changes: +- Clarify or strengthen decision rationale +- Tighten requirements trace or origin fidelity +- Reorder or split implementation units when sequencing is weak +- Add missing pattern references, file/test paths, or verification outcomes +- Expand system-wide impact, risks, or rollout treatment where justified +- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change +- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak +- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious +- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved + +Do **not**: +- Add implementation code — no imports, exact method signatures, or framework-specific syntax. 
Pseudo-code sketches and DSL grammars are allowed +- Add git commands, commit choreography, or exact test command recipes +- Add generic `Research Insights` subsections everywhere +- Rewrite the entire plan from scratch +- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly + +If research reveals a product-level ambiguity that should change behavior or scope: +- Do not silently decide it here +- Record it under `Open Questions` +- Recommend `ce:brainstorm` if the gap is truly product-defining diff --git a/plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md b/plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md new file mode 100644 index 00000000..3b95ff65 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md @@ -0,0 +1,87 @@ +# Plan Handoff + +This file contains post-plan-writing instructions: document review, post-generation options, and issue creation. Load it after the plan file has been written and the confidence check (5.3.1-5.3.7) is complete. + +## 5.3.8 Document Review + +After the confidence check (and any deepening), run the `document-review` skill on the plan file. Pass the plan path as the argument. When this step is reached, it is mandatory — do not skip it because the confidence check already ran. The two tools catch different classes of issues. + +The confidence check and document-review are complementary: +- The confidence check strengthens rationale, sequencing, risk treatment, and grounding +- Document-review checks coherence, feasibility, scope alignment, and surfaces role-specific issues + +If document-review returns findings that were auto-applied, note them briefly when presenting handoff options. If residual P0/P1 findings were surfaced, mention them so the user can decide whether to address them before proceeding. + +When document-review returns "Review complete", proceed to Final Checks. 
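As an illustration, a handoff note after document-review might read as follows (the findings and severities are hypothetical):

```text
Document review auto-applied 2 fixes: tightened Unit 3's verification outcome
and repaired a broken cross-reference. 1 residual P1 remains: the migration
unit lists a rollback risk without mitigation. Address it before proceeding?
```

Keep the note to a few lines: what was auto-applied, what residual P0/P1 findings remain, and the decision the user faces.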
+ +**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, run `document-review` with `mode:headless` and the plan path. Headless mode applies auto-fixes silently and returns structured findings without interactive prompts. Address any P0/P1 findings before returning control to the caller. + +## 5.3.9 Final Checks and Cleanup + +Before proceeding to post-generation options: +- Confirm the plan is stronger in specific ways, not merely longer +- Confirm the planning boundary is intact +- Confirm origin decisions were preserved when an origin document exists + +If artifact-backed mode was used: +- Clean up the temporary scratch directory after the plan is safely updated +- If cleanup is not practical on the current platform, note where the artifacts were left + +## 5.4 Post-Generation Options + +**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, skip the interactive menu below and return control to the caller immediately. The plan file has already been written, the confidence check has already run, and document-review has already run — the caller (e.g., lfg, slfg) determines the next step. + +After document-review completes, present the options using the platform's blocking question tool when available (see Interaction Method). Otherwise present numbered options in chat and wait for the user's reply before proceeding. + +**Question:** "Plan ready at `docs/plans/YYYY-MM-DD-NNN-<type>-<name>-plan.md`. What would you like to do next?" + +**Options:** +1. **Start `/ce:work`** - Begin implementing this plan in the current environment (recommended) +2. **Open plan in editor** - Open the plan file for review +3. **Run additional document review** - Another pass for further refinement +4. **Share to Proof** - Upload the plan for collaborative review and sharing +5. 
**Start `/ce:work` in another session** - Begin implementing in a separate agent session when the current platform supports it +6. **Create Issue** - Create an issue in the configured tracker + +Based on selection: +- **Open plan in editor** -> Open `docs/plans/<plan_filename>.md` using the current platform's file-open or editor mechanism (e.g., `open` on macOS, `xdg-open` on Linux, or the IDE's file-open API) +- **Run additional document review** -> Load the `document-review` skill with the plan path for another pass +- **Share to Proof** -> Upload the plan: + ```bash + CONTENT=$(cat docs/plans/<plan_filename>.md) + TITLE="Plan: <plan title from frontmatter>" + RESPONSE=$(curl -s -X POST https://www.proofeditor.ai/share/markdown \ + -H "Content-Type: application/json" \ + -d "$(jq -n --arg title "$TITLE" --arg markdown "$CONTENT" --arg by "ai:compound" '{title: $title, markdown: $markdown, by: $by}')") + PROOF_URL=$(echo "$RESPONSE" | jq -r '.tokenUrl') + ``` + Display `View & collaborate in Proof: <PROOF_URL>` if successful, then return to the options +- **`/ce:work`** -> Call `/ce:work` with the plan path +- **`/ce:work` in another session** -> If the current platform supports launching a separate agent session, start `/ce:work` with the plan path there. Otherwise, explain the limitation briefly and offer to run `/ce:work` in the current session instead. +- **Create Issue** -> Follow the Issue Creation section below +- **Other** -> Accept free text for revisions and loop back to options + +## Issue Creation + +When the user selects "Create Issue", detect their project tracker from `AGENTS.md` or, if needed for compatibility, `CLAUDE.md`: + +1. Look for `project_tracker: github` or `project_tracker: linear` +2. If GitHub: + + ```bash + gh issue create --title "<type>: <title>" --body-file <plan_path> + ``` + +3. If Linear: + + ```bash + linear issue create --title "<title>" --description "$(cat <plan_path>)" + ``` + +4. 
If no tracker is configured: + - Ask which tracker they use via the platform's blocking question tool when available (see Interaction Method) + - Suggest adding the tracker to `AGENTS.md` for future runs + +After issue creation: +- Display the issue URL +- Ask whether to proceed to `/ce:work` diff --git a/plugins/compound-engineering/skills/ce-plan/references/visual-communication.md b/plugins/compound-engineering/skills/ce-plan/references/visual-communication.md new file mode 100644 index 00000000..3b11e297 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-plan/references/visual-communication.md @@ -0,0 +1,31 @@ +# Visual Communication in Plan Documents + +Section 3.4 covers diagrams about the *solution being planned* (pseudo-code, mermaid sequences, state diagrams). The existing Section 4.3 mermaid rule encourages those solution-design diagrams within Technical Design and per-unit fields. This guidance covers a different concern: visual aids that help readers *navigate and comprehend the plan document itself* -- dependency graphs, interaction diagrams, and comparison tables that make plan structure scannable. + +Visual aids are conditional on content patterns, not on plan depth classification -- a Lightweight plan about a complex multi-unit workflow may warrant a dependency graph; a Deep plan about a straightforward feature may not. + +**When to include:** + +| Plan describes... 
| Visual aid | Placement | +|---|---|---| +| 4+ implementation units with non-linear dependencies (parallelism, diamonds, fan-in/fan-out) | Mermaid dependency graph | Before or after the Implementation Units heading | +| System-Wide Impact naming 3+ interacting surfaces or cross-layer effects | Mermaid interaction or component diagram | Within the System-Wide Impact section | +| Problem/Overview involving 3+ behavioral modes, states, or variants | Markdown comparison table | Within Overview or Problem Frame | +| Key Technical Decisions with 3+ interacting decisions, or Alternative Approaches with 3+ alternatives | Markdown comparison table | Within the relevant section | + +**When to skip:** +- The plan has 3 or fewer units in a straight dependency chain -- the Dependencies field on each unit is sufficient +- Prose already communicates the relationships clearly +- The visual would duplicate what the High-Level Technical Design section already shows +- The visual describes code-level detail (specific method names, SQL columns, API field lists) + +**Format selection:** +- **Mermaid** (default) for dependency graphs and interaction diagrams -- 5-15 nodes, no in-box annotations, standard flowchart shapes. Use `TB` (top-to-bottom) direction so diagrams stay narrow in both rendered and source form. Source should be readable as fallback in diff views and terminals. +- **ASCII/box-drawing diagrams** for annotated flows that need rich in-box content -- file path layouts, decision logic branches, multi-column spatial arrangements. More expressive than mermaid when the diagram's value comes from annotations within nodes. Follow 80-column max for code blocks, use vertical stacking. +- **Markdown tables** for mode/variant comparisons and decision/approach comparisons. +- Keep diagrams proportionate to the plan. A 6-unit linear chain gets a simple 6-node graph. A complex dependency graph with fan-out and fan-in may need 10-15 nodes -- that is fine if every node earns its place. 
+- Place inline at the point of relevance, not in a separate section. +- Plan-structure level only -- unit dependencies, component interactions, mode comparisons, impact surfaces. Not implementation architecture, data schemas, or code structure (those belong in Section 3.4). +- Prose is authoritative: when a visual aid and its surrounding prose disagree, the prose governs. + +After generating a visual aid, verify it accurately represents the plan sections it illustrates -- correct dependency edges, no missing surfaces, no merged units. diff --git a/tests/pipeline-review-contract.test.ts b/tests/pipeline-review-contract.test.ts index 91b138f2..eeed1cd2 100644 --- a/tests/pipeline-review-contract.test.ts +++ b/tests/pipeline-review-contract.test.ts @@ -118,10 +118,11 @@ describe("ce:plan testing contract", () => { describe("ce:plan review contract", () => { test("requires document review after confidence check", async () => { - const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/SKILL.md") + // Document review instructions extracted to references/plan-handoff.md + const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md") // Phase 5.3.8 runs document-review before final checks (5.3.9) - expect(content).toContain("##### 5.3.8 Document Review") + expect(content).toContain("## 5.3.8 Document Review") expect(content).toContain("`document-review` skill") // Document review must come before final checks so auto-applied edits are validated @@ -130,16 +131,24 @@ describe("ce:plan review contract", () => { expect(docReviewIdx).toBeLessThan(finalChecksIdx) }) - test("uses headless mode in pipeline context", async () => { + test("SKILL.md stub points to plan-handoff reference", async () => { const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/SKILL.md") + // Stub references the handoff file and marks document review as mandatory + 
expect(content).toContain("`references/plan-handoff.md`") + expect(content).toContain("Document review is mandatory") + }) + + test("uses headless mode in pipeline context", async () => { + const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md") + // Pipeline mode runs document-review headlessly, not skipping it expect(content).toContain("document-review` with `mode:headless`") expect(content).not.toContain("skip document-review and return control") }) test("handoff options recommend ce:work after review", async () => { - const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/SKILL.md") + const content = await readRepoFile("plugins/compound-engineering/skills/ce-plan/references/plan-handoff.md") // ce:work is recommended (review already happened) expect(content).toContain("**Start `/ce:work`** - Begin implementing this plan in the current environment (recommended)")