From 7f35df35632a1eb9d00b04b88817b4f5c69eb59e Mon Sep 17 00:00:00 2001 From: Oriol Torrent Florensa Date: Fri, 22 May 2026 18:46:56 +0200 Subject: [PATCH] =?UTF-8?q?docs(pr-review):=20plan=20specs=2012=E2=80=9316?= =?UTF-8?q?=20and=20ADRs=200016=E2=80=930018?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Spec 12 (P0): fold thread fetch + mode detection into ADO Fetcher, fixing the broken `az repos pr thread list` Step 4. Spec 14: ADO CLI preflight + smoke test with inventory completeness check. Specs 13, 15, 16: dry-run mode, fan-out resilience Notice, aspect-selection observability — deferred to follow-up PR. ADR 0016 absorbs two amendments: ADR 0013 carve-out removal and ADR 0015 thread-fetch ABORT exemption. CONTEXT.md's ADO Fetcher definition widened to reflect new responsibility. Refs #120 Co-Authored-By: Claude Sonnet 4.6 --- apps/claude-code/pr-review/CONTEXT.md | 12 +- .../adr/0015-canonical-http-tier-mapping.md | 5 +- ...0016-fold-thread-fetch-into-ado-fetcher.md | 55 ++++++ .../adr/0017-dry-run-as-fourth-peer-mode.md | 46 +++++ .../0018-fan-out-resilience-trust-harness.md | 60 ++++++ apps/claude-code/pr-review/docs/adr/README.md | 36 ++-- ...ix-step4-fold-thread-fetch-into-fetcher.md | 186 ++++++++++++++++++ .../docs/plans/13-formal-dry-run-mode.md | 162 +++++++++++++++ .../14-ado-cli-preflight-and-smoke-test.md | 121 ++++++++++++ ...5-fan-out-resilience-agent-spawn-notice.md | 124 ++++++++++++ .../16-aspect-selection-observability.md | 142 +++++++++++++ .../pr-review/docs/plans/README.md | 5 + 12 files changed, 933 insertions(+), 21 deletions(-) create mode 100644 apps/claude-code/pr-review/docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md create mode 100644 apps/claude-code/pr-review/docs/adr/0017-dry-run-as-fourth-peer-mode.md create mode 100644 apps/claude-code/pr-review/docs/adr/0018-fan-out-resilience-trust-harness.md create mode 100644 apps/claude-code/pr-review/docs/plans/12-fix-step4-fold-thread-fetch-into-fetcher.md create mode 100644 apps/claude-code/pr-review/docs/plans/13-formal-dry-run-mode.md create mode 100644 apps/claude-code/pr-review/docs/plans/14-ado-cli-preflight-and-smoke-test.md create mode 100644 apps/claude-code/pr-review/docs/plans/15-fan-out-resilience-agent-spawn-notice.md create mode 100644 apps/claude-code/pr-review/docs/plans/16-aspect-selection-observability.md diff --git a/apps/claude-code/pr-review/CONTEXT.md b/apps/claude-code/pr-review/CONTEXT.md index 1b896d68..7f534924 100644 --- a/apps/claude-code/pr-review/CONTEXT.md +++ b/apps/claude-code/pr-review/CONTEXT.md @@ -91,10 +91,14 @@ _Avoid_: initial review, fresh review A Review run against an ADO PR where a prior **Bot Signature** is found in the PR's threads. Focuses on commits since the last Review, performs Thread Classification, and replies to or resolves existing Review Threads rather than duplicating them. _Avoid_: incremental review, follow-up review, second pass +**Dry-run mode**: +A Review run against an ADO PR with the `--dry-run` flag. Identical to first-review / re-review for every read-side step — **ADO Fetcher**, **Doc Context Orchestrator**, all Review Aspect agents, and (when prior signature is found) the **Re-review Coordinator's** Thread Classification — but the **ADO Writer** is never invoked. Findings are rendered in the Claude interface using the same severity-grouped format as Pre-PR mode. Used to preview what a Review or Re-review would post before committing to write-back. +_Avoid_: preview review, simulated review, no-post review + ### Orchestration agents **ADO Fetcher**: -A plugin agent that retrieves PR metadata, iterations, changed files, and the raw diff from Azure DevOps. Used by first-review and re-review modes; not invoked in pre-PR mode. +A plugin agent that retrieves PR metadata, iterations, PR threads, changed files, and the raw diff from Azure DevOps — and determines the Review mode from the thread data. All ADO read operations for a PR review are owned here; the orchestrator makes no inline ADO read calls. _Avoid_: fetcher, data agent, ADO client **Re-review Coordinator**: @@ -158,9 +162,9 @@ A single end-of-run line printed by the orchestrator to the Claude interface, re - A **Doc Context** is assembled via a three-tier pipeline: the **Doc Context Orchestrator** spawns **Work Item Summarizer** and **Confluence Fetcher** agents (Doc Context Sub-agents) in parallel, then delegates their outputs to the **Doc Context Synthesizer**, which produces the final `DOC_CONTEXT` narrative injected into every Review Aspect agent - A **Doc Context Sub-agent** operates on a single source (work item or Confluence page) and receives the changed files list and the local diff when available - The **Doc Context Orchestrator** returns the **Doc Context Synthesizer**'s output verbatim; it does not rewrite or reformat the narrative -- The **ADO Fetcher** is invoked by first-review and re-review modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents -- The **Re-review Coordinator** is invoked only when the mode is re-review; first-review and pre-PR modes never load it -- The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** does not write back to ADO +- The **ADO Fetcher** is invoked by first-review, re-review, and dry-run modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents +- The **Re-review Coordinator** is invoked when the mode is re-review or a dry-run that detected a prior **Bot Signature**; first-review, pre-PR, and dry-run-on-fresh-PR modes never load it. In dry-run mode the Coordinator performs Thread Classification but its results feed only the rendered output — no Replies are posted +- The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** and **Dry-run mode** do not write back to ADO - Every operation in an orchestration agent terminates in one of the four **Notice Tiers**. **DEGRADED** and **EMPTY-BY-DESIGN**-with-message operations emit a **Notice** that flows from the agent's structured result block, through the orchestrator's merge step, into the **Review Summary** (for ADO modes) or the printed pre-findings block (for **Pre-PR mode**). The end-of-run **Trailer** carries Notice counts so the invoker sees them without opening the PR. ## Example dialogue diff --git a/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md b/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md index 282ab676..602b51db 100644 --- a/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md +++ b/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md @@ -1,6 +1,6 @@ # ADR 0015 — Canonical HTTP-Tier Mapping -**Status:** Accepted +**Status:** Accepted, amended by 0018 **Date:** 2026-05-13 **Deciders:** Oriol Torrent Florensa **Context:** ADR 0014 (failure-classification helper layer) @@ -55,6 +55,9 @@ Retries are not implemented. Reasons: Re-evaluate if 5xx Notices prove painful in practice; retries can be added behind the same Notice surface without changing the doctrine. +**Update (2026-05-14):** this no-retry policy is extended to agent-spawn failures during orchestrator +fan-out by ADR 0018. The reasoning is identical (AFK latency, retry-storm risk, accurate Notice). + ### Implementation The canonical mapping is implemented in `scripts/ado/classify-http-error.mjs`: diff --git a/apps/claude-code/pr-review/docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md b/apps/claude-code/pr-review/docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md new file mode 100644 index 00000000..f0b04537 --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md @@ -0,0 +1,55 @@ +# 0016. Fold thread fetch into ADO Fetcher + +**Status:** Accepted (2026-05) +**Context:** Amends ADR 0013 (orchestrator split for review-pr) +**Date:** 2026-05-14 + +## Context + +ADR 0013 split `review-pr.md` into a thin orchestrator and focused agents. As part of that split, ADR 0013 carved out **one allowed inline ADO call**: the mode-detection `az repos pr thread list` in the orchestrator's Step 4. The justification was that mode detection had to happen _before_ the ADO Fetcher could be launched, so the orchestrator needed to fetch and inspect threads itself. + +Two facts surfaced after shipping the split: + +1. **The carved-out command is invalid syntax.** `az repos pr thread list` does not exist in the `azure-devops` extension — valid `az repos pr` subcommands are `create / list / show / update / work-item / set-vote / reviewer / policy / checkout`. There is no `thread` group. The correct call is `az devops invoke --area git --resource pullRequestThreads --route-parameters …`. Every ADO PR review since the orchestrator-split shipped has been failing at Step 4. The first observed-in-the-wild failure is the 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt`. + +2. **The "fetch threads → derive mode" pair is naturally adjacent to the rest of the Fetcher's work.** The Fetcher already calls `az devops invoke` for `pullRequestIterations`, `pullRequestIterationChanges`, `pullRequestWorkItems`. Adding `pullRequestThreads` to that family keeps ADO read knowledge in one place — the agent that exists exactly to own it. + +The carve-out can go away if the Fetcher returns `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`, and `RAW_THREADS_JSON` in its structured result block. The orchestrator branches on those fields in Step 5 onwards — the same pattern it already uses for `WORK_ITEM_IDS`, `DIFF_RANGE`, etc. + +## Decision + +**Move thread fetching and mode detection from the orchestrator into the ADO Fetcher.** The Fetcher gains a new step (Step 2.5) between iterations fetch and changed-files fetch: + +1. Call `az devops invoke --area git --resource pullRequestThreads --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID"`. +2. Apply ADR 0015's HTTP-tier mapping: 401/403 → ABORTED; 404 → OK (treat as empty threads; equivalent to first-review); 5xx / network → DEGRADED with `kind: thread-fetch`, treat as empty threads so mode detection still produces `first-review`. +3. Run `detectMode` (from `scripts/mode-detection.mjs`, unchanged) on the response's `.value` array. The function returns `{ mode, isRereview, priorIterationId, summaryThreadId }`. +4. Append `RAW_THREADS_JSON`, `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID` to the Fetcher's result block. + +The orchestrator's Step 4 simplifies to a single `az repos pr show` call that captures PR metadata (`REPO_ID`, `PROJECT`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION`) and passes them into the Fetcher as inputs. The Fetcher no longer makes its own `az repos pr show` call — the data is already in its prompt. + +After this change, the orchestrator contains **no `az devops invoke` calls**. The only remaining inline `az` is `az --version`, `az extension list` (Step 3 preflight), and `az repos pr show` (Step 4 metadata). + +ADR 0013's carve-out (_"The one allowed inline ADO call is the mode-detection `az repos pr thread list` in the mode detection block"_) is **removed**. ADR 0013's core decision — thin orchestrator + focused agents — stands; only that one-sentence exception is superseded by this ADR. + +## Alternatives considered + +**Fix the inline command in Step 4 and leave it in the orchestrator.** Smallest change. Rejected because it preserves the architectural reason for the carve-out (mode detection must precede Fetcher launch) which doesn't actually exist — the orchestrator can equally well branch on Fetcher output. Keeping ADO knowledge in two places means future API changes require edits in two files. + +**Introduce a separate `mode-detector` agent that fetches threads.** Cleaner separation of concerns. Rejected because mode detection and the rest of the Fetcher's work share the same ADO call shape (`az devops invoke --area git --resource …`); a new agent would duplicate the auth/retry/error-classification setup the Fetcher already has. + +**Replace `az` with direct REST calls.** Long-discussed in ADR 0008's neighbourhood. Out of scope for this ADR; revisit if `az` proves a recurring failure source after spec 14's smoke test catches near-term drift. + +## Consequences + +- The plugin's mode-detection path actually works (Step 4 was previously broken on every run). +- ADO read knowledge is centralised in `ado-fetcher.md`. +- Adding new mode-affecting ADO data (e.g. a `mergeAttempt` field) requires editing one agent prompt. +- The Fetcher's result block grows by five lines — downstream prompts (Coordinator, Writer) parse them via existing string-extraction patterns, no helper change. +- ADR 0013's status line is updated to `Accepted (2026-05), amended by 0016`. + +## See also + +- ADR 0013 — Orchestrator split for review-pr (amended) +- ADR 0014 — Notice Tier doctrine (governs the new `kind: thread-fetch` DEGRADED path) +- ADR 0015 — Canonical HTTP-tier mapping (governs the 401/403/5xx behaviour for thread fetch) +- Spec 12 — Implementation of this ADR diff --git a/apps/claude-code/pr-review/docs/adr/0017-dry-run-as-fourth-peer-mode.md b/apps/claude-code/pr-review/docs/adr/0017-dry-run-as-fourth-peer-mode.md new file mode 100644 index 00000000..ca4d5b8b --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0017-dry-run-as-fourth-peer-mode.md @@ -0,0 +1,46 @@ +# 0017. Dry-run as a fourth peer operating mode + +**Status:** Accepted (2026-05) +**Date:** 2026-05-14 + +## Context + +The 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt` was invoked with the natural-language instruction _"Make a dry-run PR review. DO NOT POST ANY COMMENT TO THE PR! Only report inline"_. The plugin had no formal dry-run mechanism. The LLM-as-orchestrator improvised: inlined `az` data fetches (compounding the Step 4 bug), skipped the `pr-review:ado-fetcher` agent entirely, and never invoked the ADO Writer. The user got useful findings but only by accident — the documented spec was bypassed. + +Two design problems: + +1. **No deterministic dry-run path.** Asking the LLM to "not post comments" works most of the time but invites free-form deviation from the spec. A documented mode replaces persuasion with branching. +2. **Dry-run is genuinely useful and not just a debug helper.** Previewing what a re-review would post — before letting it edit threads in a customer-facing PR — is a legitimate user need. + +Two shapes for the formalisation: a fourth peer mode, or an orthogonal boolean flag. + +## Decision + +**Dry-run is a fourth peer mode, alongside pre-PR, first-review, and re-review.** CLI surface: `/pr-review:review-pr --dry-run`. The orchestrator's mode resolution applies the flag after URL parsing; `IS_REREVIEW` (whether a prior bot signature exists) is captured separately and drives Coordinator inclusion within dry-run mode. + +Dry-run runs every read-side step identically to first-review / re-review (preflight, metadata fetch, ADO Fetcher, Doc Context Orchestrator, Review Aspects fan-out, and Re-review Coordinator's Thread Classification when `IS_REREVIEW=true`). The **ADO Writer is never invoked**. Findings render to the Claude interface using pre-PR Step E's severity-grouped format. Notices from Fetcher and Coordinator render via `formatNoticesAsPrePrPreamble`. The Trailer reads `🔍 Dry-run complete: findings ( critical, important) · warning notices · would have posted to `. + +Coordinator runs in dry-run mode when `IS_REREVIEW=true`, but its reply-posting branch is short-circuited. Its `freshFindings` output still feeds `FINDINGS_JSON` so the user previews exactly the incremental finding set the real re-review would have posted. + +## Alternatives considered + +**Orthogonal `DRY_RUN` boolean flag.** Any mode (first-review / re-review) could be dry-run. Conceptually cleaner — dry-run is "do the work, don't post" — but produces awkward mode names internally (`first-review-dry-run`, `re-review-dry-run`) and forces every branch in the orchestrator to read two state variables instead of one. Rejected: peer-mode keeps the orchestrator's mode-switch readable as a single discriminator. + +**Document a "DO NOT POST" convention and trust the LLM.** Zero implementation cost. Rejected because it is exactly what failed on 2026-05-14: the LLM honoured the instruction by skipping not just Writer but also the entire structured fetch path, masking the Step 4 bug and producing inconsistent behaviour across runs. + +**Make dry-run a flag that disables only the Writer agent invocation.** Halfway between the other two. Rejected because the orchestrator's other branches (Step 8 trailer text, Notice rendering) still need to know whether to use the pre-PR rendering path or the ADO Summary path. A single mode value carries that decision uniformly. + +## Consequences + +- The orchestrator branches on `MODE ∈ {pre-pr, dry-run, first-review, re-review}` — four cases, no nested flags. +- Re-review-eligible PRs can be previewed safely: classification runs and reports its plan, but no replies are posted and no thread statuses are PATCHed. +- The Pre-PR rendering helpers (`formatNoticesAsPrePrPreamble`, severity-grouped finding print) are reused by dry-run — confirming that "render to interface" is a coherent capability, not a pre-PR-only quirk. +- The Trailer line gets a fourth shape (`🔍 Dry-run complete: ...`). `formatTrailer` in `scripts/ado/notices.mjs` grows one branch + test cases. +- `CONTEXT.md` gains a `Dry-run mode` term and updates three relationship lines documenting which agents run in which modes. + +## See also + +- Spec 13 — Implementation of this ADR +- Spec 12 — Step 4 fix (the upstream bug that exposed the no-formal-dry-run gap) +- ADR 0014 — Notice Tier doctrine (`formatTrailer`, `formatNoticesAsPrePrPreamble` are governed here) +- `CONTEXT.md` — `Dry-run mode` definition and the agent-invocation matrix diff --git a/apps/claude-code/pr-review/docs/adr/0018-fan-out-resilience-trust-harness.md b/apps/claude-code/pr-review/docs/adr/0018-fan-out-resilience-trust-harness.md new file mode 100644 index 00000000..3fb2815e --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0018-fan-out-resilience-trust-harness.md @@ -0,0 +1,60 @@ +# 0018. Fan-out resilience — trust the harness, surface partial failure as a Notice + +**Status:** Accepted (2026-05) +**Context:** Amends ADR 0015 (canonical HTTP-tier mapping — no-retry doctrine) +**Date:** 2026-05-14 + +## Context + +The 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt` produced a transient `API Error: The socket connection was closed unexpectedly` from the Claude Code HTTP client during Step 6's parallel agent fan-out. Step 6 is the plugin's largest concurrent spawn: Doc Context Orchestrator plus 2–5 review-aspect agents in a single message. + +The error printed in the output but all spawned agents completed normally. The Claude Code harness either auto-retried the launch transparently or the server-side launch had already been accepted before the client socket dropped. The user-visible outcome was a successful run that _looked_ broken. + +Two unaddressed risks: + +1. **Cosmetic confusion.** A scary API error appeared with no framing. AFK users skimming the output can't tell whether a partial failure occurred. +2. **Real partial-failure invisibility.** If a future fan-out _does_ lose one agent (e.g. the harness fails to recover), the orchestrator currently has no post-condition that detects the missing result block. The Review posts with N-1 aspects' findings and the user has no signal that coverage was reduced. + +ADR 0015 already forbids retries for HTTP failures in v1 with three reasons: (1) AFK latency, (2) retry-storm failure mode, (3) DEGRADED Notices carrying accurate information. All three reasons apply identically to agent-spawn failures during fan-out. This ADR extends ADR 0015's doctrine to the orchestrator's fan-out layer. + +## Decision + +**Trust the harness. Don't retry. Surface partial fan-out as a DEGRADED Notice.** + +After Step 6's fan-out completes, the orchestrator runs a post-condition check: every launched Review Aspect agent must have returned a parseable JSON findings array. Agents that return nothing (no result block, no JSON, empty stdout) are recorded as missing. + +If any aspects are missing, the orchestrator emits a single Notice: + +```js +{ severity: 'warning', kind: 'agent-spawn', + message: 'Review Aspects did not return findings: , . Coverage is reduced for this run.' } +``` + +The Notice flows through the same pipeline as Fetcher and Coordinator notices: merged via `mergeNotices`, rendered in the Review Summary (or pre-PR / dry-run preamble), counted in the Trailer's `warning notices` field. + +The Review still posts with whatever findings did arrive. No retry. No double-spawn risk. + +The `agent-spawn` kind is added to the `NoticeKind` enum in `scripts/ado/notices.mjs`. Per ADR 0014's invariant _"each `kind` has exactly one source agent"_, the source agent is **the orchestrator itself** — it is the only place that knows which agents were launched. + +## Alternatives considered + +**Retry missing agents sequentially after fan-out.** The orchestrator detects the missing ones and re-spawns them in a second pass. Rejected: the harness may already have launched them server-side (the dry-run example demonstrates the harness recovers transparently). Re-spawning risks double inferences, double findings, and confusing the user with duplicate output. No idempotency guarantee. + +**Catch the socket error itself and retry the whole fan-out.** Guaranteed double-spawn of agents that did succeed — explicitly worse than the previous option. + +**Add a feature-flagged single retry behind a configuration option.** Postponed. ADR 0015's clause _"Re-evaluate if 5xx Notices prove painful in practice"_ applies equivalently here. If `agent-spawn` Notices become a frequent reality, this ADR can be revisited with the same opt-in approach. + +**Treat partial fan-out as ABORTED.** Rejected: one missing aspect (e.g. `comment-analyzer`) is not run-corrupting. The Notice tells the user; they can rerun if needed. ABORTED is reserved for failures that would corrupt cross-run state. + +## Consequences + +- The orchestrator gains a single post-condition check after Step 6 — a few lines of bash + a `createNotice` call. No new helper, no new test infrastructure beyond extending the existing `tests/notices.test.mjs` cases. +- The user always sees a documented status: either findings (success), findings + `agent-spawn` Notice (partial), or aborted Trailer (full failure upstream). +- Future failure modes affecting fan-out (e.g. a `pr-review-toolkit` upgrade introducing a non-JSON response) become observable via the same Notice, without a new failure-classification path. +- ADR 0015's no-retry doctrine is documented as load-bearing across two layers (HTTP and agent-spawn). Its status line gains `, amended by 0018` and its "No retries in v1" section closes with a cross-reference to this ADR. + +## See also + +- ADR 0014 — Notice Tier doctrine (`agent-spawn` is a new `kind` in the enum) +- ADR 0015 — Canonical HTTP-tier mapping (amended) +- Spec 15 — Implementation of this ADR diff --git a/apps/claude-code/pr-review/docs/adr/README.md b/apps/claude-code/pr-review/docs/adr/README.md index 75cb0c98..e4120420 100644 --- a/apps/claude-code/pr-review/docs/adr/README.md +++ b/apps/claude-code/pr-review/docs/adr/README.md @@ -5,19 +5,23 @@ See the root `docs/adr/README.md` for format and numbering conventions. ## Index -| ID | Title | Status | -| ---- | --------------------------------------------------------------------------------------- | ---------- | -| 0001 | Canonical bot signature | Accepted | -| 0002 | Signature-based prior-review detection | Accepted | -| 0003 | Target latest PR iteration | Accepted | -| 0004 | Incremental diff baseline | Accepted | -| 0005 | Four-state thread classification | Accepted | -| 0006 | Reply to existing threads instead of opening duplicates; auto-resolve addressed threads | Accepted | -| 0007 | Summary comment is rewritten on re-review, not appended | Superseded | -| 0008 | Soft dependency on pr-review-toolkit | Accepted | -| 0009 | Re-review summary delta is posted as a reply to the existing summary thread | Accepted | -| 0010 | Inline Confluence client | Accepted | -| 0011 | Additive parallel paths for doc-context extensibility | Accepted | -| 0012 | Plain-text Doc-Context agent return | Accepted | -| 0013 | Orchestrator split for review-pr | Accepted | -| 0014 | Notice Tier doctrine and failure-classification helpers | Accepted | +| ID | Title | Status | +| ---- | --------------------------------------------------------------------------------------- | -------------------------- | +| 0001 | Canonical bot signature | Accepted | +| 0002 | Signature-based prior-review detection | Accepted | +| 0003 | Target latest PR iteration | Accepted | +| 0004 | Incremental diff baseline | Accepted | +| 0005 | Four-state thread classification | Accepted | +| 0006 | Reply to existing threads instead of opening duplicates; auto-resolve addressed threads | Accepted | +| 0007 | Summary comment is rewritten on re-review, not appended | Superseded | +| 0008 | Soft dependency on pr-review-toolkit | Accepted | +| 0009 | Re-review summary delta is posted as a reply to the existing summary thread | Accepted | +| 0010 | Inline Confluence client | Accepted | +| 0011 | Additive parallel paths for doc-context extensibility | Accepted | +| 0012 | Plain-text Doc-Context agent return | Accepted | +| 0013 | Orchestrator split for review-pr | Accepted (amended by 0016) | +| 0014 | Notice Tier doctrine and failure-classification helpers | Accepted | +| 0015 | Canonical HTTP-tier mapping | Accepted (amended by 0018) | +| 0016 | Fold thread fetch into ADO Fetcher | Accepted | +| 0017 | Dry-run as a fourth peer operating mode | Accepted | +| 0018 | Fan-out resilience — trust the harness | Accepted | diff --git a/apps/claude-code/pr-review/docs/plans/12-fix-step4-fold-thread-fetch-into-fetcher.md b/apps/claude-code/pr-review/docs/plans/12-fix-step4-fold-thread-fetch-into-fetcher.md new file mode 100644 index 00000000..6c9d5629 --- /dev/null +++ b/apps/claude-code/pr-review/docs/plans/12-fix-step4-fold-thread-fetch-into-fetcher.md @@ -0,0 +1,186 @@ +# 12. Fix Step 4 — fold thread fetch into ADO Fetcher + +- Priority: P0 +- Effort: M +- Version impact: patch (bug fix; orchestrator delegation change is internal) +- Depends on: — +- Touches: `commands/review-pr.md`, `agents/ado-fetcher.md`, `scripts/mode-detection.mjs` (export `SIGNATURE_PREFIX`), new `docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md`, `docs/adr/0013-orchestrator-split-for-review-pr.md` (status amendment), `docs/adr/0015-canonical-http-tier-mapping.md` (status amendment — thread-fetch exemption), `docs/adr/README.md`, `docs/plans/README.md`, `CHANGELOG.md` + +## Context + +The orchestrator's Step 4 calls `az repos pr thread list --id "$PR_ID" --org "$ORG_URL" --output json`. That subcommand does **not exist** in the `azure-devops` extension — valid `az repos pr` subcommands are `create / list / show / update / work-item / set-vote / reviewer / policy / checkout`. There is no `thread` group. Every ADO PR review since the orchestrator-split shipped has been failing at Step 4 with `ERROR: 'thread' is misspelled or not recognized`, hitting the fatal `exit 1` immediately after. + +The first observed-in-the-wild failure is the 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt`. In that run the LLM-as-orchestrator improvised an inline data-fetch path instead of bailing — masking the bug as a partial success that bypassed the `pr-review:ado-fetcher` agent entirely. + +ADR 0013 explicitly carved out Step 4 as "the one allowed inline ADO call" in the orchestrator. That carve-out is no longer worth its weight: (1) the inline call as documented is invalid syntax, (2) the data Step 4 needs (`RAW_THREADS_JSON`) is naturally adjacent to the data the Fetcher already fetches in Step 5 (`pullRequestIterations`, `pullRequestIterationChanges`, `pullRequestWorkItems`). Folding thread fetching into the Fetcher removes all `az devops invoke` from the orchestrator and consolidates ADO read knowledge in one place. + +## Current behaviour + +- Step 4 runs `az repos pr thread list ...` — exits non-zero on every run. +- Step 4 is the only place the orchestrator calls `az devops invoke`-shaped ADO data (in spirit; the actual call uses the invalid `az repos pr thread` path). +- `scripts/mode-detection.mjs` is invoked inline in Step 4. +- The Fetcher's result block emits: `ORG_URL`, `PROJECT`, `PR_ID`, `REPO_ID`, `PR_TITLE`, `PR_DESCRIPTION`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `LATEST_ITERATION_ID`, `LATEST_COMMIT_SHA`, `DIFF_RANGE`, `WORK_ITEM_IDS`, `NOTICES`, `CHANGED_FILES`, `RAW_DIFF`. + +## Target behaviour + +- Step 4 in the orchestrator runs `az repos pr show` only (capturing `REPO_ID`, `PROJECT`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION`). These values are passed into the Fetcher as literal-string inputs. No duplicate `az repos pr show` call inside the Fetcher. +- A new Step 5a inside the Fetcher (between iterations and changed-files) fetches threads via `az devops invoke --area git --resource pullRequestThreads --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID"`. +- The Fetcher invokes `detectMode` (from `scripts/mode-detection.mjs`) on the thread response and adds `RAW_THREADS_JSON`, `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID` to its result block. +- The orchestrator's Step 4 becomes the metadata fetch; mode detection is read from the Fetcher's result block after Step 5 completes. +- ADR 0013's "one allowed inline ADO call" carve-out is removed by an amending ADR (0016). The orchestrator's only remaining inline `az` calls are `az --version`, `az extension list` (preflight, Step 3) and `az repos pr show` (metadata, Step 4). +- A thread-fetch failure on `pullRequestThreads` aborts on every transport failure — `401/403 → ABORTED` (re-auth hint), `5xx / network → ABORTED` (transient but unsafe to default), `404 → OK` (treated as "no threads yet"; equivalent to first-review). This is a deliberate **exemption from ADR 0015's blanket 5xx-DEGRADED rule** because thread fetch drives mode selection: defaulting to first-review on a re-review-eligible PR would post duplicate threads (the issue #46 pattern) and violates CONTEXT.md's ABORTED-for-mode-misdetection invariant. The exemption is recorded in ADR 0016 and ADR 0015 is amended accordingly. + +## Affected files + +| File | Change | +|---|---| +| `commands/review-pr.md` | Step 4: replace inline `az repos pr thread list` block with `az repos pr show` metadata fetch; parse REPO_ID/PROJECT/branches/title/description. Pass these into the Fetcher prompt. Step 5: parse `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`, `RAW_THREADS_JSON` from the Fetcher's result block. | +| `.agents/ado-fetcher.md` | New "Step 1.5 — Fetch PR threads + detect mode" between iterations (Step 2) and changed-files (Step 3). Accept inputs from Step 4 as literal strings (`REPO_ID`, `PROJECT`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION`) and skip the `az repos pr show` call originally in Step 1. Add `RAW_THREADS_JSON`, `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID` to the result block. Add a `kind: thread-fetch` Notice emission path on degraded fetch. | +| `docs/adr/0016-fold-thread-fetch-into-ado-fetcher.md` | New ADR amending ADR 0013. | +| `docs/adr/0013-orchestrator-split-for-review-pr.md` | Status line: `Accepted (2026-05), amended by 0016`. | +| `docs/adr/README.md` | Add row 0015 (currently missing from the index) and row 0016. | +| `docs/plans/README.md` | Add row 12. | +| `CHANGELOG.md` | `Fixed` entry under `[Unreleased]` describing the Step 4 bug and the fold-into-Fetcher refactor. | + +No new helper modules. `scripts/mode-detection.mjs` gains an exported `SIGNATURE_PREFIX` constant (`'🤖 *Reviewed by Claude Code*'`) so the Fetcher can import it directly rather than receive it as a prompt literal. Its `detectMode` invocation site moves from the orchestrator's Step 4 to the Fetcher's new Step 2.5; orchestrator no longer needs to know the prefix. + +## Implementation steps + +### 1. Write ADR 0016 + +Two coupled decisions, both recorded in ADR 0016: + +1. **Amend ADR 0013's "one allowed inline ADO call" carve-out.** Thread fetching + mode detection moves into the Fetcher. The carve-out existed because mode detection had to precede Fetcher launch; consolidating mode-derived fields into the Fetcher's result block removes that constraint. + +2. **Amend ADR 0015's blanket 5xx → DEGRADED rule with a thread-fetch exemption.** Thread fetch drives mode selection. A 5xx default of "treat as empty → first-review" would post duplicate threads on re-review-eligible PRs (the issue #46 failure pattern, now caused by ADO outage instead of Writer bug). CONTEXT.md's ABORTED-Notice-Tier definition explicitly lists "mode misdetection" as cross-run state corruption. Therefore 5xx / network on `pullRequestThreads` is ABORTED, not DEGRADED. ADR 0015's HTTP-tier mapping remains correct for every other ADO read. + +### 2. Amend ADR 0013 and ADR 0015 + +- ADR 0013 status: `**Status:** Accepted (2026-05), amended by 0016`. Closing sentence: *"The 'one allowed inline ADO call' carve-out (mode-detection thread list) is removed by ADR 0016 — see that ADR for rationale."* +- ADR 0015 status: `**Status:** Accepted, amended by 0016 (thread-fetch exemption)`. Closing sentence in its "No retries in v1" section: *"Thread fetch is the one ADO read exempt from the blanket 5xx → DEGRADED rule — see ADR 0016 for rationale."* + +### 3. Update `.agents/ado-fetcher.md` + +Add new inputs to the Inputs section: + +``` +- `REPO_ID` — captured by the orchestrator in Step 4 from `az repos pr show` +- `PROJECT` — captured by the orchestrator in Step 4 +- `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION` — captured by the orchestrator in Step 4 + +`SIGNATURE_PREFIX` is **not** a Fetcher input — it is imported from `scripts/mode-detection.mjs` as an exported constant (see "Out of scope" section's removal note). +``` + +Remove the existing Step 1 (`az repos pr show`). Renumber subsequent steps. Insert a new step between iterations (current Step 2) and changed-files (current Step 3): + +``` +## Step 2.5 — Fetch PR threads and detect mode + +az devops invoke \ + --area git \ + --resource pullRequestThreads \ + --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" \ + --org "$ORG_URL" \ + --api-version "7.1" \ + --output json 2>/tmp/ado_fetcher_threads.err +``` + +On non-zero exit: +- 401 / 403 → ABORTED with `az devops login` hint. +- 404 → OK; treat `RAW_THREADS_JSON` as `{"value":[]}` (no threads yet — fresh PR). Distinguishable from "PR not found" because the orchestrator's Step 4 (`az repos pr show`) already succeeded. +- 5xx / network error → ABORTED with message: *"Thread fetch failed (transient HTTP error). Cannot safely detect Review mode — retry the review."* Re-run is the recovery path; no Notice emitted (ABORTED goes to stderr + Trailer, not the Notices array). + +**No new `kind` is added to the `NoticeKind` union.** The original draft of this spec added `kind: thread-fetch`, but with 5xx now ABORTED, the only DEGRADED-emitting code path would be 404 — which is OK, not DEGRADED. So the existing kinds are sufficient. + +Run `detectMode` from `scripts/mode-detection.mjs` against the `.value` array. Capture `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`. + +Extend the result block with five new lines after `WORK_ITEM_IDS`: + +``` +RAW_THREADS_JSON: +{RAW_THREADS_JSON} +MODE: {MODE} +IS_REREVIEW: {IS_REREVIEW} +PRIOR_ITERATION_ID: {PRIOR_ITERATION_ID} +SUMMARY_THREAD_ID: {SUMMARY_THREAD_ID} +``` + +### 4. Rewrite Step 4 in `commands/review-pr.md` + +Replace the existing Step 4 block. New Step 4: + +```bash +PR_SHOW_ERR="${TMPDIR:-/tmp}/pr_show.err" +PR_SHOW_JSON=$(az repos pr show --id "$PR_ID" --org "$ORG_URL" --output json 2>"$PR_SHOW_ERR") || { + echo "ERROR: failed to fetch PR metadata via az repos pr show. Try \`az devops login\` to re-authenticate." >&2 + cat "$PR_SHOW_ERR" >&2; exit 1; } +``` + +Parse `repository.id`, `repository.project.name`, `sourceRefName`, `targetRefName`, `title`, `description` into `REPO_ID`, `PROJECT`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION` (stripping `refs/heads/` from the branches). + +Step 4 no longer invokes `scripts/mode-detection.mjs` and no longer emits `MODE` / `PRIOR_ITERATION_ID` directly. + +### 5. Update Step 5 in `commands/review-pr.md` + +Add the new inputs to the Fetcher prompt: + +``` +Agent( + subagent_type: "pr-review:ado-fetcher", + prompt: "Fetch all ADO data for this PR review. + ORG_URL: {ORG_URL} + PROJECT: {PROJECT} + PR_ID: {PR_ID} + REPO_ID: {REPO_ID} + SOURCE_BRANCH: {SOURCE_BRANCH} + TARGET_BRANCH: {TARGET_BRANCH} + PR_TITLE: {PR_TITLE} + PR_DESCRIPTION: {PR_DESCRIPTION} + SIGNATURE_PREFIX: 🤖 *Reviewed by Claude Code* + PRIOR_ITERATION_ID: + PLUGIN_ROOT: {CLAUDE_PLUGIN_ROOT}" +) +``` + +After parsing `ADO_FETCHER_RESULT`, extract the five new fields and assign them to the orchestrator's `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`, `RAW_THREADS_JSON`. Print `Mode detected: $MODE`. + +### 6. Update Step 7 in `commands/review-pr.md` + +The Coordinator prompt continues to receive `RAW_THREADS_JSON` from the orchestrator; no change required other than ensuring the value comes from the Fetcher's result block, not from the deleted Step 4. + +### 7. Update READMEs and CHANGELOG + +- `docs/plans/README.md`: add row 12. +- `docs/adr/README.md`: add row 0015 (Canonical HTTP-Tier Mapping — missing from index) and row 0016. +- `CHANGELOG.md`: under `[Unreleased]`, add `### Fixed` entry: *Step 4 mode detection was calling a non-existent `az repos pr thread list` subcommand and failing fatally on every ADO PR review. Thread fetching now lives in the ADO Fetcher and uses `az devops invoke --resource pullRequestThreads`.* + +## Verification + +- Run `/pr-review:review-pr `: Fetcher emits `MODE: first-review`, `IS_REREVIEW: false`, `PRIOR_ITERATION_ID:` (empty), `SUMMARY_THREAD_ID:` (empty). Run proceeds through Step 6. +- Run `/pr-review:review-pr `: Fetcher emits `MODE: re-review`, `IS_REREVIEW: true`, `PRIOR_ITERATION_ID: `, `SUMMARY_THREAD_ID: `. Step 7 invokes Coordinator with the same `RAW_THREADS_JSON`. +- Confirm orchestrator no longer contains any `az devops invoke` calls; only `az --version`, `az extension list`, `az repos pr show`. +- Confirm Fetcher no longer contains a `az repos pr show` call (the metadata is passed in). +- Simulate a 5xx on `pullRequestThreads`: Fetcher aborts with retry message; run stops before Step 6 fan-out; Trailer reflects ABORTED status. No Notice in any Summary block (run never produces one). +- Simulate 401 on `pullRequestThreads`: Fetcher aborts with `az devops login` hint. +- Simulate 404 on `pullRequestThreads` against a PR that exists (Step 4 PR-show succeeded): Fetcher treats threads as empty and proceeds in `first-review` mode. + +## Acceptance criteria + +- [ ] `az repos pr thread list` appears nowhere in the repo (grep returns no matches outside `docs/conversations/`). +- [ ] `commands/review-pr.md` contains no `az devops invoke` lines. +- [ ] `.agents/ado-fetcher.md` Step 2.5 calls `az devops invoke --area git --resource pullRequestThreads`. +- [ ] Fetcher result block includes `RAW_THREADS_JSON`, `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`. +- [ ] ADR 0013 status line ends with `, amended by 0016`. +- [ ] ADR 0015 status line ends with `, amended by 0016 (thread-fetch exemption)`. +- [ ] ADR 0016 exists and records both amendments (carve-out removal + thread-fetch ABORT exemption). +- [ ] `scripts/mode-detection.mjs` exports `SIGNATURE_PREFIX` and the Fetcher imports it (no `SIGNATURE_PREFIX` input in the Fetcher's prompt contract). +- [ ] `docs/adr/README.md` lists 0015 and 0016. +- [ ] CHANGELOG `[Unreleased]` has a `Fixed` entry. + +## Out of scope + +- A `kind: thread-fetch` Notice emission path. Considered and removed once the 5xx mapping flipped to ABORTED (see "Target behaviour"). `classify-http-error.mjs` is still used to detect 5xx but its result drives an abort, not a Notice. +- No change to `scripts/mode-detection.mjs`. +- No change to the Re-review Coordinator's downstream behaviour. +- Dry-run mode (spec 13) — handled separately. +- Fan-out resilience (spec 15) — handled separately. diff --git a/apps/claude-code/pr-review/docs/plans/13-formal-dry-run-mode.md b/apps/claude-code/pr-review/docs/plans/13-formal-dry-run-mode.md new file mode 100644 index 00000000..0875aa0a --- /dev/null +++ b/apps/claude-code/pr-review/docs/plans/13-formal-dry-run-mode.md @@ -0,0 +1,162 @@ +# 13. Formal dry-run mode + +- Priority: P1 +- Effort: M +- Version impact: minor (new user-visible operating mode) +- Depends on: 12 +- Touches: `commands/review-pr.md`, new `docs/adr/0017-dry-run-as-fourth-peer-mode.md`, `docs/adr/README.md`, `docs/plans/README.md`, `CHANGELOG.md`, `README.md`, `CLAUDE.md` + +## Context + +The 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt` invoked the plugin with the natural-language instruction *"Make a dry-run PR review. DO NOT POST ANY COMMENT TO THE PR! Only report inline"*. The plugin had no formal dry-run mode, so the orchestrator improvised: it inlined `az` data fetches (compounding spec 12's Step 4 bug), skipped the `pr-review:ado-fetcher` agent, and never invoked the ADO Writer. The end-user got useful findings but only by accident — the documented spec was bypassed entirely. + +A formal mode replaces natural-language pleading with a documented CLI flag. The LLM-as-orchestrator no longer has to interpret "DO NOT POST"; it branches on `MODE=dry-run` and skips Writer invocation deterministically. + +Dry-run is conceptually a peer of pre-PR / first-review / re-review (not an orthogonal flag), per ADR 0017: it shares pre-PR's *rendering* layer and first-review/re-review's *fetch + analyse* layer. Making it a peer mode keeps the branching uniform across the orchestrator. + +## Current behaviour + +- Three modes: `pre-pr` (no URL), `first-review` (URL, no prior signature), `re-review` (URL, prior signature found). +- ADO Writer is invoked in `first-review` and `re-review`. +- The Trailer reads `✅ Review posted: ...` (ADO modes) or `✅ Pre-PR review complete: ...` (pre-PR). +- There is no way to preview Writer output for an ADO PR without posting. + +## Target behaviour + +- A fourth mode, `dry-run`, is selected when the orchestrator's argument parsing finds a `--dry-run` flag anywhere in `$ARGUMENTS`. +- The flag does **not** suppress the `` argument: dry-run requires a URL. `--dry-run` with no URL is a usage error. +- Mode resolution order in Step 2: + 1. No URL → `pre-pr`. + 2. URL + `--dry-run` flag → `dry-run` (Fetcher still runs to determine whether a prior signature exists; this drives Coordinator inclusion in Step 7 but does not split dry-run into two sub-modes). + 3. URL + prior bot signature in threads → `re-review`. + 4. URL + no prior bot signature → `first-review`. +- Dry-run runs Steps 3–6 identically to first-review / re-review (preflight, metadata fetch, Fetcher, Doc Context, Review Aspects fan-out). +- In Step 7, if the Fetcher reported `IS_REREVIEW: true`, the Coordinator runs and performs Thread Classification. The Coordinator's reply-posting branch is skipped (no ADO writes). `freshFindings` is still computed and assigned to `FINDINGS_JSON`. +- The ADO Writer is **never** invoked when `MODE=dry-run`. +- Step 8 in dry-run mode prints findings using pre-PR Step E's format (severity-grouped, `[severity] /path L` headers, title, body). Notices from Fetcher + Coordinator are rendered above findings via `formatNoticesAsPrePrPreamble`. +- Trailer for dry-run reads: `🔍 Dry-run complete: findings ( critical, important) · warning notices · would have posted to `. + +## Affected files + +| File | Change | +|---|---| +| `commands/review-pr.md` | Step 2: parse `--dry-run` flag from `$ARGUMENTS`; set `IS_DRY_RUN=true` before mode detection. Step 4: unchanged. Step 5: Fetcher runs in dry-run mode the same way it does for first-review / re-review. After Fetcher returns, if `IS_DRY_RUN=true` set `MODE=dry-run` (overriding the `first-review` / `re-review` value from the Fetcher's result block — but preserving `IS_REREVIEW`). Step 7: branch on `IS_DRY_RUN`; if true, run Coordinator only when `IS_REREVIEW=true` and only to compute `freshFindings`; skip Writer. Step 8: when `MODE=dry-run`, call `formatTrailer({ mode: 'dry-run', ... })` and render findings via pre-PR Step E. | +| `docs/adr/0017-dry-run-as-fourth-peer-mode.md` | New ADR — peer mode vs orthogonal flag trade-off; sibling-of-pre-PR rendering. | +| `docs/adr/README.md` | Add row 0017. | +| `docs/plans/README.md` | Add row 13. | +| `scripts/ado/notices.mjs` | Extend `formatTrailer` to accept `mode: 'dry-run'` and emit the `🔍 Dry-run complete: ...` line. Add tests in `tests/notices.test.mjs`. | +| `tests/notices.test.mjs` | New test cases for the dry-run trailer. | +| `CHANGELOG.md` | `Added` entry under `[Unreleased]`. | +| `README.md` | Document the `--dry-run` flag with one usage example. | +| `CLAUDE.md` | Note that the orchestrator must never invoke ADO Writer when `MODE=dry-run`. | + +`CONTEXT.md` already documents `Dry-run mode` and the agent-invocation matrix — no edit required. + +## Implementation steps + +### 1. Write ADR 0017 + +Decision: dry-run is a fourth peer mode, not a boolean flag. Alternative considered: a `DRY_RUN` boolean orthogonal to mode (`first-review-dry-run`, `re-review-dry-run`). Rejected because re-review-dry-run still needs Coordinator output and a single mode label keeps the orchestrator's branching readable. Mode resolution honours `--dry-run` after URL parsing; `IS_REREVIEW` survives the override to drive Coordinator inclusion. + +### 2. Update `formatTrailer` in `scripts/ado/notices.mjs` + +Accept `mode: 'dry-run'`. Emit: + +``` +🔍 Dry-run complete: findings ( critical, important) · warning notices · would have posted to +``` + +Add `node:test` cases mirroring the existing first-review trailer test: empty findings, mixed severities, mixed notices. + +### 3. Rewrite Step 2 in `commands/review-pr.md` + +Add `--dry-run` flag parsing after URL parsing: + +```bash +IS_DRY_RUN=false +if echo "$ARGUMENTS" | grep -q -- "--dry-run"; then IS_DRY_RUN=true; fi +``` + +If `IS_DRY_RUN=true` and no URL was found: emit usage error and stop. + +### 4. Rewrite Step 5 parsing in `commands/review-pr.md` + +After parsing `ADO_FETCHER_RESULT`, apply the dry-run override: + +```bash +if [ "$IS_DRY_RUN" = "true" ]; then MODE=dry-run; fi +``` + +`IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID` are preserved exactly as the Fetcher reported them — they still drive Coordinator inclusion in Step 7. + +### 5. Rewrite Step 7 in `commands/review-pr.md` + +Branch on `MODE`: + +- `MODE=re-review` → unchanged (Coordinator runs, Writer runs). +- `MODE=first-review` → unchanged (no Coordinator, Writer runs). +- `MODE=dry-run` → + - If `IS_REREVIEW=true`: run Coordinator (same prompt, same result-block parsing). On `earlyExit: true`, jump to Step 8 with `FINDINGS_JSON = []`. Otherwise reassign `FINDINGS_JSON = freshFindings`. + - Skip Writer invocation entirely. + +### 6. Rewrite Step 8 in `commands/review-pr.md` + +Add a branch at the top: + +``` +If MODE=dry-run: + - Merge Fetcher and Coordinator notices via `mergeNotices`. + - Print notices via `formatNoticesAsPrePrPreamble`. + - Print findings grouped by severity using the pre-PR Step E format. + - Call `formatTrailer({ mode: 'dry-run', findings, notices, prUrl })`. + - Stop. +``` + +The existing first-review / re-review and pre-PR branches stay intact. + +### 7. Update `README.md` + +One usage example: + +```sh +# Preview the review without posting comments +/pr-review:review-pr https://dev.azure.com/org/proj/_git/repo/pullrequest/1234 --dry-run +``` + +Document that dry-run on a PR with prior bot signatures runs Thread Classification but does not post Replies. + +### 8. Update `CLAUDE.md` + +Add a single bullet under "Command conventions": *When `MODE=dry-run`, the orchestrator MUST NOT invoke `pr-review:ado-writer` or call any ADO write endpoint. Coordinator runs in classify-only mode (no Reply posting).* + +### 9. Update READMEs and CHANGELOG + +- `docs/plans/README.md`: add row 13. +- `docs/adr/README.md`: add row 0017. +- `CHANGELOG.md`: under `[Unreleased]`, add `### Added` entry: *`--dry-run` flag. Runs all read-side review steps (Fetcher, Doc Context, Review Aspects, Thread Classification on re-review-eligible PRs) and renders findings to the Claude interface without posting to ADO.* + +## Verification + +- `/pr-review:review-pr --dry-run` → Mode `dry-run`; Fetcher runs; review aspect agents run; Writer is not invoked; findings render in Step E format; Trailer reads `🔍 Dry-run complete: ...`. +- `/pr-review:review-pr --dry-run` → Coordinator runs; classification output is included in the rendered Notices; no Replies posted; no thread status PATCH issued. +- `/pr-review:review-pr --dry-run` (no URL) → usage error. +- `/pr-review:review-pr ` (no `--dry-run`) → existing behaviour unchanged. +- Confirm via instrumented test: no calls to `az devops invoke --resource pullRequestThreads` with method `PATCH`/`POST` from any agent when `MODE=dry-run`. + +## Acceptance criteria + +- [ ] `--dry-run` is documented in `README.md` with one example. +- [ ] Step 2 parses `--dry-run` and sets `IS_DRY_RUN`. +- [ ] Step 5 overrides `MODE=dry-run` after Fetcher returns. +- [ ] Step 7 skips Writer for `MODE=dry-run`; runs Coordinator only when `IS_REREVIEW=true`. +- [ ] Step 8 renders dry-run findings in pre-PR Step E format with the dry-run Trailer. +- [ ] `formatTrailer({ mode: 'dry-run', ... })` exists and is tested. +- [ ] ADR 0017 exists and explains peer-mode-not-flag trade-off. +- [ ] `CLAUDE.md` documents the no-write invariant. +- [ ] CHANGELOG `[Unreleased]` has an `Added` entry. + +## Out of scope + +- A dry-run mode for pre-PR. Pre-PR already doesn't write to ADO; dry-run is meaningless there. +- A "diff dry-run" that only renders new findings since the last review. Re-review-dry-run already covers this via the Coordinator's `freshFindings`. +- Persisting dry-run output to a file. Future enhancement, not required now. diff --git a/apps/claude-code/pr-review/docs/plans/14-ado-cli-preflight-and-smoke-test.md b/apps/claude-code/pr-review/docs/plans/14-ado-cli-preflight-and-smoke-test.md new file mode 100644 index 00000000..14556bff --- /dev/null +++ b/apps/claude-code/pr-review/docs/plans/14-ado-cli-preflight-and-smoke-test.md @@ -0,0 +1,121 @@ +# 14. ADO CLI preflight hardening + smoke test + +- Priority: P1 +- Effort: S +- Version impact: patch (test addition + preflight expansion; no user-visible behaviour change) +- Depends on: — +- Touches: `commands/review-pr.md` (Step 3), new `tests/ado-cli-smoke.test.mjs`, new `tests/fixtures/ado-cli-inventory.mjs`, new `tests/fixtures/ado-cli-allowlist.mjs`, `.github/workflows/ci.yml` (install `azure-cli` on one matrix cell), `CLAUDE.md`, `docs/plans/README.md`, `CHANGELOG.md` + +## Context + +The 2026-05-14 dry-run failed at multiple `az` calls: `az repos pr thread list` (does not exist), `az repos pr show --project` (unsupported flag), `az repos pr list-files` (does not exist). Spec 12 fixes the orchestrator's broken Step 4, but the underlying class of failure — Microsoft renaming or removing `az` subcommands without us noticing — remains uncovered. + +ADR 0008 keeps `pr-review` as a self-contained plugin (no skill-level dependencies). The cost of self-containment is that every `az` command the plugin uses is hand-authored in agent prompts. Today there is no test that asserts these commands resolve to something the installed `az` extension actually knows. + +A CI smoke test that runs `--help` against every `--area` / `--resource` pair the plugin uses fails fast on Microsoft's rename / removal, before a production PR review trips on it. + +## Current behaviour + +- Step 3 preflight runs `az --version` and `az extension list | grep azure-devops`. +- Nothing asserts that specific subcommands or resources are available. +- A typo (`thread list` vs `pr thread show`) or a Microsoft rename surfaces only at production-time when the orchestrator tries to use the broken command. + +## Target behaviour + +- Step 3 preflight additionally runs `az devops invoke --help` and asserts exit code 0. If it fails (e.g. extension installed but corrupted): stop with `ERROR: az devops invoke unavailable. Re-install the azure-devops extension: az extension remove --name azure-devops && az extension add --name azure-devops`. +- A new test file `tests/ado-cli-smoke.test.mjs` enumerates every `az` command the plugin uses and runs `--help` (or `az devops invoke --area X --resource Y --help`) for each. Test fails if any call exits non-zero or its help text does not include the expected `--route-parameters` hint. +- The test is gated on `az` being installed locally / in CI. If not installed, it skips (not fails). +- `CLAUDE.md` gets a "Canonical ADO commands" section enumerating the same set of commands, so the inventory is humans-readable and human-greppable. The test file imports from a tiny shared inventory file (`tests/fixtures/ado-cli-inventory.mjs`) so the test and the doc cannot drift. + +## Affected files + +| File | Change | +|---|---| +| `commands/review-pr.md` | Step 3: add `az devops invoke --help` assertion. | +| `tests/fixtures/ado-cli-inventory.mjs` | New file. Exports `adoCliInventory` — an array of `{ kind: 'invoke'|'repos'|'boards', area?, resource?, command, helpKeywordsRequired: string[] }` entries. | +| `tests/ado-cli-smoke.test.mjs` | New file. Two test cases: (1) **Inventory completeness** — grep every `az ` invocation in `agents/`, `commands/`, `scripts/`; assert each unique command shape (modulo the allowlist below) has a matching inventory entry. Pure-JS; runs everywhere, no `az` skip. (2) **CLI smoke** — iterates `adoCliInventory`; runs the corresponding `--help` command via `node:child_process`; asserts exit 0 and that each `helpKeywordsRequired` substring appears in stdout. Skips with `t.skip` when `az` is unavailable. | +| `tests/fixtures/ado-cli-allowlist.mjs` | New file. Exports `ADO_CLI_ALLOWLIST` — regex patterns for `az` invocations that are intentionally NOT in the inventory: `az --version`, `az extension list`, error-message-only hints (`az devops login`, `az extension add`), and the preflight `az devops invoke --help` itself. The completeness check uses this allowlist to filter out registered exceptions. | +| `CLAUDE.md` | Add a single pointer paragraph (no command table) in the "External dependencies" or "Command conventions" area: *"All `az` commands the plugin uses are enumerated in `tests/fixtures/ado-cli-inventory.mjs`. The smoke test asserts every `az ` invocation in `agents/`/`commands/`/`scripts/` has a matching inventory entry (modulo the allowlist in `tests/fixtures/ado-cli-allowlist.mjs`). Register a new ADO call in the inventory before invoking it."* No table — the inventory file is the single source of truth, and a mirrored table would silently drift.| +| `docs/plans/README.md` | Add row 14. | +| `CHANGELOG.md` | `### Changed` entry under `[Unreleased]`: *Preflight now verifies `az devops invoke` is callable; CI smoke test asserts every ADO subcommand the plugin uses exists.* | + +## Implementation steps + +### 1. Create `tests/fixtures/ado-cli-inventory.mjs` + +Export an array. Initial entries (matches what the plugin uses today after spec 12): + +```js +export const adoCliInventory = [ + { kind: 'repos', command: ['az', 'repos', 'pr', 'show', '--help'], helpKeywordsRequired: ['--id', '--org'] }, + { kind: 'repos', command: ['az', 'repos', 'pr', 'checkout', '--help'], helpKeywordsRequired: ['--id', '--org'] }, + { kind: 'invoke', area: 'git', resource: 'pullRequestThreads', command: ['az', 'devops', 'invoke', '--area', 'git', '--resource', 'pullRequestThreads', '--help'], helpKeywordsRequired: ['--route-parameters'] }, + { kind: 'invoke', area: 'git', resource: 'pullRequestIterations', command: ['az', 'devops', 'invoke', '--area', 'git', '--resource', 'pullRequestIterations', '--help'], helpKeywordsRequired: ['--route-parameters'] }, + { kind: 'invoke', area: 'git', resource: 'pullRequestIterationChanges', command: ['az', 'devops', 'invoke', '--area', 'git', '--resource', 'pullRequestIterationChanges', '--help'], helpKeywordsRequired: ['--route-parameters'] }, + { kind: 'invoke', area: 'git', resource: 'pullRequestWorkItems', command: ['az', 'devops', 'invoke', '--area', 'git', '--resource', 'pullRequestWorkItems', '--help'], helpKeywordsRequired: ['--route-parameters'] }, + { kind: 'invoke', area: 'git', resource: 'pullRequestThreadComments', command: ['az', 'devops', 'invoke', '--area', 'git', '--resource', 'pullRequestThreadComments', '--help'], helpKeywordsRequired: ['--route-parameters'] }, + { kind: 'boards', command: ['az', 'boards', 'work-item', 'show', '--help'], helpKeywordsRequired: ['--id', '--org'] }, +] +``` + +### 2. Create `tests/ado-cli-smoke.test.mjs` + +`node:test` style, ESM, `// @ts-check` header. For each inventory entry: + +- Run the command via `child_process.spawnSync` with a 5-second timeout. +- If `spawnSync` reports `ENOENT` (i.e. `az` not installed): call `t.skip('az CLI not installed')` and stop. +- Otherwise assert exit code 0 and that every `helpKeywordsRequired` substring appears in stdout. + +Mirror the prior-art style of `tests/parse-diff-hunks.test.mjs`. + +### 3. Update Step 3 in `commands/review-pr.md` + +Append to Step 3: + +```bash +if ! az devops invoke --help >/dev/null 2>&1; then + echo "ERROR: az devops invoke unavailable. Re-install the azure-devops extension: az extension remove --name azure-devops && az extension add --name azure-devops" >&2 + exit 1 +fi +``` + +### 4. Update `CLAUDE.md` + +Append a single pointer paragraph (no command table) to the "External dependencies" section: + +```markdown +All `az` commands the plugin uses are enumerated in `tests/fixtures/ado-cli-inventory.mjs`. The smoke test (`tests/ado-cli-smoke.test.mjs`) asserts every `az ` invocation in `agents/`/`commands/`/`scripts/` has a matching inventory entry — modulo the allowlist in `tests/fixtures/ado-cli-allowlist.mjs`. Register a new ADO call in the inventory before invoking it. +``` + +A mirrored command table here was considered and rejected — it would be a third source of truth that drifts silently from the inventory and the source code. + +### 5. Update READMEs and CHANGELOG + +- `docs/plans/README.md`: add row 14. +- `CHANGELOG.md`: `### Changed` entry as described. + +## Verification + +- Run `pnpm --filter pr-review test` locally with `az` installed: smoke test passes (completeness test always runs, CLI smoke test runs when `az` is on PATH). +- Run in CI: the inventory-completeness test runs on every matrix cell. The CLI smoke test only runs on the `ubuntu-latest × Node 24` cell, where `azure-cli` + the `azure-devops` extension are installed via a dedicated workflow step. Other cells `t.skip` the CLI smoke test cleanly. +- Remove a known-good entry from the inventory and rerun: no change (inventory is reduced — test still passes; nothing forces inventory completeness). +- Add a known-bad entry (`{ kind: 'repos', command: ['az', 'repos', 'pr', 'thread', 'list', '--help'], ... }`): test fails with non-zero exit and informative output. +- Uninstall `az` locally and rerun: test skips, does not fail. + +## Acceptance criteria + +- [ ] `tests/fixtures/ado-cli-inventory.mjs` exists and lists every `az` command the plugin uses post-spec-12. +- [ ] `tests/ado-cli-smoke.test.mjs` runs `--help` for every inventory entry and asserts exit 0 + keyword presence. +- [ ] `tests/ado-cli-smoke.test.mjs` also asserts inventory completeness: every `az` invocation in `agents/`, `commands/`, `scripts/` either has an inventory entry or matches an allowlist regex. This test does not require `az` and runs everywhere. +- [ ] `tests/fixtures/ado-cli-allowlist.mjs` documents the allowlist regexes inline (each entry has a one-line comment explaining why it is exempt). +- [ ] Smoke test skips (not fails) when `az` is absent. Completeness test still runs. +- [ ] Step 3 in `commands/review-pr.md` asserts `az devops invoke --help` exits 0. +- [ ] `CLAUDE.md` has a "Canonical ADO commands" section pointing at the inventory. +- [ ] CI passes on Linux, macOS, Windows × Node 22, 24. The CLI smoke test executes only on `ubuntu-latest × Node 24` (where `azure-cli` is installed); other cells skip it. The inventory-completeness test runs on every cell. +- [ ] `.github/workflows/ci.yml` has a single conditional install step (`if: matrix.os == 'ubuntu-latest' && matrix.node == 24`) that installs `azure-cli` via `apt` and adds the `azure-devops` extension. No install on the other five matrix cells. + +## Out of scope + +- Replacing `az` with direct REST calls. Considered and rejected (token acquisition + corporate TLS-inspecting proxies are exactly what `az` handles). +- Invoking the global `azure-devops-cli` skill from the plugin. ADR 0008 keeps `pr-review` self-contained; a global skill is the wrong abstraction layer. +- Validating ADO REST API responses against schemas. Future work if response drift becomes a real failure mode. diff --git a/apps/claude-code/pr-review/docs/plans/15-fan-out-resilience-agent-spawn-notice.md b/apps/claude-code/pr-review/docs/plans/15-fan-out-resilience-agent-spawn-notice.md new file mode 100644 index 00000000..3651b6e4 --- /dev/null +++ b/apps/claude-code/pr-review/docs/plans/15-fan-out-resilience-agent-spawn-notice.md @@ -0,0 +1,124 @@ +# 15. Fan-out resilience — agent-spawn Notice + +- Priority: P1 +- Effort: S +- Version impact: patch (new Notice kind; no retry; no behaviour change on the happy path) +- Depends on: 12 +- Touches: `commands/review-pr.md` (post-Step-6 check), `scripts/ado/notices.mjs` (extend enum), `tests/notices.test.mjs`, new `docs/adr/0018-fan-out-resilience-trust-harness.md`, `docs/adr/0015-canonical-http-tier-mapping.md` (status amendment), `docs/adr/README.md`, `docs/plans/README.md`, `CHANGELOG.md` + +## Context + +The 2026-05-14 dry-run produced a transient `API Error: The socket connection was closed unexpectedly` from Claude Code's HTTP client during Step 6's parallel agent fan-out (Doc Context Orchestrator + up to 5 review-aspect agents — the plugin's largest concurrent spawn). All six agents completed normally afterward: the Claude Code harness either auto-retried or the server-side launch had already been accepted before the client socket dropped. + +The error appeared in the user-facing output without framing, which made the run look broken even though it succeeded. More importantly: if a partial fan-out *had* happened (e.g. one agent's launch dropped server-side too), the orchestrator would currently have no way to detect or report it. The Review would post with N-1 aspects' findings and the user would not know. + +ADR 0015 ("Canonical HTTP-Tier Mapping") explicitly forbids HTTP-tier retries in v1. That doctrine should extend to agent-spawn failures: trust the harness, don't retry, but surface partial fan-out as a DEGRADED Notice so the user sees reduced coverage. + +## Current behaviour + +- Step 6 launches Doc Context Orchestrator + 2–5 review-aspect agents in a single message. +- If any agent silently fails to return its `_RESULT_START`/`_END` block (or `pr-review-toolkit:*` returns nothing), the orchestrator parses what came back and proceeds. +- No Notice is emitted for missing agent results. +- The user has no signal that coverage was reduced. + +## Target behaviour + +- After Step 6 fan-out completes, the orchestrator asserts a result block came back for every agent it launched: + - Doc Context Orchestrator → presence of the `## Business context` heading (or an explicitly empty string is acceptable). + - Each Review Aspect agent → presence of a JSON findings array (`[]` is acceptable). +- For every missing result block, the orchestrator emits a `warning` Notice with `kind: agent-spawn` and `message: Review Aspect did not return findings — coverage is reduced for this run`. +- No retry. The Review still posts with whatever findings did arrive. +- The same post-condition runs in dry-run mode (spec 13) — Notice is rendered in the dry-run preamble. +- The new `agent-spawn` kind is added to ADR 0014's enum in `scripts/ado/notices.mjs`. Per ADR 0014's invariant ("each `kind` has exactly one source agent"), the source is **the orchestrator itself** — it is the only place that knows which agents it launched. + +## Affected files + +| File | Change | +|---|---| +| `commands/review-pr.md` | After parsing review aspect agent responses in Step 6, run a post-condition check: for every launched aspect, assert a JSON array was returned; for missing ones, push a `createNotice('warning', 'agent-spawn', '')` to the orchestrator-owned notices list. Merge with Fetcher + Coordinator notices in Step 8. | +| `scripts/ado/notices.mjs` | Add `'agent-spawn'` to the `NoticeKind` union and its enumeration. No other helper changes — `createNotice` / `mergeNotices` / `formatNoticesAsSummaryBlock` / `formatNoticesAsPrePrPreamble` already handle new kinds via the enum. | +| `tests/notices.test.mjs` | Add cases for: single missing aspect → one `agent-spawn` warning; two missing aspects → two warnings (deduped by `kind` per existing `mergeNotices` rules — verify expected behaviour); rendering in Summary block; rendering in pre-PR preamble. | +| `docs/adr/0018-fan-out-resilience-trust-harness.md` | New ADR — extends ADR 0015's no-retry doctrine to agent-spawn failures. | +| `docs/adr/0015-canonical-http-tier-mapping.md` | Status line: `Accepted, amended by 0018`. | +| `docs/adr/README.md` | Add row 0018; status amendment for 0015. | +| `docs/plans/README.md` | Add row 15. | +| `CHANGELOG.md` | `### Added` entry: *New `agent-spawn` warning Notice surfaces when a Review Aspect agent fails to return findings (e.g. transient socket error during fan-out).* | + +## Implementation steps + +### 1. Write ADR 0018 + +Record: agent-spawn failures during parallel fan-out are treated identically to ADR 0015's network-error policy — DEGRADED, no retry, emit a Notice. Rationale: the Claude Code harness already handles transport-layer retries and idempotency on its side; double-spawning agents from the orchestrator would waste tokens and produce duplicate findings. The DEGRADED Notice is accurate information: one aspect was not run. + +Alternative considered: retry missing agents sequentially after fan-out. Rejected — no idempotency guarantee; double-spawn risk. + +Alternative considered: catch the socket error and retry the whole fan-out. Rejected — guaranteed double-spawn of agents that did succeed. + +### 2. Amend ADR 0015 + +Status line: `**Status:** Accepted, amended by 0018`. Add a closing sentence to the "No retries in v1" section: *"This no-retry policy extends to agent-spawn failures during orchestrator fan-out — see ADR 0018."* + +### 3. Add `'agent-spawn'` to `scripts/ado/notices.mjs` + +Locate the `NoticeKind` enum / union literal. Add `'agent-spawn'`. Confirm `mergeNotices` deduplicates by `kind` — multiple missing aspects in one run produce one `agent-spawn` Notice with a deterministic message. Either: +- **(recommended)** Aggregate at the orchestrator before calling `createNotice` (build a single message listing every missing aspect), so the Notice carries useful detail. The dedup-by-kind invariant stays untouched. +- Or: change message-building in the orchestrator so each missing aspect becomes its own Notice; let `mergeNotices` keep the first. Loses information. + +### 4. Extend `tests/notices.test.mjs` + +`node:test` cases: + +- `agent-spawn` Notice renders with `⚠` prefix in both Summary block and pre-PR preamble. +- `mergeNotices` collapses two `agent-spawn` Notices (first wins per existing rule). +- Trailer count includes `agent-spawn` Notices in `warning notices`. + +### 5. Update Step 6 in `commands/review-pr.md` + +After parsing the JSON arrays from each launched review-aspect agent, run: + +``` +For each agent in : + If no JSON array was returned (agent did not produce a parseable response): + Add to MISSING_ASPECTS + +If MISSING_ASPECTS is non-empty: + Push createNotice('warning', 'agent-spawn', + 'Review Aspects did not return findings: ' + MISSING_ASPECTS.join(', ') + '. Coverage is reduced for this run.') + to ORCH_NOTICES. +``` + +Store `ORCH_NOTICES` and merge it with Fetcher and Coordinator notices in Step 8. + +### 6. Update Step 8 in `commands/review-pr.md` + +Change the merge call from `mergeNotices([...fetcherNotices, ...coordinatorNotices, ...result.notices])` to `mergeNotices([...fetcherNotices, ...coordinatorNotices, ...result.notices, ...ORCH_NOTICES])`. Pre-PR Step E gets the same treatment for the dry-run path (spec 13). + +### 7. Update READMEs and CHANGELOG + +- `docs/plans/README.md`: add row 15. +- `docs/adr/README.md`: add row 0018; mark 0015 as amended. +- `CHANGELOG.md`: `### Added` entry as described. + +## Verification + +- Force a missing Review Aspect response (e.g. mock `Agent` to return empty for `pr-review-toolkit:code-reviewer`) → orchestrator emits one `agent-spawn` warning Notice; Review still posts with the remaining aspects' findings; Trailer warning count is +1. +- Force two missing aspects → single Notice listing both; Trailer warning count is +1. +- Happy path (all aspects return) → no `agent-spawn` Notice in output. +- Dry-run with a missing aspect → Notice renders in dry-run preamble; Trailer reads `🔍 Dry-run complete: ... · 1 warning notices`. +- `mergeNotices` unit test still passes for the deduplication invariant. + +## Acceptance criteria + +- [ ] `scripts/ado/notices.mjs` `NoticeKind` includes `'agent-spawn'`. +- [ ] Orchestrator emits the Notice when any launched Review Aspect agent returns no parseable findings. +- [ ] No retry path is added. +- [ ] ADR 0015 status line ends with `, amended by 0018`. +- [ ] ADR 0018 exists and extends the no-retry doctrine to agent fan-out. +- [ ] Tests cover happy path, one missing aspect, two missing aspects, and rendering in both Summary and pre-PR preamble. +- [ ] CHANGELOG `[Unreleased]` has an `Added` entry. + +## Out of scope + +- Retry policy for the ADO Fetcher, Coordinator, or Writer. Those follow the ADR 0015 HTTP-tier mapping unchanged. +- Detecting which specific *HTTP error* caused a missing agent response — the orchestrator only sees "the agent did not return a result block". Detail is the harness's job. +- A configurable retry count behind a feature flag. If 5xx-driven Notices become painful, ADR 0018 can be revisited per its own re-evaluation clause. diff --git a/apps/claude-code/pr-review/docs/plans/16-aspect-selection-observability.md b/apps/claude-code/pr-review/docs/plans/16-aspect-selection-observability.md new file mode 100644 index 00000000..e14f44e4 --- /dev/null +++ b/apps/claude-code/pr-review/docs/plans/16-aspect-selection-observability.md @@ -0,0 +1,142 @@ +# 16. Aspect-selection observability — extract to pure JS + +- Priority: P2 +- Effort: S +- Version impact: patch (observability addition + refactor; no aspect-selection rule change beyond adding `type-design-analyzer` extension filter) +- Depends on: 12 +- Touches: `commands/review-pr.md`, new `scripts/aspect-selection.mjs`, new `tests/aspect-selection.test.mjs`, `docs/plans/README.md`, `CHANGELOG.md` + +## Context + +The 2026-05-14 dry-run launched only 3 review-aspect agents (`code-reviewer`, `silent-failure-hunter`, `pr-test-analyzer`) for a PR that included new C# types and new doc-comment changes. `comment-analyzer` and `type-design-analyzer` were skipped without any rationale shown to the user. Two problems: + +1. **Observability gap.** The user could not tell which aspects ran or why. After the fact, reading the orchestrator source revealed the heuristics, but during the run there was no signal. +2. **Possibly missed coverage.** `type-design-analyzer` arguably should have run — the PR introduced new `AnalyticsViewModel`, `SendUserInteractionEvent` types and a swagger contract. The current heuristic ("if new types were introduced") is prose-encoded in `review-pr.md` with no concrete detection rule. + +The selection logic is currently inlined in `commands/review-pr.md`'s "Aspect-filter selection" block. That makes it untested, undocumented, and indistinguishable from prompt instructions to the LLM. Extracting it to a pure-JS helper (`scripts/aspect-selection.mjs`) makes the rules testable and uniform across pre-PR / dry-run / first-review / re-review modes. + +**Boundary preserved:** the selection rules in `pr-review` make only *file-pattern* claims about which `pr-review-toolkit:*` agents to invoke. They never encode semantic claims about what those agents detect. This preserves ADR 0008's soft-dependency invariant: if `pr-review-toolkit` updates an aspect to flag new behaviour, our selection layer does not need to change. + +## Current behaviour + +- The "Aspect-filter selection (used in Step 6 and Pre-PR Step D)" block in `commands/review-pr.md` reads: *Always run `code-reviewer` and `silent-failure-hunter`. Also run `pr-test-analyzer` if test files changed, `comment-analyzer` if docs/comments were added, and `type-design-analyzer` if new types were introduced.* +- "Test files", "docs/comments", "new types" are interpreted by the LLM. No deterministic rule. No printed rationale. +- The user sees `3 background agents launched` (or 5, etc.) but not *which* and *why*. + +## Target behaviour + +- `scripts/aspect-selection.mjs` exports a pure function `selectAspects({ changedFiles, aspectFilter })` returning `[{ agent, selected: boolean, reason: string }]`. +- `changedFiles` is the array shape already produced in Step 5 — newline-split entries like `edit: /src/foo.ts`. +- `aspectFilter` accepts the same values as today: `'code' | 'errors' | 'tests' | 'comments' | 'types' | 'all'` (default `'all'`). +- Rules (file-pattern only, no diff semantics): + - `code-reviewer` → always selected. Reason: `"always-on"`. + - `silent-failure-hunter` → always selected. Reason: `"always-on"`. + - `pr-test-analyzer` → selected when any changed file path matches `\.(test|spec)\.(m?js|ts|tsx)$` OR contains `/tests?/` OR contains `/__tests__/` OR ends in `_test.go`. Reason: `"test files changed: "`. + - `comment-analyzer` → selected when any changed file is markdown (`.md`) OR is a JSDoc-relevant source (`.ts | .tsx | .js | .mjs | .cs | .java | .py`). Reason: `" source files with potential comments/docs"`. (Heuristic: docs/comment edits happen inside any source file; we cannot detect "added a comment" without parsing diffs, but flagging source-file edits is the right file-pattern level.) + - `type-design-analyzer` → selected when any changed file extension is in `.ts | .tsx | .cs | .py | .rs | .go`. Reason: `" typed-language files changed"`. **No semantic regex** — explicitly excluded per ADR 0008 soft-dependency boundary. +- Aspect filter override: if `aspectFilter` is set to a single aspect (e.g. `'tests'`), only that aspect (plus the two always-on aspects) is selected, regardless of file patterns. +- Before launching agents in Step 6, the orchestrator prints: + + ``` + Launching review aspects: + ✓ code-reviewer (always-on) + ✓ silent-failure-hunter (always-on) + ✓ pr-test-analyzer (test files changed: 3) + ✗ comment-analyzer (skipped — 0 source files with potential comments/docs) + ✓ type-design-analyzer (2 typed-language files changed) + ``` + +- Pre-PR Step D and dry-run mode also use `selectAspects` and print the same block. + +## Affected files + +| File | Change | +|---|---| +| `scripts/aspect-selection.mjs` | New file. `// @ts-check` header, ESM. Exports `selectAspects` and the constant `ASPECT_AGENT_NAMES` (used to map agent → subagent_type when launching). | +| `tests/aspect-selection.test.mjs` | New file. `node:test` cases for: empty `changedFiles` (only always-on selected); test-file detection across patterns; comment-analyzer extension list; type-design-analyzer extension list; aspect filter override; aspect filter `'all'`; invalid filter (defaults to `'all'`). | +| `commands/review-pr.md` | Replace the "Aspect-filter selection" prose block with a call to `selectAspects(...)` (Node import via `await import`, same pattern as `mode-detection.mjs`). Print the rendered selection block before fan-out in Step 6 and pre-PR Step D. Use the returned array's `selected: true` entries as the launch list. | +| `docs/plans/README.md` | Add row 16. | +| `CHANGELOG.md` | `### Added` entry: *Review aspect selection is now printed before fan-out (which aspects ran, which were skipped, and why). Selection logic is extracted to a tested pure helper.* | + +No ADR — refactor with observability addition, not a new architectural decision. The boundary-preservation rationale (no semantic claims) is a restatement of ADR 0008's existing invariant. + +## Implementation steps + +### 1. Create `scripts/aspect-selection.mjs` + +```js +// @ts-check +/** @typedef {'code-reviewer'|'silent-failure-hunter'|'pr-test-analyzer'|'comment-analyzer'|'type-design-analyzer'} AspectAgent */ +/** @typedef {{ agent: AspectAgent, selected: boolean, reason: string }} AspectDecision */ + +export const ASPECT_AGENT_NAMES = [ + 'code-reviewer', + 'silent-failure-hunter', + 'pr-test-analyzer', + 'comment-analyzer', + 'type-design-analyzer', +] + +const TEST_PATTERNS = [ + /\.(test|spec)\.(m?js|ts|tsx)$/, + /\/__tests__\//, + /\/tests?\//, + /_test\.go$/, +] + +const COMMENT_EXTS = ['.md', '.ts', '.tsx', '.js', '.mjs', '.cs', '.java', '.py'] +const TYPED_EXTS = ['.ts', '.tsx', '.cs', '.py', '.rs', '.go'] + +export function selectAspects({ changedFiles, aspectFilter = 'all' }) { /* ... */ } +``` + +Implementation: split `changedFiles` strings on the first `: ` to extract paths; apply each rule; honour `aspectFilter` override. + +### 2. Create `tests/aspect-selection.test.mjs` + +Mirror `tests/parse-diff-hunks.test.mjs` style. Cases listed in "Target behaviour" above. + +### 3. Update `commands/review-pr.md` + +Replace the "Aspect-filter selection" block. New version: + +```markdown +### Aspect-filter selection (used in Step 6 and Pre-PR Step D) + +Parse `$ARGUMENTS` for an aspect filter (`code` | `errors` | `tests` | `comments` | `types` | `all`); default `all`. + +Compute the selection via `scripts/aspect-selection.mjs`: + + Via `await import`, call `selectAspects({ changedFiles: CHANGED_FILES_ARRAY, aspectFilter: ASPECT_FILTER })`. + +Print the rendered selection block (✓/✗ per agent with reason) to the Claude interface. Launch every agent whose `selected: true` entry corresponds to a `pr-review-toolkit:` subagent. +``` + +### 4. Update READMEs and CHANGELOG + +- `docs/plans/README.md`: add row 16. +- `CHANGELOG.md`: `### Added` entry as described. + +## Verification + +- Run pre-PR mode on a branch with only `.cs` changes → `type-design-analyzer` selected; `pr-test-analyzer` skipped with reason "no test files changed: 0". +- Run on a branch with `.md` changes → `comment-analyzer` selected with reason citing source-file count. +- Run with `--dry-run` flag on a PR with test files → printed selection block appears; selection deterministic across runs. +- Run with `aspects=tests` → only `code-reviewer`, `silent-failure-hunter`, `pr-test-analyzer` selected; others skipped with reason `"filtered out by aspects=tests"`. +- `pnpm --filter pr-review test`: new aspect-selection test cases pass on Linux/macOS/Windows. + +## Acceptance criteria + +- [ ] `scripts/aspect-selection.mjs` exists with `selectAspects` and `ASPECT_AGENT_NAMES`. +- [ ] `selectAspects` makes only file-pattern claims (no diff content parsing). +- [ ] `type-design-analyzer` is selected by extension only (no semantic regex). +- [ ] Orchestrator prints the selection block before fan-out in both Step 6 and pre-PR Step D. +- [ ] Tests cover empty input, each rule, filter override, and `'all'` filter. +- [ ] Tests skip on no-`az`-needed (pure JS, OS-independent). +- [ ] CHANGELOG `[Unreleased]` has an `Added` entry. + +## Out of scope + +- Adding new Review Aspect agents to `pr-review-toolkit`. The selection rules accommodate today's aspects only. +- Diff-content-based selection (e.g. "added a `class` declaration"). Explicitly out per ADR 0008 soft-dependency boundary — would encode toolkit-agent semantics in our plugin. +- A per-PR aspect override (e.g. `aspects=code,types`). Today's single-aspect filter is sufficient; comma-separated lists are a future enhancement. diff --git a/apps/claude-code/pr-review/docs/plans/README.md b/apps/claude-code/pr-review/docs/plans/README.md index d05ee12b..45db876f 100644 --- a/apps/claude-code/pr-review/docs/plans/README.md +++ b/apps/claude-code/pr-review/docs/plans/README.md @@ -18,3 +18,8 @@ Goal: when `/unic-pr-review:review-pr ` runs against a PR that already has | 09 | Test harness — node:test + modules | done | 02, 05, 06 | | 10 | Doc Context enrichment — work items + Confluence pages | done | — | | 11 | Doc Context spawn reliability — fix silently skipped phase | done | 10 | +| 12 | Fix Step 4 — fold thread fetch into ADO Fetcher | todo | — | +| 13 | Formal dry-run mode | todo | 12 | +| 14 | ADO CLI preflight hardening + smoke test | todo | — | +| 15 | Fan-out resilience — agent-spawn Notice | todo | 12 | +| 16 | Aspect-selection observability + extract to pure JS | todo | 12 |