Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions apps/claude-code/pr-review/CONTEXT.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,10 +91,14 @@ _Avoid_: initial review, fresh review
A Review run against an ADO PR where a prior **Bot Signature** is found in the PR's threads. Focuses on commits since the last Review, performs Thread Classification, and replies to or resolves existing Review Threads rather than duplicating them.
_Avoid_: incremental review, follow-up review, second pass

**Dry-run mode**:
A Review run against an ADO PR with the `--dry-run` flag. Identical to first-review / re-review for every read-side step — **ADO Fetcher**, **Doc Context Orchestrator**, all Review Aspect agents, and (when prior signature is found) the **Re-review Coordinator's** Thread Classification — but the **ADO Writer** is never invoked. Findings are rendered in the Claude interface using the same severity-grouped format as Pre-PR mode. Used to preview what a Review or Re-review would post before committing to write-back.
_Avoid_: preview review, simulated review, no-post review

### Orchestration agents

**ADO Fetcher**:
A plugin agent that retrieves PR metadata, iterations, changed files, and the raw diff from Azure DevOps. Used by first-review and re-review modes; not invoked in pre-PR mode.
A plugin agent that retrieves PR metadata, iterations, PR threads, changed files, and the raw diff from Azure DevOps — and determines the Review mode from the thread data. All ADO read operations for a PR review are owned here; the orchestrator makes no inline ADO read calls.
_Avoid_: fetcher, data agent, ADO client

**Re-review Coordinator**:
Expand Down Expand Up @@ -158,9 +162,9 @@ A single end-of-run line printed by the orchestrator to the Claude interface, re
- A **Doc Context** is assembled via a three-tier pipeline: the **Doc Context Orchestrator** spawns **Work Item Summarizer** and **Confluence Fetcher** agents (Doc Context Sub-agents) in parallel, then delegates their outputs to the **Doc Context Synthesizer**, which produces the final `DOC_CONTEXT` narrative injected into every Review Aspect agent
- A **Doc Context Sub-agent** operates on a single source (work item or Confluence page) and receives the changed files list and the local diff when available
- The **Doc Context Orchestrator** returns the **Doc Context Synthesizer**'s output verbatim; it does not rewrite or reformat the narrative
- The **ADO Fetcher** is invoked by first-review and re-review modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents
- The **Re-review Coordinator** is invoked only when the mode is re-review; first-review and pre-PR modes never load it
- The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** does not write back to ADO
- The **ADO Fetcher** is invoked by first-review, re-review, and dry-run modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents
- The **Re-review Coordinator** is invoked when the mode is re-review or a dry-run that detected a prior **Bot Signature**; first-review, pre-PR, and dry-run-on-fresh-PR modes never load it. In dry-run mode the Coordinator performs Thread Classification but its results feed only the rendered output — no Replies are posted
- The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** and **Dry-run mode** do not write back to ADO
- Every operation in an orchestration agent terminates in one of the four **Notice Tiers**. **DEGRADED** and **EMPTY-BY-DESIGN**-with-message operations emit a **Notice** that flows from the agent's structured result block, through the orchestrator's merge step, into the **Review Summary** (for ADO modes) or the printed pre-findings block (for **Pre-PR mode**). The end-of-run **Trailer** carries Notice counts so the invoker sees them without opening the PR.

## Example dialogue
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# ADR 0015 — Canonical HTTP-Tier Mapping

**Status:** Accepted
**Status:** Accepted, amended by 0018
**Date:** 2026-05-13
**Deciders:** Oriol Torrent Florensa
**Context:** ADR 0014 (failure-classification helper layer)
Expand Down Expand Up @@ -55,6 +55,9 @@ Retries are not implemented. Reasons:
Re-evaluate if 5xx Notices prove painful in practice; retries can be added behind the same Notice
surface without changing the doctrine.

**Update (2026-05-14):** this no-retry policy is extended to agent-spawn failures during orchestrator
fan-out by ADR 0018. The reasoning is identical (AFK latency, retry-storm risk, accurate Notice).

### Implementation

The canonical mapping is implemented in `scripts/ado/classify-http-error.mjs`:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# 0016. Fold thread fetch into ADO Fetcher

**Status:** Accepted (2026-05)
**Context:** Amends ADR 0013 (orchestrator split for review-pr)
**Date:** 2026-05-14

## Context

ADR 0013 split `review-pr.md` into a thin orchestrator and focused agents. As part of that split, ADR 0013 carved out **one allowed inline ADO call**: the mode-detection `az repos pr thread list` in the orchestrator's Step 4. The justification was that mode detection had to happen _before_ the ADO Fetcher could be launched, so the orchestrator needed to fetch and inspect threads itself.

Two facts surfaced after shipping the split:

1. **The carved-out command is invalid syntax.** `az repos pr thread list` does not exist in the `azure-devops` extension — valid `az repos pr` subcommands are `create / list / show / update / work-item / set-vote / reviewer / policy / checkout`. There is no `thread` group. The correct call is `az devops invoke --area git --resource pullRequestThreads --route-parameters …`. Every ADO PR review since the orchestrator-split shipped has been failing at Step 4. The first observed-in-the-wild failure is the 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt`.

2. **The "fetch threads → derive mode" pair is naturally adjacent to the rest of the Fetcher's work.** The Fetcher already calls `az devops invoke` for `pullRequestIterations`, `pullRequestIterationChanges`, `pullRequestWorkItems`. Adding `pullRequestThreads` to that family keeps ADO read knowledge in one place — the agent that exists exactly to own it.

The carve-out can go away if the Fetcher returns `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`, and `RAW_THREADS_JSON` in its structured result block. The orchestrator branches on those fields in Step 5 onwards — the same pattern it already uses for `WORK_ITEM_IDS`, `DIFF_RANGE`, etc.

## Decision

**Move thread fetching and mode detection from the orchestrator into the ADO Fetcher.** The Fetcher gains a new step (Step 2.5) between iterations fetch and changed-files fetch:

1. Call `az devops invoke --area git --resource pullRequestThreads --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID"`.
2. Apply ADR 0015's HTTP-tier mapping: 401/403 → ABORTED; 404 → OK (treat as empty threads; equivalent to first-review); 5xx / network → DEGRADED with `kind: thread-fetch`, treat as empty threads so mode detection still produces `first-review`.
3. Run `detectMode` (from `scripts/mode-detection.mjs`, unchanged) on the response's `.value` array. The function returns `{ mode, isRereview, priorIterationId, summaryThreadId }`.
4. Append `RAW_THREADS_JSON`, `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID` to the Fetcher's result block.

The orchestrator's Step 4 simplifies to a single `az repos pr show` call that captures PR metadata (`REPO_ID`, `PROJECT`, `SOURCE_BRANCH`, `TARGET_BRANCH`, `PR_TITLE`, `PR_DESCRIPTION`) and passes them into the Fetcher as inputs. The Fetcher no longer makes its own `az repos pr show` call — the data is already in its prompt.

After this change, the orchestrator contains **no `az devops invoke` calls**. The only remaining inline `az` is `az --version`, `az extension list` (Step 3 preflight), and `az repos pr show` (Step 4 metadata).

ADR 0013's carve-out (_"The one allowed inline ADO call is the mode-detection `az repos pr thread list` in the mode detection block"_) is **removed**. ADR 0013's core decision — thin orchestrator + focused agents — stands; only that one-sentence exception is superseded by this ADR.

## Alternatives considered

**Fix the inline command in Step 4 and leave it in the orchestrator.** Smallest change. Rejected because it preserves the architectural reason for the carve-out (mode detection must precede Fetcher launch) which doesn't actually exist — the orchestrator can equally well branch on Fetcher output. Keeping ADO knowledge in two places means future API changes require edits in two files.

**Introduce a separate `mode-detector` agent that fetches threads.** Cleaner separation of concerns. Rejected because mode detection and the rest of the Fetcher's work share the same ADO call shape (`az devops invoke --area git --resource …`); a new agent would duplicate the auth/retry/error-classification setup the Fetcher already has.

**Replace `az` with direct REST calls.** Long-discussed in ADR 0008's neighbourhood. Out of scope for this ADR; revisit if `az` proves a recurring failure source after spec 14's smoke test catches near-term drift.

## Consequences

- The plugin's mode-detection path actually works (Step 4 was previously broken on every run).
- ADO read knowledge is centralised in `ado-fetcher.md`.
- Adding new mode-affecting ADO data (e.g. a `mergeAttempt` field) requires editing one agent prompt.
- The Fetcher's result block grows by five lines — downstream prompts (Coordinator, Writer) parse them via existing string-extraction patterns, no helper change.
- ADR 0013's status line is updated to `Accepted (2026-05), amended by 0016`.

## See also

- ADR 0013 — Orchestrator split for review-pr (amended)
- ADR 0014 — Notice Tier doctrine (governs the new `kind: thread-fetch` DEGRADED path)
- ADR 0015 — Canonical HTTP-tier mapping (governs the 401/403/5xx behaviour for thread fetch)
- Spec 12 — Implementation of this ADR
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# 0017. Dry-run as a fourth peer operating mode

**Status:** Accepted (2026-05)
**Date:** 2026-05-14

## Context

The 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt` was invoked with the natural-language instruction _"Make a dry-run PR review. DO NOT POST ANY COMMENT TO THE PR! Only report inline"_. The plugin had no formal dry-run mechanism. The LLM-as-orchestrator improvised: inlined `az` data fetches (compounding the Step 4 bug), skipped the `pr-review:ado-fetcher` agent entirely, and never invoked the ADO Writer. The user got useful findings but only by accident — the documented spec was bypassed.

Two design problems:

1. **No deterministic dry-run path.** Asking the LLM to "not post comments" works most of the time but invites free-form deviation from the spec. A documented mode replaces persuasion with branching.
2. **Dry-run is genuinely useful and not just a debug helper.** Previewing what a re-review would post — before letting it edit threads in a customer-facing PR — is a legitimate user need.

Two shapes for the formalisation: a fourth peer mode, or an orthogonal boolean flag.

## Decision

**Dry-run is a fourth peer mode, alongside pre-PR, first-review, and re-review.** CLI surface: `/pr-review:review-pr <url> --dry-run`. The orchestrator's mode resolution applies the flag after URL parsing; `IS_REREVIEW` (whether a prior bot signature exists) is captured separately and drives Coordinator inclusion within dry-run mode.

Dry-run runs every read-side step identically to first-review / re-review (preflight, metadata fetch, ADO Fetcher, Doc Context Orchestrator, Review Aspects fan-out, and Re-review Coordinator's Thread Classification when `IS_REREVIEW=true`). The **ADO Writer is never invoked**. Findings render to the Claude interface using pre-PR Step E's severity-grouped format. Notices from Fetcher and Coordinator render via `formatNoticesAsPrePrPreamble`. The Trailer reads `🔍 Dry-run complete: <N> findings (<criticals> critical, <importants> important) · <warnings> warning notices · would have posted to <PR URL>`.

Coordinator runs in dry-run mode when `IS_REREVIEW=true`, but its reply-posting branch is short-circuited. Its `freshFindings` output still feeds `FINDINGS_JSON` so the user previews exactly the incremental finding set the real re-review would have posted.

## Alternatives considered

**Orthogonal `DRY_RUN` boolean flag.** Any mode (first-review / re-review) could be dry-run. Conceptually cleaner — dry-run is "do the work, don't post" — but produces awkward mode names internally (`first-review-dry-run`, `re-review-dry-run`) and forces every branch in the orchestrator to read two state variables instead of one. Rejected: peer-mode keeps the orchestrator's mode-switch readable as a single discriminator.

**Document a "DO NOT POST" convention and trust the LLM.** Zero implementation cost. Rejected because it is exactly what failed on 2026-05-14: the LLM honoured the instruction by skipping not just Writer but also the entire structured fetch path, masking the Step 4 bug and producing inconsistent behaviour across runs.

**Make dry-run a flag that disables only the Writer agent invocation.** Halfway between the other two. Rejected because the orchestrator's other branches (Step 8 trailer text, Notice rendering) still need to know whether to use the pre-PR rendering path or the ADO Summary path. A single mode value carries that decision uniformly.

## Consequences

- The orchestrator branches on `MODE ∈ {pre-pr, dry-run, first-review, re-review}` — four cases, no nested flags.
- Re-review-eligible PRs can be previewed safely: classification runs and reports its plan, but no replies are posted and no thread statuses are PATCHed.
- The Pre-PR rendering helpers (`formatNoticesAsPrePrPreamble`, severity-grouped finding print) are reused by dry-run — confirming that "render to interface" is a coherent capability, not a pre-PR-only quirk.
- The Trailer line gets a fourth shape (`🔍 Dry-run complete: ...`). `formatTrailer` in `scripts/ado/notices.mjs` grows one branch + test cases.
- `CONTEXT.md` gains a `Dry-run mode` term and updates three relationship lines documenting which agents run in which modes.

## See also

- Spec 13 — Implementation of this ADR
- Spec 12 — Step 4 fix (the upstream bug that exposed the no-formal-dry-run gap)
- ADR 0014 — Notice Tier doctrine (`formatTrailer`, `formatNoticesAsPrePrPreamble` are governed here)
- `CONTEXT.md` — `Dry-run mode` definition and the agent-invocation matrix
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# 0018. Fan-out resilience — trust the harness, surface partial failure as a Notice

**Status:** Accepted (2026-05)
**Context:** Amends ADR 0015 (canonical HTTP-tier mapping — no-retry doctrine)
**Date:** 2026-05-14

## Context

The 2026-05-14 dry-run captured in `docs/conversations/pr-review-dry-run-01.txt` produced a transient `API Error: The socket connection was closed unexpectedly` from the Claude Code HTTP client during Step 6's parallel agent fan-out. Step 6 is the plugin's largest concurrent spawn: Doc Context Orchestrator plus 2–5 review-aspect agents in a single message.

The error printed in the output but all spawned agents completed normally. The Claude Code harness either auto-retried the launch transparently or the server-side launch had already been accepted before the client socket dropped. The user-visible outcome was a successful run that _looked_ broken.

Two unaddressed risks:

1. **Cosmetic confusion.** A scary API error appeared with no framing. AFK users skimming the output can't tell whether a partial failure occurred.
2. **Real partial-failure invisibility.** If a future fan-out _does_ lose one agent (e.g. the harness fails to recover), the orchestrator currently has no post-condition that detects the missing result block. The Review posts with N-1 aspects' findings and the user has no signal that coverage was reduced.

ADR 0015 already forbids retries for HTTP failures in v1 with three reasons: (1) AFK latency, (2) retry-storm failure mode, (3) DEGRADED Notices carrying accurate information. All three reasons apply identically to agent-spawn failures during fan-out. This ADR extends ADR 0015's doctrine to the orchestrator's fan-out layer.

## Decision

**Trust the harness. Don't retry. Surface partial fan-out as a DEGRADED Notice.**

After Step 6's fan-out completes, the orchestrator runs a post-condition check: every launched Review Aspect agent must have returned a parseable JSON findings array. Agents that return nothing (no result block, no JSON, empty stdout) are recorded as missing.

If any aspects are missing, the orchestrator emits a single Notice:

```js
{ severity: 'warning', kind: 'agent-spawn',
message: 'Review Aspects did not return findings: <name>, <name>. Coverage is reduced for this run.' }
```

The Notice flows through the same pipeline as Fetcher and Coordinator notices: merged via `mergeNotices`, rendered in the Review Summary (or pre-PR / dry-run preamble), counted in the Trailer's `warning notices` field.

The Review still posts with whatever findings did arrive. No retry. No double-spawn risk.

The `agent-spawn` kind is added to the `NoticeKind` enum in `scripts/ado/notices.mjs`. Per ADR 0014's invariant _"each `kind` has exactly one source agent"_, the source agent is **the orchestrator itself** — it is the only place that knows which agents were launched.

## Alternatives considered

**Retry missing agents sequentially after fan-out.** The orchestrator detects the missing ones and re-spawns them in a second pass. Rejected: the harness may already have launched them server-side (the dry-run example demonstrates the harness recovers transparently). Re-spawning risks double inferences, double findings, and confusing the user with duplicate output. No idempotency guarantee.

**Catch the socket error itself and retry the whole fan-out.** Guaranteed double-spawn of agents that did succeed — explicitly worse than the previous option.

**Add a feature-flagged single retry behind a configuration option.** Postponed. ADR 0015's clause _"Re-evaluate if 5xx Notices prove painful in practice"_ applies equivalently here. If `agent-spawn` Notices become a frequent reality, this ADR can be revisited with the same opt-in approach.

**Treat partial fan-out as ABORTED.** Rejected: one missing aspect (e.g. `comment-analyzer`) is not run-corrupting. The Notice tells the user; they can rerun if needed. ABORTED is reserved for failures that would corrupt cross-run state.

## Consequences

- The orchestrator gains a single post-condition check after Step 6 — a few lines of bash + a `createNotice` call. No new helper, no new test infrastructure beyond extending the existing `tests/notices.test.mjs` cases.
- The user always sees a documented status: either findings (success), findings + `agent-spawn` Notice (partial), or aborted Trailer (full failure upstream).
- Future failure modes affecting fan-out (e.g. a `pr-review-toolkit` upgrade introducing a non-JSON response) become observable via the same Notice, without a new failure-classification path.
- ADR 0015's no-retry doctrine is documented as load-bearing across two layers (HTTP and agent-spawn). Its status line gains `, amended by 0018` and its "No retries in v1" section closes with a cross-reference to this ADR.

## See also

- ADR 0014 — Notice Tier doctrine (`agent-spawn` is a new `kind` in the enum)
- ADR 0015 — Canonical HTTP-tier mapping (amended)
- Spec 15 — Implementation of this ADR
Loading
Loading