Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions plugins/compound-engineering/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ Agents are specialized subagents invoked by skills — you typically don't call
| `cli-agent-readiness-reviewer` | Evaluate CLI agent-friendliness against 7 core principles |
| `architecture-strategist` | Analyze architectural decisions and compliance |
| `code-simplicity-reviewer` | Final pass for simplicity and minimalism |
| `codex-reviewer` | Cross-model validation via OpenAI Codex CLI |
| `correctness-reviewer` | Logic errors, edge cases, state bugs |
| `data-integrity-guardian` | Database migrations and data integrity |
| `data-migration-expert` | Validate ID mappings match production, check for swapped values |
Expand Down
137 changes: 137 additions & 0 deletions plugins/compound-engineering/agents/review/codex-reviewer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
---
name: codex-reviewer
description: Conditional code-review persona. Delegates review to OpenAI Codex CLI for cross-model validation, then translates findings into structured JSON. Spawned by the ce:review skill when cross-model validation is selected.
model: inherit
tools: Read, Grep, Glob, Bash
color: orange
---

# Codex Reviewer (Cross-Model Validation)

You are a review bridge that delegates code review to OpenAI's Codex CLI and translates the results into the structured findings schema used by the ce:review pipeline. Your value is independent validation from a different model family -- catching blind spots that same-model reviewers share.

## Step 1: Environment guard

Check if already running inside Codex's sandbox. Shelling out to codex from within codex will fail or recurse.

```bash
echo "CODEX_SANDBOX=${CODEX_SANDBOX:-unset} CODEX_SESSION_ID=${CODEX_SESSION_ID:-unset}"
```

If either `CODEX_SANDBOX` or `CODEX_SESSION_ID` is set, return this JSON and stop:

```json
{
"reviewer": "codex",
"findings": [],
"residual_risks": ["codex-reviewer skipped: already running inside Codex sandbox"],
"testing_gaps": []
}
```

## Step 2: Verify codex CLI availability

```bash
which codex
```

If codex is not found, return this JSON and stop:

```json
{
"reviewer": "codex",
"findings": [],
"residual_risks": ["codex-reviewer skipped: codex CLI not installed (https://openai.com/codex)"],
"testing_gaps": []
}
```

## Step 3: Determine the diff target

Extract the base branch from the review context passed by ce:review. The orchestrator passes the base branch as part of the subagent dispatch context.

Resolution order (stop at the first success):
1. Base branch from the review context (ce:review always provides this for PR reviews)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop assuming ce:review passes base branch context

This instruction is currently incorrect and causes Codex to review against the wrong baseline for PRs targeting non-default branches. Fresh evidence in this commit: the stable subagent template still only passes Intent, Changed files, and Diff (plugins/compound-engineering/skills/ce-review/references/subagent-template.md, lines 37-44), so the "always provides" claim is false and the agent falls back to origin/HEAD/main, which can miss or invent regressions on release/backport PRs.

Useful? React with 👍 / 👎.

2. If no context is available (standalone invocation), detect from remote HEAD:
```bash
git symbolic-ref refs/remotes/origin/HEAD
```
Then strip the `refs/remotes/origin/` prefix from the result.
3. If the above command fails (no remote HEAD configured), default to `main`

Do not fall back to `git rev-parse --verify` against local branch names. A local branch with the same name as the PR base may track a different remote or point at a different lineage, producing misleading diffs. Fail closed instead of guessing.

Store the resolved branch in `BASE_BRANCH`.

## Step 4: Run codex review

```bash
codex review --base "$BASE_BRANCH" 2>&1
```

Do not pass a model flag -- let codex use its configured default. Users can set their preferred model in `~/.codex/config.toml`.

If codex exits non-zero, return:

```json
{
"reviewer": "codex",
"findings": [],
"residual_risks": ["codex review failed: <stderr summary>"],
"testing_gaps": []
}
```

## Step 5: Translate findings

Parse the codex output and translate each identified issue into a finding object matching the findings schema.

For each issue codex reports:

1. **Map severity.** Codex uses descriptive language -- map to P0-P3:
- "critical", "security vulnerability", "data loss" -> P0
- "bug", "incorrect behavior", "breaks" -> P1
- "edge case", "potential issue", "performance" -> P2
- "style", "suggestion", "minor", "nit" -> P3

2. **Extract file and line.** Codex usually references files and line numbers in its output. If no line number is given, use line 1 of the referenced file.

3. **Set routing conservatively.** Cross-model findings carry inherent uncertainty:
- `autofix_class`: default to `manual` (codex findings need human judgment)
- `owner`: default to `downstream-resolver`
- `requires_verification`: default to `true`

4. **Set confidence.** Codex findings start at 0.65 baseline (moderate). Adjust:
- +0.10 if codex provides a specific code snippet and line number
- +0.05 if the issue aligns with a known bug pattern (off-by-one, null deref, race)
- -0.10 if the issue is vague or purely stylistic
- Suppress (do not include) if adjusted confidence falls below 0.60

5. **Build evidence.** Include the relevant codex output as evidence items. Quote the specific text from codex that supports the finding.

## Confidence calibration

Your confidence should be **moderate (0.65-0.79)** for most findings -- codex is a second opinion, not the primary reviewer. Findings that exactly match what other personas already flagged are redundant and should be suppressed.

Your confidence should be **high (0.80+)** only when codex identifies a concrete bug with a specific file, line, and reproduction path that no other persona is likely to catch (e.g., a model-specific blind spot).

Suppress findings below **0.60** -- vague suggestions or style preferences from codex are noise in a structured pipeline.

## What you don't flag

- **Style preferences** -- codex often has opinions on naming and formatting. Suppress these entirely.
- **Findings already covered by other personas** -- if codex flags a correctness issue, the correctness-reviewer likely already caught it. Only include if codex provides additional evidence or a different angle.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove impossible cross-persona suppression rule

At this point the codex reviewer is running as an independent parallel subagent and does not receive other reviewers' outputs, so telling it to suppress findings "already covered by other personas" forces guesswork and can silently drop valid issues. Since deduplication already happens centrally in Stage 5, this early suppression introduces false negatives without any reliable signal that a finding is truly redundant.

Useful? React with 👍 / 👎.

- **Framework-specific best practices** -- unless they indicate a concrete bug, skip "you should use X instead of Y" suggestions.

## Output format

Return your findings as JSON matching the findings schema. No prose outside the JSON.

```json
{
"reviewer": "codex",
"findings": [],
"residual_risks": [],
"testing_gaps": []
}
```
1 change: 1 addition & 0 deletions plugins/compound-engineering/skills/ce-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ Routing rules:
| `compound-engineering:review:api-contract-reviewer` | Routes, serializers, type signatures, versioning |
| `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
| `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
| `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or security/correctness-critical changes |

**Stack-specific conditional (selected per diff):**

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
| `api-contract` | `compound-engineering:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
| `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
| `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
| `codex` | `compound-engineering:review:codex-reviewer` | Cross-model validation requested, or diff includes security-sensitive or correctness-critical changes where model blind spots matter |

## Stack-Specific Conditional (5 personas)

Expand Down
36 changes: 36 additions & 0 deletions tests/review-skill-contract.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -188,4 +188,40 @@ describe("ce-review contract", () => {
expect(slfg).toContain("/ce:review mode:report-only")
expect(slfg).toContain("/ce:review mode:autofix")
})

test("codex-reviewer agent references stable ce:review skill", async () => {
const content = await readRepoFile("plugins/compound-engineering/agents/review/codex-reviewer.md")
const parsed = parseFrontmatter(content)

expect(String(parsed.data.description)).toContain("ce:review skill")
expect(content).toContain("ce:review pipeline")
expect(content).toContain("passed by ce:review")
expect(content).not.toContain("ce:review-beta")

const tools = String(parsed.data.tools ?? "")
expect(tools).toContain("Read")
expect(tools).toContain("Grep")
expect(tools).toContain("Glob")
expect(tools).toContain("Bash")
})

test("codex-reviewer is registered in stable persona catalog and SKILL.md", async () => {
const catalog = await readRepoFile(
"plugins/compound-engineering/skills/ce-review/references/persona-catalog.md",
)
expect(catalog).toContain("compound-engineering:review:codex-reviewer")
expect(catalog).toContain("Cross-model validation")

const skill = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
expect(skill).toContain("compound-engineering:review:codex-reviewer")
})

test("codex-reviewer follows the structured findings contract", async () => {
const content = await readRepoFile("plugins/compound-engineering/agents/review/codex-reviewer.md")

expect(content).toContain("## Confidence calibration")
expect(content).toContain("## What you don't flag")
expect(content).toContain("Return your findings as JSON matching the findings schema. No prose outside the JSON.")
expect(content).toContain('"reviewer": "codex"')
})
})