Get a second opinion from other AI models without leaving Claude Code. This skill sends your prompt to GPT (via OpenAI Codex CLI) and Gemini (via Gemini CLI), has them debate each other, and brings back a structured summary you can act on.
Prerequisites:
- Claude Code installed
- OpenAI Codex CLI installed (`npm install -g @openai/codex` or `brew install --cask codex`)
- Gemini CLI installed (`npm install -g @google/gemini-cli`)
- Authenticated: `codex login` (or set `OPENAI_API_KEY`) and `gemini auth` (or set `GEMINI_API_KEY`)

Optional fallback: GitHub Copilot CLI, used for GPT if Codex CLI is unavailable (`brew install github/gh/copilot-cli`, then `gh auth login`)
Install the skill:
Via plugin marketplace (recommended — auto-updates when you push changes):
/plugin marketplace add maleick/peer-review
/plugin install peer-review@peer-review
Manual install (no auto-updates):
# Clone this repo
git clone https://github.com/Maleick/peer-review.git ~/Projects/peer-review
# Copy command to Claude Code
mkdir -p ~/.claude/commands
cp ~/Projects/peer-review/plugins/peer-review/commands/peer-review.md ~/.claude/commands/
Use it:
/peer-review Should we use Redis or Memcached for our session cache?
That's it. Claude dispatches your question to GPT-5.4 (via Codex CLI) and Gemini (via Gemini CLI), they each review it from different angles, debate each other's responses, and you get back a numbered list of action items you can accept, cherry-pick, or discard.
You type: /peer-review We plan to add WebSocket support to the API
Claude does this:
1. Sends your prompt to GPT via Codex CLI (as an "implementation reviewer" — finds edge cases, concrete risks)
2. Sends your prompt to Gemini via Gemini CLI (as a "strategic reviewer" — finds architectural issues)
3. Shows each model the other's response and asks them to critique it
4. Reads everything and produces a Decision Packet with numbered action items
5. Asks you: Accept all? Cherry-pick? Refine? Discard?
The key insight: each model gets a different reviewer persona tuned to its strengths. GPT focuses on tactical implementation risks. Gemini focuses on strategic architecture concerns. Then they challenge each other, which filters out weak arguments and surfaces genuine consensus.
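The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the persona wording, the `Model` callable type, and the stub models are all hypothetical stand-ins for the real CLI calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: a "model" is any callable that maps a prompt to a reply.
Model = Callable[[str], str]

# Illustrative persona wording; the skill's real prompts are not shown here.
PERSONAS = {
    "gpt": "You are an implementation reviewer. Find edge cases and concrete risks.",
    "gemini": "You are a strategic reviewer. Find architectural issues.",
}


def first_round(task: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Send the task to every model in parallel, each with its own persona."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {
            name: pool.submit(call, f"{PERSONAS[name]}\n\nTask:\n{task}")
            for name, call in models.items()
        }
        return {name: future.result() for name, future in futures.items()}


def cross_examine(task: str, models: Dict[str, Model], replies: Dict[str, str]) -> Dict[str, str]:
    """Show each model its peer's reply and ask for a critique."""
    critiques = {}
    for name, call in models.items():
        peer = next(other for other in models if other != name)
        critiques[name] = call(
            f"{PERSONAS[name]}\n\nTask:\n{task}\n\n"
            f"Your earlier response:\n{replies[name]}\n\n"
            f"Peer response ({peer}):\n{replies[peer]}\n\n"
            "Critique the peer response."
        )
    return critiques


# Stub models stand in for the Codex and Gemini CLIs.
def fake_gpt(prompt: str) -> str:
    return "GPT: watch for reconnect storms"


def fake_gemini(prompt: str) -> str:
    return "Gemini: consider server-sent events instead"


models = {"gpt": fake_gpt, "gemini": fake_gemini}
task = "We plan to add WebSocket support to the API"
replies = first_round(task, models)
critiques = cross_examine(task, models, replies)
```

Treating each model as a plain callable keeps the orchestration independent of how the reply is produced, so the same loop works with stubs in tests and with real CLI wrappers in practice.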
20 modes covering review, brainstorming, adversarial analysis, deployment, API design, performance, and more. Highlights:
| Command | What It Does |
|---|---|
| `/peer-review <plan>` | Implementation + architecture review (default) |
| `/peer-review redteam <plan>` | Find attack vectors, failure modes, blind spots |
| `/peer-review debate <question>` | Pro/con argument with judge synthesis |
| `/peer-review gate` | Review Claude's own output before proceeding |
| `/peer-review delegate <task>` | Hand off coding to GPT/Gemini with write access |
| `/peer-review diff` | Review staged/unstaged git changes |
| `/peer-review --modes m1,m2` | Run multiple modes in parallel (cap: 4) |
Full modes table, options, model aliases, multi-mode presets, and configuration reference: docs/SPEC.md
Legacy alias: /brainstorm maps to the same modes.
Instead of adversarial cross-examination (find weaknesses), steelman mode asks each model to first make the strongest possible version of the other's argument before critiquing it. This produces deeper analysis — models can't dismiss arguments they've just strengthened. No extra cost (same number of CLI calls).
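A rough sketch of why steelman mode costs nothing extra: only the instruction text inside the cross-examination prompt changes, so each model still receives a single prompt per round. The function name and the instruction wording below are hypothetical, not taken from the skill.

```python
def build_cross_exam_prompt(task: str, own_reply: str, peer_reply: str, steelman: bool = False) -> str:
    """Return one prompt string; steelman mode only changes the instructions."""
    if steelman:
        instructions = (
            "1. Restate the peer's argument in its strongest possible form.\n"
            "2. Only then critique that strengthened version."
        )
    else:
        instructions = "Find the weaknesses in the peer's argument."
    return (
        f"Original task:\n{task}\n\n"
        f"Your prior response:\n{own_reply}\n\n"
        f"Peer response:\n{peer_reply}\n\n"
        f"Instructions:\n{instructions}"
    )


adversarial_prompt = build_cross_exam_prompt("Redis or Memcached?", "Use Redis.", "Use Memcached.")
steelman_prompt = build_cross_exam_prompt("Redis or Memcached?", "Use Redis.", "Use Memcached.", steelman=True)
```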
The `--iterate N` option turns peer-review into a convergence loop. After each review, Claude auto-accepts HIGH CONFIDENCE items, applies the fixes to the file, and re-reviews. The loop repeats until no new HIGH items appear or the maximum number of iterations is reached.
/peer-review refactor src/auth.ts --iterate 3
Iteration 1: 9 items (3 HIGH) → auto-apply 3 fixes
Iteration 2: 5 items (1 HIGH, 4 RESOLVED) → auto-apply 1 fix
Iteration 3: 2 items (0 HIGH) → convergence achieved
Safety rails prevent runaway iteration: a validation gate syntax-checks each fix before applying (Python, JS/TS, Shell, JSON), scope control blocks deletions/renames/multi-file/schema changes without approval, and a diff size guard pauses on fixes exceeding 50 lines. Type stop at any iteration to halt, or override to switch to manual cherry-pick. Regressions (more items than before) trigger an automatic pause.
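The loop and two of its safety rails (the 50-line diff guard and the regression pause) might look roughly like the sketch below. The `Item` class, the `iterate` function, and the returned status strings are hypothetical; the validation gate and scope control are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_DIFF_LINES = 50  # the diff size guard described above


@dataclass
class Item:
    title: str
    confidence: str  # "HIGH", "MEDIUM", or "LOW"
    diff_lines: int  # size of the proposed fix


def iterate(
    review: Callable[[int], List[Item]],
    apply_fix: Callable[[Item], None],
    max_iterations: int,
) -> str:
    """Re-review until no HIGH items remain, pausing on regressions or large fixes."""
    previous_count = None
    for iteration in range(1, max_iterations + 1):
        items = review(iteration)
        # Regression: more items than the previous round.
        if previous_count is not None and len(items) > previous_count:
            return f"paused: regression at iteration {iteration}"
        high = [item for item in items if item.confidence == "HIGH"]
        if not high:
            return f"converged at iteration {iteration}"
        for item in high:
            if item.diff_lines > MAX_DIFF_LINES:
                return f"paused: oversized fix at iteration {iteration}"
            apply_fix(item)
        previous_count = len(items)
    return "stopped: max iterations reached"


def make_items(total: int, high: int) -> List[Item]:
    """Build a fake review result with `high` HIGH items out of `total`."""
    return [
        Item(f"item {i}", "HIGH" if i < high else "MEDIUM", diff_lines=10)
        for i in range(total)
    ]


# Mirrors the sample run above: 9 items (3 HIGH), 5 items (1 HIGH), 2 items (0 HIGH).
rounds = {1: make_items(9, 3), 2: make_items(5, 1), 3: make_items(2, 0)}
applied: List[Item] = []
result = iterate(lambda n: rounds[n], applied.append, max_iterations=3)
print(result, len(applied))  # converged at iteration 3 4
```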
## Peer Review: review — "API rate limiting design"
### Claude's Take
> [Analysis using codebase context that external models don't have]
### GPT (Implementation Reviewer)
[Concrete risks, edge cases, severity ratings, specific fixes]
### Gemini (Strategic Reviewer)
[Systemic risks, alternative approaches, long-term implications]
### Cross-Examination Highlights
- GPT challenged Gemini's caching suggestion as premature optimization
- Both converged on the need for per-tenant rate limiting
### Decision Packet
Summary: 8 items — 1 critical, 3 high, 3 medium, 1 low
Recommended path: ...
Actionable items:
1. [HIGH CONFIDENCE] Add per-tenant limits _(consensus)_
2. [MEDIUM] Consider token bucket over sliding window _(GPT)_
3. [LOW] Evaluate distributed rate limiting _(Gemini, challenged by GPT)_
### Priority Matrix
| | Low Effort | High Effort |
| --------------- | ---------- | ----------- |
| **High Impact** | Items 1, 3 | Item 2 |
| **Low Impact** | Item 5 | Item 4 |
---
What would you like to do with this feedback?
- **Accept all** / **Cherry-pick** (e.g., "keep 1, 3") / **Refine** / **Discard**

- Claude verifies the Codex CLI and Gemini CLI are installed and authenticated (falls back to Copilot CLI for GPT if needed)
- Scans your prompt for sensitive data (API keys, JWT tokens, AWS keys, GitHub/Slack tokens, PEM keys, high-entropy strings, credentials) — blocks dispatch if found
- Builds role-differentiated prompts for each model
- Dispatches to both models in parallel via their respective CLIs
- Sanitizes model output before reuse (cross-exam, TODOs, JSON export, iteration fixes)
- Each cross-examination round includes the original task, the model's own prior response, and the peer's response — so models maintain context across rounds
- Synthesizes all rounds into a Decision Packet with confidence levels based on cross-exam convergence
- Presents the cherry-pick menu
Security: Prompts are written to temp files with restricted permissions, piped via stdin (never command-line args), and cleaned up after each call. Heredoc delimiters use cryptographically random suffixes to prevent injection. Cross-examination uses randomized DATA boundary markers. Diff mode uses block-by-default privacy — sensitive diffs require --allow-sensitive. JSON exports are scanned for leaked secrets post-write.
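Two of these measures can be illustrated with the Python standard library: a private temp file fed to a child process on stdin, and a boundary marker with an unguessable suffix. This is a sketch under stated assumptions, not the skill's code. The helper names are hypothetical, and the child command is a stand-in Python one-liner, not a real Codex or Gemini invocation.

```python
import os
import secrets
import subprocess
import sys
import tempfile
from typing import List


def run_with_prompt(command: List[str], prompt: str) -> str:
    """Write the prompt to a private temp file, feed it on stdin, then delete it."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="peer-review-", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(prompt)
        with open(path, "r", encoding="utf-8") as stdin_file:
            # The prompt never appears in the argument list, so it stays out of `ps` output.
            result = subprocess.run(
                command, stdin=stdin_file, capture_output=True, text=True, check=True
            )
        return result.stdout
    finally:
        os.remove(path)


def wrap_untrusted(text: str) -> str:
    """Fence untrusted text between markers that the text cannot predict."""
    marker = f"DATA_{secrets.token_hex(16)}"
    return f"<<<{marker}\n{text}\n{marker}>>>"


# Stand-in child process: reports how many characters arrived on stdin.
echo_length = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
output = run_with_prompt(echo_length, "hello")
wrapped = wrap_untrusted("peer response goes here")
```

Because the marker suffix is generated per call, text that tries to close the boundary early would have to guess 128 random bits.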
| Problem | Fix |
|---|---|
| `PREFLIGHT_FAIL: No GPT CLI found` | Install Codex CLI: `npm install -g @openai/codex`, then `codex login` |
| `PREFLIGHT_FAIL: Gemini CLI not found` | Install Gemini CLI: `npm install -g @google/gemini-cli`, then `gemini auth` |
| `PREFLIGHT_WARN: Codex CLI auth may not be configured` | Run `codex login` or set `OPENAI_API_KEY` |
| `GPT_FAILED` with exit code | Re-authenticate: `codex login`. If using Copilot fallback: `gh auth login` |
| `GEMINI_FAILED` with exit code | Re-authenticate: `gemini auth` or check `GEMINI_API_KEY` |
| Rate limited | Wait a few minutes, or use `/peer-review quick` for a lighter call |
| Timeout (no response after 3 min) | Prompt may be too large. Split into smaller reviews. |
| Empty/partial output | Model returned a stub. Retry, or use single-target mode to isolate. |
Debug tip: Test each model independently with /peer-review gpt <prompt> or /peer-review gemini <prompt>.
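To check outside Claude Code whether the CLIs are on your `PATH`, a small script along these lines may help. It is a hedged sketch: the `preflight` function is hypothetical, the messages are copied from the table above, and the `copilot` binary name for the Copilot CLI fallback is an assumption.

```python
import shutil
from typing import Callable, List, Optional


def preflight(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return preflight failure messages; an empty list means both model CLIs were found."""
    messages = []
    # "copilot" is an assumed binary name for the Copilot CLI fallback.
    if which("codex") is None and which("copilot") is None:
        messages.append("PREFLIGHT_FAIL: No GPT CLI found")
    if which("gemini") is None:
        messages.append("PREFLIGHT_FAIL: Gemini CLI not found")
    return messages


for message in preflight():
    print(message)
```

Passing `which` as a parameter makes the check easy to exercise without touching the real `PATH`.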
License: MIT