From e09071d3611de4eae256c15bae108ca29d88ef41 Mon Sep 17 00:00:00 2001
From: Nathan Heskew
Date: Fri, 1 May 2026 15:57:26 -0700
Subject: [PATCH] =?UTF-8?q?ci(claude):=20bump=20max-turns=2024=E2=86=9248?=
 =?UTF-8?q?=20+=20sharpen=20anti-pattern=20in=20Tools=20section?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Diagnosis from harper PR #450 review failure (run 25234303343): agent
burned ~20 turns on `Bash(grep -rn …)` searches across the whole repo
before hitting `--max-turns 24`, posted no review comment, and the log
step skipped (no marker'd comment to log). Diff was small (5 files,
+83/−8) — exploration pattern, not diff size, was the cost driver.

Two changes:

1. **--max-turns 24 → 48.** Cheap band-aid that ends the immediate
   dropouts on PRs with non-trivial cross-symbol exploration. Per the
   existing comment, this is the real cost ceiling — bumping doubles
   the worst-case budget and moves us out of the 'one exploratory grep
   too many = no review at all' regime.

2. **Tools section: explicit turn-budget hint + sharpened
   anti-pattern.** Names "repo-wide search" as THE biggest turn-burner,
   points at the `Grep` tool with `path` scoping, and cites the failure
   mode ("recent run that timed out before posting anything") so the
   agent has concrete signal that matches its observed behavior.

Out of scope: investigating whether `Bash(grep …)` is supposed to be
denied by the existing `--allowedTools` list (current entries scope to
`Bash(git diff:*)`, `Bash(git log:*)`, `Bash(git blame:*)`,
`Bash(git show:*)`, `Bash(gh ...:*)` — `Bash(grep …)` shouldn't match
any pattern but ran cleanly in #450's failed run, with
`is_error: false` on every grep tool call). Either claude-code-action
permits any `Bash(...)` once the tool is enabled, or there's a matching
gotcha. Tracked as a follow-up; this PR addresses the immediate symptom.

oauth needs the same change — will mirror after this lands.
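
The budget arithmetic behind change 1 can be sketched roughly as follows. This is an illustration only: `remaining_turns` is a hypothetical helper, not part of the workflow, and the overhead figures (up to 2 turns for agent context files, 1 to identify changed files, about 1 per changed file) are assumptions taken from the turn-budget hint this patch adds.

```python
def remaining_turns(max_turns: int, changed_files: int,
                    context_turns: int = 2, list_turns: int = 1) -> int:
    """Turns left for targeted searches and posting findings.

    Assumed overhead (from the Tools hint in this patch, not measured):
    context_turns to read agent context files, list_turns to identify
    changed files, and one Read turn per changed file.
    """
    overhead = context_turns + list_turns + changed_files
    return max_turns - overhead


# PR #450 touched 5 files.
old_budget = remaining_turns(24, 5)  # 16
new_budget = remaining_turns(48, 5)  # 40
print(old_budget, new_budget)
```

Under these assumptions the old ceiling left 16 turns after overhead on a 5-file diff, fewer than the ~20 the failed run spent on grep, while the new ceiling leaves 40.
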
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .github/workflows/claude-review.yml | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/claude-review.yml b/.github/workflows/claude-review.yml
index f736e1d60..62196db95 100644
--- a/.github/workflows/claude-review.yml
+++ b/.github/workflows/claude-review.yml
@@ -158,7 +158,7 @@ jobs:
  show_full_output: true # TEMP: keep on during calibration so tool denials are visible
  claude_args: |
  --model claude-sonnet-4-6
- --max-turns 24
+ --max-turns 48
  # This workflow is READ-ONLY by design — the agent reviews and
  # comments, it does not modify the repo. Git subcommands are
  # scoped individually to strictly read-only operations.
@@ -224,9 +224,25 @@ jobs:
  ## Tools
- For file inspection use the `Read`, `Grep`, and `Glob` tools.
+ **Turn budget:** you have 48 turns total per review. Plan
+ accordingly: 1–2 turns to read agent context files, 1 turn
+ to identify changed files, ~1 turn per changed file via
+ `Read`, the rest for targeted searches and posting
+ findings. Most reviews finish well under that. Running
+ out of turns means no review gets posted at all — the
+ log step skips when no marker'd review comment exists.
+
+ **The single biggest turn-burner is repo-wide search.**
+ Use the dedicated `Grep` tool with a tight `path` parameter
+ scoping to the changed files or the specific subdirectory
+ you need. Do NOT do `grep -rn …` (or `Bash(grep …)`) over
+ the whole repo — that pattern returned ~20 hits per call
+ in a recent run that timed out before posting anything.
+ For file inspection use `Read`, `Grep`, and `Glob` tools.
+
  Do NOT use `cat`, `head`, `tail`, `grep`, `ls`, or `find`
- via Bash — those commands are not allowed and waste turns.
+ via Bash. The Bash allowlist below excludes them on
+ purpose, and the equivalent dedicated tools are faster.
Do NOT run `npm test`, `npm run test:unit`, or any other script that executes PR code — the PR's tests are already checked separately.