fix(design): bump image-gen timeout to 240s + pin gpt-image-2 #1586

Open
matteo-hertel wants to merge 1 commit into garrytan:main from matteo-hertel:fix/design-timeout-gpt-image-2

Conversation

@matteo-hertel matteo-hertel commented May 18, 2026

Problem

/design-shotgun (and any flow using the design binary) appears to hang indefinitely and never opens the comparison board.

Root cause: the binary calls POST /v1/responses with model: gpt-4o + the image_generation tool at quality: "high", 1536x1024, but aborts the request after a hardcoded 120 000 ms (design/src/generate.ts, variants.ts, evolve.ts, iterate.ts).

That class of request consistently takes ~140-160 s end-to-end. Every generation aborts before the image is ready. In /design-shotgun, Step 3c launches N parallel agents each calling $D generate; every one aborts at 120 s, retries (another 120 s), all N fail, the board never opens, so the skill looks like an infinite hang.
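The cascade can be illustrated with a small simulation. This is a sketch only: `simulateGenerate`, the two-attempt limit, and the assumption that a retry needs the same time as the first attempt are hypothetical and not taken from the design binary.

```typescript
// Hypothetical model of one `$D generate` call: each attempt is aborted at
// `timeoutS`, and a failed attempt is retried until `maxAttempts` is reached.
// Assumes every attempt would need the same `requestS` seconds to finish.
interface Outcome {
  succeeded: boolean;
  wallS: number; // total seconds spent before success or giving up
}

function simulateGenerate(requestS: number, timeoutS: number, maxAttempts: number): Outcome {
  let wallS = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (requestS <= timeoutS) {
      return { succeeded: true, wallS: wallS + requestS };
    }
    wallS += timeoutS; // this attempt is aborted at the deadline
  }
  return { succeeded: false, wallS };
}

// 143.5 s is the measured request time reported in the Evidence section.
const before = simulateGenerate(143.5, 120, 2);
const after = simulateGenerate(143.5, 240, 2);
console.log(before); // { succeeded: false, wallS: 240 }
console.log(after);  // { succeeded: true, wallS: 143.5 }
```

Under these assumptions, the 120 s deadline spends 240 s per agent and still returns nothing, while the 240 s deadline succeeds on the first attempt.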

Evidence

Reproduced the exact API call the binary makes, with a longer budget:

| Call | Result |
| --- | --- |
| identical request, 300 s budget | HTTP 200, valid 1.7 MB image, 143.5 s |
| same with model: gpt-image-2 | HTTP 200, valid 1.85 MB image, 143.4 s |
| binary abort deadline 120 s | fires every time |

Then ran a real /design-shotgun end-to-end with the patched binary, 3 variants generated in parallel:

| Variant | Generate time | Quality gate |
| --- | --- | --- |
| A | 150.0 s | pass |
| B | 161.0 s | pass |
| C | 152.1 s | pass |

The 161 s case is the key data point: a naive bump to 150 s would still have failed it. 240 s leaves real margin while still bounding a genuinely stuck request.
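The margin argument can be checked against the three measured generate times. The helper names `abortedBy` and `headroomS` are hypothetical, and a request finishing exactly at the deadline is treated as a success here, which is an assumption.

```typescript
// Generate times (seconds) for variants A, B, C from the table above.
const observedS = [150.0, 161.0, 152.1];

// Observed times that a given abort deadline would have cut off.
function abortedBy(timeoutS: number, timesS: number[]): number[] {
  return timesS.filter((t) => t > timeoutS);
}

// Seconds between the deadline and the slowest observed request
// (negative means the slowest request would have been aborted).
function headroomS(timeoutS: number, timesS: number[]): number {
  return timeoutS - Math.max(...timesS);
}

for (const timeoutS of [120, 150, 240]) {
  console.log(timeoutS, abortedBy(timeoutS, observedS), headroomS(timeoutS, observedS));
}
```

With these numbers a 150 s deadline still aborts variants B and C, while 240 s leaves 79 s of headroom over the slowest run.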

Change

  • Bump the AbortController timeout from 120_000 to 240_000 in generate.ts, variants.ts, evolve.ts, iterate.ts (both call sites in iterate.ts).
  • Pin the image_generation tool to model: "gpt-image-2".
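A minimal sketch of what the two changes amount to, not the actual code in design/src. The names `IMAGE_TIMEOUT_MS`, `buildImageRequest`, `withAbortTimeout`, and `generateImage` are hypothetical, and the request body layout and the api.openai.com host are assumptions based on the parameters named in this PR.

```typescript
// Deadline proposed in this PR (previously 120_000).
const IMAGE_TIMEOUT_MS = 240_000;

// Assumed body for POST /v1/responses; the exact field layout is an assumption.
function buildImageRequest(prompt: string) {
  return {
    model: "gpt-4o",
    input: prompt,
    tools: [
      // The image model is pinned on the tool itself.
      { type: "image_generation", model: "gpt-image-2", quality: "high", size: "1536x1024" },
    ],
  };
}

// Runs `task` with an AbortSignal that fires after `ms`; always clears the timer.
async function withAbortTimeout<T>(
  ms: number,
  task: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Example wiring; calling it needs a real API key and network access.
async function generateImage(prompt: string, apiKey: string): Promise<unknown> {
  return withAbortTimeout(IMAGE_TIMEOUT_MS, async (signal) => {
    const res = await fetch("https://api.openai.com/v1/responses", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(buildImageRequest(prompt)),
      signal,
    });
    if (!res.ok) throw new Error(`image generation failed: HTTP ${res.status}`);
    return res.json();
  });
}
```

Keeping the deadline in one constant is a design choice of this sketch; the PR itself edits the literal at each call site.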

Testing

  • design/test/variants-retry-after.test.ts (exercises the changed retry/timeout path): 5 pass, 0 fail.
  • Real /design-shotgun run: 3/3 variants generated and passed the vision quality gate, comparison board served.
  • The design/test/feedback-roundtrip.test.ts failures are pre-existing and unrelated: they fail identically on clean main (session.clearLoadedHtml is not a function, a browse-module breakage in write-commands.ts). Not touched by this PR.

🤖 Generated with Claude Code

badcom commented May 18, 2026

There's a related PR open for this same issue: #1528

The design binary calls /v1/responses (gpt-4o + image_generation tool,
quality:high, 1536x1024) but aborted the request after a hardcoded 120s.
That class of request consistently takes ~140-160s end-to-end, so every
generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents,
each calling `$D generate`, each aborts at 120s and retries, all fail,
the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid
image, 143.5s. A real /design-shotgun run after the patch generated 3
variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the
161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts,
  variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The
feedback-roundtrip.test.ts failures are a pre-existing browse-module
breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@matteo-hertel matteo-hertel force-pushed the fix/design-timeout-gpt-image-2 branch from 77258cc to ccbf263 Compare May 18, 2026 15:46
@matteo-hertel
Author

> There's a related PR open for this same issue: #1528

@badcom the two PRs are fundamentally different. The fact that a PR exists doesn't mean it's the only viable solution.
The one you linked makes the timeout configurable; this one increases the timeout and pins gpt-image-2.
Ultimately I get little value from having the timeout configurable, which is why I made this one.
If the other one is merged, feel free to close this. I don't mind at all.
