
test(cloudflare): Reduce flakiness for cloudflare with sub workers #20632

Merged — JPeer264 merged 5 commits into develop from fn/unflake-cloudflare-test on May 4, 2026

Conversation

@mydea (Member) commented May 4, 2026

Fixes #20626

It seems that the Cloudflare integration tests can be flaky.
After some digging into this (with some AI help), the idea came up that this is due to sub workers not being ready when DEV_SERVER_READY is emitted. To accommodate this, in the case where a sub worker exists, this just adds a little delay (100ms, an arbitrary number) to wait before we continue, hopefully reducing flakes. The AI wanted to look at more complex things like retrying requests, but that felt a bit overkill; I think this is likely good enough if that is indeed the problem...?
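The change described above amounts to something like the following minimal sketch. The helper name `waitForSubWorkerStartup` and the constant are illustrative only, not the actual runner code; the 100ms value is the arbitrary number mentioned in the description:

```typescript
// Illustrative sketch of the delay approach described above.
const SUB_WORKER_STARTUP_DELAY_MS = 100; // arbitrary, per the PR description

function delay(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper, called after DEV_SERVER_READY is emitted:
// sub workers may not be bound yet when the main dev server reports ready.
async function waitForSubWorkerStartup(hasSubWorker: boolean): Promise<void> {
  if (hasSubWorker) {
    await delay(SUB_WORKER_STARTUP_DELAY_MS);
  }
}
```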

@mydea mydea requested a review from JPeer264 May 4, 2026 09:08
@mydea mydea self-assigned this May 4, 2026
github-actions bot (Contributor) commented May 4, 2026

size-limit report 📦

Path Size % Change Change
@sentry/browser 26.31 kB - -
@sentry/browser - with treeshaking flags 24.8 kB - -
@sentry/browser (incl. Tracing) 44.2 kB - -
@sentry/browser (incl. Tracing + Span Streaming) 46.42 kB - -
@sentry/browser (incl. Tracing, Profiling) 49.16 kB - -
@sentry/browser (incl. Tracing, Replay) 83.58 kB - -
@sentry/browser (incl. Tracing, Replay) - with treeshaking flags 73.04 kB - -
@sentry/browser (incl. Tracing, Replay with Canvas) 88.26 kB - -
@sentry/browser (incl. Tracing, Replay, Feedback) 100.87 kB - -
@sentry/browser (incl. Feedback) 43.47 kB - -
@sentry/browser (incl. sendFeedback) 31.12 kB - -
@sentry/browser (incl. FeedbackAsync) 36.21 kB - -
@sentry/browser (incl. Metrics) 27.62 kB - -
@sentry/browser (incl. Logs) 27.75 kB - -
@sentry/browser (incl. Metrics & Logs) 28.45 kB - -
@sentry/react 28.05 kB - -
@sentry/react (incl. Tracing) 46.42 kB - -
@sentry/vue 31.18 kB - -
@sentry/vue (incl. Tracing) 46.04 kB - -
@sentry/svelte 26.34 kB - -
CDN Bundle 28.91 kB - -
CDN Bundle (incl. Tracing) 46.95 kB - -
CDN Bundle (incl. Logs, Metrics) 30.34 kB - -
CDN Bundle (incl. Tracing, Logs, Metrics) 48.06 kB - -
CDN Bundle (incl. Replay, Logs, Metrics) 69.41 kB - -
CDN Bundle (incl. Tracing, Replay) 84.11 kB - -
CDN Bundle (incl. Tracing, Replay, Logs, Metrics) 85.16 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback) 89.91 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) 91.01 kB - -
CDN Bundle - uncompressed 84.72 kB - -
CDN Bundle (incl. Tracing) - uncompressed 140.31 kB - -
CDN Bundle (incl. Logs, Metrics) - uncompressed 88.92 kB - -
CDN Bundle (incl. Tracing, Logs, Metrics) - uncompressed 143.77 kB - -
CDN Bundle (incl. Replay, Logs, Metrics) - uncompressed 212.86 kB - -
CDN Bundle (incl. Tracing, Replay) - uncompressed 258.11 kB - -
CDN Bundle (incl. Tracing, Replay, Logs, Metrics) - uncompressed 261.56 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback) - uncompressed 271.81 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) - uncompressed 275.25 kB - -
@sentry/nextjs (client) 48.92 kB - -
@sentry/sveltekit (client) 44.67 kB - -
@sentry/node-core 59.13 kB +0.02% +9 B 🔺
@sentry/node 170.42 kB +0.01% +10 B 🔺
@sentry/node - without tracing 97 kB +0.02% +10 B 🔺
@sentry/aws-serverless 113.85 kB +0.03% +33 B 🔺
@sentry/cloudflare (withSentry) - minified 165.2 kB - -
@sentry/cloudflare (withSentry) 417.71 kB - -

View base workflow run

@JPeer264 (Member) commented May 4, 2026

Quite interesting - a delay could mitigate it, but it still won't guarantee a successful start.

Maybe it would be better to have another deferredPromise inside the sub worker start and the main worker waits for that port 🤔
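The deferred-promise idea could look roughly like this. This is a hypothetical sketch, not the repo's actual code; `createDeferred` and the wiring names are made up for illustration:

```typescript
// Hypothetical sketch: the sub worker's startup resolves a deferred
// promise, and the main worker awaits it (e.g. for the sub worker's port)
// before it is spawned.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason?: unknown) => void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  let reject!: (reason?: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Usage idea (names hypothetical):
//   const subWorkerReady = createDeferred<number>();
//   ...sub worker startup calls subWorkerReady.resolve(port)...
//   const subWorkerPort = await subWorkerReady.promise; // main worker waits here
```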

@JPeer264 (Member) commented May 4, 2026

@mydea I initially changed the logic to match the main worker's functionality, which was technically almost the same as before. I then checked how the workers-sdk does this: they basically just watch the CLI output for the "Ready on ..." message.

I added it, should work now a little better 🤞

@JPeer264 JPeer264 marked this pull request as ready for review May 4, 2026 11:36
@JPeer264 JPeer264 requested a review from a team as a code owner May 4, 2026 11:36
Comment on lines +227 to +237
stdout.on('data', (chunk: Buffer) => {
const text = chunk.toString();
if (process.env.DEBUG) process.stdout.write(text);
output += text;

const match = output.match(/Ready on (https?:\/\/[^\s]+)/);
if (match?.[1]) {
resolve(parseInt(new URL(match[1]).port, 10));
}
});
});

Bug: The waitForReady function can hang indefinitely because it lacks an exit event handler for the child process. If the process exits before it's ready, tests will time out.
Severity: MEDIUM

Suggested Fix

Add an on('exit', ...) event handler to the child process within the waitForReady function. This handler should reject the promise if the process exits before the 'Ready on' message is received, ensuring that tests fail fast with a clear error instead of timing out. A similar implementation exists for the childSubWorker which can be used as a reference.
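A rough sketch of that suggested fix, assuming the `waitForReady` shape shown in the diff above (the surrounding wiring is hypothetical, not the repo's actual implementation):

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// Hypothetical sketch: resolve on the "Ready on <url>" line, but also
// reject if the process errors or exits before it is ready, so tests
// fail fast instead of hanging until the test timeout.
function waitForReady(child: ChildProcess): Promise<number> {
  return new Promise((resolve, reject) => {
    let output = '';

    child.stdout?.on('data', (chunk: Buffer) => {
      const text = chunk.toString();
      if (process.env.DEBUG) process.stdout.write(text);
      output += text;

      const match = output.match(/Ready on (https?:\/\/[^\s]+)/);
      if (match?.[1]) {
        resolve(parseInt(new URL(match[1]).port, 10));
      }
    });

    // Fail fast on premature exit (e.g. config error, port conflict).
    child.on('exit', code => {
      reject(new Error(`process exited with code ${code} before it was ready`));
    });
    child.on('error', reject);
  });
}
```

A reject after the promise has already resolved is a no-op, so the exit handler is safe even when the worker later shuts down cleanly.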

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: dev-packages/cloudflare-integration-tests/runner.ts#L218-L237

Potential issue: The `waitForReady` function lacks a handler for the child process's
'exit' event. If the `wrangler` process exits prematurely (e.g., due to a configuration
error or port conflict) before emitting its "Ready on" message, the promise returned by
`waitForReady` will never resolve or reject. This causes any test calls to `makeRequest`
to also hang, leading to a test timeout instead of a fast failure with a clear error
message. The existing 'error' event handler is insufficient to catch all premature exit
scenarios.


@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 952baff. Configure here.

childSubWorker?.on('error', onChildError);
child.on('error', onChildError);
child.on('message', (msg: string) => onChildMessage(msg, setWorkerPort));
waitForReady(child).then(setWorkerPort).catch(reject);
Duplicate error handler registered on sub-worker process

Low Severity

The onChildError handler is registered on childSubWorker's 'error' event twice — once at line 259 inside the if (existsSync(...)) block, and again at line 293 outside it via optional chaining. When a sub-worker exists and emits an error, onChildError fires twice, logging the error and calling reject redundantly.



Member
Whoops, forgot to change that part of the code

resolve(parseInt(new URL(match[1]).port, 10));
}
});
});

waitForReady hangs forever if process exits early

Medium Severity

waitForReady only resolves when "Ready on" appears in stdout but never rejects if the process exits or errors before that. At line 264, await waitForReady(childSubWorker) blocks indefinitely if the sub-worker crashes during startup. The outer onChildError handler rejects the isComplete promise, but waitForReady's own promise (which shadows reject in its constructor) never settles, so the async function remains stuck and the main worker is never spawned. The old code properly rejected the awaited startup promise on process exit/error. This is a regression that can cause test hangs instead of fast, clear failures.



Member

It would only "hang" for as long as the test timeout. Also if the subworker process would exit, then it is flaky anyways.

@mydea (Member, Author) commented May 4, 2026

Sounds good to me!

