
fix(ci): serialize E2E runs via repo-wide concurrency mutex #89

Merged
TortoiseWolfe merged 2 commits into main from fix/e2e-concurrency-mutex on May 13, 2026

Conversation

@TortoiseWolfe
Owner

Summary

The E2E suite runs against the project's single Supabase Cloud instance with shared test users, shared conversation rows, and beforeAll cleanupOldMessages hooks in 11+ messaging spec files. Two concurrent CI runs against the same backend race each other and dogpile shared state, blowing the 60-min GitHub Actions job timeout.
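
For context, the spec-level cleanup looks roughly like the sketch below. It is a minimal illustration, not a copy of the real helper: the cleanupOldMessages name and the shared-backend setup come from the spec files, while the table name, cutoff, and environment variable names are assumptions.

    // Hypothetical shape of the per-spec cleanup each messaging spec runs in
    // beforeAll. Two CI runs executing this against the same Supabase project
    // treat each other's in-flight rows as "old" data and delete them.
    import { test } from '@playwright/test';
    import { createClient } from '@supabase/supabase-js';

    const supabase = createClient(
      process.env.SUPABASE_URL!, // shared Cloud project (env var names assumed)
      process.env.SUPABASE_SERVICE_ROLE_KEY!
    );

    async function cleanupOldMessages(): Promise<void> {
      // Delete messages older than a cutoff so the spec starts from a known
      // state. Under a concurrent run, "older than the cutoff" can include
      // the other run's live conversation data.
      const cutoff = new Date(Date.now() - 5 * 60 * 1000).toISOString();
      await supabase.from('messages').delete().lt('created_at', cutoff); // table name assumed
    }

    test.beforeAll(async () => {
      await cleanupOldMessages();
    });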

The core fix is a single edit to one workflow file. A second commit in this PR also adds explicit scroll-event dispatches in four messaging specs to fix a WebKit-only flake the mutex exposed (see the second commit message below).

Root cause (verified on 2026-05-13)

Disproven hypotheses

  • Retry-heavy specific test (e.g. complete-user-workflow.spec.ts). Evidence against: that test passed first-attempt at 00:49:26Z; the cancel hit much later.
  • Supabase rate limiting. Evidence against: auth.audit_log_entries empty for the entire window (audit disabled, not 429s); the health endpoint reports ACTIVE_HEALTHY for all services.
  • Cumulative latency from one run's request volume. Evidence against: two runs were in flight, not one.

Diff

 concurrency:
-  group: e2e-${{ github.ref }}
-  cancel-in-progress: true
+  group: e2e-supabase-${{ github.repository }}
+  cancel-in-progress: false

Why both fields had to change:

  • Just flipping cancel-in-progress wouldn't fix anything — the race is cross-branch (main vs PR), not within one branch.
  • Just changing the group key without cancel-in-progress: false would let pushes mid-run abort each other and leave Supabase in a half-cleaned state.
  • ${{ github.repository }} keeps forks isolated from upstream — each fork gets its own mutex, preserving the template story.

Trade-off

PRs land slightly slower when 2+ are pushed close together (queue wait up to ~45 min worst case). On this repo's PR volume (~5/day max), the queue is rarely hit. Slower-by-design beats 60-min cancellation + re-run + human debug time.

What this PR does NOT do

  • Does NOT bump timeout-minutes: 60 — masking the budget is exactly what 9 prior rounds of flake mitigation did.
  • Does NOT shard messaging tests N-way — current 1/1 shard works fine when not contended (28 min on chromium-msg when alone).
  • Does NOT drop retries — retries weren't the problem.
  • Does NOT replace the hand-rolled retry loops in complete-user-workflow.spec.ts or encrypted-messaging.spec.ts — those loops were doing the right thing; they polled while a different CI run was deleting their data. With the mutex in place, they'll succeed on attempt 1.
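
A hedged sketch of the kind of polling loop described in that last bullet (the helper name, selector, and timings are illustrative; the real loops live at the spec locations cited in the commit message below):

    // Keep re-checking until the expected conversation is visible or the
    // budget runs out. With a competing CI run deleting rows mid-poll this
    // loops until timeout; with the mutex in place it should pass on the
    // first attempt.
    import type { Page } from '@playwright/test';

    async function waitForConversation(
      page: Page,
      conversationId: string,
      { attempts = 10, delayMs = 3000 } = {}
    ): Promise<void> {
      for (let i = 0; i < attempts; i++) {
        const visible = await page
          .locator(`[data-conversation-id="${conversationId}"]`) // illustrative selector
          .isVisible()
          .catch(() => false);
        if (visible) return;
        await page.waitForTimeout(delayMs);
      }
      throw new Error(
        `Conversation ${conversationId} never appeared after ${attempts} attempts`
      );
    }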

Test plan

  • Pre-push hooks (lint, type-check, build) all green
  • Watch the post-merge E2E run on main — should complete without cancellation in ~28-40 min
  • Negative test (later): push 2 commits to different branches within 1 min — confirm Actions UI shows "1 queued / 1 running" instead of 2 in parallel

🤖 Generated with Claude Code

TurtleWolfe and others added 2 commits May 13, 2026 02:33
The E2E suite runs against the project's single Supabase Cloud instance
(huvitqubafsrazpjxsax) with shared test users, shared conversation rows,
and beforeAll cleanupOldMessages hooks in 11+ messaging spec files.
Concurrent runs against the same backend race each other and dogpile
shared state, causing the 60-min GitHub Actions job timeout cap to bite.

Verified on 2026-05-13:

- Run 25769955636 (main, post-#86 merge, started 00:14:46Z) had its
  `E2E (firefox-msg 1/1)` job cancelled at exactly 60 min during step
  "Run E2E tests". Last operation in the captured log: a Realtime
  subscription waitForSelector on `body[data-messages-subscribed]`
  (sketched after this list).
- Run 25770068787 (PR #88 docs cleanup, started 00:18:07Z — only 3
  minutes later) ran on the same Supabase project. Direct DB queries
  confirm conversation `4105492f-1047-40ef-bd6e-411788850547` (the one
  the cancelled main test was using at "Step 8: Conversation ID:") was
  created by PR #88's run at 00:21:55Z, not by main's run.
- The cancelled run's tests detected "Connection already exists /
  Conversation already exists" and reused PR #88's data, then watched
  PR #88's beforeAll hooks wipe it on each of 11 spec boundaries.
- Progressive per-spec slowdown across the cancel window (35s, 35s,
  2m45s, 1m, 6m48s, 5m6s, 8m36s, 12m16s, 10m11s, 10m46s, cancelled at
  10m46s) is the cumulative effect of cross-run data races, not
  Supabase rate limiting (verified zero rate-limit signals in the
  Supabase health endpoint, zero auth.audit_log_entries during the
  window — Supabase internal audit is disabled on this project).
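
The subscription gate the cancelled firefox-msg job was stuck on (first bullet above) is, in essence, a wait like the one below (a minimal sketch; the selector is the one quoted from the log, while the route and timeout are assumptions):

    import { test } from '@playwright/test';

    test('proceeds once the Realtime subscription is live', async ({ page }) => {
      await page.goto('/messages'); // illustrative route
      // The spec only continues once the app marks its Realtime subscription
      // as established. When another run's cleanup keeps resetting shared
      // data, the attribute never appears and this wait burns the rest of
      // the 60-min job budget.
      await page.waitForSelector('body[data-messages-subscribed]', {
        state: 'attached',
        timeout: 60_000, // illustrative; the real timeout lives in the spec
      });
    });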

Three earlier diagnostic theories were wrong because they all assumed
single-actor causation:

  1. Retry-heavy specific test (complete-user-workflow.spec.ts retry
     loop) — disproved: that test passed first-attempt at 00:49:26Z.
  2. Rate limiting — disproved: ACTIVE_HEALTHY across all Supabase
     services, no 429s.
  3. Cumulative latency from one run's request volume — disproved:
     two runs were in flight, not one.

The fix is structural: GitHub Actions native `concurrency:` group with
a repo-wide key. All E2E runs across all refs (main, PRs, schedule)
share one mutex; concurrent runs queue rather than race.

Changes to .github/workflows/e2e.yml:

  Before:
    concurrency:
      group: e2e-${{ github.ref }}            # per-branch
      cancel-in-progress: true                # kill running on new push
  After:
    concurrency:
      group: e2e-supabase-${{ github.repository }}  # repo-wide
      cancel-in-progress: false               # queue, don't cancel

Why both fields had to change:

- Just flipping cancel-in-progress wouldn't fix anything because the
  race is cross-branch (main vs PR), not within one branch.
- Just changing the group key without cancel-in-progress: false would
  let pushes mid-run abort each other and leave Supabase in a
  half-cleaned state.
- ${{ github.repository }} keeps forks isolated from upstream — each
  fork gets its own mutex, preserving the template story.

Trade-off accepted: PRs land slightly slower when 2+ are pushed close
together (queue wait up to ~45 min worst case). On this repo's PR
volume (~5/day max), the queue is rarely hit. Slower-by-design beats
60-min cancellation + re-run + human debug time.

What this fix does NOT do (and why each is correct):

- Does NOT bump `timeout-minutes: 60` — masking the budget is exactly
  what 9 prior rounds of flake mitigation did and got us here.
- Does NOT shard messaging tests N-way — current 1/1 shard works fine
  when not contended (28 min on chromium-msg when alone).
- Does NOT drop retries — the retries weren't the problem.
- Does NOT replace the hand-rolled retry loops in
  complete-user-workflow.spec.ts:437-462 or
  encrypted-messaging.spec.ts:311,580 — those loops were doing the
  right thing; they polled while a different CI run was deleting
  their data. With the mutex in place, they'll succeed on attempt 1.

Lessons captured in memory:

- ~/.claude/projects/-home-TurtleWolfe-repos-ScriptHammer/memory/lesson_concurrent_ci_shared_backend.md
- ~/.claude/projects/-home-TurtleWolfe-repos-ScriptHammer/memory/feedback_branch_hygiene.md

This closes round 10 of the E2E flake pattern (STATUS.md:127-129) by
addressing the actual underlying invariant: one writer at a time
against the shared Supabase backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ging tests

The 4 messaging E2E specs that set `el.scrollTop = N` programmatically were
relying on the browser to auto-fire a `scroll` event so the React component's
handleScroll listener could update state. Chromium and Firefox do this
reliably; WebKit does not. Result: tests that depend on the listener firing
(jump-to-bottom button visibility, auto-scroll behavior) pass on Chromium /
Firefox and fail intermittently on WebKit.

This was surfaced by run 25774750382 (PR #89) where `webkit-msg` failed on
`messaging-scroll.spec.ts:261` "T007-T008: Jump button appears when scrolled
and does not overlap input." The test set scrollTop, the React listener at
src/components/molecular/MessageThread/MessageThread.tsx:194-212 never ran,
showScrollButton stayed false, and the assert on
`[data-testid="jump-to-bottom"]` timed out. All 3 retries failed identically.
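
For orientation, that listener is shaped roughly like the sketch below. It paraphrases the behavior described above rather than copying MessageThread.tsx; the hook wrapper and threshold are assumptions, and only showScrollButton is taken from the component.

    // Paraphrased shape of the MessageThread scroll handler: it only reads
    // the element's current scroll position and derives state from it, so it
    // never runs unless a 'scroll' event actually reaches the element.
    import { useCallback, useState } from 'react';
    import type { UIEvent } from 'react';

    function useJumpToBottomVisibility() {
      const [showScrollButton, setShowScrollButton] = useState(false);

      const handleScroll = useCallback((event: UIEvent<HTMLDivElement>) => {
        const el = event.currentTarget;
        const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
        // Idempotent: the same scroll position always yields the same state,
        // so an extra dispatched 'scroll' event in Chromium/Firefox is harmless.
        setShowScrollButton(distanceFromBottom > 200); // threshold is an assumption
      }, []);

      return { showScrollButton, handleScroll };
    }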

The same pattern existed at three other test sites, where the bug was
hidden behind either a flaky retry (PR #88's webkit-msg passed with a
"1 flaky" tag on a different test) or an assert that did not depend on
the listener firing immediately. All 4 are fixed here to prevent a
future round-N rediscovery:

  tests/e2e/messaging/messaging-scroll.spec.ts:237  (test setup scroll-to-top)
  tests/e2e/messaging/messaging-scroll.spec.ts:276  (THE failing test T007-T008)
  tests/e2e/messaging/messaging-scroll.spec.ts:322  (test setup scroll-to-top)
  tests/e2e/messaging/encrypted-messaging.spec.ts:665 (test setup scroll-to-top)

The change is additive — same scrollTop assignment, plus an explicit
`el.dispatchEvent(new Event('scroll', { bubbles: true }))`. handleScroll is
idempotent (just reads current scroll position and sets state), so the
event firing twice in browsers that auto-fire is harmless.
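
Concretely, each of the four sites now follows a pattern like this (a sketch of the additive change with an illustrative container selector; the real call sites are the four listed above):

    import { test, expect } from '@playwright/test';

    test('jump button appears after scrolling away from the bottom', async ({ page }) => {
      await page.goto('/messages'); // illustrative route
      await page.evaluate(() => {
        const el = document.querySelector('[data-testid="message-thread"]'); // illustrative selector
        if (!el) throw new Error('message thread container not found');
        el.scrollTop = 0; // the same programmatic scroll as before
        // The additive fix: WebKit does not reliably auto-fire 'scroll' here,
        // so dispatch it explicitly; the idempotent handler makes the extra
        // event harmless in Chromium and Firefox.
        el.dispatchEvent(new Event('scroll', { bubbles: true }));
      });
      await expect(page.locator('[data-testid="jump-to-bottom"]')).toBeVisible();
    });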

Rationale for bundling with the concurrency mutex fix in this PR:

- Both changes are E2E reliability fixes
- The webkit failure surfaced *because* the concurrency mutex removed the
  60-min cancellation bottleneck, letting webkit-msg run to completion
  and reveal pre-existing flake that the prior cancel had been masking
- Splitting into two PRs would require coordinating their merge order
  and re-running CI twice; bundled is one merge cycle

Verification:
- All 4 sites now dispatch the scroll event
- Pre-push hooks (lint, type-check, build) green
- CI re-run will confirm webkit-msg passes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TortoiseWolfe TortoiseWolfe merged commit 996211e into main May 13, 2026
28 checks passed
@TortoiseWolfe TortoiseWolfe deleted the fix/e2e-concurrency-mutex branch May 13, 2026 13:10
TortoiseWolfe added a commit that referenced this pull request May 13, 2026
* docs(status): #31 close + round 10 E2E flake mitigation resolution

Four updates to STATUS.md:

1. Snapshot date 2026-05-06 -> 2026-05-13; subtitle now mentions round 10
   closure.

2. Feature 019 (Google Analytics) reclassified from "Mostly Shipped (config
   gap)" `[~]` to "Shipped" `[x]`. The remaining gap was per-fork GA4 property
   ID provisioning, which is fork-config not code work. Issue #31 closed.

3. The "E2E flake mitigation" hotspot row, Status column: "Ongoing" -> "Resolved".
   Evidence column rewritten to explain rounds 1-9 attacked symptoms (stale
   closures, unstable hook refs, hydration timing, retry budgets, stagger
   delays, Supabase priming) while round 10 found the underlying cause:
   two concurrent E2E CI runs racing against one shared Supabase project.
   Fix landed in #89 / commit 996211e — repo-wide `concurrency:` mutex in
   .github/workflows/e2e.yml + dispatchEvent('scroll') after each
   programmatic el.scrollTop = N to work around WebKit's quirk of not
   auto-firing scroll events.

4. Summary-by-Status counts: Shipped 22 -> 23 (019 added), Mostly Shipped
   4 -> 3 (019 removed).

Verification: confirmed both PR #89 post-merge run (25801256089) and PR #88
post-merge run (25801281715) on main reached completed/success after merging,
with the mutex visibly holding #88 in `pending` for ~80 min while #89 ran
exclusively. The previous "9 rounds and counting" cycle pattern is broken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(status): update 007 inline reference to match resolved hotspot

The hotspot table (lines 121-127) was already updated to "Resolved" in the
previous commit dfd191f, but the inline reference under feature 007 still
said "9 rounds of flake mitigation" without the resolution status. Aligning
both references so a future reader doesn't see contradictory framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: TurtleWolfe <TurtleWolfe@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TortoiseWolfe added a commit that referenced this pull request May 14, 2026
#92)

Two small hygiene changes that lock in this session's learnings.

1. .gitignore — add `.screenshots/`. The existing `screenshots/` entry
   on line 120 didn't match the dot-prefixed variant, so .screenshots/
   has been showing up in every `git status` run for weeks. One-line
   fix.

2. CLAUDE.md — new section "CI & E2E Stability (Round 10 Lessons,
   2026-05-13)" between "Supabase Database Migrations" and "Important
   Notes", capturing:

   - NEVER merge a PR while another PR's CI is running against the
     same backend. The concurrency mutex in e2e.yml protects this
     now, but the rule still applies for any other shared backend
     introduced later.

   - NEVER bypass commit hooks. Husky + lint-staged + gitleaks catch
     real bugs; --no-verify is explicitly forbidden without user
     opt-in.

   - Programmatic `el.scrollTop = N` does NOT fire scroll events in
     WebKit reliably. The fix pattern (dispatch the scroll event
     explicitly) is now applied at 4 sites in tests/e2e/messaging/,
     and any future test code touching scrollTop should follow it.

   - Branch hygiene — delete_branch_on_merge=true, fetch --prune
     after merge, no floating unmerged branches, avoid stacked PRs
     (PR #87 auto-close footgun documented).

   - CI logs API ≠ UI. The Actions list page is misleading for
     in-progress matrix runs; use `gh run view --json` for
     authoritative state.

   Plus a small fix on the existing "Important Notes" bullet — the
   line claiming "E2E tests are local-only, not in CI pipeline"
   was stale. E2E now runs on every push and PR across 28 shards
   (chromium + firefox + webkit). Updated to reflect reality and
   point at the new CI section.

Verification:
- `git check-ignore .screenshots/` confirms it's now matched by
  the new line 121 entry
- `git status` no longer shows .screenshots/ as untracked
- Pre-commit hooks ran clean (prettier + gitleaks)

Refs: PR #89 (concurrency mutex), PR #87 (stacked-PR autoclose footgun),
      memory entries feedback_branch_hygiene.md and
      lesson_concurrent_ci_shared_backend.md

Co-authored-by: TurtleWolfe <TurtleWolfe@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TortoiseWolfe added a commit that referenced this pull request May 14, 2026
…ry (#93)

The PRP-STATUS.md dashboard was last fully refreshed 2026-04-25. Six PRs
have landed since then (2026-05-12 to 2026-05-14) closing the long-running
E2E flake pattern, the #31 GA4 ticket, and improving fork onboarding.

Three targeted updates:

1. Header — bump "Last Updated" to 2026-05-14, "Previous Update" to
   2026-04-25. Shipped count 17 -> 18 (019 GA moved from Mostly Shipped
   to Shipped after #31 close on 2026-05-13). Updated "Current Phase"
   line to reflect round 10 closure.

2. New "v0.4.x updates since 2026-04-25 audit" section between the
   header and the full feature table — one-paragraph summary of each
   merged PR (#86, #88, #89, #90, #91, #92) plus the issue closures
   (#31 GA4, #85 OAuth) with link to the closure comment for #85's
   outstanding dashboard work.

3. Stability hotspots note — added a callout indicating the E2E flake
   row in the hotspot table is resolved at round 10. Rounds 1-9
   attacked symptoms; round 10 found the underlying cause (concurrent
   CI runs racing against a shared Supabase project) and fixed it
   structurally via the concurrency mutex. Other 9 hotspots remain
   open.

Per-feature audit data in the lower sections is left untouched — the
2026-04-25 sweep is still the canonical detail. This refresh is purely
the top-of-document changes needed to reflect 19 days of activity.

Verification:
- grep "Last Updated" docs/prp-docs/PRP-STATUS.md -> "2026-05-14"
- Pre-commit hooks pass (prettier + gitleaks)

Co-authored-by: TurtleWolfe <TurtleWolfe@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>