feat(playwright): add flaky detection via multi-process orchestration by kozlek · Pull Request #85 · Mergifyio/mergify-ci-plugins-ts

kozlek · 2026-04-30T12:57:13Z

Phase 1 runs Playwright normally and records candidate-test outcomes via
the existing reporter pipeline. The quarantine fixture is extended to
also absorb tests listed in the API's unhealthy_test_names so phase-1
failures of known-flaky tests don't break the build. After phase 1, the
reporter spawns a single Playwright subprocess targeting all candidates
(--grep '<regex>' --repeat-each=N) and writes per-attempt outcomes to
a JSONL file. The main reporter aggregates phase-1 + phase-2 outcomes,
emits cicd.test.flaky_detection / .new / .flaky / .rerun_count
attributes on the consolidated test-case spans, and prints a summary.

Each phase-2 rerun is a fresh Playwright invocation, so all fixtures
(including user-defined test.extend(...) ones) re-initialize between
attempts — matching Playwright's normal per-test isolation guarantees.
This avoids the in-test body-wrapping approach we initially attempted,
which broke testInfo.file resolution because the body lived in a
wrapper function instead of the user's spec file.

Activation is gated by _MERGIFY_TEST_NEW_FLAKY_DETECTION=true. Mode is
inferred from vcs.ref.base.name (PR-like → new, push → unhealthy).

Fixes MRGFY-7017

mergify · 2026-04-30T12:57:38Z

kozlek · 2026-04-30T12:59:41Z

Revision history

#	Type	Changes	Date
1	initial	`af4cad1`	2026-04-30 12:59 UTC
2	content	`af4cad1 → df915d5`	2026-04-30 12:59 UTC
3	content	`df915d5 → ecbb059`	2026-04-30 13:02 UTC
4	content	`ecbb059 → ec7a552`	2026-04-30 13:11 UTC
5	content	`ec7a552 → 70a39b9`	2026-05-12 14:28 UTC
6	content	`70a39b9 → 38b39bc` (raw)	2026-05-12 16:41 UTC

Copilot

Pull request overview

Adds a multi-process Playwright flaky-detection “phase 2” rerun orchestration path (gated by _MERGIFY_TEST_NEW_FLAKY_DETECTION) that enriches the shared state file, reruns candidate tests in a subprocess, aggregates outcomes, and annotates test-case spans + prints a summary.

Changes:

Rename and generalize the canonical test key builder to buildTestKey (used for both quarantine and flaky detection).
Extend state-file schema + validation to carry flaky-detection context/mode/candidates/deadlines, and enrich candidates on reporter.onBegin.
Implement phase-2 rerun subprocess + JSONL outcome collection/aggregation, plus add unit + integration test coverage and README docs.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
packages/playwright/src/reporter.ts	Implements rerun-mode JSONL writing, state enrichment, phase-2 subprocess reruns, aggregation, span enrichment, and summary printing.
packages/playwright/src/state-file.ts	Expands state schema to include flaky-detection fields and validates/strips malformed optional fields.
packages/playwright/src/global-setup.ts	Fetches flaky-detection context (feature-flagged) and persists `flakyContext` + `flakyMode` into the state file.
packages/playwright/src/fixture.ts	Extends quarantine absorption to also include `unhealthy_test_names` in unhealthy mode.
packages/playwright/src/utils.ts	Renames `buildQuarantineKey` → `buildTestKey` and updates docs accordingly.
packages/playwright/README.md	Documents flaky-detection preview behavior, modes, attributes, and env vars.
packages/core/src/flaky-detection.ts	Exposes `mode`, `candidates`, and `perTestDeadlineMs` as readonly public fields; refactors candidate/budget computation into a static helper.
packages/core/tests/flaky-detection.test.ts	Adds test coverage for the new public readonly fields on `FlakyDetector`.
packages/playwright/tests/reporter.test.ts	Adds unit tests for state enrichment, annotation parsing, and summary output.
packages/playwright/tests/state-file.test.ts	Adds round-trip/validation tests for new flaky-detection state-file fields.
packages/playwright/tests/utils.test.ts	Updates tests to use `buildTestKey`.
packages/playwright/tests/global-setup.test.ts	Adds tests for feature-flagged flaky context fetching and mode selection.
packages/playwright/tests/integration/run.test.ts	Adds end-to-end integration coverage for unhealthy/new flaky-detection modes.
packages/playwright/tests/fixtures/playwright.config.ts	Makes fixture testDir configurable via `PW_FIXTURE_DIR` for integration tests.
packages/playwright/tests/fixtures/tests-unhealthy/sample.spec.ts	Adds a deterministic flaky fixture spec used by integration tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

kozlek · 2026-04-30T13:11:54Z

+    // Build a grep regex from candidate test titles. We escape regex
+    // metacharacters and join with `|`. Anchoring isn't necessary because
+    // Playwright matches against the full test path; substring match is
+    // sufficient and robust to project-name prefixes.
+    const titles = [...this.phase1Outcomes.keys()]
+      .map((k) => k.split(' > ').pop() ?? k)
+      .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
+    const grepPattern = `(${titles.join('|')})`;


Partial pushback. Playwright --grep matches the joined title path (describes + title), not the file path, so adding the filepath to the regex would not narrow the match. The aggregation step filters JSONL entries by full-key membership, so non-candidate reruns produce JSONL lines that are dropped — wasted CI time, not incorrect results. Tighter scoping requires switching from --grep to a list-then-filter approach, which is V2 scope. Documented the limitation in the code comment for now.

Phase 1 runs Playwright normally and records candidate-test outcomes via the existing reporter pipeline. The quarantine fixture is extended to also absorb tests listed in the API's `unhealthy_test_names` so phase-1 failures of known-flaky tests don't break the build. After phase 1, the reporter spawns a single Playwright subprocess targeting all candidates (`--grep '<regex>' --repeat-each=N`) and writes per-attempt outcomes to a JSONL file. The main reporter aggregates phase-1 + phase-2 outcomes, emits `cicd.test.flaky_detection` / `.new` / `.flaky` / `.rerun_count` attributes on the consolidated test-case spans, and prints a summary. Each phase-2 rerun is a fresh Playwright invocation, so all fixtures (including user-defined `test.extend(...)` ones) re-initialize between attempts — matching Playwright's normal per-test isolation guarantees. This avoids the in-test body-wrapping approach we initially attempted, which broke `testInfo.file` resolution because the body lived in a wrapper function instead of the user's spec file. Activation is gated by `_MERGIFY_TEST_NEW_FLAKY_DETECTION=true`. Mode is inferred from `vcs.ref.base.name` (PR-like → `new`, push → `unhealthy`). Fixes MRGFY-7017 Change-Id: I8f7ae58a8ea95ff5ddabddd99e769324fcb4079d

Copilot AI review requested due to automatic review settings April 30, 2026 12:57

mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 12:57 Failure

Copilot started reviewing on behalf of kozlek April 30, 2026 12:57 View session

kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from af4cad1 to df915d5 Compare April 30, 2026 12:59

mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 12:59 Failure

kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from df915d5 to ecbb059 Compare April 30, 2026 13:02

mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 13:03 Failure

mergify Bot requested a review from a team April 30, 2026 13:04

Copilot AI reviewed Apr 30, 2026

View reviewed changes

mergify Bot added the review threads unresolved label Apr 30, 2026

kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from ecbb059 to ec7a552 Compare April 30, 2026 13:11

mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 13:11 Failure

kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from ec7a552 to 70a39b9 Compare May 12, 2026 14:28

mergify Bot had a problem deploying to Mergify Merge Protections May 12, 2026 14:28 Failure

kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from 70a39b9 to 38b39bc Compare May 12, 2026 16:41

mergify Bot had a problem deploying to Mergify Merge Protections May 12, 2026 16:42 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(playwright): add flaky detection via multi-process orchestration#85

feat(playwright): add flaky detection via multi-process orchestration#85
kozlek wants to merge 1 commit into
mainfrom
devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a

kozlek commented Apr 30, 2026

Uh oh!

mergify Bot commented Apr 30, 2026 •

edited

Loading

Uh oh!

kozlek commented Apr 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

kozlek Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

kozlek commented Apr 30, 2026

Uh oh!

mergify Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Protections

🔴 Approval

🔴 🔎 Reviews

🟢 Continuous Integration

🟢 Enforce conventional commit

🟢 📕 PR description

Uh oh!

kozlek commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Revision history

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

kozlek Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

mergify Bot commented Apr 30, 2026 •

edited

Loading

kozlek commented Apr 30, 2026 •

edited

Loading