
feat(playwright): add flaky detection via multi-process orchestration#85

Draft
kozlek wants to merge 1 commit into main from devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a

Conversation

Collaborator

@kozlek kozlek commented Apr 30, 2026

Phase 1 runs Playwright normally and records candidate-test outcomes via
the existing reporter pipeline. The quarantine fixture is extended to
also absorb tests listed in the API's `unhealthy_test_names` so phase-1
failures of known-flaky tests don't break the build. After phase 1, the
reporter spawns a single Playwright subprocess targeting all candidates
(`--grep '<regex>' --repeat-each=N`) and writes per-attempt outcomes to
a JSONL file. The main reporter aggregates phase-1 + phase-2 outcomes,
emits `cicd.test.flaky_detection` / `.new` / `.flaky` / `.rerun_count`
attributes on the consolidated test-case spans, and prints a summary.
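
To make the aggregation step concrete, here is a minimal sketch of how phase-1 and phase-2 outcomes could be combined. The JSONL record shape (`key`, `outcome`), the `Verdict` type, and the rule "flaky means both a pass and a failure were observed" are assumptions for illustration; the PR does not show its actual schema or flakiness rule.

```typescript
// Hypothetical record shape: one JSON object per line, one line per attempt.
type Outcome = "passed" | "failed";

interface AttemptRecord {
  key: string; // canonical test key
  outcome: Outcome;
}

interface Verdict {
  rerunCount: number; // number of phase-2 attempts seen for this test
  flaky: boolean;
}

// Parse a JSONL document, skipping blank lines.
function parseJsonl(text: string): AttemptRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AttemptRecord);
}

// Combine the phase-1 outcome of each candidate with its phase-2 attempts.
function aggregate(
  phase1: Map<string, Outcome>,
  phase2: AttemptRecord[],
): Map<string, Verdict> {
  const verdicts = new Map<string, Verdict>();
  for (const [key, first] of phase1) {
    const reruns = phase2.filter((r) => r.key === key).map((r) => r.outcome);
    const all: Outcome[] = [first, ...reruns];
    // Assumed rule: a test is flaky when it both passed and failed.
    const flaky = all.includes("passed") && all.includes("failed");
    verdicts.set(key, { rerunCount: reruns.length, flaky });
  }
  return verdicts;
}
```

Under these assumptions, a candidate that failed in phase 1 and passed at least once in phase 2 would be reported as flaky, and `rerunCount` would feed the `.rerun_count` attribute.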

Each phase-2 rerun is a fresh Playwright invocation, so all fixtures
(including user-defined `test.extend(...)` ones) re-initialize between
attempts, matching Playwright's normal per-test isolation guarantees.
This avoids the in-test body-wrapping approach we initially attempted,
which broke `testInfo.file` resolution because the body lived in a
wrapper function instead of the user's spec file.
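
As an illustration of the phase-2 invocation, the sketch below builds the argument list for a subprocess using the `--grep` and `--repeat-each` flags named in the description. The `buildRerunCommand` helper and the choice of `npx` as launcher are assumptions, not the PR's actual code.

```typescript
interface RerunCommand {
  command: string;
  args: string[];
}

// Build the command for the phase-2 rerun subprocess.
// Passing arguments as an array (e.g. to child_process.spawn) avoids
// shell quoting problems with regex metacharacters in the grep pattern.
function buildRerunCommand(grepPattern: string, repeatEach: number): RerunCommand {
  if (!Number.isInteger(repeatEach) || repeatEach < 1) {
    throw new RangeError("repeatEach must be a positive integer");
  }
  return {
    command: "npx", // assumed launcher
    args: ["playwright", "test", "--grep", grepPattern, `--repeat-each=${repeatEach}`],
  };
}
```

The resulting object could be handed to Node's `spawn(command, args)`; because the rerun happens in a separate process, the user's spec files are loaded unmodified and `testInfo.file` keeps pointing at them.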

Activation is gated by `_MERGIFY_TEST_NEW_FLAKY_DETECTION=true`. Mode is
inferred from `vcs.ref.base.name` (PR-like → `new`, push → `unhealthy`).
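
The gating and mode inference can be sketched as a small pure function. The `resolveFlakyMode` name is hypothetical, and treating "a non-empty base ref name" as the definition of PR-like is an assumption; the PR only states that the mode is inferred from `vcs.ref.base.name`.

```typescript
type FlakyMode = "new" | "unhealthy";

// Returns null when the feature flag is off, otherwise the inferred mode.
function resolveFlakyMode(
  env: Record<string, string | undefined>,
  baseRefName: string | undefined,
): FlakyMode | null {
  if (env["_MERGIFY_TEST_NEW_FLAKY_DETECTION"] !== "true") {
    return null; // feature not activated
  }
  // Assumption: a run with a base ref is PR-like, a run without one is a push.
  return baseRefName !== undefined && baseRefName !== "" ? "new" : "unhealthy";
}
```

Keeping the function free of direct `process.env` access makes it straightforward to unit-test both modes and the disabled case.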

Fixes MRGFY-7017

Copilot AI review requested due to automatic review settings April 30, 2026 12:57
@mergify mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 12:57 Failure
Contributor

mergify Bot commented Apr 30, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Approval

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.
  • #approved-reviews-by >= 2

🔴 🔎 Reviews

Waiting for

  • #review-requested = 0
  • #review-threads-unresolved = 0
This rule is failing.
  • #review-requested = 0
  • #review-threads-unresolved = 0
  • #changes-requested-reviews-by = 0

🟢 Continuous Integration

Wonderful, this rule succeeded.
  • check-success = all-greens

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|ui)(?:\(.+\))?:

🟢 📕 PR description

Wonderful, this rule succeeded.
  • body ~= (?ms:.{48,})

@kozlek kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from af4cad1 to df915d5 Compare April 30, 2026 12:59
Collaborator Author

kozlek commented Apr 30, 2026

Revision history

| # | Type | Changes | Reason | Date |
|---|------|---------|--------|------|
| 1 | initial | af4cad1 | | 2026-04-30 12:59 UTC |
| 2 | content | af4cad1 → df915d5 | | 2026-04-30 12:59 UTC |
| 3 | content | df915d5 → ecbb059 | | 2026-04-30 13:02 UTC |
| 4 | content | ecbb059 → ec7a552 | | 2026-04-30 13:11 UTC |
| 5 | content | ec7a552 → 70a39b9 | | 2026-05-12 14:28 UTC |
| 6 | content | 70a39b9 → 38b39bc (raw) | | 2026-05-12 16:41 UTC |

@mergify mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 12:59 Failure
@kozlek kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from df915d5 to ecbb059 Compare April 30, 2026 13:02
@mergify mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 13:03 Failure
@mergify mergify Bot requested a review from a team April 30, 2026 13:04

Copilot AI left a comment


Pull request overview

Adds a multi-process Playwright flaky-detection “phase 2” rerun orchestration path (gated by _MERGIFY_TEST_NEW_FLAKY_DETECTION) that enriches the shared state file, reruns candidate tests in a subprocess, aggregates outcomes, and annotates test-case spans + prints a summary.

Changes:

  • Rename and generalize the canonical test key builder to buildTestKey (used for both quarantine and flaky detection).
  • Extend state-file schema + validation to carry flaky-detection context/mode/candidates/deadlines, and enrich candidates on reporter.onBegin.
  • Implement phase-2 rerun subprocess + JSONL outcome collection/aggregation, plus add unit + integration test coverage and README docs.
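
For readers unfamiliar with the "validate and strip malformed optional fields" pattern mentioned for the state file, here is a minimal sketch. The field names `flakyCandidates` and `perTestDeadlineMs` and the `sanitizeFlakyFields` helper are illustrative assumptions; only `flakyMode` is named in the review summary, and the real schema may differ.

```typescript
interface FlakyState {
  flakyMode?: "new" | "unhealthy";
  flakyCandidates?: string[]; // assumed field name
  perTestDeadlineMs?: number; // assumed field name
}

// Keep only well-formed optional fields; silently drop anything malformed
// so a corrupted state file degrades to "no flaky detection" instead of crashing.
function sanitizeFlakyFields(raw: unknown): FlakyState {
  const out: FlakyState = {};
  if (typeof raw !== "object" || raw === null) return out;
  const record = raw as Record<string, unknown>;

  const mode = record["flakyMode"];
  if (mode === "new" || mode === "unhealthy") out.flakyMode = mode;

  const candidates = record["flakyCandidates"];
  if (Array.isArray(candidates) && candidates.every((c) => typeof c === "string")) {
    out.flakyCandidates = candidates as string[];
  }

  const deadline = record["perTestDeadlineMs"];
  if (typeof deadline === "number" && Number.isFinite(deadline) && deadline > 0) {
    out.perTestDeadlineMs = deadline;
  }
  return out;
}
```

Stripping rather than rejecting means a partially valid state file still carries its usable fields through to the reporter.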

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|------|-------------|
| packages/playwright/src/reporter.ts | Implements rerun-mode JSONL writing, state enrichment, phase-2 subprocess reruns, aggregation, span enrichment, and summary printing. |
| packages/playwright/src/state-file.ts | Expands state schema to include flaky-detection fields and validates/strips malformed optional fields. |
| packages/playwright/src/global-setup.ts | Fetches flaky-detection context (feature-flagged) and persists flakyContext + flakyMode into the state file. |
| packages/playwright/src/fixture.ts | Extends quarantine absorption to also include unhealthy_test_names in unhealthy mode. |
| packages/playwright/src/utils.ts | Renames buildQuarantineKey → buildTestKey and updates docs accordingly. |
| packages/playwright/README.md | Documents flaky-detection preview behavior, modes, attributes, and env vars. |
| packages/core/src/flaky-detection.ts | Exposes mode, candidates, and perTestDeadlineMs as readonly public fields; refactors candidate/budget computation into a static helper. |
| packages/core/tests/flaky-detection.test.ts | Adds test coverage for the new public readonly fields on FlakyDetector. |
| packages/playwright/tests/reporter.test.ts | Adds unit tests for state enrichment, annotation parsing, and summary output. |
| packages/playwright/tests/state-file.test.ts | Adds round-trip/validation tests for new flaky-detection state-file fields. |
| packages/playwright/tests/utils.test.ts | Updates tests to use buildTestKey. |
| packages/playwright/tests/global-setup.test.ts | Adds tests for feature-flagged flaky context fetching and mode selection. |
| packages/playwright/tests/integration/run.test.ts | Adds end-to-end integration coverage for unhealthy/new flaky-detection modes. |
| packages/playwright/tests/fixtures/playwright.config.ts | Makes fixture testDir configurable via PW_FIXTURE_DIR for integration tests. |
| packages/playwright/tests/fixtures/tests-unhealthy/sample.spec.ts | Adds a deterministic flaky fixture spec used by integration tests. |


Comment thread packages/playwright/src/reporter.ts Outdated
Comment thread packages/playwright/src/reporter.ts Outdated
Comment on lines +458 to +465
```ts
// Build a grep regex from candidate test titles. We escape regex
// metacharacters and join with `|`. Anchoring isn't necessary because
// Playwright matches against the full test path; substring match is
// sufficient and robust to project-name prefixes.
const titles = [...this.phase1Outcomes.keys()]
  .map((k) => k.split(' > ').pop() ?? k)
  .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
const grepPattern = `(${titles.join('|')})`;
```
Collaborator Author


Partial pushback. Playwright --grep matches the joined title path (describes + title), not the file path, so adding the filepath to the regex would not narrow the match. The aggregation step filters JSONL entries by full-key membership, so non-candidate reruns produce JSONL lines that are dropped — wasted CI time, not incorrect results. Tighter scoping requires switching from --grep to a list-then-filter approach, which is V2 scope. Documented the limitation in the code comment for now.
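
The trade-off described in this reply can be illustrated with a small sketch: a title-only grep pattern may select tests that share a title with a candidate, and a full-key membership filter then discards their records. The `keepCandidates` helper and the `{ key }` record shape are assumptions for illustration; the pattern construction mirrors the snippet under review.

```typescript
// Escape regex metacharacters so a title is matched literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a grep pattern from the last segment (the title) of each key.
function buildGrepPattern(candidateKeys: string[]): string {
  const titles = candidateKeys
    .map((k) => k.split(" > ").pop() ?? k)
    .map(escapeRegex);
  return `(${titles.join("|")})`;
}

// Drop rerun records whose full key is not a known candidate.
function keepCandidates<T extends { key: string }>(
  records: T[],
  candidates: Set<string>,
): T[] {
  return records.filter((r) => candidates.has(r.key));
}
```

With a candidate key such as `a.spec.ts > login > works`, the pattern is `(works)`, which also matches a title path like `signup works`; that extra test is rerun, but its records are filtered out because its full key is not in the candidate set.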

Comment thread packages/playwright/src/reporter.ts Outdated
@kozlek kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from ecbb059 to ec7a552 Compare April 30, 2026 13:11
@mergify mergify Bot had a problem deploying to Mergify Merge Protections April 30, 2026 13:11 Failure
@kozlek kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from ec7a552 to 70a39b9 Compare May 12, 2026 14:28
@mergify mergify Bot had a problem deploying to Mergify Merge Protections May 12, 2026 14:28 Failure
Change-Id: I8f7ae58a8ea95ff5ddabddd99e769324fcb4079d
@kozlek kozlek force-pushed the devs/kozlek/feat/playwright-flaky-detection/add-flaky-detection-via-multi-process--8f7ae58a branch from 70a39b9 to 38b39bc Compare May 12, 2026 16:41
@mergify mergify Bot had a problem deploying to Mergify Merge Protections May 12, 2026 16:42 Failure