
unix: use extended PGO training set #1018

Draft
indygreg wants to merge 2 commits into main from pgo-extended

Conversation

@indygreg
Collaborator

This will run all tests by default and ensure we have maximal training coverage for PGO.

@geofft
Collaborator

geofft commented Mar 20, 2026

I suspect this PR is equivalent to "get the full test suite passing," which we've talked about wanting to do but hasn't quite been prioritized :)

@indygreg
Collaborator Author

Yeah. I just wanted to throw up the PR to see what kind of damage we were looking at for enabling the full test harness. It looks substantial :/

@indygreg indygreg changed the base branch from main to gps-pgo-tweaks March 21, 2026 05:30
@indygreg indygreg force-pushed the pgo-extended branch 4 times, most recently from 0e37663 to c3eb94d on March 21, 2026 07:23
@indygreg indygreg force-pushed the pgo-extended branch 3 times, most recently from c7381c9 to 0bfc6cc on March 21, 2026 12:31
Base automatically changed from gps-pgo-tweaks to main March 23, 2026 21:55
@indygreg indygreg force-pushed the pgo-extended branch 17 times, most recently from f87b1f0 to 71059e7 on March 29, 2026 09:04
This commit overhauls our ability to run the stdlib test harness and enables
it in CI for builds that we can run natively in CI.

Previously, `testdist.py` called a `run_tests.py` script that was bundled
in the distribution. This script was simply a wrapper around
`python -m test --slow-ci`. And `--slow-ci` currently expands to
`--multiprocess 0 --randomize --fail-env-changes --rerun --print-slow --verbose3
-u all --timeout 1200`.
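The bundled wrapper described above plausibly looked something like the following sketch (a hypothetical reconstruction, not the actual `run_tests.py`; the `build_command` helper name is mine):

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the old bundled run_tests.py wrapper:
it simply re-invokes the running interpreter with the stdlib test
harness and forwards any extra arguments."""
import subprocess
import sys


def build_command(python, extra_args=()):
    # `--slow-ci` expands to the long option list quoted above
    # (multiprocessing, randomization, reruns, -u all, a 1200s timeout).
    return [python, "-m", "test", "--slow-ci", *extra_args]


if __name__ == "__main__":
    sys.exit(subprocess.call(build_command(sys.executable, sys.argv[1:])))
```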

This commit effectively inlines `run_tests.py` into `testdist.py` and greatly
expands the functionality for running the test harness.

When enabling the stdlib test harness in CI as part of this commit, several
test failures were encountered, especially in non-standard builds like
`static` and `debug`. Even the `freethreaded` builds encountered a
significant number of failures (many of them intermittent), implying that
the official CPython CI fails to catch a lot of legitimate test failures.

We want PBS to run stdlib tests to help us catch changes in behavior. And we
can only do that if the CI pass/fail signal is high quality: we don't want CI
"passing" if there are changes to test pass/fail behavior.

Achieving this requires annotating all tests that can potentially fail. The test
harness then needs to validate that these annotations are accurate (read: that
annotated tests actually fail).

So this commit introduces a `stdlib-test-annotations.yml` file in the root
directory. It contains rules that filter on build configuration and three sections
that describe specific annotations:

1. Skip running the test harness completely. This is necessary on some builds
   that are so broken that annotating individual tests wasn't worthwhile because
   so many tests failed.
2. Exclude all tests within a given Python module. This is reserved for scenarios
   where importing the test module fails and causes most/all tests to fail. Again,
   a mechanism to short-circuit having to annotate every failing test.
3. Expected test failures. The most common annotation. These annotations describe
   individual tests or glob pattern matches of tests that are "expected" to fail.
   Entries can be annotated as "intermittent" or "dont-verify" to allow the test
   to pass without failing our test harness.
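The three annotation kinds above might look something like this in `stdlib-test-annotations.yml`. This is purely illustrative; the field names are guesses, not the actual schema:

```yaml
# Hypothetical example; the real schema may differ.
- rules:
    build: debug
  skip-all: true              # 1. don't run the harness at all
- rules:
    target: x86_64-unknown-linux-musl
  exclude-modules:            # 2. importing the test module fails
    - test_ctypes
- rules:
    python: "3.13"
  expected-failures:          # 3. individual tests or glob patterns
    - test: test_os.TestScandir.test_*
    - test: test_socket.GeneralModuleTests.test_getaddrinfo
      intermittent: true      # allowed to pass without failing the harness
```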

Most of the new code is in support of reading and applying these annotations.

At build time, we read the `stdlib-test-annotations.yml` file and derive a new
`stdlib-test-annotations.json` file with only the active annotations matching the
build configuration. This file is included in the build distribution as
`python/build/stdlib-test-annotations.json`. It has to be JSON so the Python test
harness runner is able to read the file using just the stdlib.
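The build-time filtering step could be sketched roughly as follows. This operates on already-parsed YAML data, so the sketch itself needs only the stdlib; the entry/rule shapes and function names are assumptions, not the actual implementation:

```python
"""Sketch: derive the active-annotations JSON from parsed YAML entries.

Each entry is assumed to look like
  {"rules": {"target": "...", "build": "..."}, "expected-failures": [...]}
where a missing rule key matches any build configuration.
"""
import json


def filter_annotations(entries, config):
    """Keep entries whose rules all match `config`; drop the rules key."""
    active = []
    for entry in entries:
        rules = entry.get("rules", {})
        if all(config.get(key) == value for key, value in rules.items()):
            active.append({k: v for k, v in entry.items() if k != "rules"})
    return active


def write_active_annotations(entries, config, path):
    # JSON rather than YAML so the in-distribution test runner can read
    # the file using just the stdlib (no PyYAML dependency at test time).
    with open(path, "w") as fh:
        json.dump(filter_annotations(entries, config), fh, indent=2)
```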

`test-distributions.py` has gained some new functionality, including the ability
to run the stdlib test harness with raw arguments and emit a JUnit XML file with
test results.

One of the things the test harness now does is attempt to ensure that tests annotated
as failing actually fail. However, this isn't enforced for tests marked as "intermittent"
or "dont-verify." Assessing whether an "intermittent" test really is intermittent requires
an asynchronous mechanism that examines historical execution results. We facilitate this by
uploading a JUnit XML artifact with details of test execution. But the mining of historical
test results is not implemented. (And I'm not sure it is worth implementing.)
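The stale-annotation check described above amounts to flagging any test annotated as a hard failure that actually passed. A minimal sketch, assuming hypothetical data shapes (annotation kind strings and a name-to-outcome results mapping that are mine, not the PR's):

```python
def find_stale_annotations(expected, results):
    """Return tests annotated "fail" that actually passed.

    `expected` maps test name -> annotation kind ("fail", "intermittent",
    or "dont-verify"); `results` maps test name -> "pass" or "fail".
    """
    stale = []
    for name, kind in sorted(expected.items()):
        if kind != "fail":
            # "intermittent" and "dont-verify" tests may pass or fail on
            # any given run; judging those annotations needs historical
            # results (e.g. mined from the uploaded JUnit XML artifacts).
            continue
        if results.get(name) == "pass":
            stale.append(name)
    return stale
```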

It took dozens of iterations to get a reliably working set of test annotations.
There's just lots of variability across build configurations and Python versions.
Despite best efforts, there are likely a few lingering intermittent failures that
aren't yet annotated.
@indygreg indygreg force-pushed the pgo-extended branch 3 times, most recently from 878bab2 to 149dd5b on March 30, 2026 02:23
This will run all tests, ensuring maximal training coverage.

As part of this, we had to annotate/ignore every failing test because
test failures would otherwise fail the build.