
Weekly regression chunks#2114

Open
steveri wants to merge 47 commits into master from weekly-regression-chunks


@steveri
Contributor

@steveri steveri commented Mar 24, 2026

This is part of my ongoing effort to make weekly regressions more useful and more manageable.

Instead of trying to run the entire suite of weekly apps in a single 2-day chunk, this new code runs it in sequential chunks, where each chunk is a restartable group, e.g. glb_tests, glb_tests_RV, etc. The full run still takes 2 days, but a single failure near the end no longer means starting a whole new 2-day run from the beginning. Also, if a group fails, the run continues with the remaining groups, so you can optionally restart the failed step/group while the remaining groups keep running.
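
A minimal Python sketch of the chunked, keep-going behavior described above. It assumes a hypothetical `run_group(name) -> bool` callback that runs one config group and reports success; it is not the PR's actual code. It only illustrates running groups in order, recording a failure instead of aborting, and collecting the failed groups so each can be rerun on its own.

```python
from typing import Callable, Dict, List


def run_weekly_chunks(
    groups: List[str], run_group: Callable[[str], bool]
) -> Dict[str, str]:
    """Run each group in order; a failed group is recorded, not fatal."""
    results: Dict[str, str] = {}
    for group in groups:
        try:
            ok = run_group(group)
        except Exception as exc:  # treat a crash like any other failure
            print(f"{group}: raised {exc!r}")
            ok = False
        results[group] = "PASS" if ok else "FAIL"
        # Keep going: later groups still run even if this one failed.
    return results


def failed_groups(results: Dict[str, str]) -> List[str]:
    """Groups to restart individually, in their original order."""
    return [group for group, status in results.items() if status == "FAIL"]


if __name__ == "__main__":
    # Pretend glb_tests_RV fails; resnet_tests still runs afterwards.
    results = run_weekly_chunks(
        ["glb_tests", "glb_tests_RV", "resnet_tests"],
        lambda g: g != "glb_tests_RV",
    )
    print(results)
    # Rerun only the failed group(s), not the whole 2-day sequence.
    print(run_weekly_chunks(failed_groups(results), lambda g: True))
```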

Examples:

In terms of being more manageable, weekly runs now bypass the byzantine regress-metahooks/regression-steps mechanism in favor of a much simpler weekly.yml driver. The new driver is loaded as soon as pipeline.yml recognizes that we are doing a weekly run rather than the normal aha1-9 regressions.
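
How pipeline.yml detects a weekly run is not spelled out here. As a rough Python sketch, assuming a hypothetical `REGRESSION_TYPE` environment variable as the signal, the dispatch decision amounts to choosing which driver file to load:

```python
import os
from typing import Mapping


def is_weekly_run(env: Mapping[str, str]) -> bool:
    """True if this build should run the weekly suite.

    REGRESSION_TYPE is a hypothetical variable name; the real pipeline.yml
    may recognize a weekly run in a different way.
    """
    return env.get("REGRESSION_TYPE", "").strip().lower() == "weekly"


def pipeline_to_load(env: Mapping[str, str]) -> str:
    """Swap in weekly.yml for weekly runs; otherwise stay on pipeline.yml."""
    return "weekly.yml" if is_weekly_run(env) else "pipeline.yml"


if __name__ == "__main__":
    print(pipeline_to_load(os.environ))
```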

I also took this opportunity to simplify and optimize the way we do E64_supported_test checks.

Summary of changes

New files

  • generate-weekly-pipeline.sh
  • weekly.yml: much simpler full regression pipeline, generated with help from the generate-weekly-pipeline.sh script
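
One plausible shape for the kind of file generate-weekly-pipeline.sh produces, sketched in Python. It assumes Buildkite-style YAML, where a `wait` step with `continue_on_failure: true` serializes the chunks without letting one failure cancel the rest. The `aha regress --groups <group>` command line is a hypothetical stand-in for whatever each step actually runs.

```python
from typing import List


def weekly_steps_yaml(groups: List[str]) -> str:
    """Emit one pipeline step per group, run sequentially.

    Assumes Buildkite-style step syntax; the command string is hypothetical.
    """
    lines = ["steps:"]
    for i, group in enumerate(groups):
        if i > 0:
            # Wait for the previous chunk, but continue even if it failed.
            lines.append("  - wait: ~")
            lines.append("    continue_on_failure: true")
        lines.append(f'  - label: "{group}"')
        lines.append(f'    command: "aha regress --groups {group}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(weekly_steps_yaml(["glb_tests", "glb_tests_RV", "resnet_tests"]), end="")
```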

Changed files

  • pipeline.yml: new "Launch Weekly Run" step lets us swap weekly.yml in place of normal aha9 regressions
  • app: added new --subgroup option to run a single config group standalone
  • repress.py
    • support for new --group(s) option, e.g. "--groups glb_tests,resnet_tests"
    • print timing table after every app success or failure
    • reduced try/except block extent
  • repress_info.py: fixed method summarize_and_print_info(), which was supposed to treat the timing table as read-only but didn't
  • tests.py and regress_util.py: E64_supported_test group properties do not belong with the dynamically loaded executable app groups and directives, so this change separates them.
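
A minimal sketch of two of the repress.py changes above, using only the standard library: accepting `--group`/`--groups` with a comma-separated list, and formatting a timing table without mutating it (the property summarize_and_print_info() was supposed to have). The function names and the `{app: seconds}` table layout are hypothetical; the real code may store timings differently.

```python
import argparse
from typing import List, Mapping


def parse_groups(argv: List[str]) -> List[str]:
    """Accept --group or --groups with a comma-separated list of groups."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--groups", "--group", dest="groups", default="")
    args = parser.parse_args(argv)
    return [g.strip() for g in args.groups.split(",") if g.strip()]


def format_timing_table(timings: Mapping[str, float]) -> str:
    """Render an {app: seconds} table; reads `timings` but never modifies it."""
    width = max([len("app")] + [len(app) for app in timings])
    rows = [f"{'app':<{width}}  seconds"]
    for app, secs in sorted(timings.items()):  # sorted() builds a new list
        rows.append(f"{app:<{width}}  {secs:7.1f}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(parse_groups(["--groups", "glb_tests,resnet_tests"]))
    # Made-up values, for illustration only.
    print(format_timing_table({"glb_tests": 12.5, "resnet_tests": 3.0}))
```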

steveri requested review from mcoduoza and yuchen-mei March 25, 2026 14:47
@steveri
Contributor Author

steveri commented Mar 25, 2026

This one is ready to go.
I think these changes will make our lives better, at least I hope so!

@yuchen-mei
Collaborator

Hi Steve, thanks for making these changes! We unfortunately have to work on a resubmission of the conference paper, which is due pretty soon. We will review and approve the changes after we finish the resubmission on April 10.

@steveri
Contributor Author

steveri commented Mar 26, 2026


Sure, no problem. Let me know if you need/want help with the paper.

@steveri
Contributor Author

steveri commented Apr 15, 2026

Okay guys, this PR is still waiting for some kind of action. If we can get it approved and merged before Friday, then I think we have a chance of getting through a weekly regression this weekend, for the first time since November maybe :)

Thanks!
