[CI]Add parity report scripts and workflow #3094

Merged
jithunnair-amd merged 77 commits into develop from add-parity-scripts-dashboard
Apr 10, 2026

Conversation


ethanwee1 commented Mar 20, 2026

Summary

  • Add pytorch-unit-test-scripts/ directory with all parity scripts (download_testlogs, summarize_xml_testreports, parity.sh, and supporting utilities)
  • Add parity.yml GitHub Actions workflow that can be manually triggered to download CI artifacts and generate parity CSVs
  • All download_testlogs and summarize_xml_testreports.py flags are exposed as workflow inputs (SHA, PR ID, arch, exclude flags, filter, set names, etc.)
  • Architectures are configurable via comma-separated input (default: mi200,mi300,mi355)
  • Generated CSVs and logs are uploaded as downloadable workflow artifacts

Setup

Requires these repository secrets:

  • IFU_GITHUB_TOKEN (already exists)
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Test plan

Move unit test parity scripts from frameworks-internal into ROCm/pytorch.
Includes download_testlogs, summarize_xml_testreports, parity.sh, and
supporting utilities. Also adds a daily GitHub Actions workflow that
generates parity CSVs for MI200/MI300/MI355 and deploys a skip reason
dashboard to GitHub Pages.

Replace hardcoded workflow with full configurability: SHA, PR ID,
architecture list, exclude flags, ignore_status, artifacts_only,
no_rocm, no_cuda, set1/set2 names, and status filter. Remove
dashboard/Pages -- workflow now just generates and uploads CSVs
as downloadable artifacts.

rocm-repo-management-api Bot commented Mar 20, 2026

Jenkins build for 67120c11565d67066bccdc1232b8b3f11a6edbd6 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results


rocm-repo-management-api Bot commented Mar 20, 2026

Jenkins build for e24415b7052760edf0e4eb16ee3fb74c975097a1 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

- New generate_summary.py produces a combined summary CSV with
  per-architecture columns showing per-workflow stats and overall
  parity metrics.
- Workflow now has a generate-summary job that runs after all
  per-arch CSVs are generated and uploads the summary as an artifact.
- Default arch order changed to mi355, mi300, mi200.
- Arch input now accepts both commas and spaces as delimiters.
- Fixed upload glob to capture all CSVs regardless of csv_name.
- generate_summary.py now outputs both .csv and .md files
- Markdown uses proper tables with section headers per workflow
- Workflow writes the .md to GITHUB_STEP_SUMMARY so it renders
  directly on the workflow run page
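The comma-or-space handling for the arch input can be sketched as below. The helper name, the lowercasing, and the de-duplication are assumptions, not taken from the workflow; only the delimiters and the new default order come from the notes above.

```python
import re

# Default order per the notes above; everything else here is illustrative.
DEFAULT_ARCHES = "mi355,mi300,mi200"


def parse_arches(raw: str) -> list:
    """Split the arch input on any run of commas and/or whitespace."""
    raw = raw.strip() or DEFAULT_ARCHES
    arches = []
    for token in re.split(r"[,\s]+", raw):
        token = token.lower()
        # Skip empty tokens (e.g. from a trailing comma) and repeats.
        if token and token not in arches:
            arches.append(token)
    return arches
```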
Shows test_file, test_class, test_name, workflow, and both statuses
for every test marked FAILED on either set, grouped by architecture.
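A hypothetical sketch of that failed-tests view: keep rows where either set reports FAILED, project them onto the listed columns, and group them by architecture. The `status_rocm`/`status_cuda` column names assume the default set names and are not confirmed by the notes.

```python
# Columns listed in the note above; the status column names are assumed.
COLUMNS = ("test_file", "test_class", "test_name", "workflow", "status_rocm", "status_cuda")


def failed_tests_by_arch(rows_by_arch: dict) -> dict:
    """Keep rows FAILED on either set, projected onto COLUMNS, grouped by arch."""
    out = {}
    for arch, rows in rows_by_arch.items():
        failed = [
            {col: row.get(col, "") for col in COLUMNS}
            for row in rows
            if "FAILED" in (row.get("status_rocm"), row.get("status_cuda"))
        ]
        if failed:  # omit architectures with no failures
            out[arch] = failed
    return out
```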

rocm-repo-management-api Bot commented Mar 20, 2026

Jenkins build for c93e9c137fc60256ed6dc4775218224b39e60b49 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

When trunk workflow has no distributed jobs for MI355 at the given SHA
(e.g. PR commits), fall back to periodic-rocm-mi355 workflow. Also
switches the job prefix to linux-noble-rocm-py3.12-mi355 when the
fallback is used.

When csv_name is provided (e.g. "march_report"), artifacts are named:
- march_report_mi355 (per-arch CSV)
- march_report_summary (summary)

Without csv_name, defaults to parity-csv-ARCH and parity-summary.
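The naming rule reduces to a small function. The function name and return shape are illustrative; the name patterns are the ones listed above.

```python
def artifact_names(arches, csv_name=""):
    """Return ({arch: per-arch artifact name}, summary artifact name)."""
    if csv_name:
        return {a: f"{csv_name}_{a}" for a in arches}, f"{csv_name}_summary"
    # No csv_name: fall back to the default names.
    return {a: f"parity-csv-{a}" for a in arches}, "parity-summary"
```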
- Replace artifacts_only with include_logs checkbox (default: off).
  When checked, CI log files (.txt) are downloaded and included.
- Move download_testlogs tee log into the output folder so it is
  always captured in the artifact zip.
- Upload step now captures *.csv, *.log, and *.txt from the output
folder, putting everything into one zip per architecture.

Query the GitHub API after all artifacts are uploaded to resolve
real artifact IDs and generate direct download links.
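A minimal sketch of that lookup using the GitHub REST endpoint for listing a workflow run's artifacts (`GET /repos/{owner}/{repo}/actions/runs/{run_id}/artifacts`). The link format `https://github.com/{owner}/{repo}/actions/runs/{run_id}/artifacts/{id}` and the helper names are assumptions; the actual step may build links differently. This sketch reads only the first page of up to 100 artifacts.

```python
import json
import urllib.request


def list_run_artifacts(owner: str, repo: str, run_id: int, token: str) -> list:
    """Fetch the first page (up to 100) of artifacts for one workflow run."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/runs/{run_id}/artifacts?per_page=100")
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["artifacts"]


def artifact_links(owner: str, repo: str, run_id: int, artifacts: list) -> dict:
    """Map artifact name -> browser page for that artifact (assumed URL format)."""
    return {
        a["name"]: f"https://github.com/{owner}/{repo}/actions/runs/{run_id}/artifacts/{a['id']}"
        for a in artifacts
    }
```

Inside a workflow step, `run_id` would typically come from the `GITHUB_RUN_ID` environment variable. Only `artifact_links` is pure; `list_run_artifacts` needs a token and network access.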

rocm-repo-management-api Bot commented Mar 20, 2026

Jenkins build for c93e9c137fc60256ed6dc4775218224b39e60b49 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

ethanwee1 changed the title from "Add parity report scripts and workflow" to "[Develop][CI]Add parity report scripts and workflow" Mar 22, 2026
Workflow now automatically downloads the previous week's CSV to carry
forward skip_reason, assignee, comments, and existed_last_week columns:
1. Checks the parity-input GitHub Release for a user-edited CSV
2. Falls back to the last successful parity run's artifact
3. Gracefully skips if neither is available

Also adds a README documenting the weekly parity workflow process.
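A hedged sketch of the carry-forward step: try each source in order, then copy the user-maintained columns onto matching rows. Matching on (test_file, test_class, test_name), the yes/no encoding of existed_last_week, and the rule that a non-empty current value wins are all assumptions; the loaders stand in for the Release and last-run lookups.

```python
KEY_COLUMNS = ("test_file", "test_class", "test_name")  # assumed match key
CARRY_COLUMNS = ("skip_reason", "assignee", "comments")


def first_available(*loaders):
    """Return rows from the first loader that yields something, else []."""
    for load in loaders:
        rows = load()
        if rows is not None:
            return rows
    return []  # neither source available: skip gracefully


def carry_forward(current_rows, previous_rows):
    """Copy user-maintained columns from last week's rows onto matching tests."""
    previous = {tuple(r.get(k, "") for k in KEY_COLUMNS): r for r in previous_rows}
    merged = []
    for row in current_rows:
        row = dict(row)
        old = previous.get(tuple(row.get(k, "") for k in KEY_COLUMNS))
        if old is not None:
            for col in CARRY_COLUMNS:
                # A non-empty value in the new CSV wins over the old one.
                if old.get(col) and not row.get(col):
                    row[col] = old[col]
        row["existed_last_week"] = "yes" if old is not None else "no"
        merged.append(row)
    return merged
```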
MI200 tests have moved from separate rocm-mi200/periodic-rocm-mi200/
inductor-rocm-mi200 workflows into the unified trunk-rocm-sandbox
workflow. All three test types (default, distributed, inductor) now
use the same workflow and linux-jammy-rocm-py3.10 job prefix.

Generates a self-contained HTML dashboard from per-architecture CSVs
with four tabs:
- Summary: per-workflow stats cards with AGREE/DISAGREE percentages
- Skip Reasons: breakdown by category with bar charts per workflow
- Failed Tests: table of all FAILED tests across architectures
- All Tests: filterable/searchable table of skipped, missed, failed,
  and new tests with pagination

Dashboard is included in the summary artifact for easy download.

Dashboard All Tests tab now has two status filters (e.g. "All rocm"
and "All cuda") so you can filter by either set independently or
combine both to find specific status pairs like SKIPPED+PASSED.

Dashboard is now a separate artifact with a prominent download link
at the top of the summary, above the artifacts table.
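The two-filter behavior on the All Tests tab amounts to an AND of two independent predicates, where "All" matches anything. The dashboard itself is HTML/JavaScript; this Python sketch only illustrates the filter semantics, and the column names are assumed.

```python
def filter_rows(rows, rocm_status="All", cuda_status="All"):
    """AND of two independent status filters; "All" matches any value."""
    def matches(value, wanted):
        return wanted == "All" or value == wanted

    return [
        r for r in rows
        if matches(r["status_rocm"], rocm_status) and matches(r["status_cuda"], cuda_status)
    ]
```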

rocm-repo-management-api Bot commented Mar 22, 2026

Jenkins build for 88759200b8abde043046841f17782e969cfd635e commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

ethanwee1 and others added 10 commits March 26, 2026 22:15
…rocm_dist*.txt, baseline_rocm_inductor*.txt)
- Move baseline_sha workflow input to appear right after sha for better UX
- When both SHAs provided, prefix log files with short SHAs (e.g. 5809e41e_rocm1.txt, a4de454b_rocm1.txt)
- When no baseline, preserve original naming (rocm1.txt, baseline_rocm1.txt)
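The naming rule above, sketched as a function. Treating `is_baseline` as "this log belongs to the second set" and using an 8-character short SHA (matching the examples) are assumptions.

```python
def log_filename(stem: str, sha: str, baseline_sha: str = "", is_baseline: bool = False) -> str:
    """Name a downloaded CI log, e.g. stem='rocm1.txt'."""
    if baseline_sha:
        # Both SHAs given: tag each file with its own commit's short SHA.
        short = (baseline_sha if is_baseline else sha)[:8]
        return f"{short}_{stem}"
    # No baseline SHA: keep the original names.
    return f"baseline_{stem}" if is_baseline else stem
```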
Moves all GitHub project items assigned to ethanwee1 from the previous
sprint to the current sprint on ROCm/projects/18 using the GraphQL API.

Exclude MISSED status from total counts in generate_summary.py.
Tests only present in ROCm get status_cuda=MISSED in the merged CSV,
and since each architecture has different ROCm-only tests, the CUDA
totals were incorrectly inflated by varying amounts per architecture.
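A minimal sketch of the corrected totals: MISSED rows are dropped before counting, so per-architecture differences in ROCm-only tests no longer shift the CUDA total. The function and column names are illustrative, not taken from generate_summary.py.

```python
from collections import Counter


def status_totals(rows, column):
    """Count statuses in one column, ignoring MISSED (test absent from that set)."""
    counts = Counter(r[column] for r in rows if r[column] != "MISSED")
    return counts, sum(counts.values())
```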
This metric is always 0 because --prev_week_csv is never passed in the
workflow. It's also misleading since runs target arbitrary commits, not
a weekly cadence.

Classifies ~6400 tests with "skipIfRocm: Fails with Triton 3.7"
skip messages under the "triton 3.7 bump" category.
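Rule-based classification of this kind can be sketched as an ordered list of regex rules where the first match wins. The rule table, the fallback label `unclassified`, and the function name are hypothetical; only the message text and the category name come from the note above.

```python
import re

# Hypothetical rule table: (category, pattern), checked in order.
SKIP_RULES = [
    ("triton 3.7 bump", re.compile(r"skipIfRocm:\s*Fails with Triton 3\.7", re.IGNORECASE)),
]


def classify_skip(message, default="unclassified"):
    """Return the category of the first rule whose pattern matches the message."""
    for category, pattern in SKIP_RULES:
        if pattern.search(message or ""):
            return category
    return default
```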
Add include_xml checkbox (default off) to control whether raw XML test
reports are included in the artifact zip. XMLs account for ~53% of the
uncompressed artifact size and are only needed for debugging.
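A sketch of the per-architecture artifact contents with this toggle. The recursive search and the helper name are assumptions; .txt logs only exist when include_logs was set at download time, so the upload side can always look for them.

```python
from pathlib import Path

BASE_PATTERNS = ("*.csv", "*.log", "*.txt")


def files_for_artifact(folder, include_xml=False):
    """Collect files for one per-arch artifact zip; XML reports are opt-in."""
    patterns = list(BASE_PATTERNS) + (["*.xml"] if include_xml else [])
    root = Path(folder)
    found = set()
    for pattern in patterns:
        # rglob searches subfolders too (an assumption about the layout).
        found.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```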
ethanwee1 force-pushed the add-parity-scripts-dashboard branch from 0758547 to 72e550a March 31, 2026 18:51

rocm-repo-management-api Bot commented Mar 31, 2026

Jenkins build for 2afdb21c2377908d47346f4da03f34e35bf35250 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

jithunnair-amd changed the title from "[Develop][CI]Add parity report scripts and workflow" to "[CI]Add parity report scripts and workflow" Mar 31, 2026
jithunnair-amd marked this pull request as ready for review March 31, 2026 18:55

rocm-repo-management-api Bot commented Mar 31, 2026

Jenkins build for 74d0e969363a1b2cacfdf3c3ce33ccf18c199370 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

Keep pytorch-unit-test-scripts as the directory name. Remove scripts
not used by the parity workflow and remove move-sprint-items.sh.

Remaining files: download_testlogs, summarize_xml_testreports.py,
generate_summary.py, auto_classify_skip_reasons.py, upload_test_stats.py,
upload_stats_lib.py, requirements.txt.

rocm-repo-management-api Bot commented Mar 31, 2026

Jenkins build for 74d0e969363a1b2cacfdf3c3ce33ccf18c199370 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Upstream CI workflows for the same SHA can have different created_at
dates when they span midnight, causing create_test_folder to create
separate directories. The ls -dt folder selection then picks the wrong
one (most recently modified), leading to all-zero results.

Fix by adding get_or_create_test_folder() which reuses the first folder
for all subsequent downloads, ensuring all artifacts land in one place.
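A minimal sketch of the get-or-create idea: the first call picks and creates a folder, and later calls for the same SHA return it even if their created_at date differs. The signature, the folder naming scheme, and the module-level cache are assumptions, not the actual download_testlogs implementation.

```python
from pathlib import Path

_FOLDERS = {}  # (base, sha) -> folder chosen by the first download


def get_or_create_test_folder(base, sha, created_at_date):
    """Reuse the first folder created for this SHA, whatever later dates say."""
    key = (str(base), sha)
    if key not in _FOLDERS:
        # Hypothetical naming scheme: <date>_<short sha>.
        folder = Path(base) / f"{created_at_date}_{sha[:8]}"
        folder.mkdir(parents=True, exist_ok=True)
        _FOLDERS[key] = folder
    return _FOLDERS[key]
```

Because later calls never create a second directory, a later `ls -dt`-style selection has only one candidate to pick.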

Also update mi355 job prefix from linux-jammy-rocm-py3.10 to
linux-jammy-rocm-py3.10-mi355 to match current upstream CI job names.

When trunk workflow has no run for a given SHA (e.g. PR commits),
fall back to rocm-mi355 workflow for mi355 default tests. Mirrors
the existing periodic/distributed fallback pattern.

The rocm-mi355 workflow uses linux-noble-rocm-py3.12-mi355 as the job
prefix, not linux-jammy-rocm-py3.10-mi355 (trunk). Without this, log
downloads fail with "TEST KEY DOES NOT EXIST" when the fallback is used.

The inductor-rocm-mi355 workflow uses linux-noble-rocm-py3.12-mi355
as the job prefix, not rocm-py3.12-inductor-mi355.

mi300 inductor: rocm-py3.12-inductor-mi300 -> linux-noble-rocm-py3.12-mi300
navi31 default: linux-jammy-rocm-py3_10 -> linux-jammy-rocm-py3.10-navi31

Remove nightly from auto-exclusion list and add workflow names,
job prefixes, and shard counts for distributed and inductor tests.
Update workflow input descriptions to reflect the change.
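The workflow and job-prefix names from these commits can be collected into an ordered lookup where the first source with matching jobs wins, which also expresses the trunk-to-fallback behavior. This table is hypothetical: it includes only pairs stated in these notes, omits entries whose workflow name is not given (mi300 inductor, navi31 default), and leaves out shard counts. `has_jobs` stands in for a GitHub API query.

```python
# (arch, test_type) -> ordered (workflow, job_prefix) candidates; first match wins.
JOB_SOURCES = {
    ("mi200", "default"): [("trunk-rocm-sandbox", "linux-jammy-rocm-py3.10")],
    ("mi200", "distributed"): [("trunk-rocm-sandbox", "linux-jammy-rocm-py3.10")],
    ("mi200", "inductor"): [("trunk-rocm-sandbox", "linux-jammy-rocm-py3.10")],
    ("mi355", "default"): [
        ("trunk", "linux-jammy-rocm-py3.10-mi355"),
        ("rocm-mi355", "linux-noble-rocm-py3.12-mi355"),  # fallback, e.g. PR commits
    ],
    ("mi355", "distributed"): [
        ("trunk", "linux-jammy-rocm-py3.10-mi355"),
        ("periodic-rocm-mi355", "linux-noble-rocm-py3.12-mi355"),  # fallback
    ],
    ("mi355", "inductor"): [("inductor-rocm-mi355", "linux-noble-rocm-py3.12-mi355")],
}


def resolve_job_source(arch, test_type, has_jobs):
    """Return the first (workflow, job_prefix) with jobs for the SHA, else None."""
    for workflow, prefix in JOB_SOURCES.get((arch, test_type), []):
        if has_jobs(workflow, prefix):
            return workflow, prefix
    return None
```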

rocm-repo-management-api Bot commented Apr 9, 2026

Jenkins build for 4f726975463f1f5ed41d1299eeffbdbb67907c3d commit finished as SUCCESS
Links: Pipeline Overview / Build artifacts

jithunnair-amd merged commit 79e8877 into develop Apr 10, 2026
0 of 2 checks passed
jithunnair-amd deleted the add-parity-scripts-dashboard branch April 10, 2026 14:53
rocm-repo-management-api Bot commented

Jenkins build for f1481642031550c5c461b6ce347a6ac8aaced87b commit is in progress
Links: Pipeline Overview / Build artifacts

Labels

None yet

Projects

None yet

2 participants