ci: add amd-ci-job-monitor for runner fleet reporting #543
Open
amdfaa wants to merge 1 commit into
Contributor
Pull request overview
This PR adds a scheduled/manual GitHub Actions workflow that inventories recent Actions job execution and generates artifacts (including a “runner fleet report”) intended for import into an AI Frameworks Dashboard.
Changes:
- Added `AMD CI Job Monitor` workflow to discover workflows/jobs, snapshot recent Actions data, and publish per-job and fleet-wide reports as artifacts.
- Added `.github/scripts/query_job_status.py` to query Actions runs/jobs (or consume a snapshot) and emit markdown summaries plus runner concurrency/fleet reporting.
- Added `.github/scripts/list_jobs.py` and updated `.github/runner-config.yml` to support workflow/job discovery and runner metadata enrichment.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `.github/workflows/amd-ci-job-monitor.yml` | Adds a scheduled/dispatch workflow that builds snapshots, per-job reports, and a runner fleet report artifact. |
| `.github/scripts/query_job_status.py` | Implements GitHub API querying + snapshotting and produces job/runner summary tables. |
| `.github/scripts/list_jobs.py` | Discovers workflow jobs and generates a matrix/workflow map for the monitor workflow. |
| `.github/runner-config.yml` | Updates runner label → GPU metadata used to annotate runner fleet reporting. |
Comment on lines +28 to +32

    parser.add_argument(
        "--exclude-jobs",
        default="",
        help="Comma-separated job names to skip.",
    )
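
For context on the hunk above, here is a minimal sketch of how a comma-separated `--exclude-jobs` value might be turned into a set of names. The helper `parse_exclude_jobs` is hypothetical and not taken from the PR; only the `add_argument` call mirrors the diff.

```python
import argparse


def parse_exclude_jobs(raw: str) -> set[str]:
    """Split a comma-separated string into trimmed, non-empty job names.

    Hypothetical helper for illustration; the PR may parse this differently.
    """
    return {name.strip() for name in raw.split(",") if name.strip()}


parser = argparse.ArgumentParser()
parser.add_argument(
    "--exclude-jobs",
    default="",
    help="Comma-separated job names to skip.",
)

# Simulated command line; stray spaces and empty entries are dropped.
args = parser.parse_args(["--exclude-jobs", "lint, build ,,"])
excluded = parse_exclude_jobs(args.exclude_jobs)
print(sorted(excluded))
```

Stripping whitespace and discarding empty entries means the empty default (`""`) yields an empty set rather than a set containing one empty name.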
Comment on lines +83 to +90

    for job_id in job_ids:
        job_def = jobs_dict.get(job_id) or {}
        raw_name = job_def.get("name") if isinstance(job_def, dict) else None
        if isinstance(raw_name, str) and "${{" not in raw_name:
            display_name = raw_name
        else:
            display_name = job_id
        display_names.append(display_name)
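
The fallback logic in this hunk can be exercised in isolation. The sketch below wraps it in a hypothetical function, `resolve_display_names`, and feeds it an invented `jobs` mapping; neither name comes from the PR, and the sketch assumes `job_ids` are simply the keys of `jobs_dict`.

```python
def resolve_display_names(jobs_dict: dict) -> list[str]:
    """Prefer a static `name`; fall back to the job id otherwise.

    Names containing a `${{ ... }}` expression are skipped because they
    cannot be resolved without evaluating the workflow.
    """
    display_names = []
    for job_id in jobs_dict:
        job_def = jobs_dict.get(job_id) or {}
        raw_name = job_def.get("name") if isinstance(job_def, dict) else None
        if isinstance(raw_name, str) and "${{" not in raw_name:
            display_names.append(raw_name)
        else:
            display_names.append(job_id)
    return display_names


# Invented sample data covering each branch.
jobs = {
    "build": {"name": "Build wheel"},              # static name is used
    "test": {"name": "Test (${{ matrix.gpu }})"},  # templated name, use id
    "lint": {},                                    # no name, use id
    "docs": None,                                  # empty definition, use id
}
print(resolve_display_names(jobs))
```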
          default: ""
          type: string
        exclude_jobs:
          description: "Comma-separated job names to exclude"
Comment on lines +491 to +506

    events.sort(key=lambda item: (item[0], item[1]))
    concurrent = 0
    peak = 0
    time_weighted_sum = 0.0
    total_time = 0.0
    previous_time = events[0][0]

    for timestamp, delta in events:
        if concurrent > 0:
            elapsed = (timestamp - previous_time).total_seconds()
            if elapsed > 0:
                time_weighted_sum += concurrent * elapsed
                total_time += elapsed
        concurrent += delta
        peak = max(peak, concurrent)
        previous_time = timestamp
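
The hunk above is a sweep over start/end events. A self-contained sketch of the same idea follows; the function name `concurrency_stats`, the `(start, end)` input shape, and the final division into an average are assumptions for illustration, since the diff does not show how events are built or how the sums are used afterwards.

```python
from datetime import datetime, timedelta


def concurrency_stats(intervals):
    """Return (peak, time-weighted average concurrency while busy).

    `intervals` is a list of (start, end) datetime pairs, one per job.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a job starts
        events.append((end, -1))    # a job ends
    if not events:
        return 0, 0.0

    # Sorting by (time, delta) puts -1 before +1 at equal timestamps,
    # so back-to-back jobs are not counted as overlapping.
    events.sort(key=lambda item: (item[0], item[1]))

    concurrent = 0
    peak = 0
    time_weighted_sum = 0.0
    total_time = 0.0
    previous_time = events[0][0]
    for timestamp, delta in events:
        if concurrent > 0:
            elapsed = (timestamp - previous_time).total_seconds()
            if elapsed > 0:
                time_weighted_sum += concurrent * elapsed
                total_time += elapsed
        concurrent += delta
        peak = max(peak, concurrent)
        previous_time = timestamp

    average = time_weighted_sum / total_time if total_time else 0.0
    return peak, average


t0 = datetime(2024, 1, 1)
minute = timedelta(minutes=1)
overlapping = [(t0, t0 + 10 * minute), (t0 + 5 * minute, t0 + 15 * minute)]
print(concurrency_stats(overlapping))
```

Because idle gaps (where `concurrent == 0`) add nothing to `total_time`, the average in this sketch describes load while at least one job is running, not utilization over the whole window.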
Member
@amdfaa Can you fix the code style error?
Adds runner-fleet-report artifact pipeline for AI Frameworks Dashboard import (ported from ROCm/aiter).
Made with Cursor