[staging-test] Pin arc-staging to PR build for HUD listener changes #578

Draft
huydhn wants to merge 1 commit into pytorch:main from huydhn:test-arc-hud-listener-changes-on-staging


@huydhn huydhn commented May 16, 2026

Wires up arc-staging to exercise jeanschmidt/actions-runner-controller#5 (per-scale-set runnerLabels, 60s default interval, startup jitter).

  • osdc/clusters.yaml — add arc.image_repository + arc.image_tag under arc-staging only. Pins the controller/listener container to a local Harbor build of the PR branch. chart_version stays on jeanschmidt.9 because the chart contents are unchanged (only the image differs). Operator: bump image_tag each time you push a new build to Harbor; revert this block once the upstream chart bumps to jeanschmidt.10+.

  • osdc/modules/arc-runners/templates/runner.yaml.tpl — drop the explicit CAPACITY_AWARE_RECALCULATE_INTERVAL: "30s" env var so the listener's compiled-in default wins. Old chart (jeanschmidt.9): default is 30s (no change). New chart with PR#5: default is 60s plus startup jitter.

    The HUD URL is intentionally kept as-is. The new code overwrites the parameters query at request time, so it doesn't matter for the new build; meanwhile old-image clusters still parse the URL verbatim and rely on queuedThresholdMinutes=0 being there. Stripping the suffix would silently flip prod to the server-side default of 30 min.

  • osdc/docs/arc-fork-build-deploy.md — update the env-var table to reflect the new "unset" template value and document the jitter.
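For reviewers, the two listener behaviours that the second bullet relies on can be sketched roughly as follows. This is an illustrative Python sketch, not the listener's actual code: the function names, the uniform jitter distribution, the JSON encoding of `parameters`, and the example URL are all assumptions. Only the env var name, the 60s default, and the `parameters` / `queuedThresholdMinutes` names come from this description.

```python
import json
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# New-build default per this PR description; the old image defaults to 30s.
DEFAULT_INTERVAL_S = 60.0


def parse_seconds(value: str) -> float:
    """Parse a simple '<number>s' duration such as '30s' (other units are not handled)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported duration: {value!r}")
    return float(value[:-1])


def resolve_interval(env: dict) -> float:
    """Use the env var when it is set; otherwise fall back to the built-in default."""
    raw = env.get("CAPACITY_AWARE_RECALCULATE_INTERVAL")
    return parse_seconds(raw) if raw else DEFAULT_INTERVAL_S


def startup_delay(interval_s: float, rng: random.Random) -> float:
    """Hypothetical jitter: a random delay between 0 and one interval before the first tick."""
    return rng.uniform(0.0, interval_s)


def overwrite_parameters(url: str, params: dict) -> str:
    """Replace the `parameters` query value and keep every other query key (assumed JSON payload)."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "parameters"
    ]
    query.append(("parameters", json.dumps(params, separators=(",", ":"))))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With the env var removed from the template, `resolve_interval({})` falls through to the built-in default, which is why the same template yields 30s on the old image and 60s on the new one. Because `overwrite_parameters` discards whatever `parameters` value the configured URL carries, the existing suffix is harmless for the new build while remaining necessary for old-image clusters.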

DO NOT MERGE until the PR build image has been pushed to Harbor and the image_tag has been bumped to that build's tag.
