From 9b9c9dd1c3b7e5c7b7fe3a5f7be7490de37481be Mon Sep 17 00:00:00 2001
From: James Wiesebron
Date: Tue, 19 May 2026 12:48:09 -0700
Subject: [PATCH] cloud-build-docker: stop clobbering :latest on PR builds
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Tag/Push cache steps unconditionally retagged the freshly-built
image under $_IMAGE_NAME:$_CACHE_TAG and pushed it. That is fine when
$_IMAGE_TAG and $_IMAGE_NAME:$_CACHE_TAG resolve to the same tag
(canonical/master builds, where both are :latest), but breaks on any
first-time PR build: build_image.py:get_effective_cache_tag() falls
back to "latest" when the requested image_tag_suffix tag does not yet
exist in Artifact Registry. So a first-time PR build runs with

    _IMAGE_TAG = $_IMAGE_NAME:revert-nfs   (correct)
    _CACHE_TAG = latest                    (read-side fallback)

The Push cache step then publishes the PR's image content under
$_IMAGE_NAME:latest, overwriting whatever master last pushed there.

In Khan/internal-services this manifested as a perpetual drift loop on
the GitHub Actions Runner terraform config: each PR plan and the master
push plan pushed different content to :latest, and the apply workflow's
chained re-plan kept seeing the runner image digest "change" even
though nothing had actually been redeployed.

The Tag/Push cache steps are redundant with the `images:` block: that
block already publishes $_IMAGE_TAG, so canonical (master) builds still
update :latest correctly. The only effect of the removed steps was the
clobber. $_CACHE_TAG continues to serve its read-side purpose in
--cache-from.
---
 .../modules/cloud-build-docker/cloudbuild.yml | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/terraform/modules/cloud-build-docker/cloudbuild.yml b/terraform/modules/cloud-build-docker/cloudbuild.yml
index e797a69..da8e6bb 100644
--- a/terraform/modules/cloud-build-docker/cloudbuild.yml
+++ b/terraform/modules/cloud-build-docker/cloudbuild.yml
@@ -32,17 +32,11 @@ steps:
       --build-arg BASE_IMAGE="$_BASE_DIGEST" \
       .
 
-- name: 'gcr.io/cloud-builders/docker'
-  id: Tag cache image
-  entrypoint: bash
-  args: ['-c', 'docker tag "$_IMAGE_TAG" "$_IMAGE_NAME:$_CACHE_TAG"']
-  waitFor: ['Build image with BuildKit']
-
-- name: 'gcr.io/cloud-builders/docker'
-  id: Push cache image
-  entrypoint: bash
-  args: ['-c', 'docker push "$_IMAGE_NAME:$_CACHE_TAG"']
-  waitFor: ['Tag cache image']
-
+# Only push the build's own tag ($_IMAGE_TAG). Do NOT also push under
+# $_IMAGE_NAME:$_CACHE_TAG: that tag is chosen as a *read* fallback by
+# build_image.py (it falls back to "latest" when the requested tag does
+# not yet exist), and pushing the build under it would clobber whatever
+# is currently at that tag. In practice this would let a first-time PR
+# build overwrite the master image at :latest with its own content.
 images:
 - '$_IMAGE_TAG'
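
Illustrative note (not part of the diff): the clobber described in the commit message can be modelled with a small Python sketch. This is an assumption-laden toy, not the real build_image.py: the signature of get_effective_cache_tag() is not shown in this patch, so the one below is hypothetical, and run_build() and the dict standing in for Artifact Registry exist only for illustration.

```python
# Hypothetical model of the fallback; the real build_image.py code and
# its function signature are not shown in this patch.
def get_effective_cache_tag(requested_tag, existing_tags):
    """Return requested_tag if it already exists, else fall back to "latest"."""
    return requested_tag if requested_tag in existing_tags else "latest"


def run_build(registry, image_tag, content, push_cache_tag):
    """Simulate one build against a dict mapping tag -> image content."""
    # The cache tag is resolved before anything is pushed.
    cache_tag = get_effective_cache_tag(image_tag, registry.keys())
    if push_cache_tag:
        # Old behaviour: the removed Tag/Push cache steps.
        registry[cache_tag] = content
    # The `images:` block always publishes the build's own tag.
    registry[image_tag] = content
    return registry


# A first-time PR build ("revert-nfs" does not exist yet) before and after the patch.
old = run_build({"latest": "master-image"}, "revert-nfs", "pr-image", push_cache_tag=True)
new = run_build({"latest": "master-image"}, "revert-nfs", "pr-image", push_cache_tag=False)
print(old["latest"])  # pr-image: the PR build overwrote :latest
print(new["latest"])  # master-image: :latest is left alone
```

In this model a master build (image tag "latest") still updates :latest with push_cache_tag=False, because the final assignment writes the build's own tag. That matches the commit message's claim that the removed steps were redundant with the `images:` block.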