cloud-build-docker: stop clobbering :latest on PR builds #33

Merged: jwbron merged 1 commit into main from jwies-fix-cache-tag-clobber on May 19, 2026

@jwbron commented May 19, 2026

Summary

The cloud-build-docker module's cloudbuild.yml has an explicit pair of steps, "Tag cache image" and "Push cache image", that unconditionally push the freshly built image under $_IMAGE_NAME:$_CACHE_TAG. $_CACHE_TAG is meant only for the read side: it is chosen by build_image.py:get_effective_cache_tag(), which falls back to "latest" whenever the caller's requested image_tag_suffix tag doesn't yet exist in Artifact Registry. So a first-time PR build runs with:

_IMAGE_TAG  = $_IMAGE_NAME:revert-nfs   (intended: publish under :revert-nfs)
_CACHE_TAG  = latest                    (read-side fallback)

…and the cache push step then writes the PR's image content to $_IMAGE_NAME:latest, overwriting whatever master last pushed there.
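To make the fallback concrete, here is a minimal sketch of the tag selection described above. It is not the actual build_image.py code: the function signature, the `tag_exists` callback, the `substitutions` helper, and the `example/runner` image name are all assumptions made for illustration.

```python
from typing import Callable, Dict


def get_effective_cache_tag(image_tag_suffix: str,
                            tag_exists: Callable[[str], bool]) -> str:
    """Hypothetical stand-in for build_image.py:get_effective_cache_tag()."""
    # Use the branch's own tag as the cache source when it exists;
    # otherwise fall back to "latest" (intended for the read side only).
    return image_tag_suffix if tag_exists(image_tag_suffix) else "latest"


def substitutions(image_name: str, image_tag_suffix: str,
                  tag_exists: Callable[[str], bool]) -> Dict[str, str]:
    """Assumed shape of the Cloud Build substitutions for one build."""
    return {
        "_IMAGE_NAME": image_name,
        "_IMAGE_TAG": f"{image_name}:{image_tag_suffix}",
        "_CACHE_TAG": get_effective_cache_tag(image_tag_suffix, tag_exists),
    }


# First-time PR build: only :latest exists in the (simulated) registry.
existing_tags = {"latest"}
subs = substitutions("example/runner", "revert-nfs", existing_tags.__contains__)

# The removed steps pushed to $_IMAGE_NAME:$_CACHE_TAG, which here is :latest.
clobbered_ref = f"{subs['_IMAGE_NAME']}:{subs['_CACHE_TAG']}"
print(subs["_IMAGE_TAG"], clobbered_ref)
```

Once the branch tag exists in the registry, the same function returns the branch tag, which is why only first-time PR builds hit the clobber.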

The Tag/Push cache steps are redundant with the existing images: block, which already publishes $_IMAGE_TAG — canonical (master) builds still update :latest correctly via images:. The only effect of the removed steps was the clobber. $_CACHE_TAG continues to be used for the read side (--cache-from).
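The redundancy argument can be checked with a small model of which image references a build publishes, with and without the Tag/Push cache steps. This is a sketch under stated assumptions: `published_refs`, its parameters, and the `example/runner` image name are hypothetical, and the model covers only the `images:` block and the two removed steps.

```python
from typing import Set


def published_refs(image_name: str, image_tag: str, cache_tag: str,
                   cache_push_steps: bool) -> Set[str]:
    """Model the refs one build publishes (hypothetical helper)."""
    refs = {image_tag}  # the images: block always publishes $_IMAGE_TAG
    if cache_push_steps:
        # The removed steps also pushed $_IMAGE_NAME:$_CACHE_TAG.
        refs.add(f"{image_name}:{cache_tag}")
    return refs


NAME = "example/runner"  # assumed image name, for illustration only

# Canonical (master) build: both references resolve to :latest.
master_before = published_refs(NAME, f"{NAME}:latest", "latest", True)
master_after = published_refs(NAME, f"{NAME}:latest", "latest", False)

# First-time PR build: the cache tag has fallen back to "latest".
pr_before = published_refs(NAME, f"{NAME}:revert-nfs", "latest", True)
pr_after = published_refs(NAME, f"{NAME}:revert-nfs", "latest", False)

print(sorted(master_before), sorted(master_after))
print(sorted(pr_before), sorted(pr_after))
```

In this model the master build publishes the same set either way, while the PR build stops publishing :latest once the steps are removed.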

Why this matters

This bug caused a perpetual drift loop on Khan/internal-services's GitHub Actions Runner terraform config. The data.external "build_image" data source reads :latest's digest into the plan; when concurrent PR + master plan workflows each pushed different image content to :latest, the apply workflow's chained re-plan kept seeing the runner image digest "change" even when nothing had actually been redeployed, opening repeated terraform-plan PRs.

Once this is merged, internal-services should bump its module ref (currently ref=cloud-build-docker-v0.2.0 or v0.3.0) to the new tag.

Test plan

  • Tag a new release (e.g. cloud-build-docker-v0.4.0) after merging.
  • In a consumer repo, bump the module ref= and run a PR plan against a branch whose tag doesn't yet exist in AR; confirm Cloud Build pushes only $_IMAGE_TAG (the branch tag) and does NOT also push :latest.
  • The subsequent master push plan should be the only thing that updates :latest.
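The second check in the test plan could be automated along these lines. This is only a sketch: it assumes the list of image references a build pushed has already been collected by some other means, and `unexpected_refs` and the `example/runner` image name are hypothetical.

```python
from typing import Iterable, List


def unexpected_refs(pushed: Iterable[str], image_name: str,
                    branch_tag: str) -> List[str]:
    """Return every pushed ref other than the expected branch ref."""
    expected = f"{image_name}:{branch_tag}"
    return sorted(set(pushed) - {expected})


# With the fix, a first-time PR build should push only the branch tag.
fixed = unexpected_refs(["example/runner:revert-nfs"],
                        "example/runner", "revert-nfs")

# Before the fix, the same build also pushed :latest.
buggy = unexpected_refs(["example/runner:revert-nfs", "example/runner:latest"],
                        "example/runner", "revert-nfs")
print(fixed, buggy)
```

An empty result means the PR build touched nothing but its own branch tag.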

The Tag/Push cache steps unconditionally retagged the freshly-built
image under $_IMAGE_NAME:$_CACHE_TAG and pushed it. That is fine when
$_IMAGE_TAG and $_IMAGE_NAME:$_CACHE_TAG resolve to the same tag
(canonical/master builds, where both are :latest), but breaks on any
first-time PR build:

  build_image.py:get_effective_cache_tag() falls back to "latest" when
  the requested image_tag_suffix tag does not yet exist in Artifact
  Registry. So a first-time PR build runs with
    _IMAGE_TAG    = $_IMAGE_NAME:revert-nfs    (correct)
    _CACHE_TAG    = latest                     (read-side fallback)
  The Push cache step then publishes the PR's image content under
  $_IMAGE_NAME:latest, overwriting whatever master last pushed there.

In Khan/internal-services this manifested as a perpetual drift loop on
the GitHub Actions Runner terraform config: each PR plan and the master
push plan pushed different content to :latest, and the apply workflow's
chained re-plan kept seeing the runner image digest "change" even
though nothing had actually been redeployed.

The Tag/Push cache steps are redundant with the `images:` block — that
block already publishes $_IMAGE_TAG, so canonical (master) builds still
update :latest correctly. The only effect of the removed steps was the
clobber. $_CACHE_TAG continues to serve its read-side purpose in
--cache-from.
@jwbron jwbron requested a review from csilvers May 19, 2026 20:06

@csilvers csilvers left a comment

That seems right to me!

@jwbron jwbron merged commit ab60540 into main May 19, 2026
1 check passed