Skip to content

refactor(deployment): replace {{ now }} annotation with values hash#6433

Draft
theosanderson wants to merge 1 commit into
mainfrom
claude/explore-now-alternatives-SE79V
Draft

refactor(deployment): replace {{ now }} annotation with values hash#6433
theosanderson wants to merge 1 commit into
mainfrom
claude/explore-now-alternatives-SE79V

Conversation

@theosanderson
Copy link
Copy Markdown
Member

@theosanderson theosanderson commented May 13, 2026

Summary

Replaces the timestamp: {{ now | quote }} pod template annotation on every Deployment in the Helm chart with valuesHash: {{ .Values | toYaml | sha256sum }}.

Motivation

Refs #6431.

{{ now }} re-evaluates on every Helm render, so:

  • Every ArgoCD sync rolls every pod, even when nothing has actually changed.
  • helm diff / dry-runs are never clean — there is always at least one phantom field change per Deployment.
  • Two different {{ now }} annotations in the same chart are guaranteed to disagree by a few milliseconds, which is exactly the class of bug behind The reason we get Unexpected error when handling authentication request to identity provider on dev instances #6431: the dev Keycloak DB Deployment and the Keycloak Deployment had separate restart triggers and ended up out of sync after a redeploy with the same image tag, leading to “Unexpected error when handling authentication request to identity provider”.

Approach

Hash the chart's input values instead:

annotations:
  valuesHash: {{ .Values | toYaml | sha256sum }}

Properties of this approach:

  • Stable: identical inputs produce the same hash, so re-syncing without changes does not roll pods.
  • Sensitive: any change to values.yaml (image tag, env, replica count, feature flag, …) produces a new hash and triggers a rolling restart.
  • Consistent across templates: every Deployment annotated this way will agree on whether a roll is needed — no more drift between e.g. the Keycloak DB and Keycloak itself.

Files changed

The annotation is updated in all 10 Deployment templates that previously used {{ now }}:

  • autoapprove-deployment.yaml
  • docs-preview.yaml
  • ena-submission-deployment.yaml
  • keycloak-database-standin.yaml
  • lapis-deployment.yaml
  • loculus-backend.yaml
  • loculus-preprocessing-deployment.yaml
  • loculus-website.yaml
  • silo-deployment.yaml
  • taxonomy-deployment.yaml

Behaviour change to be aware of

Previously, any sync would restart all pods. After this change, a sync with unchanged values will not restart pods. This is the intended behaviour, but it means workflows that relied on “re-syncing to get a fresh pod” (e.g. to refresh a latest-tagged image, or to recover from an in-cluster issue) will need to be done explicitly via kubectl rollout restart or by bumping a value.

For Keycloak specifically (the #6431 root cause), the Keycloak DB standin and Keycloak Deployment now share an annotation source. If we deploy something that wipes the dev DB, the hash will change for both, so Keycloak will roll alongside its DB.

Test plan

https://claude.ai/code/session_01SevbUTUMLU9wGCYw6f1shL


Generated by Claude Code

🚀 Preview: Add preview label to enable

Replace the `timestamp: {{ now | quote }}` annotation on every Deployment
template with `valuesHash: {{ .Values | toYaml | sha256sum }}`.

`{{ now }}` changes on every render, so each ArgoCD sync re-rolls every
pod even when nothing has actually changed. Hashing the values means
pods only roll when the chart's input values actually differ, while
still triggering a restart whenever a real change is deployed.

Refs #6431.

https://claude.ai/code/session_01SevbUTUMLU9wGCYw6f1shL
@claude claude Bot added the deployment Code changes targetting the deployment infrastructure label May 13, 2026
@theosanderson theosanderson added the preview Triggers a deployment to argocd label May 14, 2026
@anna-parker anna-parker removed the preview Triggers a deployment to argocd label May 19, 2026
@corneliusroemer
Copy link
Copy Markdown
Contributor

corneliusroemer commented May 21, 2026

For CI/previews, we want all pods to roll whenever anything changes - any container image, values etc. The reason we saw the keycloak issue is as Claude says that we don't have the annotation on the keycloak pod. We can switch to hashing values but then we also need to add the hash to keycloak - which isn't happening in this PR AFAICT.

One potential issue in particular for prod: the values are not all state. There's some stuff that can change without Values changing like secrets or external resources downloaded by config processor.

For most rollouts in prod, Values should change, so issues should be rare, but it's something to keep in mind.

Main benefit of switching away from now is that we don't get the unnecessary rollouts every 24hr.

For sparse rollouts where we only roll pods that need to, we could hash relevant subtrees of values but that seems a lot of work.

We might want to use toJson | sha256sum | quote instead of toYaml | sha256sum. Quoting for type safety and json due to being more compact and deterministic.

We should probably also add the hash annotation to keycloak, at least for previews like done in #6432

@theosanderson
Copy link
Copy Markdown
Member Author

The reason we saw the keycloak issue is as Claude says that we don't have the annotation on the keycloak pod. We can switch to hashing values but then we also need to add the hash to keycloak - which isn't happening in this PR AFAICT.

Keycloak is annotated with the Loculus commit tag so it does restart on rollouts - not saying we shouldn't do something else as well. The downside is that it logs people out on PPX (but that will be happening anyway when we bump Loculus).

@theosanderson
Copy link
Copy Markdown
Member Author

theosanderson commented May 21, 2026

One potential issue in particular for prod: the values are not all state. There's some stuff that can change without Values changing like secrets or external resources downloaded by config processor.

Secrets are values. But yep, this is a good reason to version external resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployment Code changes targetting the deployment infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants