refactor(deployment): replace {{ now }} annotation with values hash#6433
refactor(deployment): replace {{ now }} annotation with values hash#6433theosanderson wants to merge 1 commit into
{{ now }} annotation with values hash#6433Conversation
Replace the `timestamp: {{ now | quote }}` annotation on every Deployment
template with `valuesHash: {{ .Values | toYaml | sha256sum }}`.
`{{ now }}` changes on every render, so each ArgoCD sync re-rolls every
pod even when nothing has actually changed. Hashing the values means
pods only roll when the chart's input values actually differ, while
still triggering a restart whenever a real change is deployed.
Refs #6431.
https://claude.ai/code/session_01SevbUTUMLU9wGCYw6f1shL
|
For CI/previews, we want all pods to roll whenever anything changes - any container image, values etc. The reason we saw the keycloak issue is as Claude says that we don't have the annotation on the keycloak pod. We can switch to hashing values but then we also need to add the hash to keycloak - which isn't happening in this PR AFAICT. One potential issue in particular for prod: the values are not all state. There's some stuff that can change without For most rollouts in prod, Main benefit of switching away from For sparse rollouts where we only roll pods that need to, we could hash relevant subtrees of values but that seems a lot of work. We might want to use We should probably also add the hash annotation to keycloak, at least for previews like done in #6432 |
Keycloak is annotated with the Loculus commit tag so it does restart on rollouts - not saying we shouldn't do something else as well. The downside is that it logs people out on PPX (but that will be happening anyway when we bump Loculus). |
Secrets are values. But yep, this is a good reason to version external resources. |
Summary
Replaces the
timestamp: {{ now | quote }}pod template annotation on every Deployment in the Helm chart withvaluesHash: {{ .Values | toYaml | sha256sum }}.Motivation
Refs #6431.
{{ now }}re-evaluates on every Helm render, so:helm diff/ dry-runs are never clean — there is always at least one phantom field change per Deployment.{{ now }}annotations in the same chart are guaranteed to disagree by a few milliseconds, which is exactly the class of bug behind The reason we getUnexpected error when handling authentication request to identity provideron dev instances #6431: the dev Keycloak DB Deployment and the Keycloak Deployment had separate restart triggers and ended up out of sync after a redeploy with the same image tag, leading to “Unexpected error when handling authentication request to identity provider”.Approach
Hash the chart's input values instead:
Properties of this approach:
values.yaml(image tag, env, replica count, feature flag, …) produces a new hash and triggers a rolling restart.Files changed
The annotation is updated in all 10 Deployment templates that previously used
{{ now }}:autoapprove-deployment.yamldocs-preview.yamlena-submission-deployment.yamlkeycloak-database-standin.yamllapis-deployment.yamlloculus-backend.yamlloculus-preprocessing-deployment.yamlloculus-website.yamlsilo-deployment.yamltaxonomy-deployment.yamlBehaviour change to be aware of
Previously, any sync would restart all pods. After this change, a sync with unchanged values will not restart pods. This is the intended behaviour, but it means workflows that relied on “re-syncing to get a fresh pod” (e.g. to refresh a
latest-tagged image, or to recover from an in-cluster issue) will need to be done explicitly viakubectl rollout restartor by bumping a value.For Keycloak specifically (the #6431 root cause), the Keycloak DB standin and Keycloak Deployment now share an annotation source. If we deploy something that wipes the dev DB, the hash will change for both, so Keycloak will roll alongside its DB.
Test plan
helm lint kubernetes/loculus -f kubernetes/loculus/values.yamlpasseshelm templaterenders without errors and the annotation appears as a 64-char hex string on each DeploymentUnexpected error when handling authentication request to identity provideron dev instances #6431 no longer occurs after a same-tag redeployhttps://claude.ai/code/session_01SevbUTUMLU9wGCYw6f1shL
Generated by Claude Code
🚀 Preview: Add
previewlabel to enable