fix(deployment): restart keycloak on every deploy when dev DB is wiped#6432
Closed
theosanderson-agent wants to merge 1 commit into
Closed
fix(deployment): restart keycloak on every deploy when dev DB is wiped#6432theosanderson-agent wants to merge 1 commit into
theosanderson-agent wants to merge 1 commit into
Conversation
Preview/dev deploys reset the keycloak DB on each Helm sync (the keycloak-database-standin pod has a `timestamp: now` annotation, and without persistence the pod has no PVC so its data is lost on every restart). The keycloak pod itself however only restarted when the docker tag changed (via the `LOCULUS_VERSION` env var). When a redeploy happened without a version bump the DB was wiped but keycloak kept running with stale internal state, causing the "Unexpected error when handling authentication request to identity provider" errors users were seeing on dev instances (#6431). Add a `timestamp: now` pod annotation to the keycloak Deployment, gated on `runDevelopmentKeycloakDatabase` AND NOT `developmentDatabasePersistence`, so keycloak is recreated on every Helm sync exactly when its DB is. We deliberately do NOT add the timestamp in prod or in persistent dev mode — #4326 removed an unconditional timestamp here because it was logging users out every 24h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
|
now unsure about this |
Contributor
|
What's wrong about restarting keycloak? This is necessary whenever we restart the db in previews - otherwise db tables aren't initialized. So irrespective of using Maybe it already restarts almost always due to keycloakify theme commit changing. But there seem to be edge cases where the db renews but keycloak doesn't. |
5 tasks
Member
|
This made it happen every 24 hrs, and iirc all users were logged out when it was restarted - that's why we removed it |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #6431.
Preview/dev deployments reset the Keycloak DB on every Helm sync — the
loculus-keycloak-databasepod has atimestamp: {{ now }}annotation, and (whendevelopmentDatabasePersistence: false) no PVC, so its data is lost whenever the pod is recreated.The Keycloak pod itself was only restarted when the docker tag changed (via the
LOCULUS_VERSIONenv var added in #4720). So a redeploy with the same image tag (e.g. a config-only sync, or two argo syncs of the same commit) would wipe the DB without restarting Keycloak. Keycloak then ran on with internal/in-memory state pointing at rows that no longer existed, producing theUnexpected error when handling authentication request to identity providererrors.Fix
Add a
timestamp: {{ now }}annotation to the Keycloak pod template, gated onrunDevelopmentKeycloakDatabase && !developmentDatabasePersistence— so the Keycloak Deployment rolls every sync, exactly when its DB is going to be wiped.We deliberately do not restart Keycloak unconditionally — #4326 removed an unconditional timestamp here because it was rolling Keycloak every 24h in production and logging everyone out. The new condition only fires for ephemeral dev/preview DBs, never for prod or for persistent dev DBs.
runDevelopmentKeycloakDatabasedevelopmentDatabasePersistencefalse(prod)truetrue(persistent PVC)truefalse(ephemeral, default for previews)🤖 Generated with Claude Code
🚀 Preview: Add
previewlabel to enable