Skip to content

fix: reconciliation loop in taskspawner controller#864

Open
gjkim42 wants to merge 2 commits intomainfrom
fix-ghproxy
Open

fix: reconciliation loop in taskspawner controller#864
gjkim42 wants to merge 2 commits intomainfrom
fix-ghproxy

Conversation

@gjkim42
Copy link
Copy Markdown
Collaborator

@gjkim42 gjkim42 commented Mar 31, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

The taskspawner controller was in an infinite reconciliation loop, constantly updating deployments and cronjobs even when nothing changed. This caused repeated pod restarts which exhausted the GitHub API rate limit, breaking all spawner discovery.

Root cause: reflect.DeepEqual was used to compare init containers and volumes between the desired spec (from the builder) and the live object (from the API server). Kubernetes normalizes objects by adding defaulted fields that the builder doesn't set:

  • Init containers: securityContext, terminationMessagePath, terminationMessagePolicy
  • Volumes: Secret.DefaultMode (defaults to 420)

This meant every reconciliation detected a "change" and called Update, which triggered another reconciliation.

Fix: Replace reflect.DeepEqual with semantic comparison functions (initContainersEqual, volumesEqual) that only compare the fields the builder explicitly sets — matching the pattern already used for main container comparison and resourceRequirementsEqual.

Which issue(s) this PR is related to:

N/A

Special notes for your reviewer:

The comparison bug was pre-existing but was amplified by #859's new Task watcher, which added more reconciliation triggers. The fix applies to both updateDeployment() and updateCronJob().

Does this PR introduce a user-facing change?

NONE

Summary by cubic

Fixes an infinite reconciliation loop in the taskspawner controller by using semantic comparisons for init containers and volumes. This stops unnecessary Deployment/CronJob updates, prevents pod restarts, and avoids GitHub API rate limit exhaustion.

  • Bug Fixes
    • Compare only builder-set fields for init containers and volumes, ignoring Kubernetes defaults (securityContext, terminationMessage*, Secret.DefaultMode).
    • Treat empty init container ImagePullPolicy as matching any API-defaulted value to avoid false drift.
    • Apply the comparison in both Deployment and CronJob reconciliation paths.
    • Add tests to ensure defaults (including pull policy) are ignored and real changes are detected.

Written for commit a17fcdd. Summary will update on new commits.

…ciler

reflect.DeepEqual on init containers and volumes always returned false
because Kubernetes adds defaulted fields (securityContext,
terminationMessagePath, Secret.DefaultMode) that the builder does not
set. This caused an infinite reconciliation loop where every reconcile
updated the deployment, which triggered another reconcile. The constant
pod restarts exhausted the GitHub API rate limit.

Replace reflect.DeepEqual with initContainersEqual and volumesEqual that
compare only the fields the builder explicitly sets.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

When TokenRefresherImagePullPolicy is left at its default empty value
(the Helm chart default), the API server persists a concrete policy
(Always/IfNotPresent). The verbatim comparison would still detect drift
on every reconcile. Skip the comparison when either side is empty.
@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented Mar 31, 2026

/kelow review

@gjkim42
Copy link
Copy Markdown
Collaborator Author

gjkim42 commented Mar 31, 2026

/kelos review

Copy link
Copy Markdown
Collaborator Author

@gjkim42 gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/kelos review

@kelos-bot
Copy link
Copy Markdown

kelos-bot bot commented Mar 31, 2026

🤖 Kelos Task Status

Task kelos-reviewer-864 has succeeded. ✅

Copy link
Copy Markdown

@kelos-bot kelos-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos Reviewer Agent @gjkim42

Review Summary

Verdict: APPROVE
Scope: Replace reflect.DeepEqual with semantic comparisons for init containers and volumes to break reconciliation loop

Findings

Correctness

  • The fix correctly addresses the root cause: Kubernetes-defaulted fields (securityContext, terminationMessagePath, terminationMessagePolicy, Secret.DefaultMode) causing reflect.DeepEqual to always report false.
  • initContainersEqual (taskspawner_controller.go:970) compares exactly the fields the builder sets (Name, Image, ImagePullPolicy, RestartPolicy, Env, VolumeMounts, Resources) — verified against taskspawner_deployment_builder.go:159-176.
  • pullPolicyEqual (taskspawner_controller.go:992) correctly treats empty as a wildcard. This avoids a loop when the builder omits ImagePullPolicy and the API server fills in a default, while still detecting intentional policy changes when both sides are non-empty.
  • volumesEqual (taskspawner_controller.go:1002) correctly compares only Name and the key identifier fields (SecretName, ConfigMap.Name), matching what the builder sets at taskspawner_deployment_builder.go:105-118.
  • The fix is applied consistently to both updateDeployment and updateCronJob paths.

Tests

  • Good coverage: ignore-defaults tests (TestInitContainersEqual_IgnoresKubernetesDefaults, TestVolumesEqual_IgnoresSecretDefaultMode) and detect-changes tests (TestInitContainersEqual_DetectsRealChanges, TestVolumesEqual_DetectsRealChanges) exercise the key scenarios.
  • TestInitContainersEqual_IgnoresAPIDefaultedPullPolicy specifically covers the empty-policy edge case.
  • All unit tests pass.

Conventions

  • Commit messages are concise with no PR links.
  • No unrelated changes or unnecessary refactoring.
  • Follows the existing comparison pattern established by resourceRequirementsEqual and equalEnvVars.

Suggestions (optional)

  • initContainersEqual and volumesEqual are tightly coupled to the set of fields the builder populates. If the builder gains new fields (e.g., Command, Args, or a new volume type beyond EmptyDir/Secret/ConfigMap), the comparison functions would silently ignore them. This mirrors how equalEnvVars works elsewhere so it's consistent, but worth keeping in mind during future builder changes.

/kelos needs-input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant