
fix(controller): handle long rule names in bootstrap annotation keys #224

Open
vishnukothakapu wants to merge 1 commit into kubernetes-sigs:main from vishnukothakapu:fix-annotation-key-length

Conversation

@vishnukothakapu

Description

This PR fixes a bug where NodeReadinessRule resources with long names (longer than 43 characters) caused the controller to fail when patching Node annotations. Kubernetes limits the name part of an annotation key (the segment after the "/") to 63 characters. Our key pattern was readiness.k8s.io/bootstrap-completed-<rule-name>, and the "bootstrap-completed-" prefix already uses 20 of those characters, so any rule name over 43 characters produced an invalid key.

I introduced a helper function getBootstrapAnnotationKey that deterministically hashes the rule name using MD5 when it exceeds the length limit, ensuring the final key is always valid.
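As a rough illustration of the fallback described above, here is a minimal sketch. The constant names, the function name bootstrapAnnotationKey, and the split of the key into a domain and a name prefix are assumptions made for this example; the PR's actual helper is getBootstrapAnnotationKey and its full source is not shown in this conversation.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// Assumed constants for illustration only; the PR's real constants may differ.
const (
	annotationDomain     = "readiness.k8s.io/"
	bootstrapNamePrefix  = "bootstrap-completed-" // 20 characters
	maxAnnotationNameLen = 63                     // limit on the name part of an annotation key
)

// bootstrapAnnotationKey (hypothetical name) keeps short rule names as they
// are and replaces names that would overflow the limit with their MD5 digest.
func bootstrapAnnotationKey(ruleName string) string {
	namePart := ruleName
	if len(bootstrapNamePrefix)+len(ruleName) > maxAnnotationNameLen {
		sum := md5.Sum([]byte(ruleName))      // 16 bytes
		namePart = hex.EncodeToString(sum[:]) // 32 hex characters
	}
	return annotationDomain + bootstrapNamePrefix + namePart
}

func main() {
	for _, name := range []string{"gpu-ready", strings.Repeat("a", 44)} {
		key := bootstrapAnnotationKey(name)
		fmt.Printf("%s (name part: %d chars)\n", key, len(key)-len(annotationDomain))
	}
}
```

With these assumed constants, a hashed name part is always 20 + 32 = 52 characters, which stays under the 63-character limit regardless of the rule name's length.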

Related Issue

Fixes #223

Type of Change

/kind bug

Testing

  • Added internal/controller/helper_unit_test.go: Pure unit tests covering short, medium, and very long name scenarios to verify deterministic hashing and length compliance.
  • Added internal/controller/node_controller_reproduction_test.go: Reproduction test case that confirms the controller can now successfully reconcile rules with long names.
  • Verified with go test and go vet.

Checklist

  • make test passes
  • make lint passes

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 7, 2026
@netlify

netlify Bot commented May 7, 2026

Deploy Preview for node-readiness-controller canceled.

Name Link
🔨 Latest commit 261f03d
🔍 Latest deploy log https://app.netlify.com/projects/node-readiness-controller/deploys/6a040ed4f962790008481603

@k8s-ci-robot k8s-ci-robot requested a review from mrunalp May 7, 2026 08:45
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vishnukothakapu
Once this PR has been reviewed and has the lgtm label, please assign ajaysundark for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from tallclair May 7, 2026 08:45
@k8s-ci-robot
Contributor

Welcome @vishnukothakapu!

It looks like this is your first PR to kubernetes-sigs/node-readiness-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/node-readiness-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 7, 2026
@k8s-ci-robot
Contributor

Hi @vishnukothakapu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 7, 2026
Comment thread internal/controller/helper.go Outdated
Comment thread internal/controller/helper.go Outdated
@vishnukothakapu vishnukothakapu force-pushed the fix-annotation-key-length branch from 538cca4 to d42de51 Compare May 7, 2026 13:08
@ajaysundark ajaysundark self-requested a review May 9, 2026 13:02
@ajaysundark
Contributor

Thanks for catching this. My only thought is that this takes away the human observability of when a bootstrap rule is done. :/

// 20 (prefix) + 32 (hash) = 52 characters.
namePart = hex.EncodeToString(hash[:16])
}
return fmt.Sprintf("%s%s", bootstrapAnnotationPrefix, namePart)
Contributor


Should we go with a hybrid approach, something like the below?

availableSpace := maxK8sAnnotationNameLen - len(bootstrapAnnotationPrefix)

const hashLen = 8
humanPartLen := availableSpace - hashLen - 1

hashBytes := sha256.Sum256([]byte(ruleName))
shortHash := hex.EncodeToString(hashBytes[:])[:hashLen]

namePart := fmt.Sprintf("%s-%s", ruleName[:humanPartLen], shortHash)


return fmt.Sprintf("%s%s", bootstrapAnnotationPrefix, namePart)

This way we keep the ruleName readable in the key.
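For reference, a self-contained sketch of the hybrid idea follows. The constant values and the function name hybridBootstrapAnnotationKey are assumptions for illustration. The sketch also adds a length guard that the snippet above leaves out: without it, slicing ruleName[:humanPartLen] would panic for names shorter than the truncation length. It assumes rule names are ASCII (as DNS subdomain names are), so slicing by bytes is safe.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Assumed constants for illustration only.
const (
	annotationDomain     = "readiness.k8s.io/"
	bootstrapNamePrefix  = "bootstrap-completed-" // 20 characters
	maxAnnotationNameLen = 63
	hashLen              = 8
)

// hybridBootstrapAnnotationKey (hypothetical name) keeps short rule names
// unchanged and, for long ones, keeps a readable prefix plus a short hash.
func hybridBootstrapAnnotationKey(ruleName string) string {
	available := maxAnnotationNameLen - len(bootstrapNamePrefix) // 43
	if len(ruleName) <= available {
		return annotationDomain + bootstrapNamePrefix + ruleName
	}
	sum := sha256.Sum256([]byte(ruleName))
	shortHash := hex.EncodeToString(sum[:])[:hashLen]
	humanPartLen := available - hashLen - 1 // 34, leaving room for "-" and the hash
	return fmt.Sprintf("%s%s%s-%s",
		annotationDomain, bootstrapNamePrefix, ruleName[:humanPartLen], shortHash)
}

func main() {
	long := "network-plugin-ready-" + strings.Repeat("x", 40)
	fmt.Println(hybridBootstrapAnnotationKey("gpu-ready"))
	fmt.Println(hybridBootstrapAnnotationKey(long))
}
```

Under these assumptions a truncated name part is exactly 20 + 34 + 1 + 8 = 63 characters. An 8-character hash keeps keys readable, though it makes collisions between rules sharing the same 34-character prefix more likely than with a full digest.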

Contributor


We have a lot of options to handle this, and I added a different approach as well. We could discuss this further in the meeting; feel free to join if you are available, @vishnukothakapu.

@ajaysundark
Contributor

The annotation restrictions are <dns-prefix: 253>/<name: 63>; metadata.name can also be 253 chars in length.

Hashing the rule name is one option; a couple of alternatives to consider:

  1. Restricting the length of the rule name to 63 (we could also move "bootstrap-completed" to the value?)
  2. Alternatively, it would be safer to use a standard key like "readiness.k8s.io/rule-status" with a JSON payload as the value. The value doesn't seem to have a length restriction, so this would allow us to capture the full rule name inside the payload.

We need to evaluate the pros and cons of each implementation. @vishnukothakapu Do you want to evaluate the alternatives and propose a plan here?
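To make the second alternative concrete, here is a minimal sketch of a single annotation holding a JSON payload keyed by rule name. Only the key "readiness.k8s.io/rule-status" comes from the comment above; the ruleStatus schema, its field name, and the markBootstrapCompleted helper are hypothetical. One caveat worth weighing: individual values have no separate limit, but Kubernetes caps the combined size of all annotations on an object at 256 KiB.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Key suggested in the review comment.
const ruleStatusAnnotationKey = "readiness.k8s.io/rule-status"

// ruleStatus is a hypothetical payload schema, not taken from the PR.
type ruleStatus struct {
	BootstrapCompleted bool `json:"bootstrapCompleted"`
}

// markBootstrapCompleted (hypothetical helper) records a rule under the shared
// key. The rule name lives in the value, so its length no longer affects the key.
func markBootstrapCompleted(annotations map[string]string, ruleName string) error {
	statuses := map[string]ruleStatus{}
	if raw, ok := annotations[ruleStatusAnnotationKey]; ok && raw != "" {
		if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
			return fmt.Errorf("decoding %s: %w", ruleStatusAnnotationKey, err)
		}
		if statuses == nil { // a stored JSON "null" resets the map to nil
			statuses = map[string]ruleStatus{}
		}
	}
	statuses[ruleName] = ruleStatus{BootstrapCompleted: true}
	encoded, err := json.Marshal(statuses) // map keys are emitted in sorted order
	if err != nil {
		return fmt.Errorf("encoding %s: %w", ruleStatusAnnotationKey, err)
	}
	annotations[ruleStatusAnnotationKey] = string(encoded)
	return nil
}

func main() {
	annotations := map[string]string{}
	for _, rule := range []string{"rule-b", "rule-a"} {
		if err := markBootstrapCompleted(annotations, rule); err != nil {
			panic(err)
		}
	}
	fmt.Println(annotations[ruleStatusAnnotationKey])
}
```

Because every rule shares one key, each update is a read-modify-write of the whole value, so concurrent reconciles touching the same Node would need conflict handling that per-rule keys do not.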

@vishnukothakapu
Author

Thanks for catching this. My only thought is that this takes away the human observability of when a bootstrap rule is done. :/

Good point. I agree the full hash reduces readability during debugging. I’ll explore the hybrid approach with a readable prefix + short hash and compare it with the other alternatives discussed.

The annotation restrictions are <dns-prefix: 253>/<name: 63>; metadata.name can also be 253 chars in length.

Hashing the rule name is one option; a couple of alternatives to consider:

  1. Restricting the length of the rule name to 63 (we could also move "bootstrap-completed" to the value?)
  2. Alternatively, it would be safer to use a standard key like "readiness.k8s.io/rule-status" with a JSON payload as the value. The value doesn't seem to have a length restriction, so this would allow us to capture the full rule name inside the payload.

We need to evaluate the pros and cons of each implementation. @vishnukothakapu Do you want to evaluate the alternatives and propose a plan here?

Thanks @ajaysundark, these are good points. I’ll evaluate the tradeoffs between the current hashing approach, the hybrid readable-prefix approach, and the single-annotation JSON payload design, then propose a direction based on readability, implementation complexity, and backward compatibility.

@ajaysundark
Contributor

/assign @vishnukothakapu

@vishnukothakapu
Author

Hi @AvineshTripathi & @ajaysundark,
Thanks for the suggestion! I decided to go with the hybrid approach (truncated-name + short-hash) instead of the JSON payload approach because it preserves Kubernetes native selector support while still avoiding annotation length issues.
This keeps the annotations readable for debugging, ensures uniqueness with a deterministic hash, and stays compatible with the current architecture and existing workflows. I have updated the implementation and adjusted the related unit and integration tests accordingly.

@vishnukothakapu vishnukothakapu force-pushed the fix-annotation-key-length branch from 87864cf to 261f03d Compare May 13, 2026 05:40
@ajaysundark
Contributor

instead of the JSON payload approach because it preserves Kubernetes native selector support

Could you clarify your thoughts further on this? What are the downsides of using a JSON payload? This is also how kubectl saves last-applied configurations in objects today - ref: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#how-to-create-objects

@AvineshTripathi / @Karthik-K-N I think fixing this short-term with a hash-based approach for length immunity doesn't feel right. A more reliable long-term solution would be to maintain the rule status inside a JSON payload to track individual rule evaluation data. It would also address concerns such as #247


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

[bug] Annotation key length limit exceeded for long NodeReadinessRule names

5 participants