OCPBUGS-75010: Change how DCM identifies state changes #725
The router's dynamic update code (DCM) decides when to send a state update through the API by comparing the desired state against the currently running state, and calls the API only if they differ. This does not work for every possible state: for example, a server that is DOWN because of a failing health check is identified as MAINT, which causes the API call to be skipped. This change leaves the current state empty, so the code always sends the API call whenever an update is needed, regardless of the current state.
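To illustrate the difference, here is a minimal sketch of the two decision rules. It is not the router's actual code: the `ServerState` type, its constants, and the `shouldSendOld` / `shouldSendNew` helpers are hypothetical names chosen for this example only.

```go
package main

import "fmt"

// ServerState is a hypothetical type for the state the router wants
// a backend server to have. It is not taken from the router source.
type ServerState string

const (
	StateReady ServerState = "ready"
	StateMaint ServerState = "maint"
)

// shouldSendOld models the previous rule: call the API only when the
// state read back from the running proxy differs from the desired one.
func shouldSendOld(observed, desired ServerState) bool {
	return observed != desired
}

// shouldSendNew models the rule after this change: the current state is
// left empty, so any non-empty desired state is treated as a difference.
func shouldSendNew(desired ServerState) bool {
	var current ServerState // intentionally left empty
	return current != desired
}

func main() {
	// A server that is DOWN because of a failing health check was being
	// read back as MAINT, so a requested MAINT update looked like a no-op.
	misread := StateMaint

	fmt.Println(shouldSendOld(misread, StateMaint)) // false: call skipped
	fmt.Println(shouldSendNew(StateMaint))          // true: call always sent
}
```

The trade-off in this sketch is an API call that may occasionally be redundant, in exchange for no longer depending on how the running state is interpreted.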
|
@jcmoraisjr: This pull request references NE-2477 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid. 3 validation(s) were run on this bug |
|
/retest |
|
tested it with 4.22.0-0-2026-02-11-011156-test-ci-ln-khb4wwt-latest |
|
/assign @bentito |
|
Tested it again with 4.22.0-0-2026-02-27-082328-test-ci-ln-q0disgt-latest |
|
/label qe-approved |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: |
|
@ShudiLi: This PR has been marked as verified by |
The change elegantly solves the issue. By always explicitly sending the updated server state when an update is requested, we no longer rely on parsing the HAProxy internal administrative state (which previously caused the router to skip sending the state maint command if it identified a down server as already being in maint). I've verified that UpdateServerState is only called when endpoint updates are processed, so there's no unnecessary API overhead.
|
/lgtm |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bentito |
|
/retest |
|
@jcmoraisjr: all tests passed! |
|
@jcmoraisjr: Jira Issue Verification Checks: Jira Issue OCPBUGS-75010 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. |
|
Fix included in accepted release 4.22.0-0.nightly-2026-03-11-034211 |
Jira: https://issues.redhat.com/browse/OCPBUGS-75010