
OCPBUGS-75010: Change how DCM identifies state changes#725

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from jcmoraisjr:NE-2477-dcm-update-state on Mar 10, 2026

Conversation

@jcmoraisjr (Member) commented Feb 2, 2026

Router's dynamic update code (DCM) decides when a state update should be sent via the API by comparing against the current running state, calling the API only when the state differs. This does not work for all possible states: for example, a server that is DOWN due to a failing health check is identified as MAINT, causing the API call to be skipped.

Changed the approach to leave the current state empty, so the code always sends the API call whenever an update is needed, regardless of what the current state is.

Jira: https://issues.redhat.com/browse/OCPBUGS-75010
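For illustration, here is a minimal Go sketch of the two approaches; all names in it (`dcmServer`, `sendSetServerState`, `updateOld`, `updateNew`) are hypothetical, not the router's actual identifiers. The old path skipped the socket call whenever the cached state matched the desired one, so a DOWN server misread as MAINT never received the command; the new path always sends it:

```go
package main

import "fmt"

// dcmServer is a hypothetical stand-in for a DCM-managed backend server.
type dcmServer struct {
	name         string
	currentState string // cached view of HAProxy's state; can be wrong (e.g. DOWN parsed as "maint")
}

// sendSetServerState is a placeholder for the HAProxy admin-socket call,
// i.e. "set server <backend>/<server> state <state>".
func sendSetServerState(name, state string) {
	fmt.Printf("set server %s state %s\n", name, state)
}

// updateOld models the previous behavior: skip the call when the cached
// state already matches. If the cache misclassified the real state, the
// update was silently dropped.
func updateOld(s *dcmServer, desired string) {
	if s.currentState == desired {
		return // bug path: nothing is sent
	}
	sendSetServerState(s.name, desired)
	s.currentState = desired
}

// updateNew models the fix: keep no trusted cached state and send the
// call unconditionally whenever an update is requested.
func updateNew(s *dcmServer, desired string) {
	sendSetServerState(s.name, desired)
}

func main() {
	s := &dcmServer{name: "_dynamic-pod-1", currentState: "maint"} // DOWN server misread as MAINT
	updateOld(s, "maint") // old: call skipped, server never actually enters maint
	updateNew(s, "maint") // new: call always sent
}
```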

@openshift-ci-robot added the jira/valid-reference label Feb 2, 2026
@openshift-ci-robot (Contributor) commented Feb 2, 2026

@jcmoraisjr: This pull request references NE-2477 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.


@jcmoraisjr changed the title from "NE-2477: Change how DCM identifies state changes" to "OCPBUGS-75010: Change how DCM identifies state changes" on Feb 3, 2026
@openshift-ci-robot added the jira/valid-bug label Feb 3, 2026
@openshift-ci-robot (Contributor):

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

@openshift-ci-robot (Contributor):

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@lihongan commented Feb 4, 2026

/retest

@ShudiLi commented Feb 11, 2026

Tested it with 4.22.0-0-2026-02-11-011156-test-ci-ln-khb4wwt-latest

1.
% oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-02-11-011156-test-ci-ln-khb4wwt-latest   True        False         32m     Cluster version is 4.22.0-0-2026-02-11-011156-test-ci-ln-khb4wwt-latest

2.
% oc get route
NAME          HOST/PORT                                                             PATH   SERVICES      PORT          TERMINATION   WILDCARD
unsec-apach   unsec-apach-default.apps.ci-ln-khb4wwt-76ef8.aws-4.ci.openshift.org          unsec-apach   unsec-apach                 None
% oc get EndpointSlice unsec-apach-42vzq   # the port was 9090, on which the backend wasn't listening
NAME                ADDRESSTYPE   PORTS   ENDPOINTS                AGE
unsec-apach-42vzq   IPv4          9090    10.129.2.8,10.128.2.19   30m

3.
% oc get pods -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
appach-server-66b4878747-hfcvb   1/1     Running   0          19m   10.128.2.18   ip-10-0-89-146.ec2.internal   <none>           <none>
appach-server-66b4878747-hsttr   1/1     Running   0          19m   10.131.0.23   ip-10-0-48-149.ec2.internal   <none>           <none>
appach-server-66b4878747-jhzr7   1/1     Running   0          19m   10.129.2.8    ip-10-0-85-191.ec2.internal   <none>           <none>

4.
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-hfcvb:unsec-apach:unsec-apach:10.128.2.18:9090 0 0 6
pod:appach-server-66b4878747-jhzr7:unsec-apach:unsec-apach:10.129.2.8:9090 0 0 6
pod:appach-server-66b4878747-hsttr:unsec-apach:unsec-apach:10.131.0.23:9090 0 0 6
_dynamic-pod-1 0 5 14
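For readers following the transcript: the loop above queries HAProxy's admin socket for the server-state dump and keeps columns 4, 6, 7, and 14 (srv_name, srv_op_state, srv_admin_state, srv_check_state). In HAProxy's server-state format, srv_op_state 2 means UP and 0 means DOWN, and a nonzero srv_admin_state marks a maintenance mode. A rough Go equivalent of the same query — a sketch, not the router's code; socket path and backend name are taken from the transcript above:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

func main() {
	// Connect to HAProxy's admin socket (path from the router pod transcript).
	conn, err := net.Dial("unix", "/var/lib/haproxy/run/haproxy.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Ask for the state of every server in the backend.
	fmt.Fprintln(conn, "show servers state be_http:default:unsec-apach")

	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		// Strip the "# " prefix from the header line, then split on whitespace.
		fields := strings.Fields(strings.TrimPrefix(scanner.Text(), "# "))
		// Columns (1-based): 4=srv_name, 6=srv_op_state, 7=srv_admin_state, 14=srv_check_state.
		if len(fields) >= 14 {
			fmt.Println(fields[3], fields[5], fields[6], fields[13])
		}
	}
}
```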

5. scale the deployment to 2
% oc scale deployment appach-server  --replicas=2
deployment.apps/appach-server scaled
% oc get pods -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
appach-server-66b4878747-hsttr   1/1     Running   0          22m   10.131.0.23   ip-10-0-48-149.ec2.internal   <none>           <none>
appach-server-66b4878747-jhzr7   1/1     Running   0          22m   10.129.2.8    ip-10-0-85-191.ec2.internal   <none>           <none>

6.
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-hfcvb:unsec-apach:unsec-apach:10.128.2.18:9090 0 1 14
pod:appach-server-66b4878747-jhzr7:unsec-apach:unsec-apach:10.129.2.8:9090 0 0 6
pod:appach-server-66b4878747-hsttr:unsec-apach:unsec-apach:10.131.0.23:9090 0 0 6
_dynamic-pod-1 0 5 14

7. scale the deployment to 4
% oc scale deployment appach-server  --replicas=4
deployment.apps/appach-server scaled
% oc get pods -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
appach-server-66b4878747-dbdgd   1/1     Running   0          20s   10.131.0.24   ip-10-0-48-149.ec2.internal   <none>           <none>
appach-server-66b4878747-f75kt   1/1     Running   0          20s   10.128.2.19   ip-10-0-89-146.ec2.internal   <none>           <none>
appach-server-66b4878747-hsttr   1/1     Running   0          23m   10.131.0.23   ip-10-0-48-149.ec2.internal   <none>           <none>
appach-server-66b4878747-jhzr7   1/1     Running   0          23m   10.129.2.8    ip-10-0-85-191.ec2.internal   <none>           <none>

8.
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-f75kt:unsec-apach:unsec-apach:10.128.2.19:9090 0 0 6
pod:appach-server-66b4878747-jhzr7:unsec-apach:unsec-apach:10.129.2.8:9090 0 0 6
pod:appach-server-66b4878747-hsttr:unsec-apach:unsec-apach:10.131.0.23:9090 0 1 14
pod:appach-server-66b4878747-dbdgd:unsec-apach:unsec-apach:10.131.0.24:9090 0 1 14
_dynamic-pod-1 0 5 14

9. check the backend of the canary route
sh-5.1$ for back in canary; do echo $back; echo show servers state be_tcp:openshift-ingress-canary:canary | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
canary
srv_name srv_op_state srv_admin_state srv_check_state
pod:ingress-canary-6k5s2:ingress-canary:8443-tcp:10.128.2.15:8443 2 0 6
pod:ingress-canary-rwbvg:ingress-canary:8443-tcp:10.129.2.12:8443 2 0 6
pod:ingress-canary-dsjmn:ingress-canary:8443-tcp:10.131.0.14:8443 2 0 6
_dynamic-pod-1 0 5 14

@alebedev87 (Contributor):

/assign @bentito

@ShudiLi commented Feb 27, 2026

Tested it again with 4.22.0-0-2026-02-27-082328-test-ci-ln-q0disgt-latest

1.
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-wc79f:unsec-apach:unsec-apach:10.128.2.7:8080 0 0 6
pod:appach-server-66b4878747-kwdnw:unsec-apach:unsec-apach:10.128.2.13:8080 2 0 6
pod:appach-server-66b4878747-68qwp:unsec-apach:unsec-apach:10.129.2.10:8080 0 0 6
pod:appach-server-66b4878747-2zd2d:unsec-apach:unsec-apach:10.129.2.15:8080 2 0 6
_dynamic-pod-1 0 4 6

2. start the http service on the server pod, and run the command again; _dynamic-pod-1 now reports srv_op_state 2 (UP)
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-wc79f:unsec-apach:unsec-apach:10.128.2.7:8080 0 0 6
pod:appach-server-66b4878747-kwdnw:unsec-apach:unsec-apach:10.128.2.13:8080 2 0 6
pod:appach-server-66b4878747-68qwp:unsec-apach:unsec-apach:10.129.2.10:8080 0 0 6
pod:appach-server-66b4878747-2zd2d:unsec-apach:unsec-apach:10.129.2.15:8080 2 0 6
_dynamic-pod-1 2 4 6

3.
% oc get pods                                      
NAME                             READY   STATUS    RESTARTS   AGE
appach-server-66b4878747-2zd2d   1/1     Running   0          51m
appach-server-66b4878747-68qwp   1/1     Running   0          51m
appach-server-66b4878747-9nln6   1/1     Running   0          14m
appach-server-66b4878747-gcqzf   1/1     Running   0          2m58s
appach-server-66b4878747-jkwhk   1/1     Running   0          10m
appach-server-66b4878747-kwdnw   1/1     Running   0          55m
appach-server-66b4878747-m5rvj   1/1     Running   0          2m58s
appach-server-66b4878747-wc79f   1/1     Running   0          55m
appach-server-66b4878747-zmx5g   1/1     Running   0          10m

4.
sh-5.1$ curl http://127.0.0.1:1936/metrics -s   -u admin:admin  | grep appach-server | grep -E "haproxy_server_check_failures_total|haproxy_server_up"
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-2zd2d",route="unsec-apach",server="10.129.2.15:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-68qwp",route="unsec-apach",server="10.129.2.10:8080",service="unsec-apach"} 6
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-9nln6",route="unsec-apach",server="10.131.0.11:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-gcqzf",route="unsec-apach",server="10.129.2.24:8080",service="unsec-apach"} 1
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-jkwhk",route="unsec-apach",server="10.131.0.15:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-kwdnw",route="unsec-apach",server="10.128.2.13:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-m5rvj",route="unsec-apach",server="10.131.0.16:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-wc79f",route="unsec-apach",server="10.128.2.7:8080",service="unsec-apach"} 6
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-zmx5g",route="unsec-apach",server="10.131.0.12:8080",service="unsec-apach"} 2
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-2zd2d",route="unsec-apach",server="10.129.2.15:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-68qwp",route="unsec-apach",server="10.129.2.10:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-9nln6",route="unsec-apach",server="10.131.0.11:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-gcqzf",route="unsec-apach",server="10.129.2.24:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-jkwhk",route="unsec-apach",server="10.131.0.15:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-kwdnw",route="unsec-apach",server="10.128.2.13:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-m5rvj",route="unsec-apach",server="10.131.0.16:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-wc79f",route="unsec-apach",server="10.128.2.7:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-zmx5g",route="unsec-apach",server="10.131.0.12:8080",service="unsec-apach"} 0
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-wc79f:unsec-apach:unsec-apach:10.128.2.7:8080 0 0 6
pod:appach-server-66b4878747-kwdnw:unsec-apach:unsec-apach:10.128.2.13:8080 2 0 6
pod:appach-server-66b4878747-68qwp:unsec-apach:unsec-apach:10.129.2.10:8080 0 0 6
pod:appach-server-66b4878747-2zd2d:unsec-apach:unsec-apach:10.129.2.15:8080 2 0 6
pod:appach-server-66b4878747-gcqzf:unsec-apach:unsec-apach:10.129.2.24:8080 0 0 6
pod:appach-server-66b4878747-9nln6:unsec-apach:unsec-apach:10.131.0.11:8080 2 0 6
pod:appach-server-66b4878747-zmx5g:unsec-apach:unsec-apach:10.131.0.12:8080 0 0 6
pod:appach-server-66b4878747-jkwhk:unsec-apach:unsec-apach:10.131.0.15:8080 2 0 6
pod:appach-server-66b4878747-m5rvj:unsec-apach:unsec-apach:10.131.0.16:8080 2 0 6
_dynamic-pod-1 0 5 14

5.
% oc scale deployment appach-server --replicas=6

6.
sh-5.1$ for back in unsec-apach; do echo $back; echo show servers state be_http:default:$back | socat - /var/lib/haproxy/run/haproxy.sock | sed '1d;2s/^# //' | cut -d' ' -f4,6,7,14; done
unsec-apach
srv_name srv_op_state srv_admin_state srv_check_state
pod:appach-server-66b4878747-wc79f:unsec-apach:unsec-apach:10.128.2.7:8080 0 0 6
pod:appach-server-66b4878747-kwdnw:unsec-apach:unsec-apach:10.128.2.13:8080 2 0 6
pod:appach-server-66b4878747-68qwp:unsec-apach:unsec-apach:10.129.2.10:8080 0 0 6
pod:appach-server-66b4878747-2zd2d:unsec-apach:unsec-apach:10.129.2.15:8080 2 0 6
pod:appach-server-66b4878747-gcqzf:unsec-apach:unsec-apach:10.129.2.24:8080 0 0 6
pod:appach-server-66b4878747-9nln6:unsec-apach:unsec-apach:10.131.0.11:8080 0 1 14
pod:appach-server-66b4878747-zmx5g:unsec-apach:unsec-apach:10.131.0.12:8080 0 1 14
pod:appach-server-66b4878747-jkwhk:unsec-apach:unsec-apach:10.131.0.15:8080 2 0 6
pod:appach-server-66b4878747-m5rvj:unsec-apach:unsec-apach:10.131.0.16:8080 0 1 14
_dynamic-pod-1 0 5 14

sh-5.1$ curl http://127.0.0.1:1936/metrics -s   -u admin:admin  | grep appach-server | grep -E "haproxy_server_check_failures_total|haproxy_server_up"
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-2zd2d",route="unsec-apach",server="10.129.2.15:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-68qwp",route="unsec-apach",server="10.129.2.10:8080",service="unsec-apach"} 6
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-9nln6",route="unsec-apach",server="10.131.0.11:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-gcqzf",route="unsec-apach",server="10.129.2.24:8080",service="unsec-apach"} 1
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-jkwhk",route="unsec-apach",server="10.131.0.15:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-kwdnw",route="unsec-apach",server="10.128.2.13:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-m5rvj",route="unsec-apach",server="10.131.0.16:8080",service="unsec-apach"} 0
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-wc79f",route="unsec-apach",server="10.128.2.7:8080",service="unsec-apach"} 6
haproxy_server_check_failures_total{namespace="default",pod="appach-server-66b4878747-zmx5g",route="unsec-apach",server="10.131.0.12:8080",service="unsec-apach"} 2
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-2zd2d",route="unsec-apach",server="10.129.2.15:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-68qwp",route="unsec-apach",server="10.129.2.10:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-9nln6",route="unsec-apach",server="10.131.0.11:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-gcqzf",route="unsec-apach",server="10.129.2.24:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-jkwhk",route="unsec-apach",server="10.131.0.15:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-kwdnw",route="unsec-apach",server="10.128.2.13:8080",service="unsec-apach"} 1
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-m5rvj",route="unsec-apach",server="10.131.0.16:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-wc79f",route="unsec-apach",server="10.128.2.7:8080",service="unsec-apach"} 0
haproxy_server_up{namespace="default",pod="appach-server-66b4878747-zmx5g",route="unsec-apach",server="10.131.0.12:8080",service="unsec-apach"} 0

7.
% oc get route unsec-apach -oyaml | grep haproxy.health
    router.openshift.io/haproxy.health.check.interval: 0s

8.
% oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-02-27-082328-test-ci-ln-q0disgt-latest   True        False         78m     Cluster version is 4.22.0-0-2026-02-27-082328-test-ci-ln-q0disgt-latest

@ShudiLi commented Feb 27, 2026

/label qe-approved
/verified by @ShudiLi

@openshift-ci bot added the qe-approved label Feb 27, 2026
@openshift-ci-robot (Contributor):

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-75010, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi


@openshift-ci bot requested a review from ShudiLi on February 27, 2026 11:32
@openshift-ci-robot added the verified label Feb 27, 2026
@openshift-ci-robot (Contributor):

@ShudiLi: This PR has been marked as verified by @ShudiLi.


@bentito (Contributor) left a comment

The change elegantly solves the issue. By always explicitly sending the updated server state when an update is requested, we no longer rely on parsing HAProxy's internal administrative state (which previously caused the router to skip sending the state maint command when it identified a down server as already being in maint). I've verified that UpdateServerState is only called when endpoint updates are processed, so there's no unnecessary API overhead.
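To make the reviewed flow concrete, here is a hedged Go sketch; UpdateServerState is the function named above, but the surrounding types and driver are illustrative rather than the router's actual code:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// serverUpdate is an illustrative pair of a server name and its desired state.
type serverUpdate struct{ name, state string }

// processEndpointUpdate is an illustrative driver: it runs only when an
// endpoint update is processed, so the unconditional write below adds no
// steady-state API overhead.
func processEndpointUpdate(sock io.Writer, backend string, servers []serverUpdate) {
	for _, s := range servers {
		// Previously the router parsed HAProxy's administrative state and
		// skipped this write when the server already looked like MAINT; a
		// DOWN server misread as MAINT therefore never got the command.
		// Now the desired state is always sent.
		UpdateServerState(sock, backend, s.name, s.state)
	}
}

// UpdateServerState issues the dynamic state command on the admin socket,
// e.g. "set server be_http:default:unsec-apach/_dynamic-pod-1 state maint".
func UpdateServerState(sock io.Writer, backend, server, state string) {
	fmt.Fprintf(sock, "set server %s/%s state %s\n", backend, server, state)
}

func main() {
	processEndpointUpdate(os.Stdout, "be_http:default:unsec-apach",
		[]serverUpdate{{name: "_dynamic-pod-1", state: "maint"}})
}
```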

@bentito (Contributor) commented Mar 9, 2026

/lgtm

@openshift-ci bot added the lgtm label Mar 9, 2026
@bentito (Contributor) commented Mar 9, 2026

/approve

@openshift-ci bot commented Mar 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bentito

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label Mar 9, 2026
@jcmoraisjr (Member, Author):

/retest

@openshift-ci bot commented Mar 10, 2026

@jcmoraisjr: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-bot merged commit 2e1389e into openshift:master on Mar 10, 2026
11 checks passed
@openshift-ci-robot (Contributor):

@jcmoraisjr: Jira Issue Verification Checks: Jira Issue OCPBUGS-75010
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-75010 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@openshift-merge-robot (Contributor):

Fix included in accepted release 4.22.0-0.nightly-2026-03-11-034211

@jcmoraisjr deleted the NE-2477-dcm-update-state branch on March 11, 2026 11:43