fix: requeue quickly after 409 conflict in HTTPProxy reconcile#169

Closed
drewr wants to merge 1 commit into main from fix/httpproxy-409-backoff

@drewr
Contributor

@drewr drewr commented May 22, 2026

Summary

Fixes #166.

When a new tunnel (HTTPProxy) is created, concurrent reconciles race to write the child Gateway, HTTPRoute, and EndpointSlice. The resulting 409 Conflict errors were returned to controller-runtime as plain errors, and controller-runtime applied exponential backoff to each one. After ~15 conflicts in the initial burst, the backoff reached 3-4 minutes, and the controller went silent until the next periodic tick.

This was the root cause of the ~3m47s delay between tunnel creation and Programmed=True being set on the HTTPProxy, which is when the UI toggle turns green.
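To make the backoff growth concrete, here is a minimal sketch of the arithmetic. It assumes a per-item limiter that starts at 5ms and doubles on every consecutive failure up to a 1000s cap, which is my understanding of the controller-runtime default; those two numbers and the helper name backoffDelay are assumptions, not values taken from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the wait applied on the nth consecutive failure
// (n starts at 1) by a doubling limiter: base * 2^(n-1), capped at maxDelay.
// Hypothetical helper for illustration only.
func backoffDelay(n int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 1; i < n; i++ {
		d *= 2
		if d >= maxDelay {
			// Stop doubling once the cap is reached (also avoids overflow).
			return maxDelay
		}
	}
	return d
}

func main() {
	// Assumed defaults: 5ms base, 1000s cap.
	const base = 5 * time.Millisecond
	const maxDelay = 1000 * time.Second

	for _, n := range []int{1, 5, 10, 15, 16, 17} {
		fmt.Printf("failure %2d -> wait %v\n", n, backoffDelay(n, base, maxDelay))
	}
}
```

Under these assumed values the wait is still milliseconds for the first few conflicts, but the 16th consecutive failure waits about 2m44s and the 17th about 5m28s, which is in the same range as the 3-4 minute silence described above.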

Before

19:33:47Z  HTTPProxy created
19:33:47–52Z  burst of 409 Conflict writes to Gateway + HTTPRoute
19:33:52Z  controller goes silent (exponential backoff accumulated)
19:37:37Z  next tick: Programmed=True set (~3m47s total)

After

Each 409 Conflict returns ctrl.Result{RequeueAfter: 1s} instead of entering the exponential-backoff queue. The conflict resolves within one retry cycle; Programmed=True should be set within seconds of the child resources being accepted.
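The error mapping described here can be sketched roughly as follows. To keep the example runnable with the standard library alone, Result stands in for ctrl.Result and errConflict stands in for an API 409 that real code would detect with apierrors.IsConflict; handleWriteError is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a stand-in for ctrl.Result, reduced to the one field used here.
type Result struct {
	RequeueAfter time.Duration
}

// errConflict is a stand-in for a 409 from the API server.
// Real controller code would check apierrors.IsConflict(err) instead.
var errConflict = errors.New("409 conflict")

// retryAfterConflict mirrors the 1s value named in this PR.
const retryAfterConflict = time.Second

// handleWriteError maps the error from a child-resource write to a
// reconcile outcome. Hypothetical helper for illustration only.
func handleWriteError(err error) (Result, error) {
	switch {
	case err == nil:
		return Result{}, nil
	case errors.Is(err, errConflict):
		// Return a nil error so the item is requeued after a fixed delay
		// rather than through the exponential-backoff path.
		return Result{RequeueAfter: retryAfterConflict}, nil
	default:
		// Any other error still goes through the normal backoff path.
		return Result{}, err
	}
}

func main() {
	wrapped := fmt.Errorf("updating gateway: %w", errConflict)
	res, err := handleWriteError(wrapped)
	fmt.Println(res.RequeueAfter, err)
}
```

The key point is that the conflict branch returns a nil error: returning the error alongside RequeueAfter would still count as a failure for backoff purposes.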

Changes

  • Named the ctrl.Result return from _ ctrl.Result to result ctrl.Result so the deferred status-update block can set result.RequeueAfter on conflict without joining an error that re-enters the backoff queue.
  • Added an apierrors.IsConflict guard in the Gateway, HTTPRoute, and EndpointSlice CreateOrUpdate error paths. Each guard returns ctrl.Result{RequeueAfter: retryAfterConflict} (1s), matching the pattern in gateway_dns_controller.go and result.go.
  • Renamed result locals (the controllerutil.OperationResult string type) to opResult to avoid shadowing the named return.
  • Added TestHTTPProxyReconcileConflictRequeue covering all three write paths (Gateway update, HTTPRoute update, status update) with injected 409s.
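The named-return change in the first bullet relies on a Go language rule: a deferred function can modify named results after the return statement has run. The sketch below shows that mechanism in isolation, with stand-in types (Result, errConflict) and a hypothetical updateStatus callback in place of the real status write; it is not the controller's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a stand-in for ctrl.Result.
type Result struct {
	RequeueAfter time.Duration
}

// errConflict is a stand-in for a 409; real code would use apierrors.IsConflict.
var errConflict = errors.New("409 conflict")

// reconcile names its results so the deferred status update can adjust them.
// updateStatus is a hypothetical callback standing in for the status write.
func reconcile(updateStatus func() error) (result Result, err error) {
	defer func() {
		statusErr := updateStatus()
		switch {
		case statusErr == nil:
			// Status written; leave the results untouched.
		case errors.Is(statusErr, errConflict):
			// Request a short requeue instead of joining the conflict
			// into err, which would trigger exponential backoff.
			if result.RequeueAfter == 0 {
				result.RequeueAfter = time.Second
			}
		default:
			err = errors.Join(err, statusErr)
		}
	}()

	// The main reconcile body would run here.
	return Result{}, nil
}

func main() {
	res, err := reconcile(func() error { return errConflict })
	fmt.Println(res.RequeueAfter, err)
}
```

With an unnamed ctrl.Result return, the deferred block has no variable to assign to, which is why the rename (and the follow-on opResult rename to avoid shadowing) is needed.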

Testing

go test ./internal/controller/ -run TestHTTPProxyReconcileConflictRequeue -v

All three subtests pass; full controller suite clean.

Related

When a new HTTPProxy is created, concurrent reconciles race to write the
child Gateway, HTTPRoute, and EndpointSlice resources. The resulting 409
Conflict errors were returned as plain errors to controller-runtime, which
applied exponential backoff. After ~15 conflicts in the initial burst the
backoff reached 3-4 minutes, silencing the controller until the next
periodic tick.

This was observed on tunnel creation: the UI toggle stayed grey for ~3m47s
before Programmed=True was set on the HTTPProxy (network-services-operator#166).

Fix: replace the exponential-backoff path for 409 Conflict errors on child
resource updates and on the HTTPProxy status update with an explicit
RequeueAfter of retryAfterConflict (1s). This matches the pattern already
used in gateway_dns_controller.go and result.go.

Changes:
- Rename the unnamed ctrl.Result return to a named 'result' variable so the
  deferred status-update block can set result.RequeueAfter on conflict
  without joining an error that would re-enter the backoff queue
- Rename controllerutil.OperationResult locals from 'result' to 'opResult'
  to avoid shadowing the named return
- Add IsConflict guard in the Gateway, HTTPRoute, and EndpointSlice
  CreateOrUpdate error paths
- Add TestHTTPProxyReconcileConflictRequeue covering all three write paths

@drewr drewr marked this pull request as ready for review May 22, 2026 20:43
@drewr drewr requested review from savme and scotwells May 22, 2026 20:43

@drewr
Contributor Author

drewr commented May 22, 2026

@savme Seeing if this passes the smell test on some issues I've been having lately with creating tunnels. Thanks!

@scotwells
Contributor

This feels heavy handed. We need to understand why conflicts are happening before we just throw requeues at the problem. Seems like we should be using server side apply or better conflict resolution.

@drewr
Contributor Author

drewr commented May 22, 2026

Copy that. Incoming.

Development

Successfully merging this pull request may close these issues.

HTTPProxy reconcile backs off 3-4 min after 409 conflict burst at tunnel creation