From ee472ee405383ef8adc445afa7d1f2bd393f636c Mon Sep 17 00:00:00 2001
From: Dmitrii Creed
Date: Mon, 11 May 2026 21:57:26 +0400
Subject: [PATCH] fix(caddy): default catch-all returns 503 instead of welcome page
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When all Services with a `simple-container.com/caddyfile-entry`
annotation for a given Host disappeared (for example, a cascade-deletion
from a namespace Replace gone wrong), requests fell through to the
catch-all `http:// { file_server /etc/caddy/pages }` block and got back
HTTP 200 + "Default page" from index.html.

External monitoring saw healthy 200s. CDNs and load balancers saw 200s.
Pingdom / UptimeRobot / the dashboard everyone trusts saw 200s. The
outage was invisible to every layer that wasn't deep-inspecting the
response body.

PAY-SPACE hit this on 2026-05-10: the migration from SC #230
cascade-deleted the shared parent namespace, every Service annotation
for production hosts evaporated, and every domain pointing at the
cluster served the Caddy welcome page. The outage was only noticed when
a human opened a browser tab.

Change:

- Default catch-all now uses `respond ... 503 { close }` instead of
  `file_server /etc/caddy/pages`.
- Retry-After: 60 so CDNs back off appropriately and clients know to
  retry rather than treating 503 as a hard failure.
- Cache-Control: no-store so an aggressive cache doesn't pin the 503
  state past route recovery.
- HTML body still rendered for humans visiting in a browser, but it's
  now a 503 page that names the problem (missing
  `simple-container.com/caddyfile-entry` annotation) and tells operators
  what to check. The literal "Default page" string is gone.

Behavior verified by running the Caddy image with the new default block:

  configured host (Host: example.com)            → HTTP 200
  unmatched host  (Host: support-bot.pay.space)  → HTTP 503
                                                   Retry-After: 60
                                                   Cache-Control: no-store

`caddy validate` against the full embedded Caddyfile + new default
block + a sample matched site passes clean.

The /etc/caddy/pages directory (index.html, 404.html, 502.html,
500.html) is still embedded and used by the `handle_bucket_error` and
`handle_server_error` snippets for legitimate per-Service error
fallbacks; only the catch-all stopped serving it as a 200.

Pairs with #255 (Caddy aggregator dedup) as the two halves of the
2026-05-10 PAY-SPACE outage: dedup keeps the aggregator from
crashlooping during a Service transition, this PR keeps the absence of
routes loud so it doesn't masquerade as a healthy 200.

Signed-off-by: Dmitrii Creed
---
 pkg/clouds/pulumi/kubernetes/caddy.go | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/pkg/clouds/pulumi/kubernetes/caddy.go b/pkg/clouds/pulumi/kubernetes/caddy.go
index 857b6493..825868ec 100644
--- a/pkg/clouds/pulumi/kubernetes/caddy.go
+++ b/pkg/clouds/pulumi/kubernetes/caddy.go
@@ -90,11 +90,21 @@ func DeployCaddyService(ctx *sdk.Context, caddy CaddyDeployment, input api.Resou
 	}
 
 	defaultCaddyFileEntryStart := `http:// {`
+	// Default catch-all serves a hard 503 instead of a static "welcome" page.
+	// Rationale: when all Services with a `simple-container.com/caddyfile-entry`
+	// annotation for a given Host vanish (e.g. a cascade-deletion from a
+	// namespace Replace gone wrong), the request used to fall through to a
+	// `file_server /etc/caddy/pages` block and respond with HTTP 200 + "Default
+	// page". External monitoring saw healthy 200s while every backend was gone.
+	// 503 + Retry-After makes the absence of routes loud: CDNs fail over,
+	// uptime checks alert, oncall sees it.
 	defaultCaddyFileEntry := `
   import gzip
-  import handle_static
-  root * /etc/caddy/pages
-  file_server
+  header Cache-Control "no-store"
+  header Retry-After "60"
+  respond "503 Service Unavailable

503 Service Unavailable

No backend route is configured for this host.

If you are an operator, verify the Service has the simple-container.com/caddyfile-entry annotation and that Caddy has been rolled.

" 503 {
+    close
+  }
 `
 	// if caddy must respect SSL connections only
 	useSSL := caddy.UseSSL == nil || *caddy.UseSSL