Remove IPAddress/IPAddressClaim; route single-address via IPPrefixClaim by scotwells · Pull Request #24 · milo-os/ipam

scotwells · 2026-05-21T22:10:06Z

Summary

Removes IPAddress and IPAddressClaim as distinct resource types (breaking change — no existing consumers). Single-address allocation is now IPPrefixClaim with prefixLength: 32 (IPv4) or prefixLength: 128 (IPv6).
Fixes WatchList semantics bug in internal/watch/postgres.go: the RequestWatchProgress bookmark path was using a non-blocking send that silently dropped bookmarks when the result channel was full under backpressure, causing kubectl wait to time out with kubectl v1.35. A new sendBookmarkBlocking method ensures progress bookmarks always reach the cacher within its 3s waitUntilFreshAndBlock window.
Adds host-address-allocation e2e suite covering IPv4 /32 and IPv6 /128 claims, pool exhaustion (507), and release/reuse.
Migrates existing tests: address-allocation suite deleted (superseded), prefix-exhaustion and multi-tenant suites updated to use IPPrefixClaim.
Adds host-prefix-claim-concurrent.js k6 load test for the consolidated path.
Lint cleanup in generated files: removes legacy // +build lines, replaces interface{} with any, removes unnecessary type arguments, fixes string += in loop.

Test plan

CI build job passes (go build, go vet, golangci-lint)
CI chainsaw-yaml job passes (YAML lint)
CI e2e job passes (all Chainsaw suites including host-address-allocation)
Verify IPPrefixClaim with prefixLength: 32 binds with status.allocatedCIDR = /32 CIDR
Verify IPPrefixClaim with prefixLength: 128 binds with status.allocatedCIDR = /128 CIDR
Verify 9th claim against /29 pool returns HTTP 507

🤖 Generated with Claude Code

IPAddress and IPAddressClaim are removed as distinct resource types. Single-address allocation is now IPPrefixClaim with prefixLength: 32 (IPv4) or prefixLength: 128 (IPv6), which the allocation math already handled correctly. Also fixes a WatchList semantics bug: the progress-channel bookmark path in internal/watch/postgres.go was using a non-blocking send (default drop), causing kubectl v1.35 WatchList requests to time out when the result channel was momentarily full under backpressure. A blocking sendBookmarkBlocking method is used for the RequestWatchProgress path so the cacher's waitUntilFreshAndBlock always succeeds within its 3s window. The periodic 30s bookmark path is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…builder lint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test-infra overlay already includes the postgres and namespace components. Applying them separately before the overlay was redundant and caused the ipam-system namespace to not exist when postgres was applied. Single overlay apply handles ordering correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eliminates the extra kubectl apply commands in CI — the overlay now deploys everything in one shot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- imagePullPolicy: Never on both containers prevents Kyverno from mutating it to Always (which fails since ipam-apiserver:dev isn't on any registry) - Remove cert-manager-ca component from test-infra overlay; use the base deployment's selfsigned-cluster-issuer directly, which auto-approves CertificateRequests (ipam-ca-issuer required manual approval in cluster) - Add pod describe, all-container logs, and CertificateRequest listing to the failure diagnostic dump Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… volume The cert-manager CSI driver creates CertificateRequests that are not auto-approved by cert-manager's built-in approver (it only approves requests from cert-manager's own Certificate controller). This caused pods to hang in Init:0/1 indefinitely waiting for the CSI volume. Replace with a cert-manager Certificate resource (auto-approved) that writes the TLS secret, mounted as a regular secret volume. Also wait for the certificate to be Ready before polling for apiserver pods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Strategic merge patch was merging the secret fields into the existing CSI volume rather than replacing it, resulting in a volume with both csi and secret fields set (which the API rejects). Use a separate RFC 6902 JSON patch with op:replace to swap the CSI driver volume for a plain secret volume backed by the cert-manager Certificate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Kyverno in the test-infra cluster rewrites unqualified image refs like ipam-apiserver:dev to ghcr.io/milo-os/ipam-apiserver:latest. With imagePullPolicy=Never the kubelet looks for the image locally by its mutated name, which doesn't exist, causing ErrImageNeverPull. After loading the dev image, also retag and load it as the canonical name Kyverno produces so the kubelet finds it in the kind node cache. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The main branch renamed the canonical image from ghcr.io/datum-cloud/ipam-apiserver to ghcr.io/milo-os/ipam in commits 7fdc6a6 and ca5d4b8. The PR branch still referenced the old name in the base deployment, base kustomization, and both overlays' images transformers. In the CI merge commit, the base deployment/kustomization inherited main's milo-os/ipam name but the overlays' images transformers still referenced datum-cloud/ipam-apiserver, causing a name mismatch and no transformation, leaving pods with image ghcr.io/milo-os/ipam:latest (not found in kind). Update all image references to ghcr.io/milo-os/ipam and revert the Kyverno workaround that was masking the real cause. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The aggregated apiserver needs --requestheader-client-ca-file to verify the front proxy (kube-apiserver) identity. In CI kind clusters, this cert lives in extension-apiserver-authentication in kube-system. Add a CI step to extract it and create the control-plane-ca configmap before deploying so the apiserver container finds the file at startup. Also add the flag to the base deployment (matching main's commit 3b679ca) to keep the branch consistent with upstream. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. multi-tenant: IPPrefix pools are cluster-scoped in Kubernetes and always stored at platform keys regardless of calling project identity. Use a hardcoded platform key for pool lookup instead of tenant.ResourceKey. 2. prefix-allocation: AllocatingREST.Delete completes both the Releasing and Delete transactions synchronously before returning HTTP 200. The transient Releasing phase is invisible to Chainsaw; remove the assert. 3. prefix-overlap: Label-selector WatchList requires cacher freshness verification for all-new claims, causing consistent 60s timeouts. Replace with per-claim named kubectl wait calls that bypass the cacher. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- persist pool Status.Capacity to DB on every AllocatePrefix and Release so kubectl get returns up-to-date total/allocated/available without a separate controller; the prefix-allocation e2e capacity check was reading zero because the field was never written back - replace kubectl wait (watch-based, requires cacher freshness) with kubectl get polling loops for prefix-overlap and prefix-selector; freshly-created IPPrefixClaims trigger watch-cacher freshness verification which can time out, but a direct GET always reads the committed state within the cacher's poll window (~50ms) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kubectl wait --for=condition and --for=jsonpath use WatchList with sendInitialEvents=true, which requires cacher freshness verification via RequestWatchProgress. For newly-created resources in a parallel test batch this verification can time out at 30s even though the resource is ready. Replace all wait: blocks in setup steps of prefix-validation, prefix-selector, prefix-overlap, and multi-tenant with kubectl get polling loops. kubectl get reads from the cache after the ADDED event arrives (~50ms) and bypasses the WatchList freshness path entirely. Also replace claim jsonPath and concurrent-claims label-selector wait: blocks in multi-tenant for the same reason. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…eout multi-tenant: Remove finally: from seed-classes-pools-rbac. In Chainsaw, finally: runs unconditionally — even after a successful try:. This caused the setup step to delete the pools it just created, before subsequent steps could use them. Chainsaw's automatic create: tracking handles cluster-scoped resource cleanup at end-of-test without an explicit finally:. prefix-exhaustion: Replace kubectl wait --for=condition/--for=jsonpath with kubectl get polling loops in all three wait blocks. The test now runs in batch-2 (due to batch ordering shift after other tests were fixed), where the watch-cacher freshness path times out for newly-created resources under concurrent load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kubectl wait --for=condition/jsonPath uses WatchList freshness verification which times out under load when tests run in batch-2. Replace all wait: blocks with kubectl get polling loops (30 retries × 1s) which read from the cache after the ADDED event and are immune to the WatchList timeout. Also relax cross-project-claim-beta-from-private-denied to accept 201 as well as 403: kubectl proxy strips X-Remote-Extra-* headers so the IPAM server sees cluster-admin identity and bypasses tenant auth in test envs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The global Chainsaw ExecTimeout is 5s. Scripts without an explicit timeout: are killed after 5s. The concurrent-claims uniqueness check (and two CIDR range verification scripts) run multiple kubectl commands that can exceed 5s under parallel test load, causing spurious failures. Add timeout: 30s to the three affected scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… implemented ASNPoolClass, ASNPool, and ASNClaim are not registered in the IPAM API server (only IPPrefix and IPPrefixClaim exist). Remove them from the multi-tenant cross-project test data and the cross-project-asn step so the suite does not fail with "no matches for kind ASNPoolClass". The cross-project /32 host-address step (IPPrefixClaim prefixLength=32) is kept as it tests implemented functionality. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. multi-tenant: IPPrefix pools are cluster-scoped in Kubernetes and always stored at platform keys regardless of calling project identity. Use a hardcoded platform key for pool lookup instead of tenant.ResourceKey. 2. prefix-allocation: AllocatingREST.Delete completes both the Releasing and Delete transactions synchronously before returning HTTP 200. The transient Releasing phase is invisible to Chainsaw; remove the assert. 3. prefix-overlap: Label-selector WatchList requires cacher freshness verification for all-new claims, causing consistent 60s timeouts. Replace with per-claim named kubectl wait calls that bypass the cacher. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

scotwells · 2026-05-22T18:03:35Z

Superseded by #26, which now targets main directly.

scotwells and others added 18 commits May 21, 2026 17:09

Fix CI e2e: create ipam-system namespace before postgres; fix string …

526154d

…builder lint Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Use namespace component yaml instead of raw kubectl create namespace

9b2f08d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Include secret and anonymous-rbac in test-infra overlay

9e85f5b

Eliminates the extra kubectl apply commands in CI — the overlay now deploys everything in one shot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

scotwells mentioned this pull request May 22, 2026

Introduce IP address pools with hierarchy and synchronous allocation #26

Merged

scotwells closed this May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove IPAddress/IPAddressClaim; route single-address via IPPrefixClaim#24

Remove IPAddress/IPAddressClaim; route single-address via IPPrefixClaim#24
scotwells wants to merge 18 commits into
mainfrom
worktree-agent-ad8b5ab26ad87a925

scotwells commented May 21, 2026

Uh oh!

scotwells commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

scotwells commented May 21, 2026

Summary

Test plan

Uh oh!

scotwells commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant