[CONTP-1639] Refactor CRD metadata collection to use informer#2979
[CONTP-1639] Refactor CRD metadata collection to use informer#2979
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f966929809
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2979 +/- ##
==========================================
+ Coverage 41.40% 41.76% +0.36%
==========================================
Files 331 332 +1
Lines 28908 29008 +100
==========================================
+ Hits 11969 12116 +147
+ Misses 16084 16025 -59
- Partials 855 867 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Adds pkg/controller/utils/metadata/informer_runner.go: a shared engine for event-driven metadata forwarders. Owns a rate-limited workqueue, worker pool, optional heartbeat ticker, RBAC pre-check via SelfSubjectAccessReview, and an AddWatch helper that registers an informer with filter and Add/Update/Delete handlers. Forwarders register watches via AddWatch and supply ProcessFunc / DeleteFunc / HeartbeatFunc callbacks. Queue keys encode kind + namespace + name as "<Kind>/<namespace>/<name>", with a "delete:" prefix for deletion events. WatchTarget carries an explicit API Group for the RBAC pre-check; callers pass "datadoghq.com" for Datadog CRDs and "" for core resources like ConfigMap/Secret. Without this, SelfSubjectAccessReview defaults to the core group and would deny list/watch on any CRD even when the operator's RBAC permits it. This commit only adds the engine and its unit tests; existing forwarders are migrated in follow-up commits.
Pure refactor. Helm metadata sends behavior is unchanged; only the
implementation moves onto the shared engine introduced in the previous
commit.
- Removes inline workqueue, RBAC pre-check, worker pool, and ticker
loop in favor of InformerWorkQueue.
- processKey now dispatches by kind ("ConfigMap" / "Secret") encoded in
the queue key, replacing the prior "try as ConfigMap then Secret"
fallback.
- handleDelete and the periodic heartbeat are now plain callbacks the
engine invokes.
- Watches register with Group="" since ConfigMap and Secret live in the
core API group.
- Note: shared handlers include UpdateFunc; the previous Helm code
registered Add/Delete only. Helm release dedup in handleHelmResource
(revision check) absorbs any extra events, so no behavior leak.
All existing Helm tests continue to pass without modification beyond
the queue-initialization assertion (now reads through runner.QueueLen).
🛑 Gate Violations
ℹ️ Info🎯 Code Coverage (details) Useful? React with 👍 / 👎 This comment will be updated automatically if new data arrives.🔗 Commit SHA: 45b81c9 | Docs | Datadog PR Page | Give us feedback! |
f966929 to
7e86a2a
Compare
…rkQueue Replaces the 1-minute polling loop with informer-driven sends. CRD spec changes now propagate to the metadata pipeline within ~1 second instead of up to 60 seconds. CRDMetadataForwarder changes: - Constructor takes manager.Manager; implements manager.Runnable and manager.LeaderElectionRunnable so the manager starts it after cache sync and dedupes sends across operator replicas. - Start registers AddWatch for DatadogAgent, DatadogAgentInternal, and DatadogAgentProfile against the manager cache, each with Group="datadoghq.com" so the RBAC pre-check matches the operator's granted permissions on that group. - processKey Gets the typed object from the cache, builds a CRDInstance, hashes against a sync.Map of last-sent snapshots keyed by "<Kind>/<namespace>/<name>", and sends only if new or changed. - handleDelete clears the snapshot for a deleted CRD. - A 10-minute heartbeat re-sends every snapshot for safety. - Removes the polling loop and its helpers (sendMetadata, getAllActiveCRDs, getCRDsToSend, cleanupDeletedCRDs, buildCacheKey). cmd/main.go: setupAndStartCRDMetadataForwarder now takes manager.Manager and registers the forwarder via mgr.Add, alongside the Helm forwarder. The leader-elected goroutine still runs the operator metadata forwarder which remains a poll loop. Tests cover the new processKey paths (new send, unchanged skip, NotFound no-op), handleDelete, and heartbeat re-send.
7e86a2a to
45b81c9
Compare
What does this PR do?
Replaces the 1-minute polling loop in
CRDMetadataForwarderwith an event-driven informer + workqueue, and extracts the shared informer/RBAC/queue/heartbeat plumbing into a newInformerWorkQueueengine used by both the CRD and Helm metadata forwarders.Three commits, intended to be reviewed in order:
1.
feat(metadata): add shared InformerWorkQueue enginepkg/controller/utils/metadata/informer_runner.godefinesInformerWorkQueue— owns the rate-limited workqueue, worker pool, RBAC pre-check (SelfSubjectAccessReview), informer registration viaAddWatch(ctx, WatchTarget), and an optional heartbeat ticker.WatchTargetcarries an explicit APIGroupso the SAR pre-check matches the operator's granted RBAC (e.g.datadoghq.comfor CRDs,""for core resources).2.
refactor(metadata): migrate HelmMetadataForwarder to InformerWorkQueueprocessKeynow dispatches by kind (ConfigMap/Secret) encoded in the queue key, replacing the prior "try as ConfigMap then Secret" fallback.runner.QueueLen).3.
feat(metadata): make CRDMetadataForwarder event-driven via InformerWorkQueuemanager.Manager; implementsmanager.Runnable+manager.LeaderElectionRunnable.StartregistersAddWatchforDatadogAgent/DatadogAgentInternal/DatadogAgentProfileagainst the manager cache, each withGroup="datadoghq.com".processKeyGets the typed object, hashes against async.Mapof last-sent snapshots, sends only if new or changed; stores the new snapshot before sending so a concurrent heartbeat tick cannot replay a stale snapshot.handleDeleteclears the snapshot for a deleted CRD; 10-minute heartbeat re-sends every snapshot.sendMetadata,getAllActiveCRDs,getCRDsToSend,cleanupDeletedCRDs,buildCacheKey) deleted.cmd/main.go:setupAndStartCRDMetadataForwardernow takesmanager.Managerand registers viamgr.Add(cmf), sitting next to the Helm wiring instead of inside the leader-elected goroutine.Motivation
CRD metadata sends were taking up to 60 seconds to reflect a
kubectl editon aDatadogAgent, because the forwarder relied on a wall-clock poll. The Helm forwarder has used informers since it was written.After this PR, a CRD spec change reaches the Datadog metadata pipeline within ~1 second.
Additional Notes
UpdateFunchandler (the old Helm code only handled Add/Delete). This means metadata-only changes to a Helm-managed ConfigMap/Secret can now enqueue events that previously were silent.handleHelmResourcealready deduplicates by Helm revision, so no extra metadata sends leak through — verified by the existing Helm test suite.<-mgr.Elected()goroutine, so was effectively leader-only. The newNeedLeaderElection() bool { return true }preserves that. Operators with--enable-leader-election=falsestill get sends because controller-runtime treatsLeaderElectionRunnablereturningtrueas "run unconditionally when leader election is disabled".Minimum Agent Versions
Describe your test plan
Unit tests (added with this PR, all passing on workspace):
Test_EncodeDecodeKey,Test_InformerWorkQueue_DispatchesAdd,Test_InformerWorkQueue_DispatchesDelete,Test_InformerWorkQueue_HeartbeatFires,Test_InformerWorkQueue_CanListWatch_Allowed— engine mechanicsTest_CRDProcessKey_NewCRDSends,Test_CRDProcessKey_UnchangedSkipsSend,Test_CRDProcessKey_NotFoundIsNoOp,Test_CRDHandleDelete,Test_CRDHeartbeatResendsAllSnapshots— CRD informer flow (httptest server + seededkube-systemNamespace +DD_API_KEY/DD_URLenv vars for credentials)Test_workqueueInitialization,Test_parseHelmResource,Test_releaseSnapshots,Test_revisionLogic,Test_buildSnapshot,Test_snapshotToReleaseData,Test_mergeValues,Test_buildPayload) — unchanged, still passe2e test
DatadogAgentCRkubectl edit ddaon a label/annotation/spec field, watch operator logs forSending metadatawithin seconds (V(1) info)kubectl delete ddaand confirm the snapshot is cleared (V(1)Removed deleted CRD from snapshot store)Checklist
refactoringqa/skip-qalabel