ERA-60185 ERA-57641 : Add ConfigMap defaults support for Database CR and LCMConfig data Bug#208
ERA-60185 ERA-57641 : Add ConfigMap defaults support for Database CR and LCMConfig data Bug#208manavrajvanshi merged 16 commits intomainfrom
Conversation
shivaprasadmb
left a comment
There was a problem hiding this comment.
few suggesions:
-
Document all supported ConfigMap keys
README already has a good example. Adding a short “Supported keys” subsection (e.g. clusterName, size, timezone, profiles., timeMachine., type-specific keys like postgres.size) would help admins. -
Validation after applying defaults
Optional: after ApplyDefaultsToDatabase, run a quick check (e.g. for provisioning: cluster set, size >= 10, required TM fields) and emit a clear event or set a condition if something required is still missing, so “ConfigMap missing required key” is easier to diagnose. -
ConfigMap namespace
Restricting the ConfigMap to the Database’s namespace is simple and consistent with RBAC. If you ever need cross-namespace defaults, you could add an optional defaultsConfigMapNamespace (or a small struct ref) later; not necessary for the current design.
Allow developers to specify minimal inputs in Database CR with admin-provided defaults via ConfigMap. Values in CR take precedence over ConfigMap defaults.
…ecified This commit adds the critical logic to skip validation for fields that can be provided via ConfigMap defaults (cluster, size, profiles, timeMachine). Key changes: - Webhook detects when defaultsConfigMapRef is set - Passes hasDefaultsConfigMap flag to validation handlers - Validation handlers skip cluster/size/TM checks when ConfigMap ref is present - Allows minimal Database CRs to pass validation successfully Tested: Successfully provisioned databases using minimal CRs with ConfigMap defaults.
…iles - Add postgres.profiles.dbParam.name for PostgreSQL-specific defaults - Add mysql.profiles.dbParam.name for MySQL-specific defaults - Add MySQL-specific example section in ConfigMap - Clarify that dbParam profiles should be database-type-specific
- Updated webhook validation to allow omitting both snapshotId and snapshotName when defaultsConfigMapRef is set - Added automatic selection of latest snapshot when both snapshot fields are empty - Implemented GetLatestSnapshotForTM() in ndb_api to fetch most recent snapshot by timestamp - Updated README with documentation on auto-snapshot feature Testing: Verified with PostgreSQL and MySQL clones, backward compatibility maintained
Changes: - Removed ConfigMap requirement for auto-snapshot feature - Auto-snapshot now works when both snapshotId and snapshotName are omitted - Updated webhook validation: removed hasDefaultsConfigMap check - Updated test: changed from expecting error to expecting success - Updated documentation: clarified auto-snapshot works in any scenario This makes features orthogonal: - ConfigMap defaults = provide default configuration values - Auto-snapshot = automatically select latest snapshot Benefits: - Better UX: no need for ConfigMap just to get auto-snapshot - More flexible: supports common "clone latest" use case - Simpler mental model: omit snapshot = latest Testing: - All 7 webhook validation scenarios pass - Auto-snapshot works without ConfigMap (new behavior) - Backward compatibility maintained - ConfigMap + auto-snapshot still works
69e4209 to
9bbdc29
Compare
…nts (ERA-57641) - Initialize req.LcmConfig before dereferencing to avoid panic - Fix refreshDetails error condition: use refreshDetailsCount instead of databaseLcmConfigCount - Fix typo in error message: refreshInDay -> refreshInDays
manavrajvanshi
left a comment
There was a problem hiding this comment.
Most changes look good to me, can you please do the following -
- Add unit tests for the changes made and new functions added.
- Attach screenshots for how you tested the configmap defaults.
- Document empty ConfigMap data, RBAC (get/list ConfigMaps), and precedence - Split engine size vs profile keys; size is provision-only - Note OOB vs ConfigMap for profile slots; fix typos (Default, Development) Made-with: Cursor
Added unit tests and attached screenshots of testing. |
shivaprasadmb
left a comment
There was a problem hiding this comment.
Please check the review comments and also run make fmt to format all the files
manavrajvanshi
left a comment
There was a problem hiding this comment.
Left a comment about the nil bug.
Overview
Operators can define fleet defaults in a Kubernetes ConfigMap. Developers set defaultsConfigMapRef on the Database spec to point at that ConfigMap. The mutating webhook fills missing fields first; then the validating webhook runs. The API server stores a fully specified custom resource, and the controller does not load the ConfigMap again for defaulting.
Key changes
API (database_types.go under api/v1alpha1)
ConfigMap application (configmap_defaults.go and database_webhook.go)
Clone: latest snapshot (name_resolution.go under controller_adapters, snapshots.go under ndb_api)
Webhook validation (webhook_helpers.go under api/v1alpha1)
RBAC (config/rbac)
Architecture
Create or update a Database custom resource in this order:
ConfigMap behaviour
Unit Tests
ConfigMap defaulting is covered in api/v1alpha1/configmap_defaults_test.go; the mutating defaulter (DatabaseCustomDefaulter) in api/v1alpha1/database_webhook_test.go. Validating and mutating admission paths are covered by the existing Ginkgo suite in api/v1alpha1/webhook_suite_test.go (envtest). Clone request generation is covered in ndb_api/clone_helpers_test.go.
Per-field defaults (
defaultsConfigMap→ empty CR fields only)clusterNameclusterIdandclusterNameare both emptytimezonetimezoneis emptysizesizeis0profiles.compute.nameprofiles.network.nameprofiles.software.nameprofiles.dbParam.nameprofiles.dbParamInstance.nametimeMachine.slatimeMachine.dailySnapshotTimeHH:MM:SS)timeMachine.snapshotsPerDay0timeMachine.logCatchUpFrequency0timeMachine.weeklySnapshotDaytimeMachine.monthlySnapshotDay0timeMachine.quarterlySnapshotMonthclone.clusterNameclusterNameclone.timezonetimezoneclone.profiles.*profiles.*ConfigMap Structure Used
Example Usage
ConfigMap with Defaults
Minimal Database CR (Provisioning)
Minimal Clone CR (Auto-snapshot)
Tested Provisioning All for DBs and cloning of PGSI
Traditional CR PGSI Provisioning Test :
Negative test : Bad software profile
Shows clear controller events with :
Scenarios:
Wrong profile name in the CR
ConfigMap only fills profiles when id and name are empty. If the user sets a name (even wrong), that value wins; ConfigMap does not fix it. Reconcile fails with “could not resolve profile … name=…”. Fix the name/id in the CR, or clear both to let defaults apply again.
Traditional CR, empty profiles (open-source) :
Empty id/name → operator picks NDB OOB profiles via its built-in filters (not arbitrary). MSSQL still needs software in the CR.
ConfigMap has a custom profile; user wanted “NDB default” so left CR empty
Empty CR + key in ConfigMap → ConfigMap value is applied, not NDB OOB.
Handle it: Omit profile keys from the org ConfigMap if you want OOB; use a slimmer ConfigMap for those teams; or put the real NDB profile name in the CR once known. There is no separate “use NDB OOB” flag.
Test Coverage: 20/20 Tests Passed
Key Validations
Documentation
README.mdwith ConfigMap usage, supported keys table, key precedence, and "How it works" (namespace, missing ConfigMap behavior)Testing Summary and Logs
1. Functionality Testing
1.1 PostgreSQL Provisioning with ConfigMap Defaults
Test File:
test-pgsi-1.yamlMinimal CR:
ConfigMap Defaults Applied:
auto_cluster_nested_697cec1c92fce9da8fbb8668UTCPOSTGRES_14.13_OOBDEFAULT_OOB_POSTGRESQL_NETWORKDEFAULT_POSTGRES_PARAMSDEFAULT_OOB_COMPUTE10 GBDEFAULT_OOB_BRASS_SLALogs (from defaulter webhook in operator pod):
Result: Database
pgsi-1successfully created on NDB with all defaults properly applied.1.2 MySQL Provisioning with ConfigMap Defaults
Test File:
test-mysql-1.yamlMinimal CR:
ConfigMap Defaults Applied:
MYSQL_8.0.37_OOBDEFAULT_OOB_MYSQL_NETWORKDEFAULT_MYSQL_PARAMSLogs (from defaulter webhook in operator pod):
Result: Database
mysql-1successfully created with database-specific MySQL profiles.1.3 Cloning with Auto-Snapshot Selection
Test File:
test-clone-pgsi-auto-snapshot.yamlMinimal Clone CR (no snapshot specified):
Operator Logs:
Result: Clone successfully created using auto-selected latest snapshot.
1.4 Cloning with Explicit Snapshot Name
Test File:
test-clone-pgsi-with-name.yamlOperator Logs:
Result: Clone created successfully using explicit snapshot name.
1.5 Cloning with Explicit Snapshot ID
Test File:
test-clone-pgsi-with-id.yamlResult: Clone created successfully using explicit snapshot UUID (no name resolution needed).
2. Backward Compatibility Testing
2.1 Provisioning WITHOUT ConfigMap (Traditional Flow)
Test File:
test-pgsi-2-backward-compat.yamlTraditional CR (no ConfigMap reference):
Result: Database
pgsi-2successfully created. Traditional provisioning flow works unchanged.Verification:
2.2 Cloning WITHOUT ConfigMap (Traditional Flow)
Test File:
test-clone-pgsi-1-backward-compat.yamlTraditional Clone CR:
Result: Clone
clone-pgsi-1-noconfigsuccessfully created. Traditional cloning flow works unchanged.2.3. Webhook validation (traditional flow, no ConfigMap)
Test: Provision CR with no defaultsConfigMapRef, no cluster, no ConfigMap, no size (minimal invalid body).
Actual admission response (this run):
Result: Webhook correctly enforces validation when ConfigMap is not used.
3. Override Behavior Testing
3.1 Provisioning with overrides
ConfigMap:
timezone: UTC,postgres.size: "10"— CR overrides withsize: 20,timezone: Asia/Kolkata.CR (
override-demo-provision):Admission (mutating webhook) — same
requestIDon each line; ConfigMap fetched (keysCount: 28); defaults applied for cluster/profiles/Time Machine — noApplied default size/Applied default timezone(CR wins for those).Operator log —
Database Provisioning(requestBodysanitized):{ "name": "override-demo-provision", "newDbServerTimeZone": "Asia/Kolkata", "timeMachineInfo": { "name": "override-demo-provision_TM" }, "actionArguments": [ { "name": "database_size", "value": "20" } ] }Result:
sizeandtimezoneon the CR override ConfigMappostgres.size/timezone.3.2 Cloning with overrides
ConfigMap:
timezone: UTC— CR overrides clonetimezone: Europe/London.CR (
override-demo-clone):Admission (mutating webhook) — ConfigMap applied for cluster/profiles; no
Applied default timezone(timezone set on CR).Operator log —
Database Cloning(requestBodysanitized):{ "name": "override-demo-clone", "timeZone": "Europe/London", "timeMachineId": "2715137d-079d-46a7-a828-66a3ba6d74ab", "snapshotId": "b1a5a9df-2872-4e4d-8545-be866f9edb9a", "nxClusterId": "1b243c01-56f4-4c8c-af00-f50ae117e04e" }Result: Clone
timezoneon the CR overrides ConfigMapUTC.4. Edge Cases and Error Handling
4.1 Non-existent ConfigMap — handled
Test:
docs/configmap-defaults-pr/14-neg-missing-configmap.yaml(defaultsConfigMapRef→ missing ConfigMap).Admission: Request denied (resource not created). Example:
Operator / admission logs (excerpt):
4.2 Empty ConfigMap (
data: {}) — handledTest: ConfigMap
ndb-empty-defaults-testwithdata: {}+Databaseneg-empty-cm-testreferencing it (minimal spec withsize: 10andclusterNameso admission passes).Admission logs (excerpt):
No ConfigMap keys are applied; standard defaulter still runs; reconcile proceeds if the rest of the spec is valid.
4.3 Invalid profile name — handled
Test:
15-neg-bad-profile-configmap.yaml+16-neg-bad-profile-database.yaml(postgres.profiles.software.name: THIS_SOFTWARE_PROFILE_DOES_NOT_EXIST).Admission (excerpt): ConfigMap fetched, bad software name applied from map, validation passes.
Reconcile / controller (excerpt):
Events (
kubectl describe database neg-bad-profile):4.4 Invalid source database name — handled
Test: Clone
neg-invalid-srcwithsourceDatabaseName: this-database-does-not-exist-xyz(no ConfigMap).Events:
4.5 Invalid snapshot name — handled
Test: Clone
neg-invalid-snapwith validsourceDatabaseName: pgsi-provision-dbcrandsnapshotName: this-snapshot-does-not-exist-xyz.Events (NDB may include time machine id in the message):
4.6 Webhook + ConfigMap (defaulter → validation)
Behavior (verified across tests above): Mutating webhook loads ConfigMap when referenced, applies defaults only to empty fields, then runs the standard defaulter. Validating webhook runs on the merged object (
ValidateCreate/combined_errin logs).Key validations
postgres.*) override generic keys when set and non-empty.RequestGenerationFailed/NDBRequestFailedEvents with operator/NDB text.defaultsConfigMapRefunchanged.DatabasenamespaceLCMConfig Jira test (ERA-57641)
Goal: Clone Postgres with LCM
additionalArguments(expireInDays,expiryDateTimezone,deleteDatabase) and confirm the controller builds a clone request withlcmConfigand does not panic on nilLcmConfig.YAML exercised
Repo path:
docs/configmap-defaults-pr/era57641-lcm-clone-test.yamloperator logs (verification)
Entered ndb_api.GenerateCloningRequestandReturning from ndb_api.GenerateCloningRequestfor the Database name used above.Database CloningwithrequestBodycontaininglcmConfigand nesteddatabaseLCMConfig/expiryDetailswith the three values (expireInDays,expiryDateTimezone,deleteDatabase).