Enable operator mode in local monitoring tests #1837
Open
shreyabiradar07 wants to merge 9 commits into kruize:mvp_demo from
Conversation
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Contributor
Reviewer's Guide
Enable local monitoring functional tests to deploy and clean up Kruize via an operator as an alternative to the existing script-based deployment, with CLI flags to control operator mode and image selection.
Contributor
Hey - I've left some high-level feedback:
- In `autotune_cleanup`, the operator branch sets `OPERATOR_REPO_DIR` and `pushd` into it twice (once before the cleanup check and again inside the `USE_OPERATOR` block) without corresponding `popd`s, which can confuse the directory stack; consider consolidating the directory change to a single `pushd`/`popd` pair for the operator path.
- In `deploy_kruize_operator`, the `sed -i` edits on `config/samples/v1alpha1_kruize.yaml` (for image/cluster_type/namespace) permanently mutate the sample CR on each run, unlike `kruize_operator_patch`, which creates a backup; it would be safer to patch a copy or restore from the backup to avoid cumulative changes between test runs.
- The repeated wait/timeout loops for `kruize-db`, `kruize`, and `kruize-ui` pods in `deploy_kruize_operator` share almost identical logic; extracting a small helper (e.g., `wait_for_pod_ready <label>`) would reduce duplication and make future changes to the wait behavior easier.
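The helper suggested in the last point could look roughly like the sketch below. It is not taken from the PR: the argument order, the default timeout and interval, and the example label and namespace are all assumptions made for illustration. It polls `kubectl` for the `Ready` condition of the first pod matching a label selector and gives up after a timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): poll until the first pod matching a
# label selector reports Ready, or give up after a timeout.
# Argument order and default values are illustrative assumptions.
wait_for_pod_ready() {
    local label="$1"                 # label selector, e.g. app=kruize (assumed)
    local namespace="$2"             # namespace to look in
    local timeout_secs="${3:-300}"   # overall timeout in seconds
    local interval_secs="${4:-5}"    # delay between polls (must be > 0)
    local elapsed=0
    local ready

    while [ "${elapsed}" -lt "${timeout_secs}" ]; do
        # Read the Ready condition of the first matching pod; errors such as
        # "no pods yet" are swallowed so the loop simply retries.
        ready=$(kubectl get pods -n "${namespace}" -l "${label}" \
            -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].status}' \
            2>/dev/null || true)
        if [ "${ready}" = "True" ]; then
            echo "Pod with label ${label} is ready"
            return 0
        fi
        sleep "${interval_secs}"
        elapsed=$((elapsed + interval_secs))
    done

    echo "Timed out after ${timeout_secs}s waiting for pod with label ${label}" >&2
    return 1
}
```

Each of the three waits would then reduce to one call, for example `wait_for_pod_ready "app=kruize-db" "${NAMESPACE}" 300 || exit 1`, where the label value and the `NAMESPACE` variable are placeholders rather than names confirmed by the PR.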
Contributor
Author
Initial local monitoring functional test results in operator mode with latest test builds.
shreyabiradar07 commented Apr 23, 2026
    # Update kruize-db resources
    ${SED_INPLACE} -i '/kruize-db:/,/volumeMounts:/ {
        /requests:/,/limits:/ {
            s/cpu: ".*"/cpu: "2"/g
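To see what nested `sed` address ranges like these select, the same pattern can be run against a small file while writing the result to a separate copy instead of editing in place. The sketch below is an assumption-laden illustration, not the PR's code: `patch_db_cpu` is a hypothetical name, and the YAML layout it expects (a `kruize-db:` key followed by `requests:`, `limits:`, and `volumeMounts:`) is a guess at the shape the ranges target.

```shell
#!/usr/bin/env bash
# Sketch (not from the PR): write a patched copy of a CR instead of editing
# the original in place. patch_db_cpu and the expected YAML layout are
# illustrative assumptions.
patch_db_cpu() {
    local src="$1"   # original file, left untouched
    local dst="$2"   # patched copy
    local cpu="$3"   # new CPU value, e.g. 2

    # Outer range: from the kruize-db: line to the next volumeMounts: line.
    # Inner range: from a requests: line to the next limits: line, inclusive.
    # Only cpu lines that fall inside both ranges are rewritten.
    sed '/kruize-db:/,/volumeMounts:/ {
        /requests:/,/limits:/ {
            s/cpu: ".*"/cpu: "'"${cpu}"'"/
        }
    }' "${src}" > "${dst}"
}
```

With a layout where `requests:` precedes `limits:`, the inner range ends on the `limits:` line itself, so a `cpu` value nested under `limits:` falls outside it and is left unchanged. Because the output goes to `dst`, repeated test runs always start from the same unmodified source file.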
Contributor
Author
@chandrams What resource increases do you recommend for Kruize and Kruize-db on OpenShift, based on resource usage patterns from past testing, to avoid OOMKilled errors?
Context:
- Currently, the `kruize_operator_patch()` function increases both the DB and Kruize resources to 2Gi memory and 2 CPU cores each to avoid OOM errors.
- Default OpenShift (ITCP) clusters typically use `m6a.xlarge` instances (4 cores, 16GiB total).
- With the increased values (2+2=4 cores, 2+2=4GiB), this consumes 100% of CPU and 25% of memory on minimal cluster configs.
- This may prevent users from running tests on minimal configurations locally.
- Facing the error below with `m6a.xlarge` (4 cores, 16GiB total) and `m6a.2xlarge` (8 cores, 32GiB total) instances, and the Kruize pod is not deployed:

    0/1 nodes are available: 1 Insufficient cpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod
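The percentages quoted in the context list follow from dividing the combined requests by the node's capacity. A minimal sketch of that arithmetic is below; `request_share_pct` is a hypothetical helper introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the PR): percentage of a node's capacity
# consumed by a total request, using integer arithmetic.
request_share_pct() {
    local requested="$1"   # total requested, e.g. CPU cores or GiB
    local capacity="$2"    # node capacity in the same unit
    echo $(( requested * 100 / capacity ))
}

# Figures from the comment: 2 + 2 cores and 2 + 2 GiB on a 4-core, 16 GiB node.
echo "cpu: $(request_share_pct $((2 + 2)) 4)%"        # prints "cpu: 100%"
echo "memory: $(request_share_pct $((2 + 2)) 16)%"    # prints "memory: 25%"
```

A node's allocatable CPU is generally lower than its raw capacity, and pods already running on it hold their own requests, so requests that add up to 100% of capacity would not be expected to fit, which is consistent with the `Insufficient cpu` message. The `Allocatable` and `Allocated resources` sections of `kubectl describe node <node-name>` show how much is actually available.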
Description
This PR enables operator deployment mode in the local monitoring functional tests.
Summary by Sourcery
Add support for deploying and cleaning up Kruize via the operator in local monitoring tests, configurable through the test runner CLI.