
Enable operator mode in local monitoring tests #1837

Open
shreyabiradar07 wants to merge 9 commits into kruize:mvp_demo from shreyabiradar07:enable_operator_local_tests

Conversation

@shreyabiradar07
Contributor

@shreyabiradar07 shreyabiradar07 commented Mar 5, 2026

Description

This PR enables operator deployment mode in the local monitoring functional tests.

Fixes # (issue)

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

  • New Test X
  • Functional testsuite

Test Configuration

  • Kubernetes clusters tested on:

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Add support for deploying and cleaning up Kruize via the operator in local monitoring tests, configurable through the test runner CLI.

New Features:

  • Introduce operator-based deployment and cleanup paths for Kruize in functional/local monitoring tests.
  • Add CLI support in test_autotune.sh to enable operator mode and optionally specify a custom operator image.
  • Create utilities to patch the resource settings in the operator Custom Resource (CR) and to wait for operator-managed pods to become ready during tests.

Enhancements:

  • Extend existing cleanup logic to handle operator-based deployments alongside script-based deployments.

Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@sourcery-ai
Contributor

sourcery-ai Bot commented Mar 5, 2026

Reviewer's Guide

Enable local monitoring functional tests to deploy and clean up Kruize via an operator as an alternative to the existing script-based deployment, with CLI flags to control operator mode and image selection.

File-Level Changes

Change / Details / Files
Add operator-based deployment path for Kruize in common shell utilities, including namespace selection, repo cloning, CR patching, and readiness checks.
  • Introduce deploy_kruize_operator() to clone kruize-operator, patch the sample CR, deploy via make deploy-<cluster_type>, and wait for Kruize pods to become Ready.
  • Add kruize_operator_patch() to tune CR CPU/memory and storage resources for database, application, and persistent volume sections.
  • Add cleanup_kruize_operator() to undeploy the operator via make undeploy-<cluster_type>, with warnings on failure or missing repo.
tests/scripts/common/common_functions.sh
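
As a rough illustration of the kind of range-scoped `sed` patching that kruize_operator_patch() is described as doing, here is a minimal sketch. The function name `patch_component_resources` and the YAML layout it assumes (a component key followed later by a `volumeMounts:` key) are hypothetical and not taken from the PR; the real CR structure may differ.

```shell
#!/bin/bash
# Illustrative sketch only: the function name and the assumed YAML layout are
# hypothetical, not the actual kruize_operator_patch() implementation.

# Set the cpu and memory values for one component section of a CR-like YAML
# file. The original file is kept as <file>.bak.
patch_component_resources() {
    local file="$1" component="$2" cpu="$3" memory="$4"

    # The address range runs from the component key to the next volumeMounts
    # key, so every quoted cpu/memory value in between (requests and limits)
    # is rewritten, and other components are left untouched.
    sed -i.bak "/^ *${component}:/,/volumeMounts:/ {
s/cpu: \".*\"/cpu: \"${cpu}\"/
s/memory: \".*\"/memory: \"${memory}\"/
}" "${file}"
}
```

A call might look like `patch_component_resources config/samples/v1alpha1_kruize.yaml kruize-db 2 2Gi`. Keeping the `.bak` copy makes it possible to restore the sample CR after the test run.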
Wire operator-mode deployment into local monitoring tests, allowing tests to choose between operator and script-based setup.
  • Add USE_OPERATOR and KRUIZE_OPERATOR_IMAGE variables with defaults to local monitoring test script.
  • Update local_monitoring_tests() setup flow to conditionally call deploy_kruize_operator when USE_OPERATOR=1, falling back to existing patch+setup path otherwise.
tests/scripts/local_monitoring_tests/local_monitoring_tests.sh
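
The conditional setup flow described above could look roughly like the sketch below. `setup_kruize` and `script_based_setup` are hypothetical names, and the function bodies are stubs that only print which path was taken; only `USE_OPERATOR`, `KRUIZE_OPERATOR_IMAGE`, and `deploy_kruize_operator` come from the PR description.

```shell
#!/bin/bash
# Illustrative sketch only; the bodies below are stubs, not the PR's code.

# Defaults: operator mode off, no custom operator image.
USE_OPERATOR="${USE_OPERATOR:-0}"
KRUIZE_OPERATOR_IMAGE="${KRUIZE_OPERATOR_IMAGE:-}"

# Stub standing in for the operator deployment path.
deploy_kruize_operator() {
    echo "operator deploy (image: ${KRUIZE_OPERATOR_IMAGE:-default})"
}

# Hypothetical stub standing in for the existing patch + setup path.
script_based_setup() {
    echo "script-based patch and setup"
}

# Choose the deployment path based on USE_OPERATOR.
setup_kruize() {
    if [ "${USE_OPERATOR}" = "1" ]; then
        deploy_kruize_operator
    else
        script_based_setup
    fi
}
```

Keeping the branch in one place means the rest of the test flow does not need to know which deployment mode is active.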
Extend autotune test runner CLI to support enabling operator mode and passing an optional operator image.
  • Update usage() text to document -o flag and operator image usage examples.
  • Extend getopts to handle -o and a long --operator-image option, parsing and exporting USE_OPERATOR and KRUIZE_OPERATOR_IMAGE.
  • Allow -o with optional immediate image argument, defaulting to operator mode with no explicit image when omitted.
tests/test_autotune.sh
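
Since `getopts` has no native notion of an optional option argument, one way to get the "-o with optional image" behavior is a small manual loop. The sketch below is an assumption about how this could be done, not the parsing code in test_autotune.sh; the function name `parse_operator_args` and the `--operator-image=<image>` form are hypothetical.

```shell
#!/bin/bash
# Illustrative sketch only; not the actual getopts handling in test_autotune.sh.

# Scan the arguments for -o [image] or --operator-image=<image> and set
# USE_OPERATOR and KRUIZE_OPERATOR_IMAGE accordingly. Other flags are ignored.
parse_operator_args() {
    USE_OPERATOR=0
    KRUIZE_OPERATOR_IMAGE=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -o)
                USE_OPERATOR=1
                # Treat the next word as the image only if it is not a flag.
                if [ "$#" -gt 1 ] && [ "${2#-}" = "$2" ]; then
                    KRUIZE_OPERATOR_IMAGE="$2"
                    shift
                fi
                ;;
            --operator-image=*)
                USE_OPERATOR=1
                KRUIZE_OPERATOR_IMAGE="${1#--operator-image=}"
                ;;
        esac
        shift
    done
    export USE_OPERATOR KRUIZE_OPERATOR_IMAGE
}
```

With this approach, `-o` followed directly by another flag enables operator mode without an explicit image, while `-o <image>` enables it with that image.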
Harden cleanup logic to correctly handle operator-based deployments in autotune_cleanup().
  • Adjust autotune_cleanup() repo pushd logic to enter kruize-operator directory when USE_OPERATOR is set.
  • Add operator-aware cleanup branch that runs make undeploy-<cluster_type> in kruize-operator if present, falling back to existing deploy.sh -t cleanup when operator mode is not used.
  • Improve logging and warnings around missing kruize-operator directory or undeploy failures, while preserving prometheus cleanup behavior.
tests/scripts/common/common_functions.sh
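
A minimal sketch of an operator-aware cleanup step that warns instead of failing is shown below. It runs the undeploy target in a subshell so the caller's working directory is never changed, which is one alternative to a pushd/popd pair. The function name `cleanup_operator_sketch` and the `MAKE_CMD` override are hypothetical and exist only to make the example testable.

```shell
#!/bin/bash
# Illustrative sketch only; not the actual autotune_cleanup() code.

# Run "make undeploy-<cluster_type>" inside the operator repo directory.
# Missing directories and undeploy failures produce warnings, not errors.
cleanup_operator_sketch() {
    local repo_dir="$1" cluster_type="$2"

    if [ ! -d "${repo_dir}" ]; then
        echo "WARNING: ${repo_dir} not found, skipping operator undeploy"
        return 0
    fi

    # The subshell confines the cd, so no directory-stack bookkeeping is needed.
    if ! ( cd "${repo_dir}" && ${MAKE_CMD:-make} "undeploy-${cluster_type}" ); then
        echo "WARNING: operator undeploy failed for cluster type ${cluster_type}"
    fi
    return 0
}
```

Any prometheus cleanup that follows would still run, because the function always returns success.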


Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07 shreyabiradar07 marked this pull request as ready for review March 12, 2026 08:42
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high-level feedback:

  • In autotune_cleanup, the operator branch sets OPERATOR_REPO_DIR and pushd into it twice (once before the cleanup check and again inside the USE_OPERATOR block) without corresponding popds, which can confuse the directory stack; consider consolidating the directory change to a single pushd/popd pair for the operator path.
  • In deploy_kruize_operator, the sed -i edits on config/samples/v1alpha1_kruize.yaml (for image/cluster_type/namespace) permanently mutate the sample CR on each run, unlike kruize_operator_patch which creates a backup; it would be safer to patch a copy or restore from the backup to avoid cumulative changes between test runs.
  • The repeated wait/timeout loops for kruize-db, kruize, and kruize-ui pods in deploy_kruize_operator share almost identical logic; extracting a small helper (e.g., wait_for_pod_ready <label>) would reduce duplication and make future changes to the wait behavior easier.
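
The suggested wait helper could be sketched as follows. This is a hypothetical implementation, not code from the PR: the argument order, the default timeout and interval, and the `KUBECTL` override (used so the function can be exercised without a cluster) are all assumptions. It polls `kubectl wait` in a loop because `kubectl wait` can fail immediately when no matching pod exists yet.

```shell
#!/bin/bash
# Illustrative sketch only; argument order and defaults are assumptions.

# Poll until pods matching a label selector are Ready or the timeout expires.
# Usage: wait_for_pod_ready <label-selector> <namespace> [timeout_s] [interval_s]
wait_for_pod_ready() {
    local label="$1" namespace="$2" timeout="${3:-300}" interval="${4:-5}"
    local elapsed=0

    while [ "${elapsed}" -lt "${timeout}" ]; do
        # A short per-attempt timeout keeps each poll cheap; the loop handles
        # the case where the pod has not been created yet.
        if ${KUBECTL:-kubectl} wait --for=condition=Ready pod \
                -l "${label}" -n "${namespace}" --timeout=1s > /dev/null 2>&1; then
            echo "Pods with label ${label} are Ready"
            return 0
        fi
        sleep "${interval}"
        elapsed=$((elapsed + interval))
    done

    echo "ERROR: timed out after ${timeout}s waiting for pods with label ${label}" >&2
    return 1
}
```

The three near-identical loops would then reduce to three calls, for example `wait_for_pod_ready app=kruize-db "${namespace}"`, where the label selector shown here is a guess at what the pods carry. The elapsed time is approximate because the per-attempt wait is not counted.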
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `autotune_cleanup`, the operator branch sets `OPERATOR_REPO_DIR` and `pushd` into it twice (once before the cleanup check and again inside the `USE_OPERATOR` block) without corresponding `popd`s, which can confuse the directory stack; consider consolidating the directory change to a single `pushd`/`popd` pair for the operator path.
- In `deploy_kruize_operator`, the `sed -i` edits on `config/samples/v1alpha1_kruize.yaml` (for image/cluster_type/namespace) permanently mutate the sample CR on each run, unlike `kruize_operator_patch` which creates a backup; it would be safer to patch a copy or restore from the backup to avoid cumulative changes between test runs.
- The repeated wait/timeout loops for `kruize-db`, `kruize`, and `kruize-ui` pods in `deploy_kruize_operator` share almost identical logic; extracting a small helper (e.g., `wait_for_pod_ready <label>`) would reduce duplication and make future changes to the wait behavior easier.
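
One way to avoid mutating the sample CR, as suggested above, is to write the patched result to a temporary copy and deploy from that. The sketch below assumes hypothetical `image:` and `namespace:` keys in the CR; the real field names in `config/samples/v1alpha1_kruize.yaml` may differ, and `patch_cr_copy` is not a function from the PR.

```shell
#!/bin/bash
# Illustrative sketch only; the CR keys "image:" and "namespace:" are assumed.

# Write a patched copy of the sample CR to a temp file and print its path.
# The sample file itself is never modified.
patch_cr_copy() {
    local sample="$1" image="$2" namespace="$3"
    local copy

    copy="$(mktemp)" || return 1

    # "|" is used as the sed delimiter because image names contain "/".
    if ! sed -e "s|^\( *image:\).*|\1 \"${image}\"|" \
             -e "s|^\( *namespace:\).*|\1 \"${namespace}\"|" \
             "${sample}" > "${copy}"; then
        rm -f "${copy}"
        return 1
    fi

    echo "${copy}"
}
```

Because each run starts from the unmodified sample, changes cannot accumulate between test runs, and the temporary copy can simply be deleted during cleanup.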


Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07 shreyabiradar07 self-assigned this Apr 8, 2026
@shreyabiradar07 shreyabiradar07 added this to the Kruize 0.11.0 Release milestone Apr 8, 2026
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07 shreyabiradar07 moved this to In Progress in Monitoring Apr 21, 2026
@shreyabiradar07 shreyabiradar07 added the test and kruize-local labels Apr 21, 2026
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
Signed-off-by: Shreya Biradar <shbirada@ibm.com>
@shreyabiradar07
Contributor Author

Initial local monitoring functional test results in operator mode with latest test builds.

./tests/test_autotune.sh -c openshift -i quay.io/dinogun210/autotune_operator:0.10-rc1 -o quay.io/chandra25ms/kruize-operator:0.0.5-rc1 --testsuite=local_monitoring_tests

*********************************************************************************
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Overall summary of the tests ~~~~~~~~~~~~~~~~~~~~~~~
Total time taken to perform the test 8640 seconds
Total Number of test suites performed 1
Total Number of tests performed 232
Total Number of tests passed 229
Total Number of tests failed 3

Check below logs for failed test cases:
		                        local_monitoring_tests

*********************************************************************************

kruize_local_monitoring_tests_pr_1837.zip

# Update kruize-db resources
${SED_INPLACE} -i '/kruize-db:/,/volumeMounts:/ {
/requests:/,/limits:/ {
s/cpu: ".*"/cpu: "2"/g
Contributor Author


@chandrams Based on resource usage in past OpenShift test runs, how much should the Kruize and Kruize DB resources be increased to avoid OOMKilled errors?

Context:

  • Currently, the kruize_operator_patch() function raises both the DB and Kruize resources to 2 CPU cores and 2Gi of memory each to avoid OOM errors.
  • Default OpenShift (ITCP) clusters typically use m6a.xlarge instances (4 cores, 16GiB total).
  • With the increased values (2+2=4 cores, 2+2=4GiB), the two components request 100% of the CPU and 25% of the memory on such a minimal cluster configuration.
  • This may prevent users from running the tests locally on minimal configurations.
  • With both m6a.xlarge (4 cores, 16GiB total) and m6a.2xlarge (8 cores, 32GiB total) instances, the Kruize pod is not deployed and the following error is reported:
0/1 nodes are available: 1 Insufficient cpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod
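
To make the CPU arithmetic above concrete, here is a small sketch that computes the headroom left on a node in millicores. The function name `cpu_headroom_millicores` is hypothetical. Note that the scheduler compares requests against the node's allocatable CPU, which is normally lower than the instance's core count and is shared with every other pod on the node, so a negative or zero result here is consistent with an "Insufficient cpu" event.

```shell
#!/bin/bash
# Illustrative sketch only; all values are passed in as integer millicores.

# Print allocatable CPU minus the sum of the given CPU requests.
# Usage: cpu_headroom_millicores <allocatable_m> <request_m>...
cpu_headroom_millicores() {
    local allocatable="$1" total=0 request
    shift
    for request in "$@"; do
        total=$((total + request))
    done
    echo $((allocatable - total))
}

# The allocatable value for a real node can be read with, for example:
#   kubectl get node <node-name> -o jsonpath='{.status.allocatable.cpu}'
```

For a 4-core node, `cpu_headroom_millicores 4000 2000 2000` leaves no headroom even before system reservations and other pods are counted.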


Labels

kruize-local (Tag for mentioning all the PRs and issues raised that cover the Kruize local monitoring use case), test

Projects

Status: In Progress


1 participant