Skip to content

OADP-4855 Add cachePVC support for Kopia cache volume#2090

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:oadp-devfrom
mpryc:oadp-dev-copia-cache-update
Mar 2, 2026
Merged

OADP-4855 Add cachePVC support for Kopia cache volume#2090
openshift-merge-bot[bot] merged 1 commit intoopenshift:oadp-devfrom
mpryc:oadp-dev-copia-cache-update

Conversation

@mpryc
Copy link
Copy Markdown
Contributor

@mpryc mpryc commented Feb 13, 2026

Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent, enabling dedicated PVC-backed cache volumes for Kopia repository cache during data movement restore operations. This prevents ephemeral storage exhaustion on nodes with limited root filesystem space during concurrent restores.

  • Add CachePVCConfig field to NodeAgentConfigMapSettings using upstream types.CachePVC directly to prevent field name drift with Velero
  • Serialize cachePVC config into the node-agent ConfigMap so Velero provisions per-restore cache PVCs instead of using pod root filesystem
  • Add default StorageClass resolution when cachePVC is configured without explicit storageClass, falling back to the cluster's default SC
  • Pass backup-repository-configmap to node-agent DaemonSet for cache volume size calculation (cacheLimitMB * 1.2)
  • Add RBAC for StorageClass list/get/watch
  • Add unit tests for ConfigMap assembly, default SC resolution, isNodeAgentCMRequired, and DaemonSet build with backup-repo configmap
  • Add user documentation with configuration examples

Why the changes were made

Fix OADP-4855

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for configuring dedicated cache volumes for Kopia repository cache during restore operations, including storage class and resident threshold settings.
    • Added CACertRef field for CA certificate references; CACert is now deprecated.
  • Documentation

    • Added configuration guide for cache volume PVC setup.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 13, 2026

Walkthrough

This PR introduces support for configurable cache PVCs for Kopia repository caching during restore operations. It adds a CachePVCConfig field to the DataProtectionApplication CRD, implements automatic default StorageClass resolution, updates RBAC permissions to read storageclasses, and includes comprehensive tests and documentation.

Changes

Cohort / File(s) Summary
API Type and CRD Definitions
api/v1alpha1/dataprotectionapplication_types.go, bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml, config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml
Adds CachePVCConfig field to NodeAgentConfigMapSettings with residentThresholdInMB and storageClass properties, enabling per-restore cache PVC configuration. Updated DeepCopyInto to handle deep copying of CachePVCConfig.
RBAC and Permissions
bundle/manifests/oadp-operator.clusterserviceversion.yaml, config/rbac/role.yaml, internal/controller/dataprotectionapplication_controller.go
Grants ClusterRole manager-role permissions to get, list, and watch storageclasses resources via kubebuilder RBAC marker and manifest rules.
StorageClass Deprecation CRD
bundle/manifests/velero.io_backupstoragelocations.yaml, config/crd/bases/velero.io_backupstoragelocations.yaml
Deprecates caCert field in BackupStorageLocation objectStorage schema and introduces new caCertRef object field (with key, name, optional properties) to reference CA certificates via Kubernetes Secret.
Controller Implementation
internal/controller/nodeagent.go
Adds getDefaultStorageClass() method to resolve cluster default StorageClass by annotation. Extends isNodeAgentCMRequired to check CachePVCConfig. Updates updateNodeAgentCM to auto-resolve and inject default StorageClass into CachePVCConfig when empty. Updates buildNodeAgentDaemonset to propagate backup repository config map name.
Test Coverage
internal/controller/nodeagent_test.go
Adds test functions for KopiaRepoOptions handling, CachePVCConfig-based NodeAgent ConfigMap creation, default StorageClass resolution validation, and multi-scenario StorageClass injection tests.
Documentation
docs/cache_volume_pvc_configuration.md
New documentation file detailing cache volume PVC configuration options, PVC lifecycle, size calculation logic (cacheLimitMB \* 1.2, rounded up to nearest GB), and multiple YAML configuration examples with backward compatibility notes.

Sequence Diagram

sequenceDiagram
    participant DPA as DPA Reconciliation
    participant SC as StorageClass API
    participant CM as NodeAgent ConfigMap
    participant DS as NodeAgent DaemonSet

    DPA->>DPA: Check if NodeAgent ConfigMap required<br/>(CachePVCConfig present?)
    
    alt CachePVCConfig has no StorageClass
        DPA->>SC: Query default StorageClass<br/>(via annotation)
        SC-->>DPA: Return default SC name or not found
        alt Default StorageClass found
            DPA->>DPA: Inject default into<br/>CachePVCConfig copy
        else No default found
            DPA->>DPA: Nil or disable CachePVCConfig
        end
    end
    
    DPA->>CM: Create/Update NodeAgent ConfigMap<br/>with CachePVCConfig
    CM-->>DPA: ConfigMap ready with cache settings
    
    DPA->>DS: Build NodeAgent DaemonSet<br/>with backup repo config map name
    DS-->>DPA: DaemonSet configured
    
    DPA->>DS: Deploy DaemonSet to cluster
    DS-->>DPA: Deployment complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test code contains panic-prone bare type assertion, missing storagev1 scheme registration, and lacks test case for multiple default StorageClasses. Replace bare type assertion with comma-ok pattern, add storagev1.AddToScheme() to getSchemeForFakeClient(), add test case for multiple defaults, and add messages to all assertions.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding cachePVC support for Kopia cache volumes, which aligns with the primary objective of the PR.
Description check ✅ Passed The PR description provides comprehensive context including what was changed, why it matters, and references the issue (OADP-4855). However, it lacks a dedicated 'How to test' section as specified in the template.
Stable And Deterministic Test Names ✅ Passed All test names in the pull request are stable, deterministic, and descriptive with no dynamic values.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 13, 2026

/hold needs some more testing.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2026
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/nodeagent.go (1)

400-418: ⚠️ Potential issue | 🟡 Minor

Add explicit guard for empty backup-repository config map name to match pattern in velero.go.

The code passes backupRepoConfigMapName directly to install.WithBackupRepoConfigMap() without verifying it's non-empty. While the name is conditionally populated, velero.go (line 435) shows the pattern used elsewhere in the codebase: if cmName.Name != "" before constructing the flag. Add the same guard before calling install.WithBackupRepoConfigMap() to ensure consistency and protect against unintended empty flag injection.

🤖 Fix all issues with AI agents
In `@bundle/manifests/oadp-operator.clusterserviceversion.yaml`:
- Around line 1055-1061: The RBAC rule incorrectly lists the resource
"persistentvolumerclaims" (typo) so it grants no access; update the resources
array used with the apiGroup "" to include the correct Kubernetes resource name
"persistentvolumeclaims" (and keep "persistentvolumes"), ensuring the
resources/verbs block around the rule that contains "persistentvolumes" and
"get" is updated to use "persistentvolumeclaims" instead of the misspelled
token.

In `@config/crd/bases/_.yaml`:
- Around line 1-13: The file _.yaml is an unused CRD scaffold containing empty
placeholders (apiVersion, kind, names, scope, versions) and should be deleted;
remove config/crd/bases/_.yaml from the repo (or move it to an archival
location) so dead scaffold CRD content is not kept alongside real CRDs
referenced by config/crd/kustomization.yaml.
🧹 Nitpick comments (1)
internal/controller/nodeagent.go (1)

130-145: Verify default StorageClass detection covers legacy annotation if required.
If any supported clusters still mark defaults with the beta annotation, this helper could miss them and cachePVC could be disabled. Confirm GA-only is acceptable or add a fallback check.

🔧 Optional fallback for legacy default annotation
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+			sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
 			return sc.Name, nil
 		}

Comment thread bundle/manifests/oadp-operator.clusterserviceversion.yaml Outdated
Comment thread config/crd/bases/_.yaml Outdated
Comment on lines +1 to +13
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.5
spec:
group: ""
names:
kind: ""
plural: ""
scope: ""
versions: null
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and see its content
find . -name "_.yaml" -path "*/crd/bases/*" | head -20

Repository: openshift/oadp-operator

Length of output: 92


🏁 Script executed:

# Search for references to this file in kustomization.yaml files
rg "_.yaml" -g "kustomization.yaml" -B 2 -A 2

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Search for references to config/crd/bases directory in kustomization files
rg "config/crd/bases" -g "kustomization.yaml" -B 2 -A 2

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Check if config/crd/bases directory is included in any kustomization
rg "config/crd" -g "kustomization.yaml" -l

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Check kustomization directory structure to understand build patterns
find . -name "kustomization.yaml" -type f | head -20

Repository: openshift/oadp-operator

Length of output: 566


🏁 Script executed:

# Search for any reference to bases/ directory in kustomization files more broadly
rg "bases/" -g "kustomization.yaml" -B 2 -A 2 | head -50

Repository: openshift/oadp-operator

Length of output: 2596


🏁 Script executed:

# Check if there's any .gitignore or build configuration that might exclude this
cat config/crd/.gitignore 2>/dev/null || echo "no .gitignore in crd directory"

Repository: openshift/oadp-operator

Length of output: 97


🏁 Script executed:

# Verify the file content matches what's in the review
cat -n config/crd/bases/_.yaml

Repository: openshift/oadp-operator

Length of output: 388


Remove unused placeholder CRD file.

The _.yaml file is an empty scaffold that should either be populated with a real CRD or removed entirely. It is currently not referenced in config/crd/kustomization.yaml or any build configuration, so it poses no immediate risk; however, it should be cleaned up as dead code.

🤖 Prompt for AI Agents
In `@config/crd/bases/_.yaml` around lines 1 - 13, The file _.yaml is an unused
CRD scaffold containing empty placeholders (apiVersion, kind, names, scope,
versions) and should be deleted; remove config/crd/bases/_.yaml from the repo
(or move it to an archival location) so dead scaffold CRD content is not kept
alongside real CRDs referenced by config/crd/kustomization.yaml.

@mpryc mpryc force-pushed the oadp-dev-copia-cache-update branch 3 times, most recently from 3081300 to 153dbff Compare February 13, 2026 16:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/cache_volume_pvc_configuration.md`:
- Around line 32-34: Add a language tag to the fenced code block containing
"cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)" to satisfy
MD040; change the opening fence from ``` to something like ```text (or ```none)
so the block reads ```text followed by the existing line and the closing ```,
ensuring the fenced block has an explicit language tag.

In `@internal/controller/nodeagent.go`:
- Around line 130-145: The getDefaultStorageClass function currently only checks
the "storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.

Comment on lines +32 to +34
```
cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

Markdownlint (MD040) flags fenced blocks without a language.

Suggested fix
-```
+```text
 cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
-```
+```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/cache_volume_pvc_configuration.md` around lines 32 - 34, Add a language
tag to the fenced code block containing "cacheVolumeSize = cacheLimitMB * 1.2
(rounded up to the nearest GB)" to satisfy MD040; change the opening fence from
``` to something like ```text (or ```none) so the block reads ```text followed
by the existing line and the closing ```, ensuring the fenced block has an
explicit language tag.

Comment on lines +130 to +145
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for default StorageClass annotation handling to keep behavior consistent.
rg -n "is-default-class" -S

Repository: openshift/oadp-operator

Length of output: 2092


🏁 Script executed:

#!/bin/bash
# Also search for StorageClass handling in general
rg -n "StorageClass" --type go -A 2 -B 2 | head -100

Repository: openshift/oadp-operator

Length of output: 7344


🏁 Script executed:

#!/bin/bash
# Check the exact file and context
cat -n internal/controller/nodeagent.go | sed -n '125,150p'

Repository: openshift/oadp-operator

Length of output: 1356


🌐 Web query:

Kubernetes StorageClass default annotations storageclass.beta.kubernetes.io/is-default-class

💡 Result:

storageclass.beta.kubernetes.io/is-default-class is the legacy (beta) annotation that was used to mark a StorageClass as the cluster default (so PVCs that omit storageClassName get this class). It appears in older Kubernetes examples/docs. [3]

Today, the supported/current annotation is:

metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"

Kubernetes treats any other value (or absence) as not default. [2]

To switch defaults:

# remove default from an existing class
kubectl patch storageclass standard -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# set a new default
kubectl patch storageclass gold -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

[2]

Notes:

  • If more than one StorageClass is marked default, Kubernetes uses the most recently created default class for PVCs that omit storageClassName. [1]
  • Defaulting is applied by the DefaultStorageClass admission controller. [4]

Sources: Kubernetes StorageClasses concept doc [1], “Change the default StorageClass” task doc [2], older Kubernetes blog showing the beta annotation [3], dynamic provisioning doc [4].


Consider supporting the legacy StorageClass default annotation for compatibility.

While Kubernetes currently uses storageclass.kubernetes.io/is-default-class, the legacy annotation storageclass.beta.kubernetes.io/is-default-class was historically supported. Some older clusters may still use it, causing the default StorageClass lookup to fail and unexpectedly disabling cache PVC. Adding support for both annotations is a simple, non-invasive defensive measure. Note that similar code elsewhere in this project (e.g., tests/e2e/lib/virt_storage_helpers.go) also uses only the stable annotation.

Proposed fix
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+			sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
 			return sc.Name, nil
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
🤖 Prompt for AI Agents
In `@internal/controller/nodeagent.go` around lines 130 - 145, The
getDefaultStorageClass function currently only checks the
"storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.

@mpryc mpryc force-pushed the oadp-dev-copia-cache-update branch from 153dbff to d47c738 Compare February 23, 2026 12:25
@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 23, 2026

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2026
@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 23, 2026

Below is the 100% AI driven manual test plan and execution performed by the Claude Opus 4.6 model

OADP-4855 Manual Test Plan: cachePVC Support for Kopia Cache Volume

Overview

This test plan validates the cachePVC feature added in commit d47c738, which enables dedicated PVC-backed cache storage for Kopia during restore operations. Tests are executed on an IBM Cloud VPC cluster with the OADP operator already deployed.

Cluster Environment

  • Platform: IBM Cloud VPC (ROKS)
  • Default StorageClass: ibmc-vpc-block-10iops-tier
  • Alternative StorageClass: ibmc-vpc-block-5iops-tier
  • Namespace: openshift-adp
  • Credentials Secret: cloud-credentials-1
  • BSL Bucket: mpryc-nac-first

Prerequisites

  • OADP operator is deployed with the OADP-4855 commit
  • Cloud credentials secret cloud-credentials-1 exists in openshift-adp
  • No DPA currently exists (clean slate)

Execution Summary

Test Case Description Result
TC1 cachePVC with explicit StorageClass PASSED
TC2 cachePVC with default StorageClass (empty object) PASSED
TC3 cachePVC with residentThresholdInMB PASSED
TC4 Backward compatibility (no cachePVC) PASSED
TC5 Add cachePVC to existing DPA via patch PASSED
TC6 Remove cachePVC from existing DPA PASSED
TC7 Non-existent StorageClass PASSED
TC8 End-to-end backup and restore with cachePVC PASSED

Execution Date: 2026-02-23

Key Findings

  • The node-agent ConfigMap is named node-agent-<dpa-name> (e.g., node-agent-nac-testing-dpa), not node-agent-config as might be expected.
  • Default StorageClass auto-resolution works correctly (TC2 resolved ibmc-vpc-block-10iops-tier).
  • The operator does not validate StorageClass existence for explicit values (TC7) -- validation occurs at restore time by Velero.
  • Adding/removing cachePVC via patch or replace correctly triggers re-reconciliation and ConfigMap updates (TC5, TC6).
  • TC8 backup/restore cycle completed successfully with cachePVC configured. No cache PVCs were dynamically created because the test workload (nginx deployment) had no PVCs to trigger fs-backup/restore data movement. Cache PVC provisioning occurs only during DataDownload/PodVolumeRestore operations on actual PVC-backed volumes.

Test Case 1: DPA with cachePVC using explicit StorageClass

Objective: Verify that specifying cachePVC with an explicit storageClass creates the node-agent ConfigMap with the correct cachePVC configuration and deploys the node-agent DaemonSet.

DPA Configuration (/tmp/tc1_dpa.yaml):

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: nac-testing-dpa
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
      cachePVC:
        storageClass: ibmc-vpc-block-5iops-tier
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: mpryc-nac-first
          prefix: default-nac-bucket
        config:
          region: eu-central-1
        credential:
          key: default
          name: cloud-credentials-1

Steps:

  1. Apply the DPA: oc apply -f /tmp/tc1_dpa.yaml
  2. Wait for DPA to reconcile: oc wait --for=condition=Reconciled dpa/nac-testing-dpa -n openshift-adp --timeout=120s
  3. Verify DPA status: oc get dpa nac-testing-dpa -n openshift-adp -o jsonpath='{.status.conditions}'
  4. Check node-agent ConfigMap exists: oc get configmap node-agent-config -n openshift-adp -o yaml
  5. Verify ConfigMap contains cachePVC with storageClass: ibmc-vpc-block-5iops-tier
  6. Verify node-agent DaemonSet is running: oc get daemonset node-agent -n openshift-adp
  7. Verify node-agent pods are Ready: oc get pods -n openshift-adp -l name=node-agent

Expected Results:

  • DPA reconciles successfully (Reconciled=True)
  • ConfigMap node-agent-config exists with cachePVC.storageClass set to ibmc-vpc-block-5iops-tier
  • node-agent DaemonSet is running with all pods Ready

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • DPA reconciled: Reconciled=True, VeleroReady=True, NodeAgentReady=True (3/3 pods)
  • ConfigMap node-agent-nac-testing-dpa created with data: {"cachePVC":{"storageClass":"ibmc-vpc-block-5iops-tier"},"privilegedFsBackup":true}
  • Operator logs confirmed: NodeAgent config map created

Test Case 2: DPA with cachePVC using default StorageClass (empty object)

Objective: Verify that specifying cachePVC: {} without a storageClass causes the operator to auto-resolve the cluster's default StorageClass (ibmc-vpc-block-10iops-tier).

DPA Configuration (/tmp/tc2_dpa.yaml):

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: nac-testing-dpa
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
      cachePVC: {}
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: mpryc-nac-first
          prefix: default-nac-bucket
        config:
          region: eu-central-1
        credential:
          key: default
          name: cloud-credentials-1

Steps:

  1. Apply the DPA: oc apply -f /tmp/tc2_dpa.yaml
  2. Wait for DPA to reconcile: oc wait --for=condition=Reconciled dpa/nac-testing-dpa -n openshift-adp --timeout=120s
  3. Verify DPA status
  4. Check node-agent ConfigMap: oc get configmap node-agent-config -n openshift-adp -o yaml
  5. Verify ConfigMap contains cachePVC with storageClass: ibmc-vpc-block-10iops-tier (auto-resolved default)
  6. Verify node-agent pods are Ready

Expected Results:

  • DPA reconciles successfully
  • ConfigMap node-agent-config contains cachePVC.storageClass set to ibmc-vpc-block-10iops-tier (the cluster default)
  • node-agent DaemonSet runs with all pods Ready

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • Operator auto-resolved default StorageClass to ibmc-vpc-block-10iops-tier
  • ConfigMap data: {"cachePVC":{"storageClass":"ibmc-vpc-block-10iops-tier"},"privilegedFsBackup":true}
  • All 3 node-agent pods running

Test Case 3: DPA with cachePVC including residentThresholdInMB

Objective: Verify that residentThresholdInMB is correctly serialized into the node-agent ConfigMap.

DPA Configuration (/tmp/tc3_dpa.yaml):

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: nac-testing-dpa
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
      cachePVC:
        storageClass: ibmc-vpc-block-5iops-tier
        residentThresholdInMB: 1024
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: mpryc-nac-first
          prefix: default-nac-bucket
        config:
          region: eu-central-1
        credential:
          key: default
          name: cloud-credentials-1

Steps:

  1. Apply the DPA: oc apply -f /tmp/tc3_dpa.yaml
  2. Wait for DPA to reconcile
  3. Check node-agent ConfigMap: oc get configmap node-agent-config -n openshift-adp -o yaml
  4. Verify ConfigMap contains both storageClass and residentThresholdInMB: 1024
  5. Verify node-agent pods are Ready

Expected Results:

  • ConfigMap node-agent-config contains cachePVC with:
    • storageClass: ibmc-vpc-block-5iops-tier
    • residentThresholdInMB: 1024
  • node-agent pods are Ready

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • ConfigMap data: {"cachePVC":{"storageClass":"ibmc-vpc-block-5iops-tier","residentThresholdInMB":1024},"privilegedFsBackup":true}
  • Both fields correctly serialized into the ConfigMap
  • All 3 node-agent pods running

Test Case 4: DPA without cachePVC (backward compatibility)

Objective: Verify that omitting cachePVC entirely results in no cachePVC configuration in the ConfigMap (backward compatibility).

DPA Configuration (/tmp/tc4_dpa.yaml):

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: nac-testing-dpa
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: mpryc-nac-first
          prefix: default-nac-bucket
        config:
          region: eu-central-1
        credential:
          key: default
          name: cloud-credentials-1

Steps:

  1. Apply the DPA: oc apply -f /tmp/tc4_dpa.yaml
  2. Wait for DPA to reconcile
  3. Check if node-agent ConfigMap exists: oc get configmap node-agent-config -n openshift-adp -o yaml 2>&1
  4. If ConfigMap exists, verify it does NOT contain cachePVC
  5. Verify node-agent pods are Ready (node-agent should still deploy since enable: true)
  6. Verify velero pod is running

Expected Results:

  • DPA reconciles successfully
  • ConfigMap either does not exist or does not contain cachePVC key in JSON
  • node-agent DaemonSet runs normally (backward-compatible behavior)

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • ConfigMap exists but contains only: {"privilegedFsBackup":true} -- no cachePVC key
  • Node-agent DaemonSet running (3/3 pods), Velero pod running
  • Backward compatibility confirmed

Test Case 5: DPA update - add cachePVC to existing DPA

Objective: Verify that adding cachePVC to an already-reconciled DPA (without cachePVC) triggers re-reconciliation and creates the correct ConfigMap entry.

Steps:

  1. Apply DPA without cachePVC (TC4 config): oc apply -f /tmp/tc4_dpa.yaml
  2. Wait for DPA to reconcile
  3. Confirm no cachePVC in ConfigMap
  4. Patch DPA to add cachePVC:
    oc patch dpa nac-testing-dpa -n openshift-adp --type=merge -p '{"spec":{"configuration":{"nodeAgent":{"cachePVC":{"storageClass":"ibmc-vpc-block-5iops-tier"}}}}}'
    
  5. Wait for re-reconciliation (10-15 seconds)
  6. Check ConfigMap now contains cachePVC with correct storageClass
  7. Verify node-agent pods restarted/updated

Expected Results:

  • After patch, ConfigMap is updated to include cachePVC.storageClass: ibmc-vpc-block-5iops-tier
  • node-agent DaemonSet continues to run

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • Before patch: ConfigMap had {"privilegedFsBackup":true} (no cachePVC)
  • After patch: ConfigMap updated to {"cachePVC":{"storageClass":"ibmc-vpc-block-5iops-tier"},"privilegedFsBackup":true}
  • Node-agent pods restarted with new names confirming DaemonSet update

Test Case 6: DPA update - remove cachePVC from existing DPA

Objective: Verify that removing cachePVC from a DPA that had it configured cleans up the ConfigMap entry.

Steps:

  1. Apply DPA with cachePVC (TC1 config): oc apply -f /tmp/tc1_dpa.yaml
  2. Wait for DPA to reconcile
  3. Confirm cachePVC exists in ConfigMap
  4. Apply DPA without cachePVC (TC4 config): oc apply -f /tmp/tc4_dpa.yaml
  5. Wait for re-reconciliation
  6. Check ConfigMap no longer contains cachePVC

Expected Results:

  • After applying DPA without cachePVC, the ConfigMap is updated to remove the cachePVC entry
  • node-agent DaemonSet continues to run

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • Used oc replace (not oc apply) because oc patch doesn't update last-applied-configuration annotation
  • After replacement: ConfigMap reverted to {"privilegedFsBackup":true} -- cachePVC removed
  • DPA spec confirmed: no cachePVC field present

Test Case 7: DPA with cachePVC and non-existent StorageClass

Objective: Verify behavior when a non-existent StorageClass is specified. The operator should still reconcile (the StorageClass is only used at restore time by Velero, not validated by the operator at DPA creation time).

DPA Configuration (/tmp/tc7_dpa.yaml):

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: nac-testing-dpa
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
      resourceTimeout: 10m
    nodeAgent:
      enable: true
      uploaderType: kopia
      cachePVC:
        storageClass: nonexistent-storage-class
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: mpryc-nac-first
          prefix: default-nac-bucket
        config:
          region: eu-central-1
        credential:
          key: default
          name: cloud-credentials-1

Steps:

  1. Apply the DPA: oc apply -f /tmp/tc7_dpa.yaml
  2. Wait for DPA to reconcile
  3. Check DPA status conditions
  4. Check ConfigMap for the non-existent storageClass value
  5. Verify node-agent pods are Ready

Expected Results:

  • DPA reconciles (the operator doesn't validate storageClass existence for explicit values)
  • ConfigMap contains cachePVC.storageClass: nonexistent-storage-class
  • node-agent DaemonSet runs (failure would occur at restore time, not at configuration time)

Cleanup: oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • DPA reconciled with all conditions True despite non-existent StorageClass
  • ConfigMap data: {"cachePVC":{"storageClass":"nonexistent-storage-class"},"privilegedFsBackup":true}
  • Operator passes explicit storageClass values through without validation
  • All 3 node-agent pods running -- failure would only occur at restore time

Test Case 8: End-to-end backup and restore with cachePVC

Objective: Perform an actual backup and restore cycle with cachePVC enabled to verify the full flow including cache PVC creation during restore.

Steps:

  1. Create a test namespace with a sample workload:
    oc new-project cache-pvc-test
    oc create deployment nginx --image=nginx:latest -n cache-pvc-test
    oc wait --for=condition=Available deployment/nginx -n cache-pvc-test --timeout=120s
    
  2. Apply DPA with cachePVC (TC1 config): oc apply -f /tmp/tc1_dpa.yaml
  3. Wait for DPA to reconcile and velero pod to be Ready
  4. Create a Backup:
    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: cache-pvc-test-backup
      namespace: openshift-adp
    spec:
      includedNamespaces:
        - cache-pvc-test
      defaultVolumesToFsBackup: true
  5. Wait for backup to complete: oc wait --for=jsonpath='{.status.phase}'=Completed backup/cache-pvc-test-backup -n openshift-adp --timeout=300s
  6. Delete the test namespace: oc delete ns cache-pvc-test --wait=true
  7. Create a Restore:
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: cache-pvc-test-restore
      namespace: openshift-adp
    spec:
      backupName: cache-pvc-test-backup
      includedNamespaces:
        - cache-pvc-test
  8. During restore, check for cache PVCs: oc get pvc -n openshift-adp -l velero.io/component=cache-volume (timing-sensitive)
  9. Wait for restore to complete: oc wait --for=jsonpath='{.status.phase}'=Completed restore/cache-pvc-test-restore -n openshift-adp --timeout=300s
  10. Verify test namespace and workload restored: oc get deployment nginx -n cache-pvc-test
  11. After restore, verify cache PVCs are cleaned up: oc get pvc -n openshift-adp

Expected Results:

  • Backup completes successfully
  • Restore completes successfully
  • During restore, cache PVC(s) may be created (if backup data exceeds threshold)
  • After restore, cache PVCs are cleaned up
  • Workload is restored in cache-pvc-test namespace

Cleanup:

oc delete restores.velero.io cache-pvc-test-restore -n openshift-adp
oc delete backups.velero.io cache-pvc-test-backup -n openshift-adp
oc delete ns cache-pvc-test
oc delete dpa nac-testing-dpa -n openshift-adp

Result: PASSED

  • Backup completed: 28 items backed up, phase=Completed
  • Restore completed: 20 items restored, phase=Completed (4 warnings, expected for OpenShift resources)
  • nginx deployment restored and running (1/1 Ready) in cache-pvc-test namespace
  • No cache PVCs observed during restore -- expected because the test workload had no PVCs, so no DataDownload/PodVolumeRestore operations triggered cache PVC provisioning
  • The cachePVC configuration was correctly propagated to the ConfigMap and did not interfere with backup/restore operations

Note: To fully exercise dynamic cache PVC creation, a test workload with PVC-backed persistent volumes would be needed. The cache PVC feature targets DataDownload and PodVolumeRestore operations that involve Kopia repository cache, which only occurs during data movement of actual persistent volume data.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/cache_volume_pvc_configuration.md (1)

32-34: Add a language tag to the fenced code block.

Markdownlint (MD040) flags fenced blocks without a language. Use text or none for non-code formulas.

Suggested fix
-```
+```text
 cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/cache_volume_pvc_configuration.md around lines 32 - 34, The fenced code
block containing the formula "cacheVolumeSize = cacheLimitMB * 1.2 (rounded up
to the nearest GB)" should include a language tag to satisfy markdownlint MD040;
update that block to use a non-code language tag such as text or none (e.g.,
change the opening fence to ```text) so the snippet is properly tagged—look for
the fenced block with the formula in docs/cache_volume_pvc_configuration.md and
add the language identifier.


</details>

</blockquote></details>
<details>
<summary>internal/controller/nodeagent.go (1)</summary><blockquote>

`130-145`: **Consider supporting the legacy StorageClass default annotation for compatibility.**

The function only checks `storageclass.kubernetes.io/is-default-class`. Some older clusters may still use the legacy annotation `storageclass.beta.kubernetes.io/is-default-class`, causing the default StorageClass lookup to fail unexpectedly.

<details>
<summary>Proposed fix</summary>

```diff
 	for i := range scList.Items {
 		sc := &scList.Items[i]
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+			sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
 			return sc.Name, nil
 		}
 	}
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent.go` around lines 130 - 145, The
getDefaultStorageClass function only checks the modern annotation key; update
getDefaultStorageClass to also consider the legacy key
"storageclass.beta.kubernetes.io/is-default-class" when iterating scList.Items
(sc variable) and ensure sc.Annotations is nil-safe before reading keys; return
sc.Name when either annotation equals "true" (case-insensitive if desired) so
older clusters that use the legacy annotation are supported.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml (1)</summary><blockquote>

`261-275`: **Consider adding `minimum: 0` constraint to `residentThresholdInMB` for consistency.**

The sibling field `cacheLimitMB` (line 259) includes a `minimum: 0` constraint, but `residentThresholdInMB` does not. Negative thresholds are semantically invalid, and adding this constraint would provide consistent schema-level validation and better API documentation.



<details>
<summary>💡 Suggested change</summary>

```diff
                         cachePVC:
                           description: |-
                             CachePVCConfig configures a dedicated PVC for Kopia repository cache during restore operations.
                             When set, cache data is stored on a provisioned PVC instead of the pod's root filesystem,
                             preventing ephemeral storage exhaustion on nodes during concurrent restores.
                             If storageClass is omitted, the cluster's default StorageClass is used.
                           properties:
                             residentThresholdInMB:
                               description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
                               format: int64
+                              minimum: 0
                               type: integer
                             storageClass:
                               description: StorageClass specifies the storage class for cache PVC
                               type: string
                           type: object
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml` around
lines 261 - 275, The schema for cachePVC's residentThresholdInMB allows negative
values; add a JSON schema constraint "minimum: 0" to residentThresholdInMB to
match the sibling cacheLimitMB and prevent negative thresholds. Locate the
cachePVC object (properties -> residentThresholdInMB) in the CRD and add the
minimum: 0 key alongside type/format so the field is validated as non-negative.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml:

  • Around line 261-275: Add a non-negative validation for the
    residentThresholdInMB field in the cachePVC schema: update the cachePVC
    properties so that residentThresholdInMB (under cachePVC) includes "minimum: 0"
    alongside its existing type/format to enforce >=0 validation for the
    ResidentThresholdInMB value.

Duplicate comments:
In @docs/cache_volume_pvc_configuration.md:

  • Around line 32-34: The fenced code block containing the formula
    "cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)" should
    include a language tag to satisfy markdownlint MD040; update that block to use a
    non-code language tag such as text or none (e.g., change the opening fence to
formula in docs/cache_volume_pvc_configuration.md and add the language
identifier.

In `@internal/controller/nodeagent.go`:
- Around line 130-145: The getDefaultStorageClass function only checks the
modern annotation key; update getDefaultStorageClass to also consider the legacy
key "storageclass.beta.kubernetes.io/is-default-class" when iterating
scList.Items (sc variable) and ensure sc.Annotations is nil-safe before reading
keys; return sc.Name when either annotation equals "true" (case-insensitive if
desired) so older clusters that use the legacy annotation are supported.

---

Nitpick comments:
In `@config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml`:
- Around line 261-275: The schema for cachePVC's residentThresholdInMB allows
negative values; add a JSON schema constraint "minimum: 0" to
residentThresholdInMB to match the sibling cacheLimitMB and prevent negative
thresholds. Locate the cachePVC object (properties -> residentThresholdInMB) in
the CRD and add the minimum: 0 key alongside type/format so the field is
validated as non-negative.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 153dbff and d47c738.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (14)
  • api/v1alpha1/dataprotectionapplication_types.go
  • bundle/manifests/oadp-operator.clusterserviceversion.yaml
  • bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml
  • bundle/manifests/oadp.openshift.io_dataprotectiontests.yaml
  • bundle/manifests/velero.io_backupstoragelocations.yaml
  • config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml
  • config/crd/bases/oadp.openshift.io_dataprotectiontests.yaml
  • config/crd/bases/velero.io_backupstoragelocations.yaml
  • config/rbac/role.yaml
  • docs/cache_volume_pvc_configuration.md
  • go.mod
  • internal/controller/dataprotectionapplication_controller.go
  • internal/controller/nodeagent.go
  • internal/controller/nodeagent_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/controller/dataprotectionapplication_controller.go
  • bundle/manifests/oadp-operator.clusterserviceversion.yaml
  • bundle/manifests/velero.io_backupstoragelocations.yaml

Comment on lines +261 to +275
cachePVC:
description: |-
CachePVCConfig configures a dedicated PVC for Kopia repository cache during restore operations.
When set, cache data is stored on a provisioned PVC instead of the pod's root filesystem,
preventing ephemeral storage exhaustion on nodes during concurrent restores.
If storageClass is omitted, the cluster's default StorageClass is used.
properties:
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
type: integer
storageClass:
description: StorageClass specifies the storage class for cache PVC
type: string
type: object
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add non-negative validation for residentThresholdInMB.

A negative threshold is nonsensical and could lead to unexpected controller behavior; add minimum: 0 for schema validation.

🛠️ Suggested schema tweak
                         cachePVC:
                           description: |-
                             CachePVCConfig configures a dedicated PVC for Kopia repository cache during restore operations.
                             When set, cache data is stored on a provisioned PVC instead of the pod's root filesystem,
                             preventing ephemeral storage exhaustion on nodes during concurrent restores.
                             If storageClass is omitted, the cluster's default StorageClass is used.
                           properties:
                             residentThresholdInMB:
                               description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
                               format: int64
+                              minimum: 0
                               type: integer
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cachePVC:
description: |-
CachePVCConfig configures a dedicated PVC for Kopia repository cache during restore operations.
When set, cache data is stored on a provisioned PVC instead of the pod's root filesystem,
preventing ephemeral storage exhaustion on nodes during concurrent restores.
If storageClass is omitted, the cluster's default StorageClass is used.
properties:
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
type: integer
storageClass:
description: StorageClass specifies the storage class for cache PVC
type: string
type: object
cachePVC:
description: |-
CachePVCConfig configures a dedicated PVC for Kopia repository cache during restore operations.
When set, cache data is stored on a provisioned PVC instead of the pod's root filesystem,
preventing ephemeral storage exhaustion on nodes during concurrent restores.
If storageClass is omitted, the cluster's default StorageClass is used.
properties:
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
minimum: 0
type: integer
storageClass:
description: StorageClass specifies the storage class for cache PVC
type: string
type: object
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml` around
lines 261 - 275, Add a non-negative validation for the residentThresholdInMB
field in the cachePVC schema: update the cachePVC properties so that
residentThresholdInMB (under cachePVC) includes "minimum: 0" alongside its
existing type/format to enforce >=0 validation for the ResidentThresholdInMB
value.

@weshayutin
Copy link
Copy Markdown
Contributor

/retest

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2026
@weshayutin
Copy link
Copy Markdown
Contributor

@mpryc LGTM but now needs a rebase.. :( sorry..

Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent,
enabling dedicated PVC-backed cache volumes for Kopia repository cache during
data movement restore operations. This prevents ephemeral storage exhaustion
on nodes with limited root filesystem space during concurrent restores.

- Add CachePVCConfig field to NodeAgentConfigMapSettings using upstream
  types.CachePVC directly to prevent field name drift with Velero
- Serialize cachePVC config into the node-agent ConfigMap so Velero
  provisions per-restore cache PVCs instead of using pod root filesystem
- Add default StorageClass resolution when cachePVC is configured without
  explicit storageClass, falling back to the cluster's default SC
- Pass backup-repository-configmap to node-agent DaemonSet for cache
  volume size calculation (cacheLimitMB * 1.2)
- Add RBAC for StorageClass list/get/watch
- Add unit tests for ConfigMap assembly, default SC resolution,
  isNodeAgentCMRequired, and DaemonSet build with backup-repo configmap
- Add user documentation with configuration examples

Signed-off-by: Michal Pryc <mpryc@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
@mpryc mpryc force-pushed the oadp-dev-copia-cache-update branch from d47c738 to 4da4e85 Compare February 24, 2026 13:40
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2026
@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 24, 2026

To be 100% AI driven, rebase process was also performed by the Claude.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
docs/cache_volume_pvc_configuration.md (1)

32-34: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

📝 Suggested fix
-```
+```text
 cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cache_volume_pvc_configuration.md` around lines 32 - 34, The fenced code
block containing the expression "cacheVolumeSize = cacheLimitMB * 1.2 (rounded
up to the nearest GB)" should include a language tag (e.g., "text" or "bash")
after the opening ``` so the block renders correctly; update the opening fence
from ``` to ```text (or another appropriate tag) around that line in
docs/cache_volume_pvc_configuration.md.
🧹 Nitpick comments (2)
internal/controller/nodeagent.go (1)

130-145: Make default StorageClass selection deterministic when multiple defaults exist.

List order isn’t guaranteed, so “first match” can be non‑deterministic. Consider choosing the most recently created default (or surfacing a warning).

🔧 Possible deterministic selection
-	for i := range scList.Items {
-		sc := &scList.Items[i]
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
-			return sc.Name, nil
-		}
-	}
-	return "", nil
+	var latest *storagev1.StorageClass
+	for i := range scList.Items {
+		sc := &scList.Items[i]
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+			if latest == nil || sc.CreationTimestamp.Time.After(latest.CreationTimestamp.Time) {
+				latest = sc
+			}
+		}
+	}
+	if latest != nil {
+		return latest.Name, nil
+	}
+	return "", nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent.go` around lines 130 - 145, The
getDefaultStorageClass function currently returns the "first" default
StorageClass which is non-deterministic; change it to collect all StorageClasses
whose annotation "storageclass.kubernetes.io/is-default-class" == "true" and
pick the most recently created one deterministically (e.g., compare each
matching StorageClass's CreationTimestamp and return the one with the latest
timestamp). Update the logic inside getDefaultStorageClass (use scList.Items,
sc.Annotations, and sc.CreationTimestamp) to choose the newest default and
return its Name, keeping the same error handling for r.Client.List; optionally
surface a warning if multiple defaults were found before returning the chosen
one.
internal/controller/nodeagent_test.go (1)

2050-2084: Consider adding a test case for disableFsBackup: nil.

The function signature takes *bool, and a nil pointer is a distinct state from false. Currently all three cases pass an explicit ptr.To(true/false). Adding a nil-pointer case (and an empty-config+nil case) would close that coverage gap.

✅ Suggested additional cases
  {
      name:            "CachePVCConfig set, fs-backup enabled, CM should be required",
      config:          oadpv1alpha1.NodeAgentConfigMapSettings{CachePVCConfig: &velerotypes.CachePVC{StorageClass: "gp3-csi"}},
      disableFsBackup: ptr.To(false),
      want:            true,
  },
+ {
+     name:            "CachePVCConfig set, disableFsBackup nil, CM should be required",
+     config:          oadpv1alpha1.NodeAgentConfigMapSettings{CachePVCConfig: &velerotypes.CachePVC{StorageClass: "gp3-csi"}},
+     disableFsBackup: nil,
+     want:            true,
+ },
+ {
+     name:            "No config set, disableFsBackup nil, CM should not be required",
+     config:          oadpv1alpha1.NodeAgentConfigMapSettings{},
+     disableFsBackup: nil,
+     want:            false,
+ },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent_test.go` around lines 2050 - 2084, Add two
table-driven test cases in Test_isNodeAgentCMRequired_CachePVCConfig to cover a
nil disableFsBackup pointer: one where config has CachePVCConfig set and
disableFsBackup is nil (expect want: true), and one with an empty config and
disableFsBackup nil (expect want: false); update the tests slice in
Test_isNodeAgentCMRequired_CachePVCConfig so the new entries reference
isNodeAgentCMRequired and use ptr.Nil (or simply nil) for disableFsBackup to
ensure the function behavior for a nil *bool is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml`:
- Around line 268-272: The CRD schema for residentThresholdInMB allows negative
values; update the OpenAPI schema entry for residentThresholdInMB to include a
minimum: 0 (same style as cacheLimitMB) so it is validated as a non-negative
integer (keep type: integer and format: int64). Locate the residentThresholdInMB
property in the CRD and add the minimum field to prevent invalid CRs.

In `@internal/controller/nodeagent_test.go`:
- Around line 2263-2267: The test currently does a bare type assertion
cachePVC.(map[string]interface{}) which can panic on unexpected JSON shapes;
change it to the comma-ok form: cachePVCMap, ok :=
cachePVC.(map[string]interface{}), then assert require.True(t, ok, "cachePVC
should be a JSON object in ConfigMap") (immediately after the existing presence
check) before accessing cachePVCMap["storageClass"] so the test fails cleanly
instead of panicking.
- Around line 2086-2153: Register the storagev1 types in the fake client scheme
by adding storagev1.AddToScheme(scheme.Scheme) inside getSchemeForFakeClient (in
bsl_test.go) so StorageClass objects deserialize properly, and add a new test
case to Test_getDefaultStorageClass (nodeagent_test.go) that creates two
storagev1.StorageClass objects both annotated
"storageclass.kubernetes.io/is-default-class":"true" to assert
getDefaultStorageClass on DataProtectionApplicationReconciler returns the first
matching StorageClass name (documenting current non-deterministic ordering).

In `@internal/controller/nodeagent.go`:
- Around line 163-171: The code currently clears
configWithPrivileged.CachePVCConfig when r.getDefaultStorageClass() returns an
error, silently disabling the cache PVC; instead, modify the reconciliation to
return the error so the controller retries (do not set CachePVCConfig = nil on
getDefaultStorageClass error). Specifically, in the block that calls
r.getDefaultStorageClass() (refer to configWithPrivileged.CachePVCConfig and
r.getDefaultStorageClass()), change the error branch to return the error (or
wrap it with context) rather than logging and nil’ing the config; keep the
existing behavior only when defaultSC is empty after a successful call.

---

Duplicate comments:
In `@docs/cache_volume_pvc_configuration.md`:
- Around line 32-34: The fenced code block containing the expression
"cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)" should
include a language tag (e.g., "text" or "bash") after the opening ``` so the
block renders correctly; update the opening fence from ``` to ```text (or
another appropriate tag) around that line in
docs/cache_volume_pvc_configuration.md.

---

Nitpick comments:
In `@internal/controller/nodeagent_test.go`:
- Around line 2050-2084: Add two table-driven test cases in
Test_isNodeAgentCMRequired_CachePVCConfig to cover a nil disableFsBackup
pointer: one where config has CachePVCConfig set and disableFsBackup is nil
(expect want: true), and one with an empty config and disableFsBackup nil
(expect want: false); update the tests slice in
Test_isNodeAgentCMRequired_CachePVCConfig so the new entries reference
isNodeAgentCMRequired and use ptr.Nil (or simply nil) for disableFsBackup to
ensure the function behavior for a nil *bool is exercised.

In `@internal/controller/nodeagent.go`:
- Around line 130-145: The getDefaultStorageClass function currently returns the
"first" default StorageClass which is non-deterministic; change it to collect
all StorageClasses whose annotation
"storageclass.kubernetes.io/is-default-class" == "true" and pick the most
recently created one deterministically (e.g., compare each matching
StorageClass's CreationTimestamp and return the one with the latest timestamp).
Update the logic inside getDefaultStorageClass (use scList.Items,
sc.Annotations, and sc.CreationTimestamp) to choose the newest default and
return its Name, keeping the same error handling for r.Client.List; optionally
surface a warning if multiple defaults were found before returning the chosen
one.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between d47c738 and 4da4e85.

📒 Files selected for processing (11)
  • api/v1alpha1/dataprotectionapplication_types.go
  • bundle/manifests/oadp-operator.clusterserviceversion.yaml
  • bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml
  • bundle/manifests/velero.io_backupstoragelocations.yaml
  • config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml
  • config/crd/bases/velero.io_backupstoragelocations.yaml
  • config/rbac/role.yaml
  • docs/cache_volume_pvc_configuration.md
  • internal/controller/dataprotectionapplication_controller.go
  • internal/controller/nodeagent.go
  • internal/controller/nodeagent_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml
  • bundle/manifests/velero.io_backupstoragelocations.yaml
  • config/rbac/role.yaml
  • bundle/manifests/oadp-operator.clusterserviceversion.yaml
  • config/crd/bases/velero.io_backupstoragelocations.yaml

Comment on lines +268 to +272
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
type: integer
storageClass:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate residentThresholdInMB as non‑negative.

Negative thresholds don’t make sense here; add a minimum of 0 to prevent invalid CRs and align with cacheLimitMB.

✅ Suggested schema tweak
                             residentThresholdInMB:
                               description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
                               format: int64
+                              minimum: 0
                               type: integer
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
type: integer
storageClass:
residentThresholdInMB:
description: ResidentThresholdInMB specifies the minimum size of the backup data to create cache PVC
format: int64
minimum: 0
type: integer
storageClass:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml` around
lines 268 - 272, The CRD schema for residentThresholdInMB allows negative
values; update the OpenAPI schema entry for residentThresholdInMB to include a
minimum: 0 (same style as cacheLimitMB) so it is validated as a non-negative
integer (keep type: integer and format: int64). Locate the residentThresholdInMB
property in the CRD and add the minimum field to prevent invalid CRs.

Comment on lines +2086 to +2153
func Test_getDefaultStorageClass(t *testing.T) {
tests := []struct {
name string
objects []client.Object
want string
wantErr bool
}{
{
name: "no storage classes exist",
objects: nil,
want: "",
wantErr: false,
},
{
name: "default storage class exists",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
Annotations: map[string]string{
"storageclass.kubernetes.io/is-default-class": "true",
},
},
},
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "standard",
},
},
},
want: "gp3-csi",
wantErr: false,
},
{
name: "no default storage class",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
},
},
},
want: "",
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
fakeClient, err := getFakeClientFromObjects(tt.objects...)
if err != nil {
t.Fatalf("error in creating fake client: %v", err)
}
r := &DataProtectionApplicationReconciler{
Client: fakeClient,
Scheme: fakeClient.Scheme(),
Log: logr.Discard(),
Context: newContextForTest(),
}
got, err := r.getDefaultStorageClass()
if (err != nil) != tt.wantErr {
t.Errorf("getDefaultStorageClass() error = %v, wantErr %v", err, tt.wantErr)
}
if got != tt.want {
t.Errorf("getDefaultStorageClass() = %v, want %v", got, tt.want)
}
})
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check scheme registration for storagev1 in test setup
echo "=== Checking suite_test.go for storagev1 scheme registration ==="
rg -n "storagev1\|storage/v1\|AddToScheme\|SchemeBuilder" --type=go -C3 internal/controller/suite_test.go

echo -e "\n=== Checking for getFakeClientFromObjects definition ==="
rg -n "func getFakeClientFromObjects\|getFakeClientFromObjects" --type=go -B2 -A10 internal/controller/ | head -50

Repository: openshift/oadp-operator

Length of output: 189


🏁 Script executed:

#!/bin/bash
# Check getDefaultStorageClass implementation in production code
echo "=== Checking nodeagent.go for getDefaultStorageClass implementation ==="
rg -n "func.*getDefaultStorageClass\|is-default-class\|is-default" --type=go -C5 internal/controller/nodeagent.go

Repository: openshift/oadp-operator

Length of output: 138


🏁 Script executed:

#!/bin/bash
# Find all relevant files
echo "=== Looking for nodeagent files ==="
find . -name "*nodeagent*" -type f 2>/dev/null | head -20

echo -e "\n=== Looking for suite_test files ==="
find . -name "suite_test.go" -type f 2>/dev/null

echo -e "\n=== Files in internal/controller ==="
ls -la internal/controller/ 2>/dev/null | head -30

Repository: openshift/oadp-operator

Length of output: 2475


🏁 Script executed:

#!/bin/bash
# Search for getFakeClientFromObjects definition across the repo
echo "=== Searching for getFakeClientFromObjects ==="
rg -n "getFakeClientFromObjects" --type=go -B2 -A10

echo -e "\n=== Searching for getDefaultStorageClass ==="
rg -n "getDefaultStorageClass" --type=go -B2 -A10

Repository: openshift/oadp-operator

Length of output: 41244


🏁 Script executed:

#!/bin/bash
# Check getSchemeForFakeClient implementation
echo "=== Checking getSchemeForFakeClient definition ==="
rg -n "func getSchemeForFakeClient" --type=go -A20 internal/controller/bsl_test.go | head -40

Repository: openshift/oadp-operator

Length of output: 655


🏁 Script executed:

#!/bin/bash
# Check if storagev1 is imported and check the full scheme setup
echo "=== Checking imports in bsl_test.go for storagev1 ==="
head -50 internal/controller/bsl_test.go | grep -i "storagev1\|storage/v1"

echo -e "\n=== Checking all scheme additions in getSchemeForFakeClient ==="
rg -n "getSchemeForFakeClient" --type=go -A30 internal/controller/bsl_test.go | head -50

echo -e "\n=== Checking nodeagent.go imports ==="
head -50 internal/controller/nodeagent.go | grep -i "storagev1\|storage/v1"

echo -e "\n=== Checking if storagev1 is added anywhere in test setup ==="
rg -n "storagev1\|storage\.v1" --type=go internal/controller/ | grep -i "addtoscheme\|scheme"

Repository: openshift/oadp-operator

Length of output: 1749


🏁 Script executed:

#!/bin/bash
# Check if storagev1 is registered anywhere in the codebase
echo "=== Searching for storagev1 AddToScheme registrations ==="
rg -n "storagev1.*AddToScheme\|storage\.v1.*AddToScheme" --type=go

echo -e "\n=== Checking suite_test.go for scheme setup ==="
cat internal/controller/suite_test.go | head -100

echo -e "\n=== Checking if there's any other scheme setup in nodeagent_test.go ==="
rg -n "getScheme\|AddToScheme" --type=go internal/controller/nodeagent_test.go | head -20

Repository: openshift/oadp-operator

Length of output: 3232


🏁 Script executed:

#!/bin/bash
# Check the imports at the top of nodeagent_test.go
echo "=== Checking imports in nodeagent_test.go (first 100 lines) ==="
head -100 internal/controller/nodeagent_test.go

echo -e "\n=== Check if nodeagent_test uses its own getSchemeForFakeClient or imports it ==="
rg -n "import\|getSchemeForFakeClient\|getScheme" --type=go internal/controller/nodeagent_test.go | head -20

Repository: openshift/oadp-operator

Length of output: 2768


🏁 Script executed:

#!/bin/bash
# Verify full getDefaultStorageClass implementation to check annotation handling
echo "=== Full getDefaultStorageClass implementation ==="
rg -n "func.*getDefaultStorageClass\|is-default-class" --type=go -A15 internal/controller/nodeagent.go

echo -e "\n=== Check for beta annotation usage anywhere ==="
rg -n "beta\.kubernetes\.io.*is-default\|is-default.*beta" --type=go internal/controller/

Repository: openshift/oadp-operator

Length of output: 167


🏁 Script executed:

#!/bin/bash
# Find getDefaultStorageClass with a different pattern
echo "=== Searching for getDefaultStorageClass implementation ==="
rg "getDefaultStorageClass" --type=go -A15 internal/controller/nodeagent.go

echo -e "\n=== Check entire nodeagent.go around line 130-150 ==="
sed -n '128,150p' internal/controller/nodeagent.go

Repository: openshift/oadp-operator

Length of output: 2818


🏁 Script executed:

#!/bin/bash
# Check if storagev1.AddToScheme is available to add
echo "=== Check K8s storagev1 availability ==="
rg "k8s\.io.*storagev1" --type=go internal/controller/nodeagent.go

echo -e "\n=== Check if any test file adds storagev1 to scheme ==="
rg "storagev1.*AddToScheme\|AddToScheme.*storage" --type=go internal/controller/ | grep -v ".swp"

echo -e "\n=== Full getSchemeForFakeClient to see all additions ==="
sed -n '34,51p' internal/controller/bsl_test.go

Repository: openshift/oadp-operator

Length of output: 561


Add test case for multiple StorageClasses marked as default and fix missing storagev1 scheme registration.

The test file passes storagev1.StorageClass objects to getFakeClientFromObjects, but the underlying getSchemeForFakeClient() function does not register storagev1 in its scheme. This will cause the fake client to silently fail to deserialize these objects. Add storagev1.AddToScheme(scheme.Scheme) to getSchemeForFakeClient() in bsl_test.go.

Additionally, the production code's getDefaultStorageClass() returns the first matching default StorageClass without deterministic ordering. Add a test case for multiple StorageClasses simultaneously marked as default to document this behavior:

Suggested additional test case
  {
      name: "no default storage class",
      ...
  },
+ {
+     name: "multiple storage classes marked as default",
+     objects: []client.Object{
+         &storagev1.StorageClass{
+             ObjectMeta: metav1.ObjectMeta{
+                 Name: "gp3-csi",
+                 Annotations: map[string]string{
+                     "storageclass.kubernetes.io/is-default-class": "true",
+                 },
+             },
+         },
+         &storagev1.StorageClass{
+             ObjectMeta: metav1.ObjectMeta{
+                 Name: "standard",
+                 Annotations: map[string]string{
+                     "storageclass.kubernetes.io/is-default-class": "true",
+                 },
+             },
+         },
+     },
+     want:    "gp3-csi",
+     wantErr: false,
+ },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func Test_getDefaultStorageClass(t *testing.T) {
tests := []struct {
name string
objects []client.Object
want string
wantErr bool
}{
{
name: "no storage classes exist",
objects: nil,
want: "",
wantErr: false,
},
{
name: "default storage class exists",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
Annotations: map[string]string{
"storageclass.kubernetes.io/is-default-class": "true",
},
},
},
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "standard",
},
},
},
want: "gp3-csi",
wantErr: false,
},
{
name: "no default storage class",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
},
},
},
want: "",
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
fakeClient, err := getFakeClientFromObjects(tt.objects...)
if err != nil {
t.Fatalf("error in creating fake client: %v", err)
}
r := &DataProtectionApplicationReconciler{
Client: fakeClient,
Scheme: fakeClient.Scheme(),
Log: logr.Discard(),
Context: newContextForTest(),
}
got, err := r.getDefaultStorageClass()
if (err != nil) != tt.wantErr {
t.Errorf("getDefaultStorageClass() error = %v, wantErr %v", err, tt.wantErr)
}
if got != tt.want {
t.Errorf("getDefaultStorageClass() = %v, want %v", got, tt.want)
}
})
}
}
func Test_getDefaultStorageClass(t *testing.T) {
tests := []struct {
name string
objects []client.Object
want string
wantErr bool
}{
{
name: "no storage classes exist",
objects: nil,
want: "",
wantErr: false,
},
{
name: "default storage class exists",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
Annotations: map[string]string{
"storageclass.kubernetes.io/is-default-class": "true",
},
},
},
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "standard",
},
},
},
want: "gp3-csi",
wantErr: false,
},
{
name: "no default storage class",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
},
},
},
want: "",
wantErr: false,
},
{
name: "multiple storage classes marked as default",
objects: []client.Object{
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "gp3-csi",
Annotations: map[string]string{
"storageclass.kubernetes.io/is-default-class": "true",
},
},
},
&storagev1.StorageClass{
ObjectMeta: metav1.ObjectMeta{
Name: "standard",
Annotations: map[string]string{
"storageclass.kubernetes.io/is-default-class": "true",
},
},
},
},
want: "gp3-csi",
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
fakeClient, err := getFakeClientFromObjects(tt.objects...)
if err != nil {
t.Fatalf("error in creating fake client: %v", err)
}
r := &DataProtectionApplicationReconciler{
Client: fakeClient,
Scheme: fakeClient.Scheme(),
Log: logr.Discard(),
Context: newContextForTest(),
}
got, err := r.getDefaultStorageClass()
if (err != nil) != tt.wantErr {
t.Errorf("getDefaultStorageClass() error = %v, wantErr %v", err, tt.wantErr)
}
if got != tt.want {
t.Errorf("getDefaultStorageClass() = %v, want %v", got, tt.want)
}
})
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent_test.go` around lines 2086 - 2153, Register the
storagev1 types in the fake client scheme by adding
storagev1.AddToScheme(scheme.Scheme) inside getSchemeForFakeClient (in
bsl_test.go) so StorageClass objects deserialize properly, and add a new test
case to Test_getDefaultStorageClass (nodeagent_test.go) that creates two
storagev1.StorageClass objects both annotated
"storageclass.kubernetes.io/is-default-class":"true" to assert
getDefaultStorageClass on DataProtectionApplicationReconciler returns the first
matching StorageClass name (documenting current non-deterministic ordering).

Comment on lines +2263 to +2267
} else {
require.True(t, ok, "cachePVC should be present in ConfigMap")
cachePVCMap := cachePVC.(map[string]interface{})
require.Equal(t, tt.wantStorageClass, cachePVCMap["storageClass"])
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use the comma-ok form for the type assertion to avoid a panic on unexpected JSON shapes.

If cachePVC is not a JSON object (edge case from a future serialization change), the bare assertion on Line 2265 panics the test instead of failing it cleanly.

🛡️ Proposed fix
-		cachePVCMap := cachePVC.(map[string]interface{})
-		require.Equal(t, tt.wantStorageClass, cachePVCMap["storageClass"])
+		cachePVCMap, ok := cachePVC.(map[string]interface{})
+		require.True(t, ok, "cachePVC value should be a JSON object")
+		require.Equal(t, tt.wantStorageClass, cachePVCMap["storageClass"])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} else {
require.True(t, ok, "cachePVC should be present in ConfigMap")
cachePVCMap := cachePVC.(map[string]interface{})
require.Equal(t, tt.wantStorageClass, cachePVCMap["storageClass"])
}
} else {
require.True(t, ok, "cachePVC should be present in ConfigMap")
cachePVCMap, ok := cachePVC.(map[string]interface{})
require.True(t, ok, "cachePVC value should be a JSON object")
require.Equal(t, tt.wantStorageClass, cachePVCMap["storageClass"])
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent_test.go` around lines 2263 - 2267, The test
currently does a bare type assertion cachePVC.(map[string]interface{}) which can
panic on unexpected JSON shapes; change it to the comma-ok form: cachePVCMap, ok
:= cachePVC.(map[string]interface{}), then assert require.True(t, ok, "cachePVC
should be a JSON object in ConfigMap") (immediately after the existing presence
check) before accessing cachePVCMap["storageClass"] so the test fails cleanly
instead of panicking.

Comment on lines +163 to +171
// If CachePVCConfig is set but StorageClass is empty, resolve the cluster's default StorageClass.
// This prevents restore failures when local storage is limited by ensuring a StorageClass is available
// for cache PVC provisioning, even when the user doesn't explicitly specify one.
if configWithPrivileged.CachePVCConfig != nil && configWithPrivileged.CachePVCConfig.StorageClass == "" {
defaultSC, err := r.getDefaultStorageClass()
if err != nil {
r.Log.Info("Failed to resolve default StorageClass for cache PVC, cache volume will be disabled", "error", err)
configWithPrivileged.CachePVCConfig = nil
} else if defaultSC != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid silently disabling cache PVC on StorageClass list errors.

A transient API failure will currently remove cachePVC from the config map and can regress restores. Prefer returning an error so reconciliation retries instead of downgrading behavior.

💡 Suggested change
 	defaultSC, err := r.getDefaultStorageClass()
 	if err != nil {
-		r.Log.Info("Failed to resolve default StorageClass for cache PVC, cache volume will be disabled", "error", err)
-		configWithPrivileged.CachePVCConfig = nil
+		return fmt.Errorf("failed to resolve default StorageClass for cache PVC: %w", err)
 	} else if defaultSC != "" {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// If CachePVCConfig is set but StorageClass is empty, resolve the cluster's default StorageClass.
// This prevents restore failures when local storage is limited by ensuring a StorageClass is available
// for cache PVC provisioning, even when the user doesn't explicitly specify one.
if configWithPrivileged.CachePVCConfig != nil && configWithPrivileged.CachePVCConfig.StorageClass == "" {
defaultSC, err := r.getDefaultStorageClass()
if err != nil {
r.Log.Info("Failed to resolve default StorageClass for cache PVC, cache volume will be disabled", "error", err)
configWithPrivileged.CachePVCConfig = nil
} else if defaultSC != "" {
// If CachePVCConfig is set but StorageClass is empty, resolve the cluster's default StorageClass.
// This prevents restore failures when local storage is limited by ensuring a StorageClass is available
// for cache PVC provisioning, even when the user doesn't explicitly specify one.
if configWithPrivileged.CachePVCConfig != nil && configWithPrivileged.CachePVCConfig.StorageClass == "" {
defaultSC, err := r.getDefaultStorageClass()
if err != nil {
return fmt.Errorf("failed to resolve default StorageClass for cache PVC: %w", err)
} else if defaultSC != "" {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/nodeagent.go` around lines 163 - 171, The code currently
clears configWithPrivileged.CachePVCConfig when r.getDefaultStorageClass()
returns an error, silently disabling the cache PVC; instead, modify the
reconciliation to return the error so the controller retries (do not set
CachePVCConfig = nil on getDefaultStorageClass error). Specifically, in the
block that calls r.getDefaultStorageClass() (refer to
configWithPrivileged.CachePVCConfig and r.getDefaultStorageClass()), change the
error branch to return the error (or wrap it with context) rather than logging
and nil’ing the config; keep the existing behavior only when defaultSC is empty
after a successful call.

@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 24, 2026

/test 4.22-e2e-test-aws

1 similar comment
@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 25, 2026

/test 4.22-e2e-test-aws

Copy link
Copy Markdown
Contributor

@weshayutin weshayutin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@mpryc mpryc added the tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. label Feb 25, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2026
@kaovilai
Copy link
Copy Markdown
Member

@mpryc merge method is not allowed in repo settings on this. we had only enabled squash here.

@kaovilai
Copy link
Copy Markdown
Member

I configured it when we had admin. Assuming this setting is outside of openshift/release, and you want to bring back merge method: merge, then we have to request admin access. lmk if you absolutely need merge method in this repo. Easier is to set tide to squash for this repo via ocp/release pr

@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 25, 2026

@kaovilai I think we can't use squash when rebasebot works, here rebasebot also works to get the go mods updated. Why it's not a problem for other repos?

@kaovilai
Copy link
Copy Markdown
Member

other repos we do rebases, force pushes.. here we don't.

@mpryc
Copy link
Copy Markdown
Contributor Author

mpryc commented Feb 25, 2026

@kaovilai we do use rebasebot to update all go modules and links to other deps in similar way, it's not strictly rebase, but it's using rebasebot to do very similar work than in other repos https://github.com/oadp-rebasebot/oadp-rebase/blob/oadp-dev/rebase-configs/openshift_oadp-operator_oadp-dev.env.sh#L10-L13

@kaovilai
Copy link
Copy Markdown
Member

which is different, and is not strictly something that required rewriting history of like we were doing in other repos, so we were able to add more protections here.

@kaovilai
Copy link
Copy Markdown
Member

I have been a fan of squashing on merge, instead of strictly enforcing having only single commit pre-PR. Allows you to explain your steps in PR, but squash to 1 on merge.

@kaovilai
Copy link
Copy Markdown
Member

We can add merge allowance, just gotta request admin.

@kaovilai
Copy link
Copy Markdown
Member

Just in case: https://issues.redhat.com/browse/DPP-19573

@mpryc mpryc added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. and removed tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. labels Feb 26, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Feb 26, 2026

@mpryc: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kaovilai
Copy link
Copy Markdown
Member

kaovilai commented Mar 2, 2026

/lgtm

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, mpryc, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit 13e07c0 into openshift:oadp-dev Mar 2, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-gen-bugfix ai-generated-test approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants