Skip to content

Comments

NFS3 protocol specific snapshot workflows#36

Open
rajiv-jain-netapp wants to merge 10 commits intomainfrom
feature/CSTACKEX-18_2
Open

NFS3 protocol specific snapshot workflows#36
rajiv-jain-netapp wants to merge 10 commits intomainfrom
feature/CSTACKEX-18_2

Conversation

@rajiv-jain-netapp
Copy link

@rajiv-jain-netapp rajiv-jain-netapp commented Feb 20, 2026

Description

This PR...

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@rajiv-jain-netapp
Copy link
Author

I have tested

Taking a CS-volume snapshot for the data CS-volume.
Tried to take VM-level snapshot, CloudStack asked me to select a single CS-volume instead of allowing to take a VM-level snapshot. That also worked.

@rajiv-jain-netapp
Copy link
Author

I have tested VM level and cloudstack-volume level snapshots, all found working

  1. VM level snapshot without memory-disk and quiecing option.
  2. VM level snapshot with quiecing option enabled. --> did not find working and error says, unable to quiece VM since emu agent is not found working on VM
  3. VM level snapshot with memory snapshot enabled--> working
  4. VM level snapshot with memory-snapshot and quiecing options enabled --> working fine.
  5. cloudstack volume level snapshots are working fine.


logger.error("Disk-only VM snapshot for VM [{}] failed{}.", userVm.getUuid(), answer != null ? " due to" + answer.getDetails() : "");
throw new CloudRuntimeException(String.format("Disk-only VM snapshot for VM [%s] failed.", userVm.getUuid()));
String details = answer != null ? answer.getDetails() : String.format("No answer received from host [%s]. The host may be unreachable.", hostId);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change is required?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am returning the failure reasons through the exception.


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log level should not be error plus remove temp

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.

Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId());
StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails);

Map<String, String> cloudStackVolumeRequestMap = new HashMap<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get Volume Request is not as per protocol

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added protocol check.

cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID));
cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this condition is also specific to file, it will fail for lun

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken care

throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
}
long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log level should not be error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
long usedBytes = getUsedBytes(storagePool);
long fileSize = cloudStackVolume.getFile().getSize();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these all changes are related to file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed


String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid();

int maxSnapshotNameLength = 64;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we can have more character in snapshot name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to contants for now, will update the length by checking it, if required.

return cloudStackVolumeRequest;
}

private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not used anywhere

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am calling this method in takeSnapshot()

* a. Field is eligible for unified storage only.
* b. It will be null for the disaggregated storage.
*/
private String flexVolumeUuid;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need these 2 fields here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you review the logic, the volume UUID is required to support file‑level snapshot workflows. The destination path will also be useful for future copy operations. In the current implementation, I am using it specifically for the clone API call.

Copy link
Author

@rajiv-jain-netapp rajiv-jain-netapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reposting the PR with changes


logger.error("Disk-only VM snapshot for VM [{}] failed{}.", userVm.getUuid(), answer != null ? " due to" + answer.getDetails() : "");
throw new CloudRuntimeException(String.format("Disk-only VM snapshot for VM [%s] failed.", userVm.getUuid()));
String details = answer != null ? answer.getDetails() : String.format("No answer received from host [%s]. The host may be unreachable.", hostId);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am returning the failure reasons through the exception.


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.

Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId());
StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails);

Map<String, String> cloudStackVolumeRequestMap = new HashMap<>();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added protocol check.

cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID));
cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken care

throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
}
long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
long usedBytes = getUsedBytes(storagePool);
long fileSize = cloudStackVolume.getFile().getSize();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed


String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid();

int maxSnapshotNameLength = 64;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to contants for now, will update the length by checking it, if required.

return cloudStackVolumeRequest;
}

private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am calling this method in takeSnapshot()

* a. Field is eligible for unified storage only.
* b. It will be null for the disaggregated storage.
*/
private String flexVolumeUuid;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you review the logic, the volume UUID is required to support file‑level snapshot workflows. The destination path will also be useful for future copy operations. In the current implementation, I am using it specifically for the clone API call.

mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set this as true as we are currently not supporting volume creation from snapshot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this property set to true, CloudStack initiates both the snapshot creation and the corresponding copy operation to secondary storage over the data path, as it assumes that snapshots must reside on secondary storage by default.
When only the storage-system-snapshot property is enabled, CloudStack successfully invokes the takeSnapshot() method in the plugin driver class. However, the snapshot remains on the primary storage pool and is not transferred to secondary storage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with this implementation, snapshots reside in primary storage itself?

* @param storagePoolId: primary storage pool id
* @param ontapSnapSize: Size of snapshot CS volume(LUN/file)
*/
private void updateSnapshotDetails(long csSnapshotId, long csVolumeId, String ontapVolumeUuid, String ontapNewSnapshot, long storagePoolId, long ontapSnapSize) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a map and put whatever we need to at one time and persist in SnapshotDetailsV0 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We store these details as key–value pairs in the database. To facilitate efficient querying—particularly during snapshot delete or revert operations—I prefer to keep them independent rather than embedding them within a combined structure.

return new Pair<>(snapshotXml, volumeObjectToNewPathMap);
}

protected void cleanupLeftoverDeltas(List<VolumeObjectTO> volumeObjectTos, Map<String, Pair<Long, String>> mapVolumeToSnapshotSizeAndNewVolumePath, KVMStoragePoolManager storagePoolMgr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this refactoring somehow help our plugin functionality?

mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are supporting volume creation from snapshot with this PR?


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.info("takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have an intuitive log message instead of a method entry logger?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide a few examples?
Since I am also logging the method arguments, I felt that this approach would give me better insight into the method invocation flow, rather than relying solely on static logging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding an entry log we can add a log post line 544. For example, it could be like
s_logger.info("takeSnapshot: Snapshot with ID: " + snapshot.getId() + " and name: " + snapshot.getName() + " requested for volume with ID: " + volumeVO.getId() + " and name: " + volumeVO.getName())
It's just an example but I feel this could give us more outlook on whats happening.

long capacityBytes = storagePool.getCapacityBytes();

// Only proceed for NFS3 protocol
if (ProtocolType.NFS3.name().equalsIgnoreCase(poolDetails.get(Constants.PROTOCOL))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we decide not to have any conditional statements on protocols?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did, but the current context is not related to the hierarchy of strategy classes. The implementation here is analogous to the request‑preparation helper methods we already maintain within the driver classes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A part of the discussion was to have nothing related to protocols here, move all the necessary objects to strategy classes and have the necessary implementation there itself, according to the requirement. Will try to pick up the code for iSCSI soon, but, please take the lead in making code changes to align with the discussed approach?

cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {
throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have method names only in loggers and not have them in exceptions? Management server might show this as an error on the UI?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acknowledged

try {
fileResponse = nasFeignClient.getFileResponse(authHeader, volumeUuid, filePath);
if (fileResponse == null || fileResponse.getRecords().isEmpty()) {
throw new CloudRuntimeException("File " + filePath + " not not found on ONTAP. " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "not not"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* VM Snapshot strategy for NetApp ONTAP managed storage.
*
* <p>This strategy handles VM-level (instance) snapshots for VMs whose volumes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need comments in HTML format?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have any limitations on these yet, so I kept them. BTW, I got them generated.

* <ol>
* <li>Freeze the VM via QEMU guest agent ({@code fsfreeze})</li>
* <li>For each attached volume, create a storage-level snapshot (ONTAP file clone)</li>
* <li>Thaw the VM</li>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "Thaw the VM" mean?

return false;
}

if (!Hypervisor.HypervisorType.KVM.equals(userVm.getHypervisorType())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have checks in StoragePool and CS Volume creation workflows for Hypervisor type right? Do we need it here also?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inline with the snapshot workflow

return false;
}

if (!VirtualMachine.State.Running.equals(userVm.getState())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this block disk snapshots?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope

* <li>All VM volumes are on ONTAP managed NFS primary storage</li>
* </ul>
*/
public class OntapVMSnapshotStrategy extends StorageVMSnapshotStrategy {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any other vendor extending this class to implement snapshot functionality. What ways this class is helping us ? And is it being called by the framework in snapshot workflow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this implementation for NFS specific workflow requirement. None of the other vendors are supporting NFS workflows via a plugin.
I added this to override the behaviour of quiceing the volume if quiecing is already done from the VM side.

s_logger.info("takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
fileSize = cloudStackVolume.getFile().getSize();
usedBytes += fileSize;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot or clone are not of the same size as actual volume(Until spllit clone). So , indirectly we are reducing the available space of storage pool for a clone/snapshot . We can query fr snapshot or clone size and update accordingly.


updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize);

snapshotObjectTo.setPath(Constants.ONTAP_SNAP_ID +"="+cloneCloudStackVolume.getFile().getPath());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "=" intentional ?

CloudStackVolume snapCloudStackVolumeRequest = snapshotCloudStackVolumeRequestByProtocol(poolDetails, volumeVO.getPath(), snapshotName);
CloudStackVolume cloneCloudStackVolume = storageStrategy.snapshotCloudStackVolume(snapCloudStackVolumeRequest);

updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If iSCSI needs different things to be updated in DB, we might get an if-else on protocol here also.

if (volume.getPoolId() == null) {
return false;
}
StoragePoolVO pool = storagePool.findById(volume.getPoolId());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize here -> Instead of making findById for each volume, we can use listByUuids by passing all volumes together. This will save much db calls and memory.

boolean quiescevm = true;
VMSnapshotOptions options = vmSnapshotVO.getOptions();
if (options != null && !options.needQuiesceVM()) {
logger.info("Quiesce option was set to false, but overriding to true for ONTAP managed storage " +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are planning to only support app consistency, why not throw an error message stating ONTAP recommends quiescing the vm instead of overriding in backend without notifying user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants