NFS3 protocol specific snapshot workflows#36
NFS3 protocol specific snapshot workflows#36rajiv-jain-netapp wants to merge 10 commits intomainfrom
Conversation
|
I have tested Taking a CS-volume snapshot for the data CS-volume. |
…iecing is enabled and memory-disk is opted out
|
I have tested VM level and cloudstack-volume level snapshots, all found working
|
|
|
||
| logger.error("Disk-only VM snapshot for VM [{}] failed{}.", userVm.getUuid(), answer != null ? " due to" + answer.getDetails() : ""); | ||
| throw new CloudRuntimeException(String.format("Disk-only VM snapshot for VM [%s] failed.", userVm.getUuid())); | ||
| String details = answer != null ? answer.getDetails() : String.format("No answer received from host [%s]. The host may be unreachable.", hostId); |
There was a problem hiding this comment.
I am returning the failure reasons through the exception.
|
|
||
| @Override | ||
| public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) { | ||
| s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName()); |
There was a problem hiding this comment.
log level should not be error plus remove temp
There was a problem hiding this comment.
Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.
| Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId()); | ||
| StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails); | ||
|
|
||
| Map<String, String> cloudStackVolumeRequestMap = new HashMap<>(); |
There was a problem hiding this comment.
Get Volume Request is not as per protocol
There was a problem hiding this comment.
added protocol check.
| cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID)); | ||
| cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath()); | ||
| CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap); | ||
| if (cloudStackVolume == null || cloudStackVolume.getFile() == null) { |
There was a problem hiding this comment.
this condition is also specific to file, it will fail for lun
| throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot"); | ||
| } | ||
| long capacityBytes = storagePool.getCapacityBytes(); | ||
| s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize()); |
| long capacityBytes = storagePool.getCapacityBytes(); | ||
| s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize()); | ||
| long usedBytes = getUsedBytes(storagePool); | ||
| long fileSize = cloudStackVolume.getFile().getSize(); |
There was a problem hiding this comment.
these all changes are related to file
|
|
||
| String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid(); | ||
|
|
||
| int maxSnapshotNameLength = 64; |
There was a problem hiding this comment.
I think, we can have more character in snapshot name
There was a problem hiding this comment.
moved it to contants for now, will update the length by checking it, if required.
| return cloudStackVolumeRequest; | ||
| } | ||
|
|
||
| private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details, |
There was a problem hiding this comment.
I am calling this method in takeSnapshot()
| * a. Field is eligible for unified storage only. | ||
| * b. It will be null for the disaggregated storage. | ||
| */ | ||
| private String flexVolumeUuid; |
There was a problem hiding this comment.
why do we need these 2 fields here?
There was a problem hiding this comment.
If you review the logic, the volume UUID is required to support file‑level snapshot workflows. The destination path will also be useful for future copy operations. In the current implementation, I am using it specifically for the clone API call.
rajiv-jain-netapp
left a comment
There was a problem hiding this comment.
reposting the PR with changes
|
|
||
| logger.error("Disk-only VM snapshot for VM [{}] failed{}.", userVm.getUuid(), answer != null ? " due to" + answer.getDetails() : ""); | ||
| throw new CloudRuntimeException(String.format("Disk-only VM snapshot for VM [%s] failed.", userVm.getUuid())); | ||
| String details = answer != null ? answer.getDetails() : String.format("No answer received from host [%s]. The host may be unreachable.", hostId); |
There was a problem hiding this comment.
I am returning the failure reasons through the exception.
|
|
||
| @Override | ||
| public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) { | ||
| s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName()); |
There was a problem hiding this comment.
Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.
| Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId()); | ||
| StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails); | ||
|
|
||
| Map<String, String> cloudStackVolumeRequestMap = new HashMap<>(); |
There was a problem hiding this comment.
added protocol check.
| cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID)); | ||
| cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath()); | ||
| CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap); | ||
| if (cloudStackVolume == null || cloudStackVolume.getFile() == null) { |
| throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot"); | ||
| } | ||
| long capacityBytes = storagePool.getCapacityBytes(); | ||
| s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize()); |
| long capacityBytes = storagePool.getCapacityBytes(); | ||
| s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize()); | ||
| long usedBytes = getUsedBytes(storagePool); | ||
| long fileSize = cloudStackVolume.getFile().getSize(); |
|
|
||
| String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid(); | ||
|
|
||
| int maxSnapshotNameLength = 64; |
There was a problem hiding this comment.
moved it to contants for now, will update the length by checking it, if required.
| return cloudStackVolumeRequest; | ||
| } | ||
|
|
||
| private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details, |
There was a problem hiding this comment.
I am calling this method in takeSnapshot()
| * a. Field is eligible for unified storage only. | ||
| * b. It will be null for the disaggregated storage. | ||
| */ | ||
| private String flexVolumeUuid; |
There was a problem hiding this comment.
If you review the logic, the volume UUID is required to support file‑level snapshot workflows. The destination path will also be useful for future copy operations. In the current implementation, I am using it specifically for the clone API call.
| mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString()); |
There was a problem hiding this comment.
Why do we need to set this as true as we are currently not supporting volume creation from snapshot.
There was a problem hiding this comment.
Without this property set to true, CloudStack initiates both the snapshot creation and the corresponding copy operation to secondary storage over the data path, as it assumes that snapshots must reside on secondary storage by default.
When only the storage-system-snapshot property is enabled, CloudStack successfully invokes the takeSnapshot() method in the plugin driver class. However, the snapshot remains on the primary storage pool and is not transferred to secondary storage.
There was a problem hiding this comment.
So, with this implementation, snapshots reside in primary storage itself?
| * @param storagePoolId: primary storage pool id | ||
| * @param ontapSnapSize: Size of snapshot CS volume(LUN/file) | ||
| */ | ||
| private void updateSnapshotDetails(long csSnapshotId, long csVolumeId, String ontapVolumeUuid, String ontapNewSnapshot, long storagePoolId, long ontapSnapSize) { |
There was a problem hiding this comment.
can we create a map and put whatever we need to at one time and persist in SnapshotDetailsV0 ?
There was a problem hiding this comment.
We store these details as key–value pairs in the database. To facilitate efficient querying—particularly during snapshot delete or revert operations—I prefer to keep them independent rather than embedding them within a combined structure.
| return new Pair<>(snapshotXml, volumeObjectToNewPathMap); | ||
| } | ||
|
|
||
| protected void cleanupLeftoverDeltas(List<VolumeObjectTO> volumeObjectTos, Map<String, Pair<Long, String>> mapVolumeToSnapshotSizeAndNewVolumePath, KVMStoragePoolManager storagePoolMgr) { |
There was a problem hiding this comment.
Does this refactoring somehow help our plugin functionality?
| mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString()); | ||
| mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString()); |
There was a problem hiding this comment.
We are supporting volume creation from snapshot with this PR?
|
|
||
| @Override | ||
| public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) { | ||
| s_logger.info("takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName()); |
There was a problem hiding this comment.
Lets have an intuitive log message instead of a method entry logger?
There was a problem hiding this comment.
Could you provide a few examples?
Since I am also logging the method arguments, I felt that this approach would give me better insight into the method invocation flow, rather than relying solely on static logging.
There was a problem hiding this comment.
Instead of adding an entry log we can add a log post line 544. For example, it could be like
s_logger.info("takeSnapshot: Snapshot with ID: " + snapshot.getId() + " and name: " + snapshot.getName() + " requested for volume with ID: " + volumeVO.getId() + " and name: " + volumeVO.getName())
It's just an example but I feel this could give us more outlook on whats happening.
| long capacityBytes = storagePool.getCapacityBytes(); | ||
|
|
||
| // Only proceed for NFS3 protocol | ||
| if (ProtocolType.NFS3.name().equalsIgnoreCase(poolDetails.get(Constants.PROTOCOL))) { |
There was a problem hiding this comment.
Didn't we decide not to have any conditional statements on protocols?
There was a problem hiding this comment.
We did, but the current context is not related to the hierarchy of strategy classes. The implementation here is analogous to the request‑preparation helper methods we already maintain within the driver classes.
There was a problem hiding this comment.
A part of the discussion was to have nothing related to protocols here, move all the necessary objects to strategy classes and have the necessary implementation there itself, according to the requirement. Will try to pick up the code for iSCSI soon, but, please take the lead in making code changes to align with the discussed approach?
| cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath()); | ||
| cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap); | ||
| if (cloudStackVolume == null || cloudStackVolume.getFile() == null) { | ||
| throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot"); |
There was a problem hiding this comment.
Should we have method names only in loggers and not have them in exceptions? Management server might show this as an error on the UI?
| try { | ||
| fileResponse = nasFeignClient.getFileResponse(authHeader, volumeUuid, filePath); | ||
| if (fileResponse == null || fileResponse.getRecords().isEmpty()) { | ||
| throw new CloudRuntimeException("File " + filePath + " not not found on ONTAP. " + |
| /** | ||
| * VM Snapshot strategy for NetApp ONTAP managed storage. | ||
| * | ||
| * <p>This strategy handles VM-level (instance) snapshots for VMs whose volumes |
There was a problem hiding this comment.
Do we need comments in HTML format?
There was a problem hiding this comment.
We do not have any limitations on these yet, so I kept them. BTW, I got them generated.
| * <ol> | ||
| * <li>Freeze the VM via QEMU guest agent ({@code fsfreeze})</li> | ||
| * <li>For each attached volume, create a storage-level snapshot (ONTAP file clone)</li> | ||
| * <li>Thaw the VM</li> |
There was a problem hiding this comment.
what does "Thaw the VM" mean?
| return false; | ||
| } | ||
|
|
||
| if (!Hypervisor.HypervisorType.KVM.equals(userVm.getHypervisorType())) { |
There was a problem hiding this comment.
We already have checks in StoragePool and CS Volume creation workflows for Hypervisor type right? Do we need it here also?
There was a problem hiding this comment.
This is inline with the snapshot workflow
| return false; | ||
| } | ||
|
|
||
| if (!VirtualMachine.State.Running.equals(userVm.getState())) { |
There was a problem hiding this comment.
Wouldn't this block disk snapshots?
| * <li>All VM volumes are on ONTAP managed NFS primary storage</li> | ||
| * </ul> | ||
| */ | ||
| public class OntapVMSnapshotStrategy extends StorageVMSnapshotStrategy { |
There was a problem hiding this comment.
I don't see any other vendor extending this class to implement snapshot functionality. What ways this class is helping us ? And is it being called by the framework in snapshot workflow?
There was a problem hiding this comment.
I have this implementation for NFS specific workflow requirement. None of the other vendors are supporting NFS workflows via a plugin.
I added this to override the behaviour of quiceing the volume if quiecing is already done from the VM side.
| s_logger.info("takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize()); | ||
| fileSize = cloudStackVolume.getFile().getSize(); | ||
| usedBytes += fileSize; | ||
| } |
There was a problem hiding this comment.
snapshot or clone are not of the same size as actual volume(Until spllit clone). So , indirectly we are reducing the available space of storage pool for a clone/snapshot . We can query fr snapshot or clone size and update accordingly.
|
|
||
| updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize); | ||
|
|
||
| snapshotObjectTo.setPath(Constants.ONTAP_SNAP_ID +"="+cloneCloudStackVolume.getFile().getPath()); |
| CloudStackVolume snapCloudStackVolumeRequest = snapshotCloudStackVolumeRequestByProtocol(poolDetails, volumeVO.getPath(), snapshotName); | ||
| CloudStackVolume cloneCloudStackVolume = storageStrategy.snapshotCloudStackVolume(snapCloudStackVolumeRequest); | ||
|
|
||
| updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize); |
There was a problem hiding this comment.
If iSCSI needs different things to be updated in DB, we might get an if-else on protocol here also.
| if (volume.getPoolId() == null) { | ||
| return false; | ||
| } | ||
| StoragePoolVO pool = storagePool.findById(volume.getPoolId()); |
There was a problem hiding this comment.
We can optimize here -> Instead of making findById for each volume, we can use listByUuids by passing all volumes together. This will save much db calls and memory.
| boolean quiescevm = true; | ||
| VMSnapshotOptions options = vmSnapshotVO.getOptions(); | ||
| if (options != null && !options.needQuiesceVM()) { | ||
| logger.info("Quiesce option was set to false, but overriding to true for ONTAP managed storage " + |
There was a problem hiding this comment.
If we are planning to only support app consistency, why not throw an error message stating ONTAP recommends quiescing the vm instead of overriding in backend without notifying user.
Description
This PR...
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?