storage: validate NFS secondary storage mount using SSVM during image store discovery#12678
storage: validate NFS secondary storage mount using SSVM during image store discovery#12678SURYAS1306 wants to merge 1 commit intoapache:4.22from
Conversation
| if (zoneId != null) { | ||
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | ||
| boolean mountSuccess = false; | ||
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | ||
|
|
||
| for (HostVO ssvm : ssvmHosts) { | ||
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | ||
| continue; | ||
| } | ||
|
|
There was a problem hiding this comment.
The whole validation could be a separate method.
And could it be an issue for a newly deployed zone which will not have a SSVM until a secondary store is added?
HostDao can list by both type and datacenter ID. If a method is not present it can be added.
shwstppr
left a comment
There was a problem hiding this comment.
Will need validation for a new zone
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent registering unusable NFS secondary storage by performing fail-fast validation during image store discovery, using a Secondary Storage VM (SSVM) to verify the mount before proceeding.
Changes:
- Injects
SecondaryStorageVmManagerintoStorageManagerImpl. - Adds SSVM-based validation logic inside
discoverImageStoreand rolls back the image store on validation failure. - Returns an error to the API caller when SSVM setup/mount validation fails.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (zoneId != null) { | ||
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | ||
| boolean mountSuccess = false; | ||
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | ||
|
|
||
| for (HostVO ssvm : ssvmHosts) { | ||
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | ||
| continue; | ||
| } | ||
|
|
||
| try { | ||
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | ||
| if (result) { | ||
| mountSuccess = true; | ||
| break; | ||
| } else { | ||
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | ||
| } | ||
| } catch (Exception e) { | ||
| failureReason = e.getMessage(); | ||
| } | ||
| } | ||
|
|
||
| if (!mountSuccess) { | ||
| // cleanup created store | ||
| _imageStoreDao.remove(store.getId()); | ||
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
SSVM selection uses _hostDao.listByType(SecondaryStorageVM) and only filters by zone ID. This includes hosts in non-Up states and can result in false failures (or an unhelpful "invalid mount" error) when the zone has no Up/Connecting SSVM yet. Prefer using SecondaryStorageVmManager.listUpAndConnectingSecondaryStorageVmHost(zoneId) (or equivalent) and consider how to handle zones with no available SSVM (e.g., start one, retry, or skip validation with a clearer error).
| if (zoneId != null) { | |
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | |
| boolean mountSuccess = false; | |
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | |
| for (HostVO ssvm : ssvmHosts) { | |
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | |
| continue; | |
| } | |
| try { | |
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | |
| if (result) { | |
| mountSuccess = true; | |
| break; | |
| } else { | |
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | |
| } | |
| } catch (Exception e) { | |
| failureReason = e.getMessage(); | |
| } | |
| } | |
| if (!mountSuccess) { | |
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); | |
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | |
| } | |
| } | |
| if (zoneId != null) { | |
| List<HostVO> ssvmHosts = _ssVmMgr.listUpAndConnectingSecondaryStorageVmHost(zoneId); | |
| if (CollectionUtils.isEmpty(ssvmHosts)) { | |
| if (logger.isInfoEnabled()) { | |
| logger.info("Skipping immediate secondary storage mount validation: no Up/Connecting Secondary Storage VM available in zone " + zoneId); | |
| } | |
| } else { | |
| boolean mountSuccess = false; | |
| String failureReason = "No Secondary Storage VM was able to validate the NFS mount."; | |
| for (HostVO ssvm : ssvmHosts) { | |
| try { | |
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | |
| if (result) { | |
| mountSuccess = true; | |
| break; | |
| } else { | |
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | |
| } | |
| } catch (Exception e) { | |
| failureReason = e.getMessage(); | |
| } | |
| } | |
| if (!mountSuccess) { | |
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); | |
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | |
| } | |
| } | |
| } |
| if (zoneId != null) { | ||
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | ||
| boolean mountSuccess = false; | ||
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | ||
|
|
||
| for (HostVO ssvm : ssvmHosts) { | ||
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | ||
| continue; | ||
| } | ||
|
|
||
| try { | ||
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | ||
| if (result) { | ||
| mountSuccess = true; | ||
| break; | ||
| } else { | ||
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | ||
| } | ||
| } catch (Exception e) { | ||
| failureReason = e.getMessage(); | ||
| } | ||
| } | ||
|
|
||
| if (!mountSuccess) { | ||
| // cleanup created store | ||
| _imageStoreDao.remove(store.getId()); | ||
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
The added validation block isn’t indented consistently with the surrounding method body (the if (zoneId != null) starts at column 1). This is likely to violate the project’s formatting/checkstyle rules and makes the method harder to read; please align indentation with the surrounding code style.
| if (zoneId != null) { | |
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | |
| boolean mountSuccess = false; | |
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | |
| for (HostVO ssvm : ssvmHosts) { | |
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | |
| continue; | |
| } | |
| try { | |
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | |
| if (result) { | |
| mountSuccess = true; | |
| break; | |
| } else { | |
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | |
| } | |
| } catch (Exception e) { | |
| failureReason = e.getMessage(); | |
| } | |
| } | |
| if (!mountSuccess) { | |
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); | |
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | |
| } | |
| } | |
| if (zoneId != null) { | |
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | |
| boolean mountSuccess = false; | |
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | |
| for (HostVO ssvm : ssvmHosts) { | |
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | |
| continue; | |
| } | |
| try { | |
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | |
| if (result) { | |
| mountSuccess = true; | |
| break; | |
| } else { | |
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | |
| } | |
| } catch (Exception e) { | |
| failureReason = e.getMessage(); | |
| } | |
| } | |
| if (!mountSuccess) { | |
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); | |
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | |
| } | |
| } |
| // Validate secondary storage mount immediately using SSVM | ||
| if (zoneId != null) { | ||
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | ||
| boolean mountSuccess = false; | ||
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | ||
|
|
||
| for (HostVO ssvm : ssvmHosts) { | ||
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | ||
| continue; | ||
| } | ||
|
|
||
| try { | ||
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | ||
| if (result) { | ||
| mountSuccess = true; | ||
| break; | ||
| } else { | ||
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | ||
| } | ||
| } catch (Exception e) { | ||
| failureReason = e.getMessage(); | ||
| } | ||
| } | ||
|
|
||
| if (!mountSuccess) { | ||
| // cleanup created store | ||
| _imageStoreDao.remove(store.getId()); | ||
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | ||
| } | ||
| } |
There was a problem hiding this comment.
This change introduces new behavior (fail-fast validation + rollback) in discoverImageStore, but there’s no accompanying unit test coverage for the new success/failure paths. Please add tests that mock SSVM host selection and generateSetupCommand results to assert: (1) invalid NFS causes an exception and the store is cleaned up, and (2) valid NFS proceeds without deleting the store.
| // Validate secondary storage mount immediately using SSVM | ||
| if (zoneId != null) { | ||
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | ||
| boolean mountSuccess = false; | ||
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | ||
|
|
||
| for (HostVO ssvm : ssvmHosts) { | ||
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | ||
| continue; | ||
| } | ||
|
|
||
| try { | ||
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | ||
| if (result) { | ||
| mountSuccess = true; | ||
| break; | ||
| } else { | ||
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | ||
| } | ||
| } catch (Exception e) { | ||
| failureReason = e.getMessage(); | ||
| } | ||
| } | ||
|
|
||
| if (!mountSuccess) { | ||
| // cleanup created store | ||
| _imageStoreDao.remove(store.getId()); | ||
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
The new validation calls SecondaryStorageVmManager.generateSetupCommand(ssvmHostId), but that method programs all NFS image stores in the zone (iterates getImageStoresByScope and returns false if any setup fails). This can cause adding a new image store to fail due to an unrelated existing store problem, and it also doesn’t specifically validate the just-created store. Consider sending a setup/validation command targeted only at the newly initialized store (or extending SecondaryStorageVmManager with a method that validates a specific store/URL) so the result reflects only the store being added.
| // Validate secondary storage mount immediately using SSVM | |
| if (zoneId != null) { | |
| List<HostVO> ssvmHosts = _hostDao.listByType(Host.Type.SecondaryStorageVM); | |
| boolean mountSuccess = false; | |
| String failureReason = "No Secondary Storage VM available to validate the NFS mount."; | |
| for (HostVO ssvm : ssvmHosts) { | |
| if (ssvm.getDataCenterId() != zoneId.longValue()) { | |
| continue; | |
| } | |
| try { | |
| boolean result = _ssVmMgr.generateSetupCommand(ssvm.getId()); | |
| if (result) { | |
| mountSuccess = true; | |
| break; | |
| } else { | |
| failureReason = "Secondary Storage VM failed to mount the NFS secondary storage."; | |
| } | |
| } catch (Exception e) { | |
| failureReason = e.getMessage(); | |
| } | |
| } | |
| if (!mountSuccess) { | |
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); | |
| throw new CloudRuntimeException("Invalid secondary storage mount: " + failureReason); | |
| } | |
| } |
| // cleanup created store | ||
| _imageStoreDao.remove(store.getId()); |
There was a problem hiding this comment.
Rollback on mount validation failure only removes the image_store record. In this codebase, image store deletion also explicitly deletes image_store_details and other related records because delete-cascade won’t run (see deleteImageStore). This failure path should perform the same cleanup (ideally in a transaction) or call the appropriate lifecycle/service delete so the DB doesn’t retain orphaned details/refs when validation fails.
| // cleanup created store | |
| _imageStoreDao.remove(store.getId()); |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12678 +/- ##
=========================================
Coverage 17.60% 17.61%
- Complexity 15659 15660 +1
=========================================
Files 5917 5917
Lines 531394 531414 +20
Branches 64970 64975 +5
=========================================
+ Hits 93575 93582 +7
- Misses 427269 427282 +13
Partials 10550 10550
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
###Description
This PR addresses issue #12674 where CloudStack allows adding a secondary storage (image store) even when the provided NFS mount path is invalid.
Currently during image store discovery, CloudStack persists the image store in the database without validating whether the Secondary Storage VM (SSVM) can access the NFS export.
The failure only appears later during system VM template seeding (setup-sysvm-tmplt), producing errors in logs while the API/UI reports successful storage addition.
This leads to an inconsistent state where unusable secondary storage is registered in the system.
Previously validation occurred only during System VM template seeding; this change introduces fail-fast validation during image store discovery.
This PR performs immediate validation using SSVM during image store discovery:
If the SSVM setup command fails:
This prevents unusable secondary storage from being registered and avoids late failures during system VM template registration.
Fixes: #12674
Types of changes
Bug Severity
How Has This Been Tested?
Environment
Test Scenarios
Valid NFS mount
Invalid NFS mount
Regression check
Unit tests executed:
mvn -pl server -Dtest=StorageManagerImplTest,SecondaryStorageManagerImplTest testHow did you try to break this feature and the system with this change?
In all cases, storage creation failed correctly and no invalid store was registered.