Description
Problem
NASBackupProvider.syncBackupStorageStats() crashes with a NullPointerException when ResourceManager.findOneRandomRunningHostByHypervisor() returns null. This happens when no KVM host in the zone has status=Up at the moment the BackupSyncTask runs (e.g., during management server startup, brief agent disconnections, or host state transitions).
The NPE kills the entire BackupSyncTask background job every sync interval (default 300s), flooding the management server log with stack traces and preventing backup storage stats from being updated.
Stack Trace
ERROR [o.a.c.b.B.BackupSyncTask] Error trying to run backup-sync background task due to:
[Cannot invoke "com.cloud.host.Host.getId()" because "host" is null].
java.lang.NullPointerException: Cannot invoke "com.cloud.host.Host.getId()" because "host" is null
at org.apache.cloudstack.backup.NASBackupProvider.syncBackupStorageStats(NASBackupProvider.java:544)
at org.apache.cloudstack.backup.BackupManagerImpl$BackupSyncTask.runInContext(BackupManagerImpl.java:1947)
Affected Code
File: plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java
@Override
public void syncBackupStorageStats(Long zoneId) {
    final List<BackupRepository> repositories = backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
    final Host host = resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
    // host can be null here, but there is no null check before it is used:
    for (final BackupRepository repository : repositories) {
        ...
        answer = (BackupStorageStatsAnswer) agentManager.send(host.getId(), command); // NPE
        ...
    }
}
findOneRandomRunningHostByHypervisor in ResourceManagerImpl returns null when no matching host is found:
if (CollectionUtils.isEmpty(hosts)) {
    return null;
}
The same pattern also exists in deleteBackup() (line ~450) where the host can be null when the VM is removed and no running KVM host is available.
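The failure mode can be reproduced in isolation. The following is a minimal sketch, not CloudStack code: StubHost, StubResourceManager, and UnguardedSyncDemo are hypothetical stand-ins, and only the "return null when the host list is empty" behaviour is taken from the snippet above.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for com.cloud.host.Host; not the real CloudStack interface.
class StubHost {
    private final long id;
    StubHost(long id) { this.id = id; }
    long getId() { return id; }
}

// Hypothetical stand-in that mirrors only the null-on-empty contract.
class StubResourceManager {
    private final List<StubHost> upHosts;
    StubResourceManager(List<StubHost> upHosts) { this.upHosts = upHosts; }

    StubHost findOneRandomRunningHost() {
        if (upHosts.isEmpty()) {
            return null; // same contract as the ResourceManagerImpl snippet
        }
        return upHosts.get(0); // the real selection logic is not modelled here
    }
}

class UnguardedSyncDemo {
    // Mirrors the unguarded call site: dereferences the host without checking it.
    static long unguardedHostId(StubResourceManager rm) {
        StubHost host = rm.findOneRandomRunningHost();
        return host.getId(); // throws NullPointerException when no host is Up
    }

    public static void main(String[] args) {
        try {
            unguardedHostId(new StubResourceManager(Collections.emptyList()));
        } catch (NullPointerException e) {
            System.out.println("NPE reproduced: no running host");
        }
    }
}
```

With an empty host list the lookup returns null and the dereference fails, which is the same shape as the stack trace above.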
Suggested Fix
Add a null check after findOneRandomRunningHostByHypervisor, log a warning, and return early:
@Override
public void syncBackupStorageStats(Long zoneId) {
    final List<BackupRepository> repositories = backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
    if (repositories.isEmpty()) {
        return;
    }
    final Host host = resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
    if (host == null) {
        logger.warn("Unable to find a running KVM host in zone {} to sync backup storage stats", zoneId);
        return;
    }
    for (final BackupRepository repository : repositories) {
        ...
    }
}
And similarly for deleteBackup():
Host host = vm != null ? getVMHypervisorHost(vm) :
        resourceManager.findOneRandomRunningHostByHypervisor(HypervisorType.KVM, Long.valueOf(backup.getZoneId()));
if (host == null) {
    throw new CloudRuntimeException("Unable to find a running KVM host to process backup deletion");
}
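The two guards behave differently on purpose: the periodic sync skips the cycle with a warning, while the delete path fails with a descriptive exception. The sketch below illustrates that split outside CloudStack. FakeHost and GuardedBackupOps are hypothetical names, the Supplier stands in for findOneRandomRunningHostByHypervisor, and IllegalStateException stands in for CloudRuntimeException.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for com.cloud.host.Host.
class FakeHost {
    private final long id;
    FakeHost(long id) { this.id = id; }
    long getId() { return id; }
}

class GuardedBackupOps {
    // Nullable lookup, standing in for findOneRandomRunningHostByHypervisor(...).
    private final Supplier<FakeHost> hostLookup;
    int statsRequestsSent = 0; // counts simulated agent calls

    GuardedBackupOps(Supplier<FakeHost> hostLookup) {
        this.hostLookup = hostLookup;
    }

    // Background task: skip this cycle instead of crashing; returns false when skipped.
    boolean syncStats(List<String> repositories) {
        if (repositories.isEmpty()) {
            return false;
        }
        FakeHost host = hostLookup.get();
        if (host == null) {
            System.err.println("WARN no running host; skipping backup storage stats sync");
            return false;
        }
        for (String repository : repositories) {
            statsRequestsSent++; // stand-in for agentManager.send(host.getId(), command)
        }
        return true;
    }

    // Explicit operation: fail with a clear message rather than an NPE.
    long deleteBackup() {
        FakeHost host = hostLookup.get();
        if (host == null) {
            // The suggested fix throws CloudRuntimeException here.
            throw new IllegalStateException("Unable to find a running host to process backup deletion");
        }
        return host.getId();
    }
}
```

Constructing it with `() -> null` exercises the no-host path: syncStats returns false without sending anything, and deleteBackup throws with a message that names the cause.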
Environment
- CloudStack version: 4.22.0.0
- Hypervisor: KVM
- Backup provider: NAS (NFS)
- OS: Ubuntu 24.04, Java 21
How to Reproduce
- Configure NAS backup provider with an NFS backup repository
- Assign backup offerings to VMs
- Restart cloudstack-management (or wait for a transient host disconnect)
- Observe management-server.log: the NPE fires every backup.framework.sync.interval seconds
Impact
- BackupSyncTask fails completely on every cycle, so backup storage capacity stats are never updated
- Log spam (one full stack trace every 5 minutes)
- No data loss, but backup monitoring/reporting is degraded