
NullPointerException in NASBackupProvider.syncBackupStorageStats when no KVM host is in "Up" state #12679

@jmsperu

Problem

NASBackupProvider.syncBackupStorageStats() crashes with a NullPointerException when ResourceManager.findOneRandomRunningHostByHypervisor() returns null. This happens when no KVM host in the zone has status=Up at the exact moment the BackupSyncTask runs (e.g., during management server startup, brief agent disconnections, or host state transitions).

The NPE kills the entire BackupSyncTask background job every sync interval (default 300s), flooding the management server log with stack traces and preventing backup storage stats from being updated.

Stack Trace

ERROR [o.a.c.b.B.BackupSyncTask] Error trying to run backup-sync background task due to:
[Cannot invoke "com.cloud.host.Host.getId()" because "host" is null].
java.lang.NullPointerException: Cannot invoke "com.cloud.host.Host.getId()" because "host" is null
    at org.apache.cloudstack.backup.NASBackupProvider.syncBackupStorageStats(NASBackupProvider.java:544)
    at org.apache.cloudstack.backup.BackupManagerImpl$BackupSyncTask.runInContext(BackupManagerImpl.java:1947)

Affected Code

File: plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java

@Override
public void syncBackupStorageStats(Long zoneId) {
    final List<BackupRepository> repositories = backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
    final Host host = resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
    // host can be null here, but there is no null check before it is used:
    for (final BackupRepository repository : repositories) {
        ...
        answer = (BackupStorageStatsAnswer) agentManager.send(host.getId(), command); // NPE
        ...
    }
}

findOneRandomRunningHostByHypervisor in ResourceManagerImpl returns null when no matching host is found:

if (CollectionUtils.isEmpty(hosts)) {
    return null;
}

The same pattern also exists in deleteBackup() (line ~450) where the host can be null when the VM is removed and no running KVM host is available.
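
The failure mode can be reproduced in isolation. The following is a minimal sketch, not CloudStack code: the Host interface, findOneRunningHost, and unguardedHostId are hypothetical stand-ins, and the finder returns the first host instead of a random one. It only shows that a finder returning null for an empty host list, combined with an unguarded getId() call, throws a NullPointerException.

```java
import java.util.List;

final class NullHostRepro {
    // Hypothetical stand-in for com.cloud.host.Host; only getId() matters here.
    interface Host {
        long getId();
    }

    // Assumed to mirror the quoted behaviour: null when no host matches.
    // Simplification: returns the first host rather than a random one.
    static Host findOneRunningHost(List<Host> upHosts) {
        if (upHosts == null || upHosts.isEmpty()) {
            return null;
        }
        return upHosts.get(0);
    }

    // Mirrors the unguarded call site: the result is dereferenced without a null check.
    static long unguardedHostId(List<Host> upHosts) {
        Host host = findOneRunningHost(upHosts);
        return host.getId(); // throws NullPointerException when host is null
    }

    public static void main(String[] args) {
        try {
            unguardedHostId(List.of()); // simulates "no KVM host is Up"
        } catch (NullPointerException e) {
            System.out.println("NPE reproduced: " + e.getMessage());
        }
    }
}
```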

Suggested Fix

Add a null check after findOneRandomRunningHostByHypervisor, log a warning, and return early:

@Override
public void syncBackupStorageStats(Long zoneId) {
    final List<BackupRepository> repositories = backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
    if (repositories.isEmpty()) {
        return;
    }
    final Host host = resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
    if (host == null) {
        logger.warn("Unable to find a running KVM host in zone {} to sync backup storage stats", zoneId);
        return;
    }
    for (final BackupRepository repository : repositories) {
        ...
    }
}
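
The effect of the guard can be checked outside CloudStack with a small sketch. Everything here is an assumption made for illustration: the Host interface is a stand-in, repositories are plain strings, and the sentCommands and warnings lists replace agentManager.send(...) and logger.warn(...). The point is only that a null host produces one warning and no commands, and never an exception.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedSyncSketch {
    // Hypothetical stand-in for com.cloud.host.Host.
    interface Host {
        long getId();
    }

    // Stand-ins for agentManager.send(...) and logger.warn(...): they only record calls.
    final List<String> sentCommands = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    // The host is passed in so the "no host is Up" case can be simulated with null.
    void syncBackupStorageStats(long zoneId, List<String> repositoryNames, Host host) {
        if (repositoryNames.isEmpty()) {
            return; // nothing to sync
        }
        if (host == null) {
            warnings.add("Unable to find a running KVM host in zone " + zoneId
                    + " to sync backup storage stats");
            return; // skip this cycle instead of throwing
        }
        for (String name : repositoryNames) {
            sentCommands.add("stats:" + name + "@host" + host.getId());
        }
    }

    public static void main(String[] args) {
        GuardedSyncSketch sketch = new GuardedSyncSketch();
        sketch.syncBackupStorageStats(1L, List.of("nfs-repo"), null);
        System.out.println(sketch.warnings);
        sketch.syncBackupStorageStats(1L, List.of("nfs-repo"), () -> 42L);
        System.out.println(sketch.sentCommands);
    }
}
```

Returning early keeps the sync task alive, so the next cycle can succeed once a host is back in the Up state.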

And similarly for deleteBackup():

Host host = vm != null ? getVMHypervisorHost(vm) :
        resourceManager.findOneRandomRunningHostByHypervisor(HypervisorType.KVM, Long.valueOf(backup.getZoneId()));
if (host == null) {
    throw new CloudRuntimeException("Unable to find a running KVM host to process backup deletion");
}
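
For the deletion path, failing with a descriptive exception is preferable to returning silently, because the caller needs to know the backup was not deleted. A hedged sketch of that host resolution follows; resolveHost and its Supplier parameters are hypothetical, the Host interface is a stand-in, and IllegalStateException is used in place of CloudRuntimeException so the example compiles on its own.

```java
import java.util.function.Supplier;

final class DeleteBackupHostSketch {
    // Hypothetical stand-in for com.cloud.host.Host.
    interface Host {
        long getId();
    }

    // Picks the VM's host when the VM still exists, otherwise a running host,
    // and fails with a clear message instead of a later NullPointerException.
    static Host resolveHost(boolean vmExists,
                            Supplier<Host> vmHostLookup,
                            Supplier<Host> runningHostLookup) {
        Host host = vmExists ? vmHostLookup.get() : runningHostLookup.get();
        if (host == null) {
            // Stand-in for CloudRuntimeException in the suggested fix.
            throw new IllegalStateException(
                    "Unable to find a running KVM host to process backup deletion");
        }
        return host;
    }

    public static void main(String[] args) {
        try {
            // VM already removed and no running host available.
            resolveHost(false, () -> null, () -> null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```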

Environment

  • CloudStack version: 4.22.0.0
  • Hypervisor: KVM
  • Backup provider: NAS (NFS)
  • OS: Ubuntu 24.04, Java 21

How to Reproduce

  1. Configure NAS backup provider with an NFS backup repository
  2. Assign backup offerings to VMs
  3. Restart cloudstack-management (or wait for a transient host disconnect)
  4. Observe management-server.log — the NPE fires every backup.framework.sync.interval seconds

Impact

  • BackupSyncTask fails completely on every cycle, so backup storage capacity stats are never updated
  • Log spam (one full stack trace every 5 minutes)
  • No data loss, but backup monitoring/reporting is degraded
