Problem Description
Dear ROCm Team,
We are encountering an issue where HIP-based applications fail to detect the AMD GPU when running inside a rootless Docker container configured with CDI. The test program terminates with a "no ROCm-capable device is detected" error.
Error messages
root@119afb106f8b:/# groups
root video render
root@119afb106f8b:/# amd-smi
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a amdgpu version: N/A ROCm version: 7.2.0 |
| Platform: Linux Baremetal |
|-------------------------------------+----------------------------------------|
| BDF GPU-Name | Mem-Uti Temp UEC Power-Usage |
| GPU HIP-ID OAM-ID Partition-Mode | GFX-Uti Fan Mem-Usage |
|=====================================+========================================|
| 0000:03:00.0 N/A | 0 % 26 °C 0 8/300 W |
| 0 3 N/A N/A | 0 % 20.0 % 57/32624 MB |
|-------------------------------------+----------------------------------------|
| 0000:43:00.0 N/A | 0 % 26 °C 0 11/300 W |
| 1 2 N/A N/A | 0 % 20.0 % 57/32624 MB |
|-------------------------------------+----------------------------------------|
| 0000:83:00.0 N/A | 0 % 23 °C 0 14/300 W |
| 2 1 N/A N/A | 0 % 20.0 % 57/30576 MB |
|-------------------------------------+----------------------------------------|
| 0000:c3:00.0 N/A | 0 % 24 °C 0 8/300 W |
| 3 0 N/A N/A | 0 % 20.0 % 57/32624 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes: |
| GPU PID Process Name GTT_MEM VRAM_MEM MEM_USAGE CU % |
|==============================================================================|
| No running processes found |
+------------------------------------------------------------------------------+
root@119afb106f8b:/# rocminfo
ROCk module version 6.16.13 is loaded
Unable to open /dev/kfd read-write: Permission denied
root is not member of "nogroup" group, the default DRM access group. Users must be a member of the "nogroup" group or another DRM access group in order for ROCm applications to run successfully.
root@119afb106f8b:/# /works/a.out
before: 1 2 3 4 5 6 7 8 9 10
[HIP Error] no ROCm-capable device is detected at main.hip:30
root@119afb106f8b:/# ls -n /dev/kfd
crw-rw---- 1 65534 65534 511, 0 Mar 15 11:30 /dev/kfd
root@119afb106f8b:/# ls -n /dev/dri
total 0
crw-rw----+ 1 65534 65534 226, 1 Mar 16 21:17 card1
crw-rw----+ 1 65534 65534 226, 2 Mar 16 21:17 card2
crw-rw----+ 1 65534 65534 226, 3 Mar 16 21:17 card3
crw-rw----+ 1 65534 65534 226, 4 Mar 16 21:17 card4
crw-rw----+ 1 65534 65534 226, 128 Mar 15 11:30 renderD128
crw-rw----+ 1 65534 65534 226, 129 Mar 15 11:30 renderD129
crw-rw----+ 1 65534 65534 226, 130 Mar 15 11:30 renderD130
crw-rw----+ 1 65534 65534 226, 131 Mar 15 11:30 renderD131
Expected results
The program should print the correct element-wise sums and exit normally, as it does when run outside the container.
Specification
- Ubuntu 24.04.4 HWE Kernel (6.17.0-14)
- ROCm 7.2.70200-1 (w/ AMD Radeon AI PRO R9700)
- AMD Container Toolkit 1.3.4
- Docker 29.3.0
Best regards,
Vac Chen
Operating System
Ubuntu Server 24.04.4 HWE Kernel (6.17.0-14)
CPU
AMD EPYC 7413 24-Core Processor
GPU
AMD Radeon AI PRO R9700
ROCm Version
ROCm 7.2.70200-1
ROCm Component
HIP
Steps to Reproduce
- Generate the CDI specification
sudo amd-ctk cdi generate
sudo amd-ctk cdi validate
- Initialize rootless Docker
dockerd-rootless-setuptool.sh install
- Start the container with the following command (the current user has already been added to the video and render groups)
docker run -it --rm --device amd.com/gpu=all --group-add render --group-add video rocm/dev-ubuntu-24.04
The GPU is correctly listed and healthy when running amd-smi inside the container, and the device nodes are visible under /dev/dri/.
However, the compiled HIP program returns an error indicating that no ROCm-capable device is found.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
main.hip (built with the command hipcc main.hip)
#include <iostream>
#include <hip/hip_runtime.h>

#define HIP_CHECK(call) \
    do { \
        hipError_t err = (call); \
        if (err != hipSuccess) { \
            std::fprintf(stderr, "[HIP Error] %s at %s:%d\n", hipGetErrorString(err), __FILE__, __LINE__); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

__global__ void _vadd(float* a, float* b, int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) {
        a[i] += b[i];
    }
}

void print(float* a, int n) {
    for (int i = 0; i < n; i++) {
        std::cout << a[i] << " ";
    }
    std::cout << std::endl;
}

void vadd(float* a, float* b, int n) {
    int size = n * sizeof(float);
    float *_a, *_b;
    HIP_CHECK(hipMalloc(&_a, size));
    HIP_CHECK(hipMalloc(&_b, size));
    HIP_CHECK(hipMemcpy(_a, a, size, hipMemcpyDefault));
    HIP_CHECK(hipMemcpy(_b, b, size, hipMemcpyDefault));
    int bs = 16;
    int gs = 1 + (n - 1) / bs;
    _vadd<<<gs, bs>>>(_a, _b, n);
    HIP_CHECK(hipGetLastError());
    HIP_CHECK(hipDeviceSynchronize());
    HIP_CHECK(hipMemcpy(a, _a, size, hipMemcpyDefault));
    HIP_CHECK(hipFree(_a));
    HIP_CHECK(hipFree(_b));
}

int main() {
    int n = 10;
    float *a = new float[n], *b = new float[n];
    int size = n * sizeof(float);
    for (int i = 0; i < n; i++) {
        a[i] = b[i] = i + 1;
    }
    std::cout << "before: ";
    print(a, n);
    vadd(a, b, n);
    std::cout << "after: ";
    print(a, n);
    delete[] a;
    delete[] b;
    return 0;
}