The generic CUDA/HIP templates in libflashinfer/include/flashinfer/attention/generic were designed to support both CUDA and HIP through the same set of generic kernels. As a result, they contain sections guarded by #ifdef for CUDA or HIP.
Since the original CUDA templates in libflashinfer/include/flashinfer/attention/ are not slated for removal (they are kept to maintain parity with upstream), having CUDA support in the generic directory is of limited use.
I propose:
- Remove all CUDA code sections and rename libflashinfer/include/flashinfer/attention/generic to libflashinfer/include/flashinfer/attention/amdgpu.
- Change the source files to be purely HIP sources rather than hybrid CUDA+HIP.
- Add conditional compilation based on the AMDGPU backend (CDNA3, CDNA4, RDNA, etc.) as needed to support the different flavours of AMDGPU.
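
To make the last point concrete, here is a minimal sketch of what per-architecture conditional compilation could look like once the sources are HIP-only. It relies on the `__gfx<N>__` macros that the Clang AMDGPU device compiler predefines for the target being built. The namespace, enum, and helper names are hypothetical and not part of the current code base, and the mapping assumed here (gfx942 for CDNA3, gfx950 for CDNA4, gfx110x for RDNA3) should be checked against the targets we actually intend to support.

```cpp
// Sketch only: amdgpu_sketch, Arch, kTargetArch and default_wavefront_size
// are hypothetical names used for illustration.
#include <cstdint>

namespace amdgpu_sketch {

enum class Arch : std::uint8_t { kCDNA3, kCDNA4, kRDNA3, kUnknown };

// Select the architecture at compile time from the predefined target macros.
// Assumed mapping: gfx942 -> CDNA3, gfx950 -> CDNA4, gfx1100/1101/1102 -> RDNA3.
#if defined(__gfx942__)
constexpr Arch kTargetArch = Arch::kCDNA3;
#elif defined(__gfx950__)
constexpr Arch kTargetArch = Arch::kCDNA4;
#elif defined(__gfx1100__) || defined(__gfx1101__) || defined(__gfx1102__)
constexpr Arch kTargetArch = Arch::kRDNA3;
#else
// Host compilation pass, or a device target this sketch does not list.
constexpr Arch kTargetArch = Arch::kUnknown;
#endif

// Default wavefront width per family: 64 lanes on CDNA, 32 on RDNA3.
// Returns 0 when the architecture is not known.
constexpr int default_wavefront_size(Arch arch) {
  switch (arch) {
    case Arch::kCDNA3:
    case Arch::kCDNA4:
      return 64;
    case Arch::kRDNA3:
      return 32;
    default:
      return 0;
  }
}

}  // namespace amdgpu_sketch
```

Kernels could then branch on `amdgpu_sketch::kTargetArch` with `if constexpr` or template specialization rather than scattering raw `#if` blocks through the kernel bodies, which keeps the architecture checks in one place.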