Redesign of attention/generic #41

@diptorupd

Description

The CUDA/HIP templates in libflashinfer/include/flashinfer/attention/generic were designed to support both CUDA and HIP through the same set of generic kernels. As a result, they contain sections guarded by #ifdef for either CUDA or HIP.
Since the original CUDA templates in libflashinfer/include/flashinfer/attention/ are not slated for removal (they are kept to maintain parity with upstream), the utility of having CUDA support in the generic directory is limited.

I propose:

  1. Remove all CUDA code sections and rename libflashinfer/include/flashinfer/attention/generic to libflashinfer/include/flashinfer/attention/amdgpu.
  2. Change the source files to be purely HIP sources rather than hybrid CUDA+HIP.
  3. Add conditional compilation based on the AMDGPU backend (CDNA3, CDNA4, RDNA, etc.) as needed to support the different flavours of AMDGPU.
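
To make item 3 concrete, here is a minimal sketch of what per-architecture conditional compilation could look like. It is not taken from the repository. The namespace `sketch`, the enum `AmdGpuArch`, the constant `kArch`, and the helper `wavefront_size` are hypothetical names, and the list of targets is a non-exhaustive subset. It relies only on the `__gfx…__` macros that the compiler defines for the AMDGPU target being compiled in a device pass, so on the host (or on any target not listed) it falls back to `Unknown`.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// Hypothetical tag for the AMDGPU families named in the proposal.
enum class AmdGpuArch { Unknown, CDNA3, CDNA4, RDNA3 };

// Select the family from the target macros set during device compilation.
// Assumed mapping: gfx942 -> CDNA3, gfx950 -> CDNA4, gfx110x -> RDNA3.
#if defined(__gfx942__)
constexpr AmdGpuArch kArch = AmdGpuArch::CDNA3;
#elif defined(__gfx950__)
constexpr AmdGpuArch kArch = AmdGpuArch::CDNA4;
#elif defined(__gfx1100__) || defined(__gfx1101__) || defined(__gfx1102__)
constexpr AmdGpuArch kArch = AmdGpuArch::RDNA3;
#else
// Host pass or an unlisted target: no architecture-specific path is chosen.
constexpr AmdGpuArch kArch = AmdGpuArch::Unknown;
#endif

// Default wavefront width per family: 64 lanes on CDNA, 32 on RDNA.
// Returns 0 for Unknown so callers must handle that case explicitly.
constexpr std::uint32_t wavefront_size(AmdGpuArch arch) {
  return arch == AmdGpuArch::RDNA3
             ? 32u
             : (arch == AmdGpuArch::Unknown ? 0u : 64u);
}

}  // namespace sketch
```

Keeping the macro checks in one place like this would let the kernels branch on a single compile-time constant rather than repeating `#if defined(__gfx…__)` throughout the attention templates. Whether that fits the existing template structure is an open design question.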

Metadata

Labels

question: Further information is requested
refactoring: Requires refactoring of codebase.
