Suggestion Description
Here we present a low-latency top-k kernel that uses the on-chip network facility:
https://github.com/yiakwy-xpu-ml-framework-team/flash-float-jit-kernels
This kernel addresses the latency problem in long-context decoding with the TopK indexer and NSA (DS3.2). It reduces latency by half for low-batch-size workloads.
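
For reference, the selection step that a top-k indexer performs can be sketched on the CPU as follows. This is only an illustration of the expected semantics, not the kernel in the linked repository: the function name `topk_indices` and the flat list of scores are assumptions made for this example.

```python
import heapq


def topk_indices(scores, k):
    """Return the indices of the k largest scores, highest score first.

    Hypothetical CPU reference for illustration only; it is not the
    GPU kernel's API and says nothing about how the kernel is built.
    """
    # Clamp k so a short context does not raise or pad.
    k = min(k, len(scores))
    # Pair each score with its position, then keep the k best by score.
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [index for index, _ in best]


if __name__ == "__main__":
    # One decoding step: per-token indexer scores for a 5-token context.
    scores = [0.1, 0.9, 0.3, 0.7, 0.5]
    print(topk_indices(scores, 2))
```

A reference like this is mainly useful for checking a GPU kernel's output on small inputs before measuring latency.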
Operating System
Ubuntu-24.04
GPU
MI355
ROCm Component
flashinfer-rocm