Required prerequisites
Questions
When I run the gemm.py example on an RTX 2080 Ti, it fails because fp16 Tensor Core MMA is not supported. I tested other examples and hit the same issue in all of them.
My questions:
Does this mean that if I want to use fp16 Tensor Core MMA, I'll have to use a newer GPU?
Are there other methods available?
Here is the detailed error message:
/data2/zhaojingyan/miniconda3/envs/zjy1/lib/python3.10/site-packages/tilelang/3rdparty/cutlass/include/cute/arch/mma_sm80.hpp:183: static void cute::SM80_16x8x16_F32F16F16F32_TN::fma(float &, float &, float &, float &, const unsigned int &, const unsigned int &, const unsigned int &, const unsigned int &, const unsigned int &, const unsigned int &, const float &, const float &, const float &, const float &): block: [1,1,0], thread: [63,0,0]: Assertion `0 && "Attempting to use SM80_16x8x16_F32F16F16F32_TN without CUTE_ARCH_MMA_SM80_ENABLED"` failed.
corrupted size vs. prev_size while consolidating
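For context: the `SM80_` prefix in the assertion refers to compute capability 8.0 (Ampere), while the RTX 2080 Ti is a Turing card with compute capability 7.5, so the SM80 MMA path cannot be compiled for it. A minimal sketch (the helper below is hypothetical, not part of tilelang's API) of the capability check behind this failure:

```python
# Hypothetical helper illustrating the SM-level check implied by the error:
# SM80_* instructions require compute capability >= 8.0 (Ampere).

def supports_sm(capability, required_sm):
    """capability: (major, minor) tuple, e.g. (7, 5) for an RTX 2080 Ti.
    required_sm: integer like 80, meaning compute capability 8.0."""
    major, minor = capability
    return major * 10 + minor >= required_sm

# RTX 2080 Ti is Turing, compute capability 7.5 -> SM80 MMA unavailable.
print(supports_sm((7, 5), 80))  # False
# An A100 (Ampere) is compute capability 8.0 -> SM80 MMA available.
print(supports_sm((8, 0), 80))  # True
```

On a real system, the `(major, minor)` tuple can be obtained from `torch.cuda.get_device_capability()`.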