This repository was archived by the owner on Feb 27, 2024. It is now read-only.

Performance drop for specific matrix size M=4096 N=4096 K=16 #48

@artyom-beilis

I'm using an RX 560 (16 CUs, 4 GB, gfx803).

I'm running into a performance issue with matrices of one specific size, M=4096, N=4096, K=16. If I change N to 4095 or 4097, performance changes dramatically:

./test_gemm_miopengemm -m 4096 -n 4096 -k 16
70.1651 GFLOPS 7.41242 ms
./test_gemm_miopengemm -m 4096 -n 4095 -k 16
254.339 GFLOPS 2.04438 ms
./test_gemm_miopengemm -m 4096 -n 4097 -k 16
290.907 GFLOPS 1.78827 ms

This is with ROCm 3.7.
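As a sanity check on the output above, the reported GFLOPS values line up with the benchmark counting M*N*(2K-1) floating-point operations per GEMM (K multiplies and K-1 adds per output element). That operation count is inferred from the numbers themselves, not from the test_gemm_miopengemm source, so treat it as an assumption. The sketch below recomputes throughput from the reported times; the helper name `gflops` is hypothetical.

```python
# Timings (ms) reported by test_gemm_miopengemm above, keyed by (M, N, K).
RUNS = {
    (4096, 4096, 16): 7.41242,
    (4096, 4095, 16): 2.04438,
    (4096, 4097, 16): 1.78827,
}


def gflops(m: int, n: int, k: int, ms: float) -> float:
    """Throughput assuming M*N*(2K-1) flops (assumed counting convention)."""
    flops = m * n * (2 * k - 1)
    return flops / (ms * 1e-3) / 1e9


for (m, n, k), ms in RUNS.items():
    print(f"M={m} N={n} K={k}: {gflops(m, n, k, ms):.3f} GFLOPS")
```

Under this assumption the recomputed values match the tool's output to within rounding, which confirms the slowdown is a real change in kernel time rather than an artifact of how throughput is reported.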

I also noticed a similar, though smaller, performance drop in other libraries such as clBLAS (171.722 vs. 245.325 GFLOPS). I'm trying to understand the root cause: why is there a roughly 4x performance drop?
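The "roughly 4x" figure can be checked directly from the throughput numbers quoted in this issue. The sketch below compares the N=4096 case against its neighbours for MIOpenGEMM, and the two clBLAS figures against each other. The issue does not say which sizes the clBLAS numbers were measured at, so pairing them as "slow vs. fast" is an assumption; the names `MIOPENGEMM`, `CLBLAS_SLOW`, `CLBLAS_FAST` and `slowdown` are hypothetical.

```python
# Throughput (GFLOPS) quoted in this issue for MIOpenGEMM, keyed by N (M=4096, K=16).
MIOPENGEMM = {4096: 70.1651, 4095: 254.339, 4097: 290.907}

# clBLAS figures quoted in this issue; assumed to be the slow and fast cases.
CLBLAS_SLOW, CLBLAS_FAST = 171.722, 245.325


def slowdown(fast_gflops: float, slow_gflops: float) -> float:
    """How many times slower the slow case is, measured by throughput."""
    return fast_gflops / slow_gflops


for n in (4095, 4097):
    ratio = slowdown(MIOPENGEMM[n], MIOPENGEMM[4096])
    print(f"MIOpenGEMM N={n} vs N=4096: {ratio:.2f}x")
print(f"clBLAS: {slowdown(CLBLAS_FAST, CLBLAS_SLOW):.2f}x")
```

Comparing throughput rather than raw time keeps the ratios fair even though N=4095 and N=4097 do slightly less or more work than N=4096. MIOpenGEMM comes out between about 3.6x and 4.1x slower at N=4096, while the clBLAS gap is closer to 1.4x.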
