refactor generator code #92

@carlushuang

Need to generalize the code generation logic for different directions, precisions, and archs:

  • global load/store:

    • support different precisions: fp32, fp16 (short), ubyte
    • support 2d/3d loads, with the exec mask coming from different dimensions
    • support global_load/buffer_load, accumulating through sgpr or vgpr
  • shared memory load/store:

    • support 1d/2d load/store for different precisions
    • support k pack
  • coalescing store:

    • support multiple groups for the coalescing store
    • support the fp16/int8 pack operation on the final store
    • support cases that do not need an LDS shuffle
    • support vector write-out
  • mfma main loop:

    • support different repeat/step
    • support both with and without inst-schedule
    • support a k pack chosen to suit the instruction requirement and precision
    • support loading multiple k_pack from shared memory at once, then running mfma multiple times
    • support pass through LDS
  • fma main loop

  • thread mapping
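
As a rough illustration of the global load/store item above, here is a minimal Python sketch of a descriptor that picks a load mnemonic from precision, vector size, and load kind. The names `GlobalLoadSpec`, `BYTES_PER_ELEMENT`, and `SUFFIX_BY_BYTES` are hypothetical and not taken from the existing generator, and the byte-to-suffix table is an assumption that should be checked against the ISA manual for each target arch.

```python
from dataclasses import dataclass

# Element sizes for the precisions listed in this issue.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "ubyte": 1}

# Assumed mapping from bytes moved per thread to an instruction suffix.
# Verify against the ISA manual for the target arch before relying on it.
SUFFIX_BY_BYTES = {1: "ubyte", 2: "ushort", 4: "dword", 8: "dwordx2", 16: "dwordx4"}


@dataclass(frozen=True)
class GlobalLoadSpec:
    """Hypothetical descriptor, not a class from the existing generator."""
    precision: str         # "fp32", "fp16" or "ubyte"
    vector_size: int       # elements loaded per thread per instruction
    use_buffer_load: bool  # True -> buffer_load_*, False -> global_load_*

    def bytes_per_load(self) -> int:
        # Total bytes one thread moves with a single load instruction.
        return BYTES_PER_ELEMENT[self.precision] * self.vector_size

    def mnemonic(self) -> str:
        # Pick the single instruction that covers exactly this many bytes.
        nbytes = self.bytes_per_load()
        if nbytes not in SUFFIX_BY_BYTES:
            raise ValueError(f"no single load covers {nbytes} bytes")
        prefix = "buffer_load" if self.use_buffer_load else "global_load"
        return f"{prefix}_{SUFFIX_BY_BYTES[nbytes]}"
```

With a descriptor like this, the emitter could stay the same across precisions, e.g. `GlobalLoadSpec("fp16", 2, False).mnemonic()` and `GlobalLoadSpec("fp32", 1, False).mnemonic()` both resolve to a one-dword global load.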
