[RFC] MiniMax-M2.5 FP8 — Marathon Optimized (MI355X) #3192

@peymanr

This issue tracks a series of three pull requests targeting ROCm/aiter.

Status: the PRs are being prepared; a full description will be added shortly.

  • PR 1: [Perf][Kernel] Add decode buffer caches to eliminate per-step HIP malloc in fused_moe
  • PR 2: [Perf][Kernel] Add gfx950 1-stage ASM fast path for FP8 blockscale decode (ntok<=512)
  • PR 3: [Kernel][Perf] Add MiniMax-M2.5 GEMM and FMoE tuning configs for gfx950
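
Until the PRs are posted, the idea behind PR 1 can be illustrated with a minimal sketch. This is not the aiter implementation: `DecodeBufferCache`, `host_alloc`, `ITEM_SIZE`, and the buffer name `moe_out` are hypothetical, and a host-side `bytearray` stands in for a device allocation. The sketch only shows the general technique of keying scratch buffers by name, shape, and dtype so that repeated decode steps reuse one allocation instead of allocating on every step.

```python
from typing import Callable, Dict, Tuple

# Hypothetical element sizes in bytes, for illustration only.
ITEM_SIZE = {"fp8": 1, "bf16": 2, "fp32": 4}

Key = Tuple[str, Tuple[int, ...], str]


class DecodeBufferCache:
    """Reuse scratch buffers across decode steps, keyed by (name, shape, dtype)."""

    def __init__(self, alloc: Callable[[Tuple[int, ...], str], object]) -> None:
        self._alloc = alloc
        self._buffers: Dict[Key, object] = {}
        self.alloc_calls = 0  # counts real allocations, for inspection

    def get(self, name: str, shape: Tuple[int, ...], dtype: str) -> object:
        key = (name, tuple(shape), dtype)
        buf = self._buffers.get(key)
        if buf is None:
            # Cache miss: allocate once and keep the buffer for later steps.
            buf = self._alloc(key[1], dtype)
            self._buffers[key] = buf
            self.alloc_calls += 1
        return buf


def host_alloc(shape: Tuple[int, ...], dtype: str) -> bytearray:
    """Stand-in allocator; a real kernel path would allocate device memory."""
    n = 1
    for dim in shape:
        n *= dim
    return bytearray(n * ITEM_SIZE[dtype])


cache = DecodeBufferCache(host_alloc)
for _ in range(4):  # four simulated decode steps with the same shape
    out = cache.get("moe_out", (8, 4096), "bf16")
```

After the loop, `cache.alloc_calls` is 1, because only the first step allocates. A different shape or dtype produces a new key and therefore a new allocation, so a cache like this pays off mainly when decode shapes repeat from step to step.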
