[TASK] Add A New Quant Ball for FP32-MXFP Conversion #26

@shirohasuki

Description

Deliverables

  • Add an MXFP ball RTL implementation in the prototype lib (under the arch path).
  • A Pull Request (PR) containing a test written in C for this operation and a README that introduces your design.
  • Report the performance results in this issue.

Task Description

  • MXFP is a lower-precision floating-point representation designed to reduce data size and simplify downstream computation. Using MXFP can improve throughput and hardware efficiency in bandwidth-sensitive workloads while maintaining acceptable numerical quality for many ML scenarios.
  • You can learn about this format and its variants, starting with the paper "With Shared Microexponents, A Little Shifting Goes a Long Way".
  • As we envisage it, an FP32 matrix will be loaded into the banks. Your custom MXFP instruction will then read the data from one bank into the ball you implement, and the ball will write its output to another bank.
  • You can refer to the previous Pull Request (Completed the development of ReluBall and further improved the operation manual #6) for the detailed implementation.
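
As a starting point for the C test, the conversion described above could be modelled in software roughly as follows. This is only a sketch under stated assumptions, not the required design: the issue does not fix an MXFP variant, so the code assumes 4-bit E2M1 elements, a block of 32 values sharing one power-of-two scale, and a scale chosen from the largest magnitude in the block. The names `mx_block_t`, `mx_quantize_block`, and `mx_dequantize` are hypothetical, rounding ties go to the smaller magnitude for simplicity, and NaN/Inf inputs are not handled.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

#define MX_BLOCK  32  /* assumed number of elements sharing one scale */
#define E2M1_EMAX 2   /* exponent of the largest E2M1 value: 6.0 = 1.5 * 2^2 */

/* Magnitudes representable by an E2M1 element, indexed by its 3-bit code. */
static const float e2m1_values[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                     2.0f, 3.0f, 4.0f, 6.0f};

/* Hypothetical container for one quantized block. */
typedef struct {
    int     shared_exp;       /* unbiased shared exponent; scale = 2^shared_exp */
    uint8_t elems[MX_BLOCK];  /* bit 3 = sign, bits 2..0 = magnitude code */
} mx_block_t;

/* Return the code of the nearest representable magnitude.
 * Ties resolve to the smaller magnitude; values above 6.0 saturate to 6.0. */
static uint8_t e2m1_encode(float mag)
{
    uint8_t best = 0;
    float best_err = fabsf(mag - e2m1_values[0]);
    for (uint8_t i = 1; i < 8; i++) {
        float err = fabsf(mag - e2m1_values[i]);
        if (err < best_err) {
            best = i;
            best_err = err;
        }
    }
    return best;
}

/* Quantize up to MX_BLOCK FP32 values (n <= MX_BLOCK); unused slots become 0. */
void mx_quantize_block(const float *in, size_t n, mx_block_t *out)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(in[i]);
        if (a > max_abs) max_abs = a;
    }

    /* frexpf returns m in [0.5, 1) with max_abs = m * 2^e,
     * so floor(log2(max_abs)) equals e - 1. */
    int e = 0;
    if (max_abs > 0.0f) frexpf(max_abs, &e);
    out->shared_exp = (max_abs > 0.0f) ? (e - 1 - E2M1_EMAX) : 0;

    for (size_t i = 0; i < MX_BLOCK; i++) {
        float x = (i < n) ? in[i] : 0.0f;
        float scaled = ldexpf(fabsf(x), -out->shared_exp);  /* |x| / scale */
        uint8_t sign = signbit(x) ? 0x8 : 0x0;
        out->elems[i] = (uint8_t)(sign | e2m1_encode(scaled));
    }
}

/* Reconstruct element i of a block as FP32. */
float mx_dequantize(const mx_block_t *b, size_t i)
{
    float mag = ldexpf(e2m1_values[b->elems[i] & 0x7], b->shared_exp);
    return (b->elems[i] & 0x8) ? -mag : mag;
}
```

A test could quantize each block of the FP32 matrix with a model like this and compare the result against what the ball writes to the destination bank. A different element format or block size would only change the value table and the two constants.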
