Skip to content

using fp32 data format to simulate mx data format is not equivalent? #6

@jackiechen19940708

Description

@jackiechen19940708

It's nice work but I have some questions:
we see that we use template T=float (see following code)
(1)to use fp32 to represent mx data format and
(2)simulating mx format calculation operation using fp32
so I think this may exist in-equivalent with real mx data format representation and operation. Do you use FPGA to evaluate how much error between the fp32-simuation and real mx data format?
template<typename T> __global__ void quantize_mx_cuda_kernel( const T* __restrict__ input, const int scale_bits, const int elem_ebits, const int elem_mbits, const float elem_max_norm, const float* __restrict__ max_values, const long total_size, const int axis_size, const int post_axis_size, const bool flush_fp32_subnorms, const RoundingMode rounding_mode, T* __restrict__ output ) {

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions