
libm: Add fmaf16 #861

Open
tgross35 wants to merge 2 commits into rust-lang:main from tgross35:f16-fma


@tgross35 tgross35 commented Apr 20, 2025

Recreated from rust-lang/libm#419

Comment thread libm/src/math/fma_wide.rs Outdated
@@ -5,11 +5,10 @@ use super::support::{FpResult, IntTy, Round, Status};
use super::{CastFrom, CastInto, DFloat, Float, HFloat, MinInt};

// Placeholder so we can have `fmaf16` in the `Float` trait.

@tmvkrpxl0 tmvkrpxl0 Dec 17, 2025

Is it really a placeholder if it (now) has an implementation?

Contributor Author

@tgross35 tgross35 Feb 12, 2026

Indeed, updated to a doc comment :)

@tgross35 tgross35 force-pushed the f16-fma branch 3 times, most recently from 40f4317 to 1c9cf79 Compare February 12, 2026 09:00
@tgross35 tgross35 force-pushed the f16-fma branch 7 times, most recently from 11e2b85 to 1909bf7 Compare April 20, 2026 10:50
@tgross35
Contributor Author

Baseline at 1909bf7:

icount::icount_bench_math_group::icount_bench_fmaf16 logspace:(setup_fmaf16())
  Baselines:                   arch_enabled|arch_enabled (old)
  Instructions:                       64543|N/A                  (*********)
  L1 Hits:                            71915|N/A                  (*********)
  LL Hits:                                3|N/A                  (*********)
  RAM Hits:                              18|N/A                  (*********)
  Total read+write:                   71936|N/A                  (*********)
  Estimated Cycles:                   72560|N/A                  (*********)

Comment thread libm/src/math/fmaf16.rs
} else {
    rneg = mneg;
    m64 + z64
};
Contributor

@quaternic quaternic Apr 20, 2026

At this point we have the result as a sign-magnitude fixed-point value that just needs rounding. The rest of this function might be useful to factor out and test independently. Something like

/// Rounds the value `n * 2^s` to `f16`
fn f16_from_u64_with_scale(n: u64, s: i32) -> f16

And this could return

f16_from_u64_with_scale(r64, -40).copysign(/* from rneg */)
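A minimal sketch of what such a helper could look like, not the code in this PR. It assumes round-to-nearest, ties-to-even, and it returns the raw binary16 bit pattern as a `u16` so that it builds on stable Rust, where `f16` is not yet available. The name `f16_bits_from_u64_with_scale` and the `u16` return type are hypothetical.

```rust
/// Rounds the non-negative value `n * 2^s` to the nearest binary16 value,
/// ties to even, and returns its IEEE 754 bit pattern.
///
/// Hypothetical sketch: returns `u16` bits rather than `f16`. The sign is
/// left to the caller.
fn f16_bits_from_u64_with_scale(n: u64, s: i32) -> u16 {
    const INF: u16 = 0x7C00;
    if n == 0 {
        return 0;
    }

    // Exponent of the most significant set bit: 2^e <= n * 2^s < 2^(e + 1).
    // Widened to i64 so extreme values of `s` cannot overflow.
    let msb = 63 - i64::from(n.leading_zeros());
    let e = msb + i64::from(s);
    if e > 15 {
        return INF; // Beyond the largest finite binary16 exponent.
    }

    // Exponent of one unit in the last place of the result. Normals keep 11
    // significant bits; subnormals are pinned to 2^-24.
    let ulp_exp = (e - 10).max(-24);
    // Number of low bits of `n` that lie below one ULP.
    let shift = ulp_exp - i64::from(s);

    let q: u32 = if shift <= 0 {
        // Exact: in this branch the shifted value is always below 2^11.
        (n << ((-shift) as u32)) as u32
    } else if shift > 64 {
        return 0; // Strictly below half of the smallest subnormal.
    } else {
        let wide = u128::from(n);
        let rem = wide & ((1u128 << shift) - 1);
        let half = 1u128 << (shift - 1);
        let q = (wide >> shift) as u32;
        // Round to nearest, ties to even.
        if rem > half || (rem == half && q & 1 == 1) {
            q + 1
        } else {
            q
        }
    };

    // `q` still contains the implicit bit for normals, so adding it to
    // `(biased exponent - 1) << 10` yields the final encoding. A rounding
    // carry out of the mantissa bumps the exponent field, and a carry at the
    // top exponent produces infinity.
    ((((ulp_exp + 24) as u32) << 10) + q) as u16
}
```

With the fixed-point scale from the comment above, `f16_bits_from_u64_with_scale(1 << 40, -40)` gives `0x3C00`, the encoding of 1.0. A caller would then apply the sign separately, for example by OR-ing in `0x8000` when the result is negative.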

Contributor Author

(I'll get this in my next round)

Comment thread libm/src/math/fmaf16.rs Outdated
Comment on lines +177 to +192
// Apply rounding from LSB and GRS bits
if r & 0b01000 != 0 && r & 0b10111 != 0 {
    r += 0b1000;
}

if r & (1 << 15) != 0 {
    // rounding overflowed to the next power
    r >>= 1;
    rexp += 1;
    println!("round rexp: {rexp}");
}
if r & (1 << 14) != 0 {
    // ensure subnormals that rounded up get the exponent of 1
    rexp = rexp.max(1);
    println!("sub rexp: {rexp}");
}
Contributor

@quaternic quaternic Apr 20, 2026

A handy trick for doing the rounding: The effect of rounding a finite magnitude up is always just +1 to the bits of the float. If the mantissa had its maximum value, it carries into the exponent field but that still gives the correct value because the mantissa is then all zeroes.
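A small illustration of the trick, shown on `f32` because `f16` is not available on stable Rust; the same reasoning carries over to the binary16 layout. The helper name `next_up_magnitude` is hypothetical, and it assumes a finite, non-negative input.

```rust
/// Returns the next representable `f32` above a finite, non-negative `x` by
/// adding 1 to its bit pattern (hypothetical helper, for illustration only).
fn next_up_magnitude(x: f32) -> f32 {
    debug_assert!(x.is_finite() && x.is_sign_positive());
    // If the mantissa is all ones, the carry flows into the exponent field
    // and leaves an all-zero mantissa, which is exactly the next power of two.
    f32::from_bits(x.to_bits() + 1)
}
```

The edge cases fall out without special handling: the largest value below 2.0 (`0x3FFF_FFFF`) steps to 2.0, the largest subnormal (`0x007F_FFFF`) steps to `f32::MIN_POSITIVE`, and `f32::MAX` steps to infinity.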

Contributor Author

This should be added back now

@tgross35 tgross35 changed the title from "Add fmaf16" to "libm: Add fmaf16" Apr 20, 2026
@tgross35 tgross35 force-pushed the f16-fma branch 2 times, most recently from 18c21ac to de8e37f Compare April 22, 2026 10:59
@tgross35
Contributor Author

Scratch: https://rust.godbolt.org/z/T9Tb66q8W

@tgross35
Contributor Author

tgross35 commented Apr 22, 2026

On x86 and other targets that do f16 math via soft f32 conversions, it's probably more efficient to continue with the rest of this routine than to pay for an __extendhfsf2/__truncsfhf2 roundtrip on the x * y early exits. Perhaps this should even be unconditional: many arches with any f16 support also have an fmaf16 instruction and so don't need this routine, and there's a good chance that anything without fmaf16 but with cheap f16<->f32 will also have cheap f16<->f64 and won't need it either.

I'll investigate this more in a followup.

@tgross35 tgross35 force-pushed the f16-fma branch 3 times, most recently from 59a6a1d to 26857cc Compare April 23, 2026 09:18
@tgross35 tgross35 force-pushed the f16-fma branch 3 times, most recently from 7ed94fe to 28ba66a Compare April 23, 2026 11:18
@tgross35
Contributor Author

Okay, I think this should be in a pretty reasonable state.

@tgross35 tgross35 marked this pull request as ready for review April 23, 2026 11:22
@tgross35 tgross35 requested a review from quaternic April 23, 2026 11:22
4 participants