Which standard do f32/f64::min/max follow? #87702

Closed
CryZe opened this issue Aug 2, 2021 · 2 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

CryZe commented Aug 2, 2021

I've been working on adding the new WebAssembly SIMD instructions to the wide crate, which is the most widely used crate for portable SIMD on stable Rust. As I finished up the implementation and ran the tests, I noticed that various operations behave quite differently across architectures. In particular, I ran across floating point min and max. These differ considerably in how they handle NaN and -0.0 / +0.0.

When it comes to NaN, there are three different strategies I've seen:

  • You can define the resulting value to always be NaN if either side is NaN. This is commonly referred to as NaN propagation.
  • You can define the resulting value to ignore NaN if it encounters it on either side and always choose the other side as the result.
  • You can do a fast implementation (a < b ? b : a) where you don't treat NaN specially at all, so the result just happens to be the left or right hand side depending on how the comparison is written.
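
To make the difference concrete, here is a minimal Rust sketch of the three strategies. The function names (`max_propagate`, `max_ignore_nan`, `max_fast`) are made up for illustration and do not correspond to any library or instruction. Zero handling is deliberately left out here.

```rust
/// Strategy 1 (hypothetical name): NaN propagation.
/// If either operand is NaN, the result is NaN.
fn max_propagate(a: f64, b: f64) -> f64 {
    if a.is_nan() || b.is_nan() {
        f64::NAN
    } else if a < b {
        b
    } else {
        a
    }
}

/// Strategy 2 (hypothetical name): ignore NaN.
/// If exactly one operand is NaN, the other operand is returned.
fn max_ignore_nan(a: f64, b: f64) -> f64 {
    if a.is_nan() {
        b
    } else if b.is_nan() {
        a
    } else if a < b {
        b
    } else {
        a
    }
}

/// Strategy 3 (hypothetical name): the "fast" version, `a < b ? b : a`.
/// Every comparison with NaN is false, so this always falls through to `a`
/// when either operand is NaN. The result depends on operand order.
fn max_fast(a: f64, b: f64) -> f64 {
    if a < b { b } else { a }
}

fn main() {
    let nan = f64::NAN;
    println!("propagate:  {} {}", max_propagate(1.0, nan), max_propagate(nan, 1.0));
    println!("ignore NaN: {} {}", max_ignore_nan(1.0, nan), max_ignore_nan(nan, 1.0));
    println!("fast:       {} {}", max_fast(1.0, nan), max_fast(nan, 1.0));
}
```

With this particular way of writing the fast version, `max_fast(1.0, NaN)` yields `1.0` while `max_fast(NaN, 1.0)` yields NaN. Writing the comparison the other way around flips which side wins.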

Ignoring all NaNs is defined in IEEE Std 754-2008 as maxNum:

maxNum(x, y) is the canonicalized number y if x<y, x if y<x, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x or y, canonicalized (this means results might differ among implementations). When either x or y is a signalingNaN, then the result is according to 6.2.

Then there are two different strategies when it comes to handling -0.0 and +0.0:

  • You can treat them as the same value and then either prefer returning the left hand side or the right hand side.
  • You can treat -0.0 as smaller than +0.0.
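
As a rough illustration of the two zero-handling strategies, consider the following Rust sketch. The function names are hypothetical, and NaN handling is intentionally ignored to keep the focus on the sign of zero.

```rust
/// Hypothetical: treats -0.0 and +0.0 as equal, because `<` does.
/// On a tie (including mixed-sign zeros) the left operand is returned.
fn max_zero_equal(a: f64, b: f64) -> f64 {
    if a < b { b } else { a }
}

/// Hypothetical: treats -0.0 as smaller than +0.0.
fn max_zero_ordered(a: f64, b: f64) -> f64 {
    if a == 0.0 && b == 0.0 {
        // Both operands are zeros (of either sign). Return +0.0 if
        // at least one of them is positive.
        if a.is_sign_negative() { b } else { a }
    } else if a < b {
        b
    } else {
        a
    }
}

fn main() {
    // `{:?}` prints the sign of zero, so -0.0 and 0.0 are distinguishable.
    println!("{:?}", max_zero_equal(-0.0, 0.0));
    println!("{:?}", max_zero_equal(0.0, -0.0));
    println!("{:?}", max_zero_ordered(-0.0, 0.0));
    println!("{:?}", max_zero_ordered(0.0, -0.0));
}
```

The first variant returns whichever zero is on the left, so swapping the arguments changes the sign of the result. The second variant returns +0.0 in both orders.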

IEEE Std 754-2008 allows the implementation to choose the behavior there. IEEE Std 754-2019 does not: it always requires -0.0 to be treated as smaller than +0.0. The 2019 standard defines the following two operations:

maximum(x, y) is x if x>y, y if y>x, and a quiet NaN if either operand is a NaN, according to 6.2. For this operation, +0 compares greater than −0. Otherwise (i.e., when x=y and signs are the same) it is either x or y.

maximumNumber(x, y) is x if x>y, y if y>x, and the number if one operand is a number and the other is a NaN. For this operation, +0 compares greater than −0. If x=y and signs are the same it is either x or y. If both operands are NaNs, a quiet NaN is returned, according to 6.2. If either operand is a signaling NaN, an invalid operation exception is signaled, but unless both operands are NaNs, the signaling NaN is otherwise ignored and not converted to a quiet NaN as stated in 6.2 for other operations.
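
Read literally, the two 2019 definitions could be sketched in Rust along these lines. This is only an approximation under stated assumptions: the names `maximum_2019` and `maximum_number_2019` are invented for this example, and the signaling NaN and exception behavior from 6.2 is not modeled at all, since plain Rust gives no way to observe it.

```rust
/// Hypothetical sketch of IEEE 754-2019 `maximum`:
/// NaN propagates, and +0 compares greater than -0.
fn maximum_2019(x: f64, y: f64) -> f64 {
    if x.is_nan() || y.is_nan() {
        return f64::NAN;
    }
    pick_greater(x, y)
}

/// Hypothetical sketch of IEEE 754-2019 `maximumNumber`:
/// a single NaN is ignored, and +0 compares greater than -0.
fn maximum_number_2019(x: f64, y: f64) -> f64 {
    match (x.is_nan(), y.is_nan()) {
        (true, true) => f64::NAN, // both NaN: a NaN is returned
        (true, false) => y,       // ignore the NaN, return the number
        (false, true) => x,
        (false, false) => pick_greater(x, y),
    }
}

/// Shared helper (also hypothetical) for two non-NaN operands.
fn pick_greater(x: f64, y: f64) -> f64 {
    if x > y {
        x
    } else if y > x {
        y
    } else if x.is_sign_negative() {
        // x == y. If the signs differ, both are zeros and x is -0.0,
        // so y is +0.0. If the signs match, either operand is allowed.
        y
    } else {
        x
    }
}

fn main() {
    println!("{:?}", maximum_2019(-0.0, 0.0));
    println!("{:?}", maximum_2019(1.0, f64::NAN));
    println!("{:?}", maximum_number_2019(1.0, f64::NAN));
    println!("{:?}", maximum_number_2019(f64::NAN, f64::NAN));
}
```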

Here's a list I created of various languages and processor instructions and how their max implementations behave: Survey of Floating Point Implementations for Maximum

Rust's min and max seem to not follow the latest standard and instead -0.0 and +0.0 are treated as equal. I'd say this behavior is likely incidental as Rust seems to just call into libm. Considering Rust intends to have a specification / standard at some point, we probably would need to cut some ties to libm and specify which IEEE 754 standard Rust intends to use (the reference points to 2008 for now at least). I'm mostly raising this to get some discussion going to see what the plan is moving forward, as it likely makes sense to adopt the new standard at some point.
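
A small program along these lines can be used to check what `f64::max` does on a given target. The NaN case is documented by the standard library (the non-NaN operand is returned). The mixed-sign zero case is the one in question, so the sketch only prints the result rather than assuming one. `sign_label` is a helper written just for this example.

```rust
/// Helper for this example only: describes the sign bit of a value.
fn sign_label(x: f64) -> &'static str {
    if x.is_sign_negative() { "negative" } else { "positive" }
}

fn main() {
    // Documented behavior: if one argument is NaN, the other is returned.
    println!("1.0.max(NaN) = {:?}", 1.0_f64.max(f64::NAN));
    println!("NaN.max(1.0) = {:?}", f64::NAN.max(1.0));

    // Mixed-sign zeros: the outcome may depend on the target and on
    // operand order, so it is only printed here.
    let left = (-0.0_f64).max(0.0);
    let right = 0.0_f64.max(-0.0);
    println!("(-0.0).max(0.0) has a {} sign", sign_label(left));
    println!("0.0.max(-0.0) has a {} sign", sign_label(right));
}
```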

@CryZe CryZe added the C-bug Category: This is a bug. label Aug 2, 2021
@jonas-schievink jonas-schievink added A-floating-point Area: Floating point numbers and arithmetic T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 2, 2021

nikic commented Aug 2, 2021

Duplicate of #83984.

@nikic nikic closed this as completed Aug 2, 2021

CryZe commented Aug 2, 2021

Ah, perfect, thank you.
