Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect parsing of large floating point number: one bit offset #48648

Closed
Ilia-Kosenkov opened this issue Feb 23, 2021 · 9 comments
Closed

Incorrect parsing of large floating point number: one bit offset #48648

Ilia-Kosenkov opened this issue Feb 23, 2021 · 9 comments
Assignees

Comments

@Ilia-Kosenkov
Copy link
Contributor

Description

Parsing large floating-point numbers sometimes produces a number that differs from the correct value by one bit.
In particular, "6250000000000000000000000000000000e-12" produces this error.
The problem is illustrated in this sharplab.io sample

Let @double = double.Parse("6250000000000000000000000000000000e-12", NumberStyles.Any).
The first indication of the parsing problem is that the default string representation of @double is "6.250000000000001E+21" (note an extra 1 at the end).
The (little-endian) byte-representation (obtained with e.g. BitConverter.GetBytes) is
"F74AE1C7022D7544".
This representation is unfortunately incorrect.
The value 6.25e21 can be represented more accurately by a double-precision variable, and the byte-representation of this value is in fact
"F64AE1C7022D7544".
The difference (for this particular number) is one bit (F6 vs F7).
The mentioned above sharplab sample nicely illustrates this problem.

This issue was discovered when dotnet's parsing methods were run against a large set of floating point tests provided in this repository
https:/nigeltao/parse-number-fxx-test-data

The test case discussed in this issue is found in data/ibm-fpgen.txt, line 64724

7C00 63A96816 44752D02C7E14AF6 6250000000000000000000000000000000e-12

There may be other (similar) issues, which I have not found yet.

Configuration

So far tested on Windows 10 x64 20H2 19042.804

  • dotnet @ Windows:

    • 5.0.103
    • 6.0.100-preview.1.21103.13
  • dotnet @ WSL2:

    • 5.0.103

Whatever backend sharplab.io uses also reproduces this error.

Regression?

Did not test earlier versions of dotnet.

Other information

This report would not be possible without https:/nigeltao/parse-number-fxx-test-data project and its contributors.

I verified this particular parsing case against Rust implementation, see this rust demo.

Maybe related to #48119
Partially inspired by discussion around #48646.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 23, 2021
@Ilia-Kosenkov
Copy link
Contributor Author

This issue may arise here

mantissa = RightShiftWithRounding(mantissa, -normalMantissaShift, hasZeroTail);

And here

private static ulong RightShiftWithRounding(ulong value, int shift, bool hasZeroTail)
{
// If we'd need to shift further than it is possible to shift, the answer
// is always zero:
if (shift >= 64)
{
return 0;
}
ulong extraBitsMask = (1UL << (shift - 1)) - 1;
ulong roundBitMask = (1UL << (shift - 1));
ulong lsbBitMask = 1UL << shift;
bool lsbBit = (value & lsbBitMask) != 0;
bool roundBit = (value & roundBitMask) != 0;
bool hasTailBits = !hasZeroTail || (value & extraBitsMask) != 0;
return (value >> shift) + (ShouldRoundUp(lsbBit, roundBit, hasTailBits) ? 1UL : 0);
}

There should be no rounding for this number (if I understand this correctly).
I am unable to trace what causes ShouldRoundUp to return true instead of false.

@ghost
Copy link

ghost commented Feb 23, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Parsing large floating-point numbers sometimes produces a number that differs from the correct value by one bit.
In particular, "6250000000000000000000000000000000e-12" produces this error.
The problem is illustrated in this sharplab.io sample

Let @double = double.Parse("6250000000000000000000000000000000e-12", NumberStyles.Any).
The first indication of the parsing problem is that the default string representation of @double is "6.250000000000001E+21" (note an extra 1 at the end).
The (little-endian) byte-representation (obtained with e.g. BitConverter.GetBytes) is
"F74AE1C7022D7544".
This representation is unfortunately incorrect.
The value 6.25e21 can be represented more accurately by a double-precision variable, and the byte-representation of this value is in fact
"F64AE1C7022D7544".
The difference (for this particular number) is one bit (F6 vs F7).
The mentioned above sharplab sample nicely illustrates this problem.

This issue was discovered when dotnet's parsing methods were run against a large set of floating point tests provided in this repository
https:/nigeltao/parse-number-fxx-test-data

The test case discussed in this issue is found in data/ibm-fpgen.txt, line 64724

7C00 63A96816 44752D02C7E14AF6 6250000000000000000000000000000000e-12

There may be other (similar) issues, which I have not found yet.

Configuration

So far tested on Windows 10 x64 20H2 19042.804

  • dotnet @ Windows:

    • 5.0.103
    • 6.0.100-preview.1.21103.13
  • dotnet @ WSL2:

    • 5.0.103

Whatever backend sharplab.io uses also reproduces this error.

Regression?

Did not test earlier versions of dotnet.

Other information

This report would not be possible without https:/nigeltao/parse-number-fxx-test-data project and its contributors.

I verified this particular parsing case against Rust implementation, see this rust demo.

Maybe related to #48119
Partially inspired by discussion around #48646.

Author: Ilia-Kosenkov
Assignees: -
Labels:

area-System.Numerics, untriaged

Milestone: -

@tannergooding
Copy link
Member

@pgovind, can you check if this is handled by your trailing zero fix?

@Ilia-Kosenkov
Copy link
Contributor Author

Just to clarify - after running parsing tests for all of the test datasets (in this folder, 5 268 191 strings), this is the only value that fails parsing test, so this is probably something quite rare.

@tannergooding
Copy link
Member

Thanks! We had also run another large suite from ES6 covering 100m inputs and hadn't found any failures there either, so I expect you're right.

The Roslyn implementation doesn't have this bug so it was likely introduced by one of the refactorings or perf improvements on the libraries side.
Given the large number of trailing zeros with a negative exponent here, I'm hoping that it's fixed by the bug that @pgovind recently fixed.

@tannergooding
Copy link
Member

We should make sure to add a regression test for this before closing the issue in either case.

@pgovind
Copy link
Contributor

pgovind commented Feb 23, 2021

Hmm weird, this is not fixed by https:/dotnet/runtime/pull/47666/files. Investigating locally.

@pgovind pgovind self-assigned this Feb 23, 2021
@pgovind pgovind removed the untriaged New issue has not been triaged by the area owner label Feb 23, 2021
@pgovind
Copy link
Contributor

pgovind commented Feb 23, 2021

So, this is similar to #46827. #47666 does not fix this because #47666 only works when we encounter a decimal point in the input. Since there is no decimal point in the input here, we never calculate the number of trailing zeros.

The fix here is to carefully update the trailing zero calculation to take such cases into account.

pgovind pushed a commit to pgovind/runtime that referenced this issue Feb 24, 2021
Ilia-Kosenkov added a commit to Ilia-Kosenkov/Backports that referenced this issue Feb 24, 2021
Ilia-Kosenkov added a commit to Ilia-Kosenkov/Backports that referenced this issue Feb 24, 2021
@ghost ghost closed this as completed in 5b04977 Feb 26, 2021
github-actions bot pushed a commit that referenced this issue Feb 27, 2021
danmoseley pushed a commit that referenced this issue Feb 28, 2021
…umbers (#48857)

* Track trailing zeros only for floating point numbers

* Undo previous unit test change

* Add a roundtrip unit test

* Move check outside the loop

* Globalization

* Also fix #48648 and unit tests

* Assert and unit test

Co-authored-by: Prashanth Govindarajan <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2021
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants