Fix parsing of trailing zeros: TryParseNumber #47666

pgovind · 2021-01-30T01:45:42Z

We're currently not throwing away the trailing zeros from the fractional part of a number while parsing. This leads us to erroneously round a floating point number sometimes. This PR fixes that. I've also updated the CheckConsistency method to detect trailing zeros.

ghost · 2021-01-30T01:45:46Z

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #46827

Author:	pgovind
Assignees:	-
Labels:	`area-System.Numerics`
Milestone:	-

src/libraries/System.Private.CoreLib/src/System/Number.Parsing.cs

danmoseley · 2021-01-30T02:36:25Z

Do we have perf tests.. can you get numbers?

tannergooding · 2021-02-02T15:12:28Z

src/libraries/System.Formats.Cbor/tests/Writer/CborWriterTests.Tag.cs

@@ -158,7 +158,7 @@ public static void WriteInteger_SingleValue_HappyPath(string valueString, string
 [InlineData("1", "c4820001")]
 [InlineData("-1", "c4820020")]
 [InlineData("1.1", "c482200b")]
- [InlineData("1.000", "c482221903e8")]


I'm not familiar with cbor. Do we know why this changed in size?

Nor am I. I just figured that we are parsing correctly now and returning 1. So I looked at the test case 2 lines above for 1 and copied that. I can go back and see what value we're returning without the bug fix, but I'm pretty sure it is wrong somehow and this PR is fixing it.

It might be good to get confirmation.

The number of hex digits implies it was over the length 32-bits, but it also looks like there is some general encoding (compression?) here so it's unclear whether the change is actually correct (I would guess it is though).

@eiriktsarpalis for thoughts

The original CBOR encodes 1000e-3, while the new one is 1e0, so I presume this is in line with the intended change in the parser.
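The two hex strings can be checked by hand. The following is a minimal sketch, not part of the PR: a hypothetical `CborDecimalFractionSketch.Decode` helper that understands only the small CBOR subset these test vectors use (tag 4, a two-element array, and integers of up to 16 bits, per RFC 8949). It assumes .NET 5 or later for `Convert.FromHexString`.

```csharp
using System;

static class CborDecimalFractionSketch
{
    // Hypothetical helper for illustration only. Handles just the shapes seen in
    // the test data: tag 4 (decimal fraction) wrapping [exponent, mantissa].
    public static (long Exponent, long Mantissa) Decode(string hex)
    {
        byte[] data = Convert.FromHexString(hex);
        int pos = 0;
        if (data[pos++] != 0xC4) throw new FormatException("expected tag 4 (decimal fraction)");
        if (data[pos++] != 0x82) throw new FormatException("expected an array of length 2");
        long exponent = ReadInteger(data, ref pos);
        long mantissa = ReadInteger(data, ref pos);
        return (exponent, mantissa);
    }

    private static long ReadInteger(byte[] data, ref int pos)
    {
        byte initial = data[pos++];
        int majorType = initial >> 5;   // 0 = unsigned integer, 1 = negative integer
        int info = initial & 0x1F;      // additional information
        long argument;
        if (info < 24) argument = info;                 // value stored inline
        else if (info == 24) argument = data[pos++];    // one following byte
        else if (info == 25) { argument = (data[pos] << 8) | data[pos + 1]; pos += 2; } // two bytes, big-endian
        else throw new NotSupportedException("wider integers are outside this sketch");
        return majorType switch
        {
            0 => argument,
            1 => -1 - argument,         // CBOR negative integers encode -1 - n
            _ => throw new FormatException("expected an integer"),
        };
    }
}
```

With this sketch, `Decode("c482221903e8")` yields exponent -3 and mantissa 1000, while `Decode("c4820001")` yields exponent 0 and mantissa 1, which matches the 1000e-3 versus 1e0 reading above.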

pgovind · 2021-02-03T21:53:53Z

> Do we have perf tests.. can you get numbers?

I'm reasonably confident that CI will be green now, so I got some perf numbers:

summary:
worse: 6, geomean: 1.126
total diff: 6

| Slower                                                  | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Tests.Perf_Decimal.Mod                           |      1.21 |            13.13 |            15.88 |         |
| System.Tests.Perf_Decimal.Parse(value: "123456.789")    |      1.14 |            79.49 |            90.56 | several?|
| System.Tests.Perf_Decimal.Floor                         |      1.14 |            11.65 |            13.23 | bimodal |
| System.Tests.Perf_Decimal.TryParse(value: "123456.789") |      1.11 |            80.55 |            89.61 | several?|
| System.Tests.Perf_Decimal.Divide                        |      1.11 |            66.90 |            74.11 |         |
| System.Tests.Perf_Decimal.ToString(value: 123456.789)   |      1.06 |            73.23 |            77.61 | several?|

No Faster results for the provided threshold = 0.001% and noise filter = 0.3ns.

summary:
better: 4, geomean: 1.034
worse: 1, geomean: 1.027
total diff: 5

| Slower                                                           | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Tests.Perf_Double.Parse(value: "1.7976931348623157e+308") |      1.03 |           436.96 |           448.71 |         |

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Tests.Perf_Double.ToStringWithCultureInfo(value: 12345, culture: zh)      |      1.08 |           153.69 |           142.69 | bimodal |
| System.Tests.Perf_Double.Parse(value: "12345")                                   |      1.02 |            66.27 |            64.96 |         |
| System.Tests.Perf_Double.ToString(value: 12345)                                  |      1.02 |           158.42 |           155.29 | several?|
| System.Tests.Perf_Double.ToStringWithFormat(value: -1.7976931348623157E+308, for |      1.02 |           277.30 |           271.97 | several?|

I'm a little skeptical about these perf numbers. This PR touches only the Parse and TryParse methods, yet other benchmarks also show a slowdown that I can't explain. For example, the ToString and ToStringWithFormat methods are untouched by this PR. All things considered, we're likely slowing down Parse and TryParse a little, but I'd still take the PR since it fixes a bug.

tannergooding · 2021-02-11T23:45:32Z

src/libraries/System.Private.CoreLib/src/System/Number.NumberBuffer.cs

- int numberOfTrailingZeros = 0;
- for (int i = DigitsCount - (int)fractionalDigitsPresent; i < DigitsCount; i++)
+ // For a number like 1.23000, verify that we don't store trailing zeros in Digits
+ // However, if the number of digits exceeds maxDigCount and rounding is required, we store the trailing zeros in the buffer.


Does it require any more complex changes to just trim in this case as well?

I don't remember if this is going to have "cost" when dealing with the internal BigInteger (I think it does) and so it might be better to trim and have HasNonZeroTail correctly set if it doesn't require more complex changes.

If it does require more changes, we can just log it as a "investigate later" issue

I started investigating this a little bit last week. It looks like the number of fractional digits present is used in calculating a fastExponent, which in turn determines whether we use the fast or slow path to convert a number to float/double. Also, HasNonZeroTail is currently only used in the slow path. For this change, I decided to just be conservative, but yes, I'll log an "investigate later" issue. Would be something to work on in December (at the least).

ghost · 2021-02-12T00:08:17Z

Hello @pgovind!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

tannergooding · 2021-02-17T18:01:52Z

src/libraries/System.Private.CoreLib/src/System/Number.Formatting.cs

@@ -374,7 +374,7 @@ internal static unsafe void DecimalToNumber(ref decimal d, ref NumberBuffer numb
 }
 *dst = (byte)('\0');

- number.CheckConsistency();
+ number.CheckConsistency(skipTrailingZeroCheck: true);


This is because decimal needs to care about the number of trailing zeros, correct?

Actually I'm not uber certain. I see this in the log:

// NumberBuffer for 2060.ToString() 2060: Length 4, Scale 1, ....

I first assumed this would be represented as 2.060*10^3, but it's not. Then there's other cases where Scale is -1. So, for now, I'm turning off the consistency check for Decimal and we can figure out how to turn it back on later

I meant that for decimal, 1000.0 and 1000.00 are represented differently.

> decimal.GetBits(1000.0m)
int[4] { 10000, 0, 0, 65536 }
> decimal.GetBits(1000.00m)
int[4] { 100000, 0, 0, 131072 }

because the number of trailing zeros is actually tracked
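To make the `GetBits` output above easier to read, here is a small sketch (the `DecimalScaleSketch` class and `GetScale` name are made up for illustration) that extracts the scale from the fourth element. Bits 16 to 23 of that element hold the power of ten by which the 96-bit integer is divided, which is why 65536 (1 << 16) means one fractional digit and 131072 (2 << 16) means two.

```csharp
using System;

static class DecimalScaleSketch
{
    // Hypothetical helper: returns the scale (number of fractional digits)
    // stored in bits 16-23 of the flags element from decimal.GetBits.
    public static int GetScale(decimal value) => (decimal.GetBits(value)[3] >> 16) & 0xFF;
}
```

`GetScale(1000.0m)` returns 1 and `GetScale(1000.00m)` returns 2, even though the two values compare equal, so trimming trailing zeros changes what a decimal stores and prints.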

pgovind · 2021-02-17T20:13:07Z

I removed the trailing zero check in CheckConsistency now and opened #48418 to track adding it back

pgovind · 2021-02-18T00:34:43Z

CI failures are unrelated. Merging.

pranavkm · 2021-02-19T22:47:59Z

Blazor (both .NET Core and WASM) has a couple of tests where it attempts to verify that decimal -> string conversions roundtrip without any modifications. With this change, trailing 0s for decimal values are trimmed.

decimal.Parse("0.10")

// before: 0.10
// after: 0.1

Is there a way to retain the current behavior of keeping trailing 0s?
Have you considered treating this as breaking change?
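The Blazor tests themselves are not shown in this thread, so the following is only a guess at the kind of check involved: a hypothetical `RoundTrips` helper that parses a string as a decimal and compares the formatted result with the original text, using the invariant culture.

```csharp
using System;
using System.Globalization;

static class DecimalRoundTripSketch
{
    // Hypothetical check: true when parsing and re-formatting reproduces the input text.
    public static bool RoundTrips(string text)
    {
        decimal parsed = decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
        return parsed.ToString(CultureInfo.InvariantCulture) == text;
    }
}
```

On a runtime that preserves trailing zeros, `RoundTrips("0.10")` is true; with the behavior described above, where the result prints as 0.1, it would be false.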

pranavkm · 2021-02-20T00:19:51Z

Found a more egregious issue:

Console.WriteLine(int.Parse("3.00", NumberStyles.Number, CultureInfo.InvariantCulture));

Before: 3
After:

Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Number.TryParseNumber(Char*& str, Char* strEnd, NumberStyles styles, NumberBuffer& number, NumberFormatInfo info)
   at System.Number.TryStringToNumber(ReadOnlySpan`1 value, NumberStyles styles, NumberBuffer& number, NumberFormatInfo info)
   at System.Number.TryParseInt32Number(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info, Int32& result)
   at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
   at System.Int32.Parse(String s, NumberStyles style, IFormatProvider provider)

pgovind · 2021-02-20T00:28:27Z

I stepped out for something, I'll look at this later tonight. In the meantime, tagging @tannergooding and @jeffhandley since this code change already made it to the P2 snap

jeffhandley · 2021-02-20T01:13:10Z

We will definitely need to fix this exception--that egregious issue is release-blocking for sure. So we'll need to get either a fix for that, or we'll need to revert the change and try again in Preview 3.

The original issue that this stems from is marked as a breaking change, but we need to look at these trailing zeroes cases more closely.

pgovind · 2021-02-20T02:00:30Z

@jeffhandley : Just a note that this PR fixes #46827 which is just a straight bug. I'm looking into the issue now

jeffhandley · 2021-02-20T02:04:32Z

Oh, I see -- I thought it was related to #46874.

pgovind · 2021-02-20T03:12:54Z

Alright, the bug occurs because of this line:

number.DigitsCount = digEnd - numberOfTrailingZeros;

digEnd is 1 and numberOfTrailingZeros = 2, so we end up with -1. The correct value here should be number.DigitsCount = 1. I'll consider a careful fix for this locally and put a PR up.

Currently, this bug only affects int.Parse calls whose input has trailing zeros: 1.0, 1.00, 1.000, 1.0000, 1.0100 etc. are broken, but 1.01, 1.001, 1.0001 etc. are not.
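The arithmetic described above (1 stored digit minus 2 trailing zeros gives -1) can be reproduced in a simplified model. This sketch is not the code from Number.Parsing.cs and not the eventual fix; `TrimmedDigitsCount` and its parameters are hypothetical names. It shows one way to avoid a negative count: only trim zeros that are actually present among the stored digits, and never trim into the integer part.

```csharp
using System;

static class TrailingZeroTrimSketch
{
    // Hypothetical model: `storedDigits` are the digits kept in the buffer and
    // `integerDigits` is how many of them precede the decimal point.
    // Returns the digit count after dropping trailing '0' characters from the
    // fractional part only, so the result can never fall below integerDigits.
    public static int TrimmedDigitsCount(ReadOnlySpan<char> storedDigits, int integerDigits)
    {
        int count = storedDigits.Length;
        while (count > integerDigits && storedDigits[count - 1] == '0')
        {
            count--;
        }
        return count;
    }
}
```

In this model, an input like "3.00" whose fractional zeros were never stored (`storedDigits` is "3", `integerDigits` is 1) keeps a count of 1, while "1.23000" stored as "123000" trims to 3.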

@jeffhandley : Just wondering about the timeline here? After I put a PR up, I'm not sure when it'll get approved and merged :)

jeffhandley · 2021-02-20T04:09:02Z

The fix could land early- to mid-week in the release branch next week. We'll want to get @pranavkm's validation of the fix from his scenarios through a private build and probably from the release branch bits too, with that completed before the end of next week.

pgovind · 2021-02-22T21:14:24Z

> Is there a way to retain the current behavior of keeping trailing 0s?
> Have you considered treating this as breaking change?

#48608 will fix this, so you shouldn't have to do anything @pranavkm

Prashanth Govindarajan added 2 commits January 29, 2021 16:56

Fix parsing of numbers to detect trailing zeros

1cedc7d

sq

28b5598

pgovind added the area-System.Numerics label Jan 30, 2021

pgovind requested a review from tannergooding January 30, 2021 01:45

pgovind commented Jan 30, 2021

Prashanth Govindarajan added 3 commits February 1, 2021 09:33

Typo

d5b75c4

Fix a bug

a6ad2bb

Calculate number of trailing zeros only when number.Digits is updated

Test case should be culture agnostic

5092b84

This was referenced Feb 2, 2021

slicebuffers_success variant tests failing sporadically #47734

Closed

HttpWebRequestTest_Sync.ReadWriteTimeout_CancelsResponse failed in CI mono Linux x64 #47728

Closed

Testing out the new system jaredpar/runfo#71

Open