Fix parsing of trailing zeros: TryParseNumber #47666

Merged
merged 14 commits into from
Feb 18, 2021


pgovind (Contributor) commented Jan 30, 2021

Fixes #46827

We currently don't discard the trailing zeros in the fractional part of a number while parsing. As a result, we sometimes round a floating-point number incorrectly. This PR fixes that. I've also updated the CheckConsistency method to detect trailing zeros.

ghost commented Jan 30, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: pgovind
Assignees: -
Labels: area-System.Numerics

Milestone: -

danmoseley (Member)

Do we have perf tests.. can you get numbers?

Prashanth Govindarajan added 3 commits February 1, 2021 09:33
Calculate number of trailing zeros only when number.Digits is updated
@@ -158,7 +158,7 @@ public static void WriteInteger_SingleValue_HappyPath(string valueString, string
[InlineData("1", "c4820001")]
[InlineData("-1", "c4820020")]
[InlineData("1.1", "c482200b")]
[InlineData("1.000", "c482221903e8")]
Member

I'm not familiar with CBOR. Do we know why this changed in size?

Contributor Author

Nor am I. I figured that since we now parse correctly and return 1, I could copy the expected value from the test case for 1 two lines above. I can go back and check what value we return without the bug fix, but I'm pretty sure it is wrong somehow and this PR fixes it.

Member

It might be good to get confirmation.

The number of hex digits implies the old value was longer than 32 bits, but there also seems to be some general encoding (compression?) here, so it's unclear whether the change is actually correct (I would guess it is, though).

Contributor Author

@eiriktsarpalis for thoughts

Member

The original CBOR encodes 1000e-3, while the new one is 1e0, so I presume this is in line with the intended change in the parser.
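The two expected values can be checked by hand. CBOR tag 4 (0xC4) is a decimal fraction: a two-element array [exponent, mantissa] (RFC 8949). The C# sketch below decodes only the small subset these test vectors use; `DecodeDecimalFraction` and `ReadInt` are hypothetical helpers written for illustration, not the System.Formats.Cbor API. It also suggests why the size changed: 1000e-3 needs a 2-byte argument for the mantissa, while 1e0 fits in single initial bytes.

```csharp
using System;

// Hypothetical decoder for the subset these vectors use (an assumption of this
// sketch): tag 4 (0xC4), a 2-element array (0x82), then two small integers.
(long Exponent, long Mantissa) DecodeDecimalFraction(string hex)
{
    byte[] b = Convert.FromHexString(hex);
    int pos = 0;
    if (b[pos++] != 0xC4) throw new FormatException("expected tag 4 (decimal fraction)");
    if (b[pos++] != 0x82) throw new FormatException("expected a 2-element array");
    long exponent = ReadInt(b, ref pos);
    long mantissa = ReadInt(b, ref pos);
    return (exponent, mantissa);
}

// Reads a CBOR unsigned (major type 0) or negative (major type 1) integer whose
// argument is stored inline or in the following 1 or 2 big-endian bytes.
long ReadInt(byte[] b, ref int pos)
{
    int major = b[pos] >> 5;      // top 3 bits: major type
    int info = b[pos++] & 0x1F;   // low 5 bits: additional information
    long arg = info switch
    {
        < 24 => info,                        // value stored in the initial byte
        24 => b[pos++],                      // 1-byte argument follows
        25 => (b[pos++] << 8) | b[pos++],    // 2-byte argument follows
        _ => throw new NotSupportedException("argument too large for this sketch"),
    };
    return major switch
    {
        0 => arg,
        1 => -1 - arg,                       // CBOR negative integers encode -1 - n
        _ => throw new FormatException("expected an integer"),
    };
}

foreach (string hex in new[] { "c482221903e8", "c4820001", "c482200b" })
{
    var (exponent, mantissa) = DecodeDecimalFraction(hex);
    Console.WriteLine($"{hex} ({hex.Length / 2} bytes): {mantissa}e{exponent}");
}
```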

pgovind (Contributor Author) commented Feb 3, 2021

> Do we have perf tests.. can you get numbers?

I'm reasonably confident that CI will be green now, so I got some perf numbers:

summary:
worse: 6, geomean: 1.126
total diff: 6

| Slower                                                  | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Tests.Perf_Decimal.Mod                           |      1.21 |            13.13 |            15.88 |         |
| System.Tests.Perf_Decimal.Parse(value: "123456.789")    |      1.14 |            79.49 |            90.56 | several?|
| System.Tests.Perf_Decimal.Floor                         |      1.14 |            11.65 |            13.23 | bimodal |
| System.Tests.Perf_Decimal.TryParse(value: "123456.789") |      1.11 |            80.55 |            89.61 | several?|
| System.Tests.Perf_Decimal.Divide                        |      1.11 |            66.90 |            74.11 |         |
| System.Tests.Perf_Decimal.ToString(value: 123456.789)   |      1.06 |            73.23 |            77.61 | several?|

No Faster results for the provided threshold = 0.001% and noise filter = 0.3ns.

summary:
better: 4, geomean: 1.034
worse: 1, geomean: 1.027
total diff: 5

| Slower                                                           | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Tests.Perf_Double.Parse(value: "1.7976931348623157e+308") |      1.03 |           436.96 |           448.71 |         |

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Tests.Perf_Double.ToStringWithCultureInfo(value: 12345, culture: zh)      |      1.08 |           153.69 |           142.69 | bimodal |
| System.Tests.Perf_Double.Parse(value: "12345")                                   |      1.02 |            66.27 |            64.96 |         |
| System.Tests.Perf_Double.ToString(value: 12345)                                  |      1.02 |           158.42 |           155.29 | several?|
| System.Tests.Perf_Double.ToStringWithFormat(value: -1.7976931348623157E+308, for |      1.02 |           277.30 |           271.97 | several?|


I'm a little skeptical about these perf numbers. This PR touches only the Parse and TryParse methods, yet other benchmarks also show changes that I can't explain. For example, ToString and ToStringWithFormat are untouched by this PR. All things considered, we're likely slowing down Parse and TryParse slightly, but I'd still take the PR since it fixes a bug.

Prashanth Govindarajan added 2 commits February 10, 2021 10:37
int numberOfTrailingZeros = 0;
for (int i = DigitsCount - (int)fractionalDigitsPresent; i < DigitsCount; i++)
// For a number like 1.23000, verify that we don't store trailing zeros in Digits
// However, if the number of digits exceeds maxDigCount and rounding is required, we store the trailing zeros in the buffer.
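A simplified model of the rule in these comments, as a C# sketch: count trailing zeros only over the stored fractional digits (the same range the for loop walks) and drop them unless some digits were discarded. `Scan` and its tuple result are hypothetical; the buffer is a plain StringBuilder rather than the runtime's NumberBuffer, and the input is assumed to be an unsigned literal with a nonzero leading digit.

```csharp
using System;
using System.Text;

// Hypothetical digit buffer: keeps at most maxDigits digits; further digits are
// dropped (a real parser would still use them for rounding).
(string Digits, int IntegerDigits, bool Truncated) Scan(string s, int maxDigits)
{
    var digits = new StringBuilder();
    int integerDigits = 0;     // digits before the decimal point
    int fractionalStored = 0;  // fractional digits that made it into the buffer
    bool seenDot = false, truncated = false;
    foreach (char c in s)
    {
        if (c == '.') { seenDot = true; continue; }
        if (!seenDot) integerDigits++;
        if (digits.Length < maxDigits)
        {
            digits.Append(c);
            if (seenDot) fractionalStored++;
        }
        else
        {
            truncated = true;
        }
    }

    // Only stored fractional digits are candidates (same range as the PR's loop).
    // Trim them only when nothing was dropped; otherwise keep them for rounding.
    if (!truncated)
    {
        int trailingZeros = 0;
        for (int i = digits.Length - 1; i >= digits.Length - fractionalStored && digits[i] == '0'; i--)
            trailingZeros++;
        digits.Length -= trailingZeros;
    }
    return (digits.ToString(), integerDigits, truncated);
}

Console.WriteLine(Scan("1.23000", 10));
Console.WriteLine(Scan("1.23000", 4));
```

Zeros in the integer part (as in 2060) are never touched, because only fractional positions are scanned.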
Member

Would it require much more complex changes to trim in this case as well?

I don't remember whether keeping the zeros has a "cost" when dealing with the internal BigInteger (I think it does), so it might be better to trim and set HasNonZeroTail correctly, if that doesn't require more complex changes.

Member

If it does require more changes, we can just log it as an "investigate later" issue.

Contributor Author

I started investigating this a little last week. It looks like the number of fractional digits present is used to calculate a fastExponent, which in turn determines whether we use the fast or slow path to convert a number to float/double. Also, HasNonZeroTail is currently used only in the slow path. For this change I decided to be conservative, but yes, I'll log an "investigate later" issue. It would be something to work on in December at the earliest.
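To illustrate how trailing zeros can affect a fast/slow path decision, here is a C# sketch of the textbook Clinger-style fast path, not the runtime's actual code: with at most 15 significant digits and a power-of-ten exponent of magnitude at most 22, a single multiply or divide gives the correctly rounded double. `Split` and `TryFastPath` are hypothetical helpers, and the limits the runtime actually uses (including how it computes fastExponent) may differ.

```csharp
using System;
using System.Globalization;

// Splits an unsigned plain literal such as "1.25000" (no sign, no exponent part:
// assumptions of this sketch) into significant digits and a base-10 exponent,
// optionally dropping trailing zeros of the fractional part.
(string Digits, int Exponent) Split(string s, bool trimTrailingZeros)
{
    int dot = s.IndexOf('.');
    string intPart = dot < 0 ? s : s.Substring(0, dot);
    string fracPart = dot < 0 ? "" : s.Substring(dot + 1);
    if (trimTrailingZeros) fracPart = fracPart.TrimEnd('0');
    string digits = (intPart + fracPart).TrimStart('0');
    return (digits.Length == 0 ? "0" : digits, -fracPart.Length);
}

// Textbook fast path: <= 15 digits are exact in a double, and 10^k is exact for
// k <= 22, so one correctly rounded multiply or divide gives the final result.
bool TryFastPath(string digits, int exponent, out double value)
{
    value = 0;
    if (digits.Length > 15 || Math.Abs(exponent) > 22) return false;
    double significand = ulong.Parse(digits, CultureInfo.InvariantCulture);
    double power = 1.0;
    for (int i = 0; i < Math.Abs(exponent); i++) power *= 10.0; // exact up to 1e22
    value = exponent < 0 ? significand / power : significand * power;
    return true;
}

string padded = "1.25" + new string('0', 13);
foreach (bool trim in new[] { false, true })
{
    var (digits, exponent) = Split(padded, trim);
    bool fast = TryFastPath(digits, exponent, out double value);
    Console.WriteLine($"trim={trim}: digits={digits}, exponent={exponent}, fast path={fast}, value={value}");
}
```

With 13 extra zeros the untrimmed input has 16 significant digits and misses the fast path; trimmed, it becomes 125 × 10^-2 and qualifies.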

ghost commented Feb 12, 2021

Hello @pgovind!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@@ -374,7 +374,7 @@ internal static unsafe void DecimalToNumber(ref decimal d, ref NumberBuffer numb
}
*dst = (byte)('\0');

number.CheckConsistency();
number.CheckConsistency(skipTrailingZeroCheck: true);
Member

This is because decimal needs to care about the number of trailing zeros, correct?

pgovind (Contributor Author) commented Feb 17, 2021

Actually, I'm not entirely certain. I see this in the log:

// NumberBuffer for 2060.ToString()
2060: Length 4, Scale 1, ....

I first assumed this would be represented as 2.060*10^3, but it isn't. There are also other cases where Scale is -1. So for now I'm turning off the consistency check for decimal, and we can figure out how to turn it back on later.

Member

I meant that for decimal, 1000.0 and 1000.00 are represented differently.

> decimal.GetBits(1000.0m)
int[4] { 10000, 0, 0, 65536 }
> decimal.GetBits(1000.00m)
int[4] { 100000, 0, 0, 131072 }

because the number of trailing zeros is actually tracked.
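The representation difference can be inspected directly. Per the decimal.GetBits documentation, the first three elements hold the 96-bit integer and bits 16-23 of the fourth element hold the scale, so 65536 and 131072 correspond to scales 1 and 2. A minimal C# sketch:

```csharp
using System;
using System.Globalization;

decimal tenths = 1000.0m;
decimal hundredths = 1000.00m;
foreach (decimal d in new[] { tenths, hundredths })
{
    int[] bits = decimal.GetBits(d);
    // bits[0..2]: 96-bit integer; bits 16-23 of bits[3]: scale (power-of-ten divisor).
    int scale = (bits[3] >> 16) & 0xFF;
    Console.WriteLine($"{d.ToString(CultureInfo.InvariantCulture)}: integer={bits[0]}, scale={scale}");
}
// The values are numerically equal even though their scales differ.
Console.WriteLine(tenths == hundredths);
```

Formatting preserves the scale, which is the information a parser that trims trailing zeros would lose for decimal.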

pgovind (Contributor Author) commented Feb 17, 2021

I've now removed the trailing zero check from CheckConsistency and opened #48418 to track adding it back.

pgovind (Contributor Author) commented Feb 18, 2021

CI failures are unrelated. Merging.

@pgovind pgovind merged commit 47f8173 into dotnet:master Feb 18, 2021
pranavkm (Contributor) commented Feb 19, 2021

Blazor (both .NET Core and WASM) has a couple of tests that verify decimal-to-string conversions round-trip without modification. With this change, trailing zeros in decimal values are trimmed:

decimal.Parse("0.10")

// before: 0.10
// after: 0.1
  • Is there a way to retain the current behavior of keeping trailing 0s?
  • Have you considered treating this as a breaking change?
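A round-trip check in the spirit of the Blazor tests described above could look like the following C# sketch (`RoundTrips` is a hypothetical helper). On a runtime whose decimal parsing keeps the scale, each input prints True; with the behavior reported here, "0.10" would come back as "0.1".

```csharp
using System;
using System.Globalization;

// Parse, format back with the same culture, and compare the text exactly.
bool RoundTrips(string s)
{
    decimal d = decimal.Parse(s, CultureInfo.InvariantCulture);
    return d.ToString(CultureInfo.InvariantCulture) == s;
}

foreach (string s in new[] { "0.10", "1.000", "2060" })
    Console.WriteLine($"{s}: round-trips = {RoundTrips(s)}");
```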

pranavkm (Contributor)

Found a more egregious issue:

Console.WriteLine(int.Parse("3.00", NumberStyles.Number, CultureInfo.InvariantCulture));

Before: 3
After:

Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Number.TryParseNumber(Char*& str, Char* strEnd, NumberStyles styles, NumberBuffer& number, NumberFormatInfo info)
   at System.Number.TryStringToNumber(ReadOnlySpan`1 value, NumberStyles styles, NumberBuffer& number, NumberFormatInfo info)
   at System.Number.TryParseInt32Number(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info, Int32& result)
   at System.Number.ParseInt32(ReadOnlySpan`1 value, NumberStyles styles, NumberFormatInfo info)
   at System.Int32.Parse(String s, NumberStyles style, IFormatProvider provider)
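A small regression-style check for this crash, sketched in C# (`Describe` is a hypothetical helper). With NumberStyles.Number, integer parsing should return 3 for "3.00" and, as documented for Int32.Parse, throw OverflowException when the fractional digits are non-zero; an IndexOutOfRangeException escaping here would indicate the bug. The inputs mirror the cases listed later in this thread.

```csharp
using System;
using System.Globalization;

// Returns the parsed value as text, or the name of the expected exception.
string Describe(string s)
{
    try
    {
        return int.Parse(s, NumberStyles.Number, CultureInfo.InvariantCulture)
            .ToString(CultureInfo.InvariantCulture);
    }
    catch (OverflowException)
    {
        // Documented outcome for integer parsing when fractional digits are non-zero.
        return nameof(OverflowException);
    }
}

foreach (string s in new[] { "3.00", "1.0", "1.0000", "1.0100", "1.01" })
    Console.WriteLine($"{s} -> {Describe(s)}");
```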

pgovind (Contributor Author) commented Feb 20, 2021

I stepped out for something; I'll look at this later tonight. In the meantime, tagging @tannergooding and @jeffhandley, since this change already made it into the P2 snap.

jeffhandley (Member)

We will definitely need to fix this exception; that egregious issue is release-blocking for sure. So we'll need either a fix for it, or we'll need to revert the change and try again in Preview 3.

The original issue this stems from is marked as a breaking change, but we need to look at these trailing-zero cases more closely.

pgovind (Contributor Author) commented Feb 20, 2021

@jeffhandley: Just a note that this PR fixes #46827, which is a straightforward bug. I'm looking into the issue now.

jeffhandley (Member)

Oh, I see -- I thought it was related to #46874.

pgovind (Contributor Author) commented Feb 20, 2021

Alright, the bug occurs because of this line:

number.DigitsCount = digEnd - numberOfTrailingZeros;

digEnd is 1 and numberOfTrailingZeros is 2, so DigitsCount ends up as -1. The correct value here is number.DigitsCount = 1. I'll work out a careful fix locally and put up a PR.

Currently, this bug only affects int.Parse calls whose input has trailing zeros: 1.0, 1.00, 1.000, 1.0000, 1.0100, etc. are broken, but 1.01, 1.001, 1.0001, etc. are not.
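The arithmetic can be isolated in a short C# sketch. It assumes, as digEnd = 1 suggests, that only the integer digit "3" was stored for "3.00" while both fractional zeros were still counted. `NaiveDigitsCount` mirrors the quoted line; `GuardedDigitsCount` is a hypothetical alternative that recounts zeros only among the stored fractional digits, not necessarily the approach taken in #48608.

```csharp
using System;

// The quoted subtraction, in isolation.
int NaiveDigitsCount(int digEnd, int numberOfTrailingZeros) => digEnd - numberOfTrailingZeros;

// Hypothetical guarded variant: trim only zeros that are actually present among
// the stored fractional digits, so the count never drops below the integer digits.
int GuardedDigitsCount(string storedDigits, int storedFractionalDigits)
{
    int count = storedDigits.Length;
    int integerDigits = storedDigits.Length - storedFractionalDigits;
    while (count > integerDigits && storedDigits[count - 1] == '0')
        count--;
    return count;
}

// "3.00" parsed as an integer: assume "3" was stored but two zeros were counted.
Console.WriteLine(NaiveDigitsCount(digEnd: 1, numberOfTrailingZeros: 2));     // prints -1
Console.WriteLine(GuardedDigitsCount("3", storedFractionalDigits: 0));        // prints 1
```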

@jeffhandley: Just wondering about the timeline here. After I put up a PR, I'm not sure when it'll get approved and merged :)

jeffhandley (Member)

The fix could land in the release branch early to mid-week next week. We'll want @pranavkm to validate the fix against his scenarios through a private build, and probably on the release-branch bits too, with that completed before the end of next week.

pgovind (Contributor Author) commented Feb 22, 2021

> Is there a way to retain the current behavior of keeping trailing 0s?
> Have you considered treating this as a breaking change?

#48608 will fix this, so you shouldn't have to do anything, @pranavkm.

Successfully merging this pull request may close these issues.

Double.Parse rounding bug when there are trailing zeroes in the input string
6 participants