seq: use BigDecimal to represent floats #2698
There are some remaining issues because parsing and formatting of NaNs and -0.0 are not implemented in Rust versions prior to 1.53.0; see rust-lang/rust#78618. I'm confident that I can make this work with some adjustments, though.
MinRustV is failing (interesting issue :)
I updated this branch with changes that should, I hope, support Rust v1.47.0. The updates in v1.53.0 that allow parsing and displaying floating point negative zero would make the code much simpler, but I suppose that will have to wait!
I have added an additional commit that adds a missing test case: the use of
@sylvestre it looks like the merge commit you added may have broken something. Should I try to fix that?
I will update this pull request to account for the new feature added in pull request #2701.
@@ -60,7 +60,7 @@ fn test_hex_identifier_in_wrong_place() {
         .args(&["1234ABCD0x"])
         .fails()
         .no_stdout()
-        .stderr_contains("invalid hexadecimal argument: '1234ABCD0x'")
+        .stderr_contains("invalid floating point argument: '1234ABCD0x'")
I have changed the error message here (and in the tests above) to match the behavior of GNU seq:
$ seq ff0x
seq: invalid floating point argument: ‘ff0x’
Try 'seq --help' for more information.
This also makes the implementation simpler: anything that does not start with "0x" or "-0x" follows the code path for parsing a base-ten number, which eventually results in a `ParseNumberError::Float`.
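A minimal sketch of the dispatch described above, under stated assumptions: `parse_number` and the `Hex` variant are hypothetical (only `ParseNumberError::Float` is named in the comment), and `f64` stands in for the pull request's `BigDecimal` purely to keep the example short.

```rust
/// Hypothetical error type; only `Float` is named in the discussion above.
#[derive(Debug, PartialEq)]
enum ParseNumberError {
    Float,
    Hex,
}

/// Sketch of the dispatch: only inputs starting with "0x" or "-0x" take the
/// hexadecimal path; everything else is parsed as a base-ten number.
/// Returns `f64` for brevity (the pull request itself uses `BigDecimal`).
fn parse_number(s: &str) -> Result<f64, ParseNumberError> {
    let (negative, unsigned) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    if let Some(hex_digits) = unsigned.strip_prefix("0x") {
        // Integer hex only; hex floats such as "0x1.8p1" are not handled here.
        let magnitude = u64::from_str_radix(hex_digits, 16)
            .map_err(|_| ParseNumberError::Hex)? as f64;
        return Ok(if negative { -magnitude } else { magnitude });
    }
    // Base-ten path: any failure is reported as a floating point error, which
    // is why inputs like "1234ABCD0x" and "ff0x" get the GNU-style message.
    s.parse::<f64>().map_err(|_| ParseNumberError::Float)
}

fn main() {
    for input in &["0x10", "-0x10", "2.5", "ff0x", "1234ABCD0x"] {
        println!("{:?} -> {:?}", input, parse_number(input));
    }
}
```

The design point is that the hex check is a cheap prefix test, so the error category follows directly from which branch was taken.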
Is it expected that it has no impact on the GNU tests?
I don't remember anymore; I'll double-check the GNU tests and report back here.
This pull request will resolve a few of the test cases but not all of them. There are still a few test failures in at least the
Use `BigDecimal` to represent arbitrary precision floats in order to prevent numerical precision issues when iterating over a sequence of numbers. This commit makes several changes at once to accomplish this goal.

First, it creates a new struct, `PreciseNumber`, that is responsible for storing not only the number itself but also the number of digits (both integer and decimal) needed to display it. This information is collected at the time of parsing the number, which lives in the new `numberparse.rs` module.

Second, it uses the `BigDecimal` struct to store arbitrary precision floating point numbers instead of the previous `f64` primitive type. This protects against issues of numerical precision when repeatedly accumulating a very small increment.

Third, since neither the `BigDecimal` nor `BigInt` types have a representation of infinity, minus infinity, minus zero, or NaN, we add the `ExtendedBigDecimal` and `ExtendedBigInt` enumerations which extend the basic types with these concepts.
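To illustrate the first two points without external crates, here is a minimal sketch: a hypothetical fixed-scale `Decimal` (an `i128` plus a scale) stands in for `BigDecimal`, and `PreciseNumberSketch` and `parse_precise` are simplified, hypothetical analogues of `PreciseNumber` and the parser in `numberparse.rs`, not their actual definitions.

```rust
/// Hypothetical fixed-scale decimal standing in for `BigDecimal`:
/// the represented value is `unscaled / 10^scale`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Decimal {
    unscaled: i128,
    scale: u32,
}

impl Decimal {
    /// Exact addition; this sketch only supports operands with equal scale.
    fn add_same_scale(self, other: Decimal) -> Decimal {
        assert_eq!(self.scale, other.scale, "sketch only adds equal scales");
        Decimal { unscaled: self.unscaled + other.unscaled, scale: self.scale }
    }
}

/// Hypothetical, simplified analogue of `PreciseNumber`: the value plus the
/// digit counts recorded at parse time for later display.
#[derive(Debug, PartialEq)]
struct PreciseNumberSketch {
    value: Decimal,
    num_integral_digits: usize,
    num_fractional_digits: usize,
}

/// Parses unsigned base-ten input such as "12.50" (no sign, exponent, or hex;
/// the digits must fit in an `i128`).
fn parse_precise(s: &str) -> Option<PreciseNumberSketch> {
    let (int_part, frac_part) = s.split_once('.').unwrap_or((s, ""));
    let all_digits = int_part
        .bytes()
        .chain(frac_part.bytes())
        .all(|b| b.is_ascii_digit());
    if int_part.is_empty() || !all_digits {
        return None;
    }
    // "12.50" becomes unscaled 1250 with scale 2; trailing zeros are kept
    // because they affect how many fractional digits are displayed.
    let unscaled: i128 = format!("{}{}", int_part, frac_part).parse().ok()?;
    Some(PreciseNumberSketch {
        value: Decimal { unscaled, scale: frac_part.len() as u32 },
        num_integral_digits: int_part.len(),
        num_fractional_digits: frac_part.len(),
    })
}

fn main() {
    // Repeatedly adding 0.1 as f64 accumulates binary rounding error...
    let mut float_sum = 0.0f64;
    for _ in 0..10 {
        float_sum += 0.1;
    }
    println!("f64 sum: {}", float_sum); // 0.9999999999999999

    // ...while the decimal representation stays exact.
    let step = parse_precise("0.1").unwrap().value;
    let mut decimal_sum = Decimal { unscaled: 0, scale: 1 };
    for _ in 0..10 {
        decimal_sum = decimal_sum.add_same_scale(step);
    }
    println!("decimal sum: {} / 10^{}", decimal_sum.unscaled, decimal_sum.scale);
}
```

A real arbitrary-precision type removes the `i128` range limit and the equal-scale restriction; the sketch only shows why decimal storage avoids drift.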
3379817
to
c2c2622
Compare
I like this change; just a few comments. Have you done any benchmarks to check what performance looks like? I'd obviously expect `BigDecimal` to be much slower than `f64`, but I'm interested in the effect of wrapping `BigInt` in `ExtendedBigInt`.
(Self::BigDecimal(_), Self::MinusInfinity) => false,
(Self::BigDecimal(_), Self::Infinity) => false,
(Self::BigDecimal(_), Self::Nan) => false,
(Self::BigDecimal(_), Self::MinusZero) => false,
I think positive 0 and negative 0 should compare equal.
I'll make that change.
Oh, actually, this is intentionally `false`.

The reason I am representing `MinusZero` as a distinct concept at all is that if `-0.0` is given in the input as the starting number, it must be displayed as `-0.0` in the output. Because `-0.0` gets rendered as `"0.0"` in older versions of Rust, we need to know whether the value is `ExtendedBigDecimal::MinusZero` or `ExtendedBigDecimal::BigDecimal(BigDecimal::zero())`. The equality comparison happens when we decide how to render the current value in `write_value_float()`:
if *value == ExtendedBigDecimal::MinusZero && is_first_iteration {
If `BigDecimal::zero()` and `MinusZero` were to compare equal, then `write_value_float()` would write `"-0.0"` when it should be writing `"0.0"`. For example,
$ ./target/debug/coreutils seq -w 0.0 1
-0.0
1.0
We don't want that, so we need to ensure that negative zero and positive zero are considered different things.
When the minimum Rust version becomes large enough, we may be able to eliminate much of the special handling code for negative zero.
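The reasoning above can be sketched with a cut-down enum. `ExtendedDecimalSketch` and `render` are hypothetical names, and `Finite` holds tenths in an `i64` instead of a `BigDecimal`; the point is only that a hand-written `PartialEq` keeps `MinusZero` distinct from a finite zero, so the first-iteration check fires only for a literal `-0.0` start value.

```rust
/// Hypothetical, cut-down analogue of `ExtendedBigDecimal`. `Finite` holds a
/// value in tenths (Finite(15) is 1.5) as a stand-in for `BigDecimal`.
#[derive(Debug, Clone)]
enum ExtendedDecimalSketch {
    Finite(i64),
    Infinity,
    MinusInfinity,
    MinusZero,
    Nan,
}

impl PartialEq for ExtendedDecimalSketch {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::Finite(a), Self::Finite(b)) => a == b,
            (Self::Infinity, Self::Infinity)
            | (Self::MinusInfinity, Self::MinusInfinity)
            | (Self::MinusZero, Self::MinusZero) => true,
            // Everything else is unequal: NaN never equals anything (itself
            // included), and MinusZero deliberately differs from Finite(0).
            _ => false,
        }
    }
}

/// Renders a value with one fractional digit. Mirrors the check quoted above:
/// only a `-0.0` start value (first iteration) keeps its minus sign.
fn render(value: &ExtendedDecimalSketch, is_first_iteration: bool) -> String {
    if *value == ExtendedDecimalSketch::MinusZero && is_first_iteration {
        return "-0.0".to_string();
    }
    match value {
        ExtendedDecimalSketch::Finite(tenths) => {
            let sign = if *tenths < 0 { "-" } else { "" };
            let abs = tenths.unsigned_abs();
            format!("{}{}.{}", sign, abs / 10, abs % 10)
        }
        ExtendedDecimalSketch::MinusZero => "0.0".to_string(),
        ExtendedDecimalSketch::Infinity => "inf".to_string(),
        ExtendedDecimalSketch::MinusInfinity => "-inf".to_string(),
        ExtendedDecimalSketch::Nan => "nan".to_string(),
    }
}

fn main() {
    // A start value parsed from "-0.0" versus one parsed from "0.0".
    println!("{}", render(&ExtendedDecimalSketch::MinusZero, true)); // -0.0
    println!("{}", render(&ExtendedDecimalSketch::Finite(0), true)); // 0.0
}
```

If the `(MinusZero, Finite(0))` case returned `true`, the second call would also print `-0.0`, which is the bug described in the comment.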
(Self::MinusInfinity, Self::MinusInfinity) => true,
(Self::MinusInfinity, Self::Nan) => false,
(Self::Nan, _) => false,
(Self::MinusZero, Self::BigDecimal(_)) => false,
Same here.
// zero. See
// https://github.com/rust-lang/rust/pull/78618. Currently,
// this just formats "0.0".
(0.0f32).fmt(f)
So we want to change this to `(-0.0f32).fmt(f)` at a later point?
Yes, I believe so.
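For reference, a minimal sketch of that eventual change, assuming a toolchain of Rust 1.53 or newer (the version the first comment ties to rust-lang/rust#78618). `MinusZeroSketch` is a hypothetical stand-in for the `MinusZero` arm of the `Display` impl.

```rust
use std::fmt;

/// Hypothetical stand-in for the `MinusZero` arm of the `Display` impl.
struct MinusZeroSketch;

impl fmt::Display for MinusZeroSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let negative_zero: f32 = -0.0;
        // Delegating passes the caller's formatting options (e.g. precision)
        // through; on Rust 1.53 and later, Display keeps the minus sign.
        fmt::Display::fmt(&negative_zero, f)
    }
}

fn main() {
    println!("{}", MinusZeroSketch); // -0
    println!("{:.1}", MinusZeroSketch); // -0.0
    // Negative zero still compares equal to positive zero as a float,
    // so its sign has to be checked explicitly.
    let negative_zero: f64 = -0.0;
    println!("{} {}", negative_zero == 0.0, negative_zero.is_sign_negative());
}
```

Calling `fmt::Display::fmt` explicitly avoids ambiguity with the other formatting traits that `f32` implements.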
write!(writer, "{}", value_as_str)
}

// TODO `print_seq()` and `print_seq_integers()` are nearly identical,
Maybe we could create a trait that both `ExtendedBigInt` and `ExtendedBigDecimal` implement and make `write_value` generic, but we can do this at a later point.
Yes, definitely. If this pull request gets merged, I propose we open a new issue for that change. This diff is already larger than I usually like to produce.
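One possible shape for that follow-up, as a hedged sketch: the `SeqValue` trait, `IntValue`, `DecimalValue`, and this `write_value` signature are all hypothetical and simplified (finite values plus negative zero only), not the code in this pull request.

```rust
use std::fmt;
use std::io::{self, Write};

/// Hypothetical trait both extended number types could implement.
trait SeqValue: fmt::Display {
    /// True for a negative zero whose sign must survive on the first line.
    fn is_minus_zero(&self) -> bool {
        false
    }
}

/// Stand-in for `ExtendedBigInt` (finite values only).
struct IntValue(i64);

impl fmt::Display for IntValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl SeqValue for IntValue {}

/// Stand-in for `ExtendedBigDecimal`: tenths, plus an explicit negative zero.
enum DecimalValue {
    Tenths(i64),
    MinusZero,
}

impl fmt::Display for DecimalValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecimalValue::Tenths(t) => {
                let sign = if *t < 0 { "-" } else { "" };
                let abs = t.unsigned_abs();
                write!(f, "{}{}.{}", sign, abs / 10, abs % 10)
            }
            // The sign is added by `write_value`, mirroring the thread above.
            DecimalValue::MinusZero => write!(f, "0.0"),
        }
    }
}

impl SeqValue for DecimalValue {
    fn is_minus_zero(&self) -> bool {
        matches!(self, DecimalValue::MinusZero)
    }
}

/// One generic writer instead of two near-identical ones. `width` is the
/// zero-padded width used for `-w`; a minus sign stays in front of the zeros.
fn write_value<W: Write, T: SeqValue>(
    writer: &mut W,
    value: &T,
    width: usize,
    is_first_iteration: bool,
) -> io::Result<()> {
    let mut text = value.to_string();
    if value.is_minus_zero() && is_first_iteration {
        text.insert(0, '-');
    }
    if text.len() < width {
        let zeros = "0".repeat(width - text.len());
        text = match text.strip_prefix('-') {
            Some(rest) => format!("-{}{}", zeros, rest),
            None => format!("{}{}", zeros, text),
        };
    }
    write!(writer, "{}", text)
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    write_value(&mut out, &IntValue(7), 3, false)?;
    writeln!(out)?;
    write_value(&mut out, &DecimalValue::MinusZero, 4, true)?;
    writeln!(out)?;
    Ok(())
}
```

Putting the negative-zero hook on the trait with a default `false` means the integer type needs no special handling at all.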
I have not yet done any benchmarks. I can work on that next.
I have made the changes suggested by @miDeb, and provided a response to the suggestion about negative zero and positive zero comparing equal. Since it came up, I have added a simple Finally, here's a quick comparison between the GNU version of When enumerating the floating point numbers between 0 and 100,000 by increments of 0.1, the
When enumerating the integers between 0 and 2,000,000, the GNU version is fastest, the
My intent in making this pull request was to ensure correctness, not efficiency. A future pull request could add optimizations. For example, in the common case of enumerating small positive integers, much simpler code could be used.
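As an illustration of such a fast path, here is a hypothetical sketch (the name `print_small_integers` and its signature are assumptions, and nothing here was benchmarked). It relies on `u64` arithmetic being exact for non-negative integers, which is the property that makes `BigDecimal` unnecessary in this case.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical fast path: when the first value, increment, and last value
/// are all non-negative integers that fit in `u64`, machine arithmetic is
/// exact, so no arbitrary-precision type is needed. The caller is assumed to
/// have rejected a zero increment already.
fn print_small_integers<W: Write>(
    writer: &mut W,
    first: u64,
    increment: u64,
    last: u64,
    separator: &str,
) -> io::Result<()> {
    assert!(increment > 0, "zero increment must be rejected by the caller");
    let mut value = first;
    let mut printed_any = false;
    while value <= last {
        if printed_any {
            writer.write_all(separator.as_bytes())?;
        }
        write!(writer, "{}", value)?;
        printed_any = true;
        // Stop instead of overflowing when the next value would exceed u64::MAX.
        match value.checked_add(increment) {
            Some(next) => value = next,
            None => break,
        }
    }
    if printed_any {
        writer.write_all(b"\n")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Rust's stdout is line-buffered; wrapping the lock in a BufWriter
    // avoids a flush for every printed line.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    print_small_integers(&mut out, 1, 1, 5, "\n")?;
    out.flush()
}
```

Using `checked_add` keeps the loop terminating even when `last` is `u64::MAX`.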
Thanks for the very detailed response! I agree that we can do some optimizations later. I think my main concern is that wrapping the