Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tail: fix argument parsing of sleep interval #4239

Merged

Conversation

Joining7943
Copy link
Contributor

Like discussed, I've extracted this from the refactoring pr of tail, which fixes the parsing of the sleep interval.

The main intend of this pr is to match the behavior of gnu's tail and properly parse a string in f64 format to a Duration.

Short summary:

  • The sleep interval is a f64 instead of a f32
  • Duration::from_secs_f64 panics and Duration::try_from_secs_f64 is not stable, so we need a separate function to accomplish the conversion from f64 to a Duration.

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from 1e5b081 to faae8b8 Compare December 16, 2022 22:50
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from faae8b8 to bf767e2 Compare December 17, 2022 20:40
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/misc/timeout is no longer failing!
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

let trimmed = src.trim();
match src.parse::<f64>() {
// We're working here with the string representation instead of num.fract() to avoid further
// floating point precision problems where we can.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting to a string is lossy too. Are you sure this is worth it? We could also try to copy the unstable implementation in std.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting to a string is lossy too.

It's the multiplication with 1_000_000_000 which causes further imprecision. Some of the tests fail with the multiplication. No sure which one it was, but I think something like 1.999999999 resulted in Duration::new(1, 999_999_998). I'm trying to be as precise as possible to make the parsing more predictable.

Are you sure this is worth it?

Dunno. Sure it's slower than the the multiplication but parsing the sleep interval is a one time action and the string length of the f64 parsed src is also managable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd imagine that conversion to a string would lead to inaccuracies in different places, but I don't have evidence for that. Let's keep this, but add a link to the tracking issue for try_from_secs_f64, so we can periodically check whether it has been stabilized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, it's been stabilized in 1.66! Let's put a Replace with Duration::try_from_secs_f64 once we hit MSRV 1.66 here in a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I've added a comment.

},
Ok(num)
if num.is_infinite()
&& (trimmed.eq_ignore_ascii_case("inf")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is to distinguish positive infinity from negative? Can't we do that with is_sign_negative too if we check that first? In any case, could you document what this check is for?

Copy link
Contributor Author

@Joining7943 Joining7943 Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is to distinguish positive infinity from negative

Not only positive infinity from negative, but also if the infinity is caused by a src number greater than f64::MAX or if inf or similar was given as src. gnu's tail fails with an error in the first case. I'm describing the problem in the comment below these lines. Maybe I should that comment move up.

Can't we do that with is_sign_negative

If we deviate from gnu's tail, sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, I assumed it would be Err if it was more than f64::MAX, but that makes sense. If you could add a quick comment explaining that in the code that would be great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// positive infinite (if src > f64::MAX) or NaN. In case of positive infinite we need to
// match gnu's tail behavior and return an error although it may be a nice feature to
// interpret this result as a valid (maximum) Duration.
Ok(_) => Err(String::from("Not a number")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting Duration::MAX makes sense to me here if it is a valid positive number, but that's a bit tricky to check (especially with an exponent). So cool idea!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting Duration::MAX makes sense to me here if it is a valid positive number

Ok, I think checking for sign_positive in the match arm

Ok(num) if num.is_infinite() ...

should do the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed this because of your comment above.

Oh I see, I assumed it would be Err if it was more than f64::MAX, but that makes sense. If you could add a quick comment explaining that in the code that would be great!

@@ -524,4 +590,136 @@ mod tests {
assert!(result.is_ok());
assert_eq!(result.unwrap(), Signum::Negative(1));
}

#[test]
fn test_parse_duration_when_simple_arguments_are_valid() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent tests!

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from bf767e2 to ea29b96 Compare December 19, 2022 20:17
@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from ea29b96 to 1ce6f51 Compare December 24, 2022 00:50
@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!

@jfinkels
Copy link
Collaborator

How does this parse_duration() function relate to uucore::parse_time::from_str()?

/// Parse a duration from a string.
///
/// The string may contain only a number, like "123" or "4.5", or it
/// may contain a number with a unit specifier, like "123s" meaning
/// one hundred twenty three seconds or "4.5d" meaning four and a half
/// days. If no unit is specified, the unit is assumed to be seconds.
///
/// The only allowed suffixes are
///
/// * "s" for seconds,
/// * "m" for minutes,
/// * "h" for hours,
/// * "d" for days.
///
/// This function uses [`Duration::saturating_mul`] to compute the
/// number of seconds, so it does not overflow. If overflow would have
/// occurred, [`Duration::MAX`] is returned instead.
///
/// # Errors
///
/// This function returns an error if the input string is empty, the
/// input is not a valid number, or the unit specifier is invalid or
/// unknown.
///
/// # Examples
///
/// ```rust
/// use std::time::Duration;
/// use uucore::parse_time::from_str;
/// assert_eq!(from_str("123"), Ok(Duration::from_secs(123)));
/// assert_eq!(from_str("2d"), Ok(Duration::from_secs(60 * 60 * 24 * 2)));
/// ```
pub fn from_str(string: &str) -> Result<Duration, String> {

@Joining7943
Copy link
Contributor Author

Joining7943 commented Dec 25, 2022

Not sure what you mean, but they don't relate I think. This function tries to provide the same functionality like gnu's tail parsing of the --sleep-interval flag. So, there are for example no units like s, ms etc. allowed and if the number on the command line overflows f64 there's an error. That's the main difference to the uucore::parse_time::from_str function as far as I can see.

@sylvestre sylvestre force-pushed the tail-fix-parsing-of-sleep-interval branch from 1ce6f51 to fde84f9 Compare December 26, 2022 09:23
@Joining7943
Copy link
Contributor Author

Please do not merge. I'm just trying something, to get a more performant, precise and unlossy version of the parse_duration method.

@tertsdiepraam tertsdiepraam marked this pull request as draft December 27, 2022 17:16
@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from fde84f9 to 4853796 Compare December 31, 2022 13:36
@Joining7943
Copy link
Contributor Author

Joining7943 commented Dec 31, 2022

@tertsdiepraam I've completely rewritten the parse_duration method and moved it because of its size into the parse module of tail.

This duration parser is almost as fast as Duration::from_secs_f64(input.parse::<f64>().unwrap), has the same grammar like a f64 parser but is not lossy (due to rounding and lack of floating point precision) by storing the exact digit representation of the input string. parse_duration is around 2 - 5 times slower depending on the size of the input string but still operates in the nano seconds domain and takes around 100 - 300 ns for normal input like 1.0e2 on my system. More extreme input like format!("{}.{}e-1022", "1".repeat(2000), "9".repeat(2000)) which can't even parsed without errors by the Duration::from_secs_f64 method still ran in around 5 microseconds on my quadcore system.

If the seconds overflow u64::MAX, the duration is interpreted as Duration::MAX, which is also an improvement over gnu's tail, I think. Imho we don't need to replace the parse_duration method with Duration::try_from_secs_f64 anymore. What do you think?

@tertsdiepraam
Copy link
Member

I'll have to look closely into the code later, but it sounds great! It does add significant complexity to the code over a single function call, but the advantages you list might be worth it.

Unrelated to the actual change, but only to the tests (which I only quickly skimmed), I'm not sure I like using rstest for creating multiple test cases. Personally, I think a for loop is easier to read and more approachable for new contributors.

@Joining7943
Copy link
Contributor Author

Ok, cool. Parameterized tests have the advantage that all test cases can run (and fail). A for loop stops at the first error encountered and it's maybe not clear which case actually failed. That's even worse in the CI, because we don't output the backtrace there. Parameterizing the tests helped me a lot during the writing. I actually don't think this is a huge obstacle for new contributors, since the syntax is quiet clear #[case::some_descriptive_name(...)] (even without knowing anything about rstest) and the output of an errored test case is parse::tests::my_test::case_1_some_descriptive_name pointing to the exact test case which went south by also providing a first hint. rstest is also well documented in my opinion, on one page in the github README and on docs.rs.

@Joining7943 Joining7943 marked this pull request as ready for review January 1, 2023 12:08
@github-actions
Copy link

github-actions bot commented Jan 1, 2023

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@Joining7943
Copy link
Contributor Author

I'll have to look closely into the code later,

Have you looked at this yet?

@Joining7943
Copy link
Contributor Author

I had the idea to move this into uucore, if this is too big for tail alone. It's fairly simple to add time unit parsing and make the time units customizable. Each uutil may have its own set of time units it accepts. Maybe we can do this once and for all uutils by also providing a speedy and precise parsing of a Duration. If required, I had some additional ideas to max out the parsing speed, but this would also add additional complexity. What do you think?

@sylvestre sylvestre force-pushed the tail-fix-parsing-of-sleep-interval branch from cd5e71c to 250b53a Compare January 27, 2023 20:09
@Joining7943
Copy link
Contributor Author

No offense, but I extracted the code I've written into an own crate https://crates.io/crates/fundu. The idea and much of the code is the same. I've taken care, that it's compatible with the requirements of uutils :) I don't intend to change that and I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils. Would be happy to hear your opinion and maybe see fundu used in uutils.

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from 250b53a to 83d7de6 Compare February 6, 2023 13:25
@github-actions
Copy link

github-actions bot commented Feb 6, 2023

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch 2 times, most recently from 3dc1348 to d8e8689 Compare February 6, 2023 20:04
@github-actions
Copy link

github-actions bot commented Feb 6, 2023

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

@Joining7943
Copy link
Contributor Author

@tertsdiepraam ping?

@tertsdiepraam
Copy link
Member

I'll take a look now :)

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Feb 8, 2023

I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils.

Agreed, it's also much better documented and tested now, which is great! It's a nice crate for other projects too. Excellent work!

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead. You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are. On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.

}
}
if let Some(source) = matches.get_one::<String>(options::SLEEP_INT) {
settings.sleep_sec =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a quick comment here explaining what the advantage of fundu is over try_from_f64_secs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, no problem but it'll take until tomorrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Joining7943
Copy link
Contributor Author

Thanks for your review. It's much appreciated.

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead

good idea. Currently, the comparison with String -> f64 -> Duration can be seen when running the benches. It's the reference function. fundu will always have some (low) overhead compared to Duration::from_secs_f64 combined with f64::from_str because of its precision and the integration of time unit parsing etc. Parsing some simple input with fundu takes around 50ns and the stdlib methods combined around 26ns on my testing machine. I think, in most use cases this difference won't be noticable. However, I'm working on lowering the difference by some additional nano seconds (maybe 5ns or so) for small input, but especially for large input.

You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are.

yeah that would be good. It'll be part of 0.3.0. I added a lot of documentation for the public api in general there.

On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.

you're right it's confusing. Do you want that method pub? However, I'll try out the tuple.

@tertsdiepraam
Copy link
Member

Do you want that method pub?

Not really, I was just checking out the code and this stood out :)

@Joining7943
Copy link
Contributor Author

ok :) I think this method is not really useful outside of the crate, so I'll remove it from the public api for now.

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from d8e8689 to 256ef9a Compare February 9, 2023 13:23
@github-actions
Copy link

github-actions bot commented Feb 9, 2023

GNU testsuite comparison:

GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?

@Joining7943
Copy link
Contributor Author

Are we done here?

@tertsdiepraam
Copy link
Member

We made a change to how we declare dependencies so the Cargo.toml conflicts, but apart from it's good!

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch 3 times, most recently from ef5c527 to fb15539 Compare February 12, 2023 17:59
…rate.

Activate tests for parsing sleep interval
@Joining7943
Copy link
Contributor Author

ok great! The errors in the ci aren't related to this pr as far as I can see. However, I'm triggering a rerun, so maybe the deny step recovers.

@Joining7943 Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from fb15539 to 0ed6a9f Compare February 12, 2023 19:26
@Joining7943
Copy link
Contributor Author

ok, looks like the deny step recovered.

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long on this! Excellent work!

@tertsdiepraam tertsdiepraam merged commit ff5000d into uutils:main Feb 16, 2023
@Joining7943
Copy link
Contributor Author

Thanks :)

@Joining7943 Joining7943 deleted the tail-fix-parsing-of-sleep-interval branch February 16, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants