`tail`: fix argument parsing of sleep interval #4239

Joining7943 · 2022-12-16T22:25:17Z

As discussed, I've extracted this from the tail refactoring PR. It fixes the parsing of the sleep interval.

The main intent of this PR is to match the behavior of GNU's tail and to properly parse a string in f64 format into a Duration.

Short summary:

The sleep interval is an f64 instead of an f32.
Duration::from_secs_f64 panics on invalid input (negative, non-finite, or overflowing values) and Duration::try_from_secs_f64 is not stable, so we need a separate function to convert an f64 to a Duration.
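
For illustration, here is a minimal sketch of what a non-panicking conversion could look like. It is not the code from this PR: the function name and the error strings are hypothetical, and sub-nanosecond digits are simply truncated.

```rust
use std::time::Duration;

/// Convert seconds given as `f64` into a `Duration` without panicking.
/// Hypothetical helper for illustration only.
fn duration_from_secs_f64_checked(secs: f64) -> Result<Duration, String> {
    if secs.is_nan() {
        return Err(String::from("not a number"));
    }
    if secs < 0.0 {
        return Err(String::from("negative duration"));
    }
    // `u64::MAX as f64` rounds up to 2^64, so `>=` rejects every value
    // (including positive infinity) whose whole seconds do not fit in a u64.
    if secs >= u64::MAX as f64 {
        return Err(String::from("duration too large"));
    }
    let whole = secs.trunc();
    // The fractional part is in [0, 1), so the product stays below 1e9.
    let nanos = ((secs - whole) * 1e9) as u32;
    Ok(Duration::new(whole as u64, nanos.min(999_999_999)))
}
```

The explicit range checks are what Duration::from_secs_f64 turns into panics. The multiplication by 1e9 is still subject to floating point rounding.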

github-actions · 2022-12-16T23:25:41Z

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

github-actions · 2022-12-16T23:51:08Z

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions · 2022-12-17T21:40:53Z

GNU testsuite comparison:

Congrats! The gnu test tests/misc/timeout is no longer failing!
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

tertsdiepraam · 2022-12-18T21:41:46Z

src/uu/tail/src/args.rs

+ let trimmed = src.trim();
+ match src.parse::<f64>() {
+ // We're working here with the string representation instead of num.fract() to avoid further
+ // floating point precision problems where we can.


Converting to a string is lossy too. Are you sure this is worth it? We could also try to copy the unstable implementation in std.

Converting to a string is lossy too.

It's the multiplication by 1_000_000_000 that causes further imprecision. Some of the tests fail with the multiplication. Not sure which one it was, but I think something like 1.999999999 resulted in Duration::new(1, 999_999_998). I'm trying to be as precise as possible to make the parsing more predictable.

Are you sure this is worth it?

Dunno. Sure, it's slower than the multiplication, but parsing the sleep interval is a one-time action and the string length of the f64-parsed src is also manageable.

I'd imagine that conversion to a string would lead to inaccuracies in different places, but I don't have evidence for that. Let's keep this, but add a link to the tracking issue for try_from_secs_f64, so we can periodically check whether it has been stabilized.

Never mind, it's been stabilized in 1.66! Let's put a "Replace with Duration::try_from_secs_f64 once we hit MSRV 1.66" comment here.

ok, I've added a comment.
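
The precision concern in this thread can be sketched as follows. This is a simplified illustration, not the PR's implementation: both function names are hypothetical, and the digit-based variant only accepts plain decimal notation (no sign, no exponent).

```rust
use std::time::Duration;

/// Naive approach: go through f64 and multiply the fractional part by 1e9.
/// The product is truncated, so rounding errors can lose a nanosecond.
fn nanos_via_multiplication(src: &str) -> Option<u32> {
    let num: f64 = src.parse().ok()?;
    Some((num.fract() * 1e9) as u32)
}

/// Digit-based approach: read the fractional digits straight from the string.
fn duration_from_decimal_str(src: &str) -> Option<Duration> {
    let (whole, frac) = src.split_once('.').unwrap_or((src, ""));
    if whole.is_empty() && frac.is_empty() {
        return None;
    }
    if !whole.bytes().chain(frac.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let secs: u64 = if whole.is_empty() { 0 } else { whole.parse().ok()? };
    // Keep at most nine fractional digits and right-pad with zeros.
    let mut nanos: u32 = 0;
    for b in frac.bytes().chain(std::iter::repeat(b'0')).take(9) {
        nanos = nanos * 10 + u32::from(b - b'0');
    }
    Some(Duration::new(secs, nanos))
}
```

With IEEE 754 doubles, 1.999999999 is stored slightly below its decimal value, so the multiplication path is expected to truncate to 999_999_998 nanoseconds, while the digit-based path keeps 999_999_999.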

tertsdiepraam · 2022-12-18T21:43:13Z

src/uu/tail/src/args.rs

+ },
+ Ok(num)
+ if num.is_infinite()
+ && (trimmed.eq_ignore_ascii_case("inf")


I guess this is to distinguish positive infinity from negative? Can't we do that with is_sign_negative too if we check that first? In any case, could you document what this check is for?

I guess this is to distinguish positive infinity from negative

Not only positive infinity from negative, but also whether the infinity is caused by a src number greater than f64::MAX or by inf or similar being given as src. GNU's tail fails with an error in the first case. I'm describing the problem in the comment below these lines. Maybe I should move that comment up.

Can't we do that with is_sign_negative

If we deviate from gnu's tail, sure.

Oh I see, I assumed it would be Err if it was more than f64::MAX, but that makes sense. If you could add a quick comment explaining that in the code that would be great!
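
A small sketch of the distinction discussed here, assuming only the standard f64 parser. The enum and function names are hypothetical and not taken from the PR.

```rust
/// Where an infinite parse result came from (hypothetical type).
#[derive(Debug, PartialEq)]
enum InfinitySource {
    /// The input spelled out infinity, e.g. "inf" or "infinity".
    Literal,
    /// The input was a finite-looking number too large for f64, e.g. "1e400".
    Overflow,
}

/// Returns `None` if the input does not parse or parses to a finite number.
fn infinity_source(src: &str) -> Option<InfinitySource> {
    let trimmed = src.trim();
    let num = trimmed.parse::<f64>().ok()?;
    if !num.is_infinite() {
        return None;
    }
    // Ignore one leading sign before comparing against the literals.
    let unsigned = trimmed
        .strip_prefix(|c: char| c == '+' || c == '-')
        .unwrap_or(trimmed);
    if unsigned.eq_ignore_ascii_case("inf") || unsigned.eq_ignore_ascii_case("infinity") {
        Some(InfinitySource::Literal)
    } else {
        Some(InfinitySource::Overflow)
    }
}
```

The parser returns Ok(inf) rather than Err for a number above f64::MAX, which is why the string has to be inspected to tell the two cases apart.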

tertsdiepraam · 2022-12-18T21:44:23Z

src/uu/tail/src/args.rs

+ // positive infinite (if src > f64::MAX) or NaN. In case of positive infinite we need to
+ // match gnu's tail behavior and return an error although it may be a nice feature to
+ // interpret this result as a valid (maximum) Duration.
+ Ok(_) => Err(String::from("Not a number")),


Setting Duration::MAX makes sense to me here if it is a valid positive number, but that's a bit tricky to check (especially with an exponent). So cool idea!

Setting Duration::MAX makes sense to me here if it is a valid positive number

Ok, I think checking for sign_positive in the match arm

Ok(num) if num.is_infinite() ...

should do the trick.

I haven't changed this because of your comment above.

Oh I see, I assumed it would be Err if it was more than f64::MAX, but that makes sense. If you could add a quick comment explaining that in the code that would be great!

tertsdiepraam · 2022-12-18T21:46:10Z

src/uu/tail/src/args.rs

@@ -524,4 +590,136 @@ mod tests {
 assert!(result.is_ok());
 assert_eq!(result.unwrap(), Signum::Negative(1));
 }
+
+ #[test]
+ fn test_parse_duration_when_simple_arguments_are_valid() {


Excellent tests!

github-actions · 2022-12-19T21:18:17Z

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions · 2022-12-24T01:50:31Z

GNU testsuite comparison:

Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!

jfinkels · 2022-12-25T17:22:30Z

How does this parse_duration() function relate to uucore::parse_time::from_str()?

coreutils/src/uucore/src/lib/parser/parse_time.rs

Lines 17 to 49 in bdd6f56

/// Parse a duration from a string.
///
/// The string may contain only a number, like "123" or "4.5", or it
/// may contain a number with a unit specifier, like "123s" meaning
/// one hundred twenty three seconds or "4.5d" meaning four and a half
/// days. If no unit is specified, the unit is assumed to be seconds.
///
/// The only allowed suffixes are
///
/// * "s" for seconds,
/// * "m" for minutes,
/// * "h" for hours,
/// * "d" for days.
///
/// This function uses [`Duration::saturating_mul`] to compute the
/// number of seconds, so it does not overflow. If overflow would have
/// occurred, [`Duration::MAX`] is returned instead.
///
/// # Errors
///
/// This function returns an error if the input string is empty, the
/// input is not a valid number, or the unit specifier is invalid or
/// unknown.
///
/// # Examples
///
/// ```rust
/// use std::time::Duration;
/// use uucore::parse_time::from_str;
/// assert_eq!(from_str("123"), Ok(Duration::from_secs(123)));
/// assert_eq!(from_str("2d"), Ok(Duration::from_secs(60 * 60 * 24 * 2)));
/// ```
pub fn from_str(string: &str) -> Result<Duration, String> {

Joining7943 · 2022-12-25T23:17:10Z

Not sure what you mean, but I don't think they're related. This function tries to provide the same functionality as GNU tail's parsing of the --sleep-interval flag. So, for example, no units like s, ms etc. are allowed, and if the number on the command line overflows f64 there's an error. That's the main difference from the uucore::parse_time::from_str function as far as I can see.
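
To make the contrast concrete, here is a rough sketch of unit-suffix parsing along the lines of the doc comment quoted above. It is a hypothetical re-implementation for illustration, not uucore's code, and it only handles whole numbers (so "4.5d" is rejected here).

```rust
use std::time::Duration;

/// Parse "<whole number>[s|m|h|d]" into a Duration, saturating on overflow.
/// Hypothetical sketch; the error messages are made up.
fn parse_with_unit(input: &str) -> Result<Duration, String> {
    if input.is_empty() {
        return Err(String::from("empty input"));
    }
    let (number, multiplier): (&str, u32) = if let Some(n) = input.strip_suffix('s') {
        (n, 1)
    } else if let Some(n) = input.strip_suffix('m') {
        (n, 60)
    } else if let Some(n) = input.strip_suffix('h') {
        (n, 60 * 60)
    } else if let Some(n) = input.strip_suffix('d') {
        (n, 60 * 60 * 24)
    } else {
        // No known suffix: treat the whole input as seconds.
        (input, 1)
    };
    let secs: u64 = number
        .parse()
        .map_err(|_| format!("invalid time interval '{}'", input))?;
    // saturating_mul yields Duration::MAX instead of overflowing.
    Ok(Duration::from_secs(secs).saturating_mul(multiplier))
}
```

A parser for --sleep-interval as described in this comment would instead reject every suffix and report an error on overflow.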

Joining7943 · 2022-12-27T15:26:42Z

Please do not merge. I'm just trying something to get a more performant, precise and lossless version of the parse_duration method.

Joining7943 · 2022-12-31T14:14:25Z

@tertsdiepraam I've completely rewritten the parse_duration method and moved it because of its size into the parse module of tail.

This duration parser is almost as fast as Duration::from_secs_f64(input.parse::<f64>().unwrap()) and has the same grammar as an f64 parser, but it is not lossy (no rounding and no loss of floating point precision) because it stores the exact digit representation of the input string. parse_duration is around 2 - 5 times slower depending on the size of the input string, but it still operates in the nanosecond range and takes around 100 - 300 ns for normal input like 1.0e2 on my system. More extreme input like format!("{}.{}e-1022", "1".repeat(2000), "9".repeat(2000)), which can't even be parsed without errors by the Duration::from_secs_f64 method, still ran in around 5 microseconds on my quad-core system.

If the seconds overflow u64::MAX, the duration is interpreted as Duration::MAX, which is also an improvement over gnu's tail, I think. Imho we don't need to replace the parse_duration method with Duration::try_from_secs_f64 anymore. What do you think?
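
The saturating behavior can be sketched like this for the whole-seconds part only. The function name is hypothetical and the real parser additionally handles fractions and exponents.

```rust
use std::time::Duration;

/// Parse a string of ASCII digits as whole seconds.
/// Values above u64::MAX seconds are interpreted as Duration::MAX.
fn saturating_secs(digits: &str) -> Option<Duration> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let mut secs: u64 = 0;
    for b in digits.bytes() {
        let next = secs
            .checked_mul(10)
            .and_then(|s| s.checked_add(u64::from(b - b'0')));
        match next {
            Some(s) => secs = s,
            // Overflow: saturate instead of returning an error.
            None => return Some(Duration::MAX),
        }
    }
    Some(Duration::from_secs(secs))
}
```

Because the digits are accumulated with checked integer arithmetic, an arbitrarily long input never panics and never passes through f64.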

tertsdiepraam · 2022-12-31T14:31:02Z

I'll have to look closely into the code later, but it sounds great! It does add significant complexity to the code over a single function call, but the advantages you list might be worth it.

Unrelated to the actual change, but only to the tests (which I only quickly skimmed), I'm not sure I like using rstest for creating multiple test cases. Personally, I think a for loop is easier to read and more approachable for new contributors.

Joining7943 · 2022-12-31T15:02:32Z

Ok, cool. Parameterized tests have the advantage that all test cases can run (and fail). A for loop stops at the first error encountered and it may not be clear which case actually failed. That's even worse in the CI, because we don't output the backtrace there. Parameterizing the tests helped me a lot during the writing. I actually don't think this is a huge obstacle for new contributors, since the syntax is quite clear, #[case::some_descriptive_name(...)] (even without knowing anything about rstest), and the output of a failed test case is parse::tests::my_test::case_1_some_descriptive_name, pointing to the exact test case which went south and also providing a first hint. rstest is also well documented in my opinion, on one page in the GitHub README and on docs.rs.
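
The trade-off can be illustrated without any extra dependency. The sketch below is neither rstest nor code from this PR: it is a hypothetical table-driven helper that runs every named case and collects the names of the failing ones, instead of stopping at the first failed assertion like a plain loop would.

```rust
/// Hypothetical function under test.
fn parse_secs(src: &str) -> Option<u64> {
    src.trim().parse().ok()
}

/// Run every (name, input, expected) case and return the names of those that fail.
fn failed_cases(cases: &[(&str, &str, Option<u64>)]) -> Vec<String> {
    cases
        .iter()
        .filter(|(_, input, expected)| parse_secs(input) != *expected)
        .map(|(name, _, _)| name.to_string())
        .collect()
}
```

A test would then assert that the returned list is empty, so a failure message names all broken cases at once.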

github-actions · 2023-01-01T13:23:10Z

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

Joining7943 · 2023-01-06T19:13:22Z

I'll have to look closely into the code later,

Have you looked at this yet?

Joining7943 · 2023-01-08T19:40:46Z

I had the idea to move this into uucore, if this is too big for tail alone. It's fairly simple to add time unit parsing and make the time units customizable. Each uutil may have its own set of time units it accepts. Maybe we can do this once and for all uutils by also providing a speedy and precise parsing of a Duration. If required, I had some additional ideas to max out the parsing speed, but this would also add additional complexity. What do you think?

Joining7943 · 2023-02-03T20:06:26Z

No offense, but I extracted the code I've written into its own crate, https://crates.io/crates/fundu. The idea and much of the code are the same. I've taken care that it's compatible with the requirements of uutils :) I don't intend to change that and I hope you see this as an advantage over the original post. It also means a lot fewer lines of code in uutils. I would be happy to hear your opinion and maybe see fundu used in uutils.

github-actions · 2023-02-06T14:26:08Z

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions · 2023-02-06T20:44:37Z

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions · 2023-02-06T21:05:07Z

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

Joining7943 · 2023-02-07T13:04:06Z

@tertsdiepraam ping?

tertsdiepraam · 2023-02-08T19:00:46Z

I'll take a look now :)

tertsdiepraam · 2023-02-08T19:23:23Z

I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils.

Agreed, it's also much better documented and tested now, which is great! It's a nice crate for other projects too. Excellent work!

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead. You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are. On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.
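
One possible shape for the tuple suggestion is sketched below. The names are hypothetical and this is not fundu's actual code: each unit maps to a plain multiplier plus a power-of-ten exponent, so sub-second units no longer need a special case.

```rust
/// Hypothetical set of time units for illustration.
#[derive(Clone, Copy)]
enum Unit {
    Nano,
    Micro,
    Milli,
    Second,
    Minute,
    Hour,
    Day,
}

/// Returns (multiplier, exponent): one unit equals multiplier * 10^exponent seconds.
fn multiplier(unit: Unit) -> (u64, i16) {
    match unit {
        Unit::Nano => (1, -9),
        Unit::Micro => (1, -6),
        Unit::Milli => (1, -3),
        Unit::Second => (1, 0),
        Unit::Minute => (60, 0),
        Unit::Hour => (3600, 0),
        Unit::Day => (86400, 0),
    }
}

/// Convert a whole number of units to nanoseconds, or None on overflow.
fn to_nanos(value: u64, unit: Unit) -> Option<u128> {
    let (mult, exp) = multiplier(unit);
    // nanoseconds = value * mult * 10^(9 + exp); exp is never below -9.
    let scale = 10u128.pow((9 + exp) as u32);
    u128::from(value).checked_mul(u128::from(mult))?.checked_mul(scale)
}
```

With this split the caller applies both parts uniformly and does not have to branch on whether the unit is smaller than a second.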

tertsdiepraam · 2023-02-08T19:27:14Z

src/uu/tail/src/args.rs

- }
- }
+ if let Some(source) = matches.get_one::<String>(options::SLEEP_INT) {
+ settings.sleep_sec =


Could you add a quick comment here explaining what the advantage of fundu is over try_from_secs_f64?

sure, no problem but it'll take until tomorrow

Joining7943 · 2023-02-08T22:10:57Z

Thanks for your review. It's much appreciated.

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead

Good idea. Currently, the comparison with String -> f64 -> Duration can be seen when running the benches. It's the reference function. fundu will always have some (low) overhead compared to Duration::from_secs_f64 combined with f64::from_str because of its precision and the integration of time unit parsing etc. Parsing some simple input with fundu takes around 50ns and the stdlib methods combined around 26ns on my testing machine. I think in most use cases this difference won't be noticeable. However, I'm working on lowering the difference by some additional nanoseconds (maybe 5ns or so) for small input, but especially for large input.
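
For readers who want a rough idea of the String -> f64 -> Duration baseline, here is a minimal sketch using only the standard library. It is not fundu's benchmark code: the function names are hypothetical and a simple timing loop like this is far less reliable than a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// The stdlib-only reference path: parse as f64, then convert.
/// Panics on invalid, negative, non-finite or overflowing input.
fn reference(input: &str) -> Duration {
    Duration::from_secs_f64(input.parse::<f64>().unwrap())
}

/// Average wall-clock time per call in nanoseconds over `iterations` runs.
fn average_nanos(input: &str, iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the compiler from optimizing the call away.
        black_box(reference(black_box(input)));
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}
```

Measured numbers depend heavily on the machine and build profile, so they are only meaningful when compared against another parser in the same run.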

You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are.

yeah that would be good. It'll be part of 0.3.0. I added a lot of documentation for the public api in general there.

On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.

you're right it's confusing. Do you want that method pub? However, I'll try out the tuple.

tertsdiepraam · 2023-02-08T22:31:02Z

Do you want that method pub?

Not really, I was just checking out the code and this stood out :)

Joining7943 · 2023-02-08T22:35:13Z

ok :) I think this method is not really useful outside of the crate, so I'll remove it from the public api for now.

github-actions · 2023-02-09T14:37:17Z

GNU testsuite comparison:

GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?

Joining7943 · 2023-02-11T16:56:31Z

Are we done here?

tertsdiepraam · 2023-02-12T15:49:18Z

We made a change to how we declare dependencies, so the Cargo.toml conflicts, but apart from that it's good!

Joining7943 · 2023-02-12T19:25:59Z

ok great! The errors in the ci aren't related to this pr as far as I can see. However, I'm triggering a rerun, so maybe the deny step recovers.

Joining7943 · 2023-02-12T20:40:02Z

ok, looks like the deny step recovered.

tertsdiepraam

Sorry for taking so long on this! Excellent work!

Joining7943 · 2023-02-16T14:32:49Z

Thanks :)

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from 1e5b081 to faae8b8 Compare December 16, 2022 22:50

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from faae8b8 to bf767e2 Compare December 17, 2022 20:40

tertsdiepraam reviewed Dec 18, 2022

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from bf767e2 to ea29b96 Compare December 19, 2022 20:17

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from ea29b96 to 1ce6f51 Compare December 24, 2022 00:50

sylvestre force-pushed the tail-fix-parsing-of-sleep-interval branch from 1ce6f51 to fde84f9 Compare December 26, 2022 09:23

tertsdiepraam marked this pull request as draft December 27, 2022 17:16

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from fde84f9 to 4853796 Compare December 31, 2022 13:36

Joining7943 marked this pull request as ready for review January 1, 2023 12:08

sylvestre force-pushed the tail-fix-parsing-of-sleep-interval branch from cd5e71c to 250b53a Compare January 27, 2023 20:09

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from 250b53a to 83d7de6 Compare February 6, 2023 13:25

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch 2 times, most recently from 3dc1348 to d8e8689 Compare February 6, 2023 20:04

tertsdiepraam reviewed Feb 8, 2023

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from d8e8689 to 256ef9a Compare February 9, 2023 13:23

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch 3 times, most recently from ef5c527 to fb15539 Compare February 12, 2023 17:59

tail: Fix parsing of sleep interval. Use duration parser from fundu c…

0ed6a9f

…rate. Activate tests for parsing sleep interval

Joining7943 force-pushed the tail-fix-parsing-of-sleep-interval branch from fb15539 to 0ed6a9f Compare February 12, 2023 19:26

tertsdiepraam approved these changes Feb 16, 2023

tertsdiepraam merged commit ff5000d into uutils:main Feb 16, 2023

Joining7943 deleted the tail-fix-parsing-of-sleep-interval branch February 16, 2023 14:33
