New argument parser #4254

tertsdiepraam · 2023-01-01T16:19:38Z

I talked about this a bit on Discord, but I think it's time to actually start talking about a new library I've been working on. So here goes: introducing the very creatively named uutils-args, a derive-based argument parsing library.

Why a custom argument parser

For a while now, we've been using clap as our argument parser, but over time I've been become increasingly frustrated with it. Not because it's a bad library, but because it doesn't fit the very specific needs of the coreutils and we constantly have to work around clap with a lot of verbose code. I looked into basically every Rust argument parser out there and none really fit our needs. So about a month ago, I started working on a parser based on lexopt specifically for uutils.

The whole design is too complex to explain here, so I wrote some documents in the repo. It has three subpages:

The library also integrates very nicely with some of the plans I had earlier for the --help pages (#3181, #2816), because the help text is read as markdown from a file in this library.

Current Status

The library supports many utils already, but a few important features are still missing. Some low-hanging fruit:

The deprecated arguments for head, tail and uniq,
~~Hidden arguments,~~
~~Setting exit codes.~~

And more complicated:

better and more complete markdown rendering,
argument completion.

There are also rough edges that can be smoothed out like improving the error messages and the output of --help.

It is also very much a product of me working on it alone, which means that most of the naming conventions make sense to me but maybe not to others. I would very much welcome feedback on any part of the API.

If you want to see the library in action, I suggest to take a look at some of the coreutils for which I have created tests (more to come in the future):

A full list is here: https:/tertsdiepraam/uutils-args/tree/main/tests/coreutils. I found that most are relatively easy to port.

Downsides

Porting to this new library will have a few downsides.

New contributors won't be familiar with this library.
We won't have argument completion (until we implement ourselves in this library)
We'll have to update uudoc.
We don't get all the features and years of iterative improvement that clap has.

Proposed roadmap

Of course, I'm aware that this is gonna be a huge effort to port and we should carefully how (and even if) we want to do that. For now, I've come up with the following steps:

Figure out whether the migration is going to be worth it.
Collect feedback and accept PRs to get the library ready for use in uutils. This includes
- Reevaluating the names for crates, traits and functions.
- Implementing missing features.
- Adding more documentation.
- Stabilize the API.
- Add more coreutils APIs as tests to prove the feasibility of the library.
Fix up administration
- Move the repo to the uutils organisation
- Publish the crates in the repo
Port utils one by one¹
Change uudoc to use the info from the new library.

How to help out

For now, I need lots of feedback on this library, so I can get it ready, feel free to open an issue with whatever feedback you have or comment below. You can also help by picking up any of the open issues.

I'd be happy to answer any questions about this library :)

cc @sylvestre, @rivy, @jfinkels

I think this is possible by having clap and this new library coexist in the repo for a while, though it will break completion and the online docs. ↩

The text was updated successfully, but these errors were encountered:

sylvestre · 2023-01-03T14:00:06Z

Interesting :)
but I am a bit worried that we end up reinventing the wheel.
Did you try to:

ask clap developer if they would be willing to take some patches?
if we could write a clap extension for this?

tertsdiepraam · 2023-01-03T15:14:20Z

but I am a bit worried that we end up reinventing the wheel.

I agree that this is something to watch out for. In my opinion the problems with clap have been stacking up too much and I think I found a solution that will simplify the code by a lot in some utilities. I'm trying to avoid reinventing the wheel as much as possible by using lexopt to deal with all the parsing complexity. We therefore get a lot of correct behaviour for free.

If you want to see how bad clap can get for our use case, you only have to look at this function in ls, which is 500 lines of workarounds for clap and was really complicated to get right:

coreutils/src/uu/ls/src/ls.rs

Lines 431 to 939 in c2e4da9

 pub fn from(options: &clap::ArgMatches) -> UResult<Self> { 

 let context = options.get_flag(options::CONTEXT); 

 let (mut format, opt) = if let Some(format_) = options.get_one::<String>(options::FORMAT) { 

 ( 

 match format_.as_str() { 

 "long" | "verbose" => Format::Long, 

 "single-column" => Format::OneLine, 

 "columns" | "vertical" => Format::Columns, 

 "across" | "horizontal" => Format::Across, 

 "commas" => Format::Commas, 

 // below should never happen as clap already restricts the values. 

 _ => unreachable!("Invalid field for --format"), 

 }, 

 Some(options::FORMAT), 

 ) 

 } else if options.get_flag(options::format::LONG) { 

 (Format::Long, Some(options::format::LONG)) 

 } else if options.get_flag(options::format::ACROSS) { 

 (Format::Across, Some(options::format::ACROSS)) 

 } else if options.get_flag(options::format::COMMAS) { 

 (Format::Commas, Some(options::format::COMMAS)) 

 } else if options.get_flag(options::format::COLUMNS) { 

 (Format::Columns, Some(options::format::COLUMNS)) 

 } else if atty::is(atty::Stream::Stdout) { 

 (Format::Columns, None) 

 } else { 

 (Format::OneLine, None) 

 }; 

 // The -o, -n and -g options are tricky. They cannot override with each 

 // other because it's possible to combine them. For example, the option 

 // -og should hide both owner and group. Furthermore, they are not 

 // reset if -l or --format=long is used. So these should just show the 

 // group: -gl or "-g --format=long". Finally, they are also not reset 

 // when switching to a different format option in-between like this: 

 // -ogCl or "-og --format=vertical --format=long". 

 // 

 // -1 has a similar issue: it does nothing if the format is long. This 

 // actually makes it distinct from the --format=singe-column option, 

 // which always applies. 

 // 

 // The idea here is to not let these options override with the other 

 // options, but manually whether they have an index that's greater than 

 // the other format options. If so, we set the appropriate format. 

 if format != Format::Long { 

 let idx = opt 

 .and_then(|opt| options.indices_of(opt).map(|x| x.max().unwrap())) 

 .unwrap_or(0); 

 if [ 

 options::format::LONG_NO_OWNER, 

 options::format::LONG_NO_GROUP, 

 options::format::LONG_NUMERIC_UID_GID, 

 options::FULL_TIME, 

 ] 

 .iter() 

 .flat_map(|opt| { 

 if options.value_source(opt) == Some(clap::parser::ValueSource::CommandLine) { 

 options.indices_of(opt) 

 } else { 

 None 

 } 

 }) 

 .flatten() 

 .any(|i| i >= idx) 

 { 

 format = Format::Long; 

 } else if let Some(mut indices) = options.indices_of(options::format::ONE_LINE) { 

 if options.value_source(options::format::ONE_LINE) 

 == Some(clap::parser::ValueSource::CommandLine) 

 && indices.any(|i| i > idx) 

 { 

 format = Format::OneLine; 

 } 

 } 

 } 

 let files = if options.get_flag(options::files::ALL) { 

 Files::All 

 } else if options.get_flag(options::files::ALMOST_ALL) { 

 Files::AlmostAll 

 } else { 

 Files::Normal 

 }; 

 let sort = if let Some(field) = options.get_one::<String>(options::SORT) { 

 match field.as_str() { 

 "none" => Sort::None, 

 "name" => Sort::Name, 

 "time" => Sort::Time, 

 "size" => Sort::Size, 

 "version" => Sort::Version, 

 "extension" => Sort::Extension, 

 // below should never happen as clap already restricts the values. 

 _ => unreachable!("Invalid field for --sort"), 

 } 

 } else if options.get_flag(options::sort::TIME) { 

 Sort::Time 

 } else if options.get_flag(options::sort::SIZE) { 

 Sort::Size 

 } else if options.get_flag(options::sort::NONE) { 

 Sort::None 

 } else if options.get_flag(options::sort::VERSION) { 

 Sort::Version 

 } else if options.get_flag(options::sort::EXTENSION) { 

 Sort::Extension 

 } else { 

 Sort::Name 

 }; 

 let time = if let Some(field) = options.get_one::<String>(options::TIME) { 

 match field.as_str() { 

 "ctime" | "status" => Time::Change, 

 "access" | "atime" | "use" => Time::Access, 

 "birth" | "creation" => Time::Birth, 

 // below should never happen as clap already restricts the values. 

 _ => unreachable!("Invalid field for --time"), 

 } 

 } else if options.get_flag(options::time::ACCESS) { 

 Time::Access 

 } else if options.get_flag(options::time::CHANGE) { 

 Time::Change 

 } else { 

 Time::Modification 

 }; 

 let mut needs_color = match options.get_one::<String>(options::COLOR) { 

 None => options.contains_id(options::COLOR), 

 Some(val) => match val.as_str() { 

 "" | "always" | "yes" | "force" => true, 

 "auto" | "tty" | "if-tty" => atty::is(atty::Stream::Stdout), 

 /* "never" | "no" | "none" | */ _ => false, 

 }, 

 }; 

 let cmd_line_bs = options.get_one::<String>(options::size::BLOCK_SIZE); 

 let opt_si = cmd_line_bs.is_some() 

 && options 

 .get_one::<String>(options::size::BLOCK_SIZE) 

 .unwrap() 

 .eq("si") 

 || options.get_flag(options::size::SI); 

 let opt_hr = (cmd_line_bs.is_some() 

 && options 

 .get_one::<String>(options::size::BLOCK_SIZE) 

 .unwrap() 

 .eq("human-readable")) 

 || options.get_flag(options::size::HUMAN_READABLE); 

 let opt_kb = options.get_flag(options::size::KIBIBYTES); 

 let bs_env_var = std::env::var_os("BLOCK_SIZE"); 

 let ls_bs_env_var = std::env::var_os("LS_BLOCK_SIZE"); 

 let pc_env_var = std::env::var_os("POSIXLY_CORRECT"); 

 let size_format = if opt_si { 

 SizeFormat::Decimal 

 } else if opt_hr { 

 SizeFormat::Binary 

 } else { 

 SizeFormat::Bytes 

 }; 

 let raw_bs = if let Some(cmd_line_bs) = cmd_line_bs { 

 OsString::from(cmd_line_bs) 

 } else if !opt_kb { 

 if let Some(ls_bs_env_var) = ls_bs_env_var { 

 ls_bs_env_var 

 } else if let Some(bs_env_var) = bs_env_var { 

 bs_env_var 

 } else { 

 OsString::from("") 

 } 

 } else { 

 OsString::from("") 

 }; 

 let block_size: Option<u64> = if !opt_si && !opt_hr && !raw_bs.is_empty() { 

 match parse_size(&raw_bs.to_string_lossy()) { 

 Ok(size) => Some(size), 

 Err(_) => { 

 show!(LsError::BlockSizeParseError( 

 cmd_line_bs.unwrap().to_owned() 

 )); 

 None 

 } 

 } 

 } else if let Some(pc) = pc_env_var { 

 if pc.as_os_str() == OsStr::new("true") || pc == OsStr::new("1") { 

 Some(POSIXLY_CORRECT_BLOCK_SIZE) 

 } else { 

 None 

 } 

 } else { 

 None 

 }; 

 let long = { 

 let author = options.get_flag(options::AUTHOR); 

 let group = !options.get_flag(options::NO_GROUP) 

 && !options.get_flag(options::format::LONG_NO_GROUP); 

 let owner = !options.get_flag(options::format::LONG_NO_OWNER); 

 #[cfg(unix)] 

 let numeric_uid_gid = options.get_flag(options::format::LONG_NUMERIC_UID_GID); 

 LongFormat { 

 author, 

 group, 

 owner, 

 #[cfg(unix)] 

 numeric_uid_gid, 

 } 

 }; 

 let width = match options.get_one::<String>(options::WIDTH) { 

 Some(x) => { 

 if x.starts_with('0') && x.len() > 1 { 

 // Read number as octal 

 match u16::from_str_radix(x, 8) { 

 Ok(v) => v, 

 Err(_) => return Err(LsError::InvalidLineWidth(x.into()).into()), 

 } 

 } else { 

 match x.parse::<u16>() { 

 Ok(u) => u, 

 Err(_) => return Err(LsError::InvalidLineWidth(x.into()).into()), 

 } 

 } 

 } 

 None => match terminal_size::terminal_size() { 

 Some((width, _)) => width.0, 

 None => match std::env::var_os("COLUMNS") { 

 Some(columns) => match columns.to_str().and_then(|s| s.parse().ok()) { 

 Some(columns) => columns, 

 None => { 

 show_error!( 

 "ignoring invalid width in environment variable COLUMNS: {}", 

 columns.quote() 

 ); 

 DEFAULT_TERM_WIDTH 

 } 

 }, 

 None => DEFAULT_TERM_WIDTH, 

 }, 

 }, 

 }; 

 #[allow(clippy::needless_bool)] 

 let mut show_control = if options.get_flag(options::HIDE_CONTROL_CHARS) { 

 false 

 } else if options.get_flag(options::SHOW_CONTROL_CHARS) { 

 true 

 } else { 

 !atty::is(atty::Stream::Stdout) 

 }; 

 let opt_quoting_style = options 

 .get_one::<String>(options::QUOTING_STYLE) 

 .map(|cmd_line_qs| cmd_line_qs.to_owned()); 

 let mut quoting_style = if let Some(style) = opt_quoting_style { 

 match style.as_str() { 

 "literal" => QuotingStyle::Literal { show_control }, 

 "shell" => QuotingStyle::Shell { 

 escape: false, 

 always_quote: false, 

 show_control, 

 }, 

 "shell-always" => QuotingStyle::Shell { 

 escape: false, 

 always_quote: true, 

 show_control, 

 }, 

 "shell-escape" => QuotingStyle::Shell { 

 escape: true, 

 always_quote: false, 

 show_control, 

 }, 

 "shell-escape-always" => QuotingStyle::Shell { 

 escape: true, 

 always_quote: true, 

 show_control, 

 }, 

 "c" => QuotingStyle::C { 

 quotes: quoting_style::Quotes::Double, 

 }, 

 "escape" => QuotingStyle::C { 

 quotes: quoting_style::Quotes::None, 

 }, 

 _ => unreachable!("Should have been caught by Clap"), 

 } 

 } else if options.get_flag(options::quoting::LITERAL) { 

 QuotingStyle::Literal { show_control } 

 } else if options.get_flag(options::quoting::ESCAPE) { 

 QuotingStyle::C { 

 quotes: quoting_style::Quotes::None, 

 } 

 } else if options.get_flag(options::quoting::C) { 

 QuotingStyle::C { 

 quotes: quoting_style::Quotes::Double, 

 } 

 } else { 

 // TODO: use environment variable if available 

 QuotingStyle::Shell { 

 escape: true, 

 always_quote: false, 

 show_control, 

 } 

 }; 

 let indicator_style = if let Some(field) = 

 options.get_one::<String>(options::INDICATOR_STYLE) 

 { 

 match field.as_str() { 

 "none" => IndicatorStyle::None, 

 "file-type" => IndicatorStyle::FileType, 

 "classify" => IndicatorStyle::Classify, 

 "slash" => IndicatorStyle::Slash, 

 &_ => IndicatorStyle::None, 

 } 

 } else if let Some(field) = options.get_one::<String>(options::indicator_style::CLASSIFY) { 

 match field.as_str() { 

 "never" | "no" | "none" => IndicatorStyle::None, 

 "always" | "yes" | "force" => IndicatorStyle::Classify, 

 "auto" | "tty" | "if-tty" => { 

 if atty::is(atty::Stream::Stdout) { 

 IndicatorStyle::Classify 

 } else { 

 IndicatorStyle::None 

 } 

 } 

 &_ => IndicatorStyle::None, 

 } 

 } else if options.get_flag(options::indicator_style::SLASH) { 

 IndicatorStyle::Slash 

 } else if options.get_flag(options::indicator_style::FILE_TYPE) { 

 IndicatorStyle::FileType 

 } else { 

 IndicatorStyle::None 

 }; 

 let time_style = parse_time_style(options)?; 

 let mut ignore_patterns: Vec<Pattern> = Vec::new(); 

 if options.get_flag(options::IGNORE_BACKUPS) { 

 ignore_patterns.push(Pattern::new("*~").unwrap()); 

 ignore_patterns.push(Pattern::new(".*~").unwrap()); 

 } 

 for pattern in options 

 .get_many::<String>(options::IGNORE) 

 .into_iter() 

 .flatten() 

 { 

 match parse_glob::from_str(pattern) { 

 Ok(p) => { 

 ignore_patterns.push(p); 

 } 

 Err(_) => show_warning!("Invalid pattern for ignore: {}", pattern.quote()), 

 } 

 } 

 if files == Files::Normal { 

 for pattern in options 

 .get_many::<String>(options::HIDE) 

 .into_iter() 

 .flatten() 

 { 

 match parse_glob::from_str(pattern) { 

 Ok(p) => { 

 ignore_patterns.push(p); 

 } 

 Err(_) => show_warning!("Invalid pattern for hide: {}", pattern.quote()), 

 } 

 } 

 } 

 // According to ls info page, `--zero` implies the following flags: 

 // - `--show-control-chars` 

 // - `--format=single-column` 

 // - `--color=none` 

 // - `--quoting-style=literal` 

 // Current GNU ls implementation allows `--zero` Behavior to be 

 // overridden by later flags. 

 let zero_formats_opts = [ 

 options::format::ACROSS, 

 options::format::COLUMNS, 

 options::format::COMMAS, 

 options::format::LONG, 

 options::format::LONG_NO_GROUP, 

 options::format::LONG_NO_OWNER, 

 options::format::LONG_NUMERIC_UID_GID, 

 options::format::ONE_LINE, 

 options::FORMAT, 

 ]; 

 let zero_colors_opts = [options::COLOR]; 

 let zero_show_control_opts = [options::HIDE_CONTROL_CHARS, options::SHOW_CONTROL_CHARS]; 

 let zero_quoting_style_opts = [ 

 options::QUOTING_STYLE, 

 options::quoting::C, 

 options::quoting::ESCAPE, 

 options::quoting::LITERAL, 

 ]; 

 let get_last = |flag: &str| -> usize { 

 if options.value_source(flag) == Some(clap::parser::ValueSource::CommandLine) { 

 options.index_of(flag).unwrap_or(0) 

 } else { 

 0 

 } 

 }; 

 if get_last(options::ZERO) 

 > zero_formats_opts 

 .into_iter() 

 .map(get_last) 

 .max() 

 .unwrap_or(0) 

 { 

 format = if format == Format::Long { 

 format 

 } else { 

 Format::OneLine 

 }; 

 } 

 if get_last(options::ZERO) 

 > zero_colors_opts 

 .into_iter() 

 .map(get_last) 

 .max() 

 .unwrap_or(0) 

 { 

 needs_color = false; 

 } 

 if get_last(options::ZERO) 

 > zero_show_control_opts 

 .into_iter() 

 .map(get_last) 

 .max() 

 .unwrap_or(0) 

 { 

 show_control = true; 

 } 

 if get_last(options::ZERO) 

 > zero_quoting_style_opts 

 .into_iter() 

 .map(get_last) 

 .max() 

 .unwrap_or(0) 

 { 

 quoting_style = QuotingStyle::Literal { show_control }; 

 } 

 let color = if needs_color { 

 Some(LsColors::from_env().unwrap_or_default()) 

 } else { 

 None 

 }; 

 let dereference = if options.get_flag(options::dereference::ALL) { 

 Dereference::All 

 } else if options.get_flag(options::dereference::ARGS) { 

 Dereference::Args 

 } else if options.get_flag(options::dereference::DIR_ARGS) { 

 Dereference::DirArgs 

 } else if options.get_flag(options::DIRECTORY) 

 || indicator_style == IndicatorStyle::Classify 

 || format == Format::Long 

 { 

 Dereference::None 

 } else { 

 Dereference::DirArgs 

 }; 

 Ok(Self { 

 format, 

 files, 

 sort, 

 recursive: options.get_flag(options::RECURSIVE), 

 reverse: options.get_flag(options::REVERSE), 

 dereference, 

 ignore_patterns, 

 size_format, 

 directory: options.get_flag(options::DIRECTORY), 

 time, 

 color, 

 #[cfg(unix)] 

 inode: options.get_flag(options::INODE), 

 long, 

 alloc_size: options.get_flag(options::size::ALLOCATION_SIZE), 

 block_size, 

 width, 

 quoting_style, 

 indicator_style, 

 time_style, 

 context, 

 selinux_supported: { 

 #[cfg(feature = "selinux")] 

 { 

 selinux::kernel_support() != selinux::KernelSupport::Unsupported 

 } 

 #[cfg(not(feature = "selinux"))] 

 { 

 false 

 } 

 }, 

 group_directories_first: options.get_flag(options::GROUP_DIRECTORIES_FIRST), 

 eol: if options.get_flag(options::ZERO) { 

 '\0' 

 } else { 

 '\n' 

 }, 

 }) 

 }

My assertion is that we can get rid of almost this entire function with the new library. I'll try to port ls as a test soon, so I can prove that assertion :). There are more utils with similar behaviour that we are not handling correctly right now.

ask clap developer if they would be willing to take some patches?

I have asked about a few things in the past:

Most notably I implemented this: Infer long arguments clap-rs/clap#2525
Asked for three leading hyphens, which is a WIP: 3 leading hyphens for options (---long) clap-rs/clap#3309
And asked for exit code support, which was marked wont-fix: Custom exit code clap-rs/clap#3426

However, most of the things I tried to fix here are more fundamental. clap seems to assume that every option maps to a single value, which is a great simplification if you have the luxury of defining new API's, but it doesn't work well for coreutils. This is too much of a difference that they won't be able to bridge. clap is also (rightfully) concerned about backwards compatibility, which will limit some changes.

The reason that I'm so confident that this works well, is that the generated code is very similar to GNU. In GNU, they have an iterator over arguments and change the configuration based on the encountered values. clap first collects all arguments and provides an API for the final values. This is fundamentally why we need to inspect indices and do complicated tricks with some arguments to match the behaviour. This new library does basically the same as GNU just with a derive API and is therefore a great fit.

I've listed most of the problems I found here: https:/tertsdiepraam/uutils-args/blob/main/design/problems_with_clap.md, which shows how complicated these are to change in clap.

if we could write a clap extension for this?

I wish we could, but I don't know what that would look like. For example, I wouldn't know how to do that ls function above as an extension.

rivy · 2023-01-06T14:10:16Z

@tertsdiepraam , I'd love to have improvements in command line arguments support.

Your contributions have always been great and I'm happy to support your further experiments.

I would suggest that, if you decide to embark on this project, it should be built and developed in another repo and as it's own crate. We can set it up as a repo under the uutils banner.

I can understand @sylvestre's concern about duplication of effort and reinvention, but I empathize strongly with your position. I've contributed to clap a few times, and it's become a bit of a beast. I quite understand the "replace it" feeling that you are experiencing. 😄

sylvestre · 2023-01-06T14:14:38Z

yeah, we should probably do it :)
I just want to make sure that we have really discussed about it

jfinkels · 2023-02-05T18:40:04Z

This sounds like a reasonable change to me, I've run into quite a few issues with argument parsing. So if we can make it easier to deal with silliness like

chmod u=rrrr+-+-+-www f

or

df --o

without too much effort, that would be great.

tertsdiepraam · 2023-02-05T21:51:16Z

What's the silliness you're referring to? Is it about parsing the permissions and non-ambiguous prefixes of long arguments? I have the second one implemented, but the first one will probably not be part of this library. You can define a type that implements a parser that will work though.

jfinkels · 2023-02-06T00:31:52Z

Yes, I was referring to parsing the permissions and non-ambiguous prefixes of long arguments. They were just two things that came to mind when I tried to recall trouble I had had with arguments in the past.

tertsdiepraam · 2023-11-01T12:29:25Z

10 months after opening this issue, I've resumed my work on this :). I think the next step is determining when we deem it feature-complete and battle-tested enough to be included here.

Here's what I have right now:

Short args (with no, optional and required arguments)
Long args (with no, optional and required arguments)
Positional arguments
--help and --version
Help text based on an external file
Help text based on docstrings
dd-style args
A special parsing mode for echo
An escape hatch for arguments like -10/+10 like fold
Error messages on invalid arguments
Parsing of values with a derive macro
Control over the exit code on errors

What's missing is:

Making the errors more informative and pretty (e.g. add hints for typos)
Generate docs, completions, and manpages (https:/tertsdiepraam/uutils-args/issues/5, https:/tertsdiepraam/uutils-args/issues/4)
A good design for composing multiple definitions of arguments (https:/tertsdiepraam/uutils-args/issues/21).
A lot of documentation
Everything that I've missed in my research (even though the research was pretty extensive)

The problem is that there's only so much I can do in a separate repo. We have to start properly integrating this if we want to find the rest of the actual problems.

There's also the issue that while we are migrating, the docs and manpages will probably be broken anyway, because we will temporarily have both clap and this parser in the repo. Unless we only work on a separate branch, but that seems suboptimal too. Happy to hear any suggestions for how we should deal with this. My solution is that we generate the manpages and docs, store them and then remove the code that generates it. That will give people a snapshot that will be pretty accurate for a while.

The markdown format for help is also missing at this point. I cut that out to actually get to something done. We can add it back in later.

If you want to see the library in action, I recommend looking at the tests that implement several coreutils.

BenWiederhake · 2024-02-19T21:54:58Z

Looks like I accidentally stumbled into another clap-impossibility: unattached multi-valued short-args, which clap would prefer not to fix, and I honestly agree that it's a silly feature that should have never been invented in the first place.

~~However, with clap it is impossible to implement shuf -ez hello world correctly.~~

EDIT: Yes, it is possible, I was just a bit too daft.

tertsdiepraam · 2024-02-19T22:18:14Z

That might be where I definitively draw the line of compatibility 😄 Holy cow, that's cursed.

BenWiederhake · 2024-02-19T22:30:51Z

I think it's pretty doable with uutil-args. If I get it to work, would you be interested, or is that too cursed?

More cursed stuff …

$ echo foo > a; echo bar > b
$ shuf a b
shuf: extra operand 'b'
Try 'shuf --help' for more information.
[$? = 1]
$ shuf a b -e # A *later* argv string changes the interpretation of an *earlier* argv!
b
a
$

tertsdiepraam · 2024-02-19T22:45:57Z

Actually, I think this is because --echo does not take any arguments. So, it should be possible in both clap and the new parser.

Sort of related, I think we should also put some guidance in CONTRIBUTING.md about filing issues on our behalf. You obviously mean well and it's appreciated, but I'd like to minimize the time other maintainers have to spend on our problems and explore solutions on our side first. Especially with clap, we've had people file issues a few times without communicating with the uutils maintainers and I don't want Ed Page to get annoyed with us ¹ 😅

Of course, he's been awesome about this and always provided friendly and detailed responses. ↩

BenWiederhake · 2024-02-19T22:49:48Z

According to the GNU documentation, --echo does take arguments, and shuf a b proves that, kinda. But yes, I would implement it as a boolean flag, and then later error out if more than one "file" was supplied. Sorry, I'm getting off-topic.

I tried to keep the clap issue reasonably general, and in retrospect I agree that perhaps it would have been better to coordinate with you. Thankfully, nothing bad happened this time. :)

tertsdiepraam · 2024-02-19T22:54:33Z

Where do you see that --echotakes arguments? In the manpage it says

treat each ARG as an input line

with ARG referring to this line:

shuf -e [OPTION]... [ARG]...

and in the docs it says

Treat each command-line operand as an input line.

So I think it's just a boolean.

But, depending on the flag, the maximum number of arguments changes, which is supported by the new parser!

BenWiederhake · 2024-03-29T01:35:07Z

Here's another behavior that is impossible to replicate using clap:

$ date -Ilolwut -R -R  # parses "lolwut" before either "-R", and raises that error
date: invalid argument 'lolwut' for '--iso-8601'
Valid arguments are:
  - 'hours'
  - 'minutes'
  - 'date'
  - 'seconds'
  - 'ns'
Try 'date --help' for more information.
[$? = 1]
$ date -R -R -Ilolwut  # parses the second "-R" before "-Ilolwut", and raises that error
date: multiple output formats specified
[$? = 1]
$ date -R -Ilolwut -R  # In fact, it first tries to parse the value of "-i" before noticing duplicity
date: invalid argument 'lolwut' for '--iso-8601'
Valid arguments are:
  - 'hours'
  - 'minutes'
  - 'date'
  - 'seconds'
  - 'ns'
Try 'date --help' for more information.
[$? = 1]
$ date -R -Idate -R  # … so if the value to "-I" is valid, it notices the duplicity only afterwards.
date: multiple output formats specified
[$? = 1]

Doing this in uutils-args is actually also difficult, because Settings::apply does not (yet?) permit raising an error. See uutils/uutils-args#112 for my suggestion how to change that.

jfinkels · 2024-03-30T19:43:19Z

From your examples, it seems like the parsing is just handled sequentially in order, terminating at the first error. Is that right?

BenWiederhake · 2024-03-30T19:52:57Z

@jfinkels: Yes, hence my proposal to implement it like this: uutils/uutils-args#113

tertsdiepraam changed the title ~~New year, new argument parser~~ New year, new argument parser! Jan 1, 2023

tertsdiepraam mentioned this issue Feb 13, 2023

Require = for --tmpdir in mktemp #4342

Merged

tertsdiepraam changed the title ~~New year, new argument parser!~~ New argument parser Sep 14, 2023

This was referenced Feb 20, 2024

implement proof of concept for shuf echo args / file uutils/uutils-args#104

Merged

shuf: treat -e as a flag, not as a multi-value arg #5989

Merged

This was referenced Mar 29, 2024

Options::apply should permit returning an error uutils/uutils-args#112

Open

date: fix interaction of flags, fix issues around --set #6142

Open

This was referenced Apr 28, 2024

cksum: Improve the GNU compat #6256

Merged

fmt: clap causes information loss #6353

Open

BenWiederhake mentioned this issue May 10, 2024

date: iso-8601 argument format discrepancy #6387

Open

oSoMoN mentioned this issue Sep 30, 2024

Consider using an external argument parser uutils/diffutils#97

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New argument parser #4254

New argument parser #4254

tertsdiepraam commented Jan 1, 2023 •

edited

Loading

sylvestre commented Jan 3, 2023

tertsdiepraam commented Jan 3, 2023

rivy commented Jan 6, 2023

sylvestre commented Jan 6, 2023

jfinkels commented Feb 5, 2023

tertsdiepraam commented Feb 5, 2023

jfinkels commented Feb 6, 2023

tertsdiepraam commented Nov 1, 2023 •

edited

Loading

BenWiederhake commented Feb 19, 2024 •

edited

Loading

tertsdiepraam commented Feb 19, 2024

BenWiederhake commented Feb 19, 2024

tertsdiepraam commented Feb 19, 2024 •

edited

Loading

BenWiederhake commented Feb 19, 2024

tertsdiepraam commented Feb 19, 2024 •

edited

Loading

BenWiederhake commented Mar 29, 2024 •

edited

Loading

jfinkels commented Mar 30, 2024

BenWiederhake commented Mar 30, 2024

New argument parser #4254

New argument parser #4254

Comments

tertsdiepraam commented Jan 1, 2023 • edited Loading

Why a custom argument parser

Current Status

Downsides

Proposed roadmap

How to help out

Footnotes

sylvestre commented Jan 3, 2023

tertsdiepraam commented Jan 3, 2023

rivy commented Jan 6, 2023

sylvestre commented Jan 6, 2023

jfinkels commented Feb 5, 2023

tertsdiepraam commented Feb 5, 2023

jfinkels commented Feb 6, 2023

tertsdiepraam commented Nov 1, 2023 • edited Loading

BenWiederhake commented Feb 19, 2024 • edited Loading

tertsdiepraam commented Feb 19, 2024

BenWiederhake commented Feb 19, 2024

tertsdiepraam commented Feb 19, 2024 • edited Loading

Footnotes

BenWiederhake commented Feb 19, 2024

tertsdiepraam commented Feb 19, 2024 • edited Loading

BenWiederhake commented Mar 29, 2024 • edited Loading

jfinkels commented Mar 30, 2024

BenWiederhake commented Mar 30, 2024

tertsdiepraam commented Jan 1, 2023 •

edited

Loading

tertsdiepraam commented Nov 1, 2023 •

edited

Loading

BenWiederhake commented Feb 19, 2024 •

edited

Loading

tertsdiepraam commented Feb 19, 2024 •

edited

Loading

tertsdiepraam commented Feb 19, 2024 •

edited

Loading

BenWiederhake commented Mar 29, 2024 •

edited

Loading