Commit

Escape transform and docs (#970)
Update some documentation and add a string escape transformer so escaped
strings can be handled on the command line as well as in the config
files.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
phlptp and pre-commit-ci[bot] authored Jan 6, 2024
1 parent 9110160 commit de1c6a1
Showing 13 changed files with 152 additions and 48 deletions.
39 changes: 28 additions & 11 deletions README.md
@@ -451,8 +451,8 @@ Before parsing, you can set the following options:
This is equivalent to calling `->delimiter(delim)` and `->join()`. Valid values
are `CLI::MultiOptionPolicy::Throw`,
`CLI::MultiOptionPolicy::TakeLast`, `CLI::MultiOptionPolicy::TakeFirst`,
`CLI::MultiOptionPolicy::Join`, `CLI::MultiOptionPolicy::TakeAll`, and
`CLI::MultiOptionPolicy::Sum` 🆕.
`CLI::MultiOptionPolicy::Join`, `CLI::MultiOptionPolicy::TakeAll`,
`CLI::MultiOptionPolicy::Sum` 🆕, and `CLI::MultiOptionPolicy::Reverse` 🚧.
- `->check(std::string(const std::string &), validator_name="",validator_description="")`:
Define a check function. The function should return a non empty string with
the error message if the check fails
@@ -702,6 +702,17 @@ filters on the key values is performed.
`CLI::FileOnDefaultPath(default_path, false)`. This allows multiple paths to
be chained using multiple transform calls.
- `CLI::EscapedString`: 🚧 can be used to process an escaped string. The
processing is equivalent to that used for TOML config files (see
[TOML strings](https://toml.io/en/v1.0.0#string)), with two notable exceptions:
\` can also be used as a literal string notation, and binary string notation is
also allowed (see
[binary strings](https://cliutils.github.io/CLI11/book/chapters/config.html)).
The escaped string processing removes outer quotes if present: `"` indicates a
string with potential escape sequences, while `'` and \` indicate a literal
string, for which the quotes are removed but no escape sequences are processed.
This is the same escape processing as used in config files.
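
The quote handling described for `CLI::EscapedString` can be illustrated with a minimal standalone sketch. This is not CLI11's implementation: `decode_escapes` and `process_escaped` are hypothetical helper names, and only a small subset of escapes (`\n`, `\t`, `\\`, `\"`, `\0`) is assumed here; Unicode and binary string forms are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLI11): decode a small subset of backslash
// escapes; any other escape is reported as an error.
inline std::string decode_escapes(const std::string &in) {
    std::string out;
    for(std::size_t i = 0; i < in.size(); ++i) {
        if(in[i] != '\\') {
            out.push_back(in[i]);
            continue;
        }
        if(++i == in.size()) {
            throw std::invalid_argument("dangling backslash at end of string");
        }
        switch(in[i]) {
        case 'n': out.push_back('\n'); break;
        case 't': out.push_back('\t'); break;
        case '\\': out.push_back('\\'); break;
        case '"': out.push_back('"'); break;
        case '0': out.push_back('\0'); break;
        default: throw std::invalid_argument(std::string("unrecognized escape \\") + in[i]);
        }
    }
    return out;
}

// Hypothetical helper: strip matching outer quotes, then decode escapes only
// for double-quoted or unquoted input; ' and ` mark literal strings.
inline std::string process_escaped(const std::string &in) {
    if(in.size() > 1 && in.front() == in.back()) {
        const char quote = in.front();
        if(quote == '"') {
            return decode_escapes(in.substr(1, in.size() - 2));
        }
        if(quote == '\'' || quote == '`') {
            return in.substr(1, in.size() - 2);  // literal: no escape processing
        }
    }
    return decode_escapes(in);
}
```

With this sketch, a double-quoted argument has its `\n` turned into a newline, while the same text in single quotes or backticks keeps the backslash and the `n` as two separate characters.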
##### Validator operations
Validators are copyable and have a few operations that can be performed on them
@@ -873,9 +884,11 @@ through the `add_subcommand` method have the same restrictions as option names.
- `--subcommand1.subsub.f val` (short form nested subcommand option)

The use of dot notation in this form is equivalent `--subcommand.long <args>` =>
`subcommand --long <args> ++`. Nested subcommands also work `"sub1.subsub"`
would trigger the subsub subcommand in `sub1`. This is equivalent to "sub1
subsub"
`subcommand --long <args> ++`. Nested subcommands also work: `sub1.subsub` would
trigger the `subsub` subcommand in `sub1`, which is equivalent to `sub1 subsub`.
Quotes around the subcommand names are permitted 🚧, following the TOML standard
for such specifications; this includes allowing escape sequences. For example,
`"subcommand".'f'` or `"subcommand.with.dots".arg1 = value`.
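
As a rough illustration of how quoted segments keep their dots, the sketch below splits a dotted name into its parts. It is not CLI11's parser: `split_dotted_name` is a hypothetical helper, and escape sequences inside double quotes are deliberately not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of CLI11): split a dotted name into segments,
// treating dots inside "..." or '...' as part of the segment.
inline std::vector<std::string> split_dotted_name(const std::string &name) {
    std::vector<std::string> parts;
    std::string current;
    char quote = '\0';  // active quote character, or '\0' when outside quotes
    for(char c : name) {
        if(quote != '\0') {
            if(c == quote) {
                quote = '\0';  // closing quote: leave quoted mode
            } else {
                current.push_back(c);
            }
        } else if(c == '"' || c == '\'') {
            quote = c;  // opening quote: dots are now literal
        } else if(c == '.') {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);
    return parts;
}
```

Under these assumptions, `"subcommand.with.dots".arg1` yields two segments, the first of which still contains its dots.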

#### Subcommand options

@@ -1209,26 +1222,30 @@ option (like `set_help_flag`). Setting a configuration option is special. If it
is present, it will be read along with the normal command line arguments. The
file will be read if it exists, and does not throw an error unless `required` is
`true`. Configuration files are in [TOML][] format by default, though the
default reader can also accept files in INI format as well. It should be noted
that CLI11 does not contain a full TOML parser but can read strings from most
TOML files, including multi-line strings 🚧, and run them through the CLI11
parser. Other formats can be added by an adept user, some variations are
available through customization points in the default formatter. An example of a
TOML file:
default reader can also accept files in INI format. The config reader can read
most aspects of TOML files, including literal strings 🚧, strings with potential
escape sequences 🚧, digit separators 🚧, and multi-line strings 🚧, and run
them through the CLI11 parser. Other formats can be added by an adept user; some
variations are available through customization points in the default formatter.
An example of a TOML file:

```toml
# Comments are supported, using a #
# The default section is [default], case insensitive

value = 1
value2 = 123_456 # a number with digit separators
str = "A string"
str2 = "A string\nwith new lines"
str3 = 'A literal "string"'
vector = [1,2,3]
str_vector = ["one","two","and three"]

# Sections map to subcommands
[subcommand]
in_subcommand = Wow
sub.subcommand = true
"sub"."subcommand2" = "string_value"
```

or equivalently in INI format
23 changes: 16 additions & 7 deletions book/chapters/config.md
@@ -113,7 +113,9 @@ app.set_config("--config")
will read the files in the order given, which may be useful in some
circumstances. Using `CLI::MultiOptionPolicy::TakeLast` would work similarly
getting the last `N` files given.
getting the last `N` files given. The default policy for config options is
`CLI::MultiOptionPolicy::Reverse`, which takes the last expected `N` values and
reverses them so that the last option given takes precedence.
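
The "take the last `N` and reverse" behaviour can be pictured with a small standalone sketch. It only illustrates the ordering described above and is not CLI11's implementation; `last_n_reversed` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of CLI11): keep the last n values and reverse
// them, so the value given last comes first.
inline std::vector<std::string> last_n_reversed(std::vector<std::string> values, std::size_t n) {
    if(values.size() > n) {
        // Drop everything before the last n entries.
        values.erase(values.begin(), values.end() - static_cast<std::ptrdiff_t>(n));
    }
    std::reverse(values.begin(), values.end());
    return values;
}
```

For three config files given in the order `a.toml`, `b.toml`, `c.toml` with `N = 2`, this sketch returns `c.toml` followed by `b.toml`.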
## Configure file format
@@ -204,14 +206,18 @@ str3 = """\
```

The key is that the closing of the multiline string must be at the end of a line
and match the starting 3 quote sequence.
and match the starting 3 quote sequence. Multiline sequences using `"""` allow
escape sequences, following [TOML](https://toml.io/en/v1.0.0#string) with the
addition of `\0` for a null character and the binary strings described in the
next section. The same formatting also applies to single-line strings.
Multiline strings are not allowed as part of an array.

### Binary Strings

Config files have a binary conversion capability. This is mainly to support
writing config files, but it can be used by user-generated files as well. Strings
with the form `B"(XXXXX)"` will convert any characters inside the parentheses
with the form \xHH to the equivalent binary value. The HH are hexadecimal
with the form `\xHH` to the equivalent binary value, where `HH` are hexadecimal
characters. Characters not in this form will be translated as given. If argument
values with unprintable characters are used to generate a config file, this
binary form will be used in the output string.
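
A minimal sketch of this conversion, written against the description above rather than CLI11's source: `hex_value` and `decode_binary_string` are hypothetical helpers, and input that is not wrapped in `B"(` ... `)"` is assumed to be returned unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hexadecimal digit, or -1 if not a hex digit.
inline int hex_value(char c) {
    if(c >= '0' && c <= '9') return c - '0';
    if(c >= 'a' && c <= 'f') return c - 'a' + 10;
    if(c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not part of CLI11): convert B"(...)" into raw bytes,
// replacing each \xHH with the byte it names and copying everything else.
inline std::string decode_binary_string(const std::string &in) {
    if(in.size() < 5 || in.compare(0, 3, "B\"(") != 0 || in.compare(in.size() - 2, 2, ")\"") != 0) {
        return in;  // not in binary string form
    }
    const std::string body = in.substr(3, in.size() - 5);
    std::string out;
    for(std::size_t i = 0; i < body.size(); ++i) {
        if(body[i] == '\\' && i + 3 < body.size() && body[i + 1] == 'x') {
            const int hi = hex_value(body[i + 2]);
            const int lo = hex_value(body[i + 3]);
            if(hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 3;  // skip over "xHH"
                continue;
            }
        }
        out.push_back(body[i]);  // not a valid \xHH sequence: keep as given
    }
    return out;
}
```

In this sketch, `B"(\x35\xa7\x46)"` becomes the three bytes `0x35`, `0xa7`, `0x46`.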
@@ -274,8 +280,8 @@ char arraySeparator = ',';
char valueDelimiter = '=';
/// the character to use around strings
char stringQuote = '"';
/// the character to use around single characters
char characterQuote = '\'';
/// the character to use around single characters and literal strings
char literalQuote = '\'';
/// the maximum number of layers to allow
uint8_t maximumLayers{255};
/// the separator used to separate parent layers
@@ -296,8 +302,8 @@ These can be modified via setter functions
an array
- `ConfigBase *valueSeparator(char vSep)`: Specify the delimiter between a name
and value
- `ConfigBase *quoteCharacter(char qString, char qChar)` :specify the characters
to use around strings and single characters
- `ConfigBase *quoteCharacter(char qString, char literalChar)`: specify the
characters to use around strings and around single characters and literal strings
- `ConfigBase *maxLayers(uint8_t layers)` : specify the maximum number of parent
layers to process. This is useful to limit processing for larger config files
- `ConfigBase *parentSeparator(char sep)` : specify the character to separate
@@ -410,3 +416,6 @@ will create an option name in following priority.
2. Positional name
3. First short name
4. Environment name
In config files the name will be enclosed in quotes if there are any potential
ambiguities in parsing the name.
24 changes: 12 additions & 12 deletions book/chapters/options.md
@@ -26,18 +26,18 @@ app.add_option("-i", int_option, "Optional description")->capture_default_str();
You can use any C++ int-like type, not just `int`. CLI11 understands the
following categories of types:

| Type | CLI11 |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| number like | Integers, floats, bools, or any type that can be constructed from an integer or floating point number. Accepts common numerical strings like `0xFF` as well as octal, and decimal |
| string-like | std::string, or anything that can be constructed from or assigned a std::string |
| char | For a single char, single string values are accepted, otherwise longer strings are treated as integral values and a conversion is attempted |
| complex-number | std::complex or any type which has a real(), and imag() operations available, will allow 1 or 2 string definitions like "1+2j" or two arguments "1","2" |
| enumeration | any enum or enum class type is supported through conversion from the underlying type(typically int, though it can be specified otherwise) |
| container-like | a container(like vector) of any available types including other containers |
| wrapper | any other object with a `value_type` static definition where the type specified by `value_type` is one of the type in this list, including `std::atomic<>` |
| tuple | a tuple, pair, or array, or other type with a tuple size and tuple_type operations defined and the members being a type contained in this list |
| function | A function that takes an array of strings and returns a string that describes the conversion failure or empty for success. May be the empty function. (`{}`) |
| streamable | any other type with a `<<` operator will also work |
| Type | CLI11 |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| number like | Integers, floats, bools, or any type that can be constructed from an integer or floating point number. Accepts common numerical strings like `0xFF` as well as octal[\0755, or \o755], decimal, and binary(0b011111100), supports value separators including `_` and `'` |
| string-like | std::string, or anything that can be constructed from or assigned a std::string |
| char | For a single char, single string values are accepted, otherwise longer strings are treated as integral values and a conversion is attempted |
| complex-number | std::complex or any type which has a real(), and imag() operations available, will allow 1 or 2 string definitions like "1+2j" or two arguments "1","2" |
| enumeration | any enum or enum class type is supported through conversion from the underlying type(typically int, though it can be specified otherwise) |
| container-like | a container(like vector) of any available types including other containers |
| wrapper | any other object with a `value_type` static definition where the type specified by `value_type` is one of the type in this list, including `std::atomic<>` |
| tuple | a tuple, pair, or array, or other type with a tuple size and tuple_type operations defined and the members being a type contained in this list |
| function | A function that takes an array of strings and returns a string that describes the conversion failure or empty for success. May be the empty function. (`{}`) |
| streamable | any other type with a `<<` operator will also work |
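
To make the "number like" row more concrete, here is a hedged sketch of parsing an integer with base prefixes and digit separators. It is not CLI11's conversion routine: `parse_integer` is a hypothetical helper, and the accepted prefixes (`0x`, `0b`, `0o`, and a bare leading `0` for octal) are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLI11): parse an integer that may contain
// `_` or `'` digit separators and an assumed 0x / 0b / 0o / leading-0 prefix.
inline long long parse_integer(const std::string &text) {
    std::string digits;
    for(char c : text) {
        if(c != '_' && c != '\'') {
            digits.push_back(c);  // drop digit separators
        }
    }
    bool negative = false;
    std::size_t start = 0;
    if(!digits.empty() && (digits[0] == '-' || digits[0] == '+')) {
        negative = digits[0] == '-';
        start = 1;
    }
    int base = 10;
    if(digits.size() > start + 1 && digits[start] == '0') {
        const char prefix = digits[start + 1];
        if(prefix == 'x' || prefix == 'X') {
            base = 16;
            start += 2;
        } else if(prefix == 'b' || prefix == 'B') {
            base = 2;
            start += 2;
        } else if(prefix == 'o' || prefix == 'O') {
            base = 8;
            start += 2;
        } else {
            base = 8;  // bare leading zero treated as octal
            start += 1;
        }
    }
    const std::string body = digits.substr(start);
    std::size_t used = 0;
    const long long value = std::stoll(body, &used, base);  // throws on empty or invalid input
    if(used != body.size()) {
        throw std::invalid_argument("trailing characters in number: " + text);
    }
    return negative ? -value : value;
}
```

Stripping the separators before conversion keeps the base handling independent of how the digits were grouped in the input.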

By default, CLI11 will assume that an option is optional, and one value is
expected if you do not use a vector. You can change this on a specific option
6 changes: 3 additions & 3 deletions include/CLI/ConfigFwd.hpp
@@ -129,10 +129,10 @@ class ConfigBase : public Config {
valueDelimiter = vSep;
return this;
}
/// Specify the quote characters used around strings and characters
ConfigBase *quoteCharacter(char qString, char qChar) {
/// Specify the quote characters used around strings and literal strings
ConfigBase *quoteCharacter(char qString, char literalChar) {
stringQuote = qString;
literalQuote = qChar;
literalQuote = literalChar;
return this;
}
/// Specify the maximum number of parents
8 changes: 8 additions & 0 deletions include/CLI/Validators.hpp
@@ -218,6 +218,11 @@ class IPV4Validator : public Validator {
IPV4Validator();
};

class EscapedStringTransformer : public Validator {
public:
EscapedStringTransformer();
};

} // namespace detail

// Static is not needed here, because global const implies static.
@@ -237,6 +242,9 @@ const detail::NonexistentPathValidator NonexistentPath;
/// Check for an IP4 address
const detail::IPV4Validator ValidIPV4;

/// convert escaped characters into their associated values
const detail::EscapedStringTransformer EscapedString;

/// Validate the input as a particular type
template <typename DesiredType> class TypeValidator : public Validator {
public:
6 changes: 3 additions & 3 deletions include/CLI/Version.hpp
@@ -9,8 +9,8 @@
// [CLI11:version_hpp:verbatim]

#define CLI11_VERSION_MAJOR 2
#define CLI11_VERSION_MINOR 3
#define CLI11_VERSION_PATCH 2
#define CLI11_VERSION "2.3.2"
#define CLI11_VERSION_MINOR 4
#define CLI11_VERSION_PATCH 0
#define CLI11_VERSION "2.4.0"

// [CLI11:version_hpp:end]
6 changes: 5 additions & 1 deletion include/CLI/impl/Config_inl.hpp
@@ -339,7 +339,11 @@ inline std::vector<ConfigItem> ConfigBase::from_config(std::istream &input) cons
item.pop_back();
}
if(keyChar == '\"') {
item = detail::remove_escaped_characters(item);
try {
item = detail::remove_escaped_characters(item);
} catch(const std::invalid_argument &ia) {
throw CLI::ParseError(ia.what(), CLI::ExitCodes::InvalidError);
}
}
} else {
if(lineExtension) {
21 changes: 20 additions & 1 deletion include/CLI/impl/Validators_inl.hpp
@@ -229,10 +229,29 @@ CLI11_INLINE IPV4Validator::IPV4Validator() : Validator("IPV4") {
return std::string("Each IP number must be between 0 and 255 ") + var;
}
}
return std::string();
return std::string{};
};
}

CLI11_INLINE EscapedStringTransformer::EscapedStringTransformer() {
func_ = [](std::string &str) {
try {
if(str.size() > 1 && (str.front() == '\"' || str.front() == '\'' || str.front() == '`') &&
str.front() == str.back()) {
process_quoted_string(str);
} else if(str.find_first_of('\\') != std::string::npos) {
if(detail::is_binary_escaped_string(str)) {
str = detail::extract_binary_string(str);
} else {
str = remove_escaped_characters(str);
}
}
return std::string{};
} catch(const std::invalid_argument &ia) {
return std::string(ia.what());
}
};
}
} // namespace detail

CLI11_INLINE FileOnDefaultPath::FileOnDefaultPath(std::string default_path, bool enableErrorReturn)
2 changes: 1 addition & 1 deletion tests/FuzzFailTest.cpp
@@ -50,7 +50,7 @@ TEST_CASE("file_fail") {
CLI::FuzzApp fuzzdata;
auto app = fuzzdata.generateApp();

int index = GENERATE(range(1, 6));
int index = GENERATE(range(1, 7));
auto parseData = loadFailureFile("fuzz_file_fail", index);
std::stringstream out(parseData);
try {
9 changes: 0 additions & 9 deletions tests/HelpersTest.cpp
@@ -308,15 +308,6 @@ TEST_CASE("StringTools: binaryStrings", "[helpers]") {
CHECK(result == "\\XEM\\X7K");
}

/// these are provided for compatibility with the char8_t for C++20 that breaks stuff
std::string from_u8string(const std::string &s) { return s; }
std::string from_u8string(std::string &&s) { return std::move(s); }
#if defined(__cpp_lib_char8_t)
std::string from_u8string(const std::u8string &s) { return std::string(s.begin(), s.end()); }
#elif defined(__cpp_char8_t)
std::string from_u8string(const char8_t *s) { return std::string(reinterpret_cast<const char *>(s)); }
#endif

TEST_CASE("StringTools: escapeConversion", "[helpers]") {
CHECK(CLI::detail::remove_escaped_characters("test\\\"") == "test\"");
CHECK(CLI::detail::remove_escaped_characters("test\\\\") == "test\\");
47 changes: 47 additions & 0 deletions tests/TransformTest.cpp
@@ -706,6 +706,53 @@ TEST_CASE_METHOD(TApp, "NumberWithUnitBadInput", "[transform]") {
CHECK_THROWS_AS(run(), CLI::ValidationError);
}

static const std::map<std::string, std::string> validValues = {
{"test\\u03C0\\u00e9", from_u8string(u8"test\u03C0\u00E9")},
{"test\\u03C0\\u00e9", from_u8string(u8"test\u73C0\u0057")},
{"test\\U0001F600\\u00E9", from_u8string(u8"test\U0001F600\u00E9")},
{R"("this\nis\na\nfour\tline test")", "this\nis\na\nfour\tline test"},
{"'B\"(\\x35\\xa7\\x46)\"'", std::string{0x35, static_cast<char>(0xa7), 0x46}},
{"B\"(\\x35\\xa7\\x46)\"", std::string{0x35, static_cast<char>(0xa7), 0x46}},
{"test\\ntest", "test\ntest"},
{"\"test\\ntest", "\"test\ntest"},
{R"('this\nis\na\nfour\tline test')", R"(this\nis\na\nfour\tline test)"},
{R"("this\nis\na\nfour\tline test")", "this\nis\na\nfour\tline test"},
{R"(`this\nis\na\nfour\tline test`)", R"(this\nis\na\nfour\tline test)"}};

TEST_CASE_METHOD(TApp, "StringEscapeValid", "[transform]") {

auto test_data = GENERATE(from_range(validValues));

std::string value{};

app.add_option("-n", value)->transform(CLI::EscapedString);

args = {"-n", test_data.first};

run();
CHECK(test_data.second == value);
}

static const std::vector<std::string> invalidValues = {"test\\U0001M600\\u00E9",
"test\\U0001E600\\u00M9",
"test\\U0001E600\\uD8E9",
"test\\U0001E600\\uD8",
"test\\U0001E60",
"test\\qbad"};

TEST_CASE_METHOD(TApp, "StringEscapeInvalid", "[transform]") {

auto test_data = GENERATE(from_range(invalidValues));

std::string value{};

app.add_option("-n", value)->transform(CLI::EscapedString);

args = {"-n", test_data};

CHECK_THROWS_AS(run(), CLI::ValidationError);
}

TEST_CASE_METHOD(TApp, "NumberWithUnitIntOverflow", "[transform]") {
std::map<std::string, int> mapping{{"a", 1000000}, {"b", 100}, {"c", 101}};
