Feat: std: parseopt parser modes#25506
Conversation
Adds configurable parser modes to std/parseopt module. Take two. std/parseopt now supports three parser modes via an optional mode parameter in initOptParser and getopt. Three modes are provided: - NimMode (default, fully backward compatible), - LaxMode (POSIX-inspired with relaxed short option handling), - GnuMode (stricter GNU-style conventions).
style fixes Co-authored-by: Andreas Rumpf <[email protected]>
|
The implementation seems to be alright, the documentation leaves a lot to be desired. Keep the existing docs mostly as they are. Add a sentence: "There are different parsing modes. NimMode is the default. Read this for more information of the other modes and how they differ from each other." Make a separate |
Could you elaborate a bit? Do you mean only the new additions or the whole thing?
Can't say I like this suggestion very much. Can do that, of course. I'm a bit concerned that we'll have to link back a lot and the routine-level documentation won't be visible alongside. What are we trying to achieve there exactly, articulate the experimental nature of the new modes? |
|
I'm trying to achieve that newcomers can use |
|
That would require a redesign. I also don't like separating the new additions to the docs because at leat the third of them deals with explaining how the default mode works. This is helpful for the newcomers because, for example, the GnuMode's behavior is much more common, so they can easily see how the default mode differs. |
|
Well but before you came along the docs were simpler and I've seen many people use parseopt successfully throughout the years. There is not much "low level" going on here, you need a loop and a case statement. The alternatives always come down to requiring some mapping between CLI switch and data managment too (they need their own version of the You're concerned with the edge cases, good, but the existing documentation was sufficient. |
They weren't, because I didn't touch any docs beside one sentence at the start and converting examples to runnable. They weren't easy to follow either, but it's a matter of opinion. I'm doing what you asked, even though it doesn't feel nice. Splitting the new additions requires improvements in other parts to bring them to informational parity, and improving the old docs extends the scope of this PR. |
|
No, I'm taking this version now. Nobody reads this anyway, people will vibecode and the AI has no attention deficit. |
|
Thanks for your hard work on this PR! Hint: mm: orc; opt: speed; options: -d:release |
Follow-up to #25506. As I mentioned there, I was in the middle of an edit, so here it is. Splitting to a separate doc skipped. A couple of minor mistakes fixed, some things made a bit more concise and short.
Adds configurable parser modes to std/parseopt module. **Take two.**
Initially solved the issue of not being able to pass arguments to short
options as you do with most everyday CLI programs, but reading the tests
made me add more features so that some of the behaviour could be changed
and here we are.
**`std/parseopt` now supports three parser modes** via an optional
`mode` parameter in `initOptParser` and `getopt`.
Three modes are provided:
- `NimMode` (default, fully backward compatible),
- `LaxMode` (POSIX-inspired with relaxed short option handling),
- `GnuMode` (stricter GNU-style conventions).
The new modes are marked as experimental in the documentation.
The parser behaviour is controlled by a new `ParserRules` enum, which
provides granular feature flags that modes are built from. This makes it
possible for users with specific requirements to define custom rule sets
by importing private symbols, this is mentioned but clearly marked as
unsupported.
**Backward compatibility:**
The default mode preserves existing behaviour completely, with a single
exception: `allowWhitespaceAfterColon` is deprecated.
Now, `allowWhitespaceAfterColon` doesn't make much sense as a single
tuning knob. The `ParserRule.prSepAllowDelimAfter` controls this now.
As `allowWhitespaceAfterColon` had a default, most calls never mention
it so they will silently migrate to the new `initOptParser` overload. To
cover cases when the proc param was used at call-site, I added an
overload, which modifies the default parser mode to reflect the required
`allowWhitespaceAfterColon` value. Should be all smooth for most users,
except the deprecation warning.
The only thing I think can be classified as the breaking change is a
surprising **bug** of the old parser:
```nim
let p = initOptParser("-n 10 -m20 -k= 30 -40", shortNoVal = {'v'})
# ^-disappears
```
This is with the aforementioned `allowWhitespaceAfterColon` being true
by default, of course. In this case the `30` token is skipped
completely. I don't think that's right, so it's fixed.
Things I still don't like about how the old parser and the new default
mode behave:
1. **Parser behaviour is controlled by an emptiness of two containers**.
This is an interesting approach. It's also made more interesting because
the `shortNoVal`/`longNoVal` control both the namesakes, but *and also
how their opposites (value-taking opts) work*.
---
**Edit:**
2. `shortNoVal` is not mandatory:
```nim
let p = initOptParser(@["-a=foo"], shortNoVal = {'a'})
# Nim, Lax parses as: (cmdShortOption, "a", "foo")
# GnuMode parses as: (cmdShortOption, "a", "=foo")
```
In this case, even though the user specified `a` as no no-val, parser
ignores it, relying only on the syntax to decide the kind of the
argument. This is especially problematic with the modes that don't use
the rule `prShortAllowSep` (GnuMode), in this case the provided input is
twice invalid, regardless of the `shortNoVal`.
With the current parser architecture, parsing it this way **is
inevitable**, though. We don't have any way to signal the error state
detected with the input, so the user is expected to validate the input
for mistakes.
Bundling positional arguments is nonsensical and short option can't use
the separator character, so `[cmd "a", arg "=foo"]` and `[cmd "a", cmd
"=", cmd "f"...]` are both out of the question **and** would complicate
validating, requiring keeping track of a previous argument. Hope I'm
clear enough on the issue.
**Future work:**
1. Looks like the new modes are already usable, but from the discussions
elsewhere it looks like we might want to support special-casing
multi-digit short options (`-XX..`) to allow numerical options greater
than 9. This complicates bundling, though, so requires a bit of thinking
through.
2. Signaling error state?
---------
Co-authored-by: Andreas Rumpf <[email protected]>
(cherry picked from commit 7c873ca)
Adds configurable parser modes to std/parseopt module. Take two.
Initially solved the issue of not being able to pass arguments to short options as you do with most everyday CLI programs, but reading the tests made me add more features so that some of the behaviour could be changed and here we are.
std/parseoptnow supports three parser modes via an optionalmodeparameter ininitOptParserandgetopt.Three modes are provided:
NimMode(default, fully backward compatible),LaxMode(POSIX-inspired with relaxed short option handling),GnuMode(stricter GNU-style conventions).The new modes are marked as experimental in the documentation.
The parser behaviour is controlled by a new
ParserRulesenum, which provides granular feature flags that modes are built from. This makes it possible for users with specific requirements to define custom rule sets by importing private symbols, this is mentioned but clearly marked as unsupported.Backward compatibility:
The default mode preserves existing behaviour completely, with a single exception:
allowWhitespaceAfterColonis deprecated.Now,
allowWhitespaceAfterColondoesn't make much sense as a single tuning knob. TheParserRule.prSepAllowDelimAftercontrols this now.As
allowWhitespaceAfterColonhad a default, most calls never mention it so they will silently migrate to the newinitOptParseroverload. To cover cases when the proc param was used at call-site, I added an overload, which modifies the default parser mode to reflect the requiredallowWhitespaceAfterColonvalue. Should be all smooth for most users, except the deprecation warning.The only thing I think can be classified as the breaking change is a surprising bug of the old parser:
This is with the aforementioned
allowWhitespaceAfterColonbeing true by default, of course. In this case the30token is skipped completely. I don't think that's right, so it's fixed.Things I still don't like about how the old parser and the new default mode behave:
shortNoVal/longNoValcontrol both the namesakes, but and also how their opposites (value-taking opts) work.Edit:
shortNoValis not mandatory:In this case, even though the user specified
aas no no-val, parser ignores it, relying only on the syntax to decide the kind of the argument. This is especially problematic with the modes that don't use the ruleprShortAllowSep(GnuMode), in this case the provided input is twice invalid, regardless of theshortNoVal.With the current parser architecture, parsing it this way is inevitable, though. We don't have any way to signal the error state detected with the input, so the user is expected to validate the input for mistakes.
Bundling positional arguments is nonsensical and short option can't use the separator character, so
[cmd "a", arg "=foo"]and[cmd "a", cmd "=", cmd "f"...]are both out of the question and would complicate validating, requiring keeping track of a previous argument. Hope I'm clear enough on the issue.Future work:
Looks like the new modes are already usable, but from the discussions elsewhere it looks like we might want to support special-casing multi-digit short options (
-XX..) to allow numerical options greater than 9. This complicates bundling, though, so requires a bit of thinking through.Signaling error state?