
Feat: std: parseopt parser modes #25506

Merged
Araq merged 7 commits into nim-lang:devel from ZoomRmc:zparseopt2
Feb 16, 2026

@ZoomRmc
Contributor

ZoomRmc commented Feb 12, 2026

Adds configurable parser modes to std/parseopt module. Take two.

It initially solved the problem of not being able to pass values to short options the way most everyday CLI programs allow, but reading the tests made me add more features so that some of the behaviour could be changed, and here we are.

std/parseopt now supports three parser modes via an optional mode parameter in initOptParser and getopt.

Three modes are provided:

  • NimMode (default, fully backward compatible),
  • LaxMode (POSIX-inspired with relaxed short option handling),
  • GnuMode (stricter GNU-style conventions).

The new modes are marked as experimental in the documentation.

The parser behaviour is controlled by a new ParserRules enum, which provides granular feature flags from which the modes are built. Users with specific requirements can define custom rule sets by importing private symbols; this is mentioned, but clearly marked as unsupported.

Backward compatibility:

The default mode preserves existing behaviour completely, with a single exception: allowWhitespaceAfterColon is deprecated.

allowWhitespaceAfterColon no longer makes much sense as a single tuning knob; the rule ParserRule.prSepAllowDelimAfter now controls this behaviour.
Because allowWhitespaceAfterColon had a default value, most calls never mention it, so they will silently migrate to the new initOptParser overload. To cover calls that pass the parameter explicitly, I added an overload that adjusts the default parser mode to reflect the requested allowWhitespaceAfterColon value. Apart from the deprecation warning, the migration should be seamless for most users.
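A minimal sketch of the two call styles described above, using only parameters that exist both before and after this change. The helper `drain` is hypothetical. The explicit `allowWhitespaceAfterColon = false` call is expected to compile with a deprecation warning once this PR is in; the results assume the compatibility overload reproduces the old behaviour, as stated.

```nim
import std/parseopt

proc drain(p: var OptParser): seq[(CmdLineKind, string, string)] =
  ## Hypothetical helper: collects every (kind, key, val) triple the parser yields.
  for kind, key, val in p.getopt():
    result.add((kind, key, val))

# Typical call site: the parameter is never mentioned, so the default (true)
# applies and the token after "--out:" becomes the option's value.
var implicitCall = initOptParser(@["--out:", "dir"])
let implicitTokens = drain(implicitCall)

# Explicit call site: still accepted (now deprecated). With `false` the value
# stays empty and "dir" is reported as a positional argument.
var explicitCall = initOptParser(@["--out:", "dir"], allowWhitespaceAfterColon = false)
let explicitTokens = drain(explicitCall)

echo implicitTokens
echo explicitTokens
```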

The only thing I think can be classified as a breaking change is the fix for a surprising bug in the old parser:

```nim
import std/parseopt
var p = initOptParser("-n 10 -m20 -k= 30 -40", shortNoVal = {'v'})
#                                     ^-disappears
for kind, key, val in p.getopt(): echo((kind, key, val))
```

This is with the aforementioned allowWhitespaceAfterColon being true by default, of course. In this case the 30 token is skipped completely. I don't think that's right, so it's fixed.

Things I still don't like about how the old parser and the new default mode behave:

  1. Parser behaviour is controlled by whether two containers are empty. This is an interesting approach, made even more interesting by the fact that shortNoVal/longNoVal control not only their namesakes (options without values) but also how their opposites (value-taking options) work.
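The effect of the two containers can be seen with the default mode and the long-standing API alone. In this sketch (`tokens` is a hypothetical helper) only the contents of `shortNoVal`/`longNoVal` change between the two runs, yet both the flags and the value-taking options are parsed differently.

```nim
import std/parseopt

proc tokens(p: var OptParser): seq[(CmdLineKind, string, string)] =
  ## Hypothetical helper: collects the parser's output as plain tuples.
  for kind, key, val in p.getopt():
    result.add((kind, key, val))

let cmdLine = "-j4 --first bar"

# Both containers empty: "-j4" is a bundle of two flags and "bar" is an argument.
var emptyNoVal = initOptParser(cmdLine)
let emptyTokens = tokens(emptyNoVal)

# Non-empty containers: every option *not* listed in them now takes a value,
# either glued on ("-j4") or as the next token ("--first bar").
var withNoVal = initOptParser(cmdLine, shortNoVal = {'c'}, longNoVal = @["second"])
let withTokens = tokens(withNoVal)

echo emptyTokens
echo withTokens
```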

Edit:

  2. shortNoVal is not enforced:

    ```nim
    let p = initOptParser(@["-a=foo"], shortNoVal = {'a'})
    # NimMode, LaxMode parse as: (cmdShortOption, "a", "foo")
    # GnuMode parses as:         (cmdShortOption, "a", "=foo")
    ```

    In this case, even though the user declared a as a no-value option, the parser ignores this and relies only on the syntax to decide the kind of the argument. This is especially problematic in modes that don't use the rule prShortAllowSep (GnuMode): there the input is invalid on two counts, regardless of shortNoVal.

    With the current parser architecture, parsing it this way is inevitable, though. There is no way to signal an error detected in the input, so the user is expected to validate it.
    Bundling positional arguments is nonsensical, and the separator character can't itself be a short option, so [cmd "a", arg "=foo"] and [cmd "a", cmd "=", cmd "f"...] are both out of the question; they would also complicate validation by requiring the user to keep track of the previous argument. I hope I'm clear enough on the issue.
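Since the parser cannot report this, validation falls to the caller. A minimal sketch of such a check, assuming the default NimMode (where "-a=foo" yields the value "foo", as stated above); `noValViolations` is a hypothetical helper, not part of std/parseopt. In GnuMode the value would be "=foo", which this check would flag just the same.

```nim
import std/parseopt

proc noValViolations(cmdline: seq[string]; shortNoVal: set[char]): seq[string] =
  ## Hypothetical helper: reports every short option listed in `shortNoVal`
  ## that nevertheless received a value from the parser.
  var p = initOptParser(cmdline, shortNoVal = shortNoVal)
  for kind, key, val in p.getopt():
    if kind == cmdShortOption and key.len == 1 and key[0] in shortNoVal and
        val.len > 0:
      result.add("-" & key & " takes no value, but got: " & val)

let bad = noValViolations(@["-a=foo"], {'a'})      # value glued on with '='
let good = noValViolations(@["-a", "foo"], {'a'})  # "foo" is a separate argument
echo bad
echo good
```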

Future work:

  1. The new modes already look usable, but discussions elsewhere suggest we might want to special-case multi-digit short options (-XX..) to allow numeric options greater than 9. This complicates bundling, though, so it needs more thought.

  2. Signaling error state?
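Why multi-digit short options are awkward today can be seen with the default mode: depending on whether `shortNoVal` is empty, "-12" is read either as two bundled flags or as option "1" with the value "2", but never as an option "12". `shortPairs` is a hypothetical helper used only for this illustration.

```nim
import std/parseopt

proc shortPairs(cmdline: seq[string]; shortNoVal: set[char] = {}): seq[(string, string)] =
  ## Hypothetical helper: returns (key, val) for each short option parsed.
  var p = initOptParser(cmdline, shortNoVal = shortNoVal)
  for kind, key, val in p.getopt():
    if kind == cmdShortOption:
      result.add((key, val))

let bundled = shortPairs(@["-12"])                     # empty shortNoVal
let valued = shortPairs(@["-12"], shortNoVal = {'v'})  # non-empty shortNoVal
echo bundled
echo valued
```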

@Araq
Member

Araq commented Feb 16, 2026

The implementation seems to be alright, the documentation leaves a lot to be desired. Keep the existing docs mostly as they are. Add a sentence: "There are different parsing modes. NimMode is the default. Read this for more information of the other modes and how they differ from each other." Make a separate .md documentation page and link to the produced html.

@ZoomRmc
Contributor Author

ZoomRmc commented Feb 16, 2026

the documentation leaves a lot to be desired

Could you elaborate a bit? Do you mean only the new additions or the whole thing?

Keep the existing docs mostly as they are. Add a sentence: "There are different parsing modes. NimMode is the default. Read this for more information of the other modes and how they differ from each other." Make a separate .md documentation page and link to the produced html.

Can't say I like this suggestion very much. I can do that, of course, but I'm a bit concerned that we'll have to link back a lot and the routine-level documentation won't be visible alongside. What exactly are we trying to achieve there: articulating the experimental nature of the new modes?

@Araq
Member

Araq commented Feb 16, 2026

I'm trying to achieve that newcomers can use parseopt without being concerned about edge cases and without being overwhelmed with details. parseopt should be simple to understand and use.

@ZoomRmc
Copy link
Copy Markdown
Contributor Author

ZoomRmc commented Feb 16, 2026

That would require a redesign. parseopt is quite a low-level parser, not something inherently simple. Moreover, the bulk of the existing docs explains how the idiosyncratic NoVal arguments change the way the parser works.

I also don't like separating out the new additions to the docs, because at least a third of them explain how the default mode works. This is helpful for newcomers: GnuMode's behavior, for example, is much more common, so they can easily see how the default mode differs.

@Araq
Copy link
Copy Markdown
Member

Araq commented Feb 16, 2026

Well but before you came along the docs were simpler, and I've seen many people use parseopt successfully over the years. There is not much "low level" going on here: you need a loop and a case statement. The alternatives always come down to requiring some mapping between CLI switches and data management too (they need their own version of the case statement). That is exactly why I never felt the need for a DSL for this step; Nim's case already works on strings, and that's all that matters.

You're concerned with the edge cases, good, but the existing documentation was sufficient.
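For reference, the "loop and a case statement" pattern described above looks roughly like this with the default mode. The `Options` object and the `parseOptions` proc are hypothetical names for illustration; unknown switches are simply ignored in this sketch.

```nim
import std/parseopt

# Hypothetical application settings filled from the command line.
type Options = object
  verbose: bool
  jobs: string
  files: seq[string]

proc parseOptions(cmdline: seq[string]): Options =
  for kind, key, val in getopt(cmdline):
    case kind
    of cmdArgument:
      result.files.add key
    of cmdLongOption, cmdShortOption:
      # Nim's case works on strings, so switches map directly to fields.
      case key
      of "v", "verbose": result.verbose = true
      of "j", "jobs": result.jobs = val
      else: discard  # unknown switches are ignored in this sketch
    of cmdEnd:
      discard  # getopt stops before yielding cmdEnd

let opts = parseOptions(@["-v", "--jobs:4", "a.txt", "b.txt"])
echo opts
```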

@ZoomRmc
Copy link
Copy Markdown
Contributor Author

ZoomRmc commented Feb 16, 2026

Well but before you came along the docs were simpler

They weren't, because I didn't touch any docs besides one sentence at the start and converting the examples into runnable ones. They weren't easy to follow either, but that's a matter of opinion.

I'm doing what you asked, even though it doesn't feel nice. Splitting the new additions requires improvements in other parts to bring them to informational parity, and improving the old docs extends the scope of this PR.

@Araq
Copy link
Copy Markdown
Member

Araq commented Feb 16, 2026

No, I'm taking this version now. Nobody reads this anyway, people will vibecode and the AI has no attention deficit.

Araq merged commit 7c873ca into nim-lang:devel Feb 16, 2026
18 checks passed
@github-actions
Contributor

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 7c873ca

Hint: mm: orc; opt: speed; options: -d:release
189824 lines; 11.567s; 801.609MiB peakmem

Araq pushed a commit that referenced this pull request Feb 16, 2026
Follow-up to #25506.
As I mentioned there, I was in the middle of an edit, so here it is.
Splitting into a separate doc was skipped.

A couple of minor mistakes fixed; some passages made more concise.
narimiran pushed a commit that referenced this pull request Feb 20, 2026
Co-authored-by: Andreas Rumpf <[email protected]>
(cherry picked from commit 7c873ca)
narimiran pushed a commit that referenced this pull request Feb 20, 2026
(cherry picked from commit 72e9bfe)