-
Notifications
You must be signed in to change notification settings - Fork 766
Revise TOML lexer #2576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise TOML lexer #2576
Conversation
The new lexer matches the TOML spec much more closely. User-visible differences should be these: * Add MIME type * Highlight string escapes * Recognize \uXXXX and \UXXXX escapes * Also recognize booleans if they are followed by a comment * Fix single quotes inside multiline literal strings (closes pygments#2488) * Prevent multiline literal strings from eating comments * Add multiline basic strings (""") * Improve datetime recognition: recognize times without dates, dates without times and datetimes without time zone; allow sub-millisecond precision * Recognize floats with exponents (they used not to be recognized when having a decimal point) * Recognize binary, octal and hex literals * Recognize strings inside table headers * Recognize table headers followed by comments * Don't parse sequences of digits as integers when they are actually keys Includes several new tests, most of which were not working before.
birkenfeld
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| (r'[A-Za-z0-9_-]+', Keyword), | ||
| (r'"', String.Double, 'basic-string'), | ||
| (r"'", String.Single, 'literal-string'), | ||
| (r'\.', Keyword), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to put the dot in the above char class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to keep this state similar to 'key'. I'd be fine with changing it though.
|
Oops — I realized that a Fixed, with a test. |
|
Looks good. Thanks! I'll try to wrap up a new release this or next weekend. |
| (r'[^"\\]+', String.Double), | ||
| ], | ||
| 'literal-string': [ | ||
| (r".*'", String.Single, '#pop'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bug. This will capture too much if there is a literal string followed by a comment containing '. This broke mypy docs build on this line:
'two\.pyi$', # but TOML's single-quoted strings do not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gosh. There should of course have been a ? here.
And the worst is I wrote a test exactly for this, but I missed that the output was wrong. I probably made some slight change after I checked all the golden outputs.
Sorry about that, will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, NP!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear. I'll try to get it done today, somehow. I'm always afraid this happens, and our release process is still fairly manual :( Goals for 2024 I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you want me to step in. Also, what would you like to automate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things I'd like to automate:
- From tag to PyPi -- ideally, to test-pypi on every tagged commit (https://github.com/marketplace/actions/pypi-publish) -- and the actual release would be a special action I just click on. It's not that it takes a lot of time, but I'm always nervous I mess something up with the command line, forget to delete a file, git clean, etc. -- I very diligently work through the
release-checklistto avoid that. Literally signing things off. - Auto-formatting -- I tend to clean up the formatting of the lexers every time close to release, at least the worst offenders. I use
autopep8at the moment, would rather applyflake8orblackon the entire codebase. - Auto-check that the new arguments like URL etc. are present on new lexers
- Auto-check
.. versionadded::is there -- costs me a lot of time to open up every Lexer close to release and make sure it's present and in the right format (i.e.2.17.0vs.2.17) - Actually get all checks working/passing (i.e. the additional checkers I wrote and possibly PyLint).
check_whitespace_tokensandcheck_repeated_tokensneed an expected-fail list so we can whitelist currently existing lexers until we fix those, but new lexers should always pass those tests. - Verify all PR numbers closed/merged since last release are mentioned in the
CHANGESfile. I'm pretty good at assigning tasks to milestones now, but I still miss things in theCHANGESfile, and it's super time consuming to open 100 tabs, go through each item one-by-item, check the PR number/issue number is present, etc. If there was a way to auto-generate the changelog that would be even better, but my experience is that those look pretty ugly and some manual checkup is fine.
I'll get to the release in a moment, thanks for the offer though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick fix! 👍
With Pygments 2.17+, the TOML parser was rewritten [^1][^2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [^1] https://pygments.org/docs/changelog/#version-2-17-0 [^2] pygments/pygments#2576
With Pygments 2.17+, the TOML parser was rewritten[^1] and[^2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [^1] https://pygments.org/docs/changelog/#version-2-17-0 [^2] pygments/pygments#2576
With Pygments 2.17+, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576 (cherry picked from commit 7ff64f3)
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576 (cherry picked from commit 7ff64f3)
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard toml parsing barf, too. Flip to ini. [1] https://pygments.org/docs/changelog/#version-2-17-0 [2] pygments/pygments#2576
The new lexer matches the TOML spec much more closely.
User-visible differences should be these:
Includes several new tests, most of which were not working before.