Contributor

@jeanas jeanas commented Nov 8, 2023

The new lexer matches the TOML spec much more closely.

User-visible differences should be these:

  • Add MIME type
  • Highlight string escapes
  • Recognize \uXXXX and \UXXXX escapes
  • Also recognize booleans if they are followed by a comment
  • Fix single quotes inside multiline literal strings (closes #2488, "TOML: Multi-line string lexing issue when using same quote character")
  • Prevent multiline literal strings from eating comments
  • Add multiline basic strings (""")
  • Improve datetime recognition: recognize times without dates, dates without times and datetimes without time zone; allow sub-millisecond precision
  • Recognize floats with exponents (previously they were not recognized when they also had a decimal point)
  • Recognize binary, octal and hex literals
  • Recognize strings inside table headers
  • Recognize table headers followed by comments
  • Don't parse sequences of digits as integers when they are actually keys

Includes several new tests, most of which did not pass with the old lexer.
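
To illustrate the "binary, octal and hex literals" item in the list above, here is a minimal sketch of how TOML's prefixed integer forms can be recognized with a regular expression. The pattern and the helper name `is_prefixed_int` are assumptions written from the TOML grammar (lowercase prefix, underscores only between digits); they are not the rules used in the lexer itself.

```python
import re

# Hypothetical pattern for TOML's prefixed integers; not taken from the PR.
INT_LITERAL = re.compile(
    r"""
    0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*   # hexadecimal, digits are case-insensitive
  | 0o[0-7](?:_?[0-7])*               # octal
  | 0b[01](?:_?[01])*                 # binary
    """,
    re.VERBOSE,
)


def is_prefixed_int(text):
    """Return True if the whole string is a hex, octal or binary literal."""
    return INT_LITERAL.fullmatch(text) is not None


for candidate in ["0xDEAD_beef", "0o755", "0b1101_0110", "0x_1", "0b102"]:
    print(candidate, is_prefixed_int(candidate))
```

The first three candidates are accepted; the last two are rejected because an underscore must sit between two digits and `2` is not a binary digit.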

Member

@birkenfeld birkenfeld left a comment

Looks good!

(r'[A-Za-z0-9_-]+', Keyword),
(r'"', String.Double, 'basic-string'),
(r"'", String.Single, 'literal-string'),
(r'\.', Keyword),
Member

Any reason not to put the dot in the above char class?

Contributor Author

I was trying to keep this state similar to 'key'. I'd be fine with changing it though.
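
For readers following the question above, this is a small sketch of what the two options do to a dotted key. The `tokenize` helper is hypothetical and only imitates the "first matching rule wins" behaviour of a lexer state; it is not Pygments code.

```python
import re


def tokenize(text, patterns):
    """Split text by repeatedly applying the first pattern that matches."""
    pos, tokens = 0, []
    while pos < len(text):
        for pattern in patterns:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append(m.group())
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens


# Rules as in the reviewed snippet: bare key characters, then a separate dot rule.
separate = [r'[A-Za-z0-9_-]+', r'\.']
# Alternative raised in the review: the dot folded into the character class.
merged = [r'[A-Za-z0-9_.-]+']

separate_tokens = tokenize("foo.bar", separate)
merged_tokens = tokenize("foo.bar", merged)
print(separate_tokens)
print(merged_tokens)
```

With separate rules the key is split into three pieces, with the merged class it is one piece. Since both rules in the snippet emit `Keyword`, the rendered highlighting would be expected to look the same either way; the difference is only in how many tokens are produced.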

Contributor Author

jeanas commented Nov 9, 2023

Oops — I realized that a [ \t]+ was missing in table-key because TOML allows table headers like [ foo . bar ].

Fixed, with a test.
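
A rough sketch of why the whitespace rule matters for a header such as `[ foo . bar ]`. The rule lists and the `lexes_fully` helper below are simplified assumptions for illustration, not the actual `table-key` state; a real Pygments lexer would emit error tokens for unmatched characters rather than stop.

```python
import re

# Simplified stand-ins for the state's rules (assumed, not copied from the PR).
KEY_RULES = [r'[A-Za-z0-9_-]+', r'\.', r'\]']
KEY_RULES_WITH_SPACE = KEY_RULES + [r'[ \t]+']


def lexes_fully(text, rules):
    """Return True if every character of text is consumed by some rule."""
    pos = 0
    while pos < len(text):
        for rule in rules:
            m = re.compile(rule).match(text, pos)
            if m:
                pos = m.end()
                break
        else:
            return False  # no rule applies at this position
    return True


header_body = " foo . bar ]"  # the part of "[ foo . bar ]" after the opening bracket
print(lexes_fully(header_body, KEY_RULES))
print(lexes_fully(header_body, KEY_RULES_WITH_SPACE))
```

Without the `[ \t]+` rule the very first space has no matching rule, while the compact form `foo.bar]` is handled by both rule sets.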

@Anteru Anteru added the A-lexing area: changes to individual lexers label Nov 10, 2023
Collaborator

Anteru commented Nov 10, 2023

Looks good. Thanks! I'll try to wrap up a new release this or next weekend.

@Anteru Anteru merged commit 6bc0332 into pygments:master Nov 10, 2023
@Anteru Anteru added this to the 2.17 milestone Nov 10, 2023
@Anteru Anteru added the changelog-update Items which need to get mentioned in the changelog label Nov 10, 2023
@jeanas jeanas deleted the toml branch November 10, 2023 19:37
@jeanas jeanas removed the changelog-update Items which need to get mentioned in the changelog label Nov 11, 2023
(r'[^"\\]+', String.Double),
],
'literal-string': [
(r".*'", String.Single, '#pop'),

I think this is a bug. This will capture too much if there is a literal string followed by a comment containing '. This broke mypy docs build on this line:

              'two\.pyi$',  # but TOML's single-quoted strings do not

see https://github.com/python/mypy/actions/runs/6916132945/job/18815845352

Contributor Author

Gosh. There should of course have been a ? here.

And the worst part is that I wrote a test for exactly this, but I missed that the output was wrong. I probably made some slight change after I checked all the golden outputs.

Sorry about that, will fix.
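
The problem reported above can be reproduced with plain `re`, outside Pygments. This sketch assumes the opening quote has already been consumed when the `literal-string` rule runs, and it assumes the missing `?` belongs right after `.*`, making the quantifier non-greedy; the actual fix commit may differ in detail.

```python
import re

# The TOML line from the mypy docs quoted in the report.
line = r"'two\.pyi$',  # but TOML's single-quoted strings do not"

# Assumption: the lexer is positioned just after the opening quote.
rest = line[1:]

greedy = re.compile(r".*'")   # rule as merged
lazy = re.compile(r".*?'")    # same rule with a non-greedy quantifier

greedy_match = greedy.match(rest).group()
lazy_match = lazy.match(rest).group()

print(repr(greedy_match))
print(repr(lazy_match))
```

The greedy pattern runs to the last apostrophe on the line, the one inside `TOML's` in the comment, so part of the comment is swallowed into the string. The non-greedy pattern stops at the first apostrophe, which is the real closing quote.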

Thanks, NP!

Contributor Author

Fixed in 220a2a9.

@Anteru In case you have some spare time to do a bugfix release... Thanks!

Collaborator

Oh dear. I'll try to get it done today, somehow. I'm always afraid this happens, and our release process is still fairly manual :( Goals for 2024 I guess.

Member

Let me know if you want me to step in. Also, what would you like to automate?

Collaborator

@Anteru Anteru Nov 19, 2023

Things I'd like to automate:

  • From tag to PyPI -- ideally, to test-pypi on every tagged commit (https://github.com/marketplace/actions/pypi-publish) -- and the actual release would be a special action I just click on. It's not that it takes a lot of time, but I'm always nervous I'll mess something up with the command line, forget to delete a file, git clean, etc. -- I very diligently work through the release-checklist to avoid that. Literally signing things off.
  • Auto-formatting -- I tend to clean up the formatting of the lexers every time close to release, at least the worst offenders. I use autopep8 at the moment, would rather apply flake8 or black on the entire codebase.
  • Auto-check that the new arguments like URL etc. are present on new lexers
  • Auto-check .. versionadded:: is there -- costs me a lot of time to open up every Lexer close to release and make sure it's present and in the right format (i.e. 2.17.0 vs. 2.17)
  • Actually get all checks working/passing (i.e. the additional checkers I wrote and possibly PyLint). check_whitespace_tokens and check_repeated_tokens need an expected-fail list so we can whitelist currently existing lexers until we fix those, but new lexers should always pass those tests.
  • Verify all PR numbers closed/merged since last release are mentioned in the CHANGES file. I'm pretty good at assigning tasks to milestones now, but I still miss things in the CHANGES file, and it's super time-consuming to open 100 tabs, go through each item one by one, check the PR number/issue number is present, etc. If there was a way to auto-generate the changelog that would be even better, but my experience is that those look pretty ugly and some manual checkup is fine.

I'll get to the release in a moment, thanks for the offer though!
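
As one possible starting point for the CHANGES check in the list above, here is a minimal sketch. The function name `missing_from_changes`, the `#NNNN` reference convention, the sample text and the PR number 2590 are all assumptions made up for illustration; fetching the list of merged PRs from GitHub is left out.

```python
import re


def missing_from_changes(pr_numbers, changes_text):
    """Return the PR numbers that are not mentioned as '#NNNN' in the text."""
    mentioned = {int(n) for n in re.findall(r"#(\d+)", changes_text)}
    return sorted(set(pr_numbers) - mentioned)


# Made-up changelog excerpt, not the real CHANGES file.
sample_changes = """\
Version 2.17.0
--------------
- Updated lexers:
  * TOML: rewrite to match the spec more closely (#2576, #2488)
"""

merged_prs = [2576, 2590]  # 2590 is a placeholder number
missing = missing_from_changes(merged_prs, sample_changes)
print(missing)
```

A check like this could run in CI before a release and fail when the returned list is non-empty, leaving the wording of the changelog entries themselves to manual review.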

Thank you for the quick fix! 👍

@jeanas jeanas mentioned this pull request Nov 19, 2023
awelzel added a commit to zeek/zeek-docs that referenced this pull request Nov 20, 2023
With Pygments 2.17+, the TOML parser was rewritten [^1][^2]. It now fails
to parse and highlight the `full-config.ini` file. The key-only `agent-testbox`
within `[instances]` makes the standard toml parsing barf, too. Flip to ini.

[^1] https://pygments.org/docs/changelog/#version-2-17-0
[^2] pygments/pygments#2576
awelzel added a commit to zeek/zeek-docs that referenced this pull request Nov 20, 2023
With Pygments 2.17+, the TOML parser was rewritten[^1] and[^2]. It now fails
to parse and highlight the `full-config.ini` file. The key-only `agent-testbox`
within `[instances]` makes the standard toml parsing barf, too. Flip to ini.

[^1] https://pygments.org/docs/changelog/#version-2-17-0
[^2] pygments/pygments#2576
awelzel added a commit to zeek/zeek-docs that referenced this pull request Nov 20, 2023
With Pygments 2.17+, the TOML parser was rewritten[1, 2]. It now fails
to parse and highlight the `full-config.ini` file. The key-only `agent-testbox`
within `[instances]` makes the standard toml parsing barf, too. Flip to ini.

[1] https://pygments.org/docs/changelog/#version-2-17-0
[2] pygments/pygments#2576
awelzel added a commit to zeek/zeek-docs that referenced this pull request Nov 20, 2023
With Pygments 2.17, the TOML parser was rewritten[1, 2]. It now fails
to parse and highlight the `full-config.ini` file. The key-only `agent-testbox`
within `[instances]` makes the standard toml parsing barf, too. Flip to ini.

[1] https://pygments.org/docs/changelog/#version-2-17-0
[2] pygments/pygments#2576
ckreibich pushed a commit to zeek/zeek-docs that referenced this pull request Jan 18, 2024

(cherry picked from commit 7ff64f3)
timwoj pushed a commit to zeek/zeek-docs that referenced this pull request Sep 11, 2025
timwoj pushed a commit to zeek/zeek that referenced this pull request Sep 12, 2025
timwoj pushed a commit to zeek/zeek that referenced this pull request Sep 15, 2025