Skip to content

Conversation

@uvtc
Copy link
Contributor

@uvtc uvtc commented Aug 23, 2020

No description provided.

@uvtc
Copy link
Contributor Author

uvtc commented Aug 23, 2020

Janet syntax is lisp-like, and also has similarities with Clojure (one main difference from both is that Janet uses # for comments instead of ;).

@Anteru Anteru self-assigned this Aug 24, 2020
(r'0[xX][a-fA-F0-9]+', Number.Hex),

# double-quoted string
(r'@?"(\\\\|\\"|[^"])*"', String),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to result in catastrophic backtracking (similar for the case below.) Please do two things to make sure we won't suffer from this:

  • Add a test case with a long string with backslashes in a string. That should take a looong time to execute, potentially timeout.
  • Change it to [^\\"] to avoid the backtracking (same below), the test should pass just fine now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Anteru. Sorry for the delay. Thank you for investigating this. Can you please tell me where to add this test case? I'm not familiar with adding tests to pygments, and also don't see a tests/test_lisp.py file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just create a new file then :) The way to write a test is to use the testcase formatter (see also the docs), here's an example of the exact same bug just in the JsonLexer which you can use as a blueprint: gerner@ec77778

Copy link
Collaborator

@Anteru Anteru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see an unit test for the problematic string regex, because that's prone to return if left unchecked. Other than that looks good to me.

@uvtc
Copy link
Contributor Author

uvtc commented Sep 12, 2020

Hi Anteru, thanks so much for your review work, and patience here. My free time as almost completely dried up in the last few weeks, and I'm looking up the git docs now on how to proceed (my git knowledge is currently very thin!).

@uvtc
Copy link
Contributor Author

uvtc commented Sep 12, 2020

Anteru, does your correction ([^\\"]) apply to the regexes for the double-quoted strings and for the backtick-quoted strings? Note that, unlike double-quoted strings, backtick-quoted strings may utilize multiple backticks, as in

  `some str`
 ``some str``
```some str```

Incidentally, both double-quoted and backtick-quoted strings can be multi-line.

@Anteru
Copy link
Collaborator

Anteru commented Sep 12, 2020

I only applied it to double-quoted strings. Do backtick-quoted strings allow for escape characters inside? At some point it may be easier to mimic what the Python lexer does and have separate rules for handling the various escapes.

Also, no need to hurry, we just released 2.7 and 2.8 is not coming that soon :)

@uvtc
Copy link
Contributor Author

uvtc commented Sep 12, 2020

No, backtick-quoted strings do not allow for escape sequences.

@uvtc
Copy link
Contributor Author

uvtc commented Sep 12, 2020

But backtick quoted strings do allow for:

``this is `something` here``
```this is ``something`` here```

The language allows for even more backticks (as long as there's the same number of them starting as ending the string), but so far the most I've seen people use is

```three```

@Anteru
Copy link
Collaborator

Anteru commented Sep 14, 2020

My advice would be to look at the Python lexer family and copy that, using three states -- for single, double, triple quoted backticks. Same as the string handling there -- that'll be much less hassle than any other approach IMHO.

@uvtc
Copy link
Contributor Author

uvtc commented Nov 16, 2020

Sorry I dropped the ball on this. Unable to make time to look more at this right now, and my git knowledge needs to be better so I can get this current before continuing. I'd like to return to it when I can, but if someone wants to pick it up and run with it that's fine with me.

@jeanas jeanas added the update needed Waiting for an update from the PR/issue creator label Mar 2, 2022
@sogaiu sogaiu mentioned this pull request Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

update needed Waiting for an update from the PR/issue creator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants