-
Notifications
You must be signed in to change notification settings - Fork 766
Added lexer for Janet language. #1519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Janet syntax is lisp-like, and also has similarities with Clojure (one main difference from both is that Janet uses |
| (r'0[xX][a-fA-F0-9]+', Number.Hex), | ||
|
|
||
| # double-quoted string | ||
| (r'@?"(\\\\|\\"|[^"])*"', String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to result in catastrophic backtracking (similar for the case below.) Please do two things to make sure we won't suffer from this:
- Add a test case with a long string with backslashes in a string. That should take a looong time to execute, potentially timeout.
- Change it to
[^\\"]to avoid the backtracking (same below), the test should pass just fine now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Anteru. Sorry for the delay. Thank you for investigating this. Can you please tell me where to add this test case? I'm not familiar with adding tests to pygments, and also don't see a tests/test_lisp.py file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just create a new file then :) The way to write a test is to use the testcase formatter (see also the docs), here's an example of the exact same bug just in the JsonLexer which you can use as a blueprint: gerner@ec77778
Anteru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see an unit test for the problematic string regex, because that's prone to return if left unchecked. Other than that looks good to me.
|
Hi Anteru, thanks so much for your review work, and patience here. My free time as almost completely dried up in the last few weeks, and I'm looking up the git docs now on how to proceed (my git knowledge is currently very thin!). |
|
Anteru, does your correction ( Incidentally, both double-quoted and backtick-quoted strings can be multi-line. |
|
I only applied it to double-quoted strings. Do backtick-quoted strings allow for escape characters inside? At some point it may be easier to mimic what the Python lexer does and have separate rules for handling the various escapes. Also, no need to hurry, we just released 2.7 and 2.8 is not coming that soon :) |
|
No, backtick-quoted strings do not allow for escape sequences. |
|
But backtick quoted strings do allow for: The language allows for even more backticks (as long as there's the same number of them starting as ending the string), but so far the most I've seen people use is |
|
My advice would be to look at the Python lexer family and copy that, using three states -- for single, double, triple quoted backticks. Same as the string handling there -- that'll be much less hassle than any other approach IMHO. |
|
Sorry I dropped the ball on this. Unable to make time to look more at this right now, and my git knowledge needs to be better so I can get this current before continuing. I'd like to return to it when I can, but if someone wants to pick it up and run with it that's fine with me. |
No description provided.