Improve vscode syntax highlighting#6543
Conversation
josh11b
left a comment
There was a problem hiding this comment.
I'm finding this difficult to review. Is there a way to include comments in the file saying what you are doing? Also, possibly break this into smaller changes that can each get their own explanation?
ozguronsoy
left a comment
There was a problem hiding this comment.
I added comments to the new patterns explaining what they highlight.
| }, | ||
| { | ||
| "name": "support.class.carbon", | ||
| "comment": "Handles `class C`, `class C(I)`, `impl C as I(A)`, etc.", |
There was a problem hiding this comment.
I don't see this in the schema? The only use of comment keys I see are for syntax-highlighting comments in the underlying source code.
What would help here are comments in the actual grammar definition. It would also be good to include the explanations from your code review replies in comments.
While JSON strictly doesn't have a comment syntax, many parsers accept // ... comments, and the spec for textmate grammars seems to use these in its examples, so maybe that works here? https://macromates.com/manual/en/language_grammars
There was a problem hiding this comment.
Unfortunately // style comments do not work with the vscode textmate parser. The comment key is not in the schema but the parser just ignores the invalid keys.
ozguronsoy
left a comment
There was a problem hiding this comment.
Added comments from review and limited numeric type literal highlighting. It used to highlight anything that starts with i, u, or f and contains only numbers after like i1. Now it highlights common types.
ozguronsoy
left a comment
There was a problem hiding this comment.
- Removed matching compound member access.
- Fixed matching multiple generic parameters
class C(I1, I2, D(E).F). - Fixed matching dot chaining
impl C as I(A.B).
ozguronsoy
left a comment
There was a problem hiding this comment.
- Added
consttomodifier-keywords. - Added
charandstrtotype-literals. - Refactored type, function, and variable matching to use
begin/endto simplify regex and handle line breaks. - Fixed nested parens, commas, and dots when matching function return types.
- Fixed broken type highlighting after
forall. - Added
adapt,alias,constraint,impls,where, andchoiceto type matching rules. - Highlighted choice values as enum values.
Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.
It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later. |
Also, there is a lot going on here in the comments explaining various things being done and added. Can those be repeated/lifted up into the PR description so they will go in the git log? |
Done. |
danakj
left a comment
There was a problem hiding this comment.
It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?
| "name": "keyword.operator.carbon", | ||
| "match": "\\b(addr|and|as|impls|in|like|not|or|partial|template|where)\\b" | ||
| "name": "keyword.control.carbon", | ||
| "match": "\\b(addr|and|as|impls|in|like|not|or|partial|ref|template|where)\\b" |
There was a problem hiding this comment.
ref seems like it would fit more correctly with let and var (which I see in two different places), how come it's here?
likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?
And btw addr has been removed from the language, replaced by ref.
There was a problem hiding this comment.
ref seems like it would fit more correctly with let and var
Do you mean in introducer-keywords or in variables?
likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?
Yes, they should also be here. Otherwise a rule that uses begin/end might "consume" these keywords since they also wouldn't be in #reserved.
And btw addr has been removed from the language, replaced by ref.
Okay, thanks. I'll remove it.
Quick further thought - if we can do that then we should be able to generate an image of those comprehensive examples before this PR and after, to experience the process of looking at said image diffs. |
Co-authored-by: Dana Jansens <[email protected]>
Co-authored-by: Dana Jansens <[email protected]>
|
been busy ay I agree highlighting invalid tokens should be done by semantic highlighting https://www.npmjs.com/package/textmate-grammar-test can be used for writing tests my extension has a code-action for optimizing the regexes |
I added examples and before/after screenshots. I'll add a description to |
Thanks, I should have a chance to look tomorrow. |
danakj
left a comment
There was a problem hiding this comment.
Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?
There was a problem hiding this comment.
If we have to split the image file, then we should split the carbon file. Maybe one for class/interface/constraint and one for variables/functions and one for choice (which will get more complicated eventually)?
Or maybe the variables/functions part could move into the variables.carbon and functions.carbon files?
Also let's include an impl as and extend impl as inside a class, and an extend require impls in the constraint.
| "end": "(?=[;{])", | ||
| "patterns": [ | ||
| { "include": "#reserved" }, | ||
| { "include": "#variables" }, |
There was a problem hiding this comment.
If we can just keep one of these, that sounds nicer.
In practice I think parameter bindings is going to break with things like ref and const ref (and later, safety annotations), so we'll need more than a simple regex match, and more like variables with the begin/end, but without requiring it to start with var or let.
Bit of convo about generating the images in #infra. The high-level bit is that this syntax highlighting system will be replaced entirely by the LSP in the future, and it may not be that much work to get the LSP working well enough to replace this. So it's not worth putting too much effort into this textmate stuff. For this PR, I propose we check in the samples but don't add any work into automating them. It's a bit of a pain to update them, but maybe we won't want to keep updating them. For next steps, I wondered if you'd be interested in working on the LSP to get it to the point where it can replace this textmate stuff? |
|
semantic highlighting should only be used as an enhancement, for when TextMate falls short, not a replacement |
danakj
left a comment
There was a problem hiding this comment.
Thanks! This LG now generally.
Thanks for the before/after, since we had no "before" when this PR was created, it allowed for a nice comparison.
For checking it in, can we remove the before jpgs (they don't need to live in the tree), and rename the others to drop the _after suffix. Then in the future it will simply become a diff of the image in the tree vs the image in a PR.
Yeah, I think automizing tests for textmate would be too much effort. FWIW, I think textmate should stay for highlighting until LSP launches. When using slow hardware, it takes minutes to launch the LSP.
Sure, I'll take a look at it after this. |
Are you saying that launching |
I wasn't talking about carbon, it's sub-second for me too but I develop carbon with my own setup. Most defense sector companies don't allow that and give you an old laptop with two cores. Those laptops struggle with everything though but one of the things that irritated me was waiting for the highlighting to work. Maybe that's just my experience. |
To be sure, if you're used to using clangd, I understand a poor performance expectation. There are multiple reasons for that, such as C++ needing parsing of headers in order to understand code, and also elements of Clang's AST perhaps giving some marginal slowdown too. In contrast though, Carbon's a context-free grammar; in the case of highlighting, we should be able to highlight with a parse (of the single file being higlighted). Since we're using a hand-built parser, it might actually be faster than textmate's regular expressions. |
This PR improves the vscode syntax highlighting.
commentkeys.0x,0b,0xa, etc.i1.as).charandstrto type literals.constto modifier keywords.addrkeyword.begin/endto handle line breaks.choicevalues asenumvalues.types.adapt,alias,choice,constraint,impl,interface,as, andimpls.functions.variables.variable.