Skip to content

Improve vscode syntax highlighting#6543

Merged
danakj merged 19 commits intocarbon-language:trunkfrom
ozguronsoy:fix/syntax-higlighting
Feb 13, 2026
Merged

Improve vscode syntax highlighting#6543
danakj merged 19 commits intocarbon-language:trunkfrom
ozguronsoy:fix/syntax-higlighting

Conversation

@ozguronsoy
Copy link
Contributor

@ozguronsoy ozguronsoy commented Dec 30, 2025

This PR improves the vscode syntax highlighting.

  • Added comment keys.
  • Added highlighting of invalid numbers such as 0x, 0b, 0xa, etc.
  • Restricted highlighting of numeric type literals to common types to avoid highlighting identifiers such as i1.
  • Changed the highlighting of named operators (e.g., as).
  • Added char and str to type literals.
  • Added const to modifier keywords.
  • Removed addr keyword.
  • Refactored some rules to use begin/end to handle line breaks.
  • Added highlighting to choice values as enum values.
  • Updated the rules for matching types.
    • Added highlighting to rhs of adapt, alias, choice, constraint, impl, interface, as, and impls.
    • Added highlighting to rhs of bindings.
    • Added highlighting to function return types.
  • Updated the rules for matching functions.
  • Updated the rules for matching variables.
  • Added highlighting unidentified words as variable.
  • Added examples and before/after screenshots.
Before After
before now

@ozguronsoy ozguronsoy requested a review from a team as a code owner December 30, 2025 23:25
@ozguronsoy ozguronsoy requested review from josh11b and removed request for a team December 30, 2025 23:25
@github-actions github-actions bot added the utilities Utilities for working with Carbon code: syntax highlighting, editor plugins, etc. label Dec 30, 2025
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding this difficult to review. Is there a way to include comments in the file saying what you are doing? Also, possibly break this into smaller changes that can each get their own explanation?

Copy link
Contributor Author

@ozguronsoy ozguronsoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments to the new patterns explaining what they highlight.

},
{
"name": "support.class.carbon",
"comment": "Handles `class C`, `class C(I)`, `impl C as I(A)`, etc.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in the schema? The only use of comment keys I see are for syntax-highlighting comments in the underlying source code.

What would help here are comments in the actual grammar definition. It would also be good to include the explanations from your code review replies in comments.

While JSON strictly doesn't have a comment syntax, many parsers accept // ... comments, and the spec for textmate grammars seems to use these in its examples, so maybe that works here? https://macromates.com/manual/en/language_grammars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately // style comments do not work with the vscode textmate parser. The comment key is not in the schema but the parser just ignores the invalid keys.

Copy link
Contributor Author

@ozguronsoy ozguronsoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments from review and limited numeric type literal highlighting. It used to highlight anything that starts with i, u, or f and contains only numbers after like i1. Now it highlights common types.

Copy link
Contributor Author

@ozguronsoy ozguronsoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Removed matching compound member access.
  • Fixed matching multiple generic parameters class C(I1, I2, D(E).F).
  • Fixed matching dot chaining impl C as I(A.B).

Copy link
Contributor Author

@ozguronsoy ozguronsoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Added const to modifier-keywords.
  • Added char and str to type-literals.
  • Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.
  • Fixed nested parens, commas, and dots when matching function return types.
  • Fixed broken type highlighting after forall.
  • Added adapt, alias, constraint, impls, where, and choice to type matching rules.
  • Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

@ozguronsoy ozguronsoy requested a review from josh11b February 3, 2026 15:17
@danakj
Copy link
Contributor

danakj commented Feb 3, 2026

  • Added const to modifier-keywords.
  • Added char and str to type-literals.
  • Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.
  • Fixed nested parens, commas, and dots when matching function return types.
  • Fixed broken type highlighting after forall.
  • Added adapt, alias, constraint, impls, where, and choice to type matching rules.
  • Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

@danakj danakj requested review from danakj and removed request for josh11b February 3, 2026 17:20
@danakj
Copy link
Contributor

danakj commented Feb 3, 2026

  • Added const to modifier-keywords.
  • Added char and str to type-literals.
  • Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.
  • Fixed nested parens, commas, and dots when matching function return types.
  • Fixed broken type highlighting after forall.
  • Added adapt, alias, constraint, impls, where, and choice to type matching rules.
  • Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

Also, there is a lot going on here in the comments explaining various things being done and added. Can those be repeated/lifted up into the PR description so they will go in the git log?

@ozguronsoy
Copy link
Contributor Author

  • Added const to modifier-keywords.
  • Added char and str to type-literals.
  • Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.
  • Fixed nested parens, commas, and dots when matching function return types.
  • Fixed broken type highlighting after forall.
  • Added adapt, alias, constraint, impls, where, and choice to type matching rules.
  • Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

Also, there is a lot going on here in the comments explaining various things being done and added. Can those be repeated/lifted up into the PR description so they will go in the git log?

Done.

Copy link
Contributor

@danakj danakj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

"name": "keyword.operator.carbon",
"match": "\\b(addr|and|as|impls|in|like|not|or|partial|template|where)\\b"
"name": "keyword.control.carbon",
"match": "\\b(addr|and|as|impls|in|like|not|or|partial|ref|template|where)\\b"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref seems like it would fit more correctly with let and var (which I see in two different places), how come it's here?

likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?

And btw addr has been removed from the language, replaced by ref.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref seems like it would fit more correctly with let and var

Do you mean in introducer-keywords or in variables?

likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?

Yes, they should also be here. Otherwise a rule that uses begin/end might "consume" these keywords since they also wouldn't be in #reserved.

And btw addr has been removed from the language, replaced by ref.

Okay, thanks. I'll remove it.

@danakj
Copy link
Contributor

danakj commented Feb 3, 2026

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

Quick further thought - if we can do that then we should be able to generate an image of those comprehensive examples before this PR and after, to experience the process of looking at said image diffs.

@RedCMD
Copy link

RedCMD commented Feb 3, 2026

been busy ay

I agree highlighting invalid tokens should be done by semantic highlighting
you could do it in TextMate
but only if is really simple, small and no possible exceptions

https://www.npmjs.com/package/textmate-grammar-test can be used for writing tests

my extension has a code-action for optimizing the regexes
can use ctrl+enter to preview the changes
https://github.com/slevithan/oniguruma-parser?tab=readme-ov-file#-optimize-regexes

@ozguronsoy
Copy link
Contributor Author

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

Quick further thought - if we can do that then we should be able to generate an image of those comprehensive examples before this PR and after, to experience the process of looking at said image diffs.

I added examples and before/after screenshots. I'll add a description to README.md on how we generate these screenshots when we establish a convention.

@ozguronsoy ozguronsoy requested a review from danakj February 10, 2026 13:55
@danakj
Copy link
Contributor

danakj commented Feb 11, 2026

ozguronsoy requested a review from danakj yesterday

Thanks, I should have a chance to look tomorrow.

Copy link
Contributor

@danakj danakj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to split the image file, then we should split the carbon file. Maybe one for class/interface/constraint and one for variables/functions and one for choice (which will get more complicated eventually)?

Or maybe the variables/functions part could move into the variables.carbon and functions.carbon files?

Also let's include an impl as and extend impl as inside a class, and an extend require impls in the constraint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"end": "(?=[;{])",
"patterns": [
{ "include": "#reserved" },
{ "include": "#variables" },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can just keep one of these, that sounds nicer.

In practice I think parameter bindings is going to break with things like ref and const ref (and later, safety annotations), so we'll need more than a simple regex match, and more like variables with the begin/end, but without requiring it to start with var or let.

@danakj
Copy link
Contributor

danakj commented Feb 12, 2026

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

Bit of convo about generating the images in #infra. The high-level bit is that this syntax highlighting system will be replaced entirely by the LSP in the future, and it may not be that much work to get the LSP working well enough to replace this.

So it's not worth putting too much effort into this textmate stuff. For this PR, I propose we check in the samples but don't add any work into automating them. It's a bit of a pain to update them, but maybe we won't want to keep updating them.

For next steps, I wondered if you'd be interested in working on the LSP to get it to the point where it can replace this textmate stuff?

@RedCMD
Copy link

RedCMD commented Feb 13, 2026

semantic highlighting should only be used as an enhancement, for when TextMate falls short, not a replacement
as the TextMate highlighting still has control over many features like intelisense, comments and brackets
and a quicker response time

Copy link
Contributor

@danakj danakj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This LG now generally.

Thanks for the before/after, since we had no "before" when this PR was created, it allowed for a nice comparison.

For checking it in, can we remove the before jpgs (they don't need to live in the tree), and rename the others to drop the _after suffix. Then in the future it will simply become a diff of the image in the tree vs the image in a PR.

@ozguronsoy
Copy link
Contributor Author

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

Bit of convo about generating the images in #infra. The high-level bit is that this syntax highlighting system will be replaced entirely by the LSP in the future, and it may not be that much work to get the LSP working well enough to replace this.

So it's not worth putting too much effort into this textmate stuff. For this PR, I propose we check in the samples but don't add any work into automating them. It's a bit of a pain to update them, but maybe we won't want to keep updating them.

Yeah, I think automizing tests for textmate would be too much effort. FWIW, I think textmate should stay for highlighting until LSP launches. When using slow hardware, it takes minutes to launch the LSP.

For next steps, I wondered if you'd be interested in working on the LSP to get it to the point where it can replace this textmate stuff?

Sure, I'll take a look at it after this.

Copy link
Contributor

@danakj danakj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danakj danakj enabled auto-merge February 13, 2026 16:32
@danakj danakj added this pull request to the merge queue Feb 13, 2026
Merged via the queue into carbon-language:trunk with commit 2184663 Feb 13, 2026
8 checks passed
@jonmeow
Copy link
Contributor

jonmeow commented Feb 13, 2026

Yeah, I think automizing tests for textmate would be too much effort. FWIW, I think textmate should stay for highlighting until LSP launches. When using slow hardware, it takes minutes to launch the LSP.

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

@ozguronsoy
Copy link
Contributor Author

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

I wasn't talking about carbon, it's sub-second for me too but I develop carbon with my own setup. Most defense sector companies don't allow that and give you an old laptop with two cores. Those laptops struggle with everything though but one of the things that irritated me was waiting for the highlighting to work. Maybe that's just my experience.

@ozguronsoy ozguronsoy deleted the fix/syntax-higlighting branch February 13, 2026 18:51
@jonmeow
Copy link
Contributor

jonmeow commented Feb 13, 2026

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

I wasn't talking about carbon, it's sub-second for me too but I develop carbon with my own setup. Most defense sector companies don't allow that and give you an old laptop with two cores. Those laptops struggle with everything though but one of the things that irritated me was waiting for the highlighting to work. Maybe that's just my experience.

To be sure, if you're used to using clangd, I understand a poor performance expectation. There are multiple reasons for that, such as C++ needing parsing of headers in order to understand code, and also elements of Clang's AST perhaps giving some marginal slowdown too. In contrast though, Carbon's a context-free grammar; in the case of highlighting, we should be able to highlight with a parse (of the single file being higlighted). Since we're using a hand-built parser, it might actually be faster than textmate's regular expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

utilities Utilities for working with Carbon code: syntax highlighting, editor plugins, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants