Improve vscode syntax highlighting by ozguronsoy · Pull Request #6543 · carbon-language/carbon-lang

ozguronsoy · 2025-12-30T23:25:29Z

This PR improves the vscode syntax highlighting.

Added comment keys.
Added highlighting of invalid numbers such as 0x, 0b, 0xa, etc.
Restricted highlighting of numeric type literals to common types to avoid highlighting identifiers such as i1.
Changed the highlighting of named operators (e.g., as).
Added char and str to type literals.
Added const to modifier keywords.
Removed addr keyword.
Refactored some rules to use begin/end to handle line breaks.
Added highlighting to choice values as enum values.
Updated the rules for matching types.
- Added highlighting to rhs of adapt, alias, choice, constraint, impl, interface, as, and impls.
- Added highlighting to rhs of bindings.
- Added highlighting to function return types.
Updated the rules for matching functions.
Updated the rules for matching variables.
Added highlighting unidentified words as variable.
Added examples and before/after screenshots.

Before	After

josh11b

I'm finding this difficult to review. Is there a way to include comments in the file saying what you are doing? Also, possibly break this into smaller changes that can each get their own explanation?

utils/textmate/Syntaxes/carbon.tmLanguage.json

ozguronsoy

I added comments to the new patterns explaining what they highlight.

utils/textmate/Syntaxes/carbon.tmLanguage.json

chandlerc · 2026-01-01T00:26:23Z

utils/textmate/Syntaxes/carbon.tmLanguage.json

        },
        {
          "name": "support.class.carbon",
+          "comment": "Handles `class C`, `class C(I)`, `impl C as I(A)`, etc.",


I don't see this in the schema? The only use of comment keys I see are for syntax-highlighting comments in the underlying source code.

What would help here are comments in the actual grammar definition. It would also be good to include the explanations from your code review replies in comments.

While JSON strictly doesn't have a comment syntax, many parsers accept // ... comments, and the spec for textmate grammars seems to use these in its examples, so maybe that works here? https://macromates.com/manual/en/language_grammars

Unfortunately // style comments do not work with the vscode textmate parser. The comment key is not in the schema but the parser just ignores the invalid keys.

ozguronsoy

Added comments from review and limited numeric type literal highlighting. It used to highlight anything that starts with i, u, or f and contains only numbers after like i1. Now it highlights common types.

utils/textmate/Syntaxes/carbon.tmLanguage.json

…arams

ozguronsoy

Removed matching compound member access.
Fixed matching multiple generic parameters class C(I1, I2, D(E).F).
Fixed matching dot chaining impl C as I(A.B).

utils/textmate/Syntaxes/carbon.tmLanguage.json

ozguronsoy

Added const to modifier-keywords.
Added char and str to type-literals.
Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.
Fixed nested parens, commas, and dots when matching function return types.
Fixed broken type highlighting after forall.
Added adapt, alias, constraint, impls, where, and choice to type matching rules.
Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

utils/textmate/Syntaxes/carbon.tmLanguage.json

danakj · 2026-02-03T17:19:59Z

Added const to modifier-keywords.

Added char and str to type-literals.

Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.

Fixed nested parens, commas, and dots when matching function return types.

Fixed broken type highlighting after forall.

Added adapt, alias, constraint, impls, where, and choice to type matching rules.

Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

danakj · 2026-02-03T17:20:56Z

Added const to modifier-keywords.

Added char and str to type-literals.

Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.

Fixed nested parens, commas, and dots when matching function return types.

Fixed broken type highlighting after forall.

Added adapt, alias, constraint, impls, where, and choice to type matching rules.

Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

Also, there is a lot going on here in the comments explaining various things being done and added. Can those be repeated/lifted up into the PR description so they will go in the git log?

ozguronsoy · 2026-02-03T18:17:56Z

Added const to modifier-keywords.

Added char and str to type-literals.

Refactored type, function, and variable matching to use begin/end to simplify regex and handle line breaks.

Fixed nested parens, commas, and dots when matching function return types.

Fixed broken type highlighting after forall.

Added adapt, alias, constraint, impls, where, and choice to type matching rules.

Highlighted choice values as enum values.

Note: The rules using begin/end start matching when the pattern in begin is matched, and keep matching until the pattern in end is matched. The includes such as #reserved are for keeping the parent rule from overriding other rules between begin and end.

It looks like this PR has gradually grown in scope over time (eg const, char, str). I am going to take over review, so just want to express that I would like to avoid that continuing during the review process. We can always add more things later.

Also, there is a lot going on here in the comments explaining various things being done and added. Can those be repeated/lifted up into the PR description so they will go in the git log?

Done.

danakj

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

utils/textmate/Syntaxes/carbon.tmLanguage.json

danakj · 2026-02-03T17:45:31Z

utils/textmate/Syntaxes/carbon.tmLanguage.json

-          "name": "keyword.operator.carbon",
-          "match": "\\b(addr|and|as|impls|in|like|not|or|partial|template|where)\\b"
+          "name": "keyword.control.carbon",
+          "match": "\\b(addr|and|as|impls|in|like|not|or|partial|ref|template|where)\\b"


ref seems like it would fit more correctly with let and var (which I see in two different places), how come it's here?

likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?

And btw addr has been removed from the language, replaced by ref.

ref seems like it would fit more correctly with let and var

Do you mean in introducer-keywords or in variables?

likewise as is part of meta.type.carbon now, and where is up there as well. Do they need to be duplicated here? Does that override their status somehow?

Yes, they should also be here. Otherwise a rule that uses begin/end might "consume" these keywords since they also wouldn't be in #reserved.

And btw addr has been removed from the language, replaced by ref.

Okay, thanks. I'll remove it.

utils/textmate/Syntaxes/carbon.tmLanguage.json

danakj · 2026-02-03T18:30:50Z

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

Quick further thought - if we can do that then we should be able to generate an image of those comprehensive examples before this PR and after, to experience the process of looking at said image diffs.

Co-authored-by: Dana Jansens <[email protected]>

RedCMD · 2026-02-03T22:12:13Z

been busy ay

I agree highlighting invalid tokens should be done by semantic highlighting
you could do it in TextMate
but only if is really simple, small and no possible exceptions

https://www.npmjs.com/package/textmate-grammar-test can be used for writing tests

my extension has a code-action for optimizing the regexes
can use ctrl+enter to preview the changes
https://github.com/slevithan/oniguruma-parser?tab=readme-ov-file#-optimize-regexes

ozguronsoy · 2026-02-05T14:36:03Z

It's hard to test this stuff, but even so I wonder if we could generate a comprehensive set of examples that demonstrate all of the rules here, and generate a screenshot of it, and check those into the tree with them. Then we would have a sort of regression test as long as we update the screenshot when we change the rules. We could add a README.md describing the process, as well as how to generate the screenshot in a consistent way so that we can look for differences easily, if that's possible?

Quick further thought - if we can do that then we should be able to generate an image of those comprehensive examples before this PR and after, to experience the process of looking at said image diffs.

I added examples and before/after screenshots. I'll add a description to README.md on how we generate these screenshots when we establish a convention.

danakj · 2026-02-11T21:56:30Z

ozguronsoy requested a review from danakj yesterday

Thanks, I should have a chance to look tomorrow.

danakj

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

utils/textmate/Samples/main.jpg

danakj · 2026-02-12T15:04:28Z

utils/textmate/Samples/types_after_2.jpg

If we have to split the image file, then we should split the carbon file. Maybe one for class/interface/constraint and one for variables/functions and one for choice (which will get more complicated eventually)?

Or maybe the variables/functions part could move into the variables.carbon and functions.carbon files?

Also let's include an impl as and extend impl as inside a class, and an extend require impls in the constraint.

utils/textmate/Syntaxes/carbon.tmLanguage.json

danakj · 2026-02-12T15:21:34Z

utils/textmate/Syntaxes/carbon.tmLanguage.json

+          "end": "(?=[;{])",
+          "patterns": [
+            { "include": "#reserved" },
+            { "include": "#variables" },


If we can just keep one of these, that sounds nicer.

In practice I think parameter bindings is going to break with things like ref and const ref (and later, safety annotations), so we'll need more than a simple regex match, and more like variables with the begin/end, but without requiring it to start with var or let.

danakj · 2026-02-12T15:51:56Z

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

Bit of convo about generating the images in #infra. The high-level bit is that this syntax highlighting system will be replaced entirely by the LSP in the future, and it may not be that much work to get the LSP working well enough to replace this.

So it's not worth putting too much effort into this textmate stuff. For this PR, I propose we check in the samples but don't add any work into automating them. It's a bit of a pain to update them, but maybe we won't want to keep updating them.

For next steps, I wondered if you'd be interested in working on the LSP to get it to the point where it can replace this textmate stuff?

RedCMD · 2026-02-13T06:04:28Z

semantic highlighting should only be used as an enhancement, for when TextMate falls short, not a replacement
as the TextMate highlighting still has control over many features like intelisense, comments and brackets
and a quicker response time

danakj

Thanks! This LG now generally.

Thanks for the before/after, since we had no "before" when this PR was created, it allowed for a nice comparison.

For checking it in, can we remove the before jpgs (they don't need to live in the tree), and rename the others to drop the _after suffix. Then in the future it will simply become a diff of the image in the tree vs the image in a PR.

ozguronsoy · 2026-02-13T16:25:28Z

Thanks, I like the before/afters a lot, it made it much easier to understand the change. But it's also a lot of files to regenerate all the time, so it would really really nice if we can automate it, even just on Linux. Do you have any thoughts on that? I see there are some tools around for this, though I am not sure how well they'd work?

Bit of convo about generating the images in #infra. The high-level bit is that this syntax highlighting system will be replaced entirely by the LSP in the future, and it may not be that much work to get the LSP working well enough to replace this.

So it's not worth putting too much effort into this textmate stuff. For this PR, I propose we check in the samples but don't add any work into automating them. It's a bit of a pain to update them, but maybe we won't want to keep updating them.

Yeah, I think automizing tests for textmate would be too much effort. FWIW, I think textmate should stay for highlighting until LSP launches. When using slow hardware, it takes minutes to launch the LSP.

For next steps, I wondered if you'd be interested in working on the LSP to get it to the point where it can replace this textmate stuff?

Sure, I'll take a look at it after this.

danakj

LGTM

jonmeow · 2026-02-13T17:32:27Z

Yeah, I think automizing tests for textmate would be too much effort. FWIW, I think textmate should stay for highlighting until LSP launches. When using slow hardware, it takes minutes to launch the LSP.

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

ozguronsoy · 2026-02-13T18:42:21Z

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

I wasn't talking about carbon, it's sub-second for me too but I develop carbon with my own setup. Most defense sector companies don't allow that and give you an old laptop with two cores. Those laptops struggle with everything though but one of the things that irritated me was waiting for the highlighting to work. Maybe that's just my experience.

jonmeow · 2026-02-13T19:15:36Z

Are you saying that launching carbon language-server is taking you minutes, when compiled in opt mode? Can you please provide output of that? It's really surprising to me since I see it sub-second.

I wasn't talking about carbon, it's sub-second for me too but I develop carbon with my own setup. Most defense sector companies don't allow that and give you an old laptop with two cores. Those laptops struggle with everything though but one of the things that irritated me was waiting for the highlighting to work. Maybe that's just my experience.

To be sure, if you're used to using clangd, I understand a poor performance expectation. There are multiple reasons for that, such as C++ needing parsing of headers in order to understand code, and also elements of Clang's AST perhaps giving some marginal slowdown too. In contrast though, Carbon's a context-free grammar; in the case of highlighting, we should be able to highlight with a parse (of the single file being higlighted). Since we're using a hand-built parser, it might actually be faster than textmate's regular expressions.

improve vscode regex syntax highlighting

a54b6f0

ozguronsoy requested a review from a team as a code owner December 30, 2025 23:25

ozguronsoy requested review from josh11b and removed request for a team December 30, 2025 23:25

github-actions bot added the utilities Utilities for working with Carbon code: syntax highlighting, editor plugins, etc. label Dec 30, 2025

josh11b reviewed Dec 31, 2025

View reviewed changes

add comments to textmate json

f29abfa

ozguronsoy commented Dec 31, 2025

View reviewed changes

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

chandlerc reviewed Jan 1, 2026

View reviewed changes

ozguronsoy added 2 commits January 12, 2026 00:54

add comments from review

5b955e6

fix invalid numeric type literals highlighting

166b60b

ozguronsoy commented Jan 11, 2026

View reviewed changes

josh11b reviewed Jan 13, 2026

View reviewed changes

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

ozguronsoy added 2 commits January 15, 2026 04:05

remove compound access highlighting and match multiple generic type p…

8fc10da

…arams

fix dot chaining in declarations and type definitions

ba09c66

ozguronsoy commented Jan 16, 2026

View reviewed changes

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

josh11b reviewed Jan 16, 2026

View reviewed changes

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

ozguronsoy added 4 commits January 18, 2026 21:46

add char & str to type literals, const to keywords

cc19f0a

refactor to use begin/end to support multi-line highlighting

b8048af

add missing includes to where patterns

76f1242

update comments

a4cdb68

ozguronsoy commented Jan 18, 2026

View reviewed changes

utils/textmate/Syntaxes/carbon.tmLanguage.json Outdated Show resolved Hide resolved

fix lambda body highlighting with fn return types

398a5f0

ozguronsoy requested a review from josh11b February 3, 2026 15:17

danakj requested review from danakj and removed request for josh11b February 3, 2026 17:20

danakj reviewed Feb 3, 2026

View reviewed changes

ozguronsoy and others added 3 commits February 3, 2026 22:43

Apply suggestions from code review

3e5bbfd

Co-authored-by: Dana Jansens <[email protected]>

Apply review suggestions: fix typo

68517d1

Co-authored-by: Dana Jansens <[email protected]>

remove 'E' and 'P' from number patterns

1d1b6c2

ozguronsoy added 3 commits February 5, 2026 00:36

remove addr

2a0a263

add samples

db7c9b4

add before/after screenshots

5313170

ozguronsoy requested a review from danakj February 10, 2026 13:55

danakj reviewed Feb 12, 2026

View reviewed changes

address review suggestions

72cc623

danakj reviewed Feb 13, 2026

View reviewed changes

address review suggestions

d2564f6

danakj approved these changes Feb 13, 2026

View reviewed changes

danakj enabled auto-merge February 13, 2026 16:32

danakj added this pull request to the merge queue Feb 13, 2026

Merged via the queue into carbon-language:trunk with commit 2184663 Feb 13, 2026
8 checks passed

ozguronsoy deleted the fix/syntax-higlighting branch February 13, 2026 18:51

Conversation

ozguronsoy commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josh11b left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ozguronsoy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chandlerc Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

ozguronsoy Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

ozguronsoy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ozguronsoy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ozguronsoy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danakj commented Feb 3, 2026

Uh oh!

danakj commented Feb 3, 2026

Uh oh!

ozguronsoy commented Feb 3, 2026

Uh oh!

danakj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danakj Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

ozguronsoy Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danakj commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RedCMD commented Feb 3, 2026

Uh oh!

ozguronsoy commented Feb 5, 2026

Uh oh!

danakj commented Feb 11, 2026

Uh oh!

danakj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danakj Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

ozguronsoy Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

ozguronsoy commented Dec 30, 2025 •

edited

Loading

danakj commented Feb 3, 2026 •

edited

Loading

RedCMD commented Feb 13, 2026 •

edited

Loading

jonmeow commented Feb 13, 2026 •

edited

Loading