Skip to content

Conversation

@jeremy-rifkin
Copy link
Contributor

@jeremy-rifkin jeremy-rifkin commented May 25, 2023

I think monarch is not powerful enough to tokenize constructs such as C++ raw string literals in the general case. This is an attempt to expand monarch functionality so it can.

Monaco tokenization of C++ raw strings is broken when the d-char-sequence  contains double quotes microsoft/monaco-editor#3128

image

This is because https://github.com/microsoft/monaco-editor/blob/main/src/basic-languages/cpp/cpp.ts looks for the end of a raw string literal by searching for /(.*)(\))(?:([^ ()\\\t"]*))(\")/ and then checks $3==$S2 to see if the end has in fact been found, but only compares what is between a closing parenthesis and the first double quote.

As far as I can tell, there is no way to solve this currently in monarch, without doing something hacky like hard-coding for all d-char-sequences of 16 or fewer characters, for example (the C++ standard does specify this limit). Maybe there is a way that I haven't thought of, in which case this PR isn't needed. Most languages do not seem to place such limitations on the delimiter sequences, e.g. PHP and C#. Rust tokenization can be improved as well with the tools provided by the PR.


This PR expands monarch functionality by allowing $S2 to be used within rule regexes, so the fix for microsoft/monaco-editor#3128 can be something along the lines of

        root: [
            // C++ 11 Raw String
            [/@encoding?R\"(?:([^ ()\\\t]*))\(/, { token: 'string.raw.begin', next: '@raw.$1' }],
            ...
        ],
        raw: [
+           [/.*\)$S2\"/, 'string.raw', '@pop']
-           [
-               /(.*)(\))(?:([^ ()\\\t"]*))(\")/,
-               {
-                   cases: {
-                       '$3==$S2': [
-                           'string.raw',
-                           'string.raw.end',
-                           'string.raw.end',
-                           { token: 'string.raw.end', next: '@pop' }
-                       ],
-                       '@default': ['string.raw', 'string.raw', 'string.raw', 'string.raw']
-                   }
-               }
-           ],
            [/.*/, 'string.raw']
        ],

I have not fully tested this PR yet.

@jeremy-rifkin
Copy link
Contributor Author

@microsoft-github-policy-service agree

@jeremy-rifkin
Copy link
Contributor Author

The automated test cases and unit tests pass. I could use some guidance on how to test the changes in conjunction with monaco / a test monarch grammar.

@alexdima alexdima self-requested a review August 31, 2023 07:42
@alexdima alexdima added this to the March 2024 milestone Mar 19, 2024
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@alexdima alexdima enabled auto-merge (squash) March 19, 2024 15:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants