Skip to content

Conversation

@marknn3
Copy link
Contributor

@marknn3 marknn3 commented Feb 9, 2022

This will fix microsoft/vscode#142516 : [css] support unicode-range wildcard.

Wildcard characters in @font-face unicode-range did generate a css language syntax error.

@ghost
Copy link

ghost commented Feb 9, 2022

CLA assistant check
All CLA requirements met.

@aeschli
Copy link
Collaborator

aeschli commented Feb 9, 2022

Thanks a lot @marknn3. I think the proper solution would be to deal with the unicode range in the scanner.
https://www.w3.org/TR/CSS21/syndata.html#tokenization

Right now it is parsed as an expression U + 0025 which doesn't make much sense.

Surprisingly, we already have a scanner token type for unicode-range (I don't remember why)

@marknn3
Copy link
Contributor Author

marknn3 commented Feb 9, 2022

The fact that the current CSS expression parser works at all on unicode-ranges is pure coincidence/luck!
This was the reason I decided to tweak it to allow for '?' suffixes on expressions, as a quick and low-risk fix.

For sure the best solution is to create actual unicode-range type tokens. I will look into this to do it properly.
This is a low prio- and fun issue to get my feet wet.

@aeschli
Copy link
Collaborator

aeschli commented Feb 10, 2022

@marknn3 Cool, no rush, and thanks again!

@marknn3
Copy link
Contributor Author

marknn3 commented Mar 1, 2022

Surprisingly, we already have a scanner token type for unicode-range (I don't remember why)

Having a TokenType.UnicodeRange does not really make sense, because the scanner can not recognize unicode tokens without proper context. And a range has a start and end value, which should not be a single token.

What I have implemented is a UnicodeRange node.
This node has a rangeStart and rangeEnd property. These properties contain a single expression leaf node, which represents the raw unicode value, e.g. `01??' or '1F49C'.

@aeschli
Copy link
Collaborator

aeschli commented Mar 1, 2022

From looking at https://www.w3.org/TR/CSS21/syndata.html#tokenization I still think this should be done in the scanner. I don't think context is needed

@aeschli
Copy link
Collaborator

aeschli commented Mar 1, 2022

Ok, now I see https://www.w3.org/TR/css-syntax-3/#urange

What I don't like with the current proposal is the special handling for unicode-range is in parseDeclaration. It's far too prominent.

I suggest to handle unicode range as a valid term in _parseTermExpression
We would leave it to a lint rule to verify that it can only be used in a unicode-range property.

Alternative is to do this in _parseFontFace. So instead of using this._parseRuleSetDeclaration.bind(this), we introduce this._parseFontFaceDeclaration.bind(this) and handle this in there.

@marknn3
Copy link
Contributor Author

marknn3 commented Mar 1, 2022

Alternative is to do this in _parseFontFace. So instead of using this._parseRuleSetDeclaration.bind(this), we introduce this._parseFontFaceDeclaration.bind(this) and handle this in there.

I agree, this makes more sense! Will refactor this.

Do you agree with the UnicodeRange Node implementation?
Could this impact existing Node tree consumers? Because it does replace the current (meaningless!) expression node for unicode-ranges with a different node type.

@aeschli
Copy link
Collaborator

aeschli commented Mar 2, 2022

Sounds good.
No need to worry about existing Node tree consumers. The tree not public API so we are free to change it.

- In _parseDeclaration we first call _tryParseUnicodeRangeExpr, and when this fails to parse then fallback to  _parseExpr.
According to the spec (https://www.w3.org/TR/css-syntax-3/#urange-syntax) anything that looks like a unicode-range should be interpreted as a unicode-range.

- Same change applied in scssParser.ts.

- Added new @font-face tests in scss/parser.tests.ts to cover the changes in scssParser.ts.
@marknn3
Copy link
Contributor Author

marknn3 commented Mar 3, 2022

The introduction of a dedicated _parseFontFaceDeclaration parser would result in a lot of duplicate and error prone code.
Also because the scss parser has an overridden _parseRuleSetDeclaration function which then needs to be changed accordingly.

Instead, I introduced _tryParseUnicodeRangeExpr which is used in _parseDeclaration as follows (in cssParser.ts and scssParser.ts):

node.setValue(this._tryParseUnicodeRangeExpr() || this._parseExpr())

I think this will address your valid concerns and stays close to the <urange> parsing specs

Also added a new @font-face test in scss/parser.test.ts to cover the change in scssParser.ts.

A dedicated linter for the misuse of unicode-ranges is IMHO overkill. It would be extremely rare to trigger and quite expensive to detect.

@aeschli
Copy link
Collaborator

aeschli commented Mar 4, 2022

My idea was that _parseFontFaceDeclaration will call _parseRuleSetDeclaration after looking for a unicode-range property.

But yes, just allowing it a valid expression for all properties is fine too. Just add this._tryParseUnicodeRangeExpr() as part of parseTerm. Yes that will allow to have unicode ranges in things like binary expressions, but that's fine. It will be up to linter to generate an error if a property value has the wrong type.
I don't think we should have any special handling for unicode range in _parseDeclaration.

@marknn3
Copy link
Contributor Author

marknn3 commented Mar 4, 2022

My idea was that _parseFontFaceDeclaration will call _parseRuleSetDeclaration after looking for a unicode-range property.

I already did try and implement this idea, but it did break the scss parser, surprisingly. A fix would require a similar (and less obvious) duplication in scssParser.
There might be another way to implement _parseFontFaceDeclaration without all the duplication but don't see it right now.

@aeschli
Copy link
Collaborator

aeschli commented Mar 18, 2022

I pushed a commit that uses the does Unicode range parsing in a special method in the scanner. That's much simpler than trying to do it in the parser. E.g. what was missing before was a check that there are no whitespaces between the tokens.
I also added _parseUnicodeRange right in _parseTermExpression, removing any special handling of unicode ranges in _parseExpression.
As mentioned, this will permit unicode ranges also in binary expressions. That's something that then a linter has to detect and warn about. We currently don't have an linter that checks the attribute values for the correct type.

I hope that's ok for you. If you see a problem, please speak up

Thanks a lot for your help!

@aeschli aeschli merged commit 6fc9798 into microsoft:main Mar 18, 2022
@aeschli aeschli added this to the March 2022 milestone Mar 18, 2022
@aeschli aeschli self-assigned this Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[css] support unicode-range wildcard

2 participants