Skip to content

Conversation

@teo-tsirpanis
Copy link
Contributor

Fixes #72312.

When emitting code that matches regex character classes, if all other shortcuts fail, we resort to hardcoding whether ASCII characters belong in the set using a string as a bit vector, and calling CharInClass for the other characters. Sometimes, based on RegexCharClass.Analyze we determine that the character class contains either all ASCII characters or none of them, and skip the bit vector search.

However these analysis results are not perfect and cannot produce reliable results if for example the character class contains a subtraction. This PR checks the string bit vector after its construction to see if it contains only ones or zeroes and if so, emits code that does not use it.

@ghost ghost added area-System.Text.RegularExpressions community-contribution Indicates that the PR has been added by a community member labels Jul 16, 2022
@ghost
Copy link

ghost commented Jul 16, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #72312.

When emitting code that matches regex character classes, if all other shortcuts fail, we resort to hardcoding whether ASCII characters belong in the set using a string as a bit vector, and calling CharInClass for the other characters. Sometimes, based on RegexCharClass.Analyze we determine that the character class contains either all ASCII characters or none of them, and skip the bit vector search.

However these analysis results are not perfect and cannot produce reliable results if for example the character class contains a subtraction. This PR checks the string bit vector after its construction to see if it contains only ones or zeroes and if so, emits code that does not use it.

Author: teo-tsirpanis
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

@stephentoub
Copy link
Member

Thanks. Are these new post-string-building cases exercised by our existing tests? i.e. does code coverage show all the new branches hit?

@stephentoub stephentoub merged commit 8a709bc into dotnet:main Jul 17, 2022
@teo-tsirpanis teo-tsirpanis deleted the fix-72312 branch July 17, 2022 09:38
@ghost ghost locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Text.RegularExpressions community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove redundant check of 128-bit bitmap of zeros in regex

2 participants