Use more char.Is helpers from RegexCompiler / source generator #68924
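Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Details
This PR makes regex specially recognize additional Unicode categories that map to sets for which `char` already has `IsXx` methods, and call those methods instead, e.g. `char.IsControl`, `char.IsLetter`, etc.
Example:
previously resulted in:
if ((uint)slice.Length < 18 ||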
(char.GetUnicodeCategory(slice[0]) switch { UnicodeCategory.Control or UnicodeCategory.Format or UnicodeCategory.OtherNotAssigned or UnicodeCategory.PrivateUse or UnicodeCategory.Surrogate => false, _ => true }) || // Match a character in the set [\p{C}].
(char.GetUnicodeCategory(slice[1]) switch { UnicodeCategory.Control or UnicodeCategory.Format or UnicodeCategory.OtherNotAssigned or UnicodeCategory.PrivateUse or UnicodeCategory.Surrogate => true, _ => false }) || // Match a character in the set [\P{C}].
(char.GetUnicodeCategory(slice[2]) switch { UnicodeCategory.LowercaseLetter or UnicodeCategory.ModifierLetter or UnicodeCategory.OtherLetter or UnicodeCategory.TitlecaseLetter or UnicodeCategory.UppercaseLetter => false, _ => true }) || // Match a character in the set [\p{L}].
(char.GetUnicodeCategory(slice[3]) switch { UnicodeCategory.LowercaseLetter or UnicodeCategory.ModifierLetter or UnicodeCategory.OtherLetter or UnicodeCategory.TitlecaseLetter or UnicodeCategory.UppercaseLetter => true, _ => false }) || // Match a character in the set [\P{L}].
(char.GetUnicodeCategory(slice[4]) switch { UnicodeCategory.LowercaseLetter or UnicodeCategory.ModifierLetter or UnicodeCategory.OtherLetter or UnicodeCategory.TitlecaseLetter or UnicodeCategory.UppercaseLetter or UnicodeCategory.DecimalDigitNumber => false, _ => true }) || // Match a character in the set [\p{L}\d].
(char.GetUnicodeCategory(slice[5]) switch { UnicodeCategory.LowercaseLetter or UnicodeCategory.ModifierLetter or UnicodeCategory.OtherLetter or UnicodeCategory.TitlecaseLetter or UnicodeCategory.UppercaseLetter or UnicodeCategory.DecimalDigitNumber => true, _ => false }) || // Match a character in the set [^\p{L}\d].
(char.GetUnicodeCategory(slice[6]) != UnicodeCategory.LowercaseLetter) || // Match a character in the set [\p{Ll}].
(char.GetUnicodeCategory(slice[7]) == UnicodeCategory.LowercaseLetter) || // Match a character in the set [\P{Ll}].
(char.GetUnicodeCategory(slice[8]) != UnicodeCategory.UppercaseLetter) || // Match a character in the set [\p{Lu}].
(char.GetUnicodeCategory(slice[9]) == UnicodeCategory.UppercaseLetter) || // Match a character in the set [\P{Lu}].
(char.GetUnicodeCategory(slice[10]) switch { UnicodeCategory.DecimalDigitNumber or UnicodeCategory.LetterNumber or UnicodeCategory.OtherNumber => false, _ => true }) || // Match a character in the set [\p{N}].
(char.GetUnicodeCategory(slice[11]) switch { UnicodeCategory.DecimalDigitNumber or UnicodeCategory.LetterNumber or UnicodeCategory.OtherNumber => true, _ => false }) || // Match a character in the set [\P{N}].
(char.GetUnicodeCategory(slice[12]) switch { UnicodeCategory.ConnectorPunctuation or UnicodeCategory.DashPunctuation or UnicodeCategory.ClosePunctuation or UnicodeCategory.OtherPunctuation or UnicodeCategory.OpenPunctuation or UnicodeCategory.FinalQuotePunctuation or UnicodeCategory.InitialQuotePunctuation => false, _ => true }) || // Match a character in the set [\p{P}].
(char.GetUnicodeCategory(slice[13]) switch { UnicodeCategory.ConnectorPunctuation or UnicodeCategory.DashPunctuation or UnicodeCategory.ClosePunctuation or UnicodeCategory.OtherPunctuation or UnicodeCategory.OpenPunctuation or UnicodeCategory.FinalQuotePunctuation or UnicodeCategory.InitialQuotePunctuation => true, _ => false }) || // Match a character in the set [\P{P}].
(char.GetUnicodeCategory(slice[14]) switch { UnicodeCategory.LineSeparator or UnicodeCategory.ParagraphSeparator or UnicodeCategory.SpaceSeparator => false, _ => true }) || // Match a character in the set [\p{Z}].
(char.GetUnicodeCategory(slice[15]) switch { UnicodeCategory.LineSeparator or UnicodeCategory.ParagraphSeparator or UnicodeCategory.SpaceSeparator => true, _ => false }) || // Match a character in the set [\P{Z}].
(char.GetUnicodeCategory(slice[16]) switch { UnicodeCategory.CurrencySymbol or UnicodeCategory.ModifierSymbol or UnicodeCategory.MathSymbol or UnicodeCategory.OtherSymbol => false, _ => true }) || // Match a character in the set [\p{S}].
(char.GetUnicodeCategory(slice[17]) switch { UnicodeCategory.CurrencySymbol or UnicodeCategory.ModifierSymbol or UnicodeCategory.MathSymbol or UnicodeCategory.OtherSymbol => true, _ => false })) // Match a character in the set [\P{S}].
{
return false; // The input didn't match.
}
and now results in:
if ((uint)slice.Length < 18 ||
!char.IsControl(slice[0]) || // Match a character in the set [\p{C}].
char.IsControl(slice[1]) || // Match a character in the set [\P{C}].
!char.IsLetter(slice[2]) || // Match a character in the set [\p{L}].
char.IsLetter(slice[3]) || // Match a character in the set [\P{L}].
!char.IsLetterOrDigit(slice[4]) || // Match a character in the set [\p{L}\d].
char.IsLetterOrDigit(slice[5]) || // Match a character in the set [^\p{L}\d].
!char.IsLower(slice[6]) || // Match a character in the set [\p{Ll}].
char.IsLower(slice[7]) || // Match a character in the set [\P{Ll}].
!char.IsUpper(slice[8]) || // Match a character in the set [\p{Lu}].
char.IsUpper(slice[9]) || // Match a character in the set [\P{Lu}].
!char.IsNumber(slice[10]) || // Match a character in the set [\p{N}].
char.IsNumber(slice[11]) || // Match a character in the set [\P{N}].
!char.IsPunctuation(slice[12]) || // Match a character in the set [\p{P}].
char.IsPunctuation(slice[13]) || // Match a character in the set [\P{P}].
!char.IsSeparator(slice[14]) || // Match a character in the set [\p{Z}].
char.IsSeparator(slice[15]) || // Match a character in the set [\P{Z}].
!char.IsSymbol(slice[16]) || // Match a character in the set [\p{S}].
char.IsSymbol(slice[17])) // Match a character in the set [\P{S}].
{
return false; // The input didn't match.
}
We'll subsequently want to use any new
This PR makes regex specially recognize additional Unicode categories that map to sets for which `char` already has `IsXx` methods, and call those methods instead, e.g. `char.IsControl`, `char.IsLetter`, etc.
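As a rough illustration of why this substitution is behavior-preserving, the sketch below compares a few of the `char.IsXx` helpers against the equivalent `UnicodeCategory` switch for every UTF-16 code unit. The class and method names (`CategoryHelperDemo`, `IsLetterByCategory`, and so on) are hypothetical and exist only for this example; they are not part of the regex source generator.

```csharp
using System;
using System.Globalization;

public static class CategoryHelperDemo
{
    // Hypothetical helper: the category switch for \p{L}, written out by hand.
    public static bool IsLetterByCategory(char c) =>
        char.GetUnicodeCategory(c) switch
        {
            UnicodeCategory.LowercaseLetter or
            UnicodeCategory.ModifierLetter or
            UnicodeCategory.OtherLetter or
            UnicodeCategory.TitlecaseLetter or
            UnicodeCategory.UppercaseLetter => true,
            _ => false,
        };

    // Hypothetical helper: the category switch for \p{N}.
    public static bool IsNumberByCategory(char c) =>
        char.GetUnicodeCategory(c) switch
        {
            UnicodeCategory.DecimalDigitNumber or
            UnicodeCategory.LetterNumber or
            UnicodeCategory.OtherNumber => true,
            _ => false,
        };

    // Hypothetical helper: the category switch for \p{Z}.
    public static bool IsSeparatorByCategory(char c) =>
        char.GetUnicodeCategory(c) switch
        {
            UnicodeCategory.LineSeparator or
            UnicodeCategory.ParagraphSeparator or
            UnicodeCategory.SpaceSeparator => true,
            _ => false,
        };

    // Returns true if each helper agrees with its category switch for all 65,536 char values.
    public static bool AgreesForAllChars()
    {
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (IsLetterByCategory(c) != char.IsLetter(c) ||
                IsNumberByCategory(c) != char.IsNumber(c) ||
                IsSeparatorByCategory(c) != char.IsSeparator(c) ||
                (char.GetUnicodeCategory(c) == UnicodeCategory.LowercaseLetter) != char.IsLower(c) ||
                (char.GetUnicodeCategory(c) == UnicodeCategory.UppercaseLetter) != char.IsUpper(c))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main() => Console.WriteLine(AgreesForAllChars());
}
```

The check covers only a subset of the helpers mentioned in this PR; the same pattern could be extended to the others.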
joperezr left a comment
LGTM. I assume we already have tests for all of these constructs for the compiled and source generated engines?
Yes, though several are outerloop.
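For readers who want to spot-check this locally, here is a minimal sketch of an engine-parity check. It is not the repository's actual test code: `EngineParityDemo` and `EnginesAgree` are hypothetical names, and the source-generated engine is left out because it requires a build-time generator. The patterns are the sets named in the comments of the generated code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EngineParityDemo
{
    // The sets named in the generated-code comments of this PR.
    private static readonly string[] Patterns =
    {
        @"[\p{C}]", @"[\P{C}]", @"[\p{L}]", @"[\P{L}]",
        @"[\p{L}\d]", @"[^\p{L}\d]",
        @"[\p{Ll}]", @"[\P{Ll}]", @"[\p{Lu}]", @"[\P{Lu}]",
        @"[\p{N}]", @"[\P{N}]", @"[\p{P}]", @"[\P{P}]",
        @"[\p{Z}]", @"[\P{Z}]", @"[\p{S}]", @"[\P{S}]",
    };

    // Returns true if the interpreter and RegexOptions.Compiled give the same
    // answer for every single-char input and every pattern above.
    public static bool EnginesAgree()
    {
        foreach (string pattern in Patterns)
        {
            var interpreted = new Regex(pattern, RegexOptions.None);
            var compiled = new Regex(pattern, RegexOptions.Compiled);

            for (int i = 0; i <= char.MaxValue; i++)
            {
                string input = ((char)i).ToString();
                if (interpreted.IsMatch(input) != compiled.IsMatch(input))
                {
                    Console.WriteLine($"Mismatch for {pattern} at U+{i:X4}");
                    return false;
                }
            }
        }
        return true;
    }

    public static void Main() => Console.WriteLine(EnginesAgree());
}
```

Scanning every `char` value keeps the check exhaustive for single code units while remaining cheap enough to run as an ad hoc program.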