Regex fails to accept pattern that it probably should. #24759

@CyrusNajmabadi


Found while writing dotnet/roslyn#23984 (a usable regex parser for use in IDE scenarios)

The pattern in question is: (cat)(\c[*)(dog)

According to the .NET regex rules, \c[ is supposed to match "the ASCII control character that is specified by X or x, where X or x is the letter of the control character." So, in this case, it should match the [ control character (equivalent to \u001b). Instead, the .NET regex parser rejects the pattern. The reason is that it does a prepass in which it attempts to find all the captures (CountCaptures()), and during that prepass escape handling is different from the normal parse:

https://github.com/dotnet/corefx/blob/353bdc62ccb5c56148bf0e31814f1fd4f84a7fd9/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs#L1644-L1649

As you can see, when it sees an escape \, it just skips the next character and then proceeds to scan normally. This means it only skips the \c and goes back to its normal loop. It then treats [ as the start of a character class, and fails because that character class is unterminated.
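
To make the failure mode concrete, here is a minimal Python sketch of a prescan that behaves the way described above. It is a simplified model, not the actual RegexParser code: the name `naive_prescan`, the error text, and the very rough character-class and group handling are all assumptions made for illustration.

```python
def naive_prescan(pattern: str) -> int:
    """Count capture groups, skipping exactly one character after a backslash.

    Hypothetical, simplified model of the prescan described in this issue.
    """
    captures = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        i += 1
        if ch == "\\":
            # Skip only the next character (the `c` in `\c[`).
            i += 1
        elif ch == "[":
            # Very rough character-class scan: look for the closing bracket.
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unterminated [] set")
            i = close + 1
        elif ch == "(":
            # Simplification: treat every group not starting with `(?` as a capture.
            if not pattern.startswith("?", i):
                captures += 1
    return captures


try:
    naive_prescan(r"(cat)(\c[*)(dog)")
except ValueError as err:
    print("prescan failed:", err)
```

In this model the backslash branch consumes only `c`, so the following `[` reaches the character-class branch and the scan fails, even though `\c[` is a single escape.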

The appropriate fix would likely be to not just do MoveRight() here, but instead call ScanBackslash(), the same way the code calls ScanCharClass immediately below:

https://github.com/dotnet/corefx/blob/353bdc62ccb5c56148bf0e31814f1fd4f84a7fd9/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs#L1659-L1661
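
As a rough illustration of the idea, the Python sketch below models a prescan that hands escapes to a dedicated scanner instead of skipping a single character. The names `scan_backslash` and `prescan_with_escapes` are hypothetical and only loosely mirror the methods mentioned above; this model only knows about `\cX`, whereas a real escape scanner would have to cover every escape form.

```python
def scan_backslash(pattern: str, i: int) -> int:
    """Return the index just past the escape whose body starts at i.

    Hypothetical helper: only `\\cX` is treated as a two-character escape body;
    every other escape is assumed to be a single character.
    """
    if i < len(pattern) and pattern[i] == "c":
        return min(i + 2, len(pattern))  # consume `c` and the control letter
    return min(i + 1, len(pattern))


def prescan_with_escapes(pattern: str) -> int:
    """Count capture groups, consuming whole escapes (simplified model)."""
    captures = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        i += 1
        if ch == "\\":
            i = scan_backslash(pattern, i)
        elif ch == "[":
            # Very rough character-class scan: look for the closing bracket.
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unterminated [] set")
            i = close + 1
        elif ch == "(":
            # Simplification: treat every group not starting with `(?` as a capture.
            if not pattern.startswith("?", i):
                captures += 1
    return captures


print(prescan_with_escapes(r"(cat)(\c[*)(dog)"))  # 3
```

Because the `[` is consumed as part of the `\c[` escape, it never reaches the character-class branch, and the three groups are counted as expected.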
