Found while writing dotnet/roslyn#23984 (a usable regex parser for use in IDE scenarios)
The pattern in question is: (cat)(\c[*)(dog)
According to the .NET regex rules, \cX is supposed to match "the ASCII control character that is specified by X or x, where X or x is the letter of the control character." So in this case \c[ should match the control character for [ (equivalent to \u001b). Instead, the .NET regex parser fails on this pattern. The reason is that it does a prepass in which it attempts to find all the captures (CountCaptures()), and during that prepass escape handling differs from the normal parse:
As you can see, when it sees an escape \, it just skips the next character and then proceeds to scan normally. For \c[ this means it skips only the \c and goes back to its normal loop. It then treats [ as the start of a character class, which causes it to fail because that character class is unterminated.
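To make the failure mode concrete, here is a minimal Python model of a capture-counting prepass that behaves as described above. It is not the .NET source: `count_captures_naive` is a hypothetical name, the character-class scan is deliberately simplified (it just looks for the next `]`), and every `(` is counted as a capture.

```python
def count_captures_naive(pattern: str) -> int:
    """Hypothetical model of a prepass that skips exactly one
    character after a backslash (assumption, not the real parser)."""
    i, n, captures = 0, len(pattern), 0
    while i < n:
        ch = pattern[i]
        i += 1
        if ch == "\\":
            # Skip only the single character after the backslash.
            # For \c[ this consumes 'c' and leaves '[' to be scanned.
            i += 1
        elif ch == "[":
            # Simplified character-class scan: find the closing ']'.
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unterminated [] set")
            i = close + 1
        elif ch == "(":
            captures += 1
    return captures
```

With this model, `count_captures_naive(r"(cat)(\c[*)(dog)")` raises `ValueError`, because the `[` that belongs to `\c[` is mistaken for the start of a character class.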
The appropriate fix would likely be to not just do MoveRight() here, but instead use ScanBackslash() the same way the code does ScanCharClass immediately below:
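As an illustration of the idea only, and not the actual parser code, the sketch below models a prepass that consumes the whole escape instead of a single character. The names `skip_escape` and `count_captures_escape_aware` are hypothetical, and only the `\cX` form is handled specially; the real ScanBackslash() covers many more escapes.

```python
def skip_escape(pattern: str, i: int) -> int:
    """Return the index just past an escape, where i is the index of
    the character following the backslash (hypothetical helper)."""
    n = len(pattern)
    if i < n and pattern[i] == "c":
        # \cX is a three-character escape: consume 'c' and the
        # control letter that follows it.
        return min(i + 2, n)
    # Assumption for this sketch: every other escape is one character.
    return min(i + 1, n)


def count_captures_escape_aware(pattern: str) -> int:
    """Simplified prepass that counts '(' while skipping whole escapes
    and character classes."""
    i, n, captures = 0, len(pattern), 0
    while i < n:
        ch = pattern[i]
        i += 1
        if ch == "\\":
            i = skip_escape(pattern, i)
        elif ch == "[":
            # Simplified character-class scan: find the closing ']'.
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unterminated [] set")
            i = close + 1
        elif ch == "(":
            captures += 1
    return captures
```

In this model, `count_captures_escape_aware(r"(cat)(\c[*)(dog)")` returns 3, since the `[` is consumed as part of the `\c[` escape and never reaches the character-class branch.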