Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Fix regex scanner for \c[#26828

Merged
danmoseley merged 6 commits intodotnet:masterfrom
danmoseley:regex.pattern.rebased
Feb 5, 2018
Merged

Fix regex scanner for \c[#26828
danmoseley merged 6 commits intodotnet:masterfrom
danmoseley:regex.pattern.rebased

Conversation

@danmoseley
Copy link
Member

@danmoseley danmoseley commented Feb 4, 2018

Fix https://github.com/dotnet/corefx/issues/26501

In (cat)(\c[*)(dog) the \c[ should match the [ control character ie \u001b. However it was rejecting the pattern because it does a pre-scan for capturing groups. In this phase it assumes that \ always escaped a single character, so it treats the [ as the start of a character class, which is ultimately never closed.

The fix is to call into the real scanner for escape characters. It is necessary to pass a flag indicating we are scanning. This is to avoid validating that backreferences refer to valid capturing groups (these are not all set up until the pre-scanning is complete) and also to avoid allocating objects that aren't needed in this phase. We already do something very similar with ScanCharClass.

Ideally there would not be a pre-scan but I imagine that would be a substantial change.

This causes more code to run during the pre-scan when a backslash is encountered. To verify this does not impact perf I used the perf test in #26827

Before

System.Text.RegularExpressions.Performance.Tests.dll Metric Unit Iterations Average STDEV.S Min Max
System.Text.RegularExpressions.Tests.Perf_Regex.Match Duration msec 40 254.4937651 10.61602132 242.199422 292.2573603
System.Text.RegularExpressions.Tests.Perf_Regex.Match GC Allocations bytes 40 199009143.4 51155.77152 198894312 199115104
--- --- --- --- --- --- --- ---
System.Text.RegularExpressions.Tests.Perf_Regex.Match Duration msec 40 254.5757042 9.581244315 240.640903 280.4672477
System.Text.RegularExpressions.Tests.Perf_Regex.Match GC Allocations bytes 40 199014283.6 53740.21012 198891600 199147112

After this and #26543

System.Text.RegularExpressions.Performance.Tests.dll Metric Unit Iterations Average STDEV.S Min Max
System.Text.RegularExpressions.Tests.Perf_Regex.Match Duration msec 40 255.7465251 8.22304857 245.721344 274.720744
System.Text.RegularExpressions.Tests.Perf_Regex.Match GC Allocations bytes 40 199048536.2 43711.56892 198975760 199152184
--- --- --- --- --- --- --- ---
System.Text.RegularExpressions.Tests.Perf_Regex.Match Duration msec 40 252.0985474 6.940020203 241.216865 268.1445129
System.Text.RegularExpressions.Tests.Perf_Regex.Match GC Allocations bytes 40 199049236 42700.49669 198976144 199151832

Elapsed difference is within noise level and allocations are negligibly increased.

@danmoseley danmoseley changed the title Regex.pattern.rebased Fix regex scanner for \c[ Feb 4, 2018
* a RegexNode for the type of atom scanned.
*/
internal RegexNode ScanBackslash()
internal RegexNode ScanBackslash(bool scanOnly = false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, optional parameters for these internal APIs aren't a great idea. By being non-optional it ensures there isn't a caller missed.

}

// Try to parse backreference: \<1> or \<cap>
// Try to parse backreference: \<1>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment seemed right. it was just referring to the if and the else section.

@danmoseley
Copy link
Member Author

@danmoseley danmoseley merged commit 80e220f into dotnet:master Feb 5, 2018
@danmoseley danmoseley deleted the regex.pattern.rebased branch February 5, 2018 16:59
@karelz karelz added this to the 2.1.0 milestone Feb 5, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Fix with test

* Avoid excess allocations

* Negative test

* Exclude from NETFX

* Avoid default params

* NETFX


Commit migrated from dotnet/corefx@80e220f
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regex fails to accept pattern that it probably should.

3 participants