Refactor Regex FindFirstChar #60822

stephentoub · 2021-10-25T02:41:42Z

In the first commit, RegexCompiler/Emitter.EmitFindFirstChar are updated to separate out each of the strategies into its own function. There are no functional changes as part of this; the code is simply moved into a helper, and that helper is then invoked where the original code was. This is easiest to review with whitespace changes ignored.

In the second commit, RegexInterpreter.FindFirstChar is also updated. However, this does involve some functional changes. Rather than each call to FindFirstChar needing to figure out which strategy to use, we now compute that once in the RegexInterpreter ctor, and then in FindFirstChar just branch to the right place.
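As a rough illustration of the shape described for the second commit, the sketch below picks a strategy once in a constructor and then only switches on it per call. Every name here (`FindStrategy`, `MiniInterpreter`, the constructor parameters) is hypothetical and invented for the example; it is not the actual RegexInterpreter code, and the three strategies shown are simplified stand-ins.

```csharp
using System;

// Hypothetical strategy kinds; the real interpreter's set of strategies is not shown in this PR text.
enum FindStrategy { NoSearch, LeadingAnchorBeginning, LeadingChar }

// Hypothetical stand-in for an interpreter that decides its strategy up front.
sealed class MiniInterpreter
{
    private readonly FindStrategy _strategy;
    private readonly char _leadingChar;

    public MiniInterpreter(bool anchoredAtBeginning, char? leadingChar)
    {
        _leadingChar = leadingChar.GetValueOrDefault();

        // Decide once, at construction time, which mechanism applies.
        _strategy =
            anchoredAtBeginning ? FindStrategy.LeadingAnchorBeginning :
            leadingChar.HasValue ? FindStrategy.LeadingChar :
            FindStrategy.NoSearch;
    }

    // Returns the index at which a full match should be attempted, or -1 if there is none.
    public int FindFirstChar(string input, int start)
    {
        // Per call, just branch on the precomputed strategy.
        switch (_strategy)
        {
            case FindStrategy.LeadingAnchorBeginning:
                // A beginning anchor can only match at position 0.
                return start == 0 ? 0 : -1;

            case FindStrategy.LeadingChar:
                // Skip ahead to the next occurrence of the required first character.
                return input.IndexOf(_leadingChar, start);

            default:
                // Nothing known about the pattern: try matching at the current position.
                return start <= input.Length ? start : -1;
        }
    }
}
```

With this shape, adding another strategy means adding an enum member, one branch in the constructor, and one case in the switch, rather than threading another condition through every call.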

I plan to make follow-up changes around vectorization for FindFirstChar, and doing so was getting unwieldy with the code as previously structured.

ghost · 2021-10-25T02:41:47Z

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	stephentoub
Assignees:	-
Labels:	`area-System.Text.RegularExpressions`
Milestone:	7.0.0

eerhardt

LGTM

eerhardt · 2021-10-25T19:06:02Z

...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs

(nit) why is the ! needed on code.LeadingCharClasses![0], when the line directly above already dereferences code.LeadingCharClasses?

Leftover from when the code was written differently. Will remove.

stephentoub added the area-System.Text.RegularExpressions label Oct 25, 2021

stephentoub added this to the 7.0.0 milestone Oct 25, 2021

eerhardt approved these changes Oct 25, 2021

stephentoub added 2 commits October 25, 2021 17:43

Refactor Regexcompiler/Emitter.FindFirstChar

cb77039

These have become unwieldy, with multiple large code blocks for finding the first character at which to perform a full match, and we'll be adding more soon. This simply splits each of those out into their own method. No functional changes.

Use switch in RegexInterpreter.FindFirstChar

c668463

We don't need to re-evaluate which mechanism to use on every FindFirstChar call... we can do so once in the ctor and then just switch on that in the actual FindFirstChar call.

stephentoub force-pushed the refactorfindfirst branch from 41e444d to c668463 October 25, 2021 21:49

stephentoub merged commit 1cc64c5 into dotnet:main Oct 26, 2021

stephentoub deleted the refactorfindfirst branch October 26, 2021 00:50

ghost locked as resolved and limited conversation to collaborators Nov 25, 2021
