Skip to content

Conversation

@stephentoub
Copy link
Member

In the first commit, RegexCompiler/Emitter.EmitFindFirstChar are updated to separate out each of the strategies into their own function. There are no functional changes as part of this; the code is simply moved into a helper and then that helper invoked where the original code was. This will be easiest reviewed ignoring whitespace.

In the second commit, RegexInterpreter.FindFirstChar is also updated. However, this does involve some functional changes. Rather than each call to FindFirstChar needing to figure out which strategy to use, we now compute that once in the RegexInterpreter ctor, and then in FindFirstChar just branch to the right place.

I have some subsequent changes around vectorization for FindFirstChar I aim to make, and it was getting unwieldy doing so with the code as previously structured.

@ghost
Copy link

ghost commented Oct 25, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

In the first commit, RegexCompiler/Emitter.EmitFindFirstChar are updated to separate out each of the strategies into their own function. There are no functional changes as part of this; the code is simply moved into a helper and then that helper invoked where the original code was. This will be easiest reviewed ignoring whitespace.

In the second commit, RegexInterpreter.FindFirstChar is also updated. However, this does involve some functional changes. Rather than each call to FindFirstChar needing to figure out which strategy to use, we now compute that once in the RegexInterpreter ctor, and then in FindFirstChar just branch to the right place.

I have some subsequent changes around vectorization for FindFirstChar I aim to make, and it was getting unwieldy doing so with the code as previously structured.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) why is the ! needed on code.LeadingCharClasses![0], when the line directly above already dereferences code.LeadingCharClasses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from when the code was written differently. Will remove.

These have become unwieldy, with multiple large code blocks for finding the first character at which to perform a full match, and we'll be adding more soon.  This simply splits each of those out into their own method.  No functional changes.
We don't need to re-evaluate which mechanism to use on every FindFirstChar call... we can do so once in the ctor and then just switch on that in the actual FindFirstChar call.
@stephentoub stephentoub merged commit 1cc64c5 into dotnet:main Oct 26, 2021
@stephentoub stephentoub deleted the refactorfindfirst branch October 26, 2021 00:50
@ghost ghost locked as resolved and limited conversation to collaborators Nov 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants