Improve Regex's knowledge of captures and atomicity in the tree

Handling captures involves additional code, e.g. uncapturing after a failed match attempt.  For both RegexOptions.Compiled and the source generator, we output code to both track where we need to uncapture to and to then perform that uncapture if it's possible there may be captures involved.  But today the analysis for this is relatively limited: we mark each node in the tree as to whether it contains captures, and that allows us to avoid the boilerplate uncapture code after failures in child nodes.  But we don't analyze/track whether any nodes _after_ a given node involve captures, and that's relevant for backtracking, e.g. today for the expression `(a*)b*[bc]` the codegen for that `b*` loop will involve the uncapture boilerplate because it'll see that the whole expression contains captures, even though if backtracking occurs due to the `[bc]` not matching, there is provably nothing that will need to be uncaptured. We should change how we annotate the tree and include this idea of "is a node followed by a capture".  This could be done, for example, by walking the tree backwards, tracking whether we've seen a capture, and if we have, marking every future node we see.

We have a similar issue for atomic.  Code gen can be much better for a variety of constructs if they're atomic.  Today we walk up the tree from a node to see if anything in its parent hierarchy makes it atomic, but worst case that could be an O(N^2) algorithm (as part of construction, not matching).  We could instead do a single O(N) pass over the tree to compute this for every node.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Regex's knowledge of captures and atomicity in the tree #62451

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Improve Regex's knowledge of captures and atomicity in the tree #62451

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions