Skip to content

Conversation

@stephentoub
Copy link
Member

  • For a lazy loop, compute whether the child might match empty. Only if it might do we need any of the checks around empty iterations, and only in those situations do we need to track the starting position or whether we've seen an empty loop, so avoid all work associated with that if the min length of the child is greater than zero.
  • If the lazy loop failing will exit the match, avoid doing all the cleanup work that ends up being irrelevant.
  • If the lazy loop isn't itself part of another loop, we don't need the backtracking section to push/pop any additional state on the backtracking stack.
  • Avoid outputting two gotos in a row in greedy loops.
  • Avoid outputting additional if blocks in greedy loops when the checks are identical.
  • Add some comments, in particular to loops where the logic can be tricky and hard to follow / reason about.
  • Remove some emitted semicolons after labels.
  • Fix the naming of a few "SkipBacktrack" labels to conform to the same prefixed naming convention used for other labels

Fixes #68415

- For a lazy loop, compute whether the child might match empty.  Only if it might do we need any of the checks around empty iterations, and only in those situations do we need to track the starting position or whether we've seen an empty loop, so avoid all work associated with that if the min length of the child is greater than zero.
- If the lazy loop failing will exit the match, avoid doing all the cleanup work that ends up being irrelevant.
- If the lazy loop isn't itself part of another loop, we don't need the backtracking section to push/pop any additional state on the backtracking stack.
- Avoid outputting two gotos in a row in greedy loops.
- Avoid outputting additional if blocks in greedy loops when the checks are identical.
- Add some comments, in particular to loops where the logic can be tricky and hard to follow / reason about.
- Remove some emitted semicolons after labels.
- Fix the naming of a few "SkipBacktrack" labels to conform to the same prefixed naming convention used for other labels
@stephentoub stephentoub added this to the 7.0.0 milestone Apr 25, 2022
@stephentoub stephentoub requested a review from joperezr April 25, 2022 14:05
@ghost ghost assigned stephentoub Apr 25, 2022
@ghost
Copy link

ghost commented Apr 25, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details
  • For a lazy loop, compute whether the child might match empty. Only if it might do we need any of the checks around empty iterations, and only in those situations do we need to track the starting position or whether we've seen an empty loop, so avoid all work associated with that if the min length of the child is greater than zero.
  • If the lazy loop failing will exit the match, avoid doing all the cleanup work that ends up being irrelevant.
  • If the lazy loop isn't itself part of another loop, we don't need the backtracking section to push/pop any additional state on the backtracking stack.
  • Avoid outputting two gotos in a row in greedy loops.
  • Avoid outputting additional if blocks in greedy loops when the checks are identical.
  • Add some comments, in particular to loops where the logic can be tricky and hard to follow / reason about.
  • Remove some emitted semicolons after labels.
  • Fix the naming of a few "SkipBacktrack" labels to conform to the same prefixed naming convention used for other labels

Fixes #68415

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions, tenet-performance

Milestone: 7.0.0

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I assume we already have more than enough coverage around all of these branches with our existing testbed.

@stephentoub
Copy link
Member Author

I assume we already have more than enough coverage around all of these branches with our existing testbed.

I hope so :) We at least have these code paths covered.

@stephentoub stephentoub merged commit 90d6f53 into dotnet:main Apr 25, 2022
@stephentoub stephentoub deleted the lazyloopwork branch April 25, 2022 19:02
@ghost ghost locked as resolved and limited conversation to collaborators May 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve handling of lazy loops in regex source generator / compiler

2 participants