-
Notifications
You must be signed in to change notification settings - Fork 5.3k
A bunch of small regex tweaks #66572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsI spent some time exploring combining all of the TryFindXx and TryMatchXx methods in RegexCompiler / source generator into just a single Scan. It doesn't yet appear to be worth doing, but along the way I tweaked a bunch of things, and am submitting that as a PR separately. Functional:
Performance:
Most everything else is for code readability / maintenance.
|
deeprobin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall good to me - Good job!
👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, it would be useful to add a comment here explaining this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment why supressing the _regexTree nullable warning would be imo useful.
How do we know that _regexTree is not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment why supressing the _regexTree nullable warning would be imo useful.
How do we know that _regexTree is not null?
EmitScope's usage was resulting in a "//" line being output. We can just delete all remaining usage of EmitScope; it's no longer helpful.
…nd optimization Store the end of the loop position into the unused runtrackpos, and use that to jump right to that position instead of the normal loop handling code.
2aafaea to
5567159
Compare
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
joperezr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some small questions but this looks good otherwise. I like the simplified codegen this produces for most common cases.
Yeah, this covers ~1/3rd of the cases in our corpus, so that's decent. |
* Remove extra blank comment line EmitScope's usage was resulting in a "//" line being output. We can just delete all remaining usage of EmitScope; it's no longer helpful. * Fix some blocks not using EmitBlock * Minor tweak to add some uint casts to eliminate a few bounds checks * Avoid rematching loops already matched with the literal-after-loop find optimization Store the end of the loop position into the unused runtrackpos, and use that to jump right to that position instead of the normal loop handling code. * Remove unnecessary writer argument to the SliceInputSpan local function * Make sure we have some RightToLeft culture tests * Use a more canonical form of loop in EmitLiteralAfterAtomicLoop * Avoid unnecessary loop in literal-after-atomic-loop for no min bound * Add more emitted comments to TryFindNextPossibleStartingPosition * Invert some anchor checks to make the code cleaner * Remove stray "global::" * Rename original_pos to matchStart * Add an additional reduction for concats containing nothings * Fix exception type thrown for invalid NonBacktracking options combination * Add special-cased code emitting to EmitScan * Avoid label in TryFindNextPossibleStartingPosition when it's unnecessary
I spent some time exploring combining all of the TryFindXx and TryMatchXx methods in RegexCompiler / source generator into just a single Scan. It doesn't yet appear to be worth doing, but along the way I tweaked a bunch of things, and am submitting that as a PR separately.
Functional:
Performance:
Most everything else is for code readability / maintenance.