-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Regex refactoring #27473
Regex refactoring #27473
Conversation
| /// <summary> | ||
| /// Returns an empty Match object. | ||
| /// </summary> | ||
| public static Match Empty => new Match(null, 1, string.Empty, 0, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a mistake. Previously it would only make a singleton empty match object, now it's making one every time it's requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see a test is failing for this reason: NextMatch_EmptyMatch_ReturnsEmptyMatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To speed things up I pushed this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction, I didn't as I don't have permissions. It does fix the test though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this need to be { get; } = instead of =>
|
|
||
| _canonical = true; | ||
| _rangelist.Sort(SingleRangeComparer.Instance); | ||
| _rangelist.Sort(SingleRangeComparer.s_instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style wise I don't know how we feel about directly accessing fields, even of nested classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see we were already doing it, you renamed. Fine.
danmoseley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if you fix the Match.Empty mistake.
Also please do a sanity test of some perf scenario.
I agree let's not squash this.
| int IList<Capture>.IndexOf(Capture item) | ||
| { | ||
| var comparer = EqualityComparer<Capture>.Default; | ||
| EqualityComparer<Capture> comparer = EqualityComparer<Capture>.Default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, I would suggest looking at not storing this into a local first. With devirtualization support for EqualityComparer.Default, this might be better as:
for (int i = 0; i < Count; i++)
{
if (EqualityComparer<Capture>.Default.Equals(this[i], item)
{
return i;
}
}
return -1;There's also a lot of code behind the seemingly innocent this[i]; something else to look at separately.
src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Group.cs
Show resolved
Hide resolved
| public static readonly TimeSpan InfiniteMatchTimeout = Timeout.InfiniteTimeSpan; | ||
|
|
||
| // timeout for the execution of this regex | ||
| // timeout for the execution of this regex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it's strange to me to see a period added at the end, when it's not a full sentence and doesn't begin with a capital
| if (caps == null) | ||
| { | ||
| caps = new Hashtable(value); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this (and other cases like it) can be simplified, e.g.
caps = value as Hashtable ?? new Hashtable(value);| return RegexCompiler.Compile(code, roptions); | ||
| } | ||
|
|
||
| [SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "assemblyname", Justification = "Microsoft: already shipped since v1 - can't fix without causing a breaking change")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we run these FxCop naming rules still? I'm wondering if all of these SuppressMessage attributes can just be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, we can remove them all.
| sb.Append(s_hex[(ch >> shift) & 0xF]); | ||
| } | ||
|
|
||
| return sb.ToString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this CharDescription code does not need to be efficient? (It's not ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tell me more please :) We loose a lot of performance in RegexCharClass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We loose a lot of performance in RegexCharClass.
I was referring specifically to these "Description" methods. They're used in common code paths in release builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, that method is only used in DEBUG code.
| private sealed class SingleRangeComparer : IComparer<SingleRange> | ||
| { | ||
| public static readonly SingleRangeComparer Instance = new SingleRangeComparer(); | ||
| public static readonly SingleRangeComparer s_instance = new SingleRangeComparer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is a public field, so we generally have used PascalCasing. It just happens to be on a private type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should adjust our .editorconfig file as VS recommends such a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to internal would be non breaking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a private class
13eba3c to
4407742
Compare
Are you going to do any squashing prior to merging? There are a huge number of commits in this PR, and many of them don't need to stand on their own. |
|
Yes I will squash the appropriate ones before merging. E.g. code cleanups together, split of Regex class into subfiles together, Span for Match & Capture |
|
Thanks. |
f339978 to
7716503
Compare
Done.
Just done locally. Results are the same for ctor construction and Match. Replace could be slightly faster because we use Slice instead of Substring now. I will merge this when CI turns green. |
Move Cache related code to Regex.Cache.cs Move Match related code to Regex.Match.cs Move Replace related code to Regex.Replace.cs Move Split related code to Match.Split.cs Move Timeout related code to Regex.Timeout.cs
Code cleanup in RegexBoyerMoore Code Cleanup in RegexCharClass RegexFCD code cleanup Code cleanup & limit visibility of helper methods in RegexNode Code cleanup in RegexReplacement Code cleanup in RegexTree Group code cleanup Match code cleanup Capture code cleanup Fix Match.Empty initalization to only compute ones Regex code cleanup and comments added
|
@dotnet-bot test this No idea what's going on here. AFAIK AsReadOnlySpan takes 2 arguments... Ah it is renamed to ToSpan: 913d11e |
7716503 to
6bbdbbe
Compare
Use Span in Match and avoid intermidiate string allocation
6bbdbbe to
072036f
Compare
|
MetadataReaderTests failed on OSX10.12 with BIFException. I"ll open an issue if that proves to be more than a build glitch. |
|
I nearly squashed... |
|
@stephentoub I overlooked that you hadn't hit signoff. |
I guess that means I have a get-out-of-jail-free card if this causes everything to go haywire. |
…() (dotnet#27473) * Revert removal of SuppressGCTransition from SystemNative_GetTimestamp() * Insert GC_POLL before statement with unmanaged call. * JIT test for insertion of GCPoll Signed-off-by: dotnet-bot <[email protected]>
…() (#27473) * Revert removal of SuppressGCTransition from SystemNative_GetTimestamp() * Insert GC_POLL before statement with unmanaged call. * JIT test for insertion of GCPoll Signed-off-by: dotnet-bot <[email protected]>
From the previous PR with the changes now grouped into multiple commits. No squash please!
@stephentoub I will incorporate your feedback here later today.