-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Regex perf test for cache, refs #24425 #27202
Conversation
| [MeasureGCAllocations] | ||
| public void IsMatch() | ||
| { | ||
| var cacheSizeOld = Regex.CacheSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap the code inside a try finally to make sure to reset the cache size even if an Exception is thrown somewhere.
try { … } finally { Regex.CacheSize = cacheSizeOld; }| using (iteration.StartMeasurement()) | ||
| { | ||
| for (var i = 0; i < Benchmark.InnerIterationCount; i++) | ||
| s_IsMatch = Regex.IsMatch("0123456789", _regexes[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to do something with the return value? I see that you currently store it as a volatile instance field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ViktorHofer No. Just to make sure it is not optimized away by JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that's needed. In my benchmarks with at least BenchmarkDotNet I don't need to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly how JIT can optimize it away. Probably if at some iteration it doesn't produce any changes it can just drop it? I've read about this in the performance tests doc and thought it can be useful for future. Why to remove?
| { | ||
| private const int N = 50_000; | ||
| private const int UniqueRegsNum = (int)(N * 0.02); | ||
| private const int CacheSize = (int)(N * 0.02); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the number of unique patterns is the same as the cache size the cache will always hit, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Not so realistic. Don't know what's better. Make two tests: for the default cache size and for some part of unique regexes? I'll think about it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just make UniqueRegsNum = (int)(N * 0.04) or 0.06 or 0.08 or 0.10. A factor between 2 and 5 of the cache size should suffice here.
| try | ||
| { | ||
|
|
||
| Regex.CacheSize = CacheSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a normal Xunit test this would not be safe as other tests would run concurrently and see this change. One would assume this can never happen when running a perf test though..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Good note about multithreading for tests
| for (; i < N; i++) _regexes[i] = _regexes[i % UniqueRegsNum]; | ||
| } | ||
| // shuffle: | ||
| var random = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to seed for reproducibility? 42 seems favored here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Looks like not everyone favors 42 :)
| var i = 0; | ||
| for (; i < UniqueRegsNum; i++) | ||
| { | ||
| // "(0+)" "(1+)" .. "(9+)(9+)(8+)" .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these particular set of regexes? "(9+)(9+)(8+)" etc would not match "0123456789" and if you don't care whether there's a match you could have a simpler set of regexes eg "1", "2", ... "998" ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say exactly why it is better or not. They're unique, not simple (groups, quantification). So somewhat more realistic for apps I guess. Input is small so we run the test quicker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pattern is fine as it may be used for others scenarios in the future as well
|
@ViktorHofer Yeap, adding with another PR |
Performance test for System.Text.RegularExpressions.Regex cache. Refs #24425