Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@artkpv
Copy link
Contributor

@artkpv artkpv commented Feb 16, 2018

Performance test for System.Text.RegularExpressions.Regex cache. Refs #24425

[MeasureGCAllocations]
public void IsMatch()
{
var cacheSizeOld = Regex.CacheSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap the code inside a try finally to make sure to reset the cache size even if an Exception is thrown somewhere.

try {} finally { Regex.CacheSize = cacheSizeOld; }

using (iteration.StartMeasurement())
{
for (var i = 0; i < Benchmark.InnerIterationCount; i++)
s_IsMatch = Regex.IsMatch("0123456789", _regexes[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to do something with the return value? I see that you currently store it as a volatile instance field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ViktorHofer No. Just to make sure it is not optimized away by JIT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that's needed. In my benchmarks with at least BenchmarkDotNet I don't need to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly how JIT can optimize it away. Probably if at some iteration it doesn't produce any changes it can just drop it? I've read about this in the performance tests doc and thought it can be useful for future. Why to remove?

{
private const int N = 50_000;
private const int UniqueRegsNum = (int)(N * 0.02);
private const int CacheSize = (int)(N * 0.02);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the number of unique patterns is the same as the cache size the cache will always hit, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Not so realistic. Don't know what's better. Make two tests: for the default cache size and for some part of unique regexes? I'll think about it more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just make UniqueRegsNum = (int)(N * 0.04) or 0.06 or 0.08 or 0.10. A factor between 2 and 5 of the cache size should suffice here.

try
{

Regex.CacheSize = CacheSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a normal Xunit test this would not be safe as other tests would run concurrently and see this change. One would assume this can never happen when running a perf test though..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Good note about multithreading for tests

for (; i < N; i++) _regexes[i] = _regexes[i % UniqueRegsNum];
}
// shuffle:
var random = new Random();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to seed for reproducibility? 42 seems favored here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Looks like not everyone favors 42 :)

var i = 0;
for (; i < UniqueRegsNum; i++)
{
// "(0+)" "(1+)" .. "(9+)(9+)(8+)" ..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these particular set of regexes? "(9+)(9+)(8+)" etc would not match "0123456789" and if you don't care whether there's a match you could have a simpler set of regexes eg "1", "2", ... "998" ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say exactly why it is better or not. They're unique, not simple (groups, quantification). So somewhat more realistic for apps I guess. Input is small so we run the test quicker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pattern is fine as it may be used for others scenarios in the future as well

@ViktorHofer
Copy link
Member

@artkpv should we close this one now as the change is already included in #27278?

@artkpv
Copy link
Contributor Author

artkpv commented Mar 1, 2018

@ViktorHofer Yeap, adding with another PR

@artkpv artkpv closed this Mar 1, 2018
@artkpv artkpv deleted the RegexPerfTest_24425 branch March 3, 2018 12:55
@karelz karelz added this to the 2.1.0 milestone Mar 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants