Fix perf bug in RegexCompiler when handling .*? #118373

stephentoub · 2025-08-05T02:55:41Z

We have a special code path that optimizes a singleline `.*?`: because the lazy loop can match anything, we can simply search for whatever comes after the loop in the pattern. Unfortunately, we're passing the wrong node to the EmitIndexOf helper that emits that search. We should be passing the node that represents the subsequent literal, but we're accidentally passing the set loop itself. Since we only reach this path when that set loop matches everything, we end up emitting an `IndexOfAnyInRange(0, \uFFFF)` call. This is functionally fine, but perf tanks, because we end up doing non-trivial work for every character the loop matches.
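To make the cost concrete, here is a simplified Python model of how a singleline `.*?` followed by a literal gets matched. It is a sketch under stated assumptions, not the IL that RegexCompiler actually emits: all function names are hypothetical, `index_of_any_in_range` only mimics the shape of .NET's `IndexOfAnyInRange`, and `str.find` stands in for an IndexOf of the literal. When the search target is the loop's own all-matching set, every search lands on the very next character, so the literal is tried at every position; searching for the literal jumps straight to the candidate.

```python
import re

def index_of_any_in_range(text: str, start: int, lo: int, hi: int) -> int:
    """Hypothetical stand-in for IndexOfAnyInRange: first index >= start
    whose code point lies in [lo, hi], or -1 if there is none."""
    for i in range(start, len(text)):
        if lo <= ord(text[i]) <= hi:
            return i
    return -1

def search_buggy(text: str, start: int, literal: str) -> int:
    # Searches for the loop's own set [\u0000-\uFFFF] instead of the literal.
    # For BMP text this always returns `start`, so the "search" advances
    # exactly one character per call.
    return index_of_any_in_range(text, start, 0x0000, 0xFFFF)

def search_fixed(text: str, start: int, literal: str) -> int:
    # Searches for the literal that follows the loop, jumping to a real candidate.
    return text.find(literal, start)

def match_lazy_any_then_literal(text, start, literal, search):
    """Model of a singleline `.*?` followed by `literal`, anchored at `start`.
    Returns (end of match or -1, number of positions where the literal was tried)."""
    attempts = 0
    pos = search(text, start, literal)
    while pos >= 0:
        attempts += 1
        if text.startswith(literal, pos):
            return pos + len(literal), attempts
        # Lazy loop consumes more input: find the next candidate position.
        pos = search(text, pos + 1, literal)
    return -1, attempts

text = "x" * 1000 + "END"
print(match_lazy_any_then_literal(text, 0, "END", search_buggy))  # (1003, 1001)
print(match_lazy_any_then_literal(text, 0, "END", search_fixed))  # (1003, 1)
print(re.match(r"(?s).*?END", text).end())                        # 1003
```

Both variants report the same match end, which is why the bug was functionally invisible; only the amount of work differs (1001 positions tried versus 1 in this input).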

Copilot

Pull Request Overview

This PR fixes a performance bug in the regex compiler's optimization for a singleline lazy `.*?` loop followed by a literal. For such patterns, the compiler was passing the loop node, rather than the subsequent literal node, to the IndexOf emission helper.

Corrects the node parameter passed to EmitIndexOf from the loop node to the literal node
Fixes performance degradation caused by inefficient IndexOfAnyInRange(0, \uFFFF) calls
Maintains functional correctness while improving performance for this optimization path

dotnet-policy-service · 2025-08-05T02:56:36Z

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

MihaZupan

Nice. Did you spot this in some benchmark given it's compiler-only?

stephentoub · 2025-08-05T11:01:49Z

Yup, the numbers I was getting out made no sense.

stephentoub requested review from MihaZupan and Copilot August 5, 2025 02:55

github-actions bot added the area-System.Text.RegularExpressions label Aug 5, 2025

Copilot AI reviewed Aug 5, 2025

dotnet-policy-service bot assigned stephentoub Aug 5, 2025

stephentoub enabled auto-merge (squash) August 5, 2025 02:56

build-analysis bot mentioned this pull request Aug 5, 2025 in "Unable to pull image from mcr.microsoft.com" #117164 (open)

MihaZupan approved these changes Aug 5, 2025

stephentoub merged commit 0686df9 into dotnet:main Aug 5, 2025
80 of 82 checks passed

stephentoub deleted the fixanysearch branch August 5, 2025 11:01

github-actions bot locked and limited conversation to collaborators Sep 5, 2025