Skip to content

Optimize file system enumeration pattern matching with specialized delegates#123041

Open
Copilot wants to merge 16 commits intomainfrom
copilot/optimize-file-system-enumeration
Open

Optimize file system enumeration pattern matching with specialized delegates#123041
Copilot wants to merge 16 commits intomainfrom
copilot/optimize-file-system-enumeration

Conversation

Copy link
Contributor

Copilot AI commented Jan 9, 2026

Description

File system enumeration was re-checking pattern types on every entry to determine if fast paths apply (*literal → EndsWith). This moves pattern analysis to enumerable creation time and creates specialized delegates upfront.

Changes:

  • Add GetPredicate<T> that analyzes patterns once and returns optimized FileSystemEnumerable<T>.FindPredicate delegates:
    • * → always-true predicate (with IsDirectory/IsFile check)
    • literalEquals for exact filename matching (e.g., "log.txt" across directory hierarchies)
    • *literalEndsWith (existing optimization, moved earlier)
    • literal*StartsWith
    • *literal*Contains
    • prefix*suffixStartsWith + EndsWith with minimum length check
  • Complex patterns fall back to full NFA-based matching
  • Extract wildcard constants as SearchValues<char> (s_simpleWildcards, s_extendedWildcards) to internal static readonly fields in FileSystemName for shared use
  • Combine IsDirectory/IsFile check with pattern matcher into a single delegate invocation (avoids double delegate call)
  • Capture only expression in lambdas and compute span slices inline on each invocation to avoid string allocations and minimize capture overhead
  • Switch over (useExtendedWildcards, entryType) tuple to return distinct delegates without capturing unnecessary variables
  • Restructured pattern matching logic to use switch statement on (startsWithStar, endsWithStar) tuple for cleaner code organization

Customer Impact

Performance improvement for file enumeration with common glob patterns. Each file entry now calls a simple string operation instead of re-evaluating pattern structure and potentially running the full matching algorithm.

Regression

No. This is a performance optimization.

Testing

  • All existing FileSystem tests pass
  • Added unit test cases in FileSystemNameTests.cs for literal patterns, literal* (StartsWith), *literal* (Contains), and prefix*suffix patterns
  • Added end-to-end integration tests in PatternTransformTests.cs that reference existing SimpleMatchData and Win32MatchData theory data from FileSystemNameTests, creating actual files and validating enumeration results for each pattern type through the full enumeration path including the optimized delegates

Risk

Low. The optimization preserves existing behavior by falling back to full pattern matching for any pattern not matching the fast paths. All existing tests pass unchanged.

Original prompt

When doing file system enumeration, every entry returned by the is is then checked to see whether it matches the caller-provided pattern. That's handled by a delegate that eventually calls into a Match Pattern routine. That routine has special-case, eg if the pattern is *literal it'll just do EndsWith. But that means that on every entry it's rechecking whether any of those fast paths apply. Instead, the entry points that create those delegates could create different delegates for each of the most important special-case, such that each call doesn't need to recheck the condition. It'll also be good to add additional special-case. For example, right now it special cases *literal to use EndsWith, but not literal* to use Starts With, and not *literal* to use Contains. Please optimize all of these things.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Copilot AI changed the title [WIP] Optimize file system enumeration pattern matching Optimize file system enumeration pattern matching with specialized delegates Jan 9, 2026
Copilot AI requested a review from stephentoub January 9, 2026 20:13
…, combine IsDirectory check with matcher

Co-authored-by: stephentoub <[email protected]>
@stephentoub
Copy link
Member

@EgorBot -amd -intel -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(false)]
public partial class Tests
{
    private string _directoryPath;

    [GlobalSetup]
    public void Setup()
    {
        _directoryPath = Path.GetTempFileName();
        File.Delete(_directoryPath);

        Directory.CreateDirectory(_directoryPath);
        for (int i = 0; i < 10_000; i++)
        {
            File.Create(Path.Combine(_directoryPath, $"file{i}.txt")).Dispose();
        }
    }

    [Benchmark]
    public int Sum1()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath)) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int Sum2()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "file*")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int Sum3()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "*.txt")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int Sum4()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "*12*")) sum += path.Length;
        return sum;
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        Directory.Delete(_directoryPath, true);
    }
}

@stephentoub stephentoub marked this pull request as ready for review January 11, 2026 22:38
Copilot AI review requested due to automatic review settings January 11, 2026 22:38
@stephentoub
Copy link
Member

@EgorBot -amd -intel -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
[MemoryDiagnoser(false)]
public partial class Tests
{
    private string _directoryPath;

    [GlobalSetup]
    public void Setup()
    {
        _directoryPath = Path.GetTempFileName();
        File.Delete(_directoryPath);

        Directory.CreateDirectory(_directoryPath);
        for (int i = 0; i < 10_000; i++)
        {
            File.Create(Path.Combine(_directoryPath, $"file{i}.txt")).Dispose();
        }
    }

    [Benchmark]
    public int NoPattern()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath)) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int Literal()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "file123.txt")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int LiteralStar()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "file*")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int StarLiteral()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "*.txt")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int StarLiteralStar()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "*12*")) sum += path.Length;
        return sum;
    }

    [Benchmark]
    public int LiteralStarLiteral()
    {
        int sum = 0;
        foreach (string path in Directory.EnumerateFiles(_directoryPath, "12*.txt")) sum += path.Length;
        return sum;
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        Directory.Delete(_directoryPath, true);
    }
}

@stephentoub
Copy link
Member

@copilot run the code review skill

Copy link
Contributor Author

Copilot AI commented Feb 8, 2026

@copilot run the code review skill

Here's my review of the current state of the PR:

Correctness — No issues found:

  • The pattern analysis logic correctly identifies each special case and falls back to full NFA matching for anything it can't optimize.
  • The prefix*suffix optimization correctly handles the edge case where prefix and suffix overlap in the matched name (e.g., pattern "a*a" matching "aa") — the minLength check ensures the name is long enough for both parts to fit.
  • The literal pattern check correctly excludes expressions containing \\ (the escape character) which need full pattern processing.
  • The EscapeExpression refactoring preserves the original behavior: the old code had ContainsAny as a guard before 4 separate Replace calls; the new code does IndexOfAny internally in EscapeExpression and handles all escapes in a single pass, which is both correct and more efficient.
  • The notLastPeriod simplification using name.Slice(nameOffset).Contains('.') is equivalent to the original loop.

Performance — Two observations:

⚠️ prefix*suffix captures prefixLength, suffixLength, and minLength as closure locals. Earlier feedback asked to avoid captured locals and just compute from expression inline. This case is slightly different (computing from expression.IndexOf('*') would require re-finding the star index on each call), so caching prefixLength and minLength is reasonable since they're plain ints and the closure struct is allocated once regardless. But it's worth noting the inconsistency with the other cases.

💡 The UserFiles/UserDirectories/UserEntries methods now pass isNormalized: false, expression to the internal constructor, while previously they used the public 3-arg constructor (which defaulted to isNormalized: false, expression: null). Passing the expression is correct since FileSystemEnumerable already supported it, and it enables the enumerator infrastructure to use the expression. No issue here, just noting the change.

Tests — Good coverage:

  • The end-to-end tests correctly reference SimpleMatchData and Win32MatchData via MemberData, avoiding duplication.
  • The skip conditions in the end-to-end tests are reasonable: null/empty inputs, invalid filename chars, hidden files (.-prefix on Unix), and *.*/. patterns that have special Win32 translation semantics.

No blocking issues found. The PR looks correct and well-structured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants