Skip to content

Conversation

@stephentoub
Copy link
Member

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Concurrent;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[DisassemblyDiagnoser]
[MemoryDiagnoser]
public class Bench
{
    private ConcurrentDictionary<int, int> _ints = new ConcurrentDictionary<int, int>(Enumerable.Range(0, 1000).ToDictionary(i => i, i => i));
    private ConcurrentDictionary<string, string> _strings = new ConcurrentDictionary<string, string>(Enumerable.Range(0, 1000).ToDictionary(i => i.ToString(), i => i.ToString()));

    [Benchmark] 
    public int EnumerateInts()
    {
        int sum = 0;
        foreach (var kvp in _ints) sum += kvp.Value;
        return sum;
    }

    [Benchmark]
    public int EnumerateStrings()
    {
        int sum = 0;
        foreach (var kvp in _strings) sum += kvp.Value.Length;
        return sum;
    }
}
Method Toolchain Mean Ratio Code Size Allocated Alloc Ratio
EnumerateInts \main\corerun.exe 4.537 us 1.00 261 B 56 B 1.00
EnumerateInts \pr\corerun.exe 3.022 us 0.67 198 B 56 B 1.00
EnumerateStrings \main\corerun.exe 10.909 us 1.00 286 B 64 B 1.00
EnumerateStrings \pr\corerun.exe 9.067 us 0.83 224 B 64 B 1.00

cc: @AndyAyersMS

Copilot AI review requested due to automatic review settings June 24, 2025 02:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR streamlines the implementation of ConcurrentDictionary’s enumerator to improve performance and reduce code complexity.

  • Eliminates the explicit state machine by replacing the switch statement with a while(true) loop.
  • Adopts a primary constructor for the Enumerator class and simplifies the bucket iteration logic.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@stephentoub
Copy link
Member Author

/ba-g unrelated wasm failures

@stephentoub stephentoub merged commit 1c4f468 into dotnet:main Jun 24, 2025
81 of 87 checks passed
@stephentoub stephentoub deleted the cdenumerator branch June 24, 2025 12:29
@AndyAyersMS
Copy link
Member

This speedup is likely from getting rid of the irreducible loop. As #25448 (comment) notes, [AggressiveInlining] would give an extra boost, currently just for value class cases.

I'll take another look at the JIT heuristics and see if there's some other way to get inlining to happen.

@AndyAyersMS
Copy link
Member

I'll take another look at the JIT heuristics and see if there's some other way to get inlining to happen.

With this change the JIT will inline MoveNext along the "successful" GDV path, but not along the failed one, and since the enumerator allocation is not under a GDV, CEA doesn't kick in, so the enumerator looks like it escapes, so we don't remove the allocation.

We should be able to resolve those GDVs earlier (for value type instantiations) but we currently don't propagate the type information fast enough, so the GDVs are still there during escape analysis and make the JIT overly conservative.

I think I can add a post-pass between inlining and EA where if we have discovered the exact type of an enumerator we can try and aggressively resolve its GDVs.

AggressiveInlining fixes this problem by forcibly inlining along the failed path as well, so works even when the GDVs are not resolved; that's why my perf numbers were better.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants