
Vectorize SearchValues.Create min/max scan using Vector128 #124667

Closed
Copilot wants to merge 3 commits into main from copilot/vectorize-searchvalues-create
Contributor

Copilot AI commented Feb 20, 2026

SearchValues.Create walks input values to find min/max (used to select the optimal search implementation) using a scalar loop. This vectorizes that scan to process multiple elements per iteration.

Description

SearchValues.cs — vectorized TryGetSingleRange

Added a GetMinMax<T> helper that computes min and max in a single vectorized pass:

  • Uses Vector128.Min / Vector128.Max to process Vector128<T>.Count elements per iteration (8 ushorts or 16 bytes at once)
  • Handles the tail with a final overlapping vector load
  • Horizontal reduction is a scalar loop over the ≤16 lane results
  • char is not a valid Vector128 element type — the span is cast to ReadOnlySpan<ushort>, and the results are converted back with Unsafe.BitCast<ushort, T>
  • Falls back to a scalar loop when SIMD is unavailable or input is smaller than one vector
// Before: one element at a time
foreach (T value in values) { min = T.Min(min, value); max = T.Max(max, value); }

// After: Vector128<T>.Count elements per iteration, min and max in a single pass
do {
    Vector128<T> v = Vector128.LoadUnsafe(ref current);
    vMin = Vector128.Min(vMin, v);
    vMax = Vector128.Max(vMax, v);
    current = ref Unsafe.Add(ref current, Vector128<T>.Count);
} while (Unsafe.IsAddressLessThan(ref current, ref lastVectorStart));
// + overlapping tail + horizontal reduction
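
The snippet above elides the overlapping tail and the horizontal reduction. The following is a minimal, self-contained sketch of how those pieces could fit together for ushort input. It is an illustration only, not the PR's actual GetMinMax<T>: the MinMaxSketch class name, the index-based loop, and the empty-input check are assumptions made for this example.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical illustration; the names and structure here are assumptions,
// not the code from the PR.
public static class MinMaxSketch
{
    public static (ushort Min, ushort Max) GetMinMax(ReadOnlySpan<ushort> values)
    {
        if (values.IsEmpty)
        {
            throw new ArgumentException("Input must not be empty.", nameof(values));
        }

        ushort min = values[0];
        ushort max = values[0];

        // Scalar fallback: no SIMD support, or fewer elements than one vector holds.
        if (!Vector128.IsHardwareAccelerated || values.Length < Vector128<ushort>.Count)
        {
            foreach (ushort value in values)
            {
                min = Math.Min(min, value);
                max = Math.Max(max, value);
            }

            return (min, max);
        }

        ref ushort start = ref MemoryMarshal.GetReference(values);
        nuint count = (nuint)Vector128<ushort>.Count;

        // Element offset of the last full vector that still fits in the input.
        nuint lastVectorStart = (nuint)(values.Length - Vector128<ushort>.Count);

        // Seed both accumulators with the first vector.
        Vector128<ushort> vMin = Vector128.LoadUnsafe(ref start);
        Vector128<ushort> vMax = vMin;

        // Main loop: one full vector per iteration, min and max in the same pass.
        for (nuint i = count; i < lastVectorStart; i += count)
        {
            Vector128<ushort> v = Vector128.LoadUnsafe(ref start, i);
            vMin = Vector128.Min(vMin, v);
            vMax = Vector128.Max(vMax, v);
        }

        // Overlapping tail: reload the final full vector. Some lanes may already
        // have been seen, which is harmless because min and max are idempotent.
        Vector128<ushort> tail = Vector128.LoadUnsafe(ref start, lastVectorStart);
        vMin = Vector128.Min(vMin, tail);
        vMax = Vector128.Max(vMax, tail);

        // Horizontal reduction: scalar loop over the lanes of each accumulator.
        for (int lane = 0; lane < Vector128<ushort>.Count; lane++)
        {
            min = Math.Min(min, vMin.GetElement(lane));
            max = Math.Max(max, vMax.GetElement(lane));
        }

        return (min, max);
    }
}
```

For char input, the span can be reinterpreted before the call, for example `MinMaxSketch.GetMinMax(MemoryMarshal.Cast<char, ushort>(text.AsSpan()))`. For the Base64 alphabet used in the benchmark, that yields a minimum of 43 ('+') and a maximum of 122 ('z').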

Testing

Performance was validated using @EgorBot on Linux/AMD64 and macOS/ARM64 with SearchValues.Create on the 64-character Base64 alphabet as the benchmark input.

Risk

Low — GetMinMax is only called during SearchValues.Create, not on hot search paths. The scalar fallback is identical to the original code.



@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 20, 2026
@MihaZupan MihaZupan added area-System.Buffers and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 20, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

@MihaZupan MihaZupan added the NO-REVIEW Experimental/testing PR, do NOT review it label Feb 20, 2026
Copilot AI changed the title [WIP] Vectorize min and max computation in SearchValues.Create Vectorize SearchValues.Create min/max scan and ASCII bitmap construction Feb 20, 2026
Copilot AI requested a review from MihaZupan February 20, 2026 20:12
Copilot AI changed the title Vectorize SearchValues.Create min/max scan and ASCII bitmap construction Vectorize SearchValues.Create min/max scan using Vector128 Feb 20, 2026
@MihaZupan
Member

@copilot Use the benchmark skill to check the performance difference of SearchValues.Create for chars by using the Base64 alphabet as the input.

Contributor Author

Copilot AI commented Feb 20, 2026

@copilot Use the benchmark skill to check the performance difference of SearchValues.Create for chars by using the Base64 alphabet as the input.

@EgorBot -linux_amd -osx_arm64

using System.Buffers;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    // Standard Base64 alphabet (64 chars)
    private const string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    [Benchmark]
    public SearchValues<char> Create_Base64Alphabet() =>
        SearchValues.Create(Base64Chars);
}

@danmoseley
Member

EgorBot Benchmark Analysis

Linux AMD64 — AMD EPYC 9V74 (Genoa)

| Method                | Toolchain  | Mean     | Ratio |
|-----------------------|------------|----------|-------|
| Create_Base64Alphabet | PR #124667 | 133.1 ns | 1.00  |
| Create_Base64Alphabet | main       | 314.3 ns | 2.36  |

macOS ARM64 — Apple M1

| Method                | Toolchain  | Mean     | Ratio |
|-----------------------|------------|----------|-------|
| Create_Base64Alphabet | PR #124667 | 130.7 ns | 1.00  |
| Create_Base64Alphabet | main       | 262.7 ns | 2.01  |

Summary: The vectorized Vector128 min/max scan is a clear win on both platforms: ~2.4x faster on AMD64 (314.3 ns down to 133.1 ns) and ~2.0x faster on ARM64 (262.7 ns down to 130.7 ns) for SearchValues.Create with the 64-char Base64 alphabet. That is a solid improvement for a low-risk change, since only the Create path is affected and not the hot search paths.

Full results from EgorBot
