
JIT doesn't allow method prologues to have more than one instruction group #104585

@tannergooding

Description


The JIT currently restricts the method prologue to a single instruction group (IG). This is unlike funclets or the method epilogue, which can extend across several IGs.

This is normally not a problem. However, a single group can hold only a finite number of instructions, so there are many scenarios in which the method prologue can outgrow a single group.
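To make the restriction concrete, here is a minimal toy model in C++, the JIT's implementation language. It assumes a group capacity of just 4 instructions so the overflow is easy to see. `ToyEmitter`, `InsGroup`, and `kMaxInsPerGroup` are hypothetical names and do not correspond to the real emitter's data structures. Ordinary code simply starts a new group when one fills up, while the prolog has nowhere to go, which is the situation the JIT currently asserts on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not the JIT's emitter. Each instruction group (IG)
// holds at most kMaxInsPerGroup instructions; the real limit is
// EMIT_MAX_IG_INS_COUNT or less, depending on instruction descriptor sizes.
constexpr std::size_t kMaxInsPerGroup = 4;

struct InsGroup {
    bool isProlog;
    std::size_t insCount;
};

class ToyEmitter {
public:
    // Emission starts in the single prolog group (compare emitPrologIG).
    ToyEmitter() { groups_.push_back(InsGroup{true, 0}); }

    // Appends one instruction to the current group. Returns false when the
    // prolog group is already full: a prolog may not spill into a second group.
    bool emitIns() {
        InsGroup& cur = groups_.back();
        if (cur.insCount < kMaxInsPerGroup) {
            ++cur.insCount;
            return true;
        }
        if (cur.isProlog) {
            return false;
        }
        // Ordinary code simply continues in a fresh group.
        groups_.push_back(InsGroup{false, 1});
        return true;
    }

    // Ends the prolog; subsequent instructions go to body groups.
    void endProlog() { groups_.push_back(InsGroup{false, 0}); }

    std::size_t groupCount() const { return groups_.size(); }

private:
    std::vector<InsGroup> groups_;
};
```

In this model the fifth prolog instruction is rejected, while body code keeps spilling into new groups without any problem.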

An example of this is the following program:

```csharp
using System;
using System.Numerics.Tensors;
using System.Runtime.CompilerServices;

internal class Program
{
    private static void Main(string[] args)
    {
        ReadOnlySpan<ulong> x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16];
        Console.WriteLine(Invoke(x, x));
    }

    // NoInlining keeps Invoke as a separately compiled method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static ulong Invoke(ReadOnlySpan<ulong> x, ReadOnlySpan<ulong> y)
    {
        return TensorPrimitives.ProductOfDifferences<ulong>(x, y);
    }
}
```

When this is run under a checked JIT with DOTNET_ReadyToRun=0, DOTNET_TieredCompilation=0, and DOTNET_JitStressRegs=0x80, it triggers the following assert: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emit.cpp#L9670-L9672

    /* Right now we don't allow multi-IG prologs */

    assert(emitCurIG != emitPrologIG);

This happens because we set genUseBlockInit = (genInitStkLclCnt > 4) and then, in genZeroInitFrame, use that flag to decide whether to zero the frame using SIMD stores or using what are essentially sizeof(void*)-sized stores of a native general-purpose register.

That by itself seems "bad" from a performance perspective, since it doesn't account for how large those 4 locals are and therefore whether block or scalar zeroing is "better". Independently, it means this code path breaks whenever the total number of store instructions required exceeds the limits of a single IG, as happens, for example, with 4x TYP_SIMD64 locals.
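The heuristic and its consequence can be sketched as a simplified, hypothetical model rather than the JIT's actual code. It assumes a 64-bit target with 8-byte general-purpose register stores and that each local's size is a multiple of 8; `useBlockInit` and `scalarStoreCount` are illustrative names. For 4x TYP_SIMD64 locals (4 × 64 = 256 bytes), a count of 4 does not exceed the threshold, so the scalar path is taken and needs 256 / 8 = 32 stores before any other prologue instruction is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model of the current heuristic, not the JIT's actual code.
constexpr std::size_t kPointerSize = 8; // assumption: 64-bit target such as x64

// Mirrors genUseBlockInit = (genInitStkLclCnt > 4): only the number of
// locals is considered, never their size.
bool useBlockInit(std::size_t initStkLclCnt) {
    return initStkLclCnt > 4;
}

// Pointer-sized stores the scalar path would emit, assuming every local's
// size is a multiple of the pointer size.
std::size_t scalarStoreCount(const std::vector<std::size_t>& localSizes) {
    const std::size_t totalBytes =
        std::accumulate(localSizes.begin(), localSizes.end(), std::size_t{0});
    return totalBytes / kPointerSize;
}
```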

The JIT needs to be updated to support prologues that span more than one group, so that it is robust when a prologue needs more instructions than fit in a single group. That limit is EMIT_MAX_IG_INS_COUNT, and it can be lower in practice when instructions use large instrDescs: in the failure above, we hit the limit at 61 instructions out of the maximum of 256.
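A hypothetical sketch of the requested behavior: when the current group fills up, the emitter starts a continuation group that is still marked as part of the prolog, instead of failing. It also models why the effective limit can fall below the instruction maximum, by capping each group both by instruction count and by descriptor bytes. `MultiGroupEmitter` is an illustrative name, and the 4096-byte buffer and the descriptor sizes used are assumptions, not the JIT's real values; only the 256-instruction cap comes from the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch. Each group is capped by an instruction count (standing
// in for EMIT_MAX_IG_INS_COUNT) and by an assumed descriptor buffer size, so
// groups of large descriptors hold fewer instructions.
constexpr std::size_t kMaxInsCount = 256;
constexpr std::size_t kBufferBytes = 4096; // illustrative assumption

struct Group {
    bool isProlog;
    std::size_t insCount;
    std::size_t bytesUsed;
};

class MultiGroupEmitter {
public:
    MultiGroupEmitter() { groups_.push_back(Group{true, 0, 0}); }

    // Appends an instruction whose descriptor takes descSize bytes. A full
    // group is followed by a continuation group that inherits the prolog
    // flag, rather than the emission failing.
    void emitIns(std::size_t descSize) {
        Group* cur = &groups_.back();
        const bool full = cur->insCount > 0 &&
                          (cur->insCount == kMaxInsCount ||
                           cur->bytesUsed + descSize > kBufferBytes);
        if (full) {
            const bool inProlog = cur->isProlog;
            groups_.push_back(Group{inProlog, 0, 0});
            cur = &groups_.back();
        }
        ++cur->insCount;
        cur->bytesUsed += descSize;
    }

    // Ends the prolog; later instructions go to body groups.
    void endProlog() { groups_.push_back(Group{false, 0, 0}); }

    std::size_t prologGroupCount() const {
        std::size_t n = 0;
        for (const Group& g : groups_) {
            if (g.isProlog) {
                ++n;
            }
        }
        return n;
    }

    std::size_t insCountInGroup(std::size_t index) const {
        return groups_[index].insCount;
    }

private:
    std::vector<Group> groups_;
};
```

With 8-byte descriptors the first group fills at the 256-instruction cap; with 64-byte descriptors the assumed buffer fills after 64. In both cases the remaining instructions land in a second prolog group instead of tripping an assert.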

Additionally, it would probably be beneficial for zeroing to pick its strategy based on the number of bytes that need to be zeroed rather than the number of locals.
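One possible shape for a byte-based choice, as a hypothetical sketch: `chooseZeroInit` and the 64-byte threshold are arbitrary assumptions for illustration, and a real heuristic would also need to consider the target's store widths and alignment. Unlike the count-based rule, it sends 4x TYP_SIMD64 (256 bytes) to block initialization and keeps five 8-byte locals (40 bytes) on scalar stores.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: the threshold is an illustrative assumption,
// not a tuned value from the JIT.
constexpr std::size_t kBlockInitThresholdBytes = 64;

enum class ZeroInitStrategy { ScalarStores, BlockInit };

// Chooses the zeroing strategy from the total bytes to zero, so a few large
// locals and many small locals are treated according to their actual size.
ZeroInitStrategy chooseZeroInit(const std::vector<std::size_t>& localSizes) {
    const std::size_t totalBytes =
        std::accumulate(localSizes.begin(), localSizes.end(), std::size_t{0});
    return totalBytes > kBlockInitThresholdBytes ? ZeroInitStrategy::BlockInit
                                                 : ZeroInitStrategy::ScalarStores;
}
```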

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI