Skip to content

Conversation

@jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jan 8, 2024

Delete the old loop canonicalization. Implement canonicalization on new loops; the only canonicalization done is creation of preheaders.

Diffs expected; in particular due to ambiguous cases where the old loop finding identifies more loops than the new loop finding. The old loop finding would then canonicalize these extra loops in such a way that new loop finding also found them. That is, the old pipeline:

old loop finding -> old loop canonicalization -> new loop finding

would result in more loops found than

old loop finding -> new loop finding -> new loop canonicalization

An example of a program where it is ambiguous how many loops it contains based only on the flow graph is the following:

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(int n)
{
    n *= 2;

    do
    {
        if (n == 3)
        {
            n -= 2;
            continue;
        }

        if (n == 4)
        {
            n -= 3;
            continue;
        }

        if (n == 5)
        {
            n -= 4;
            continue;
        }

        n--;
    } while (n > 0);
}

Old loop finding finds 4 loops in this program, while new loop finding finds 1 loop. The backedges (continue statements) can either be seen as nested loop backedges or as backedges to the outer loop). Here is another, opposite case:

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(int n)
{
    n *= 2;
    do
    {
        do
        {
            n *= 3;
        } while (n < 5);

        n -= 10;
    } while (n != 0);
}

Old loop finding finds 2 loops but new loop finding finds 1 loop.

The differences in loop identification leads to the diffs. Instead of trying to proactively handle them, we are going to see if the perf lab finds any examples that we can gain insight on. One avenue to explore may be to use PGO information to figure out what may be nested loops vs what may just be backedges.

I validated that diffs are coming from this by writing a small program to go through all contexts < 50 KiB with diffs in libraries_tests.run and verify that the baseline found more loops than the diff:

Processed 554/554 contexts, 554 with new loops, 528 due to canonicalizing nested inner loop backedges

Nested inner loop backedges is what old loop canonicalization calls the situation where it creates a new "top" block in the above case, that then would cause new loop finding to also recognize the new loop.

The remaining case 26 cases seemed to be cases where new loop canonicalization did not need to create a preheader, but old loop canonicalization did, because of old loop recognition identifying a smaller nested inner loop that did not contain one of the backedges. Example: old, new, BB05 only needs a preheader in "old".

Invalidate the old loop table immediately after old loop finding. Remove
`m_oldToNewLoop` and `m_newToOldLoop`. Move `BBF_OLD_LOOP_HEADER_QUIRK`
to be set directly by old loop finding.
@ghost ghost assigned jakobbotsch Jan 8, 2024
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 8, 2024
@ghost
Copy link

ghost commented Jan 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch jakobbotsch changed the title Canonicalize new loops JIT: Create preheaders on new loops and delete old loop canonicalization Jan 8, 2024
@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jan 9, 2024

cc @dotnet/jit-contrib PTAL @BruceForstall

The libraries-jitstress failure is #96715. The jitstress one looks like an infra issue (restore failed on osx-arm64).

Diffs. As mentioned above, due to differences in loop finding around ambiguous nested inner loop cases.
We will be able to regain the TP regression once we get rid of the old loop compaction code; there is an unnecessary old dominator computation right now because of it.

// recorded in terms of block numbers, so flag it invalid.
fgDomsComputed = false;
fgRenumberBlocks();
fgUpdateChangedFlowGraph(FlowGraphUpdates::COMPUTE_DOMS);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to recompute the old dominators here since optFindAndScaleGeneralLoopBlocks currently depends on them, but we should be able to get rid of this computation very soon. The current TP regressions are because of cases where we compute the dominators here and then also again after new canonicalization; in those cases only the latter computation is necessary (but it's a little hard to detect, so I just left it in both places for now).

Comment on lines +4888 to +4911
void Compiler::optFindNewLoops()
{
m_loops = FlowGraphNaturalLoops::Find(m_dfsTree);

if (optCanonicalizeLoops(m_loops))
{
fgUpdateChangedFlowGraph(FlowGraphUpdates::COMPUTE_DOMS);
m_dfsTree = fgComputeDfs();
m_loops = FlowGraphNaturalLoops::Find(m_dfsTree);
}

// Starting now, we require all loops to have pre-headers.
optLoopsRequirePreHeaders = true;

// Leave a bread crumb for future phases like loop alignment about whether
// looking for loops makes sense. We generally do not expect phases to
// introduce new cycles/loops in the flow graph; if they do, they should
// set this to true themselves.
// We use more general cycles over "m_loops->NumLoops() > 0" here because
// future optimizations can easily cause general cycles to become natural
// loops by removing edges.
fgMightHaveNaturalLoops = m_dfsTree->HasCycle();
assert(fgMightHaveNaturalLoops || (m_loops->NumLoops() == 0));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why in the diff this shows up as a big addition... I only added the canonicalization part here.

@jakobbotsch
Copy link
Member Author

I pushed a minor changes, but this passed jitstress/libraries-jitstress before those.

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
…ion (dotnet#96647)

Delete the old loop canonicalization. Implement canonicalization on new loops;
the only canonicalization done is creation of preheaders.

Diffs expected, in particular due to ambiguous cases where the old loop finding
identifies more loops than the new loop finding. The old loop finding would then
canonicalize these extra loops in such a way that new loop finding also found
them. That is, the old pipeline:

```
old loop finding -> old loop canonicalization -> new loop finding
```

would result in more loops found than

```
old loop finding -> new loop finding -> new loop canonicalization
```

An example of a program where it is ambiguous how many loops it contains based
only on the flow graph is the following:

```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(int n)
{
    n *= 2;

    do
    {
        if (n == 3)
        {
            n -= 2;
            continue;
        }

        if (n == 4)
        {
            n -= 3;
            continue;
        }

        if (n == 5)
        {
            n -= 4;
            continue;
        }

        n--;
    } while (n > 0);
}
```

Old loop finding finds 4 loops in this program, while new loop finding finds 1
loop. The backedges (`continue` statements) can either be seen as nested loop
backedges or as backedges to the outer loop). Here is another, opposite case:

```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(int n)
{
    n *= 2;
    do
    {
        do
        {
            n *= 3;
        } while (n < 5);

        n -= 10;
    } while (n != 0);
}
```

Old loop finding finds 2 loops but new loop finding finds 1 loop.

The differences in loop identification lead to the diffs. Instead of trying to
proactively handle them, we are going to see if the perf lab finds any examples
that we can gain insight on. One avenue to explore may be to use PGO information
to figure out what may be nested loops vs what may just be backedges.
@github-actions github-actions bot locked and limited conversation to collaborators Feb 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants