Skip to content

[OpenMP][OMPIRBuilder] Avoid querying SmallPtrSet during removal#198690

Merged
chichunchen merged 3 commits into
mainfrom
users/cchen/mlir_openmp_fuse_loop_dead_blocks
May 20, 2026
Merged

[OpenMP][OMPIRBuilder] Avoid querying SmallPtrSet during removal#198690
chichunchen merged 3 commits into
mainfrom
users/cchen/mlir_openmp_fuse_loop_dead_blocks

Conversation

@chichunchen
Copy link
Copy Markdown
Contributor

@chichunchen chichunchen commented May 20, 2026

openmp-cli-fuse02.mlir can intermittently leave behind a dead block after loop fusion, causing LLVM IR verification to fail. (#197637 (comment))

This happened because removeUnusedBlocksFromParent queried BBsToErase while using SmallPtrSet::remove_if on the same set, which made the result depend on the set's internal mutation order during removal.

This patch tracks candidate blocks by index with SmallBitVector instead of mutating and querying the same SmallPtrSet. This also preserves the original BBs order when collecting the final dead blocks.


The original PR was created with assistance from Copilot. The later analysis and fix were revised based on reviewer feedback.

Co-authored-by: @slinder1

@chichunchen chichunchen force-pushed the users/cchen/mlir_openmp_fuse_loop_dead_blocks branch from 22922a0 to 5db7a42 Compare May 20, 2026 01:54
The openmp-cli-fuse02.mlir test can fail nondeterministically when
fuseLoops leaves a dead block in the generated function. On AArch64 this
was reproduced as omp_omp.loop.cond3 being emitted with no predecessors,
which breaks CHECK-NEXT in the MLIR LLVM IR translation test.

The issue is in removeUnusedBlocksFromParent. It used
SmallPtrSet::remove_if while HasRemainingUses also queried the same set.
SmallPtrSet iteration order depends on pointer addresses, which can vary
with ASLR, allocation order, or object layout. Erasing one block while
walking the set can therefore change the keep/erase decision for blocks
visited later. The older make_early_inc_range form had the same underlying
defect.

Fix this by collecting all blocks that still have external uses before
erasing any of them. This evaluates every block against the same snapshot
of the erase set and removes the iteration-order dependency.

Add a llvm unit test that fuses a loop range and verifies that no
non-entry dead blocks remain in the function.

Assisted with copilot.
@chichunchen chichunchen marked this pull request as ready for review May 20, 2026 02:46
@llvmorg-github-actions llvmorg-github-actions Bot added flang:openmp clang:openmp OpenMP related changes to Clang labels May 20, 2026
@llvmorg-github-actions
Copy link
Copy Markdown

@llvm/pr-subscribers-flang-openmp

Author: Chi-Chun, Chen (chichunchen)

Changes

The openmp-cli-fuse02.mlir test can fail nondeterministically when
fuseLoops leaves a dead block in the generated function. On AArch64 this
was reproduced as omp_omp.loop.cond3 being emitted with no predecessors,
which breaks CHECK-NEXT in the MLIR LLVM IR translation test.

The issue is in removeUnusedBlocksFromParent. It used
SmallPtrSet::remove_if while HasRemainingUses also queried the same set.
SmallPtrSet iteration order depends on pointer addresses, which can vary
with ASLR, allocation order, or object layout. Erasing one block while
walking the set can therefore change the keep/erase decision for blocks
visited later. The older make_early_inc_range form had the same underlying
defect.

Fix this by collecting all blocks that still have external uses before
erasing any of them. This evaluates every block against the same snapshot
of the erase set and removes the iteration-order dependency.

Add a llvm unit test that fuses a loop range and verifies that no
non-entry dead blocks remain in the function.

Assisted with copilot.


Full diff: https://github.com/llvm/llvm-project/pull/198690.diff

2 Files Affected:

  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+11-2)
  • (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+51)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 06026582538a2..200e6897fd6ab 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -6573,8 +6573,17 @@ static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
     return false;
   };
 
-  while (BBsToErase.remove_if(HasRemainingUses)) {
-    // Try again if anything was removed.
+  // Compute the keep set before mutating BBsToErase; otherwise the result can
+  // depend on SmallPtrSet iteration order.
+  bool Changed = true;
+  while (Changed) {
+    Changed = false;
+    SmallVector<BasicBlock *, 4> BBsToKeep;
+    for (BasicBlock *BB : BBsToErase)
+      if (HasRemainingUses(BB))
+        BBsToKeep.push_back(BB);
+    for (BasicBlock *BB : BBsToKeep)
+      Changed |= BBsToErase.erase(BB);
   }
 
   SmallVector<BasicBlock *, 7> BBVec(BBsToErase.begin(), BBsToErase.end());
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index acfa993855b2e..a7e572bc427a3 100644
--- a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+++ b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
@@ -8461,4 +8461,55 @@ TEST_F(OpenMPIRBuilderTest, ScopeDirectiveNowait) {
   }
 }
 
+TEST_F(OpenMPIRBuilderTest, FuseLoopRangeNoDeadBlocks) {
+  using InsertPointTy = OpenMPIRBuilder::InsertPointTy;
+  OpenMPIRBuilder OMPBuilder(*M);
+  OMPBuilder.initialize();
+  F->setName("func");
+
+  IRBuilder<> Builder(BB);
+  Type *LCTy = F->getArg(0)->getType();
+  Value *TripCount = F->getArg(0);
+
+  auto LoopBodyGenCB = [&](InsertPointTy CodeGenIP, Value *LC) {
+    Builder.restoreIP(CodeGenIP);
+    createPrintfCall(Builder, "%d\n", {LC});
+    return Error::success();
+  };
+
+  OpenMPIRBuilder::LocationDescription Loc1({Builder.saveIP(), DL});
+  ASSERT_EXPECTED_INIT(
+      CanonicalLoopInfo *, Loop1,
+      OMPBuilder.createCanonicalLoop(Loc1, LoopBodyGenCB, TripCount, "loop1"));
+  Builder.restoreIP(Loop1->getAfterIP());
+
+  Value *TripCount2 = Builder.CreateAdd(TripCount, ConstantInt::get(LCTy, 1));
+  OpenMPIRBuilder::LocationDescription Loc2({Builder.saveIP(), DL});
+  ASSERT_EXPECTED_INIT(
+      CanonicalLoopInfo *, Loop2,
+      OMPBuilder.createCanonicalLoop(Loc2, LoopBodyGenCB, TripCount2, "loop2"));
+  Builder.restoreIP(Loop2->getAfterIP());
+
+  Value *TripCount3 = Builder.CreateAdd(TripCount, ConstantInt::get(LCTy, 2));
+  OpenMPIRBuilder::LocationDescription Loc3({Builder.saveIP(), DL});
+  ASSERT_EXPECTED_INIT(
+      CanonicalLoopInfo *, Loop3,
+      OMPBuilder.createCanonicalLoop(Loc3, LoopBodyGenCB, TripCount3, "loop3"));
+  Builder.restoreIP(Loop3->getAfterIP());
+  Builder.CreateRetVoid();
+
+  CanonicalLoopInfo *Fused = OMPBuilder.fuseLoops(DL, {Loop1, Loop2});
+
+  OMPBuilder.finalize();
+  ASSERT_FALSE(verifyModule(*M, &errs()));
+
+  Fused->assertOK();
+  for (BasicBlock &Block : *F) {
+    if (&Block == &F->getEntryBlock())
+      continue;
+    EXPECT_TRUE(Block.hasNPredecessorsOrMore(1))
+        << "Dead block found: " << Block.getName().str();
+  }
+}
+
 } // namespace

@chichunchen chichunchen requested review from Meinersbur and nikic May 20, 2026 02:47
Copy link
Copy Markdown
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just started hitting this, so thank you for having already posted a patch!

I reviewed your fix first, but then thought to look through some blame, and was confused to see no recent changes here. Then I took a look at SmallPtrSets blame and noticed that the contract (or at least the documentation of the contract) for remove_if changed in #197637 from:

It is safe to read the set inside
the predicate function. However, the predicate must not modify the set
itself, only indicate a removal by returning true.

to:

The predicate must not access the
set being modified: it may inspect the element passed to it and return
true to request removal, but must not read (e.g. count()/find()) or
otherwise mutate the set. If anything is removed, all iterators and
references into the set are invalidated.

Which seems like it is the real root cause? @MaskRay is this kind of fallout possible in other places in the codebase, or is remove_if generally not used in this way? Did the contract actually change, or was the old comment just wrong?

Comment on lines 6576 to 6587
// Compute the keep set before mutating BBsToErase; otherwise the result can
// depend on SmallPtrSet iteration order.
bool Changed = true;
while (Changed) {
Changed = false;
SmallVector<BasicBlock *, 4> BBsToKeep;
for (BasicBlock *BB : BBsToErase)
if (HasRemainingUses(BB))
BBsToKeep.push_back(BB);
for (BasicBlock *BB : BBsToKeep)
Changed |= BBsToErase.erase(BB);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Compute the keep set before mutating BBsToErase; otherwise the result can
// depend on SmallPtrSet iteration order.
bool Changed = true;
while (Changed) {
Changed = false;
SmallVector<BasicBlock *, 4> BBsToKeep;
for (BasicBlock *BB : BBsToErase)
if (HasRemainingUses(BB))
BBsToKeep.push_back(BB);
for (BasicBlock *BB : BBsToKeep)
Changed |= BBsToErase.erase(BB);
}
// Compute the keep set before mutating BBsToErase; otherwise the result can
// depend on SmallPtrSet iteration order.
bool Changed = false;
do {
SmallVector<BasicBlock *, 4> BBsToKeep;
for (BasicBlock *BB : BBsToErase)
if (HasRemainingUses(BB))
BBsToKeep.push_back(BB);
for (BasicBlock *BB : BBsToKeep)
Changed |= BBsToErase.erase(BB);
} while (Changed);

Copy link
Copy Markdown
Member

@Meinersbur Meinersbur May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You suggestion never resets Change to false, causing an infinity loop.

If we are going to nitpick in this, what I would prefer is

  SmallVector<BasicBlock *> BBsToKeep;
  while (true) {
    bool Changed = false;
    BBsToKeep.clear();

    for (BasicBlock *BB : BBsToErase)
      if (HasRemainingUses(BB))
        BBsToKeep.push_back(BB);
    for (BasicBlock *BB : BBsToKeep)
      Changed |= BBsToErase.erase(BB);

    if (!Changed)
      break;
  }

such that Changed remains scoped in the loop body.

Declare vectors outside of the loop: https://llvm.org/docs/ProgrammersManual.html#vector

It is recommended to not specify a small size for vectors on the stack: https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, ignore my original comment here. I was working too late, and didn't check my own suggestion carefully, sorry about that!

@Meinersbur
Copy link
Copy Markdown
Member

Meinersbur commented May 20, 2026

Can you help me understand the problem?

The patch is based on a commit before #197637 where SmallPtrSet::remove_if has nothing that would make the data structure invalid between calls of the predicate. I could not make your test fail without your change in OpenMPIRBuilder.
The algorithm is supposed to be a fixpoint iteration: It should not matter which order BBs are iterated over; if a BB is still marked for deletion in one round due to different iteration order, it will be unmarked in the next.

@aengelke
Copy link
Copy Markdown
Contributor

Did the contract actually change,

Yes. I think this was an oversight, as the line is explicitly mentioned in one of the comments..

The explanation in the PR is wrong. The problem is that during the execution of SmallPtrSet::remove_if(), lookup in the set doesn't work correctly until the rehash at the end of remove_if.

@slinder1
Copy link
Copy Markdown
Contributor

slinder1 commented May 20, 2026

Did the contract actually change,

Yes. I think this was an oversight, as the line is explicitly mentioned in one of the comments..

The explanation in the PR is wrong. The problem is that during the execution of SmallPtrSet::remove_if(), lookup in the set doesn't work correctly until the rehash at the end of remove_if.

That makes sense, thank you!

It seems like copilot may have helped find this (now) broken use of remove_if, but the stochastically reverse-engineered "analysis" of it is junk (go figure)

If the remove_if change is purely for performance can we just revert for now?

Edit: I'm just now reading your comment on the other change. I will comment there instead

I'll update the PR descrition and title for the correct information.
@chichunchen chichunchen force-pushed the users/cchen/mlir_openmp_fuse_loop_dead_blocks branch from 5db7a42 to c06bd06 Compare May 20, 2026 16:46
@chichunchen chichunchen changed the title [OpenMP][OMPIRBuilder] Fix nondeterministic dead block removal [OpenMP][OMPIRBuilder] Avoid querying SmallPtrSet during removal May 20, 2026
@slinder1
Copy link
Copy Markdown
Contributor

slinder1 commented May 20, 2026

Since the input is a small array of BasicBlocks (otherwise it probably makes more sense to just do a DFS/simple DCE?), we could also work in terms of indices into BBs and use SmallBitVectors:

/// Determine which blocks in \p BBs are reachable from outside and remove the
/// ones that are not reachable from the function.
static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
  DenseMap<BasicBlock *, unsigned> BB2Index(BBs.size());
  for (const auto [I, V] : enumerate(BBs))
    BB2Index.insert({V, I});

  // Optimistically assume all BBs are dead. We iteratively remove
  // immediately-used blocks from this set until we reach a fixed point.
  SmallBitVector BBsToErase(BBs.size(), true);
  // Accumulates immediately-used blocks for one iteration.
  SmallBitVector BBsToKeep(BBs.size());
  do {
    BBsToKeep.reset();
    for (unsigned BBToEraseIndex : BBsToErase.set_bits()) {
      for (Use &U : BBs[BBToEraseIndex]->uses()) {
        auto *UseInst = dyn_cast<Instruction>(U.getUser());
        if (!UseInst)
          continue;
        BasicBlock *UseBB = UseInst->getParent();
        auto UseBBIndexI = BB2Index.find(UseBB);
        // If this use is from an external block, or from a block in BBs we have
        // already proved is transitively used by an external block, then
        // this block is also used.
        if (UseBBIndexI == BB2Index.end() ||
            !BBsToErase.test(UseBBIndexI->second)) {
          BBsToKeep.set(BBToEraseIndex);
          break;
        }
      }
    }
    BBsToErase &= ~BBsToKeep;
  } while (BBsToKeep.any());

  SmallVector<BasicBlock *> FinalBBsToErase;
  for (unsigned BBToEraseIndex : BBsToErase.set_bits())
    FinalBBsToErase.push_back(BBs[BBToEraseIndex]);
  DeleteDeadBlocks(FinalBBsToErase);
}

… slinder1)

Track candidate blocks by stable BB indices instead of mutating pointer
sets, avoiding SmallPtrSet tombstone/rehash/iteration-order issues while
preserving original BB order for deletion.
Copy link
Copy Markdown
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version LGTM, and landing something so the buildbots aren't randomly failing would be good ASAP

If there is more review we can do it after. Thanks again!

Edit: I had missed the previous revision when I responded with the SmallBitVector suggestion, but either version LGTM!

@chichunchen chichunchen merged commit fadbc3f into main May 20, 2026
10 checks passed
@chichunchen chichunchen deleted the users/cchen/mlir_openmp_fuse_loop_dead_blocks branch May 20, 2026 19:09
Copy link
Copy Markdown
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like AI-overcomplexity. BBsToKeep is not even needed because it's finding a fixpoint, just reset that bit in BBsToErase directly. In constrast to SmallPtrSet you can actually do that while iterating over it. Also, building the DenseMap over a small list of an estimated 7 elements max is probably overkill.

Here someting simpler by just inverting BBsToErase to BBsToKeep:

static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
  SmallPtrSet<BasicBlock *, 6> BBsToKeep(llvm::from_range, BBs);
  auto HasRemainingUses = [&BBsToKeep](BasicBlock *BB) {
    for (Use &U : BB->uses()) {
      auto *UseInst = dyn_cast<Instruction>(U.getUser());
      if (!UseInst)
        continue;
      if (!BBsToKeep.count(UseInst->getParent()))
        continue;
      return true;
    }
    return false;
  };

  while (true) {
    bool Changed = false;

    for (BasicBlock *BB : BBs) {
      if (BBsToKeep.count(BB) || !HasRemainingUses(BB)) 
        continue;

      BBsToKeep.insert(BB);
      Changed = true;
    }

    if (!Changed)
      break;
  }

  SmallVector<BasicBlock *, 7> BBVec;
  for (BasicBlock *BB : BBs) {
    if (!BBsToKeep.count(BB))
      BBVec.push_back(BB);
  }
  DeleteDeadBlocks(BBVec);
}

if (!UseInst)
continue;
BasicBlock *UseBB = UseInst->getParent();
auto UseBBIndexI = BB2Index.find(UseBB);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer DenseMap<BasicBlock *, unsigned>::iterator UseBBIndexI?

This isn't about "almost always" using auto, iterator types are the canonical example of why auto is helpful for readability.

@slinder1
Copy link
Copy Markdown
Contributor

slinder1 commented May 20, 2026

This looks like AI-overcomplexity. BBsToKeep is not even needed because it's finding a fixpoint, just reset that bit in BBsToErase directly. In constrast to SmallPtrSet you can actually do that while iterating over it. Also, building the DenseMap over a small list of an estimated 7 elements max is probably overkill.

Here someting simpler by just inverting BBsToErase to BBsToKeep:

static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
  SmallPtrSet<BasicBlock *, 6> BBsToKeep(llvm::from_range, BBs);
  auto HasRemainingUses = [&BBsToKeep](BasicBlock *BB) {
    for (Use &U : BB->uses()) {
      auto *UseInst = dyn_cast<Instruction>(U.getUser());
      if (!UseInst)
        continue;
      if (!BBsToKeep.count(UseInst->getParent()))
        continue;
      return true;
    }
    return false;
  };

  while (true) {
    bool Changed = false;

    for (BasicBlock *BB : BBs) {
      if (BBsToKeep.count(BB) || !HasRemainingUses(BB)) 
        continue;

      BBsToKeep.insert(BB);
      Changed = true;
    }

    if (!Changed)
      break;
  }

  SmallVector<BasicBlock *, 7> BBVec;
  for (BasicBlock *BB : BBs) {
    if (!BBsToKeep.count(BB))
      BBVec.push_back(BB);
  }
  DeleteDeadBlocks(BBVec);
}

No AI used in this case, I'm just overcomplicating things. I was biased towards the likely LLM generated original version, though

@slinder1
Copy link
Copy Markdown
Contributor

slinder1 commented May 20, 2026

I think these are just artifacts of you giving the gist of the approach quickly, but I don't think your code works as-is:

static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
  SmallPtrSet<BasicBlock *, 6> BBsToKeep(llvm::from_range, BBs);

I assume this should start empty?

  auto HasRemainingUses = [&BBsToKeep](BasicBlock *BB) {
    for (Use &U : BB->uses()) {
      auto *UseInst = dyn_cast<Instruction>(U.getUser());
      if (!UseInst)
        continue;
      if (!BBsToKeep.count(UseInst->getParent()))
        continue;
      return true;

Assuming a block is in BBsToKeep iff we know it has an external use, this seems slightly wrong. Isn't it if (<UseBB is not in BBs> || BBsToKeep.count(UseBB)) return true;?

    }
    return false;
  };

  while (true) {
    bool Changed = false;

    for (BasicBlock *BB : BBs) {
      if (BBsToKeep.count(BB) || !HasRemainingUses(BB)) 
        continue;

This is the only use of HasRemainingUses anyway, and it isn't passed as a callback, so we probably don't need the lambda, it could just be inline. I guess the return from the lambda is nicer than assign-and-break? If the lambda remains then the Remaining in the name no longer fits.


      BBsToKeep.insert(BB);
      Changed = true;
    }

    if (!Changed)
      break;
  }

  SmallVector<BasicBlock *, 7> BBVec;
  for (BasicBlock *BB : BBs) {
    if (!BBsToKeep.count(BB))
      BBVec.push_back(BB);
  }
  DeleteDeadBlocks(BBVec);
}

@slinder1
Copy link
Copy Markdown
Contributor

I think this works, and is what you were suggesting?

Some other incidental changes I made are because the SmallPtrSet count is rounded up to a power of two anyway, and SmallVector generally doesn't benefit from a count. I also removed an extra set of braces.

static void removeUnusedBlocksFromParent(ArrayRef<BasicBlock *> BBs) {
  SmallPtrSet<BasicBlock *, 8> InternalBBs(from_range, BBs);
  SmallPtrSet<BasicBlock *, 8> BBsToKeep;

  while (true) {
    bool Changed = false;

    for (BasicBlock *BB : BBs) {
      if (BBsToKeep.contains(BB))
        continue;

      for (Use &U : BB->uses()) {
        auto *UseInst = dyn_cast<Instruction>(U.getUser());
        if (!UseInst)
          continue;
        BasicBlock *UseBB = UseInst->getParent();
        if (!InternalBBs.contains(UseBB) || BBsToKeep.contains(UseBB)) {
          BBsToKeep.insert(BB);
          Changed = true;
          break;
        }
      }
    }

    if (!Changed)
      break;
  }

  SmallVector<BasicBlock *> BBVec;
  for (BasicBlock *BB : BBs)
    if (!BBsToKeep.count(BB))
      BBVec.push_back(BB);

  DeleteDeadBlocks(BBVec);
}

slinder1 added a commit that referenced this pull request May 21, 2026
This is essentially post-commit review for #198690 which was landed
quickly to fix nondeterminism in tests introduced in #197637

Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c
Co-authored-by: @Meinersbur
Co-authored-by: @chichunchen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:openmp OpenMP related changes to Clang flang:openmp

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants