[SmallPtrSet] Drop tombstones in large mode by MaskRay · Pull Request #197637 · llvm/llvm-project

MaskRay · 2026-05-14T08:32:25Z

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R. erase opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot. The scan stops at the next empty bucket, which is guaranteed to
exist.

remove_if clears matches in a single pass then calls Grow at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had erase invalidate iterators.
erase now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.

SmallPtrSet uses quadratic probing with tombstone deletion in large mode. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R. `erase` opens a hole at the removed slot, walks forward sliding each following entry whose probe path crosses the hole back into it (the hole moves with each slide), and stops at the next empty slot. The scan stops at the next empty bucket, which is guaranteed to exist. `remove_if` clears matches in a single pass then calls `Grow` at the current size to restore the linear-probe invariant, O(N) total. (Per-match Algorithm R erase would be O(N * cluster).) My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss Table family (not good at small keys) are actually worse than the baseline. Mirrors the small-mode change in 42c3edb, which dropped tombstones from the inline array and likewise had `erase` invalidate iterators. `erase` now physically relocates following entries, so it invalidates iterators and references in large mode too. Linear probing is vulnerable to primary clustering. Therefore, we lower the max load factor to 2/3, slightly faster than the current 3/4.

llvmorg-github-actions · 2026-05-14T08:33:07Z

@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-llvm-support

Author: Fangrui Song (MaskRay)

Changes

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R. erase opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot. The scan stops at the next empty bucket, which is guaranteed to
exist.

remove_if clears matches in a single pass then calls Grow at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had erase invalidate iterators.
erase now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.

Full diff: https://github.com/llvm/llvm-project/pull/197637.diff

3 Files Affected:

(modified) llvm/include/llvm/ADT/SmallPtrSet.h (+33-29)
(modified) llvm/lib/Support/SmallPtrSet.cpp (+29-45)
(modified) llvm/unittests/ADT/SmallPtrSetTest.cpp (+3-2)

diff --git a/llvm/include/llvm/ADT/SmallPtrSet.h b/llvm/include/llvm/ADT/SmallPtrSet.h
index 8e7c8b30293b2..73abcf8ff58c5 100644
--- a/llvm/include/llvm/ADT/SmallPtrSet.h
+++ b/llvm/include/llvm/ADT/SmallPtrSet.h
@@ -47,12 +47,14 @@ namespace llvm {
 /// sets are often small.  In this case, no memory allocation is used, and only
 /// light-weight and cache-efficient scanning is used.
 ///
-/// Large sets use a classic quadratically-probed hash table.  Empty buckets are
-/// represented with an illegal pointer value (-1) to allow null pointers to be
-/// inserted.  Tombstones are represented with another illegal pointer value
-/// (-2), to allow deletion.  The hash table is resized when the table is 3/4 or
-/// more.  When this happens, the table is doubled in size.
-///
+/// Large sets use a linear-probed hash table with deletion implemented using
+/// Knuth TAOCP 6.4 Algorithm R: `erase` opens a hole, walks forward sliding
+/// each following entry whose probe path crosses the hole back into it (the
+/// hole moves with each slide), and stops at the next empty slot.  Empty
+/// buckets are represented with an illegal pointer value (-1) to allow null
+/// pointers to be inserted; no tombstone state is needed.  The hash table is
+/// resized when the table is 2/3 or more.  When this happens, the table is
+/// doubled in size.
 class SmallPtrSetImplBase : public DebugEpochBase {
   friend class SmallPtrSetIteratorImpl;
 
@@ -66,8 +68,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   /// If small, all these elements are at the beginning of CurArray and the rest
   /// is uninitialized.
   unsigned NumEntries;
-  /// Number of tombstones in CurArray.
-  unsigned NumTombstones;
   /// Whether the set is in small representation.
   bool IsSmall;
 
@@ -80,7 +80,7 @@ class SmallPtrSetImplBase : public DebugEpochBase {
 
   explicit SmallPtrSetImplBase(const void **SmallStorage, unsigned SmallSize)
       : CurArray(SmallStorage), CurArraySize(SmallSize), NumEntries(0),
-        NumTombstones(0), IsSmall(true) {
+        IsSmall(true) {
     assert(llvm::has_single_bit(SmallSize) &&
            "Initial size must be a power of two!");
   }
@@ -111,7 +111,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     }
 
     NumEntries = 0;
-    NumTombstones = 0;
   }
 
   void reserve(size_type NewNumEntries) {
@@ -122,13 +121,13 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     // No need to expand if we're small and NewNumEntries will fit in the space.
     if (isSmall() && NewNumEntries <= CurArraySize)
       return;
-    // insert_imp_big will reallocate if stores is more than 75% full, on the
+    // insert_imp_big will reallocate if stores is more than 2/3 full, on the
     // /final/ insertion.
-    if (!isSmall() && ((NewNumEntries - 1) * 4) < (CurArraySize * 3))
+    if (!isSmall() && ((NewNumEntries - 1) * 3) < (CurArraySize * 2))
       return;
-    // We must Grow -- find the size where we'd be 75% full, then round up to
+    // We must Grow -- find the size where we'd be 2/3 full, then round up to
     // the next power of two.
-    size_type NewSize = NewNumEntries + (NewNumEntries / 3);
+    size_type NewSize = NewNumEntries + (NewNumEntries / 2);
     NewSize = llvm::bit_ceil(NewSize);
     // Like insert_imp_big, always allocate at least 128 elements.
     NewSize = std::max(128u, NewSize);
@@ -136,8 +135,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   }
 
 protected:
-  static void *getTombstoneMarker() { return reinterpret_cast<void *>(-2); }
-
   static void *getEmptyMarker() {
     // Note that -1 is chosen to make clear() efficiently implementable with
     // memset and because it's not a valid pointer value.
@@ -206,11 +203,8 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     if (!Bucket)
       return false;
 
-    *const_cast<const void **>(Bucket) = getTombstoneMarker();
-    NumTombstones++;
+    eraseFromBucket(const_cast<const void **>(Bucket));
     --NumEntries;
-    // Treat this consistently from an API perspective, even if we don't
-    // actually invalidate iterators here.
     incrementEpoch();
     return true;
   }
@@ -254,10 +248,16 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   const void *const *FindBucketFor(const void *Ptr) const;
   LLVM_ABI void shrink_and_clear();
 
-  /// Grow - Allocate a larger backing store for the buckets and move it over.
+protected:
+  /// Erase the entry at \p Bucket and close the resulting hole via Knuth
+  /// TAOCP 6.4 Algorithm R. Caller must update \c NumEntries and the epoch.
+  LLVM_ABI void eraseFromBucket(const void **Bucket);
+
+  /// Allocate a larger backing store for the buckets and move it over.
+  /// Passing the current size triggers a same-size rehash, used by batch
+  /// erase to compact away empty slots left by mark-then-rebuild.
   LLVM_ABI void Grow(unsigned NewSize);
 
-protected:
   /// swap - Swaps the elements of two sets.
   /// Note: This method assumes that both sets have the same small size.
   LLVM_ABI void swap(const void **SmallStorage, const void **RHSSmallStorage,
@@ -313,9 +313,7 @@ class LLVM_DEBUGEPOCHBASE_HANDLEBASE_EMPTYBASE SmallPtrSetIteratorImpl
   /// valid.
   void AdvanceIfNotValid() {
     assert(Bucket <= End);
-    while (Bucket != End &&
-           (*Bucket == SmallPtrSetImplBase::getEmptyMarker() ||
-            *Bucket == SmallPtrSetImplBase::getTombstoneMarker()))
+    while (Bucket != End && *Bucket == SmallPtrSetImplBase::getEmptyMarker())
       ++Bucket;
   }
 
@@ -436,18 +434,24 @@ template <typename PtrType> class SmallPtrSetImpl : public SmallPtrSetImplBase {
       return Removed;
     }
 
+    // Mark-then-rebuild: one pass to clear matches without sliding (which
+    // would re-walk the cluster on every erase), then a single rehash to
+    // restore the linear-probe invariant.  O(N) total, vs O(N * cluster)
+    // for repeated per-match Algorithm R erases.
     for (const void *&Bucket : buckets()) {
-      if (Bucket == getTombstoneMarker() || Bucket == getEmptyMarker())
+      if (Bucket == getEmptyMarker())
         continue;
       PtrType Ptr = PtrTraits::getFromVoidPointer(const_cast<void *>(Bucket));
       if (P(Ptr)) {
-        Bucket = getTombstoneMarker();
-        ++NumTombstones;
+        Bucket = getEmptyMarker();
         --NumEntries;
-        incrementEpoch();
         Removed = true;
       }
     }
+    if (Removed) {
+      incrementEpoch();
+      Grow(CurArraySize);
+    }
     return Removed;
   }
 
diff --git a/llvm/lib/Support/SmallPtrSet.cpp b/llvm/lib/Support/SmallPtrSet.cpp
index e377dbf4a6999..d8e66b9c89c74 100644
--- a/llvm/lib/Support/SmallPtrSet.cpp
+++ b/llvm/lib/Support/SmallPtrSet.cpp
@@ -29,7 +29,7 @@ void SmallPtrSetImplBase::shrink_and_clear() {
   // Reduce the number of buckets.
   unsigned Size = size();
   CurArraySize = Size > 16 ? 1 << (Log2_32_Ceil(Size) + 1) : 32;
-  NumEntries = NumTombstones = 0;
+  NumEntries = 0;
 
   // Install the new array.  Clear all the buckets to empty.
   CurArray = (const void**)safe_malloc(sizeof(void*) * CurArraySize);
@@ -39,14 +39,9 @@ void SmallPtrSetImplBase::shrink_and_clear() {
 
 std::pair<const void *const *, bool>
 SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
-  if (LLVM_UNLIKELY(size() * 4 >= CurArraySize * 3)) {
-    // If more than 3/4 of the array is full, grow.
+  if (LLVM_UNLIKELY(size() * 3 >= CurArraySize * 2)) {
+    // If more than 2/3 of the array is full, grow.
     Grow(CurArraySize < 64 ? 128 : CurArraySize * 2);
-  } else if (LLVM_UNLIKELY(CurArraySize - NumEntries - NumTombstones <
-                           CurArraySize / 8)) {
-    // If fewer of 1/8 of the array is empty (meaning that many are filled with
-    // tombstones), rehash.
-    Grow(CurArraySize);
   }
 
   // Okay, we know we have space.  Find a hash bucket.
@@ -54,9 +49,7 @@ SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
   if (*Bucket == Ptr)
     return {Bucket, false}; // Already inserted, good.
 
-  // Otherwise, insert it!
-  if (*Bucket == getTombstoneMarker())
-    --NumTombstones;
+  // Otherwise, insert it.
   ++NumEntries;
   *Bucket = Ptr;
   incrementEpoch();
@@ -64,48 +57,46 @@ SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
 }
 
 const void *const *SmallPtrSetImplBase::doFind(const void *Ptr) const {
-  unsigned BucketNo =
-      DenseMapInfo<void *>::getHashValue(Ptr) & (CurArraySize - 1);
-  unsigned ProbeAmt = 1;
+  unsigned Mask = CurArraySize - 1;
+  unsigned BucketNo = DenseMapInfo<void *>::getHashValue(Ptr) & Mask;
   while (true) {
     const void *const *Bucket = CurArray + BucketNo;
     if (LLVM_LIKELY(*Bucket == Ptr))
       return Bucket;
     if (LLVM_LIKELY(*Bucket == getEmptyMarker()))
       return nullptr;
-
-    // Otherwise, it's a hash collision or a tombstone, continue quadratic
-    // probing.
-    BucketNo += ProbeAmt++;
-    BucketNo &= CurArraySize - 1;
+    BucketNo = (BucketNo + 1) & Mask;
   }
 }
 
 const void *const *SmallPtrSetImplBase::FindBucketFor(const void *Ptr) const {
-  unsigned Bucket = DenseMapInfo<void *>::getHashValue(Ptr) & (CurArraySize-1);
-  unsigned ArraySize = CurArraySize;
-  unsigned ProbeAmt = 1;
+  unsigned Mask = CurArraySize - 1;
+  unsigned Bucket = DenseMapInfo<void *>::getHashValue(Ptr) & Mask;
   const void *const *Array = CurArray;
-  const void *const *Tombstone = nullptr;
   while (true) {
-    // If we found an empty bucket, the pointer doesn't exist in the set.
-    // Return a tombstone if we've seen one so far, or the empty bucket if
-    // not.
     if (LLVM_LIKELY(Array[Bucket] == getEmptyMarker()))
-      return Tombstone ? Tombstone : Array+Bucket;
-
-    // Found Ptr's bucket?
+      return Array + Bucket;
     if (LLVM_LIKELY(Array[Bucket] == Ptr))
-      return Array+Bucket;
-
-    // If this is a tombstone, remember it.  If Ptr ends up not in the set, we
-    // prefer to return it than something that would require more probing.
-    if (Array[Bucket] == getTombstoneMarker() && !Tombstone)
-      Tombstone = Array+Bucket;  // Remember the first tombstone found.
+      return Array + Bucket;
+    Bucket = (Bucket + 1) & Mask;
+  }
+}
 
-    // It's a hash collision or a tombstone. Reprobe.
-    Bucket = (Bucket + ProbeAmt++) & (ArraySize-1);
+void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) {
+  // Knuth TAOCP 6.4 Algorithm R: walk forward sliding each following entry
+  // whose probe path crosses the hole.
+  unsigned Mask = CurArraySize - 1;
+  unsigned I = Bucket - CurArray;
+  unsigned J = I;
+  const void *Empty = getEmptyMarker();
+  while ((J = (J + 1) & Mask), CurArray[J] != Empty) {
+    auto Ideal = DenseMapInfo<void *>::getHashValue(CurArray[J]);
+    if (((I - Ideal) & Mask) < ((J - Ideal) & Mask)) {
+      CurArray[I] = CurArray[J];
+      I = J;
+    }
   }
+  CurArray[I] = Empty;
 }
 
 /// Grow - Allocate a larger backing store for the buckets and move it over.
@@ -125,13 +116,12 @@ void SmallPtrSetImplBase::Grow(unsigned NewSize) {
   // Copy over all valid entries.
   for (const void *&Bucket : OldBuckets) {
     // Copy over the element if it is valid.
-    if (Bucket != getTombstoneMarker() && Bucket != getEmptyMarker())
+    if (Bucket != getEmptyMarker())
       *const_cast<void **>(FindBucketFor(Bucket)) = const_cast<void *>(Bucket);
   }
 
   if (!WasSmall)
     free(OldBuckets.begin());
-  NumTombstones = 0;
   IsSmall = false;
 }
 
@@ -194,7 +184,6 @@ void SmallPtrSetImplBase::copyHelper(const SmallPtrSetImplBase &RHS) {
   llvm::copy(RHS.buckets(), CurArray);
 
   NumEntries = RHS.NumEntries;
-  NumTombstones = RHS.NumTombstones;
 }
 
 void SmallPtrSetImplBase::moveFrom(const void **SmallStorage,
@@ -224,13 +213,11 @@ void SmallPtrSetImplBase::moveHelper(const void **SmallStorage,
   // Copy the rest of the trivial members.
   CurArraySize = RHS.CurArraySize;
   NumEntries = RHS.NumEntries;
-  NumTombstones = RHS.NumTombstones;
   IsSmall = RHS.IsSmall;
 
   // Make the RHS small and empty.
   RHS.CurArraySize = SmallSize;
   RHS.NumEntries = 0;
-  RHS.NumTombstones = 0;
   RHS.IsSmall = true;
 }
 
@@ -244,7 +231,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
     std::swap(this->CurArray, RHS.CurArray);
     std::swap(this->CurArraySize, RHS.CurArraySize);
     std::swap(this->NumEntries, RHS.NumEntries);
-    std::swap(this->NumTombstones, RHS.NumTombstones);
     return;
   }
 
@@ -263,7 +249,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
     }
     assert(this->CurArraySize == RHS.CurArraySize);
     std::swap(this->NumEntries, RHS.NumEntries);
-    std::swap(this->NumTombstones, RHS.NumTombstones);
     return;
   }
 
@@ -276,7 +261,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
   llvm::copy(SmallSide.small_buckets(), LargeSideInlineStorage);
   std::swap(LargeSide.CurArraySize, SmallSide.CurArraySize);
   std::swap(LargeSide.NumEntries, SmallSide.NumEntries);
-  std::swap(LargeSide.NumTombstones, SmallSide.NumTombstones);
   SmallSide.CurArray = LargeSide.CurArray;
   SmallSide.IsSmall = false;
   LargeSide.CurArray = LargeSideInlineStorage;
diff --git a/llvm/unittests/ADT/SmallPtrSetTest.cpp b/llvm/unittests/ADT/SmallPtrSetTest.cpp
index fe7a8279d06b1..b42da537510de 100644
--- a/llvm/unittests/ADT/SmallPtrSetTest.cpp
+++ b/llvm/unittests/ADT/SmallPtrSetTest.cpp
@@ -476,7 +476,8 @@ TEST(SmallPtrSetTest, Reserve) {
   EXPECT_EQ(Set.size(), 6u);
   EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3], &Vals[4], &Vals[5]));
 
-  // Reserving 192 should result in 256 buckets.
+  // Reserving 192 should result in 512 buckets: 192 * 3 / 2 = 288, rounded
+  // up to the next power of two.
   Set.reserve(192);
-  EXPECT_EQ(Set.capacity(), 256u);
+  EXPECT_EQ(Set.capacity(), 512u);
 }

MaskRay · 2026-05-14T08:34:17Z

https://llvm-compile-time-tracker.com/compare.php?from=f996980ff486a0cc03b84f87bbb3554ea99be1e2&to=106e2189ab0c56a6e1e259e4603d8b53f7012fd5&stat=instructions%3Au

stage1-O0-g:
geomean 17469M 17460M (-0.05%)

stage2-O3:
geomean 51391M 51271M (-0.23%)

clang build:
instructions:u 21430106M 21406732M (-0.11%)

nikic

This generally looks reasonable to me.

I think it would be good to do some local wall-time measurements to confirm this is indeed better (or at least neutral). For these kinds of data structure changes, instructions:u probably does not paint the full picture.

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Is this in terms of instructions or wall-time? Especially for Swiss Table I'd expect that these may not be well-correlated.

nikic · 2026-05-14T08:49:39Z

+  unsigned I = Bucket - CurArray;
+  unsigned J = I;
+  const void *Empty = getEmptyMarker();
+  while ((J = (J + 1) & Mask), CurArray[J] != Empty) {


Can we avoid this assignment in the while condition? You can use for (; CurArray[J] != Empty; J = (J + 1) & Mask) plus initialize J = (I + 1) & Mask?

The current code deduplicates (J = (I + 1) & Mask) and (J = (J + 1) & Mask)

nikic · 2026-05-14T08:52:58Z

    }
+    if (Removed) {
+      incrementEpoch();
+      Grow(CurArraySize);


So if remove_if() removes anything, we'll now always force a new allocation?

Yes i think this is small cost to pay since this O(n) operation is used very rarely

Mention in doc comment?

MaxGraey · 2026-05-14T15:48:52Z

+void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) {
+  // Knuth TAOCP 6.4 Algorithm R: walk forward sliding each following entry
+  // whose probe path crosses the hole.
+  unsigned Mask = CurArraySize - 1;
+  unsigned I = Bucket - CurArray;
+  unsigned J = I;
+  const void *Empty = getEmptyMarker();
+  while ((J = (J + 1) & Mask), CurArray[J] != Empty) {
+    auto Ideal = DenseMapInfo<void *>::getHashValue(CurArray[J]);
+    if (((I - Ideal) & Mask) < ((J - Ideal) & Mask)) {
+      CurArray[I] = CurArray[J];
+      I = J;
+    }
  }
+  CurArray[I] = Empty;
 }


It seems we can drop one mask wrap:

((I - Ideal) & Mask) < ((J - Ideal) & Mask)

can be represented as:

dist(Ideal -> hole) < dist(Ideal -> current)

Or in another representaion: Entry if Ideal <=probe Hole <probe Scan,
where <=probe is Ideal, Ideal + 1, Ideal + 2, ... modulo N
and the same with Gap:

dist(Ideal -> Scan) = (dist(Ideal -> Hole) + Gap) % N

But since we know Gap > 0 the condition becomes:

Back < (Back + Gap) % N

or similified equivalent:

Back + Gap < N

So we can avoid the second & Mask:

Back <= Mask - Gap

and writ this something like:

void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) { const unsigned Mask = CurArraySize - 1; const void *Empty = getEmptyMarker(); unsigned Hole = Bucket - CurArray; for (unsigned Scan = (Hole + 1) & Mask, Gap = 1; ; Scan = (Scan + 1) & Mask, ++Gap) { const void *Entry = CurArray[Scan]; if (Entry == Empty) break; const unsigned Back = (Hole - DenseMapInfo<void *>::getHashValue(Entry)) & Mask; if (Back <= Mask - Gap) { CurArray[Hole] = Entry; Hole = Scan; Gap = 0; } } CurArray[Hole] = Empty; }

Although I might be wrong, you should test this in practice

Thanks for suggesting the alternative. It's correct but there is no difference
https://llvm-compile-time-tracker.com/compare.php?from=6b1a6d3581202cfbce8c3bb030dbf06f1f2423fb&to=cfb3002ff020c198ea2033abc800d4195bc6fe25&stat=instructions:u

Measured locally

▎ | | C instr | C wall (ms) | ▎ |-----------|-------------|-------------| ▎ | original | 138,911,223 | 48.30 | ▎ | suggested | 139,018,238 | 48.20 | ▎ ▎ +0.08 % instructions / ~-0.2 % wall — within noise. I'll keep the original form for readability (it's a direct expression of the Knuth ▎ Algorithm R invariant) unless you'd like me to take the suggestion anyway.

▎ Compared the generated code (llvm-objdump -d, clang++ -O2, the eraseFromFilledBucket inlined into the C workload): ▎ ▎ Original: 2× sub + 2× and + cmp, both (I-H)&Mask and (J-H)&Mask chained through the entry's hash. ▎ Suggested: 2× sub + 1× and + cmp, plus 1× inc and 1× xor for the Gap counter. The Mask - Gap operand is on an independent dependency chain (no hash dependency), shortening the critical path to the compare. ▎ ▎ Same total op count per iteration; one fewer AND, but an extra INC. Measurements bear that out: +0.08 % instructions:u / -0.21 % wall on the erase-heavy workload — within noise. I think keeping the original form is preferable for readability; happy to take the change if you feel ▎ strongly about the dep-chain win.

When the hash function is not inline, Gap becomes a new live value across the call.

Sticking with my current implementation.

Yeah, Krut's algorithms are really hard to beat

Slightly off-topic regarding wall-time on https://llvm-compile-time-tracker.com. Have you considered using hyperfine (https://github.com/sharkdp/hyperfine)? It would significantly reduce noise and provide more reliable statistics. It also has many useful options, such as configuring how long the warmup stage should run and how it should be performed.

My local workbench uses this to pin a single core and disable address space randomization to get more stable results.

numactl -C 20 setarch x86_64 -R perf stat -x, -e instructions:u,cycles,L1-dcache-load-misses,task-clock

Have you considered using hyperfine (sharkdp/hyperfine)?

This is a question for nikic, but I think it's not doable since running all of stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage1-aarch64-O3 stage1-aarch64-O0-g stage2-O3 stage2-O0-g stage2-clang already eats up way too many resources on a machine...

BTW, TAOCP uses a decreasing iterator and makes no mention of I = J after sliding the current entry to the hole.

https://github.com/rigtorp/HashMap/blob/master/include/rigtorp/HashMap.h#L259 also employs TAOCP 6.4 Algorithm R. Its benchmark shows it beats absl::flat_hash_map in its small key benchmarks.

For DenseMap/SmallPtrSet, the code size from the template instantiations is critical. I believe swiss table family algorithms lose to simple linear probing...

MaskRay · 2026-05-14T17:48:36Z

This generally looks reasonable to me.

I think it would be good to do some local wall-time measurements to confirm this is indeed better (or at least neutral). For these kinds of data structure changes, instructions:u probably does not paint the full picture.

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Is this in terms of instructions or wall-time? Especially for Swiss Table I'd expect that these may not be well-correlated.

Both instructions:u and wall-time move in the same direction here. I extracted DenseMap / DenseMapInfo into a standalone workbench and simulated two workloads against it; one is modeled on clang Decl * lookups, which perf identifies as one of clang's heaviest DenseMap consumers. Both workloads use small pair<key, value> buckets. The relative ordering matches what shows up on https://llvm-compile-time-tracker.com/.

  ┌──────────┬──────────────────────────┬──────────────────────────┬──────────────────────────┐
  │ workload │   tombstone (original)   │ linear-probe + backshift │        Boost FOA         │
  ├──────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┤
  │ A        │ 1,295M instr / 257.06 ms │ 1,283M instr / 256.06 ms │ 1,345M instr / 257.16 ms │
  ├──────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┤
  │ C        │  147.8M instr / 51.36 ms │  138.9M instr / 50.15 ms │  170.9M instr / 50.34 ms │
  └──────────┴──────────────────────────┴──────────────────────────┴──────────────────────────┘

aengelke

LGTM otherwise.

aengelke · 2026-05-18T09:47:27Z

      if (P(Ptr)) {
-        Bucket = getTombstoneMarker();
-        ++NumTombstones;
+        Bucket = getEmptyMarker();


The comment says: "It is safe to read the set inside the predicate function." -- But this seems no longer to be the case? I guess that means finding and validating all in-tree uses of remove_if..

Updated the comment.

I find 3 references with my language server. The uses do not require changes.

llvm/include/llvm/IR/Analysis.h:227:18: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:6576:21:

There seem to be a few more, but all seem fine:

InterleavedAccessInfo::invalidateGroupsRequiringScalarEpilogue

AArch64MachineFunctionInfo

SPIRVEmitIntrinsics::postprocessTypes

set_intersect

aengelke · 2026-05-18T09:48:07Z

    }
+    if (Removed) {
+      incrementEpoch();
+      Grow(CurArraySize);


Mention in doc comment?

aengelke

LGTM

SmallPtrSet uses quadratic probing with tombstone deletion in large mode. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R. `erase` opens a hole at the removed slot, walks forward sliding each following entry whose probe path crosses the hole back into it (the hole moves with each slide), and stops at the next empty slot. The scan stops at the next empty bucket, which is guaranteed to exist. `remove_if` clears matches in a single pass then calls `Grow` at the current size to restore the linear-probe invariant, O(N) total. (Per-match Algorithm R erase would be O(N * cluster).) My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss Table family (not good at small keys) are actually worse than the baseline. Mirrors the small-mode change in 42c3edb, which dropped tombstones from the inline array and likewise had `erase` invalidate iterators. `erase` now physically relocates following entries, so it invalidates iterators and references in large mode too. Linear probing is vulnerable to primary clustering. Therefore, we lower the max load factor to 2/3, slightly faster than the current 3/4.

commit 70f8c7b51ab600261c8f45e29d6d683a72f196e3 Author: Paul Trojahn <[email protected]> Date: Tue May 19 20:08:31 2026 +0200 [AMDGPU] Disable dpp src1 sgpr on gfx11 (#164241) https://github.com/llvm/llvm-project/pull/67461 enabled SGPRs as src1 by default for all dpp opcodes with manual checks for targets where this is not supported. In that case, isOperandLegal checked if the second operand is legal as src0. https://github.com/llvm/llvm-project/pull/155595 disabled this check by removing the calls to isOperandLegal, which resulted in SGPRs being used as operands for src1 on gfx11. This PR reenables this check and fixes the lit test. --------- Co-authored-by: Paul Trojahn <[email protected]> commit b16e3a0db7759ec31de981692e10c19d930c9dac Author: Luca Barbato <[email protected]> Date: Tue May 19 20:01:58 2026 +0200 [libc] Remove broken __builtin_aarch64_wsr fallback in set_thread_ptr (#197295) The fallback used __builtin_aarch64_wsr (32-bit) instead of __builtin_aarch64_wsr64, truncating the 64-bit thread pointer value and causing non-deterministic runtime crashes. Modern GCC correctly warns about it and -Werror=conversion catches it. ``` /var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp: In function ‘bool __llvm_libc_22_1_5_::set_thread_ptr(uintptr_t)’: /var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp:90:38: error: conversion from ‘uintptr_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion] 90 | __builtin_aarch64_wsr("tpidr_el0", val); | ^~~ cc1plus: all warnings being treated as errors ``` commit e13d9c2630569e024a0763694806716354b54e1f Author: Erich Keane <[email protected]> Date: Tue May 19 10:59:00 2026 -0700 [CIR] Implement atomic cmp exhange with non-const 'weak' lowering (#198546) This was left as an NYI, but appears in self build! This patch follows the existing solution in that we are doing the branching of weak vs not-weak at the CIR level. This is necessary because the LLVM intrinsics (and the CIR operaions) take 'weak' as a constant value. Unlike classic-codegen, this patch uses an 'if' instead of a 'switch' on the 'weak' value. This is mainly for readability (since it is a switch over a bool!), but also because our 'switch' doesn't seem to support 'bool', so this would require an additional cast. As a future direction, we may wish to modify the CIR operations to take 'weak' and 'failure' value (both are constants in LLVM intrinsics!) as non-constants, and handle the switch/if statement during lowering. This would give us an opportunity to optimize the value out without having to collapse the if/switch/etc, and minimize the size of the CIR. However, as that is a larger direction, this patch skips that for now. commit bb95a8dd341e6bac6e6d94ac8906d223bc7a215d Author: Erich Keane <[email protected]> Date: Tue May 19 10:58:56 2026 -0700 [CIR] Fix assumption that 'curFn' is always a function in direct-call (#197766) The code to do some checking with a builtin function tried to tell whether it is being called inside of a function of the same name. This isn't necessarily true (that it is in a function), since we generate 'global' ops as a curFn too. This patch just removes the assumption and changes the condition to only happen when we're in a function. commit 58a43dcbfe4577773c09fce0de9f0036cd99c49f Author: Arseniy Obolenskiy <[email protected]> Date: Tue May 19 19:57:52 2026 +0200 [mlir][SPIR-V] Support literal struct type in spirv.Constant (#198414) commit ace44dc5a2a48e5d024bc5c015f69a67cd9be909 Author: Arseniy Obolenskiy <[email protected]> Date: Tue May 19 19:56:04 2026 +0200 [AMDGPU] Gate `S_LSHL[1-4]_ADD_U32` patterns on uniform results (#198508) Like the other SOP2 patterns in this file, these scalar instructions require the result to be uniform. Wrap them in `UniformBinFrag` so divergent shl/add chains use `V_LSHL_ADD_U32` commit 987b0e3db2f4bd9d3622679e3ba376836a90270b Author: Krzysztof Drewniak <[email protected]> Date: Tue May 19 12:44:41 2026 -0500 [mlir][AMDGPU] Extend amdgpu.transpose_load for gfx1250 (#198354) This commit adds support for gfx1250's ds_load_tr* instructions to `amdgpu.transpose_load` since they're pretty close to the gfx950 ones. --------- Co-authored-by: Codex <[email protected]> commit ed1441988943c6ad00b1452b54f632019f1c0f47 Author: Jacques Pienaar <[email protected]> Date: Tue May 19 19:44:25 2026 +0200 [mlirbc] Add missing encoding for float types (#191962) Enabling but making it easy to disable to enable reader side first updates. commit d46cca082b2e4974985118fb7a419f9ba309ab60 Author: Jonas Devlieghere <[email protected]> Date: Tue May 19 10:43:49 2026 -0700 [dsymutil] Add missing --linker {classic,parallel} in tests (#198568) As I'm preparing to toggle the default, I found another set of tests that don't explicitly pass the linker to dsymutil. commit 43b66dfce623888a8f88415a2a97ca96d4405cd3 Author: Aiden Grossman <[email protected]> Date: Tue May 19 10:42:09 2026 -0700 [IR] Explicitly note C standard library UB (#198562) This language is to my understanding a bit outdated (if we're in a freestanding environment, we should be handling things fine to my knowledge, or at least I'm not aware of any outstanding issues reported by people compiling for freestanding environments/different languages which are somewhat prominent at this point). The language here dates back to 68f971b1d67d51272f5c141fc9e4740e27e279f4 with some minor modifications in 722212d1a0672ae18a23db58c4cfb7e38073abfa. Explicitly note the UB aspect as this came up recently when working on llubi in #190147 and I do not think hurts to explicitly note. commit c5e2c25988fb1e4dc4121adc61933109fb04a4e0 Author: Tomer Shafir <[email protected]> Date: Tue May 19 20:37:04 2026 +0300 [MC] Add -sched-model-reservation-station-scale-factor option (#195638) This patch adds a new CLI option to the MC layer called `-sched-model-reservation-station-scale-factor` that enables to scale the buffer size of all reservation stations (RS) for resources in the scheduling model by a positive `float` factor. It is limited to scaling OOO resources, and not special buffer sizes (-1,0,1), and similarly it is only allowed to produce OOO resources. This can be used for example to try find headroom in post-RA instruction scheduling for OOO cores - e.g. scale RS size by 2 and observe IPC gains, if so the code may be senetitive to the schedule and we may do better. Note: Currently, BufferSize for LSU resources defines the reservation station (RS), but if present also the ld/st queue size, which just points to the provided LSU resource. Thus, we currently scale them both in lockstep, until we have an independent ld/st queue model. The implementation adds a new method `MCSchedule::getResourceBufferSize()` that calculates the buffer size and exits early by default. It is wrapped on `TargetSchedule` for CodeGen, or other users call it directly. For `ResourceState` we add a scheduling model context to the constructor. `LSUnit` still uses `BufferSize` directly, as it has inline logic for load/store queues handling. So is `llvm/lib/Target/SystemZ/SystemZHazardRecognizer.cpp` which is target specific. Testing through `llvm-mca` debug output. We could syntesize a BB that is sensitive to the schedule and check an IPC improvement, but it would be more compilated. This only impacts llvm-mca at the moment. However, I think it should logically belong to the MC layer, which owns the scheduling model infrastructure, as the option is coupled to the model and not MCA. Thus, this patch adds it speculatively to the MC layer, e.g. so that we can easily observe how a RS-based scheduler performs for next-generation larger-RS target without re-coding and re-building LLVM. Following up, we may even move other such options to from the `llvm-mca` tool to the MC library like dispatch width etc. commit 7b227a281d29861c959db0d08c02fc97b55709e0 Author: Fangrui Song <[email protected]> Date: Tue May 19 10:30:52 2026 -0700 [SmallPtrSet] Drop tombstones in large mode (#197637) SmallPtrSet uses quadratic probing with tombstone deletion in large mode. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R. `erase` opens a hole at the removed slot, walks forward sliding each following entry whose probe path crosses the hole back into it (the hole moves with each slide), and stops at the next empty slot. The scan stops at the next empty bucket, which is guaranteed to exist. `remove_if` clears matches in a single pass then calls `Grow` at the current size to restore the linear-probe invariant, O(N) total. (Per-match Algorithm R erase would be O(N * cluster).) My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss Table family (not good at small keys) are actually worse than the baseline. Mirrors the small-mode change in 42c3edb4819f, which dropped tombstones from the inline array and likewise had `erase` invalidate iterators. `erase` now physically relocates following entries, so it invalidates iterators and references in large mode too. Linear probing is vulnerable to primary clustering. Therefore, we lower the max load factor to 2/3, slightly faster than the current 3/4. commit 5d5220c53848ef86de57946acda4bc7d8b91880c Author: Kareem Ergawy <[email protected]> Date: Tue May 19 19:23:55 2026 +0200 [CUF] Fix `do concurrent` IV type in `cuf.kernel` lowering (#198584) Induction variables in a `do concurrent` construct inside a cuf kernel directive were allocated with the `index` type instead of the Fortran-declared `integer` type. This caused a type conversion failure when the index variable was used in a context requiring a different integer or real type (e.g. `real(i)`). Fix by using `genType(*name.symbol)` to derive the allocation type from the symbol's Fortran declaration. PR stack: - https://github.com/llvm/llvm-project/pull/198584 ◀️ - https://github.com/llvm/llvm-project/pull/198585 Co-authored-by: Claude Sonnet 4.6 <[email protected]> commit 40fa628fffb96d55cdaaa294a5d02b75bb738418 Author: Michael Jones <[email protected]> Date: Tue May 19 10:22:06 2026 -0700 [libc] implement putwc (#196165) Add putwc function and tests. Part 9/10. Assisted by Gemini commit 7fc16358d9c4753a8ecdbd58698fcf4315b90646 Author: Sang Ik Lee <[email protected]> Date: Tue May 19 10:19:11 2026 -0700 [MLIR][XeGPU] Temporarily disable XeGPU peephole optimizer for CRI (#198031) Temporarily disabling the optimization pass until a fix is ready. Support for CRI was added in recent PR: https://github.com/llvm/llvm-project/pull/197229 But had post merge issues. commit f06dac7cf8efab07ce12667792210d823f2d18f3 Author: Razvan Lupusoru <[email protected]> Date: Tue May 19 10:18:18 2026 -0700 [mlir][acc] Allow non-resolvable callee in ACCImplicitRoutine (#198581) Some dialect call operations do not necessarily require or prevent for a call to occur to non-defined symbol (only func.call enforces this). Thus instead of making the condition an assert simply skip it just like other cases where call target cannot be found. This PR also adds IR testing for ACCImplicitRoutine with the final test case being related to the issue this PR is solving. commit 0b714ac473a28d45df01a507e7615f4f847bd065 Author: Sang Ik Lee <[email protected]> Date: Tue May 19 10:17:55 2026 -0700 [MLIR][XeGPU] Fix initial value issue with dpas_mx e2m1 test. (#198056) Integration test was using incorrect initial values. commit 0fc3823cd589d14fa66108b4759604c37ba0432c Author: Sang Ik Lee <[email protected]> Date: Tue May 19 10:17:19 2026 -0700 [MLIR][XeVM][CODEOWNERS] Add @akroviakov as additional XeVM code owner. (#197965) commit 618df18438ca55498326d4e56c40b98f83eb671c Author: Kareem Ergawy <[email protected]> Date: Tue May 19 19:16:11 2026 +0200 [CUF] Handle renamed managed companion pointer in `CUFDeviceAddressOpConversion` (#198161) CUFAddConstructor creates a companion pointer global (`@sym.managed.ptr`) for each non-allocatable managed variable. `CompilerGeneratedNamesConversion` may run before `CUFOpConversionLate` and rename the global by replacing dots with 'X', producing `@symXmanagedXptr`. Extend `CUFDeviceAddressOpConversion` to try both the original and the renamed suffix when looking up the companion pointer. Co-Authored-By: Claude Sonnet 4.6 <[email protected]> --------- Co-authored-by: Claude Sonnet 4.6 <[email protected]> commit b22dbfca3e7d02f4c87f4825e23cd83ee385ffbd Author: EuphoricThinking <[email protected]> Date: Tue May 19 19:15:04 2026 +0200 [offload] update README for Tablegen (#196553) This patch changes the non-existent target `LLVMOffload` to the correct target `offload` and adds a more detailed path to `OffloadImplFuncDecls.inc`. commit c520ae386458ac3763c7d16a0be105f906e319ed Author: Michael Jones <[email protected]> Date: Tue May 19 10:14:23 2026 -0700 [libc] implement getwchar (#196164) Add getwchar function and tests. Part 8/10. Assisted by Gemini commit 8f50a28e675f71a1e9d19ba65de2b7571b566402 Author: Jeff Bailey <[email protected]> Date: Tue May 19 17:13:48 2026 +0000 [libc][NFC] Fix typos in time.yaml keys for header generation (#198551) The YAML parser for header generation expects the plural 'standards:' key for function definitions. Using the singular 'standard:' caused these entries to be ignored by the parser. Assisted-by: Automated tooling, human reviewed. commit e69073e6ed10e003aebb3c99c4739dc07807248d Author: Akshay Deodhar <[email protected]> Date: Tue May 19 10:12:14 2026 -0700 [NVPTX] Autoupgrade atom.load intrinsics to device scope (#198431) The default scope for the "atom" instruction in PTX is "gpu" (corresponds to "device" in LLVM). The default LLVM scope is "system". When scopes were not supported, the "system" scope was silently dropped, and the default device scope was desired behavior. This patch fixes the discrepancy introduced by https://github.com/llvm/llvm-project/pull/179553. commit 22e8c55ccf808620cce66c9b1fb6f1341d15a3cc Author: Yihan Wang <[email protected]> Date: Wed May 20 01:06:02 2026 +0800 Reapply [Clang] Implement P2843R3 - Preprocessing is never undefined (#196989) This PR reapply https://github.com/llvm/llvm-project/pull/192073, and make most `-pedantic` warnings default on. --- This PR marks [P2843R3 - Preprocessing is never undefined](https://wg21.link/P2843) as implemented and add tests. Fixes https://github.com/llvm/llvm-project/issues/145658 --------- Signed-off-by: yronglin <[email protected]> commit a9ceabab84b916d4e2b980a9c9d579bd48629442 Author: forking-google-bazel-bot[bot] <265904573+forking-google-bazel-bot[bot]@users.noreply.github.com> Date: Tue May 19 12:05:13 2026 -0500 [Bazel] Fixes b9c7a8e (#198579) This fixes b9c7a8e43ac26b0df422cd9a403382f0656785a0. Co-authored-by: Google Bazel Bot <[email protected]> commit b39ffdb9bff7ea0a1319b724b620b0aba8b83642 Author: Michael Jones <[email protected]> Date: Tue May 19 10:04:14 2026 -0700 [libc] implement fgetws (#196161) Add fgetws function and tests. Part 5/10. Assisted by Gemini commit 5c74de974170451733011a4b46c2e8395419e4fb Author: NeKon69 <[email protected]> Date: Tue May 19 20:02:36 2026 +0300 [LifetimeSafety] Warn when lifetimebound attribute is present on the definition but not on declaration (#197753) Warn when the `[[clang::lifetimebound]]` attribute is present on a definition but not on the corresponding declaration, with a note pointing to where the attribute appears. Closes #183016 commit ac3ccd277f8be425ffa0dd56ae4223683130e719 Author: Hassnaa Hamdi <[email protected]> Date: Tue May 19 17:55:24 2026 +0100 [LV] Recognize UF*vscale as a valid canonical-IV increment step (#196722) This patch teaches `findCanonicalIVIncrement` to recognize the pre-materialization step pattern (`mul vscale, UF`) as a valid canonical-IV step. commit 21121533d7969837f95c687f55e0a25e463fbee7 Author: Slava Zakharin <[email protected]> Date: Tue May 19 09:55:10 2026 -0700 [flang][build] Fixed missing library dependency. (#198575) I started getting this error in shared-libs build after #197442: ``` ld.lld: error: undefined symbol: Fortran::evaluate::Expr<Fortran::evaluate::SomeType>::~Expr() >>> referenced by optional:257 (.../include/c++/9.3.0/optional:257) >>> tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/DoConcurrentConversion.cpp.o:(std::_Optional_payload_base<Fortran::evaluate::Expr<Fortran::evaluate::SomeType>>::_M_destroy()) ``` This patch adds the missing dependency. commit 51c961b7eb5c55b6770fce7399d522ea112370ce Author: Michael Jones <[email protected]> Date: Tue May 19 09:54:20 2026 -0700 [libc] implement fputws (#196160) Add fputws function and tests. Part 4/10. Assisted by Gemini commit 4fb265d3629c8243a073ab598e2dbf8250741d2c Author: Oskar Wirga <[email protected]> Date: Tue May 19 12:54:16 2026 -0400 [AArch64] Fix return address auth in swiftasync epilogues (#189484) This is part of work being done in #188378 and #188638, split out from When a tail call has a non-zero FPDiff the epilogue adjusts SP before auth so when `AUTI[AB]SP` (and `autiasppc` in the PAuthLR path) authenticates LR using the current SP as the modifier the return address signed at function entry with the *entry* SP means they no longer match, causing `EXC_ARM_PAC_FAIL`. Fix by computing the entry SP into X16 and using explicit `AUTI[AB] x30, x16` when FPDiff != 0. This PR was mostly developed with LLM assistance, but human tested on arm64e hardware. --------- Co-authored-by: Jon Roelofs <[email protected]> commit 24163114a38cb53a1369583c44c1890d00f480d5 Author: Charles Zablit <[email protected]> Date: Tue May 19 18:52:55 2026 +0200 [lldb][windows] add bin to PATH in API tests (#198550) commit 24df8a6e30b96efa8cde0c88b0ba97b958af9f83 Author: Charles Zablit <[email protected]> Date: Tue May 19 18:50:49 2026 +0200 [lldb][windows] Add --lldb-use-lldb-server to lit (#197660) Currently, the Windows builds of lit only allow `SystemDrive` when copying env vars into the test environment. `LLDB_USE_LLDB_SERVER=1` in the developer shell never reached the `lldb.exe` subprocess under `check-lldb`. Baseline and lldb-server runs therefore looked identical even when the user expected lldb-server to be used. `--lldb-use-lldb-server` in lit invocations sets `LLDB_USE_LLDB_SERVER=1` to test lldb on Windows with `lldb-server` instead of the in process plugin. Depends on: - https://github.com/llvm/llvm-zorg/pull/848 rdar://177442264 commit 196c1693b4954604fc11627d8e3aafacc5afe78a Author: Michael Jones <[email protected]> Date: Tue May 19 09:43:34 2026 -0700 [libc] implement fputwc (#196158) Add fputwc function and tests. Part 2/10. Assisted by Gemini commit 39e208448d952baf50e87a3d5e51456dcb6e00ba Author: Ryan Buchner <[email protected]> Date: Tue May 19 09:40:27 2026 -0700 [SLP] Precommit tests for incorrect base pointer failure for strided store vectorization (#198564) Issue first noted https://github.com/llvm/llvm-project/pull/198408#issuecomment-4485336867 The incorrect base pointer is used for strided loads in some reordering scenarios. commit 2474ac009bd8683a7bac11f514a6c2bd73c58d6c Author: Jonas Devlieghere <[email protected]> Date: Tue May 19 09:37:13 2026 -0700 [lldb] Make CommandObject::GetTarget filter out the dummy target (#198429) Follow-up to #197805. Make CommandObject::GetTarget the canonical target accessor for command code, and tighten its semantics so that DoExecute methods can't accidentally operate on the dummy target. GetTarget now returns Target* instead of Target&. The result is the target from the command's frozen execution context, falling back to the interpreter's execution context. The dummy target is filtered out and replaced with nullptr unless the command opts in via one of the eCommandRequires{Target,Process,Thread,Frame} flags (in which case CheckRequirements has already guaranteed a real target) or via the new eCommandAllowsDummyTarget flag. This is the first half of the cleanup discussed at the end of #197805. A follow-up will audit DoExecute methods that still reach for GetSelectedTarget or m_exe_ctx.GetTargetPtr() directly and migrate them to GetTarget. commit faad403cb68b57be327302c669f24833bc70ba9f Author: kwyatt-ext <[email protected]> Date: Tue May 19 11:36:57 2026 -0500 [flang][Semantics] Enforce IMPLICIT NONE(EXTERNAL) for dummy procedures (#198398) Fix `CheckImplicitNoneExternal()` to correctly diagnose dummy arguments used as procedures that lack an explicit `EXTERNAL` attribute when `IMPLICIT NONE(EXTERNAL)` is in effect. Fixes #198395 Flang silently accepted code where a dummy argument was called as a procedure under `IMPLICIT NONE(EXTERNAL)` without the required explicit `EXTERNAL` attribute. The Fortran 2018 standard C895 requires that each dummy procedure used as a procedure shall explicitly have the `EXTERNAL` attribute when `IMPLICIT NONE(EXTERNAL)` is specified. **`flang/lib/Semantics/resolve-names.cpp`:** 1. In `NoteExecutablePartCall()`: Guard the `SetImplicitAttr(*symbol, Attr::EXTERNAL)` call with a check that EXTERNAL is not already explicitly set. This prevents clobbering the explicit/implicit distinction when a dummy argument already has an explicit `EXTERNAL` attribute. 2. In `CheckImplicitNoneExternal()`: Change the EXTERNAL attribute test from `!symbol.attrs().test(Attr::EXTERNAL)` to `(!symbol.attrs().test(Attr::EXTERNAL) || symbol.implicitAttrs().test(Attr::EXTERNAL))`. This catches both the case where EXTERNAL has not yet been set (non-dummy externals) and the case where it was set only implicitly (dummy arguments used as procedures). **`flang/test/Semantics/implicit07.f90`:** Added test cases for various uses of `IMPLICIT NONE(EXTERNAL)`. Assisted by Claude Opus. commit b22016bd20c42a3811bf8185d92e5802852d4ca7 Author: Nick Sarnie <[email protected]> Date: Tue May 19 12:36:32 2026 -0400 [offload][LIT] Remove XFAIL: intelgpu from 5 virtual function tests (#198559) Passing on the buildbot now, probably [this](https://github.com/llvm/llvm-project/pull/197556) change fixed them. Signed-off-by: Nick Sarnie <[email protected]> commit 9ab16a1fa9c3f5a6bce8fc4723710ae2906a9ca9 Author: Charles Zablit <[email protected]> Date: Tue May 19 18:35:16 2026 +0200 [lldb] split NativeFile in platform specific implementations (#196293) commit b9c7a8e43ac26b0df422cd9a403382f0656785a0 Author: Nishant Sachdeva <[email protected]> Date: Tue May 19 22:03:26 2026 +0530 [llvm-ir2vec] Creating one directory per library for IRUtils, and MIRUtils (#198170) Follow up to https://github.com/llvm/llvm-project/pull/194414#issuecomment-4467455664 . Made a patch to move IRUtils, and MIRUtils to their own subdirectories commit c9213db3ea3bb8be9eb079f20ab3aaff11b73048 Author: Stefan Weigl-Bosker <[email protected]> Date: Tue May 19 12:29:26 2026 -0400 [InstCombine] Fold select of neg/not into sext-sub (#198225) Adds the following folds: - `select C, (sub 0, Y), (xor Y, -1) -> sub (sext !C), Y` - `select C, (xor Y, -1), (sub 0, Y) -> sub (sext C), Y` The original motivation is the high half of a two-word negation (See ```ll define i64 @f(i64 %lo, i64 %hi) { entry: %not.hi = xor i64 %hi, -1 %lo.is.zero = icmp eq i64 %lo, 0 %neg.hi = sub i64 0, %hi %r = select i1 %lo.is.zero, i64 %neg.hi, i64 %not.hi ret i64 %r } ``` Which can be folded to: ```ll define i64 @f(i64 %lo, i64 %hi) { entry: %lo.not.zero = icmp ne i64 %lo, 0 %mask = sext i1 %lo.not.zero to i64 %r = sub i64 %mask, %hi ret i64 %r } ``` Proofs: https://alive2.llvm.org/ce/z/DATqof Fixes: #198179 commit 7a2aa25c8c6dd77888dfa49aae4688e5daa3d82d Author: Jordan Rupprecht <[email protected]> Date: Tue May 19 11:27:16 2026 -0500 [bazel][mlir][OpenMP] Fix 91467766a8afb52439619163828c5f6816ddd550 (#198567) Add missing dep for Arith.h commit 7f676cc7f0d861a9665b6235ac81d9246e8fd97c Author: Hassnaa Hamdi <[email protected]> Date: Tue May 19 17:17:50 2026 +0100 [LV] Fix analysis remarks leak (#197697) Currently, `LoopVectorizeHints::vectorizeAnalysisPassName()` returns empty pass name when the user forces vectorization, which allows analysis remarks to be emitted implicitly. The code sometimes emits analysis remarks with passing explicit pass name and other times it uses the `vectorizeAnalysisPassName()` which prevents consistent behaviour. This PR removes the implicit emitting so that analysis remarks consistently require `-pass-remarks-analysis` to print. Affected tests are updated to add `-pass-remarks-analysis`. commit f5598a91d39c5a2a7d57693f8bc89c6562eb9fde Author: asadium <[email protected]> Date: Tue May 19 12:16:55 2026 -0400 [AMDGPU][GlobalISel] Add RegBankLegalize rules for G_FPTOSI_SAT/G_FPTOUI_SAT (#198208) commit 998d8e87daf371b51822ce97e36046755668e5e6 Author: Jason Van Beusekom <[email protected]> Date: Tue May 19 11:13:42 2026 -0500 [Flang][Fir] Set default alignment of array globals to 64 bytes (#194969) This commit implements the proposal from the RFC: https://discourse.llvm.org/t/rfc-alignment-of-global-arrays/90397/13 This PR sets the alignment of all global arrays to 64 bytes (except for BIND(C) and common blocks). This Mirrors the execution of other fortran compilers (CCE, gfortran, nvfortran and ifx). commit a5077468984ac3c47e6a3ca779c6f0ba680706c0 Author: Alex MacLean <[email protected]> Date: Tue May 19 09:11:39 2026 -0700 [NVPTX] Fixup and test ExpandVariadics (#195709) The lowering of va_arg will load the arguments from the cursor via ld.local since the stack is used to store va_args. Update the ExpandVariadics NVPTX info struct to be consistent with this. Also initialize the pass for easier debugging. commit 12470b3109bfa6e5fb3afda6bb22001add4d788b Author: Artem Kroviakov <[email protected]> Date: Tue May 19 17:57:23 2026 +0200 [MLIR][XeGPU] Improve deinterleave/interleave/dpas_mx ops handling (#197223) commit 868bf3fbb96a4daae438c83484881bfcb0e68c2c Author: Jordan Rupprecht <[email protected]> Date: Tue May 19 10:51:33 2026 -0500 [bazel][libc] Fix 8076d17b61028e7fd5723fa84fd5615c945ae46b (#198553) Add new targets & deps for syscall wrappers commit 5ff6c5e4a0f7882c2b9b52dff6710ff33c11a74c Author: hidekisaito <[email protected]> Date: Tue May 19 08:48:37 2026 -0700 [AMDGPU] SIFoldOperands: constant-fold S_ADD/S_SUB with immediate operands (#198410) Extend SIFoldOperands::tryConstantFoldOp to recognise three patterns * ADD/SUB(imm, imm) -> S_MOV_B32 (LHS +/- RHS) * ADD x, 0 -> COPY x (Also `0 + x`) * SUB x, 0 -> COPY x (SUB is not commutable) Assisted-by: Claude Opus 4.7 commit 8148e168f5f8d91fc364947749d3362c630762b7 Author: Walter <[email protected]> Date: Wed May 20 01:31:59 2026 +1000 [X86] Fold splat AND on VGF2P8AFFINEQB source (#193364) Given that each row within `vgf2p8affineqb`'s matrix controls which source bits are selected, zeroing the same bit within all rows treats that corresponding source bit like it is zero. This means a AND of the input by any splatted 8-bit values can be folded with the matrix. This is patch: - Can eliminate a constant and/or reduces the instruction count from 2 to 1. - Only occurs when the matrix is constant, ensuring that it can't increase the dependency chain. - Don't occur if the AND is multi use well the splat isn't constant, preventing additional operations. - Work with both constant 8-bit splats and scalars value that where splatted to a vector. - Includes test coverage for positive cases (by constants, variable scalars, non zero immediates) and negative (multi use, larger splats, variable matrices). Fixes #191325 commit e0f8b79090063d1e5cebcc85198821c60a9685d8 Author: Shilei Tian <[email protected]> Date: Tue May 19 11:31:16 2026 -0400 [AMDGPU] Add three target features msad-insts, mqsad-pk-insts, and mqsad-insts (#198432) commit af0b42d8a5b0dcfacf958a2267855ce25002675f Author: Kiriti Ponduri <[email protected]> Date: Tue May 19 20:57:04 2026 +0530 [libc] prefer *at syscalls in sys/stat wrappers (#197940) - These changes flips the #ifdef order to prefer the *at syscalls over normal ones. - In modern architectures, *at system calls are preferred over normal system calls cuz of safety issues. - So by checking for ""*at"" system calls first, we ensure better compatibility with modern systems. - After then normal syscalls moved else or elif for support to older ones. - From merged pr(#195792) and issue(#195620) --------- Signed-off-by: udaykiriti <[email protected]> Co-authored-by: Jeff Bailey <[email protected]> commit e04895ff05c58f78acfbe5be02973c9bd1f0c689 Author: Harishankar.1 <[email protected]> Date: Tue May 19 20:56:50 2026 +0530 [X86] Lower vector i8 ashr-by-1 using pavgb (#198487) For vector i8 arithmetic shift right by 1, the current lowering produces a 5-instruction sequence (psrlw + pand + xor + psubb plus a constant load) with a 4-deep dependency chain. This patch uses the identity ashr(x, 1) == avgceilu(x, -1) ^ (~x & 0x80) to lower to ISD::AVGCEILU + a short fixup, producing 4 instructions on SSE/AVX/AVX2 and 3 on AVX-512BW (after vpternlogd fusion of the AND/XOR pair), with two parallel dependency chains instead of one long one. The freeze on R is required because the target reads it twice, matching the pattern of the existing `shl R, 1 -> add R, R` case in LowerShiftByScalarImmediate. Alive2 proof: https://alive2.llvm.org/ce/z/LbXPhE Fixes #198061 commit 6fd09a539f5f397c569f81e22dcd032d5e818526 Author: Carlos Seo <[email protected]> Date: Tue May 19 12:22:24 2026 -0300 [flang][OpenMP] Skip declare simd lowering for interface bodies (#197010) When DECLARE SIMD appears in the specification part of an interface body, the PFT records the directive as an evaluation of the enclosing program unit rather than of the interface body's subprogram. Its clause operands (linear/aligned/uniform) reference dummy arguments local to the interface body, which have no address in the enclosing scope, causing a crash. Detect the mismatch by comparing the program unit containing the directive with the procedure currently being lowered, and skip op emission when they differ. This handles both explicit declare simd(proc-name) and implicit forms in any enclosing context. Fixes #192581 commit 91467766a8afb52439619163828c5f6816ddd550 Author: SunilKuravinakop <[email protected]> Date: Tue May 19 20:52:20 2026 +0530 [Flang] [OpenMP] atomic compare (#184761) Support for `omp atomic compare` in flang. Multiple clauses like capture with compare are not supported An issue for this was raised earlier at [181116](https://github.com/llvm/llvm-project/issues/181116) --------- Co-authored-by: Sunil Kuravinakop <[email protected]> commit e32fc5dc5e71251f0f7d605604739602792b6e61 Author: Craig Topper <[email protected]> Date: Tue May 19 08:20:40 2026 -0700 [SelectionDAG] Use getExtractSubvector. NFC (#198450) commit 9ccaf838121687821a098975f55be1872bc288f3 Author: Benjamin Stott <[email protected]> Date: Tue May 19 16:13:42 2026 +0100 [LLVM] Add a function to reset the opt bisector (#197723) For daemonized testing, we need to be able to reset the global opt bisector between test runs. This PR just adds a small function to the OptPassGate class to reset its state. commit 8f6ed9f490bc8f7b88d945b9e38fd715fcee1966 Author: Ebuka Ezike <[email protected]> Date: Tue May 19 16:13:18 2026 +0100 [lldb] Fix possible invalidated iterator. (#198482) The begin or end interator may be invalidated when a idx_pos in erased from the vector. Unblocks sanitised CI. commit 38949dbc8b8f763cc14139703f1513abfe7f4b2f Author: Alex Bradbury <[email protected]> Date: Tue May 19 16:08:31 2026 +0100 Revert "[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exit thunks" (#198540) Reverts llvm/llvm-project#190933 Reported issues with an EXPENSIVE_CHECKS_BUILD. Reverting so this can be fixed without undue time pressure. commit 960ae6f812c8db7c92532898c338aeefc6f7ac6d Author: Jean-Didier PAILLEUX <[email protected]> Date: Tue May 19 17:07:20 2026 +0200 [Flang] Adding -ffree-line-length-<value> flag (#192941) Added support for the `-ffree-line-length-<value>` flag in Flang, which is equivalent to `-ffixed-line-length-<value>` but in free form. This flag is supported by gfortran and can be used in some applications. --------- Co-authored-by: Tarun Prabhu <[email protected]> Co-authored-by: Andre Kuhlenschmidt <[email protected]> commit 4cb4bbca087458d33e946341cd79c6a2b9a605cc Author: Sirraide <[email protected]> Date: Tue May 19 17:05:29 2026 +0200 [Clang] [NFC] Use `unique_ptr<Lexer>` everywhere (#198393) Replace every instance of `new Lexer` with `make_unique<Lexer>` and adjust `Lexer::Create_PragmaLexer()` to return a `std::unique_ptr<Lexer>` instead. The Preprocessor was already storing a `unique_ptr<Lexer>`, so there’s no need to change how that works. commit f6fb8f573ecbb64f41b473700589040acf0b44b9 Author: Charles Zablit <[email protected]> Date: Tue May 19 17:03:41 2026 +0200 [lldb][windows] remove path separator replacement from TestGDBRemoteClient.py (#198537) Since https://github.com/llvm/llvm-project/pull/197942, vRun packets use the native path separators. TestGDBRemoteClient.py now fails on Windows because it converts the path to POSIX style paths, which is a workaround for what https://github.com/llvm/llvm-project/pull/197942 fixed. rdar://177342572 commit 673b17e1b38fbe4dfe01c9aa66c368857b13dfdb Author: Simon Pilgrim <[email protected]> Date: Tue May 19 16:02:41 2026 +0100 [DAG] scalarizeExtractedBinOp - extract from non-constant one use buildvectors (#198013) When attempting to scalarize a vector binop that has a single extract, we currently only fold if either of the binop's operands is a constant buildvector - but we can extract from non-constant buildvectors without increasing instruction count as long as the vector binop was the only use of the buildvector. More yak shaving for #196493 commit bcfa53e6e778295ea51665284da583e19a8073b9 Author: Razvan Lupusoru <[email protected]> Date: Tue May 19 08:02:13 2026 -0700 [flang][acc] Handle Fortran do loops as acc loops in acc routine (#198420) As was previously done for do loops in acc compute constructs in https://github.com/llvm/llvm-project/issues/149614 , this PR does the same for do loops in `acc routine`. The rules are follows: - Do loops not marked with `acc loop` are considered `auto` - Do concurrent loops are considered `independent` - Any loops in an `acc routine seq` are considered `seq` This ensures that the IV is correctly privatized and attached to acc loop. commit 75e4aafad0b3b1cb3a9f4314da8eca299437646c Author: Jameson Nash <[email protected]> Date: Tue May 19 10:51:00 2026 -0400 Reland "[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering" (#197436) Replace typed struct GEPs with byte array allocation and ptradd operations: 1. Track root offsets as byte offsets instead of building typed struct. 2. Use `ComputeFrameLayout` to compute byte offsets based on DataLayout, properly accounting for each root's size and alignment. 3. Allocate frame as `[FrameSize x i8]` byte array instead of typed struct. 4. Replace all CreateGEP operations with CreatePtrAdd using computed offsets. 5. Frame layout unchanged: `[Next ptr | Map ptr | Root 0 | Root 1 | ... | Root N]` where each root is placed at its computed aligned offset. 6. Zero out padding between roots with memset for deterministic frame contents for GC. Benefits: - Removes dependency on `getAllocatedType` for building frame struct - Properly handles root alignment requirements (fixes TODO in code) Relands #178436, after fixing the tests that I'd added specifically to show what changed in that PR, but forgot to rerun. And fixed the logic bug that having tests uncovered (using the wrong size for the frame in the metadata). Co-authored-by: Claude Sonnet 4.5 <[email protected]> commit 52ca170e12c6ec42aa9327e3486e4c37efec1c31 Author: Tom Stellard <[email protected]> Date: Tue May 19 07:50:36 2026 -0700 [Github] Hashpin base container in CI Tooling containerfile (#197315) https://github.com/llvm/llvm-project/security/code-scanning/1492 commit 213b3292904995e00db309eea66431e1b727c181 Author: Adel Ejjeh <[email protected]> Date: Tue May 19 09:48:59 2026 -0500 [AMDGPU][NFCI] Change MCSubtargetInfo references in AMDGPUBaseInfo.h/.cpp to be const ref instead of pointers (#197038) Change all `AMDGPU::IsaInfo` functions and `initDefaultAMDKernelCodeT` to take `const MCSubtargetInfo &` instead of `const MCSubtargetInfo *`. These functions never accept null, so a reference better expresses the contract. Also change `AMDGPUMCKernelCodeT::initDefault` to take a const reference for consistency, and convert local `MCSubtargetInfo` pointer variables to references in `AMDGPUMCExpr.cpp` where the pointer is always dereferenced. Requested by @arsenm in https://github.com/llvm/llvm-project/pull/192306#discussion_r2076113671. Co-authored-by: Claude Opus 4 (1M context) <[email protected]> commit 94b1d194ac4b7c11ad871efe5e333aed36f576bf Author: Jameson Nash <[email protected]> Date: Tue May 19 10:48:17 2026 -0400 [Utils] Examine debug info type instead of alloca type to guess the debug behavior of the alloca uses (#177480) Replace `isArray` and `isStructure` helpers that queried alloca IR type with a `isCompositeType` helper that checks the debug variable's source-level type from debug info metadata to decide if this seems perhaps profitable to convert to this debug info from #debug_declare to a #debug_value. This changes behavior: the lowering decision is now based on the source-level type from debug info rather than the IR alloca type, which is more semantically correct for debug info processing. This should have minimal effect on clang, but may change behavior more significantly on front-ends like rust that have not used semantically meaningful alloca element types. Removes all uses of getAllocatedType() from Utils/Local.cpp. This seemed slightly more semantically correct to me, though it is slightly challenging to enumerate all of the possible scalar debug types. Another alternative perhaps is to actually examine all of the uses and decide if all of them will get successfully handled before the `DDI->eraseFromParent` deletes the debug record, since `ConvertDebugDeclareToDebugValue` also just deletes the debug info (replaces with poison) if it finds a use that doesn't have an easy way to replace. Co-authored-by: Claude Sonnet 4.5 <[email protected]> commit b7cc80035ad6bd4b778680707283f06fe0912f21 Author: Luke Lau <[email protected]> Date: Tue May 19 15:46:00 2026 +0100 [VPlan] Simplify select x, (i1 y | z), y -> y | (x && z) (#190196) Fixes https://github.com/llvm/llvm-project/issues/189553 This adds a canonicalization `select x, (i1 y | z), y -> y | (x && z)`, [Alive2]( https://alive2.llvm.org/ce/z/qcQRn6). InstCombine already performs this. This adds a canonicalization which causes the `lhs | (headermask && rhs) -> vp.merge rhs, true, lhs, evl` pattern in optimizeMasksToEVL to match, improving the RISC-V codegen for an anyof select reduction. commit 581dd5b183e1cc29e4b129f12e3c1b203a77e6b0 Author: Sairudra More <[email protected]> Date: Tue May 19 19:46:15 2026 +0530 [flang] Inline scalar-to-array hlfir.assign at -O0 (#197092) At `-O0`, Flang can lower trivial scalar-to-array broadcasts such as `c = a(1) + 1.0` through `_FortranAAssign`. That runtime path can call `free()`, which is not valid in OpenMP GPU device code. This patch teaches `InlineHLFIRAssign` to handle trivial scalar RHS values. At `-O0`, the pipeline runs it in a scalar-RHS-only mode, so only scalar-to-array broadcasts are inlined. Array-to-array assignments still fall back to `_FortranAAssign` at `-O0`. Scalar RHS values are materialized before the generated loop with `loadTrivialScalar`, preserving intrinsic assignment ordering for cases like `a = a(1)`. At `O1+`, the full `InlineHLFIRAssign` pass still runs as before, now also supporting scalar RHS. The remaining files are test updates from scalar-to-array assignments now being inlined at `-O0` instead of lowering through `_FortranAAssign`. Other device-runtime call paths are left out of scope. Fixes #197091. Co-authored-by: Sairudra More <[email protected]> commit ed81c50b077d172e2691acc7a8887d8f0a278880 Author: Charles Zablit <[email protected]> Date: Tue May 19 16:09:16 2026 +0200 [lldb][windows] Fix second-chance exception delivery on lldb-server (#197956) Currently, all tests that wait for the debugger to stop when the process crashes time out on Windows under `LLDB_USE_LLDB_SERVER=1` because of 2 issues: 1. The `if (!first_chance) SetState(eStateStopped, false)` before the switch silently advances `m_state` on every second-chance event. The `default:` branch later calls `SetState(eStateStopped, true)` but this is never reached because `state == m_state`. The client is waiting for a reply that is never sent. 2. The `default:` branch's first-chance handling stops all threads and then returns `SendToApplication`, which tells Windows `DBG_EXCEPTION_NOT_HANDLED`. This hangs the process, the second-chance event never arrives. `ProcessWindows` is a no-op on first-chance non-breakpoint exceptions because of this: it just returns `ExceptionResult::SendToApplication` with no `StopThread/SetState`. This patch fixes both issues and prevents the following tests from timing out: ``` lldb-api :: functionalities/breakpoint/breakpoint_conditions/crashing_condition/TestCrashingCondition.py lldb-api :: functionalities/builtin-debugtrap/TestBuiltinDebugTrap.py lldb-api :: functionalities/inferior-assert/TestInferiorAssert.py lldb-api :: functionalities/inferior-crashing/TestInferiorCrashingStep.py lldb-api :: functionalities/inferior-crashing/recursive-inferior/TestRecursiveInferiorStep.py lldb-api :: functionalities/location-list-lookup/TestLocationListLookup.py ``` rdar://177425581 commit 69385549f52497076e7951f7373c192abdf04005 Author: Nico Weber <[email protected]> Date: Tue May 19 10:08:48 2026 -0400 [gn] port 182ae96a82fc (#198533) commit d06e6936a42375bf44ba58401f0f01f89ff1f750 Author: agozillon <[email protected]> Date: Tue May 19 16:08:00 2026 +0200 [Flang][OpenMP] Restrict implicit default declare mapper from applying deep-copies of pointer members (#197885) According to the OpenMP specification, only allocatables should get deep-copy behaviour inside of implicit default declare mappers. This PR restricts this behaviour. Relevant specification exert, added as a comment for a reminder: // "If a component of a derived type list item is a map clause list item // that results from the predefined default mapper for that derived type, // and the component is not also an explicit list item or the array base // of an explicit list item on the same construct, then: if it has the // POINTER attribute, it is attach-INELIGIBLE. If a list item in a map // clause is an associated pointer that is attach-ineligible, the effect // of the map clause does not apply to its pointer target." This prevents certain programs from unexpected over-mapping via pointer nesting, doesn't prevent that for allocatables, but that's OpenMP specification mandated foot shooting, so it's free game. commit 52871b5416688bd904d293cc2bd4c9e5ac3452b1 Author: Charles Zablit <[email protected]> Date: Tue May 19 16:07:23 2026 +0200 [lldb][PECOFF] Recognise truncated .lldb{summaries,formatters} section names (#198377) commit 7ae1a3fad8f06625d4e55b846a090ce9206c3c82 Author: Vlad Serebrennikov <[email protected]> Date: Tue May 19 18:05:41 2026 +0400 [clang] Give unnamed namespaces internal linkage (#198215) Recently in #194600 we exposed formal linkage in AST dump. That PR came with a bunch of FIXMEs. One of them is about the fact that we consider unnamed namespaces to have external linkage, while the Standard says it's internal linkage ([[basic.link]/4](https://eel.is/c++draft/basic.link#4.sentence-1)): > An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has internal linkage. Of course, declarations within unnamed namespaces still had internal linkage (nothing would work otherwise). The intent of this patch is to give unnamed namespaces internal linkage and to do a bit of refactoring in `LinkageComputer::getLVForNamespaceScopeDecl` to use linkage of the enclosing namespace as the default linkage of declarations within it, now that all kinds of namespaces have the correct linkage. No changes to the behavior of programs are intended. commit 8076d17b61028e7fd5723fa84fd5615c945ae46b Author: Pavel Labath <[email protected]> Date: Tue May 19 16:00:59 2026 +0200 [libc] Port remaining socket functions to syscall_wrappers (#198463) While in there: - fix file headers to conform to latest standards - add missing restrict qualifier to recvfrom Assisted by Gemini. commit c8d9852b0a8d5ea3e8c4549f8574e0a5cebb5997 Author: Mariusz Sikora <[email protected]> Date: Tue May 19 15:58:51 2026 +0200 [AMDGPU][NFC] Add tests for 64bit literals in single DWORD instructions for gfx13 (#197907) Co-authored-by: sstipano <[email protected]> commit d6d5a3da41e24a754fdc9b3a2cb06d60f3874ce1 Author: Brian Cain <[email protected]> Date: Tue May 19 08:46:52 2026 -0500 [BOLT] Gate PointerAuthCFIFixup unit test on AArch64 target availability (#197464) The test bodies reference AArch64:: namespace identifiers (ADDSXri, X0) which fail to compile when AArch64 is not in LLVM_TARGETS_TO_BUILD. Wrap all TEST_P bodies in #ifdef AARCH64_AVAILABLE and add GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST to suppress GoogleTest's uninstantiated suite error when no target instantiates the tests. commit f3c6257fabff4a138a0d119041c52678586eb013 Author: Simon Pilgrim <[email protected]> Date: Tue May 19 14:43:56 2026 +0100 [lanai] multiply.ll - regenerate test checks (#198521) commit ae8d973ae253463e4b821086b720fbc813c61e96 Author: jinge90 <[email protected]> Date: Tue May 19 21:43:29 2026 +0800 Fix unused parameter for add_bitcode_entrypoint_library for GPU Libc (#198458) commit 39a1ed4222782ccc1c97bbb497aeca2718b1b793 Author: Dmitry Vasilyev <[email protected]> Date: Tue May 19 17:41:53 2026 +0400 [lldb][Windows] Disable TestLinuxCore.LinuxCoreTestCase.test_object_map on Windows (#198473) See https://github.com/llvm/llvm-project/issues/198471 for details. commit e2464bf702197349a1ec0691e4ab188ea9a18155 Author: Pavel Labath <[email protected]> Date: Tue May 19 15:33:29 2026 +0200 [libc] Add struct sockaddr_in (#197909) The struct needs to be 16 bytes long for compatibility with the linux kernel (which rejects smaller sizes, even though the reset of the bytes are unused). The padding field (and its name) is not specified by POSIX, but it's traditionally called sin_zero, and there exists a fair amount of code that references that name, so I'm matching it as well. I'm testing the compatibility of this struct by binding to a localhost address. This test requires that the machine has a loopback interface with an assigned ipv4 address. If some of the environments do not have it, we can try to detect this in the test and skip it, but this would diminish the value of the test. As a drive-by, I'm also adding the (non-POSIX) INADDR_LOOPBACK constant. Assisted by Gemini. commit 4a32cf0b6d90f6bec3b5f2ad50b11ca1dba62954 Author: Hans Wennborg <[email protected]> Date: Tue May 19 15:15:39 2026 +0200 Revert "[MC/DC][Coverage] Enable profile correlation for MC/DC" (#198520) The instrprof-mcdc-correlation.c test doesn't pass on Mac, see discussion on the PR. Reverts llvm/llvm-project#136437 commit c80b992d20e9d71c1361e28c5e4d25fc7716ee99 Author: Raul Tambre <[email protected]> Date: Tue May 19 16:12:55 2026 +0300 [cmake][runtimes] Remove unused local CMake flag transform (#198506) This was used by some code to pass flags per-architecture to runtime builds. User code can never reach this path and the last use was removed more than a year ago. Fixes: bd6df0fe21faeaf3fd2a8ad17074f10620c78378 commit e7df16d856c86d2ecdcdf110bf62a5af46513ad6 Author: Ryotaro Kasuga <[email protected]> Date: Tue May 19 21:58:26 2026 +0900 [LoopInterchange] Add test for poison can be produced due to ninf (NFC) (#197922) This patch adds a test case illustrating that loop interchange can turn an input program that was previously well‑defined into one that exhibits UB, because reordering floating‑point operations in the presence of the ninf flag may produce a poison value. In such a case, we should drop the ninf from instructions involved in the transformation. Also add another case to show that we can safely apply loop-interchange without eliminating ninf, because it doesn't change the order of floating-point operations chain. Related issue: #148851 commit fa46161cbd86f59a2b4d66dcac8909c7690135c9 Author: Sam Parker <[email protected]> Date: Tue May 19 13:56:32 2026 +0100 [NFC][WebAssembly] Codegen test (#198503) commit 14cb771bb45040d217c205624aaa9cd8e505ab5a Author: David Sherwood <[email protected]> Date: Tue May 19 13:54:51 2026 +0100 [LV][NFC] Fix enums and comments in VPInstruction class (#198498) From an audit of the code I can see that we only ever create VPInstructionWithType objects for the following opcodes: Scalar loads - see createVPInstructionsForVPBB All supported CastInst variants - see createVPInstructionsForVPBB VPInstruction::StepVector via createNaryOp VPInstruction::WideIVStep via createNaryOp VPInstruction::VScale via createElementCount Explicit creation of Instruction::Trunc, Instruction::SExt and Instruction::ZExt via createScalarZExtOrTrunc, etc. Therefore, I've updated the list of enums in the VPInstruction class to reflect what classes they are associated with. This now correctly matches the code in VPInstructionWithType::classof. commit a84d94802dad5b60947b66c2b04ec12e1efa8d7a Author: Raphael Isemann <[email protected]> Date: Tue May 19 13:29:52 2026 +0100 [lldb] Add a packet-test-delay setting for testing slow connections (#195440) [lldb] Add a packet-test-delay setting for testing slow connections Sending/receiving packets to/from a non-host devices adds latency to the gdb-remote communication that induces substantial slowdown into the debugging experience. This patch adds an artificial delay before sending a packet to simulate this latency without requiring an emulated/physical device. The test checks only checks that there is a delay when this setting is activated, but not that there is no delay when the setting is not activated. The reason for this is that this negative test would either require a large delay (which further slows down the test suite) or be prone to accidential failures on overloaded test bots. There is also the question whether we should have dynamic delays that depend on the number of transferred bytes. From talking to Felipe it seems the real bottleneck is really just the latency and not the transfer speed. From my own testing, it seems that a simple fixed array mostly simulated the remote debugging performance, and the delays I can see on my own machine seem to mostly resemble the delays we can see when debugging on a remote device. This patch also required wrapping `SendPacketAndWaitForResponse` so we can sync this setting on a per-packet basis for the GDBRemoteCommunication class. Without this, we could only set the delay before connecting and then never change afterwards which would make this feature less useful and writing a simple test impossible. commit ea922daab1803d4be3a1df0bb37cce83bc48d8a3 Author: Charles Zablit <[email protected]> Date: Tue May 19 14:27:54 2026 +0200 [lldb][windows] Plumb Windows DLL load/unload through lldb-server (#197901) commit b4aca7a59d0754eb54bbeddb53de746adfbe8b4b Author: Donát Nagy <[email protected]> Date: Tue May 19 14:12:12 2026 +0200 [NFC] Clarify behavior of ownership_holds in docs (#197933) I misunderstood the behavior of the attribute `ownership_holds` based on the claim that "using held memory is assumed to be legitimate". To avoid similar misunderstandings in the future, I'm adding an extra sentence to the documentation. (This question came up during the discussion of #196798) commit 313cc46e5893e6299ee1460e8dcc4f0ff5da015a Author: dcandler <[email protected]> Date: Tue May 19 13:08:19 2026 +0100 [libc][cmake] Don't assume CPU features in cross builds (#198308) Assuming all CPU features are available during cross builds will be incorrect in some cases, such as armv8.0-a where FullFP16 is not available and therefore FP16 instructions will not be understood. Looking at the file history, the feature detection code previously relied on try_run instead of try_compile, which might explain why it couldn't be used in cross builds as cross built binaries wouldn't be executable. But now the tests are compile only, they can work for cross builds too, which would be better than assuming features. commit 75ae202145b316883caa6074e9c94431ef152c26 Author: David Green <[email protected]> Date: Tue May 19 12:56:32 2026 +0100 [AArch64][GlobalISel] Add sign bits for G_FCMEQ (#198314) This adds basic num-sign-bits for G_FCMEQ, G_FCMGE and G_FCMGT, which all produce either all-ones or all-zeros in each vector lane. This function apparently goes in AArch64ISelLowering. commit 0e8f3574ad757596eeb1df446121f61c3146c961 Author: Alexey Bataev <[email protected]> Date: Tue May 19 07:09:14 2026 -0400 [SLP] Prefer outer binary op when opcode groups tie in buildInstructionsState When VL contains instructions of different opcodes with equal counts, the tie-breaking in buildInstructionsState could replace an outer operation (e.g., fadd) with an inner one (e.g., fmul) that appears as its direct operand, depending on SmallMapVector iteration order. Add a check: if the current MainOp is a BinaryOperator with a direct operand matching the challenger partition's opcode in the same block, keep MainOp instead of switching to the inner operation. Partially fixes #43353 Reviewers: bababuck, hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/198194 commit 230980947083c2309f135c82f10c735b0e06461f Author: Alexey Bataev <[email protected]> Date: Tue May 19 06:56:00 2026 -0400 [SLP] Support ordered fadd reduction via reduction intrinsics Add matchOrderedReduction() to recognize linearized ordered fadd chains (both LHS- and RHS-associated) and tryToReduceOrdered() to vectorize them using ordered reduction intrinsics (llvm.vector.reduce.fadd). Previously, the SLP vectorizer could only vectorize ordered reductions by keeping the original scalar chain and emitting extractelement instructions. The new path replaces the scalar chain with a vector ordered reduction intrinsic (where profitable), which allows the backend to lower it more efficiently. Reviewers: hiraditya, RKSimon, bababuck Pull Request: https://github.com/llvm/llvm-project/pull/189451 commit 5ac91ca11e897f1e06b1cd1439ade7e191d03001 Author: Jack Styles <[email protected]> Date: Tue May 19 11:51:13 2026 +0100 [Flang][OpenMP][NFC] Track Objects for BlockArgs (#197442) When lowering a BlockArg in OpenMP, currently the symbol is tracked. This can however cause issues later on down the line as information may be lost relating to an expression. For example, an ArrayElement will be represented by its symbol, in this case the full array. This is not ideal as its just he ArrayElement that is intended to be represented. Now, the object is tracked instead of the Symbol. For cases where the symbol is required, appropriate API is available to retrieve this information. This change opens the ability to better handle lowering of expressions such as Array Elements. Assisted-by: Codex commit 182ae96a82fc355eca89d12c179d6a0961c653d8 Author: Nikolas Klauser <[email protected]> Date: Tue May 19 12:41:42 2026 +0200 [libc++] Port The OpenBSD localization to the new locale API (#194317) commit de3ee84346d6dcf77ac20fe5c8acc95594886cbc Author: forking-google-bazel-bot[bot] <265904573+forking-google-bazel-bot[bot]@users.noreply.github.com> Date: Tue May 19 12:14:52 2026 +0200 [Bazel] Fixes 6f92180 (#198467) This fixes 6f9218051ab9e04a8547f7029ca1a9804b5c526d. Co-authored-by: Google Bazel Bot <[email protected]> commit aecb57616f5853d3c4a405eff8a9b1c14ed05d23 Author: Ebuka Ezike <[email protected]> Date: Tue May 19 11:12:54 2026 +0100 [lldb] Fix no compile unit crash. (#195853) This crash happens in lldb-dap when hovering inspecting over instruction addresses in a frame that does not have debug information. commit 647cb063d5d3d91547f35809b7990a7769227984 Author: Hristo Hristov <[email protected]> Date: Tue May 19 13:07:55 2026 +0300 [libc++][ranges] `ranges::iota_view` update tests with `__int128` (#175447) https://github.com/llvm/llvm-project/pull/167869 made `iota_view` `__int128` aware but tests needed updating. --------- Co-authored-by: Hristo Hristov <[email protected]> commit 4c10240569025ab1fe086f220d59d36abfcbe906 Author: Nishant Sachdeva <[email protected]> Date: Tue May 19 15:35:27 2026 +0530 [llvm-ir2vec][NFC] Adding disclaimer to Bindings requirements.txt to check compatibility with ml-compiler-opt (#198171) Follow up PR for https://github.com/llvm/llvm-zorg/pull/846#issuecomment-4467263196 commit 7c45228c97288c2665788e698feb58a6a4d9ff47 Author: Charles Zablit <[email protected]> Date: Tue May 19 12:02:15 2026 +0200 [lldb] Remove FileAction::Clear (#198350) commit a256cf74819b911a8876e5cfba58ec8aade6dca7 Author: Chaitanya <[email protected]> Date: Tue May 19 15:25:59 2026 +0530 [CIR] Implement function target/tune attrs and FMV metadata. (#195813) Port OGCG's GetCPUAndFeaturesAttributes into CIRGenModule, replacing the opFuncMultiVersioning placeholder. Handles TargetAttr / TargetVersionAttr /CPUSpecificAttr / TargetClonesAttr, AMDGPU delta-feature encoding, and AArch64 fmv-features metadata. commit 8ec15f5ac95ddc6bc794e43c714b2704081b8efe Author: Ivan Kosarev <[email protected]> Date: Tue May 19 10:49:39 2026 +0100 [TableGen] Fix getting weights of register classes (#198328) The first member can be an aritifical register, so we have to find a non-artificial one to query its weight. commit 0eceac10de71050086961c6bde5a6976727b30e8 Author: Jeff Bailey <[email protected]> Date: Tue May 19 09:48:12 2026 +0000 [libc] Add regex_macros dependency to regex header (#198453) Added the regex_macros dependency to the regex header target. regex-macros.h was not being installed when regex entrypoints were enabled. Assisted-by: Automated tooling, human reviewed. commit 304d077d43240d43408dc5595a00374629e97886 Author: Jacob Crawley <[email protected]> Date: Tue May 19 10:47:58 2026 +0100 [AArch64] Add missing FSub case to isLegalToVectorizeReduction (#198302) Adds missing RecurKind::Fsub case to lower to partial reduction. commit 64f70354e262e288e3787566645919230c05196d Author: Raphael Isemann <[email protected]> Date: Tue May 19 10:24:40 2026 +0100 [lldb] Fix wrong buffer size when fetching Objective-C classes (#197389) LLDB calls objc_getRealizedClassList_trylock to fetch the list of realized Objective-C classes. Jim spotted that we currently pass the buffer length in *bytes*, when actually this API takes the buffer length in number of elements. This causes that the Objective-C runtime write more memory that we allocated for it. This can cause that the function calling expression crashes and leaves the Objective-C runtime mutex locked. commit e7887d5470e080c55fd69de83786fbf060d3955a Author: Sushant Gokhale <[email protected]> Date: Tue May 19 14:38:42 2026 +0530 [AArch64][SVE] Use truncating stores whenever possible (#196029) For fixed length SVE and fixed length vectors x/y, fold ``` store(concat_vector(truncate(x), truncate(y))) --> store(truncate(x)) store(truncate(y)) ``` commit 60e95e5d2685d6097aad31af6f9e77a9bb7c79ac Author: Michael Kruse <[email protected]> Date: Tue May 19 10:55:47 2026 +0200 [Flang][Driver] Add per-target search path for modules (#196558) Adds the version- and target-specific path ../lib/clang/<version>/finclude/flang/<target> to the intrinsic module search path in addition to ../finclude/flang with the former taking precedence if a module file should exist in both. The version/target-specific path is added by the driver by passing `-fintrinsic-modules-path` to the `-fc1` invocation. This is consistent with gfortran and the usual pattern that the driver resolves paths into the resource path, not the frontend. This PR adds nothing into that directory, which will be done in #171515. Extracted out of #171515 as requested by https://github.com/llvm/llvm-project/pull/171515#pullrequestreview-4179212219 --------- Co-authored-by: Tarun Prabhu <[email protected]> commit f3da7b92f7f87f2fb9c4174fe75fae70da132157 Author: David Green <[email protected]> Date: Tue May 19 09:35:40 2026 +0100 [MIPS][GlobalISel] Remove dependency on legal ruleset (#197379) This fills in always legal rules, to remove the dependency on the legacy rul…

ronlieb · 2026-05-20T10:35:11Z

seeing same issues downstream merges from this PR

mikaelholmen · 2026-05-20T10:39:59Z

I see some MLIR lit test flakiness with this patch.
E.g.

build-all/bin/mlir-translate -mlir-to-llvmir -split-input-file /repo/llvm/mlir/test/Target/LLVMIR/openmp-llvm.mlir

now randomly hits

mlir-translate: /repo/llvm/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:926: void llvm::OpenMPIRBuilder::finalize(Function *): Assertion `Extractor->isEligible() && "Expected OpenMP outlining to be possible!"' failed.

I have no idea if this patch is really to blame or if users of SmallPtrSet do things they shouldn't, but either way, with this patch things randomly start to fail.

I've seen the assert above and CHECK failures for at least

mlir/test/Target/LLVMIR/openmp-cli-fuse01.mlir
mlir/test/Target/LLVMIR/openmp-cli-fuse02.mlir
mlir/test/Target/LLVMIR/openmp-cli-tile02.mlir
mlir/test/Target/LLVMIR/openmp-llvm.mlir
mlir/test/Target/LLVMIR/openmp-taskloop-collapse.mlir

I suppose #198690 tries to do something about it.

slinder1 · 2026-05-20T14:17:48Z

I know we don't consider downstream users when making breaking changes outside of the IRs and some of the C API, but this change to the semantics of remove_if seems like it should at least get a call out in some release note or something.

Since this broke in-tree code I think we should revert now, confirm any now-broken uses are updated (seems like there is just the single use in the mlir omp code?), and then re-land. That the bug is non-deterministic means it is likely to eat a lot of developer time. I will revert it in ~1 hour from this comment unless I hear otherwise from someone.

aengelke · 2026-05-20T14:27:53Z

I'm +/-0 on reverting. Given that the single problematic use in-tree already has a fix with #198690, we might just as well fix forward? If that needs more discussion, we should obviously revert. +1 to release note, remove_if is used rarely, but there's no assert to catch this, so clearly documenting this change is a good idea.

@slinder1

…8690) openmp-cli-fuse02.mlir can intermittently leave behind a dead block after loop fusion, causing LLVM IR verification to fail. (#197637 (comment)) This happened because `removeUnusedBlocksFromParent` queried `BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which made the result depend on the set's internal mutation order during removal. This patch tracks candidate blocks by index with `SmallBitVector` instead of mutating and querying the same `SmallPtrSet`. This also preserves the original `BBs` order when collecting the final dead blocks. --- The original PR was created with assistance from Copilot. The later analysis and fix were revised based on reviewer feedback. Co-authored-by: @slinder1

@slinder1

…emoval (#198690) openmp-cli-fuse02.mlir can intermittently leave behind a dead block after loop fusion, causing LLVM IR verification to fail. (llvm/llvm-project#197637 (comment)) This happened because `removeUnusedBlocksFromParent` queried `BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which made the result depend on the set's internal mutation order during removal. This patch tracks candidate blocks by index with `SmallBitVector` instead of mutating and querying the same `SmallPtrSet`. This also preserves the original `BBs` order when collecting the final dead blocks. --- The original PR was created with assistance from Copilot. The later analysis and fix were revised based on reviewer feedback. Co-authored-by: @slinder1

@slinder1

…emoval (#198690) openmp-cli-fuse02.mlir can intermittently leave behind a dead block after loop fusion, causing LLVM IR verification to fail. (llvm/llvm-project#197637 (comment)) This happened because `removeUnusedBlocksFromParent` queried `BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which made the result depend on the set's internal mutation order during removal. This patch tracks candidate blocks by index with `SmallBitVector` instead of mutating and querying the same `SmallPtrSet`. This also preserves the original `BBs` order when collecting the final dead blocks. --- The original PR was created with assistance from Copilot. The later analysis and fix were revised based on reviewer feedback. Co-authored-by: @slinder1

@slinder1

…emoval (#198690) openmp-cli-fuse02.mlir can intermittently leave behind a dead block after loop fusion, causing LLVM IR verification to fail. (llvm/llvm-project#197637 (comment)) This happened because `removeUnusedBlocksFromParent` queried `BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which made the result depend on the set's internal mutation order during removal. This patch tracks candidate blocks by index with `SmallBitVector` instead of mutating and querying the same `SmallPtrSet`. This also preserves the original `BBs` order when collecting the final dead blocks. --- The original PR was created with assistance from Copilot. The later analysis and fix were revised based on reviewer feedback. Co-authored-by: @slinder1

This reverts commit 7b227a2.

@Meinersbur

This is essentially post-commit review for #198690 which was landed quickly to fix nondeterminism in tests introduced in #197637 Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c Co-authored-by: @Meinersbur Co-authored-by: @chichunchen

SmallPtrSet uses quadratic probing with tombstone deletion in large mode. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R. `erase` opens a hole at the removed slot, walks forward sliding each following entry whose probe path crosses the hole back into it (the hole moves with each slide), and stops at the next empty slot. The scan stops at the next empty bucket, which is guaranteed to exist. `remove_if` clears matches in a single pass then calls `Grow` at the current size to restore the linear-probe invariant, O(N) total. (Per-match Algorithm R erase would be O(N * cluster).) My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss Table family (not good at small keys) are actually worse than the baseline. Mirrors the small-mode change in 42c3edb, which dropped tombstones from the inline array and likewise had `erase` invalidate iterators. `erase` now physically relocates following entries, so it invalidates iterators and references in large mode too. Linear probing is vulnerable to primary clustering. Therefore, we lower the max load factor to 2/3, slightly faster than the current 3/4.

DenseMap uses quadratic probing with lazy deletion. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. While this removes tombstone keys, `erase` now relocates following live entries and therefore invalidates their iterators and references. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). `ValueHandleBase::RemoveFromUseList` uses this to refresh each moved handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.

DenseMap uses quadratic probing with lazy deletion. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. While this removes tombstone keys, `erase` now relocates following live entries and therefore invalidates their iterators and references. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). `ValueHandleBase::RemoveFromUseList` uses this to refresh each moved handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`. Aided by Claude Opus 4.7

DenseMap uses quadratic probing with lazy deletion. Tombstones occupy a third bucket state and hurts lookup. Switch to linear probing with deletion implemented using Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. While this removes tombstone keys, `erase` now relocates following live entries and therefore invalidates their iterators and references. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). `ValueHandleBase::RemoveFromUseList` uses this to refresh each moved handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`. Linear probing is vulnerable to primary clustering, so a stronger mixer for `DenseMapInfo<T*>::getHashValue` is needed (llvm#197390). Lots of non-core cleanups are handled by Claude Opus 4.7

DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every lookup must skip past. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the gap, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from llvm#197390. Non-core cleanups aided by Claude Opus 4.7.

DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the tombstone state entirely. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map are evaluated - slower with larger code footprint. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from llvm#197390. Non-core cleanups aided by Claude Opus 4.7.

DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from llvm#197390. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map are evaluated - they are slower but with larger code footprint. Non-core cleanups aided by Claude Opus 4.7.

DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from llvm#197390. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map were evaluated; they are slower and have a larger code footprint. I believe the current in-band sentinel value approach, despite the pain (llvm#146595), is the best, or at least very difficult to beat. Non-core cleanups aided by Claude Opus 4.7.

…9615) DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change #197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from #197390. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map were evaluated; they are slower and have a larger code footprint. I believe the current in-band sentinel value approach, despite the pain (#146595), is the best, or at least very difficult to beat. Non-core cleanups aided by Claude Opus 4.7.

…0595) DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change #197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from #197390. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map were evaluated; they are slower and have a larger code footprint. I believe the current in-band sentinel value approach, despite the pain (#146595), is the best, or at least very difficult to beat. --- This is a pure reland of #199615 (reverted in #200421) after fixing the PoisoningVH bug by #200540. Non-core cleanups aided by Claude Opus 4.7.

… for CTT) Squash of the landed SmallPtrSet/DenseMap optimization series plus the IR/Analysis getTombstoneKey/getEmptyKey cleanups, on top of the pre-SmallPtrSet baseline (parent of llvm#197637), so the compile-time tracker measures the whole series' cumulative impact as one from->to. The ADT DenseMapInfo getEmptyKey/getTombstoneKey definitions are intentionally kept: the primitive/generic specializations have direct callers tree-wide (e.g. CodeView TypeIndex.h, TableGen), so their removal needs the whole-tree caller cleanup that is out of scope here. Includes: llvm#197637 [SmallPtrSet] Drop tombstones in large mode llvm#198982 Don't assume non-erased DenseMap entries remain valid after erase llvm#199369 [DenseMap] Invalidate iterators on erase llvm#200540 [IR] Fix PoisoningVH relocation of a poisoned handle llvm#200595 [DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R llvm#200958 [IR][Analysis] Remove unused DenseMapInfo::getTombstoneKey llvm#201281 [DenseMap] Store occupancy in a packed used-bit array llvm#201742 [DenseMap] Fix ubsan error after llvm#201281 llvm#201997 [IR][Analysis] Remove unused DenseMapInfo::getEmptyKey Not for merge; measurement aid only.

MaskRay requested review from kuhar and nikic May 14, 2026 08:32

llvmorg-github-actions Bot added llvm:support llvm:adt labels May 14, 2026

nikic reviewed May 14, 2026

View reviewed changes

MaxGraey reviewed May 14, 2026

View reviewed changes

MaskRay requested review from aengelke and nikic May 15, 2026 08:08

aengelke reviewed May 18, 2026

View reviewed changes

MaskRay added 2 commits May 18, 2026 22:20

Rebase onto main

05b11b7

update comment

d83bb12

aengelke approved these changes May 19, 2026

View reviewed changes

MaskRay merged commit 7b227a2 into llvm:main May 19, 2026
12 of 14 checks passed

slinder1 mentioned this pull request May 20, 2026

[OpenMP][OMPIRBuilder] Avoid querying SmallPtrSet during removal #198690

Merged

boomanaiden154 mentioned this pull request May 20, 2026

[Flang] [OpenMP] atomic compare #184761

Merged

SunilKuravinakop mentioned this pull request May 20, 2026

[flang][OpenMP] Make the atomic-compare test run on aarch64 too #198849

Merged

suryajasper pushed a commit to suryajasper/rocm-llvm-project that referenced this pull request May 20, 2026

Revert "[SmallPtrSet] Drop tombstones in large mode (llvm#197637)"

c4bf0dd

This reverts commit 7b227a2.

slinder1 mentioned this pull request May 20, 2026

[1/1]: [OpenMP][OMPIRBuilder] Refactor removeUnusedBlocksFromParent #198938

Merged

MaskRay mentioned this pull request May 26, 2026

[DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R #199615

Merged

MaskRay mentioned this pull request May 30, 2026

[DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R #200595

Merged

Conversation

MaskRay commented May 14, 2026

Uh oh!

llvmorg-github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaskRay commented May 14, 2026

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaskRay May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaskRay May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaskRay commented May 14, 2026

Uh oh!

aengelke left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aengelke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ronlieb commented May 20, 2026

Uh oh!

mikaelholmen commented May 20, 2026

Uh oh!

slinder1 commented May 20, 2026

Uh oh!

aengelke commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

llvmorg-github-actions Bot commented May 14, 2026 •

edited

Loading

MaskRay May 15, 2026 •

edited

Loading

MaskRay May 15, 2026 •

edited

Loading