Skip to content

[SmallPtrSet] Drop tombstones in large mode#197637

Merged
MaskRay merged 3 commits into
llvm:mainfrom
MaskRay:pr/smallptrset-6.4R
May 19, 2026
Merged

[SmallPtrSet] Drop tombstones in large mode#197637
MaskRay merged 3 commits into
llvm:mainfrom
MaskRay:pr/smallptrset-6.4R

Conversation

@MaskRay
Copy link
Copy Markdown
Member

@MaskRay MaskRay commented May 14, 2026

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R. erase opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot. The scan stops at the next empty bucket, which is guaranteed to
exist.

remove_if clears matches in a single pass then calls Grow at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had erase invalidate iterators.
erase now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R.  `erase` opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot.  The scan stops at the next empty bucket, which is guaranteed to
exist.

`remove_if` clears matches in a single pass then calls `Grow` at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had `erase` invalidate iterators.
`erase` now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.
@llvmorg-github-actions
Copy link
Copy Markdown

llvmorg-github-actions Bot commented May 14, 2026

@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-llvm-support

Author: Fangrui Song (MaskRay)

Changes

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R. erase opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot. The scan stops at the next empty bucket, which is guaranteed to
exist.

remove_if clears matches in a single pass then calls Grow at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had erase invalidate iterators.
erase now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.


Full diff: https://github.com/llvm/llvm-project/pull/197637.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ADT/SmallPtrSet.h (+33-29)
  • (modified) llvm/lib/Support/SmallPtrSet.cpp (+29-45)
  • (modified) llvm/unittests/ADT/SmallPtrSetTest.cpp (+3-2)
diff --git a/llvm/include/llvm/ADT/SmallPtrSet.h b/llvm/include/llvm/ADT/SmallPtrSet.h
index 8e7c8b30293b2..73abcf8ff58c5 100644
--- a/llvm/include/llvm/ADT/SmallPtrSet.h
+++ b/llvm/include/llvm/ADT/SmallPtrSet.h
@@ -47,12 +47,14 @@ namespace llvm {
 /// sets are often small.  In this case, no memory allocation is used, and only
 /// light-weight and cache-efficient scanning is used.
 ///
-/// Large sets use a classic quadratically-probed hash table.  Empty buckets are
-/// represented with an illegal pointer value (-1) to allow null pointers to be
-/// inserted.  Tombstones are represented with another illegal pointer value
-/// (-2), to allow deletion.  The hash table is resized when the table is 3/4 or
-/// more.  When this happens, the table is doubled in size.
-///
+/// Large sets use a linear-probed hash table with deletion implemented using
+/// Knuth TAOCP 6.4 Algorithm R: `erase` opens a hole, walks forward sliding
+/// each following entry whose probe path crosses the hole back into it (the
+/// hole moves with each slide), and stops at the next empty slot.  Empty
+/// buckets are represented with an illegal pointer value (-1) to allow null
+/// pointers to be inserted; no tombstone state is needed.  The hash table is
+/// resized when the table is 2/3 or more.  When this happens, the table is
+/// doubled in size.
 class SmallPtrSetImplBase : public DebugEpochBase {
   friend class SmallPtrSetIteratorImpl;
 
@@ -66,8 +68,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   /// If small, all these elements are at the beginning of CurArray and the rest
   /// is uninitialized.
   unsigned NumEntries;
-  /// Number of tombstones in CurArray.
-  unsigned NumTombstones;
   /// Whether the set is in small representation.
   bool IsSmall;
 
@@ -80,7 +80,7 @@ class SmallPtrSetImplBase : public DebugEpochBase {
 
   explicit SmallPtrSetImplBase(const void **SmallStorage, unsigned SmallSize)
       : CurArray(SmallStorage), CurArraySize(SmallSize), NumEntries(0),
-        NumTombstones(0), IsSmall(true) {
+        IsSmall(true) {
     assert(llvm::has_single_bit(SmallSize) &&
            "Initial size must be a power of two!");
   }
@@ -111,7 +111,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     }
 
     NumEntries = 0;
-    NumTombstones = 0;
   }
 
   void reserve(size_type NewNumEntries) {
@@ -122,13 +121,13 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     // No need to expand if we're small and NewNumEntries will fit in the space.
     if (isSmall() && NewNumEntries <= CurArraySize)
       return;
-    // insert_imp_big will reallocate if stores is more than 75% full, on the
+    // insert_imp_big will reallocate if stores is more than 2/3 full, on the
     // /final/ insertion.
-    if (!isSmall() && ((NewNumEntries - 1) * 4) < (CurArraySize * 3))
+    if (!isSmall() && ((NewNumEntries - 1) * 3) < (CurArraySize * 2))
       return;
-    // We must Grow -- find the size where we'd be 75% full, then round up to
+    // We must Grow -- find the size where we'd be 2/3 full, then round up to
     // the next power of two.
-    size_type NewSize = NewNumEntries + (NewNumEntries / 3);
+    size_type NewSize = NewNumEntries + (NewNumEntries / 2);
     NewSize = llvm::bit_ceil(NewSize);
     // Like insert_imp_big, always allocate at least 128 elements.
     NewSize = std::max(128u, NewSize);
@@ -136,8 +135,6 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   }
 
 protected:
-  static void *getTombstoneMarker() { return reinterpret_cast<void *>(-2); }
-
   static void *getEmptyMarker() {
     // Note that -1 is chosen to make clear() efficiently implementable with
     // memset and because it's not a valid pointer value.
@@ -206,11 +203,8 @@ class SmallPtrSetImplBase : public DebugEpochBase {
     if (!Bucket)
       return false;
 
-    *const_cast<const void **>(Bucket) = getTombstoneMarker();
-    NumTombstones++;
+    eraseFromBucket(const_cast<const void **>(Bucket));
     --NumEntries;
-    // Treat this consistently from an API perspective, even if we don't
-    // actually invalidate iterators here.
     incrementEpoch();
     return true;
   }
@@ -254,10 +248,16 @@ class SmallPtrSetImplBase : public DebugEpochBase {
   const void *const *FindBucketFor(const void *Ptr) const;
   LLVM_ABI void shrink_and_clear();
 
-  /// Grow - Allocate a larger backing store for the buckets and move it over.
+protected:
+  /// Erase the entry at \p Bucket and close the resulting hole via Knuth
+  /// TAOCP 6.4 Algorithm R. Caller must update \c NumEntries and the epoch.
+  LLVM_ABI void eraseFromBucket(const void **Bucket);
+
+  /// Allocate a larger backing store for the buckets and move it over.
+  /// Passing the current size triggers a same-size rehash, used by batch
+  /// erase to compact away empty slots left by mark-then-rebuild.
   LLVM_ABI void Grow(unsigned NewSize);
 
-protected:
   /// swap - Swaps the elements of two sets.
   /// Note: This method assumes that both sets have the same small size.
   LLVM_ABI void swap(const void **SmallStorage, const void **RHSSmallStorage,
@@ -313,9 +313,7 @@ class LLVM_DEBUGEPOCHBASE_HANDLEBASE_EMPTYBASE SmallPtrSetIteratorImpl
   /// valid.
   void AdvanceIfNotValid() {
     assert(Bucket <= End);
-    while (Bucket != End &&
-           (*Bucket == SmallPtrSetImplBase::getEmptyMarker() ||
-            *Bucket == SmallPtrSetImplBase::getTombstoneMarker()))
+    while (Bucket != End && *Bucket == SmallPtrSetImplBase::getEmptyMarker())
       ++Bucket;
   }
 
@@ -436,18 +434,24 @@ template <typename PtrType> class SmallPtrSetImpl : public SmallPtrSetImplBase {
       return Removed;
     }
 
+    // Mark-then-rebuild: one pass to clear matches without sliding (which
+    // would re-walk the cluster on every erase), then a single rehash to
+    // restore the linear-probe invariant.  O(N) total, vs O(N * cluster)
+    // for repeated per-match Algorithm R erases.
     for (const void *&Bucket : buckets()) {
-      if (Bucket == getTombstoneMarker() || Bucket == getEmptyMarker())
+      if (Bucket == getEmptyMarker())
         continue;
       PtrType Ptr = PtrTraits::getFromVoidPointer(const_cast<void *>(Bucket));
       if (P(Ptr)) {
-        Bucket = getTombstoneMarker();
-        ++NumTombstones;
+        Bucket = getEmptyMarker();
         --NumEntries;
-        incrementEpoch();
         Removed = true;
       }
     }
+    if (Removed) {
+      incrementEpoch();
+      Grow(CurArraySize);
+    }
     return Removed;
   }
 
diff --git a/llvm/lib/Support/SmallPtrSet.cpp b/llvm/lib/Support/SmallPtrSet.cpp
index e377dbf4a6999..d8e66b9c89c74 100644
--- a/llvm/lib/Support/SmallPtrSet.cpp
+++ b/llvm/lib/Support/SmallPtrSet.cpp
@@ -29,7 +29,7 @@ void SmallPtrSetImplBase::shrink_and_clear() {
   // Reduce the number of buckets.
   unsigned Size = size();
   CurArraySize = Size > 16 ? 1 << (Log2_32_Ceil(Size) + 1) : 32;
-  NumEntries = NumTombstones = 0;
+  NumEntries = 0;
 
   // Install the new array.  Clear all the buckets to empty.
   CurArray = (const void**)safe_malloc(sizeof(void*) * CurArraySize);
@@ -39,14 +39,9 @@ void SmallPtrSetImplBase::shrink_and_clear() {
 
 std::pair<const void *const *, bool>
 SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
-  if (LLVM_UNLIKELY(size() * 4 >= CurArraySize * 3)) {
-    // If more than 3/4 of the array is full, grow.
+  if (LLVM_UNLIKELY(size() * 3 >= CurArraySize * 2)) {
+    // If more than 2/3 of the array is full, grow.
     Grow(CurArraySize < 64 ? 128 : CurArraySize * 2);
-  } else if (LLVM_UNLIKELY(CurArraySize - NumEntries - NumTombstones <
-                           CurArraySize / 8)) {
-    // If fewer of 1/8 of the array is empty (meaning that many are filled with
-    // tombstones), rehash.
-    Grow(CurArraySize);
   }
 
   // Okay, we know we have space.  Find a hash bucket.
@@ -54,9 +49,7 @@ SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
   if (*Bucket == Ptr)
     return {Bucket, false}; // Already inserted, good.
 
-  // Otherwise, insert it!
-  if (*Bucket == getTombstoneMarker())
-    --NumTombstones;
+  // Otherwise, insert it.
   ++NumEntries;
   *Bucket = Ptr;
   incrementEpoch();
@@ -64,48 +57,46 @@ SmallPtrSetImplBase::insert_imp_big(const void *Ptr) {
 }
 
 const void *const *SmallPtrSetImplBase::doFind(const void *Ptr) const {
-  unsigned BucketNo =
-      DenseMapInfo<void *>::getHashValue(Ptr) & (CurArraySize - 1);
-  unsigned ProbeAmt = 1;
+  unsigned Mask = CurArraySize - 1;
+  unsigned BucketNo = DenseMapInfo<void *>::getHashValue(Ptr) & Mask;
   while (true) {
     const void *const *Bucket = CurArray + BucketNo;
     if (LLVM_LIKELY(*Bucket == Ptr))
       return Bucket;
     if (LLVM_LIKELY(*Bucket == getEmptyMarker()))
       return nullptr;
-
-    // Otherwise, it's a hash collision or a tombstone, continue quadratic
-    // probing.
-    BucketNo += ProbeAmt++;
-    BucketNo &= CurArraySize - 1;
+    BucketNo = (BucketNo + 1) & Mask;
   }
 }
 
 const void *const *SmallPtrSetImplBase::FindBucketFor(const void *Ptr) const {
-  unsigned Bucket = DenseMapInfo<void *>::getHashValue(Ptr) & (CurArraySize-1);
-  unsigned ArraySize = CurArraySize;
-  unsigned ProbeAmt = 1;
+  unsigned Mask = CurArraySize - 1;
+  unsigned Bucket = DenseMapInfo<void *>::getHashValue(Ptr) & Mask;
   const void *const *Array = CurArray;
-  const void *const *Tombstone = nullptr;
   while (true) {
-    // If we found an empty bucket, the pointer doesn't exist in the set.
-    // Return a tombstone if we've seen one so far, or the empty bucket if
-    // not.
     if (LLVM_LIKELY(Array[Bucket] == getEmptyMarker()))
-      return Tombstone ? Tombstone : Array+Bucket;
-
-    // Found Ptr's bucket?
+      return Array + Bucket;
     if (LLVM_LIKELY(Array[Bucket] == Ptr))
-      return Array+Bucket;
-
-    // If this is a tombstone, remember it.  If Ptr ends up not in the set, we
-    // prefer to return it than something that would require more probing.
-    if (Array[Bucket] == getTombstoneMarker() && !Tombstone)
-      Tombstone = Array+Bucket;  // Remember the first tombstone found.
+      return Array + Bucket;
+    Bucket = (Bucket + 1) & Mask;
+  }
+}
 
-    // It's a hash collision or a tombstone. Reprobe.
-    Bucket = (Bucket + ProbeAmt++) & (ArraySize-1);
+void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) {
+  // Knuth TAOCP 6.4 Algorithm R: walk forward sliding each following entry
+  // whose probe path crosses the hole.
+  unsigned Mask = CurArraySize - 1;
+  unsigned I = Bucket - CurArray;
+  unsigned J = I;
+  const void *Empty = getEmptyMarker();
+  while ((J = (J + 1) & Mask), CurArray[J] != Empty) {
+    auto Ideal = DenseMapInfo<void *>::getHashValue(CurArray[J]);
+    if (((I - Ideal) & Mask) < ((J - Ideal) & Mask)) {
+      CurArray[I] = CurArray[J];
+      I = J;
+    }
   }
+  CurArray[I] = Empty;
 }
 
 /// Grow - Allocate a larger backing store for the buckets and move it over.
@@ -125,13 +116,12 @@ void SmallPtrSetImplBase::Grow(unsigned NewSize) {
   // Copy over all valid entries.
   for (const void *&Bucket : OldBuckets) {
     // Copy over the element if it is valid.
-    if (Bucket != getTombstoneMarker() && Bucket != getEmptyMarker())
+    if (Bucket != getEmptyMarker())
       *const_cast<void **>(FindBucketFor(Bucket)) = const_cast<void *>(Bucket);
   }
 
   if (!WasSmall)
     free(OldBuckets.begin());
-  NumTombstones = 0;
   IsSmall = false;
 }
 
@@ -194,7 +184,6 @@ void SmallPtrSetImplBase::copyHelper(const SmallPtrSetImplBase &RHS) {
   llvm::copy(RHS.buckets(), CurArray);
 
   NumEntries = RHS.NumEntries;
-  NumTombstones = RHS.NumTombstones;
 }
 
 void SmallPtrSetImplBase::moveFrom(const void **SmallStorage,
@@ -224,13 +213,11 @@ void SmallPtrSetImplBase::moveHelper(const void **SmallStorage,
   // Copy the rest of the trivial members.
   CurArraySize = RHS.CurArraySize;
   NumEntries = RHS.NumEntries;
-  NumTombstones = RHS.NumTombstones;
   IsSmall = RHS.IsSmall;
 
   // Make the RHS small and empty.
   RHS.CurArraySize = SmallSize;
   RHS.NumEntries = 0;
-  RHS.NumTombstones = 0;
   RHS.IsSmall = true;
 }
 
@@ -244,7 +231,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
     std::swap(this->CurArray, RHS.CurArray);
     std::swap(this->CurArraySize, RHS.CurArraySize);
     std::swap(this->NumEntries, RHS.NumEntries);
-    std::swap(this->NumTombstones, RHS.NumTombstones);
     return;
   }
 
@@ -263,7 +249,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
     }
     assert(this->CurArraySize == RHS.CurArraySize);
     std::swap(this->NumEntries, RHS.NumEntries);
-    std::swap(this->NumTombstones, RHS.NumTombstones);
     return;
   }
 
@@ -276,7 +261,6 @@ void SmallPtrSetImplBase::swap(const void **SmallStorage,
   llvm::copy(SmallSide.small_buckets(), LargeSideInlineStorage);
   std::swap(LargeSide.CurArraySize, SmallSide.CurArraySize);
   std::swap(LargeSide.NumEntries, SmallSide.NumEntries);
-  std::swap(LargeSide.NumTombstones, SmallSide.NumTombstones);
   SmallSide.CurArray = LargeSide.CurArray;
   SmallSide.IsSmall = false;
   LargeSide.CurArray = LargeSideInlineStorage;
diff --git a/llvm/unittests/ADT/SmallPtrSetTest.cpp b/llvm/unittests/ADT/SmallPtrSetTest.cpp
index fe7a8279d06b1..b42da537510de 100644
--- a/llvm/unittests/ADT/SmallPtrSetTest.cpp
+++ b/llvm/unittests/ADT/SmallPtrSetTest.cpp
@@ -476,7 +476,8 @@ TEST(SmallPtrSetTest, Reserve) {
   EXPECT_EQ(Set.size(), 6u);
   EXPECT_THAT(Set, UnorderedElementsAre(&Vals[0], &Vals[1], &Vals[2], &Vals[3], &Vals[4], &Vals[5]));
 
-  // Reserving 192 should result in 256 buckets.
+  // Reserving 192 should result in 512 buckets: 192 * 3 / 2 = 288, rounded
+  // up to the next power of two.
   Set.reserve(192);
-  EXPECT_EQ(Set.capacity(), 256u);
+  EXPECT_EQ(Set.capacity(), 512u);
 }

@MaskRay
Copy link
Copy Markdown
Member Author

MaskRay commented May 14, 2026

Copy link
Copy Markdown
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks reasonable to me.

I think it would be good to do some local wall-time measurements to confirm this is indeed better (or at least neutral). For these kinds of data structure changes, instructions:u probably does not paint the full picture.

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Is this in terms of instructions or wall-time? Especially for Swiss Table I'd expect that these may not be well-correlated.

unsigned I = Bucket - CurArray;
unsigned J = I;
const void *Empty = getEmptyMarker();
while ((J = (J + 1) & Mask), CurArray[J] != Empty) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid this assignment in the while condition? You can use for (; CurArray[J] != Empty; J = (J + 1) & Mask) plus initialize J = (I + 1) & Mask?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code deduplicates (J = (I + 1) & Mask) and (J = (J + 1) & Mask)

}
if (Removed) {
incrementEpoch();
Grow(CurArraySize);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if remove_if() removes anything, we'll now always force a new allocation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i think this is small cost to pay since this O(n) operation is used very rarely

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention in doc comment?

Comment on lines +85 to 100
void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) {
// Knuth TAOCP 6.4 Algorithm R: walk forward sliding each following entry
// whose probe path crosses the hole.
unsigned Mask = CurArraySize - 1;
unsigned I = Bucket - CurArray;
unsigned J = I;
const void *Empty = getEmptyMarker();
while ((J = (J + 1) & Mask), CurArray[J] != Empty) {
auto Ideal = DenseMapInfo<void *>::getHashValue(CurArray[J]);
if (((I - Ideal) & Mask) < ((J - Ideal) & Mask)) {
CurArray[I] = CurArray[J];
I = J;
}
}
CurArray[I] = Empty;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we can drop one mask wrap:

((I - Ideal) & Mask) < ((J - Ideal) & Mask)

can be represented as:

dist(Ideal -> hole) < dist(Ideal -> current) 

Or in another representaion: Entry if Ideal <=probe Hole <probe Scan,
where <=probe is Ideal, Ideal + 1, Ideal + 2, ... modulo N
and the same with Gap:

dist(Ideal -> Scan) = (dist(Ideal -> Hole) + Gap) % N

But since we know Gap > 0 the condition becomes:

Back < (Back + Gap) % N

or similified equivalent:

Back + Gap < N

So we can avoid the second & Mask:

Back <= Mask - Gap

and writ this something like:

void SmallPtrSetImplBase::eraseFromBucket(const void **Bucket) {
  const unsigned Mask = CurArraySize - 1;
  const void *Empty = getEmptyMarker();
  unsigned Hole = Bucket - CurArray;

  for (unsigned Scan = (Hole + 1) & Mask, Gap = 1; ; Scan = (Scan + 1) & Mask, ++Gap) {
    const void *Entry = CurArray[Scan];
    if (Entry == Empty)
      break;

    const unsigned Back = 
      (Hole - DenseMapInfo<void *>::getHashValue(Entry)) & Mask;

    if (Back <= Mask - Gap) {
      CurArray[Hole] = Entry;
      Hole = Scan;
      Gap = 0;
    }
  }

  CurArray[Hole] = Empty;
}

Although I might be wrong, you should test this in practice

Copy link
Copy Markdown
Member Author

@MaskRay MaskRay May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggesting the alternative. It's correct but there is no difference
https://llvm-compile-time-tracker.com/compare.php?from=6b1a6d3581202cfbce8c3bb030dbf06f1f2423fb&to=cfb3002ff020c198ea2033abc800d4195bc6fe25&stat=instructions:u

Measured locally

  ▎ |           |     C instr | C wall (ms) |
  ▎ |-----------|-------------|-------------|
  ▎ | original  | 138,911,223 |       48.30 |
  ▎ | suggested | 139,018,238 |       48.20 |
  ▎
  ▎ +0.08 % instructions / ~-0.2 % wall — within noise. I'll keep the original form for readability (it's a direct expression of the Knuth
  ▎ Algorithm R invariant) unless you'd like me to take the suggestion anyway.
  ▎ Compared the generated code (llvm-objdump -d, clang++ -O2, the eraseFromFilledBucket inlined into the C workload):
  ▎
  ▎ Original: 2× sub + 2× and + cmp, both (I-H)&Mask and (J-H)&Mask chained through the entry's hash.
  ▎ Suggested: 2× sub + 1× and + cmp, plus 1× inc and 1× xor for the Gap counter. The Mask - Gap operand is on an independent dependency chain (no hash dependency), shortening the critical path to the compare.
  ▎
  ▎ Same total op count per iteration; one fewer AND, but an extra INC. Measurements bear that out: +0.08 % instructions:u / -0.21 % wall on the erase-heavy workload — within noise. I think keeping the original form is preferable for readability; happy to take the change if you feel
  ▎ strongly about the dep-chain win.

When the hash function is not inline, Gap becomes a new live value across the call.

Sticking with my current implementation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Krut's algorithms are really hard to beat

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly off-topic regarding wall-time on https://llvm-compile-time-tracker.com. Have you considered using hyperfine (https://github.com/sharkdp/hyperfine)? It would significantly reduce noise and provide more reliable statistics. It also has many useful options, such as configuring how long the warmup stage should run and how it should be performed.

Copy link
Copy Markdown
Member Author

@MaskRay MaskRay May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local workbench uses this to pin a single core and disable address space randomization to get more stable results.

numactl -C 20 setarch x86_64 -R perf stat -x, -e instructions:u,cycles,L1-dcache-load-misses,task-clock

Have you considered using hyperfine (sharkdp/hyperfine)?

This is a question for nikic, but I think it's not doable since running all of stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage1-aarch64-O3 stage1-aarch64-O0-g stage2-O3 stage2-O0-g stage2-clang already eats up way too many resources on a machine...


BTW, TAOCP uses a decreasing iterator and makes no mention of I = J after sliding the current entry to the hole.

https://github.com/rigtorp/HashMap/blob/master/include/rigtorp/HashMap.h#L259 also employs TAOCP 6.4 Algorithm R. Its benchmark shows it beats absl::flat_hash_map in its small key benchmarks.


For DenseMap/SmallPtrSet, the code size from the template instantiations is critical. I believe swiss table family algorithms lose to simple linear probing...

@MaskRay
Copy link
Copy Markdown
Member Author

MaskRay commented May 14, 2026

This generally looks reasonable to me.

I think it would be good to do some local wall-time measurements to confirm this is indeed better (or at least neutral). For these kinds of data structure changes, instructions:u probably does not paint the full picture.

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Is this in terms of instructions or wall-time? Especially for Swiss Table I'd expect that these may not be well-correlated.

Both instructions:u and wall-time move in the same direction here. I extracted DenseMap / DenseMapInfo into a standalone workbench and simulated two workloads against it; one is modeled on clang Decl * lookups, which perf identifies as one of clang's heaviest DenseMap consumers. Both workloads use small pair<key, value> buckets. The relative ordering matches what shows up on https://llvm-compile-time-tracker.com/.

  ┌──────────┬──────────────────────────┬──────────────────────────┬──────────────────────────┐
  │ workload │   tombstone (original)   │ linear-probe + backshift │        Boost FOA         │
  ├──────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┤
  │ A        │ 1,295M instr / 257.06 ms │ 1,283M instr / 256.06 ms │ 1,345M instr / 257.16 ms │
  ├──────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┤
  │ C        │  147.8M instr / 51.36 ms │  138.9M instr / 50.15 ms │  170.9M instr / 50.34 ms │
  └──────────┴──────────────────────────┴──────────────────────────┴──────────────────────────┘

@MaskRay MaskRay requested review from aengelke and nikic May 15, 2026 08:08
Copy link
Copy Markdown
Contributor

@aengelke aengelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

if (P(Ptr)) {
Bucket = getTombstoneMarker();
++NumTombstones;
Bucket = getEmptyMarker();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says: "It is safe to read the set inside the predicate function." -- But this seems no longer to be the case? I guess that means finding and validating all in-tree uses of remove_if..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment.

I find 3 references with my language server. The uses do not require changes.

llvm/include/llvm/IR/Analysis.h:227:18:

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:6576:21:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be a few more, but all seem fine:

  • InterleavedAccessInfo::invalidateGroupsRequiringScalarEpilogue
  • AArch64MachineFunctionInfo
  • SPIRVEmitIntrinsics::postprocessTypes
  • set_intersect

}
if (Removed) {
incrementEpoch();
Grow(CurArraySize);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention in doc comment?

Copy link
Copy Markdown
Contributor

@aengelke aengelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MaskRay MaskRay merged commit 7b227a2 into llvm:main May 19, 2026
12 of 14 checks passed
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R.  `erase` opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot.  The scan stops at the next empty bucket, which is guaranteed to
exist.

`remove_if` clears matches in a single pass then calls `Grow` at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had `erase` invalidate iterators.
`erase` now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.
pedroMVicente added a commit to pedroMVicente/llvm-project that referenced this pull request May 19, 2026
commit 70f8c7b51ab600261c8f45e29d6d683a72f196e3
Author: Paul Trojahn <[email protected]>
Date:   Tue May 19 20:08:31 2026 +0200

    [AMDGPU] Disable dpp src1 sgpr on gfx11 (#164241)

    https://github.com/llvm/llvm-project/pull/67461 enabled SGPRs as src1 by
    default for all dpp opcodes with manual checks for targets where this is
    not supported. In that case, isOperandLegal checked if the second
    operand is legal as src0.
    https://github.com/llvm/llvm-project/pull/155595 disabled this check by
    removing the calls to isOperandLegal, which resulted in SGPRs being used
    as operands for src1 on gfx11. This PR reenables this check and fixes
    the lit test.

    ---------

    Co-authored-by: Paul Trojahn <[email protected]>

commit b16e3a0db7759ec31de981692e10c19d930c9dac
Author: Luca Barbato <[email protected]>
Date:   Tue May 19 20:01:58 2026 +0200

    [libc] Remove broken __builtin_aarch64_wsr fallback in set_thread_ptr (#197295)

    The fallback used __builtin_aarch64_wsr (32-bit) instead of
    __builtin_aarch64_wsr64, truncating the 64-bit thread pointer value and
    causing non-deterministic runtime crashes.

    Modern GCC correctly warns about it and -Werror=conversion catches it.

    ```
    /var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp: In function ‘bool __llvm_libc_22_1_5_::set_thread_ptr(uintptr_t)’:
    /var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp:90:38: error: conversion from ‘uintptr_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
       90 |   __builtin_aarch64_wsr("tpidr_el0", val);
          |                                      ^~~
    cc1plus: all warnings being treated as errors
    ```

commit e13d9c2630569e024a0763694806716354b54e1f
Author: Erich Keane <[email protected]>
Date:   Tue May 19 10:59:00 2026 -0700

    [CIR] Implement atomic cmp exhange with non-const 'weak' lowering (#198546)

    This was left as an NYI, but appears in self build!

    This patch follows the existing solution in that we are doing the
    branching of weak vs not-weak at the CIR level. This is necessary
    because the LLVM intrinsics (and the CIR operaions) take 'weak' as a
    constant value.

    Unlike classic-codegen, this patch uses an 'if' instead of a 'switch' on
    the 'weak' value. This is mainly for readability (since it is a switch
        over a bool!), but also because our 'switch' doesn't seem to support
    'bool', so this would require an additional cast.

    As a future direction, we may wish to modify the CIR operations to take
    'weak' and 'failure' value (both are constants in LLVM intrinsics!) as
    non-constants, and handle the switch/if statement during lowering. This
    would give us an opportunity to optimize the value out without having to
    collapse the if/switch/etc, and minimize the size of the CIR. However,
    as that is a larger direction, this patch skips that for now.

commit bb95a8dd341e6bac6e6d94ac8906d223bc7a215d
Author: Erich Keane <[email protected]>
Date:   Tue May 19 10:58:56 2026 -0700

    [CIR] Fix assumption that 'curFn' is always a function in direct-call (#197766)

    The code to do some checking with a builtin function tried to tell
    whether it is being called inside of a function of the same name. This
    isn't necessarily true (that it is in a function), since we generate
    'global' ops as a curFn too. This patch just removes the assumption and
    changes the condition to only happen when we're in a function.

commit 58a43dcbfe4577773c09fce0de9f0036cd99c49f
Author: Arseniy Obolenskiy <[email protected]>
Date:   Tue May 19 19:57:52 2026 +0200

    [mlir][SPIR-V] Support literal struct type in spirv.Constant (#198414)

commit ace44dc5a2a48e5d024bc5c015f69a67cd9be909
Author: Arseniy Obolenskiy <[email protected]>
Date:   Tue May 19 19:56:04 2026 +0200

    [AMDGPU] Gate `S_LSHL[1-4]_ADD_U32` patterns on uniform results (#198508)

    Like the other SOP2 patterns in this file, these scalar instructions
    require the result to be uniform. Wrap them in `UniformBinFrag` so
    divergent shl/add chains use `V_LSHL_ADD_U32`

commit 987b0e3db2f4bd9d3622679e3ba376836a90270b
Author: Krzysztof Drewniak <[email protected]>
Date:   Tue May 19 12:44:41 2026 -0500

    [mlir][AMDGPU] Extend amdgpu.transpose_load for gfx1250 (#198354)

    This commit adds support for gfx1250's ds_load_tr* instructions to
    `amdgpu.transpose_load` since they're pretty close to the gfx950 ones.

    ---------

    Co-authored-by: Codex <[email protected]>

commit ed1441988943c6ad00b1452b54f632019f1c0f47
Author: Jacques Pienaar <[email protected]>
Date:   Tue May 19 19:44:25 2026 +0200

    [mlirbc] Add missing encoding for float types (#191962)

    Enabling but making it easy to disable to enable reader side first updates.

commit d46cca082b2e4974985118fb7a419f9ba309ab60
Author: Jonas Devlieghere <[email protected]>
Date:   Tue May 19 10:43:49 2026 -0700

    [dsymutil] Add missing --linker {classic,parallel} in tests (#198568)

    As I'm preparing to toggle the default, I found another set of tests
    that don't explicitly pass the linker to dsymutil.

commit 43b66dfce623888a8f88415a2a97ca96d4405cd3
Author: Aiden Grossman <[email protected]>
Date:   Tue May 19 10:42:09 2026 -0700

    [IR] Explicitly note C standard library UB (#198562)

    This language is to my understanding a bit outdated (if we're in a
    freestanding environment, we should be handling things fine to my
    knowledge, or at least I'm not aware of any outstanding issues reported
    by people compiling for freestanding environments/different languages
    which are somewhat prominent at this point). The language here dates
    back to
    68f971b1d67d51272f5c141fc9e4740e27e279f4 with some minor modifications
    in 722212d1a0672ae18a23db58c4cfb7e38073abfa. Explicitly note the UB
    aspect as this came up recently when working on llubi in #190147 and I
    do not think hurts to explicitly note.

commit c5e2c25988fb1e4dc4121adc61933109fb04a4e0
Author: Tomer Shafir <[email protected]>
Date:   Tue May 19 20:37:04 2026 +0300

    [MC] Add -sched-model-reservation-station-scale-factor option (#195638)

    This patch adds a new CLI option to the MC layer called
    `-sched-model-reservation-station-scale-factor` that enables to scale
    the buffer size of all reservation stations (RS) for resources in the
    scheduling model by a positive `float` factor. It is limited to scaling
    OOO resources, and not special buffer sizes (-1,0,1), and similarly it
    is only allowed to produce OOO resources.

    This can be used for example to try find headroom in post-RA instruction
    scheduling for OOO cores - e.g. scale RS size by 2 and observe IPC
    gains, if so the code may be senetitive to the schedule and we may do
    better.

    Note: Currently, BufferSize for LSU resources defines the reservation
    station (RS), but if present also the ld/st queue size, which just
    points to the provided LSU resource. Thus, we currently scale them both
    in lockstep, until we have an independent ld/st queue model.

    The implementation adds a new method
    `MCSchedule::getResourceBufferSize()` that calculates the buffer size
    and exits early by default. It is wrapped on `TargetSchedule` for
    CodeGen, or other users call it directly. For `ResourceState` we add a
    scheduling model context to the constructor. `LSUnit` still uses
    `BufferSize` directly, as it has inline logic for load/store queues
    handling. So is `llvm/lib/Target/SystemZ/SystemZHazardRecognizer.cpp`
    which is target specific. Testing through `llvm-mca` debug output. We
    could syntesize a BB that is sensitive to the schedule and check an IPC
    improvement, but it would be more compilated.

    This only impacts llvm-mca at the moment. However, I think it should
    logically belong to the MC layer, which owns the scheduling model
    infrastructure, as the option is coupled to the model and not MCA. Thus,
    this patch adds it speculatively to the MC layer, e.g. so that we can
    easily observe how a RS-based scheduler performs for next-generation
    larger-RS target without re-coding and re-building LLVM. Following up,
    we may even move other such options to from the `llvm-mca` tool to the
    MC library like dispatch width etc.

commit 7b227a281d29861c959db0d08c02fc97b55709e0
Author: Fangrui Song <[email protected]>
Date:   Tue May 19 10:30:52 2026 -0700

    [SmallPtrSet] Drop tombstones in large mode (#197637)

    SmallPtrSet uses quadratic probing with tombstone deletion in large
    mode. Tombstones occupy a third bucket state and hurts lookup.

    Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
    Algorithm R.  `erase` opens a hole at the removed slot, walks forward
    sliding each following entry whose probe path crosses the hole back
    into it (the hole moves with each slide), and stops at the next empty
    slot.  The scan stops at the next empty bucket, which is guaranteed to
    exist.

    `remove_if` clears matches in a single pass then calls `Grow` at the
    current size to restore the linear-probe invariant, O(N) total.
    (Per-match Algorithm R erase would be O(N * cluster).)

    My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
    Table family (not good at small keys) are actually worse than the
    baseline.

    Mirrors the small-mode change in 42c3edb4819f, which dropped tombstones
    from the inline array and likewise had `erase` invalidate iterators.
    `erase` now physically relocates following entries, so it invalidates
    iterators and references in large mode too.

    Linear probing is vulnerable to primary clustering. Therefore, we lower
    the max load factor to 2/3, slightly faster than the current 3/4.

commit 5d5220c53848ef86de57946acda4bc7d8b91880c
Author: Kareem Ergawy <[email protected]>
Date:   Tue May 19 19:23:55 2026 +0200

    [CUF] Fix `do concurrent` IV type in `cuf.kernel` lowering (#198584)

    Induction variables in a `do concurrent` construct inside a cuf kernel
    directive were allocated with the `index` type instead of the
    Fortran-declared `integer` type. This caused a type conversion failure
    when the index variable was used in a context requiring a different
    integer or real type (e.g. `real(i)`).

    Fix by using `genType(*name.symbol)` to derive the allocation type from
    the symbol's Fortran declaration.

    PR stack:
    - https://github.com/llvm/llvm-project/pull/198584 ◀️
    - https://github.com/llvm/llvm-project/pull/198585

    Co-authored-by: Claude Sonnet 4.6 <[email protected]>

commit 40fa628fffb96d55cdaaa294a5d02b75bb738418
Author: Michael Jones <[email protected]>
Date:   Tue May 19 10:22:06 2026 -0700

    [libc] implement putwc (#196165)

    Add putwc function and tests. Part 9/10.

    Assisted by Gemini

commit 7fc16358d9c4753a8ecdbd58698fcf4315b90646
Author: Sang Ik Lee <[email protected]>
Date:   Tue May 19 10:19:11 2026 -0700

    [MLIR][XeGPU] Temporarily disable XeGPU peephole optimizer for CRI (#198031)

    Temporarily disabling the optimization pass until a fix is ready.
    Support for CRI was added in recent PR:
    https://github.com/llvm/llvm-project/pull/197229
    But had post merge issues.

commit f06dac7cf8efab07ce12667792210d823f2d18f3
Author: Razvan Lupusoru <[email protected]>
Date:   Tue May 19 10:18:18 2026 -0700

    [mlir][acc] Allow non-resolvable callee in ACCImplicitRoutine (#198581)

    Some dialect call operations do not necessarily require or prevent for a
    call to occur to non-defined symbol (only func.call enforces this). Thus
    instead of making the condition an assert simply skip it just like other
    cases where call target cannot be found.

    This PR also adds IR testing for ACCImplicitRoutine with the final test
    case being related to the issue this PR is solving.

commit 0b714ac473a28d45df01a507e7615f4f847bd065
Author: Sang Ik Lee <[email protected]>
Date:   Tue May 19 10:17:55 2026 -0700

    [MLIR][XeGPU] Fix initial value issue with dpas_mx e2m1 test. (#198056)

    Integration test was using incorrect initial values.

commit 0fc3823cd589d14fa66108b4759604c37ba0432c
Author: Sang Ik Lee <[email protected]>
Date:   Tue May 19 10:17:19 2026 -0700

    [MLIR][XeVM][CODEOWNERS] Add @akroviakov as additional XeVM code owner. (#197965)

commit 618df18438ca55498326d4e56c40b98f83eb671c
Author: Kareem Ergawy <[email protected]>
Date:   Tue May 19 19:16:11 2026 +0200

    [CUF] Handle renamed managed companion pointer in `CUFDeviceAddressOpConversion` (#198161)

    CUFAddConstructor creates a companion pointer global
    (`@sym.managed.ptr`) for
    each non-allocatable managed variable.
    `CompilerGeneratedNamesConversion` may
    run before `CUFOpConversionLate` and rename the global by replacing dots
    with
    'X', producing `@symXmanagedXptr`. Extend `CUFDeviceAddressOpConversion`
    to try
    both the original and the renamed suffix when looking up the companion
    pointer.

    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

    ---------

    Co-authored-by: Claude Sonnet 4.6 <[email protected]>

commit b22dbfca3e7d02f4c87f4825e23cd83ee385ffbd
Author: EuphoricThinking <[email protected]>
Date:   Tue May 19 19:15:04 2026 +0200

    [offload] update README for Tablegen (#196553)

    This patch changes the non-existent target `LLVMOffload` to the correct
    target `offload` and adds a more detailed path to
    `OffloadImplFuncDecls.inc`.

commit c520ae386458ac3763c7d16a0be105f906e319ed
Author: Michael Jones <[email protected]>
Date:   Tue May 19 10:14:23 2026 -0700

    [libc] implement getwchar (#196164)

    Add getwchar function and tests. Part 8/10.

    Assisted by Gemini

commit 8f50a28e675f71a1e9d19ba65de2b7571b566402
Author: Jeff Bailey <[email protected]>
Date:   Tue May 19 17:13:48 2026 +0000

    [libc][NFC] Fix typos in time.yaml keys for header generation (#198551)

    The YAML parser for header generation expects the plural 'standards:'
    key for function definitions. Using the singular 'standard:' caused
    these entries to be ignored by the parser.

    Assisted-by: Automated tooling, human reviewed.

commit e69073e6ed10e003aebb3c99c4739dc07807248d
Author: Akshay Deodhar <[email protected]>
Date:   Tue May 19 10:12:14 2026 -0700

    [NVPTX] Autoupgrade atom.load intrinsics to device scope (#198431)

    The default scope for the "atom" instruction in PTX is "gpu"
    (corresponds to "device" in LLVM). The default LLVM scope is "system".

    When scopes were not supported, the "system" scope was silently dropped,
    and the default device scope was desired behavior.

    This patch fixes the discrepancy introduced by
    https://github.com/llvm/llvm-project/pull/179553.

commit 22e8c55ccf808620cce66c9b1fb6f1341d15a3cc
Author: Yihan Wang <[email protected]>
Date:   Wed May 20 01:06:02 2026 +0800

    Reapply [Clang] Implement P2843R3 - Preprocessing is never undefined (#196989)

    This PR reapply https://github.com/llvm/llvm-project/pull/192073, and
    make most `-pedantic` warnings default on.

    ---
    This PR marks [P2843R3 - Preprocessing is never
    undefined](https://wg21.link/P2843) as implemented and add tests.

    Fixes https://github.com/llvm/llvm-project/issues/145658

    ---------

    Signed-off-by: yronglin <[email protected]>

commit a9ceabab84b916d4e2b980a9c9d579bd48629442
Author: forking-google-bazel-bot[bot] <265904573+forking-google-bazel-bot[bot]@users.noreply.github.com>
Date:   Tue May 19 12:05:13 2026 -0500

    [Bazel] Fixes b9c7a8e (#198579)

    This fixes b9c7a8e43ac26b0df422cd9a403382f0656785a0.

    Co-authored-by: Google Bazel Bot <[email protected]>

commit b39ffdb9bff7ea0a1319b724b620b0aba8b83642
Author: Michael Jones <[email protected]>
Date:   Tue May 19 10:04:14 2026 -0700

    [libc] implement fgetws (#196161)

    Add fgetws function and tests. Part 5/10.

    Assisted by Gemini

commit 5c74de974170451733011a4b46c2e8395419e4fb
Author: NeKon69 <[email protected]>
Date:   Tue May 19 20:02:36 2026 +0300

    [LifetimeSafety] Warn when lifetimebound attribute is present on the definition but not on declaration (#197753)

    Warn when the `[[clang::lifetimebound]]` attribute is present on a
    definition but not on the corresponding declaration, with a note
    pointing to where the attribute appears.

    Closes #183016

commit ac3ccd277f8be425ffa0dd56ae4223683130e719
Author: Hassnaa Hamdi <[email protected]>
Date:   Tue May 19 17:55:24 2026 +0100

    [LV] Recognize UF*vscale as a valid canonical-IV increment step (#196722)

    This patch teaches `findCanonicalIVIncrement` to recognize the
    pre-materialization step pattern (`mul vscale, UF`) as a valid
    canonical-IV step.

commit 21121533d7969837f95c687f55e0a25e463fbee7
Author: Slava Zakharin <[email protected]>
Date:   Tue May 19 09:55:10 2026 -0700

    [flang][build] Fixed missing library dependency. (#198575)

    I started getting this error in shared-libs build after #197442:
    ```
    ld.lld: error: undefined symbol: Fortran::evaluate::Expr<Fortran::evaluate::SomeType>::~Expr()
    >>> referenced by optional:257 (.../include/c++/9.3.0/optional:257)
    >>>               tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/DoConcurrentConversion.cpp.o:(std::_Optional_payload_base<Fortran::evaluate::Expr<Fortran::evaluate::SomeType>>::_M_destroy())
    ```

    This patch adds the missing dependency.

commit 51c961b7eb5c55b6770fce7399d522ea112370ce
Author: Michael Jones <[email protected]>
Date:   Tue May 19 09:54:20 2026 -0700

    [libc] implement fputws (#196160)

    Add fputws function and tests. Part 4/10.

    Assisted by Gemini

commit 4fb265d3629c8243a073ab598e2dbf8250741d2c
Author: Oskar Wirga <[email protected]>
Date:   Tue May 19 12:54:16 2026 -0400

    [AArch64] Fix return address auth in swiftasync epilogues (#189484)

    This is part of work being done in #188378 and #188638, split out from

    When a tail call has a non-zero FPDiff the epilogue adjusts SP before
    auth so when `AUTI[AB]SP` (and `autiasppc` in the PAuthLR path)
    authenticates LR using the current SP as the modifier the return address
    signed at function entry with the *entry* SP means they no longer match,
    causing `EXC_ARM_PAC_FAIL`.

    Fix by computing the entry SP into X16 and using explicit `AUTI[AB] x30,
    x16` when FPDiff != 0.

    This PR was mostly developed with LLM assistance, but human tested on
    arm64e hardware.

    ---------

    Co-authored-by: Jon Roelofs <[email protected]>

commit 24163114a38cb53a1369583c44c1890d00f480d5
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 18:52:55 2026 +0200

    [lldb][windows] add bin to PATH in API tests (#198550)

commit 24df8a6e30b96efa8cde0c88b0ba97b958af9f83
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 18:50:49 2026 +0200

    [lldb][windows] Add --lldb-use-lldb-server to lit (#197660)

    Currently, the Windows builds of lit only allow `SystemDrive` when
    copying env vars into the test environment. `LLDB_USE_LLDB_SERVER=1` in
    the developer shell never reached the `lldb.exe` subprocess under
    `check-lldb`. Baseline and lldb-server runs therefore looked identical
    even when the user expected lldb-server to be used.

    `--lldb-use-lldb-server` in lit invocations sets
    `LLDB_USE_LLDB_SERVER=1` to test lldb on Windows with `lldb-server`
    instead of the in process plugin.

    Depends on:
    - https://github.com/llvm/llvm-zorg/pull/848

    rdar://177442264

commit 196c1693b4954604fc11627d8e3aafacc5afe78a
Author: Michael Jones <[email protected]>
Date:   Tue May 19 09:43:34 2026 -0700

    [libc] implement fputwc (#196158)

    Add fputwc function and tests. Part 2/10.

    Assisted by Gemini

commit 39e208448d952baf50e87a3d5e51456dcb6e00ba
Author: Ryan Buchner <[email protected]>
Date:   Tue May 19 09:40:27 2026 -0700

    [SLP] Precommit tests for incorrect base pointer failure for strided store vectorization (#198564)

    Issue first noted
    https://github.com/llvm/llvm-project/pull/198408#issuecomment-4485336867

    The incorrect base pointer is used for strided loads in some reordering
    scenarios.

commit 2474ac009bd8683a7bac11f514a6c2bd73c58d6c
Author: Jonas Devlieghere <[email protected]>
Date:   Tue May 19 09:37:13 2026 -0700

    [lldb] Make CommandObject::GetTarget filter out the dummy target (#198429)

    Follow-up to #197805. Make CommandObject::GetTarget the canonical target
    accessor for command code, and tighten its semantics so that DoExecute
    methods can't accidentally operate on the dummy target.

    GetTarget now returns Target* instead of Target&. The result is the
    target from the command's frozen execution context, falling back to the
    interpreter's execution context. The dummy target is filtered out and
    replaced with nullptr unless the command opts in via one of the
    eCommandRequires{Target,Process,Thread,Frame} flags (in which case
    CheckRequirements has already guaranteed a real target) or via the new
    eCommandAllowsDummyTarget flag.

    This is the first half of the cleanup discussed at the end of #197805. A
    follow-up will audit DoExecute methods that still reach for
    GetSelectedTarget or m_exe_ctx.GetTargetPtr() directly and migrate them
    to GetTarget.

commit faad403cb68b57be327302c669f24833bc70ba9f
Author: kwyatt-ext <[email protected]>
Date:   Tue May 19 11:36:57 2026 -0500

    [flang][Semantics] Enforce IMPLICIT NONE(EXTERNAL) for dummy procedures (#198398)

    Fix `CheckImplicitNoneExternal()` to correctly diagnose dummy arguments
    used as procedures that lack an explicit `EXTERNAL` attribute when
    `IMPLICIT NONE(EXTERNAL)` is in effect.

    Fixes #198395

    Flang silently accepted code where a dummy argument was called as a
    procedure under `IMPLICIT NONE(EXTERNAL)` without the required explicit
    `EXTERNAL` attribute. The Fortran 2018 standard C895 requires that each
    dummy procedure used as a procedure shall explicitly have the `EXTERNAL`
    attribute when `IMPLICIT NONE(EXTERNAL)` is specified.

    **`flang/lib/Semantics/resolve-names.cpp`:**

    1. In `NoteExecutablePartCall()`: Guard the `SetImplicitAttr(*symbol,
    Attr::EXTERNAL)` call with a check that EXTERNAL is not already
    explicitly set. This prevents clobbering the explicit/implicit
    distinction when a dummy argument already has an explicit `EXTERNAL`
    attribute.

    2. In `CheckImplicitNoneExternal()`: Change the EXTERNAL attribute test
    from `!symbol.attrs().test(Attr::EXTERNAL)` to
    `(!symbol.attrs().test(Attr::EXTERNAL) ||
    symbol.implicitAttrs().test(Attr::EXTERNAL))`. This catches both the
    case where EXTERNAL has not yet been set (non-dummy externals) and the
    case where it was set only implicitly (dummy arguments used as
    procedures).

    **`flang/test/Semantics/implicit07.f90`:**

    Added test cases for various uses of `IMPLICIT NONE(EXTERNAL)`.

    Assisted by Claude Opus.

commit b22016bd20c42a3811bf8185d92e5802852d4ca7
Author: Nick Sarnie <[email protected]>
Date:   Tue May 19 12:36:32 2026 -0400

    [offload][LIT] Remove XFAIL: intelgpu from 5 virtual function tests (#198559)

    Passing on the buildbot now, probably
    [this](https://github.com/llvm/llvm-project/pull/197556) change fixed
    them.

    Signed-off-by: Nick Sarnie <[email protected]>

commit 9ab16a1fa9c3f5a6bce8fc4723710ae2906a9ca9
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 18:35:16 2026 +0200

    [lldb] split NativeFile in platform specific implementations (#196293)

commit b9c7a8e43ac26b0df422cd9a403382f0656785a0
Author: Nishant Sachdeva <[email protected]>
Date:   Tue May 19 22:03:26 2026 +0530

    [llvm-ir2vec] Creating one directory per library for IRUtils, and MIRUtils (#198170)

    Follow up to
    https://github.com/llvm/llvm-project/pull/194414#issuecomment-4467455664
    .
    Made a patch to move IRUtils, and MIRUtils to their own subdirectories

commit c9213db3ea3bb8be9eb079f20ab3aaff11b73048
Author: Stefan Weigl-Bosker <[email protected]>
Date:   Tue May 19 12:29:26 2026 -0400

    [InstCombine] Fold select of neg/not into sext-sub (#198225)

    Adds the following folds:
    - `select C, (sub 0, Y), (xor Y, -1) -> sub (sext !C), Y`
    - `select C, (xor Y, -1), (sub 0, Y) -> sub (sext C), Y`

    The original motivation is the high half of a two-word negation (See

    ```ll
    define i64 @f(i64 %lo, i64 %hi) {
    entry:
      %not.hi = xor i64 %hi, -1
      %lo.is.zero = icmp eq i64 %lo, 0
      %neg.hi = sub i64 0, %hi
      %r = select i1 %lo.is.zero, i64 %neg.hi, i64 %not.hi
      ret i64 %r
    }
    ```

    Which can be folded to:

    ```ll
    define i64 @f(i64 %lo, i64 %hi) {
    entry:
      %lo.not.zero = icmp ne i64 %lo, 0
      %mask = sext i1 %lo.not.zero to i64
      %r = sub i64 %mask, %hi
      ret i64 %r
    }
    ```

    Proofs: https://alive2.llvm.org/ce/z/DATqof
    Fixes: #198179

commit 7a2aa25c8c6dd77888dfa49aae4688e5daa3d82d
Author: Jordan Rupprecht <[email protected]>
Date:   Tue May 19 11:27:16 2026 -0500

    [bazel][mlir][OpenMP] Fix 91467766a8afb52439619163828c5f6816ddd550 (#198567)

    Add missing dep for Arith.h

commit 7f676cc7f0d861a9665b6235ac81d9246e8fd97c
Author: Hassnaa Hamdi <[email protected]>
Date:   Tue May 19 17:17:50 2026 +0100

    [LV] Fix analysis remarks leak (#197697)

    Currently, `LoopVectorizeHints::vectorizeAnalysisPassName()` returns
    empty pass name when the user forces vectorization, which allows
    analysis remarks to be emitted implicitly.
    The code sometimes emits analysis remarks with passing explicit pass
    name and other times it uses the `vectorizeAnalysisPassName()` which
    prevents consistent behaviour.

    This PR removes the implicit emitting so that analysis remarks
    consistently require `-pass-remarks-analysis` to print. Affected tests
    are updated to add `-pass-remarks-analysis`.

commit f5598a91d39c5a2a7d57693f8bc89c6562eb9fde
Author: asadium <[email protected]>
Date:   Tue May 19 12:16:55 2026 -0400

    [AMDGPU][GlobalISel] Add RegBankLegalize rules for G_FPTOSI_SAT/G_FPTOUI_SAT (#198208)

commit 998d8e87daf371b51822ce97e36046755668e5e6
Author: Jason Van Beusekom <[email protected]>
Date:   Tue May 19 11:13:42 2026 -0500

    [Flang][Fir] Set default alignment of array globals to 64 bytes (#194969)

    This commit implements the proposal from the RFC:
    https://discourse.llvm.org/t/rfc-alignment-of-global-arrays/90397/13

    This PR sets the alignment of all global arrays to 64 bytes (except for
    BIND(C) and common blocks). This Mirrors the execution of other fortran
    compilers (CCE, gfortran, nvfortran and ifx).

commit a5077468984ac3c47e6a3ca779c6f0ba680706c0
Author: Alex MacLean <[email protected]>
Date:   Tue May 19 09:11:39 2026 -0700

    [NVPTX] Fixup and test ExpandVariadics (#195709)

    The lowering of va_arg will load the arguments from the cursor via
    ld.local since the stack is used to store va_args. Update the
    ExpandVariadics NVPTX info struct to be consistent with this. Also
    initialize the pass for easier debugging.

commit 12470b3109bfa6e5fb3afda6bb22001add4d788b
Author: Artem Kroviakov <[email protected]>
Date:   Tue May 19 17:57:23 2026 +0200

    [MLIR][XeGPU] Improve deinterleave/interleave/dpas_mx ops handling (#197223)

commit 868bf3fbb96a4daae438c83484881bfcb0e68c2c
Author: Jordan Rupprecht <[email protected]>
Date:   Tue May 19 10:51:33 2026 -0500

    [bazel][libc] Fix 8076d17b61028e7fd5723fa84fd5615c945ae46b (#198553)

    Add new targets & deps for syscall wrappers

commit 5ff6c5e4a0f7882c2b9b52dff6710ff33c11a74c
Author: hidekisaito <[email protected]>
Date:   Tue May 19 08:48:37 2026 -0700

    [AMDGPU] SIFoldOperands: constant-fold S_ADD/S_SUB with immediate operands (#198410)

    Extend SIFoldOperands::tryConstantFoldOp to recognise three patterns
      * ADD/SUB(imm, imm) -> S_MOV_B32 (LHS +/- RHS)
      * ADD x, 0          -> COPY x   (Also `0 + x`)
      * SUB x, 0          -> COPY x   (SUB is not commutable)

    Assisted-by: Claude Opus 4.7

commit 8148e168f5f8d91fc364947749d3362c630762b7
Author: Walter <[email protected]>
Date:   Wed May 20 01:31:59 2026 +1000

    [X86] Fold splat AND on VGF2P8AFFINEQB source (#193364)

    Given that each row within `vgf2p8affineqb`'s matrix controls which
    source bits are selected, zeroing the same bit within all rows treats
    that corresponding source bit like it is zero. This means a AND of the
    input by any splatted 8-bit values can be folded with the matrix. This
    is patch:

    - Can eliminate a constant and/or reduces the instruction count from 2
    to 1.
    - Only occurs when the matrix is constant, ensuring that it can't
    increase the dependency chain.
    - Don't occur if the AND is multi use well the splat isn't constant,
    preventing additional operations.
    - Work with both constant 8-bit splats and scalars value that where
    splatted to a vector.
    - Includes test coverage for positive cases (by constants, variable
    scalars, non zero immediates) and negative (multi use, larger splats,
    variable matrices).

    Fixes #191325

commit e0f8b79090063d1e5cebcc85198821c60a9685d8
Author: Shilei Tian <[email protected]>
Date:   Tue May 19 11:31:16 2026 -0400

    [AMDGPU] Add three target features msad-insts, mqsad-pk-insts, and mqsad-insts (#198432)

commit af0b42d8a5b0dcfacf958a2267855ce25002675f
Author: Kiriti Ponduri <[email protected]>
Date:   Tue May 19 20:57:04 2026 +0530

    [libc] prefer *at syscalls in sys/stat wrappers (#197940)

    - These changes flips the #ifdef order to prefer the *at syscalls over
    normal ones.
    - In modern architectures, *at system calls are preferred over normal
    system calls cuz of safety issues.
    - So by checking for ""*at"" system calls first, we ensure better
    compatibility with modern systems.
    - After then normal syscalls moved else or elif for support to older
    ones.
      - From merged pr(#195792) and issue(#195620)

    ---------

    Signed-off-by: udaykiriti <[email protected]>
    Co-authored-by: Jeff Bailey <[email protected]>

commit e04895ff05c58f78acfbe5be02973c9bd1f0c689
Author: Harishankar.1 <[email protected]>
Date:   Tue May 19 20:56:50 2026 +0530

    [X86] Lower vector i8 ashr-by-1 using pavgb (#198487)

    For vector i8 arithmetic shift right by 1, the current lowering produces
    a 5-instruction sequence (psrlw + pand + xor + psubb plus a constant
    load) with a 4-deep dependency chain.

    This patch uses the identity

      ashr(x, 1) == avgceilu(x, -1) ^ (~x & 0x80)

    to lower to ISD::AVGCEILU + a short fixup, producing 4 instructions on
    SSE/AVX/AVX2 and 3 on AVX-512BW (after vpternlogd fusion of the AND/XOR
    pair), with two parallel dependency chains instead of one long one.

    The freeze on R is required because the target reads it twice, matching
    the pattern of the existing `shl R, 1 -> add R, R` case in
    LowerShiftByScalarImmediate.

    Alive2 proof: https://alive2.llvm.org/ce/z/LbXPhE

    Fixes #198061

commit 6fd09a539f5f397c569f81e22dcd032d5e818526
Author: Carlos Seo <[email protected]>
Date:   Tue May 19 12:22:24 2026 -0300

    [flang][OpenMP] Skip declare simd lowering for interface bodies (#197010)

    When DECLARE SIMD appears in the specification part of an interface
    body, the PFT records the directive as an evaluation of the enclosing
    program unit rather than of the interface body's subprogram. Its clause
    operands (linear/aligned/uniform) reference dummy arguments local to the
    interface body, which have no address in the enclosing scope, causing a
    crash.

    Detect the mismatch by comparing the program unit containing the
    directive with the procedure currently being lowered, and skip op
    emission when they differ.

    This handles both explicit declare simd(proc-name) and implicit forms in
    any enclosing context.

    Fixes #192581

commit 91467766a8afb52439619163828c5f6816ddd550
Author: SunilKuravinakop <[email protected]>
Date:   Tue May 19 20:52:20 2026 +0530

    [Flang] [OpenMP] atomic compare (#184761)

    Support for `omp atomic compare` in flang.
    Multiple clauses like capture with compare are not supported

    An issue for this was raised earlier at
    [181116](https://github.com/llvm/llvm-project/issues/181116)

    ---------

    Co-authored-by: Sunil Kuravinakop <[email protected]>

commit e32fc5dc5e71251f0f7d605604739602792b6e61
Author: Craig Topper <[email protected]>
Date:   Tue May 19 08:20:40 2026 -0700

    [SelectionDAG] Use getExtractSubvector. NFC (#198450)

commit 9ccaf838121687821a098975f55be1872bc288f3
Author: Benjamin Stott <[email protected]>
Date:   Tue May 19 16:13:42 2026 +0100

    [LLVM] Add a function to reset the opt bisector (#197723)

    For daemonized testing, we need to be able to reset the global opt
    bisector between test runs. This PR just adds a small function to the
    OptPassGate class to reset its state.

commit 8f6ed9f490bc8f7b88d945b9e38fd715fcee1966
Author: Ebuka Ezike <[email protected]>
Date:   Tue May 19 16:13:18 2026 +0100

    [lldb] Fix possible invalidated iterator. (#198482)

    The begin or end interator may be invalidated when a idx_pos in erased
    from the vector.

    Unblocks sanitised CI.

commit 38949dbc8b8f763cc14139703f1513abfe7f4b2f
Author: Alex Bradbury <[email protected]>
Date:   Tue May 19 16:08:31 2026 +0100

    Revert "[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exit thunks" (#198540)

    Reverts llvm/llvm-project#190933

    Reported issues with an EXPENSIVE_CHECKS_BUILD. Reverting so this can be
    fixed without undue time pressure.

commit 960ae6f812c8db7c92532898c338aeefc6f7ac6d
Author: Jean-Didier PAILLEUX <[email protected]>
Date:   Tue May 19 17:07:20 2026 +0200

    [Flang] Adding -ffree-line-length-<value> flag (#192941)

    Added support for the `-ffree-line-length-<value>` flag in Flang, which
    is equivalent to `-ffixed-line-length-<value>` but in free form.
    This flag is supported by gfortran and can be used in some applications.

    ---------

    Co-authored-by: Tarun Prabhu <[email protected]>
    Co-authored-by: Andre Kuhlenschmidt <[email protected]>

commit 4cb4bbca087458d33e946341cd79c6a2b9a605cc
Author: Sirraide <[email protected]>
Date:   Tue May 19 17:05:29 2026 +0200

    [Clang] [NFC] Use `unique_ptr<Lexer>` everywhere (#198393)

    Replace every instance of `new Lexer` with `make_unique<Lexer>` and
    adjust `Lexer::Create_PragmaLexer()` to return a `std::unique_ptr<Lexer>`
    instead.

    The Preprocessor was already storing a `unique_ptr<Lexer>`, so there’s
    no need to change how that works.

commit f6fb8f573ecbb64f41b473700589040acf0b44b9
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 17:03:41 2026 +0200

    [lldb][windows] remove path separator replacement from TestGDBRemoteClient.py (#198537)

    Since https://github.com/llvm/llvm-project/pull/197942, vRun packets use
    the native path separators. TestGDBRemoteClient.py now fails on Windows
    because it converts the path to POSIX style paths, which is a workaround
    for what https://github.com/llvm/llvm-project/pull/197942 fixed.

    rdar://177342572

commit 673b17e1b38fbe4dfe01c9aa66c368857b13dfdb
Author: Simon Pilgrim <[email protected]>
Date:   Tue May 19 16:02:41 2026 +0100

    [DAG] scalarizeExtractedBinOp - extract from non-constant one use buildvectors (#198013)

    When attempting to scalarize a vector binop that has a single extract,
    we currently only fold if either of the binop's operands is a constant
    buildvector - but we can extract from non-constant buildvectors without
    increasing instruction count as long as the vector binop was the only
    use of the buildvector.

    More yak shaving for #196493

commit bcfa53e6e778295ea51665284da583e19a8073b9
Author: Razvan Lupusoru <[email protected]>
Date:   Tue May 19 08:02:13 2026 -0700

    [flang][acc] Handle Fortran do loops as acc loops in acc routine (#198420)

    As was previously done for do loops in acc compute constructs in
    https://github.com/llvm/llvm-project/issues/149614 , this PR does the
    same for do loops in `acc routine`. The rules are follows:
    - Do loops not marked with `acc loop` are considered `auto`
    - Do concurrent loops are considered `independent`
    - Any loops in an `acc routine seq` are considered `seq`

    This ensures that the IV is correctly privatized and attached to acc
    loop.

commit 75e4aafad0b3b1cb3a9f4314da8eca299437646c
Author: Jameson Nash <[email protected]>
Date:   Tue May 19 10:51:00 2026 -0400

    Reland "[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering" (#197436)

    Replace typed struct GEPs with byte array allocation and ptradd
    operations:

    1. Track root offsets as byte offsets instead of building typed struct.
    2. Use `ComputeFrameLayout` to compute byte offsets based on DataLayout,
    properly accounting for each root's size and alignment.
    3. Allocate frame as `[FrameSize x i8]` byte array instead of typed
    struct.
    4. Replace all CreateGEP operations with CreatePtrAdd using computed
    offsets.
    5. Frame layout unchanged: `[Next ptr | Map ptr | Root 0 | Root 1 | ...
    | Root N]` where each root is placed at its computed aligned offset.
    6. Zero out padding between roots with memset for deterministic frame
    contents for GC.

    Benefits:
    - Removes dependency on `getAllocatedType` for building frame struct
    - Properly handles root alignment requirements (fixes TODO in code)

     Relands #178436, after fixing the tests that I'd added specifically to
    show what changed in that PR, but forgot to rerun. And fixed the logic
    bug that having tests uncovered (using the wrong size for the frame in
    the metadata).

    Co-authored-by: Claude Sonnet 4.5 <[email protected]>

commit 52ca170e12c6ec42aa9327e3486e4c37efec1c31
Author: Tom Stellard <[email protected]>
Date:   Tue May 19 07:50:36 2026 -0700

    [Github] Hashpin base container in CI Tooling containerfile (#197315)

    https://github.com/llvm/llvm-project/security/code-scanning/1492

commit 213b3292904995e00db309eea66431e1b727c181
Author: Adel Ejjeh <[email protected]>
Date:   Tue May 19 09:48:59 2026 -0500

    [AMDGPU][NFCI] Change MCSubtargetInfo references in AMDGPUBaseInfo.h/.cpp to be const ref instead of pointers (#197038)

    Change all `AMDGPU::IsaInfo` functions and `initDefaultAMDKernelCodeT`
    to take `const MCSubtargetInfo &` instead of `const MCSubtargetInfo *`.
    These functions never accept null, so a reference better expresses the
    contract.

    Also change `AMDGPUMCKernelCodeT::initDefault` to take a const reference
    for consistency, and convert local `MCSubtargetInfo` pointer variables
    to references in `AMDGPUMCExpr.cpp` where the pointer is always
    dereferenced.

    Requested by @arsenm in
    https://github.com/llvm/llvm-project/pull/192306#discussion_r2076113671.

    Co-authored-by: Claude Opus 4 (1M context) <[email protected]>

commit 94b1d194ac4b7c11ad871efe5e333aed36f576bf
Author: Jameson Nash <[email protected]>
Date:   Tue May 19 10:48:17 2026 -0400

    [Utils] Examine debug info type instead of alloca type to guess the debug behavior of the alloca uses (#177480)

    Replace `isArray` and `isStructure` helpers that queried alloca IR type
    with a `isCompositeType` helper that checks the debug variable's
    source-level type from debug info metadata to decide if this seems
    perhaps profitable to convert to this debug info from #debug_declare to
    a #debug_value.

    This changes behavior: the lowering decision is now based on the
    source-level type from debug info rather than the IR alloca type, which
    is more semantically correct for debug info processing. This should
    have minimal effect on clang, but may change behavior more
    significantly on front-ends like rust that have not used semantically
    meaningful alloca element types.

    Removes all uses of getAllocatedType() from Utils/Local.cpp.

    This seemed slightly more semantically correct to me, though it is
    slightly challenging to enumerate all of the possible scalar debug
    types. Another alternative perhaps is to actually examine all of the
    uses and decide if all of them will get successfully handled before the
    `DDI->eraseFromParent` deletes the debug record, since
    `ConvertDebugDeclareToDebugValue` also just deletes the debug info
    (replaces with poison) if it finds a use that doesn't have an easy way
    to replace.

    Co-authored-by: Claude Sonnet 4.5 <[email protected]>

commit b7cc80035ad6bd4b778680707283f06fe0912f21
Author: Luke Lau <[email protected]>
Date:   Tue May 19 15:46:00 2026 +0100

    [VPlan] Simplify select x, (i1 y | z), y -> y | (x && z) (#190196)

    Fixes https://github.com/llvm/llvm-project/issues/189553

    This adds a canonicalization `select x, (i1 y | z), y -> y | (x && z)`,
    [Alive2]( https://alive2.llvm.org/ce/z/qcQRn6). InstCombine already
    performs this.

    This adds a canonicalization which causes the `lhs | (headermask && rhs)
    -> vp.merge rhs, true, lhs, evl` pattern in optimizeMasksToEVL to match,
    improving the RISC-V codegen for an anyof select reduction.

commit 581dd5b183e1cc29e4b129f12e3c1b203a77e6b0
Author: Sairudra More <[email protected]>
Date:   Tue May 19 19:46:15 2026 +0530

    [flang] Inline scalar-to-array hlfir.assign at -O0 (#197092)

    At `-O0`, Flang can lower trivial scalar-to-array broadcasts such as `c
    = a(1) + 1.0` through `_FortranAAssign`. That runtime path can call
    `free()`, which is not valid in OpenMP GPU device code.

    This patch teaches `InlineHLFIRAssign` to handle trivial scalar RHS
    values. At `-O0`, the pipeline runs it in a scalar-RHS-only mode, so
    only scalar-to-array broadcasts are inlined. Array-to-array assignments
    still fall back to `_FortranAAssign` at `-O0`.

    Scalar RHS values are materialized before the generated loop with
    `loadTrivialScalar`, preserving intrinsic assignment ordering for cases
    like `a = a(1)`. At `O1+`, the full `InlineHLFIRAssign` pass still runs
    as before, now also supporting scalar RHS.

    The remaining files are test updates from scalar-to-array assignments
    now being inlined at `-O0` instead of lowering through
    `_FortranAAssign`.

    Other device-runtime call paths are left out of scope.

    Fixes #197091.

    Co-authored-by: Sairudra More <[email protected]>

commit ed81c50b077d172e2691acc7a8887d8f0a278880
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 16:09:16 2026 +0200

    [lldb][windows] Fix second-chance exception delivery on lldb-server (#197956)

    Currently, all tests that wait for the debugger to stop when the process
    crashes time out on Windows under `LLDB_USE_LLDB_SERVER=1` because of 2
    issues:

    1. The `if (!first_chance) SetState(eStateStopped, false)` before the
    switch silently advances `m_state` on every second-chance event. The
    `default:` branch later calls `SetState(eStateStopped, true)` but this
    is never reached because `state == m_state`. The client is waiting for a
    reply that is never sent.

    2. The `default:` branch's first-chance handling stops all threads and
    then returns `SendToApplication`, which tells Windows
    `DBG_EXCEPTION_NOT_HANDLED`. This hangs the process, the second-chance
    event never arrives. `ProcessWindows` is a no-op on first-chance
    non-breakpoint exceptions because of this: it just returns
    `ExceptionResult::SendToApplication` with no `StopThread/SetState`.

    This patch fixes both issues and prevents the following tests from
    timing out:
    ```
    lldb-api :: functionalities/breakpoint/breakpoint_conditions/crashing_condition/TestCrashingCondition.py
    lldb-api :: functionalities/builtin-debugtrap/TestBuiltinDebugTrap.py
    lldb-api :: functionalities/inferior-assert/TestInferiorAssert.py
    lldb-api :: functionalities/inferior-crashing/TestInferiorCrashingStep.py
    lldb-api :: functionalities/inferior-crashing/recursive-inferior/TestRecursiveInferiorStep.py
    lldb-api :: functionalities/location-list-lookup/TestLocationListLookup.py
    ```

    rdar://177425581

commit 69385549f52497076e7951f7373c192abdf04005
Author: Nico Weber <[email protected]>
Date:   Tue May 19 10:08:48 2026 -0400

    [gn] port 182ae96a82fc (#198533)

commit d06e6936a42375bf44ba58401f0f01f89ff1f750
Author: agozillon <[email protected]>
Date:   Tue May 19 16:08:00 2026 +0200

    [Flang][OpenMP] Restrict implicit default declare mapper from applying deep-copies of pointer members (#197885)

    According to the OpenMP specification, only allocatables should get
    deep-copy behaviour inside of implicit default declare mappers. This PR
    restricts this behaviour. Relevant specification exert, added as a
    comment for a reminder:

    // "If a component of a derived type list item is a map clause list item
    // that results from the predefined default mapper for that derived
    type,
    // and the component is not also an explicit list item or the array base
    // of an explicit list item on the same construct, then: if it has the
    // POINTER attribute, it is attach-INELIGIBLE. If a list item in a map
    // clause is an associated pointer that is attach-ineligible, the effect
        // of the map clause does not apply to its pointer target."

    This prevents certain programs from unexpected over-mapping via pointer
    nesting, doesn't prevent that for allocatables, but that's OpenMP
    specification mandated foot shooting, so it's free game.

commit 52871b5416688bd904d293cc2bd4c9e5ac3452b1
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 16:07:23 2026 +0200

    [lldb][PECOFF] Recognise truncated .lldb{summaries,formatters} section names (#198377)

commit 7ae1a3fad8f06625d4e55b846a090ce9206c3c82
Author: Vlad Serebrennikov <[email protected]>
Date:   Tue May 19 18:05:41 2026 +0400

    [clang] Give unnamed namespaces internal linkage (#198215)

    Recently in #194600 we exposed formal linkage in AST dump. That PR came
    with a bunch of FIXMEs. One of them is about the fact that we consider
    unnamed namespaces to have external linkage, while the Standard says
    it's internal linkage
    ([[basic.link]/4](https://eel.is/c++draft/basic.link#4.sentence-1)):

    > An unnamed namespace or a namespace declared directly or indirectly
    within an unnamed namespace has internal linkage.

    Of course, declarations within unnamed namespaces still had internal
    linkage (nothing would work otherwise).

    The intent of this patch is to give unnamed namespaces internal linkage
    and to do a bit of refactoring in
    `LinkageComputer::getLVForNamespaceScopeDecl` to use linkage of the
    enclosing namespace as the default linkage of declarations within it,
    now that all kinds of namespaces have the correct linkage. No changes to
    the behavior of programs are intended.

commit 8076d17b61028e7fd5723fa84fd5615c945ae46b
Author: Pavel Labath <[email protected]>
Date:   Tue May 19 16:00:59 2026 +0200

    [libc] Port remaining socket functions to syscall_wrappers (#198463)

    While in there:
    - fix file headers to conform to latest standards
    - add missing restrict qualifier to recvfrom

    Assisted by Gemini.

commit c8d9852b0a8d5ea3e8c4549f8574e0a5cebb5997
Author: Mariusz Sikora <[email protected]>
Date:   Tue May 19 15:58:51 2026 +0200

    [AMDGPU][NFC] Add tests for 64bit literals in single DWORD instructions for gfx13 (#197907)

    Co-authored-by: sstipano <[email protected]>

commit d6d5a3da41e24a754fdc9b3a2cb06d60f3874ce1
Author: Brian Cain <[email protected]>
Date:   Tue May 19 08:46:52 2026 -0500

    [BOLT] Gate PointerAuthCFIFixup unit test on AArch64 target availability (#197464)

    The test bodies reference AArch64:: namespace identifiers (ADDSXri, X0)
    which fail to compile when AArch64 is not in LLVM_TARGETS_TO_BUILD. Wrap
    all TEST_P bodies in #ifdef AARCH64_AVAILABLE and add
    GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST to suppress GoogleTest's
    uninstantiated suite error when no target instantiates the tests.

commit f3c6257fabff4a138a0d119041c52678586eb013
Author: Simon Pilgrim <[email protected]>
Date:   Tue May 19 14:43:56 2026 +0100

    [lanai] multiply.ll - regenerate test checks (#198521)

commit ae8d973ae253463e4b821086b720fbc813c61e96
Author: jinge90 <[email protected]>
Date:   Tue May 19 21:43:29 2026 +0800

    Fix unused parameter for add_bitcode_entrypoint_library for GPU Libc (#198458)

commit 39a1ed4222782ccc1c97bbb497aeca2718b1b793
Author: Dmitry Vasilyev <[email protected]>
Date:   Tue May 19 17:41:53 2026 +0400

    [lldb][Windows] Disable TestLinuxCore.LinuxCoreTestCase.test_object_map on Windows (#198473)

    See https://github.com/llvm/llvm-project/issues/198471 for details.

commit e2464bf702197349a1ec0691e4ab188ea9a18155
Author: Pavel Labath <[email protected]>
Date:   Tue May 19 15:33:29 2026 +0200

    [libc] Add struct sockaddr_in (#197909)

    The struct needs to be 16 bytes long for compatibility with the linux
    kernel (which rejects smaller sizes, even though the reset of the bytes
    are unused).

    The padding field (and its name) is not specified by POSIX, but it's
    traditionally called sin_zero, and there exists a fair amount of code
    that references that name, so I'm matching it as well.

    I'm testing the compatibility of this struct by binding to a localhost
    address. This test requires that the machine has a loopback interface
    with an assigned ipv4 address. If some of the environments do not have
    it, we can try to detect this in the test and skip it, but this would
    diminish the value of the test.

    As a drive-by, I'm also adding the (non-POSIX) INADDR_LOOPBACK constant.

    Assisted by Gemini.

commit 4a32cf0b6d90f6bec3b5f2ad50b11ca1dba62954
Author: Hans Wennborg <[email protected]>
Date:   Tue May 19 15:15:39 2026 +0200

    Revert "[MC/DC][Coverage] Enable profile correlation for MC/DC" (#198520)

    The instrprof-mcdc-correlation.c test doesn't pass on Mac, see
    discussion on the PR.

    Reverts llvm/llvm-project#136437

commit c80b992d20e9d71c1361e28c5e4d25fc7716ee99
Author: Raul Tambre <[email protected]>
Date:   Tue May 19 16:12:55 2026 +0300

    [cmake][runtimes] Remove unused local CMake flag transform (#198506)

    This was used by some code to pass flags per-architecture to runtime builds.
    User code can never reach this path and the last use was removed more than a year ago.

    Fixes: bd6df0fe21faeaf3fd2a8ad17074f10620c78378

commit e7df16d856c86d2ecdcdf110bf62a5af46513ad6
Author: Ryotaro Kasuga <[email protected]>
Date:   Tue May 19 21:58:26 2026 +0900

    [LoopInterchange] Add test for poison can be produced due to ninf (NFC) (#197922)

    This patch adds a test case illustrating that loop interchange can turn
    an input program that was previously well‑defined into one that exhibits
    UB, because reordering floating‑point operations in the presence of the
    ninf flag may produce a poison value. In such a case, we should drop the
    ninf from instructions involved in the transformation.
    Also add another case to show that we can safely apply loop-interchange
    without eliminating ninf, because it doesn't change the order of
    floating-point operations chain.

    Related issue: #148851

commit fa46161cbd86f59a2b4d66dcac8909c7690135c9
Author: Sam Parker <[email protected]>
Date:   Tue May 19 13:56:32 2026 +0100

    [NFC][WebAssembly] Codegen test (#198503)

commit 14cb771bb45040d217c205624aaa9cd8e505ab5a
Author: David Sherwood <[email protected]>
Date:   Tue May 19 13:54:51 2026 +0100

    [LV][NFC] Fix enums and comments in VPInstruction class (#198498)

    From an audit of the code I can see that we only ever create
    VPInstructionWithType objects for the following opcodes:

    Scalar loads - see createVPInstructionsForVPBB
    All supported CastInst variants - see createVPInstructionsForVPBB
    VPInstruction::StepVector via createNaryOp
    VPInstruction::WideIVStep via createNaryOp
    VPInstruction::VScale via createElementCount
    Explicit creation of Instruction::Trunc, Instruction::SExt and
    Instruction::ZExt via createScalarZExtOrTrunc, etc.

    Therefore, I've updated the list of enums in the VPInstruction class to
    reflect what classes they are associated with. This now correctly
    matches the code in VPInstructionWithType::classof.

commit a84d94802dad5b60947b66c2b04ec12e1efa8d7a
Author: Raphael Isemann <[email protected]>
Date:   Tue May 19 13:29:52 2026 +0100

    [lldb] Add a packet-test-delay setting for testing slow connections (#195440)

    [lldb] Add a packet-test-delay setting for testing slow connections

    Sending/receiving packets to/from a non-host devices adds latency to
    the gdb-remote communication that induces substantial slowdown into the
    debugging experience.

    This patch adds an artificial delay before sending a packet to
    simulate this latency without requiring an emulated/physical device.

    The test checks only checks that there is a delay when this setting is
    activated, but not that there is no delay when the setting is not
    activated. The reason for this is that this negative test would either
    require a large delay (which further slows down the test suite) or be
    prone to accidential failures on overloaded test bots.

    There is also the question whether we should have dynamic delays that
    depend on the number of transferred bytes. From talking to Felipe it
    seems the real bottleneck is really just the latency and not the
    transfer speed. From my own testing, it seems that a simple fixed array
    mostly simulated the remote debugging performance, and the delays I can
    see on my own machine seem to mostly resemble the delays we can see
    when debugging on a remote device.

    This patch also required wrapping `SendPacketAndWaitForResponse` so we
    can sync this setting on a per-packet basis for the
    GDBRemoteCommunication class. Without this, we could only set the delay
    before connecting and then never change afterwards which would make
    this feature less useful and writing a simple test impossible.

commit ea922daab1803d4be3a1df0bb37cce83bc48d8a3
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 14:27:54 2026 +0200

    [lldb][windows] Plumb Windows DLL load/unload through lldb-server (#197901)

commit b4aca7a59d0754eb54bbeddb53de746adfbe8b4b
Author: Donát Nagy <[email protected]>
Date:   Tue May 19 14:12:12 2026 +0200

    [NFC] Clarify behavior of ownership_holds in docs (#197933)

    I misunderstood the behavior of the attribute `ownership_holds` based on
    the claim that "using held memory is assumed to be legitimate".

    To avoid similar misunderstandings in the future, I'm adding an extra
    sentence to the documentation.

    (This question came up during the discussion of #196798)

commit 313cc46e5893e6299ee1460e8dcc4f0ff5da015a
Author: dcandler <[email protected]>
Date:   Tue May 19 13:08:19 2026 +0100

    [libc][cmake] Don't assume CPU features in cross builds (#198308)

    Assuming all CPU features are available during cross builds will be
    incorrect in some cases, such as armv8.0-a where FullFP16 is not
    available and therefore FP16 instructions will not be understood.

    Looking at the file history, the feature detection code previously
    relied on try_run instead of try_compile, which might explain why it
    couldn't be used in cross builds as cross built binaries wouldn't be
    executable. But now the tests are compile only, they can work for cross
    builds too, which would be better than assuming features.

commit 75ae202145b316883caa6074e9c94431ef152c26
Author: David Green <[email protected]>
Date:   Tue May 19 12:56:32 2026 +0100

    [AArch64][GlobalISel] Add sign bits for G_FCMEQ (#198314)

    This adds basic num-sign-bits for G_FCMEQ, G_FCMGE and G_FCMGT, which
    all produce either all-ones or all-zeros in each vector lane. This
    function apparently goes in AArch64ISelLowering.

commit 0e8f3574ad757596eeb1df446121f61c3146c961
Author: Alexey Bataev <[email protected]>
Date:   Tue May 19 07:09:14 2026 -0400

    [SLP] Prefer outer binary op when opcode groups tie in buildInstructionsState

    When VL contains instructions of different opcodes with equal counts,
    the tie-breaking in buildInstructionsState could replace an outer
    operation (e.g., fadd) with an inner one (e.g., fmul) that appears as
    its direct operand, depending on SmallMapVector iteration order. Add a
    check: if the current MainOp is a BinaryOperator with a direct operand
    matching the challenger partition's opcode in the same block, keep
    MainOp instead of switching to the inner operation.

    Partially fixes #43353

    Reviewers: bababuck, hiraditya, RKSimon

    Pull Request: https://github.com/llvm/llvm-project/pull/198194

commit 230980947083c2309f135c82f10c735b0e06461f
Author: Alexey Bataev <[email protected]>
Date:   Tue May 19 06:56:00 2026 -0400

    [SLP] Support ordered fadd reduction via reduction intrinsics

    Add matchOrderedReduction() to recognize linearized ordered fadd chains
    (both LHS- and RHS-associated) and tryToReduceOrdered() to vectorize
    them using ordered reduction intrinsics (llvm.vector.reduce.fadd).

    Previously, the SLP vectorizer could only vectorize ordered reductions
    by keeping the original scalar chain and emitting extractelement
    instructions. The new path replaces the scalar chain with a vector
    ordered reduction intrinsic (where profitable), which allows the backend to lower it
    more efficiently.

    Reviewers: hiraditya, RKSimon, bababuck

    Pull Request: https://github.com/llvm/llvm-project/pull/189451

commit 5ac91ca11e897f1e06b1cd1439ade7e191d03001
Author: Jack Styles <[email protected]>
Date:   Tue May 19 11:51:13 2026 +0100

    [Flang][OpenMP][NFC] Track Objects for BlockArgs (#197442)

    When lowering a BlockArg in OpenMP, currently the symbol is tracked.
    This can however cause issues later on down the line as information may
    be lost relating to an expression. For example, an ArrayElement will be
    represented by its symbol, in this case the full array. This is not
    ideal as its just he ArrayElement that is intended to be represented.

    Now, the object is tracked instead of the Symbol. For cases where the
    symbol is required, appropriate API is available to retrieve this
    information. This change opens the ability to better handle lowering of
    expressions such as Array Elements.

    Assisted-by: Codex

commit 182ae96a82fc355eca89d12c179d6a0961c653d8
Author: Nikolas Klauser <[email protected]>
Date:   Tue May 19 12:41:42 2026 +0200

    [libc++] Port The OpenBSD localization to the new locale API (#194317)

commit de3ee84346d6dcf77ac20fe5c8acc95594886cbc
Author: forking-google-bazel-bot[bot] <265904573+forking-google-bazel-bot[bot]@users.noreply.github.com>
Date:   Tue May 19 12:14:52 2026 +0200

    [Bazel] Fixes 6f92180 (#198467)

    This fixes 6f9218051ab9e04a8547f7029ca1a9804b5c526d.

    Co-authored-by: Google Bazel Bot <[email protected]>

commit aecb57616f5853d3c4a405eff8a9b1c14ed05d23
Author: Ebuka Ezike <[email protected]>
Date:   Tue May 19 11:12:54 2026 +0100

    [lldb] Fix no compile unit crash. (#195853)

    This crash happens in lldb-dap when hovering inspecting over instruction
    addresses in a frame that does not have debug information.

commit 647cb063d5d3d91547f35809b7990a7769227984
Author: Hristo Hristov <[email protected]>
Date:   Tue May 19 13:07:55 2026 +0300

    [libc++][ranges] `ranges::iota_view` update tests with `__int128` (#175447)

    https://github.com/llvm/llvm-project/pull/167869 made `iota_view`
    `__int128` aware but tests needed updating.

    ---------

    Co-authored-by: Hristo Hristov <[email protected]>

commit 4c10240569025ab1fe086f220d59d36abfcbe906
Author: Nishant Sachdeva <[email protected]>
Date:   Tue May 19 15:35:27 2026 +0530

    [llvm-ir2vec][NFC] Adding disclaimer to Bindings requirements.txt to check compatibility with ml-compiler-opt (#198171)

    Follow up PR for
    https://github.com/llvm/llvm-zorg/pull/846#issuecomment-4467263196

commit 7c45228c97288c2665788e698feb58a6a4d9ff47
Author: Charles Zablit <[email protected]>
Date:   Tue May 19 12:02:15 2026 +0200

    [lldb] Remove FileAction::Clear (#198350)

commit a256cf74819b911a8876e5cfba58ec8aade6dca7
Author: Chaitanya <[email protected]>
Date:   Tue May 19 15:25:59 2026 +0530

    [CIR] Implement function target/tune attrs and FMV metadata. (#195813)

    Port OGCG's GetCPUAndFeaturesAttributes into CIRGenModule, replacing the
    opFuncMultiVersioning placeholder. Handles TargetAttr /
    TargetVersionAttr /CPUSpecificAttr / TargetClonesAttr, AMDGPU
    delta-feature encoding, and AArch64 fmv-features metadata.

commit 8ec15f5ac95ddc6bc794e43c714b2704081b8efe
Author: Ivan Kosarev <[email protected]>
Date:   Tue May 19 10:49:39 2026 +0100

    [TableGen] Fix getting weights of register classes (#198328)

    The first member can be an aritifical register, so we have to find a
    non-artificial one to query its weight.

commit 0eceac10de71050086961c6bde5a6976727b30e8
Author: Jeff Bailey <[email protected]>
Date:   Tue May 19 09:48:12 2026 +0000

    [libc] Add regex_macros dependency to regex header (#198453)

    Added the regex_macros dependency to the regex header target.
    regex-macros.h was not being installed when regex entrypoints were
    enabled.

    Assisted-by: Automated tooling, human reviewed.

commit 304d077d43240d43408dc5595a00374629e97886
Author: Jacob Crawley <[email protected]>
Date:   Tue May 19 10:47:58 2026 +0100

    [AArch64] Add missing FSub case to isLegalToVectorizeReduction (#198302)

    Adds missing RecurKind::Fsub case to lower to partial reduction.

commit 64f70354e262e288e3787566645919230c05196d
Author: Raphael Isemann <[email protected]>
Date:   Tue May 19 10:24:40 2026 +0100

    [lldb] Fix wrong buffer size when fetching Objective-C classes (#197389)

    LLDB calls objc_getRealizedClassList_trylock to fetch the list of
    realized Objective-C classes.

    Jim spotted that we currently pass the buffer length in *bytes*, when
    actually this API takes the buffer length in number of elements. This
    causes that the Objective-C runtime write more memory that we allocated
    for it. This can cause that the function calling expression crashes and
    leaves the Objective-C runtime mutex locked.

commit e7887d5470e080c55fd69de83786fbf060d3955a
Author: Sushant Gokhale <[email protected]>
Date:   Tue May 19 14:38:42 2026 +0530

    [AArch64][SVE] Use truncating stores whenever possible (#196029)

    For fixed length SVE and fixed length vectors x/y, fold

    ```
    store(concat_vector(truncate(x), truncate(y)))
    -->  store(truncate(x))
         store(truncate(y))
    ```

commit 60e95e5d2685d6097aad31af6f9e77a9bb7c79ac
Author: Michael Kruse <[email protected]>
Date:   Tue May 19 10:55:47 2026 +0200

    [Flang][Driver] Add per-target search path for modules (#196558)

    Adds the version- and target-specific path

        ../lib/clang/<version>/finclude/flang/<target>

    to the intrinsic module search path in addition to

        ../finclude/flang

    with the former taking precedence if a module file should exist in both.
    The version/target-specific path is added by the driver by passing
    `-fintrinsic-modules-path` to the `-fc1` invocation. This is consistent
    with gfortran and the usual pattern that the driver resolves paths into
    the resource path, not the frontend.

    This PR adds nothing into that directory, which will be done in #171515.

    Extracted out of #171515 as requested by
    https://github.com/llvm/llvm-project/pull/171515#pullrequestreview-4179212219

    ---------

    Co-authored-by: Tarun Prabhu <[email protected]>

commit f3da7b92f7f87f2fb9c4174fe75fae70da132157
Author: David Green <[email protected]>
Date:   Tue May 19 09:35:40 2026 +0100

    [MIPS][GlobalISel] Remove dependency on legal ruleset (#197379)

    This fills in always legal rules, to remove the dependency on the legacy
    rul…
@ronlieb
Copy link
Copy Markdown
Contributor

ronlieb commented May 20, 2026

seeing same issues downstream merges from this PR

@mikaelholmen
Copy link
Copy Markdown
Contributor

I see some MLIR lit test flakiness with this patch.
E.g.

build-all/bin/mlir-translate -mlir-to-llvmir -split-input-file /repo/llvm/mlir/test/Target/LLVMIR/openmp-llvm.mlir

now randomly hits

mlir-translate: /repo/llvm/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:926: void llvm::OpenMPIRBuilder::finalize(Function *): Assertion `Extractor->isEligible() && "Expected OpenMP outlining to be possible!"' failed.

I have no idea if this patch is really to blame or if users of SmallPtrSet do things they shouldn't, but either way, with this patch things randomly start to fail.

I've seen the assert above and CHECK failures for at least

mlir/test/Target/LLVMIR/openmp-cli-fuse01.mlir
mlir/test/Target/LLVMIR/openmp-cli-fuse02.mlir
mlir/test/Target/LLVMIR/openmp-cli-tile02.mlir
mlir/test/Target/LLVMIR/openmp-llvm.mlir
mlir/test/Target/LLVMIR/openmp-taskloop-collapse.mlir

I suppose #198690 tries to do something about it.

@slinder1
Copy link
Copy Markdown
Contributor

I know we don't consider downstream users when making breaking changes outside of the IRs and some of the C API, but this change to the semantics of remove_if seems like it should at least get a call out in some release note or something.

Since this broke in-tree code I think we should revert now, confirm any now-broken uses are updated (seems like there is just the single use in the mlir omp code?), and then re-land. That the bug is non-deterministic means it is likely to eat a lot of developer time. I will revert it in ~1 hour from this comment unless I hear otherwise from someone.

@aengelke
Copy link
Copy Markdown
Contributor

I'm +/-0 on reverting. Given that the single problematic use in-tree already has a fix with #198690, we might just as well fix forward? If that needs more discussion, we should obviously revert. +1 to release note, remove_if is used rarely, but there's no assert to catch this, so clearly documenting this change is a good idea.

chichunchen added a commit that referenced this pull request May 20, 2026
…8690)

openmp-cli-fuse02.mlir can intermittently leave behind a dead block
after loop fusion, causing LLVM IR verification to fail.
(#197637 (comment))

This happened because `removeUnusedBlocksFromParent` queried
`BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which
made the result depend on the set's internal mutation order during
removal.

This patch tracks candidate blocks by index with `SmallBitVector`
instead of mutating and querying the same `SmallPtrSet`. This also
preserves the original `BBs` order when collecting the final dead
blocks.

---

The original PR was created with assistance from Copilot. The later
analysis and fix were revised based on reviewer feedback.

Co-authored-by: @slinder1
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request May 20, 2026
…emoval (#198690)

openmp-cli-fuse02.mlir can intermittently leave behind a dead block
after loop fusion, causing LLVM IR verification to fail.
(llvm/llvm-project#197637 (comment))

This happened because `removeUnusedBlocksFromParent` queried
`BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which
made the result depend on the set's internal mutation order during
removal.

This patch tracks candidate blocks by index with `SmallBitVector`
instead of mutating and querying the same `SmallPtrSet`. This also
preserves the original `BBs` order when collecting the final dead
blocks.

---

The original PR was created with assistance from Copilot. The later
analysis and fix were revised based on reviewer feedback.

Co-authored-by: @slinder1
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request May 20, 2026
…emoval (#198690)

openmp-cli-fuse02.mlir can intermittently leave behind a dead block
after loop fusion, causing LLVM IR verification to fail.
(llvm/llvm-project#197637 (comment))

This happened because `removeUnusedBlocksFromParent` queried
`BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which
made the result depend on the set's internal mutation order during
removal.

This patch tracks candidate blocks by index with `SmallBitVector`
instead of mutating and querying the same `SmallPtrSet`. This also
preserves the original `BBs` order when collecting the final dead
blocks.

---

The original PR was created with assistance from Copilot. The later
analysis and fix were revised based on reviewer feedback.

Co-authored-by: @slinder1
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request May 20, 2026
…emoval (#198690)

openmp-cli-fuse02.mlir can intermittently leave behind a dead block
after loop fusion, causing LLVM IR verification to fail.
(llvm/llvm-project#197637 (comment))

This happened because `removeUnusedBlocksFromParent` queried
`BBsToErase` while using `SmallPtrSet::remove_if` on the same set, which
made the result depend on the set's internal mutation order during
removal.

This patch tracks candidate blocks by index with `SmallBitVector`
instead of mutating and querying the same `SmallPtrSet`. This also
preserves the original `BBs` order when collecting the final dead
blocks.

---

The original PR was created with assistance from Copilot. The later
analysis and fix were revised based on reviewer feedback.

Co-authored-by: @slinder1
suryajasper pushed a commit to suryajasper/rocm-llvm-project that referenced this pull request May 20, 2026
slinder1 added a commit that referenced this pull request May 21, 2026
This is essentially post-commit review for #198690 which was landed
quickly to fix nondeterminism in tests introduced in #197637

Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c
Co-authored-by: @Meinersbur
Co-authored-by: @chichunchen
sbryngelson pushed a commit to sbryngelson/llvm-project that referenced this pull request May 21, 2026
SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R.  `erase` opens a hole at the removed slot, walks forward
sliding each following entry whose probe path crosses the hole back
into it (the hole moves with each slide), and stops at the next empty
slot.  The scan stops at the next empty bucket, which is guaranteed to
exist.

`remove_if` clears matches in a single pass then calls `Grow` at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

Mirrors the small-mode change in 42c3edb, which dropped tombstones
from the inline array and likewise had `erase` invalidate iterators.
`erase` now physically relocates following entries, so it invalidates
iterators and references in large mode too.

Linear probing is vulnerable to primary clustering. Therefore, we lower
the max load factor to 2/3, slightly faster than the current 3/4.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion. Tombstones occupy
a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637.

While this removes tombstone keys, `erase` now relocates following live
entries and therefore invalidates their iterators and references.

For callers that cache pointers into the bucket array, erase(Key,
OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted
bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
`ValueHandleBase::RemoveFromUseList` uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion. Tombstones occupy
a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637.

While this removes tombstone keys, `erase` now relocates following live
entries and therefore invalidates their iterators and references.

For callers that cache pointers into the bucket array, erase(Key,
OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted
bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
`ValueHandleBase::RemoveFromUseList` uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion. Tombstones occupy
a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637.

While this removes tombstone keys, `erase` now relocates following live
entries and therefore invalidates their iterators and references.

For callers that cache pointers into the bucket array, erase(Key,
OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted
bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
`ValueHandleBase::RemoveFromUseList` uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.

Aided by Claude Opus 4.7
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion. Tombstones occupy
a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637.

While this removes tombstone keys, `erase` now relocates following live
entries and therefore invalidates their iterators and references.

For callers that cache pointers into the bucket array, erase(Key,
OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted
bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
`ValueHandleBase::RemoveFromUseList` uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.

Aided by Claude Opus 4.7
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion. Tombstones occupy
a third bucket state and hurts lookup.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637.

While this removes tombstone keys, `erase` now relocates following live
entries and therefore invalidates their iterators and references.

For callers that cache pointers into the bucket array, erase(Key,
OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted
bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
`ValueHandleBase::RemoveFromUseList` uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in `AddToUseList`.

Linear probing is vulnerable to primary clustering, so a stronger mixer
for `DenseMapInfo<T*>::getHashValue` is needed (llvm#197390).

Lots of non-core cleanups are handled by Claude Opus 4.7
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every lookup must skip past.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the
tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
gap, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 25, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every lookup must skip past.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the
tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
gap, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 26, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every lookup must skip past.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the
tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
gap, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 26, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every lookup must skip past.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the
tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
gap, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 26, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every lookup must skip past.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the
tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
gap, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr, mirroring the post-grow fix-up in AddToUseList.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 26, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes
the tombstone state entirely.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map are evaluated - slower with larger code footprint.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 26, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map are evaluated - they are slower but with larger code
footprint.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request May 27, 2026
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (llvm#146595), is the best, or at least very difficult to
beat.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit that referenced this pull request May 28, 2026
…9615)

DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change #197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once
per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from #197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (#146595), is the best, or at least very difficult to
beat.

Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit that referenced this pull request May 30, 2026
…0595)

DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change #197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once
per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from #197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (#146595), is the best, or at least very difficult to
beat.

---

This is a pure reland of #199615 (reverted in #200421) after fixing
the PoisoningVH bug by #200540.
Non-core cleanups aided by Claude Opus 4.7.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Jun 7, 2026
… for CTT)

Squash of the landed SmallPtrSet/DenseMap optimization series plus the
IR/Analysis getTombstoneKey/getEmptyKey cleanups, on top of the pre-SmallPtrSet
baseline (parent of llvm#197637), so the compile-time tracker measures the whole
series' cumulative impact as one from->to.

The ADT DenseMapInfo getEmptyKey/getTombstoneKey definitions are intentionally
kept: the primitive/generic specializations have direct callers tree-wide
(e.g. CodeView TypeIndex.h, TableGen), so their removal needs the whole-tree
caller cleanup that is out of scope here.

Includes:
  llvm#197637 [SmallPtrSet] Drop tombstones in large mode
  llvm#198982 Don't assume non-erased DenseMap entries remain valid after erase
  llvm#199369 [DenseMap] Invalidate iterators on erase
  llvm#200540 [IR] Fix PoisoningVH relocation of a poisoned handle
  llvm#200595 [DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R
  llvm#200958 [IR][Analysis] Remove unused DenseMapInfo::getTombstoneKey
  llvm#201281 [DenseMap] Store occupancy in a packed used-bit array
  llvm#201742 [DenseMap] Fix ubsan error after llvm#201281
  llvm#201997 [IR][Analysis] Remove unused DenseMapInfo::getEmptyKey

Not for merge; measurement aid only.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants