Skip to content

[DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R#200595

Merged
MaskRay merged 1 commit into
llvm:mainfrom
MaskRay:users/MaskRay/densemap-backshift-reland-1
May 30, 2026
Merged

[DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R#200595
MaskRay merged 1 commit into
llvm:mainfrom
MaskRay:users/MaskRay/densemap-backshift-reland-1

Conversation

@MaskRay
Copy link
Copy Markdown
Member

@MaskRay MaskRay commented May 30, 2026

DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change #197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from #197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (#146595), is the best, or at least very difficult to
beat.


This is a pure reland of #199615 (reverted in #200421) after fixing
the PoisoningVH bug by #200540.
Non-core cleanups aided by Claude Opus 4.7.

DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes
the tombstone state entirely.

In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.

Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from llvm#197390.

Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (llvm#146595), is the best, or at least very difficult to
beat.

---

This is a pure reland of llvm#199615 (reverted in llvm#200421) after fixing
the PoisoningVH bug by llvm#200540.
Non-core cleanups aided by Claude Opus 4.7.
@MaskRay
Copy link
Copy Markdown
Member Author

MaskRay commented May 31, 2026

@boomanaiden154 Mind running your internal TAP Global Presubmit with #200540 and #200595 applied? Just want to make sure we're in the clear before landing the stacked DenseMap cleanups :)

@MaskRay MaskRay deleted the users/MaskRay/densemap-backshift-reland-1 branch May 31, 2026 21:32
@boomanaiden154
Copy link
Copy Markdown
Contributor

Mind running your internal TAP Global Presubmit with #200540 and #200595 applied? Just want to make sure we're in the clear before landing the stacked DenseMap cleanups

Now that everything is merged, I think I'll just wait for things to go through the integrate/release process. I don't think I'll be able to start a TGP any more quickly. I'm working on some tooling that would automate the process, but haven't gotten it finished yet.

MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
#200634)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
…00956)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit that referenced this pull request Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
…vm#200956)

llvm#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Jun 7, 2026
… for CTT)

Squash of the landed SmallPtrSet/DenseMap optimization series plus the
IR/Analysis getTombstoneKey/getEmptyKey cleanups, on top of the pre-SmallPtrSet
baseline (parent of llvm#197637), so the compile-time tracker measures the whole
series' cumulative impact as one from->to.

The ADT DenseMapInfo getEmptyKey/getTombstoneKey definitions are intentionally
kept: the primitive/generic specializations have direct callers tree-wide
(e.g. CodeView TypeIndex.h, TableGen), so their removal needs the whole-tree
caller cleanup that is out of scope here.

Includes:
  llvm#197637 [SmallPtrSet] Drop tombstones in large mode
  llvm#198982 Don't assume non-erased DenseMap entries remain valid after erase
  llvm#199369 [DenseMap] Invalidate iterators on erase
  llvm#200540 [IR] Fix PoisoningVH relocation of a poisoned handle
  llvm#200595 [DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R
  llvm#200958 [IR][Analysis] Remove unused DenseMapInfo::getTombstoneKey
  llvm#201281 [DenseMap] Store occupancy in a packed used-bit array
  llvm#201742 [DenseMap] Fix ubsan error after llvm#201281
  llvm#201997 [IR][Analysis] Remove unused DenseMapInfo::getEmptyKey

Not for merge; measurement aid only.
MaskRay added a commit that referenced this pull request Jun 8, 2026
…201755)

DenseMap no longer use in-band sentinel keys. (#200595 and #201281).
Update the GDB pretty printer and LLDB data formatters to test the used
bit rather than comparing keys.

GDB: advancePastEmptyBuckets relied on DenseMapInfo::getEmptyKey(),
which could not be evaluated in GDB and so was disabled, leaving the printer
to emit empty and erased buckets. It now walks bucket indices and skips any
whose used bit is clear.

LLDB: DenseMapSynthetic used a key-uniqueness heuristic to guess which
buckets were live, which mishandled a lone erased bucket (hence the
former tombstones=1 summary note). It now reads the used array directly,
so erased entries are skipped exactly. NumTombstones no longer exists,
so drop it from the summary.

Written by Claude Opus 4.8

---------

Co-authored-by: Dave Lee <[email protected]>
MaskRay added a commit that referenced this pull request Jun 8, 2026
…4 Algorithm R (#202231)

sanitizer_dense_map.h is a fork of llvm/ADT/DenseMap.h, which uses
quadratic probing with lazy deletion: an erased entry becomes a
tombstone, a third bucket state alongside empty and live that every
find/insert must inspect.

Port the upstream #200595 and getTombstoneKey() removal.

Aided by Claude Opus 4.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:adt llvm:ir skip-precommit-approval PR for CI feedback, not intended for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants