[DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R#200595
Merged
MaskRay merged 1 commit intoMay 30, 2026
Conversation
DenseMap uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4 Algorithm R), similar to the SmallPtrSet change llvm#197637. This removes the tombstone state entirely. In exchange, erase now relocates the following live entries to close the hole, so it invalidates iterators and references other than the erased one. For callers that cache pointers into the bucket array, erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries). ValueHandleBase::RemoveFromUseList uses this to refresh each moved handle's PrevPtr. Linear probing is more vulnerable to primary clustering than quadratic probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue mixer from llvm#197390. Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp: 62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap specializations have pointer keys and 16-byte key/value pairs. Alternatives such as Robin Hood hashing, Verstable, and Boost's unordered_flat_map were evaluated; they are slower and have a larger code footprint. I believe the current in-band sentinel value approach, despite the pain (llvm#146595), is the best, or at least very difficult to beat. --- This is a pure reland of llvm#199615 (reverted in llvm#200421) after fixing the PoisoningVH bug by llvm#200540. Non-core cleanups aided by Claude Opus 4.7.
This was referenced May 31, 2026
Member
Author
|
@boomanaiden154 Mind running your internal TAP Global Presubmit with #200540 and #200595 applied? Just want to make sure we're in the clear before landing the stacked DenseMap cleanups :) |
Contributor
Now that everything is merged, I think I'll just wait for things to go through the integrate/release process. I don't think I'll be able to start a TGP any more quickly. I'm working on some tooling that would automate the process, but haven't gotten it finished yet. |
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
This was referenced Jun 1, 2026
MaskRay
added a commit
that referenced
this pull request
Jun 1, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
MaskRay
added a commit
that referenced
this pull request
Jun 2, 2026
#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
yingopq
pushed a commit
to yingopq/llvm-project
that referenced
this pull request
Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
yingopq
pushed a commit
to yingopq/llvm-project
that referenced
this pull request
Jun 5, 2026
…vm#200956) llvm#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
yingopq
pushed a commit
to yingopq/llvm-project
that referenced
this pull request
Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
yingopq
pushed a commit
to yingopq/llvm-project
that referenced
this pull request
Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
yingopq
pushed a commit
to yingopq/llvm-project
that referenced
this pull request
Jun 5, 2026
llvm#200595 changed DenseMap to no longer create tombstone buckets, so DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead definitions and dead tombstone branches.
This was referenced Jun 5, 2026
MaskRay
added a commit
to MaskRay/llvm-project
that referenced
this pull request
Jun 7, 2026
… for CTT) Squash of the landed SmallPtrSet/DenseMap optimization series plus the IR/Analysis getTombstoneKey/getEmptyKey cleanups, on top of the pre-SmallPtrSet baseline (parent of llvm#197637), so the compile-time tracker measures the whole series' cumulative impact as one from->to. The ADT DenseMapInfo getEmptyKey/getTombstoneKey definitions are intentionally kept: the primitive/generic specializations have direct callers tree-wide (e.g. CodeView TypeIndex.h, TableGen), so their removal needs the whole-tree caller cleanup that is out of scope here. Includes: llvm#197637 [SmallPtrSet] Drop tombstones in large mode llvm#198982 Don't assume non-erased DenseMap entries remain valid after erase llvm#199369 [DenseMap] Invalidate iterators on erase llvm#200540 [IR] Fix PoisoningVH relocation of a poisoned handle llvm#200595 [DenseMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R llvm#200958 [IR][Analysis] Remove unused DenseMapInfo::getTombstoneKey llvm#201281 [DenseMap] Store occupancy in a packed used-bit array llvm#201742 [DenseMap] Fix ubsan error after llvm#201281 llvm#201997 [IR][Analysis] Remove unused DenseMapInfo::getEmptyKey Not for merge; measurement aid only.
MaskRay
added a commit
that referenced
this pull request
Jun 8, 2026
…201755) DenseMap no longer use in-band sentinel keys. (#200595 and #201281). Update the GDB pretty printer and LLDB data formatters to test the used bit rather than comparing keys. GDB: advancePastEmptyBuckets relied on DenseMapInfo::getEmptyKey(), which could not be evaluated in GDB and so was disabled, leaving the printer to emit empty and erased buckets. It now walks bucket indices and skips any whose used bit is clear. LLDB: DenseMapSynthetic used a key-uniqueness heuristic to guess which buckets were live, which mishandled a lone erased bucket (hence the former tombstones=1 summary note). It now reads the used array directly, so erased entries are skipped exactly. NumTombstones no longer exists, so drop it from the summary. Written by Claude Opus 4.8 --------- Co-authored-by: Dave Lee <[email protected]>
MaskRay
added a commit
that referenced
this pull request
Jun 8, 2026
…4 Algorithm R (#202231) sanitizer_dense_map.h is a fork of llvm/ADT/DenseMap.h, which uses quadratic probing with lazy deletion: an erased entry becomes a tombstone, a third bucket state alongside empty and live that every find/insert must inspect. Port the upstream #200595 and getTombstoneKey() removal. Aided by Claude Opus 4.8
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DenseMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.
Switch to linear probing with backward-shift deletion (Knuth TAOCP 6.4
Algorithm R), similar to the SmallPtrSet change #197637. This removes
the tombstone state entirely.
In exchange, erase now relocates the following live entries to close the
hole, so it invalidates iterators and references other than the erased
one. For callers that cache pointers into the bucket array,
erase(Key, OnMoved) and erase(iterator, OnMoved) fire a callback once per
shifted bucket, so fix-ups cost O(cluster) rather than O(NumEntries).
ValueHandleBase::RemoveFromUseList uses this to refresh each moved
handle's PrevPtr.
Linear probing is more vulnerable to primary clustering than quadratic
probing, so this relies on the stronger DenseMapInfo<T*>::getHashValue
mixer from #197390.
Operation distribution when compiling CGExpr.cpp/ScalarEvolution.cpp:
62.8% lookups, 34.3% inserts, 2.9% erases. The heaviest DenseMap
specializations have pointer keys and 16-byte key/value pairs.
Alternatives such as Robin Hood hashing, Verstable, and Boost's
unordered_flat_map were evaluated; they are slower and have a larger
code footprint. I believe the current in-band sentinel value approach,
despite the pain (#146595), is the best, or at least very difficult to
beat.
This is a pure reland of #199615 (reverted in #200421) after fixing
the PoisoningVH bug by #200540.
Non-core cleanups aided by Claude Opus 4.7.