[MOD-13911] Add GC logic for NumericRangeTree#8293
Conversation
e31551b to
3afb760
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #8293 +/- ##
==========================================
+ Coverage 82.53% 82.64% +0.11%
==========================================
Files 417 418 +1
Lines 59759 60029 +270
Branches 17815 18085 +270
==========================================
+ Hits 49320 49611 +291
+ Misses 10256 10232 -24
- Partials 183 186 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
1c84fcd to
9db43f7
Compare
716ca3d to
2762c4d
Compare
1b9cd2f to
0aef10b
Compare
2762c4d to
130f854
Compare
0aef10b to
ad6b3fe
Compare
130f854 to
0be66c9
Compare
ad6b3fe to
8bef85f
Compare
0be66c9 to
ce4e0bc
Compare
8bef85f to
a099cbf
Compare
ce4e0bc to
1b5c31b
Compare
a099cbf to
67a358b
Compare
90c6ed2 to
e84db9a
Compare
67a358b to
937e999
Compare
|
84f9171 to
2bfad4a
Compare
010f7d0 to
e7bee69
Compare
2bfad4a to
21e0009
Compare
e7bee69 to
f685066
Compare
21e0009 to
b6fbd83
Compare
76cfb77 to
b4bfcfb
Compare
| /// Apply garbage collection deltas to this index. | ||
| /// | ||
| /// Consumes the `delta` and returns information about what changed. | ||
| pub fn apply_gc(&mut self, delta: inverted_index::GcScanDelta) -> inverted_index::GcApplyInfo { |
There was a problem hiding this comment.
Is there a specific reason not to put apply_gc and scan_gc in gc.rs?
There was a problem hiding this comment.
I kept them in index since they just forward to the underlying methods with the same name. I've tried to avoid littering the rest of the crate with match statements over the compressed/uncompressed variants.
There was a problem hiding this comment.
In practice we could have moved this into tree/gc.rs right?
impl NumericIndex {
// apply_gc
// scan_gc
}
``
There was a problem hiding this comment.
Yes, that would be possible since both enum variants are public.
| NumericRangeNode::Leaf(leaf) => { | ||
| rv.num_leaves_delta -= 1; | ||
| rv.size_delta -= leaf.range.memory_usage() as i64; | ||
| rv.num_records_delta -= leaf.range.num_entries() as i32; |
There was a problem hiding this comment.
is rv.num_records_delta used by the caller?
There was a problem hiding this comment.
It isn't used on the GC path.
We could remove it, by creating a distinct stat-tracking output type for trim_empty_leaves rather than reusing AddResult.
b4bfcfb to
747de96
Compare
| #[cfg(all(feature = "unittest", not(miri)))] | ||
| mod invariants; | ||
|
|
||
| pub use gc::{NodeGcDelta, SingleNodeGcResult}; |
There was a problem hiding this comment.
CompactIfSparseResult not re-exported from crate
Low Severity
CompactIfSparseResult is the return type of the public method compact_if_sparse() on NumericRangeTree, but it's not re-exported from tree/mod.rs or lib.rs. Meanwhile, NodeGcDelta and SingleNodeGcResult (return types of other public GC methods) are properly re-exported. External callers can use the returned value via type inference but cannot name the type, which is inconsistent with the other GC types.
Additional Locations (1)
❌ Jit Scanner failed - Our team is investigatingJit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions. 💡 Need to bypass this check? Comment |
|
I've switched all stats operations to use |
95e3628 to
772285d
Compare
| /// Apply garbage collection deltas to this index. | ||
| /// | ||
| /// Consumes the `delta` and returns information about what changed. | ||
| pub fn apply_gc(&mut self, delta: inverted_index::GcScanDelta) -> inverted_index::GcApplyInfo { |
There was a problem hiding this comment.
In practice we could have moved this into tree/gc.rs right?
impl NumericIndex {
// apply_gc
// scan_gc
}
``
| NumericRangeNode::Leaf(leaf) => { | ||
| rv.num_leaves_delta -= 1; | ||
| rv.size_delta -= leaf.range.memory_usage() as i64; | ||
| rv.num_records_delta -= leaf.range.num_entries() as i32; |
| /// The change in the tree's node memory usage, in bytes. | ||
| /// Positive values indicate growth, negative values indicate shrinkage. | ||
| pub node_size_delta: i64, | ||
| } |
There was a problem hiding this comment.
CompactIfSparseResult not re-exported from crate
Low Severity
CompactIfSparseResult is the return type of pub fn compact_if_sparse on NumericRangeTree, but it's not re-exported from tree/mod.rs or lib.rs. The gc module is private, so external callers can invoke the method and access fields via type inference, but cannot name the type explicitly. The other GC result types (NodeGcDelta, SingleNodeGcResult, TrimEmptyLeavesResult) are all properly re-exported — this one appears to be an oversight.
Additional Locations (2)
54d2f47 to
6b3d750
Compare
6b3d750 to
92713bf
Compare
|





Describe the changes in the pull request
Prepare
NumericRangeTreefor GC integration.This PR adds logic to:
The GC relies on the index of a node inside the slab as the node identifier. This differs from the existing C implementation, which relies on the memory address of the node as its identity (i.e. it passes a pointer over from the child to the parent GC process).
Stacked on top of #8292.
Which additional issues this PR fixes
Main objects this PR modified
Mark if applicable
Release Notes
If a release note is required (bug fix / new feature / enhancement), describe the user impact of this PR in the title.
Note
Medium Risk
Touches core GC and tree-mutation paths (stats accounting, node removal, slab compaction, pointer remapping), which could impact correctness or memory usage if edge cases are missed, though it is heavily covered by new tests.
Overview
Adds garbage-collection support for
NumericRangeTree. The tree can now accept per-node GC deltas (NodeGcDelta) and apply them viaapply_gc_to_node, updating entry/memory stats, tracking newly-empty leaves, and repairing HLL cardinality (including post-fork writes and ignored-last-block scenarios).Introduces maintenance operations to reclaim memory:
trim_empty_leavesprunes empty subtrees (freeing internal retained ranges when applicable) andcompact_if_sparsecompacts the node slab when many leaves are empty, remapping node indices safely.Includes related robustness fixes:
GcApplyInfois nowCopy/Clone/Default,InvertedIndex::apply_gccorrectly setsignored_last_blockeven when no last-block delta exists, numeric index exposesscan_gc/apply_gchelpers, and extensive integration/property tests were added to cover GC, trimming, compaction, and last-block edge cases.Written by Cursor Bugbot for commit 92713bf. This will update automatically on new commits. Configure here.