[ENH] Garbage collect usearch index files #6416
Garbage collector now cleans up USearch index artifacts

Extends GC operators so USearch/Spann index binaries are discovered, preserved when needed, and deleted when unused. Adds

Key Changes
• Updated

Possible Issues
• No automated tests cover the USearch GC paths, so regressions (e.g., changing key formatting) would go unnoticed.

This summary was automatically generated by @propel-code-bot
    // For usearch index files, add the single .bin file directly.
    if file_type == QUANTIZED_SPANN_RAW_CENTROID
        || file_type == QUANTIZED_SPANN_QUANTIZED_CENTROID
    {
        let quantized = file_type == QUANTIZED_SPANN_QUANTIZED_CENTROID;
        for file_path in file_paths.paths.iter() {
            let (prefix, id) =
                Segment::extract_prefix_and_id(file_path).map_err(|e| {
                    tracing::error!(error = %e, "Failed to extract prefix and ID");
                    ComputeUnusedFilesError::InvalidUuid(e, file_path.to_string())
                })?;
            let s3_key =
                USearchIndex::format_storage_key(prefix, IndexUuid(id), quantized);
            unused_hnsw_prefixes.push(s3_key);
        }
        continue;
    }
[Testing] The new SPANN handling in `compute_unused_between_successive_versions` isn't covered by any tests; the existing suites only exercise sparse index blocks and HNSW prefixes. Without a regression test we can't detect if future refactors stop emitting the formatted `.bin` key, which would silently leak storage. Please extend the tests in this file to create a `CollectionSegmentInfo` containing a `QUANTIZED_SPANN_*` entry and assert that `compute_unused_between_successive_versions` returns the expected `USearchIndex::format_storage_key(...)` value in `unused_hnsw_prefixes`.
File: rust/garbage_collector/src/operators/compute_unused_files.rs
Line: 123
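The shape of the regression test requested in the review comment can be sketched in isolation. This is a simplified model, not the code in `compute_unused_files.rs`: `FilePaths`, `usearch_keys`, and `unused_usearch_keys` are hypothetical helpers, the constant values are placeholders, `extract_prefix_and_id` is reduced to a split on the last `/`, and the key layout inside `format_storage_key` is invented for illustration (the real layout is whatever `USearchIndex::format_storage_key` returns).

```rust
use std::collections::{BTreeSet, HashMap};

// Placeholder values; the real constants live in the Chroma codebase.
const QUANTIZED_SPANN_RAW_CENTROID: &str = "quantized_spann_raw_centroid";
const QUANTIZED_SPANN_QUANTIZED_CENTROID: &str = "quantized_spann_quantized_centroid";

/// Hypothetical model of a segment's file map: file type -> file paths.
type FilePaths = HashMap<&'static str, Vec<&'static str>>;

/// Assumed key layout, standing in for `USearchIndex::format_storage_key`.
fn format_storage_key(prefix: &str, id: &str, quantized: bool) -> String {
    let kind = if quantized { "quantized" } else { "raw" };
    if prefix.is_empty() {
        format!("usearch/{kind}/{id}.bin")
    } else {
        format!("{prefix}/usearch/{kind}/{id}.bin")
    }
}

/// Simplified stand-in for `Segment::extract_prefix_and_id`:
/// everything before the last '/' is the prefix, the rest is the id.
fn extract_prefix_and_id(path: &str) -> (&str, &str) {
    match path.rsplit_once('/') {
        Some((prefix, id)) => (prefix, id),
        None => ("", path),
    }
}

/// Collects one formatted `.bin` key per USearch file path,
/// skipping every other file type.
fn usearch_keys(files: &FilePaths) -> BTreeSet<String> {
    let mut keys = BTreeSet::new();
    for (file_type, paths) in files {
        let quantized = match *file_type {
            QUANTIZED_SPANN_RAW_CENTROID => false,
            QUANTIZED_SPANN_QUANTIZED_CENTROID => true,
            _ => continue,
        };
        for path in paths {
            let (prefix, id) = extract_prefix_and_id(path);
            keys.insert(format_storage_key(prefix, id, quantized));
        }
    }
    keys
}

/// Keys referenced by the older version but not by the newer one.
fn unused_usearch_keys(older: &FilePaths, newer: &FilePaths) -> Vec<String> {
    let newer_keys = usearch_keys(newer);
    usearch_keys(older)
        .into_iter()
        .filter(|key| !newer_keys.contains(key))
        .collect()
}
```

A test built on this model would construct two versions that share a raw-centroid file but differ in the quantized-centroid file, then assert that only the key for the dropped quantized file is reported as unused. A real test would build `CollectionSegmentInfo` values and compare against `USearchIndex::format_storage_key(...)` directly, so that a change in key formatting fails the assertion.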
Merge activity

- **[ENH]: Cache rust git submodules in mounted volume (#6424)**
- **[CHORE](k8s) increase dev CPU limits from 100m to 200-300m (#6435)**
- **[ENH] replace live cloud tests with k8s integration tests (#6434)**
- **[ENH] Make dirty_log_collections metric mcmr-aware. (#6353)**
- **[ENH] Quantized Spann Segment Writer (#6397)**
- **[ENH] Wire up quantized writer in compaction (#6399)**
- **[ENH] Quantized Spann Segment Reader (#6405)**
- **[ENH] Wire up quantized reader in new orchestrator (#6409)**
- **[ENH] Garbage collect usearch index files (#6416)**
- **[ENH] Trace quantized spann implementation (#6425)**
- **[ENH]: Precompute data chunk len() (#6442)**
- **[BUG]: Compaction version file flush was incomplete on MCMR (#6423)**
- **[DOC]: Fixed broken links in Readme (#6440)**
- **[DOC] Fix link to Rust documentation (#6443)**
- **[ENH]: Allow users to disable FTS in schema (#6214)**

---------

Co-authored-by: Robert Escriva
Co-authored-by: Macronova
Co-authored-by: Nilpotent
Co-authored-by: anderk222
Co-authored-by: Sanket Kedia

## Description of changes

Summarize the changes made by this PR.

## Test plan

How are these changes tested?

`pytest` for python, `yarn test` for js, `cargo test` for rust

## Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

## Observability plan

What is the plan to instrument and monitor this change?

## Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?