[ENH] Wire up quantized reader in new orchestrator#6409
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
ff6baf8 to
9a26a8f
Compare
be396bb to
06f50fa
Compare
This comment has been minimized.
This comment has been minimized.
06f50fa to
50beaff
Compare
|
Quantized SPANN KNN orchestrator with load/merge pipeline wiring Adds a dedicated quantized SPANN KNN orchestrator and operator set that load the quantized reader, navigate cluster centers, fetch clusters, and run a bruteforce step before merging with log-derived distances. The segment reader API was expanded with Key Changes• Introduced Possible Issues• No automated tests cover the new quantized orchestration path, so regressions (e.g., misordered pipeline, dedup edge cases) will go unnoticed This summary was automatically generated by @propel-code-bot |
9a26a8f to
a601921
Compare
3f3ff93 to
ce47c0b
Compare
This comment has been minimized.
This comment has been minimized.
80a5af5 to
514208a
Compare
2acf8f0 to
588f737
Compare
514208a to
939cc61
Compare
rust/worker/src/execution/operators/quantized_spann_navigate.rs
Outdated
Show resolved
Hide resolved
|
|
||
| // State tracking. | ||
| num_bruteforces: Option<usize>, | ||
| records: Vec<Vec<RecordMeasure>>, |
There was a problem hiding this comment.
TODO: bruteforce_results
There was a problem hiding this comment.
shall we knock this rename out?
939cc61 to
833b8cc
Compare
b64d5d2 to
9debc15
Compare
rust/worker/src/execution/operators/quantized_spann_load_cluster.rs
Outdated
Show resolved
Hide resolved
96c6e0e to
1a1b6e8
Compare
7cefa67 to
8f4b1ec
Compare
| let versions = | ||
| try_join_all(cluster.ids.iter().map(|&id| self.reader.get_version(id))).await?; | ||
|
|
||
| let global_versions = cluster | ||
| .ids | ||
| .iter() | ||
| .copied() | ||
| .zip(versions) | ||
| .collect::<HashMap<_, _>>(); |
There was a problem hiding this comment.
[Logic] The new load‑cluster path now assumes every ID in the cluster has an associated version row. try_join_all(cluster.ids.iter().map(|&id| self.reader.get_version(id))) will now return an error as soon as one get_version call returns None, which bubbles up as a QuantizedSpannLoadClusterError and aborts the entire query. In the previous in‑reader bruteforce implementation we deliberately tolerated missing versions by flattening them away (future::try_join_all(...).await?.into_iter().flatten()), so stale WAL entries or partially compacted points were simply skipped instead of failing the request. This regression means a single dangling ID can now make every quantized query return 500.
Please make get_version return an Option<u32> (or interpret the "version not found" case as a skip) and only insert IDs whose version is present when building global_versions, restoring the old behavior of ignoring stale points rather than treating them as fatal.
Context for Agents
The new load‑cluster path now assumes every ID in the cluster has an associated version row. `try_join_all(cluster.ids.iter().map(|&id| self.reader.get_version(id)))` will now return an error as soon as one `get_version` call returns `None`, which bubbles up as a `QuantizedSpannLoadClusterError` and aborts the entire query. In the previous in‑reader `bruteforce` implementation we deliberately tolerated missing versions by flattening them away (`future::try_join_all(...).await?.into_iter().flatten()`), so stale WAL entries or partially compacted points were simply skipped instead of failing the request. This regression means a single dangling ID can now make every quantized query return `500`.
Please make `get_version` return an `Option<u32>` (or interpret the "version not found" case as a skip) and only insert IDs whose version is present when building `global_versions`, restoring the old behavior of ignoring stale points rather than treating them as fatal.
File: rust/worker/src/execution/operators/quantized_spann_load_cluster.rs
Line: 61
Merge activity
|
8f4b1ec to
dafbf0f
Compare
- **[ENH]: Cache rust git submodules in mounted volume (#6424)** - **[CHORE](k8s) increase dev CPU limits from 100m to 200-300m (#6435)** - **[ENH] replace live cloud tests with k8s integration tests (#6434)** - **[ENH] Make dirty_log_collections metric mcmr-aware. (#6353)** - **[ENH] Quantized Spann Segment Writer (#6397)** - **[ENH] Wire up quantized writer in compaction (#6399)** - **[ENH] Quantized Spann Segment Reader (#6405)** - **[ENH] Wire up quantized reader in new orchestrator (#6409)** - **[ENH] Garbage collect usearch index files (#6416)** - **[ENH] Trace quantized spann implementation (#6425)** - **[ENH]: Precompute data chunk len() (#6442)** - **[BUG]: Compaction version file flush was incomplete on MCMR (#6423)** - **[DOC]: Fixed broken links in Readme (#6440)** - **[DOC] Fix link to Rust documentation (#6443)** - **[ENH]: Allow users to disable FTS in schema (#6214)** --------- Co-authored-by: Robert Escriva <[email protected]> Co-authored-by: Macronova <[email protected]> Co-authored-by: Nilpotent <[email protected]> Co-authored-by: anderk222 <[email protected]> Co-authored-by: Sanket Kedia <[email protected]>

Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?