Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Precompute Adds a cached Key Changes• Extended Possible Issues• If any consumer mutates This summary was automatically generated by @propel-code-bot |
| /// # Arguments | ||
| /// * `visibility` - A vector of boolean values indicating the visibility of the elements | ||
| pub fn set_visibility(&mut self, visibility: Vec<bool>) { | ||
| self.visible_count = visibility.iter().filter(|&v| *v).count(); |
There was a problem hiding this comment.
[Logic] The method documentation explicitly states that visibility length should match the data chunk length. However, this is not enforced, which could lead to inconsistent state where len() (based on visible_count) disagrees with iter().count() (based on data.len() and visibility) if a longer visibility vector is passed.
Add an assertion to enforce this invariant and prevent subtle bugs.
| self.visible_count = visibility.iter().filter(|&v| *v).count(); | |
| assert_eq!(visibility.len(), self.data.len(), "Visibility vector length must match data length"); | |
| self.visible_count = visibility.iter().filter(|&v| *v).count(); |
Context for Agents
The method documentation explicitly states that `visibility` length should match the data chunk length. However, this is not enforced, which could lead to inconsistent state where `len()` (based on `visible_count`) disagrees with `iter().count()` (based on `data.len()` and `visibility`) if a longer visibility vector is passed.
Add an assertion to enforce this invariant and prevent subtle bugs.
```suggestion
assert_eq!(visibility.len(), self.data.len(), "Visibility vector length must match data length");
self.visible_count = visibility.iter().filter(|&v| *v).count();
```
File: rust/types/src/data_chunk.rs
Line: 81- **[ENH]: Cache rust git submodules in mounted volume (#6424)** - **[CHORE](k8s) increase dev CPU limits from 100m to 200-300m (#6435)** - **[ENH] replace live cloud tests with k8s integration tests (#6434)** - **[ENH] Make dirty_log_collections metric mcmr-aware. (#6353)** - **[ENH] Quantized Spann Segment Writer (#6397)** - **[ENH] Wire up quantized writer in compaction (#6399)** - **[ENH] Quantized Spann Segment Reader (#6405)** - **[ENH] Wire up quantized reader in new orchestrator (#6409)** - **[ENH] Garbage collect usearch index files (#6416)** - **[ENH] Trace quantized spann implementation (#6425)** - **[ENH]: Precompute data chunk len() (#6442)** - **[BUG]: Compaction version file flush was incomplete on MCMR (#6423)** - **[DOC]: Fixed broken links in Readme (#6440)** - **[DOC] Fix link to Rust documentation (#6443)** - **[ENH]: Allow users to disable FTS in schema (#6214)** --------- Co-authored-by: Robert Escriva <[email protected]> Co-authored-by: Macronova <[email protected]> Co-authored-by: Nilpotent <[email protected]> Co-authored-by: anderk222 <[email protected]> Co-authored-by: Sanket Kedia <[email protected]>

Description of changes
DataChunk.len()used to count the number ofvisibility[i] = Truevalues to compute the length(). This is inefficient to compute on every invocation. The metadata log reader has a loop that invokes that invokes thislen()function for every fetched log record. This appears to continue to take up > 40% of CPU on stack traces during gets on a high number of log records. This change precomputes this length value to fix this perf issue.Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs section?_