[BUG]: Compaction version file flush was incomplete on MCMR#6423
[BUG]: Compaction version file flush was incomplete on MCMR#6423tanujnay112 merged 1 commit intomainfrom
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Fix MCMR version file metadata population and validation Updates Key Changes• Refactored Possible Issues• Strict This summary was automatically generated by @propel-code-bot |
| if collection_info.database_name != collection.database { | ||
| tracing::error!( | ||
| expected_database_name = %collection.database, | ||
| version_file_database_name = %collection_info.database_name, | ||
| "database name mismatch" | ||
| ); | ||
| return Err(VersionFileError::ValidationFailed( | ||
| "database name mismatch".to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
[Logic] This new database-name validation will reject every legacy version file written before database_name was populated (the field defaulted to an empty string). As soon as this lands, fetch/upload will start failing with ValidationFailed for all existing collections even when the database actually matches. Treat missing values as "unknown" rather than an error so older files remain readable.
| if collection_info.database_name != collection.database { | |
| tracing::error!( | |
| expected_database_name = %collection.database, | |
| version_file_database_name = %collection_info.database_name, | |
| "database name mismatch" | |
| ); | |
| return Err(VersionFileError::ValidationFailed( | |
| "database name mismatch".to_string(), | |
| )); | |
| } | |
| if !collection_info.database_name.is_empty() | |
| && collection_info.database_name != collection.database | |
| { | |
| tracing::error!( | |
| expected_database_name = %collection.database, | |
| version_file_database_name = %collection_info.database_name, | |
| "database name mismatch" | |
| ); | |
| return Err(VersionFileError::ValidationFailed( | |
| "database name mismatch".to_string(), | |
| )); | |
| } |
Context for Agents
This new database-name validation will reject every legacy version file written before `database_name` was populated (the field defaulted to an empty string). As soon as this lands, `fetch`/`upload` will start failing with `ValidationFailed` for all existing collections even when the database actually matches. Treat missing values as "unknown" rather than an error so older files remain readable.
```suggestion
if !collection_info.database_name.is_empty()
&& collection_info.database_name != collection.database
{
tracing::error!(
expected_database_name = %collection.database,
version_file_database_name = %collection_info.database_name,
"database name mismatch"
);
return Err(VersionFileError::ValidationFailed(
"database name mismatch".to_string(),
));
}
```
File: rust/segment/src/version_file.rs
Line: 247There was a problem hiding this comment.
that's ok, there are no legacy collections on mcmr
This comment has been minimized.
This comment has been minimized.
970e74f to
7f4e3de
Compare
7f4e3de to
979d016
Compare
| segments: vec![ | ||
| Segment { | ||
| id: SegmentUuid(Uuid::new_v4()), | ||
| r#type: SegmentType::BlockfileMetadata, | ||
| scope: SegmentScope::METADATA, | ||
| collection: collection_id, | ||
| file_path: HashMap::new(), | ||
| metadata: None, | ||
| }, | ||
| Segment { | ||
| id: SegmentUuid(Uuid::new_v4()), | ||
| r#type: SegmentType::BlockfileRecord, | ||
| scope: SegmentScope::RECORD, | ||
| collection: collection_id, | ||
| file_path: HashMap::new(), | ||
| metadata: None, | ||
| }, | ||
| Segment { | ||
| id: segment_uuid, | ||
| r#type: SegmentType::HnswDistributed, | ||
| scope: SegmentScope::VECTOR, | ||
| collection: collection_id, | ||
| file_path: HashMap::new(), | ||
| metadata: None, | ||
| }, | ||
| ], |
There was a problem hiding this comment.
It's not obvious to me why we're changing this. Can you say why?
979d016 to
3b7dab1
Compare
3b7dab1 to
b50f25d
Compare
- **[ENH]: Cache rust git submodules in mounted volume (#6424)** - **[CHORE](k8s) increase dev CPU limits from 100m to 200-300m (#6435)** - **[ENH] replace live cloud tests with k8s integration tests (#6434)** - **[ENH] Make dirty_log_collections metric mcmr-aware. (#6353)** - **[ENH] Quantized Spann Segment Writer (#6397)** - **[ENH] Wire up quantized writer in compaction (#6399)** - **[ENH] Quantized Spann Segment Reader (#6405)** - **[ENH] Wire up quantized reader in new orchestrator (#6409)** - **[ENH] Garbage collect usearch index files (#6416)** - **[ENH] Trace quantized spann implementation (#6425)** - **[ENH]: Precompute data chunk len() (#6442)** - **[BUG]: Compaction version file flush was incomplete on MCMR (#6423)** - **[DOC]: Fixed broken links in Readme (#6440)** - **[DOC] Fix link to Rust documentation (#6443)** - **[ENH]: Allow users to disable FTS in schema (#6214)** --------- Co-authored-by: Robert Escriva <[email protected]> Co-authored-by: Macronova <[email protected]> Co-authored-by: Nilpotent <[email protected]> Co-authored-by: anderk222 <[email protected]> Co-authored-by: Sanket Kedia <[email protected]>

Description of changes
A few fields in the CollectionInfo part of version files were not correctly filled in during MCMR compaction flush. This diff fixes that.
Test plan
Tests have been edited to test for the corrected fields.
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?