Skip to content

[BUG]: Compaction version file flush was incomplete on MCMR#6423

Merged
tanujnay112 merged 1 commit intomainfrom
compactor_fix
Feb 15, 2026
Merged

[BUG]: Compaction version file flush was incomplete on MCMR#6423
tanujnay112 merged 1 commit intomainfrom
compactor_fix

Conversation

@tanujnay112
Copy link
Copy Markdown
Contributor

@tanujnay112 tanujnay112 commented Feb 12, 2026

Description of changes

A few fields in the CollectionInfo part of version files were not correctly filled in during MCMR compaction flush. This diff fixes that.

  • Improvements & Bug fixes
    • ...
  • New functionality
    • ...

Test plan

Tests have been edited to test for the corrected fields.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link
Copy Markdown

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@tanujnay112 tanujnay112 marked this pull request as ready for review February 12, 2026 09:39
@propel-code-bot
Copy link
Copy Markdown
Contributor

propel-code-bot bot commented Feb 12, 2026

Fix MCMR version file metadata population and validation

Updates VersionFileManager to validate against full Collection metadata, require callers to provide explicit version-file paths, and extend VersionFileType with explicit garbage-collection variants. The sysdb compaction flow now generates MCMR-aligned paths, populates CollectionInfoImmutable/CollectionInfoMutable fields correctly, and persists file paths in version history; accompanying unit/integration tests were rewritten to construct realistic collections, segment IDs, and assertions for the new metadata.

Key Changes

• Refactored VersionFileManager::upload/validate to accept entire Collection, enforce new version expectations, and add database-name verification; generate_file_path now derives paths from the collection object and distinguishes GarbageCollectionMark/Delete suffixes.
• Sysdb compaction now calls generate_file_path, records the path in CollectionVersionInfo.version_file_name, and copies mutable collection fields (log position, version, timestamps) into version history entries.
• Extensive test updates ensure version files mirror collection metadata (tenant/database IDs, segment IDs, version history structure), and integration tests fetch version files after flush to validate contents.

Possible Issues

• Strict database_name comparison will fail for legacy version files where the field was empty, potentially blocking fetch/upload outside MCMR clusters.
VersionFileManager::upload now returns Result<(), _> and expects callers to manage the persisted path; existing callers (beyond sysdb) may need updates.

This summary was automatically generated by @propel-code-bot

Comment on lines +238 to +247
if collection_info.database_name != collection.database {
tracing::error!(
expected_database_name = %collection.database,
version_file_database_name = %collection_info.database_name,
"database name mismatch"
);
return Err(VersionFileError::ValidationFailed(
"database name mismatch".to_string(),
));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical

[Logic] This new database-name validation will reject every legacy version file written before database_name was populated (the field defaulted to an empty string). As soon as this lands, fetch/upload will start failing with ValidationFailed for all existing collections even when the database actually matches. Treat missing values as "unknown" rather than an error so older files remain readable.

Suggested change
if collection_info.database_name != collection.database {
tracing::error!(
expected_database_name = %collection.database,
version_file_database_name = %collection_info.database_name,
"database name mismatch"
);
return Err(VersionFileError::ValidationFailed(
"database name mismatch".to_string(),
));
}
if !collection_info.database_name.is_empty()
&& collection_info.database_name != collection.database
{
tracing::error!(
expected_database_name = %collection.database,
version_file_database_name = %collection_info.database_name,
"database name mismatch"
);
return Err(VersionFileError::ValidationFailed(
"database name mismatch".to_string(),
));
}
Context for Agents
This new database-name validation will reject every legacy version file written before `database_name` was populated (the field defaulted to an empty string). As soon as this lands, `fetch`/`upload` will start failing with `ValidationFailed` for all existing collections even when the database actually matches. Treat missing values as "unknown" rather than an error so older files remain readable.

```suggestion
        if !collection_info.database_name.is_empty()
            && collection_info.database_name != collection.database
        {
            tracing::error!(
                expected_database_name = %collection.database,
                version_file_database_name = %collection_info.database_name,
                "database name mismatch"
            );
            return Err(VersionFileError::ValidationFailed(
                "database name mismatch".to_string(),
            ));
        }
```


File: rust/segment/src/version_file.rs
Line: 247

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's ok, there are no legacy collections on mcmr

@blacksmith-sh

This comment has been minimized.

@tanujnay112 tanujnay112 force-pushed the compactor_fix branch 2 times, most recently from 970e74f to 7f4e3de Compare February 13, 2026 20:01
@tanujnay112 tanujnay112 requested a review from rescrv February 13, 2026 21:01
Comment on lines -1774 to -1799
segments: vec![
Segment {
id: SegmentUuid(Uuid::new_v4()),
r#type: SegmentType::BlockfileMetadata,
scope: SegmentScope::METADATA,
collection: collection_id,
file_path: HashMap::new(),
metadata: None,
},
Segment {
id: SegmentUuid(Uuid::new_v4()),
r#type: SegmentType::BlockfileRecord,
scope: SegmentScope::RECORD,
collection: collection_id,
file_path: HashMap::new(),
metadata: None,
},
Segment {
id: segment_uuid,
r#type: SegmentType::HnswDistributed,
scope: SegmentScope::VECTOR,
collection: collection_id,
file_path: HashMap::new(),
metadata: None,
},
],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me why we're changing this. Can you say why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@tanujnay112 tanujnay112 merged commit dbc1f33 into main Feb 15, 2026
67 checks passed
tanujnay112 added a commit that referenced this pull request Feb 18, 2026
- **[ENH]: Cache rust git submodules in mounted volume (#6424)**
- **[CHORE](k8s) increase dev CPU limits from 100m to 200-300m (#6435)**
- **[ENH] replace live cloud tests with k8s integration tests (#6434)**
- **[ENH] Make dirty_log_collections metric mcmr-aware. (#6353)**
- **[ENH] Quantized Spann Segment Writer (#6397)**
- **[ENH] Wire up quantized writer in compaction (#6399)**
- **[ENH] Quantized Spann Segment Reader (#6405)**
- **[ENH] Wire up quantized reader in new orchestrator (#6409)**
- **[ENH] Garbage collect usearch index files (#6416)**
- **[ENH] Trace quantized spann implementation (#6425)**
- **[ENH]: Precompute data chunk len() (#6442)**
- **[BUG]: Compaction version file flush was incomplete on MCMR
(#6423)**
- **[DOC]: Fixed broken links in Readme (#6440)**
- **[DOC] Fix link to Rust documentation (#6443)**
- **[ENH]: Allow users to disable FTS in schema (#6214)**

---------

Co-authored-by: Robert Escriva <[email protected]>
Co-authored-by: Macronova <[email protected]>
Co-authored-by: Nilpotent <[email protected]>
Co-authored-by: anderk222 <[email protected]>
Co-authored-by: Sanket Kedia <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants