[ENH]: Create collection impl in rust sysdb#6131
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
Implement Rust SysDB collection creation flow Adds full collection creation support to the Rust SysDB/Spanner backend. The change introduces a transactional Key Changes• Implemented Affected Areas• This summary was automatically generated by @propel-code-bot |
| chroma_types::MetadataValue::Int(i) => (None, Some(*i), None, None), | ||
| chroma_types::MetadataValue::Float(f) => (None, None, Some(*f), None), | ||
| chroma_types::MetadataValue::Bool(b) => (None, None, None, Some(*b)), | ||
| chroma_types::MetadataValue::SparseVector(_) => continue, // Not supported |
There was a problem hiding this comment.
[Logic] This creates a consistency issue where SparseVector metadata is present in the returned Collection object (cloned from request) but silently dropped from persistence. Subsequent reads will return incomplete metadata.
If SparseVector is not supported, return an error instead of silently ignoring it.
Context for Agents
This creates a consistency issue where `SparseVector` metadata is present in the returned `Collection` object (cloned from request) but silently dropped from persistence. Subsequent reads will return incomplete metadata.
If `SparseVector` is not supported, return an error instead of silently ignoring it.
File: rust/rust-sysdb/src/spanner.rs
Line: 530| } | ||
| } | ||
|
|
||
| let now_timestamp_us = chrono::Utc::now().timestamp_micros(); |
There was a problem hiding this comment.
[Logic] Using chrono::Utc::now() for created_at and updated_at relies on the client's clock and is inconsistent with create_tenant/create_database which use PENDING_COMMIT_TIMESTAMP(). This could lead to clock skew issues or inconsistencies in updated_at ordering.
Consider using google_cloud_spanner::value::CommitTimestamp::new() which serializes to spanner.commit_timestamp() to ensure causal consistency.
Context for Agents
Using `chrono::Utc::now()` for `created_at` and `updated_at` relies on the client's clock and is inconsistent with `create_tenant`/`create_database` which use `PENDING_COMMIT_TIMESTAMP()`. This could lead to clock skew issues or inconsistencies in `updated_at` ordering.
Consider using `google_cloud_spanner::value::CommitTimestamp::new()` which serializes to `spanner.commit_timestamp()` to ensure causal consistency.
File: rust/rust-sysdb/src/spanner.rs
Line: 428
rust/types/src/collection.rs
Outdated
| version: version.map(|v| v as i32).unwrap_or(0), | ||
| total_records_post_compaction: total_records_post_compaction | ||
| .map(|v| v as u64) | ||
| .unwrap_or(0), |
There was a problem hiding this comment.
shouldn't we expect this field to be Some(x) all the time?
should we silently default this to 0?
| } | ||
|
|
||
| // Check if collection with same name exists in this database | ||
| let mut check_stmt = Statement::new( |
There was a problem hiding this comment.
i feel like you can merge this check with the one on 418 no?
There was a problem hiding this comment.
You mean with the id check?
a5e3758 to
621573e
Compare
8943cfb to
118bdf5
Compare
621573e to
9e6f199
Compare
118bdf5 to
432297c
Compare
9e6f199 to
1b859f3
Compare
rust/types/src/sysdb_errors.rs
Outdated
| SysDbError::SchemaMissing(_) => ErrorCodes::Internal, | ||
| SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal, | ||
| SysDbError::InvalidSegment(e) => e.code(), | ||
| SysDbError::InvalidSegmentsCount => ErrorCodes::Internal, | ||
| SysDbError::InvalidMetadata(e) => e.code(), | ||
| SysDbError::InvalidDimension(_) => ErrorCodes::Internal, |
There was a problem hiding this comment.
[Logic] Mapping validation errors (SchemaMissing, InvalidSchemaJson, InvalidDimension, InvalidSegmentsCount) to ErrorCodes::Internal will cause the API to return 500 errors for bad user input.
Map these to ErrorCodes::InvalidArgument to ensure correct 400-level HTTP responses.
| SysDbError::SchemaMissing(_) => ErrorCodes::Internal, | |
| SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal, | |
| SysDbError::InvalidSegment(e) => e.code(), | |
| SysDbError::InvalidSegmentsCount => ErrorCodes::Internal, | |
| SysDbError::InvalidMetadata(e) => e.code(), | |
| SysDbError::InvalidDimension(_) => ErrorCodes::Internal, | |
| SysDbError::SchemaMissing(_) => ErrorCodes::InvalidArgument, | |
| SysDbError::InvalidSchemaJson(_) => ErrorCodes::InvalidArgument, | |
| SysDbError::InvalidSegment(e) => e.code(), | |
| SysDbError::InvalidSegmentsCount => ErrorCodes::InvalidArgument, | |
| SysDbError::InvalidMetadata(e) => e.code(), | |
| SysDbError::InvalidDimension(_) => ErrorCodes::InvalidArgument, |
Context for Agents
Mapping validation errors (`SchemaMissing`, `InvalidSchemaJson`, `InvalidDimension`, `InvalidSegmentsCount`) to `ErrorCodes::Internal` will cause the API to return 500 errors for bad user input.
Map these to `ErrorCodes::InvalidArgument` to ensure correct 400-level HTTP responses.
```suggestion
SysDbError::SchemaMissing(_) => ErrorCodes::InvalidArgument,
SysDbError::InvalidSchemaJson(_) => ErrorCodes::InvalidArgument,
SysDbError::InvalidSegment(e) => e.code(),
SysDbError::InvalidSegmentsCount => ErrorCodes::InvalidArgument,
SysDbError::InvalidMetadata(e) => e.code(),
SysDbError::InvalidDimension(_) => ErrorCodes::InvalidArgument,
```
File: rust/types/src/sysdb_errors.rs
Line: 96432297c to
50a6033
Compare
83936fc to
0197b36
Compare
0197b36 to
ecca61f
Compare
2398ba0 to
3890bf5
Compare
272af49 to
545ac09
Compare
545ac09 to
b42abb8
Compare
| SysDbError::SchemaMissing(_) => ErrorCodes::Internal, | ||
| SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal, | ||
| SysDbError::InvalidSegment(e) => e.code(), | ||
| SysDbError::InvalidSegmentsCount => ErrorCodes::Internal, | ||
| SysDbError::InvalidMetadata(e) => e.code(), | ||
| SysDbError::InvalidDimension(_) => ErrorCodes::Internal, |
There was a problem hiding this comment.
[Logic] Validation errors caused by invalid user input (SchemaMissing, InvalidSchemaJson, InvalidSegmentsCount, InvalidDimension) should map to ErrorCodes::InvalidArgument (400) rather than ErrorCodes::Internal (500). This ensures the client receives the correct status code for bad requests.
Context for Agents
Validation errors caused by invalid user input (`SchemaMissing`, `InvalidSchemaJson`, `InvalidSegmentsCount`, `InvalidDimension`) should map to `ErrorCodes::InvalidArgument` (400) rather than `ErrorCodes::Internal` (500). This ensures the client receives the correct status code for bad requests.
File: rust/rust-sysdb/src/types.rs
Line: 688448dadd to
9723f17
Compare

Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
Added tests
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
None
Observability plan
None
Documentation Changes
None