Skip to content

[ENH]: Create collection impl in rust sysdb#6131

Merged
sanketkedia merged 8 commits intomainfrom
01-07-_enh_create_collection_impl_in_rust_sysdb
Jan 14, 2026
Merged

[ENH]: Create collection impl in rust sysdb#6131
sanketkedia merged 8 commits intomainfrom
01-07-_enh_create_collection_impl_in_rust_sysdb

Conversation

@sanketkedia
Copy link
Copy Markdown
Contributor

@sanketkedia sanketkedia commented Jan 8, 2026

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Create collection impl with all the shenanigans
  • New functionality
    • ...

Test plan

How are these changes tested?
Added tests

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

None

Documentation Changes

None

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 8, 2026

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Copy Markdown
Contributor Author

sanketkedia commented Jan 8, 2026

@propel-code-bot
Copy link
Copy Markdown
Contributor

propel-code-bot bot commented Jan 8, 2026

Implement Rust SysDB collection creation flow

Adds full collection creation support to the Rust SysDB/Spanner backend. The change introduces a transactional create_collection implementation with validation, segment and metadata persistence, get-or-create semantics, and conversion utilities, along with the gRPC surface, request/response plumbing, and extensive emulator-backed tests.

Key Changes

• Implemented SpannerBackend::create_collection to validate input, resolve database IDs, handle get-or-create flows, buffer mutations for collections/segments/metadata/cursors, and materialize Collection results inside the transaction.
• Expanded rust-sysdb types with owned request structs (CreateCollectionRequest plus existing requests), enriched SysDbError variants, added SpannerRows to Collection, and introduced CreateCollectionResponse conversions.
• Updated Runnable/Backend APIs to consume requests by value, wired the gRPC create_collection endpoint in server.rs, and added comprehensive integration tests covering duplicate, get_or_create, metadata/schema, and error scenarios.
• Included supporting adjustments such as chrono dependency, metadata/schema parsing, and helper utilities for building test segments and verifying returned collections.

Affected Areas

rust/rust-sysdb/src/spanner.rs
rust/rust-sysdb/src/types.rs
rust/rust-sysdb/src/backend.rs
rust/rust-sysdb/src/server.rs
rust/types/src/collection.rs
rust/rust-sysdb/src/tests

This summary was automatically generated by @propel-code-bot

chroma_types::MetadataValue::Int(i) => (None, Some(*i), None, None),
chroma_types::MetadataValue::Float(f) => (None, None, Some(*f), None),
chroma_types::MetadataValue::Bool(b) => (None, None, None, Some(*b)),
chroma_types::MetadataValue::SparseVector(_) => continue, // Not supported
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] This creates a consistency issue where SparseVector metadata is present in the returned Collection object (cloned from request) but silently dropped from persistence. Subsequent reads will return incomplete metadata.

If SparseVector is not supported, return an error instead of silently ignoring it.

Context for Agents
This creates a consistency issue where `SparseVector` metadata is present in the returned `Collection` object (cloned from request) but silently dropped from persistence. Subsequent reads will return incomplete metadata.

If `SparseVector` is not supported, return an error instead of silently ignoring it.

File: rust/rust-sysdb/src/spanner.rs
Line: 530

}
}

let now_timestamp_us = chrono::Utc::now().timestamp_micros();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Using chrono::Utc::now() for created_at and updated_at relies on the client's clock and is inconsistent with create_tenant/create_database which use PENDING_COMMIT_TIMESTAMP(). This could lead to clock skew issues or inconsistencies in updated_at ordering.

Consider using google_cloud_spanner::value::CommitTimestamp::new() which serializes to spanner.commit_timestamp() to ensure causal consistency.

Context for Agents
Using `chrono::Utc::now()` for `created_at` and `updated_at` relies on the client's clock and is inconsistent with `create_tenant`/`create_database` which use `PENDING_COMMIT_TIMESTAMP()`. This could lead to clock skew issues or inconsistencies in `updated_at` ordering.

Consider using `google_cloud_spanner::value::CommitTimestamp::new()` which serializes to `spanner.commit_timestamp()` to ensure causal consistency.

File: rust/rust-sysdb/src/spanner.rs
Line: 428

version: version.map(|v| v as i32).unwrap_or(0),
total_records_post_compaction: total_records_post_compaction
.map(|v| v as u64)
.unwrap_or(0),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we expect this field to be Some(x) all the time?

should we silently default this to 0?

}

// Check if collection with same name exists in this database
let mut check_stmt = Statement::new(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like you can merge this check with the one on 418 no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean with the id check?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch 2 times, most recently from a5e3758 to 621573e Compare January 9, 2026 21:17
@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 8943cfb to 118bdf5 Compare January 9, 2026 21:25
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch from 621573e to 9e6f199 Compare January 9, 2026 21:25
@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 118bdf5 to 432297c Compare January 9, 2026 22:19
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch from 9e6f199 to 1b859f3 Compare January 9, 2026 22:19
Comment on lines +91 to +96
SysDbError::SchemaMissing(_) => ErrorCodes::Internal,
SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal,
SysDbError::InvalidSegment(e) => e.code(),
SysDbError::InvalidSegmentsCount => ErrorCodes::Internal,
SysDbError::InvalidMetadata(e) => e.code(),
SysDbError::InvalidDimension(_) => ErrorCodes::Internal,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Mapping validation errors (SchemaMissing, InvalidSchemaJson, InvalidDimension, InvalidSegmentsCount) to ErrorCodes::Internal will cause the API to return 500 errors for bad user input.

Map these to ErrorCodes::InvalidArgument to ensure correct 400-level HTTP responses.

Suggested change
SysDbError::SchemaMissing(_) => ErrorCodes::Internal,
SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal,
SysDbError::InvalidSegment(e) => e.code(),
SysDbError::InvalidSegmentsCount => ErrorCodes::Internal,
SysDbError::InvalidMetadata(e) => e.code(),
SysDbError::InvalidDimension(_) => ErrorCodes::Internal,
SysDbError::SchemaMissing(_) => ErrorCodes::InvalidArgument,
SysDbError::InvalidSchemaJson(_) => ErrorCodes::InvalidArgument,
SysDbError::InvalidSegment(e) => e.code(),
SysDbError::InvalidSegmentsCount => ErrorCodes::InvalidArgument,
SysDbError::InvalidMetadata(e) => e.code(),
SysDbError::InvalidDimension(_) => ErrorCodes::InvalidArgument,
Context for Agents
Mapping validation errors (`SchemaMissing`, `InvalidSchemaJson`, `InvalidDimension`, `InvalidSegmentsCount`) to `ErrorCodes::Internal` will cause the API to return 500 errors for bad user input.

Map these to `ErrorCodes::InvalidArgument` to ensure correct 400-level HTTP responses.

```suggestion
            SysDbError::SchemaMissing(_) => ErrorCodes::InvalidArgument,
            SysDbError::InvalidSchemaJson(_) => ErrorCodes::InvalidArgument,
            SysDbError::InvalidSegment(e) => e.code(),
            SysDbError::InvalidSegmentsCount => ErrorCodes::InvalidArgument,
            SysDbError::InvalidMetadata(e) => e.code(),
            SysDbError::InvalidDimension(_) => ErrorCodes::InvalidArgument,
```

File: rust/types/src/sysdb_errors.rs
Line: 96

@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 432297c to 50a6033 Compare January 9, 2026 23:02
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch 2 times, most recently from 83936fc to 0197b36 Compare January 9, 2026 23:35
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch from 0197b36 to ecca61f Compare January 11, 2026 21:25
@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 2398ba0 to 3890bf5 Compare January 12, 2026 15:49
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch 3 times, most recently from 272af49 to 545ac09 Compare January 13, 2026 02:20
@sanketkedia sanketkedia requested a review from rescrv as a code owner January 13, 2026 02:20
@sanketkedia sanketkedia changed the base branch from 01-07-_enh_spanner_collection_and_segments_schemas to main January 13, 2026 02:20
@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch from 545ac09 to b42abb8 Compare January 13, 2026 17:20
Comment on lines +683 to +688
SysDbError::SchemaMissing(_) => ErrorCodes::Internal,
SysDbError::InvalidSchemaJson(_) => ErrorCodes::Internal,
SysDbError::InvalidSegment(e) => e.code(),
SysDbError::InvalidSegmentsCount => ErrorCodes::Internal,
SysDbError::InvalidMetadata(e) => e.code(),
SysDbError::InvalidDimension(_) => ErrorCodes::Internal,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Validation errors caused by invalid user input (SchemaMissing, InvalidSchemaJson, InvalidSegmentsCount, InvalidDimension) should map to ErrorCodes::InvalidArgument (400) rather than ErrorCodes::Internal (500). This ensures the client receives the correct status code for bad requests.

Context for Agents
Validation errors caused by invalid user input (`SchemaMissing`, `InvalidSchemaJson`, `InvalidSegmentsCount`, `InvalidDimension`) should map to `ErrorCodes::InvalidArgument` (400) rather than `ErrorCodes::Internal` (500). This ensures the client receives the correct status code for bad requests.

File: rust/rust-sysdb/src/types.rs
Line: 688

@sanketkedia sanketkedia force-pushed the 01-07-_enh_create_collection_impl_in_rust_sysdb branch from 448dadd to 9723f17 Compare January 13, 2026 23:40
@sanketkedia sanketkedia merged commit ddb54af into main Jan 14, 2026
128 of 130 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants