Skip to content

[ENH]: Spanner collection and segments Schemas#6123

Merged
sanketkedia merged 10 commits intomainfrom
01-07-_enh_spanner_collection_and_segments_schemas
Jan 12, 2026
Merged

[ENH]: Spanner collection and segments Schemas#6123
sanketkedia merged 10 commits intomainfrom
01-07-_enh_spanner_collection_and_segments_schemas

Conversation

@sanketkedia
Copy link
Copy Markdown
Contributor

@sanketkedia sanketkedia commented Jan 7, 2026

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • create the non-region and region specific collection tables
    • create the segment table which is region specific
    • interleaved them so that they are colocated thus making joins efficient
    • Note: Did not create segment_metadata table as I feel we don't use it anywhere after collection_schema. Will double check later and add it if needed
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

None

Documentation Changes

None

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2026

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Copy Markdown
Contributor Author

sanketkedia commented Jan 7, 2026

@sanketkedia sanketkedia marked this pull request as ready for review January 7, 2026 20:21
@propel-code-bot
Copy link
Copy Markdown
Contributor

propel-code-bot bot commented Jan 7, 2026

It also introduces region-specific compaction cursor and collection metadata schemas, surfaces errors when decoding tenant resource names, and wires up the SysDB server stub for incrementing the compaction failure counter while leaving the handler unimplemented.

This summary was automatically generated by @propel-code-bot

Copy link
Copy Markdown
Contributor

@tanujnay112 tanujnay112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index name seems fishy

Comment on lines +19 to +20
) PRIMARY KEY (collection_id, region),
INTERLEAVE IN PARENT collections
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] The INTERLEAVE IN PARENT clause is missing ON DELETE CASCADE. This prevents deleting a collections row if it has associated compaction cursors. If the intention is to allow hard deletes of collections (e.g., via garbage collection), you must manually delete all children first without this clause.

This same pattern applies to migrations/0007-create_segments.spanner.sql (segments in cursors) and migrations/0009-create_collection_metadata.spanner.sql (metadata in collections).

Consider adding ON DELETE CASCADE to all interleaved tables to simplify the deletion lifecycle.

Context for Agents
The `INTERLEAVE IN PARENT` clause is missing `ON DELETE CASCADE`. This prevents deleting a `collections` row if it has associated compaction cursors. If the intention is to allow hard deletes of collections (e.g., via garbage collection), you must manually delete all children first without this clause.

This same pattern applies to `migrations/0007-create_segments.spanner.sql` (segments in cursors) and `migrations/0009-create_collection_metadata.spanner.sql` (metadata in collections).

Consider adding `ON DELETE CASCADE` to all interleaved tables to simplify the deletion lifecycle.

File: rust/spanner-migrations/migrations/0006-create_collection_compaction_cursors.spanner.sql
Line: 20

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok, I'll pay the cost of maintaining the integrity at my layer

updated_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
file_paths JSON
) PRIMARY KEY (collection_id, region, id),
INTERLEAVE IN PARENT collection_compaction_cursors
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Performance] Interleaving collection_segments in collection_compaction_cursors creates a significant risk of lock contention.

Since collection_compaction_cursors tracks mutable state (like last_compacted_offset), its rows will likely be updated frequently by compaction jobs. Inserting rows into collection_segments requires a lock on the parent row to verify existence. This means ingestion (adding segments) and compaction (updating cursor) will contend for locks on the same cursor row, potentially serializing operations per region.

Recommendation: Interleave collection_segments directly in collections.

With PRIMARY KEY (collection_id, region, id), the segments will still be physically colocated by region (due to lexicographical key ordering), achieving the same query efficiency without coupling ingestion to compaction state stability.

Context for Agents
Interleaving `collection_segments` in `collection_compaction_cursors` creates a significant risk of lock contention.

Since `collection_compaction_cursors` tracks mutable state (like `last_compacted_offset`), its rows will likely be updated frequently by compaction jobs. Inserting rows into `collection_segments` requires a lock on the parent row to verify existence. This means ingestion (adding segments) and compaction (updating cursor) will contend for locks on the same cursor row, potentially serializing operations per region.

**Recommendation:** Interleave `collection_segments` directly in `collections`.

With `PRIMARY KEY (collection_id, region, id)`, the segments will still be physically colocated by region (due to lexicographical key ordering), achieving the same query efficiency without coupling ingestion to compaction state stability.

File: rust/spanner-migrations/migrations/0007-create_segments.spanner.sql
Line: 16

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compaction updates both this table and the parent table

@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 8943cfb to 118bdf5 Compare January 9, 2026 21:25
is_deleted BOOL DEFAULT (false),
created_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
updated_at TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)
) PRIMARY KEY (collection_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended

[Reliability] Consider defining Foreign Key constraints for database_id and tenant_id to enforce referential integrity at the database level. Without them, orphaned records could exist if parent entities are deleted (unless strict soft-delete logic is guaranteed at the application layer).

Context for Agents
Consider defining Foreign Key constraints for `database_id` and `tenant_id` to enforce referential integrity at the database level. Without them, orphaned records could exist if parent entities are deleted (unless strict soft-delete logic is guaranteed at the application layer).

File: rust/spanner-migrations/migrations/0004-create_collections.spanner.sql
Line: 13

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pay write time overhead esp if it can involve a network RTT in distributed sql such as spanner?

name STRING(MAX) NOT NULL,
dimension INT64,
database_id STRING(MAX) NOT NULL,
database_name STRING(MAX) NOT NULL,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Storing database_name here creates a denormalized dependency. If a database is renamed in the source-of-truth databases table, this column will become stale unless explicitly synchronized.

Unless this is a specific read-optimization trade-off, consider removing this column and joining with the databases table on database_id when the name is needed.

Context for Agents
Storing `database_name` here creates a denormalized dependency. If a database is renamed in the source-of-truth `databases` table, this column will become stale unless explicitly synchronized. 

Unless this is a specific read-optimization trade-off, consider removing this column and joining with the `databases` table on `database_id` when the name is needed.

File: rust/spanner-migrations/migrations/0004-create_collections.spanner.sql
Line: 8

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't support renaming databases. Also our http routes are defined in terms of db names so there's a tight coupling and non trivial overhead to renaming databases anyways. This is done here as. a read-time optimization to avoid joining with the database table

@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 118bdf5 to 432297c Compare January 9, 2026 22:19
@sanketkedia sanketkedia changed the base branch from 01-06-_enh_database_and_tenant_crud_enhancements to graphite-base/6123 January 9, 2026 22:46
@sanketkedia sanketkedia changed the base branch from graphite-base/6123 to main January 9, 2026 22:49
Comment on lines +200 to +207
let mut name_check_stmt = Statement::new(
"SELECT id FROM databases WHERE name = @name AND tenant_id = @tenant_id AND is_deleted = FALSE",
);
name_check_stmt.add_param("name", &db_name);
name_check_stmt.add_param("tenant_id", &tenant_id);

let mut name_iter = tx.query(name_check_stmt).await?;
if name_iter.next().await?.is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Similar to create_tenant, this logic fails if a database with the same name was previously soft-deleted.

The uniqueness check filters out is_deleted = TRUE rows, but the INSERT will violate the unique index on (tenant_id, name) (and the PK on id if reused) because the index includes soft-deleted rows.

Handle the conflict by checking for existing soft-deleted rows and deciding whether to revive them or return a specific error.

Context for Agents
Similar to `create_tenant`, this logic fails if a database with the same name was previously soft-deleted.

The uniqueness check filters out `is_deleted = TRUE` rows, but the `INSERT` will violate the unique index on `(tenant_id, name)` (and the PK on `id` if reused) because the index includes soft-deleted rows.

Handle the conflict by checking for existing soft-deleted rows and deciding whether to revive them or return a specific error.

File: rust/rust-sysdb/src/spanner.rs
Line: 207

Comment on lines 86 to 92

let mut iter = tx.query(check_stmt).await?;

// If tenant doesn't exist, insert it using DML
// If tenant doesn't exist, insert it otherwise ignore for idempotency
// Set last_compaction_time to Unix epoch (0) by default
if iter.next().await?.is_none() {
let mut insert_stmt = Statement::new(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] This creates a potential bug where creating a tenant that was previously soft-deleted will fail with a database constraint violation (PK violation) instead of handling it gracefully.

The check is_deleted = FALSE ignores soft-deleted rows, but the subsequent INSERT will collide with the existing soft-deleted row's primary key id.

Consider checking for the row existence regardless of is_deleted status, and then either reviving the tenant (UPDATE is_deleted = false) or returning a specific error.

Context for Agents
This creates a potential bug where creating a tenant that was previously soft-deleted will fail with a database constraint violation (PK violation) instead of handling it gracefully.

The check `is_deleted = FALSE` ignores soft-deleted rows, but the subsequent `INSERT` will collide with the existing soft-deleted row's primary key `id`.

Consider checking for the row existence regardless of `is_deleted` status, and then either reviving the tenant (UPDATE `is_deleted` = false) or returning a specific error.

File: rust/rust-sysdb/src/spanner.rs
Line: 92

@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 432297c to 50a6033 Compare January 9, 2026 23:02
@blacksmith-sh

This comment has been minimized.

@sanketkedia sanketkedia mentioned this pull request Jan 11, 2026
1 task
@sanketkedia sanketkedia force-pushed the 01-07-_enh_spanner_collection_and_segments_schemas branch from 2398ba0 to 3890bf5 Compare January 12, 2026 15:49
size_bytes_post_compaction INT64 DEFAULT (0),
num_versions INT64 DEFAULT (0),
version_file_name STRING(MAX),
last_compaction_time_secs TIMESTAMP OPTIONS (allow_commit_timestamp=true),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended

[Maintainability] The column name last_compaction_time_secs implies an integer value representing seconds (e.g., Unix epoch), but the type is TIMESTAMP with allow_commit_timestamp=true. This naming convention is misleading as TIMESTAMP in Spanner is a specific data type, not just a count of seconds.

Consider renaming to last_compaction_time or last_compaction_at to accurately reflect the data type and avoid confusion.

Context for Agents
The column name `last_compaction_time_secs` implies an integer value representing seconds (e.g., Unix epoch), but the type is `TIMESTAMP` with `allow_commit_timestamp=true`. This naming convention is misleading as `TIMESTAMP` in Spanner is a specific data type, not just a count of seconds.

Consider renaming to `last_compaction_time` or `last_compaction_at` to accurately reflect the data type and avoid confusion.

File: rust/spanner-migrations/migrations/0005-create_collection_compaction_cursors.spanner.sql
Line: 14

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All it means is that I am keeping time in secs since epoch

Comment on lines +484 to +489
async fn increment_compaction_failure_count(
&self,
_request: Request<IncrementCompactionFailureCountRequest>,
) -> Result<Response<IncrementCompactionFailureCountResponse>, Status> {
todo!()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Reliability] Using todo!() will cause the server to panic if this endpoint is triggered. For unimplemented gRPC methods, it's safer to return a Status::unimplemented error code. This ensures the service remains stable even if the endpoint is called accidentally.

[FileRule: rust/language_specific.instructions.rust]

Context for Agents
Using `todo!()` will cause the server to panic if this endpoint is triggered. For unimplemented gRPC methods, it's safer to return a `Status::unimplemented` error code. This ensures the service remains stable even if the endpoint is called accidentally.

[FileRule: rust/language_specific.instructions.rust]

File: rust/rust-sysdb/src/server.rs
Line: 489

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal

@sanketkedia sanketkedia merged commit f990c89 into main Jan 12, 2026
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants