[BFTree] Add RangeIndex cluster migration#1731
Conversation
61a467b to
3a6e52f
Compare
3a6e52f to
dc78c02
Compare
74564ec to
74a1e44
Compare
5754d5d to
ac06150
Compare
Missing RangeIndex feature-enabled guardsThis PR doesn't consistently check whether the RangeIndex feature is enabled (i.e., whether 1. Source side —
|
vazois
left a comment
There was a problem hiding this comment.
libs/cluster/Server/Migration/MigrateScanFunctions.cs:53 — maybe not for this PR, but can't we just get these values from an enum?
vazois
left a comment
There was a problem hiding this comment.
libs/cluster/Server/Migration/MigrateSessionKeys.cs:43 — seems really wasteful to maintain a separate dictionary just for skipping keys. We can potentially store the info for the key type inline and skip the key once we read it from the sketch list
Co-authored-by: Copilot <[email protected]>
An accidental undo removed the trailing 'return true;' after the ReceivingFileData try/catch, causing case fall-through (CS0163) and a non-returning path (CS0161). Restore it: when the file is not yet fully received, the case falls past the try and returns true to await more chunks. Co-authored-by: Copilot <[email protected]>
Cover the two remaining deserializer Error paths that lacked tests: split file-size header (fewer than 8 bytes at WaitingForFileHeader) and a trailer chunk smaller than the 12-byte hash+stubLen header. Co-authored-by: Copilot <[email protected]>
…pose throws fileStream.Dispose() can throw; previously that would skip the temp snapshot file deletion (leaking it) and propagate out of Dispose. Wrap the stream disposal in try/catch (log + null in finally) so the temp-file cleanup always runs and Dispose never throws. Co-authored-by: Copilot <[email protected]>
…Reader ReadNextChunkAsync was untested. Add a ChunkDriver-parameterized round-trip body shared between the serializer test helper and the real async RangeIndexMigrationReader, covering single/multi-chunk, key-larger-than-chunk, small-chunk, and exact-fill shapes against both drivers. Plus reader-specific truncation (zero-byte read throws) and cancellation tests. Reuses one assertion body to avoid duplication. Co-authored-by: Copilot <[email protected]>
Clarify how a caller knows serialization is done: loop while IsComplete is false, send each chunk; IsComplete flips true after the final bytes are framed, and a 0 return while not complete signals a too-small buffer. Matches the TransmitRangeIndexAsync caller. Co-authored-by: Copilot <[email protected]>
A chunk smaller than the trailer ([8-byte hash][4-byte stubLen][stub]) can never be framed, so the stream would never complete. Add a public MinChunkSize const and throw ArgumentOutOfRangeException for any smaller chunkSize. Add Reader_ChunkSizeBelowMinimumThrows test. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
The minimum chunk size is a property of the serializer's wire format (the trailer is the largest single-chunk framing element), not the reader. Define MinChunkSize on RangeIndexChunkedSerializer and have RangeIndexMigrationReader.MinChunkSize delegate to it. Co-authored-by: Copilot <[email protected]>
Below-minimum chunkSize was already covered (Reader_ChunkSizeBelowMinimumThrows). Add RoundTrip_ExactMinChunkSize to verify the boundary: a chunk of exactly MinChunkSize successfully completes a stream (trailer just fits) through both the serializer helper and the real reader. Co-authored-by: Copilot <[email protected]>
The forward-progress minimum (trailer size) applies to the destination buffer that ReadNextChunkAsync frames into, not to the chunkSize-sized internal file-read buffer. Move the MinChunkSize check into ReadNextChunkAsync (throws ArgumentException on a too-small destination), relax the constructor to require chunkSize > 0, and drop the misplaced RangeIndexMigrationReader.MinChunkSize (callers use RangeIndexChunkedSerializer.MinChunkSize). Replace the constructor test with constructor-non-positive and destination-too-small tests. Co-authored-by: Copilot <[email protected]>
The earlier doc described the pre-validation behavior (return 0 on a too-small buffer; caller treats as error). Now that a too-small destination throws ArgumentException upfront, document that the method always returns a positive count while incomplete, and drop the stale internal-read-buffer mention from the guard comment. Co-authored-by: Copilot <[email protected]>
The remarks duplicated the skip-on-existing-key behavior already evident from the method body and the PublishMigratedIndexResult enum docs. Co-authored-by: Copilot <[email protected]>
Correct the migration section (E) and its rows in section G to reflect the shipped code: receive class/file is RangeIndexMigrationReceiveState in RangeIndexMigrationReceiveSession.cs; finalization is RangeIndexManager.PublishMigratedIndex (slot IMPORTING check + KeyExists gate + replaceOption skip outcomes via PublishMigratedIndexResult), not a deserializer Publish(); reader ctor takes tempFilePath; KEYS path discovery returns keys only (no stub, TOCTOU) and skips via the rangeIndexKeysToMigrate set; single SerializedRangeIndexStream=4 wire tag; drop stale aspirational names (HandleMigratedRangeIndexChunk/Stub, MigrateSessionRangeIndex.cs, RangeIndexSnapshotChunk/RangeIndexStub). Changes remain scoped to the migration section. Co-authored-by: Copilot <[email protected]>
Replace the verbose <remarks> block on KeyExists with a one-line comment at the return explaining the Found/IsWrongType check. Co-authored-by: Copilot <[email protected]>
Cover what happens to a migrated RangeIndex tree across the destination node's storage lifecycle: checkpoint/flush of a migrated-then-modified tree, checkpoint + process restart recovery, delete (and recreate at the same key), back-migration round-trip + restart, and restart-then-continue. Co-authored-by: Copilot <[email protected]>
PublishMigratedIndex now treats only a Created RICREATE result as success; an InPlaceUpdated/CopyUpdated result means a concurrent write raced the publish (the KeyExists gate already proved nothing existed), so the recovered BfTree is discarded with a warning. Adds TODO(RI) notes for handling the IMPORTING-state write race, AOF propagation of the migrated file, MIGRATE REPLACE support, and checkpoint-vs-snapshot interaction. Co-authored-by: Copilot <[email protected]>
…rd branch Rework the IMPORTING-state race TODO for clarity, and split the non-Created RICREATE handling: an InPlaceUpdated/CopyUpdated result (a concurrent ASKING write that raced the publish) is logged but the bfTree is left intact since the updated record may now reference it; any other status logs and disposes the unregistered tree. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Removes four cluster-migration tests that crash the test host via a native bf-tree panic (mini_page_op / circular_buffer assertion): - ClusterMigrateRangeIndexWhileModifyingAsync - ClusterMigrateRangeIndexWhileReadingAndWritingAsync - ClusterMigrateRangeIndexSlowTransmitWithConcurrentOpsAsync (drop-vs-insert use-after-free: the deferred bftree_drop on the migration delete races a concurrent InsertByPtr that runs under the shared lock but outside epoch protection) - ClusterMigrateRangeIndexLargeTree (concurrency-independent: snapshotting/dropping a large multi-page tree during migration crashes) These are preserved (marked [Explicit]) in the RangeIndex crash/recovery PR so the bugs stay tracked. The remaining migration suite now runs green. Co-authored-by: Copilot <[email protected]>
…nk count The test waited on CurrentChunkCount==1 then immediately asserted the temp snapshot file exists. But OnChunkReceived bumps the chunk count just BEFORE ProcessChunk creates the file, so under load the main thread could observe the count and check for the file in that gap, seeing 0. Wait for the temp file to appear (created by ProcessChunk, immediately before the pause point) instead, which reliably proves the worker is parked in-flight. Co-authored-by: Copilot <[email protected]>
The cluster test step ran dotnet test with no dump collection, so a native test-host crash (e.g. the RangeIndex bf-tree panic/abort that aborts the run) left no dump to analyze. Add --blame-crash (full) and --blame-hang (30m, full) so a crash or hang produces a dump in the results directory, which the existing always() upload-artifact step publishes. --blame-crash wires up the runtime createdump handler, capturing native SIGABRT crashes too. Co-authored-by: Copilot <[email protected]>
Summary
Adds end-to-end cluster migration for RangeIndex keys, supporting both
MIGRATE SLOTSandMIGRATE KEYSpaths. RangeIndex keys are backed by a native BfTree whose on-disk state (data.bftree) lives outside Tsavorite — shipping just the 51-byte stub is insufficient. This change streams the entire BfTree snapshot file alongside the stub over the existing migration transport.Architecture
RangeIndexChunkedSerializer): Pure state machine overSpan<byte>— no I/O. Takes file data as input viaMoveNext(dest, fileData, out consumed).RangeIndexMigrationReader): Async wrapper that reads the snapshot file and feeds bytes to the serializer.RangeIndexChunkedDeserializer): Sync state machine that writes received file data to a temp file, validates xxHash64 checksum, recovers the native BfTree, and publishes the stub to the store.RangeIndexManager.SnapshotRangeIndexAndCreateReader): Snapshots the BfTree under an exclusive lock to a temp file, then creates the reader.Wire format
Single
MigrationRecordSpanType.SerializedRangeIndexStream(tag 4). Stream format across chunks:[4B keyLen][key bytes][8B fileSize][file bytes][8B xxHash64][4B stubLen][stub]Key and file bytes may span chunks; all other elements must fit within a single chunk.
SLOTS path
RecordType == 2, captures toMigrateOperation.RangeIndexesMigrateRangeIndexKeysAsyncruns a sketch-protected batch cycle:TransmitRangeIndexAsyncfinally: clear sketch (unblocks clients)KEYS path
GetRangeIndexKeysForMigrationdiscovers RI keys via RIGETTransmitKeysAsyncskips RI keys (inrangeIndexKeysToIgnore)TransmitRangeIndexAsync, then marked in sketch forDeleteKeysAsyncBug fixes
GarnetServer.cs)Publishdeletes existing data file before movePublishregistration: acceptInPlaceUpdated/CopyUpdatedstatusTransmitRangeIndexAsynccatches all exceptions (never throws)Tests
TODO