Bump bf-tree to 0.5.4#1891
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates Garnet’s native bf-tree dependency to 0.5.4 to pick up the upstream fix for a recover-then-drop abort in BfTree’s Drop (seen in RangeIndex cluster-migration shutdown paths). To keep the managed P/Invoke surface unchanged, the Rust FFI layer now wraps the tree plus its configured snapshot destination in an opaque handle and supplies the snapshot path at cpr_snapshot call time.
Changes:
- Bump
bf-treefrom 0.5.x to 0.5.4 (fix included since 0.5.3). - Adapt Rust FFI to bf-tree’s new CPR snapshot API by introducing
BfTreeHandle { tree, snapshot_path }while keeping the exported C ABI function names/arg lists stable. - Update lockfile to reflect the new dependency graph pulled in by
bf-tree0.5.4.
Reviewed changes
Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| libs/native/bftree-garnet/src/lib.rs | Wraps BfTree + snapshot path in BfTreeHandle and updates snapshot/recovery calls to match bf-tree 0.5.3+ API while preserving the existing C ABI. |
| libs/native/bftree-garnet/Cargo.toml | Pins bf-tree dependency to version 0.5.4. |
| libs/native/bftree-garnet/Cargo.lock | Updates resolved crate versions/checksums to match the new bf-tree version. |
2ed38ec to
77b9c42
Compare
How the snapshot path is threaded (updated)bf-tree 0.5.3+ takes the CPR snapshot destination as an argument of
(This supersedes the earlier |
8582942 to
24187c8
Compare
bf-tree 0.5.0 aborts in BfTree's Drop with 'assertion failed: next_level.is_null()' (mini_page_op.rs:429) when a CprSnapshot-recovered tree that is multi-level is dropped. This is the root cause of the intermittent RangeIndex cluster-migration crashes (the migration destination index is a CprSnapshot-recovered tree). Bisected: present in 0.5.0/0.5.1, fixed in 0.5.3+. Bump bf-tree to 0.5.4 and adapt the C FFI wrapper to its redesigned snapshot API: - 0.5.3+ takes the CPR snapshot destination as an argument of BfTree::cpr_snapshot(path) instead of a Config field, and dropped the recovered-tree snapshot-path arg from new_from_cpr_snapshot. Rather than store the path natively, bftree_cpr_snapshot now takes the destination path as a parameter; the managed callers already hold it (BfTreeService._snapshotFilePath for the instance API, and the per-tree scratch path at the RangeIndex OnFlush/checkpoint/migration call sites) and pass it through. - are_all_threads_in_next_version -> are_all_threads_in_next_snapshot_version. Rebuilt the win-x64 and linux-x64 prebuilt binaries. Verified via the full RangeIndex test suite (64 tests on Windows, recover/checkpoint subset on Linux): the multi-level recover-then-drop that aborted on 0.5.0 now drops cleanly with all records intact. Note: osx-x64/osx-arm64 prebuilt binaries are not rebuilt here (no macOS cross-compile); dev machines with cargo rebuild from source, and the release pipeline / a macOS maintainer regenerates them. Co-authored-by: Copilot <[email protected]>
With bf-tree 0.5.3+ taking the CPR snapshot destination as a cpr_snapshot(path) argument, the OnFlush / checkpoint / migration paths can snapshot straight to their final per-flush / per-checkpoint / migration file instead of snapshotting to a shared scratch file and File.Copy-ing it. The scratch+copy was a workaround for 0.5.0, where the snapshot path was fixed in Config and bftree held a persistent file descriptor for it (so a finalized file had to be a copy, and racing consumers shared one snapshot via WaitForSnapshot). In 0.5.4 cpr_snapshot creates a fresh, self-contained file per call, so: - each site snapshots directly to its destination (no scratch file, no copy); - the per-tree SnapshotInProgress claim is kept (bftree silently no-ops a racing cpr_snapshot), so concurrent flush/checkpoint/migration serialize and each produces its own destination file; - TreeEntry.WaitForSnapshot is removed (no longer needed). The scratch path remains only as the use_snapshot enabler passed at tree create/recover time; the scratch file itself is no longer written. Verified: RangeIndex suite (63) and cluster migrate RangeIndex suite (19) pass. Co-authored-by: Copilot <[email protected]>
Update the RangeIndex dev doc to match the bf-tree 0.5.4 snapshot flow: CprSnapshotByPtr now takes the destination path and the OnFlush / checkpoint / migration paths write the CPR snapshot directly to the per-flush / per-token / migration file (no scratch.cpr staging + File.Copy). scratch.cpr remains only as the use_snapshot enabler passed at create/recover; it is no longer written. Co-authored-by: Copilot <[email protected]>
24187c8 to
f26aecc
Compare
…ots toggle Since cpr_snapshot now takes the destination path as a call-time argument, the per-tree scratch path configured at construction was only used to derive the use_snapshot on/off flag. Replace it with an explicit bool: - BfTreeService: ctor and RecoverFromCprSnapshot take `enableSnapshots` instead of a snapshot file path; drop the unused _snapshotFilePath field, SnapshotFilePath property, and the now-callerless CprSnapshot() method. - RangeIndexManager: remove LogScratchPath/LogScratchPathFor and the <hash>.scratch.cpr file; CreateTree/RestoreTree/migration recovery pass the bool. Refresh stale scratch+copy doc comments. Co-authored-by: Copilot <[email protected]>
…shot docs - Add TreeEntry.SnapshotUnderClaim(destinationPath): single home for the spin-claim + cpr_snapshot + release pattern previously duplicated across OnFlush (live/cold), checkpoint, and migration paths. Behavior unchanged (still spins on the per-tree atomic). - Refresh stale doc comments left by the snapshot-direct refactor: drop the obsolete 'File.Move/copy the scratch file' wording in the checkpoint summary and in the bftree_cpr_snapshot FFI docs (C# + Rust). - Document the CprSnapshotByPtr caller contract: it does not self-serialize; a racing cpr_snapshot on the same tree silently no-ops while still returning success, so callers must hold external per-tree serialization. Co-authored-by: Copilot <[email protected]>
The BfTreeInterop test project was orphaned: not referenced by Garnet.slnx (so CI never built it) and still calling a long-removed API (Snapshot(), RecoverFromSnapshot), leaving it uncompilable. - Rewrite the snapshot/recovery tests against the current API: snapshot via the static CprSnapshotByPtr(handle, path) and recover via RecoverFromCprSnapshot(path, enableSnapshots, backend). Round-trips now snapshot to a distinct file and recover from it, mirroring real RangeIndex usage. - Replace tests asserting outdated behavior: nonexistent-file recovery now throws (native returns null) instead of 'creates empty'; memory-backed snapshot is now a working round-trip (bftree supports CPR for memory-backed trees) instead of the old 'throws NotSupportedException' premise. - Add argument-validation tests for null handle / empty paths. - Fix latent analyzer/format issues surfaced now that the project builds (IDE0300, final-newline). - Add the project to Garnet.slnx so CI builds it on every TF/OS/config and catches API drift like this in the future. All 38 tests pass on net8.0 and net10.0. Co-authored-by: Copilot <[email protected]>
Move test/BfTreeInterop.test -> test/standalone/BfTreeInterop.test so it lives with the other standalone test projects and is reachable by the matrix command 'dotnet test test/standalone/<name>'. Add 'BfTreeInterop.test' to the test-garnet-standalone matrix so CI actually runs it (not just builds it), and update the relative ProjectReference depth and the Garnet.slnx path for the new location. Co-authored-by: Copilot <[email protected]>
…UnderClaim doc Co-authored-by: Copilot <[email protected]>
The BfTreeInterop.test project was already orphaned and non-compiling before this branch (not in Garnet.slnx, calling a long-removed API). Reverting the revival keeps this PR focused on the snapshot refactor: - Restore the project to its original test/BfTreeInterop.test/ location and prior contents. - Remove it from Garnet.slnx and from the ci.yml standalone test matrix. - Drop the BfTreeInterop.csproj output-root native-copy fallback that was only needed to run this test in CI. Co-authored-by: Copilot <[email protected]>
What
Bumps the native
bf-treedependency from 0.5.0 to 0.5.4 (latest), fixing a crash where dropping a CprSnapshot-recovered, multi-level tree aborts the process inBfTree''sDropwithassertion failed: next_level.is_null()atmini_page_op.rs:429.This is the root cause of the intermittent RangeIndex cluster-migration crashes — the migration destination index is a CprSnapshot-recovered tree, which aborts when dropped at shutdown. (Fixed upstream in bf-tree 0.5.3; 0.5.4 is the latest.)
Required bf-tree API changes (0.5.0 → 0.5.4)
The snapshot/recovery API was redesigned; the FFI wrapper (
src/lib.rs) and the managed call sites were adapted accordingly:Config::snapshot_file_path(path)configured the snapshot destinationConfig; the destination is passed at call timeBfTree::cpr_snapshot()BfTree::cpr_snapshot(path)BfTree::new_from_cpr_snapshot(recovery, use_snapshot, new_snapshot_path, buf_ptr, buf_size, wal)new_from_cpr_snapshot(recovery, use_snapshot, buf_ptr, buf_size, wal)new_snapshot_pathargumentBfTree::are_all_threads_in_next_version()are_all_threads_in_next_snapshot_version()Data persistence is unchanged: records still accumulate in the data file (
Config::file_path, e.g.<hash>.data.bftree);cpr_snapshot(path)writes a self-contained checkpoint to the given path via a non-blocking versioned sweep. Only the timing of naming the snapshot path moved (Config → call-time arg).Commits
Cargo.toml/Cargo.lock, adapt the FFI wrapper, rebuild win-x64/linux-x64 binaries.bftree_cpr_snapshotnow takes the destination path as a parameter; managed callers pass it (BfTreeService.CprSnapshot()→_snapshotFilePath;CprSnapshotByPtr(handle, snapshotPath)).bftree_create/bftree_new_from_cpr_snapshottake ause_snapshotflag (the snapshot destination is supplied atcpr_snapshotcall time, so no path is configured at create/recover).OnFlush/ checkpoint / migration paths snapshot straight to their per-flush / per-checkpoint / migration file instead of snapshotting to a shared scratch file andFile.Copy-ing it. The scratch+copy existed only because 0.5.0 fixed the snapshot path inConfigand held a persistent file descriptor for it. The per-treeSnapshotInProgressclaim is kept (bftree silently no-ops a racingcpr_snapshot), so concurrent consumers serialize and each writes its own destination;TreeEntry.WaitForSnapshotis removed. The scratch path remains only as theuse_snapshotenabler at tree create/recover time.Verification
.dll/.pdb) and linux-x64 (.so) prebuilt binaries.Note on macOS binaries
The
osx-x64/osx-arm64prebuilt binaries are not rebuilt here (no macOS cross-compile available). Dev machines with cargo rebuild from source for all platforms, and the release pipeline regenerates the prebuilts. A maintainer on macOS (or the pipeline) should refresh those two binaries.