Skip to content

Bump bf-tree to 0.5.4#1891

Merged
tiagonapoli merged 9 commits into
mainfrom
tiagonapoli/bump-bftree-0.5.4
Jun 22, 2026
Merged

Bump bf-tree to 0.5.4#1891
tiagonapoli merged 9 commits into
mainfrom
tiagonapoli/bump-bftree-0.5.4

Conversation

@tiagonapoli

@tiagonapoli tiagonapoli commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

What

Bumps the native bf-tree dependency from 0.5.0 to 0.5.4 (latest), fixing a crash where dropping a CprSnapshot-recovered, multi-level tree aborts the process in BfTree''s Drop with assertion failed: next_level.is_null() at mini_page_op.rs:429.

This is the root cause of the intermittent RangeIndex cluster-migration crashes — the migration destination index is a CprSnapshot-recovered tree, which aborts when dropped at shutdown. (Fixed upstream in bf-tree 0.5.3; 0.5.4 is the latest.)

Required bf-tree API changes (0.5.0 → 0.5.4)

The snapshot/recovery API was redesigned; the FFI wrapper (src/lib.rs) and the managed call sites were adapted accordingly:

0.5.0 0.5.4 adaptation
Config::snapshot_file_path(path) configured the snapshot destination removed no longer set on Config; the destination is passed at call time
BfTree::cpr_snapshot() BfTree::cpr_snapshot(path) the path is now a call-time argument
BfTree::new_from_cpr_snapshot(recovery, use_snapshot, new_snapshot_path, buf_ptr, buf_size, wal) new_from_cpr_snapshot(recovery, use_snapshot, buf_ptr, buf_size, wal) dropped the new_snapshot_path argument
BfTree::are_all_threads_in_next_version() are_all_threads_in_next_snapshot_version() renamed

Data persistence is unchanged: records still accumulate in the data file (Config::file_path, e.g. <hash>.data.bftree); cpr_snapshot(path) writes a self-contained checkpoint to the given path via a non-blocking versioned sweep. Only the timing of naming the snapshot path moved (Config → call-time arg).

Commits

  1. Bump bf-tree to 0.5.4Cargo.toml/Cargo.lock, adapt the FFI wrapper, rebuild win-x64/linux-x64 binaries. bftree_cpr_snapshot now takes the destination path as a parameter; managed callers pass it (BfTreeService.CprSnapshot()_snapshotFilePath; CprSnapshotByPtr(handle, snapshotPath)). bftree_create / bftree_new_from_cpr_snapshot take a use_snapshot flag (the snapshot destination is supplied at cpr_snapshot call time, so no path is configured at create/recover).
  2. Snapshot directly to destination (drop scratch+copy) — now that the destination is a call-time argument, the OnFlush / checkpoint / migration paths snapshot straight to their per-flush / per-checkpoint / migration file instead of snapshotting to a shared scratch file and File.Copy-ing it. The scratch+copy existed only because 0.5.0 fixed the snapshot path in Config and held a persistent file descriptor for it. The per-tree SnapshotInProgress claim is kept (bftree silently no-ops a racing cpr_snapshot), so concurrent consumers serialize and each writes its own destination; TreeEntry.WaitForSnapshot is removed. The scratch path remains only as the use_snapshot enabler at tree create/recover time.
  3. Docs — update the RangeIndex dev doc to the direct-to-destination snapshot flow.

Verification

  • Full RangeIndex suite: 64/64 on Windows; recover/checkpoint subset on Linux.
  • Cluster migrate RangeIndex suite: 19/19 on Windows (exercises the changed migration snapshot path).
  • A multi-level recover-then-drop (1000 records) that aborted on 0.5.0 now drops cleanly with all records intact, exercised end-to-end through the FFI.
  • Rebuilt the win-x64 (.dll/.pdb) and linux-x64 (.so) prebuilt binaries.

Note on macOS binaries

The osx-x64/osx-arm64 prebuilt binaries are not rebuilt here (no macOS cross-compile available). Dev machines with cargo rebuild from source for all platforms, and the release pipeline regenerates the prebuilts. A maintainer on macOS (or the pipeline) should refresh those two binaries.

Copilot AI review requested due to automatic review settings June 19, 2026 19:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Garnet’s native bf-tree dependency to 0.5.4 to pick up the upstream fix for a recover-then-drop abort in BfTree’s Drop (seen in RangeIndex cluster-migration shutdown paths). To keep the managed P/Invoke surface unchanged, the Rust FFI layer now wraps the tree plus its configured snapshot destination in an opaque handle and supplies the snapshot path at cpr_snapshot call time.

Changes:

  • Bump bf-tree from 0.5.x to 0.5.4 (fix included since 0.5.3).
  • Adapt Rust FFI to bf-tree’s new CPR snapshot API by introducing BfTreeHandle { tree, snapshot_path } while keeping the exported C ABI function names/arg lists stable.
  • Update lockfile to reflect the new dependency graph pulled in by bf-tree 0.5.4.

Reviewed changes

Copilot reviewed 2 out of 6 changed files in this pull request and generated 2 comments.

File Description
libs/native/bftree-garnet/src/lib.rs Wraps BfTree + snapshot path in BfTreeHandle and updates snapshot/recovery calls to match bf-tree 0.5.3+ API while preserving the existing C ABI.
libs/native/bftree-garnet/Cargo.toml Pins bf-tree dependency to version 0.5.4.
libs/native/bftree-garnet/Cargo.lock Updates resolved crate versions/checksums to match the new bf-tree version.

Comment thread libs/native/bftree-garnet/src/lib.rs
Comment thread libs/native/bftree-garnet/src/lib.rs
@tiagonapoli tiagonapoli changed the title Bump bf-tree to 0.5.4 (fixes RangeIndex recover-then-drop crash) Bump bf-tree to 0.5.4 Jun 19, 2026
@tiagonapoli tiagonapoli force-pushed the tiagonapoli/bump-bftree-0.5.4 branch from 2ed38ec to 77b9c42 Compare June 19, 2026 19:54
@tiagonapoli

tiagonapoli commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator Author

How the snapshot path is threaded (updated)

bf-tree 0.5.3+ takes the CPR snapshot destination as an argument of BfTree::cpr_snapshot(path). bftree_cpr_snapshot now takes the destination path as a parameter and the managed callers pass it:

  • BfTreeService.CprSnapshot() passes its _snapshotFilePath.
  • CprSnapshotByPtr(nint handle, string snapshotPath) — the RangeIndex OnFlush / checkpoint / migration sites now snapshot directly to their final destination (the per-flush / per-checkpoint / migration file), removing the previous scratch-file + File.Copy staging (which only existed because 0.5.0 fixed the snapshot path in Config and held a persistent FD for it).
  • The per-tree SnapshotInProgress claim is kept (bftree silently no-ops a racing cpr_snapshot); TreeEntry.WaitForSnapshot was removed.
  • bftree_create / bftree_new_from_cpr_snapshot keep their path arguments but use them only to toggle use_snapshot (the scratch path now serves only as that enabler; the scratch file is no longer written).

(This supersedes the earlier BfTreeHandle and scratch+copy notes — both were dropped in favor of passing the destination path explicitly.)

@tiagonapoli tiagonapoli force-pushed the tiagonapoli/bump-bftree-0.5.4 branch 4 times, most recently from 8582942 to 24187c8 Compare June 19, 2026 22:45
Tiago Napoli and others added 3 commits June 19, 2026 15:46
bf-tree 0.5.0 aborts in BfTree's Drop with 'assertion failed:
next_level.is_null()' (mini_page_op.rs:429) when a CprSnapshot-recovered tree
that is multi-level is dropped. This is the root cause of the intermittent
RangeIndex cluster-migration crashes (the migration destination index is a
CprSnapshot-recovered tree). Bisected: present in 0.5.0/0.5.1, fixed in 0.5.3+.

Bump bf-tree to 0.5.4 and adapt the C FFI wrapper to its redesigned snapshot API:
- 0.5.3+ takes the CPR snapshot destination as an argument of
  BfTree::cpr_snapshot(path) instead of a Config field, and dropped the
  recovered-tree snapshot-path arg from new_from_cpr_snapshot. Rather than store
  the path natively, bftree_cpr_snapshot now takes the destination path as a
  parameter; the managed callers already hold it (BfTreeService._snapshotFilePath
  for the instance API, and the per-tree scratch path at the RangeIndex
  OnFlush/checkpoint/migration call sites) and pass it through.
- are_all_threads_in_next_version -> are_all_threads_in_next_snapshot_version.

Rebuilt the win-x64 and linux-x64 prebuilt binaries. Verified via the full
RangeIndex test suite (64 tests on Windows, recover/checkpoint subset on Linux):
the multi-level recover-then-drop that aborted on 0.5.0 now drops cleanly with
all records intact.

Note: osx-x64/osx-arm64 prebuilt binaries are not rebuilt here (no macOS
cross-compile); dev machines with cargo rebuild from source, and the release
pipeline / a macOS maintainer regenerates them.

Co-authored-by: Copilot <[email protected]>
With bf-tree 0.5.3+ taking the CPR snapshot destination as a cpr_snapshot(path)
argument, the OnFlush / checkpoint / migration paths can snapshot straight to
their final per-flush / per-checkpoint / migration file instead of snapshotting
to a shared scratch file and File.Copy-ing it.

The scratch+copy was a workaround for 0.5.0, where the snapshot path was fixed in
Config and bftree held a persistent file descriptor for it (so a finalized file
had to be a copy, and racing consumers shared one snapshot via WaitForSnapshot).
In 0.5.4 cpr_snapshot creates a fresh, self-contained file per call, so:
- each site snapshots directly to its destination (no scratch file, no copy);
- the per-tree SnapshotInProgress claim is kept (bftree silently no-ops a racing
  cpr_snapshot), so concurrent flush/checkpoint/migration serialize and each
  produces its own destination file;
- TreeEntry.WaitForSnapshot is removed (no longer needed).

The scratch path remains only as the use_snapshot enabler passed at tree
create/recover time; the scratch file itself is no longer written.

Verified: RangeIndex suite (63) and cluster migrate RangeIndex suite (19) pass.

Co-authored-by: Copilot <[email protected]>
Update the RangeIndex dev doc to match the bf-tree 0.5.4 snapshot flow:
CprSnapshotByPtr now takes the destination path and the OnFlush / checkpoint /
migration paths write the CPR snapshot directly to the per-flush / per-token /
migration file (no scratch.cpr staging + File.Copy). scratch.cpr remains only as
the use_snapshot enabler passed at create/recover; it is no longer written.

Co-authored-by: Copilot <[email protected]>
@tiagonapoli tiagonapoli force-pushed the tiagonapoli/bump-bftree-0.5.4 branch from 24187c8 to f26aecc Compare June 19, 2026 22:46
Tiago Napoli and others added 4 commits June 22, 2026 11:04
…ots toggle

Since cpr_snapshot now takes the destination path as a call-time argument,
the per-tree scratch path configured at construction was only used to derive
the use_snapshot on/off flag. Replace it with an explicit bool:

- BfTreeService: ctor and RecoverFromCprSnapshot take `enableSnapshots`
  instead of a snapshot file path; drop the unused _snapshotFilePath field,
  SnapshotFilePath property, and the now-callerless CprSnapshot() method.
- RangeIndexManager: remove LogScratchPath/LogScratchPathFor and the
  <hash>.scratch.cpr file; CreateTree/RestoreTree/migration recovery pass
  the bool. Refresh stale scratch+copy doc comments.

Co-authored-by: Copilot <[email protected]>
…shot docs

- Add TreeEntry.SnapshotUnderClaim(destinationPath): single home for the
  spin-claim + cpr_snapshot + release pattern previously duplicated across
  OnFlush (live/cold), checkpoint, and migration paths. Behavior unchanged
  (still spins on the per-tree atomic).
- Refresh stale doc comments left by the snapshot-direct refactor: drop the
  obsolete 'File.Move/copy the scratch file' wording in the checkpoint summary
  and in the bftree_cpr_snapshot FFI docs (C# + Rust).
- Document the CprSnapshotByPtr caller contract: it does not self-serialize;
  a racing cpr_snapshot on the same tree silently no-ops while still returning
  success, so callers must hold external per-tree serialization.

Co-authored-by: Copilot <[email protected]>
The BfTreeInterop test project was orphaned: not referenced by Garnet.slnx
(so CI never built it) and still calling a long-removed API (Snapshot(),
RecoverFromSnapshot), leaving it uncompilable.

- Rewrite the snapshot/recovery tests against the current API: snapshot via
  the static CprSnapshotByPtr(handle, path) and recover via
  RecoverFromCprSnapshot(path, enableSnapshots, backend). Round-trips now
  snapshot to a distinct file and recover from it, mirroring real RangeIndex
  usage.
- Replace tests asserting outdated behavior: nonexistent-file recovery now
  throws (native returns null) instead of 'creates empty'; memory-backed
  snapshot is now a working round-trip (bftree supports CPR for memory-backed
  trees) instead of the old 'throws NotSupportedException' premise.
- Add argument-validation tests for null handle / empty paths.
- Fix latent analyzer/format issues surfaced now that the project builds
  (IDE0300, final-newline).
- Add the project to Garnet.slnx so CI builds it on every TF/OS/config and
  catches API drift like this in the future.

All 38 tests pass on net8.0 and net10.0.

Co-authored-by: Copilot <[email protected]>
Move test/BfTreeInterop.test -> test/standalone/BfTreeInterop.test so it lives
with the other standalone test projects and is reachable by the matrix command
'dotnet test test/standalone/<name>'. Add 'BfTreeInterop.test' to the
test-garnet-standalone matrix so CI actually runs it (not just builds it), and
update the relative ProjectReference depth and the Garnet.slnx path for the new
location.

Co-authored-by: Copilot <[email protected]>
Tiago Napoli and others added 2 commits June 22, 2026 11:45
The BfTreeInterop.test project was already orphaned and non-compiling before
this branch (not in Garnet.slnx, calling a long-removed API). Reverting the
revival keeps this PR focused on the snapshot refactor:

- Restore the project to its original test/BfTreeInterop.test/ location and
  prior contents.
- Remove it from Garnet.slnx and from the ci.yml standalone test matrix.
- Drop the BfTreeInterop.csproj output-root native-copy fallback that was only
  needed to run this test in CI.

Co-authored-by: Copilot <[email protected]>
@tiagonapoli tiagonapoli merged commit 60c1922 into main Jun 22, 2026
300 of 301 checks passed
@tiagonapoli tiagonapoli deleted the tiagonapoli/bump-bftree-0.5.4 branch June 22, 2026 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants