Skip to content

Fix flush-async races in Tsavorite LargeObject flush tests#1711

Merged
badrishc merged 2 commits into
devfrom
badrishc/fix-object-flush-test
Apr 17, 2026
Merged

Fix flush-async races in Tsavorite LargeObject flush tests#1711
badrishc merged 2 commits into
devfrom
badrishc/fix-object-flush-test

Conversation

@badrishc

Copy link
Copy Markdown
Collaborator
  • LargeObjectMultiFlushedPages: remove [Explicit] and add a no-op ShiftReadOnlyAddress(currentROA, wait: true) sync point so the FlushedUntilAddress/ReadOnlyAddress assertions see a quiesced state.
  • LargeObjectLinearizeFlushedPages: drop the Is.True assertion on the second ShiftReadOnlyAddress in DoFlush. That call intentionally races with the main thread's ShiftReadOnlyToTail; if the latter wins, MonotonicUpdate correctly refuses to move ROA backwards and returns false. The post-Task.WhenAll invariants (ROA == FUA == TailAddress) still fully verify correctness, and the test continues to exercise overlapping shifts.

No product code changes. Validated with 10 consecutive green runs of both tests (4 parameterized cases each).

Copilot AI review requested due to automatic review settings April 17, 2026 00:23

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR stabilizes Tsavorite LargeObject flush tests by removing a temporary [Explicit] skip and adjusting synchronization/assertions to account for intentional async and race behavior in read-only shifting and flush completion.

Changes:

  • Re-enabled LargeObjectMultiFlushedPages by removing [Explicit] and adding a deterministic sync point that waits for FlushedUntilAddress to catch up to the already-established ReadOnlyAddress.
  • Updated LargeObjectLinearizeFlushedPages to avoid asserting a deterministic outcome for a deliberately racing ShiftReadOnlyAddress call, while preserving post-WhenAll invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

1. LargeObjectMultiFlushedPages (Tsavorite ObjectTests):
   Remove [Explicit] and add a no-op ShiftReadOnlyAddress(currentROA,
   wait: true) sync point so the FlushedUntilAddress / ReadOnlyAddress
   assertions see a quiesced state after the async flush pipeline
   catches up. MonotonicUpdate is a no-op when passed the current ROA;
   the built-in wait loop in ShiftReadOnlyAddressWithWait blocks until
   FUA >= ROA, which is exactly the observation the test wants.

2. LargeObjectLinearizeFlushedPages (Tsavorite ObjectTests):
   Drop the Is.True assertion on the second ShiftReadOnlyAddress in
   DoFlush. That call intentionally races with the main thread's
   ShiftReadOnlyToTail; if the latter wins, MonotonicUpdate correctly
   refuses to move ROA backwards and returns false. The post-
   Task.WhenAll invariants (ROA == FUA == TailAddress) still fully
   verify correctness, and the test continues to exercise the
   overlapping-shift linearization path.

3. ConfigSetHeapMemorySizeUtilizationTest (Garnet RespConfigTests):
   Rewrite test body. The old test inserted 4 tiny lists (i++ inside
   the loop meant only 4 unique keys) against a 3MB budget, so the
   LogSizeTracker never saw over-budget and PostMemoryTrim was never
   invoked; trimCompleteEvent.Wait timed out at 90s. The post-wait
   assertion also expected AllocatedPageCount < MinLogAllocatedPageCount
   (== 2), which is logically impossible. The fix inserts 128 list
   objects with meaningful heap (16 * 256-byte items each, ~512 KB
   total), then issues CONFIG SET memory 8192 to trigger the shrink
   branch of UpdateTargetSize. TotalSize > highTargetSize causes the
   resizer to compute an eviction range, shift addresses with
   waitForEviction, and invoke PostMemoryTrim. The test now waits at
   most ~30s for the callback and asserts that APC or heap actually
   decreased. Remove [Explicit].

No product code changes. Validated with many consecutive green runs
of each affected test, plus a full pass of the LargeObject and
RespConfig* suites.

Co-authored-by: Copilot <[email protected]>
@badrishc badrishc force-pushed the badrishc/fix-object-flush-test branch from 62172db to 0839224 Compare April 17, 2026 01:34
@badrishc badrishc merged commit 0740ff9 into dev Apr 17, 2026
28 of 30 checks passed
@badrishc badrishc deleted the badrishc/fix-object-flush-test branch April 17, 2026 02:29
badrishc added a commit that referenced this pull request Apr 20, 2026
Integrates 4 new dev commits:
- #1716: Fix use-after-free in GetAllocationForRetry when page is evicted
- #1712: Internalize heap-size tracking into Tsavorite (IRecordTriggers refactor)
- #1711: Fix three broken/flaky Tsavorite and Garnet tests
- #1710: Fix native linux device

Key changes from #1712 affecting bf-tree:
- IRecordDisposer replaced by IRecordTriggers with OnEvict/OnFlush/OnDispose/OnDiskRead
- GarnetRecordDisposer replaced by GarnetRecordTriggers
- OnEvict now takes EvictionSource parameter (MainLog vs ReadCache)
- Heap-size tracking internalized into Tsavorite

Resolution: merged with -X theirs, then restored bf-tree GarnetRecordTriggers
implementation (OnEvict for BfTree disposal, OnFlush for snapshot, OnDiskRead
for handle invalidation, OnDispose for delete cleanup, OnCheckpoint/OnRecovery
for checkpoint lifecycle). Updated OnEvict signature to accept EvictionSource.
Fixed unnecessary using in ReadMethods.cs.

All tests pass: 55 RangeIndex, 345 RespTests, 34 MultiDatabase.

Co-authored-by: Copilot <[email protected]>
badrishc added a commit that referenced this pull request Apr 20, 2026
Integrates 4 new dev commits:
- #1716: Fix use-after-free in GetAllocationForRetry when page is evicted
- #1712: Internalize heap-size tracking into Tsavorite (IRecordTriggers refactor)
- #1711: Fix three broken/flaky Tsavorite and Garnet tests
- #1710: Fix native linux device

Key changes from #1712 affecting bf-tree:
- IRecordDisposer replaced by IRecordTriggers with OnEvict/OnFlush/OnDispose/OnDiskRead
- GarnetRecordDisposer replaced by GarnetRecordTriggers
- OnEvict now takes EvictionSource parameter (MainLog vs ReadCache)
- Heap-size tracking internalized into Tsavorite via logSizeTracker
- OnDisposeValueObject removed; disposal routed through OnDispose

Resolution: took dev's Tsavorite src/test wholesale, then restored bf-tree
additions: CheckpointTrigger enum, OnRecovery/OnCheckpoint/OnRecoverySnapshotRead
hooks in IRecordTriggers/IStoreFunctions/StoreFunctions, WrongType in UpsertAction,
HybridLogCheckpointSMTask and Recovery call sites. Updated GarnetRecordTriggers
OnEvict to accept EvictionSource parameter. Re-added BfTreeInterop project ref
and WRONGTYPE guard in UpsertMethods.

All tests pass: 55 RangeIndex (3 runs), 345 RespTests. Format clean.

Co-authored-by: Copilot <[email protected]>
badrishc added a commit that referenced this pull request Apr 20, 2026
Integrates 4 dev commits (#1716, #1712, #1711, #1710).

Key change: #1712 internalized heap-size tracking into Tsavorite and
refactored IRecordDisposer → IRecordTriggers with OnEvict(EvictionSource).

Resolution:
- IRecordTriggers: take dev's version (EvictionSource enum, OnEvict with
  EvictionSource param, OnDisposeDiskRecord, removed OnDisposeValueObject),
  add bf-tree checkpoint hooks (OnRecovery, OnCheckpoint, OnRecoverySnapshotRead)
- GarnetRecordTriggers: update OnEvict to accept EvictionSource; remove
  OnDisposeValueObject and heap-tracking from OnDispose (now internal to
  Tsavorite); keep bf-tree dispatch (OnFlush, OnEvict, OnDiskRead, OnDispose,
  OnCheckpoint, OnRecovery)
- GarnetServer: keep cacheSizeTracker + rangeIndexManager constructor
- RMWMethods PostCopyUpdater: keep ClearTreeHandle for RIPROMOTE
- Fix unnecessary using in ReadMethods.cs

All tests pass: 55 RangeIndex (3 runs), 345 RespTests, 4 CacheSizeTracker.
Format clean (Garnet + Tsavorite).

Co-authored-by: Copilot <[email protected]>
badrishc added a commit that referenced this pull request Apr 20, 2026
Integrates 4 dev commits (#1716, #1712, #1711, #1710).

Key change: #1712 internalized heap-size tracking into Tsavorite and
refactored IRecordDisposer → IRecordTriggers with OnEvict(EvictionSource).

Resolution:
- IRecordTriggers: take dev's version (EvictionSource enum, OnEvict with
  EvictionSource param, OnDisposeDiskRecord, removed OnDisposeValueObject),
  add bf-tree checkpoint hooks (OnRecovery, OnCheckpoint, OnRecoverySnapshotRead)
- GarnetRecordTriggers: update OnEvict to accept EvictionSource; remove
  OnDisposeValueObject and heap-tracking from OnDispose (now internal to
  Tsavorite); keep bf-tree dispatch (OnFlush, OnEvict, OnDiskRead, OnDispose,
  OnCheckpoint, OnRecovery)
- GarnetServer: keep cacheSizeTracker + rangeIndexManager constructor
- RMWMethods PostCopyUpdater: keep ClearTreeHandle for RIPROMOTE
- Fix unnecessary using in ReadMethods.cs

All tests pass: 55 RangeIndex (3 runs), 345 RespTests, 4 CacheSizeTracker.
Format clean (Garnet + Tsavorite).

Co-authored-by: Copilot <[email protected]>
badrishc added a commit that referenced this pull request Apr 20, 2026
Integrates 4 dev commits (#1716, #1712, #1711, #1710).

Key change: #1712 internalized heap-size tracking into Tsavorite and
refactored IRecordDisposer → IRecordTriggers with OnEvict(EvictionSource).

Resolution:
- IRecordTriggers: take dev's version (EvictionSource enum, OnEvict with
  EvictionSource param, OnDisposeDiskRecord, removed OnDisposeValueObject),
  add bf-tree checkpoint hooks (OnRecovery, OnCheckpoint, OnRecoverySnapshotRead)
- GarnetRecordTriggers: update OnEvict to accept EvictionSource; remove
  OnDisposeValueObject and heap-tracking from OnDispose (now internal to
  Tsavorite); keep bf-tree dispatch (OnFlush, OnEvict, OnDiskRead, OnDispose,
  OnCheckpoint, OnRecovery)
- GarnetServer: keep cacheSizeTracker + rangeIndexManager constructor
- RMWMethods PostCopyUpdater: keep ClearTreeHandle for RIPROMOTE
- Fix unnecessary using in ReadMethods.cs

All tests pass: 55 RangeIndex (3 runs), 345 RespTests, 4 CacheSizeTracker.
Format clean (Garnet + Tsavorite).

Co-authored-by: Copilot <[email protected]>
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants