Pause VectorManager cleanup before cluster/other resets of log#1775
Merged
Conversation
badrishc
commented
May 6, 2026
Collaborator
- Cover reset with a boolean Initializing check that is epoch-protected, so that concurrent iterators end gracefully.
- Add ability to pause vector cleanup, used before we reset as a second line of defense
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens Tsavorite allocator reset behavior against concurrent scan iterators, and adds an explicit pause/resume mechanism to prevent VectorManager’s background cleanup scan from racing with store resets during cluster replica re-attach.
Changes:
- Add an
AllocatorBase.Initializinggate aroundReset()+Initialize(), and update scan iterators to acquire epoch earlier and end iteration cleanly if reset/initialization is in progress. - Introduce
VectorManager.PauseCleanupAsync()/ResumeCleanup()and use them in diskless/diskbased replica attach flows to quiesce the background cleanup scan beforestoreWrapper.Reset(). - Remove allocator-specific
Reset()overrides now thatAllocatorBase.Reset()performs the full reset + initialize sequence.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorBase.cs | Adds Initializing flag and restructures reset into a gated Reset() plus ResetCore() cascade. |
| libs/storage/Tsavorite/cs/src/core/Allocator/SpanByteScanIterator.cs | Acquires epoch before sampling allocator state; exits early if Initializing is true; rechecks stop conditions after snapping to BeginAddress. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectScanIterator.cs | Same iterator safety changes as SpanByteScanIterator. |
| libs/storage/Tsavorite/cs/src/core/Allocator/SpanByteAllocatorImpl.cs | Removes redundant Reset() override (base handles reset+initialize). |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectAllocatorImpl.cs | Removes redundant Reset() override (base handles reset+initialize). |
| libs/storage/Tsavorite/cs/src/core/Allocator/TsavoriteLogAllocatorImpl.cs | Removes redundant Reset() override (base handles reset+initialize). |
| libs/server/Resp/Vector/VectorManager.Cleanup.cs | Adds cleanupGate and exposes pause/resume APIs; cleanup loop takes the gate per iteration. |
| libs/server/Resp/Vector/VectorManager.cs | Disposes cleanupGate after draining the cleanup task. |
| libs/cluster/Server/Replication/ReplicaOps/ReplicaDisklessSync.cs | Pauses VectorManager cleanup before calling storeWrapper.Reset() in replica attach path. |
| libs/cluster/Server/Replication/ReplicaOps/ReplicaDiskbasedSync.cs | Pauses VectorManager cleanup before calling storeWrapper.Reset() in replica attach path. |
9c842f1 to
7f14504
Compare
kevin-montrose
approved these changes
May 7, 2026
7f14504 to
c78e394
Compare
…nitialize() epoch-gate Fixes a rare CI fatal AccessViolationException at LogRecord.get_Info / ObjectScanIterator.GetPhysicalAddressAndAllocatedSize during VectorManager.RunCleanupTaskAsync. Same crash signature as PR #1765 but a different remaining gap. ## Root cause PR #1765 made AllocatorBase.Reset()'s page-free section epoch-safe (2-phase BumpCurrentEpoch barrier around OnPagesClosed + FreeAllAllocatedPages). It did NOT cover the per-allocator Initialize() that runs after base.Reset() returns. Initialize() non-monotonically rewinds HeadAddress / BeginAddress / TailPageOffset back to FirstValidAddress while pagePointers[i] are mostly 0 (only pages 0 and 1 are reallocated). Race window between HeadAddress reset and TailPageOffset reset: - Iterator reads stopAddress = TailAddress = old high (stale) - Iterator passes currentAddress < stopAddress check, proceeds - Iterator reads headAddress = HeadAddress = new low - Iterator chooses in-memory path (currentAddress >= headAddress) - Iterator dereferences pagePointers[X] + offset = 0 + offset → AVE The cluster re-attach paths (ReplicaDisklessSync, ReplicaDiskbasedSync) call SuspendPrimaryOnlyTasksAsync() AFTER Reset, and even then it only covers TaskManager-registered tasks. VectorManager.cleanupTask is started directly in the constructor (VectorManager.cs:110) as a plain async Task and is invisible to TaskManager. ## Fix Two complementary fixes: ### A. Tsavorite-side defense-in-depth - AllocatorBase: added `internal volatile bool Initializing`. Refactored Reset() to be non-virtual, wrapping the existing 2-phase epoch logic (renamed to ResetCore()) PLUS the per-allocator Initialize() call inside Initializing = true / false (try/finally so a throw can't leave iterators stuck). - Per-allocator Reset() overrides removed in ObjectAllocatorImpl, SpanByteAllocatorImpl, TsavoriteLogAllocatorImpl — base now does ResetCore() + Initialize() directly under the gate. - ObjectScanIterator + SpanByteScanIterator (InitializeGetNextAndAcquireEpoch): - Move epoch.Resume() BEFORE sampling TailAddress (was sampled outside epoch protection — preexisting issue). - Spin-wait `while (hlogBase.Initializing) { Thread.Yield(); epoch.ProtectAndDrain(); }` with `!assumeInMemory` bypass (avoids deadlock with internal MemoryPageScan inside OnPagesClosed which constructs assumeInMemory iterators on the same thread that is running the Initializing-gated deferred action). - LoadPageIfNeeded: recheck `currentAddress >= stopAddress` after snapping forward to BeginAddress (stale snap could place currentAddress at TailAddress with an unparseable record). ### B. Caller-side explicit drain (restores Reset's documented contract) - VectorManager.Cleanup.cs: added PauseCleanupAsync() / ResumeCleanup() API. Cleanup loop uses increment-then-check pattern (bump cleanupInProgress first, re-read pauseCleanupRequests, back out if non-zero) so a pause caller cannot race past an iteration about to start. - ReplicaDisklessSync.cs + ReplicaDiskbasedSync.cs: wrap storeWrapper.Reset() in `await vectorManager.PauseCleanupAsync()` / `vectorManager.ResumeCleanup()` try/finally. A is defense-in-depth (any future background reader is safe); B restores the documented contract for the known case and skips the per-iteration spin-wait when callers cooperate. ## Validation - Build: 0 warnings, 0 errors. dotnet format clean. - Tsavorite tests: 1453/1453 pass. - Garnet.test Reset/Recovery/Iterate filter: 30/30 pass. - ClusterMigrate + ClusterVectorSet (52 tests): all pass. - MigrateVectorSetWhileModifyingAsync (the originally failing cluster test): stable across 8 sequential runs (~18s each) with --blame-crash. - VectorCleanupVsResetRaceTests (PR #1765's regression): pass [Repeat(5)]. - ClusterResetDuringReplicationTests: 2/2 pass. Co-authored-by: Copilot <[email protected]>
c78e394 to
8e819a8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.