Avoid tempKv-driven scan iteration in Garnet callers#1797
Conversation
There was a problem hiding this comment.
Pull request overview
This PR removes tempKv-backed scan iteration from Garnet/Cluster callsites to avoid temporary memory spikes proportional to the keyspace. It does so by adding a Tsavorite ClientSession.IterateLookupSnapshot(...) helper (capturing TailAddress once and pinning endAddress == maxAddress) and routing affected iterations through this snapshot lookup path, while also updating compaction-type options/docs and adding regression tests.
Changes:
- Added
ClientSession.IterateLookupSnapshot<TScanFunctions>(...)and switched Garnet/Cluster scan-based callers (e.g., KEYS, slot scans, vector cleanup) to use snapshot lookup iteration (notempKv). - Removed the parameterless pull
IGarnetApi.IterateStore()overload and replaced the slot-deletion path withIGarnetApi.DeleteSlotKeys(HashSet<int>)+ a cached scan callback. - Updated
LogCompactionType/options/docs to remove deprecatedShiftForced, reorder option presentation, and warn whenScancompaction is selected; added tests for snapshot iteration correctness and the converted callsites.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| website/docs/getting-started/configuration.md | Updates compaction-type option ordering and warns about Scan transient memory costs. |
| test/Garnet.test/RespScanCommandsTests.cs | Adds regression test ensuring KEYS returns no duplicates under concurrent RCUs. |
| test/Garnet.test.cluster/ClusterMigrateTests.cs | Adds regression test for CLUSTER DELKEYSINSLOT deleting both string and object keys via unified lookup scan. |
| libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs | Adds correctness test for IterateLookupSnapshot behavior after forced RCUs. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs | Introduces IterateLookupSnapshot convenience API built on ScanCursor with pinned snapshot bounds. |
| libs/server/StoreWrapper.cs | Converts HasKeysInSlots to a single unified lookup snapshot scan with a cached callback. |
| libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs | Adds DeleteSlotKeys and new cached scan callbacks; converts DBKeys to snapshot lookup iteration. |
| libs/server/Resp/Vector/VectorManager.Cleanup.cs | Switches cleanup scan to IterateLookupSnapshot to avoid tempKv. |
| libs/server/LogCompactionType.cs | Removes ShiftForced, reorders enum, and adds Scan warning text in docs/comments. |
| libs/server/API/IGarnetApi.cs | Replaces pull IterateStore() with DeleteSlotKeys(HashSet<int>). |
| libs/server/API/GarnetWatchApi.cs | Implements new DeleteSlotKeys API forwarding. |
| libs/server/API/GarnetApi.cs | Implements new DeleteSlotKeys API forwarding. |
| libs/host/defaults.conf | Updates compaction-type comments to recommend Lookup and warn on Scan. |
| libs/host/Configuration/Options.cs | Updates compaction-type help text and adds a startup warning when Scan is selected. |
| libs/cluster/Server/ClusterManagerSlotState.cs | Refactors slot-key deletion to call basicGarnetApi.DeleteSlotKeys(...). |
|
Addressed both reviewer comments in commit 0364fc8:
Both targeted tests still pass locally. |
Replace the four Garnet/Cluster callsites of `Session.Iterate(...)` (which allocates a parallel `tempKv` proportional to the keyspace and copies every in-range record into it) with the lookup-based `Session.IterateLookupSnapshot`, a new Tsavorite ClientSession method that captures `Log.TailAddress` once and pins both `untilAddress` and `maxAddress` to it. This preserves the prior snapshot semantics (each unique live key emitted exactly once based on its latest in-range version, with the in-flight RCU race window correctly handled) without the O(N) `tempKv` memory cost. Converted callsites: - `StoreWrapper.HasKeysInSlots`: collapses two pull-iterations (string + object) into a single push scan over the unified context with a cached early-exit callback (`hasKeysInSlotsFuncs` field on StoreWrapper). - `StorageSession.DBKeys` (KEYS command): switched the existing cached `unifiedStoreDbKeysFuncs` callback from `Session.Iterate(...)` to `IterateLookupSnapshot`. - `ClusterManagerSlotState.DeleteKeysInSlots`: refactored into a new `StorageSession.DeleteSlotKeys(slots)` method on the unified context with a cached `deleteSlotKeysFuncs` callback. Cluster static helper now one-line delegate. Preserves prior behavior — deletes every matched live key including expired-but-not-yet-tombstoned ones (no expiry filter). - `VectorManager.Cleanup`: switched from push `Session.Iterate(ref callbacks)` to `IterateLookupSnapshot`. API changes: - New `ClientSession.IterateLookupSnapshot<TScanFunctions>(ref scanFns, bool includeTombstones = false)` on Tsavorite. Wraps `ScanCursor` with the snapshot pattern `untilAddress = maxAddress = capturedTail`. - Removed parameterless pull `IGarnetApi.IterateStore()` overload (the only Garnet API method that exposed the tempKv pull iterator). The push overload `IterateStore<TScanFunctions>(...)` is unchanged. **Breaking change** for out-of-tree extensions that bind to `IGarnetApi`. - Removed deprecated `LogCompactionType.ShiftForced` enum value plus its back-compat shim. `ShiftForced` was deprecated in PR #482 (June 2024, almost two years ago); long-deprecated values are removed in v2. Migration: `--compaction-type Shift --compaction-force-delete true`. **Breaking change** in config surface — `CommandLineParser` produces a clear startup error listing the now-valid values `None | Shift | Lookup | Scan`. - Reordered `LogCompactionType` enum members + help-text to `None | Shift | Lookup | Scan` so users see the recommended `Lookup` before the not-recommended `Scan`. Added a startup `LogWarning` when `CompactionType = Scan` is selected to flag the temporary-memory-spike cost. `Scan` itself still works as before — no behavior change. All callbacks follow the existing cached-field pattern (`xxxFuncs ??= new(); xxxFuncs.Initialize(...)`) used by `unifiedStoreDbKeysFuncs`, `unifiedStoreDbScanFuncs`, `unifiedStoreDbSizeFuncs`, `expiredKeyDeletionScanFuncs` — steady-state per-call callback allocations are zero. Net per-call cost on the four hot paths drops from `{1-2 sessions + 1-2 TsavoriteKV instances + 2 iterators + O(N) record copies}` to `{1 session, 0 callback alloc}`. Tests added: - `SpanByteIterateLookupSnapshotBasicCorrectness` (libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs): inserts 200 keys, RCUs all of them, asserts `IterateLookupSnapshot` emits each unique live key exactly once with the post-RCU value; asserts the snapshot is stable across calls. - `ClusterDelKeysInSlotRemovesStringAndObjectKeys` (test/Garnet.test.cluster/ClusterMigrateTests.cs): exercises `CLUSTER DELKEYSINSLOT` over both raw-string and collection-object keys via the unified scan; asserts both are deleted. - `SeKeysNoDuplicatesUnderRcu` (test/Garnet.test/RespScanCommandsTests.cs): runs concurrent SET RCUs while issuing KEYS in a loop, asserts each key is returned exactly once across multiple rounds. Documentation: - `libs/host/defaults.conf` and `website/docs/getting-started/configuration.md` updated to use the new compaction-type ordering and call out `Scan` as NOT RECOMMENDED. Co-authored-by: Copilot <[email protected]>
- ClusterDelKeysInSlotRemovesStringAndObjectKeys: change [Order(22)] to [Order(28)] to avoid collision with the existing #if DEBUG-gated Order(22) test in the same fixture (ordered, non-parallelizable execution would otherwise be ambiguous). - SeKeysNoDuplicatesUnderRcu: assert that the background writer actually stops within the wait window (raised from 5s to 30s) and surface any exception the writer threw, so a hung/faulted writer can't leak a connection or silently mask test failures. Co-authored-by: Copilot <[email protected]>
0364fc8 to
890226a
Compare
TedHartMS
left a comment
There was a problem hiding this comment.
I suggest adding a second slot to ClusterDelKeysInSlotRemovesStringAndObjectKeys and ensuring that keys in that slot are not deleted
…l slot The previous version of the test only verified that DELKEYSINSLOT removed the targeted keys; it did not verify that keys in OTHER slots are preserved. A bug in the slot-set filter inside DeleteSlotKeysScan.Reader (e.g., a wrong `slots.Contains(...)` check or a hash-slot computation regression) would have collateral-deleted unrelated keys without this test catching it. Now the test: - Picks two distinct slots owned by node 0 (delSlot, keepSlot). - Inserts one string key + one object key in EACH slot (4 keys total). - Runs CLUSTER DELKEYSINSLOT on delSlot only. - Asserts delSlot is empty and both delSlot keys are gone via GET/EXISTS. - Asserts keepSlot still has both keys with original values (string GET returns 'raw-keep'; SCARD on the object key returns 3). Co-authored-by: Copilot <[email protected]>
Summary
Eliminates
tempKv-driven temporary memory spikes in Garnet's scan-based key iteration by routing all four Garnet/Cluster callsites ofSession.Iterate(...)through a new lookup-basedSession.IterateLookupSnapshot(...)API on Tsavorite'sClientSession. The snapshot variant capturesLog.TailAddressonce and pins bothuntilAddressandmaxAddressto it — preserving the prior snapshot semantics (each unique live key emitted exactly once based on its latest in-range version, with the in-flight RCU race window correctly handled) without the O(N)tempKvallocation cost.Why
Session.Iterate(...)(pull and push variants) builds a parallelTsavoriteKV<...>index keyed by every distinct key encountered while scanning the main log (libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/TsavoriteIterator.cs:90) and copies in-range non-tail records into it. The hash index is sized like the main store; the in-memory log buffer is in addition to the main log. So each pull/pushSession.Iterate(...)call causes a temporary memory spike proportional to the keyspace — exactly the problem this PR fixes.The lookup-based equivalent uses the main store's hash chain for liveness checking and never builds a parallel KV. The snapshot variant additionally pins
maxAddress = untilAddressso concurrent RCUs above the snapshot don't silently drop in-range records (rule S2).Audit results
Garnet/Cluster callsites that built
tempKv(now converted):libs/server/StoreWrapper.csHasKeysInSlots(string + object pull-iter ×2)IterateLookupSnapshotwith cached early-exit callbacklibs/server/Storage/Session/Common/ArrayKeyIterationFunctions.csDBKeys(KEYS command, push)unifiedStoreDbKeysFuncsnow usesIterateLookupSnapshotlibs/server/Storage/Session/Common/ArrayKeyIterationFunctions.csIterateStore()parameterless pullIGarnetApi(breaking)libs/cluster/Server/ClusterManagerSlotState.csDeleteKeysInSlotsStorageSession.DeleteSlotKeys(slots)with cached callbacklibs/server/Resp/Vector/VectorManager.Cleanup.csRunCleanupTaskAsync(push)IterateLookupSnapshotAlready lookup-based (no change): SCAN cmd, DBSIZE, expired-key deletion, cluster
KEYSINSLOT/COUNTKEYSINSLOT,MigrateOperation.Scan, StreamingSnapshot replication.Raw
Log.Scan(...)callsites (verifiedtempKv-free, out of scope): post-checkpoint serialised-blob clear, hybrid log stats diagnostics, AOF replay.API changes
Added:
ClientSession.IterateLookupSnapshot<TScanFunctions>(ref scanFns, bool includeTombstones = false)(Tsavorite). WrapsScanCursorwith the snapshot patternuntilAddress = maxAddress = capturedTail. Drops boilerplate at four callsites.Removed (breaking):
IGarnetApi.IterateStore()overload (returnedITsavoriteScanIterator). Sole tempKv-instantiating Garnet API. The push overloadIterateStore<TScanFunctions>(ref scanFns, ref cursor, ...)is unchanged. Out-of-tree extensions binding toIGarnetApiwill need to migrate to the push overload.LogCompactionType.ShiftForcedenum value (deprecated 2024-06-23 in PR Add compaction docs, update website, add compaction switch #482, ~2 years ago) + back-compat shim. Migration:--compaction-type Shift --compaction-force-delete true.CommandLineParsernow produces a clear startup error listing the now-valid values.Reordered (no behavior change):
LogCompactionTypeenum members + help text + defaults.conf comment + website docs toNone | Shift | Lookup | Scanso users see the recommendedLookupfirst. Added a startupLogWarningif--compaction-type Scanis selected, flagging the temporary-memory-spike cost.Scanitself still works as before.Implementation conventions
All new callbacks follow the existing cached-field pattern (
xxxFuncs ??= new(); xxxFuncs.Initialize(...)) used byunifiedStoreDbKeysFuncs,unifiedStoreDbScanFuncs,unifiedStoreDbSizeFuncs,expiredKeyDeletionScanFuncs. Steady-state per-call callback allocations are zero.Per-call cost on the four hot paths drops from
{1–2 sessions + 1–2 TsavoriteKV instances + 2 iterators + O(N) record copies}to{1 session, 0 callback alloc}.Tests
New (this PR):
SpanByteIterateLookupSnapshotBasicCorrectness(libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs): inserts 200 keys, RCUs all of them, assertsIterateLookupSnapshotemits each unique live key exactly once with the post-RCU value; asserts the snapshot is stable across repeated calls.ClusterDelKeysInSlotRemovesStringAndObjectKeys(test/Garnet.test.cluster/ClusterMigrateTests.cs): exercisesCLUSTER DELKEYSINSLOTover both raw-string and collection-object keys; asserts both are deleted via the unified scan.SeKeysNoDuplicatesUnderRcu(test/Garnet.test/RespScanCommandsTests.cs): runs concurrent SET RCUs while issuingKEYSin a loop; asserts each key is returned exactly once across multiple rounds.Pre-existing tests touching the changed paths (all pass):
SpanByteIteration*suite (libs/storage/Tsavorite/cs/test/): 16/16 (+6 device-skips).RespScanCommandsTests(KEYS, SCAN, DBSIZE, expired-key deletion): 22/22.ClusterResetFailsForMasterWithKeysInSlots*(string + object),ClusterResetAfterFLushAll,ClusterSimpleSlotInfo,ClusterSimpleMigrateSlots: 9/9.RespVectorSetTests.DeleteVectorSet: passes.dotnet buildclean;dotnet format Garnet.slnx --verify-no-changesclean on changed files.Migration notes for downstream users
--compaction-type ShiftForced→ use--compaction-type Shift --compaction-force-delete true.IGarnetApi.IterateStore()(pull, no args) → useIGarnetApi.IterateStore<TScanFunctions>(ref scanFns, ref cursor, ...)(push, lookup-based) instead. The push overload is the recommended pattern and is already used by all in-tree cluster code.--compaction-type Scanwill see a one-time startupLogWarning. No behavior change — Scan compaction still works exactly as before.Verified facts cited in design
tempKvinstantiation site:libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/TsavoriteIterator.cs:90.maxAddressstrict check at< maxAddress):libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/FindRecord.cs:140.Readerruns with epoch suspended (DELETE-from-Reader is safe; precedent:ExpiredKeyDeletionScan):libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorScan.cs:312-331.ScanCursorState.functionsis interface-typed → struct callbacks are boxed, so all callbacks are classes:libs/storage/Tsavorite/cs/src/core/Allocator/ScanCursorState.cs:8.