Skip to content

Avoid tempKv-driven scan iteration in Garnet callers#1797

Merged
badrishc merged 3 commits into
mainfrom
badrishc/avoid-scan-iteration
May 13, 2026
Merged

Avoid tempKv-driven scan iteration in Garnet callers#1797
badrishc merged 3 commits into
mainfrom
badrishc/avoid-scan-iteration

Conversation

@badrishc

Copy link
Copy Markdown
Collaborator

Summary

Eliminates tempKv-driven temporary memory spikes in Garnet's scan-based key iteration by routing all four Garnet/Cluster callsites of Session.Iterate(...) through a new lookup-based Session.IterateLookupSnapshot(...) API on Tsavorite's ClientSession. The snapshot variant captures Log.TailAddress once and pins both untilAddress and maxAddress to it — preserving the prior snapshot semantics (each unique live key emitted exactly once based on its latest in-range version, with the in-flight RCU race window correctly handled) without the O(N) tempKv allocation cost.

Why

Session.Iterate(...) (pull and push variants) builds a parallel TsavoriteKV<...> index keyed by every distinct key encountered while scanning the main log (libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/TsavoriteIterator.cs:90) and copies in-range non-tail records into it. The hash index is sized like the main store; the in-memory log buffer is in addition to the main log. So each pull/push Session.Iterate(...) call causes a temporary memory spike proportional to the keyspace — exactly the problem this PR fixes.

The lookup-based equivalent uses the main store's hash chain for liveness checking and never builds a parallel KV. The snapshot variant additionally pins maxAddress = untilAddress so concurrent RCUs above the snapshot don't silently drop in-range records (rule S2).

Audit results

Garnet/Cluster callsites that built tempKv (now converted):

File:Line Caller Conversion
libs/server/StoreWrapper.cs HasKeysInSlots (string + object pull-iter ×2) Single unified IterateLookupSnapshot with cached early-exit callback
libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs DBKeys (KEYS command, push) Existing cached unifiedStoreDbKeysFuncs now uses IterateLookupSnapshot
libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs IterateStore() parameterless pull Removed from IGarnetApi (breaking)
libs/cluster/Server/ClusterManagerSlotState.cs DeleteKeysInSlots Refactored into StorageSession.DeleteSlotKeys(slots) with cached callback
libs/server/Resp/Vector/VectorManager.Cleanup.cs RunCleanupTaskAsync (push) IterateLookupSnapshot

Already lookup-based (no change): SCAN cmd, DBSIZE, expired-key deletion, cluster KEYSINSLOT/COUNTKEYSINSLOT, MigrateOperation.Scan, StreamingSnapshot replication.

Raw Log.Scan(...) callsites (verified tempKv-free, out of scope): post-checkpoint serialised-blob clear, hybrid log stats diagnostics, AOF replay.

API changes

Added: ClientSession.IterateLookupSnapshot<TScanFunctions>(ref scanFns, bool includeTombstones = false) (Tsavorite). Wraps ScanCursor with the snapshot pattern untilAddress = maxAddress = capturedTail. Drops boilerplate at four callsites.

Removed (breaking):

  • Parameterless pull IGarnetApi.IterateStore() overload (returned ITsavoriteScanIterator). Sole tempKv-instantiating Garnet API. The push overload IterateStore<TScanFunctions>(ref scanFns, ref cursor, ...) is unchanged. Out-of-tree extensions binding to IGarnetApi will need to migrate to the push overload.
  • LogCompactionType.ShiftForced enum value (deprecated 2024-06-23 in PR Add compaction docs, update website, add compaction switch #482, ~2 years ago) + back-compat shim. Migration: --compaction-type Shift --compaction-force-delete true. CommandLineParser now produces a clear startup error listing the now-valid values.

Reordered (no behavior change): LogCompactionType enum members + help text + defaults.conf comment + website docs to None | Shift | Lookup | Scan so users see the recommended Lookup first. Added a startup LogWarning if --compaction-type Scan is selected, flagging the temporary-memory-spike cost. Scan itself still works as before.

Implementation conventions

All new callbacks follow the existing cached-field pattern (xxxFuncs ??= new(); xxxFuncs.Initialize(...)) used by unifiedStoreDbKeysFuncs, unifiedStoreDbScanFuncs, unifiedStoreDbSizeFuncs, expiredKeyDeletionScanFuncs. Steady-state per-call callback allocations are zero.

Per-call cost on the four hot paths drops from {1–2 sessions + 1–2 TsavoriteKV instances + 2 iterators + O(N) record copies} to {1 session, 0 callback alloc}.

Tests

New (this PR):

  • SpanByteIterateLookupSnapshotBasicCorrectness (libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs): inserts 200 keys, RCUs all of them, asserts IterateLookupSnapshot emits each unique live key exactly once with the post-RCU value; asserts the snapshot is stable across repeated calls.
  • ClusterDelKeysInSlotRemovesStringAndObjectKeys (test/Garnet.test.cluster/ClusterMigrateTests.cs): exercises CLUSTER DELKEYSINSLOT over both raw-string and collection-object keys; asserts both are deleted via the unified scan.
  • SeKeysNoDuplicatesUnderRcu (test/Garnet.test/RespScanCommandsTests.cs): runs concurrent SET RCUs while issuing KEYS in a loop; asserts each key is returned exactly once across multiple rounds.

Pre-existing tests touching the changed paths (all pass):

  • SpanByteIteration* suite (libs/storage/Tsavorite/cs/test/): 16/16 (+6 device-skips).
  • RespScanCommandsTests (KEYS, SCAN, DBSIZE, expired-key deletion): 22/22.
  • ClusterResetFailsForMasterWithKeysInSlots* (string + object), ClusterResetAfterFLushAll, ClusterSimpleSlotInfo, ClusterSimpleMigrateSlots: 9/9.
  • RespVectorSetTests.DeleteVectorSet: passes.

dotnet build clean; dotnet format Garnet.slnx --verify-no-changes clean on changed files.

Migration notes for downstream users

  • --compaction-type ShiftForced → use --compaction-type Shift --compaction-force-delete true.
  • IGarnetApi.IterateStore() (pull, no args) → use IGarnetApi.IterateStore<TScanFunctions>(ref scanFns, ref cursor, ...) (push, lookup-based) instead. The push overload is the recommended pattern and is already used by all in-tree cluster code.
  • Users explicitly opting into --compaction-type Scan will see a one-time startup LogWarning. No behavior change — Scan compaction still works exactly as before.

Verified facts cited in design

  • tempKv instantiation site: libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/TsavoriteIterator.cs:90.
  • Snapshot semantics rationale (maxAddress strict check at < maxAddress): libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/FindRecord.cs:140.
  • Reader runs with epoch suspended (DELETE-from-Reader is safe; precedent: ExpiredKeyDeletionScan): libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorScan.cs:312-331.
  • ScanCursorState.functions is interface-typed → struct callbacks are boxed, so all callbacks are classes: libs/storage/Tsavorite/cs/src/core/Allocator/ScanCursorState.cs:8.

Copilot AI review requested due to automatic review settings May 13, 2026 01:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes tempKv-backed scan iteration from Garnet/Cluster callsites to avoid temporary memory spikes proportional to the keyspace. It does so by adding a Tsavorite ClientSession.IterateLookupSnapshot(...) helper (capturing TailAddress once and pinning endAddress == maxAddress) and routing affected iterations through this snapshot lookup path, while also updating compaction-type options/docs and adding regression tests.

Changes:

  • Added ClientSession.IterateLookupSnapshot<TScanFunctions>(...) and switched Garnet/Cluster scan-based callers (e.g., KEYS, slot scans, vector cleanup) to use snapshot lookup iteration (no tempKv).
  • Removed the parameterless pull IGarnetApi.IterateStore() overload and replaced the slot-deletion path with IGarnetApi.DeleteSlotKeys(HashSet<int>) + a cached scan callback.
  • Updated LogCompactionType/options/docs to remove deprecated ShiftForced, reorder option presentation, and warn when Scan compaction is selected; added tests for snapshot iteration correctness and the converted callsites.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
website/docs/getting-started/configuration.md Updates compaction-type option ordering and warns about Scan transient memory costs.
test/Garnet.test/RespScanCommandsTests.cs Adds regression test ensuring KEYS returns no duplicates under concurrent RCUs.
test/Garnet.test.cluster/ClusterMigrateTests.cs Adds regression test for CLUSTER DELKEYSINSLOT deleting both string and object keys via unified lookup scan.
libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs Adds correctness test for IterateLookupSnapshot behavior after forced RCUs.
libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs Introduces IterateLookupSnapshot convenience API built on ScanCursor with pinned snapshot bounds.
libs/server/StoreWrapper.cs Converts HasKeysInSlots to a single unified lookup snapshot scan with a cached callback.
libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs Adds DeleteSlotKeys and new cached scan callbacks; converts DBKeys to snapshot lookup iteration.
libs/server/Resp/Vector/VectorManager.Cleanup.cs Switches cleanup scan to IterateLookupSnapshot to avoid tempKv.
libs/server/LogCompactionType.cs Removes ShiftForced, reorders enum, and adds Scan warning text in docs/comments.
libs/server/API/IGarnetApi.cs Replaces pull IterateStore() with DeleteSlotKeys(HashSet<int>).
libs/server/API/GarnetWatchApi.cs Implements new DeleteSlotKeys API forwarding.
libs/server/API/GarnetApi.cs Implements new DeleteSlotKeys API forwarding.
libs/host/defaults.conf Updates compaction-type comments to recommend Lookup and warn on Scan.
libs/host/Configuration/Options.cs Updates compaction-type help text and adds a startup warning when Scan is selected.
libs/cluster/Server/ClusterManagerSlotState.cs Refactors slot-key deletion to call basicGarnetApi.DeleteSlotKeys(...).

Comment thread test/Garnet.test/RespScanCommandsTests.cs Outdated
Comment thread test/Garnet.test.cluster/ClusterMigrateTests.cs Outdated
@badrishc

Copy link
Copy Markdown
Collaborator Author

Addressed both reviewer comments in commit 0364fc8:

  1. ClusterMigrateTests.cs Order collision — changed the new ClusterDelKeysInSlotRemovesStringAndObjectKeys from [Order(22)] to [Order(28)]. The existing Orders go 20, 21, 22 (#if DEBUG block at line 1930), 23, 24, 25, 26, 27 — 28 is the next free slot.

  2. RespScanCommandsTests.SeKeysNoDuplicatesUnderRcu writer waitwriter.Wait(...) now (a) uses a 30s window so the writer has time to drain even on slow CI, (b) asserts the wait succeeded so the test fails fast if the writer hangs (preventing connection leakage into subsequent tests), and (c) re-throws writer.Exception if the writer task faulted so background errors aren't silently masked.

Both targeted tests still pass locally.

badrishc and others added 2 commits May 12, 2026 19:12
Replace the four Garnet/Cluster callsites of `Session.Iterate(...)` (which
allocates a parallel `tempKv` proportional to the keyspace and copies every
in-range record into it) with the lookup-based `Session.IterateLookupSnapshot`,
a new Tsavorite ClientSession method that captures `Log.TailAddress` once
and pins both `untilAddress` and `maxAddress` to it. This preserves the prior
snapshot semantics (each unique live key emitted exactly once based on its
latest in-range version, with the in-flight RCU race window correctly handled)
without the O(N) `tempKv` memory cost.

Converted callsites:
- `StoreWrapper.HasKeysInSlots`: collapses two pull-iterations (string +
  object) into a single push scan over the unified context with a cached
  early-exit callback (`hasKeysInSlotsFuncs` field on StoreWrapper).
- `StorageSession.DBKeys` (KEYS command): switched the existing cached
  `unifiedStoreDbKeysFuncs` callback from `Session.Iterate(...)` to
  `IterateLookupSnapshot`.
- `ClusterManagerSlotState.DeleteKeysInSlots`: refactored into a new
  `StorageSession.DeleteSlotKeys(slots)` method on the unified context
  with a cached `deleteSlotKeysFuncs` callback. Cluster static helper now
  one-line delegate. Preserves prior behavior — deletes every matched live
  key including expired-but-not-yet-tombstoned ones (no expiry filter).
- `VectorManager.Cleanup`: switched from push `Session.Iterate(ref callbacks)`
  to `IterateLookupSnapshot`.

API changes:
- New `ClientSession.IterateLookupSnapshot<TScanFunctions>(ref scanFns,
  bool includeTombstones = false)` on Tsavorite. Wraps `ScanCursor` with
  the snapshot pattern `untilAddress = maxAddress = capturedTail`.
- Removed parameterless pull `IGarnetApi.IterateStore()` overload (the
  only Garnet API method that exposed the tempKv pull iterator). The push
  overload `IterateStore<TScanFunctions>(...)` is unchanged. **Breaking
  change** for out-of-tree extensions that bind to `IGarnetApi`.
- Removed deprecated `LogCompactionType.ShiftForced` enum value plus its
  back-compat shim. `ShiftForced` was deprecated in PR #482 (June 2024,
  almost two years ago); long-deprecated values are removed in v2.
  Migration: `--compaction-type Shift --compaction-force-delete true`.
  **Breaking change** in config surface — `CommandLineParser` produces a
  clear startup error listing the now-valid values `None | Shift | Lookup | Scan`.
- Reordered `LogCompactionType` enum members + help-text to
  `None | Shift | Lookup | Scan` so users see the recommended `Lookup`
  before the not-recommended `Scan`. Added a startup `LogWarning` when
  `CompactionType = Scan` is selected to flag the temporary-memory-spike
  cost. `Scan` itself still works as before — no behavior change.

All callbacks follow the existing cached-field pattern
(`xxxFuncs ??= new(); xxxFuncs.Initialize(...)`) used by
`unifiedStoreDbKeysFuncs`, `unifiedStoreDbScanFuncs`,
`unifiedStoreDbSizeFuncs`, `expiredKeyDeletionScanFuncs` — steady-state
per-call callback allocations are zero. Net per-call cost on the four
hot paths drops from `{1-2 sessions + 1-2 TsavoriteKV instances + 2
iterators + O(N) record copies}` to `{1 session, 0 callback alloc}`.

Tests added:
- `SpanByteIterateLookupSnapshotBasicCorrectness`
  (libs/storage/Tsavorite/cs/test/SpanByteIterationTests.cs):
  inserts 200 keys, RCUs all of them, asserts `IterateLookupSnapshot`
  emits each unique live key exactly once with the post-RCU value;
  asserts the snapshot is stable across calls.
- `ClusterDelKeysInSlotRemovesStringAndObjectKeys`
  (test/Garnet.test.cluster/ClusterMigrateTests.cs): exercises
  `CLUSTER DELKEYSINSLOT` over both raw-string and collection-object
  keys via the unified scan; asserts both are deleted.
- `SeKeysNoDuplicatesUnderRcu`
  (test/Garnet.test/RespScanCommandsTests.cs): runs concurrent SET
  RCUs while issuing KEYS in a loop, asserts each key is returned
  exactly once across multiple rounds.

Documentation:
- `libs/host/defaults.conf` and `website/docs/getting-started/configuration.md`
  updated to use the new compaction-type ordering and call out
  `Scan` as NOT RECOMMENDED.

Co-authored-by: Copilot <[email protected]>
- ClusterDelKeysInSlotRemovesStringAndObjectKeys: change [Order(22)] to
  [Order(28)] to avoid collision with the existing #if DEBUG-gated Order(22)
  test in the same fixture (ordered, non-parallelizable execution would
  otherwise be ambiguous).
- SeKeysNoDuplicatesUnderRcu: assert that the background writer actually
  stops within the wait window (raised from 5s to 30s) and surface any
  exception the writer threw, so a hung/faulted writer can't leak a
  connection or silently mask test failures.

Co-authored-by: Copilot <[email protected]>
@badrishc badrishc force-pushed the badrishc/avoid-scan-iteration branch from 0364fc8 to 890226a Compare May 13, 2026 02:13

@TedHartMS TedHartMS left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a second slot to ClusterDelKeysInSlotRemovesStringAndObjectKeys and ensuring that keys in that slot are not deleted

…l slot

The previous version of the test only verified that DELKEYSINSLOT removed
the targeted keys; it did not verify that keys in OTHER slots are preserved.
A bug in the slot-set filter inside DeleteSlotKeysScan.Reader (e.g., a wrong
`slots.Contains(...)` check or a hash-slot computation regression) would
have collateral-deleted unrelated keys without this test catching it.

Now the test:
- Picks two distinct slots owned by node 0 (delSlot, keepSlot).
- Inserts one string key + one object key in EACH slot (4 keys total).
- Runs CLUSTER DELKEYSINSLOT on delSlot only.
- Asserts delSlot is empty and both delSlot keys are gone via GET/EXISTS.
- Asserts keepSlot still has both keys with original values (string GET
  returns 'raw-keep'; SCARD on the object key returns 3).

Co-authored-by: Copilot <[email protected]>
@badrishc badrishc merged commit a9aa8b7 into main May 13, 2026
30 of 34 checks passed
@badrishc badrishc deleted the badrishc/avoid-scan-iteration branch May 13, 2026 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants