Adjust v2 defaults by badrishc · Pull Request #1805 · microsoft/garnet

badrishc · 2026-05-15T17:18:56Z

Updates Garnet's v2 defaults to better reflect typical operational deployments and the vector-database workload.

PageSize               : 32m → 16m
ReadCachePageSize      : 32m → 4m
ValueOverflowThreshold : 4k  → 16k

Rationale

`PageSize = 16m` (was `32m`)

A modest 2× reduction. Earlier drafts proposed 4 MB; after a critical re-examination (see below), 16 MB is the more defensible compromise.

Real benefits over 32 MB:

Halves flush-completion latency at the device level (~5 ms vs ~10 ms for 16 MB vs 32 MB on a 3 GB/s NVMe). Flush completion gates FlushedUntilAddress, which gates replication acks and AOF commit cadence.
2× finer HeadAddress step → LogSizeTracker keeps memory closer to target on tight budgets.
Halves the worst-case tail-page memory waste (16 MB pinned for a nearly-empty tail page vs 32 MB).
Page allocation zero-fill is 2× faster — less latency on the unlucky thread that crosses a page boundary into an unallocated page.

Cache behavior is preserved. For a 64 GB typical deployment:

32 MB pages → 2,048 entries → pagePointers 16 KB → L1d.
16 MB pages → 4,096 entries → pagePointers 32 KB → still L1d on every mainstream server CPU.
(4 MB pages would have been 16,384 entries → 128 KB → spills to L2. That's why we backed off 4 MB.)

Costs are modest:

2× more page-boundary crossings (more PageOffset CAS / HandlePageOverflow invocations).
2× faster RCU onset (records age from mutable → read-only 2× sooner).
2× larger per-page metadata arrays (PageStatusIndicator, objectPages, etc.).

`ReadCachePageSize = 4m` (was `32m`)

The previous 32 MB default was clearly broken for the read cache; 4 MB is the right operational point.

Avoid the broken-at-32m regime. With a 1 GB ReadCacheMemorySize default and 32 MB pages, there are only ~32 pages total — too coarse for any useful eviction or second-chance behavior. 4 MB gives 256 pages, which is plenty.
Eviction is byte-granular. LogSizeTracker.DetermineEvictionRange evicts at record granularity; the page-size choice only governs when the underlying page buffer is freed back to the allocator. At 256 pages on a 1 GB budget, each freed page is a 0.4% memory step — comfortably smooth.
Why not 1 MB? Earlier drafts proposed 1 MB. At 1,024 pages the eviction granularity (0.1% per page free) is overkill, and you pay 4× the per-page metadata (ObjectIdMap, PageStatusIndicator) and 4× the page-allocation churn during cache fills. 4 MB captures the win at a fraction of the overhead and matches the 4:1 ratio with the 16 MB main-log page.
No flush penalty either way. Read cache uses NullDevice — there is no flush IO whose cost grows with page count, so smaller pages are essentially free on that axis.

`ValueOverflowThreshold = 16k` (was `4k`)

Critical for the vector-database use case.

DiskANN "1 IO per hop" invariant. Garnet's vector-set surface uses DiskANN, whose disk-graph traversal performance hinges on each hop being a single IO. Records stored as overflow byte[] currently incur 2 IOs on cold reads (one for the main-log record header to learn the overflow offset, one for the overflow payload itself) and cannot be coalesced because the overflow position lives in the header.
Covers all mainstream embeddings inline. Typical embedding vector sizes (FP32, plus ~24 B record header):
- BGE-Base (768d): ~3.1 KB
- text-embedding-ada-002 (1536d): ~6.3 KB
- text-embedding-3-large (3072d): ~12.3 KB
16 KB inline keeps all of those in a single IO. The next jump (4096d FP32, Cohere v4 / Mistral-embed at ~16.3 KB) still overflows, but those models are uncommon enough that the page-density trade-off of moving to 32 KB was judged not worth it.
Optional optimization assumed. Maximum benefit requires IStreamBuffer.InitialIOSize to be hinted up to the record size so the first IO grabs the entire inline record; without that hint the OS-page-sized initial read (4 KB on Linux x64) wastes one IO for inline records > 4 KB. That fix is straightforward and orthogonal.

Why not 4 MB? (notes from review)

Earlier iterations proposed PageSize = 4m. Critical review concluded the operational arguments for going that small don't hold up:

"Smaller flush bursts" — Flush is async; the writer never waits on it. Total bytes pushed are identical; smaller IOs may be less device-efficient. Only matters for FlushedUntilAddress tick frequency, which is a marginal effect.
"Finer ReadOnlyAddress granularity" — Actually a negative. Records age into immutable 8× faster, causing more RCU traffic on writes.
"Smaller blast radius on corruption" — Tsavorite doesn't have page-checksum-and-discard recovery; corruption typically fails recovery rather than discarding a page.
Cache fit — pagePointers for 4 MB / 64 GB log is 128 KB, which spills out of L1d on every modern server CPU.
YCSB delta — The 5–6% in-memory-read advantage seen at 4 MB was on a dataset that fit entirely in memory (no flush/eviction), within run-to-run noise, and likely an artifact of insert-phase contention patterns rather than a steady-state read win.

16 MB captures the meaningful operational wins without these downsides.

Testing

GarnetServerConfigTests (ImportExportConfigLocal, ImportExportConfigAzure, ValueOverflowThresholdParsing) had hard-coded "32m" and "4k" assertions; updated to the new defaults. Affected tests pass locally.

…ueOverflowThreshold=16k Co-authored-by: Copilot <[email protected]>

Co-authored-by: Copilot <[email protected]>

Addresses Copilot review comments on PR #1805: - ImportExportConfigLocal/Azure: default PageSize now "4m" (was "32m"); override changed to "8m" so the test continues to verify non-default tracking now that "4m" matches the default. - ValueOverflowThresholdParsing: default ValueOverflowThreshold now "16k" (16384 bytes), was "4k" (4096 bytes). Co-authored-by: Copilot <[email protected]>

After a more critical review, 4 MB is hard to defend on a typical 64 GB log: - The flush-burst, ReadOnly-granularity, and blast-radius rationale is weak or wrong-directional (smaller pages age records into immutable faster, causing MORE RCU, not less). - pagePointers at 4 MB / 64 GB log = 128 KB, which spills out of L1d on every modern server CPU (32-48 KB L1d). - The YCSB result was on an in-memory dataset with no flush/eviction activity and was within noise across iterations. 16 MB is a defensible compromise: - Halves flush-completion latency and tail-page waste vs 32 MB. - 2x finer eviction granularity. - pagePointers (32 KB on 64 GB log) still fits L1d. - Only doubles per-page metadata vs 32 MB instead of 8x. ReadCachePageSize=1m and ValueOverflowThreshold=16k unchanged (those wins are clean and well-grounded). Co-authored-by: Copilot <[email protected]>

With a 1 GB ReadCacheMemorySize default and byte-granular eviction via LogSizeTracker, 256 read-cache pages (4 MB each) already provide ample eviction granularity (~0.4% memory step per page free) while halving per-page metadata (ObjectIdMap, PageStatusIndicator) and reducing page-allocation churn during cache fills. 1 MB was over-conservative. Co-authored-by: Copilot <[email protected]>

badrishc and others added 2 commits May 15, 2026 09:34

Adjust Garnet server defaults: PageSize=4m, ReadCachePageSize=1m, Val…

aa7bca2

…ueOverflowThreshold=16k Co-authored-by: Copilot <[email protected]>

Align programmatic GarnetServerOptions defaults with defaults.conf

bea29a7

Co-authored-by: Copilot <[email protected]>

Copilot AI review requested due to automatic review settings May 15, 2026 17:18

Copilot started reviewing on behalf of badrishc May 15, 2026 17:20 View session

This comment was marked as outdated.

Sign in to view

badrishc marked this pull request as draft May 15, 2026 17:26

badrishc marked this pull request as ready for review May 16, 2026 01:00

TedHartMS approved these changes May 16, 2026

View reviewed changes

tiagonapoli approved these changes May 16, 2026

View reviewed changes

badrishc and others added 2 commits May 16, 2026 12:28

TedHartMS approved these changes May 17, 2026

View reviewed changes

badrishc merged commit d3677cf into main May 17, 2026
282 of 283 checks passed

badrishc deleted the badrishc/adjust-defaults branch May 17, 2026 04:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Adjust v2 defaults#1805

Adjust v2 defaults#1805
badrishc merged 5 commits into
mainfrom
badrishc/adjust-defaults

badrishc commented May 15, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

badrishc commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale

PageSize = 16m (was 32m)

ReadCachePageSize = 4m (was 32m)

ValueOverflowThreshold = 16k (was 4k)

Why not 4 MB? (notes from review)

Testing

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

badrishc commented May 15, 2026 •

edited

Loading

`PageSize = 16m` (was `32m`)

`ReadCachePageSize = 4m` (was `32m`)

`ValueOverflowThreshold = 16k` (was `4k`)