Skip to content

Fix AOF page/memory size defaults and validation (#1811)#1838

Merged
badrishc merged 1 commit into
mainfrom
badrishc/fix-size-defaults
Jun 2, 2026
Merged

Fix AOF page/memory size defaults and validation (#1811)#1838
badrishc merged 1 commit into
mainfrom
badrishc/fix-size-defaults

Conversation

@badrishc

@badrishc badrishc commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Tighten startup validation in GetAofSettings to catch all three AOF sizing mistakes with Garnet-specific errors that name the actual option, the effective rounded sizes, and the value to set, instead of letting them reach the generic Tsavorite messages or the runtime 'Entry does not fit on page' failure:

  1. AofMemorySize <= AofPageSize (was '<', allowed equality) — root cause of the misleading 'MemorySize must be at least twice the page size' error reported in Garnet crashes on large dataset ingestion (20K records) with high CPU spike #1811.
  2. AofPageSize > AofSegmentSize — clearer message naming both options.
  3. New: AofPageSize < 2 * main-log PageSize. An AOF entry can be as large as the underlying main-log record (raw-string values up to PageSize, object commands like LPUSH/HSET can push it higher), so the prior default ratio (AofPageSize=4m vs PageSize=16m) made the runtime 'Entry does not fit on page' easy to hit on routine writes.

Improve the Tsavorite runtime error in ValidateAllocatedLength to also include the entry size and a fix hint, while preserving the 'Entry does not fit on page' prefix for backward compat with the OversizedAofEntryDoesNotHangServer regression test (#1749).

Bump defaults to comply with the new rule:

  • AofPageSize: 4m -> 32m
  • AofMemorySize: 64m -> 128m
    Applied in defaults.conf, GarnetServerOptions.cs, and the Resp.benchmark
    CLI defaults so the AOF benchmark works out of the box.

Update CLI help text (--aof-memory, --aof-page-size) and the public website doc to explain the 2x rule.

Tests:

  • Added two regression tests in GarnetServerConfigTests covering the AofPageSize-vs-PageSize boundary (fails just below 2x, succeeds at exactly 2x).
  • Updated OversizedAofEntryDoesNotHangServer to pass pageSize: '1k' alongside aofPageSize: '4k' so it still triggers the runtime oversize-entry path under the new validation.

Fixes #1811
Fixes #1770

Copilot AI review requested due to automatic review settings June 1, 2026 00:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens AOF sizing validation and updates defaults/documentation so Garnet catches invalid AOF configurations at startup with clearer Garnet-specific messages instead of surfacing generic Tsavorite/runtime failures.

Changes:

  • Bumps default AOF page/memory sizes to satisfy the new validation rules.
  • Adds AOF size validation for memory/page, page/segment, and AOF page vs main-log page relationships.
  • Updates tests, CLI help, benchmark defaults, docs, and the Tsavorite oversized-entry error message.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
libs/server/Servers/GarnetServerOptions.cs Updates AOF defaults and adds startup validation with detailed error messages.
libs/host/defaults.conf Updates default AOF sizes and explanatory configuration comments.
libs/host/Configuration/Options.cs Updates Garnet CLI help text for AOF sizing requirements.
benchmark/Resp.benchmark/Options.cs Aligns RESP benchmark AOF defaults/help text with server defaults.
libs/storage/Tsavorite/cs/src/core/TsavoriteLog/TsavoriteLog.cs Improves oversized log-entry exception details while preserving the existing prefix.
test/standalone/Garnet.test/GarnetServerConfigTests.cs Adds/updates regression coverage for AOF validation boundaries.
test/standalone/Garnet.test.scripting/RespAofTests.cs Adjusts oversized AOF entry test to satisfy new startup validation.
website/docs/getting-started/configuration.md Documents the new AOF memory/page sizing requirements.

Comment thread libs/host/defaults.conf Outdated
@badrishc badrishc force-pushed the badrishc/fix-size-defaults branch from d1e659d to 6543b8f Compare June 1, 2026 00:48
@rdavisunr

Copy link
Copy Markdown

Looks like this also fixes #1770?

When ready, could these changes go into v1 as well?

@badrishc

badrishc commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator Author

Looks like this also fixes #1770?

When ready, could these changes go into v1 as well?

These are simply usability fixes (defaults) - in v1, you could simply manually override them via config. Does that work for you?

@rdavisunr

rdavisunr commented Jun 1, 2026

Copy link
Copy Markdown

These are simply usability fixes (defaults) - in v1, you could simply manually override them via config. Does that work for you?

Yes, we could potentially do that. I see #1844 is already in progress though 👍

@badrishc

badrishc commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator Author

These are simply usability fixes (defaults) - in v1, you could simply manually override them via config. Does that work for you?

Yes, we could potentially do that. I see #1844 is already in progress though 👍

Changing the default has memory implications on existing v1 prod deployments that do not need to handle such large values, so we don't plan to backport this to v1.

@badrishc badrishc force-pushed the badrishc/fix-size-defaults branch from 6543b8f to 871eb8f Compare June 2, 2026 01:02
Tighten startup validation in GetAofSettings to catch all three AOF sizing
mistakes with Garnet-specific errors that name the actual option, the
effective rounded sizes, and the value to set, instead of letting them
reach the generic Tsavorite messages or the runtime 'Entry does not fit
on page' failure:

  1. AofMemorySize <= AofPageSize (was '<', allowed equality) — root cause
     of the misleading 'MemorySize must be at least twice the page size'
     error reported in #1811.
  2. AofPageSize > AofSegmentSize — clearer message naming both options.
  3. New: AofPageSize < 2 * main-log PageSize. An AOF entry can be as
     large as the underlying main-log record (raw-string values up to
     PageSize, object commands like LPUSH/HSET can push it higher), so
     the prior default ratio (AofPageSize=4m vs PageSize=16m) made the
     runtime 'Entry does not fit on page' easy to hit on routine writes.

Improve the Tsavorite runtime error in ValidateAllocatedLength to also
include the entry size and a fix hint, while preserving the
'Entry does not fit on page' prefix for backward compat with the
OversizedAofEntryDoesNotHangServer regression test (#1749).

Bump defaults to comply with the new rule:
  - AofPageSize:   4m  -> 32m
  - AofMemorySize: 64m -> 128m
Applied in defaults.conf, GarnetServerOptions.cs, and the Resp.benchmark
CLI defaults so the AOF benchmark works out of the box.

Update CLI help text (--aof-memory, --aof-page-size) and the public
website doc to explain the 2x rule.

Tests:
  - Added two regression tests in GarnetServerConfigTests covering the
    AofPageSize-vs-PageSize boundary (fails just below 2x, succeeds at
    exactly 2x).
  - Updated OversizedAofEntryDoesNotHangServer to pass pageSize: '1k'
    alongside aofPageSize: '4k' so it still triggers the runtime
    oversize-entry path under the new validation.

Co-authored-by: Copilot <[email protected]>
@badrishc badrishc force-pushed the badrishc/fix-size-defaults branch from 871eb8f to 30f7045 Compare June 2, 2026 01:02
@badrishc badrishc merged commit c171934 into main Jun 2, 2026
367 of 369 checks passed
@badrishc badrishc deleted the badrishc/fix-size-defaults branch June 2, 2026 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Garnet crashes on large dataset ingestion (20K records) with high CPU spike Value exceeding AofPageSize is not written to AOF but still cached

4 participants