[V1] Backport: Fix AOF page/memory size defaults and validation (#1838)#1844
Closed
badrishc wants to merge 1 commit into
Closed
[V1] Backport: Fix AOF page/memory size defaults and validation (#1838)#1844badrishc wants to merge 1 commit into
badrishc wants to merge 1 commit into
Conversation
Backport of #1838 to release/v1. Tighten startup validation in GetAofSettings to catch all three AOF sizing mistakes with Garnet-specific errors that name the actual option, the effective rounded sizes, and the value to set, instead of letting them reach the generic Tsavorite messages or the runtime 'Entry does not fit on page' failure: 1. AofMemorySize <= AofPageSize (was '<', allowed equality) -- root cause of the misleading 'MemorySize must be at least twice the page size' error reported in #1811. 2. AofPageSize > AofSegmentSize -- clearer message naming both options. 3. New: AofPageSize < 2 * main-log PageSize. An AOF entry can be as large as the underlying main-log record (raw-string values up to PageSize, object commands like LPUSH/HSET can push it higher), so the prior default ratio made the runtime 'Entry does not fit on page' easy to hit on routine writes. This is the failure mode reported in #1770: an oversized SET threw 'Entry does not fit on page', closed the connection, but the value remained cached in memory creating an AOF/cache inconsistency. With the new pre-flight validation the misconfiguration is now caught at startup. Improve the Tsavorite runtime error in ValidateAllocatedLength to also include the entry size and a fix hint, while preserving the 'Entry does not fit on page' prefix for backward compat with the OversizedAofEntryDoesNotHangServer regression test (#1749). Bump defaults to comply with the new rule. v1's default main-log PageSize is 32m (vs 16m on main), so AofPageSize must be at least 64m: - AofPageSize: 4m -> 64m - AofMemorySize: 64m -> 128m Applied in defaults.conf and GarnetServerOptions.cs. Also bump the test-helper aofMemorySize defaults from 64m -> 128m in TestUtils.cs and ClusterTestContext.cs so the new 2x rule is satisfied when tests create a Garnet server with the new default AofPageSize. Update CLI help text (--aof-memory, --aof-page-size) and the public website doc to explain the 2x rules. Tests: - Added three regression tests in GarnetServerConfigTests covering the AofPageSize-vs-AofSegmentSize, AofMemorySize-vs-AofPageSize, and AofPageSize-vs-(2x main PageSize) boundaries. - Updated OversizedAofEntryDoesNotHangServer to pass pageSize: '1k' alongside aofPageSize: '4k' so it still triggers the runtime oversize-entry path under the new validation. Co-authored-by: Copilot <[email protected]>
Contributor
There was a problem hiding this comment.
Pull request overview
Backports the AOF sizing validation + default updates from #1838 onto release/v1, to prevent common misconfigurations from reaching runtime (notably the Entry does not fit on page AOF enqueue failure) and to provide clearer, Garnet-specific startup errors and docs.
Changes:
- Tighten
GarnetServerOptions.GetAofSettingsvalidation and bump v1 defaults toAofPageSize=64m/AofMemorySize=128m. - Improve TsavoriteLog’s oversize-entry exception message to include entry/page sizes and guidance.
- Update CLI help, website docs, and tests/test-helpers to align with the new sizing rules.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| website/docs/getting-started/configuration.md | Document the new AOF sizing constraints (memory >= 2× page, AOF page >= 2× main PageSize). |
| test/Garnet.test/TestUtils.cs | Bump AOF helper defaults to avoid violating new validation with v1 defaults. |
| test/Garnet.test/RespAofTests.cs | Adjust regression test to satisfy new startup validation while still exercising runtime oversize behavior. |
| test/Garnet.test/GarnetServerConfigTests.cs | Add/extend config regression tests for the new AOF validation boundaries and improved messages. |
| test/Garnet.test.cluster/ClusterTestContext.cs | Bump cluster test helper AOF defaults consistent with new sizing rules. |
| libs/storage/Tsavorite/cs/src/core/TsavoriteLog/TsavoriteLog.cs | Improve oversize-entry exception message for easier diagnosis and remediation. |
| libs/server/Servers/GarnetServerOptions.cs | Enforce new validation rules and update v1 defaults for AOF sizing. |
| libs/host/defaults.conf | Update default config values/comments to match new AOF sizing requirements. |
| libs/host/Configuration/Options.cs | Update CLI help text for AOF-related options to describe the new constraints. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
914
to
916
| var memorySizeBits = AofMemorySizeBits(); | ||
| var pageSizeBits = AofPageSizeBits(); | ||
| var segmentSizeBits = AofSegmentSizeBits(); |
Comment on lines
96
to
105
| /// <summary> | ||
| /// Total AOF memory buffer used in bytes (rounds down to power of 2) - spills to disk after this limit. | ||
| /// </summary> | ||
| public string AofMemorySize = "64m"; | ||
| public string AofMemorySize = "128m"; | ||
|
|
||
| /// <summary> | ||
| /// Aof page size in bytes (rounds down to power of 2). | ||
| /// </summary> | ||
| public string AofPageSize = "4m"; | ||
| public string AofPageSize = "64m"; | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport of #1838 to
release/v1. Fixes #1811 and #1770.Why a separate v1 PR
The main PR's cherry-pick almost applies cleanly, but v1 has two divergences that change the right defaults and require a small extra fix in the test helpers:
PageSizeis32m(vs16mon main). The new validation rule requiresAofPageSize >= 2 * PageSize, so on v1 the minimum AofPageSize that complies with defaults is 64m, not 32m. Defaults are therefore set toAofPageSize=64m/AofMemorySize=128mon v1.GetAofSettingssignature differs ((int dbId, LightEpoch epoch, out TsavoriteLogSettings)) from main's array-based one, so the test was rewritten against v1's API.aof-benchscaffolding inbenchmark/Resp.benchmark/Options.cs, so that file from Fix AOF page/memory size defaults and validation (#1811) #1838 is not backported.TestUtils.csandClusterTestContext.csdefaultedaofMemorySize: "64m". Combined with the new server defaultAofPageSize=64m, this would violate the new 2x rule on every test that uses the helper. Bumped these helper defaults from64mto128m.What this PR does
Same intent as #1838:
GetAofSettingsto catch three AOF sizing mistakes with Garnet-specific errors that name the actual option and the value to set:AofMemorySize <= AofPageSize(the original misconfiguration in Garnet crashes on large dataset ingestion (20K records) with high CPU spike #1811).AofPageSize > AofSegmentSize.AofPageSize < 2 * main-log PageSize. An AOF entry can be as large as the underlying main-log record, so without this rule the runtime'Entry does not fit on page'failure (Value exceeding AofPageSize is not written to AOF but still cached #1770) is easy to hit on routine writes with default config.ValidateAllocatedLengthto include entry size and a fix hint (preserving the'Entry does not fit on page'prefix for backward compat with theOversizedAofEntryDoesNotHangServerregression test from High CPU Usage After AofPageSize Exceeded on Subsequent Gets with Same Key #1749).AofPageSize 4m -> 64m,AofMemorySize 64m -> 128m.Why this also fixes #1770
#1770 reports that an oversized SET throws
Tsavorite.core.TsavoriteException: Entry does not fit on pagefrom inside the AOF write path, closes the connection, but leaves the value cached in memory — creating an AOF/cache divergence. The root cause is exactly the misconfiguration the new validation rule prevents: with the old defaults (AofPageSize=4mand main-logPageSize=32mon v1), an entry up toPageSizebytes is committed to the main store but cannot fit on an AOF page. With the new defaults and validation, the misconfiguration cannot reach runtime — it is caught at startup.Validation
dotnet buildclean (net10.0, 0 warnings).GarnetServerConfigTests+RespAofTestsfull suites: 53 passed, 0 failed, 1 skipped onnet10.0Debug.Related