*: lower default GC TTL to 4h#93836
Conversation
a0e56b9 to
6c471eb
Compare
Maybe add another release note "backward-incompatible change", so it gets more scrutiny by users. |
6c471eb to
4b0f8c2
Compare
irfansharif
left a comment
There was a problem hiding this comment.
Updated all the testlogic files but I'm not sure why there's so much indentation diff, I've used just vanilla dev test{logic} ... --rewrite to power through. It's making for a larger diff than is real -- all that's changing is s/90000/14400 wherever we've rendered the zone config.
Maybe add another release note "backward-incompatible change", so it gets more scrutiny by users.
Done (left a note for the docs team to highlight this more prominently, even though it's not technically backwards compatible). PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @dt)
4b0f8c2 to
89b9b38
Compare
Using |
89b9b38 to
9c1ee03
Compare
|
(Bump.) |
|
What's the story for restoring backups here? I only skimmed through the changes but I don't see any injection of zone configs upon restoring pre-23.1 backups, so it's safe to assume that cluster- and partial restores will be subject to this new 4h default as well. Are we cool with that? To me it feels strange. |
|
Good point, I didn't do any such thing. @dt or @adityamaru, can you maybe show me where I could inject such a hook? We backup+restore system.zones: cockroach/pkg/ccl/backupccl/system_schema.go Lines 375 to 376 in 411c8c2 So some point after the system table is restored, and maybe before MVCC GC has a chance to kick in (or before user tables are restored), I go explicitly set the existing GC TTL to whatever it was before? Looking at the code, we do have descriptors that we "pre-"restore before restoring user tables, so somewhere here? cockroach/pkg/ccl/backupccl/restore_job.go Lines 1301 to 1302 in abc836c |
Cluster restore deletes the zone configs of the restoring cluster and applies the zone configs of the backed-up cluster. It does this prior to ingesting user data to allow the allocator to distribute the replicas. Non-cluster restores don't deal in zone configs at all, and the ingested data is subject to whatever configuration applies to the table being restored. The status quo seems okay to me? A cluster restore is the only form of restore that is in the business of altering system tables, so it is what we encourage users to use if they want an exact replica of their backed up cluster. |
|
Thanks! All good then. |
9c1ee03 to
38dca99
Compare
irfansharif
left a comment
There was a problem hiding this comment.
🤦 Somehow forgot that v22.2 and earlier clusters do actually have an explicit entry in system.zones with the default GC TTL. So didn't need any explicit upgrade/migration. Same reason that we didn't need anything for restores. Updated PR + rebased, PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @dt, @herkolategan, and @renatolabs)
d4a1a44 to
1dba482
Compare
adityamaru
left a comment
There was a problem hiding this comment.
Thanks for doing this, it makes me very happy to see us dropping this value.
Fixes cockroachdb#89233. Release note (general change): The GC TTL previously defaulted to 25h. This value was configurable using `ALTER RANGE DEFAULT CONFIGURE ZONE USING gc.ttlseconds = <whatever>`, but also possible to scope to specific schema objects using `ALTER {DATABASE,TABLE,INDEX} CONFIGURE ZONE USING ...`. This value determine how long overwritten values were retained. The `RANGE DEFAULT` value is now lowered to 4h but only for freshly created clusters. When existing clusters upgrade onto this release, we will respect whatever value they were using before the upgrade for all their schema objects. This will be 25h if the GC TTL was never altered, or some specific value if set explicitly. Full cluster backups taken on earlier version clusters, when restored to clusters that started off at v23.1, will use the GC TTL recorded in the backup image. We've found the 25h value to translate to higher-than-necessary storage costs, especially for workloads where rows are deleted frequently. It can also make for costlier reads with respect to CPU since we currently have to scan over overwritten values to get to the one of interest. Finally, we've also observed cluster instability due to large unsplittable ranges that have accumulated an excessive amount of MVCC garbage. We chose a default of 25h originally to accommodate daily incremental backups with revision history. But with the introduction of scheduled backups introduced in 22.2, we no longer need a large GC TTL. Scheduled backups "chain together" and prevent garbage collection of relevant data to ensure coverage of revision history across backups, decoupling it from whatever value is used for GC TTL. So we no longer need a 25h default, hence this change. The GC TTL determines how far back AS OF SYSTEM TIME queries can go, which now if going past `now()-4h`, will start failing informatively. To support larger windows for AS OF SYSTEM TIME queries, users are encouraged to pick a more appropriate GC TTL and set it using `ALTER ... CONFIGURE ZONE using gc.ttlseconds = <whatever>`. The earlier considerations around storage use, read costs, and stability still apply. Release note (backward-incompatible change): See release note above. Technically this is not a backwards-incompatible change since we're only changing the default value that new clusters are initialized with -- existing clusters will remain unaffected. But it might be worth highlighting this change more prominently to our users for added scrutiny.
1dba482 to
1d01816
Compare
irfansharif
left a comment
There was a problem hiding this comment.
TFTR!
bors r+
|
Build succeeded: |
Fixes #89233.
Release note (general change): The GC TTL previously defaulted to 25h. This value was configurable using
ALTER RANGE DEFAULT CONFIGURE ZONE USING gc.ttlseconds = <whatever>, but also possible to scope to specific schema objects usingALTER {DATABASE,TABLE,INDEX} CONFIGURE ZONE USING .... This value determine how long overwritten values were retained. TheRANGE DEFAULTvalue is now lowered to 4h but only for freshly created clusters. When existing clusters upgrade onto this release, we will respect whatever value they were using before the upgrade for all their schema objects. This will be 25h if the GC TTL was never altered, or some specific value if set explicitly. Full cluster backups taken on earlier version clusters, when restored to clusters that started off at v23.1, will use the GC TTL recorded in the backup image.We've found the 25h value to translate to higher-than-necessary storage costs, especially for workloads where rows are deleted frequently. It can also make for costlier reads with respect to CPU since we currently have to scan over overwritten values to get to the one of interest. Finally, we've also observed cluster instability due to large unsplittable ranges that have accumulated an excessive amount of MVCC garbage. We chose a default of 25h originally to accommodate daily incremental backups with revision history. But with the introduction of scheduled backups introduced in 22.2, we no longer need a large GC TTL. Scheduled backups "chain together" and prevent garbage collection of relevant data to ensure coverage of revision history across backups, decoupling it from whatever value is used for GC TTL. So we no longer need a 25h default, hence this change.
The GC TTL determines how far back AS OF SYSTEM TIME queries can go, which now if going past
now()-4h, will start failing informatively. To support larger windows for AS OF SYSTEM TIME queries, users are encouraged to pick a more appropriate GC TTL and set it usingALTER ... CONFIGURE ZONE using gc.ttlseconds = <whatever>. The earlier considerations around storage use, read costs, and stability still apply.Release note (backward-incompatible change): See release note above. Technically this is not a backwards-incompatible change since we're only changing the default value that new clusters are initialized with -- existing clusters will remain unaffected. But it might be worth highlighting this change more prominently to our users for added scrutiny.