Skip to content

Rename Persisted LSN to Durable LSN#3277

Merged
pcholakov merged 1 commit intomainfrom
push-ovqvqktzzuvs
May 31, 2025
Merged

Rename Persisted LSN to Durable LSN#3277
pcholakov merged 1 commit intomainfrom
push-ovqvqktzzuvs

Conversation

@pcholakov
Copy link
Contributor

@pcholakov pcholakov commented May 15, 2025

This change renames "persisted LSN" to "durable LSN" to reflect partition data which has been durably committed to local disk by a node.

This is a breaking change for the following metrics but I feel this is justified; happy to undo if there are any concerns:

  • restate.partition.last_applied_lsn -> restate.partition.applied_lsn
  • restate.partition.applied_lsn_lag -> restate.partition.applied_lsn_lag
  • restate.partition.last_persisted_lsn -> restate.partition.durable_lsn

Additionally:

  • Updates restatectl column display to "Durable LSN"
  • Keeps the NodeStateResponse RPC message (response to GetNodeState, used by Custer Controller) using the old naming as we still have a FlexBuffers-based WireEncode fallback implementation for these; this will be safe to rename when we drop V1 networking support
  • Adds a new durable_log_lsn column to the partition_state view, mirroring persisted_log_lsn (which will be separately documented as deprecated)
  • other purely internal usages
% ./target/debug/restatectl sql 'select partition_id, effective_mode, applied_log_lsn, persisted_log_lsn, durable_log_lsn from partition_state'
 PARTITION_ID  EFFECTIVE_MODE  APPLIED_LOG_LSN  PERSISTED_LOG_LSN  DURABLE_LOG_LSN
 0             leader          53330            50378              50378
 1             leader          52970            50379              50379
 2             leader          52624            50360              50360
 3             leader          53013            50366              50366
 4             leader          51890            50366              50366
 5             leader          53817            50361              50361
 6             leader          53013            50361              50361
 7             leader          53114            50374              50374
 8             leader          53348            50361              50361
 9             leader          53336            50362              50362
 10            leader          53421            50363              50363
 11            leader          53320            50362              50362
 12            leader          54089            50367              50367
 13            leader          55178            50366              50366
 14            leader          52526            50361              50361
 15            leader          53079            50362              50362
 16            leader          54472            50378              50378
 17            leader          53002            50378              50378
 18            leader          52672            50361              50361
 19            leader          53361            50373              50373
 20            leader          53026            50360              50360
 21            leader          53078            50381              50381
 22            leader          53895            50367              50367
 23            leader          53849            50364              50364

24 rows. Query took 10.639125ms

% ./target/debug/restatectl sql 'describe partition_state'
 COLUMN_NAME             DATA_TYPE                               IS_NULLABLE
 partition_id            UInt32                                  YES
 plain_node_id           Utf8                                    YES
 gen_node_id             Utf8                                    YES
 target_mode             Utf8                                    YES
 effective_mode          Utf8                                    YES
 updated_at              Timestamp(Millisecond, Some("+00:00"))  YES
 leader_epoch            UInt64                                  YES
 leader                  Utf8                                    YES
 applied_log_lsn         UInt64                                  YES
 last_record_applied_at  Timestamp(Millisecond, Some("+00:00"))  YES
 skipped_records         UInt64                                  YES
 replay_status           Utf8                                    YES
 persisted_log_lsn       UInt64                                  YES
 durable_log_lsn         UInt64                                  YES
 archived_log_lsn        UInt64                                  YES
 target_tail_lsn         UInt64                                  YES

16 rows. Query took 6.38025ms

Fixes: #3252

@pcholakov pcholakov requested a review from muhamadazmy May 15, 2025 15:19
@pcholakov pcholakov marked this pull request as ready for review May 15, 2025 15:22
@github-actions
Copy link

github-actions bot commented May 15, 2025

Test Results

  7 files  ±0    7 suites  ±0   5m 0s ⏱️ +19s
 54 tests ±0   53 ✅ ±0  1 💤 ±0  0 ❌ ±0 
223 runs  ±0  220 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit f1b5d9d. ± Comparison against base commit 63496a2.

♻️ This comment has been updated with latest results.

pub durable_lsn: Option<Lsn>,
#[bilrost(11)]
pub last_archived_log_lsn: Option<Lsn>,
pub archived_lsn: Option<Lsn>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect those renames might cause deserialization errors for V1 protocol. @pcholakov did you validate backward compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test a mixed cluster running main and my own changes (only after your fix for gossip went in), and yes, it does work. I believe we never serialize this struct using FlexBuffers or JSON, where this could be a problem. I'd like to do some more though, I've just confirmed that nodes form a cluster, and that all nodes start up partitions - and that all nodes can see the applied + durable LSNs from their peers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's Remove Serialize/Deseralize and see what breaks then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly me! This is used in NodeStateResponse, which is sent in response to Cluster Controller's GetNodeState. My testing was only using restatectl to talk to servers but a V1 admin node does not see these renamed attributes properly. Reverted!

/// Last persisted log LSN
persisted_log_lsn: DataType::UInt64,
/// Durable log LSN
durable_log_lsn: DataType::UInt64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change. This will change the sql schema

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check with @igalshilman what is the best approach to update this. Do we keep both names and mark the old name as deprecated, and keep it in code for couple of versions before removing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muhamadazmy we try not to do that, but we also have a versioning mechanism.
Both the CLI and the restate admin api, decelerate versions. CLI via the agent / server strings.
It happens here in the CLI https://github.com/restatedev/restate/blob/main/cli/src/clients/admin_client.rs#L150
AFIK @jackkleeman introduced this mechanism in both ends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have been more careful with this; I knew I am renaming the column in the view but switched it anyway. You're right that we should keep this as-is - reverted!

PS. I believe the versioning mechanism is purely about query data encoding, rather than the schema itself. At some point we might want to think about how we version the schema too, but I don't want to open that right now.

@jackkleeman
Copy link
Contributor

btw, i noticed there are still a few config flags related to the old persisted lsn watchdog? like the interval

@pcholakov pcholakov force-pushed the push-ovqvqktzzuvs branch from c0af645 to f25d18d Compare May 26, 2025 17:02
@pcholakov
Copy link
Contributor Author

btw, i noticed there are still a few config flags related to the old persisted lsn watchdog? like the interval

@jackkleeman I've deprecated these in #3278 - one could argue we could just as well drop them, but I figured let's give folks a heads-up :-)

@AhmedSoliman
Copy link
Contributor

btw, i noticed there are still a few config flags related to the old persisted lsn watchdog? like the interval

I'm not sure if this is necessary. I'd remove them from new config dumps and only keep them in shadow to give a notice.

Additionally, can you send a PR to update the relevant documentation to align with these changes please?

Comment on lines 56 to 57
/// Durable log LSN (previously, persisted log LSN)
persisted_log_lsn: DataType::UInt64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we double-write to the new name so we can remove the old one eventually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, great suggestion!

Copy link
Contributor

@muhamadazmy muhamadazmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pcholakov for this PR. I agree with Ahmed that we probably should double-write the new name in the schema and update the docs of the older one as deprecated so we can drop it in the future. Other than that, it's good to go

@pcholakov pcholakov closed this May 30, 2025
@pcholakov pcholakov deleted the push-ovqvqktzzuvs branch May 30, 2025 15:33
@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2025
@pcholakov pcholakov restored the push-ovqvqktzzuvs branch May 30, 2025 15:33
@pcholakov pcholakov reopened this May 30, 2025
@restatedev restatedev unlocked this conversation May 30, 2025
@pcholakov pcholakov force-pushed the push-ovqvqktzzuvs branch from 99aecea to f1b5d9d Compare May 30, 2025 20:53
@pcholakov
Copy link
Contributor Author

Thank you @pcholakov for this PR. I agree with Ahmed that we probably should double-write the new name in the schema and update the docs of the older one as deprecated so we can drop it in the future. Other than that, it's good to go

Done! Thanks for the suggestion, I'll update the docs too 😃

@pcholakov
Copy link
Contributor Author

btw, i noticed there are still a few config flags related to the old persisted lsn watchdog? like the interval

I'm not sure if this is necessary. I'd remove them from new config dumps and only keep them in shadow to give a notice.

Already done in #3278!

@pcholakov pcholakov merged commit 3c0192b into main May 31, 2025
26 checks passed
@pcholakov pcholakov deleted the push-ovqvqktzzuvs branch May 31, 2025 03:55
@github-actions github-actions bot locked and limited conversation to collaborators May 31, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace Persisted LSN with Durable LSN

5 participants