[postgres] Fix data lost problem when new lsn committed to slot between snapshotState and notifyCheckpointComplete #2539
Conversation
@loserwang1024 @leonardBang Would you help to review this PR?
loserwang1024 left a comment
@lzshlzsh, thanks a lot for your contribution. I have provided some advice below:
private long maxCompletedCheckpointId;

public IncrementalSourceReader(
Since only a few types of CDC source need to commit offsets, why not create another subclass (named something like IncrementalSourceReaderWithCommit; not a great name, just an example)?
Other CDC connectors such as MySQL have no need to maintain this state for checkpoints and should not have to perform these redundant operations.
good idea
@loserwang1024 Thanks for the review. IncrementalSourceReaderWithCommit has been added and is currently used by postgres-cdc.
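For readers following this thread, here is a minimal standalone skeleton of the idea; the class and method names below are illustrative, not the exact PR code. The base reader stays free of commit logic, and only the commit-aware subclass used by postgres-cdc adds the per-checkpoint bookkeeping.

```java
// Illustrative skeleton only; names do not match the PR's classes exactly.
public class ReaderSplitSketch {

    /** Generic reader: no per-checkpoint bookkeeping (e.g. mysql-cdc style connectors). */
    public static class IncrementalReaderBase {
        public void snapshotState(long checkpointId) {
            // snapshot splits as before
        }

        public void notifyCheckpointComplete(long checkpointId) throws Exception {
            // no-op: nothing to commit
        }
    }

    /** Commit-aware reader: only connectors that must confirm offsets (e.g. postgres-cdc) use it. */
    public static class IncrementalReaderWithCommit extends IncrementalReaderBase {
        @Override
        public void snapshotState(long checkpointId) {
            super.snapshotState(checkpointId);
            // additionally remember the current stream offset under this checkpointId
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) throws Exception {
            // commit the offset remembered for checkpointId (see the later sketches in this thread)
        }
    }
}
```

This split keeps connectors such as mysql-cdc unaffected, since they never pay for the extra state.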
if (split.isStreamSplit()) {
    lastCheckPointStreamSplit.put(checkpointId, split.asStreamSplit());
    LOG.debug(
            "snapshot stream split, checkpoint id {}, stream split {}",
Suggested log message: "Snapshot state of stream split: {}, and checkpoint id is {}."
done
for (SourceSplitBase split : stateSplits) {
    if (split.isStreamSplit()) {
        lastCheckPointStreamSplit.put(checkpointId, split.asStreamSplit());
In this design, it seems that more than one split may need to snapshot its state under a single checkpoint id. A Map&lt;Long, List&lt;&gt;&gt; seems better (just like the Kafka source).
What about this case: in the for loop, if more than one split meets the requirement, a later one will overwrite the earlier one in the map.
The code has been moved to IncrementalSourceReaderWithCommit. Currently, only the StreamSplit is needed, and there should be just one StreamSplit per checkpoint. Maybe a TreeMap&lt;Long, StreamSplit&gt; is enough instead of a TreeMap&lt;Long, List&lt;&gt;&gt;, what do you think?
Thanks for your explanation. It seems that the values in TreeMap&lt;Long, StreamSplit&gt; are just different versions of the same stream split, and each StreamSplit carries a lot of redundant information such as tableSchemas and finishedSnapshotSplitInfos. Only startingOffset is actually needed. Why not change TreeMap&lt;Long, StreamSplit&gt; to TreeMap&lt;Long, Offset&gt;?
Thanks, I agree with you.
I have made the modifications according to your suggestions. Could you please review them when you have time?
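A rough sketch of the snapshotState side of that suggestion, reusing the identifiers quoted in this thread; the field name lastCheckpointOffsets and the surrounding signature are assumptions, not the PR's exact code:

```java
// checkpointId -> starting offset of the stream split captured at that checkpoint
private final TreeMap<Long, Offset> lastCheckpointOffsets = new TreeMap<>();

@Override
public List<SourceSplitBase> snapshotState(long checkpointId) {
    List<SourceSplitBase> stateSplits = super.snapshotState(checkpointId);
    for (SourceSplitBase split : stateSplits) {
        if (split.isStreamSplit()) {
            // keep only the starting offset; tableSchemas and finishedSnapshotSplitInfos
            // carried by the StreamSplit would be redundant here
            lastCheckpointOffsets.put(checkpointId, split.asStreamSplit().getStartingOffset());
        }
    }
    return stateSplits;
}
```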
// it will begin consuming data from the PostgreSQL replication stream.
// Within PostgresStreamingChangeEventSource, the context's LSN will be updated.
commitLsn =
        ((PostgresOffset) split.asStreamSplit().getStartingOffset())
The biggest problem with this PR: each split is read within [start, end). If the starting offset is committed and then a failover happens, the message at the starting offset will be lost.
Perhaps what you're referring to is the start and end of the primary-key range?
This PR attempts to solve the data-loss problem in the stream stage. The root cause is that the offset of the stream split in the checkpoint state lags behind the confirmed_flush_lsn of the Postgres slot; the WAL data between them is lost if a failover happens before the next checkpoint succeeds, as in the reproducing test case added by the first commit of this PR.
There is no problem in the scan/snapshot stage.
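To make that gap concrete, here is a small hedged diagnostic (all connection parameters are placeholders, and the class is not part of the PR) that reads the slot's confirmed_flush_lsn from pg_replication_slots; if it is ahead of the LSN stored in the checkpointed stream split, the WAL in between is no longer guaranteed to be replayed after a failover.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public final class SlotLsnProbe {

    /**
     * Returns the slot's confirmed_flush_lsn as text, or null if the slot does not exist.
     * Compare it with the LSN kept in the checkpointed stream split to see the gap.
     */
    public static String confirmedFlushLsn(
            String jdbcUrl, String user, String password, String slotName) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
                PreparedStatement ps =
                        conn.prepareStatement(
                                "SELECT confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = ?")) {
            ps.setString(1, slotName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```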
    }
}

@Override
In the onSplitFinished method, remove finished splits from lastCheckPointStreamSplit, or a memory leak will occur.
lastCheckPointStreamSplit.headMap(checkpointId, true).clear()
Thanks for your review. Yes, the StreamSplit of the current checkpointId should be removed.
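Continuing the snapshotState fragment sketched earlier in this thread, the cleanup on checkpoint completion could look roughly like this; commitOffset is a hypothetical hook, not the PR's exact API:

```java
@Override
public void notifyCheckpointComplete(long checkpointId) throws Exception {
    // commit the offset recorded for this checkpoint (or the latest earlier one, if any)
    Map.Entry<Long, Offset> entry = lastCheckpointOffsets.floorEntry(checkpointId);
    if (entry != null) {
        commitOffset(entry.getValue()); // hypothetical hook that confirms the LSN on the Postgres slot
        // drop this checkpoint's entry and all older ones so the map cannot leak memory
        lastCheckpointOffsets.headMap(checkpointId, true).clear();
    }
}
```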
 * @see CheckpointListener#notifyCheckpointComplete(long)
 */
@Override
default void notifyCheckpointComplete(long checkpointId) throws Exception {}
It seems that this method is no longer used.
We have to provide a default implementation, as the parent CheckpointListener interface does not provide one.
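A minimal sketch of why the empty default is needed; the interface name below is an assumption, only the reference to Flink's CheckpointListener comes from the quoted code:

```java
import org.apache.flink.api.common.state.CheckpointListener;

// Illustrative interface (not the PR's exact name): CheckpointListener declares
// notifyCheckpointComplete(long) without a default, so the extending interface
// supplies an empty default for implementations that have nothing to commit.
public interface OffsetCommitAware extends CheckpointListener {

    @Override
    default void notifyCheckpointComplete(long checkpointId) throws Exception {
        // no-op by default; postgres-cdc overrides this to commit its offset to the slot
    }
}
```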
(Long)
        postgresOffsetContext
                .getOffset()
                .get(PostgresOffsetContext.LAST_COMMIT_LSN_KEY);
This code is no longer needed, because only the streaming split is allowed to commit the current offset. Moreover, LAST_COMMIT_LSN_KEY is updated by BEGIN or COMMIT rather than by data messages (see PostgresStreamingChangeEventSource#processMessages -> PostgresStreamingChangeEventSource#commitMessage -> PostgresOffsetContext#updateCommitPosition).
Thanks for your review, I will modify the code according to your suggestion. Yes, you are right, LAST_COMMIT_LSN_KEY is updated by BEGIN or COMMIT messages.
I want to explain the data-loss problem in more detail, because it occurred in our production environment during failover; since applying this PR's fix, we have seen no data loss.
Suppose a checkpoint succeeds and, before notifyCheckpoint finishes committing confirmed_flush_lsn, a table UPDATE event arrives as a BEGIN/UPDATE/COMMIT LSN sequence; notifyCheckpoint will then commit the COMMIT's LSN to the slot. If a failover happens at this moment and the job restores from that checkpoint, it begins consuming from the slot's confirmed_flush_lsn (i.e. the COMMIT's LSN) rather than from the LSN stored in the checkpoint's stream split, because the stream split's LSN &lt; the slot's confirmed_flush_lsn. From a high-level view, the table's UPDATE event is lost.
Thanks a lot for your explanation, I totally agree with you. Moreover, I have also found that the Postgres offset commit was not bound to the checkpoint id before your PR: if five checkpoints are triggered consecutively and then the first one completes (triggering notifyCheckpoint), the offset of the fifth checkpoint gets committed. If a failover then restarts from the first checkpoint, the WAL between [first_checkpoint, fifth_checkpoint) will already have been recycled.
loserwang1024 left a comment
LGTM
@lzshlzsh would you like to rebase the PR onto the latest master so that we can merge it?
Sorry, I saw the message too late. I'll rebase it.
Force-pushed a87ebda to 4d9c0d6
@leonardBang I have rebased on master, would you help to review?
@leonardBang, CC
[postgres] Fix data lost problem when new lsn committed to slot between snapshotState and notifyCheckpointComplete
Force-pushed 4d9c0d6 to 91acacc
leonardBang left a comment
Thanks @lzshlzsh for the contribution and @loserwang1024 for the review work, LGTM
[postgres] Fix data lost problem when new lsn committed to slot between snapshotState and notifyCheckpointComplete (apache#2539) This closes apache#2538. Co-authored-by: sammieliu <[email protected]>
This fixes #2538.