[Fix] [Mongo-cdc] Fallback to timestamp startup mode when resume token has expired #8754

jw-itq · 2025-02-17T12:51:49Z

Purpose of this pull request

Synchronize the Flink-CDC code to fix the Mongo-CDC issue.

When the MongoDB CDC connector tries to create a cursor with an expired resume token during the stream task fetching stage, it crashes with a fatal exception: Command failed with error 280 (ChangeStreamFatalError): cannot resume stream; the resume token was not found.

This PR adds fallback logic that creates the cursor from a timestamp instead. The fallback only runs when:

- Mongo CDC is in the StreamTaskFetch stage
- a ChangeStreamFatalError (280) is raised
- the current ChangeStreamOffset has a valid timestamp field
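
The three conditions above can be read as a single guard. The following is a minimal, dependency-free sketch of that guard, not the connector's code: the class `ResumeFallbackSketch`, the `Stage` enum, the `Offset` type, and the rule that a "valid" timestamp means "present and positive" are all assumptions made for illustration.

```java
/** Hypothetical, dependency-free model of the fallback decision; not the connector's code. */
final class ResumeFallbackSketch {

    /** Server error code quoted in the PR description (ChangeStreamFatalError). */
    static final int CHANGE_STREAM_FATAL_ERROR = 280;

    /** Illustrative stages; STREAM stands for the StreamTaskFetch stage. */
    enum Stage { SNAPSHOT, STREAM }

    /** Stand-in for a change stream offset: a resume token plus an optional timestamp. */
    static final class Offset {
        final String resumeToken;
        final Long timestampSeconds; // null when no timestamp was recorded

        Offset(String resumeToken, Long timestampSeconds) {
            this.resumeToken = resumeToken;
            this.timestampSeconds = timestampSeconds;
        }

        /** Assumption of this sketch: "valid" means present and positive. */
        boolean hasValidTimestamp() {
            return timestampSeconds != null && timestampSeconds > 0;
        }
    }

    /** True only when all three conditions from the description hold. */
    static boolean shouldFallBackToTimestamp(Stage stage, int errorCode, Offset offset) {
        return stage == Stage.STREAM
                && errorCode == CHANGE_STREAM_FATAL_ERROR
                && offset.hasValidTimestamp();
    }

    public static void main(String[] args) {
        Offset withTs = new Offset("expired-token", 1_700_000_000L);
        Offset withoutTs = new Offset("expired-token", null);
        System.out.println(shouldFallBackToTimestamp(Stage.STREAM, 280, withTs));
        System.out.println(shouldFallBackToTimestamp(Stage.STREAM, 280, withoutTs));
        System.out.println(shouldFallBackToTimestamp(Stage.SNAPSHOT, 280, withTs));
    }
}
```

In this model, any other stage, any other error code, or a missing timestamp leaves the original exception path untouched.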

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

If your PR adds any new Jar binary package, please add a License Notice according to the New License Guide
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config
Update the release-note.


Hisoka-X

Thanks @jw-itq. Is there a way to add a test case to verify it?

jw-itq · 2025-02-18T12:44:05Z

> Thanks @jw-itq. Is there a way to add a test case to verify it?

OK, I'll give it a try. It might take some time.

jw-itq · 2025-02-19T07:49:12Z

This PR also fixes the issue where MongoDB CDC fails to resume from the savepoint during restart recovery.

jw-itq · 2025-03-03T08:57:20Z

@Hisoka-X It is difficult to reproduce the case where the resume token is invalid in a test, because SeaTunnel's checkpoint file is parsed automatically and cannot be specified or modified. Can I reproduce such abnormal cases in the test by modifying the checkpoint file?

Hisoka-X · 2025-03-03T09:02:38Z

> @Hisoka-X It is difficult to reproduce the case where the resume token is invalid in a test, because SeaTunnel's checkpoint file is parsed automatically and cannot be specified or modified. Can I reproduce such abnormal cases in the test by modifying the checkpoint file?

Sure.


jw-itq · 2025-03-06T11:20:24Z

@Hisoka-X Fixed. Please help review. Thanks.

jw-itq · 2025-03-10T03:07:09Z

@Hisoka-X Could you please help me review it? Thanks!

Hisoka-X · 2025-03-10T03:12:15Z

Waiting for the test cases to pass.

jw-itq · 2025-03-10T03:14:48Z

> Waiting for the test cases to pass.

Only this one hasn't passed, but I don't understand why.

jw-itq · 2025-03-11T00:28:44Z

> > Waiting for the test cases to pass.
>
> Only this one hasn't passed, but I don't understand why.

@Hisoka-X Hi, sorry to bother you. Does this need to be resolved? Could you help me take a look? Thanks.


jw-itq · 2025-03-11T12:53:37Z

@Hisoka-X The test case has passed. Please help review it, thanks!

Hisoka-X

Thanks @jw-itq !

Hisoka-X · 2025-03-12T04:01:58Z

...g/apache/seatunnel/connectors/seatunnel/cdc/mongodb/source/fetch/MongodbStreamFetchTask.java

+                try {
+                    next = Optional.ofNullable(changeStreamCursor.tryNext());
+                } catch (MongoCommandException e) {
+                    if (MongodbUtils.checkIfChangeStreamCursorExpires(e)) {


Can we add an option that lets the user decide whether to fall back to timestamp restart mode or throw the exception directly?

jw-itq

> Can we add an option that lets the user decide whether to fall back to timestamp restart mode or throw the exception directly?

OK, thank you! When the resume token fails, MongoDB raises error 280, which we catch. At that point the resume token is invalid, but the timestamp is generally still valid, which means the oplog entries are probably still there. If the timestamp cannot be found either, the oplog entries are gone as well; resuming from the checkpoint is then completely impossible, and the exception is thrown as usual. The fallback exists only to make recovery from checkpoints smooth, so I am wondering whether it is really necessary to add a configuration option for users to choose from.
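
The recovery order described in this reply (resume token first, timestamp second, otherwise fail) can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `CursorRecoverySketch`, `CommandFailure`, and `openWithFallback` are hypothetical names, and the two suppliers stand in for whatever actually opens the change stream cursor.

```java
import java.util.function.Supplier;

/** Hypothetical model of "try the resume token, then the timestamp, else fail". */
final class CursorRecoverySketch {

    /** Error code quoted in the PR description (ChangeStreamFatalError). */
    static final int CHANGE_STREAM_FATAL_ERROR = 280;

    /** Stand-in for a driver command exception that carries a server error code. */
    static final class CommandFailure extends RuntimeException {
        final int code;

        CommandFailure(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    /**
     * Opens a cursor from the resume token; on error 280 retries once from the timestamp.
     * Any failure of the timestamp attempt propagates, which models the case where the
     * oplog no longer covers the checkpointed position.
     */
    static <C> C openWithFallback(Supplier<C> byResumeToken, Supplier<C> byTimestamp) {
        try {
            return byResumeToken.get();
        } catch (CommandFailure e) {
            if (e.code != CHANGE_STREAM_FATAL_ERROR) {
                throw e; // unrelated failure: do not mask it with a fallback
            }
            return byTimestamp.get();
        }
    }

    public static void main(String[] args) {
        String cursor = CursorRecoverySketch.<String>openWithFallback(
                () -> { throw new CommandFailure(280, "resume token was not found"); },
                () -> "cursor-from-timestamp");
        System.out.println(cursor); // prints "cursor-from-timestamp"
    }
}
```

Because the second attempt is not wrapped in another catch, a job whose timestamp is also out of range still fails loudly rather than silently restarting from some other position.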

Hisoka-X

Will an expired cursor cause data loss (or duplicate reads)? If we can be sure this behavior does not lose any data, then not providing an option is OK.

jw-itq

> Will an expired cursor cause data loss (or duplicate reads)? If we can be sure this behavior does not lose any data, then not providing an option is OK.

If the current time is used, data loss is possible. Initially the code did not include the current-timestamp logic, and I don't understand why that logic was added at the time.

The issue of writing a large amount of duplicate data when Mongo CDC restarts has been resolved in this submission.

jw-itq

> Will an expired cursor cause data loss (or duplicate reads)? If we can be sure this behavior does not lose any data, then not providing an option is OK.

There is usually no data loss, because the checkpoint saves both the resume token and the timestamp. If the timestamp is still valid, there is no problem; if it is invalid, an exception is thrown.
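
The difference between restarting from the checkpointed timestamp and restarting from the current time, as discussed in this thread, can be shown with a toy oplog. This is a sketch under assumptions, not connector behavior: `StartPositionSketch` and `eventsFrom` are hypothetical, timestamps are plain numbers, and the start position is modeled as inclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: which oplog events are delivered for a given start timestamp. */
final class StartPositionSketch {

    /** Returns the timestamps delivered when streaming starts at startTs (inclusive in this model). */
    static List<Long> eventsFrom(long[] oplogTimestamps, long startTs) {
        List<Long> delivered = new ArrayList<>();
        for (long ts : oplogTimestamps) {
            if (ts >= startTs) {
                delivered.add(ts);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        long[] oplog = {100L, 101L, 102L, 103L, 104L};
        long checkpointTs = 102L; // last position saved in the checkpoint
        long now = 105L;          // wall-clock time at restart

        // Restart from the checkpoint: event 102 may be re-read, but nothing is skipped.
        System.out.println(eventsFrom(oplog, checkpointTs)); // prints [102, 103, 104]

        // Restart from the current time: events 103 and 104 are never delivered.
        System.out.println(eventsFrom(oplog, now)); // prints []
    }
}
```

In this model the checkpointed timestamp trades a possible duplicate for completeness, while the current time trades completeness for a clean start, which matches the data-loss concern raised above.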

jw-itq added 24 commits August 22, 2024 19:10

[Fix] [sink elasticsearch] Fix the issue of sink-es saveMode conflict…

d272789

…ing with Elasticsearch's automatic index creation apache#7430

[Fix] [sink elasticsearch] Fix the issue of sink-es saveMode and es a…

4ebc66b

…utomatic index creation conflict apache#7430

[Bug] [sink elasticsearch] the savemode of sink-es conficts with es a…

6810d80

…utomatically creating indexes based on templates apache#7430

[Fix] [sink elasticsearch] Fix the issue of sink-es saveMode and es a…

ac50c3e

…utomatic index creation conflict apache#7430

[Fix] [sink elasticsearch] Fix the issue of sink-es saveMode and es a…

6e8ccc9

…utomatic index creation conflict apache#7430

[Doc] Add IGNORE savemode type into docment apache#7443

3936038

Merge branch 'apache:dev' into dev

3bc24c2

Merge branch 'apache:dev' into dev

c26e89d

Merge branch 'apache:dev' into dev

b5f5162

Merge branch 'apache:dev' into dev

0347da2

Merge branch 'apache:dev' into dev

4bce27f

Merge branch 'apache:dev' into dev

d4993f3

Merge branch 'apache:dev' into dev

5d61e76

Merge branch 'apache:dev' into dev

9f85118

Merge branch 'apache:dev' into dev

bc41c79

Merge branch 'apache:dev' into dev

4b21c83

Merge branch 'apache:dev' into dev

af53478

Merge branch 'apache:dev' into dev

1771001

Merge branch 'apache:dev' into dev

2ea8f05

Merge branch 'apache:dev' into dev

9e8179d

Merge branch 'apache:dev' into dev

c65e6cb

Merge branch 'apache:dev' into dev

5f2d00b

Avoid mongodb source to read data after high_watermark in backfill phase

16b96dd

Fallback to timestamp startup mode when resume token has expired

a33630e

github-actions bot added connectors-v2 cdc labels Feb 17, 2025

Hisoka-X reviewed Feb 18, 2025


fix the issue of mongo restore offset

e8346c3

add mongo test case

b1ed777

github-actions bot added the e2e label Feb 22, 2025

add mongo resume token test case

05df0e0

github-actions bot added the Zeta label Mar 5, 2025

jw-itq added 9 commits March 5, 2025 15:50

code style

611c9e9

slf4j.version

f32aa66

add scope

f0f68e4

Merge branch 'apache:dev' into dev

f387d8d

create checkpoint dir

a10bbb2

testMongodbCdcMultiTableToMysqlE2e

7c85cc2

add test timeout

ece7ebc

Merge remote-tracking branch 'refs/remotes/origin/dev' into mongo-fix…

17e09d5

…-resume2

add test timeout

2669bf0

jw-itq added 2 commits March 11, 2025 14:31

Merge branch 'apache:dev' into dev

a94e735

Merge remote-tracking branch 'refs/remotes/origin/dev' into mongo-fix…

025a102

…-resume2

Hisoka-X reviewed Mar 12, 2025


Hisoka-X approved these changes Mar 12, 2025


Hisoka-X removed the need add test case label Mar 12, 2025

github-actions bot added approved reviewed labels Mar 12, 2025

hailin0 approved these changes Mar 13, 2025


hailin0 merged commit afc990d into apache:dev Mar 13, 2025
10 checks passed
