Skip to content

[Bug] Flink Mongo CDC can not fetch incremental data after snapshot dump success, but it can fetch incremental data when set copy.existing=false #2183

@newsbreak-tonglin

Description

@newsbreak-tonglin

Search before asking

  • I searched in the issues and found nothing similar.

Flink version

1.15.2

Flink CDC version

2.4.0 snapshot

Database and its version

Mongo 4.4
Hudi 0.12.1

Minimal reproduce step

Flink SQL:

CREATE TABLE mongo_cdc_test (
_id string,
...
...
PRIMARY KEY(_id) NOT ENFORCED
)
WITH (
'connector' = 'mongodb-cdc',
'poll.max.batch.size' = '100',
'hosts' = 'xxx:27017',
'username' = 'xx',
'password' = 'xx',
'database' = 'xx',
'collection' = 'xx',
'heartbeat.interval.ms' = '60000',
'scan.incremental.snapshot.enabled' = 'true',
'copy.existing' = 'true'
)

CREATE TABLE hudi_test(
_id string,
...
...
PRIMARY KEY(_id) NOT ENFORCED
)
WITH (
'connector' = 'hudi',
'path' = 'xx',
'table.type' = 'MERGE_ON_READ',
'index.type' = 'BUCKET',
'hoodie.datasource.write.recordkey.field' = '_id',
'hoodie.bucket.index.hash.field' = '_id',
'hoodie.bucket.index.num.buckets' = '64',
'changelog.enabled'= 'true',
'compaction.async.enabled'='true',
'compaction.tasks'= 'xx',
'compaction.trigger.strategy'= 'time_elapsed',
'compaction.delta_seconds'= '3600',
'compaction.max_memory'= '1024',
'write.option' = 'upsert',
'read.streaming.enabled'= 'true',
'read.streaming.check-interval'= '4',
'hive_sync.enable' = 'true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = 'thrift://xx:9083',
'hive_sync.table'='xx',
'hive_sync.auto_create_database'='false',
'hive_sync.table.strategy'='ALL',
'hive_sync.db'='default',
'write.tasks'='xx'
)

INSERT INTO hudi_test SELECT * FROM mongo_cdc_test

What did you expect to see?

I expect to use flink Mongo CDC to sync snapshot and incremental data from mongo and write it to Hudi

What did you see instead?

Flink Mongo CDC can not fetch incremental data after snapshot dump success, but it can fetch incremental data when set copy.existing=false
mongo cdc source is busy
image
mongo cdc source can not fetch any change streams from mongodb
image
hudi deltacommit instant is in a earlier time
image

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions