[Bug] [sqlserver cdc] incremental snapshot phase restore from checkpoint throw NullPointerException #2536

@edmond-kk

Search before asking

  • I searched in the issues and found nothing similar.

Flink version

1.13.6

Flink CDC version

2.5-SNAPSHOT

Database and its version

sqlserver 2016

Minimal reproduce step

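The problem occurs intermittently when running the job against one of our company's tables, which holds roughly 130 million rows. I do not yet know how to reproduce it locally.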

What did you expect to see?

The job should recover from a checkpoint normally.

What did you see instead?

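The job occasionally fails with the exception [Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_all_changes_ ... .]. Because the job recovered automatically from the checkpoint, I had not looked into the cause of that exception before.
Sometimes, however, the job cannot recover from the checkpoint. Exception log: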
2023-10-08 09:28:03,608 ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Uncaught exception in the SplitEnumerator for Source Source: primaryTableSource while handling operator event RequestSplitEvent (host='10.244.1.67') from subtask 1. Triggering job failover.
org.apache.flink.util.FlinkRuntimeException: Failed to assign splits {0=[SnapshotSplit{tableId=CRDB.dbo.tCR0001_V2.0, splitId='CRDB.dbo.tCR0001_V2.0:875', splitKeyType=[ID BIGINT NOT NULL], splitStart=[87508751], splitEnd=[87608761], highWatermark=null}]} due to
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:388) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.assignSplits(SourceCoordinatorContext.java:176) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at org.apache.flink.api.connector.source.SplitEnumeratorContext.assignSplit(SplitEnumeratorContext.java:82) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at com.ververica.cdc.connectors.base.source.enumerator.IncrementalSourceEnumerator.assignSplits(IncrementalSourceEnumerator.java:177) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at com.ververica.cdc.connectors.base.source.enumerator.IncrementalSourceEnumerator.handleSplitRequest(IncrementalSourceEnumerator.java:97) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$handleEventFromOperator$1(SourceCoordinator.java:172) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.lambda$runInEventLoop$8(SourceCoordinator.java:344) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_361]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_361]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_361]
Caused by: java.lang.NullPointerException
at com.ververica.cdc.debezium.history.FlinkJsonTableChangeSerializer.toDocument(FlinkJsonTableChangeSerializer.java:58) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at com.ververica.cdc.connectors.base.source.meta.split.SourceSplitSerializer.writeTableSchemas(SourceSplitSerializer.java:183) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at com.ververica.cdc.connectors.base.source.meta.split.SourceSplitSerializer.serialize(SourceSplitSerializer.java:80) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at com.ververica.cdc.connectors.base.source.meta.split.SourceSplitSerializer.serialize(SourceSplitSerializer.java:44) ~[blob_p-b4f6d17e0c08bbb487e958e0b3e0e49c0ab04224-e97c463e6ea3a571bf45208735d22f4e:?]
at org.apache.flink.runtime.source.event.AddSplitEvent.<init>(AddSplitEvent.java:44) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$null$2(SourceCoordinatorContext.java:198) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at java.util.HashMap.forEach(HashMap.java:1290) ~[?:1.8.0_361]
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$3(SourceCoordinatorContext.java:191) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:386) ~[flink-dist_2.11-1.13.6.jar:1.13.6]
... 9 more
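
The trace above shows the NullPointerException surfacing inside FlinkJsonTableChangeSerializer.toDocument, called from SourceSplitSerializer.writeTableSchemas while the enumerator serializes a snapshot split. One plausible reading, which is an assumption and not a confirmed root cause, is that the split's table-schema map holds a null value after the restore. The sketch below reproduces only that general failure pattern with hypothetical stand-in classes; it uses plain strings instead of the real Debezium types and is not the Flink CDC code. It also shows a guard that fails with a message naming the affected table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins: none of these names are Flink CDC or Debezium APIs.
class NullSchemaSketch {

    // Stand-in for a serializer that dereferences its argument without a null check.
    static String toDocument(String tableChange) {
        return "{\"change\":\"" + tableChange.trim() + "\"}"; // throws NPE when tableChange is null
    }

    // Stand-in for a loop that serializes every schema attached to a split.
    static int writeTableSchemas(Map<String, String> tableSchemas) {
        int written = 0;
        for (Map.Entry<String, String> entry : tableSchemas.entrySet()) {
            toDocument(entry.getValue());
            written++;
        }
        return written;
    }

    // Same loop, but a missing schema is reported with the table name
    // instead of surfacing as a bare NullPointerException.
    static int writeTableSchemasGuarded(Map<String, String> tableSchemas) {
        int written = 0;
        for (Map.Entry<String, String> entry : tableSchemas.entrySet()) {
            if (entry.getValue() == null) {
                throw new IllegalStateException(
                        "No schema recorded for table " + entry.getKey());
            }
            toDocument(entry.getValue());
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> schemas = new LinkedHashMap<>();
        schemas.put("db.dbo.example", null); // assumed state after restore (illustrative table name)

        try {
            writeTableSchemas(schemas);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }

        try {
            writeTableSchemasGuarded(schemas);
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

If the assumption holds, a guard of this kind would not fix the restore, but it would at least identify which table lost its schema.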

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels

bug (Something isn't working)
