Conversation

@XenosK
Contributor

@XenosK XenosK commented Oct 15, 2024

Purpose of this pull request

fix: #7794

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Member

@Hisoka-X Hisoka-X left a comment

Please add a test case.

@github-actions github-actions bot added the e2e label Oct 15, 2024
@XenosK
Contributor Author

XenosK commented Oct 15, 2024

Please add a test case.

A test case has been added; please take a look.

Member

@Hisoka-X Hisoka-X left a comment

XenosK and others added 7 commits October 16, 2024 11:01
…ain/java/org/apache/seatunnel/connectors/cdc/base/source/enumerator/splitter/AbstractJdbcSourceChunkSplitter.java

Co-authored-by: Jia Fan <[email protected]>
…ain/java/org/apache/seatunnel/connectors/cdc/base/dialect/JdbcDataSourceDialect.java

Co-authored-by: Jia Fan <[email protected]>
…ain/java/org/apache/seatunnel/connectors/cdc/base/dialect/JdbcDataSourceDialect.java

Co-authored-by: Jia Fan <[email protected]>
@XenosK XenosK changed the title from [Feature][Connector-V2]Jdbc chunk split add “snapshot.split.column” params #7794 to [Feature][Connector-V2]Jdbc chunk split add snapshotSplitColumn config #7794 Nov 11, 2024
hailin0 previously approved these changes Nov 12, 2024
Hisoka-X previously approved these changes Nov 13, 2024
Member

@Hisoka-X Hisoka-X left a comment

Thanks @XenosK !

@Hisoka-X Hisoka-X merged commit b6c6dc0 into apache:dev Nov 19, 2024
4 checks passed
@ASDFSA13

ASDFSA13 commented Jan 8, 2026

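@Hisoka-X The logic is still the same in 2.3.12, so users may need a warning: when table-names-config is configured, the column specified there must be one of the unique-key columns. This matters especially for MySQL 8.0 tables structured like the following: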
PRIMARY KEY (id),
UNIQUE KEY uni_str (str_value_sha256,seq,ledger_date,is_delete) USING BTREE,
KEY idx_status (seller_id,market_place_id,is_delete) USING BTREE,
KEY idx_created_time (create_time) USING BTREE,

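For a structure like this in MySQL, setting the split column to id in table-names-config, as this PR allows, has no effect. The current code path takes over and picks the split key from the unique key, so the id column the user configured is never selected.

It would be best to update the MySQL CDC documentation for 2.3.12. If needed, I can provide test results for some edge cases to include as an appendix to the documentation.

Please take a look at this patch: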
0001-add-splitkey-mysql8.0.unique-key-primary-key.patch

@ASDFSA13

ASDFSA13 commented Jan 8, 2026

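The core logic is:

First, look at the user-configured snapshotSplitColumn.

Before using it, however, the code checks whether the configured column belongs to the unique keys returned by dialect.getUniqueKeys().
If the column is not among the unique keys, the code logs a warning and ignores the configuration.
It then falls back to automatic selection:

First iterate over the PrimaryKey columns (id).
Then iterate over the UniqueKey columns (seq and so on).
Finally pick the "best" column by type priority: TINYINT > SMALLINT > INT > BIGINT > ...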

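This explains the whole chain of symptoms:

My snapshotSplitColumn=id was judged "not unique key".
After falling back to automatic selection:

id is BIGINT
seq is INT
By type priority INT(3) < BIGINT(4), so seq is selected in the end.
Even though the primary key is id, seq is still used for splitting.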
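
The selection described above can be modelled in a few lines. This is a simplified Python sketch of the behaviour as reported in this comment, not SeaTunnel's Java implementation. The names choose_split_column, TYPE_RANK and UNRANKED are hypothetical. The ranks 3 and 4 come from the comment; ranks 1 and 2 are inferred from the quoted ordering. Types the comment elides with "..." are assumed to rank last, and the primary key is assumed not to be part of the reported unique keys, as observed in the comment. Only id and seq appear in the example because the comment states types for those two columns only.

```python
import logging
from typing import Dict, List, Optional

log = logging.getLogger("split_key_sketch")

# Lower rank wins: INT (3) is preferred over BIGINT (4).
# Ranks 3 and 4 are quoted in the comment; 1 and 2 follow its stated ordering.
TYPE_RANK = {"TINYINT": 1, "SMALLINT": 2, "INT": 3, "BIGINT": 4}
# Assumption: any type not listed above ranks after the listed ones.
UNRANKED = len(TYPE_RANK) + 1


def choose_split_column(
    column_types: Dict[str, str],
    primary_key: List[str],
    unique_keys: List[List[str]],
    configured: Optional[str] = None,
) -> str:
    """Return the split column under the rules described in the comment."""
    unique_columns = {col for key in unique_keys for col in key}
    if configured is not None:
        if configured in unique_columns:
            return configured
        # The configured column is dropped with a warning only.
        log.warning("split column %r is not in a unique key; ignoring it", configured)
    # Automatic selection: primary-key columns first, then unique-key columns.
    candidates = list(primary_key) + [col for key in unique_keys for col in key]
    # min() keeps the earliest candidate on ties, so primary-key columns win ties.
    return min(candidates, key=lambda col: TYPE_RANK.get(column_types[col], UNRANKED))


types = {"id": "BIGINT", "seq": "INT"}
chosen = choose_split_column(types, ["id"], [["seq"]], configured="id")
print(chosen)  # prints "seq": the configured id is ignored and INT outranks BIGINT
```

Under these assumptions, the only way to have id used is for it to appear among the unique-key columns, which matches the reported behaviour.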

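I assumed the configured column would be enforced, but it was silently ignored, and the logs are the only place this shows up. The official MySQL CDC documentation does not help either: the split key had already been set exactly as the documentation describes, so there was no obvious way to adjust the configuration.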
