@hawk9821
Contributor

@hawk9821 hawk9821 commented Aug 7, 2024

Purpose of this pull request

Support dynamic bucket splitting to improve Paimon write efficiency.

Does this PR introduce any user-facing change?

no

How was this patch tested?

e2e: PaimonSinkDynamicBucketIT
UT: PaimonBucketAssignerTest#bucketAssigner
e2e case: PaimonSinkDynamicBucketIT#testPaimonBucketCountOnSparkAndFlink. Because the Spark and Flink engines cannot automatically create a Paimon table in the local file system on worker nodes, this e2e case runs against a local HDFS environment.

Check list

@Hisoka-X Hisoka-X changed the title from "[Feature][CONNECTORS-V2-Paimon] Support dynamic bucket splitting improves Paimon writing efficiency" to "[Feature][Connector-Paimon] Support dynamic bucket splitting improves Paimon writing efficiency" Aug 7, 2024
@Hisoka-X
Member

Hisoka-X commented Aug 7, 2024

cc @dailai and @TaoZex

@github-actions github-actions bot removed the flink label Aug 8, 2024
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch 2 times, most recently from 1b445f0 to a5d18ee Compare August 21, 2024 00:44
@github-actions github-actions bot added the dependencies label Aug 21, 2024
@dailai
Contributor

dailai commented Aug 21, 2024

Please retrigger the ci.

@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch 5 times, most recently from 50764df to c93f7b8 Compare August 23, 2024 01:11
@github-actions github-actions bot removed the dependencies label Aug 23, 2024
@hawk9821 hawk9821 requested review from Hisoka-X and dailai August 24, 2024 10:11
@dailai
Contributor

dailai commented Aug 26, 2024

Thanks @hawk9821. Good job. I think your e2e tests need a multi-parallelism case; the current cases all run with single parallelism. That way we can effectively verify whether dynamic bucketing changes with the job's degree of parallelism. Also, I think you should check the bucket count in every case instead of in a separate case. In addition, each case should verify that the dynamic-bucket.target-row-num argument works as expected.
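
To make the requested bucket-count check concrete, here is a minimal sketch of a simplified dynamic bucket model. It is an illustration only, not the PaimonBucketAssigner from this PR or Paimon's real implementation. The class name SimpleDynamicBucketAssigner and its methods are hypothetical. It assumes that a new primary key goes to the first bucket holding fewer than dynamic-bucket.target-row-num rows, and that a key already seen stays in its original bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of dynamic bucketing (assumption, not the PR's code).
class SimpleDynamicBucketAssigner {
    // Stand-in for the dynamic-bucket.target-row-num option.
    private final long targetRowNum;
    // Remembers which bucket each primary key was first assigned to.
    private final Map<String, Integer> keyToBucket = new HashMap<>();
    // Number of distinct keys currently held by each bucket.
    private final List<Long> rowsPerBucket = new ArrayList<>();

    SimpleDynamicBucketAssigner(long targetRowNum) {
        if (targetRowNum <= 0) {
            throw new IllegalArgumentException("targetRowNum must be positive");
        }
        this.targetRowNum = targetRowNum;
    }

    // Returns the bucket for the key, creating a new bucket only when all are full.
    int assign(String primaryKey) {
        Integer existing = keyToBucket.get(primaryKey);
        if (existing != null) {
            return existing; // a known key never moves to another bucket
        }
        int bucket = -1;
        for (int i = 0; i < rowsPerBucket.size(); i++) {
            if (rowsPerBucket.get(i) < targetRowNum) {
                bucket = i;
                break;
            }
        }
        if (bucket < 0) {
            rowsPerBucket.add(0L);
            bucket = rowsPerBucket.size() - 1;
        }
        rowsPerBucket.set(bucket, rowsPerBucket.get(bucket) + 1);
        keyToBucket.put(primaryKey, bucket);
        return bucket;
    }

    int bucketCount() {
        return rowsPerBucket.size();
    }
}
```

Under this model, assigning 10 distinct keys with a target of 3 rows per bucket yields 4 buckets, and re-assigning an existing key does not change the count. A real e2e case would read the bucket count from the written table rather than from a model like this.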

@github-actions github-actions bot added the dependencies, CI&CD, and core labels and removed the paimon label Aug 29, 2024
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch 2 times, most recently from 974e481 to af318d5 Compare September 3, 2024 05:51
@Hisoka-X Hisoka-X self-assigned this Sep 4, 2024
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch from af318d5 to 3874aae Compare September 12, 2024 15:53
@github-actions github-actions bot added the core and flink labels and removed the paimon label Sep 12, 2024
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch 3 times, most recently from ad8281b to e0cd7d8 Compare September 12, 2024 17:11
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch 4 times, most recently from dc92c7b to a1b351d Compare September 18, 2024 08:39
wuchunfu and others added 5 commits September 20, 2024 10:08
* [Improve] Update snapshot version to 2.3.8

* [Improve] Update snapshot version to 2.3.8
[Feature][CONNECTORS-V2-Paimon] spark task parallelism
[Feature][CONNECTORS-V2-Paimon] update doc

[Feature][CONNECTORS-V2-Paimon] write to dynamic bucket table , spark flink e2e
@hawk9821 hawk9821 force-pushed the paimon_dynamic_bucket branch from b34a78a to 9a7dab1 Compare September 20, 2024 02:18
Member

@Hisoka-X Hisoka-X left a comment

LGTM if CI passes. Thanks @hawk9821

@hailin0 hailin0 merged commit bc0326c into apache:dev Sep 20, 2024
5 participants