Skip to content

[BEAM-6721] Set numShards dynamically for TextIO.write()#15500

Merged
pabloem merged 1 commit intoapache:masterfrom
baeminbo:BEAM-6721
Oct 26, 2021
Merged

[BEAM-6721] Set numShards dynamically for TextIO.write()#15500
pabloem merged 1 commit intoapache:masterfrom
baeminbo:BEAM-6721

Conversation

@baeminbo
Copy link
Copy Markdown
Contributor

Currently, TextIO.Write only supports withNumShards(int) so that it cannot be changed dynamically in Dataflow template jobs. This PR adds withNumShards(ValueProvider<Integer>) to resolve this issue.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@baeminbo
Copy link
Copy Markdown
Contributor Author

@lukecwik Hi, could you review this PR?

@robertwb
Copy link
Copy Markdown
Contributor

Any reason not to use flex templates instead? (One of the motivations for this was that plumbing ValueProvider's absolutely everywhere leads to a lot of boilerplate and cruft.)

@baeminbo
Copy link
Copy Markdown
Contributor Author

Hi Robert, it makes sense if Dataflow deprecates classic templates in favor of flex templates. Do they have the plan?

@pabloem pabloem requested a review from ihji September 23, 2021 18:42
@tvalentyn
Copy link
Copy Markdown
Contributor

IIRC there is no plan to deprecate classic templates atm, cc: @an2x.

@tvalentyn
Copy link
Copy Markdown
Contributor

Run Java PreCommit

1 similar comment
@tvalentyn
Copy link
Copy Markdown
Contributor

Run Java PreCommit

@aaltay
Copy link
Copy Markdown
Member

aaltay commented Oct 15, 2021

What is the next step on this PR?

@baeminbo
Copy link
Copy Markdown
Contributor Author

If Flex template is recommended for dynamically changeable shards for TextIO, I think this feature is not necessary, but we ask Dataflow Template creator team to migrate to Flex template from Classic template, at least for the templates writing data with TextIO.

@aaltay
Copy link
Copy Markdown
Member

aaltay commented Oct 15, 2021

If Flex template is recommended for dynamically changeable shards for TextIO, I think this feature is not necessary, but we ask Dataflow Template creator team to migrate to Flex template from Classic template, at least for the templates writing data with TextIO.

Ack. And @an2x could comment on that.

@pabloem
Copy link
Copy Markdown
Member

pabloem commented Oct 19, 2021

Run Java PreCommit

@pabloem
Copy link
Copy Markdown
Member

pabloem commented Oct 25, 2021

I think this is reason enough to include this change. Customers are currently unable to specify this on a supported template, and this would allow us to support them once the template is merged. I'm inclined to just merge this.

@pabloem
Copy link
Copy Markdown
Member

pabloem commented Oct 25, 2021

Run Java PreCommit

@tvalentyn
Copy link
Copy Markdown
Contributor

Run Java PreCommit

@tvalentyn
Copy link
Copy Markdown
Contributor

tvalentyn commented Oct 26, 2021

ES IO test failed yet another time, and I pinged the bug again. +1 to merge if there are no other concerns.

@pabloem pabloem merged commit 84e24ea into apache:master Oct 26, 2021
@pabloem
Copy link
Copy Markdown
Member

pabloem commented Oct 26, 2021

thanks everyone

@baeminbo baeminbo deleted the BEAM-6721 branch April 25, 2022 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants