-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-45917: [C++][Acero] Add flush taskgroup to enable parallelization #45918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
|
@westonpace @pitrou @zanmato1984 Please take a look when you have time. Thanks! |
b6387ae to
aeb47d3
Compare
|
Hi @uchenily , thank you for opening the PR. I would like add some more about the problem this PR is trying to address: Just wondering if you have encountered any case that the above problem causes real performance issue and how bad it is? And how much does this PR improve it? |
|
@zanmato1984 I ran a test It should be noted that during this test above, I modified What I mean is that kNumRowsPerScanTask also significantly impacts the test results. However, since I couldn't determine a more reasonable value for this parameter now, I haven't fully understood how this parameter affects the test results, so I will leave it unchanged in this PR. |
Thank you for the info. May I know the number of threads in your test? |
|
@zanmato1984 I ran the test on a 112-core machine, using the default values for both CPU thread pool and IO thread pool. |
|
Thank you. After some math I think I can explain the perf boost in your setup. Your thread count is Summarizing the comparison:
In terms of the improvement of this PR for But anyway, I think the effectiveness of this PR is independent of |
zanmato1984
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks nice. Only one nit.
7e5a2b0 to
a1934df
Compare
|
@github-actions crossbow submit -g cpp |
zanmato1984
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll merge once the CI is good.
Thank you for your contribution @uchenily !
|
Revision: a1934df Submitted crossbow builds: ursacomputing/crossbow @ actions-750fc6ab42 |
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for commit a1934df. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
|
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit a1934df. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
|
Benchmark failures are unrelated. Seems it has been broken for a while. cc @raulcd may know more about this. I'm merging now. |
|
After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit c753740. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
|
Thanks, I've opened a blocker: It seems Arrow fails to build on the buildkite runners |
…ation (apache#45918) ### Rationale for this change Closes apache#45917 ### What changes are included in this PR? Execute JoinResultMaterialize.Flush() in task group to enhance parallelism in downstream processing, improving the performance of hash join. ### Are these changes tested? Yes. ### Are there any user-facing changes? None. * GitHub Issue: apache#45917 Authored-by: uchenily <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
Rationale for this change
Closes #45917
What changes are included in this PR?
Execute JoinResultMaterialize.Flush() in task group to enhance parallelism in downstream processing, improving the performance of hash join.
Are these changes tested?
Yes.
Are there any user-facing changes?
None.