Skip to content

Conversation

@avamingli
Copy link
Contributor

@avamingli avamingli commented Jul 27, 2023

We have separated join locus function for non-parallel and parallel mode into cdbpath_motion_for_join() which handles for locus whose parallel_workers is 0 and cdbpath_motion_for_parallel_join() which handles others.

A lot of logic are meaningless in cdbpath_motion_for_parallel_join():
1.Writeable operations for join.
2.Both sides's locus parallel_workers are no more than one.
3.Some locus couldn't participate parallel join yet now.

We have done the job for SegmentGeneralWorkers as outer side join with other locus as inner side, but lack for other types. Refactoring that to keep codes clear.
Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us to resolve them in upcoming fixes.

Main changes:
SingleQE join SegmentGeneralWorkers.
SingleQE join Partitioned(workers>1), except HashedOJ.
SegmentGeneral as outer, Assert(false).
Partitioned(workers>1) join SegmentGeneral.
Partitioned(workers>1) join SegmentGeneralWorkers.
Partitioned(workers>1) join SingleQE.
Partitioned join Partitioned are handled as cdbpath_motion_for_join().
Remove all if conditions for inner side locus.

Authored-by: Zhang Mingli [email protected]

closes: #27


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to @cloudberrydb/dev team for review and approval when your PR is ready🥳

@avamingli
Copy link
Contributor Author

@my-ship-it This function is critical, to keep pr small, this commit doesn't cause any behavior change.

Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us to resolve them in upcoming fixes.

@avamingli
Copy link
Contributor Author

A suggestion for reviewers, it's hard to tell the right by codes diffs.
Compare the logical for parallel join with older and newer codes with two windows may be a little easier..

@avamingli avamingli force-pushed the refactor_parallel_locus_join branch 6 times, most recently from 922038a to 67aa72f Compare July 28, 2023 09:17
@avamingli avamingli self-assigned this Aug 1, 2023
@my-ship-it my-ship-it requested review from foreyes and yjhjstz August 31, 2023 01:19
@avamingli avamingli force-pushed the refactor_parallel_locus_join branch from 67aa72f to 41b0295 Compare October 8, 2023 09:00
We have separated join locus function for non-parallel and parallel
mode into cdbpath_motion_for_join() which handles for locus whose
parallel_workers is 0 and cdbpath_motion_for_parallel_join() which
handles others.

A lot of logic are meaningless in cdbpath_motion_for_parallel_join():
  1.Writeable operations for join.
  2.Both sides's locus parallel_workers are no more than one.
  3.Some locus couldn't participate parallel join yet now.

We have done the job for SegmentGeneralWorkers as outer side join
with other locus as inner side, but lack for other types.
Refactoring that to keep codes clear.
Some wrong logic are found and left CBDB_PARALLEL_FIXME to remind us
to resolve them in upcoming fixes.

Main changes:
SingleQE join SegmentGeneralWorkers.
SingleQE join Partitioned(workers>1), except HashedOJ.
SegmentGeneral as outer, Assert(false).
Partitioned(workers>1) join SegmentGeneral.
Partitioned(workers>1) join SegmentGeneralWorkers.
Partitioned(workers>1) join SingleQE.
Partitioned join Partitioned are handled as cdbpath_motion_for_join().
Remove all if conditions for inner side locus.

Authored-by: Zhang Mingli [email protected]
@avamingli avamingli force-pushed the refactor_parallel_locus_join branch from 41b0295 to eea7cf5 Compare October 9, 2023 07:39
@avamingli avamingli merged commit c00f6d7 into apache:main Oct 9, 2023
@avamingli avamingli deleted the refactor_parallel_locus_join branch October 9, 2023 10:25
@avamingli
Copy link
Contributor Author

Pushed, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cdbpath_motion_for_parallel_join() refactor

3 participants