Enable Parallel-oblivious Hash Left Anti Semi (Not-In) Join #30
Conversation
This PR is actually ready and works well, but we lack CBDB parallel test CI for now, so I am making it a draft.
It seems CI runs even for a draft PR, so turning it back to ready.
Hold on this PR; we should wait until the parallel CI test is enabled. @my-ship-it
🆗
This is a parallel-oblivious parallel hash join: each worker processes its own copy of the inner-side table, with no shared hash table. We benefit from parallelism when the outer table is large and the inner table is relatively small.
See [1] below for example DDL and DML.
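The idea above can be sketched in a few lines. This is a hypothetical illustration, not CBDB source: the outer table is partitioned across workers, while every worker receives the full inner side and builds its own private hash table, including the NOT-IN handling of NULLs.

```python
# Hypothetical sketch of a parallel-oblivious hash Left Anti Semi
# (Not-In) join. Only the outer side is split across workers; each
# worker builds a PRIVATE hash table over the WHOLE inner side.
from concurrent.futures import ThreadPoolExecutor

def lasj_not_in_worker(outer_part, inner_rows):
    # SQL NOT IN semantics: if the inner side contains any NULL,
    # no outer row can qualify; a NULL outer key never qualifies.
    if any(k is None for k in inner_rows):
        return []
    inner_hash = set(inner_rows)          # private, per-worker hash table
    return [k for k in outer_part if k is not None and k not in inner_hash]

def parallel_lasj(outer, inner, workers=2):
    parts = [outer[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(lasj_not_in_worker, parts, [inner] * workers)
    return sorted(r for part in results for r in part)

# Outer keys 1..10 joined against inner keys {2, 3}:
# parallel_lasj(list(range(1, 11)), [2, 3]) -> [1, 4, 5, 6, 7, 8, 9, 10]
```

Because each worker sees the complete inner side, it makes the same qualify/disqualify decision a single non-parallel worker would; duplicated inner-side work is the price paid for avoiding a shared hash table.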
Non-parallel plan:
explain(analyze, costs off) select sum(t1.c1) from t1 where c1 not in (select c1 from t2);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=1808.872..1808.875 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (actual time=1745.235..1808.858 rows=3 loops=1)
-> Partial Aggregate (actual time=1808.622..1808.625 rows=1 loops=1)
-> Hash Left Anti Semi (Not-In) Join (actual time=2.890..1583.005 rows=1667434 loops=1)
Hash Cond: (t1.c1 = t2.c1)
Extra Text: (seg2) Hash chain length 1.0 avg, 2 max, using 1199 of 524288 buckets.
-> Seq Scan on t1 (actual time=0.355..678.531 rows=1667832 loops=1)
-> Hash (actual time=2.068..2.069 rows=1200 loops=1)
Buckets: 524288 Batches: 1 Memory Usage: 4139kB
-> Broadcast Motion 3:3 (slice2; segments: 3) (actual time=1.476..1.772 rows=1200 loops=1)
-> Seq Scan on t2 (actual time=0.356..0.499 rows=407 loops=1)
Planning Time: 0.454 ms
(slice0) Executor memory: 124K bytes.
(slice1) Executor memory: 4443K bytes avg x 3x(0) workers, 4443K bytes max (seg0). Work_mem: 4139K bytes max.
(slice2) Executor memory: 262K bytes avg x 3x(0) workers, 262K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 1809.517 ms
(18 rows)
Time: 1810.827 ms (00:01.811)
Parallel plan:
explain(analyze, costs off) select sum(t1.c1) from t1 where c1 not in (select c1 from t2);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (actual time=758.707..758.710 rows=1 loops=1)
-> Gather Motion 6:1 (slice1; segments: 6) (actual time=668.747..758.685 rows=6 loops=1)
-> Partial Aggregate (actual time=752.479..752.483 rows=1 loops=1)
-> Hash Left Anti Semi (Not-In) Join (actual time=3.010..565.127 rows=833732 loops=1)
Hash Cond: (t1.c1 = t2.c1)
Extra Text: (seg2) Hash chain length 1.0 avg, 2 max, using 1199 of 524288 buckets.
-> Parallel Seq Scan on t1 (actual time=0.368..231.049 rows=833932 loops=1)
-> Hash (actual time=2.148..2.149 rows=1200 loops=1)
Buckets: 524288 Batches: 1 Memory Usage: 4139kB
-> Broadcast Motion 3:6 (slice2; segments: 3) (actual time=0.203..1.779 rows=1200 loops=1)
-> Seq Scan on t2 (actual time=0.361..0.499 rows=407 loops=1)
Planning Time: 0.470 ms
(slice0) Executor memory: 124K bytes.
(slice1) Executor memory: 4483K bytes avg x 6x(0) workers, 4483K bytes max (seg0). Work_mem: 4139K bytes max.
(slice2) Executor memory: 262K bytes avg x 3x(0) workers, 262K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 759.440 ms
(18 rows)
Time: 760.874 ms
[1] Example:
create table t1(c1 int, c2 int) using ao_row distributed by (c1);
create table t2(c1 int, c2 int) using ao_row distributed by (c1);
set enable_parallel = on;
set gp_appendonly_insert_files = 2;
set gp_appendonly_insert_files_tuples_range = 100;
set max_parallel_workers_per_gather = 2;
insert into t1 select i, i from generate_series(1, 5000000) i;
insert into t2 select i+1, i from generate_series(1, 1200) i;
analyze t1;
analyze t2;
Authored-by: Zhang Mingli [email protected]
We have Parallel Nested Loop Left Anti Semi (Not-In) Join, but in CBDB enable_nestloop is false by default.
We previously disabled Parallel Hash Left Anti Semi (Not-In) Join because we thought it would return wrong results: it is hard to tell which batch NULL values fall into.
As a result, CBDB currently tends to produce a non-parallel Hash LASJ, which showed degraded performance in an internal test (by @HuSen8891) on TPC-H's SQL16.
After consideration, I think this is only a problem for a parallel-aware Hash Left Anti Semi (Not-In) Join with a shared hash table (yes, I am working on another branch to enable that). With no shared hash table (parallel-oblivious), each worker has all the data it should have from the inner side, so it does not matter which batch the NULL values fall into.
Enabling Parallel-oblivious Hash Left Anti Semi (Not-In) Join will benefit a lot when the outer table is huge and the inner table is very small (perfect for TPC-H SQL16, see #31).
Performance: (Qingyun Cloud, average of 3 runs; t1 with 100 million rows joined against t2 with 1200 rows; see [0] for DDL and DML; non-parallel compared with parallel_workers 2, 4, 6, 8.)
Typical LASJ plans are as above; see [1] above for example DDL and DML.
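The NULL hazard described above can be shown concretely. This is a hypothetical illustration, not CBDB code: under a shared, batched hash table, a worker probing only one batch may never see the inner-side NULL and could wrongly emit rows, whereas a parallel-oblivious worker holds the whole inner side and always sees it.

```python
# Illustration of the NOT IN NULL hazard with a batched inner side.
def not_in(outer_key, inner_rows):
    # SQL three-valued logic: x NOT IN (set containing NULL) is never true,
    # and a NULL outer key never qualifies.
    if outer_key is None or any(k is None for k in inner_rows):
        return False
    return outer_key not in set(inner_rows)

inner = [10, None, 20]
batches = [[10], [None, 20]]   # inner side split into batches

# Whole inner side (parallel-oblivious): correct, nothing qualifies.
full = [k for k in [1, 10, 30] if not_in(k, inner)]          # []

# Probing a single batch misses the NULL and wrongly keeps rows.
wrong = [k for k in [1, 10, 30] if not_in(k, batches[0])]    # [1, 30]
```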
closes: #ISSUE
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it breaks backward or forward compatibility in any way, please clarify.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders before you submit the pull request:
make installcheck
make -C src/test installcheck-cbdb-parallel