Radix sort for ASOF JOIN by 4ertus2 · Pull Request #4993 · ClickHouse/ClickHouse

4ertus2 · 2019-04-12T12:49:37Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Category (leave one):

Performance Improvement

Short description (up to few sentences):
Use radixSort instead of std::sort for ASOF JOIN values.

Gladdy · 2019-04-12T17:34:12Z

@4ertus2 Do you have any performance benchmarks already comparing std::sort versus your custom radix sort on data that is the result of the insertBlock calls on real-world/representative tables?

If you were going to replace the sorting algorithm, my first hunch would be something like Timsort.

Reason: it already takes into account the natural increasing or decreasing runs that I would expect to find in the resulting pre-sorted series, as there are probably multiple entries from the same block (which would usually already be sorted provided that the last entry of the primary key of the table is the timestamp) that get appended to the SortedLookupVector in order.

4ertus2 · 2019-04-12T17:57:50Z

@4ertus2 Do you have any performance benchmarks already comparing std::sort versus your custom radix sort on data that is the result of the insertBlock calls on real-world/representative tables?

Something like this (on my laptop). It's with clang7. https://pastebin.com/b1CMYS9B
But the results mismatch and it's only for 32-bit data. 64-bit radix could be worse. And it's LSB RadixSort. I expect that MSB variant would be better.

If you were going to replace the sorting algorithm, my first hunch would be something like Timsort.

We could make sort algo for ASOF JOIN as a setting.

Gladdy · 2019-04-12T18:05:35Z

That's a bigger improvement than I was expecting - anyway, given that Clickhouse already includes boost, it might be worth it to run your benchmark with spreadsort (a radix sort), pdqsort and spinsort from https://www.boost.org/doc/libs/1_70_0/libs/sort/doc/html/index.html#sort.introduction.single_thread_algorithms as those are already well tested and should be quicker than std::sort.

4ertus2 · 2019-04-15T14:12:29Z

My test is wrong cause of multiple lines with same value in right ASOF column. I've re-checked it with correct queries and radix returns the same result as std::sort.

Gladdy · 2019-04-15T14:20:17Z

How do the other sorts included in boost perform on your dataset? There is quite probably a pattern that appears in the order that the data gets put into the SortedLookupVector, the question is just to find the right sort to exploit that pattern for extra performance :)

4ertus2 · 2019-04-15T14:27:24Z

How do the other sorts included in boost perform on your dataset? There is quite probably a pattern that appears in the order that the data gets put into the SortedLookupVector, the question is just to find the right sort to exploit that pattern for extra performance :)

I haven't test the others yet. I'm on duty this week and going to have holidays the next two weeks so probably I cannot check it till the second half of may. It would be great if you have time to check what sort is the best ;)

Gladdy · 2019-04-15T14:41:14Z

Cool. I'll have to try to dig up a representative (non-proprietary) dataset somewhere.

4ertus2 · 2019-04-15T14:50:29Z

There're some:
https://clickhouse.yandex/docs/en/getting_started/example_datasets/nyc_taxi/
https://github.com/gregrahn/tpcds-kit.git

Gladdy · 2019-04-24T21:31:08Z

@4ertus2 Found some time and ran some basic tests: the join performance does seem to be dependent on the data. It might be an option to come up with some representative use cases (which the one below is not) and then go for a single implementation, or add the join algorithm as a setting? (in my testing I just passed it in as an env var). Luckily there don't seem to be absolutely massive differences so it would probably be pretty low priority.

DROP TABLE IF EXISTS tvs;

CREATE TABLE tvs(k UInt32, t UInt32, tv UInt64) ENGINE = Memory;
INSERT INTO tvs(k,t,tv) SELECT k, t, t
FROM (SELECT toUInt32(number) AS k FROM numbers(10000)) keys
CROSS JOIN (SELECT toUInt32(number * 3) as t FROM numbers(10000)) tv_times;

SELECT SUM(trades.price - tvs.tv) FROM
(SELECT k, t, t as price
    FROM (SELECT toUInt32(number) AS k FROM numbers(10000)) keys
    CROSS JOIN (SELECT toUInt32(number * 10) AS t FROM numbers(3000)) trade_times) trades
ASOF LEFT JOIN tvs USING(k,t);

DROP TABLE tvs;


custom radix sort
1 rows in set. Elapsed: 5.273 sec. Processed 100.01 million rows, 1.60 GB (18.97 million rows/s., 303.46 MB/s.) 
1 rows in set. Elapsed: 5.354 sec. Processed 100.01 million rows, 1.60 GB (18.68 million rows/s., 298.84 MB/s.) 

std::sort
1 rows in set. Elapsed: 4.765 sec. Processed 100.01 million rows, 1.60 GB (20.99 million rows/s., 335.78 MB/s.) 
1 rows in set. Elapsed: 4.758 sec. Processed 100.01 million rows, 1.60 GB (21.02 million rows/s., 336.27 MB/s.) 

boost::sort::spinsort
1 rows in set. Elapsed: 3.937 sec. Processed 100.01 million rows, 1.60 GB (25.40 million rows/s., 406.45 MB/s.) 
1 rows in set. Elapsed: 3.891 sec. Processed 100.01 million rows, 1.60 GB (25.70 million rows/s., 411.18 MB/s.) 

boost::sort::pdqsort
1 rows in set. Elapsed: 3.996 sec. Processed 100.01 million rows, 1.60 GB (25.03 million rows/s., 400.39 MB/s.) 
1 rows in set. Elapsed: 3.983 sec. Processed 100.01 million rows, 1.60 GB (25.11 million rows/s., 401.77 MB/s.)

radix sort for asof join

4192deb

4ertus2 mentioned this pull request Apr 12, 2019

Significant speedup of ASOF join #4924

Merged

enable RadixSort for ASOF JOIN over UInt32

7697b18

4ertus2 changed the title ~~[WIP] Radix sort for ASOF JOIN~~ Radix sort for ASOF JOIN Apr 15, 2019

Merge branch 'master' into asof

a519e7f

4ertus2 merged commit 29c9237 into ClickHouse:master Apr 16, 2019

abyss7 added the pr-performance Pull request with some performance improvements label Apr 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Radix sort for ASOF JOIN#4993

Radix sort for ASOF JOIN#4993
4ertus2 merged 3 commits intoClickHouse:masterfrom
4ertus2:asof

4ertus2 commented Apr 12, 2019

Uh oh!

Gladdy commented Apr 12, 2019

Uh oh!

4ertus2 commented Apr 12, 2019 •

edited

Loading

Uh oh!

Gladdy commented Apr 12, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 15, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 15, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 24, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

4ertus2 commented Apr 12, 2019

Uh oh!

Gladdy commented Apr 12, 2019

Uh oh!

4ertus2 commented Apr 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Gladdy commented Apr 12, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 15, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 15, 2019

Uh oh!

4ertus2 commented Apr 15, 2019

Uh oh!

Gladdy commented Apr 24, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

4ertus2 commented Apr 12, 2019 •

edited

Loading