Skip to content

Radix sort for ASOF JOIN#4993

Merged
4ertus2 merged 3 commits intoClickHouse:masterfrom
4ertus2:asof
Apr 16, 2019
Merged

Radix sort for ASOF JOIN#4993
4ertus2 merged 3 commits intoClickHouse:masterfrom
4ertus2:asof

Conversation

@4ertus2
Copy link
Copy Markdown
Contributor

@4ertus2 4ertus2 commented Apr 12, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Category (leave one):

  • Performance Improvement

Short description (up to few sentences):
Use radixSort instead of std::sort for ASOF JOIN values.

@Gladdy
Copy link
Copy Markdown
Contributor

Gladdy commented Apr 12, 2019

@4ertus2 Do you have any performance benchmarks already comparing std::sort versus your custom radix sort on data that is the result of the insertBlock calls on real-world/representative tables?

If you were going to replace the sorting algorithm, my first hunch would be something like Timsort.

Reason: it already takes into account the natural increasing or decreasing runs that I would expect to find in the resulting pre-sorted series, as there are probably multiple entries from the same block (which would usually already be sorted provided that the last entry of the primary key of the table is the timestamp) that get appended to the SortedLookupVector in order.

@4ertus2
Copy link
Copy Markdown
Contributor Author

4ertus2 commented Apr 12, 2019

@4ertus2 Do you have any performance benchmarks already comparing std::sort versus your custom radix sort on data that is the result of the insertBlock calls on real-world/representative tables?

Something like this (on my laptop). It's with clang7. https://pastebin.com/b1CMYS9B
But the results mismatch and it's only for 32-bit data. 64-bit radix could be worse. And it's LSB RadixSort. I expect that MSB variant would be better.

If you were going to replace the sorting algorithm, my first hunch would be something like Timsort.

We could make sort algo for ASOF JOIN as a setting.

@Gladdy
Copy link
Copy Markdown
Contributor

Gladdy commented Apr 12, 2019

That's a bigger improvement than I was expecting - anyway, given that Clickhouse already includes boost, it might be worth it to run your benchmark with spreadsort (a radix sort), pdqsort and spinsort from https://www.boost.org/doc/libs/1_70_0/libs/sort/doc/html/index.html#sort.introduction.single_thread_algorithms as those are already well tested and should be quicker than std::sort.

@4ertus2 4ertus2 changed the title [WIP] Radix sort for ASOF JOIN Radix sort for ASOF JOIN Apr 15, 2019
@4ertus2
Copy link
Copy Markdown
Contributor Author

4ertus2 commented Apr 15, 2019

My test is wrong cause of multiple lines with same value in right ASOF column. I've re-checked it with correct queries and radix returns the same result as std::sort.

@Gladdy
Copy link
Copy Markdown
Contributor

Gladdy commented Apr 15, 2019

How do the other sorts included in boost perform on your dataset? There is quite probably a pattern that appears in the order that the data gets put into the SortedLookupVector, the question is just to find the right sort to exploit that pattern for extra performance :)

@4ertus2
Copy link
Copy Markdown
Contributor Author

4ertus2 commented Apr 15, 2019

How do the other sorts included in boost perform on your dataset? There is quite probably a pattern that appears in the order that the data gets put into the SortedLookupVector, the question is just to find the right sort to exploit that pattern for extra performance :)

I haven't test the others yet. I'm on duty this week and going to have holidays the next two weeks so probably I cannot check it till the second half of may. It would be great if you have time to check what sort is the best ;)

@Gladdy
Copy link
Copy Markdown
Contributor

Gladdy commented Apr 15, 2019

Cool. I'll have to try to dig up a representative (non-proprietary) dataset somewhere.

@4ertus2 4ertus2 merged commit 29c9237 into ClickHouse:master Apr 16, 2019
@abyss7 abyss7 added the pr-performance Pull request with some performance improvements label Apr 22, 2019
@Gladdy
Copy link
Copy Markdown
Contributor

Gladdy commented Apr 24, 2019

@4ertus2 Found some time and ran some basic tests: the join performance does seem to be dependent on the data. It might be an option to come up with some representative use cases (which the one below is not) and then go for a single implementation, or add the join algorithm as a setting? (in my testing I just passed it in as an env var). Luckily there don't seem to be absolutely massive differences so it would probably be pretty low priority.

DROP TABLE IF EXISTS tvs;

CREATE TABLE tvs(k UInt32, t UInt32, tv UInt64) ENGINE = Memory;
INSERT INTO tvs(k,t,tv) SELECT k, t, t
FROM (SELECT toUInt32(number) AS k FROM numbers(10000)) keys
CROSS JOIN (SELECT toUInt32(number * 3) as t FROM numbers(10000)) tv_times;

SELECT SUM(trades.price - tvs.tv) FROM
(SELECT k, t, t as price
    FROM (SELECT toUInt32(number) AS k FROM numbers(10000)) keys
    CROSS JOIN (SELECT toUInt32(number * 10) AS t FROM numbers(3000)) trade_times) trades
ASOF LEFT JOIN tvs USING(k,t);

DROP TABLE tvs;


custom radix sort
1 rows in set. Elapsed: 5.273 sec. Processed 100.01 million rows, 1.60 GB (18.97 million rows/s., 303.46 MB/s.) 
1 rows in set. Elapsed: 5.354 sec. Processed 100.01 million rows, 1.60 GB (18.68 million rows/s., 298.84 MB/s.) 

std::sort
1 rows in set. Elapsed: 4.765 sec. Processed 100.01 million rows, 1.60 GB (20.99 million rows/s., 335.78 MB/s.) 
1 rows in set. Elapsed: 4.758 sec. Processed 100.01 million rows, 1.60 GB (21.02 million rows/s., 336.27 MB/s.) 

boost::sort::spinsort
1 rows in set. Elapsed: 3.937 sec. Processed 100.01 million rows, 1.60 GB (25.40 million rows/s., 406.45 MB/s.) 
1 rows in set. Elapsed: 3.891 sec. Processed 100.01 million rows, 1.60 GB (25.70 million rows/s., 411.18 MB/s.) 

boost::sort::pdqsort
1 rows in set. Elapsed: 3.996 sec. Processed 100.01 million rows, 1.60 GB (25.03 million rows/s., 400.39 MB/s.) 
1 rows in set. Elapsed: 3.983 sec. Processed 100.01 million rows, 1.60 GB (25.11 million rows/s., 401.77 MB/s.) 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants