Skip to content

Optimising has(), indexOf(), and countEqual() for Array(LowCardinality(T)) and constant right arguments#12550

Merged
alexey-milovidov merged 70 commits intoClickHouse:masterfrom
myrrc:bug/low-cardinality-arrays-optimisations
Aug 24, 2020
Merged

Optimising has(), indexOf(), and countEqual() for Array(LowCardinality(T)) and constant right arguments#12550
alexey-milovidov merged 70 commits intoClickHouse:masterfrom
myrrc:bug/low-cardinality-arrays-optimisations

Conversation

@myrrc
Copy link
Copy Markdown
Contributor

@myrrc myrrc commented Jul 16, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimize has(), indexOf() and countEqual() functions for Array(LowCardinality(T)) and constant right arguments.

The arrayIndex.h holding aforementioned functions doesn't have the specialization for Array(LowCardinality(T)) arguments, so they are converted to ordinary columns, which disables the ability to search in their dictionary (useful for constant right arguments).

Also fixes the unsigned insert/select bug, see the test (inserting -1 into unsigned columns) for detail.

Resolves #6005
Resolves #13392
Resolves #13917
Resolves #10880
Resolves #13576

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Jul 16, 2020
@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Jul 16, 2020

Known caveats (why this PR is still a draft):
The LC column we receive is const, so we cant' call ReverseIndex::getInsertionPoint as it modifies the index if it's not present.
So I added the const method which throws on such condition.
The problem is, it happens pretty often, and I think there must be another way to do that.

Steps to reproduce:

create table test_lc_array (
 s LowCardinality(String),
 a Array(LowCardinality(String)) default array(s)
) Engine = MergeTree 
PARTITION BY tuple()
ORDER BY tuple();

insert into test_lc_array (s)
select concat('qwertyuiopasdfghjkl', toString(number%10))
from numbers(1000);  

select count() from test_lc_array where has(a, 'qwertyuiopasdfghjkl0');

@Akazz Akazz self-assigned this Jul 16, 2020
@myrrc myrrc marked this pull request as ready for review July 17, 2020 12:45
Copy link
Copy Markdown
Contributor

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a better changelog entry, like:

optimized has(), indexOf(), and countEqual() for columns of Array(LowCardinality(X)) types (where X can be Nullable or any other combination of types).

Needs some tests: stateless SQL-tests and performance-tests
There are also some questionable constructs here and there, please see comments inline.

@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Aug 20, 2020

Resolves #13917

@filimonov
Copy link
Copy Markdown
Contributor

Resolves #10880 ?

@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Aug 21, 2020

Resolves #10880 ?

Yes, the perf tests showed the speedup with right-hand constant arguments.

@abyss7
Copy link
Copy Markdown
Contributor

abyss7 commented Oct 21, 2020

Probably added regression for queries like:

SELECT * FROM table WHERE has(['a', 'b'], low_cardinality_string_column)

With exception:

Types of array and 2nd argument of function "has" must be identical up to nullability, cardinality, numeric types, or Enum and numeric type. Passed: Array(String) and LowCardinality(String)

@alex-zaitsev
Copy link
Copy Markdown
Contributor

Fixed in #16038 ?
@myrrc, could you check?

Probably added regression for queries like:

SELECT * FROM table WHERE has(['a', 'b'], low_cardinality_string_column)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Confirmed user-visible misbehaviour in official release pr-performance Pull request with some performance improvements

Projects

None yet

9 participants