Binary operator monotonicity#14513
Conversation
|
Hm, afair i've used intDiv in And i've was also looking for the info about binary ops monotonicity, and didn't find it in code... Loosely coupled with #11796 |
What does the query look like? |
There was a problem hiding this comment.
It is not correct because of overflow
SELECT
18446744073709551614 + number AS arg,
arg + 1
FROM numbers(2)
┌──────────────────arg─┬─plus(plus(18446744073709551614, number), 1)─┐
│ 18446744073709551614 │ 18446744073709551615 │
│ 18446744073709551615 │ 0 │
└──────────────────────┴─────────────────────────────────────────────┘
There was a problem hiding this comment.
It should be checked in all other cases too.
There was a problem hiding this comment.
Ouch. Overflow makes things complicated. I wonder if we should provide saturation arithmetic.
There was a problem hiding this comment.
Looks like we want to compare left_point.get<Int64>() >= 0 and right_point.get<Int64>() >= 0
There was a problem hiding this comment.
This code is difficult to read.
It's would be nice to add a comment (or even ascii schema) to each case.
KochetovNicolai
left a comment
There was a problem hiding this comment.
Generally, I would rather update IFunctionBase::getMonotonicityForRange signature to add version constant arguments list. E.g:
Monotonicity getMonotonicityForRange(const IDataType & /*type*/, const Field & /*left*/, const Field & /*right*/, const ColumnsWithTypeAndName & /*arguments*/)where arguments contain arguments.size() - 1 const columns.
This implementation is also ok (it is implemented and works).
It would be nice to add more tests with binary arithmetics in pk, and also test main cases for each function (unit test).
Cases with overflow must be fixed.
|
@KochetovNicolai If we assume that overflow (both signed and unsigned) is implementation specific behaviour, we can also assume that addition of constant is monotonic. |
I do not like it pretty much cause query result ay be wrong in some corner cases, but maybe it is ok for now. Also maybe we need more general definition of interval. E.g. we could work in |
|
Yes, it's better to provide proper monotonicity info for intervals. Yes, for multiplication it's more complicated as users can use it with intended overflow as a simple hash function: |
Rechecked. It seems it somehow works for PARTITION key, but doesn't work for primary key - follow that example: DROP TABLE IF EXISTS intDiv_test;
create table intDiv_test (
a UInt32,
b UInt32
)
Engine = MergeTree
PARTITION BY intDiv(a,536870912) /* splits uint32 range into 8 partitions */
ORDER BY intDiv(b,65536)
SETTINGS index_granularity=1024;
TRUNCATE TABLE intDiv_test;
INSERT INTO intDiv_test SELECT rand(1), rand(2) from numbers(10000);
INSERT INTO intDiv_test SELECT 10000, 10000 from numbers(10);
INSERT INTO intDiv_test SELECT rand(1), rand(2) from numbers(10000);
OPTIMIZE TABLE intDiv_test FINAL;
set send_logs_level='trace';
-- PARTITION key check:
select * from intDiv_test where a = 10000;
-- WORKS!
-- MinMax index condition: (column 0 in [10000, 10000])
-- Selected 1 parts by date, 1 parts by key, 3 marks by primary key, 3 marks to read from 1 ranges
-- for the reference -with identity it reads much more parts:
select * from intDiv_test where identity(a) = 10000;
-- MinMax index condition: unknown
-- Selected 8 parts by date, 8 parts by key, 24 marks by primary key, 24 marks to read from 8 ranges
-- PRIMARY key check:
-- DON'T work.
select * from intDiv_test where b = 10000;
-- Key condition: unknown
-- Selected 8 parts by date, 8 parts by key, 24 marks by primary key, 24 marks to read from 8 ranges
-- for the reference - with identity it's exactly the same
select * from intDiv_test where identity(b) = 10000;
-- Key condition: unknown
-- MinMax index condition: unknown
-- Selected 8 parts by date, 8 parts by key, 24 marks by primary key, 24 marks to read from 8 ranges
-- BUT
select * from intDiv_test where intDiv(b,65536) = intDiv(10000,65536);
-- Key condition: (column 0 in [0, 0])
-- Selected 8 parts by date, 1 parts by key, 1 marks by primary key, 1 marks to read from 1 ranges |
|
@filimonov Partition elimination does not depend on monotonicity at all - it simply evaluates the expressions for every part. |
28a757d to
d2f2b18
Compare
I'm afraid it's not correct. @filimonov I'll do a case by case explanation.
So what does this PR fix? Suppose you have a table partitioned by |
|
Code looks ok, but we need to investigate crash in tests. |
01079_new_range_reader_segfault |
| // variable / constant | ||
| else if (right.column && isColumnConst(*right.column)) | ||
| { | ||
| auto constant = (*left.column)[0]; |
| return false; | ||
| } | ||
|
|
||
| Monotonicity getMonotonicityForRange(const IDataType &, const Field & left_point, const Field & right_point) const override |
There was a problem hiding this comment.
Actually since all of this does not uses generic abstractions, looks like it is better to cover all functions that listed here.
There was a problem hiding this comment.
Which one is not covered?
There was a problem hiding this comment.
Actually only the following is covered
divide(var, const)(viatoDate)
All of the above does not:
divide(const, var)intDiv(var, const)intDiv(const, var)plus(var, const)plus(const, var)minus(var, const)minus(const, var)
There was a problem hiding this comment.
I don't follow. What do you mean by cover here? plus and minus take effect and catch a bug already.
There was a problem hiding this comment.
What do you mean by cover here?
Add primary key with these functions into the 01480_binary_operator_monotonicity.sql
There was a problem hiding this comment.
Ok. So it's "test coverage" you were talking about.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
ClickHouse treats partition expr and key expr differently. Partition expr is used to construct an minmax index containing related columns, while primary key expr is stored as an expr. Sometimes user might partition a table at coarser levels, such as
partition by i / 1000. However, binary operators are not monotonic and this PR tries to fix that. It might also benifit other use cases.Currently only simple binary ops are added.
Detailed description / Documentation draft:
None.