Increase hash table max size for the uniqCombined() function non String types#7221
Increase hash table max size for the uniqCombined() function non String types#7221azat wants to merge 1 commit intoClickHouse:masterfrom
Conversation
…ng types This will give accurate result for uniqCombined() and non-String types for cardinality up to 1<<14 (before this patch the accurate result was for cardinality up to 1<<13).
|
It doesn't look correct: |
|
How to validate: Each state is using no more than 96KiB. |
Hm, did not though about that fact that Also I checked this in debugger, and got this:
That said that even for (If I did not missed anything) |
And indeed, this one fails after the patch |
|
Hm, but don't we need care only about on-disk representation? (on disk it should not greater then 65546 bytes -- |
The hash table is resized (or converted to HLL) when it is reached next power of two (if I remember correctly). |
We care about memory size first. The size on disk (and the size of serialized state transferred over network; and how compressible it will be) also matters. But the sizes in memory must be aligned to each other. |
Indeed
So, the in-memory representation is also significant, and the size of the hash table should be reduced by one, instead of increased by the power of two? |
|
If it is resized and then after adding one next element, converted to HLL, it is waste of memory and CPU and should be fixed. |
|
Okay, thanks, closing |
The size is motivated by memory requirements. |
|
Do you want to add my test just in case? :) |
Sure, along with fix for the |
Due to
So this can be fixed:
|
@alexey-milovidov something like #7236 |
Not an option because it will become extremely slow.
Not an option because it will defeat the whole point of combined data structure.
That's the right solution. |
uniqCombined() uses hashtable for medium cardinality, and since HashTable resize by the power of 2 (well actually HashTableGrower grows double by the power of 2, hence HashTableGrower::increaseSize() should be overwritten to change this), with 1<<13 (default for uniqCombined) and UInt64 HashValueType, the HashTable will takes: getBufferSizeInBytes() == 131072 While it should be not greater then sizeof(HLL) ~= 98K, so reduce the maximum cardinality for hashtable to 1<<12 with UInt64 HashValueType and to 1<13 with UInt32, overwrite HashTableGrower::increaseSize() and cover this using max_memory_usage. Refs: ClickHouse#7221 (comment) v2: cover uniqCombined() with non-default K
Category (leave one):
Short description (up to few sentences):
Increase hash table max size to 2^14 for the uniqCombined() function for non-String types
Detailed description (optional):
This will give accurate result for uniqCombined() and non-String types
for cardinality up to 1<<14 (before this patch the accurate result was
for cardinality up to 1<<13).