
Try fix 02151_hash_table_sizes_stats.sh test #48178

Merged
tavplubix merged 2 commits into ClickHouse:master from nickitat:fix_ht_test
Mar 31, 2023

Conversation

@nickitat (Member)

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@nickitat nickitat requested a review from tavplubix March 29, 2023 15:45
@nickitat nickitat linked an issue Mar 29, 2023 that may be closed by this pull request
@tavplubix (Member)

tavplubix commented Mar 29, 2023

Our memory trackers work with sanitizers as well. So why does OOM happen?

@tavplubix tavplubix self-assigned this Mar 29, 2023
@nickitat (Member, Author)

I have no idea; let's at least reduce the amount of noise.

@tavplubix (Member)

tavplubix commented Mar 29, 2023

Let's try to debug it. OOMs like this usually happen when we allocate a huge chunk of memory without checking that it doesn't exceed the limits, i.e. when we call CurrentMemoryTracker::allocNoThrow instead of CurrentMemoryTracker::alloc for huge allocations. In other words, when we use new or malloc for huge allocations without additional checks (malloc and new call allocNoThrow because they cannot throw). I quickly looked through Aggregator and found two suspicious places:

std::unique_ptr<AggregateDataPtr[]> places(new AggregateDataPtr[row_end]);

std::unique_ptr<AggregateDataPtr[]> places(new AggregateDataPtr[row_end]);

Here row_end is actually a block size and can be quite big. I tried increasing max_block_size and got an interesting result:
https://pastila.nl/?00b0c940/f011d9e7124b0c30b20cdb4be9872be6

At first, it logs

[ 982431 ] ... <Debug> MemoryTracker: Current memory usage (for query): 16.00 GiB.

and after that it throws an exception in another thread:

Memory limit (for user) exceeded: would use 16.00 GiB (attempt to allocate chunk of 17179953272 bytes), maximum: 953.67 MiB

The maximum was about 1 GiB, but it looks like the Aggregator had already allocated 16 GiB.

Can it be the reason? Do we have other (less obvious) places in Aggregator where huge allocations may happen without proper checks?


@tavplubix tavplubix left a comment


We can merge it to suppress the issue temporarily, but the root cause must be investigated. We definitely have some bugs with memory accounting in Aggregator.


Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

02151_hash_table_sizes_stats triggers OOM

3 participants