Do not delay final part writing by default (fixes possible `Memory limit exceeded` during `INSERT`) #34780
Conversation
(force-pushed 29ddb2b to 1e01c53)
Note that performance tests will still show errors, since the problem exists in
Maybe we should enable this delaying of streams only for the s3 disk? I don't like this new setting - it exposes implementation details to the user.
I thought about this, but even for s3 you should not defer too many parts.
Agreed, me neither.
Well, my estimate is that deferred insertions could cost about 2x memory in the worst case - which is not so bad. And I really don't like adding a new setting.
It is not 2x memory; it depends on the number of columns (streams) and the number of partitions in the insert block.
The difference can be too high, and I don't think it is a good idea to introduce such a regression. And I'm pretty sure there are users that INSERT lots of partitions in one insert... (they will come with their issues)
I'm not a fan of introducing a new setting for everything. I can propagate information about the writer to the MergeTreeSink and do the delaying only for it, with a static batch of 100 parts (like the thread pool for s3, even though these are slightly different things), but in the future this can be extended to give benefits for:
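To make the memory argument above concrete, here is a rough sketch of the arithmetic: delayed final-part writes keep one stream open per column of each part, and each partition in the insert block produces a part. The ~1 MiB per-stream buffer size is an assumption for illustration, not a measured constant.

```python
# Rough upper bound on memory held by delayed (unflushed) part streams
# during a single INSERT. Assumption: each open stream keeps roughly one
# ~1 MiB write buffer alive until the part is finalized.

STREAM_BUFFER_BYTES = 1 * 1024 * 1024  # assumed per-stream overhead

def delayed_streams_memory(partitions, columns, max_delayed_streams=None):
    """Every partition in the insert block produces a part, and every
    part keeps one stream per column open until it is flushed. An
    optional cap models max_insert_delayed_streams_for_parallel_write."""
    streams = partitions * columns
    if max_delayed_streams is not None:
        streams = min(streams, max_delayed_streams)
    return streams * STREAM_BUFFER_BYTES

# 100 partitions x 300 columns with no cap: 30000 MiB (~29 GiB) at once.
print(delayed_streams_memory(100, 300) // 2**20)        # 30000
# The same insert capped at 1000 delayed streams: ~1 GiB.
print(delayed_streams_memory(100, 300, 1000) // 2**20)  # 1000
```

This is why the cost is not a flat "2x": it scales with partitions × columns, which the later setting bounds.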
So, what are your final thoughts about this?
The table from the perftest is a little bit extreme. Indeed, even 10GB is not enough. Still, this use case is not impossible. I think I am almost ok with a new setting. It would be better to solve it some other way, but I can't find anything better. However, setting
(force-pushed 1e01c53 to e220d29)
Done.
By the way, do you have any numbers on how much parallel writes help?
(force-pushed e220d29 to 308a782)
Sorry for not answering for a long time. Maybe we should rename it to
It is 1000 by default for S3 in the current version of the patch:
Ok.
(force-pushed 9208e96 to e17a8c0)
Done (also I forgot about
Signed-off-by: Azat Khuzhin <[email protected]>
For async s3 writes, final part flushing was deferred until the whole INSERT block was processed; however, with too many partitions/columns you may exceed the max_memory_usage limit (since each stream has overhead). Introduce max_insert_delayed_streams_for_parallel_write (with a default of 1000 for S3, 0 otherwise) to avoid this. This should fix "Memory limit exceeded" errors in performance tests. Signed-off-by: Azat Khuzhin <[email protected]>
(force-pushed e17a8c0 to 3a5a39a)
Added
Signed-off-by: Azat Khuzhin <[email protected]>
@KochetovNicolai this looks ready, can you take a look please?
```cpp
/// Whether this disk supports parallel writes.
/// Overridden in remote fs disks.
virtual bool supportParallelWrite() const { return false; }
```
btw, looks like this never worked, because remote disks are wrapped into DiskDecorator (for DiskRestartProxy and DiskCacheWrapper - the old cache version, which is still turned on by default), and for DiskDecorator this method was not overridden: #38792
The test uses an explicit max_insert_delayed_streams_for_parallel_write, so it works correctly.
As for the failure in #38792: you can set max_insert_delayed_streams_for_parallel_write=0 explicitly for the INSERT, and then it will behave the same whether the remote disk is detected or not.
Also, I would recommend adding another test that ensures S3 really does use parallel writes (via MEMORY_LIMIT_EXCEEDED during INSERT, or thread_ids from query_log).
The page at https://clickhouse.com/docs/en/integrations/s3/s3-merge-tree has the following description. I have a question: does this indicate that under the default configuration, using an S3 table or MergeTree with an s3-disk policy, the server requires at least 100 GiB of memory (100 * 1000 * 1MB)? Where can I modify the configuration of the 100 concurrent file-writing threads?
No, it is
You can adjust this setting. But if you really need this, and have good numbers to prove it, you can submit a patch/PR.
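Reading the exchange above, the "100 GiB" figure appears to come from multiplying two limits that do not compose that way: the upload thread pool caps concurrent uploads, while the setting caps delayed streams per INSERT block. A back-of-the-envelope sketch (the ~1 MiB per-stream buffer size is my assumption, not a documented constant):

```python
# Back-of-the-envelope check of the "100 * 1000 * 1MB" reading.
MIB = 1024 * 1024
upload_threads = 100   # S3 upload thread pool size from the docs page
max_delayed = 1000     # max_insert_delayed_streams_for_parallel_write

# The thread pool bounds concurrency of uploads; the setting bounds how
# many unflushed streams a single INSERT may hold. They are independent
# limits and do not multiply into one memory requirement.
naive_reading = upload_threads * max_delayed * MIB   # the "~100 GiB" reading
per_insert_bound = max_delayed * MIB                 # ~1 GiB per INSERT

print(naive_reading // MIB, per_insert_bound // MIB)  # 100000 1000
```

Under these assumptions, memory held by delayed streams is bounded at roughly 1 GiB per concurrently running INSERT, not 100 GiB per server.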
Thank you for your answer! My real intention is to find the real cause of the memory jitter I see when using S3; please refer to #38839 - I really can't find the reason. My situation is like this: there is no memory problem under normal circumstances, but once it is time to move parts to the S3 disk (ssd_s3 policy), and at the same time there are about tens of thousands of rows per second being inserted (about 1KB per row), the memory shakes like this. I'm very frustrated.
Then I have a little question. Where is the setting max_insert_delayed_streams_for_parallel_write configured? Is it in the configuration file
Have you tried to play with this setting?
This setting (and the initial optimization) applies only to INSERT; it is not related to merges/TTL moves, so I don't see how it could be related.
Yes, or on a per-query basis.
You should do what @kssenii suggested.
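For what it's worth, max_insert_delayed_streams_for_parallel_write is a user-level setting, so as far as I can tell it belongs in a settings profile rather than in config.xml. The fragment below is an illustrative sketch (the file path and profile name are assumptions, not taken from this PR):

```xml
<!-- e.g. /etc/clickhouse-server/users.d/delayed_streams.xml (illustrative path) -->
<clickhouse>
    <profiles>
        <default>
            <!-- 0 disables delayed final part writing entirely -->
            <max_insert_delayed_streams_for_parallel_write>0</max_insert_delayed_streams_for_parallel_write>
        </default>
    </profiles>
</clickhouse>
```

It can also be set per session with `SET max_insert_delayed_streams_for_parallel_write = 0`, or per query via the `SETTINGS` clause of the INSERT statement.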
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Do not delay final part writing by default (fixes possible `Memory limit exceeded` during `INSERT` by adding `max_insert_delayed_streams_for_parallel_write` with a default of 1000 for writes to s3 and disabled, as before, otherwise).

For async s3 writes, final part flushing was deferred until the whole INSERT block was processed; however, with too many partitions/columns you may exceed the max_memory_usage limit (since each stream has overhead). Introduce `max_insert_delayed_streams_for_parallel_write` (with a default of 1000 for writes to s3 and disabled as before otherwise) to avoid this (and avoid introducing a regression). This should fix "Memory limit exceeded" errors in performance tests.
Async s3 writes: #33291, #34219, #34215 (cc @KochetovNicolai )