Improve reading with prefetch #49732
Conversation
This is an automated comment for commit 63b9c1a with a description of existing statuses. It's updated for the latest CI run.
This is to actually have moderately const-sized tasks. And when we have code to start a new prefetch as soon as some of the previous ones have completed, all tasks will be prefetched as soon as max_streams tasks fit within the limits.
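The scheduling idea described above can be sketched as follows. This is an illustrative standalone model, not the actual MergeTreePrefetchedReadPool code; the names PrefetchScheduler, startFitting, and onComplete are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Illustrative sketch (not ClickHouse code) of the scheduling idea:
// keep starting prefetches while the in-flight tasks fit in the limit,
// and start the next one as soon as any previous prefetch completes.
struct PrefetchScheduler
{
    size_t max_streams;          /// hypothetical stand-in for the real limit
    size_t in_flight = 0;        /// prefetches currently running

    std::deque<size_t> pending;  /// task ids waiting to be prefetched

    /// Start as many prefetches as currently fit in the limit.
    size_t startFitting()
    {
        size_t started = 0;
        while (in_flight < max_streams && !pending.empty())
        {
            pending.pop_front();
            ++in_flight;
            ++started;
        }
        return started;
    }

    /// Called when one prefetch completes: immediately try to start the next.
    size_t onComplete()
    {
        --in_flight;
        return startFitting();
    }
};
```

With fixed-size tasks, each started prefetch consumes a predictable share of the limit, which is what makes this greedy "fill up to max_streams" loop well-behaved.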
It doesn't look necessary, and to my taste it is fairly random logic.
Changed to debug level, because in tests we now get literally tons of log messages per query.
src/Common/PoolBase.h
Outdated
How about a different model here?
Soft and hard limits.
If the pool is under the soft limit, it allocates more connections. When a connection is freed, the pool closes the excess connections.
If the pool is close to the hard limit, it waits no more than the connection timeout once the limit hits the hard level.
What advantages would it have? From a perf standpoint, any waiting is bad, I think.
With those two values you are able to configure the system in more than two ways: when it waits for a vacant connection, when it doesn't wait at all, and any option between these extremes. However, the most important part is that you will never run into an unbounded connection count. That part is worth the effort.

> from perf standpoint any waiting is bad I think.

What do you have as an alternative? Of course, you could fail the request instantly. But that is illogical, since there is a connection_timeout setting.
Imagine there are a lot of requests to S3. If you just fail all the requests above the limit, the cluster would stop, unable to make progress; that would be a denial-of-service problem. If you slow down progress by waiting for connections, progress will continue (there would be a brown zone before the black one); that would be a performance issue, with some ways to mitigate it by changing the load pattern on the user side or by changing settings on our side. Sounds like graceful degradation.
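A minimal sketch of the proposed soft/hard limit model. This is illustrative only; SoftHardLimitPool, tryAcquire, and release are hypothetical names, not ClickHouse's actual PoolBase API:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Sketch of the soft/hard limit idea discussed above (hypothetical, simplified):
// - below the hard limit, acquire succeeds immediately;
// - at the hard limit, wait up to the connection timeout for a release;
// - on release, connections above the soft limit are reported as excess
//   so the caller can close them instead of returning them to the pool.
class SoftHardLimitPool
{
public:
    SoftHardLimitPool(size_t soft, size_t hard) : soft_limit(soft), hard_limit(hard) {}

    /// Returns false only if the hard limit held for the whole timeout.
    bool tryAcquire(std::chrono::milliseconds timeout)
    {
        std::unique_lock lock(mutex);
        if (in_use < hard_limit || cv.wait_for(lock, timeout, [this] { return in_use < hard_limit; }))
        {
            ++in_use;
            return true;
        }
        return false;
    }

    /// Returns true if the connection should be kept for reuse,
    /// false if it is excess (usage above the soft limit) and must be closed.
    bool release()
    {
        std::lock_guard lock(mutex);
        bool keep = in_use <= soft_limit;
        --in_use;
        cv.notify_one();
        return keep;
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    size_t in_use = 0;
    const size_t soft_limit;
    const size_t hard_limit;
};
```

The bounded wait is what gives the "brown zone" described above: between the soft and hard limits callers slow down instead of failing outright, and the connection count can never grow without bound.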
src/IO/S3/PocoHTTPClient.cpp
Outdated
I did not understand why the second condition is needed; could you add a comment?
src/IO/ReadBufferFromS3.cpp
Outdated
But AsynchronousBoundedReadBuffer has the same check, which means that for the last read range we will never get to this point (nor to line 178) in the code, even though the range was read successfully. I think we can add the same check in the destructor before resetting the session, and update read_all_range_successfully?
with_cache can now be made const in the .h file :)
OK, I must not forget to change this a bit in the PR with background download in cache :), because with background download we should not reset the implementation_buffer if the file_segment is in PARTIALLY_DOWNLOADED state.
It's not necessary to keep it.
Integration tests are related and potentially have to be updated to reflect the current behaviour.
explicit MemoryTrackerSwitcher(MemoryTracker * new_tracker)
{
    if (!current_thread)
        throw Exception(ErrorCodes::LOGICAL_ERROR, "current_thread is not initialized");
This breaks clickhouse-disks for S3 disks:

Failed to make request to: http://localhost:11111/test?list-type=2&max-keys=1&prefix=clickhouse-disks%2Fdefault%2Ftest.copy: Code: 49. DB::Exception: current_thread is not initialized. (LOGICAL_ERROR), Stack trace (when copying this message, always include the lines below):

So after this change it is not possible to do some operations that require pools from the main thread.
I will fix it here - b2ea45b (#51448)
Otherwise a "current_thread is not initialized" error occurs, which had been introduced in ClickHouse#49732, since it is possible to run this code from non-ClickHouse thread pools. Fixes: ClickHouse#52013. Signed-off-by: Azat Khuzhin <[email protected]>
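A minimal sketch of the idea behind that fix, assumed from the commit description: make the switcher a no-op when current_thread is not set, instead of throwing. The ThreadStatus and MemoryTracker stand-ins here are simplified, not the real ClickHouse types:

```cpp
#include <cassert>

// Simplified stand-ins (not the real ClickHouse types).
struct MemoryTracker {};
struct ThreadStatus { MemoryTracker * memory_tracker = nullptr; };

thread_local ThreadStatus * current_thread = nullptr;

// RAII switcher: swaps the thread's memory tracker for its own lifetime.
// Instead of throwing LOGICAL_ERROR when current_thread is null (e.g. when
// called from clickhouse-disks' main thread or other non-ClickHouse thread
// pools), it simply becomes a no-op.
struct MemoryTrackerSwitcher
{
    explicit MemoryTrackerSwitcher(MemoryTracker * new_tracker)
    {
        if (!current_thread)
            return; /// no-op outside ClickHouse thread pools
        prev_tracker = current_thread->memory_tracker;
        current_thread->memory_tracker = new_tracker;
    }

    ~MemoryTrackerSwitcher()
    {
        if (!current_thread)
            return;
        current_thread->memory_tracker = prev_tracker;
    }

    MemoryTracker * prev_tracker = nullptr;
};
```

The RAII shape guarantees the previous tracker is restored even on early returns or exceptions, while the null check keeps the type safe to use from threads that never initialized a ThreadStatus.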
The code that it uses had been removed in ClickHouse#58845. Introduced in ClickHouse#49732 Signed-off-by: Azat Khuzhin <[email protected]>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Now we use fixed-size tasks in MergeTreePrefetchedReadPool, as in MergeTreeReadPool. Also, from now on we use a connection pool for S3 requests. Logical continuation of #49287.