Reject queries when the server is overloaded #63206
alexkats merged 1 commit into ClickHouse:master
I'll wait for the CI to take a look that there are no unexpected |
How can I disable this functionality? I started getting SERVER_OVERLOADED errors because of it. I understand the idea, but it currently causes problems for me, so I don't want it on by default. Do I need to set min_os_cpu_wait_time_ratio_to_throw and max_os_cpu_wait_time_ratio_to_throw to values for which the condition is never met, or is there a cleaner approach? I'd appreciate a config snippet I can include to disable this completely.
You can just set both values to 0. I also opened #79052 to increase the defaults.
Thank you! I take it I won't be bitten by a 0 / 0 division? :-) Can you share a config snippet for this? I want to be absolutely sure I get it right the first time :)
It'll be OK because of an earlier check on the min and max ratios. As for the snippet, it's a pretty straightforward user settings change (https://clickhouse.com/docs/operations/configuration-files#user-settings). It can look something like this:
<clickhouse>
    <profiles>
        <default>
            <min_os_cpu_wait_time_ratio_to_throw>0</min_os_cpu_wait_time_ratio_to_throw>
            <max_os_cpu_wait_time_ratio_to_throw>0</max_os_cpu_wait_time_ratio_to_throw>
        </default>
    </profiles>
</clickhouse>
Awesome, thank you!
Are you 100% sure this config snippet works? I copied it, pasted it into /etc/clickhouse-server/config.d/server_overload_disable.xml, and restarted the system, but I still got:
2025.05.06 05:54:31.380981 [ 65144 ] {4d3e3416-c9e6-4249-866a-317334462d3b} DynamicQueryHandler: Code: 745. DB::Exception: CPU is overloaded, CPU is waiting for execution way more than executing, ratio of wait time (OSCPUWaitMicroseconds metric) to busy time (OSCPUVirtualTimeMicroseconds metric) is 3.2004100897425842. Min ratio for error (min_os_cpu_wait_time_ratio_to_throw setting) 2, max ratio for error (max_os_cpu_wait_time_ratio_to_throw setting) 6, probability used to decide whether to discard the query 0.30010252243564606. Consider reducing the number of queries or increase backoff between retries. (SERVER_OVERLOADED), Stack trace (when copying this message, always include the lines below):
When I check in clickhouse-client:
SELECT * FROM system.settings WHERE name LIKE '%cpu_wait%'
Row 1: Row 2:
Ah, I should've put it in /etc/clickhouse-server/users.d, not config.d. Now it works fine, apologies for the noise!
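For clients that prefer to keep the check enabled, the error message quoted in this thread suggests increasing the backoff between retries. Below is a minimal Python sketch of that idea, not anything shipped with ClickHouse: `ServerOverloaded` and `run_query` are hypothetical stand-ins for whatever exception and call your client library uses, and the delay values are arbitrary assumptions.

```python
import random
import time


class ServerOverloaded(Exception):
    """Hypothetical stand-in for a client error carrying SERVER_OVERLOADED (code 745)."""


def run_with_backoff(run_query, max_attempts=5, base_delay=0.5, max_delay=30.0,
                     sleep=time.sleep, rng=random.random):
    """Call run_query(), retrying with jittered exponential backoff on overload.

    run_query is any zero-argument callable (an assumption of this sketch).
    sleep and rng are injectable so the behaviour can be tested deterministically.
    """
    for attempt in range(max_attempts):
        try:
            return run_query()
        except ServerOverloaded:
            if attempt == max_attempts - 1:
                raise  # retries exhausted, surface the error to the caller
            # Exponential cap: base_delay, 2 * base_delay, 4 * base_delay, ...
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random fraction of the cap so clients do not retry in lockstep.
            sleep(cap * rng())
```

Randomizing the delay matters here because rejection is probabilistic: many clients retrying at the same instant would push the wait/busy ratio back up together.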
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Reject queries when the server is overloaded. The decision is made based on the ratio of wait time (OSCPUWaitMicroseconds) to busy time (OSCPUVirtualTimeMicroseconds). The query is dropped with some probability when this ratio is between min_os_cpu_wait_time_ratio_to_throw and max_os_cpu_wait_time_ratio_to_throw (those are query-level settings).
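To make "with some probability" concrete, here is a small Python sketch of the calculation. It assumes linear interpolation between the two settings, which matches the numbers in the error message quoted in the conversation (ratio 3.2004, min 2, max 6, probability about 0.3001). The function name is hypothetical, and treating an empty range (for example both settings at 0) as "never reject" is an assumption based on the "earlier check between min and max ratio" mentioned in the thread, not a reading of the actual server code.

```python
def rejection_probability(ratio: float, min_ratio: float, max_ratio: float) -> float:
    """Probability of rejecting a query for a given wait/busy time ratio.

    ratio     -- OSCPUWaitMicroseconds / OSCPUVirtualTimeMicroseconds
    min_ratio -- min_os_cpu_wait_time_ratio_to_throw
    max_ratio -- max_os_cpu_wait_time_ratio_to_throw
    """
    # Assumption: an empty range (e.g. both settings 0) disables the check,
    # which also avoids dividing by zero below.
    if max_ratio <= min_ratio:
        return 0.0
    if ratio <= min_ratio:
        return 0.0
    if ratio >= max_ratio:
        return 1.0
    # Linear interpolation: 0 at min_ratio, 1 at max_ratio.
    return (ratio - min_ratio) / (max_ratio - min_ratio)


# Values taken from the error message in the conversation.
p = rejection_probability(3.2004100897425842, 2, 6)
print(f"{p:.4f}")
```

With the settings from that log, roughly three queries in ten would be rejected at that load level.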