Missing logs for ZooKeeperUserExceptions #20048
Description
Describe the issue
I have a ReplicatedReplacingMergeTree table on one shard with two replicas coordinated by ZooKeeper, all running on a Kubernetes cluster managed by clickhouse-operator. The setup works fine and data is replicated correctly, but the Prometheus metrics report several ZooKeeperUserExceptions. ClickHouse does not log those exceptions, not even at trace level, so I have no way of seeing what the issues are about.
How to reproduce
- ClickHouse server version 21.1.2
- ZooKeeper 3.6.1
- CREATE TABLE with ReplicatedReplacingMergeTree as engine
- Set logger level to Trace
<logger>
    <level>trace</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>1000M</size>
    <count>10</count>
</logger>
- Grep logs for anything which might be related to the error messages, e.g.
grep -i -E "No node|Bad version|No children for ephemerals|Node exists|Not empty" -r /var/log/clickhouse-server/
- No further information on how to reproduce, as I can't find the root cause without the missing logs.
Expected behavior
If the Prometheus metrics report ZooKeeperUserExceptions, I want to see them in the logs as well. They do not seem to be fatal errors, but I would normally expect to see them at logger level Error already, or otherwise at Debug or at least Trace.
Error message and/or stacktrace
No stacktrace, as I can't see the exceptions in the logs.
Additional context
I first thought it was related to the ClickHouse Kubernetes operator, so I opened a ticket there with additional details, but we found out it is rather a ClickHouse logger issue.
When looking at the implementation, you can find the profile event increment here:
ProfileEvents::increment(ProfileEvents::ZooKeeperUserExceptions);
The places using Coordination::isUserError(Error code) do not really log anything besides one LOG_INFO under a specific condition:
LOG_INFO(log, "Block with ID {} already exists (it was just appeared). Renaming part {} back to {}. Will retry write.",
I would expect more logs in that context, or the correctly thrown exceptions to be logged where they are caught.