ZK watch leak in DDLWorker

DDLWorker thread checks in a loop for new tasks by calling ZooKeeper GetChildren with a watch callback.

https://github.com/ClickHouse/ClickHouse/blob/5b0bc8a7fbf846cb45012e8009d58341656bf12c/src/Interpreters/DDLWorker.cpp#L374

Due to a recent change from #25373 instead of blocking on new events we retry the loop every 10 seconds instead of when watch callback is triggered.

https://github.com/ClickHouse/ClickHouse/blob/5b0bc8a7fbf846cb45012e8009d58341656bf12c/src/Interpreters/DDLWorker.cpp#L1140-L1142

This causes a zookeeper watch callback leak. It is probably not very important as all that happens is that a new std::function is added to the watches list for the ddl worker path every 10 seconds. What is 32 bytes (sizeof(std::function)?) every 10 seconds?

However, it is annoying to see that on metrics. The metric is somewhat useful for catching bugs and this isn't an important one.

I understand that we could probably fix this "temporary fix" in DDLWorker and remove the tryWait + retrying, but that pattern was seen in other places and is useful for implementing a non-blocking watch, also used in `waitForTableReplicaToProcessLogEntry`.

Is there a laconic fix for that? Maybe we could somehow remove the callback when not needed/before retry? Maybe we could change the behaviour slightly and add watchcallback only if it wasn't already added (this wouldn't work for getChildren which wraps a Poco Event in a watch)? Anything else beside just ignoring this?

---

I understand that all the callbacks will be cleared when the event is triggered.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZK watch leak in DDLWorker #26036

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	/// FIXME It may hang for unknown reason. Timeout is just a hotfix.
	constexpr int queue_wait_timeout_ms = 10000;
	queue_updated_event->tryWait(queue_wait_timeout_ms);

ZK watch leak in DDLWorker #26036

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions