Check and mark the interserver IO address active in DDL worker #92339

Merged
tuanpach merged 3 commits into ClickHouse:master from tuanpach:ddl-worker-mark-replicas-active-on-new-host-ids on Jan 22, 2026

Conversation

@tuanpach tuanpach commented Dec 17, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Related issue: https://github.com/ClickHouse/support-escalation/issues/6365

Previously, when marking replicas active, we did not check the interserver IO address. This address is used for clusters created by Replicated databases.

In this PR:

  • Check and mark the interserver IO address as active in DDLWorker::markReplicasActive.
  • Notify DDLWorker when the host IDs change because the cluster config was updated. This is a separate fix that lets DDLWorker run markReplicasActive again when the host IDs are updated in the remote_servers config.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@tuanpach tuanpach added the "can be tested" label (Allows running workflows for external contributors) Dec 17, 2025
clickhouse-gh bot commented Dec 17, 2025

Workflow [PR], commit [61b2d55]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (arm_binary, parallel) | | failure | | |
| | 03221_merge_profile_events | FAIL | cidb | |
| Integration tests (arm_binary, distributed plan, 1/4) | | failure | | |
| | test_merges_memory_limit/test.py::test_memory_limit_success | FAIL | cidb | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting in Function_and: the query: (STID: 1941-1bfa) | FAIL | cidb | |

@clickhouse-gh clickhouse-gh bot added the "pr-bugfix" label (Pull request with bugfix, not backported by default) Dec 17, 2025
@tuanpach tuanpach force-pushed the ddl-worker-mark-replicas-active-on-new-host-ids branch from 208c812 to f299bf6 on December 17, 2025 03:08
@clickhouse-gh clickhouse-gh bot added the "submodule changed" label (At least one submodule changed in this PR.) Dec 17, 2025
@tuanpach tuanpach force-pushed the ddl-worker-mark-replicas-active-on-new-host-ids branch 2 times, most recently from 0929de7 to 833f4b7 on December 17, 2025 12:34
@tuanpach tuanpach removed the "submodule changed" label (At least one submodule changed in this PR.) Dec 17, 2025
@tuanpach tuanpach force-pushed the ddl-worker-mark-replicas-active-on-new-host-ids branch from 833f4b7 to 357dfbe on December 17, 2025 23:51
@alesapin alesapin self-assigned this Dec 22, 2025
@tavplubix (Member)

What happens when the cluster gets updated through DatabaseReplicated::setCluster?

@tuanpach (Member, Author)

> What happens when the cluster gets updated through DatabaseReplicated::setCluster?

I updated the logic: it now notifies shared_ddl_worker that the host IDs were updated.
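
A minimal sketch of such a notification, under stated assumptions: `HostIDsUpdateFlag`, `notifyHostIDsUpdated`, `consumeUpdate`, and `workerStep` are hypothetical names used only for illustration and are not the DDLWorker interface. The idea is that whoever changes the cluster (a config reload or DatabaseReplicated::setCluster) raises a flag, and the worker loop re-marks replicas the next time it sees the flag set.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the worker-side state; not the actual DDLWorker API.
class HostIDsUpdateFlag
{
public:
    // Called by whoever changes the cluster definition.
    void notifyHostIDsUpdated() { updated.store(true); }

    // Called by the worker loop. Returns true once per batch of notifications
    // and resets the flag, so repeated notifications collapse into one re-mark.
    bool consumeUpdate() { return updated.exchange(false); }

private:
    std::atomic<bool> updated{false};
};

// One step of a hypothetical worker loop: re-mark replicas only when notified.
// `remark_count` stands in for a call such as markReplicasActive().
inline bool workerStep(HostIDsUpdateFlag & flag, int & remark_count)
{
    if (!flag.consumeUpdate())
        return false;
    ++remark_count;
    return true;
}
```

Using an atomic exchange keeps the notifier lock-free and means several updates arriving between two loop steps trigger a single re-mark.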

Comment on lines +1319 to +1325
    // Add interserver IO host IDs for Replicated DBs
    try
    {
        auto host_port = context->getInterserverIOAddress();
        HostID interserver_io_host_id = {host_port.first, port};
        all_host_ids.emplace(interserver_io_host_id.toString());
        LOG_INFO(log, "Add interserver IO host ID {}", interserver_io_host_id.toString());
Member

So this is the main part that will fix the issue in the cloud.

The problem was that context->getClusters() does not return Replicated DB clusters, so the list of hosts was empty in the cloud. However, we don't need to check all hosts in Replicated DB clusters: it's enough to use getInterserverIOAddress, which is certainly our own host.

And for the remote_servers config, we notify DDLWorker on config changes, but that's a separate fix.
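
The reasoning above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the `HostID` struct is a simplified stand-in for ClickHouse's class, `collectHostIDs` is a hypothetical helper, and its parameters model `context->getClusters()`, `context->getInterserverIOAddress()`, and the `port` variable from the snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for ClickHouse's HostID; the real class differs.
struct HostID
{
    std::string host_name;
    uint16_t port = 0;

    std::string toString() const { return host_name + ":" + std::to_string(port); }
};

// Hypothetical helper: collects every host ID this server may be addressed by.
// cluster_host_ids       models the IDs derived from the configured clusters
// interserver_io_address models the (host, port) pair of the interserver IO
//                        address; empty if it is not available
// port                   models the port the worker uses for its host IDs
std::set<std::string> collectHostIDs(
    const std::vector<HostID> & cluster_host_ids,
    const std::optional<std::pair<std::string, uint16_t>> & interserver_io_address,
    uint16_t port)
{
    std::set<std::string> all_host_ids;
    for (const auto & id : cluster_host_ids)
        all_host_ids.emplace(id.toString());

    // Replicated DB clusters are not among the configured clusters, so the
    // interserver IO host is added explicitly, as in the snippet above.
    if (interserver_io_address)
    {
        HostID interserver_io_host_id{interserver_io_address->first, port};
        all_host_ids.emplace(interserver_io_host_id.toString());
    }
    return all_host_ids;
}
```

With an empty list of configured clusters (the cloud case described above), the result still contains one entry for the server itself, so there is something to mark active.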

Member Author

Yes. Originally, I thought the IP of Replicated DBs was also changeable. But it is not.

I will update the PR description and title.

@tuanpach tuanpach changed the title from "Notify DDLWorker when host IDs are updated to re-mark replicas active" to "Check and mark the interserver IO address active in DDL worker" on Jan 21, 2026
@tuanpach
Member Author

03221_merge_profile_events

test_merges_memory_limit/test.py::test_memory_limit_success

@tuanpach tuanpach added this pull request to the merge queue Jan 22, 2026
Merged via the queue into ClickHouse:master with commit 6f99054 Jan 22, 2026
127 of 131 checks passed
@tuanpach tuanpach deleted the ddl-worker-mark-replicas-active-on-new-host-ids branch January 22, 2026 06:47
@robot-ch-test-poll robot-ch-test-poll added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) Jan 22, 2026
@tuanpach tuanpach added the "pr-must-backport" label (Pull request should be backported intentionally. Use this label with great care!) Jan 22, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the "pr-must-backport-synced" label (The `*-must-backport` labels are synced into the cloud Sync PR) Jan 22, 2026
robot-ch-test-poll1 added a commit that referenced this pull request Jan 22, 2026
Cherry pick #92339 to 25.11: Check and mark the interserver IO address active in DDL worker
robot-clickhouse added a commit that referenced this pull request Jan 22, 2026
robot-ch-test-poll1 added a commit that referenced this pull request Jan 22, 2026
Cherry pick #92339 to 25.12: Check and mark the interserver IO address active in DDL worker
robot-clickhouse added a commit that referenced this pull request Jan 22, 2026
clickhouse-gh bot added a commit that referenced this pull request Jan 22, 2026
Backport #92339 to 25.12: Check and mark the interserver IO address active in DDL worker
tilman-aiven added a commit to aiven/ClickHouse that referenced this pull request Feb 6, 2026
…very iteration

Replica dirs in ZK are created in enqueueQueryAttempt() when the first DDL is
enqueued. At worker init getChildren(replicas_dir) was still empty, so
markReplicasActive() never created replicas_dir/<host_id>/active for those
host_ids. The worker requires that active node (and for loopback, this node's
UUID) before executing a task, so the task was skipped with "loopback not
claimed" and the initiator saw timeouts (e.g. HTTP 503).

Call markReplicasActive(reinitialized) on every main-loop iteration, before
scheduleTasks(), so new replica dirs get their active node before we schedule
tasks.

Future backport ClickHouse#92339
It checks and marks the interserver IO address as active in DDLWorker::markReplicasActive.
Notify DDLWorker when host IDs are updated when cluster config updated.
This is a separate fix to let DDLWorker runs markReplicaActive again when the host IDs are updated in remoter_servers config.
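
The ordering problem described in this commit message can be illustrated with a small in-memory model. It is a sketch under stated assumptions: `ReplicaDirs` replaces the ZooKeeper subtree replicas_dir/<host_id>/active with a map, and the functions below are simplified, hypothetical versions that only share their names with the real ones.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// In-memory stand-in for replicas_dir in ZooKeeper.
// Key: host_id of an existing replica dir; value: whether its "active" node exists.
using ReplicaDirs = std::map<std::string, bool>;

// Simplified model of markReplicasActive(): creates the active node for every
// existing replica dir that belongs to this server.
void markReplicasActive(ReplicaDirs & replica_dirs, const std::set<std::string> & local_host_ids)
{
    for (auto & [host_id, active] : replica_dirs)
        if (local_host_ids.count(host_id))
            active = true;
}

// A task is executed only if the active node for its host_id exists.
bool canExecute(const ReplicaDirs & replica_dirs, const std::string & host_id)
{
    auto it = replica_dirs.find(host_id);
    return it != replica_dirs.end() && it->second;
}

// One main-loop iteration: mark first, then decide whether the task may run.
bool runIteration(
    ReplicaDirs & replica_dirs,
    const std::set<std::string> & local_host_ids,
    const std::string & task_host_id)
{
    markReplicasActive(replica_dirs, local_host_ids);
    return canExecute(replica_dirs, task_host_id);
}
```

In this model, marking only at worker init does nothing when the replica dir is created later by the first enqueued DDL, whereas marking at the start of every iteration picks the new dir up before the task is considered.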
tilman-aiven added a commit to aiven/ClickHouse that referenced this pull request Feb 9, 2026
tilman-aiven added a commit to aiven/ClickHouse that referenced this pull request Feb 12, 2026
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Feb 14, 2026
…licas-active-on-new-host-ids

Check and mark the interserver IO address active in DDL worker
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Feb 16, 2026
25.8.16 Stable backport of ClickHouse#92339: Check and mark the interserver IO address active in DDL worker
tuanpach added a commit that referenced this pull request Feb 24, 2026
Backport #92339 to 25.11: Check and mark the interserver IO address active in DDL worker
robot-ch-test-poll2 added a commit that referenced this pull request Feb 24, 2026
Cherry pick #92339 to 25.8: Check and mark the interserver IO address active in DDL worker
robot-clickhouse added a commit that referenced this pull request Feb 24, 2026
@robot-ch-test-poll1 robot-ch-test-poll1 added the "pr-backports-created" label (Backport PRs are successfully created, it won't be processed by CI script anymore) Feb 24, 2026
tuanpach added a commit that referenced this pull request Feb 24, 2026
Backport #92339 to 25.8: Check and mark the interserver IO address active in DDL worker
Labels

  • can be tested: Allows running workflows for external contributors
  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-bugfix: Pull request with bugfix, not backported by default
  • pr-must-backport: Pull request should be backported intentionally. Use this label with great care!
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-synced-to-cloud: The PR is synced to the cloud repo

7 participants