Skip to content

Antalya 25.8: remote initiator improvements#1576

Merged
zvonand merged 9 commits intoantalya-25.8from
feature/antalya-25.8/remote_initiator_improvements
Mar 30, 2026
Merged

Antalya 25.8: remote initiator improvements#1576
zvonand merged 9 commits intoantalya-25.8from
feature/antalya-25.8/remote_initiator_improvements

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Different improvement for remote initiator

Documentation entry for user-facing changes

With remote initiator feature queries like

SELECT * FROM iceberg(...) SETTINGS object_storage_cluster='swarm', object_storage_remote_initiator=1

rewrites as

SELECT * FROM remote('remote_host', icebergCluster('swarm', ...)

'remote_host' is a random host from 'swarm' cluster
See #756

Current PR introduces the next improvements:

  1. Partially solved object_storage_remote_initiator auth works incorrectly #1570 - uses username and password if access to cluster requires it. Throws exception if cluster uses common secret, this should be solved in future PRs.
  2. Solved object_storage_remote_initiator with different cluster name #1571 - new setting object_storage_remote_initiator_cluster allows to choose remote_host from different cluster, not only from swarm
  3. remote query did not work with additional setting inside function, like remote('remote_host', iceberg(..., SETTINGS iceberg_metadata_file_path='path/to/metadata.json')). Now must work correctly.
  4. In query remote('remote_host', icebergCluster('remote_cluster', ...)) cluster remote_cluster can be defined only on remote_host and unknown on current initial host. Removed cluster check on early stage, this allows to execute such queries.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@ianton-ru ianton-ru changed the title Feature/antalya 25.8/remote initiator improvements Antalya 25.8: remote initiator improvements Mar 25, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 25, 2026

Workflow [PR], commit [7caab5a]

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 387cfa13d8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

{
{"allow_local_data_lakes", false, false, "New setting."},
{"iceberg_partition_timezone", "", "", "New setting."},
{"s3_propagate_credentials_to_other_storages", false, false, "New setting"},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop unknown settings from compatibility history

This entry introduces a setting name that is not declared in src/Core/Settings.cpp (same issue also appears for export_merge_tree_partition_lock_inside_the_task and export_merge_tree_partition_system_table_prefer_remote_information later in this block). When compatibility is set to a version below 25.8.16.20002, SettingsImpl::applyCompatibilitySetting() iterates these changes and attempts to read/apply them, which triggers runtime "setting not found" failures instead of applying compatibility defaults.

Useful? React with 👍 / 👎.

{"allow_local_data_lakes", false, false, "New setting."},
{"iceberg_partition_timezone", "", "", "New setting."},
{"s3_propagate_credentials_to_other_storages", false, false, "New setting"},
{"export_merge_tree_part_filename_pattern", "", "{part_name}_{checksum}", "New setting"},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep filename-pattern change in its original version block

This re-adds export_merge_tree_part_filename_pattern as a "new setting" in 25.8.16.20002, but it is already introduced in the 25.8.9.2000 block with the same previous/new values. Because compatibility rollback applies all versions newer than the requested one, setting compatibility to any version between those two (for example 25.8.10) will now incorrectly force this setting back to "", producing defaults that don't match that target version.

Useful? React with 👍 / 👎.

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b9c0d9bba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

auto remote_initiator_cluster_name = settings[Setting::object_storage_remote_initiator_cluster].value;
if (remote_initiator_cluster_name.empty())
remote_initiator_cluster_name = cluster_name_from_settings;
auto remote_initiator_cluster = getClusterImpl(context, remote_initiator_cluster_name);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply object_storage_max_nodes to initiator cluster selection

IStorageCluster::read now builds remote_initiator_cluster without the object_storage_max_nodes cap, so object_storage_remote_initiator=1 can pick a remote initiator from hosts that the query intentionally excluded via object_storage_max_nodes. In environments that use this cap to avoid unhealthy/unconfigured nodes, this introduces nondeterministic routing failures (auth/connectivity/cluster visibility) even though the setting is configured. The initiator cluster lookup should use the same max-host limit as the normal cluster path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

@ianton-ru ianton-ru Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok, with object_storage_max_nodes getCluster chooses this number of random cluster nodes, and for remote initiator also random node is selected, Select random node from random nodes sublist gives the same result as just select random node from full list.

@ianton-ru
Copy link
Copy Markdown
Author

AI audit note: This review comment was generated by AI (gpt-5.4).

Audit update for PR #1576 after the latest branch update:

Confirmed defects

No confirmed defects in reviewed scope.

Coverage summary

Scope reviewed: current PR diff vs origin/antalya-25.8 across analyzer/table-function SETTINGS propagation, IStorageCluster::convertToRemote() auth and initiator-cluster rewrite, deferred *Cluster existence checks for nested remote(..., xxxCluster(...)), compatibility metadata for object_storage_remote_initiator_cluster, and the added integration coverage in tests/integration/test_s3_cluster.

Categories failed: none confirmed.

Categories passed: previous FunctionNode::toASTImpl AST-shape issue is fixed; previous settings-history duplication issue is fixed; nested table-function SETTINGS now survive analyzer rewrites; secret-cluster path is fail-closed; remote/remoteSecure password arguments are already treated as secret for logging/formatting; unknown nested cluster resolution is correctly deferred to the remote node; no confirmed concurrency/lifetime/rollback defects in the changed C++ paths.

Assumptions/limits: static audit only; I did not run the integration suite or build the server, so this conclusion is limited to code-path reasoning over the current PR state.

Prewhere filter
Prewhere filter column: less(multiply(2, b), 100)
Filter column: and(indexHint(greater(plus(i, 40), 0)), equals(a, 0)) (removed)
Filter column: and(equals(a, 0), indexHint(greater(plus(i, 40), 0))) (removed)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument order depends on hash, hash was changes (see FunctionNode::updateTreeHashImpl)

@svb-alt svb-alt requested a review from arthurpassos March 26, 2026 17:47
arthurpassos
arthurpassos previously approved these changes Mar 27, 2026
@zvonand zvonand merged commit 096f029 into antalya-25.8 Mar 30, 2026
264 of 268 checks passed
@alsugiliazova
Copy link
Copy Markdown
Member

PR #1576 CI Verification Report

CI Results Overview

Category Count
Success ~60
Failure 2 (Swarms aarch64, GrypeScan alpine)
Skipped ~39 (excluded sanitizer suites)

PR's New Test Validation

The PR adds new integration tests for test_object_storage_remote_initiator. These tests initially failed on the Mar 27 CI run but were fixed by subsequent commits. The flaky check also initially failed (5 parametrized runs) but passed on the final run.

Latest run (Mar 30) — all pass:

Job Test Result
Integration tests (amd_binary, 4/5) test_object_storage_remote_initiator OK
Integration tests (amd_asan, old analyzer, 6/6) test_object_storage_remote_initiator OK
Integration tests (arm_binary, distributed plan, 2/4) test_object_storage_remote_initiator OK
Integration tests (amd_asan, flaky check) test_object_storage_remote_initiator[1-5] through [5-5] (15 runs, 3 retries × 5 params) All OK

All 18 test executions passed on the latest CI run.

CI Failures

1. test_object_storage_remote_initiator (Mar 27 run) — Fixed by PR Commits

Jobs: Integration tests (amd_binary 5/5, amd_asan 2/6, arm_binary 3/4) + flaky check (5 parametrized)

Failed on Mar 27 intermediate run. All tests pass on the final run (Mar 30) after fix commits 32c53796 (Fix test) and 7caab5ae (Fix setting cleanup).

Related to PR: Yes — Development-stage failures, resolved in final commits

2. 01625_constraints_index_append (Fast test, Mar 25) — Fixed by PR Commits

Job: Fast test (2 initial runs)

The PR modifies the reference file. Failed twice on early commits (Mar 25), then passed consistently in all subsequent runs.

Related to PR: Yes — Reference file update, resolved in subsequent commits

3. 02668_column_block_number_with_projections — Isolated Flaky

Job: Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel)

Single failure on Mar 27. Only occurrence in the last 30 days for this PR. The test involves column block numbers with projections — unrelated to remote initiator changes.

Related to PR: No — Isolated flaky test failure

4. GrypeScan (-alpine) — CVE in Base Image

CVE in Alpine base image (altinityinfra/clickhouse-server:1576-25.8.16.20002.altinityantalya-alpine). Non-alpine image scan passed.

Related to PR: No — Base image vulnerability

Regression Test Results (PR's Internal CI)

Suite x86_64 aarch64
Iceberg (1) Pass Pass
Iceberg (2) Pass Pass
Parquet Pass Pass
Parquet (aws_s3) Pass Pass
Parquet (minio) Pass Pass
S3 Export (part) Pass Pass
S3 Export (partition) Pass Pass
Swarms Pass Fail

Regression Failure: Swarms aarch64 — Pre-existing Node Failure Instability

Conclusion

Verdict: Ready to merge (already merged) — No unresolved PR-related failures.

@alsugiliazova alsugiliazova added the verified Approved for release label Mar 31, 2026
@Selfeer Selfeer mentioned this pull request Apr 1, 2026
25 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants