
Fix IPartitionStrategy race condition#1517

Merged
zvonand merged 9 commits into antalya-26.1 from fix_race_condition_partition_strategy on Mar 17, 2026

Conversation

Collaborator

@arthurpassos arthurpassos commented Mar 12, 2026

IPartitionStrategy::computePartitionKey might be called from different threads, and it writes to cached_result concurrently without any sort of protection. It would be easier to add a mutex around it, but we can actually make it lock-free by moving the cache write to the constructor.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix race condition on IPartitionStrategy found in https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1463&sha=5e3b05142815f9d4dbbb89497badc85d455079e4&name_0=PR&name_1=Stateless+tests+%28amd_tsan%2C+s3+storage%2C+sequential%2C+2%2F2%29&name_2=Tests.

This was introduced in ClickHouse#92844; I'll ship a PR to upstream as well.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Mar 12, 2026

Workflow [PR], commit [c1d0f74]

@arthurpassos arthurpassos changed the title add mutex around cached_result Fix IPartitionStrategy race condition Mar 12, 2026
@arthurpassos
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@arthurpassos added the antalya-26.1, antalya, and port-antalya (PRs to be ported to all new Antalya releases) labels Mar 13, 2026
mkmkme previously approved these changes Mar 16, 2026
@zvonand zvonand merged commit 50d2a70 into antalya-26.1 Mar 17, 2026
228 of 237 checks passed
subkanthi pushed a commit that referenced this pull request Mar 25, 2026: Fix IPartitionStrategy race condition
@arthurpassos
Collaborator Author

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1517 ([Fix IPartitionStrategy race condition](https://github.com/Altinity/ClickHouse/pull/1517)):

Confirmed defects:

No confirmed defects in reviewed scope.

Coverage summary:

Scope reviewed: src/Storages/IPartitionStrategy.{h,cpp} plus related test updates in tests/integration/test_export_merge_tree_part_to_object_storage/test.py, tests/integration/test_export_replicated_mt_partition_to_object_storage/test.py, and tests/queries/0_stateless/03572_export_merge_tree_part_limits_and_table_functions.{sh,reference} across merge-base 2fb6ff68..c1d0f747.
Categories failed: none confirmed.
Categories passed: shared-state race path around cached_result publication/use; call-graph entrypoints (PartitionedSink, StorageObjectStorage::import, DeltaLake sink) to partition-key computation; exception/error propagation parity; deterministic vs non-deterministic expression handling; concurrency interleaving checks for concurrent computePartitionKey calls; rollback/partial-update impact (no new mutable state transition in hot path); C++ hazard sweep (lifetime, UB, iterator invalidation, lock-order/deadlock) for changed code.
Assumptions/limits: static code audit only (no runtime fault-injection execution or TSAN run in this pass).

@Selfeer
Collaborator

Selfeer commented Apr 1, 2026

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1517 (Fix IPartitionStrategy race condition):

Confirmed defects:

No confirmed defects in reviewed scope.

@Selfeer
Collaborator

Selfeer commented Apr 1, 2026

PR #1517 CI Triage — Fix IPartitionStrategy race condition

| Field | Value |
| --- | --- |
| PR | #1517 (fix_race_condition_partition_strategy → antalya-26.1) |
| Author | @arthurpassos |
| Commit | c1d0f747746a05801ad58493cdf431dfbcd77264 |
| Regression Run | 23850885992 |
| ClickHouse Version | 26.1.3.20001 (docker://altinityinfra/clickhouse-server:1517-26.1.3.20001.altinityantalya) |
| Run Flags | --use-keeper --with-analyzer |
| Date | 2026-04-01 |
| Labels | antalya, port-antalya, antalya-26.1, antalya-26.1.6.20001 |
| Merge Status | MERGED |

PR Description

Bug fix for a race condition in IPartitionStrategy::computePartitionKey which could be called from different threads writing to cached_result concurrently without protection. The fix moves the cache write to the constructor, making it lock-free. Originally introduced by ClickHouse#92844.

Changed files (6):

  • src/Storages/IPartitionStrategy.cpp (+63, -21)
  • src/Storages/IPartitionStrategy.h (+7, -7)
  • tests/integration/test_export_merge_tree_part_to_object_storage/test.py (+28, -14)
  • tests/integration/test_export_replicated_mt_partition_to_object_storage/test.py (+38, -2)
  • tests/queries/0_stateless/03572_export_merge_tree_part_limits_and_table_functions.reference (+2)
  • tests/queries/0_stateless/03572_export_merge_tree_part_limits_and_table_functions.sh (+54, -7)

Summary

| Category | Count | Details |
| --- | --- | --- |
| PR-caused regression | 0 | |
| Pre-existing flaky / known | 5 | settings, aggregate_functions_3, tiered_storage_minio, swarms, ssl_server_3 |
| Infrastructure / GCS performance | 1 | s3_gcs_2 table function wildcard timeouts |
| Cancelled (3h timeout) | 11 | alter_attach_partition_1/2, alter_replace_partition, aggregate_functions_2, rbac_2/3, iceberg_1/2, tiered_storage_gcs, lightweight_delete, parquet |
| Still running | 1 | s3_aws_2 |

Verdict: No failures are caused by this PR. All CI failures are pre-existing or infrastructure-related.

PR-targeted regression suites both passed: s3_minio_export_partition ✅ and s3_minio_export_part ✅ — the two suites most directly relevant to the IPartitionStrategy race condition fix.


CI Jobs Status

| Job Name | Status | Failure Type |
| --- | --- | --- |
| settings | ❌ failure | Pre-existing: snapshot mismatch (input_format_parquet_use_metadata_cache) |
| aggregate_functions_3 | ❌ failure | Pre-existing: sumCountState serialization changed |
| s3_gcs_2 | ❌ failure | Infrastructure: GCS wildcard query timeouts |
| tiered_storage_minio | ❌ failure | Pre-existing: round robin disk selection after adding disk |
| swarms | ❌ failure | Pre-existing: known 26.1 issue (#1601) |
| ssl_server_3 | ❌ failure | Pre-existing: openssl signing interrupted (exit 130) |
| alter_attach_partition_1 | ⛔ cancelled | 3h CI timeout |
| alter_attach_partition_2 | ⛔ cancelled | 3h CI timeout |
| alter_replace_partition | ⛔ cancelled | 3h CI timeout |
| aggregate_functions_2 | ⛔ cancelled | 3h CI timeout |
| rbac_2 | ⛔ cancelled | 3h CI timeout |
| rbac_3 | ⛔ cancelled | 3h CI timeout |
| iceberg_1 | ⛔ cancelled | 3h CI timeout |
| iceberg_2 | ⛔ cancelled | 3h CI timeout |
| tiered_storage_gcs | ⛔ cancelled | 3h CI timeout |
| lightweight_delete | ⛔ cancelled | 3h CI timeout |
| parquet | ⛔ cancelled | 3h CI timeout |
| s3_minio_export_partition | ✅ success | PR-targeted suite |
| s3_minio_export_part | ✅ success | PR-targeted suite |
| All other jobs (51 total) | ✅ success | |

Detailed Failure Analysis

1. settings — Pre-existing Snapshot Mismatch

  • Failing test: /settings/default values/input_format_parquet_use_metadata_cache
  • Stats: 1523 scenarios — 1518 ok, 1 failed, 4 xfail (1m 11s)
  • Cause: Snapshot expects {"default":"0"} but server returns {"default":"1"} for input_format_parquet_use_metadata_cache.
  • Relation to PR: None. Parquet input format setting default is unrelated to IPartitionStrategy.
  • Confirmed pre-existing: Same failure on PR #1529 run (23655067918) and scheduled ClickHouse latest run (23773974950). Snapshot needs updating for the new default.

2. aggregate_functions_3 — Pre-existing Serialization Change

  • Failing tests: sumCountState (25 Nullable/LowCardinality type variants) and sumCountMerge (LowCardinality Nullable Float32)
  • Stats: 360 scenarios — 310 ok, 2 failed, 45 skipped, 3 xfail (1h 37m)
  • Cause: sumCountState() serialized output gained a leading 01 byte (e.g. expected 1C04... but got 011C04...). Aggregate state format changed in the ClickHouse build.
  • Relation to PR: None. Aggregate function state serialization is completely unrelated to IPartitionStrategy/export partition.
  • Confirmed pre-existing: Same failure on PR #1529 run (23655067918), scheduled ClickHouse head (23774629703), and scheduled ClickHouse latest (23773974950).

3. s3_gcs_2 — Infrastructure / GCS Performance

  • Failing tests: /s3/gcs/part 2/table function performance/wildcard/nums, nums one invalid, range (120s ExpectTimeoutError), star no match (elapsed 2.017s > 2s threshold)
  • Stats: 70 scenarios — 37 ok, 1 errored, 31 xfail, 1 xerror (1h 35m)
  • Cause: GCS s3() table function queries with brace/glob wildcards hanging or exceeding tight performance thresholds.
  • Relation to PR: None. S3/GCS table function glob performance is unrelated to IPartitionStrategy.
  • Pre-existing signal: s3_gcs_1 failed in scheduled head (23774629703) and scheduled latest (23773974950) runs with similar GCS performance issues. The s3_gcs_2 job specifically did not fail in those runs, but the underlying GCS performance instability is a known pattern.

4. tiered_storage_minio — Pre-existing Disk Selection Issue

  • Failing tests: /tiered storage/with minio/change config norestart/check round robin after adding new disk/ — 4 scenario variants (one_small_disk_main_new_jbod4*)
  • Stats: 94 scenarios — 84 ok, 6 failed, 4 xfail
  • Cause: After hot-adding a disk without restart, system.parts disk_name set doesn't match expected disks. AssertionError: assert disks == used_disks.
  • Relation to PR: None. Tiered storage JBOD disk selection is unrelated to IPartitionStrategy/export partition.
  • Confirmed pre-existing: Same failure on scheduled ClickHouse head (23774629703) and scheduled ClickHouse latest (23773974950).

5. swarms — Pre-existing Known Issue

  • Failing tests:
    • /swarms/feature/node failure/network failure — exit code 0 instead of expected 138
    • /swarms/feature/node failure/initiator out of disk space: UNKNOWN_DATABASE error (Code: 81)
  • Stats: 1520 scenarios — 1486 ok, 2 failed, 32 xfail (24m 11s)
  • Relation to PR: None. Swarms node failure resilience is completely unrelated to IPartitionStrategy.
  • Confirmed pre-existing: Tracked in Altinity/ClickHouse#1601. Same failure on PR #1529, PR #1576 (23775422047), and PR #1577 (23759752139).

6. ssl_server_3 — Pre-existing OpenSSL Timeout

  • Failing tests: /ssl server/part 3/ca chain/use second intermediate ca/ — 3 variants of all certificates present under server certificate with chain (with/without caconfig) and server certificate without chain (without caconfig)
  • Stats: 34 scenarios — 19 ok, 3 failed, 12 xfail (1h 42m)
  • Cause: openssl x509 -sha256 -req ... -CA sub2/ca.crt process interrupted with exit code 130 (SIGINT) instead of completing successfully. The certificate signing step hangs and gets killed.
  • Relation to PR: None. SSL certificate chain generation is unrelated to IPartitionStrategy.
  • Confirmed pre-existing: Same failure on scheduled ClickHouse head (23774629703, arm rows) and scheduled ClickHouse latest (23773974950, arm rows).

Cancelled Jobs (3h Timeout)

All 11 cancelled jobs exhibit the same pattern: started around 13:30 UTC, cancelled at exactly 16:35 UTC (~3h 5m), with the "Run suite" step cancelled and subsequent steps left pending. This is the CI runner's 3-hour job timeout.

| Job | Started | Cancelled |
| --- | --- | --- |
| alter_attach_partition_2 | 13:30:07 | 16:35:07 |
| alter_replace_partition | 13:31:28 | 16:36:28 |
| alter_attach_partition_1 | 13:33:04 | 16:38:05 |
| aggregate_functions_2 | ~13:30 | ~16:35 |
| rbac_2, rbac_3 | ~13:30 | ~16:35 |
| iceberg_1, iceberg_2 | ~13:30 | ~16:35 |
| tiered_storage_gcs | ~13:30 | ~16:35 |
| lightweight_delete | ~13:30 | ~16:35 |
| parquet | ~13:30 | ~16:35 |

These suites commonly take >3h on single-threaded --parallel 1 keeper+analyzer runs. The timeouts are not related to PR changes.


PR-Targeted Regression Suites

The regression suites most directly relevant to the IPartitionStrategy race condition fix both passed:

| Suite | Status | Details |
| --- | --- | --- |
| s3_minio_export_partition | ✅ success | Export partition suite: directly tests the fixed code path |
| s3_minio_export_part | ✅ success | Export part suite: tests MergeTree part export to S3 |

@Selfeer added the verified (Approved for release) label Apr 1, 2026

Labels

antalya, antalya-26.1, antalya-26.1.6.20001, port-antalya (PRs to be ported to all new Antalya releases), verified (Approved for release)
