Maybe fix flaky test_fix_metadata_version_on_attach_part_after_restore #96499

Merged
alexey-milovidov merged 2 commits into master from fix-flaky-test-restore-replica-metadata-version
Feb 10, 2026
Conversation

@alexey-milovidov
Member

Move test_fix_metadata_version_on_attach_part_after_restore from test_restore_replica to its own test module
test_restore_replica_metadata_version.

The test shared a module-scoped cluster with 4 other tests that perform 10+ SYSTEM RESTART REPLICA operations. Each restart destroys and recreates storage objects along with their background threads. In TSAN builds, the accumulated thread churn can exhaust TSAN's internal thread slot tracking, causing a segfault in __tsan::TraceMutexUnlock during a background merge of a system table. That crashes the server and fails this unrelated test.

The test is fully independent: it creates its own test_ttl table on only 2 nodes and shares no state with the other restore replica tests. Giving it a fresh cluster avoids the accumulated TSAN thread debris.
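
The reasoning above can be illustrated with a toy model. This is only a sketch: `ToyCluster`, `run_restore_tests`, and `restarts_seen_by_metadata_test` are hypothetical names, not part of the ClickHouse integration test helpers, and the split of restarts across the 4 other tests is made up for illustration. It shows only that state accumulated on a shared, module-scoped cluster is inherited by whichever test runs later, while a test in its own module starts from a fresh cluster.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a module-scoped test cluster (not the real helper)."""

    restarts: int = 0  # SYSTEM RESTART REPLICA operations this server has seen

    def restart_replica(self) -> None:
        # Each restart leaves behind some thread churn; the toy just counts them.
        self.restarts += 1


def run_restore_tests(cluster: ToyCluster, restarts_per_test: list[int]) -> None:
    """Model the other restore-replica tests, each issuing some restarts."""
    for count in restarts_per_test:
        for _ in range(count):
            cluster.restart_replica()


def restarts_seen_by_metadata_test(cluster: ToyCluster) -> int:
    """Model the moved test: the toy ignores what the test does itself and
    reports only the churn already accumulated on the cluster it was given."""
    return cluster.restarts


# Before: one cluster shared by every test in the module.
shared = ToyCluster()
run_restore_tests(shared, [3, 3, 2, 2])  # illustrative split of the "10+" restarts
before = restarts_seen_by_metadata_test(shared)

# After: the moved test lives in its own module, so it gets a fresh cluster.
after = restarts_seen_by_metadata_test(ToyCluster())

print(before, after)  # prints "10 0"
```

In the real test suite the isolation comes from the cluster being scoped to the module, so a new module file is enough to get a new cluster; the toy only mirrors that effect.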

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96437&sha=de1caa72d9160d3d01d29245a6f7f1ba453a46e1&name_0=PR&name_1=Integration%20tests%20%28amd_tsan%2C%206%2F6%29

Closes #88500

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexey-milovidov alexey-milovidov changed the title from "Fix flaky test_fix_metadata_version_on_attach_part_after_restore" to "Maybe fix flaky test_fix_metadata_version_on_attach_part_after_restore" Feb 9, 2026
@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 9, 2026

Workflow [PR], commit [5e3fb8d]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-ci label Feb 9, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 10, 2026
@alexey-milovidov alexey-milovidov merged commit 621b5a3 into master Feb 10, 2026
133 of 134 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-flaky-test-restore-replica-metadata-version branch February 10, 2026 15:32
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Feb 10, 2026
Development

Successfully merging this pull request may close these issues.

test_fix_metadata_version_on_attach_part_after_restore is flaky

2 participants