
Fix flaky test 03706_statistics_preserve_checksums_on_mutations #100772

Merged
hanfei1991 merged 1 commit into ClickHouse:master from groeneai:fix/flaky-03706-statistics-checksums on Mar 26, 2026

@groeneai (Contributor) commented Mar 26, 2026

Pin serialization_info_version='basic' in the test's CREATE TABLE to fix
~30% failure rate under CI's MergeTree setting randomization.

When CI randomization selects serialization_info_version=with_types, the INSERT and
mutation (ALTER TABLE REWRITE PARTS) code paths produce different
serialization.json metadata content. The INSERT path constructs
SerializationInfo from the data write, while the full-column mutation path
reads settings from source_part->storage.getSettings() (MutateTask.cpp:642-653).
The result is a mismatch in the serialization metadata format only; the actual
column data is identical (verified: uncompressed_hash_of_compressed_files
matches in all failures).
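
To illustrate why identical column data can still yield a checksum mismatch, here is a toy Python model. It is not ClickHouse's checksum algorithm: the function `part_hashes`, the metadata field names, and the use of SHA-256 are all assumptions made for illustration. It only shows that a hash over data alone stays equal while a hash that also covers a metadata file diverges.

```python
import hashlib
import json


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def part_hashes(column_data: bytes, serialization_json: dict) -> dict:
    """Return a data-only hash and an all-files hash for a toy 'part'."""
    meta = json.dumps(serialization_json, sort_keys=True).encode()
    return {
        "data_only": sha(column_data),
        "all_files": sha(column_data + meta),
    }


column_data = b"identical column bytes"

# Hypothetical metadata: the two write paths disagree only in extra
# per-type version fields, not in the column data itself.
insert_part = part_hashes(column_data, {"format": "with_types", "type_versions": {"String": 1}})
mutated_part = part_hashes(column_data, {"format": "with_types", "type_versions": {}})

print(insert_part["data_only"] == mutated_part["data_only"])  # True
print(insert_part["all_files"] == mutated_part["all_files"])  # False
```

In this sketch the data-only comparison plays the role of uncompressed_hash_of_compressed_files, which matched in every observed failure.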

Investigation: 9 of 30 test runs with MergeTree randomization failed.
All 9 failures (100%) had serialization_info_version=with_types. If the setting
were unrelated to the failures and each run had a 0.5 chance of with_types, the
probability of this would be (0.5)^9 ≈ 0.2%. With the pin to basic, 50/50 runs
pass under full randomization (both query and MergeTree settings).
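
The probability quoted above can be checked with a short calculation. This sketch assumes, as the PR text does, that each run independently gets with_types with probability 0.5 and that failures would otherwise be unrelated to the setting; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: the randomizer picks with_types in half of the runs.
p_with_types = Fraction(1, 2)
failures = 9

# Chance that all 9 failing runs had with_types purely by coincidence.
p_all = p_with_types ** failures

print(p_all, float(p_all))  # 1/512 0.001953125
```

The result, about 0.2%, is also below the 0.002 threshold cited in the validation notes.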

Closes #100786

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Pin serialization_info_version='basic' in the test's CREATE TABLE
settings. With 'with_types', the INSERT and mutation code paths
produce different serialization.json metadata content, causing
checksum mismatches (~30% failure rate under CI randomization).
The actual data remains identical — only the serialization metadata
format differs between the two write paths.

All 9 observed failures in 30 test runs correlated 100% with
serialization_info_version=with_types being randomized. With the
pin, 50/50 passes under full randomization (both query and
MergeTree settings).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@groeneai (Contributor, Author) commented

Pre-PR Validation Gate

a) Deterministic repro? Semi-deterministic — ~30% failure rate through the test runner with MergeTree randomization (--no-random-settings --test-runs 30 → 6-9 failures). All failures correlated 100% with serialization_info_version=with_types. Not fully deterministic as a single manual command (the test runner's randomization mechanism is required), but statistically unmistakable: 9/9 failures had with_types, p < 0.002.

b) Root cause explained? When serialization_info_version=with_types is randomized by the CI test runner and stored in the table metadata, the INSERT and mutation paths produce different serialization.json content. The mutation path (MutateTask.cpp:642-653, affects_all_columns=true branch) reads serialization settings from source_part->storage.getSettings(), while the INSERT path constructs SerializationInfo from the data write process. The with_types version adds type-specific serialization version fields to serialization.json, and this metadata differs between the two write paths even though the actual data is identical (uncompressed_hash_of_compressed_files always matches).

c) Fix matches root cause? Yes — pins serialization_info_version='basic' in CREATE TABLE, preventing the with_types code path that causes the divergence. This is a targeted pin of the specific setting identified by statistical analysis, not a blanket no-random-merge-tree-settings tag.

d) Test intent preserved? Yes — the test still verifies that ALTER TABLE REWRITE PARTS preserves checksums with auto_statistics_types enabled. The serialization_info_version setting controls metadata format only, not the statistics or data correctness being tested.

e) Both directions demonstrated? Yes:

  • Without fix: 6/30 failures (~20%) with MergeTree randomization
  • With fix: 50/50 passes with full randomization (both query and MergeTree settings)
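
The effect of the pin described in item c) can be sketched as a toy model in Python. The assumptions here are that the randomizer chooses between the two values named in this PR and that a value set explicitly in CREATE TABLE takes precedence over the randomized one; `effective_version` and `CHOICES` are hypothetical names, not part of the ClickHouse test runner.

```python
import random

# Assumed value set for the randomized MergeTree setting.
CHOICES = ["basic", "with_types"]


def effective_version(rng, pinned=None):
    """Return the pinned value if one is given, else a randomized one."""
    randomized = rng.choice(CHOICES)
    return pinned if pinned is not None else randomized


rng = random.Random(0)

# Without the pin, each run may land on either value.
unpinned = [effective_version(rng) for _ in range(30)]

# With the pin, the randomized value is never used.
pinned = [effective_version(rng, pinned="basic") for _ in range(50)]

print(set(pinned))  # {'basic'}
```

Under this model the with_types code path is unreachable once the setting is pinned, which matches the 50/50 passing runs reported above.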

Session: 828a9c69

@nikitamikhaylov added the "can be tested" label (Allows running workflows for external contributors) Mar 26, 2026

@clickhouse-gh bot (Contributor) commented Mar 26, 2026

Workflow [PR], commit [1906c0e]

Summary:

job_name | test_name | status | info | comment
Stateless tests (arm_binary, parallel) | | failure | |
 | 03546_json_input_output_map_as_array | FAIL | cidb |

AI Review

Summary

This PR stabilizes tests/queries/0_stateless/03706_statistics_preserve_checksums_on_mutations.sh by pinning serialization_info_version='basic' in the test table settings, avoiding known metadata-format divergence between INSERT and mutation paths under randomized MergeTree settings. I did not find correctness, safety, compatibility, or performance issues in the submitted change.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh bot added the pr-ci label Mar 26, 2026

@hanfei1991 (Member) commented

maybe this fix is correct

Merged via the queue into ClickHouse:master with commit 3570969 Mar 26, 2026
151 of 153 checks passed
@robot-clickhouse added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) Mar 26, 2026

@azat (Member) commented Mar 27, 2026

This is a better fix - #100896

Desel72 pushed a commit to Desel72/ClickHouse that referenced this pull request Mar 30, 2026
…atistics-checksums

Fix flaky test 03706_statistics_preserve_checksums_on_mutations

Development

Successfully merging this pull request may close these issues.

Flaky test: 03706_statistics_preserve_checksums_on_mutations

5 participants