Hypothesis index in the analyzer by antonio2368 · Pull Request #48381 · ClickHouse/ClickHouse

antonio2368 · 2023-04-04T12:16:51Z

Changelog category (leave one):

Not for changelog (changelog entry is not required)

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp

novikd · 2023-04-18T17:12:29Z

src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp

+        /// transform active formulas
+        for (auto & formula : active_atomic_formulas)
+        {
+            formula = formula->clone(); /// do all operations with copy


For Query Tree it'll clone the whole subtree even when we don't need to modify the node. I think it's better to avoid it.

I agree, QueryTree is constructed inside this class + when we create CNF we clone the nodes again.
And throughout the class I think it's cloned almost always.

I'd recommend fixing it but if you don't want to do it now you can merge it anyway.

src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp

novikd · 2023-04-21T16:27:54Z

Also, it seems we don't have a documentation entry for the Hypothesis index. It'd be nice if you write it because at least you now understand how it works.

robot-clickhouse-ci-2 · 2023-05-02T13:01:08Z

This is an automated comment for commit a397f18 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks

Check name	Description	Status
AST fuzzer	Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help	✅ success
ClickBench	Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table	✅ success
ClickHouse build check	Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process	✅ success
Compatibility check	Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help	✅ success
Docker server and keeper images	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Docs check	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Fast tests	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Flaky tests	Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc	✅ success
Install packages	Checks that the built packages are installable in a clear environment	✅ success
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	✅ success
Mergeable Check	Checks if all other necessary checks are successful	✅ success
SQLancer	Fuzzing tests that detect logical bugs with SQLancer tool	✅ success
Sqllogic	Run clickhouse on the sqllogic test set against sqlite and checks that all statements are passed	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Style check	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Unit tests	Runs the unit tests for different release types	✅ success

Check name	Description	Status
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	⏳ pending
Performance Comparison	Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests	❌ failure
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	❌ failure
Stress test	Runs stateless functional tests concurrently from several clients to detect concurrency-related errors	❌ failure
Upgrade check	Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts	❌ failure

robot-clickhouse-ci-1 · 2023-05-02T13:01:08Z

This is an automated comment for commit 1c7e447 with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🟡 pending

Check name	Description	Status
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	🟡 pending
Mergeable Check	Checks if all other necessary checks are successful	🟢 success

kitaisreal · 2023-05-12T13:06:49Z

src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp

-            std::end(active_atomic_formulas),
-            std::begin(hypothesis),
-            std::end(hypothesis));
+        auto ast_filter_node = buildFilterNode(query_info.query);


I suspect that this can be wrong in very edge cases because this filter AST part in Analyzer is not valid.

Can you explain a bit more? Why wouldn't it be valid?

As far as I understand we use a query tree that can be different from AST in query_info.query because of optimizations. So in the case of allow_experimental_analyzer you want to use query_info.query_tree to build a filter.

@antonio2368 I spent some time and understood that the problem is more complex. The filters in the query tree are also not those that we use. For example, it misses row policy filters which are added in the Planner. In the case of a new analyzer, the correct filter is stored in the SelectQueryInfo::filter_actions_dag. So you need to use it.

…-index

clickhouse-gh · 2024-06-25T16:53:59Z

Dear @novikd, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

clickhouse-gh · 2024-07-30T13:13:42Z

Dear @novikd, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

clickhouse-gh · 2025-11-20T15:12:05Z

Workflow [PR], commit [c7ae560]

Summary: ❌

job_name	test_name	status	info
Docs check		failure
	Run markdown linter	failure	cidb
Build (amd_compat)		failure
	Cmake configuration	failure	cidb
Build (s390x)		failure
	Build ClickHouse	failure	cidb
Stateless tests (arm_asan, targeted)		failure
	01459_manual_write_to_replicas_quorum_detach_attach	FAIL	cidb, flaky
Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel)		failure
	01624_soft_constraints	FAIL	cidb
Stateless tests (amd_msan, parallel, 2/2)		failure
	Server died	FAIL	cidb
	03009_storage_memory_circ_buffer_usage	SERVER_DIED	cidb
	Sanitizer assert (in stderr.log)	FAIL	cidb
	Killed by signal (in clickhouse-server.log or clickhouse-server.err.log)	FAIL	cidb
Stateless tests (amd_tsan, s3 storage, parallel)		failure
	03212_variant_dynamic_cast_or_default	FAIL	cidb, flaky
Integration tests (amd_tsan, 3/6)		failure
	test_ssh/test.py::test_paramiko_password	FAIL	cidb
Stress test (amd_msan)		failure
	Server died	FAIL	cidb
	Hung check failed, possible deadlock found (see hung_check.log)	FAIL	cidb
	Killed by signal (in clickhouse-server.log)	FAIL	cidb
	Fatal message in clickhouse-server.log (see fatal_messages.txt)	FAIL	cidb
	Killed by signal (output files)	FAIL	cidb
	Found signal in gdb.log	FAIL	cidb
BuzzHouse (amd_debug)		failure
	Logical error: 'Inconsistent AST formatting: the query:	FAIL	cidb

Copilot

Pull request overview

This PR implements hypothesis index support for the analyzer (new query planning infrastructure), enabling it to work with both the legacy AST-based planner and the new analyzer.

Key changes:

Refactored hypothesis index implementation to support both AST and ActionsDAG (analyzer) paths using templates
Added ActionsDAG-based CNF (Conjunctive Normal Form) conversion infrastructure
Implemented semantic hashing and equality for ActionsDAG nodes to enable comparison across different DAG instances
Updated ComparisonGraph to work with ActionsDAG nodes in addition to AST nodes

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/queries/0_stateless/01624_soft_constraints.sh`	Enabled `use_skip_indexes_on_data_read` setting for test
`tests/parallel_replicas_blacklist.txt`	Removed test from blacklist
`tests/analyzer_tech_debt.txt`	Removed test from analyzer technical debt list
`src/Storages/MergeTree/MergeTreeIndices.h`	Added context parameter to `createIndexMergedCondition` signature
`src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.h`	Refactored to use PIMPL pattern with abstract interface
`src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp`	Implemented dual-path support (AST/ActionsDAG) using template
`src/Storages/MergeTree/MergeTreeIndexHypothesis.h`	Added context parameter to method signature
`src/Storages/MergeTree/MergeTreeIndexHypothesis.cpp`	Passed context to merged condition constructor
`src/Storages/ConstraintsDescription.h`	Added ActionsDAGData class and method for constraints
`src/Storages/ConstraintsDescription.cpp`	Implemented ActionsDAG constraint data conversion
`src/Processors/QueryPlan/ReadFromMergeTree.cpp`	Passed context when creating index merged conditions
`src/Interpreters/TreeCNFConverter.h`	Moved CNF helper functions to Common/CNFHelpers.h
`src/Interpreters/ComparisonGraph.h`	Extended to support ActionsDAG nodes
`src/Interpreters/ComparisonGraph.cpp`	Added ActionsDAG node semantic comparison and hashing
`src/Interpreters/ActionsDAGHashUtils.h`	New file with ActionsDAG node hashing utilities
`src/Interpreters/ActionsDAGCNF.h`	New file with ActionsDAG CNF class interface
`src/Interpreters/ActionsDAGCNF.cpp`	New file with ActionsDAG CNF implementation
`src/Common/CNFHelpers.h`	New file with shared CNF reduction algorithms
`src/Common/CNFAtomicFormula.h`	New file with generic CNF atomic formula template
`src/Analyzer/Passes/CNFAtomicFormula.h`	Changed to use shared template from Common
`src/Analyzer/Passes/CNFAtomicFormula.cpp`	File removed (logic moved to template)
`src/Analyzer/Passes/CNF.h`	Added `iterateGroups` method
`src/Analyzer/Passes/CNF.cpp`	Updated include for CNF helpers
`docs/en/engines/table-engines/mergetree-family/mergetree.md`	Added hypothesis index documentation

Copilot · 2025-11-25T13:37:06Z

src/Processors/QueryPlan/ReadFromMergeTree.cpp

            {
-                skip_indexes.merged_indices.emplace_back();
-                skip_indexes.merged_indices.back().condition = index_helper->createIndexMergedCondition(query_info, metadata_snapshot);
+                if (inserted)


Duplicate condition check: if (inserted) appears twice in a nested manner. The inner check at line 1886 is redundant and the code inside will always execute since it's already within an if (inserted) block. Remove the inner condition.

Copilot · 2025-11-25T13:37:06Z

src/Processors/QueryPlan/ReadFromMergeTree.cpp

+                skip_indexes.merged_indices[it->second].addIndex(index_helper);
            }

            skip_indexes.merged_indices[it->second].addIndex(index_helper);


Duplicate addIndex call: This line appears both inside the nested if (inserted) block (line 1891) and after it (line 1894), resulting in the index being added twice when inserted is true. Remove one of these duplicate calls.

Suggested change

skip_indexes.merged_indices[it->second].addIndex(index_helper);

Copilot · 2025-11-25T13:37:06Z

docs/en/engines/table-engines/mergetree-family/mergetree.md

+
+We define the following index first
+```sql
+INDEX t (a < 10) TYPE hypothesis GRANULARITY 1`


Mismatched backtick: The code example has a closing backtick (`) instead of an opening backtick at the end. Should be a regular backtick for opening the code block or removed entirely if it's meant to be inline.

Suggested change

INDEX t (a < 10) TYPE hypothesis GRANULARITY 1`

INDEX t (a < 10) TYPE hypothesis GRANULARITY 1

Copilot · 2025-11-25T13:37:07Z

src/Common/CNFAtomicFormula.h

+        if (node_with_hash.hash > rhs.node_with_hash.hash)
+            return false;
+
+        return node_with_hash.hash < rhs.node_with_hash.hash || negative < rhs.negative;


Incorrect comparison logic: Line 28-29 returns false if lhs.hash > rhs.hash, but line 31 then checks lhs.hash < rhs.hash. This means when hashes are equal, only negative < rhs.negative is evaluated. However, the early return on line 29 makes the < rhs.hash check on line 31 redundant. The logic should be: return node_with_hash < rhs.node_with_hash || (node_with_hash == rhs.node_with_hash && negative < rhs.negative); assuming NodeWithHashType has proper comparison operators.

Suggested change

if (node_with_hash.hash > rhs.node_with_hash.hash)

return false;

return node_with_hash.hash < rhs.node_with_hash.hash || negative < rhs.negative;

return node_with_hash < rhs.node_with_hash

|| (node_with_hash == rhs.node_with_hash && negative < rhs.negative);

clickhouse-gh · 2026-01-27T13:29:30Z

Dear @antonio2368, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

Hypothesis index in new analyzer

a8b7fbd

antonio2368 mentioned this pull request Apr 4, 2023

[Umbrella] Analyzer, Planner migration #42648

Closed

robot-ch-test-poll4 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 4, 2023

Add tests

61e2de8

novikd self-assigned this Apr 4, 2023

antonio2368 added 2 commits April 5, 2023 08:06

Fix

bfe4d1b

Merge branch 'master' into analyzer-hypothesis-index

ef04f3a

novikd reviewed Apr 18, 2023

View reviewed changes

Merge branch 'master' into analyzer-hypothesis-index

73b9fe8

antonio2368 added 2 commits May 2, 2023 12:44

Merge branch 'master' into analyzer-hypothesis-index

45966bc

Apply PR review

1c7e447

antonio2368 added 2 commits May 10, 2023 09:00

Merge branch 'master' into analyzer-hypothesis-index

0b1f426

Remove from broken tests

61eb265

kitaisreal reviewed May 12, 2023

View reviewed changes

antonio2368 added 2 commits June 6, 2023 08:35

Merge branch 'master' into analyzer-hypothesis-index

48fd4e5

Correctly handle query_tree

500faf4

qoega added the analyzer Issues and pull-requests related to new analyzer label Jan 9, 2024

novikd and others added 2 commits January 29, 2024 11:22

Merge remote-tracking branch 'origin/master' into analyzer-hypothesis…

a397f18

…-index

Merge branch 'master' into analyzer-hypothesis-index

71c509d

clickhouse-gh bot unassigned novikd Jun 25, 2024

novikd self-assigned this Jun 26, 2024

clickhouse-gh bot unassigned novikd Jul 30, 2024

al13n321 mentioned this pull request Feb 3, 2025

Hypothesis index mark count mismatch #70748

Closed

alexey-milovidov changed the title ~~Hypothesis index in new analyzer~~ Hypothesis index in the analyzer Jul 27, 2025

Merge branch 'master' into analyzer-hypothesis-index

74c10c0

Fix build

e3e1ceb

antonio2368 marked this pull request as draft November 20, 2025 15:11

antonio2368 force-pushed the analyzer-hypothesis-index branch from 0b15381 to dfc5c72 Compare November 20, 2025 15:23

Initial CNF for ActionsDAG

c7ae560

antonio2368 force-pushed the analyzer-hypothesis-index branch from dfc5c72 to c7ae560 Compare November 20, 2025 19:08

antonio2368 requested a review from Copilot November 25, 2025 13:35

Copilot AI reviewed Nov 25, 2025

View reviewed changes

alexey-milovidov added the close in a month if not active This will be closed in case of no information label Jan 27, 2026

alexey-milovidov mentioned this pull request Feb 14, 2026

Remove hypothesis skip index type #96874

Merged

1 task

alexey-milovidov closed this in #96874 Mar 1, 2026

	INDEX t (a < 10) TYPE hypothesis GRANULARITY 1`
	INDEX t (a < 10) TYPE hypothesis GRANULARITY 1

Conversation

antonio2368 commented Apr 4, 2023 • edited by clickhouse-gh bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Documentation entry for user-facing changes

Uh oh!

Uh oh!

Uh oh!

novikd Apr 18, 2023

Choose a reason for hiding this comment

Uh oh!

antonio2368 Apr 19, 2023

Choose a reason for hiding this comment

Uh oh!

novikd Apr 21, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

novikd commented Apr 21, 2023

Uh oh!

robot-clickhouse-ci-2 commented May 2, 2023 • edited by robot-ch-test-poll3 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robot-clickhouse-ci-1 commented May 2, 2023

Uh oh!

kitaisreal May 12, 2023

Choose a reason for hiding this comment

Uh oh!

antonio2368 May 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

novikd Jun 5, 2023

Choose a reason for hiding this comment

Uh oh!

novikd Jun 7, 2023

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot commented Jun 25, 2024

Uh oh!

clickhouse-gh bot commented Jul 30, 2024

Uh oh!

clickhouse-gh bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

antonio2368 commented Apr 4, 2023 •

edited by clickhouse-gh bot

Loading

robot-clickhouse-ci-2 commented May 2, 2023 •

edited by robot-ch-test-poll3

Loading

antonio2368 May 15, 2023 •

edited

Loading

clickhouse-gh bot commented Nov 20, 2025 •

edited

Loading