Conversation
src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp
Outdated
Show resolved
Hide resolved
src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp
Outdated
Show resolved
Hide resolved
| /// transform active formulas | ||
| for (auto & formula : active_atomic_formulas) | ||
| { | ||
| formula = formula->clone(); /// do all operations with copy |
There was a problem hiding this comment.
For Query Tree it'll clone the whole subtree even when we don't need to modify the node. I think it's better to avoid it.
There was a problem hiding this comment.
I agree, QueryTree is constructed inside this class + when we create CNF we clone the nodes again.
And throughout the class I think it's cloned almost always.
There was a problem hiding this comment.
I'd recommend fixing it but if you don't want to do it now you can merge it anyway.
src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp
Outdated
Show resolved
Hide resolved
|
Also, it seems we don't have a documentation entry for the Hypothesis index. It'd be nice if you write it because at least you now understand how it works. |
|
This is an automated comment for commit a397f18 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
|
This is an automated comment for commit 1c7e447 with description of existing statuses. It's updated for the latest CI running
|
| std::end(active_atomic_formulas), | ||
| std::begin(hypothesis), | ||
| std::end(hypothesis)); | ||
| auto ast_filter_node = buildFilterNode(query_info.query); |
There was a problem hiding this comment.
I suspect that this can be wrong in very edge cases because this filter AST part in Analyzer is not valid.
There was a problem hiding this comment.
Can you explain a bit more? Why wouldn't it be valid?
There was a problem hiding this comment.
As far as I understand we use a query tree that can be different from AST in query_info.query because of optimizations. So in the case of allow_experimental_analyzer you want to use query_info.query_tree to build a filter.
There was a problem hiding this comment.
@antonio2368 I spent some time and understood that the problem is more complex. The filters in the query tree are also not those that we use. For example, it misses row policy filters which are added in the Planner. In the case of a new analyzer, the correct filter is stored in the SelectQueryInfo::filter_actions_dag. So you need to use it.
|
Dear @novikd, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Dear @novikd, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Workflow [PR], commit [c7ae560] Summary: ❌
|
0b15381 to
dfc5c72
Compare
dfc5c72 to
c7ae560
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements hypothesis index support for the analyzer (new query planning infrastructure), enabling it to work with both the legacy AST-based planner and the new analyzer.
Key changes:
- Refactored hypothesis index implementation to support both AST and ActionsDAG (analyzer) paths using templates
- Added ActionsDAG-based CNF (Conjunctive Normal Form) conversion infrastructure
- Implemented semantic hashing and equality for ActionsDAG nodes to enable comparison across different DAG instances
- Updated ComparisonGraph to work with ActionsDAG nodes in addition to AST nodes
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tests/queries/0_stateless/01624_soft_constraints.sh |
Enabled use_skip_indexes_on_data_read setting for test |
tests/parallel_replicas_blacklist.txt |
Removed test from blacklist |
tests/analyzer_tech_debt.txt |
Removed test from analyzer technical debt list |
src/Storages/MergeTree/MergeTreeIndices.h |
Added context parameter to createIndexMergedCondition signature |
src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.h |
Refactored to use PIMPL pattern with abstract interface |
src/Storages/MergeTree/MergeTreeIndexHypothesisMergedCondition.cpp |
Implemented dual-path support (AST/ActionsDAG) using template |
src/Storages/MergeTree/MergeTreeIndexHypothesis.h |
Added context parameter to method signature |
src/Storages/MergeTree/MergeTreeIndexHypothesis.cpp |
Passed context to merged condition constructor |
src/Storages/ConstraintsDescription.h |
Added ActionsDAGData class and method for constraints |
src/Storages/ConstraintsDescription.cpp |
Implemented ActionsDAG constraint data conversion |
src/Processors/QueryPlan/ReadFromMergeTree.cpp |
Passed context when creating index merged conditions |
src/Interpreters/TreeCNFConverter.h |
Moved CNF helper functions to Common/CNFHelpers.h |
src/Interpreters/ComparisonGraph.h |
Extended to support ActionsDAG nodes |
src/Interpreters/ComparisonGraph.cpp |
Added ActionsDAG node semantic comparison and hashing |
src/Interpreters/ActionsDAGHashUtils.h |
New file with ActionsDAG node hashing utilities |
src/Interpreters/ActionsDAGCNF.h |
New file with ActionsDAG CNF class interface |
src/Interpreters/ActionsDAGCNF.cpp |
New file with ActionsDAG CNF implementation |
src/Common/CNFHelpers.h |
New file with shared CNF reduction algorithms |
src/Common/CNFAtomicFormula.h |
New file with generic CNF atomic formula template |
src/Analyzer/Passes/CNFAtomicFormula.h |
Changed to use shared template from Common |
src/Analyzer/Passes/CNFAtomicFormula.cpp |
File removed (logic moved to template) |
src/Analyzer/Passes/CNF.h |
Added iterateGroups method |
src/Analyzer/Passes/CNF.cpp |
Updated include for CNF helpers |
docs/en/engines/table-engines/mergetree-family/mergetree.md |
Added hypothesis index documentation |
| { | ||
| skip_indexes.merged_indices.emplace_back(); | ||
| skip_indexes.merged_indices.back().condition = index_helper->createIndexMergedCondition(query_info, metadata_snapshot); | ||
| if (inserted) |
There was a problem hiding this comment.
Duplicate condition check: if (inserted) appears twice in a nested manner. The inner check at line 1886 is redundant and the code inside will always execute since it's already within an if (inserted) block. Remove the inner condition.
| skip_indexes.merged_indices[it->second].addIndex(index_helper); | ||
| } | ||
|
|
||
| skip_indexes.merged_indices[it->second].addIndex(index_helper); |
There was a problem hiding this comment.
Duplicate addIndex call: This line appears both inside the nested if (inserted) block (line 1891) and after it (line 1894), resulting in the index being added twice when inserted is true. Remove one of these duplicate calls.
| skip_indexes.merged_indices[it->second].addIndex(index_helper); |
|
|
||
| We define the following index first | ||
| ```sql | ||
| INDEX t (a < 10) TYPE hypothesis GRANULARITY 1` |
There was a problem hiding this comment.
Mismatched backtick: The code example has a closing backtick (`) instead of an opening backtick at the end. Should be a regular backtick for opening the code block or removed entirely if it's meant to be inline.
| INDEX t (a < 10) TYPE hypothesis GRANULARITY 1` | |
| INDEX t (a < 10) TYPE hypothesis GRANULARITY 1 |
| if (node_with_hash.hash > rhs.node_with_hash.hash) | ||
| return false; | ||
|
|
||
| return node_with_hash.hash < rhs.node_with_hash.hash || negative < rhs.negative; |
There was a problem hiding this comment.
Incorrect comparison logic: Line 28-29 returns false if lhs.hash > rhs.hash, but line 31 then checks lhs.hash < rhs.hash. This means when hashes are equal, only negative < rhs.negative is evaluated. However, the early return on line 29 makes the < rhs.hash check on line 31 redundant. The logic should be: return node_with_hash < rhs.node_with_hash || (node_with_hash == rhs.node_with_hash && negative < rhs.negative); assuming NodeWithHashType has proper comparison operators.
| if (node_with_hash.hash > rhs.node_with_hash.hash) | |
| return false; | |
| return node_with_hash.hash < rhs.node_with_hash.hash || negative < rhs.negative; | |
| return node_with_hash < rhs.node_with_hash | |
| || (node_with_hash == rhs.node_with_hash && negative < rhs.negative); |
|
Dear @antonio2368, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message. |
Changelog category (leave one):
Documentation entry for user-facing changes