…th bloom_filter, tokenbf_v1, ngrambf_v1, and text (inverted) index types, enabling granule skipping based on the set of JSON paths present in each granule.
Workflow [PR], commit [a1492d1] Summary: ❌
AI Review
Summary
This PR adds skip-index support for JSON subcolumn path existence via …
Missing context
ClickHouse Rules
Final Verdict
The CI docs check requires all headings to have explicit anchor IDs. Added IDs to 5 headings in the JSON skip index documentation section. #98886 Co-Authored-By: Claude Opus 4.6 <[email protected]>
… really skip any paths
Cursor Bugbot has reviewed your changes and found 1 potential issue.
In general LGTM
FROM (EXPLAIN indexes = 1 SELECT * FROM t_json_text WHERE json.a.b::Int64 = 1)
WHERE explain LIKE '%Parts:%' OR explain LIKE '%Granules:%' OR explain LIKE '%Skip%';
-- 1e: CAST ::Int64 = 0 — unsafe
The reader wonders why json.a.b::Int64 = 0 is "unsafe"?
Because with this condition we cannot skip granules that lack the path a.b: in those granules the values of this path are read as NULL, and the cast to Int64 converts NULL to 0, so those rows match the filter.
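To make the reasoning above concrete, here is a minimal Python sketch of why skipping by path presence is safe for = 1 but not for = 0. It is a toy model, not the PR's implementation: the granule layout, the function names (cast_to_int64, full_scan, scan_with_path_skipping, skipping_is_safe) and the exact set of paths per granule are all hypothetical, and a real bloom filter can only approximate that set.

```python
from typing import Optional

# Hypothetical model: a granule is a list of rows, each row maps JSON paths to values.
Granule = list[dict[str, int]]


def cast_to_int64(value: Optional[int]) -> int:
    # Models the behaviour described in the thread: a missing path reads as NULL,
    # and the cast to Int64 turns NULL into the default value 0.
    return 0 if value is None else value


def full_scan(granules: list[Granule], path: str, target: int) -> int:
    # Reference result: evaluate the predicate on every row of every granule.
    return sum(
        1 for granule in granules for row in granule
        if cast_to_int64(row.get(path)) == target
    )


def scan_with_path_skipping(granules: list[Granule], path: str, target: int) -> int:
    # Naive optimisation: skip any granule whose set of paths lacks `path`.
    matched = 0
    for granule in granules:
        paths_in_granule = {p for row in granule for p in row}
        if path not in paths_in_granule:
            continue  # granule skipped without reading its rows
        matched += sum(
            1 for row in granule if cast_to_int64(row.get(path)) == target
        )
    return matched


def skipping_is_safe(target: int) -> bool:
    # Skipping is only safe if a row with a missing path could never match.
    return cast_to_int64(None) != target


granules = [
    [{"a.b": 1}, {"a.b": 0}],  # granule that contains path a.b
    [{"c": 5}, {"c": 7}],      # granule without path a.b
]
```

In this model, filtering on 1 gives the same count with and without skipping, while filtering on 0 loses the two rows of the second granule when it is skipped. That is the case the test comment labels unsafe.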
LLVM Coverage Report
Changed lines: 96.70% (205/212) · Uncovered code
@CurtizJ can you approve so I can merge it?
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add MergeTree skip index support for JSON columns using JSONAllPaths with bloom_filter, tokenbf_v1, ngrambf_v1, and text (inverted) index types, enabling granule skipping based on the set of JSON paths present in each granule.
Documentation entry for user-facing changes
Note
Medium Risk
Touches MergeTree skip-index condition building for multiple index types, so any mismatch could lead to incorrect granule skipping or missed optimizations; mitigated by explicit safety checks for missing-path defaults and extensive new stateless tests.
Overview
Adds MergeTree skip-index support for JSON path presence by allowing indexes on JSONAllPaths(json_col) to accelerate filters on JSON subcolumns.
Extends bloom-filter-based indexes (bloom_filter, tokenbf_v1, ngrambf_v1) and the text index to recognize JSON subcolumn predicates (including CAST/typed subcolumns), support equals, IN (bloom only), and IS NOT NULL where safe, and avoid unsafe skipping when missing paths would produce default values or when using ^ sub-object access.
Includes a new shared helper (MergeTreeIndexJSONSubcolumnHelper) plus comprehensive stateless tests and documentation updates describing JSON path-based skipping and examples.
Written by Cursor Bugbot for commit 2603d47.