Skip to content

Rewrite IN (subquery) so that it can be executed as JOIN instead of CreatingSets#83991

Merged
davenger merged 22 commits intomasterfrom
rewrite_in_subquery
Sep 30, 2025
Merged

Rewrite IN (subquery) so that it can be executed as JOIN instead of CreatingSets#83991
davenger merged 22 commits intomasterfrom
rewrite_in_subquery

Conversation

@davenger
Copy link
Copy Markdown
Member

@davenger davenger commented Jul 18, 2025

The implementation rewrites
x IN subquery
to
EXISTS (SELECT 1 FROM (SELECT * AS _unique_name_ FROM subquery) WHERE x = _unique_name_ LIMIT 1)
and the EXIST expression is rewritten into JOIN by de-correlation logic.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@davenger davenger marked this pull request as draft July 18, 2025 16:02
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jul 18, 2025

Workflow [PR], commit [da741fb]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jul 18, 2025
@davenger davenger force-pushed the rewrite_in_subquery branch from 3743db6 to 80832c4 Compare July 18, 2025 16:22
@novikd novikd self-assigned this Jul 21, 2025
@davenger davenger force-pushed the rewrite_in_subquery branch from aa5a628 to 68b1e3c Compare July 24, 2025 15:44
@davenger davenger force-pushed the rewrite_in_subquery branch 4 times, most recently from a5d861e to 4ad68dc Compare August 11, 2025 07:49
@davenger davenger force-pushed the rewrite_in_subquery branch from e148585 to 4be2a7f Compare August 11, 2025 14:22
@davenger davenger force-pushed the rewrite_in_subquery branch from 4be2a7f to 63c89b5 Compare August 11, 2025 14:29
@davenger davenger marked this pull request as ready for review August 12, 2025 07:11
{
auto & in_second_argument = function_in_arguments_nodes[1];

if (in_second_argument->as<QueryNode>())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't check if it's a query until you resolve this node. Example:

WITH
    t as (select number from numbers(10)
SELECT *
FROM numbers(20)
WHERE number in t

The second argument here will be IdentifierNode.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added resolving the node here

internal_exists_subquery->getProjection().getNodes().push_back(std::make_shared<IdentifierNode>(Identifier{unique_column_name}));
internal_exists_subquery->getJoinTree() = std::move(subquery_node);

/// SELECT 1 FROM (SELECT * AS _unique_name_ FROM subquery) WHERE a = _unique_name_ LIMIT 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with tuples?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added support for subqueries returning mutiple columns

(function_name == "in" || function_name == "notIn") &&
scope.context->getSettingsRef()[Setting::rewrite_in_to_join])
{
if (!scope.context->getSettingsRef()[Setting::allow_experimental_correlated_subqueries])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe fallback to CreatingSets instead of exception? Both options are okay for me

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better not to silently fallback here because it might hide errors that way

@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 20, 2025
@EmeraldShift
Copy link
Copy Markdown
Contributor

Does this transformation work when the key x is in the primary key, or has special index analysis, like _part_offset + _part_starting_offset? For my use case, I have a query like this:

WITH offsets AS
    (
        SELECT _part_starting_offset + _parent_part_offset AS offset
        FROM mergeTreeProjection('otel', 'spans', 'by_span')
        WHERE trace_id = unhex({trace:String})
    )
SELECT *
FROM otel.spans
WHERE (_part_offset + _part_starting_offset) IN (offsets)

There are ~90 million matching rows in the projection, and CreatingSetsTransformation takes a very long time:

CreatingSetsTransform: Created Set with 89792618 entries from 89792618 rows in 9.025697714 sec.

But the rest of the query is very fast because the part offsets are included in primary index analysis, and can quickly filter parts and granules.
However, If I try to manually rewrite the query as this PR suggests:

WITH offsets AS
    (
        SELECT _part_starting_offset + _parent_part_offset AS offset
        FROM mergeTreeProjection('otel', 'spans', 'by_span')
        WHERE trace_id = unhex({trace:String})
    )
SELECT *
FROM otel.spans
WHERE exists((
    SELECT 1
    FROM
    (
        SELECT offset
        FROM offsets
    )
    WHERE (_part_offset + _part_starting_offset) = offset
    LIMIT 1
))

then the query takes even longer, and it appears to perform a full table scan.

Is there some way to eliminate the lag of CreatingSets, but also utilize the primary key for part offsets? Then, at any size, it would be fast to join against the right side of IN.

@davenger
Copy link
Copy Markdown
Member Author

Is there some way to eliminate the lag of CreatingSets, but also utilize the primary key for part offsets?

I think you can try to play with this sub-query to speed it up. Does it also take 9 sec when run separately?

 SELECT _part_starting_offset + _parent_part_offset AS offset
         FROM mergeTreeProjection('otel', 'spans', 'by_span')
         WHERE trace_id = unhex({trace:String}) FORMAT Null

Then, at any size, it would be fast to join against the right side of IN
You right, just rewriting IN to join will not help, there needs to be an optimization that allows skipping data during reading. Maybe this optimization can help #81526

@EmeraldShift
Copy link
Copy Markdown
Contributor

EmeraldShift commented Aug 29, 2025

I think you can try to play with this sub-query to speed it up. Does it also take 9 sec when run separately?

The subquery only takes ~2 seconds to complete. Then, after it's complete, the CreatingSetsTransformation takes an additional ~9 seconds to transform the result into a set for use with IN, and finally the main query runs quickly, due to the primary index analysis on the part offset columns.

Maybe this optimization can help #81526

Does this work for the primary index too? Notably I am not utilizing any skip indexes in the main query, just the special part offset columns.

At any rate, it seems there are two separate issues:

  • CreatingSets can be very slow for large sets (is this expected? Maybe it can be optimized? Or maybe it's bad and that's why this PR exists?)
  • This JOIN transformation eliminated the cost of CreatingSets for my query (yay!) but also seems to have eliminated index analysis (sad!) on the intermediate result. I don't know how to recover the original behavior of the main query with this transformation.

@Felixoid Felixoid removed the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 26, 2025
Copy link
Copy Markdown
Member

@novikd novikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// SELECT * AS _unique_name_ FROM subquery
auto internal_exists_subquery = std::make_shared<QueryNode>(Context::createCopy(scope.context));
internal_exists_subquery->setIsSubquery(true);
internal_exists_subquery->getProjection().getNodes().push_back(std::make_shared<IdentifierNode>(Identifier{unique_column_name}));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future can be replaced with ColumnNode


auto & copy_of_in_first_parameter = function_in_arguments_nodes[0];

auto subquery_projection = std::make_shared<IdentifierNode>(Identifier{unique_column_name});
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@davenger
Copy link
Copy Markdown
Member Author

@davenger davenger enabled auto-merge September 30, 2025 07:03
@davenger davenger added this pull request to the merge queue Sep 30, 2025
Merged via the queue into master with commit a6df40d Sep 30, 2025
119 of 123 checks passed
@davenger davenger deleted the rewrite_in_subquery branch September 30, 2025 07:17
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 30, 2025
@EmeraldShift
Copy link
Copy Markdown
Contributor

Just double checking, is this transformation going to affect queries like my earlier comment, which leverage the primary key analysis between running the subquery and the main query? If it will have a negative impact, then is the transformation optional?

@davenger
Copy link
Copy Markdown
Member Author

@EmeraldShift This transformation is optional and is disabled by default, it is controlled by rewrite_in_to_join setting.
Currently the rewritten query will not use indexes for the IN condition that was transformed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants