Skip to content

Optimize remove unused columns in plan#76487

Merged
antaljanosbenjamin merged 91 commits intomasterfrom
optimize-remove-unused-columns-in-plan
Nov 20, 2025
Merged

Optimize remove unused columns in plan#76487
antaljanosbenjamin merged 91 commits intomasterfrom
optimize-remove-unused-columns-in-plan

Conversation

@antaljanosbenjamin
Copy link
Copy Markdown
Member

@antaljanosbenjamin antaljanosbenjamin commented Feb 20, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add optimization to remove unused columns in query plans. Resolves #75152

Details

Resolves #75152

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 20, 2025

Workflow [PR], commit [ddd8268]

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Feb 20, 2025
@yariks5s yariks5s self-assigned this Feb 20, 2025

Note that initially (24.12) there was a server setting (`send_settings_to_client`), but latter it got replaced with this client setting, for better usability.
)", 0) \
DECLARE(Bool, query_plan_try_remove_unused_columns, true, R"(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_plan_remove_unused_columns

@antaljanosbenjamin antaljanosbenjamin marked this pull request as draft February 20, 2025 12:22
@yariks5s yariks5s removed their assignment Feb 20, 2025
@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

Just to make it into public knowledge: the current approach is not completely okay, because it might break the rule of subsequent steps must have matching input-output headers. So the plan is to collect the extra columns in the plan from top to bottom, and then remove the ones that can be removed from bottom to the top.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented May 6, 2025

Dear @antaljanosbenjamin, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jul 8, 2025

Dear @antaljanosbenjamin, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

const auto updated_anything
= updatedAnything(child_step->removeUnusedColumns(getNameMultiSetFromNames(parent_inputs[child_id]->getNames()), false));

// As removeUnusedColumns might leave additional columns in the output, we have get rid of those outputs by adding a new ExpressionStep
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific here, in what case removeUnusedColumns() cannot remove all unused columns from output?

Copy link
Copy Markdown
Member

@devcrafter devcrafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

Failed tests:

  • Unit test: I didn't touch the BackgroundSchedulePool, so I think it is not related. Locally I run the tests without any issues
  • Flaky test check: 01062_pm_all_join_with_block_continuation is known to time out in ASAN builds, it is already tagged, but the flaky tests check ignores it
  • BuzzHouse: inconsistent AST formatting and a crash that doesn't seem to be related

Merging it manually

@antaljanosbenjamin antaljanosbenjamin added this pull request to the merge queue Nov 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 19, 2025
@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

  • 03141_fetches_errors_stress is flaky
  • 01062_pm_all_join_with_block_continuation is flaky with asan
  • BuzzHouse is failed because of inconsistent formatting and a float -> int conversion in profile events. None of them is related.

@antaljanosbenjamin antaljanosbenjamin added this pull request to the merge queue Nov 20, 2025
Merged via the queue into master with commit 43bfad3 Nov 20, 2025
127 of 132 checks passed
@antaljanosbenjamin antaljanosbenjamin deleted the optimize-remove-unused-columns-in-plan branch November 20, 2025 13:45
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 20, 2025
@alexey-milovidov alexey-milovidov added pr-performance Pull request with some performance improvements and removed pr-improvement Pull request with some product improvements labels Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filter-push-down does not remove unneeded columns from JOIN

8 participants