Skip to content

DataLakeCatalog: avoid full catalog read for UNKNOWN_TABLE typo hints#100452

Open
alsugiliazova wants to merge 1 commit intoClickHouse:masterfrom
alsugiliazova:fix_listing_on_drop
Open

DataLakeCatalog: avoid full catalog read for UNKNOWN_TABLE typo hints#100452
alsugiliazova wants to merge 1 commit intoClickHouse:masterfrom
alsugiliazova:fix_listing_on_drop

Conversation

@alsugiliazova
Copy link
Copy Markdown
Contributor

@alsugiliazova alsugiliazova commented Mar 23, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Avoid scanning the whole remote data lake catalog for “Maybe you meant …” table hints when show_data_lake_catalogs_in_system_tables is disabled.

Documentation entry for user-facing changes

When show_data_lake_catalogs_in_system_tables = 0, the server must not implicitly scan the whole remote data lake catalog.
Previously, building the “Maybe you meant …” hint for a missing table in a DataLakeCatalog database still called getAllTableNames()DatabaseDataLake::getTablesIterator(), which lists the entire catalog and loads per-table metadata—heavy work and can OOM on large catalogs, only to enrich an error message.

This change makes TableNameHints::getAllRegisteredNames() return an empty name list for data lake catalogs when that setting is off, so hint generation does not trigger a full catalog listing.

Query examples

SET show_data_lake_catalogs_in_system_tables = 0;

Drop non-existent table:

DROP TABLE datalake.`schema1.table1`;

Previously (undesired): server could answer with UNKNOWN_TABLE and a “Maybe you meant …” suggestion after scanning catalog names:

Received exception from server (version 26.3.1):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table datalake.`schema1.table1` does not exist. Maybe you meant datalake.`schema1.table`?. (UNKNOWN_TABLE)

After the fix: UNKNOWN_TABLE without hint when the setting is 0:

Received exception from server (version 26.4.1):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table datalake.`schema1.table1` does not exist. (UNKNOWN_TABLE)

Optional follow-up

If local hints are skipped for data lake + setting 0, getHintForTable can still fall back to getExtendedHintForTable, which only scans non–data-lake databases. In edge cases a suggestion could point at another database’s table. If that is undesirable, a follow-up could skip extended hints under the same condition as local enumeration.

@alsugiliazova alsugiliazova changed the title Data lake: avoid full catalog read for UNKNOWN_TABLE typo hints DataLakeCatalog: avoid full catalog read for UNKNOWN_TABLE typo hints Mar 23, 2026
@alsugiliazova
Copy link
Copy Markdown
Contributor Author

@divanik can you please put can-be-tested label?

@alesapin alesapin self-assigned this Mar 24, 2026
@alesapin alesapin added the can be tested Allows running workflows for external contributors label Mar 24, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

Workflow [PR], commit [b1102b9]

Summary:

job_name test_name status info comment
Integration tests (amd_msan, 4/6) failure
test_keeper_remove_rejoin_leader/test.py::test_leader_election_after_rolling_membership_change FAIL cidb
AST fuzzer (arm_asan_ubsan) failure
AddressSanitizer: stack-use-after-scope (STID: 2136-3203) FAIL cidb, issue

AI Review

Summary

This PR avoids expensive remote catalog enumeration for typo hints on missing tables in DataLakeCatalog when show_data_lake_catalogs_in_system_tables = 0 by short-circuiting TableNameHints::getAllRegisteredNames. The change is small, targeted, and consistent with existing hint fallback behavior; I did not find correctness, safety, compatibility, or performance regressions in the modified path.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Mar 24, 2026
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 24, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 24.60% 24.40% -0.20%
Branches 76.60% 76.50% -0.10%

PR changed lines: PR changed-lines coverage: 85.71% (6/7, 0 noise lines excluded)
Diff coverage report
Uncovered code

@alsugiliazova
Copy link
Copy Markdown
Contributor Author

Failures (2) are not related to PR:

Test Error
1 AST fuzzer (arm_asan_ubsan) AddressSanitizer: stack-use-after-scope (STID: 2136-3203) — known pre-existing issue ClickHouse#100442, created 2026-03-23
2 Integration tests (amd_msan, 4/6) test_keeper_remove_rejoin_leader::test_leader_election_after_rolling_membership_change — keeper flaky test; has also failed on master and older PRs

zvonand added a commit to Altinity/ClickHouse that referenced this pull request Mar 28, 2026
Antalya 26.1 Backport of ClickHouse#100452 - DataLakeCatalog: avoid full catalog read for UNKNOWN_TABLE typo hints
@alsugiliazova
Copy link
Copy Markdown
Contributor Author

@alesapin CI looks good, can you please merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants