Skip to content

Load join tables first on server startup#15157

Closed
javisantana wants to merge 2 commits intoClickHouse:masterfrom
javisantana:fix/join_tables_dependencies
Closed

Load join tables first on server startup#15157
javisantana wants to merge 2 commits intoClickHouse:masterfrom
javisantana:fix/join_tables_dependencies

Conversation

@javisantana
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

When you have two tables like this.

create table z (a Int32, b Int32) Engine=Join(ANY, LEFT, a);
create table a(a Int32, b Int32 MATERIALIZED joinGet('default.z', 'b', a)) Engine=MergeTree() order by a;

When clickhouse starts up it fails because when a is loaded z is not ready and the joinGet fails and therefore the table loading stops (and server shuts down)

This PR changes the startup a little bit so CH loads Join tables first and then the other ones.

Not sure if this is the right approach but it's working for us.

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Sep 22, 2020
@tavplubix tavplubix self-assigned this Sep 22, 2020
@tavplubix
Copy link
Copy Markdown
Member

tavplubix commented Sep 22, 2020

Thank you for an attempt to fix this annoying issue, but the fix seems to be very ad hoc. StorageJoin engine tables are not that special and I don't think that we should load them first as a special case. Also, it doesn't fix cases like the following:

create database z;
create table z.j (a Int32, b Int32) Engine=Join(ANY, LEFT, a);
create table a (a Int32, b Int32 MATERIALIZED joinGet('z.j', 'b', a)) Engine=MergeTree() order by a;
-- Server will fail to start with "Database z doesn't exist ... Cannot attach table default.a" exception

We have similar issue with dictGet(...) function, IN operator with StorageSet and maybe in some other scenarios. I think we should fix it in a more general way. Possible solutions:

  • Most fundamental one: analyze queries to make a dependency graph on server startup and load tables in order of topological sorting.
  • Perform a check on table creation that new table will not violate the order of databases and tables loading. Can be combined with the previous solution (in this case we should check that new table doesn't add a cycle into dependency graph). Simpler version: just check that database name of dependent table is lexicographically less then database name of it's dependency (see the note about workaround below).
  • Do not perform full analysis in InterpreterCreateQuery::getColumnsDescription(...) on server startup because metadata files should already contain normalized queries. But I'm not sure that it will fix all scenarios (probably there are some not related to getColumnsDescription(...) ones).

As for now, clickhouse-server loads system database first and then loads all user-created databases in lexicographical order. When loading a database, server loads tables first (in multiple threads, with no guarantees on loading order) and then attaches DDL-dictionaries if any (in single thread). So it's possible to create all such tables and dictionaries in a separate database with the lexicographically smallest name as a workaround.

@javisantana
Copy link
Copy Markdown
Contributor Author

Makes sense, thanks for the info, I didn't realize about dictGet and StorageSet. In any case, I think join tables and dictionaries would solve most of the issues (not at all them but looking at the bug reports seems like most of them come from these kind of tables)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants