Skip to content

Dictionaries ddl loader#7360

Merged
alesapin merged 54 commits intomasterfrom
dictionaries_ddl_loader
Oct 25, 2019
Merged

Dictionaries ddl loader#7360
alesapin merged 54 commits intomasterfrom
dictionaries_ddl_loader

Conversation

@alesapin
Copy link
Copy Markdown
Member

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description (up to few sentences):
Add the ability to create dictionaries with DDL queries.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Oct 16, 2019

TODO:

  • Engine = Dictionary() table for each dictionary inside database.
  • system.dictionaries
  • More tests (also integeration for MySQL)
  • Bug fixes

second_column UInt8 DEFAULT 1 EXPRESSION rand() % 222,
third_column String DEFAULT 'qqq'
)
PRIMARY KEY key_column, second_column
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks pretty cool, but inconsistent with current implementation for MergeTree ordering/pkey qualifier, which requires (key_column, second_column) format.
Just wondering - would this approach become available for ordinal table creation as well (in future)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything against your proposal. Maybe it would be better to make round brackets optional for dictionaries.

)
PRIMARY KEY key_column
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'table_for_dict' PASSWORD '' DB 'database_for_dict'))
LIFETIME(MIN 1 MAX 10)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I can see - we have here two mandatory options (min, max). Maybe, it is better to omit them, allowing to specify in a form like LIFETIME(1,10)?
Pros:

  1. No extra syntax constructions needed
  2. Less symbols to type :)
  3. Looks more native for SQL users (like function call)

Cons:

  1. If lifetime will have more complicated configuration, it will get stuck. E.g.: LIFETIME(MAX 10 IF_ONCE_QUERIED_REMEMBER_FOREVER 1) (here MIN 1 is omitted, but another option is introduced)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current LIFETIME definition is consistent with other parts RANGE, SOURCE , etc.

@alesapin alesapin added pr-feature Pull request with new product feature and removed pr-build Pull request with build/testing/packaging improvement labels Oct 22, 2019

SELECT dictGetString('database_for_dict.dictionary_with_hierarchy', 'RegionName', toUInt64(2));
SELECT dictGetHierarchy('database_for_dict.dictionary_with_hierarchy', toUInt64(3));
SELECT dictIsIn('database_for_dict.dictionary_with_hierarchy', toUInt64(3), toUInt64(2));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be posiible to have dict with same name in 2 different databases?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! So i think it will also resolve #4201

LIFETIME(MIN 1 MAX 10)
LAYOUT(FLAT());

SELECT count(*) from database_for_dict.dict1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean that it will also allow to access dictionaries data with simple select (as with Engine=Dictionary)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

SELECT second_column FROM database_for_dict.dict1 WHERE key_column = 11;
SELECT dictGetString('database_for_dict.dict1', 'third_column', toUInt64(12));
SELECT third_column FROM database_for_dict.dict1 WHERE key_column = 12;
SELECT dictGetFloat64('database_for_dict.dict1', 'fourth_column', toUInt64(14));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is database part in dictionary name mandatory? Or it will use current database / global scope if not provided?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tables, it will use the current database.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how that will work with existing dicts (defined in "global"context)

Copy link
Copy Markdown
Member Author

@alesapin alesapin Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing dictionaries, you have to used the full name in dictGet* functions. For normal selects, you have to create database with an engine Dictionary().


SHOW DICTIONARIES FROM ordinary_db LIKE 'dict1';

EXISTS DICTIONARY ordinary_db.dict1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what will happen if same dict is described both in xml and ddl?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we will not be able to load dictionary and got an exception on creation, but I need to add test for this!

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Oct 24, 2019

my own tests failed :(

I have missed config file :(

@alesapin
Copy link
Copy Markdown
Member Author

@alexey-milovidov recommend to merge this PR.

@alesapin alesapin merged commit c7cd911 into master Oct 25, 2019
@alesapin alesapin mentioned this pull request Oct 28, 2019
15 tasks
auto & external_loader = context.getExternalDictionariesLoader();
external_loader.addConfigRepository(getDatabaseName(), std::move(dictionaries_repository));
bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true);
external_loader.reload(!lazy_load);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after this change all dictionaries will be reloaded, regardless the lifetime?
And this can be not the desired behavior if the dictionary is big enough

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

azat added a commit to azat/ClickHouse that referenced this pull request Nov 26, 2019
This ignores any lifetime, while dictionaries can be quite big.

Fixes: c7cd911 ("Merge pull request ClickHouse#7360")
Refs: ClickHouse#7360 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants