Skip to content

Fix server restart after stress test#13743

Merged
alexey-milovidov merged 3 commits intomasterfrom
fix-stress-test
Aug 15, 2020
Merged

Fix server restart after stress test#13743
alexey-milovidov merged 3 commits intomasterfrom
fix-stress-test

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Aug 15, 2020

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

You may notice that the last two weeks some stress test runs fail sporadically with the "Server failed to start" message.
It started to happen because we have added more checks to the stress test and it does not indicate any new bugs.

Stress test is a check that runs all the queries from functional tests with random order and with multiple threads without checking the results. It checks the following:

  • server does not crash;
  • no errors are triggered under sanitizers (ASan, TSan, MSan, UBSan);
  • no debug assertions are triggered;
  • all queries are finished after runs (no deadlocks or inifinite loops);
  • memory does not leak;

We have added one more check:

  • server can stop and start again after the test.

This check will fail if incorrect tables are spontaneously created during the test, and after server is restarted, it will find these incorrect tables and refuse to start without manual cleanup.

The check is reasonable, because manual cleanup should not be required in general and it's often impossible in managed ClickHouse setups such as Yandex Cloud.

But the check started to fail. There were several causes:

The first was due to leftover .sql.tmp file after unsuccessful table creation: #13557
The second was due to tables created with CREATE ... AS table_function(...).

It is possible to create a table as remote, cluster or merge table function and the function will be re-evaluated during server startup. The order of tables initialization is undefined and it's possible that the merge table function will not find the corresponding tables, throw an exception and server will refuse to start. This issue is minor.

In this pull request, I've edited the 01083_expressions_in_engine_arguments.sql test, so that all tables will refer only the current database. And current database is destroyed after the test, so no leftovers can remain after the test and server will restart successfully.

This modification required a new feature - the possibility to create a dictionary with ClickHouse source in current database if database was not specified (in previous versions, default database was used instead of current database).

The root cause of the issue is not fixed.

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Aug 15, 2020
* </source>
*/
void buildSourceConfiguration(AutoPtr<Document> doc, AutoPtr<Element> root, const ASTFunctionWithKeyValueArguments * source, const ASTDictionarySettings * settings)
void buildSourceConfiguration(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespaces only.

bool complex = DictionaryFactory::instance().isComplex(dictionary_layout->layout_type);

auto all_attr_names_and_types = buildDictionaryAttributesConfiguration(xml_document, structure_element, query.dictionary_attributes_list, pk_attrs);
auto all_attr_names_and_types = buildDictionaryAttributesConfiguration(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespaces only.


CREATE DICTIONARY dict (n UInt64, col String DEFAULT '42') PRIMARY KEY n
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9440 SECURE 1 USER 'default' TABLE 'url' DB 'test_01083')) LIFETIME(1) LAYOUT(CACHE(SIZE_IN_CELLS 1));
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9440 SECURE 1 USER 'default' TABLE 'url')) LIFETIME(1) LAYOUT(CACHE(SIZE_IN_CELLS 1));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the significant change.

@alexey-milovidov alexey-milovidov added the testing Special issue with list of bugs found by CI label Aug 15, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Style check

#13744

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Unit tests

#13745

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Stress tests Ok.
Let's fix "Arcadia" after merge.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

It did not solve the issue completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog testing Special issue with list of bugs found by CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants