
Save system logs in functional tests if server crashed #36885

Merged
tavplubix merged 3 commits into master from fix_system_logs_saving on May 3, 2022

@tavplubix
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Need these logs to debug #36610

@robot-ch-test-poll1 added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) on May 3, 2022
@qoega self-assigned this on May 3, 2022
@tavplubix
Member Author

+ for table in query_log zookeeper_log trace_log transactions_info_log
+ clickhouse-local --path /var/lib/clickhouse/ -q 'select * from system.query_log format TSVWithNamesAndTypes'
+ [[ -n '' ]]
+ for table in query_log zookeeper_log trace_log transactions_info_log
+ pigz
+ clickhouse-local --path /var/lib/clickhouse/ -q 'select * from system.zookeeper_log format TSVWithNamesAndTypes'
+ [[ -n '' ]]
+ for table in query_log zookeeper_log trace_log transactions_info_log
+ pigz
+ clickhouse-local --path /var/lib/clickhouse/ -q 'select * from system.trace_log format TSVWithNamesAndTypes'
+ [[ -n '' ]]
+ for table in query_log zookeeper_log trace_log transactions_info_log
+ pigz
+ clickhouse-local --path /var/lib/clickhouse/ -q 'select * from system.transactions_info_log format TSVWithNamesAndTypes'
+ [[ -n '' ]]
+ wait
+ pigz
Code: 76. DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running. (CANNOT_OPEN_FILE)
Code: 76. DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running. (CANNOT_OPEN_FILE)
Code: 76. DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running. (CANNOT_OPEN_FILE)
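
In the trace above, the `Cannot lock file` errors surface only on stderr. Unless `pipefail` is set, each background job's status is that of `pigz`, the last command in its pipeline, and a bare `wait` returns 0 regardless of how the jobs ended. A minimal POSIX sh sketch of the same loop that also reports which tables failed is below. `dump_system_logs` and `dump_table` are hypothetical names: `dump_table` is a hook the caller defines in place of the clickhouse-local query from the trace, and `gzip -c` stands in for `pigz` so the sketch runs anywhere.

```shell
# dump_system_logs OUT_DIR TABLE...   (hypothetical helper)
# Dumps each TABLE in parallel to OUT_DIR/TABLE.tsv.gz and returns 1 if any
# dump_table call failed. dump_table is a hypothetical hook that the caller
# must define; it should print one table as TSV on stdout.
dump_system_logs() {
    out_dir=$1
    shift
    fail_dir=$(mktemp -d) || return 1
    for table in "$@"; do
        # A pipeline's status is that of its last command (gzip here), so a
        # failing producer records a marker file instead of relying on it.
        { dump_table "$table" || : > "$fail_dir/$table"; } |
            gzip -c > "$out_dir/$table.tsv.gz" &
    done
    # A bare `wait` returns 0 even if jobs failed; inspect the markers after it.
    wait
    failed=$(ls "$fail_dir")
    rm -rf "$fail_dir"
    if [ -n "$failed" ]; then
        echo "failed to dump:" $failed >&2
        return 1
    fi
    return 0
}
```

For instance, defining `dump_table` to run `clickhouse-local --path /var/lib/clickhouse/ -q "select * from system.$1 format TSVWithNamesAndTypes"` and calling `dump_system_logs /test_output query_log zookeeper_log trace_log transactions_info_log` would mirror the loop in the trace while making a locked status file fail the step visibly. Marker files are used instead of `set -o pipefail` because `pipefail` is not available in every `/bin/sh`.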

@tavplubix
Member Author

Now it looks ok
Stateless tests (release, DatabaseReplicated, actions) [2/2] - server failed to start:

2022.05.04 00:21:18.912276 [ 557 ] {} <Warning> RaftInstance: Election timeout, initiate leader election
2022.05.04 00:21:19.402138 [ 483 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.402151 [ 484 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.402764 [ 484 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.402984 [ 484 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.403298 [ 484 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.403502 [ 484 ] {} <Warning> KeeperTCPHandler: Ignoring user request, because the server is not active yet
2022.05.04 00:21:19.404005 [ 1166 ] {} <Error> virtual bool DB::DDLWorker::initializeMainThread(): Code: 999. Coordination::Exception: All connection tries failed while connecting to ZooKeeper. nodes: [::1]:19181, [::1]:9181, [::1]:29181
Code: 999. Coordination::Exception: Keeper server rejected the connection during the handshake. Possibly it's overloaded, doesn't see leader or stale (Connection loss): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 22.5.1.1), [::1]:19181
Poco::Except

Anyway, I forgot to make another change.

@tavplubix merged commit c3f177f into master on May 3, 2022
@tavplubix deleted the fix_system_logs_saving branch on May 3, 2022 22:12
azat added a commit to azat/ClickHouse that referenced this pull request May 27, 2022
If 10 seconds are not enough for the server to finish, then
clickhouse-local (which runs afterwards) cannot obtain the logs because the
status file is still locked, as in [1]:

    Code: 76. DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running. (CANNOT_OPEN_FILE)

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/35075/4a064e5b6f81136f2bf923d85001f25fa05d39ce/stateless_tests_flaky_check__address__actions_.html

So wait properly via "clickhouse stop".

v2: Fix pid file permissions for replicated database servers.
    They do not use the default location, /var/run/clickhouse-server, and
    do not have proper permissions.

Fixes: ClickHouse#36885
Signed-off-by: Azat Khuzhin <[email protected]>
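
The commit above replaces a fixed 10-second wait with "clickhouse stop". As an illustration of the precondition clickhouse-local actually needs, here is a minimal POSIX sh sketch that polls /var/lib/clickhouse/status until its lock is free. It assumes the server holds an advisory flock(2) lock on that file, which is what the `Cannot lock file` error suggests, and that the `flock` command (util-linux or BusyBox) is installed. `wait_for_status_unlock` is a hypothetical helper, not part of the ClickHouse test scripts.

```shell
# wait_for_status_unlock STATUS_FILE TIMEOUT_SECONDS   (hypothetical helper)
# Returns 0 once the lock on STATUS_FILE can be taken, i.e. no running server
# holds it any more, or 1 after about TIMEOUT_SECONDS of polling.
# Assumes the server locks the file with flock(2) and `flock` is installed.
wait_for_status_unlock() {
    status_file=$1
    timeout=$2
    elapsed=0
    # `flock -n FILE true` takes the lock without blocking, runs `true`, and
    # releases it; it fails immediately while another process holds the lock.
    until flock -n "$status_file" true; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "$status_file is still locked after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In the CI script this would run just before the dump loop, e.g. `wait_for_status_unlock /var/lib/clickhouse/status 60 || exit 1`, where 60 is an arbitrary example limit. Polling the lock checks the exact condition clickhouse-local needs, whereas a fixed sleep only guesses how long the shutdown takes.
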

    # for files >64MB, we want these files to be compressed explicitly
    for table in query_log zookeeper_log trace_log transactions_info_log
    do
        clickhouse-client -q "select * from system.$table format TSVWithNamesAndTypes" | pigz > /test_output/$table.tsv.gz &
Member

The problem with using clickhouse-local here is that it currently cannot save data from S3 storage, as the comment above these lines states. I guess clickhouse-local could be taught to use S3 (maybe simply using the server config would be enough, though I'm not sure).

Member Author

@alexey-milovidov changed the title from "Save system logs in functionsl tests if server crashed" to "Save system logs in functional tests if server crashed" on Aug 5, 2023

Labels

pr-not-for-changelog (This PR should not be mentioned in the changelog)

4 participants