Skip to content

Check dependencies on DROP TABLE#30977

Merged
tavplubix merged 6 commits intomasterfrom
check_dependencies_on_drop
Nov 16, 2021
Merged

Check dependencies on DROP TABLE#30977
tavplubix merged 6 commits intomasterfrom
check_dependencies_on_drop

Conversation

@tavplubix
Copy link
Copy Markdown
Member

@tavplubix tavplubix commented Nov 1, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Do not allow to drop a table or dictionary if some tables or dictionaries depend on it.

Detailed description / Documentation draft:
Fixes Server failed to start error in Stress tests.
Related to #28373, #15170

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Nov 1, 2021
@tavplubix tavplubix marked this pull request as ready for review November 2, 2021 14:31
@Avogar Avogar self-assigned this Nov 8, 2021
Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @tavplubix can you resolve conflicts?

@tavplubix
Copy link
Copy Markdown
Member Author

CIGithubActions / FastTest (pull_request) - Temporary failure in name resolution, maybe we need some retries, cc: @alesapin
Functional stateless tests (release) - session expired four times in a row, will try to debug it again
Functional stateless tests (thread) - 02023_storage_filelog - Another select query is running on this table, need to wait it finish, cc: @ucasfl, @kssenii
Functional stateless tests flaky check (address) - OK
Integration tests (asan) - test_executable_table_function/test.py::test_executable_storage_input - Broken pipe: While executing TabSeparatedRowOutputFormat: While executing ShellCommandSource, cc: @kitaisreal
Integration tests (release) - test_replicated_merge_tree_hdfs_zero_copy/test.py::test_hdfs_zero_copy_with_ttl_delete - AssertionError: assert '(2)' == '(1)', cc: @yuzhichang, @kssenii
Stateless tests flaky check (address, actions) - OK

@tavplubix tavplubix merged commit 710fbeb into master Nov 16, 2021
@tavplubix tavplubix deleted the check_dependencies_on_drop branch November 16, 2021 08:38
@tavplubix
Copy link
Copy Markdown
Member Author

Functional stateless tests (release) - session expired four times in a row, will try to debug it again

On client side:

$ for i in {1..9}; do zgrep -Fai "Finalizing session $i" clickhouse-server.log.gz | head -1; done
2021.11.15 08:35:40.416830 [ 521 ] {} <Test> ZooKeeperClient: Finalizing session 1: finalization_started=false, queue_finished=false, reason=Operation timeout on Multi /clickhouse/tables/00993_system_parts_race_condition_drop_zookeeper_test_xn2tet/alter_table/log/log-
2021.11.15 08:35:55.885380 [ 404 ] {} <Test> ZooKeeperClient: Finalizing session 2: finalization_started=false, queue_finished=false, reason=Operation timeout on Exists /clickhouse/tables/00993_system_parts_race_condition_drop_zookeeper_test_xn2tet/alter_table/replicas/r_6/is_active
2021.11.15 08:37:27.430213 [ 23373 ] {} <Test> ZooKeeperClient: Finalizing session 3: finalization_started=true, queue_finished=false, reason=Exception in receiveThread
2021.11.15 08:40:57.872523 [ 22735 ] {} <Test> ZooKeeperClient: Finalizing session 4: finalization_started=false, queue_finished=false, reason=Exception in receiveThread
2021.11.15 08:42:06.706671 [ 545 ] {} <Test> ZooKeeperClient: Finalizing session 5: finalization_started=true, queue_finished=false, reason=Operation timeout on Multi /clickhouse/tables/00993_system_parts_race_condition_drop_zookeeper_test_e7rkot/alter_table/log/log-
2021.11.15 08:42:45.583607 [ 497 ] {} <Test> ZooKeeperClient: Finalizing session 6: finalization_started=false, queue_finished=false, reason=Operation timeout on Multi /clickhouse/tables/00993_system_parts_race_condition_drop_zookeeper_test_e7rkot/alter_table/log

On server side: https://gist.github.com/tavplubix/ecbf71b28aed2a62a08c2d3611630aa4

2021.11.15 08:35:26.799419 [ 417 ] {} <Trace> KeeperTCPHandler: Received heartbeat for session #1
2021.11.15 08:35:27.065412 [ 422 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 570000 with path /var/lib/clickhouse/coordination/snapshots/snapshot_570000.bin.zstd
2021.11.15 08:35:27.090471 [ 422 ] {} <Trace> KeeperStateMachine: Cleared garbage after snapshot
2021.11.15 08:35:30.241502 [ 455 ] {} <Debug> KeeperStateMachine: Creating snapshot 580000
2021.11.15 08:35:30.241535 [ 455 ] {} <Debug> KeeperStateMachine: In memory snapshot 580000 created, queueing task to flash to disk
2021.11.15 08:35:36.799381 [ 417 ] {} <Trace> KeeperTCPHandler: Received heartbeat for session #1
2021.11.15 08:35:42.026803 [ 417 ] {} <Debug> KeeperTCPHandler: Received close event with xid 2147483647 for session id #1
2021.11.15 08:35:42.028125 [ 417 ] {} <Information> KeeperTCPHandler: Got exception processing session #1: Code: 210. DB::NetException: I/O error: Broken pipe, while writing to socket ([::1]:41964). (NETWORK_ERROR), Stack trace (when copying this message, always include the lines below):
2021.11.15 08:35:42.026426 [ 422 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 580000 with path /var/lib/clickhouse/coordination/snapshots/snapshot_580000.bin.zstd
2021.11.15 08:35:42.050208 [ 417 ] {} <Trace> KeeperTCPHandlerFactory: Keeper request. Address: [::1]:55020
2021.11.15 08:35:42.050384 [ 417 ] {} <Information> KeeperTCPHandler: Requesting session ID for the new client
2021.11.15 08:35:42.079205 [ 422 ] {} <Trace> KeeperStateMachine: Cleared garbage after snapshot
2021.11.15 08:35:42.095719 [ 455 ] {} <Debug> KeeperStateMachine: Session ID response 2 with timeout 30000

...

2021.11.15 08:37:07.131324 [ 455 ] {} <Debug> KeeperStateMachine: Creating snapshot 650000
2021.11.15 08:37:07.131382 [ 455 ] {} <Debug> KeeperStateMachine: In memory snapshot 650000 created, queueing task to flash to disk
2021.11.15 08:37:10.485035 [ 422 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 650000 with path /var/lib/clickhouse/coordination/snapshots/snapshot_650000.bin.zstd
2021.11.15 08:37:10.508656 [ 422 ] {} <Trace> KeeperStateMachine: Cleared garbage after snapshot
2021.11.15 08:37:22.210601 [ 417 ] {} <Information> KeeperTCPHandler: Got exception processing session #3: Code: 159. DB::Exception: Cannot push request to queue within operation timeout. (TIMEOUT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
2021.11.15 08:37:27.546829 [ 417 ] {} <Trace> KeeperTCPHandlerFactory: Keeper request. Address: [::1]:59286
2021.11.15 08:37:27.547038 [ 417 ] {} <Information> KeeperTCPHandler: Requesting session ID for the new client
2021.11.15 08:37:27.547231 [ 455 ] {} <Debug> KeeperStateMachine: Session ID response 4 with timeout 30000

In the first case (sessions 1, 2, 5, 6) Keeper server failed to process request or send response in 10 seconds for unknown reason, so client closed connection.
In the second case (sessions 3 and 4) Keeper server failed to push request to the queue in 10 seconds for unknown reason and closed connection by itself.

I'm not sure if it's related to snapshots or not, but it's a bit suspicious that session expires when Keeper creates a lot of snapshots. On the other hand, it means that requests rate is high (about 2000 rps according to the logs). Anyway, let's try to increase snapshot_distance for functional tests and check what will happen: #31448

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants