
Persistent nukeeper snapshot storage#21425

Merged
alesapin merged 64 commits into master from persistent_nukeeper_snapshot_storage
Mar 9, 2021

Conversation

@alesapin
Member

@alesapin alesapin commented Mar 3, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@alesapin
Member Author

alesapin commented Mar 5, 2021

Shutdown bug :(

@alesapin alesapin added the do not test disable testing on pull request label Mar 5, 2021
@alesapin alesapin removed the do not test disable testing on pull request label Mar 5, 2021
@alesapin
Member Author

alesapin commented Mar 5, 2021

Seems like after this query:

SELECT
    [257],
    k,
    [NULL, NULL, 2., 100.0000991821289, NULL, NULL, NULL],
    9223372036854775807,
    [NULL, 1., 0, NULL, NULL, NULL],
    256,
    groupArrayForEach(arr)
FROM
(
    SELECT
        number % NULL AS k,
        [-2, 1024],
        [-1, 1048576],
        range(number) AS arr
    FROM system.numbers
    LIMIT 1023
)
GROUP BY
    arrayJoin([NULL, 100]),
    [1048575, NULL, NULL, NULL, NULL, NULL],
    k
ORDER BY arrayJoin([1048575]) ASC

OOM kills the server.
2021.03.05 18:53:50.435299 [ 268 ] {698bb0eb-77c6-44b8-a3b5-df429a8e0793} MemoryTracker: Current memory usage (total): 37.00 GiB.

@alesapin
Member Author

alesapin commented Mar 6, 2021

Hmmm, looks like a bug... but I don't understand where, and I cannot reproduce it. For some reason, when a follower node is down, the leader tries to send a 0-level (fake) snapshot with each append_entries call. Possibly that's OK, but it definitely shouldn't affect append_entries performance.
Also, for some reason the kazoo client doesn't send any heartbeats for 30 seconds:

2021.03.06 11:33:15.147396 [ 13 ] {} <Information> NuKeeperTCPHandler: Received session ID 1
2021.03.06 11:33:45.334357 [ 13 ] {} <Debug> NuKeeperTCPHandler: Session #1 expired

@alesapin
Member Author

alesapin commented Mar 6, 2021

I've added sleeps, and indeed kazoo just ignores heartbeats:

2021.03.06 16:28:44.745204 [ 13 ] {} <Trace> NuKeeperTCPHandlerFactory: NuKeeper request. Address: 192.168.144.1:35340
2021.03.06 16:28:44.745912 [ 13 ] {} <Information> NuKeeperTCPHandler: Requesting session ID for the new client
2021.03.06 16:28:44.750093 [ 13 ] {} <Information> NuKeeperTCPHandler: Received session ID 1
2021.03.06 16:29:15.164276 [ 13 ] {} <Debug> NuKeeperTCPHandler: Session #1 expired

@alesapin
Member Author

alesapin commented Mar 6, 2021

Wow, it seems our ZooKeeper client sends redundant heartbeats. In ZooKeeper, each operation is considered a heartbeat, so we don't need to send them on a timer if there are other ops in flight (https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L1150). Need to change the NuKeeper server logic!
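The required change can be sketched like this (a minimal illustration, not the actual NuKeeper code; the `Session` class and its method names are hypothetical): any operation on a session should push the expiry deadline forward, so explicit heartbeats only matter while the client is idle.

```python
import time


class Session:
    """Sketch of ZooKeeper-style session expiry tracking.

    Every request (read, write, or heartbeat alike) calls touch(),
    which refreshes the deadline; a session expires only after a full
    timeout with no activity at all.
    """

    def __init__(self, timeout_s, now=time.monotonic):
        self.timeout_s = timeout_s
        self.now = now  # injectable clock, so the logic is testable
        self.deadline = now() + timeout_s

    def touch(self):
        # Called for *every* operation, not only explicit heartbeats.
        self.deadline = self.now() + self.timeout_s

    def is_expired(self):
        return self.now() > self.deadline
```

With this scheme a kazoo client that keeps issuing regular operations never expires, even if it sends no standalone heartbeat packets for 30 seconds.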

@alesapin alesapin marked this pull request as ready for review March 6, 2021 20:15
@alesapin
Member Author

alesapin commented Mar 7, 2021

test_materialize_mysql_database failure is not related.

@alesapin alesapin changed the title [WIP] Persistent nukeeper snapshot storage Persistent nukeeper snapshot storage Mar 7, 2021
@alesapin
Member Author

alesapin commented Mar 7, 2021

Stress test:

2021.03.07 12:53:55.398258 [ 296 ] {} <Debug> Application: Received termination signal.
2021.03.07 12:53:55.398366 [ 296 ] {} <Debug> Application: Waiting for current connections to close.
<...>
2021.03.07 12:54:01.288076 [ 296 ] {} <Information> Application: Closed connections. But 9 remain. Tip: To increase wait time add to config: <shutdown_wait_unfinished>60</shutdown_wait_unfinished>
2021.03.07 12:54:01.288381 [ 296 ] {} <Information> Application: Will shutdown forcefully.

This is a separate problem with hanging queries, but after that, we cannot start the server because of:

2021.03.07 12:56:30.780362 [ 937894 ] {} <Error> Application: Caught exception while loading metadata: Code: 999, e.displayText() = Coordination::Exception: No node, path: /clickhouse/tables/test_01513/sequence_consistency/replicas/1/metadata: Cannot attach table `test_25`.`quorum1` from metadata file /var/lib/clickhouse/metadata/test_25/quorum1.sql from query ATTACH TABLE test_25.quorum1 (`x` UInt32, `y` Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/test_01513/sequence_consistency', '1') PARTITION BY y ORDER BY x SETTINGS index_granularity = 8192: while loading database `test_25` from path /var/lib/clickhouse/metadata/test_25, Stack trace (when copying this message, always include the lines below):

@alesapin
Member Author

alesapin commented Mar 7, 2021

This replica was created 5 minutes before the shutdown.

2021.03.07 12:50:50.216462 [ 466338 ] {85c2dd98-5b7d-4cee-a65b-36a7a777f362} <Trace> ContextAccess (default): Access granted: CREATE TABLE ON test_25.quorum1
2021.03.07 12:50:50.243636 [ 466338 ] {85c2dd98-5b7d-4cee-a65b-36a7a777f362} <Debug> test_25.quorum1: Loading data parts
2021.03.07 12:50:50.244103 [ 466338 ] {85c2dd98-5b7d-4cee-a65b-36a7a777f362} <Debug> test_25.quorum1: Loaded data parts (0 items)
2021.03.07 12:50:50.245940 [ 466338 ] {85c2dd98-5b7d-4cee-a65b-36a7a777f362} <Debug> test_25.quorum1: This table /clickhouse/tables/test_01513/sequence_consistency is already created, will add new replica
2021.03.07 12:50:50.247979 [ 466338 ] {85c2dd98-5b7d-4cee-a65b-36a7a777f362} <Debug> test_25.quorum1: Creating replica /clickhouse/tables/test_01513/sequence_consistency/replicas/1
2021.03.07 12:50:50.268746 [ 372 ] {} <Debug> test_25.quorum1 (ReplicatedMergeTreeRestartingThread): Activating replica.
2021.03.07 12:50:50.276419 [ 372 ] {} <Information> test_25.quorum1: Replica 3 has log pointer '0', approximate 1 queue lag and 0 queue size
2021.03.07 12:50:50.276577 [ 372 ] {} <Information> test_25.quorum1: Will mimic 3
2021.03.07 12:50:50.281992 [ 372 ] {} <Debug> test_25.quorum1: Queued 0 parts to be fetched
2021.03.07 12:50:50.282109 [ 372 ] {} <Debug> test_25.quorum1: Copied 0 queue entries
2021.03.07 12:50:50.282905 [ 372 ] {} <Debug> test_25.quorum1 (ReplicatedMergeTreeQueue): Loading queue from /clickhouse/tables/test_01513/sequence_consistency/replicas/1/queue
2021.03.07 12:50:50.283877 [ 372 ] {} <Debug> test_25.quorum1 (ReplicatedMergeTreeQueue): Having 0 queue entries to load, 0 entries already loaded.
2021.03.07 12:50:50.284342 [ 372 ] {} <Trace> test_25.quorum1 (ReplicatedMergeTreeQueue): Loaded queue
2021.03.07 12:50:50.292629 [ 383 ] {} <Information> test_25.quorum1: Became leader
2021.03.07 12:50:53.670068 [ 466001 ] {f01dffd2-5ecb-4790-acc9-035c69506d30} <Trace> ContextAccess (default): Access granted: SYSTEM MERGES ON test_25.quorum1
2021.03.07 12:50:56.832574 [ 466001 ] {64c769cc-cd12-4c49-88d2-1ffaf8cbec60} <Trace> ContextAccess (default): Access granted: SYSTEM MERGES ON test_25.quorum1
2021.03.07 12:51:19.932018 [ 466000 ] {b66073f5-b290-41a9-babf-6bb2223d6fc1} <Trace> ContextAccess (default): Access granted: SYSTEM MERGES ON test_25.quorum1
<...>
2021.03.07 12:54:01.288381 [ 296 ] {} <Information> Application: Will shutdown forcefully.

@alesapin
Member Author

alesapin commented Mar 7, 2021

No idea how this is possible. I'll upload the coordination dump after the stress test.

@alesapin
Member Author

alesapin commented Mar 7, 2021

I have some bug with snapshots:

2021.03.07 21:45:48.326377 [ 72262 ] {} <Fatal> BaseDaemon: ########################################
2021.03.07 21:45:48.326433 [ 72262 ] {} <Fatal> BaseDaemon: (version 21.4.1.6189, build id: 9CA33DF56446D8AFAA017C202F16256573DB58C1) (from thread 72184) (no query) Received signal Unknown signal (-3)
2021.03.07 21:45:48.326458 [ 72262 ] {} <Fatal> BaseDaemon: Sanitizer trap.
2021.03.07 21:45:48.326547 [ 72262 ] {} <Fatal> BaseDaemon: Stack trace: 0xdc01d35 0xde2e996 0xdbc60d6 0xdbd18ff 0x1a9fb7b3 0x1a9f87f7 0x1a9f5965 0x1a9f7bb0 0x1a9e703a 0x1a9c7fde 0x1aa241e6 0x19198656 0xdc7d2c3 0x1b4d8aba 0xdc71722 0x1b4eabdb 0xdc6fb3b 0xdbd7c14 0x7f77ff2df0b3 0xdbb5d2e
2021.03.07 21:45:48.344817 [ 72262 ] {} <Fatal> BaseDaemon: 0.1. inlined from ./obj-x86_64-linux-gnu/../src/Common/StackTrace.cpp:298: StackTrace::tryCapture()
2021.03.07 21:45:48.344847 [ 72262 ] {} <Fatal> BaseDaemon: 0. ../src/Common/StackTrace.cpp:259: StackTrace::StackTrace() @ 0xdc01d35 in /usr/bin/clickhouse
2021.03.07 21:45:48.401057 [ 72262 ] {} <Fatal> BaseDaemon: 1. ./obj-x86_64-linux-gnu/../base/daemon/BaseDaemon.cpp:381: sanitizerDeathCallback() @ 0xde2e996 in /usr/bin/clickhouse
2021.03.07 21:45:49.290881 [ 72262 ] {} <Fatal> BaseDaemon: 2. __sanitizer::Die() @ 0xdbc60d6 in /usr/bin/clickhouse
2021.03.07 21:45:50.166361 [ 72262 ] {} <Fatal> BaseDaemon: 3. ? @ 0xdbd18ff in /usr/bin/clickhouse
2021.03.07 21:45:50.203446 [ 72262 ] {} <Fatal> BaseDaemon: 4. ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__hash_table:333: std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<StringRef, std::__1::__list_iterator<DB::ListNode<DB::NuKeeperStorage::Node>, void*> >, void*>*>::operator->() const @ 0x1a9fb7b3 in /usr/bin/clickhouse
2021.03.07 21:45:50.238919 [ 72262 ] {} <Fatal> BaseDaemon: 5.1. inlined from ./obj-x86_64-linux-gnu/../contrib/libcxx/include/unordered_map:833: std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<StringRef, std::__1::__list_iterator<DB::ListNode<DB::NuKeeperStorage::Node>, void*> >, void*>*> >::operator->() const
2021.03.07 21:45:50.238957 [ 72262 ] {} <Fatal> BaseDaemon: 5. ../src/Coordination/SnapshotableHashTable.h:124: DB::SnapshotableHashTable<DB::NuKeeperStorage::Node>::updateValue(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<void (DB::NuKeeperStorage::Node&)>) @ 0x1a9f87f7 in /usr/bin/clickhouse
2021.03.07 21:45:50.270801 [ 72262 ] {} <Fatal> BaseDaemon: 6.1. inlined from ./obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2547: ~function
2021.03.07 21:45:50.270825 [ 72262 ] {} <Fatal> BaseDaemon: 6. ../src/Coordination/NuKeeperSnapshotManager.cpp:197: DB::NuKeeperStorageSnapshot::deserialize(DB::NuKeeperStorage&, DB::ReadBuffer&) @ 0x1a9f5965 in /usr/bin/clickhouse
2021.03.07 21:45:50.307055 [ 72262 ] {} <Fatal> BaseDaemon: 7. ./obj-x86_64-linux-gnu/../src/Coordination/NuKeeperSnapshotManager.cpp:0: DB::NuKeeperSnapshotManager::deserializeSnapshotFromBuffer(DB::NuKeeperStorage*, std::__1::shared_ptr<nuraft::buffer>) @ 0x1a9f7bb0 in /usr/bin/clickhouse
2021.03.07 21:45:50.342970 [ 72262 ] {} <Fatal> BaseDaemon: 8. ./obj-x86_64-linux-gnu/../src/Coordination/NuKeeperStateMachine.cpp:0: DB::NuKeeperStateMachine::init() @ 0x1a9e703a in /usr/bin/clickhouse
2021.03.07 21:45:50.377669 [ 72262 ] {} <Fatal> BaseDaemon: 9. ./obj-x86_64-linux-gnu/../src/Coordination/NuKeeperServer.cpp:40: DB::NuKeeperServer::startup() @ 0x1a9c7fde in /usr/bin/clickhouse
2021.03.07 21:45:50.411125 [ 72262 ] {} <Fatal> BaseDaemon: 10. ./obj-x86_64-linux-gnu/../src/Coordination/NuKeeperStorageDispatcher.cpp:140: DB::NuKeeperStorageDispatcher::initialize(Poco::Util::AbstractConfiguration const&) @ 0x1aa241e6 in /usr/bin/clickhouse
2021.03.07 21:45:50.581705 [ 72262 ] {} <Fatal> BaseDaemon: 11.1. inlined from ./obj-x86_64-linux-gnu/../contrib/libcxx/include/__mutex_base:97: ~lock_guard
2021.03.07 21:45:50.581751 [ 72262 ] {} <Fatal> BaseDaemon: 11. ../src/Interpreters/Context.cpp:1605: DB::Context::initializeNuKeeperStorageDispatcher() const @ 0x19198656 in /usr/bin/clickhouse
2021.03.07 21:45:50.675764 [ 72262 ] {} <Fatal> BaseDaemon: 12.1. inlined from ./obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:1517: std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::begin()
2021.03.07 21:45:50.675791 [ 72262 ] {} <Fatal> BaseDaemon: 12. ../programs/server/Server.cpp:865: DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xdc7d2c3 in /usr/bin/clickhouse
2021.03.07 21:45:50.694075 [ 72262 ] {} <Fatal> BaseDaemon: 13. ./obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:0: Poco::Util::Application::run() @ 0x1b4d8aba in /usr/bin/clickhouse
2021.03.07 21:45:50.807178 [ 72262 ] {} <Fatal> BaseDaemon: 14. ./obj-x86_64-linux-gnu/../programs/server/Server.cpp:343: DB::Server::run() @ 0xdc71722 in /usr/bin/clickhouse
2021.03.07 21:45:50.815062 [ 72262 ] {} <Fatal> BaseDaemon: 15. ./obj-x86_64-linux-gnu/../contrib/poco/Util/src/ServerApplication.cpp:612: Poco::Util::ServerApplication::run(int, char**) @ 0x1b4eabdb in /usr/bin/clickhouse
2021.03.07 21:45:50.927598 [ 72262 ] {} <Fatal> BaseDaemon: 16. ./obj-x86_64-linux-gnu/../programs/server/Server.cpp:0: mainEntryClickHouseServer(int, char**) @ 0xdc6fb3b in /usr/bin/clickhouse
2021.03.07 21:45:50.931362 [ 72262 ] {} <Fatal> BaseDaemon: 17. ./obj-x86_64-linux-gnu/../programs/main.cpp:0: main @ 0xdbd7c14 in /usr/bin/clickhouse
2021.03.07 21:45:50.931402 [ 72262 ] {} <Fatal> BaseDaemon: 18. __libc_start_main @ 0x270b3 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.03.07 21:45:51.804285 [ 72262 ] {} <Fatal> BaseDaemon: 19. _start @ 0xdbb5d2e in /usr/bin/clickhouse
2021.03.07 21:45:52.015648 [ 72262 ] {} <Fatal> BaseDaemon: Calculated checksum of the binary: F0DE70ADD84C5628E61189C36A588595. There is no information about the reference checksum.

A child node exists without a parent.
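The crash above dereferences a missing hash-table entry while deserializing the snapshot. A defensive way to handle this class of corruption can be sketched as follows (an illustration only, not the NuKeeperSnapshotManager code; `parent_path` and `attach_children` are hypothetical helpers): validate that every deserialized node's parent exists before touching the parent's child list.

```python
def parent_path(path: str) -> str:
    # "/a/b/c" -> "/a/b"; the root's parent is the root itself.
    return path.rsplit("/", 1)[0] or "/"


def attach_children(nodes: dict) -> list:
    """Link each node into its parent's child set; collect orphans
    instead of blindly dereferencing a missing parent entry."""
    orphans = []
    for path in nodes:
        if path == "/":
            continue
        parent = parent_path(path)
        if parent not in nodes:
            # child without a parent: the snapshot is corrupted
            orphans.append(path)
            continue
        nodes[parent].setdefault("children", set()).add(path)
    return orphans
```

Reporting the orphaned paths instead of crashing makes the "child node without parent" condition diagnosable from the log rather than from a sanitizer trap.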

@alesapin
Member Author

alesapin commented Mar 8, 2021

Ok, will make one more CI run just to be sure.

@alesapin
Member Author

alesapin commented Mar 9, 2021

test_testkeeper_restore_from_snapshot/test.py::test_recover_from_snapshot is flaky: with TSan the server takes 7 seconds to start, but we wait for only 5.
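The usual fix for this kind of flakiness is to replace the fixed sleep with polling under a generous deadline; a minimal sketch (the `wait_until` helper is hypothetical, not the actual integration-test code):

```python
import time


def wait_until(predicate, timeout_s=30.0, poll_s=0.5,
               now=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True on success, False on timeout. Unlike a fixed sleep,
    this succeeds quickly on a fast build and still tolerates a slow
    sanitizer build that needs several extra seconds to start.
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if predicate():
            return True
        sleep(poll_s)
    return predicate()  # one last check at the deadline
```

In the test, `predicate` would be something like "the server accepts a four-letter-word command", and `timeout_s` comfortably above the slowest observed TSan startup.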

@alesapin
Member Author

alesapin commented Mar 9, 2021

No related failures. Let's merge and see what will happen.
fuzzer:

$ grep -a 'Fatal' server.log
2021.03.09 12:48:09.071765 [ 82 ] {} <Fatal> Application: Child process was terminated by signal 6.

And that is all....

@alesapin alesapin merged commit bcd0732 into master Mar 9, 2021
@alesapin alesapin deleted the persistent_nukeeper_snapshot_storage branch March 9, 2021 15:37

Labels

pr-not-for-changelog: This PR should not be mentioned in the changelog
submodule changed: At least one submodule changed in this PR
