Skip to content

Fix race condition on keeper shutdown#24412

Merged
alesapin merged 3 commits intomasterfrom
fix_race_condition_keeper_shutdown
May 23, 2021
Merged

Fix race condition on keeper shutdown#24412
alesapin merged 3 commits intomasterfrom
fix_race_condition_keeper_shutdown

Conversation

@alesapin
Copy link
Copy Markdown
Member

@alesapin alesapin commented May 22, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix a data race on Keeper shutdown.

zoo2_1  | ==================
zoo2_1  | WARNING: ThreadSanitizer: data race (pid=1)
zoo2_1  |   Read of size 8 at 0x7b0400012040 by thread T2 (mutexes: write M2842):
zoo2_1  |     #0 std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, void*>*> std::__1::__hash_table<std::__1::_
_hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::hash<long>, std::__1::equal_to<long>, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResp
onse> const&)> >, std::__1::equal_to<long>, std::__1::hash<long>, true>, std::__1::allocator<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> > > >::find<long>(long c
onst&) const obj-x86_64-linux-gnu/../contrib/libcxx/include/__hash_table:2425:31 (clickhouse+0x14763e1b)
zoo2_1  |     #1 unsigned long std::__1::__hash_table<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::hash<long>, std::__1::equal_to<long>, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::equal_to<long>, std::__1::hash<long>, true>, std::__1::allocator<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> > > >::__count_unique<long>(long const&) const obj-x86_64-linux-gnu/../contrib/libcxx/include/__hash_table:2616:35 (clickhouse+0x14763e1b)
zoo2_1  |     #2 std::__1::unordered_map<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)>, std::__1::hash<long>, std::__1::equal_to<long>, std::__1::allocator<std::__1::pair<long const, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> > > >::count(long const&) const obj-x86_64-linux-gnu/../contrib/libcxx/include/unordered_map:1368:65 (clickhouse+0x14763e1b)
zoo2_1  |     #3 DB::KeeperStorageDispatcher::putRequest(std::__1::shared_ptr<Coordination::ZooKeeperRequest> const&, long) obj-x86_64-linux-gnu/../src/Coordination/KeeperStorageDispatcher.cpp:220:42 (clickhouse+0x14763e1b)
zoo2_1  |     #4 DB::KeeperTCPHandler::receiveRequest() obj-x86_64-linux-gnu/../src/Server/KeeperTCPHandler.cpp:410:40 (clickhouse+0x1438308d)
zoo2_1  |     #5 DB::KeeperTCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/KeeperTCPHandler.cpp:334:52 (clickhouse+0x14380d1b)
zoo2_1  |     #6 DB::KeeperTCPHandler::run() obj-x86_64-linux-gnu/../src/Server/KeeperTCPHandler.cpp:222:5 (clickhouse+0x1437ff99)
zoo2_1  |     #7 Poco::Net::TCPServerConnection::start() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerConnection.cpp:43:3 (clickhouse+0x16fd5602)
zoo2_1  |     #8 Poco::Net::TCPServerDispatcher::run() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerDispatcher.cpp:115:20 (clickhouse+0x16fd5d8f)
zoo2_1  |     #9 Poco::PooledThread::run() obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/ThreadPool.cpp:199:14 (clickhouse+0x1714d401)
zoo2_1  |     #10 Poco::(anonymous namespace)::RunnableHolder::run() obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Thread.cpp:55:11 (clickhouse+0x1714b88f)
zoo2_1  |     #11 Poco::ThreadImpl::runnableEntry(void*) obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Thread_POSIX.cpp:345:27 (clickhouse+0x1714a007)
zoo2_1  | 
zoo2_1  |   Previous write of size 8 at 0x7b0400012040 by main thread (mutexes: write M2546):
zoo2_1  |     #0 std::__1::__hash_table<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::hash<long>, std::__1::equal_to<long>, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> >, std::__1::equal_to<long>, std::__1::hash<long>, true>, std::__1::allocator<std::__1::__hash_value_type<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> > > >::clear() obj-x86_64-linux-gnu/../contrib/libcxx/include/__hash_table:1830:33 (clickhouse+0x14766374)
zoo2_1  |     #1 std::__1::unordered_map<long, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)>, std::__1::hash<long>, std::__1::equal_to<long>, std::__1::allocator<std::__1::pair<long const, std::__1::function<void (std::__1::shared_ptr<Coordination::ZooKeeperResponse> const&)> > > >::clear() obj-x86_64-linux-gnu/../contrib/libcxx/include/unordered_map:1274:42 (clickhouse+0x14766374)
zoo2_1  |     #2 DB::KeeperStorageDispatcher::shutdown() obj-x86_64-linux-gnu/../src/Coordination/KeeperStorageDispatcher.cpp:319:38 (clickhouse+0x14766374)
zoo2_1  |     #3 DB::Context::shutdownKeeperStorageDispatcher() const obj-x86_64-linux-gnu/../src/Interpreters/Context.cpp:1712:44 (clickhouse+0x131a05dc)
zoo2_1  |     #4 DB::Keeper::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_2::operator()() const obj-x86_64-linux-gnu/../programs/keeper/Keeper.cpp:408:5 (clickhouse+0x9211d28)
zoo2_1  |     #5 ext::basic_scope_guard<DB::Keeper::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_2>::invoke() obj-x86_64-linux-gnu/../base/common/../ext/scope_guard.h:97:9 (clickhouse+0x9211d28)
zoo2_1  |     #6 ext::basic_scope_guard<DB::Keeper::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)::$_2>::~basic_scope_guard() obj-x86_64-linux-gnu/../base/common/../ext/scope_guard.h:47:28 (clickhouse+0x9211d28)
zoo2_1  |     #7 DB::Keeper::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) obj-x86_64-linux-gnu/../programs/keeper/Keeper.cpp:462:1 (clickhouse+0x9210032)
zoo2_1  |     #8 Poco::Util::Application::run() obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334:8 (clickhouse+0x16feaeab)
zoo2_1  |     #9 DB::Keeper::run() obj-x86_64-linux-gnu/../programs/keeper/Keeper.cpp:239:25 (clickhouse+0x920d7fe)
zoo2_1  |     #10 Poco::Util::ServerApplication::run(int, char**) obj-x86_64-linux-gnu/../contrib/poco/Util/src/ServerApplication.cpp:611:9 (clickhouse+0x1700a823)
zoo2_1  |     #11 mainEntryClickHouseKeeper(int, char**) obj-x86_64-linux-gnu/../programs/keeper/Keeper.cpp:49:20 (clickhouse+0x920c39b)
zoo2_1  |     #12 main obj-x86_64-linux-gnu/../programs/main.cpp:374:12 (clickhouse+0x9018bd9)

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label May 22, 2021

std::lock_guard lock(push_request_mutex);

if (shutdown_called)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is here. Don't try to push new requests after the shutdown called.

@robot-clickhouse robot-clickhouse added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels May 22, 2021
@alexey-milovidov alexey-milovidov self-assigned this May 22, 2021
@alesapin
Copy link
Copy Markdown
Member Author

@alesapin
Copy link
Copy Markdown
Member Author

No related failures, but too much unrelated :(

@alesapin alesapin merged commit 3f2d280 into master May 23, 2021
@alesapin alesapin deleted the fix_race_condition_keeper_shutdown branch May 23, 2021 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants