Skip to content

Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown#78858

Merged
azat merged 1 commit intoClickHouse:masterfrom
azat:rmv-alter-crash-fix
Apr 10, 2025
Merged

Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown#78858
azat merged 1 commit intoClickHouse:masterfrom
azat:rmv-alter-crash-fix

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Apr 8, 2025

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown

The problem is that in case of incorrect shutdown, i.e.:

2025.04.03 22:42:03.620982 [ 14841 ] {632881cd-4918-414c-a2ed-27bb6beee664} <Error> executeQuery: Code: 219. DB::Exception: New table appeared in database being dropped or detached. Try again. (DATABASE_NOT_EMPTY) (version 25.4.1.1) (from [::1]:37582) (comment: 03258_refreshable_mv_misc.sh) (query 1, line 2) (in query: drop database test_15_03258;), Stack trace (when copying this message, always include the lines below):

0. ./contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x0000000020b619a0
1. ./ci/tmp/build/./src/Common/Exception.cpp:108: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000010bd82d4
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000086edcc0
3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x00000000086fc67a
4. ./ci/tmp/build/./src/Interpreters/DatabaseCatalog.cpp:615: DB::DatabaseCatalog::detachDatabase(std::shared_ptr<DB::Context const>, String const&, bool, bool) @ 0x00000000180a72d7
5. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:535: DB::InterpreterDropQuery::executeToDatabaseImpl(DB::ASTDropQuery const&, std::shared_ptr<DB::IDatabase>&, std::vector<StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag>, std::allocator<StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag>>>&) @ 0x00000000186c1105
6. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:364: DB::InterpreterDropQuery::executeToDatabase(DB::ASTDropQuery const&) @ 0x00000000186bbe78
7. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:100: DB::InterpreterDropQuery::executeSingleDropQuery(std::shared_ptr<DB::IAST> const&) @ 0x00000000186bae97
8. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:73: DB::InterpreterDropQuery::execute() @ 0x00000000186baab0
9. ./ci/tmp/build/./src/Interpreters/executeQuery.cpp:1458: DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x0000000018be45da
10. ./ci/tmp/build/./src/Interpreters/executeQuery.cpp:1625: DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x0000000018bdf1ef
11. ./ci/tmp/build/./src/Server/TCPHandler.cpp:665: DB::TCPHandler::runImpl() @ 0x000000001b9cb665
12. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2630: DB::TCPHandler::run() @ 0x000000001b9f23c8
13. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x0000000020c71da3
14. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x0000000020c72612
15. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x0000000020be7cc3
16. ./ci/tmp/build/./base/poco/Foundation/src/Thread.cpp:45: Poco::(anonymous namespace)::RunnableHolder::run() @ 0x0000000020be6070
17. ./base/poco/Foundation/src/Thread_POSIX.cpp:335: Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000020be442a
18. __tsan_thread_start_func @ 0x0000000008661428
19. ? @ 0x00007f396f9fdac3
20. ? @ 0x00007f396fa8f850

The table will still exist, but the view will be NULL due to shutdown() had been called already.

And after this any ALTER of it will lead to crash.

So fix this by checking the view under mutex.

P.S. I've looked through other places and it is either called under refreshTask() (which should not be called after shutdown()) or already has check for view under mutex.

Fixes: #78103

@azat azat requested a review from al13n321 April 8, 2025 19:06
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Apr 8, 2025

Workflow [PR], commit [4bb71b9]

@clickhouse-gh clickhouse-gh bot added pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-cloud labels Apr 8, 2025
@azat azat marked this pull request as draft April 9, 2025 15:07
@azat azat force-pushed the rmv-alter-crash-fix branch from 63c37c4 to 7afd599 Compare April 9, 2025 20:02
The problem is that in case of incorrect shutdown, i.e.:

    2025.04.03 22:42:03.620982 [ 14841 ] {632881cd-4918-414c-a2ed-27bb6beee664} <Error> executeQuery: Code: 219. DB::Exception: New table appeared in database being dropped or detached. Try again. (DATABASE_NOT_EMPTY) (version 25.4.1.1) (from [::1]:37582) (comment: 03258_refreshable_mv_misc.sh) (query 1, line 2) (in query: drop database test_15_03258;), Stack trace (when copying this message, always include the lines below):

    0. ./contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x0000000020b619a0
    1. ./ci/tmp/build/./src/Common/Exception.cpp:108: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000010bd82d4
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000086edcc0
    3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x00000000086fc67a
    4. ./ci/tmp/build/./src/Interpreters/DatabaseCatalog.cpp:615: DB::DatabaseCatalog::detachDatabase(std::shared_ptr<DB::Context const>, String const&, bool, bool) @ 0x00000000180a72d7
    5. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:535: DB::InterpreterDropQuery::executeToDatabaseImpl(DB::ASTDropQuery const&, std::shared_ptr<DB::IDatabase>&, std::vector<StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag>, std::allocator<StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag>>>&) @ 0x00000000186c1105
    6. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:364: DB::InterpreterDropQuery::executeToDatabase(DB::ASTDropQuery const&) @ 0x00000000186bbe78
    7. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:100: DB::InterpreterDropQuery::executeSingleDropQuery(std::shared_ptr<DB::IAST> const&) @ 0x00000000186bae97
    8. ./ci/tmp/build/./src/Interpreters/InterpreterDropQuery.cpp:73: DB::InterpreterDropQuery::execute() @ 0x00000000186baab0
    9. ./ci/tmp/build/./src/Interpreters/executeQuery.cpp:1458: DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x0000000018be45da
    10. ./ci/tmp/build/./src/Interpreters/executeQuery.cpp:1625: DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x0000000018bdf1ef
    11. ./ci/tmp/build/./src/Server/TCPHandler.cpp:665: DB::TCPHandler::runImpl() @ 0x000000001b9cb665
    12. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2630: DB::TCPHandler::run() @ 0x000000001b9f23c8
    13. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x0000000020c71da3
    14. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x0000000020c72612
    15. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x0000000020be7cc3
    16. ./ci/tmp/build/./base/poco/Foundation/src/Thread.cpp:45: Poco::(anonymous namespace)::RunnableHolder::run() @ 0x0000000020be6070
    17. ./base/poco/Foundation/src/Thread_POSIX.cpp:335: Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000020be442a
    18. __tsan_thread_start_func @ 0x0000000008661428
    19. ? @ 0x00007f396f9fdac3
    20. ? @ 0x00007f396fa8f850

The table will still exist, but the view will be NULL due to shutdown()
had been called already.

And after this any ALTER of it will lead to crash.

So fix this by obtaining info for notifying under lock

P.S. I've looked through other places and it is either called under
refreshTask() (which should not be called after shutdown()) or already
has check for `view` under mutex.

Fixes: ClickHouse#78103
v2: do not call RefreshSet::notifyDependents() under lock (will lead to lock order inversion, found by @al13n321)
@azat azat force-pushed the rmv-alter-crash-fix branch from 7afd599 to 4bb71b9 Compare April 9, 2025 20:02
@azat azat marked this pull request as ready for review April 9, 2025 20:06
@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 10, 2025

Stress test (amd_debug) — Killed by signal (in clickhouse-server.log)

@azat azat enabled auto-merge April 10, 2025 05:38
@azat azat added this pull request to the merge queue Apr 10, 2025
Merged via the queue into ClickHouse:master with commit b84c383 Apr 10, 2025
116 of 120 checks passed
@azat azat deleted the rmv-alter-crash-fix branch April 10, 2025 05:43
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Apr 10, 2025
Cherry pick #78858 to 25.1: Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown
robot-clickhouse added a commit that referenced this pull request Apr 10, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Apr 10, 2025
Cherry pick #78858 to 25.2: Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown
robot-clickhouse added a commit that referenced this pull request Apr 10, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Apr 10, 2025
Cherry pick #78858 to 25.3: Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown
robot-clickhouse added a commit that referenced this pull request Apr 10, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created-cloud deprecated label, NOOP label Apr 10, 2025
al13n321 pushed a commit that referenced this pull request Apr 10, 2025
… after incorrect shutdown (#78945)

Co-authored-by: robot-clickhouse <[email protected]>
robot-clickhouse-ci-2 added a commit that referenced this pull request Apr 10, 2025
Backport #78858 to 25.2: Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown
robot-clickhouse-ci-1 added a commit that referenced this pull request Apr 10, 2025
Backport #78858 to 25.1: Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 10, 2025
@azat azat added the 🍃 green ci 🌿 Fixing flaky tests in CI label Apr 10, 2025
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Invalid access in MaterializedView/RefreshTask

6 participants