
common: assert debug mutex lock is not held if !recursive #60037

Merged
batrick merged 5 commits into ceph:main from batrick:mutex-debugging-assert
Oct 9, 2024

@batrick (Member) commented Sep 28, 2024

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@batrick (Member Author) commented Sep 28, 2024

Tested with the following patch:

diff --git a/src/osdc/Journaler.cc b/src/osdc/Journaler.cc
index 40cf67702f4c..b80205652d9e 100644
--- a/src/osdc/Journaler.cc
+++ b/src/osdc/Journaler.cc
@@ -1354,7 +1354,7 @@ void Journaler::trim()
 
 void Journaler::_trim()
 {
-  if (state == STATE_STOPPING)
+  if (is_stopping())
     return;
 
   ceph_assert(!readonly);

which generates:

   -12> 2024-09-28T01:42:58.131+0000 7f26e0f74640 -1 /home/pdonnell/ceph/src/common/mutex_debug.h: In function 'void ceph::mutex_debug_detail::mutex_debug_impl<<anonymous> >::lock(bool) [with bool Recursive = false]' thread 7f26e0f74640 time 2024-09-28T01:42:58.129910+0000
/home/pdonnell/ceph/src/common/mutex_debug.h: 177: FAILED ceph_assert(recursive || !is_locked_by_me())

 ceph version 19.3.0-5270-g15c2fbe0f951 (15c2fbe0f9516c3b633f3bd40d80e52e56640b17) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11c) [0x7f26ed4c6bc9]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f26ed4c6dd0]
 3: (ceph::mutex_debug_detail::mutex_debug_impl<false>::lock(bool)+0x81) [0x55f135e35919]
 4: (Journaler::_trim()+0x3e) [0x55f1362f7056]
 5: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x584) [0x55f1362fa592]
 6: (Journaler::C_WriteHead::finish(int)+0x1b) [0x55f1363048ed]
 7: (Context::complete(int)+0x9) [0x55f135e4c645]
 8: (Finisher::finisher_thread_entry()+0x5e1) [0x7f26ed44ea2b]
 9: (Finisher::FinisherThread::entry()+0xd) [0x55f135e8220f]
 10: (Thread::entry_wrapper()+0x3f) [0x7f26ed494ae7]
 11: (Thread::_entry_func(void*)+0x9) [0x7f26ed494aff]
 12: /lib64/libc.so.6(+0x89c52) [0x7f26ec289c52]
 13: /lib64/libc.so.6(+0x10ec80) [0x7f26ec30ec80]

The new assertion seems to work as expected.
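To see why the patched `_trim()` trips the assertion, here is a minimal sketch of the locking convention the patch deliberately breaks. It assumes, as the backtrace suggests, that `trim()` takes the lock, `_trim()` runs with the lock already held, and `is_stopping()` acquires the same lock. `MiniJournal` and all of its members are hypothetical stand-ins, not Ceph's `Journaler`; the owner check in `lock()` only imitates the new assertion.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>

// Hypothetical class illustrating the convention visible in the patch:
// trim() acquires the lock, _trim() expects the caller to hold it already.
// Names and members are illustrative, not Ceph's Journaler.
class MiniJournal {
 public:
  void trim() {
    lock();
    _trim();
    unlock();
  }
  // Locking accessor. Calling it from _trim() (lock already held) is the
  // mistake the test patch introduces on purpose.
  bool is_stopping() {
    lock();
    bool s = stopping_;
    unlock();
    return s;
  }
  void stop() {
    lock();
    stopping_ = true;
    unlock();
  }
  int trims() {
    lock();
    int n = trims_;
    unlock();
    return n;
  }

 private:
  void lock() {
    // Re-locking a std::mutex from its owning thread is undefined behavior;
    // abort with a message instead, in the spirit of the new ceph_assert.
    if (owner_.load() == std::this_thread::get_id()) {
      std::fputs("FAILED: double lock of non-recursive mutex\n", stderr);
      std::abort();
    }
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    assert(owner_.load() == std::this_thread::get_id());
    owner_.store(std::thread::id());
    m_.unlock();
  }
  // Caller holds the lock, so read the member directly rather than calling
  // is_stopping(), which would lock again and abort.
  void _trim() {
    if (stopping_)
      return;
    ++trims_;
  }

  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
  bool stopping_ = false;  // guarded by m_
  int trims_ = 0;          // guarded by m_
};
```

Calling `trim()` and `is_stopping()` separately is fine; replacing the direct `stopping_` read inside `_trim()` with a call to `is_stopping()` would make the owning thread lock twice and abort, which is the failure shown in the log above.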

@batrick (Member Author) commented Sep 29, 2024

[ RUN      ] MutexDebug.NotRecursive
/home/jenkins-build/build/workspace/ceph-pull-requests/src/common/mutex_debug.h: In function 'void ceph::mutex_debug_detail::mutex_debug_impl<false>::lock(bool) [Recursive = false]' thread 7f75383fe440 time 2024-09-28T02:36:04.468287+0000
/home/jenkins-build/build/workspace/ceph-pull-requests/src/common/mutex_debug.h: 177: FAILED ceph_assert(recursive || !is_locked_by_me())
 ceph version Development (no_version) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b4) [0x7f753b2c3674]
 2: /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libceph-common.so.2(+0x16e74bf) [0x7f753b2c34bf]
 3: (ceph::mutex_debug_detail::mutex_debug_impl<false>::lock(bool)+0x3e) [0x559467bd536e]
 4: (MutexDebug_NotRecursive_Test::TestBody()+0x93c) [0x559467bd09dc]
 5: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x7b) [0x559467c42e0b]
 6: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x6a) [0x559467c26b9a]
 7: (testing::Test::Run()+0xc3) [0x559467c048a3]
 8: (testing::TestInfo::Run()+0xda) [0x559467c055ea]
 9: (testing::TestSuite::Run()+0xfb) [0x559467c05e0b]
 10: (testing::internal::UnitTestImpl::RunAllTests()+0x442) [0x559467c13522]
 11: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x7b) [0x559467c45edb]
 12: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x6a) [0x559467c2949a]
 13: (testing::UnitTest::Run()+0xcb) [0x559467c1308b]
 14: (RUN_ALL_TESTS()+0x11) [0x559467be2911]
 15: main()
 16: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7538fc0d90]
 17: __libc_start_main()
 18: _start()

Fun. Moving to draft.

@batrick batrick marked this pull request as draft September 29, 2024 13:34
@github-actions github-actions bot added the tests label Sep 30, 2024
@batrick batrick marked this pull request as ready for review September 30, 2024 19:54
@batrick batrick requested review from cbodley and idryomov September 30, 2024 19:54
@cbodley cbodley requested review from rzarzynski and tchaikov October 1, 2024 13:13
There are appropriate checks for unlock and post-lock, but nothing to stop the
undefined behavior of double-locking a non-recursive mutex.

Signed-off-by: Patrick Donnelly <[email protected]>
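The commit above adds a pre-lock check alongside the existing unlock and post-lock checks. A minimal sketch of that idea, loosely modeled on the `recursive || !is_locked_by_me()` condition from the failure output; `DebugMutex`, `check()`, and `lock_count()` are hypothetical names, not Ceph's `mutex_debug_impl` API. The check has to run before the underlying `lock()`, because once a non-recursive mutex is re-locked by its owner the behavior is already undefined.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <type_traits>

// Hypothetical debug mutex, loosely modeled on the checks discussed in this
// PR; not Ceph's mutex_debug_impl. Recursive picks the underlying mutex type.
template <bool Recursive>
class DebugMutex {
 public:
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }
  // Lock depth; only meaningful to the owning thread.
  int lock_count() const {
    assert(is_locked_by_me());
    return nlock_;
  }

  void lock() {
    // Pre-lock check: a non-recursive mutex must not already be held by
    // this thread, or the m_.lock() below is undefined behavior.
    check(Recursive || !is_locked_by_me(), "recursive || !is_locked_by_me()");
    m_.lock();
    // Post-lock check: a freshly acquired non-recursive mutex has depth zero.
    check(Recursive || nlock_ == 0, "recursive || nlock == 0");
    owner_.store(std::this_thread::get_id());
    ++nlock_;
  }

  void unlock() {
    // Unlock check: only the owning thread may release the mutex.
    check(is_locked_by_me(), "is_locked_by_me()");
    if (--nlock_ == 0)
      owner_.store(std::thread::id());
    m_.unlock();
  }

 private:
  // Always-on check, like ceph_assert (unaffected by NDEBUG).
  static void check(bool ok, const char* expr) {
    if (!ok) {
      std::fprintf(stderr, "FAILED check(%s)\n", expr);
      std::abort();
    }
  }

  std::conditional_t<Recursive, std::recursive_mutex, std::mutex> m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
  int nlock_ = 0;  // modified only while m_ is held
};
```

The recursive variant tolerates re-locking and clears the owner only when the depth drops back to zero; a second `lock()` on a `DebugMutex<false>` from the same thread aborts with a message instead of deadlocking or silently corrupting state.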
Now that we confirm a lock is not held in mutex_debug::lock.

Signed-off-by: Patrick Donnelly <[email protected]>
Signed-off-by: Patrick Donnelly <[email protected]>
The C++ standard does not require implementations to throw std::system_error
when a non-recursive lock is locked twice by the same thread. Our debug_mutex
implementation now detects this with a ceph_assert, which aborts the process,
so the error can no longer be caught as an exception.

Signed-off-by: Patrick Donnelly <[email protected]>
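Because a failed `ceph_assert` aborts the process rather than throwing, a test that previously expected a `std::system_error` from a double lock has to observe the process dying instead. The POSIX-only sketch below shows that idea with `fork()` and `waitpid()`. `dies_with_abort()` and `failing_assert_stand_in()` are hypothetical helpers, and the stand-in simply calls `std::abort()` where a real test would double-lock a non-recursive debug mutex.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs fn() in a forked child and reports whether the child was killed by
// SIGABRT, which is how std::abort() (and hence a failed assert) ends a process.
template <typename Fn>
bool dies_with_abort(Fn fn) {
  pid_t pid = fork();
  if (pid < 0)
    return false;            // fork failed: report "did not die"
  if (pid == 0) {
    fn();
    _exit(0);                // only reached if fn() returned normally
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return false;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}

// Hypothetical stand-in for a double lock caught by an always-on assertion.
// The try/catch shows the point of the commit: abort() raises SIGABRT, it
// does not throw, so there is nothing for the catch clause to intercept.
void failing_assert_stand_in() {
  try {
    std::abort();
  } catch (...) {
    // Never reached.
  }
}
```

GoogleTest's death-test macros (for example `ASSERT_DEATH`) package the same run-in-a-child-and-check-how-it-died pattern, with matching on the child's stderr output.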
@batrick batrick force-pushed the mutex-debugging-assert branch from fba7f39 to e107ddf Compare October 2, 2024 14:49
@batrick batrick force-pushed the mutex-debugging-assert branch 6 times, most recently from 44cf987 to 73ef179 Compare October 3, 2024 13:00
@batrick (Member Author) commented Oct 3, 2024

71/302 Test #118: unittest_mutex_debug ...................... Passed 0.41 sec

@batrick batrick requested review from cbodley and idryomov October 3, 2024 15:42
@batrick (Member Author) commented Oct 3, 2024

jenkins test make check arm64

@batrick batrick force-pushed the mutex-debugging-assert branch from 73ef179 to a48080a Compare October 3, 2024 20:19
@idryomov (Contributor) commented Oct 3, 2024

jenkins test make check

@batrick (Member Author) commented Oct 4, 2024

FATAL: Channel "hudson.remoting.Channel@2132300:JNLP4-connect connection from 8.43.84.3/8.43.84.3:52671": Remote call on JNLP4-connect connection from 8.43.84.3/8.43.84.3:52671 failed. The channel is closing down or has closed down
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:143)
	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@2132300:JNLP4-connect connection from 8.43.84.3/8.43.84.3:52671": Remote call on JNLP4-connect connection from 8.43.84.3/8.43.84.3:52671 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:996)
	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1147)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:538)
	at hudson.model.Run.execute(Run.java:1894)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
	at hudson.model.ResourceController.execute(ResourceController.java:101)
	at hudson.model.Executor.run(Executor.java:442)

https://jenkins.ceph.com/job/ceph-pull-requests/144450/consoleFull

@batrick (Member Author) commented Oct 4, 2024

jenkins test make check

@batrick (Member Author) commented Oct 4, 2024

@batrick (Member Author) commented Oct 9, 2024

@batrick batrick merged commit 3c9dd67 into ceph:main Oct 9, 2024
@batrick batrick deleted the mutex-debugging-assert branch October 9, 2024 00:56

4 participants