client: Fix a deadlock when osd is full #60411
Conversation
I faced similar stalls while writing an async io test case after filling up the cluster (fbff10a), but with the fuse client. From what I found after discussions with greg and reading
Are you saying this was seen in an fs suite test run? If yes, could you point me to the relevant run link if you have it? That brings up the question -- are our tests enough to exercise this case, or did we miss it in our runs thinking things could go bad when the cluster is full?
I've lost track of the runs; I need to find them in the PR, and they are quite old so I'm not sure if the links are live. But yes - when the cluster is full and the ratios are near 1 (or just left at the default), everything goes downhill and the stall is inevitable. I guess this happens with sync io too; I can't recall exactly. I can give this a run.
We do exercise this case, but with comfortable ratios which always yield the expected results.
jenkins test api
I don't think that is exactly this bug, since the deadlock here happens after fixing the corruption as part of 3ebe974.
Let's see if we can have a test case for this. @kotreshhr did mention that it could be possible to test this via the client blocklist codepath.
I didn't find an easy way to add the test in teuthology, but I could manually reproduce this and test it. This fix breaks the lock dependency, so more work is required here. Traceback with a debug build: Lock dependency broken:
jenkins test make check arm64
@batrick I think you missed the update #60411 (comment). This fix is breaking the lock dependency. I need to see what's the best way to fix this. Any ideas?
Yes, in the back of my mind I was wondering about that. I think
The
I think doing the cancel of the
How about queuing the context callback in a finisher thread? Client has
I think the assumption that op_cancel always succeeds in the following code is required so as not to cause consistency issues. I tried a solution of unlocking and relocking the rwlock and slock (session lock) before calling the callback, but that resulted in two failures.
So releasing and reacquiring the rwlock and slock for the sake of calling the registered callback leads to issues in multiple places.
I think that would require a non-trivial refactor in many places in the objecter. Were you able to test queuing via a finisher thread?
It worked the first time. I will have to do a bit more testing to gain confidence. I will update once I do that.
@vshankar I tested the approach of queuing the context callback using the finisher thread. It works! I didn't see any issues. Here is the manual test that was done.
While testing fio with nfs ganesha, I hit an osd crash. I have reported it here: https://tracker.ceph.com/issues/68688?next_issue_id=68683
* refs/pull/60411/head:
	qa: Add async io test to nfs ganesha using fio
	client: Fix a deadlock when osd is full

Reviewed-by: Venky Shankar <[email protected]>
Reviewed-by: Patrick Donnelly <[email protected]>
Reviewed-by: Dhairya Parmar <[email protected]>
(running this through tests again)
duh! all 10 jobs failed with
@kotreshhr o_O
@kotreshhr it could be an offending PR in https://tracker.ceph.com/issues/68968
It does. The tracker update (https://tracker.ceph.com/issues/68968#note-1) was just for my tracking so that I don't accidentally merge the change. The test branch still has the change. See: https://github.com/vshankar/ceph/commits/wip-vshankar-testing-20241118.055430-debug/
@vshankar The failures are in the read path, as seen below. Could this be because of #60507? This fix should not affect reads since it is in the write path.
It's possible, yes. I'll have to double check though.
* refs/pull/60411/head:
	qa: Add async io test to nfs ganesha using fio
	client: Fix a deadlock when osd is full

Reviewed-by: Dhairya Parmar <[email protected]>
Reviewed-by: Venky Shankar <[email protected]>
Reviewed-by: Patrick Donnelly <[email protected]>
Problem:

When the osd is full, the client receives the notification and cancels the ongoing writes. If the ongoing writes are async, this can cause a deadlock, as the registered async callback also takes the 'client_lock', which handle_osd_map takes at the beginning. op_cancel_writes calls the callback registered for the async write synchronously while holding the 'client_lock', causing the deadlock.

Earlier approach:

It was first attempted to solve this issue by calling 'op_cancel_writes' without holding 'client_lock'. But this broke the lock dependency between the objecter's 'rwlock' and the async write's callback taking 'client_lock': the 'client_lock' should always be taken before 'rwlock'. So that approach was dropped in favour of the current one.

Solution:

Use C_OnFinisher for the objecter async write callback, i.e., wrap the async write's callback using the Finisher. This queues the callback to the Finisher's context queue, which the finisher thread picks up and executes, thus avoiding the deadlock.

Testing:

The fix is tested in the vstart cluster with the following reproducer.

1. Mount the cephfs volume using nfs-ganesha at /mnt
2. Run fio on /mnt in one terminal
3. In another terminal, blocklist the nfs client session
4. The fio would hang

It reproduces in the vstart cluster most of the time; I think that's because it's slow. The same test written for teuthology does not reproduce the issue. The test expects one or more writes to be ongoing in rados when the client is blocklisted for the deadlock to be hit.

Stripped down version of the traceback:
----------
#0  0x00007f4d77274960 in __lll_lock_wait ()
#1  0x00007f4d7727aff2 in pthread_mutex_lock@@GLIBC_2.2.5 ()
#2  0x00007f4d7491b0a1 in __gthread_mutex_lock (__mutex=0x7f4d200f99b0)
#3  std::mutex::lock (this=<optimized out>)
#4  std::scoped_lock<std::mutex>::scoped_lock (__m=..., this=<optimized out>, this=<optimized out>, __m=...)
#5  Client::C_Lock_Client_Finisher::finish (this=0x7f4ca0103550, r=-28)
#6  0x00007f4d74888dfd in Context::complete (this=0x7f4ca0103550, r=<optimized out>)
#7  0x00007f4d7498850c in std::__do_visit<...>(...) (__visitor=...)
#8  std::visit<Objecter::Op::complete(...) (__visitor=...)
#9  Objecter::Op::complete(...) (e=..., e=..., r=-28, ec=..., f=...)
#10 Objecter::Op::complete (e=..., r=-28, ec=..., this=0x7f4ca022c7f0)
#11 Objecter::op_cancel (this=0x7f4d200fab20, s=<optimized out>, tid=<optimized out>, r=-28)
#12 0x00007f4d7498ea12 in Objecter::op_cancel_writes (this=0x7f4d200fab20, r=-28, pool=103)
#13 0x00007f4d748e1c8e in Client::_handle_full_flag (this=0x7f4d200f9830, pool=103)
#14 0x00007f4d748ed20c in Client::handle_osd_map (m=..., this=0x7f4d200f9830)
#15 Client::ms_dispatch2 (this=0x7f4d200f9830, m=...)
#16 0x00007f4d75b8add2 in Messenger::ms_deliver_dispatch (m=..., this=0x7f4d200ed3e0)
#17 DispatchQueue::entry (this=0x7f4d200ed6f0)
#18 0x00007f4d75c27fa1 in DispatchQueue::DispatchThread::entry (this=<optimized out>)
#19 0x00007f4d77277c02 in start_thread ()
#20 0x00007f4d772fcc40 in clone3 ()
--------

Fixes: https://tracker.ceph.com/issues/68641
Signed-off-by: Kotresh HR <[email protected]>
The fio test is carved out as a separate PR, #61143, to run against main and validate whether the issue mentioned in #60411 (comment) exists in main as well.
jenkins test make check
@kotreshhr I spoke too soon -- similar failure without this change. See: https://pulpito.ceph.com/vshankar-2024-12-19_10:00:45-fs-wip-vshankar-testing-20241219.063429-debug-testing-default-smithi/8044375/
ok, then there is a bug in the main branch?
yes. I'll create a redmine tracker for the failure. This fix can be merged then.
@vshankar The qa test PR #61143 is merged, and you mentioned the issue exists (#60411 (comment)) without this PR. How do we proceed here?
I'm running this through a sub-suite and will merge. Should be done tomorrow.
This PR is under test in https://tracker.ceph.com/issues/69375.
@kotreshhr I could have merged it without running it through QA again, but I included it just so that the change is tested once more. I don't expect anything to fail though (apart from the failure we already have).
When the osd is full, the client receives the notification and cancels the ongoing writes. If the ongoing writes are async, this can cause a deadlock, as the registered async callback also takes the 'client_lock', which handle_osd_map takes at the beginning.
Stripped down version of the traceback:
This patch fixes the same.
Fixes: https://tracker.ceph.com/issues/68641