qa/tasks: watchdog should terminate thrasher#58282
Conversation
qa/tasks/daemonwatchdog.py
Outdated
| for thrasher in self.thrashers: | ||
| if thrasher.exception is not None: | ||
| self.log("{name} failed".format(name=thrasher.name)) | ||
| thrasher.do_join() |
There was a problem hiding this comment.
Only some thrashers implement do_join and honestly it's a poorly named method. There already is a Greenlet.join method which is an analogue to the pthread_join. Those methods do not imply "stopping" the thread at all.
Let's change thrasher.py:
def Thrasher.stop(self):
raise NotImplementedError(...)
and implement for thrashers not inheriting from ThrasherGreenlet (which has its own stopping logic). Then:
def Thrasher.stop_and_join(self):
self.stop()
return self.join()
Call that here.
There was a problem hiding this comment.
@batrick thanks! i added those changes and also fixed the others thrashers
880d118 to
729f28a
Compare
|
jenkins test make check |
|
jenkins test api |
|
This PR is under test in https://tracker.ceph.com/issues/66822. |
|
@NitzanMordhai @ljflores @rzarzynski |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
@NitzanMordhai can you resolve the conflicts? |
If a thrasher exception occurs, the do_dump_ops thread will continue looping until the Teuthology timeout is reached. The watchdog should terminate the thrasher to free up resources. Fixes: https://tracker.ceph.com/issues/66698 Signed-off-by: Nitzan Mordechai <[email protected]>
Thrashers that do not inherit from ThrasherGreenlet previously used a method called do_join, which combined stop and join functionality. To ensure consistency and clarity, we want all thrashers to use separate stop, join, and stop_and_join methods. This commit renames methods and implements missing stop and stop_and_join methods in thrashers that did not inherit from ThrasherGreenlet. Fixes: https://tracker.ceph.com/issues/66698 Signed-off-by: Nitzan Mordechai <[email protected]>
729f28a to
a035b5a
Compare
done |
|
Sorry to comment on a merged PR, but this change seems to be causing failures like: https://pulpito.ceph.com/vshankar-2024-08-07_04:36:45-fs-wip-vshankar-testing-20240806.162829-debug-testing-default-smithi/7841593/ |
|
@vshankar thanks, missed that one, i created https://tracker.ceph.com/issues/67496 and working on it |
stopping was set by PR ceph#58282 to bool instead of set() Fixes: https://tracker.ceph.com/issues/67496 Signed-off-by: Nitzan Mordechai <[email protected]>
If a thrasher exception occurs, the do_dump_ops thread will continue looping until the Teuthology timeout is reached.
The watchdog should terminate the thrasher to free up resources.
Fixes: https://tracker.ceph.com/issues/66698
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e