qa/tasks: watchdog should terminate thrasher #58282

Merged
NitzanMordhai merged 2 commits into ceph:main from NitzanMordhai:wip-nitzan-daemonwatchdog-should-terminate-thrasher-when-bark
Aug 6, 2024

@NitzanMordhai NitzanMordhai commented Jun 26, 2024

If a thrasher exception occurs, the do_dump_ops thread will continue looping until the Teuthology timeout is reached.
The watchdog should terminate the thrasher to free up resources.

Fixes: https://tracker.ceph.com/issues/66698
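
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the Ceph code: it uses the standard threading module rather than greenlets, and dump_ops_loop, thrash, and the stopping event are hypothetical stand-ins. It shows that recording an exception does not, by itself, end a helper loop that only checks a stop flag.

```python
import threading
import time


def dump_ops_loop(stopping, interval=0.01):
    """Hypothetical stand-in for a do_dump_ops-style helper loop."""
    while not stopping.is_set():
        # Real code would dump in-flight ops here; this sketch only waits.
        stopping.wait(interval)


def thrash(errors):
    """Hypothetical thrasher body that fails immediately."""
    try:
        raise RuntimeError("thrasher failed")
    except RuntimeError as exc:
        # The failure is recorded, but nothing tells the helper loop to stop.
        errors.append(exc)


stopping = threading.Event()
errors = []
dumper = threading.Thread(target=dump_ops_loop, args=(stopping,), daemon=True)
thrasher = threading.Thread(target=thrash, args=(errors,))
dumper.start()
thrasher.start()
thrasher.join()

time.sleep(0.05)
alive_after_failure = dumper.is_alive()  # helper is still looping

stopping.set()  # the step the watchdog is expected to trigger
dumper.join(timeout=1)
alive_after_stop = dumper.is_alive()
```

Without the explicit stopping.set() call, the helper would keep running until something external (in the PR's case, the Teuthology timeout) ends the job.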

@NitzanMordhai NitzanMordhai requested a review from jdurgin June 26, 2024 13:09
@github-actions github-actions bot added the tests label Jun 26, 2024
@NitzanMordhai NitzanMordhai requested review from a team and rzarzynski June 26, 2024 13:24
for thrasher in self.thrashers:
    if thrasher.exception is not None:
        self.log("{name} failed".format(name=thrasher.name))
        thrasher.do_join()

Only some thrashers implement do_join, and honestly it's a poorly named method. There is already a Greenlet.join method, which is analogous to pthread_join. Those methods do not imply "stopping" the thread at all.

Let's change thrasher.py:

# thrasher.py, inside class Thrasher
def stop(self):
    raise NotImplementedError(...)

and implement it for thrashers not inheriting from ThrasherGreenlet (which has its own stopping logic). Then:

# thrasher.py, inside class Thrasher
def stop_and_join(self):
    self.stop()
    return self.join()

Call that here.
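
Put together, the suggestion might look roughly like the sketch below. It is an illustration under assumptions, not the merged code: the join method on the base class, the RecordingThrasher subclass, and the bark function are hypothetical names introduced only to show the call order and that only failed thrashers are stopped.

```python
class Thrasher:
    """Sketch of the base class contract suggested in the review."""

    def __init__(self, name):
        self.name = name
        self.exception = None

    def stop(self):
        raise NotImplementedError("subclasses must implement stop()")

    def join(self):
        # Assumption: shown on the base class only to keep the sketch
        # self-contained; real thrashers get join() from their thread type.
        raise NotImplementedError("subclasses must implement join()")

    def stop_and_join(self):
        # Request shutdown first, then wait for the thrasher to finish.
        self.stop()
        return self.join()


class RecordingThrasher(Thrasher):
    """Hypothetical subclass that only records the order of calls."""

    def __init__(self, name):
        super().__init__(name)
        self.calls = []

    def stop(self):
        self.calls.append("stop")

    def join(self):
        self.calls.append("join")
        return "joined"


def bark(thrashers, log):
    """Watchdog-style sweep: stop and join only the thrashers that failed."""
    for thrasher in thrashers:
        if thrasher.exception is not None:
            log("{name} failed".format(name=thrasher.name))
            thrasher.stop_and_join()


healthy = RecordingThrasher("osd")
failed = RecordingThrasher("mds")
failed.exception = RuntimeError("boom")
messages = []
bark([healthy, failed], messages.append)
```

Keeping stop and join separate means a caller can still wait on a thrasher without forcing it to shut down, which is the distinction the review comment draws.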

@batrick thanks! I added those changes and also fixed the other thrashers.

@batrick batrick removed the needs-qa label Jul 1, 2024
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-daemonwatchdog-should-terminate-thrasher-when-bark branch from 880d118 to 729f28a Compare July 2, 2024 08:55
@NitzanMordhai NitzanMordhai requested a review from batrick July 2, 2024 08:55

batrick commented Jul 3, 2024

jenkins test make check

batrick commented Jul 3, 2024

jenkins test api

batrick commented Jul 3, 2024

@batrick batrick added the core label Jul 23, 2024

batrick commented Jul 23, 2024

@NitzanMordhai @ljflores @rzarzynski fs QA run looked fine. I suspect you want to run this through rados?

@github-actions
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

yuriw commented Aug 4, 2024

ljflores commented Aug 5, 2024

@NitzanMordhai can you resolve the conflicts?

If a thrasher exception occurs, the do_dump_ops thread will continue
looping until the Teuthology timeout is reached.
The watchdog should terminate the thrasher to free up resources.

Fixes: https://tracker.ceph.com/issues/66698
Signed-off-by: Nitzan Mordechai <[email protected]>

Thrashers that do not inherit from ThrasherGreenlet previously used a
method called do_join, which combined stop and join functionality. To
ensure consistency and clarity, we want all thrashers to use separate
stop, join, and stop_and_join methods.

This commit renames methods and implements missing stop and stop_and_join
methods in thrashers that did not inherit from ThrasherGreenlet.

Fixes: https://tracker.ceph.com/issues/66698
Signed-off-by: Nitzan Mordechai <[email protected]>
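
For a thrasher that is not greenlet-based, the split described in this commit message could be sketched as follows. This is a hypothetical example built on threading.Thread; the class name ThreadThrasher and the stop_event attribute are assumptions for illustration and do not come from the Ceph tree.

```python
import threading


class ThreadThrasher(threading.Thread):
    """Hypothetical thrasher built on threading.Thread, not ThrasherGreenlet."""

    def __init__(self):
        super().__init__(daemon=True)
        self.stop_event = threading.Event()

    def run(self):
        while not self.stop_event.is_set():
            # Real thrashing work would go here; this sketch only waits.
            self.stop_event.wait(0.01)

    def stop(self):
        """Only request shutdown; do not wait for the thread."""
        self.stop_event.set()

    # join() is inherited from threading.Thread and only waits.

    def stop_and_join(self, timeout=None):
        """Combine the two steps that a do_join-style method used to hide."""
        self.stop()
        return self.join(timeout)


worker = ThreadThrasher()
worker.start()
worker.stop_and_join(timeout=1)
```

Here join keeps its usual "wait only" meaning, and the combined behaviour gets a name that says what it does.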
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-daemonwatchdog-should-terminate-thrasher-when-bark branch from 729f28a to a035b5a Compare August 6, 2024 06:57
@NitzanMordhai

@NitzanMordhai can you resolve the conflicts?

done

@NitzanMordhai NitzanMordhai merged commit 58a668d into ceph:main Aug 6, 2024
@NitzanMordhai NitzanMordhai deleted the wip-nitzan-daemonwatchdog-should-terminate-thrasher-when-bark branch August 6, 2024 11:26