Skip to content

clickhouse-test: send TERM to all childs (to avoid hung check triggering)#23750

Merged
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:tests-hung-check
May 1, 2021
Merged

clickhouse-test: send TERM to all childs (to avoid hung check triggering)#23750
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:tests-hung-check

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Apr 29, 2021

Hung check like here 1.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 29, 2021
@azat azat marked this pull request as draft April 29, 2021 08:11
@azat azat force-pushed the tests-hung-check branch from 86e5e83 to 1b0990c Compare April 29, 2021 08:33
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this earlier with a separate process group + atexit callbacks:
573983d#diff-3a359de18cacf146f406a7ae332fb47196aa5e0aa430eb4b157a202a3cb8e6e3R578

But that commit was later reverted because it also tried to switch to multithreading instead of multiprocessing, and that didn't go good. SIG_IGN and SIG_DFL were broken then (https://bugs.python.org/issue23395), now they are fixed but not quite but maybe it's not relevant for us.

Copy link
Copy Markdown
Member Author

@azat azat Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this earlier with a separate process group + atexit callbacks:
573983d#diff-3a359de18cacf146f406a7ae332fb47196aa5e0aa430eb4b157a202a3cb8e6e3R578

Thanks for the reference!

SIG_IGN and SIG_DFL were broken then (https://bugs.python.org/issue23395), now they are fixed but not quite but maybe it's not relevant for us.

I didn't see any issues while testing locally, and I think that it is better to kill tests and get some (even non-standard since first part of the fix is to identify the error) error instead of leaving tests in background, since hung check will find them and report, thoughts?

@azat azat force-pushed the tests-hung-check branch from 1b0990c to 6301af9 Compare April 30, 2021 06:16
@azat azat marked this pull request as ready for review April 30, 2021 08:25
@azat azat force-pushed the tests-hung-check branch from 6301af9 to d6c3309 Compare April 30, 2021 20:51
@azat azat changed the title Terminate tests in clickhouse-test to avoid hung check triggering clickhouse-test: send TERM to all childs (to avoid hung check triggering) Apr 30, 2021
…ing)

This is another try of not leaving child processes in clickhouse-test,
first one was in [1] by @akuzm:

  "I tried to do this earlier with a separate process group + atexit callbacks:
  ClickHouse@573983d#diff-3a359de18cacf146f406a7ae332fb47196aa5e0aa430eb4b157a202a3cb8e6e3R578

  But that commit was later reverted because it also tried to switch to
  multithreading instead of multiprocessing, and that didn't go good.
  SIG_IGN and SIG_DFL were broken then
  (https://bugs.python.org/issue23395), now they are fixed but not quite
  but maybe it's not relevant for us."

I looked (only briefly) through that bug report in python, but I don't
see any issues with killing child processes during testing this patch.

Plus to me it is better to get some unknown python error (and fix it
somehow) instead of leaving child processes.

v2: correctly catch INT/TERM/HUP too
@azat azat force-pushed the tests-hung-check branch from d6c3309 to 0b58a14 Compare May 1, 2021 05:47
@alexey-milovidov alexey-milovidov merged commit 1862be7 into ClickHouse:master May 1, 2021
@azat azat deleted the tests-hung-check branch May 1, 2021 21:14
# Move to a new process group and kill it at exit so that we don't have any
# infinite tests processes left
# (new process group is required to avoid killing some parent processes)
os.setpgid(0, 0)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually does not work for everything, since timeout also calls setpgid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants