-
Notifications
You must be signed in to change notification settings - Fork 29.7k
Fix process leaks in tool tests #51382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie. Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing. |
| } on FileSystemException catch (error) { | ||
| print('Failed to delete ${directory.path}: $error'); | ||
| } | ||
| directory.deleteSync(recursive: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to leave this for now in case we see flakes around this pop up again. Should I add a TODO to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual reason these files were failing to delete was because we had processes that weren't being killed before trying to delete files in use, which on Windows causes exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you link to the Dart SDK bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that's the right bug to link here - if anything I'd say file a new bug to track removing this code once we're convinced the current change hasn't re-introduced old flakes.
|
|
||
| // TODO(dnfield): This is racy. If dart-lang/sdk#40759 can be resolved, we | ||
| // should remove this. | ||
| Future<void> _ensureDead(int pid, int tries) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uber nit: shouldn't be called ensureX if it can't ensure it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to tryWaitForPidDeath, and moved it to test_utils.dart so it can be reused in the daemon_mode_test for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
|
This seems reasonable to me, but I'd wait for @zanderso to take a look too |
|
Initial thoughts:
|
dev/ci/docker_linux/Dockerfile
Outdated
| ca-certificates \ | ||
| gnupg | ||
| gnupg \ | ||
| procps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherfujino - should I split this into a separate PR?
This is needed so that kill is available in the container.
|
@zanderso - I updated the PR description. The closest bug I can think of to this is #29141. The resolution of that issue was to make an outer harness just kill processes at the end of it. We can't quite do that in these tests. As to whether it would be good enough to avoid the |
|
We also might still have a real bug such that in certain debugging scenarios, the Windows VM leaks. That seems to be the case in these tests, without explicitly killing it. |
|
Hm. This seemed to work locally on a Windows box, but I think the Windows instance we're using is just slower... Not sure why the process is still sticking around. |
|
Discussed offline. This approach will still be flaky, and it's not clear that it's any less flaky than the current approach (even though it may avoid flakes in many common cases). I'm going to file an issue specifically about some of the things I've discovered here. I'll leave this open for now - hopefully @Hixie might have some good input on this as the author of many of these tests. |
|
I'm going to close this - it's not ready as is and I don't have a good sense of when it will be. |
The integration tests in the tool invoke Flutter to:
These processes leak. The shell script/batch script files are not set up to wait for the
dartprocess they invoke to finish, and thus they can exit (and complete theprocess.exitCodefuture) before the Dart VM process they fork actually finishes. On Windows, this is particularly bad, since the leaked process(es) end up holding an exclusive lock on the temp directory, meaning we fail to clean it up. We've been swallowing that under the assumption that it was a permissions issue, but it appears to not be a permissions issue and just be an issue of leaked processes.This patch adds some additional waits/attempts to terminate the Dart VM(s) that the tests spawn. In cases where we cannot be sure that the spawned VM has terminated, it adds some additional OS specific waiting to check that the PID the VM created is no longer active (on Windows in
tasklistand on POSIX in response tokill -0).We should be able to further improve the test_driver logic if we can get dart-lang/sdk#40759 resolved. Alternatively, we could look at making
flutter.batwait for Dart to exit when it launches, but I'm a bit scared to touch that logic since it has some comments about how it has to work in a bunch of scenarios I'm not sure we have tests for (/cc @goderbauer). The Dart feature would be useful regardless of this particular instance of usage though.