Fix concurrency/exception bugs in asyncFetchPackages#7929
Fix concurrency/exception bugs in asyncFetchPackages#7929Mikolaj merged 11 commits intohaskell:masterfrom
Conversation
This comment has been minimized.
This comment has been minimized.
|
I'm putting some finishing touches on these commits, then I think this should be ready for review. Need some help with the bootstrap update though, since I don't have a Linux environment ready. |
|
Great job. Re bootstrap files, I cargo-culted them recently (IIRC you just run a script), but I have no idea how they work. Please ask about them on #hackage IRC/Matrix --- I'd also like to learn. |
0bf9d04 to
7b8b934
Compare
|
Any idea who would be a good person to ask to review this? (And is there some way I can make it easier to review?) |
|
I suppose it should be tested in windows, in addition to Linux, will try if I have time but don't wait for me |
|
We have waited so long, we can wait a little longer. |
|
i've tried in windows and it has responded pretty well to ctrl+c, i had only to press it more than one time: But it is infinitely better than now for sure, great job |
|
Thanks for testing! Unfortunately I don't have access to Windows, would love to dig in to what's happening there. I hope that using |
|
@Mergifyio rebase master |
✅ Branch has been successfully rebased |
bb14fb1 to
da2330c
Compare
|
amazing |
|
CI jobs are stuck, so let me merge manually. |
|
@Mergifyio rebase |
- Sort modules and tests alphabetically (they were maddeningly close to alphabetic) - Consistently label unit test groups at top-level
This primarily trigger concurrency bugs that are fixed in the follow-up commits.
This is a step towards fixing interrupt handling during asynchronous download. The concrete bug fixed by this changes is that `try` catches asynchronous exceptions, preventing `withAsync` from interrupting it when its `body` action finishes prematurely. This manifests during `cabal build` when hitting Ctrl-C during download, e.g.: 1. Ctrl-C interrupts a curl process 2. This interrupt is converted into a UserInterrupt exception via the process package's delegate_ctlc functionality. 3. UserInterrupt bubbles up through `fetchPackage`, is caught by the `try` (fine) and writen to the result mvar. Meanwhile, `fetchPackage` continues its loop, starting to fetch the next package. 4. Some `rebuildTarget` is waiting for the interrupted download; `waitAsyncFetchPackage` rethrows UserInterrupt in that thread. 5. UserInterrupt bubbles up through `collectJob`, `execute`, the `body` action. 6. `withAsync`'s finalizer cancels the `fetchPackages` async. 7. The `fetchPackages` async receives the AsyncCancelled exception while fetching the next package (see 3. above). This interrupts the download action (it shouldn't matter whether we happen to be running curl again or not), and AsyncCancelled bubbles up through `fetchPackage`, is caught by the `try` (not fine!) and written to the mvar. But no-one is reading that mvar anymore because the build job already aborted (step 5), and that AsyncCancelled exception is lost. 8. `fetchPackages` keeps doing its thing, exiting eventually. Note that this change affects both instances of `try` in this story: `UserInterrupt` is also an asynchronous exception, so after the change the `UserInterrupt` takes a different path from step 3. Followup commits fix bugs along those paths.
This fixes deadlock in asyncFetchPackages: withAsync action1 (\_ -> action2) loses exceptions thrown by action1. If in addition, action2 has a dependency on data produced by action1 (as is the case here), this will block indefinitely. (This bug caused some of the new tests to hang, since after the change to use async-safe try, it became easier for fetchPackages to throw an exception.)
If `withTempFileName` receives an asynchronous exception (e.g. a canceled async), the bracket cleanup handler will attempt to remove the temporary file. This can fail with an IO exception. Regular bracket then throws that IO exception, swallowing the asynchronous exception. To calling code, this appears no different from an IO exception thrown from the body, and it won't be able to tell that it should be exiting promptly. This manifests concretely during temporary file clean-up of `asyncFetchPackages` on Windows (seen during unit testing in CI), where temporary file removal fails (on GHC 8.4), which leads to `concurrently` failing to cancel the outstanding download because it's handled like a regular download failure.
✅ Branch has been successfully rebased |
da2330c to
458bc3a
Compare
This fixes Ctrl-C not interrupting cabal properly while it is downloading. I believe
that might be #6322, although it would be good to verify that.
There are
twothree concrete bugs:trycatches asynchronous exceptions, preventingwithAsyncfrom interrupting it when itsbodyaction finishes prematurely.withAsyncis not safe since we don't wait for the async and hence lose exceptions it might throwbracketinwithTempFileNamecan "convert"AsyncCancelledinto a regular IO exceptionThe first bug manifests during
cabal buildwhen hitting Ctrl-C duringdownload, e.g.:
UserInterruptexceptionvia the process package's
delegate_ctlcfunctionality.fetchPackage, is caught bythe
try(fine) and written to the result mvar.Meanwhile,
fetchPackagecontinues its loop, starting tofetch the next package.
rebuildTargetis waiting for the interrupted download;waitAsyncFetchPackagerethrowsUserInterruptin that thread.UserInterruptbubbles up throughcollectJob,execute, thebodyaction.withAsync's finalizer cancels thefetchPackagesasync.fetchPackagesasync receives theAsyncCancelledexceptionwhile fetching the next package (see 3. above). This interrupts
the download action (it shouldn't matter whether we happen to
be running curl again or not), and
AsyncCancelledbubbles upthrough
fetchPackage, is caught by thetry(not fine!) andwritten to the mvar. But no-one is reading that mvar anymore
because the build job already aborted (step 5), and that
AsyncCancelledexception is lost.fetchPackageskeeps doing its thing, exiting eventually.To fix this, use
tryfrom safe-exceptions, which doesn't catchasynchronous exceptions. Note that this change affects both
instances of
tryin this story:UserInterruptis also an asynchronousexception, so after the change the
UserInterrupttakes a different pathfrom step 3.
This relates to the second bug, uncovered during testing of the fix for
the first bug:
UserInterruptcan now cause the async part ofwithAsyncto crash. That one is fixed by usingconcurrentlyinsteadof
withAsync.Finally, the third bug showed on Windows CI during testing, where temporary
file cleanup trigged by async cancellation fails with an IO exception, which is
then treated as a regular download failure, while the attempt to cancel the
async is ignored.
cabal buildfor a package with several undownloaded dependencies, and hit Ctrl-C when it gets started. Prior to these changes,cabaldoesn't exit until it's done downloading, afterwards it does. I've checked this on macOS.asyncFetchPackageswith a faked HTTP transport, simulating several failure scenarios where theasyncFetchPackagesmight not exit promptly or even deadlock. Some of these fail before the change, and now pass.Please include the following checklist in your PR: