Fix for https://github.com/typelevel/fs2/issues/3590#3591
Conversation
**djspiewak** left a comment
This is great sleuthing. Also I think it chooses the right tradeoff on cancelation, even though it's somewhat nerve-wracking, since the strong prescription in this ecosystem is to strictly backpressure cancelation and force users to fork if that semantic is undesirable.
This seems to match the style around it. I checked import ordering and there doesn't seem to be a consistent pattern but feel free to reformat as necessary or tell me how you want them done.
I admit to not having a great mental model for cancellation, nor understanding enough to know how I'd support it, but I presume we could update the recursive wait to check for cancellation.

I don't know if it's the same, but I wonder if a similar wait loop would be useful in the PR immediately below this one, right here:
It already does that :) That's the benefit of an autocancelable sleep.
```scala
.make(F.delay(selector.provider.openServerSocketChannel())) { ch =>
  F.delay(ch.close())
  def waitForDeregistration: F[Unit] =
    F.delay(ch.isRegistered()).ifM(F.sleep(2.millis) >> waitForDeregistration, F.unit)
```
Good question. I don't have a great intuition for this but I tried some longer values too and my test case slowed way down. My guess is that the right value for this is going to vary a lot by OS and hardware so having a fixed value is sort of inherently problematic.
Ultimately I chose that value as a compromise: short enough that a human waiting on it would never notice, but long enough (in computer time) that I was hopefully not sending the computer into a busy loop.
I am happy to try to take some measurements on the hardware I have available, or take suggestions of better compromises, or just document what I did with a comment.
To clarify another point: my assumption is that the vast majority of processes that bind to a port as a server do so once over the lifetime of the application, and this loop will run as the process shuts down. Adding ~2ms to shutdown felt like low overhead for something that happens after a server has finished its useful work.
On my M2 Max, based on a quick and dirty measurement of 1000 samples run in a row (with a granularity of 1 ms), I get waits ranging from 1 ms (so probably less) to 88 ms, with an average just under 18 ms and a standard deviation of 13.6. The longest wait looped 44 iterations; the shortest, 1. This suggests 2 ms is an okay value, especially considering this work most likely occurs after a server has served its useful purpose.
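For illustration, here is a rough, hypothetical sketch of how such a measurement might be made with plain JDK NIO (no fs2 involved; the background thread stands in for the selector event loop, and all names here are made up for the sketch):

```scala
import java.nio.channels.{ClosedSelectorException, SelectionKey, Selector, ServerSocketChannel}

object MeasureDeregistration {
  // Measure the time (in ms) from channel close until the selector
  // actually deregisters it, polling isRegistered at 1 ms granularity.
  def measureMs(): Long = {
    val selector = Selector.open()
    // Background thread driving the selector, standing in for the event loop.
    val loop = new Thread(() => {
      try while (selector.isOpen) selector.select(10)
      catch { case _: ClosedSelectorException => () }
    })
    loop.setDaemon(true)
    loop.start()

    val ch = ServerSocketChannel.open()
    ch.configureBlocking(false)
    ch.register(selector, SelectionKey.OP_ACCEPT)
    ch.close() // cancels the key, but deregistration lags until the next select

    val start = System.nanoTime()
    var polls = 0
    while (ch.isRegistered && polls < 1000) { Thread.sleep(1); polls += 1 }
    selector.close()
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit =
    println(s"deregistered after ~${measureMs()} ms")
}
```

Absolute numbers from a harness like this will of course vary with OS, hardware, and JVM, which is part of why any fixed polling interval is a compromise.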
Thanks!
This is a potential fix for #3590 which has been plaguing us for some time.
The key insight here is from #3590 (comment) and the JavaDoc for `SelectableChannel.isRegistered`, which says:

> A channel may also remain registered for some time after it is closed.

This implies that even after the channel is closed, the port may still be bound in the OS, and subsequent attempts to bind that port may fail.
I couldn't find evidence of a way to be notified when a channel is no longer registered beyond simply polling, but since this is likely to happen at the end of a server's life (with the exception of any case where an application repeatedly opens servers on the same port), a short interval of polling (using the IO runtime's non-blocking sleep) is acceptable.
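To illustrate the shape of the fix, the poll-until-deregistered loop can be sketched without the effect system (a hypothetical standalone version; in the PR itself the loop is written with `F.delay`/`F.sleep`, so the sleep is the runtime's non-blocking, cancelable sleep rather than a thread block, and the names below are invented for the sketch):

```scala
import scala.annotation.tailrec

object PollSketch {
  // Poll `stillRegistered` at a fixed interval until it reports false.
  // The real code uses F.sleep, which suspends the fiber instead of a thread.
  @tailrec
  def awaitDeregistration(stillRegistered: () => Boolean, intervalMs: Long = 2): Unit =
    if (stillRegistered()) {
      Thread.sleep(intervalMs)
      awaitDeregistration(stillRegistered, intervalMs)
    }

  def main(args: Array[String]): Unit = {
    // Simulate a channel that deregisters ~10 ms after "close".
    val deadline = System.nanoTime() + 10000000L
    awaitDeregistration(() => System.nanoTime() < deadline)
    println("deregistered")
  }
}
```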