
Fix for https://github.com/typelevel/fs2/issues/3590 (#3591)

Merged
mpilquist merged 4 commits into typelevel:main from mtomko:main on Aug 26, 2025

Conversation

@mtomko
Contributor

@mtomko mtomko commented Aug 22, 2025

This is a potential fix for #3590 which has been plaguing us for some time.

The key insight here is from #3590 (comment) and the JavaDoc for SelectableChannel.isRegistered, which says:

Due to the inherent delay between key cancellation and channel deregistration, a channel may remain registered for some time after all of its keys have been cancelled. A channel may also remain registered for some time after it is closed.

This implies that even after the channel is closed, the port may still be bound in the OS, and subsequent attempts to bind that port may fail.

I couldn't find any way to be notified when a channel is no longer registered other than simply polling. But since this is likely to happen at the end of a server's life (except in the case where an application repeatedly opens servers on the same port), a short interval of polling (using the IO runtime's non-blocking sleep) seems acceptable.
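As a standalone illustration of the JavaDoc behavior quoted above, here is a minimal plain-Java NIO sketch (not the fs2 code; the class and method names are invented for the demo) showing that deregistration of a closed channel only completes once the selector performs a select operation:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class DeregistrationDemo {
    // Returns whether the channel is still registered after one select operation.
    public static boolean registeredAfterSelect() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel ch = ServerSocketChannel.open();
        ch.configureBlocking(false);
        ch.bind(new InetSocketAddress(0)); // ephemeral port, just for the demo
        ch.register(selector, SelectionKey.OP_ACCEPT);

        ch.close(); // cancels the key, but deregistration is deferred...
        // ch.isRegistered() may still report true at this point.
        selector.selectNow(); // ...until a select operation flushes cancelled keys
        boolean stillRegistered = ch.isRegistered();
        selector.close();
        return stillRegistered;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registeredAfterSelect());
    }
}
```

In fs2's case nothing in the resource finalizer performs a select on the channel's behalf, hence the polling approach below.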

Member

@djspiewak djspiewak left a comment


This is great sleuthing. Also I think it chooses the right tradeoff on cancelation, even though it's somewhat nerve-wracking, since the strong prescription in this ecosystem is to strictly backpressure cancelation and force users to fork if that semantic is undesirable.

This seems to match the style around it. I checked import ordering and there doesn't seem to be a consistent pattern, but feel free to reformat as necessary or tell me how you want them done.
@mtomko
Contributor Author

mtomko commented Aug 23, 2025

I admit to not having a great mental model for cancellation, nor understanding it well enough to know how I'd support it, but I presume we could update the recursive wait to check for cancellation.

@mtomko
Contributor Author

mtomko commented Aug 23, 2025

I don't know if it's the same but I wonder if a similar wait loop would be useful in the PR immediately below this one, right here:

https://github.com/typelevel/fs2/pull/3589/files#diff-94fe0071922f73eaef1a6257af9dff8f0b7d1cac23ecf1bbee9018607884456eR43

@djspiewak
Member

update the recursive wait to check for cancellation

It already does that :) That's the benefit to autocancelable flatMap. So you implemented it correctly imo.

.make(F.delay(selector.provider.openServerSocketChannel())) { ch =>
  F.delay(ch.close())
  def waitForDeregistration: F[Unit] =
    F.delay(ch.isRegistered()).ifM(F.sleep(2.millis) >> waitForDeregistration, F.unit)
Member


Why 2 (vs 1 or 10 or ...)?

Contributor Author


Good question. I don't have a great intuition for this but I tried some longer values too and my test case slowed way down. My guess is that the right value for this is going to vary a lot by OS and hardware so having a fixed value is sort of inherently problematic.

Ultimately I chose that value as a duration that seemed like a compromise: short enough that a human waiting on it would never notice but long enough (in computer time) that I was hopefully not sending the computer in a busy loop.

I am happy to try to take some measurements on the hardware I have available, or take suggestions of better compromises, or just document what I did with a comment.
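For illustration, the polling loop can be sketched in plain Java with a blocking sleep (the actual fix uses the effect system's non-blocking F.sleep; the helper names and the background selector thread below are assumptions for the demo, not fs2 internals):

```java
import java.net.InetSocketAddress;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class PollingSketch {
    // Blocking analogue of the PR's recursive waitForDeregistration.
    static void waitForDeregistration(ServerSocketChannel ch, long intervalMillis)
            throws InterruptedException {
        while (ch.isRegistered()) {
            Thread.sleep(intervalMillis); // the PR uses a 2 ms interval
        }
    }

    public static boolean demo() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel ch = selector.provider().openServerSocketChannel();
        ch.configureBlocking(false);
        ch.bind(new InetSocketAddress(0)); // ephemeral port, just for the demo
        ch.register(selector, SelectionKey.OP_ACCEPT);

        // A stand-in for a server's selector loop: deregistration only
        // completes during a select operation, so something must keep selecting.
        Thread loop = new Thread(() -> {
            try {
                while (selector.isOpen()) selector.select(10);
            } catch (ClosedSelectorException | java.io.IOException e) {
                // selector closed while selecting: fine for this sketch
            }
        });
        loop.setDaemon(true);
        loop.start();

        ch.close();
        waitForDeregistration(ch, 2); // returns once the cancelled key is flushed
        selector.close();
        return ch.isRegistered();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The loop terminates as soon as the selector's next select operation processes the cancelled key, which is why the sleep interval only bounds how promptly the finalizer observes that, not how long deregistration itself takes.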

Contributor Author


To clarify another point: my assumption is that the vast majority of processes that bind to a port as a server do so once over the lifetime of the application, and this loop will run as the process shuts down. So adding ~2 ms to shutdown felt like low overhead for something that happens after a server has finished its useful work.

Contributor Author


Based on a quick and dirty measurement of 1000 samples run in a row (with a granularity of 1 ms), I get an average wait around 18 ms with a standard deviation of 13.6 ms. The min was 1 ms (so probably less) and the max was 88 ms.

On my M2 Max, running 1000 times in a row with a granularity of 1 ms, I get values ranging from 1 to 88 ms, an average just under 18 ms, and a std dev of 13.6. The longest sleep took 44 loop iterations, the shortest 1. This suggests 2 ms is an okay value, especially considering this work is most likely occurring after a server has served its useful purpose.
@mpilquist mpilquist merged commit c093a71 into typelevel:main Aug 26, 2025
16 checks passed
@mpilquist
Member

Thanks!

mtomko pushed a commit to mtomko/fs2 that referenced this pull request Sep 4, 2025