Skip to content

Conversation

@alexmiller-apple
Copy link
Contributor

This was a request from some time ago, but also probably likely more important with snapshot backup/restore #1246 adding a fork+exec to the FDB codebase.

Note that in the Boost Asio case, we can't do O_CLOEXEC, because boost claims close-on-exec is too platform specific for Asio to support it. Oh well.

I tested this via starting two local fdbserver instances, and then looking at /proc/<pid>/fdinfo/* and double checking that every FD above 2 (so not stdin/out/err) had O_CLOEXEC set in flags.

socket.set_option(boost::asio::ip::tcp::no_delay(true));
#if defined(__unixish__)
fcntl(socket.native_handle(), F_SETFD, fcntl(socket.native_handle(), F_GETFD) | FD_CLOEXEC);
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've waffled back and forth on if this should go live in Platform.h somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly ambivalent about that, personally. There's certainly some appeal to moving it to platform, but if you left it here that wouldn't bother me.

One question: do we need to do any error checking on these calls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, thank you, I totally missed this.

... honestly, if they fail, there's nothing we can meaningfully do. It's not worth crashing the process over. We could log that it errored, and aggressively suppress it to try and reduce its spammyness, but there wouldn't really be a meaningful action to take as a result. I can add one anyway if you'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose in the hypothetical scenario where it does fail to do its thing, knowing that it happened may help us to avoid confusion when we see it behaving differently than we expect. I'd be in favor of adding a log event for this with the suppression that you suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been done.

@alexmiller-apple
Copy link
Contributor Author

The MacOS failure here is from #1090, and not this PR.

@alecgrieser
Copy link
Contributor

run correctness please

@alexmiller-apple
Copy link
Contributor Author

Nope, I just went to all the places that I thought would open files. Let me fix those, and do the scrape through everything that calls open as well.

From missed call sites found during/after code review.
@alexmiller-apple
Copy link
Contributor Author

@fdb-builld, test this please

@ajbeamon ajbeamon merged commit 453724b into apple:master Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants