Skip to content

fix: Close fds in child processes#914

Merged
Disservin merged 1 commit intoDisservin:masterfrom
MinetaS:fix-fd-leak
Sep 17, 2025
Merged

fix: Close fds in child processes#914
Disservin merged 1 commit intoDisservin:masterfrom
MinetaS:fix-fd-leak

Conversation

@MinetaS
Copy link
Copy Markdown
Contributor

@MinetaS MinetaS commented Sep 16, 2025

No description provided.

@MinetaS
Copy link
Copy Markdown
Contributor Author

MinetaS commented Sep 16, 2025

Since we dup pipe fds to stdio fds with setup_spawn_file_actions, all pipe fds should be closed after dup otherwise they are leaked.

"self-pipe trick" thing is removed here; I honestly don't see how engines are supposed to use those pipes for signal processing. Can you explain about its use case, I'm curious about it.

EDIT 1:
Now setup_close_file_actions is redundant as O_CLOEXEC flag does clean up fds in child processes during exec. But I'm not that familiar with POSIX libraries so I will leave it to you to arrange.

@Disservin
Copy link
Copy Markdown
Owner

"self-pipe trick" thing is removed here; I honestly don't see how engines are supposed to use those pipes for signal processing. Can you explain about its use case, I'm curious about it.

Okay so the current implementation uses a poll + read, the timeout on the poll is the remaining thinking time of the engine, meaning if poll times out, the engines time ran out and we return, otherwise we know there is data to be read meaning we can read from the pipe.
If a user specified a long tc like 1h, engines that are about to play the first move get a lot of time, if during that time fastchess receives CTRL + C, the poll will wait for a very long time making the sigint seem unresponsive. In this case the self pipe trick is used, where we write some dummy data so that poll gets notified about the data and we break out of the otherwise blocking poll call.

Meaning in your implementation CTRL + C sent to fastchess will take longer, growing with the engines TC. This is not ideal.


Does your change actually lead to child processes being killed when fastchess receives SIGKILL, I assume so right ?


Maybe we don't need the self pipe trick and instead poll every few cycles (TBD) and implement a timeout in some other cheap way.

@Disservin Disservin closed this Sep 17, 2025
@Disservin Disservin reopened this Sep 17, 2025
@Disservin
Copy link
Copy Markdown
Owner

didn't mean to close

@MinetaS
Copy link
Copy Markdown
Contributor Author

MinetaS commented Sep 17, 2025

Okay so the current implementation uses a poll + read, the timeout on the poll is the remaining thinking time of the engine, meaning if poll times out, the engines time ran out and we return, otherwise we know there is data to be read meaning we can read from the pipe.
If a user specified a long tc like 1h, engines that are about to play the first move get a lot of time, if during that time fastchess receives CTRL + C, the poll will wait for a very long time making the sigint seem unresponsive. In this case the self pipe trick is used, where we write some dummy data so that poll gets notified about the data and we break out of the otherwise blocking poll call.

Thank you for the explanation. Although I still think the comment is somewhat misplaced, because both write/read a NULL byte is done in fastchess (if this is what you mean by "write some dummy data") and completely unrelated to child processes' file descriptors.

Meaning in your implementation CTRL + C sent to fastchess will take longer, growing with the engines TC. This is not ideal.

If I understand correctly (as above), my implementation will never affect to how fastchess manages Ctrl+C signals.

Does your change actually lead to child processes being killed when fastchess receives SIGKILL, I assume so right ?

Yes. But I'd like to reiterate that child processes are not "killed". They terminate once fastchess is killed, because the pipes are closed and getline fails because they receive EOF. The reason why they are not terminated in the current version is that there are open file descriptors inherited from fastchess, and among them especially out_pipe_ is still connected to stdin of stockfish and getline is blocked forever.

https://github.com/official-stockfish/Stockfish/blob/fc54d8730174cdb5cfc4f7074b90128e706e4040/src/uci.cpp#L94-L98

    do
    {
        if (cli.argc == 1
            && !getline(std::cin, cmd))  // Wait for an input or an end-of-file (EOF) indication
            cmd = "quit";

@Disservin
Copy link
Copy Markdown
Owner

oh i think i might have misunderstood the actual behavior.. the comment (// keep open for self to pipe trick) was there because I thought it stops working when there's a close but it seems this is not the case and O_CLOEXEC seems to be a good choice here too, didn't have this in mind

@MinetaS
Copy link
Copy Markdown
Contributor Author

MinetaS commented Sep 17, 2025

Oh macOS doesn't have pipe2... need to fix that. But I don't own any macOS environments so cannot test it myself.

@Disservin
Copy link
Copy Markdown
Owner

you can probably use the fcntl way instead everywhere instead of relying on pipe2

@MinetaS
Copy link
Copy Markdown
Contributor Author

MinetaS commented Sep 17, 2025

Should be fixed by now.
Also removed setup_close_file_actions as it doesn't work and is unnecessary when there are multiple child processes.

@Disservin Disservin merged commit 3c33636 into Disservin:master Sep 17, 2025
18 checks passed
@Disservin
Copy link
Copy Markdown
Owner

Haven't tested it but I think this looks good

@Disservin
Copy link
Copy Markdown
Owner

thanks again got around to finally test this and it works splendid

@Disservin
Copy link
Copy Markdown
Owner

okay just for documentation purposes..
I tested the KILL behavior on Windwos too and it looks like there it already works as expected, at least I see no stockfish processes lying around after taskkill /F /IM fastchess.exe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants