
[2.7] bpo-35757: subprocess.Popen: optimize close_fds for Linux #11584

Closed
kolyshkin wants to merge 1 commit into python:2.7 from kolyshkin:close-fds

Conversation


@kolyshkin kolyshkin commented Jan 17, 2019

In case close_fds=True is passed to subprocess.Popen()
or its wrappers (subprocess.call() etc.), it can spend
considerable time closing file descriptors that were never opened,
as demonstrated by the following snippet from strace:

close(3)                    = -1 EBADF (Bad file descriptor)
close(5)                    = -1 EBADF (Bad file descriptor)
close(6)                    = -1 EBADF (Bad file descriptor)
close(7)                    = -1 EBADF (Bad file descriptor)
...
close(1021)                 = -1 EBADF (Bad file descriptor)
close(1022)                 = -1 EBADF (Bad file descriptor)
close(1023)                 = -1 EBADF (Bad file descriptor)

This happens because the code in _close_fds() iterates from 3 up to
MAX_FDS = os.sysconf("SC_OPEN_MAX").
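
Roughly, that scan amounts to the following (a paraphrase for illustration, not the actual 2.7 code, which may use os.closerange; the function name is made up here):

```python
import os

MAXFD = os.sysconf("SC_OPEN_MAX")

def close_fds_brute_force(but, maxfd=MAXFD):
    # Paraphrase of the old approach: attempt to close every fd that
    # *could* be open, whether or not it actually is. Each close() of
    # a never-opened fd fails with EBADF, as seen in the strace above.
    for fd in range(3, maxfd):
        if fd == but:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # EBADF: fd was never open
```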

Now, syscalls are cheap, but SC_OPEN_MAX (also known as RLIMIT_NOFILE
or ulimit -n) can be quite high, for example:

$ docker run --rm python python3 -c \
	$'import os\nprint(os.sysconf("SC_OPEN_MAX"))'
1048576

This means a million syscalls before spawning a child process, which
can result in a major delay, like 0.1s as measured on my fast and mostly
idling laptop. Here is the comparison with python3 (which does not have
this problem):

$ docker run --rm python python3 -c $'import subprocess\nimport time\ns = time.time()\nsubprocess.check_call([\'/bin/true\'], close_fds=True)\nprint(time.time() - s)\n'
0.0009245872497558594

$ docker run --rm python python2 -c $'import subprocess\nimport time\ns = time.time()\nsubprocess.check_call([\'/bin/true\'], close_fds=True)\nprint(time.time() - s)\n'
0.0964419841766

The solution (for Linux, at least) is easy: instead of iterating through
all the possibly opened files, get the list of actually opened
files (from /proc/self/fd/) and iterate through these. In case it
doesn't work (for example, when /proc is not available, or under some
draconian selinux/apparmor restrictions), fall back to the older method.
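
In sketch form (illustrative names, not the actual patch), the optimized path looks like this:

```python
import os

def fds_to_close():
    # Sketch of the optimization: on Linux, /proc/self/fd lists the
    # descriptors that are *actually* open in this process, so only
    # those need closing (the caller still skips fds 0-2). If /proc
    # is unavailable or restricted, fall back to the brute-force range.
    try:
        return sorted(int(name) for name in os.listdir("/proc/self/fd"))
    except OSError:
        return list(range(3, os.sysconf("SC_OPEN_MAX")))
```

Note that listing /proc/self/fd itself opens a descriptor for the directory, so the real code still has to tolerate EBADF when it later reaches that fd.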

[v2: do not use bare exceptions]

https://bugs.python.org/issue35757

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@kolyshkin kolyshkin changed the title subprocess.Popen: optimize close_fds for Linux bpo-35757: subprocess.Popen: optimize close_fds for Linux Jan 17, 2019
@kolyshkin kolyshkin changed the title bpo-35757: subprocess.Popen: optimize close_fds for Linux [2.7] bpo-35757: subprocess.Popen: optimize close_fds for Linux Jan 17, 2019
Contributor

@asvetlov asvetlov left a comment


Please avoid bare excepts.
I know old code uses it but there is no reason to follow this practice in new code.

@kolyshkin
Author

@asvetlov Thanks for your review (as this is my first code in Python that is longer than two lines, your review is very much appreciated)! I believe I have addressed your comments. Ideally, I'd write the code without using exceptions at all if something like close_noex() existed, but I was not able to find anything like that.

@asvetlov
Contributor

Oooh, missed the fact that the patch is for Python 2.7

@benjaminp would you accept it, or is the change not worth the potential problems?
For the note, Python 3 has a completely different implementation for closing open FDs.

@kolyshkin
Copy link
Copy Markdown
Author

Python 3 has a completely different implementation for closing open FDs, for the note.

...which is very similar to this implementation (i.e. it also iterates through entries found in /proc/self/fd), except it's written in C, so there are no exceptions to care about :)

@benjaminp
Contributor

Thank you for the PR. Closing per comment on the issue.
