Skip to content

Conversation

@bdraco
Copy link
Member

@bdraco bdraco commented Feb 12, 2023

Proposed change

The command_line (8.2% of the active installations) and shell_command (4.7% of the active installations) integrations are quite popular. They suffer from a performance issue due to the memory size of the Home Assistant python instance and python's subprocess implementation when using musl. We also experience the issue when pip is called to install a new module.

By default python 3.10 will use fork() which has to copy memory of the parent process. In our case
this can be huge since Home Assistant core can use hundreds of megabytes of RAM. By using posix_spawn this is avoided and subprocess creation does not get discernibly slower the larger the Home Assistant python process grows.

Additional reading: https://github.com/rtomayko/posix-spawn

In python 3.11 vfork will also be available (vfork is stubbed out in 3.10)
python/cpython#80004 (comment) python/cpython#11671 but we won't always be able to use it and posix_spawn is likely safer https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14

The subprocess library doesn't know about musl though even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c so we have to teach it since it only has checks for glibc https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745

The constant is documented as being able to be flipped off at https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn so it seems likely to not change in the future.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

By default python 3.10 will use the fork() which has to
copy all the memory of the parent process (in our case
this can be huge since Home Assistant core can use
hundreds of megabytes of RAM). By using posix_spawn
this is avoided.

In python 3.11 vfork will also be available
python/cpython#80004 (comment)
python/cpython#11671 but we won't
always be able to use it and posix_spawn is considered safer
https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14

The subprocess library doesn't know about musl though
even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
so we have to teach it since it only has checks for glibc
https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745

The constant is documented as being able to be flipped here:
https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
By default python 3.10 will use the fork() which has to
copy memory of the parent process (in our case
this can be huge since Home Assistant core can use
hundreds of megabytes of RAM). By using posix_spawn
this is avoided and subprocess creation does not
get discernibly slow the larger the Home Assistant
python process grows.

In python 3.11 vfork will also be available
python/cpython#80004 (comment)
python/cpython#11671 but we won't
always be able to use it and posix_spawn is considered safer
https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14

The subprocess library doesn't know about musl though
even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
so we have to teach it since it only has checks for glibc
https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745

The constant is documented as being able to be flipped here:
https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
@home-assistant
Copy link

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (shell_command) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of shell_command can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Change the title of the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign shell_command Removes the current integration label and assignees on the issue, add the integration domain after the command.

# on Alpine Linux which is supported by musl.
_exists = os.path.exists
subprocess._USE_POSIX_SPAWN = _exists( # pylint: disable=protected-access
ALPINE_RELEASE_FILE
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going back and forth on this check because there could be other cases where they use musl but not alpine. I think that's rare enough that it's not worth the additional overhead of importing the packaging module or requiring it.

@bdraco bdraco marked this pull request as ready for review February 12, 2023 21:03
@bdraco bdraco requested review from a team, epenet and fabaff as code owners February 12, 2023 21:03
def _enable_posix_spawn() -> None:
"""Enable posix_spawn on Alpine Linux."""
# pylint: disable=protected-access
if subprocess._USE_POSIX_SPAWN:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will already be true for any modern glibc system

bdraco added a commit to home-assistant-libs/ha-ffmpeg that referenced this pull request Feb 12, 2023
closing fds can take a long time and its unlikely python is going
to implement the close_range() system call [https://lwn.net/Articles/789023/]
any time soon. Since we do not need to close the fds here as
it does not cause any issues for ffmpeg we can spawn ffmpeg
stream much faster in HA after home-assistant/core#87958
MartinHjelmare pushed a commit to home-assistant-libs/ha-ffmpeg that referenced this pull request Feb 13, 2023
closing fds can take a long time and its unlikely python is going
to implement the close_range() system call [https://lwn.net/Articles/789023/]
any time soon. Since we do not need to close the fds here as
it does not cause any issues for ffmpeg we can spawn ffmpeg
stream much faster in HA after home-assistant/core#87958
*self.config[CONF_ARGS],
env=env,
stdout=asyncio.subprocess.PIPE if self.config[CONF_META] else None,
close_fds=False, # required for posix_spawn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this will break any 3rd party lib using subprocess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am I missing? Are there specific fds open that will be a problem for the subprocess in this instance?

If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed. Otherwise when close_fds is false, file descriptors obey their inheritable flag as described in Inheritance of File Descriptors.

A file descriptor has an “inheritable” flag which indicates if the file descriptor can be inherited by child processes. Since Python 3.4, file descriptors created by Python are non-inheritable by default.

On UNIX, non-inheritable file descriptors are closed in child processes at the execution of a new program, other file descriptors are inherited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more back story. python/cpython#86904

I don’t think we are crossing a security boundary here (but maybe I’m not seeing it). Most of these invocations are even using a shell directly so I’d expect anything they are executing to run at the same privilege level as the python process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant in general. If we have to update all our calls to pass close_fds=False, then 3rd party libs calling a sub process would have to do the same.

Copy link
Member Author

@bdraco bdraco Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that doesn't add the flag will get the existing slow behavior (effectively no change)

If they can add the flag and they don't have anything else special (like changing uids), the subprocess module should automatically use the faster path.

If we know of a library that frequently spawns subprocesses it would be good to adjust it. At least in the profiles users sent, it was almost always the two integrations mentioned above that were generating the load on the system. I expect this change will solve the issue for the 99% case

Copy link
Member

@pvizeli pvizeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's not a breaking change @balloob

@balloob balloob merged commit 03eea7b into dev Feb 13, 2023
@balloob balloob deleted the subprocess_copy branch February 13, 2023 14:02
@github-actions github-actions bot locked and limited conversation to collaborators Feb 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants