Skip to content

Conversation

@sivel
Copy link
Member

@sivel sivel commented Feb 4, 2020

SUMMARY

Add content dir where Ansible will look for content provided by ACD

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

lib/ansible/utils/collection_loader.py

ADDITIONAL INFORMATION

@sivel sivel requested a review from nitzmahone February 4, 2020 16:32
@ansibot ansibot added WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers. affects_2.10 This issue/PR affects Ansible v2.10 feature This issue/PR relates to a feature request. needs_triage Needs a first human triage before being processed. support:community This issue/PR relates to code supported by the Ansible community. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Feb 4, 2020
@ansibot ansibot added stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. and removed stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Feb 13, 2020
@abadger
Copy link
Contributor

abadger commented Feb 17, 2020

Okay, @nitzmahone @mattclay and I talked about this today. We discussed four possibilities with certain drawbacks:

  • A special directory findable from the ansible-base installation (like the original of this PR)

    • Problem with this was that the user might install ansible-base from system packages into /usr/lib/[..]site-packages and collections via pip --user to ~/.local/lib/[...]/site-packages. (and then distributions like Debian might throw dist-packages into the mix too....)
  • Scan sys.path for toplevel ansible_collection directories

    • This is what we decided to do in the end since it seemed to satisfy the requirements and have the least drawbacks.
    • Drawbacks:
      • A badly written pip installed package could bash the ansible_collection namespace for third party tools. (ansible itself will be okay as it makes ansible_collections a namespace package early on.
      • A badly written pip package could overwrite other collections. For instance, pip install ibm might install ansible_collections/ibm/aix/aix_hostname.py, and pip install ibm_community might contain the same filename. (Note: none of these options prevent this; we're just trying to make accidental overwriting unlikely.)
  • Scan sys.path for any toplevel which contains an ansible_collection directory (We called this, using surrogates)

    • The benefit of this is that it isolates the files from one pip installed package from another, making it less likely that files will be overwritten by mistakes in packaging.
    • Drawbacks:
      • a lot more directories would have to be scanned so there would be a slower startup time.
      • bad pip installs could still bash the namespace for third party tools
      • it would be harder for people to make the mistake of overwriting another collection's file as we would encourage separate toplevel directories but ansible would still only load the one that it encountered first in the search order. For instance, pip install ibm_official installs ibm_official/ansible_collections/ibm/aix/aix_hostname.py. pip install ibm_community installs ibm_community/ansible_collections/ibm/aix/aix_hostname.py. Both files exist on the filesystem but ansible will load the ibm_community version due to sort order and ignore the ibm_official version.
      • Plugins for third party tools used by ansible-test will probably need to set PYTHONPATH to pick up the collections' code. With this surrogate directory form, we'll end up with nested directories (PYTHONPATH=/usr/lib/python3.7/site-packages:/usr/lib/python3.7/site-packages/ibm_official/) which can cause problems for python. We don't know that we'll hit any of those but it is non-standard enough that we are wary of it.
  • Scan sys.path for a hardcoded toplevel surrogate directory and then adding ansible_collection directories from inside of those. (What this PR presently does)

    • This is quicker that having arbitrary surrogate directories.
    • Drawbacks:
      • bad pip installs could still bash the namespace for third party tools
      • Since everything lives in the same toplevel namespace (in this example, _ansible_community_distribution, we're back to mistakes in packaging being able to overwrite files from other collections again)
      • The nested PYTHONPATH problem exists here too.

I'll update the code in this PR to reflect using ansible_collections as the toplevel. nitzmahone and I both have some other coding to do with deadlines of the migration date at the beginning of March. Whoever gets free time first will add tests and then we will be able to merge this.

@samdoran samdoran removed the needs_triage Needs a first human triage before being processed. label Feb 18, 2020
@jctanner
Copy link
Contributor

jctanner commented Feb 26, 2020

I'm a bit confused by this PR. @abadger's comment refers to some examples of installing an ibm provided collection, but the code currently -only- looks for an ansible_collections package ...

jtanner@yoga720:/mnt/c/Users/jtanner$ python -c 'import sys; print(sys.path)'
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']

jtanner@yoga720:/mnt/c/Users/jtanner$ python -c "import sys; import os; print([os.path.join(x, 'ansible_collections') for x in sys.path])"
['ansible_collections', '/usr/lib/python2.7/ansible_collections', '/usr/lib/python2.7/plat-x86_64-linux-gnu/ansible_collections', '/usr/lib/python2.7/lib-tk/ansible_collections', '/usr/lib/python2.7/lib-old/ansible_collections', '/usr/lib/python2.7/lib-dynload/ansible_collections', '/usr/local/lib/python2.7/dist-packages/ansible_collections', '/usr/lib/python2.7/dist-packages/ansible_collections']

Is there a disconnect between what is being asked for and what this PR does or am I reading it incorrectly?

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Feb 26, 2020
@webknjaz webknjaz self-requested a review February 26, 2020 15:34
for path in sys.path:
sys_installed_path = os.path.join(os.path.abspath(path), 'ansible_collections')
if os.path.isdir(sys_installed_path):
self._n_configured_paths.append(sys_installed_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds paths at the end of the list. Which means that it'll be shadowed by user-provided paths. Is this intended? Should this also be documented somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely the intent; this feature primarily exists to support ACD and have an "it just works" fallback for 2.9-authored content to transparently resolve to, and depending on how downstream ansible-base ends up being packaged, possibly an RPM-installed version of same. User-installed content should absolutely be able to mask it.

Docs and tests are both missing from this PR at the moment, I'd argue that both need to be addressed before it's merged. That should be achievable once migration is behind us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs and tests are both missing from this PR at the moment

Scratch docs. One of the caveats to allowing this feature, is that no documentation is provided for users about it. This is purely an internal thing that we are effectively just not communicating about its existence. It will be unsupported for any users who find it and try to use it. It's basically "for us only".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd want to have it supported b/c it's easier to ship dists this way in some cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd want to have it supported b/c it's easier to ship dists this way in some cases

That decision is mostly out of my hands. The decision came from "higher" up. The supported way of distributing collections is galaxy/ah only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, I agree with @webknjaz - although I understand it's out of your hands. Being able to pip install a library and have that result in a corresponding ansible collection being installed would be hugely beneficial for some things. Mostly saying that in case further conversations happen by those who are higher up to register user interest in such a feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emonty FTR this is how ACD will work @ PyPI @ Ansible 2.10. So it'd be pretty easy for users to figure out how it works. Maybe it's not something that business desires, but the community would definitely want this.

Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if Ansible CLIs should list Python dists that are installed on the system in the ansible_collections implicit namespace. It'd be useful for the purposes of security analysis.

elif self._n_configured_paths is None:
self._n_configured_paths = []

# Append all ``ansible_collections`` dirs from sys.path to the end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would be a bit more useful if it explained the effect of the change:

Suggested change
# Append all ``ansible_collections`` dirs from sys.path to the end
# Append all ``ansible_collections`` dirs from sys.path to the end
# which enables the collection loader to locate and import Python
# modules that are distributed as Python distributed packages via
# PyPI or similar indexes. Specifically, it enables users to distribute
# Python content as long as dists follow PEP 420 Implicit Namespace
# Packages conventions.
# Refs:
# * https://packaging.python.org/guides/packaging-namespace-packages/
# * https://packaging.python.org/guides/creating-and-discovering-plugins/#using-namespace-packages
# * https://setuptools.readthedocs.io/en/latest/setuptools.html#namespace-packages
# * https://setuptools.readthedocs.io/en/latest/setuptools.html#find-namespace-packages
#
# SECURITY CONSIDERATIONS:
# This provides yet another place for injecting Python code.
# We should be clear about it when informing users about the places
# where modules/plugins are loaded from.
#
# NOTE: It is possible to publish a "broken" namespace package
# NOTE: so that when it's installed it breaks user's ability
# NOTE: to install other Python packages in that namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd take out the REFS. Security considerations should go somewhere else. maybe there should be a single place in the documentation which lists all the locations that the collections can be loaded from. The note is correct but within ansible, we use our special loader which establishes a namespace prior to loading anything else so this should only break with third party code which might try to load the ansible-collections package without establishing the namespace first. So if we state that here, it should note that it won't break us, only third parties.

@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Mar 4, 2020
@webknjaz
Copy link
Member

webknjaz commented Mar 5, 2020

So here's a demo Python dist working with this patch: https://github.com/webknjaz/ansible-collection-python-dist-boo / https://pypi.org/project/ansible-collections.python.dist.boo/ / https://test.pypi.org/project/ansible-collections.python.dist.boo/

$ pip install git+https://github.com/sivel/ansible@acd-content-dir
$ pip install -i https://test.pypi.org/simple/ ansible-collections.python.dist.boo --pre
$ ansible -m python.dist.boo -a name=Bob localhost

@webknjaz webknjaz self-assigned this Mar 5, 2020
@abadger
Copy link
Contributor

abadger commented Mar 5, 2020

I'm a bit confused by this PR. @abadger's comment refers to some examples of installing an ibm provided collection, but the code currently -only- looks for an ansible_collections package ...

In the example, the ibm collection would be installed inside of the ansible_collections python namespace package. The collection loader will find all the collections inside of there provided that the proper upper python package is put into the collection path. (Thanks to webknjaz for fixing which path that needs to be). So another example from the installation-and-use side:

With standard sys.path settings in Fedora and RHEL:

  • A collection module is installed into ~/.local/lib/python3.8/site-packages/ansible_collections/ibm/community/plugins/modules/foo.py
    • ansible -m ibm.community.foo would use that module.
  • A collection module is installed into /usr/lib64/python3.8/site-packages/ansible_collections/community/general/plugins/modules/ping.py
    • ansible -m community.general.ping would use that module.

@ansibot

This comment has been minimized.

@webknjaz webknjaz force-pushed the acd-content-dir branch 3 times, most recently from 1f42521 to 22ce66d Compare March 9, 2020 22:50
@ansibot ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Mar 19, 2020
@ansibot ansibot added core_review In order to be merged, this PR must follow the core review workflow. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Mar 20, 2020
@webknjaz webknjaz requested a review from mattclay March 20, 2020 14:13
@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. labels Mar 20, 2020
sivel and others added 14 commits March 23, 2020 22:50
After talking with nitzmahone and mattclay we settled on this feature
looking up collections in specifically named subdirectories of
sys.path's directories.

We discussed pros and cons of doing this in several different ways and
decided that looking for an ansible_collections package directly in
sys.path had less drawbacks than looking for ansible_collections as a
subdirectory of a package in sys.path.  In some ways of doing it, the
extra redirection could have helped point out packaging mistakes but not
enough to justify the complexity of coding it that way.
Co-Authored-By: Sviatoslav Sydorenko <[email protected]>
@webknjaz webknjaz changed the title Add content dir where Ansible will look for content provided by ACD Add content dir where Ansible will look for the content provided by ACD Mar 23, 2020
@webknjaz webknjaz merged commit 8d8c737 into ansible:devel Mar 23, 2020
@webknjaz
Copy link
Member

Ah, GH lost my commit message and co-authors on merge :(

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

affects_2.10 This issue/PR affects Ansible v2.10 ansible-2.10-plan feature This issue/PR relates to a feature request. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. new_plugin This PR includes a new plugin. support:community This issue/PR relates to code supported by the Ansible community. support:core This issue/PR relates to code supported by the Ansible Engineering Team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.