Skip to content

Conversation

@ashb
Copy link
Member

@ashb ashb commented Nov 28, 2020

Using pkg_resources.iter_entry_points validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

  1. providers and plugins load much more reliably.
  2. it's faster too

Closes #12692


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb requested a review from potiuk November 28, 2020 21:18
@ashb
Copy link
Member Author

ashb commented Nov 28, 2020

Please test this @potiuk -- I wasn't able to get my providers erroring even without this change, but was with plugins. Both have been changed.

@potiuk
Copy link
Member

potiuk commented Nov 28, 2020

Will do timorrow. Cool if we can have it working.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

…t errors

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes apache#12692
@ashb ashb force-pushed the no-error-setuptools-entrypoint branch from 44709ec to c136067 Compare November 28, 2020 23:21
self.assertTrue('test_plugin' in self.app.blueprints)
self.assertEqual(self.app.blueprints['test_plugin'].name, bp.name)

@mock.patch('airflow.plugins_manager.pkg_resources.iter_entry_points')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was in the "TestPluginsRBAC" class -- I moved to the "right" place.

Comment on lines -372 to -390
class TestPluginsDirectorySource(unittest.TestCase):
def test_should_provide_correct_attribute_values(self):
source = PluginsDirectorySource("./test_views.py")
self.assertEqual("$PLUGINS_FOLDER/../../test_views.py", str(source))
self.assertEqual("<em>$PLUGINS_FOLDER/</em>../../test_views.py", source.__html__())
self.assertEqual("../../test_views.py", source.path)


class TestEntryPointSource(unittest.TestCase):
def test_should_provide_correct_attribute_values(self):
mock_entrypoint = mock.Mock()
mock_entrypoint.dist = 'test-entrypoint-dist==1.0.0'
source = EntryPointSource(mock_entrypoint)
self.assertEqual("test-entrypoint-dist==1.0.0", source.dist)
self.assertEqual(str(mock_entrypoint), source.entrypoint)
self.assertEqual("test-entrypoint-dist==1.0.0: " + str(mock_entrypoint), str(source))
self.assertEqual("<em>test-entrypoint-dist==1.0.0:</em> " + str(mock_entrypoint), source.__html__())


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were duplicating tests in test_plugins_manager.py -- they don't belong here.

@potiuk
Copy link
Member

potiuk commented Nov 29, 2020

Fantastic!

I confirm it works much faster and it is version-conflict safe.

Before:

Screenshot from 2020-11-29 07-14-47

After:

Screenshot from 2020-11-29 07-17-19

There was a random failure for rabbitmq download but all the rest is fine. Merging

@potiuk potiuk merged commit 7ef9aa7 into apache:master Nov 29, 2020
@ashb ashb deleted the no-error-setuptools-entrypoint branch November 29, 2020 08:41
@ashb
Copy link
Member Author

ashb commented Nov 29, 2020

How fast is it now?

Copy link
Contributor

@mjpieters mjpieters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this change, this will help with startup times! Because we have to load plugins early in airflow celery worker we are paying a plugin scan price everywhere else. :-/

funcsigs>=1.0.0, <2.0.0
graphviz>=0.12
gunicorn>=19.5.0, <20.0
importlib_metadata~=1.7 # We could work with 3.1, but argparse needs <2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you meant argcomplete perhaps? I note that argcomplete has been updated to support importlib_metadata 3.x (and otherwise only has bugfixes, looks like a desireable upgrade).

And if 3.1 works, please widen the pin to >=1.7,<4; the argcomplete~=1.10 pin above already lets you use importlib_metadata 3.1 if you have the newer argcomplete installed.

Also, Python 3.8 doesn't need importlib_metadata anymore, and argcomplete correctly won't depend on it in 3.8 and up. Please do so here too:

     importlib_metadata>=1.7,<4;python_version<="3.7"

and make the imports conditional:

try:
    # 3.8 and up
    from importlib import metadata as importlib_metadata
except ImportError:
    # use the backport
    import importlib_metadata

If you update the argcomplete dependency too, then you'd not even have to set the >=1.7 lower bound, ~=3.1 would do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whops, yes I did mean argcomplete. Though I also found virtualenv (dep of pylint) was also pinning this, so it may not help us yet.

This line in the readme https://github.com/python/importlib_metadata/ :

As of Python 3.8, this functionality has been added to the Python standard library. This package supplies backports of that functionality including improvements added to subsequent Python versions

Makes me wonder if/when we should continue to use it anyway on Py 3.8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whops, yes I did mean argcomplete. Though I also found virtualenv (dep of pylint) was also pinning this, so it may not help us yet.

Ah though virtualenv has been updated too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me wonder if/when we should continue to use it anyway on Py 3.8?

Your not using the 3.9 / importlib_metadata 1.6 feature: Added module and attr attributes to EntryPoint; everything else is bug fixing or perf improvements, which all have been backported to 3.8.

Even if you were using those extra attributes, using python_version<="3.8" (and inverting the import guard) would be a good idea to keep the dependencies flexible and avoid limiting what packages people can use with Airflow.

I don't see where virtualenv depends on importlib_metadata? Nor does pylint depend on virtualenv. Might be in older versions. pre-commit depends on virtualenv though. And jsonlint depends on importlib_metadata but sets no pin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God in getting my dist names all in a mix today. I did mean pre-commit -> virtualenv. I'll have to check what version of virtualenv pre-commit specifies.

I'll open a new PR to add the guard as you suggest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already opened a PR :-) See #12703

kaxil pushed a commit that referenced this pull request Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
kaxil pushed a commit that referenced this pull request Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
ashb added a commit that referenced this pull request Dec 3, 2020
kaxil pushed a commit that referenced this pull request Dec 3, 2020
…t errors (#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes #12692

(cherry picked from commit 7ef9aa7)
AntonyRileyAtVerto pushed a commit to vertoanalytics/incubator-airflow that referenced this pull request Feb 2, 2021
- BugFix: Tasks with ``depends_on_past`` or ``task_concurrency`` are stuck (apache#12663)
- Fix issue with empty Resources in executor_config (apache#12633)
- Fix: Deprecated config ``force_log_out_after`` was not used (apache#12661)
- Fix empty asctime field in JSON formatted logs (apache#10515)
- [AIRFLOW-2809] Fix security issue regarding Flask SECRET_KEY (apache#3651)
- [AIRFLOW-2884] Fix Flask SECRET_KEY security issue in www_rbac (apache#3729)
- [AIRFLOW-2886] Generate random Flask SECRET_KEY in default config (apache#3738)
- Add missing comma in setup.py (apache#12790)
- Bugfix: Unable to import Airflow plugins on Python 3.8 (apache#12859)
- Fix setup.py missing comma in ``setup_requires`` (apache#12880)
- Don't emit first_task_scheduling_delay metric for only-once dags (apache#12835)

- Update setup.py to get non-conflicting set of dependencies (apache#12636)
- Rename ``[scheduler] max_threads`` to ``[scheduler] parsing_processes`` (apache#12605)
- Add metric for scheduling delay between first run task & expected start time (apache#9544)
- Add new-style 2.0 command names for Airflow 1.10.x (apache#12725)
- Add Kubernetes cleanup-pods CLI command for Helm Chart (apache#11802)
- Don't let webserver run with dangerous config (apache#12747)
- Replace pkg_resources with importlib.metadata to avoid VersionConflict errors (apache#12694)

- Clarified information about supported Databases
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
…t errors (apache#12694)

Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.

This sounds nice, but is a huge mis-feature.

So instead of that, switch to using importlib.metadata (well, it's
backport importlib_metadata) that just gives us the entrypoints - no
other verification of requirements is performed.

This has two advantages:

1. providers and plugins load much more reliably.
2. it's faster too

Closes apache#12692

(cherry picked from commit 7ef9aa7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provider discovery based on entry_points is rather brittle

3 participants