Description
The tests we run in CI have shown that provider discovery based on entry_points is rather brittle.
Example here:
This is not a problem with Airflow but with pip, which might silently upgrade some packages and cause a "version conflict" entirely independently of the Airflow configuration and completely outside our control.
Simply installing a .whl package on top of an existing Airflow installation (as happened in the case above) might leave the requirements inconsistent. In the case above, installing .whl packages with all providers on top of an existing Airflow installation caused the `requests` package to be upgraded to 2.25.0, even though Airflow has the right requirements set. In this case the requirement was (correctly, taken from the `install_requires` section of Airflow's setup.cfg):
`Requirement.parse('requests<2.24.0,>=2.20.0'), {'apache-airflow'}`
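To make the conflict concrete: `requests` 2.25.0 falls outside the range `>=2.20.0,<2.24.0` declared by `apache-airflow`. The sketch below is a deliberately simplified, stdlib-only check that handles only plain numeric versions and the operators `<`, `<=`, `>`, `>=` and `==`. The helper names are hypothetical; real tools such as pip and pkg_resources implement the full PEP 440 rules (for example via the `packaging` library).

```python
import operator

# Supported comparison operators; two-character ones come first so that
# "<=" is matched before "<" (simplified, not full PEP 440).
_OPS = (
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
)


def parse_version(text):
    """Turn '2.25.0' into (2, 25, 0); assumes a purely numeric, dot-separated version."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a, b):
    """Pad two version tuples with zeros so that (2, 24) compares equal to (2, 24, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                left, right = _pad(installed, parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            # No known operator prefix matched this clause.
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The requirement from apache-airflow's install_requires, checked against two versions.
print(satisfies("2.23.0", "<2.24.0,>=2.20.0"))  # True
print(satisfies("2.25.0", "<2.24.0,>=2.20.0"))  # False
```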
If you have a version conflict in your environment, running `entry_point.load()` for a package affected by that conflict raises a `pkg_resources.VersionConflict` (or `pkg_resources.ContextualVersionConflict`) error rather than returning the loaded entry point. At least, that is what I have observed so far. It's rather easy to reproduce: simply install requests > 2.24.0 in the current Airflow and see what happens.
So far I have not found a way to mitigate this problem, but @ashb, since you have more experience with it, maybe you can find a workaround?
I think we have a few options:
- We make 'airflow' fail hard if there is any version conflict. We now have a way to do this after I implemented "Make sure that we have no conflicting dependencies when installing" #10854 (and once @ephraimbuddy finishes "Upgrade azure blob to v12" #12188): we have a good, maintainable list of non-conflicting dependencies for Airflow and its providers, and thanks to `pip check` we can keep it that way in the future. But I am afraid this will give a hard time to people who want to install Airflow with some custom dependencies (TensorFlow, for example, is notoriously difficult to keep in sync with Airflow's dependencies, depending on the versions). However, this is the most "Proper" (TM) solution.
- We find a workaround for the `entry_point.load()` VersionConflict exception. However, judging for example by this SO thread, that might not be possible or easy: https://stackoverflow.com/questions/52982603/python-entry-point-fails-for-dependency-conflict . The most upvoted (=1) answer there starts with "Welcome to the world of dependencies hell! I know no clean way to solve this", which is not very encouraging. I also tried to find an answer in the docs and code of `entry_point.load()`, but to no avail. @ashb, maybe you can help here.
- We go back to my original implementation, where I read the provider info from a provider.yaml embedded in the package. This has the disadvantage of being non-standard, but it works regardless of version conflicts.
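The first option above (failing hard on any version conflict) could be enforced at startup. A minimal sketch, assuming `pip` is available to the running interpreter: it shells out to `pip check`, which exits with a non-zero status when installed packages have broken or conflicting requirements. Both function names are hypothetical and not part of Airflow.

```python
import subprocess
import sys


def check_installed_dependencies():
    """Run `pip check` for the current interpreter; return (ok, combined output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # Note: if pip itself is missing, the exit code is also non-zero, so ok is False.
    return result.returncode == 0, (result.stdout + result.stderr).strip()


def fail_on_version_conflicts():
    """Hypothetical hard-fail gate: abort if the environment has conflicting requirements."""
    ok, report = check_installed_dependencies()
    if not ok:
        raise SystemExit(f"Refusing to start, dependency problems found:\n{report}")
```

This is where the trade-off described above shows up: anyone with a custom dependency pinned outside Airflow's ranges would be blocked until the conflict is resolved.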
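One possible direction for the second option, named plainly as a swapped-in technique rather than something from the SO thread: the standard library's `importlib.metadata` also reads `entry_points`, but its `EntryPoint.load()` only imports the referenced object and does not validate the distribution's declared requirements, so an upgraded `requests` would not by itself raise a `VersionConflict`. The group name `apache_airflow_provider` and the helper names below are assumptions for illustration.

```python
from importlib.metadata import entry_points

# Assumed entry-point group name; adjust to whatever group the providers register under.
PROVIDER_GROUP = "apache_airflow_provider"


def provider_entry_points(group=PROVIDER_GROUP):
    """List entry points in `group` without checking installed requirement versions."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ API
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def discover_providers(group=PROVIDER_GROUP):
    """Load each entry point; collect failures instead of aborting discovery."""
    loaded, failed = {}, {}
    for ep in provider_entry_points(group):
        try:
            loaded[ep.name] = ep.load()
        except Exception as err:  # broad on purpose: e.g. ImportError from a broken provider
            failed[ep.name] = err
    return loaded, failed
```

The trade-off is that a real incompatibility is no longer reported at discovery time; it only surfaces if and when provider code actually hits the mismatched API.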
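For comparison, the third option reads metadata shipped as a data file inside the package, so no requirement checking is involved at all. A minimal sketch, assuming the provider package ships `provider.yaml` as package data; the helper name is hypothetical. Parsing the returned text would need a YAML library such as PyYAML, which is left out to keep the sketch stdlib-only. `importlib.resources.files` requires Python 3.9+.

```python
from importlib.resources import files


def read_provider_info_text(package, resource="provider.yaml"):
    """Return the text of `resource` bundled inside `package`, or None if it is absent."""
    # files() imports `package` and returns a Traversable for its directory.
    candidate = files(package).joinpath(resource)
    if not candidate.is_file():
        return None
    return candidate.read_text(encoding="utf-8")
```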
WDYT?