-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Optimizes building of Production image for non-modified www files #19210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
airflow/operators/python.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should
- Raise an exception if both arguments are set
- Maybe ignore
python_versionif it matchessys.version_info...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AH yeah - that's the "previous" PR (this one is based on it) - I will come back to it shortly and address the comments there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the #19189 - but I will take your comments and answer here and address them there.
airflow/utils/python_virtualenv.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we may want to infer this from system_site_packages by default. I’d imagine most existing usages of this function specify system_site_packages because they want to access Airflow, but that would be broken by this PR since Airflow is no longer a part of the site-packages directory.
airflow/utils/python_virtualenv.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to explicitly error out here instead to avoid the user accidentally nuking something. A force flag can be added if needed.
setup.cfg
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add virtualenv>=20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ?
The --network-concurrency=1 is very slow and even if this has been added in apache#17293 to battle connection refused, it slows regular builds far too much. There is a new optimisation in progress that should significantly reduce the yarn installations on kind-cluster deploy: apache#19210 and it should solve the problem much better.
The --network-concurrency=1 is very slow and even if this has been added in #17293 to battle connection refused, it slows regular builds far too much. There is a new optimisation in progress that should significantly reduce the yarn installations on kind-cluster deploy: #19210 and it should solve the problem much better.
|
I wil rebase that one but let's not comment in #1989 for the changes related to changes there (only last commit is relevant here) |
Seems that the future direction of package installation for Python is to always use virtualenvs to install dependencies. There was a heated discussion about it in the issue here: pypa/pip#10556 and general consensus is that virtualenv building should also be used in Docker images as it can help to avoid multiple problems related to interference between distro-managed and PIP-managed files. This change implements it - both PROD and CI images are converted to use virtualenv for installation and instead of copying the `.local` directory between image segments they do the same with newly created `/.venv` virtual environment. All dependencies and shared libraries are installed there are and shared between all users using the images.
When none of the `www` files changed, we do not have to compile assets. Unfortunately the old Dockerfile run COPY for all sources before the asset compilation, so whenever any file changed in sources, the asset compilation was triggered. This change fixes it in the way that assets are compiled first, the dist files are stored on the side and copied back after all sources are copied again (this way they will not get overwritten). This is done in the "build" segment of the image so it has no impact of the size of the "final" image. It will speed up both developing changes for the image as well as building PROD image in CI when none of the www files change (which is very, very often)
141d2d7 to
8644ea7
Compare
The --network-concurrency=1 is very slow and even if this has been added in #17293 to battle connection refused, it slows regular builds far too much. There is a new optimisation in progress that should significantly reduce the yarn installations on kind-cluster deploy: #19210 and it should solve the problem much better. (cherry picked from commit 43e84cc)
When none of the
wwwfiles changed, we do not have to compileassets. Unfortunately the old Dockerfile run COPY for all sources
before the asset compilation, so whenever any file changed in
sources, the asset compilation was triggered. This change fixes it
in the way that assets are compiled first, the dist files are
stored on the side and copied back after all sources are copied
again (this way they will not get overwritten).
This is done in the "build" segment of the image so it has no
impact of the size of the "final" image. It will speed up both
developing changes for the image as well as building PROD image
in CI when none of the www files change (which is very, very
often)
Based on #19189 (only last commit needs to be reviewed)