Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Oct 25, 2021

When none of the www files changed, we do not have to compile
assets. Unfortunately the old Dockerfile run COPY for all sources
before the asset compilation, so whenever any file changed in
sources, the asset compilation was triggered. This change fixes it
in the way that assets are compiled first, the dist files are
stored on the side and copied back after all sources are copied
again (this way they will not get overwritten).

This is done in the "build" segment of the image so it has no
impact of the size of the "final" image. It will speed up both
developing changes for the image as well as building PROD image
in CI when none of the www files change (which is very, very
often)

Based on #19189 (only last commit needs to be reviewed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should

  1. Raise an exception if both arguments are set
  2. Maybe ignore python_version if it matches sys.version_info...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH yeah - that's the "previous" PR (this one is based on it) - I will come back to it shortly and address the comments there :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the #19189 - but I will take your comments and answer here and address them there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we may want to infer this from system_site_packages by default. I’d imagine most existing usages of this function specify system_site_packages because they want to access Airflow, but that would be broken by this PR since Airflow is no longer a part of the site-packages directory.

Comment on lines +115 to +116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to explicitly error out here instead to avoid the user accidentally nuking something. A force flag can be added if needed.

setup.cfg Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add virtualenv>=20

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ?

potiuk added a commit to potiuk/airflow that referenced this pull request Nov 19, 2021
The --network-concurrency=1 is very slow and even if this has
been added in apache#17293 to battle connection refused, it slows regular
builds far too much.

There is a new optimisation in progress that should significantly
reduce the yarn installations on kind-cluster deploy: apache#19210 and
it should solve the problem much better.
potiuk added a commit that referenced this pull request Nov 19, 2021
The --network-concurrency=1 is very slow and even if this has
been added in #17293 to battle connection refused, it slows regular
builds far too much.

There is a new optimisation in progress that should significantly
reduce the yarn installations on kind-cluster deploy: #19210 and
it should solve the problem much better.
@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2021

I wil rebase that one but let's not comment in #1989 for the changes related to changes there (only last commit is relevant here)

Seems that the future direction of package installation for Python
is to always use virtualenvs to install dependencies. There was a
heated discussion about it in the issue here:

pypa/pip#10556

and general consensus is that virtualenv building should also be
used in Docker images as it can help to avoid multiple
problems related to interference between distro-managed and
PIP-managed files.

This change implements it - both PROD and CI images are converted
to use virtualenv for installation and instead of copying
the `.local` directory between image segments they do the same
with newly created `/.venv` virtual environment. All dependencies
and shared libraries are installed there are and shared between
all users using the images.
When none of the `www` files changed, we do not have to compile
assets. Unfortunately the old Dockerfile run COPY for all sources
before the asset compilation, so whenever any file changed in
sources, the asset compilation was triggered. This change fixes it
in the way that assets are compiled first, the dist files are
stored on the side and copied back after all sources are copied
again (this way they will not get overwritten).

This is done in the "build" segment of the image so it has no
impact of the size of the "final" image. It will speed up both
developing changes for the image as well as building PROD image
in CI when none of the www files change (which is very, very
often)
@potiuk potiuk force-pushed the optimize-rebuilding-from-sources branch from 141d2d7 to 8644ea7 Compare November 25, 2021 15:11
@potiuk potiuk closed this Dec 15, 2021
potiuk added a commit that referenced this pull request Jan 22, 2022
The --network-concurrency=1 is very slow and even if this has
been added in #17293 to battle connection refused, it slows regular
builds far too much.

There is a new optimisation in progress that should significantly
reduce the yarn installations on kind-cluster deploy: #19210 and
it should solve the problem much better.

(cherry picked from commit 43e84cc)
@potiuk potiuk deleted the optimize-rebuilding-from-sources branch July 29, 2022 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants