Add support for --no-pre-install-wheels and --max-install-jobs.#2298
Add support for --no-pre-install-wheels and --max-install-jobs.#2298jsirois merged 16 commits intopex-tool:mainfrom
--no-pre-install-wheels and --max-install-jobs.#2298Conversation
A new `PEX_MAX_INSTALL_JOBS` variable will use this.
This change renames types and variables to take the emphasis off installation and put it on the more generic result of a resolve (which will soon include raw `.whl` files as an option in addition to default installed wheel chroots). Also factor out the machinery needed to be able to ask `BuildAndInstall` for `.whl` file results in addition to the current installed wheel chroot results.
These will be used to select raw wheel dependencies (vs the current pre-installed wheel chroots) and the parallelism used to install dependencies at runtime upon the first PEX boot.
This will be used in subsequent changes.
This support will automatically be used as soon as resolves are wired with an option to return raw `.whl` file results.
This will be needed when building PEXes that use raw `.whl` file dists.
Use the Pip resolver to test the resulting PEX contains .whl files that are not stored in the PEX with compression.
|
I'll attach some example perf numbers for the torch 2GB case here as soon as CI goes green. Suffice to say, this is both a build time win and run time win for that case. |
|
Reviewers: Thanks in advance - this is large. I did break it up into independent commits though, none of which is larger than 550 delta-lines, with most being ~100 delta lines. So instead of ~12 PRs you get 1 with 12 commits. I needed the 1 personally to work out all the titchy details comprehensively and ensure this all worked / vet it was complicated and actually required the 2 new knobs instead of just being the new default. |
The break is only under pypy3.{7,8,9,10} and was caused by the
prompt-toolkit 3.0.42 release today.
Working through the perf analysis in pex-tool#2292 brought these to light.
|
Alright, #2292 now has perf analysis for the OP torch case. I'm still working on the 3 mac-specific IT failures, but this is ready for review. |
These were solved by switching away from the `temporary_dir` context manager to pytest's `tmpdir` fixture. Under `temporary_dir` the `~/.pex/unzipped_pexes/*/.bootstrap` symlink was not working. The mechanism by which this wasn't working still needs to be root-caused.
| if not self._pex_info.deps_are_wheel_files: | ||
| raise ResolveError( | ||
| "Cannot resolve .whl files from PEX at {pex}; its dependencies are in the " | ||
| "form of pre-installed wheel chroots.".format(pex=self.source_pex) | ||
| ) |
There was a problem hiding this comment.
In general I only like to add orthogonal features that can be combined freely. There are 3 Pex resolver sources: Pip, lock files and PEX repositories, and 2 PEX dependency formats: installed wheel chroots and (now) raw .whl files. In this 6 element matrix, only this case of resolving raw .whl files from a PEX repository containing dependencies in installed wheel chroot form cannot be satisfied currently (this is tested explicitly in the changes to test_pex_repository_resolve.py).
The pep_427.py work that now powers .whl file installs instead of pip install does make it possible to plug this gap in a later release though with some modification of pep_376.py. This will also allow the existing PEX_TOOLS=1 ./a.pex repository extract ... tool to produce fully correct wheels for the installed wheel chroot dependencies in PEXes of that style as well.
Filed: #2299
kaos
left a comment
There was a problem hiding this comment.
Not gone through all commits yet. Only been reading the code, trying to grok the change with limited knowledge of the surroundings.
| group.add_argument( | ||
| "--max-install-jobs", | ||
| dest="max_install_jobs", | ||
| default=1, |
There was a problem hiding this comment.
I'm assuming this default is to preserve behaviour; if a breaking change was possible (e.g. in 3.0.0), would you consider defaulting to -1?
There was a problem hiding this comment.
Probably not. Parallel installs slow down most cases and although -1 tries to guess the divide in the cases, I'm sure it does so poorly. My tests were on a 16 core SSD machine and I'm not confident on the heuristic for my machine let alone other disk / CPU combos.
| with identify_layout({pex!r}) as layout, atomic_directory( | ||
| {spread_dest!r}, source={source!r} | ||
| ) as spread_chroot: | ||
| if not spread_chroot.is_finalized(): | ||
| with TRACER.timed({extracting_msg!r}): | ||
| layout.extract_dist( | ||
| dest_dir=spread_chroot.work_dir, | ||
| dist_relpath={dist_relpath!r}, | ||
| is_wheel_file={is_wheel_file!r} | ||
| ) | ||
|
|
||
| symlink_dest = {symlink_dest!r} | ||
| safe_mkdir(os.path.dirname(symlink_dest)) | ||
| os.symlink({symlink_src!r}, symlink_dest) |
There was a problem hiding this comment.
Is it worth exposing the core of this logic as a function that can be used by both both _ensure_distributions_installed_serial directly and by this script?
There was a problem hiding this comment.
I'm not sure how you get much better. This code is pretty small and the sticking point is 7 parameters.
pex/layout.py
Outdated
| pass | ||
|
|
||
|
|
||
| AVERAGE_DISTRIBUTION_SIZE_PARALLEL_JOB_THRESHOLD = 1 * 1024 * 1024 # ~1MB |
There was a problem hiding this comment.
Does this number come from experiments?
There was a problem hiding this comment.
Yes. As noted elsewhere, on 1 machine and with limited cases. I'm very much not confident in the heuristic.
| attr.ib() | ||
| ) # Mapping[ProjectName, OrderedSet[Requirement]] | ||
|
|
||
| def adjust(self, distributions): |
There was a problem hiding this comment.
What's this adjusting? (I acknowledge it's just moved code.)
There was a problem hiding this comment.
The name of the extracted class is indicative here, it adjusts a distribution to note the direct top level requirement it satisfies, if any. Most dists usually don't get adjusted since they satisfy only transitive requirements. The need for associating dists to the direct requirements they satisfy goes back to handling local project directory dependencies. These have no project name known except after the resolve process which builds them into a wheel with a name and version.
| def test_entry_point_verification_3rdparty(tmpdir): | ||
| # type: (Any) -> None | ||
| @pytest.mark.parametrize( | ||
| "layout", [pytest.param(layout, id=layout.value) for layout in Layout.values()] |
There was a problem hiding this comment.
Totally minor, but are you aware that pytest.mark.parametrize supports an ids parameter that can take a function, so these could potentially be something like:
@pytest.mark.parametrize("layout", Layout.values(), ids=lambda layout: layout.value)
...There was a problem hiding this comment.
I was not aware. I'm not sure I like that though! It's not obvious the two align when you split things up like that. 10 values, 6 ids. Clearly they align in your example, but it takes a second to make sure of that.
There was a problem hiding this comment.
Oh, I read poorly. It's 1 function mapped over all the values. Gotcha.
There was a problem hiding this comment.
Yeah, in addition to the function, it also supports passing a list with exactly the right number of IDs, but I don't like that possibility either.
| pex = os.path.join(str(tmpdir), "pex") | ||
| exe = os.path.join(str(tmpdir), "exe") | ||
| with open(exe, "w") as fp: | ||
| fp.write("import colors; print(colors.blue('Moo?'))") |
There was a problem hiding this comment.
I assume the import colours is meaning to validate that ansicolors is actually installed in a useful way. Should there be an import cowsay too?
|
Alright - thanks @huonw and @kaos. I've added the perf analysis for small PEXes to the ticket which backs up what is expected: Builds of |
The
--no-pre-install-wheelsoption causes built PEXes to use raw.whlfiles. For--layout zipappthis means a single.whlfile isSTOREDper dep, and for--layout {packed,loose}this means the loose.deps/dir contains raw.whlfiles. This speeds up all PEX builds byavoiding pre-installing wheel deps (~unzipping into the
PEX_ROOT) andthen, in the case of zipapp and packed layout, re-zipping. For large
dependencies the time savings can be dramatic.
Not pre-installing wheels comes with a PEX boot cold-start performance
tradeoff since installation now needs to be done at runtime. This is
generally a penalty of O(100ms), but that penalty can be erased for some
deployment scenarios with the new
--max-install-jobsbuild option /PEX_MAX_INSTALL_JOBSruntime env var. By default, runtime installs areperformed serially, but this new option can be set to use multiple
parallel install processes, which can speed up cold boots for large
dependencies.
Fixes #2292