
Speedup environment activation, part 2 #25556

Closed

adamjstewart wants to merge 10 commits into spack:develop from adamjstewart:fixes/python-env-activation

Conversation

@adamjstewart
Member

@adamjstewart adamjstewart commented Aug 22, 2021

This is a direct follow-up to #13557: it caches additional attributes, added in #24095, that are expensive to compute.

Before

$ time spack env activate .

real	2m13.037s
user	1m25.584s
sys	0m43.654s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	16m3.541s
user	10m28.892s
sys	4m57.816s
$ time spack env deactivate

real	2m30.974s
user	1m38.090s
sys	0m49.781s

After

$ time spack env activate .

real	0m8.937s
user	0m7.323s
sys	0m1.074s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	2m22.024s
user	1m44.739s
sys	0m30.717s
$ time spack env deactivate

real	0m10.398s
user	0m8.414s
sys	0m1.630s

Fixes #25555
Fixes #25541

Member

@tgamblin tgamblin left a comment


Instead of caching this by DAG hash, can you just make get_python_inc and get_python_lib delegate to a single method to get all the information you need from the underlying python and cache that?

The expensive thing here is running self.command at all. Make a function like get_python_data() that runs something like:

from distutils import sysconfig
from distutils.sysconfig import get_python_inc, get_python_lib
import json
import sys

results = [
    fn(plat_specific=ps) for fn, ps in [
        (get_python_lib, True),
        (get_python_lib, False),
        (get_python_inc, True),
        (get_python_inc, False),
    ]
]

json.dump({
    "PREFIX": sysconfig.PREFIX,
    "EXEC_PREFIX": sysconfig.EXEC_PREFIX,
    "results": results,
}, sys.stdout)

You don't even need to call this for every package in the DAG -- all distutils is doing is replacing PREFIX with your prefix (for plat_specific=False) and EXEC_PREFIX with your prefix (for plat_specific=True). I don't think the results are prefix-dependent at all.
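That substitution can be sketched in a few lines (a simplified illustration of the idea, not distutils' actual code; the `relocate` helper and the Spack prefix are hypothetical):

```python
import sys
import sysconfig


def relocate(path, old_prefix, new_prefix):
    """Swap the interpreter's own prefix for the target install prefix."""
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return path


# The default site-packages path embeds the interpreter's prefix, so the
# per-package answer is just a prefix swap -- not prefix-dependent data:
purelib = sysconfig.get_paths()["purelib"]
print(relocate(purelib, sys.prefix, "/opt/spack/python"))
```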

Mark the function that runs this @llnl.util.lang.memoized and you can avoid implementing all the caching yourself.

This approach should call self.command once for a given python package instead of O(nodes) like yours does. Python startup is very expensive so we should avoid it as much as possible.
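The single-call pattern could look roughly like this (an illustrative sketch: `functools.lru_cache` stands in for `llnl.util.lang.memoized`, and the stdlib `sysconfig` module replaces distutils, which newer Pythons have removed):

```python
import functools
import json
import subprocess
import sys

# One script, run once inside the target interpreter: query every path we
# need and return it all as a single JSON blob.
_QUERY = """\
import json, sys, sysconfig
paths = sysconfig.get_paths()
json.dump({
    "prefix": sys.prefix,
    "exec_prefix": sys.exec_prefix,
    "purelib": paths["purelib"],
    "platlib": paths["platlib"],
    "include": paths["include"],
    "platinclude": paths["platinclude"],
}, sys.stdout)
"""


@functools.lru_cache(maxsize=None)  # stand-in for @llnl.util.lang.memoized
def get_python_data(python_exe):
    """Start the interpreter once per executable and cache its answers."""
    out = subprocess.check_output([python_exe, "-c", _QUERY])
    return json.loads(out)


data = get_python_data(sys.executable)          # one interpreter startup...
assert get_python_data(sys.executable) is data  # ...and never again
print(sorted(data))
```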

@tgamblin
Member

Comment: we should really avoid calling self.command at all, as it prevents us from cross-compiling Python. I haven't reviewed the python package in a while, but most of this stuff is going to break on machines where you can't run backend binaries. So when we figure out an approach that doesn't use distutils (maybe the distutils logic isn't all that complicated? I haven't checked), we should avoid anything that has to run the built python.

@adamjstewart
Member Author

I think my latest commit is close to what @tgamblin is looking for. Unfortunately, it doesn't work and I haven't yet figured out why.

@haampie
Member

haampie commented Aug 23, 2021

@adamjstewart could you also see if https://github.com/spack/spack/commit/383cbcb0521a3c7e9168fb166ab6a6bc6cde72d5.patch changes anything for you? For me it saves another 0.4 seconds, which probably isn't really measurable in your env :p but who knows

@adamjstewart
Member Author

@haampie thanks for the suggestion, that commit does indeed help quite a bit with activation/deactivation, although less with env update:

$ time spack env activate .

real	0m29.377s
user	0m20.498s
sys	0m7.718s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	15m54.769s
user	10m23.846s
sys	4m55.371s
$ time spack env deactivate

real	0m55.149s
user	0m35.586s
sys	0m15.777s

No idea what the repercussions of that commit are though.

@haampie
Member

haampie commented Aug 23, 2021

Wait, that's 78% faster activate with only that commit and not this PR? That would make a more compelling argument for #25507 than the results I've collected there 😅

@adamjstewart
Member Author

Yes, that's only with your commit, not with this PR. No idea why the two changes don't seem to be orthogonal. Once I get this PR working again I'll try combining them.

@tgamblin
Member

tgamblin commented Aug 23, 2021

I don’t think we need the trace checking. If it speeds things up that much we should get rid of it as @haampie suggests.

That said, I wonder what the percent speedup with this PR is. #24095 added a bunch of redundancy, so I suspect the percent speedup will be smaller once this is fixed.

@alalazo: thoughts on getting rid of inspect in env modifications?

@adamjstewart
Member Author

adamjstewart commented Aug 23, 2021

@memoized doesn't seem to be memoizing. I'm guessing it's because the spec object keeps getting recreated. Going to revert to @alalazo's caching strategy from #13557.

@adamjstewart
Member Author


Or it could be that you can't stack @memoized with @property, checking now...
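One likely cause can be reproduced in a few lines (`Spec`, `python_lib`, and the DAG-hash variant are illustrative stand-ins, with `functools.lru_cache` playing the role of `@memoized`): a cache keyed on the spec object itself never hits when a fresh object is constructed for each lookup, which is why keying on the DAG hash, as in #13557, does work.

```python
import functools


class Spec:
    """Toy spec: no __eq__/__hash__, so instances hash by identity."""
    def __init__(self, name):
        self.name = name


calls = []


@functools.lru_cache(maxsize=None)  # cache key is the argument itself
def python_lib(spec):
    calls.append(spec.name)          # track real (non-cached) invocations
    return spec.name + "/lib"


# A fresh Spec is built for every lookup, so every call is a cache miss:
python_lib(Spec("python"))
python_lib(Spec("python"))
print(len(calls))  # 2 -- memoization never hit


# Keying on something stable (e.g. the DAG hash) restores the hits:
@functools.lru_cache(maxsize=None)
def python_lib_by_hash(dag_hash):
    calls.append(dag_hash)
    return dag_hash + "/lib"


python_lib_by_hash("abc123")
python_lib_by_hash("abc123")
print(len(calls))  # 3 -- the second call was served from the cache
```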

@adamjstewart
Member Author

Nope, even if I remove @property nothing gets memoized. Going to go back.

@adamjstewart
Member Author

Okay, with the latest changes we're about the same as with my first commit:

$ time spack env activate .

real	0m7.588s
user	0m6.731s
sys	0m0.803s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	2m33.586s
user	1m49.626s
sys	0m32.851s
$ time spack env deactivate

real	0m9.635s
user	0m7.581s
sys	0m1.456s

When stacked with @haampie's suggestion to remove trace checking, we get:

$ time spack env activate .

real	0m7.077s
user	0m6.362s
sys	0m0.537s
$ time spack env view regenerate

real	1m25.397s
user	1m22.537s
sys	0m1.675s
$ time spack env deactivate

real	0m7.497s
user	0m6.633s
sys	0m0.617s

@adamjstewart adamjstewart requested a review from tgamblin August 23, 2021 22:46
Member

@tgamblin tgamblin left a comment


A couple of questions.

@adamjstewart
Member Author

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Aug 24, 2021

I've started that pipeline for you!

@adamjstewart
Member Author

Does anyone have any idea why GitLab is failing? Same error message twice now, so I'm guessing this PR broke something.

@tgamblin
Member

tgamblin commented Aug 25, 2021

Here's the log for py-petsc4py -- the first error. You can browse to the artifacts and download it even though it isn't shown. I'm working on getting rid of placeholders for relocation, which seems to be what is garbling the log here.

This is the error:

In file included from src/petsc4py.PETSc.c:612:0,
                 from src/PETSc.c:4:
src/include/petsc4py/numpy.h:11:10: fatal error: numpy/arrayobject.h: No such file or directory
 #include "numpy/arrayobject.h"
          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

spack-build-out.txt

@spackbot-app

spackbot-app bot commented Aug 25, 2021

I've started that pipeline for you!

@adamjstewart adamjstewart force-pushed the fixes/python-env-activation branch from 7fb14dc to e3de17d Compare August 25, 2021 18:52
@adamjstewart
Member Author

Still a lot of placeholder stuff in the build logs. I think I have to rebase for GitLab changes to take effect. Rebasing...

@adamjstewart
Member Author

GitLab failure is coming from Cython not being found, so PYTHONPATH is messed up. Will investigate.

@adamjstewart
Member Author

@scottwittenburg GitLab CI is awesome because without it we would have merged multiple bugs that would've completely broken Spack's Python builds!

Pretty sure things are better now. Going to build my entire Python stack with this PR to make sure.

@adamjstewart
Member Author

Successfully installed an environment containing 250+ Python packages using this branch. I'm not sure why the GitLab tests are still failing, but this PR should be working now.

@scottwittenburg
Contributor

Downloaded the build output log from one of the failed builds; here's the error:

==> [2021-08-26-04:20:52.178303] py-numpy: Executing phase: 'build'
==> [2021-08-26-04:20:52.225214] '/home/software/radiuss/[padded-to-512-chars]/linux-ubuntu18.04-x86_64/gcc-7.5.0/python-3.8.11-wxjx5zfrenxak54hvd26htqqs7vfuoum/bin/python3.8' '-s' 'setup.py' '--no-user-cfg' 'build'
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_mt19937.pyx
Traceback (most recent call last):
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 59, in process_pyx
    import Cython
ModuleNotFoundError: No module named 'Cython'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 240, in <module>
    main()
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 236, in main
    find_process_files(root_dir)
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 227, in find_process_files
    process(root_dir, fromfile, tofile, function, hash_db)
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 193, in process
    processor_function(fromfile, tofile)
  File "/tmp/root/spack-stage/spack-stage-py-numpy-1.21.2-6gca4ndngenkmkkumxm7ypfm4uumzslg/spack-src/tools/cythonize.py", line 66, in process_pyx
    raise OSError(msg) from e
OSError: Cython needs to be installed in Python as a module
Running from numpy source directory.
Cythonizing sources
Traceback (most recent call last):
  File "setup.py", line 448, in <module>
    setup_package()
  File "setup.py", line 430, in setup_package
    generate_cython()
  File "setup.py", line 236, in generate_cython
    raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!

@adamjstewart
Member Author

Yep, that was happening before when I was installing Cython to the wrong location, but that should be fixed now. GitLab doesn't cache "successful" installations from previous commits, does it?

@tgamblin
Member

@scottwittenburg: I think the issue here is that @adamjstewart is making core changes to PythonPackage, but the packages are getting cached based on sub-package-specific factors. Full hash doesn't include the build system superclasses right now.

So I'm guessing that this isn't a gitlab caching issue -- it's that the full hash isn't capturing the changes.

@adamjstewart I think the thing to do here is to open one more PR from this branch, now that you're sure it works. Can you try that? The pipeline for that PR won't have full hashes cached for this work and things changed here will get a final rebuild.

We do not yet have a way to rebuild everything in a PR... but we might need that and a way to invalidate existing caches for a PR (note that the first is a much bigger hammer).

I could also "just" fix the full hash but that is probably best done in a separate PR.

@scottwittenburg
Contributor

I see. Based on that, what I could do is wipe out the per-PR mirror contents (where, as @adamjstewart suggested, successful builds from previous pushes are cached), and then we could ask spackbot to run the pipeline again. I can also remove just a subset of the binaries if you think deleting everything is too heavy-handed.

@scottwittenburg
Contributor

It would be great if we could ask spackbot to rebuild everything from source. That will require some support in spack.

@scottwittenburg
Contributor

Ah, nevermind, I see you just opened a new PR. 👍

@tgamblin
Member

@scottwittenburg looks like @adamjstewart has made a PR for this, but a spackbot command for exactly that (removing all caches for a PR) would be huge. Another command for rebuilding everything would also be good (I think that one is harder). We can wait on those this time, but note that if we enable PR graduation for PRs like this one, and their caches are actually wrong, then we'll be promoting bad binaries to the shared PR cache.

I think the right thing to do here is to make the full hash better. But I suspect we'll still want this kind of thing for real core changes -- even ones outside of the build system packages.

tgamblin pushed a commit that referenced this pull request Aug 26, 2021
This is a direct followup to #13557 which caches additional attributes that were added in #24095 that are expensive to compute. I had to reopen #25556 in another PR to invalidate the GitLab CI cache, but see #25556 for prior discussion.

### Before

```console
$ time spack env activate .

real	2m13.037s
user	1m25.584s
sys	0m43.654s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	16m3.541s
user	10m28.892s
sys	4m57.816s
$ time spack env deactivate

real	2m30.974s
user	1m38.090s
sys	0m49.781s
```

### After
```console
$ time spack env activate .

real	0m8.937s
user	0m7.323s
sys	0m1.074s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	2m22.024s
user	1m44.739s
sys	0m30.717s
$ time spack env deactivate

real	0m10.398s
user	0m8.414s
sys	0m1.630s
```

Fixes #25555
Fixes #25541 

* Speedup environment activation, part 2
* Only query distutils a single time
* Fix KeyError bug
* Make vermin happy
* Manual memoize
* Add comment on cross-compiling
* Use platform-specific include directory
* Fix multiple bugs
* Fix python_inc discrepancy
* Fix import tests
@adamjstewart adamjstewart deleted the fixes/python-env-activation branch August 26, 2021 20:51


Successfully merging this pull request may close these issues.

* Environment activation is slow
* Environment view updates are slow

5 participants