Skip to content

Speedup environment activation#13557

Merged
tgamblin merged 5 commits intospack:developfrom
alalazo:fixes/speedup_environment_activation
Dec 2, 2019
Merged

Speedup environment activation#13557
tgamblin merged 5 commits intospack:developfrom
alalazo:fixes/speedup_environment_activation

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Nov 4, 2019

fixes #13554

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

@adamjstewart Can you try this and report back? I'll try to see if I can improve it further in terms of performance.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

I think a refactor of _set_variables_for_single_module might also be a good thing, but given the entity of the performance bug reported in #13554 I'd be in favor of merging this as a hotfix when considered done and do the refactor without hurry in a later PR.

@adamjstewart
Copy link
Copy Markdown
Member

Here are the same benchmarks on my laptop. From my home directory:

$ time spack help
...
real	0m2.366s
user	0m0.728s
sys	0m0.298s

From the directory containing the large spack.yaml (installed environment):

$ time spack help
...
real	0m6.649s
user	0m4.290s
sys	0m1.054s

Definitely a drastic improvement over develop!

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

@adamjstewart So far I tried to optimize for your small environment. Making a small break while the big one is building 🙂

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

@adamjstewart Tried on the big environment. I might shave something more if we are fine losing the information in:

def _get_outside_caller_attributes(self):
stack = inspect.stack()
try:
_, filename, lineno, _, context, index = stack[2]
context = context[index].strip()
except Exception:
filename = 'unknown file'
lineno = 'unknown line'
context = 'unknown context'
args = {'filename': filename, 'lineno': lineno, 'context': context}
return args

in cases where we modify the environment a lot. If need be I can compute the information only in debug mode, but if the timings are good enough without this change I would keep the PR as it is. Let me know.

@adamjstewart
Copy link
Copy Markdown
Member

I'm not sure how that function is used, so I don't know if we can skip it or not.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

@adamjstewart If there are warnings on e.g. the same variable set two times in a DAG (so the first "set" won't have any effect) it will print context to tell you where both set took place. It would be good to keep it as it is, so if the performance is acceptable right now I'd avoid making any change.

@adamjstewart
Copy link
Copy Markdown
Member

Yeah, I guess it depends on how much of a difference it makes. The current PR drops us from 2 minutes to 6 seconds. If it would drop us from 6 seconds to 3 seconds, it would be worth it in my mind.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2019

The current PR drops us from 2 minutes to 6 seconds.

You didn't try the latest 2 commits 😉

@adamjstewart
Copy link
Copy Markdown
Member

Okay, with the latest commit, I'm down to 4 seconds.

Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried cherry-picking this into our new tutorial branch but it breaks the tutorial with

==> Error: name 'spack_cc' is not defined

I haven't had a chance to dig into it, but I moved it to 0.13.2 to be on the safe side.

@alalazo alalazo force-pushed the fixes/speedup_environment_activation branch from 6bfa4e2 to 0aa8f7c Compare November 11, 2019 12:21
@alalazo alalazo force-pushed the fixes/speedup_environment_activation branch from 0aa8f7c to 0cd2610 Compare November 11, 2019 15:17
Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalazo: These are the functions in packages where spack_cc is used in instance methods within packages:

$ git grep -ph  'spack_cc\b' | grep def| sort | uniq -c | sort -rn
     11     def edit(self, spec, prefix):
      8     def install(self, spec, prefix):
      6     def build(self, spec, prefix):
      4     def configure_args(self):
      3     def setup_dependent_environment(self, spack_env, run_env, dependent_spec):
      2     def setup_dependent_environment(self, *args):
      2     def make_defs(self):
      1     def setup_environment(self, spack_env, run_env):
      1     def setup_dependent_package(self, module, dependent_spec):
      1     def setup_dependent_build_environment(self, env, dependent_spec):
      1     def setup_build_environment(self, env):
      1     def patch(self):
      1     def filter_compilers(self):
      1     def cmake_args(self):
      1     def build_targets(self):
      1     def build_libsqlitefunctions(self):
      1     def build_args(self, spec, prefix):

To get git grep -p to work properly for me I had to add this to my git attributes:

$ cat ~/src/spack/.git/info/attributes
*.py diff=python

If you do that I think it's a decent way of seeing where the various module variables are used. I think we need to either:

  1. ensure that these things are available in package instance methods even when they're called outside the build, or
  2. be clearer about their availability and fix the packages.

I think for now (1) makes the most sense. Is reconstructing the compiler paths really all that slow? What's the costly part of that operation?

@alalazo alalazo force-pushed the fixes/speedup_environment_activation branch from 0cd2610 to 3dfaa89 Compare December 2, 2019 09:02
The most expensive calls seems to be the ones trying to recover
compilers and other variables that are needed only at build time. Avoid
to compute them wherever possible.
When setting up dependencies the `home` property of the Python package
might be called a large number of times. Since its computation is
expensive here we cache it based on the DAG hash of the Python spec.
@alalazo alalazo force-pushed the fixes/speedup_environment_activation branch from 3dfaa89 to d29f467 Compare December 2, 2019 14:08
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Dec 2, 2019

@tgamblin I reverted c2f6113 and implemented request number 1. Without that commit we retain most of the speedup so it makes sense also to me to look for further improvements later.

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Dec 2, 2019

@adamjstewart: how fast is this for you now?

@adamjstewart
Copy link
Copy Markdown
Member

On my laptop:

Before (develop)

From my home directory:

$ time spack help
...
real	0m2.871s
user	0m0.719s
sys	0m0.307s

From a directory containing a large spack.yaml:

$ time spack help
...
real	3m22.219s
user	3m10.927s
sys	0m3.275s

After (fixes/speedup_environment_activation)

From my home directory:

$ time spack help
...
real	0m1.972s
user	0m0.682s
sys	0m0.271s

From a directory containing a large spack.yaml:

$ time spack help
...
real	0m5.015s
user	0m3.607s
sys	0m0.612s

Let me know if you want benchmarks on Blue Waters where the problem is even more pronounced.

@tgamblin tgamblin merged commit bca59f8 into spack:develop Dec 2, 2019
tgamblin pushed a commit that referenced this pull request Dec 2, 2019
* Add a transaction around repeated calls to `spec.prefix` in the activation process
* cache the computation of home in the python package to speed up setting deps
* ensure that module-scope variables are only set *once* per module
@alalazo alalazo deleted the fixes/speedup_environment_activation branch December 2, 2019 22:18
tgamblin added a commit that referenced this pull request Dec 5, 2019
v0.13.2

This release contains major performance improvements for Spack environments, as well as some bugfixes and minor changes.

* allow missing modules if they are blacklisted (#13540)
* speed up environment activation (#13557)
* mirror path works for unknown versions (#13626)
* environments: don't try to modify run-env if a spec is not installed (#13589)
* use semicolons instead of newlines in module/python command (#13904)
* verify.py: os.path.exists exception handling (#13656)
* Document use of the maintainers field (#13479)
* bugfix with config caching (#13755)
* hwloc: added 'master' version pointing at the HEAD of the master branch (#13734)
* config option to allow gpg warning suppression (#13744)
* fix for relative symlinks when relocating binary packages (#13727)
* allow binary relocation of strings in relative binaries (#13724)
tgamblin added a commit that referenced this pull request Dec 5, 2019
v0.13.2

This release contains major performance improvements for Spack environments, as well as some bugfixes and minor changes.

* allow missing modules if they are blacklisted (#13540)
* speed up environment activation (#13557)
* mirror path works for unknown versions (#13626)
* environments: don't try to modify run-env if a spec is not installed (#13589)
* use semicolons instead of newlines in module/python command (#13904)
* verify.py: os.path.exists exception handling (#13656)
* Document use of the maintainers field (#13479)
* bugfix with config caching (#13755)
* hwloc: added 'master' version pointing at the HEAD of the master branch (#13734)
* config option to allow gpg warning suppression (#13744)
* fix for relative symlinks when relocating binary packages (#13727)
* allow binary relocation of strings in relative binaries (#13724)
samiilvonen pushed a commit to CSCfi/spack that referenced this pull request Jan 10, 2020
* Add a transaction around repeated calls to `spec.prefix` in the activation process
* cache the computation of home in the python package to speed up setting deps
* ensure that module-scope variables are only set *once* per module
tgamblin pushed a commit that referenced this pull request Aug 26, 2021
This is a direct followup to #13557 which caches additional attributes that were added in #24095 that are expensive to compute. I had to reopen #25556 in another PR to invalidate the GitLab CI cache, but see #25556 for prior discussion.

### Before

```console
$ time spack env activate .

real	2m13.037s
user	1m25.584s
sys	0m43.654s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	16m3.541s
user	10m28.892s
sys	4m57.816s
$ time spack env deactivate

real	2m30.974s
user	1m38.090s
sys	0m49.781s
```

### After
```console
$ time spack env activate .

real	0m8.937s
user	0m7.323s
sys	0m1.074s
$ time spack env view regenerate
==> Updating view at /Users/Adam/.spack/.spack-env/view

real	2m22.024s
user	1m44.739s
sys	0m30.717s
$ time spack env deactivate

real	0m10.398s
user	0m8.414s
sys	0m1.630s
```

Fixes #25555
Fixes #25541 

* Speedup environment activation, part 2
* Only query distutils a single time
* Fix KeyError bug
* Make vermin happy
* Manual memoize
* Add comment on cross-compiling
* Use platform-specific include directory
* Fix multiple bugs
* Fix python_inc discrepancy
* Fix import tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spack Environments are painfully slow

3 participants