Skip to content

performance: add a verification file to the database#14693

Merged
tgamblin merged 6 commits intospack:developfrom
aweits:performance
Mar 21, 2020
Merged

performance: add a verification file to the database#14693
tgamblin merged 6 commits intospack:developfrom
aweits:performance

Conversation

@aweits
Copy link
Copy Markdown
Contributor

@aweits aweits commented Jan 30, 2020

fixes #15030
fixes #15040

Add a file containing a unique uuid that is regenerated at database
write time. Use this uuid to suppress re-parsing the database
contents if we know a previous uuid and the uuid has not changed.

@tgamblin
Copy link
Copy Markdown
Member

@aweits: cool! Do you have numbers on where/how much this improves things?

@aweits
Copy link
Copy Markdown
Contributor Author

aweits commented Jan 30, 2020

pseudo-objectively: 'time spack install gmake'

without:
29.898u 19.417s 0:37.82 130.3% 0+0k 11096+66296io 28pf+0w

with:
18.767u 19.197s 0:26.33 144.1% 0+0k 64+67064io 0pf+0w

But the subjective experience of watching the display after it's in place should make you very happy :-)

@aweits
Copy link
Copy Markdown
Contributor Author

aweits commented Jan 30, 2020

This probably highlights it better:

without:

coltrane (UPSTREAMSPACK/spack):/bin/time spack --timestamp install gmake
==> [2020-01-30-15:51:52.702149] libiconv is already installed in /home/aweits/GIT/UPSTREAMSPACK/spack/opt/spack/linux-centos7-ivybridge/gcc-7.4.0/libiconv-1.16-squ24c2ertryh2g6a7ikdespaywxeuki
...
==> [2020-01-30-15:52:02.436669] Installing gmake

29.45user 19.45system 0:37.32elapsed 131%CPU (0avgtext+0avgdata 47048maxresident)k
16inputs+65816outputs (0major+1195819minor)pagefaults 0swaps

with:

coltrane (UPSTREAMSPACK/spack):/bin/time spack --timestamp install gmake
==> [2020-01-30-15:53:46.201976] libiconv is already installed in /home/aweits/GIT/UPSTREAMSPACK/spack/opt/spack/linux-centos7-ivybridge/gcc-7.4.0/libiconv-1.16-squ24c2ertryh2g6a7ikdespaywxeuki
...
==> [2020-01-30-15:53:46.537670] Installing gmake

18.83user 19.41system 0:26.62elapsed 143%CPU (0avgtext+0avgdata 46568maxresident)k
40inputs+67040outputs (0major+1185091minor)pagefaults 0swaps

@aweits
Copy link
Copy Markdown
Contributor Author

aweits commented Feb 10, 2020

@tgamblin, this should fix the installation performance issue that @adamjstewart mentioned in #14190. This really cuts down on the time that spack spends finding packages that are already installed.

@adamjstewart
Copy link
Copy Markdown
Member

adamjstewart commented Feb 18, 2020

@tgamblin I tested this PR myself. I created a test package that looks like:

class Example(Package):                                                         
    url = "http://zlib.net/fossils/zlib-1.2.11.tar.gz"                          
                                                                                
    version('1.2.11', sha256='c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1')
                                                                                
    depends_on('python')                                                        
                                                                                
    def install(self, spec, prefix):                                            
        touch(prefix.foo)

On develop, on my circa-2013 MBP:

$ time spack install example
==> Warning: [email protected] cannot build optimized binaries for "ivybridge". Using best target possible: "x86_64"
==> libiconv is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libiconv-1.16-gj3jnj46xug7rpjxwaepe6uwx35jdphz
==> diffutils is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/diffutils-3.7-4irrvtxj6myrrxufikynf3myj2u3plor
==> bzip2 is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/bzip2-1.0.8-boqgsigbjrp3xfvcfphkqhsexqrw2iwd
==> expat is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/expat-2.2.9-ctgvqggynhtr2zdidy2f64gk5vgt6jpg
==> pkgconf is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/pkgconf-1.6.3-t64p5ucui6qgo6tepzu44q3hrgpvabrx
==> ncurses is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/ncurses-6.1-jdnritbfxcmk4kfhfsw7d3xuaeluqqvw
==> readline is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/readline-8.0-n4beggn2l64dzrf6elkjcixrklp46ghh
==> gdbm is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/gdbm-1.18.1-jxyriuyf6a477imqrwkmwbanpy6ngoex
==> xz is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/xz-5.2.4-nahnpskrw2jjhtsviwietm4qcs5sa7oe
==> zlib is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/zlib-1.2.11-cvrek7vtvob6v2dfnvdowu2rkqtqtmmi
==> libxml2 is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libxml2-2.9.9-mjxx7gnpw6wvpbzll2dl3n5dgsbbxd6y
==> tar is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/tar-1.32-lm2b3otglbcasdxrqqpen6lu6x3tiyr4
==> gettext is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/gettext-0.20.1-qbrme2sb4pm3mglxuacgwdxfba464ych
==> libffi is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libffi-3.2.1-2mukefjdfv5eb2oyrsawyoqdkh2zndsd
==> perl is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/perl-5.30.1-u3gpsj7ovsgyd43ii4iawwkszuxgyjna
==> openssl is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/openssl-1.1.1d-axb6agqybj5ykz6tsroiirkfk2327l2h
==> sqlite is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/sqlite-3.30.1-gnvq3bbsxwzlkbi2cgsgtfhypc7afpo2
==> python is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/python-3.7.6-3gaqcongtvfw2l553jlr5lz3pug2z27w
==> Installing example
==> Searching for binary cache of example
==> No binary for example found: installing from source
==> Warning: microarchitecture specific optimizations are not supported yet on mixed compiler toolchains [check [email protected] for further details]
==> Using cached archive: /Users/Adam/spack/var/spack/cache/_source-cache/archive/c3/c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1.tar.gz
==> Staging archive: /var/folders/21/hwq39zyj4g36x6zjfyl5l8080000gn/T/Adam/spack-stage/spack-stage-example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur/zlib-1.2.11.tar.gz
==> Created stage in /var/folders/21/hwq39zyj4g36x6zjfyl5l8080000gn/T/Adam/spack-stage/spack-stage-example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur
==> No patches needed for example
==> Building example [Package]
==> Executing phase: 'install'
==> Successfully installed example
  Fetch: 0.01s.  Build: 0.76s.  Total: 0.77s.
[+] /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur

real	1m36.808s
user	1m33.575s
sys	0m1.570s

On this branch:

$ time spack install example
==> Warning: [email protected] cannot build optimized binaries for "ivybridge". Using best target possible: "x86_64"
==> libiconv is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libiconv-1.16-gj3jnj46xug7rpjxwaepe6uwx35jdphz
==> diffutils is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/diffutils-3.7-4irrvtxj6myrrxufikynf3myj2u3plor
==> bzip2 is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/bzip2-1.0.8-boqgsigbjrp3xfvcfphkqhsexqrw2iwd
==> expat is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/expat-2.2.9-ctgvqggynhtr2zdidy2f64gk5vgt6jpg
==> pkgconf is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/pkgconf-1.6.3-t64p5ucui6qgo6tepzu44q3hrgpvabrx
==> ncurses is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/ncurses-6.1-jdnritbfxcmk4kfhfsw7d3xuaeluqqvw
==> readline is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/readline-8.0-n4beggn2l64dzrf6elkjcixrklp46ghh
==> gdbm is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/gdbm-1.18.1-jxyriuyf6a477imqrwkmwbanpy6ngoex
==> xz is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/xz-5.2.4-nahnpskrw2jjhtsviwietm4qcs5sa7oe
==> zlib is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/zlib-1.2.11-cvrek7vtvob6v2dfnvdowu2rkqtqtmmi
==> libxml2 is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libxml2-2.9.9-mjxx7gnpw6wvpbzll2dl3n5dgsbbxd6y
==> tar is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/tar-1.32-lm2b3otglbcasdxrqqpen6lu6x3tiyr4
==> gettext is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/gettext-0.20.1-qbrme2sb4pm3mglxuacgwdxfba464ych
==> libffi is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/libffi-3.2.1-2mukefjdfv5eb2oyrsawyoqdkh2zndsd
==> perl is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/perl-5.30.1-u3gpsj7ovsgyd43ii4iawwkszuxgyjna
==> openssl is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/openssl-1.1.1d-axb6agqybj5ykz6tsroiirkfk2327l2h
==> sqlite is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/sqlite-3.30.1-gnvq3bbsxwzlkbi2cgsgtfhypc7afpo2
==> python is already installed in /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/python-3.7.6-3gaqcongtvfw2l553jlr5lz3pug2z27w
==> Installing example
==> Searching for binary cache of example
==> No binary for example found: installing from source
==> Warning: microarchitecture specific optimizations are not supported yet on mixed compiler toolchains [check [email protected] for further details]
==> Using cached archive: /Users/Adam/spack/var/spack/cache/_source-cache/archive/c3/c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1.tar.gz
==> Staging archive: /var/folders/21/hwq39zyj4g36x6zjfyl5l8080000gn/T/Adam/spack-stage/spack-stage-example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur/zlib-1.2.11.tar.gz
==> Created stage in /var/folders/21/hwq39zyj4g36x6zjfyl5l8080000gn/T/Adam/spack-stage/spack-stage-example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur
==> No patches needed for example
==> Building example [Package]
==> Executing phase: 'install'
==> Successfully installed example
  Fetch: 0.01s.  Build: 0.81s.  Total: 0.82s.
[+] /Users/Adam/spack/opt/spack/darwin-catalina-x86_64/clang-11.0.0-apple/example-1.2.11-zz32i2nau2jmxnlmkp7jhavzdnjqnvur

real	0m5.247s
user	0m3.528s
sys	0m0.847s

Yes, that's an 18x speedup. The effect is much more pronounced for packages with a lot of dependencies.

But the subjective experience of watching the display after it's in place should make you very happy :-)

Can confirm! Thanks @aweits!!!

Copy link
Copy Markdown
Member

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside of this PR is that it requires Python built with UUID support. In fact, Spack's Python package defaults to ~uuid. Strangely enough, I'm using a Spack-build python~uuid, but UUID support actually works fine on macOS. At the very least, we should change the default to True. We should also see if this will break Spack for python~uuid users on Linux before merging.

@adamjstewart
Copy link
Copy Markdown
Member

Okay, tested this on Linux. As long as UUID is installed on the OS (which is the case on my cluster), then python~uuid still has UUID support. I still wonder if there's a way to do this same thing without UUID though. I don't want to break Spack for anyone...

@adamjstewart
Copy link
Copy Markdown
Member

Possibly important information for validating this performance improvement: my Spack database currently has 834 installed packages, most of which are duplicates due to unintended hash changes. Can't wait for the new concretizer and #14695

@sethrj
Copy link
Copy Markdown
Contributor

sethrj commented Feb 18, 2020

@adamjstewart The "uuid package installed when ~uuid" issue is because every system seems to have a "python-compatible" libuuid that gets automatically picked up by the python configure, even when ~uuid is set on spack.

In fact I find that including libuuid in my views typically breaks downstream installations that link against system libraries as well, e.g.:

/usr/lib64/libSM.so.6: undefined reference to `uuid_unparse_lower@UUID_1.0'
/usr/lib64/libSM.so.6: undefined reference to `uuid_generate@UUID_1.0'
collect2: error: ld returned 1 exit status

See also #12166

@glennpj
Copy link
Copy Markdown
Contributor

glennpj commented Feb 19, 2020

This PR will help with module regeneration as well, as brought up in #15030.
With 130 packages,
Develop branch:

$ time spack module tcl refresh --delete-tree --yes-to-all 
==> Regenerating tcl module files
==> Warning: Quotes in command arguments can confuse scripts like configure.
  The following arguments may cause problems when executed:
      source /opt/packages/gpjohnsn/spack/opt/spack/linux-centos7-broadwell/intel-19.0.5.281/intel-mkl-2019.5.281-vd3jkq7wb2bzhwj7lmrzv64salew2pru/compilers_and_libraries_2019.5.281/linux/mkl/bin/mklvars.sh intel64 &> /dev/null && python3 -c "import os, json; print(json.dumps(dict(os.environ)))"
  Quotes aren't needed because spack doesn't use a shell.
  Consider removing them

real    0m47.978s
user    0m47.515s
sys     0m0.476s

With this PR:

$ time spack module tcl refresh --delete-tree --yes-to-all 
==> Regenerating tcl module files
==> Warning: Quotes in command arguments can confuse scripts like configure.
  The following arguments may cause problems when executed:
      source /opt/packages/gpjohnsn/spack/opt/spack/linux-centos7-broadwell/intel-19.0.5.281/intel-mkl-2019.5.281-vd3jkq7wb2bzhwj7lmrzv64salew2pru/compilers_and_libraries_2019.5.281/linux/mkl/bin/mklvars.sh intel64 &> /dev/null && python3 -c "import os, json; print(json.dumps(dict(os.environ)))"
  Quotes aren't needed because spack doesn't use a shell.
  Consider removing them

real    0m5.409s
user    0m5.107s
sys     0m0.305s

That is ~9x faster.

@glennpj
Copy link
Copy Markdown
Contributor

glennpj commented Feb 19, 2020

On the full stack from #15030, with 1825 packages, the speed up was even greater.

real    2m34.153s
user    2m0.396s
sys     0m3.725s

I do not have the exact time from the original but it was about 9.5 hours, so
~222x speedup.

Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably put a try/except gate around the import of uuid. That way we don't break anything for users who don't have uuid available. Then we can just write the empty string as the verifier if we don't have uuid available (and lose the performance benefits in that case, but without causing errors).

@adamjstewart
Copy link
Copy Markdown
Member

Ping @aweits. Can you rebase and address the above change request?

@aweits
Copy link
Copy Markdown
Contributor Author

aweits commented Mar 20, 2020

@adamjstewart, @becker33, will do - things have been challenging lately.

aweits added 5 commits March 20, 2020 15:31
Add a file containing a unique uuid that is regenerated at database
write time. Use this uuid to suppress re-parsing the database
contents if we know a previous uuid and the uuid has not changed.
the tests reset the last seen verifier in between tests that use the
database fixture.
@tgamblin tgamblin merged commit e4d2250 into spack:develop Mar 21, 2020
@tgamblin
Copy link
Copy Markdown
Member

Thanks @aweits! This is in and we'll backport it to 0.14 in v0.14.2.

with open(self._verifier_path, 'w') as f:
new_verifier = str(uuid.uuid4())
f.write(new_verifier)
self.last_seen_verifier = new_verifier
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamjstewart @tgamblin @aweits Apparently when we reach this line we don't update self._data and we have therefore an inconsistent view of the DB later on (since some specs might be referencing roots that have been removed). Removing this line seems to fix #15773. I'll check if we can come up with a more performant update of self._data than re-reading from file.

alalazo added a commit to alalazo/spack that referenced this pull request Mar 31, 2020
fixes spack#15773

The performance improvements done in spack#14693 where leaving
the DB in an inconsistent state when specs were removed
from it. This PR updates the DB internal state whenever
it is written to file, but does that using a dictionary
that still resides in memory to maintain the performance
gain of spack#14693
alalazo added a commit to alalazo/spack that referenced this pull request Apr 1, 2020
fixes spack#15773

The performance improvements done in spack#14693 where leaving
the DB in an inconsistent state when specs were removed
from it. This PR updates the DB internal state whenever
it is written to file, but does that using a dictionary
that still resides in memory to maintain the performance
gain of spack#14693
tgamblin pushed a commit that referenced this pull request Apr 12, 2020
The performance improvements done in #14693 where leaving the DB in an inconsistent state when specs were removed from it. This PR updates the DB internal state whenever the DB is written to a file.

Note that we still cannot properly enumerate installed dependents, so there is a TODO in this code. Fixing that will require the dependents dictionaries in specs to be re-keyed (either by hash, or not keyed at all -- a list would do).  See #11983 for details.
tgamblin pushed a commit that referenced this pull request Apr 15, 2020
Reading the database repeatedly can be quite slow.  We need a way to speed
up Spack when it reads the DB multiple times, but the DB has not been
modified between reads (which is nearly all the time).

- [x] Add a file containing a unique uuid that is regenerated at database
    write time. Use this uuid to suppress re-parsing the database
    contents if we know a previous uuid and the uuid has not changed.

- [x] Fix mutable_database fixture so that it resets the last seen
    verifier when it resets.

- [x] Enable not rereading the database immediately after a write. Make
    the tests reset the last seen verifier in between tests that use the
    database fixture.

- [x] make presence of uuid module optional
tgamblin pushed a commit that referenced this pull request Apr 15, 2020
The performance improvements done in #14693 where leaving the DB in an inconsistent state when specs were removed from it. This PR updates the DB internal state whenever the DB is written to a file.

Note that we still cannot properly enumerate installed dependents, so there is a TODO in this code. Fixing that will require the dependents dictionaries in specs to be re-keyed (either by hash, or not keyed at all -- a list would do).  See #11983 for details.
@aweits aweits deleted the performance branch February 12, 2021 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Very slow environment concretization module refresh is really slow

8 participants