
Fix NetCDF/HDF5 dependency resolution problems#1553

Merged
tgamblin merged 2 commits into spack:develop from adamjstewart:fixes/netcdf-hdf5
Sep 8, 2016

Conversation

@adamjstewart
Member

Fixes #1552. This was causing errors like:

$ spack spec netcdf ^[email protected]
Input spec
------------------------------
  netcdf
      ^hdf5

Normalized
------------------------------
==> Error: netcdf does not depend on hdf5

and

$ spack spec netcdf+mpi ^mvapich2
Input spec
------------------------------
  netcdf+mpi
      ^mvapich2

Normalized
------------------------------
==> Error: netcdf does not depend on mvapich2

Here's my understanding of the problem. @tgamblin can correct me if I'm wrong. During normalization, variants are ignored unless explicitly stated. So netcdf+mpi ^hdf5 and netcdf~mpi ^hdf5 work but netcdf ^hdf5 doesn't. Also, versions are ignored during normalization unless explicitly stated. So [email protected]+mpi ^mvapich2 and [email protected]+mpi ^mvapich2 work but netcdf+mpi ^mvapich2 doesn't. This solution spreads all of the constraints out so that they do not interfere. Obviously dependency resolution is a little bit broken if mpi defaults to true but netcdf doesn't depend on mpi, but that's another issue that's been well documented.
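The normalization behavior described above can be made concrete with a small standalone model (illustrative only, not Spack's actual algorithm): a conditional `depends_on` clause only fires when its `when` condition matches an *explicitly stated* variant, so a bare `netcdf` matches neither the `+mpi` nor the `~mpi` clause unless an unconditional `depends_on('hdf5')` is also present.

```python
# Toy model of conditional dependencies during normalization.
# Illustrative only -- this is not Spack's real implementation.

# Clauses in the spirit of the fixed netcdf package: an unconditional
# dependency plus conditional "variant forwarding" clauses.
CLAUSES = [
    (None,   'hdf5'),      # netcdf always depends on hdf5
    ('+mpi', 'hdf5+mpi'),  # forward +mpi when explicitly requested
    ('~mpi', 'hdf5~mpi'),  # forward ~mpi when explicitly requested
]

def hdf5_constraints(stated_variants):
    """Constraints activated by the *explicitly stated* variants only."""
    return [dep for cond, dep in CLAUSES
            if cond is None or cond in stated_variants]

# With the unconditional clause, a bare 'netcdf ^hdf5' still normalizes:
print(hdf5_constraints(set()))     # ['hdf5']
print(hdf5_constraints({'+mpi'}))  # ['hdf5', 'hdf5+mpi']

# Without the (None, 'hdf5') clause, the first call would return [],
# and Spack would report "netcdf does not depend on hdf5".
```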

@davydden
Member

I was under the impression that all depends_on() calls should be mutually exclusive to work properly. I am not sure whether this will break concretization of big DAGs with netcdf.

@adamjstewart
Member Author

@davydden Not sure what you mean. Anyway, concretization was already broken for big DAGs like NetCDF. Now it is not broken. If you have anything you want me to test, let me know.

@davydden
Member

Not sure what you mean

I meant that for each version/variant combination of a package @ver+vars there should be only one depends_on(other_package, ...) which applies. But I could be wrong; perhaps @tgamblin could comment on this.

@xjrc
Member

xjrc commented Aug 18, 2016

This is pretty cool! I was under the same impression as @davydden when it came to depends_on, but I'm glad that this works because it makes writing complex/overlapping dependencies much simpler.

@adamjstewart: Just for the sake of completeness, would you mind running spack spec on all of the netcdf[+|~]mpi@[:4.4.0|4.4.1:] variant combinations and verifying that they're equivalent to what was intended by the old depends_on statements?

@adamjstewart
Member Author

Oops, just discovered a bug with my implementation. When you specify the variants or versions, everything works fine. But if you don't specify anything and don't have a packages.yaml file, you run into this error:

$ spack spec netcdf
==> Error: Invalid spec: '[email protected]%[email protected]+cxx~debug+fortran~mpi+shared~szip~threadsafe arch=linux-centos6-x86_64'. Package hdf5 requires variant +mpi, but spec asked for ~mpi

The problem is that the NetCDF +mpi variant defaults to True and the HDF5 +mpi variant defaults to False. I tried removing depends_on('hdf5~mpi', when='~mpi') but that doesn't seem to help. If I make HDF5's +mpi variant default to True, that seems to fix it. Does anyone have a problem with this solution? The only other thing I can think of is to not enforce variant forwarding, but then there would be no way to ensure that netcdf+mpi gets built with hdf5+mpi.

@davydden
Member

davydden commented Aug 18, 2016

If I make HDF5's +mpi variant default to True, that seems to fix it. Does anyone have a problem with this solution?

Fine with me. The only problem is that if somebody later changes it in hdf5, it will break netcdf.

The only other thing I can think of is to not enforce variant forwarding, but then there would be no way to ensure that netcdf+mpi gets built with hdf5+mpi.

would not do that.

I would also check that the following works:

spack spec netcdf
spack spec netcdf+mpi
spack spec netcdf~mpi
spack spec [email protected]
spack spec [email protected]~mpi
spack spec [email protected]+mpi

I am afraid that hdf5@:1.8 won't be used for 4.3.3 as it was before.

@adamjstewart
Member Author

Discovered a problem with my new solution. If I make the HDF5 +mpi variant default to true and set NetCDF to netcdf~mpi in my packages.yaml file, spack spec netcdf gives the same error message. I'm not sure exactly how to handle variant forwarding here.

@davydden
Member

Just to clarify: in the current version of the patch there is

+    depends_on('hdf5')
+
+    # Variant forwarding
+    depends_on('hdf5+mpi',  when='+mpi')
+    depends_on('hdf5~mpi',  when='~mpi')
+
+    # NetCDF 4.4.0 and prior have compatibility issues with HDF5 1.10 and later
+    # https://github.com/Unidata/netcdf-c/issues/250
+    depends_on('hdf5@:1.8', when='@:4.4.0')

So when you install [email protected]+mpi you actually have three depends_on() clauses that apply to this netcdf spec. What Spack does in this case -- no idea, but it could be nothing good...

@davydden
Member

davydden commented Aug 18, 2016

@adamjstewart Please put [WIP] in the title. Unless @becker33 or @tgamblin have an alternative solution, it looks like this patch does not fix the problem [it fixes one but introduces others].

@adamjstewart adamjstewart changed the title Fix NetCDF/HDF5 dependency resolution problems [WIP] Fix NetCDF/HDF5 dependency resolution problems Aug 18, 2016
@adamjstewart
Member Author

@davydden The version handling works fine with my changes:

$ spack spec netcdf
    [email protected]
        ^[email protected]
$ spack spec [email protected]
    [email protected]
        ^[email protected]
$ spack spec netcdf+mpi
    [email protected]%[email protected]~hdf4+mpi
        ^[email protected]%[email protected]+cxx~debug+fortran+mpi+shared~szip~threadsafe
$ spack spec netcdf~mpi
    [email protected]%[email protected]~hdf4~mpi
        ^[email protected]%[email protected]+cxx~debug+fortran~mpi+shared~szip~threadsafe
$ spack spec [email protected]~mpi
    [email protected]%[email protected]~hdf4~mpi
        ^[email protected]%[email protected]+cxx~debug+fortran~mpi+shared~szip~threadsafe
$ spack spec [email protected]+mpi
    [email protected]%[email protected]~hdf4+mpi
        ^[email protected]%[email protected]+cxx~debug+fortran+mpi+shared~szip~threadsafe

In fact, it ensures that you can't mess things up:

$ spack spec [email protected] ^[email protected]
Input spec
------------------------------
  [email protected]
      ^[email protected]

Normalized
------------------------------
==> Error: Invalid spec: '[email protected]'. Package hdf5 requires version :1.8, but spec asked for 1.10.0

The only remaining hurdle is variant forwarding. I believe this is more of a Spack-wide problem. Does anyone have any ideas as to how we can handle this?

@adamjstewart
Member Author

The way that Spack dependency resolution works is that it considers all constraints and tries to satisfy all of them. So if netcdf depends on hdf5, and netcdf+mpi depends on hdf5+mpi, and [email protected] depends on hdf5@:1.8, then Spack can easily satisfy all of those constraints. In that sense, having 5 separate depends_on statements in a package isn't a problem as long as they don't conflict.

The problem now becomes that HDF5 could try to enable MPI at the same time that NetCDF disables MPI. I'm not sure how to handle this one.
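A sketch of that clash, assuming (as described in this thread) that a dependency's default variant value carries the same weight as a value forwarded from the dependent; the function and messages here are illustrative, not Spack internals:

```python
# Illustrative sketch, not Spack internals: merging a variant value that
# netcdf forwards to hdf5 with hdf5's own default, where (as of this
# thread) both have equal precedence.

def merge_mpi(forwarded, default):
    """Merge forwarded/default values ('+' or '~'); None means unset."""
    if forwarded is None:
        return default
    if default is None or forwarded == default:
        return forwarded
    raise ValueError('hdf5 wants %smpi, but netcdf forwarded %smpi'
                     % (default, forwarded))

print(merge_mpi('+', '+'))   # '+' -- both agree
print(merge_mpi(None, '~'))  # '~' -- nothing forwarded, default wins

try:
    merge_mpi('+', '~')      # netcdf+mpi forwarding into hdf5 (~mpi default)
except ValueError as err:
    print(err)               # the kind of conflict described above
```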

@adamjstewart
Member Author

The best thing that I can think of until global variant forwarding is put in place is to remove the variant forwarding from the NetCDF packages and raise an error if someone tries to build netcdf+mpi ^hdf5~mpi, either on purpose or by accident. It's a shame that default settings have the same precedence as explicitly requested settings during concretization.

@eschnett
Contributor

If two packages want conflicting variants, and one of these is closer to the "root", then this package should take precedence. Say the user wants to install A, and A depends on B, B defaults to +mpi, and B depends on C which defaults to -mpi, then B's setting should take precedence.

@adamjstewart
Member Author

@eschnett Should as in "in an ideal world" or should as in "that's the way Spack should currently work"?

@eschnett
Contributor

@adamjstewart "in an ideal world", as in "this seems to be a reasonable way to resolve such a conflict that will work in many cases".

Of course, another reasonable way is "detect this, output an error, ask the user to make an additional choice". Heeding this choice is something that should already be happening...

@adamjstewart
Member Author

Alright, I removed variant forwarding and made the HDF5 +mpi variant default to True. Now, netcdf+mpi depends on hdf5+mpi and netcdf~mpi also depends on hdf5+mpi. If you try to explicitly build netcdf+mpi ^hdf5~mpi, spack spec won't complain, but the install will fail with that error message. I don't see a better way to solve this problem until variant forwarding works properly.
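The install-time guard can be sketched as follows. The stand-in Spec class and the error message are assumptions made so the check runs outside Spack; the real check lives in netcdf's install() and uses Spack's own spec objects and satisfies():

```python
# Runnable sketch of the install-time guard. The Spec stand-in and the
# error message are assumptions; the real package performs this check in
# install() using Spack's own spec objects.

class Spec:
    """Minimal stand-in: netcdf's variants plus its hdf5 dependency's."""
    def __init__(self, variants, hdf5_variants):
        self.variants = variants              # e.g. {'+mpi'}
        self.hdf5_variants = hdf5_variants

    def __contains__(self, token):            # supports: '+mpi' in spec
        return token in self.variants

    def satisfies(self, query):               # only the query the guard uses
        return query == '^hdf5~mpi' and '~mpi' in self.hdf5_variants

def check_mpi(spec):
    # Workaround until variant forwarding works properly: fail fast
    # rather than dying later with an obscure compile error.
    if '+mpi' in spec and spec.satisfies('^hdf5~mpi'):
        raise RuntimeError('netcdf+mpi cannot be built against hdf5~mpi')

check_mpi(Spec({'+mpi'}, {'+mpi'}))   # OK: consistent +mpi
check_mpi(Spec({'~mpi'}, {'~mpi'}))   # OK: serial throughout

try:
    check_mpi(Spec({'+mpi'}, {'~mpi'}))   # the netcdf+mpi ^hdf5~mpi case
except RuntimeError as err:
    print(err)
```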

@adamjstewart adamjstewart changed the title [WIP] Fix NetCDF/HDF5 dependency resolution problems Fix NetCDF/HDF5 dependency resolution problems Aug 18, 2016
@davydden
Member

davydden commented Aug 18, 2016

netcdf~mpi also depends on hdf5+mpi

netcdf~mpi^hdf5~mpi works?

nevermind, i see that it will from the code.

     variant('fortran', default=True, description='Enable Fortran support')

-    variant('mpi', default=False, description='Enable MPI support')
+    variant('mpi', default=True, description='Enable MPI support')
Contributor

Instead of modifying HDF5, can you modify NetCDF to have ~mpi as default?

Member Author

I could. But shouldn't all packages have mpi default to true? Spack is a Supercomputer PACKage manager after all.

Member Author

For comparison, the following packages default to +mpi:

  • bertini, cp2k, dakota, dealii, elk, espresso, fenics, gromacs, hmmer, hoomd-blue, intel-parallel-studio, kripke, llvm, meep, mumps, netcdf, paradiseo, petsc, plumed, py-meep, sundials, swiftsim, tau, valgrind

and the following packages default to ~mpi:

  • arpack-ng, caliper, fftw, gmsh, hdf5, hpx5, mfem, paraview, pgi, py-h5py, scotch, turbomole, zoltan

Member

I agree with @adamjstewart that as many things as possible should be +mpi by default.

@adamjstewart
Member Author

Ping @tgamblin

@tgamblin tgamblin merged commit 3d3d65f into spack:develop Sep 8, 2016
@eschnett
Contributor

Next time before changing the default setting of a package, we should check that all dependents of that package still build.

Building HDF5 with MPI means that all dependents of HDF5 need to use mpicc instead of cc. Packages that don't expect this are now broken. In other words, changing the default exposed latent bugs in these packages' dependencies.

This is the list of packages that depend on HDF5, and which don't declare whether they want MPI. I assume some of them are broken now. At least visit is; a plain spack install visit does not work.

adios/package.py:73:    depends_on('hdf5', when='+hdf5')
armadillo/package.py:46:    depends_on('hdf5', when='+hdf5')
cgns/package.py:41:    depends_on('hdf5', when='+hdf5')
fenics/package.py:77:    depends_on('hdf5', when='+hdf5')
gdal/package.py:55:    depends_on("hdf5", when='+hdf5')
gmsh/package.py:67:    depends_on('hdf5', when='+hdf5')
hdf5-blosc/package.py:58:    depends_on("hdf5")
kealib/package.py:49:    depends_on("hdf5")
libsplash/package.py:48:    depends_on('[email protected]:')
netcdf/package.py:53:    depends_on('hdf5')
octave/package.py:89:    depends_on('hdf5',         when='+hdf5')
py-pytables/package.py:37:    depends_on('hdf5')
silo/package.py:45:    depends_on("hdf5 @:1.8.12")
visit/package.py:42:    depends_on("hdf5")

@ax3l
Member

ax3l commented Sep 13, 2016

Next time before changing the default setting of a package, we should check that all dependents of that package still build.

CI or a simple script everyone can run? :)

@ax3l
Member

ax3l commented Sep 13, 2016

Building HDF5 with MPI means that all dependents of HDF5 need to use mpicc instead of cc.

Actually not: nearly every CMake project does not use the wrappers, but instead adds the MPI compile and linker flags "manually" from the package config.

Also, nobody with a complex application uses the h5cc or h5pcc.openmpi wrappers (how could anyone? One cannot use mpicc, h5cc, scorep cc, etc. at the same time...) :)

@ax3l
Member

ax3l commented Sep 13, 2016

adios/package.py:73: depends_on('hdf5', when='+hdf5')

updated in #1699 (please merge)

libsplash/package.py:48: depends_on('[email protected]:')

libsplash actually builds with both a serial (any) and an explicitly parallel HDF5 variant:

    depends_on('[email protected]:')
    depends_on('hdf5+mpi', when='+mpi')

see version dependency problems with spack spec HDF5 in #1727 (comment) and following comments.

@ax3l
Member

ax3l commented Sep 13, 2016

This is the list of packages that depend on HDF5, and which don't declare whether they want MPI. I assume some of them are broken now.

That just means they do not care if MPI support in HDF5 is provided or not. Those packages declare they work with both variants of HDF5.

@eschnett
Contributor

@ax3l I know what it means if a package doesn't declare whether they support the MPI variant of HDF5 or not. The point is that this change uncovered latent bugs since at least some of these packages break if it is enabled. Proper testing would have discovered this.

@ax3l
Member

ax3l commented Sep 13, 2016

Yes, that would be wonderful. Do you have an idea how to establish a workflow for such changes? Maybe automated nightly builds of all passing PRs, as in Kitware's next branches?
Building the stack mentioned above from scratch (e.g., VTK->Visit) takes GBytes of HDD and hours of compiling :)

@eschnett
Contributor

Yes, I do. See https://github.com/eschnett/spack-test.

In this case, the committer could have kicked off a build on a particular system where Spack is regularly tested. (I don't know whether such a system exists, but arguably there should be one.) I assume that each package can be built via a simple spack install X, and we could have run such tests. If that delays progress too much, then we can commit optimistically, and then revert when builds break.

If there is no such system, then we could ask XSEDE or NERSC for access to a system to run the tests.


def install(self, spec, prefix):
    # Workaround until variant forwarding works properly
    if '+mpi' in spec and spec.satisfies('^hdf5~mpi'):
Member

Should this read instead:

 if '+mpi' in spec and not spec.satisfies('^hdf5+mpi'):

???

Member Author

Those are equivalent?

@skosukhin skosukhin mentioned this pull request Oct 19, 2017
@adamjstewart adamjstewart mentioned this pull request Apr 17, 2018
