Fix NetCDF/HDF5 dependency resolution problems#1553
Conversation
|
i was under impression that all |
|
@davydden Not sure what you mean. Anyway, concretization was already broken for big DAGs like NetCDF. Now it is not broken. If you have anything you want me to test, let me know. |
i meant that for each version/variant of a package |
|
This is pretty cool! I was under the same impression as @davydden when it came to @adamjstewart: Just for the sake of completeness, would you mind running |
|
Oops, just discovered a bug with my implementation. When you specify the variants or versions, everything works fine. But if you don't specify anything and don't have a The problem is that the NetCDF +mpi variant defaults to True and the HDF5 +mpi variant defaults to False. I tried removing |
fine with me. The only problem is if somebody later change it in hdf5 it will break netcdf.
would not do that. I would also check that the following works: I am afraid that |
|
Discovered a problem with my new solution. If I make the HDF5 +mpi variant default to true and set NetCDF to netcdf~mpi in my packages.yaml file, |
|
just clarify, in the current version of patch there is So when you install |
|
@adamjstewart pls, put |
|
@davydden The version handling works fine with my changes: In fact, it ensures that you can't fuck up: The only remaining hurdle is variant forwarding. I believe this is more of a Spack-wide problem. Does anyone have any ideas as to how we can handle this? |
|
The way that Spack dependency resolution works is that it considers all constraints and tries to satisfy all of them. So if netcdf depends on hdf5, and netcdf+mpi depends on hdf5+mpi, and [email protected] depends on hdf5@:1.8, then Spack can easily satisfy all of those constraints. In that sense, having 5 separate depends_on statements in a package isn't a problem as long as they don't conflict. The problem now becomes that HDF5 could try to enable MPI at the same time that NetCDF disables MPI. I'm not sure how to handle this one. |
|
The best thing that I can think of until global variant forwarding is put in place is to remove the variant forwarding from the NetCDF packages and raise an error message if someone tries to build |
|
If two packages want conflicting variants, and one of these is closer to the "root", then this package should take precedence. Say the user wants to install A, and A depends on B, B defaults to |
|
@eschnett Should as in "in an ideal world" or should as in "that's the way Spack should currently work"? |
|
@adamjstewart "in an ideal world", as in "this seems to be a reasonable way to resolve such a conflict that will work in many cases". Of course, another reasonable way is "detect this, output an error, ask the user to make an additional choice". Heeding this choice is something that should already be happening... |
|
Alright, I removed variant forwarding and made the HDF5 +mpi variant default to True. Now, |
nevermind, i see that it will from the code. |
| variant('fortran', default=True, description='Enable Fortran support') | ||
|
|
||
| variant('mpi', default=False, description='Enable MPI support') | ||
| variant('mpi', default=True, description='Enable MPI support') |
There was a problem hiding this comment.
Instead of modifying HDF5, can you modify NetCDF to have ~mpi as default?
There was a problem hiding this comment.
I could. But shouldn't all packages have mpi default to true? Spack is a Supercomputer PACKage manager after all.
There was a problem hiding this comment.
For comparison, the following packages default to +mpi:
- bertini, cp2k, dakota, dealii, elk, espresso, fenics, gromacs, hmmer, hoomd-blue, intel-parallel-studio, kripke, llvm, meep, mumps, netcdf, paradiseo, petsc, plumed, py-meep, sundials, swiftsim, tau, valgrind
and the following packages default to ~mpi:
- arpack-ng, caliper, fftw, gmsh, hdf5, hpx5, mfem, paraview, pgi, py-h5py, scotch, turbomole, zoltan
There was a problem hiding this comment.
i agree with @adamjstewart that as many things as possible should be +mpi by default.
|
Ping @tgamblin |
|
Next time before changing the default setting of a package, we should check that all dependents of that package still build. Building HDF5 with MPI means that all dependents of HDF5 need to use This is the list of packages that depend on HDF5, and which don't declare whether they want MPI. I assume some of them are broken now. At least |
CI or a simple script everyone can run? :) |
actually not, nearly every CMake project does not use the wrappers but adds the mpi compile and linker flags "manually" from the package config. also, nobody with a complex application uses the |
updated in #1699 (pls merge)
actually builds with a serial (any) and a (explicitly) parallel HDF5 variant see version dependency problems with |
That just means they do not care if MPI support in HDF5 is provided or not. Those packages declare they work with both variants of HDF5. |
|
@ax3l I know what it means if a package doesn't declare whether they support the MPI variant of HDF5 or not. The point is that this change uncovered latent bugs since at least some of these packages break if it is enabled. Proper testing would have discovered this. |
|
Yes that would be wonderful. Do you have an idea how establish a workflow for such changes? Maybe automated nightly builds of all passing PRs as in kitware's |
|
Yes, I do. See https://github.com/eschnett/spack-test. In this case, the committer could have kicked off a build on a particular system where Spack is regularly tested. (I don't know whether such a system exists, but arguably there should be one.) I assume that each package can be built via a simple If there is no such system, then we could ask XSEDE or NERSC for access to a system to run the tests. |
|
|
||
| def install(self, spec, prefix): | ||
| # Workaround until variant forwarding works properly | ||
| if '+mpi' in spec and spec.satisfies('^hdf5~mpi'): |
There was a problem hiding this comment.
Should this read instead:
if '+mpi' in spec and not spec.satisfies('^hdf5+mpi'):
???
There was a problem hiding this comment.
Those are equivalent?
Fixes #1552. This was causing errors like:
and
Here's my understanding of the problem. @tgamblin can correct me if I'm wrong. During normalization, variants are ignored unless explicitly stated. So
netcdf+mpi ^hdf5andnetcdf~mpi ^hdf5work butnetcdf ^hdf5doesn't. Also, versions are ignored during normalization unless explicitly stated. So[email protected]+mpi ^mvapich2and[email protected]+mpi ^mvapich2work butnetcdf+mpi ^mvapich2doesn't. This solution spreads all of the constraints out so that they do not interfere. Obviously dependency resolution is a little bit broken if mpi defaults to true but netcdf doesn't depend on mpi, but that's another issue that's been well documented.