Skip to content

Extend packages.yaml to allow custom attributes#16526

Closed
alalazo wants to merge 59 commits intospack:developfrom
alalazo:packages/gcc_detection
Closed

Extend packages.yaml to allow custom attributes#16526
alalazo wants to merge 59 commits intospack:developfrom
alalazo:packages/gcc_detection

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented May 8, 2020

fixes #3181 (either that or this issue was already fixed by #15158)

Modifications:

  • The schema for packages.yaml has been updated. Old configuration files are not allowed anymore and Spack will error out in their presence. The new schema permits to store arbitrary attributes for each external detected.
  • The detection mechanism has been extended to account for these custom attributes. A validation step is now in place to ensure the detected specs are in a proper state before writing them to packages.yaml. All specs that are detected must be created with Spec.from_detection.
  • The GCC package can now be detected automatically
  • A command to update existing configuration files to the new format has been added
  • A command to revert configuration updates has been added too
  • All the occurences of packages.yaml displayed in the docs have been updated
  • Added a decorator to mark detectable packages and a command to list them
  • Documentation for external spec detection has been extended

@alalazo alalazo requested review from scheibelp and tgamblin May 8, 2020 12:30
@alalazo alalazo force-pushed the packages/gcc_detection branch from 9a2eb15 to 5cfc5b7 Compare May 8, 2020 12:35
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 8, 2020

With the current state of things I can detect:

packages:
  gcc:
    paths:
      [email protected] languages=c,c++,fortran: /usr
      [email protected] languages=c,c++,fortran: /usr
      [email protected] languages=c,c++,fortran: /usr
      [email protected] languages=c: /usr
      [email protected] languages=c,c++,fortran: /usr
      [email protected] languages=c,c++,fortran: /usr
    buildable: false
    version: []
    target: []
    compiler: []
    modules: {}
    providers: {}

As we were discussing, having only the prefix as an information in packages.yaml is limiting since we don't keep track of other aspects that might be relevant for the package. What do you think if we could have a structure like:

packages:
  gcc:
    externals:
    - spec: '[email protected] languages=c,c++,fortran'
      prefix: /usr
      compilers:
        c: /usr/bin/gcc-4.8
        c++: /usr/bin/g++-4.8
        fortran: /usr/bin/gfortran-4.8
      # Here we should put other relevant information, for instance
      # on host and target for cross-compilation
    - spec: '[email protected] languages=c,c++,fortran'
      prefix: /usr
      compilers:
        c: /usr/bin/x86_64-linux-gnu-gcc-9
        c++: /usr/bin/x86_64-linux-gnu-g++-9
        fortran: /usr/bin/x86_64-linux-gnu-gfortran-9
    buildable: false
    version: []
    target: []
    compiler: []
    modules: {}
    providers: {}

with a few mandatory fields for every package (e.g. spec and prefix) and other fields that are instead package specific? @scheibelp is this similar to what you had in mind?

@alalazo alalazo force-pushed the packages/gcc_detection branch 2 times, most recently from 15afff1 to 0c1aa6e Compare May 13, 2020 15:43
@scheibelp
Copy link
Copy Markdown
Member

Regarding #16526 (comment)

a few mandatory fields for every package (e.g. spec and prefix) and other fields that are instead package specific? @scheibelp is this similar to what you had in mind?

This looks excellent to me. It looks like this doesn't generate customized yaml yet - let me know if you want to discuss how to do that.

@tgamblin
Copy link
Copy Markdown
Member

Question: is there any reason not to make the paths to executables string-valued variants? They can still be accessible as direct attributes on the spec, but this would allow us to store better provenance (e.g., in lockfiles) and to track configuration.

There are other attributes from compilers.yaml that should likely not be variants (extra RPATHs, modules, etc.), but the executable names seem like they should be exposed as part of the package -- i.e. they describe what the package is, not something we would configure about it.

@scheibelp
Copy link
Copy Markdown
Member

Question: is there any reason not to make the paths to executables string-valued variants?

Generally specs should only differ in hash if there is a difference in the behavior (i.e. the bits which make up the files in the package). If the paths were variants then changing the paths would alter the hash (I consider this theoretically problematic, but it also has pragmatic side effects e.g. relocation).

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

Question: is there any reason not to make the paths to executables string-valued variants? They can still be accessible as direct attributes on the spec, but this would allow us to store better provenance (e.g., in lockfiles) and to track configuration.

I'd rather make them a property of the spec instance. A package providing a compiler might be installed by Spack or not and it doesn't have a meaning to let the user specify via a variant the path to the compiler that is still to be installed. Think for instance:

$ spack install gcc cc=/usr/foo

Having variants that are attached only to external specs and are not declared in the package doesn't seem appealing too (i.e. I don't see as an option to have these variants only in packages.yaml for external specs). What I think is the right way to go is to ensure instead that something like:

cc = spec['c'].cc

has a consistent meaning on spec objects and returns the path to the current C compiler etc.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

@tgamblin It occurs to me that maybe you were thinking at variants like patches:

$ spack spec mpfr
Input spec
--------------------------------
mpfr

Concretized
--------------------------------
[email protected]%[email protected] patches=8f15fd27ab65341a60d724d594897d32f4597ddf642d0dc121995e2150181b0c arch=linux-ubuntu18.04-broadwell
    ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
        ^[email protected]%[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-ubuntu18.04-broadwell
            ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
        ^[email protected]%[email protected]+cpanm+shared+threads arch=linux-ubuntu18.04-broadwell
            ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
                ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
                    ^[email protected]%[email protected]~symlinks+termlib arch=linux-ubuntu18.04-broadwell
                        ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
    ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
    ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
    ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell
        ^[email protected]%[email protected] arch=linux-ubuntu18.04-broadwell

that are not under the "direct" control of users. Even in that case I would prefer using properties for the reasons outlined by @scheibelp and for the fact that while the sha256 is known upfront the path of a compiler might depend on the build configuration.

@tgamblin
Copy link
Copy Markdown
Member

@scheibelp:

Generally specs should only differ in hash if there is a difference in the behavior (i.e. the bits which make up the files in the package)

Technically it does change the behavior. A package with a gcc executable responds to the gcc command, while one with a gcc-4.8 executable does not. The package isn't usable without this information.

I guess the question is what does external configuration "mean". If you look at it as "I am telling Spack how to map this package to a Spack package" then it makes sense that the executable names might just be config options. If I look at the hash as a real reproducibility test, then I start worrying about compiler names. Maybe it's best to do the former (at least for now) since you give up some reproducibility with externals anyway.

At the moment we do not have a use case where my concern matters, but let me bounce this one off you because it'll matter in the future. Right now we can get away with this because we only carer about the compiler names (and other configuration like them) at build time. They are only used within the package build, and we would usually rebuild if something changed.

I would like to get to a place eventually where we can swap in a spack-built package for an external and say "this is compatible" -- things will continue to work. So if I have a user who built something on, say, Lassen, and they want to reproduce their stack on Summit without rebuilding much, they could take the spec from Lassen and rebuild only external packages (or swap in new externals) to do that. For that, you'd need to know things like compiler paths.

Example: consider the UPC package, or MPI packages, which hard-code compiler paths in scripts. We currently patch them at build time, but if I were dealing with binaries and not builds, I would need to know these paths to relocate them. Or I would need to at least know that they differ, so that I know that one binary won't work with another.

Consider openmpi -> gcc -- the mpicc wrappers needs to know that to swap in a new gcc, the compiler names must be similar to the gcc that openmpi was built with. If you instead had openmpi ^gcc cc=bin/gcc-4.8 (i.e. the spec has the additional provenance for the dependency), then you could know that you can't use gcc cc=bin/gcc as a substitute. This gets into more complicated territory, as we would need to store both build and deployment provenance on packages (e.g., this package "is" openmpi ^gcc cc=bin/gcc-4.8, but it is "deployed with" gcc cc=bin/gcc -- we'd have two sets of hashes).

Anyway, I think I've convinced myself here that this use case is far off, and I don't really see anything precluding it in the design proposed here, so we can go (mostly) with what is proposed above. But does this at least make sense to you? The paths are relevant for compatibility -- just not if we always rebuild before deploying (the current model).

@tgamblin
Copy link
Copy Markdown
Member

@alalazo:

the fact that while the sha256 is known upfront the path of a compiler might depend on the build configuration.

I guess I'll just point out that you're kind of proving my point here. The hash represents the build configuration, so you do want it in the spec. It changes the functionality of the package.

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented May 14, 2020

One last question. What is compilers here?

- spec: '[email protected] languages=c,c++,fortran'
      prefix: /usr
      compilers:
        c: /usr/bin/x86_64-linux-gnu-gcc-9
        c++: /usr/bin/x86_64-linux-gnu-g++-9
        fortran: /usr/bin/x86_64-linux-gnu-gfortran-9

Is it a special keyword just for compiler pacakges? Or can configuration passed through packages.yaml have a more generic name? I could easily see wanting to set properties like these for other types of packages (though I do not think we should make a habit of it -- for most of the reasons I mentioned in the thread above).

Oh, and i think c++ should be cxx since that will be a valid python property name.

@alalazo alalazo force-pushed the packages/gcc_detection branch from 495112a to 9775f3d Compare May 14, 2020 20:56
@scheibelp
Copy link
Copy Markdown
Member

A package with a gcc executable responds to the gcc command, while one with a gcc-4.8 executable does not. The package isn't usable without this information.

Fair enough, but /pathx/gcc-4.8 and /pathy/gcc-4.8 would be the same (down to offering exactly the same interface to any consumer package).

If it comes down to a difference that could be resolved with symlinks, I'm inclined to say they are compatible (but that's a separate argument).

When it comes to comparisons between Spack packages, it's also easier on us to make some assumptions about how if a certain set of inputs remains the same, you get the same package:

  • operating system
  • configuration options (or CMake options etc.)
  • such details about dependencies etc.

given a fixed set of the above variables, it's unlikely to produce layout differences, so I imagine that these sorts of compatibility concerns come up mainly for comparisons between Spack and non-Spack packages. However:

I would like to get to a place eventually where we can swap in a spack-built package for an external and say "this is compatible" -- things will continue to work.

That may actually come down to details other than querying the spec: some details of how the package was built (which factor into the hash) may not be determinable for an external package, but it still may be possible to determine that a Spack package and external are compatible.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

What is compilers here?

It's an extra attribute that gets written and is mandated by the GCC package. The commit message of 9775f3d should explain the rationale of the new methods that are introduced to support this. Currently, I'm working on modifying packages.yaml to allow for custom package attribute to be written.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

Anyway, I think I've convinced myself here that this use case is far off, and I don't really see anything precluding it in the design proposed here, so we can go (mostly) with what is proposed above. But does this at least make sense to you? The paths are relevant for compatibility -- just not if we always rebuild before deploying (the current model)

This makes sense to me. My point is more that for things like compiler paths I would avoid using variants. Even using variants for patches was a hack to get patches influence the dag_hash with minimal reworking, if I recall correctly. I guess I see variants as entities that should expose options which are under the direct control of the user, not other kind of relevant facts that could be determined after a package has been installed.

@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented May 14, 2020

I guess I see variants as entities that should expose options which are under the direct control of the user, not other kind of relevant facts that could be determined after a package has been installed.

A few things:

  1. I don't think we should distinguish based on user control. That's not what the hash is. It's supposed to represent the build configuration, i.e., anything that might change the generated binary's functionality, which is really installer-controlled.
  2. Things like version suffixes are actually installer-controlled. gcc, for instance has --program-prefix and --program-suffix arguments to configure, and they're determined before the package is installed. We just don't expose them as variants at the moment. We could add those to the package and say that a particular gcc installation has those options, and those determine thee executable names. It's a bit more complicated than that, though, as HPC sites (and vendors like Cray) will rename compilers or deploy wrappers that are outside the installer's immediate control. Supporting that is actually a good case for keeping the configuration separate in the compiler case.
  3. For externals, the installer might not be the user, so the spec functions as configuration -- but it still describes what the external is, and therefore how it can be used. The only difference is you're asking a human to deduce the spec for an already-installed package, rather than asking the concretizer to deduce it for something yet to be built.

We make exceptions for things like RPATH and modules because we have relocatable binaries, and we need to use them regardless of where they're installed. So most of the other stuff in compilers.yaml has to do with finding libraries (extra_rpaths, modules, etc.), and those can be determined (and changed) after the package is built. Compiler paths are a bit different at the moment because we don't relocate them and and we haven't thought about swapping different binaries into builds.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

@tgamblin I agree on 2. and 3. I have a comment on 1. though:

I don't think we should distinguish based on user control. That's not what the hash is. It's supposed to represent the build configuration, i.e., anything that might change the generated binary's functionality, which is really installer-controlled.

I agree on what the hash (and here we are talking of the full_hash, right?) should represent, what I am saying is that we should make it such that things like compiler paths, patches or other kind of relevant metadata are accounted in the hash via mechanisms other than variants.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2020

Still work in progress, but with 897433f I generate the following:

packages:
  gcc:
    externals:
    - spec: [email protected] languages=c,c++,fortran
      prefix: /usr
      compilers:
        c: /usr/bin/gcc-5
        cxx: /usr/bin/g++-5
        fortran: /usr/bin/gfortran-5
    - spec: [email protected] languages=c,c++,fortran
      prefix: /usr
      compilers:
        cxx: /usr/bin/x86_64-linux-gnu-g++-7
        c: /usr/bin/x86_64-linux-gnu-gcc-7
        fortran: /usr/bin/x86_64-linux-gnu-gfortran-7
    - spec: [email protected] languages=c
      prefix: /usr
      compilers:
        c: /usr/bin/x86_64-linux-gnu-gcc-6
...

For [email protected] I just installed the C compiler on the system and the spec is identified correctly. Another point: the CI build failing on lz4 shows the error message people will encounter when updating Spack. Can you let me know if it is clear enough or should be improved?

@tgamblin
Copy link
Copy Markdown
Member

@alalazo: This is getting a little off track since we're not changing it right now 😁 but:

patches or other kind of relevant metadata are accounted in the hash via mechanisms other than variants

how though? Where would you represent this information if you wanted it in the hash?

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 15, 2020

@scheibelp @tgamblin Incidentally, once we have all the packages that are used in CI covered in terms of detection we can eliminate a few configuration files from the git repository and just rely on:

$ spack external find pkg1 pkg2 ...

@alalazo alalazo force-pushed the packages/gcc_detection branch 3 times, most recently from c8f51e7 to 93a0433 Compare May 15, 2020 08:19
alalazo added 25 commits July 30, 2020 07:23
The documentation now instructs users to assert conditions, but
in reality will catch any exception that might occur in that
method to prevent an error in a single package to stop detection
for everything else.
The process to make a package detectable has been simplified
by removing the need to decorate the package class. Now a
package class is assumed to be detectable if it defines
the executables attribute.
If a packages.yaml in the old format is found, Spack
will convert it in memory. This makes any configuration
in the old format usable by Spack.

If something is added to a configuration file in the old
format via cli, Spack will error and point the user to
the command that is needed to update it. The message points
out that any update will not be forward-compatible.
Since the conversion from an old format to a new one
happens in memory automatically (to ensure backward
compatibility), the commands used to update or revert
files on disk are interactive by default.
The function was returning the opposite of what its name suggested.
Avoid using `pop` retrieve a key instead.
- Multiple scopes updated at a time
- Environment are not written implicitly
The current code is now based on the
assumption that all the module objects
that might be needed are already imported
at the top level.
@alalazo alalazo force-pushed the packages/gcc_detection branch from bfbbac2 to 55c4377 Compare July 30, 2020 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generating potential packages.yaml file?

7 participants