
Added legacy repository for deprecated recipes #15420

Closed
alalazo wants to merge 3 commits into spack:develop from alalazo:features/legacy_repo

Conversation

@alalazo
Member

@alalazo alalazo commented Mar 10, 2020

closes #10411
closes #12941

This commit introduces a new repository to which deprecated recipes can be moved, e.g. for packages that changed build system. This will allow packages to benefit from base classes specialized for each build system while still providing support for both recipes.

Legacy packages used as example:

  • mpifileutils
  • arpack-ng

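A minimal sketch of the split this commit enables. This is a self-contained toy: the base classes are stand-ins, and the class names, paths, and versions are illustrative assumptions, not the actual diff.

```python
# Stand-ins for Spack's real build-system base classes, defined here
# only so the sketch runs on its own.
class CMakePackage:
    build_system = "cmake"

class AutotoolsPackage:
    build_system = "autotools"

# var/spack/repos/builtin/packages/mpifileutils/package.py:
# the recipe for versions using the new build system.
class Mpifileutils(CMakePackage):
    versions = ("0.10", "0.9")

# var/spack/repos/legacy/packages/mpifileutils/package.py:
# the old recipe, kept so older versions stay installable.
class LegacyMpifileutils(AutotoolsPackage):
    versions = ("0.8",)
```

In real Spack both recipes would live in files named package.py, one per repository, and both classes would carry the same name; the distinct names here are only so the toy runs in a single file.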
@alalazo alalazo force-pushed the features/legacy_repo branch from 973a404 to 2b641cd on March 10, 2020 14:18
@alalazo

This comment has been minimized.

Member

@adamjstewart adamjstewart left a comment

I like this solution a lot. It's definitely by far the simplest solution. With this PR, the following simple cases can be handled properly:

  1. A package changes its build system a single time

As far as I can tell, the following more complicated cases cannot be handled with this approach:

  1. A package changes its build system more than twice
  2. A package supports multiple build systems for the same version

Based on previous discussion, 1 seems rare, and if it ever occurs, we can just fall back on the previous approach of a generic Package base class. 2 is pretty common, but I don't think we need to support that.

Additional things I would like to see (either in this PR or another):

  1. We need documentation for how to package something that changes build system, possibly in https://spack.readthedocs.io/en/latest/build_systems.html
  2. There are a lot of other packages that should be updated with this new logic: #12941 (comment)

@alalazo
Member Author

alalazo commented Mar 10, 2020

As far as I can tell, the following more complicated cases cannot be handled with this approach:

  1. A package changes its build system more than twice

In my opinion this one should be solved with a rotation: the new build system goes into builtin, the one in builtin moves to legacy, and the one in legacy gets dropped. I wouldn't expect more than a handful of such cases, spread out over time.

  1. A package supports multiple build systems for the same version

Here I would just use the build system that upstream developers recommend. Usually supporting two build systems at the same time happens during a transition phase from one to the other, but I didn't investigate whether we have less trivial cases, e.g. different build systems used for different architectures. Even so, I think this will be a very small fraction of the supported packages.

@gonsie
Contributor

gonsie commented Mar 10, 2020

This is a really clean solution!

Echoing @adamjstewart's comment, there needs to be some documentation around how to leverage this feature (it appears to be based solely on version numbers).

Thanks for converting the mpifileutils package... looks great to me!

Member

@ChristianTackeGSI ChristianTackeGSI left a comment

What about having multiple classes in the same file?
Basically those two classes are currently just scattered across two repos.

I don't know what a good naming convention for "older" package classes would be, but it should be doable. That way, even (2) can be handled.

@alalazo
Member Author

alalazo commented Mar 11, 2020

What about having multiple classes in the same file?

I think the approach you are suggesting requires extensive modifications to Spack. Say I have two classes in hdf5, one using CMake and the other Autotools. What should Spack build for:

$ spack install hdf5

if the versions overlap? How can a user specify the build-system? Can different build-systems have different variants if the classes are in the same package.py file (meaning the same builtin repository)?

By adding a new repository I can give a clear answer to each of the questions above, and there are no core modifications to Spack, so I don't see what the gain of your approach would be.

@alalazo
Member Author

alalazo commented Mar 11, 2020

@ChristianTackeGSI As a further note, (2) is also doable with this PR, even though whether it's a good idea to maintain both build systems is debatable. We just need to maintain one build system (the "new" one) in builtin and the other in legacy.

@ChristianTackeGSI
Member

@ChristianTackeGSI As a further note, (2) is also doable with this PR, even though whether it's a good idea to maintain both build systems is debatable. We just need to maintain one build system (the "new" one) in builtin and the other in legacy.

My fault. I meant (1): More than two build systems.
Having multiple classes would solve that easily.

[…]
if the versions overlap? How can a user specify the build-system? Can different build-systems have different variants if the classes are in the same package.py file (meaning the same builtin repository)?

One possible solution would be spack install hdf5.foo mapping to a Hdf5_foo class.

This whole suggestion was a brainstormy idea.
I don't actually mind the final solution, as long as enough old versions of packages can be kept. We need to be able to build old versions; that's one of the major reasons for using Spack.
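The hdf5.foo idea above could be sketched as a naming convention. The helper below is purely hypothetical - it models the brainstormed mapping from a qualified spec name to a class name, not anything Spack implements:

```python
def class_name_for(spec_name: str) -> str:
    """Map a hypothetical qualified spec like 'hdf5.foo' to a class name
    like 'Hdf5_foo'; unqualified names map to the plain class name."""
    base, _, suffix = spec_name.partition(".")
    cls = base.capitalize()
    return f"{cls}_{suffix}" if suffix else cls
```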

@alalazo alalazo requested a review from becker33 March 24, 2020 13:59
@scheibelp scheibelp self-assigned this Mar 24, 2020
Member

@scheibelp scheibelp left a comment

My thoughts on the overall idea:

  • First inclination: multiple package support is an example of why packages should use composition rather than inheritance to get the functionality they need
    • The primary issue is that packages with fixes (e.g. patches) that span the lifetimes of multiple build systems will fall prey to all the issues that plague duplicate code
    • This is unlikely to affect packages currently in Spack (since most of them transitioned long ago). It is more likely to be an issue for future code projects that switch build systems. In other words, fixes that span build system lifetimes are most likely to come up near the time of a build system transition.
  • That being said, the changes here don’t preclude a composition-based approach to multiple-package support
  • One other problem with this approach: if I start with the configs offered in this PR and a user calls "spack spec legacy-spec@oldversion" (where oldversion appears only in the older package), will Spack understand that it needs to use the legacy package? I don't think it will - I'm assuming I would need to say "spack spec legacy.legacy-spec@oldversion" to avoid a complaint

I have looked through all the individual package changes and see no issues.

@adamjstewart
Member

Can we make a final decision about this sometime in the near future? I want to update the gmt package, which switches from Autotools to CMake, so now would be a good time to split it into a modern and legacy package.

@alalazo
Member Author

alalazo commented Mar 30, 2020

@scheibelp

First inclination: multiple package support is an example of why packages should use composition rather than inheritance to get the functionality they need
[...] fixes that span build system lifetimes are most likely to come up near the time of a build system transition.

I fully agree that in general composition is better than inheritance for code reuse. On the other hand, changing how we treat build systems is quite an impactful design choice that deals with at least:

  1. The package DSL (are directives in their current state still appropriate or do they need modifications? Where should we put general directives that are shared by classes of packages if we use composition?)
  2. The structure of the classes in package.py and their interaction with the concretizer (since most of what we do is based on the inheritance mechanism for package metadata)

This solution instead:

  1. Is super-simple
  2. Doesn't preclude, as you point out, further changes down the line
  3. Deals with most of the relevant cases

Fyi, I have written down a few considerations about the issue of packages that change build system in #10411 (comment). My personal point of view here is that a legacy repository can be useful even once composition of build systems is implemented - for instance as a way to notify users that some packages might be removed in future releases.

One other problem with this approach [...] to say “spack spec legacy.legacy-spec@oldversion” to avoid a complaint

I consider this another manifestation of the shortcomings of the current concretizer, in the same way as I can't specify hdf5 ^mpich and expect Spack to understand that it needs to activate +mpi.

@scheibelp
Member

My personal point of view here is that having a legacy repository can be of use also if we have composition of build-systems already implemented

Agreed. But I want to make sure we know what a legacy repo is and isn't good for: e.g. I'm concerned that code duplication will make it unsuitable for most cases, which is why I'd like to consider how a composition-based approach would work. If it is decided that that is too much work and that there are use cases which need to be supported on a shorter time frame, then that would make the approach of this PR appealing.

I want to avoid suddenly being responsible for condensing a number of legacy packages should it be decided that this wasn't the appropriate design decision in the future.

I consider this another manifestation of the shortcomings of the current concretizer

Each use case regarded as work for the concretizer should be recorded as such: many of these things we'd like to solve could possibly be solved, but if we don't account for them in the design, they will most likely be difficult to support.

I'd like to discuss how to avoid shifting this responsibility on the concretizer in the future. I'm thinking there could be a Package method like

def choose_build_system(self):
    if self.version < Version("2.0"):
        return Package.CMakePhases()
    else:
        return Package.AutotoolsPhases()

To address some concerns more directly:

  • The package DSL (are directives in their current state still appropriate or do they need modifications? Where should we put general directives that are shared by classes of packages if we use composition?)

    • I don't see any changes that would need to be made to the DSL: the build system used changes the phases that are run, AFAIK all current directives apply the same either way and don't interact with the build system
  • The structure of the classes in package.py and their interaction with the concretizer (since most of what we do is based on the inheritance mechanism for package metadata)

    • I think interaction with the concretizer is unaffected.
  • dynamic inheritance for packages that change build system #10411 (comment) discusses packages that have versions which can support multiple build systems. My proposal is that Spack picks one and supports only that one for such versions. Packages that conditionally require both should be rare, and I think it should be possible to support them, but I'm not concerned with minimizing effort for that particular use case. I also don't think this necessarily requires the intervention of the concretizer.

@adamjstewart
Member

@alalazo there's one thing I would like to understand about your proposed solution. Let's say I have a package builtin.foo that uses CMake, and an older package legacy.foo that uses Autotools. Let's say builtin.foo contains versions 2.0 and 3.0, and legacy.foo contains version 1.0.

  1. If I run spack info foo, will it only show versions 2 and 3, and only list the package as a CMake package? Or will it list 1, 2, and 3?
  2. If I run spack install [email protected], will it correctly determine that it should use the Autotools build system specified in legacy.foo?
  3. If I run spack install [email protected], what will it do?
  4. If a package depends on foo@1, which build system will be used?

Spack can already handle installing versions that are not available in a package, so I want to know how these things will work if we use 2 repos. If the above use cases aren't handled correctly, then we're essentially punting the problem onto the user to explicitly specify builtin.foo or legacy.foo. This isn't really a better solution than having separate foo and legacy-foo packages like we already do.

@alalazo
Member Author

alalazo commented Apr 1, 2020

@scheibelp

Agreed. But I want to make sure we know what a legacy repo is and isn't good for: [...]

Sure, we're on the same page here. There are many proposed solutions for packages that change build systems - none of them 100% satisfactory, including this one. What I want to point out is that this solution is by far the simplest that:

  1. Uses build-system specific classes
  2. Doesn't require extensive changes to the core (it doesn't require any!)

Code duplication is mainly for things that shouldn't be there in the first place (patches to some point release when the rule should be pristine sources) or for things that don't require a lot of maintenance (docstrings for descriptions). Also, my impression is that code in the legacy repository will be mostly "stable" in the sense that active development for the package will be done in the builtin repository - but of course I have no metric or history to support that.

Each use case regarded as work for the concretizer should be recorded as such: many of these things we'd like to solve could possibly be solved, but if we don't account for it in the design, then most likely it will be difficult to support.

I can file an issue if this is not tracked already. Just to be clear, this is a problem we currently have whenever we add a new repository, since this PR does not change the behavior of repositories or of the concretizer at all.

I'd like to discuss how to avoid shifting this responsibility on the concretizer in the future.

Sure. Should we continue here or should we have the discussion in a dedicated issue?

I don't see any changes that would need to be made to the DSL: the build system used changes the phases that are run, AFAIK all current directives apply the same either way and don't interact with the build system

To clarify in case it was unclear: I mean the package DSL, not specs. Semantically there are directives that are tied to the build system. For instance, CMakePackage comes with a variant that defines its build type and a dependency on cmake, neither of which makes sense for other build systems. Another point to consider is that directives rely on metaclasses to perform most of their magic of being function calls that attach attributes to the class currently being defined. This is just to say that I didn't look carefully into what needs to be changed, but I don't expect it to be as easy as pushing phases down into an attribute.
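The metaclass magic described above can be illustrated with a self-contained toy (the real Spack machinery is more involved, and every name below is illustrative): directive calls in a class body queue metadata, and a metaclass attaches it to the class being defined, which is how a build-system base class can bring its own variants.

```python
# Queue of directives run in the class body currently being defined.
_pending_directives = []

def variant(name, default):
    # Called at class-definition time; the class doesn't exist yet,
    # so the directive is queued until the metaclass picks it up.
    _pending_directives.append((name, default))

class DirectiveMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # Start from the variants inherited through the MRO, then
        # attach whatever directives ran in this class body.
        cls.variants = dict(getattr(cls, "variants", {}))
        cls.variants.update(_pending_directives)
        _pending_directives.clear()
        return cls

class PackageBase(metaclass=DirectiveMeta):
    pass

class CMakePackage(PackageBase):
    # A build-system base class bringing its own variant, analogous
    # to the build-type variant on Spack's real CMakePackage.
    variant("build_type", "RelWithDebInfo")

class MyPackage(CMakePackage):
    # A concrete package inherits the build-system variant and adds its own.
    variant("shared", True)
```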

@alalazo
Member Author

alalazo commented Apr 1, 2020

@adamjstewart To reply to your questions:

  1. If I run spack info foo, will it only show versions 2 and 3, and only list the package as a CMake package? Or will it list 1, 2, and 3?

Currently only 2 and 3, and you need to say spack info legacy.foo to get 1. This is how things work with repositories right now in Spack. I should add that the spack info behavior is probably the easiest to modify.

  1. If I run spack install [email protected], will it correctly determine that it should use the Autotools build system specified in legacy.foo?

Currently you need to specify legacy.foo. Again, if you add a repository to Spack outside of this PR, that's how it works.

  1. If I run spack install [email protected], what will it do?

Same behavior as spack install zlib@29 on develop. It didn't change here.

  1. If a package depends on foo@1, which build system will be used?

This I didn't try, but as above there's no change in behavior in this PR with respect to develop.

If the above use cases aren't handled correctly, then we're essentially punting the problem onto the user to explicitly specify builtin.foo or legacy.foo. This isn't really a better solution than having separate foo and legacy-foo packages like we already do.

I kind of disagree:

  1. I believe it will be easier / less surprising to users to instruct the new concretizer to take into account packages with the same name that are in different repositories than considering legacy-foo as a package that satisfies foo.

  2. Currently we don't have a clear rule for treating this case. Some packages took the legacy-* approach, others went back to inheriting from Package and duplicate code from the build-system classes.
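The lookup semantics described in these answers can be modeled with a toy resolver - an illustration of the behavior discussed in this thread, not Spack's actual repository code: an unqualified name resolves against the first repository in search order, while a repo.name spec picks the repository explicitly.

```python
# Toy model of two repositories and their search order (illustrative data).
REPOS = {
    "builtin": {"foo": {"build_system": "cmake", "versions": ["2.0", "3.0"]}},
    "legacy": {"foo": {"build_system": "autotools", "versions": ["1.0"]}},
}
SEARCH_ORDER = ["builtin", "legacy"]

def resolve(spec_name):
    """Return the package record for a spec name, honoring an explicit
    'repo.name' qualifier, else the first match in search order."""
    if "." in spec_name:
        repo, _, name = spec_name.partition(".")
        return REPOS[repo][name]
    for repo in SEARCH_ORDER:
        if spec_name in REPOS[repo]:
            return REPOS[repo][spec_name]
    raise KeyError(spec_name)
```

Under this model, resolve("foo") always finds the builtin CMake recipe even for a legacy-only version, which is exactly the concretizer gap tracked in #15800.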

@alalazo
Member Author

alalazo commented Apr 1, 2020

@adamjstewart @scheibelp Added #15800 to keep track of this use case for concretization.

@scheibelp
Member

scheibelp commented Apr 1, 2020

If a package depends on foo@1, which build system will be used?

This I didn't try but as above there's no change in behavior in this PR with respect to develop.

There would be an error: Spack would select the foo package from the non-legacy repo, and then it would complain that this package doesn't have version 1. (See #15420 (comment))

To handle this, the user would have to specify depends_on("legacy.foo@1")

@adamjstewart
Member

@scheibelp why would that be an error? Spack already allows you to install versions of packages that aren't listed, so I would imagine it would confirm whether or not you want to download that version even though there is no checksum available, then install it via CMake.

@scheibelp
Member

Also, my impression is that code in the legacy repository will be mostly "stable" in the sense that active development for the package will be done in the builtin repository - but of course I have no metric or history to support that.

I think that's true for all current packages in Spack. I think it won't be true for any package in the future which decides to switch build systems: it's common for developers to support the last few recent versions, and in their case that would include versions with different build systems.

this is a problem we currently have when adding a new repository

The concretizer thus far has never been expected to select among repositories for the most suitable package (not in the same way that it chooses among providers for the best virtual) - that's an entirely new concept, and I think this PR is the first place I've seen where the concretizer has been expected to do that.

Code duplication is mainly for things that shouldn't be there in the first place (patches to some point release when the rule should be pristine sources)

I agree that patches will always ideally disappear at some point, but I don't think this means anything in terms of Package implementation: the code still has to be there, and if users forget to duplicate their changes it still breaks.

What I want to point out is that this solution is the simplest by far that permits to:

I acknowledge that this is the simplest solution. I want to explore the other criteria though (e.g. whether it handles the use cases that people need). For example based on #15420 (comment) and #15420 (comment), we may start to see packages in the non-legacy repo that reference legacy packages. Also, perhaps we can come up with an approach that addresses versions with overlapping build systems that you refer to in #10411 (comment).

This solution may be good for a subset of legacy packages, but one thing I want to come out of this conversation is guidelines for which packages should not be turned into legacy packages. Spack package reviewers will have to be aware of these constraints when looking at new PRs, otherwise the relationship between the legacy/non-legacy repos may become over-complicated.

@scheibelp
Member

scheibelp commented Apr 1, 2020

why would that be an error? Spack already allows you to install versions of packages that aren't listed, so I would imagine it would confirm whether or not you want to download that version even though there is no checksum available, then install it via CMake.

Oh, you are correct. I would consider that to be an error as well, but you are right that Spack would be doing something different rather than complaining.

@alalazo
Member Author

alalazo commented Apr 2, 2020

I think that's true for all current packages in Spack. I think it won't be true for any package in the future which decides to switch build systems: it's common for developers to support the last few recent versions, and in their case that would include versions with different build systems.

What I've seen so far is that packages switch build-system from one version to the next, or in any case very quickly: see #15812 for a recent example.

The only example I have of a package that maintains two build systems is hdf5. That package can build with CMake or autotools on Unix-like platforms, while CMake is mandatory on Windows. My opinion there is that, since both build systems are maintained, we should switch when "it becomes appropriate", e.g. when we support Windows or when the HDF5 group stops supporting or deprecates autotools on Linux.

A case I've never encountered, but which is possible, is a package that keeps CMake to build on Windows only and autotools to build on Linux; in that case "legacy" wouldn't be an appropriate term. But as I said, I haven't encountered such a case so far.



6 participants