
Multi Build System Packages [Prototype; Please Review]#12941

Closed
citibeth wants to merge 1 commit into develop from efischer/190923-MultiPackagePrototype

Conversation

@citibeth
Member

@citibeth citibeth commented Sep 25, 2019

related #10411

@alalazo @scheibelp @adamjstewart
With the mass migration to CMake well underway and Spack now more than a couple of years old, it is becoming common for Spack packages to change build system in response to upstream build-system changes. For example, superlu-dist changed from ad-hoc, manually-edited makefiles to CMake (see #12938). This introduces a problem: there now need to be two fundamentally different Spack recipes for the same package --- one to build older versions, and one to build newer versions. So far, there is no way to do this. Instead, people have simply been dropping old versions of packages, which creates its own problems.

This PR provides a prototype of how multiple build systems can be handled for a package in Spack, using superlu-dist as a real-world example. I have used it successfully to install [email protected]. The downside is that the phases of the CMake-built package are now hidden. I'm sure that could cause headaches somewhere; but remember that early versions of Spack only had a single install() phase and life went on. Maybe there is some way to finesse the phase issue; for example, make the top-level package inherit phases from the most-recent sub-package (in this case, the CMakePackage version of superlu-dist).

Comments / improvements welcome. Ultimately, it would be great if this could be turned into a top-level, reusable MultiBuildPackage or something of the sort.

…ckages can be created.

This should ultimately be packaged up into something robust and reusable.
@citibeth citibeth requested a review from alalazo September 25, 2019 03:16
@citibeth citibeth changed the title Multi Build System Packages Multi Build System Packages [Prototype; Please Review] Sep 25, 2019
@citibeth
Member Author

citibeth commented Sep 25, 2019

Trying to build superlu-dist (a CMakePackage) yields:

==> Executing phase: 'install'
==> Error: ProcessError: cmake: Permission denied
    Command: 'cmake' '/home2/rpfische/spack7/var/spack/stage/superlu-dist-5.2.2-lxgy2rtqhfkyzpvit52w7hi4djxwod2w/superlu_dist-5.2.2' '-G' 'Unix Makefiles' '-DCMAKE_INSTALL_PREFIX:PATH=/home2/rpfische/spack7/opt/spack/linux-centos7-x86_64/gcc-4.9.3/superlu-dist-5.2.2-lxgy2rtqhfkyzpvit52w7h

Looks like the problem is a missing depends_on('cmake') at the top level.

@G-Ragghianti
Contributor

We will also need to implement something like this for the PAPI package once the next release comes out (in 6 months to 1 year).

@chuckatkins

chuckatkins commented Oct 3, 2019

I worry that pushing all of this into a single package can become unmaintainable. A while back there was the idea of having "archived" versions of packages that Spack could use, so the current package would use "buildsystemA" and the archived version would use "buildsystemB", etc. I think that would be much more sustainable, although probably harder to implement in core Spack.

@adamjstewart
Member

I agree with @chuckatkins, @alalazo had a nice idea where we could create a separate repository for older build systems of packages. The package names themselves would be the same, so the concretizer could still find them. I'm not sure how much work that would be in Spack core to make this happen though.

@citibeth
Member Author

citibeth commented Oct 3, 2019

What does "unmaintainable" mean here? Can you be specific?

How about something more convenient, built into Spack Core: Expand the concept of "package" to allow for two or more recipes (xyz/package.py) side by side. As long as the versions in the recipes don't conflict, all is OK. Spack would load whichever package.py makes sense for the version the user requests. For example:

mypackage/
    autotools/
        package.py
            class MyPackage(AutotoolsPackage):
                 version('1.0', sha256='...')
                 version('1.1', sha256='...')
    cmake/
        package.py
            class MyPackage(CmakePackage):
                version('2.0', sha256='...')
                version('2.1', sha256='...')

In this case, the autotools/ and cmake/ directory names could be anything, as long as the package.py files don't specify conflicting versions. If Spack wants to know the versions available for the package, it looks in all the subdirectories. If it wants to concretize, it only loads the package.py for the required version.
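The aggregation rule sketched above could look roughly like this (a minimal sketch with invented class and version names; real Spack version objects and repo loading are much richer):

```python
# Hypothetical sketch: aggregate versions from several candidate sub-package
# classes and pick the unique class that declares the requested version.
# Class names and versions here are invented for illustration.

class AutotoolsMyPackage:
    versions = {"1.0", "1.1"}

class CMakeMyPackage:
    versions = {"2.0", "2.1"}

CANDIDATES = [AutotoolsMyPackage, CMakeMyPackage]

def all_versions():
    """Union of versions across all sub-packages (for listing)."""
    return set().union(*(c.versions for c in CANDIDATES))

def class_for(version):
    """Pick the unique sub-package declaring the requested version."""
    matches = [c for c in CANDIDATES if version in c.versions]
    if len(matches) != 1:
        raise ValueError("version %s matched %d sub-packages"
                         % (version, len(matches)))
    return matches[0]
```

The "no conflicting versions" convention is what makes `class_for` unambiguous: each version maps to exactly one sub-package.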

@citibeth
Member Author

citibeth commented Oct 3, 2019

In any case... my current problem is I can't use the versions of SuperLU-dist that I need to use, because they were removed to "make way" for newer versions.

@alalazo
Member

alalazo commented Oct 3, 2019

@adamjstewart @citibeth Can you list a few packages that have already been changed from one build-system to another? I'll try to see how far I can go with the idea of having a builtin.legacy repository.

@citibeth
Member Author

citibeth commented Oct 3, 2019

We currently have 3 suggestions on the table, and (AFAIK) no processes of thinking through the pros and cons of them. BEFORE we go to the work to implement one solution, can we work through on how that solution would work, how it would affect end users, and agree that it's a good compromise for all the different angles?

Can you describe how the builtin.legacy repo would work? I'm supposing that old versions of packages (eg SuperLU-dist) would be moved into it. After that... how would the user access these packages? How would it affect concretization, etc? Can you work through a simple example of how this would work from an end user perspective?

@G-Ragghianti
Contributor

I would prefer a solution that was a bit more general than just addressing the build system change problem. For example, a solution that would allow distinct recipes based on version would address this problem as well as the problem of a recipe becoming too complicated when the build process changes significantly between versions.

@alalazo
Member

alalazo commented Oct 3, 2019

@citibeth It works like an additional repo, docs here, which contains legacy recipes and is set at lower priority than builtin. I just wanted to check a few use cases to see how much builtin and builtin.legacy would be referring to each other and what are the limitations of the current concretizer in this configuration.

@alalazo
Member

alalazo commented Oct 3, 2019

Can you work through a simple example of how this would work from an end user perspective?

Use cases are essentially what I was asking for.

@citibeth
Member Author

citibeth commented Oct 3, 2019

@alalazo OK... I don't know all the packages that have been "buried" with new packages, hopefully not too many so far. But you can certainly start by digging the old version of the superlu-dist recipe out of the Git repo, and playing around with that. I'm looking forward to seeing your assessment of how it works out.

@citibeth
Member Author

citibeth commented Oct 3, 2019

There are also upstream packages that have switched to CMake, but we still use the Autotools build because we don't know how to support both at once. The NetCDF package is a good example of that, and it's quite widely used in various concretizations. To try that, one would have to make a new NetCDF package based on CMakePackage.

@healther
Contributor

healther commented Oct 3, 2019

@alalazo one thing to keep in mind here: there may be packages that support multiple build systems for some versions (thinking about tensorflow with Bazel/CMake). Sure, ultimately most will revert to only supporting one, but still.
The reason we have to care about that case is that it may turn out that, e.g., Autotools only works on Linux, whereas CMake works on both Linux and macOS (example obviously invented). In that case it wouldn't be sufficient to have multiple package.py files; we would have to actively allow the user to express a preference.

We may very well decide that we just don't care enough about this (hopefully rare) edge case. But we should make that decision consciously.

@adamjstewart
Member

adamjstewart commented Oct 3, 2019

Can you list a few packages that have already been changed from one build-system to another?

Here is what I found. Feel free to add to this list:

| Package | Previous Build System | New Build System |
| --- | --- | --- |
| amber | custom | CMake |
| arpack-ng | Autotools | CMake |
| atk | Autotools | Meson |
| blast | None | Autotools |
| dyninst | Autotools | CMake |
| evtgen | Autotools | CMake |
| fish | Autotools | CMake |
| gdk-pixbuf | Autotools | Meson |
| glib | Autotools | Meson |
| glog | Autotools | CMake |
| gmt | Autotools | CMake |
| gtkplus | Autotools | Meson |
| hpl | Makefile | Autotools |
| interproscan | Perl | Maven |
| jasper | Autotools | CMake |
| kahip | SCons | CMake |
| kokkos | Makefile | CMake |
| kokkos-kernels | Makefile | CMake |
| leveldb | Makefile | CMake |
| libdrm | Autotools | Meson |
| libjpeg-turbo | Autotools | CMake |
| mesa | Autotools | Meson |
| metis | None | CMake |
| mpifileutils | Autotools | CMake |
| muparser | Autotools | CMake |
| mxnet | Makefile | CMake |
| nest | Autotools | CMake |
| neuron | Autotools | CMake |
| nsimd | CMake | nsconfig |
| opennurbs | Makefile | CMake |
| optional-lite | None | CMake |
| plasma | Makefile | CMake |
| preseq | Makefile | Autotools |
| protobuf | Autotools | CMake |
| py-pygobject | Autotools | Python |
| singularity | Autotools | Makefile |
| span-lite | None | CMake |
| ssht | Makefile | CMake |
| string-view-lite | None | CMake |
| superlu | Makefile | CMake |
| superlu-dist | Makefile | CMake |
| uncrustify | Autotools | CMake |

@chuckatkins

There are also upstream packages that have switched to CMake, but we still use the Autotools build because we don't know how to support both at once.

This is one of the reasons Mesa is pinned to a year old version (not being able to consider the deps as a graph in the concretizer is another).

What does "unmaintainable" mean here, can one be specific?

Namely that the particular package becomes very complex and very difficult for somebody other than the person who wrote it or other spack maintainers to make changes and add features.

How about something more convenient, built into Spack Core: Expand the concept of "package" to allow for two or more recipes (xyz/package.py) side by side. As long as the versions in the recipes don't conflict, all is OK. Spack would load whichever package.py makes sense for the version the user requests. For example:

mypackage/
    autotools/
        package.py
            class MyPackage(AutotoolsPackage):
                 version('1.0', sha256='...')
                 version('1.1', sha256='...')
    cmake/
        package.py
            class MyPackage(CmakePackage):
                version('2.0', sha256='...')
                version('2.1', sha256='...')

In this case, the autotools/ and cmake/ directory names could be anything, as long as the package.py files don't specify conflicting versions. If Spack wants to know the versions available for the package, it looks in all the subdirectories. If it wants to concretize, it only loads the package.py for the required version.

I think this is a fantastic solution!!! It gives the ability to do more than just one previous version which can be important if a package goes through a major overhaul where variants are changed, etc. This will be increasingly common as spack ages and packages have more than just build system changes.

@alalazo
Member

alalazo commented Oct 4, 2019

A comment I made some time ago on a similar issue, #10411 (comment), is very much in line with @healther's concerns.

@alalazo
Member

alalazo commented Oct 4, 2019

If Spack wants to know the versions available for the package, it looks in all the subdirectories. If it wants to concretize, it only loads up the package.py for the required version.

It's not that easy, I think.

it only loads up the package.py for the required version

This already implies that the version is known, which it might not be. Also, it's very likely that the different package.py files would come with different variants, etc. A solution like this potentially needs to extend the spec syntax (a user needs to be able to specify the build system they want), with support from the parser and from the concretizer (a given build system might be implied by another detail of the spec / variants will depend on the build system).

@adamjstewart
Member

Do we want to consider the possibility that a particular version can be built using multiple build systems? Is there any advantage to allowing this? It sounds like it increases the complexity quite a bit.

@citibeth
Member Author

citibeth commented Oct 4, 2019 via email

@chuckatkins

chuckatkins commented Oct 4, 2019

I agree with @citibeth on that point. It's a use case for sure, but I don't think Spack needs to worry about it. So long as the packages that depend on it can use it, even if that takes a little hand-holding by explicitly pulling the pieces out of the dependency spec to craft the right configure args, I think it'd be fine.

@adamjstewart
Member

Follow-up: do we want to consider the possibility of a package that changes its build system multiple times? I'm not aware of any packages in Spack right now that do this, but as Spack ages, I wouldn't be shocked to see a Makefile -> Autotools -> CMake build system change for a single package. This renders the builtin.foo and legacy.foo solution obsolete.

@healther
Contributor

healther commented Oct 4, 2019

Do we want to consider the possibility that a particular version can be built using multiple build systems? Is there any advantage to allowing this?

We might have to if there are exotic build systems. A current example I have is everything that is Meson-based, as Meson doesn't play well with our software stack, which is (at least partially) still Python 2 based. So if something switches from anything else to Meson, it will break any legacy Python 2 environment, even though it still builds with a different build system.

It sounds like it increases the complexity quite a bit.

It definitely does, and there likely aren't that many package versions that will end up supporting multiple build systems. However, Spack was created at least partially with reproducibility and versatility in mind. This is one point where we would "needlessly" restrict its abilities.

From a user/developer perspective I like the idea of just having multiple package.py files providing the instructions for the different build systems (I wouldn't mind having them as different classes in the same file either). However, from a programmatic point of view I don't think we could handle that right now. We would at least have to have:

  1. a way to hint which build system to use (if the user has a preference)
  2. a new notion of what a package means, in particular, we would have to load multiple classes to construct a list of all possible variants and versions

I don't think that 2 is a fundamental problem, but it is quite a large step from our current setup, and it probably wouldn't work out too nicely with the current state of the optimiser. I'm really quite torn here: on the one hand I really want an easy way to add new build systems to new versions of packages; on the other hand, screwing up the implementation here will hurt us in the long term. As @citibeth said: the number of packages we will have to deal with will only grow.

It would probably be a good idea to separate the two questions here:

  1. On the long run: Do we want to support multi-build-system packages? (Yes) And if so, what is a reasonable way to represent that to the developers? (Multiple classes seems like the right idea to me)
  2. What can we do right now to make the life of us easier without having to resort to git-history-look-ups?

Ideally we want an answer to 1) before we talk about 2); however, for practical reasons 2) is the more pressing question.

My gut feeling is that separating the implementations of the different build-system recipes is the right call. I'd probably prefer them in a single file, in order to allow helper functions to be reused, but that's not a strong opinion. In the short run we could then go ahead and enforce (by convention) that each version may only appear under one build system, reducing the implementation problem (for now) to "aggregate all versions before selecting one".

edit:

Follow-up: do we want to consider the possibility of a package that changes its build system multiple times?

We should; it's not much more (implementation) work and, as you say, it will happen. At least we use Spack mainly because it can build older software without much headache (looking at you, Qt).

@adamjstewart
Member

@healther I agree that it would be nice to support multiple build-systems for the same version of a package, if we can figure out a simple way to do it. But I personally rank simplicity higher than flexibility, so the final solution should ideally have both.

@alalazo
Member

alalazo commented Oct 4, 2019

In the short run we could then go ahead and enforce (by convention) that each version may only appear under one build system, reducing the implementation problem (for now) to "aggregate all versions before selecting one".

Spack does not only select based on versions. The concretizer might need to undo its decisions and the two classes, even with disjoint versions, might have different variants.

@citibeth
Member Author

citibeth commented Oct 4, 2019 via email

@alalazo
Member

alalazo commented Oct 4, 2019

@citibeth Since when do we have conditional variants?

@adamjstewart
Member

It’s already the case that different versions have different variants

In reality, yes. In Spack, no. We could really use a when arg for variants though.

@alalazo
Member

alalazo commented Oct 4, 2019

@adamjstewart Exactly, and that needs modifications to the concretizer. See #9740

@healther
Contributor

healther commented Oct 5, 2019

I agree that it would be nice to support multiple build-systems for the same version of a package, if we can figure out a simple way to do it. But I personally rank simplicity higher than flexibility, so the final solution should ideally have both.

@adamjstewart ack, the problem is that we need to decide how we want the final solution (kind of a weird phrasing from my German perspective ;)) to look before we can implement something that works right now. Which makes life tricky...

@tgamblin
Member

tgamblin commented Oct 7, 2019

I talked to @becker33 about this a bit after #10411. I think there are four use cases here:

  1. Package evolves to use a new build system (by far the most common)
  2. @G-Ragghianti's:

    a solution that would allow distinct recipes based on version

  3. Building the same version with different build systems.
  4. Versioning a package.py in a git repo for a single project over time.

I think 1-3 have been discussed above. I tend to agree that we should not try to do (3), at least not immediately. I'll say more below.

(4) is one that other package managers (like npm) support -- it essentially allows a project to have many different versions of its package file over time, and you just version the changes to the package file with the project. I do not know that we're completely ready for that one -- it's a different package manager architecture in that the package repo data is fundamentally distributed, and you collect it periodically in a registry. I don't see us going completely down that route -- it's harder for Spack to do this than other PMs as we have build recipes in addition to metadata, and Spack is always going to be at least partly centralized (or at least someone will have to write recipes because not all projects will maintain them). But it would enable some developer workflows we've been thinking about. That's another discussion, but it is related.

I think allowing multiple package.py files just by version is the way to go here, at least initially. It would support 1 and 2 and wouldn't rule out 4, and I don't think it would necessarily rule out 3.

Here's my proposal. Allow something like this:

mypackage/
    package.py
    package-2.0.py    # 2.0 on
    package-5.1.py    # 5.1 on
    ...

I suppose these could be subdirectories if we want, but things will get complicated w.r.t. locating any common patch files, etc. that might live in the directory with the package.

From a build environment perspective, this is not actually so hard to implement. Packages are instantiated using spack.repo.get(spec) with a concrete spec, and you would just need to check the version on the spec to determine which one to load. From a metadata/concretizer perspective, we'd need to work out more of the details. Dependencies would likely have an implicit when="@<version>" depending on which file we loaded them from. At least currently, variant directives would likely need to be the union of all versions' variants. We don't currently allow variants to have when clauses, but maybe that's something we should think about here. We'd need to also think about conflicts(), but I think that might be all.

This is a major change but I don't think it's a huge lift. This is probably a good place to use some type of composite Package class that contains the instantiations of all the per-version ones. That can be implemented in the loader (every Spack package repo is a full-blown Python loader to begin with, so we can insert magic there).
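The by-version file selection could be sketched roughly as follows (the file-naming scheme and "from this version on" semantics are assumptions taken from the proposal above, not actual Spack behavior):

```python
# Hypothetical sketch: given package.py plus suffixed variants such as
# package-2.0.py (each applying "from that version on"), pick the file
# whose version threshold is the largest one not exceeding the requested
# concrete version. All names are illustrative.

def pick_package_file(files, version):
    """files: e.g. ['package.py', 'package-2.0.py', 'package-5.1.py'].
    Returns the file covering the given dotted version string."""
    def threshold(name):
        # Bare package.py covers everything below the first suffixed file;
        # the empty tuple compares less than any non-empty version tuple.
        if name == "package.py":
            return ()
        stem = name[len("package-"):-len(".py")]
        return tuple(int(p) for p in stem.split("."))

    ver = tuple(int(p) for p in version.split("."))
    eligible = [f for f in files if threshold(f) <= ver]
    return max(eligible, key=threshold)
```

For example, with the three files above, version 3.1 would load package-2.0.py, while version 1.0 would fall back to the plain package.py.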

I think (3) could be supported in a similar way, but I have a hard time thinking about how you'd want to lay out package files to support it. In my outline above, the packages are distinguished by version, and we only allow so many characters to be in versions, so they make decent suffixes. I worry it'll get extremely cumbersome to support more than that. I think you'd need to just allow arbitrarily-named package files and require them to specify what they match as a class decorator or something. e.g.:

@when("+meson")
class MyPackage():
    pass

You could allow these classes to live in the same package.py file. I am not sure I like either of these solutions, or that the complexity adds much over just differentiating by version. The use cases for this are pretty vague so far, and I think most differentiation is going to be evolutionary; i.e. it'll happen over time, by version.
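The class-decorator idea could be sketched as a small registry that maps predicates to classes and resolves them from a concrete spec (everything here is invented for illustration; the predicate handling is reduced to bare "+variant" clauses, far simpler than real Spack when= clauses):

```python
# Hypothetical sketch of @when("+meson")-style class registration.
# Each decorated class is recorded with its predicate; resolution walks
# the registry against the variants enabled on a concrete spec.

REGISTRY = []

def when(clause):
    """Class decorator registering cls under a '+variant' predicate."""
    def register(cls):
        REGISTRY.append((clause, cls))
        return cls
    return register

@when("+meson")
class MesonMyPackage:
    pass

@when("+autotools")
class AutotoolsMyPackage:
    pass

def resolve(enabled_variants):
    """enabled_variants: a set like {'meson'}; returns the matching class."""
    for clause, cls in REGISTRY:
        if clause.lstrip("+") in enabled_variants:
            return cls
    raise ValueError("no class matches variants %r" % enabled_variants)
```

This keeps all candidate classes loadable from one package.py while deferring the choice to concretization time.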

My thought for (4) is that if you do the by-version thing described here, you can pretty easily make a loader later that supports looking through git history or tags for historical versions of a package, and basically use the same techniques. That wouldn't be part of the initial implementation, but it's something I think about when this comes up.

Thoughts?

@citibeth
Member Author

citibeth commented Oct 7, 2019

The key insight I've had from this discussion is that concretization and building are different things. For the purposes of concretization, it makes sense to have just one package. But for building, we sometimes need different implementations for different versions (or variants or whatnot).

Conceptually, imagine if we had classes ConcretizationPackage and BuildPackage instead of just Package. Each of today's package.pys essentially inherits from both. But suppose we allowed them to be separated, in some cases. Then we would have something like this (syntax / conventions definitely need refining, this is just a sketch):

mypackage/
    package.py    # Concretization package
    package_A.py    # Build package
    package_B.py    # Build package
    ...

The concretization package would contain (among other things) the available versions (including download locations) and the available variants (undoubtedly with when= clauses). The build packages would subclass the build-system-specific package and provide the install() method (or methods, depending on the number of stages for the build system).

The concretization package would ALSO contain instructions on which build package to use, using standard when= syntax. This would normally be used to select by version. But it could just as well be used to select by variants, etc. Thus covering all the cases we've discussed above.

For example:

mypackage/package.py

class MyPackage(ConcretizationPackage):
    version('2.1', ...)
    version('2.0', ...)
    version('1.0', ...)

    variant('featureA')
    variant('featureB', when='@2.0:')

    build_package('MyPackage_A', when='@:1.999')
    build_package('MyPackage_B', when='@2:')

mypackage/packageA.py

class MyPackage_A(AutotoolsBuildPackage):
   def configure_args(...):
       ...

mypackage/packageB.py

class MyPackage_B(CmakeBuildPackage):
    def cmake_args(...):
        args += ['YES' if spec.featureB else 'NO']
        ...
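The build_package(..., when=...) dispatch in this sketch could be reduced to simple version-range matching, something like the following (all names are hypothetical; real Spack when= clauses are far more general than version ranges):

```python
# Hypothetical sketch: resolve build_package('Name', when='@range') rules
# against a concrete version. Only '@lo:hi' style clauses are handled.

def parse_range(when):
    """Parse '@:1.999' / '@2:' clauses into (low, high) bound tuples."""
    lo, _, hi = when.lstrip("@").partition(":")
    to_tuple = lambda s: tuple(int(p) for p in s.split(".")) if s else None
    return to_tuple(lo), to_tuple(hi)

def select_build_package(rules, version):
    """rules: list of (class_name, when_clause) in declaration order;
    returns the first class name whose range contains the version."""
    ver = tuple(int(p) for p in version.split("."))
    for name, when in rules:
        lo, hi = parse_range(when)
        if (lo is None or ver >= lo) and (hi is None or ver <= hi):
            return name
    raise ValueError("no build package matches version " + version)
```

With the rules from the example above, version 1.0 would dispatch to MyPackage_A and version 2.1 to MyPackage_B.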

PS: Any way we can get #10403 merged?

@chuckatkins

mypackage/
    package.py
    package-2.0.py    # 2.0 on
    package-5.1.py    # 5.1 on
    ...

This is good. It checks a lot of the boxes for addressing most of the pain points at issue here. I'd jump on using it for mesa right away.

@tgamblin
Member

tgamblin commented Oct 7, 2019

The key insight I've had from this discussion is that concretization and building are different things.

They're not entirely different things, though. For example, the build system superclasses provide metadata, like dependencies on CMake, Python, etc. We are not using build system packages strictly as build systems.

@matz-e
Member

matz-e commented Feb 7, 2020

I talked to @becker33 about this a bit after #10411. I think there are four use cases here:

1. Package evolves to use a new build system (by far the most common)

2. @G-Ragghianti's:
   > a solution that would allow distinct recipes based on version

3. Building the same version with different build systems.

4. Versioning a `package.py` in a git repo for a single project over time.

[snip]
I think (3) could be supported in a similar way, but I have a hard time thinking about how you'd want to lay out package files to support it. In my outline above, the packages are distinguished by version, and we only allow so many characters to be in versions, so they make decent suffixes. I worry it'll get extremely cumbersome to support more than that. I think you'd need to just allow arbitrarily-named package files and require them to specify what they match as a class decorator or something. e.g.:

@when("+meson")
class MyPackage():
    pass

[snip]
Thoughts?

Unearthing this… We've had this issue come up with the Neuron simulator, which is switching from Autotools to CMake. To be sure that the transition is smooth, and for the peace of mind of all collaborators, it was decided to keep both build systems upstream for the time being. For now, we settled on a +cmake variant that triggers different build stages:

https://github.com/BlueBrain/spack/pull/644/files#diff-9c99522c3911ab2746b508533a5507ac

This should allow us to build neuron with either build system and ensure that features are on par.

FYI, @pramodk, @iomaganaris

alalazo added a commit to alalazo/spack that referenced this pull request Mar 10, 2020
closes spack#10411
closes spack#12941

This commit introduces a new repository where to move
deprecated recipes e.g. for packages that changed build
system. This will allow to benefit from base classes
specialized for each build-system while still providing
support for both recipes. Mpifileutils used as an example.
@nightlark
Contributor

Is there a recommended best practice for packages that seem to be in the process of switching build systems (has both Makefiles and Meson build files, presumably the Makefiles will be going away in the future)? Would it be better to stick with the old build system until it gets removed or pick an intermediate version for the spack package to switch over to using Meson?

@adamjstewart
Member

@nightlark we don't really have an agreed upon approach to handle that at the moment. For now, I would stick with a single build system for as long as you can, so if older versions only support one and newer versions support two stick with the older one. In the long run, it will probably be whichever build system is better supported by the developers, or whichever is "newer". In your example, Meson is likely better for cross-platform builds than Makefiles. CMake would be another example of a "newer" build system.

@alalazo
Member

alalazo commented Jun 11, 2021

Closing as stale. Feel free to reopen if you want to rebase and restart working on it.

9 participants