Discussion: Move built-in packages to a separate repository #47480

@tldahlgren

Supersedes #1773

@tgamblin @becker33 @haampie @alalazo @psakievich @citibeth

Contents

Summary

The Spack repository contains a significant number of built-in package recipes and associated files that are needed for installing software. This discussion proposes to move those files into a separate repository and modify the core software to support the change.

Task Checklist

The following is a checklist of the high-level tasks described in Approach. Each task should correspond to an approach subsection.

Back to Contents

Rationale

Spack packages specify the versions, options, and processes required for installing and optionally testing software. Their directories may contain additional files, such as patches and custom tests. For some users, the number of associated files is a significant hurdle, while for others package drift is the key issue. The community also considers this change a requirement for Spack 1.0.

With over 8,340 packages, the files associated with built-in packages currently comprise 86% of a fresh clone of the Spack repository. Embedding the ever-growing number of packages in the same repository as the core software creates a burden for some installation environments, especially those with space and/or i-node constraints.

Another problem with maintaining built-in packages in the same repository is that packages generally evolve independently, which can be problematic for projects that rely on a specific version (or commit) of a package (e.g., for software they've already installed) but want or need to get updates to the core Spack software. This "drift" in the package implementation can lead to the unanticipated and possibly undesirable installation of new versions of dependent software.

Maintaining packages in the same repository as the core software leads to problems for many Spack users. These problems range from excessive use of file space/i-nodes to the effects on project development associated with package implementation "drift".

Back to Contents

Description

While Spack already supports multiple package repositories (https://spack.readthedocs.io/en/latest/repositories.html), there are additional considerations beyond simply moving the built-in packages folder into a separate repository (https://docs.github.com/en/get-started/using-git/splitting-a-subfolder-out-into-a-new-repository) and configuring the new repository. The issues that need to be discussed here include:

  • Which directories and files need to be moved?
  • How will PR checks be affected?
  • How will spack fetch packages for bootstrapping? Where?
  • What is the process for fetching and configuring the package repository?
  • How will updating separate repositories be managed and changes synchronized?
  • What changes, if any, are needed to the many commands that work on the built-in repository?

Back to Contents

Affected Directories and Files

Which directories and files need to be moved?

At a minimum the new package repository shall contain the build systems ($SPACK_ROOT/lib/spack/spack/build_systems) and built-in packages (under $SPACK_ROOT/var/spack/repos/builtin/packages). Additional files may need to be moved depending on the approach chosen for handling CI checks (described in the next section).

Mock packages are not to be moved to the new repository since they are integral to Spack's unit tests.

Back to Contents

PR Checks

How will PR checks be affected?

Pull requests (PRs) trigger a number of checks related to packages. One of the checks uses GitLab CI pipelines to rebuild modified packages to maintain the build cache. Our plan is to migrate these pipelines to the new packages repository (away from core Spack). Unit tests in the core Spack repo will be responsible for properly exercising the package API to maintain compatibility between these two repos.

Spack performs additional checks, including style checks and audits. Those relevant only to packages would need to be moved to the new repository, while common ones would need to be copied.

At some point in the not-too-distant future we hope to trigger running stand-alone tests for modified packages. This check could continue as originally envisioned to be part of the CI workflow approach above or could be separated (assuming it can be triggered after any build cache updates or any non-build cache builds that might be added by @bernhardkaindl as discussed in the 2024 Nov 6 Technical Steering Committee meeting).

Back to Contents

Bootstrapping

How will spack fetch packages for bootstrapping? Where?

Spack has several sets of requisite software that must be bootstrapped:

Back to Contents

Package Repository

What processes will be used for fetching and configuring the package repository?

Spack will need the ability to configure the package repository and a process for fetching packages. Spack currently supports multiple package repositories through entries in a repos.yaml file (see the docs at https://spack.readthedocs.io/en/latest/repositories.html). However, those entries are required to be on the file system.

If that mechanism is retained, then it will need to support remote package repositories (specified with URLs that can include a commit hash) and the default configuration ($SPACK_ROOT/etc/spack/defaults/repos.yaml) modified to point to the new package repository. The relevant packages would need to at least be fetched from that location and likely cached locally.
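As a rough illustration of "URLs that can include a commit hash", the sketch below splits a repository string of the form `URL@REF` into its two parts. The `URL@REF` syntax and the names `RemoteRepo` and `parse_remote_repo` are assumptions made for this example only; nothing here reflects a syntax Spack has settled on.

```python
from typing import NamedTuple, Optional


class RemoteRepo(NamedTuple):
    url: str
    ref: Optional[str]  # commit hash, tag, or branch; None means "not pinned"


def parse_remote_repo(spec: str) -> RemoteRepo:
    """Split a hypothetical 'URL@REF' string into URL and ref.

    Only an '@' in the last path component is treated as the separator,
    so SSH-style URLs such as 'git@host:org/repo.git' are left intact.
    """
    head, sep, tail = spec.rpartition("/")
    name, at, ref = tail.partition("@")
    if not at:
        return RemoteRepo(spec, None)
    return RemoteRepo(head + sep + name, ref or None)


pinned = parse_remote_repo("https://example.com/org/packages.git@0123abc")
floating = parse_remote_repo("git@example.com:org/packages.git")
```

A pinned entry carries the commit to check out, while an entry without a ref would follow whatever default the fetch mechanism chooses.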

Alternatively, spack could support a package registry with a modified or alternate representation for the configuration (such as that used by cargo, which supports named registries with URLs).

Progress:

Back to Contents

Repository Updates and Synchronization

How will updating separate repositories be managed and changes synchronized?

Some changes to Spack core require changes to packages and/or, if moved, CI stacks. Some features, such as adding support for stand-alone tests, are optional while others, such as a change in syntax (e.g., maintainers) or the schema for configuring CI stacks, are not. Mandatory changes to packages resulting from changes to the core require coordination of updates to both repositories.

One approach to managing updates and synchronization is to include the package repository as a git submodule of the Spack (core) repository. While this approach makes it somewhat easier to synchronize changes to both repositories (since you can clone the package repository under the core repository), its use of git is more advanced than many contributors may be accustomed to (https://git-scm.com/book/en/v2/Git-Tools-Submodules).

An alternative is to rely on some form of dependency management. (TBD: What would this look like? Would we support synchronization features related to the repository configuration? CI checks?)

Back to Contents

Package Commands

What changes, if any, are needed to the many commands that work on the built-in repository?

Commands that expect to operate on built-in packages shall continue to do so, which will need to be confirmed (https://spack.readthedocs.io/en/latest/command_index.html#command-reference). These include package creation (e.g., create), query (e.g., find, info), build (e.g., gc), and developer (e.g., blame, pkg).

Approach

There is a significant amount of work needed to accomplish a well designed, implemented, and supported split for the repository. We split the work into more manageable tasks. The high-level tasks, not in priority order, are described here.

Explore Relevant Features of Other Products

There are several tools having relevant features that we would like explored in terms of processing. These can include:

And possibly:

Back to Contents

Document User Stories or Use Cases

It would be helpful to everyone working on these tasks to have a better understanding of the use cases this work addresses, including how people expect to use the features being implemented here.

Anticipated Use Cases

1. pinned Spack + pinned Packages

  • Intended audience: users that greatly value stability and are okay using older versions of software.
  • Pros:
    • Most stable
  • Cons:
    • Cannot install new versions of packages as they are released
    • Cannot benefit from new features and performance improvements as they are added to Spack
  • CI requirements:
    • one-time thorough installation testing (and build cache population) for packages of interest when releases are created.

2. pinned Spack + floating Packages

  • Intended audience: users that value stability but want to keep their dependencies as up-to-date as possible.
  • Pros:
    • Can quickly and easily install new versions of packages.
    • Minimize exposure to temporary regressions in the core Spack code base.
  • Cons:
    • “Early adopter” for packages: willing to accept temporary regressions in package definitions.
  • CI requirements:
    • CI pipelines using a recently verified version of Spack to test relevant, proposed changes to the Packages repository before they are merged. This is essentially our existing GitLab CI pipelines, which will be relocated to the Packages repo.

3. floating Spack + pinned Packages

  • Intended audience: users with pinned dependency versions that are eager to adopt new functionality and performance improvements for Spack.
  • Pros:
    • Access to latest Spack features & improvements
    • Stable package recipes
  • Cons:
    • Can’t easily install new versions of dependencies
    • Willingness to accept temporary regressions in core Spack
  • CI requirements:
    • Unit tests in Spack that thoroughly exercise the Package API.
    • More generally, Spack's existing pytest suite.

4. floating Spack + floating Packages

  • Intended audience: users that want cutting edge dependency versions and the latest features Spack has to offer.
  • Pros:
    • access to both the latest features in Spack, as well as most up-to-date versions of package recipes.
  • Cons:
    • willingness to accept some instability as regressions are addressed in either repo.
  • CI requirements:
    • This use case will automatically benefit from CI supporting use cases 2) and 3).
    • Will additionally require acceptance tests where we periodically (weekly?) do a “rebuild everything” with bleeding edge versions of both repos.
    • If these acceptance tests pass, we bump the pinned version of Spack used for testing in the packages repo (and vice versa, if we end up deciding to test some packages from core Spack).
    • When acceptance testing fails we will try to fix it quickly to resume automated snapshot releases.

Back to Contents

Move Build Systems into BuiltIn Package Repository

Determine what needs to be moved from the core's build systems into the builtin package repository. This task will include any restructuring and refactoring of existing modules and updates/additions to the corresponding unit tests.

Since these changes will impact how packages utilize the core's API, packages that violate the new API will need to be identified and their upgrade status tracked. One proposed option is to maintain a file that records individual packages that violate the new API. One example given was dbcsr, which uses the protected (some say private) function self._if_ninja_target_execute().
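One way such violations might be found is by scanning package recipes for uses of underscore-prefixed attributes on `self` that the recipe does not define itself. The sketch below does this with Python's standard `ast` module. The function name `protected_self_uses` and the sample recipe are made up for illustration (the recipe is not the actual dbcsr package), and the heuristic would also flag protected members inherited from a package's own base classes.

```python
import ast
from typing import Set


def protected_self_uses(source: str) -> Set[str]:
    """Return `self._name` attributes that are read but not defined in `source`."""
    tree = ast.parse(source)
    # Methods defined in the file count as "local" and are not violations.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    used = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"
            and node.attr.startswith("_")
            and not node.attr.startswith("__")
        ):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.attr)  # e.g. self._cache = ... in the recipe
            else:
                used.add(node.attr)
    return used - defined


# Made-up recipe fragment, for illustration only.
example = '''
class Example:
    def _local_helper(self):
        pass

    def build(self, pkg, spec, prefix):
        self._local_helper()
        self._if_ninja_target_execute("test")
'''

violations = protected_self_uses(example)
```

Running a scan like this over every recipe could produce the initial contents of the tracking file, which would then be trimmed as packages are upgraded.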

See #47480 (comment) for a snapshot of imports that could be affected by this task.

Progress

Back to Contents

Assess Untangling Unit Tests

Determine what aspects of unit tests are tied to packages in the builtin package repository (e.g., testing ecosystem) and which are specific to the core software. Come up with solutions to any issues such that the (remaining/revised) core unit testing process works after the builtin package repository is moved to a new GitHub repository.

Make as many changes as possible prior to the actual repository split and create a plan for the remaining tasks.

Progress

- #48232 (Umbrella PR)

Back to Contents

Untangle Compiler Wrappers

Compiler wrappers are part of the package hash, so this work needs to be assessed to determine what should be part of the core and what should be moved to the separate builtin package repository. Implement solutions to any issues that arise, which may include restructuring and refactoring existing modules (e.g., establishing a compiler interface in Spack core) and updates/additions to the corresponding unit tests.

Warning: This task is dependent on maturing support for compilers as dependencies.

Back to Contents

Support Repository Compatibility

After the split, we anticipate maintaining the following independent versions:

  • Spack version (semver, major.minor.patch)
    • We expect to maintain the current cadence of two releases per year.
  • Spack package API version (semver, major.minor)
    • The package API includes the spack.package module, which is a small subset of Spack's scripting API. It also includes the structure of a repository (both on the filesystem and as Python modules).
    • The minor version is bumped if the package API is extended in a backward compatible way. For example: a new directive is added.
    • The major version is bumped if the package API has a breaking change. For example: a directive is removed, a directive's function signature is changed in a breaking way, or the repository filesystem layout is changed in an incompatible way. We will strive to avoid major version bumps as much as possible.
    • Each package repo will define the package API version for recipes contained within. If a repo specifies compatibility with package API version 1.3 it means >=1.3 and <2.
    • CI for the package repo will need to use an appropriate version of Spack (one that can understand this package API version).
  • Spack package repo version (date + patch release, e.g. 2025-06.2)
    • Used for binary caches
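The package API rule above ("1.3 means >=1.3 and <2") can be sketched as a small check. The function names and the idea of passing in a list of API versions that a given Spack understands are assumptions for this example, not an existing Spack interface.

```python
from typing import Iterable, Tuple

ApiVersion = Tuple[int, int]  # (major, minor)


def parse_api_version(text: str) -> ApiVersion:
    """Parse a 'major.minor' package API version such as '1.3'."""
    major, minor = text.split(".")
    return int(major), int(minor)


def repo_is_supported(repo_api: ApiVersion, spack_apis: Iterable[ApiVersion]) -> bool:
    """True if any API version Spack provides satisfies the repo's requirement.

    A repo declaring (1, 3) needs a Spack API with the same major version
    and a minor version of at least 3, i.e. >=1.3 and <2.
    """
    return any(
        major == repo_api[0] and minor >= repo_api[1] for major, minor in spack_apis
    )


required = parse_api_version("1.3")
ok_newer_minor = repo_is_supported(required, [(1, 5)])
ok_older_minor = repo_is_supported(required, [(1, 2)])
ok_next_major = repo_is_supported(required, [(2, 0)])
```

Accepting several provided versions leaves room for a Spack release that understands more than one major package API version at a time.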

Compatibility Guarantees

All versions of Spack within a major release stream (e.g. 1.x.y) will be able to understand the package API version supported by the initial major release (1.0.0). If at some point in the future Spack drops support for a package API version, this will require a major version bump for Spack.

Every Spack release will be able to read packages from the prior two years' worth of Package Repo releases.
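To make the two-year window concrete, the sketch below parses a package repo version such as 2025-06.2 and checks whether it falls within 24 months before a given Spack release. It assumes the date part is `YYYY-MM`, that the window is counted in whole months, and that releases newer than the Spack release are outside the guarantee; none of these details are specified above, and the names are hypothetical.

```python
import re
from typing import NamedTuple


class RepoRelease(NamedTuple):
    year: int
    month: int
    patch: int


# Assumed format: YYYY-MM with an optional ".patch" suffix.
_RELEASE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])(?:\.(\d+))?")


def parse_repo_release(text: str) -> RepoRelease:
    match = _RELEASE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a package repo version: {text!r}")
    year, month, patch = match.groups()
    return RepoRelease(int(year), int(month), int(patch or 0))


def is_covered(repo: RepoRelease, spack_year: int, spack_month: int) -> bool:
    """True if `repo` was released no more than 24 months before the Spack release."""
    age_in_months = (spack_year - repo.year) * 12 + (spack_month - repo.month)
    return 0 <= age_in_months <= 24


release = parse_repo_release("2025-06.2")
covered = is_covered(parse_repo_release("2023-06"), 2025, 6)
too_old = is_covered(parse_repo_release("2023-05"), 2025, 6)
```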

When a backward incompatible change to the package API is deemed necessary, we will release a final supporting version of Spack and the official Package repo before bumping the package API version.

Spack will define and announce what Package API version(s) it understands.

Back to Contents

Determine Support for Bootstrapping

As described in Bootstrapping, Spack relies on a core set of packages. Determine what is needed to support bootstrapping once the builtin package repository is no longer available.

Back to Contents

Assess Untangling GitHub Actions and CI

As discussed in PR Checks, Spack has GitHub Actions that result in a variety of checks against the proposed changes. Determine which checks are common to the core and package repositories, specific to the core software, and specific to the builtin package repository. Document where each belongs and what is needed to ensure they are being performed for the proper repositories.

Back to Contents

Separate the Builtin Repository

Once the groundwork is in place in terms of the API and a plan is in place regarding GitHub Actions and CI, create the new builtin repository and configure the GitHub Actions and CI accordingly. Be mindful of the Affected Directories and Files.

Ensure that the build (and test) outputs continue to get reported in CDash at https://cdash.spack.io/index.php?project=Spack+Testing.

Progress:

Back to Contents

Support and Access Package Repositories

One of the goals is to support the builtin package repository in the same manner as other package repositories. So there will be no use of git submodules or subsets. This means that not only will builtin packages be moved into and retrieved from a separate (GitHub) repository, but the mechanisms for staging, ensuring compatibility, moving package repositories to their local (cache) location, and accessing the packages from the local cache need to be designed and implemented.

Spack already supports caching of included remote files for environments but the goal is to design and implement the processes such that they can be used to support not only package repositories but mirrors and environments.

Preliminary ideas for specifying repositories have the following forms:

repos.yaml:

repos:
- url: https://github.com/my/package/repo.git
  ref: 1.0.0
  namespace: builtin
  path: /path/to/local/repo

Repositories can also be overridden in a spack.yaml file:

spack:
  repos::
  - url: https://github.com/other/package/repo.git
    ref: develop
    namespace: builtin
    path: /path/to/local/builtin/repo

If a path is not given, then it would default to one specified/calculated by Spack.
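The defaulting rule for `path` could look something like the sketch below, which models one repository entry and falls back to a computed cache location when no path is given. The `RepoEntry` class, the required keys, and the `<cache_root>/<namespace>/<ref>` layout are all assumptions for illustration; the actual default would be whatever Spack ends up specifying.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class RepoEntry:
    url: str
    ref: str
    namespace: str
    path: Optional[Path] = None

    @classmethod
    def from_mapping(cls, data: Mapping[str, str]) -> "RepoEntry":
        """Build an entry from one item of the `repos` list."""
        missing = [key for key in ("url", "ref", "namespace") if key not in data]
        if missing:
            raise ValueError(f"repo entry is missing: {', '.join(missing)}")
        path = Path(data["path"]) if data.get("path") else None
        return cls(data["url"], str(data["ref"]), data["namespace"], path)

    def local_path(self, cache_root: Path) -> Path:
        """An explicit path wins; otherwise use a location derived by the tool."""
        if self.path is not None:
            return self.path
        return cache_root / self.namespace / self.ref


explicit = RepoEntry.from_mapping(
    {
        "url": "https://github.com/my/package/repo.git",
        "ref": "1.0.0",
        "namespace": "builtin",
        "path": "/path/to/local/repo",
    }
)
implicit = RepoEntry.from_mapping(
    {
        "url": "https://github.com/my/package/repo.git",
        "ref": "1.0.0",
        "namespace": "builtin",
    }
)
```

Keying the default location on both namespace and ref would let several pinned versions of the same repository coexist in the local cache.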

See also Package Repository.

Warning: This task relies on the work done in Support Repository Compatibility.

Back to Contents

Confirm Package-Related Commands Work

Confirm that all of the package-related commands continue to work as before with the separated builtin package repository.

Back to Contents

Add Repository CLI Commands

Design and implement new CLI (sub)commands for interacting with the repositories. At a minimum, there should be the ability to retrieve and update repositories. Preliminary ideas include:

$ spack repo get  [<repo-name>]  # <repo-name> assumes presence in `repos.yaml`; get all repos if no name provided
$ spack update <repo-name> [<version>]   # Update the named repository to the specified version
$ spack update --advance-packages  # Alternate proposal to update spack *and* get the latest package repo(s) commits

In the case of the first syntax for spack update, the <version> is optional because a repo with a reference to a branch (e.g., develop) would only need to retrieve the latest commits.

See #47480 (comment) for the alternate spack update syntax.

The update command would need to update repos.yaml if the version is different from that provided. The command should also ensure a new version is compatible with Spack's version.
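The behavior described for the first `spack update` syntax might be sketched as follows: leave the configuration alone when no version is given or the version is unchanged, refuse incompatible versions, and otherwise rewrite the entry's ref. Treating the namespace as the repo name, the in-memory list standing in for repos.yaml, and the `is_compatible` callback are assumptions for this example.

```python
from typing import Callable, Dict, List, Optional


def update_repo_ref(
    repos: List[Dict[str, str]],
    name: str,
    version: Optional[str],
    is_compatible: Callable[[str], bool],
) -> bool:
    """Update the ref of the named repo entry; return True if the config changed."""
    for entry in repos:
        if entry["namespace"] != name:
            continue
        if version is None or version == entry["ref"]:
            # Nothing to rewrite, e.g. a branch ref that only needs new commits.
            return False
        if not is_compatible(version):
            raise ValueError(f"{name} {version} is not compatible with this Spack")
        entry["ref"] = version
        return True
    raise KeyError(f"no repository named {name!r}")


config = [
    {
        "url": "https://github.com/my/package/repo.git",
        "ref": "1.0.0",
        "namespace": "builtin",
    }
]
changed = update_repo_ref(config, "builtin", "1.1.0", lambda version: True)
unchanged = update_repo_ref(config, "builtin", None, lambda version: True)
```

Returning whether the configuration changed gives the caller a simple signal for when repos.yaml needs to be written back to disk.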

Back to Contents

Additional information

Back to Contents

General information

  • I have searched the issues of this repo and believe this is not a duplicate of an open issue

Back to Contents
