Skip to content

Only reuse externals when configured#41707

Merged
haampie merged 7 commits intospack:developfrom
haampie:fix/reuse-externals-only-when-configured
Dec 20, 2023
Merged

Only reuse externals when configured#41707
haampie merged 7 commits intospack:developfrom
haampie:fix/reuse-externals-only-when-configured

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Dec 14, 2023

Closes #40843

Followup to #35975

Users often learn the hard way that removing or modifying an external in
packages.yaml is not enough to avoid the original external from getting
picked up in concretization. To users it feels like their config changes have
no effect, or they think that Spack somehow caches their old config.

I don't think it's reasonable to expect users to understand that their old
config ended up as an "installed" spec in the database, and follows the
usual reuse rules of installed specs. This jargon makes barely any sense,
since users don't think of externals being "installed in the database" at all,
their externals are "installed on the system", and the only way they interact
with them is by registering them in config -- externals getting "installed"
is really just an implementation detail. Asking users to spack uninstall
their old external by hash also raises eyebrows.

It can be quite a time waster for users, in particular the case reported in
#40843, where the only change to config was fixing a typo in the module
list for the external. Barely any user would understand that config is copied
into the database, and it's never really exposed to the user. That means
that accidental reuse of the old external with the typo is a really fun and
time consuming debugging activity (in particular since it may not cause
and error on module load, and on cray systems module load related
problems are notoriously hard to troubleshoot)

Since this such a footgun, and gets reported almost every week on Slack,
I think it would be clean and consistent to apply the same reuse condition we
already apply to build caches also to the local database: reuse externals
that are configured in packages.yaml.

@spackbot-app spackbot-app bot added the core PR affects Spack core functionality label Dec 14, 2023
@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch 2 times, most recently from c7b8824 to 490d068 Compare December 15, 2023 00:12
@spackbot-app spackbot-app bot added the tests General test capability(ies) label Dec 15, 2023
@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from 490d068 to 6ee7e9a Compare December 15, 2023 00:13
Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add this to the upcoming major changes pinned discussion.

Also, I think we should include a feature to add all the externals associated with a given spec to be added to the config. For example, if I specify spack install foo ^bar/hash. Under this feature I may get an error that bar/hash depends on an external that is no longer configured. We should have an easy one-liner from that point to add the relevant externals to my config.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 15, 2023

ping @alalazo who prefers when the database is the single source of truth instead of config files

@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from d6acbaa to e36375a Compare December 15, 2023 13:01
@spackbot-app spackbot-app bot added the stand-alone-tests Stand-alone (or smoke) tests for installed packages label Dec 15, 2023
@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from bbccfa9 to a91349e Compare December 15, 2023 14:04
@becker33
Copy link
Copy Markdown
Member

@haampie as I was thinking about this more last night, I realized this would require additional functionality to add the externals from the cray manifest file to the packages configuration in addition to the database. This might prove tricky, since the cray externals from the manifest can have dependency information, and we don't currently allow dependency information from externals in the packages config.

Allowing dependencies on externals looked challenging last time I looked at it, but I can take another stab at it now. The other option would be some more complex mapping between the externals in the database and the externals in the config. For example, externals in the db could store the yaml block in which they were defined, which would make identifying the correct external possible even if there are multiple satisfying externals

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 15, 2023

I didn't think of the cray manifest at all, that's a bit of a blocker.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 16, 2023

@becker33 can't we keep track of how externals were added and use that to determine reuse or not? That could also help with undoing an import from a cray config file.

I agree that migrating entries from db to config works, but doesn't really sound great.

I don't understand what you meant with that foo ^bar/hash example, that doesn't trigger reuse, it's just a constraint?

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Dec 16, 2023

I'll add my thoughts on this:

  1. This is in line with reuse concretization: allow externals from remote when locally configured #35975 which was merged. Can't we merge this as is as an improvement on Linux?
  2. On Cray, afaik, externals can be added from a JSON file that is on the system (and is structured in a way that is very similar to a DB) and from config files. We can monitor the JSON file AND the config files for the time being.
  3. Migrating stuff from DB to packages.yaml is a very bad idea as a) we'll end up with a less powerful DB b) we'll get back entries that are the "concretization" of something that was in the DB already c) packages.yaml will be cluttered1

As for a more long term way forward, I still advocate for an intermediate DB to store externals. That will:

  1. Solve the one to many mapping we have now on externals (right now the same external on disk has potentially many different hashes in the DB)
  2. Allow the same semantic we use here and in reuse concretization: allow externals from remote when locally configured #35975 (instead of satisfiability of an external abstract spec, we need to check if the external we want to reuse is in the intermediate DB - and don't consider it if it isn't)
  3. Allow for automatic updates / cleaning procedures for externals (the intermediate DB is under Spack control, so can be updated whenever a change in configuration etc. is detected. Based on the intermediate DB we can have commands that update, with prompt, the user DB too).
  4. (This is for us, not for users) Remove the weird concretization logic and ad-hoc rules we have for externals during concretization

Footnotes

  1. I think people will start, rightfully, to complain "Why Spack is messing with my packages.yaml"?

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 16, 2023

We can monitor the JSON file AND the config files for the time being.

we don't keep track what file was used, so maybe easier to mark externals from config as such so we can apply different reuse conditions.

Idk how to deal with reuse from build cache with imported cray manifest entries, that doesn't work now 🤔 currently I think that is fine; shouldn't affect many users, and if we allow reuse there, it can cause the same issue fixed by #35975

@psakievich
Copy link
Copy Markdown
Contributor

psakievich commented Dec 17, 2023

users don't think of externals being "installed in the database" at all,
their externals are "installed on the system"

☝🏻 this times 💯.

Maybe this is crazy talk, but I wish externals could be implemented as an in-memory artifact rather than registering them in a database. I'm sure in practice this is hard to do, but in principle that seems closer to reality. An external is not a spack artifact, just a spack registration. When system externals shift under us they usually affect low level specs that require a lot of rebuilding across the DAG. Often times the best thing to do is just delete everything and start from scratch. (Especially if uninstalling by hash is the other option 😝 ).

If the system changed then spack's database is not valid anymore wrt to the external even if nothing changed in spack. AND the only way spack will be able to detect these changes ATM is if the users tell it.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 20, 2023

So, what do we do here @becker33 / @alalazo?

I noticed that cray entries set spec.origin = "external-db" which ends up as part of an install record, so we do keep track after all.

Keep the PR as is but allow reuse of records marked external-db w/o corresponding config sounds sensible to me then.

haampie and others added 4 commits December 20, 2023 11:54
Users often learn the hard way that removing or modifying an external in
packages.yaml is not enough to avoid an external from getting pick up in
concretization.

Since this such a footgun, and gets reported almost every week on Slack,
I think it makes sense and is pretty clean to apply the same reuse rule
we already apply to build caches also to the local database: only reuse
those externals that are in config.
@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from a91349e to 69f8add Compare December 20, 2023 10:54
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 20, 2023

Pushed that change

@haampie haampie requested a review from becker33 December 20, 2023 15:18
Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks perfect except it should have a test for local=True with a DB entry with origin="external-db".

@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from 4046965 to 6e4d555 Compare December 20, 2023 18:18
@haampie haampie force-pushed the fix/reuse-externals-only-when-configured branch from 6e4d555 to d6b4470 Compare December 20, 2023 18:18
@haampie haampie enabled auto-merge (squash) December 20, 2023 18:41
@haampie haampie merged commit 1aaab97 into spack:develop Dec 20, 2023
@haampie haampie deleted the fix/reuse-externals-only-when-configured branch December 20, 2023 19:41
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request Jan 31, 2024
Users expect that changes to the externals sections in packages.yaml config apply immediately, but reuse concretization caused this not to be the case. With this commit, the concretizer is only allowed to reuse externals previously imported from config if identical config exists.
kwryankrattiger pushed a commit to kwryankrattiger/spack that referenced this pull request Jul 9, 2024
Users expect that changes to the externals sections in packages.yaml config apply immediately, but reuse concretization caused this not to be the case. With this commit, the concretizer is only allowed to reuse externals previously imported from config if identical config exists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core PR affects Spack core functionality stand-alone-tests Stand-alone (or smoke) tests for installed packages tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

External packages modules are cached in opt/spack/.spack-db without being updated on configuration changes

4 participants