Skip to content

Package.installed now checks both the presence of prefix and a DB entry#8038

Merged
scheibelp merged 3 commits intospack:developfrom
epfl-scitas:fixes/external_packages_are_always_installed
Jul 17, 2018
Merged

Package.installed now checks both the presence of prefix and a DB entry#8038
scheibelp merged 3 commits intospack:developfrom
epfl-scitas:fixes/external_packages_are_always_installed

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented May 7, 2018

fixes #8036

Before this PR Package.installed was returning True if the spec prefix existed, without checking the DB. This is wrong for external packages, whose prefix exists before being registered into the DB.

Now the property checks for both the prefix and a DB entry.

# If the spec is in the DB, check the installed
# attribute of the record
rec = spack.store.db.get_record(self.spec)
db_says_installed = rec.installed
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp @tgamblin I was wondering what is a case that could leave us with a record in the DB that has rec.installed ==False. Do you have any in mind?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When packages are uninstalled the corresponding record is not removed - instead it is marked as .installed = False

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem the case to me, that's why I was wondering if I missed some corner case where it could happen:

$ spack find
==> 0 installed packages.

$ cat opt/spack/.spack-db/index.json 
{
 "database": {
  "installs": {},
  "version": "0.9.3"
 }
}
$ spack install zlib
==> Installing zlib
==> Fetching file:///home/mculpo/production/mirror/zlib/zlib-1.2.11.tar.gz
...
  Fetch: 0.02s.  Build: 4.33s.  Total: 4.35s.
[+] /home/mculpo/PycharmProjects/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-8/zlib-1.2.11-ypfhlhjehavum42gl6uvax7o6kzhseoi

$ cat opt/spack/.spack-db/index.json 
{
 "database": {
  "installs": {
   "ypfhlhjehavum42gl6uvax7o6kzhseoi": {
    "explicit": true,
    "installation_time": 1526021395.450302,
    "ref_count": 0,
    "installed": true,
    "path": "/home/mculpo/PycharmProjects/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-8/zlib-1.2.11-ypfhlhjehavum42gl6uvax7o6kzhseoi",
    "spec": {
     "zlib": {
     ...
}

$ spack uninstall -y zlib
==> Successfully uninstalled [email protected]%gcc@8+optimize+pic+shared arch=linux-ubuntu18.04-x86_64 /ypfhlhj

$ cat opt/spack/.spack-db/index.json 
{
 "database": {
  "installs": {},
  "version": "0.9.3"
 }
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp Any comment on:

When packages are uninstalled the corresponding record is not removed - instead it is marked as .installed = False

and the post above? As far as I can tell it always worked like that (removing the entry from the DB).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how that comes about, nevertheless it is safest to proceed by checking the .installed field of the DB record since that is there.

@becker33 do you have a notion of when the .installed field on a DB record is used?

# above raises a Key error
db_says_installed = False

return has_prefix and db_says_installed
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I did the simplest modification possible. Don't know if you want to raise, or log a message somewhere, in case we find something inconsistent, like has_prefix == False and db_says_installed == True.

@scheibelp
Copy link
Copy Markdown
Member

This feels like a matter of interpretation: I don't normally have an issue with considering external packages as installed. Does something go wrong when external packages are considered installed?

Would this also imply there needs to be consideration about when a user adds an external package but doesn't explicitly install it? In this case has_prefix would be True and db_says_installed would be False. As of now the tutorial on external packages doesn't explicitly instruct the user to invoke spack install for external packages (see: http://spack.readthedocs.io/en/latest/tutorial_configuration.html#external-packages). If it's going to report installed = False then there should be a warning when the values don't match: I think users may have logic which checks for Package.installed; from skimming through the codebase I don't think that exists now, but I think it's reasonable for users to check for that.

Also the unit test looks good although IMO instead of directly invoking database.install it would be best to invoke Package.install to confirm that the external ends up installed in the database (if you agree then it also might make sense to move this test to the test/install.py module).

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 11, 2018

This feels like a matter of interpretation: I don't normally have an issue with considering external packages as installed. Does something go wrong when external packages are considered installed?

Nothing super-severe, but if you consider external packages installed without being registered in the DB:

  1. spack find won't show them
  2. Module files are not generated for them
  3. spack install will take action for something that is considered already installed

Would this also imply there needs to be consideration about when a user adds an external package but doesn't explicitly install it?

Probably a couple of lines in the doc that inform users of the discrepancies above? If you want I can try to add a sentence somewhere to show how installing an external package differs from installing a regular one.

I think users may have logic which checks for Package.installed; from skimming through the codebase I don't think that exists now, but I think it's reasonable for users to check for that.

That's how I hit this issue. I wrote a command to help scripting a deployment procedure (#7899) and noticed that external packages were always considered installed, even if they were not explicitly spack installed. That prevented logic like "if X is not installed, then install it (which also means generate module files, etc.)" from working.

Also the unit test looks good although IMO instead of directly invoking database.install it would be best to invoke Package.install to confirm that the external ends up installed in the database (if you agree then it also might make sense to move this test to the test/install.py module).

👍

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 11, 2018

@scheibelp I just double checked this:

Also the unit test looks good although IMO instead of directly invoking database.install it would be best to invoke Package.install to confirm that the external ends up installed in the database

and the database fixture is done in such a way to call:

def _install(spec):
    s = spack.spec.Spec(spec)
    s.concretize()
    pkg = spack.repo.get(s)
    pkg.do_install(fake=True)

on its install method (thus I think we are done here, right?). Let me know if you still want to move the test. I have no objection to it in case.

@scheibelp
Copy link
Copy Markdown
Member

That's how I hit this issue. I wrote a command to help scripting a deployment procedure (#7899) and noticed that external packages were always considered installed, even if they were not explicitly spack installed. That prevented logic like "if X is not installed, then install it (which also means generate module files, etc.)" from working.

It should be ok to invoke Package.do_install on an already-installed package, but I agree that if not X.installed: X.do_install is sensible.

IMO there are two competing desires:

  • Users that add externals should expect that Spack considers them installed without invoking Spack install on them (admittedly nothing that I know of now depends on this, I just think it's reasonable)
  • Assuming that one-time actions are tied to installation state, the user should be able to trigger those actions (which is your issue in Added a new command to filter input specs based on their properties #7899)

Would there be an issue with having Package.installed automatically update the DB record etc. if it is missing?

Probably a couple of lines in the doc that inform users of the discrepancies above?

I'm worried this would be fairly common and it would be nice to handle things automatically if possible.

and the database fixture is done in such a way to call... on its install method (thus I think we are done here, right?)

I don't think there's a test which confirms right now that invoking Package.do_install will appropriately add an external to the database. Even if we know that Package.do_install is called indirectly by the mock db I think that (a) it adds work for people to verify this and (b) the behavior of the mock db may change in the future.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2018

Would there be an issue with having Package.installed automatically update the DB record etc. if it is missing?

I see two valid reasons why not doing that:

  1. Entries in packages.yaml act more like constraints than full specs (there's no 1:1 mapping from an entry to a single spec that needs to be installed)
  2. It would be surprising to have a querying property with side-effects like writing module files on disk or modify the DB

By 1. I mean that, for instance, the configuration file in #8036:

packages:
  openssl:
    buildable: False
    paths:
      openssl@system: /usr/lib

says that whenever openssl is needed it must be at version @system, but leaves open the compiler and target part of the spec. In this sense it's just prescribing a constraint to the concretizer, rather than identifying a single spec that was installed manually outside of Spack. If we need openssl %gcc and openssl %intel we'll end-up using the same external installation for both (and this is completely fine).

Regarding 2. I think I would really be surprised (not positively) by a behavior like:

# Evaluating the condition might write module files to disk for external packages
# or modify spack DB
if openssl.installed:
   # This will never be hit by external packages, even if they don't appear in `spack find`
   pass

TLDR I would stay explicit, and let people install via spack install <specs> whatever they need without having to differentiate between external packages and packages managed by Spack.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 14, 2018

I don't think there's a test which confirms right now that invoking Package.do_install will appropriately add an external to the database. Even if we know that Package.do_install is called indirectly by the mock db I think that (a) it adds work for people to verify this and (b) the behavior of the mock db may change in the future.

There is, but is done indirectly via the database fixture:

def test_external_entries_in_db(database):
install_db = database.mock.db
rec = install_db.get_record('mpileaks ^zmpi')
assert rec.spec.external_path is None
assert rec.spec.external_module is None
rec = install_db.get_record('externaltool')
assert rec.spec.external_path == '/path/to/external_tool'
assert rec.spec.external_module is None
assert rec.explicit is False
rec.spec.package.do_install(fake=True, explicit=True)
rec = install_db.get_record('externaltool')
assert rec.spec.external_path == '/path/to/external_tool'
assert rec.spec.external_module is None
assert rec.explicit is True

This was added as part of #1167

@scheibelp
Copy link
Copy Markdown
Member

There is, but is done indirectly via the database fixture:

I think there should be a test to ensure that Package.do_install does this directly: if someone had forgotten to update the database in Package.do_install, the behavior would be incorrect but test_external_entries_in_db would still pass.

[the packages.yaml entry] says that whenever openssl is needed it must be at version @system, but leaves open the compiler and target part of the spec. In this sense it's just prescribing a constraint to the concretizer, rather than identifying a single spec that was installed manually outside of Spack.

I think we support that mainly because we don't want to necessarily force users to be specific about the details. Even though we weren't specific, the external openssl does actually have a specific compiler that it was built with. Not specifying the compiler could lead to ABI incompatibilities. You mention

this is completely fine

so I'm assuming perhaps this works out for openssl specifically. I don't think it's guaranteed to work in general.

It would be surprising to have a querying property with side-effects like writing module files on disk or modify the DB

I agree that would be strange.

I want to make sure that if we require users to explicitly install externals, that it's clear to users what is going wrong if an external is considered not-installed. Perhaps something close to that would be to print a warning message when a not-installed external spec is encountered.

Actually wouldn't this be rare? If the external appears as a dependency in a spec DAG and that DAG is installed, the database should be updated with the external (or I think that's how it works from looking at Package.do_install).

In other words I'm asking: when do you encounter an external that hasn't been registered in the DB?

It may also be strange to users to have to explicitly install the same external multiple times with different specs if they want Spack to treat it as installed in different contexts (DAGs).

@alalazo alalazo force-pushed the fixes/external_packages_are_always_installed branch from e5e3765 to 30cee13 Compare May 21, 2018 06:22
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented May 21, 2018

I think there should be a test to ensure that Package.do_install does this directly: if someone had forgotten to update the database in Package.do_install, the behavior would be incorrect but test_external_entries_in_db would still pass

This, I think, as been taken care of in the latest rebase - at least for the modifications that are part of the PR.

Even though we weren't specific, the external openssl does actually have a specific compiler that it was built with. Not specifying the compiler could lead to ABI incompatibilities.

Agreed, but with the caveat that I don't see it as a problem. To explain better my point of view: I interpret the semantic of entries in packages.yaml somewhat like how the keyword unsafe is interpreted in certain languages: it lets you take control over some dependencies, but then you are on your own all the way down from there (i.e. without the checks that Spack does for you to ensure you have a consistent DAG). I think this is sensible as we can't expect a user to fill in all the details for a spec that Spack would have filled for him.

In other words I'm asking: when do you encounter an external that hasn't been registered in the DB?

It may also be strange to users to have to explicitly install the same external multiple times with different specs if they want Spack to treat it as installed in different contexts (DAGs).

For openssl you're completely correct: it will be installed as a dependency when needed. The issue solved here arises, in my experience, when you are deploying a complete environment on clusters and is specific to third party applications that are distributed as binaries.

At least in our case, these applications (like gaussian, adf, etc.) are installed manually in a folder separate from the installations managed by Spack. Yet we want our users not to be aware of the difference and provide them a module file as we do for all the other libraries and applications. This module file, for consistency and convenience, should be generated by Spack. To generate it we need to install the external application explicitly and for as many different targets as we have on our clusters.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Jun 8, 2018

@scheibelp Pinging you as you told me to do 😄

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Jul 13, 2018

ping

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems fine, in fact as I think on it, it may be sensible to use the database exclusively to check for installation, e.g. to account for module-only externals (not that this needs to be considered in this PR).

The docstring of check_for_unfinished_installation needs to be updated (since it is no longer "more strict" than Package.installed).



@pytest.mark.regression('8036')
def test_regression_issue_8036(mutable_database):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish this test didn't depend on the existence of a system path - would it be possible to just set Spec.prefix to some path you create with a pytest fixture?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit. I patched os.path.isdir for this particular regression test.

alalazo added 2 commits July 15, 2018 21:48
fixes spack#8036

Before this PR Package.installed was returning True if the spec prefix
existed, without checking the DB. This is wrong for external packages,
whose prefix exists before being registered into the DB.

Now the property checks for both the prefix and a DB entry.
Improved the docstring of two functions (``installed`` and
``check_for_unfinished_installation``).

A new fixture has been added so that the regression test is
platform independent, and does not rely on ``/usr`` being present
in the system.
@alalazo alalazo force-pushed the fixes/external_packages_are_always_installed branch from 30cee13 to 91462b3 Compare July 16, 2018 06:55
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Jul 16, 2018

@scheibelp Should be done. Let me know if there's anything else you want changed.

@pytest.mark.regression('8036')
def test_regression_issue_8036(mutable_database, usr_folder_exists):
# This version should not be installed on entry, but it points to /usr
# which is a directory that most likely exists everywhere (see #8036)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalazo now that you have added usr_folder_exists, you can replace

... but it points to /usr which is a directory that most likely exists everywhere

with

... but the test arranges for the package prefix to exist

Or perhaps altogether:

The test ensures that the external package prefix is treated as existing. Even when the package prefix exists, the package should not be considered installed until it is added to the database with do_install

The initial comment in the regression test was outdated by the last
change. Now it has been updated and is worded more clearly
(thanks to Peter's suggestion).
@scheibelp scheibelp merged commit 373b3d2 into spack:develop Jul 17, 2018
@scheibelp
Copy link
Copy Markdown
Member

Thanks!

@alalazo alalazo deleted the fixes/external_packages_are_always_installed branch July 17, 2018 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working external-packages

Projects

None yet

Development

Successfully merging this pull request may close these issues.

External packages appear installed even if they are not

2 participants