
Package-Level Module Autoloading #3134

Closed
citibeth wants to merge 4 commits into spack:develop from citibeth:efischer/170211-Autoload

Conversation

@citibeth
Member

@adamjstewart @tgamblin

This addresses issues brought up in conversation between @citibeth and @alalazo on module autoloading in #1662...

It introduces the concept of autoloading at the Package and depends_on() level.

  1. If you put autoload=True in a package, anything that depends on that package will autoload its module. The canonical example here is PythonPackage: anything depending on a PythonPackage subclass will autoload it. For example, py-numpy will be autoloaded by anything that depends on it. However, if py-numpy depends on (say) openblas, then openblas will not be autoloaded.

  2. You can override the package-level autoloading in the depends_on() declarations. py-netcdf depends on netcdf, but by default netcdf will NOT be autoloaded from py-netcdf, because netcdf does not subclass PythonPackage (and autoload=True was not set in netcdf/package.py). However, Python extensions (such as the one generated in py-netcdf) do not have RPATHs, and therefore will not be able to properly link against netcdf without something in LD_LIBRARY_PATH. Therefore, we use the following in py-netcdf/package.py:

depends_on('netcdf', autoload=True)
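The precedence rule proposed here (a per-edge autoload=... overrides the dependency package's class-level default) can be modeled outside Spack in a few lines. This is an illustrative sketch only; the function and argument names are made up and are not Spack internals:

```python
# Minimal model of the proposed precedence rule: an explicit
# autoload=... on a depends_on() edge wins; otherwise the
# dependency package's class-level default applies.
# (Names here are illustrative, not Spack internals.)

def edge_autoloads(pkg_default, edge_override=None):
    """Return True if the dependency's module should be autoloaded."""
    if edge_override is not None:
        return edge_override          # depends_on(..., autoload=...)
    return pkg_default                # Package-level autoload attribute

# py-numpy: PythonPackage sets autoload=True at the package level.
assert edge_autoloads(pkg_default=True) is True
# netcdf: no package-level autoload, so not autoloaded by default...
assert edge_autoloads(pkg_default=False) is False
# ...but py-netcdf overrides it: depends_on('netcdf', autoload=True).
assert edge_autoloads(pkg_default=False, edge_override=True) is True
```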

I believe that this feature will provide the tools to eventually make module autoloading "just work," based on particulars of the packages. I expect there will be some adjustment, as we find exceptions to the default autoloading rules (such as PythonPackage.autoload=True).

The way to specify autoloading described in this PR does not affect the way autoloading is currently specified in modules.yaml. If there is demand, I could also add a switch to modules.yaml that would turn this feature off entirely.

1. Put autoload=True in package, and anything that depends on us will autoload our module

2. Put depends_on(xyz, autoload=...) and it will override xyz.autoload
Elizabeth Fischer added 3 commits February 11, 2017 22:26
2. Take into account that virtual packages don't have a package object that can provide an autoload value.
Member

@alalazo left a comment


I remain of the opinion expressed in #1662: I don't like information on which modules should be auto-loaded being part of the directives. This comes out of concerns for modularity: packages currently don't have any kind of knowledge that modules exist, while after this PR we will have a keyword in a directive which refers to an option of a particular post-install hook.

I see that we may want more flexibility than what is currently available for auto-loading modules. Nonetheless I think we may achieve that:

  1. generalizing the syntax for auto-loading modules
  2. shipping a modified modules.yaml that enables by default auto-loading for selected modules

For instance, if we want packages to autoload python itself and any dependency that depends on python, I suggest we extend the current syntax to something like:

modules:
  tcl:
    all:
      autoload:
        - '^python'
        - 'python'

Does this seem a viable alternative? Or are there cases that wouldn't be covered by such an approach?

@citibeth
Member Author

I remain of the opinion expressed in #1662: I don't like information on which modules should be auto-loaded being part of the directives. This comes out of concerns for modularity: packages currently don't have any kind of knowledge that modules exist, while after this PR we will have a keyword in a directive which refers to an option of a particular post-install hook.

In a perfect world, all Spack-built packages would have full RPATHs, and modules would barely be useful. The world is not perfect, and only compiled binaries consistently get the RPATH treatment. Python, R, Java, Lua, etc. all use the equivalent of LD_LIBRARY_PATH, meaning that to use package py-x, you have to set an env var pointing to the dependencies of py-x. Other packages pick up data files or libraries to load on-the-fly from their dependencies, and expect to find those data files through randomly-named env vars. Spack's compiler wrappers also don't work for Python extensions, meaning they need LD_LIBRARY_PATH to find the underlying C libraries they depend on.

With that in mind, consider:

Getting Complete Info for Env Var Assembly

Spack packages already indicate what environment variables must be set in order for dependents to use them at runtime. This is specified, in the package, in setup_environment(). So module-related stuff is already specified in the packages, as the second argument to setup_environment().
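For readers unfamiliar with the hook being referenced: setup_environment() receives two environment-modification objects, and the second collects the variables needed at runtime. The following self-contained sketch uses a stub in place of Spack's real EnvironmentModifications class, and the install prefix path is invented for the example:

```python
# Stub standing in for Spack's EnvironmentModifications; only the
# two methods used below are modeled. Paths are invented.
import os

class EnvironmentModifications:
    def __init__(self):
        self.env = {}
    def set(self, name, value):
        self.env[name] = value
    def prepend_path(self, name, path):
        old = self.env.get(name, "")
        self.env[name] = path + (os.pathsep + old if old else "")

# What a package.py might declare (second argument = runtime env):
def setup_environment(spack_env, run_env):
    run_env.prepend_path("LD_LIBRARY_PATH", "/opt/spack/netcdf/lib")
    run_env.set("NETCDF_HOME", "/opt/spack/netcdf")

run_env = EnvironmentModifications()
setup_environment(EnvironmentModifications(), run_env)
print(run_env.env["NETCDF_HOME"])   # /opt/spack/netcdf
```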

Let's think clearly about the problem of assembling the environment required to use a package at runtime. Currently, that problem is only partially addressed. Yes, some information along those lines is accepted in the second argument of setup_environment(). But without knowing which dependencies of A provide env vars that need to be forwarded to A, Spack does not know how to fully assemble the environment needed to run A. The only way to really know I'm getting something that will work is to load all the env vars set by A and all its dependencies.

The autoload directive provides the critical missing information Spack needs to assemble the correct environment to run A. Note that I haven't used the word "module" in this explanation. The core problem that needs to be solved is how to assemble the correct environment needed to run A. With this PR, the packages now provide that information.

Would you like the autoload keyword to be named something else? I have no problem with renaming it.

Issues of Modularity

Modules are one particular way to assemble the environment required to run package A. Consider another, for sake of thought experiment: one could make a spack run command that would assemble the environment needed to run a particular spec on-the-fly, based on running a bunch of setup_environment() methods, and then run an arbitrary command in that environment.
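The thought experiment can be sketched concretely. This is a toy model, not real Spack code: the Package class, its run_vars attribute, and the package names are all invented to show the shape of such a "spack run" — walk the spec's transitive dependencies, run each setup_environment(), and launch the command in the merged environment.

```python
# Toy model of a hypothetical `spack run`: gather env modifications
# from a package and all its transitive dependencies. The classes
# and attributes below are stand-ins, not Spack's actual API.

class Package:
    def __init__(self, name, deps=(), run_vars=()):
        self.name, self.deps, self.run_vars = name, list(deps), list(run_vars)
    def setup_environment(self, run_env):
        for var, path in self.run_vars:
            run_env.setdefault(var, []).insert(0, path)

def assemble_environment(pkg, run_env=None, seen=None):
    """Depth-first walk: every transitive dependency contributes its vars."""
    run_env = {} if run_env is None else run_env
    seen = set() if seen is None else seen
    if pkg.name in seen:
        return run_env
    seen.add(pkg.name)
    pkg.setup_environment(run_env)
    for dep in pkg.deps:
        assemble_environment(dep, run_env, seen)
    return run_env

netcdf = Package("netcdf", run_vars=[("LD_LIBRARY_PATH", "/opt/netcdf/lib")])
py_netcdf = Package("py-netcdf", deps=[netcdf],
                    run_vars=[("PYTHONPATH", "/opt/py-netcdf/site-packages")])
env = assemble_environment(py_netcdf)
# A real `spack run` would then join each value list with os.pathsep
# and exec the user's command in the resulting environment.
```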

This PR does not make core Spack know about or favor modules or any other way of assembling a runtime environment. Modularity is preserved. What this PR does do is provide information --- in addition to information already provided through setup_environment() --- that modules or other environment-assembling systems can use to assemble the correct environment in a reliable fashion. That information did not previously exist, leaving us with modules that didn't quite work when transitively assembled environments were required.

Motivation of this PR: Why Now?

From a practical standpoint... After losing two battles on the issue, I have concluded that users don't like being asked to edit packages.yaml. Users learn how to build packages, and they want that to be enough. packages.yaml is a different file with a different format and different semantics from the way they usually do things in package.py; see #3131 for musings on that issue. The result is that users want to put stuff in packages that they really shouldn't: arbitrary and random forwarding of variants, pre-set variants or versions based on one specific machine, etc.

My proposed solution to this issue is to provide BundlePackage, which will allow people to separate "plain vanilla" package creation from whatever modifications they want to make, global to the DAG, for their own purposes; see #3133 for a brief discussion on planned use cases.

Now suppose that my big numerical simulation is split into 12 Spack packages (quite common in fact), and I've created a single overall Bundle package (call it ueber) that depends on those 12 and sets variants on them in a coordinated fashion. When I write spack load ueber, I will now expect my 12 "smaller" packages to be loaded (or at least, some of them, depending on the circumstances). I don't want to have to mess with packages.yaml or modules.yaml to get this working: that is when we will lose users, and get resistance to our begging and pleading for clean/reusable package.py files.

Conclusion: This PR is necessary to achieve the goals outlined in #3133. That is why I implemented it this weekend, instead of putting it off for later. I believe this will also play into our ongoing discussion on Spack Environments.


I see that we may want more flexibility than what is currently available for auto-loading modules. Nonetheless I think we may achieve that:

I appreciate your offer to extend modules.yaml semantics. Note that this PR leaves modules.yaml 100% intact, and in no way prevents further improvement of modules.yaml. I would also be happy to add an option to ignore all Package-level autoload directives, if you want to run Spack that way. However... as described above, I firmly believe that the future lies in having packages provide the complete information required to assemble the correct environment.

I must also point out that the ability to use that information when generating modules (our only real-life way of assembling env vars) rests entirely on your significant contributions to module functionality. Without that existing machinery, this PR could not exist.

@paulhopkins
Contributor

Doesn't the depends_on type run already serve the same purpose as autoload=True? If it is required to use the application or library, and this is not already satisfied by a linking mechanism, then it should be autoloaded.

I agree with @alalazo that we should be shipping autoloading modules by default. I did start to work on this with some other changes; I will fix it up.

@citibeth
Member Author

citibeth commented Feb 13, 2017

Doesn't the depends_on type run already serve the same purpose as autoload=True? If it is required to use the application or library, and this is not already satisfied by a linking mechanism, then it should be autoloaded.

Great observation. However, I've concluded the two are different, due to RPATH. Consider the canonical case A->B, where A is a binary executable using B, a shared library. In this case A depends_on(type='run') B, because B is required to exist in order to run A. However, A does NOT need to autoload B, since A is built with RPATHs.

When considering dependencies (i.e. edges of the DAG), note that edges where autoload=True are a subset of edges where type='run'.

Also, whether an edge needs autoload=True is a function of BOTH ends of the edge. For example, consider A->B->C. If B is not built with RPATH, then we must use package-level B.autoload=True; because everyone depending on B will require C in their LD_LIBRARY_PATH. However... even if B is built with RPATH, we still might need depends_on(B, autoload=True) in A, if A is not built using RPATH. This justifies the two forms of the autoload keyword here.
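The two claims above can be written down as a small executable sketch. The edge list below is invented for illustration (the cmake edge in particular is hypothetical), but it instantiates both points: autoload edges form a strict subset of run edges, and the autoload decision depends on whether the dependent end was built with RPATHs.

```python
# Toy edge model for the discussion above: every autoload edge is
# also a run edge, but not vice versa (RPATH-built dependents can
# skip autoloading). The edges themselves are illustrative.

edges = [
    # (dependent, dependency, deptypes, autoload)
    ("A",  "B", {"build", "link", "run"}, False),          # A has RPATH: run edge, no autoload
    ("py-netcdf", "netcdf", {"build", "link", "run"}, True),  # extensions lack RPATH
    ("py-netcdf", "cmake",  {"build"}, False),             # build-only: never autoloaded
]

autoload_edges = {(a, b) for a, b, types, al in edges if al}
run_edges      = {(a, b) for a, b, types, al in edges if "run" in types}

# The subset relation claimed above:
assert autoload_edges <= run_edges
assert autoload_edges != run_edges   # A->B is run but not autoload
```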

@scheibelp
Member

scheibelp commented Apr 26, 2017

Consider the canonical case A->B, where A is a binary executable using B, a shared library. In this case A depends_on(type='run') B, because B is required to exist in order to run A.

This conflicts with the proposed meaning of the "run" deptype in #3768 - not to say this makes it incorrect but I'm wondering if you could weigh in there.

In #3768 the proposed meaning of the run deptype is when a dependency affects environment variables other than those related to libraries.

@citibeth
Member Author

This conflicts with the proposed meaning of the "run" deptype in #3768 - not to say this makes it incorrect but I'm wondering if you could weigh in there.

I opened #3768 because I realized we didn't really know what these deptypes mean. Either this PR or #3768 needs to change based on this conflict.

@alalazo
Member

alalazo commented Nov 11, 2019

Closing as this has been inactive for a while and may not be relevant anymore. The best way to have an entire set of specs ready to be used is "Spack environments". Feel free to reopen if you disagree.

@alalazo alalazo closed this Nov 11, 2019
