Skip to content

enhancement proposal : customization of module files #552

Merged
tgamblin merged 31 commits intospack:developfrom
epfl-scitas:features/env_objects_flying_around
Mar 21, 2016
Merged

enhancement proposal : customization of module files #552
tgamblin merged 31 commits intospack:developfrom
epfl-scitas:features/env_objects_flying_around

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Mar 15, 2016

Objective

Being able to customize what gets written in a module file

TLDR

Currently the modifications that gets written in a module file are:

  • generic modifications to some list of paths in the environment

These modifications are obtained inspecting pkg.prefix according to the same set of rules for every package. I would like to extend this process allowing:

  • package-specific modifications to the environment (this PR)
  • site-specific modifications via a configuration file (a subsequent PR)
Current Status
  • class that manages environment modifications during both installation and module file creation (module + unit tests)
  • environment variables set in a package are written to tcl or dotkit module files
  • polished the interface that package maintainers will need to use
  • validation of environment modifications (i.e. detect conflicting changes to the same variable issued by different packages)
Examples of use
  1. openmpi and mpich packages in this PR set the environment variables for compiler wrappers differently depending on whether they are used as a dependency during an installation or outside of spack via module files
Details

The biggest change done in this PR is the introduction of a class (EnvironmentModifications) to manage environment modifications both during install and module file creation. An object of this class is essentially a FIFO queue containing requests to modify the environment. Those requests can be executed at a later time and from a scope other than where they were created (in which case the environment is modified and the list is cleared.)

Alternatively the requests can be inspected and processed. This is what is done during module file creation when each module file writer (TclModule, Dotkit) maps each single request to a formatted line and writes that line to a file.

An interesting side effect of this design is the possibility to inspect the whole set of requests done by different packages before applying them. This could permit to reveal inconsistencies (e.g. variables that are set by a package and unset by another) and output diagnostic messages to the user.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 15, 2016

@tgamblin : could you please have a look at the skeleton of the thing and tell me if the overall idea fits what you had in mind?

@citibeth
Copy link
Copy Markdown
Member

Looks likes a well-thought out idea so far. Can you please explain more about the use case and the problem(s) it solves? Thanks!

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 15, 2016

@citibeth One of the most requested features (and one that we need at our site) is the possibility to 'customize' the modulefiles that are written after a successfull installation. The customizations in which I am interested here will consist in the addition of new entries and/or in the modification of the usual entries that are added via automatic discovery of some standard directories. Also, these customizations may happen either per-package (most part of the cases) or per site.

As an example of a per-package customization you may think of an MPICH installation that could set MPICH_CC or similar variables in the modulefiles. IntelMPI would instead represent an example of a per-site customization, as the installation will be tuned to the right network fabrics, etc. setting a bunch of environment variables appropriately.

To reach the goal without creating a tight coupling among different parts of spack a viable approach would be to (imo) :

  • group together the modifications that are done to the environment (and are now plain os.environ[name] assignment within some function) into an object
  • build a mechanism that constructs one of these objects appropriately and produces the desired customizations (in another PR?)

@alalazo alalazo changed the title refactoring proposal : environment modifications stored in an object enhancement proposal : customization of module files Mar 16, 2016
env.prepend_path(variable, absolute_path)

env, inspector = EnvironmentModifications(), PathInspector()
os.path.walk(path, inspector, env)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to walk the whole prefix? For most of these I think just checking at the top level would be fine. For pkgconfig, there are a few standard locations you could check (lib/pkgconfig, lib64/pkgconfig) but I'm not sure you wan to grab all of them unconditionally. I think for some packages this might pick up stuff you don't want, like tests and such. I would leave adding the deeper directories to the package implementor --- they can extend it if they want.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgamblin Ok, will restore the previous logic. It seems in fact better to leave the responsibility of adding non-standard directories to the package maintainer.

@tgamblin
Copy link
Copy Markdown
Member

@alalazo: ok I went over this. It looks really good! I like the design; I think this is definitely the right direction. High level, my comments are basically:

  • A few suggestions to make things prettier.
  • There are a few nits with setup_environment, setup_dependent_environment, and setup_extension_environment I'm not sure about. I think we probably need more than one method here b/c the env info goes in separate module files. I think the choice to factor out the module-scope modifications as you have done is a good one, too.
  • We need to figure out how to get compiler wrapper variables like OMPI_CC to behave differently in and out of Spack. In Spack they should use Spack's compiler wrappers. Outside of Spack they should call real compilers.

I think that is it. Thanks!

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 18, 2016

@tgamblin I think I am done with the comments of your review. If you like the approach above I'll just clean the few TODO I left behind as a reminder and I will be ready to merge.

By the way, thanks for taking the time to review the code : I see that it's a one versus many battle and I imagine it's taking some effort!

tty.error("Will not uninstall %s" % e.spec.format("$_$@$%@$#", color=True))
print
print "The following packages depend on it:"
print()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alalazo: this prints () in Python 2.7.11. I think it has to be print('')

@tgamblin
Copy link
Copy Markdown
Member

@alalazo: I just pushed a branch with some additions to this that I think makes the interface a bit cleaner and avoids the need to patch anything. It's in epfl-scitas-features/env_objects_flying_around. See what you think.

Basically the interface looks like this now:

    def setup_environment(self, spack_env, run_env)
    def setup_dependent_environment(self, spack_env, run_env, dependent_spec)
    def setup_dependent_python_module(self, module, dependent_spec)

The intent is to make it clear when we're modifying the compile environment and when we're modifying the runtime/module environment. To make it clear, the two env methods have two environment module objects: spack_env and run_env. One modifies the compile environment and the other modifies the run environment, and changes to the run env will go into modules. It was possible to modify the compile env for a package before by simply setting env['foo'] = 'bar' within install(), but I think this factoring is nicer because it allows the package author to use the env modifications the same as all the other env settings if they like.

modify_module is renamed to setup_dependent_python_module to be consistent with the other methods, and to distinguish the python module scope from environment modules in the name.

I also tried to rename set_env and unset_env to set and unset in EnvironmentModifications (because the _env was repetitive most places it was used) and added set_path to simplify dealing with paths (user doesn't have to call concatenate_paths).

Anyway, see what you think. I think separating this a little more makes the code more clear and avoids the need to patch.

PS: I wish Github let us both iterate on the same PR. Do you know if that is possible? I do not see how I could commit to the PR branch. Bitbucket is kind of nicer in this regard b/c it has better branch permissions and you can interact with collaborators on a branch within the main repo.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin I really like passing two different environments, in particular as it permits to avoid monkey patching the code (a behavior which would not have been obvious from user perspective). I also think that all the modifications to EnvironmentModifications effectively make the API more usable.

There are a few thoughts / remarks that I would like to share :

  1. is it still necessary to have setup_environment at all? Before the idea was that the run-time environment was responsibility of setup_environment, while the spack environment was on setup_dependent_environment. Now that two env instances are passed to setup_dependent_environment is the other function still needed?
  2. I would rename setup_dependent_python_module to setup_dependent_module, just to avoid people confusing the python module you are referring to with the python package provided by spack
  3. there are a few missing things in openmpi and mpich (basically setting run_env) + mpich call seem to have a wrong signature

If you agree with point 1. in particular I can pull your modifications here and try to simplify the PR a bit further implement the use case below.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin It seems that also edits to older messages are not notified to me. I just saw accidentaly your PS above (and that's kind of a shame, as I edit constantly my posts). If it may work for you, I've issued an invitation to our organization so to let you being able to push on PR branches.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin On a second thought having both calls may be indeed useful. Here's the use case that comes to my mind : hdf5+mpi. I think its tcl module file should have depends_on <concrete-mpi> and it's the actual mpi that should add that to the run_env in setup_dependent_environment. If you agree I'll try to implement this case here.

@tgamblin
Copy link
Copy Markdown
Member

@alalazo:

  1. yes I think we both agree now that we need both. One allow you to set up your package's environment, the other allows you to set up something in the environment of dependencies. The Python case actually makes use of this -- extensions' modules need to add their own PYTHONPATH to the environment, but it depends on a) the version of Python (the extendee) and b) the prefix of the extension.
  2. I'm not opposed to setup_dependent_module. I do worry that people will be confused by this, though, as they'll thing module refers to a TCL/Lmod module. What if we called it setup_dependent_package? Package at least is not quite as overloaded and you're really setting variables to be used by the dependent package.
  3. I was going to comment on the OMPI and MPICH stuff. I took out the run_env settings there because the compilers are patched by filter_compilers. You don't actually need to have a module set OMPI_CC or MPICC_CC. Do you think the env vars should be set anyway?

I think its tcl module file should have depends_on and it's the actual mpi that should add that to the run_env in setup_dependent_environment.

I think this is useful, although the ORNL folks (@robertdfrench, others) mentioned that they have some more generic prereq logic that they would like to see added. Do you think this capability could be added later if this PR is mostly ready to go? I know @jgalarowicz would appreciate the capabilities that are already in here.

Aslo, did you see #498? Is this a decent naming scheme? It would be nice to support @glennpj's use case here, and I think his module names are better than the current builtin ones.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

I think this is useful, although the ORNL folks (@robertdfrench, others) mentioned that they have some more generic prereq logic that they would like to see added. Do you think this capability could be added later if this PR is mostly ready to go?

Yes. Next on my list for module files I have :

and having this PR merged would surely help to proceed.

What if we called it setup_dependent_package

Ok.

@tgamblin
Copy link
Copy Markdown
Member

@alalazo: awesome! So is this one ready to be merged?

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin Yes, the only detail I am wondering is if in modules.py we should iterate only on extendees (I originally meant it for python and the likes) or on all the dependencies. Maybe the latter is more appropriate right now. Besides that I think we can merge.

@tgamblin
Copy link
Copy Markdown
Member

@alalazo: I think the Python package should probably iterate only on extendees, but I don't think we should add another callback function for that. Packagers can call dependent_spec.package.is_extension to check for this if they only want to affect extendees.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin Ok then. Merge! 😄

tgamblin added a commit that referenced this pull request Mar 21, 2016
…around

enhancement proposal : customization of module files
@tgamblin tgamblin merged commit 861a235 into spack:develop Mar 21, 2016
@tgamblin
Copy link
Copy Markdown
Member

Thanks for all the work on modules!

@tgamblin
Copy link
Copy Markdown
Member

@robertdfrench @jgalarowicz: you should be able to add some variables to the generated module files now.

@tgamblin
Copy link
Copy Markdown
Member

TODO: docs.

@alalazo alalazo deleted the features/env_objects_flying_around branch March 21, 2016 22:58
@tgamblin
Copy link
Copy Markdown
Member

Oops -- forgot to merge my changes here. Sigh. I'll put those in.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Mar 21, 2016

@tgamblin Eheh, I thought you decided to merge this and then your changes from llnl branch...

@tgamblin
Copy link
Copy Markdown
Member

Fixed. ☺️

tgamblin added a commit that referenced this pull request Mar 22, 2016
…pendent_package

- Fixed in package.py
- Fixed wrong prototypes in packages that use it.
- Fixed build_environment to set module variables properly
  - added hacky fix to ensure spec/package consistency in build processes.
  - Need to think about defensive spec copy done by `Repo.get`.  May be
    time to think about an immutable spec implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants