
Add multiprocessing to buildcache command#9440

Closed
obreitwi wants to merge 1 commit into spack:develop from electronicvisions:feature/buildcache_multiproc

Conversation

@obreitwi
Member

@obreitwi obreitwi commented Oct 4, 2018

  • Add -w/--without-dependencies to suppress dependency resolution.

  • Add -j/--jobs to install/create several packages from/to the buildcache in parallel (defaults to 1).

  • Applied some flake8 fixes

  • Added (and compressed) tests

Reasoning:

The current buildcache implementation lacks the ability to install packages/create tarballs in bulk:

  • Dependencies are always checked for and included
  • The package index is updated after EVERY installed package instead of once after everything is installed

In order to reduce the build time of our spack-based Singularity container, we want to pre-compute all hashes to be installed from the buildcache in advance (i.e., we do not want spack to perform additional checks), and we want this installation to be able to use more than one core (decompressing is CPU-intensive and therefore not completely IO-bound).

The same goes for creating tarballs that are put into the buildcache after container creation.

Hence this commit introduces these two possibilities.

By default the behavior remains the same, i.e. spack fails after encountering its first error. When using several cores, we gather all errors that occurred and print them afterwards.

In theory, parallelization could also be achieved with tools such as xargs by spawning multiple python processes, but I think it would be nicer to have this possibility available out of the box in spack (in the same way that you can install packages with multiple cores).

Note: We already use the proposed modifications without problems. I only now got around to porting the modifications to the current develop branch, and I added some tests while I was at it.

Member

@gartung gartung left a comment

As long as default behavior is identical to the behavior pre-change I am fine with this change.

@healther
Contributor

healther commented Oct 5, 2018

@alalazo @scheibelp could you take a look at this?

* Add -w/--without-dependencies to suppress dependency resolution.

* Add -j/--jobs to install/create several packages from/to the buildcache
  in parallel (defaults to 1).

* Applied some flake8 fixes

* Added (and compressed) tests

Reasoning:

The current buildcache implementation lacks the ability to install
packages/create tarballs in bulk:

* Dependencies are *always* checked for and included
* The package index is updated after EVERY installed package instead of
  once after everything is installed

In order to reduce the build time of our spack-based Singularity
container, we want to pre-compute all hashes to be installed from the
buildcache in advance (i.e., we do not want spack to perform additional
checks), and we want this installation to be able to use more than one
core (decompressing is CPU-intensive and therefore not completely
IO-bound).

The same goes for creating tarballs that are put into the buildcache
after container creation.

Hence this commit introduces these two possibilities.

By default the behavior remains the same, i.e. spack fails after
encountering its first error. When using several cores, we gather all
errors that occurred and print them afterwards.

The only (minor) difference is that, since it is not possible to raise
exceptions in multiprocessing.Pool workers, we have to convert all
encountered SpackErrors to strings and print them manually.

In theory, parallelization could also be achieved with tools such as
xargs by spawning multiple python processes, but I think it would be
nicer to have this possibility available out of the box in spack (in
the same way that you can install packages with multiple cores).

Change-Id: I97b1c14bf7563ac9d68b904b0c38eeb08b8d210e
@obreitwi obreitwi force-pushed the feature/buildcache_multiproc branch from f351742 to eaa8ba4 on October 5, 2018 15:09
@obreitwi
Member Author

obreitwi commented Oct 5, 2018

Interestingly, spack flake8 $(git diff --name-only HEAD~) did not show the style violations Travis complained about…

Member

@scheibelp scheibelp left a comment

Hi @obreitwi, I have a few notes/requests:

  • Most importantly, I think this will conflict with #8718, which we are prioritizing, so I recommend waiting on that before syncing
  • I'm not sure why you are overriding Process to disable daemon functionality - could you explain why you're doing that?
  • Comments include a few suggestions for improving robustness

Before:
    spec.dag_hash(7) +
    ' to install "%s"' %
    spec.format())
After:
    spec.dag_hash(7) + ' to install "%s"' % spec.format())
Member

As long as flake8 wasn't complaining before, I prefer that style changes be minimized (minimally, it would be useful to separate out style changes into a separate commit)

Member Author

I made these changes in response to Travis complaints, but I will separate them out into another commit if they reappear in the rewrite.


class NoDaemonProcess(Process):
    """Daemon processes are not allowed to create child processes.
    Yet sometimes that feature is needed."""
Member

It would be best to mention when it is needed - ideally whatever reason this is being used for in Spack (for example if there are many, just list the most relevant ones for its use in Spack).

Member Author

See my reply below. Will update the comment string in the rework.

def _get_daemon(self):
    return False

def _set_daemon(self, value):
Member

I'm leery of preventing processes from becoming daemons by overriding this and treating it as a no-op. Are you trying to prevent multiprocessing.Pool from doing this automatically? I didn't get that sense from skimming https://docs.python.org/2/library/multiprocessing.html

If somewhere in the Spack code is setting processes as daemons, I think it would be good to address that directly rather than intercepting and dropping the daemon requests at this level

Member Author

It is the other way around: the daemon flag just means that processes can't have children and are cleaned up along with their parent process (see below). We need the processes to have children for unpacking packages, so the daemon flag needs to be set to False.

# currently, specs cause an infinite recursion bug when pickled
# -> as multiprocessing uses pickle internally, we need to transform
# specs prior to distributing the work via worker pool
specs = [s.to_dict() for s in specs]
Member

I think you'll want to set all_deps=True for to_dict so that build deps are included in this

# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
import argparse
import copy
Member

@obreitwi @scottwittenburg this and #8718 will likely conflict, I haven't yet determined how extensively. We are prioritizing #8718 because it is helping us do more extensive testing of building packages with Spack.

Member Author

The current PR already cannot be trivially rebased on top of develop, so I'll wait until #8718 is merged before continuing here.

In principle, though, I don't see any reason why this PR could not be implemented on top of #8718 after quickly reading through the proposed changes.

args = ['create', '-d', mirror_path, "-j", jobs, str(spec)]
if not signed:
    args.insert(-1, "-u")
elif not key_created:
Member

It appears key_created may not be defined at this point (reading to the beginning of the function, there appear to be code paths where it isn't set)

Member Author

I did not touch the gpg key handling with my PR in any way; this could be a side effect of rebasing. For the rework I'll again rebase our working solution onto the current develop branch.


pool = NoDaemonPool(args.jobs)
retvals = pool.map(f_create, specs, chunksize=1)
# TODO: every f_create-call updates the index -> unnecessary
Member

Is this still a TODO? The last two lines are phrased as though the TODO is handled.

Member Author

Ah, yes, it is handled; I will remove the TODO.

@obreitwi
Member Author

Most importantly, I think this will conflict with #8718, which we are prioritizing, so I recommend waiting on that before syncing

Agreed! As far as I can tell, I would also need to do some work when rebasing onto the current develop branch.

I'm not sure why you are overriding Process to disable daemon functionality - could you explain why you're doing that?

Quoting the documentation:

When a process exits, it attempts to terminate all of its daemonic child processes.

Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.

As described here, multiprocessing.Pool uses "daemonized" processes so that they get cleaned up when the parent process dies. When performing buildcache operations, however, spack spawns additional processes (to (un)zip packages, etc.). Hence we need to disable the daemon flag for the workers to do their job. In the rework (after #8718) I'll make sure that the worker processes are properly join()ed. In practice, however, this has not been an issue for us.

Comments include a few suggestions for improving robustness

Will answer them in place.

@obreitwi
Member Author

Organisational question: Should I rebase this PR or open a new one for the new rebase/rework?

@scheibelp
Member

Organisational question: Should I rebase this PR or open a new one for the new rebase/rework?

Both approaches are OK with me. If you think the changes will be significant, I think it would be reasonable to close this and then refer to it in a new PR - we've done that with many Spack PRs and I think it can be a useful way to refocus the conversation.

@gartung
Member

gartung commented Mar 1, 2019

@obreitwi sorry for the slew of commits. I was trying to resolve conflicts using the web interface which led to all sorts of issues that needed to be fixed.

@obreitwi
Member Author

obreitwi commented Mar 1, 2019

@obreitwi sorry for the slew of commits. I was trying to resolve conflicts using the web interface which led to all sorts of issues that needed to be fixed.

No problem. I was meaning to ask what the current state regarding these changes is. It seems #8718 (or rather: #10274) is merged. Does that mean I can start with the rework, or are you already in the process of rewriting these changes, @gartung?

@gartung
Member

gartung commented Mar 1, 2019

I just merged #10756
You can rebase on that or I can give it a try.

@gartung
Member

gartung commented Mar 1, 2019

@obreitwi can you please reset the head of your branch to
eaa8ba4
and force push

@obreitwi obreitwi force-pushed the feature/buildcache_multiproc branch from 25aa30b to eaa8ba4 on March 1, 2019 14:29
@obreitwi
Member Author

obreitwi commented Mar 1, 2019

@obreitwi can you please reset the head of your branch to
eaa8ba4
and force push

Done.

@obreitwi
Member Author

obreitwi commented Mar 1, 2019

I just merged #10756
You can rebase on that or I can give it a try.

I don't mind either way. We recently discovered (after almost six months of using the feature) that installing a lot of packages with many processes consumes a lot of memory, and I suspect that either there is a memory leak somewhere or memory usage can be improved drastically.

I would prefer to conclude my investigation prior to doing the rework. But since this only occurs when installing ~400 packages at the same time, I don't know if it is a show-stopper.

@alalazo
Member

alalazo commented Aug 19, 2020

I would prefer concluding my investigation prior to doing the rework. But since this only occurs when installing ~400 packages at the same time I don't know if it is a show-stopper.

Can you update on the state of this PR?

@obreitwi
Member Author

Can you update on the state of this PR?

It is effectively dead: even with multiprocessing, processing all files in Python to apply RPATH-related transformations was much too slow for us. As we do not need to take care of RPATHs in our setup, we have since switched to a simple chain of plain old Unix tools to implement caching.

@obreitwi obreitwi closed this Aug 19, 2020
@alalazo
Member

alalazo commented Aug 19, 2020

@obreitwi Thanks for getting back so quickly!


5 participants